Every team using Kafka eventually has the serialization debate. JSON is easy but wasteful. Binary formats are efficient but harder to work with. And then the real argument starts: Avro or Protobuf?

I've used both in production. They're both fine. But "both fine" isn't helpful when you need to pick one, so let me give you the actual differences that matter.

The Core Difference

Avro and Protobuf both provide schema-based binary serialization. The fundamental philosophical difference is where the schema lives at read time.

Avro requires the schema to deserialize data. The reader needs the writer's schema (or a compatible one) to make sense of the bytes. Schemas are typically stored in a schema registry, and the serialized payload includes a schema ID reference.

Protobuf encodes field numbers in the payload itself. Each field is tagged with its number, so the reader can decode any field it knows about and skip ones it doesn't. You don't strictly need the full schema at read time - just the parts you care about.
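
To make Avro's registry dependency concrete: the Confluent serializers frame each message as a magic byte, a 4-byte schema ID, and then the binary payload. Here's a minimal sketch of peeking at that frame (the class and method names are mine, purely illustrative):

import java.nio.ByteBuffer;

// Illustrative only: extract the schema ID from a Confluent-framed Avro message.
public class WireFormatPeek {
    public static int schemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != 0) { // byte 0 is the magic byte, currently always 0
            throw new IllegalArgumentException("not Confluent wire format");
        }
        return buf.getInt(); // bytes 1-4: the schema ID from the registry
        // Everything after this is the Avro binary body - undecodable
        // without fetching that schema.
    }
}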

This difference drives most of the practical trade-offs.

Schema Evolution

Both support schema evolution, but the mechanics differ.

Avro Evolution

Avro's compatibility is governed by the schema registry's compatibility mode (BACKWARD, FORWARD, FULL, NONE). The key rules:

  • Adding a field: Must have a default value, so consumers on the new schema can still read old messages that lack the field (backward compatibility)
  • Removing a field: The removed field must have had a default value, so consumers still on the old schema can fall back to it when reading new messages (forward compatibility)
  • Renaming a field: Use aliases. The old name stays in the schema as an alias.
  • Changing a field type: Limited. You can promote int to long, float, or double, and float to double; string and bytes are interchangeable. Anything else breaks compatibility.

For example, here's the Order schema after adding a currency field with a default:
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string", "default": "EUR"}
  ]
}

Adding currency with a default of "EUR" is backward compatible. Old consumers that don't know about currency will ignore it. New consumers reading old messages without currency will get the default.
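
In code, that resolution looks like this - a minimal sketch using Avro's GenericDatumReader, assuming oldSchema, newSchema, and orderBytes are loaded elsewhere:

import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

// Decode bytes written with the old schema (no currency field)
// using the new schema as the reader schema.
GenericDatumReader<GenericRecord> reader =
    new GenericDatumReader<>(oldSchema, newSchema); // writer schema, reader schema
GenericRecord order = reader.read(null,
    DecoderFactory.get().binaryDecoder(orderBytes, null));
order.get("currency"); // "EUR" - filled in from the reader schema's default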

Protobuf Evolution

Protobuf's evolution is based on field numbers. The rules:

  • Adding a field: Just add it with a new field number. Old readers skip unknown fields.
  • Removing a field: Mark the number as reserved so it isn't accidentally reused. Old readers still have the field definition and see the default value where the data is missing.
  • Renaming a field: Free on the wire. The binary format uses numbers, not names, so renaming a field in the .proto file doesn't affect serialized data at all (though generated code and the JSON mapping do change).
  • Changing a field type: Some compatible changes work (int32 and int64 share a varint wire type), most don't.

For example, here's the Order message after one addition and one removal:
message Order {
  string order_id = 1;
  double amount = 2;
  string currency = 3; // new field, old readers just ignore it
  reserved 4; // field 4 was removed, never reuse this number
}

Protobuf's field-number approach is arguably more forgiving for schema evolution. You don't need a schema registry to enforce compatibility - the wire format handles it naturally. That said, a schema registry still adds value for documentation, validation, and governance.
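
You can see this in a consumer that's behind on the schema. A sketch, where OrderV1 stands in for code generated from an older .proto that only defines fields 1 and 2:

// Bytes written by a newer producer (which sets currency = 3) still parse fine.
OrderV1 order = OrderV1.parseFrom(newerBytes);
order.getOrderId();       // known fields decode normally
order.getUnknownFields(); // field 3 ends up here and survives reserialization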

Performance

Let's talk numbers. In our benchmarks (Java, typical order-processing messages of 200-500 bytes):

  • Serialization speed: Protobuf was 2-3x faster than Avro in our tests (see the JMH sketch after this list). Protobuf's generated code is highly optimized; Avro's schema-interpreting encoder (even with specific record classes) has more overhead.
  • Deserialization speed: Similar story. Protobuf's generated deserializer is faster.
  • Payload size: Roughly comparable. Avro is sometimes slightly smaller because it doesn't include field tags in the payload (the schema handles field identification). Protobuf includes varint-encoded field numbers. The difference is typically 5-15% and rarely matters in practice.
  • Schema resolution overhead: Avro pays a cost at deserialization time - fetching the writer's schema from the registry (cached after the first lookup) and resolving it against the reader's schema. Protobuf's decoder needs only the field numbers already in the payload.
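
If you want to check these numbers against your own messages, JMH is the standard tool. A minimal sketch of the Protobuf side, assuming a protoc-generated Order class:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class SerializationBench {
    // Order is assumed to be generated by protoc from the message above.
    private final Order order = Order.newBuilder()
            .setOrderId("A-1001")
            .setAmount(42.5)
            .setCurrency("EUR")
            .build();

    @Benchmark
    public byte[] protobufSerialize() {
        return order.toByteArray(); // pure generated code, no registry involved
    }
}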

For most Kafka workloads, the performance difference between Avro and Protobuf is irrelevant. Your bottleneck is the network, the broker disk, or your processing logic - not serialization. If you're processing millions of messages per second and every microsecond counts, Protobuf has an edge. For everything else, pick based on ecosystem fit.

Schema Registry

Confluent Schema Registry supports both Avro and Protobuf (and JSON Schema). The workflow is the same: register schemas, assign compatibility rules, and let the serializer/deserializer handle schema resolution.

With Avro, the schema registry is practically mandatory. The Avro deserializer needs the writer's schema, and embedding the full schema in every message would be absurd. The registry stores schemas and the serializer embeds a compact schema ID in each message.

With Protobuf, the schema registry is optional but recommended. You can deserialize Protobuf without it (the generated code has the schema baked in), but the registry provides compatibility enforcement, versioning, and a central catalog of your data contracts.
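
In producer configuration, the two are drop-in swaps: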

// Avro producer config
props.put("value.serializer", KafkaAvroSerializer.class);
props.put("schema.registry.url", "http://schema-registry:8081");

// Protobuf producer config
props.put("value.serializer", KafkaProtobufSerializer.class);
props.put("schema.registry.url", "http://schema-registry:8081");

Same config pattern, different serializer class. The schema registry abstraction makes them interchangeable from a producer/consumer configuration standpoint.
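
The consumer side mirrors this. A sketch of the Avro consumer config, assuming generated SpecificRecord classes:

// Avro consumer config
props.put("value.deserializer", KafkaAvroDeserializer.class);
props.put("schema.registry.url", "http://schema-registry:8081");
props.put("specific.avro.reader", true); // deserialize into generated classes, not GenericRecord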

Ecosystem and Tooling

Avro

  • First-class citizen in the Kafka ecosystem (Confluent's default)
  • Excellent integration with Kafka Connect and Debezium
  • ksqlDB and Kafka Streams have native Avro support
  • Hadoop/Spark/data engineering ecosystem is heavily Avro-oriented
  • The Avro IDL (.avdl) format is nicer than raw JSON schemas

Protobuf

  • First-class citizen in the gRPC ecosystem
  • Excellent code generation across many languages (better than Avro for polyglot teams)
  • Growing Kafka ecosystem support (schema registry, Kafka Connect)
  • Better IDE support for .proto files than for Avro schemas
  • More widely used outside the JVM ecosystem

When to Use Each

Use Avro when:

  • You're all-in on the Confluent ecosystem (Schema Registry, ksqlDB, Kafka Connect)
  • Your data pipeline feeds into Hadoop/Spark/data lake infrastructure
  • Your team is primarily JVM-based
  • You need strong schema registry integration with compatibility enforcement

Use Protobuf when:

  • You have a polyglot service architecture (Protobuf code generation is better for Go, Python, C++)
  • You're already using gRPC and want a consistent serialization format
  • Serialization performance is critical
  • You want simpler schema evolution without mandatory registry compatibility checks

Use JSON Schema when:

  • You want your messages to be human-readable during development and debugging
  • You're doing light integration work where performance isn't critical
  • Your team is already comfortable with JSON and the performance penalty is acceptable

My Recommendation

For a Java/Spring Boot team working primarily within the Kafka ecosystem, Avro is the path of least resistance. The tooling is mature, the integration is seamless, and the schema registry workflow is well-established.

For teams building polyglot microservices where Kafka is one of several communication mechanisms (alongside gRPC, for instance), Protobuf gives you one serialization format across all channels.

Either way, the important thing is to pick one and standardize. A codebase with half the topics using Avro and half using Protobuf is worse than either choice on its own.