Testing event-driven systems is one of those things that teams perpetually under-invest in. The unit tests cover the business logic. Maybe there's an integration test that spins up an embedded Kafka. But the interactions between services - the contracts, the timing, the failure modes - those are usually "tested in staging." Which means they're tested in production with extra steps.

I've been burned by this enough times to have opinions. Here are the testing strategies that actually work.

Unit Testing: The Foundation (That Everyone Skips)

Before reaching for Testcontainers, make sure your business logic is testable in isolation. This means separating your Kafka listener from your processing logic.

Bad:

@KafkaListener(topics = "orders")
public void handleOrder(Order order) {
    // 50 lines of business logic, database calls, and Kafka template sends
    // good luck testing this without a full Kafka cluster
}

Better:

@KafkaListener(topics = "orders")
public void handleOrder(Order order) {
    orderProcessor.process(order);
}

// This class has no Kafka dependency. Pure unit tests.
@Service
public class OrderProcessor {
    public ProcessingResult process(Order order) {
        // business logic here
    }
}

Now OrderProcessor can be tested with plain JUnit and Mockito. No Kafka, no containers, no waiting for brokers to start. Run these tests in milliseconds, not seconds.

This sounds obvious, but I've seen too many codebases where the Kafka listener is the business logic. Every test requires a running Kafka broker, and the test suite takes 20 minutes. Don't be that codebase.
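Once the processor is extracted, a test needs nothing but the JVM. Here's a minimal, self-contained sketch; the `Order`, `ProcessingResult`, and validation rule (positive amounts only) are hypothetical stand-ins for whatever your real processor does:

```java
import java.math.BigDecimal;

// Hypothetical minimal types for illustration; the real ones live in the service.
record Order(String orderId, BigDecimal amount) {}
record ProcessingResult(boolean accepted) {}

class OrderProcessor {
    // Hypothetical rule: only orders with a positive amount are accepted.
    ProcessingResult process(Order order) {
        return new ProcessingResult(order.amount().signum() > 0);
    }
}

public class OrderProcessorTest {
    public static void main(String[] args) {
        OrderProcessor processor = new OrderProcessor();
        // No broker, no containers -- just object in, object out.
        System.out.println(processor.process(new Order("ORD-1", new BigDecimal("99.99"))).accepted()); // prints true
        System.out.println(processor.process(new Order("ORD-2", BigDecimal.ZERO)).accepted());         // prints false
    }
}
```

In a real codebase these would be JUnit methods with Mockito stubs for the processor's collaborators, but the shape is the same: construct, call, assert.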

Embedded Kafka: Fast but Fragile

Spring Kafka ships with @EmbeddedKafka, which starts an in-process Kafka broker for testing.

@SpringBootTest
@EmbeddedKafka(partitions = 1, topics = {"orders", "enriched-orders"})
class OrderEnrichmentIntegrationTest {

    @Autowired
    private KafkaTemplate<String, Order> kafkaTemplate;

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    @Test
    void shouldEnrichOrderWithCustomerData() {
        // Send a test message
        kafkaTemplate.send("orders", "order-1", testOrder());

        // Wait for consumer to process
        await().atMost(Duration.ofSeconds(10)).untilAsserted(() -> {
            // verify the enriched order was produced
        });
    }
}

The Pros

  • Fast to start (a few seconds)
  • No Docker required
  • Runs in CI without special configuration
  • Good for testing Spring Kafka configuration (serializers, error handlers, retry policies)

The Cons

  • Not the same as real Kafka. Embedded Kafka is a stripped-down, single-broker, in-process version. Some behaviors differ from a real cluster.
  • Port conflicts if multiple test classes run in parallel
  • Flaky await() calls. How long do you wait? Too short and you get spurious failures on slow CI machines. Too long and your test suite crawls.
  • The embedded broker doesn't support all Kafka features (transactions can be particularly quirky)
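The flakiness mostly comes from the polling loop underneath await(). Conceptually, untilAsserted() boils down to the following stdlib sketch (not Awaitility's actual implementation), which makes the timeout tradeoff explicit:

```java
import java.time.Duration;

public class AwaitSketch {
    // Poll until the assertion stops throwing or the deadline passes.
    // Too short a timeout: spurious failures under load. Too long: slow suites.
    static void awaitUntilAsserted(Duration timeout, Runnable assertion) {
        long deadline = System.nanoTime() + timeout.toNanos();
        AssertionError last;
        while (true) {
            try {
                assertion.run();
                return; // assertion passed
            } catch (AssertionError e) {
                last = e; // remember the latest failure for the timeout report
            }
            if (System.nanoTime() >= deadline) throw last;
            try {
                Thread.sleep(50); // poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw last;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Condition becomes true ~200 ms in, like a consumer catching up.
        awaitUntilAsserted(Duration.ofSeconds(5), () -> {
            if (System.currentTimeMillis() - start < 200) throw new AssertionError("not yet");
        });
        System.out.println("condition met");
    }
}
```

The only defense against the tradeoff is a generous timeout plus a short poll interval: the test finishes as soon as the condition holds, so the long timeout only costs you when something is genuinely broken.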

I use embedded Kafka for fast feedback during development and for testing Spring configuration. For anything that needs to match production behavior closely, I use Testcontainers.

Testcontainers: Real Kafka, Real Confidence

Testcontainers spins up a real Kafka broker in a Docker container. It's slower to start (15-30 seconds), but you're testing against actual Kafka.

@SpringBootTest
@Testcontainers
class OrderProcessingIntegrationTest {

    @Container
    static KafkaContainer kafka = new KafkaContainer(
        DockerImageName.parse("confluentinc/cp-kafka:7.6.0")
    );

    @DynamicPropertySource
    static void kafkaProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.kafka.bootstrap-servers", kafka::getBootstrapServers);
    }

    @Test
    void shouldProcessOrderEndToEnd() {
        // produce a message, verify processing, check side effects
    }
}

Testing with Schema Registry

If you're using Avro or Protobuf with a schema registry, you can add that as a container too:

// The registry runs inside Docker, so it can't reach the host-mapped address
// that kafka.getBootstrapServers() returns. Both containers must share a
// network, and the registry connects to Kafka's internal listener. This
// assumes the Kafka container was created with
// .withNetwork(network).withNetworkAliases("kafka").
static Network network = Network.newNetwork();

@Container
static GenericContainer<?> schemaRegistry = new GenericContainer<>(
    DockerImageName.parse("confluentinc/cp-schema-registry:7.6.0"))
    .withExposedPorts(8081)
    .withNetwork(network)
    .withEnv("SCHEMA_REGISTRY_HOST_NAME", "schema-registry")
    .withEnv("SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS", "PLAINTEXT://kafka:9092")
    .dependsOn(kafka);

Now your integration tests validate the full serialization pipeline, including schema compatibility.

Testing Failure Scenarios

The real value of Testcontainers is testing failure modes. What happens when the broker goes down? When a message fails deserialization? When the consumer takes too long to process?

@Test
void shouldRetryAndDlqOnDeserializationError() {
    // Send a malformed message (raw bytes that can't deserialize)
    try (KafkaProducer<String, byte[]> producer = createRawProducer()) {
        producer.send(new ProducerRecord<>("orders", "bad-key", "not-an-order".getBytes()));
    }

    // Verify the message lands in the DLQ
    await().atMost(Duration.ofSeconds(30)).untilAsserted(() -> {
        ConsumerRecords<String, byte[]> dlqRecords = readFromTopic("orders.DLT");
        assertThat(dlqRecords).hasSize(1);
    });
}

This kind of test catches configuration errors in your error handling pipeline that unit tests can't reach.

Contract Testing: The Missing Piece

Unit tests verify your logic. Integration tests verify your infrastructure. Contract tests verify the agreements between services. In an event-driven system, the contract is the message schema plus the semantic meaning of events.
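Stripped of tooling, a message contract is just "every field the consumer relies on exists with the right type." Here's a stdlib sketch of that core idea; Pact and Spring Cloud Contract layer generated tests, matchers, and a broker of record on top of it:

```java
import java.util.Map;

public class ContractSketch {
    // The consumer's expectations: required field name -> expected Java type.
    // Extra producer fields are fine; missing or retyped ones break the contract.
    static boolean satisfiesContract(Map<String, Object> message,
                                     Map<String, Class<?>> expectations) {
        return expectations.entrySet().stream()
            .allMatch(e -> e.getValue().isInstance(message.get(e.getKey())));
    }

    public static void main(String[] args) {
        Map<String, Class<?>> consumerExpects = Map.of(
            "orderId", String.class,
            "amount", Double.class);

        Map<String, Object> produced = Map.of(
            "orderId", "ORD-123",
            "customerId", "CUST-456", // extra field: still compatible
            "amount", 99.99);

        System.out.println(satisfiesContract(produced, consumerExpects)); // prints true

        // Renaming orderId -> id on the producer side breaks the consumer:
        Map<String, Object> renamed = Map.of("id", "ORD-123", "amount", 99.99);
        System.out.println(satisfiesContract(renamed, consumerExpects)); // prints false
    }
}
```

Note the asymmetry: the producer may add fields freely, but removing or retyping a field the consumer depends on is a breaking change. That's exactly the rule the tools below enforce automatically.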

Pact for Async Messaging

Pact supports asynchronous messaging contracts. The producer side defines what messages it produces; the consumer side defines what messages it expects.

Consumer side (defines the expectation):

@Pact(consumer = "order-enrichment-service")
MessagePact orderCreatedPact(MessagePactBuilder builder) {
    return builder
        .expectsToReceive("an order created event")
        .withContent(new PactDslJsonBody()
            .stringType("orderId", "ORD-123")
            .stringType("customerId", "CUST-456")
            .decimalType("amount", 99.99))
        .toPact();
}

@Test
@PactTestFor(pactMethod = "orderCreatedPact", providerType = ProviderType.ASYNCH)
void verifyOrderCreatedEvent(List<Message> messages) throws Exception {
    // Feed the contract payload through the real consumer-side deserialization
    Order order = objectMapper.readValue(messages.get(0).contentsAsBytes(), Order.class);
    assertThat(order.getOrderId()).isEqualTo("ORD-123");
}

Producer side (verifies it meets the contract):

// The enclosing test class needs a @Provider annotation and a pact source
// (@PactFolder or @PactBroker) so the verification context can find the contract.
@TestTemplate
@ExtendWith(PactVerificationInvocationContextProvider.class)
void verifyPact(PactVerificationContext context) {
    context.verifyInteraction();
}

@PactVerifyProvider("an order created event")
MessageAndMetadata produceOrderCreated() throws JsonProcessingException {
    Order order = new Order("ORD-123", "CUST-456", BigDecimal.valueOf(99.99));
    byte[] serialized = objectMapper.writeValueAsBytes(order);
    return new MessageAndMetadata(serialized, Map.of("contentType", "application/json"));
}

If the producer changes the message format in a way that breaks the consumer's expectations, the contract test fails before deployment.

Spring Cloud Contract

If your team is already in the Spring ecosystem, Spring Cloud Contract is a natural alternative. You define contracts in Groovy or YAML, and the framework generates tests for both sides.

Contract.make {
    label("order_created")
    input {
        triggeredBy("createOrder()")
    }
    outputMessage {
        sentTo("orders")
        body([
            orderId: $(anyNonBlankString()),
            customerId: $(anyNonBlankString()),
            amount: $(anyDouble())
        ])
        headers {
            header("contentType", "application/json")
        }
    }
}

The generated producer test verifies that calling createOrder() produces a message matching the contract. The generated consumer stub lets the consumer team test against a fake producer without needing the real service.

My Testing Strategy

Here's the layered approach that's worked for our team:

  1. Unit tests for all business logic (fast, run on every build)
  2. Embedded Kafka tests for Spring Kafka configuration validation (fast-ish, run on every build)
  3. Testcontainers integration tests for end-to-end flows and failure scenarios (slow, run in CI but not on every local build)
  4. Contract tests for inter-service message format agreements (medium speed, run in CI, block deployments on failure)

The total test suite runs in about 8 minutes in CI. It's not perfect - there are edge cases that only surface in production - but it catches the vast majority of issues before deployment. The contract tests alone have prevented at least a dozen breaking changes from reaching staging.

Testing event-driven systems is harder than testing synchronous APIs. But "harder" doesn't mean "optional." Invest in the test infrastructure now, or invest in incident response later. Your call.