Every few months someone on my team suggests we adopt Istio. I've now done it twice - once successfully, once disastrously. The difference wasn't the technology. It was whether we actually needed it.

A service mesh is infrastructure-level networking that handles things like mTLS between services, traffic routing, circuit breaking, observability, and retries. Istio does all of this by injecting an Envoy sidecar proxy into every pod. Your application doesn't know it's there. All traffic flows through Envoy, which applies policies and collects telemetry.

The question isn't whether Istio is good technology. It is. The question is whether you need it.

The Architecture

Every pod gets an Envoy sidecar. Inbound and outbound traffic is intercepted via iptables rules and routed through Envoy. The control plane (istiod) pushes configuration to all the Envoy proxies.

[Service A] -> [Envoy Sidecar] -> [Envoy Sidecar] -> [Service B]

Your service talks to localhost. Envoy handles everything else: TLS termination, load balancing, retries, circuit breaking, collecting traces and metrics. You get observability and security without changing application code.

In theory, this is beautiful. In practice, you're adding a proxy to every network call, running a control plane that configures thousands of Envoy instances, and debugging network issues through an additional layer of abstraction.
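For reference, sidecar injection is typically enabled per namespace via a label; Istio's mutating webhook then adds the Envoy container to every pod created there. A minimal sketch (namespace name is illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Tells Istio's injection webhook to add the Envoy sidecar
    # to every pod created in this namespace.
    istio-injection: enabled
```

This is also why adoption can be incremental: label one namespace, leave the rest alone.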

When You Need It

mTLS Everywhere

If compliance requires encrypted service-to-service communication, Istio gives you this for free. Enable strict mTLS and every service-to-service call is encrypted with certificates that Istio issues and rotates automatically.

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT

Without a mesh, you'd need to configure TLS in every service, manage certificates, and handle rotation yourself. Istio makes it a cluster-wide policy. This alone justified Istio for one of our projects that had regulatory requirements.

Traffic Management

This is where Istio shines. Canary deployments by percentage:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
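The v1 and v2 subsets referenced above have to be defined in a DestinationRule that maps them to pod labels - assuming the two deployments carry a version label, something like:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  subsets:
    # Each subset selects pods by label; the VirtualService
    # routes traffic to these subsets by name.
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```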

Traffic shadowing (mirroring production traffic to a test version without affecting users):

http:
  - route:
      - destination:
          host: order-service
          subset: v1
    mirror:
      host: order-service
      subset: v2-test
    mirrorPercentage:
      value: 100.0

Header-based routing (send QA traffic to a specific version):

http:
  - match:
      - headers:
          x-test-route:
            exact: "canary"
    route:
      - destination:
          host: order-service
          subset: v2
  - route:
      - destination:
          host: order-service
          subset: v1

This kind of traffic control is extremely powerful for safe deployments and testing. If you're doing sophisticated deployment strategies across many services, the mesh pays for itself.

Circuit Breaking

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 50

When a service starts returning errors, Envoy ejects it from the load balancing pool. This prevents cascading failures. You can do this in application code (Resilience4j, for example), but doing it at the infrastructure level means it works consistently across all services regardless of language or framework.
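For contrast, here is what that ejection logic looks like when you sketch it in application code. This is a deliberately minimal, illustrative version of consecutive-error ejection - the class name and structure are invented for this example; a real library like Resilience4j adds sliding windows, half-open probing, and thread safety:

```java
import java.time.Duration;
import java.time.Instant;

// Minimal sketch of the consecutive-error ejection that Envoy's
// outlierDetection performs per upstream host. Purely illustrative.
class OutlierEjector {
    private final int errorThreshold;     // mirrors consecutive5xxErrors
    private final Duration ejectionTime;  // mirrors baseEjectionTime
    private int consecutiveErrors = 0;
    private Instant ejectedUntil = Instant.MIN;

    OutlierEjector(int errorThreshold, Duration ejectionTime) {
        this.errorThreshold = errorThreshold;
        this.ejectionTime = ejectionTime;
    }

    // Should the load balancer skip this host right now?
    boolean isEjected(Instant now) {
        return now.isBefore(ejectedUntil);
    }

    // Record one response; eject the host after N consecutive 5xx errors.
    void record(int statusCode, Instant now) {
        if (statusCode >= 500) {
            consecutiveErrors++;
            if (consecutiveErrors >= errorThreshold) {
                ejectedUntil = now.plus(ejectionTime);
                consecutiveErrors = 0;
            }
        } else {
            consecutiveErrors = 0;
        }
    }
}
```

The mesh version of this runs in every sidecar with zero application code - which is exactly the argument for doing it at the infrastructure level.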

When You Don't Need It

Fewer Than 10 Services

If you have a handful of services, the operational overhead of Istio exceeds the benefit. Istio's control plane consumes resources, the sidecars add latency and memory to every pod, and the debugging complexity is real.

For small deployments, use Spring Cloud for service discovery, Resilience4j for circuit breaking, and let your ingress controller handle TLS.

Your Team Is Small

Istio has a learning curve. CRDs for VirtualServices, DestinationRules, Gateways, PeerAuthentication, AuthorizationPolicy. Debugging means understanding Envoy configs, checking sidecar logs, and knowing how iptables interception works. If your team is three developers, this is overhead you don't need.

Latency-Sensitive Workloads

Every hop through Envoy adds latency - typically 1-3ms per hop. For most services this is nothing. For latency-critical paths where you're trying to stay under 10ms total, an extra 2-6ms per service call (one outbound sidecar plus one inbound) adds up fast.

The Service Mesh vs API Gateway Debate

People often confuse these. An API gateway (Kong, Azure API Management, AWS API Gateway) sits at the edge and handles north-south traffic - external clients talking to your services. It does authentication, rate limiting, transformation, and routing.

A service mesh handles east-west traffic - service-to-service communication inside the cluster. It does mTLS, circuit breaking, traffic splitting, and observability between internal services.

You probably need both. The API gateway for external traffic. The mesh for internal traffic. They're complementary, not competing.

That said, Istio's ingress gateway can replace a standalone API gateway for simpler use cases. If your edge requirements are basic (TLS termination, routing, rate limiting), the Istio gateway might be enough. If you need OAuth flows, request/response transformation, developer portals, and usage analytics, you want a real API gateway.
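For the basic edge case, an Istio Gateway terminating TLS looks roughly like this - hostname and secret name here are illustrative:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: public-gateway
spec:
  selector:
    istio: ingressgateway   # binds to the default ingress gateway pods
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE                # terminate TLS at the gateway
        credentialName: api-tls-cert  # Kubernetes TLS secret (example name)
      hosts:
        - "api.example.com"
```

A VirtualService bound to this gateway then routes the external traffic to internal services - the same routing primitives as east-west traffic, which is the appeal of the combined setup.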

The Honest Assessment

Istio is excellent infrastructure software that most teams adopt too early. Start without it. When you hit a problem that Istio solves (mTLS compliance, sophisticated traffic management, consistent cross-service observability), adopt it then. The migration path is straightforward - Istio's sidecar injection can be done incrementally per namespace.

Don't adopt it because it looks cool on an architecture diagram. Adopt it because you have a specific, current problem that it solves better than the alternatives.

And if you do adopt it, invest in understanding Envoy. Every Istio debugging session eventually becomes an Envoy debugging session.
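A few commands worth knowing for those sessions (pod names and namespace are placeholders):

```shell
# Is every sidecar in sync with istiod's pushed configuration?
istioctl proxy-status

# What clusters and routes did istiod actually push to this Envoy?
istioctl proxy-config clusters <pod> -n production
istioctl proxy-config routes <pod> -n production

# Envoy's own logs, from the sidecar container
kubectl logs <pod> -n production -c istio-proxy
```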