Connection Pooling: The Silent Killer
Every Java developer uses connection pools. Very few understand them. And the ones who don't end up debugging "connection timeout" errors in production at the worst possible time.
Why It Matters
Opening a database connection is expensive. TCP handshake, TLS negotiation (if you're doing it right), authentication, session setup - it adds 20-50ms per connection on a good day. If your service handles 100 requests per second and each one opens a fresh connection, you're spending 2-5 seconds per second just on connection overhead. That math doesn't work.
Connection pools keep a set of connections open and ready. Your code borrows one, uses it, returns it. The overhead drops to near zero.
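To make that concrete, here's a minimal sketch of the two approaches - the JDBC URL and credentials are placeholders, and the pooled version works with any standard DataSource (HikariCP included):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import javax.sql.DataSource;

class FreshVsPooled {
    // Without a pool: every call pays the full TCP/TLS/auth handshake
    // (~20-50ms), and close() really tears the connection down.
    static Connection fresh() throws SQLException {
        return DriverManager.getConnection(
                "jdbc:postgresql://db:5432/app", "app", "secret"); // placeholder URL
    }

    // With a pool: getConnection() borrows an already-open connection,
    // and close() just returns it to the pool.
    static void pooled(DataSource pool) throws SQLException {
        try (Connection conn = pool.getConnection()) {
            // use the connection; borrow/return overhead is near zero
        }
    }
}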
HikariCP: The Default (For Good Reason)
Spring Boot uses HikariCP by default, and it's the right choice. It's fast, lightweight, and has sane defaults. But those defaults might not be sane for your workload.
The key settings:
spring:
  datasource:
    hikari:
      maximum-pool-size: 10
      minimum-idle: 5
      connection-timeout: 30000
      idle-timeout: 600000
      max-lifetime: 1800000
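If you're not on Spring Boot, the same knobs exist on HikariCP's programmatic API. A sketch with the values above (the JDBC URL is a placeholder):

import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

class PoolSetup {
    static DataSource build() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db:5432/app"); // placeholder URL
        config.setMaximumPoolSize(10);
        config.setMinimumIdle(5);
        config.setConnectionTimeout(30_000); // 30s, in milliseconds
        config.setIdleTimeout(600_000);      // 10 minutes
        config.setMaxLifetime(1_800_000);    // 30 minutes
        return new HikariDataSource(config);
    }
}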
maximum-pool-size
This is the one that matters most, and it's the one most people get wrong. The instinct is "more connections = more throughput." It's the opposite.
PostgreSQL's sweet spot for a given machine is roughly:
connections = (2 * number_of_cores) + number_of_disks
For a typical 4-core database server with SSDs, that's around 10. Not 50. Not 100. Ten.
Every connection beyond the optimal point adds context-switching overhead to the database. I've seen services go from 500ms average response time to 50ms by reducing the pool size from 50 to 10.
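If you'd rather derive the starting point than hardcode it, the formula is trivial to encode - with the caveat that the core count must be the database server's, not your application's, and the spindle count is something you supply (SSDs count as roughly one):

class PoolSizing {
    // (2 * cores) + effective spindles, per the heuristic above.
    // Treat the result as a starting point to benchmark from, not a law.
    static int suggestedPoolSize(int dbServerCores, int effectiveSpindles) {
        return (2 * dbServerCores) + effectiveSpindles;
    }
}

For the 4-core, SSD-backed example above, suggestedPoolSize(4, 2) gives 10.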
connection-timeout
How long a thread waits to get a connection from the pool before giving up. The default is 30 seconds, which is usually too long. If your pool is exhausted for 30 seconds, something is fundamentally wrong, and making requests wait that long just creates a cascading failure.
I set this to 5 seconds. If a connection isn't available in 5 seconds, fail fast and let the caller handle it.
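In Spring Boot terms that's one property, in milliseconds:

spring:
  datasource:
    hikari:
      connection-timeout: 5000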
max-lifetime
How long a connection lives before being recycled. Must be shorter than any connection time limit imposed between your app and the database - PostgreSQL's idle_session_timeout if you've set one, a firewall rule, a load balancer's idle timeout - whichever is shortest. If something downstream silently kills a connection and the pool doesn't know, the next query on that connection fails.
30 minutes is a reasonable default. If you're behind an Azure load balancer, their idle timeout is 4 minutes by default - set max-lifetime to 3 minutes or you'll get "connection reset" errors that take forever to diagnose. Ask me how I know.
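For that Azure case, assuming the default 4-minute idle timeout, the fix looks like this:

spring:
  datasource:
    hikari:
      max-lifetime: 180000 # 3 minutes, safely under the 4-minute LB idle timeout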
The Connection Leak
The most common pool problem isn't configuration - it's leaks. A connection gets borrowed and never returned. The pool slowly drains until connection-timeout fires on every request.
The usual suspect:
// DON'T DO THIS
Connection conn = dataSource.getConnection();
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT ...");
// exception here = connection never returned
process(rs);
conn.close(); // never reached
Use try-with-resources:
try (Connection conn = dataSource.getConnection();
     PreparedStatement stmt = conn.prepareStatement("SELECT ...")) {
    ResultSet rs = stmt.executeQuery();
    process(rs);
} // connection automatically returned to pool
If you're using Spring and JPA (which you should be), Spring manages this for you. But if you're doing raw JDBC calls alongside JPA - and in legacy migrations, you will be - watch for leaks.
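If raw SQL is unavoidable, one way to keep Spring in charge of borrow-and-return is JdbcTemplate, which acquires a connection per call and releases it even when the query throws. A sketch - the table and query are illustrative:

import java.util.List;

import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;

class UserLookup {
    private final JdbcTemplate jdbc;

    UserLookup(DataSource dataSource) {
        this.jdbc = new JdbcTemplate(dataSource);
    }

    List<String> activeUsernames() {
        // JdbcTemplate borrows a connection from the pool, runs the query,
        // and returns the connection - no leak even on failure.
        return jdbc.queryForList(
                "SELECT username FROM users WHERE active = true", String.class);
    }
}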
HikariCP has a leak detection setting:
spring:
  datasource:
    hikari:
      leak-detection-threshold: 60000 # log a warning if connection held > 60s
This has saved me hours of debugging. Turn it on in all environments.
Monitoring
You can't tune what you can't see. Expose HikariCP metrics:
management:
  metrics:
    enable:
      hikaricp: true
The metrics that matter:
hikaricp.connections.active - how many connections are in use right now
hikaricp.connections.pending - how many threads are waiting for a connection (this should be 0)
hikaricp.connections.timeout - how many times the pool failed to provide a connection
If pending is regularly above 0, your pool is too small or your queries are too slow. If timeout is above 0, you have an active problem.
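If you want the app itself to notice, you can read the same gauges through Micrometer. A sketch, assuming a single pool is registered (with multiple pools you'd also match on the pool tag):

import io.micrometer.core.instrument.MeterRegistry;

class PoolHealthCheck {
    private final MeterRegistry registry;

    PoolHealthCheck(MeterRegistry registry) {
        this.registry = registry;
    }

    boolean threadsAreWaiting() {
        // hikaricp.connections.pending is a gauge; anything above zero means
        // threads are blocked right now waiting for a connection.
        double pending = registry.get("hikaricp.connections.pending")
                .gauge()
                .value();
        return pending > 0;
    }
}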
The Lesson
Connection pool tuning isn't exciting. But a misconfigured pool is the kind of problem that manifests as "the app is slow sometimes" with no obvious cause. It wastes hours of investigation because the symptoms look like everything except a connection pool issue.
Set the pool size small. Set the timeouts tight. Turn on leak detection. Monitor the metrics. It's twenty minutes of configuration that spares you an entire category of production incidents.