I've been reviewing Spring Data JPA code at various clients for years now, and the same mistakes keep showing up. Not because developers are careless - because Spring Data JPA makes certain anti-patterns embarrassingly easy. The abstraction is so clean that it hides the SQL it generates, and that hidden SQL is often terrible.

The findAll Anti-Pattern

This is the most common one. A repository method called findAll() used to populate a dropdown, a table, or a report. On a table with 50 rows, it's fine. On a table with 5 million rows, it's an OOM exception waiting to happen.

// This is a production incident waiting to happen
List<Product> products = productRepository.findAll();

Spring Data JPA will load every single row into memory as fully hydrated entity objects. With lazy-loaded collections, each entity might trigger additional queries. With 5 million rows, you're looking at gigabytes of heap usage and minutes of database load.

The fix depends on the use case:

// For paginated display: use pagination
Page<Product> products = productRepository.findAll(PageRequest.of(0, 20));

// For dropdowns: use a projection
public interface ProductSummary {
    String getId();
    String getName();
}

List<ProductSummary> products = productRepository.findAllProjectedBy();

// For batch processing: use streaming
@Query("SELECT p FROM Product p")
Stream<Product> streamAll();

The findById Anti-Pattern

Almost as common. You need one field from an entity, but you load the entire object graph.

// Loading a full entity just to get the name
Product product = productRepository.findById(id).orElseThrow();
String name = product.getName();
// The entity has 20 fields, 3 eager associations, and 2 LOB columns
// All loaded, all useless except the name

Use a DTO projection:

public record ProductName(String id, String name) {}

@Query("SELECT new com.example.dto.ProductName(p.id, p.name) FROM Product p WHERE p.id = :id")
Optional<ProductName> findNameById(@Param("id") String id);

Or an interface projection:

public interface ProductNameProjection {
    String getId();
    String getName();
}

Optional<ProductNameProjection> findProductNameById(String id);

The SQL generated by projections only selects the columns you need. For wide tables or tables with LOB columns, the performance difference is significant.

DTO Projections: The Right Way

I use DTO projections aggressively. The rule is simple: if the caller doesn't need the full entity, don't return the full entity.

// Record-based DTO
public record OrderSummary(
    String id,
    String customerName,
    BigDecimal total,
    OrderStatus status,
    LocalDateTime createdAt
) {}

// Repository method
@Query("""
    SELECT new com.example.dto.OrderSummary(
        o.id, c.name, o.total, o.status, o.createdAt
    )
    FROM Order o
    JOIN o.customer c
    WHERE o.status = :status
    """)
List<OrderSummary> findOrderSummariesByStatus(@Param("status") OrderStatus status);

DTO projections are read-only, lightweight, and generate efficient SQL. They also make your API layer cleaner because you're not leaking entity details through your controllers.

Custom Repository Implementations

Spring Data's query derivation (method name-based queries) is great for simple cases. For complex queries, it produces method names that are longer than some of my commit messages:

// Please don't
List<Order> findByStatusAndCustomerCountryAndCreatedAtBetweenAndTotalGreaterThan(
    OrderStatus status, String country, LocalDateTime start, LocalDateTime end, BigDecimal minTotal
);

Use a custom repository implementation instead:

public interface OrderRepositoryCustom {
    List<OrderSummary> searchOrders(OrderSearchCriteria criteria);
}

@Repository
public class OrderRepositoryCustomImpl implements OrderRepositoryCustom {

    private final EntityManager em;

    @Override
    public List<OrderSummary> searchOrders(OrderSearchCriteria criteria) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<OrderSummary> query = cb.createQuery(OrderSummary.class);
        Root<Order> order = query.from(Order.class);
        Join<Order, Customer> customer = order.join("customer");

        List<Predicate> predicates = new ArrayList<>();

        if (criteria.getStatus() != null) {
            predicates.add(cb.equal(order.get("status"), criteria.getStatus()));
        }
        if (criteria.getCountry() != null) {
            predicates.add(cb.equal(customer.get("country"), criteria.getCountry()));
        }
        if (criteria.getMinTotal() != null) {
            predicates.add(cb.greaterThan(order.get("total"), criteria.getMinTotal()));
        }

        query.select(cb.construct(OrderSummary.class,
            order.get("id"), customer.get("name"),
            order.get("total"), order.get("status"), order.get("createdAt")
        ));
        query.where(predicates.toArray(new Predicate[0]));

        return em.createQuery(query).getResultList();
    }
}

// Extend both interfaces
public interface OrderRepository extends JpaRepository<Order, String>, OrderRepositoryCustom {
}

Specifications: The Composable Alternative

If the Criteria API feels too verbose (it is), Specifications offer a more composable approach:

public class OrderSpecifications {

    public static Specification<Order> hasStatus(OrderStatus status) {
        return (root, query, cb) -> cb.equal(root.get("status"), status);
    }

    public static Specification<Order> inCountry(String country) {
        return (root, query, cb) -> {
            Join<Order, Customer> customer = root.join("customer");
            return cb.equal(customer.get("country"), country);
        };
    }

    public static Specification<Order> totalGreaterThan(BigDecimal amount) {
        return (root, query, cb) -> cb.greaterThan(root.get("total"), amount);
    }
}

// Usage: compose specifications dynamically
Specification<Order> spec = Specification.where(hasStatus(COMPLETED))
    .and(inCountry("PT"))
    .and(totalGreaterThan(new BigDecimal("100")));

List<Order> orders = orderRepository.findAll(spec);

Specifications compose like building blocks. Each one is testable in isolation. The repository just needs to extend JpaSpecificationExecutor<Order>.

Stream Methods

For large result sets that need processing but not full materialization in memory, use streaming:

@Query("SELECT o FROM Order o WHERE o.status = :status")
Stream<Order> streamByStatus(@Param("status") OrderStatus status);
@Transactional(readOnly = true)
public void processCompletedOrders() {
    try (Stream<Order> orders = orderRepository.streamByStatus(COMPLETED)) {
        orders.forEach(order -> {
            // Process one at a time, not all in memory
            archiveService.archive(order);
        });
    }
}

Critical: streams must be used within an open transaction and closed explicitly (try-with-resources). The stream fetches rows lazily from the database cursor, so the transaction and connection must stay open.

Entity Locking

Concurrent updates on the same entity will cause data loss unless you handle locking. JPA offers two strategies:

Optimistic locking (preferred for most cases):

@Entity
public class Account {
    @Id
    private String id;

    @Version
    private Long version;

    private BigDecimal balance;
}

The @Version field is checked on every update. If another transaction modified the entity since you read it, you get an OptimisticLockException. Catch it and retry.

Pessimistic locking (for high-contention scenarios):

@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT a FROM Account a WHERE a.id = :id")
Optional<Account> findByIdForUpdate(@Param("id") String id);

This issues a SELECT ... FOR UPDATE, which blocks other transactions from reading or writing the same row until your transaction completes. Use it sparingly - it reduces concurrency.

Envers Auditing

For entities that need an audit trail (who changed what, when), Hibernate Envers is built into Spring Data JPA:

@Entity
@Audited
public class Account {
    @Id
    private String id;
    private BigDecimal balance;
    private String owner;

    @NotAudited // Skip auditing for this field
    private LocalDateTime lastLoginAt;
}
@Repository
public interface AccountRepository extends JpaRepository<Account, String>,
    RevisionRepository<Account, String, Long> {
}

// Query audit history
Revisions<Long, Account> revisions = accountRepository.findRevisions(accountId);
revisions.forEach(revision -> {
    Account entity = revision.getEntity();
    RevisionMetadata<Long> metadata = revision.getMetadata();
    log.info("Revision {} at {}: balance={}",
        metadata.getRevisionNumber().orElse(0L),
        metadata.getRevisionInstant().orElse(null),
        entity.getBalance());
});

Envers creates a _AUD table for each audited entity and records every change. It's not a replacement for application-level audit logging, but it's invaluable for "what was the account balance on January 15th?" queries.

The Bottom Line

Spring Data JPA is a powerful abstraction, but it's an abstraction over SQL, and SQL performance characteristics don't disappear just because you're writing Java. Enable SQL logging in development (spring.jpa.show-sql=true or better, logging.level.org.hibernate.SQL=DEBUG), look at the queries being generated, and fix the egregious ones before they reach production.

The N+1 problem, the load-everything-into-memory problem, and the missing-projection problem are all solvable. You just have to look at what's actually happening under the abstraction.