Hunt Down the N Plus One Queries Quietly Slowing Your API
Trace, index, and batch the database calls behind your p95 latency so every request returns before the user notices.
Your API was fast in development. Every endpoint returned in 30 milliseconds, the demo was crisp, and you shipped. Then traffic arrived, the dataset grew, and your p95 latency crept from 80 milliseconds to 600. Nothing in the code changed, which is the same trap that makes field data tell a different story than your lab benchmarks. What changed is that the loop that fired one query per row in development is now firing two hundred, and your database is paying for every one of them.
This is the N plus one query problem, and it is the single most common reason a working application gets slow at scale. It hides in development because small datasets make the cost invisible. It surfaces in production because the cost is linear in the number of rows, and production has the rows. The good news is that it is one of the most fixable performance bugs you will ever meet, once you can see it.
What an N plus one actually looks like
The shape is always the same. You fetch a list of parents, then loop over them and touch a lazy-loaded relation inside the loop. One query for the list, then N more, one per parent. Hence "N plus one."
Picture an orders page. You fetch 100 orders in a single query. Then, for each order, your template asks for order.customer.name. If your ORM lazy-loads that relation, each access fires its own SELECT against the customers table. That is 1 query for the orders and 100 for the customers: 101 round trips where 2 would do. Every round trip carries network latency, connection overhead, and query parsing, so the damage is far worse than 101 times the work of one query.
The reason this is so easy to write is that ORMs make it look like plain property access. SQLAlchemy, ActiveRecord, Hibernate, Prisma, and Eloquent all lazy-load relations by default. The code reads like you are walking an object graph. Under the hood, every dotted access can be a database call.
See it before you fix it
You cannot fix latency you cannot attribute, and N plus one is invisible until you make the queries visible. There are three layers of detection, and you want all three.
Query logging in development. Turn on SQL echo in your ORM and watch the count for a single page render. If loading one orders page emits 101 queries, you have found it in five seconds. This is the fastest feedback loop and the one most teams skip.
A query counter in your test suite. This is the layer that keeps the bug from coming back, the database equivalent of a performance budget your team will not quietly break. Tools exist for every major stack: the bullet gem for Rails, nplusone for Python, query-count assertions in many frameworks. Wrap a request in a test, assert it issues no more than a fixed number of queries, and fail CI when a new N plus one sneaks in. Without this gate, you will fix the bug today and a teammate will reintroduce it next sprint.
Production tracing that points straight at the slow span. The same instrumentation that lets you find the root cause in minutes not hours makes an N plus one impossible to miss, because the repeated query shows up as a stack of near-identical spans.
Production tracing. APM and distributed tracing show you the real culprit at the percentile that matters. A flame graph of a slow p95 request with two hundred near-identical SELECT spans stacked end to end is the unmistakable signature. This is where you confirm that the endpoint you suspect is actually the one bleeding you.
The trap is testing with three rows of seed data. Three rows hide an N plus one completely. Load test with realistic volumes, because the bug only exists at scale and a small dataset will lie to you.
Eager loading: the primary fix
In most cases the fix is to tell the ORM to load the relation up front, in one query, instead of lazily one at a time. This is eager loading, and it is the first tool you reach for.
The mechanism is a JOIN or a second batched IN query instead of N separate ones. The exact syntax depends on your stack:
- Rails ActiveRecord:
Order.includes(:customer) - SQLAlchemy:
joinedloadorselectinload - Hibernate:
JOIN FETCH - Prisma: the
includeoption - Eloquent:
with('customer')
Adopt the orders example. Order.all followed by lazy order.customer is 101 queries. Order.includes(:customer) is 2: one for the orders, one batched query that pulls every needed customer in a single WHERE id IN (...). Same data, two round trips, and the latency collapses.
Eager loading consolidates the round trips, which is exactly where the time was going. Be deliberate about which relations you eager load, though. Pulling relations you never render wastes memory and bandwidth in the other direction, and a JOIN that explodes a one-to-many can fetch far more rows than you expect. Load what the response needs, nothing more.
Batching when eager loading is not practical
Eager loading works cleanly when you control the query that starts the chain. In a GraphQL API or a request that assembles data from several independent resolvers, you often do not. Each resolver runs in isolation and has no idea the one next to it is about to ask for the same parent.
This is what DataLoader was built for. It collects every individual ID requested during one tick of the event loop, then fires a single batched query for all of them at once, and caches the result so the same ID is never fetched twice in one request. Ten resolvers each asking for a user by ID become one SELECT ... WHERE id IN (...). The resolvers stay simple and independent; the batching happens underneath them.
The pattern generalizes beyond GraphQL. Any time you have a set of IDs and a per-ID fetch, collect the IDs first and issue one batched query against them. Building this discipline into how an API is structured, rather than bolting it on after the latency complaints arrive, is part of how we approach web applications that have to stay fast as the data grows.
The index that makes the fix actually fast
Collapsing 101 queries into 2 is only half the win. The batched query you just created does a WHERE customer_id IN (...) or joins on a foreign key column, and if that column is not indexed, the database falls back to a full table scan for every lookup. You traded a hundred fast queries for two slow ones, and on a large table the slow ones can be worse.
Index the foreign key columns you filter and join on. Most databases do not create an index on a foreign key automatically; they only index the primary key. Confirm with EXPLAIN that your batched query uses an index scan, not a sequential scan. The combination of eager loading plus the right index is what takes a 600-millisecond p95 back under 100. One without the other leaves money on the table. If you are hunting these in a running system rather than in development, how to hunt down N plus one queries before they melt production walks the live triage in more depth.
Pick the battles that matter
Not every N plus one deserves a fix. If a page shows three to five records, runs rarely, and is not on a hot path, the extra queries cost a few milliseconds nobody will ever feel. Spending an afternoon optimizing it is the wrong call.
Triage by traffic and result-set size. Find the endpoints with the highest request volume and the largest result sets, because that is where N plus one compounds into real latency and real database load. Your production tracing already ranks them by p95. Start at the top of that list, fix the worst offenders, lock them in with a query-count test, and move down until the next fix would save less time than it costs. The same prioritise-by-impact mindset applies to time to first byte, where a slow query inside the request often turns out to be the byte the browser is waiting on, and to a database connection pool exhausting itself during dev reloads, the other way the database quietly throttles a Next.js app.
The reward is the kind of latency improvement that feels like new hardware without buying any. The same code, the same database, two hundred round trips collapsed into two, and a p95 that returns before the user has finished moving the mouse. Make the queries visible, eager-load or batch the relations, index the keys, and gate it in CI so it stays fixed. That loop is most of the difference between an API that gets slower with success and one that does not.






