Next.js Server Components Without the Waterfall Tax

Structure data fetching so server components stream and parallelize instead of stacking sequential round trips the user feels as lag.

Derrick S. K. Siawor · 8 min read

Server components made data fetching feel simple, which is exactly how they make pages slow. You write a component, call your database or API right inside it with a plain await, and it works. Then you write the next component the same way, and the one after that, and without noticing you have built a waterfall: each component waits for the one before it to finish fetching before it starts its own, so three independent queries that could have run together instead run one after another, and the user feels the sum of all three as lag.

The page is correct. It is just slower than it needs to be, by exactly the time it spends waiting on requests that had no reason to be sequential. The fix is not a different framework or a clever library. It is structuring the data fetching so independent work runs in parallel and the parts that are ready stream to the user without waiting on the parts that are not. Here is how to get the speed server components promised instead of the waterfall they make easy.

How the waterfall sneaks in

The trap is that the natural way to write a server component is also the sequential way. When you await a fetch inside a component, and that component renders a child that awaits another fetch, the child's fetch cannot begin until the parent's resolves, because the child does not exist until the parent finishes rendering. Nest a few of these and you have a chain: fetch A completes, then B starts, completes, then C starts. Each request is fine on its own. Stacked end to end, they add up to a load time the user feels as a stall.
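To make the chain concrete, here is a minimal sketch that models three nested components as plain async functions returning strings instead of JSX. The names `fakeFetch`, `Parent`, `Child`, and `Grandchild` are hypothetical stand-ins, and the delays are arbitrary; the point is only the order in which the fetches start.

```typescript
// Records when each simulated request starts and ends.
const events: string[] = [];

// Hypothetical stand-in for a database or API call.
function fakeFetch(name: string, ms: number): Promise<string> {
  events.push(`${name}:start`);
  return new Promise((resolve) =>
    setTimeout(() => {
      events.push(`${name}:end`);
      resolve(name);
    }, ms),
  );
}

// Each "component" awaits its own data, then renders its child.
// The child's fetch cannot start until the parent's await has resolved.
async function Grandchild(): Promise<string> {
  const c = await fakeFetch("C", 10);
  return `<c>${c}</c>`;
}

async function Child(): Promise<string> {
  const b = await fakeFetch("B", 10);
  const inner = await Grandchild();
  return `<b>${b}${inner}</b>`;
}

async function Parent(): Promise<string> {
  const a = await fakeFetch("A", 10);
  const inner = await Child();
  return `<a>${a}${inner}</a>`;
}
```

After `await Parent()`, the `events` log reads A:start, A:end, B:start, B:end, C:start, C:end. Nothing in the three functions looks wrong on its own, yet no two requests are ever in flight together.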

The reason this is so common is that it does not look wrong. There is no error, no warning, just a series of awaits that read perfectly clearly and happen to run in series. The waterfall is invisible in the code and obvious only in the timing, which is why teams ship it and then wonder why a page that does three quick queries takes as long as it does. The three queries were quick. The waiting between them was not.
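Since the problem only shows up in the timing, one way to surface it is to record when each fetch starts and ends. The sketch below is an assumption-laden helper, not a framework feature: `timed`, `spans`, and `looksSequential` are hypothetical names, and `Date.now()` is coarse but good enough to tell "ran together" from "ran one after another".

```typescript
interface Span {
  label: string;
  startMs: number;
  endMs: number;
}

const spans: Span[] = [];
const origin = Date.now();

// Wraps any async call and records its start and end offsets.
async function timed<T>(label: string, work: () => Promise<T>): Promise<T> {
  const startMs = Date.now() - origin;
  try {
    return await work();
  } finally {
    spans.push({ label, startMs, endMs: Date.now() - origin });
  }
}

// True when every span starts only after the previous one has ended,
// which is the signature of a waterfall. A single span is trivially true.
function looksSequential(recorded: Span[]): boolean {
  const ordered = [...recorded].sort((x, y) => x.startMs - y.startMs);
  return ordered.every((s, i) => i === 0 || s.startMs >= ordered[i - 1].endMs);
}
```

Wrapping each data call as `timed("orders", () => getOrders(userId))` during development and inspecting `spans` makes the end-to-end stacking visible in a way the source code never does.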

Fetch in parallel when requests do not depend on each other

The first fix addresses requests that have no reason to wait on each other. If a page needs the user's profile, their recent orders, and a list of recommendations, those three are independent: none of them needs the result of the others to begin. So they should all start at once and resolve together, and the page should wait only as long as the slowest single request rather than the sum of all three.

The way to do this is to initiate the requests eagerly, before you await them. Start all the fetches, then await them together, so they run concurrently.

[Figure: sequential await waterfall versus Promise.all parallel fetching in server components]

// Sequential: total time = A + B + C
const profile = await getProfile(userId);
const orders = await getOrders(userId);
const recs = await getRecommendations(userId);

// Parallel: total time = max(A, B, C)
const [profile, orders, recs] = await Promise.all([
  getProfile(userId),
  getOrders(userId),
  getRecommendations(userId),
]);

The second version kicks off all three at the same time and waits for all of them to finish, so the page pays for the slowest one instead of the total. For independent data, this is the single highest-impact change you can make, and it is the difference between a page that feels snappy and one that feels sluggish for no reason the user could name. It is also where a hidden N+1 query does the most damage: one fan-out inside a single fetch function makes that fetch the slowest of the group and undoes the parallelism you just won. That is why hunting down N+1 queries belongs in the same pass as parallelizing your fetches.

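Server components help here in ways worth knowing. When sibling components each issue their own fetch during the server render, the framework can have those requests in flight concurrently. It also memoizes identical fetch calls (GET requests with the same URL and options) within a single render pass, so if two components need the same data only one request actually goes out. Data access that does not go through fetch, such as a direct database query, does not get that deduplication for free; React's cache function exists to wrap those calls. That means you do not always have to manually hoist every fetch into a Promise.all, but you do have to avoid the structural nesting that forces sequencing, because the framework can only parallelize requests it is allowed to start at the same time.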
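To show what deduplication means in practice, here is a hand-rolled sketch of the idea. It is not the framework's implementation: `dedupe`, `loadProfile`, and `queryCount` are hypothetical names, and the memoizer is assumed to be created once per incoming request so results are never shared across users.

```typescript
// Identical calls share one in-flight promise instead of issuing a second request.
function dedupe<A extends string | number, R>(
  load: (arg: A) => Promise<R>,
): (arg: A) => Promise<R> {
  const inFlight = new Map<A, Promise<R>>();
  return (arg) => {
    let pending = inFlight.get(arg);
    if (!pending) {
      pending = load(arg);
      inFlight.set(arg, pending);
    }
    return pending;
  };
}

// Counts how many times the underlying "query" actually runs.
let queryCount = 0;

// Stand-in for a real database or API call.
async function loadProfile(userId: string): Promise<{ id: string }> {
  queryCount += 1;
  return { id: userId };
}

const getProfile = dedupe(loadProfile);
```

Two components that both call `getProfile("u1")` during the same render receive the same promise, and `loadProfile` runs once for that user. Neither component has to know the other exists, which is what lets each one keep its data fetching local.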

Sometimes requests genuinely are dependent: you need the user's account before you can fetch their account's settings. Those have to be sequential, and that is fine. The goal is not to parallelize everything. It is to parallelize everything that can be, and to make sure you are not accidentally sequencing things that are actually independent.
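A page often mixes both kinds of request. One way to handle that, sketched below, is to keep the dependent chain sequential inside its own function and run that function alongside the independent fetches. The data functions (`getAccount`, `getSettings`, `getOrders`) and their return shapes are made up for illustration.

```typescript
// Records the order in which simulated requests begin.
const started: string[] = [];

function fakeFetch<T>(label: string, value: T, ms: number): Promise<T> {
  started.push(label);
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Hypothetical data functions.
const getAccount = (userId: string) =>
  fakeFetch("account", { accountId: `acct-${userId}` }, 10);
const getSettings = (accountId: string) =>
  fakeFetch("settings", { accountId, theme: "dark" }, 10);
const getOrders = (userId: string) =>
  fakeFetch("orders", [`order-for-${userId}`], 10);

// Dependent chain: settings needs the account's id, so these stay sequential.
async function loadAccountWithSettings(userId: string) {
  const account = await getAccount(userId);
  const settings = await getSettings(account.accountId);
  return { account, settings };
}

async function loadPage(userId: string) {
  // The chain and the independent fetch start together.
  const [{ account, settings }, orders] = await Promise.all([
    loadAccountWithSettings(userId),
    getOrders(userId),
  ]);
  return { account, settings, orders };
}
```

Here the orders request starts before the account has resolved, and only the settings request waits. Passing both to Promise.all also means a rejection from either side is handled in one place, rather than leaving a started-but-not-yet-awaited promise to fail unobserved.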

Stream the page so the fast parts do not wait for the slow ones

The second fix addresses the case where one part of the page is slow and the rest is fast. Traditional server rendering waits for everything before sending anything, so a single slow query holds the entire page hostage, and the user stares at a blank screen until the slowest data on the page resolves. That is the wrong trade, because the fast content was ready and could have been shown.

Streaming with Suspense fixes this. You wrap the slow region of the page in a Suspense boundary, and the server sends the rest of the page immediately with a fallback (a skeleton) in place of the slow region. Then, when the slow data resolves, the server streams the real content in to replace the fallback. The user sees and can interact with the fast parts of the page right away, while the slow part fills in when it is ready, rather than waiting for the whole page to be complete.

The mechanism is simple to apply. Put the slow data fetch in its own component, wrap that component in a Suspense boundary with a skeleton fallback, and the rest of the page renders without waiting on it.

<Dashboard>
  <Header />                    {/* fast, renders immediately */}
  <Suspense fallback={<FeedSkeleton />}>
    <Feed />                    {/* slow, streams in when ready */}
  </Suspense>
</Dashboard>

The page no longer pays for the slow region up front. It shows everything else instantly and fills the feed in when the data arrives, which is the perceived-performance win of showing the shape of the content before all of it is ready. The same mechanism improves time to first byte, because the server can begin sending the response before the slowest data has resolved.
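The delivery order is easier to reason about with a simplified model. The sketch below is not how React implements streaming; `streamPage`, the `send` callback, and the `<template data-for>` markup are all invented for illustration. It only models the sequence: the shell goes out first, then each slow region as its data resolves.

```typescript
async function streamPage(
  shell: string,
  slowRegions: Record<string, Promise<string>>,
  send: (chunk: string) => void,
): Promise<void> {
  // First chunk: everything that is ready, with fallbacks where slow regions go.
  send(shell);

  // Later chunks: each slow region is sent as soon as its own data resolves,
  // independently of the others.
  await Promise.all(
    Object.entries(slowRegions).map(async ([id, region]) => {
      const html = await region;
      send(`<template data-for="${id}">${html}</template>`);
    }),
  );
}
```

Called with a shell plus a fast and a slow region, `send` receives the shell immediately, then the fast region, then the slow one. The user never waits on the slow region to see the rest.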

Where you put the boundary decides how fast it feels

The detail that separates a good streaming setup from a mediocre one is the placement of the Suspense boundaries, and it cuts both ways. Granular boundaries let independent parts of the page render and stream in parallel, each appearing as soon as its own data is ready. A boundary that is too coarse, wrapping a large region with several independent pieces inside it, recreates the waterfall, because the whole region is blocked until its slowest child resolves. One unresolved fetch inside a big boundary holds back everything else in that boundary, even the parts that were ready.

So the rule is to scope each boundary to a single unit of independent work. The slow feed gets its own boundary. The recommendations get theirs. The user's stats get theirs. Each streams in independently the moment its data is ready, and no part of the page waits on an unrelated part. Coarse boundaries undo the benefit you wrapped them for, which is why "where do the boundaries go" is the question that actually determines the felt speed.
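The cost of a coarse boundary can be worked out with simple arithmetic. The sketch below assumes the children inside a boundary fetch concurrently and that a boundary reveals nothing until all of its children are ready; `revealTimes` is a hypothetical helper and the durations are made-up numbers in milliseconds.

```typescript
type Durations = Record<string, number>;

// A boundary shows its content only when every child inside it has resolved,
// so each child's reveal time is the slowest duration in its boundary.
function revealTimes(
  boundaries: string[][],
  durations: Durations,
): Record<string, number> {
  const result: Record<string, number> = {};
  for (const children of boundaries) {
    const slowest = Math.max(...children.map((child) => durations[child]));
    for (const child of children) {
      result[child] = slowest;
    }
  }
  return result;
}

// Made-up durations in milliseconds.
const durations: Durations = { feed: 900, recommendations: 300, stats: 50 };

// One boundary around everything versus one boundary per region.
const coarse = revealTimes([["feed", "recommendations", "stats"]], durations);
const granular = revealTimes([["feed"], ["recommendations"], ["stats"]], durations);
```

With one boundary around all three regions, `coarse` reports 900 for every region, so the 50 ms stats sit behind a skeleton for 900 ms. With one boundary each, `granular` reports 900, 300, and 50: the same fetches, the same total work, and a page that starts filling in eighteen times sooner.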

Build it parallel from the start, not as a rescue

The throughline is that a fast server-rendered page is a structuring problem, not an optimization you bolt on later. Decide up front which data is independent and fetch it in parallel, which data is genuinely dependent and must be sequential, and which slow regions should stream behind their own Suspense boundary so the fast content never waits. Done from the start, the page is fast by construction. Done as a rescue after it ships slow, it is a refactor that touches the whole render tree.

This is the same discipline we apply to performance across the web applications and websites we build: the felt speed of a page is decided by what waits on what, and the biggest wins come from not making the user wait on work that had no reason to be sequential. Server components compound the win for the static parts of the page, because they ship no JavaScript of their own: the parallel-fetched HTML arrives without a component bundle to hydrate first. A page that streams its fast parts instantly and parallelizes its independent fetches feels quick even when some of its data is genuinely slow, because the user is reading and interacting while the slow part fills in, rather than watching a blank screen pay for the slowest query on the page. That felt speed is not cosmetic: it recovers, at the structural level, the revenue that every slow second costs you.

The short version

Server components make sequential fetching the path of least resistance, and sequential fetching is a waterfall the user feels as lag. Fetch independent data in parallel, with the requests started together so the page waits for the slowest one instead of the sum. Stream slow regions behind Suspense boundaries so the fast content shows immediately and the slow part fills in when ready. And scope those boundaries tightly, one per unit of independent work, because a boundary that is too coarse rebuilds the waterfall inside itself.

Structure the data fetching deliberately and the page is fast. Let the awaits fall where they may and the page is correct, slow, and slow for a reason no user will ever be able to explain to you.