DERKONLINE

Stop Brute Force and POST Floods With Nginx Rate Limit Zones

Layer connection limits and POST burst zones at the edge so abusive traffic dies in nginx before it ever touches your app.

Derrick S. K. Siawor · 8 min read

Your login endpoint does not care that a bot is trying ten thousand passwords a minute. Your application server dutifully spins up a worker for each attempt, hashes each password, queries the database, and returns "invalid credentials" ten thousand times, burning CPU and connection pool slots while a legitimate user waits in line behind the flood. The attacker is not even close to guessing the password. They are just keeping your server busy enough to hurt. The fix is not in your application. It is one layer up, at the edge, where nginx can drop the abuse before it ever reaches your code.

Nginx rate limiting is one of the highest-value, lowest-effort defenses you can deploy, and most teams either skip it or misconfigure it into uselessness. It belongs in the same baseline as the security headers every Next.js app should ship: cheap to add, expensive to omit. Done right, it means a brute-force attempt dies in nginx with a cheap 503, a POST flood gets throttled to a trickle, and your application server spends its resources on real users instead of absorbing junk. The cost is a few lines of configuration. The payoff is an attack surface that no longer scales for free.

The leaky bucket, in plain terms

Nginx rate limiting uses the leaky bucket algorithm. Picture a bucket with a small hole in the bottom. Requests pour in at the top. The hole drains them at a fixed rate. If requests arrive faster than the hole drains, the bucket fills. Once it overflows, extra requests are rejected. The drain rate is your steady-state limit, the bucket size is your burst tolerance, and the overflow is what gets the 503.
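To make the metaphor concrete, here is a minimal sketch of how its three parts map onto nginx directives. The zone name bucket_demo and the /demo location are hypothetical, and backend is the same upstream the later examples use. One detail worth knowing: nginx tracks the rate at millisecond granularity, so at 5r/s the hole drains one request every 200 ms rather than five at the top of each second.

```nginx
# in the http block
# Drain rate: 5r/s means one request leaves the bucket every 200 ms.
limit_req_zone $binary_remote_addr zone=bucket_demo:10m rate=5r/s;

# in the server block
location /demo {
    # Bucket size: up to 5 requests above the drain rate may wait in the bucket.
    # Without nodelay, they are released one every 200 ms.
    # Overflow: anything beyond those 5 is rejected (503 by default).
    limit_req zone=bucket_demo burst=5;
    proxy_pass http://backend;
}
```

In the simple case, ten requests arriving at once would see the first served, five held in the bucket and released over roughly a second, and the remaining four rejected.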

Figure: Nginx rate-limiting flow. The leaky-bucket rate, the burst allowance, and the connection cap together decide whether a request passes to the backend or is rejected with a 503.

This model is what makes nginx rate limiting feel fair instead of brittle. A real user who clicks a few things in quick succession produces a small burst that the bucket absorbs. A bot hammering an endpoint produces sustained overflow that gets rejected. The same configuration handles both correctly because it distinguishes a brief spike from a relentless stream.

Two zones for two different attacks

You need two kinds of limiting, because rapid-fire requests and slow connection-holding attacks are different problems.

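The first is request rate limiting, which controls how many requests per second a client can make. You define a shared-memory zone with limit_req_zone, keyed by the client's IP (using $binary_remote_addr, the binary form of the address, to save memory) and given a size and a rate, then apply it per location with limit_req. The connection limits covered below get their own zone from limit_conn_zone.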

# in the http block
limit_req_zone $binary_remote_addr zone=login:10m rate=5r/s;
limit_req_zone $binary_remote_addr zone=post_limit:10m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=conn_limit:10m;

The 10m is ten megabytes of shared memory, enough to track on the order of 160,000 unique IP addresses. The rate is the steady drain. Then you apply the zone where it matters, with a burst and the nodelay flag.

location /api/login {
    limit_req zone=login burst=5 nodelay;
    limit_conn conn_limit 10;
    proxy_pass http://backend;
}

The burst=5 lets a client exceed the rate by up to five queued requests, absorbing a legitimate spike. The nodelay flag tells nginx to serve those burst requests immediately rather than spacing them out at the configured rate. Without nodelay, burst requests are delayed to smooth the rate, which protects the backend but adds latency for bursty-but-legitimate traffic. With it, burst capacity is available instantly, the slots free up again at the steady rate, and only true overflow gets rejected, so a real user feels no lag while an abuser still hits the wall.

For a login endpoint, a tight rate plus a small burst means a brute-force tool gets its initial burst of attempts and is then held to the steady rate, with everything beyond it answered by a 503, while a human typing their password sees nothing unusual.

The endpoint itself should already give an attacker as little to work with as possible: generic responses that prevent user enumeration, and scrypt password hashing with timing-safe comparison behind them. The edge limit is the volumetric layer. The identity-aware layer underneath it is progressive lockout, which throttles credential stuffers per account after repeated failures for a specific email.
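If 5r/s is still too generous for a credential endpoint (it lets a single IP make up to 18,000 attempts an hour), nginx also accepts per-minute rates. The sketch below is an assumption, not a value tuned to your traffic: the zone name login_strict and the 10r/m figure are hypothetical, while conn_limit and backend are the zone and upstream defined earlier.

```nginx
# in the http block
# Hypothetical stricter zone: 10r/m admits one request every 6 seconds,
# versus one every 200 ms at 5r/s.
limit_req_zone $binary_remote_addr zone=login_strict:10m rate=10r/m;

# in the server block
location /api/login {
    # Up to 5 extra attempts are served immediately; after that, further
    # attempts are rejected until a slot frees, roughly one every 6 seconds.
    limit_req zone=login_strict burst=5 nodelay;
    limit_conn conn_limit 10;
    proxy_pass http://backend;
}
```

A human who mistypes a password a few times stays inside the burst; a tool that keeps going is held to about one attempt every six seconds.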

Connection limiting stops the slow attacks

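Request rate limiting handles fast attacks. It does little against an attacker who opens hundreds of connections and keeps them occupied slowly, exhausting your connection slots without ever sending many requests. That is what limit_conn is for: it caps the number of simultaneous connections from a single IP. One caveat matters here. Nginx only counts a connection against limit_conn once the whole request header has been read and the request is being processed, so the classic slow-loris pattern, which dribbles out headers and never finishes them, is cut off by short client_header_timeout and client_body_timeout values rather than by limit_conn. The combination is the point: request rate limiting against rapid-fire floods, connection limiting against clients hoarding in-flight requests, and tight timeouts against slow senders, layered together for coverage of all three.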

location / {
    limit_req zone=post_limit burst=20 nodelay;
    limit_conn conn_limit 20;
    proxy_pass http://backend;
}

A single IP can now make at most 10 requests per second with a burst of 20, and have at most 20 connections with requests in flight at once. An attacker who wants to overwhelm you has to spread across many IPs, which raises their cost and shrinks their effectiveness, while a normal user never comes close to either ceiling. While you are hardening this nginx, also check its header size limits: large auth cookies can trip a misconfigured field-size limit and quietly cause Safari-only 520 errors, failing real users in exactly the browsers you do not test.
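Note that the configuration above counts every request method against the post_limit zone, not just POSTs. A minimal sketch, assuming you want that zone to throttle only POST floods: a map gives POST requests the client address as their key and everything else an empty key, which nginx does not account. The $post_limit_key variable name and the timeout values are hypothetical; the server-level timeouts are a companion to limit_conn for clients that send their headers or bodies too slowly.

```nginx
# in the http block
# Hypothetical variant of post_limit: only POST requests get a key.
# Requests whose key is empty are not accounted by the zone,
# so GET, HEAD and other methods pass through it untouched.
map $request_method $post_limit_key {
    default  "";
    POST     $binary_remote_addr;
}

limit_req_zone  $post_limit_key     zone=post_limit:10m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=conn_limit:10m;

server {
    # Illustrative values, not recommendations: give up on clients that take
    # too long to send the request header, or that stall between reads of
    # the body (both default to 60s).
    client_header_timeout 10s;
    client_body_timeout   10s;

    location / {
        limit_req zone=post_limit burst=20 nodelay;
        limit_conn conn_limit 20;
        proxy_pass http://backend;
    }
}
```

GET traffic to the same location is then limited only by limit_conn, so keep a separate, looser zone if you also want a ceiling on reads.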

Why the edge is the right place for this

The instinct of many teams is to put rate limiting in the application, with a middleware that tracks attempts per user. That layer is genuinely valuable for things the application understands, like locking an account after repeated failed logins for a specific email. But it has a fatal limitation for volumetric defense: by the time the request reaches your application middleware, your application has already accepted the connection, parsed the request, and spent resources getting to the point where it can reject it. Under a real flood, that per-request cost is the whole problem.

Nginx rejects abusive traffic before it touches your application at all. The 503 is generated at the edge, cheaply, without a database query, without an application worker, without a connection to your backend. This is the difference between a flood that costs you almost nothing and a flood that knocks you over. Edge rate limiting and application-level lockout are not competitors. They are complementary layers, and a hardened stack runs both. The edge handles volume. The application handles identity-aware policy like progressive lockout windows.

This layered thinking is core to how we approach security audits and ongoing server administration. It is the same instinct as stopping leaks of your admin login URL in redirects and errors: make the cheap, high-value defense the default rather than the afterthought. An endpoint without edge rate limiting is an endpoint whose cost to abuse is paid entirely by you, and that is a bargain you are offering to every bot on the internet.

Tuning without locking out real users

The fear that keeps teams from deploying rate limiting is locking out legitimate users. That fear is reasonable and the answer is to tune from real traffic, not from guesses.

  • Start permissive and tighten. Set the rate generously above your observed normal traffic, watch the logs for rejections, and tighten only if abuse appears. A rate that never rejects a real user but catches a bot is the goal.
  • Differentiate by endpoint. Your login endpoint should be far stricter than your homepage. A static asset path may need no limit at all. Apply tight limits where abuse is likely and credentials are at stake, looser limits elsewhere.
  • Watch the 503s. Nginx logs every rejected request. If you see legitimate users getting 503s, your limits are too tight or your burst is too small. The logs tell you exactly where to adjust.
  • Account for shared IPs. Users behind a corporate NAT or a mobile carrier may share an IP, so several real users can look like one heavy user. Keep this in mind when setting per-IP limits on endpoints those users hit, and lean on application-level identity limits where IP-based limits would be unfair.
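The tuning advice above translates into a few directives. This is a sketch under assumptions: the addresses are documentation-range placeholders, the X-Forwarded-For header assumes a proxy that sets it, and the geo exemption is one hypothetical way to handle a known shared NAT. If nginx sits behind a CDN or load balancer, $binary_remote_addr is the proxy's address unless the realip module (compiled in with --with-http_realip_module; many packaged builds include it) restores the client's; without it, every user shares a single bucket.

```nginx
# in the http block (addresses below are hypothetical placeholders)

# Behind a CDN or load balancer: trust only the proxy's range and take the
# client address from the header it sets, so zones key on real clients.
set_real_ip_from  198.51.100.0/24;
real_ip_header    X-Forwarded-For;
real_ip_recursive on;

# Exempt one known shared egress IP, e.g. an office NAT, from the login zone.
geo $limit_exempt {
    default       0;
    203.0.113.10  1;
}

# An empty key means the request is not counted against the zone.
map $limit_exempt $login_key {
    0  $binary_remote_addr;
    1  "";
}

limit_req_zone $login_key zone=login:10m rate=5r/s;

# in the server block
location /api/login {
    limit_req zone=login burst=5 nodelay;
    # Start in observe-only mode (nginx 1.17.1+): excess requests are
    # counted and logged but not rejected. Remove once the limits look right.
    limit_req_dry_run on;
    # Log rejections at warn (delays are logged one level lower, at notice).
    limit_req_log_level warn;
    proxy_pass http://backend;
}
```

Exempting an address moves all of its protection onto the application layer, so reserve it for egress IPs you actually know and trust.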

The quiet win

Once this is in place, the change is invisible on a good day and decisive on a bad one. Your real users notice nothing, because they never approach the limits. Then one day a bot decides to brute-force your login or flood your contact form, and instead of your server straining and your real users seeing slowdowns, the attack hits a wall of 503s at the edge and your application never even hears about it. You find out from the logs, not from an outage, which is exactly why turning noisy server logs into alerts you actually trust pays off the moment an attack arrives.

That is the shape of good infrastructure: defenses that cost you nothing when things are calm and save you everything when they are not. A few lines in your nginx config, keyed by IP, layering request limits against connection limits, and the most common volumetric attacks stop being your application's problem and become nginx's, which is exactly where they should die.