
Kill Startup Races With docker compose up --wait and Healthchecks

Define real healthchecks and wait on them with up --wait so your app never boots before MySQL and Redis can answer.

Derrick S. K. Siawor · 6 min read

Here is a bug that wastes an afternoon and then hides. Your app boots, throws a connection error against MySQL, and crashes. You restart it, and this time it works fine. So you shrug and move on. What just happened is a startup race: the app tried to talk to the database before the database was ready to answer, and whether it crashes depends entirely on timing you do not control. It will pass on your fast laptop and fail on a slow CI runner, which makes it the kind of flake that erodes trust in the whole test suite. It is the dev-time cousin of the startup races a deploy script has to survive.

The root cause is a default that surprises people. By default, Docker Compose does not wait until a container is ready, only until it is running. A running MySQL container is not the same as a MySQL server that can serve queries. The container can be up while the database inside it is still initializing. The fix is to define real healthchecks and make your app wait on them, so "started" is never confused with "ready" again.

"Running" is not "ready"

When Compose starts your stack, it brings the containers up and considers its job done. For a stateless service that boots instantly, that is fine. For a relational database, a cache, or a message broker, it is not, because those services have their own internal startup sequence before they can accept connections. MySQL needs to initialize data directories and finish loading. Redis needs to come up and, if configured, load a snapshot. During that window the container is running and the service is not answering.

If your app is configured with a plain depends_on, Compose will start it as soon as the database container is running, not when the database is ready. The app fires its first query into a service that is not listening yet, and you get the connection refused that ruins your morning. The bug is intermittent precisely because the race outcome shifts with machine speed and load.
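For contrast, the racy configuration looks roughly like this. It is a minimal sketch that reuses the db service from this article; the short list form of depends_on waits only for the container to start, which is equivalent to condition: service_started.

```yaml
services:
  app:
    build: .
    depends_on:
      - db   # short form: waits for "started", not for "ready"

  db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: dev
    # No healthcheck here, so Compose has no readiness signal to wait on.
```

Nothing in this file is invalid, which is why the bug is easy to ship: Compose does exactly what it was asked to do.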

Define a healthcheck that means something

The first half of the fix is a healthcheck on each dependency that checks the actual readiness condition, not a proxy for it. For MySQL, the right test is mysqladmin ping, which confirms the server is up and responding to connections. For PostgreSQL it is pg_isready. For Redis it is a redis-cli ping that expects PONG.

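services:
  db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: dev   # dev-only credential
    healthcheck:
      # 127.0.0.1 forces a TCP connection. With "localhost", mysqladmin uses the
      # Unix socket, which can answer while the image's temporary
      # initialization server is still running on first boot.
      test: ["CMD", "mysqladmin", "ping", "-h", "127.0.0.1"]
      interval: 5s        # run the check every 5 seconds
      timeout: 5s         # a check that takes longer counts as a failure
      retries: 10         # consecutive failures before "unhealthy"
      start_period: 30s   # startup grace window; failures here are not counted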

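Each parameter earns its place. interval is how often the check runs. timeout is how long a single check may take before it counts as a failure. retries is how many consecutive failures flip the container to unhealthy. start_period is the grace window during startup in which failures do not count against the retry total, while a single success marks the container healthy right away. Calibrate it to the service's real boot time. Set start_period too short and failures begin counting while the database is still initializing, so a slow first boot can burn through the retries and leave the container marked unhealthy. With the values above, failures start counting after 30 seconds, and ten consecutive ones at 5-second intervals leave roughly 50 more seconds before the container is declared unhealthy.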
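The same pattern carries over to the other tools named above. The following is a sketch under stated assumptions, not a drop-in file: the image tags redis:7 and postgres:16, the service name pg, and the timing values are illustrative choices, while cache matches the service the app fragment in this article depends on.

```yaml
services:
  cache:
    image: redis:7             # assumed image tag
    healthcheck:
      # Passes only when the reply is PONG, not an error reply.
      test: ["CMD-SHELL", "redis-cli ping | grep -q PONG"]
      interval: 5s
      timeout: 3s
      retries: 10

  pg:                          # hypothetical service name
    image: postgres:16         # assumed image tag
    environment:
      POSTGRES_PASSWORD: dev   # dev-only credential
    healthcheck:
      # pg_isready exits 0 only when the server is accepting connections.
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 10
      start_period: 30s
```

Both commands ship inside their respective official images, so the checks do not depend on anything being installed separately.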

There is a classic trap here worth naming. People reach for curl in a healthcheck and the check fails with "curl: not found," because curl is not installed inside the image. Docker then marks a perfectly healthy container as unhealthy. The healthcheck command must use a tool that actually exists inside that container. Verify the command runs in the image before you trust it.
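One way around the missing-curl problem is to run the check with a tool the image is certain to contain. For an app image built on Node, that can be node itself. This is a hedged sketch: it assumes Node 18 or newer (for the built-in fetch), reuses port 3040 from this article's dev script, and the /api/health path is a hypothetical endpoint you would have to provide.

```yaml
services:
  app:
    build: .
    healthcheck:
      # Exit 0 on a 2xx response, 1 on any other status or a connection error.
      test: ["CMD", "node", "-e", "fetch('http://127.0.0.1:3040/api/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 20s   # assumed boot time; tune to the real app
```

Before relying on a check like this, run the same command by hand inside the running container to confirm the binary exists and the exit code is what you expect.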

Make the app wait on healthy, not on started

[Figure: Compose sequence. The healthcheck retries until the DB is healthy, then service_healthy gates app start.]

The second half is telling your app to wait for the dependency to be healthy, not merely running. Compose supports this with depends_on plus a condition.

  app:
    build: .
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_healthy

condition: service_healthy is the difference that closes the race. With it, Compose holds the app back until the database's healthcheck passes and the container reports healthy. The plain depends_on only guarantees the dependency has started; the service_healthy condition guarantees it is ready to answer. Every service gated this way, cache included, must define its own healthcheck, or Compose has no healthy state to wait for. That distinction between started and ready is the entire bug.

One command that waits: docker compose up --wait

When you are bootstrapping a dev environment with a single command, you want the same guarantee at the command line: do not return until everything is genuinely ready. That is what docker compose up -d --wait provides. (--wait already implies detached mode, so the -d is redundant but harmless.) It blocks until every service is running and every service with a healthcheck reports healthy, then returns. If a service turns unhealthy instead, the command exits with a non-zero status, so anything chained after it with && does not run. Combined with the healthchecks above, this means your dev bootstrap script can bring up the stack and know, when the command finishes, that the database is actually serving queries.

{
  "scripts": {
    "predev": "docker compose up -d --wait && pnpm migrate",
    "dev": "next dev -p 3040"
  }
}

Now the chain is honest end to end. up --wait does not return until MySQL answers a ping. Migrations run against a database that is ready. The dev server starts last, into a world that is fully prepared. Nobody has to remember to run anything, and the race that used to crash the first boot is closed by construction rather than by luck. This is the one-command bootstrap principle we hold on every project: the command that starts the app makes the app's world ready, with no manual prerequisite steps. It is the same automate-by-default instinct behind treating scripts as the single source of truth, so nobody fixes production by hand. One caution carries over: that same hot-reload dev runtime is where a stray DB pool can quietly exhaust MySQL connections if you are not careful.

A healthcheck is a contract worth getting right

A good healthcheck does more than gate startup. It becomes a continuous statement of whether the service is actually working, which is useful far beyond the first boot. A guideline that pays off: if you define an application-level healthcheck, make it verify the service's critical dependencies, the database at minimum, so a healthy report means the app can really do its job, not just that the process is alive. A process that is up but cannot reach its database is not healthy in any way that matters to a user.

That mindset is the bridge from local development to production. The same readiness signal that stops a dev startup race is the signal a deploy pipeline polls before it cuts traffic over to a new version, and the signal a load balancer uses to decide whether to send requests to an instance. We wire genuine health endpoints into everything we run, because "is it up" and "is it working" are different questions, and only the second one keeps users happy. That same health signal is what lets an agent safely take over the mechanics of the deploy pipeline while a human keeps the judgment. Getting that right is core to how we administer servers and to the web apps we build, where deploy and rollback safety is part of every release.

Stop guessing, start waiting

Startup races are not rare edge cases. They show up the moment an app depends on a stateful service, which is almost always. The fix is not a sleep 10 and a prayer, which breaks the day the database takes eleven seconds. The fix is to define what "ready" means with a healthcheck, wait on that condition with service_healthy and --wait, and verify the check command actually exists in the image. Do that, and the first boot is as reliable as the hundredth. The race is gone, and it stays gone for the life of the project.