How To Measure Your Engineering Team Without Vanity Metrics

Track DORA and SPACE outcomes that predict shipped value instead of commit counts that reward looking busy.

Derrick S. K. Siawor · June 25, 2025 · 7 min read


A founder who does not write code wants to know if the engineering team is doing well, so they reach for the numbers they can see. Commits per week. Lines of code. Story points closed. Hours logged. These feel like productivity because they go up when people are busy, and they are almost entirely useless, because they measure motion rather than progress and they quietly teach your best engineers to do the wrong things. The moment a number becomes a target, people optimize for the number, and a team measured on commit count learns to split one logical change into five superficial commits to make the dashboard look healthy.

The deeper problem is that effort is not value. Lines of code, commits, and hours measure how hard people appear to be working, not what they actually delivered to customers. A senior engineer who deletes two hundred lines and ships a fix that prevents a class of outages produced negative lines of code and enormous value. A junior who writes a thousand lines of code that has to be rewritten produced impressive activity and negative value. Any metric that would rank the junior above the senior is not measuring the thing you care about.

Why vanity metrics actively harm

It would be one thing if commit counts were merely useless. They are worse than useless because measuring them changes behavior in destructive directions. When you reward lines of code, you incentivize bloated code, the opposite of what good engineering produces. When you reward commit count, you incentivize splitting single logical tasks into multiple superficial commits to game the dashboard. When you reward hours, you incentivize presence over output and punish the engineer who solved the problem fast and went home.

These are not hypothetical risks. They are the predictable response of rational people to the incentives you set. Tell engineers the dashboard counts commits and you will get more commits, each one smaller and less meaningful, and you will have made your codebase worse while feeling like you gained insight. The vanity metric does not just fail to measure value; it converts your team's energy into producing the appearance of value, which is a real cost paid in real engineering time.

DORA: measuring the delivery system, not the people

The established starting point for measuring engineering effectiveness is the set of DORA metrics, which assess the health of your software delivery pipeline rather than the output of individuals. They focus on how quickly and reliably your team moves a change from commit to running in production. The core four, often extended with a fifth, are:

Deployment frequency. How often you successfully ship to production. Frequent, small deployments correlate with healthy teams, because they reflect a pipeline that lets people ship safely and often rather than batching risk into rare, scary releases.
Lead time for changes. How long it takes a change to go from committed to deployed. Short lead time means a smooth path from work to value; long lead time means friction somewhere in the system.
Change failure rate. What fraction of deployments cause a problem requiring a fix. This is the quality counterweight that keeps speed honest, because shipping fast while breaking things constantly is not productivity. It is also where a shared error budget gives founders a reliability target the whole company can agree on.
Time to restore service. How fast you recover when something does break. Failures are inevitable; what distinguishes a strong team is how quickly they get back to healthy, because every minute of recovery is a minute of downtime your business pays for.
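As a rough illustration, the four metrics above can be computed from a simple log of deployments. This is a minimal sketch, not a standard implementation: the `Deployment` record and its field names are hypothetical, medians are one reasonable choice of summary among several, and time to restore is measured here from the failing deployment rather than from when the incident was detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a problem requiring a fix?
    restored_at: Optional[datetime] = None   # when service was healthy again


def dora_summary(deployments: list[Deployment], period_days: int) -> dict:
    """Summarize the four core DORA metrics for one team over one period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: restore time runs from the failing deployment to recovery.
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_week": len(deployments) / (period_days / 7),
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up example: four deployments over a two-week period.
start = datetime(2025, 6, 2, 9, 0)
deploys = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=6)),
    Deployment(
        start + timedelta(days=5),
        start + timedelta(days=5, hours=2),
        failed=True,
        restored_at=start + timedelta(days=5, hours=3),
    ),
    Deployment(start + timedelta(days=9), start + timedelta(days=9, hours=8)),
]
summary = dora_summary(deploys, period_days=14)
print(summary)
```

Nothing in the record identifies who made the change. The summary describes the team's pipeline over a period, which is the point of the framing that follows.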

The crucial framing is that DORA measures the system, not the individual. It tells you how efficiently your team as a whole moves code from commit to deploy and how stable that flow is. It does not rank engineers, which is exactly why it resists gaming in the way commit counts do not. You cannot meaningfully fake a low change failure rate or a fast restore time by splitting commits.

SPACE: the human and collaborative dimension DORA misses

DORA tells you about the mechanical health of your delivery pipeline. It does not tell you whether that delivery is sustainable, whether your engineers are burning out, or whether the team collaborates well or just happens to be moving code. That is what the SPACE framework adds, across five dimensions: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow.

DORA and SPACE work better together than apart. DORA shows you how efficiently code moves from commit to deploy. SPACE shows you how sustainably and collaboratively that code gets written. A team with great DORA numbers and terrible SPACE numbers is shipping fast on a foundation of burnout, and its delivery numbers will look healthy right up until the team breaks. The same hidden fragility shows up when velocity is propped up by a single irreplaceable engineer, which is key person risk you have to engineer out before they leave. Measuring satisfaction and cognitive load alongside delivery throughput is how you catch the unsustainable version of "productive" before it costs you the team.
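One way to make that pairing concrete is to read delivery numbers and survey numbers side by side and name the combination. The sketch below is illustrative only: the `TeamSnapshot` fields, the 1-to-5 survey scale, and every threshold are assumptions chosen for the example, not values taken from DORA or SPACE.

```python
from dataclasses import dataclass


@dataclass
class TeamSnapshot:
    """One period's numbers for a team (hypothetical fields)."""
    deploys_per_week: float
    change_failure_rate: float   # fraction of deployments, 0.0 to 1.0
    satisfaction: float          # mean survey score, 1 (low) to 5 (high)
    cognitive_load: float        # mean survey score, 1 (low) to 5 (high)


def sustainability_read(
    s: TeamSnapshot,
    min_deploys_per_week: float = 1.0,   # assumed threshold
    max_failure_rate: float = 0.15,      # assumed threshold
    min_satisfaction: float = 3.5,       # assumed threshold
    max_load: float = 3.5,               # assumed threshold
) -> str:
    """Name the combination of delivery health and team health."""
    delivery_ok = (
        s.deploys_per_week >= min_deploys_per_week
        and s.change_failure_rate <= max_failure_rate
    )
    people_ok = s.satisfaction >= min_satisfaction and s.cognitive_load <= max_load
    if delivery_ok and people_ok:
        return "healthy"
    if delivery_ok:
        return "fast but unsustainable"
    if people_ok:
        return "sustainable but delivery needs work"
    return "needs attention on both"


# A team that ships often and rarely breaks things, but reports low
# satisfaction and high load.
team = TeamSnapshot(
    deploys_per_week=5.0,
    change_failure_rate=0.08,
    satisfaction=2.4,
    cognitive_load=4.3,
)
print(sustainability_read(team))
```

The useful part is not the thresholds, which any team would need to set for itself, but that the delivery numbers alone would have reported this team as healthy.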

The AI wrinkle nobody can ignore now

The 2025 DORA research offered a close look at how AI coding tools are changing these metrics, and the finding is worth sitting with: AI adoption improves throughput but increases delivery instability. Teams ship more, faster, with AI assistance, and they also break things more often. This matters for how you read your own numbers. A jump in deployment frequency that comes with a rising change failure rate is not unambiguous good news. It is the predictable signature of AI-accelerated output outrunning the team's ability to keep it stable.
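Reading the two numbers as a pair can be sketched as a comparison between two periods, for example the quarter before and the quarter after a team adopts AI tooling. The `Period` record, the function name, and both thresholds are hypothetical, and a real analysis would want more than two data points before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Delivery numbers for one period (hypothetical record shape)."""
    deploys_per_week: float
    change_failure_rate: float   # fraction of deployments, 0.0 to 1.0


def read_trend(
    before: Period,
    after: Period,
    min_speed_gain: float = 0.10,     # assumed: 10% more deployments counts as faster
    min_failure_rise: float = 0.02,   # assumed: 2 points more failures counts as less stable
) -> str:
    """Classify the shift between two periods using both metrics together."""
    if before.deploys_per_week <= 0:
        raise ValueError("the earlier period needs at least some deployments")
    speed_gain = (
        after.deploys_per_week - before.deploys_per_week
    ) / before.deploys_per_week
    failure_rise = after.change_failure_rate - before.change_failure_rate
    faster = speed_gain > min_speed_gain
    less_stable = failure_rise > min_failure_rise
    if faster and less_stable:
        return "faster but less stable"
    if faster:
        return "faster and stable"
    if less_stable:
        return "less stable without a speed gain"
    return "no clear shift"


# Made-up numbers: deployments nearly double while failures rise from 10% to 18%.
before = Period(deploys_per_week=4.0, change_failure_rate=0.10)
after = Period(deploys_per_week=7.0, change_failure_rate=0.18)
print(read_trend(before, after))
```

A dashboard that only plotted `deploys_per_week` would show the same 75 percent gain in both the "faster and stable" and "faster but less stable" cases.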

The lesson is that no single metric tells the story, and that is more true, not less, as AI reshapes how code gets written. Throughput up and stability down is a real pattern with a real cause, and only by watching both together do you see it. Watching deployment frequency alone, you would celebrate. Watching the pair, you know to invest in the testing and review that brings stability back up to match the new speed. When the AI doing the writing is also acting in production, measuring the ROI of automation projects is the discipline that keeps that throughput honest about whether it actually paid for itself.

What to actually do as a non-technical founder

You do not need to become an engineer to measure engineering well. You need to measure the right things and refuse to measure the wrong ones.

Stop tracking commits, lines of code, and hours as proxies for productivity. They are not, and tracking them makes your team worse. Start tracking the DORA metrics (deployment frequency, lead time, change failure rate, and time to restore) as a read on your delivery system's health, and pair them with a periodic, honest read on team satisfaction and load. Together these tell you whether your team ships reliably, recovers quickly, and can keep doing so without burning out.

Then connect those system metrics to business outcomes, because even DORA and SPACE describe how well the machine runs, not whether it is building the right thing. A team with excellent delivery metrics that is shipping features nobody uses is efficient at the wrong work. The full picture combines delivery health, team sustainability, and the question only product judgment answers: is the shipped work actually moving the business? Some of that drag is invisible until you learn how to tell if technical debt is quietly killing your roadmap, which no throughput number alone will surface.

This is the kind of thing worth working through with someone who has run engineering teams, which is part of what we bring to a consultation and to the way we structure our own delivery in our process. It also informs harder structural calls, like why your first software hire should not be a developer when what you actually need is delivery, not headcount. The point is not a dashboard for its own sake. The point is that you can tell, honestly, whether your engineering team is healthy and effective, without resorting to numbers that reward looking busy over being useful.

The metrics that predict shipped value are the ones that resist gaming, measure the system rather than the individual, and pair speed with stability and sustainability. The metrics that reward looking busy are the ones that go up when people split commits and pad line counts. Choose the first set, retire the second, and you will finally be measuring the thing you actually wanted to know.
