Why Your Lighthouse Score Lies and Field Data Tells the Truth

Lighthouse runs in a lab, and a standard page-load audit cannot measure INP because nobody interacts with the page. Wire up the web-vitals library to optimize what real visitors actually feel.

Derrick S. K. Siawor · November 28, 2025 · 7 min read


You ship a page, run Lighthouse, see a green 95, and call it fast. Two weeks later a customer in Kumasi on a three-year-old Android phone tells you the page hangs for a second every time they tap a button. Your dashboard says everything is perfect. Their experience says otherwise. Both are telling the truth, because they are measuring two different things.

Lighthouse is a lab test. It runs on a clean machine, on a throttled but predictable network, with no extensions, no warm cache, and no real interactions. Field data is what your actual visitors feel on their actual devices. When the two disagree, the field is right, because the field is the product. Here is how to stop guessing and start measuring what people experience.

The three metrics that matter, and where they come from

Google's Core Web Vitals are three numbers, each measured at the 75th percentile of your real traffic:

Largest Contentful Paint (LCP) measures loading. Good is 2.5 seconds or less, and making the LCP image load first every single time is usually the highest-leverage fix.
Interaction to Next Paint (INP) measures responsiveness. Good is 200 milliseconds or less. INP replaced First Input Delay as a Core Web Vital in March 2024, and cutting INP below 200ms before it tanks rankings is its own discipline.
Cumulative Layout Shift (CLS) measures visual stability. Good is 0.1 or less, and the usual way to get there is driving layout shift to zero on dynamic pages by reserving space before content arrives.
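
As a rough sketch of how those thresholds classify a single measurement, the snippet below buckets a value as good, needs-improvement, or poor. The "good" limits come from the list above. The "poor" limits (4 seconds, 500 milliseconds, 0.25) are Google's published upper thresholds. The names THRESHOLDS and rateMetric are made up for this illustration and are not part of any library.

```javascript
// Core Web Vitals thresholds as [good limit, poor limit].
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

// Hypothetical helper: bucket one measured value for a named metric.
function rateMetric(name, value) {
  const limits = THRESHOLDS[name];
  if (!limits) throw new Error(`Unknown metric: ${name}`);
  const [good, poor] = limits;
  if (value <= good) return 'good';
  if (value <= poor) return 'needs-improvement';
  return 'poor';
}

console.log(rateMetric('LCP', 2400)); // good
console.log(rateMetric('INP', 350));  // needs-improvement
console.log(rateMetric('CLS', 0.3));  // poor
```

In practice the web-vitals library computes this rating for you, so a helper like this is mostly useful for server-side checks or for reading raw numbers from another source.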

The critical detail that breaks most performance work: a standard lab run cannot measure INP. The metric needs a real human to tap, click, or type, and a Lighthouse page-load audit includes no interaction. Lighthouse reports Total Blocking Time instead, which correlates with INP but is not the same number. You can pass Total Blocking Time in the lab and still fail INP in the field, because your customer's mid-range phone takes 300 milliseconds to run the JavaScript your test machine ran in 80.

This is why a page can score 95 in Lighthouse and still fail Core Web Vitals in Google's Chrome User Experience Report (CrUX), the field dataset Google actually uses for search signals. The lab simulates one device on one network. CrUX is the aggregate of millions of real Chrome sessions on every device and connection imaginable.

What the lab hides

A synthetic run optimizes for a fiction. The lab device is faster than the median phone in most of the world. The lab network is steadier than real cellular. The lab session is a single cold page load with no interaction, so it never sees the INP problem that shows up when a user opens a dropdown, sorts a table, or submits a form.

According to the 2025 Web Almanac, 77 percent of mobile pages globally hit a good INP score. That leaves nearly a quarter failing on the exact metric a standard lab run never exercises. If you are optimizing only to the Lighthouse number, you are blind to the most common real-world failure.

Set up real user monitoring with the web-vitals library

Figure: the real user monitoring loop (web-vitals beacon, aggregate at p75, reproduce in lab, fix the source).

The fix is to instrument the page so it reports the actual numbers from actual sessions. Google ships a tiny official library for this, web-vitals, and it is designed to measure each metric the same way Chrome does when it reports to CrUX.

Install it and wire up the four functions. Each one fires a callback with the real measured value once the metric is known for that session.

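import { onLCP, onINP, onCLS, onTTFB } from 'web-vitals';

// Serialize one metric and queue it for delivery to the collection endpoint.
function send(metric) {
  const body = JSON.stringify({
    name: metric.name,     // 'LCP', 'INP', 'CLS', or 'TTFB'
    value: metric.value,   // milliseconds, except CLS, which is unitless
    rating: metric.rating, // 'good', 'needs-improvement', or 'poor'
    id: metric.id,         // unique per metric instance, used to deduplicate
    page: location.pathname,
  });
  // sendBeacon queues the request so it survives the page being unloaded.
  navigator.sendBeacon('/api/vitals', body);
}

// Each callback fires once the metric is ready to report for this page view.
onLCP(send);
onINP(send);
onCLS(send);
onTTFB(send);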

A few things make this production-grade rather than a toy:

Use navigator.sendBeacon, not a plain fetch. INP and CLS are only final when the page is being hidden or unloaded. A normal fetch without the keepalive option can be cancelled when the user navigates away. sendBeacon queues the payload and survives the unload.
Report the rating field. The library already buckets each value as good, needs-improvement, or poor against the official thresholds. Store it so you can compute the percentage of sessions passing without recomputing thresholds yourself.
Keep the id. It lets you deduplicate and attribute multiple INP candidates within one session to the same page view.
Attach context. Page path, device type, connection type via the Network Information API, and an anonymized session id turn a raw number into something you can act on. "LCP is 4 seconds" is noise. "LCP is 4 seconds on the product page for mobile users on 3G" is a ticket.
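
A minimal sketch of the "attach context" point might look like the following. The helper name buildPayload, the env parameter, the 768-pixel breakpoint, and the sessionId field are all assumptions made for illustration. The only browser API it leans on beyond the original snippet is navigator.connection.effectiveType from the Network Information API, which is not available in every browser, hence the fallback.

```javascript
// Hypothetical helper: merge a web-vitals metric with page context.
// `env` is passed in rather than read from globals so the function
// can be exercised outside a browser.
function buildPayload(metric, env) {
  // navigator.connection is missing in some browsers, so fall back
  // to 'unknown' instead of throwing.
  const connection = env.navigator.connection;
  return {
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    id: metric.id,
    page: env.location.pathname,
    connection: connection ? connection.effectiveType : 'unknown',
    // Crude device split by viewport width; 768px is an assumed breakpoint.
    device: env.innerWidth < 768 ? 'mobile' : 'desktop',
    // Anonymized id generated elsewhere (assumption).
    sessionId: env.sessionId,
  };
}

// In the browser this might be wired up as:
//   onINP((metric) => navigator.sendBeacon('/api/vitals', JSON.stringify(
//     buildPayload(metric, { navigator, location, innerWidth, sessionId }))));
```

Passing the environment in keeps the function easy to test. A real build would likely derive device type from something more robust than viewport width.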

On the server, your /api/vitals endpoint writes each beacon to a table or a time-series store. Aggregate at the 75th percentile per page per day, because that is the percentile Google uses and the one that reflects your worst-affected real users rather than your luckiest ones.
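
The aggregation step could be sketched roughly as below. It assumes each stored beacon has already been deduplicated by id and looks like { page, day, name, value }. That record shape and the function names are illustrative, not something the library prescribes. The percentile uses the simple nearest-rank method, and a real time-series store may interpolate differently.

```javascript
// Nearest-rank percentile: the smallest sample with at least p percent
// of all samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Group beacons by page, day, and metric name, then take p75 per group.
function aggregateP75(beacons) {
  const groups = new Map();
  for (const b of beacons) {
    const key = `${b.page}|${b.day}|${b.name}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(b.value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = { p75: percentile(values, 75), samples: values.length };
  }
  return result;
}

const sample = [120, 180, 210, 640].map((value) => ({
  page: '/checkout', day: '2025-11-28', name: 'INP', value,
}));
console.log(aggregateP75(sample));
// { '/checkout|2025-11-28|INP': { p75: 210, samples: 4 } }
```

Keeping the sample count next to the percentile matters: a p75 computed from a handful of sessions is too noisy to act on.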

Read the field data, then go fix the source

Once data flows in, the picture sharpens fast. You stop optimizing the homepage that already passes and start on the checkout page that quietly fails INP on mid-range Androids. You see that your LCP regression started the day a marketing team dropped a hero video. You watch CLS spike on pages that load a late ad slot.

The pattern we follow when we tune a client site is simple. Measure the field first to find the page and the metric that actually hurt real users. Then reproduce the worst case in the lab by throttling Lighthouse to a slow device and slow network, so the synthetic run finally resembles the customer's phone. Then fix the root cause: defer the third-party script eating the main thread, reserve space for the late-loading image, break up the long JavaScript task that blocks the next paint. Field data is also the only place you catch a slow leak: a session that degrades over time, the signature of memory leaks making a SPA slower by the hour, never shows up in a single cold lab run. This is the same field-first discipline we bring to every build on our web apps and the same instinct that drives LadenX, our AI site-reliability engineer, to read the real signal before it acts rather than trusting a summary.
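
The "reproduce the worst case in the lab" step can be approximated with a custom Lighthouse config. The sketch below uses Lighthouse's throttling settings (rttMs, throughputKbps, cpuSlowdownMultiplier); the specific numbers and the name slowPhoneConfig are assumptions chosen for illustration, not measured values. Tune them until the lab run resembles what the field data shows for the affected users.

```javascript
// Hypothetical Lighthouse config approximating a slow phone on a slow
// network. All numeric values here are illustrative assumptions.
const slowPhoneConfig = {
  extends: 'lighthouse:default',
  settings: {
    formFactor: 'mobile',
    throttlingMethod: 'simulate',
    throttling: {
      rttMs: 300,               // simulated round-trip time
      throughputKbps: 700,      // simulated download bandwidth
      cpuSlowdownMultiplier: 6, // CPU slowdown relative to the host machine
    },
  },
};
```

One way to use it is to export the object as the default export of a config file and pass that file to the Lighthouse CLI with the --config-path option.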

INP is usually the one you are failing silently

If you only have time to instrument one metric, instrument INP. It is the one the lab cannot see, the one that maps directly to the "this feels laggy" complaint, and the one most likely to be failing while your Lighthouse score stays green. Common causes, in the order we usually find them:

A single oversized event handler doing layout-thrashing work on every click.
A heavy third-party widget (chat, analytics, A/B testing) hogging the main thread.
Hydration in a JavaScript framework that ties up the main thread right when the user first interacts, which is exactly why shipping zero JS from server components pays off where the user taps.
Synchronous work in a requestAnimationFrame loop that never yields.

The fix is almost always to break long tasks into smaller ones, yield to the browser with scheduler.yield() where it is supported or setTimeout where it is not, and move non-urgent work off the click path. Third-party widgets are among the most common culprits, and the difference a slow second makes to revenue is why they are worth chasing.
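
As a hedged sketch of the "break long tasks and yield" idea, the helper below processes a list in chunks and hands control back between chunks. It uses scheduler.yield() when the runtime exposes it and falls back to setTimeout otherwise. The names yieldToMain and processInChunks and the default chunk size of 50 are assumptions for illustration.

```javascript
// Yield control back to the event loop. Uses scheduler.yield() where the
// runtime provides it and falls back to a zero-delay setTimeout otherwise.
function yieldToMain() {
  if (globalThis.scheduler && typeof globalThis.scheduler.yield === 'function') {
    return globalThis.scheduler.yield();
  }
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Hypothetical helper: run `work` over `items`, yielding after each chunk
// so that no single task monopolizes the main thread.
async function processInChunks(items, work, chunkSize = 50) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(work(item));
    }
    await yieldToMain(); // give the browser a chance to handle input and paint
  }
  return results;
}
```

A click handler could then update the visible state first and await processInChunks(rows, expensiveTransform) afterwards, where rows and expensiveTransform stand in for your own data and function. The right chunk size depends on how expensive each item is, so measure rather than guess.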

The numbers that prove you fixed it

Lab tools are still useful. They are fast, repeatable, and catch regressions in CI before they ship. The mistake is treating the lab number as the goal. The goal is the field number, the one Google ranks on and the one your customer feels. Turning the number you watch into a budget the whole team agrees to hold is what changes this from a one-time cleanup into an ongoing practice.
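
A budget can be as small as a table of limits and a check that runs against the latest field p75 values. The sketch below is one assumed shape for that check. BUDGET and findBudgetViolations are hypothetical names, and the limits simply reuse the "good" thresholds from earlier in this article.

```javascript
// Hypothetical budget: maximum acceptable field p75 per metric.
// LCP and INP are in milliseconds; CLS is unitless.
const BUDGET = { LCP: 2500, INP: 200, CLS: 0.1 };

// Return one entry per metric whose p75 exceeds its budget.
// Metrics with no reported p75 are skipped rather than flagged.
function findBudgetViolations(p75ByMetric, budget = BUDGET) {
  return Object.entries(budget)
    .filter(([name, limit]) => name in p75ByMetric && p75ByMetric[name] > limit)
    .map(([name, limit]) => ({ name, limit, actual: p75ByMetric[name] }));
}

const violations = findBudgetViolations({ LCP: 2300, INP: 350, CLS: 0.05 });
console.log(violations); // [ { name: 'INP', limit: 200, actual: 350 } ]
```

Where the result goes (a failed scheduled job, a chat alert, a dashboard) is a team decision. The point is that the limit is written down and checked automatically.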

Run both. Use the lab to catch regressions before deploy and to reproduce a known field problem under controlled conditions. Use real user monitoring to decide what to work on and to prove the work landed. When the field 75th-percentile INP drops from 350 milliseconds to 180, you have not improved a score. You have made the product feel faster for three quarters of the people using it, including the ones on the phones you would never test on yourself.

That is the difference between a green dashboard and a fast website. One is a synthetic claim. The other is what your users actually live through, and it is the only one worth optimizing.

If your Core Web Vitals look fine in the lab but customers still complain, the field data usually has the answer waiting. Talk to us and we will instrument it, read it, and fix the source.

