Build Audit Logs That Actually Help After a Breach
Record every privileged action, make logs append-only and tamper-evident, forward them off-host, and keep them queryable when minutes matter.
The worst time to discover your audit logs are useless is during a breach. An attacker has been in your system for a week. You need to know exactly what they touched, which accounts they used, what data they read, when they got in, and whether they are still there. You open your logs and find a thin record of HTTP requests, no record of the privileged actions that actually mattered, and no way to be sure the attacker did not edit the logs on their way through to cover their tracks.
At that moment, the logs you wished you had do not exist, and you cannot retroactively create them. Audit logging is one of those investments that returns nothing until the single worst day, and then it returns everything. The goal is a record that lets you reconstruct exactly who did what, when, and with what outcome, that you can trust to be unaltered, and that you can query fast while minutes are bleeding into a live incident.
Audit logs are not application logs
The first mistake is conflating the two. Application logs exist to help developers debug: they are verbose, noisy, full of stack traces and timing data, and they get rotated away after a few days. Audit logs exist to establish accountability: they record the security-relevant actions that someone might later need to prove, they are far less noisy, and they need to be kept far longer.
These have different audiences, different content, and different retention requirements, and they should be stored separately. When an auditor or an incident responder asks "who deleted this customer's data and when," they should not be grepping through gigabytes of debug output. They should be querying a clean, dedicated audit trail that records exactly the actions that matter.
Record the privileged actions, with enough to reconstruct
The question to answer for every audit entry is: if someone asks "who did this, when, and what happened," does the log contain the answer? Every privileged action should produce a record with, at minimum:
- Who. The authenticated actor: user id, and where relevant the API token id or service identity. Not just an IP, the actual identity.
- What. The action taken, as a stable machine-readable name: user.role_changed, record.deleted, payment.refunded, login.succeeded, login.failed.
- When. A precise, trustworthy timestamp in a consistent timezone.
- On what. The target resource: which record, which account, which setting.
- From where. Source IP and relevant request context.
- Outcome. Did it succeed or fail, and what changed. For a state change, the before and after values where they matter.
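As a minimal sketch of such a record, the fields above could be captured in one structured event. The class name AuditEvent, the field names, and the sample values here are illustrative assumptions, not a required schema:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    actor_id: str                     # who: authenticated user or service identity
    action: str                       # what: stable name, e.g. "user.role_changed"
    target: str                       # on what: the affected resource
    source_ip: str                    # from where
    outcome: str                      # "success" or "failure"
    token_id: Optional[str] = None    # API token id, where relevant
    before: Optional[dict] = None     # prior values for a state change
    after: Optional[dict] = None      # new values for a state change
    # when: always recorded in UTC so every service agrees on the timezone
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Sorted keys keep the serialized form stable across services.
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example: an admin promotes a member.
event = AuditEvent(
    actor_id="user:1042",
    action="user.role_changed",
    target="user:2210",
    source_ip="203.0.113.7",
    outcome="success",
    before={"role": "member"},
    after={"role": "admin"},
)
print(event.to_json())
```

One line of JSON per event is enough to answer "who did this, when, and what happened" without joining against anything else.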
The actions worth logging are the ones with consequences: authentication events including failures, authorization changes, any creation, modification, or deletion of significant data, privilege escalations, configuration changes, access to sensitive records, and money movements. A failed login matters as much as a successful one, because a string of failures is the signature of an attack in progress. It is also the signal that drives progressive lockout against credential stuffers.
One hard rule on content: never log secrets. Passwords, tokens, full payment details, raw sensitive personal data have no place in an audit log. Apply data minimization, the same instinct behind masking PII in public API responses by default. Log the fact that a password was changed, never the password. Log that a payment was refunded and its amount, not the card number. An audit log full of secrets is a breach waiting to be a worse breach.
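One way to enforce that rule is an allowlist: each action names the detail fields it may record, and everything else is dropped before the event is written. The sketch below assumes a hypothetical ALLOWED_FIELDS table and minimize helper; the action and field names are examples only.

```python
# Hypothetical allowlist: which detail fields each action is allowed to record.
ALLOWED_FIELDS = {
    "user.password_changed": {"user_id"},
    "payment.refunded": {"payment_id", "amount", "currency"},
}


def minimize(action: str, details: dict) -> dict:
    """Keep only the fields explicitly allowed for this action.

    Unknown actions record no details at all, so a new field or a new
    action cannot leak a secret into the audit log by accident.
    """
    allowed = ALLOWED_FIELDS.get(action, set())
    return {key: value for key, value in details.items() if key in allowed}


# The card number and CVV never reach the log; the amount does.
raw = {
    "payment_id": "pay_881",
    "amount": 1999,
    "currency": "EUR",
    "card_number": "4111111111111111",
    "cvv": "123",
}
print(minimize("payment.refunded", raw))
```

An allowlist fails safe: forgetting to register a field loses a detail, whereas forgetting to register a secret in a blocklist leaks it.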
A log you can edit is a log you cannot trust
This is the part that separates a real audit trail from a checkbox. If an attacker who compromises your system can also alter or delete the logs, then the logs prove nothing, because they may have been rewritten by the very person you are investigating. An audit log is only evidence if it is tamper-evident.
Two layers achieve this:
- Append-only, immutable storage. Audit records should land in a destination where they cannot be modified after the write. Write-once-read-many storage, or an append-only store where deletion is restricted, audited, and recoverable, ensures that once an event is recorded it stays recorded. The attacker can do bad things, but they cannot un-record having done them.
- Cryptographic tamper-evidence. Chain the records with hashes, so that each entry includes a hash of the previous one and any alteration to a past record breaks the chain and is detected the next time the chain is verified. This is a hash chain: you cannot quietly change entry 400 without invalidating every entry after it. Digital signatures on log batches add a matching guarantee about who produced the records. The point is not to make tampering impossible, it is to make tampering detectable, so a court, an auditor, or your own incident team can trust that what the log says is what actually happened.
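The hash-chain idea fits in a few lines. This is a sketch under stated assumptions, not a production design: the function names, the all-zero genesis value, and the in-memory list are all illustrative, and a real deployment would also anchor the latest hash somewhere off-host, because an attacker who can rewrite the entire chain can recompute every hash.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append a record, binding it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, record),
    })


def first_broken_index(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for index, entry in enumerate(chain):
        if entry["prev_hash"] != prev_hash:
            return index
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return index
        prev_hash = entry["hash"]
    return None


log: list = []
append(log, {"action": "login.succeeded", "actor_id": "user:1042"})
append(log, {"action": "user.role_changed", "actor_id": "user:1042"})
append(log, {"action": "record.deleted", "actor_id": "user:1042"})
print(first_broken_index(log))  # no tampering yet
```

Editing any past record changes its hash, so verification fails at that entry and everything after it is no longer trustworthy.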
The single most important operational practice supporting both: forward logs off the host immediately to a separate, centralized, secured log server. Logs that live only on the compromised machine are logs the attacker controls. The moment an audit event is generated, it should leave for a destination the attacker does not have access to. This is the difference between "the attacker erased the evidence" and "the attacker's actions were already replicated somewhere they could not reach."
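As a rough illustration of forwarding at the moment of the event, Python's standard library can send each audit record to a remote syslog collector. The helper name build_audit_forwarder and the collector address are assumptions. This sketch uses SysLogHandler's default UDP transport, which is neither encrypted nor guaranteed to deliver, so a real setup would more likely use TCP with TLS or a dedicated log shipper.

```python
import json
import logging
import logging.handlers


def build_audit_forwarder(host: str, port: int) -> logging.Logger:
    """Return a logger that sends each audit event to a remote collector.

    Uses SysLogHandler's default UDP transport. This is an illustrative
    choice only: UDP can drop messages and is not encrypted.
    """
    logger = logging.getLogger("audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit events out of the application log

    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(message)s"))

    logger.handlers.clear()
    logger.addHandler(handler)
    return logger


# Usage, with a placeholder collector address:
# audit = build_audit_forwarder("192.0.2.10", 514)
# audit.info(json.dumps({"action": "login.failed", "actor_id": "user:1042"}))
```

Setting propagate to False also keeps the audit stream separate from the application's debug logging.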
Query it when minutes matter
A tamper-evident audit trail that you cannot search is a vault with no door. During an incident, the value of the log is entirely in how fast you can answer questions: show me every action by this user in the last week, every failed login from this IP, every deletion of records in this table, every privilege change today.
So the storage has to be queryable along the dimensions you will actually search: by actor, by action type, by target resource, by time range, by source. This is where structured logging pays off, the same discipline that turns noisy server logs into alerts you trust: a clean machine-readable record is one you can query under pressure. Standardize the format so a query works the same across every service, and centralize the storage so one query covers your whole system rather than ten separate hunts. When the breach is live and you are trying to scope the damage, the difference between a five-minute query and a five-hour manual reconstruction is the difference between containing the incident and watching it spread while you dig.
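To make "queryable along the dimensions you will search" concrete, here is a small sketch using SQLite from the standard library. The table layout, index names, helper functions, and sample rows are all made up for illustration; a centralized log platform would fill the same role at scale. Timestamps are stored as UTC ISO 8601 strings in one fixed format, so plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit_events (
    id        INTEGER PRIMARY KEY,
    ts        TEXT NOT NULL,   -- UTC, ISO 8601, one fixed format
    actor_id  TEXT NOT NULL,
    action    TEXT NOT NULL,
    target    TEXT NOT NULL,
    source_ip TEXT NOT NULL,
    outcome   TEXT NOT NULL
);
-- One index per search dimension, each paired with time for range scans.
CREATE INDEX idx_audit_actor_ts  ON audit_events (actor_id, ts);
CREATE INDEX idx_audit_action_ts ON audit_events (action, ts);
CREATE INDEX idx_audit_target_ts ON audit_events (target, ts);
CREATE INDEX idx_audit_source_ts ON audit_events (source_ip, ts);
""")

# Made-up sample events.
conn.executemany(
    "INSERT INTO audit_events (ts, actor_id, action, target, source_ip, outcome) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2024-05-01T09:00:00Z", "user:1042", "login.failed", "session", "203.0.113.7", "failure"),
        ("2024-05-01T09:00:05Z", "user:1042", "login.failed", "session", "203.0.113.7", "failure"),
        ("2024-05-01T09:01:00Z", "user:1042", "login.succeeded", "session", "203.0.113.7", "success"),
        ("2024-05-01T09:02:00Z", "user:1042", "user.role_changed", "user:2210", "203.0.113.7", "success"),
        ("2024-05-01T10:00:00Z", "user:77", "record.deleted", "customer:9", "198.51.100.4", "success"),
    ],
)


def actions_by_actor(conn, actor_id, since):
    """Every action by one actor since a given time, oldest first."""
    return conn.execute(
        "SELECT ts, action, target, outcome FROM audit_events "
        "WHERE actor_id = ? AND ts >= ? ORDER BY ts",
        (actor_id, since),
    ).fetchall()


def failed_logins_from(conn, source_ip, since):
    """Every failed login from one source address since a given time."""
    return conn.execute(
        "SELECT ts, actor_id FROM audit_events "
        "WHERE action = 'login.failed' AND source_ip = ? AND ts >= ? ORDER BY ts",
        (source_ip, since),
    ).fetchall()


for row in actions_by_actor(conn, "user:1042", "2024-05-01T00:00:00Z"):
    print(row)
```

Each incident question from the previous paragraph maps to one parameterized query, which is the property you want when minutes matter.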
Build it in, do not bolt it on
Audit logging done well is woven through the application, because the application is where the privileged actions happen and where the identity, the target, and the outcome are all known. You cannot reconstruct a clean audit trail from web-server access logs after the fact, because the access log does not know that a particular POST request changed a user's role from member to admin. The application knows. So the application has to write the record, at the moment of the action, with the full context.
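A sketch of what "the application writes the record" can look like: the function that changes the role also emits the audit event, because it is the only place that knows the actor, the target, and the before and after values. The USERS store, the AUDIT_SINK list, and the function signature are stand-ins invented for this example. In a real system the sink would be the append-only, off-host destination described earlier.

```python
from datetime import datetime, timezone

USERS = {"user:2210": {"role": "member"}}  # stand-in for the real user store
AUDIT_SINK = []                            # stand-in for the real audit destination


def change_role(actor_id: str, target_id: str, new_role: str, source_ip: str) -> None:
    """Change a user's role and record the action where the context is known."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor_id": actor_id,
        "action": "user.role_changed",
        "target": target_id,
        "source_ip": source_ip,
    }
    user = USERS.get(target_id)
    if user is None:
        # Failed attempts are audit events too.
        AUDIT_SINK.append({**event, "outcome": "failure", "reason": "target_not_found"})
        raise KeyError(target_id)

    before = user["role"]
    user["role"] = new_role
    AUDIT_SINK.append({
        **event,
        "outcome": "success",
        "before": {"role": before},
        "after": {"role": new_role},
    })


change_role("user:1042", "user:2210", "admin", "203.0.113.7")
print(AUDIT_SINK[-1])
```

A web-server access log would show only a POST request here; the member-to-admin transition exists only in the record the application wrote.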
This is the standard we hold on every system we build and every server we run under server administration: the privileged actions are logged, the logs are append-only and forwarded off-host, and they are queryable the way an incident responder actually searches. Having that trail ready is a core part of what a small team needs before its first security incident. It is exactly the gap our security audits check for, because plenty of systems that look secure on the surface have no usable audit trail underneath, and that gap is invisible right up until the day it is the only thing that matters.
Leaving this work undone is one of the quiet ways skipping security early really costs a startup: the bill comes due on the one day you cannot pay it with hindsight. You will not appreciate a good audit log until the worst day. But on that day, it is the entire difference between knowing what happened and guessing. Build the record now, make it impossible to quietly alter, and keep it where you can search it fast. Then when someone asks "who did this, and when," you will have an answer instead of a shrug.






