Make Agent Tool Calls Idempotent Before a Double Charge

Design retries, idempotency keys, and side-effect guards so a retrying AI agent never fires the same action twice.

Derrick S. K. SiaworMay 27, 20268 min read

Macro of gold circuit-board traces on a dark green printed circuit board — Photo · Tima Miroshnichenko / Pexels

Give an AI agent a tool that charges a customer's card, and you have introduced a failure mode that ordinary code mostly avoids: the same action firing twice. Agents retry. They retry on a timeout, on a transient error, on a model deciding the first attempt did not work. Network calls between the agent, your API, and the payment gateway can succeed on the server while the response gets lost on the way back, and the agent, seeing no confirmation, does the reasonable thing and tries again. If your charge endpoint processes that second request, the customer pays twice.

This is not a new problem. Payment systems have dealt with retry-driven double charges for years, and the solution is well established. What is new is that agents make retries far more frequent and far less predictable than a human clicking a button, so the discipline that used to be optional for an internal tool is now mandatory for anything an agent can call. Here is how to make tool calls safe to repeat.

Why retries are the default, not the exception

A human who clicks "pay" and sees a spinner waits. An agent operating a tool does not have a spinner; it has a request and a response, and when the response is slow or missing, retrying is built into how reliable systems behave. The trap is the gap between "the action happened" and "the caller knows the action happened."

Consider the sequence. The agent calls your charge tool. Your server receives it, charges the card successfully, and starts sending back the confirmation. Somewhere on the return trip the connection drops. The agent never sees the success, concludes the call failed, and retries. Your server, with no memory that it already did this, charges the card again. Two charges, one intended payment, and an angry customer.

The same shape applies to any side effect an agent can trigger: sending an email, creating a record, kicking off a shipment, posting to an external system. Anything that changes state in the world is something a retry can do twice, and the more you orchestrate multiple agents handing work off to each other, the more chances a single intended action has to fire more than once. The fix has to live on the server, because the agent genuinely cannot tell a lost-response success from a real failure, and you cannot ask it to.

Idempotency keys: the core mechanism

An idempotent operation is one you can perform many times and get the same result as performing it once. The standard way to make a side-effecting endpoint idempotent is the idempotency key.

The key is a unique identifier generated by the caller and attached to the request, the same key on the original and on every retry of that specific operation. The server uses it as a deduplication handle. The pattern is precise:

The request arrives carrying an idempotency key.
Before doing any work, the server checks whether it has already processed this key.
If it has, it returns the stored result from the first time, without redoing the work.
If it has not, it records the key, does the work, stores the result against the key, and returns it.

So the first charge goes through, the result is saved under the key, and when the agent retries with the same key, the server recognizes it, skips the charge, and replays the original successful response. The agent gets its confirmation, the customer pays once. The key turns a dangerous retry into a harmless one.

For an agent, the natural place to generate the key is at the point the agent decides to take an action, so that all retries of that one decision share a key while a genuinely new action gets a fresh one. The distinction the server needs is "is this the same intended action, or a different one," and the key carries exactly that.

The race nobody handles, and how to close it

The naive version of the pattern has a hole. "Check if the key exists, and if not, process it" is two steps, and two requests with the same key can both pass the check before either one records it, especially under the rapid-fire retries an agent can produce. Both proceed, both charge. You have reintroduced the bug at a smaller scale.

The fix is a lock. When a request with a new key arrives, acquire a lock on that key before processing, so a concurrent retry with the same key blocks instead of racing ahead. Process the work, store the result, release the lock. The second request, having waited on the lock, now finds the key already recorded and returns the stored result. A fast store like Redis is the common home for this, holding the key, its lock, and the cached response together. The states a key moves through only stay reliable if the tool's own output is well-formed, which is one more reason to force LLM output into schemas your code can actually trust before it ever reaches the side-effecting call.

Idempotency key flow with a lock: completed keys replay the stored result so an agent retry never double charges

The states a key moves through are worth being explicit about: in-progress, completed, and failed. An in-progress key means a retry should wait or be told to wait, not start a parallel attempt. A completed key returns the cached success. A failed key, where the original genuinely errored before doing anything, can be retried fresh. Modeling these states is what keeps a retry during processing from doing damage.

Defense in depth: pass the key downstream

Even with your endpoint locked down, you are calling a payment gateway, and the same lost-response problem can happen between your server and theirs. The good news is that the major gateways solved this on their side too. Stripe, Adyen, and PayPal all accept idempotency keys of their own. The webhook coming back the other way needs the same care, since verifying payment webhooks before they move money is what stops a replayed callback from confirming a charge twice.

So pass it through. When your charge handler calls the gateway, attach an idempotency key to that downstream call as well. Now even if two of your own servers somehow both send the charge, the gateway deduplicates it and charges once. You have protection at your application layer and at the gateway layer, and a failure that slips past one is caught by the other. This dual-layer approach is what production payment flows actually rely on, and it is cheap to add because the gateway does the hard part.

Guard the side effects, not just the charges

Money is the obvious case, but an agent's tool belt is full of actions that should fire once. The same thinking applies to all of them, with a useful refinement: separate the operations that are naturally safe to repeat from the ones that are not.

Reading data is idempotent by nature; a GET can run a thousand times and change nothing, so those tools need no special handling. The care goes to the writes and the external effects. An email-sending tool keyed by the message it is sending so a retry does not send three copies. A record-creation tool that checks whether the record already exists for this key before inserting. A fulfillment trigger that records it has already shipped this order. The principle is uniform: any tool that changes state outside the agent gets an idempotency handle, and the server enforces exactly-once against it.

This is the same discipline that underpins how we build autonomous systems that take real action. LadenX, the AI site-reliability engineer we built, classifies every command and refuses destructive ones without a human signing off, precisely because an agent acting on production needs guardrails that make a repeated or mistaken action safe rather than catastrophic. That refusal-and-approval gate is the subject of giving autonomous fixes guardrails before they touch production, and when something does go wrong, seeing exactly what your agent did is how you reconstruct whether a tool fired once or twice. The idempotency key is one half of that safety; the refusal-and-approval gate is the other. Designing those guarantees into the tools an agent can call, rather than hoping the agent never retries at the wrong moment, is the core of building AI systems and automation you can actually trust with side effects.

The rule for any agent-callable tool

Before you expose a tool to an agent, ask one question: what happens if this gets called twice with the same intent? If the answer is "the same thing happens twice and that is fine," you have a read or a naturally idempotent write, and you are done. If the answer is "something bad happens twice," you need an idempotency key, a lock to close the race, and ideally the key passed downstream to any external service the tool touches.

Agents will retry. That is a feature of reliable systems, not a bug to suppress. The work is making sure that when they do, the worst case is a duplicate request that gets quietly deduplicated, not a duplicate charge that reaches a customer. Build the exactly-once guarantee into the tool, and the retry stops being a liability and becomes exactly what it is supposed to be: a harmless second attempt at an action that already succeeded.

ai automation idempotency api

All of the Journal