DERKONLINE

Price Your AI Feature So Token Costs Never Eat Your Margin

Usage, seat, and hybrid pricing models plus product-side cost controls that keep an AI feature profitable as token costs swing.

Derrick S. K. Siawor · 8 min read

Most SaaS founders learned pricing in a world where the cost of serving one more customer was close to zero. You wrote the software once, and the thousandth user cost you a rounding error in compute. That assumption is baked into per-seat pricing, and per-seat pricing is why classic SaaS runs at 80 to 90 percent gross margins. Then you bolt an AI feature onto your product, and the floor shifts under you.

Every AI query now incurs a real, variable cost. The model charges you per token, and a power user can run that bill up far faster than your flat monthly fee accounts for. AI SaaS companies are routinely seeing 50 to 60 percent gross margins where pure software saw 80 to 90. If you price the AI feature the way you priced the rest of the product, your best customers, the ones who use it most, become the ones who cost you money. This is a pricing problem, not an engineering one, and it is solvable, but only if you stop pretending the marginal cost is zero. It pairs with the engineering side of the same fight: cutting your LLM bill in half without touching answer quality.

Why per-seat alone breaks for AI

A seat is a proxy for value that worked because seats did not consume much. With AI, the seat and the cost decouple. Two users on the same plan can differ tenfold in token spend. One drafts a few emails a week. The other runs your AI feature in a loop all day. Under a flat per-seat price, you charge them the same and eat the difference on the heavy one.
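To put hypothetical numbers on that gap, here is a minimal Python sketch. The seat price, the token rate, and both usage figures are illustrative assumptions, not data from any vendor. The point is only that a tenfold difference in tokens can take one seat from a healthy margin to none.

```python
def seat_margin(seat_price: float, tokens_used: int, cost_per_million_tokens: float) -> float:
    """Gross margin, as a fraction of revenue, on one seat for one month."""
    token_cost = tokens_used * cost_per_million_tokens / 1_000_000
    return (seat_price - token_cost) / seat_price


# Hypothetical figures, chosen only to illustrate the tenfold gap.
SEAT_PRICE = 30.0          # USD per seat per month
COST_PER_MILLION = 10.0    # blended USD per million tokens

light = seat_margin(SEAT_PRICE, 300_000, COST_PER_MILLION)    # drafts a few emails a week
heavy = seat_margin(SEAT_PRICE, 3_000_000, COST_PER_MILLION)  # runs the feature all day

print(f"light user margin: {light:.0%}")
print(f"heavy user margin: {heavy:.0%}")
```

Same plan, same price, and the heavy seat contributes nothing once its token bill is paid.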

The market has already moved on this. Hybrid pricing is now the fastest-growing model in SaaS, and a Bain analysis of more than thirty established vendors found that 65 percent of them had layered AI usage charges on top of a base seat structure. The reason is simple: a flat fee cannot track a variable cost, and a variable cost that you cannot track is a margin leak that grows exactly as fast as your most engaged customers do.

The three models, and what each protects

There are three honest ways to price an AI feature, and the right answer is usually a blend. The same instinct that tells you to price your software product before you have customers applies here: you anchor on value, not on cost, but cost sets the floor you cannot go below.

Pure usage-based

Charge per unit of consumption: per token, per generation, per run, per outcome. This aligns your revenue perfectly with your cost, so a heavy user pays for the load they create and your margin holds no matter how the usage distribution skews. The downside is that customers hate unpredictable bills. A line item that swings month to month makes budgeting impossible and makes your invoice the thing finance flags. Pure usage maximizes margin protection and minimizes purchase confidence.

Per-seat (or flat subscription)

Predictable for the customer, easy to forecast, familiar to buy. The downside is the one above: it cannot absorb variable cost. Per-seat works for an AI feature only when usage is naturally capped or so cheap per action that even your heaviest user cannot move your margin. For anything genuinely token-hungry, per-seat alone is a bet that your customers stay light, and the customers worth keeping are the ones who do not.

Hybrid: a base plus usage

This is where the market landed, and for good reason. A predictable base (a seat or a subscription) covers your fixed costs and gives the customer a number they can plan around. A usage component scales revenue as consumption grows, which protects your margin on the heavy users. The customer starts small on the base and grows into the usage tier as they get value, so the bill rises in step with the benefit rather than ambushing them.
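A hybrid bill is simple to compute. The sketch below assumes one common shape: a base fee that includes an allowance of actions, plus a per-action overage. The plan fields, the allowance, and the prices are hypothetical, and amounts are kept in integer cents to avoid rounding surprises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridPlan:
    base_fee_cents: int            # the predictable number the customer plans around
    included_actions: int          # usage covered by the base (an assumed plan shape)
    overage_cents_per_action: int  # the usage component that scales with consumption


def monthly_bill_cents(plan: HybridPlan, actions_used: int) -> int:
    """Base fee plus overage for any actions beyond the included allowance."""
    if actions_used < 0:
        raise ValueError("actions_used cannot be negative")
    overage_actions = max(0, actions_used - plan.included_actions)
    return plan.base_fee_cents + overage_actions * plan.overage_cents_per_action


# Hypothetical plan: $49 base, 1,000 actions included, 5 cents per extra action.
plan = HybridPlan(base_fee_cents=4_900, included_actions=1_000, overage_cents_per_action=5)

print(monthly_bill_cents(plan, 400))    # light month: base only
print(monthly_bill_cents(plan, 1_500))  # heavy month: base plus 500 overage actions
```

The light customer pays the base and nothing else. The heavy customer pays more only for the extra load, which is the behavior a flat seat price cannot give you.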

The cleanest version of this follows one rule: pick one metric the customer understands and one internal metric that protects your margin. The customer sees something legible, like "credits" or "AI actions" or "documents processed." Behind that, you track tokens, because tokens are what you actually pay for. The customer-facing unit is for trust. The internal unit is for survival. They do not have to be the same number, and they usually should not be. How you present that customer-facing unit is its own discipline, which is where a pricing page that makes the right plan feel obvious earns its keep.
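One way to keep the two metrics side by side is a small ledger that records what the customer was charged in credits and what the action actually consumed in tokens. This is a sketch under assumptions: the class name, the one-credit-per-action convention, and the per-million token rates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UsageLedger:
    """Tracks the customer-facing unit and the internal cost unit together."""
    credits_charged: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, credits: int, input_tokens: int, output_tokens: int) -> None:
        self.credits_charged += credits
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def token_cost_usd(self, input_rate: float, output_rate: float) -> float:
        """Total model cost. Rates are USD per million tokens."""
        return (self.input_tokens * input_rate + self.output_tokens * output_rate) / 1_000_000

    def cost_per_credit_usd(self, input_rate: float, output_rate: float) -> float:
        """What one customer-facing credit really costs you, on average."""
        if self.credits_charged == 0:
            return 0.0
        return self.token_cost_usd(input_rate, output_rate) / self.credits_charged


# Two actions, each billed as one credit, with very different token footprints.
ledger = UsageLedger()
ledger.record(credits=1, input_tokens=2_000, output_tokens=500)
ledger.record(credits=1, input_tokens=8_000, output_tokens=1_000)

# Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
print(ledger.cost_per_credit_usd(input_rate=3.0, output_rate=15.0))
```

The customer only ever sees credits. You watch the cost per credit, and if it drifts toward the price per credit, you know before the invoice does.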

Build the financial model before you build the pricing page

You cannot price an AI feature you have not costed. Before you publish a number, model the unit economics:

  • Cost per action. How many tokens does a typical run of your feature consume, input and output, and what does that cost at your model's rate? Measure the real distribution, not the happy-path average, because the tail is where the margin dies.
  • A sustainable markup. Your price has to cover the token cost plus a margin that survives the heavy users, not just the median one.
  • Sensitivity to model price changes. Token prices move. Sometimes they fall, which helps you, and sometimes a model you depend on raises rates or you have to switch to a pricier one for quality. Your pricing should have enough headroom that a swing in model cost does not erase your margin overnight. If a 30 percent rise in token price turns your margin negative, you priced too thin.
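Those three checks fit in a few lines of Python. The sketch below is a rough model, not a finance tool: the token rates, the sample of runs, and the price per action are made-up numbers, and the percentile uses the simple nearest-rank method. It compares the median run with the 90th-percentile run, then reruns the margin with token prices 30 percent higher.

```python
import math

# Hypothetical model rates, USD per million tokens.
INPUT_RATE = 3.0
OUTPUT_RATE = 15.0


def cost_per_action(input_tokens: int, output_tokens: int) -> float:
    """Token cost of one run of the feature, in USD."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def gross_margin(price: float, cost: float) -> float:
    return (price - cost) / price


def margin_after_cost_rise(price: float, cost: float, rise: float) -> float:
    """Margin if token prices rise by `rise` (0.30 means 30 percent)."""
    return gross_margin(price, cost * (1 + rise))


# Made-up sample of (input_tokens, output_tokens) per run, with a heavy tail.
runs = [
    (1_000, 200), (1_200, 300), (1_500, 300), (2_000, 400), (2_000, 500),
    (3_000, 600), (4_000, 800), (6_000, 1_000), (20_000, 2_000), (60_000, 4_000),
]
costs = [cost_per_action(i, o) for i, o in runs]

PRICE_PER_ACTION = 0.10  # hypothetical customer price, USD

for label, pct in (("median", 50), ("p90", 90)):
    cost = percentile(costs, pct)
    now = gross_margin(PRICE_PER_ACTION, cost)
    stressed = margin_after_cost_rise(PRICE_PER_ACTION, cost, 0.30)
    print(f"{label}: cost ${cost:.4f}, margin {now:.0%}, after +30% token price {stressed:.0%}")
```

With these numbers the median run looks very profitable, while the 90th-percentile run barely breaks even and goes negative under the 30 percent stress. That is the "priced too thin" case, visible before launch instead of after.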

The teams that get burned are the ones that set a flat AI add-on price by gut feel, watch adoption climb, and only then discover that adoption is the thing eating their margin. Knowing the cost per action and the shape of the usage distribution before launch is what separates a feature that funds itself from one that quietly subsidizes your heaviest users.

Protect the margin in the product, not just the price

Pricing is one lever. Engineering is the other, and the two have to be designed together. The cost of an AI feature is not fixed by the model alone; it is also shaped by how you call the model.

  • Right-size the model. Not every action needs your most expensive model. Route simple tasks to a cheaper, faster model and reserve the premium one for the work that genuinely needs it. This alone can move your cost per action by a large factor. Knowing when fine-tuning beats prompting and when it just burns money is part of that sizing. Pushed far enough, the cheapest model is one you own: a small local classifier can outperform a frontier model on your task at a fraction of the per-call cost.
  • Cap and meter at the boundary. Enforce the usage limits your pricing promises, gracefully, in the product. A user on a 1,000-action plan hits a clear cap and an upgrade path, not a silent overage that lands on your bill.
  • Cache and dedupe. If the same input produces the same output, do not pay to generate it twice. Static, app-owned AI content should be generated once and served, not synthesized per request.
  • Bound the inputs. An unbounded prompt is an unbounded cost. Cap the size of what you send to the model so one pathological input cannot run up a huge bill.
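The four controls above can live in one thin wrapper around the model call. The sketch below is illustrative only: `MeteredAI`, `QuotaExceeded`, and the character limit are hypothetical names and values, the two models are passed in as plain callables rather than any real vendor SDK, and character truncation is a crude stand-in for a proper token limit. It also assumes that cache hits do not count against the user's cap, which is a product decision you may make differently.

```python
import hashlib
from typing import Callable, Dict

MAX_INPUT_CHARS = 8_000  # hypothetical bound; a real system would cap tokens


class QuotaExceeded(Exception):
    """Raised when a user reaches the plan cap, so the UI can show an upgrade path."""


class MeteredAI:
    def __init__(
        self,
        cheap_model: Callable[[str], str],
        premium_model: Callable[[str], str],
        monthly_cap: int,
    ) -> None:
        self._cheap = cheap_model
        self._premium = premium_model
        self._cap = monthly_cap
        self._used: Dict[str, int] = {}   # actions consumed per user this period
        self._cache: Dict[str, str] = {}  # output keyed by hash of (tier, input)

    def run(self, user_id: str, prompt: str, needs_premium: bool = False) -> str:
        # Bound the inputs: one pathological prompt cannot run up the bill.
        bounded = prompt[:MAX_INPUT_CHARS]

        # Cache and dedupe: identical input on the same tier is not generated twice.
        key = hashlib.sha256(f"{needs_premium}:{bounded}".encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]

        # Cap and meter at the boundary: fail clearly instead of absorbing overage.
        used = self._used.get(user_id, 0)
        if used >= self._cap:
            raise QuotaExceeded(f"{user_id} has used all {self._cap} actions")

        # Right-size the model: premium only when the task genuinely needs it.
        model = self._premium if needs_premium else self._cheap
        result = model(bounded)

        self._used[user_id] = used + 1
        self._cache[key] = result
        return result


# Usage with stand-in models that just report what they received.
ai = MeteredAI(
    cheap_model=lambda p: f"cheap:{len(p)}",
    premium_model=lambda p: f"premium:{len(p)}",
    monthly_cap=2,
)
print(ai.run("user-1", "summarize this ticket"))
print(ai.run("user-1", "draft the contract", needs_premium=True))
```

In production the usage counters and the cache would sit in a shared store, not in process memory, but the order of the checks is the part worth keeping: bound, dedupe, meter, then route.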

When we build AI features into a product, we treat cost per action as a first-class design constraint, the same way we treat latency or correctness. A feature that is correct and fast but loses money on every heavy user is not done. We saw this concretely building Mythic Intel, an AI interview trainer where every spoken answer, every generated question, and every voice sample carries a real inference cost, so the architecture has to keep that cost predictable per user before the pricing can. Some of that work is deciding up front what AI to build and what to buy so you are not paying a vendor's margin on top of your own token bill.

Match the model to how value is delivered

The last question is what your customer is actually buying, because that should anchor the customer-facing metric. If the value is "a tool my team uses," a seat base makes sense and usage rides on top. If the value is "outcomes produced," like documents drafted or tickets resolved, price closer to the outcome, because that is the unit the customer connects to their own return. If the value is raw capacity, like API calls into your AI, lean usage-based and make the base small.

There is no universal answer, but there is a universal failure: pricing an AI feature as if the marginal cost were zero. It is not, and it never will be while you pay per token. Cost the feature, pick a customer metric they trust and an internal metric that holds your margin, blend a predictable base with usage that scales, and engineer the product to keep the cost per action under control. Do that, and your most engaged customers become your most profitable ones instead of the ones quietly draining the account.