LLM Cost Calculator

Estimate the monthly cost of running an LLM workload across all major providers and hosting options. Adjust the workload inputs and see every pricing tier — on-demand, batch, and prompt-cached — ranked side-by-side.

34 pricing rows 4 providers 6 hosting options Prices verified 2026-06-07

Start from a scenario

Concurrent users Requests / user / day Working days / month Input tokens / request Output tokens / request Cache hit rate 0% Only applies to Prompt-cached tier

110K requests / month 220.0M input tokens 55.0M output tokens

Provider

Hosting

Pricing tier

Currency

Period

Mode

34 of 34 pricing rows visible · curated to releases since Sept 2025

Apply your negotiated rates Optional · enter your enterprise discount per hosting

OpenAI direct

% off

Azure OpenAI

% off

Anthropic direct

% off

AWS Bedrock

% off

GCP Vertex AI

% off

Google AI Studio

% off

Alibaba DashScope

% off

Model	Hosting	In / Out $ per M tokens	Effective rate after tier	Monthly cost	Per user / mo	Per request
GPT-5.4 nano ⓘ Released Mar 2026	OpenAI direct	$0.20 / $1.25	$0.20 / $1.25	$113	$1.13	$0.00103
GPT-5.4 nano Released Mar 2026	Azure OpenAI	$0.20 / $1.25	$0.20 / $1.25	$113	$1.13	$0.00103
Gemini 3.1 Flash-Lite ⓘ Released Feb 2026	GCP Vertex AI / AI Studio	$0.25 / $1.50	$0.25 / $1.50	$138	$1.38	$0.00125
Qwen3-Coder-480B-A35B ⓘ Released Sept 2025	AWS Bedrock	$0.22 / $1.80	$0.22 / $1.80	$147	$1.47	$0.00134
Qwen3.7-Plus ⓘ Released Jun 2026	Alibaba DashScope	$0.40 / $1.60	$0.40 / $1.60	$176	$1.76	$0.00160
GPT-5.4 mini Released Mar 2026	OpenAI direct	$0.75 / $4.50	$0.75 / $4.50	$413	$4.13	$0.00375
GPT-5.4 mini Released Mar 2026	Azure OpenAI	$0.75 / $4.50	$0.75 / $4.50	$413	$4.13	$0.00375
Qwen3-Coder-480B-A35B ⓘ Released Sept 2025	GCP Vertex AI	$1.00 / $4.00	$1.00 / $4.00	$440	$4.40	$0.00400
Claude Haiku 4.5 ⓘ Released Oct 2025	Anthropic direct	$1.00 / $5.00	$1.00 / $5.00	$495	$4.95	$0.00450
Claude Haiku 4.5 ⓘ Released Oct 2025	AWS Bedrock	$1.00 / $5.00	$1.00 / $5.00	$495	$4.95	$0.00450
Claude Haiku 4.5 Released Oct 2025	GCP Vertex AI	$1.00 / $5.00	$1.00 / $5.00	$495	$4.95	$0.00450
Gemini 3.5 Flash ⓘ Released May 2026	GCP Vertex AI / AI Studio	$1.50 / $9.00	$1.50 / $9.00	$825	$8.25	$0.00750
Qwen3.7-Max ⓘ Released May 2026	Alibaba DashScope	$2.50 / $7.50	$2.50 / $7.50	$963	$9.63	$0.00875
Gemini 3.1 Pro (preview) ⓘ Released Feb 2026	GCP Vertex AI / AI Studio	$2.00 / $12.00	$2.00 / $12.00	$1,100	$11	$0.010
GPT-5.4 ⓘ Released Mar 2026	OpenAI direct	$2.50 / $15.00	$2.50 / $15.00	$1,375	$14	$0.013
GPT-5.4 ⓘ Released Mar 2026	Azure OpenAI	$2.50 / $15.00	$2.50 / $15.00	$1,375	$14	$0.013
Claude Sonnet 4.5 Released Sept 2025	Anthropic direct	$3.00 / $15.00	$3.00 / $15.00	$1,485	$15	$0.013
Claude Sonnet 4.6 ⓘ Released Feb 2026	Anthropic direct	$3.00 / $15.00	$3.00 / $15.00	$1,485	$15	$0.013
Claude Sonnet 4.5 Released Sept 2025	AWS Bedrock	$3.00 / $15.00	$3.00 / $15.00	$1,485	$15	$0.013
Claude Sonnet 4.6 Released Feb 2026	AWS Bedrock	$3.00 / $15.00	$3.00 / $15.00	$1,485	$15	$0.013
Claude Sonnet 4.5 Released Sept 2025	GCP Vertex AI	$3.00 / $15.00	$3.00 / $15.00	$1,485	$15	$0.013
Claude Sonnet 4.6 Released Feb 2026	GCP Vertex AI	$3.00 / $15.00	$3.00 / $15.00	$1,485	$15	$0.013
Claude Opus 4.7 ⓘ Released Apr 2026	Anthropic direct	$5.00 / $25.00	$5.00 / $25.00	$2,475	$25	$0.022
Claude Opus 4.8 ⓘ Released May 2026	Anthropic direct	$5.00 / $25.00	$5.00 / $25.00	$2,475	$25	$0.022
Claude Opus 4.7 Released Apr 2026	AWS Bedrock	$5.00 / $25.00	$5.00 / $25.00	$2,475	$25	$0.022
Claude Opus 4.8 Released May 2026	AWS Bedrock	$5.00 / $25.00	$5.00 / $25.00	$2,475	$25	$0.022
Claude Opus 4.7 Released Apr 2026	GCP Vertex AI	$5.00 / $25.00	$5.00 / $25.00	$2,475	$25	$0.022
Claude Opus 4.8 Released May 2026	GCP Vertex AI	$5.00 / $25.00	$5.00 / $25.00	$2,475	$25	$0.022
GPT-5.5 ⓘ Released Apr 2026	OpenAI direct	$5.00 / $30.00	$5.00 / $30.00	$2,750	$28	$0.025
GPT-5.5 Released Apr 2026	Azure OpenAI	$5.00 / $30.00	$5.00 / $30.00	$2,750	$28	$0.025
GPT-5.4 Pro ⓘ Released Mar 2026	OpenAI direct	$30.00 / $180.00	$30.00 / $180.00	$16,500	$165	$0.150
GPT-5.5 Pro ⓘ Released Apr 2026	OpenAI direct	$30.00 / $180.00	$30.00 / $180.00	$16,500	$165	$0.150
GPT-5.4 Pro Released Mar 2026	Azure OpenAI	$30.00 / $180.00	$30.00 / $180.00	$16,500	$165	$0.150
GPT-5.5 Pro Released Apr 2026	Azure OpenAI	$30.00 / $180.00	$30.00 / $180.00	$16,500	$165	$0.150

How the math works Formulas · sources · scope · assumptions

total_requests = users × requests/user/day × days/month
monthly_cost_usd = (input_tokens / 1M) × input_price + (output_tokens / 1M) × output_price
hours_freed = users × 8 × pct_workday_automated × days/month
time_value = hours_freed × hourly_rate · value_multiple = time_value / cost · fte_equivalent = hours_freed / 176
annual = monthly × 12 · cad = usd × 1.3837

Scope — last 12 months only: the dataset is curated to current-generation models released roughly within the last year. Superseded families (GPT-4o, Claude 3.x, Gemini 2.0, Qwen 2.x) are intentionally excluded — they would compute correctly but aren't what you'd plan a new deployment around in 2026.
Row order: grouped by model owner (OpenAI → Anthropic → Google → Qwen), sorted alphabetically by model name within each group. The cheapest-visible row is highlighted regardless of group position.
Negotiated rates: the "Apply your negotiated rates" disclosure lets you enter an enterprise discount per hosting (e.g. 15% off Azure OpenAI). The discount is applied to both input and output token rates after the tier (on-demand / batch / cached) is resolved, so it stacks with batch and cached discounts. Discounts are encoded in the share-link hash so a colleague opening your URL sees the same negotiated view.
Scenario presets draw workload assumptions from public benchmarks: Microsoft Copilot for M365 impact study (knowledge-work uplift), GitHub Copilot productivity research, and McKinsey "AI in the workplace" 2024. Adjust against your own deployment metrics before treating any output as a budget commitment.
Business value mode compares inference cost against the dollar value of capacity freed up — users × workday hours × % automated × workdays × hourly rate. The % of workday automated slider is the lever: 10–25% is the range supported by Microsoft Copilot for M365 (~14% knowledge-worker uplift), GitHub Copilot studies (~26% engineer uplift), and McKinsey "AI in the workplace" 2024 (which projects up to 30% achievable by 2030). The default hourly rate ($100/hr) approximates $200K fully-loaded salary for US enterprise mid-level. Adjust both to your geography and role mix.
Capacity freed vs replacement: the value mode frames the output as hours reinvested (time given back to each user) rather than headcount displaced. The FTE-equivalent figure is a capacity benchmark — "this much extra bandwidth" — not a layoff projection.
Batch API tier: uses each provider's published batch price (typically ~50% of on-demand) where available. Models without a published batch tier fall back to on-demand at the same rate.
Prompt-cached tier: only this tier consumes the cache slider. Input cost becomes fresh × on-demand_input + cached × cached_input_price. Cache write cost is ignored at v1 (small relative to read at high hit rates).
Reasoning models (GPT-5.4 Pro, GPT-5.5 Pro, Claude Opus 4.8): output tokens include hidden chain-of-thought billed at the output rate. Pad output token estimates ~3–5× for reasoning-heavy tasks.
Provisioned throughput (PTUs / Bedrock PT) tiers from Azure OpenAI and AWS Bedrock aren't modelled — those are reserved-capacity quotes negotiated per workload, not a single per-token rate.
Gemini 3.1 Pro (preview) has a tiered price: $2/$12 for prompts ≤200K tokens; $4/$18 above. Calculator uses the ≤200K rate.

Token pricing source: LiteLLM model_prices_and_context_window.json, cross-checked against each provider's official pricing page (click any model name to open the citation). Snapshot date 2026-06-07.
USD → CAD rate: 1.3837 as published by Bank of Canada daily exchange rates on 2026-06-01. FX moves daily — for budget commitments above $100K, re-verify the rate.
Provider prices change without notice — re-verify before any financial decision.

Related research

From Pilot to P&L: The Economics of Enterprise AI Value — the routing, caching, and batching levers this calculator prices, with a worked −88% example.
The Reference Design: An Executable Blueprint for the AI Platform — where the gateway, routing tiers, and blocking budgets sit in a full platform build.

Need a TCO including infra and ops?

Token-pricing is one input. If you're sizing a real deployment — controls, runtime, observability, internal model risk — we can run the full picture with you.

Talk to us Read the white papers