Skip to content
All tools Calculator

LLM Cost Calculator

Estimate the monthly cost of running an LLM workload across all major providers and hosting options. Adjust the workload inputs and see every pricing tier — on-demand, batch, and prompt-cached — ranked side-by-side.

29 pricing rows 4 providers 6 hosting options Prices verified 2026-05-27
Start from a scenario

Workload

Adjust the numbers — costs update live across 29 pricing rows.

110K requests / month 220.0M input tokens 55.0M output tokens
Provider
Hosting
Pricing tier
Currency
Period
Mode
29 of 29 pricing rows visible · curated to releases since Jun 2025
Apply your negotiated rates Optional · enter your enterprise discount per hosting
Model Hosting In / Out
$ per M tokens
Effective rate
after tier
Monthly cost Per user / mo Per request
GPT-5 nano Released Aug 2025 OpenAI direct $0.05 / $0.40 $0.05 / $0.40 $33 $0.330 $0.00030
GPT-5 nano Released Aug 2025 Azure OpenAI $0.05 / $0.40 $0.05 / $0.40 $33 $0.330 $0.00030
Gemini 2.5 Flash-Lite Released Jul 2025 GCP Vertex AI / AI Studio $0.10 / $0.40 $0.10 / $0.40 $44 $0.440 $0.00040
Qwen3-235B-A22B Released Jul 2025 AWS Bedrock $0.22 / $0.88 $0.22 / $0.88 $97 $0.968 $0.00088
Qwen3-235B-A22B Released Jul 2025 GCP Vertex AI $0.25 / $1.00 $0.25 / $1.00 $110 $1.10 $0.00100
Qwen3-Coder-480B-A35B Released Sept 2025 AWS Bedrock $0.22 / $1.80 $0.22 / $1.80 $147 $1.47 $0.00134
Qwen Plus Released Jul 2025 Alibaba DashScope $0.40 / $1.20 $0.40 / $1.20 $154 $1.54 $0.00140
GPT-5 mini Released Aug 2025 OpenAI direct $0.25 / $2.00 $0.25 / $2.00 $165 $1.65 $0.00150
GPT-5 mini Released Aug 2025 Azure OpenAI $0.25 / $2.00 $0.25 / $2.00 $165 $1.65 $0.00150
Gemini 2.5 Flash Released Jun 2025 GCP Vertex AI / AI Studio $0.30 / $2.50 $0.30 / $2.50 $204 $2.04 $0.00185
Gemini 3 Flash (preview) Released Feb 2026 GCP Vertex AI / AI Studio $0.50 / $3.00 $0.50 / $3.00 $275 $2.75 $0.00250
Qwen3-Coder-480B-A35B Released Sept 2025 GCP Vertex AI $1.00 / $4.00 $1.00 / $4.00 $440 $4.40 $0.00400
Claude Haiku 4.5 Released Oct 2025 Anthropic direct $1.00 / $5.00 $1.00 / $5.00 $495 $4.95 $0.00450
Claude Haiku 4.5 Released Oct 2025 AWS Bedrock $1.00 / $5.00 $1.00 / $5.00 $495 $4.95 $0.00450
Claude Haiku 4.5 Released Oct 2025 GCP Vertex AI $1.00 / $5.00 $1.00 / $5.00 $495 $4.95 $0.00450
GPT-5 Released Aug 2025 OpenAI direct $1.25 / $10.00 $1.25 / $10.00 $825 $8.25 $0.00750
GPT-5 Released Aug 2025 Azure OpenAI $1.25 / $10.00 $1.25 / $10.00 $825 $8.25 $0.00750
Gemini 2.5 Pro Released Jun 2025 GCP Vertex AI / AI Studio $1.25 / $10.00 $1.25 / $10.00 $825 $8.25 $0.00750
Gemini 3 Pro (preview) Released Feb 2026 GCP Vertex AI / AI Studio $2.00 / $12.00 $2.00 / $12.00 $1,100 $11 $0.010
Claude Sonnet 4.5 Released Sept 2025 Anthropic direct $3.00 / $15.00 $3.00 / $15.00 $1,485 $15 $0.013
Claude Sonnet 4.5 Released Sept 2025 AWS Bedrock $3.00 / $15.00 $3.00 / $15.00 $1,485 $15 $0.013
Claude Sonnet 4.5 Released Sept 2025 GCP Vertex AI $3.00 / $15.00 $3.00 / $15.00 $1,485 $15 $0.013
Claude Opus 4.5 Released Nov 2025 Anthropic direct $5.00 / $25.00 $5.00 / $25.00 $2,475 $25 $0.022
Claude Opus 4.5 Released Nov 2025 AWS Bedrock $5.00 / $25.00 $5.00 / $25.00 $2,475 $25 $0.022
Claude Opus 4.5 Released Nov 2025 GCP Vertex AI $5.00 / $25.00 $5.00 / $25.00 $2,475 $25 $0.022
GPT-5.5 Released Apr 2026 OpenAI direct $5.00 / $30.00 $5.00 / $30.00 $2,750 $28 $0.025
GPT-5.5 Released Apr 2026 Azure OpenAI $5.00 / $30.00 $5.00 / $30.00 $2,750 $28 $0.025
GPT-5 Pro Released Oct 2025 OpenAI direct $15.00 / $120.00 $15.00 / $120.00 $9,900 $99 $0.090
GPT-5 Pro Released Oct 2025 Azure OpenAI $15.00 / $120.00 $15.00 / $120.00 $9,900 $99 $0.090
How the math works Formulas · sources · scope · assumptions

total_requests = users × requests/user/day × days/month
monthly_cost_usd = (input_tokens / 1M) × input_price + (output_tokens / 1M) × output_price
hours_freed = users × 8 × pct_workday_automated × days/month
time_value = hours_freed × hourly_rate  ·  value_multiple = time_value / cost  ·  fte_equivalent = hours_freed / 176
annual = monthly × 12  ·  cad = usd × 1.3837

  • Scope — last 12 months only: the dataset is curated to current-generation models released roughly within the last year. Superseded families (GPT-4o, Claude 3.x, Gemini 2.0, Qwen 2.x) are intentionally excluded — they would compute correctly but aren't what you'd plan a new deployment around in 2026.
  • Row order: grouped by model owner (OpenAI → Anthropic → Google → Qwen), sorted alphabetically by model name within each group. The cheapest-visible row is highlighted regardless of group position.
  • Negotiated rates: the "Apply your negotiated rates" disclosure lets you enter an enterprise discount per hosting (e.g. 15% off Azure OpenAI). The discount is applied to both input and output token rates after the tier (on-demand / batch / cached) is resolved, so it stacks with batch and cached discounts. Discounts are encoded in the share-link hash so a colleague opening your URL sees the same negotiated view.
  • Scenario presets draw workload assumptions from public benchmarks: Microsoft Copilot for M365 impact study (knowledge-work uplift), GitHub Copilot productivity research, and McKinsey "AI in the workplace" 2024. Adjust against your own deployment metrics before treating any output as a budget commitment.
  • Business value mode compares inference cost against the dollar value of capacity freed up — users × workday hours × % automated × workdays × hourly rate. The % of workday automated slider is the lever: 10–25% is the range supported by Microsoft Copilot for M365 (~14% knowledge-worker uplift), GitHub Copilot studies (~26% engineer uplift), and McKinsey "AI in the workplace" 2024 (which projects up to 30% achievable by 2030). The default hourly rate ($100/hr) approximates $200K fully-loaded salary for US enterprise mid-level. Adjust both to your geography and role mix.
  • Capacity freed vs replacement: the value mode frames the output as hours reinvested (time given back to each user) rather than headcount displaced. The FTE-equivalent figure is a capacity benchmark — "this much extra bandwidth" — not a layoff projection.
  • Batch API tier: uses each provider's published batch price (typically ~50% of on-demand) where available. Models without a published batch tier fall back to on-demand at the same rate.
  • Prompt-cached tier: only this tier consumes the cache slider. Input cost becomes fresh × on-demand_input + cached × cached_input_price. Cache write cost is ignored at v1 (small relative to read at high hit rates).
  • Reasoning models (GPT-5 Pro, GPT-5.5): output tokens include hidden chain-of-thought billed at the output rate. Pad output token estimates ~3–5× for reasoning-heavy tasks.
  • Provisioned throughput (PTUs / Bedrock PT) tiers from Azure OpenAI and AWS Bedrock aren't modelled — those are reserved-capacity quotes negotiated per workload, not a single per-token rate.
  • Gemini 2.5 Pro has a tiered price: $1.25/$10 for prompts ≤200K tokens; $2.50/$15 above. Calculator uses the ≤200K rate.

Token pricing source: LiteLLM model_prices_and_context_window.json, cross-checked against each provider's official pricing page (click any model name to open the citation). Snapshot date 2026-05-27.
USD → CAD rate: 1.3837 as published by Bank of Canada daily exchange rates on 2026-06-01. FX moves daily — for budget commitments above $100K, re-verify the rate.
Provider prices change without notice — re-verify before any financial decision.

Need a TCO including infra and ops?

Token-pricing is one input. If you're sizing a real deployment — controls, runtime, observability, internal model risk — we can run the full picture with you.