LLM Cost Calculator
Estimate the monthly cost of running an LLM workload across all major providers and hosting options. Adjust the workload inputs and see every pricing tier — on-demand, batch, and prompt-cached — ranked side-by-side.
Workload
Adjust the numbers — costs update live across 29 pricing rows.
Hourly rate is fully loaded (salary + benefits + overhead). McKinsey 2024 enterprise medians: professional ≈ $80–120/hr, senior engineer ≈ $150–200/hr. % automated sits in the 10–25% range supported by Microsoft Copilot for M365 (≈14% knowledge-worker uplift) and GitHub Copilot impact studies (≈26% engineer uplift); McKinsey projects 30% achievable by 2030. Tune both to your org.
Apply your negotiated rates Optional · enter your enterprise discount per hosting 0 active
| Model | Hosting | In / Out $ per M tokens | Effective rate after tier | Monthly cost | Per user / mo | Per request |
|---|---|---|---|---|---|---|
| GPT-5 nano ⓘ Released Aug 2025 | OpenAI direct | $0.05 / $0.40 | $0.05 / $0.40 | $33 | $0.330 | $0.00030 |
| GPT-5 nano Released Aug 2025 | Azure OpenAI | $0.05 / $0.40 | $0.05 / $0.40 | $33 | $0.330 | $0.00030 |
| Gemini 2.5 Flash-Lite ⓘ Released Jul 2025 | GCP Vertex AI / AI Studio | $0.10 / $0.40 | $0.10 / $0.40 | $44 | $0.440 | $0.00040 |
| Qwen3-235B-A22B Released Jul 2025 | AWS Bedrock | $0.22 / $0.88 | $0.22 / $0.88 | $97 | $0.968 | $0.00088 |
| Qwen3-235B-A22B Released Jul 2025 | GCP Vertex AI | $0.25 / $1.00 | $0.25 / $1.00 | $110 | $1.10 | $0.00100 |
| Qwen3-Coder-480B-A35B Released Sept 2025 | AWS Bedrock | $0.22 / $1.80 | $0.22 / $1.80 | $147 | $1.47 | $0.00134 |
| Qwen Plus ⓘ Released Jul 2025 | Alibaba DashScope | $0.40 / $1.20 | $0.40 / $1.20 | $154 | $1.54 | $0.00140 |
| GPT-5 mini Released Aug 2025 | OpenAI direct | $0.25 / $2.00 | $0.25 / $2.00 | $165 | $1.65 | $0.00150 |
| GPT-5 mini Released Aug 2025 | Azure OpenAI | $0.25 / $2.00 | $0.25 / $2.00 | $165 | $1.65 | $0.00150 |
| Gemini 2.5 Flash Released Jun 2025 | GCP Vertex AI / AI Studio | $0.30 / $2.50 | $0.30 / $2.50 | $204 | $2.04 | $0.00185 |
| Gemini 3 Flash (preview) ⓘ Released Feb 2026 | GCP Vertex AI / AI Studio | $0.50 / $3.00 | $0.50 / $3.00 | $275 | $2.75 | $0.00250 |
| Qwen3-Coder-480B-A35B Released Sept 2025 | GCP Vertex AI | $1.00 / $4.00 | $1.00 / $4.00 | $440 | $4.40 | $0.00400 |
| Claude Haiku 4.5 ⓘ Released Oct 2025 | Anthropic direct | $1.00 / $5.00 | $1.00 / $5.00 | $495 | $4.95 | $0.00450 |
| Claude Haiku 4.5 ⓘ Released Oct 2025 | AWS Bedrock | $1.00 / $5.00 | $1.00 / $5.00 | $495 | $4.95 | $0.00450 |
| Claude Haiku 4.5 Released Oct 2025 | GCP Vertex AI | $1.00 / $5.00 | $1.00 / $5.00 | $495 | $4.95 | $0.00450 |
| GPT-5 ⓘ Released Aug 2025 | OpenAI direct | $1.25 / $10.00 | $1.25 / $10.00 | $825 | $8.25 | $0.00750 |
| GPT-5 ⓘ Released Aug 2025 | Azure OpenAI | $1.25 / $10.00 | $1.25 / $10.00 | $825 | $8.25 | $0.00750 |
| Gemini 2.5 Pro ⓘ Released Jun 2025 | GCP Vertex AI / AI Studio | $1.25 / $10.00 | $1.25 / $10.00 | $825 | $8.25 | $0.00750 |
| Gemini 3 Pro (preview) ⓘ Released Feb 2026 | GCP Vertex AI / AI Studio | $2.00 / $12.00 | $2.00 / $12.00 | $1,100 | $11 | $0.010 |
| Claude Sonnet 4.5 ⓘ Released Sept 2025 | Anthropic direct | $3.00 / $15.00 | $3.00 / $15.00 | $1,485 | $15 | $0.013 |
| Claude Sonnet 4.5 Released Sept 2025 | AWS Bedrock | $3.00 / $15.00 | $3.00 / $15.00 | $1,485 | $15 | $0.013 |
| Claude Sonnet 4.5 Released Sept 2025 | GCP Vertex AI | $3.00 / $15.00 | $3.00 / $15.00 | $1,485 | $15 | $0.013 |
| Claude Opus 4.5 ⓘ Released Nov 2025 | Anthropic direct | $5.00 / $25.00 | $5.00 / $25.00 | $2,475 | $25 | $0.022 |
| Claude Opus 4.5 Released Nov 2025 | AWS Bedrock | $5.00 / $25.00 | $5.00 / $25.00 | $2,475 | $25 | $0.022 |
| Claude Opus 4.5 Released Nov 2025 | GCP Vertex AI | $5.00 / $25.00 | $5.00 / $25.00 | $2,475 | $25 | $0.022 |
| GPT-5.5 ⓘ Released Apr 2026 | OpenAI direct | $5.00 / $30.00 | $5.00 / $30.00 | $2,750 | $28 | $0.025 |
| GPT-5.5 Released Apr 2026 | Azure OpenAI | $5.00 / $30.00 | $5.00 / $30.00 | $2,750 | $28 | $0.025 |
| GPT-5 Pro ⓘ Released Oct 2025 | OpenAI direct | $15.00 / $120.00 | $15.00 / $120.00 | $9,900 | $99 | $0.090 |
| GPT-5 Pro Released Oct 2025 | Azure OpenAI | $15.00 / $120.00 | $15.00 / $120.00 | $9,900 | $99 | $0.090 |
How the math works Formulas · sources · scope · assumptions
total_requests = users × requests/user/day × days/month
monthly_cost_usd = (input_tokens / 1M) × input_price + (output_tokens / 1M) × output_price
hours_freed = users × 8 × pct_workday_automated × days/month
time_value = hours_freed × hourly_rate · value_multiple = time_value / cost · fte_equivalent = hours_freed / 176
annual = monthly × 12 · cad = usd × 1.3837
- Scope — last 12 months only: the dataset is curated to current-generation models released roughly within the last year. Superseded families (GPT-4o, Claude 3.x, Gemini 2.0, Qwen 2.x) are intentionally excluded — they would compute correctly but aren't what you'd plan a new deployment around in 2026.
- Row order: grouped by model owner (OpenAI → Anthropic → Google → Qwen), sorted alphabetically by model name within each group. The cheapest-visible row is highlighted regardless of group position.
- Negotiated rates: the "Apply your negotiated rates" disclosure lets you enter an enterprise discount per hosting (e.g. 15% off Azure OpenAI). The discount is applied to both input and output token rates after the tier (on-demand / batch / cached) is resolved, so it stacks with batch and cached discounts. Discounts are encoded in the share-link hash so a colleague opening your URL sees the same negotiated view.
- Scenario presets draw workload assumptions from public benchmarks: Microsoft Copilot for M365 impact study (knowledge-work uplift), GitHub Copilot productivity research, and McKinsey "AI in the workplace" 2024. Adjust against your own deployment metrics before treating any output as a budget commitment.
- Business value mode compares inference cost against the dollar value of capacity freed up —
users × workday hours × % automated × workdays × hourly rate. The % of workday automated slider is the lever: 10–25% is the range supported by Microsoft Copilot for M365 (~14% knowledge-worker uplift), GitHub Copilot studies (~26% engineer uplift), and McKinsey "AI in the workplace" 2024 (which projects up to 30% achievable by 2030). The default hourly rate ($100/hr) approximates $200K fully-loaded salary for US enterprise mid-level. Adjust both to your geography and role mix. - Capacity freed vs replacement: the value mode frames the output as hours reinvested (time given back to each user) rather than headcount displaced. The FTE-equivalent figure is a capacity benchmark — "this much extra bandwidth" — not a layoff projection.
- Batch API tier: uses each provider's published batch price (typically ~50% of on-demand) where available. Models without a published batch tier fall back to on-demand at the same rate.
- Prompt-cached tier: only this tier consumes the cache slider. Input cost becomes
fresh × on-demand_input + cached × cached_input_price. Cache write cost is ignored at v1 (small relative to read at high hit rates). - Reasoning models (GPT-5 Pro, GPT-5.5): output tokens include hidden chain-of-thought billed at the output rate. Pad output token estimates ~3–5× for reasoning-heavy tasks.
- Provisioned throughput (PTUs / Bedrock PT) tiers from Azure OpenAI and AWS Bedrock aren't modelled — those are reserved-capacity quotes negotiated per workload, not a single per-token rate.
- Gemini 2.5 Pro has a tiered price: $1.25/$10 for prompts ≤200K tokens; $2.50/$15 above. Calculator uses the ≤200K rate.
Token pricing source: LiteLLM model_prices_and_context_window.json,
cross-checked against each provider's official pricing page (click any model name to open the citation). Snapshot date 2026-05-27.
USD → CAD rate: 1.3837 as published by Bank of Canada daily exchange rates on 2026-06-01. FX moves daily — for budget commitments above $100K, re-verify the rate.
Provider prices change without notice — re-verify before any financial decision.
Need a TCO including infra and ops?
Token-pricing is one input. If you're sizing a real deployment — controls, runtime, observability, internal model risk — we can run the full picture with you.