SlothLab Tools

API Pricing Calculator

Compare LLM API costs across providers. Calculate per-request and monthly spending.

Usage Presets: 1,000 input tokens · 500 output tokens · 100 requests/day

Models compared: 7 of 19 available

Cost Comparison

Model | Provider | Input Cost | Output Cost | Total/Req | Daily | Monthly
----- | -------- | ---------- | ----------- | --------- | ----- | -------
GPT-4o Mini (cheapest) | OpenAI | $0.0001 | $0.0003 | $0.0004 | $0.05 | $1.35
Gemini 2.5 Flash | Google | $0.0001 | $0.0003 | $0.0004 | $0.05 | $1.35
Claude 3.5 Haiku | Anthropic | $0.0008 | $0.0020 | $0.0028 | $0.28 | $8.40
Gemini 2.5 Pro | Google | $0.0013 | $0.0050 | $0.0063 | $0.63 | $18.75
GPT-4o | OpenAI | $0.0025 | $0.0050 | $0.0075 | $0.75 | $22.50
Claude Sonnet 4 | Anthropic | $0.0030 | $0.0075 | $0.0105 | $1.05 | $31.50
Claude Opus 4 | Anthropic | $0.0150 | $0.0375 | $0.0525 | $5.25 | $157.50

Costs shown per request | Sorted by cost (lowest first)


Last Updated: March 16, 2026

How It Works

LLM APIs charge based on token usage. Tokens are chunks of text — roughly 4 characters or 0.75 words in English. Pricing is split into input tokens (your prompt) and output tokens (the model's response), each charged per 1 million tokens. This calculator multiplies your per-request token usage by the number of daily requests to estimate daily and monthly costs across multiple providers.
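The formula above can be sketched as a small cost helper (function names are illustrative; the GPT-4o figures match the comparison table at $2.50/$10.00 per 1M tokens):

```python
# Per-request and monthly cost; prices are USD per 1M tokens.
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

def monthly_cost(per_request: float, requests_per_day: int, days: int = 30) -> float:
    return per_request * requests_per_day * days

# GPT-4o with 1,000 input and 500 output tokens, 100 requests/day:
per_req = request_cost(1_000, 500, 2.50, 10.00)
print(f"${per_req:.4f}/request, ${monthly_cost(per_req, 100):.2f}/month")
# $0.0075/request, $22.50/month
```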

Why This Matters

LLM API spending is one of the fastest-growing cost categories for technology companies. A 2024 Andreessen Horowitz survey of AI startups found that API costs represented 20-40% of total cloud infrastructure spend for companies using LLM-powered features. For a typical SaaS product serving 10,000 daily active users with an AI chatbot feature, monthly API costs can range from $3,000 (using efficient small models) to $150,000+ (using frontier models with long contexts), a 50x difference that directly impacts unit economics and viability.

The LLM API pricing landscape is also one of the most dynamic in technology. Prices have dropped 90%+ over the past two years as competition intensifies: OpenAI's GPT-4 Turbo launched at $30/1M output tokens and its successor costs $10/1M, while capable open-weight models are available via API for under $1/1M tokens. Understanding this pricing landscape helps developers and product managers make critical build-vs-buy decisions and choose the right model for each use case.

For individual developers and startups, API cost surprises are a common failure mode. A prototype that costs $5/day during development can scale to $500/day at 100 users if token usage is not optimized. This calculator helps teams forecast costs before launch and compare providers to find the optimal price-performance ratio for their specific workload.

Real-World Examples

Scenario 1: The SaaS Chatbot. A customer support SaaS processes 5,000 conversations/day with an average of 2,000 input tokens (system prompt + customer query + retrieved context) and 500 output tokens per exchange. Using Claude Sonnet 4 ($3/$15 per 1M tokens), the monthly cost is (5,000 × 30 × 2,000/1M × $3) + (5,000 × 30 × 500/1M × $15) = $900 + $1,125 = $2,025/month. Switching to Claude Haiku for simpler queries (60% of volume) reduces the bill to approximately $1,100/month, a 46% savings.

Scenario 2: The Document Analysis Pipeline. A legal tech startup processes 200 contracts/day, each averaging 15,000 input tokens and 3,000 output tokens of analysis. Using GPT-4o ($2.50/$10 per 1M tokens), the monthly cost is (200 × 30 × 15,000/1M × $2.50) + (200 × 30 × 3,000/1M × $10) = $225 + $180 = $405/month. With prompt caching (50% discount on the 10,000-token system prompt portion), the cost drops to approximately $330/month.

Scenario 3: The Budget-Conscious Indie Developer. A solo developer building an AI writing assistant has 50 users making 10 requests/day each (15,000 requests/month). With 1,000 input and 800 output tokens per request, using Gemini 2.0 Flash ($0.10/$0.40 per 1M tokens), the monthly cost is only (15,000 × 1,000/1M × $0.10) + (15,000 × 800/1M × $0.40) = $1.50 + $4.80 = $6.30/month: running an AI-powered product for the price of a coffee.
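All three scenarios reduce to the same formula; a quick sketch to re-derive them (function name is illustrative):

```python
# Monthly cost = requests/month x per-request tokens x price per 1M tokens.
def monthly_usd(reqs_per_day, in_tokens, out_tokens,
                in_price, out_price, days=30):
    reqs = reqs_per_day * days
    return (reqs * in_tokens / 1e6) * in_price \
         + (reqs * out_tokens / 1e6) * out_price

chatbot   = monthly_usd(5_000,  2_000,   500, 3.00, 15.00)  # Scenario 1
contracts = monthly_usd(  200, 15_000, 3_000, 2.50, 10.00)  # Scenario 2
indie     = monthly_usd(  500,  1_000,   800, 0.10,  0.40)  # Scenario 3: 50 users x 10 req/day
print(round(chatbot, 2), round(contracts, 2), round(indie, 2))  # 2025.0 405.0 6.3
```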

Methodology & Sources

This calculator uses current API pricing data from major LLM providers including OpenAI, Anthropic, Google, Mistral, Meta, Cohere, and others. Prices are listed per million tokens for both input and output, reflecting the standard billing model used across the industry.

Cost per request is calculated as: Total Cost = (Input Tokens × Input Price/1M) + (Output Tokens × Output Price/1M). Monthly estimates multiply per-request cost by the estimated number of requests per month.

Token count estimation uses the general approximation of ~0.75 words per token for English text. Actual tokenization varies by model and tokenizer: GPT-4 and Claude use BPE tokenization, while other models may use SentencePiece or similar tokenizers.

Data sources: Official API pricing pages from each provider, updated regularly. Prices reflect standard tier pricing without volume discounts or committed use agreements.

Limitations: API pricing changes frequently as providers compete and release new models. Cached token pricing, batch processing discounts, and enterprise agreements may significantly reduce costs. The calculator does not account for rate limits, latency differences, or quality variations between models at different price points.
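The ~0.75 words-per-token approximation can be expressed directly. This is a rough English-only heuristic, not a real tokenizer; use tiktoken or a provider's token counting endpoint for exact counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough English estimate: ~0.75 words per token (~4 characters per token)."""
    return round(len(text.split()) / 0.75)

print(estimate_tokens("Summarize the attached contract in three bullet points."))
# 8 words -> about 11 tokens
```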

Common Mistakes to Avoid

1. Choosing the most expensive model by default. Many developers default to the largest frontier model (GPT-4o, Claude Opus, Gemini Pro) for all tasks. In reality, 70-80% of typical API workloads (classification, extraction, summarization, simple Q&A) can be handled by models that cost 5-20x less with minimal quality difference. Always benchmark smaller models first.

2. Ignoring output token costs. Developers often focus on input pricing when comparing models, but output tokens are typically 3-6x more expensive. In an application with verbose responses, output costs can dominate the bill. Setting max_tokens limits and instructing the model to be concise can reduce output costs by 30-50%.

3. Not implementing prompt caching for repeated content. If your system prompt, few-shot examples, or RAG context template exceeds 1,000 tokens and is included in every request, you are paying full input price for identical text thousands of times. Enabling prompt caching saves 50-90% on that portion, which can reduce total costs by 20-40% for typical applications.

4. Failing to monitor and set spending limits. API costs can spike unexpectedly due to bugs (infinite loops calling the API), traffic surges, or prompt injection attacks that generate long outputs. Always set monthly budget alerts and hard spending caps through your provider's dashboard. A single runaway script can generate thousands of dollars in charges overnight.

5. Comparing prices without considering quality and latency. The cheapest model is not always the best value. A $0.25/1M token model that averages five attempts per success effectively costs $1.25/1M, more than a $1.00/1M model that succeeds on the first try, and it is slower as well. Factor in accuracy, latency, and retry rates when calculating true per-request cost.
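Mistake 5 in numbers: if each attempt succeeds with probability p, the expected number of attempts is 1/p (geometric distribution), so the effective per-success cost is the per-attempt cost divided by p. The success rates below are illustrative:

```python
def effective_cost(cost_per_attempt: float, success_rate: float) -> float:
    # Expected attempts until success = 1 / success_rate (geometric distribution).
    return cost_per_attempt / success_rate

print(effective_cost(0.25, 0.2))   # "cheap" model, 20% success rate -> 1.25
print(effective_cost(1.00, 1.0))   # pricier model, first-try success -> 1.0
```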

Frequently Asked Questions

How are API costs calculated?
API providers charge based on tokens processed. Each request has input tokens (your prompt) and output tokens (the model's response). The cost formula is: (input_tokens / 1,000,000) × input_price + (output_tokens / 1,000,000) × output_price. For example, a request with 1,000 input tokens and 500 output tokens to GPT-4o costs (1000/1M × $2.50) + (500/1M × $10.00) = $0.0025 + $0.005 = $0.0075 per request.
What is a token?
A token is a chunk of text that language models process. In English, one token is roughly 4 characters or about 0.75 words. So a 1,000-word English text is approximately 1,333 tokens. However, tokenization varies by language — Chinese, Japanese, and Korean text typically uses more tokens per character. Code also tends to use more tokens due to special characters and formatting.
Which model offers the best value?
It depends on your use case. For simple tasks like classification or extraction, smaller models like GPT-4o Mini, Gemini 2.5 Flash, or Claude 3.5 Haiku offer excellent value at a fraction of the cost. For complex reasoning, coding, or creative tasks, larger models like GPT-4o, Claude Sonnet 4, or Gemini 2.5 Pro provide better quality. Always test with smaller models first — you may be surprised by how well they perform for your specific task.
Why is output pricing usually higher than input pricing?
Output tokens are more expensive because generating each output token requires a full forward pass through the model, making it computationally intensive. Input tokens, by contrast, can be processed in parallel during the prefill phase. Additionally, output generation is auto-regressive — each new token depends on all previous tokens — which limits parallelization and increases per-token compute cost. This is why output is typically 2-5x more expensive than input.
How can I reduce my LLM API costs?
Several strategies can significantly reduce API costs: (1) Use prompt caching for repeated context — many providers offer 50-90% discounts on cached tokens. (2) Choose the right model size — smaller models like GPT-4o-mini or Claude Haiku are 10-20x cheaper and sufficient for many tasks. (3) Optimize prompt length by removing unnecessary instructions or context. (4) Use batch processing APIs when real-time responses aren't needed. (5) Implement response length limits to avoid unnecessarily long outputs.
What is prompt caching and how much does it save?
Prompt caching stores frequently used input text (system prompts, few-shot examples) so it does not need to be reprocessed on every request. Anthropic offers 90% discount on cached input tokens, OpenAI offers 50%, and Google offers 75%. For applications with large system prompts (2,000+ tokens) sent with every request, caching can reduce input costs by 50-90%. It is particularly valuable for RAG pipelines and chatbots with extensive system instructions.
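As a sketch of the savings math, consider a hypothetical request with a 3,000-token cached system prompt plus 500 fresh tokens, at $3/1M input with a 90% cache discount:

```python
def input_cost(cached_tok: int, fresh_tok: int,
               price_per_m: float, cache_discount: float) -> float:
    # Cached tokens are billed at (1 - discount) of the normal input price.
    cached = (cached_tok / 1e6) * price_per_m * (1 - cache_discount)
    fresh = (fresh_tok / 1e6) * price_per_m
    return cached + fresh

without_cache = input_cost(0, 3_500, 3.00, 0.90)    # all 3,500 tokens at full price
with_cache = input_cost(3_000, 500, 3.00, 0.90)     # $0.0009 + $0.0015 = $0.0024
print(f"{1 - with_cache / without_cache:.0%} input savings")  # 77% input savings
```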
How do I estimate token counts for my application?
For English text, use the approximation of 1 token per 4 characters or 0.75 words. A typical chatbot exchange might use 500-1,500 input tokens (system prompt + user message + context) and 200-800 output tokens. A RAG pipeline with retrieved documents can reach 4,000-8,000 input tokens per query. For precise counts, use tokenizer tools like OpenAI's tiktoken library or Anthropic's token counting API endpoint.
Should I use local models or API models for cost optimization?
The breakeven depends on volume. Running a local Llama 3 8B on an RTX 4090 ($2,000 upfront) generates approximately 40 tokens/second at near-zero marginal cost. At typical API rates of $0.15/1M tokens for a comparable model, the hardware pays for itself only after roughly 13 billion tokens, a volume that takes years to reach even at full utilization. Below that volume, APIs are cheaper. Above it, local inference saves money but requires DevOps overhead for serving, monitoring, and hardware maintenance.
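The breakeven arithmetic, using the simplified numbers above (electricity, maintenance, and engineering time ignored):

```python
hardware_usd = 2_000.0
api_usd_per_1m = 0.15  # assumed API price for a comparable small model

breakeven_tokens = hardware_usd / api_usd_per_1m * 1_000_000
print(f"{breakeven_tokens / 1e9:.1f}B tokens to break even")  # 13.3B tokens to break even

# At a sustained 40 tokens/second, around the clock:
tokens_per_day = 40 * 86_400
years = breakeven_tokens / tokens_per_day / 365
print(f"{years:.1f} years of continuous generation")  # 10.6 years of continuous generation
```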
