
LLM API pricing, side by side

Token prices for six providers, taken from their official pricing pages and nothing else. Standard tier, USD per 1 million tokens, with the caching and long-context caveats that quietly change your bill.

Every number verified against the linked source on 2026-06-10. The spread is real: Mistral Small 4 costs $0.40 per combined 1M in + 1M out, GPT-5.5 Pro costs $210.00.
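The "1M in + 1M out" column is the input price plus the output price, which assumes equal input and output volume. The Python sketch below shows that arithmetic and a helper for an uneven workload. The function names and the 5M-in / 1M-out workload are illustrative and do not come from any provider's API.

```python
def combined_price(input_per_m: float, output_per_m: float) -> float:
    """The table's '1M in + 1M out' column: one million tokens each way."""
    return input_per_m + output_per_m


def workload_cost(input_tokens: int, output_tokens: int,
                  input_per_m: float, output_per_m: float) -> float:
    """USD cost of a workload at standard-tier per-1M-token prices."""
    return ((input_tokens / 1_000_000) * input_per_m
            + (output_tokens / 1_000_000) * output_per_m)


# Cheapest and most expensive rows in the tables (standard tier, USD per 1M).
cheapest = combined_price(0.10, 0.30)     # Mistral Small 4
priciest = combined_price(30.00, 180.00)  # GPT-5.5 Pro
print(f"${cheapest:.2f} vs ${priciest:.2f}, a {priciest / cheapest:.0f}x spread")
# $0.40 vs $210.00, a 525x spread

# Hypothetical input-heavy workload: 5M input tokens, 1M output tokens.
print(f"{workload_cost(5_000_000, 1_000_000, 0.10, 0.30):.2f}")     # 0.80
print(f"{workload_cost(5_000_000, 1_000_000, 30.00, 180.00):.2f}")  # 330.00
```

The gap between models depends on the input/output mix. For this input-heavy workload it is 412.5x, not 525x.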

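Anthropic

Model | Input /1M | Cached /1M | Output /1M | 1M in + 1M out | Context
--- | --- | --- | --- | --- | ---
Claude Fable 5 | $10.00 | $1.00 | $50.00 | $60.00 | 1M
Claude Opus 4.8 | $5.00 | $0.50 | $25.00 | $30.00 | 1M
Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 | $18.00 | 1M
Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 | $6.00 | -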
  • Cached input is the cache read price (0.1x input). Cache writes cost extra: 1.25x input for 5-minute TTL, 2x for 1-hour TTL.
  • Batch API: 50 percent off input and output.
  • Fable 5, Opus 4.8/4.7/4.6, and Sonnet 4.6 include the full 1M-token context window at standard pricing.
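A cache write costs more than a plain input token, so caching a shared prompt prefix only pays off once the prefix is read back enough times. The Python sketch below uses the multipliers listed above: a write costs 1.25x or 2x input, and a read costs 0.1x input. It assumes every reuse lands inside the TTL. `prefix_cost` and `break_even_uses` are hypothetical helpers, not part of any SDK.

```python
# Multipliers from the Anthropic notes above, relative to the base input price.
WRITE_MULT = {"5m": 1.25, "1h": 2.0}
READ_MULT = 0.1


def prefix_cost(uses: int, ttl: str, cached: bool = True) -> float:
    """Cost of sending one prefix `uses` times, in units of base input price.

    Assumes every reuse happens before the cache entry expires.
    """
    if not cached:
        return float(uses)  # every request pays the full input price
    return WRITE_MULT[ttl] + READ_MULT * (uses - 1)  # one write, then reads


def break_even_uses(ttl: str) -> int:
    """Smallest number of uses at which caching is strictly cheaper."""
    uses = 1
    while prefix_cost(uses, ttl) >= prefix_cost(uses, ttl, cached=False):
        uses += 1
    return uses


for ttl in ("5m", "1h"):
    print(ttl, break_even_uses(ttl))
# 5m 2
# 1h 3
```

To get dollars, multiply by prefix size and the model's input price. Take Sonnet 4.6 ($3.00 input) with a 100K-token prefix sent 10 times on the 5-minute TTL. That is 2.15 input-price units times 0.1M tokens times $3.00, or $0.645, versus $3.00 uncached.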
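
OpenAI

Model | Input /1M | Cached /1M | Output /1M | 1M in + 1M out | Context
--- | --- | --- | --- | --- | ---
GPT-5.5 Pro (tiered) | $30.00 | - | $180.00 | $210.00 | -
GPT-5.5 (tiered) | $5.00 | $0.50 | $30.00 | $35.00 | -
GPT-5.4 (tiered) | $2.50 | $0.25 | $15.00 | $17.50 | -
GPT-5.3-Codex | $1.75 | $0.175 | $14.00 | $15.75 | -
GPT-5.4 mini | $0.75 | $0.075 | $4.50 | $5.25 | -
GPT-5.4 nano | $0.20 | $0.02 | $1.25 | $1.45 | -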
  • GPT-5.5, GPT-5.5 Pro, and GPT-5.4 have a separate long-context tier (roughly 2x input, 1.5x output). The threshold is not stated on the pricing page.
  • Batch and Flex: 50 percent off. Priority tier costs roughly 2x standard.
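The pricing page gives the long-context multipliers but not the threshold, so any cost estimate has to treat the threshold as an unknown. The Python sketch below does this for GPT-5.4 ($2.50 in / $15.00 out) and takes the threshold as an explicit parameter. It assumes the roughly-2x/1.5x multipliers apply to the whole request once the prompt crosses the threshold. The 200,000-token value used below is a placeholder, not OpenAI's number.

```python
def request_cost(prompt_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float,
                 long_context_threshold: int,
                 input_mult: float = 2.0, output_mult: float = 1.5) -> float:
    """USD cost of one request with an approximate long-context tier.

    Assumption: once prompt_tokens exceeds the threshold, the whole request
    is billed at the higher rates. The caller must supply the threshold
    because the pricing page does not state it.
    """
    if prompt_tokens > long_context_threshold:
        input_per_m *= input_mult
        output_per_m *= output_mult
    return (prompt_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


PLACEHOLDER_THRESHOLD = 200_000  # hypothetical; not taken from OpenAI's docs

short_cost = request_cost(100_000, 10_000, 2.50, 15.00, PLACEHOLDER_THRESHOLD)
long_cost = request_cost(300_000, 10_000, 2.50, 15.00, PLACEHOLDER_THRESHOLD)
print(f"{short_cost:.3f} {long_cost:.3f}")  # 0.400 1.725
```

Under these assumptions the 300K-token request costs about 4.3x the 100K-token one, even though its prompt is only 3x larger.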

Google

Model | Input /1M | Cached /1M | Output /1M | 1M in + 1M out | Context
--- | --- | --- | --- | --- | ---
Gemini 3.1 Pro (preview, tiered) | $2.00 | $0.20 | $12.00 | $14.00 | -
Gemini 3.5 Flash | $1.50 | $0.15 | $9.00 | $10.50 | -
Gemini 2.5 Pro (tiered) | $1.25 | $0.125 | $10.00 | $11.25 | -
Gemini 3 Flash (preview) | $0.50 | $0.05 | $3.00 | $3.50 | -
Gemini 3.1 Flash-Lite | $0.25 | $0.025 | $1.50 | $1.75 | -
  • Gemini 3.1 Pro and 2.5 Pro charge higher rates for prompts above 200K tokens (3.1 Pro: $4.00 in / $18.00 out).
  • Flash models charge more for audio input than text input. Rates shown are text/image/video.
  • Context caching storage costs $1.00 per 1M tokens per hour on top of cached-input rates.
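The 200K boundary matters most for agent loops, where each turn resends the growing conversation. The Python sketch below models Gemini 3.1 Pro with the rates above: $2.00/$12.00 up to 200K prompt tokens and $4.00/$18.00 above. It makes three simplifications: the higher rate applies to the whole request once the prompt exceeds 200K, caching is ignored, and the five-turn loop adding 60K tokens per turn is made up for illustration.

```python
GEMINI_31_PRO = {  # USD per 1M tokens, text input
    "base": (2.00, 12.00),
    "long": (4.00, 18.00),  # prompts above 200K tokens
}
LONG_PROMPT_THRESHOLD = 200_000


def turn_cost(prompt_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, picking the tier from the prompt size."""
    tier = "long" if prompt_tokens > LONG_PROMPT_THRESHOLD else "base"
    in_price, out_price = GEMINI_31_PRO[tier]
    return (prompt_tokens * in_price + output_tokens * out_price) / 1_000_000


def agent_loop_cost(turns: int, tokens_per_turn: int, output_per_turn: int) -> float:
    """Each turn resends all earlier context plus tokens_per_turn new tokens."""
    total = 0.0
    for turn in range(1, turns + 1):
        total += turn_cost(turn * tokens_per_turn, output_per_turn)
    return total


print(f"{agent_loop_cost(5, 60_000, 2_000):.3f}")  # 3.024
```

At the base rates alone, the same loop would cost $1.92. Crossing the boundary on the last two turns makes it 57.5 percent more expensive.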

DeepSeek

Model | Input /1M | Cached /1M | Output /1M | 1M in + 1M out | Context
--- | --- | --- | --- | --- | ---
DeepSeek V4 Pro | $0.44 | $0.0036 | $0.87 | $1.31 | 1M
DeepSeek V4 Flash | $0.14 | $0.0028 | $0.28 | $0.42 | 1M
  • Cached input is the cache-hit price; cache misses pay the normal input rate.
  • The deepseek-chat and deepseek-reasoner aliases are scheduled for deprecation on 2026-07-24.
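V4 Pro's cache-hit price is under 1 percent of its miss price, so its effective input rate depends almost entirely on the hit rate. The Python sketch below computes the blended rate for DeepSeek V4 Pro ($0.44 miss, $0.0036 hit). `hit_rate` is a property of your traffic, and the values in the loop are illustrative.

```python
def blended_input_price(hit_rate: float, miss_price: float, hit_price: float) -> float:
    """Effective USD per 1M input tokens, given the fraction served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


for rate in (0.0, 0.5, 0.8, 0.95):
    print(rate, round(blended_input_price(rate, 0.44, 0.0036), 5))
```

At an 80 percent hit rate the blended price is $0.09088 per 1M. Cache misses still supply about 97 percent of that ($0.088).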
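
Mistral

Model | Input /1M | Cached /1M | Output /1M | 1M in + 1M out | Context
--- | --- | --- | --- | --- | ---
Magistral Medium | $2.00 | - | $5.00 | $7.00 | -
Mistral Medium 3.5 | $1.50 | - | $7.50 | $9.00 | -
Mistral Large 3 | $0.50 | - | $1.50 | $2.00 | -
Codestral | $0.30 | - | $0.90 | $1.20 | -
Mistral Small 4 | $0.10 | - | $0.30 | $0.40 | -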
  • Mistral Large 3 is priced below the newer Mistral Medium 3.5 on the official page (verified twice on 2026-06-10).
  • Batch API: 50 percent off.

xAI

Model | Input /1M | Cached /1M | Output /1M | 1M in + 1M out | Context
--- | --- | --- | --- | --- | ---
Grok 4.3 | $1.25 | $0.20 | $2.50 | $3.75 | 1M
Grok 4.20 (reasoning) | $1.25 | $0.20 | $2.50 | $3.75 | 1M
Grok Build 0.1 | $1.00 | $0.20 | $2.00 | $3.00 | 256K
  • Batch API: 20 to 50 percent off standard rates, varies per model.
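xAI's batch discount varies by model, so the honest estimate for a batch job is a range. The Python sketch below prices Grok 4.3 ($1.25 in / $2.50 out) at both ends of the 20 to 50 percent band. The 10M-in / 2M-out job is hypothetical. The same helper also covers the flat 50 percent batch discounts from Anthropic, OpenAI, and Mistral if you pass equal bounds.

```python
def batch_cost_range(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float,
                     min_discount: float, max_discount: float) -> tuple[float, float]:
    """Return (best case, worst case) USD cost for a batch job.

    Discounts are fractions, e.g. 0.5 for 50 percent off standard rates.
    """
    standard = (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000
    return standard * (1 - max_discount), standard * (1 - min_discount)


best, worst = batch_cost_range(10_000_000, 2_000_000, 1.25, 2.50, 0.20, 0.50)
print(f"${best:.2f} to ${worst:.2f}")  # $8.75 to $14.00
```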

How to read this table honestly

  • Cached input is not one thing. Anthropic and DeepSeek list a cache-read price. Anthropic also bills cache writes (1.25x input for a 5-minute TTL, 2x for 1 hour), and Google bills cache storage per hour. Two providers with the same "cached" number can produce different bills.
  • "Tiered" means the listed price is the floor. OpenAI's GPT-5.5, GPT-5.5 Pro, and GPT-5.4, and Google's Pro models, charge more for long-context requests: roughly 2x input and 1.5x output. Long prompts are exactly what agent workflows produce.
  • Batch discounts are large. Anthropic, OpenAI, and Mistral all offer 50 percent off for async batch workloads; xAI offers 20 to 50 percent, depending on the model.
  • Excluded providers are excluded for a reason. Meta Llama API, Cohere, and Amazon Nova render their prices client-side, so they could not be verified from source. No number on this page is quoted from a third-party blog.
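The cached-input point is easy to see with two models that list the same numbers. Claude Opus 4.8 and GPT-5.5 both show $5.00 input and $0.50 cached. The Python sketch below computes the cost of reusing one prompt prefix under three billing models:

- Anthropic: uses the 1.25x 5-minute write price from its notes.
- GPT-5.5: assumes the first request is billed at the normal input rate. This page does not describe OpenAI cache writes, so that is an assumption.
- Google: the optional storage term follows the per-hour model.

The 100K-token prefix sent 10 times is hypothetical.

```python
def cached_prefix_cost(prefix_tokens: int, uses: int,
                       input_per_m: float, cached_per_m: float,
                       write_mult: float = 1.0,
                       storage_per_m_hour: float = 0.0, hours: float = 0.0) -> float:
    """USD to send one prefix `uses` times: one cache write, then cache reads.

    write_mult: price of the first (cache-writing) request relative to input.
    storage_per_m_hour / hours: optional per-hour storage charge.
    """
    m = prefix_tokens / 1_000_000
    write = m * input_per_m * write_mult
    reads = m * cached_per_m * (uses - 1)
    storage = m * storage_per_m_hour * hours
    return write + reads + storage


opus = cached_prefix_cost(100_000, 10, 5.00, 0.50, write_mult=1.25)
gpt = cached_prefix_cost(100_000, 10, 5.00, 0.50)  # assumed: no write premium
print(f"Opus 4.8: ${opus:.3f}, GPT-5.5: ${gpt:.3f}")
# Opus 4.8: $1.075, GPT-5.5: $0.950
```

Storage enters the same way. For Gemini 3.1 Pro, `cached_prefix_cost(100_000, 10, 2.00, 0.20, storage_per_m_hour=1.00, hours=1.0)` adds $0.10 for one hour of storage on top of $0.38 in write and read charges. That figure again assumes the initial write is billed at the normal input rate.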