MODEL COMPARISONS
Compare LLM Models Side-by-Side
Benchmarks, pricing, and context windows for 1197 head-to-head pairs across OpenAI, Anthropic, Google, Meta, DeepSeek, xAI, and more. Pick the right model before you write a single line of code.
WHY THIS MATTERS
Two numbers from a marketing page never settle a model choice
Choosing between two frontier LLMs by reading their vendors' announcement posts is how teams end up rewriting prompts a month later. A solid model decision compares the same axes across both sides: Artificial Analysis benchmark scores, throughput, time-to-first-token, context window, modality coverage, and per-million-token pricing — including cache-read and cache-write rates when the vendor publishes them.
Every page in this directory follows the same template. The same dimensions are scored side-by-side, the same FAQ format is answered, and every claim cites either a public benchmark page or a vendor pricing page. ElliotGate runs both models behind one OpenAI-compatible endpoint, so the comparisons stay vendor-neutral — once you've picked, switching back is a slug change, not a migration.
EDITOR'S PICKS
Eight comparisons teams reach for first
These are the cross-vendor matchups we get asked about most often. Start here if you're early in model selection — these pairs cover the four flagship families currently shipping API access.
Anthropic · OpenAI
Claude Opus 4.7 vs GPT-5.5
Read the comparison
Anthropic · OpenAI
Claude Sonnet 4.6 vs GPT-5.5
Read the comparison
Anthropic · Google
Claude Opus 4.7 vs Gemini 3.1 Pro Preview
Read the comparison
Google · OpenAI
Gemini 3.1 Pro Preview vs GPT-5.5
Read the comparison
DeepSeek · OpenAI
DeepSeek: DeepSeek V4 Pro vs GPT-5.5
Read the comparison
Anthropic · xAI
Claude Opus 4.7 vs Grok 4.20
Read the comparison
Anthropic · Moonshot AI
Claude Opus 4.7 vs Kimi K2.6
Read the comparison
Anthropic · DeepSeek
Claude Sonnet 4.6 vs DeepSeek: DeepSeek V4 Pro
Read the comparison
BY VENDOR
Browse by vendor
Each vendor's highest-signal matchups, with cross-vendor pairs tagged. All 1197 comparison pages are indexed in our sitemap and searchable.
OpenAI627
- DeepSeek: DeepSeek V4 Pro vs o4 MiniCross-vendor
- GPT-5 vs Qwen: Qwen3.6 35B A3BCross-vendor
- GPT-5 Chat vs MiMo-V2-OmniCross-vendor
- GPT-5.1 vs Kimi K2 ThinkingCross-vendor
- Claude Sonnet 4.5 vs GPT-5.5Cross-vendor
- Claude Opus 4.1 vs GPT-5.5Cross-vendor
- Claude Opus 4.1 vs GPT-5.4Cross-vendor
- Gemini 3.1 Pro Preview vs o4 Mini HighCross-vendor
- GPT-5 Chat vs Qwen3.6 27BCross-vendor
- Gemini 3.1 Pro Preview vs o4 MiniCross-vendor
+617 more in the sitemap
Anthropic284
- Claude Sonnet 4.5 vs MiniMax M2.7Cross-vendor
- Claude Sonnet 4.5 vs GPT-5.5Cross-vendor
- Claude Opus 4.1 vs GPT-5.5Cross-vendor
- Claude Sonnet 4.5 vs Kimi K2.6Cross-vendor
- Claude Opus 4.1 vs GPT-5.4Cross-vendor
- Claude Opus 4.1 vs Kimi K2.6Cross-vendor
- Claude Sonnet 4.5 vs Qwen3.6 Max PreviewCross-vendor
- Claude Opus 4.1 vs Grok 4.3Cross-vendor
- Claude Sonnet 4.6 vs Qwen3.5 397B A17BCross-vendor
- Claude Sonnet 4.5 vs GLM 5.1Cross-vendor
+274 more in the sitemap
Google99
- Gemini 3.1 Pro Preview vs o4 Mini HighCross-vendor
- Gemini 3.1 Pro Preview vs o4 MiniCross-vendor
- Gemini 3.1 Flash Lite Preview vs Grok 4.3Cross-vendor
- Gemini 3.1 Pro Preview vs Qwen3.5-9BCross-vendor
- Claude Opus 4 vs Gemini 3.1 Pro PreviewCross-vendor
- Gemini 3.1 Pro Preview vs Mercury 2Cross-vendor
- Gemini 3.1 Pro Preview vs Grok 3 MiniCross-vendor
- Gemini 3.1 Pro Preview vs MiniMax M2Cross-vendor
- Claude Sonnet 4 vs Gemini 3.1 Pro PreviewCross-vendor
- Claude Sonnet 4.5 vs Gemini 3.1 Pro PreviewCross-vendor
+89 more in the sitemap
xAI144
- Grok 4 vs Kimi K2.5Cross-vendor
- Grok 3 Mini vs Kimi K2.6Cross-vendor
- Claude Opus 4.1 vs Grok 4.3Cross-vendor
- GPT-5.2-Codex vs Grok 4Cross-vendor
- Gemini 3.1 Flash Lite Preview vs Grok 4.3Cross-vendor
- GPT-5.5 vs Grok 3 MiniCross-vendor
- gpt-oss-120b vs Grok 4.3Cross-vendor
- Grok 4.3 vs o4 Mini HighCross-vendor
- Grok 4.3 vs o4 MiniCross-vendor
- Claude Sonnet 4 vs Grok 4.3Cross-vendor
+134 more in the sitemap
DeepSeek129
- DeepSeek: DeepSeek V4 Flash vs Qwen3.5-122B-A10BCross-vendor
- DeepSeek: DeepSeek V4 Pro vs o4 MiniCross-vendor
- DeepSeek: DeepSeek V4 Flash vs MiMo-V2-OmniCross-vendor
- DeepSeek: DeepSeek V4 Pro vs MiniMax M2Cross-vendor
- Claude Opus 4.1 vs DeepSeek: DeepSeek V4 ProCross-vendor
- DeepSeek: DeepSeek V4 Pro vs gpt-oss-120bCross-vendor
- Claude Sonnet 4.5 vs DeepSeek: DeepSeek V4 ProCross-vendor
- DeepSeek: DeepSeek V4 Flash vs MiniMax M2.5Cross-vendor
- DeepSeek: DeepSeek V4 Flash vs Qwen: Qwen3.6 35B A3BCross-vendor
- Claude Opus 4.7 vs DeepSeek V3.2 ExpCross-vendor
+119 more in the sitemap
Qwen268
- DeepSeek: DeepSeek V4 Flash vs Qwen3.5-122B-A10BCross-vendor
- GPT-5 vs Qwen: Qwen3.6 35B A3BCross-vendor
- GPT-5 Chat vs Qwen3.6 27BCross-vendor
- Claude Sonnet 4.5 vs Qwen3.6 Max PreviewCross-vendor
- GPT-5 vs Qwen3.5 397B A17BCross-vendor
- GLM 5.1 vs Qwen3.5-35B-A3BCross-vendor
- Claude Sonnet 4.6 vs Qwen3.5 397B A17BCross-vendor
- Gemini 3.1 Pro Preview vs Qwen3.5-9BCross-vendor
- MiMo-V2-Omni vs Qwen3.5 397B A17BCross-vendor
- MiniMax M2.7 vs Qwen3.5-35B-A3BCross-vendor
+258 more in the sitemap
Moonshot AI130
- Grok 4 vs Kimi K2.5Cross-vendor
- Grok 3 Mini vs Kimi K2.6Cross-vendor
- GPT-5.1 vs Kimi K2 ThinkingCross-vendor
- Claude Sonnet 4.5 vs Kimi K2.6Cross-vendor
- Claude Opus 4.1 vs Kimi K2.6Cross-vendor
- Kimi K2.6 vs o4 Mini HighCross-vendor
- GPT-5.2-Codex vs Kimi K2 ThinkingCross-vendor
- Kimi K2 Thinking vs MiMo-V2-ProCross-vendor
- Kimi K2.5 vs Qwen3.5-122B-A10BCross-vendor
- Kimi K2 Thinking vs Qwen3.6 PlusCross-vendor
+120 more in the sitemap
Z.ai201
- GLM 5V Turbo vs GPT-5.1 ChatCross-vendor
- GLM 5 Turbo vs GPT-5.1-CodexCross-vendor
- GLM 5.1 vs Qwen3.5-35B-A3BCross-vendor
- Claude Sonnet 4.5 vs GLM 5.1Cross-vendor
- GLM 5.1 vs MiniMax M2Cross-vendor
- Claude Opus 4.1 vs GLM 5.1Cross-vendor
- Claude Opus 4.6 vs GLM 5V TurboCross-vendor
- GLM 5.1 vs gpt-oss-120bCross-vendor
- GLM 4.7 vs GPT-5.4 MiniCross-vendor
- GLM 5.1 vs Step 3.5 FlashCross-vendor
+191 more in the sitemap
MiniMax118
- Claude Sonnet 4.5 vs MiniMax M2.7Cross-vendor
- MiMo-V2-Pro vs MiniMax M2.5Cross-vendor
- DeepSeek: DeepSeek V4 Pro vs MiniMax M2Cross-vendor
- GLM 5.1 vs MiniMax M2Cross-vendor
- MiniMax M2.7 vs Qwen3.5-35B-A3BCross-vendor
- GPT-5.2-Codex vs MiniMax M2.1Cross-vendor
- GPT-5.4 vs MiniMax M2Cross-vendor
- GPT-5.3-Codex vs MiniMax M2Cross-vendor
- GPT-5 Mini vs MiniMax M2.7Cross-vendor
- GPT-5.2 vs MiniMax M2.1Cross-vendor
+108 more in the sitemap
Xiaomi137
- GPT-5 Chat vs MiMo-V2-OmniCross-vendor
- MiMo-V2-Pro vs MiniMax M2.5Cross-vendor
- DeepSeek: DeepSeek V4 Flash vs MiMo-V2-OmniCross-vendor
- MiMo-V2-Omni vs Qwen3.5 397B A17BCross-vendor
- GPT-5 Codex vs MiMo-V2-OmniCross-vendor
- Claude Opus 4.1 vs MiMo-V2.5-ProCross-vendor
- Grok 4 vs MiMo-V2-ProCross-vendor
- GPT-5 Mini vs MiMo-V2-ProCross-vendor
- Claude Opus 4.6 vs MiMo-V2-OmniCross-vendor
- Kimi K2 Thinking vs MiMo-V2-ProCross-vendor
+127 more in the sitemap
FREQUENTLY ASKED
FAQ
How do I fairly compare two LLM models?
Pick the same axes for both sides and source each number from a third-party benchmark instead of a vendor announcement. Artificial Analysis publishes Intelligence Index, Coding Index, GPQA, HLE, TerminalBench Hard, and Tau-2 on a comparable scale. Add throughput (tokens per second) and time-to-first-token because they shape user-perceived quality, and pull per-million-token pricing from each vendor's own pricing page — including cache-read and cache-write when they're published. Every page in this directory follows that template so you can compare two pairs the same way.
Which benchmarks matter most for my use case?
It depends on the workload. Knowledge-heavy products lean on GPQA Diamond and Humanity's Last Exam. Coding agents care about Coding Index, TerminalBench Hard, and SWE-Bench Verified. Tool-using agents weight Tau-2 above raw reasoning. Retrieval and summarization care more about throughput and Long Context Recall than aggregate Intelligence. The use-cases hub at /use-cases scores each task with explicit criteria weights so you can borrow the methodology directly.
Are these prices the same as the upstream providers?
Yes. ElliotGate publishes the same per-token rates as Anthropic, OpenAI, Google, and the other upstream vendors charge directly. There is no gateway markup. Cache-read and cache-write rates are shown when the vendor publishes them. We refresh pricing snapshots whenever a vendor changes their public list.
Can I switch models without changing my code?
Yes. The OpenAI-compatible chat completions endpoint accepts every listed model — change the model slug, keep the rest of the request body the same. Most teams build a one-line router that picks per request based on input length, latency budget, or a feature flag. The Anthropic Messages endpoint is also supported for prompt caching workflows that depend on it.
How often is this comparison data updated?
Each comparison page carries its own publishedAt and reviewedAt date. We re-pull Artificial Analysis scores when AA refreshes its leaderboard and re-check vendor pricing pages quarterly at minimum. When a model is retired or significantly updated, the relevant comparison pages are reviewed within a week.
One key, every model on the list
Pick the comparison that fits your workload, run both sides behind one OpenAI-compatible API key, and ship without juggling vendor accounts.