LLM leaderboard
Compare large language models on performance, price, and more to find the best match for your needs.
Model comparison
| Model | Input price / 1M tokens | Output price / 1M tokens | Context window (tokens) | Output token limit (tokens) | Reasoning model | Open source |
|---|---|---|---|---|---|---|
| Gemini 1.5 Flash-8B | $0.04 | $0.15 | 1,000,000 | 8,192 | | |
| Ministral 3B | $0.04 | $0.04 | 128,000 | 4,096 | | |
| R1 Distill Llama 8B | $0.04 | $0.04 | 128,000 | 8,000 | | |
| Qwen Turbo | $0.05 | $0.20 | 1,000,000 | 8,192 | | |
| GPT-5 Nano | $0.05 | $0.40 | 128,000 | 16,384 | | |
| Coder V2 Lite | $0.06 | $0.18 | 128,000 | 8,000 | | |
| Gemini 2.0 Flash-Lite | $0.07 | $0.30 | 1,000,000 | 8,192 | | |
| Gemini 1.5 Flash | $0.07 | $0.30 | 1,000,000 | 8,192 | | |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1,000,000 | 8,192 | | |
| Llama 3.1 8B | $0.10 | $0.10 | 128,000 | 2,048 | | |
| Ministral 8B | $0.10 | $0.10 | 128,000 | 4,096 | | |
| GPT-4.1 Nano | $0.10 | $0.40 | 128,000 | 16,384 | | |
| Gemma 2 9B | $0.12 | $0.15 | 8,000 | 8,192 | | |
| Coder V2 | $0.14 | $0.28 | 128,000 | 8,000 | | |
| GPT-4o mini | $0.15 | $0.60 | 128,000 | 16,384 | | |
| GPT-4o mini Audio | $0.15 | $0.60 | 128,000 | 16,384 | | |
| Gemma 2 27B | $0.17 | $0.51 | 8,000 | 8,192 | | |
| Mistral Saba | $0.20 | $0.60 | 32,000 | 4,096 | | |
| Grok 4 Fast | $0.20 | $0.50 | 128,000 | 8,192 | | |
| Claude 3 Haiku | $0.25 | $1.25 | 200,000 | 4,096 | | |
| GPT-5 Mini | $0.25 | $2.00 | 128,000 | 16,384 | | |
| V3 | $0.27 | $1.10 | 128,000 | 8,000 | | |
| Codestral | $0.30 | $0.90 | 128,000 | 4,096 | | |
| R1 Distill Qwen 32B | $0.30 | $0.30 | 128,000 | 8,000 | | |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1,000,000 | 64,000 | | |
| Grok 3 Mini | $0.30 | $0.50 | 128,000 | 8,192 | | |
| GPT-4.1 Mini | $0.40 | $1.60 | 128,000 | 16,384 | | |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16,385 | 4,096 | | |
| Llama 2 Chat | $0.50 | $0.25 | 4,096 | 2,048 | | |
| QwQ 32B | $0.55 | $0.75 | 131,000 | 8,192 | | |
| DeepSeek Reasoner | $0.55 | $2.19 | 64,000 | 8,000 | | |
| Llama 3.3 70B | $0.59 | $0.70 | 128,000 | 2,048 | | |
| GPT-4o mini Realtime | $0.60 | $2.40 | 128,000 | 4,096 | | |
| Llama 3.2 | $0.60 | $0.60 | 128,000 | 2,048 | | |
| R1 Distill Llama 70B | $0.72 | $0.99 | 128,000 | 8,000 | | |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200,000 | 8,192 | | |
| Qwen 2.5 Coder 32B | $0.80 | $0.80 | 131,000 | 8,192 | | |
| R1 Distill Qwen 14B | $0.88 | $0.88 | 128,000 | 8,000 | | |
| Sonar Reasoning | $1.00 | $5.00 | 127,000 | N/A | | |
| Sonar | $1.00 | $1.00 | 127,000 | N/A | | |
| Claude 4.5 Haiku | $1.00 | $5.00 | 200,000 | 8,192 | | |
| o3-mini | $1.10 | $4.40 | 200,000 | 100,000 | | |
| o1-mini | $1.10 | $4.40 | 128,000 | 65,536 | | |
| o4-mini | $1.10 | $4.40 | 200,000 | 65,536 | | |
| Gemini 2.5 Pro | $1.25 | $10.00 | 2,000,000 | 64,000 | | |
| GPT-5 | $1.25 | $10.00 | 128,000 | 16,384 | | |
| GPT-5.1 | $1.25 | $10.00 | 128,000 | 16,384 | | |
| GPT-5.1 Codex | $1.25 | $10.00 | 128,000 | 16,384 | | |
| Qwen 2.5 Max | $1.60 | $6.40 | 32,000 | 8,192 | | |
| Mistral Large | $2.00 | $6.00 | 128,000 | 4,096 | | |
| Pixtral Large | $2.00 | $6.00 | 128,000 | 4,096 | | |
| Sonar Reasoning Pro | $2.00 | $8.00 | 128,000 | N/A | | |
| Sonar Deep Research | $2.00 | $8.00 | 200,000 | N/A | | |
| Gemini 3 Pro | $2.00 | $12.00 | 1,000,000 | 64,000 | | |
| GPT-4.1 | $2.00 | $8.00 | 128,000 | 16,384 | | |
| GPT-4o | $2.50 | $10.00 | 128,000 | 16,384 | | |
| GPT-4o Audio | $2.50 | $10.00 | 128,000 | 16,384 | | |
| Claude 3.7 Sonnet | $3.00 | $15.00 | 200,000 | 8,192 | | |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200,000 | 8,192 | | |
| Sonar Pro | $3.00 | $15.00 | 200,000 | N/A | | |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200,000 | 8,192 | | |
| Grok 3 | $3.00 | $15.00 | 128,000 | 8,192 | | |
| Grok 4 | $3.00 | $15.00 | 128,000 | 8,192 | | |
| Llama 3.1 405B | $3.50 | $3.50 | 128,000 | 2,048 | | |
| GPT-4o Realtime | $5.00 | $20.00 | 128,000 | 4,096 | | |
| GPT-4 Turbo | $10.00 | $30.00 | 128,000 | 4,096 | | |
| o3 | $10.00 | $40.00 | 200,000 | 100,000 | | |
| o3 Deep Research | $10.00 | $40.00 | 200,000 | 100,000 | | |
| o1 | $15.00 | $60.00 | 200,000 | 100,000 | | |
| Claude 3 Opus | $15.00 | $75.00 | 200,000 | 4,096 | | |
| Claude Opus 4 | $15.00 | $75.00 | 200,000 | 4,096 | | |
| GPT-5 Pro | $15.00 | $120.00 | 128,000 | 16,384 | | |
| o3 Pro | $20.00 | $80.00 | 200,000 | 100,000 | | |
| GPT-4 | $30.00 | $60.00 | 8,192 | 8,192 | | |
| GPT-4.5 | $75.00 | $150.00 | 128,000 | 16,384 | | |
| o1-pro | $150.00 | $600.00 | 200,000 | 100,000 | | |
| Gemma 3 1B | N/A | N/A | 32,000 | 8,192 | | |
| Gemma 3 27B | N/A | N/A | 128,000 | 8,192 | | |
| Qwen 2.5 72B | N/A | N/A | 131,000 | 8,192 | | |
LLM Leaderboard FAQ
What is a large language model (LLM)?
Large Language Models (LLMs) are AI systems trained on massive amounts of text data. They learn the patterns, relationships, and structures of language, allowing them to generate human-like text, translate between languages, answer questions, and perform many other language-based tasks. They typically use neural networks, most often transformer architectures. Modern LLMs can handle a wide range of tasks, including writing, translation, summarization, question answering, code generation, and reasoning.
How do LLMs work?
LLMs are large neural networks, typically transformer architectures, that process and generate text. During training, the network learns to identify patterns and relationships within massive text datasets through tasks such as predicting the next word in a sequence. When prompted, it uses its learned parameters to generate coherent, relevant text for the given context.
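The next-word-prediction objective can be illustrated with a toy bigram model, a drastically simplified stand-in for a transformer: count which word follows which in a training corpus, then predict the most frequent successor.

```python
from collections import Counter, defaultdict

# Toy "training corpus"
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigram frequencies: how often each word follows another
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent successor seen during 'training'."""
    return successors[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" -- it follows "the" twice, vs. once for "mat"/"fish"
```

A real LLM does the same thing in spirit, but predicts a probability distribution over the next token using billions of learned parameters rather than raw counts.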
How are LLMs trained?
LLM training is typically a two-stage process: pre-training and fine-tuning. During pre-training, the model learns general language patterns from large text corpora. During fine-tuning, it learns to accomplish specific tasks accurately, such as answering questions, summarizing text, generating code, identifying entities and, importantly, following human instructions.
What can LLMs do?
LLMs can solve a wide range of natural language understanding and generation tasks. They can generate content, from business emails to children's stories. They can also extract information from provided context, summarize and analyze text, and translate between languages.
How can I access LLMs?
LLMs can be accessed in several ways: directly through a provider's console or API, by self-hosting an open model, or through managed cloud platforms.
- Provider consoles: ChatGPT, Google AI Studio (for Gemini), and Claude let you chat with state-of-the-art models, with both free and paid tiers.
- APIs: Almost all providers (OpenAI, Google, Anthropic, etc.) offer paid API access to their LLMs. This is the easiest way to integrate AI capabilities into your own applications.
- Self-hosting: Research teams have released open-weight LLMs (DeepSeek-R1, Gemma 3, Llama 3.3, Mistral, Phi-4) that you can download and run on a local machine or cloud instance. This gives more control over the model but may require significant computational resources and technical expertise. Most open model families also ship smaller variants (with fewer parameters), often distilled from the larger models, such as the 1.5B DeepSeek-R1 distillation or Llama 3.1 8B. These run more easily on well-specified local machines, sometimes even without a GPU. You can interact with them from the command line using tools like llama.cpp and Ollama, or through a web interface such as Open WebUI or text-generation-webui.
- Cloud platforms: Cloud providers (e.g., AWS, Azure, GCP) offer managed services for deploying and running LLMs on their infrastructure, providing a balance between control and ease of use.
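As a sketch of the API route: most providers expose an OpenAI-style chat-completions endpoint, and a request is just a JSON document. The endpoint URL, API key, and model name below are illustrative placeholders; the actual network call is shown commented out.

```python
import json

# Placeholder endpoint -- substitute your provider's real URL and API key.
API_URL = "https://api.example.com/v1/chat/completions"

payload = {
    "model": "gpt-4o-mini",  # any model name your provider accepts
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this email in one sentence."},
    ],
    "max_tokens": 256,  # cap on output tokens, which are billed separately
}

body = json.dumps(payload)

# To actually send it (requires a valid key and network access):
# import urllib.request
# req = urllib.request.Request(
#     API_URL, data=body.encode(),
#     headers={"Authorization": "Bearer YOUR_KEY",
#              "Content-Type": "application/json"})
# response = urllib.request.urlopen(req)
```

The response contains the generated message plus a usage report (input and output token counts), which is what the per-million-token prices in the table above are applied to.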
How much does it cost to use LLMs?
It depends on the deployment method and expected usage.
- Provider APIs charge based on usage, billed by token consumption (see the per-million-token prices in the table above).
- Self-hosting open-weight LLMs may require substantial computational resources, particularly GPUs. This approach offers greater control and privacy, but the GPUs must be kept highly utilized for it to be cheaper than provider APIs.
- Managed cloud LLM services balance cost and control. Pricing models vary, but often combine compute and usage-based charges.
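Token-based billing is simple arithmetic. For example, using GPT-4o mini's rates from the table above ($0.15 input / $0.60 output per million tokens):

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost in dollars for one request under per-million-token pricing."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# A request with a 2,000-token prompt and a 500-token reply on GPT-4o mini:
cost = request_cost(2_000, 500, 0.15, 0.60)
print(f"${cost:.6f}")  # $0.000600
```

Note that output tokens are usually several times more expensive than input tokens, so long generations (especially from reasoning models that emit hidden "thinking" tokens) dominate the bill.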
How do LLMs compare to traditional NLP techniques?
LLMs represent a significant advance over traditional Natural Language Processing (NLP) techniques. They generalize better, which lets a single model solve very diverse tasks effectively. Because they are trained on large text corpora, they embed strong priors about natural language and produce human-like text. Finally, their transformer architecture parallelizes well, making efficient use of GPUs.
What is the difference between GPT and an LLM?
GPT (Generative Pre-trained Transformer) is a specific family of large language models developed by OpenAI, whereas LLM is the generic term for any large language model. Every GPT model is an LLM, but not every LLM is a GPT.
What do model sizes like 8B or 70B mean?
These numbers give the parameter count of the LLM, in billions. LLMs are neural networks built from matrices, and the parameter count is the total number of elements in those matrices. Parameters are learned during training and then used to make predictions. The more parameters a model has, the more complex the patterns it can learn, and parameter count generally correlates with performance. However, more parameters also demand more computational resources to run the model, essentially GPU memory and bandwidth.
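The memory implication can be estimated directly: each parameter stored at 16-bit precision occupies 2 bytes, so an 8-billion-parameter model needs roughly 16 GB just for its weights, before counting activations or the KV cache.

```python
def weight_memory_gb(num_params_billions, bytes_per_param=2):
    """Approximate weight storage in GB (default: 2 bytes/param, i.e. fp16/bf16)."""
    return num_params_billions * bytes_per_param

print(weight_memory_gb(8))    # 16  -- fits on a single consumer GPU
print(weight_memory_gb(405))  # 810 -- requires a multi-GPU cluster
print(weight_memory_gb(8, bytes_per_param=0.5))  # 4.0 -- 4-bit quantization
```

This is why quantization (storing weights in 4 or 8 bits) is popular for self-hosting: it shrinks the memory footprint several-fold at a modest cost in quality.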
Are LLMs limited to text?
While earlier LLMs were trained primarily on vast amounts of text, newer models are also trained on non-textual data such as images, audio, and video, thanks to specialized encoders that act as translators. These encoders convert images or audio into numerical representations (vector embeddings) that the LLM can process. On the output side, corresponding decoders generate text, and in some models images and audio as well. Modern models can therefore support multiple modalities, combining them to solve complex tasks.
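The encoder idea can be sketched in miniature: an encoder is just a function mapping raw input to a fixed-length vector. The toy example below "encodes" a grayscale image into a 4-number embedding by averaging its quadrants; real multimodal encoders are deep neural networks, but the interface (pixels in, vector out) is the same.

```python
def toy_image_encoder(image):
    """Map a 2D grayscale image (list of rows) to a 4-dim embedding
    by averaging each quadrant. A stand-in for a real vision encoder."""
    h, w = len(image), len(image[0])
    hh, hw = h // 2, w // 2
    quadrants = [
        [row[:hw] for row in image[:hh]],  # top-left
        [row[hw:] for row in image[:hh]],  # top-right
        [row[:hw] for row in image[hh:]],  # bottom-left
        [row[hw:] for row in image[hh:]],  # bottom-right
    ]
    return [sum(sum(row) for row in q) / (hh * hw) for q in quadrants]

image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [5, 5, 1, 1],
    [5, 5, 1, 1],
]
print(toy_image_encoder(image))  # [0.0, 9.0, 5.0, 1.0]
```

In a real multimodal model, the resulting vector is projected into the same embedding space as text tokens, so the LLM can attend over image and text content together.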