LLM leaderboard

Compare large language models on performance, price, and more to find the best match for your needs.

Leaderboard
Largest context
  1. Gemini 2.5 Pro
  2. Gemini 2.0 Flash
  3. Gemini 2.0 Flash-Lite
Highest output tokens
  1. o1-pro
  2. o1
  3. o3-mini
Least expensive
  1. R1 Distill LLama 8B
  2. Ministral 3B
  3. Gemini 1.5 Flash-8B
Model comparison

| Model | Input price / 1M tokens | Output price / 1M tokens | Context window | Output token limit |
|---|---|---|---|---|
| Gemini 1.5 Flash-8B | $0.04 | $0.15 | 1,000,000 | 8,192 |
| Ministral 3B | $0.04 | $0.04 | 128,000 | 4,096 |
| R1 Distill Llama 8B | $0.04 | $0.04 | 128,000 | 8,000 |
| Qwen Turbo | $0.05 | $0.20 | 1,000,000 | 8,192 |
| GPT-5 Nano | $0.05 | $0.40 | 128,000 | 16,384 |
| Coder V2 Lite | $0.06 | $0.18 | 128,000 | 8,000 |
| Gemini 2.0 Flash-Lite | $0.07 | $0.30 | 1,000,000 | 8,192 |
| Gemini 1.5 Flash | $0.07 | $0.30 | 1,000,000 | 8,192 |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1,000,000 | 8,192 |
| Llama 3.1 8B | $0.10 | $0.10 | 128,000 | 2,048 |
| Ministral 8B | $0.10 | $0.10 | 128,000 | 4,096 |
| GPT-4.1 Nano | $0.10 | $0.40 | 128,000 | 16,384 |
| Gemma 2 9B | $0.12 | $0.15 | 8,000 | 8,192 |
| Coder V2 | $0.14 | $0.28 | 128,000 | 8,000 |
| GPT-4o mini | $0.15 | $0.60 | 128,000 | 16,384 |
| GPT-4o mini Audio | $0.15 | $0.60 | 128,000 | 16,384 |
| Gemma 2 27B | $0.17 | $0.51 | 8,000 | 8,192 |
| Mistral Saba | $0.20 | $0.60 | 32,000 | 4,096 |
| Grok 4 Fast | $0.20 | $0.50 | 128,000 | 8,192 |
| Claude 3 Haiku | $0.25 | $1.25 | 200,000 | 4,096 |
| GPT-5 Mini | $0.25 | $2.00 | 128,000 | 16,384 |
| V3 | $0.27 | $1.10 | 128,000 | 8,000 |
| Codestral | $0.30 | $0.90 | 128,000 | 4,096 |
| R1 Distill Qwen 32B | $0.30 | $0.30 | 128,000 | 8,000 |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1,000,000 | 64,000 |
| Grok 3 Mini | $0.30 | $0.50 | 128,000 | 8,192 |
| GPT-4.1 Mini | $0.40 | $1.60 | 128,000 | 16,384 |
| GPT-3.5 Turbo | $0.50 | $1.50 | 16,385 | 4,096 |
| Llama 2 Chat | $0.50 | $0.25 | 4,096 | 2,048 |
| QwQ 32B | $0.55 | $0.75 | 131,000 | 8,192 |
| DeepSeek Reasoner | $0.55 | $2.19 | 64,000 | 8,000 |
| Llama 3.3 70B | $0.59 | $0.70 | 128,000 | 2,048 |
| GPT-4o mini Realtime | $0.60 | $2.40 | 128,000 | 4,096 |
| Llama 3.2 | $0.60 | $0.60 | 128,000 | 2,048 |
| R1 Distill Llama 70B | $0.72 | $0.99 | 128,000 | 8,000 |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200,000 | 8,192 |
| Qwen 2.5 Coder 32B | $0.80 | $0.80 | 131,000 | 8,192 |
| R1 Distill Qwen 14B | $0.88 | $0.88 | 128,000 | 8,000 |
| Sonar Reasoning | $1.00 | $5.00 | 127,000 | N/A |
| Sonar | $1.00 | $1.00 | 127,000 | N/A |
| Claude 4.5 Haiku | $1.00 | $5.00 | 200,000 | 8,192 |
| o3-mini | $1.10 | $4.40 | 200,000 | 100,000 |
| o1-mini | $1.10 | $4.40 | 128,000 | 65,536 |
| o4-mini | $1.10 | $4.40 | 200,000 | 65,536 |
| Gemini 2.5 Pro | $1.25 | $10.00 | 2,000,000 | 64,000 |
| GPT-5 | $1.25 | $10.00 | 128,000 | 16,384 |
| GPT-5.1 | $1.25 | $10.00 | 128,000 | 16,384 |
| GPT-5.1 Codex | $1.25 | $10.00 | 128,000 | 16,384 |
| Qwen 2.5 Max | $1.60 | $6.40 | 32,000 | 8,192 |
| Mistral Large | $2.00 | $6.00 | 128,000 | 4,096 |
| Pixtral Large | $2.00 | $6.00 | 128,000 | 4,096 |
| Sonar Reasoning Pro | $2.00 | $8.00 | 128,000 | N/A |
| Sonar Deep Research | $2.00 | $8.00 | 200,000 | N/A |
| Gemini 3 Pro | $2.00 | $12.00 | 1,000,000 | 64,000 |
| GPT-4.1 | $2.00 | $8.00 | 128,000 | 16,384 |
| GPT-4o | $2.50 | $10.00 | 128,000 | 16,384 |
| GPT-4o Audio | $2.50 | $10.00 | 128,000 | 16,384 |
| Claude 3.7 Sonnet | $3.00 | $15.00 | 200,000 | 8,192 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200,000 | 8,192 |
| Sonar Pro | $3.00 | $15.00 | 200,000 | N/A |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200,000 | 8,192 |
| Grok 3 | $3.00 | $15.00 | 128,000 | 8,192 |
| Grok 4 | $3.00 | $15.00 | 128,000 | 8,192 |
| Llama 3.1 405B | $3.50 | $3.50 | 128,000 | 2,048 |
| GPT-4o Realtime | $5.00 | $20.00 | 128,000 | 4,096 |
| GPT-4 Turbo | $10.00 | $30.00 | 128,000 | 4,096 |
| o3 | $10.00 | $40.00 | 200,000 | 100,000 |
| o3 Deep Research | $10.00 | $40.00 | 200,000 | 100,000 |
| o1 | $15.00 | $60.00 | 200,000 | 100,000 |
| Claude 3 Opus | $15.00 | $75.00 | 200,000 | 4,096 |
| Claude Opus 4 | $15.00 | $75.00 | 200,000 | 4,096 |
| GPT-5 Pro | $15.00 | $120.00 | 128,000 | 16,384 |
| o3 Pro | $20.00 | $80.00 | 200,000 | 100,000 |
| GPT-4 | $30.00 | $60.00 | 8,192 | 8,192 |
| GPT-4.5 | $75.00 | $150.00 | 128,000 | 16,384 |
| o1-pro | $150.00 | $600.00 | 200,000 | 100,000 |
| Gemma 3 1B | N/A | N/A | 32,000 | 8,192 |
| Gemma 3 27B | N/A | N/A | 128,000 | 8,192 |
| Qwen 2.5 72B | N/A | N/A | 131,000 | 8,192 |
Key definitions
Price: Input price per 1M tokens is the cost of processing each token in the prompt sent to the model; output price per 1M tokens is the cost of each token the model generates in response. The price shown in the leaderboard section is a blended price, assuming a typical 3:1 ratio of input to output tokens. A price of 0 can indicate a limited free trial.
Context window: The maximum amount of text (tokens) the model can process at once, including both the input and the generated output. It determines how much prior conversation or document history the model can "remember" within a single interaction.
Output token limit: The maximum number of tokens an LLM can generate in a single response. This limit is constrained by the model's context window and by provider policies, and it caps the length of the output.
Reasoning model: A reasoning LLM goes beyond pattern recognition to perform logical inference and problem-solving. This covers tasks such as complex mathematics, planning, and generating "chain of thought" explanations that mimic human-like cognitive processes. Essentially, it aims to understand and solve problems, not just reproduce text.
Open source: Some LLMs are published under an open-source license, allowing developers to access and modify the code and to host the models themselves, on premises or in the cloud. Others, such as Mistral's, are available for self-hosting under a commercial license.

LLM Leaderboard FAQ

What are Large Language Models (LLMs)?

Large Language Models (LLMs) are AI systems trained on massive amounts of text data. They learn patterns, relationships, and structures in language, allowing them to generate human-like text, translate languages, answer questions, and perform many other language-based tasks. They typically use neural networks, most often transformer architectures. Modern LLMs can handle a wide range of tasks, including writing, translation, summarization, question answering, code generation, and reasoning.

How do LLMs work?

LLMs are large neural networks, typically transformer architectures, that process and generate text. During training, the network learns to identify patterns and relationships within massive text datasets by accomplishing tasks such as predicting the next word in a sequence. When prompted, it uses its learned parameters to generate coherent, relevant text for the provided context.
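
At its core, generation is repeated next-token prediction: the network assigns a score (logit) to every vocabulary token, the scores are normalized into probabilities, and a token is emitted. A toy sketch, where the four-word vocabulary and the logit values are made up for illustration:

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Pretend the model just read "The cat sat on the" and scored each candidate.
vocab = ["mat", "dog", "moon", "chair"]
logits = [3.1, 0.2, -1.0, 1.5]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy decoding picks "mat"
print(next_token)
```

Real models repeat this step token by token, appending each emitted token to the context before predicting the next one.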

How are LLMs trained?

LLM training is a two-stage process: pre-training and fine-tuning. During pre-training, the model learns general language patterns from raw text. During fine-tuning, it learns to accomplish a variety of tasks, such as answering questions, summarizing text, generating code, and identifying entities, and, importantly, to follow human instructions.

What can LLMs do?

LLMs can solve a wide range of natural language understanding and generation tasks. They can generate content, from business emails to children's stories, extract information from the provided context, summarize and analyze text, and translate between languages.

How can you access LLMs?

LLMs can be accessed in several ways: directly through a provider's console or API, by self-hosting, or through cloud providers' platforms.

Using the provider console: ChatGPT, Gemini AI Studio, and Claude let you chat with their state-of-the-art models under free or paid plans.

APIs: Almost all providers (OpenAI, Gemini, Claude, etc.) offer their LLMs as a paid service via APIs. This is the easiest way to integrate AI capabilities into your own applications.

Self-hosting: Research teams have released open-source LLMs (DeepSeek-R1, Gemma 3, Llama 3.3, Mistral, Phi-4) that you can download and run on a local machine or on cloud instances. This gives more control over the model, but may require significant computational resources and technical expertise. That said, almost all open-source families include smaller variants (with fewer parameters) distilled from the larger models, such as DeepSeek-R1 1.5B and Llama 3 8B. These can run on local machines with decent specifications, sometimes even without GPUs. You can interact with them on the command line using tools like llama.cpp and Ollama, or through a web interface such as Open WebUI or text-generation-webui.

Cloud-based platforms: Cloud providers (e.g., AWS, Azure, GCP) offer managed services that let you deploy and run LLMs on their infrastructure, balancing control and ease of use.
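
Whichever route you choose, most APIs accept a similar chat-style request shape. A minimal sketch that only assembles the JSON payload (the model name here is a placeholder; a real call would POST this to the provider's endpoint with your API key):

```python
import json

def build_chat_request(model, user_message, max_tokens=256):
    """Assemble an OpenAI-style chat-completions payload. No network call is made."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,  # caps the number of output tokens billed
    }

payload = build_chat_request("example-model", "Summarize this leaderboard.")
print(json.dumps(payload, indent=2))
```

Self-hosted servers such as Ollama and many cloud endpoints expose compatible request shapes, which is why switching providers often only means changing the URL and model name.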

How much do LLMs cost?

It depends on the deployment method and expected usage.

Using LLMs through provider APIs incurs usage-based costs, billed by token consumption.

Self-hosting open-source LLMs may require substantial computational resources, particularly GPUs. This approach offers greater control and privacy, but the GPUs need to be highly utilized for it to be cheaper than using provider APIs.

Cloud providers offer managed LLM services that balance cost and control. Pricing models vary, but often involve a combination of compute and usage-based charges.
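
A back-of-the-envelope comparison makes the API-versus-self-hosting trade-off concrete. All numbers below are illustrative assumptions, not real quotes:

```python
def api_monthly_cost(tokens_in, tokens_out, in_price, out_price):
    """Monthly API cost, with prices given per 1M tokens."""
    return (tokens_in * in_price + tokens_out * out_price) / 1_000_000

def self_host_monthly_cost(gpu_hourly_rate, hours=730):
    """Monthly cost of renting a GPU instance around the clock (~730 hours)."""
    return gpu_hourly_rate * hours

# Assumed workload: 200M input and 50M output tokens per month
# at $0.50 / $1.50 per 1M tokens, versus an assumed $1.20/hour GPU instance.
api = api_monthly_cost(200_000_000, 50_000_000, 0.50, 1.50)
gpu = self_host_monthly_cost(1.20)
print(f"API: ${api:,.2f}/month, self-hosted GPU: ${gpu:,.2f}/month")
```

Under these assumptions the API is cheaper; self-hosting only wins when the GPU is kept busy with a much larger token volume, which is the utilization point made above.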

How do LLMs compare to traditional NLP techniques?

LLMs represent a significant advance over traditional Natural Language Processing (NLP) techniques. They generalize better, which lets them solve very diverse tasks effectively. Being trained on large text corpora, they embed strong priors about natural language and produce human-like text. Finally, their computations map well onto GPUs, which are highly parallel processors.

What is the difference between GPT and an LLM?

GPT (Generative Pre-trained Transformer) is a specific family of large language models developed by OpenAI, whereas LLM is the generic term for any large language model. GPT designates a particular implementation and architecture, so every GPT model is an LLM, but not every LLM is a GPT.

What do model sizes like 7B or 70B mean?

These figures are the number of parameters in the LLM, in billions. LLMs are neural networks built from matrices; the parameter count is the total number of elements in those matrices. Parameters are learned during training and then used to make predictions. The more parameters, the more complex the patterns the model can learn, and in most cases parameter count correlates with performance. However, more parameters also require more computational resources to run the model, essentially GPU memory and bandwidth.
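
As a rough rule of thumb, the weights alone need about parameters times bytes-per-parameter of memory. The sketch below ignores activation and KV-cache memory, so treat it as a lower bound:

```python
def weight_memory_gb(params_billions, bytes_per_param=2):
    """Approximate memory for the model weights alone.
    bytes_per_param: 2 for fp16/bf16, 1 for 8-bit, 0.5 for 4-bit quantization."""
    return params_billions * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(8))       # an 8B model in fp16: 16.0 GB
print(weight_memory_gb(8, 0.5))  # the same model 4-bit quantized: 4.0 GB
```

This is why quantized small models fit on consumer hardware while frontier-scale models need multi-GPU servers.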


Are LLMs multimodal?

While earlier LLMs were trained primarily on vast amounts of text, newer models are also trained on non-textual data such as images, audio, and video, thanks to specialized encoders that act as translators. These encoders convert images or audio into numerical representations (vector embeddings) that the LLM can process. On the output side, corresponding decoders generate text, and may also generate images and audio. Modern models can therefore support multiple modalities and combine them to solve complex tasks.