Groq Models
Groq runs inference on its custom LPU (Language Processing Unit) hardware, which is built for high token-generation speed and low latency. It hosts popular open-weight models such as Llama, Gemma, and Qwen.
Models Available: 11
Cheapest Input / 1M tokens: $0.050
Largest Context: 262K tokens
What is Groq?
Groq is an AI model provider offering 11 large language models for developers. Its cheapest model starts at $0.050 per 1M input tokens, and its largest context window reaches 262K tokens.
All Groq Models
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Context (tokens) | Max Output (tokens) | Released |
|---|---|---|---|---|---|
| Llama 3.1 8b Instant | $0.050 | $0.080 | 128K | 8,192 | — |
| Gemma 7b It | $0.050 | $0.080 | 8K | 8,192 | — |
| Openai/Gpt Oss 20b | $0.075 | $0.30 | 131K | 32,768 | — |
| Openai/Gpt Oss Safeguard 20b | $0.075 | $0.30 | 131K | 65,536 | — |
| Meta Llama/Llama 4 Scout 17b 16e Instruct | $0.11 | $0.34 | 131K | 8,192 | — |
| Openai/Gpt Oss 120b | $0.15 | $0.60 | 131K | 32,766 | — |
| Meta Llama/Llama Guard 4 12b | $0.20 | $0.20 | 8K | 8,192 | — |
| Meta Llama/Llama 4 Maverick 17b 128e Instruct | $0.20 | $0.60 | 131K | 8,192 | — |
| Qwen/Qwen3 32b | $0.29 | $0.59 | 131K | 131,000 | — |
| Llama 3.3 70b Versatile | $0.59 | $0.79 | 128K | 32,768 | — |
| Moonshotai/Kimi K2 Instruct 0905 | $1.00 | $3.00 | 262K | 16,384 | — |
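As a rough illustration of how the per-million rates in the table translate into a bill, the sketch below applies cost = (input tokens / 1M) × input rate + (output tokens / 1M) × output rate. The function name `request_cost` and the token counts are hypothetical; the rates are copied from the table above.

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Return the USD cost of a workload given per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_rate + (output_tokens / 1_000_000) * output_rate

# Hypothetical workload: 200,000 input tokens and 50,000 output tokens.
# Rates (USD per 1M tokens) are copied from the table above.
llama_8b = request_cost(200_000, 50_000, input_rate=0.050, output_rate=0.080)
kimi_k2 = request_cost(200_000, 50_000, input_rate=1.00, output_rate=3.00)

print(f"Llama 3.1 8b Instant: ${llama_8b:.4f}")
print(f"Moonshotai/Kimi K2 Instruct 0905: ${kimi_k2:.4f}")
```

With these token counts the workload comes to about $0.014 on the cheapest model and $0.35 on the most expensive one, a 25× difference.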
Model Details
Llama 3.1 8b Instant
Llama 3.1 8b Instant is available via Groq with a 128K context window and up to 8,192 output tokens. Pricing: $0.0500/1M input tokens, $0.0800/1M output tokens.
Gemma 7b It
Gemma 7b It is available via Groq with an 8K context window and up to 8,192 output tokens. Pricing: $0.0500/1M input tokens, $0.0800/1M output tokens.
Openai/Gpt Oss 20b
Openai/Gpt Oss 20b is available via Groq with a 131K context window and up to 32,768 output tokens. Pricing: $0.0750/1M input tokens, $0.3000/1M output tokens.
Openai/Gpt Oss Safeguard 20b
Openai/Gpt Oss Safeguard 20b is available via Groq with a 131K context window and up to 65,536 output tokens. Pricing: $0.0750/1M input tokens, $0.3000/1M output tokens.
Meta Llama/Llama 4 Scout 17b 16e Instruct
Meta Llama/Llama 4 Scout 17b 16e Instruct is available via Groq with a 131K context window and up to 8,192 output tokens. Pricing: $0.1100/1M input tokens, $0.3400/1M output tokens.
Openai/Gpt Oss 120b
Openai/Gpt Oss 120b is available via Groq with a 131K context window and up to 32,766 output tokens. Pricing: $0.1500/1M input tokens, $0.6000/1M output tokens.
Meta Llama/Llama Guard 4 12b
Meta Llama/Llama Guard 4 12b is available via Groq with an 8K context window and up to 8,192 output tokens. Pricing: $0.2000/1M input tokens, $0.2000/1M output tokens.
Meta Llama/Llama 4 Maverick 17b 128e Instruct
Meta Llama/Llama 4 Maverick 17b 128e Instruct is available via Groq with a 131K context window and up to 8,192 output tokens. Pricing: $0.2000/1M input tokens, $0.6000/1M output tokens.
Qwen/Qwen3 32b
Qwen/Qwen3 32b is available via Groq with a 131K context window and up to 131,000 output tokens. Pricing: $0.2900/1M input tokens, $0.5900/1M output tokens.
Llama 3.3 70b Versatile
Llama 3.3 70b Versatile is available via Groq with a 128K context window and up to 32,768 output tokens. Pricing: $0.5900/1M input tokens, $0.7900/1M output tokens.
Moonshotai/Kimi K2 Instruct 0905
Moonshotai/Kimi K2 Instruct 0905 is available via Groq with a 262K context window and up to 16,384 output tokens. Pricing: $1.00/1M input tokens, $3.00/1M output tokens.
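The context window and maximum output figures above jointly bound how long a response can be. The sketch below assumes that prompt and completion tokens share the context window, which is common for chat-completion APIs but is not stated on this page. `max_completion_budget` is a hypothetical helper, and the limits use the rounded figures listed above (128K is taken as 128,000).

```python
def max_completion_budget(prompt_tokens, context_window, max_output):
    """Largest completion length that fits, or 0 if the prompt alone overflows.

    Assumes prompt and completion tokens share the context window.
    """
    remaining = context_window - prompt_tokens
    return max(0, min(max_output, remaining))

# Llama 3.3 70b Versatile: 128K context (taken as 128,000), 32,768 max output.
print(max_completion_budget(10_000, 128_000, 32_768))   # 32768: capped by max output
print(max_completion_budget(120_000, 128_000, 32_768))  # 8000: capped by remaining context
print(max_completion_budget(130_000, 128_000, 32_768))  # 0: the prompt does not fit
```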
Compare Groq model pricing
Use our pricing calculator to find the cheapest Groq model for your workload.
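A minimal sketch of what such a calculator might compute, not the site's actual implementation: it drops models whose limits the request would exceed, then ranks the rest by estimated cost. The `MODELS` list transcribes the table on this page, with context sizes taken as the rounded values shown (131K as 131,000). `cheapest_models` and the sample workload are hypothetical.

```python
# (name, input $/1M, output $/1M, context tokens, max output tokens)
MODELS = [
    ("Llama 3.1 8b Instant", 0.050, 0.080, 128_000, 8_192),
    ("Gemma 7b It", 0.050, 0.080, 8_000, 8_192),
    ("Openai/Gpt Oss 20b", 0.075, 0.30, 131_000, 32_768),
    ("Openai/Gpt Oss Safeguard 20b", 0.075, 0.30, 131_000, 65_536),
    ("Meta Llama/Llama 4 Scout 17b 16e Instruct", 0.11, 0.34, 131_000, 8_192),
    ("Openai/Gpt Oss 120b", 0.15, 0.60, 131_000, 32_766),
    ("Meta Llama/Llama Guard 4 12b", 0.20, 0.20, 8_000, 8_192),
    ("Meta Llama/Llama 4 Maverick 17b 128e Instruct", 0.20, 0.60, 131_000, 8_192),
    ("Qwen/Qwen3 32b", 0.29, 0.59, 131_000, 131_000),
    ("Llama 3.3 70b Versatile", 0.59, 0.79, 128_000, 32_768),
    ("Moonshotai/Kimi K2 Instruct 0905", 1.00, 3.00, 262_000, 16_384),
]

def cheapest_models(prompt_tokens, output_tokens, requests=1):
    """Rank models that can handle the request by total estimated USD cost."""
    ranked = []
    for name, in_rate, out_rate, context, max_out in MODELS:
        # Skip models whose limits the request would exceed.
        # Assumes prompt and completion tokens share the context window.
        if output_tokens > max_out or prompt_tokens + output_tokens > context:
            continue
        cost = requests * (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        ranked.append((cost, name))
    return sorted(ranked)

# Hypothetical workload: 1,000 requests of 100,000 prompt and 10,000 output tokens each.
for cost, name in cheapest_models(100_000, 10_000, requests=1_000)[:3]:
    print(f"{name}: ${cost:.2f}")
```

The ranking is purely price-based. It does not consider output quality, speed, or intended use, so a model such as Llama Guard, which is a safety classifier rather than a general chat model, can rank well on price without being suitable for the task.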