Groq

Updated 1 hour ago

Groq is an AI inference company, founded in 2016, that developed the LPU (Language Processing Unit), which it describes as the first chip purpose-built for AI inference. Its proprietary LPU inference engine targets ultra-low-latency inference: published benchmarks show Llama 2 70B running at 300 tokens per second, reportedly 10x faster than NVIDIA H100 clusters and up to 18x faster than other cloud inference providers on Anyscale's LLMPerf Leaderboard. Groq focuses on making AI inference fast and affordable at scale, offering both cloud services and on-premises deployment, with the aim of enabling real-time AI applications that latency constraints previously made impractical.


Browse the 17 LLMs available from Groq and compare their prices and features.

Models (17)

Organization	Model Name	Model ID	Input (USD per 1M tokens)	Output (USD per 1M tokens)
OpenAI	GPT OSS 120B	`openai/gpt-oss-120b`	$0.15	$0.60
Moonshot AI	Kimi K2 Instruct 0905	`moonshotai/kimi-k2-instruct-0905`	$1.00	$3.00
Moonshot AI	Kimi K2 Instruct	`moonshotai/kimi-k2-instruct`	$1.00	$3.00
OpenAI	GPT OSS 20B	`openai/gpt-oss-20b`	$0.08	$0.30
DeepSeek	DeepSeek R1 Distill Llama 70B	`deepseek-r1-distill-llama-70b`	$0.75	$0.99
Groq	Llama 3.1 8B Instant	`llama-3.1-8b-instant`	$0.05	$0.08
Groq	Mistral Saba 24B	`mistral-saba-24b`	$0.79	$0.79
Groq	Llama 3 8B	`llama3-8b-8192`	$0.05	$0.08
Groq	Qwen QwQ 32B	`qwen-qwq-32b`	$0.29	$0.39
Groq	Llama 3 70B	`llama3-70b-8192`	$0.59	$0.79
Groq	Llama Guard 3 8B	`llama-guard-3-8b`	$0.20	$0.20
Google	Gemma 2 9B	`gemma2-9b-it`	$0.20	$0.20
Groq	Llama 3.3 70B Versatile	`llama-3.3-70b-versatile`	$0.59	$0.79
Qwen	Qwen3 32B	`qwen/qwen3-32b`	$0.29	$0.59
Meta	Llama 4 Scout 17B 16E Instruct	`meta-llama/llama-4-scout-17b-16e-instruct`	$0.11	$0.34
Meta	Llama 4 Maverick 17B 128E Instruct	`meta-llama/llama-4-maverick-17b-128e-instruct`	$0.20	$0.60
Groq	Llama Guard 4 12B	`meta-llama/llama-guard-4-12b`	$0.20	$0.20
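
To compare models on a concrete workload, the listed prices can be turned into a rough cost estimate. The sketch below is illustrative only: it assumes the Input and Output prices are USD per 1 million tokens, and the `PRICES` mapping and `estimate_cost` helper are hypothetical names introduced here, not part of any Groq SDK. Only three rows from the table are copied in; add others as needed.

```python
from decimal import Decimal

# (input price, output price) copied from the table above.
# Assumption: prices are USD per 1 million tokens.
PRICES = {
    "llama-3.1-8b-instant": (Decimal("0.05"), Decimal("0.08")),
    "llama-3.3-70b-versatile": (Decimal("0.59"), Decimal("0.79")),
    "openai/gpt-oss-120b": (Decimal("0.15"), Decimal("0.60")),
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a workload for one model (hypothetical helper)."""
    input_price, output_price = PRICES[model_id]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_PRICE_UNIT


# Example workload: 2M input tokens and 500k output tokens.
for model_id in PRICES:
    cost = estimate_cost(model_id, 2_000_000, 500_000)
    print(f"{model_id}: ${cost:.3f}")
```

Decimal arithmetic is used instead of floats so that small per-token prices do not accumulate rounding error when summed over large token counts.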
