# Fireworks AI

Fireworks AI is a high-performance AI inference platform that provides fast, affordable access to over 200 open-source and proprietary AI models. The platform specializes in production-grade inference with ultra-low latency, offering models including Llama, Qwen, DeepSeek, Mistral, Google Gemma, FLUX image models, and more. Fireworks offers serverless deployment, custom fine-tuning, and competitive per-1M-token pricing. The platform is known for its speed and reliability, with models available through OpenAI-compatible APIs and dedicated instances for enterprise workloads.

## Provider Information

- **Website**:
- **Available Models**: 19

## Models

| Name | Original Name | $ Input Price (per 1M) | $ Output Price (per 1M) | Free | Link |
|------|---------------|------------------------|-------------------------|------|------|
| FLUX.1 Kontext Pro | flux-kontext-pro | 0.04 | | | |
| OpenAI gpt-oss-20b | gpt-oss-20b | 0.07 | 0.30 | | |
| OpenAI gpt-oss-120b | gpt-oss-120b | 0.15 | 0.60 | | |
| FLUX.1 Kontext Max | flux-kontext-max | 0.08 | | | |
| DeepSeek V3.1 | deepseek-v3p1 | 0.56 | 1.68 | | |
| DeepSeek V3.2 | deepseek-v3p2 | 0.56 | 1.68 | | |
| FLUX.1 [dev] FP8 | flux-1-dev-fp8 | 0.00 | | | |
| GLM-4.7 | glm-4p7 | 0.60 | 2.20 | | |
| Kimi K2 Instruct 0905 | kimi-k2-instruct-0905 | 0.60 | 2.50 | | |
| Kimi K2 Thinking | kimi-k2-thinking | 0.60 | 2.50 | | |
| MiniMax-M2.1 | minimax-m2p1 | 0.30 | 1.20 | | |
| Qwen3 VL 30B A3B Instruct | qwen3-vl-30b-a3b-instruct | 0.15 | 0.60 | | |
| Kimi K2.5 | kimi-k2p5 | 0.60 | 3.00 | | |
| MiniMax-M2.5 | minimax-m2p5 | 0.30 | 1.20 | | |
| Llama 3.3 70B Instruct | llama-v3p3-70b-instruct | 0.90 | | | |
| Qwen3 8B | qwen3-8b | 0.20 | | | |
| Qwen3 Embedding 8B | qwen3-embedding-8b | 0.00 | | | |
| Qwen3 Reranker 8B | qwen3-reranker-8b | 0.00 | | | |
| Qwen3 VL 30B A3B Thinking | qwen3-vl-30b-a3b-thinking | 0.15 | 0.60 | | |

---

[← Back to all providers](/llm.txt)
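
Since the models listed are served through an OpenAI-compatible API, a chat request is an ordinary chat-completions payload pointed at the Fireworks endpoint. A minimal sketch follows; the base URL and the `accounts/fireworks/models/...` model-path convention are assumptions not stated on this page, so verify them against the official Fireworks documentation before use.

```python
import json

# Assumed Fireworks OpenAI-compatible endpoint (not stated on this page).
BASE_URL = "https://api.fireworks.ai/inference/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    `model` is the short name from the table above; the
    `accounts/fireworks/models/` prefix is an assumed naming convention.
    """
    return {
        "model": f"accounts/fireworks/models/{model}",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Example: a request body for Llama 3.3 70B Instruct from the table.
payload = build_chat_request("llama-v3p3-70b-instruct", "Hello!")
print(json.dumps(payload, indent=2))
```

With an API key in hand, this payload would be POSTed to `{BASE_URL}/chat/completions` with an `Authorization: Bearer <key>` header, exactly as with the OpenAI API.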