# Fireworks AI

Fireworks AI is a high-performance AI inference platform that provides fast, affordable access to over 200 open-source and proprietary AI models. The platform specializes in production-grade inference with ultra-low latency, offering models including Llama, Qwen, DeepSeek, Mistral, Google Gemma, FLUX image models, and more. Fireworks offers serverless deployment, custom fine-tuning, and competitive per-1M-token pricing. The platform is known for its speed and reliability, with models available through OpenAI-compatible APIs and dedicated instances for enterprise workloads.

## Provider Information

- **Website**:
- **Available Models**: 19

## Models

| Name | Original Name | $ Input Price (per 1M) | $ Output Price (per 1M) | Free | Link |
|------|---------------|------------------------|-------------------------|------|------|
| FLUX.1 Kontext Pro | flux-kontext-pro | 0.04 | | | |
| OpenAI gpt-oss-20b | gpt-oss-20b | 0.07 | 0.30 | | |
| OpenAI gpt-oss-120b | gpt-oss-120b | 0.15 | 0.60 | | |
| FLUX.1 Kontext Max | flux-kontext-max | 0.08 | | | |
| DeepSeek V3.1 | deepseek-v3p1 | 0.56 | 1.68 | | |
| DeepSeek V3.2 | deepseek-v3p2 | 0.56 | 1.68 | | |
| FLUX.1 [dev] FP8 | flux-1-dev-fp8 | 0.00 | | | |
| GLM-4.7 | glm-4p7 | 0.60 | 2.20 | | |
| Kimi K2 Instruct 0905 | kimi-k2-instruct-0905 | 0.60 | 2.50 | | |
| Kimi K2 Thinking | kimi-k2-thinking | 0.60 | 2.50 | | |
| MiniMax-M2.1 | minimax-m2p1 | 0.30 | 1.20 | | |
| Qwen3 VL 30B A3B Instruct | qwen3-vl-30b-a3b-instruct | 0.15 | 0.60 | | |
| Kimi K2.5 | kimi-k2p5 | 0.60 | 3.00 | | |
| MiniMax-M2.5 | minimax-m2p5 | 0.30 | 1.20 | | |
| Llama 3.3 70B Instruct | llama-v3p3-70b-instruct | 0.90 | | | |
| Qwen3 8B | qwen3-8b | 0.20 | | | |
| Qwen3 Embedding 8B | qwen3-embedding-8b | 0.00 | | | |
| Qwen3 Reranker 8B | qwen3-reranker-8b | 0.00 | | | |
| Qwen3 VL 30B A3B Thinking | qwen3-vl-30b-a3b-thinking | 0.15 | 0.60 | | |

---

[← Back to all providers](/llm.txt)
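
Since the models listed are served through an OpenAI-compatible API, a chat request is an ordinary chat-completions payload pointed at the Fireworks endpoint. A minimal sketch follows; the base URL and the `accounts/fireworks/models/...` model-path convention are assumptions not stated on this page, so verify them against the official Fireworks documentation before use.

```python
import json

# Assumed Fireworks OpenAI-compatible endpoint (not stated on this page).
BASE_URL = "https://api.fireworks.ai/inference/v1"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-compatible chat-completions payload.

    `model` is the short name from the table above; the
    `accounts/fireworks/models/` prefix is an assumed naming convention.
    """
    return {
        "model": f"accounts/fireworks/models/{model}",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Example: a request body for Llama 3.3 70B Instruct from the table.
payload = build_chat_request("llama-v3p3-70b-instruct", "Hello!")
print(json.dumps(payload, indent=2))
```

With an API key in hand, this payload would be POSTed to `{BASE_URL}/chat/completions` with an `Authorization: Bearer <key>` header, exactly as with the OpenAI API.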