# Nebius Token Factory

Nebius is a cloud platform that provides access to AI models through its Token Factory inference service. The platform offers a wide range of open-source and proprietary models, including DeepSeek, MiniMax, Kimi, and Qwen. Nebius focuses on fast, cost-effective AI inference, with competitive pricing per 1M tokens and various quantization options (fp4, fp8) to optimize performance and cost.

## Provider Information

- **Website**:
- **Available Models**: 42

## Models

| Name | Original Name | $ Input Price (per 1M) | $ Output Price (per 1M) | Free | Link |
|------|---------------|------------------------|-------------------------|------|------|
| gpt-oss-20b | openai/gpt-oss-20b | 0.05 | 0.20 | | [View](https://huggingface.co/openai/gpt-oss-20b) |
| gpt-oss-120b | openai/gpt-oss-120b | 0.15 | 0.60 | | [View](https://huggingface.co/openai/gpt-oss-120b) |
| MiniMax-M2.1 | MiniMaxAI/MiniMax-M2.1 | 0.30 | 1.20 | | [View](https://huggingface.co/MiniMaxAI/MiniMax-M2.1) |
| DeepSeek-V3.2 | deepseek-ai/DeepSeek-V3.2 | 0.30 | 0.45 | | [View](https://huggingface.co/deepseek-ai/DeepSeek-V3.2) |
| Kimi-K2-Thinking | moonshotai/Kimi-K2-Thinking | 0.60 | 2.50 | | [View](https://huggingface.co/moonshotai/Kimi-K2-Thinking) |
| Qwen3-Coder-480B-A35B-Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct | 0.40 | 1.80 | | [View](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct) |
| Hermes-4-405B | NousResearch/Hermes-4-405B | 1.00 | 3.00 | | [View](https://huggingface.co/NousResearch/Hermes-4-405B) |
| Hermes-4-70B | NousResearch/Hermes-4-70B | 0.13 | 0.40 | | [View](https://huggingface.co/NousResearch/Hermes-4-70B) |
| GLM-4.5 | zai-org/GLM-4.5 | 0.60 | 2.20 | | [View](https://huggingface.co/zai-org/GLM-4.5) |
| GLM-4.5-Air | zai-org/GLM-4.5-Air | 0.20 | 1.20 | | [View](https://huggingface.co/zai-org/GLM-4.5-Air) |
| DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528 | 0.80 | 2.40 | | [View](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) |
| DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528-fast | 2.00 | 6.00 | | [View](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) |
| Qwen3-235B-A22B-Thinking-2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 0.20 | 0.80 | | [View](https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507) |
| Qwen3-235B-A22B-Instruct-2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 0.20 | 0.60 | | [View](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507) |
| Qwen3-30B-A3B-Thinking-2507 | Qwen/Qwen3-30B-A3B-Thinking-2507 | 0.10 | 0.30 | | [View](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507) |
| Qwen3-30B-A3B-Instruct-2507 | Qwen/Qwen3-30B-A3B-Instruct-2507 | 0.10 | 0.30 | | [View](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507) |
| Qwen3-Coder-30B-A3B-Instruct | Qwen/Qwen3-Coder-30B-A3B-Instruct | 0.10 | 0.30 | | [View](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) |
| Qwen3-32B | Qwen/Qwen3-32B | 0.10 | 0.30 | | [View](https://huggingface.co/Qwen/Qwen3-32B) |
| Qwen3-32B | Qwen/Qwen3-32B-fast | 0.20 | 0.60 | | [View](https://huggingface.co/Qwen/Qwen3-32B) |
| Llama-3_1-Nemotron-Ultra-253B-v1 | nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 | 0.60 | 1.80 | | [View](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1) |
| DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324 | 0.50 | 1.50 | | [View](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324) |
| DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324-fast | 0.75 | 2.25 | | [View](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324) |
| Llama-3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | 0.13 | 0.40 | | [View](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
| Llama-3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct-fast | 0.25 | 0.75 | | [View](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |
| Gemma-3-27b-it | google/gemma-3-27b-it-fast | 0.20 | 0.60 | | [View](https://huggingface.co/google/gemma-3-27b-it) |
| Gemma-3-27b-it | google/gemma-3-27b-it | 0.10 | 0.30 | | [View](https://huggingface.co/google/gemma-3-27b-it) |
| Meta-Llama-3.1-8B-Instruct | meta-llama/Meta-Llama-3.1-8B-Instruct-fast | 0.03 | 0.09 | | [View](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) |
| Meta-Llama-3.1-8B-Instruct | meta-llama/Meta-Llama-3.1-8B-Instruct | 0.02 | 0.06 | | [View](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) |
| Qwen2.5-Coder-7B | Qwen/Qwen2.5-Coder-7B-fast | 0.03 | 0.09 | | [View](https://huggingface.co/Qwen/Qwen2.5-Coder-7B) |
| Qwen2.5-VL-72B-Instruct | Qwen/Qwen2.5-VL-72B-Instruct | 0.25 | 0.75 | | [View](https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct) |
| Gemma-2-2b-it | google/gemma-2-2b-it | 0.02 | 0.06 | | [View](https://huggingface.co/google/gemma-2-2b-it) |
| Meta-Llama-Guard-3-8B | meta-llama/Llama-Guard-3-8B | 0.02 | 0.06 | | [View](https://huggingface.co/meta-llama/Llama-Guard-3-8B) |
| Qwen3-Embedding-8B | Qwen/Qwen3-Embedding-8B | 0.01 | 0.00 | | [View](https://huggingface.co/Qwen/Qwen3-Embedding-8B) |
| FLUX.1-schnell | black-forest-labs/flux-schnell | | | | [View](https://huggingface.co/black-forest-labs/FLUX.1-schnell) |
| FLUX.1-dev | black-forest-labs/flux-dev | | | | [View](https://huggingface.co/black-forest-labs/FLUX.1-dev) |
| Nemotron-Nano-V2-12b | nvidia/Nemotron-Nano-V2-12b | 0.07 | 0.20 | | |
| Nemotron-3-Nano-30B-A3B | nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B | 0.06 | 0.24 | | [View](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8) |
| GLM-4.7 | zai-org/GLM-4.7-FP8 | 0.40 | 2.00 | | [View](https://huggingface.co/zai-org/GLM-4.7-FP8) |
| Qwen3-Next-80B-A3B-Thinking | Qwen/Qwen3-Next-80B-A3B-Thinking | 0.15 | 1.20 | | [View](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking) |
| Gemma-2-9b-it | google/gemma-2-9b-it-fast | 0.03 | 0.09 | | [View](https://huggingface.co/google/gemma-2-9b-it) |
| Kimi-K2.5 | moonshotai/Kimi-K2.5 | 0.50 | 2.50 | | [View](https://huggingface.co/moonshotai/Kimi-K2.5) |
| Kimi-K2-Instruct | moonshotai/Kimi-K2-Instruct | 0.50 | 2.40 | | [View](https://huggingface.co/moonshotai/Kimi-K2-Instruct) |

---

[← Back to all providers](/llm.txt)
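As a worked example of the per-1M-token pricing in the table above, the following sketch estimates the cost of a single request. The function name and dictionary structure are illustrative only (not part of any Nebius SDK); the prices are copied from a few table entries.

```python
# Illustrative cost estimator for per-1M-token pricing.
# Model IDs and prices below are copied from the table on this page;
# estimate_cost() is a hypothetical helper, not a Nebius API.

PRICES_PER_1M = {  # model id -> (input price $, output price $) per 1M tokens
    "deepseek-ai/DeepSeek-V3.2": (0.30, 0.45),
    "Qwen/Qwen3-32B": (0.10, 0.30),
    "meta-llama/Meta-Llama-3.1-8B-Instruct": (0.02, 0.06),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD.

    Prices are quoted per 1M tokens, so each token count is scaled
    by its price and divided by 1,000,000.
    """
    in_price, out_price = PRICES_PER_1M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10,000-token prompt with a 2,000-token completion on
# DeepSeek-V3.2 costs 10_000 * 0.30/1e6 + 2_000 * 0.45/1e6 = $0.0039.
```

Note that the `-fast` variants in the table use the same weights at a higher per-token price, so the same arithmetic applies with the larger figures.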