Model Information
| Slug | llama-3-3-nemotron-super-49b-v1-5 |
|---|---|
| Release Date | October 10, 2025 |
Organization
| Name | Nvidia |
|---|---|
| Website | https://www.nvidia.com/en-us/ai/ |
Model Description
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct, with a 128K-token context window. It is post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and multi-turn chat, followed by multiple RL stages: Reward-aware Preference Optimization (RPO) for alignment, RL with Verifiable Rewards (RLVR) for step-wise reasoning, and iterative DPO to refine tool-use behavior. A distillation-driven Neural Architecture Search (“Puzzle”) replaces some attention blocks and varies FFN widths to shrink the memory footprint and improve throughput, enabling single-GPU (H100/H200) deployment while preserving instruction following and CoT quality.
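As a rough illustration of why the parameter count matters for single-GPU deployment, the sketch below estimates weight storage alone for a nominal 49B-parameter model. The precision table and the helper name `weight_memory_gb` are assumptions made for this example, and the estimate ignores KV cache, activations, and runtime overhead, so actual usage will be higher.

```python
# Back-of-envelope weight memory for a dense model.
# Assumed bytes per parameter for a few common precisions (illustrative).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    n_params = 49e9  # nominal "49B" taken from the description
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(n_params, precision):.1f} GB")
```

Compare the result against the memory of the target GPU and leave headroom for the context length you plan to serve.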
In internal evaluations (NeMo-Skills, up to 16 runs, temp = 0.6, top_p = 0.95), the model reports strong reasoning and coding results, e.g., MATH500 pass@1 = 97.4, AIME-2024 = 87.5, AIME-2025 = 82.71, GPQA = 71.97, LiveCodeBench (24.10–25.02) = 73.58, and MMLU-Pro (CoT) = 79.53. The model targets practical inference efficiency (high tokens/s, reduced VRAM), is supported in Transformers and vLLM, and offers explicit “reasoning on/off” modes (chat-first defaults; greedy decoding is recommended when reasoning is disabled). It is suited to building agents, assistants, and long-context retrieval systems where the accuracy-to-cost balance and reliable tool use matter.
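The decoding settings and the repeated-run scoring mentioned above can be sketched as two small helpers. This is an illustration only: the names `sampling_params` and `mean_pass_at_1` are hypothetical, the dictionary keys follow the common `temperature`/`top_p` convention rather than any specific provider's API, and reusing the evaluation settings for reasoning-on requests is an assumption.

```python
def sampling_params(reasoning: bool) -> dict:
    """Pick decoding settings for one request (hypothetical helper)."""
    if reasoning:
        # Settings quoted for the internal evaluations.
        return {"temperature": 0.6, "top_p": 0.95}
    # Greedy decoding: temperature 0 is a common way to request it,
    # but the exact knob depends on the serving stack.
    return {"temperature": 0.0, "top_p": 1.0}


def mean_pass_at_1(run_results: list[list[bool]]) -> float:
    """Average per-run accuracy, in percent, across repeated sampled runs.

    run_results[i][j] is True when problem j was solved in run i.
    """
    per_run = [sum(run) / len(run) for run in run_results]
    return 100.0 * sum(per_run) / len(per_run)


if __name__ == "__main__":
    print(sampling_params(reasoning=True))
    # Two toy runs over two problems each.
    print(mean_pass_at_1([[True, False], [True, True]]))
```

Averaging over several sampled runs reduces the variance that a non-zero temperature introduces into a single pass@1 measurement.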
Available at 11 Providers
| Provider | Type | Model Name | Original Model | Input ($/1M) | Output ($/1M) |
|---|---|---|---|---|---|
| Nvidia | | llama-3.3-nemotron-super-49b-v1.5 | nvidia/llama-3.3-nemotron-super-49b-v1.5 | $0.00 | $0.00 |
| OpenRouter | Chat, Code | Llama 3.3 Nemotron Super 49B V1.5 | nvidia/llama-3.3-nemotron-super-49b-v1.5 | $0.10 | $0.40 |
| DeepInfra | | Llama-3.3-Nemotron-Super-49B-v1.5 | nvidia/Llama-3.3-Nemotron-Super-49B-v1.5 | $0.10 | $0.40 |
| Kilo Code | Code | NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | nvidia/llama-3.3-nemotron-super-49b-v1.5 | $0.10 | $0.40 |
| WaveSpeed AI | Chat, Code | llama-3.3-nemotron-super-49b-v1.5 | nvidia/llama-3.3-nemotron-super-49b-v1.5 | $0.11 | $0.44 |
| Nano-GPT | | Nvidia Nemotron Super 49B v1.5 | nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 | - | - |
| ValorGPT | | Llama 3.3 Nemotron Super 49B V1.5 | nvidia-llama-3.3-nemotron-super-49b-v1.5 | - | - |
| Yupp | Chat | Llama 3.3 Nemotron Super 49B V1.5 (OpenRouter) | nvidia/llama-3.3-nemotron-super-49b-v1.5 | - | - |
| LangDB | | llama-3.3-nemotron-super-49b-v1.5 | llama-3.3-nemotron-super-49b-v1.5 | - | - |
| Blackbox AI | Code | blackboxai/nvidia/llama-3.3-nemotron-super-49b-v1.5 | blackboxai/nvidia/llama-3.3-nemotron-super-49b-v1.5 | - | - |
| Writingmate | Chat, Code | NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 | nvidia/llama-3.3-nemotron-super-49b-v1.5 | - | - |
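Per-million-token prices like those listed above translate into a per-request cost with simple arithmetic. The sketch below is a minimal example under stated assumptions: `request_cost_usd` is a hypothetical helper, the token counts are made up for illustration, and the rates are the $0.10 input / $0.40 output figures shown for several providers. Actual billing rules (rounding, minimums, cached tokens) vary by provider.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate the cost of one request from per-1M-token prices."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative workload: 200,000 prompt tokens and 50,000 completion tokens.
    cost = request_cost_usd(200_000, 50_000, 0.10, 0.40)
    print(f"Estimated cost: ${cost:.4f}")
```

With these example numbers, the input and output sides each contribute about $0.02, for roughly $0.04 in total. Because output tokens are priced at four times the input rate here, long reasoning traces can dominate the bill.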