# Inference

Inference.net is a high-performance AI inference platform that provides access to leading large language models, including Meta's Llama family, DeepSeek, Mistral, Google's Gemma, Qwen, and OpenAI models. The platform focuses on delivering fast, efficient inference with various precision options (fp-16, fp-8, bf-16) to balance performance and cost. Inference.net offers an OpenAI-compatible API, making it easy to integrate with existing applications while providing access to state-of-the-art models through a unified endpoint.

## Provider Information

- **Website**:
- **Available Models**: 20

## Models

| Name | Original Name | Input Price ($ per 1M tokens) | Output Price ($ per 1M tokens) | Free | Link |
|------|---------------|-------------------------------|--------------------------------|------|------|
| Qwen 3 Embedding 4B | qwen/qwen3-embedding-4b | 0.01 | 0.00 | | |
| Llama 3.2 11B Vision Instruct | meta/llama-3.2-11b-vision-instruct | 0.06 | 0.06 | | |
| Llama 3.1 8B Instruct | meta/llama-3.1-8b-instruct | 0.03 | 0.03 | | |
| Llama 3.2 3B Instruct | meta/llama-3.2-3b-instruct | 0.02 | 0.02 | | |
| Llama 3.2 1B Instruct | meta/llama-3.2-1b-instruct | 0.01 | 0.01 | | |
| | meta-llama/llama-3.2-1b-instruct/fp-16 | | | | |
| | meta-llama/llama-3.2-3b-instruct/fp-16 | | | | |
| | meta-llama/llama-3.1-8b-instruct/fp-16 | | | | |
| | meta-llama/llama-3.2-11b-instruct/fp-16 | | | | |
| | meta-llama/llama-3.3-70b-instruct/fp-8 | | | | |
| | meta-llama/llama-3.1-70b-instruct/fp-16 | | | | |
| | deepseek/deepseek-v3/fp-8 | | | | |
| | deepseek/deepseek-v3-0324/fp-8 | | | | |
| | deepseek/deepseek-r1/fp-8 | | | | |
| | deepseek/deepseek-r1-0528/fp-8 | | | | |
| | qwen/qwen2.5-7b-instruct/bf-16 | | | | |
| | qwen/qwq-32b/fp-8 | | | | |
| | qwen/qwen3-30b-a3b/fp8 | | | | |
| | openai/gpt-oss-120b | | | | |
| | openai/gpt-oss-20b | | | | |

---

[← Back to all providers](/llm.txt)
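Because the platform exposes an OpenAI-compatible API, requests follow the standard `/chat/completions` shape. The sketch below builds such a request with only the Python standard library; the base URL and the `INFERENCE_API_KEY` environment variable are assumptions for illustration — check the provider's documentation for the actual endpoint and authentication scheme.

```python
import json
import os
import urllib.request

# ASSUMPTION: illustrative base URL; confirm the real endpoint in the docs.
BASE_URL = "https://api.inference.net/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completion request for the given model.

    Model identifiers include an optional precision suffix, e.g.
    "meta-llama/llama-3.1-8b-instruct/fp-16".
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token auth via an env var, as with OpenAI's API.
            "Authorization": f"Bearer {os.environ.get('INFERENCE_API_KEY', '')}",
        },
        method="POST",
    )

req = build_chat_request("meta-llama/llama-3.1-8b-instruct/fp-16", "Hello!")
print(req.full_url)
```

Sending the request is then a matter of `urllib.request.urlopen(req)` (or pointing the official OpenAI SDK at the same base URL, which is the usual pattern for OpenAI-compatible providers).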