# inclusionAI: Ming-flash-omni Preview

Ming-flash-omni Preview is a multimodal AI model that accepts text, speech, image, and video inputs and generates text, speech, and image outputs. It is built on a sparse Mixture-of-Experts (MoE) architecture with 100 billion total parameters and 6 billion active parameters per token. The model achieves state-of-the-art speech recognition across 12 ContextASR benchmarks and delivers significant improvements on 15 Chinese dialects.

In image generation, the model introduces high-fidelity text rendering, enhanced scene consistency, and superior identity preservation during editing. Its generative segmentation capability unifies segmentation and editing in a single semantics-preserving framework, and the model scores 0.90 on GenEval, reflecting fine-grained spatial control.

A Dual-Balanced Routing mechanism keeps cross-modal training stable by combining an auxiliary load-balancing loss with modality-level router bias updates (see the illustrative sketch after the provider table). Compared to Ming-lite-omni v1.5, Ming-flash-omni Preview offers substantial advances in architectural efficiency, editing precision, and speech understanding, positioning it among the leading multimodal models.

## Model Information

- **Organization**: [InclusionAI](/llm.txt)
- **Slug**: ming-flash-omni-preview
- **Available at Providers**: 0

## Providers

| Provider | Name | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Free | Link |
|----------|------|-------------------------|--------------------------|------|------|

_No providers currently list this model._
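## Routing Sketch

The Dual-Balanced Routing mechanism is described only at a high level above. As a rough illustration of the two ingredients it names, the sketch below implements a generic top-k MoE router with a Switch-Transformer-style auxiliary load-balancing loss and a non-learned, per-modality bias over experts that is nudged toward uniform load. All class, parameter, and variable names are hypothetical, and the exact formulation is an assumption; this is not inclusionAI's implementation.

```python
# Illustrative sketch only: a generic top-k MoE router combining an auxiliary
# load-balancing loss with modality-level router bias updates. Names and the
# exact formulation are assumptions, not inclusionAI's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualBalancedRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int, n_modalities: int,
                 top_k: int = 2, bias_lr: float = 1e-3):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Non-learned per-modality bias over experts, updated outside autograd.
        self.register_buffer("modality_bias", torch.zeros(n_modalities, n_experts))
        self.top_k = top_k
        self.bias_lr = bias_lr
        self.n_experts = n_experts

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor):
        # x: (n_tokens, d_model); modality_ids: (n_tokens,) integer labels.
        logits = self.gate(x) + self.modality_bias[modality_ids]
        probs = F.softmax(logits, dim=-1)
        top_p, top_idx = probs.topk(self.top_k, dim=-1)

        # Auxiliary load-balancing loss (Switch-Transformer style): for each
        # expert, multiply the fraction of tokens whose top-1 choice is that
        # expert by its mean router probability, then sum and scale.
        dispatch = F.one_hot(top_idx[:, 0], self.n_experts).float()
        aux_loss = self.n_experts * (dispatch.mean(0) * probs.mean(0)).sum()

        # Modality-level bias update: for each modality present in the batch,
        # lower the bias of overloaded experts and raise underloaded ones,
        # pushing routing toward uniform expert load without using gradients.
        with torch.no_grad():
            for m in modality_ids.unique():
                load = dispatch[modality_ids == m].mean(0)
                self.modality_bias[m] -= self.bias_lr * (load - 1.0 / self.n_experts)

        return top_idx, top_p, aux_loss


if __name__ == "__main__":
    router = DualBalancedRouter(d_model=64, n_experts=8, n_modalities=4)
    tokens = torch.randn(32, 64)
    modalities = torch.randint(0, 4, (32,))
    experts, weights, aux = router(tokens, modalities)
    print(experts.shape, weights.shape, aux.item())
```

In a full MoE layer, `aux_loss` would be added to the task loss with a small coefficient, while the bias update runs as a side effect of each forward pass, giving two complementary pressures toward balanced expert utilization.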
---

[← Back to all providers](/llm.txt)