InclusionAI • ming-flash-omni-preview
| Slug | ming-flash-omni-preview |
|---|---|
| Aliases | ming-flash-omni-preview, mingflashomnipreview |

| Name | InclusionAI |
|---|---|
| Website | https://www.inclusion-ai.org |
Ming-flash-omni Preview is a multimodal AI model that accepts text, speech, image, and video inputs and generates text, speech, and image outputs. Built on a sparse Mixture-of-Experts architecture with 100 billion total parameters and 6 billion active parameters per token, it achieves state-of-the-art speech recognition across 12 ContextASR benchmarks and delivers significant improvements on 15 Chinese dialects. The model introduces high-fidelity text rendering in image generation, enhanced scene consistency, and superior identity preservation during editing. Its generative segmentation capability unifies segmentation and editing in a semantics-preserving framework, reaching a 0.90 score on GenEval for fine-grained spatial control. A Dual-Balanced Routing Mechanism stabilizes cross-modal training through an auxiliary load-balancing loss and modality-level router bias updates. Compared to Ming-lite-omni v1.5, Ming-flash-omni Preview offers substantial advances in architectural efficiency, editing precision, and speech understanding, making it a highly competitive option among leading multimodal models.
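To make the routing idea concrete, here is a minimal PyTorch sketch of an MoE router that combines an auxiliary load-balancing loss with a per-modality bias on the router logits. The class name `DualBalancedRouter`, the bias-update rule and its step size, and all shapes are illustrative assumptions; the description above does not specify the model's actual implementation.

```python
# Minimal sketch of a dual-balanced MoE router: a standard auxiliary
# load-balancing loss plus a per-modality bias on the router logits.
# All names and the bias-update rule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualBalancedRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int, n_modalities: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # One bias vector per modality (e.g. text/speech/image/video),
        # updated outside backprop to even out each modality's expert usage.
        self.register_buffer("modality_bias", torch.zeros(n_modalities, n_experts))
        self.top_k = top_k
        self.n_experts = n_experts

    def forward(self, x: torch.Tensor, modality_id: int):
        # x: (tokens, d_model); the modality bias shifts logits before top-k.
        logits = self.gate(x) + self.modality_bias[modality_id]
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)

        # Auxiliary load-balancing loss (Switch-Transformer style):
        # fraction of tokens routed to each expert * mean router probability.
        with torch.no_grad():
            counts = torch.zeros(self.n_experts, device=x.device)
            counts.scatter_add_(
                0, topk_idx.flatten(),
                torch.ones_like(topk_idx.flatten(), dtype=counts.dtype))
            load = counts / counts.sum()
        importance = probs.mean(dim=0)
        aux_loss = self.n_experts * torch.sum(load * importance)

        # Modality-level bias update: lower the bias for experts this modality
        # overloads, raise it for underused ones (illustrative rule).
        with torch.no_grad():
            self.modality_bias[modality_id] -= 0.01 * (load - 1.0 / self.n_experts)

        return topk_idx, topk_probs, aux_loss
```

In a full model, `aux_loss` would be added to the task loss with a small coefficient, while the per-modality bias update runs outside the gradient path, so each modality can be balanced separately without the losses of different modalities pulling on one shared router.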