# inclusionAI: Ming-flash-omni Preview

Ming-flash-omni Preview is a multimodal AI model that accepts text, speech, image, and video inputs and generates text, speech, and image outputs. It is built on a sparse Mixture-of-Experts (MoE) architecture with 100 billion total parameters and 6 billion active parameters per token. The model achieves state-of-the-art speech recognition across 12 ContextASR benchmarks and delivers significant improvements on 15 Chinese dialects.

In image generation, the model introduces high-fidelity text rendering, enhanced scene consistency, and superior identity preservation during editing. Its generative segmentation capability unifies segmentation and editing in a single semantics-preserving framework, and the model scores 0.90 on GenEval, reflecting fine-grained spatial control.

A Dual-Balanced Routing mechanism keeps cross-modal training stable by combining an auxiliary load-balancing loss with modality-level router bias updates (see the illustrative sketch after the provider table). Compared to Ming-lite-omni v1.5, Ming-flash-omni Preview offers substantial advances in architectural efficiency, editing precision, and speech understanding, positioning it among the leading multimodal models.

## Model Information

- **Organization**: [InclusionAI](/llm.txt)
- **Slug**: ming-flash-omni-preview
- **Available at Providers**: 0

## Providers

| Provider | Name | Input ($ per 1M tokens) | Output ($ per 1M tokens) | Free | Link |
|----------|------|-------------------------|--------------------------|------|------|

_No providers currently list this model._
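## Routing Sketch

The Dual-Balanced Routing mechanism is described only at a high level above. As a rough illustration of the two ingredients it names, the sketch below implements a generic top-k MoE router with a Switch-Transformer-style auxiliary load-balancing loss and a non-learned, per-modality bias over experts that is nudged toward uniform load. All class, parameter, and variable names are hypothetical, and the exact formulation is an assumption; this is not inclusionAI's implementation.

```python
# Illustrative sketch only: a generic top-k MoE router combining an auxiliary
# load-balancing loss with modality-level router bias updates. Names and the
# exact formulation are assumptions, not inclusionAI's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualBalancedRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int, n_modalities: int,
                 top_k: int = 2, bias_lr: float = 1e-3):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Non-learned per-modality bias over experts, updated outside autograd.
        self.register_buffer("modality_bias", torch.zeros(n_modalities, n_experts))
        self.top_k = top_k
        self.bias_lr = bias_lr
        self.n_experts = n_experts

    def forward(self, x: torch.Tensor, modality_ids: torch.Tensor):
        # x: (n_tokens, d_model); modality_ids: (n_tokens,) integer labels.
        logits = self.gate(x) + self.modality_bias[modality_ids]
        probs = F.softmax(logits, dim=-1)
        top_p, top_idx = probs.topk(self.top_k, dim=-1)

        # Auxiliary load-balancing loss (Switch-Transformer style): for each
        # expert, multiply the fraction of tokens whose top-1 choice is that
        # expert by its mean router probability, then sum and scale.
        dispatch = F.one_hot(top_idx[:, 0], self.n_experts).float()
        aux_loss = self.n_experts * (dispatch.mean(0) * probs.mean(0)).sum()

        # Modality-level bias update: for each modality present in the batch,
        # lower the bias of overloaded experts and raise underloaded ones,
        # pushing routing toward uniform expert load without using gradients.
        with torch.no_grad():
            for m in modality_ids.unique():
                load = dispatch[modality_ids == m].mean(0)
                self.modality_bias[m] -= self.bias_lr * (load - 1.0 / self.n_experts)

        return top_idx, top_p, aux_loss


if __name__ == "__main__":
    router = DualBalancedRouter(d_model=64, n_experts=8, n_modalities=4)
    tokens = torch.randn(32, 64)
    modalities = torch.randint(0, 4, (32,))
    experts, weights, aux = router(tokens, modalities)
    print(experts.shape, weights.shape, aux.item())
```

In a full MoE layer, `aux_loss` would be added to the task loss with a small coefficient, while the bias update runs as a side effect of each forward pass, giving two complementary pressures toward balanced expert utilization.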
---

[← Back to all providers](/llm.txt)