inclusionAI: Ming-flash-omni Preview

Model Information
Slug: ming-flash-omni-preview
Aliases: ming-flash-omni-preview, mingflashomnipreview

Ming-flash-omni Preview is a multimodal AI model that supports text, speech, image, and video inputs while generating text, speech, and image outputs. Built on a sparse 100-billion-parameter Mixture-of-Experts architecture with 6 billion active parameters per token, it achieves state-of-the-art speech recognition across 12 ContextASR benchmarks and delivers significant improvements for 15 Chinese dialects. The model introduces high-fidelity text rendering in image generation, enhanced scene consistency, and superior identity preservation during editing. Its innovative generative segmentation capability unifies segmentation and editing into a semantics-preserving framework, achieving a 0.90 score on GenEval for fine-grained spatial control. A Dual-Balanced Routing Mechanism ensures stable cross-modal training through auxiliary load balancing loss and modality-level router bias updates. Compared to Ming-lite-omni v1.5, Ming-flash-omni Preview offers substantial advancements in architecture efficiency, editing precision, and speech understanding, establishing itself as a highly competitive solution among leading multimodal models.
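The auxiliary load-balancing loss mentioned above is a standard ingredient of Mixture-of-Experts training. As a rough illustration only (this is a generic Switch-Transformer-style formulation, not Ming-flash-omni's actual routing code, and the Dual-Balanced modality-level bias updates are not modeled here), such a loss penalizes routers that concentrate tokens on a few experts:

```python
import math

def load_balancing_loss(router_logits, num_experts):
    """Generic sketch of an MoE auxiliary load-balancing loss.

    router_logits: list of per-token lists of raw expert scores.
    Loss = E * sum_e f_e * p_e, where f_e is the fraction of tokens whose
    top-1 choice is expert e and p_e is the mean softmax probability
    assigned to e. It reaches its minimum, 1.0, under perfectly uniform
    routing, and grows as load concentrates on fewer experts.
    """
    n = len(router_logits)
    counts = [0.0] * num_experts   # top-1 assignment counts per expert
    mean_p = [0.0] * num_experts   # mean router probability per expert
    for logits in router_logits:
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]  # stable softmax
        z = sum(exps)
        probs = [e / z for e in exps]
        counts[probs.index(max(probs))] += 1.0
        for e in range(num_experts):
            mean_p[e] += probs[e] / n
    f = [c / n for c in counts]
    return num_experts * sum(fe * pe for fe, pe in zip(f, mean_p))
```

Minimizing this term alongside the task loss nudges the router toward spreading tokens evenly, which is what keeps a sparse 100B-parameter model with only 6B active parameters per token training stably.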

Available at 1 Provider

Provider: ZenMUX
Model Name: inclusionAI: Ming-flash-omni Preview
Original Model: inclusionai/ming-flash-omni-preview
Input: $0.80 / 1M tokens
Output: $1.80 / 1M tokens
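Assuming ZenMUX exposes an OpenAI-compatible chat completions endpoint (an assumption; check the provider's documentation for the actual base URL and authentication), a minimal request payload for this model might be built like so. Only the model slug comes from the listing above:

```python
def build_request(prompt, model="inclusionai/ming-flash-omni-preview"):
    """Build a hypothetical OpenAI-style chat completions payload.

    The message shape is an assumption about ZenMUX's API; the model
    slug is the one listed for this provider.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Describe this image in one sentence.")
```

The resulting dict would then be POSTed as JSON to the provider's chat completions URL with an API key header.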