Qwen3-Omni MoE Training
Overview
Qwen3-Omni MoE is a multimodal Mixture-of-Experts (MoE) model that handles image, audio, and text inputs. Its training setup supports Expert Parallelism (EP).
Supported Features
| Feature | Support |
|---|---|
| FSDP2 | ✅ |
| USP | ❌ |
| Muon Optimizer | ✅ |
| Liger Kernel | ✅ |
| Packing | ✅ |
| NSA | ❌ |
| Expert Parallelism (EP) | ✅ |
Highlights: Multimodal MoE with EP (image, audio, text)
Quick Start
See the example configuration: `examples/qwen3_omni_moe_ep2.yaml`
Key Configuration
```yaml
dataset_config:
  dataset_type: qwen_omni_iterable
  processor_config:
    processor_type: Qwen2_5OmniProcessor
  video_backend: qwen_omni_utils

model_config:
  attn_implementation: flash_attention_2
  monkey_patch_kwargs:
    patch_type: ["liger"]

trainer_args:
  use_liger_kernel: true
  use_rmpad: true
  fsdp2: true
  ep_degree: 2  # Expert Parallelism degree
```
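Before launching, the parallelism settings can be sanity-checked in a few lines of Python. The sketch below is a hypothetical helper, not part of the training framework: `check_parallel_config` and `ep_groups` are illustrative names, and it assumes the GPUs are split into consecutive blocks of `ep_degree` ranks, which may differ from the framework's actual device-mesh layout.

```python
# Hypothetical pre-launch check (not part of the training framework).
# The dict mirrors the trainer_args section of the YAML config above.
trainer_args = {
    "use_liger_kernel": True,
    "use_rmpad": True,
    "fsdp2": True,
    "ep_degree": 2,  # Expert Parallelism degree
}


def check_parallel_config(args: dict, world_size: int) -> dict:
    """Validate ep_degree against the GPU count and return derived sizes."""
    ep = args.get("ep_degree", 1)
    if ep < 1:
        raise ValueError(f"ep_degree must be >= 1, got {ep}")
    if world_size % ep != 0:
        raise ValueError(f"world_size={world_size} is not divisible by ep_degree={ep}")
    # Assumption: ranks are split into world_size // ep EP groups of ep ranks each.
    return {"ep_size": ep, "num_ep_groups": world_size // ep}


def ep_groups(world_size: int, ep: int) -> list[list[int]]:
    """Assumed layout: consecutive blocks of `ep` ranks form one EP group."""
    return [list(range(start, start + ep)) for start in range(0, world_size, ep)]


print(check_parallel_config(trainer_args, world_size=8))  # {'ep_size': 2, 'num_ep_groups': 4}
print(ep_groups(8, 2))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Under this assumed layout, 8 GPUs with `ep_degree: 2` form four EP groups of two ranks each, and a GPU count that is not a multiple of `ep_degree` is rejected before any process group is created.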
Expert Parallelism
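Expert Parallelism (EP) shards the MoE experts across GPUs. Each GPU stores and computes only a subset of the experts in every MoE layer, and tokens are exchanged between GPUs so that each one reaches the expert it is routed to. This reduces the per-GPU memory and compute spent on expert weights. With `ep_degree: 2`, each EP group spans two GPUs and each GPU holds half of the experts. Choose `ep_degree` based on the number of available GPUs; in typical EP implementations it must evenly divide both the total GPU count and the number of experts per MoE layer.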
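The sketch below illustrates the idea in plain Python rather than with real collectives: experts are split into contiguous blocks, one per EP rank, and each token is assigned to the rank that owns its routed expert. The contiguous layout and the helper names `experts_on_rank` and `dispatch` are assumptions for illustration. Routing is top-1 for brevity, whereas real MoE layers route each token to several experts and move tokens between GPUs with an all-to-all collective.

```python
from collections import defaultdict


def experts_on_rank(num_experts: int, ep_degree: int, ep_rank: int) -> list[int]:
    """Expert ids owned by ep_rank, assuming a contiguous block layout."""
    if num_experts % ep_degree != 0:
        raise ValueError("num_experts must be divisible by ep_degree")
    per_rank = num_experts // ep_degree
    return list(range(ep_rank * per_rank, (ep_rank + 1) * per_rank))


def dispatch(token_experts: list[int], num_experts: int, ep_degree: int) -> dict[int, list[int]]:
    """Group token indices by the EP rank that owns each token's expert (top-1 routing)."""
    per_rank = num_experts // ep_degree
    buckets: dict[int, list[int]] = defaultdict(list)
    for token_idx, expert_id in enumerate(token_experts):
        # With contiguous blocks, the owning rank is expert_id // experts-per-rank.
        buckets[expert_id // per_rank].append(token_idx)
    return dict(buckets)


# Toy example: 8 experts, ep_degree=2, one routed expert per token for 5 tokens.
print(experts_on_rank(8, 2, 0))  # [0, 1, 2, 3]
print(experts_on_rank(8, 2, 1))  # [4, 5, 6, 7]
print(dispatch([0, 5, 3, 7, 4], num_experts=8, ep_degree=2))  # {0: [0, 2], 1: [1, 3, 4]}
```

In a real run, each bucket corresponds to the tokens one GPU sends to another; after the experts process them, a second exchange returns the outputs to the tokens' original GPUs.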