# Qwen3-Omni MoE Training

## Overview

Qwen3-Omni MoE is a multimodal Mixture-of-Experts model supporting image, audio, and text with Expert Parallelism.

## Supported Features

| Feature | Support |
|---------|---------|
| **FSDP2** | ✅ |
| **USP** | ❌ |
| **Muon Optimizer** | ✅ |
| **Liger Kernel** | ✅ |
| **Packing** | ✅ |
| **NSA** | ❌ |
| **Expert Parallelism (EP)** | ✅ |

**Highlights**: Multimodal MoE with EP (image, audio, text)

## Quick Start

See the example configuration:

- **Example Config**: [examples/qwen3_omni_moe_ep2.yaml](../../examples/qwen3_omni_moe_ep2.yaml)

## Key Configuration

```yaml
dataset_config:
  dataset_type: qwen_omni_iterable
  processor_config:
    processor_type: Qwen2_5OmniProcessor
  video_backend: qwen_omni_utils

model_config:
  attn_implementation: flash_attention_2
  monkey_patch_kwargs:
    patch_type: ["liger"]

trainer_args:
  use_liger_kernel: true
  use_rmpad: true
  fsdp2: true
  ep_degree: 2  # Expert Parallelism degree
```

## Expert Parallelism

Expert Parallelism (EP) distributes MoE experts across GPUs for efficient training. Set `ep_degree` based on your GPU availability.
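
As a rough illustration of what choosing `ep_degree` involves, the sketch below assumes that experts are split into equal contiguous blocks across EP ranks, and that `ep_degree` must divide both the number of experts and the total GPU count. Both are common conventions for expert parallelism, not confirmed details of this trainer. The helper names `experts_for_rank` and `valid_ep_degrees`, and the expert count of 128, are hypothetical and used only for illustration.

```python
from __future__ import annotations


def experts_for_rank(num_experts: int, ep_degree: int, ep_rank: int) -> range:
    """Return the expert indices held by one EP rank.

    Assumption: experts are partitioned into equal contiguous blocks,
    one block per EP rank. The real trainer may use a different layout.
    """
    if ep_degree <= 0:
        raise ValueError("ep_degree must be a positive integer")
    if not 0 <= ep_rank < ep_degree:
        raise ValueError("ep_rank must be in [0, ep_degree)")
    if num_experts % ep_degree != 0:
        raise ValueError("num_experts must be divisible by ep_degree")
    per_rank = num_experts // ep_degree
    start = ep_rank * per_rank
    return range(start, start + per_rank)


def valid_ep_degrees(num_experts: int, world_size: int) -> list[int]:
    """List EP degrees that evenly divide both the experts and the GPUs.

    Assumption: the EP group size must divide the world size so that the
    remaining ranks can form data-parallel (FSDP) replicas.
    """
    return [
        degree
        for degree in range(1, world_size + 1)
        if world_size % degree == 0 and num_experts % degree == 0
    ]


if __name__ == "__main__":
    num_experts = 128  # hypothetical expert count, for illustration only
    world_size = 8     # hypothetical number of GPUs
    print("candidate ep_degree values:", valid_ep_degrees(num_experts, world_size))
    for rank in range(2):
        block = experts_for_rank(num_experts, ep_degree=2, ep_rank=rank)
        print(f"EP rank {rank} holds experts {block.start}..{block.stop - 1}")
```

Under these assumptions, a larger `ep_degree` leaves fewer experts (and less expert memory) per GPU, at the cost of more cross-GPU token routing. The example config's `ep_degree: 2` is the smallest setting that splits the experts at all.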