Qwen2.5-Omni Training
Overview
Qwen2.5-Omni is a unified multimodal model supporting image, audio, and text understanding.
Supported Features
| Feature | Support |
|---|---|
| FSDP2 | ✅ |
| USP | ✅ |
| Muon Optimizer | ✅ |
| Liger Kernel | ✅ |
| Packing | ✅ |
| NSA | ❌ |
| Expert Parallelism | ❌ |
Highlights: Unified multimodal (image, audio, text)
Quick Start
See the example configuration and run script:
Example Config: examples/qwen2_5_omni/example_config.yaml
Run Script: examples/qwen2_5_omni/run.sh
Key Configuration
dataset_config:
  dataset_type: qwen_omni_iterable
  processor_config:
    processor_type: Qwen2_5OmniProcessor
    audio_max_length: 60
    video_backend: qwen_omni_utils
model_config:
  load_from_pretrained_path: Qwen/Qwen2.5-Omni-7B
  attn_implementation: flash_attention_2
trainer_args:
  use_liger_kernel: true
  use_rmpad: true
  fsdp2: true
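
The key configuration above can be mirrored as a plain Python dict to sanity-check values before launching a run. This is a minimal sketch, not part of the training framework: the helper names `get_nested` and `enabled_trainer_flags` are hypothetical, and the nesting (in particular `processor_config` sitting under `dataset_config`) is an assumption about how the keys group.

```python
from typing import Any

# Key Configuration values copied from this page.
# ASSUMPTION: processor_config is nested under dataset_config.
CONFIG: dict[str, Any] = {
    "dataset_config": {
        "dataset_type": "qwen_omni_iterable",
        "processor_config": {
            "processor_type": "Qwen2_5OmniProcessor",
            "audio_max_length": 60,
            "video_backend": "qwen_omni_utils",
        },
    },
    "model_config": {
        "load_from_pretrained_path": "Qwen/Qwen2.5-Omni-7B",
        "attn_implementation": "flash_attention_2",
    },
    "trainer_args": {
        "use_liger_kernel": True,
        "use_rmpad": True,
        "fsdp2": True,
    },
}


def get_nested(cfg: dict[str, Any], dotted_key: str) -> Any:
    """Look up a value by dotted path; raise KeyError(dotted_key) if any part is missing."""
    node: Any = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted_key)
        node = node[part]
    return node


def enabled_trainer_flags(cfg: dict[str, Any]) -> list[str]:
    """Return the sorted names of trainer_args entries that are set to True."""
    return sorted(k for k, v in cfg["trainer_args"].items() if v is True)


if __name__ == "__main__":
    # Print a few settings worth confirming before a run.
    print(get_nested(CONFIG, "model_config.attn_implementation"))
    print(get_nested(CONFIG, "dataset_config.processor_config.audio_max_length"))
    print(enabled_trainer_flags(CONFIG))
```

A dotted-path lookup keeps the check readable and fails loudly with the full path when a key is misspelled or placed at the wrong nesting level.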