# Qwen2.5-Omni Training

## Overview

Qwen2.5-Omni is a unified multimodal model supporting image, audio, and text understanding.

## Supported Features

| Feature | Support |
|---------|---------|
| **FSDP2** | ✅ |
| **USP** | ✅ |
| **Muon Optimizer** | ✅ |
| **Liger Kernel** | ✅ |
| **Packing** | ✅ |
| **NSA** | ❌ |
| **Expert Parallelism** | ❌ |

**Highlights**: Unified multimodal (image, audio, text)

## Quick Start

See the example configuration and run script:

- **Example Config**: [examples/qwen2_5_omni/example_config.yaml](../../examples/qwen2_5_omni/example_config.yaml)
- **Run Script**: [examples/qwen2_5_omni/run.sh](../../examples/qwen2_5_omni/run.sh)

## Key Configuration

A minimal excerpt covering the dataset, model, and trainer settings:

```yaml
dataset_config:
  dataset_type: qwen_omni_iterable
  processor_config:
    processor_type: Qwen2_5OmniProcessor
    audio_max_length: 60
  video_backend: qwen_omni_utils

model_config:
  load_from_pretrained_path: Qwen/Qwen2.5-Omni-7B  # Hugging Face Hub model ID
  attn_implementation: flash_attention_2  # needs the flash-attn package installed

trainer_args:
  use_liger_kernel: true
  use_rmpad: true
  fsdp2: true
```
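
As a quick way to catch typos before launching a run, the sketch below checks that a config dictionary contains the keys shown above. `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced for illustration and are not part of the training framework. The config is written inline as a Python dict so the example runs without a YAML parser.

```python
# Hypothetical sanity check; the key list mirrors the excerpt above and is
# an assumption, not a documented schema of the training framework.
REQUIRED_KEYS = {
    "dataset_config": ["dataset_type", "processor_config"],
    "model_config": ["load_from_pretrained_path", "attn_implementation"],
    "trainer_args": ["use_liger_kernel", "use_rmpad", "fsdp2"],
}


def missing_keys(config: dict) -> list[str]:
    """Return dotted paths of expected keys that are absent from the config."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        section_values = config.get(section)
        if not isinstance(section_values, dict):
            # The whole section is absent (or not a mapping).
            missing.append(section)
            continue
        for key in keys:
            if key not in section_values:
                missing.append(f"{section}.{key}")
    return missing


# Same values as the YAML excerpt, expressed as a Python dict.
config = {
    "dataset_config": {
        "dataset_type": "qwen_omni_iterable",
        "processor_config": {
            "processor_type": "Qwen2_5OmniProcessor",
            "audio_max_length": 60,
        },
        "video_backend": "qwen_omni_utils",
    },
    "model_config": {
        "load_from_pretrained_path": "Qwen/Qwen2.5-Omni-7B",
        "attn_implementation": "flash_attention_2",
    },
    "trainer_args": {
        "use_liger_kernel": True,
        "use_rmpad": True,
        "fsdp2": True,
    },
}

problems = missing_keys(config)
print("config OK" if not problems else f"missing keys: {problems}")
```

In practice the dict would come from parsing the YAML file rather than being written by hand. The check only confirms that keys are present, not that their values are valid for the framework.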