# SiT (Scalable Interpolant Transformer) Training

## Overview

SiT is a diffusion transformer for image generation built on the interpolant framework. This recipe supports ImageNet-1K training with Classifier-Free Guidance (CFG).

## Supported Features

| Feature | Support |
|---------|---------|
| **FSDP2** | ✅ |
| **USP** | ❌ |
| **Muon Optimizer** | ✅ |
| **Liger Kernel** | ❌ |
| **Packing** | ❌ |
| **NSA** | ❌ |
| **Expert Parallelism** | ❌ |

**Highlights**: Interpolant Transformer, CFG, ImageNet-1K

## Quick Start

See the example configuration and run script:

- **Example Config**: [examples/scalable_interpolant_transformer/sit_xl_2.yaml](../../examples/scalable_interpolant_transformer/sit_xl_2.yaml)
- **Run Script**: [examples/scalable_interpolant_transformer/run.sh](../../examples/scalable_interpolant_transformer/run.sh)
- **Documentation**: [examples/scalable_interpolant_transformer/README.md](../../examples/scalable_interpolant_transformer/README.md)

## Model Variants

| Model | Parameters | Hidden Size | Depth | Heads |
|-------|-----------|-------------|-------|-------|
| SiT-S/2 | ~33M | 384 | 12 | 6 |
| SiT-B/2 | ~130M | 768 | 12 | 12 |
| SiT-L/2 | ~458M | 1024 | 24 | 16 |
| SiT-XL/2 | ~675M | 1152 | 28 | 16 |

## Key Configuration

```yaml
model_config:
  load_from_config:
    model_type: "sit"
    hidden_size: 1152  # XL model
    depth: 28          # XL model
    num_heads: 16
    vae_path: "stabilityai/sd-vae-ft-ema"
    path_type: "Linear"  # Linear, GVP, or VP
    prediction: "velocity"
    cfg_scale: 1.0

trainer_args:
  bf16: true
  fsdp2: true
```

## Features

- **Interpolant Paths**: Linear, GVP, VP
- **EMA**: Exponential Moving Average of model weights for more stable generation
- **CFG**: Classifier-Free Guidance support
- **VAE**: Stable Diffusion VAE for latent-space encoding
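The `path_type` and `prediction: "velocity"` settings suggest the network is trained to regress the time derivative of an interpolant between noise and data. The following is a minimal, framework-free sketch of that idea, not this repository's implementation. It assumes the convention `x_t = alpha_t * x_data + sigma_t * noise` with `t = 0` as pure noise and `t = 1` as clean data (the actual code may orient time the other way). `linear_path`, `gvp_path`, `interpolate`, and `velocity_loss` are hypothetical names, and the VP path is omitted because its schedule hyperparameters are not given here.

```python
import math
from typing import Callable, Dict, List, Tuple

# (alpha_t, sigma_t, d alpha_t / dt, d sigma_t / dt)
Coeffs = Tuple[float, float, float, float]


def linear_path(t: float) -> Coeffs:
    # Straight line between noise (t = 0) and data (t = 1).
    return t, 1.0 - t, 1.0, -1.0


def gvp_path(t: float) -> Coeffs:
    # Generalized variance-preserving path: alpha_t**2 + sigma_t**2 == 1 for every t.
    h = math.pi / 2
    return math.sin(h * t), math.cos(h * t), h * math.cos(h * t), -h * math.sin(h * t)


# Hypothetical registry keyed by `path_type` strings; "VP" is deliberately omitted.
PATHS: Dict[str, Callable[[float], Coeffs]] = {"Linear": linear_path, "GVP": gvp_path}


def interpolate(x_data: List[float], noise: List[float], t: float,
                path_type: str = "Linear") -> Tuple[List[float], List[float]]:
    """Return (x_t, velocity target) for one flattened latent."""
    alpha, sigma, d_alpha, d_sigma = PATHS[path_type](t)
    x_t = [alpha * x + sigma * n for x, n in zip(x_data, noise)]
    # The velocity target is dx_t/dt along the chosen path.
    v_target = [d_alpha * x + d_sigma * n for x, n in zip(x_data, noise)]
    return x_t, v_target


def velocity_loss(v_pred: List[float], v_target: List[float]) -> float:
    # Mean squared error between predicted and target velocity.
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


# Usage: Linear path at t = 0.25 on a toy two-element "latent".
x_t, v = interpolate([0.5, -1.0], [0.2, 0.3], t=0.25, path_type="Linear")
loss = velocity_loss(v, v)  # a perfect prediction gives zero loss
```

For the Linear path the velocity target reduces to `x_data - noise`, which is constant in `t`; the GVP path instead keeps the total variance fixed along the trajectory, so its target changes with `t`.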
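The CFG and EMA entries in the Features list can be illustrated with the standard textbook formulations. This is a hedged sketch, assuming this codebase follows them: during training a random subset of class labels is swapped for a "null" class, at sampling time the conditional and unconditional predictions are combined with `cfg_scale`, and an EMA copy of the weights is updated after each step. `drop_labels`, `cfg_combine`, `ema_update`, the `null_class=1000` id, and the `0.9999` decay default are all illustrative assumptions, not documented parameters of this recipe.

```python
from typing import Dict, List


def drop_labels(labels: List[int], drop_mask: List[bool], null_class: int) -> List[int]:
    # Training side of CFG: masked labels are swapped for a "null" class id so one
    # network learns both class-conditional and unconditional predictions.
    return [null_class if drop else y for y, drop in zip(labels, drop_mask)]


def cfg_combine(v_cond: List[float], v_uncond: List[float], cfg_scale: float) -> List[float]:
    # Sampling side of CFG: start from the unconditional prediction and move
    # cfg_scale times the (conditional - unconditional) difference.
    return [u + cfg_scale * (c - u) for c, u in zip(v_cond, v_uncond)]


def ema_update(ema: Dict[str, float], params: Dict[str, float], decay: float = 0.9999) -> None:
    # In-place EMA of weights: ema <- decay * ema + (1 - decay) * params.
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value


# Usage with toy values; null_class=1000 assumes 1000 ImageNet classes plus one extra id.
labels = drop_labels([3, 7, 9], [False, True, False], null_class=1000)
guided = cfg_combine([1.0, 2.0], [0.0, 0.5], cfg_scale=1.0)  # equals v_cond
ema = {"w": 0.0}
ema_update(ema, {"w": 1.0}, decay=0.5)
```

Under this formulation, the `cfg_scale: 1.0` in the example configuration returns the conditional prediction unchanged, so guidance only has an effect at values above 1.0.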