RAE-SigLip Training

Overview

RAE (Representation AutoEncoder) with SigLip is a vision representation learning model. It is trained with an LPIPS perceptual loss and supports EMA (exponential moving average) of the model weights.

Supported Features

Feature | Support
FSDP2 |
USP |
Muon Optimizer |
Liger Kernel |
Packing |
NSA |
Expert Parallelism |

Highlights: Representation AutoEncoder, LPIPS loss, EMA

Quick Start

See the example run script in the Usage section.

Key Features

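  • LPIPS Loss: Perceptual loss (Learned Perceptual Image Patch Similarity) that compares deep network features rather than raw pixels, for better visual quality of reconstructions

  • EMA: Exponential Moving Average of the model weights, for more stable representations

  • SigLip Encoder: SigLip vision encoder used as the representation backbone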
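
For readers unfamiliar with EMA, the update rule can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not the code used by this example: the function name ema_update, the dict-of-lists parameter layout, and the decay value are all assumptions made for the sketch.

```python
from typing import Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(ema: Params, model: Params, decay: float = 0.999) -> None:
    """Update the EMA copy in place: ema = decay * ema + (1 - decay) * model."""
    for name, weights in model.items():
        shadow = ema[name]
        for i, w in enumerate(weights):
            shadow[i] = decay * shadow[i] + (1.0 - decay) * w


if __name__ == "__main__":
    # Start the EMA copy from the initial model weights.
    model = {"encoder.weight": [1.0, 2.0]}
    ema = {name: list(w) for name, w in model.items()}

    # Pretend one optimizer step changed the live weights.
    model["encoder.weight"] = [2.0, 4.0]

    # The EMA copy moves only a fraction (1 - decay) toward the new weights.
    ema_update(ema, model, decay=0.9)
    print(ema)
```

A decay close to 1 makes the averaged weights change slowly, which is why the EMA copy tends to be smoother than the live weights and is often the one used for evaluation or reconstruction.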

Usage

# Training
bash examples/representation_autoencoder/run.sh

# Reconstruction
python examples/representation_autoencoder/reconstruct.py