
LMMs Engine Documentation

Welcome to the LMMs Engine documentation! LMMs Engine is a flexible and extensible framework for training large multimodal models with support for various model architectures, datasets, and training strategies.

Getting Started

  • Getting Started
    • Introduction
    • Quick Start
    • Train
    • Run
    • Run directly with the CLI and override with Hydra

User Guide

  • Datasets and Packing: Naive vs Streaming
  • Data Preparation
  • Performance Optimization
  • Merging FSDP Checkpoints
  • FSDP2 Mixed Precision: Reduce Dtype Considerations
  • Asynchronous Checkpoint Evaluation During Training

Tutorial

  • Tutorial
    • Adding a Custom Parallel Strategy (Expert Parallel)

Developer Guide

  • Creating New Datasets
  • Creating Custom Data Processors
  • Adding a new model (and performance monkey patches)
  • Adding a new trainer

Reference

  • Main API of this framework
  • Design Principle
  • Video Configuration Guide
  • Model FLOPs Utilization (MFU) Reference

Models

  • BAGEL Model Training Guide
  • Qwen-VL Model Training Guide
  • dLLM (Diffusion Language Model) Training
  • FLA Models (DGN) Training
  • RAE-SigLip Training
  • SiT (Scalable Interpolant Transformer) Training
  • WanVideo Training
  • Qwen2.5 LLM Training
  • Qwen2.5-Omni Training
  • Qwen3-VL MoE Training
  • Qwen3-MoE Training
  • Qwen3-Omni MoE Training

Troubleshooting

  • Troubleshooting

© Copyright 2024, LMMs Engine Contributors.
