Main API of this framework
Base Class
  Base Trainer
  Base Dataset
Mixin Class
  LMMsDataMixin
Processor Class
  AeroDataProcessor
  LLaVADataProcessor
  … (many other processors)
Collator
  Vision Collator (the collator we use in most cases)
Dataset
  Vision Audio
  Vision
Proto
  Data Proto
  LMMs Proto