Method | POPE | HallusionBench | MMBench-EN | MMBench-CN | SEED-I | MMMU | MMVP | GQA | AI2D |
---|---|---|---|---|---|---|---|---|---|
LLaVA-v1.5-7B | 86.2 | 47.5 | 65.5 | 58.5 | 66.0 | 34.4 | 20.0 | 62.0 | 55.4 |
LLaVA-v1.6-7B | 86.5 | 35.8 | 67.4 | 60.1 | 70.2 | 35.8 | 37.8 | 64.2 | 67.1 |
Cambrian-1-8B | 87.4 | 48.7 | 75.9 | 68.9 | 74.7 | 42.7 | 51.3 | 64.6 | 73.0 |
Ross-7B | 88.3 | 57.1 | 79.1 | 77.1 | 73.6 | 46.6 | 56.7 | 65.5 | 79.3 |
LLaVA-v1.5-13B | 82.5 | 44.9 | 68.8 | 63.6 | 68.2 | 36.6 | 31.9 | 63.3 | 60.8 |
LLaVA-v1.6-13B | 86.2 | 36.7 | 70.0 | 64.1 | 71.9 | 36.2 | 35.6 | 65.4 | 72.4 |
Cambrian-1-13B | 85.7 | 54.0 | 75.7 | 65.9 | 74.4 | 40.0 | 41.3 | 64.3 | 73.6 |
Ross-13B | 88.7 | 56.4 | 73.6 | 67.4 | 71.1 | 41.3 | 44.7 | 65.2 | 73.8 |

@article{wang2024ross,
author={Haochen Wang and Anlin Zheng and Yucheng Zhao and Tiancai Wang and Ge Zheng and Xiangyu Zhang and Zhaoxiang Zhang},
title={Reconstructive Visual Instruction Tuning},
journal={arXiv preprint arXiv:2410.09575},
year={2024},
}