LLaVA-NeXT (Stronger) models released

The LLaVA-NeXT (Stronger) models are released, with support for stronger LLMs, including LLaMA-3 (8B) and Qwen-1.5 (72B/110B).

Today, we expand LLaVA-NeXT with recent, stronger open LLMs and report our findings on more capable language models:

Increasing multimodal capabilities with stronger and larger language models, scaling model size by up to 3x. This allows LMMs to exhibit better visual world knowledge and logical reasoning inherited from the LLM. It supports LLaMA-3 (8B) and Qwen-1.5 (72B and 110B).
Better visual chat for more real-life scenarios, covering different applications. To evaluate the improved multimodal capabilities in the wild, we collect and develop new evaluation datasets, LLaVA-Bench (Wilder), which inherit the spirit of LLaVA-Bench (in-the-wild) in studying daily-life visual chat while enlarging the data size for more comprehensive evaluation.
