4 results · 4 benchmarks
Model card

AIMv2 ViT-3B/14 + Llama 3.0 8B.

unknown
§ 02 · Benchmarks

Every benchmark AIMv2 ViT-3B/14 + Llama 3.0 8B has a recorded score for.

# | Benchmark | Area · Task | Metric | Value | Rank | Date | Source
01 | GQA | Multimodal · Visual Question Answering | accuracy | 73.3% | #1/4 | | source ↗
02 | VQA v2.0 | Multimodal · Visual Question Answering | accuracy | 80.9% | #7/16 | | source ↗
03 | DocVQA | Computer Vision · Document Understanding | anls | 30.4% | #21/21 | | source ↗
04 | TextVQA | Multimodal · Visual Question Answering | accuracy | 58.2% | #21/23 | | source ↗
The Rank column shows this model's position among all models scored on the same benchmark and metric; the number after the slash is the size of that comparison field. #1 (shown in red on the site) means current SOTA. Rows are sorted by rank, then by newest result.
§ 03 · Strengths by area

Where AIMv2 ViT-3B/14 + Llama 3.0 8B actually performs.

Multimodal: 3 benchmarks, avg rank #9.7
Computer Vision: 1 benchmark, avg rank #21.0
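The per-area averages can be checked against the ranks in the benchmarks table. The sketch below assumes that "avg rank" is the plain arithmetic mean of the per-benchmark ranks within each area, rounded to one decimal place. The page does not state this definition, but it reproduces the figures shown. The function name `average_rank_by_area` is hypothetical and used only for illustration.

```python
from collections import defaultdict

# (benchmark, area, rank) rows copied from the benchmarks table on this page.
rows = [
    ("GQA", "Multimodal", 1),
    ("VQA v2.0", "Multimodal", 7),
    ("DocVQA", "Computer Vision", 21),
    ("TextVQA", "Multimodal", 21),
]


def average_rank_by_area(rows):
    """Group ranks by area and return the mean rank per area.

    Assumption: the site's "avg rank" is an unweighted mean,
    rounded to one decimal place.
    """
    ranks = defaultdict(list)
    for _benchmark, area, rank in rows:
        ranks[area].append(rank)
    return {area: round(sum(r) / len(r), 1) for area, r in ranks.items()}


summary = average_rank_by_area(rows)
for area, avg in summary.items():
    print(f"{area}: avg rank #{avg}")
```

Under this assumption the Multimodal average is (1 + 7 + 21) / 3, which rounds to 9.7, and the single Computer Vision result gives 21.0. Note that an unweighted mean ignores field size: rank 21 of 23 and rank 21 of 21 count the same.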
§ 04 · Papers

1 paper with results for AIMv2 ViT-3B/14 + Llama 3.0 8B.

  1. 2024-11-21 · 4 results: Multimodal Autoregressive Pre-training of Large Vision Encoders

§ 06 · Sources & freshness

Where these numbers come from.

pwc-dump: 4 results
0 of 4 rows marked verified.