Codesota · Models · BLIP-2 · Salesforce · 3 results · 3 benchmarks
Model card

BLIP-2.

Salesforce · open-source · Unknown params · Frozen image encoder + Q-Former + frozen LLM

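Vision-language pre-training that bootstraps from a frozen image encoder and a frozen large language model, connected by a lightweight Querying Transformer (Q-Former). Language backbones: OPT or FlanT5. Published 2023. Source: arXiv:2301.12597.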
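
To make the "frozen image encoder + Q-Former + frozen LLM" pipeline concrete, here is a minimal pure-Python sketch of the bridging idea. A small set of learned query vectors cross-attends to features from a frozen image encoder, and a linear layer maps the result into the LLM's input-embedding width, producing a fixed-length sequence of soft visual prompt tokens. This is a deliberate simplification that assumes a single cross-attention head. The real Q-Former is a multi-layer transformer with self-attention, cross-attention and its own pre-training objectives. `query_bridge` and all sizes below are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def query_bridge(image_feats, queries, w_q, w_k, w_v, w_proj):
    """Hypothetical single-head cross-attention from learned queries to frozen
    image features, followed by a projection into the LLM embedding width."""
    q = matmul(queries, w_q)      # (n_queries x d_attn)
    k = matmul(image_feats, w_k)  # (n_patches x d_attn)
    v = matmul(image_feats, w_v)  # (n_patches x d_attn)
    scale = 1.0 / math.sqrt(len(q[0]))
    # Score every image patch for every query, then normalize per query.
    scores = matmul(q, [list(col) for col in zip(*k)])  # (n_queries x n_patches)
    weights = [softmax([s * scale for s in row]) for row in scores]
    attended = matmul(weights, v)     # (n_queries x d_attn)
    return matmul(attended, w_proj)   # (n_queries x d_llm)


# Toy sizes (assumptions, far smaller than BLIP-2's real dimensions).
N_PATCHES, D_IMG, N_QUERIES, D_ATTN, D_LLM = 10, 16, 4, 8, 12
rng = random.Random(0)

# Stand-in for the frozen image encoder's output; never updated.
image_feats = random_matrix(N_PATCHES, D_IMG, rng)

# Bridge parameters: in this sketch, the only part that would be trained.
queries = random_matrix(N_QUERIES, D_ATTN, rng)
w_q = random_matrix(D_ATTN, D_ATTN, rng)
w_k = random_matrix(D_IMG, D_ATTN, rng)
w_v = random_matrix(D_IMG, D_ATTN, rng)
w_proj = random_matrix(D_ATTN, D_LLM, rng)

prompt = query_bridge(image_feats, queries, w_q, w_k, w_v, w_proj)
print(len(prompt), "x", len(prompt[0]))  # prints "4 x 12"
```

The point the sketch illustrates is that the output length equals the number of queries rather than the number of image patches. The frozen LLM therefore always receives the same short visual prefix, whatever the image resolution.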

§ 01 · Benchmarks

Every benchmark on which BLIP-2 has a recorded score.

#  | Benchmark     | Area · Task                            | Metric   | Value  | Rank | Date       | Source
01 | COCO Captions | Multimodal · Image Captioning          | CIDEr    | 145.80 | #1/2 | 2023-01-30 | arXiv
02 | VQA v2.0      | Multimodal · Visual Question Answering | accuracy | 82.2%  | #4/7 | 2023-01-30 | arXiv
03 | TextVQA       | Multimodal · Visual Question Answering | accuracy | 42.5%  | #9/9 | 2023-01-30 | arXiv
Rank is this model's position among all models scored on the same benchmark and metric; the number after the slash is the total number of models ranked, BLIP-2 included. #1 marks the current state of the art (SOTA). Rows are sorted by rank, then by newest result.

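
The ranking rule above can be sketched as follows. This assumes higher scores are better (true for CIDEr and accuracy) and that tied models share the better position; Codesota's actual tie handling is not stated, so `rank_label` and `sort_rows` are hypothetical helpers, not the site's code.

```python
def rank_label(model_score, all_scores, higher_is_better=True):
    """Return a "#position/total" label for one model on one benchmark + metric.

    `all_scores` must include `model_score`, so the number after the slash is
    the total count of ranked models. Ties share the better position (assumption).
    """
    if higher_is_better:
        ahead = sum(1 for s in all_scores if s > model_score)
    else:
        ahead = sum(1 for s in all_scores if s < model_score)
    return f"#{ahead + 1}/{len(all_scores)}"


def sort_rows(rows):
    """Order (benchmark, position, "YYYY-MM-DD") rows by rank, then newest date first."""
    return sorted(rows, key=lambda r: (r[1], -int(r[2].replace("-", ""))))


# Illustrative scores only, not real leaderboard data.
print(rank_label(3.0, [5.0, 3.0, 1.0]))  # prints "#2/3"

# Positions and dates taken from the benchmark table.
table = [
    ("TextVQA", 9, "2023-01-30"),
    ("COCO Captions", 1, "2023-01-30"),
    ("VQA v2.0", 4, "2023-01-30"),
]
print([name for name, _, _ in sort_rows(table)])
# prints ['COCO Captions', 'VQA v2.0', 'TextVQA']
```

Under this reading, a "#9/9" entry means BLIP-2 is last of nine ranked models, not ninth of nine competitors plus itself.
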
§ 02 · Strengths by area

How BLIP-2 ranks, grouped by benchmark area.

Multimodal · 3 benchmarks · avg rank #4.7

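
The average rank appears to be the plain mean of the three positions in the benchmark table, (1 + 4 + 9) / 3 ≈ 4.67, shown to one decimal. A minimal check, assuming a simple unweighted mean:

```python
from statistics import mean

# (position, models ranked) per benchmark, copied from the benchmark table.
ranks = {
    "COCO Captions": (1, 2),
    "VQA v2.0": (4, 7),
    "TextVQA": (9, 9),
}

avg_rank = mean(position for position, _ in ranks.values())
print(f"avg rank #{avg_rank:.1f}")  # prints "avg rank #4.7"
```

Note that an unweighted mean of positions ignores field size: #1 of 2 and #9 of 9 enter the average only as 1 and 9.
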
§ 03 · Papers

1 paper with results for BLIP-2.

  1. 2023-01-30 · Multimodal · 3 results

    BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

§ 04 · Related models

Other Salesforce models listed on Codesota.

CoAct-1 · 1 result · 1 SOTA
CodeT5-base · 1 result · 1 SOTA
GTA1 (7B) · 1 result
CodeT5+ · Unknown params · 0 results
CodeT5+ 2B · Unknown params · 0 results
§ 05 · Sources & freshness

Where these numbers come from.

arXiv · 3 results
3 of 3 rows marked verified.