
import torch
from transformers import AutoModel

MODEL_NAME = "BAAI/BGE-VL-base"  # or "BAAI/BGE-VL-large"

# You must set trust_remote_code=True
model = AutoModel.from_pretrained(MODEL_NAME, trust_remote_code=True)
model.set_processor(MODEL_NAME)
model.eval()

with torch.no_grad():
    # Composed query: a reference image plus a text instruction
    query = model.encode(
        images="./assets/cir_query.png",
        text="Make the background dark, as if the camera has taken the photo at night",
    )
    # Candidates are encoded from images only
    candidates = model.encode(
        images=["./assets/cir_candi_1.png", "./assets/cir_candi_2.png"]
    )
    # Similarity of the query against each candidate
    scores = query @ candidates.T

print(scores)
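
The score matrix printed above still has to be turned into a ranking to answer a retrieval query. The following is a minimal sketch of that step in plain Python, not part of the BGE-VL API: `l2_normalize`, `rank_candidates`, and the toy vectors are hypothetical names and made-up values used only for illustration. Whether `model.encode` already returns normalized embeddings is not stated here, so the sketch normalizes explicitly and ranks by cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_candidates(query_vec, candidate_vecs):
    """Return (candidate_index, cosine_score) pairs, best match first."""
    q = l2_normalize(query_vec)
    scored = []
    for idx, cand in enumerate(candidate_vecs):
        c = l2_normalize(cand)
        scored.append((idx, sum(a * b for a, b in zip(q, c))))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, NOT real model outputs)
toy_query = [1.0, 0.0, 1.0]
toy_candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
]

ranking = rank_candidates(toy_query, toy_candidates)
for idx, score in ranking:
    print(f"candidate {idx}: {score:.4f}")
```

With real embeddings, the same ranking could be obtained by sorting each row of `scores` in descending order.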

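- Zero-shot composed image retrieval: BGE-VL sets a new performance bar on zero-shot composed image retrieval. On the CIRCO benchmark, the BGE-VL-base model, despite having only 149 million parameters, outperforms all previous models, including those with 50 times as many parameters. In addition, BGE-VL-MLLM improves on the previous state-of-the-art model by 8.1%.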

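- Zero-shot performance on MMEB: Although trained only under the image-text-to-image paradigm, BGE-VL-MLLM achieves state-of-the-art zero-shot performance on the Massive Multimodal Embedding Benchmark (MMEB). This indicates that MegaPairs generalizes well for multimodal embedding.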


https://github.com/VectorSpaceLab/MegaPairs
https://hf-mirror.com/BAAI/BGE-VL-MLLM-S2
https://arxiv.org/pdf/2412.14475
(Text: PaperAgent)