Why 26 encoders instead of 27?

#2 by niktheod - opened

Hi and thank you for your great work,

I noticed that your vision tower has 26 transformer encoder layers rather than the 27 in siglip-so400m-patch14-384. Could you explain this discrepancy, please?
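
For reference, the depth of the original SigLIP checkpoint can be read directly from its config (a minimal sketch with `transformers`; the commented-out repo id is a placeholder for this model, and whether its vision tower config is exposed the same way is an assumption):

```python
from transformers import AutoConfig

# Depth of the reference SigLIP checkpoint mentioned above.
ref = AutoConfig.from_pretrained("google/siglip-so400m-patch14-384")
print(ref.vision_config.num_hidden_layers)  # -> 27

# Hypothetical: if this model's config exposes the vision tower the same way,
# the same field would show the truncated depth (26).
# vlm = AutoConfig.from_pretrained("<this-repo-id>")
# print(vlm.vision_config.num_hidden_layers)
```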