Please add multimodal projections

#2
by arbv - opened

The mmproj* files seem to be missing. Or are they omitted on purpose?

Unsloth AI org

We're working on it!

arbv changed discussion title from Please add multimedia projections. to Please add multimodal projections


I'm confused. Isn't the whole point of the new unified architecture that the mmproj is not needed anymore?

There are no separate encoders for vision/audio, so the mmproj will be very small, but there are still specific tensors that are shipped via the mmproj. See https://huggingface.co/ggml-org/gemma-4-12B-it-GGUF/tree/main

Okay, the Unsloth mmproj is uploaded, but these files are also hanging on load for me.

Getting "unknown projector type: gemma4uv"... should we wait for a llama.cpp update?

Support is now fully in llama.cpp, but naturally you need to be running a recent enough version. They also just pushed a fix for the long model loading times.

Unsloth AI org

Hey guys, vision and audio work!
@arbv @BladeRunner3000 @Dampfinchen @G3d @Dev9124 @JohnUser

Vision (images) works great, no issues. I'm using Gemma 4 12B Q6_K_XL and Q8_0 with the BF16 mmproj file.

I'm having some issues getting audio (MP3s) working. I'm using llama.cpp version 9496 (94a220cd6), just rebuilt. One file works; it is about 1.4 MB. I tried three other MP3 files over 2 MB, and all of them give a really weird output of "<|channel>thought" repeating. These MP3 files work properly on a different (older) llama.cpp instance with Gemma 4 E4B.

I can confirm a similar issue with audio (MP3s) in the current Gemma 4 12B GGUF when running through llama.cpp (b9505).

The model works normally if I do not enable --jinja, but when --jinja is enabled, the output enters a repeated loop of:

<|channel|>thought
<|channel|>thought
<|channel|>thought
...

So this looks like a chat template / special token / EOG metadata issue rather than a general model loading failure.

Current workaround:

Do not enable --jinja for now.

Observed status:

Text generation: works
MTP draft: works
Image/mmproj: loads and responds
Audio: works
Jinja template: triggers <|channel|>thought loop

It may be worth checking whether the embedded chat template, control tokens, and EOG tokens are correctly exported in the GGUF metadata.
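One way to do that check without loading the model is to read the GGUF metadata directly. Below is a minimal, self-contained sketch that parses the key/value header of a GGUF v2/v3 file, assuming the little-endian layout from the GGUF spec, and reports whether a chat template is embedded, which tokens the EOS/EOT/EOM ids point to, and whether any `<...channel...>` tokens are typed as control tokens. The key names (`tokenizer.chat_template`, `tokenizer.ggml.eos_token_id`, `tokenizer.ggml.token_type`, etc.) follow llama.cpp's conventions; the helper names and the "channel" filter are illustrative only, not part of any library. It is not a substitute for llama.cpp's own `gguf` Python package.

```python
import struct
from typing import Any, BinaryIO, Dict

# GGUF scalar value-type codes -> struct formats (little-endian, per the GGUF spec).
_SCALARS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9
CONTROL = 3  # llama.cpp token type for control tokens


def _read(f: BinaryIO, fmt: str) -> Any:
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_str(f: BinaryIO) -> str:
    # GGUF strings: uint64 byte length followed by UTF-8 bytes (no terminator).
    n = _read(f, "<Q")
    return f.read(n).decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_str(f)
    if vtype == _ARRAY:
        # Arrays: uint32 element type, uint64 count, then the elements.
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def parse_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Parse only the metadata key/value section; tensor data is never read."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"GGUF v{version} is not handled by this sketch")
    _tensor_count = _read(f, "<Q")
    kv_count = _read(f, "<Q")
    meta: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_str(f)
        meta[key] = _read_value(f, _read(f, "<I"))
    return meta


def read_gguf_metadata(path: str) -> Dict[str, Any]:
    with open(path, "rb") as f:
        return parse_gguf_metadata(f)


def summarize_template_tokens(meta: Dict[str, Any]) -> Dict[str, Any]:
    """Report the chat template presence, end-of-generation ids, and channel token types."""
    tokens = meta.get("tokenizer.ggml.tokens", [])
    types = meta.get("tokenizer.ggml.token_type", [])
    summary: Dict[str, Any] = {"has_chat_template": "tokenizer.chat_template" in meta}
    for name in ("eos_token_id", "eot_token_id", "eom_token_id"):
        tid = meta.get(f"tokenizer.ggml.{name}")
        text = tokens[tid] if tid is not None and tid < len(tokens) else None
        summary[name] = (tid, text)
    # True if the token is typed CONTROL; None if no type information is present.
    summary["channel_tokens"] = {
        tok: (types[i] == CONTROL if i < len(types) else None)
        for i, tok in enumerate(tokens)
        if tok.startswith("<") and "channel" in tok
    }
    return summary
```

Usage would be something like `print(summarize_template_tokens(read_gguf_metadata("model.gguf")))`, where `model.gguf` is a placeholder path. If the channel tokens are not typed as control tokens, or none of the EOS/EOT/EOM ids maps to the token that ends a turn, that would be consistent with the `--jinja` loop reported above, though it would not prove the cause on its own.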
