Hacker News

Ollama 0.4 is released with support for Meta's Llama 3.2 Vision models locally (ollama.com)

Patrick_Devine · 4 hours ago

This was a pretty heavy lift for us to get out, which is why it took a while. In addition to writing new image processing routines, a vision encoder, and cross-attention support, we also ended up re-architecting the way the models get run by the scheduler. We'll have a technical blog post soon about all the stuff that ended up changing.

csomar · 6 minutes ago

Any info on when we will get the 11B and 90B models?

zozbot234 · 2 hours ago

How long until Vulkan Compute support is merged into ollama? There is an active pull request at https://github.com/ollama/ollama/pull/5059 but it seems to be stalled with no reviews.

exe34 · 4 hours ago

did you feed back into llama.cpp?

also, can it do grounding like cogvlm?

either way, great job!

Patrick_Devine · 4 hours ago

It's difficult because we actually ditched a lot of the C++ code with this change and rewrote it in Go. Specifically server.cpp has been excised (which was deprecated by llama.cpp anyway), and the image processing routines are all written in Go as well. We also bypassed clip.cpp and wrote our own routines for the image encoder/cross-attention (using GGML).

The hope is to be able to get more multimodal models out soon. I'd like to see if we can get Pixtral and Qwen2.5-vl in relatively soon.

qrios · 29 minutes ago

> Specifically server.cpp has been excised (which was deprecated by llama.cpp anyway)

Is there any more specific info available about who (llama.cpp or Ollama) removed what, where? As far as I can see, the server is still part of llama.cpp.

And more generally: Is this the moment when Ollama and Llama part ways?

o11c · 3 hours ago

Did they fix multiline editing yet? Any interactive input that wraps across 3+ lines seems to become off-by-one when editing (but fine if you only append?), and this will be only more common with long filenames being added. And triple-quote breaks editing entirely.

How does this address the security concern of filenames being detected and read when not wanted?

[deleted] · 2 hours ago

inasring · 4 hours ago

Can it run the quantized models?

fallingsquirrel · 3 hours ago

vasilipupkin · 4 hours ago

how likely is it to run on a reasonably new Windows laptop?

ac29 · 3 hours ago

With 16GB of RAM these vision models will run. How quickly depends on a lot of factors.
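For anyone wanting to try it, the release works through the usual Ollama CLI. A minimal usage sketch, assuming Ollama 0.4+ is installed and that the model is published under the `llama3.2-vision` name in the Ollama library (the default tag is the 11B variant; images are referenced by local path inside the interactive prompt):

```shell
# Pull and start the 11B vision model (downloads the weights on first run).
ollama run llama3.2-vision

# Then, at the interactive prompt, pass an image by path, e.g.:
# >>> What is in this image? ./photo.jpg
```

Quantized weights keep the 11B model within reach of a 16GB machine, though generation speed will depend heavily on whether a GPU is available.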

hn-front (c) 2024 voximity