Hacker News

prestoj

Playing with Vision Embeddings (prestonbjensen.com)

markusMB 3 hours ago

Beautiful illustrations, I find. 'Playing' is just the free and motivated version of 'exploration'.

One thought on your nicely illustrated "key observation [is] that neural networks tend to place features along directions": my guess is that the neural net was TOLD to behave that way by the choice of, e.g., a cosine loss?
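
To make that guess concrete, here is a minimal sketch of why a cosine-based objective only "sees" direction: scaling a vector leaves its cosine similarity unchanged. The three vectors are made-up illustrative values, not embeddings from the article, and whether the model in the article was trained with such a loss is an assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d "embeddings", invented for illustration only.
feature = [1.0, 2.0, 0.5]
scaled = [10.0 * x for x in feature]  # same direction, 10x the length
other = [-2.0, 1.0, 0.0]              # orthogonal to `feature`

same_direction = cosine_similarity(feature, scaled)
orthogonal = cosine_similarity(feature, other)
```

Since vector length drops out of the score, a model trained against it has no incentive to encode information in magnitude, which would be consistent with features ending up as directions.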

archermarks an hour ago

Nice article! The generated images make me so nostalgic for the early days of AI image generation. DeepDream and others had such uncanny, interesting generations.

RealityVoid 2 hours ago

For some reason, the uncanniness of the feature pictures is deeply unsettling for me. It just stirs intense unease. A bit amusing, to be honest.

jcattle 5 hours ago

Very nice visualizations, thanks for that!

One thing I still struggle with in my head is how these vision embeddings can then be used to give LLMs eyes.

Because you somehow need a giant training set which describes images in natural language, no? Is that actually how it works, or is there some smart trick so you don't need to pay labellers a bunch of money to look at pictures and describe them?

dilyevsky 4 hours ago

> Because you somehow need a giant training set which describes images in natural language, no?

That's definitely one way - they train a text encoder together with an image encoder on a labelled set of images. WL & 3b1b made a nice video on it: https://www.youtube.com/watch?v=iv-5mZ_9CPY
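
One common form of this joint training is a CLIP-style contrastive objective: within a batch, each image embedding should score highest against its own caption's embedding, and vice versa. Below is a minimal sketch of that loss in plain Python. The function names, the toy 2-d embeddings, and the temperature of 0.07 are assumptions for illustration; real systems compute this on encoder outputs with a tensor library.

```python
import math

def normalize(v):
    """Scale v to unit length so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross_entropy(logits, target):
    """Negative log-softmax of the target entry (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric image-to-text and text-to-image loss; pair i matches pair i."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: scaled similarity between image i and caption j.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Each image should pick out its own caption (rows) ...
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # ... and each caption should pick out its own image (columns).
    loss_t2i = sum(
        cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2

# Toy batch of two image/caption pairs (hypothetical values).
image_embs = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[0.9, 0.1], [0.1, 0.9]]
shuffled_text = [[0.1, 0.9], [0.9, 0.1]]

loss_matched = contrastive_loss(image_embs, matched_text)
loss_shuffled = contrastive_loss(image_embs, shuffled_text)
```

The loss is small when captions line up with their images and large when they are swapped, so minimizing it pulls matching image and text embeddings toward the same direction in a shared space.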

jcattle 4 hours ago

Thanks, I'll check out that video.

