Hacker News

pedriquepacheco
So you wanna build a local RAG? blog.yakkomajuri.com

simonw 30 minutes ago

My advice for building something like this: don't get hung up on a need for vector databases and embeddings.

Full text search or even grep/rg are a lot faster and cheaper to work with - no need to maintain a vector database index - and turn out to work really well if you put them in some kind of agentic tool loop.

The big benefit of semantic search was that it could handle fuzzy searching - returning results that mention dogs if someone searches for canines, for example.

Give a good LLM a search tool and it can come up with searches like "dog OR canine" on its own - and refine those queries over multiple rounds of searches.

Plus it means you don't have to solve the chunking problem!
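
A minimal sketch of that kind of loop, assuming a SQLite FTS5 table docs(title, body) and an llm_next_step stand-in for whatever model call you're making (both names are illustrative, not any specific library's API):

    # Full-text search as the only retrieval tool inside an agentic loop;
    # no embeddings, no vector index. docs.db and its schema are assumptions.
    import sqlite3

    db = sqlite3.connect("docs.db")

    def search(query: str, limit: int = 5) -> list[tuple[str, str]]:
        """Search an FTS5 table docs(title, body); best matches first."""
        rows = db.execute(
            "SELECT title, snippet(docs, 1, '[', ']', '...', 32) "
            "FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        )
        return rows.fetchall()

    def answer(question: str, max_rounds: int = 3) -> str:
        """Let the model issue FTS queries (e.g. 'dog OR canine'), look at
        the hits, and refine over a few rounds before answering."""
        context: list[tuple[str, str]] = []
        query = question                # first query is just the question
        for _ in range(max_rounds):
            context += search(query)
            # llm_next_step is a stub for your model call: it returns either
            # a refined query or a final answer grounded in `context`.
            step = llm_next_step(question, context)
            if step["type"] == "answer":
                return step["text"]
            query = step["query"]       # e.g. the model broadens to synonyms
        return llm_next_step(question, context, force_answer=True)["text"]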

leetrout 5 minutes ago

Simon, have you ever given a talk or written about this sort of pragmatism? A spin on how to achieve this with Datasette is an easy thing to imagine, IMO.

mips_avatar 38 minutes ago

One thing I didn’t see here that might be hurting your performance is a lack of semantic chunking. It sounds like you’re embedding entire docs, which kind of breaks down if the docs contain multiple concepts. A better approach for recall is to use some kind of chunking program to get semantic chunks (I like spaCy, though you have to configure it a bit). Then, once you have your chunks, you need to prepend context describing how each chunk relates to the rest of the doc before you embed it. I have found Anthropic's approach to contextual retrieval (https://www.anthropic.com/engineering/contextual-retrieval) to be very performant in my RAG systems; you can just use gpt-oss-20b as the model for the context generation.

Unless I’ve misunderstood your post and you are already doing some form of this in your pipeline, you should see a dramatic improvement in performance once you implement it.
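
A rough sketch of the chunk-then-contextualize step, assuming spaCy for sentence-level chunking and generate/embed as stand-ins for whatever local models you run; the prompt wording and chunk size are illustrative, not Anthropic's exact recipe:

    # Chunk each doc on sentence boundaries, then prepend a short generated
    # "where does this chunk fit in the doc" context before embedding.
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def semantic_chunks(text: str, max_chars: int = 1200) -> list[str]:
        """Greedy sentence-based chunking: never split mid-sentence."""
        chunks, current = [], ""
        for sent in nlp(text).sents:
            if current and len(current) + len(sent.text) > max_chars:
                chunks.append(current.strip())
                current = ""
            current += sent.text + " "
        if current.strip():
            chunks.append(current.strip())
        return chunks

    CONTEXT_PROMPT = (
        "Here is a document:\n{doc}\n\n"
        "Here is a chunk from it:\n{chunk}\n\n"
        "Write one or two sentences situating this chunk within the document."
    )

    def index_document(doc: str) -> list[tuple[str, list[float]]]:
        indexed = []
        for chunk in semantic_chunks(doc):
            # generate() is a stand-in for your local LLM (e.g. gpt-oss-20b);
            # embed() for your embedding model. Embedding context + chunk is
            # what lets the chunk be retrieved by whole-document concepts.
            context = generate(CONTEXT_PROMPT.format(doc=doc, chunk=chunk))
            indexed.append((chunk, embed(context + "\n" + chunk)))
        return indexed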

barbazoo 8 minutes ago

> What that means is that when you're looking to build a fully local RAG setup, you'll need to substitute whatever SaaS providers you're using for a local option for each of those components.

Even starting with having "just" the documents and the vector db locally is a huge first step, and much more doable than going with a local LLM at the same time. I don't know anyone, or any org, that has the resources to run their own LLM at scale.
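
A minimal sketch of that halfway-local setup, assuming sentence-transformers for local embeddings, brute-force cosine similarity in place of a dedicated vector database, and hosted_llm as a stand-in for the one remaining SaaS call:

    # Documents and the "index" stay local; only the final generation step
    # calls a hosted model.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")    # small local embedder

    docs = ["...your documents here..."]               # loaded from disk in practice
    doc_vecs = model.encode(docs, normalize_embeddings=True)

    def retrieve(question: str, k: int = 3) -> list[str]:
        q = model.encode([question], normalize_embeddings=True)[0]
        scores = doc_vecs @ q                          # cosine sim (vectors are normalized)
        return [docs[i] for i in np.argsort(-scores)[:k]]

    def answer(question: str) -> str:
        context = "\n\n".join(retrieve(question))
        # hosted_llm is a stand-in for whichever SaaS completion API you keep;
        # only this call leaves your machine.
        return hosted_llm(f"Answer using this context:\n{context}\n\nQ: {question}")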

nilirl 30 minutes ago

Why is it implicit that semantic search will outperform lexical search?

Back in 2023 when I compared semantic search to lexical search (tantivy; BM25), I found the search results to be marginally different.

Even if semantic search gives slightly better recall, does the context-retrieval problem warrant this multi-component, homebrew search-engine approach?

By what important measure does it outperform a lexical search engine? Is the engineering time worth it?
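
One way to make that comparison concrete is to measure how much the top-k results from BM25 and from an embedding model actually overlap on your own queries. A sketch, using rank_bm25 in place of tantivy and sentence-transformers for the embeddings (the corpus and queries are placeholders):

    # Average top-k overlap between lexical (BM25) and semantic retrieval.
    import numpy as np
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer

    corpus = ["...your documents..."]
    queries = ["...your real user queries..."]

    bm25 = BM25Okapi([d.lower().split() for d in corpus])
    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = model.encode(corpus, normalize_embeddings=True)

    def topk_overlap(query: str, k: int = 5) -> float:
        lexical = set(np.argsort(-bm25.get_scores(query.lower().split()))[:k])
        q = model.encode([query], normalize_embeddings=True)[0]
        semantic = set(np.argsort(-(doc_vecs @ q))[:k])
        return len(lexical & semantic) / k

    print(sum(topk_overlap(q) for q in queries) / len(queries))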
