Hacker News

ftonato

Show HN: A production-style recommender using vector retrieval and re-ranking

I’ve been exploring how recommendation systems are actually implemented in production, beyond just training models.

A common pattern I kept seeing is to split the problem into two stages:

1. Retrieve a small set of relevant candidates

2. Re-rank them using a model

I built a small prototype around this idea, so the model only scores a short list of candidates instead of running brute-force inference across every item.

The flow looks like this:

- Store embeddings in a vector database (ChromaDB)

- Retrieve the top-K most similar items (or users) by vector similarity

- Run a TensorFlow.js model to re-rank the candidates
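
To make the flow above concrete, here is a minimal sketch of the two stages in TypeScript. It is not the prototype's code: the vector database and the TensorFlow.js model are replaced by in-memory stand-ins, and every name in it (`Item`, `retrieveTopK`, `rerank`, the 50/50 `blend` score, the sample catalog) is an assumption made up for illustration.

```typescript
// Sketch only: in-memory stand-ins for the vector store and the model.
// Item, retrieveTopK, rerank and the 50/50 blend are hypothetical names.
type Item = { id: string; embedding: number[]; popularity: number };
type Candidate = { item: Item; similarity: number };

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 1: cheap retrieval. Brute force here; a vector database would
// answer the same question from an index instead of scanning everything.
function retrieveTopK(query: number[], items: Item[], k: number): Candidate[] {
  return items
    .map((item) => ({ item, similarity: cosine(query, item.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}

// Stage 2: the more expensive scorer runs only on the K survivors.
function rerank(candidates: Candidate[], score: (c: Candidate) => number): Item[] {
  return candidates
    .map((c) => ({ item: c.item, score: score(c) }))
    .sort((x, y) => y.score - x.score)
    .map((s) => s.item);
}

const catalog: Item[] = [
  { id: "a", embedding: [1, 0], popularity: 0.1 },
  { id: "b", embedding: [0.9, 0.1], popularity: 0.9 },
  { id: "c", embedding: [0, 1], popularity: 1.0 },
  { id: "d", embedding: [0.5, 0.5], popularity: 0.2 },
];

// Stand-in for model inference: blend similarity with a popularity feature.
const blend = (c: Candidate): number => 0.5 * c.similarity + 0.5 * c.item.popularity;

const ranked = rerank(retrieveTopK([1, 0], catalog, 2), blend).map((i) => i.id);
console.log(ranked); // "b" ends up above "a"; "c" is never scored at all
```

With K = 2, retrieval keeps `a` and `b`, and the re-ranker then moves `b` above `a`. Item `c` has the highest popularity but never reaches the scorer, which is exactly the risk of setting K too low.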

The goal is to reduce the search space before applying inference, which seems necessary when latency and scale matter.

What I found interesting is that once you move to this approach, a lot of the complexity shifts from the model itself to the retrieval layer:

- choosing K

- filtering candidates

- embedding quality

- latency vs recall trade-offs
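
One way to put numbers on the K and recall questions is to measure recall@K offline against items known to be relevant. The sketch below assumes you have such a ground-truth set (for example, items a user later clicked); `recallAtK` and the sample ids are hypothetical.

```typescript
// Sketch only: recallAtK and the sample data are hypothetical.
// Recall@K = share of the known-relevant items that appear in the top K.
function recallAtK(retrieved: string[], relevant: Set<string>, k: number): number {
  if (relevant.size === 0) return 0;
  const hits = retrieved.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / relevant.size;
}

const retrievedIds = ["a", "b", "c", "d", "e"]; // ordered by similarity
const relevantIds = new Set(["b", "e", "z"]);   // assumed ground truth

// Sweep K to see where recall stops improving.
const sweep = [1, 2, 5].map((k) => ({
  k,
  recall: recallAtK(retrievedIds, relevantIds, k),
}));
console.log(sweep);
```

In this toy data recall goes from 0 at K = 1 to 1/3 at K = 2 and 2/3 at K = 5, and it cannot go higher because `z` is never retrieved. When the curve flattens like that, a larger K only adds re-ranking latency, and the remaining gap is an embedding or filtering problem rather than a K problem.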

Curious how others approach this in real systems:

- How do you decide on K?

- Do you rely purely on vector similarity or add heuristics?

- How do you handle re-ranking at scale?