Hacker News

elawler24
Show HN: Velvet – Store OpenAI requests in your own DB (usevelvet.com)

Hey HN! We’re Emma and Chris, founders of Velvet (https://www.usevelvet.com).

Velvet proxies OpenAI calls and stores the requests and responses in your PostgreSQL database. That way, you can analyze logs with SQL (instead of a clunky UI). You can also set headers to add caching and metadata (for analysis).
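
For example, once the logs land in Postgres, a per-user cost rollup is a single query. A rough sketch in Python (the table and column names here are illustrative placeholders, not the exact schema):

  import psycopg2

  # Daily token spend per user, assuming a JSONB "response" column and a
  # "metadata" column on the logs table. Adjust names to the real schema.
  conn = psycopg2.connect("postgresql://user:pass@localhost:5432/llm_logs")
  with conn, conn.cursor() as cur:
      cur.execute(
          """
          SELECT metadata->>'user_id' AS user_id,
                 date_trunc('day', created_at) AS day,
                 sum((response->'usage'->>'total_tokens')::int) AS total_tokens
          FROM llm_logs
          GROUP BY 1, 2
          ORDER BY day DESC, total_tokens DESC;
          """
      )
      for user_id, day, total_tokens in cur.fetchall():
          print(user_id, day, total_tokens)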

Backstory: We started by building some more general AI data tools (like a text-to-SQL editor). We were frustrated by the lack of basic LLM infrastructure, so we ended up pivoting to focus on the tooling we wanted. Existing apps like Helicone were hard for us to use as power users. We just wanted a database.

Scale: We’ve already warehoused 50M requests for customers, and have optimized the platform for scale and latency. The proxy is built on Cloudflare Workers, and the added latency is minimal. We’ve also built some complex “yak shaving” features, such as decomposing OpenAI Batch API requests so you can track each log individually. One of our early customers (https://usefind.ai/) makes millions of OpenAI requests per day, up to 1,500 requests per second.

Vision: We’re trying to build development tools that have as little UI as possible and can be controlled entirely with headers and code. We also want to blend cloud and on-prem for the best of both worlds, allowing for both automatic updates and complete data ownership.

Here are some things you can do with Velvet logs:

- Observe requests, responses, and latency

- Analyze costs by metadata, such as user ID

- Track batch progress and speed

- Evaluate model changes

- Export datasets for fine-tuning gpt-4o-mini

(this video shows how to do each of those: https://www.youtube.com/watch?v=KaFkRi5ESi8)

--

To see how it works, try chatting with our demo app that you can use without logging in: https://www.usevelvet.com/sandbox

Setting up your own proxy is 2 lines of code and takes ~5 mins.
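
Roughly, it’s just pointing the OpenAI SDK at the proxy and adding an auth header. A sketch (the URL and header name below are placeholders; see the docs for the real values):

  # Route OpenAI calls through the proxy by overriding the base URL.
  # The URL and header below are placeholders, not the real values.
  from openai import OpenAI

  client = OpenAI(
      base_url="https://proxy.example.com/v1",
      default_headers={"x-proxy-api-key": "YOUR_KEY"},
  )

  resp = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "Hello!"}],
  )
  print(resp.choices[0].message.content)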

Try it out and let us know what you think!


DeveloperErrata 4 days ago

Seems neat - I'm not sure if you do anything like this, but one thing that would be useful with RAG apps (esp. at big scale) is vector-based search over cache contents. What I mean is that users can phrase the same question (which has the same answer) in tons of different ways. If I could pass a raw user query into your cache and get back the end result for a previously computed query (even if the current phrasing is a bit different from the cached phrasing), then not only would I avoid having to submit a new OpenAI call, but I could also avoid having to run my entire RAG pipeline. So it's kind of like a "meta-RAG" system that avoids having to run the actual RAG system for queries that are sufficiently similar to a cached query, or an "approximate" cache.

davidbarker 4 days ago

I was impressed by Upstash's approach to something similar with their "Semantic Cache".

https://github.com/upstash/semantic-cache

  "Semantic Cache is a tool for caching natural text based on semantic similarity. It's ideal for any task that involves querying or retrieving information based on meaning, such as natural language classification or caching AI responses. Two pieces of text can be similar but not identical (e.g., "great places to check out in Spain" vs. "best places to visit in Spain"). Traditional caching doesn't recognize this semantic similarity and misses opportunities for reuse."

OutOfHere 4 days ago

I strongly advise not relying on embedding distance alone for it because it'll match these two:

1. great places to check out in Spain

2. great places to check out in northern Spain

Logically the two are not the same, and they could in fact be very different despite their semantic similarity. Your users will be frustrated and will hate you for it. If an LLM validates the two as being the same, then it's fine, but not otherwise.

DeveloperErrata 4 days ago

I agree, a naive approach to approximate caching would probably not work for most use cases.

I'm speculating here, but I wonder if you could use a two-stage pipeline for cache retrieval (kinda like the distance search + reranker model technique used by lots of RAG pipelines). Maybe it would be possible to fine-tune a custom reranker model to only output True if two queries are semantically equivalent rather than just similar. So the hypothetical model would output True for "how to change the oil" vs. "how to replace the oil" but would output False in your Spain example. In this case you'd do distance-based retrieval first using the normal vector DB techniques, and then use your custom reranker to validate that the potential cache hits are actual hits.
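
Something like this, as a rough sketch (embed(), index.nearest(), and judge_equivalent() are hypothetical stand-ins, not a real library):

  # Two-stage "approximate" cache: nearest-neighbour lookup on embeddings,
  # then a stricter equivalence check before returning the cached answer.
  def cached_answer(query, index, cache, threshold=0.85):
      q_vec = embed(query)                      # any embedding model
      hit = index.nearest(q_vec)                # (cached_query, similarity) or None
      if hit:
          cached_query, similarity = hit
          # Only accept the hit if a reranker/LLM judges the two queries
          # semantically *equivalent*, not merely similar.
          if similarity >= threshold and judge_equivalent(query, cached_query):
              return cache[cached_query]
      return None  # miss: run the full RAG pipeline, then store the result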

OutOfHere 4 days ago

Any LLM can output that judgment, but yes, a tuned LLM can benefit from a shorter prompt.

jankovicsandras 3 days ago

A hybrid search approach might help, like combining vector similarity scores with e.g. BM25 scores.

Shameless plug (FOSS): https://github.com/jankovicsandras/plpgsql_bm25 (Okapi BM25 search implemented in PL/pgSQL for Postgres).
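
For illustration, score fusion can be as simple as a weighted sum of min-max normalized BM25 and vector scores (a sketch; the weighting is arbitrary):

  # Fuse BM25 and vector-similarity scores with a weighted sum.
  # Both inputs are dicts of doc_id -> raw score.
  def normalize(scores):
      lo, hi = min(scores.values()), max(scores.values())
      span = (hi - lo) or 1.0
      return {doc: (s - lo) / span for doc, s in scores.items()}

  def hybrid_rank(bm25_scores, vector_scores, alpha=0.5):
      bm25, vec = normalize(bm25_scores), normalize(vector_scores)
      docs = set(bm25) | set(vec)
      fused = {d: alpha * bm25.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
               for d in docs}
      return sorted(docs, key=fused.get, reverse=True)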

OutOfHere 4 days ago

That would totally destroy the user experience. Users change their query so they can get a refined result, not so they get the same tired result.

pedrosorio 4 days ago

Even across users it’s a terrible idea.

Even in the simplest of applications where all you’re doing is passing “last user query” + “retrieved articles” into OpenAI (and nothing else that differs between users, like previous queries or user data that may be necessary to answer), this will be a bad experience in many cases.

Queries A and B may have similar embeddings (similar topic) and it may be correct to retrieve the same articles for context (which you could cache), but they can still be different questions with different correct answers.

elawler24 (op) 4 days ago

Depends on the scenario. In a threaded query, or multiple queries from the same user, you’d want different outputs. If 20 different users are looking for the same result, a cache would return the right answer immediately at no marginal cost.

OutOfHere 4 days ago

That's not the use case of the parent comment:

> for queries that are sufficiently similar

elawler24 (op) 4 days ago

Thanks for the detail! This is a use case we plan to support, and it will be configurable (for when you don’t want it). Some of our customers run into this when different users ask a similar query - “NY-based consumer founders” vs “consumer founders in NY”.

OutOfHere 4 days ago

A cache is better when it's local rather than on the web. And I certainly don't need to pay anyone to cache local request responses.

knowaveragejoe 4 days ago

How would one achieve something similar locally, short of just running a proxy and stuffing the request/response pairs into a DB? I'm sure it wouldn't be too terribly hard to write something, but I figure something open source already exists for OpenAI-compatible APIs.
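
The DIY version would be roughly this, as a sketch (Flask + SQLite; streaming responses not handled):

  # A minimal local logging proxy for an OpenAI-compatible API.
  # Forwards the request, then stores the request/response JSON in SQLite.
  import json, sqlite3, requests
  from flask import Flask, jsonify, request

  app = Flask(__name__)
  db = sqlite3.connect("llm_logs.db", check_same_thread=False)
  db.execute("CREATE TABLE IF NOT EXISTS logs "
             "(ts DATETIME DEFAULT CURRENT_TIMESTAMP, request TEXT, response TEXT)")

  UPSTREAM = "https://api.openai.com"

  @app.route("/v1/<path:path>", methods=["POST"])
  def proxy(path):
      body = request.get_json(force=True)
      upstream = requests.post(
          f"{UPSTREAM}/v1/{path}",
          headers={"Authorization": request.headers.get("Authorization", ""),
                   "Content-Type": "application/json"},
          json=body,
          timeout=120,
      )
      db.execute("INSERT INTO logs (request, response) VALUES (?, ?)",
                 (json.dumps(body), upstream.text))
      db.commit()
      return jsonify(upstream.json()), upstream.status_code

  if __name__ == "__main__":
      app.run(port=8080)  # point the OpenAI SDK base_url at http://localhost:8080/v1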

w-ll 4 days ago

Recently did this workflow.

Started with an nginx proxy with rules to cache based on url/params. Wanted more control over it and explored Lua/Redis APIs, then opted to build an app to be a little smarter about what I wanted. The extra EC2 cost is negligible compared to the cache savings.

doubleorseven 4 days ago

Yes! It's amazing how many things you can do with lua in nginx. I had a server that served static websites where the files and the certificates for each website were stored in a bucket. Over 20k websites with 220ms overhead if the certificate wasn't cached.

OutOfHere 4 days ago

There are any number of databases and language-specific caching libraries. Neither a custom solution nor a proxy is necessary.

nemothekid 4 days ago

As I understand it, your data remains local, as it leverages your own database.

manojlds 4 days ago

Why do I even have to use this SaaS? This should be an open source lib, or just a practice that I implement myself.

dsmurrell 4 days ago

Implement it yourself then and save your $$ at the expense of your time.

torlok 3 days ago

If you factor in dealing with somebody's black box code 6 months into a project, you'll realise you're saving both money and time.

OutOfHere 3 days ago

It's not as complicated as you make it out to be. There are numerous caching libraries, and databases have been a thing for decades.

manojlds 3 days ago

This is not a big thing to implement, that's my point. There are already libraries like OpenLLMetry that sink to a DB. We are doing something like this already.

nemothekid 3 days ago

Yes, the ol' Dropbox "you can already build such a system yourself quite trivially by getting an FTP account" comment. Even after 17 years, people still feel the need to make this point.

heavensteeth 4 days ago

So they can charge you for it.

angoragoats 3 days ago

I don't understand the problem that's being solved here. At the scale you're talking about (e.g. millions of requests per day with FindAI), why would I want to house immutable log data inside a relational database, presumably alongside actual relational data that's critical to my app? It's only going to bog down the app for my users.

There are plenty of other solutions (examples include Presto, Athena, Redshift, or straight up jq over raw log files on disk) which are better suited for this use case. Storing log data in a relational DB is pretty much always an anti-pattern, in my experience.

philip1209 3 days ago

Philip here from Find AI. We store our Velvet logs in a dedicated DB. It's Postgres now, but we will probably move it to ClickHouse at some point. Our main app DB is also Postgres, so everybody already knows how it works and all of our existing BI tools support it.

Here's a video about what we do with the data: https://www.youtube.com/watch?v=KaFkRi5ESi8

elawler24 (op) 3 days ago

It's a standalone DB, just for LLM logging. Since it's your DB, you can configure data retention and migrate data to an analytics DB / warehouse if cost or latency becomes a concern. And we're happy to support whatever DB you require (ClickHouse, BigQuery, Snowflake, etc.) in a managed deployment.

angoragoats 3 days ago

I guess I should have elaborated to say that even if you're spinning up a new database expressly for this purpose (which I didn't see specifically called out in your docs anywhere as a best practice), you're starting off on the wrong foot. Maybe I'm old-school, but relational databases should be for relational data. This data isn't relational, it's write-once log data, and it belongs in files on disk, or in purpose-built analytics tools, if it gets too large to manage.

elawler24 (op) 3 days ago

Got it. We can store logs to your purpose-built analytics DB of choice.

PostgreSQL (Neon) is our free self-serve offering because it’s easy to spin up quickly.

phillipcarter 4 days ago

Congrats on the launch! I love the devex here and things you're focusing on.

Have you had thoughts on how you might integrate data from an upstream RAG pipeline, say as part of a distributed trace, to aid in debugging the core "am I talking to the LLM the right way" use case?

elawler24 (op) 4 days ago

Thanks! You can layer on as much detail as you need by including meta tags in the header, which is useful for tracing RAG and agent pipelines. But would love to understand your particular RAG setup and whether that gives you enough granularity. Feel free to email me too - [email protected]
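
For example, with the OpenAI Python SDK it looks roughly like this (the header keys below are made up for the example, not the exact names from the docs):

  # Tag individual requests with metadata headers so they can be queried
  # later from the logs. Header keys here are illustrative only.
  from openai import OpenAI

  client = OpenAI(base_url="https://proxy.example.com/v1")  # placeholder proxy URL

  resp = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "Summarize the retrieved docs."}],
      extra_headers={
          "x-meta-user-id": "user_123",
          "x-meta-pipeline-step": "rag-answer",
          "x-meta-trace-id": "trace_abc",
      },
  )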

simple10 4 days ago

Looks cool. Just out of curiosity, how does this compare to other OpenLLMetry-type observation tools like Arize, Traceloop, LangSmith, LlamaTrace, etc.?

From personal experience, they're all pretty simple to install and use. Then mileage varies in analyzing and taking action on the logs. Does Velvet offer something the others do not?

For my client projects, I've been leaning towards open source platforms like Arize so clients have the option of pulling it inhouse if needed. Most often for HIPAA requirements.

RAG support would be great to add to Velvet. Specifically pgvector and pinecone traces. But maybe Velvet already supports it and I missed it in the quick read of the docs.

elawler24 (op) 3 days ago

Velvet takes <5 mins to get set up in any language, which is why we started as a proxy. We offer managed / custom deployments for enterprise customers, so we can support your client requirements.

We warehouse logs directly to your DB, so you can do whatever you want with the data. Build company ops on top of the DB, run your own evals, join with other tables, hash data, etc.

We’re focusing on backend eng workflows so it’s simple to run continuous monitoring, evals, and fine-tuning with any model. Our interface will focus on surfacing data and analytics to PMs and researchers.

For pgvector/pinecone RAG traces - you can start by including meta tags in the header. Those values will be queryable in the JSON object.

Curious to learn more though - feel free to email me at [email protected].

marcklingen 4 days ago

disclosure: founder/maintainer of Langfuse (OSS LLM application observability)

I believe proxy-based implementations like Velvet are excellent for getting started and solve for the immediate debugging use case; simply changing the base path of the OpenAI SDK makes things really simple (the other solutions mentioned typically require a few more minutes to set up).

At Langfuse (similarly to the other solutions mentioned above), we prioritize asynchronous and batched logging, which is often preferred for its scalability and zero impact on uptime and latency. We have developed numerous integrations (for openai specifically an SDK wrapper), and you can also use our SDKs and Decorators to integrate with any LLM.

> For my client projects, I've been leaning towards open source platforms like Arize so clients have the option of pulling it inhouse if needed. Most often for HIPAA requirements.

I can echo this. We observe many self-hosted deployments in larger enterprises and HIPAA-related companies, thus we made it very simple to self-host Langfuse. Especially when PII is involved, self-hosting makes adopting an LLM observability tool much easier in larger teams.

reichertjalex 3 days ago

Very nice! I really like the design of the whole product, very clean and simple. Out of curiosity, do you have a designer, or did you take inspiration from any other products (for the landing page, dashboard, etc) when you were building this? I'm always curious how founders approach design these days.

elawler24 (op) 3 days ago

I’m a product designer, so we tend to approach everything from first principles. Our aim is to keep as much complexity in code as possible, and only surface UI when it solves a problem for our users. We like using tools like Vercel and Supabase - so a lot of UI inspiration comes from the way they surface data views. The AI phase of the internet will likely be less UI focused, which allows for more integrated and simple design systems.

ramon156 4 days ago

> we were frustrated by the lack of LLM infrastructure

May I ask what you specifically were frustrated about? Seems like there are more than enough solutions

elawler24 (op) 4 days ago

There were plenty of UI-based low code platforms. But they required that we adopt new abstractions, use their UI, and log into 5 different tools (logging, observability, analytics, evals, fine-tuning) just to run basic software infra. We didn’t feel these would be long-term solutions, and just wanted the data in our own DB.

TripleChecker 3 days ago

Does it support MySQL for queries/storage - or only PostgreSQL?

Also, caught a few typos on the site: https://triplechecker.com/s/o2d2iR/usevelvet.com?v=qv9Qk

elawler24 (op) 3 days ago

We can support any database you need; PostgreSQL is the easiest way to get started.

turnsout 4 days ago

Nice! Sort of like LangSmith without the LangChain, which will be an attractive value proposition for many developers.

efriis 4 days ago

Howdy Erick from LangChain here! Just a quick clarification that LangSmith is designed to work great for folks not using LangChain as well :)

Check out our quickstart for an example of what that looks like! https://docs.smith.langchain.com/

turnsout 4 days ago

TIL! LangSmith is great.

ji_zai 4 days ago

Neat! I'd love to play with this, but the site doesn't open (403: Forbidden).

elawler24 (op) 4 days ago

Might be a Cloudflare flag. Can you email me your IP address and we'll look into it? [email protected].

codegladiator 4 days ago

Error: Forbidden

403: Forbidden ID: bom1::k5dng-1727242244208-0aa02a53f334

hiatus 4 days ago

This seems to require sharing our data we provide to OpenAI with yet another party. I don't see any zero-retention offering.

elawler24 (op) 4 days ago

The self-serve version is hosted (it’s easy to try locally), but we offer managed deployments where you bring your own DB. In this case your data is 100% yours, in your PostgreSQL. That’s how Find AI uses Velvet.

knowaveragejoe 4 days ago

Where is this mentioned? Is there a GitHub (etc.) somewhere so someone can use this without the hosted version?

elawler24 (op) 3 days ago

Right now, it’s a managed service that we set up for you (we’re still a small team). Email me if you’re interested and I can share details - [email protected].


bachback 4 days ago

Interesting, seems more like an enterprise offering. Is it OpenAI-only for now, and do you plan to expand to other vendors? Anything open source?

elawler24 (op) 3 days ago

We already support OpenAI and Anthropic endpoints, and can add models/endpoints quickly based on your requirements. We plan to expand to Llama and other self-hosted models soon. Do you have a specific model you want supported?

beepbooptheory 3 days ago

I guess I don't understand what this is now. If it's just proxying requests and storing them in a DB, can't it be literally any API?

elawler24 (op) 3 days ago

We could support any API. We’re focused on building data pipelines and tooling for LLM use cases.

