Hacker News

fragmede
STORM: Get a Wikipedia-like report on your topic storm.genie.stanford.edu

accurrent4 months ago

I gave it a prompt and it straight up hallucinated. My prompt was about writing an article on the advantages and disadvantages of Rust in the robotics ecosystem. It claimed that Google Cartographer was written in Rust. The annoying thing is that it was quite convincing: the citation it used turned out to be GeeksforGeeks blogspam that didn't mention Cartographer anywhere, so I went and checked, and it's a C++-only project. It's worrisome when you see people relying on LLMs for knowledge.

jeroenhd4 months ago

People trusting LLMs to tell the truth is the advanced version of people taking the first link on Google as indubitable facts.

This whole trend is going to get much worse before it gets better.

tikkun4 months ago

I'm optimistic that hallucination rates will go down quite a bit again with the next gen of models (gpt5 / claude 4 / gemini 2 / llama 4).

I've noticed that the hallucination rate of newer more SOTA models is much lower.

3.5 sonnet hallucinates less than gpt 4 which hallucinates less than gpt 3.5 which hallucinates less than llama 70b which hallucinates less than gpt 3.

nytesky4 months ago

Eventually won’t most training data be AI generated? Will we see feedback issues?

leettools4 months ago

We are actually working on a tool that provides similar functions (although we focus more on the knowledge base curation part). Here is an article we generated from the prompt "the advantages and disadvantages of rust in the robotics ecosystem" (https://svc.leettools.com/#/share/leettools/research?id=9886...): the basic flow is to query Google using the prompt, generate the article outline from the search result summaries, and then generate each section separately. Interested in your opinions on the differences, thanks!

accurrent4 months ago

I'm impressed, it's better than the article I found written by STORM. That being said, both tend to rely on what's available on the internet, so they lack things that are more subtle. It's impressive that your article picked up on Pixi. Of course, as a practicing roboticist my arguments would be different, but at this point I'm nitpicking.

leettools4 months ago

Thanks for the feedback! Yeah, by default these survey-style articles are generated from publicly available information via search results, so the quality depends mostly on Google's ranking and your search terms. Right now we can add expert-picked documents to the KB and generate the results from the curated KB instead of directly from the search. Better prompting (specific to the target field of study) and more iterations (have a quality check and rewrite accordingly) should also be very helpful.

Meganet4 months ago

[dead]

kingkongjaffa4 months ago

Very cool! I asked it to create an article on the topic of my thesis and it was very good, but it lacked nuance and second-order thinking, i.e. "here's the thing; what are the consequences of it, and what are potential mitigations?" It was able to pull existing thinking on a topic but not really synthesise a novel insight.

Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking.

From the paper it seems like this is only marginally better than the benchmark approach they used to compare against:

>Outline-driven RAG (oRAG), which is identical to RAG in outline creation, but

>further searches additional information with section titles to generate the article section by section

It seems like the key ingredients are:

- generating questions

- addressing the topic from multiple perspectives

- querying similar wikipedia articles (A high quality RAG source for facts)

- breaking the problem down by first writing an outline.

Which we can all do at home and swap out the wikipedia articles with our own data sets.
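The at-home version of those ingredients can be sketched roughly as below. This is a hedged sketch, not STORM's actual implementation: `llm` is a deterministic stand-in for a real model call, and `retrieve` is naive keyword matching standing in for Wikipedia or your own RAG source.

```python
# A rough at-home version of the ingredients above: questions from multiple
# perspectives, retrieval over your own corpus instead of Wikipedia, and an
# outline-first draft. `llm` is a deterministic stand-in, not a real model.

def llm(prompt):
    # Stand-in for a model API call; returns a canned, traceable response.
    return f"[response to: {prompt[:50]}]"

def retrieve(query, corpus):
    # Naive keyword retrieval; swap in Wikipedia or your own RAG source here.
    return [doc for doc in corpus if query.lower() in doc.lower()]

def write_article(topic, corpus):
    perspectives = ["practitioner", "researcher", "skeptic"]
    questions = [llm(f"As a {p}, ask about {topic}") for p in perspectives]
    evidence = retrieve(topic, corpus)
    outline = llm(f"Outline an article on {topic} addressing {questions}")
    return llm(f"Write sections from {outline} citing {evidence}")

corpus = ["Rust in robotics: memory safety matters", "Unrelated document"]
article = write_article("Rust in robotics", corpus)
```

Swapping the corpus and the retrieval function is where "our own data sets" come in; everything else is just prompt plumbing.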

kingkongjaffa4 months ago

I was able to mimic this in GPT without the RAG component using this custom instruction prompt; it does indeed write decent content, better than other writing prompts I have seen.

PROMPT:

- create 3 diverse personas who would know about the user prompt
- generate 5 questions that each persona would ask or clarify
- use the questions to create a document outline
- write the document with $your_role as the intended audience

westurner4 months ago

PROMPT: Then, after conducting background research, generate testable and untestable hypotheses, and also suggestions for further study, given market challenges and relevant, marginally advantageous new and proven technologies.

dredmorbius4 months ago

"Sign in with Google" is a show-stopper.

zackmorris4 months ago

Ya and unfortunately this is from Stanford. It's a private university, but that's still not a good look. It's amazing in 2024 that so many demos, especially in AI, are getting this wrong.

We're long overdue for better sources of online revenue. I understand that AI costs money to train (I don't believe that it costs substantial money to run - that's a scam) but if we thought that walled gardens were bad, we ain't seen nothin yet. We're entering an exclusive era where the haves enjoy vastly more money than the have nots, so basically the bottom half of the population will be ignored as customers. The good apps will be exclusive clubs that the plebeians gaze at from afar, like a reverse zoo.

I just want something where I can pay 1 cent to $1 to skip login. Ideally from a virtual account that's free to use but guilts me into feeding it money. So maybe after 100 logins I pay it a few dollars. And then a reward system where wealthy users can pay it forward so others can browse for free.

I would make it in my spare time, but of course there is no such thing in the 21st century climate of boom-bust cycles and mass layoffs.

anotheraccount94 months ago

Yes, and it's not possible to delete the account (or the association with it).

jgalt2124 months ago

And it's a challenge not to click that modal in error.

Meganet4 months ago

[dead]

mburns4 months ago

Reminds me of Cuil.

> Cuil worked on an automated encyclopedia called Cpedia, built by algorithmically summarizing and clustering ideas on the web to create encyclopedia-like reports. Instead of displaying search results, Cuil would show Cpedia articles matching the searched terms.

https://en.wikipedia.org/wiki/Cuil

chankstein384 months ago

Does anyone have more info on this? They thank Azure at the top so I'm assuming it's a flavor of GPT? How do they prevent hallucinations? I am always cautious about asking an LLM for facts because half of the time it feels like it just adds whatever it wants. So I'm curious if they addressed that here or if this is just poorly thought-out...

EMIRELADERO4 months ago

akiselev4 months ago

morsch4 months ago

Thanks. There's an example page (markdown) at the very end. You can pretty easily spot some weaknesses in the generated text, it's uncanny valley territory. The most interesting thing is that the article contains numbered references, but unfortunately those footnotes are missing from the example.

Sn0wCoder4 months ago

Not sure how it prevents hallucinations, but I tried inputting too much info and got a pop-up saying it was using ChatGPT 3.5. The article it generated was OK but seemed to repeat the same thing over and over with slightly different wording.

infecto4 months ago

If you ask an LLM what color the sky is, it might say purple, but if you give it a paragraph describing the atmosphere and then ask the same question, it will almost always answer correctly. I don't think hallucinations are as big of a problem as people make them out to be.
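The grounding described here is just prompt construction. A toy illustration, with wording that is illustrative rather than any product's API:

```python
# Toy illustration of grounding: with no context the model is free to guess;
# with a passage, we instruct it to answer only from that passage.

def grounded_prompt(question, context=None):
    """Build a prompt; when context is given, constrain the answer to it."""
    if context is None:
        return question
    return ("Answer using only the passage below.\n\n"
            f"Passage: {context}\n\nQuestion: {question}")

p = grounded_prompt("What color is the sky?",
                    context="Rayleigh scattering makes the daytime sky appear blue.")
```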

misnome4 months ago

So, it only works if you already know enough about the problem to not need to ask the LLM, check.

infecto4 months ago

Are you just writing negative posts without even seeing the product? The system queries the internet, aggregates that information, and writes an article based on your query.

misnome4 months ago

ChatGPT, please explain threaded discussions and context of statements as if you were talking to a five year old.

infecto4 months ago

Ahh so you are a child who has no intellectual capability past writing negative attack statements. Got it.

keiferski4 months ago

No, if the data you're querying contains the information you need, then it is mostly fine to ask for that data in a format amenable to your needs.

o11c4 months ago

The problem with LLMs is not a data problem. LLMs are stupid even on data they just generated.

One recent catastrophic failure I found: Ask an LLM to generate 10 pieces of data. Then in a second input, ask it to select (say) only numbers 1, 3, and 5 from the list. The LLM will probably return results numbered 1, 3, and 5, but chances are at least one of them will actually copy the data from a different number.
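The failure mode described can be checked mechanically. A small hypothetical harness (the data and helper are toy stand-ins, not any particular model's output):

```python
# Harness for the failure mode above: generate a numbered list, ask for items
# 1, 3, and 5, then verify each returned entry actually matches the item at
# its claimed position, not just that the numbering looks right.

def check_selection(items, picked, indices):
    """True iff each picked entry equals the item at its claimed 1-based index."""
    return all(picked[k] == items[i - 1] for k, i in enumerate(indices))

items = [f"fact {i}" for i in range(1, 11)]   # the "10 pieces of data"
good = [items[0], items[2], items[4]]         # correct selection of 1, 3, 5
bad = [items[0], items[5], items[4]]          # "3" silently carries item 6's data

check_selection(items, good, [1, 3, 5])  # True
check_selection(items, bad, [1, 3, 5])   # False
```

The `bad` case is exactly the bug described: the returned numbering is plausible while the content under one number is swapped.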

wsve4 months ago

I'm absolutely not bullish on LLMs, but I think this is kinda judging a fish on its ability to climb a tree.

LLMs are looking at typical constructions of text, not an understanding of what it means. If you ask it what color the sky is, it'll find what text usually follows a sentence like that and try to construct a response from it.

If you ask it the answer to a math question, the only way it can reliably figure it out is if it has an exact copy of that math question in its training data. Asking it to choose things from a list is kinda like that, but one could imagine the designers supplementing that manually with a technique beyond a pure LLM.

smcin4 months ago

Any idea why that misnumbering happens? It sounds like a very basic thing to get wrong. And as a fallback, it could be brute-force kludged with an extra pass that appends the output list to the prompt.

o11c4 months ago

It's an LLM, we cannot expect any real idea.

Unless of course we rephrase it as "when I roll 2d6, why do I sometimes get snake eyes?"

pistoriusp4 months ago

Yet remains unsolvable.

infecto4 months ago

Huh?

chx4 months ago

There are no hallucinations. It's just the normal bullshit people hang a more palatable name on. There is nothing else.

https://hachyderm.io/@inthehands/112006855076082650

> You might be surprised to learn that I actually think LLMs have the potential to be not only fun but genuinely useful. “Show me some bullshit that would be typical in this context” can be a genuinely helpful question to have answered, in code and in natural language — for brainstorming, for seeing common conventions in an unfamiliar context, for having something crappy to react to.

> Alas, that does not remotely resemble how people are pitching this technology.

infecto4 months ago

Why does this get downvoted so heavily? It’s my experience running LLM in production. At scale hallucinations are not a huge problem when you have reference material.

DylanDmitri4 months ago

Seems a promising approach. Feedback at the bottom is (?) missing a submit button. Article was fine, but veered into overly verbose with redundant sections. A simplification pass, even on the outline, could help.

kingkongjaffa4 months ago

It auto-saves I believe.

siscia4 months ago

We have been discussing a similar idea with friends.

The topic of knowledge synthesis is fascinating, especially in big organisations.

Moving away from fragmented documents toward a set of facts from which an LLM synthesizes documents tailored for the reader.

There are a few tricks that would be interesting to get working.

For instance, the agent keeps evaluating itself against a set of questions. Or users add questions to see if the agent understands the nuances of the topic, and so whether it can be trusted.

(Not dissimilar to regression testing in classical software engineering.)

Then there are the "homework" sections, where we ask human experts to verify that the facts stored by the agent are still relevant and up to date.

All of this can then be enhanced with actions usable by the agent.

Think about fetching the PoC (point of contact) for a particular piece of software. Say it is the employee Foo.

If we write this down in a document, it will definitely get outdated when Foo moves on or gets promoted.

If we put it inside a knowledge synthesis system, the system itself may ask Foo every 6 months whether they are still the PoC for that software project.

Or it could talk daily with the LDAP system and ask the same question as soon as it notices that Foo's position or reporting structure has changed.

This can be expanded to processes to follow, reports to create, etc.
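The periodic re-confirmation idea sketches out naturally as a staleness check over stored facts. The record shape and the 6-month interval below are illustrative assumptions, not a real system:

```python
# Sketch of the re-confirmation idea: stored facts carry an owner and a
# last-confirmed date, and the system flags facts that are due for review.
from datetime import date, timedelta

facts = [
    {"fact": "Foo is the PoC for project Bar", "owner": "Foo",
     "last_confirmed": date(2024, 1, 10)},
    {"fact": "Weekly report goes to team Baz", "owner": "Qux",
     "last_confirmed": date(2024, 8, 1)},
]

def due_for_review(fact, today, interval=timedelta(days=180)):
    # A real system might instead watch LDAP and trigger on org changes.
    return today - fact["last_confirmed"] >= interval

stale = [f for f in facts if due_for_review(f, date(2024, 9, 1))]
# stale contains only the Foo/PoC fact, confirmed ~8 months earlier
```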

OutOfHere4 months ago

STORM motivated me to independently create my own similar project, https://github.com/impredicative/newssurvey, which takes a very different approach to writing a survey article on a medical or science topic. Its generated samples are linked in the readme.

sitkack4 months ago

I like this, but the output is lower quality and more voluminous than Phind or Perplexity.

But I like the direction of the research. I'd like to be able to specify the output reduction prompts and to tweak the evaluation agents.

This is "just" multi-agent summarization and synthesis. Most summarizers are already doing this.

Nice thing is that this is open source: https://github.com/stanford-oval/storm

[1] https://www.phind.com/search?cache=z3qe9c0z6yb0x1hqbq64mrci

[2] https://www.perplexity.ai/search/please-summarze-and-explain...

dvt4 months ago

I want to build this locally, I think it would be an absolute killer product. Could also consider doing an internet "deep dive" where the agent would browse for maybe 1-2 hours before sorting & collating the data. Add multi-modality for even more intelligence-gathering.

ukuina4 months ago

This is a neat idea. DEVONagent, but actually agentic.

https://www.devontechnologies.com/apps/devonagent

andai4 months ago

Fascinating. Last summer, inspired by AutoGPT, I made a simple Python script that does a web search for a query and uses that to answer the user's question. Looking at this I'm thinking, I could take the web results and ask it to reformat them in the style of Wikipedia, and wondering how that compares.

(I built it because ChatGPT couldn't search the web yet. When Phind launched a few weeks later, my project was basically obsolete!)

It seems the main improvement this paper has over that naive approach is the quality of the inputs, i.e. using "trusted sources" rather than random web results. (They appear to get their sources from Wikipedia itself?)

I'm not sure how much value all the other steps in the process add though.

WillAdams4 months ago

I keep getting:

>Sorry, STORM cannot follow arbitrary instruction. Please input a topic you want to learn about. (Our input filtering uses OpenAI GPT-3.5, which may result in false positives. We apologize for any inconvenience.)

Nydhal4 months ago

You have to input the title of the article you want, not instructions like "write me ..."

WillAdams4 months ago

I did. Eventually I managed to get an article, but by then it was generic enough to not be particularly useful.

anotheraccount94 months ago

A very interesting project. Btw, I could not find a way to delete my account once created. I've also found that the generated report is very generic and quickly strays from the actual question or the specific theme/keywords used.

A final point: the notice states that "The risks associated with this study are minimal. Study data will be stored securely, in compliance with Stanford University standards, minimizing the risk of confidentiality breach." When I use STORM, I can see other people's requests. Are they supposed to be confidential?

audiala4 months ago

We use an approach inspired by this project to generate high level pages about city POIs such as this one: https://audiala.com/en/united-states/philadelphia/edgar-alla...

It's far from perfect yet, sometimes too shallow and lacking a guiding thread, but after a few iterations we believe it should offer all the information a visitor might need when planning a visit.

asterix_pano4 months ago

I suppose the shallowness could be addressed by applying this approach recursively to each sub-topic, then synthesising them and creating a narrative around them.

audiala4 months ago

Indeed, but as you increase the complexity, you increase the chance of failure and the costs as well, even if those are quite minimal compared with the time a human would have to spend doing this manually.

firejake3084 months ago

I feel like this is the opposite of what LLMs are useful for. I like using LLMs to summarize and get an immediate answer to a specific question, like the AI-generated summary in Google Search. In that case, the increase in convenience outweighs the decrease in reliability. But if I wanted to read a full article about a topic, I would no longer be concerned about convenience, so I would look for a more reliable source than an LLM.

[deleted] 4 months ago (collapsed)

canadiantim4 months ago

Black text on a dark grey background is not ideal for the main hero segment.

Also would love a way to try without google authentication

thebuguy4 months ago

For the sources, it used the crappy AI-generated websites that pop up on the first page of Google.

mrkramer4 months ago

Funnily enough, a few months ago I was wondering what a Wikipedia written by AI would look like. Imagine automating the writing of knowledge so humans don't have to crawl the web, books, and papers to write knowledge articles.

gpderetta4 months ago

It gets commented often, but:

   'Panther Moderns,' he said to the Hosaka, removing the trodes. 'Five minute precis.' 'Ready,' the computer said.

globular-toast4 months ago

What's the point of the "elaborate on purpose" box? It makes you fill it in but doesn't seem to affect the article, at least not that I can tell.

mrpf1ster4 months ago

Probably just metadata about the request for the researchers at Stanford

AgR_HZhang4 months ago

I want to write a funding application, can you help me?


mikewarot4 months ago

I kept tripping some sort of "don't give it direct instructions" filter, but it gave some interesting results. I asked it, multiple times, about my BitGrid (which it actually read about, and included!), FPGAs, LUTs, and energy usage. It kept talking about the problems with the technology, the need for specialists to program it, and environmental impacts.

I did discover a dearth of published information about just how much energy a 4x4 LUT requires per cycle, and its idle power.

[deleted] 4 months ago (collapsed)

rrr_oh_man4 months ago

I don't know, reminds me of GPT o1:

Lots of text, lots of headers, but extremely shallow.

jedberg4 months ago

Did anyone figure out how to share the article after you generate it?

ssalka4 months ago

I'm guessing this just isn't implemented yet. It feels like a very alpha-stage project: when I sign in with another account and use the URL from my previous session, it tries generating the article again, but seems to hang. Also, my second account is unable to view the Discover page; a 403 error in dev tools says "User has not accepted consent form".

I would think sharing by URL should work, but has some bugs with it currently.

jedberg4 months ago

Same experience. Tried sharing by URL and had the same issues you did.

philipkglass4 months ago

I think that you have to download the PDF and upload it to your own site.

They have a "Discover" page with previously generated articles, but I think that they have some sort of manual review process to enable public access and it's not updated frequently. The newest articles there were from July. I tried copying the link for a previously generated article of mine and opening it from a private browser window but I just get sent to the main site.

nipponese4 months ago

The URLs for the generated articles are unique and persistent.

toddmorey4 months ago

I've gone from thinking AI would only further degrade our sources of news and information to thinking perhaps AI is the only thing that can help combat misinformation.

Everything is so siloed & biased now, it's hard to find any presentation of a topic from a source that has no agenda. AI to help surface, aggregate, and summarize real data, expert opinion, and analysis like this would be really powerful and much needed. Essentially on-demand wikipedia articles held to the same editorial standards. Wikipedia isn't perfect by far, but their model has been surprisingly successful considering the challenge.

kk584 months ago

Doesn't even work. 500 error

BeepInABox4 months ago

Not to be confused with Storm the language https://storm-lang.org/index.html

theanonymousone4 months ago

I was impressed at the beginning, but then disappointed seeing hallucination in easily-verifiable historical information :(

kylebenzle4 months ago

[flagged]

philipkglass4 months ago

Did you try it? It appears to use AI for search and summary but not for a foundational knowledge base. I asked it about a niche topic and I got a very useful encyclopedia-type report that included real links to published research. This is a topic that I have previously spent a lot of time on in Google Scholar where I had to skim and reject a lot of false positives that show up with simple keyword search.

Just like on actual-Wikipedia you should read the linked references and not completely trust the body text, but also like on actual-Wikipedia the majority of the report's text seemed aligned with the content of the linked references.

kylebenzle4 months ago

Yes, I did try it, on two topics, and it failed both. Guess that's why I was annoyed with it.
