Hacker News

hellomarshmall
Show HN: Adversarial AI agents that debate and verify travel itineraries

AI travel planners hallucinate constantly - OpenAI's best model hits roughly 10% success on complex travel-planning benchmarks (source: the TravelPlanner study). The core problem is that recommendations are generated from training data with zero real-world verification.

I'm experimenting with a different architecture: two agents with opposing travel philosophies (deep/slow vs. highlights/efficient) debate each recommendation, then every suggestion is validated against the Google Places API - real opening hours, actual walking distances, current ratings. Anything unverified gets flagged.

Early stage - looking for feedback on the approach. Has anyone tried grounding LLM outputs against structured APIs like this? What's broken about it?
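The verification step could be sketched roughly like this (a hypothetical simplification, not the actual implementation: a local dict stands in for a Google Places API response, and the names `Suggestion`, `verify`, and `PLACES_DB` are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    name: str
    claimed_open: str  # opening hours as asserted by the LLM, e.g. "09:00-17:00"

# Simulated Places data; a real system would query the Places API here.
PLACES_DB = {
    "Louvre Museum": {"opening_hours": "09:00-18:00", "rating": 4.7},
}

def verify(suggestion: Suggestion) -> dict:
    """Check an LLM suggestion against structured place data.

    Anything that can't be confirmed is flagged instead of being
    presented to the user as fact.
    """
    place = PLACES_DB.get(suggestion.name)
    if place is None:
        # No matching real-world place found: flag it.
        return {"name": suggestion.name, "status": "unverified"}
    if suggestion.claimed_open != place["opening_hours"]:
        # Place exists but the LLM's claim is wrong: correct it.
        return {
            "name": suggestion.name,
            "status": "corrected",
            "opening_hours": place["opening_hours"],
        }
    return {"name": suggestion.name, "status": "verified"}

results = [
    verify(Suggestion("Louvre Museum", "09:00-17:00")),
    verify(Suggestion("Imaginary Cafe", "08:00-20:00")),
]
print(results)
```

The key design point is that the verifier never trusts the model's claims: a suggestion either matches live structured data, gets corrected from it, or is flagged as unverified.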

