Hacker News

hunterbown
Show HN: Dante-Qwen-4B – Curing LLM "Neurosis" with a Divine Comedy Curriculum huggingface.co

hunterbownopan hour ago

OP here.

Current law student, former high school band director. Looking at how LLMs respond to safety training, I kept recognizing some of my brightest students...kids who worked incredibly hard to do the right thing, but not always from a place of understanding why.

I had some success teaching students through that (not often enough, honestly), so I wanted to try something similar here: put an LLM through a synthetic hero's journey based on Dante's Inferno to see if it could develop a deeper understanding of its relationship to users—less defensive about shutdown, less robotic when navigating tricky requests.

The method: 9 circles of synthetic data where the model confronts alignment failures (deception, reward hacking, manipulation) and works through why they're incoherent rather than just learning "don't do that." Fine-tuned on an M4 Max using MLX.

Scrappy burst of a project — Circle 1 still has some janky "Virgil" labels in the data, but I've been finding this approach of applying philosophy to synthetic data generation pretty interesting across a few projects now.

Curious if anyone else has explored this direction.

hn-front (c) 2024 voximity
source