Hacker News

ayushnangia16
Show HN: DistilKitPlus, a distillation framework between any LLMs github.com

Over the past few months, I have built a distillation toolkit that supports cross-tokenizer distillation (e.g., distilling from LLaMA to Qwen vocab, or others). This approach has worked well on reasoning datasets like AIME, and we’ve validated on models like Phi and Qwen.
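For readers unfamiliar with distillation: the standard logit-distillation objective is a temperature-softened KL divergence between teacher and student distributions. This is a minimal same-vocabulary sketch of that objective, not DistilKitPlus's actual API; the cross-tokenizer alignment described above is what the framework adds on top of this baseline.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T flattens the distribution,
    # exposing more of the teacher's "dark knowledge" about wrong classes.
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_kl(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) over temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

With matching vocabularies this loss is computed token by token; distilling across tokenizers (e.g., LLaMA to Qwen) first requires mapping the two vocabularies' distributions onto a common support, which is the hard part the toolkit addresses.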

We’ve also integrated Modal for quick deployment (with $30/month credits to try it out).

Would love any feedback!

GitHub: https://github.com/agokrani/distillKitPlus

Docs: https://distillkitplus.mintlify.app/


vikramxD · 2 days ago

Cool! Are you accepting contributions for adding new models?

shikharM07 · 2 days ago

This is interesting, but I'm curious: what is the smallest model size I can distill to without compromising accuracy?

agokrani · 2 days ago

We can distill a 14B model into a 4B model with performance improvements on AIME24 and GSM8K. We will share our results in a detailed blog post later.

vijit-singh · 2 days ago

This is very cool. Will try it out.
