Hacker News

ag2718
Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks aarushgupta.io

https://web.archive.org/web/20260609200156/https://aarushgup...


Lerc an hour ago

Has there been much exploration of how much benefit comes from precision in the activation functions in KANs? There's a little niggle in the back of my head that maybe 90% of the benefit of KANs can be gained from a quite small variety of function shapes. Combined with input weighting, I almost feel you could have a representation that scales from a standard ReLU perceptron through KANs to something with weighted inputs and fancy weighted activation functions.

Mark that out in 2d with axes of input weight precision and activation weight precision, you could perhaps do sweeps to find the best accuracy per parameter bit, or accuracy/speed, or some sweet spot that has a nice balance of operating speed, accuracy, and model size.
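A minimal sketch of what such a 2D sweep could look like, assuming a toy single unit whose activation is stored as a lookup table. Everything here (the `quantize` and `forward` helpers, the 64-entry tanh table, the bit widths tried) is a hypothetical illustration, not the method from the linked papers, and it measures deviation from the full-precision output rather than task accuracy:

```python
import numpy as np

def quantize(v, bits, lo=-1.0, hi=1.0):
    """Round v to the nearest of 2**bits uniformly spaced levels in [lo, hi]."""
    levels = 2 ** bits - 1
    v = np.clip(v, lo, hi)
    return lo + np.round((v - lo) / (hi - lo) * levels) / levels * (hi - lo)

def forward(x, w, table, z_max=4.0):
    """Toy unit: weighted sum of inputs, then a table-lookup activation."""
    z = np.clip(x @ w, -z_max, z_max)
    idx = np.round((z + z_max) / (2 * z_max) * (len(table) - 1)).astype(int)
    return table[idx]

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(1000, 4))       # hypothetical inputs
w = rng.uniform(-1, 1, size=4)               # hypothetical input weights
act_table = np.tanh(np.linspace(-4, 4, 64))  # hypothetical activation table

ref = forward(x, w, act_table)               # full-precision reference output

bit_choices = [2, 4, 6, 8]
results = {}
for wb in bit_choices:                       # axis 1: input weight precision
    for ab in bit_choices:                   # axis 2: activation precision
        out = forward(x, quantize(w, wb), quantize(act_table, ab))
        mse = float(np.mean((out - ref) ** 2))
        total_bits = wb * w.size + ab * act_table.size
        results[(wb, ab)] = (mse, total_bits)

# Print the grid: error against total parameter bits for each combination.
for (wb, ab), (mse, total_bits) in sorted(results.items()):
    print(f"weight bits={wb} act bits={ab} total bits={total_bits} mse={mse:.5f}")
```

On a real model, the inner loop would retrain or at least re-evaluate the network at each precision pair, and the sweet spot would be read off the resulting error-versus-bits grid.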

ag2718 op a minute ago

There is definitely a precision-performance tradeoff to consider. We explored this through ablation studies on bitwidth precision / resource usage in our work (Figure 6a in https://arxiv.org/pdf/2512.12850, Figure 4 in https://arxiv.org/pdf/2602.02056). Further exploration into the mechanics here would definitely be useful.

Regarding your point that "90% of the benefit of KANs can be gained from a small variety of function shapes": even within the B-spline basis, the shapes are quite uniform. Much of the actual benefit of scaling up the basis size comes from learning more complex, piecewise-polynomial activation functions. Scaling up the number of basis functions (i.e. more granular intervals) also increases locality and allows the activation function's value across different parts of the domain to be learned semi-independently. (There obviously is a tradeoff here with overfitting.)

The number of basis functions (G+S) is largely what determines how expressive the activation is, which relates to your point: "you could have a representation that scales from a standard ReLU perceptron through KANs to something with weighted inputs and fancy weighted activation functions."
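As a rough illustration of the G+S count and the locality point, here is a minimal NumPy sketch of a uniform B-spline basis built with the Cox-de Boor recursion. It assumes G is the number of grid intervals and S the spline degree, which is a common convention but is not spelled out in the comment above; the function name `bspline_basis` and the domain [-1, 1] are hypothetical choices:

```python
import numpy as np

def bspline_basis(x, G=5, S=3, lo=-1.0, hi=1.0):
    """Evaluate G + S uniform B-spline basis functions of degree S at x."""
    x = np.asarray(x, dtype=float)
    h = (hi - lo) / G
    # Extend the grid by S intervals on each side: G + 2S + 1 knots in total.
    knots = lo + h * np.arange(-S, G + S + 1)
    # Degree 0: indicator of each knot interval (G + 2S functions).
    B = ((x[:, None] >= knots[:-1]) & (x[:, None] < knots[1:])).astype(float)
    # Cox-de Boor recursion: each step raises the degree by one
    # and reduces the number of basis functions by one.
    for d in range(1, S + 1):
        left = (x[:, None] - knots[:-(d + 1)]) / (knots[d:-1] - knots[:-(d + 1)])
        right = (knots[d + 1:] - x[:, None]) / (knots[d + 1:] - knots[1:-d])
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

xs = np.linspace(-0.99, 0.99, 50)
B = bspline_basis(xs, G=5, S=3)
print(B.shape)  # (50, 8): one column per basis function, G + S = 8

# A learned activation is a weighted sum of the basis functions.
coeffs = np.zeros(B.shape[1])
coeffs[2] = 1.0           # hypothetical: only one coefficient is nonzero
phi = B @ coeffs          # nonzero only on that basis function's support
```

Each input point touches at most S+1 basis functions, so increasing G narrows each function's support and lets coefficients in different parts of the domain be learned more independently.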

hodgehog11 43 minutes ago

The benefit of KANs is interpretability, not expressivity. It's a structure that lends itself well to symbolic regression and other interpretable downstream tasks. This can make it better suited for scientific tasks, for example. You can easily replicate the practical performance of any KAN with an MLP, and the MLP will train and run faster on modern architectures. This work proposes a method by which a KAN might be faster, but it looks like early days to me.

Precision in the activation function is targeting a part of neural networks that you don't want. There are many other methods that work with high precision. You use neural networks because of their implicit bias toward regular solutions. That means there is a sweet spot at low precision that you're targeting.

mikeayles 4 hours ago

So for people wondering if it can be used to accelerate LLM inference, sadly not.

I've been trying to hit 100,000 tokens/s with a 3.28m dumb model, and even this is an order of magnitude too large to benefit.

It appears to be focussed more on latency than throughput. Happy to be corrected?

ag2718 op 4 hours ago

You're correct that this work is not very applicable for LLMs and that the focus here is primarily on latency.

ai_fry_ur_brain an hour ago

Was anyone thinking this?

Cadwhisker an hour ago

If you want to experiment with KANs yourself in a non-FPGA environment, there's a GitHub repo here: https://github.com/KindXiaoming/pykan

HN comments page on that is here: https://news.ycombinator.com/item?id=40219205

RantyDave 6 hours ago

Right. But ... this would limit you to either extremely small models or extremely large FPGAs, yes? If there's a simple machine learning task that requires sub-microsecond latency I can see the point, but otherwise??

ag2718 op 5 hours ago

Yes, this work is focused on accelerating very small models, typically for real-time systems that require extremely low power or low latency.

One primary application of this work is in high-energy physics (https://home.cern/smarter-decisions-at-the-speed-of-collisio...). Ultrafast and real-time learning is also very applicable for problems in quantum computing, plasma control, etc. (https://arxiv.org/pdf/2602.02005).

laughing_man 26 minutes ago

Drone target recognition?

poly2it 5 hours ago

I'm not in HFT, but I assume this is also an interesting applicable domain?

UltraSane 4 hours ago

The author actually works at Jane Street.

ag2718 op 5 hours ago

Yes, definitely: this type of work is applicable in domains where software run on general-purpose processors cannot meet latency or power requirements.


tomrod 4 hours ago

Happy to hear that KANs continue to find solid footing.

Animats 5 hours ago

This guy will be hired by a high-frequency trading firm, and the next time we hear about him, he will have a net worth in 9 figures.

throwaw12 5 hours ago

he is already at Jane Street

Animats 5 hours ago

Of course.

ai_fry_ur_brain an hour ago

Sure, if they worked for 100 years maybe.. FPGA guy at jane st probably makes 600k to low seven figures... Maybe.

Not everyone in quant is a centi-millionaire, probably almost none of them in r&d actually.


cwmoore 3 hours ago

took long enough

babelfish 5 hours ago

Archive link, as it looks like the original post was taken down: https://web.archive.org/web/20260609200156/https://aarushgup...

ag2718 op 5 hours ago

Hmm the post is still up for me?

dang 5 hours ago

For us too, but we'll put the archive link in the toptext since these things seem to vary a lot by region.

p.s. Thanks for posting this and welcome to HN!

