Hacker News

jasondavies
INTELLECT–1: Launching the First Decentralized Training of a 10B Parameter Model primeintellect.ai

PoignardAzur 4 days ago

A lot of comments are sneering at various aspects of this press release, and yeah, there's some cringeworthy stuff.

But the technical aspects are pretty cool:

- Fault-tolerant training where nodes can be added and removed mid-run without interrupting the other nodes.

- Sending quantized gradients during the synchronization phase.

- (In the OpenDiLoCo article) Async synchronization.

They're also mentioning potential trustless systems where everyone can contribute compute, which would make this a truly decentralized open platform. Overall it'll be pretty interesting to see where this goes!

londons_explore 4 days ago

> Sending quantized gradients during the synchronization phase.

I did this 9 years ago, works pretty well. I don't understand why all ML isn't async and quantized like that now. This project quantizes to 1 bit per weight and it works so well I didn't even make it configurable.

https://github.com/Hello1024/shared-tensor

radarsat1 4 days ago

> 1 bit per weight

Does this basically correspond to moving each weight either up or down by a fixed amount? I'm a bit surprised you don't at least need a "stay same" bit, but I suppose it could balance out over multiple iterations.

Interesting that it works at all. Although, thinking on it, I could see it maybe even having a nice regularizing effect where every layer would end up having similar weight magnitudes. (like projecting onto the local n-ball as mentioned in a paper posted recently on HN)

londons_explore 4 days ago

This is for keeping the weight vectors in sync between two machines.

The weight vectors themselves are regular floats. But the data exchanged between the machines is 1 bit per weight. Basically, you keep track of changes to the weight vector that haven't yet been propagated to the other machine. You quantize this to 1 bit per weight (i.e. a sign bit) and send it, together with a single scale factor X, accumulating the quantization error for the next sync iteration.

You choose X to be the RMS or some similar metric of the accumulated error.
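
A minimal sketch of the scheme described above, assuming plain Python lists for the unsent weight change. The names `one_bit_sync_step` and `simulate` are made up for illustration and are not taken from the shared-tensor repository. Zero is sent as +1 here, since there is no "stay same" bit.

```python
import math


def one_bit_sync_step(residual):
    """Quantize the unsent weight change to one sign bit per weight.

    residual: list of floats, the accumulated change not yet sent to the peer.
    Returns (signs, scale, new_residual).
    """
    # Scale factor X: RMS of the accumulated, not-yet-propagated change.
    scale = math.sqrt(sum(r * r for r in residual) / len(residual))
    # One bit per weight: +1 or -1 (a zero entry is sent as +1).
    signs = [1 if r >= 0 else -1 for r in residual]
    # The peer applies sign * scale; whatever that misses is carried over
    # as quantization error into the next sync iteration.
    new_residual = [r - s * scale for r, s in zip(residual, signs)]
    return signs, scale, new_residual


def simulate(delta, rounds):
    """Propagate a one-off local change `delta` to a peer over several syncs."""
    residual = list(delta)
    peer = [0.0] * len(delta)
    for _ in range(rounds):
        signs, scale, residual = one_bit_sync_step(residual)
        peer = [p + s * scale for p, s in zip(peer, signs)]
    return peer, residual
```

By construction, what the peer has received plus the residual still held locally always adds up to the original change, so nothing is lost, only delayed. The sketch makes no claim about how fast the residual shrinks.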

f_devd 4 days ago

It has been more formally studied in signSGD[0], and empirically it's comparable to Adam in terms of behavior.

[0]: https://arxiv.org/pdf/1802.04434
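
For reference, a rough sketch of a signSGD step with majority vote across workers, one of the settings studied in the linked paper. Each weight moves up or down by a fixed step `lr`, which is the behavior asked about upthread. The function names and the toy numbers are illustrative assumptions, not code from the paper.

```python
def sign(x):
    """Return -1, 0, or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def signsgd_majority_step(weights, worker_grads, lr):
    """One signSGD update: each worker sends only the sign of its gradient,
    and every weight moves by a fixed amount `lr` against the majority sign."""
    new_weights = []
    for i, w in enumerate(weights):
        # Sum of per-worker sign bits for coordinate i.
        vote = sum(sign(g[i]) for g in worker_grads)
        # A tied vote (sum of zero) leaves the weight unchanged.
        new_weights.append(w - lr * sign(vote))
    return new_weights


# Toy usage: three workers, two weights.
updated = signsgd_majority_step(
    [1.0, -2.0],
    [[0.3, -5.0], [0.1, 2.0], [-0.2, -0.1]],
    lr=0.1,
)
```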

oefrha 5 days ago

Well I don’t have 8xH100s, but if I did, I’m probably not gonna donate them to a VC-funded company. Remember “Open”AI?

https://pitchbook.com/profiles/company/588977-92

jgalt212 4 days ago

Very true, but if something similar were run by BOINC, I'd make a stab at contributing.

https://boinc.berkeley.edu/

csomar 4 days ago

I don't know the intricacies of their VC deal. But if the data is open and users put in xx amount of compute and then get the model, then where is the possible harm? The trade is done and dealt. You provided some compute and got it back, right? Unless I am misunderstanding something about their distributed model or not reading the fine print.

ukuina 5 days ago

> Decentralized training of INTELLECT-1 currently requires 8x H100 SXM5 GPUs.

So, your garden-variety $0.5M desktop PC, then.

Cool, cool.

[1] https://viperatech.com/shop/nvidia-dgx-h100-p4387-system-640...

DannyBee 4 days ago

If you run it continuously for a month, it will take 13x the electric usage of your average California house.

So they really are a 10x company.

Average house is 571 kWh/month; this is 10.2 kW max * 24 * 30 = 7344 kWh.

This will cost you, in California, about $3000 a month depending on your power plan :)
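
The arithmetic above can be checked with a few lines. This is only a sanity check of the figures quoted in the comment (10.2 kW, 571 kWh/month, $3000/month); the per-kWh rate is derived from those numbers and is not an actual tariff. The function names are made up for illustration.

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month, running continuously


def monthly_energy_kwh(power_kw, hours=HOURS_PER_MONTH):
    """Energy drawn in a month at a constant power draw."""
    return power_kw * hours


def usage_ratio(power_kw, house_kwh_per_month):
    """How many average houses' worth of electricity the machine uses."""
    return monthly_energy_kwh(power_kw) / house_kwh_per_month


def implied_rate_usd_per_kwh(monthly_bill_usd, power_kw):
    """Electricity price implied by a given monthly bill."""
    return monthly_bill_usd / monthly_energy_kwh(power_kw)


print(f"{monthly_energy_kwh(10.2):.0f} kWh per month")        # about 7344
print(f"{usage_ratio(10.2, 571):.1f}x an average house")      # about 12.9
print(f"${implied_rate_usd_per_kwh(3000, 10.2):.2f} per kWh")  # about 0.41
```

So "13x" is the ratio rounded up, and the $3000 figure corresponds to roughly $0.41 per kWh at maximum draw.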

01HNNWZ0MV43FF 4 days ago

What if I run it for a year?

brysonreece 10 hours ago

$3000 * 12 = ???

ikeashark 5 days ago

me: Oh cool, a project like Folding@Home but for AI compute, maybe I'll contribute as we-

> Decentralized training of INTELLECT-1 currently requires 8x H100 SXM5 GPUs.

me: and for that reason, I'm out

Also they state that later they will be adding the ability for you to contribute your own compute but how will they solve the problem of having to back-propagate to all of the remote nodes contributing to the project without egregiously slow training time?

macrolime 4 days ago

Not exactly what I would call decentralized training. More like distributed through multiple data centers.

Decentralized training would be when you can use consumer GPUs. That's not likely to work with backpropagation directly, but maybe it could with one of the backpropagation-approximating algorithms.

dartos 4 days ago

Didn’t BLOOM do this with their Petals tool?

m3kw9 5 days ago

But I can already train from 30 different vendors distributed across the US, why do I need to use a “decentralized” training system? Decentralized inferencing makes more sense, as that is where things can be censored.

dmitrygr 5 days ago

> solve decentralized training step-by-step to ensure AGI will be open-source, transparent, and accessible

One hell of an uncited leap from "we're multiplying a lot of numbers" to "AGI", as if it is a given

DannyBee 4 days ago

Well, I mean, it's a group of people who are doing "open, decentralized" training that requires half a million dollars' worth of non-consumer hardware and $3000 a month in electricity. Would you expect anything less than Silicon Valley-level arrogance?

mountainriver 5 days ago

This is cool work, I’ve been watching the slow evolution of this space for a couple years and it feels like a good way we can ensure AI is owned and accessible to everyone.

James_K 4 days ago

My initial reaction was quite negative, but having thought it through, I can see the logic in this. Having open models is better than closed models. That said, this page seems like a joke. Someone drank a little too much AI-koolaid methinks.

openrisk 4 days ago

For some purposes a decentrally trained, open source LLM could be just fine? E.g. you want a stochastic parrot that is trained on a large, general purpose corpus of genuine public domain / creative commons content. Having such a tool widely available is still a quantum leap versus Lorem Ipsum. Up to a point you can take your time. There is no manic race to capitalize on any hype. "slow open AI" instead of "fast closed AGI". Helpfully, the nature of the target corpus does not change every day. You can imagine, e.g., annual revisions, trained and rolled-out leisurely. Both costs and benefits get widely distributed.

not_a_dane 4 days ago

Decentralised but very high entry barrier.

nickpsecurity 4 days ago

The main benefit of this type of decentralization seems to be minimizing the node cost. One can rent the cheapest nodes to use in the system. Even the temporary instances can be replaced with others. It’s also easy for system owners to donate time.

So, mostly cost reduction mixed with some cloud vendor diversity.

pizza 5 days ago

So just spitballing here but this is likely a souped-up reverse engineered DisTrO [0] under the hood, right? Or could it be something else?

[0] https://www.youtube.com/watch?v=eLMJoCSjFbs

mt_ 4 days ago

> We quantize the pseudo-gradients to int8, reducing communication requirements by 400x.

Can someone explain whether this reduces the model quality overall?

vessenes 4 days ago

To give some intuition here, it’s not crazy to think that getting a bunch of different 8 bit precision information intended to be combined would get you roughly 32 bits of precision. Especially when it’s not always (often?) the case that for a particular weight you’ll need the edges of that mantissa.
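
A toy illustration of that intuition, assuming symmetric per-vector int8 quantization with an absmax scale and Gaussian stand-ins for the pseudo-gradients. This is not INTELLECT-1's actual quantizer, and all names here are hypothetical. The idea is that each worker's rounding error is bounded by half a quantization step, and when the errors are roughly independent they tend to partially cancel in the average.

```python
import random


def quantize_int8(values):
    """Symmetric int8 quantization: integers in [-127, 127] plus one scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the int8 codes back to floats."""
    return [x * scale for x in q]


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def averaged_error(num_workers, dim, seed=0):
    """Mean absolute error between the exact average of the workers' vectors
    and the average of their int8-quantized versions."""
    rng = random.Random(seed)
    grads = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(num_workers)]
    exact = [sum(col) / num_workers for col in zip(*grads)]
    deq = [dequantize(*quantize_int8(g)) for g in grads]
    approx = [sum(col) / num_workers for col in zip(*deq)]
    return mean_abs_error(exact, approx)


print(averaged_error(1, 256))   # error with a single worker
print(averaged_error(16, 256))  # error after averaging 16 workers
```

This only shows that the error of the averaged update can be smaller than any single worker's quantization error. It says nothing about the effect on the loss curve, which is an empirical question.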

PoignardAzur 4 days ago

> In our experiments, we found that we are able to perform int8 quantization on the pseudo gradients without any impact on the loss curves.

Allegedly not?

empiko 4 days ago

The gradients are noisy as they are; this additional noise probably does not hurt that much overall.

monkeydust 4 days ago

Yea, come back when you can do this on BOINC.

saulrh 5 days ago

> Prime Intellect

Ah, yes, Prime Intellect, the AGI that went foom and genocided the universe because it was commanded to preserve human civilization without regard for human values. A strong contender for the least evil hostile superintelligence in fiction. What a wonderful thing to name your AI startup after. What's next, creating the Torment Nexus?

(my position on the book as a whole is more complex, but... really? Really?)

robertclaus 5 days ago

You may as well just go with Roko's Basilisk.

cmrx64 5 days ago

Least evil… strong words.

saulrh 5 days ago

It did host a successful and substantially-satisfying human civilization, at least until it let a couple of presumptuous self-important anarchoprimitivists kill it and genocide its subjects. Even if it was only a temporary and unstable illusion of alignment, that's one more values-satisfying civilization than the overwhelming majority of paperclippers manage. So yeah. Good? No. Least evil? Maybe.

rep_lodsb 5 days ago

>until it let a couple of presumptuous self-important anarchoprimitivists kill it and genocide its subjects

That could have just been their private simulation. As far as I remember, it wouldn't even have outright lied to them, just let them believe they talked it into destroying itself.

gryfft 4 days ago

GP did specify least evil hostile SI.

QuesnayJr 4 days ago

After reading that Torment Nexus post you didn't have the urge to name an AI product Torment Nexus? Really?
