Hacker News


jasondavies

INTELLECT–1: Launching the First Decentralized Training of a 10B Parameter Model primeintellect.ai

111 points 36 comments a year ago

PoignardAzur a year ago

A lot of comments are sneering at various aspects of this press release, and yeah, there's some cringeworthy stuff.

But the technical aspects are pretty cool:

- Fault-tolerant training where nodes can be added and removed mid-run without interrupting the other nodes.

- Sending quantized gradients during the synchronization phase.

- (In the OpenDiLoCo article) Async synchronization.

They're also mentioning potential trustless systems where everyone can contribute compute, which would make this a truly decentralized open platform. Overall it'll be pretty interesting to see where this goes!

londons_explore a year ago

> Sending quantized gradients during the synchronization phase.

I did this 9 years ago, works pretty well. I don't understand why all ML isn't async and quantized like that now. This project quantizes to 1 bit per weight and it works so well I didn't even make it configurable.

https://github.com/Hello1024/shared-tensor

radarsat1 a year ago

> 1 bit per weight

does this basically correspond to moving each weight either up or down by a fixed amount? I'm a bit surprised you don't at least need a "stay same" bit, but i suppose it could balance out over multiple iterations.

Interesting that it works at all. Although, thinking on it, I could see it maybe even having a nice regularizing effect where every layer would end up having similar weight magnitudes. (like projecting onto the local n-ball as mentioned in a paper posted recently on HN)

londons_explore a year ago

This is for keeping the weight vectors in sync between two machines.

The weight vectors themselves are regular floats. But the data exchanged between the machines is 1 bit per weight. Basically, you keep track of changes to the weight vector which haven't yet been propagated to the other machine. You quantize these to 1 bit per weight (i.e. a sign bit) and send it, together with a single scale factor X, accumulating the quantization error for the next sync iteration.

You choose X to be the RMS or some similar metric of the accumulated error.
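A minimal sketch of the scheme described above, not the code from the linked repository. The names `encode_one_bit` and `sync` are hypothetical, and X is taken to be the mean absolute pending change (one "similar metric" to RMS), chosen here because it makes the leftover error shrink on every round:

```python
import random


def encode_one_bit(residual):
    """Compress pending changes to one sign bit per weight plus one scale X."""
    # Assumption: X is the mean absolute pending change, standing in for RMS.
    scale = sum(abs(r) for r in residual) / len(residual)
    signs = [1 if r >= 0 else -1 for r in residual]
    # Error feedback: keep whatever the 1-bit message could not express.
    leftover = [r - s * scale for r, s in zip(residual, signs)]
    return signs, scale, leftover


def sync(sender, rounds):
    """Bring a zero-initialised remote copy towards `sender` in 1-bit rounds."""
    receiver = [0.0] * len(sender)
    residual = list(sender)  # changes not yet propagated to the receiver
    for _ in range(rounds):
        signs, scale, residual = encode_one_bit(residual)
        # The receiver only ever sees `signs` and `scale`.
        receiver = [w + s * scale for w, s in zip(receiver, signs)]
    return receiver, residual


if __name__ == "__main__":
    random.seed(0)
    weights = [random.uniform(-1.0, 1.0) for _ in range(8)]
    remote, pending = sync(weights, rounds=200)
    gap = max(abs(a - b) for a, b in zip(weights, remote))
    print(f"largest gap after 200 rounds: {gap:.2e}")
```

Because the quantization error is carried forward rather than dropped, the remote copy plus the pending residual always equals the sender's weights, so nothing is lost, only delayed.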

f_devd a year ago

It has been more formally studied in signSGD[0], and empirically it's comparable to Adam in terms of behavior.

[0]: https://arxiv.org/pdf/1802.04434
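For readers following the "up or down by a fixed amount" question, the basic signSGD update can be sketched as below. This is an illustration on a toy quadratic, with hypothetical helper names, and it leaves out the momentum and majority-vote variants:

```python
def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def signsgd_step(weights, grads, lr):
    """Move every weight by exactly lr against the sign of its gradient."""
    return [w - lr * sign(g) for w, g in zip(weights, grads)]


def minimise_quadratic(target, lr, steps):
    """Run signSGD on f(w) = sum((w_i - t_i)^2), starting from zero."""
    weights = [0.0] * len(target)
    for _ in range(steps):
        grads = [2.0 * (w - t) for w, t in zip(weights, target)]
        weights = signsgd_step(weights, grads, lr)
    return weights


if __name__ == "__main__":
    print(minimise_quadratic([1.0, -2.0, 0.5], lr=0.01, steps=300))
```

With a fixed step size the weights end up hovering within one step of the optimum rather than settling exactly, which is why a "stay same" state is not strictly needed.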

oefrha a year ago

Well I don’t have 8xH100s, but if I did, I’m probably not gonna donate them to a VC-funded company. Remember “Open”AI?

https://pitchbook.com/profiles/company/588977-92

jgalt212 a year ago

Very true, but if something similar were run by BOINC, I'd take a stab at contributing.

https://boinc.berkeley.edu/

csomar a year ago

I don't know the intricacies of their VC deal. But if the data is open and users put in xx amount of compute and then get the model, then where is the possible harm? The trade is done and dealt. You provided some compute and got it back, right? Unless I am misunderstanding something about their distributed model or not reading the fine print.

ukuina a year ago

> Decentralized training of INTELLECT-1 currently requires 8x H100 SXM5 GPUs.

So, your garden-variety $0.5M desktop PC, then.

Cool, cool.

[1] https://viperatech.com/shop/nvidia-dgx-h100-p4387-system-640...

DannyBee a year ago

If you run it continuously for a month, it will use about 13x the electricity of your average California house.

So they really are a 10x company.

Average house is 571 kWh/month; this is 10.2 kW max * 24 * 30 = 7344 kWh.

This will cost you, in California, about $3000 a month depending on your power plan :)
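The arithmetic above can be checked with a few lines. The 10.2 kW draw and 571 kWh/month figures come from the comment; the $0.41/kWh rate is an assumption back-derived from the roughly $3000 estimate, not a quoted tariff:

```python
POWER_KW = 10.2            # maximum system draw, from the comment
HOURS_PER_MONTH = 24 * 30  # continuous operation for a 30-day month
HOUSE_KWH_PER_MONTH = 571  # average household usage, from the comment
ASSUMED_USD_PER_KWH = 0.41 # assumption: implied by the ~$3000 estimate


def monthly_energy_kwh(power_kw, hours):
    """Energy used when drawing `power_kw` constantly for `hours`."""
    return power_kw * hours


def household_multiple(energy_kwh, house_kwh):
    """How many average households that energy corresponds to."""
    return energy_kwh / house_kwh


if __name__ == "__main__":
    energy = monthly_energy_kwh(POWER_KW, HOURS_PER_MONTH)
    print(f"energy: {energy:.0f} kWh/month")
    print(f"households: {household_multiple(energy, HOUSE_KWH_PER_MONTH):.1f}x")
    print(f"cost: ${energy * ASSUMED_USD_PER_KWH:.0f}/month at the assumed rate")
```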

01HNNWZ0MV43FF a year ago

What if I run it for a year?

brysonreece a year ago

$3000 * 12 = ???

ikeashark a year ago

me: Oh cool, a project like Folding@Home but for AI compute, maybe I'll contribute as we-

> Decentralized training of INTELLECT-1 currently requires 8x H100 SXM5 GPUs.

me: and for that reason, I'm out

Also, they state that later they will add the ability to contribute your own compute. But how will they solve the problem of having to back-propagate to all of the remote nodes contributing to the project without egregiously slow training times?

macrolime a year ago

Not exactly what I would call decentralized training. More like distributed through multiple data centers.

Decentralized training would be when you can use consumer GPUs. That's not likely to work with backpropagation directly, but it might with one of the backpropagation-approximating algorithms.

dartos a year ago

Didn’t BLOOM do this with their Petals tool?

m3kw9 a year ago

But I can already train with 30 different vendors distributed across the US, so why do I need to use a “decentralized” training system? Decentralized inference makes more sense, as that is where things can be censored.

dmitrygr a year ago

> solve decentralized training step-by-step to ensure AGI will be open-source, transparent, and accessible

One hell of an uncited leap from "we're multiplying a lot of numbers" to "AGI", as if it is a given

DannyBee a year ago

Well, I mean, it's a group of people who are doing "open, decentralized" training that requires half a million dollars' worth of non-consumer hardware and $3000 a month in electricity. Would you expect anything less than Silicon Valley-level arrogance?

mountainriver a year ago

This is cool work. I’ve been watching the slow evolution of this space for a couple of years, and it feels like a good way to ensure AI is owned by and accessible to everyone.

openrisk a year ago

For some purposes a decentrally trained, open source LLM could be just fine? E.g. you want a stochastic parrot that is trained on a large, general purpose corpus of genuine public domain / creative commons content. Having such a tool widely available is still a quantum leap versus Lorem Ipsum. Up to a point you can take your time. There is no manic race to capitalize on any hype. "slow open AI" instead of "fast closed AGI". Helpfully, the nature of the target corpus does not change every day. You can imagine, e.g., annual revisions, trained and rolled out leisurely. Both costs and benefits get widely distributed.

James_K a year ago

My initial reaction was quite negative, but having thought it through, I can see the logic in this. Having open models is better than closed models. That said, this page seems like a joke. Someone drank a little too much AI-koolaid methinks.

not_a_dane a year ago

Decentralised but very high entry barrier.

nickpsecurity a year ago

The main benefit of this type of decentralization seems to be minimizing the node cost. One can rent the cheapest nodes to use in the system. Even the temporary instances can be replaced with others. It’s also easy for system owners to donate time.

So, mostly cost reduction mixed with some cloud vendor diversity.

pizza a year ago

So just spitballing here but this is likely a souped-up reverse engineered DisTrO [0] under the hood, right? Or could it be something else?

[0] https://www.youtube.com/watch?v=eLMJoCSjFbs

mt_ a year ago

> We quantize the pseudo-gradients to int8, reducing communication requirements by 400x.

Can someone explain if it does reduce the model quality overall?

vessenes a year ago

To give some intuition here, it’s not crazy to think that getting a bunch of different 8 bit precision information intended to be combined would get you roughly 32 bits of precision. Especially when it’s not always (often?) the case that for a particular weight you’ll need the edges of that mantissa.
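As a rough illustration of that intuition, here is a sketch of symmetric per-tensor int8 quantization applied to several workers' pseudo-gradients before averaging. The scheme and function names are assumptions for illustration, not the project's actual implementation. Going from 4-byte floats to 1-byte codes accounts for only 4x of the quoted reduction; the rest presumably comes from synchronizing infrequently.

```python
import random


def quantize_int8(values):
    """Symmetric per-tensor quantization: returns (int8 codes, scale)."""
    peak = max(abs(v) for v in values)
    scale = peak / 127 if peak > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def average(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


if __name__ == "__main__":
    random.seed(0)
    # Eight hypothetical workers, each with its own pseudo-gradient.
    workers = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(8)]
    exact = average(workers)
    approx = average([dequantize(*quantize_int8(w)) for w in workers])
    single = dequantize(*quantize_int8(workers[0]))
    err_one = max(abs(a - b) for a, b in zip(workers[0], single))
    err_avg = max(abs(a - b) for a, b in zip(exact, approx))
    print(f"max error, one worker: {err_one:.5f}")
    print(f"max error, averaged:   {err_avg:.5f}")
```

Each worker's rounding error is at most half a quantization step, and errors from different workers tend to partially cancel in the average. That is a weaker statement than recovering full 32-bit precision, but it points in the same direction.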

PoignardAzur a year ago

> In our experiments, we found that we are able to perform int8 quantization on the pseudo gradients without any impact on the loss curves.

Allegedly not?

empiko a year ago

The gradients are noisy as they are, so this additional noise probably does not hurt that much overall.

monkeydust a year ago

Yea, come back when you can do this on BOINC.

saulrh a year ago

> Prime Intellect

Ah, yes, Prime Intellect, the AGI that went foom and genocided the universe because it was commanded to preserve human civilization without regard for human values. A strong contender for the least evil hostile superintelligence in fiction. What a wonderful thing to name your AI startup after. What's next, creating the Torment Nexus?

(my position on the book as a whole is more complex, but... really? Really?)

robertclaus a year ago

You may as well just go with Roko's Basilisk.

cmrx64 a year ago

Least evil… strong words.

saulrh a year ago

It did host a successful and substantially-satisfying human civilization, at least until it let a couple of presumptuous self-important anarchoprimitivists kill it and genocide its subjects. Even if it was only a temporary and unstable illusion of alignment, that's one more values-satisfying civilization than the overwhelming majority of paperclippers manage. So yeah. Good? No. Least evil? Maybe.

rep_lodsb a year ago

>until it let a couple of presumptuous self-important anarchoprimitivists kill it and genocide its subjects

That could have just been their private simulation. As far as I remember, it wouldn't even have outright lied to them, just let them believe they talked it into destroying itself.

gryfft a year ago

GP did specify least evil hostile SI.

QuesnayJr a year ago

After reading that Torment Nexus post you didn't have the urge to name an AI product Torment Nexus? Really?
