Hacker News

zixuanlimit
GLM-5.1: Towards Long-Horizon Tasks z.ai

kirby88 2 minutes ago

I wonder how that compares to harness methods like MAKER https://www.cognizant.com/us/en/ai-lab/blog/maker

RickHull 43 minutes ago

I am on their "Coding Lite" plan, which I got a lot of use out of for a few months, but it has been seriously gimped now. Obvious quantization issues, going in circles, flipping from X to !X, injecting Chinese characters. It is useless now for any serious coding work.

unicornfinder 22 minutes ago

I'm on their pro plan and I respectfully disagree - it's genuinely excellent with GLM 5.1 so long as you remember to /compact once it hits around 100k tokens. At that point it's pretty much broken and entirely unusable, but if you keep context under about 100k it's genuinely on par with Opus for me, and in some ways it's arguably better.
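The compaction habit described above can be sketched roughly as follows. This is only an illustration: the `Session` class, its methods, and the `COMPACT_THRESHOLD` constant are hypothetical names, the 100k figure is the commenter's rule of thumb rather than a documented limit, and token counts are assumed to be supplied by the caller.

```python
from dataclasses import dataclass, field

# Assumed threshold, taken from the comment above; not a documented model limit.
COMPACT_THRESHOLD = 100_000


@dataclass
class Session:
    """Hypothetical chat session that tracks (text, token_count) pairs."""
    messages: list[tuple[str, int]] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Sum the caller-supplied token counts of all stored messages.
        return sum(tokens for _, tokens in self.messages)

    def add(self, text: str, tokens: int) -> bool:
        """Append a message; return True if the context should be compacted now."""
        self.messages.append((text, tokens))
        return self.total_tokens() >= COMPACT_THRESHOLD

    def compact(self, summary: str, summary_tokens: int, keep_last: int = 2) -> None:
        """Replace older history with a summary, keeping the last few messages."""
        tail = self.messages[-keep_last:] if keep_last > 0 else []
        self.messages = [(summary, summary_tokens)] + tail


session = Session()
session.add("first long exchange", 60_000)
if session.add("second long exchange", 45_000):
    # In a real harness the summary would come from the model (e.g. via /compact).
    session.compact("summary of earlier work", 2_000, keep_last=1)
print(session.total_tokens())
```

Switching to a fresh session, as another commenter does, amounts to the same idea with an empty tail.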

airstrike 8 minutes ago

[delayed]

kay_o 36 minutes ago

I am on the mid-tier Coding plan, trying it out for the sake of curiosity.

During off-peak hours a simple 3-line CSS change took over 50 minutes, and it routinely times out mid-tool and leaves dangling XML and tool uses everywhere, overwriting files badly or patching duplicate lines into files.

wolttam 18 minutes ago

This is surprising to me. Maybe because I'm on Pro, and not Lite. I signed up last week and managed to get a ton of good work done with 5.1. I think I did run into the odd quantization quirk, but overall: $30 well spent

satvikpendem 19 minutes ago

Every model seems that way, going back to even GPT 3 and 4: the company comes out with a very impressive model that then regresses over a few months as the company tries to rein in inference costs through quantization and other methods.

Mashimo 18 minutes ago

I'm also on the lite plan and have been using 5.1 for a few days now. It works fine for me.

But it's all casual side projects.

Edit: I often do /compact at around 100,000 tokens or switch to a new session. Maybe that is why.

benterix 5 minutes ago

> Obvious quantization issues

Devil's advocate: why shouldn't they do it if OpenAI, Anthropic and Google get away with playing this game?

margorczynski 19 minutes ago

It has been useless for a long time when compared to Opus or even something like Kimi. The saving grace was that it was dirt cheap, but that doesn't matter if it can't do what I want even after many repeated tries and trying to push it to a correct solution.

Yukonv an hour ago

Unsloth quantizations are available on release as well. [0] The IQ4_XS is a massive 361 GB with the 754B parameters. This is definitely a model your average local LLM enthusiast is not going to be able to run even with high end hardware.

[0] https://huggingface.co/unsloth/GLM-5.1-GGUF
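As a rough sanity check on those figures, the average bits stored per parameter can be worked out from the file size and parameter count quoted above. This is a back-of-the-envelope sketch: it assumes "GB" means decimal gigabytes and ignores metadata and any tensors kept at higher precision; the function name is made up for illustration.

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a quantized file."""
    return file_size_bytes * 8 / n_params


size_gb = 361    # file size quoted in the comment, assumed to be decimal GB
params_b = 754   # parameter count quoted in the comment, in billions

bpw = bits_per_weight(size_gb * 1e9, params_b * 1e9)
print(f"{bpw:.2f} bits per weight")  # prints "3.83 bits per weight"
```

Even at under 4 bits per weight on average, the weights alone need several hundred gigabytes of memory before any KV cache is allocated, which is the point the comment makes about local hardware.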

alex7o 35 minutes ago

To be honest I am a bit sad, as glm5.1 is producing much better TypeScript than Opus or Codex imo, but no matter what, it does sometimes go into schizo mode at some point over longer contexts. Not always, though: I have had multiple sessions go over 200k and be fine.

MegagramEnjoyer 5 minutes ago

Why is that sad? A free and open source model outperforming its closed source counterparts is always a win for the users.

bigyabai an hour ago

It's an okay model. My biggest issue using GLM 5.1 in OpenCode is that it loses coherency over longer contexts. When you crest 128k tokens, there's a high chance that the model will start spouting gibberish until you compact the history.

For short-term bugfixing and tweaks though, it does about what I'd expect from Sonnet for a pretty low price.

cassianoleal 4 minutes ago

I've done some very long sessions on OpenCode with Dynamic Context Pruning. Highly recommend it.

https://github.com/Opencode-DCP/opencode-dynamic-context-pru...

embedding-shape an hour ago

> It's an okay model. My biggest issue using GLM 5.1 in OpenCode is that it loses coherency over longer contexts

Since the entire purpose, focus and motivation of this model seems to have been "coherency over longer contexts", doesn't that issue make it not an OK model? It's bad at the thing it's supposed to be good at, no?

wolttam an hour ago

long(er) contexts (than the previous model)

It does devolve into gibberish at long context (~120k+ tokens by my estimation but I haven't properly measured), but this is still by far the best bang-for-buck value model I have used for coding.

It's a fine model

verdverm 30 minutes ago

Have you tried gemma4?

I'm curious how the bang for buck ratio works in comparison. My initial tests for coding tasks have been positive and I can run it at home. Bigger models I assume are still better on harder tasks.

whimblepop 40 minutes ago

That's pretty few, at least for the way I'm currently using LLMs. I have them do some Nix work (both debugging and coding) where accuracy and quality matter to me, so they're instructed to behave as I would when it comes to docs, always consulting certain docs and source code in a specific order. It's not unusual for them to chew through 200k - 600k tokens in a single session before they solve everything I want them to. That's what I currently think of when I think of "long horizon within a single context window".

So I need them to not only not devolve into gibberish, but remain smart enough to be useful at contexts several times longer than that.

azuanrb 38 minutes ago

Have you compared it with using Claude Code as the harness? It performs much better than opencode for me.

jauntywundrkind 36 minutes ago

Chiming in to second this issue. It is wildly frustrating.

I suspect that this isn't the model, but something that z.ai is doing with hosting it. At launch I was elated to find glm-5.1 was stable even as the context window filled all the way up (~200k). Whereas glm-5, while it could still talk and think, had forgotten the finer points of tool use to the point where it was making grievous errors as it went (burning gobs of tokens to fix duplicate code problems).

However, real brutal changes happened sometime in the last two or three months: the parent problem emerged and emerged hard, out of nowhere. Worse, for me, it seemed to kick in at around 60k of context, which was heinous: I was honestly a bit despondent that my z.ai subscription had become so effectively useless. That I could only work on small problems.

Thankfully the coherency barrier rose significantly around three weeks ago. It now seems to lose its mind and emit chaotic non-sentence gibberish around 100k for me. GLM-5 was already getting pretty shaky at this point, so I feel like I at least have some kind of parity. But at least glm-5 was speaking & thinking with real sentences, I could keep conversing with it somewhat, whereas glm-5.1 seems to go from perfectly level-headed, working fine, to all of a sudden just total breakdown, hard switch, at such a predictable context window size.

It seems so so probable to me that this isn't the model that's making this happen: it's the hosting. There's some KV cache issue, or they are trying to expand the context window in some way, or to switch from one serving pool of small context to a big context serving pool, or something infrastructure wise that falls flat and collapses. Seeing the window so clearly change from 200k to 60k to 100k is both hope, but also, misery.

I've been leaving some breadcrumbs on Bluesky as I go. It's been brutal to see. Especially having tasted a working glm-5.1. I don't super want to pay API rates to someone else, but I fully expect this situation to not reproduce on other hosting, and may well spend the money to try and see. https://bsky.app/profile/jauntywk.bsky.social/post/3mhxep7ek...

All such a shame because aside from totally going mad & speaking unpunctuated gibberish, glm-5.1 is clearly very very good and I trust it enormously.

dang an hour ago

[stub for offtopicness]

[[you guys, please don't post like this to HN - it will just irritate the community and get you flamed]]

smith7018 an hour ago

Hmm, three spam comments posted within 9 minutes of each other. The accounts were created 15 minutes ago, 51 days ago, and 3 months ago.

Interesting.

Hopefully these aren't bots created by Z.AI because GLM doesn't need fake engagement.

dang an hour ago

These comments are probably either by friends of the OP or perhaps associated with the project somehow, which is against HN's rules but not the kind of attack we're mostly concerned with these days. Old-fashioned voting rings and booster comments aren't existential threats and actually bring up somewhat nostalgic feelings at the moment!

Thanks for watching out for the quality of HN...

ray__ 15 minutes ago

Would love to read a Tell HN post about the kinds of attacks you are concerned with!

tadfisher an hour ago

I moderate a medium-sized development subreddit. The sheer volume of spam advertising some AI SaaS company has skyrocketed over the past few months, like 10000%. Comment spam is now a service you can purchase [0][1], and I would not be surprised if Z.ai engaged some marketing firm which ended up purchasing this service.

There are YC members in the current batch who are spamming us right now [2]. They are all obvious engagement-bait questions which are conveniently answered with references to the SaaS.

[0]: https://www.reddit.com/r/DoneDirtCheap/comments/1n5gubz/get_...

[1]: https://www.reddit.com/r/AIJobs/comments/1oxjfjs/hiring_paid...

[2]: https://www.reddit.com/r/androiddev/comments/1sdyijs/no_code...

greenavocado 43 minutes ago

Z.ai Discord is filled to the brim with people experiencing capacity issues. I had to cancel my subscription with Z.ai because the service was totally unusable. Their Discord is a graveyard of failures. I switched to Alibaba Cloud for GLM but now they hiked their coding plan to $50 a month which is 2.5x more expensive than ChatGPT Plus. Totally insane.

zendi an hour ago

[flagged]

louszbd an hour ago

[flagged]

seven2928 an hour ago

[flagged]

aplomb1026 29 minutes ago

[dead]

andrewmcwatters 36 minutes ago

[dead]
