Hacker News

terabytest
Ask HN: Claude Opus 4.5 vs. GPT 5.1 Codex Max for coding. Worth the upgrade?

I’m using gpt-5.1-codex-max comfortably for coding and hitting the weekly limit sometimes (but a few extra credits usually cover it).

I’ve heard Opus 4.5 might be better for coding. SWE-bench shows an 8% improvement, but I'm having a hard time guessing what kind of effect that maps to in reality. For those who’ve switched, what changes have you seen, and how has it affected your work? Is the $100/month upgrade worth it?


otekengineering 5 hours ago

I'm impressed with Opus 4.5. It's been useful working on firmware projects where earlier models were of negative value.

Here's an example of a one-shot output. The only change I made was a Replace All of 'battlezone' -> 'battleclone':

"build a clone of the classic arcade game battlezone using SVG graphics that are calculated on the fly for the required vector wireframe graphics"

https://omnispect.dev/battleclone00.html

muzani 10 hours ago

I'm fine with just Copilot.

Opus 4.5 has excellent tool use, meaning it can jump in and out of a broad, undocumented codebase better. It can evaluate what the code is trying to do. It's perfect for PRs: it caught things like people submitting code that looked right but ended up calling a poorly documented/incomplete method.

GPT Codex just messes up a lot for me. Whatever I'm doing with it, it's not working. The plain GPT-5.2 is good overall, but it confidently makes mistakes and tells you that it's done.

If you have an excellent codebase, GPT 5.2 might actually work better. If you're not sure what you're doing or are using AI to find out how things work, then Opus 4.5 is great.

The Claude models are also very much behind in terms of UI and visuals.

Take note that a lot of the benchmarks are on Python. What I'm finding is all the major ones make mistakes, but they make mistakes differently. OpenAI and Anthropic tend to mimic one another for some reason, while Grok and Gemini tend to give very different answers.

sourdoughness 12 hours ago

Using Opus 4.5 through VS Code/Copilot gives so much better results than anything else I’ve tried that I kept paying when they briefly made it 3x the token rate.

I really like the interaction flows better than Gemini 3 or Codex, though I can’t quite quantify why. The amount of explanation/supporting material in Opus’s output feels just right to me.

djinnrutger 10 hours ago

I have been using VS Code/Copilot with Opus 4.5, and it has been working the best of any of the systems I have tried. Very happy with it so far. I never really got good results with GPT-5.1, though 5.2 seems better... but not by a lot, so I will stick with Opus 4.5 for now.

dalmo314 hours ago

I'm using Opus through Cursor, but not a heavy user so price is "the same".

Opus is so good I can actually give it a task and move my attention somewhere else. So although the model itself is much slower, my general workflow is faster and less frustrating.

chaidhat 2 days ago

You must have immense patience to daily-drive Codex. To be honest, I’ve observed better code quality from Codex (in terms of separation of concerns, high cohesion and loose coupling, etc.), but Opus has great quality at roughly one third of the speed. Maybe try it on Cursor, then decide if you want to switch. I’m curious: have you tried Gemini 3 Pro, and do you think it deserves the hype?
