taurath6 hours ago
> Copilot excels at low-to-medium complexity tasks in well-tested codebases, from adding features and fixing bugs to extending tests, refactoring, and improving documentation.
Bounds bounds bounds bounds. The important part for humans seems to be maintaining boundaries for AI. If your well-tested codebase has the tests built through AI, it's probably not going to work.
I think it's somewhat telling that they can't share numbers for how they're using it internally. I want to know that Microsoft, the company famous for dog-fooding, is using this day in and day out, with success. There's real stuff in there, and my brain has an insanely hard time separating the trillion dollars of hype from the usefulness.
timrogers5 hours ago
We've been using Copilot coding agent internally at GitHub, and more widely across Microsoft, for nearly three months. That dogfooding has been hugely valuable, with tonnes of valuable feedback (and bug bashing!) that has helped us get the agent ready to launch today.
So far, the agent has been used by about 400 GitHub employees in more than 300 of our repositories, and we've merged almost 1,000 pull requests contributed by Copilot.
In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
(Source: I'm the product lead at GitHub for Copilot coding agent.)
overfeed4 hours ago
> we've merged almost 1,000 pull requests contributed by Copilot
I'm curious to know how many Copilot PRs were not merged and/or required human take-overs.
sethammons4 hours ago
textbook survivorship bias https://en.wikipedia.org/wiki/Survivorship_bias
every bullet hole in that plane is the 1k PRs contributed by copilot. The missing dots, and whole missing planes, are unaccounted for. Ie, "ai ruined my morning"
MoreQARespect2 hours ago
If they measured that too it would make it harder to justify a MSFT P/E ratio of 29.6.
n2d43 hours ago
It's not survivorship bias. Survivorship bias would be if you made any conclusions from the 1000 merged PRs (eg. "90% of all merged PRs did not get reverted"). But simply stating the number of PRs is not that.
tinesan hour ago
As with all good marketing, the conclusions are omitted and implied, no?
literalAardvark3 hours ago
"We need to get 1000 PRs merged from Copilot" "But that'll take more time" "Doesn't matter"
worldsayshi3 hours ago
I do agree that some scepticism is due here but how can we tell if we're treading into "moving the goal posts" territory?
overfeed2 hours ago
I'd love to know where you think the starting position of the goal posts was.
Everyone who has used AI coding tools interactively or as agents knows they're unpredictably hit or miss. The old, non-agent Copilot has a dashboard that shows org-wide rejection rates for paying customers. I'm curious to learn what the equivalent rejection rate for the agent is for the people who make the thing.
mirkodrummer12 minutes ago
> 1,000 pull requests contributed by Copilot
I'd like a breakdown of this phrase: how much human work vs. Copilot, and in what form, autocomplete vs. agent. It's not specified, so it seems more like marketing trickery than real data.
KenoFischer2 hours ago
What's the motivation for restricting to Pro+ if billing is via premium requests? I have a (free, via open source work) Pro subscription, which I occasionally use. I would have been interested in trying out the coding agent, but how do I know if it's worth $40 for me without trying it ;).
dsl2 hours ago
> In the repo where we're building the agent, the agent itself is actually the #5 contributor
How does this align with Microsoft's AI safety principles? What controls are in place to prevent Copilot from deciding that it could be more effective with fewer limitations?
bamboozledan hour ago
Haha
binarymax5 hours ago
So I need to ask: what is the overall goal of your project? What will you do in, say, 5 years from now?
timrogers5 hours ago
What I'm most excited about is allowing developers to spend more of their time working on the work they enjoy, and less of their time working on mundane, boring or annoying tasks.
Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates - and I really think we're heading to a world where AI can take the load of that and free me up to work on the most interesting and complex problems.
petetnt4 hours ago
What about developers who do enjoy writing for example high quality documentation? Do you expect that the status quo will be that most of the documentation will be AI slop and AI itself will just bruteforce itself through the issues? How close are we to the point where the AI could handle "tricky dependency updates", but not being able to handle "most interesting and complex problems"? Who writes the tests that are required for the "well tested" codebases for GitHub Copilot Coding Agent to work properly?
What is the job for the developer now? Writing tickets and reviewing low quality PRs? Isn't that the most boring and mundane job in the world?
doug_durham3 hours ago
I find your comment "AI Slop" in reference to technical documentation to be strange. It isn't a choice between finely crafted prose versus banal text. It's documentation that exists versus documentation that doesn't exist. Or documentation that is hopelessly out of date. In my experience LLMs do a wonderful job in translating from code to documentation. They even do a good job inferring the reason for design decisions. I'm all in on LLM-generated technical documentation. If I want well-written prose I'll read literature.
petetnt3 hours ago
Documentation is not just translating code to text - I don't doubt that LLMs are wonderful at that: that's what they understand. They don't understand users though, and that's what separates a great documentation writer from someone who documents.
doug_durham3 hours ago
Great technical documentation rarely gets written. You can tell the LLM the audience they are targeting and it will do a reasonable job. I truly appreciate technical writers, and hold great ones in special esteem. We live in a world where the market doesn't value this.
skydhash2 hours ago
The market values good documentation. Anything critical and commonly used is pretty well documented (Linux, databases, software like Adobe's, ...). You can see how many books/articles have been written about those systems.
sourdoughness8 minutes ago
We’re not talking about AI writing books about the systems, though. We’re talking about going from an undocumented codebase to a decently documented one, or one with 50% coverage going to 100%.
Those orgs that value high-quality documentation won’t have undocumented codebases to begin with.
And let’s face it, like writing code, writing docs does have a lot of repetitive, boring, boilerplate work, which I bet is exactly why it doesn’t get done. If an LLM is filling out your API schema docs, then you get to spend more time on the stuff that’s actually interesting.
bamboozledan hour ago
> Most developers don't love writing tests, or updating documentation, or working on tricky dependency updates
So they won’t like working on their job ?
tokioyoyoan hour ago
You know exactly what they meant, and you know they’re correct.
bamboozledan hour ago
I like updating documentation and feel that it's fairly important to be doing myself so I actually understand what the code / services do?
I use all of these tools, but you also know what "they're doing"...
I know our careers are changing dramatically, or going away (I'm working on a replacement for myself), but I just like listening to all the "what we're doing is really helping you..."
insina minute ago
I'd interpret the original statement as "tests which don't matter" and "documentation nobody will ever read", the ones which only exist because someone said they _have_ to, and nobody's ever going to check them as long as they exist (like a README.md in one of my main work projects I came back to after temporarily being reassigned to another project - previously it only had setup instructions, now: filled with irrelevant slop, never to be read, like "here is a list of the dependencies we use and a summary of each of their descriptions!").
Doing either of them _well_ - the way you do when you actually care about them and they actually matter - is still so far beyond LLMs. Good documentation and good tests are such a differentiator.
binarymax4 hours ago
Thanks for the response… do you see a future where engineers are just prompting all the time? Do you see a timeline in which today’s programming languages are “low level” and rarely coded by hand?
ilaksh5 hours ago
That's a completely nonsensical question given how quickly things are evolving. No one has a five year project timeline.
binarymax4 hours ago
Absolutely the wrong take. We MUST think about what might happen in several years. Anyone who says we shouldn’t is not thinking about this technology correctly. I work on AI tech. I think about these things. If the teams at Microsoft or GitHub are not, then we should be pushing them to do so.
ilaksh4 hours ago
He asked that in the context of an actual specific project. It did not make sense the way he asked it. And it's the executives' job to plan that out five years down the line... although I guarantee you none of them are trying to predict that far.
NitpickLawyer4 hours ago
> In the repo where we're building the agent, the agent itself is actually the #5 contributor - so we really are using Copilot coding agent to build Copilot coding agent ;)
Really cool, thanks for sharing! Would you perhaps consider implementing something like these stats that aider keeps on "aider writing itself"? - https://aider.chat/HISTORY.html
ilaksh5 hours ago
What model does it use? gpt-4.1? Or can it use o3 sometimes? Or the new Codex model?
aaroninsf5 hours ago
Question you may have a very informed perspective on:
where are we wrt the agent surveying open issues (say, via JIRA) and evaluating which ones it would be most effective at handling, and taking them on, ideally with some check-in for confirmation?
Or, contrariwise, from having product management agents which do track and assign work?
9wzYQbTYsAIc5 hours ago
Check out this idea: https://fairwitness.bot (https://news.ycombinator.com/item?id=44030394).
The entire website was created by Claude Sonnet through Windsurf Cascade, but with the “Fair Witness” prompt embedded in the global rules.
If you regularly guide the LLM to “consult a user experience designer”, “adopt the multiple perspectives of a marketing agency”, etc., it will make rather decent suggestions.
I’ve been having pretty good success with this approach, granted mostly at the scale of starting the process with “build me a small educational website to convey this concept”.
aegypti4 hours ago
Tell Claude the site is down!
burnt-resistor2 hours ago
When I repeated to other tech people from about 2012 to 2020 that the technological singularity was very close, no one believed me. Coding is just the easiest to automate away into almost oblivion. And too many non-technical people drank the Flavor Aid for the fallacy that it can be "abolished" completely soon. It will gradually come for all sorts of knowledge work specialists including electrical and mechanical engineers, and probably doctors too. And, of course, office work too. Some small number of specialists will remain to tune the bots, and some will remain in the field to work with them where expertise is absolutely required, but there will be widespread unemployment: what were options for upward mobility into the middle class are being destroyed and replaced with nothing. There won't be "retraining" or hand-waved other opportunities for the "basket of labor", just competition among many far-overqualified people for ever-dwindling opportunities.
It is difficult to get a man to understand something when his salary depends upon his not understanding it. - Upton Sinclair
kenjacksonan hour ago
I don't think it was unreasonable to be very skeptical at the time. We generally believed that automation would get rid of repetitive work that didn't require a lot of thought. And in many ways programming was seen almost at the top of the heap. Intellectually demanding and requiring high levels of precision and rigor.
Who would've thought (except you) that this would be one of the things that AI would be especially suited for. I don't know what this progression means in the long run. Will good engineers just become 1000x more productive as they manage X number of agents building increasingly complex code (with other agents constantly testing, debugging, refactoring and documenting them) or will we just move to a world where we just have way fewer engineers because there is only a need for so much code.
burnt-resistoran hour ago
> I don't think it was unreasonable to be very skeptical at the time.
Well, that's back rationalization. I saw the advances like conducting meta sentiment analysis on medical papers in the 00's. Deep learning was clearly just the beginning. [0]
> Who would've thought (except you)
You're othering me, which is rude, and you're speaking as though you speak for an entire group of people. Seems kind of arrogant.
0. (2014) https://www.ted.com/talks/jeremy_howard_the_wonderful_and_te...
throw123543541 minutes ago
It's interesting that even people initially skeptical are now thinking they are on the "chopping block" so to speak. I'm seeing it all over the internet, along with the slow realization that what was supposed to be the "top of the heap" is actually at the bottom - not because of difficulty of coding but because the AI labs themselves are domain experts in software and therefore have the knowledge and data to tackle it as a problem first. I also think to a degree they "smell blood" and fear, more so than greed, is the best marketing tool. Many invested a good chunk of time on this career, and it will result in a lot of negative outcomes. It's a warning to other intellectual careers, that's for sure - and you will start seeing resistance to domain knowledge sharing from more "professionalized" careers.
My view is in between yours: A bit of column A and B in the sense both outcomes to an extent will play out. There will be fewer engineers but not by the factor of productivity (Jevons paradox will play out but eventually tap out), there will be even more software especially of the low end, and the ones that are there will be expected to be smarter and work harder for the same or less pay, grateful they got a job at all. There will be more "precision and rigor", more keeping up required by workers, but less reward for the workers that perform it. In a capitalist economy it won't be seen as a profession to aspire to anymore by most people.
Given most people don't live to work, and use their career to also finance and pursue other life meanings, it won't be viable for most people long term, especially when other careers give "more bang for buck" w.r.t. effort put into them. The uncertainty in the SWE career that most I know are feeling right now means that to newcomers I recommend, on the balance of risk/reward, that it's better to go another career path, especially for juniors who have a longer runway. To be transparent I want to be wrong, but the risk of this is getting higher every day.
i.e. AI is a dream for the capital class, and IMO potentially disastrous for social mobility long term.
mjr003 hours ago
From talking to colleagues at Microsoft it's a very management-driven push, not developer-driven. Friend on an Azure team had a team member who was nearly put on a PIP because they refused to install the internal AI coding assistant. Every manager has "number of developers using AI" as an OKR, but anecdotally most devs are installing the AI assistant and not using it or using it very occasionally. Allegedly it's pretty terrible at C# and PowerShell which limits its usefulness at MS.
shepherdjerred2 hours ago
If you aren't using AI day-to-day then you're not adapting. Software engineering is not going to look at all the same in 5-10 years.
antihipocrat2 hours ago
That's exactly what senior executives who aren't coding are saying everywhere.
Meanwhile, engineers are using it for code completion and as a Google search alternative.
I don't see much difference here at all, the only habit to change is learning to trust an AI solution as much as a Stack Overflow answer. Though the benefit of SO is each comment is timestamped and there are alternative takes, corrections, caveats in the comments.
mjr002 hours ago
What does this have to do with my comment? Did you mean to reply to someone else?
I don't understand what this has to do with AI adoption at MS (and Google/AWS, while we're at it) being management-driven.
evantbyrnean hour ago
It's just tooling. Costs nothing to wait for it to be better. It's not like you're going to miss out on AGI. The cost of actually testing every slop code generator is non-trivial.
rsoto2an hour ago
AIs are boring
mrcsharpan hour ago
> Microsoft, the company famous for dog-fooding
This was true up around 15 years ago. Hasn't been the case since.
twodave6 hours ago
I feel like I saw a quote recently that said 20-30% of MS code is generated in some way. [0]
In any case, I think this is the best use case for AI in programming—as a force multiplier for the developer. It’s for the best benefit of both AI and humanity for AI to avoid diminishing the creativity, agency and critical thinking skills of its human operators. AI should be task oriented, but high level decision-making and planning should always be a human task.
So I think our use of AI for programming should remain heavily human-driven for the long term. Ultimately, its use should involve enriching humans’ capabilities over churning out features for profit, though there are obvious limits to that.
[0] https://www.cnbc.com/2025/04/29/satya-nadella-says-as-much-a...
DeepYogurt4 hours ago
> I feel like I saw a quote recently that said 20-30% of MS code is generated in some way. [0]
Similar to google. MS now requires devs to use ai
greatwhitenorth5 hours ago
How much was previously generated by intellisense and other code gen tools before AI? What is the delta?
tmpz226 hours ago
How much of that is protobuf stubs and other forms of banal autogenerate code?
twodave6 hours ago
Updated my comment to include the link. As much as 30% specifically generated by AI.
OnionBlender5 hours ago
The 2nd paragraph contradicts the title.
The actual quote by Satya says, "written by software".
twodave3 hours ago
Sure but then he says in his next sentence he expects 50% by AI in the next year. He’s clearly using the terms interchangeably.
shafyy5 hours ago
I would still wager that most of the 30% is some boilerplate stuff. Which is ok. But it sounds less impressive with that caveat.
ilaksh5 hours ago
You might want to study the history of technology and how rapidly compute efficiency has increased as well as how quickly the models are improving.
In this context, assuming that humans will still be able to do high level planning anywhere near as well as an AI, say 3-5 years out, is almost ludicrous.
_se4 hours ago
Reality check time for you: people were saying this exact thing 3 years ago. You cannot extrapolate like that.
ctkhn3 hours ago
That's great, our leadership is heavily pushing ai-generated tests! Lol
Scene_Cast26 hours ago
I tried doing some vibe coding on a greenfield project (using gemini 2.5 pro + cline). On one hand - super impressive, a major productivity booster (even compared to using a non-integrated LLM chat interface).
I noticed that LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt. One easy example is that I noticed them breaking abstractions (putting things where they don't belong). Unfortunately, there's not that much self-retrospection on these aspects if you ask about the quality of the code or if there are any better ways of doing it. Of course, if you pick up that something is in the wrong spot and prompt better, they'll pick up on it immediately.
I also ended up blowing through $15 of LLM tokens in a single evening. (Previously, as a heavy LLM user including coding tasks, I was averaging maybe $20 a month.)
candiddevmike6 hours ago
> I also ended up blowing through $15 of LLM tokens in a single evening.
This is a feature, not a bug. LLMs are going to be the next "OMG my AWS bill" phenomenon.
Scene_Cast26 hours ago
Cline very visibly displays the ongoing cost of the task. Light edits are about 10 cents, and heavy stuff can run a couple of bucks. It's just that the tab accumulates faster than I expect.
eterm5 hours ago
> Light edits are about 10 cents
Some well-paid developers will excuse this with, "Well if it saved me 5 minutes, it's worth an order of magnitude more than 10 cents".
Which is true, however there's a big caveat: Time saved isn't time gained.
You can "Save" 1,000 hours every night, but you don't actually get those 1,000 hours back.
shepherdjerred2 hours ago
> You can "Save" 1,000 hours every night, but you don't actually get those 1,000 hours back.
What do you mean?
If I have some task that requires 1000 hours, and I'm able to shave it down to one hour, then I did just "save" 999 hours -- just in the same way that if something costs $5 and I pay $4, I saved $
eterm39 minutes ago
My point is that saving 1,000 hours each day doesn't actually give you 1,000 hours a day to do things with.
You still get your 24 hours, no matter how much time you save.
What actually matters is the value of what is delivered, not how much time it actually saves you. Justifying costs by "time saved" is a good way to eat up your money on time-saving devices.
grepfru_it2 hours ago
Hourly_rate / 12 = 5min_rate
If light_edit_cost < 5min_rate then savings=true
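The same break-even check as a runnable Python sketch. The function names and the sample figures are illustrative assumptions, not numbers from this thread, and the check takes for granted that the edit really does save the five minutes, which is the point being disputed above.

```python
def five_minute_rate(hourly_rate: float) -> float:
    """Cost of five minutes of developer time: an hour holds twelve 5-minute slots."""
    return hourly_rate / 12


def edit_pays_off(light_edit_cost: float, hourly_rate: float) -> bool:
    """True when the edit costs less than the five minutes it is assumed to save."""
    return light_edit_cost < five_minute_rate(hourly_rate)


# Illustrative figures only: a $0.10 edit against a hypothetical $60/hour rate.
print(five_minute_rate(60.0))     # 5.0
print(edit_pays_off(0.10, 60.0))  # True
```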
PretzelPirate6 hours ago
> Cline very visibly displays the ongoing cost of the task
LLMs are now being positioned as "let them work autonomously in the background" which means no one will be watching the cost in real time.
Perhaps I can set limits on how much money each task is worth, but very few would estimate that properly.
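For what it's worth, a per-task cap is not much code. A minimal Python sketch, assuming the tool can report the cost of each model call before or as it is made; `TaskBudget` and `BudgetExceeded` are hypothetical names, not part of any real agent's API.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a task would spend more than its cap."""


class TaskBudget:
    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one model call; refuse it if it would exceed the cap."""
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"cap ${self.cap_usd:.2f} would be exceeded "
                f"(spent ${self.spent_usd:.2f}, next call ${cost_usd:.2f})"
            )
        self.spent_usd += cost_usd


# Hypothetical task capped at $2.00, charged in $0.75 steps.
budget = TaskBudget(cap_usd=2.00)
budget.charge(0.75)
budget.charge(0.75)
try:
    budget.charge(0.75)  # third call would take the total past the cap
except BudgetExceeded as exc:
    print("stopped:", exc)
```

The hard part is still the one mentioned above: picking a cap that reflects what the task is actually worth.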
Aurornis2 hours ago
> LLMs are now being positioned as "let them work autonomously in the background"
The only people who believe this level of AI marketing are the people who haven't yet used the tools.
> which means no one will be watching the cost in real time.
Maybe some day there's an agentic coding tool that goes off into the weeds and runs for days doing meaningless tasks until someone catches it and does a Ctrl-C, but the tools I've used are more likely to stop short of the goal than to continue crunching indefinitely.
Regardless, it seems like a common experience for first-timers to try a light task and then realize they've spent $3, instantly setting expectations for how easy it is to run up a large bill if you're not careful.
philkuz2 hours ago
I think that models are gonna commoditize, if they haven't already. The cost of switching over is rather small, especially when you have good evals on what you want done.
Also there's no way you can build a business without providing value in this space. Buyers are not that dumb.
BeetleB6 hours ago
> I also ended up blowing through $15 of LLM tokens in a single evening.
Consider using Aider, and aggressively managing the context (via /add, /drop and /clear).
gen22035 minutes ago
I, too, recommend aider whenever these discussions crop up; it converted me from the "AI tools suck" side of this discussion to the "you're using the wrong tool" side.
I'd also recommend creating little `README`'s in your codebase that are mainly written with aider as the intended audience. In it, I'll explain architecture, what code makes (non-)sense to write in this directory, and so on. Has the side-effect of being helpful for humans, too.
Nowadays when I'm editing with aider, I'll include the project README (which contains a project overview + pointers to other README's), and whatever README is most relevant to the scope of my session. It's super productive.
I'm yet to find a model that beats the cost-effectiveness of Sonnet 3.7. I've tried the latest deepseek models, and while I love the price (nearly 50x cheaper?), it's just far too error-prone compared to Sonnet 3.7. It generates solid plans / architecture discussions, but, unlike Sonnet, the code it generates is often confidently off-the-mark.
danenania5 hours ago
My tool Plandex[1] allows you to switch between automatic and manual context management. It can be useful to begin a task with automatic context while scoping it out and making the high level plan, then switch to the more 'aider-style' manual context management once the relevant files are clearly established.
1 - https://github.com/plandex-ai/plandex
Also, a bit more on auto vs. manual context management in the docs: https://docs.plandex.ai/core-concepts/context-management
shepherdjerred2 hours ago
$15 in an evening sounds like a great deal when you consider the cost of highly-paid software engineers
SkyPuncher3 hours ago
I loathe using AI in a greenfield project. There are simply too many possible paths, so it seems to randomly switch between approaches.
In a brownfield code base, I can often provide it reference files to pattern match against. So much easier to get great results when it can anchor itself in the rest of your code base.
imiric44 minutes ago
The trick for greenfield projects is to use it to help you design detailed specs and a tentative implementation plan. Just bounce some ideas off of it, as with a somewhat smarter rubber duck, and hone the design until you arrive at something you're happy with. Then feed the detailed implementation plan step by step to another model or session.
This is a popular workflow I first read about here[1].
This has been the most useful use case for LLMs for me. Actually getting them to implement the spec correctly is the hard part, and you'll have to take the reins and course correct often.
[1]: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
jstummbillig6 hours ago
If you want to use Cline and are at all price sensitive (in these ranges) you have to do manual context management just for that reason. I find that too cumbersome and use Windsurf (currently with Gemini 2.5 pro) for that reason.
falcor846 hours ago
> LLMs need a very heavy hand in guiding the architecture, otherwise they'll add architectural tech debt
I wonder if the next phase would be the rise of (AI-driven?) "linters" that check that the implementation matches the architecture definition.
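A non-AI version of that idea can be sketched in a few lines of Python with the standard `ast` module. The layer names and the `ALLOWED_IMPORTS` table are made up for illustration; a real project would supply its own architecture definition, and this sketch skips relative imports.

```python
import ast

# Hypothetical layering rule: each layer may import only the layers listed for it.
ALLOWED_IMPORTS = {
    "api": {"services"},
    "services": {"storage"},
    "storage": set(),
}


def layer_violations(layer: str, source: str) -> list[str]:
    """Return imports in `source` that reach a layer `layer` may not depend on."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            # Only imports of known layers are checked; other modules pass.
            if top in ALLOWED_IMPORTS and top != layer and top not in ALLOWED_IMPORTS[layer]:
                violations.append(name)
    return violations


sample = "import storage.db\nfrom services import users\nimport json\n"
print(layer_violations("api", sample))  # ['storage.db']
```

Here the `api` layer reaching straight into `storage` is flagged, which is the "putting things where they don't belong" case quoted above.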
dontlikeyoueith6 hours ago
And now we've come full circle back to UML-based code generation.
Everything old is new again!
tmpz226 hours ago
While it's being touted for greenfield projects I've noticed a lot of failures when it comes to bootstrapping a stack.
For example it (Gemini 2.5) really struggles with newer ecosystems like FastAPI when wiring libraries like SQLAlchemy, Pytest, Python-playwright, etc., together.
I find more value in bootstrapping myself, and then using it to help with boiler plate once an effective safety harness is in place.
nodja5 hours ago
I wish they optimized things before adding more crap that will slow things down even more. The only thing that's fast with copilot is the autocomplete, it sometimes takes several minutes to make edits on a 100 line file regardless of the model I pick (some are faster than others). If these models had a close to 100% hit rate this would be somewhat fine, but going back and forth with something that takes this long is not productive. It's literally faster to open claude/chatgpt on a new tab and paste the question and code there and paste it back into vscode than using their ask/edit/agent tools.
I cancelled my Copilot subscription last week and when it expires in two weeks I'll most likely shift to local models for autocomplete/simple stuff.
brushfoot5 hours ago
My experience has mostly been the opposite -- changes to several-hundred-line files usually only take a few seconds.
That said, months ago I did experience the kind of slow agent edit times you mentioned. I don't know where the bottleneck was, but it hasn't come back.
I'm on library WiFi right now, "vibe coding" (as much as I dislike that term) a new tool for my customers using Copilot, and it's snappy.
nodja4 hours ago
Here's a video of what it looks like with sonnet 3.7.
The claude and gemini models tend to be the slowest (yes, including flash). 4o is currently the fastest but still not great.
NicuCalcea3 hours ago
For me, the speed varies from day to day (Sonnet 3.7), but I've never seen it this slow.
BeetleB4 hours ago
Several minutes? Something is seriously wrong. For most models, it takes seconds.
nodja4 hours ago
2m27s for a partial response editing a 178 line file (it failed with an error, which seems to happen a lot with claude, but that's another issue).
allthenopes25an hour ago
"Drowning in technical debt?"
Stop fighting and sink!
But rest assured that with Github Copilot Coding Agent, your codebase will develop larger and larger volumes of new, exciting, underexplored technical debt that you can't be blamed for, and your colleagues will follow you into the murky depths soon.
bionhoward43 minutes ago
Major scam alert, they are training on your code in private repos if you use this
You can tell because they advertise “Pro” and “Pro+” but then the FAQ reads,
> Does GitHub use Copilot Business or Enterprise data to train GitHub’s model?
> No. GitHub does not use either Copilot Business or Enterprise data to train its models.
Aka, even paid individuals plans are getting brain raped
muglug6 hours ago
> Copilot excels at low-to-medium complexity tasks
Oh cool!
> in well-tested codebases
Oh ok never mind
lukehoban5 hours ago
As peer commenters have noted, coding agent can be really good at improving test coverage when needed.
But also as a slightly deeper observation - agentic coding tools really do benefit significantly from good test coverage. Tests are a way to “box in” the agent and allow it to check its work regularly. While they aren’t necessary for these tools to work, they can enable coding agents to accomplish a lot more on your behalf.
(I work on Copilot coding agent)
CSMastermind5 hours ago
In my experience they write a lot of pointless tests that technically increase coverage while not actually adding much more value than a good type system/compiler would.
They also have a tendency to suppress errors instead of fixing them, especially when the right thing to do is throw an error on some edge case.
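To make the second point concrete, here is a small Python sketch of the two shapes. The port-parsing functions are invented for illustration and are not taken from any agent's output.

```python
def parse_port_suppressed(value: str) -> int:
    """The pattern being criticized: swallow the error and return a default."""
    try:
        return int(value)
    except ValueError:
        return 0  # hides bad input; callers cannot tell "0" from garbage


def parse_port_strict(value: str) -> int:
    """Fail loudly on the edge case instead of hiding it."""
    port = int(value)  # ValueError propagates for non-numeric input
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


print(parse_port_suppressed("eighty"))  # 0
try:
    parse_port_strict("eighty")
except ValueError as exc:
    print("rejected:", exc)
```

The suppressed version makes every test go green while pushing the failure somewhere harder to debug.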
shepherdjerred2 hours ago
You can tell the AI not to suppress errors
abraham6 hours ago
Have it write tests for everything and then you've got a well tested codebase.
danielbln4 hours ago
Caveat emptor, I've seen some LLMs mock the living hell out of everything, to the point of not testing much of anything. Something to be aware of.
yen2234 hours ago
I've seen too many human operators do that too. Definitely a problem to watch out for
eikenberry5 hours ago
You forgot the /s
throwaway123615 hours ago
In my experience it works well even without good testing, at least for greenfield projects. It just works best if there are already tests when creating updates and patches.
shwouchk5 hours ago
I played around with it quite a bit. it is both impressive and scary. most importantly, it tends to indiscriminately use dependencies from random tiny repos, and often enough not the correct ones, for major projects. buyer beware.
PhilipRomanan hour ago
This is something I've noticed as well with different AIs. They seem to disproportionately trust data read from the web. For example, I asked to check if some obvious phishing pages were scams and multiple times I got just a summary of the content as if it was authoritative. Several times I've gotten some random chinese repo with 2 stars presented as if it was the industry standard solution, since that's what it said in the README.
On an unrelated note, it also suggested I use the "Strobe" protocol for encryption and sent me to https://strobe.cool which is ironic considering that page is all about making one hallucinate.
jaymzcampbellan hour ago
> ... sent me to...
Oh wow, that was great - particularly if I then look at my own body parts (like my palm) that I know are not moving, it's particularly disturbing. That's a really well done effect, I've seen something similar but nothing quite like that.
yellow_lead2 hours ago
Given that PRs run actions in a more trusted context for private repos, this is a bit concerning.
boomskats6 hours ago
My buddy is at GH working on an adjacent project & he hasn't stopped talking about this for the last few days. I think I've been reminded to 'make sure I tune into the keynote on Monday' at least 8 times now.
I gave up trying to watch the stream after the third authentication timeout, but if I'd known it was this I'd maybe have tried a fourth time.
unshavedyak6 hours ago
What specific keynote are they referring to? I'm curious, but thus far my searches have failed
babelfish6 hours ago
MS Build is today
throwaway123615 hours ago
Word of advice: just go to YouTube and skip the MS registration tax
tmpz226 hours ago
I’m always hesitant to listen to the line coders on projects because they’re getting a heavy dose of the internal hype every day.
I’d love for this to blow past cursor. Will definitely tune in to see it.
dontlikeyoueith6 hours ago
>I’m always hesitant to listen to the line coders on projects because they’re getting a heavy dose of the internal hype every day.
I'm senior enough that I get to frequently see the gap between what my dev team thinks of our work and what actual customers think.
As a result, I no longer care at all what developers (including myself on my own projects) think about the quality of the thing they've built.
jerpint6 hours ago
These kinds of patterns allow compute to take much more time than a single chat since it is asynchronous by nature, which I think is necessary to get to working solutions on harder problems
lukehoban5 hours ago
Yes. This is a really key part of why Copilot coding agent feels very different to use than Copilot agent mode in VS Code.
In coding agent, we encourage the agent to be very thorough in its work, and to take time to think deeply about the problem. It builds and tests code regularly to ensure it understands the impact of changes as it makes them, and stops and thinks regularly before taking action.
These choices would feel too “slow” in a synchronous IDE-based experience, but feel natural in an “assign to a peer collaborator” UX. We lean into this to provide as rich a problem-solving agentic experience as possible.
(I’m working on Copilot coding agent)
fvold4 hours ago
The biggest change Copilot has made for me so far is getting me to replace VSCode with VSCodium, to be sure it doesn't sneak in any uploading of my code to a third party without my knowing.
I'm all for new tech getting introduced and made useful, but let's make it all opt in, shall we?
qwertox4 hours ago
Care to explain? Where are they uploading code to?
hidelooktropican hour ago
UX-wise...
I kind of love the idea that all of this works in the familiar flow of raising an issue and having a magic coder swoop in and making a pull request.
At the same time, I have been spoiled by Cursor. I feel I would end up preferring that the magic coder is right there with me in the IDE where I can run things and make adjustments without having to do a followup request or comment on a line.
asadm6 hours ago
In the early days of LLMs, I developed an "agent" using a GitHub Actions + issues workflow[1], similar to how this works. It was very limited but kinda worked, i.e. you assigned it a bug and it fired an action, did some architect/editing tasks, validated the changes and finally sent a PR.
Good to see an official way of doing this.
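As a rough illustration of the assign-then-PR flow described above, here is a minimal Python sketch of the dispatch step. It assumes a payload shaped like GitHub's `issues` webhook event (fields `action`, `assignee`, `issue`); `BOT_LOGIN` and `plan_for_event` are hypothetical names, and this is not the commenter's actual implementation.

```python
from typing import Optional

BOT_LOGIN = "my-coding-bot"  # hypothetical bot account name


def plan_for_event(payload: dict) -> Optional[list[str]]:
    """Return the steps to run for an issue event, or None to ignore it."""
    # Only react when an issue is assigned (assumed payload shape).
    if payload.get("action") != "assigned":
        return None
    assignee = (payload.get("assignee") or {}).get("login")
    if assignee != BOT_LOGIN:
        return None
    number = payload["issue"]["number"]
    return [
        f"plan changes for issue #{number}",
        "edit files on a new branch",
        "run the test suite to validate",
        "open a pull request for human review",
    ]
```

Events for other actions or other assignees return `None`, so the workflow only does work when a human explicitly hands the issue to the bot.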
sync5 hours ago
Anthropic just announced the same thing for Claude Code, same day: https://docs.anthropic.com/en/docs/claude-code/github-action...
Yenrabbitan hour ago
And Google's version: https://jules.google
alvis3 hours ago
God save the juniors...
sethops1an hour ago
> Copilot coding agent is rolling out to GitHub Mobile users on iOS and Android, as well as GitHub CLI.
Wait, is this going to pollute the `gh` tool? Please tell me this isn't happening.
FergusArgyll6 minutes ago
ubuntu@pc:~$ gh --help
Sure! How can I help you?
qwertox4 hours ago
In hindsight it was a mistake that Google killed Google Code. Then again, I guess they wouldn't have put enough effort into it to develop into a real GitHub alternative.
Now Microsoft sits on a goldmine of source code and has the ability to offer AI integration even to private repositories. I can upload my code into a private repo and discuss it with an AI.
The only thing Google can counter with would be to build tools which developers install locally, but even then I guess that the integration would be limited.
And considering that Microsoft owns the "coding OS" VS Code, it makes Google look even worse. Let's see what they come up with tomorrow at Google I/O, but I doubt that it will be serious competition for Microsoft. Maybe for OpenAI, if they're smart, but not for Microsoft.
abraham2 hours ago
Gemini has some GitHub integrations
https://developers.google.com/gemini-code-assist/docs/review...
geodel2 hours ago
You win some you lose some. Google could have continued with Google code. Microsoft could've continued with their phone OS. It is difficult to know when to hold and when to fold.
candiddevmike2 hours ago
Google Cloud has a pre-GA product called "Secure Source Manager" that looks like a fork of Gitea: https://cloud.google.com/secure-source-manager/docs/overview
Definitely not Google Code, but better than Cloud Source Repositories.
dangoodmanUT3 hours ago
Or they'll just buy Cursor
joelthelion5 hours ago
I don't know, I feel this is the wrong level to place the AI at this moment. Chat-based AI programming (such as Aider) offers more control, while being almost as convenient.
azhenley3 hours ago
Looks like their GitHub Copilot Workspace.
softwaredoug6 hours ago
Is Copilot a classic case of a slow megacorp getting outflanked by more creative and unhindered newcomers (e.g. Cursor)?
It seems Copilot could have really owned the vibe coding space. But that didn’t happen. I wonder why? Lots of ideas gummed up in organizational inefficiencies, etc?
ilaksh5 hours ago
This is a direct threat to Cursor. The smarter the models get, the less often programmers really need to dig into an IDE, even one with AI in it. Give it a couple of years and there will be a lot of projects that were done just by assigning tasks where no one even opened Cursor or anything.
net01op3 hours ago
On another note: https://github.com/github/dmca/pull/17700 GitHub's automated, auto-merged DMCA sync PRs get an automated Copilot review on every single one.
AMAZING
theusus6 hours ago
So far I have been disappointed by Copilot's offerings. It's just not good enough for anything valuable. I don't want it to write my getters and setters and call it a day.
rvz5 hours ago
I think we expected disappointment with this one. (I expected it at least)[0]
But the upgraded Copilot was just a response to Cursor and Windsurf.
We'll see.
quantadev3 hours ago
I love Copilot in VSCode. I have it set to use Claude most of the time, but it lets you pick your fav LLM for it to use. I just open the files I'm going to refactor, type into the chat window what I want done, and click 'accept' on every code change it recommends in its answer, causing VSCode to auto-merge the changes into my code. Couldn't possibly be simpler. Then I scrutinize and test. If anything went wrong I just use GitLens to roll back the change, but that's very rare.
Especially now that Copilot supports MCP I can plug in my own custom "Tools" (i.e. function calling done by the AI agent), and I have everything I need. I never even bothered trying Cursor or Windsurf, which I'm sure are great too, _mainly_ because as IDEs they're just forks of VSCode.
SkyBelow2 hours ago
Have you tried the agent mode instead of the ask mode? With just a bit more prompting, it does a pretty good job of finding the files it needs to use on its own. Then again, I've only used it in smaller projects so larger ones might need more manual guidance.
OutOfHere6 hours ago
GitHub had this exact feature late last year itself, perhaps under a slightly different name.
timrogers5 hours ago
I think you're probably thinking of Copilot Workspace (<https://github.blog/news-insights/product-news/github-copilo...>).
Copilot Workspace could take a task, implement it and create a PR - but it had a linear, highly structured flow, and wasn't deeply integrated into the GitHub tools that developers already use like issues and PRs.
With Copilot coding agent, we're taking all of the great work on Copilot Workspace, and all the learnings and feedback from that project, and integrating it more deeply into GitHub and really leveraging the capabilities of 2025's models, which allow the agent to be more fluid, asynchronous and autonomous.
(Source: I'm the product lead for Copilot coding agent.)
throwup2386 hours ago
Are you thinking of Copilot Workspaces?
That seemed to drop off the Github changelog after February. I’m wondering if that team got reallocated to the copilot agent.
WorldMaker5 hours ago
Probably. Also this new feature seems like an expansion/refinement of Copilot Workspaces to better fit the classic Github UX: "assign an issue to Copilot to get a PR" sounds exactly like the workflow Copilot Workspaces wanted to have when it grew up.
OutOfHere5 hours ago
Which model does it use? Will this let me select which model to use? I have seen a big difference in the type of code that different models produce, although their prompts may be to blame/credit in part.
qwertox4 hours ago
I assume you can select whichever one you want (GPT-4o, o3-mini, Claude 3.5, 3.7, 3.7 thinking, Gemini 2.0 Flash, GPT-4.1 and the previews o1, Gemini 2.5 Pro and o4-mini), subject to the pricing multipliers they announced recently [0].
Edit: From the TFA: Using the agent consumes GitHub Actions minutes and Copilot premium requests, starting from entitlements included with your plan.
[0] https://docs.github.com/en/copilot/managing-copilot/monitori...
sudhar1723 hours ago
Nice
2OEH8eoCRo03 hours ago
Kicking the can down the road. So we can all produce more code faster but there is NSB. Most of my time isn't spent writing the code anyway.
r0ckarong6 hours ago
Check in unreviewed slop straight into the codebase. Awesome.
timrogers5 hours ago
Copilot pushes its work to a branch and creates a pull request, and then it's up to you to review its work, approve and merge.
Copilot literally can't push directly to the default branch - we don't give it the ability to do that - precisely because we believe that all AI-generated code (just like human generated code) should be carefully reviewed before it goes to production.
(Source: I'm the product lead for Copilot coding agent.)
olex6 hours ago
> Once Copilot is done, it’ll tag you for review. You can ask Copilot to make changes by leaving comments in the pull request.
To me, this reads like it'll be a good junior and open up a PR with its changes, letting you (the issue author) review and merge. Of course, you can just hit "merge" without looking at the changes, but then it's kinda on you when unreviewed stuff ends up in main.
DeepYogurt6 hours ago
Management: "Why aren't you going faster now that the AI generates all the code and we fired half the dev team?"
tmpz226 hours ago
A good junior has strong communication skills, humility, asks many good questions, has imagination, and a tremendous amount of human potential.
odiroot6 hours ago
I'm waiting for the first unicorn that uses just vibe coding.
erikerikson6 hours ago
I expect it to be a security nightmare
freeone30005 hours ago
And why would that matter?
postalrat6 hours ago
Now developers can produce 20x the slop and refactor at 5x speed.
OutOfHere6 hours ago
In my experience in VSCode, Claude 3.7 produced more unsolicited slop, whereas GPT-4.1 didn't. Claude aggressively paid attention to type compatibility. Each model would have its strengths.