Just want to clarify, this is not my Substack, I’m just sharing this because I found it insightful.
The author describes himself as a “fractional CTO” (no clue what that means, don’t ask me) and advisor. His clients asked him how they could leverage AI. He decided to experience it for himself. From the author (emphasis mine):
I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
Now when clients ask me about AI adoption, I can tell them exactly what 100% looks like: it looks like failure. Not immediate failure—that’s the trap. Initial metrics look great. You ship faster. You feel productive. Then three months later, you realize nobody actually understands what you’ve built.
Wasn’t this obvious? He didn’t need to go “all-in on AI” because hundreds of thousands of people have already tried the same thing, and every one of them could have told him that’s not what AI can do.
Hundreds of thousands of internet strangers is different from lived experience.
I take the author’s opinion more seriously because they went out and tried it for themselves.
“Thousands of people said touching a hot stove hurts. He touched the stove to try it himself, and I respect him for burning himself instead of using shared human knowledge.”
It looks like a rigid design philosophy that requires a complete rebuild for any change. If the speed of production becomes fast enough, and the cost low enough, regenerating the entire program for every change would become feasible and cost-effective.
I frequently feel that urge to rebuild from ground (specifications) up, to remove the “old bad code” from the context window and get back to the “pure” specification as the source of truth. That only works up to a certain level of complexity. When it works it can be a very fast way to “fix” a batch of issues, but when the problem/solution is big enough the new implementation will have new issues that may take longer to identify as compared with just grinding through the existing issues. Devil whose face you know kind of choice.
… as long as the giant corpos paying through the nose for the data centers continue to vastly underprice their products in order to make us all dependent on them.
Just wait till everyone’s using it and the prices will skyrocket.
@AutistoMephisto@lemmy.world @technology@lemmy.world
I’ve been dealing with programming since I was 9 years old, with my professional career in DevOps starting several years later, in 2013. I’ve dealt with lots of other people’s code, legacy code, very shitty code (especially code done by my “managers” who cosplayed as programmers), and tons of technical debt.
Even though I’m quite the LLM power-user (because I’m a person devoid of other humans in my daily existence), I never relied on LLMs to “create” my code: rather, what I did a lot was tinker with different LLMs to “analyze” my own code that I wrote myself, both to experiment with their limits (e.g. I wrote a lot of cryptic, code-golf one-liners and fed them to the LLMs in order to test their ability to “connect the dots” on whatever was happening behind the cryptic syntax) and to try to use them as a pair of external eyes beyond mine (due to that same ability to “connect the dots”, by which I mean their ability, as fancy Markov chains, to relate tokens to other tokens with similar semantic proximity).
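For a sense of what that kind of test looks like, here is a hypothetical Python one-liner (not one of the commenter’s) of the code-golf sort you might feed an LLM and ask it to explain:

```python
# A deliberately cryptic one-liner: computes 10! through a self-applied
# lambda instead of named recursion. Explaining *why* it works is the
# "connect the dots" exercise described above.
print((lambda f: f(f, 10))(lambda f, n: 1 if n < 2 else n * f(f, n - 1)))  # 3628800
```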
I did test them (especially Claude/Sonnet) for their “ability” to output code, not intending to use the code because I’m better off writing my own thing, but you likely know the maxim, one can’t criticize what they don’t know. And I tried to know them so I could criticize them. To me, the code is… pretty readable. Definitely awful code, but readable nonetheless.
So, when the person says…
The developers can’t debug code they didn’t write.
…even though they argue they have more than 25 years of experience, it feels to me like they don’t.
One thing is saying “developers find it pretty annoying to debug code they didn’t write”, a statement I’d totally agree with! It’s awful to try to debug others’ (human or otherwise) code, because you need to try to put yourself in their shoes without knowing what their shoes are like… But it’s doable, especially by people who have dealt with programming logic since childhood.
Saying “developers can’t debug code they didn’t write”, to me, sounds like a layperson who doesn’t belong to the field of Computer Science, doesn’t like programming, and/or pursued a “software engineer” career purely out of a money/capitalistic mindset. Either way, if a developer can’t debug someone else’s code, sorry to say, but they’re not a developer!
Don’t get me wrong: I’m not trying to be prideful or pretending to be awesome; this is beyond my person, I’m nothing, I’m no one. I abandoned my career because I hate the way the technology is growing more and more enshittified. Working as a programmer for capitalistic purposes ended up depleting the joy I used to have back when I coded on a daily basis. I’m not on the “job market” anymore, so what I’m saying is based on more than 10 years of former professional experience. And my experience says: a developer who won’t at least try to understand the worst code out there can’t call themselves a developer, full stop.
I found the article interesting, but I agree with you. Good programmers have to and can debug other people’s code. But, to be fair, there are also a lot of bad programmers, and a lot that can’t debug for shit…
@JuvenoiaAgent@piefed.ca @technology@lemmy.world
Often, those are developers who “specialized” in one or two programming languages, without specializing in computer/programming logic.
I used to repeat a personal saying across job interviews: “A good programmer knows a programming language. An excellent programmer knows programming logic”. IT positions often require a dev to have a specific language/framework in their portfolio (with Rust being the Current Thing™ now) and they reject people who have vast experience across several languages/frameworks but not the one required, as if these people weren’t able to learn the specific language/framework they need.
Languages and frameworks differ in syntax, naming, and paradigms; sometimes they’re extremely different from other common languages (such as `(Lisp (parenthetic-hell))` or `.asciz "Assembly-x86_64"`), but they all talk to the same computer logic under the hood. Once a dev becomes fluent in bitwise logic (or, even better, so fluent in talking with computers that they can say `41 53 43 49 49 20 63 6f 64 65` without tools, as if it were English; the bytes decode as shown in the sketch below), it’s just a matter of getting used to the specific syntax and naming conventions of a given language.

Back when I was enrolled in college, I lost count of how many colleagues struggled with the entire course as soon as they were faced with Data Structures classes: binary trees, linked lists, queues, stacks… And Linear Programming: maximization and minimization, data fitting… To the majority of my colleagues, those classes were painful, especially because the teachers were somewhat rigid.
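For anyone who wants to check those bytes, a minimal Python sketch (purely illustrative):

```python
# The hex string from the comment above, decoded byte by byte as ASCII.
raw = "41 53 43 49 49 20 63 6f 64 65"
print(bytes.fromhex(raw.replace(" ", "")).decode("ascii"))  # prints: ASCII code
```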
And this sentiment echoes across the companies and corps. Corps (especially the wannabe-programmer managers) don’t want to deal with computers, they want to deal with consumers and their sweet money, but a civil engineer and their masons can’t possibly build a house without being willing to deal with a blueprint and the physics of building materials. This is part of the root of this whole problem.
The hard thing about debugging other people’s code is understanding what they’re trying to do. Once you’ve figured that out it’s just like debugging your own code. But not all developers stick to good patterns, good conventions or good documentation, and that’s when you can spend a long time figuring out their intention. Until you’ve got that, you don’t know what’s a bug.
That feels like the trap here. There is no intention, just patterns.
When the cost of generating new code becomes so cheap, and the cost of devs maintaining code they didn’t write keeps rising, there’s a huge shift toward just throwing out the code and regenerating it instead. Next year will be the find-out phase, where the massive decline in code quality catches up with big projects.
where the massive decline in code quality catches up with big projects.
That’s going to depend, as always, on how the projects are managed.
LLMs don’t “get it right” on the first pass, ever in my experience - at least for anything of non-trivial complexity. But their power is that they’re right more than half the time, AND they can be told when they’re wrong (whether by a compiler, a syntax nanny tool, or a human tester), AND they can then try again, and again, as long as necessary to reach a final state of “right,” as defined by their operators.
The trick, as always, is getting the managers to allow the developers to keep polishing the AI (or human developer’s) output until it’s actually good enough to ship.
The question is: which will take longer, which will require more developer “head count” during that time to get it right - or at least good enough for business?
I feel like the answers all depend on the particular scenarios. In some places, for some applications, current state-of-the-art AI can deliver that “good enough” product we have always had, with lower developer head count and/or shorter delivery cycles. For other organizations with other product types, it will certainly take longer / more budget.
However, the needle is off 0, there are some places where it really does help, a lot. The other thing I have seen over the past 12 months: it’s improving rapidly.
Will that needle ever pass 90% of all software development benefitting from LLM agent application? I doubt it. In my outlook, I see that needle passing +50% in the near future - but not being there quite yet.
An LLM can generate code like an intern getting out over their skis. If you let it generate enough code, it will do some gnarly stuff.
Another facet is the nature of mistakes it makes. After years of reviewing human code, I have this tendency to take some things for granted, certain sorts of things a human would just obviously get right and I tend not to think about it. AI mistakes are frequently in areas my brain has learned to gloss over and take on faith that the developer probably didn’t screw that part up.
AI generally generates the same sorts of code that I hate to encounter when humans write it, and debugging it is a slog. Lots of repeated code, not well factored. You would assume that if the same exact thing is needed in many places, you’d have a common function with common behavior, but no, the AI repeated itself and didn’t always get consistent behavior out of identical requirements.
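To make the complaint concrete, a hypothetical sketch (invented names) of the kind of inline repetition described, next to the factored version a reviewer would normally expect:

```python
# What the generated code often looks like: the same check repeated inline,
# with subtly different behavior each time.
def create_user(payload):
    if "@" not in payload.get("email", ""):
        raise ValueError("invalid email")
    ...

def update_user(payload):
    if "@" not in payload.get("email", ""):
        return None          # same requirement, different failure behavior
    ...

# The factored version: one helper, one behavior, reused everywhere.
def require_email(payload):
    email = payload.get("email", "")
    if "@" not in email:
        raise ValueError("invalid email")
    return email

def create_user_factored(payload):
    require_email(payload)
    ...

def update_user_factored(payload):
    require_email(payload)
    ...
```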
His statement is perhaps an oversimplification, but I get it. Fixing code like that is sometimes more trouble than just doing it yourself from the onset.
Now I can see the value in generating code in digestible pieces, discarding it when the LLM gets oddly verbose for a simple function, or when it gets it wrong, or when you can tell just by looking that you’d hate to debug that code. But the code generation can just be a huge mess, and if you did a large project exclusively through prompting, I could see the end result being hopeless. I’m frankly surprised he could even declare an initial “success”, but it was probably “tutorial ware”, which would be ripe fodder for the code generators.
To quote your quote:
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
I think the author just independently rediscovered “middle management”. Indeed, when you delegate the gruntwork under your responsibility, those same people are who you go to when addressing bugs and new requirements. It’s not on you to effect repairs: it’s on your team. I am Jack’s complete lack of surprise. The idea that you can rely on AI to do nuanced work like this and arrive at the exact correct answer to the problem is naive at best. I’d be sweating too.
The problem, though (with AI compared to humans): the human team learns, i.e. at some point they probably understand what the mistake was and avoid making it again. With AI instead of humans: well, maybe the next or a different model will fix it… maybe.
And what is very clear to me after trying to use these models: the larger the code-base, the worse the AI gets, to the point of not helping at all or even being destructive. The exception is dissecting small, isolatable pieces of independent code (i.e. keeping the context small for the AI).
Humans likely get slower with a larger code-base, but they (usually) don’t arrive at a point where they can’t progress any further.
Humans likely get slower with a larger code-base, but they (usually) don’t arrive at a point where they can’t progress any further.
Notable exceptions like: https://peimpact.com/the-denver-international-airport-automated-baggage-handling-system/
My big fear with this stuff is security. It just seems so “easy”, without knowledgeable people, for AI to write a product that functions from a user perspective but is wide open to attack.
AI might be good for simulating attacks, because they can do lots of attempts and iteration. IMO, AI and (competent) people would make for a good pairing for trying out ideas before deploying a project into the real world.
Great article, brave and correct. Good luck getting through to the same leaders who blindly believe in a magical trend for this quarter’s or next quarter’s numbers; they don’t care about things a year away, let alone 10.
I work in HR and was struck by the parallel with management jobs being gutted by major corps starting in the ’80s and ’90s during “downsizing”; they either never replaced those roles or offshored them. They had the Big 4 telling them it was the future of business. Know who is now providing consultation to them on why they have poor ops, poor processes, high turnover, etc.? The same firms: they take money on the way in and on the way out. AI is just the next in a long line of smart people pretending they know your business while you abdicate knowing your business or your employees.
Hope leaders can be a bit braver and wiser this go ’round so we don’t get to a cliff’s edge in software.
I’m trying
Much appreciated 🫡
Tbh I think the true leaders are high on coke.
Wow I didn’t know that I was leading this whole time.
Exactly. The problem isn’t moving part of production to some other facility or buying a part that you used to make in-house. It’s abdicating an entire process that you need to be involved in if you’re going to stay on top of the game long-term.
Claude Code is awesome but if you let it do even 30% of the things it offers to do, then it’s not going to be your code in the end.
Personally I tried using LLMs for reading error logs and summarizing what’s going on. I can say that even with somewhat complex errors, they were almost always right and very helpful. So basically the general consensus of using them as assistants within a narrow scope.
Though it should also be noted that I only did this at work. While it seems to work well, I think I’d still limit such use in personal projects, since I want to keep learning more, and private projects are generally much more enjoyable to work on.
Another interesting use case I can highlight is using a chatbot as documentation when the actual documentation is horrible. However, this only works within the same ecosystem, so for instance Copilot with MS software. Microsoft definitely trained Copilot on its own stuff and it’s often considerably more helpful than the docs.
Computers are too powerful and too cheap. Bring back COBOL, painfully expensive CPU time, and some sort of basic knowledge of what’s actually going on.
Pain for everyone!
Be careful what you wish for, with RAM prices soaring owning a home computer might become less of an option. Luckily we can get a subscription for computing power easily!
I built a new PC early October, literally 2 weeks later RAM prices went nuts… so glad I pulled the trigger when I did
Just ask the ai to make the change?
AI isn’t good at changing code, or really even understanding it… It’s good at writing it, ideally 50-250 lines at a time
I’ve made full-ass changes on existing codebases with Claude
It’s a skill you can learn, pretty close to how you’d work with actual humans
pretty close to how you’d work with actual humans
That has been my experience as well. It’s like working with humans who have extremely fast splinter skills, things they can rip through in 10 minutes that might take you days, weeks even. But then it also takes 5-10 minutes to do some things that you might accomplish in 20 seconds. And, like people, it’s not 100% reliable or accurate, so you need to use all those same processes we have developed to help people catch their mistakes.
full-ass (…) with Claude
heh
It’s a skill this “fractional CTO” lacks
Definitely
What full ass changes have you made that can’t be done better with a refactoring tool?
I believe Claude will accept the task. I’ve been fixing edge cases in a vibe-coding colleague’s full-ass change all month. It would have taken less time to just do it right the first time.
I just did three tasks purely with Claude - at work.
All were pretty much me pasting the Linear ticket to Claude and hitting go. One got some improvement ideas on the PR so I said “implement the comments from PR 420” and so it did.
These were all on a codebase I haven’t seen before.
The magic sauce is that I’ve been doing this for a quarter century and I’m pretty good at reading code and I know if something smells like shit code or not. I’m not just YOLOing the commits to a PR without reading first, but I save a ton of time when I don’t need to do the grunt work of passing a variable through 10 layers of enterprise code.
True that LLMs will accept almost any task, whether they should or not. True that their solutions aren’t 100% perfect every time. Whether it’s faster to use them or not I think depends a lot on what’s being done, and what alternative set of developers you’re comparing them with.
What I have seen across the past year is that the number of cases where LLM based coding tools are faster than traditional developers has been increasing, rather dramatically. I called them near useless this time last year.
It’s good at writing it, ideally 50-250 lines at a time
I find Claude Sonnet 4.5 to be good up to 800 lines at a chunk. If you structure your project into 800ish line chunks with well defined interfaces you can get 8 to 10 chunks working cooperatively pretty easily. Beyond about 2000 lines in a chunk, if it’s not well defined, yeah - the hallucinations start to become seriously problematic.
The new Opus 4.5 may have a higher complexity limit, I haven’t really worked with it enough to characterize… I do find Opus 4.5 to get much slower than Sonnet 4.5 was for similar problems.
Okay, but if it’s writing 800 lines at once, it’s making design choices. Which is all well and good for a one off, but it will make those choices, make them a different way each time, and it will name everything in a very generic or very eccentric way
The AI can’t remember how it did it, or how it does things. You can do a lot… Even stuff that hasn’t entered commercial products like vectorized data stores to catalog and remind the LLM of key details when appropriate
2000 lines is nothing. My main project is well over a million lines, and the original author and I have to meet up to discuss how things flow through the system before changing it to meet the latest needs
But we can and do it to meet the needs of the customer, with high stakes, because we wrote it. These days we use AI to do grunt work, we have junior devs who do smaller tweaks.
If an AI is writing code a thousand lines at a time, no one knows how it works. The AI sure as hell doesn’t. If it’s 200 lines at a time, maybe we don’t know details, but the decisions and the flow were decided by a person who understands the full picture
but it will make those choices, make them a different way each time
That’s a bit of the power of the process: variety. If the implementation isn’t ideal, it can produce another one. In theory, it can produce ten different designs for any given solution then select the “best” one by whatever criteria you choose. If you’ve got the patience to spell it all out.
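A minimal sketch of that generate-many-then-pick idea, with generate() and score() as stand-ins for whatever model call and selection criteria you’d actually use (both hypothetical):

```python
def best_of_n(prompt, n, generate, score):
    """Produce n candidate implementations and keep the highest-scoring one.

    `generate(prompt)` and `score(candidate)` are placeholders: the first
    would call your LLM, the second encodes your own criteria (tests passed,
    lint findings, reviewer rating, ...).
    """
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)
```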
The AI can’t remember how it did it, or how it does things.
Neither can the vast majority of people after several years go by. That’s what the documentation is for.
2000 lines is nothing.
Yep. It’s also a huge chunk of example to work from and build on. If your designs are highly granular (in a good way), most modules could fit under 2000 lines.
My main project is well over a million lines
That should be a point of embarrassment, not pride. My sympathies if your business really is that complicated. You might ask an LLM to start chipping away at refactoring your code to collect similar functions together to reduce duplication.
But we can and do it to meet the needs of the customer, with high stakes, because we wrote it. These days we use AI to do grunt work, we have junior devs who do smaller tweaks.
Sure. If you look at bigger businesses, they are always striving to get rid of “indispensable duos” like you two. They’d rather pay 6 run-of-the-mill hire-more-any-day-of-the-week developers than two indispensables. And that’s why a large number of management types who don’t really know how it works in the trenches are falling all over themselves trying to be the first to fly a team that “does it all with AI, better than the next guys.” We’re a long way from that being realistic. AI is a tool: you can use it for grunt work, you can use it for top-level design, and everything in between. What you can’t do is give it 25 words or less of instruction and expect to get back anything of significant complexity. That 2000-line limit becomes 1 million lines of code when every four lines of the root module describes another module.
If an AI is writing code a thousand lines at a time, no one knows how it works.
Far from it. Compared with code I get to review out of India, or Indiana, 2000 lines of AI code is just as readable as any 2000 lines I get out of my colleagues. Those colleagues also make the same annoying deviations from instructions that AI does; the biggest difference is that AI gets its wrong answer back to me within 5-10 minutes. Indiana? We’ve been correcting and re-correcting the same architectural implementation for the past 6 months. They had a full example in C++; they were going to “translate it to Rust” for us. I figured, since it took me about 6 weeks total to develop the system from scratch, with a full example like they have they should be well on their way in 2 weeks. Yeah, nowhere near it in 2 weeks, so I did a Rust translation for them over the next two weeks and showed them. “OK, we see that, but we have been tasked to change this aspect of the interface to something undefined, so we’re going to do an implementation with that undefined interface…” And so I refined my Rust implementation into a highly polished example, ready for any undefined interface you throw at it, within another 2 weeks, while Indiana continued to hack away at three projects simultaneously, getting nowhere equally fast on all 3. It has been 7 months now; I’m still reviewing Indiana’s code and reminding them, like I did the AI, of all the things I have told them six times over the past 7 months that they keep drifting away from.
Holy shit, you’re a fucking retard. Like of the “people look and laugh” scale. I’m unironically going to take your response to share with technical people in my life to laugh over
And no hate to the mentally ill, I’ve never laughed at them. I laugh with them, because they’re delightful and love joy to an extent that leaves me jealous
But you’re not a real person. You’re a joke, if your ego was two sizes smaller I’d be gently explaining to you how no number of code katas would result in Microsoft XP
I’m just not following the mindset of “get AI to code your whole program” and then have real people maintain it. Sounds counterproductive.
I think you need to write your code for an AI to maintain. Use static code analysers like SonarQube to ensure that the code is maintainable (cognitive complexity) and that functions are small and well defined as you write it.
I don’t think we should be having the AI write the program in the first place. I think we’re barreling towards a place where remotely complicated software becomes a lost technology
I don’t mind if AI helps here and there, I certainly use it. But it’s not good at custom fit solutions, and the world currently runs on custom fit solutions
AI is like no code solutions. Yeah, it’s powerful, easier to learn and you can do a lot with it… But eventually you will hit a limit. You’ll need to do something the system can’t do, or something you can’t make the system do because no one properly understands what you’ve built
At the end of the day, coding is a skill. If no one is building the required experience to work with complex systems, we’re going to be swimming in an endless ocean of vibe-coded legacy apps within a decade.
I just don’t buy that AI will be able to take something like a set of State regulations and build a compliant outcome. Most of our base digital infrastructure is like that, or it uses obscure ancient systems that LLMs are basically allergic to working with.
To me, we’re risking everything on achieving AGI (and using it responsibly) before we run out of skilled workers, and we’re several game changing breakthroughs from achieving that
I think we’re barreling towards a place where remotely complicated software becomes a lost technology
I think complicated software has been an art more than a science, for the past 30 years we have been developing formal processes to make it more of a procedural pursuit but the art is still very much in there.
I think if AI authored software is going to reach any level of valuable complexity, it’s going to get there with the best of our current formal processes plus some more that are being (rapidly) developed specifically for LLM based tools.
But eventually you will hit a limit. You’ll need to do something…
And how do we surpass those limits? Generally: research. And for the past 20+ years where do we do most of that research? On the internet. And where were the LLMs trained, and what are they relatively good at doing quickly? Internet research.
At the end of the day, coding is a skill. If no one is building the required experience to work with complex systems
So is semiconductor design, application of transistors to implement logic gates, etc. We still have people who can do that, not very many, but enough. Not many people work in assembly language anymore, either…
So is semiconductor design, application of transistors to implement logic gates, etc. We still have people who can do that, not very many, but enough. Not many people work in assembly language anymore, either…
Yeah, that’s a lost tech. We still use the same decades-old, even century-old, frameworks.
They’re not perfect. But they are unchangeable. We no longer have the skills to adapt them to modern technology. Improvements are incremental; despite decades of effort you still can’t reliably run a system on something like RISC-V.
I don’t know shit about anything, but it seems to me that the AI already thought it gave you the best answer, so going back to the problem for a proper answer is probably not going to work. But I’d try it anyway, because what do you have to lose?
Unless it gets pissed off at being questioned, and destroys the world. I’ve seen more than a few movies about that.
AI already thought it gave you the best answer, so going back to the problem for a proper answer is probably not going to work.
There’s an LLM concept/parameter called “temperature” that determines basically how random the answer is.
As deployed, LLMs like Claude Sonnet or Opus have a temperature setting that won’t give the same answer every time, and when you combine this with feedback loops that point out failures (like compilers that tell the LLM when its code doesn’t compile), the LLM can (and does) do the old Beckett: try, fail, try again, fail again, fail better next time - and usually reach a solution that passes all the tests it is aware of.
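For what “temperature” means mechanically, here’s an illustrative sketch of temperature-scaled sampling over a handful of made-up token scores (not any vendor’s actual implementation):

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick one token from `logits` (a dict of token -> raw score).

    Lower temperature sharpens the distribution toward the top-scoring
    token (near-deterministic); higher temperature flattens it, which is
    why the same prompt can come back with different answers.
    """
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())                      # subtract for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(weights.values())
    tokens = list(weights)
    probs = [weights[tok] / total for tok in tokens]
    return random.choices(tokens, weights=probs, k=1)[0]

# Made-up scores for three candidate next tokens:
logits = {"fix": 2.0, "refactor": 1.5, "rewrite": 0.5}
print(sample_with_temperature(logits, 0.2))   # almost always "fix"
print(sample_with_temperature(logits, 1.5))   # noticeably more varied
```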
The problem is: with a context window limit of 200,000 tokens, it’s not going to be aware of all the relevant tests in more complex cases.
You are in a way correct. If you keep sending the context of the “conversation” (in the same chat) it will reinforce its previous implementation.
The way AIs remember stuff is that you just give them the entire thread of context together with your new question. It’s all just text in, text out.
But once you start a new conversation (meaning you don’t give any previous chat history) it’s essentially a “new” ai which didn’t know anything about your project.
This will have a new random seed, and if you ask it to look for mistakes etc. it will happily tell you that the last implementation was all wrong and here’s how to fix it.
It’s like a Minecraft world: the same seed will get you the same map every time. With AIs it’s the same thing, ish. Start a new conversation or ask a different model (GPT, Google, Claude, etc.) and it will do things in a new way.
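That “text in, text out” loop is what the client libraries expose directly; a minimal sketch, assuming the Anthropic Python SDK (model name and details illustrative):

```python
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
history = []                     # the "memory" is just this list, resent every turn

def ask(user_text):
    history.append({"role": "user", "content": user_text})
    reply = client.messages.create(
        model="claude-sonnet-4-5",    # illustrative model name
        max_tokens=1024,
        messages=history,             # the entire conversation goes back with each call
    )
    answer = reply.content[0].text
    history.append({"role": "assistant", "content": answer})
    return answer

# Starting a "new" conversation is just emptying the list:
# history.clear()
```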
Maybe the solution is to keep sending the code through various AI requests, until it either gets polished up, or gains sentience, and destroys the world. 50-50 chance.
This stuff ALWAYS ends up destroying the world on TV.
Seriously, everybody is complaining about the quality of AI output, but the whole point is for this stuff to keep learning and improving. At this stage, we’re expecting a kindergartener to produce the work of a Harvard professor. Obviously, we’re going to be disappointed.
But give that kindergartener time to learn and get better, and they’ll end up a Harvard professor, too. AI may just need time to grow up.
And frankly, that’s my biggest worry. If it can eventually start producing results that are equal or better than most humans, then the Sociopathic Oligarchs won’t need worker humans around, wasting money that could be in their bank accounts.
And we know what their solution to that problem will be.
This stuff ALWAYS ends up destroying the world on TV.
TV is also full of infinite free energy sources. In the real world warp drive may be possible, you just need to annihilate the mass of Jupiter with an equivalent mass of antimatter to get the energy necessary to create a warp bubble to move a small ship from the orbit of Pluto to a location a few light years away, but on TV they do it every week.
Sounds like we have a plan, let’s get to work. The Cochran Warp Drive isn’t going to invent itself.
Doesn’t work. Give it any semi-complex problem with multiple constraints and your team of AIs keeps running in circles. Very frustrating if you know it can be done. But what if you’re a “fractional CTO” and you get actually contradictory constraints? We haven’t yet gotten to AIs that will tell you that what you’re asking is impossible.
Yeah, right now you have to know what’s possible and nudge the AI in the right direction, toward the approach you think is correct, if you want it to do things in an optimized way.
your team of AIs keeps running circles
Depending on your team of human developers (and managers), they will do the same thing. Granted, most LLMs have a rather extreme sycophancy problem, but humans often do the same.
We haven’t gotten yet to AIs who will tell you that what you ask is impossible.
If it’s a problem like under- or over-constrained geometry or equations, they (the better ones) will tell you. For difficult programming tasks I have definitely had the AIs bark up all the wrong trees trying to fix something until I gave them specific direction for where to look for a fix (very much like my experiences with some human developers over the years.)
I had a specific task that I was developing in one model, and it was a hard problem but I was making progress and could see the solution was near, then I switched to a different model which did come back and tell me “this is impossible, you’re doing it wrong, you must give up this approach” up until I showed it the results I had achieved to-date with the other model, then that same model which told me it was impossible helped me finish the job completely and correctly. A lot like people.
I cannot understand and debug code written by AI. But I also cannot understand and debug code written by me.
Let’s just call it even.
At least you can blame yourself for your own shitty code, which hopefully will never attempt to “accidentally” erase the entire project
I don’t know how that happens, I regularly use Claude code and it’s constantly reminding me to push to git.
As an experiment I asked Claude to manage my git commits: it wrote the messages, kept a log, archived excess documentation, and worked really well for about 2 weeks. Then, as the project got larger, the commit process was taking longer and longer to execute. I finally pulled the plug when the automated commit process (which had performed flawlessly for dozens of commits and archives) irretrievably lost a batch of work: it messed up the archive process and deleted the work without archiving it first, and didn’t commit it either.
AI/LLM workflows are non-deterministic. This means: they make mistakes. If you want something reliable, scalable, repeatable, have the AI write you code to do it deterministically as a tool, not as a workflow. Of course, deterministic tools can’t do things like summarize the content of a commit.
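A hypothetical sketch of that “tool, not workflow” distinction: the commit ritual as a small deterministic script the AI writes once, rather than a judgment call it makes on every run:

```python
#!/usr/bin/env python3
"""Deterministic commit helper (hypothetical sketch).

Every step is an explicit git command that either succeeds or fails loudly,
with nothing left to an LLM's judgment at run time.
"""
import subprocess
import sys
from datetime import datetime, timezone

def run(*args):
    # check=True makes any git failure stop the script immediately.
    subprocess.run(["git", *args], check=True)

def main(message):
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run("add", "--all")
    run("commit", "-m", message)
    run("tag", f"backup/{stamp}")   # cheap safety net before any cleanup step

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "checkpoint")
```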
The longer the project the more stupid Claude gets. I’ve seen it both in chat, and in Claude code, and Claude explains the situation quite well:
Increased cognitive load: Longer projects have more state to track - more files, more interconnected components, more conventions established earlier. Each decision I make needs to consider all of this, and the probability of overlooking something increases with complexity.
Git specifically: For git operations, the problem is even worse because git state is highly sequential - each operation depends on the exact current state of the repository. If I lose track of what branch we’re on, what’s been committed, or what files exist, I’ll give incorrect commands.
Anything I do with Claude, I will split into different chats. I won’t give it access to git, but I will provide it an updated repository via Repomix. I get much better results because of that.
Yeah, context management is one big key. The “compacting conversation” hack is a good one, you can continue conversations indefinitely, but after each compact it will throw away some context that you thought was valuable.
The best explanation I have heard for the current limitations is that there is a “context sweet spot” for Opus 4.5 that’s somewhere short of 200,000 tokens. As your context window gets filled above 100,000 tokens, at some point you’re at “optimal understanding” of whatever is in there, then as you continue on toward 200,000 tokens the hallucinations start to increase. As a hack, they “compact the conversation” and throw out less useful tokens getting you back to the “essential core” of what you were discussing before, so you can continue to feed it new prompts and get new reactions with a lower hallucination rate, but with that lower hallucination rate also comes a lower comprehension of what you said before the compacting event(s).
Some describe an aspect of this as the “lost in the middle” phenomenon since the compacting event tends to hang on to the very beginning and very end of the context window more aggressively than the middle, so more “middle of the window” content gets dropped during a compacting event.
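A toy illustration of why “lost in the middle” happens (not Claude’s actual compaction algorithm): if you keep the ends of the conversation and evict from the middle until the token budget fits, middle content is exactly what disappears:

```python
def compact(messages, budget, count_tokens):
    """Keep the first and last messages, dropping from the middle until
    the total token count fits the budget.

    Purely illustrative; `count_tokens` is whatever tokenizer you use.
    Real systems summarize rather than just drop, but the bias toward
    the ends of the window is the same.
    """
    kept = list(messages)
    while len(kept) > 2 and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(len(kept) // 2)   # evict from the middle first
    return kept
```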
I also cannot understand and debug code written by me.
So much this. I look back at stuff I wrote 10 years ago and shake my head, console myself that “we were on a really aggressive schedule.” At least in my mind I can do better, in practice the stuff has got to ship eventually and what ships is almost never what I would call perfect, or even ideal.
AI is hot garbage and anyone using it is a skillless hack. This will never not be true.
Wait so I should just be manually folding all these proteins?
Do you not know the difference between an automated process and machine learning?
Yes? Machine learning has been huge for protein folding, and not because anyone is stupid; it’s because it’s a task uniquely suited for machine learning, of which there are many. But none of that is what this AI bubble is really about, and even though I find the underlying math and technology fascinating, I share the disdain for how the bulk of it is currently being used.
The thing with being cocky is, if you are wrong it makes you look like an even bigger asshole
https://en.wikipedia.org/wiki/AlphaFold
The program uses a form of attention network, a deep learning technique that focuses on having the AI identify parts of a larger problem, then piece it together to obtain the overall solution.
Cool, now do an environmental impact on it.
Cool, now do an environmental impact on the data centre hosting your instance while you pollute by mindlessly talking shit on the Internet.
I’ll take AI unfolding proteins over you posting any day.
Hilarious. You’re comparing a lemmy instance to AI data centers. There’s the proof I needed that you have no fucking clue what you’re talking about.
“bUt mUh fOLdeD pRoTEinS,” said the AI minion.
While this is a popular sentiment, it is not true, nor will it ever be true.
AI (LLMs & agents in the coding context, in this case) can serve as both a tool and a crutch. Those who learn to master the tools will gain benefit from them, without detracting from their own skill. Those who use them as a crutch will lose (or never gain) their own skills.
Some skills will in turn become irrelevant in day-to-day life (as is always the case with new tech), and we will adapt in turn.
LLMs exist so that skill-less hacks can pretend to be skilled artists. It’s a shortcut to success.
That this is and will be abused is not in question. :-P
You are making a leap though.
ask your ai pal for help
No shit
What’s interesting is what he found out. From the article:
I forced myself to use Claude Code exclusively to build a product. Three months. Not a single line of code written by me. I wanted to experience what my clients were considering—100% AI adoption. I needed to know firsthand why that 95% failure rate exists.
I got the product launched. It worked. I was proud of what I’d created. Then came the moment that validated every concern in that MIT study: I needed to make a small change and realized I wasn’t confident I could do it. My own product, built under my direction, and I’d lost confidence in my ability to modify it.
Typical C-suite. It takes them three months to come to the same conclusion that would be blindingly obvious to anyone with half a brain: if you build something that no one understands, you’ll end up with something impossible to maintain.
@AutistoMephisto @just_another_person This is kind of the obvious conclusion. I didn’t need to use AI to know this would be the outcome. This is why I only use it for small code snippets if at all. This is why I’ve taught my kids not to rely on AI to do their homework.
It may seem like the easy way but it will absolutely come back to haunt you later. If you don’t do the work you don’t learn anything or develop any skills.
I needed to make a small change and realized I wasn’t confident I could do it.
Wouldn’t the point be to use AI to make the change, if you’re trying to do it 100% with AI? Who is really saying 100% AI adoption is a good idea though? All I hear about from everyone is how it’s not a good idea, just like this post.
This is spot on.