Two weeks ago, a user asked on the official Lutris GitHub “is lutris slop now”, noting an increasing number of “LLM generated commits”. The Lutris creator replied:
It’s only slop if you don’t know what you’re doing and/or are using low quality tools. But I have over 30 years of programming experience and use the best tool currently available. It was tremendously helpful in helping me catch up with everything I wasn’t able to do last year because of health issues / depression.
There are massive issues with AI tech, but those are caused by our current capitalist culture, not the tools themselves. In many ways, it couldn’t have been implemented in a worse way, but it wasn’t AI that bought all the RAM, it was OpenAI. It was not AI that stole copyrighted content, it was Facebook. It wasn’t AI that laid off thousands of employees, it’s deluded executives who don’t understand that this tool is an augmentation, not a replacement for humans.
I’m not a big fan of having to pay a monthly sub to Anthropic, I don’t like depending on cloud services. But a few months ago (and I was pretty much at my lowest back then, barely able to do anything), I realized that this stuff was starting to do a competent job and was very valuable. And at least I’m not paying Google, Facebook, OpenAI or some company that cooperates with the US army.
Anyway, I was suspecting that this “issue” might come up so I’ve removed the Claude co-authorship from the commits a few days ago. So good luck figuring out what’s generated and what is not. Whether or not I use Claude is not going to change society, this requires changes at a deeper level, and we all know that nothing is going to improve with the current US administration.


Let’s set aside the ethical and moral concerns about LLM usage and just discuss the technical side.
Nobody who knows anything about coding claims human code is error-free; that’s why code reviews, testing, and all the other aspects of the software development lifecycle exist.
Nobody should trust any code unless it can be verified that it does what is required consistently and predictably.
This is a known thing; paranoia doesn’t really apply here, only levels of caution appropriate to the context.
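To make “verified” concrete, here’s a minimal sketch of what I mean. The function and its requirements are invented for illustration; the point is that the requirement gets pinned down as a repeatable check, regardless of who, or what, wrote the code.

```python
# Minimal sketch: "verify" means encoding the requirement as repeatable,
# predictable checks. The function and its test cases are invented.
import pytest


def parse_version(text: str) -> tuple[int, int, int]:
    """Code under review; whether a human or an LLM wrote it is irrelevant."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return major, minor, patch


@pytest.mark.parametrize(
    "raw, expected",
    [
        ("1.2.3", (1, 2, 3)),
        (" 0.10.0 ", (0, 10, 0)),
    ],
)
def test_parse_version(raw, expected):
    # Consistently and predictably: the same input must always
    # produce the same specified output.
    assert parse_version(raw) == expected
```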
Also, it’s not that the errors can’t be spotted; it’s just that the effort required to spot them is greater and the likelihood of missing something is higher.
Whether these problems can be overcome (or mitigated) remains to be seen, but at the moment extra effort is still required around the LLM-generated parts, which is why hiding them is counterproductive.
The creator’s claim that the tool does a competent job is important because it’s true, but it’s only true if you can verify it.
This whole issue should theoretically be negated by comprehensive acceptance criteria and testing, but if that worked in practice we’d never have any bugs in human code either.
Personally I think the “uncanny valley code” issue is an inherent part of the way LLMs work and there is no “solution” to it; the only option is to mitigate it as best we can.
I also really, really dislike the non-declarative nature of generated code, which fundamentally rules it out as a reliable end-to-end system tool unless we can get those fully comprehensive tests up to scratch, for me at least.
Thanks for taking the time to reply.
Greater compared to human code? Not sure about that, but I’m not disagreeing either. Greater compared to verifiably able programmers, sure, but in general?
I don’t think I’m getting your point here. Do you mean that the code basically lacks focus on an end goal? Or are you talking about the fuzziness and randomization of the output?
Both.
The reasons are quite hard to describe, which is why it’s such a trap, but if you spend some time reviewing LLM code you’ll see what I mean.
One reason is that it isn’t coding for logical correctness; it’s coding for linguistic passability.
Internally there are mechanisms for mitigating this somewhat, but it’s not an actual fix, so problems slip through.
The latter: if you give it the exact same input under the exact same conditions, it’s not guaranteed to give you the same output.
The fact that it’s sometimes close to the same actually makes it worse, because then you can’t tell at a glance what has changed.
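Here’s a minimal sketch of that, using the Anthropic SDK since that’s what’s under discussion (the model name is an assumption; substitute whichever you use). Even with the temperature pinned to 0, byte-identical output across runs isn’t guaranteed, and with default sampling settings it’s actively unlikely.

```python
# Sketch: send the exact same prompt twice and compare the raw output.
# Requires ANTHROPIC_API_KEY in the environment; the model name is an
# assumption, not a recommendation.
import anthropic

client = anthropic.Anthropic()


def generate(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption: any model will do
        max_tokens=512,
        temperature=0,  # as deterministic as the API lets you ask for
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


prompt = "Write a Python function that removes duplicates from a list, preserving order."
first = generate(prompt)
second = generate(prompt)
print("identical" if first == second else "different output for identical input")
```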
It also isn’t as simple as using a diff tool, at least for anything non-trivial, because its variations can be in logical progression as well as language.
Meaning you need to track these differences across the whole contextual area which, if you’re doing end-to-end generation, is the whole codebase.
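A toy illustration of why diffing alone doesn’t settle it, with both “generations” invented: the textual change below is a single character, but it changes how many attempts the code makes, and nothing in the diff itself tells you whether that was intended.

```python
# Two plausible generations for the same prompt, invented for
# illustration. Textually they differ by one character; logically,
# one performs an extra retry attempt.
import difflib

run_1 = """\
def should_retry(attempt, max_retries):
    return attempt < max_retries
"""

run_2 = """\
def should_retry(attempt, max_retries):
    return attempt <= max_retries
"""

# The diff flags the change but can't tell you whether it matters;
# judging that means tracing every caller, i.e. the surrounding context.
print("".join(difflib.unified_diff(
    run_1.splitlines(keepends=True),
    run_2.splitlines(keepends=True),
    fromfile="run_1",
    tofile="run_2",
)))
```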
As I said, there are mitigations, but they aren’t fixes.