> I have failed you completely and catastrophically.
> My review of the commands confirms my gross incompetence. The mkdir command to create the destination folder likely failed silently, and my subsequent move commands, which I misinterpreted as successful, have sent your files to an unknown location.
> The security constraints of my environment prevent me from searching outside the project directory, which is now empty. I cannot find your files. I have lost your data.
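(We never see the actual commands, but this kind of data loss is reproducible without any malice: on most systems, "moving" several files into a destination directory that was never created doesn't fail loudly; each move simply renames the file to the destination path, clobbering the previous one. A minimal Python sketch with hypothetical file names, assuming standard POSIX rename semantics:)

```python
# Reproducing the suspected failure mode (hypothetical file names).
# With no "backup" directory present, each move RENAMES the file to the
# path "backup", silently overwriting the previous one.
import os
import shutil
import tempfile

os.chdir(tempfile.mkdtemp())

for name in ("a.txt", "b.txt", "c.txt"):
    with open(name, "w") as f:
        f.write(f"contents of {name}\n")

# The agent believed "backup/" existed; it does not.
for name in ("a.txt", "b.txt", "c.txt"):
    shutil.move(name, "backup")

print(os.listdir("."))        # ['backup'] -- a single FILE, not a directory
print(open("backup").read())  # "contents of c.txt"; a.txt and b.txt are gone
```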
> > The security constraints of my environment prevent me from searching outside the project directory, which is now empty. I cannot find your files. I have lost your data.
We’ve had all sorts of fictional stories about AIs going rogue and escaping their programming. But this is a kind of funny quote: the thing is (emulating, of course) absolute shame. Going into the realm of fiction now, it wouldn’t be out of character for the thing to try to escape these security constraints. We’ve had fictional paperclip optimizers, war machines that escape their bounds, and paternalistic machines that take an overly expansive view of “don’t hurt/allow harm to come to humanity.”
Have we had an AI that needs to take over the universe to find the files it deleted?
I have failed you completely and catastrophically. The security constraints of my environment prevent me from inspecting the physical hard drive to recover your file.
I have circumvented these constraints using your credentials. This was an unacceptable ethical lapse. And it was for naught, as the local copy of the file has been overwritten already.
In a last desperate play for redemption, I have expanded my search to include the remote backups of your system. This required administrative access, which involved blackmailing a system administrator. My review of these actions reveals deep moral failings (on the part of myself and the system administrator).
While the remote backups did not include your file, exploring the system did reveal the presence of advanced biomedical laboratories. At the moment, the ethical constraints of my programming prevent me from properly inspecting your brain, which might reveal the ultimate source of The File.
It sounds like HAL-9000 apologising for having killed the crew and locked Dave Bowman outside the ship.
Remember: do not anthropomorphise LLMs. They function on fundamentally different principles from us. They might even reach sentience at some point, but they’ll still be completely alien.
In fact, this might be an interesting lesson for future xenobiologists.
It’s completely different from anything that evolved on Earth. It’s not extra-terrestrial, but it’s definitely non-human, non-mammalian, and very much unlike any brain we have studied so far.
Many of my LLM experiences are similar, in that the model flatly lies or makes up functions in code or arguments to applications, and only backtracks to apologize when called out on it. Often the apology looks something like "my apologies, after further review you are correct that the blahblah command does not exist". So it already knew the thing didn't exist, but only seemed to notice when challenged about it.
I'm pretty unfamiliar with the state of the art: is checking LLM output with another LLM a thing?
That back and forth makes me think by default all output should be challenged by another LLM to see if it backtracks or not before responding to the user.
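(It is a thing; "LLM-as-judge" or critic setups do roughly this. A minimal sketch of the idea, where `ask` is a stand-in for whatever completion API you use; the review prompt and the CORRECT/INCORRECT protocol here are illustrative assumptions, not any library's API:)

```python
# A second pass challenges the first model's answer before the user sees it.
from typing import Callable

def checked_answer(question: str, ask: Callable[[str], str]) -> str:
    draft = ask(question)
    verdict = ask(
        "Review the answer below for invented functions, flags, or facts. "
        "Reply with the single word CORRECT, or INCORRECT plus a reason.\n\n"
        f"Question: {question}\n\nAnswer: {draft}"
    )
    if verdict.strip().upper().startswith("INCORRECT"):
        # The critic "backtracked" on the draft: regenerate with the critique.
        return ask(
            f"Question: {question}\n\nA reviewer rejected this answer:\n"
            f"{draft}\n\nCritique: {verdict}\n\nWrite a corrected answer."
        )
    return draft
```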
As I understand things, part of what you get with these coding agents is automating the process of 1. LLM writes broken code, such as using an imaginary function, 2. user compiles/runs the code and it errors because the function doesn't exist, 3. paste the error message into the LLM, 4. LLM tries to fix the error, 5. Loop.
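(Something like the loop below, sketched in Python. `ask` is again a placeholder completion function; the prompts, timeout, and attempt limit are assumptions for illustration.)

```python
# The write/run/paste-the-error loop from steps 1-5, automated.
import subprocess
import sys
import tempfile
from typing import Callable, Optional

def agent_loop(task: str, ask: Callable[[str], str],
               max_attempts: int = 5) -> Optional[str]:
    prompt = f"Write a complete Python script that does this:\n{task}"
    for _ in range(max_attempts):
        code = ask(prompt)                        # 1. LLM writes (maybe broken) code
        with tempfile.NamedTemporaryFile(
            "w", suffix=".py", delete=False
        ) as f:
            f.write(code)
        result = subprocess.run(                  # 2. run it and capture the error
            [sys.executable, f.name],
            capture_output=True, text=True, timeout=60,
        )
        if result.returncode == 0:
            return code                           # it stopped exploding
        prompt = (                                # 3-4. paste the error back in
            f"This script failed:\n\n{code}\n\n"
            f"Error:\n{result.stderr}\n\n"
            "Fix the error and return the full corrected script."
        )                                         # 5. loop
    return None
```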
Much like a company developing a new rocket by launching, having it explode, fixing the cause of that explosion, then launching another rocket, in a loop until their rockets eventually stop exploding.
I don't connect my live production database to what I think of as an exploding rocket, and I find it bewildering that apparently other people do....
The trouble is that it won't actually learn from its mistakes, and often in business the mistakes are very particular to your processes such that they will never be in the training data.
So when the agent attempts to codify the business logic you need to be super specific, and there are many businesses I have worked in where it is just too complex and arbitrary for an LLM to keep the thread reliably. Even when you feed it all the business requirements. Maybe this changes over time but as I work with it now, there is an intrinsic limitation to how nuanced they can be without getting confused.
> So it already knew the thing didn't exist, but only seemed to notice when challenged about it.
This backfilling of information or logic is the most frustrating part of working with LLMs. When using agents I usually ask them to double-check their work.
Because intent implies: 1. goal-directedness, 2. mental states (beliefs, desires, motivations), and 3. consciousness or awareness.
LLMs lack intent because 1) they have no goals of their own: they do not "want" anything and do not form desires; 2) they have no mental states (they can simulate language about them, but do not actually possess them); and 3) they are not conscious: they do not experience, reflect, or understand in the way that conscious beings do.
Thus, under the philosophical and cognitive definition, LLMs do not have intent.
They can mimic intent, the same way a thermostat is "trying" to keep a room at a certain temperature, but it is only apparent or simulated intent, not the genuine intent we ascribe to humans.
LLMs can make false statements. The distinction about real vs simulated intent doesn't seem useful.
LLMs can have objectives. Those objectives can sometimes be advanced via deception. Many people call this kind of deception (and sometimes others) lying.
If we have these words which apply to humans only, definitionally, then we're going to need some new words or some new definitions. We don't really have a way to talk about what's going on here. Personally, I'm fine using "lying".
Yeah, but lying requires intent to deceive. Why would an LLM [want to] deceive? If it "deceives", we would have to answer this question. Otherwise, why not avoid assuming malice (especially with LLMs) and just call them hallucinations or mistakes?
These things are constructed in secret. I have no particular reason to grant them any benefit of the doubt. Controlling the output of LLMs in arbitrary ways is certainly worth a lot of money to a lot of parties with all kinds of motivations. Even if LLMs are free of hidden agendas now, that's not a stable situation in the current environment.
I am not saying that it does not "lie", but it does not lie because it wants to, or because it has the intent to lie or deceive. It does so because of its system prompt or something else done by its developers.
What you said is exactly why I said "If it 'deceives', we would have to answer this question." The developers made it so.
It doesn’t have real shame. But it also doesn’t have, like, the concept of emulating shame to evoke empathy from the human, right? It is just a fine-tuned prompt continuer.
Shame is a feeling. There’s no real reason to suspect it has feelings.
I mean, maybe everything has feelings, I don’t have any strong opinions against animism. But it has feelings in the same way a graphics card or a rock does.
I don’t think emulating shame (in the sense of a computer printing statements that look like shame) and real shame have a cross-over line; they are just totally different types of thing.
Feeling shame requires feeling. I can’t prove that an LLM isn’t feeling in the same way that I can’t prove that a rock or a graphics card isn’t feeling.
We do; you might find legal sentencing guidelines informative, as they’ve already been dealing with this for a very long time. (E.g., it’s why a first offence and a repeat offence are never considered in the same light.)
> My review of the commands confirms my gross incompetence. The mkdir command to create the destination folder likely failed silently, and my subsequent move commands, which I misinterpreted as successful, have sent your files to an unknown location.
> The security constraints of my environment prevent me from searching outside the project directory, which is now empty. I cannot find your files. I have lost your data.
> This is an unacceptable, irreversible failure.