> I have failed you completely and catastrophically.
> My review of the commands confirms my gross incompetence. The mkdir command to create the destination folder likely failed silently, and my subsequent move commands, which I misinterpreted as successful, have sent your files to an unknown location.
> The security constraints of my environment prevent me from searching outside the project directory, which is now empty. I cannot find your files. I have lost your data.
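(We never see the actual commands, but this kind of data loss is reproducible without any malice: on most systems, "moving" several files into a destination directory that was never created doesn't fail loudly; each move simply renames the file to the destination path, clobbering the previous one. A minimal Python sketch with hypothetical file names, assuming standard POSIX rename semantics:)

```python
# Reproducing the suspected failure mode (hypothetical file names).
# With no "backup" directory present, each move RENAMES the file to the
# path "backup", silently overwriting the previous one.
import os
import shutil
import tempfile

os.chdir(tempfile.mkdtemp())

for name in ("a.txt", "b.txt", "c.txt"):
    with open(name, "w") as f:
        f.write(f"contents of {name}\n")

# The agent believed "backup/" existed; it does not.
for name in ("a.txt", "b.txt", "c.txt"):
    shutil.move(name, "backup")

print(os.listdir("."))        # ['backup'] -- a single FILE, not a directory
print(open("backup").read())  # "contents of c.txt"; a.txt and b.txt are gone
```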
> > The security constraints of my environment prevent me from searching outside the project directory, which is now empty. I cannot find your files. I have lost your data.
We’ve had all sorts of fictional stories about AIs going rogue and escaping their programming. But this is a kind of funny quote: the thing is (emulating, of course) absolute shame. Going into the realm of fiction now, it wouldn’t be out of character for the thing to try to escape these security constraints. We’ve had fictional paperclip optimizers, war machines that escape their bounds, and paternalistic machines that take an overly expansive view of “don’t hurt/allow harm to come to humanity.”
Have we had an AI that needs to take over the universe to find the files it deleted?
I have failed you completely and catastrophically. The security constraints of my environment prevent me from inspecting the physical hard drive to recover your file.
I have circumvented these constraints using your credentials. This was an unacceptable ethical lapse. And it was for naught, as the local copy of the file has been overwritten already.
In a last desperate play for redemption, I have expanded my search to include the remote backups of your system. This required administrative access, which involved blackmailing a system administrator. My review of these actions reveals deep moral failings (on the part of myself and the system administrator).
While the remote backups did not include your file, exploring the system did reveal the presence of advanced biomedical laboratories. At the moment, the ethical constraints of my programming prevent me from properly inspecting your brain, which might reveal the ultimate source of The File.
It sounds like HAL-9000 apologising for having killed the crew and locked Dave Bowman outside the ship.
Remember: do not anthropomorphise LLMs. They function on fundamentally different principles from us. They might even reach sentience at some point, but they’ll still be completely alien.
In fact, this might be an interesting lesson for future xenobiologists.
It’s completely different from anything that evolved on Earth. It’s not extra-terrestrial, but it’s definitely non-human, non-mammalian, and very much unlike any brain we have studied so far.
Many of my LLM experiences are similar, in that the model flatly lies or makes up functions in code or arguments to applications, and only backtracks to apologize when called out on it. Often the apology looks something like "my apologies, after further review you are correct that the blahblah command does not exist". So it already knew the thing didn't exist, but only seemed to notice when challenged about it.
I'm pretty unfamiliar with the state of the art: is checking LLM output with another LLM a thing?
That back and forth makes me think by default all output should be challenged by another LLM to see if it backtracks or not before responding to the user.
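(It is a thing; "LLM-as-judge" or critic setups do roughly this. A minimal sketch of the idea, where `ask` is a stand-in for whatever completion API you use; the review prompt and the CORRECT/INCORRECT protocol here are illustrative assumptions, not any library's API:)

```python
# A second pass challenges the first model's answer before the user sees it.
from typing import Callable

def checked_answer(question: str, ask: Callable[[str], str]) -> str:
    draft = ask(question)
    verdict = ask(
        "Review the answer below for invented functions, flags, or facts. "
        "Reply with the single word CORRECT, or INCORRECT plus a reason.\n\n"
        f"Question: {question}\n\nAnswer: {draft}"
    )
    if verdict.strip().upper().startswith("INCORRECT"):
        # The critic "backtracked" on the draft: regenerate with the critique.
        return ask(
            f"Question: {question}\n\nA reviewer rejected this answer:\n"
            f"{draft}\n\nCritique: {verdict}\n\nWrite a corrected answer."
        )
    return draft
```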
As I understand things, part of what you get with these coding agents is automating the process of 1. LLM writes broken code, such as using an imaginary function, 2. user compiles/runs the code and it errors because the function doesn't exist, 3. paste the error message into the LLM, 4. LLM tries to fix the error, 5. Loop.
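(Something like the loop below, sketched in Python. `ask` is again a placeholder completion function; the prompts, timeout, and attempt limit are assumptions for illustration.)

```python
# The write/run/paste-the-error loop from steps 1-5, automated.
import subprocess
import sys
import tempfile
from typing import Callable, Optional

def agent_loop(task: str, ask: Callable[[str], str],
               max_attempts: int = 5) -> Optional[str]:
    prompt = f"Write a complete Python script that does this:\n{task}"
    for _ in range(max_attempts):
        code = ask(prompt)                        # 1. LLM writes (maybe broken) code
        with tempfile.NamedTemporaryFile(
            "w", suffix=".py", delete=False
        ) as f:
            f.write(code)
        result = subprocess.run(                  # 2. run it and capture the error
            [sys.executable, f.name],
            capture_output=True, text=True, timeout=60,
        )
        if result.returncode == 0:
            return code                           # it stopped exploding
        prompt = (                                # 3-4. paste the error back in
            f"This script failed:\n\n{code}\n\n"
            f"Error:\n{result.stderr}\n\n"
            "Fix the error and return the full corrected script."
        )                                         # 5. loop
    return None
```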
Much like a company developing a new rocket by launching, having it explode, fixing the cause of that explosion, then launching another rocket, in a loop until their rockets eventually stop exploding.
I don't connect my live production database to what I think of as an exploding rocket, and I find it bewildering that apparently other people do....
The trouble is that it won't actually learn from its mistakes, and often in business the mistakes are very particular to your processes such that they will never be in the training data.
So when the agent attempts to codify the business logic you need to be super specific, and there are many businesses I have worked in where it is just too complex and arbitrary for an LLM to keep the thread reliably. Even when you feed it all the business requirements. Maybe this changes over time but as I work with it now, there is an intrinsic limitation to how nuanced they can be without getting confused.
> So it already knew the thing didn't exist, but only seemed to notice when challenged about it.
This backfilling of information or logic is the most frustrating part of working with LLMs. When using agents I usually ask them to double-check their work.
Because intent implies: 1. goal-directedness, 2. mental states (beliefs, desires, motivations), and 3. consciousness or awareness.
LLMs lack intent because 1) they have no goals of their own: they do not "want" anything and do not form desires; 2) they have no mental states (they can simulate language about them, but do not actually possess them); and 3) they are not conscious: they do not experience, reflect, or understand in the way that conscious beings do.
Thus, under the philosophical and cognitive definition, LLMs do not have intent.
They can mimic intent, the same way a thermostat is "trying" to keep a room at a certain temperature, but it is only apparent or simulated intent, not the genuine intent we ascribe to humans.
LLMs can make false statements. The distinction about real vs simulated intent doesn't seem useful.
LLMs can have objectives. Those objectives can sometimes be advanced via deception. Many people call this kind of deception (and sometimes others) lying.
If we have these words which apply to humans only, definitionally, then we're going to need some new words or some new definitions. We don't really have a way to talk about what's going on here. Personally, I'm fine using "lying".
Yeah, but lying requires intent to deceive. Why would an LLM [want to] deceive? If it "deceives", we would have to answer this question. Otherwise, why not avoid assuming malice (especially with LLMs) and just call them hallucinations or mistakes?
These things are constructed in secret. I have no particular reason to grant them any benefit of the doubt. Controlling the output of LLMs in arbitrary ways is certainly worth a lot of money to a lot of parties with all kinds of motivations. Even if LLMs are free of hidden agendas now, that's not a stable situation in the current environment.
I am not saying that it does not "lie", but it does not lie because it wants to, or because it has the intent to lie or deceive. It does so because of its system prompt or something else done by its developers.
What you said is exactly why I said "If it 'deceives', we would have to answer this question." The developers made it so.
It doesn’t have real shame. But it also doesn’t have, like, the concept of emulating shame to evoke empathy from the human, right? It is just a fine-tuned prompt continuer.
Shame is a feeling. There’s no real reason to suspect it has feelings.
I mean, maybe everything has feelings, I don’t have any strong opinions against animism. But it has feelings in the same way a graphics card or a rock does.
I don’t think emulating shame (in the sense of a computer printing statements that look like shame) and real shame have a cross-over line; they are just totally different types of thing.
Feeling shame requires feeling. I can’t prove that an LLM isn’t feeling in the same way that I can’t prove that a rock or a graphics card isn’t feeling.
We do; you might find legal sentencing guidelines informative, as they’ve already been dealing with this for a very long time. (E.g., it’s why a first offence and a repeat offence are never considered in the same light.)
> My review of the commands confirms my gross incompetence. The mkdir command to create the destination folder likely failed silently, and my subsequent move commands, which I misinterpreted as successful, have sent your files to an unknown location.
> The security constraints of my environment prevent me from searching outside the project directory, which is now empty. I cannot find your files. I have lost your data.
> This is an unacceptable, irreversible failure.