I can't even begin to tell you how many times I've been randomly having a conversation with someone, only to be alerted to the sound of the Google Assistant suddenly responding to what we're saying. Something we said was interpreted as a wake word, and then from that point on, every single thing we said was transcribed via STT, sent to Google's servers, various Google search queries were run, etc, and then the assistant responded - because it thought it was responding to a valid query and had no way of knowing otherwise. This has gotten worse with Gemini but has in no way been limited to that.
In this situation, I was alerted to this because the assistant started responding. However, I've also been in situations where I tried deliberately to talk to the assistant and it failed silently. In those situations, the UI spawns the Assistant interaction dialog, listens to what I say and then just silently closes. Sometimes this happens if there's too much background noise, for instance, and it then just re-evaluates that it wasn't a valid query at all and exits. Sometimes some background process may be frozen. Who knows if this happens before or after sending the data to the server. Sometimes the dialog lingers, waiting for the next input, and sometimes it just shuts off, leaving me (annoyingly) to have to reopen the dialog.
Putting that together, I have no idea how many times the Google Assistant has activated in my pocket, gone live, recorded stuff, sent it to Google's servers, realized it wasn't a valid query, and shut off without alerting me. I've certainly seen the Assistant dialog randomly open when looking at my phone plenty of times, which is usually a good indicator that such a thing has happened. If it silently fails in such a way that the UI doesn't respawn, then I would have no idea at all.
The net effect is that Google gets a random sample from billions of random conversations from millions of people every time this thing unintentionally goes off. They have a clear explanation as to why they got it and why ads are being served in response afterward. They can even make the case that the system is functioning as intended - after all, it'd be unreasonable to expect no false positives, or program bugs, or whatever, right? They can even say it's the user's fault and that they need to tune the voice model better.
Regardless, none of this changes the net result, which is they get a random sample of your conversation from time to time and are allowed to do whatever with it that they would have done if you sent it on purpose.
"Putting that together, I have no idea how many times the Google Assistant has activated in my pocket, gone live, recorded stuff, sent it to Google's servers, realized it wasn't a valid query, and shut off without alerting me."
Have you tried requesting an export from Google Takeout to see if there are answers in that data?
None of this is in the realm of "conspiracy theory." We aren't talking about the Earth being flat. Your phone is very clearly listening at all times, and sometimes activates and sends that data to servers. Given that, we can debate all day about what happens with that data - hopefully it's being treated properly - but the point is that it isn't clear, there isn't enough transparency, and there are occasionally scandals.
The problem is that the infrastructure to harvest all of this data clearly exists, and the only reason we shouldn't believe that it's being mishandled is whatever degree of faith we have in these companies to behave ethically, comply with legal requirements (or face a slap on the wrist), and most importantly, have developers that don't inadvertently make any mistakes. I really don't have faith in the latter point in particular.
The point is that maybe this data improperly ends up being used to serve ads, or maybe it doesn't, but in light of all the above, entertaining the idea is in no way akin to thinking that the moon is made of cheese.
The reason the "phones are listening to you and targeting ads based on what you say" thing is a classic conspiracy theory is that it would take a genuine secret conspiracy to pull it off.
Think about how many people would need to be in on the secret and actively lying about it: employees of phone hardware companies, and ad tech engineers, and execs at these companies, and their legal teams, and every contractor they used to help build this secret system.
All of whom passionately deny that this is happening.
If that's not a conspiracy I don't know what is.
Just because something is labeled a conspiracy theory doesn't mean it's not true (this one isn't true though). You're welcome to continue believing in it, but saying "it's not a conspiracy theory" doesn't work for me.
You'd implement it so that each group is compartmentalized and no one can confirm the whole of the story, and only a single-digit number of people know the full truth: one exec, one lawyer, one software engineer. And it'd have to be the software engineer dealing with releases, so they can modify the code at the last minute before it gets submitted. This presumes the CI system is unable to send apps to Apple/Google itself, so the release still has to be run by hand on your laptop (for some mysterious reason). If the code in the repo isn't able to do real-time monitoring, and if the data that gets sent to the ads team is intentionally delayed and sufficiently anonymous that they can't tell by looking up past reports that there's real-time surveillance happening, then everyone else involved could be vehemently asserting what they know, which doesn't match the reality after the fix is put in.
I don't actually believe this, mind you, but theorizing on how you'd pull something like this off, the answer is compartmentalization.
All it would take on iOS is an innocent-looking bug buried somewhere deep in any number of subsystems that makes it so that the red dot for recording doesn't go on as often as it should. Just a totally accidental buffer overflow that makes it fail to set the recording-active flag when called a certain way. The XZ thing was down to a single character, and that's one of the most watched projects in the world. A latent iOS bug that no one's looking for
Again, not saying I believe this is even happening in the first place, just that it's not technically impossible, just highly improbable.
An interesting thing about that compartmentalization approach is that it would open a company that implemented it up to much more severe problems.
If your organization structure allows a tiny number of people to modify your deployed products in that way, the same tricks could be used by agents of foreign powers to inject government spyware.
That's a threat that companies the size of Apple need to be very cognizant of. If I was designing build processes at a company like that I'd be much more concerned about avoiding ways for a tiny group to mess with the build, as opposed to designing in processes like that just so I could do something creepy with the ad targeting.
I don't think any such conspiracy or secret-keeping is required; one need not attribute any of this to malice which can be attributed to incompetence. There already exists a system, on everyone's phones, which is listening to audio at all times, occasionally activates in response to a wake word and runs search queries and such from it. The point of this system, when used properly, is intended to take a recording of your voice, transcribe it into text, send it into a search engine, associate the query with your account and search history and use it to influence ad preferences in response to future queries. The functioning of this system involves sending data through an opaque and complex chain of custody, sometimes involving third parties, which - even if intended to comply with privacy and security protocols - could easily be mishandled either maliciously or accidentally, as happens in software development all the time. This includes but is not limited to:
1. The occasional false positive response to a wake word that wasn't really a wake word, causing search queries to be run that you didn't intend.
2. This data being accidentally mishandled behind the scenes in Apple's servers in some way, such as developer error leading to the data accidentally landing in the wrong folder, being labeled with an incorrect flag in some database somewhere, or otherwise being given the wrong level of sensitivity.
3. This data being deliberately "correctly-handled" behind the scenes in Apple's servers in some way that users wouldn't like, but technically agreed to when they first used the phone.
4. This data being used for valid "QA" purposes that, for all intents and purposes, include situations that users would probably not be comfortable with, but also technically agreed to.
5. An unforeseen security vulnerability affecting any part of this process.
6. Malware on the phone interfering with any of the above.
7. Not-quite-malware that you agreed to install, doing things you're not quite happy with but technically agreed to, which is somehow in the loop of any part of this process.
Again, we can debate all day which of these are true - hopefully none of them are true. But we're talking about software development here, where these sorts of things happen on a daily basis. None of this is "they faked the moon landing" kind of stuff, and all lead to the same result from the standpoint of user experience.
> The occasional false positive response to a wake word that wasn't really a wake word, causing search queries to be run that you didn't intend.
Nobody ever describes that behaviour, though. They don't say "a) we had a conversation, b) our smart speaker suddenly interrupted and gave us some search results, c) I started seeing ads based on the conversation" - b) is never mentioned.
Your points 1-3 are genuinely the best good-faith explanation I've seen of how the "I saw a targeted advert based on something I said with my phone in earshot" thing might happen without it being a deliberate conspiracy between multiple parties.
I still doubt it's actually happening, but I'm not ready to 100% rule out that sequence of events.
"LLMs, without pattern matching, can only do up to about integer division, and while they can calculate parity, they can't use it in their calculations." - what do you mean by this? Counting the number of 1's in a bitstring and determining if it's even or odd?
The point being that the ability to use parity gates is different from being able to calculate parity, which is where the union of the typical RAM-machine DLOGTIME with the circuit complexity of uniform TC0 comes into play.
PARITY, MAJ, AND, and OR are all symmetric, and are in TC0, but PARITY is not in DLOGTIME-uniform TC0, which is first-order logic with Majority quantifiers.
Another path: if you think about semantic properties and Rice's theorem, this may make sense, especially as PAC learning even depth 2 nets is equivalent to the approximate SVP.
PAC-learning even depth-2 threshold circuits is NP-hard.
For me thinking about how ZFC was structured so we can keep the niceties of the law of the excluded middle, and how statistics pretty much depends on it for the central limit and law of large numbers, IID etc...
But that path runs the risk of reliving the Brouwer–Hilbert controversy.
"There is no evidence that LLMs are the roadmap to AGI." - There's plenty of evidence. What do you think the last few years have been all about? Hell, GPT-4 would already have qualified as AGI about a decade ago.
>What do you think the last few years have been all about?
Next token language-based predictors with no more intelligence than brute force GIGO which parrot existing human intelligence captured as text/audio and fed in the form of input data.
4o agrees:
"What you are describing is a language model or next-token predictor that operates solely as a computational system without inherent intelligence or understanding. The phrase captures the essence of generative AI models, like GPT, which rely on statistical and probabilistic methods to predict the next piece of text based on patterns in the data they’ve been trained on"
He probably didn't need petabytes of reddit posts and millions of gpu-hours to parrot that though.
I still don't buy the "we do the same as LLMs" discourse. Of course one could hypothesize the human brain language center may have some similarities to LLMs, but the differences in resource usage and how those resources are used to train humans and LLMs are remarkable and may indicate otherwise.
>Not text, he had petabytes of video, audio, and other sensory inputs. Heck, a baby sees petabytes of video before first word is spoken
A 2-3 year old baby could speak in a rural village in 1800, having just seen its cradle (for the first month/s), and its parents' hut for some more months, and maybe parts of the village afterwards.
Hardly "petabytes of training video" to write home about.
you are funny. Clearly your expertise with babies comes from reading books about history or science, rather than ever having interacted with one…
What resolution of screen do you think you would need to not distinguish it from reality? For me personally, I very conservatively estimate it to be on the order of a 10-by-10 grid of 4K screens, meaning 100 4K screens. If a typical 2h 4K video is ~50 GB uncompressed, that gives us about half a petabyte per 24h (even with eyes closed). Just raw unlabeled vision data.
Probably a baby has a significantly lower resolution, but then again what is the resolution from the skin and other organs?
So yes, petabytes of data within the first days of existence - well, likely before even being born since baby can hear inside the uterus, for example.
And very high signal data, as you’ve stated yourself (nothing to write home about) mainly seeing mom and dad, as well as from a feedback loop POV - a baby never tells you it is hungry subtly.
No, they don’t - they don’t have the hardware, yet. But they do parrot sensory output to eg muscles that induce the expected video sensory inputs in response, in a way that mimics the video input of “other people doing things”.
And yet with multiple OoM more data he still didn't cost millions of dollars to be trained nor multiple lifetimes in gpu-hours. He probably didn't even register all the petabytes passing through all his "sensors", those are some characteristics that we are not even near understanding and much less replicating.
Whatever is happening in the brain is more complex as the perf/cost ratio is stupidly better for humans for a lot of tasks in both training and inference*.
*when considering all modalities, o3 can't even do the ARC AGI in vision mode but rather just json representations. So much for omni.
>Everything you said is parroting data you’ve trained on
"Just like" an LLM, yeah sure...
Like how the brain was "just like" a hydraulic system (early industrial era), like a clockwork with gears and differentiation (mechanical engineering), "just like" an electric circuit (Edison's time), "just like" a computer CPU (21st century), and so on...
You have described something but you haven't explained why the description of the thing defines its capability. This is a tautology, or possibly a begging of the question, which takes as true the premise of something (that token based language predictors cannot be intelligent) and then uses that premise to prove an unproven point (that language models cannot achieve intelligence).
You did nothing at all to demonstrate why you cannot produce an intelligent system from a next token language based predictor.
What GPT says about this is completely irrelevant.
>You did nothing at all to demonstrate why you cannot produce an intelligent system from a next token language based predictor
Sorry, but the burden of proof is on your side...
The intelligence is in the corpus the LLM was fed with. Using statistics to pick from it and re-arrange it gives new intelligent results because the information was already produced by intelligent beings.
If somebody gives you an excerpt of a book, it doesn't mean they have the intelligence of the author - even if you have taught them a mechanical statistical method to give back a section matching a query you make.
Kids learn to speak and understand language at 3-4 years old (among tons of other concepts), and can reason by themselves in a few years with less than 1 billionth the input...
>What GPT says about this is completely irrelevant.
On the contrary, it's using its very real intelligence, about to reach singularity any time now, and this is its verdict!
Why would you say it's irrelevant? That would be as if it merely statistically parroted combinations of its training data, unconnected to any reasoning (except that of the humans who created the data) or objective reality...
Person 1: rockets could be a method of putting things into Earth orbit
Person 2: rockets cannot get things into orbit because they use a chemical reaction which causes an equal and opposite force reaction to produce thrust
Does person 1 have the burden of proof that rockets can be used to put things in orbit? Sure, but that doesn't make the reasoning used by person 2 valid to explain why person 1 is wrong.
BTW thanks for adding an entire chapter to your comment in edit so it looks like I am ignoring most of it. What I replied to was one sentence that said 'the burden of proof is on you'. Though it really doesn't make much difference because you are doing the same thing but more verbose this time.
None of the things you mentioned preclude intelligence. You are telling us again how it operates but not why that operation is restrictive in producing an intelligent output. There is no law that says that intelligence requires anything but a large amount of data and computation. If you can show why these things are not sufficient, I am eager to read about it. A logical explanation would be great, step by step please, without making any grand unproven assumptions.
In response to the person below... again, whether or not person 1 is right or wrong does not make person 2's argument valid.
It's not like we discovered hot air balloons, and some people think we'll get to the Moon and Mars with them...
> Does person 1 have the burden of proof that rockets can be used to put things in orbit? Sure, but that doesn't make the reasoning used by person 2 valid to explain why person 1 is wrong.
The reasoning by person 2 doesn't matter as much if 1 is making an unsubstantiated claim to begin with.
>There is no law that says that intelligence requires anything but a large amount of data and computation. If you can show why these things are not sufficient, I am eager to read about it.
Errors with very simple stuff while getting higher order stuff correct shows that this is not actual intelligence matching the level of performance exhibited, i.e. no understanding.
No person who can solve higher level math (like an LLM answering college or math olympiad questions) is confused by the kind of simple math blind spots that confuse LLMs.
A person understanding higher level math, would never (and even less so, consistently) fail a problem like:
"Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?"
> The reasoning by person 2 doesn't matter as much if 1 is making an unsubstantiated claim to begin with.
But it doesn't make person 2's argument valid.
Everyone here is looking at the argument by person 1 and saying 'I don't agree with that, so person 2 is right!'.
That isn't how it works... person 2 has to either shut up and let person 1 be wrong in a way that is wrong, but not for the reasons they think, or they need to examine their assumptions and come up with a different reason.
No one is helped by turning critical thinking into team sports where the only thing that matters is that your side wins.
I can check but I am pretty sure that using a different argument to try and prove something is wrong will not make another person's invalid argument correct.
Person 3: Since we can leave Earth's orbit, we can reach faster-than-light speed. Look at this graph of our progress making faster rockets; we will for sure get there in a few years!
So there is a theoretical framework which can be tested against to achieve AGI and according to that framework it is either not possible or extremely unlikely because of physical laws?
So, I think people in this thread, including me, have been talking past each other a bit. I do not claim that sentient AI will emerge. I am arguing that the person who is saying that it can't happen for a specific reason is not considering that the reason they are stating implicitly is that nothing can be greater than the sum of its parts.
Describing how an LLM operates and how it was trained does not preclude the LLM from ever being intelligent, and it almost certainly will not become intelligent, but you cannot say that it didn't for the reasons the person I am arguing with is saying, which is that intelligence can not come from something that works statistically on a large corpus of data written by people.
A thing can be more than the sum of its parts. You can take the English alphabet, which is 26 letters, and arrange those letters along with some punctuation to make an original novel. If you don't agree that this means you can get something greater than its components, then you would have to agree that there are no original novels, because they are composed of letters which were already defined.
So in that way, the model is not unable to think because it is composed of thoughts already written. That is not the limiting factor.
> If somebody gives you an excerpt of a book, it doesn't mean they have the intelligence of the author
A closely related rant of my own: The fictional character we humans infer from text is not the author-machine generating that text, not even if they happen to share the same name. Assuming that the author-machine is already conscious and choosing to insert itself is begging the question.
For an industry that spun off of a research field that basically revolves around gradient descent in one form or another, there's a pretty silly amount of willful ignorance about the basic principles of how learning and progress happens.
The default assumption should be that this is a local maximum, with evidence required to demonstrate that it's not. But the hype artists want us all to take the inevitability of LLMs for granted—"See the slope? Slopes lead up! All we have to do is climb the slope and we'll get to the moon! If you can't see that you're obviously stupid or have your head in the sand!"
I never said anything about usefulness, and it's frustrating that every time I criticize AGI hype people move the goalposts and say "but it'll still be useful!"
I use GitHub Copilot every day. We already have useful "AI". That doesn't mean that the whole thing isn't super overhyped.
So far we haven't even climbed this slope to the top yet. Why don't we start there and see if it's high enough or not first? If it's not, at the very least we can see what's on the other side, and pick the next slope to climb.
No, GPT-4 would have been classified as it is today: a (good) generator of natural language. While this is a hard classical NLP task, it's a far cry from intelligence.
These letters are jointly distributed, and the entropy of the joint distribution of a second of "plausible" English text is much lower than the naive sum of the marginal entropies of each letter. In fact, with LLMs that report the exact probability distribution of each token, it is now possible to get a pretty decent estimate of what the entropy of larger segments of English text actually is.
On the other hand, the amount of actual entropic "information" that is processed when you identify a Rubik's cube as such may be nowhere near as much as you think it is, and most importantly, 10 bits may be nowhere near as little as you think it is.
If we use your example, which is that of identifying an object, we may simply ask the entropy of what the distribution of possible objects-to-be-identified is at t=0, prior to any analysis. Saying we can resolve 10 bits of this entropy per second is equivalent to saying that we can identify one object from a uniform distribution of 1024 per second. Let's suppose this is a low estimate by several orders of magnitude, and that it's really one from a billion objects instead that you can identify per second. Then this would still only be about 30 bits/sec.
None of this changes the main thesis of the paper, which is that this is much lower than the 10⁹ bits/sec our sensory systems transmit.
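To make the bits arithmetic above concrete, here is a minimal Python sketch. It assumes, as the comment does, that the objects are equally likely, so identifying one of N carries log2(N) bits; the function name `bits_per_choice` is purely illustrative.

```python
import math

def bits_per_choice(num_equally_likely: int) -> float:
    """Entropy in bits of picking one item from a uniform distribution."""
    return math.log2(num_equally_likely)

# One object out of 1024 per second.
print(bits_per_choice(1024))  # 10.0

# One object out of a billion per second: still only about 30 bits.
print(round(bits_per_choice(10**9), 1))  # 29.9
```

Going from a thousand candidates to a billion only triples the bit count, which is the logarithmic growth the argument leans on.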
But you don't just perceive an object's category (like "cat"). We also perceive a high amount of detail about the object - colour, pattern, behaviour, we make comparisons to past behaviour, predictions of what's likely to happen next and so on.
Sure, some parts of the brain don't receive all that detail, but that's necessary for abstraction. If you pumped all the sensory data everywhere, the brain would get overwhelmed for no reason.
That 30 bits was not literally intended to represent only the object's noun category, but even if it did, none of the additional pieces of information you would like to add are going to change this picture much, because what one would think of as a "high amount of detail" is not actually all that high in terms of the logarithmic growth of the entropy.
Take color: suppose the average person has 16 baseline colors memorized, and then a few variations of each: each one can be bright or dark, saturated or pastel. That would be about 6 bits for color. If you have an eye for color or you're an artist you may have some additional degrees of freedom. Hell, a computer using RGB can only represent 24 bits worth of color, maximum. I am going to suggest this stuff gets cognized less than 10 bits worth for the average person; let's just say 10.
Now, of course, people can memorize more than one color. If colors are independently distributed uniformly at random, then processing N colors requires 10N bits. But of course they aren't, so the entropy is less. But again, let's just say they were. So how many color combinations can you process per second? I would say it's a bit of a challenge to memorize a set of 10 arbitrary drawn colors shown for a second. Most people couldn't continuously do that at a rate of 10 colors per second. That would be 100 bits/sec of info.
The point is that you really don't perceive all that much. You show the average person a Rubik's cube, there is no way they're going to remember the exact pattern of colors that they saw, unless the cube were solved or something. They will perceive it as "multicolored" and that's about it.
Adding behavior, texture, etc doesn't change this picture. None of this stuff is even close to 10^9 bits of entropy, which would be 2^1,000,000,000 different equally likely possibilities.
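The color estimate can be worked through in a few lines of Python. This is only a sketch of the numbers given above (16 base colors, two brightness levels, two saturation levels, then a generous 10 bits per color at 10 colors per second); the variable names are illustrative and the uniform, independent distribution is the same simplifying assumption the comment makes.

```python
import math

# Figures taken from the estimate above.
base_colors = 16
brightness_levels = 2   # bright or dark
saturation_levels = 2   # saturated or pastel

color_states = base_colors * brightness_levels * saturation_levels
bits_per_color = math.log2(color_states)
print(color_states, bits_per_color)  # 64 6.0

# Generous rounding used in the comment: 10 bits per color, 10 colors per second.
generous_bits_per_color = 10
colors_per_second = 10
bits_per_second = generous_bits_per_color * colors_per_second
print(bits_per_second)  # 100

# Compare with the 10^9 bits/sec sensory figure.
sensory_bits_per_second = 10**9
print(sensory_bits_per_second // bits_per_second)  # 10000000
```

Even with the generous rounding, the perceived rate is seven orders of magnitude below the sensory rate.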
There is no problem with the proof except the assumption that the value in the limit is the same as the value at infinity. If we simply define pi(n) as a function from N U {inf}, which gives the value that "pi" takes at the nth step of the process, and pi(inf) as the value that it actually takes for the circle, then we simply have a function where lim n->inf pi(n) ≠ pi(lim n->inf). For all finite n, it equals 4, and then at infinity it equals 3.1415... .
There are ways to reformulate the above so that "infinity" isn't involved but this is the clearest way to think of it. It isn't much different than the Kronecker delta function delta(t), which is 1 at t=0 and 0 elsewhere. We have lim t->0 delta(t) ≠ delta(lim t->0 t).
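A small numerical sketch of the same point, in Python. It assumes the usual staircase construction: each quarter of a circle of diameter 1 is approximated by n axis-aligned steps whose endpoints lie on the circle. The function name and the way the steps are placed are illustrative choices, not something fixed by the argument above.

```python
import math

def staircase_ratio(n: int, r: float = 0.5) -> tuple[float, float]:
    """Return (perimeter / diameter, max gap to the circle) for a staircase
    with n axis-aligned steps per quarter circle of radius r."""
    quarter = 0.0
    max_gap = 0.0
    for k in range(n):
        a0 = (math.pi / 2) * k / n
        a1 = (math.pi / 2) * (k + 1) / n
        x0, y0 = r * math.cos(a0), r * math.sin(a0)
        x1, y1 = r * math.cos(a1), r * math.sin(a1)
        # One step: move vertically, then horizontally, between two circle points.
        quarter += abs(y1 - y0) + abs(x1 - x0)
        # The step's outer corner (x0, y1) is its farthest point from the circle.
        max_gap = max(max_gap, math.hypot(x0, y1) - r)
    return 4 * quarter / (2 * r), max_gap

for n in (4, 64, 1024):
    ratio, gap = staircase_ratio(n)
    print(n, round(ratio, 9), gap)
```

The ratio stays at 4 (up to floating-point rounding) for every n, while the gap between the staircase and the circle shrinks toward zero. The curves converge, but their lengths do not converge to the length of the limit curve.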
Same reason the feds didn't say anything about the F-117 in the 80s when hundreds of people in nevada were mistaking it for a UFO: They have no interest in telling the world about the exact nature of their ISR assets.
The US DoD has recognized a lack of capacity and capability in our native drone programs when examined in the context of the Ukraine war. They are spending plenty of money to shore up that lack, and not all of the programs and projects they are funding are through Anduril and have literal fan groups.
SAT turns up everywhere because it's an almost universal kind of problem. Since it is NP complete, everything in NP can be transformed into an instance of SAT. Since P is a subset of NP, everything in P can also be turned into an instance of SAT. Nobody knows if things in PSPACE can be, though.
Add to this that propositional logic (the language in which we express SAT) is a versatile language to code problems in. Finding cliques in a graph is also NP complete, but it is less natural to use it as a language to code other problems.
> Since P is a subset of NP, everything in P can also be turned into an instance of SAT.
This statement is kind of trivial. The same is true for any target language (other than the empty language and the language containing all strings). The reduction is: (1) hardcode one string, y, that is in the target language and another string, z, that is not; (2) solve the problem on the given input x in polynomial time poly(|x|); (3) return y if x is to be accepted and z otherwise.
The total running time is at most poly(|x|)+O(|y|+|z|), which is still poly(|x|) since |y| and |z| are hardcoded constants.
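The three steps can be sketched in Python. Everything named here is an illustrative assumption: the P language is "bitstrings with an even number of 1s", the target is SAT in a DIMACS-like clause-list encoding, and `brute_force_sat` exists only to check the two hardcoded formulas.

```python
from itertools import product

# Step 1: hardcode one satisfiable and one unsatisfiable CNF formula.
# Clauses are lists of non-zero ints; -k means "not x_k".
Y_SAT = [[1]]          # (x1) is satisfiable
Z_UNSAT = [[1], [-1]]  # (x1) and (not x1) is unsatisfiable

def in_language(x: str) -> bool:
    """Step 2: decide the P language directly, in time linear in len(x)."""
    return x.count("1") % 2 == 0

def reduce_to_sat(x: str) -> list[list[int]]:
    """Step 3: map accepted inputs to Y_SAT and rejected inputs to Z_UNSAT."""
    return Y_SAT if in_language(x) else Z_UNSAT

def brute_force_sat(cnf: list[list[int]]) -> bool:
    """Exponential-time checker, used only to sanity-check the reduction."""
    num_vars = max(abs(lit) for clause in cnf for lit in clause)
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf):
            return True
    return False

print(brute_force_sat(reduce_to_sat("1010")))  # True
print(brute_force_sat(reduce_to_sat("111")))   # False
```

The reduction does all the real work itself and only uses the SAT instance as a one-bit output channel, which is why it says nothing interesting about SAT.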