The R1 paper (https://arxiv.org/pdf/2501.12948) emphasizes their success with reinforcement learning without requiring any supervised data (unlike RLHF for example). They note that this works well for math and programming questions with verifiable answers.
What's totally unclear is what data they used for this reinforcement learning step. How many math problems of the right difficulty with well-defined labeled answers are available on the internet? (I see about 1,000 historical AIME questions, maybe another factor of 10 from other similar contests). Similarly, they mention LeetCode - it looks like there are around 3000 LeetCode questions online. Curious what others think - maybe the reinforcement learning step requires far less data than I would guess?
Awesome post and thanks for writing this out - probably the most insightful piece I’ve read on a plausible origin of life through pre-RNA autocatalytic peptides. Would you be willing to share a contact email / online profile? (could edit afterward to delete if you are worried about spam from crawlers)
To summarize: localize defect contamination to a very small unit size, by making the cores tiny and redundant.
Analogous to a conglomerate wrapping each business vertical in a limited liability veil so that lawsuits and bankruptcy do not bring down the whole company. The smaller the subsidiaries, the less defect contamination but also the less scope for frictionless resource and information sharing.
Sometimes people don’t care, but often they are just unaware because there is no mechanism for feedback to make its way to them after they have designed the thing. Whoever designed that bike ramp probably designed a thousand other road features, lives many miles away, and never communicates with the people that handle injury reports; he knows none of the visceral details that you see every day in your specific corner of the universe.
One day LLMs may replace traditional search engines… in the meantime Google has built a $2 trillion business on specialized engineering and millions of human-designed features and optimizations.
The Bitter Lesson is an elegant asymptotic result. But from a business perspective it pays to distinguish problems that general deep learning approaches will disrupt in 1 year vs. 5 years vs. 30 years.
> Some researchers have challenged parts of this picture, however; a 2024 study, for example, suggested waste clearance is actually faster during waking than during sleep.
When you talk to neuroscientists and researchers in private you often find that they are far less confident than public personas, PR, articles, or science reporting make them sound. A lot of their findings are really more like "huh, that's weird. We should look at this more." What seems like a ton of consensus at cruising altitude is actually much more divisive as you approach ground level. The more recent emergence of the idea that neuroscientists are a kind of "super scientist" of human behavior (all self-help books are now "neuroscience"-based, for example) has also made them seem much more certain about certain things than they actually are.
As a neuroscientist, yes. But this is true of most science & medicine news written for a general audience. I have to tell my parents to stop reading "Chocolate is a superfood / Chocolate causes cancer" articles every year.
Uncertainty isn't good for engagement, even if it's correct.
Nonlinear (and inverting!) response curves for drug dosimetry that's population and person specific. Especially when there's a temporal delay. Trying to explain grapefruit to my elderly parents is like pulling teeth.
> Trying to explain grapefruit to my elderly parents is like pulling teeth.
It is not that hard. It is a fruit roughly the same size and appearance as an orange, but more bitter. See! I explained it. :) Joking aside, what are you trying to explain about grapefruit to your elderly parents? Is it the weird way it interacts with certain medicines?
Here is a great review by Bailey and colleagues who discovered the risks of grapefruit in drug metabolism. It is from 2013. This is a GREAT example of genotype-by-drug interaction and why knowing your genotype can be extremely useful.
>Our research group discovered the interaction between grapefruit and certain medications more than 20 years ago [1990s].1–3 Currently, more than 85 drugs, most of which are available in Canada, are known or predicted to interact with grapefruit. This interaction enhances systemic drug concentration through impaired drug metabolism.
Many of the drugs that interact with grapefruit are highly prescribed and are essential for the treatment of important or common medical conditions. Recently, however, a disturbing trend has been seen. Between 2008 and 2012, the number of medications with the potential to interact with grapefruit and cause serious adverse effects (i.e., torsade de pointes, rhabdomyolysis, myelotoxicity, respiratory depression, gastrointestinal bleeding, nephrotoxicity) has increased from 17 to 43, representing an average rate of increase exceeding 6 drugs per year. This increase is a result of the introduction of new chemical entities and formulations.
It can potentially double or triple the effective dose you’ll absorb of many medications, so weird doesn’t adequately describe it. “Potentially life-threatening” is better.
I heard this before. (Perhaps even on Hacker News.) The way I stored it in my brain is “grapefruit is weird, don’t eat or drink grapefruit juice when on medication. Why? It might make you ill or even kill you.”
Obviously that form is oversimplified. But since I’m not a pharmacist, nor a doctor, I can allow this simplification for myself because it “fails-safe”. That is, it might make me refrain from eating grapefruit in a situation where I could safely do so, but it will save me from eating grapefruit in situations where it is not safe. It would be harder if I needed to remember that I must eat grapefruit in some situations and can’t eat it in other situations.
The reason why I’m saying this is because this is how I would approach explaining this to someone. By oversimplifying to the point where the safe story is easy to remember. People already understand that they can’t mix alcohol and certain medications. So it is just one more thing you can’t mix with medication.
Have you heard the one about lychee? There is some compound in them which can lower your blood glucose levels. The sugar from digesting the fruit will eventually push up your sugar levels, so it all ends up okay eventually, and the drop is relatively small. But if you already have a low blood sugar level, and eat a lot of lychee it can maybe even kill you. So don’t eat them on an empty stomach just to stay on the safe side.
That is my other “tasty fruit with a weirdly dangerous rare side-effect” fact.
"The intensity comes down to how grapefruit's compounds mainly furanocoumarins inhibit CYP3A4. Normally, this enzyme would help break down psilocybin into psilocin and metabolize it out of your system. When CYP3A4 is inhibited, more psilocybin remains in your bloodstream longer, allowing for a higher concentration of psilocin to flood your brain at once. This rapid spike in active compound could overwhelm your serotonin receptors, particularly the 5-HT2A receptors, which are the primary target of psilocin." - Found that on erowid years ago, always thought it sounded silly (yet I copied it down ha) - maybe worth giving a go.
Another sign "big Carob" has taken over the health industry, pushing its "kids, it's healthier" lie. Chocolate, Wine and Cheese are the three vital food groups.
You can make a fantastic "mud-slide" with cheap ice cream and premade custard, and of course vodka, kahlua and baileys. So yes, I will admit ice cream to the pantheon. If you go fancy-schmancy and get gelati then I'll even admit fruit, as long as it's a strawberry daiquiri
As a researcher myself, I really dislike that this is even a thing. I constantly have friends send me articles asking what I think about things (frequently the answer is "I have no idea" and/or "the paper says something different").
I'm livid about this because this erodes public trust in science. Worse, people don't see that connection...
I don't understand how major news publications can't be bothered to actually reach out to authors. Or how universities themselves will do that and embellish work. I get that it's "only a little" embellishment, but it's not a slippery slope considering how often we see that compound (and how it is an excuse rather than acting in good faith). The truth is that the public does not understand the difference in the confidence levels of scientists for things like "anthropogenic climate change" vs "drinking wine and eating chocolate is healthy for you." To them, it's just "scientists" who are some conglomerate. It is so pervasive that I can talk to my parents about things I have domain expertise in and have written papers on, and they believe scientists are making tons of money over this while I was struggling with student debt. I have to explain that the national lab where I worked isn't full of rich people[0]. There's a lot of easier ways to make money... And my parents, each, made more than any of the scientists I knew...
[0] People I know that have jumped ship and moved from lab to industry saw their salaries go up 2x-3x (these are people with PhDs btw).
[Side note]: I wish we were able to be more honest in papers too. But I have lots of issues with the review system and the biggest is probably that no one wants to actually make any meaningful changes despite constant failure in the process and widespread frustration.
Yes. My dad cites the vacillation of the science on whether or not eggs are healthy as the reason he places no trust in what scientists say. Of course, he is mistaking science reporting for the science itself, but he should be able to trust the media to accurately communicate the scientific consensus. It's one thing when we're talking about eggs or chocolate, but now he's skeptical of science reporting on global warming and COVID-19. And is that unreasonable? Why should he put a high credence in any science reporting going forward? He's a construction worker. He's not equipped to go read the journal articles for himself.
Of course, we haven't even touched on the replication crisis, of which thankfully my dad is blissfully unaware.
A funny one was when I was visiting my parents for the holidays. They have the news on and it is talking about those $2T funding cuts. It shows a bunch of examples with like $50m to this, $500k to that, and then a random $10k to "youth break dancing." And I'm like "why is that there? Seems like rage bait. Why not list more big ticket items? It's $2T, $10k is nothing at that scale." You guessed right, this started a fight. I wasn't even trying to say that the thing should be funded (though I have no issues with it. Sounds like just a recreational program. Who cares? It's not even "pennies").
A lesson I continually fail to learn is that it isn't about the actual things. Information is a weapon to many people. Not a thing to chase, to uncover, to discover, but a thing that is concrete and certain. I still fear the man that "knows", since all I can be certain of is that he knows nothing.
That's because TV news is an entertainment program similar to a reality TV show. It's designed to enrage you with terrible news even when the reality is quite different and things are not that dire. When I realized it I stopped watching it.
I also stopped watching TV entirely when they shut down analog TV, as that was a nice natural ending of traditional TV. Continuing required a repeated action (buying MPEG2 tuners when it was obvious that newer codecs would be used in a few years) and it was very easy to just do nothing instead.
>and then a random $10k to "youth break dancing." And I'm like "why is that there? Seems like rage bait.
This, but also the implication is that if they'll waste $10k on break dancing, then they must be wasting similar amounts on thousands of other programs, and they leave it for the reader to imagine what those might be and get angry about whatever they made up in their mind.
How many such "pennies" projects were there? It could be that your parents understand that you can waste a lot of money throwing dollar bills out of the window. Most of the time these bills are full of waste such that the sum total of all such projects can equal the threshold of what you might think of as "real money".
I think you are failing to understand how much $2T actually is. At $10k each, that's 200m things you'd need to add up. That's nearly the population of America. It's more than the population of California, Texas, Florida, and New York combined. More than the number of square miles in California. If you started at the southern border of California and placed $10 bills side by side (the short side!) while traveling north, you'd get more than halfway through Oregon! [0] That's the number of things you need to add up, which is still far short of $2T.
I cannot express how large a sum of money this is. Only that if you think you understand, you're wrong (I do not understand). If you are indeed correct and a large portion of the money is composed of things costing under $1m then we're seriously SERIOUSLY fucked. Because a trillion is a million million.
As with most things, I have extremely high confidence that this follows a power-law distribution (go look up Pareto). So there's going to be thousands of things more than $100m in cost.
So I'm extremely confident that if you're picking 10 examples to show on screen and any single one of those values is under $10m then you're either grossly incompetent or showing rage bait. Because even $10m is chump change when we're talking about these numbers. You certainly aren't providing any meaningful communication to your viewers. At best you're accidentally grossly misrepresenting the problem. While I'm a fan of Hanlon's Razor, extreme incompetence in your domain of expertise is no different from malice.
[0] a bill is 156mm x 66.3mm. California is 1,220km long. 1220000/0.063=19,365,079. Oregon is 580km long. $2T is so much fucking money you could place $100 bills side by side along the length of California and if you did this a thousand times you'd be $63.5bn shy of $2T. If you did it 1032 times you'd still be $1.5bn short.
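For anyone who wants to check the footnote, here is a minimal Python sketch of the same arithmetic. It takes the footnote's own figures as given, including the 0.063 m bill width used in the division (the quoted short side is 66.3 mm, so this is a rounding on the low side). The function and constant names are made up for illustration. With these figures the roughly $1.5bn shortfall lands at 1,032 passes, and one more pass goes over $2T.

```python
# Figures taken from the footnote as written. BILL_WIDTH_M is the 0.063 m
# value used in the footnote's division, not the quoted 66.3 mm.
CALIFORNIA_LENGTH_M = 1_220_000
BILL_WIDTH_M = 0.063
TARGET_USD = 2_000_000_000_000  # $2T


def bills_per_pass(length_m=CALIFORNIA_LENGTH_M, width_m=BILL_WIDTH_M):
    """Whole bills laid short side to short side along the given length."""
    return int(length_m / width_m)


def shortfall_usd(passes, denomination=100):
    """Dollars still missing from $2T after `passes` rows of bills."""
    return TARGET_USD - passes * bills_per_pass() * denomination


def items_needed(item_cost_usd):
    """How many items of a given cost add up to $2T."""
    return TARGET_USD // item_cost_usd


if __name__ == "__main__":
    print(f"bills per pass: {bills_per_pass():,}")
    print(f"$10k items needed: {items_needed(10_000):,}")
    for passes in (1000, 1032, 1033):
        print(f"{passes} passes: ${shortfall_usd(passes):,} short")
```

A negative shortfall means the rows of bills have already passed $2T.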
I don't need to look up the Pareto Principle and I am well aware of the scale of 2 trillion dollars. I don't need a bunch of condescending examples, thanks. As far as journalism goes, I call it a good thing to highlight wasteful spending to taxpayers, and perhaps other people who think responsible spending by the government is a good thing will as well. This is quite common; the numbers are not always at the scale of trillions. Sometimes, you find a city, perhaps like Chicago, wants to raise a billion dollar property tax, but has spent 400 million dollars on supporting illegal immigration, largely to one corporation, for example.
Just because I'm giving examples does not mean they are condescending. I am helping put into terms the scale of these numbers. That even my examples are not something you can actually conceptualize. Honestly, I can't even imagine a dollar side by side spanning a few city blocks.
I agree there's massive amounts of wasteful programs at that scale. But massive amounts of them gets you nowhere close to the actual number. That's the point.
I'm not sure that's even enough. I have a physics degree and all I've learned is that my stupid brain can't understand those numbers except through equations and relations. I think this is the best you can get.
Money is also very weird since it compounds. Both large wealth and government money are completely different classes, as well as different from each other. People try to think of both those versions of money in relation to how they use money themselves. Which just leads to misunderstandings.
Correct. But it was impossible to convince them that the premise of the complaint was different from what they wanted to hear. Tribalism makes people want to fight.
> The truth is that the public does not understand the difference in the confidence levels of scientists for things like "anthropogenic climate change" vs "drinking wine and eating chocolate is healthy for you."
The problem is that without science and stem foundations it is very hard for people to even understand what is and isn’t known.
My mum used to send me all kinds of articles about chakras. “there are kids born now who have their sixth chakra open” and “this chakra is orange, and that chakra is indigo”. One day my mom, my then girlfriend, and me were chatting about what specialisation my girlfriend was thinking about pursuing at medical school. She told her that she was thinking about specialising in endocrinology. My mom became really angry and chided us for using such “big words” to “lord our education over her”. So to placate her we explained that it is a doctor who studies hormones, measures hormone levels and treats diseases of the hormone system. She got visibly surprised and the only thing she asked was “you can measure hormones?”
The conversation continued of course but that question, and the genuine surprise on her face remained with me. The thing is, she truly did not know that we can measure hormones. And if you don’t know that hormones are as real as the legs of the table, and chakras are as real as santa claus, then they both sound like equally plausible theories about health. And when you race the stories against each other “i’m not feeling well because my heart chakra is blocked, and I need healing crystals and massages to get well again” vs “i’m not feeling well because my thyroid gland is not producing enough thyroxine, and i need to take supplements in a pill form” then the first one wins because it is simpler and neater sounding. But one is kinda bullshit and the other is a real thing. But you won’t know that unless you understand that we can measure hormones, and nobody even has any idea what it would mean to measure a chakra.
Ok, but you make it sound like the vast majority of people are under such impressions. A lot of people in this thread imply that having a STEM degree confers god-like insight over the plebes that don't have one; I think it is a bit of ironic arrogance. For instance, this thing about the "confidence levels of scientists" for climate change; there is a well-known issue with this myth of a "97% consensus" on that topic. As well, many people think critics of "climate policies" are rubes that "don't know teh science stuff" but actually most do accept humans have contributed to some climate change (though not to the extent that the left claims when they say migrants come to the US because of it); rather, they disagree about what to do about it.
> this thing about the "confidence levels of scientists" for climate change; there is a well-known issue with this myth of a "97% consensus" on that topic.
You're right, but what's your point? It's definitely above 80% and I've never met a climate scientist or someone working on adjacent research that doesn't believe it (given I've worked at national labs, that is a large number).
I do agree we shouldn't embellish things. That this actually undermines a point rather than strengthens it. If that's your point, I'm with you.
I also agree that a lot of stem people are too arrogant. This is especially common in CS and there's a strong negative stereotype that I think people should be aware of.
Honestly, it sounds like you strongly agree with my original statement. How science communication is doing serious harm to public perception of science. I'm not saying abandon education btw, just that we're doing a bad job when optimizing for views rather than strongly constraining for maintaining integrity and honesty
> My mom became really angry and chided us for using such “big words” to “lord our education over her”.
Sounds like a skill issue.
Joking aside, I think one of the most important things anyone can learn in life is being comfortable not knowing and being comfortable asking. As a researcher I don't understand other researchers who have such high confidence and are happy to claim they understand when a low threshold is obtained [0]. Personally, I feel that as I learn a subject I only get to "oh I think I might get this" to only then find a new rabbit hole. Though, personally, I enjoy this. It's why I research ¯\_(ツ)_/¯
I'm not sure how to teach that but I'm at least happy to say I don't know. Like how I totally forgot what an endocrinologist did until you said it lol. There's no shame in not knowing things. But I'm guessing your mom (like my parents) is very uncomfortable with not knowing things. In that case, any answer is better than no answer. Because knowing is valued over understanding. I do think there's social aspects that reinforce this, but I'm not sure what to do besides demonstrate my own stupidity lol
[0] I find people's judgement of understanding a subject is extremely subjective. Even when we remove the set of cases where it's being said to placate another or move a conversation along. I mean honest proclamations.
Also note that the medical field selects hard for people who can memorize information, to the exclusion of people who can understand systems. Those people, in turn, are the ones doing this research. This is likely a large part of why our knowledge of neuroscience is largely mechanistic and without a sense of the larger picture.
Compare to the invention of the perceptron, which took a joint effort between a polymathic neurophysiologist and a logician.
> the medical field selects hard for people who can memorize information, to the exclusion of people who can understand systems.
sounds similar to the problem with tech coding interviews. ive refactored the backend orchestration software of a SaaS company's primary app and saved 24tb of RAM, while getting 300% faster spinup times for the key part of the customer app, but i bomb interviews because i panic and mix up O(n) for algorithms and forget to add obvious recursion base cases. i know i can practice that stuff and pass, its just frustrating to see folks that have zero concept of distributed systems getting hired because they succeed at this hazing ritual.
but with that said, i suppose no industry or job will ever be free from "no true Scotsman" gate-keeping from tenured professionals. hiring someone that potentially knows more than you puts your own job security at risk.
I moved from Physics and Engineering to CS and honestly, I found the interview process very odd. It is far more involved and time consuming of a process than the interviews in other fields.
In other fields, it is expected that if you can "talk the talk" you can "walk the walk." Mostly because it is really hard to talk in the right way if you don't have actual experience. Tbh, I think this is true about expertise in any domain. I don't think it is too hard to talk to a programmer about how they'd solve a problem and see the differences between a novice and a veteran.
A traditional engineering interview will have a phone screen and an in person interview. Both of which they'll ask you about a problem similar to one they are working on or recently solved. They'll also typically ask you to explain a recent project of yours. The point is to see how you think and how you overcome challenges, not what you memorize. Memorization comes with repetition, so it's less important. I remember in one phone interview I was asked about something and gave a high level answer and asked if it was okay for me to grab one of the books I had sitting next to me because I earmarked that equation suspecting it would be asked. I was commended for doing so, grabbed my book, and once I reminded myself of the equation (all <<1m?) gave a much more detailed response.
In a PhD level interview, you're probably going to do this and give a talk on your work. Where people ask questions about your work.
IMO the tech interviews are wasteful. They aren't great at achieving their goals and are quite time consuming. General proficiency can be determined in other ways, especially with how prolific GitHub is these days. It's been explained to me that the reason for all this is due to the cost of bad hires. But all this is expensive too, since you are paying for the time of your high cost engineers all throughout this process. If the concern is that firing is so difficult, then I don't think it'd be hard to set policy where new employees are hired in under a "probationary" or "trial" status. It shouldn't take months to hire someone...
No engineering interview would have you do anything like leetcode problems. No one is going to ask you to solve equations in front of them[0]. They will not give you take home tests or any of that. Doesn't matter if you're at a small startup or a big player like Boeing or Lockheed Martin.
The stereotypical software engineering interview is heavily leetcode dependent. It's why leetcode exists and they can charge $150/yr for people to just study it (time that could be spent on learning other things). I mean somewhere like Google you can have 3-6 rounds in the interviewing process.
[0] Maybe you'll use a board or paper to draw illustrations and help in your explanations, but you're not going to work out problems. No one is going to give you a physics textbook problem and say "Go".
First, when you write "engineering", I suppose that you mean all engineering disciplines except software engineering. If true, how do you compare candidates for material science or civil engineering or aero engineering or mechanical engineering roles?
I suppose that you will reply "talk to them" or "look at their experience". But, we have learned through squillions of posts here on HN, it just isn't enough. There are many charlatans that will slip through that type of interview process -- "great talker / good looking". This is the reason for the three-decade-long "arms race" in the technical interview process where programming problems become harder and harder over time. (Side comment: Does anyone think that people who can solve harder leetcode problems have a higher IQ? Controversially, on balance, I believe it to be true, and, thus, I think programming tests are a great way to filter for higher IQ candidates.)
If IQ were a universally applicable measure, sure, but knowing how to tackle a specific kind of problem (especially one not commonly used, and that you have to specifically train for) isn't the ideal measure either
LeetCode specifically is famously dissimilar to day to day programming, and therefore (probably) a bad measure for Quality
It's like saying that having a degree is better because it proves that you can follow through
It's not necessarily wrong, but it's a tertiary situation that doesn't necessarily correlate (Even if it might, mostly)
> But, we have learned through squillions of posts here on HN, it just isn't enough.
Idk, I can usually tell when someone is an actual engineer vs armchair expert. Actually building stuff requires you to think differently. It's like how someone that just does CAD often fights with machinists because they don't understand the physical limitations.
> There are many charlatans that will slip through that type of interview process
Of course. But that also happens with leetcode style interviews. You can memorize problems and that doesn't reflect your actual job performance. There's a saying "studying to the test"
The real question is the "optimization" question. Considering the time and cost of the interview process, along with the difficulty of removing bad employees, how effective is the interview process? You are optimizing for the best candidate but it's a constrained optimization problem. Otherwise you need infinite resources. So don't ignore the constraints. There's more that I haven't mentioned.
> Does anyone think that people who can solve harder leetcode problems have a higher IQ?
I know people who believe this. But I'm generally uninterested in anyone who has an obsession with IQ. So far it's been a fairly successful filter
I haven't interviewed for a tech job in over 10 years but I've never had a leetcode interview. Never been asked about algorithms, or put on the spot to answer how many manhole covers are there in New York City.
I've been asked about my former projects, my roles, what I liked or didn't like about them, how do I approach a new project, what did I find most interesting, etc.
I gather there are a lot of fakers in the software dev world. So maybe that's why more places try to make you prove you can actually write code.
Reaching for a book to answer, makes sense to me. That's what you'd do on the job, and nobody would think less of you for it.
I've seen that at BigCo, but that's the exception. Every other place strongly prefers a start date of ASAP, with O(weeks) from initial contact as a next-best option. If you state that you aren't available for months you probably won't be hired.
> concern is that firing is so difficult
There are lots of concerns.
Keep in mind, 95% of resumes are some sort of bot/scam, and 0.5% of the rest are actually at the skill level I'm looking for. There are lots of potential explanations, and I don't think it's that only 0.5% of developers are who I'm looking for (there are sampling biases, survivorship bias, and all sorts of things at play in that data), but from my position doing the screening and interviewing those are the stats I see.
1. Suppose you actually did hire everyone who passed a 1hr screen. You'd still have 20+ failed candidates before you found the right person. Even with a 2-week trial period, that's 3/4 of the year not having your projects properly staffed, a demoralizing experience for all their coworkers, and 3/4 of a SWE-year in wages and benefits lost.
2. Is it really fair to hire somebody if I know there's a 95% chance I intend to fire them? What if they have to move? What if they hadn't quit their old job till I accepted them? I suppose if somebody said they were confident in themselves and were willing to risk a trial period I might allow it, but the current set of social expectations is that once you're hired your employer will spend months at a bare minimum trying to make you successful. I'd want to be cautious with that sort of arrangement out of respect for the candidates.
3. Onboarding is even more expensive than it might seem since it sucks away your more senior talent for the training. If the cost of a bad engineer were just the normal day-to-day post-onboarding it wouldn't be _that_ terrible (you still have attrition and other knock-on effects to worry about), but having multiple onboarding sessions for a single hire (because of multiple trial periods) is the most expensive part of the process.
etc
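The "20+ failed candidates" and "3/4 of the year" figures in point 1 can be roughly reproduced with a small Python sketch. It assumes each trial hire works out independently with probability 5% (my reading of the "95% chance" in point 2), that trials run one after another, and a 52-week year. The function names are hypothetical and this is a back-of-envelope model, not the commenter's actual calculation.

```python
def expected_failed_trials(p_success):
    """Expected failures before the first success (geometric distribution)."""
    return (1 - p_success) / p_success


def expected_trial_weeks(p_success, trial_weeks=2):
    """Expected weeks spent on trial periods that end in a failed hire."""
    return expected_failed_trials(p_success) * trial_weeks


if __name__ == "__main__":
    p = 0.05  # assumed chance that a candidate who passed the screen works out
    failures = expected_failed_trials(p)
    weeks = expected_trial_weeks(p)
    print(f"expected failed trials: {failures:.0f}")
    print(f"expected weeks lost: {weeks:.0f} (~{weeks / 52:.2f} of a year)")
```

Under those assumptions the expectation is about 19 failed trials and about 38 weeks, which is in line with the numbers quoted in point 1.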
> General proficiency can be determined in other ways, especially with how prolific GitHub is these days
I agree. Walking through a project with a candidate is one of my favorite interview sessions. They tend to be more comfortable, I tend to learn more, and I get to learn something about their technical communication on top of any coding knowledge.
Not everyone has a GH with anything interesting, so I make other interviews available for everyone, but my life is a little easier if public "proof" (till you talk to the candidate you really have no idea how much they know or who wrote what, but I thankfully haven't seen that problem yet in an interview) exists.
> Suppose you actually did hire everyone who passed a 1hr screen.
Good thing that's not what I suggested.
What I talked about is something people already do in other domains. We're not talking about something theoretical here. There's even other commenters saying that what I said was similar to how they were hired. So again, we're not talking about something theoretical.
This hasn't been my experience at all in medicine and science. I perhaps have more exposure to both science and medicine than most because I have an MD-PhD. Perhaps at the medical student level there is truth to this, but the physicians who are conducting clinical research are often at academic centers where they go specifically for the opportunity to do research. Academic centers almost universally pay much less than private hospitals. In my area, physicians make double salary in private practice over academia.
And this all ignores that the authors are PhD scientists. So I'm confused how this is categorized as "medical field" in the first place. I found that the ability to memorize is essentially useless in PhD level biological science (I studied immunology, so I can't necessarily speak to other fields), and it is all systems level conceptualizing.
I think this is a team with many talented people who came together to do their best. But I'm sure I'm naive. There seems to have been a lot of new interest and debate about what is happening in the glymphatics sphere.
I don't know how neuroscientists fare wrt knowledge of biology, chemistry, etc. that is relevant to their field, but the real problem is when they wade into philosophical waters without the requisite philosophical chops or background to do so [0].
Others can be guilty of similar sins, of course, and since the early 20th century, when philosophy and the classical liberal arts in general evaporated from school curricula, scientists have generally been quite poor at this, despite unwittingly treading into subject matters they are ill-prepared to discuss. Compare how a Schroedinger or a Heisenberg[2] talk about philosophical stuff, and then look at someone like Krauss [3]. The former may not have been great philosophical thinkers, but there is a huge difference in basic philosophical education and awareness, and these are not just isolated cases.
Sure, I agree that, when neuroscientists begin to wade into the realm of consciousness etc., they are wandering into a world they are unequipped to discuss. In my experience with my neurobiology colleagues, they are pretty dialed into their neurocircuits. I do have qualms with their experimental models on the behavioral end as a non-neuroscientist.
To really answer your question, I think I need to talk about the books modern day neuroscientists are writing and I have to say I simply agree. I think these self-help kind of books are not good! Too bad they are so easily propagated in the media.
Are you gate-keeping philosophy from neuroscientists? Are you not a fan of Damasio or Paul and Patricia Churchland? I don't know; I think you are a bit too dismissive here.
This is pretty true for MDs, but I don't know how true it is for PhDs.
The classic meme is that MDs love organic chemistry, but they hate biochemistry [1], because one is about memorization and the other is...less so, anyway.
But then again, neuroscientists do tend to love their big books of disjointed facts, so maybe it's more like medicine than I realize. I remember the one class I took on neuroscience was incredibly frustrating because of the wild extrapolations they were making from limited, low-quality data [2], that made it almost impossible to form a coherent theory of anything.
[1] ...except for the Krebs cycle! Gotta memorize that thing or we'll never be able to fix broken legs!
[2] "ooh, the fMRI on two people turned slightly pink! significant result!"
> Also note that the medical field selects hard for people who can memorize information, to the exclusion of people who can understand systems
It's not impossible for people who are good in memorization to also be good in understanding systems.
Those people, in turn, are the ones doing this research.
Although common, it's not quite so that only people with a pure medical background do neuroscience.
All in all, having met quite a few people in the field, the things you're hinting at never occurred to me as an actual problem. My guess is that's because the people who actually have issues get weeded out very soon. Like: before even finishing their PhD. It's not an easy field.
>It's not impossible for people who are good in memorization to also be good in understanding systems
and it might not be "good at memorization" that's being selected, it might be "conscientiousness", one of the Big Five, and a relatively important parameter.
I don't really believe that. The real issue is that basic science in medicine is hard. You can't test a human in ways that might cause harm, which really limits how much investigation we can do. Ethics and morals also restrict what can be done to animals to investigate the basics on them too, though admittedly a lot of the time things just don't carry over anyway.
That being said, I think the rise of "evidence-based" medicine is also causing issues. It gets used as a cop-out to avoid thinking about the mechanics of what is actually happening in an injury. While this is certainly a good thing for treatments where A or B superiority is uncertain, there are a lot of cases where I think an RCT just doesn't really make sense.
A pet example:
I broke my ankle recently, and thus dug into the literature and common practice. A significant number of people will get end-stage arthritis a few years after "simple" ankle fractures and often the doctors have no idea why. At the same time, an important part of ankle anatomy is often left unfixed (the deltoid ligament) because a few studies back in the 80s found it wasn't necessary to fix it. The bone that serves an equivalent purpose IS fixed (if broken) though. Mechanically, both restrict the ankle joint and prevent it from moving in certain directions.
When presented with biomechanical reasons for fixing it, and concurrent common poor outcomes for some patients, I've seen the response from surgeons thusly - "it's not supported by evidence" presumably because there isn't an RCT demonstrating definitive superiority.
So much of medicine and treatment is literally just hearsay and whatever your surgeon happened to read last week. As a whole the standard is rising, but so much research is so disjoint, disorganised and inconsistent that doctors often have no definitive guidance. It's probably more of a problem in some fields (like ortho) than others, but it's still surprising when you see it yourself.
> So much of medicine and treatment is literally just hearsay and whatever your surgeon happened to read last week.
I think that you can replace "medicine" with "technology" and "surgeon" with "programmer". Something that I don't know about medicine in highly advanced countries: how do surgeons learn about the latest techniques? I assume they subscribe to some key industry/professional journals and/or go to annual conferences to discuss specific techniques. I know that dentists do it because I have asked multiple dentists about it. (In my life, the type of doctor that I visit the most often is a dentist for twice-annual checkups, so I see regular improvements to care and treatment.) Finally, I doubt that most surgeons would agree with your statement.
> Also note that the medical field selects hard for people who can memorize information, to the exclusion of people who can understand systems.
It isn't limited to the medical field. This is quite common in most fields.
I understand testing knowledge and intelligence is an intractable problem, but my main wish is that this would simply be acknowledged. That things like tests are _guidelines_ rather than _answers_. I believe that if we don't acknowledge the fuzziness of our measurements we become overconfident in them and simply perpetuate Goodhart's Law. There's an irony in that to be more accurate, you need to embrace the noise of the system. Noise being due to either limitations in measurements (i.e. not perfectly aligned. All measurements are proxies. This is "measurement uncertainty") or due to the stochastic nature of what you're testing. Rejecting the noise only makes you less accurate, not more.
I think this is also why LLMs score so well on many tests for professions -- much of the learned subject matter is expected later to be regurgitated rather than used in the synthesis of new ideas or the scientific inquiry of mechanisms of action or pathology. If the tests asked questions to measure the latter, I suspect LLMs would fare far less impressively.
Yes, you're fairly spot on. (But I still encourage you to read all this)
I refer to them as "fuzzy databases" (this is a bit more general than transformers too), because they are good at curve fitting. There's a big problem with benchmarks in that most of the models are not falsifiable in their testing. Since what they were trained on is not disclosed, you cannot verify that tasks are "zero-shot"[0]. When you can, they usually don't actually look like it. Another example is looking at the HumanEval dataset[1]. Look at those problems and before searching, ask yourself if you really think they will not be on GitHub prior to May 2020. Then go search. You'll find identical solutions (with comments!) as well as similar ones (a solution is accepted as long as it works).
IME there's a strong correlation between performance and number of samples. You'll also see strong overfitting to things very common.
That said, I wouldn't say LLMs aren't able to perform novel synthesis. Just that it is highly limited. Needing to be quite similar to the data it was trained on, but they __can__ extrapolate and generate things not in the dataset. After all, it is modeling a continuous function. But they are trained to reflect the dataset and then trained to output according to human preference (which obfuscates evaluation).
Additionally, I wouldn't call LLMs useless nor impressive. Even if they're 'just' "a fuzzy database with a built in human language interface", that is still some Sci-Fi shit right there. I find that wildly impressive despite not believing it is a path to AGI. But it is easy to undervalue something when it is highly overvalued or misrepresented by others. But let's not forget how incredible of a feat of engineering this accomplishment is even if we don't consider it intelligent.
(I am an ML researcher and have developed novel transformer variants)
[0] A zero-shot task is one that it was not trained on AND is "out of distribution." The original introduction used an example of classification where the algorithm was trained to do classification of animals and then they looked to see if it could _cluster_ images of animals that were of distinct classes to those in the training set (e.g. train on cats and dogs. Will it recognize that bears and rabbits are different?). Certainly it can't classify them, as there was no label (but classification is discrimination). Current zero-shot tasks include things like training on LAION and then testing on ImageNet. The problem here is that LAION is text + images and that the class of images are a superset (or has significant overlap) with the classes of images in ImageNet (label + image). So the task might be a bit different, but it should not be surprising that a model trained on "Trying for Tench" paired with an image of a man holding a Tench (fish) works when you try to get it to classify a tench (first label in ImageNet). Same goes for "Goldfish Yellow Comet Goldfish For The Pond Pinterest Goldfish Fish And Comet Goldfish" and "Goldfish" (second label in ImageNet).
if something happens comprehensively across fields, it's likely to be a good idea. the idea that one guy who interviewed people "properly" could assemble a team that was better at the job across the board and disrupt that industry, and other such guys across other industries would disrupt those industries, seems a little farfetched.
> if something happens comprehensively across fields, it's likely to be a good idea.
I don't have high confidence that this is likely. I've seen a lot of bad habits happen simply because "that's how X does it." Which often misses a lot of context. Those things matter, and often matter a lot. Not to mention that information is always passed via a game of telephone [0].
This is related to "trust, but verify". If a big player is doing something it is worth looking at to see if it's a good idea for you. Same with when something is popular. But you have to be careful too. It's easy to miss context or small details that make a big difference. As an example, Google has a surplus of high quality candidates. Any arbitrary filter is helpful to them as they just need to down select. So a highly noisy process (i.e. random with a small bias) will yield good results for them. You'll even MEASURE positive results! This isn't necessarily (might be, might not be) true for anyone who isn't big tech or highly bureaucratic. (Same is true for college admissions)
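To make the "surplus of high quality candidates" point concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration: the function name `simulate_hiring`, the normal distributions, and the `signal_weight` knob (how much of the interview score reflects true quality versus noise). It is not a model of any real company's process.

```python
import random
import statistics


def simulate_hiring(pool_mean, signal_weight, n_candidates=10_000, n_hires=100, seed=0):
    """Return the mean TRUE quality of the candidates a noisy filter selects.

    Hypothetical model:
      - true quality of each candidate ~ Normal(pool_mean, 1)
      - the filter only sees score = signal_weight * quality + Normal(0, 1) noise
      - the top `n_hires` candidates by score are hired
    """
    rng = random.Random(seed)
    qualities = [rng.gauss(pool_mean, 1.0) for _ in range(n_candidates)]
    # Pair each noisy score with the true quality it was derived from.
    scored = [(signal_weight * q + rng.gauss(0.0, 1.0), q) for q in qualities]
    scored.sort(reverse=True)  # highest observed score first
    hires = [q for _, q in scored[:n_hires]]
    return statistics.mean(hires)


# Mostly-noise filter (small bias toward quality) on a strong pool vs. an average pool.
strong_pool_noisy = simulate_hiring(pool_mean=2.0, signal_weight=0.1)
average_pool_noisy = simulate_hiring(pool_mean=0.0, signal_weight=0.1)
print(strong_pool_noisy, average_pool_noisy)
```

Under these assumptions the same nearly random filter produces much better hires from the strong pool, and it still measures as "working" because the small bias nudges the hires slightly above the pool average. The quality is coming from the pool, not the filter.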
It's important to remember that big players don't maintain their status simply because no other can out innovate. Rather momentum is a bitch. It can make up for a lack of innovation and still out compete.
I agree that the selection for memorization is high, and I've worked with many neuroscientists who cared more about biological "stamp collecting" than understanding systems.
But in my experience neuroscientists have to have a solid level of systems thinking to succeed in the field. There are too many factors, related disciplines (from physics to sociology), and levels of analysis to be closed off.
Luckily, those two different traits are learnable, so I'd guess as the field advances and matures, this will change?
Honestly 'our knowledge of [X] is largely mechanistic and without a sense of the larger picture' is weirdly applicable to most scientific fields once they escaped the 'natural philosophy' designation.
I doubt it, I used the perceptron as a neuroscience-related example of what happens when we have the right people trying to put the pieces together, not just memorizing.
> Also note that the medical field selects hard for people who can memorize information, to the exclusion of people who can understand systems.
This sounds like one of those complete bullshit memes that certain groups of people like to repeat. Very similar to tech people being "creatives" while other groups like sales are somehow not. Utter bullshit.
> Compare to the invention of the perceptron, which took a joint effort between a polymathic neurophysiologist and a logician.
While cross-field collaboration often yields the best insights, I hope you're not implying that computer scientists are somehow better at "understanding systems" compared to biologists. Not only are computer scientists hugely guilty of pretending that various neural networks are anything at all like the brain (they are not), it's also the case that biological systems are fantastically more complicated than any computing system.
> Also note that the medical field selects hard for people who can memorize information
Easy to agree with
> to the exclusion of people who can understand systems
On what basis do you draw this conclusion? I'm not saying the field is full of systems thinkers, but I have no evidence that they are at higher or lower concentration than many other skilled disciplines. Many of the specialties within medicine require systems thinking to be effective physicians.
I work with scientists in many disciplines and neuroscientists are the absolute worst when it comes to hyping their work. Neuroscience, especially the CNS subfield, is complex and still in its infancy compared to other disciplines. The field’s unknowns create space for strong, ego-driven personalities to claim certainties where none exist and hype their work. The field itself perpetuates this problem by lionizing specific labs or people (e.g., The Allen Institute’s Next Generation Leaders) instead of viewing progress as a group effort sustained over years and decades.
Origin of life doesn't make it into the NY Times health column and get parroted by your grandma a week later when she wants to know if taking a nap will give you cancer.
Neuroscience is in the same quadrant of the knowledge / hype plot as nutrition science.
A lot of the things we sort of know are also related to studies that look at very small subsystems, attempting to isolate variables. Like take a slice of neurons, apply a certain chemical, and check how it changes action potentials. Over time a bunch of that kind of data can be pieced together in larger systems analysis. That kind of thing relies on extrapolation from that lower-order data though, ideally with confirming studies from subject animals, but the data is really clean. Media reporting on research is usually bad too, usually taking whatever speculative impact the research might have that is suggested for funding or future work ... but that wasn't actually a result of the paper, just something tacked on as basically informed speculation.
I learned this after being diagnosed with epilepsy. It became clear quickly that we know very little about how the brain works. Almost all of the medical advice and prediction was based on observed behaviors in the population, nothing specifically about my brain.
>What seems like a ton of consensus at cruising altitude is actually much more divisive as you approach ground level.
I think there's a sense in which that's true (I've especially heard it with respect to the foundations of maths), but I worry about that way of thinking. There absolutely are places where we have consensus, even on subjects of extreme complexity. And the fact that we really do have consensus can be one of the things that's most important to understand. I don't want people doubting our knowledge that, say, too much sugar is bad, that sunscreen is good, that vaccines are real and so on.
A lot of what passes for nuanced decoding of the social and institutional contexts where science really happens, looks to outsiders like "yeah, so everything's fake!"
And when the job of communicating these nuances falls into the hands of people who don't think it's important to draw that distinction, I think that contributes to an erroneous loss of faith in institutional knowledge.
Another nuance that most people don't understand is that there are different levels of "badness."
There's a difference between "cigarettes cause cancer" and "phones cause cancer". The former is very definitely true, confirmed by many studies, and the health impact is very significant. The latter is probably untrue (there are studies that go both ways, but the vast majority say "no cancer"). Even if there's any impact, it's extremely minimal when compared to cigarettes.
People can't distinguish between those two levels of "causes cancer" in a headline.
> When you talk to neuroscientists and researchers in private you often find that they are far less confident than public personas, PR, articles, or science reporting, make them sound.
> they are far less confident than public personas
Science requires, at its core, falsifiability. Just a little education on the philosophy of science is enough to rid most scientists of bravado; to make them wince at words like "fact" and "prove" in scientific contexts.
I imagine this has an impact on personality as well, in the long run.
Yeah this is very common. You might see the headline "Scientists prove X causes Y", and when you click through all the pop-science journalism until you get to the paper, you'll find "We found a weak positive correlation between X and Y and it's surprising because the prior research found the opposite".
>It's not just optimistic - it's qualitatively unjustified to think that neuroscience (in its current form, at least) is inevitably capable of cracking consciousness.
The fact that you had to add the parenthetical here to hedge your bet demonstrates that you don't even entirely believe your own claims.
That claim has a very robust history in philosophy of mind. Peter Hacker and M.R. Bennett, a philosopher and a neuroscientist respectively, cowrote Philosophical Foundations of Neuroscience[0]. There was also a fascinating response and discussion in a further book with Daniel Dennett and John Searle called Neuroscience and Philosophy[1]. Both books are excellent and have fascinating arguments and counter-arguments; you get very clear pictures of fundamentally different pictures of the human mind and the role and idea of neuroscience.
Not an axiom, just a prior with enough evidence to smash the probability of ghosts quite near to zero. You're welcome to pretend that I said "zero" and continue shadow-boxing a straw-man, but if you want to fight my actual argument you need to contend with "near to zero."
The difference between Woo of the Gaps and Science of the Gaps is that science is on the advance and woo is on the retreat, it has been this way for centuries, and the pace always seems to be determined exactly by the rate at which science advances rather than any actual opposition from the Woo camp. Nothing is over until it's over, but how much do you actually want to bet on a glorious turnaround? You do you, but for me the answer is "not much."
The rate of an army's advance through a particular town is 0 until they get there, but if you were to see the front lines moving towards you and put forward this argument as a reason to stay, you would be in for a rude surprise.
What argument? I don't see an argument, I just see [deleted].
If you mean this...
This is why I think strict materialism on consciousness is misguided. People like to think "we've cracked everything scientifically, from quantum physics to neuroscience, so even if we don't have a good explanation for consciousness now, we'll get there." Except the reality is macroscopic neuroscientific findings are incredibly coarse and with many caveats and uncertainties, statements more like "this area of the brain is associated with X" than "this area of the brain causes X." It's not just optimistic - it's qualitatively unjustified to think that neuroscience (in its current form, at least) is inevitably capable of cracking consciousness.
Many STEM people hate this because they want to axiomatically believe materialist science can reach everything, despite the evidence to the contrary. shrug
... that wasn't an argument, it was a loosely formed set of vague and unsubstantiated claims. The fact that you immediately deleted it and started insulting anyone who responded to you kind of proves my point. I'm sorry I wasted my time on you, won't happen again.
I was going to help support your argument, but I can’t because you started throwing a fit and deleted everything. My 18-month-old has calmer manners. You can actually delete your comment (instead of editing to [deleted]) and it kills the entire thread, you know.
While I agree in general, I think you overstate things here:
> Many STEM people hate this because they want to axiomatically believe materialist science can reach everything, despite the evidence to the contrary.
Do we have actual evidence that it can't reach everything? That would be "evidence to the contrary". What you have given is evidence of its inability to reach everything so far, in its current form. That's still not nothing - the pure materialists are committed to that position because of their philosophical starting point, not because of empirical evidence, and you show that that's the case. But so far as I know, there is no current evidence that they could never reach that goal.
[Edit to reply, since I'm rate limited: No, sauce for the goose is sauce for the gander. The materialists don't get the freebie, and neither do you. In fact, I was agreeing with you about you pointing out that the materialists were claiming an undeserved freebie. But you don't get the freebie, for the same reason that they don't.]
Science and philosophy as they currently stand have yet to settle on a single, universally agreed-upon definition of "consciousness"; last I heard there were about 40 different definitions, some of which are so poor that tape recorders would pass.
The philosophical definitions also sometimes preclude any human from being able to meet the standard, e.g. by requiring the ability to solve the halting problem.
Without knowing which thing you mean, we can't confidently say which arrangements of matter are or are not conscious; but we can still be at least moderately confident (for most definitions) that it's something material because various material things can change our consciousness. LSD, for example.
>because various material things can change our consciousness. LSD, for example.
I feel really encouraged here, because I think this has surfaced recently (to my awareness at least) as a good example of material impacts on conscious states that seems to get through to everybody.
Right, you can cite, say, lobotomies, concussions etc all day long but I think eyes glaze over and it hinges on the examples you choose.
I think the one about drugs is helpful because it speaks to the special things the mind does, the kind of romanticized essentialism that's sometimes attributed to consciousness, in virtue of which it supposedly is beyond the reach of any physicalist accounting or explanation.
A slightly-less-than-perfect analogy: I can mess with the execution of software by mis-adjusting the power supply far enough. It still runs, but it starts having weird errors. Based on that, would we say that software is electrical?
Is software electrical? It certainly runs on electrical hardware. And yet, it seems absurdly reductionist to say that software is electrical. It's missing all the ways in which software is not like hardware.
Is consciousness similar? It runs on physical (chemical) hardware. But is it itself physical or chemical? Or is that too reductionist a view?
(Note that there is no claim that software is "woo" or "spirit" or anything like that. It's not just hardware, though.)
Humans being unable to figure out how inanimate matter gives rise to consciousness is not evidence that "strict materialism on consciousness is misguided". Or is there some other evidence I'm unaware of?
> When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."
> Please don't fulminate. Please don't sneer, including at the rest of the community.
People lash out at anyone who says "aha! this is evidence against materialism" in the usual case where materialism predicts exactly the same observation. There are only a few areas where common materialist and dualist models diverge: "brains are complex and hard to understand" is not one of them.
I don't really care if I believe you or not, you deleted your comment so I can't even see what you're referring to, but getting into unproductive arguments on the internet is just gonna make you miserable.
> The new paper used many of the techniques incorrectly, says Nedergaard, who says she plans to elaborate on her critiques in her submission to Nature Neuroscience. Injecting straight into the brain, for example, requires more control animals than Franks and his colleagues used, to check for glial scarring and to verify that the amount of dye being injected actually reaches the tissue, she says. The cannula should have been clamped for 30 minutes after fluid injection to ensure there was no backflow, she adds, and the animals in the sleep groups are a model of sleep recovery following five hours of sleep deprivation, not natural sleep—a difference she calls “misleading.”
> “They are unaware of so many basic flaws in the experimental setup that they have,” she says.
> More broadly, measurements taken within the brain cannot demonstrate brain clearance, Nedergaard says. “The idea is, if you have a garbage can and you move it from your kitchen to your garage, you don’t get clean.”
> There are no glymphatic pathways, Nedergaard says, that carry fluid from the injection site deep in the brain to the frontal cortex where the optical measurements occurred. White-matter tracts likely separate the two regions, she adds. “Why would waste go that way?”
That part stuck out to me as well. However, I wonder if that would be as conclusive as it seems. Even if waste removal is faster while awake, waste creation may be slower. Part of the purpose of sleep and getting tired could be that waste concentration hits some threshold, and the body says “it’s time to stop creating so much metabolic waste”.
>>researchers have challenged parts of this picture, however; a 2024 study, for example, suggested waste clearance is actually faster during waking than during sleep
>That’s a pretty big ambiguity in the story!
No, it's not: "waste clearance faster during waking than sleep" does not mean it's adequate to the job, and waste clearance at night could still be critically important. We also do not know comprehensively what the waste consists of, and having a specific sleep system implies it's doing something.
Maybe I'm misunderstanding, but isn't that quote referring to the glymphatic clearance found in 2012 and not the main topic highlighted: fluid clearance via blood vessel contraction?
Private market valuations are funky and hard to reason about. In the public markets, valuation represents the current equilibrium of supply and demand - very roughly an average of a large number of opinions.
In private markets, especially with the recent trend of selling a tiny portion of the company at a massive price, the valuation represents something much closer to the maximum that any investor in the world thinks the company is valued at.
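A toy way to see the average-versus-maximum distinction, as a sketch only: the lognormal spread of opinions, the `fair_value` and `disagreement` parameters, and the function name are all made up for illustration, and a plain mean is a very crude stand-in for a public-market equilibrium.

```python
import random
import statistics


def public_vs_private_marks(n_investors=1000, fair_value=10.0, disagreement=0.5, seed=0):
    """Compare two ways of turning many investor opinions into one 'valuation'.

    Hypothetical model: each investor's opinion is fair_value scaled by a
    lognormal factor, so opinions are positive and skewed to the upside.
    """
    rng = random.Random(seed)
    opinions = [fair_value * rng.lognormvariate(0.0, disagreement) for _ in range(n_investors)]
    public_mark = statistics.mean(opinions)  # rough proxy: many opinions averaged
    private_mark = max(opinions)             # the single most optimistic buyer sets the price
    return public_mark, private_mark


public_mark, private_mark = public_vs_private_marks()
print(public_mark, private_mark)
```

The more investors disagree (larger `disagreement`), the further the most optimistic opinion sits above the average, which is the gap the comment above is describing when only a tiny slice of the company is sold.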
Especially when BARTERING a tiny portion of the company for resources from another company - that massively benefits a growth area.
Amazon can "buy" $2B worth of Anthropic to guarantee $2B of spending on AWS - to report that as growth under AWS in their earnings - to juice their stock price.
They also get to report that their investment in the previous round is up massively.
Given the current valuation of Tesla, it doesn't look very different from private market valuations. If anything private valuations seem more sane than the public market to me.
> the valuation represents something much closer to the maximum that any investor in the world thinks the company is valued at
This is all before accounting for the preference stack, which makes multiplying a Series F per-share price (itself derived from dividing compute time by some magic number) by employee common stock a bit silly.
I doubt their preferences are >1x since they've always had high demand. In that case, the preference stack would just be the total raised over time (~$10B).
> A lot of the value I personally get out of chat-driven programming is I reach a point in the day when I know what needs to be written, I can describe it, but I don’t have the energy to create a new file, start typing, then start looking up the libraries I need... LLMs perform that service for me in programming. They give me a first draft, with some good ideas, with several of the dependencies I need, and often some mistakes. Often, I find fixing those mistakes is a lot easier than starting from scratch.
This to me is the biggest advantage of LLMs. They dramatically reduce the activation energy of doing something you are unfamiliar with. Much in the way that you're a lot more likely to try kitesurfing if you are at the beach standing next to a kitesurfing instructor.
While LLMs may not yet have human-level depth, it's clear that they already have vastly superhuman breadth. You can argue about the current level of expertise (does it have undergrad knowledge in every field? PhD level knowledge in every field?) but you can't argue about the breadth of fields, nor that the level of expertise improves every year.
My guess is that the programmers who find LLMs useful are people who do a lot of different kinds of programming every week (and thus are constantly going from incompetent to competent in things that other people already know), rather than domain experts who do the same kind of narrow and specialized work every day.
I think your biggest takeaway should be that the person writing the blog post is extremely well versed in programming and has labored over code for hours, along with writing tests, debugging, etc. He knows what he would like because it's second nature. He was able to get the best from the LLM because his vision of what the code should look like helped craft a solid prompt.
People newer to programming might not have as good of a time because they may skip actually learning some fundamentals and rely on LLMs as a crutch. Nothing wrong with that, I suppose, but there might be a point when everything goes up in smoke and the LLM is out of answers.
My experience is opposite - I get the most value out of LLMs for topics that I have less expertise in. It’s become vastly easier to get up to speed in a new field because you can immediately answer basic questions, have the holes in your understanding pointed out, and be directed to the concepts you are missing.
Great point, and agree that generalization makes for the clearest wins from abstraction.
But there are also cases where there is no generalization, but the encapsulation / detail hiding is worthwhile. If you have a big function and in the middle of it you need to sort some numbers, you would probably implement a Sort routine to make the control flow much easier to understand - even if you only use the function once (let’s pretend there’s no sort functionality in the standard library).
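A rough sketch of what I mean, in Python. The names `sort_numbers` and `summarize_readings` are made up for illustration, and insertion sort is just a stand-in for whatever sort you would write if the language had none built in:

```python
def sort_numbers(values):
    """Return a new list in ascending order.

    Hypothetical helper: insertion sort, standing in for a sort routine
    we pretend the standard library does not provide.
    """
    result = list(values)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot right to make room for `current`.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def summarize_readings(readings):
    """The 'big function': its control flow stays readable because the
    sorting details live behind a single call."""
    if not readings:
        raise ValueError("no readings")

    ordered = sort_numbers(readings)  # the only call site

    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2

    return {"min": ordered[0], "median": median, "max": ordered[-1]}
```

There is no reuse here at all: `sort_numbers` has exactly one caller. The payoff is only that someone reading `summarize_readings` does not have to step through a nested loop to follow what the function is doing.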
Curious if others agree, and what heuristics you use to decide when implementation encapsulation is worthwhile.
Seems completely nonsensical. Yes, neural networks themselves are not unit testable, modular, symbolic or verifiable. That’s why we have them produce code artifacts - which possess all those traits and can be reviewed by both humans and other machines. It’s completely analogous to human software engineers, who are unfortunately black boxes as well.
More broadly, I’ve learned to attach 0 credence to any conceptual argument that an approach will not lead somewhere interesting. The hit rate on these negative theories is atrocious, they are often motivated by impure reasons, and the downside is very asymmetric (who cares if you sidestep a boring path? yet how brutal is it to miss an easy and powerful solution?)
So what have you gained in the process, other than wasting significantly higher amounts of energy in the form of heat and other emissions? It is nothing like software engineering; clearly you speak out of ignorance.
And then you say you attach 0 credence whatever, but you give no reasons for why others should buy your points. You don't really seem to have much of a point, anyway.
My argument: the theoretical limitations of NNs (lack of modularity, symbolic reasoning, verifiability) cause no practical problems to usefulness - we can just analyze the code artifacts as we do with human programmers. Do you disagree?
Yes. These limitations are not theoretical at all. The author touches on compositionality --- how a problem/program can be decomposed into smaller, orthogonal problems/programs, reasoned about and tested separately, and then abstracted away in an interface that hides the implementation details. This is the essence of programming and software engineering at large, whether you're programming in assembly, Java, or Haskell: to divide and conquer so that we can fit an isolated aspect of the program in brain cache and reason about it. This is a fundamental limitation and will not change until the year 40,000 when we have Space Marines.
A neural network, conversely, is a big ball of mud. Impossible to reason about and to test except for whole-system, end-to-end testing, which is impossible to do exhaustively because of the size of the state space. It is, by design, unexplainable and untestable, and therefore unreliable. It's why you use globals in C only judiciously. (I am just rephrasing the article here, not saying anything new.)
And the evidence that it causes practical problems to usefulness is already out there; "hallucinations" are simply errors, just that corporate PR likes to pretend that it's a "feature" and not a bug. This is delusional. A society seeking digitalization should run away from this level of stupidity.