Hacker News
StackOverflow petition to allow removing AI generated content (mousetail.nl)
127 points by miohtama on June 2, 2023 | 114 comments



Answers are either good or they're not.

It doesn't matter if they're generated by a 13-year-old in their bedroom, someone studying CS at university, a well-respected IC at a top tech company... or an AI.

If answers are good, keep them. If they're bad, downvote them. If they're redundant or off-topic or gibberish, delete them.

And to those asking why you would ever want AI-generated content on StackOverflow when you could just go to ChatGPT/Copilot/etc... it's because of all of the commentary. People are giving different answers, arguing the pros and cons, pointing out errors... all of the discussion around a StackOverflow question is usually just as valuable as any given answer, if not more so.

Of course, I understand that AI-powered accounts that can ask/answer hundreds of questions a minute are a problem simply because moderation can't keep up with them. But StackOverflow already has a lot of limits just for human accounts -- lots of actions you can't take until you've contributed certain amounts of value. Just extend these protections to do things like rate-limiting and so forth, to ensure that the normal ratio of content submitted vs. moderated stays constant and manageable.


This doesn't work at scale. Stack Overflow as a platform has been handling user-generated input via moderators, voting, and testing. This is fine when there are only 26.8 million coders on the planet, most of whom aren't posting on Stack Overflow regularly. With LLMs, all of a sudden there is a huge influx of mediocre content on the platform that people can't handle. Inevitably this will erode trust in the platform. When someone posts an answer I assume they actually ran the code and can verify the result. LLMs can spit out seemingly correct code that just doesn't work.


> This doesn't work at scale.

See also: the Clarkesworld saga of them being bombarded with mediocre AI-generated short stories. Filtering out bad submissions has always come with the territory, but they're suddenly drowning in them with the advent of LLMs, which make it trivial to churn out vaguely story-shaped text on an industrial scale. The generated content isn't good by any measure, but it's "good enough" to pass the smell test and waste a curator's time before they realise it has zero actual merit, and there's so much of it that it becomes a Sisyphean task to sort through.

Likewise with image generation, it's now incredibly easy to churn out images that look like something a person might make to express themselves, which are actually just a loosely guided slice through a statistical model of pre-existing images, passing the smell test for "good art" despite having zero actual intent or substance. It's spam, but for culture.


This seems like a problem you can only solve with an invite or credential system. If you are an invited writer (or have some sort of literary degree) you can submit content, otherwise you gotta let people invite you. AI content is still allowed, and if you post garbage you lose your posting privileges.


Has that observably worked for Citizendium or lobste.rs, which have tried exactly that for years, though? Have they been widely recognized as superior to Wikipedia and Hacker News? Have they in fact been widely recognized?

If your answer is that Wikipedia and Hacker News still get the recognition and haven't collapsed, then I suggest those are already examples to learn from, showing that the same idea won't work for Stack Exchange.


>>If answers are good, keep them. If they're bad, downvote them.

>This doesn't work at scale... with LLMs all of a sudden there is a huge influx of mediocre content

The GP's answer may not work at scale - however LLM detection doesn't work at all. So the only semi-workable solution is aggressive filtering and banning users who post trash (LLM or not).

Also, there's a need to think about score and trust mechanisms - the same mechanisms that can be used for filtering also provide an incentive for LLM use. Is there a way to avoid that?

>When someone posts an answer I assume they actually ran the code and can verify the result

I wish we lived in a world where this assumption wasn't naive.


Yep, and if you aggressively ban bots/LLM content, then you'll see everyone accuse and report each other for said content even if it's good content.

For example, here on HN we have a rule that if you see bot content you don't mention it in the thread. You report it and let the admins decide. Anything else just turns into flamewars.


And there's the problem on SO. Previously, we could do exactly that - Flag the content for a Mod to review. Now Mods are pretty much prevented from taking any action when we (the community members) and they believe it is a bot.

I saw one user yesterday post 10 lengthy, detailed answers in an hour, in 3 different programming languages. But the Mods aren't allowed by SE to consider that (or pretty much anything) to be an indicator that it's AI-generated.


Again, you can handle this by rate-limiting and standard anti-abuse measures. To elaborate: don't allow new-ish accounts to post more than one question/answer per day, don't allow accounts to post more than one question/answer per week/month if their previous content hasn't reached a certain quality threshold of votes, and so forth.

It's entirely possible to set up the system to prevent it from being flooded with content that moderation can't handle. In fact, StackOverflow has already been largely set up that way, and this will require just a little more tweaking of the kinds of policies that have already been in place for a long time. People attempting to flood internet forums with low-quality content or outright spam isn't anything new.
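To make the idea concrete, here's a rough sketch of what such a throttle might look like (purely illustrative; the names and thresholds are made up and this is not SO's actual mechanism):

    # Hypothetical sketch: throttle posting based on account age and the
    # vote score of previous contributions. Not SO's actual system.
    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class Account:
        created_at: datetime
        avg_answer_score: float   # mean vote score of prior answers
        posts_today: int

    def daily_post_limit(acct: Account) -> int:
        if datetime.utcnow() - acct.created_at < timedelta(days=30):
            return 1              # new-ish accounts: one question/answer per day
        if acct.avg_answer_score < 1.0:
            return 1              # prior content below a quality threshold: stay throttled
        return 30                 # established, well-received accounts

    def may_post(acct: Account) -> bool:
        return acct.posts_today < daily_post_limit(acct)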


This works in theory, if people abide by it.

However, in practice this sort of approach would likely mean people who don't have anything invested (especially new users) would create multiple accounts to be able to post multiple times.

Rate limiting only works well if there's a stickiness that makes changing accounts more difficult than waiting out the rate limit.

---

While Stack Overflow was set up to handle moderation, the culture evolved into one that disdained any appearance of gatekeeping, insisted on preserving any attempt at an answer, and treated moderation and curation actions on a post as personal attacks on the individual who wrote it.

As tooling was taken away from community moderation and curation it became harder and harder to maintain quality. Additionally, the 90-9-1 rule (aka the Rule of Participation Inequality - https://www.grazitti.com/blog/the-90-9-1-rule-is-over-its-ti... ) applied to the people doing moderation and curation means that once the site scales above a certain point it becomes impractical, if not impossible, to curate all of the incoming material.

A little more tweaking may have been possible a decade ago. However, both the culture of people asking questions and the corporate "engagement first" approach have left anyone trying to curate the material fighting against the tide.

There are 3.3k questions that have had a close vote cast that need more people to review them. There have been only 313 reviews today as I write this ( https://stackoverflow.com/review/close/stats ). And that's ignoring the countless thousands of reviews that have timed out.

    year close tasks
    2016     581,204 https://meta.stackoverflow.com/q/340815
    2017     .......
    2018     440,336 https://meta.stackoverflow.com/q/378415
    2019     318,431 https://meta.stackoverflow.com/q/392550
    2020     225,745 https://meta.stackoverflow.com/q/404558
    2021     213,104 https://meta.stackoverflow.com/q/415250
    2022      96,495 https://meta.stackoverflow.com/q/422885
A trend in community moderation is clearly visible, and it has likely gone too far to be corrected with tweaking.


Exactly, it’s not that useful if the answer I’m looking for exists on the platform but I can’t find it because of the signal vs. noise ratio. To me, usually, the context of the answer is more important than the answer text itself.


How are they going to check for LLM usage?

I think it's way more likely that poor answers won't mention the usage of LLMs to generate the answer, while good answers aided by LLMs will more often mention it.

Punishing honesty just seems incredibly counterproductive.

Automatic detection is downright dystopian... being censored by an algorithm because it mistook my effort and work for an LLM's output.


Agree with the middle part - At the moment, the policy implemented by corporate is "Don't ask; don't tell". If someone says they used GPT or other AI for their answer, it's disallowed. If they try to hide the fact, there's not much the community can do to get it removed.

And while I'm not a moderator, as just a user I've flagged over 1,200 answers on Stack Overflow (and several of the smaller communities like Ask Ubuntu) that were subsequently removed. Automatic detection was never the sole criterion used to determine if it was AI - it's entirely possible to spot GPT content using multiple methods. I don't publicly talk about most of these, since we do have a group of users (sometimes spammers) who attempt to hide their use and make it more difficult to detect. See some of my additional notes on the topic at https://meta.stackexchange.com/a/389674/902710


> This doesn't work at scale.

Sounds like a job for AI


> If answers are good, keep them. If they're bad, downvote them. If they're redundant or off-topic or gibberish, delete them.

Yeah, let's keep providing free labor to help train somebody else's models and improve somebody else's infrastructure so that they can dominate even more effectively.


I think you missed the fact that you are mainly doing this to help other users find relevant answers, on a free website.

Or maybe you would want to pay for a version of Stack Overflow that is curated only by employed people?

But I guess you would lose much content.


As you say, Stack Overflow is free to use. ChatGPT isn't, and its scraped web training corpus is not available.

So it's not symmetric.


How is that any different than providing free labor to help someone learn how to build the next [X] so that they can more effectively dominate?

But more importantly banning AI generated content from Stack Overflow doesn’t solve the “problem” you’re describing.


> How is that any different than providing free labor to help someone learn how to build the next [X] so that they can more effectively dominate?

It's an absurd equivalence. A human can never "dominate" no matter how much they learn.

Training, for free, the infrastructure of a closed, for-profit private enterprise that will eventually make your preferred free collaboration platform obsolete is the height of naivete (or vested interest).


All content on Stack Overflow is licensed under Creative Commons CC BY-SA. So, in theory, the community could create a new Q&A website and import all the questions and answers from Stack Overflow. But that doesn't overcome the network effect, I admit.


It's not really an equivalence. If you remove the sensationalism, it's just a point that the free work is already being provided.

Put more succinctly I'm making two points: 1) The free work is already being done. 2) StackOverflow has been crawl-able since forever.

You'd have to address both of those problems before the concern that you're raising becomes relevant. Banning the posting of new generated content doesn't address either of those things.


Ok, it makes more sense now. But you need to consider the significant incremental benefit to the model training process when creating labels exactly where the model underperforms.

The baseline you mention is already an issue that is being debated hotly.

My comment was that humans reviewing and fixing new AI content is exactly not what should be done in the current circumstances.


>A human can never "dominate" no matter how much they learn.

Somewhere an autocratic leader is laughing at you.


If you as an answerer use an LLM to generate content, then verify and vet it yourself based on your own knowledge before posting, I'd think it's fine.

But spamming thousands of answers an hour automatically and wanting the community to do all the work is just not sustainable I feel. It'll also kill the sense of community if half the actors are bots.


Agreed - That's the basis of my "responsible use of AI on SO" post at https://meta.stackexchange.com/a/389675/902710


> If answers are good, keep them. If they're bad, downvote them. If they're redundant or off-topic or gibberish, delete them.

In practice this isn't possible. Lots of accepted answers are bad. Often for subtle reasons! An answer with a SQL injection vulnerability might get plenty of upvotes and be accepted, but it's objectively bad (even if it answers the question).
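For a concrete (made-up) illustration: the first snippet below "answers the question" and would probably collect upvotes, but it's injectable; the parameterized version is what it should look like.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

    def find_user_bad(name):
        # Looks fine and returns the right rows for normal input, but
        # string interpolation makes it vulnerable to SQL injection.
        return conn.execute(
            "SELECT email FROM users WHERE name = '%s'" % name
        ).fetchall()

    def find_user_good(name):
        # Parameterized query: the driver handles escaping safely.
        return conn.execute(
            "SELECT email FROM users WHERE name = ?", (name,)
        ).fetchall()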

The problem is that there is no AI that's accurately fixing answers. AI only generates mediocre answers, it doesn't have the capacity to moderate mediocre answers. Humans simply can't keep up, or don't have the acumen to pick up on the subtle inaccuracies in accepted-but-bad answers (after all, that's often why they're looking for the answer).

Even with protections like rate limiting, I'm not sure you could prevent the majority of the damage that crappy AI can accomplish. Simply paying (pennies) for proxy servers with residential IPs gets around much of that, anyway.


haha. no.

answers are not either good or not. it's not binary. there is a lot of nuance and sometimes the questions and followups contain details that can take the answers in a completely new direction.

just telling if an answer is good or not is a lot of work. sometimes it requires an expert to figure it out. when the answers are given by a human and it's a good faith effort both people in the loop benefit from it. when the BS generation is automated there is zero incentive for a human being to even look and correct the answers. what is the incentive to do so?


> when the BS generation is automated there is zero incentive for a human being to even look and correct the answers. what is the incentive to do so?

You hit a good point here. If users can't be bothered to put time, effort, energy, and thought into answering a question, why should the readers and correctors do so?


The volume is the real problem, as you mentioned. People are worried about AI content because it’s hard to even review it or accurately detect it. I can’t remember the last time I looked at my email spam folder, even though I’m sure there are some useful messages in there, mostly because it would take too much time to go through it. If the inbox starts being filled with AI content that isn’t quite spam, it might start feeling like a chore to go through too, again mostly because of the volume. We will have to see if the existing mechanics of unsubscribe, report, downvote, etc. will be sufficient to hold back the tsunami.

My guess is the spam detection arms race will evolve into a content moderation arms race and we will end up with advanced AIs filtering and moderating content for us, to varying degrees of success and things like email, forums, and Q&A sites will become increasingly hands off.


Agreed. I have found myself sometimes asking ChatGPT for guidance related to obscure error messages, and occasionally it's more useful than Google or Stack Overflow.


When it doesn't hallucinate - which, in my case, it does most of the time.


Sure, but if you try it out, you pretty quickly realize it's a hallucination. Unfortunately the type of GPT content we're now getting on Stack Overflow and its sibling sites is mostly unvalidated GPT hallucinations.


I generally agree if the content can be kept clean. Certainly, human users will have more to contend with.

There are just so many interesting questions by having chatbots interact with StackOverflow:

What happens when AI-powered accounts are allowed to vote?

What kind of questions might AI-powered accounts be asking or be interested in?

How will the questions/answers/responses change between model revisions?

You can imagine things spiraling out of control. Perhaps there needs to be a chatbot version of StackOverflow to service all the questions that interest chatbots. :P


I guess the premise is that it is easier to identify AI-generated content, with a low hit rate (?) of correctness, than to identify correct / incorrect answers based on merit alone.

Also StackOverflow is kind of gamified (a rare case of it working IMHO) and the rules of the game don't work so well when kind of good looking content is easy to generate. SE answers are hard to verify but easy to write. If writing becomes too easy, it is a recipe for spam - as has indeed happened.


Before jumping to conclusions, be sure to check out this context:

https://meta.stackexchange.com/questions/389582/what-is-the-...

TL;DR - the policy is "we can't tell if something is AI generated, so it's hard to justify removal on just that basis."

> We recently performed a set of analyses on the current approach to AI-generated content moderation. The conclusions of these analyses strongly indicate to us that AI-generated content is not being properly identified across the network, and that the potential for false-positives is very high. Through no fault of moderators' own, we also suspect that there have been biases for or against residents of specific countries as a potential result of the heuristics being applied to these posts. Finally, internal evidence strongly suggests that the overapplication of suspensions for AI-generated content may be turning away a large number of legitimate contributors to the site.


This follows from the recent case where Texas A&M students were falsely accused of using ChatGPT and subsequently failed, which caused a ton of negative publicity: https://www.rollingstone.com/culture/culture-features/texas-...


I would've thought the problem is bots that are mass solving instead of human + ChatGPT.

If a bot mass solves you can identify it by posting frequency and quality over time.


SE has not provided the data here, and what they communicated is a mess. Some of their main arguments are rather dubious and didn't convince the mods.

Keep in mind that moderators have more than just the pure content of the answers available. This is usually not about a single post, but many of them.


I don't understand why you would, for the foreseeable future, want to allow AI-generated content on SO. Not with the current state of the art in generative AI.

If I want an AI-generated answer, with all the pros and cons specific to LLMs, I'll just open ChatGPT or turn on Copilot... SO answers are (or used to be) in an entirely different league in terms of trustworthiness.

I'm totally on board with this moderator strike.

Edit: it appears there is a very high false positive rate, making it difficult to distinguish between some answers. So, there's that to consider...


If someone has already put in the right prompt to get the best answer, you will now not have to do that.

Also, SO isn't just about getting the right answer, it's also about finding the pros and cons of each proposed solution.


There could still be value: if ChatGPT is right 60% of the time, there is value in someone filtering that down to the correct answers.


If the code works just fine, why remove it? For small algorithms it should be good enough.


"Code working" isn't necessarily black-and-white. For a new user (the one asking the question), the code may appear to solve the problem, but may have corner-cases or even security risks. That's entirely possible with user-generated code as well, of course, but GPT/AI allows it to be produced at a much higher rate, with the person who posted the answer often not being capable of (or not caring to) validate or correct it.


Yes, and SO already has plenty of sample code that appears to solve the problem but has huge flaws.


What's the motivation for Stack Overflow to allow AI-generated content in the first place?

These AI models were likely heavily trained on SO data, so any LLM-based answer is merely a regurgitation of thousands of human answers before it.

In addition, asking a question on Stack Overflow and having some LLM respond seems to me like the equivalent of asking GPT-X.Y directly, albeit with extra steps.


There is a possibility that everything we output is merely a regurgitation of thousands of human answers, ideas, and thoughts we have encountered before. Including this comment of yours. And mine.


There's this new trend of what I'll dub techno-nihilism, which is essentially a counterargument to the stochastic parrot argument. The former being: well what if WE are stochastic parrots, after all that's how we learn, right? Well yes, but actually no.

It's trivially false because ChatGPT was trained on something (in this case, Stack Overflow), which, in turn, was trained on something else (maybe a book), and so on. So knowledge, imagination, and genuine creativity must exist somewhere down that chain. Everything can't just be repeating what was learned prior ad infinitum, or we'd have nothing new. Ironically, even the development of large language models is an exercise in creativity.


But there is also this idea that you can't judge the system while being a part of it. You simply can't see the whole picture.


True, but like solipsism or nihilism, those ideas are just not that interesting.


Oh, it's not an "interesting idea" that we're teetering on the edge of not being able to differentiate between machine-generated and human-generated content. It does seem that our AI learning models simply mimic our own thinking processes, absorbing and combining experiences to create results that sometimes outshine their origins.

The real question that'll soon dominate is, "How can we even tell the difference?"*

(* - Reworded with ChatGPT 4)


At the same time, I think you may be overvaluing new answers and undervaluing the importance of reiterating known answers in a form the user asking the question understands.

For example, there is no intrinsic value in something new. If I take a new solution to a problem and lock it in a box, it has zero value. It is not improving anything.

Now, if I take a solution and present it to you in a manner that you can understand, that has an inherent value to the end user.

By this analogy, claiming that LLMs are useless because they only know what already exists is far too harsh a measure, because the vast bulk of human output is rehashing what we already know.


I hate this sort of retort since it’s fundamentally meaningless. All atoms that will ever exist were in the singularity and exploded during the big bang to encompass all of the universe, and so we are all just moving along. Ok, so what.


The point of the retort is precisely to point out the meaninglessness when "regurgitation" is extended to include systems with some degree of generalization/extrapolation/discovery capabilities. The expectation set on AI seems to be to break information theory, "it's not original because it can only draw a horse (more accurately than a random guess) due to having trained on information about a horse" or so on.


A possibility that every idea necessary to put the James Webb Space Telescope in place and analyse the data it collects from the universe's earliest and most distant galaxies was already present thousands of years ago - in fact, must have somehow been built into the first humans? I don't think there is.


I'd guess that valid human-authored submissions were being rejected by moderators because they appeared to the moderator to be AI-authored. Moreover, there is probably a significant grey area where a human selects among multiple AI answers and refines them.

As a user of Stack Overflow answers, I don't care too much about who the author was. I do value the voting, comments and community on Stack Overflow, since they add confidence and color to an answer.


Why is that a bad thing? People do ask very similar things on Stack Overflow, so if an AI can answer that, let it!


Probably scaling moderation to account for answers that are increasingly wrong yet right-looking.


Not surprised - the universal issue with AI generated content is that it's near impossible to curate (not helped by what we'll call... a certain eagerness of its advocates to share the output of these tools) and often is wrong in imperceptibly tiny ways.


This just makes obvious sense.

If someone wants a GPT-generated answer, they can use ChatGPT themselves.

If I'm bothering to post my question to a forum, the social contract is that I'd like a response from a human.


Anyone around who moderates on SO? Is there a general sense of alienation from corporate SO?

My understanding is that in the early days, a lot of the devs at SO were actually recruited from the SO and Meta moderation userbase but probably that doesn't scale.

Example:

    Ben: And then, April 29th, I was, again, sitting in the agency, doing my thing, and I also had my personal email open, and I suddenly got an email from Jeff Atwood. 

    But it said, “Who should work for Stack Overflow?” That was the subject line, and then the body of the message said, “You should,” and that was basically it. 

    And that was kind of like this jaw-dropping thing, because from that moment on, it was this, “Okay, maybe the skill that I have is actually marketable,” kind of moment.
https://corecursive.com/stack-overflow/


Not a mod, but mods have complained publicly about the sense of isolation for years.


I used to be heavily active in curating SO, and yes, there is an incredibly strong sense of alienation from the corporation (which is what drove me away).

In the old days, most of the staff, from devs to management to the CEO, were active users of the site and hung out on Meta and in chat. They were easy to reach and happy to answer questions; and whenever they made major changes to the site they would go to Meta to ask for feedback and adjust their plans accordingly.

There was a noticeable shift in this dynamic starting around ~2016. Around this time, the company stopped focusing development effort on the core site functionality, and instead prioritized side products & attempts to monetize the site (most of which failed). Feature requests on Meta were almost completely ignored, and site features that had been on previously-announced timelines/roadmaps were never delivered. But the community was still as strong as ever; and so people started implementing these missing features themselves in the form of bots and userscripts. This was the "golden age" of moderation bots, and a really fun time to be a part of the community -- power users ran heavily modified frontends that could display all kinds of additional information and automate repetitive actions, and integrate with bots to do things that Stack Exchange's systems were bad at -- like flagging spam and low-quality posts; identifying plagiarism; and detecting flamewars in comments.

As the company grew, they hired a ton of middle management who were not active participants in the site, and largely did not care about the day-to-day. This was alright when they left us alone, but around 2018-2019 they began to take an openly hostile stance towards the "power users". Here's an excellent post from that time summarizing the general sentiment: https://meta.stackexchange.com/questions/331513

The short version is: the company began blaming power users for things like the site's "unwelcoming" reputation (which is really a symptom of the site's outdated and opaque moderation tooling, and power users had been clamoring for better tools for years). They began a pattern of rolling out features and UI changes that took major steps backwards in usability and accessibility -- and due to all the negative feedback these changes received on Meta, they announced staff would no longer participate on Meta because it was "too negative". A high-level manager famously quipped that the opinion of Meta was not relevant as it represented "0.015%" of Stack Exchange's userbase -- despite the fact that that 0.015% was responsible for the majority of content & moderation activity contributed to the site.

In late 2019 it got a whole lot worse when, in rapid succession: 1) the company updated the site terms-of-service to illegally change the license of user-submitted content, and 2) an employee abruptly revoked a volunteer's moderation privileges without due process, and then went to news outlets making false accusations about that user's behavior (https://meta.stackexchange.com/questions/333965). Shortly after that, Stack Overflow fired several well-loved and highly respected staff moderators, for undisclosed reasons (https://meta.stackexchange.com/questions/342039/). A lot of people, including myself, left the community in the wake of this.

Since then -- at least from my outsider perspective of checking in once in a while to see what's going on -- it seemed for a while like the company was learning from its mistakes. They apologized for ignoring Meta, began asking for and listening to community feedback once again, created new policies to protect volunteers from the kind of abuse that happened in 2019, and began implementing some of those long-ignored and long-overdue feature requests. But in 2021 the company got bought out by a VC firm that is even more aggressive about trying to monetize the site; they started cranking up advertising and pushing generally unwanted side products, but they mostly left the community alone.

That brings us to generative AI. As soon as ChatGPT came out, a deluge of users began copy/pasting Stack Overflow questions into ChatGPT and copy/pasting its answers into the answer box (usually with no editing or fact-checking effort). General consensus among the community seems to be that ChatGPT produces wrong or unhelpful information to an unacceptable degree, and that allowing machine-generated content on Stack Overflow defeats the purpose of the site (you go to ChatGPT if you want answers from a machine, but you go to SO if you want answers from a human). The staff supported this consensus and made it official policy -- but at the same time the CEO kept making rambling blog posts about how "AI is the future of Stack Overflow" and launching sweeping initiatives within the company to do... AI-related things? Nobody really knows what he's talking about.

That all leads up to the events that triggered the strike: out of nowhere, the company suddenly announced a few days ago that it was overruling previous community consensus and prohibiting users from deleting content on the basis of it being AI-generated.


Wow, great summary. I did not know about the relicensing bit. That seems problematic!


I'd propose a different solution. For every single new question, auto-generate an answer with GPT and mark it as AI-generated.


At least the generated responses won’t start by asking me “why in the world would you want to do that?”


You nailed it.


ChatGPT isn't AI; I really wish people would stop calling it that. It tricks people into thinking something magical is happening. There is still only human-generated content, and ChatGPT scraped it.


"AI" is a moving target. As soon as techniques under the banner of "AI" start being used, they get a specific name and people start saying "that's not AI". For now, LLMs "are AI", just like current text-to-image models (Stable Diffusion, Midjourney, etc). There's no winning a terminology war single-handed.


And this, folks, is the AI effect in action.

ChatGPT is AI. Is it a human-level general intelligence? No, and we're likely very lucky for that.

Humanity is going to have to accept that the words we used to define AI were far too imprecise and don't reflect how development of AI is actually occurring. If you want to argue the language of it, then demand we make new, testable, and scientific definitions for the capabilities of computer intelligence. Any less, and all we'll hear is the sound of scraping goal posts as they get moved further and further.


The thing that comes to mind when I hear about volunteers withholding their efforts is the "reinstate Monica" protest. I haven't counted how many years it has been, but there are still people with "reinstate Monica" in their Stack Exchange profiles to this day.


It is most certainly a predicament. If people feel like they can freely post ChatGPT content, it adds enormous strain on making sure that the solution provided actually works. This then has to be done by moderators (with experience) or other users who land on the site. If it is the latter, people might get frustrated that answers are wrong, devaluing SO's value overall.

In a nutshell, allowing AI in its current iteration will create more problems when it comes to moderation, content quality, and the sanity of the site's users - pretty much.

Quite a lot of variables to find common ground on this one, but I think StackOverflow should not allow AI content for the time being.


So, prove it's AI content - that's going to be the hard part, with a ton of false negatives.

I look at it the other way: AI answers should be fine where tools like Code Interpreter are in use and they can provide a verifiable solution. At the end of the day, people copying from SO are generally looking for something that works, and whether it's best practice can be fought out by the people downthread.


I mean, people post wrong answers to Stack Overflow all the time. They generally get downvoted.


People here are arguing about whether it’s ok to have AI-generated content on StackOverflow or not. But it seems to me that’s not the issue. The real issue is that people suck at identifying AI content, and so do AI detectors. So moderation based on that identification is obviously going to be unfair and inaccurate. Moderators are removing perfectly acceptable human-written answers based on their spurious intuition that it’s AI generated, and they are striking because they want to go on doing so.

It doesn't matter how confident moderators are that they can identify AI content when the correlation between their confidence and reality is so low.


Any one tool is bad at detecting AI but combining both human intuition and multiple automatic tools can get a very accurate result. Moderators already only make suspensions when multiple systems agree.

Not to mention that the best AI output might be hard to distinguish, but most AI content is quite obvious (when reading carefully) because of how bad it is. It can trick a casual reader but not an experienced moderator who knows the subject matter.


Who cares about provenance? If the answer is useful, it's useful. You don't think the "humans" post bogus "quick Google search" answers on SO for rep farming? Have you accessed the site in the last 5 years?


We talk about job disruption - GPT has SO's number. They should never have shut their job board. They have painted themselves into a tiny corner now, and OpenAI + others are coming for their lunch.


LLMs will usually lag a lot, though. Often a human has more up-to-date knowledge. Computing being such an active and ever-changing field, that matters. OpenAI should want people to keep contributing original answers to feed its AI.


Oh StackOverflow. I forgot it existed since ChatGPT was released.

Yeah.. surely a 2013 reply from a reputation farmer suggesting a dirty jQuery solution will be better, since it was made by a human. LOL


Why? If the answers are good, leave them.


Because the system isn't confirming the answers.

Let's assume we have the final form AGI. If it doesn't confirm the claims are accurate then it falls under the same flaw as humans who don't confirm.

If it's a bunch of untested, unconfirmed results that are statistically pretty decent, this doesn't really reduce the amount of work.

That is the same misplaced enthusiasm developers display when they fetishize new tools or languages that actually make their job more complicated and time-consuming, but occupy the time in a way that gives illusory optics to the contrary.

The fact that most devs get fooled by such productivity illusions is why 10x engineers are a real thing.


>Because the system isn't confirming the answers.

Do we have any evidence that users confirm the answers? In general the confirmation is in the end users following up and saying this works/doesn't work.

Now, this gets more problematic when 'bots' follow up and say it works, but detecting that that's the case is going to be a massive problem unless they are doing it far more often than a human could.


> Do we have any evidence that users confirm the answers?

No.

> In general the confirmation is in the end users following up and saying this works/doesn't work.

I guess.

The point is there's a limit to the trustworthiness of the results and if I'm going there to fix a problem, the success rate I'm looking for is 100%, especially if it's said as confidently and verbosely as these AIs seem to be


That's the thing: A) they might not be, and B) once we start mixing human answers with AI on SO, its value as a data source for future LLM training runs will decrease.

The tech is moving at breakneck pace, and in general I'm loving it. But let's not hurry the adoption of AI too much


For B, SO still has voting on answers as well as acceptance of answers

It's not as valuable as raw text input, but it's still marked up and classified


Consider the answer to the question "how often do people down vote incorrect information?"

While SO still has voting, the value of a vote in terms of identifying correct information has fallen off substantially in the past several years. Poorly written questions and incorrect answers are more likely to get an upvote for trying than a downvote for what's actually written.


> SO still has voting on answers

We call it RLHF these days


Some people use their StackOverflow karma as social credit and portfolio material. That karma won't have the same value as it did before AI-generated responses were allowed. I think such posts should be approved, but with zero karma awarded for providing the correct response using AI. There would then be no incentive to provide AI-generated responses, but they wouldn't be rejected either.


AI-generated content can be somewhat sniffed out today, but I'm not sure how long even that will last. Usually you can tell when the writing is great but the points made are trash or miss the point.

I personally wouldn't find it helpful. I'd say let the question poster and visitor decide if they want to see AI generated stuff. But that does rely on AI answers being marked as such or detected.


The current title is “StackOverflow petition to allow remove AI generated content”, which sounds… uh… AI-generated.


So it has been only a couple months and AI spam is causing problems for submissions in communities from StackOverflow to Sci-Fi publishing to photo contests. The AIs can overrun all the CAPTCHAs too.

So, this is just the beginning. What will happen next year on HN? Anyone can deploy a swarm of bots.


With generative AI (text, images, videos), I'm of the opinion that very soon -- if not now -- every"one" on the internet should be assumed a bot until proven otherwise.

In other words, it's now trivial to fake a digital persona, including "selfies", "personalities", and more.

How do you know I'm not an LLM?


dead internet will become real


It was a misnomer calling it 'dead internet theory'; Thoughty2 correctly renames it 'dead internet prophecy'.


One problem with this is that future models will probably be trained on data that includes a subset of AI-generated data.

As this ratio grows the need to classify AI/human becomes critical, otherwise all these models will probably degrade over time.


It's an ongoing joke about spam bots, SEO spam, etc, that the day they actually post helpful and useful content it's mission fucking accomplished. With "AI" we are one step closer.


And they can tell it's AI generated... how?


Possible AI generated, closed.


Can answers be edited by the community to correct any hallucinations? Or is there just an outright ban on removing AI content?


There's no ban on removing AI content as far as I'm aware - admins just upped the evidence requirement for determining a post as AI-generated, seemingly in response to high false positive rates of hunches and current detectors:

> We recently performed a set of analyses on the current approach to AI-generated content moderation. The conclusions of these analyses strongly indicate to us that AI-generated content is not being properly identified across the network, and that the potential for false-positives is very high.

> In order to help mitigate the issue, we've asked moderators to apply a very strict standard of evidence to determining whether a post is AI-authored when deciding to suspend a user.

> We've reminded moderators that suspensions (and typically mod messages as well) are for real, verifiable malfeasance only, and should not be enacted on the basis of hunches, guesses, intuition, or unverified heuristics

https://meta.stackexchange.com/questions/389582/what-is-the-...


This is essentially a ban on removing AI content, the cases where it is still allowed to remove the posts are exceedingly rare. This is rather confusing as the public policy doesn't contain those details.

I'm a moderator on a small SE site, so I have seen the internal communication between SE and the mods on this.


> This is essentially a ban on removing AI content

I think framing it as "a ban on removing AI content" is wrong/misleading and the reason for softwaredoug's question about whether you'd even be allowed to edit low-quality hallucinations.

To my understanding you can still remove/edit/ban for all the same reasons as normal content, just that the additional reason (removing for inherently being AI generated) is now significantly stricter.

> the cases where it is still allowed to remove the posts are exceedingly rare.

Not aware of the internal guidelines unless there's something you're allowed to share, but generally I could understand the admins' desire to err towards high-precision over high-recall.


This gets into the particulars of how moderation on SO usually works. Posts are only deleted for low quality when they are really terrible. Most bad posts are handled by downvoting by the community, no mods involved. Deciding which posts are bad very often requires domain knowledge, which the mods can't have for every possible topic on SO.

SE dropped this new policy on Monday, a holiday, and it went into effect immediately. They never even mentioned before that they were concerned about false positives here. They could have just asked the mods, explained the concerns, and asked them to be more careful. That never happened.

The new policy isn't just erring towards avoiding false positives, it is far more extreme than that and prevents almost all cases of AI-generated content from being moderated.

There are other concerns as well: academic sites in particular are strict about treating the undeclared use of AI-generated content as plagiarism. Acting on that is now impossible.


I would go with: only human-written code should exist.


And how do you prove it?


I would be more afraid of AI generated questions rather than answers.


“Let’s ban computers on a forum about computers”


I mean, would you want HN to be completely overrun by bots?


depends if the answers are better than the ones by humans


that's fascinating.

if i am here as a technical resource, i don't care

if i am here to make connections and maybe even, every 10 years, a new friend, then absolutely not.

so the challenge for any reply to my post is to say something that disproves i come here for connections and friends.


HN is for chatting, but SO is for storing answers. Why can't moderators de-hallucinate the answers? It will probably take some time before SO becomes the only source of the autogenerated answers on SO.


Why can't moderators on HN deal with a torrent of AI-generated comments?

Mission has been f**ing accomplished, and now we will be enjoying the xkcd dystopia: https://xkcd.com/810/


Imagine the AI dystopia being like the entire Earth. We are only a few shovelfuls deep. But don't worry, tech companies are spending billions of dollars on tools to dig even faster.

Buckle up, because it's going to be a wild ride.


There are 181 Stack Exchange sites. There are fewer than 10 that I have found, in years of using the place, that are about computers.



