The reasons listed in TFA - "confidentiality, sensitive data and compromising authors’ intellectual property" - make sense to discourage reviewers from using cloud-based LLMs.
There are also reasons for discouraging the use of LLMs in peer review at all: it defeats the purpose of the "peer" in peer review; hallucinations; criticism not relevant to the community; and so on.
However, I think it's high time to reconsider what scientific review is supposed to be. Is it really important to have so-called peers as gatekeepers? Are there automated checks we can introduce to verify claims or ensure quality (like CI/CD for scientific articles), and leave content interpretation to the humans?
Let's make the benefits and costs explicit: what would we be gaining or losing if we just switched to LLM-based review, and left the interpretation of content to the community? The journal and conference organizers certainly have the data to do that study; and if not, tool providers like EasyChair do.
Yes, there are often strong reasons to have peers as gatekeepers. Scientific writing is extremely information-dense. Consider a niche technical task that you work on -- now consider summarizing a day's worth of work in one or two sentences, designed to be read by someone else with similar expertise. In most scientific fields, the niches are pretty small. The context necessary to parse that dense scientific writing into a meaningful picture of the research methods is often years/decades of work in the field. Only peers are going to have that context.
There are also strong reasons why the peers-as-gatekeepers model is detrimental to the pursuit of knowledge, such as researchers forming semi-closed communities that bestow local political power on senior people in the field, creating social barriers to entry or critique. This is especially pernicious given the financial incentives (competition for a limited pool of grant money; award of grant money based on publication output) that researchers are exposed to.
I think if you leave authors alone they will be more likely to write in the first category rather than the second. After all, papers are mainly written to communicate your findings to your direct peers. So information-dense writing isn't bad, because the target audience understands it.
Of course that makes it harder for people outside to penetrate, but this also depends on the culture of the specific domain, and there are usually people writing summaries and surveys. Great task for grad students tbh (you read a ton of papers, summarize, and by that point you should have a good understanding of what needs to be worked on in the field and not just be dragged through by your advisor).
Agreed: information-dense isn't bad at all. It's a reason for peer review, though: people other than peers in the field have a much harder time reviewing an article for legitimacy, because they lack the context.
It's a fair point. In the ideal setting, peer review can really be a very informative and important gate. And who better to be the gatekeeper than someone who understands the context?
However, there are still big issues with how these peers perform reviews today [1].
For example, if there's a scientifically arbitrary cutoff (e.g., the 25% acceptance rate at top conferences), reviewers will be mildly incentivized to reject (what they consider to be) "borderline-accept" submissions. If the scores are still "too high", the associate editors will overrule the decision of the reviewers, sometimes for completely arbitrary reasons [2].
There's also a whole number of things reviewers should look out for, but for which they have neither the time, space, tools, nor incentives. For example, reviewers are meant to check if the claims fit what is cited, but I can't know how many actually take the time to look at the cited content. There's also checking for plagiarism, GenAI and hallucinated content, whether the evidence supports the claims, how charts were generated, "novelty", etc. There are also things that reviewers shouldn't check, but that pop up occasionally [3].
However, you would be right to point out that none of this has to do with peers doing the gatekeeping, but with how the process is structured. But I'd argue that this structure is so common that it's basically synonymous with peer review. If it results in bad experiences often enough, we really need to push for the introduction of more tools and honesty into the process [4].
[1] This is based on my experience as a submitter and a reviewer. From what I see/hear online and in my community, it's not an uncommon experience, but it could be a skewed sample.
[3] Example things reviewers shouldn't check for or use as arguments: did you cite my work; did you cite a paper from the conference; can I read the diagram without glasses if I print out the PDF; do you have room to appeal if I say I can't access publicly available supplementary material; etc.
[4] Admittedly, I also don't know what would be the solution. Still, some mechanisms come to mind: open but guaranteed double-blind anonymous review; removal of arbitrary cutoffs for digital publications; (responsible, gradual) introduction of tools like LLMs and replication checks before it gets to the review stage; actually monitoring reviewers and acting on bad behavior.
> However, I think it's high time to reconsider what scientific review is supposed to be
I've been arguing for years we should publish to platforms like OpenReview and that basically we check for plagiarism and obvious errors but otherwise publish.
In the old days the bottleneck was the physical sending out of papers. Now that's cheap. So make comments public. We're all on the same side. The people that will leave reviews are more likely to actually be invested in the topic rather than doing review as purely a service. It's not perfect, but no system will be, and we currently waste lots of time chasing reviewers.
I agree. OpenReview is a good initiative, and while it has its own flaws, it's definitely a step in the right direction.
The arXiv and the derivative preprint repositories (e.g., bioRxiv) are other good initiatives.
However, I don't think it's enough to leave the content review completely to the community. There are known issues with researchers using arXiv, for example, to stake claims on novel things, or readers jumping on the claims made by well-known institutions in preprints, which may turn out to be overconfident or bogus.
I believe that a number of checks (beyond plagiarism) need to happen before the paper is endorsed by a journal or a conference. Some of these can and should be done in a peer review-like format, but it needs to be heavily redesigned to support review quality without sacrificing speed. Also, there are things that we have good tools for (e.g., checking citation formatting), so this part should be integrated.
Plus, time may be one of the bottlenecks, but that's partly because publishers take money from academic institutions, yet expect voluntary service. There's no reason for this asymmetry, IMO.
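To make the "things we have good tools for" point concrete, here is a minimal sketch of one such automated pre-review check: verifying that every numeric citation marker used in the text has a matching reference entry, and vice versa. The function name, the `[n]` marker convention, and the sample strings are assumptions made for illustration, not something any journal or conference system is known to run.

```python
import re

def check_citation_markers(body: str, references: str) -> dict:
    """Compare [n] markers used in the body with [n] entries in the reference list."""
    # Every [n] that appears anywhere in the body counts as a citation.
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}
    # Assumption: each reference entry starts its own line with "[n]".
    defined = {
        int(n) for n in re.findall(r"^\[(\d+)\]", references, flags=re.MULTILINE)
    }
    return {
        "cited_but_undefined": sorted(cited - defined),
        "defined_but_uncited": sorted(defined - cited),
    }

# Hypothetical sample input, made up for this example.
body = "Reviews have issues [1]. Editors may overrule reviewers [2]. Some checks are off-limits [3]."
references = "[1] Personal experience.\n[3] Examples of off-limits checks.\n[4] Possible mechanisms."

report = check_citation_markers(body, references)
print(report)
```

A check like this only flags bookkeeping problems. Whether a cited source actually supports the claim is still something a human has to judge.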
> There are known issues with researchers using arXiv, for example, to stake claims on novel things
I think this is more a function of the metric system: we find that works get through review better when they're "novel", so this is used over-zealously. But get rid of the formal review system and that goes too.
> which may turn out to be overconfident
This is definitely an issue but one we must maintain as forgivable. Mistakes must be allowed in science. Minimized, but allowed. Mistakes are far too easy to make when working at the edge of knowledge. I'd wager >90% of papers have mistakes. I can tell you that 100% of mine have mistakes (all found after publication) and I don't know another researcher who says differently.
> bogus
And these people should be expelled.
A problem that the current system actually perpetuates. This is because when authors plagiarize, the papers get silently desk rejected. Other researchers do not learn of this and cannot then take extra precaution with other works by these authors. IMO fraud is one of the greatest sins you can commit in science. Science depends a lot on trust in authors (even more so because our so-called peer-review system places emphasis on novelty and completely rejects replication).
The truth is that no reviewer can validate claims by reading a paper. I can tell you I can't do that even for papers that are in my direct niche. But what a reviewer can do is invalidate. We need to be clear about that difference and the bias. Because we should never interpret papers as "this is the truth" but "this is likely the truth under these specific conditions". Those are very different things.
I agree that checking is better, but I don't believe it's absolutely necessary. The bigger problem I have right now is that we are publishing so much that it is difficult to get a reviewer who is a niche expert, or sub-domain expert. More generalized reviewers can't properly interpret papers. It is too easy to over-generalize results and think they are just doing the same thing as another work (I've seen this way too often), or see something as too incremental (almost everything is incremental... and it is going to stay that way as long as we have a publish or perish system). BUT the people that are niche experts are going to tend to find the papers because they are seeking them out.
But what I think does need to be solved still is the search problem. It's getting harder and frankly we shouldn't make scientists also be marketers. It is a waste of time and creates perverse incentives, as you've even mentioned.
> because publishers take money from academic institutions,
And the government.
Honestly I hate how shady this shit is. I understand conferences, where there's a physical event, but paid-access journals are a fucking scam (I'd be okay with a small fee for server costs and such but considering arxiv and openreview, I suspect this isn't very costly). They are double dipping. Getting money from govs, academics paying for access, but then getting the literal "product" they are selling given to them for free and then the "quality control" of that "product" also being done for free. And by "for free" I mean on the dime of academic institutions and government tax dollars.
This misses the point entirely. Science is a dialogue. Peer review is just a protocol to signal that the article has been part of some dialogue.
Anyone can put anything to paper. Now more than ever - see all the vibe physics floating around. Peer review is just an assurance that what you are about to read isn't just some write-only output of some spurious thought process.
I think it's interesting that AI is probably unintuitively good at spotting fraud in papers due to its ability to hold more context than the majority of humans. I wish someone explored this to see if it can spot academic fraud that isn't in its training data already.
It doesn't have to be reliable! It just has to flag things: "hey these graphs look like they were generated using (formula)" or "these graphs do not seem to represent realistic values / real world entropy" - it just has to be a tool that stops very advanced fraud from slipping through when it already passed human peer review.
The only reason this is helpful is that humans have natural biases, and AI's biases are roughly the inverse of those, which lets it find patterns that might just be the same graph scaled up 5 to 10 times.
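As a toy illustration of the "same graph scaled up" pattern, here is a minimal sketch that flags when one data series is just a constant multiple of another. It assumes the numeric values have already been extracted from the figures (which is the hard part) and that the copy is exact up to floating-point error; the function name and sample numbers are made up for the example.

```python
import math

def scale_factor_if_scaled_copy(a, b, rel_tol=1e-6):
    """Return k if b is (approximately) k * a element-wise, otherwise None."""
    if len(a) != len(b) or not a:
        return None
    k = None
    for x, y in zip(a, b):
        if x == 0:
            # A zero in a must map to a zero in b for any scale factor.
            if not math.isclose(y, 0.0, abs_tol=1e-12):
                return None
            continue
        ratio = y / x
        if k is None:
            k = ratio  # first usable ratio becomes the candidate factor
        elif not math.isclose(ratio, k, rel_tol=rel_tol):
            return None  # ratios disagree, so not a pure rescaling
    return k

# Hypothetical series, invented for illustration.
original = [1.0, 2.5, 4.0]
suspicious = [5.0, 12.5, 20.0]   # every value is 5x the original
unrelated = [5.0, 12.5, 21.0]

print(scale_factor_if_scaled_copy(original, suspicious))
print(scale_factor_if_scaled_copy(original, unrelated))
```

A hit here is only a flag for a human to look at, not evidence of fraud: legitimately related measurements can be proportional, and real data would need a noise-tolerant comparison rather than a strict ratio test.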
I hope I'm wrong but I haven't seen anything like this in practice. I would imagine we have the same problem as before where we could use it as an extra filter but the amount of shit that comes out makes the process not actually any more accurate, just faster.
Having seen from close-up how these reviews go, I get why people use tools like this, unfortunately. It doesn't make me very hopeful for the near future of reviewing.
That's why I wanted someone to actually do a real test to see if it has even a shred of accuracy, since that means it could be improved upon in the future.
Am I missing something here? I am new to posting at HN, despite being a long-time reader.
I get that HN has a policy to allow duplicates so that duplicates that were missed for arbitrary timing reasons can still gain traction at later times. I've seen plenty of "[Duplicate]" tagged posts, and have just seen this as a sort of useful thing for readers (duplicates may have interesting info, or seeing that the dupe did or did not gain traction also gives me info). But maybe I am missing something here, particularly etiquette-wise?
It’s certainly okay to link to a previous discussion, but “duplicate” implies that you think the present submission shouldn’t exist, and the previous submission doesn’t actually provide any discussion.
The fact that a previous submission didn’t gain traction isn’t usually interesting, because it can be pretty random whether something gains traction or not, depending on time of day and audience that happens to be online.
Okay, I don't in general see "duplicate" as implying this, but I take your point, and was wondering if that might be the etiquette here.
I also think, on reflection, that you are right in this particular case (given there are no comments on the previous duplicate) so, thank you also for clarifying.
I suppose in the future an e.g. "[Previous discussion]" tag would be more appropriate, providing comments were made, otherwise, just say nothing and leave it to HN.
Yeah I have that on so see those all the time, I was more wondering why I got a strange comment about tagging a duplicate, and was wondering if I was breaching some kind of etiquette.
People with 100k+ karma often breach the etiquette they preach, so I wouldn't worry too much about it. Worst case, you get downvoted to -5 and it'll become dead.
Guidance needs to be more specific. Failing to use AI for search often means you are wasting a huge amount of time. ChatGPT 5.2 Extended Thinking with search enabled speeds up research obscenely, and I'd be more concerned if reviewers were NOT making use of such tools in reviews.
Seeing the high percentage of usage of AI for composing reviews is concerning, but, also, peer review is an unpaid racket which seems basically random anyway (https://academia.stackexchange.com/q/115231), and probably needs to die given alternatives like arXiv, OpenPeerReview, etc. I'm not sure how much I care about AI slop contaminating an area that already might be mostly human slop in the first place.
That's a wrong way of using AI in peer review. A key part of reviewing a paper is reading it without preconceptions. After you have done the initial pass, AI can be useful for a second opinion, or for finding something you may have missed.
But of course, you are often not allowed to do that. Review copies are confidential documents, and you are not allowed to upload them to random third-party services.
Peer review has random elements, but that's true for all other situations (such as job interviews) where the final decision is made using subjective judgment. There is nothing wrong with that.
> A key part of reviewing a paper is reading it without preconceptions
I get where you are coming from here, but, in my opinion, no, this is not part of peer review (where expertise implies preconceptions), nor for really anything humans do. If you ignore your pre-conceptions and/or priors (which are formed from your accumulated knowledge and experience), you aren't thinking.
A good example in peer review (which I have done) would be: I see a paper where I have some expertise in the technical / statistical methods used, but not in the very particular subject domain. I can use AI search to help me find papers in the subject domain faster than I can on my own, and then I can more quickly see if my usual preconceptions about the statistical methods are relevant to the paper I have to review. I still have to check things, but, previously, this took a lot more time and clever crafting of search queries.
Failing to use AI for search in this way harms peer review, because, in practice, you do less searching and checking than AI does (since you simply don't have the time, peer review being essentially free slave labor).
By "without preconceptions", I mean that your initial review should not be influenced by anyone else's opinions. In CS, conference management software often makes this explicit by requiring you to upload your review before you can see other reviews. (You can of course revise your review afterwards.)
You are also supposed to review the paper and not just check it for correctness. If the presentation is unclear, or if earlier sections mislead the reader before later sections clarify the situation, you are supposed to point that out. But if you have seen an AI summary of the paper before reading it, you can no longer do that part. (And if a summary helps to interpret the paper correctly, that summary should be a part of the paper.)
If you don't have sufficient expertise to review every aspect of the paper, you can always point that out in the review. Reading papers in unfamiliar fields is risky, because it's easy to misinterpret them. Each field has its own way of thinking that can only be learned by exposure. If you are not familiar with the way of thinking, you can read the words but fail to understand the message. If you work in a multidisciplinary field (such as bioinformatics), you often get daily reminders of that.
Then on top of that there's the slop that comes from the university's PR department, where they turn "New possibly-interesting lab result in surface chemistry" into "Trillion dollar battery technology launched".
(Now that I think about it, I haven't seen much battery hype lately. The battery hype people may have pivoted to AI. Lots of stuff is going on in batteries, but mostly by billion-dollar companies in China quietly building plants and mostly shutting up about what's going on inside.)
Journals need to find a way to give guidance on what is and isn't appropriate and to let reviewers explain how they used AI tools... because like, you aren't going to nag people out of using AI to do UNPAID work 90% faster and produce results that are 90+th percentile of review quality (let's be real, there are a lot of bad flesh and blood reviewers).