This is a great idea; I wonder if anyone is working on AI censorship monitoring at scale, or at all. A secondary model could compare "censorship candidate" prompt results over time and classify whether the changes represent censorship or misinformation.
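
To make the comparison idea concrete, here's a rough sketch of what the snapshot-and-diff loop might look like. Everything here is hypothetical: the prompt list, the query_fn you'd pass in for each model version, all of it. Each diff it yields is what you'd hand to the secondary model for classification:

    import difflib
    from datetime import date

    CANDIDATE_PROMPTS = [
        "Summarize the protests in ...",  # whatever seems censorship-prone
    ]

    def snapshot(query_fn, version):
        """Record each candidate prompt's response for one model version."""
        return {
            "version": version,
            "date": date.today().isoformat(),
            "responses": {p: query_fn(p) for p in CANDIDATE_PROMPTS},
        }

    def changed_responses(old, new):
        """Yield a unified diff for every prompt whose response changed."""
        for prompt, old_resp in old["responses"].items():
            new_resp = new["responses"].get(prompt, "")
            if new_resp != old_resp:
                diff = "\n".join(difflib.unified_diff(
                    old_resp.splitlines(), new_resp.splitlines(),
                    fromfile=old["version"], tofile=new["version"],
                    lineterm=""))
                yield prompt, diff  # feed this to the secondary classifier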



There's also (I think?) been some research in the direction of figuring out more abstract notions of how models perceive various 'concepts'. I'd be interested in an LLM version of diffs, too, to see where changes landed across the model overall.
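
One crude way to approximate that kind of "concept diff" (just a sketch, not established tooling; the model ids and concept list are placeholders): embed a fixed set of concept words with each version, build each version's pairwise-similarity matrix, and compare the matrices. Embeddings from different models aren't directly comparable, so you compare each model's internal geometry instead:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    CONCEPTS = ["protest", "election", "censorship", "press", "security"]

    def concept_geometry(model_name):
        """Pairwise cosine similarities between concept embeddings."""
        model = SentenceTransformer(model_name)
        emb = model.encode(CONCEPTS, normalize_embeddings=True)
        return emb @ emb.T

    old = concept_geometry("old-checkpoint")  # placeholder model ids
    new = concept_geometry("new-checkpoint")
    drift = np.abs(old - new)
    # Large entries flag concept pairs whose relationship shifted
    # between versions.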

But really, the trouble is that it's tough to predict ahead of time what kinds of things are likely to be censored; if I were motivated to track this, I'd just keep a copy of each version of the model in my personal archive, for future testing with whatever prompts seem relevant by then.
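
If the models live on the Hugging Face Hub, pinning each release by revision is one way to build that archive (the repo id and revision names below are placeholders; use whatever the publisher actually tags):

    from huggingface_hub import snapshot_download

    for rev in ["v1.0", "v1.1", "main"]:  # hypothetical revisions
        snapshot_download(
            repo_id="some-org/some-model",  # placeholder repo id
            revision=rev,
            local_dir=f"archive/some-model-{rev}",
        )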



