How to create a blog post title optimizer with GPT-3 and Hacker News data (minimaxir.com)
218 points by minimaxir on Aug 15, 2022 | 36 comments



This is so funny because the article is actually on the front page, but GPT-3 told him it's not a good title. Lol

So, is GPT-3 wrong, is his code wrong, or is it just that GPT-3 doesn't like Hacker News titles?


The model predicted a 1/5 chance of it turning out to be a good title, so those aren't terrible odds for random chance. That said, if this post does hit 100 points, I'll add an update.

That's the difficulty of interpreting the results of AI models, and one of the dark secrets of DS/ML practitioners is that they don't always have the right answers, just the best-case scenarios.


Update: it did hit 100 points. ¯\_(ツ)_/¯


The title was not good, I think. "GPT-3" and "DALL-E" are the keywords; put them in a title in any form and a lot of people will come take a look :)


Well, can you imagine how much more upvoted it would have been with a good title?


Also, those were variations on a title about this particular topic. I bet if the model were used to score the likelihood that this blog post (with any reasonable title) would hit the front page, it would predict that to be likely.


I wonder about the sampling used to shrink the set of bad titles: it samples uniformly, but wouldn't it be better to take the n worst titles (those with the fewest upvotes)? After all, titles near the good/bad threshold are a bit arbitrary and should be hard for the model to get right, so would increasing the contrast make it perform better? Or does that limit the model's exposure so much that it ends up harming more than it helps?


It does sample uniformly to shrink the bad titles.

Sampling nonuniformly may introduce an unexpected bias, so there's a tradeoff there.
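
For illustration, a minimal sketch of the two strategies being compared, using a toy pandas DataFrame (the column names and the "good" threshold here are assumptions, not the article's actual schema):

    import pandas as pd

    # Toy stand-in for the scraped HN dataset: one row per submission.
    df = pd.DataFrame({
        "title": [f"Example title {i}" for i in range(1000)],
        "points": [i % 50 for i in range(1000)],
    })

    good = df[df["points"] >= 5]   # assumed cutoff for a "good" title
    bad = df[df["points"] < 5]
    n = min(len(good), len(bad))

    # Uniform downsampling (what the reply above describes): the bad class
    # keeps its full mix of near-miss and zero-point titles.
    bad_uniform = bad.sample(n=n, random_state=42)

    # The alternative proposed in the parent comment: keep only the n worst
    # titles, maximizing contrast at the cost of a skewed view of "bad".
    bad_worst = bad.nsmallest(n, "points")

    train_uniform = pd.concat([good, bad_uniform])
    train_contrast = pd.concat([good, bad_worst])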


Reposts are very common on Hacker News. I wonder if incorporating the performance of the same links with differing titles might've yielded a better training set.


Posting guidelines on HN generally require you to not editorialize titles, so most reposts have the exact same title (since it's the title of the original article).


Goodhart's Law: “When a measure becomes a target, it ceases to be a good measure.”

Optimizing for the HN front page is a terrible idea: (1) HN users are usually hostile toward titles that even remotely look like clickbait and flag them, and (2) it attracts the wrong audience to the blog post.


Speaking as someone who has been writing forever and blogging-as-we-currently-know-it since 2004...

It always depends on what you want out of it. Some people are outcome-oriented. They want reputation and/or the affirmation of crowds, and for them, writing is a process of reverse-engineering what people want to read. At a deep level, they don't want to write, they want to have written something popular.

Others are journey-oriented. They write to work ideas out. paulg championed this o̶v̶e̶r̶ ̶a̶ ̶d̶e̶c̶a̶d̶e̶ ̶a̶g̶o̶ almost twenty years ago when he wrote an essay explaining that the word "essay" is not just a noun, but also a verb suggesting an attempt or experiment: http://www.paulgraham.com/essay.html.

For some ideas, trying to explain them forces us to confront the idea cheaply and identify areas where we haven't thought things through all the way.

And of course, sometimes it's a little of column A, a little of column B. Speaking for myself, my taste is almost entirely for column B: http://braythwayt.com/posterous/2014/10/30/write.html. But this is a big world, and if someone wants what's in column A or to sprinkle some column A on their column B essay, well, this post provides some hints.


Yeah, you can make HN posts that get upvoted, or HN posts that get a lot of responses explaining things to you and where you get downvoted. To me the second case is much more valuable than the first: people helping me root out my biases and fix my worldview is the main reason I use HN. It is really hard to fix your unknown unknowns, but posting things you think are correct and seeing how people correct you is a really fast and easy way to find them.

But of course you need to contribute things back or you will end up with a negative score and banned, so I mix in some things I know will get upvoted as well.


Anecdotally I agree with you, but doesn't this blog post suggest the reverse, that clickbait does well? The model was trained on a fairly comprehensive set of HN titles, and it scores clickbait-y titles with a high "Good" probability, e.g. `"Beware! Uninstalling this PC game deletes your hard drive"` with a `62.0% Good prob`. There's a ton of hidden complexity involved here, but if clickbait were generally downvoted by the HN community, we should expect a low "Good" score, right?


"Clickbait" that truthfully describes the contents and attracts readers who will like it is also known as "good headline writing."


A bit of a digression, but when I think "clickbait," I think of optimizing for curiosity about the contents, which is not strictly equivalent to attracting readers who will like the contents.

"Good headline writing" to me suggests being attractive to people who will like the contents, but also allowing people who might not care for the contents to self-filter.


But what if you optimize on data from HN?


My point is: "why optimize in the first place"??


This is a cool demo, but note the author doesn’t actually show the rewritten headlines are better in any way but spot-checking. The conclusion here isn’t that GPT-3 can optimize titles on its own, but that it generates ideas that, when reviewed by a human, can be usefully cherry-picked for inspiration. The task of ensuring, e.g., that the headline is factually accurate is left to the human.


There isn't an easy way to prove one headline is better than another outside of A/B testing, unfortunately.


Even then, is there even a way to do so with A/B testing?

Even in A/B testing you can't fully control for the environment a headline sits in. What's happening in the world? What styles of headlines are trending?

Proving things is very difficult, and probably not necessary.


"Proving", in this case, is whether the performance of one title is better than another at statistically significant level.

Performance can be faceted by referrer if necessary if there's a need to control the results further, or exclude sources where title A/B testing can't feasibly be done or has noticeably different behavior. (an example is using a different title for social sharing vs. the article title itself)

This is generally how title A/B testing works at every major online publisher.
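
As a rough sketch of that kind of significance check (the click and impression counts below are made up, not real publisher data):

    from statsmodels.stats.proportion import proportions_ztest

    # Hypothetical A/B results: clicks and impressions for two title variants.
    clicks = [480, 530]
    impressions = [10000, 10000]

    stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
    print(f"z = {stat:.2f}, p = {p_value:.4f}")
    # If p falls below the chosen threshold (e.g. 0.05), the difference in
    # click-through rate between the two titles is statistically significant.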


Okay, now optimize for controversial.


“Apple releases M3 processor with embedded NFT of rust compiler written in go”


In the article the author used GPT-3 Babbage instead of Curie or Davinci. It got a 68% prediction rate on the validation set. I wonder if the other versions would score better?

Also, I wonder how much it cost to train.


I mentioned in the article that it cost $2.

Fine-tuning on Curie would be 5x that ($10), which is a bit much for experimenting, and it would also increase the overall generation cost notably.

It's possible there would be an accuracy improvement, but it's debatable whether it would be worth it.
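
For context, a fine-tune like this with the 2022-era openai Python library (v0.x) looked roughly like the snippet below; the file name and label format are assumptions, and the article may have used the CLI instead:

    import openai

    # Upload the prepared prompt/completion pairs (JSONL format).
    upload = openai.File.create(file=open("hn_titles.jsonl", "rb"), purpose="fine-tune")

    # Swapping "babbage" for "curie" here is the ~5x cost difference mentioned above.
    job = openai.FineTune.create(training_file=upload.id, model="babbage")
    print(job.id)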


Hey Minimaxir again! This is really funny because I built an entire company around this idea: optimizing titles for YouTube videos: https://CreatorML.com.

Similarly, the user can use GPT-3 to generate titles, and a secondary ranker model scores them for estimated viewership. I use a regression model, however.
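
A hypothetical sketch of that generate-then-rank loop (this is not CreatorML's actual code; the prompt wording and the ranker are placeholders):

    import openai

    def generate_candidates(topic: str, n: int = 8) -> list[str]:
        # Ask GPT-3 for candidate titles; the prompt wording is a placeholder.
        resp = openai.Completion.create(
            model="text-davinci-002",
            prompt=f"Write a YouTube video title about: {topic}\nTitle:",
            n=n,
            max_tokens=20,
            temperature=0.9,
        )
        return [c.text.strip() for c in resp.choices]

    def estimate_views(title: str) -> float:
        # Placeholder for the secondary regression ranker described above;
        # a real version would load a trained model and return predicted views.
        return float(len(title))

    candidates = generate_candidates("training a title optimizer")
    best = max(candidates, key=estimate_views)
    print(best)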

Anyway, just funny that "great minds think alike" :)


So it's people like you that turn the internet into a clickbait jungle... No thanks.


The first thing I thought of when reading this article was https://news.ycombinator.com/item?id=15567707, where someone created a model that filters Hacker News for articles potentially interesting to them.


That is pretty powerful. I didn't realize we already had so many tools that something like this looks... easy?


Is this typically how large language models are used for classification? With prompts?


It's one approach. In general you get better performance when the input data is aligned with what the model is trained on.
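
As a rough sketch of the prompt-as-classifier pattern with a fine-tuned completion model (the separator, label token, and model name here are assumptions for illustration, not the article's exact setup):

    import math
    import openai

    def good_probability(title: str, model: str) -> float:
        # One-token completion from a fine-tuned model; the probability of the
        # assumed " yes" label token serves as the "good title" score.
        resp = openai.Completion.create(
            model=model,
            prompt=title + "\n\n###\n\n",
            max_tokens=1,
            temperature=0,
            logprobs=2,
        )
        top = resp.choices[0].logprobs.top_logprobs[0]  # token -> logprob
        return math.exp(top.get(" yes", float("-inf")))

    # "babbage:ft-..." is a placeholder fine-tuned model name.
    print(good_probability("Show HN: A blog post title optimizer", model="babbage:ft-personal-2022-08-15"))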


Should we be concerned that the author is "Data Scientist at BuzzFeed"?


In what way?


Not OP, but I don't think it's concerning; if anything it makes you more qualified. BuzzFeed has enormous subject-matter expertise on titles that work.

However, I did find it amusing after reading your rejection of the clickbait-y versions of the title because the HN crowd wouldn't go for them.



