
It's not my intent to argue about the data at any particular point in time, but note that as of today gpt-3.5-turbo-0613 (June 13th, 2023) scores 1112, above OpenChat (1075) and OpenHermes (1077).



That's not much of a relative difference. How much difference does 1% make?

I am tempted to call it equivalent.


I guess one thing people have learned is that these small differences, whatever the benchmark, turn out to be huge differences qualitatively.


nah that is sizeable
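
For anyone wanting to put a number on "how much difference", here is a rough sketch. It assumes the scores quoted above are Elo-style ratings on the standard 400-point logistic scale, which the thread does not actually state, so treat the result as illustrative only. The function names are made up for this example.

```python
# Assumption: the quoted scores are Elo-style ratings on the usual
# 400-point logistic scale. The thread does not confirm this.

def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (ties count as half)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def relative_gap(rating_a: float, rating_b: float) -> float:
    """Gap between the two ratings as a fraction of the lower-rated B."""
    return (rating_a - rating_b) / rating_b


if __name__ == "__main__":
    # Scores as quoted in the thread.
    scores = {"OpenChat": 1075, "OpenHermes": 1077}
    gpt35 = 1112
    for name, score in scores.items():
        print(
            f"gpt-3.5-turbo-0613 vs {name}: "
            f"relative gap {relative_gap(gpt35, score):.1%}, "
            f"expected win rate {expected_win_rate(gpt35, score):.1%}"
        )
```

Under that assumption, the gap to OpenChat is about 3.4% in relative terms rather than 1%, and it would correspond to the higher-rated model being preferred in roughly 55% of head-to-head comparisons. Whether that counts as "equivalent" or "sizeable" is the judgment call being argued above.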



