
How fast is it in single batch mode?


After turning on compression, I was able to fit the whole thing in GPU memory, and then it became much faster. Not ChatGPT speeds or anything, but under a minute for a response in their chatbot demo, and only a few seconds in some cases.



