How fast is it in single batch mode? | Hacker News

ImprobableTruth on Feb 20, 2023, on: Running large language models like ChatGPT on a si...

How fast is it in single batch mode?

Miraste on Feb 20, 2023

After turning on compression I was able to fit the whole thing in GPU memory and then it became much faster. Not ChatGPT speeds or anything, but under a minute for a response in their chatbot demo. A few seconds in some cases.
