After turning on compression I was able to fit the whole thing in GPU memory and...

Miraste on Feb 20, 2023, on: Running large language models like ChatGPT on a si...

After turning on compression I was able to fit the whole thing in GPU memory, and then it became much faster. Not ChatGPT speeds or anything, but under a minute for a response in their chatbot demo. A few seconds in some cases.