In my experience, these models are at least as good as GPT-3.5-turbo, and they're very fast on consumer-grade GPUs if you use the quantized builds (q3_m, q4_m, q5_m). The 1.3b is fast and good enough on consumer GPUs to be used for completely local/offline code autocomplete.
With an 8 GB VRAM GPU, I tend to prefer these models over ChatGPT. Granted, I'm a bit strapped for VRAM, so I'm stuck at 6.7b q3_m or q4_m and often drop down to 1.3b q6. It gives marginally worse answers than GPT-3.5-turbo for me, but it's so stupid fast I don't care. On my 2070 it generates roughly a dozen or more lines of code plus prose per second.
If you only have a CPU, the 1.3b model is plenty fast for chat, although it's definitely too large for real-time, low-latency tasks like code autocomplete.
1.3b - https://huggingface.co/TheBloke/deepseek-coder-1.3b-instruct...
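For reference, here's a minimal sketch of running one of these GGUF quants locally with llama-cpp-python. The model filename, prompt template, and sampling settings are assumptions on my part, so swap in whatever quant you actually downloaded:

    # Minimal sketch using llama-cpp-python. The filename and prompt
    # template below are assumptions -- substitute your own.
    from llama_cpp import Llama

    llm = Llama(
        model_path="deepseek-coder-1.3b-instruct.Q6_K.gguf",  # hypothetical local path
        n_gpu_layers=-1,  # offload all layers to the GPU; set to 0 for CPU-only
        n_ctx=4096,
    )

    prompt = (
        "### Instruction:\n"
        "Write a Python function that reverses a string.\n"
        "### Response:\n"
    )
    out = llm(prompt, max_tokens=256, temperature=0.2)
    print(out["choices"][0]["text"])

Setting n_gpu_layers to 0 keeps everything on the CPU, which matches the chat-only setup mentioned above.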