> A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words) - https://platform.openai.com/tokenizer
32,000 (tokens) * 4 = 128,000 (characters)
> While a general guideline is one page is 500 words (single spaced) or 250 words (double spaced), this is a ballpark figure - https://wordcounter.net/words-per-page
Assuming (on average) one word = 5 letters, context ends up being ~50 pages (128000 / (500 * 5)).
Just to put the "32k tokens" figure into some roughly estimated context.
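For anyone who wants to check that estimate on real text, here's a rough sketch using tiktoken (assuming the cl100k_base encoding used by the GPT-4-era models, and the same ~500 words per page ballpark):

    # Rough sanity check of the "32k tokens ~= 50 pages" estimate above.
    # Assumes tiktoken's cl100k_base encoding and ~500 single-spaced words
    # per page; both are ballpark figures, not exact.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def estimate_pages(sample_text: str, words_per_page: int = 500) -> float:
        tokens = len(enc.encode(sample_text))
        words = len(sample_text.split())
        words_per_token = words / tokens        # ~0.75 for typical English prose
        return 32_000 * words_per_token / words_per_page

    # estimate_pages(open("sample.txt").read()) should land somewhere
    # around the ~50 pages figure for ordinary English prose.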
That's no good; that wasn't my point if so. Luckily it seems like you're in the minority so far, so at least I managed to get the right feeling across to some people.
We both arrived at mostly the same result from the same question of "how many pages of text would 32k tokens be?"; they basically did the calculation again, albeit in a slightly different way. Just like when researchers try to reproduce the results of others' studies.
probably off topic, but part of this is "Good luck writing a grant proposal that says 'I want to reproduce the work of this group' and getting it accepted". Unless of course you are claiming ground-breaking, paradigm-shifting evidence like a unified theory or room-temperature superconductivity.
At $0.60 for 20k prompt tokens, it's not going to be cheap, so they will need to bring the price down to get broader adoption.
As far as I can tell, the initial reading in of the document (let's say a 20k token one), will be a repeated cost for each subsequent query over the document. If I have a 20k token document, and ask 10 follow-up prompts consisting of 100 tokens, that would take me to a total spend of 20k * 10 + (10 * 11)/2 * 100 = 205,500 prompt tokens, or over $6. This does not include completion tokens or the response history which would edge us closer to $8 for the chat session.
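Spelling out that arithmetic (a sketch with the same numbers: a 20k-token document, ten 100-token follow-ups, and the ~$0.03 per 1k prompt tokens implied by "$0.60 for 20k"):

    # How prompt tokens pile up when the whole document is re-sent with
    # every follow-up question. Numbers taken from the comment above;
    # completion tokens and response history are ignored here.
    DOC_TOKENS = 20_000
    FOLLOWUP_TOKENS = 100
    N_QUESTIONS = 10
    PRICE_PER_1K_PROMPT = 0.03  # implied by "$0.60 for 20k prompt tokens"

    total = 0
    for i in range(1, N_QUESTIONS + 1):
        # request i re-sends the document plus all i follow-ups so far
        total += DOC_TOKENS + i * FOLLOWUP_TOKENS

    print(total)                                  # 205500
    print(total / 1000 * PRICE_PER_1K_PROMPT)     # ~6.17 dollars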
What I’ve read is that people let a kind of meta-chat run alongside the client interaction. The meta channel decides what parts of the history to retain so the primary channel doesn’t use as many resources. You could let GPT decide when it needs to see the whole context again, etc. There are a lot of interesting ways to let GPT handle its own scope, I think.
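A minimal sketch of that idea, assuming the openai Python client's ChatCompletion endpoint and an arbitrary token budget - the "meta" call just asks a cheaper model to compress the older turns before the primary call sees them:

    # Sketch of a "meta channel" that decides what history to retain:
    # when the history grows past a budget, a cheap model summarizes the
    # older turns so the primary channel uses fewer tokens.
    # (Legacy openai.ChatCompletion API assumed; the budget is arbitrary.)
    import openai
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    HISTORY_BUDGET = 3_000  # hypothetical, tune to your model's context size

    def count_tokens(messages):
        return sum(len(enc.encode(m["content"])) for m in messages)

    def compress_history(messages):
        if count_tokens(messages) <= HISTORY_BUDGET:
            return messages
        old, recent = messages[:-4], messages[-4:]
        summary = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Summarize this conversation, keeping any facts "
                           "the assistant will still need:\n\n"
                           + "\n".join(m["content"] for m in old),
            }],
        )["choices"][0]["message"]["content"]
        return [{"role": "system", "content": "Summary so far: " + summary}] + recent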
But aside from this, if I have a large document that I want to "chat" with, it looks like I am either chunking it and then selectively retrieving relevant subsections at question time, or I am naively dumping the whole document (that now fits in 32k) and then doing the chat, at a high cost. So 32k (and increased context size in general) does not look to be a huge gamechanger in patterns of use until cost comes down by an order of magnitude or two.
Yes, that’s one example. Relatively simple to implement yourself. Any frontend for chatgpt already does the basic thing, which is to pass the previous messages along with the prompt.
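The basic thing looks roughly like this (a sketch, again assuming the openai ChatCompletion API):

    # Minimal chat loop: the API is stateless, so every request re-sends
    # the accumulated message history along with the new prompt.
    import openai

    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(prompt: str) -> str:
        history.append({"role": "user", "content": prompt})
        reply = openai.ChatCompletion.create(
            model="gpt-4", messages=history,
        )["choices"][0]["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        return reply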
I think we may end up with first and second stage completions, where the first stage prepares the context for the second stage. The first stage can be a (tailored) gpt3.5 and the second stage can do the brainy work. That way you can actively control costs by making the first stage forward a context of a given maximum size.
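Roughly what that two-stage setup could look like (purely a sketch - the prompts and the word budget are made up):

    # First/second stage completions: a cheap model trims the context to a
    # fixed budget, then the expensive model does the brainy work on it.
    # (openai ChatCompletion API assumed; prompts are illustrative only.)
    import openai

    def first_stage(question: str, document: str, max_words: int = 1000) -> str:
        # stage 1: gpt-3.5 forwards only the passages relevant to the question
        r = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": f"Quote at most {max_words} words from this "
                                  f"document that are relevant to the question "
                                  f"'{question}':\n\n{document}"}],
        )
        return r["choices"][0]["message"]["content"]

    def second_stage(question: str, context: str) -> str:
        # stage 2: gpt-4 answers using only the prepared context
        r = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user",
                       "content": f"Context:\n{context}\n\nQuestion: {question}"}],
        )
        return r["choices"][0]["message"]["content"]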
Right now all of this is people tinkering with the API. If you look at the docs you will note that it doesn’t even provide chat history or any kind of session. You have to pass all context yourself. So you’re already homebrewing that, why not add some spice.
I forget if Langchain can do that, but something along those lines will exist if it doesn't already; it's too obvious not to, and too important for the free ChatGPT equivalents which will be popping up over a matter of days/weeks now that truly free versions of llama are coming out.
TL;DR the puck should get there very soon, not just Really Soon Now.
Testing some random rust code, it's about 15-20 tokens per LoC, so about 1500-2000 LoC in a 32k context.
Interestingly, using 2-space indentation, as soon as you are about 3-4 indentation levels deep you spend as many tokens on indentation as on the actual code. For example, "log::LevelFilter::Info" is 6 tokens, same as 6 consecutive spaces. There are probably a lot of easy gains here reformatting your code to use longer lines or maybe no indentation at all.
Ah, good catch, it's actually closer to 8 tokens per LoC with GPT-4's tiktoken encoding, so about twice as good. Some quick testing suggests that's mostly down to better whitespace handling.
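Easy to reproduce with tiktoken if anyone's curious - a sketch comparing the old GPT-3 encoding against GPT-4's, on whatever Rust file you have lying around:

    # Tokens per line of code under the old GPT-3 encoding (r50k_base)
    # vs GPT-4's cl100k_base, which handles runs of spaces much better.
    # "src/main.rs" is a placeholder for any source file you want to test.
    import tiktoken

    with open("src/main.rs") as f:
        lines = [l for l in f.read().splitlines() if l.strip()]

    for name in ("r50k_base", "cl100k_base"):
        enc = tiktoken.get_encoding(name)
        tokens = sum(len(enc.encode(l)) for l in lines)
        print(f"{name}: {tokens / len(lines):.1f} tokens per LoC")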
if you wanted to send 32k tokens of code, are you able to do that using a model with a 4k context limit by spreading those tokens out across multiple messages? or does it not work that way?
Not really. The API is stateless, you pass it in a whole conversation and it responds with the next message. The entire conversation including its response is limited to 32k tokens.
i’m just confused because i thought i remembered sending long chunks of code using the api, the request will fail, but then i would split it up and then it would work okay.
i guess i’m running into a different limit (not context length), or maybe i’m misremembering
The context limit is for request + response, and there is no storage in between requests (ongoing chat interactions are done by adding prior interactions to the prompt, so the whole chat – before things start falling out of history – is limited to the context window.)
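So in practice the check is just counting tokens over the entire message list before sending, and leaving headroom for the reply (a sketch; the few tokens of per-message framing overhead are ignored):

    # The context limit covers request + response, so the whole message
    # list plus the expected completion has to fit inside it.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def fits_in_context(messages, context_limit=32_768, reserve_for_reply=1_000):
        used = sum(len(enc.encode(m["content"])) for m in messages)
        return used + reserve_for_reply <= context_limit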