
> A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words) - https://platform.openai.com/tokenizer

32,000 (tokens) * 4 = 128,000 (characters)

> While a general guideline is one page is 500 words (single spaced) or 250 words (double spaced), this is a ballpark figure - https://wordcounter.net/words-per-page

Assuming (on average) one word = 5 letters, context ends up being ~50 pages (128000 / (500 * 5)).

Just to put the "32k tokens" number into some roughly estimated context.
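
For anyone who wants to poke at the same estimate, here's a minimal sketch of the arithmetic in Python (the ratios are just the rules of thumb quoted above, nothing exact):

    # Rough tokens -> characters -> words -> pages estimate.
    TOKENS = 32_000
    CHARS_PER_TOKEN = 4      # rule of thumb: ~4 characters per token
    CHARS_PER_WORD = 5       # assume an average word is 5 letters
    WORDS_PER_PAGE = 500     # single-spaced page

    chars = TOKENS * CHARS_PER_TOKEN     # 128,000 characters
    words = chars / CHARS_PER_WORD       # 25,600 words
    pages = words / WORDS_PER_PAGE       # ~51 pages
    print(chars, words, pages)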




32k tokens: 3/4 of 32k is 24k words; each page averages 500 (0.5k) words, so that's basically 24k / 0.5k = 24 x 2 = ~48 pages.


That's great. Imagine if more researchers were as excited to reproduce results as you are; the world would be much better.


This comes across as patronizing to me. "Who's a good little verifier. You are!"


That's no good, wasn't my point if so. Luckily it seems like you're in the minority so far, so at least I managed to get the right feeling across to some people.


Forgive my ignorance, but what is the relevance of this to the above comment?


We both arrived at mostly the same result from the same question of "how many pages of text would 32k tokens be?"; they basically did the calculation again, albeit in a slightly different way. Just like when researchers try to reproduce the results of others' studies.


Probably off topic, but part of this is "Good luck writing a grant proposal that says I want to reproduce the work of this group and get it accepted". Unless of course you are claiming a groundbreaking paradigm shift, like evidence of a unified theory or superconductivity at room temperature.


This is a better calculator, because it also supports GPT-4:

https://tiktokenizer.vercel.app/

Not sure why they don't support GPT-4 on their own website.


At $0.60 for 20k prompt tokens, it's not going to be cheap, so they will need to bring the price down to get broader adoption.

As far as I can tell, the initial reading in of the document (let's say a 20k-token one) will be a repeated cost for each subsequent query over the document. If I have a 20k-token document and ask 10 follow-up prompts of 100 tokens each, that would take me to a total spend of 20k * 10 + (10 * 11)/2 * 100 = 205,500 prompt tokens, or over $6. This does not include completion tokens or the response history, which would edge us closer to $8 for the chat session.
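
A quick sketch of that cost arithmetic, if anyone wants to tweak the numbers (prompt price taken from the 32k rates quoted further down the thread; completion tokens ignored, as above):

    # Re-sending a 20k-token document on every turn of a 10-turn chat.
    DOC_TOKENS = 20_000
    FOLLOW_UPS = 10
    FOLLOW_UP_TOKENS = 100
    PROMPT_PRICE_PER_1K = 0.03           # 32k-context prompt price

    total_prompt_tokens = 0
    history = DOC_TOKENS
    for _ in range(FOLLOW_UPS):
        history += FOLLOW_UP_TOKENS      # each turn appends ~100 tokens
        total_prompt_tokens += history   # the whole history is re-sent

    print(total_prompt_tokens)                                 # 205500
    print(total_prompt_tokens / 1000 * PROMPT_PRICE_PER_1K)    # ~6.17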


What I’ve read is that people let a kind of meta-chat run alongside the client interaction. The meta channel decides what parts of the history to retain so the primary channel doesn’t use as many resources. You could let GPT decide when it needs to see the whole context again, etc. There are a lot of interesting ways to let GPT handle its own scope, I think.


Is this anything like what Langchain supports, which is various strategies for chat history compression (summarisation)?

https://www.pinecone.io/learn/langchain-conversational-memor...

But aside from this, if I have a large document that I want to "chat" with, it looks like I am either chunking it and then selectively retrieving relevant subsections at question time, or I am naively dumping the whole document (that now fits in 32k) and then doing the chat, at a high cost. So 32k (and increased context size in general) does not look to be a huge gamechanger in patterns of use until cost comes down by an order of magnitude or two.
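
For what it's worth, a minimal sketch of the chunk-and-retrieve option; embed() here is a stand-in for whatever embedding call you would plug in (it is assumed, not a real library function):

    from math import sqrt

    def cosine(a, b):
        # plain cosine similarity between two embedding vectors
        dot = sum(x * y for x, y in zip(a, b))
        norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def chunk(text, size=1000):
        # naive fixed-size character chunks; real splitters respect structure
        return [text[i:i + size] for i in range(0, len(text), size)]

    def retrieve(question, chunks, embed, k=3):
        # only the k most relevant chunks go into the prompt,
        # instead of the whole document on every question
        q_vec = embed(question)
        scored = sorted(chunks, key=lambda c: cosine(embed(c), q_vec), reverse=True)
        return scored[:k]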


Yes, that’s one example. Relatively simple to implement yourself. Any frontend for chatgpt already does the basic thing, which is to pass the previous messages along with the prompt.

I think we may end up with first and second stage completions, where the first stage prepares the context for the second stage. The first stage can be a (tailored) gpt3.5 and the second stage can do the brainy work. That way you can actively control costs by making the first stage forward a context of a given maximum size.
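
Something like this, as a rough sketch; complete(model, messages) is a stand-in for a chat-completions API call, not a real function:

    # First stage (cheap model) condenses the history; second stage
    # (expensive model) answers using only that condensed context.
    MAX_CONTEXT_TOKENS = 2_000   # budget the first stage may forward

    def answer(question, history, complete):
        summary = complete(
            "gpt-3.5-turbo",
            [{"role": "system",
              "content": f"Condense this chat history to at most "
                         f"{MAX_CONTEXT_TOKENS} tokens, keeping only what "
                         f"is needed to answer follow-up questions."},
             {"role": "user", "content": history}],
        )
        return complete(
            "gpt-4",
            [{"role": "system", "content": f"Relevant context:\n{summary}"},
             {"role": "user", "content": question}],
        )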


Is this a native feature? Or something home brewed? If the second, how to use it?


Right now all of this is people tinkering with the API. If you look at the docs you will note that it doesn’t even provide chat history or any kind of session. You have to pass all context yourself. So you’re already homebrewing that; why not add some spice.


I forget if Langchain can do that, but something along those lines will exist if it doesn't already; it's too obvious not to, and too important for the free ChatGPT equivalents that will be popping up over a matter of days/weeks now that truly free versions of llama are coming out.

TL;DR the puck should get there very soon, not just Really Soon Now.


> At $0.60 for 20k prompt tokens

Is it $0.60 even for the input tokens?


Yes, it's $0.03/1K for prompt and $0.06/1K for response, which is $0.60/20K for prompt and $1.20/20K for response.


If it’s not any faster, I’m thinking that how long you’re willing to wait for an answer will be the practical bottleneck?

38 seconds for that example.


I wonder how many LoC that is on average. Is there an average token count per LoC? It probably depends on the language…


Testing some random rust code, it's about 15-20 tokens per LoC, so about 1500-2000 LoC in a 32k context.

Interestingly, using 2-space indentation, as soon as you are about 3-4 indentation levels deep you spend as many tokens on indentation as on the actual code. For example, "log::LevelFilter::Info" is 6 tokens, same as 6 consecutive spaces. There are probably a lot of easy gains here reformatting your code to use longer lines or maybe no indentation at all.
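
If you want to reproduce the measurement, here is a small sketch using OpenAI's tiktoken library; cl100k_base is the GPT-4-era encoding and r50k_base an older GPT-3-era one, shown for comparison:

    import tiktoken  # pip install tiktoken

    old = tiktoken.get_encoding("r50k_base")
    new = tiktoken.get_encoding("cl100k_base")

    samples = {
        "6 spaces": "      ",
        "3 tabs": "\t\t\t",
        "rust path": "log::LevelFilter::Info",
        "indented line": "      let level = log::LevelFilter::Info;",
    }
    for name, text in samples.items():
        print(f"{name:14} old={len(old.encode(text)):3} new={len(new.encode(text)):3}")

The same snippet also makes it easy to check the tabs-vs-spaces point further down.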


Make sure you are testing with the tiktokenizer: https://news.ycombinator.com/item?id=35460648


Ah, good catch, it's actually closer to 8 tokens per LoC with GPT-4's tiktoken encoding (cl100k_base), so about twice as good. Some quick testing suggests that's mostly down to better whitespace handling.


> There are probably a lot of easy gains here reformatting your code to use longer lines or maybe no indentation at all.

Using tabs for indentation needs fewer tokens than using multiple spaces for indentation.


Curious how this point re-emerges in the tabs vs. spaces debate.



How does this even work if the code also uses external libraries?


A LoC is ~7 tokens thanks to the new tiktoken (cl100k_base) tokenization in GPT-4.

32k is ~4.5k LoC


0.75 * 32,000 = 24,000 words is faster and more direct


Thanks, math was never my strong suit and I was writing the comment as I was calculating towards the result: unrefined and raw, as it should be :)


Unfortunately, in CJK languages, it averages to slightly more than one token per character.
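
Easy enough to check with tiktoken if anyone is curious (the sample sentences are arbitrary, so treat the ratios as rough):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 encoding
    samples = {
        "english": "The quick brown fox jumps over the lazy dog.",
        "japanese": "素早い茶色の狐がのろまな犬を飛び越える。",
    }
    for name, text in samples.items():
        tokens = len(enc.encode(text))
        print(f"{name}: {tokens} tokens / {len(text)} chars = {tokens / len(text):.2f}")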


I suppose we now have a fairly objective way of measuring and comparing the semantic density of natural languages.


if you wanted to send 32k tokens of code, are you able to do that using a model with a 4k context limit by spreading those tokens out across multiple messages? or does it not work that way?


Not really. The API is stateless, you pass it in a whole conversation and it responds with the next message. The entire conversation including its response is limited to 32k tokens.
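
In code it ends up looking something like this (using the openai package's chat call as it looked around the time of this thread; newer versions of the library rename things):

    import openai

    # The client keeps the history and re-sends all of it on every call.
    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(user_text):
        messages.append({"role": "user", "content": user_text})
        resp = openai.ChatCompletion.create(model="gpt-4-32k", messages=messages)
        reply = resp["choices"][0]["message"]["content"]
        messages.append({"role": "assistant", "content": reply})
        return reply

    # Every call bills for the full message list, and prompt + response
    # together must fit within the 32k-token context window.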


i’m just confused because i thought i remembered sending long chunks of code using the api; the request would fail, but then i would split it up and it would work okay.

i guess i’m running into a different limit (not context length), or maybe i’m misremembering


The context limit is for request + response, and there is no storage in between requests (ongoing chat interactions are done by adding prior interactions to the prompt, so the whole chat – before things start falling out of history – is limited to the context window.)



