> A helpful rule of thumb is that one token generally corresponds to ~4 characters of text for common English text. This translates to roughly ¾ of a word (so 100 tokens ~= 75 words) - https://platform.openai.com/tokenizer
32,000 (tokens) * 4 = 128,000 (characters)
> While a general guideline is one page is 500 words (single spaced) or 250 words (double spaced), this is a ballpark figure - https://wordcounter.net/words-per-page
Assuming (on average) one word = 5 letters, context ends up being ~50 pages (128000 / (500 * 5)).
Just to put the "32k tokens" figure into some roughly estimated context.
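For anyone who wants to check that estimate on real text, here's a rough sketch using tiktoken (assuming the cl100k_base encoding used by the GPT-4-era models, and the same ~500 words per page ballpark):

    # Rough sanity check of the "32k tokens ~= 50 pages" estimate above.
    # Assumes tiktoken's cl100k_base encoding and ~500 single-spaced words
    # per page; both are ballpark figures, not exact.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def estimate_pages(sample_text: str, words_per_page: int = 500) -> float:
        tokens = len(enc.encode(sample_text))
        words = len(sample_text.split())
        words_per_token = words / tokens        # ~0.75 for typical English prose
        return 32_000 * words_per_token / words_per_page

    # estimate_pages(open("sample.txt").read()) should land somewhere
    # around the ~50 pages figure for ordinary English prose.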
That's no good; that wasn't my point if so. Luckily it seems like you're in the minority so far, so at least I managed to get the right feeling across to some people.
We both arrived at mostly the same result from the same question of "how many pages of text would 32k tokens be?"; they basically did the calculation again, albeit in a slightly different way. Just like when researchers try to reproduce the results of others' studies.
probably off topic, but part of this is "Good luck writing a grant proposal that says 'I want to reproduce the work of this group' and getting it accepted". Unless of course you are claiming ground-breaking, paradigm-shifting evidence like a unified theory or room-temperature superconductivity.
At $0.60 for 20k prompt tokens, it's not going to be cheap, so they will need to bring the price down to get broader adoption.
As far as I can tell, the initial reading in of the document (let's say a 20k token one), will be a repeated cost for each subsequent query over the document. If I have a 20k token document, and ask 10 follow-up prompts consisting of 100 tokens, that would take me to a total spend of 20k * 10 + (10 * 11)/2 * 100 = 205,500 prompt tokens, or over $6. This does not include completion tokens or the response history which would edge us closer to $8 for the chat session.
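Spelling out that arithmetic (a sketch with the same numbers: a 20k-token document, ten 100-token follow-ups, and the ~$0.03 per 1k prompt tokens implied by "$0.60 for 20k"):

    # How prompt tokens pile up when the whole document is re-sent with
    # every follow-up question. Numbers taken from the comment above;
    # completion tokens and response history are ignored here.
    DOC_TOKENS = 20_000
    FOLLOWUP_TOKENS = 100
    N_QUESTIONS = 10
    PRICE_PER_1K_PROMPT = 0.03  # implied by "$0.60 for 20k prompt tokens"

    total = 0
    for i in range(1, N_QUESTIONS + 1):
        # request i re-sends the document plus all i follow-ups so far
        total += DOC_TOKENS + i * FOLLOWUP_TOKENS

    print(total)                                  # 205500
    print(total / 1000 * PRICE_PER_1K_PROMPT)     # ~6.17 dollars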
What I’ve read is that people let a kind of meta-chat run alongside the client interaction. The meta channel decides what parts of the history to retain so the primary channel doesn’t use as many resources. You could let GPT decide when it needs to see the whole context again, etc. There are a lot of interesting ways to let GPT handle its own scope, I think.
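A minimal sketch of that idea, assuming the openai Python client's ChatCompletion endpoint and an arbitrary token budget - the "meta" call just asks a cheaper model to compress the older turns before the primary call sees them:

    # Sketch of a "meta channel" that decides what history to retain:
    # when the history grows past a budget, a cheap model summarizes the
    # older turns so the primary channel uses fewer tokens.
    # (Legacy openai.ChatCompletion API assumed; the budget is arbitrary.)
    import openai
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    HISTORY_BUDGET = 3_000  # hypothetical, tune to your model's context size

    def count_tokens(messages):
        return sum(len(enc.encode(m["content"])) for m in messages)

    def compress_history(messages):
        if count_tokens(messages) <= HISTORY_BUDGET:
            return messages
        old, recent = messages[:-4], messages[-4:]
        summary = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Summarize this conversation, keeping any facts "
                           "the assistant will still need:\n\n"
                           + "\n".join(m["content"] for m in old),
            }],
        )["choices"][0]["message"]["content"]
        return [{"role": "system", "content": "Summary so far: " + summary}] + recent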
But aside from this, if I have a large document that I want to "chat" with, it looks like I am either chunking it and then selectively retrieving relevant subsections at question time, or I am naively dumping the whole document (that now fits in 32k) and then doing the chat, at a high cost. So 32k (and increased context size in general) does not look to be a huge gamechanger in patterns of use until cost comes down by an order of magnitude or two.
Yes, that’s one example. Relatively simple to implement yourself. Any frontend for chatgpt already does the basic thing, which is to pass the previous messages along with the prompt.
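The basic thing looks roughly like this (a sketch, again assuming the openai ChatCompletion API):

    # Minimal chat loop: the API is stateless, so every request re-sends
    # the accumulated message history along with the new prompt.
    import openai

    history = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(prompt: str) -> str:
        history.append({"role": "user", "content": prompt})
        reply = openai.ChatCompletion.create(
            model="gpt-4", messages=history,
        )["choices"][0]["message"]["content"]
        history.append({"role": "assistant", "content": reply})
        return reply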
I think we may end up with first and second stage completions, where the first stage prepares the context for the second stage. The first stage can be a (tailored) gpt3.5 and the second stage can do the brainy work. That way you can actively control costs by making the first stage forward a context of a given maximum size.
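Roughly what that two-stage setup could look like (purely a sketch - the prompts and the word budget are made up):

    # First/second stage completions: a cheap model trims the context to a
    # fixed budget, then the expensive model does the brainy work on it.
    # (openai ChatCompletion API assumed; prompts are illustrative only.)
    import openai

    def first_stage(question: str, document: str, max_words: int = 1000) -> str:
        # stage 1: gpt-3.5 forwards only the passages relevant to the question
        r = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user",
                       "content": f"Quote at most {max_words} words from this "
                                  f"document that are relevant to the question "
                                  f"'{question}':\n\n{document}"}],
        )
        return r["choices"][0]["message"]["content"]

    def second_stage(question: str, context: str) -> str:
        # stage 2: gpt-4 answers using only the prepared context
        r = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user",
                       "content": f"Context:\n{context}\n\nQuestion: {question}"}],
        )
        return r["choices"][0]["message"]["content"]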
Right now all of this is people tinkering with the API. If you look at the docs you will note that it doesn’t even provide chat history or any kind of session. You have to pass all context yourself. So you’re already homebrewing that, why not add some spice.
I forget if Langchain can do that, but something along those lines will exist if it doesn't already; it's too obvious not to, and too important for the free ChatGPT equivalents which will be popping up over a matter of days/weeks now that truly free versions of llama are coming out.
TL;DR the puck should get there very soon, not just Really Soon Now.
Testing some random rust code, it's about 15-20 tokens per LoC, so about 1500-2000 LoC in a 32k context.
Interestingly, using 2-space indentation, as soon as you are about 3-4 indentation levels deep you spend as many tokens on indentation as on the actual code. For example, "log::LevelFilter::Info" is 6 tokens, same as 6 consecutive spaces. There are probably a lot of easy gains here reformatting your code to use longer lines or maybe no indentation at all.
Ah, good catch, it's actually closer to 8 tokens per LoC with GPT-4's tiktoken encoding, so about twice as good. Some quick testing suggests that's mostly down to better whitespace handling.
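Easy to reproduce with tiktoken if anyone's curious - a sketch comparing the old GPT-3 encoding against GPT-4's, on whatever Rust file you have lying around:

    # Tokens per line of code under the old GPT-3 encoding (r50k_base)
    # vs GPT-4's cl100k_base, which handles runs of spaces much better.
    # "src/main.rs" is a placeholder for any source file you want to test.
    import tiktoken

    with open("src/main.rs") as f:
        lines = [l for l in f.read().splitlines() if l.strip()]

    for name in ("r50k_base", "cl100k_base"):
        enc = tiktoken.get_encoding(name)
        tokens = sum(len(enc.encode(l)) for l in lines)
        print(f"{name}: {tokens / len(lines):.1f} tokens per LoC")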
if you wanted to send 32k tokens of code, are you able to do that using a model with a 4k context limit by spreading those tokens out across multiple messages? or does it not work that way?
Not really. The API is stateless, you pass it in a whole conversation and it responds with the next message. The entire conversation including its response is limited to 32k tokens.
i’m just confused because i thought i remembered sending long chunks of code using the api, the request will fail, but then i would split it up and then it would work okay.
i guess i’m running into a different limit (not context length), or maybe i’m misremembering
The context limit is for request + response, and there is no storage in between requests (ongoing chat interactions are done by adding prior interactions to the prompt, so the whole chat – before things start falling out of history – is limited to the context window.)
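So in practice the check is just counting tokens over the entire message list before sending, and leaving headroom for the reply (a sketch; the few tokens of per-message framing overhead are ignored):

    # The context limit covers request + response, so the whole message
    # list plus the expected completion has to fit inside it.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def fits_in_context(messages, context_limit=32_768, reserve_for_reply=1_000):
        used = sum(len(enc.encode(m["content"])) for m in messages)
        return used + reserve_for_reply <= context_limit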