
At $0.60 for 20k prompt tokens, it's not going to be cheap, so they'll need to bring the price down to get broader adoption.

As far as I can tell, the initial reading-in of the document (say a 20k-token one) is a cost you pay again on every subsequent query, since the full context has to be resent each time. If I have a 20k-token document and ask 10 follow-up prompts of 100 tokens each, that takes me to a total of 20,000 * 10 + (10 * 11)/2 * 100 = 205,500 prompt tokens, or over $6. That doesn't include completion tokens or the response history, which would edge us closer to $8 for the chat session.
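To make that arithmetic concrete, here's a back-of-the-envelope sketch in Python. The rates are the ones quoted further down in the thread; the loop just models resending the document plus the growing question history on every turn:

    # Back-of-the-envelope: resending a 20k-token document on every turn.
    DOC_TOKENS = 20_000
    QUESTION_TOKENS = 100          # per follow-up prompt
    TURNS = 10
    PROMPT_RATE = 0.03 / 1000      # $/token, per the pricing quoted below

    total = 0
    for turn in range(1, TURNS + 1):
        # Each turn resends the document plus every question asked so far.
        total += DOC_TOKENS + turn * QUESTION_TOKENS

    print(total)                          # 205500 prompt tokens
    print(round(total * PROMPT_RATE, 2))  # just over $6, before completion tokens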




What I’ve read is that people let a kind of meta-chat run alongside the client interaction. The meta channel decides which parts of the history to retain, so the primary channel doesn’t use as many resources. You could let GPT decide when it needs to see the whole context again, etc. There are a lot of interesting ways to let GPT manage its own scope, I think.


Is this anything like what LangChain supports, i.e. various strategies for chat history compression (summarisation)?

https://www.pinecone.io/learn/langchain-conversational-memor...

But aside from this, if I have a large document that I want to "chat" with, it looks like I either chunk it and selectively retrieve the relevant subsections at question time, or I naively dump the whole document (which now fits in 32k) into the prompt and chat at high cost. So 32k (and increased context size in general) doesn't look like a game changer in patterns of use until cost comes down by an order of magnitude or two.
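For what it's worth, the chunk-and-retrieve pattern is simple to sketch. This is a naive illustration, not anyone's production code: embed() is a hypothetical stand-in for whatever embedding endpoint you use, and the fixed-size word chunking is deliberately crude:

    import numpy as np

    def embed(text):
        # Hypothetical stand-in for an embedding API call
        # (e.g. an OpenAI embeddings endpoint); returns a vector.
        raise NotImplementedError

    def chunk(document, size=500):
        # Naive fixed-size chunking by words; real splitters respect sentences.
        words = document.split()
        return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

    def top_k_chunks(document, question, k=3):
        chunks = chunk(document)
        vecs = [embed(c) for c in chunks]
        q = embed(question)
        # Rank chunks by cosine similarity to the question.
        scores = [float(v @ q) / (np.linalg.norm(v) * np.linalg.norm(q)) for v in vecs]
        best = sorted(np.argsort(scores)[-k:])
        return [chunks[i] for i in best]

    # The prompt then carries only the retrieved chunks, not all 20k tokens:
    # context = "\n---\n".join(top_k_chunks(doc, question))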


Yes, that’s one example. Relatively simple to implement yourself. Any frontend for ChatGPT already does the basic thing, which is to pass the previous messages along with the prompt.

I think we may end up with first- and second-stage completions, where the first stage prepares the context for the second. The first stage can be a (tailored) GPT-3.5 and the second stage can do the brainy work. That way you can actively control costs by having the first stage forward a context of a given maximum size.
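A rough sketch of that two-stage idea using the openai Python client; the prompts, model choices, and size threshold are all made up for illustration:

    import openai  # pip install openai

    MAX_CONTEXT_CHARS = 8_000  # illustrative budget, not a real model limit

    def chat(model, messages):
        resp = openai.ChatCompletion.create(model=model, messages=messages)
        return resp["choices"][0]["message"]["content"]

    def answer(history, question):
        context = "\n".join(m["content"] for m in history)
        if len(context) > MAX_CONTEXT_CHARS:
            # Stage 1: a cheap model compresses the history to a fixed budget.
            context = chat("gpt-3.5-turbo", [{
                "role": "user",
                "content": "Summarise this conversation in under 500 words:\n" + context,
            }])
        # Stage 2: the expensive model does the brainy work on the small context.
        return chat("gpt-4", [
            {"role": "system", "content": "Conversation so far:\n" + context},
            {"role": "user", "content": question},
        ])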


Is this a native feature, or something home-brewed? If the latter, how would I use it?


Right now all of this is people tinkering with the API. If you look at the docs you’ll note that it doesn’t even provide chat history or any kind of session; you have to pass all context yourself. So you’re already homebrewing that, why not add some spice.
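Concretely, the homebrewed "session" is just an ever-growing list of messages that you resend on every call. A minimal sketch (the model name and system prompt are arbitrary):

    import openai

    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(prompt):
        messages.append({"role": "user", "content": prompt})
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,  # the full history goes up with every call
        )
        reply = resp["choices"][0]["message"]["content"]
        messages.append({"role": "assistant", "content": reply})
        return reply

This is also why the cost compounds: every token in that list is billed again as a prompt token on each call.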


I forget whether LangChain can do that, but something along those lines will exist if it doesn't already. It's too obvious not to, and too important for the free ChatGPT equivalents that will be popping up over a matter of days/weeks now that truly free versions of LLaMA are coming out.

TL;DR the puck should get there very soon, not just Really Soon Now.


> At $0.60 for 20k prompt tokens

Is it $0.60 even for the input tokens?


Yes, it's $0.03/1K for prompt and $0.06/1K for response, which works out to $0.60/20K for prompt and $1.20/20K for response.



