
At $0.60 for 20k prompt tokens, it's not going to be cheap, so they'll need to bring the price down to get broader adoption.

As far as I can tell, the initial reading-in of the document (say a 20k-token one) is a cost you pay again on every subsequent query, since the full context has to be resent each time. If I have a 20k-token document and ask 10 follow-up prompts of 100 tokens each, that takes me to a total of 20,000 * 10 + (10 * 11)/2 * 100 = 205,500 prompt tokens, or over $6. That doesn't include completion tokens or the response history, which would edge us closer to $8 for the chat session.
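To make that arithmetic concrete, here's a back-of-the-envelope sketch in Python. The rates are the ones quoted further down in the thread; the loop just models resending the document plus the growing question history on every turn:

    # Back-of-the-envelope: resending a 20k-token document on every turn.
    DOC_TOKENS = 20_000
    QUESTION_TOKENS = 100          # per follow-up prompt
    TURNS = 10
    PROMPT_RATE = 0.03 / 1000      # $/token, per the pricing quoted below

    total = 0
    for turn in range(1, TURNS + 1):
        # Each turn resends the document plus every question asked so far.
        total += DOC_TOKENS + turn * QUESTION_TOKENS

    print(total)                          # 205500 prompt tokens
    print(round(total * PROMPT_RATE, 2))  # just over $6, before completion tokens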




What I’ve read is that people let a kind of meta-chat run alongside the client interaction. The meta channel decides which parts of the history to retain, so the primary channel doesn’t use as many resources. You could let GPT decide when it needs to see the whole context again, etc. There are a lot of interesting ways to let GPT manage its own scope, I think.


Is this anything like what LangChain supports, i.e. various strategies for chat history compression (summarisation)?

https://www.pinecone.io/learn/langchain-conversational-memor...

But aside from this, if I have a large document that I want to "chat" with, it looks like I either chunk it and selectively retrieve the relevant subsections at question time, or I naively dump the whole document (which now fits in 32k) into the prompt and chat at high cost. So 32k (and increased context size in general) doesn't look like a game changer in patterns of use until cost comes down by an order of magnitude or two.
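For what it's worth, the chunk-and-retrieve pattern is simple to sketch. This is a naive illustration, not anyone's production code: embed() is a hypothetical stand-in for whatever embedding endpoint you use, and the fixed-size word chunking is deliberately crude:

    import numpy as np

    def embed(text):
        # Hypothetical stand-in for an embedding API call
        # (e.g. an OpenAI embeddings endpoint); returns a vector.
        raise NotImplementedError

    def chunk(document, size=500):
        # Naive fixed-size chunking by words; real splitters respect sentences.
        words = document.split()
        return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

    def top_k_chunks(document, question, k=3):
        chunks = chunk(document)
        vecs = [embed(c) for c in chunks]
        q = embed(question)
        # Rank chunks by cosine similarity to the question.
        scores = [float(v @ q) / (np.linalg.norm(v) * np.linalg.norm(q)) for v in vecs]
        best = sorted(np.argsort(scores)[-k:])
        return [chunks[i] for i in best]

    # The prompt then carries only the retrieved chunks, not all 20k tokens:
    # context = "\n---\n".join(top_k_chunks(doc, question))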


Yes, that’s one example. Relatively simple to implement yourself. Any frontend for ChatGPT already does the basic thing, which is to pass the previous messages along with the prompt.

I think we may end up with first- and second-stage completions, where the first stage prepares the context for the second. The first stage can be a (tailored) GPT-3.5 and the second stage can do the brainy work. That way you can actively control costs by having the first stage forward a context of a given maximum size.
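A rough sketch of that two-stage idea using the openai Python client; the prompts, model choices, and size threshold are all made up for illustration:

    import openai  # pip install openai

    MAX_CONTEXT_CHARS = 8_000  # illustrative budget, not a real model limit

    def chat(model, messages):
        resp = openai.ChatCompletion.create(model=model, messages=messages)
        return resp["choices"][0]["message"]["content"]

    def answer(history, question):
        context = "\n".join(m["content"] for m in history)
        if len(context) > MAX_CONTEXT_CHARS:
            # Stage 1: a cheap model compresses the history to a fixed budget.
            context = chat("gpt-3.5-turbo", [{
                "role": "user",
                "content": "Summarise this conversation in under 500 words:\n" + context,
            }])
        # Stage 2: the expensive model does the brainy work on the small context.
        return chat("gpt-4", [
            {"role": "system", "content": "Conversation so far:\n" + context},
            {"role": "user", "content": question},
        ])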


Is this a native feature, or something home-brewed? If the latter, how would I use it?


Right now all of this is people tinkering with the API. If you look at the docs you’ll note that it doesn’t even provide chat history or any kind of session; you have to pass all context yourself. So you’re already homebrewing that, why not add some spice.
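Concretely, the homebrewed "session" is just an ever-growing list of messages that you resend on every call. A minimal sketch (the model name and system prompt are arbitrary):

    import openai

    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    def ask(prompt):
        messages.append({"role": "user", "content": prompt})
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=messages,  # the full history goes up with every call
        )
        reply = resp["choices"][0]["message"]["content"]
        messages.append({"role": "assistant", "content": reply})
        return reply

This is also why the cost compounds: every token in that list is billed again as a prompt token on each call.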


I forget whether LangChain can do that, but something along those lines will exist if it doesn't already. It's too obvious not to, and too important for the free ChatGPT equivalents that will be popping up over a matter of days/weeks now that truly free versions of LLaMA are coming out.

TL;DR the puck should get there very soon, not just Really Soon Now.


> At $0.60 for 20k prompt tokens

Is it $0.60 even for the input tokens?


Yes, it's $0.03/1K for prompt and $0.06/1K for response, which works out to $0.60/20K for prompt and $1.20/20K for response.



