
You might be interested in "Text Embeddings Reveal (Almost) As Much As Text":

> We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes.

https://arxiv.org/pdf/2310.06816.pdf

There's certainly information loss, but a lot of information is still present.



Yeah, that paper is what I was thinking about. https://simonwillison.net/2024/Jan/8/text-embeddings-reveal-...

“a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly”.
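To make that loop concrete, here's a very rough sketch of the iterative correct-and-re-embed idea in plain NumPy. The real system pairs a frozen neural text encoder with a trained corrector model conditioned on the target embedding; the toy encoder, tiny vocabulary, and greedy search below are all made-up placeholders for illustration, not the paper's actual components:

    import numpy as np

    # Toy stand-ins: the paper uses a frozen neural encoder and a trained
    # seq2seq "corrector"; here the encoder is a position-weighted bag of
    # random vectors and the corrector is a greedy search over a tiny vocab.

    VOCAB = ["the", "patient", "was", "seen", "by", "dr", "smith", "today"]
    DIM = 32
    rng = np.random.default_rng(0)
    TOKEN_VECS = {w: rng.normal(size=DIM) for w in VOCAB}

    def embed(tokens):
        # Position-weighted average so word order matters (toy encoder).
        return sum((i + 1) * TOKEN_VECS[t] for i, t in enumerate(tokens)) / len(tokens)

    def correct(hypothesis, target_emb):
        # One pass of greedy per-position edits: re-embed each candidate and
        # keep whichever token lands closest to the target embedding.
        hyp = list(hypothesis)
        for i in range(len(hyp)):
            hyp[i] = min(VOCAB, key=lambda w: np.linalg.norm(
                embed(hyp[:i] + [w] + hyp[i + 1:]) - target_emb))
        return hyp

    def invert(target_emb, length, max_steps=25):
        # Iteratively correct and re-embed until the hypothesis reproduces
        # the target embedding (the learned corrector plays this role in the paper).
        hypothesis = [VOCAB[0]] * length  # crude initial guess
        for _ in range(max_steps):
            hypothesis = correct(hypothesis, target_emb)
            if np.allclose(embed(hypothesis), target_emb, atol=1e-8):
                break
        return hypothesis

    secret = ["patient", "seen", "by", "dr", "smith"]
    print("secret:   ", secret)
    print("recovered:", invert(embed(secret), length=len(secret)))

Greedy search only gets away with this because the toy embedding is tiny and linear; the point is just the control flow the paper describes: guess some text, embed it, compare to the target vector, revise, and repeat.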



