Ironically, pitting an LLM (ideally a completely different model) up against what... | Hacker News


embedding-shape 2 days ago | on: So you wanna build a local RAG?

Ironically, pitting an LLM (ideally a completely different model) against what you're testing, and letting it write "out of the ordinary" human-style queries to use as test cases, tends to work well too, if you don't have kids you can use as a free workforce :)

scosman 2 days ago

I built a system to do exactly this: https://docs.kiln.tech/docs/evaluations/evaluate-rag-accurac...

Basically it:

- iterates over your docs to find knowledge specific to the content

- generates hundreds of pairs of [synthetic query, correct answer]

- evaluates different RAG configurations for recall
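For anyone who wants to see the shape of that loop, here is a rough Python sketch. It is not Kiln's code or API: every name in it (`QAPair`, `generate_pairs`, `recall_at_k`, the two toy retrievers) is made up for illustration, the "LLM" is a deterministic stub so the example runs offline, and recall is assumed to mean "the doc the pair came from shows up in the top k results".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; not taken from any real library.
Retriever = Callable[[str, int], List[str]]      # (query, k) -> ranked doc ids
PairWriter = Callable[[str], Tuple[str, str]]    # doc text -> (query, answer)


@dataclass
class QAPair:
    query: str
    answer: str
    source_id: str  # id of the doc the pair was generated from


def generate_pairs(docs: Dict[str, str], write_pair: PairWriter) -> List[QAPair]:
    """Iterate over the docs and ask the writer for one (query, answer) per doc."""
    return [QAPair(*write_pair(text), source_id=doc_id) for doc_id, text in docs.items()]


def recall_at_k(pairs: List[QAPair], retrieve: Retriever, k: int) -> float:
    """Fraction of pairs whose source doc appears in the top-k retrieved ids."""
    if not pairs:
        return 0.0
    hits = sum(1 for p in pairs if p.source_id in retrieve(p.query, k))
    return hits / len(pairs)


def stub_writer(text: str) -> Tuple[str, str]:
    """Stand-in for a real LLM call: the first three words become the query."""
    return " ".join(text.split()[:3]), text


def overlap_retriever(docs: Dict[str, str]) -> Retriever:
    """Toy config A: rank docs by how many words they share with the query."""
    def retrieve(query: str, k: int) -> List[str]:
        q = set(query.lower().split())
        ranked = sorted(docs, key=lambda d: (-len(q & set(docs[d].lower().split())), d))
        return ranked[:k]
    return retrieve


def first_k_retriever(docs: Dict[str, str]) -> Retriever:
    """Toy config B: ignore the query and return the first k ids (a bad baseline)."""
    return lambda query, k: sorted(docs)[:k]


DOCS = {
    "a": "Invoices are archived after ninety days",
    "b": "Refunds require a signed approval form",
    "c": "Passwords rotate every six months",
}
PAIRS = generate_pairs(DOCS, stub_writer)
CONFIGS = {"word_overlap": overlap_retriever(DOCS), "first_k": first_k_retriever(DOCS)}
SCORES = {name: recall_at_k(PAIRS, retrieve, k=1) for name, retrieve in CONFIGS.items()}

if __name__ == "__main__":
    for name, score in SCORES.items():
        print(f"{name}: recall@1 = {score:.2f}")
```

In a real setup you would swap `stub_writer` for a call to a different model than the one under test, and swap the toy retrievers for your actual RAG configurations (chunk size, embedding model, and so on); the scoring loop stays the same.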
