Hacker News | jlopes2's comments

This feels a little too high-level for me; the leap from what LLMs do today to a provable set of theorems is rough.

The community is having large debates over whether an LLM can reason outside of its training data. That question feels ignored here.


Let’s see the code. I'm a bit skeptical that this hasn't overcomplicated things architecturally. I'd like clearer architecture diagrams: what prompts exist, what tool calls are made, and what gets updated.


I just did a talk with Jerry from LlamaIndex earlier this week. https://www.youtube.com/watch?v=eLXivBehPGo

Included here is a bit of the old tried-and-true: NDCG, MRR, and Precision@k, which are what you really want for measuring your information retrieval systems.
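For anyone who hasn't implemented these, here is a minimal sketch of the three metrics in plain Python. It assumes each query's results are given as a list of relevance labels in rank order (0 = not relevant, higher = more relevant). One simplification to note: the "ideal" DCG below is computed from the labels of the retrieved list only, whereas a stricter NDCG would use every relevant document in the corpus.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (label > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def reciprocal_rank(relevances: Sequence[int]) -> float:
    """1 / rank of the first relevant result, or 0.0 if nothing is relevant."""
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Sequence[Sequence[int]]) -> float:
    """MRR: reciprocal rank averaged over a set of queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain: graded relevance with a log2(rank + 1) discount."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the same labels sorted best-first (simplified IDCG)."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    ranked = [0, 2, 1, 0, 1]  # graded labels for one query's results, in rank order
    print(precision_at_k(ranked, 3))
    print(reciprocal_rank(ranked))
    print(ndcg_at_k(ranked, 5))
```

Precision@k ignores order within the top k, MRR only cares about the first hit, and NDCG rewards putting the most relevant results highest, which is why it's common to track all three.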

But we also talk through a bit of the "new": how to use Evals to generate the building blocks for those metrics, i.e. the relevance labels they are computed from. In the end you will want both hand labels and automated Evals to evaluate your system.
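A rough sketch of using both label sources together, under stated assumptions: the automated Eval is assumed to emit a binary relevance label per (query, document) pair, a smaller hand-labeled set is used to check how often it agrees, and the same metric is computed from each source. The `Labels` layout and function names are hypothetical, not the API of LlamaIndex or any eval library, and the LLM call that produces the automated labels is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (query_id, doc_id) -> binary relevance label (1 = relevant).
Labels = Dict[Tuple[str, str], int]


def agreement_rate(hand: Labels, auto: Labels) -> float:
    """Share of hand-labeled pairs where the automated eval gives the same label."""
    shared = [key for key in hand if key in auto]
    if not shared:
        raise ValueError("no overlapping (query, doc) pairs to compare")
    return sum(hand[key] == auto[key] for key in shared) / len(shared)


def precision_at_k(ranked_docs: List[str], query_id: str, labels: Labels, k: int) -> float:
    """Precision@k for one query; documents with no label count as not relevant."""
    return sum(labels.get((query_id, doc), 0) for doc in ranked_docs[:k]) / k


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3"]  # retriever output for query "q1"
    hand = {("q1", "d1"): 1, ("q1", "d2"): 0}  # human labels (subset)
    auto = {("q1", "d1"): 1, ("q1", "d2"): 1, ("q1", "d3"): 0}  # eval labels (all)
    print(agreement_rate(hand, auto))
    print(precision_at_k(ranking, "q1", hand, 3))
    print(precision_at_k(ranking, "q1", auto, 3))
```

If agreement on the hand-labeled subset is low, metrics computed from the automated labels will drift from what a human would report, so the hand labels act as a calibration check rather than the whole dataset.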


Awesome overview of data prep, but I'm probably a bit biased.

