Hacker News

hey, author (not op) here. we do do semantic chunking! I may have given the impression that we don't because the post mentions aggregating context, but I tested this with questions that required aggregating context from 15+ documents (meaning roughly twice that many chunks), hence that comment in the post!
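For readers unfamiliar with the term, here is a minimal sketch of one common form of semantic chunking: split text into sentences and start a new chunk whenever adjacent sentences look unrelated. This is not the author's pipeline. The `embed` function is a hypothetical stand-in (a bag-of-words count vector) for a real embedding model, and the `0.2` threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    # Naive sentence split: break after . ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        # Low similarity between neighbours is treated as a topic shift.
        if cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])
        else:
            chunks[-1].append(cur)
    return [" ".join(c) for c in chunks]
```

For example, `semantic_chunks("Cats purr when cats are happy. Happy cats sleep often. Rust has a borrow checker. The borrow checker enforces ownership.")` yields two chunks, one about cats and one about Rust. A real system would swap in model embeddings and tune the threshold per corpus.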




Is there a way to convert documents into a hierarchical, connected graph whose nodes reference each other, similar to how we link notes in personal knowledge tools like Obsidian, and then traverse that graph? Is that exactly what the GraphRAG technique is trying to do?
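As a rough illustration of the Obsidian-style part of the question, the sketch below builds a link graph from `[[wikilink]]` references between notes and collects every note within a few hops of a starting note. It is plain link-following, not what GraphRAG does internally; `build_graph`, `neighborhood`, and the choice to treat links as undirected (so backlinks count) are assumptions for illustration.

```python
import re
from collections import deque

# Captures the target in [[Target]], [[Target|alias]] and [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def build_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Hypothetical helper: note name -> set of linked note names."""
    graph = {name: set() for name in notes}
    for name, body in notes.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target in notes:  # ignore links to notes that don't exist
                graph[name].add(target)
                graph[target].add(name)  # undirected: backlinks count too
    return graph


def neighborhood(graph: dict[str, set[str]], start: str, max_hops: int = 2) -> dict[str, int]:
    """Breadth-first search returning {note: hop distance} up to max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen
```

The notes reached this way could then be pulled in as extra context for whatever chunk matched the query, with hop distance as a crude relevance signal.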

Not exactly what you’re looking for, but Wilson Lin’s search engine builds a graph from the DOM for context. Here’s his write-up: https://blog.wilsonl.in/search-engine/
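One simple way to use page structure as context (an assumption here, not a description of Wilson Lin's implementation) is to pair each paragraph with the path of `h1`..`h3` headings that precede it. The sketch below does that with the standard library's `html.parser`; `HeadingContextParser` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class HeadingContextParser(HTMLParser):
    """Hypothetical helper: pairs each <p> with its preceding h1..h3 heading path."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3}

    def __init__(self):
        super().__init__()
        self.path = {}        # heading level -> heading text
        self.current = None   # tag whose text is being collected
        self.buf = []
        self.results = []     # list of (heading path, paragraph text)

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self.current, self.buf = tag, []

    def handle_data(self, data):
        if self.current:
            self.buf.append(data)

    def handle_endtag(self, tag):
        if tag != self.current:
            return  # e.g. </b> inside a paragraph
        text = " ".join("".join(self.buf).split())
        if tag == "p":
            context = [self.path[lvl] for lvl in sorted(self.path)]
            self.results.append((" > ".join(context), text))
        else:
            level = self.HEADINGS[tag]
            # A new heading replaces any heading at its level or deeper.
            self.path = {lvl: t for lvl, t in self.path.items() if lvl < level}
            self.path[level] = text
        self.current = None
```

Feeding it `<h1>Guide</h1><h2>Install</h2><p>Run <b>pip</b> install.</p>` produces `("Guide > Install", "Run pip install.")`, so the heading path can be prepended to the chunk before embedding.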

Ah, so you’re generating context for your chunks from multiple docs? How do you decide which docs get aggregated?

I haven’t seen a better answer than “vibes” here, especially with data spanning multiple domains.

I mean, as long as they're not too long, I suppose you could use just about any heuristic for grouping sources. It just seems like it would be hard to generate succinct context if you get the grouping wrong.
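To make "just about any heuristic" concrete, here is one assumed heuristic among many: greedily put each document into the most similar existing group by word-set Jaccard overlap, but only if the group stays under a character budget (the "not too long" condition). `group_sources`, the `0.3` threshold, and the `2000`-character cap are all hypothetical choices, not anything described in the thread.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_sources(docs: dict[str, str], threshold: float = 0.3, max_chars: int = 2000) -> list[list[str]]:
    """Hypothetical greedy grouping: docs with overlapping vocabulary share a group."""
    groups = []  # each group: {"ids": [...], "tokens": set, "chars": int}
    for doc_id, text in docs.items():
        toks = tokens(text)
        best, best_score = None, -1.0
        for g in groups:
            score = jaccard(toks, g["tokens"])
            fits = g["chars"] + len(text) <= max_chars  # keep groups short
            if score >= threshold and score > best_score and fits:
                best, best_score = g, score
        if best is None:
            groups.append({"ids": [doc_id], "tokens": set(toks), "chars": len(text)})
        else:
            best["ids"].append(doc_id)
            best["tokens"] |= toks
            best["chars"] += len(text)
    return [g["ids"] for g in groups]
```

The result depends on input order and on the threshold, which is part of why a bad grouping is easy to produce; swapping Jaccard for embedding similarity keeps the same structure.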


