
Not sure, I think there is a lot of research being done here.

Actually, Browser Use works quite well with vision turned off; it just sometimes gets stuck on trivial vision tasks. The interesting thing is that the screenshot approach is often cheaper than cleaned-up HTML, because some websites have HUGE action spaces.
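To make the cost point concrete, here is a rough sketch with made-up numbers. The flat screenshot cost, the characters-per-token ratio, and the helper names (`html_tokens`, `cheaper_input`, `make_page`) are all assumptions for illustration, not measurements from Browser Use or any real model pricing.

```python
# Hypothetical numbers throughout; none of these come from real measurements.
CHARS_PER_TOKEN = 4        # rough rule of thumb for text tokenization
SCREENSHOT_TOKENS = 1500   # assumed flat token cost of one screenshot


def html_tokens(elements):
    """Estimate the token count of a list of serialized interactive elements."""
    return sum(len(e) for e in elements) // CHARS_PER_TOKEN


def cheaper_input(elements, screenshot_tokens=SCREENSHOT_TOKENS):
    """Return 'html' or 'screenshot', whichever has the lower estimated cost."""
    if html_tokens(elements) <= screenshot_tokens:
        return "html"
    return "screenshot"


def make_page(n):
    """Build a fake page with n clickable links (the 'action space')."""
    return [f'<a id="{i}" href="/item/{i}">Item {i}</a>' for i in range(n)]


small_page = make_page(20)    # a handful of actions: HTML is tiny
huge_page = make_page(5000)   # a huge action space: HTML cost grows linearly

print(cheaper_input(small_page))
print(cheaper_input(huge_page))
```

The HTML cost grows with the number of interactive elements, while the screenshot cost stays roughly flat under these assumptions, so past some page size the screenshot wins.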

We looked at some papers (like Ferret-UI), but I think we can do much better on HTML tasks. Also, there is a lot of room to improve the current pipeline.



Would be really cool if you could tie this into Claude's computer use APIs!


Do you think they do any super fancy magic beyond, for example, how Ferret-UI classifies UI elements? It could be very interesting to test head to head how much better you can make Computer Use by adding HTML (it's much better from our quick testing, we just don't know the numbers).


My wallet just ran away. Come back, you!


Haha, with Browser Use or Computer Use?



