Good tool to test the new capability. Thanks for sharing.
My limited testing has produced an okay result for a trivial use case and very disappointing results for a simple one.
Trivial: what is the time?
Claude: took a screenshot and read the time off the bottom right.
Cost: $0.02
Simple: download a high-resolution image of the Singapore skyline and set it as the desktop wallpaper.
Claude: the description of steps looks plausible, but the actions are wild and all over the place. It somehow opened the National Park Service website, and the only other action it managed was right-clicking a couple of times. Failed!
Cost: $0.37
Long way to go before it can be used even for hobby use cases, I feel.
PS: is it possible that the screenshots include an image of Agent.exe itself, and that this is somehow creating a poor feedback loop?