Anyone can test that Microsoft's software sends the clicks back to Microsoft, although I believe Microsoft sends the data back by SSL, so it's harder to verify even that than you'd expect.
Google's search results are blocked in robots.txt, so I don't believe Bing has been able to crawl our search results directly. All the evidence points to users' clicks on Google, which are then sent to Microsoft.
Microsoft has (so far) declined to say whether our allegation is true. Getting them to talk about exactly what they do and what software they use or don't use would be the easiest way to settle this. I'd like them to confirm or deny, which is why I wanted to go to this search panel later today and ask them.
> so I don't believe Bing has been able to crawl our search results directly
Isn't compliance with robots.txt more of a voluntary thing?
I'm not accusing MS of ignoring it when convenient, but if you/we/someone is accusing them of acting unethically wrt search results in the first place, telling the crawler to ignore robots.txt wouldn't be that far away, would it? (And likewise faking the user-agent, etc.)
For better or for worse, UA identification, robots.txt compliance - all those things are voluntary. I'm not suggesting they shouldn't be, but it certainly makes a difference in terms of whether something's possible or not. (And, if you ask me, it places an even higher obligation on the actors to behave ethically, lest trust completely evaporate and the whole thing go to hell in a handbasket.)
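To make the "voluntary" point concrete, here is a minimal sketch using Python's standard urllib.request. The User-Agent header is just a string the client picks for itself; the bot name and URLs below are made up for illustration, and nothing is actually sent over the network.

```python
import urllib.request


def build_request(url, user_agent):
    """Build (but do not send) a request with a self-declared User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical crawler that identifies itself honestly.
honest = build_request("http://example.com/", "ExampleBot/1.0 (+http://example.com/bot)")

# The same client claiming to be an ordinary browser. Nothing in HTTP prevents this.
disguised = build_request("http://example.com/", "Mozilla/5.0 (Windows NT 6.1)")

# urllib stores header names capitalized, hence "User-agent".
print(honest.get_header("User-agent"))
print(disguised.get_header("User-agent"))
```

The server only ever sees whatever string the client chose to send, which is why UA identification rests on trust.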
I am not a lawyer, but as I understand it there is some precedent in the US of intentionally ignoring robots.txt being unauthorized computer access, exposing you to all the liability that entails (possibly criminal).
The Internet Archive was sued unsuccessfully. As I understand it, a lawsuit against Google on the topic is still in progress. So I guess the precedent is weaker than I thought, but still: tread carefully.
If you read the article and other comments here it's been made perfectly clear that the Google toolbar and Chrome browser are not sending similar data back to Google.
Ah, but the Google Toolbar, at least, does. If you enable PageRank on the Google Toolbar, it sends back all the URLs you visit, just like the Bing toolbar.
From the toolbar privacy policy: "Toolbar's enhanced features, such as PageRank and Sidewiki, operate by sending Google the addresses and other information about sites at the time you visit them."
Google has managed to demonstrate one way MS appears to be using the data. What does Google do with their trove of data? That's a lot of data to collect and not do anything with.
If they want to make it perfectly clear, they should add that to their privacy policies and EULAs.
Yes absolutely. I don't think anyone in this thread or in the article denied that the Google Toolbar sends data to Google. And you are absolutely right that Google's use of the data collected should be clearly stated in a privacy policy and EULA. It might be, I haven't read them.
I did not say "similar data" because "similar" is a bit too slippery a word in a technical context. There's too much plausible deniability. What I am asking is whether Google's tools send data back to the Googleplex to be mined for the sake of search engine improvements.
Quote from the article: "In fact, Google stressed that the only information that flows back at all from Chrome is what people are searching for from within the browser, if they are using Google as their search engine."
I'm pretty positive that's not true. If you run Fiddler when browsing with Chrome you will see constant hits to toolbarqueries.clients.google.com whether you're using Google or not. I could be browsing some MS site and toolbarqueries.clients.google.com gets hit. Chromium doesn't do this.
Edit: You can uncheck everything under privacy and it will still send those requests.
Edit2: What it sends back looks something like this:
Looks like auto-fill data, but this happens when I click around a site, NOT when searching Google or typing something in the address bar. For some sites (interestingly, not all) it sends 3 requests for each page load.
That's troubling. I'd be very interested in seeing a response from Google about this. Are you aware of any? Also, can you use Fiddler to inspect the content of the requests? I'm not familiar with the tool.
I see this too, if I have autofill enabled, and at least one autofill address entry.
I would guess that Chrome is sending a hash of the <form> (perhaps URL + method?), plus a hash of each of the <input> tags, and Google returns some sort of information about what kind of form it is?
If so, it would mean it's pretty easy for Google to determine which sites you're on from the pattern of hashes sent for each site. e.g. I see this data sent in the clear for pretty much every page on https://www.facebook.com/
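A rough sketch of that guess, and of why it would leak the site: this is purely hypothetical and not Chrome's actual scheme. The hash function, the 8-character truncation, and the field names are all assumptions made up for illustration.

```python
import hashlib


def short_hash(text):
    """Truncated SHA-1 hex digest; the choice of hash is an assumption."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def form_fingerprint(action_url, method, field_names):
    """Hash the form (URL + method) and each <input> name separately."""
    form_sig = short_hash(f"{method.upper()} {action_url}")
    field_sigs = tuple(short_hash(name) for name in field_names)
    return (form_sig, field_sigs)


# What a client might send for a login form (field names are invented).
sent = form_fingerprint("https://www.facebook.com/login.php", "post", ["email", "pass"])

# A server that has seen the same form before can map the hashes back to
# the site, even though no URL was sent in the clear.
known_forms = {
    form_fingerprint("https://www.facebook.com/login.php", "post", ["email", "pass"]):
        "www.facebook.com",
    form_fingerprint("https://example.com/signup", "post", ["name", "email"]):
        "example.com",
}

print(known_forms.get(sent))
```

The hashes are deterministic, so hashing alone gives no privacy here: anyone with a table of known forms can reverse the lookup.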
>I believe Microsoft sends the data back by SSL, so it's harder to verify even that than you'd expect.
Please. Adding my own SSL cert to my own laptop is not harder than I'd expect. Certainly not harder than many other things you did in setting up this experiment.
poacher69, we crawl the public web. Anyone that blocks us out with robots.txt, we won't crawl. If you check bing.com/robots.txt, it has "Disallow: /search". So no, we won't crawl Bing's search results pages. If anything, users tend to complain when search results from Lycos or wherever show up in Google.
I was gonna call out Matt for crawling Bing's search results, but I'm guessing Microsoft hasn't realized they return results from the /Search/ folder. ;)
From my experience, Googlebot doesn't crawl pages that are blocked in robots.txt files. Check out Bing's robots.txt: http://bing.com/robots.txt - notice how /search is disallowed. That typically means that Googlebot isn't able to access that page. The same goes for the other search engines: it comes down to whether they specify (through robots.txt) that Googlebot isn't allowed to crawl those results.
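For anyone who wants to check this kind of rule themselves, here is a small sketch with Python's standard urllib.robotparser. The robots.txt text below is a stand-in containing only the single rule quoted in this thread, not a copy of Bing's real file, and the helper name is made up.

```python
from urllib.robotparser import RobotFileParser

# Stand-in robots.txt: only the one rule quoted above, for illustration.
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a compliant crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "http://bing.com/search?q=test")
other_page = is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "http://bing.com/maps")
# Path matching in this parser is a case-sensitive prefix match, so a
# differently cased path is not covered by the rule.
capitalized = is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "http://bing.com/Search/?q=test")

print(blocked, other_page, capitalized)
```

Note that this only tells you what a well-behaved crawler should do; the check itself runs on the crawler's side.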