Is arguing against big data a convenient way to argue for an increased bottom line?

It seems to be Apple and Microsoft against Google. The “concern” expressed by the unlikely pairing of Apple and Microsoft is that Google collects and, as I understand the concerns, shares personal data for pay. Google, in response, argues that it uses the information it collects to improve the search experience.

While this sounds like a disagreement over a principle, the positions taken align with business interests. Google makes money from advertising. Apple makes money from hardware. Microsoft makes money from software. The income foci of these companies have evolved, and this may have something to do with the positions now taken on privacy. Google offers software and services for free partly to increase use of the web and partly as a way to offer more ads and collect more data. Google also offers services that decrease the importance of hardware. Chrome hurts the hardware sales of Apple.

What I think is important under these circumstances is a clear public understanding of what data are being collected, how they are being used, and what the motives of the players involved are. It turns out we are all players too, because blocking ads while still accepting services (a consequence of modifying browsers) involves personal decisions about what constitutes ethical behavior.

Into this business struggle, and the way it has been spun, comes a recent “study” from Tim Wu and colleagues. Evaluation of the study is complicated by the funding source: Yelp. Yelp has long argued that its results should appear higher in Google searches and suggests that Google elevates the results of Google services instead. Clearly, you or I could go directly to Yelp when searching for local information, ignoring Google completely (this is what I do when searching for restaurants), but Yelp wants more.

I have a very small stake in Google ads (making probably $3-4 a year), but I am more interested in the research methodology employed in this case. My own background as an educational researcher involved the reading and evaluation of many research studies. Experience as an educational researcher is relevant here because many educational studies are conducted in the field rather than the laboratory and this work does not allow the tight controls required for simple interpretation. We are used to evaluating “methods” and the capacity of methods to rule out alternative explanations. Sometimes, multiple interpretations are possible and it is important to recognize these cases.

Take a look at the “methods” section from the study linked above. It is a little difficult to follow, but it seems the study contrasts two sets of search results.

The Method and the data:
The method involved a comparison of “search results” consisting of a) Google organic search results or b) Google organic search results and Google local “OneBox” links (7 links for local services with additional information provided by Google). The “concern” here is that condition “b” contains results that benefit Google.

The results showed that condition “b” generated fewer clicks.
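To make concrete what a comparison of click rates involves, here is a minimal Python sketch of a two-proportion z-test. The counts are hypothetical and chosen only for illustration; they are not taken from the study, the function name is my own, and I am not claiming this is the analysis the authors ran.

```python
import math


def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Compare click rates for two conditions.

    Returns (rate_a, rate_b, z, two_sided_p) using the standard
    pooled two-proportion z-test.
    """
    rate_a = clicks_a / shown_a
    rate_b = clicks_b / shown_b
    # Pooled click rate under the null hypothesis of no difference.
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    if se == 0:
        raise ValueError("click rates of exactly 0 or 1 leave nothing to test")
    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p


# Hypothetical counts, NOT data from the study:
# condition a = organic results only, condition b = organic plus OneBox.
rate_a, rate_b, z, p = two_proportion_z_test(450, 1000, 400, 1000)
print(f"a: {rate_a:.3f}  b: {rate_b:.3f}  z: {z:.2f}  p: {p:.4f}")
```

A small p-value here would say only that the click rates differ. It would not say why users clicked less often in one condition, which is the question that matters for interpreting the study.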

Here is a local search showing both the OneBox results (red box) and organic results from a Minneapolis search I conducted for pizza. What you see is what I could see on my Air. Additional content could be scrolled up.

[Screenshot: gsearch]

The conclusion:

The results demonstrate that consumers vastly prefer the second version of universal search. Stated differently, consumers prefer, in effect, competitive results, as scored by Google’s own search engine, than results chosen by Google. This leads to the conclusion that Google is degrading its own search results by excluding its competitors at the expense of its users. The fact that Google’s own algorithm would provide better results suggests that Google is making a strategic choice to display their own content, rather than choosing results that consumers would prefer.

Issues I see:

The limited range of searches in the study. The searches are relevant to Yelp, whose business model is focused on local services, but do the findings generalize to other types of search?

What does the difference in click frequency mean? Does the difference indicate, as the conclusion claims, that the search results provide an inferior experience for the user? Are there other interpretations? For example, the logic of Google’s “I’m Feeling Lucky” button, and of Google search in general, is that many clicks indicate an inferior algorithm. Is it possible that the position of the OneBox, rather than the information returned, is the issue? This might be a bias, but the quality of the organic search would not be the issue.

How would this method feed into resolution of the larger question (is the collection of personal information to be avoided)? This connection to me is unclear. Google could base search on data points that are not personal (page rank). A comparison of search results based on page rank vs. page rank and personal search history would be more useful, but that is not what we have here.

How would you conduct a study to evaluate the “quality” concern?

Wired

Search Engine Land

Fortune

Time