## Wednesday, December 7, 2011

### Research and metrics: an example with "erotic content", "sexual interests", and keywords such as sex, youth, breasts, vaginas, etc.

Bad Research: Popular Sex Search Terms

Researchers Ogi Ogas and Sai Gaddam recently published a book, A Billion Wicked Thoughts,
detailing their analysis of 400 million searches they collected from the Dogpile search engine.
Of those 400 million searches, about 13 percent (roughly 55 million) were for erotic content.

Note that each term below is the general category of search for that interest and includes all sorts of variations on the term. These variations (such as “tits” for breasts) are not listed below.

- Youth – 13.5 percent
- Gay – 4.7 percent
- MILFs (Mothers I’d Like to F***) – 4.3 percent
- Breasts – 4.0 percent
- Cheating wives – 3.4 percent
- Vaginas – 2.8 percent
- Penises – 2.4 percent

There’s an old saying in computer programming — GIGO: Garbage In, Garbage Out. It applies equally well to any scientific endeavor, which is only as good as the data you choose to analyze. If you start out with a dataset of questionable generalizability or value, you may find yourself drawing conclusions which have little connection with reality.

In this instance, there’s a huge problem with the data these researchers compiled. It doesn’t come from Google or even Bing. It comes from a little-known service called “Dogpile”, which isn’t even a search engine in its own right. Dogpile is simply a metasearch engine that aggregates search results from Google, Yahoo and Bing. Google, Yahoo and Bing don’t make the data they collect on searches readily available to researchers.

Context is, of course, everything when it comes to datasets, especially when those datasets are likely to be biased in ways you never bothered to investigate. In this instance, the dataset is biased by its source: Dogpile is a tiny, niche service whose users are more likely than not a subset of the population that differs from everyone else.
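To make the sampling-bias argument concrete, here is a minimal sketch in Python. Every number in it is invented purely for illustration; none comes from the book, from Dogpile, or from any other search engine. It assumes two hypothetical user groups with different rates of some search interest, and compares the rate in the whole population with the rate you would measure from a niche tool whose user mix is skewed.

```python
# Hypothetical illustration of selection bias. All shares and rates below
# are made up; they are NOT taken from the book or from any search engine.

def weighted_rate(groups):
    """Overall rate of a behavior, given (population_share, rate) pairs."""
    total_share = sum(share for share, _ in groups)
    return sum(share * rate for share, rate in groups) / total_share

RATE_MAINSTREAM = 0.10  # assumed rate among mainstream users
RATE_NICHE = 0.30       # assumed rate among niche-tool users

# Whole population: 95% mainstream users, 5% niche users (assumed).
population = [(0.95, RATE_MAINSTREAM), (0.05, RATE_NICHE)]

# Sample collected through a niche tool: the mix is heavily skewed (assumed).
niche_sample = [(0.20, RATE_MAINSTREAM), (0.80, RATE_NICHE)]

true_rate = weighted_rate(population)
sample_rate = weighted_rate(niche_sample)

print(f"rate in whole population: {true_rate:.2f}")    # 0.11
print(f"rate seen in niche sample: {sample_rate:.2f}")  # 0.26
```

With these made-up inputs the niche sample overstates the rate by more than a factor of two, even though nothing was miscounted. The point is only that a skewed source can distort a percentage; how Dogpile's users actually differ from everyone else is unknown.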

So who uses Dogpile? Who knows, but it certainly isn’t likely to be a mainstream Internet user. While over 150 million people use Google and 90 million use Bing.com, Dogpile’s measly 2-3 million users per month pale in comparison, and its share of the total search engine market is reportedly far less than 0.05 percent.

For instance, Ogi Ogas and Sai Gaddam said they analyzed 400 million Internet searches. Compare that number with the 3 billion searches conducted each and every day, according to Hitwise, an online analytics company: the entire dataset amounts to a small fraction of a single day’s search traffic.
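The scale comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above, with one assumption: Dogpile's audience is taken as 2.5 million, the midpoint of "2-3 million". The audience ratio it prints compares user counts against Google and Bing only, which is a different measure from a market-share figure based on search volume.

```python
# Figures quoted in the text; DOGPILE_USERS is an assumed midpoint.
SEARCHES_ANALYZED = 400_000_000
SEARCHES_PER_DAY = 3_000_000_000  # Hitwise figure cited in the text
GOOGLE_USERS = 150_000_000        # "over 150 million"
BING_USERS = 90_000_000
DOGPILE_USERS = 2_500_000         # assumption: midpoint of "2-3 million"

# How much of one day's worldwide searching does the dataset represent?
share_of_one_day = SEARCHES_ANALYZED / SEARCHES_PER_DAY
hours_of_search = share_of_one_day * 24

# Dogpile's audience relative to Google and Bing users combined.
users_vs_google_bing = DOGPILE_USERS / (GOOGLE_USERS + BING_USERS)

print(f"share of one day's searches: {share_of_one_day:.1%}")    # 13.3%
print(f"equivalent hours of search:  {hours_of_search:.1f}")      # 3.2
print(f"audience vs. Google + Bing:  {users_vs_google_bing:.1%}")  # 1.0%
```

In other words, the 400 million searches correspond to a little over three hours of search activity at the quoted daily rate.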