Sometimes I have to put text on a path

Wednesday, December 7, 2011

Search Engine Analysis: Hitwise

The following report shows search engines for the industry 'All Categories', ranked by Volume of Searches for the 4 weeks ending 12/03/2011.
Rank  Search engine      Share of searches
1.    www.google.com     63.69%
2.    search.yahoo.com   15.36%
3.    www.bing.com       14.24%
4.    www.ask.com         3.71%
5.    search.aol.com      2.47%
Ref. http://www.hitwise.com/us/datacenter/main/dashboard-23984.html
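The five listed engines account for almost every search Hitwise measured, which is worth keeping in mind for the Dogpile discussion further down. A minimal Python sketch, using only the percentages in the table above, totals the five shares and shows how little is left over for every other engine:

```python
# Search-engine shares from the Hitwise report (4 weeks ending 12/03/2011).
shares = {
    "www.google.com": 63.69,
    "search.yahoo.com": 15.36,
    "www.bing.com": 14.24,
    "www.ask.com": 3.71,
    "search.aol.com": 2.47,
}

# Combined share of the five listed engines, rounded to two decimals
# to strip floating-point noise from the sum.
top_five_total = round(sum(shares.values()), 2)

# Whatever remains is split among all engines outside the top five.
everyone_else = round(100 - top_five_total, 2)

print(f"Top five: {top_five_total}%  Everyone else: {everyone_else}%")
```

The script reports 99.47% for the top five, leaving about 0.53% of searches for every engine that did not make the list.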

Top 10 Social Networking sites
The following report shows websites for the industry 'Computers and Internet - Social Networking and Forums', ranked by Visits for the week ending 12/03/2011.
Rank  Website          Share of visits
1.    Facebook         63.63%
2.    YouTube          20.36%
3.    Twitter           1.52%
4.    Yahoo! Answers    1.02%
5.    Tagged            0.74%
6.    LinkedIn          0.65%
7.    Google+           0.58%
8.    MySpace           0.47%
9.    myYearbook        0.41%
10.   Pinterest.com     0.34%

Top 10 Software sites
The following report shows websites for the industry 'Computers and Internet - Software', ranked by Visits for the week ending 12/03/2011.

Rank  Website                           Share of visits
1.    Adobe                             5.52%
2.    Apple                             5.16%
3.    Microsoft Windows                 4.10%
4.    Yahoo! Pulse                      3.79%
5.    Big Fish Games                    3.07%
6.    Apple iPod & iTunes               2.82%
7.    Google Chrome                     1.97%
8.    Frontline Placement Technologies  1.86%
9.    Symantec                          1.84%
10.   Microsoft Office Online           1.79%
--------
Hitwise Methodology

Hitwise has developed proprietary software that Internet Service Providers (ISPs) use to analyze website logs created on their network. This anonymous data is aggregated and provided to Hitwise, where it is analyzed to provide a range of industry standard metrics relating to the viewing of websites including page requests, visits, average visit length, search terms and behaviour.

Hitwise is able to combine this rich ISP data with data from opt-in panel partners and with region specific consumer demographic and lifestyle information.

Hitwise collects aggregate usage data from a geographically diverse range of ISP networks and opt-in panels, representing all types of Internet usage, including home, work, educational and public access. To ensure this data is accurate and representative, it is weighted to universe estimates in each market.

Because of the extensive sample size (25 million people worldwide, including 10 million in the US), Hitwise can provide detailed insights into the search terms used to find thousands of sites as well as a range of clickstream reports, analyzing the movement of visitors between sites.
Ref. http://www.hitwise.com/us/about-us/hitwise-methodology
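The methodology above says the data "is weighted to universe estimates in each market" but does not say how; Hitwise's actual procedure is proprietary. As a rough illustration only, here is a minimal Python sketch of generic cell (post-stratification) weighting, in which each segment's weight is its share of the target population divided by its share of the sample. The four segments echo the home, work, educational and public access usage types mentioned in the methodology, but every number in the sketch is hypothetical, and the function names `cell_weights` and `estimate` are illustrative helpers, not part of any Hitwise tool.

```python
# Generic cell (post-stratification) weighting sketch.
# ALL numbers are hypothetical and chosen only to show the arithmetic;
# they are NOT Hitwise figures.

# Share of each usage segment in the collected sample.
sample_share = {"home": 0.70, "work": 0.20, "education": 0.05, "public": 0.05}

# Share of each segment in the target population ("universe estimate").
universe_share = {"home": 0.60, "work": 0.25, "education": 0.10, "public": 0.05}

# Hypothetical fraction of each segment that visited some site in a week.
visit_rate = {"home": 0.40, "work": 0.10, "education": 0.20, "public": 0.20}


def cell_weights(sample, universe):
    """Weight per segment: universe share divided by sample share."""
    return {seg: universe[seg] / sample[seg] for seg in sample}


def estimate(sample, rate, weights=None):
    """Overall visit rate across segments, optionally applying weights."""
    if weights is None:
        weights = {seg: 1.0 for seg in sample}  # unweighted: every cell counts as-is
    numerator = sum(sample[s] * weights[s] * rate[s] for s in sample)
    denominator = sum(sample[s] * weights[s] for s in sample)
    return numerator / denominator


weights = cell_weights(sample_share, universe_share)
raw = estimate(sample_share, visit_rate)
weighted = estimate(sample_share, visit_rate, weights)
print(f"unweighted: {raw:.3f}  weighted: {weighted:.3f}")
```

Because home users are over-represented in this made-up sample, the unweighted estimate (0.32) overstates the population rate; reweighting each cell to its universe share brings it down to about 0.295.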

Research and metrics: an example with "erotic content" and "sexual interests", using keywords such as sex, youth, breasts, vaginas, etc.

Bad Research: Popular Sex Search Terms

Researchers Ogi Ogas and Sai Gaddam recently published a book, A Billion Wicked Thoughts, detailing their analysis of 400 million searches they collected from the Dogpile search engine. Of those 400 million searches, 13 percent (55 million) were for erotic content.
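The percentage follows directly from the two counts. A minimal Python check, using only the figures quoted for the Dogpile dataset:

```python
# Counts quoted for the Dogpile dataset analysed in "A Billion Wicked Thoughts".
total_searches = 400_000_000
erotic_searches = 55_000_000

# Fraction of all collected searches that were for erotic content.
erotic_share = erotic_searches / total_searches

print(f"{erotic_share:.2%} of the searches were for erotic content")
```

The ratio works out to 13.75 percent, so the 13 percent figure is best read as an approximation of the 55 million count.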

Note that the terms below are the general category of search for that interest, which is inclusive of all sorts of permutations of the terms. These permutations (such as “tits” for breasts) are not listed below.

    Youth – 13.5 percent
    Gay – 4.7 percent
    MILFs (Mothers I’d Like to F***) – 4.3 percent
    Breasts – 4.0 percent
    Cheating wives – 3.4 percent
    Vaginas – 2.8 percent
    Penises – 2.4 percent

There’s an old saying in computer programming: GIGO, or Garbage In, Garbage Out. It applies equally well to any scientific endeavor, which is only as good as the data it analyzes. If you start out with a dataset of questionable generalizability or value, you may end up drawing conclusions that have little connection with reality.

In this instance, there’s a huge problem with the data these researchers compiled. It doesn’t come from Google or even Bing. It comes from a little-known site called “Dogpile”, which isn’t really a search engine at all: it is a metasearch engine that simply aggregates results from Google, Yahoo and Bing. Google, Bing and Yahoo don’t make the search data they collect readily available to researchers.

Context is, of course, everything when it comes to datasets, especially when those datasets are likely to be biased in ways you never bothered to investigate. Here, the dataset is biased by its source: Dogpile is a tiny, niche search engine whose users are likely to be a particular subset of the population that differs from Internet users at large.

So who uses Dogpile? Nobody knows for sure, but it is unlikely to be the mainstream Internet user. While over 150 million people use Google and 90 million use Bing.com, Dogpile draws a measly 2 to 3 million people per month, roughly 1 percent of the combined Google and Bing audience.
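Putting the audience figures from this paragraph side by side makes the gap concrete. A minimal Python sketch divides Dogpile's monthly audience by the combined Google and Bing audience; it treats the Google and Bing audiences as disjoint, which is a simplification, since many people use both.

```python
# Audience figures quoted above, in millions of people.
google_users = 150
bing_users = 90
dogpile_low, dogpile_high = 2, 3

# Simplification: Google and Bing users are treated as separate people,
# although in practice the two audiences overlap.
combined = google_users + bing_users

# Dogpile's audience as a fraction of the combined Google + Bing audience.
shares = [dogpile / combined for dogpile in (dogpile_low, dogpile_high)]

for dogpile, share in zip((dogpile_low, dogpile_high), shares):
    print(f"{dogpile} million Dogpile users = {share:.2%} of the Google + Bing audience")
```

The two bounds come out at about 0.83 and 1.25 percent, so Dogpile's audience is on the order of 1 percent of the two larger engines' audiences combined.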

Scale is a problem too. Ogi Ogas and Sai Gaddam said they analyzed 400 million Internet searches, but compare that number with the 3 billion searches conducted each and every day, according to Hitwise, an online analytics company.
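To see how small the dataset is next to Hitwise's daily figure, a minimal Python comparison follows. The yearly figure assumes, purely for illustration, that the 3-billion-per-day rate holds for all 365 days.

```python
# Figures quoted above.
dataset_searches = 400_000_000      # searches in the Dogpile dataset
searches_per_day = 3_000_000_000    # daily searches, according to Hitwise

# Dataset size relative to a single day of searching.
fraction_of_one_day = dataset_searches / searches_per_day

# Illustrative assumption: the daily rate holds for a 365-day year.
fraction_of_one_year = dataset_searches / (searches_per_day * 365)

print(f"{fraction_of_one_day:.1%} of a single day's searches")
print(f"{fraction_of_one_year:.3%} of a year's searches at that rate")
```

The whole dataset amounts to roughly 13 percent of one day's searches, and under 0.04 percent of a year's worth at that rate.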

Ref. http://psychcentral.com/blog/archives/2011/05/31/bad-research-popular-sex-search-terms/