Open Data: Google NGrams
Wednesday, 22 December 2010, 0:28
Granted, Google has grown into a controversial giant, with a grip on our personal data and lives so firm that skeptical watchfulness seems in order. Still, they stay at the forefront of a digital revolution that often serves the common good as well as their own. Their latest trick has the magic name “N-Grams”, referring to combinations of N words, starting with single words. The idea is to use the scanned Google Books database to count the frequency of words and phrases over time. The data is freely available, both for download to be used in “serious” statistical analysis and through an easy online tool.
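To make the counting idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the toy corpus is made up, the helper names `ngrams` and `relative_frequency` are hypothetical, and none of this reflects Google's actual pipeline or the format of the downloadable files.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(corpus, phrase):
    """For each year, the share of n-grams (same n as phrase) equal to phrase."""
    n = len(phrase.split())
    result = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter(g for t in texts for g in ngrams(t, n))
        total = sum(counts.values())
        # Normalise by the yearly total so years with more text stay comparable.
        result[year] = counts[phrase.lower()] / total if total else 0.0
    return result


# Made-up toy corpus (assumption, not real data): year -> list of "book" texts.
toy_corpus = {
    1950: ["men and men at work", "the men went home"],
    1975: ["women and men at work", "the women went home"],
}

print(relative_frequency(toy_corpus, "women"))    # a 1-gram
print(relative_frequency(toy_corpus, "at work"))  # a 2-gram
```

Dividing by the yearly total rather than reporting raw counts seems the sensible choice here, since the amount of scanned text presumably differs a lot from year to year.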
A NYTimes article points to an example comparing the appearance of “men” and “women”, supposedly showing a “feminism bump” in the 1970s. The German newspaper ZEIT tweeted a history of nuclear energy and the nuclear waste debate.
Of course, this influx of numbers into the classical research domains of the humanities is controversial, as the NYTimes reports. The founders are aiming high: