Open Data: Google NGrams

Granted, Google has grown into a controversial giant, with a grip on our personal data and lives so firm that skeptical watchfulness seems in order. Yet they stay at the forefront of a digital revolution that often serves the common good as well as their own. Their latest trick has the magic name “N-grams”, referring to sequences of N words, starting with single words. The idea is to use the scanned Google Books database to count the frequency of words and phrases over time. The data is freely available, both as a download for “serious” statistical analysis and through an easy online tool.
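To make the idea concrete, here is a minimal sketch of counting n-grams per year. It is not Google's actual pipeline: the function names `ngrams` and `relative_frequency` and the tiny `toy_corpus` are made up for illustration, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, lowercased."""
    words = text.lower().split()  # assumption: whitespace tokenization
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(corpus_by_year, phrase):
    """For each year, the share of all n-grams that equal the phrase."""
    n = len(phrase.split())
    result = {}
    for year, texts in sorted(corpus_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text, n))
        total = sum(counts.values())
        result[year] = counts[phrase.lower()] / total if total else 0.0
    return result


# Hypothetical toy data standing in for scanned books grouped by year.
toy_corpus = {
    1950: ["the men went to work", "men and women"],
    1975: ["women and men went to work", "the women spoke"],
}

if __name__ == "__main__":
    for word in ("men", "women"):
        print(word, relative_frequency(toy_corpus, word))
```

Dividing by the total number of n-grams per year matters because the number of books differs a lot between years; raw counts alone would mostly show how much was printed.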

A NYTimes article points to an example comparing the appearance of “men” and “women” and supposedly showing a “feminism bump” in the 1970s. The German newspaper ZEIT tweeted a history of nuclear energy and the nuclear waste debate.

Of course, this influx of numbers into classical research domains of the humanities is controversial, as the NYTimes reports. The project's creators are aiming high:

“The goal is to give an 8-year-old the ability to browse cultural trends throughout history, as recorded in books,” said Erez Lieberman Aiden, a junior fellow at the Society of Fellows at Harvard. Mr. Lieberman Aiden and Jean-Baptiste Michel, a postdoctoral fellow at Harvard, assembled the data set with Google and spearheaded a research project to demonstrate how vast digital databases can transform our understanding of language, culture and the flow of ideas.

Their study, to be published in the journal Science on Friday, offers a tantalizing taste of the rich buffet of research opportunities now open to literature, history and other liberal arts professors who may have previously avoided quantitative analysis. Science is taking the unusual step of making the paper available online to nonsubscribers.

But there are also skeptical and critical voices:

Reactions from humanities scholars who quickly reviewed the article were more muted. “In general it’s a great thing to have,” Louis Menand, an English professor at Harvard, said, particularly for linguists. But he warned that in the realm of cultural history, “obviously some of the claims are a little exaggerated.” He was also troubled that, among the paper’s 13 named authors, there was not a single humanist involved.

“There’s not even a historian of the book connected to the project,” Mr. Menand noted.

Ultimately, the debate leads back to the old divide in research methods between quantitative and qualitative positions, which the authors try to reconcile a bit:

Aware of concerns raised by humanists that the essence of their art is a search for meaning, Mr. Michel and Mr. Lieberman Aiden emphasized that culturomics simply provided information. Interpretation remains essential.

“I don’t want humanists to accept any specific claims — we’re just throwing a lot of interesting pieces on the table,” Mr. Lieberman Aiden said. “The question is: Are you willing to examine this data?”

I’m curious what will come out of this!

Date: Wednesday, 22 December 2010, 0:28