Professor receives Knight Grant for data journalism class

by Martha Schick / Beacon Staff • September 15, 2015

Catherine D'Ignazio received a $35,000 grant.
Courtesy of Catherine D'Ignazio

Catherine D’Ignazio, an assistant journalism professor, received a $35,000 grant on Aug. 1 from the Knight Foundation to develop a suite of data visualization tools.

The suite is called DataBasic: Simple Data Analysis Tools for Journalists, Classrooms and Communities. The tools were developed in a partnership between D’Ignazio and Rahul Bhargava, a professor at the Massachusetts Institute of Technology.

DataBasic falls under the “Media Innovation” category of Knight grants, which “seek to improve how we create, share and use information essential to communities,” according to the foundation’s website. The suite includes three tools for data analysis: WordCounter, WTFcsv, and TuffyDuff.

WordCounter is the only tool currently live and available to the public, according to D’Ignazio, who uses it in her Data Visualization class. It shows students the most frequently used words and phrases in large amounts of text. To familiarize her students with the tool, she has them analyze the lyrics of the entire Beatles discography to find the most commonly used word: love.
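For readers curious about what counting "top words and phrases" involves, here is a minimal Python sketch of the general idea. It is not WordCounter's actual code; the function name `top_terms`, its parameters, and the sample sentence are all hypothetical, and a real tool would likely also filter out common filler words.

```python
import re
from collections import Counter

def top_terms(text, n=5, phrase_len=1):
    """Return the n most common words, or phrase_len-word phrases, in text.

    Hypothetical helper for illustration only; not WordCounter's code.
    """
    # Lowercase the text and treat runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of phrase_len words across the text to form phrases.
    phrases = [" ".join(words[i:i + phrase_len])
               for i in range(len(words) - phrase_len + 1)]
    return Counter(phrases).most_common(n)

# Made-up sample text, not real song lyrics.
sample = "love is here and love is there and love is everywhere"
print(top_terms(sample, n=2))                # single words
print(top_terms(sample, n=1, phrase_len=2))  # two-word phrases
```

Setting `phrase_len` to 2 or 3 switches the count from single words to short phrases, which is roughly the distinction the tool draws between words and phrases.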

“What we’re hoping students use this for is starting to think creatively with data,” D’Ignazio said.

She said data visualization tools are becoming more important as journalists turn to text mining, or reporting on hundreds or thousands of documents, to decide what a story is or whether there is one at all.

“This is a good tool for people trying to make sense of lots and lots of text,” she said. “In the case of the [Clinton] emails and WikiLeaks, there’s no human way to read it all.”

WTFcsv, named after the commonly used .csv file format for data, is a high-level summary tool. According to the grant application, the program takes “a CSV file and returns a summary of the fields, their data type, their range, and basic descriptive statistics.”
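The kind of summary the grant application describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not WTFcsv's implementation: the function `summarize_csv`, the two-way "number" versus "text" typing, and the sample data are all hypothetical.

```python
import csv
import io
import statistics

def summarize_csv(csv_text):
    """Summarize each column of a CSV: inferred type, range, and mean.

    Hypothetical helper for illustration only; not WTFcsv's code.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for field in reader.fieldnames or []:
        # Ignore blank cells when inferring the column's type.
        values = [row[field] for row in rows if row[field]]
        if not values:
            summary[field] = {"type": "empty"}
            continue
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            # At least one value is not numeric: treat the column as text.
            summary[field] = {"type": "text", "unique": len(set(values))}
        else:
            summary[field] = {
                "type": "number",
                "min": min(numbers),
                "max": max(numbers),
                "mean": statistics.mean(numbers),
            }
    return summary

# Made-up sample data.
sample_csv = "name,age\nAna,30\nBen,40\nCy,50\n"
result = summarize_csv(sample_csv)
print(result)
```

For the sample above, the sketch reports `age` as a numeric column with its minimum, maximum and mean, and `name` as a text column with a count of distinct values.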

D’Ignazio described TuffyDuff as the next level up from WordCounter. It lets users upload multiple files and analyze the main differences between them. For example, D’Ignazio said it could be used to analyze speeches by different Republican candidates. While most will use the words “money” and “America,” she said TuffyDuff would show the words and phrases the candidates do not have in common.
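The comparison D’Ignazio describes amounts to finding which words two texts share and which appear in only one. A rough Python sketch follows; it is not TuffyDuff's code, and the function names and the two invented "speeches" are hypothetical.

```python
import re

def words_in(text):
    """Return the set of distinct lowercase words in text."""
    return set(re.findall(r"[a-z']+", text.lower()))

def compare_texts(text_a, text_b):
    """Split the vocabulary of two texts into shared and unique words.

    Hypothetical helper for illustration only; not TuffyDuff's code.
    """
    a, b = words_in(text_a), words_in(text_b)
    return {
        "shared": sorted(a & b),   # words both texts use
        "only_a": sorted(a - b),   # words unique to the first text
        "only_b": sorted(b - a),   # words unique to the second text
    }

# Invented sentences standing in for two candidates' speeches.
speech_a = "America needs money and jobs"
speech_b = "America needs money and schools"
comparison = compare_texts(speech_a, speech_b)
print(comparison)
```

Here "america" and "money" land in the shared list, while "jobs" and "schools" are flagged as the words that set the two texts apart.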

The grant money will go toward continued development of the apps, graphic design for the website, learning guides and introductory videos for each tool, and two workshops through the Engagement Lab for journalists, students, and community groups, according to D’Ignazio.

WordCounter, which launched in May, has already received traffic from journalists using the tool to analyze data for stories, according to D’Ignazio. She said the others will launch before January, when the Knight Foundation requires any grant-funded prototype to go live.