Overview is an open-source tool to help journalists find stories in large numbers of documents, by automatically sorting them according to topic and providing a fast visualization and reading interface. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.

There are good tools for searching large document sets for names and keywords, but they don’t help you find the stories you’re not specifically looking for. Overview visualizes the relationships among topics, people, and places to help journalists answer the question, “What’s in there?”

Overview is designed specifically for text documents where the interesting content is all in narrative form, that is, plain English rather than a table of numbers. (Other languages are coming soon.) It also works well for analyzing social media data, to find and understand the conversations around a particular topic.

It’s an interactive system where the computer reads every word of every document to create a visualization of topics and sub-topics, while a human guides the exploration. There is no installation required — just use the free web application. Or you can run this open-source software on your own server for extra security. The goal is to make advanced document mining capability available to anyone who needs it.
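To give a rough sense of what "sorting documents by topic" can mean in practice, here is a minimal sketch in Python. It is not Overview's actual algorithm, which this page does not describe. It assumes one common approach: weight each word by TF-IDF, compare documents by cosine similarity, and greedily group documents that are similar enough. The function names, the sample documents, and the 0.1 threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(docs):
    """Return one {word: weight} dict per document, using TF-IDF weights."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each word once per document
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Words that appear in every document get weight log(1) = 0.
        vectors.append(
            {w: count * math.log(n / doc_freq[w]) for w, count in term_freq.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(docs, threshold=0.1):
    """Greedy grouping: put each document in the first group whose first
    member is similar enough, otherwise start a new group.
    The threshold is an illustrative assumption, not a tuned value."""
    vectors = tfidf_vectors(docs)
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical sample documents, invented for this sketch.
docs = [
    "The city council approved the budget for road repairs",
    "Council members debated the road repairs budget",
    "The hospital reported a flu outbreak among patients",
    "Flu patients filled the hospital wards",
]
groups = group_by_similarity(docs)
print(groups)
```

With these four sample documents, the two council stories land in one group and the two hospital stories in another. A real system would need a more careful clustering method and a way to build sub-topics, but the sketch shows how reading every word can turn into groupings a person can then explore.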

If you are a journalist, Overview will import your projects directly from DocumentCloud. Alternatively, you can upload your document set in CSV format.

Overview is a project of The Associated Press, supported by the John S. and James L. Knight Foundation as part of its Knight News Challenge. The Associated Press invests its resources to advance the news industry, delivering fast, unbiased news from every corner of the world to all media platforms and formats. The Knight News Challenge is an international contest to fund digital news experiments that use technology to inform and engage communities.

Research and design work began in November 2010, moving through a proof-of-concept to a working prototype to an easy-to-use web application, now in beta. Want to know more? See the FAQ.

 
