Overview is an open-source tool originally designed to help journalists find stories in large numbers of documents, by automatically sorting them according to topic and providing a fast visualization and reading interface. It’s also used for qualitative research, social media conversation analysis, legal document review, digital humanities, and more.

Overview does at least three things really well.

  • Find what you don’t even know to look for.
  • See broad trends or patterns across many documents.
  • Make exhaustive manual reading faster, when all else fails.

Search is a wonderful tool when you know what you’re trying to find, and Overview includes advanced search features. It’s less useful when you start with only a hunch or an anonymous tip, when there are many different ways to phrase what you’re looking for, or when you’re struggling with poor-quality material and OCR errors. By automatically sorting documents by topic, Overview gives you a fast way to see what you have.

In other cases you’re interested in broad patterns. Overview’s topic tree shows the structure of your document set at a glance, and you can tag entire folders at once to label documents according to your own category names. Then you can export those tags to create visualizations.

But even when you really do have to read every document manually, Overview is a huge help. It’s optimized for fast scanning through sets of documents, with its document list feature and keyboard shortcuts. Reading also goes faster when similar documents are grouped together, which eliminates the time wasted on near duplicates. Meanwhile, the tagging system helps you track what you’ve found and what’s left to read.

For more about the different ways to use Overview, see our post on the different types of document-driven stories.

Overview is designed specifically for text documents where the interesting content is in narrative form, that is, plain English (or other languages) as opposed to a table of numbers. Overview has been used to analyze emails, declassified document dumps, material from Wikileaks releases, social media posts, online comments, and more. No installation is required: just use the free web application. Or you can run this open-source software on your own server for extra security. The goal is to make advanced document mining capability available to anyone who needs it.

Overview is a project of The Associated Press, supported by the John S. and James L. Knight Foundation as part of its Knight News Challenge.

Research and design work began in November 2010, moving through a proof-of-concept to a working prototype to an easy-to-use web application. See the FAQ for more.
