Overview is an open-source tool to help journalists find stories in large numbers of documents, by automatically sorting them according to topic and providing a fast visualization and reading interface. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.

Overview does at least three things really well.

  • Find what you don’t even know to look for.
  • Find broad trends or patterns across many documents.
  • Make exhaustive manual reading faster, when all else fails.

Search is a wonderful tool when you know what you’re trying to find, and Overview includes advanced search features. It’s less useful when you start with a hunch or an anonymous tip. There might also be many different ways to phrase what you’re looking for, or you could be struggling with poor-quality material and OCR errors. By automatically sorting documents by topic, Overview gives you a fast way to see what you have.

In other cases you’re interested in broad patterns. Overview’s topic tree shows the structure of your document set at a glance, and you can tag entire folders at once to label documents according to your own category names. Then you can export those tags to create visualizations.

But even when you really do have to read every document manually, Overview is a huge help. It’s optimized for fast scanning through sets of documents, with its document list feature and keyboard shortcuts. And reading goes faster when similar documents are grouped together, eliminating the time wasted on near duplicates. Meanwhile, the tagging system helps you track what you’ve found and what’s left to read.

For more about the different ways to use Overview, see our post on the different types of document-driven stories.

Overview is designed specifically for text documents where the interesting content is all in narrative form: plain English (or another language) as opposed to a table of numbers. Overview has been used to analyze emails, declassified document dumps, material from Wikileaks releases, social media posts, online comments, and more. There is no installation required; just use the free web application. Or you can run this open-source software on your own server for extra security. The goal is to make advanced document mining capability available to anyone who needs it.

You can upload your documents directly as PDF files. If you are a journalist, Overview will import your projects from DocumentCloud. Or, you can upload your document set in CSV format.
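If your documents live in a database or a folder of text files rather than PDFs, the CSV route may be the easiest one. The sketch below shows one way to pack a document set into a single CSV, one row per document, using Python's standard `csv` module. The column names `id`, `title` and `text`, the `documents_to_csv` helper and the sample records are assumptions made for illustration, not Overview's documented format, so check Overview's own import instructions for the columns it actually expects.

```python
import csv
import io


def documents_to_csv(documents):
    """Serialize a list of document dicts into one CSV string, one row per document.

    The column names below are assumptions for illustration only.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "title", "text"])
    writer.writeheader()
    for doc in documents:
        # The csv module quotes fields that contain commas, quotes or newlines,
        # so multi-line document text survives as a single field.
        writer.writerow(doc)
    return buffer.getvalue()


# Hypothetical sample records; real text would come from your own files.
sample = [
    {"id": "1", "title": "Memo", "text": "Budget review, see attached.\nSecond line."},
    {"id": "2", "title": "Email", "text": 'He said "no comment" twice.'},
]

csv_text = documents_to_csv(sample)
```

Letting the `csv` module handle the quoting, rather than joining fields by hand, matters here because narrative text routinely contains commas, quotation marks and line breaks.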

Overview is a project of The Associated Press, supported by the John S. and James L. Knight Foundation as part of its Knight News Challenge.

Research and design work began in November 2010, moving through a proof-of-concept to a working prototype to an easy-to-use web application. Want to know more? See the FAQ.