1. What does Overview do?
Overview is intended to help journalists, researchers, and other curious people make sense of massive, disorganized collections of electronic documents. It’s a visualization and analysis tool designed for sets of documents, typically thousands of pages of material.
Overview includes a text search engine, but it’s much more powerful than that. Overview applies natural language processing algorithms to automatically sort the documents into folders and sub-folders based on their topic. Like a table of contents, this organization helps you to understand “what’s in there?” This can help you find what you don’t even know to look for.
2. How does Overview sort documents?
Overview processes every word of every document — not just entities or key words — to determine the topic of each document. It files documents with a similar topic together in the same folder. Then, it goes through the documents in each folder and groups them into sub-folders, and so on. You can learn more about the details of this process.
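To make the folder and sub-folder idea concrete, here is a minimal sketch of one way such a tree could be built. It is not Overview's actual implementation: the TF-IDF weighting, the cosine similarity thresholds, and every function name below are assumptions chosen for illustration. Documents that are similar enough at a loose threshold share a folder, and each folder is then split again at a stricter threshold.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict of word -> weight) per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    doc_freq = Counter(word for c in counts for word in c)
    n = len(docs)
    vectors = []
    for c in counts:
        # Smoothed inverse document frequency: rare words weigh more.
        vec = {w: tf * (math.log((1 + n) / (1 + doc_freq[w])) + 1)
               for w, tf in c.items()}
        norm = math.sqrt(sum(x * x for x in vec.values()))
        vectors.append({w: x / norm for w, x in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def components(ids, vectors, threshold):
    """Group ids so that documents linked by similarity >= threshold share a group."""
    remaining = list(ids)
    groups = []
    while remaining:
        stack = [remaining.pop(0)]
        group = []
        while stack:
            current = stack.pop()
            group.append(current)
            near = [o for o in remaining
                    if cosine(vectors[current], vectors[o]) >= threshold]
            for o in near:
                remaining.remove(o)
            stack.extend(near)
        groups.append(sorted(group))
    return groups


def build_tree(ids, vectors, threshold=0.2, step=0.2):
    """Recursively split a folder into sub-folders using ever stricter thresholds."""
    node = {"docs": sorted(ids), "children": []}
    if len(node["docs"]) > 1 and threshold <= 1.0:
        groups = components(node["docs"], vectors, threshold)
        if len(groups) == 1:
            # Nothing split at this level, so try a stricter threshold.
            return build_tree(node["docs"], vectors, threshold + step, step)
        node["children"] = [build_tree(g, vectors, threshold + step, step)
                            for g in groups]
    return node


SAMPLE_DOCS = [
    "The city council approved the budget for road repairs",
    "Council members debated the road repairs budget",
    "The hospital reported a rise in flu cases",
    "Flu cases at the hospital rose again this week",
]
```

Calling `build_tree(range(len(SAMPLE_DOCS)), tfidf_vectors(SAMPLE_DOCS))` returns a nested dictionary in which each node lists the documents in that folder and its sub-folders. The starting threshold and step size are arbitrary; a real system would tune them or use a different clustering method altogether.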
3. How do I get my documents into Overview?
There are two ways: Overview can import projects from DocumentCloud, or you can upload the documents as a CSV file. More detailed instructions here.
If you have an account on the DocumentCloud document handling system for journalists, Overview can import your DocumentCloud project directly. First upload your documents to DocumentCloud, then log in to Overview. Or, just click the “Analyze in Overview” option from DocumentCloud’s Analyze menu.
You can also upload the documents as a CSV file. This is a simple type of text-only spreadsheet document, and Overview expects one document per row, plus a header row that describes the columns. Here is the CSV format that Overview can read. Overview can also read CSV files of social media data exported from Radian 6 and Sysomos.
We plan to add direct PDF upload in the near future.
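Before uploading, it can help to confirm that a file really has a header row and one document per row. The following is a minimal sketch of such a check. The column name `text` and the function name `check_document_csv` are assumptions made for illustration, so consult the CSV format description for the exact column names Overview expects.

```python
import csv
import io


def check_document_csv(csv_text, text_column="text"):
    """Return the number of document rows, or raise ValueError if the file looks wrong.

    The name of the text column is an assumption; adjust it to match
    the format documentation.
    """
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    header = reader.fieldnames or []
    if text_column not in header:
        raise ValueError(f"header row has no {text_column!r} column: {header}")
    count = 0
    for row_number, row in enumerate(reader, start=1):
        # Short rows yield None for missing fields, so guard against that.
        if not (row[text_column] or "").strip():
            raise ValueError(f"document row {row_number} has no text")
        count += 1
    return count


# A header row followed by two documents. Quoted fields may contain
# commas and line breaks without starting a new row.
SAMPLE_CSV = (
    "id,title,text\n"
    '1,Memo,"Budget approved, pending review."\n'
    '2,Letter,"Line one\nline two"\n'
)
```

Here `check_document_csv(SAMPLE_CSV)` counts two documents, even though the second one spans two physical lines, because the line break sits inside a quoted field.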
4. What types of documents will Overview read?
Overview is designed for text: lots of text, plain human text, the sort of narrative text that a human would read. Overview also works well on the text of social media posts.
In terms of file formats, if you are using DocumentCloud, then Overview can handle any file format that DocumentCloud supports (see this list). Otherwise, you will need to find a way to extract the text of each document into a single row of a CSV file. See, for example, the pdftotext tool.
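As a rough sketch of that extraction step, suppose you have already converted each document to a plain-text file (for example with pdftotext) and collected the files in one folder. The script below writes one CSV row per file. The column names `id`, `title`, and `text`, and the function name `text_files_to_csv`, are assumptions for illustration and not a statement of what Overview requires.

```python
import csv
from pathlib import Path


def text_files_to_csv(folder, out_path):
    """Write one CSV row per .txt file in `folder`; return the number of rows written."""
    paths = sorted(Path(folder).glob("*.txt"))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        # Header row describing the columns (names are assumed, see above).
        writer.writerow(["id", "title", "text"])
        for doc_id, path in enumerate(paths, start=1):
            text = path.read_text(encoding="utf-8", errors="replace")
            # Extracted text often separates pages with form feeds;
            # turn them into ordinary line breaks.
            text = text.replace("\f", "\n").strip()
            writer.writerow([doc_id, path.stem, text])
    return len(paths)
```

The `csv` module quotes any commas, quotation marks, and line breaks inside the text, so each document stays in a single row however long it is.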
Overview is not designed for tables, spreadsheets, database dumps, or other primarily numeric or structured data, unless there is a field that has lots of text in ordinary human language. If you need to extract tables from PDF files, try Tabula.
For the languages Overview can currently read, see question 8. If you need a specific language that is not listed there, tell us!
5. How many documents can Overview handle and how long will it take?
Each document set is currently limited to 50,000 documents. We are steadily working to increase this limit. Overview can process about 1,000 documents per minute; add to that the time needed to upload the documents or transfer them from DocumentCloud.
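For a back-of-the-envelope estimate, the figures above can be turned into a small calculation. This is only a sketch: the function name is hypothetical, the rate is approximate, and the result leaves out upload or DocumentCloud transfer time.

```python
import math

MAX_DOCUMENTS = 50_000    # current per-set limit stated above
DOCS_PER_MINUTE = 1_000   # approximate processing rate stated above


def estimate_processing_minutes(num_documents):
    """Estimate processing time in whole minutes, excluding upload or transfer."""
    if num_documents > MAX_DOCUMENTS:
        raise ValueError(
            f"{num_documents} documents exceeds the limit of {MAX_DOCUMENTS} per set"
        )
    return math.ceil(num_documents / DOCS_PER_MINUTE)
```

By this estimate, a set of 12,500 documents would take roughly 13 minutes to process, and a full set of 50,000 would take roughly 50 minutes.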
6. Is Overview really free?
Overview is an open source project of the Associated Press, funded by the John S. and James L. Knight Foundation. The code is released under the AGPL license.
We maintain a public server at overviewproject.org. It is currently free for all users. In the future, as we scale to larger documents, we may charge for certain premium features so that we can cover the server costs and support the development of the project.
7. Who can see the documents I upload to Overview?
Only you can access the documents uploaded to your account, unless you share them. Overview can import your private projects from DocumentCloud.
If you have an especially sensitive set of documents that cannot leave your computer, we recommend running your own local server. Please contact us for help installing this; it is something we plan to make easier in the future.
8. Which languages does Overview support?
Overview currently supports documents in English, Spanish, French, German, and Swedish. We can add another language in a day or two if you have documents to test with. Note that although Overview can read documents in these languages, the user interface is still in English (this file would need to be translated).
9. I need help using Overview, or I have a question you haven’t answered.
You can talk to us and other Overview users on the Overview user forum. Or you can reach us on Twitter at @overviewproject or by email. We’d love to hear from you, especially if you have a document set analysis problem that Overview can’t solve. Maybe there’s something we can do for you.