1. What does Overview do?
Overview is intended to help journalists, researchers, and other curious people make sense of massive, disorganized collections of electronic documents. It’s a visualization and analysis tool designed for sets of documents, typically thousands of pages of material.
Overview includes a text search engine, but it’s much more powerful than that. Overview applies natural language processing algorithms to automatically sort the documents into folders and sub-folders based on their topic. Like a table of contents, this organization helps you to understand “what’s in there?” This can help you find what you don’t even know to look for.
2. How does Overview sort documents?
Overview processes every word of every document — not just entities or key words — to determine the topic of each document. It files documents with a similar topic together in the same folder. Then, it goes through the documents in each folder and groups them into sub-folders, and so on. You can learn more about the details of this process.
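To make the idea of folders and sub-folders concrete, here is a toy sketch of recursive grouping by word overlap. It is not Overview's actual algorithm, which this FAQ does not spell out. The function names, the plain word-count vectors, and the split-around-the-two-least-similar-documents rule are all assumptions chosen for brevity.

```python
import math
from collections import Counter

def vectorize(text):
    """Count every word in the document, not just key words."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def split(doc_ids, vectors):
    """Split documents into two groups around the least similar pair (assumed rule)."""
    seed_a, seed_b = min(
        ((i, j) for i in doc_ids for j in doc_ids if i < j),
        key=lambda pair: cosine(vectors[pair[0]], vectors[pair[1]]),
    )
    left, right = [], []
    for i in doc_ids:
        if cosine(vectors[i], vectors[seed_a]) >= cosine(vectors[i], vectors[seed_b]):
            left.append(i)
        else:
            right.append(i)
    return left, right

def build_tree(doc_ids, vectors, min_size=2):
    """Recursively file similar documents into folders and sub-folders."""
    if len(doc_ids) <= min_size:
        return {"docs": doc_ids, "children": []}
    left, right = split(doc_ids, vectors)
    if not left or not right:  # documents too alike to separate further
        return {"docs": doc_ids, "children": []}
    return {
        "docs": doc_ids,
        "children": [
            build_tree(left, vectors, min_size),
            build_tree(right, vectors, min_size),
        ],
    }

# Hypothetical sample documents, for illustration only.
sample_docs = [
    "city budget tax vote",
    "city budget tax council",
    "football game score win",
    "football game score loss",
]
sample_vectors = [vectorize(d) for d in sample_docs]
sample_tree = build_tree(list(range(len(sample_docs))), sample_vectors)
```

In this sample, the two budget documents end up in one folder and the two football documents in the other. A real system would weight words and choose splits far more carefully.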
3. How do I get my documents into Overview?
There are three ways: you can upload PDF files, import projects from DocumentCloud, or upload text in bulk as a CSV file. More detailed instructions are available here.
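For the CSV route, a minimal sketch of turning a list of texts into a CSV string might look like the following. The column names `id` and `text` and the helper name `documents_to_csv` are assumptions for illustration; check the detailed instructions for the column layout Overview actually expects.

```python
import csv
import io

def documents_to_csv(documents):
    """Return CSV text with one row per document.

    The "id" and "text" column names are assumed, not confirmed by this FAQ.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "text"])
    writer.writeheader()
    for doc_id, text in enumerate(documents, start=1):
        # The csv module quotes commas and line breaks inside the text.
        writer.writerow({"id": doc_id, "text": text})
    return buffer.getvalue()

# Hypothetical sample input.
sample_csv = documents_to_csv(["Hello, world", "Line one\nline two"])
```

Using the `csv` module rather than joining strings by hand keeps commas, quotes, and line breaks inside a document from corrupting the file.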
4. What types of documents will Overview read?
Overview is designed for text: lots of text, the sort of narrative text that a human would read. Overview also works well on the text of social media posts.
Overview is not designed for tables, spreadsheets, database dumps, or other primarily numeric or structured data, unless there is a field that has lots of text in ordinary human language. If you need to extract tables from PDF files, try Tabula.
5. How many documents can Overview handle and how long will it take?
There is currently a maximum of 200,000 documents per document set. We are steadily working to increase this limit. Overview can process about 1,000 documents per minute, plus the time needed to upload the documents or transfer them from DocumentCloud.
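As a rough worked example of these figures, the sketch below estimates processing time from the document count. The function and constant names are hypothetical, the rate is the approximate one quoted above, and upload or transfer time is not included.

```python
MAX_DOCUMENTS = 200_000   # current per-set limit quoted above
DOCS_PER_MINUTE = 1_000   # approximate processing rate quoted above

def estimate_processing_minutes(num_documents):
    """Estimate processing time in minutes, excluding upload or transfer time."""
    if num_documents > MAX_DOCUMENTS:
        raise ValueError("document set exceeds the current 200,000 limit")
    return num_documents / DOCS_PER_MINUTE

full_set_minutes = estimate_processing_minutes(200_000)
```

At the quoted rate, a document set at the maximum size would take about 200 minutes, or a little over three hours, to process.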
6. Is Overview really free?
Overview is an open source project of the Associated Press, initially funded by a News Challenge grant from the John S. and James L. Knight Foundation. The code is released under the AGPL license.
We maintain a public server at overviewproject.org. It is currently free for all users. In the future we may charge for premium features so that we can cover the server costs and support the development of the project.
7. Who can see the documents I upload to Overview?
Only you can access the documents uploaded to your account, unless you share them. Overview can import your private projects from DocumentCloud.
If you have a sensitive set of documents that cannot leave your computer, we recommend running your own local server.
8. Which languages does Overview support?
Overview currently supports documents in English, Spanish, French, German, Russian, Arabic, Swedish, and Dutch. We can add another language in a day or two if you have documents to test with. Note that Overview can read documents in different languages, but the UI is still in English (this file would need to be translated).
9. I need help using Overview, or I have a question you haven’t answered.
You can talk to us and other Overview users on the Overview user forum. You can also reach us on Twitter at @overviewproject or by email. We’d love to hear from you, especially if you have a document set analysis problem that Overview can’t solve. Maybe there’s something we can do for you.