The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Stars: 2139
Rank: 14990