
OCR (Optical Character Recognition) and Text Mining for Digital Humanities


Join the Georgia Tech Library on Monday, March 31, from 4 to 5:30 p.m. in the Crosland Tower Second Floor Classroom, room 2130, for our OCR (Optical Character Recognition) and Text Mining for Digital Humanities workshop.
In this free, 90-minute to 2-hour workshop, participants will explore the fundamentals of text mining in digital humanities and the preparatory steps it requires. Expect a walkthrough of the main steps in a workflow, starting with collecting data from digital archives like HathiTrust, JSTOR, and Internet Archive. Next come practical tools for converting images and PDFs to text using OCR software, followed by optimizing the resulting text for basic analysis with platforms like Voyant.
This introductory workshop is designed to provide a clear, accessible overview of the text mining process and the tools that support it.
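
For a sense of what the cleanup and analysis steps can look like in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not the workshop's own material: the function names, the tiny stopword list, and the sample text are all hypothetical. It rejoins words that OCR output has split across line breaks, then counts the most frequent terms, roughly the kind of summary a platform like Voyant presents.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "and", "of", "a", "to", "in", "is", "it", "that"}


def clean_ocr_text(raw: str) -> str:
    """Normalize raw OCR output for simple word counting."""
    # Rejoin words hyphenated across a line break ("digi-\ntized").
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse runs of whitespace, including line breaks, to one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def top_terms(raw: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stopword terms in the text."""
    words = re.findall(r"[a-z]+", clean_ocr_text(raw))
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample standing in for text produced by OCR software.
    sample = (
        "The archive holds digi-\ntized letters.  The letters describe\n"
        "the archive and its letters."
    )
    print(top_terms(sample, 2))
```

The regular expression for words only matches unaccented Latin letters, which keeps the sketch short; real archival text usually needs a broader pattern and a fuller stopword list.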