Auto-Tagging Project
The Russian Movie Theater project is an effort by professors and students of Russian
at William & Mary to record an oral history of Russian movie-going in St. Petersburg.
Our primary work involves student researchers (myself included) collecting interviews
with native Petersburgers while studying abroad in Russia. Interviews are conducted
in Russian and recorded on video. After we return to the States, a team of students
and faculty transcribes and translates the interviews, tags them in XML, and analyzes
them using textual data such as word frequency and mentions of names, places, and movies.
While working on this project, I realized that the most tedious part, tagging
the interviews in XML, could be done by a computer, freeing up hours previously
spent on it. My partner, John Hoskins, and I built an application
for our research group that automatically tags common words while letting
researchers change which tags are assigned to which words.
This webpage describes the application, how we built it, and how to use it.
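As a rough illustration of dictionary-based auto-tagging, the Python sketch below wraps each word found in an editable word-to-tag map in a matching XML element. The map `DEFAULT_TAGS`, the tag names (`place`, `person`), and the function `auto_tag` are hypothetical illustrations, not the application's actual code. The sketch assumes whole-word, case-sensitive matching and tag names that are valid XML element names.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical word -> XML tag map (an assumption, not the real application's data).
# Editing this dict changes which tag is attributed to which word.
DEFAULT_TAGS = {
    "Petersburg": "place",
    "Leningrad": "place",
    "Eisenstein": "person",
}


def auto_tag(text: str, tag_map: dict[str, str]) -> str:
    """Wrap each whole-word occurrence of a mapped word in its XML tag.

    Matching is case-sensitive and whole-word only (Python's \\b is
    Unicode-aware, so Cyrillic words also work). All other text is
    XML-escaped so the output stays well-formed. Tag names are assumed
    to be valid XML element names.
    """
    if not tag_map:
        return escape(text)
    # Try longer entries first so a longer word wins over a shorter one.
    words = sorted(tag_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, words)) + r")\b")

    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untagged text between matches.
        pieces.append(escape(text[last:match.start()]))
        word = match.group(0)
        tag = tag_map[word]
        pieces.append(f"<{tag}>{escape(word)}</{tag}>")
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(auto_tag("I grew up in Leningrad & loved films.", DEFAULT_TAGS))
```

Because the tag map is plain data, a researcher can change which tag a word receives by editing one dictionary entry rather than the tagging code.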