Thanks to Sava Saheli Singh, whose weekly round-up for the GC’s own Journal of Interactive Technology and Pedagogy brought a new tool to my attention: TACIT, the Text Analysis, Crawling and Interpretation Tool. From the website:
Though several limited-method tools for text analysis are already available (e.g. LIWC), and some have become part of standard statistical packages (e.g., SPSS Text Analytics), a unified, open-source architecture for gathering, managing and analyzing text does not exist.
The Computational Social Science Lab (CSSL) at the University of Southern California introduces TACIT: An Open-Source Text Analysis, Crawling and Interpretation Tool.
TACIT’s plugin architecture has three main components:
- Crawling plugins, for automated text collection from online sources (e.g., US Senate and Supreme Court speech transcriptions, Twitter, Reddit)
- Analysis plugins, including LIWC-type word count, topic modeling, sentiment analysis, clustering and classification.
- Corpus management, for applying standard text preprocessing to prepare and store corpora.
TACIT’s open-source plugin platform allows the architecture to easily adapt with the rapid developments in text analysis.
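For anyone unfamiliar with what a "LIWC-type word count" involves, here is a minimal sketch of the idea. It is not TACIT's code or LIWC's actual dictionary. The two categories and their word lists are hypothetical placeholders. The sketch tokenizes the text as a crude stand-in for corpus preprocessing. It then counts how many tokens fall into each dictionary category and reports each category as a percentage of total words. A trailing `*` marks a word stem, following the wildcard convention used in LIWC dictionaries.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary in the spirit of LIWC (not the real LIWC lists).
# A trailing "*" means "any word starting with this stem".
CATEGORIES = {
    "posemo": ["happy", "good", "love*"],
    "negemo": ["sad", "bad", "hate*"],
}


def preprocess(text):
    """Lowercase the text and split it into word tokens (minimal preprocessing)."""
    return re.findall(r"[a-z']+", text.lower())


def matches(token, pattern):
    """True if the token equals the pattern, or shares its stem when it ends in '*'."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def category_percentages(text, categories=CATEGORIES):
    """Return, per category, the percentage of all tokens that match it."""
    tokens = preprocess(text)
    if not tokens:
        return {cat: 0.0 for cat in categories}
    counts = Counter()
    for tok in tokens:
        for cat, patterns in categories.items():
            if any(matches(tok, p) for p in patterns):
                counts[cat] += 1
    return {cat: 100.0 * counts[cat] / len(tokens) for cat in categories}


print(category_percentages("I love good days but hate bad ones"))
```

The example sentence has eight tokens. Two of them ("love", "good") are positive and two ("hate", "bad") are negative, so each category comes out at 25%. Real tools use much larger, validated dictionaries and more careful tokenization.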
The tool is available on GitHub for those interested in checking it out. A related paper can be found on SSRN.
I have not used this tool, so if anyone here tries it out, please report back!
Thanks – this looks like it may be incredibly helpful for those of us doing any text mining.