Ways that Humanists Think About Data – An alternative text for in-class discussion

Up to this point, I’ve enjoyed our in-class discussions. Typically, I leave with an unfocused, impending fatigue that transforms during my subway ride home into a grounded awareness of the gaps in my thinking about DH theory, of the questions I have more generally about how DH fits into the larger context of humanistic inquiry in the academy, and a slightly more refined sense of how I see myself finding my place in the field.

Last week I left running through potential ideas for my data project, wishing I had articulated my desire for a more specific discussion (in an effort to create a lexicon) of terms related to actual DH projects. I found myself trying to anticipate the unique ways in which humanities scholars think about data. Data sets and maps are, generally speaking, representations of a more complex, dynamic, ambiguous world. How have DH practitioners found inspiration in this reality, and what potential solutions and tools already exist? How can the gap between the “real” and the represented be used fruitfully? How can uninterpreted data result in new ways of seeing?

After reading Stephen Ramsay’s “Programming with Humanists: Reflections on Raising an Army of Hack-Scholars in the Digital Humanities,” I found myself setting aside time to research what exactly goes into “word frequency generators” and “poetry deformers.” He mentions a list of tools for analyzing text corpora (tf-idf analyzers, basic document classifiers, sentence complexity tools, etc.), as well as natural language processing tools, as potential programs that could be built during a computer science introduction focusing on humanities computing. Hashing out a basic explanation of what these programs do, and potentially a bit about how they do it, would contribute an additional, fruitful dimension to our praxis seminar discussions. I have a sense that learning more about what tools exist would go a long way in helping me zero in on a meaningful dataset.
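For anyone else who is curious, here is a minimal sketch of what a word frequency generator and a tf-idf analyzer might do under the hood. It is my own illustration in Python, not code from Ramsay’s article: the function names (`tokenize`, `word_frequencies`, `tf_idf`) and the tiny three-line corpus are invented for the example, and the tf-idf formula used (relative term frequency times the natural log of total documents over documents containing the word) is only one common variant among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each word occurs in a single document."""
    return Counter(tokenize(text))


def tf_idf(documents):
    """Return one dict per document mapping each word to its tf-idf score.

    Assumed variant: tf = count / total words in the document,
    idf = ln(number of documents / number of documents containing the word).
    """
    counts = [word_frequencies(doc) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    scores = []
    for c in counts:
        total = sum(c.values())
        scores.append({
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in c.items()
        })
    return scores


# Hypothetical mini-corpus, invented purely for illustration.
corpus = [
    "the sea the sea the open sea",
    "the road goes ever on",
    "the whale and the sea",
]

if __name__ == "__main__":
    print(word_frequencies(corpus[0]).most_common(3))
    for doc_scores in tf_idf(corpus):
        # Show the two highest-scoring words for each document.
        print(sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:2])
```

The point of the weighting is that a word like “the,” which shows up in every document, scores zero, while a word that is frequent in one document but rare across the corpus rises to the top. Even in a toy like this, choices such as what counts as a word are already interpretive decisions.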

**As an aside, since I bet not everyone will have had a chance to read this particular article, I should mention that I also really appreciated Ramsay’s extensive list of supplemental reading materials, some of which I have read (Martin Heidegger’s The Question Concerning Technology) and others that I would love to spend some time with, like, NOW (The Work of Art in the Age of Mechanical Reproduction, for example).**

During my research I came across an excellent blog post by Miriam Posner titled “Humanities Data: A Necessary Contradiction,” in which she engages some of the questions that are preoccupying me now that I have to choose my dataset. In the post she provides a transcript of a talk she gave at the Harvard Purdue data symposium this past summer. Her talk focused on the unique ways that humanists think about data compared with, say, scientists or social scientists, and on the implications of these differences for librarianship and data curation. I’ll list a couple of pertinent quotes and a link to her post. If you have some time, check it out!

“It requires some real soul-searching about what we think data actually is and its relationship to reality itself; where is it completely inadequate, and what about the world can be broken into pieces and turned into structured data? I think that’s why digital humanities is so challenging and fun, because you’re always holding in your head this tension between the power of computation and the inadequacy of data to truly represent reality.”


“So it’s quantitative evidence that seems to show something, but it’s the scholar’s knowledge of the surrounding debates and historiography that give this data any meaning. It requires a lot of interpretive work.”