Category Archives: Resources

Report Back from Git and GitHub training by the GC Digital Fellows

On Tuesday evening, February 23rd, Mary Catherine Kinniburgh and Patrick Smyth taught a workshop on Collaboration and Writing Workflows with Git and GitHub. The workshop page is here. Follow the directions for signing up with GitHub and downloading Git on your computer. From there, follow the workshop here. There are explanations for the differences between Git (local repository) and GitHub (remote hosting service), glossary, and instructions on how to get started.

Mary Catherine and Patrick made this an excellent introduction to learning Git and GitHub workflows. Group learning provided a great opportunity to practice collaborative work.

GIS workshop

A few weeks ago I was fortunate to be able to attend an all-day GIS workshop offered by Frank Connolly, the Geospatial Data Librarian at Baruch College. It was very thorough and by the end everyone had finished a simple chloropleth map. For those of you interested in continuing with map-related DH and can spare a friday, I recommend the workshop, which is free and offered several times a semester.

Most professional GIS projects use ArcGIS, made by ESRI, and many institutions subscribe to it to support their GIS projects. It’s not cheap. But, (yay!) there is an open-source alternative called QGIS, which anyone can download. This is the software we used in the workshop. QGIS is far more versatile than CartoDB, but it also has a complex interface and a steep learning curve.

In the workshop, we covered the pros and cons of various map projections (similar to some of our readings) and different types of map shapefiles (background map images); GPS coordinates vs. standard latitude/longitude (sometimes they differ) ;how to geo-rectify old maps so that they line up with modern maps and geocoordinates; open data sources; and how to organize and add information to a QGIS datababse.

The entire workshop tutorial, which participants took home, is available on the Baruch library website. If you’re comfortable learning complicated software on your own, it’s a great resource. Personally, I would need to spend a lot more time working with QGIS, with someone looking over my shoulder, to get a feel for the program. But practicing with the manual over the winter break will be on my ever-growing “to-do” list.


Data Presentation: Content Analysis and “In the Country”

Officially, my data set project is an attempt at content analysis using a short story collection as my chosen data set. In reality, this was me taking apart a good book so I could fool around with Python and MALLET, both of which I am very new to. In my previous post, I indicated that I was interested in “what the investigation of cultural layers in a novel can reveal about the narrative, or, in the case of my possible data set, In the Country: Stories by Mia Alvar, a shared narrative among a collection of short stories, each dealing specifically with transnational Filipino characters, their unique circumstances, and the historical contexts surrounding these narratives.” I’ve begun to scratch at the surface.

I prepared my data set by downloading the Kindle file onto my machine. This presented my first obstacle: converting the protected Kindle file into something readable. Using Calibre and some tutorials, I managed to remove the DRM and convert the file from Amazon’s .azw to .txt. I stored this .txt file and a .py file I found on a tutorial for performing content analysis using Python under the same directory and started with identifying a keyword in context (KWIC). After opening Terminal on my macbook, I typed the following script into the command line:

python itc_book.txt home 3

This reads my book’s text file and prints all instances of the word “home” and three words on either sides into the shell. The abbreviated output from the entire book can be seen below:

Alisons-Air:~ Alison$ ls
Applications Directory Library PYScripts Test
Calibre Library Documents Movies Pictures mallet-2.0.8RC2
Desktop Downloads Music Public
Alisons-Air:~ Alison$ cd PYScripts/
Alisons-Air:PYScripts Alison$ ls
In the Country
Alisons-Air:PYScripts Alison$ cd In\ the\ Country/
Alisons-Air:In the Country Alison$ ls
itc_book.txt itc_ch1.txt itc_ch2.txt
Alisons-Air:In the Country Alison$ python itc_book.txt home 3
or tuition back [home,] I sent what
my pasalubong, or [homecoming] gifts: handheld digital
hard and missed [home] but didn’t complain,
that I’d come [home.] What did I
by the tidy [home] I kept. “Is
copy each other’s [homework] or make faces
my cheek. “You’re [home,”] she said. “All
Immaculate Conception Funeral [Home,] the mortician curved
and fourth days [home;] one to me.
was stunned. Back [home] in the Philippines
farmer could come [home] every day and
looked around my [home] at the life
them away back [home,] but used up
ever had back [home—and] meeting Minnie felt
shared neither a [hometown] nor a dialect.
sent her wages [home] to a sick
while you bring [home] the bacon.” Ed
bring my work [home.] Ed didn’t mind.
“Make yourself at [home,”] I said. “I’m
when Ed came [home.] By the time
have driven Minnie [home] before, back when
night Ed came [home] angry, having suffered
coffee in the [homes] of foreigners before.
of her employer’s [home] in Riffa. She
fly her body [home] for burial. Eleven
of their employers’ [homes] were dismissed for
contract. Six went [home] to the Philippines.
the people back [home,] but also: what
she herself left [home.] “She loved all
I drove her [home,] and then myself.
we brought boys [home] for the night.
hopefuls felt like [home.] I showed one
She once brought [home] a brown man
time she brought [home] a white man
against me back [home] worked in my
the guests went [home] and the women
I’d been sent [home] with a cancellation
feed,” relatives back [home] in the Philippines
we’d built back [home,] spent our days
keep us at [home.] Other women had
Alisons-Air:In the Country Alison$

I chose the word “home” without much thought, but the output reveals an interesting pattern: back home, come home, bring home. Although this initial analysis is simple and crude, I was excited to see the script work and that the output could suggest that the book’s characters do focus on returning to the homeland or are preoccupied, at least subconsciously, with being at home, memories of home, or matters of the home. In most of In the Country’s chapters, characters are abroad as Overseas Filipino Workers (OFWs). Although home exists elsewhere, identities and communities are created on a transnational scale.

Following an online MALLET tutorial for topic modeling, I ran MALLET using the command line and prepared my data by importing the same .txt file in a readable .mallet file. Navigating back into the MALLET directory, I type the following command:

bin/mallet train-topics --input itc_book.mallet

— And received the following abbreviated output:

Last login: Sun Nov 29 22:40:08 on ttys001
Alisons-Air:~ Alison$ cd mallet-2.0.8RC2/
Alisons-Air:mallet-2.0.8RC2 Alison$ bin/mallet train-topics --input itc_book.mallet
Mallet LDA: 10 topics, 4 topic bits, 1111 topic mask
Data loaded.
max tokens: 49172
total tokens: 49172
LL/token: -9.8894
LL/token: -9.74603
LL/token: -9.68895
LL/token: -9.658470 0.5 girl room voice hair thought mother’s story shoulder left turn real blood minnie ago annelise sick wondered rose today sit
1 0.5 didn’t people work asked kind woman aroush place hospital world doesn’t friends body american began you’ve hadn’t set front vivi
2 0.5 back mother time house can’t you’re home husband thought we’d table passed billy family hear sat food stop pepe radio
3 0.5 day i’d made called school turned mansour manila don’t child things jackie mouth wasn’t i’ll car air boy watch thinking
4 0.5 hands years water morning mother head girl’s sound doctor felt sabine talk case dinner sleep told trouble books town asleep
5 0.5 he’d life man bed days found inside husband country call skin job reached wrote york past mind philippines chair family
6 0.5 time knew looked it’s she’d girls felt living i’m floor president fingers jim’s john young church jorge boys women nurses
7 0.5 baby hand city jaime door words annelise andoy heard he’s gave put lived that’s make white ligaya held brother end
8 0.5 milagros night face couldn’t year son brought men head money open they’d worked stood laughed met find eat white wrong
9 0.5 jim father home children eyes mrs milagros told long good years left wanted feet delacruz she’s started side girl streetLL/token: -9.62373
LL/token: -9.60831
LL/token: -9.60397
LL/token: -9.60104
LL/token: -9.596280 0.5 voice room you’re wife mother’s he’s story wrote closed walls stories america father’s ago line times sick rose thought today
1 0.5 didn’t people asked kind woman place hospital work city body doesn’t started front milagros american you’ve hadn’t held set watched
2 0.5 mother back house school thought can’t days bed minnie parents billy we’d table passed read sat stop high food they’re
3 0.5 day i’d made manila called don’t turned mansour child head hair jackie mouth dark wasn’t car stopped boy watch bedroom
4 0.5 man hands morning water reached doctor real sabine dinner sleep town asleep isn’t told dead letters loved slept press standing
5 0.5 husband he’d life family found inside call country skin live past daughter book mind chair wall heart window shoes true
6 0.5 time it’s knew looked felt she’d living i’m floor close president fingers things young began church boys women thing leave
7 0.5 baby hand jaime annelise door room words andoy hear heard lived put brother make that’s paper ligaya city end world
8 0.5 milagros night face couldn’t white son year brought men work job open stood they’d met money worked laughed find head
9 0.5 jim girl home years father children eyes aroush left good long mrs told she’s wanted girls love gave feet girl’sLL/token: -9.59296

LL/token: -9.59174

Total time: 6 seconds
Alisons-Air:mallet-2.0.8RC2 Alison$

It doesn’t make much sense, but I would consider this a small success only because I managed to run MALLET and read the file. I would need to work further with my .txt file’s content for better results. At the very least, this MALLET output could also be used to identify possible categories and category members for dictionary-based content analysis.

Workshops: What I learned, and how…

… it helped me this semester :

I learned what I didn’t want to know.

Which is valuable! Here’s a quick run-down, for anyone who’s interested :

First, I went to “Scraping Social Media,” on October 19th, which was taught by a very energetic and helpful woman named Michelle Johnson-McSweeney. The workshop moved quickly, but I was able to keep up, especially by sneaking questions to the excellent lab partner next to me (JoJo). After learning about the various interests, reasons, and concerns about gathering data from the likes of Twitter and Facebook, we moved on to actually “scraping” those sites – which worked for the most part, and felt quite satisfying. There were of course, some issues, and these became more present towards the end of the workshop. The main disappointment I remember was the I couldn’t “scrape” Twitter on a Mac… at that point I hadn’t been considering doing a project exclusively on CUNY computers. Nevertheless, the workshop was encouraging enough to lead me to think of projects for which I could use this tool. This lead me to my first (overwhelming) data set proposal: scrape the web for data regarding a controversy in Best American Poetry. Unfortunately, as soon as I went down that rabbit hole, I ended up composing a project that was totally unmanageable, about “Appropriation” in contemporary poetry. It was way too big. So I moved on to something else:

I had a book come out November 1st, and I thought, why not just use my own poems? This was a moment of anxiety for me – I felt that I could “thick map” my book, create a hypertext version of it, disclose more information and “be transparent,” perhaps take some responsibility for my own “appropriations.”

So, the next workshop I attended was “Text Encoding,” taught by the ever-wonderful Mary Catherine Kinniburgh. I was pretty excited to learn about “code,” excited about the prospect that I might one day learn to “code,” excited overall to lock down some acronyms at the start, such as HTML, TEI (the focus of this workshop), and XML. However, as the workshop progressed, I naturally started wondering whether I wasn’t up-to-speed enough to be here. Or rather, that my “hypertext” project idea wouldn’t actually benefit from TEI. If HTML stood for “hypertext mark-up language,” wasn’t that what I needed to learn first? The TEI projects we looked at were Shakespeare plays, and some Latin / Greek texts, and it was great to learn more about the “backbone” of how text is encoded, with plenty of examples and explanations.

But even more than realizing HTML was what I would probably need for my hyper-text project, I realized once again that hyper-texting my book of poems wasn’t really a “data set.” I went back to my idea of “deformance” (interpretation + performance). I wanted to try to learn something about the language in my poems, and to simultaneously make “art from art.” I regretted that I had forgot to register for the “Data Visualization” workshop a week prior before it filled up.

So, although my path through these workshops may have felt like a bunch of (gentle) dead-ends, I do think that they helped me arrive at a project, albeit late to the game. I’d imagine that if I had gone into the semester knowing more about the digital terms (why did I have to miss the “DH Lexicon” workshop! And why was it so late in the semester, too?) – I might have been able to learn tools that would actually help me conceive of a project / start conducting it quicker.

There’s a kind suggestion here: have more workshops early on that might help students get grounded without prior knowledge of DH and digital tools. That said, I did learn a lot from each workshop, even if it wasn’t what I “wanted” to learn. And there’s a lesson in that: I should have gone to more workshops, or at least done better research on my own before just “following my gut.”

Hey Girl

For those of you thinking of a public history-oriented final project proposal (like I am), you might appreciate the Public History Ryan Gosling Tumblr.  Although its most recent entry is from 2013 and the meme is old, the ideas are still very relevant. More to the point , this  short post by the authors on the NCPH website explains how they used their Tumblr  to call attention to issues around public engagement, the ethics of historical representation of the “underrepresented”, public communities vs. academic communities, and more. Anyway, it’s a helpful reminder of things to consider while developing a project. Plus Ryan Gosling.

Data Project: Reading Transnationalism and Mapping “In the Country”

Last week, we discussed “thick mapping” in class using the Todd Presner readings from HyperCities: Thick Mapping in the Digital Humanities, segueing briefly into the topic of cultural production and power within transnational and postcolonial studies (Presner 52). I am interested in what the investigation of cultural layers in a novel can reveal about the narrative, or, in the case of my possible data set, In the Country: Stories by Mia Alvar, a shared narrative among a collection of short stories, each dealing specifically with transnational Filipino characters, their unique circumstances, and the historical contexts surrounding these narratives.

In the Country contains stories of Filipinos in the Philippines, the U.S., and the Middle East, some characters traveling across the world and coming back. For many Overseas Filipino Workers (OFWs), the expectation when working abroad is that you will return home permanently upon the end of a work contract or retirement. But the reality is that many Filipinos become citizens of and start families in the countries that they migrate to, sending home remittances or money transfers and only returning to the Philippines when it is affordable. The creation of communities and identities within the vast Filipino diaspora is a historical narrative worth examining and has been a driving force behind my research.

For my data set project, I hope to begin by looking at two or more chapters from In the Country and comparing themes and structures using Python and/or MALLET. The transnational aspect of these short stories, which take place in locations that span the globe, adds another possible layer of spatial analysis that could be explored using a mapping tool such as Neatline. My current task is creating the data set – if I need to convert it, I could possibly use Calibre.

Digital Dorothy

As I described in the last class, I’m going to use a data set that is a text.  At first, I wanted to create a “diachronic” map of a particular place—the English Lake District—which is a popular destination for hikers, walkers, photographers, and Romantic literature enthusiasts. This last category also includes a great many Japanese tourists.

My first plan was to create a corpus of 18th– and 19th-century poetry and prose related to the Lake District (read: dead white males), explore the way landscape was treated, map locations mentioned in these texts or create a timeline, and then add excerpts of text along with the present-day visual data.

For the present-day component, I was thinking about how to scrape and incorporate data and photos from Flickr and Twitter that were tagged with the names of local landmarks and landscape features of the area.

mapping the lakes image in Google Earth

An image from Mapping the Lakes in Google Earth

Early on, I discovered Mapping the Lakes – a 2007-2008 project (apparently still in pilot phase) at the University of Lancaster that uses very similar strategies to explore constellations of spatial imagination, creativity, writing, and movement in the very same landscape. From the pilot project:

The ‘Mapping the Lakes’ project website begins to demonstrate how GIS technology can be used to map out writerly movement through space. The site also highlights the critical potentiality of comparative digital cartographies. There is a need, however, to test the imaginative and conceptual possibilities of a literary GIS: there is a need to explore the usefulness of qualitative mappings of literary texts… digital space allows the literary cartographer to highlight the ways in which different writers have, across time, articulated a range of emotional responses to particular locations … we are seeking to explore the cartographical representation of subjective geographies through the creation of ‘mood maps’.

The interactive maps are built on Google Earth; therefore, don’t try to view this in Chrome. You can also use the desktop version of Google Earth. The project is quite instructive in its aims as well as its faults and failures, and the process and outcomes are described on the website. (Actually, the pilot project might be a very good object lesson on mapping creative expression with GIS.)

However, if you’re interested in this kind of mapping, you should take a look at the Lancaster team’s award-winning research presentation poster on their expanded Lakes project:

I wrote to one of the authors to ask her about it—methodology, data set, etc. She was happy to respond, and was encouraging. Although the methodology is way beyond my technical chops at present, she referred me to a helpful semantic text-tagging resource that they used, and I’m sure will come in handy at some point.

After some floundering around, I defined a data set and project that is challenging but more manageable. It will involve a map and one text: an excerpt of Dorothy Wordsworth’s journals, from 1801-1803— not long after the second edition of Lyrical Ballads was published, and she and her brother moved to the area with their friend Samuel Taylor Coleridge.

The journals are a counterpoint to William Wordsworth’s early poetry, in that she kept them as much for her brother as for herself—recording experiences they had together, and personal observations that she knew would inspire him—to provide the raw material for his poems. There is a not extensive yet established amount of scholarship on the subject. She even describes this collaborative process in her journal—although it’s not called collaboration, and until more recently wasn’t characterized as such by critics.

To prepare the data set, I downloaded the text file of the most complete edition of her journals from Project Gutenberg, took out everything not related to this time period, and did a lot of “find and replace” work to get rid of extra spaces and characters, editorial footnotes, standardize some spellings, and change initials to full names. Following the advisories on the semantic tagging and corpus analysis sites, I also saved the file in both ASCII and UTF-8 text formats, with line breaks. (This may or may not prove necessary, depending on the tools I use later on). I have considered using a concordance tool of some kind (like Antconc) to visualize those connections, since I don’t think that has been done. However, this would entail creating a second data set with the book of poems and it’s a secondary interest.

My primary goals are these:

  • I’m hoping this project will confirm or complicate existing assumptions about Dorothy and her journals, which until now—as far as I know—have only been developed through close reading, not visualization.
  • Using this text, I want to map her life in the Lake District during this period – socially, physically, and emotionally. (In her brother’s case, his poetry does a good job of that, and stacks of books have been written about his relationships to other people, women, landscape, time, etc.)
  • I want the map to be interactive to some degree, so that users can trace these different aspects of her life geographically, by clicking on related keywords. Ideally, I would like to include supplementary images—paintings, engravings, and portraits—that were created in the era, to provide a contemporaneous visual component. Including related excerpts of journal or poetic text would also be helpful: it would be a means of mapping her creativity, in a way. A similar map of  William Wordsworth‘s creativity exists. It is more extensive but not very user-friendly.

On the cartographical front, I have been considering CartoDB and Mapbox. I also looked at the British Ordnance Survey topographical map of the area, which, like all the ordnance maps, is now online. The OS website includes a feature similar to Google maps, whereby you can personalize maps to some degree, and connect text and image data. Of course, Google Earth can be used this way too. Mapbox has nice backgrounds, but less options. CartoDB is visually pleasing,  versatile, and allows for more elegant “pop ups,” which I could use to include bits of text, images from the time, etc. But it can’t be embedded into a webpage. As they come into focus, the project goals will ultimately determine what I use.

In the meantime, I’m using Voyant to explore the text/data set. It is a great resource to help you define the parameters for a more focused project. You can see what I’m working with here. Eventually I will geocode the locations, either by hand or via Google Maps, input location data, temporal data, and data about her social interactions (all in the text) in a CSV file that can be uploaded into a mapping program, and figure out how to connect everything . (Or I will die trying.) I also plan to study the new and improved “Mapping the Lakes” project more carefully, for ideas on how best to present my own, less ambitious project.

Along the way I’ve encountered some other software that may be useful for those of us who like working with olde bookes: VARD is a spelling normalization program for historic corpora (also from Lancaster U. It requires permission to download but that is easy to get).

That is all.

You Are Listening To New York: Reflections on Open APIs

The Digital Fellows workshops here at the GC have far exceeded my expectations of what a 2-hour seminar tends to be. There’s just only so much technical material that can be absorbed in such a small window of time. That being said, the real strength of these workshops comes from the capable Digital Fellows leading the discussions, and the superb, thorough documentation they provide.

Out of the workshops I’ve thus far attended (Server Architecture, Introduction to Webscraping, etc), I’ve found the Lexicon to be the most useful, as it touched, very briefly, on a range of DH tools and approaches. In fact, it was so successful in communicating an overview of the emerging field, that it has thrown my dataset/final project planning for a loop (for another blog post).

One fairly important aspect of DH project development glossed over during the Lexicon was the importance of open APIs. I wanted to share a project that uses open APIs to wonderful effect. The “You Are Listening To” project utilizes open APIs to curate an immersive user experience centered around a mashup of ambient music and real time transmissions of police radars and airwave communications from cities around the world. Check out this link for You Are Listening to New York.

What I like so much about this site is it’s simplicity. It’s an elegant digital curation of various streaming media. When you load the page there’s a javascript file that pulls in an audio stream from, which provides the police radio audio feed. It also pulls up a soundcloud list that has been screened by the site’s creator Eric Eberhardt to ensure that it only incorporates, ambient, dreamy soundscapes that contrast with and compliment the police scanner audio. It also loads the page’s background image (of the user’s chosen city), which is pulling from Flickr’s API. This is all legal, free, and only possible because each of the companies made an effort to provide access to their site through simple web APIs.

There’s also a ton of additional metrics in the “i” info dropdown to the website. It looks like it’s accessing  twitter and reddit feeds, a geotracking tool to provide metrics about and for listeners, some google reference info, and various news trackers.

Have a look!





New Cool DH Tool

I subscribe to the American Antiquarian Society blog Past is Present, and I receive all sorts of wonderful things in the emails from them.

After two years of DH development under the guidance of a DH fellow – Molly O’Hagan Hardy – the AAS now has a dedicated DH curator (same person) and an official DH component of their mission, which means (I hope) that even more of their resources will be available to lay-antiquarians like me who cannot slog up to Worcester, MA and noodle around in the archives just for kicks.

Their image archives are especially fun to peruse, and they offer a wealth of resources under the Digital AAS banner.

Anyway, this MARC records conversion tutorial just fell over the transom of my inbox, and I think it could be a very useful tool for one or some of us, if not now, then in the future. Putting your data into a CSV format opens up many possibilities, including data visualizations.