Officially, my data set project is an attempt at content analysis using a short story collection as my chosen data set. In reality, this was me taking apart a good book so I could fool around with Python and MALLET, both of which I am very new to. In my previous post, I indicated that I was interested in “what the investigation of cultural layers in a novel can reveal about the narrative, or, in the case of my possible data set, In the Country: Stories by Mia Alvar, a shared narrative among a collection of short stories, each dealing specifically with transnational Filipino characters, their unique circumstances, and the historical contexts surrounding these narratives.” I’ve begun to scratch at the surface.
I prepared my data set by downloading the Kindle file onto my machine. This presented my first obstacle: converting the protected Kindle file into something readable. Using Calibre and some tutorials, I managed to remove the DRM and convert the file from Amazon’s .azw to .txt. I stored this .txt file in the same directory as a .py file I found in a tutorial on performing content analysis with Python, and started by identifying a keyword in context (KWIC). After opening Terminal on my MacBook, I typed the following command:
python kwic1.py itc_book.txt home 3
This reads my book’s text file and prints to the shell every word that begins with “home” (so “homework” and “hometown” are included), along with the three words on either side. The abbreviated output from the entire book can be seen below:
Alisons-Air:~ Alison$ ls
Applications Directory Library PYScripts Test
Calibre Library Documents Movies Pictures mallet-2.0.8RC2
Desktop Downloads Music Public
Alisons-Air:~ Alison$ cd PYScripts/
Alisons-Air:PYScripts Alison$ ls
In the Country
Alisons-Air:PYScripts Alison$ cd In\ the\ Country/
Alisons-Air:In the Country Alison$ ls
itc_book.txt itc_ch1.txt itc_ch2.txt kwic1.py twtest.py
Alisons-Air:In the Country Alison$ python kwic1.py itc_book.txt home 3
or tuition back [home,] I sent what
my pasalubong, or [homecoming] gifts: handheld digital
hard and missed [home] but didn’t complain,
that I’d come [home.] What did I
by the tidy [home] I kept. “Is
copy each other’s [homework] or make faces
my cheek. “You’re [home,”] she said. “All
Immaculate Conception Funeral [Home,] the mortician curved
and fourth days [home;] one to me.
was stunned. Back [home] in the Philippines
farmer could come [home] every day and
looked around my [home] at the life
them away back [home,] but used up
ever had back [home—and] meeting Minnie felt
shared neither a [hometown] nor a dialect.
sent her wages [home] to a sick
while you bring [home] the bacon.” Ed
bring my work [home.] Ed didn’t mind.
“Make yourself at [home,”] I said. “I’m
when Ed came [home.] By the time
have driven Minnie [home] before, back when
night Ed came [home] angry, having suffered
coffee in the [homes] of foreigners before.
of her employer’s [home] in Riffa. She
fly her body [home] for burial. Eleven
of their employers’ [homes] were dismissed for
contract. Six went [home] to the Philippines.
the people back [home,] but also: what
she herself left [home.] “She loved all
I drove her [home,] and then myself.
we brought boys [home] for the night.
hopefuls felt like [home.] I showed one
She once brought [home] a brown man
time she brought [home] a white man
against me back [home] worked in my
the guests went [home] and the women
I’d been sent [home] with a cancellation
feed,” relatives back [home] in the Philippines
we’d built back [home,] spent our days
keep us at [home.] Other women had
Alisons-Air:In the Country Alison$
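For readers who want to see how a KWIC listing like the one above can be produced, here is a minimal sketch in Python 3. It is not the tutorial's kwic1.py, whose contents are not shown in this post. The function name `kwic`, the prefix-matching rule (inferred from hits such as “homework” and “hometown”), and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return one 'left [match] right' line per hit, with `width` words per side."""
    # Splitting on whitespace keeps punctuation attached to each word,
    # which is how hits such as [home,] appear in the listing above.
    words = text.split()
    keyword = keyword.lower()
    hits = []
    for i, word in enumerate(words):
        # Drop leading quotes or punctuation, then prefix-match so that
        # "home," and "homework" both count as hits (an assumed rule).
        core = re.sub(r"^\W+", "", word.lower())
        if core.startswith(keyword):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append(f"{left} [{word}] {right}".strip())
    return hits

# Made-up sample text, not a quotation from the book.
sample = "We drove home after dinner. The homework waited at home."
for line in kwic(sample, "home", 3):
    print(line)
```

To run it on a whole book, the file's text could be passed in with something like `kwic(open("itc_book.txt", encoding="utf-8").read(), "home", 3)`.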
I chose the word “home” without much thought, but the output reveals an interesting pattern: back home, come home, bring home. Although this initial analysis is simple and crude, I was excited to see the script work. The output hints that the book’s characters focus on returning to the homeland or are preoccupied, at least subconsciously, with being at home, memories of home, or matters of the home. In most of In the Country’s chapters, characters are abroad as Overseas Filipino Workers (OFWs). Although home exists elsewhere, identities and communities are created on a transnational scale.
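A pattern like “back home, come home, bring home” can also be checked by counting rather than by eye. The sketch below tallies the word that comes immediately before each exact occurrence of a keyword. The function name `words_before` and the sample sentence are hypothetical, and this is only one rough way to count such pairs, not something the tutorial script does.

```python
from collections import Counter
import re

def words_before(text, keyword):
    """Count the word immediately preceding each exact occurrence of `keyword`."""
    # Lowercase and keep only letters and apostrophes, so "home," becomes "home".
    # Sentence boundaries are ignored, so a pair may span two sentences.
    tokens = re.findall(r"[a-z'’]+", text.lower())
    return Counter(
        tokens[i - 1] for i in range(1, len(tokens)) if tokens[i] == keyword
    )

# Made-up sample text, not a quotation from the book.
sample = "Back home, we rarely went home. She came home late and went home again."
counts = words_before(sample, "home")
print(counts.most_common(3))
```

Unlike the KWIC listing, this matches only the exact word, so “homework” or “hometown” would not be counted.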
Following an online MALLET tutorial for topic modeling, I ran MALLET from the command line and prepared my data by importing the same .txt file into a .mallet file that MALLET can read. Navigating back into the MALLET directory, I typed the following command:
bin/mallet train-topics --input itc_book.mallet
I received the following abbreviated output:
Last login: Sun Nov 29 22:40:08 on ttys001
Alisons-Air:~ Alison$ cd mallet-2.0.8RC2/
Alisons-Air:mallet-2.0.8RC2 Alison$ bin/mallet train-topics --input itc_book.mallet
Mallet LDA: 10 topics, 4 topic bits, 1111 topic mask
Data loaded.
max tokens: 49172
total tokens: 49172
LL/token: -9.8894
LL/token: -9.74603
LL/token: -9.68895
LL/token: -9.65847

0 0.5 girl room voice hair thought mother’s story shoulder left turn real blood minnie ago annelise sick wondered rose today sit
1 0.5 didn’t people work asked kind woman aroush place hospital world doesn’t friends body american began you’ve hadn’t set front vivi
2 0.5 back mother time house can’t you’re home husband thought we’d table passed billy family hear sat food stop pepe radio
3 0.5 day i’d made called school turned mansour manila don’t child things jackie mouth wasn’t i’ll car air boy watch thinking
4 0.5 hands years water morning mother head girl’s sound doctor felt sabine talk case dinner sleep told trouble books town asleep
5 0.5 he’d life man bed days found inside husband country call skin job reached wrote york past mind philippines chair family
6 0.5 time knew looked it’s she’d girls felt living i’m floor president fingers jim’s john young church jorge boys women nurses
7 0.5 baby hand city jaime door words annelise andoy heard he’s gave put lived that’s make white ligaya held brother end
8 0.5 milagros night face couldn’t year son brought men head money open they’d worked stood laughed met find eat white wrong
9 0.5 jim father home children eyes mrs milagros told long good years left wanted feet delacruz she’s started side girl street

LL/token: -9.62373
LL/token: -9.60831
LL/token: -9.60397
LL/token: -9.60104
LL/token: -9.59628

0 0.5 voice room you’re wife mother’s he’s story wrote closed walls stories america father’s ago line times sick rose thought today
1 0.5 didn’t people asked kind woman place hospital work city body doesn’t started front milagros american you’ve hadn’t held set watched
2 0.5 mother back house school thought can’t days bed minnie parents billy we’d table passed read sat stop high food they’re
3 0.5 day i’d made manila called don’t turned mansour child head hair jackie mouth dark wasn’t car stopped boy watch bedroom
4 0.5 man hands morning water reached doctor real sabine dinner sleep town asleep isn’t told dead letters loved slept press standing
5 0.5 husband he’d life family found inside call country skin live past daughter book mind chair wall heart window shoes true
6 0.5 time it’s knew looked felt she’d living i’m floor close president fingers things young began church boys women thing leave
7 0.5 baby hand jaime annelise door room words andoy hear heard lived put brother make that’s paper ligaya city end world
8 0.5 milagros night face couldn’t white son year brought men work job open stood they’d met money worked laughed find head
9 0.5 jim girl home years father children eyes aroush left good long mrs told she’s wanted girls love gave feet girl’s

LL/token: -9.59296
LL/token: -9.59174
Total time: 6 seconds
Alisons-Air:mallet-2.0.8RC2 Alison$
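The two MALLET steps described above, importing the text and then training topics, can be spelled out with explicit options. The sketch below only builds and prints the two command lines; it does not run MALLET. The helper names and the output file name itc_keys.txt are hypothetical. The `--remove-stopwords` flag is an assumption based on the absence of words like “the” in the topics above, and the exact import command used for this post is not shown.

```python
import shlex

MALLET = "bin/mallet"  # relative to the MALLET directory, as in the post

def import_command(txt_path, mallet_path):
    """Build the import step: plain text in, MALLET's binary format out."""
    return [
        MALLET, "import-file",
        "--input", txt_path,
        "--output", mallet_path,
        "--keep-sequence",      # topic modeling needs word order preserved
        "--remove-stopwords",   # assumed: drops common English function words
    ]

def train_command(mallet_path, num_topics=10, keys_path="itc_keys.txt"):
    """Build the training step; 10 topics matches the log above."""
    return [
        MALLET, "train-topics",
        "--input", mallet_path,
        "--num-topics", str(num_topics),
        "--output-topic-keys", keys_path,  # hypothetical file name for the top words
    ]

for cmd in (import_command("itc_book.txt", "itc_book.mallet"),
            train_command("itc_book.mallet")):
    print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

One hedged observation on the log: max tokens equals total tokens (49172), which suggests MALLET saw the whole book as a single document. Topic models generally work better with many documents, so importing the stories or chapters separately may be worth trying.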
The topics don’t make much sense yet, but I would consider this a small success simply because I managed to run MALLET and have it read the file. I would need to work further with my .txt file’s content for better results. At the very least, this MALLET output could be used to identify possible categories and category members for dictionary-based content analysis.
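As a rough illustration of what dictionary-based content analysis could look like, the sketch below counts how often the members of each category occur in a text. The category names and their word lists are hypothetical groupings of words picked from the topic output above; they are not findings, and the sample sentence is made up.

```python
from collections import Counter
import re

# Hypothetical categories, hand-picked from words in the MALLET topics above.
CATEGORIES = {
    "family": ["mother", "father", "husband", "children", "son", "brother"],
    "body": ["hands", "face", "eyes", "head", "hair", "skin"],
    "place": ["home", "house", "country", "city", "manila", "philippines"],
}

def category_counts(text, categories):
    """Return the total number of occurrences of each category's member words."""
    # Letters only, so a possessive like "mother's" is counted as "mother".
    tokens = re.findall(r"[a-z]+", text.lower())
    freq = Counter(tokens)
    return {
        name: sum(freq[word] for word in members)
        for name, members in categories.items()
    }

# Made-up sample text, not a quotation from the book.
sample = "Her mother and father left home for the city. Her hands shook."
print(category_counts(sample, CATEGORIES))
```

Running the same function on each story separately would make it possible to compare category counts across the collection.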