Officially, my data set project is an attempt at content analysis using a short story collection as my chosen data set. In reality, this was me taking apart a good book so I could fool around with Python and MALLET, both of which I am very new to. In my previous post, I indicated that I was interested in “what the investigation of cultural layers in a novel can reveal about the narrative, or, in the case of my possible data set, In the Country: Stories by Mia Alvar, a shared narrative among a collection of short stories, each dealing specifically with transnational Filipino characters, their unique circumstances, and the historical contexts surrounding these narratives.” I’ve begun to scratch at the surface.
I prepared my data set by downloading the Kindle file onto my machine. This presented my first obstacle: converting the protected Kindle file into something readable. Using Calibre and some tutorials, I managed to remove the DRM and convert the file from Amazon’s .azw to .txt. I stored this .txt file and a .py file I found on a tutorial for performing content analysis using Python under the same directory and started with identifying a keyword in context (KWIC). After opening Terminal on my macbook, I typed the following script into the command line:
python kwic1.py itc_book.txt home 3
This reads my book’s text file and prints all instances of the word “home” and three words on either sides into the shell. The abbreviated output from the entire book can be seen below:
Alisons-Air:~ Alison$ ls
Applications Directory Library PYScripts Test
Calibre Library Documents Movies Pictures mallet-2.0.8RC2
Desktop Downloads Music Public
Alisons-Air:~ Alison$ cd PYScripts/
Alisons-Air:PYScripts Alison$ ls
In the Country
Alisons-Air:PYScripts Alison$ cd In\ the\ Country/
Alisons-Air:In the Country Alison$ ls
itc_book.txt itc_ch1.txt itc_ch2.txt kwic1.py twtest.py
Alisons-Air:In the Country Alison$ python kwic1.py itc_book.txt home 3
or tuition back [home,] I sent what
my pasalubong, or [homecoming] gifts: handheld digital
hard and missed [home] but didn’t complain,
that I’d come [home.] What did I
by the tidy [home] I kept. “Is
copy each other’s [homework] or make faces
my cheek. “You’re [home,”] she said. “All
Immaculate Conception Funeral [Home,] the mortician curved
and fourth days [home;] one to me.
was stunned. Back [home] in the Philippines
farmer could come [home] every day and
looked around my [home] at the life
them away back [home,] but used up
ever had back [home—and] meeting Minnie felt
shared neither a [hometown] nor a dialect.
sent her wages [home] to a sick
while you bring [home] the bacon.” Ed
bring my work [home.] Ed didn’t mind.
“Make yourself at [home,”] I said. “I’m
when Ed came [home.] By the time
have driven Minnie [home] before, back when
night Ed came [home] angry, having suffered
coffee in the [homes] of foreigners before.
of her employer’s [home] in Riffa. She
fly her body [home] for burial. Eleven
of their employers’ [homes] were dismissed for
contract. Six went [home] to the Philippines.
the people back [home,] but also: what
she herself left [home.] “She loved all
I drove her [home,] and then myself.
we brought boys [home] for the night.
hopefuls felt like [home.] I showed one
She once brought [home] a brown man
time she brought [home] a white man
against me back [home] worked in my
the guests went [home] and the women
I’d been sent [home] with a cancellation
feed,” relatives back [home] in the Philippines
we’d built back [home,] spent our days
keep us at [home.] Other women had
Alisons-Air:In the Country Alison$
I chose the word “home” without much thought, but the output reveals an interesting pattern: back home, come home, bring home. Although this initial analysis is simple and crude, I was excited to see the script work and that the output could suggest that the book’s characters do focus on returning to the homeland or are preoccupied, at least subconsciously, with being at home, memories of home, or matters of the home. In most of In the Country’s chapters, characters are abroad as Overseas Filipino Workers (OFWs). Although home exists elsewhere, identities and communities are created on a transnational scale.
Following an online MALLET tutorial for topic modeling, I ran MALLET using the command line and prepared my data by importing the same .txt file in a readable .mallet file. Navigating back into the MALLET directory, I type the following command:
bin/mallet train-topics --input itc_book.mallet
— And received the following abbreviated output:
Last login: Sun Nov 29 22:40:08 on ttys001
Alisons-Air:~ Alison$ cd mallet-2.0.8RC2/
Alisons-Air:mallet-2.0.8RC2 Alison$ bin/mallet train-topics --input itc_book.mallet
Mallet LDA: 10 topics, 4 topic bits, 1111 topic mask
Data loaded.
max tokens: 49172
total tokens: 49172
LL/token: -9.8894
LL/token: -9.74603
LL/token: -9.68895
LL/token: -9.658470 0.5 girl room voice hair thought mother’s story shoulder left turn real blood minnie ago annelise sick wondered rose today sit
1 0.5 didn’t people work asked kind woman aroush place hospital world doesn’t friends body american began you’ve hadn’t set front vivi
2 0.5 back mother time house can’t you’re home husband thought we’d table passed billy family hear sat food stop pepe radio
3 0.5 day i’d made called school turned mansour manila don’t child things jackie mouth wasn’t i’ll car air boy watch thinking
4 0.5 hands years water morning mother head girl’s sound doctor felt sabine talk case dinner sleep told trouble books town asleep
5 0.5 he’d life man bed days found inside husband country call skin job reached wrote york past mind philippines chair family
6 0.5 time knew looked it’s she’d girls felt living i’m floor president fingers jim’s john young church jorge boys women nurses
7 0.5 baby hand city jaime door words annelise andoy heard he’s gave put lived that’s make white ligaya held brother end
8 0.5 milagros night face couldn’t year son brought men head money open they’d worked stood laughed met find eat white wrong
9 0.5 jim father home children eyes mrs milagros told long good years left wanted feet delacruz she’s started side girl streetLL/token: -9.62373
LL/token: -9.60831
LL/token: -9.60397
LL/token: -9.60104
LL/token: -9.596280 0.5 voice room you’re wife mother’s he’s story wrote closed walls stories america father’s ago line times sick rose thought today
1 0.5 didn’t people asked kind woman place hospital work city body doesn’t started front milagros american you’ve hadn’t held set watched
2 0.5 mother back house school thought can’t days bed minnie parents billy we’d table passed read sat stop high food they’re
3 0.5 day i’d made manila called don’t turned mansour child head hair jackie mouth dark wasn’t car stopped boy watch bedroom
4 0.5 man hands morning water reached doctor real sabine dinner sleep town asleep isn’t told dead letters loved slept press standing
5 0.5 husband he’d life family found inside call country skin live past daughter book mind chair wall heart window shoes true
6 0.5 time it’s knew looked felt she’d living i’m floor close president fingers things young began church boys women thing leave
7 0.5 baby hand jaime annelise door room words andoy hear heard lived put brother make that’s paper ligaya city end world
8 0.5 milagros night face couldn’t white son year brought men work job open stood they’d met money worked laughed find head
9 0.5 jim girl home years father children eyes aroush left good long mrs told she’s wanted girls love gave feet girl’sLL/token: -9.59296
LL/token: -9.59174
Total time: 6 seconds
Alisons-Air:mallet-2.0.8RC2 Alison$
It doesn’t make much sense, but I would consider this a small success only because I managed to run MALLET and read the file. I would need to work further with my .txt file’s content for better results. At the very least, this MALLET output could also be used to identify possible categories and category members for dictionary-based content analysis.