Category Archives: Data Projects

Data Presentation: Neologisms in Two Manuscripts

For my dataset project, I eventually used a combination of – – – Voyant Tools, Sublime Text, and Excel – – – to generate / visualize the words that DO NOT appear in the dictionary (based on a list of words from the MacAir file) – that is, “neologisms” in two manuscripts of my own poetry: PARADES (a 48 pg chapbook, about 4000 words total, fall 2014), & BABETTE (a 100 pg book, about 5500 words total, fall 2015).

The process looked like this :

  • Voyant Tools (to generate word frequencies in manuscripts)
  • Sublime Text (to generate plain text and CSV files)
  • Excel (to compare words in manuscript to words in dictionary)
  • (& back to) Voyant Tools (to generate word clouds with new data set)
  • (& back to) Excel (to generate column graphs with new data set)
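The Excel comparison step above can also be sketched in a few lines of Python. This is only a rough illustration, not the workflow I actually used: the function name `find_neologisms`, the tokenizing rule, and the tiny sample text and word list are all assumptions made up for the example.

```python
import re
from collections import Counter


def find_neologisms(manuscript_text, dictionary_words, min_count=2):
    """Return {word: count} for words that are absent from the dictionary list."""
    # Lowercase, then keep runs of letters (apostrophes allowed inside a word).
    tokens = re.findall(r"[a-z]+(?:['’][a-z]+)*", manuscript_text.lower())
    counts = Counter(tokens)
    known = {w.strip().lower() for w in dictionary_words}
    # Keep only unknown words that occur at least `min_count` times.
    return {
        word: n
        for word, n in counts.items()
        if word not in known and n >= min_count
    }


# Made-up sample; the real inputs would be a manuscript's plain text and
# a dictionary word list (one word per entry).
sample_text = "The latronic parade went by. A latronic drum, a strag of light."
sample_dictionary = ["the", "parade", "went", "by", "a", "drum", "of", "light"]

neologisms = find_neologisms(sample_text, sample_dictionary)
print(neologisms)
```

Setting `min_count=2` mirrors the choice to show only neologisms that occur more than once; lowering it to 1 would list every word missing from the dictionary.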


Here are the results for neologisms that occur more than once in each manuscript, in 4 images :





PARADES column graph (screen shot)


BABETTE column graph (screen shot)


What did I learn about the manuscripts from comparing their use of neologisms this way?

  1. Contrary to what I thought, I actually used MORE neologisms in Babette than I did in Parades
  2. The nature of the neologisms I used in each manuscript (do they sound like Latin, like a “real” word in English, like a “part of a word” in English, or like an entirely different thing altogether?)
  3. … SINCE I actually only finished creating these visualizations today (!) this kind of “interpretation” is much to be continued!


I ALSO tried to visualize the “form” (shape on the page) of the poems in each manuscript using IMAGE J – here are a few images and animations from my experiments with PARADES (you have to click on the links to get the animations… not sure they will work if IMAGE J isn’t downloaded) :

MIN_parades images

parades image J sequence parades images projection image j 2

Projection of MIN_parades

Projection of parades

Projection of Projection parades min z Projection of SUM_Projection

parades min v

SUM_Projection of MIN_parades

Data Presentation: Content Analysis and “In the Country”

Officially, my data set project is an attempt at content analysis using a short story collection as my chosen data set. In reality, this was me taking apart a good book so I could fool around with Python and MALLET, both of which I am very new to. In my previous post, I indicated that I was interested in “what the investigation of cultural layers in a novel can reveal about the narrative, or, in the case of my possible data set, In the Country: Stories by Mia Alvar, a shared narrative among a collection of short stories, each dealing specifically with transnational Filipino characters, their unique circumstances, and the historical contexts surrounding these narratives.” I’ve begun to scratch at the surface.

I prepared my data set by downloading the Kindle file onto my machine. This presented my first obstacle: converting the protected Kindle file into something readable. Using Calibre and some tutorials, I managed to remove the DRM and convert the file from Amazon’s .azw to .txt. I stored this .txt file in the same directory as a .py file I found in a tutorial on performing content analysis with Python, and started by identifying a keyword in context (KWIC). After opening Terminal on my MacBook, I typed the following command:

python itc_book.txt home 3

This reads my book’s text file and prints every instance of the word “home”, with three words on either side, into the shell. The abbreviated output from the entire book can be seen below:

Alisons-Air:~ Alison$ ls
Applications Directory Library PYScripts Test
Calibre Library Documents Movies Pictures mallet-2.0.8RC2
Desktop Downloads Music Public
Alisons-Air:~ Alison$ cd PYScripts/
Alisons-Air:PYScripts Alison$ ls
In the Country
Alisons-Air:PYScripts Alison$ cd In\ the\ Country/
Alisons-Air:In the Country Alison$ ls
itc_book.txt itc_ch1.txt itc_ch2.txt
Alisons-Air:In the Country Alison$ python itc_book.txt home 3
or tuition back [home,] I sent what
my pasalubong, or [homecoming] gifts: handheld digital
hard and missed [home] but didn’t complain,
that I’d come [home.] What did I
by the tidy [home] I kept. “Is
copy each other’s [homework] or make faces
my cheek. “You’re [home,”] she said. “All
Immaculate Conception Funeral [Home,] the mortician curved
and fourth days [home;] one to me.
was stunned. Back [home] in the Philippines
farmer could come [home] every day and
looked around my [home] at the life
them away back [home,] but used up
ever had back [home—and] meeting Minnie felt
shared neither a [hometown] nor a dialect.
sent her wages [home] to a sick
while you bring [home] the bacon.” Ed
bring my work [home.] Ed didn’t mind.
“Make yourself at [home,”] I said. “I’m
when Ed came [home.] By the time
have driven Minnie [home] before, back when
night Ed came [home] angry, having suffered
coffee in the [homes] of foreigners before.
of her employer’s [home] in Riffa. She
fly her body [home] for burial. Eleven
of their employers’ [homes] were dismissed for
contract. Six went [home] to the Philippines.
the people back [home,] but also: what
she herself left [home.] “She loved all
I drove her [home,] and then myself.
we brought boys [home] for the night.
hopefuls felt like [home.] I showed one
She once brought [home] a brown man
time she brought [home] a white man
against me back [home] worked in my
the guests went [home] and the women
I’d been sent [home] with a cancellation
feed,” relatives back [home] in the Philippines
we’d built back [home,] spent our days
keep us at [home.] Other women had
Alisons-Air:In the Country Alison$

I chose the word “home” without much thought, but the output reveals an interesting pattern: back home, come home, bring home. Although this initial analysis is simple and crude, I was excited to see the script work and that the output could suggest that the book’s characters do focus on returning to the homeland or are preoccupied, at least subconsciously, with being at home, memories of home, or matters of the home. In most of In the Country’s chapters, characters are abroad as Overseas Filipino Workers (OFWs). Although home exists elsewhere, identities and communities are created on a transnational scale.
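For anyone curious how a keyword-in-context search works, here is a minimal sketch in Python. It is not the tutorial script I ran; the function name `kwic` and the sample sentence are assumptions for illustration. It matches the keyword as a substring, which is why the output above also picks up words like “homework” and “hometown”.

```python
def kwic(text, keyword, window=3):
    """Return keyword-in-context lines with `window` words on either side."""
    tokens = text.split()
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        # Substring match, so "home" also catches "homework" and "home,".
        if keyword in token.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append(" ".join(left + ["[" + token + "]"] + right))
    return lines


# Made-up sample sentence, not a quotation from the book.
sample = "She sent her wages home to a sick mother and did her homework late."
for line in kwic(sample, "home", 3):
    print(line)
```

To match only the exact word, the comparison could strip punctuation from each token and test for equality instead of using `in`.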

Following an online MALLET tutorial for topic modeling, I ran MALLET from the command line and prepared my data by importing the same .txt file into a readable .mallet file. Navigating back into the MALLET directory, I typed the following command:

bin/mallet train-topics --input itc_book.mallet

And received the following abbreviated output:

Last login: Sun Nov 29 22:40:08 on ttys001
Alisons-Air:~ Alison$ cd mallet-2.0.8RC2/
Alisons-Air:mallet-2.0.8RC2 Alison$ bin/mallet train-topics --input itc_book.mallet
Mallet LDA: 10 topics, 4 topic bits, 1111 topic mask
Data loaded.
max tokens: 49172
total tokens: 49172
LL/token: -9.8894
LL/token: -9.74603
LL/token: -9.68895
LL/token: -9.65847
0 0.5 girl room voice hair thought mother’s story shoulder left turn real blood minnie ago annelise sick wondered rose today sit
1 0.5 didn’t people work asked kind woman aroush place hospital world doesn’t friends body american began you’ve hadn’t set front vivi
2 0.5 back mother time house can’t you’re home husband thought we’d table passed billy family hear sat food stop pepe radio
3 0.5 day i’d made called school turned mansour manila don’t child things jackie mouth wasn’t i’ll car air boy watch thinking
4 0.5 hands years water morning mother head girl’s sound doctor felt sabine talk case dinner sleep told trouble books town asleep
5 0.5 he’d life man bed days found inside husband country call skin job reached wrote york past mind philippines chair family
6 0.5 time knew looked it’s she’d girls felt living i’m floor president fingers jim’s john young church jorge boys women nurses
7 0.5 baby hand city jaime door words annelise andoy heard he’s gave put lived that’s make white ligaya held brother end
8 0.5 milagros night face couldn’t year son brought men head money open they’d worked stood laughed met find eat white wrong
9 0.5 jim father home children eyes mrs milagros told long good years left wanted feet delacruz she’s started side girl street
LL/token: -9.62373
LL/token: -9.60831
LL/token: -9.60397
LL/token: -9.60104
LL/token: -9.59628
0 0.5 voice room you’re wife mother’s he’s story wrote closed walls stories america father’s ago line times sick rose thought today
1 0.5 didn’t people asked kind woman place hospital work city body doesn’t started front milagros american you’ve hadn’t held set watched
2 0.5 mother back house school thought can’t days bed minnie parents billy we’d table passed read sat stop high food they’re
3 0.5 day i’d made manila called don’t turned mansour child head hair jackie mouth dark wasn’t car stopped boy watch bedroom
4 0.5 man hands morning water reached doctor real sabine dinner sleep town asleep isn’t told dead letters loved slept press standing
5 0.5 husband he’d life family found inside call country skin live past daughter book mind chair wall heart window shoes true
6 0.5 time it’s knew looked felt she’d living i’m floor close president fingers things young began church boys women thing leave
7 0.5 baby hand jaime annelise door room words andoy hear heard lived put brother make that’s paper ligaya city end world
8 0.5 milagros night face couldn’t white son year brought men work job open stood they’d met money worked laughed find head
9 0.5 jim girl home years father children eyes aroush left good long mrs told she’s wanted girls love gave feet girl’s
LL/token: -9.59296

LL/token: -9.59174

Total time: 6 seconds
Alisons-Air:mallet-2.0.8RC2 Alison$

It doesn’t make much sense, but I would consider this a small success only because I managed to run MALLET and read the file. I would need to work further with my .txt file’s content for better results. At the very least, this MALLET output could also be used to identify possible categories and category members for dictionary-based content analysis.
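As a rough idea of what that next step could look like, the sketch below reads topic lines in the format MALLET printed above (topic number, weight, then key words) and treats each topic’s word list as a candidate category. The function names, the abbreviated topic lines, and the sample sentence are assumptions for illustration, not part of MALLET or of my actual analysis.

```python
from collections import Counter


def parse_topic_keys(lines):
    """Map topic number -> key words from lines like '2 0.5 back mother ...'."""
    topics = {}
    for line in lines:
        parts = line.split()
        # Skip blank lines and log lines such as 'LL/token: -9.60831'.
        if len(parts) < 3 or not parts[0].isdigit():
            continue
        topics[int(parts[0])] = parts[2:]
    return topics


def count_category_hits(text, topics):
    """Count how many tokens in `text` belong to each topic's word list."""
    tokens = [t.strip('.,;:!?"“”') for t in text.lower().split()]
    hits = Counter()
    for topic_id, words in topics.items():
        members = set(words)
        hits[topic_id] = sum(1 for t in tokens if t in members)
    return hits


# Topic lines abbreviated from the output above; the sentence is made up.
sample_lines = [
    "LL/token: -9.60831",
    "2 0.5 back mother time house home husband",
    "5 0.5 life man bed days country job",
]
topics = parse_topic_keys(sample_lines)
hits = count_category_hits("Back home, her husband found a job in another country.", topics)
print(hits)
```

A real dictionary-based analysis would probably need hand-edited categories, since the raw topics mix character names with common words.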

Digital Dorothy 2: The Reckoning

I resolved to write a journal of the time, till W. and J. return, and I set about keeping my resolve, because I will not quarrel with myself, and because I shall give William pleasure by it when he comes home again.

And so begins Dorothy Wordsworth’s Grasmere Journals (1800-1803).  At first a strategy to cope with her loneliness while her brothers William and John were away, the journals were soon expanded in purpose to provide a record of what she saw, heard, and experienced around their home for the benefit of her brother’s poetry (apparently, he hated to write). There are numerous instances where her subject matter corresponds with his poems, down to the level of shared language. The most cited of these is an entry she made about seeing a field of daffodils; this was incorporated into the much-anthologized poem “I wandered lonely as a cloud,” which ends with a similar scene.

Dorothy and her brother William had a peculiar and unusually close relationship. She was more or less his constant companion before and after they moved to Grasmere. She copied down his poems, attended to his needs, and continued to live with him after he married and had a family. She herself never married, and remained with him until his death in 1850. For these reasons she is often portrayed as a woman who lived for him and vicariously through him.

This impression is reinforced by the nature of her writing, which focuses on the minutiae of day-to-day life and isolated details of the world around her – weather, seasonal change, plants, animals, flowers, etc. – and which, as she mentions numerous times, will be of use to her brother’s own work. Because her diaries chronicle many long walks in the surrounding countryside, alone or with friends and family, I originally planned to map her movements around the region and connect this movement to the people she was seeing in the area and corresponding with – to map her world, and see how hermetic and localized it really was.

I created the initial data set by downloading and cleaning a text file of the Journals. There were many ways I could have worked with this data set to make it suitable for mapping. I ended up using several online platforms: the UCREL semantic tagger, Voyant Tools, CartoDB, and a visualization platform called RAW. Working back and forth with some of these programs enabled me to put the data into Excel spreadsheets to filter and sort in numerous ways.

When I began thinking about mapping strategies and recording the various data I extracted (locations, people, activities), I saw that to ensure accuracy I would have to corroborate much of it by going through the journals entry by entry – essentially, to do a close reading. Because I was hoping to see what could be gleaned from this text by distant reading, I chose to make a simple map of the locations she mentions in her journal entries, in relation to some word usage statistics provided by Voyant Tools. Voyant  has numerous text visualization options, and working with them also encouraged me to think more about the role Dorothy’s writing had in her brother’s creative process; I was curious about how that might be visualized, in order to note patterns or consider the relationship between his defiantly informal poetic diction and her colloquial, quickly jotted prose.

So, I downloaded William Wordsworth’s Poems in Two Volumes, much of which was written during the same period, and processed it in a similar way. Using the RAW tools, I created some comparative visualizations with the total number of words common to both texts. I’ve used the images that are easiest to read in my presentation, but there are others, equally informative, that track the movement of language from one text to another.
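The idea of counting words common to both texts can be sketched in Python as follows. This is only an illustration under simple assumptions (lowercased words, punctuation ignored); it is not how Voyant or RAW compute their figures, and the function names and the two short stand-in strings are made up for the example.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count runs of letters as words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def shared_vocabulary(text_a, text_b):
    """Return {word: (count_in_a, count_in_b)} for words found in both texts."""
    a, b = word_counts(text_a), word_counts(text_b)
    return {w: (a[w], b[w]) for w in sorted(a.keys() & b.keys())}


# Short stand-in strings, not the actual journal or poem files.
journal_sample = "We saw daffodils by the lake. The daffodils tossed and danced."
poem_sample = "A host of golden daffodils beside the lake"

shared = shared_vocabulary(journal_sample, poem_sample)
print(shared)
```

In practice, very common words such as “the” would probably be filtered out with a stop-word list before comparing the two vocabularies.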

If I were to return to the map and do a close reading, I would include a “density” component to reflect the amount of time Dorothy spent going to other locations, and perhaps add the people associated with those locations (there is a lot of overlap here), and the nature of activity.

I had some trouble winnowing my presentation down to three slides, but the images can be accessed here.

Also, thanks to Patrick Smyth for writing a short Python program for me! I didn’t end up using it but I think it will be very helpful for future data projects.

Terrorism Data pt.2

For my data project I used Processing to create a simple interactive animation of data I downloaded from the Global Terrorism Database. (My application file can be found here.**) For the sake of early draft ease, I limited the information that I pulled to all recorded global incidents from 2011 to 2014, which was still in excess of 40,000 incidents.

The motivation for this project was to create a dynamic display of information that I find difficult to contextualize. To that end, the app displays location, date, mode of attack, target, casualties, and motive, alongside an animation of frenetically moving spheres. The number of spheres is constant, but their size is scaled to the number of casualties. Thus the more people are injured and maimed, the more overwhelmed the screen becomes. The slider across the top of the window allows the user to move forward and back through time, while the displayed information and the animation update to the data associated with her new position.
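The two mappings described above (casualties to sphere size, slider position to a point in the data) are both linear rescalings. The sketch below shows the idea in Python rather than Processing; the function names, the radius range, and the pixel width are assumptions for illustration and are not taken from my app.

```python
def scale(value, in_min, in_max, out_min, out_max):
    """Linearly map value from [in_min, in_max] to [out_min, out_max]."""
    if in_max == in_min:
        return out_min
    fraction = (value - in_min) / (in_max - in_min)
    return out_min + fraction * (out_max - out_min)


def slider_to_index(slider_x, slider_width, n_incidents):
    """Turn a slider position in pixels into an index into the incident list."""
    index = int(scale(slider_x, 0, slider_width, 0, n_incidents - 1))
    # Clamp so that dragging past either end stays inside the list.
    return max(0, min(n_incidents - 1, index))


def sphere_radius(casualties, max_casualties, min_radius=2.0, max_radius=60.0):
    """Scale a casualty count to a sphere radius (hypothetical pixel range)."""
    return scale(casualties, 0, max_casualties, min_radius, max_radius)


print(slider_to_index(400, 800, 40000))
print(sphere_radius(50, 100))
```

A square-root scale might be worth trying instead of a linear one, so that a few very large incidents do not dwarf everything else on screen.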

** It requires downloading the whole folder first and then clicking on the app icon. If you try to click through in Drive it will just show you the sub-folders that make it up. Also, for reasons that escape me, the file keeps breaking somewhere between upload and download. I’ll keep trying to fix this, but if the zip gods do not smile upon me, I’ll present with my laptop and run it from there. **



Please click here to take a look at my data set. As you can see, it includes

  • All civil and commercial aviation accidents of scheduled and non-scheduled passenger airliners worldwide, which resulted in a fatality (including all U.S. Part 121 and Part 135 fatal accidents)
  • All cargo, positioning, ferry and test flight fatal accidents.
  • All military transport accidents with 10 or more fatalities.
  • All commercial and military helicopter accidents with greater than 10 fatalities.
  • All civil and military airship accidents involving fatalities.
  • Aviation accidents or incidents of noteworthy interest.

All together, I have almost 85 years of aviation accidents. You have to agree it is a lot of data. As expected, the tool I am using, Carto DB, does not accept the format the data is provided in on Github. At first I almost pushed the panic button while imagining the amount of time needed to transfer it to Excel, but luckily, Hannah Aizenman suggested I download and debug it, and then see if any manual cleaning is required. Downloading took a long time. Unfortunately, I do not know how to debug it yet. Therefore, for my presentation I have entered eight years of data into an Excel spreadsheet manually.

Screen Shot 2015-11-29 at 11.52.49 PM

The categories I am working with are

Date:  Date of accident,  in the format – January 01, 2001
Time:  Local time, in 24 hr. format unless otherwise specified
Airline/Op:  Airline or operator of the aircraft
Flight #:  Flight number assigned by the aircraft operator
Route:  Complete or partial route flown prior to the accident
AC Type:  Aircraft type
Reg:  ICAO registration of the aircraft
cn / ln:  Construction or serial number / Line or fuselage number
Aboard:  Total aboard (passengers / crew)
Fatalities:  Total fatalities aboard (passengers / crew)
Ground:  Total killed on the ground
Summary:  Brief description of the accident and cause if known
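To give an idea of how a row with these categories could be read by a program, here is a small Python sketch. The class name `Accident`, the helper names, and the sample row are assumptions made up for illustration (the sample values are placeholders, not real records); only the column names and the date format come from the list above.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class Accident:
    accident_date: date
    operator: str
    ac_type: str
    aboard: Optional[int]
    fatalities: Optional[int]
    ground: Optional[int]


def to_int(value):
    """Return an int for plain digit strings, or None for blanks and '?'."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def parse_row(row):
    """Build an Accident from a dict keyed by the column names listed above."""
    return Accident(
        # Dates are written like 'January 01, 2001'.
        accident_date=datetime.strptime(row["Date"].strip(), "%B %d, %Y").date(),
        operator=row["Airline/Op"].strip(),
        ac_type=row["AC Type"].strip(),
        aboard=to_int(row["Aboard"]),
        fatalities=to_int(row["Fatalities"]),
        ground=to_int(row["Ground"]),
    )


# Placeholder row for illustration only.
sample_row = {
    "Date": "January 01, 2001",
    "Airline/Op": "Example Air",
    "AC Type": "Example Type",
    "Aboard": "12",
    "Fatalities": "3",
    "Ground": "?",
}
accident = parse_row(sample_row)
print(accident)
```

Rows exported from Excel as CSV could be fed to `parse_row` one at a time with `csv.DictReader`, provided the header row uses the same column names.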

At this moment, I do not have enough data to draw any conclusions, but questions have already started to arise. For example, is there a connection between the aircraft type and the number of accidents? De Havilland and Fokker aircraft, and the operator KLM, seem to appear most often for now.

To better understand my data, I chose Carto DB as my tool. In the near future I should be able to see in what part of the world the most airplane accidents happened. For now, my map looks like this

Screen Shot 2015-11-29 at 10.49.39 PM

When you click on a dot, info about the accident appears

Screen Shot 2015-11-29 at 10.57.34 PM

Ideally, I would like to have every ten years appear in a different color on the map. This kind of visualization should provide deeper insight into my data.

Screen Shot 2015-11-29 at 11.08.16 PM
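One way to get a color per decade would be to add a “decade” column to the spreadsheet before uploading it, so the map can be styled by that category. The Python sketch below shows the calculation; the function names and the sample dates are made up for illustration, and whether the mapping tool can color by such a column on my plan is something I still need to check.

```python
from collections import Counter
from datetime import datetime


def decade_of(date_text):
    """Return the first year of the decade for a date like 'January 01, 2001'."""
    year = datetime.strptime(date_text, "%B %d, %Y").year
    return year - year % 10


def decade_label(date_text):
    """Return a label such as '1920-1929' to use as a category column."""
    start = decade_of(date_text)
    return f"{start}-{start + 9}"


# Made-up dates in the data set's format, for illustration only.
sample_dates = ["March 05, 1921", "July 19, 1928", "January 01, 2001"]
per_decade = Counter(decade_label(d) for d in sample_dates)
print(per_decade)
```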

Problems Encountered

I knew that Carto DB mostly recognized large cities. Surprisingly, it is sometimes able to find towns, too. So how do you georeference something that did not appear on the map? You need to go to the map’s data and look for what exactly did not show up. Then, go back to the map and search for the town you were looking for. At the bottom of the menu there is an “add” feature. Click it and click twice at the location you need. You can enter data manually. This is how I dealt with the little towns situation. When a plane accident over Gibraltar had to be georeferenced, I simply clicked twice on the map at the approximate location of the catastrophe. A new dot appeared and I entered the details manually.

Sometimes Carto DB is wrong. Spring Rocks, Wyoming, is in the USA, not Jamaica. A few other dots proved wrong, too. I do not know how to deal with this yet. I will ask the digital fellows for advice.

Only three layers are available on Carto DB. It appears you have to pay for the rest. Good thing Destry told us about the two free trial weeks :).

I have 85 years of data. In the first 8 years, 24 out of 122 accidents did not get georeferenced. The number of those over 85 years will be overwhelming. To enter all that data manually would take forever. I wonder if I could leave my map a little imperfect… Something worth asking our professors.

In general, it was rather interesting to research aviation accidents data. Now I know most tragedies happen due to bad weather. Conclusion: never complain about plane delays in winter!

May we stay safe, and may there be fewer tragedies in the world.

Airplane Crash Info

At Github I came across a dataset about plane crashes. Even though it is not something I am passionate about (in fact, I hope to not think too much about it while flying next time), the data raised my curiosity. At Accident Database they provide information about airplane accidents from 1921 to 2015. I was not sure how to verify this data. What I did was select a few random plane crashes and look them up online. It worked: those calamities really happened.

It is hard to predict what questions I will be able to ask and answer once my data is ready to use. They say the more you work on it, the more correlations you see. For now, I have noticed that the number of airplane accidents in 1945 does not differ much from the number in 1957. Looking at the data more closely revealed that almost all the airplane crashes documented in 1945 involved military planes, which means the regular ones weren’t flying because of World War II. Actually, my data set only covers military transport accidents with 10 or more fatalities. One can only imagine how many small airplanes were destroyed.

With this being said, I am going back to working on my slides.

May we stay safe!


Data Set Project: AAU Report on Sexual Assault

My data set project turned into an exercise in parsing and cleaning data before all else. I knew that I wanted to look into the current climate of sexual assault on college and university campuses, but there were a number of places to look for data on the issue. I landed on the Report on the Association of American Universities Campus Climate Survey on Sexual Assault and Sexual Misconduct. This report was the most recent, had the most immediately accessible data, and had been widely publicized when it was released earlier this fall. (See, for example, the NYTimes’s September article, “1 in 4 Women Experience Sex Assault on Campus.“) I contacted the AAU and made multiple attempts to access the original survey results, but in the end had to resort to the data tables as they appeared in the report.

This data table is where I began — with the percent of students experiencing different forms of nonconsensual sexual contact, organized according to their gender.

Screen Shot 2015-11-28 at 1.34.45 PM

I then used the web software Tabula to scrape the data from the .pdf…

Screen Shot 2015-11-28 at 1.52.05 PM

…cleaned and exported to Google Sheets…

Screen Shot 2015-11-28 at 1.34.01 PM

…and cleaned and imported into Plotly.

Screen Shot 2015-11-28 at 1.32.58 PM

At every step of the way I was reorganizing the data to focus on the story that I most wanted. When I first started graphing, the data looked like this:

newplot (2)

I continued to manipulate it until it looked like this:

newplot (3)

This exercise speaks to the amount of parsing, cleaning, selection, and editing that is necessary to arrive at even the simplest of bar charts. But I appreciate it for the new pieces of information that emerge from analyzing data through this process. For example, consider the high percentage of TGQN (that’s transgender women, transgender men, genderqueer, gender non-conforming, and questioning students, and those whose gender wasn’t listed) that are experiencing some form of assault on their campuses. Most media coverage of this issue has focused on the victimization of female students, but apparently there’s also a critical story here about the safety of TGQN students — something that I didn’t realize until I went through the motions of visualizing the data, and that probably a lot of other people are missing too.

As for Plotly as a tool for visualization, I found it to be relatively easy to use, especially for someone who doesn’t know how to code. But I prefer Highcharts for the greater autonomy and flexibility it allows the user. That said, Highcharts is a JavaScript library, and as such requires some degree of JS knowledge to manipulate. For anyone who wants to churn out a pretty quick, pretty attractive, and very shareable graph, I think Plotly is a good choice.


Apigee – Fashion Studies Dataset project #stansmith

Instagram —> # Stan Smith —> APIGEE to create a collection of data —> pictures and hashtag —> mapping or story-line

Scarlett and I decided to explore fashion studies more deeply through the DH tools that we learned during the semester. Since we are studying fashion, we identified two aspects that we would like to develop and that are both necessary in our field: visualization and connectivity. For the first, we decided to start from Instagram. Among all the social media we have today, Instagram is the one built on hashtags and images. Fashion is a visual thing, and it could not grow without pictures. The second aspect is connectivity: through a specific detail, like Stan Smith shoes, we can trace the visual story of fashion as it is expressed in that detail, including marketing strategies, levels of interest all over the world, connections between people with the same passions and interests, etc. This may not appeal to everyone, but we think fashion has this power of connection. We consider it a tool that has the power of social mobility, etc.

We started with Apigee, the leading provider of API technology and services for enterprises and developers. Hundreds of companies including Walgreens, Bechtel, eBay, Pearson, and Gilt Group as well as tens of thousands of developers use Apigee to simplify the delivery, management and analysis of APIs and apps. Our classmates in the Fashion Studies Track suggested this program to us because it would help with a good but simple data project; so let’s see how it works.

Basically we wanted to collect images, find tags with StanSmith, and gather locations all over the world, to see what relationship exists between the world and the shoes. I guess this could also be a good project for keeping track of marketing movements in the entire fashion world, with all items and not only one specific item.

We typed in the search bar and, on the page that popped up, we chose “Instagram” in the column API. The next step is to select OAuth2 under the column “Authentication”, because interacting with data through Apigee requires authorizing your Instagram account.

At this point you will have three options (Query, Template and Headers). We chose “Template” (for Instagram) and in the “tag name*” slot we typed our tag “stansmith”. Right after this step the authentication is complete, authorizing Apigee to use your social media account.

It’s necessary to select an API method and it’s important to select the second choice under the “Tags” section of the list.

We only had to click on “send” and the response came: Instagram has pagination, so the data we got were divided into pages. To see the next page, we copied the URL in the picture and pasted it into a new search bar, which gave us a weird page of raw data.

Our friend told us that the process was almost complete, but the last step was to download “JSONview” (it doesn’t work with Safari, so we used Chrome) to see the data in an organized form. This step makes it easier to view images, profile pictures, usernames, etc., and it is also where we found the numbers for “created_time”. This part is very important because converting these numbers from Unix epoch time to GMT is necessary for the visualization of images.

With Epoch Converter we were able to convert everything and the result is a list of data, where every “attribution” is a post. We collapsed the posts, having the chance to look at posted pictures in different resolutions!
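The same conversion can be done with a few lines of Python instead of an online converter. This is just a sketch: the function name `epoch_to_gmt` is ours, and the sample response below is hypothetical and heavily trimmed. Apart from “created_time”, the field names and the URL in it are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def epoch_to_gmt(created_time):
    """Convert a Unix epoch timestamp (seconds, as str or int) to a GMT string."""
    moment = datetime.fromtimestamp(int(created_time), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S GMT")


# Hypothetical, trimmed response; only "created_time" comes from our data.
sample_response = json.loads("""
{
  "pagination": {"next_url": "https://example.invalid/next-page"},
  "data": [
    {"created_time": "1448841600", "user": {"username": "example_user"}}
  ]
}
""")

converted = [
    (post["user"]["username"], epoch_to_gmt(post["created_time"]))
    for post in sample_response["data"]
]
print(converted)
```

With the times in a readable form, the posts could be sorted by date for a timeline of the item.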

For the presentation we’ll provide a PowerPoint with step-by-step images of the process for reaching this data, which we will probably use for a mapping or a timeline of the item.


Thanks for your time,


Nico and Scarlett

Deformance / Hypertext Project

This is a sort of two-pronged post, addressing Matt’s question towards the end of last class, re: how the readings / class discussions are helping me think more about my data (or final) project.

I’m really interested in the ideas and examples of “deformance” (in Jerome McGann’s definition = interpretation + performance) that have come up recently, especially and most recently in Lev Manovich & Kevin’s digital work. I suppose I think of “deformance” as a way of turning art into new art… the purpose of which is not just “playing around” and being creative (good purpose in itself), but also, as Kevin pointed out, to ask questions of the “data” (the art, or the world in which it was produced) that you wouldn’t have known to ask before. Disordering the work of art (text, photo, or film) in order to change its questions, its answers, its “rules.” I have also been interested in the way that digital “deformance” tends toward “aesthetically pleasing” results – Kevin and Lev’s work simply “looks good,” and I’d love it if one of my projects in this course (i.e., project fully executed) could aspire to that type of artistic attention (which seems to derive from direct intention + skills + a level of pure play or “accident”).

Along these lines, it is now my intention to do a “deformance” project that is focused on my own writing / creative process. That is, rather than trying to uncover and work with the huge and somewhat impossibly impenetrable “data set” I previously proposed (Appropriation in Contemporary Poetry), I would like to either:

  • 1 – Make a digital hypertext edition of my book manuscript (Babette, recently published in print this month), adding one or more layers of text to discover more information about the language on the page. This may include anecdotes, links, or perhaps even other “poems,” that seem to enrich, deconstruct, or disorder the present text. Thus the “data set” would be the original text (+ the new text?) I would like this hypertext edition to move the reader away from the “search” (for meaning) and towards the “browse” function, revealing both writing and reading as dynamic, non-linear, and layered, with interconnected information and experiences. On that note, a final goal would be to open the text to “community, relationship, and play” (Stephen Ramsay) by allowing “users” to add their own interpretations, experiences, links, etc. (though I understand this might be beyond the scope of this project).


  • 2 – Create a digital hyper-text edition of my three published manuscripts (Babette, Parades, and Latronic Strag) and do a data-visualization of the neologisms I’ve used in these works. The “data set” would thus be these “neologistic” words, about which I could ask starting questions such as: “how often do they appear in each book,” “how much do they sound like one another,” “how closely are they “related” to each other (by the computer’s definition),” how closely are they “related” to “real” words, what words do these associate with in my mind (or the computer’s, or in the minds of other readers)… what “real” language do they sound like, and is there some sort of “neologistic” conversation going on between the words, phrases, poems, manuscripts? Again, the aim would be to use the language as data to “browse” for new questions about the text, rather than “search” for these answers, and one ultimate goal would be to have the project allow for “users” to add in their own experience of these words (creating more data).

Allowing others to add reactions, data, or personal experience is one way for me to get away from the fear that this would be a “vanity project” (in which the data in the set is simply my own data). Another way would be to see this project as a starting point for hypertext-ing or disordering other texts, texts that are not my own. Perhaps I see this project as one that might move me closer to that more “research”-like or scholarly question of how language is appropriated or repurposed in contemporary poetry.

As for creating a “digital edition” of one (or more) of my books, I found a tool called Ediarum on the DIRT site, which claims to help authors “transcribe, encode, and edit” manuscripts.

As for the second (and I’d imagine, more fun and elaborate) task of “hypertexting” the book(s), I had to do a little more research to see what’s out there, and where it’s coming from. What “kind” of hypertext am I looking to produce? Based on the Wikipedia definitions of “forms of hypertexts,” I’d surely like to create something that is “networked,” i.e. “an interconnected system of nodes with no dominant axis of orientation… no designated beginning or designated ending.” And, if I wanted to be able to add that user interaction, I’d want something “layered”: a structure with two layers of linked pages in which readers could insert data of their own.

Searching for tools to create networked / layered hypertext led me to two options on DIRT: Mozilla Thimble and TiddlyWiki. (It also led me to investigate what software is or has been available for hypertext, starting with Ted Nelson’s Project Xanadu, and ending, it seems, with the popular (and expensive, at $300) program from Eastgate called StorySpace, neither of which I think will be very helpful.)

I’d love any thoughts on which project (1 or 2) seems more interesting, appropriate, or feasible for this project… I’m going to make an appointment with the Digital Fellows to get their advice (and guidance on the tools).


– Sara