
Data Presentation: Neologisms in Two Manuscripts

For my dataset project, I eventually used a combination of Voyant Tools, Sublime Text, and Excel to generate and visualize the words that do NOT appear in the dictionary (based on a list of words from the MacAir file), that is, "neologisms," in two manuscripts of my own poetry: PARADES (a 48-page chapbook, about 4,000 words total, fall 2014) and BABETTE (a 100-page book, about 5,500 words total, fall 2015).

The process looked like this:

  • Voyant Tools (to generate word frequencies in manuscripts)
  • Sublime Text (to generate plain text and CSV files)
  • Excel (to compare words in manuscript to words in dictionary)
  • (& back to) Voyant Tools (to generate word clouds with new data set)
  • (& back to) Excel (to generate column graphs with new data set)
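The comparison step at the heart of this pipeline (manuscript word frequencies vs. a dictionary word list) can be sketched in a few lines of Python. This is a hypothetical sketch, not the actual Excel workflow: the word counts and the tiny dictionary below are made up, standing in for the Voyant frequencies and the Mac's word-list file.

```python
from collections import Counter

# Hypothetical stand-ins: in practice the frequencies came from Voyant Tools,
# and the dictionary from the word list on the Mac (e.g. /usr/share/dict/words).
dictionary = {"parade", "rain", "the", "of", "glass"}

manuscript_counts = Counter({
    "parade": 12, "rain": 7, "glimmerglass": 3, "the": 40, "unrainable": 1,
})

# A "neologism" here is simply any word absent from the dictionary list.
neologisms = {word: n for word, n in manuscript_counts.items()
              if word not in dictionary}

# Keep only neologisms that occur more than once, as in the visualizations.
repeated = {word: n for word, n in neologisms.items() if n > 1}
print(repeated)  # {'glimmerglass': 3}
```

The `repeated` dict is exactly the kind of table that can be re-imported into Voyant (for word clouds) or Excel (for column graphs).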

*****

Here are the results for neologisms that occur more than once in each manuscript, in four images:

[PARADES neologism word cloud]

vs.

[BABETTE neologism word cloud]

*****

[PARADES neologism column graph (screenshot)]

vs.

[BABETTE neologism column graph (screenshot)]

*****

What did I learn about the manuscripts from comparing their use of neologisms this way?

  1. Contrary to what I thought, I actually used MORE neologisms in BABETTE than I did in PARADES.
  2. The nature of the neologisms differs between the manuscripts: do they sound like Latin, like a "real" English word, like part of an English word, or like something else entirely?
  3. … SINCE I actually only finished creating these visualizations today (!), this kind of "interpretation" is much to be continued!

*****

I ALSO tried to visualize the "form" (shape on the page) of the poems in each manuscript using ImageJ. Here are a few images and animations from my experiments with PARADES (you have to click on the links to get the animations… I'm not sure they will work unless ImageJ is installed):

[Images and animations: ImageJ projections of PARADES, including MIN, SUM, and z-projection sequences]

Data Set Project – Graduation Rate in 4-Year Postsecondary Institutions

My data set project turned out a lot different from what I envisioned. I found my data in the Integrated Postsecondary Education Data System (IPEDS). IPEDS has a Trend Generator tool, which makes it easy to generate data and see how it has changed over time. So I looked at the percentage of bachelor's students in the 2007 cohort who graduated from 4-year postsecondary institutions within 150% of normal time (6 years). I broke the data down by state and race/ethnicity.
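For context, the rate itself is a simple cohort calculation: completers within 150% of normal time divided by the cohort size. A minimal sketch with entirely made-up counts (the real figures come from the IPEDS Trend Generator):

```python
# Hypothetical cohort counts; the real numbers come from the IPEDS Trend Generator.
# (state, group) -> (students in 2007 cohort, completers within 150% of normal time)
cohort_2007 = {
    ("NY", "Asian"): (12000, 8400),
    ("NY", "White"): (30000, 19500),
    ("CA", "Asian"): (15000, 10800),
}

# Graduation rate within 150% of normal time (6 years for a 4-year program).
rates = {key: completers / cohort * 100
         for key, (cohort, completers) in cohort_2007.items()}

for (state, group), rate in sorted(rates.items()):
    print(f"{state} {group}: {rate:.1f}%")
```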

I wanted to map this data to help us visualize these numbers. After seeing what UC Berkeley did with their Urban Displacement project, my goal was to emulate something similar and continue to work on it for the final project. But CartoDB really wasn't giving me what I wanted. I couldn't separate the clusters of plotted points, and each layer acted as an individual layer that didn't add up with the data/layers I had mapped previously. For instance, here's what I got when I added two other layers to the map:

[Map with points]

The black and red bubbles aren't really points… Then I tried to create another map, a choropleth map. This map shows the percentage of Asian students who graduated within normal time. If you click on each state, a pop-up shows the graduation rates for the other racial/ethnic groups.

[Choropleth map]

This wasn’t what I wanted at first, but it was what I got. Then I was thinking, is this enough? Do I need an actual map in my visualization? How does this map help me compare my data?  I decided to turn to Plotly just to do some visualization with charts. Here was what I came up with:

A graph with three y-axes:

[Screenshot: graph with three y-axes]

A scatter plot:

[Screenshot: scatter plot]

Even with these two charts, there are still limitations to seeing the data visually. I'm hoping that in the next couple of weeks I'll either find a tool that matches what I want, or I'll try to manipulate my data more and see what I get.
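For reference, Plotly builds multi-axis charts like the one above by declaring extra axes in the figure layout. A minimal sketch of such a figure specification, in Plotly's JSON schema; the states, values, and trace names below are placeholders, not the actual IPEDS numbers:

```json
{
  "data": [
    {"x": ["NY", "CA", "TX"], "y": [65, 72, 58], "name": "Rate (%)", "type": "bar"},
    {"x": ["NY", "CA", "TX"], "y": [30000, 15000, 22000], "name": "Cohort size", "yaxis": "y2"},
    {"x": ["NY", "CA", "TX"], "y": [120, 95, 140], "name": "Institutions", "yaxis": "y3"}
  ],
  "layout": {
    "xaxis":  {"domain": [0, 0.85]},
    "yaxis":  {"title": "Graduation rate (%)"},
    "yaxis2": {"title": "Cohort size", "overlaying": "y", "side": "right"},
    "yaxis3": {"title": "Institutions", "overlaying": "y", "side": "right",
               "anchor": "free", "position": 0.95}
  }
}
```

Traces opt into an axis with `"yaxis": "y2"` / `"y3"`; the extra axes overlay the first and sit to the right of the shrunken x-domain.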

Using cartodb

I see everyone is having issues with either their datasets or CartoDB. I had problems syncing my Excel data with CartoDB. The free version has limited features, as you stated, Maple, so as far as layers are concerned it's a dead issue. Since I wasn't able to sync, I had to put the data onto the map I created manually. Yesterday, when I realized that I had to put in 56 street names on the map, I thought about switching to something else, but that too proved fruitless. Hopefully the website for the final project will be easier to maneuver… Good luck, ladies…

New Data Set

Hi everyone. As you know, I was going to work on the Indian boarding schools project. Unfortunately, I was not able to find data. All the people/libraries/organizations I contacted were very responsive, but regrettably, the information they provided was scarce. It turned out I could not work on this project, because there was NO data to use or rely on. I have watched videos in which Native Americans express their frustration about being the third generation of boarding-school-raised people, so I presumed that maybe I could track down at least a few families, but even that was not feasible. This kind of information is private. Do not misunderstand me: the information I need exists, but I would have to go to Washington, DC to obtain it (most likely, no one has even compiled it into a data set yet). I honestly did not expect this turn of events. To fight back frustration, I started to look for ideas in other people's posts. That is how I learned Kat Vecchio found her data set on GitHub. So, I started browsing.

To be continued

 

Report from the Eng. dept – First Year Comp Exam

Kathleen Fitzpatrick's book, Planned Obsolescence, and the class discussion with her last Monday have recently become relevant to my own path through academia. Over the course of last week and the holiday weekend, the English Dept. asked for my input on their proposals to remodel the first-year comprehensive exams.

You all may know about these in some way, but let me first briefly describe how the exam works, especially at CUNY / the GC. This is the exam that all PhD candidates in English must take before moving on to the next stage of their program. Not every department has one, but almost every PhD program in English seems to. I can't speak to too many programs, but I do know that at Harvard, for example, they have a "Comprehensive Exam" dubbed the "100-book" exam: you must read and know a 100-book canon (gag me) like the back of your hand, and then go into a timed session with 3-4 faculty and spit out all your knowledge.

When I looked at the GC program, I was glad to see that it used a different model, one that didn't seem to favor any particular canon. That said, it is still a full-day, 8-hour, timed "exam," in which you speed-respond to given essay prompts on an "empty" (brainless) computer.

After reports from students about the uselessness of this exam for measuring their skills as thinkers/writers/teachers, and/or preparing them for "advanced study" (not to mention the fact that it penalizes students with learning disabilities and ESL students), the Department has recently decided to try to change the model.

(Don’t worry, I’m going to get back to Kathleen’s points soon).

The model now under review is a "Portfolio" in place of a test. It would consist of one "conference paper," one "review essay," and one teaching syllabus.

As someone who has tested as “learning disabled,” I was certainly happy to hear that we were moving away from the timed exam.

And yet, looking back at Kathleen’s arguments made me re-think how “great” the Portfolio model really would be. As a poet, I’m interested in creative + critical teaching and practice… in building new “forms.” I’ve never written a review essay, and I’ve never attended an academic conference. I always worried that my lack of desire to do so would prevent me from getting my degree. But maybe I’m right: as Kathleen prescribes, we should be focusing more on the “process” of research, rather than the finished “product” (the review / conference papers). Maybe those are obsolete forms – forms that work towards the obsolete academic dissertation – which in turn work toward the obsolete academic book. Or am I just screaming in my head, “Don’t make me write a conference paper! I’m just a poet! Get me out of academia now!”

I have two answers to these questions. The first is: great, I finally have some smart argumentative backing (from Kathleen’s book, and our DH discussions all semester) to encourage my program to move away from the purely academic model of scholarship that is merely required, rather than wanted or needed. The second is: rather than wasting my time worrying that “pure academia” would come to get me, I should believe that I can actually interrogate these forms to create the type of work I want to do and see.

If we are given the Portfolio model, I have options, not limits. I can write, let's say, an open-access review essay. I can work collaboratively with other thinkers, perhaps even non-academic thinkers, online. I can write a conference paper both "about" and "demonstrating" joint creative and critical practice, and thereby question the form of the "paper" itself. I can certainly be grateful that I don't have to spend all summer sweating about "failing" a biased timed exam, and that I didn't go to Harvard. Most importantly, I can think about the question of whether, by fixing the broken parts of a broken machine (rather than throwing them all away out of frustration, fear, and anxiety), the machine might eventually start running well again; running somewhere new.

Data Set Project

For my data project, I have changed my mind so many times I can't even begin to tell you where I started in terms of concept… but one idea branched off into another, and finally I'm left with the idea of creating the beginnings of a thick map of terrorist activity in the US, with the intention of visualizing how our approach to and classification of "terrorism" have changed in the wake of major incidents. For a final/next-semester project, I think it would be interesting to focus on creating a map that includes the events and draws in the media conversations surrounding each event: for instance, mapping the Planned Parenthood shooting that occurred in Colorado last night, including the different ways people reacted to it on Twitter, Facebook, and in the news (if you look at #PPshooting or #PlannedParenthood on Twitter, you'll see some VERY revealing and diverse reactions to the event). I'm interested in the way that the idea of terrorism has infiltrated American culture and media, especially in relation to Islamophobia, but also more generally for the scope of this project.

I really wanted to use the VisualEyes tool from the University of Virginia and the NEH; however, after much exploration I was not really able to learn how to use it. I like the final presentation of the data in this format; the sample projects on the VisualEyes site seemed like exactly the kind of mapping I was looking for, and it is something I would like to really learn and explore in the future.

The point of my visualization, in the larger, more complex project scheme, is to map the way that terrorism and our reactions to it have changed.

Data Set Project

For my data project I chose to work with a Titanic passenger data set, which had information about age, cabin class, gender, survival, and other details. I wanted to explore the connection between gender and class as it related to passenger survival rates.

Ultimately I ended up with a few mediocre graphs and a lot of hours spent trying to learn new tools.

I created the final graphs in an online interface called Quadrigram, which has functionality similar to Excel's graphing options. The Quadrigram interface is reminiscent of the Squarespace website-building tools and had a relatively easy learning curve. It allows you to publish your work to a website for embedding, or you can download the source code. While I didn't need these functions, it would certainly be a good way to display charts on a project site. I also explored Excel's chart functions but chose Quadrigram for the graphs to present. It took a little trial and error to figure out how best to format the data to achieve appropriate results, and all attempts at scatter plots were a profound disaster.

I downloaded Gephi, an interactive visualization platform designed to illustrate connections. While it was interesting to explore, it wasn't the right fit for the questions I wanted to consider with this data set. Two programs that look promising, but that I was not able to explore yet, are Analyse-it, which works with Excel to create data visualizations (it only runs on PCs), and Weave, though the latter does appear to have a steep learning curve.
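The core question here (survival rate broken down by gender and class) is a small aggregation that any of these tools ultimately performs. A minimal plain-Python sketch, using a handful of made-up rows in the shape of the Titanic data set rather than the real passenger list:

```python
from collections import defaultdict

# Made-up sample rows in the shape of the Titanic data set (not real passengers).
passengers = [
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 1, "survived": 1},
    {"sex": "female", "pclass": 3, "survived": 0},
    {"sex": "male",   "pclass": 1, "survived": 0},
    {"sex": "male",   "pclass": 3, "survived": 1},
    {"sex": "male",   "pclass": 3, "survived": 0},
]

# (sex, pclass) -> [survivors, total passengers]
totals = defaultdict(lambda: [0, 0])
for p in passengers:
    key = (p["sex"], p["pclass"])
    totals[key][0] += p["survived"]
    totals[key][1] += 1

survival_rate = {key: survivors / count
                 for key, (survivors, count) in totals.items()}

for (sex, pclass), rate in sorted(survival_rate.items()):
    print(f"class {pclass}, {sex}: {rate:.0%}")
```

A table like `survival_rate` is exactly what a grouped bar chart in Quadrigram or Excel would plot.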

For my presentation I’ll share the graphs and a quick overview of the Quadrigram interface.

 

Data set project

I finally have a plan for my data set project, and if all goes well with testing, I will have created something informative. My data set will focus on the homeless street population in NYC, using a CartoDB map. I want to tell the story of homeless people and how many live on the street. I believe using maps will help illustrate the widespread dilemma of homelessness, as well as show which parts of the city homeless people live in. Later, for my final project, I will use my Commons webpage to show how many women, men, and children are living in shelters, in conjunction with the homeless population living on the street.

The reason I chose homelessness is that many people do not realize how easy it is to become homeless; they take it for granted and judge people for being homeless without really knowing how or why. I hope this will at least make people think about how homelessness has become society's problem and not just the individual's.

Juana