For my data project I chose to work with a Titanic passenger data set, which had information about age, cabin class, gender, survival and other details. I wanted to explore how gender and class related to the passenger survival rate.
Ultimately I ended up with a few mediocre graphs and a lot of hours spent trying to learn new tools.
I created the final graphs in an online interface called Quadrigram, which has similar functionality to Excel’s graphing options. The Quadrigram interface is reminiscent of the Squarespace website building tools and was relatively easy to learn. It allows you to publish your work on a website, embed it, or download the source code. While I didn’t need these functions, it would certainly be a good way to display charts on a project site. I also explored Excel’s chart functions but chose Quadrigram for the graphs to present. It took a little trial and error to figure out how best to format the data to achieve appropriate results, and all attempts at scatter plots were a profound disaster.
I downloaded Gephi, which is an interactive visualization platform designed to illustrate connections. While it was interesting to explore, it wasn’t the right fit for the questions I wanted to consider with this data set. Two programs that look promising, but that I have not been able to explore yet, are Analyse-it, which works with Excel to create data visualizations (only runs on PC), and Weave, though the latter does appear to have a steep learning curve.
For my presentation I’ll share the graphs and a quick overview of the Quadrigram interface.
I’ve thought of and discarded a number of ideas about what to work on for my data set project. I started looking through lists of publicly available data sets hoping something would catch my interest or inspiration would strike.
At https://github.com/caesar0301/awesome-public-datasets I came across a csv file with Titanic passenger survival data. The file listed passenger names, sex, cabin class, and ticket price as well as other information. I thought it could be an interesting set to work on.
The problem with finding information from a collection called “Awesome Public Datasets” is that there was absolutely no information about who created this file or where it came from. After some more digging I found a similar data set posted by the Department of Biostatistics at Vanderbilt University, complete with information about who created it and where they found their information.
The history of the Titanic and its passengers is well covered, but I like the idea of testing some of my assumptions about this data and considering interesting or unexpected questions to explore.
For example, I assume there will be a direct correlation between how much a passenger paid for their ticket and whether or not they survived. I also expect to see a similar relationship with gender, with the assumption that women were more likely to survive than men. If both of those assumptions appear to be true after examining the data, then I’m curious whether wealth or gender will appear to be a more substantial factor.
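One rough way to compare the two factors would be to compute the survival rate for each group and see which factor produces the wider gap. The snippet below is only a minimal sketch in Python: the column names (`sex`, `pclass`, `survived`) are my assumption about how the file is laid out, cabin class stands in for wealth, and the sample rows are invented for illustration, not real passenger records.

```python
from collections import defaultdict


def survival_rate_by(rows, key):
    """Return {group value: fraction of passengers in that group who survived}."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        survivors[group] += int(row["survived"])
    return {group: survivors[group] / totals[group] for group in totals}


def rate_gap(rates):
    """Difference between the highest and lowest group survival rates."""
    return max(rates.values()) - min(rates.values())


# Toy rows invented for illustration; NOT real Titanic records.
# Column names are assumed, and values are strings as csv.DictReader would give.
sample_rows = [
    {"sex": "female", "pclass": "1", "survived": "1"},
    {"sex": "female", "pclass": "3", "survived": "0"},
    {"sex": "male", "pclass": "1", "survived": "1"},
    {"sex": "male", "pclass": "3", "survived": "0"},
    {"sex": "male", "pclass": "3", "survived": "0"},
    {"sex": "female", "pclass": "2", "survived": "1"},
]

by_sex = survival_rate_by(sample_rows, "sex")
by_class = survival_rate_by(sample_rows, "pclass")

print("Survival rate by sex:", by_sex)
print("Survival rate by class:", by_class)
print("Gap by sex:", rate_gap(by_sex))
print("Gap by class:", rate_gap(by_class))
```

With the real file, the rows could come from `csv.DictReader` instead of the hand-written list. A wider gap for one factor would only be a first hint, since gender and class overlap and would need to be looked at together before drawing any conclusion.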
A few years ago I did a lot of reading about algorithms and machine learning as they related to the arts and popular culture. What immediately sprang to mind when I read the opening of “Quantitative Formalism: An Experiment” was an article from WIRED in 2011 about a team from The University of Bristol that worked on developing an equation that could predict a hit song.
At the top of the article you’ll see a video that shows the “evolution of musical features” as they relate to hit songs. Since we will soon be considering ways to display results from our data sets I thought this might be interesting to take a look at.
The short article considers both the Bristol team’s work and other similar projects related to predicting the popularity of new pop music. While this is not scholarly work, I thought it was interesting to share and consider how this type of enquiry is being used outside of the academy.