Monthly Archives: April 2014

Carleton University Data Day Poster Competition

As I have mentioned in earlier posts, I had recently been preparing for Carleton University’s first ever Data Day. The day was focused on drawing attention to the work of Carleton students who have done recent work in the data sciences. It also included a panel of experts, who discussed the data sciences and the role they will play in the coming years.

One comment from the panel that particularly struck me came from Joe Armstrong, a regional business leader at CAE IES Canada, who said that the 21st century will be when the world’s statistical problems are resolved, thanks to technology that allows us to study data more precisely. He went on to say that Google was taking the lead in this process.

Taking this into consideration, the poster competition was an intellectually stimulating event. There was a vast array of projects from a number of fields of study, from the health sciences to geology. Mine, however, stood alone in representing the humanities, and was perhaps best described by the poster judge when he said, “this one is like comparing apples to oranges”. Nonetheless, it still received a great deal of attention from the attendees (see the photo below of me explaining the project to fellow attendees). This made me feel confident that, despite being the only humanities project and the work of the only undergraduate student at the event, it offered as much value as the rest.

[Photo: explaining the project to fellow attendees]

After attending this day and seeing all the projects, I must agree with the panelists and speakers that the most important thing for the data sciences moving forward is for all fields to collaborate. Whether you are an engineer or a historian, all data studied needs to be valued equally.

Data Day Poster

Over the past few days I have been working on a poster to take part in Carleton University’s “Data Day” on Thursday. The Carleton University website describes “Data Day” as being a celebration of strategic development in the Data Sciences. The event includes a panel discussion and presentations by Carleton experts from a number of faculties.

At the event there will be a poster fair to showcase student research using Data Science. I will be participating in this part of the day by making a poster that summarizes my project for the George Garth Graham Research Fellowship.

The poster includes information on the research question that I am answering and the sources from which I gathered the data. In addition, I describe how the data mining program that I am using works, what discoveries I have made from the data, and what the next steps in my research are.

You can check out the poster yourself below if you are unable to make it to the event on Thursday.

[Image: Data Day poster]

Data Day Poster Competition

On the 24th of April, Carleton University is going to be hosting “Data Day”. This event will be centred around celebrating strategic development in Data Sciences. At this event there will be a poster competition for student researchers in all categories (including big data, Data Analytics, Social Sentiment Analytics, Business Analytics). Seeing as my project falls under the category of big data, Dr Graham alerted me to the competition and to what I needed to do to compete in it.

In order to enter, I first had to write up an abstract about what my project entails and how I am going about completing it. Below you will find my abstract, which describes my project to date and what I am hoping to accomplish with it.

 

Data Mining THATCamp – Hollis Peirce, Graham Undergraduate Digital History Research Fellow

(Abstract)

Big data tools are not just for ‘big data’. In the humanities, they can provide a macroscopic view of patterns in materials that are otherwise difficult to analyze computationally. In this poster, I present the initial results of an analysis of the conversations at ‘THATCamp Accessibility 2012’, a conference held at Carleton in October 2012, using ‘overviewproject’, a system developed by data journalists for finding topics in data using term frequency-inverse document frequency methods.
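To make the idea of term frequency-inverse document frequency concrete, here is a minimal Python sketch of the textbook formula. It is only an illustration: the function name `tf_idf` and the sample snippets are hypothetical, and Overview's own weighting may well differ from this simple variant.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document (textbook TF-IDF).

    tf  = occurrences of the term in the document / words in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Count how many documents each term appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical snippets, not taken from the real transcript.
snippets = [
    "digital museums need accessible design",
    "digital libraries share digital collections",
    "accessible courses use digital tools",
]
scores = tf_idf(snippets)
```

A word that appears in every snippet (here "digital") gets a score of zero, while a word unique to one snippet (such as "museums") scores highest there, which is why this weighting helps surface the distinctive topics of each part of a conversation.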

 

THATCamp Accessibility 2012 was an ‘unconference’ (a series of free-form discussions) that explored issues of digital and physical access to humanistic research and materials. Sessions explored how digital tools help accessibility, designing accessible courses, digital museums and libraries, augmented reality, game-based learning, and other ideas. These sessions were recorded for future analysis. This project takes one of these conversations, on accessible museums and libraries, and analyzes it to identify underlying hidden themes and patterns of discourse.

Oral history normally transcribes the complete verbatim speech of a session; with these digital mining tools, we instead created an annotated bag-of-words by timestamp, analyzing this list via the overviewproject.org interface. While still in its early days, this project promises to accelerate the analytic step in oral history practice, removing one element of the subjectivity of oral history in favour of a dialectic between so-called ‘distant’ and ‘close’ readings of those transcripts.
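As a rough illustration of what a bag-of-words by timestamp could look like, here is a short Python sketch. The note format (a timestamp followed by the words heard at that point), the function name, and the sample lines are all assumptions made for the example; they are not the project's actual files.

```python
from collections import Counter


def bag_of_words_by_timestamp(lines):
    """Group word counts under the timestamp that starts each line.

    Assumes each line looks like "MM:SS word word word" (a made-up format).
    """
    bags = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Everything before the first space is treated as the timestamp.
        timestamp, _, text = line.partition(" ")
        words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
        bags.setdefault(timestamp, Counter()).update(w for w in words if w)
    return bags


# Hypothetical notes, not the real transcript.
notes = [
    "00:30 museums digital access museums",
    "01:00 libraries access, catalogue",
]
bags = bag_of_words_by_timestamp(notes)
```

Word order within each timestamp is thrown away and only the counts are kept, which is what makes this a "bag" of words rather than a verbatim transcript.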

Until the event, I will continue transcribing the conversation to improve the results.

Unravelling the Conversation

Thanks to the advice of Dr Graham, I returned to my document on Overview to try to unravel some of its hidden topics. I thought that in order to do so I needed to re-upload the entire document and tell Overview to ignore certain words. However, thanks to prompt advice from Jonathan at Overview, I found out that Overview is now able to create a new tree that ignores different words than those originally specified.

Therefore, I told it to ignore the major keywords that had previously shown up in all the branches, such as digital, data, humanities, and technology. Amazingly, this created an explosion of new keywords, such as accessible, databases, and information, amongst many others.
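The effect of ignoring dominant words can be sketched in a few lines of Python. This is not how Overview works internally (there the ignored words are simply entered in its interface); the function `top_terms` and the word counts below are hypothetical and only show why hiding the most common words lets others surface.

```python
from collections import Counter


def top_terms(tokens, ignore=(), n=3):
    """Return the n most frequent tokens, skipping any ignored words."""
    ignored = {word.lower() for word in ignore}
    counts = Counter(t.lower() for t in tokens if t.lower() not in ignored)
    return [term for term, _ in counts.most_common(n)]


# Hypothetical word counts, not the real Overview output.
tokens = (
    ["digital"] * 6 + ["data"] * 5 + ["technology"] * 4
    + ["accessible"] * 3 + ["databases"] * 2 + ["information"]
)

before = top_terms(tokens)
after = top_terms(tokens, ignore=["digital", "data", "humanities", "technology"])
```

With nothing ignored, the top three terms are the same dominant words as before; once they are ignored, "accessible", "databases", and "information" move to the top.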

To follow these words, I created a new tag for each of them. These tags look like this:

[Screenshot of the new tags in Overview, 2014-04-01]

My plan this week, therefore, is to talk to Dr Graham about how I should sift through these words further.

Stay tuned to see what I uncover next!