
DHSI 2014

At the beginning of this month I was fortunate enough to attend the Digital Humanities Summer Institute (DHSI) 2014 in Victoria, British Columbia. For those of you who are as confused as I was when I first heard the name, it is a week-long conference filled with an intense combination of coursework, seminars and lectures on the influence of computing technology on teaching, research and the preservation of disciplines, among other things. This was my second year in attendance, and I took a week-long course titled “The History of the Pre-Digital Book”.

Because the title had the word “History” in it, the course appealed to me right away. I was a little apprehensive about registering for it, though, as I did not know how much it would relate to my interest in the Digital Humanities. I must admit that I was pleasantly surprised to learn that, despite not revolving around my immediate idea of what “technology” is, the course related quite a bit to the seminar I focused on transcribing during my fellowship this past year.

The seminar I am referring to is, of course, the one that discussed museums and libraries. I say this mainly because one of the main topics of the course was the development of book publishing, which was really the first step in making books more accessible to everyone. As much as my interest lies in the digitization of today’s books to make historical research (or research on any other subject, for that matter) more accessible to a physically disabled individual like myself, it is interesting to learn more about how we arrived at this stage.

Overall I believe I chose the perfect course to take at DHSI 2014 and hope that I have a chance to attend again in the coming years.

Carleton University Data Day Poster Competition

As I have mentioned in earlier posts, I had recently been preparing for Carleton University’s first ever Data Day. The day focused on drawing attention to Carleton students who have done recent work in the data sciences. It also included a panel of experts, who discussed the data sciences and the role they will play in the coming years.

One comment from the panel that particularly struck me came from Joe Armstrong, a regional business leader at CAE IES Canada, who said that the 21st century will be when the world’s statistical problems are resolved, thanks to technology that allows us to study data in a more precise fashion. He went on to say that Google was taking the lead in this process.

With this in mind, the poster competition was an intellectually stimulating event. There was a vast array of projects from a number of fields of study, from the health sciences to geology. Mine, however, stood alone in representing the humanities, perhaps best described by the poster judge when he said, “this one is like comparing apples to oranges”. Nonetheless, it still received a great deal of attention from the attendees (see photo below of me explaining the project to fellow attendees). This made me feel confident that, despite being the only humanities project, by the only undergraduate student at the event, it was of equal value to the rest.

[Photo: explaining my project to fellow attendees at Data Day]

After attending this day and seeing all the projects, I must agree with the panelists and speakers that the most important thing for the data sciences moving forward is for all fields to collaborate. Whether you are an engineer or a historian, all the data you study needs to be valued equally.

 

Data Day Poster

Over the past few days I have been working on a poster to take part in Carleton University’s “Data Day” on Thursday. The Carleton University website describes “Data Day” as being a celebration of strategic development in the Data Sciences. The event includes a panel discussion and presentations by Carleton experts from a number of faculties.

At the event there will be a poster fair to showcase student research using Data Science. I will be participating in this part of the day by making a poster that summarizes my project for the George Garth Graham Research Fellowship.

The poster includes information on the research question that I am answering and where I gathered my data. In addition, I describe how the data mining program that I am using works, what discoveries I have made from the data, and what the next steps in my research are.

You can check out the poster yourself below if you are unable to make it to the event on Thursday.

[Image: my Data Day poster]

Data Day Poster Competition

On the 24th of April, Carleton University is going to be hosting “Data Day”. This event will be centred around celebrating strategic development in the Data Sciences. At this event there will be a poster competition for student researchers in all categories (including Big Data, Data Analytics, Social Sentiment Analytics and Business Analytics). Seeing as my project falls under the category of big data, Dr Graham alerted me to the competition and told me what I needed to do to compete in it.

In order to enter I had to first write up an abstract about what my project entails and how I am going about completing it. Below you will find my abstract and can read all about my project to date and what I am hoping to accomplish with it.

 

Data Mining THATCamp – Hollis Peirce, Graham Undergraduate Digital History Research Fellow

(Abstract)

Big data tools are not just for ‘big data’. In the humanities, they can provide a macroscopic view of patterns in materials that are otherwise difficult to analyze computationally. In this poster, I present the initial results of an analysis of the conversations at ‘THATCamp Accessibility 2012’, a conference held at Carleton in October 2012, using ‘overviewproject’, a system developed by data journalists for finding topics in data using term frequency-inverse document frequency methods.

THATCamp Accessibility 2012 was an ‘unconference’ (a series of free-form discussions) that explored issues of digital and physical access to humanistic research and materials. Sessions explored how digital tools help accessibility, designing accessible courses, digital museums and libraries, augmented reality, game-based learning, and other ideas. These seminars were then recorded for future analysis. This project takes one of these conversations, on accessible museums and libraries, and analyzes it to identify underlying hidden themes and patterns of discourse.

Oral history normally transcribes the complete verbatim speech of a session; with these digital mining tools, we instead created an annotated bag-of-words by timestamp, analyzing this list via the overviewproject.org interface. While still in its early days, this project promises to accelerate the analytic step in oral history practice, removing one element of the subjectivity of oral history in favour of a dialectic between so-called ‘distant’ and ‘close’ readings of those transcripts.

Until Data Day, then, I will continue transcribing the conversation to improve the results.
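For readers wondering what “term frequency-inverse document frequency” means in practice, here is a minimal sketch of the textbook form of the calculation. It is not Overview’s actual code, and the exact weighting Overview uses may well differ. The function name `tf_idf` and the three sample segments are made up for illustration; each segment stands in for one timestamped chunk of a transcript.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document, using tf * log(N / df).

    This is the plain textbook formula, assumed here for illustration only.
    """
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for words in tokenized:
        doc_freq.update(set(words))

    total = len(tokenized)
    scores = []
    for words in tokenized:
        counts = Counter(words)
        scores.append({
            word: (count / len(words)) * math.log(total / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Invented sample segments, not taken from the real transcript.
segments = [
    "digital libraries digital museums",
    "digital access for museums",
    "digital games and learning",
]
scores = tf_idf(segments)
```

Under this formula a word that turns up in every segment (here, “digital”) scores zero, while a word found in only one segment scores highest. That is the sense in which the method surfaces what is distinctive about each part of a conversation rather than what is merely frequent.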

 

Unravelling the Conversation

Thanks to the advice of Dr Graham, I returned to my document on Overview to try to unravel some of its hidden topics. To do so, I thought I needed to re-upload the entire document and tell Overview to ignore certain words. However, thanks to prompt advice from Jonathan at Overview, I found out that Overview is now able to create a new tree that ignores different words than originally asked for.

Therefore, I told it to ignore the major keywords that had previously shown up in all the branches, such as ‘digital’, ‘data’, ‘humanities’, and ‘technology’. Amazingly, this created an explosion of new keywords such as ‘accessible’, ‘databases’, and ‘information’, amongst many others.
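The effect of ignoring dominant words can be imitated with a simple word count. This is only a rough sketch of the idea, since Overview builds a tree of documents rather than a flat count; the function name `top_words` and the sample sentence are invented for illustration.

```python
import re
from collections import Counter


def top_words(text, ignore=(), n=3):
    """Count words in `text`, skipping any listed in `ignore`."""
    words = re.findall(r"[a-z]+", text.lower())
    ignored = {word.lower() for word in ignore}
    counts = Counter(word for word in words if word not in ignored)
    return counts.most_common(n)


# Invented sample text, not a quotation from the transcript.
sample = ("Digital data in the digital humanities: accessible databases, "
          "accessible information, digital technology.")

before = top_words(sample, n=1)
after = top_words(
    sample,
    ignore=["digital", "data", "humanities", "technology", "in", "the"],
    n=1,
)
```

In this toy example `before` is led by ‘digital’, and once the dominant words are ignored ‘accessible’ rises to the top, which mirrors what happened with the new tree.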

To follow these words I created a new tag for each of them. These tags look like this:

[Screenshot: the new tags in Overview]

My plan this week, therefore, is to talk to Dr Graham about how I should sift through these words further.

Stay tuned to see what I uncover next!

Hiccup Solved/More Progress Made!

This week’s task was to discover whether the same words remained as important as the conversation progressed. I have now transcribed fifteen minutes into the conversation, so there was more progress made in that sense, but I had a hiccup as I was attempting to upload my document onto Overview again. Thanks to their help, as well as Dr Graham’s, I tried in Firefox instead of Safari and it worked just fine.

As I predicted in my last post regarding tags, the words ‘humanities’, ‘libraries’, and ‘technology’ continued to be important. New words appeared in the important words section as well, though, such as ‘media’ and ‘publicly’.

This coming week I hope to look further into these tags and see what I find.

Until next time…

Progression… Slowly But Surely

I have now looked through more of the conversation, and it is interesting to be reminded of the topics that were discussed: Ancestry.ca, for example, and whether or not it is OK for someone to profit from information that we are allowed to access publicly. Then again, they only benefit by making this information more accessible.

This is becoming a major concern of the conversation. Who tracks the location of this information, or any information like it for that matter, once it is released onto the internet? Is it a bad thing to profit from it? Or is this just a way of making the information more accessible to those who don’t know how to get at it by other means?

Listening to it again is a great deal of fun, and I will move it on to Overview soon, but I was having difficulty getting my file uploaded today for some reason. I will have to take a closer look; it will probably turn out to be something minute like last time, but we shall see.

Until next time…

TAG! You’re it!

After a stupidly long and unsuccessful effort to solve my previous problem, I decided to seek advice from Dr Graham. Of course, when I came to his office he solved my problem within seconds. The reason, it turns out, that Overview was not reading my file was that I had an additional, unneeded blank space in the column header after the word ‘text’. BLAST!!!
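A stray space like that is invisible in a spreadsheet, but a few lines of Python can catch it. The sketch below is an illustration only: the function name `check_header` and the two sample files are made up, and it assumes the header has to match one of the accepted names exactly, which is how the problem behaved for me but is not something I have confirmed against Overview’s own code.

```python
import csv
import io

# Column names the Overview help pages list as accepted for the text column.
ACCEPTED = {"text", "snippet", "content"}


def check_header(csv_text):
    """Return (names with stray whitespace, whether a text column is present)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    padded = [name for name in header if name != name.strip()]
    # Assumption: the name must match exactly, so "text " does not count.
    has_text_column = any(name in ACCEPTED for name in header)
    return padded, has_text_column


# Invented sample files: the first has a trailing space after "text".
broken = "timestamp,text \n00:00,hello\n"
fixed = "timestamp,text\n00:00,hello\n"

broken_result = check_header(broken)
fixed_result = check_header(fixed)
```

For the broken sample the function reports the padded name and no usable text column; for the fixed one it reports nothing padded and a text column present.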

The good news is that I am now on to bigger and more important things, putting Overview to use and analyzing the conversation itself.  I have only looked at the first ten minutes of audio but already I have begun to see a pattern.  Overview has helped me see the most commonly used word in the file, that word being ‘digital’.  It then follows this word, and other words that it sees less frequently, for example ‘humanities’, ‘library’ and ‘technology’, through the conversation by separating them into separate files.  I took a screen shot (see below) so that you can understand more clearly.

[Screenshots: Overview following the tag ‘Humanities’ (first) and the tag ‘Digital’ (second)]

As you can see, Overview takes the ‘tag’ that you give a word and shows you where that tag appears throughout. In the first shot it follows the tag ‘Humanities’, while in the second it follows the primary tag of the conversation, ‘Digital’. My suspicion is that the words ‘humanities’, ‘library’ and ‘technology’, which the program has seen less often, may become primary words in the conversation as it goes on. Other secondary words that it found, though, such as ‘example’ and ‘two’, will most likely remain in that category.

Stay tuned to see if my suspicions are indeed correct!

Making A CSV File

In order to create a file that the Overview Project can read and analyze, a few different steps need to be taken. First and foremost, the file must use the UTF-8, ISO-8859-1, or Windows-1252 character set. Secondly, the formatting of the file must be correct. Therefore, my visual layout of the Museum and Libraries conversation needs to be saved as a CSV file rather than an XLS file. That, of course, was easy enough to accomplish. The next, and final, step is where the difficulty lies, or at least it seems to for me.

Now that the file is in the correct format, I have to ensure that the first line of each column contains the name of the content within that column. Once this is done, I also have to ensure that the column holding the text itself is labelled either “text”, “snippet”, or “content”. This seems to be where I am running into trouble. However, after reading the help and example documents over a few times, I believe I may have solved my problem.
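The requirements above can also be met by writing the file from a script instead of exporting it from Excel. What follows is a minimal sketch, not the way I actually produced my file: the function name `write_overview_csv`, the “timestamp” column, and the sample rows are all assumptions made for illustration.

```python
import csv
import os
import tempfile


def write_overview_csv(rows, path):
    """Write (timestamp, text) pairs to a UTF-8 CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # The first line names each column; "text" marks the column to analyze.
        writer.writerow(["timestamp", "text"])
        for timestamp, text in rows:
            writer.writerow([timestamp, text])


# Invented sample rows, not taken from the real transcript.
sample_rows = [
    ("00:00", "digital humanities and libraries"),
    ("00:30", "technology in museums"),
]

csv_path = os.path.join(tempfile.mkdtemp(), "conversation.csv")
write_overview_csv(sample_rows, csv_path)
```

Writing the header from code has the side benefit that the column name is typed exactly once, with no room for stray characters around it.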

Stay tuned!

Change of Plans

After struggling for a few days with Gephi I turned to Dr Graham for some guidance.  He took a look at what I had done with tracking the museum and library conversation visually with Excel and told me that perhaps Gephi was not the best option.  Instead, he told me I should try a different program called Overview.

Therefore, I am now looking into what Overview is all about by reading their FAQ and blog. As far as I can tell so far, it is looking pretty promising, as the first paragraph of their FAQ section that answers “What is Overview?” states, “Overview is intended to help journalists, researchers, and other curious people make sense of massive, disorganized collections of electronic documents. It’s a visualization and analysis tool designed for sets of documents, typically thousands of pages of material”.

PERFECT! Well, I will try not to jump ahead too far in my excitement…