Apologies for the hiatus. Last week was my first full week back, since I was in Germany for the ICWSM 2016 conference and presenting at the HCIL Symposium the previous two weeks. It was my first time at ICWSM, in Germany, and presenting at the HCIL Symposium, and all of it was amazing!
I don’t have the usual day-by-day breakdown of my research this time; instead, here is a general overview of my work from last week.
My ICWSM paper was on Twitter’s response to terrorist attacks in Western countries, focusing specifically on the Boston Marathon bombing, the Sydney hostage crisis, and the Charlie Hebdo attacks (my poster is available here: ICWSM16_Poster_Portrait). Since writing the paper, though, two additional tragic events have occurred: the November 2015 Paris attacks and the Brussels airport attacks. It made sense to apply the same analyses from my ICWSM paper to these new cases and see whether the same behaviors were observed.
I also wanted to experiment with some of the new technology that supports interactive analyses on “big data,” so I began working with Anaconda, Apache Toree, and Bokeh-Scala to see if I could duplicate my original analyses directly on the big NSF-funded cluster we have on campus at the University of Maryland.
To these ends, I built a pair of Jupyter notebooks (using the Apache Toree Spark kernel) that run on our cluster, read data directly from HDFS, analyze it with Spark, and produce graphics with Bokeh.
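To give a flavor of the kind of aggregation those notebooks perform, here is a minimal pure-Python sketch of counting tweets per minute (my assumed stand-in for the actual analyses, since tweet volume over time is the classic crisis-response metric); it mirrors a Spark group-by without needing a cluster:

```python
from collections import Counter
from datetime import datetime

def tweet_volume_by_minute(tweets):
    """Count tweets per minute, given dicts with a Twitter-style
    'created_at' field, e.g. 'Wed Jan 07 12:34:56 +0000 2015'."""
    counts = Counter()
    for tweet in tweets:
        ts = datetime.strptime(tweet["created_at"],
                               "%a %b %d %H:%M:%S %z %Y")
        # Truncate to the minute so tweets in the same minute bucket together
        counts[ts.replace(second=0)] += 1
    return counts
```

In the notebooks themselves, the equivalent step runs as a Spark transformation over the HDFS-resident data, with Bokeh rendering the resulting time series.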
I’ve made these notebooks and the original ICWSM analysis available on GitHub. Feel free to modify and play with the data and analysis!
I had the great opportunity to run a tutorial on social media analytics during crises at the 2016 HCIL Symposium at UMD.
As with my previous talk at MITH on Twitter + Ferguson, I wanted to give a talk that was informative about tools while remaining hands-on enough that attendees could see some easy analytics they could modify to answer their own questions.
I recently gave a talk at UMD’s Maryland Institute for Technology in the Humanities (MITH) about advanced analytical techniques for Twitter analysis during last year’s protests in Ferguson, MO. Since the majority of the audience was non-technical, the talk focused more on what sorts of analytics one might run on Twitter data, but I still wanted it to be hands-on enough that attendees could see how these analytics were done and how they could modify them to answer their own questions.
To support this hands-on, exploratory analysis, I created an IPython notebook that covers the following topics:
Geolocation in Twitter
Media in Twitter
The notebook is available on Github, and you can view it via the Jupyter NBViewer here:
After some experience with OpenCV and a trip to the Smithsonian’s Udvar-Hazy Center outside Dulles to see the Space Shuttle Discovery, I thought it might be fun to stitch together a large high-res image of the Discovery. OpenCV already provides the building blocks for an image-stitching pipeline, so I built a preliminary application for creating such a composite automatically. It takes a key frame to use as the main image, a directory containing the remaining images, and an output directory for storing the stitched results.
The application uses SIFT to find key points in each image, selects the most similar image from the directory, and stitches the two together. Each newly stitched image is then fed back into the program to be combined with the next most similar image.
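The “most similar image” step can be sketched as brute-force nearest-neighbor matching of SIFT descriptors with Lowe’s ratio test, then picking the candidate with the most good matches. The NumPy version below stands in for OpenCV’s matcher and is an illustration of the idea, not the application’s actual code:

```python
import numpy as np

def count_good_matches(desc_a, desc_b, ratio=0.75):
    """Lowe's ratio test: a descriptor in A 'matches' B only if its
    nearest neighbor in B is much closer than its second-nearest."""
    good = 0
    for d in desc_a:
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.partition(dists, 1)[:2]
        if nearest < ratio * second:
            good += 1
    return good

def most_similar(key_desc, candidates):
    """Index of the candidate descriptor set best matching the key frame."""
    scores = [count_good_matches(key_desc, c) for c in candidates]
    return int(np.argmax(scores))
```

In the real pipeline, OpenCV computes the SIFT descriptors and a homography between the winning pair before blending them into one composite.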
To add some friendly competition, a coworker and I started a “tournament” to see who could build the better predictor, so I built a prediction engine for determining the winners of NCAA football games. The engine leverages freely available statistics from the NCAA website and compares the deltas between the two opposing teams’ stats from their previous week’s games. Currently, I am able to predict winners at around 70% accuracy.
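The delta-comparison idea can be sketched as follows; the stat names and weights are invented for illustration and are not the engine’s real features or model:

```python
def stat_deltas(team_a, team_b, stats):
    """Delta features: team A's previous-week stats minus team B's."""
    return [team_a[s] - team_b[s] for s in stats]

def predict_winner(team_a, team_b, weights):
    """A positive weighted delta score picks team A; otherwise team B.
    `weights` maps stat names to importance (illustrative values only)."""
    score = sum(w * d for w, d in
                zip(weights.values(),
                    stat_deltas(team_a, team_b, weights.keys())))
    return "A" if score > 0 else "B"
```

The appeal of the delta formulation is symmetry: swapping the two teams flips the sign of every feature, so the model cannot develop a spurious home/away-slot bias.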