Weekly Reports: May 30 – June 5, 2016, Expanding Twitter Terrorism Research

Apologies for the hiatus. Last week was my first full week back: I spent the previous two weeks attending the ICWSM 2016 conference in Germany and presenting at the HCIL Symposium. It was my first time at ICWSM, my first time in Germany, and my first time presenting at the HCIL Symposium, and all of it was amazing!

I don’t have the usual day-by-day breakdown of my research, so instead I will post a general overview of my work from last week.

My ICWSM paper was on Twitter’s response to terrorist attacks in Western countries, and I focused specifically on the Boston Marathon bombing, Sydney Hostage Crisis, and Charlie Hebdo attacks (my poster is available here: ICWSM16_Poster_Portrait). Since I wrote the paper, though, two more tragic events have occurred: the Paris November attacks and the Brussels airport attacks. It made sense to apply the same analyses from my ICWSM paper to these new cases and see whether the same behaviors appeared.

I also wanted to experiment with some of the new technology that supports interactive analyses on “big data,” so I began working with Anaconda, Apache Toree, and Bokeh-Scala to see if I could duplicate my original analyses directly on the big NSF-funded cluster we have on campus at the University of Maryland.

To these ends, I built a pair of Jupyter notebooks (using the Apache Toree Spark kernel) that run on our cluster, read data directly from HDFS, analyze it with Spark, and produce graphics using Bokeh.

I’ve made these notebooks and the original ICWSM analysis available on GitHub. Feel free to modify and play with the data and analysis!

ICWSM 2016 Analytics

Paris November Attacks

Brussels Transit Attacks

Social Media Analytics During Crises

I had the great opportunity to run a tutorial on social media analytics during crises at the 2016 HCIL Symposium at UMD.
As with my previous talk at MITH on Twitter + Ferguson, I wanted the talk to be informative about tools but also hands-on enough that attendees could see some easy analytics they could modify to answer their own questions.

The notebooks are available on GitHub and cover data acquisition from Reddit, Facebook, and Twitter. You can view them directly here: https://github.com/cbuntain/TutorialSocialMediaCrisis

This material includes:

Material Overview

Tutorial Introduction

  • Terror Data sets
    • Boston Marathon
      • 15 April 2013, 14:49 EDT -> 18:49 UTC
    • Charlie Hebdo
      • 7 January 2015, 11:30 CET -> 10:30 UTC
    • Paris Nov. attacks
      • 13 November 2015, 21:20 CET -> 20:20 UTC (until 23:58 UTC)
    • Brussels
      • 22 March 2016, 7:58 CET -> 6:58 UTC (and 08:11 UTC)
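The local-to-UTC conversions listed above can be checked with a few lines of Python. This is a minimal sketch, not code from the tutorial notebooks: it assumes fixed offsets of UTC-4 for EDT and UTC+1 for CET, and the `EVENTS` table and `to_utc` helper are names made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets matching the abbreviations used in the list above (assumption:
# EDT is UTC-4 and CET is UTC+1 on all four dates).
EDT = timezone(timedelta(hours=-4), "EDT")
CET = timezone(timedelta(hours=1), "CET")

# Local start times as given in the outline; the dict itself is illustrative.
EVENTS = {
    "Boston Marathon": ("2013-04-15 14:49", EDT),
    "Charlie Hebdo": ("2015-01-07 11:30", CET),
    "Paris Nov. attacks": ("2015-11-13 21:20", CET),
    "Brussels": ("2016-03-22 07:58", CET),
}


def to_utc(local_str, tz):
    """Parse a 'YYYY-MM-DD HH:MM' local time and convert it to UTC."""
    local = datetime.strptime(local_str, "%Y-%m-%d %H:%M").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


for name, (local_str, tz) in EVENTS.items():
    print(name, "->", to_utc(local_str, tz).strftime("%Y-%m-%d %H:%M UTC"))
```

Tweet timestamps are reported in UTC, so converting each event's local start time once makes it easy to slice the data into before and after windows.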

Data Acquisition

  • Topic 1: Introducing the Jupyter Notebook
    • Jupyter notebook gallery
  • Topic 2: Data sources and collection
    • Notebook: T02 – DataSources.ipynb
    • Data sources:
      • Twitter
      • Reddit
      • Facebook
  • Topic 3: Parsing Twitter data
    • Notebook: T03 – Parsing Twitter Data.ipynb
    • JSON format
    • Python json.load
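To give a flavor of Topic 3, here is a minimal sketch of parsing one tweet with Python's `json` module. It is not taken from the notebook: the sample tweet is made up and heavily trimmed, and `parse_tweet` is a hypothetical helper. The field names (`text`, `lang`, `user.screen_name`, `entities.hashtags`) follow Twitter's classic JSON layout.

```python
import json

# A made-up, heavily trimmed tweet in the shape of Twitter's JSON format.
SAMPLE_LINE = (
    '{"id_str": "1", "created_at": "Tue Mar 22 07:05:00 +0000 2016", '
    '"lang": "en", "text": "Thinking of everyone in #Brussels", '
    '"user": {"screen_name": "example_user"}, '
    '"entities": {"hashtags": [{"text": "Brussels"}]}}'
)


def parse_tweet(line):
    """Decode one JSON-encoded tweet and keep a few commonly used fields."""
    tweet = json.loads(line)  # json.load does the same for an open file object
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return {
        "user": tweet["user"]["screen_name"],
        "lang": tweet.get("lang"),
        "text": tweet["text"],
        "hashtags": [tag["text"] for tag in hashtags],
    }


parsed = parse_tweet(SAMPLE_LINE)
print(parsed["user"], parsed["hashtags"])
```

Using `.get` for the optional fields keeps the parser from failing on tweets that lack entities or a language tag.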

Data Analytics

  • Notebook: T04-08 – Twitter Analytics.ipynb
  • Topic 4: Simple frequency analysis
    • Top hashtags
    • Most common keywords
    • Top URLs
    • Top images
    • Top users
    • Top languages
    • Most retweeted tweet
  • Topic 5: Geographic information systems
    • General plotting
    • Country plotting
    • Images from target location
  • Topic 6: Sentiment analysis
    • Subjectivity/Objectivity w/ TextBlob
  • Topic 7: Other content analysis
    • Topics in relevant data
  • Topic 8: Network analysis
    • Building interaction networks
    • Central accounts
    • Visualization
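Two of the analyses in this outline, top hashtags (Topic 4) and central accounts in an interaction network (Topic 8), reduce to counting. The sketch below is an illustration under simplifying assumptions rather than the notebook's code: the `TWEETS` records are invented, tweets are assumed to be already parsed into small dicts, and "central" is approximated as "most mentioned."

```python
from collections import Counter

# Invented, pre-parsed tweets; real data would come from the parsing step.
TWEETS = [
    {"user": "alice", "hashtags": ["Brussels", "news"], "mentions": ["bob"]},
    {"user": "bob", "hashtags": ["brussels"], "mentions": []},
    {"user": "carol", "hashtags": ["Brussels"], "mentions": ["bob", "alice"]},
]


def top_hashtags(tweets, n=10):
    """Count hashtags case-insensitively and return the n most frequent."""
    counts = Counter(tag.lower() for t in tweets for tag in t["hashtags"])
    return counts.most_common(n)


def mention_edges(tweets):
    """Build weighted (author, mentioned user) edges of an interaction network."""
    return Counter((t["user"], m) for t in tweets for m in t["mentions"])


def most_mentioned(tweets, n=10):
    """Rank accounts by weighted in-degree, a simple centrality proxy."""
    in_degree = Counter()
    for (_author, target), weight in mention_edges(tweets).items():
        in_degree[target] += weight
    return in_degree.most_common(n)


print(top_hashtags(TWEETS))
print(most_mentioned(TWEETS))
```

The same `Counter` pattern extends to keywords, URLs, users, and languages by changing which field is counted.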

IPython Notebook for Exploratory Twitter Analytics

I recently gave a talk at UMD’s Maryland Institute for Technology in the Humanities (MITH) about advanced analytical techniques for Twitter analysis during the Ferguson, MO protests last year. Since the majority of the audience was non-technical, the talk focused more on what sort of analytics one might run on Twitter data, but I still wanted it to be hands-on enough that attendees could see how these sorts of analytics were done and how they could modify them to answer their own questions.

To support this hands-on, exploratory analysis, I created an IPython notebook that covers the following topics:

  • Frequency Analysis
  • Geolocation in Twitter
  • Media in Twitter
  • Sentiment Analysis
  • Topic Modeling
  • Network Analysis

The notebook is available on Github, and you can view it via the Jupyter NBViewer here:

The analysis code should work on any Twitter data you provide (as long as it is GZipped), so feel free to use it to explore other Twitter data sets as well!
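For readers preparing their own data, the sketch below shows one way to stream tweets out of a GZipped file. It assumes one JSON-encoded tweet per line, which is a common layout for collected Twitter data but is not confirmed here as the notebook's expected format; `iter_tweets` and the demo file are hypothetical.

```python
import gzip
import json
import os
import tempfile


def iter_tweets(path):
    """Yield one decoded tweet per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def write_demo_file(path):
    """Write two made-up tweets so the reader below has something to read."""
    rows = [{"text": "first", "lang": "en"}, {"text": "second", "lang": "fr"}]
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


demo_path = os.path.join(tempfile.mkdtemp(), "tweets.json.gz")
write_demo_file(demo_path)
langs = [tweet["lang"] for tweet in iter_tweets(demo_path)]
print(langs)
```

Because `iter_tweets` is a generator, even a large archive is processed one tweet at a time instead of being loaded into memory at once.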

Super Bowl 2014 Bingo

For this year’s Super Bowl, I put together a quick ImageMagick-based application for building bingo cards given a set of images. It was great fun and pretty popular.

It should work for pretty much any bingo-like event you have; you’ll just need to change the background PDF and the image set to suit your needs.

Auto Stitching Images

After some experience with OpenCV and a trip to the Smithsonian’s new museum outside Dulles to see the Discovery, I thought it might be fun to explore stitching together a large hi-res image of the Discovery. OpenCV already has the capabilities for an image stitching pipeline, so I built a preliminary application for creating such an image automatically. It takes a key frame that you want to use as the main image, a directory that contains the remaining images, and an output directory for storing the stitched images.

The application uses SIFT to find key points in each image, then finds the image in the directory most similar to the current one and stitches the two together. Each newly stitched image is then fed back into the program for stitching with the next most similar image.
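The feedback loop described above can be sketched independently of OpenCV. The code below is only an illustration of the greedy control flow, not the application's implementation: `greedy_stitch` is a hypothetical name, and the `similarity` and `stitch` callables are placeholders. In a real pipeline they would be backed by SIFT key point matching and image warping, while the toy demo stands in with sets of key point ids.

```python
def greedy_stitch(key_frame, candidates, similarity, stitch):
    """Repeatedly merge the most similar remaining image into the panorama.

    similarity(panorama, image) returns a score (higher means more alike);
    stitch(panorama, image) returns the merged image. Both are supplied by
    the caller and are assumptions of this sketch.
    """
    panorama = key_frame
    remaining = list(candidates)
    order = []
    while remaining:
        best = max(remaining, key=lambda img: similarity(panorama, img))
        remaining.remove(best)
        panorama = stitch(panorama, best)  # the result feeds the next round
        order.append(best)
    return panorama, order


# Toy stand-ins: an "image" is a set of key point ids, similarity is the
# number of shared key points, and stitching is a set union.
key = frozenset({1, 2, 3})
img_a = frozenset({3, 4})
img_b = frozenset({4, 5, 6})
img_c = frozenset({2, 3, 7})

result, order = greedy_stitch(
    key,
    [img_a, img_b, img_c],
    similarity=lambda pano, img: len(pano & img),
    stitch=lambda pano, img: pano | img,
)
print(sorted(result), [sorted(img) for img in order])
```

Note how `img_b` shares nothing with the key frame at first and only becomes stitchable after `img_a` has been merged, which is why each stitched result is fed back in before the next comparison.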

Predicting Results of NCAA Football Games

To spark some friendly competition, a coworker and I started a “tournament” to see who could build the best predictor. For it, I built a prediction engine that picks the winners of NCAA football games. The engine leverages freely available statistics from the NCAA website and compares deltas between the stats of the two opposing teams from their previous week’s games. Currently, it predicts winners with around 70% accuracy.
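To make the idea of comparing stat deltas concrete, here is a deliberately simple sketch. It is not the engine described above: the stat names, the hand-picked `WEIGHTS`, and the weighted-sum decision rule are all assumptions chosen for illustration, and the numbers are invented.

```python
# Hypothetical weights: positive values favor the team with the larger stat,
# negative values penalize it. A real engine would learn these from data.
WEIGHTS = {"points_scored": 1.0, "yards_gained": 0.01, "turnovers": -2.0}


def stat_deltas(team_a, team_b):
    """Return team_a minus team_b for every stat both teams report."""
    return {key: team_a[key] - team_b[key] for key in team_a.keys() & team_b.keys()}


def predict_winner(name_a, stats_a, name_b, stats_b, weights=WEIGHTS):
    """Pick the team favored by the weighted sum of last week's stat deltas."""
    deltas = stat_deltas(stats_a, stats_b)
    score = sum(weights.get(key, 0.0) * delta for key, delta in deltas.items())
    return name_a if score >= 0 else name_b


# Invented previous-week stats for two made-up teams.
home = {"points_scored": 31, "yards_gained": 420, "turnovers": 3}
away = {"points_scored": 24, "yards_gained": 450, "turnovers": 1}

pick = predict_winner("Home U", home, "Away State", away)
print(pick)
```

With these numbers the deltas are +7 points, -30 yards, and +2 turnovers, so the weighted sum stays positive and the sketch picks the first team.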
