Data Collection

The data we collected consists of all uses of the #notracist hashtag from 19/03/13 to 17/11/13, a total of 24,853 tweets over a period of approximately eight months. To give a sense of how voluminous the #notracist talk is on a day-by-day basis, this averages out at slightly over 100 tweets per day; the least populated day in our data has 36 tweets and the most populated day has 239.
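The daily average can be checked with a few lines of arithmetic. The sketch below is illustrative only: it reads the dates as day/month/year and assumes that both the first and the last day of the window are counted, neither of which is stated explicitly above.

```python
from datetime import date

# Figures quoted in the text above.
TOTAL_TWEETS = 24_853
START = date(2013, 3, 19)   # 19/03/13, read as day/month/year
END = date(2013, 11, 17)    # 17/11/13

# Assumption: the collection window includes both its first and last day.
days_in_window = (END - START).days + 1
mean_per_day = TOTAL_TWEETS / days_in_window

print(days_in_window)          # 244
print(round(mean_per_day, 1))  # 101.9
```

Under these assumptions the window spans 244 days, which is consistent with the "slightly over 100 tweets per day" figure.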

Data was collected (and analyzed) using Chorus, which allows users to collect various fields associated with individual tweets, including:1)

  • The date and time the tweet was published
  • Tweet content (inc. hashtags and URLs) and tweet ID number
  • User handle
  • Chosen screen name
  • Number of followers of the user and number of accounts the user follows
  • Users' given timezones
  • Geo-coordinates of the location the tweet was published from (if available)
  • The number of times that tweet has been ReTweeted
  • Positive and negative sentiment values (see http://sentistrength.wlv.ac.uk/ for details)
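One way to picture a single collected tweet is as a record with one attribute per field in the list above. The sketch below is a hypothetical illustration: the class name, attribute names, types, and example values are our own inventions and do not reflect Chorus's actual export format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class TweetRecord:
    """Hypothetical container mirroring the fields listed above."""
    published_at: datetime                       # date and time of publication
    tweet_id: int                                # tweet ID number
    text: str                                    # content, inc. hashtags and URLs
    user_handle: str                             # the @handle
    screen_name: str                             # chosen display name
    followers: int                               # number of followers
    following: int                               # number of accounts followed
    timezone: Optional[str]                      # user's given timezone, if any
    coordinates: Optional[Tuple[float, float]]   # (lat, lon), if available
    retweet_count: int                           # times the tweet was retweeted
    positive_sentiment: int                      # SentiStrength positive value
    negative_sentiment: int                      # SentiStrength negative value


# Invented example values, for illustration only.
example = TweetRecord(
    published_at=datetime(2013, 3, 19, 12, 0),
    tweet_id=1,
    text="just saying #notracist",
    user_handle="example_user",
    screen_name="Example User",
    followers=10,
    following=20,
    timezone=None,
    coordinates=None,   # most tweets carry no geo-coordinates
    retweet_count=0,
    positive_sentiment=1,
    negative_sentiment=-1,
)
```

Marking the timezone and geo-coordinates as optional reflects the "(if available)" caveat in the list; the remaining fields are assumed to be present for every tweet.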

Issues in Data Collection

A large part of our early work with the #notracist dataset focussed on what (and who) it consisted of. Collecting by keyword is a different enterprise to typical social science data collection, which usually proceeds by selecting a group of people who share a set of demographic properties (e.g. age, ethnicity, gender, and so on) and polling those who fit the criteria for information. As such, collecting tweets on the basis of their use of a shared term - #notracist - provided us with a unique composition of data, which we believed required consideration of how to treat it and to what analytic processes it might be put.

Our initial explorations of the data revealed that the tweeters captured represented “the long tail” of Twitter: they typically used the #notracist hashtag only once or twice in the entire eight-month period of data collection, and hence our dataset contains a proportionally large number of distinct tweeters. Moreover, these tweeters are on the whole unconnected to each other - they are not Twitter friends and they do not make up any kind of Twitter “community”. These became important points to keep in mind throughout the project when characterising what the #notracist hashtag consists of.
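A “long tail” of this kind can be seen by counting tweets per user handle. The following is a minimal sketch, not the procedure we actually ran in Chorus: the function name, the threshold of two uses, and the toy handles are all assumptions made for illustration.

```python
from collections import Counter


def usage_profile(handles, max_uses=2):
    """Summarise how concentrated hashtag use is across tweeters.

    `handles` holds one user handle per collected tweet. `max_uses` is an
    illustrative threshold for "once or twice".
    """
    per_user = Counter(handles)            # tweets per distinct handle
    distinct = len(per_user)
    occasional = sum(1 for n in per_user.values() if n <= max_uses)
    return {
        "tweets": len(handles),
        "distinct_users": distinct,
        # Share of tweeters who used the hashtag at most `max_uses` times.
        "occasional_share": occasional / distinct if distinct else 0.0,
    }


# Toy data: five tweeters, only one of whom uses the hashtag more than twice.
handles = ["user_a", "user_b", "user_c", "user_c",
           "user_d", "user_e", "user_e", "user_e"]
profile = usage_profile(handles)
print(profile)
```

A dataset dominated by the long tail would show a number of distinct users close to the number of tweets, and an occasional share close to 1.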

Although the data was collected by 'linguistic' criteria, i.e. by gathering all tweets that use the #notracist hashtag, we were interested in exploring the tweeting practices of those using the hashtag. This involves conceiving of the hashtag not only as a linguistic marker, but as a racialized machinic operator that is central to grasping Twitter as a techno-cultural assemblage.2) Moreover, we relied on how the “big picture” of the data, taken in aggregate, can offer insight into the construction of the #notracist hashtag without users actively agreeing on a format for the “conversation”. It is this phenomenon we explore in our Analysis.

<Back to project homepage>

data_collection.txt · Last modified: 11-Apr-14 12:50 by sanjay