Birdwatching

[EDIT (5th Aug 2013): Some of this is out of date due to recent changes in the Twitter API. I would rewrite it but all the basics are still the same. If twitter stops mucking with the API I'll update it. Until then I leave fixing the bugs as an exercise for the reader; you can contact me if you get stuck.]

Okay, here is part 1 of my tale of trying to do some twitter-based data analysis. I was spurred to do this when wondering about a specific user's tendency to only mention twitter users with huge numbers of followers. The number of followers a twitter user has is a very rough metric of gravitas in the twittersphere, and often of fame in the "real" world. As such, I wondered whether you could identify certain aspects of a user's personality by how they interact with other users. Now, before I can even think about the analysis I need some data to work with. You could acquire this twitter data several ways, but I went with the linux-python-tweepy way, as you can see below. I also coded it whilst watching The Hulk. That is recommended but arguably not essential.

[Screenshot: Ubuntu desktop with python and Tweepy running, and The Hulk playing in the top right.]

The main reason for choosing this particular data acquisition pathway is that I wanted to put as little effort into this as is humanly possible. Python is, for those of you who don't already know, the perfect programming language. I realize many will disagree with this assertion, but that's okay; I have also been wrong in the past. Joking aside, python really is a lovely language for this sort of thing, and a mere 5 minutes of googling can tell you why. My expended-energy-minimization approach is also why I am doing this on an ubuntu (linux) system. It would, though, be pretty much the same on Windows or Mac OS, although the installation of the required packages might be a little different, and you will have to install python first if using Windows (if you have related questions, google is again your friend).

Tweepy is merely a nice little wrapper that lets you interact with the twitter application programming interface (the REST API) from within python. You can read all about it here. Tweepy is open source, and its makers deserve credit for making it so elegant and simple to use. The equally nice ubuntu folks include tweepy in their repositories, so installing tweepy in ubuntu and having it correctly linked into python merely involves typing the following:

$ sudo apt-get install python-tweepy
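
If you want a quick sanity check that the install worked, asking python for tweepy's version number does the trick (the __version__ attribute has been present in the tweepy releases I have seen, but treat that as an assumption rather than gospel):

$ python -c "import tweepy; print tweepy.__version__"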

Now, twitter.com have decided in their wisdom to limit the number of times you can use their REST API per hour. The limits are currently 150/hr for an unauthorized application and 350/hr for an authorized one. These limits seem generous, but they are actually so low that they prevent you from doing any really interesting number crunching in a reasonable time. You can, however, cache results over a long time period or use different twitter APIs to avoid running into issues (see here).
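
To give a rough sketch of the caching idea, the snippet below stashes whatever you pull from the API in a local file using python's pickle module and reuses it on later runs. The file name and the fetch function are placeholders for whatever rate-limited calls you end up making, so treat this as one possible pattern rather than anything definitive.

#!/usr/bin/env python
import os
import pickle

CACHE_FILE = 'twitter_cache.pkl'   # hypothetical cache file name

def get_or_fetch(fetch_function):
    # Reuse previously cached data if it exists on disk
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, 'rb') as f:
            return pickle.load(f)
    # Otherwise spend some API calls and save the result for next time
    data = fetch_function()
    with open(CACHE_FILE, 'wb') as f:
        pickle.dump(data, f)
    return data

Wrapping your API calls like this means a second run of the same analysis costs you no calls at all.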

You can do all of the following without authorization, but it could make things much slower later, so let's get authorized. Go to https://dev.twitter.com/ and log in. Click on "Create an app", then enter a name, a reasonable description and a full url (as in something starting with http), read the terms and conditions, and if you agree click continue. Like most sites, twitter is constantly messing around with its option menus, so these might get renamed, but you get the picture.

The next stage is making your application's OAuth token. You do not need to worry about what OAuth is, but if you want to you can read all about the system at http://oauth.net/ (twitter previously supported other authorization methods but now only supports OAuth). On this options page you can change the permissions you grant your app. I only need to read twitter, not post to it, so I picked read-only. If you want to post status updates you need write permission. If you read that last sentence and thought "great, now I can make a porn spambot", two things: A, I hate you, and B, twitter.com will find out and block your app and any future ones, so please just don't. Anyway, next click to create your token. You need to record the following four long password values given on this page before leaving:

1. Consumer key
2. Consumer secret
3. Access token
4. Access token secret

[Screenshot: the OAuth page on twitter's developer site.]

The two passwords whose names end in "secret" should never, ever be disclosed to others nor given out by your application, hence their cunning names. Now to get connected and test that everything is working. Open a text document and copy in the following code, replacing each XXXX with the corresponding long password you just recorded.

#!/usr/bin/env python
import sys
import tweepy

# OAuth credentials from dev.twitter.com
CONSUMER_KEY = 'XXXX'
CONSUMER_SECRET = 'XXXX'
ACCESS_KEY = 'XXXX'
ACCESS_SECRET = 'XXXX'

# Check connection
try:
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
    api = tweepy.API(auth)
    limit = api.rate_limit_status()['remaining_hits']
    print "You have successfully authorized with OAuth."
except:
    print "You have not successfully authorized with OAuth"
    try:
        # Fall back to an unauthorized connection
        api = tweepy.API()
        limit = api.rate_limit_status()['remaining_hits']
    except:
        print "Can't connect. Is tweepy installed and is twitter.com accessible?"
        sys.exit(1)

# API limit test
print 'You have', limit, 'twitter API calls left.'
print ""

# Get my last 10 status updates
auser = api.get_user('kasilas')
TweetNo = 0
LastTweets = auser.timeline(count=10)
for EachTweet in LastTweets:
    TweetNo = TweetNo + 1
    TweetText = EachTweet.text
    print "Tweet", TweetNo, ":\n", TweetText
    print ""

Save this as "OauthTest.py", then make it executable by opening a terminal, going to the directory you saved the file in, and typing:

$ chmod +x OauthTest.py

then run the program with the command

$ ./OauthTest.py
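
If you would rather skip the chmod step, calling the interpreter directly works just as well:

$ python OauthTest.py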

You should get a little message letting you know whether or not you successfully authorized with OAuth, followed by my last 10 tweets. If it turns out you aren't authorized, you probably entered one of the long passwords wrong. You can check the printed tweets against those at twitter.com for @kasilas, but if you got any response at all, chances are everything is working fine. As everything is working, I will end here. In the next post I will use a bit of python to do some very basic data scraping and analysis.
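
As a small taste of what that scraping will involve, once the api object from OauthTest.py is authorized, the follower count I talked about at the start is a single call away. This is just a minimal sketch, assuming the lines are appended to the end of OauthTest.py so that api already exists:

# Assumes 'api' is the authorized tweepy.API object created above
someone = api.get_user('kasilas')
print someone.screen_name, "has", someone.followers_count, "followers"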