Twitter is a massive social networking shopping mall. There is a huge audience out there, and by analyzing it you can find really useful insights. If you ever want to research or study something, Twitter should be on your list of places to look for data.

There is so much you can do with Twitter data. For instance, you can determine relationships between users, stalk your ex or your current one, and much more. This data can be used to carry out some interesting analysis or research inside the social media universe. I've read about an app called Fire Me, which collects and analyzes recent tweets, finds the most horrendous tweets people post about their jobs, and uses an algorithm to flag users who have a strong chance of getting fired.

To be honest, there is lots and lots of noisy or irrelevant data in the Twitter universe. To really make the most of your Twitter data, you first have to figure out exactly what you are trying to find out or study: have a clearly defined objective, and make sure you clean the data at your end. Organisations have been mining Twitter data to understand categories, consumer mindset and needs, potential market trends, and much more; the list is too long to give in full. Some companies correlate Twitter data with their other data sources to predict the future and gain insights into their current business. All of this is heavily statistical and mathematical.

Today we will focus on mining Twitter data using Python, and tweepy was the obvious candidate. But before we can use the tweepy library, we have to create a Twitter app to get our hands on some keys.

First, we will create an app here. It looks something like this:

Click Create New App to create an app on Twitter, and fill in the following fields:

1. Name – Any name up to 32 characters.

2. Description – Whatever your app does; for this tutorial we will use something lame. The description must be 10 to 200 characters long.

3. Website – Your website. This is supposed to be your publicly accessible home page; for personal use, just enter anything you like, e.g. http://www.websitename.domain

4. Callback URL – This is really beyond the scope of this tutorial. Just to give you a heads-up, it has to do with authenticating users, if you allow your users to log in to your app.

At the end of the page there is the developer agreement. Tick the I agree check-box and click Create Twitter application.

Once the application is created, the page should redirect you to your application dashboard.

Click the “Keys and Access Tokens” tab on the dashboard and you will find your Consumer Key and Consumer Secret.

You've got your Consumer Key and Consumer Secret; now it's time to generate your Access Token and Access Token Secret. Scroll down a bit and you'll find a button that says Create my access token.

After clicking the button, you will have your Access Token and Access Token Secret.
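Rather than pasting the four keys straight into your script, you might keep them in environment variables. Below is a minimal sketch of that idea; the variable names (TWITTER_CONSUMER_KEY and so on) and the load_credentials() helper are my own hypothetical choices, not anything Twitter or tweepy requires.

```python
import os

# Hypothetical environment variable names; use whatever names you prefer.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=None):
    """Return the four keys as a dict, raising KeyError if any are missing or empty."""
    if environ is None:
        environ = os.environ
    missing = [name for name in KEY_NAMES if not environ.get(name)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {name: environ[name] for name in KEY_NAMES}


# Example with a plain dict standing in for the real environment.
creds = load_credentials({
    "TWITTER_CONSUMER_KEY": "ck",
    "TWITTER_CONSUMER_SECRET": "cs",
    "TWITTER_ACCESS_TOKEN": "at",
    "TWITTER_ACCESS_TOKEN_SECRET": "ats",
})
print(sorted(creds))
```

The returned values can then be passed to tweepy.OAuthHandler() and set_access_token() in place of the empty strings, so the keys never end up in your source code.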

Now we have all the keys we need to access our app from Python. Let's do it!

As I mentioned before, we will be using the tweepy library to access our app.

To install tweepy on Windows or Linux, just use the following command:


pip install tweepy

If you have trouble installing with this command, there are other ways round it.

Download the .whl file from the following link and run:


pip install [directory-to-downloaded-whl-package]

Or you can install it from source:


git clone https://github.com/tweepy/tweepy.git
cd tweepy
pip install .

Once tweepy has been installed, we can write a simple program to read our home timeline:


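import tweepy

# Paste the four keys from your app's "Keys and Access Tokens" tab here
CONSUMER_KEY = ""
CONSUMER_SECRET = ""
ACCESS_TOKEN = ""
ACCESS_TOKEN_SECRET = ""

# Authenticate with the app's consumer keys plus the account's access token
auth_name = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth_name.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth_name)

# Fetch the 20 most recent tweets from the home timeline and print their text
tweets = api.home_timeline()
for tweet in tweets:
    print(tweet.text)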

This simple program fetches the 20 most recent tweets and retweets posted by the authenticating user (in this case, me) and by my friends, i.e. the people I follow.

Let's break down the code:



auth_name = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)

# This creates an OAuthHandler instance from the consumer key and consumer secret.
# Think of it as a handle on your Twitter application. We've only created the
# instance; to act as a user, we supply it with the access tokens in the next step.

auth_name.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# To access the API features, we create an API instance by supplying the auth handler

api = tweepy.API(auth_name)

# Once you've created the API instance, you are free to call any of its methods.
# In the example above we used it to get the 20 most recent tweets.


tweets = api.home_timeline()

# The above method returns a tweepy.models.ResultSet (a list of Status objects).
# Let's analyze a single item of this result set

tweet_one = tweets[0]  # Get the first item

type(tweet_one)  # This returns tweepy.models.Status

So the loop in the program above iterates through each tweepy.models.Status object and prints the text of each one:

for tweet in tweets:
    print(tweet.text)

Each Status object also keeps the raw data it was built from, already parsed into a dictionary, in its _json attribute:



tweets=api.home_timeline()

tweet=tweets[0] #Get the very first tweepy.models.Status object

tweet_json=tweet._json 

type(tweet_json) # this returns dict type

Once you have the dictionary, you can access the data by key, just like any other dict.

The important fields to look at in the Status data are:


favorite_count
entities
user
geo
coordinates

To access them, use normal dictionary syntax:

tweet_json['entities']
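Putting the fields listed above together, a small helper can pull them out of one tweet dict. This is a sketch under my own assumptions: summarize_tweet() is a hypothetical name, and the sample dict is hand-made and heavily trimmed (real tweets carry many more keys). It uses .get() because geo and coordinates are often null.

```python
def summarize_tweet(tweet_json):
    """Pull a few useful fields out of a tweet dict (tweet._json)."""
    entities = tweet_json.get("entities", {})
    user = tweet_json.get("user", {})
    return {
        "favorite_count": tweet_json.get("favorite_count", 0),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "screen_name": user.get("screen_name"),
        "geo": tweet_json.get("geo"),                  # often None
        "coordinates": tweet_json.get("coordinates"),  # often None
    }


# Hand-made, trimmed sample shaped like tweet._json
sample = {
    "id": 1,
    "text": "Learning #python with #tweepy",
    "favorite_count": 3,
    "entities": {
        "hashtags": [{"text": "python", "indices": [9, 16]},
                     {"text": "tweepy", "indices": [22, 29]}],
        "symbols": [], "user_mentions": [], "urls": [],
    },
    "user": {"screen_name": "example_user"},
    "geo": None,
    "coordinates": None,
}

print(summarize_tweet(sample))
```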

We will focus on two values in tweet_json: entities and user.

entities

tweet_json['entities'] returns a dictionary with the following keys:

1. symbols
2. user_mentions
3. hashtags
4. urls

The hashtags are something we are interested in.


tweet_json['entities']['hashtags'] # Access the hashtags used

To print the text of each hashtag, we will use the following code:


hashtags = tweet_json['entities']['hashtags']

for hashtag in hashtags:
    print(hashtag['text'])
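The same loop extends naturally to many tweets, which is the start of hashtag mining. A minimal sketch, assuming you already have a list of tweet dicts (for example [t._json for t in api.home_timeline(count=200)]); count_hashtags() is a hypothetical helper, and lower-casing the tags is my own choice so that #Python and #python count together.

```python
from collections import Counter


def count_hashtags(tweet_dicts):
    """Count hashtag usage across tweet dicts, case-insensitively."""
    counts = Counter()
    for tweet_json in tweet_dicts:
        for hashtag in tweet_json.get("entities", {}).get("hashtags", []):
            counts[hashtag["text"].lower()] += 1
    return counts


tweets = [
    {"entities": {"hashtags": [{"text": "Python"}, {"text": "data"}]}},
    {"entities": {"hashtags": [{"text": "python"}]}},
    {"entities": {"hashtags": []}},
]
print(count_hashtags(tweets).most_common(2))  # [('python', 2), ('data', 1)]
```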

user

user = tweet_json['user'] # This returns the user details

This gives the details of the user whose status we are analyzing.

The user dictionary has the following keys, among others:


follow_request_sent -> True or False, whether we have sent that user a follow request
id -> The user's unique numeric Twitter ID
profile_background_image_url_https -> The HTTPS URL of the user's background image
verified -> True or False [The blue tick wankers]
entities -> Only URL entities (parsed from the profile's url and description)
followers_count -> You get the idea
statuses_count -> Number of tweets (including retweets) posted by the user
friends_count -> Number of accounts the user follows
description -> The user's profile bio
location -> Location, if provided
following -> True or False, whether we are following the user
screen_name -> The screen name (@handle) of the user
profile_image_url -> The URL of the user's profile picture. You know how to stalk
name -> The name given by the user while creating the Twitter profile

So many attributes, right? To be honest, there are even more, and you'll have to explore them yourself.
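As an illustration of using these keys together, here is a sketch that turns a user dict into a one-line profile summary. describe_user() and the followers-to-friends ratio are my own hypothetical additions, not part of the Twitter data; the sample dict is made up.

```python
def describe_user(user):
    """One-line summary of a user dict (tweet_json['user'])."""
    followers = user.get("followers_count", 0)
    friends = user.get("friends_count", 0)  # accounts this user follows
    badge = " (verified)" if user.get("verified") else ""
    # Avoid dividing by zero for accounts that follow nobody.
    ratio = followers / friends if friends else float("inf")
    return "@{}{}: {} followers, follows {}, {} tweets, ratio {:.1f}".format(
        user.get("screen_name"), badge, followers, friends,
        user.get("statuses_count", 0), ratio)


sample_user = {
    "screen_name": "example_user",
    "verified": False,
    "followers_count": 150,
    "friends_count": 60,
    "statuses_count": 1200,
}
print(describe_user(sample_user))
# @example_user: 150 followers, follows 60, 1200 tweets, ratio 2.5
```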

To retrieve more tweets from your home timeline, use the following code:


tweets=api.home_timeline(count=200)

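This retrieves the 200 most recent tweets from your timeline. You cannot retrieve more than 200 tweets in a single request.

To stalk a particular user's tweets, use the following code:


def do_someprocessing(json_data):
    pass  # Placeholder: replace with your own analysis

tweets = api.user_timeline(screen_name="", count=200)  # Fill in the user's screen name

for tweet in tweets:
    json_data = tweet._json
    do_someprocessing(json_data)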

do_someprocessing(json_data) is your own algorithm: whatever you intend to do with the data.

Again, the maximum number of tweets you can retrieve in one request is 200.
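To give do_someprocessing() some shape, here is one hypothetical version. The TimelineStats class is my own invention: it tallies retweets against original tweets and remembers the most-favorited original. It relies on the fact that a retweet's dict embeds the original tweet under the "retweeted_status" key; the sample dicts are made up.

```python
class TimelineStats:
    """Hypothetical processing step: tally retweets vs. original tweets
    and remember the most-favorited original tweet."""

    def __init__(self):
        self.retweets = 0
        self.originals = 0
        self.top_tweet = None

    def process(self, json_data):
        # Retweets embed the original tweet under "retweeted_status".
        if "retweeted_status" in json_data:
            self.retweets += 1
            return
        self.originals += 1
        current_best = self.top_tweet.get("favorite_count", 0) if self.top_tweet else -1
        if json_data.get("favorite_count", 0) > current_best:
            self.top_tweet = json_data


stats = TimelineStats()
do_someprocessing = stats.process  # plugs into the loop above

for json_data in [
    {"text": "hello", "favorite_count": 2},
    {"text": "RT @someone: hi", "retweeted_status": {"id": 7}, "favorite_count": 0},
    {"text": "second", "favorite_count": 5},
]:
    do_someprocessing(json_data)

print(stats.originals, stats.retweets, stats.top_tweet["text"])  # 2 1 second
```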

Want to post a status update? Here's how you do it:


tweet_instance=api.update_status(status="Some Status")

The update_status method returns a “tweepy.models.Status” object, which you can analyze the same way we did before.

Suppose you want to reply to the first status update in your timeline:


tweet = api.home_timeline()[0]._json  # Gets the first tweet as a dict
user = tweet['user']
api.update_status(status="some status @" + user['screen_name'], in_reply_to_status_id=tweet['id'])

Here the status text must mention (@screen_name) the author of the tweet we are replying to, and in_reply_to_status_id gives the id of that tweet. Twitter ignores in_reply_to_status_id if the author is not mentioned in the text.
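Because the mention and the id always travel together, they can be bundled into one place. A sketch, with build_reply() as a hypothetical helper that returns keyword arguments for api.update_status(); putting the mention at the front is a convention I chose, not a requirement.

```python
def build_reply(tweet_json, message):
    """Return keyword arguments for api.update_status() replying to tweet_json.

    The author's @screen_name is included because Twitter ignores
    in_reply_to_status_id when the author is not mentioned.
    """
    screen_name = tweet_json["user"]["screen_name"]
    return {
        "status": "@{} {}".format(screen_name, message),
        "in_reply_to_status_id": tweet_json["id"],
    }


tweet = {"id": 123456789, "user": {"screen_name": "example_user"}}
kwargs = build_reply(tweet, "Nice post!")
print(kwargs)
# With a real API object you would then call: api.update_status(**kwargs)
```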

To delete my most recent tweet:



tweet = api.user_timeline(screen_name="MyUserName")[0]._json
deleted = api.destroy_status(tweet['id'])  # Returns the deleted Status

You can even retweet the people you follow. Here I only retweet their most recent tweet, but you could run a continuous loop so that whenever the person you follow tweets, your program retweets it shortly afterwards.


tweet = api.user_timeline(screen_name="someone")[0]._json
retweet = api.retweet(tweet['id'])
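The continuous loop mentioned above could look roughly like this. It is a sketch, not a tested bot: retweet_new_tweets() is a hypothetical helper that polls user_timeline with its since_id parameter (only tweets newer than that id) and retweets what comes back. Polling means it reacts with a delay, and every poll counts against Twitter's rate limits. You would seed since_id with the id of the newest tweet you already have, such as the one fetched above. The FakeAPI class is only an offline stand-in for tweepy.API so the sketch can run without touching Twitter.

```python
import time
from types import SimpleNamespace


def retweet_new_tweets(api, screen_name, since_id, polls=1, interval=60):
    """Poll screen_name's timeline and retweet tweets newer than since_id.

    Returns the newest tweet id seen, so the next call can pick up from there.
    """
    for poll in range(polls):
        # since_id asks for tweets newer than the given id only.
        tweets = api.user_timeline(screen_name=screen_name, since_id=since_id)
        # The timeline comes back newest first; retweet in chronological order.
        for tweet in reversed(tweets):
            api.retweet(tweet.id)
            since_id = max(since_id, tweet.id)
        if poll < polls - 1:
            time.sleep(interval)  # keep this generous to respect rate limits
    return since_id


class FakeAPI:
    """Offline stand-in for tweepy.API, used only to exercise the sketch."""

    def __init__(self, tweet_ids):
        self.tweet_ids = tweet_ids
        self.retweeted = []

    def user_timeline(self, screen_name, since_id=None):
        newer = [i for i in self.tweet_ids if since_id is None or i > since_id]
        return [SimpleNamespace(id=i) for i in sorted(newer, reverse=True)]

    def retweet(self, tweet_id):
        self.retweeted.append(tweet_id)


fake = FakeAPI([5, 6, 7])
latest = retweet_new_tweets(fake, "someone", since_id=5, interval=0)
print(latest, fake.retweeted)  # 7 [6, 7]
```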

To get a user's details:


user = api.get_user(screen_name="username")

#Then access the user attributes to get whatever details you require

To follow someone:



user = api.create_friendship(screen_name="TheUserYouWantToFollow")

To unfollow someone:


user = api.destroy_friendship(screen_name="username")

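To check whether user A follows user B:


source, target = api.show_friendship(source_screen_name="UserA", target_screen_name="UserB")
val = source.following  # tweepy 4 renames show_friendship to get_friendship

source.following is True if UserA follows UserB, otherwise False.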
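For a two-way check, tweepy's show_friendship (renamed get_friendship in tweepy 4) returns a (source, target) pair of Friendship objects, and the source side carries both following and followed_by. A sketch, assuming tweepy 3.x naming: follow_status() is a hypothetical helper, and FakeAPI is an offline stand-in returning canned values so the code runs without credentials.

```python
from types import SimpleNamespace


def follow_status(api, user_a, user_b):
    """Return (a_follows_b, b_follows_a) for two screen names.

    Assumes tweepy 3.x's api.show_friendship; on tweepy 4 call
    api.get_friendship with the same keyword arguments.
    """
    source, _target = api.show_friendship(source_screen_name=user_a,
                                          target_screen_name=user_b)
    return source.following, source.followed_by


class FakeAPI:
    """Stand-in for tweepy.API so the sketch runs offline."""

    def show_friendship(self, source_screen_name, target_screen_name):
        source = SimpleNamespace(following=True, followed_by=False)
        target = SimpleNamespace(following=False, followed_by=True)
        return source, target


a_follows_b, b_follows_a = follow_status(FakeAPI(), "UserA", "UserB")
print("mutual" if a_follows_b and b_follows_a else "not mutual")  # not mutual
```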

This is just an introduction to the Twitter API; there is a lot more to explore, and we will be exploring it a lot. In the next tutorial we will cover tweepy's Cursor. Cursor is awesome because, as I mentioned above, the plain methods retrieve at most 200 records per request. But suppose I am following a band I like and I'd love to get their old tweets. That is not practical with the methods above, so in this case we will use a tweepy Cursor, which pages through the results for us. The next tutorial will also include hashtag mining and even streaming the live feed from Twitter.
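As a preview, the bookkeeping a Cursor does can be sketched by hand with user_timeline's max_id parameter: each request asks for tweets at or below an id, and we move that id just below the oldest tweet already collected. This is my own sketch, not tweepy's implementation; fetch_all() is a hypothetical helper, FakeAPI is an offline stand-in, and Twitter only lets user_timeline reach back roughly 3,200 tweets.

```python
from types import SimpleNamespace


def fetch_all(api, screen_name, limit=1000):
    """Page backwards through a user's timeline using max_id."""
    collected = []
    max_id = None
    while len(collected) < limit:
        kwargs = {"screen_name": screen_name, "count": 200}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break
        collected.extend(page)
        # max_id is inclusive, so step one below the oldest tweet seen.
        max_id = page[-1].id - 1
    return collected[:limit]


class FakeAPI:
    """Offline stand-in: a timeline of n tweets with ids n..1, newest first."""

    def __init__(self, n):
        self.ids = list(range(n, 0, -1))

    def user_timeline(self, screen_name, count=20, max_id=None):
        page = [i for i in self.ids if max_id is None or i <= max_id][:count]
        return [SimpleNamespace(id=i) for i in page]


tweets = fetch_all(FakeAPI(450), "someband")
print(len(tweets), tweets[0].id, tweets[-1].id)  # 450 450 1
```

With tweepy itself, the equivalent is roughly tweepy.Cursor(api.user_timeline, screen_name="someband", count=200).items(1000).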

There's a lot you can do with the tweepy API. Explore and have fun!

Mining twitter using python – Part 1

About author: Gourav kumar

0 comments:

Find us on Facebook

Labels

Popular Posts

Mining twitter using python – Part 1

entities

user

About author: Gourav kumar

Related Posts

0 comments:

Live Traffic Stats

Find us on Facebook

Labels

Popular Posts