Get Stocks tweets using Twython (Updates)

This update adds more functionality to the script for getting stock tweets with Twython and Python. It adds a class, StockTweetsReader, that inherits from the base class TweetsReader.

The StockTweetsReader class takes a series of stock names (company names) and combines each one with different search phrases (<stockname> stock, <stockname> sentiment, <stockname> buy) to form a combined Twitter query.

These search phrases are joined with the “OR” keyword, and the Twitter search is run on the resulting query. The code below shows part of this: the stock name is joined to each additional word, and the resulting phrases are eventually joined with the “OR” operator. The final query looks something like <stockname> OR <stockname> shares OR <stockname> stock and so on, based on the modifier list ['', 'shares', 'stock', 'Sentiment', 'buy', 'sell'].

    self.modified_part_search_list = ['','shares','stock', 'Sentiment', 'buy', 'sell']

    def set_search_list_and_form_search_query(self):
        """ Set the search list for individual stocks.
            Set to self.search_list and self.twitter_search_query.
        """
        # Quote each "<stock name> <modifier>" phrase so that it is searched as a unit.
        self.search_list = ['"' + self.target_stock + ' ' + n + '"' for n in self.modified_part_search_list]
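
The joining step described above can be sketched as a standalone function. This is only an illustration under stated assumptions: the name build_stock_query is hypothetical and not part of the script, and the strip() call (which removes the trailing space left by the empty modifier) is an addition of this sketch.

```python
def build_stock_query(stock_name,
                      suffixes=('', 'shares', 'stock', 'Sentiment', 'buy', 'sell')):
    """Return a Twitter search query that ORs together quoted stock phrases.

    Hypothetical helper for illustration; not taken from the original script.
    """
    # strip() drops the trailing space produced by the empty suffix.
    phrases = ['"' + (stock_name + ' ' + suffix).strip() + '"' for suffix in suffixes]
    # Twitter search treats the upper-case word OR as the "either phrase" operator.
    return ' OR '.join(phrases)


print(build_stock_query('Riverstone', ('', 'shares', 'stock')))
# "Riverstone" OR "Riverstone shares" OR "Riverstone stock"
```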

After iterating through the series of stock names, the script computes the number of tweets, grouped by date, for each company or stock name to reveal any sudden spike in interest in a particular stock on a given date. A sample of the tweet counts for a series of Singapore stocks is shown below:

    Processing stock: Sembcorp Ind
    Processing stock: Mapletree Com Tr
    Processing stock: Riverstone
    20141006 14
    20141007 86
    Processing stock: NeraTel
    20140930 3
    Processing stock: Amtek Engg
    Processing stock: Fortune Reit HKD
    Processing stock: SATS
    20141007 100
    Processing stock: UOB Kay Hian
    20141001 1
    20141003 2
    Processing stock: CapitaR China Tr
    Processing stock: LantroVision
    Processing stock: Sim Lian
    20140929 1
    20141001 2
    20141005 1

The results are currently limited by the API. First, each query returns at most 100 results and covers only recent tweets. Twitter's documentation describes the standard search index as going back roughly a week, which matches the date range in the sample above. Second, a short-form stock name may match tweets that use the same short form for something else, or content unrelated to the stock news, e.g. SATS, which has 100 tweets in a single day.

The updated script is available on GitHub. It may need some workarounds to resolve the limitations observed.


Get Stocks tweets using Twython

Twython is a Python wrapper for the Twitter API that can retrieve tweets as well as perform more advanced operations such as posting or updating a status. A project of mine requires monitoring stock tweets in the hope that they will give more insight into a particular stock. One approach I am considering is to detect a sudden rise in the number of tweets about a stock on a particular day, which may signify increased attention to or activity in that stock.

The script requires authentication with Twitter and hence a Twitter account. Only OAuth 2 authentication is needed, which is sufficient for read-only requests such as search. The Twython documentation describes how to set up the various types of authorization. After setup, running a search query is relatively easy, as shown in the following tutorial. Additional parameters of the search function are also listed on the website.
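
As a rough sketch of the OAuth 2 setup, following the pattern given in the Twython documentation: the function name make_app_only_client and the placeholder credentials are assumptions for illustration, and the commented usage needs valid keys and network access.

```python
def make_app_only_client(app_key, app_secret):
    """Return a Twython client authenticated with OAuth 2 (application-only).

    Hypothetical helper for illustration; not taken from the original script.
    """
    # Imported here so that the sketch can be loaded without Twython installed.
    from twython import Twython

    # Step 1: exchange the application key and secret for a bearer token.
    token = Twython(app_key, app_secret, oauth_version=2).obtain_access_token()
    # Step 2: build the client used for read-only requests such as search.
    return Twython(app_key, access_token=token)


# Usage (placeholder credentials, shown for illustration only):
# twitter = make_app_only_client('YOUR_APP_KEY', 'YOUR_APP_SECRET')
# results = twitter.search(q='Riverstone stock', lang='en', count=100,
#                          result_type='recent')
# for status in results['statuses']:
#     print(status['created_at'], status['text'])
```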

A sample script that scans for a series of keywords works as follows. The script forms the search query string from the include_search_list and ignores items based on the exclude list. More advanced uses of the different query operators can be found in the tutorial. The items in the include_search_list are joined by the word “OR”. Similarly, each item in the exclude_list is prefixed with “-”, meaning that tweets containing those phrases are excluded from the search results.

The “created_at” date returned by the search function is converted to a date_key for easy comparison. Grouping by date_key then gives the number of tweets about a particular stock on each day, so any unusual sign or increase in activity can be noted. The code below shows the query method used for the Twitter search.
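
A minimal sketch of how such a query string could be assembled is given here. The function names and the example exclude word are hypothetical, and quoting multi-word phrases is an assumption of this sketch so that each phrase is matched as a unit.

```python
def quote(phrase):
    """Wrap multi-word phrases in double quotes so they are matched as a unit."""
    return '"' + phrase + '"' if ' ' in phrase else phrase


def form_search_query(include_search_list, exclude_list):
    """Join included phrases with OR and prefix each excluded phrase with '-'.

    Hypothetical helper for illustration; not taken from the original script.
    """
    include_part = ' OR '.join(quote(p) for p in include_search_list)
    exclude_part = ' '.join('-' + quote(p) for p in exclude_list)
    # strip() keeps the result clean when either list is empty.
    return (include_part + ' ' + exclude_part).strip()


print(form_search_query(['SATS stock', 'SATS shares'], ['exam']))
# "SATS stock" OR "SATS shares" -exam
```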

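    def perform_twitter_search(self):
        """Perform twitter search by calling the function.
            Ensure the setting for search such as lang, count are being set.
            Will store the create date and the contents of each tweets.
        """
        # NOTE: the start of this call was garbled in the listing. The client attribute
        # name (self.twitter) and q = self.twitter_search_query are assumed here.
        for n in self.twitter.search(q = self.twitter_search_query, lang = self.lang,
                                     count = self.result_count, result_type = self.result_type)["statuses"]:
            # store the date
            date_key =  self.convert_date_str_to_date_key(n['created_at'])
            # keep the tweet text, dropping characters that cannot be encoded
            contents = n['text'].encode(errors = 'ignore')
            self.search_results.append([date_key, contents])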

To convert the date string to a date key for easy processing, the calendar module is used to map the month abbreviation to a two-digit number, which is then joined with the year and day strings.

    def convert_date_str_to_date_key(self, date_str):
        """Convert the date str given by twitter [created_at] to date key in format YYYYMMDD.
            Requires "import calendar" at module level.
            Args:
                date_str (str): date str in format given by twitter. 'Mon Sep 29 07:00:10 +0000 2014'
            Returns:
                (int): date key in format YYYYMMDD
        """
        # e.g. ['Mon', 'Sep', '29', '07:00:10', '+0000', '2014']
        date_list = date_str.split()

        # map each month abbreviation to a zero-padded month number, e.g. 'Sep' -> '09'
        month_dict = {v: '0'+str(k) for k,v in enumerate(calendar.month_abbr) if k <10}
        month_dict.update({v:str(k) for k,v in enumerate(calendar.month_abbr) if k >=10})

        return int(date_list[5] + month_dict[date_list[1]] + date_list[2])
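
As an alternative sketch of the same conversion, the standard datetime module can parse the whole string in one step. This is not how the script does it. The function name is hypothetical, the format string assumes Python 3 (for %z) and an English locale for the day and month abbreviations.

```python
from datetime import datetime


def date_key_from_created_at(date_str):
    """Convert a created_at string to an integer date key in format YYYYMMDD.

    Hypothetical helper for illustration; not taken from the original script.
    """
    # Example input: 'Mon Sep 29 07:00:10 +0000 2014'
    parsed = datetime.strptime(date_str, '%a %b %d %H:%M:%S %z %Y')
    # The date is kept in the timezone given in the string (UTC for this format).
    return int(parsed.strftime('%Y%m%d'))


print(date_key_from_created_at('Mon Sep 29 07:00:10 +0000 2014'))  # 20140929
```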

To count the number of tweets on a particular day, the pandas module is used here, although other methods can do the job too.

    def count_num_tweets_per_day(self):
        """ Count the number of tweets per day present. Only include the days where there is at least one tweet.
            Requires "import pandas" at module level.
        """
        day_info = [n[0] for n in self.search_results]
        date_df = pandas.DataFrame(day_info)
        # group by the single column (named 0) that holds the date keys
        grouped_date_info = date_df.groupby(0).size()
        date_group_data = zip(list(grouped_date_info.index), list(grouped_date_info.values))
        for date, count in date_group_data:
            print('{} {}'.format(date, count))
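
One of the other methods could be the standard library's collections.Counter, sketched here under the assumption that the search results are stored as [date_key, contents] pairs as in the code above. The function name and the sample data are hypothetical.

```python
from collections import Counter


def count_tweets_per_day(search_results):
    """Return (date_key, count) pairs sorted by date.

    Hypothetical helper for illustration; not taken from the original script.
    """
    # Only the date key (first item of each pair) is needed for counting.
    counts = Counter(date_key for date_key, _ in search_results)
    return sorted(counts.items())


# Made-up sample data in the same [date_key, contents] shape.
sample = [[20141006, 'tweet a'], [20141007, 'tweet b'], [20141007, 'tweet c']]
for date_key, count in count_tweets_per_day(sample):
    print(date_key, count)
```

As with the pandas version, days with no tweets simply do not appear in the result.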

The full script is available on GitHub. Note that the Twitter API appears to return fewer tweets than the search results displayed in the main Twitter interface, which limits the information the program can provide.