SGX

Retrieving short sell qty for SG stocks from SGX using python

SGX usually releases short sell information for each stock at the end of each trading day. This information are found in their website. The daily short sell of all stocks are compiled into a  report classified by day. We are interested in getting the short qty ranked by stocks per day.

If we examine the link, each report is in the form of a table format. To extract the information, we can use python pattern for web content download and Pandas for table extraction. Pandas has a function “pandas.io.html.read_html” that can retrieve table like data from the html string easily.

The following lists the steps to retrieve the short sell information.

  1. URL formation: As the link are joined by the date, need to retrieve the date str to join to the fixed url string. However, not all the date will be present, eg , during weekends. A better way is keep looping the the date back from current to get the latest date avaliable.
  2. HTML data download: This can be done using python pattern.
  3. Converting the table to data frame: This can be done using Pandas function “pandas.io.html.read_html”. Also Pandas provides a rank function so that the results can be ranked accordingly. Converting into Pandas database make it easy.
  4. Ranking by absolute qty may tend to mislead as it will also depends on the shares relative volume. Combining with the actual shares traded will give  a more representative data. For this case, the data frame retrieved can be joined to the current price df created from the previous post “Retrieving stock news and Ex-date from SGX using python“.
  5. The last will be to set the alerts which can be done easily using PushBullet as describe as the following post “Sending alerts to iphone or Android phone using python“. You can customize to send the alert at the end of each trading day to determine the top 10 short sell stocks.

Below show the short sell info retrieval portion of the code found in the  “SGX_stock_announcement_extract.py”  for retrieving the short sell qty for each stocks. The updated code is found in Github.


    def retrieve_shortsell_info(self):
        """ Retrieve the shortsell information.
            will form the url and retrieved the information using pandas to make into table.
            The function will set to self_shortsell_info_df.
            make it iterat over the days to get the latest data
        """
        for last_effective_date in range(7):
            self.form_shortsell_url(last_effective_date)
            url = URL(self.shortsell_full_url)
            try:
                #see data is available for that current date
                url_data = url.download(timeout = 50)
                shortsell_list = pandas.io.html.read_html(url_data)
                self.shortsell_info_df =shortsell_list[1]
            except:
                continue

            #continue if there is no data
            if len(self.shortsell_info_df) == 0: continue

            self.shortsell_info_df.rename(columns={0:'Security',1:'Short Sale Volume',
                                                  2:'Currency',3:'Short Sale Value',
                                                    },inplace =True)
            self.shortsell_info_df = self.shortsell_info_df[1:-3]
            #change type of the columns
            self.shortsell_info_df[['Short Sale Volume', 'Short Sale Value']] = self.shortsell_info_df[['Short Sale Volume', 'Short Sale Value']].astype(float)
            #need a rank on the short sell
            self.shortsell_info_df['ranked_shortsell'] = self.shortsell_info_df['Short Sale Volume'].rank(method='min',ascending=False)
            self.shortsell_info_df['shortsell_lastdate'] = self.set_last_desired_date(last_effective_date)
            #need percentage as well

            # have a sorting of data?
            return

        print 'No suitable data found within time frame.'
        return

    def form_shortsell_url(self, last_effective_date):
        """ Based on the current date to set the shorsell url.
            Set to self.shortsell_full_url
            Args:
                last_effective_date (int): last desired date in yyyymmdd.
        """
        #retrieve the current date in yyyymmdd format
        self.shortsell_date_url = self.set_last_desired_date(num_days = last_effective_date)
        self.shortsell_full_url = self.shortsell_info_start_url + self.shortsell_date_url + self.shortsell_end_url

    def set_last_desired_date(self, num_days = 0):
        """ Return the last date in which the results will be displayed.
            It is set to be the current date - num of days as set by users.
            Affect only self.print_feeds function.
            Kwargs:
                num_days (int): num of days prior to the current date.
                Setting to 0 will only retrieve the current date
            Returns:
                (int): datekey as yyyyymmdd.
        """
        last_eff_date_list = list((datetime.date.today() - datetime.timedelta(num_days)).timetuple()[0:3])

        if len(str(last_eff_date_list[1])) == 1:
            last_eff_date_list[1] = '0' + str(last_eff_date_list[1])

        if len(str(last_eff_date_list[2])) == 1:
            last_eff_date_list[2] = '0' + str(last_eff_date_list[2])

        return str(last_eff_date_list[0]) + str(last_eff_date_list[1]) + str(last_eff_date_list[2])

    def shortsell_notification(self):
        """ Use for alerts on shortsell information.
            Identify top ten short sell plus target stock short sell information.

        """
        ## get the current price df so can combined with the shortsell info
        self.process_all_data()
        merged_shortsell_df = pandas.merge(self.shortsell_info_df,self.sgx_curr_price_df,left_on = 'Security', right_on = 'CompanyName' )

        ## add in additional columns
        merged_shortsell_df['shortsell_vol_per'] = merged_shortsell_df['Short Sale Volume']/merged_shortsell_df['DailyVolume']
        merged_shortsell_df['ranked_percent_vol_shortsell'] = merged_shortsell_df['shortsell_vol_per'].rank(method='min',ascending=False)

        top_shortsell_df = merged_shortsell_df[merged_shortsell_df['ranked_shortsell'].isin(range(1,16))]
        top_shortsell_df  = top_shortsell_df.sort(columns = 'ranked_shortsell', ascending =True)
        top_shortsell_df = top_shortsell_df[['Security','Short Sale Volume','shortsell_lastdate']]
        shortsell_top15_shtver = top_shortsell_df.to_string()

        api_key_path = r'C:\Users\356039\Desktop\running bat\pushbullet_api\key.txt'
        with open(api_key_path,'r') as f:
            apiKey = f.read()

        p = PushBullet(apiKey)

        if shortsell_top15_shtver:
            p.pushNote('all', 'Shortsell top10', shortsell_top15_shtver,recipient_type="random1")

        ## display for target watchlist
        tar_watchlist_shortsell_df = merged_shortsell_df[merged_shortsell_df['Security'].isin(self.companyname_watchlist)]
        tar_watchlist_shortsell_df = tar_watchlist_shortsell_df[['Security','Short Sale Volume','ranked_shortsell','shortsell_vol_per','ranked_percent_vol_shortsell']]
        tar_watchlist_shortsell_df =tar_watchlist_shortsell_df[tar_watchlist_shortsell_df['ranked_shortsell'].isin(range(1,100))]
        tar_watchlist_shortsell_df  = tar_watchlist_shortsell_df.sort(columns = 'ranked_shortsell', ascending =True)
        tar_watchlist_shortsell_shtver = tar_watchlist_shortsell_df.to_string()

        if tar_watchlist_shortsell_shtver:
            p.pushNote('all', 'Shortsell targetwatchlist', tar_watchlist_shortsell_shtver,recipient_type="random1")

Sample output as followed:
Security | Short Sale Volume|  ranked_shortsell | shortsell_vol_per | ranked_percent_vol_shortsell
Sembcorp Ind | 3529600 | 6 | 0.437422 | 4
CapitaLand | 3313300 | 7 | 0.354216|  7
SingTel | 2809000 | 8 | 0.276471 | 16
Lippo Malls Tr | 2073800 | 11 | 0.492531 | 2

  1. Ranked_shortsell –> rank according to the absolute volume
  2. Shortsell_vol_per –> short sell qty as ratio of transacted vol
  3. ranked_percent_vol_shortsell –> rank according to Shortsell_vol_per

 

Advertisements

Retrieving stock news and Ex-date from SGX using python

For Singapore stocks, one of the way to retrieve the latest company news and announcements (such as trading halt, general announcements, dividend info) are through the Singapore Exchange (SGX) main webpage.

Besides company announcements, the following are the list of data that can be retrieved:

  1. Company announcements
  2. Company information
  3. Dividend Ex Date
  4. Latest price (also some of parameters based on SGX stock filters)

Directly scraping the website by parsing the html elements can be done but poses difficulties due to the different frames existed in the page itself and the contents being dynamic in nature (run using javascript).

A simpler way is to use the Chrome Developer tools to obtain the link to the raw data. More information on how that is achieved  is available in the following post “Getting historical financial statistics of stock using python“. Upon clicking the XHR of the Chrome Developer tab, the related stock raw data is found under the specific url. Upon clicking on the url, the data is found to be of json type but with some additional characters. The data set can be cleaned up hence converting it to a json format that is easy to download and manipulate using Pattern and Pandas. Pandas make it easy to convert any dict (which is basically a json file) to a dataframe.

The following general class will retrieve the json file and convert it to a pandas dataframe. Once the pandas dataframe is formed, it is easier for users to link it to other  tables or do further analysis on the data.

class WebJsonRetrieval(object):
    """
        General object to retrieve json file from the web.
        Would require only the first tag so after that can str away form the dict
    """
    def __init__(self):
        """ 

        """
        ## parameters
        self.saved_json_file = r'c:\data\temptryyql.json'
        self.target_tag = '' #use to identify the json data needed

        ## Result dataframe
        self.result_json_df = pandas.DataFrame()

    def set_url(self, url_str):
        """ Set the url for the json retrieval.
            url_str (str): json url str
        """
        self.com_data_full_url = url_str

    def set_target_tag(self, target_tag):
        """ Set the target_tag for the json retrieval.
            target_tag (str): target_tag for json file
        """
        self.target_tag = target_tag

    def download_json(self):
        """ Download the json file from the self.com_data_full_url.
            The save file is default to the self.saved_json_file.

        """
        cache.clear()
        url = URL(self.com_data_full_url)
        f = open(self.saved_json_file, 'wb') # save as test.gif
        try:
            str = url.download(timeout = 50)
        except:
            str = ''
        f.write(str)
        f.close()

    def process_json_data(self):
        """ Processed the json file for handling the announcement.

        """
        try:
            self.json_raw_data  = json.load(open(self.saved_json_file, 'r'))
        except:
            print "Problem loading the json file."
            self.json_raw_data = [{}] #return list of empty dict

    def convert_json_to_df(self):
        """ Convert json data (list of dict) to dataframe.
            Required the correct input of self.target_tag.

        """
        self.result_json_df = pandas.DataFrame(self.json_raw_data[self.target_tag])

This class assume that data retrieved are in strict json format. However, most of the data retrieved from the SGX main board website requires some special handling before it can be directly used as a json. Below are a few examples.

To retrieve the stocks announcements or news: the page will display the below string. The first 4 characters “{}&&” need to be removed for it to download as proper json file. The tag used will be “items”.

  • {}&&{“SHARES”:123, “items”:[{“key”:”96RY1TIA8Z……

To retrieve the ex-date of stocks: the page will display the below string. Similarly, the first 4 characters are to be removed.

  •  {}&&{“SHARES”:123, “items”:[{“key”:”22126″,”CompanyName”:”NX09100W 19060……

To retrieve the latest price of all stocks, we make use of the stock filter page. This is more complicated as all the key and items are not in double quotes. Besides removing the first few characters, it is also required to put all the key value pairs in double quotes. This can be achieved by using the regular expression.

  • {}&& {identifier:’ID’, label:’As at 13-04-2015 5:06 PM’,items:[{ID:0,N:’3Cnergy’,SIP:”,NC:’502′,R:”,I:”,M:’t’,LT:0.350,C:0.100

The full script can be found in Github. The module (SGX_stock_announcement_extract.py) is placed together with other modules related to stock extraction. The script itself also includes functions to filter required stock announcements and also creating alerts based on different price using the pushbullet. More descriptions on creating notifications are detailed in this post “Sending alerts to iphone or Android phone using python” The next part of the post will demonstrate creating price and announcement alerts with this module and pushbullet.