Retrieving stock news and Ex-date from SGX using python

For Singapore stocks, one of the way to retrieve the latest company news and announcements (such as trading halt, general announcements, dividend info) are through the Singapore Exchange (SGX) main webpage.

Besides company announcements, the following are the list of data that can be retrieved:

  1. Company announcements
  2. Company information
  3. Dividend Ex Date
  4. Latest price (also some of parameters based on SGX stock filters)

Directly scraping the website by parsing the html elements can be done but poses difficulties due to the different frames existed in the page itself and the contents being dynamic in nature (run using javascript).

A simpler way is to use the Chrome Developer tools to obtain the link to the raw data. More information on how that is achieved  is available in the following post “Getting historical financial statistics of stock using python“. Upon clicking the XHR of the Chrome Developer tab, the related stock raw data is found under the specific url. Upon clicking on the url, the data is found to be of json type but with some additional characters. The data set can be cleaned up hence converting it to a json format that is easy to download and manipulate using Pattern and Pandas. Pandas make it easy to convert any dict (which is basically a json file) to a dataframe.

The following general class will retrieve the json file and convert it to a pandas dataframe. Once the pandas dataframe is formed, it is easier for users to link it to other  tables or do further analysis on the data.

class WebJsonRetrieval(object):
    """
        General object to retrieve json file from the web.
        Would require only the first tag so after that can str away form the dict
    """
    def __init__(self):
        """ 

        """
        ## parameters
        self.saved_json_file = r'c:\data\temptryyql.json'
        self.target_tag = '' #use to identify the json data needed

        ## Result dataframe
        self.result_json_df = pandas.DataFrame()

    def set_url(self, url_str):
        """ Set the url for the json retrieval.
            url_str (str): json url str
        """
        self.com_data_full_url = url_str

    def set_target_tag(self, target_tag):
        """ Set the target_tag for the json retrieval.
            target_tag (str): target_tag for json file
        """
        self.target_tag = target_tag

    def download_json(self):
        """ Download the json file from the self.com_data_full_url.
            The save file is default to the self.saved_json_file.

        """
        cache.clear()
        url = URL(self.com_data_full_url)
        f = open(self.saved_json_file, 'wb') # save as test.gif
        try:
            str = url.download(timeout = 50)
        except:
            str = ''
        f.write(str)
        f.close()

    def process_json_data(self):
        """ Processed the json file for handling the announcement.

        """
        try:
            self.json_raw_data  = json.load(open(self.saved_json_file, 'r'))
        except:
            print "Problem loading the json file."
            self.json_raw_data = [{}] #return list of empty dict

    def convert_json_to_df(self):
        """ Convert json data (list of dict) to dataframe.
            Required the correct input of self.target_tag.

        """
        self.result_json_df = pandas.DataFrame(self.json_raw_data[self.target_tag])

This class assume that data retrieved are in strict json format. However, most of the data retrieved from the SGX main board website requires some special handling before it can be directly used as a json. Below are a few examples.

To retrieve the stocks announcements or news: the page will display the below string. The first 4 characters “{}&&” need to be removed for it to download as proper json file. The tag used will be “items”.

  • {}&&{“SHARES”:123, “items”:[{“key”:”96RY1TIA8Z……

To retrieve the ex-date of stocks: the page will display the below string. Similarly, the first 4 characters are to be removed.

  •  {}&&{“SHARES”:123, “items”:[{“key”:”22126″,”CompanyName”:”NX09100W 19060……

To retrieve the latest price of all stocks, we make use of the stock filter page. This is more complicated as all the key and items are not in double quotes. Besides removing the first few characters, it is also required to put all the key value pairs in double quotes. This can be achieved by using the regular expression.

  • {}&& {identifier:’ID’, label:’As at 13-04-2015 5:06 PM’,items:[{ID:0,N:’3Cnergy’,SIP:”,NC:’502′,R:”,I:”,M:’t’,LT:0.350,C:0.100

The full script can be found in Github. The module (SGX_stock_announcement_extract.py) is placed together with other modules related to stock extraction. The script itself also includes functions to filter required stock announcements and also creating alerts based on different price using the pushbullet. More descriptions on creating notifications are detailed in this post “Sending alerts to iphone or Android phone using python” The next part of the post will demonstrate creating price and announcement alerts with this module and pushbullet.

5 comments

Leave a comment