Day: August 23, 2014

Extracting stocks info from yahoo finance using python (Updates)

Have made several updates to the script from previous posting. Firstly is the capability to sweep through all the stocks symbol for a .csv file. The list of stocks symbol is easily generated using the extract all symbol script describe in the previous post. Reading all the symbols from the CSV can be done using python Pandas as shown below.

        data_ext = YFinanceDataExtr()
        ## read  data from .csv file -- full list of stocks
        csv_fname = r'C:\pythonuserfiles\yahoo_finance_data_extract\stocklist.csv'
        stock_list = pandas.read_csv(csv_fname)
        # convert from pandas dataframe object to list
        stock_list = list(stock_list['SYMBOL'])
        #stock_list = ['S58.SI','S68.SI']

The second improvement is instead of keying all the individual properties that need to be extracted (as illustrated below), the list of properties can be read from a xls table using the xls_table_extract_module described in the following post.

original method to set the property in the url

    def form_cur_quotes_property_url_str(self):
        """ To form the properties/parameters of the data to be received for current quotes
            To eventually utilize the get_table_fr_xls.
            Current use default parameters.
            name(n0), symbol(s), the latest value(l1), open(o) and the close value of the last trading day(p)
            volumn (v), year high (k), year low(j)

            Further info can be found at :
        start_str = '&f='
        target_properties = 'nsl1opvkj'
        self.cur_quotes_property_portion_url =  start_str + target_properties

 New method: xls table format. (the xls illustrated here is the simplified version). The full property xls is in Github.


The data can be retrieved easily using the xls_table_extract_module hence easily forming the properties str by concat the tag together. The information required can be customized to the order based on the order of xls and the information required can be turned on and off using the comment tag ‘#’.  Note  some of the properties retrieved might not be in format that easy to parse and might result in extra column upon downloading. The portion of script to handle this is as described below.

    def form_cur_quotes_property_url_str_fr_excel(self):
        """ Required xls_table_extract_module.
            Get all the properties from excel table.
            Properties can be selected by comment out those properties not required.
            Also set the heeader: self.cur_quotes_parm_headers for the values.

        from xls_table_extract_module import XlsExtractor
        self.xls_property_data = XlsExtractor(fname = self.properties_excel_table, sheetname= 'Sheet1',
                                             param_start_key = 'stock_property//', param_end_key = 'stock_property_end//',
                                             header_key = '', col_len = 2)


        ## form the header
        self.cur_quotes_parm_headers = [n.encode() for n in self.xls_property_data.data_label_list]

        ## form the url str
        start_str = '&f='
        target_properties = ''.join([n[0].encode().strip() for n in self.xls_property_data.data_value_list])
        self.cur_quotes_property_portion_url =  start_str + target_properties

The last update enable the script to handle more than one url query (each query can handle up to 50 stocks). This enable the full sweep of all the stocks listed in the stocklist and downloaded it to single results file. A sweep of around 1000 stocks symbol take less than 3 mins (it also depends on the internet connection).

The updated script can be found at GitHub.