Extracting stocks info from yahoo finance using python (Updates)

Have made several updates to the script from previous posting. Firstly is the capability to sweep through all the stocks symbol for a .csv file. The list of stocks symbol is easily generated using the extract all symbol script describe in the previous post. Reading all the symbols from the CSV can be done using python Pandas as shown below.

        data_ext = YFinanceDataExtr()
        ## read  data from .csv file -- full list of stocks
        csv_fname = r'C:\pythonuserfiles\yahoo_finance_data_extract\stocklist.csv'
        stock_list = pandas.read_csv(csv_fname)
        # convert from pandas dataframe object to list
        stock_list = list(stock_list['SYMBOL'])
        #stock_list = ['S58.SI','S68.SI']
        data_ext.get_cur_quotes_fr_list(stock_list)

The second improvement is instead of keying all the individual properties that need to be extracted (as illustrated below), the list of properties can be read from a xls table using the xls_table_extract_module described in the following post.

original method to set the property in the url

    def form_cur_quotes_property_url_str(self):
        """ To form the properties/parameters of the data to be received for current quotes
            To eventually utilize the get_table_fr_xls.
            Current use default parameters.
            name(n0), symbol(s), the latest value(l1), open(o) and the close value of the last trading day(p)
            volumn (v), year high (k), year low(j)

            Further info can be found at : https://code.google.com/p/yahoo-finance-managed/wiki/enumQuoteProperty
        """
        start_str = '&f='
        target_properties = 'nsl1opvkj'
        self.cur_quotes_property_portion_url =  start_str + target_properties

 New method: xls table format. (the xls illustrated here is the simplified version). The full property xls is in Github.

stockproperty

The data can be retrieved easily using the xls_table_extract_module hence easily forming the properties str by concat the tag together. The information required can be customized to the order based on the order of xls and the information required can be turned on and off using the comment tag ‘#’.  Note  some of the properties retrieved might not be in format that easy to parse and might result in extra column upon downloading. The portion of script to handle this is as described below.

    def form_cur_quotes_property_url_str_fr_excel(self):
        """ Required xls_table_extract_module.
            Get all the properties from excel table.
            Properties can be selected by comment out those properties not required.
            Also set the heeader: self.cur_quotes_parm_headers for the values.

        """
        from xls_table_extract_module import XlsExtractor
        self.xls_property_data = XlsExtractor(fname = self.properties_excel_table, sheetname= 'Sheet1',
                                             param_start_key = 'stock_property//', param_end_key = 'stock_property_end//',
                                             header_key = '', col_len = 2)

        self.xls_property_data.open_excel_and_process_block_data()

        ## form the header
        self.cur_quotes_parm_headers = [n.encode() for n in self.xls_property_data.data_label_list]

        ## form the url str
        start_str = '&f='
        target_properties = ''.join([n[0].encode().strip() for n in self.xls_property_data.data_value_list])
        self.cur_quotes_property_portion_url =  start_str + target_properties

The last update enable the script to handle more than one url query (each query can handle up to 50 stocks). This enable the full sweep of all the stocks listed in the stocklist and downloaded it to single results file. A sweep of around 1000 stocks symbol take less than 3 mins (it also depends on the internet connection).

The updated script can be found at GitHub.

Advertisements

9 comments

  1. I’m trying to modify “extract_all_stock_symbols.py” to extract symbols from US market. I changed the ulr to “https://finance.yahoo.com/lookup/stocks?t=S&m=US&r=”. But the script will not work. Could you please point me to the right direction? Thanks!

    1. Hi Hamsa, I tried the script by changing self.sym_start_url = “https://sg.finance.yahoo.com/lookup/stocks?t=S&m=US&r=”
      It is able to extract the symbols. May I know the error output? The script may take some time to run as it needs to scan quite a number of pages for US stocks.

  2. I may be late to the party here but I am wondering how you are able to get around yahoo timing out your connection?

    I have a similar program written in MATLAB code which extracts stock information on over 1300 stocks. Every time I run the program, I can get to anywhere between 80-600 stocks extracted before having my connection timed out. I assume Yahoo does not like me mining their data.

    I have tried introducing random pause() times and maxed out the timed connection until my program stops. Did you ever have this kind of problem?

    1. Hi Gordon, may I ask each query represent 1 stock information or multiple of stocks? I queried about 700 stocks and each of my query is around 50 stocks so about 14 queries in total. This number of query is very small and I normally do not face any issue on the connection timed out. I think the daily limit for yahoo query is much higher than that so I not sure if the limit is the issue here.

      Alternatively, you can try the YQL which limit is around 2000/hr. You can look at the following site for reference. https://developer.yahoo.com/yql/guide/usage_info_limits.html

  3. hi, thx for ur work. i have some questions about installing. i got a response that said, ” no module named pattern.web.” when i tried on ipython. any feedback would be appreciated.

    1. Hi Dave,

      Thanks for the feedback. You would need to install pattern module. You can use “pip install pattern” to get the required module.

      Hope that helps to solve your problem.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s