Retrieving stock statistics from Yahoo Finance using Python

For this post, we are only going to scrape the “Key Statistics” page of a particular stock on Yahoo Finance. The usual approach would be to fetch the page with Requests and parse it with BeautifulSoup. However, because the target page lays out its data in HTML tables, it is easier to use the Pandas read_html function, which returns each table as a DataFrame.

Objectives:
1. Retrieving stock information (key statistics) from Yahoo Finance.
Required Tools:
1. Python Pandas: using the Pandas read_html function to read HTML tables from a web page.

Usage: pulling data for a particular stock

import pandas as pd

tgt_website = r'https://sg.finance.yahoo.com/quote/WDC/key-statistics?p=WDC'

def get_key_stats(tgt_website):

    # The web page is made up of several HTML tables. Calling read_html
    # retrieves all of them as a list of DataFrames.
    # Next, stack all the tables and transpose the result to give one row of data.
    df_list = pd.read_html(tgt_website)

    # DataFrame.append was removed in pandas 2.0, so stack with pd.concat.
    result_df = pd.concat(df_list)

    # The data is in column format.
    # Transpose the result to make all data in single row
    return result_df.set_index(0).T

# Get the key statistics for the target stock as a one-row DataFrame
result_df = get_key_stats(tgt_website)
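
To see what the stack-and-transpose step does without hitting the network, here is a minimal sketch using two hand-made tables. It assumes the tables returned by read_html have the two-column layout implied by set_index(0) above (label in column 0, value in column 1). The labels and values are made up for illustration and are not real Yahoo Finance data.

```python
import pandas as pd

# Two small stand-in tables (hypothetical values), shaped like the
# label/value tables the key statistics page is assumed to contain.
table_a = pd.DataFrame([["Market cap", "10B"], ["Trailing P/E", "12.5"]])
table_b = pd.DataFrame([["Beta", "1.2"]])

# Stack the tables on top of each other, then use the label column as the
# index and transpose so that each label becomes a column of a single row.
stacked = pd.concat([table_a, table_b], ignore_index=True)
one_row = stacked.set_index(0).T

print(one_row)
```

The result has one row and one column per statistic, which is the shape that makes it easy to stack many stocks into a single table later.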

Pulling all the stock symbols

So far, we have pulled data for one known stock symbol. To get all the stocks in a particular index, the stock symbols need to be known first. The code below extracts all the stock symbols, along with other data, from the NASDAQ website. [Note: the NASDAQ website has changed format, so the original method of getting the stock symbols (the first snippet below) no longer works. Please use the second method, which pulls from the eoddata website.]

# Method 1: NASDAQ website (no longer works since the site changed format)
import pandas as pd

weblink = 'https://www.nasdaq.com/screening/companies-by-name.aspx?letter=A&render=download'
sym_df = pd.read_csv(weblink)
stock_symbol_list = sym_df.Symbol.tolist()

# Method 2: eoddata website
import string
import time
import pandas as pd

url_template = 'http://eoddata.com/stocklist/NASDAQ/{}.htm'

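# Collect one table per letter, then combine them with a single pd.concat call
# (DataFrame.append was removed in pandas 2.0).
sym_tables = []
for letter in string.ascii_uppercase:
    tempurl = url_template.format(letter)
    temp_data = pd.read_html(tempurl)
    # Table 4 held the symbol list at the time of writing.
    sym_tables.append(temp_data[4])
    # Pause between requests to avoid hammering the site.
    time.sleep(1)
sym_df = pd.concat(sym_tables)
stock_symbol_list = sym_df.Code.tolist()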
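
The per-letter collection can also be wrapped in a small function so the combining logic can be checked offline. The sketch below is only an illustration: collect_symbols and fake_fetch are hypothetical names, and fake_fetch stands in for the real pd.read_html call with made-up symbols. The only assumption carried over from the code above is that each table has a Code column.

```python
import string

import pandas as pd


def collect_symbols(fetch_table, letters=string.ascii_uppercase):
    """Fetch one symbol table per letter and return all codes as a list.

    fetch_table is a hypothetical callable: letter -> DataFrame with a
    'Code' column. In real use it would wrap pd.read_html on the eoddata URL.
    """
    frames = [fetch_table(letter) for letter in letters]
    sym_df = pd.concat(frames, ignore_index=True)
    # Drop repeated codes in case a symbol shows up on more than one page.
    return sym_df["Code"].drop_duplicates().tolist()


# Offline stand-in for the web request, returning made-up symbols.
def fake_fetch(letter):
    return pd.DataFrame({"Code": [letter + "AA", letter + "BB"]})


symbols = collect_symbols(fake_fetch, letters="AB")
print(symbols)
```

A real fetch function would keep the one-second pause between requests shown in the loop above.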

Pulling key statistics for all stock symbols (for given index)

The last step is to iterate over all the symbols and get the corresponding key statistics.

url_prefix = 'https://sg.finance.yahoo.com/quote/{0}/key-statistics?p={0}'

# Collect one row per symbol, then combine them with a single pd.concat call
# (DataFrame.append was removed in pandas 2.0).
all_results = []
for sym in stock_symbol_list:
    stock_url = url_prefix.format(sym)
    all_results.append(get_key_stats(stock_url))
all_result_df = pd.concat(all_results)

# Save all results
all_result_df.to_csv('results.csv', index =False)
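
Two practical gaps in the loop above are that one failing symbol stops the whole run, and that the saved rows do not record which symbol they belong to. The sketch below shows one possible way to handle both. collect_key_stats and fake_fetch_stats are hypothetical names, and the fake function returns made-up values so the example runs offline. In real use, fetch_stats could be a small wrapper that builds the URL and calls get_key_stats.

```python
import pandas as pd


def collect_key_stats(symbols, fetch_stats):
    """Gather one row of statistics per symbol, skipping symbols that fail.

    fetch_stats is a hypothetical callable: symbol -> one-row DataFrame.
    Returns the combined DataFrame and a list of (symbol, error) pairs.
    """
    rows, failed = [], []
    for sym in symbols:
        try:
            row = fetch_stats(sym).copy()
        except Exception as exc:  # e.g. no tables found, or a network error
            failed.append((sym, str(exc)))
            continue
        # Record the symbol so each row can be identified after saving.
        row.insert(0, "Symbol", sym)
        rows.append(row)
    combined = pd.concat(rows, ignore_index=True) if rows else pd.DataFrame()
    return combined, failed


# Offline stand-in for the real scraper, with made-up values.
def fake_fetch_stats(sym):
    if sym == "BAD":
        raise ValueError("No tables found")
    return pd.DataFrame([{"Beta": "1.0", "Trailing P/E": "10"}])


combined, failed = collect_key_stats(["AAA", "BAD", "BBB"], fake_fetch_stats)
print(combined)
print(failed)
```

Collecting the failures in a separate list makes it possible to retry just those symbols later instead of repeating the whole run.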

3 comments

Hi,
just found your blog,
was finding ways to use python for data retrieval
this was very helpful
thank you very much!

Hi! The original developer!
Thanks for sharing your code!
It is really beautiful and helpful!

However, I should comment that downloading the entire list of symbols from NASDAQ does not work!
Please have a look at the code
sym_df = pd.read_csv(weblink)

Thanks!

Kok Hua says:

November 11, 2019 at 1:32 pm

Hi Jaeyong, thanks for the comment. I have updated the post with new ways of getting the symbols. Hope it helps.
