For this post, we are only going to scrape the “Key Statistics” page of a particular stock on Yahoo Finance. The usual way might be to use Requests and BeautifulSoup to parse the web page. However, because the target page presents its data in HTML tables, it is easier to use the pandas read_html function and work with the DataFrames it returns.
- Objectives:
  - Retrieve stock information (key statistics) from Yahoo Finance.
- Required Tools:
  - Python pandas: the read_html function is used to read tables from a web page.
Usage: Pulling data for a particular stock
import pandas as pd

tgt_website = r'https://sg.finance.yahoo.com/quote/WDC/key-statistics?p=WDC'

def get_key_stats(tgt_website):
    # The web page is made up of several HTML tables. Calling read_html
    # retrieves all of them as a list of DataFrames.
    df_list = pd.read_html(tgt_website)

    # Stack all the tables into one. pd.concat is used here because
    # DataFrame.append was removed in pandas 2.0.
    result_df = pd.concat(df_list)

    # The data is in column format.
    # Transpose the result so that all the data sits in a single row.
    return result_df.set_index(0).T

# Get the key statistics for the stock as a one-row DataFrame
result_df = get_key_stats(tgt_website)
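The stack-and-transpose step can be tried offline without touching Yahoo Finance. The sketch below is only an illustration: the two small tables and their values are made up, and `combine_tables` is a hypothetical helper name, not part of the original post. It assumes each table has a label in column 0 and a value in column 1, as the function above expects.

```python
import pandas as pd

# Made-up stand-ins for the list of tables that read_html would return.
# Each table holds a label in column 0 and a value in column 1.
tables = [
    pd.DataFrame([["Market cap", "10B"], ["Trailing P/E", "12.5"]]),
    pd.DataFrame([["Beta", "1.2"], ["52-week high", "80.00"]]),
]

def combine_tables(tables):
    """Stack two-column tables and transpose them into a single row."""
    stacked = pd.concat(tables, ignore_index=True)
    # Column 0 (the labels) becomes the index, so after the transpose
    # every label is a column header and the values form one row.
    return stacked.set_index(0).T

row = combine_tables(tables)
print(row)
```

After the transpose, each statistic name is a column, which is what makes it possible to stack the rows of many stocks on top of each other later.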
Pulling all the stocks symbols
So far, we have pulled data for one known stock symbol. To get all the stocks in a particular index, the stock symbols need to be known first. The code below extracts all the stock symbols, along with other data, from the NASDAQ website. [Note: the NASDAQ website has changed format, so the original method of getting the stock symbols is no longer valid. Please see the second method, which pulls the symbols from the eoddata website.]
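# Method 1: download the symbol list from the NASDAQ website
# (no longer valid since the website changed format)
import pandas as pd

weblink = 'https://www.nasdaq.com/screening/companies-by-name.aspx?letter=A&render=download'
sym_df = pd.read_csv(weblink)
stock_symbol_list = sym_df.Symbol.tolist()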
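# Method 2: pull the symbol list from the eoddata website, one page per letter
import string
import time
import pandas as pd

url_template = 'http://eoddata.com/stocklist/NASDAQ/{}.htm'
sym_frames = []
for letter in string.ascii_uppercase:
    tempurl = url_template.format(letter)
    temp_data = pd.read_html(tempurl)
    # The symbol table is the fifth table on the page
    sym_frames.append(temp_data[4])
    # Pause between requests
    time.sleep(1)

# Combine the per-letter tables. pd.concat is used here because
# DataFrame.append was removed in pandas 2.0.
sym_df = pd.concat(sym_frames, ignore_index=True)
stock_symbol_list = sym_df.Code.tolist()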
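Since the source website can change format, as the NASDAQ one did, it may help to keep the per-letter loop separate from the actual download. The following is a minimal sketch under that assumption: `collect_symbols` and `fake_fetch` are hypothetical names, and the fake fetcher returns invented symbols so that the example runs without network access. In real use, the fetcher would wrap the read_html call for one letter.

```python
import string
import pandas as pd

def collect_symbols(fetch_table, letters=string.ascii_uppercase):
    """Collect the 'Code' column from one table per letter.

    fetch_table is any function that takes a letter and returns a
    DataFrame with a 'Code' column (an assumption for this sketch).
    """
    frames = []
    for letter in letters:
        try:
            frames.append(fetch_table(letter))
        except Exception as exc:
            # Skip a letter whose page fails instead of stopping the run
            print(f"skipping {letter}: {exc}")
    if not frames:
        return []
    combined = pd.concat(frames, ignore_index=True)
    # Drop missing values and repeated symbols before building the list
    return combined["Code"].dropna().drop_duplicates().tolist()

def fake_fetch(letter):
    """Offline stand-in for the real download (invented data)."""
    if letter == "C":
        raise ValueError("no table found")
    return pd.DataFrame({"Code": [letter + "AA", letter + "BB"]})

symbols = collect_symbols(fake_fetch, letters="ABC")
print(symbols)
```

Collecting the frames in a list and concatenating once at the end also avoids rebuilding the DataFrame on every iteration.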
Pulling key statistics for all stock symbols (for a given index)
The last step is to iterate over all the symbols and get the corresponding key statistics.
all_results = []
url_prefix = 'https://sg.finance.yahoo.com/quote/{0}/key-statistics?p={0}'
for sym in stock_symbol_list:
    stock_url = url_prefix.format(sym)
    all_results.append(get_key_stats(stock_url))

# Combine the one-row results into a single DataFrame. pd.concat is used
# here because DataFrame.append was removed in pandas 2.0.
all_result_df = pd.concat(all_results)

# Save all results
all_result_df.to_csv('results.csv', index=False)
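One thing to watch in the loop above: the rows do not record which symbol they belong to, and a single failing page stops the whole run. The sketch below is one possible way to handle both, not the original method. `collect_key_stats` and `fake_stats` are hypothetical names, and the fake fetcher returns invented values so that the example runs offline; in real use it would be replaced by a function that builds the URL and calls get_key_stats.

```python
import pandas as pd

def collect_key_stats(symbols, fetch_stats):
    """Gather one-row DataFrames per symbol and tag each with its symbol.

    fetch_stats is any function that takes a symbol and returns a
    one-row DataFrame (an assumption for this sketch).
    """
    rows, failed = [], []
    for sym in symbols:
        try:
            row = fetch_stats(sym).copy()
        except Exception:
            # Remember the symbol and carry on with the next one
            failed.append(sym)
            continue
        # Put the symbol in the first column so each row is identifiable
        row.insert(0, "Symbol", sym)
        rows.append(row)
    combined = pd.concat(rows, ignore_index=True) if rows else pd.DataFrame()
    return combined, failed

def fake_stats(sym):
    """Offline stand-in for the real page download (invented data)."""
    if sym == "BAD":
        raise ValueError("no tables found")
    return pd.DataFrame({"Beta": ["1.2"], "Market cap": ["10B"]})

combined, failed = collect_key_stats(["WDC", "BAD", "AAPL"], fake_stats)
print(combined)
print("failed:", failed)
```

With the symbol stored as a regular column, saving with index=False keeps it in the CSV, and the list of failed symbols can be retried later.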
Hi! The original developer!
Thanks for sharing your code!
It is really beautiful and helpful!
However, I should comment that downloading the entire list of symbols from NASDAQ does not work!
Please have a look at the code
sym_df = pd.read_csv(weblink)
Thanks!
Hi Jaeyong, thanks for the comment. I have updated the post with new ways of getting the symbols. Hope it helps.