A simple python script to retrieve key financial metrics for all stocks from Google Finance Screener. Google screener have more metrics avaliable compared to SGX screener and also contains comprehensive stocks data for various stock exchanges.
The reason for the fast retrieval is that the information are stored in the form of single json format for all stocks such that it will reduce the number of request calls and downloading. Being in json format also allows easy conversion to a Pandas Dataframe object.
To retrieve the json url of the stock data, go to the Google Screener and select the criteria (like what is normally done when setting up a filter). Open up the criteria to full range of the particular metrics. In this way, all the stocks will be selected instead of being filtered off. Using the developer tab of any browser, retrieve the full url. For further description of how to retrieve the url, you can refer to the following post: “Getting historic financial statistics of stocks using Python”
Two points to take note: Firstly the URL only include stock list from 1 -20 due to page setting. Set the end stock to a large number eg 3000 (in blue) to include the full stock list. Below is a sample of the corresponding url.
Secondly, as Google only allows 12 criteria to be set at any one go, you would need to repeat process multiple times to obtain all the parameters. Repeat the above process by selecting different criteria and join all the parameters together.
Once the url is formed, the same process is used when scraping web data using python as described in most posts in this blog. The main tools are Python Pandas and Python Pattern. Python Pattern is to help with the json file download and Pandas to convert the json file to Data frame which can then be used to join with other parameters.
The difficult part of the script is to obtain the url. Once the url is known, other methods can be employed to download and read the data from the json file.
The script (for all stocks in Singapore) is available in Github. Due to the long url format, the script will form the full url by concatenating the start and end url with the middle portion (which are all the criteria) stored in a file. File is also found in Github.
(Update: Thanks to Bob1: the start url is changed to https://finance.google.com/finance?output=json&start=0&num=3000&noIL=1&q=)