
Filter stock data using Python

After retrieving the various stock data from Yahoo Finance and other sources with the tools described in the previous blog post, it is more meaningful to filter for stocks that meet certain requirements, much like the functionality of the Google stock screener.

The script (available on GitHub) takes in a text file with the criteria specified and filters the stocks using Python pandas. The text file is in a format that lets users easily input and retrieve the criteria descriptions using the DictParser module described in the following blog post. In addition, the DictParser module makes it easy to create the respective criteria. A sample criteria file is shown below.

$greater
Volume:999999
PERATIO:4
Current Ratio (mrq):1.5
Qtrly Earnings Growth (yoy):0
DilutedEPS:0

$less
PERATIO:17
Mean Recommendation (this week):3

$compare
1:YEARHIGH,OPEN,greater,0

The DictParser object will produce three dicts based on the above criteria text file. These are the criteria that filter for stocks meeting the listed requirements. The retrieved stock data (in .csv form) are converted to a pandas DataFrame for easy filtering, and the stocks eventually selected must match all the criteria within each criteria file.

Under the 'greater' dict, each key-value pair means that only stocks whose key (e.g. Volume) is greater than the value (e.g. 999999) will be selected. Under the 'less' dict, only stocks whose key is less than the corresponding value will be selected. The 'compare' dict does not make use of the key; instead it utilizes the value (a list) for each key.

Inside each value list of the 'compare' dict there are four items: the first and second items are the columns to compare, the third item is the comparator and the last item is the threshold value. For example, the entry "YEARHIGH,OPEN,greater,0" selects stocks whose "YEARHIGH" price is greater than the "OPEN" price by more than 0; since the year high is always at least the open price, virtually all stocks will be selected by this particular criterion.
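To show the idea outside the class, the snippet below applies the same three kinds of criteria directly to a pandas DataFrame. The csv path is only a placeholder and the column names follow the sample criteria file above; the actual script (shown further down) wraps this logic around the DictParser output.

    import pandas as pd

    # Placeholder path; the real script loads the csv data retrieved from Yahoo Finance.
    df = pd.read_csv(r'c:\data\temp\stock_data.csv')

    # 'greater' criterion: keep stocks whose column value exceeds the threshold.
    df = df[df['Volume'] > 999999]

    # 'less' criterion: keep stocks whose column value is below the threshold.
    df = df[df['PERATIO'] < 17]

    # 'compare' criterion: keep stocks where (first column - second column) > value.
    df = df[(df['YEARHIGH'] - df['OPEN']) > 0]

    print(df.head())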

Users can easily add or delete criteria by conforming to the format. The script allows several criteria files to be run in one go, so users can create multiple criteria files, each catering to a different risk appetite. Below is the part of the script that retrieves the different criteria dicts using the DictParser and uses them to filter the data.


    def get_all_criteria_fr_file(self):
        """ Created in format of the dictparser.
            Dict parser will contain the greater, less than ,sorting dicts for easy filtering.
            Will parse according to the self.criteria_type

            Will also set the output file name
        """
        self.dictparser = DictParser(self.criteria_type_path_dict[self.criteria_type])
        self.criteria_dict = self.dictparser.dict_of_dict_obj
        self.modified_df = self.data_df

        self.set_output_file()

    def process_criteria(self):
        """ Process the different criteria generated.
            Presently handles the greater, less and compare dicts.
        """
        greater_dict = dict()
        less_dict = dict()
        compare_dict = dict()
        print 'Processing each filter...'
        print '-'*40

        if 'greater' in self.criteria_dict: greater_dict = self.criteria_dict['greater']
        if 'less' in self.criteria_dict: less_dict = self.criteria_dict['less']
        if 'compare' in self.criteria_dict: compare_dict = self.criteria_dict['compare']

        for n in greater_dict.keys():
            if not n in self.modified_df.columns: continue #continue if criteria not found
            self.modified_df = self.modified_df[self.modified_df[n] > float(greater_dict[n][0])]
            if self.print_qty_left_aft_screen:
                self.__print_criteria_info('Greater', n)
                self.__print_modified_df_qty()

        for n in less_dict.keys():
            if not n in self.modified_df.columns: continue #continue if criteria not found
            self.modified_df = self.modified_df[self.modified_df[n] < float(less_dict[n][0])]
            if self.print_qty_left_aft_screen:
                self.__print_criteria_info('Less',n)
                self.__print_modified_df_qty()

        for n in compare_dict.keys():
            first_item = compare_dict[n][0]
            sec_item = compare_dict[n][1]
            compare_type = compare_dict[n][2]
            compare_value = float(compare_dict[n][3])

            if not first_item in self.modified_df.columns: continue #continue if criteria not found
            if not sec_item in self.modified_df.columns: continue #continue if criteria not found

            if compare_type == 'greater':
                self.modified_df = self.modified_df[(self.modified_df[first_item] - self.modified_df[sec_item])> compare_value]
            elif compare_type == 'less':
                self.modified_df = self.modified_df[(self.modified_df[first_item] - self.modified_df[sec_item])< compare_value]

            if self.print_qty_left_aft_screen:
                self.__print_criteria_info('Compare',first_item, sec_item)
                self.__print_modified_df_qty()

        print 'END'
        print '\nSnapshot of final df ...'
        self.__print_snapshot_of_modified_df()

A sample output from one of the criteria files is shown below. This criteria file tries to screen for stocks that provide a high dividend and yet have good fundamentals (only basic parameters are listed below). The modified_df qty line shows the number of stocks left after each criterion is applied.

 List of filter for the criteria:  dividend
----------------------------------------
VOLUME  >  999999
Qtrly Earnings Growth (yoy)  >  0
DILUTEDEPS  >  0
DAYSLOW  >  1.1
TRAILINGANNUALDIVIDENDYIELDINPERCENT  >  4
PERATIO  <  15
TrailingAnnualDividendYieldInPercent  <  10

Processing each filter…
----------------------------------------
Current Screen criteria:  Greater   VOLUME
Modified_df qty:  53
Current Screen criteria:  Greater   Qtrly Earnings Growth (yoy)
Modified_df qty:  48
Current Screen criteria:  Greater   DILUTEDEPS
Modified_df qty:  48
Current Screen criteria:  Greater   DAYSLOW
Modified_df qty:  24
Current Screen criteria:  Greater   TRAILINGANNUALDIVIDENDYIELDINPERCENT
Modified_df qty:  5
Current Screen criteria:  Less   PERATIO
Modified_df qty:  4
END

Snapshot of final df …
Unnamed: 0   SYMBOL              NAME LASTTRADEDATE    OPEN  \
17            4   O39.SI         OCBC Bank     10/3/2014   9.680
21            8   BN4.SI       Keppel Corp     10/3/2014  10.380
37            5  C38U.SI  CapitaMall Trust     10/3/2014   1.925
164          14   U11.SI               UOB     10/3/2014  22.300

PREVIOUSCLOSE  LASTTRADEPRICEONLY   VOLUME  AVERAGEDAILYVOLUME  DAYSHIGH  \
17           9.710               9.740  3322000             4555330     9.750
21          10.440              10.400  4280000             2384510    10.410
37           1.925               1.925  5063000             7397900     1.935
164         22.270              22.440  1381000             1851720    22.470

…     Mean Recommendation (last week)  \
17     …                                 2.6
21     …                                 2.1
37     …                                 2.5
164    …                                 2.8

Change  Mean Target  Median Target  \
17                                 0.0        10.53          10.63
21   <font color=”#cc0000″>-0.1</font>        12.26          12.50
37                                 0.1         2.14           2.14
164                                0.0        24.03          23.60

High Target Low Target  No. of Brokers            Sector  \
17         12.23       7.96              22         Financial
21         13.50      10.00              23  Industrial Goods
37          2.40       1.92              21         Financial
164        26.80      22.00              23         Financial

Industry                                       company_desc
17    Money Center Banks  Oversea-Chinese Banking Corporation Limited of…
21   General Contractors  Keppel Corporation Limited primarily engages i…
37         REIT – Retail  CapitaMall Trust (CMT) is a publicly owned rea…
164   Money Center Banks  United Overseas Bank Limited provides various …

[4 rows x 70 columns]

Parsing Dict object from text file (More…)

I have modified the DictParser, mentioned in the previous blog post, to handle object parsing. The previous version of DictParser could only handle basic data types, whereas in this version the user can pass in a dict of objects for the DictParser to identify, and it will replace the variables marked with '@', treating them as objects.

An illustration is shown below. Note that the "second" section has an object @a included in one of its value lists; this will subsequently be substituted with [1,3,4] after parsing.

## Text file
$first
aa:bbb,cccc,1,2,3
1:1,bbb,cccc,1,2,3

$second
ee:bbb,cccc,1,2,3
2:1,bbb,@a,1,2,3  
## end of file

The output from DictParser is as follows:

p = DictParser(temp_working_file, {'a':[1,3,4]}) #pass in a dict with obj def
p.parse_the_full_dict()
print p.dict_of_dict_obj
>>> {'second': {'ee': ['bbb', 'cccc', 1, 2, 3], 2: [1, 'bbb', [1, 3, 4], 1, 2, 3]},
'first': {'aa': ['bbb', 'cccc', 1, 2, 3], 1: [1, 'bbb', 'cccc', 1, 2, 3]}}

If the object is not available or is not passed to DictParser, the marked variable will simply be treated as a string.

Using '@' to denote an object is inspired by the Julia programming language, where $xxx is used to substitute objects during printing.
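For illustration, the substitution step could look roughly like the sketch below. This is not the actual DictParser internals, just a simplified standalone version of the idea (the function name is made up).

    def substitute_objects(value_list, obj_dict):
        """ Replace '@name' tokens with the corresponding object from obj_dict.
            Simplified sketch only; tokens without a matching object stay as strings.
        """
        resolved = []
        for item in value_list:
            if isinstance(item, str) and item.startswith('@'):
                # Fall back to the raw string when the object is not passed in.
                resolved.append(obj_dict.get(item[1:], item))
            else:
                resolved.append(item)
        return resolved

    print(substitute_objects([1, 'bbb', '@a', 1, 2, 3], {'a': [1, 3, 4]}))
    # [1, 'bbb', [1, 3, 4], 1, 2, 3]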

Scraping Google results using Python (GUI version)

I have added a GUI version, using wxPython, to the script described in the previous post.

The GUI version enables the display of individual search results in a GUI format. Each search result can be customized to show the title, link, meta body description and paragraphs from the main page. That is all that is displayed in the current script; I will add in the summarized text in future.

There is also a separate TextCtrl box for entering notes based on the results, so that the user can copy any information into it and save it as a separate file. The GUI is shown in the picture below, and a rough sketch of the layout follows.
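The sketch below is only to give a feel for the layout, not the actual GUI script: a frame with a read-only results box, an editable notes box and a save button. The window title, widget arrangement and save path are placeholders.

    import wx

    class SearchResultFrame(wx.Frame):
        """ Minimal layout sketch: read-only search result box plus editable notes box. """
        def __init__(self, result_text):
            wx.Frame.__init__(self, None, title='Google Search GUI', size=(600, 500))
            panel = wx.Panel(self)

            # Read-only display of one search result (title, link, description, paragraphs).
            self.result_box = wx.TextCtrl(panel, value=result_text,
                                          style=wx.TE_MULTILINE | wx.TE_READONLY)
            # Editable box for the user's own notes.
            self.notes_box = wx.TextCtrl(panel, style=wx.TE_MULTILINE)
            save_btn = wx.Button(panel, label='Save notes')
            save_btn.Bind(wx.EVT_BUTTON, self.on_save)

            sizer = wx.BoxSizer(wx.VERTICAL)
            sizer.Add(self.result_box, 2, wx.EXPAND | wx.ALL, 5)
            sizer.Add(self.notes_box, 1, wx.EXPAND | wx.ALL, 5)
            sizer.Add(save_btn, 0, wx.ALL, 5)
            panel.SetSizer(sizer)

        def on_save(self, event):
            # Placeholder path; the actual script saves the notes as separate files.
            with open(r'c:\data\temp\search_notes.txt', 'w') as f:
                f.write(self.notes_box.GetValue())

    if __name__ == '__main__':
        app = wx.App(False)
        SearchResultFrame('sample result text').Show()
        app.MainLoop()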

The GUI script is found in the same GitHub repository as the google search module. It requires one more module, which parses the combined results file into separate entries based on the search result number. The module is described in the previous post.

Parsing the combined results file is very simple: detect the "###" marker that separates the individual results and store each result in a dict. The basic code is as follows.


key_symbol = '###'
combined_result_list,self.page_scroller_result = Extract_specified_txt_fr_files.para_extract(r'c:\data\temp\htmlread_1.txt',key_symbol, overlapping = 0 )
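The splitting itself only needs a few lines of standard Python. The sketch below shows the idea; it is not the actual para_extract implementation, which also returns the page scroller text and supports the overlapping option.

    def split_results(filename, key_symbol='###'):
        """ Split the combined results file on the marker and key each result by its number.
            Sketch only, assuming one search result per '###'-delimited block.
        """
        with open(filename) as f:
            blocks = [b.strip() for b in f.read().split(key_symbol) if b.strip()]
        # Store each search result under its (1-based) result number.
        return dict((num, text) for num, text in enumerate(blocks, start=1))

    results_dict = split_results(r'c:\data\temp\htmlread_1.txt')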

Google Search GUI

Parsing Dict object from text file (Updates)

I have been using the DictParser mentioned in the previous blog post in a recent project to create a settings file for various users. In the project, different users need different settings, such as parameter file paths.

The settings file uses the computer name to segregate the different users. By creating a text file (with DictParser) keyed on the different computer names, it is easy to obtain separate setting parameters for each user. A sample of the settings file is shown below.

## Text file
$USER1_COM_NAME
#setting_comment_out:r'c:\data\temp\bbb.txt'
setting2:r'c:\data\temp\ccc.txt'

$USER2_COM_NAME
setting:r'c:\data\temp\eee.txt'
2:1,bbb,cccc,1,2,3
## end of file

The output from DictParser is as follows:

## python output: one dict containing two dicts keyed by the users 'USER1_COM_NAME' and 'USER2_COM_NAME'
>> {'USER1_COM_NAME': {'setting2': ['c:\\data\\temp\\ccc.txt']}, 'USER2_COM_NAME': {2: [1, 'bbb', 'cccc', 1, 2, 3], 'setting': ['c:\\data\\temp\\eee.txt']}}

Users can use os.environ['ComputerName'] to get the computer name and hence retrieve the corresponding settings. A short sketch is shown below.
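Putting it together, the lookup could be done roughly as below. The settings file path is a placeholder and the import line depends on where DictParser lives in the project; the DictParser calls themselves follow the usage shown earlier.

    import os
    # from dictparser_module import DictParser  # import path depends on the project layout

    # Parse the settings file and pick out the dict for the current (Windows) machine.
    p = DictParser(r'c:\data\temp\user_settings.txt')
    p.parse_the_full_dict()
    com_name = os.environ['ComputerName']
    user_settings = p.dict_of_dict_obj.get(com_name, {})
    print(user_settings.get('setting2'))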

I realized that the output format is somewhat similar to the JSON format. This parser is more restrictive in its uses, but that gives it some advantages over JSON: fewer punctuation characters (such as '{' and quotes) and the ability to comment out certain lines.