Day: August 20, 2014

Manage and extract data using python and Excel tables

User data such as setting parameters are usually stored in txt, json or perhaps .csv format. One alternative to the common types of storing simple data sets used for initial settings is through usingĀ  Excel tables. Excel represents a good way to store and view tables with the extensive formatting options andĀ  different tabs for multiple storage. This provides an edge over txt or other simple data storing. The only difficulty might be that it is not easy to retrieve the data easily unlike in .csv or txt file.

The script below will utilize the Excel tables to extract various information such as setting files, parameters values and at the same time maintain a neat way of viewing and changing all the parameters.

The script will required pyExcel which is interface module between python and excel. The concept is to retrieve all the tables and rows specified within the start and closing tag.

The advantages are:

  1. Multiple tags can be used within the same excel sheet or other excel sheets as well.
  2. Number of columns can be edited easily.
  3. Space can be inserted between rows for easy viewing.
  4. Comment can be inserted so that particular row data can be easily bypassed.
  5. Normal excel formatting can be enabled without disruption to the data retrieved hence allowing easy viewing of data.

Below is a sample table which data will be extracted to be use in subsequent python functions or modules. Note the formatting is up to the user preferences.

sample table

The script run and output are as below. Note that there are various output format to query from. Also notice that the space between the data and those commented rows are being taken care off. The script is available in GitHub.

        xls_set_class = XlsExtractor(fname = r'C:\Python27\Lib\site-packages\excel_table_extract\testset.xls', sheetname= 'Sheet1',
                                     param_start_key = 'start//', param_end_key = 'end//',
                                     header_key = 'header#3//', col_len = 3)

        print xls_set_class.data_label_list
        ## >>> [u'label1', u'label3']

        print xls_set_class.data_value_list
        ## >>> [[2.0, 3.0], [8.0, 9.0]]

        print xls_set_class.label_value_dict
        ## >>> {u'label1': [2.0, 3.0], u'label3': [8.0, 9.0]}

        print xls_set_class.header_list
        ## >>> [u'header1', u'header2', u'header3']