computing

Rapid generation of powerpoint report with template scanning

In my work, I need to create PowerPoint (ppt) report of similar template. For the report, I need to create various plots in Excel or JMP, save it to folders and finally paste them to ppt. It be great if it is possible to generate ppt report rapidly by using automation. I have created a python interface to powerpoint using com commands hoping it will help to generate the report automatically.

The initial idea is to add command to paste the plots at specific slides and specific positions. The problem with this is that I have to set the position values and picture sizes for each graph in the python script. This become tedious and have to set independently for each report type.

The new idea will be to give the script a scanned template and the script will do the following commands:

Create a template ppt with the graphs at particular slide, position and size set.
Rename each object that you need to copy with the keywords such as ‘xyplot_Qty_year’ which after parsing will require a xyplot with qty as y axis and year as x axis. This will then get the corresponding graph with the same type and qty path and link them together.
See the link on how to rename objects.
The script will scan through all the slide, getting all info of picture that need to be pasted by having the keyword. It will note the x and y positon and the size.
The script will then search the required folder for the saved pic file of the same type and will paste them to a new ppt.

The advantage of this approach is that multiple scanned template can be created. The picture position can be adjusted easily as well.

Sample of the script is as below. It is not a fully executable script.

import os
import re
import sys

import pyPPT

class ppt_scanner(object):
    def __init__(self):

        # ppt setting
        self.ppt_scanned_filename = r'\\SGP-L071166D033\Chengai main folder\Chengai setup files\scanned_template.ppt'

        # scanned plot results
        self.full_scanned_info = dict()
        self.scanned_y_list = list()

        # plots file save location where keyword is the param scanned
        self.bivar_plots_dict = dict()# to be filled in 

        #ppt plot results
        ##store the slide no and the corresponding list of pic
        self.ppt_slide_bivar_pic_name_dict = dict()

    def initialize_ppt(self):
        '''
            Initialize the ppt object.
            Open the template ppt and save it to target filename as ppt and work it from there
            None --> None (create the ppt obj)

        '''
        self.pptobj = UsePPT()                                          # New ppt for pasting the results.
        self.pptobj.show()
        self.pptobj.save(self.ppt_save_filename)
        self.scanned_template_ppt = UsePPT(self.ppt_scanned_filename)   # Template for new ppt to follow
        self.scanned_template_ppt.show()

    def close_all_ppt(self):
        """ Close all existing ppt. 

        """
        self.pptobj.close()
        self.scanned_template_ppt.close()

## Scanned ppt obj function
    def get_plot_info_fr_scan_ppt_slide(self, slide_no):
        """ Method (pptobj) to get info from template scanned ppt.priorty to get the x, y coordinates of pasting.
            Only get the Object name starting with plot.
            Straight away stored info in various plot classification
            Args:
                Slide_no (int): ppt slide num
            Returns:
                (list): properties of all objects in slide no

        """
        all_obj_list =  self.scanned_template_ppt.get_all_shapes_properties(slide_no)
        self.classify_info_to_related_group(slide_no, [n for n in all_obj_list if n[0].startswith("plot_")] )
        return [n for n in all_obj_list if n[0].startswith("plot_")]

    def get_plot_info_fr_all_scan_ppt_slide(self):
        """ Get all info from all slides. Store info to self.full_scanned_info.

        """
        for slide_no in range(1,self.scanned_template_ppt.count_slide()+1,1):
            self.get_plot_info_fr_scan_ppt_slide(slide_no)

    def classify_info_to_related_group(self, slide_no, info_list_fr_one_slide):
        """Group to one consolidated group: main dict is slide num with list of name, pos as key.
            Append to the various plot groups. Get the keyword name and the x,y pos.
            Will also store the columns for the y-axis (self.scanned_y_list).
            Args:
                slide_no (int): slide num to place in ppt.
                info_list_fr_one_slide (list):

        """
        temp_plot_biv_info, temp_plot_tab_info, temp_plot_legend_info = [[],[],[]]
        for n in info_list_fr_one_slide:
            if n[0].startswith('plot_biv_'):
                temp_plot_biv_info.append([n[0].encode().replace('plot_biv_',''),n[1],n[2], n[3], n[4]])
                self.scanned_y_list.append(n[0].encode().replace('plot_biv_',''))

        self.ppt_slide_bivar_pic_name_dict[slide_no] = temp_plot_biv_info

## pptObj -- handling the pasting
    def paste_all_plots_to_all_ppt_slide(self):
        """ Paste the respective plots to ppt.
        """
        ## use the number of page as scanned template
        for slide_no in range(1,self.pptobj.count_slide()+1,1):
            self.paste_plots_to_slide(slide_no)

    def paste_plots_to_slide(self, slide_no):
        """ Paste all required plots to particular slide
            Args:
                slide_no (int): slide num to place in ppt.

        """
        ## for all biv plots
        for n in self.ppt_slide_bivar_pic_name_dict[slide_no]:
            if self.bivar_plots_dict.has_key(n[0]):
                filename = self.bivar_plots_dict[n[0]]
                pic_obj = self.pptobj.insert_pic_fr_file_to_slide(slide_no, filename, n[1], n[2], (n[4],n[3])) 

if (__name__ == "__main__"):

    prep = ppt_scanner()

    prep.initialize_ppt()

    ## scanned all info -- scanned template function
    prep.get_plot_info_fr_all_scan_ppt_slide()
    prep.scanned_template_ppt.close()

    ## paste plots
    prep.paste_all_plots_to_all_ppt_slide()
    prep.pptobj.save()

    print 'Completed'

Parsing Dict object from text file (Updates)

I have been using the DictParser created as mentioned in previous blog in a recent project to create a setting file for various users. In the project, different users need to have different settings such as parameter filepath.

The setting file created will use the computer name to segregate the different users. By creating a text file (with Dict Parser) based on the different computer names, it is easy to get separate setting parameters for different users. Sample of the setting file are as below.

## Text file
$USER1_COM_NAME
#setting_comment_out:r'c:\data\temp\bbb.txt'
setting2:r'c:\data\temp\ccc.txt'

$USER2_COM_NAME
setting:r'c:\data\temp\eee.txt'
2:1,bbb,cccc,1,2,3
## end of file

The output from DictParser are as followed:

## python output as one dict containing two dicts with different user'USER1_COM_NAME' and 'USER2_COM_NAME'
>> {'USER1_COM_NAME': {'setting2': ['c:\\data\\temp\\ccc.txt']}, 'USER2_COM_NAME': {2: [1, 'bbb', 'cccc', 1, 2, 3], 'setting': ['c:\\data\\temp\\eee.txt']}}

User can use the command “os.environ[‘ComputerName’]” to get the corresponding setting filepath.

I realized that the output format is somewhat similar to json format. This parser is more restrictive in uses hence has some advantage over json in less punctuations (‘{‘, ‘\’) etc and able to comment out certain lines.

Extracting portions of text from text file

I was trying to read the full book of abstracts from a conference earlier and finding it tedious to copy portions of desired paragraphs for my summary report to be fed into my simple auto-summarized module.

I came up with the following script that allows users to put a specific symbol such as “@” at the start and end of the paragraph to mark those paragraphs or sentences to be extracted. More than one portion can be selected and they can be returned as a list for further processing. For my case, each of the paragraph outputted will be auto summarized.

The following diagram illustrated the two different kinds of extraction.

The script scans all the lines of the text file, looking for the key_symbol (“@” in this case) and marks the index of the selected lines. The present method only use string “startwith” function. It can be expanded to be using regular expression.

Depending on the mode (overlapping or non-overlapping), it will calculate the portion of the text to be selected and output as a list which can be use for further processing.

Script can be found here.

Hotel Reviews Scraping

A python module for scraping hotel rating and reviews from Trip Advisor and Orbitz. The documentation seems to suggest only scraping for US hotels. It is worth studying to see how it can apply to other countries…..