
Radix Sort in Python

Background

  1. Non-comparison integer sorting that groups numbers by their individual digits or radix (base).
  2. Performed iteratively from the least significant digit (LSD) to the most significant digit (MSD), or recursively from MSD to LSD (a minimal LSD sketch follows this list).
  3. At each iteration, the sorting of the target digit is usually based on Counting sort as a subroutine.
  4. Complexity: O(d*(n+b)) where b is the base used to represent the numbers, e.g. 10, and d is the number of digits. Close to linear time if d is a constant amount.
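
Below is a minimal bucket-based LSD sketch (my own illustration, base 10, non-negative integers only) to show the digit-by-digit idea; the counting sort based version used in the rest of this post follows.

def radix_sort_sketch(nums, num_digits):
    """ LSD radix sort using per-digit buckets (illustration only). """
    for d in range(num_digits):
        buckets = [[] for _ in range(10)]              # one bucket per digit 0-9
        for n in nums:
            buckets[(n // 10**d) % 10].append(n)       # group by the d-th digit
        nums = [n for bucket in buckets for n in bucket]  # stable re-collection
    return nums

print(radix_sort_sketch([88, 311, 52, 150], 3))  # [52, 88, 150, 311]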

Counting Sort as subroutine

  • Recap on counting sort. See Counting Sort in Python for more info.
  • It takes a “get_sortkey” function that generates the keys based on the objects' characteristics.
  • The get_sortkey function is then modified to perform radix sort.
import random

def get_sortkey(n):
    """ Define the method to retrieve the key """
    return n

def counting_sort(tlist, k, get_sortkey):
    """ Counting sort algorithm. Returns a new sorted list.
        Args:
            tlist: target list to sort
            k: max key value, assumed known beforehand
            get_sortkey: function applied to each element of tlist to retrieve the key,
            which maps the element to an index of the count list.
        Adv:
            The count list (after the cumulative sum) holds the actual position of each
            element in sorted order. Iterating the input in reverse when writing the
            output keeps the sort stable, which is what radix sort relies on.
    """

    # Create a count list and using the index to map to the integer in tlist.
    count_list = [0]*(k)

    # iterate the tgt_list to put into count list
    for n in tlist:
        count_list[get_sortkey(n)] += 1

    # Modify count list such that each index of count list is the combined sum of the previous counts
    # each index indicate the actual position (or sequence) in the output sequence.
    for i in range(1, k):
        count_list[i] += count_list[i-1]

    output = [None]*len(tlist)
    for i in range(len(tlist)-1, -1, -1):
        sortkey = get_sortkey(tlist[i])
        output[count_list[sortkey]-1] = tlist[i]
        count_list[sortkey] -=1

    return output

Radix sort with up to 3-digit numbers

  • Replace get_sortkey with get_sortkey2, which extracts the digit at a given place, and apply counting sort once per digit place.
# radix sort
from functools import partial

def get_sortkey2(n, digit_place=2):
    """ Define the method to retrieve the key.
        Returns the key based on the digit place. The base is currently set to 10.
    """
    return (n//10**digit_place)%10

## Create random list for demo counting sort.
random.seed(1)
tgt_list = [random.randint(20,400) for n in range(10)]
print("Unsorted List")
print(tgt_list)

## Perform radix sort: apply counting sort once per digit place.
print("\nSorted list using radix sort")

output = tgt_list
for n in range(3):
    output = counting_sort(output, 10, partial(get_sortkey2, digit_place=n)) # keys are single digits 0-9, so k=10 suffices
    print(output)

## output
# Unsorted List
# [88, 311, 52, 150, 80, 273, 250, 261, 353, 214]

# Sorted list using radix sort
# [150, 80, 250, 311, 261, 52, 273, 353, 214, 88]
# [311, 214, 150, 250, 52, 353, 261, 273, 80, 88]
# [52, 80, 88, 150, 214, 250, 261, 273, 311, 353]

Resources:

  1. Getting To The Root Of Sorting With Radix Sort

Convert PDF pages to text with python

A simple guide to extracting text from PDF. This is an extension of the Convert PDF pages to JPEG with python post.

  1. Objectives:
      1. Extract text from PDF.
  2. Required Tools:
      1. Poppler for Windows — Poppler is a PDF rendering library. It includes the pdftotext utility.
      2. Poppler for Mac — If Homebrew is already installed, use brew install poppler.
      3. pdftotext — Python module. Wraps the Poppler pdftotext utility to convert PDF to text.
  3. Steps:
      1. Install Poppler. For Windows, add “xxx/bin/” to the env path.
      2. pip install pdftotext

Usage (sample code from pdftotext github)

import pdftotext

# Load your PDF
with open("Target.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# Save all text to a txt file.
with open('output.txt', 'w') as f:
    f.write("\n\n".join(pdf))
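
Since the pdf object is iterable page by page, a small variation (my own sketch) writes each page to its own text file instead:

import pdftotext

with open("Target.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# Write each page to a separate text file.
for page_num, page_text in enumerate(pdf, start=1):
    with open('output_page_{}.txt'.format(page_num), 'w') as out:
        out.write(page_text)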


Convert PDF pages to JPEG with python

A simple guide to extracting images (jpeg, png) from PDF.

  1. Objectives:
      1. Extract images from PDF.
  2. Required Tools:
      1. Poppler for Windows — Poppler is a PDF rendering library. It includes the pdftoppm utility.
      2. Poppler for Mac — If Homebrew is already installed, use brew install poppler.
      3. pdf2image — Python module. Wraps the pdftoppm utility to convert PDF to a PIL Image object.
  3. Steps:
      1. Install Poppler. For Windows, add “xxx/bin/” to the env path.
      2. pip install pdf2image

Usage

import os
import tempfile
from pdf2image import convert_from_path

filename = 'target.pdf'

with tempfile.TemporaryDirectory() as path:
    images_from_path = convert_from_path(filename, output_folder=path, last_page=1, first_page=0)

base_filename = os.path.splitext(os.path.basename(filename))[0]

save_dir = 'your_saved_dir'

# Save each page with its page index so multi-page PDFs do not overwrite the same file.
for page_num, page in enumerate(images_from_path, start=1):
    page.save(os.path.join(save_dir, '{}_{}.jpg'.format(base_filename, page_num)), 'JPEG')


Counting Sort in Python

Background

  1. Sorts a collection of objects according to integer keys. It counts the number of objects belonging to each key value and outputs the sequence based on the integer key order plus the number of counts for each key.
  2. Linear running time: O(n+k) where n is the number of objects and k is the number of keys.
  3. The keys should not be significantly larger than the number of objects.

Basic Counting Sort

  • With the object as the integer key itself.
  • Limited use: the index key cannot be modified for extended cases.
import random

def basic_counting_sort(tlist, k):
    """ Counting sort algorithm. Modifies the existing list. Only for non-negative integers.
        Args:
            tlist: target list to sort
            k: max value, assumed known beforehand
        Disadv:
            It only works for non-negative integers and cannot handle more complex sorting
            (sort by str, negative integers etc).
            It retrieves all data straight from count_list, using the count_list index as the ordering.
            It lacks the additional step of modifying count_list to capture the actual index in the output.
    """

    # Create a count list and using the index to map to the integer in tlist.
    count_list = [0]*(k)

    # loop through tlist and increment if exists
    for n in tlist:
        count_list[n] = count_list[n] + 1

    # Sort in place, copy back into original list
    i=0
    for n in range(len(count_list)):
        while count_list[n] > 0:
            tlist[i] = n
            i+=1
            count_list[n] -= 1

## Create random list for demo counting sort.
random.seed(0)
tgt_list = [random.randint(0,20) for n in range(10)]
print("Unsorted List")
print(tgt_list)

## Perform the counting sort.
print("\nSorted list using basic counting sort")
basic_counting_sort(tgt_list, max(tgt_list)+1)
print(tgt_list)

Counting sort — improved version

  • Takes a “get_sortkey” function that generates the keys based on the objects' characteristics.
  • Currently the function just returns the object itself, working the same way as above, but it can be modified to handle other forms of objects, e.g. negative integers, strings etc (a string example appears at the end of this post).
import random

def get_sortkey(n):
    """ Define the method to retrieve the key """
    return n

def counting_sort(tlist, k, get_sortkey):
    """ Counting sort algorithm. Returns a new sorted list.
        Args:
            tlist: target list to sort
            k: max key value, assumed known beforehand
            get_sortkey: function applied to each element of tlist to retrieve the key,
            which maps the element to an index of the count list.
        Adv:
            The count list (after the cumulative sum) holds the actual position of each
            element in sorted order. Iterating the input in reverse when writing the
            output keeps the sort stable, which is what radix sort relies on.
    """

    # Create a count list and using the index to map to the integer in tlist.
    count_list = [0]*(k)

    # iterate the tgt_list to put into count list
    for n in tlist:
        count_list[get_sortkey(n)] += 1

    # Modify count list such that each index of count list is the combined sum of the previous counts
    # each index indicate the actual position (or sequence) in the output sequence.
    for i in range(1, k):
        count_list[i] += count_list[i-1]

    output = [None]*len(tlist)
    for i in range(len(tlist)-1, -1, -1):
        sortkey = get_sortkey(tlist[i])
        output[count_list[sortkey]-1] = tlist[i]
        count_list[sortkey] -=1

    return output

## Create random list for demo counting sort.
random.seed(0)
tgt_list = [random.randint(0,20) for n in range(10)]
print("Unsorted List")
print(tgt_list)

## Perform the counting sort.
print("\nSorted list using basic counting sort")
output = counting_sort(tgt_list, max(tgt_list) +1, get_sortkey) # assumption is known the max value in tgtlist  for this case.
print(output)

Simple illustration: Counting sort use for negative numbers

def get_sortkey2(n):
    """ Define the method to retrieve the key
        Shift the key such that all keys remain non-negative integers
        even though the input may be negative
    """
    return n +5

## Create random list for demo counting sort.
random.seed(1)
tgt_list = [random.randint(-5,20) for n in range(10)]
print("Unsorted List")
print(tgt_list)

## Perform the counting sort.
print("\nSorted list using counting sort")
output = counting_sort(tgt_list, 30, get_sortkey2)
print(output)<span id="mce_SELREST_start" style="overflow:hidden;line-height:0;"></span>

Resources:

  1. https://www.geeksforgeeks.org/counting-sort/

Setup MongoDB on macOS

A simple guide to setting up MongoDB on macOS.

  1. Objectives:
      1. Install MongoDB on MacBook.
  2. Required Tools:
      1. Homebrew —  package manager for Mac
      2. MongoDB — MongoDB community version
      3. pymongo — python API for MongoDB.
  3. Steps (terminal commands in blue):
      1. brew update
      2. brew install mongodb
      3. Create the MongoDB data directory (/data/db) with updated permissions
        1. $ sudo mkdir -p /data/db
        2. $ sudo chown <user> /data/db
      4. Create/open .bash_profile
        1. $ cd /Users/<username>
        2. $ touch .bash_profile # skip if .bash_profile is already present
        3. $ open .bash_profile
      5. Insert commands in .bash_profile for the MongoDB commands to work in the terminal
        1. export MONGO_PATH=/usr/local/mongodb
        2. export PATH=$PATH:$MONGO_PATH/bin
      6. Test: Run MongoDB
        1. terminal 1: mongod
        2. terminal 2: mongo
      7. Install pymongo (a quick verification sketch follows this list)
        1. pip install pymongo
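
As a quick check that everything is wired up, a minimal pymongo round trip can be run (my own sketch; it assumes mongod is running on the default localhost:27017):

from pymongo import MongoClient

# Connect to the local mongod instance (default host/port).
client = MongoClient('localhost', 27017)

# Insert a test document into a throwaway database and read it back.
db = client['test_db']
doc_id = db['test_collection'].insert_one({'hello': 'world'}).inserted_id
print(db['test_collection'].find_one({'_id': doc_id}))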


Fast Install Python Virtual Env in Windows

A simple guide to installing a virtual environment with different python versions on Windows.

  1. Objectives:
      1. Install Virtual Environment on Windows
  2. Required Tools:
      1. Python —  Python 3 chosen in this case.
      2. VirtualEnv — Main virtualenv tool.
      3. VirtualEnvWrapper-Win — VirtualEnv Wrapper for Windows.
  3. Steps:
      1. Install python with the python Windows installer.
      2. Add the python path to the Windows PATH. The Python 3 installer provides this option. If not found, add the following two paths (sample default paths for Python 3):
        1. C:\Users\<username>\AppData\Local\Programs\Python\Python36
        2. C:\Users\<username>\AppData\Local\Programs\Python\Python36\Scripts
      3. pip install virtualenv
      4. pip install virtualenvwrapper-win
      5. Main commands used with virtualenvwrapper in the windows command prompt
        1. mkvirtualenv <envname>: create a new virtual env
        2. workon: list all the environments created
        3. workon <envname>: activate a particular environment (a quick check of the active interpreter appears at the end of this post)
        4. deactivate: deactivate the active environment
        5. rmvirtualenv <envname>: remove the target environment

Further notes 

  • Most of the guide is referenced from Timmy Reilly’s Blog.
  • To create a virtualenv with a specified python version:
    • virtualenv -p <path/win dir of python version>
    • mkvirtualenv -p <path/win dir of python version>
  • Retrieve a list of python modules installed via pip and save to requirements.txt:
    • pip freeze > requirements.txt
  • To install a list of required modules (from another virtual env etc):
    • pip install -r requirements.txt
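
To confirm which interpreter an activated environment is actually using, a quick check (my own addition) can be run inside python:

import sys

# Inside an activated virtualenv this prints a path under the environment's
# Scripts directory rather than the global python installation.
print(sys.executable)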

Shorte.st Url Shortener API with Python: Create multiple shortened urls at one go (& monetize your links)

A mini project that shortens urls with Shorte.st using python. Shorte.st only provides the “curl” command version of the API. In this post, the command is translated into python requests for easy integration with the rest of your python scripts and to enable shortening multiple urls at one go.

Please note that I have an account with Shorte.st.

  1. Objectives:
      1. Create a python function to shorten urls using Shorte.st.
  2. Required Tools:
      1. Requests — for handling the HTTP protocol. Use pip install requests.
      2. Shorte.st account — Shorte.st account to shorten urls.
  3. Steps:
      1. Retrieve the API token from Shorte.st by going to Link Tools –> Developer API and copying the API token.
      2. Use requests.put with the following parameters:
        1. headers containing the API token and user-agent
        2. data containing the target url to shorten.
      3. Get the response.text, which contains the shortened url.
      4. Complete! Include the shortened url in target sites/twitter/social media etc.

Curl commands as provided by Shorte.st

curl -H "public-api-token: your_api_token" -X PUT -d "urlToShorten=target_url_to_shortened.com" https://api.shorte.st/v1/data/url

Python function to insert into your code or to use standalone

import os, sys, re
import requests

USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"

def shorten_url(target_url, api_token):
    """
        Function to shorten a url (with your Shorte.st account).
        Args:
            target_url (str): url to shorten
            api_token (str): api token str
        Returns:
            shortened_url (str)

    """

    headers = {'user-agent': USER_AGENT, 'public-api-token': api_token}  # note: the header name is 'user-agent', with a hyphen
    data = dict(urlToShorten=target_url)

    url = 'https://api.shorte.st/v1/data/url'

    r = requests.put(url, data=data, headers=headers)

    shortened_url = re.search('"shortenedUrl":"(.*)"',r.text).group(1)
    shortened_url = shortened_url.replace('\\','')

    return shortened_url

if __name__ == "__main__":

    api_token = 'your_api_token'

    urllist = [
                'https://simply-python.com/2018/07/20/fast-download-images-from-google-image-search-with-python-requests-grequests',
                'https://simply-python.com/2018/04/22/building-a-twitter-bot-with-python'

                ]

    for target_url in urllist:
        shortened_url = shorten_url(target_url, api_token)
        print('shortened_url: {}'.format(shortened_url))

Results

shortened_url: http://destyy.com/wKqD2s
shortened_url: http://destyy.com/wKqD17

 

Further notes 

  1. If you have some fantastic links to share and hope to monetize your links, you can explore Shorte.st further.
  2. The above script is not meant for spamming with a huge amount of urls. Shorte.st monitors the quality of the urls being shortened.
  3. An ad-free shortener is bit.ly. Please see the post on using the bit.ly shortener with python if you prefer an alternative.

Package your python code made simple & fast

A mini project that creates the required python packaging template folders, submits them to GitHub & enables pip installation.

  1. Objectives:
      1. Upload a python project to GitHub and make it pip-installable.
  2. Required Tools:
      1. Cookiecutter — for templating. Use pip install cookiecutter.
      2. GitHub account, GitHub Desktop, Git shell — version control, git command line.
      3. PyPI account — for uploading to PyPI so a user can just do “pip install your_project”.
  3. Steps:
      1. Use Cookiecutter to set up the template directory and required folders with the relevant docs and files (Readme.md, .gitignore, setup.py etc; a minimal setup.py sketch follows this list) for uploading. –> See commands section 1 below.
        • Use the commands in the cmd prompt or Git shell for Windows (Git shell is preferred if you are executing the additional git commands in step 3).
      2. Create a folder with the same name as the directory created in step 1 and place the relevant python code inside.
      3. Use Git commands to upload the files to GitHub. The below commands will only work if the repository is first created in your GitHub account. –> See commands section 2 below.
      4. Alternatively, you can use the GitHub GUI instead of the command line to submit your project to the repository.
      5. Create a .pypirc in the same directory as the setup.py file. This will be used to provide the info for uploading to PyPI. –> See section 3 below.
      6. Updates:
        1. Ensure setuptools and wheel are up to date and install twine
          • pip install -U setuptools wheel; pip install twine
        2. Package the code
          • python setup.py sdist bdist_wheel
        3. Upload the package
          • twine upload --repository pypi dist/*
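
For reference, a minimal setup.py sketch of the kind of file the cookiecutter template generates (my own illustration; all names, urls and versions are placeholders):

from setuptools import setup, find_packages

# Placeholder values; the cookiecutter template fills these in for you.
setup(
    name='projectname',
    version='0.0.1',
    description='Short description of the project',
    url='http://repository_url',
    packages=find_packages(),
)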

Windows Command prompt for step 1

pip install cookiecutter
cookiecutter https://github.com/wdm0006/cookiecutter-pipproject.git
cd projectname

Git Commands for step 3

git init
git add -A
git commit -m 'first commit'
git remote add origin http://repository_url # works only if repository is created in Git. See Git commands for repository url.
git push origin master
git tag {{version}} -m 'adds the version you entered in cookiecutter as the first tag for release, change the version 0.0.1 etc'
git push --tags origin master

.pypirc contents for step 5

[distutils] # this tells distutils what package indexes you can push to
index-servers =
pypi

[pypi]
repository: https://pypi.python.org/pypi
username: {{your_username}}
password: {{your_password}}

Further notes 

  1. Most of the commands above are from Will McGinnis’ post and the python packaging tutorial.
  2. To create an empty file in windows for the .pypirc, use cmd: echo >.pypirc
  3. Uploading to PyPI requires a verified email address, else there will be an error when uploading.
  4. When you encounter “fatal: remote origin already exists.”, see link.
  5. Basic Git commands. See link.
  6. Updates: uploading packages to PyPI using twine. (link)
  7. Making changes to the code and uploading. (link)

Update changes to github

git add -A
git commit -m 'whatever'
git push origin master
git tag {{version}} -m 'adds the version you entered in cookiecutter as the first tag for release, change the version 0.0.1 etc'
git push --tags origin master

Update changes to pypi

Simply upload your new code to GitHub, create a new release, then adapt the setup.py file (new download_url according to your new release tag, new version), then run setup.py and the twine command again.

python setup.py sdist
twine upload dist/*

Fast Download Images from Google Image search with python requests/grequests

A mini project that highlights the usage of requests and grequests.

  1. Objectives:
      1. Download multiple images from Google Image search results.
  2. Required Modules:
      1. Requests — for HTTP requests.
      2. grequests — for easy asynchronous HTTP requests.
      3. Both can be installed with pip install requests grequests.
  3. Steps:
      1. Retrieve the html source from the google image search results.
      2. Retrieve all image url links from the above html source. (function: get_image_urls_fr_gs)
      3. Feed the image url list to grequests for multiple downloads. (function: dl_imagelist_to_dir)
  4. Breakdown: Steps in the grequests implementation.
    1. Very similar to the requests implementation, except that instead of using requests.get() you use grequests.get() or grequests.post().
    2. Create a list of GET or POST actions with the different urls as the url parameters. Identify a further action after getting the response, e.g. download the image to file after the get request.
    3. Map the list of get requests to grequests to activate it, e.g. grequests.map(do_stuff, size=x) where x is the number of async http requests. You can choose x for values such as 20, 50, 100 etc.
    4. Done!

Below is the complete code.


import os, sys, re
import string
import random
import requests, grequests
from functools import partial
import smallutils as su  # author's own helper module, only used here for creating the target folder

USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
headers = { 'User-Agent': USER_AGENT }

def get_image_urls_fr_gs(query_key):
    """
        Get all image urls from the google image search.
        Args:
            query_key: search term, as input into the search box.
        Returns:
            (list): list of urls for the respective images.

    """

    query_key = query_key.replace(' ', '+')  # replace spaces in the query with +
    tgt_url = 'https://www.google.com.sg/search?q={}&tbm=isch&tbs=sbd:0'.format(query_key)  # the last parameter sorts by relevance

    r = requests.get(tgt_url, headers = headers)

    urllist = re.findall('"ou":"([a-zA-Z0-9_./:-]+.(?:jpg|jpeg|png))",', r.text)

    return urllist

def dl_imagelist_to_dir(urllist, tgt_folder, job_size=100):
    """
        Download all images from the list of url links to the tgt folder.
        Args:
            urllist: list of the image urls retrieved from the google image search
            tgt_folder: dir in which the images are stored
        Kwargs:
            job_size: (int) number of concurrent downloads to spawn.

    """
    if len(urllist) == 0:
        print("No links in urllist")
        return

    def dl_file(r, folder_dir, filename, *args, **kwargs):
        fname = os.path.join(folder_dir, filename)
        with open(fname, 'wb') as my_file:
            # Read in 10KB chunks
            for byte_chunk in r.iter_content(chunk_size=1024*10):
                if byte_chunk:
                    my_file.write(byte_chunk)
                    my_file.flush()
                    os.fsync(my_file.fileno())  # fsync expects a file descriptor, not the file object

        r.close()

    do_stuff = []
    su.create_folder(tgt_folder)

    for run_num, tgt_url in enumerate(urllist):
        print(tgt_url)
        # handle the tgt url to be use as basename
        basename = os.path.basename(tgt_url)
        file_name = re.sub('[^A-Za-z0-9.]+', '_', basename ) #prevent special characters in filename

        #handling grequest
        action_item = grequests.get(tgt_url, hooks={'response': partial(dl_file, folder_dir=tgt_folder, filename=file_name)}, headers=headers, stream=True)
        do_stuff.append(action_item)

    grequests.map(do_stuff, size=job_size)

def dl_images_fr_gs(query_key, tgt_folder):
    """
        Function to download images from google search

    """
    url_list = get_image_urls_fr_gs(query_key)
    dl_imagelist_to_dir(url_list, tgt_folder, job_size = 100)

if __name__ == "__main__":

    query_key= 'python symbol'
    tgt_folder = r'c:\data\temp\addon'
    dl_images_fr_gs(query_key, tgt_folder)		

Further notes 

  1. Note that the images downloaded from google search are only those displayed. Additional images, which are only shown when the “show more results” button is clicked, will not be downloaded. To resolve this:
    1. a user can continuously click on “show more results”, manually download the html source and run the 2nd function (dl_imagelist_to_dir) on the extracted url list.
    2. Use python selenium to download the html source.
  2. Instead of using grequests, the requests module can be used to download the images sequentially, one by one (a sketch follows this list).
  3. The downloading of files is broken into chunks, especially for very big files.
  4. The code can be further extended for downloading other stuff.
  5. Further parameters for the google search url here.
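
For reference, a hedged sequential variant of dl_imagelist_to_dir using plain requests (my own sketch; it reuses os, re, requests and the headers defined in the script above):

def dl_imagelist_sequential(urllist, tgt_folder):
    """ Download the images one by one with plain requests. """
    for tgt_url in urllist:
        basename = os.path.basename(tgt_url)
        file_name = re.sub('[^A-Za-z0-9.]+', '_', basename)  # strip special characters from the filename
        r = requests.get(tgt_url, headers=headers, stream=True)
        with open(os.path.join(tgt_folder, file_name), 'wb') as my_file:
            for byte_chunk in r.iter_content(chunk_size=1024*10):  # 10KB chunks
                if byte_chunk:
                    my_file.write(byte_chunk)
        r.close()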

Tensorflow: Low Level API with iris DataSets

This post demonstrates the basic use of the TensorFlow low level core API and tensorboard to build machine learning models for study purposes. There are higher level APIs (TensorFlow Estimators etc) from TensorFlow which simplify some of the process and are easier to use, trading off some level of control. If a fine or granular level of control is not required, a higher level API might be a better option.

The following python script uses the iris data set and the following python modules to build and run the model: Numpy, scikit-learn and TensorFlow. For this program, Numpy is used mainly for array manipulation. Scikit-learn is used for the min-max scaling, the test-train set splitting and the one-hot encoding of the categorical data/output. The iris data set itself is imported via the scikit-learn module.

A. Data Preparation

There are 4 input features (all numeric), 150 data rows and 3 categorical outputs in the iris data set. The data processing steps involved are as below:

  1. Split into training and test set.
  2. Min-Max Scaling (‘Normalization’) on the features to cater for features with different units or scales.
  3. Encode the categorical outputs (3 types: setosa, virginica and versicolor ) using one-hot encoding.

import tensorflow as tf
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# reset graph
tf.reset_default_graph() 

## Loading the data set
raw_data =  load_iris()

## split data set
X_train, X_test, Y_train, Y_test = train_test_split(raw_data.data, raw_data.target, test_size=0.33, random_state=42, stratify= raw_data.target)

## min-max scaler on the features
X_scaler = MinMaxScaler(feature_range=(0,1))

## Preprocessing the dataset
X_train_scaled = X_scaler.fit_transform(X_train)
X_test_scaled = X_scaler.transform(X_test)  # reuse the scaler fitted on the training set to avoid leakage

## One hot encode Y
onehot_encoder = OneHotEncoder(sparse=False)
Y_train_enc = onehot_encoder.fit_transform(Y_train.reshape(-1,1))
Y_test_enc = onehot_encoder.transform(Y_test.reshape(-1,1))  # reuse the encoder fitted on the training labels

B. Model definition or building the computation graph

Next we will build the computation graph. As defined by Tensorflow: “a computational graph is a series of TensorFlow Operations arranged into a graph of nodes. Each node takes zero or more tensors as inputs and produces a tensor as output”. Hence, we would need to define certain key nodes and operations such as the inputs, outputs, hidden layers etc.

The following are the key nodes or layers required:

  1. Input: This will be a tf.placeholder for data feeding. The shape depends on the number of features.
  2. Hidden layers: Here we are using 2 hidden layers. The output of each hidden layer is of the form f(XW+B) where X is the input from either the previous layer or the input layer itself, W is the weights, B is the bias and f() is an activation function.
    • The Rectified Linear Unit (ReLU) activation function is selected for this example to introduce non-linearity to the system. ReLU: A(x) = max(0, x), i.e. output x when x > 0 and 0 when x <= 0. A sigmoid activation function could also be used for this example.
    • Weights and biases are variables here. They are changed at each training step/epoch in this case.
    • Weights are initialized with the xavier_initializer and biases are initialized to zero.
  3. Output or prediction or y hat: This is the output of the neural network, the computation result from the hidden layers.
  4. Y: the actual output used for comparison against the predicted value. This will be a tensor (tf.placeholder) for data feeding.
  5. Loss function: Computes the error between the predicted and the actual classification (or Yhat vs Y). The TensorFlow built-in function tf.nn.softmax_cross_entropy_with_logits is used for this multi-class classification problem (a formula sketch follows this list). From the TensorFlow docs: “It measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class)”.
  6. Train model or optimizer: This defines the training algorithm used to minimize the cost or loss. For this example, gradient descent is used to find the minimum cost by updating the various weights and biases.
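
For reference, the per-sample loss that tf.nn.softmax_cross_entropy_with_logits computes can be written out as follows (standard textbook form, added here for clarity):

L = -sum_i( y_i * log(softmax(logits)_i) ),  where softmax(logits)_i = exp(logits_i) / sum_j( exp(logits_j) )

with y the one-hot encoded label vector and logits the raw outputs of the output layer.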

In addition, the learning rate and the total steps or epochs are defined for the above model.

# Define Model Parameters
learning_rate = 0.01
training_epochs = 10000

# define the number of neurons
layer_1_nodes = 150
layer_2_nodes = 150

# define the number of inputs
num_inputs = X_train_scaled.shape[1]
num_output = len(np.unique(Y_train, axis = 0)) 

# Define the layers
with tf.variable_scope('input'):
    X = tf.placeholder(tf.float32, shape= (None, num_inputs))

with tf.variable_scope('layer_1'):
    weights = tf.get_variable('weights1', shape=[num_inputs, layer_1_nodes], initializer = tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('bias1', shape=[layer_1_nodes], initializer = tf.zeros_initializer())
    layer_1_output =  tf.nn.relu(tf.matmul(X, weights) +  biases) 

with tf.variable_scope('layer_2'):
    weights = tf.get_variable('weights2', shape=[layer_1_nodes, layer_2_nodes], initializer = tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('bias2', shape=[layer_2_nodes], initializer = tf.zeros_initializer())
    layer_2_output =  tf.nn.relu(tf.matmul(layer_1_output, weights) + biases)

with tf.variable_scope('output'):
    weights = tf.get_variable('weights3', shape=[layer_2_nodes, num_output], initializer = tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('bias3', shape=[num_output], initializer = tf.zeros_initializer())
    prediction =  tf.matmul(layer_2_output, weights) + biases

with tf.variable_scope('cost'):
    Y = tf.placeholder(tf.float32, shape = (None, num_output))  # one-hot encoded labels, hence shape (None, num_output)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = prediction))

with tf.variable_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.variable_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(Y, axis =1), tf.argmax(prediction, axis =1) )
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# Logging results
with tf.variable_scope("logging"):
    tf.summary.scalar('current_cost', cost)
    tf.summary.scalar('current_accuracy', accuracy)
    summary = tf.summary.merge_all()

C. Running the computation Graph or Session

Actual computation takes place during the running of the computation graph (handled by tf.Session). The first step is to initialize the global variables and create the log writer objects to log the parameters defined in the “logging” scope for Tensorboard.

Next we iterate through the training steps. For simplicity, we use the full training data at each step to train and update the respective weights and biases by calling session.run on the optimizer.

Intermediate results are output every 5 steps, both to the default sys out and to the respective Tensorboard event files. The optimization uses the training data, but the accuracy assessment is based on both the test and the train data.

# Initialize a session so that we can run TensorFlow operations

with tf.Session() as session:

    # Run the global variable initializer to initialize all variables and layers of the neural network
    session.run(tf.global_variables_initializer())

    # create log file writer to record training progress.
    training_writer = tf.summary.FileWriter(r'C:\data\temp\tf_try\training', session.graph)
    testing_writer = tf.summary.FileWriter(r'C:\data\temp\tf_try\testing', session.graph)

    # Run the optimizer over and over to train the network.
    # One epoch is one full run through the training data set.
    for epoch in range(training_epochs):

        # Feed in the training data and do one step of neural network training
        session.run(optimizer, feed_dict={X:X_train_scaled, Y:Y_train_enc})

        # Every 5 training steps, log our progress
        if epoch %5 == 0:
            training_cost, training_summary = session.run([cost, summary], feed_dict={X: X_train_scaled, Y: Y_train_enc})
            testing_cost, testing_summary = session.run([cost, summary], feed_dict={X: X_test_scaled, Y: Y_test_enc})

            #accuracy
            train_accuracy = session.run(accuracy, feed_dict={X: X_train_scaled, Y: Y_train_enc})
            test_accuracy = session.run(accuracy, feed_dict={X: X_test_scaled, Y: Y_test_enc})

            print(epoch, training_cost, testing_cost, train_accuracy, test_accuracy )

            training_writer.add_summary(training_summary, epoch)
            testing_writer.add_summary(testing_summary, epoch) 

    # Training is now complete!
    print("Training is complete!\n")

    final_train_accuracy = session.run(accuracy, feed_dict={X: X_train_scaled, Y: Y_train_enc})
    final_test_accuracy = session.run(accuracy, feed_dict={X: X_test_scaled, Y: Y_test_enc})

    print("Final Training Accuracy: {}".format(final_train_accuracy))
    print("Final Testing Accuracy: {}".format(final_test_accuracy))

    training_writer.close()
    testing_writer.close()

D. Viewing in Tensorboard

The logging of the cost and the accuracy (tf.summary.scalar) allows us to view the performance of both the test and train sets.
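
To view the curves, point Tensorboard at the log directory used by the FileWriter calls above and open the printed localhost url in a browser:

tensorboard --logdir=C:\data\temp\tf_try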

Results are as shown below:

Final Training Accuracy: 1.0
Final Testing Accuracy: 0.9599999785423279
