This post demonstrates the basic use of the TensorFlow low-level Core API and TensorBoard to build a machine learning model for study purposes. TensorFlow also offers higher-level APIs (such as TensorFlow Estimators) that simplify parts of the process and are easier to use, at the cost of some control. If fine-grained control is not required, a higher-level API may be the better option.
The Python script below uses the iris data set and three Python modules to build and run the model: NumPy, scikit-learn and TensorFlow. NumPy is used mainly for array manipulation, while scikit-learn provides min-max scaling, train-test splitting and one-hot encoding of the categorical output. The iris data set itself is loaded through scikit-learn. The script targets the TensorFlow 1.x API (tf.placeholder, tf.Session, tf.contrib); these are not available in the default TensorFlow 2.x namespace.
A. Data Preparation
The iris data set has 4 numeric input features, 150 rows and 3 output classes. The data processing steps are:
- Split the data into training and test sets.
- Apply min-max scaling (‘normalization’) to the features so that features with different units or scales share the same range.
- Encode the categorical output (3 classes: setosa, versicolor and virginica) using one-hot encoding.
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# reset graph
tf.reset_default_graph()
## Loading the data set
raw_data = load_iris()
## split data set
X_train, X_test, Y_train, Y_test = train_test_split(raw_data.data, raw_data.target, test_size=0.33, random_state=42, stratify= raw_data.target)
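## Min-max scaler for the features
X_scaler = MinMaxScaler(feature_range=(0,1))
## Fit the scaler on the training set only, then apply the same scaling to the test set
X_train_scaled = X_scaler.fit_transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
## One-hot encode Y: fit on the training labels, reuse the fitted encoder for the test labels
## (scikit-learn 1.2 and later rename the sparse argument to sparse_output)
onehot_encoder = OneHotEncoder(sparse=False)
Y_train_enc = onehot_encoder.fit_transform(Y_train.reshape(-1,1))
Y_test_enc = onehot_encoder.transform(Y_test.reshape(-1,1))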
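To see what the two scikit-learn transformers compute, and why the test set is transformed with statistics learned from the training set, here is a minimal NumPy sketch. It is an illustration rather than scikit-learn's implementation: the helpers `fit_min_max`, `apply_min_max` and `one_hot` are hypothetical names, and the small arrays are toy data, not the iris set.

```python
import numpy as np

def fit_min_max(X):
    """Learn the per-feature minimum and range from the training data."""
    x_min = X.min(axis=0)
    x_range = X.max(axis=0) - x_min
    # Avoid division by zero for a constant feature
    x_range[x_range == 0] = 1.0
    return x_min, x_range

def apply_min_max(X, x_min, x_range):
    """Rescale features using statistics learned on the training set."""
    return (X - x_min) / x_range

def one_hot(labels, num_classes):
    """Turn integer class labels (0..num_classes-1) into one-hot rows."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Toy data: two features on very different scales (hypothetical, not iris)
X_train = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
X_test = np.array([[2.5, 250.0], [4.0, 50.0]])

x_min, x_range = fit_min_max(X_train)
print(apply_min_max(X_train, x_min, x_range))  # each column spans [0, 1]
print(apply_min_max(X_test, x_min, x_range))   # test values may fall outside [0, 1]
print(one_hot(np.array([0, 2, 1]), 3))
```

Test rows can land outside [0, 1] because they are scaled with the training minimum and range; that is expected and keeps both sets on the same scale.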
B. Model Definition (Building the Computation Graph)
Next, we build the computation graph. As defined by TensorFlow: “a computational graph is a series of TensorFlow Operations arranged into a graph of nodes. Each node takes zero or more tensors as inputs and produces a tensor as output”. We therefore need to define the key nodes and operations: the inputs, outputs, hidden layers and so on.
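The idea of building a graph first and running it later can be illustrated with a toy pure-Python sketch. The classes `Placeholder`, `Constant`, `Op` and the function `run` are hypothetical; they only mimic the roles of tf.placeholder, operations and Session.run with a feed_dict, and TensorFlow's real graph machinery is far more involved.

```python
class Node:
    """Base class: a node produces a value only when the graph is run."""
    def evaluate(self, feed_dict):
        raise NotImplementedError

class Placeholder(Node):
    """Value supplied at run time, similar in spirit to tf.placeholder."""
    def __init__(self, name):
        self.name = name
    def evaluate(self, feed_dict):
        return feed_dict[self]

class Constant(Node):
    """Fixed value stored in the graph."""
    def __init__(self, value):
        self.value = value
    def evaluate(self, feed_dict):
        return self.value

class Op(Node):
    """Applies a function to the values of its input nodes."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs
    def evaluate(self, feed_dict):
        return self.fn(*(node.evaluate(feed_dict) for node in self.inputs))

def run(node, feed_dict):
    """Evaluate a node, loosely analogous to session.run(node, feed_dict=...)."""
    return node.evaluate(feed_dict)

# Building the graph computes nothing yet: y = max(0, x * w + b)
x = Placeholder("x")
w = Constant(2.0)
b = Constant(-1.0)
linear = Op(lambda a, c, d: a * c + d, x, w, b)
y = Op(lambda v: max(0.0, v), linear)

print(run(y, {x: 3.0}))  # 5.0
print(run(y, {x: 0.0}))  # 0.0, because the ReLU clips -1.0 to 0.0
```

The same graph is reused with different fed values, which is how the TensorFlow script later evaluates one set of nodes on both the training and the test data.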
The following are the key nodes or layers required:
- Input: a tf.placeholder through which data is fed. Its shape depends on the number of features.
- Hidden layers: this example uses 2 hidden layers. The output of each hidden layer has the form f(XW + B), where X is the output of the previous layer (or the input itself for the first hidden layer), W is the weight matrix, B is the bias vector and f() is an activation function.
- The Rectified Linear Unit (ReLU) activation function is used here to introduce non-linearity. ReLU: A(x) = max(0, x), i.e. it outputs x when x > 0 and 0 otherwise. A sigmoid activation function could also be used for this example.
- Weights and biases are variables; they are updated at every training step (epoch).
- Weights are initialized with xavier_initializer and biases are initialized to zero.
- Output (prediction, or y-hat): the output of the neural network, computed from the last hidden layer. It is left as raw scores (logits); the softmax is applied inside the loss function.
- Y: the actual labels, used for comparison against the predicted values. This is another tf.placeholder for data feeding.
- Loss function: computes the error between the predicted and the actual classification (Yhat vs Y). TensorFlow's built-in tf.nn.softmax_cross_entropy_with_logits is used for this multi-class classification problem. From the TensorFlow documentation: “It measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class)”.
- Training operation (optimizer): defines the training algorithm used to minimize the cost (loss). This example uses gradient descent, which repeatedly updates the weights and biases to reduce the cost.
In addition, the learning rate and the total number of training steps (epochs) are defined for the model.
# Define model parameters
learning_rate = 0.01
training_epochs = 10000
# Define the number of neurons in each hidden layer
layer_1_nodes = 150
layer_2_nodes = 150
# Define the number of inputs (features) and outputs (classes)
num_inputs = X_train_scaled.shape[1]
num_output = len(np.unique(Y_train, axis = 0))
# Define the layers
with tf.variable_scope('input'):
    X = tf.placeholder(tf.float32, shape= (None, num_inputs))
with tf.variable_scope('layer_1'):
    weights = tf.get_variable('weights1', shape=[num_inputs, layer_1_nodes], initializer = tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('bias1', shape=[layer_1_nodes], initializer = tf.zeros_initializer())
    layer_1_output = tf.nn.relu(tf.matmul(X, weights) + biases)
with tf.variable_scope('layer_2'):
    weights = tf.get_variable('weights2', shape=[layer_1_nodes, layer_2_nodes], initializer = tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('bias2', shape=[layer_2_nodes], initializer = tf.zeros_initializer())
    layer_2_output = tf.nn.relu(tf.matmul(layer_1_output, weights) + biases)
with tf.variable_scope('output'):
    weights = tf.get_variable('weights3', shape=[layer_2_nodes, num_output], initializer = tf.contrib.layers.xavier_initializer())
    biases = tf.get_variable('bias3', shape=[num_output], initializer = tf.zeros_initializer())
    # Raw scores (logits); the softmax is applied inside the loss
    prediction = tf.matmul(layer_2_output, weights) + biases
with tf.variable_scope('cost'):
    # One column per class because the labels are one-hot encoded
    Y = tf.placeholder(tf.float32, shape = (None, num_output))
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = Y, logits = prediction))
with tf.variable_scope('train'):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
with tf.variable_scope('accuracy'):
    # A prediction is correct when the highest-scoring class matches the label
    correct_prediction = tf.equal(tf.argmax(Y, axis =1), tf.argmax(prediction, axis =1) )
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Log results for TensorBoard
with tf.variable_scope("logging"):
    tf.summary.scalar('current_cost', cost)
    tf.summary.scalar('current_accuracy', accuracy)
    summary = tf.summary.merge_all()
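The arithmetic behind these nodes can be reproduced in NumPy for a single forward pass. This is a sketch under stated assumptions: the Xavier (Glorot) uniform range ±sqrt(6 / (fan_in + fan_out)) is assumed to match what tf.contrib.layers.xavier_initializer uses by default, the helper names `xavier_uniform`, `relu` and `softmax_cross_entropy` are hypothetical, and the batch of 5 rows is random stand-in data rather than iris.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(fan_in, fan_out):
    """Assumed Glorot/Xavier uniform init: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def relu(z):
    return np.maximum(0.0, z)

def softmax_cross_entropy(labels, logits):
    """Mean softmax cross-entropy over the batch, computed stably from logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1).mean()

num_inputs, hidden, num_output, batch = 4, 150, 3, 5

# Weights use Xavier initialization, biases start at zero
W1, b1 = xavier_uniform(num_inputs, hidden), np.zeros(hidden)
W2, b2 = xavier_uniform(hidden, hidden), np.zeros(hidden)
W3, b3 = xavier_uniform(hidden, num_output), np.zeros(num_output)

# Random stand-in data: scaled features and one-hot labels
X = rng.uniform(0.0, 1.0, size=(batch, num_inputs))
Y = np.eye(num_output)[rng.integers(0, num_output, size=batch)]

layer_1 = relu(X @ W1 + b1)        # f(XW + B) with f = ReLU
layer_2 = relu(layer_1 @ W2 + b2)
logits = layer_2 @ W3 + b3         # no activation: softmax is applied inside the loss
cost = softmax_cross_entropy(Y, logits)
print(logits.shape, cost)
```

Keeping the output layer linear and folding the softmax into the loss, as tf.nn.softmax_cross_entropy_with_logits does, lets the log-probabilities be computed stably without an explicit softmax layer.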
C. Running the Computation Graph in a Session
The actual computation takes place when the computation graph is run, which is handled by tf.Session. The first step is to initialize the global variables and create the log writers that record the values defined in the “logging” scope for TensorBoard.
Next, we iterate through the training steps. For simplicity, the full training set is used at every step: each session.run call on the optimizer performs one gradient descent update of the weights and biases.
Every 5 steps, intermediate results are printed to standard output and written to the TensorBoard log files (one directory for training, one for testing). The optimization uses only the training data, but cost and accuracy are evaluated on both the training and the test data.
# Initialize a session so that we can run TensorFlow operations
with tf.Session() as session:
    # Run the global variable initializer to initialize all variables and layers of the neural network
    session.run(tf.global_variables_initializer())
    # Create log file writers to record training progress for TensorBoard
    training_writer = tf.summary.FileWriter(r'C:\data\temp\tf_try\training', session.graph)
    testing_writer = tf.summary.FileWriter(r'C:\data\temp\tf_try\testing', session.graph)
    # Run the optimizer over and over to train the network.
    # One epoch is one full run through the training data set.
    for epoch in range(training_epochs):
        # Feed in the training data and do one step of neural network training
        session.run(optimizer, feed_dict={X: X_train_scaled, Y: Y_train_enc})
        # Every 5 training steps, log our progress
        if epoch % 5 == 0:
            training_cost, training_summary = session.run([cost, summary], feed_dict={X: X_train_scaled, Y: Y_train_enc})
            testing_cost, testing_summary = session.run([cost, summary], feed_dict={X: X_test_scaled, Y: Y_test_enc})
            # Accuracy on the training and test sets
            train_accuracy = session.run(accuracy, feed_dict={X: X_train_scaled, Y: Y_train_enc})
            test_accuracy = session.run(accuracy, feed_dict={X: X_test_scaled, Y: Y_test_enc})
            print(epoch, training_cost, testing_cost, train_accuracy, test_accuracy)
            training_writer.add_summary(training_summary, epoch)
            testing_writer.add_summary(testing_summary, epoch)
    # Training is now complete!
    print("Training is complete!\n")
    final_train_accuracy = session.run(accuracy, feed_dict={X: X_train_scaled, Y: Y_train_enc})
    final_test_accuracy = session.run(accuracy, feed_dict={X: X_test_scaled, Y: Y_test_enc})
    print("Final Training Accuracy: {}".format(final_train_accuracy))
    print("Final Testing Accuracy: {}".format(final_test_accuracy))
    training_writer.close()
    testing_writer.close()
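The training procedure (full-batch gradient descent on the training data, with cost and accuracy evaluated on both sets every 5 epochs) can be mimicked in NumPy. To stay short, this sketch swaps the two-hidden-layer network for a single-layer softmax regression and uses synthetic, well-separated clusters instead of iris. It relies on the standard result that the gradient of the mean softmax cross-entropy with respect to the logits is (softmax(logits) - Y) / N. The helpers `softmax`, `cost_and_accuracy` and `make_data`, the data and the hyperparameters are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cost_and_accuracy(X, Y, W, b):
    probs = softmax(X @ W + b)
    cost = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
    # Same idea as the 'accuracy' scope: compare the argmax of labels and predictions
    accuracy = np.mean(np.argmax(Y, axis=1) == np.argmax(probs, axis=1))
    return cost, accuracy

def make_data(n_per_class):
    """Three synthetic 2-D clusters (a stand-in for the iris features)."""
    centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
    X = np.vstack([c + 0.3 * rng.standard_normal((n_per_class, 2)) for c in centers])
    Y = np.repeat(np.eye(3), n_per_class, axis=0)  # one-hot labels in cluster order
    return X, Y

X_train, Y_train = make_data(30)
X_test, Y_test = make_data(10)

learning_rate, training_epochs = 0.1, 500
W = np.zeros((2, 3))
b = np.zeros(3)
history = []

for epoch in range(training_epochs):
    # One full-batch gradient descent step using the training data only
    grad_logits = (softmax(X_train @ W + b) - Y_train) / len(X_train)
    W -= learning_rate * X_train.T @ grad_logits
    b -= learning_rate * grad_logits.sum(axis=0)
    # Every 5 epochs, evaluate on both the training and the test set
    if epoch % 5 == 0:
        train_cost, train_acc = cost_and_accuracy(X_train, Y_train, W, b)
        test_cost, test_acc = cost_and_accuracy(X_test, Y_test, W, b)
        history.append((epoch, train_cost, test_cost, train_acc, test_acc))

print(history[0])
print(history[-1])
```

Each tuple in `history` plays the role of one printed line in the TensorFlow loop: epoch, training cost, test cost, training accuracy and test accuracy.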
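D. Viewing in TensorBoard
Because the cost and accuracy are logged with tf.summary.scalar, TensorBoard can plot them over the training epochs for both the training and the test set. Pointing TensorBoard at the parent log directory (for example, tensorboard --logdir C:\data\temp\tf_try) displays the training and testing runs together.
The final results are shown below: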
Final Training Accuracy: 1.0
Final Testing Accuracy: 0.9599999785423279