Day: November 24, 2018

Heap Map for discrepancy check

Monitoring counts discrepancy

In one aspect of my work, we have a group of samples undergoing several rounds of modifications with same set of tests being performed at each round. For each test, parameters for each sample are collected. For some samples, a particular test may fail in certain rounds resulting in no/missing parameters being collected for that test.

When we compare the performance of the samples especially grouping as a mean, missing parameters from certain samples at certain rounds may skew the results. To ensure accuracy, we need to ensure matching samples data. As there are multiple tests and few hundreds parameters being tracked, we need a way to keep track of the parameters that have mismatch parameters between rounds.

A simple way will be to use the heat map to highlight parameters that have discrepancy in number of counts (this will mean that some samples are missing in data) between rounds. The script is generated using mainly Pandas and Seaborn.

Steps

  1. Group the counts for each parameter for each round.
  2. Use one round as reference (default 1st round), take the differences in counts for each parameter for each round.
  3. Display as heat map for only rounds that have discrepancy.
import os, sys, datetime, re
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# retrieve zone data
rawfile = 'raw_data.csv'
raw_df = pd.read_csv(rawfile)

# count of data in group
cnt_df = raw_df.groupby(['round']).count()

# Substract the first to the rest
diff_df = cnt_df.subtract(cnt_df.iloc[0], axis = 1)

# drop columns where it is all zeros, meaning exclude data that are matched.
diff_df.loc[:, diff_df.any()]

fig, ax = plt.subplots(figsize=(10,10))  

sns.heatmap(diff_df.loc[:, diff_df.any()].T,  xticklabels=True, yticklabels=True, ax =ax , annot=True, fmt="d", center= 0 ,  cmap="coolwarm")
plt.tight_layout()

Untitled

Extra

Quick view of missing data using seaborn heatmap


sns.heatmap(df.isnull(), yticklabels=False, cbar = False, cmap = 'viridis')

missingdata

Hosting static website with GitHub Pages

Create static website with custom domain names. Perks is having your own web hosting at minimal cost. The only cost is the cost of the custom domain name.

Requirements:

  1. Github account: For hosting the static website.
  2. Custom domain name: Purchase domain names from GoDaddy or Namecheap etc. Alternatively, can use GitHub default url <username>.github.io

Steps:

  1. Github
    1. Create new repository with following format <username>.github.io where username refers to GitHub userid.
    2. In the repository, go to setting: Under Theme, choose a Jekyll theme. When finish, click on Source, select master branch. A file needs to exist in repository before Source option can be selected.
    3. If you have purchase your custom domain, you need to configure the A records and CNAME for the domain at the registrar to point to the GitHub site. Proceed to make the necessary changes at the domain registrar website.
  2. Registrar (Below is using GoDaddy as example)
    1. Under My Products, select the domain name that will be used. Click on Manage button.
    2. Once in setting page, scroll down to Additional Settings and click Manage DNS
    3. Within the DNS management page, Add in 4 “A” row with each pointing to IP as follows:
      1. 185.199.108.153
      2. 185.199.109.153
      3. 185.199.110.153
      4. 185.199.111.153
    4. Add in the CNAME pointing to your repository at Github: <username>.github.io
    5. View link for more info on configuring domain name with goDaddy
    6. Similarly, see following link for Namecheap
    7. Note: if you setup using A records and CNAME, leave the nameservers as default.
    8. Once the settings are configured, return to GitHub pages to add the custom domain name
  3. Github
    1. At the setting page, add the custom domain name in the Custom Domain section.
    2. Tick Enforce Https (may take up to 24 hours to take effect)
    3. Completed.
  4. Proceed to add in contents in GitHub using markdown.

Resources

Notes

  • GoDaddy default A records: 50.63.202.32