plotting

Creating Mosaic Plot

In one of my work project, I need to use mosaic plot to visualize the proportion of different variables/elements exists in each group. It is hard to find a readily available mosaic plot function (from Seaborn etc) which can be easily customized. By reading some of the blogs, mosaic plot can be created using stacked bar chart concept by performing some transformation on the raw data and overlaying individual bar charts. With this knowledge and using python Pandas and Matplotlib, I am able to create a mosaic plot that is good enough for my need.

Sample Data Sets

A sample data set is as shown below. We need to plot the proportion of b, g, r (all the columns) for each index (0 to 4). Based on the format of the data set, we make a transformation of the columns to be able to have Mosaic Plot.

Sorry, something went wrong. Reload?

Sorry, we cannot display this file.

Sorry, this file is invalid so it cannot be displayed.

view raw SampleData.ipynb hosted with ❤ by GitHub

Breaking down the data transformation for stacked bar chart plotting

We perform two transformations as followed. Mosaic plot requires the sum of proportion of categories for each group to be 1.0 or 100%. Stacked bar chart can achieve this by summing or stacking values for each element in the group but we would need to ensure the values are normalized and the sum of all elements in a group equal to 1 (i.e r+ g+b =1 for each index).

To simulate the effect of stacked bar chart , the trick is to use multiple bar charts to overlay on top of each other to simulate the effect of stacked bar chart. To be able to create the stacked effect, the ratio/proportion of the stacked element need to be the sum of proportion value of “bottom” elements + the proportion value of the element itself. This can be easily achieved by doing a cumulative sum along the row axis.

As example below, r will be used as a base (since values are based on b + g + r). g will overlay on top of r since it is summation of b + g. b will be final layer overlay on g and r.

Sorry, something went wrong. Reload?

Sorry, we cannot display this file.

Sorry, this file is invalid so it cannot be displayed.

view raw SampleData.ipynb hosted with ❤ by GitHub

Mosaic plot function

Once the transformations are done, it is easy to plot the mosaic plot by plotting the different bar charts and overlaying on top of each other. Additional module adjustText can be used to prevent overlapping of the text labels in the plot. Based on the above, we can create a general mosaic function as below.

Sorry, something went wrong. Reload?

Sorry, we cannot display this file.

Sorry, this file is invalid so it cannot be displayed.

view raw PercentStackPlot as Mosaic - example.ipynb hosted with ❤ by GitHub

import seaborn as sns import numpy as np import pandas as pd import matplotlib.pyplot as plt df1['DATE1'] = df1.DATE.dt.strftime('%m/%d/%Y') df1 = df1.sort_values(by = 'DATE1') # calculate the metric % change and # actual change with reference to each individual head first data df1['METRIC_A_PCT_CHANGE'] = df1.groupby(['SERIAL','HEAD'])['METRIC_A']\ .apply(lambda x: x.div(x.iloc[0]).subtract(1).mul(100)) df1['METRIC_A_CHANGE'] = df1.groupby(['SERIAL','HEAD'])['METRIC_A']\ .apply(lambda x: x - x.iloc[0])

fig, ax = plt.subplots(figsize=(10,10)) # Pivot it for plotting in heap map ww = df1.pivot_table(index = ['SERIAL','HEAD'], \ columns = 'DATE1', values = "METRIC_A_PCT_CHANGE") g = sns.heatmap(ww, vmin= -5, vmax = 5, center = 0, \ cmap= sns.diverging_palette(220, 20, sep=20, as_cmap=True),\ xticklabels=True, yticklabels=True, \ ax = ax, linecolor = 'white', linewidths = 0.1, annot = True) g.set_title("% METRIC_A changes over multiple Dates", \ fontsize = 16, color = 'blue')