Python Box/Violin Plots

Photo by Kazuo ota on Unsplash
Box plots and violin plots are statistical charts that summarize the distribution of data. This article shows how to draw both with Python’s Matplotlib, Seaborn, and Plotly Express packages.

The complete code can be found in .

Concept

Before starting to use Python to draw charts, let us first understand the structure of box and violin charts.

Box Plots

The following figure shows the structure of a box plot.

Box Plots
  • Lower quartile (Q1, 1st quartile, or 25th percentile): The value below which 25% of the data fall, roughly the median of the lower half of the dataset.
  • Median (Q2 or 50th percentile): The middle value of the sorted data.
  • Upper quartile (Q3, 3rd quartile, or 75th percentile): The value below which 75% of the data fall, roughly the median of the upper half of the dataset.
  • Interquartile range (IQR): Q3 - Q1.
  • Minimum (Q0): The smallest data value that is not less than Q1 - 1.5 * IQR.
  • Maximum (Q4): The largest data value that is not greater than Q3 + 1.5 * IQR.
  • Whiskers: The straight lines between Q0 and Q1, and between Q3 and Q4.
  • Outliers: The values not between Q0 and Q4.
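
The quantities in this list can be checked numerically. The following is a minimal sketch using NumPy's percentile(); the sample values are made up for illustration, and the whisker ends are taken as the most extreme data values inside the 1.5 * IQR limits, which is an assumption that matches Matplotlib's default behaviour.

```python
import numpy as np

# Made-up sample values (an assumption for illustration); 30 is an obvious outlier.
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30])

# Quartiles with NumPy's default linear interpolation.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Limits beyond which a value is drawn as an outlier.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers end at the most extreme data values inside the limits.
inside = data[(data >= lower_limit) & (data <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = data[(data < lower_limit) | (data > upper_limit)]

print('Q1, median, Q3, IQR:', q1, median, q3, iqr)
print('Whiskers:', whisker_low, whisker_high)
print('Outliers:', outliers)
```

Different libraries may interpolate quartiles slightly differently, so the numbers drawn by a plotting package can deviate a little from these.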

Violin Plots

A violin plot shows the distribution of data just like a box plot. In addition, it shows the probability density of the data, estimated with kernel density estimation and drawn as the width of the violin. The figure below compares the two.

Violin Plots
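
To make the idea of probability density more concrete, here is a minimal sketch of a Gaussian kernel density estimate written with NumPy only. The function name kde_gaussian, the sample values, and the bandwidth rule (Scott's rule) are assumptions chosen for illustration; plotting libraries use their own, more refined implementations.

```python
import numpy as np


def kde_gaussian(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)


# Made-up sample values (an assumption for illustration).
samples = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 30], dtype=float)

# Scott's rule for the bandwidth (an assumption; other rules exist).
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(samples.min() - 4 * bandwidth,
                   samples.max() + 4 * bandwidth, 500)
density = kde_gaussian(samples, grid, bandwidth)

# A violin is this curve drawn on both sides of a centre line,
# scaled so that the widest point has half-width 1.
half_width = density / density.max()

# The density should integrate to roughly 1.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```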

Sample Dataset

The examples in this article all use seaborn’s tips dataset. The following example shows how to read this dataset and display its fields.

import seaborn as sns

tips = sns.load_dataset('tips')
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Matplotlib

Box Plots

Matplotlib’s boxplot() can be used to draw box plots. The following is its declaration. Please refer to the official website for other parameters.

matplotlib.pyplot.boxplot(x, sym=None, vert=None, labels=None)
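  • x: The data to plot. Each column or sequence is drawn as one box.
  • sym: The marker used for outliers.
  • vert: Whether to draw the boxes vertically (True) or horizontally (False).
  • labels: The labels of each dataset (renamed to tick_labels in Matplotlib 3.9).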

The following example shows how to use boxplot() to draw a box plot.

import matplotlib.pyplot as plt

plt.boxplot(tips[['total_bill', 'tip']], labels=['total_bill', 'tip'])
plt.title('Box Plot')
plt.xlabel('Total Bill vs Tip')
plt.ylabel('Money')
plt.show()
Matplotlib box plot

In this example, boxplot() only needs the data and the labels. After that, title() sets the title of the chart, and xlabel() and ylabel() set the labels of the x-axis and y-axis.
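
The parameter sym from the declaration is not used in the example above. The following is a minimal sketch of how it might be used; the data is a randomly generated stand-in (an assumption, not the tips dataset) with two extreme values added so that outliers are guaranteed to appear.

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (an assumption): 100 normal values plus two extreme ones.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(20, 3, size=100), [45.0, 50.0]])

fig, ax = plt.subplots()
# 'r+' follows the plot() format-string convention: red plus signs.
result = ax.boxplot(values, sym='r+')
ax.set_title('Box Plot with Custom Outlier Markers')

# boxplot() returns a dict of artists; 'fliers' holds the outlier markers.
fliers = result['fliers'][0]
print(fliers.get_marker(), len(fliers.get_ydata()))
# Call plt.show() to display the figure in an interactive session.
```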

If you want to draw a horizontal box plot, you only need to set the parameter vert to False, as shown in the following example.

plt.boxplot(tips[['total_bill', 'tip']], vert=False, labels=['total_bill', 'tip'])
plt.title('Box Plot')
plt.xlabel('Money')
plt.ylabel('Total Bill vs Tip')
plt.show()
Matplotlib horizontal box plot

Violin Plots

Drawing a violin plot is just as simple: Matplotlib provides a dedicated function for it, violinplot(). Its declaration is as follows. Please refer to the official website for other parameters.

matplotlib.pyplot.violinplot(dataset, vert=True)
  • dataset: The data to plot. Each column or sequence is drawn as one violin.
  • vert: Whether to draw the violins vertically (True) or horizontally (False).

Compared with boxplot(), the parameters of violinplot() are relatively few. Let us first look at an example.

import matplotlib.pyplot as plt

plt.violinplot(tips[['total_bill', 'tip']])
plt.xticks([1, 2], ['total_bill', 'tip'])
plt.title('Violin Plot')
plt.xlabel('Total Bill vs Tip')
plt.ylabel('Money')
plt.show()
Matplotlib violin plot

In this example, we can see that drawing a violin plot is also very simple. However, the violinplot() declaration shown above has no parameter for labelling the datasets, so we use xticks() to set the labels instead.

The following example shows how to draw a horizontal violin chart.
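
A minimal sketch of such a chart, assuming vert=False is passed as in the horizontal box plot and the tick labels are moved to the y-axis. To keep it self-contained, the DataFrame below is a stand-in built from the five rows printed by tips.head() above; with the real data, load tips with sns.load_dataset('tips') instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the tips dataset (an assumption): only the five rows
# shown by tips.head(), so the sketch runs without downloading anything.
tips = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
})

fig, ax = plt.subplots()
# vert=False draws the violins horizontally.
parts = ax.violinplot(tips[['total_bill', 'tip']].to_numpy(), vert=False)

# The datasets now sit on the y-axis, so the labels go there too.
ax.set_yticks([1, 2])
ax.set_yticklabels(['total_bill', 'tip'])
ax.set_title('Violin Plot')
ax.set_xlabel('Money')
ax.set_ylabel('Total Bill vs Tip')
# Call plt.show() to display the figure in an interactive session.
```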

Matplotlib horizontal violin plot

Seaborn

Box Plots

Like Matplotlib, Seaborn’s function for box plots is also called boxplot(). Let’s take a look at its declaration. Please refer to the official website for other parameters.

seaborn.boxplot(x=None, y=None, hue=None, data=None, palette=None)
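  • x: The column (in long-form data) to use for the x-axis.
  • y: The column (in long-form data) to use for the y-axis.
  • hue: The column used to group the data by color.
  • data: The dataset, either long-form or wide-form.
  • palette: The color map.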

Let’s take a look at an example.

sns.boxplot(data=tips[['total_bill', 'tip']], palette='Set2')
Seaborn box plot

Seaborn is really easy to use. It automatically uses the column names as the labels of the datasets. In the example, we set the parameter palette to choose a different color map.

The following example shows how to use parameter x and y.

sns.boxplot(data=tips, x='day', y='total_bill', palette='Set2')
Seaborn box plot

In this example, we set the parameter x to day. boxplot() groups the data by the values of the day column, which yields 4 datasets. This is useful when you want to display the distribution for each value of a categorical column.

The following example shows how to draw a horizontal box plot.

sns.boxplot(data=tips, x='total_bill', y='day', palette='Set2')
Seaborn horizontal box plot

Seaborn will automatically determine the type of data to choose whether to draw a horizontal or vertical box plot. Of course, you can also use the parameter orient to directly specify horizontal or vertical.

The last example shows how to use the parameter hue.

sns.boxplot(data=tips, x='day', y='total_bill', hue='smoker')
Seaborn grouped box plot

With the parameter hue, each box is split by the values of another column, here smoker, so the distributions of the two groups can be compared side by side. In addition, a legend is displayed, in the upper left corner in this figure.

Violin Plots

Seaborn also provides a dedicated function to draw violin plots, violinplot(). Please refer to the official website for other parameters.

seaborn.violinplot(x=None, 
                   y=None, 
                   hue=None, 
                   data=None, 
                   palette=None, 
                   split=False)
  • x: Long-form data of x-axis.
  • y: Long-form data of y-axis.
  • hue: Grouping for color.
  • palette: Color map.
  • data: Data.
  • split: When grouping with hue, draw each of the two groups on one half of the violin.

The parameters of Seaborn’s violinplot() are very similar to boxplot(). Let us look at the following example.

sns.violinplot(data=tips[['total_bill', 'tip']], palette='Set2')
Seaborn violin plot

The following is an example with parameter x and y.

sns.violinplot(data=tips, x='day', y='total_bill', palette='Set2')
Seaborn violin plot

The next example shows a horizontal violin plot.

sns.violinplot(data=tips, x='total_bill', y='day', palette='Set3')
Seaborn horizontal violin plot

The following example shows how to use the parameter hue.
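
A minimal sketch of such a call, mirroring the grouped box plot shown earlier. The DataFrame here is a randomly generated stand-in with the same column names as tips (an assumption made so that the sketch runs without downloading the dataset); with the real data, pass the tips DataFrame loaded by sns.load_dataset('tips').

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Randomly generated stand-in with the column names of the tips dataset
# (an assumption; the values are not real restaurant bills).
rng = np.random.default_rng(0)
n = 200
tips = pd.DataFrame({
    'day': rng.choice(['Thur', 'Fri', 'Sat', 'Sun'], size=n),
    'smoker': rng.choice(['Yes', 'No'], size=n),
    'total_bill': rng.gamma(shape=4.0, scale=5.0, size=n),
})

# hue='smoker' draws one violin per smoker value within each day.
ax = sns.violinplot(data=tips, x='day', y='total_bill', hue='smoker')
```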

Seaborn grouped violin plot

Finally, let’s take a look at an example of how to use parameter split.

sns.violinplot(data=tips, x='day', y='total_bill', hue='smoker', split=True)
Seaborn split grouped violin plot

I personally think that Seaborn’s function design is quite good. Since box plots and violin plots have many elements in common, it makes sense that the parameters and usage of the two functions are nearly the same.

Plotly Express

Box Plots

Plotly Express’s box() can be used to draw box plots. Its declaration is as follows. Please refer to the official website for other parameters.

plotly.express.box(data_frame=None, 
                   x=None, 
                   y=None, 
                   color=None, 
                   points=None, 
                   title=None)
  • data_frame: Data.
  • x: Long-form data of x-axis.
  • y: Long-form data of y-axis.
  • color: Grouping for color.
  • points: Its value can be as follows:
    • ‘outliers’: Only display outliers. This is the default value.
    • ‘all’: Display all points.
    • ‘suspectedoutliers’: Display outliers, and highlight the points less than 4*Q1-3*Q3 or greater than 4*Q3-3*Q1.
    • False: No points are displayed.
  • title: Plot title.
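
The two thresholds given for ‘suspectedoutliers’ are simply Q1 - 3 * IQR and Q3 + 3 * IQR written differently. The following minimal sketch checks this with NumPy on made-up values (an assumption for illustration). Plotly computes its quartiles itself, so the exact thresholds it uses may differ slightly from NumPy's.

```python
import numpy as np

# Made-up sample values (an assumption for illustration).
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 18, 30])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Ordinary outliers lie beyond 1.5 * IQR from the quartiles.
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

# Thresholds as written in the list above.
low = 4 * q1 - 3 * q3
high = 4 * q3 - 3 * q1

# The same thresholds expressed with the IQR.
same_low = np.isclose(low, q1 - 3 * iqr)
same_high = np.isclose(high, q3 + 3 * iqr)

# Points beyond these thresholds are the ones that get highlighted.
highlighted = data[(data < low) | (data > high)]

print(outliers, highlighted, same_low, same_high)
```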

Let’s look at an example of box().

import plotly.express as px

px.box(tips[['total_bill', 'tip']], title='Plotly Express Box Plot')
Plotly Express box plot

The usage of box() is quite simple and intuitive.

The next example is how to use the parameter x and y.

px.box(tips, x='day', y='total_bill', title='Plotly Express Box Plot')
Plotly Express box plot

As with Seaborn, when the parameter x is set to day, the data is grouped by the day column into 4 datasets.

box() will determine whether to draw a horizontal or vertical box plot according to the type of data value.

px.box(tips, x='total_bill', y='day', title='Plotly Express Box Plot')
Plotly Express horizontal box plot

The following example shows how to use the parameter color.

px.box(tips, x='day', y='total_bill', color='smoker', title='Plotly Express Box Plot')
Plotly Express grouped box plot

The parameter color can display two datasets and a legend on the right.

The last example shows how to use the parameter points to display every individual data point next to each box.

px.box(tips[['total_bill', 'tip']], title='Plotly Express Box Plot', points='all')
Plotly Express box plot with points

There are more box plot examples on the official website for reference.

Violin Plots

Plotly Express provides violin() to draw violin plots. Its declaration is as follows. Please refer to the official website for other parameters.

plotly.express.violin(data_frame=None, 
                      x=None, 
                      y=None, 
                      color=None, 
                      points=None, 
                      title=None)
  • data_frame: Data.
  • x: Long-form data of x-axis.
  • y: Long-form data of y-axis.
  • color: Grouping for color.
  • points: Its value can be as follows:
    • ‘outliers’: Only display outliers. This is the default value.
    • ‘all’: Display all points.
    • ‘suspectedoutliers’: Display outliers, and highlight the points less than 4*Q1-3*Q3 or greater than 4*Q3-3*Q1.
    • False: No points are displayed.
  • title: Plot title.

The parameters of violin() are similar to those of box(), so of course their usage is very similar. Let us look at the following example.

import plotly.express as px

px.violin(tips[['total_bill', 'tip']], title='Plotly Express Violin Plot')
Plotly Express violin plot

The next example shows how to use the parameters x and y.

px.violin(tips, x='day', y='total_bill', title='Plotly Express Violin Plot')
Plotly Express violin plot

Below is an example of a horizontal violin chart.

px.violin(tips, x='total_bill', y='day', title='Plotly Express Violin Plot')
Plotly Express horizontal violin plot

The following is an example of parameter color.

px.violin(tips, x='day', y='total_bill', color='smoker', title='Plotly Express Violin Plot')
Plotly Express grouped violin plot

The last example shows how to use parameter points.

px.violin(tips[['total_bill', 'tip']], title='Plotly Express Violin Plot', points='all')
Plotly Express violin plot with points

As the figure shows, the violin itself already conveys the distribution and probability density of the points, so displaying the individual points as well is usually unnecessary.

There are many more violin plot examples on the official website for reference.

Conclusion

We have introduced three packages for drawing box plots and violin plots in this article. Compared to Matplotlib, Seaborn and Plotly Express are easier to use, and they are very similar. In addition, Plotly Express has an interactive toolbar.
