One Boxplot for multiple dataframe - python

I am trying to make a single boxplot that shows data from multiple dataframes.
Right now I have a script that gives me three subplots of sns boxplots. Is there a way to put all three datasets in a single boxplot, with all columns side by side? The picture shows the output I want.
This is the current code that gives me three subplots:
f, axes = plt.subplots(3,1)
sns.boxplot(x ='type', y ='value', data =df1.sort_values('value'), ax =axes[0]).set_title('A')
sns.boxplot(x ='type', y ='value', data =df2.sort_values('value'), ax =axes[1]).set_title('B')
sns.boxplot(x ='type', y ='value', data =df3.sort_values('value'), ax =axes[2]).set_title('C')
plt.xticks(rotation='horizontal')
sns.set(rc = {'figure.figsize':(20,30)})
sns.set(font_scale=2)
plt.show()
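One possible approach, sketched here under assumptions rather than taken from an answer in the thread: tag each dataframe with a label, stack them with pd.concat, and let seaborn separate them with hue. The sample frames and the column name source are hypothetical; only the type and value columns come from the question.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-ins for df1, df2 and df3 from the question.
df1 = pd.DataFrame({"type": ["x", "x", "y", "y"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"type": ["x", "x", "y", "y"], "value": [2, 3, 4, 6]})
df3 = pd.DataFrame({"type": ["x", "x", "y", "y"], "value": [5, 6, 7, 9]})

# Label each frame with its origin ("source" is an assumed column name),
# then stack them into one long dataframe.
combined = pd.concat(
    [df1.assign(source="A"), df2.assign(source="B"), df3.assign(source="C")],
    ignore_index=True,
)

fig, ax = plt.subplots(figsize=(10, 6))
# hue="source" draws one box per dataframe, side by side within each type.
sns.boxplot(x="type", y="value", hue="source", data=combined, ax=ax)
ax.set_title("A, B and C in one boxplot")
# Call plt.show() or fig.savefig(...) to display or save the figure.
```

Passing figsize to plt.subplots sets the size when the figure is created, whereas calling sns.set(rc=...) after plotting does not resize a figure that already exists.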

Related

Multiple boxplot with seaborn, but not within the same frame

In case of pandas boxplot, we could use:
for column in df:
    plt.figure()
    df.boxplot([column])
What is the equivalent in seaborn? I want to plot multiple boxplots, but not in the same frame: one separate figure for every column, created in a loop.
You can pass an Axes object to sns.boxplot:
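for column in df:
    # One new figure and Axes per column
    fig, ax = plt.subplots()
    # Pass the column by keyword; the first positional argument differs between seaborn versions
    sns.boxplot(x=df[column], ax=ax)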

How to use a 3rd dataframe column as x axis ticks/labels in matplotlib scatter

I'm struggling to wrap my head around matplotlib with dataframes today. I see lots of solutions but I'm struggling to relate them to my needs. I think I may need to start over. Let's see what you think.
I have a dataframe (ephem) with 4 columns - Time, Date, Altitude & Azimuth.
I produce a scatter for alt & az using:
chart = plt.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
What's the most efficient way to set the values in the Time column as the labels/ticks on the x axis?
So:
the scale/gridlines etc all remain the same
the chart still plots alt and az
the y axis ticks/labels remain as is
only the x axis ticks/labels are changed to the Time column.
Thanks
This isn't by any means the cleanest piece of code but the following works for me:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
labels = list(ephem.Time)
ax.set_xticklabels(labels)
plt.show()
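This explicitly sets the x tick labels from the Time column of the dataframe. Note that set_xticklabels assigns the labels to the existing tick positions in order, so the labels only line up with the data if the ticks are first placed at the matching Azimuth values (for example with ax.set_xticks).
In other words, you want to change the x-axis tick labels using a list of values.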
labels = ephem.Time.tolist()
# make your plot and before calling plt.show()
# insert the following two lines
ax = plt.gca()
ax.set_xticklabels(labels = labels)
plt.show()
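A minimal sketch of keeping labels and data aligned, assuming it is acceptable to place one tick at each plotted azimuth (which does change the tick positions). The sample values are made up; only the column names come from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample standing in for the ephem dataframe from the question.
ephem = pd.DataFrame({
    "Time": ["20:00", "21:00", "22:00", "23:00"],
    "Azimuth": [95.0, 110.0, 127.0, 146.0],
    "Altitude": [12.0, 23.0, 33.0, 41.0],
})

fig, ax = plt.subplots()
ax.scatter(ephem["Azimuth"], ephem["Altitude"], marker="x", color="black", s=8)

# Put a tick at every plotted azimuth, then label those ticks with the times.
ax.set_xticks(ephem["Azimuth"].tolist())
ax.set_xticklabels(ephem["Time"].tolist())
# Call plt.show() to display the figure.
```

With many rows the labels will overlap, so in practice one would probably label only every n-th row, for example by slicing both lists with [::n].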

A matplotlib histogram matrix, using Pandas, with multiple categories overlaid

I am trying to combine two approaches at creating histograms.
#Sample Data
df = pd.DataFrame({'V1': [1,2,3,4,5,6],
                   'V2': [43,35,6,7,31,34],
                   'V3': [23,75,67,23,56,32],
                   'V4': [23,45,67,63,56,32],
                   'V5': [23,5,67,23,6,2],
                   'V6': [23,78,67,76,56,2],
                   'V7': [23,45,67,53,56,32],
                   'V8': [5,5,5,5,5,5],
                   'cat': ["A","B","C","A","B","B"],})
I am able to create a histogram matrix for each category using this code.
#1. Creating histogram matrix for each category
for i in df['cat'].unique():
    fig, ax = plt.subplots()
    df[df['cat']==i].hist(figsize=(20,20), ax=ax)
    fig.suptitle(i + " Feature-Class Relationships", fontsize=20)
    # Include the category in the file name so each figure gets its own file
    fig.savefig('Histogram Matrix %s.png' % i, dpi=240)
This creates a separate histogram matrix for each category. However what I would like is for the categories to be overlaid on the same matrix.
I am able to create an overlaid histogram using this approach:
#2. Overlaid histogram for a single variable
fig, ax = plt.subplots()
for i in df['cat'].unique():
    df[df['cat']==i]['V8'].hist(figsize=(12,8), ax=ax, alpha=0.5, label=i)
ax.legend()
plt.show()
However, this only creates a single overlaid image. I want to create an overlaid histogram for every variable in the matrix, i.e. all categories shown in the same matrix rather than a separate matrix for each category.
I have written the following code, which combines the two approaches above, but it does not overlay the histogram matrices on each other; only the last plot is created.
#3. Combining approaches to create a matrix of overlaid histograms
fig, ax = plt.subplots()
for i in df['cat'].unique():
    df[df['cat']==i].hist(figsize=(12,8), ax=ax, alpha=0.5, label=i)
ax.legend()
fig.savefig('Combined.png', dpi=240)
Is what I am trying to do possible?
I guess this is what you want: a matrix of 4 rows and 2 columns where each "cell" holds the histogram for one column, with the categories overlaid.
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'V1': [1,2,3,4,5,6],
                   'V2': [43,35,6,7,31,34],
                   'V3': [23,75,67,23,56,32],
                   'V4': [23,45,67,63,56,32],
                   'V5': [23,5,67,23,6,2],
                   'V6': [23,78,67,76,56,2],
                   'V7': [23,45,67,53,56,32],
                   'V8': [5,5,5,5,5,5],
                   'cat': ["A","B","C","A","B","B"],})
# Define your subplots matrix.
# In this example the fig has 4 rows and 2 columns
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
# This approach is better than looping through df.cat.unique
for g, d in df.groupby('cat'):
    # Each group draws its 8 numeric columns onto the same 8 axes
    d.hist(alpha=0.5, ax=axes, label=g)
# Show the legend in every subplot of the figure
for c1, c2 in axes:
    c1.legend()
    c2.legend()
plt.show()
The last code from the question should give you a warning about the axes being cleared - essentially the phenomenon you observe.
UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
Now the idea could be to let pandas plot each histogram in its own axes, but to make sure that each of those axes is the same one, namely ax. This can be done by passing a list that repeats ax eight times, ax=[ax]*8:
fig, ax = plt.subplots(figsize=(12,8))
for i in df['cat'].unique():
    df[df['cat']==i].hist(ax=[ax]*8, alpha=0.5, label=i)
ax.legend()
plt.show()
The result will look very crowded, but this is apparently desired.

How to plot multiple dataframes to the same plot axes

I have two dataframes, with unique x and y coordinates, and I want to plot them in the same figure.
I am now plotting two dataframes in same figure as such:
plt.plot(df1['x'],df1['y'])
plt.plot(df2['x'],df2['y'])
plt.show()
However, pandas also has plotting functionality.
df.plot()
How could I achieve the same as my first example but use the pandas functionality?
To plot all columns against the index as line plots:
ax = df1.plot()
df2.plot(ax=ax)
A single call to pandas.DataFrame.plot (without subplots=True) returns a matplotlib.axes.Axes, which you can then pass to the second dataframe's plot call through the ax parameter.
To plot specific columns as x and y, pass their names. Specifying x and y is required for scatter plots (kind='scatter').
ax = df1.plot(x='Lat', y='Lon', figsize=(8, 8))
df2.plot(ax=ax, x='Lat', y='Lon')
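As a self-contained illustration of the same idea with the x and y column names from the question, here is a small sketch. The data values and the legend labels are made up for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data; each dataframe has its own x and y coordinates.
df1 = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})
df2 = pd.DataFrame({"x": [0.5, 1.5, 2.5], "y": [3, 2, 1]})

# The first call creates the Axes; the second call reuses it through ax=.
ax = df1.plot(x="x", y="y", label="df1")
df2.plot(x="x", y="y", label="df2", ax=ax)
# Call plt.show() to display the figure.
```

Both lines end up on one Axes, so they share the axis limits and the legend.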

pandas multiple plots not working as hists

With a dataframe df in pandas, I am trying to plot histograms on the same page, filtering by 3 different types; the intended outcome is one histogram of the values for each of the three types. The following code works for me with plot, but when I change it to hist, the histograms all stack on top of each other. Any suggestions?
plt.figure(1)
plt.subplot(311)
df[df.Type == 'Type1'].values.plot()
plt.subplot(312)
df[df.Type == 'Type2'].values.plot()
plt.subplot(313)
df[df.Type == 'Type3'].values.plot()
plt.savefig('output.pdf')
Is this a bug in pandas, or are plot and hist not interchangeable?
Thanks in advance
Plot and hist aren't interchangeable: hist is a separate method on both DataFrame and Series. You might be able to force the plots to be independent by passing in an ax keyword. So
fig = plt.figure()
ax1 = fig.add_subplot(3,1,1)
df[df.Type == 'Type1'].values.hist(ax=ax1)
....
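A fuller sketch of the ax approach, under assumptions: the numeric column is called value here (a hypothetical name, since the question does not show the dataframe), and the data are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: a "Type" column and a numeric "value" column (assumed name).
df = pd.DataFrame({
    "Type": ["Type1"] * 4 + ["Type2"] * 4 + ["Type3"] * 4,
    "value": [1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 8, 9],
})

types = ["Type1", "Type2", "Type3"]
# One row per type; each histogram gets its own Axes.
fig, axes = plt.subplots(len(types), 1, figsize=(6, 8))
for ax, name in zip(axes, types):
    # Series.hist draws onto the Axes passed in, instead of the current one.
    df.loc[df["Type"] == name, "value"].hist(ax=ax)
    ax.set_title(name)
fig.tight_layout()
# Call fig.savefig('output.pdf') or plt.show() to save or display the figure.
```

Passing the Axes explicitly keeps each histogram tied to its own subplot, whichever Axes matplotlib currently treats as active.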
