How to plot multiple dataframes to the same plot axes - python

I have two dataframes, with unique x and y coordinates, and I want to plot them in the same figure.
I am now plotting two dataframes in same figure as such:
plt.plot(df1['x'],df1['y'])
plt.plot(df2['x'],df2['y'])
plt.show()
However, pandas also has plotting functionality.
df.plot()
How could I achieve the same as my first example but use the pandas functionality?

To plot all columns against the index as line plots:
ax = df1.plot()
df2.plot(ax=ax)
A single pandas.DataFrame.plot (not subplots=True) returns a matplotlib.axes.Axes, which you can then pass to the second dataframe.
To plot specific columns as x and y. Specifying x and y is required for scatter plots (kind='scatter').
ax = df1.plot(x='Lat', y='Lon', figsize=(8, 8))
df2.plot(ax=ax, x='Lat', y='Lon')
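As a minimal, self-contained sketch of the same idea using the question's x and y column names: the sample values below are made up for illustration, and the label arguments are an assumption added to tell the two lines apart in the legend.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; each frame has its own x and y values.
df1 = pd.DataFrame({"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]})
df2 = pd.DataFrame({"x": [0.5, 1.5, 2.5], "y": [3, 2, 1]})

# The first call creates the Axes; the second call draws onto the same Axes.
ax = df1.plot(x="x", y="y", label="df1")
df2.plot(x="x", y="y", label="df2", ax=ax)

# Each frame contributes one line to the shared Axes.
n_lines = len(ax.lines)
```

Calling plt.show() afterwards displays the combined figure.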

Related

plotting with subplots in a loop python [duplicate]

Case:
I receive a dataframe with (say 50) columns.
I extract the necessary columns from that dataframe using a condition.
So we have a list of selected columns of our dataframe now. (Say this variable is sel_cols)
I need a bar chart for each of these columns value_counts().
And I need to arrange all these bar charts in 3 columns, and varying number of rows based on number of columns selected in sel_cols.
So, if say 8 columns were selected, I want the figure to have 3 columns and 3 rows, with last subplot empty or just 8 subplots in 3x3 matrix if that is possible.
I could generate each chart separately using following code:
for col in sel_cols:
    df[col].value_counts().plot(kind='bar')
    plt.show()
I call plt.show() inside the loop so that each chart is shown and not just the last one.
I also tried appending these charts to a list this way:
charts = []
for col in sel_cols:
    charts.append(df[col].value_counts().plot(kind='bar'))
I could convert this list into a NumPy array through reshape(), but then its length has to be exactly divisible into that shape. So 8 chart objects cannot be reshaped into a 3x3 array.
Then I tried creating the subplots first in this way:
row = len(sel_cols)//3
fig, axes = plt.subplots(nrows=row,ncols=3)
This way I would get the subplots, but I get two problems:
I end up with extra subplots in the 3 columns which will go unplotted (8 columns example).
I do not know how to plot under each subplots through a loop.
I tried this:
for row in axes:
    for chart, col in zip(row, sel_cols):
        chart = data[col].value_counts().plot(kind='bar')
But this only plots the last subplot with the last column. All other subplots stay blank.
How to do this with minimal lines of code, possibly without any need for human verification of the final subplots placements?
You may use this sample dataframe:
pd.DataFrame({'A':['Y','N','N','Y','Y','N','N','Y','N'],
'B':['E','E','E','E','F','F','F','F','E'],
'C':[1,1,0,0,1,1,0,0,1],
'D':['P','Q','R','S','P','Q','R','P','Q'],
'E':['E','E','E','E','F','F','G','G','G'],
'F':[1,1,0,0,1,1,0,0,1],
'G':['N','N','N','N','Y','N','N','Y','N'],
'H':['G','G','G','E','F','F','G','F','E'],
'I':[1,1,0,0,1,1,0,0,1],
'J':['Y','N','N','Y','Y','N','N','Y','N'],
'K':['E','E','E','E','F','F','F','F','E'],
'L':[1,1,0,0,1,1,0,0,1],
})
Selected columns are: sel_cols = ['A','B','D','E','G','H','J','K']
Total 8 columns.
Expected output is bar charts for value_counts() of each of these columns arranged in subplots in a figure with 3 columns. Rows to be decided based on number of columns selected, here 8 so 3 rows.
Given OP's sample data:
df = pd.DataFrame({'A':['Y','N','N','Y','Y','N','N','Y','N'],'B':['E','E','E','E','F','F','F','F','E'],'C':[1,1,0,0,1,1,0,0,1],'D':['P','Q','R','S','P','Q','R','P','Q'],'E':['E','E','E','E','F','F','G','G','G'],'F':[1,1,0,0,1,1,0,0,1],'G':['N','N','N','N','Y','N','N','Y','N'],'H':['G','G','G','E','F','F','G','F','E'],'I':[1,1,0,0,1,1,0,0,1],'J':['Y','N','N','Y','Y','N','N','Y','N'],'K':['E','E','E','E','F','F','F','F','E'],'L':[1,1,0,0,1,1,0,0,1]})
sel_cols = list('ABDEGHJK')
# pd.value_counts is deprecated in recent pandas; pd.Series.value_counts is equivalent here
data = df[sel_cols].apply(pd.Series.value_counts)
We can plot the columns of data in several ways (in order of simplicity):
DataFrame.plot with subplots param
seaborn.catplot
Loop through plt.subplots
1. DataFrame.plot with subplots param
Set subplots=True with the desired layout dimensions. Unused subplots will be auto-disabled:
data.plot.bar(subplots=True, layout=(3, 3), figsize=(8, 6),
              sharex=False, sharey=True, legend=False)
plt.tight_layout()
2. seaborn.catplot
melt the data into long-form (i.e., 1 variable per column, 1 observation per row) and pass it to seaborn.catplot:
import seaborn as sns
melted = data.melt(var_name='var', value_name='count', ignore_index=False).reset_index()
sns.catplot(data=melted, kind='bar', x='index', y='count',
            col='var', col_wrap=3, sharex=False)
3. Loop through plt.subplots
zip the columns and axes to iterate in pairs. Use the ax param to place each column onto its corresponding subplot.
If the grid size is larger than the number of columns (e.g., 3*3 > 8), disable the leftover axes with set_axis_off:
fig, axes = plt.subplots(3, 3, figsize=(8, 8), constrained_layout=True, sharey=True)
# plot each col onto one ax
for col, ax in zip(data.columns, axes.flat):
    data[col].plot.bar(ax=ax, rot=0)
    ax.set_title(col)
# disable leftover axes
for ax in axes.flat[data.columns.size:]:
    ax.set_axis_off()
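The grid above is hard-coded to 3x3. Since the question asks for the number of rows to follow from the number of selected columns, here is a sketch that derives it with math.ceil. It works on the raw sample columns rather than the precomputed data frame, and the names ncols and nrows are introduced here for illustration.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd

# The selected columns of the question's sample dataframe.
df = pd.DataFrame({
    "A": list("YNNYYNNYN"),
    "B": list("EEEEFFFFE"),
    "D": list("PQRSPQRPQ"),
    "E": list("EEEEFFGGG"),
    "G": list("NNNNYNNYN"),
    "H": list("GGGEFFGFE"),
    "J": list("YNNYYNNYN"),
    "K": list("EEEEFFFFE"),
})
sel_cols = list("ABDEGHJK")

ncols = 3
nrows = math.ceil(len(sel_cols) / ncols)  # round up so every column gets a slot

# squeeze=False keeps `axes` two-dimensional even when nrows == 1.
fig, axes = plt.subplots(nrows, ncols, figsize=(8, 3 * nrows),
                         squeeze=False, constrained_layout=True)

# Pair each selected column with one subplot.
for col, ax in zip(sel_cols, axes.flat):
    df[col].value_counts().plot.bar(ax=ax, rot=0)
    ax.set_title(col)

# Hide the subplots that received no column.
for ax in axes.flat[len(sel_cols):]:
    ax.set_axis_off()
```

With 8 selected columns this yields a 3x3 grid whose last cell is hidden; changing sel_cols changes the row count without further edits.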
As an alternative to the answer by tdy, I did it without seaborn, using Matplotlib and a for loop.
This may suit those who want specific control over each subplot's formatting and other parameters:
# data is the original dataframe here, as in the question's own attempt
fig = plt.figure(1, figsize=(16, 12))
for i, col in enumerate(sel_cols, 1):
    fig.add_subplot(3, 4, i)
    data[col].value_counts().plot(kind='bar', ax=plt.gca())
    plt.title(col)
plt.tight_layout()
plt.show()
fig.add_subplot creates a subplot and makes it the current axes, while plt.gca() returns the current axes.

One Boxplot for multiple dataframe

I am trying to make a boxplot that shows multiple dataframes.
Right now I have a script that gives me three subplots of sns boxplots, but is there a way to combine all three into a single boxplot, with all columns side by side? The picture shows the output I want.
This is the current code that gives me three subplots.
f, axes = plt.subplots(3,1)
sns.boxplot(x ='type', y ='value', data =df1.sort_values('value'), ax =axes[0]).set_title('A')
sns.boxplot(x ='type', y ='value', data =df2.sort_values('value'), ax =axes[1]).set_title('B')
sns.boxplot(x ='type', y ='value', data =df3.sort_values('value'), ax =axes[2]).set_title('C')
plt.xticks(rotation='horizontal')
sns.set(rc = {'figure.figsize':(20,30)})
sns.set(font_scale=2)
plt.show()
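One possible approach, sketched under assumptions: stack the three dataframes into one long-form frame with an extra column recording where each row came from, then let seaborn group the boxes by that column via hue. The tiny df1, df2, df3 frames and the column name source are hypothetical; only the type and value columns come from the code above.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-ins for df1, df2 and df3 with 'type' and 'value' columns.
df1 = pd.DataFrame({"type": ["a", "a", "b", "b"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"type": ["a", "a", "b", "b"], "value": [2, 3, 4, 5]})
df3 = pd.DataFrame({"type": ["a", "a", "b", "b"], "value": [3, 4, 5, 6]})

# Tag every row with the name of its source frame, then stack the frames.
frames = {"A": df1, "B": df2, "C": df3}
combined = pd.concat(
    [frame.assign(source=name) for name, frame in frames.items()],
    ignore_index=True,
)

# One Axes: boxes are grouped by 'type' and split side by side by 'source'.
fig, ax = plt.subplots(figsize=(8, 4))
sns.boxplot(data=combined, x="type", y="value", hue="source", ax=ax)
```

Using x="source" instead (and dropping hue) would give one box per dataframe if the type split is not needed.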

Multiple boxplot with seaborn, but not within the same frame

With pandas boxplot, we could use:
for column in df:
    plt.figure()
    df.boxplot([column])
What is the seaborn equivalent? I want to plot multiple boxplots, not in the same frame but individually for every column, in a loop.
You can pass an Axes object to sns.boxplot:
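for column in df:
    fig, ax = plt.subplots()
    # pass the column by keyword; the meaning of the first positional argument differs across seaborn versions
    sns.boxplot(x=df[column], ax=ax)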

How to use a 3rd dataframe column as x axis ticks/labels in matplotlib scatter

I'm struggling to wrap my head around matplotlib with dataframes today. I see lots of solutions but I'm struggling to relate them to my needs. I think I may need to start over. Let's see what you think.
I have a dataframe (ephem) with 4 columns - Time, Date, Altitude & Azimuth.
I produce a scatter for alt & az using:
chart = plt.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
What's the most efficient way to set the values in the Time column as the labels/ticks on the x axis?
So:
the scale/gridlines etc all remain the same
the chart still plots alt and az
the y axis ticks/labels remain as is
only the x axis ticks/labels are changed to the Time column.
Thanks
This isn't by any means the cleanest piece of code but the following works for me:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker='x', color='black', s=8)
labels = list(ephem.Time)
ax.set_xticklabels(labels)
plt.show()
Here set_xticklabels explicitly sets the x-axis tick labels to the values in the dataframe's Time column.
In other words, you want to change the x-axis tick labels using a list of values.
labels = ephem.Time.tolist()
# make your plot and before calling plt.show()
# insert the following two lines
ax = plt.gca()
ax.set_xticklabels(labels = labels)
plt.show()
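Note that set_xticklabels on its own only relabels whatever ticks already exist, in order, so the labels are not tied to the data points. A sketch that also places the ticks at the plotted azimuth values, so each time label sits under its own point, could look like this. The ephem values below are made up for illustration, and this variant does change where the ticks fall.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ephemeris rows; the real `ephem` frame also has a Date column.
ephem = pd.DataFrame({
    "Time": ["20:00", "21:00", "22:00", "23:00"],
    "Azimuth": [95.0, 110.0, 126.0, 144.0],
    "Altitude": [12.0, 23.0, 33.0, 41.0],
})

fig, ax = plt.subplots()
ax.scatter(ephem.Azimuth, ephem.Altitude, marker="x", color="black", s=8)

# Put one tick at each plotted azimuth, then label it with the matching time.
ax.set_xticks(ephem.Azimuth.tolist())
ax.set_xticklabels(ephem.Time.tolist())
```

With many rows the labels will overlap; passing only every n-th azimuth and time (for example ephem.iloc[::10]) is one way to thin them out.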

matplotlib.pyplot: how to include custom legends when plotting dataframes?

I am plotting two dataframes in the same chart: the USDEUR exchange rate and the 3-day moving average.
df.plot(ax=ax, linewidth=1)
rolling_mean.plot(ax=ax, linewidth=1)
Both dataframes are labelled "Value" so I would like to customize that:
I tried passing the label option but that didn't work, as it seems that this option is exclusive to matplotlib.axes.Axes.plot and not to pandas.DataFrame.plot. So I tried using axes instead, and passing each label:
ax.plot(df, linewidth=1, label='FRED/DEXUSEU')
ax.plot(rolling_mean, linewidth=1, label='3-day SMA')
However now the legend is not showing up at all unless I explicitly call ax.legend() afterwards.
Is it possible to plot the dataframes while passing custom labels without the need of an additional explicit call?
When setting a label using df.plot() you have to specify the data which is being plotted:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
fig, (ax1, ax2) = plt.subplots(1,2)
df = pd.DataFrame({'Value':np.random.randn(10)})
df2 = pd.DataFrame({'Value':np.random.randn(10)})
df.plot(label="Test",ax=ax1)
df2.plot(ax=ax1)
df.plot(y="Value", label="Test",ax=ax2)
df2.plot(y="Value", ax=ax2)
ax1.set_title("Reproduce problem")
ax2.set_title("Possible solution")
plt.show()
Which gives:
Update: It appears that there is a difference between plotting a dataframe, and plotting a series. When plotting a dataframe, the labels are taken from the column names. However, when specifying y="Value" you are then plotting a series, which then actually uses the label argument.
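Following the dataframe-versus-series distinction, a sketch that selects the column first (which yields a Series) and passes label together with legend=True, so no separate ax.legend() call is needed. The random-walk data and the 3-row rolling window are stand-ins for the exchange-rate series, not the original data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the exchange-rate data; one column named "Value".
rng = np.random.default_rng(0)
df = pd.DataFrame({"Value": rng.normal(size=30).cumsum()})
rolling_mean = df.rolling(window=3).mean()

fig, ax = plt.subplots()
# Selecting the column yields a Series, so `label` is used for the legend entry.
df["Value"].plot(ax=ax, linewidth=1, label="FRED/DEXUSEU", legend=True)
rolling_mean["Value"].plot(ax=ax, linewidth=1, label="3-day SMA", legend=True)

# Collect the legend entries that pandas created.
labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Renaming the column before plotting, for example df.rename(columns={"Value": "FRED/DEXUSEU"}).plot(ax=ax), is another option, since dataframe plots take their labels from the column names.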
