I am trying to plot two displots side by side with this code
fig,(ax1,ax2) = plt.subplots(1,2)
sns.displot(x =X_train['Age'], hue=y_train, ax=ax1)
sns.displot(x =X_train['Fare'], hue=y_train, ax=ax2)
It returns the following result (two empty subplots, followed by one displot each on two separate lines):
If I try the same code with violinplot, it returns the result as expected:
fig,(ax1,ax2) = plt.subplots(1,2)
sns.violinplot(y_train, X_train['Age'], ax=ax1)
sns.violinplot(y_train, X_train['Fare'], ax=ax2)
Why is displot returning a different kind of output and what can I do to output two plots on the same line?
seaborn.distplot has been DEPRECATED in seaborn 0.11 and is replaced with the following:
displot(), a figure-level function with a similar flexibility over the kind of plot to draw. This is a FacetGrid, and does not have the ax parameter, so it will not work with matplotlib.pyplot.subplots.
histplot(), an axes-level function for plotting histograms, including with kernel density smoothing. This does have the ax parameter, so it will work with matplotlib.pyplot.subplots.
This applies to all of the seaborn figure-level (FacetGrid) plots: none of them has an ax parameter. Use the equivalent axes-level plot instead.
Look at the documentation for the figure-level plot to find the appropriate axes-level plot function for your needs.
See Figure-level vs. axes-level functions
Because the histogram of two different columns is desired, it's easier to use histplot.
See How to plot in multiple subplots for a number of different ways to plot into matplotlib.pyplot.subplots
Also review seaborn histplot and displot output doesn't match
Tested in seaborn 0.11.1 & matplotlib 3.4.2
fig, (ax1, ax2) = plt.subplots(1, 2)
sns.histplot(x=X_train['Age'], hue=y_train, ax=ax1)
sns.histplot(x=X_train['Fare'], hue=y_train, ax=ax2)
Imports and DataFrame Sample
import seaborn as sns
import matplotlib.pyplot as plt
# load data
penguins = sns.load_dataset("penguins", cache=False)
# display(penguins.head())
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    MALE
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  FEMALE
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  FEMALE
3  Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  FEMALE
Axes Level Plot
With the data in a wide format, use sns.histplot
# select the columns to be plotted
cols = ['bill_length_mm', 'bill_depth_mm']
# create the figure and axes
fig, axes = plt.subplots(1, 2)
axes = axes.ravel() # flattening the array makes indexing easier
for col, ax in zip(cols, axes):
    sns.histplot(data=penguins[col], kde=True, stat='density', ax=ax)
fig.tight_layout()
plt.show()
Figure Level Plot
With the dataframe in a long format, use displot
# create a long dataframe
dfl = penguins.melt(id_vars='species', value_vars=['bill_length_mm', 'bill_depth_mm'], var_name='bill_size', value_name='vals')
# display(dfl.head())
  species       bill_size  vals
0  Adelie  bill_length_mm  39.1
1  Adelie   bill_depth_mm  18.7
2  Adelie  bill_length_mm  39.5
3  Adelie   bill_depth_mm  17.4
4  Adelie  bill_length_mm  40.3
# plot
sns.displot(data=dfl, x='vals', col='bill_size', kde=True, stat='density', common_bins=False, common_norm=False, height=4, facet_kws={'sharey': False, 'sharex': False})
Multiple DataFrames
If there are multiple dataframes, combine them with pd.concat, and use .assign to create an identifying 'source' column, which can be used for row=, col=, or hue=
# list of dataframes
lod = [df1, df2, df3]
# create one dataframe with a new 'source' column to use for row, col, or hue
df = pd.concat((d.assign(source=f'df{i}') for i, d in enumerate(lod, 1)), ignore_index=True)
See Import multiple csv files into pandas and concatenate into one DataFrame to read multiple files into a single dataframe with an identifying column.
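As a runnable illustration of the concat / assign pattern, here is a small sketch in which df1, df2 and df3 are tiny made-up frames (their contents are assumptions chosen only to make the result easy to inspect):

```python
import pandas as pd

# Hypothetical stand-ins for df1, df2, df3
df1 = pd.DataFrame({"vals": [1.0, 2.0]})
df2 = pd.DataFrame({"vals": [3.0]})
df3 = pd.DataFrame({"vals": [4.0, 5.0, 6.0]})
lod = [df1, df2, df3]

# Tag each frame with its origin, then stack them into one long frame
df = pd.concat(
    (d.assign(source=f"df{i}") for i, d in enumerate(lod, 1)),
    ignore_index=True,
)
print(df)
```

The resulting 'source' column can then be passed as row=, col=, or hue= to a figure-level function such as displot.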
Related
I have a pandas DataFrame containing the percentage of students that have a certain skill in each subject stratified according to their gender
iterables = [['Above basic','Basic','Low'], ['Female','Male']]
index = pd.MultiIndex.from_product(iterables, names=["Skills", "Gender"])
df = pd.DataFrame(data=[[36,36,8,8,6,6],[46,46,2,3,1,2],[24,26,10,11,16,13]], index=["Math", "Literature", "Physics"], columns=index)
print(df)
Skills      Above basic         Basic           Low
Gender           Female  Male  Female  Male  Female  Male
Math                 36    36       8     8       6     6
Literature           46    46       2     3       1     2
Physics              24    26      10    11      16    13
Next I want to see how the skills are distributed according to the subjects
#plot how the skills are distributed according to the subjects
df.sum(axis=1,level=[0]).plot(kind='bar')
df.plot(kind='bar')
Now I would like to add the percentage of Male and Female to each bar in a stacked manner. For example, for the first bar ("Math", "Above basic") it should be 50/50, for the bar ("Literature", "Basic") it should be 40/60, for the bar ("Literature", "Low") it should be 33.3/66.7, and so on.
Could you give me a hand?
Using the level keyword in DataFrame and Series aggregations, df.sum(axis=1,level=[0]), is deprecated.
Use df.groupby(level=0, axis=1).sum()
df.div(dfg).mul(100).round(1).astype(str) creates a DataFrame of strings with the 'Female' and 'Male' percent for each of the 'Skills', which can be used to create a custom bar label.
As shown in this answer, use matplotlib.pyplot.bar_label to annotate the bars, which has a labels= parameter for custom labels.
Tested in python 3.11, pandas 1.5.3, matplotlib 3.7.0, seaborn 0.12.2
# group df to create the bar plot
dfg = df.groupby(level=0, axis=1).sum()
# calculate the Female / Male percent for each Skill
percent_s = df.div(dfg).mul(100).round(1).astype(str)
# plot the bars
ax = dfg.plot(kind='bar', figsize=(10, 7), rot=0, width=0.9, ylabel='Total Percent\n(Female/Male split)')
# iterate through the bar containers
for c in ax.containers:
    # get the Skill label
    label = c.get_label()
    # use the Skill label to get the current group based on level, join the strings, and get an array of custom labels
    labels = percent_s.loc[:, percent_s.columns.get_level_values(0).isin([label])].agg('/'.join, axis=1).values
    # add the custom labels to the center of the bars
    ax.bar_label(c, labels=labels, label_type='center')
    # add total percent to the top of the bars
    ax.bar_label(c, weight='bold', fmt='%g%%')
percent_s
Skills      Above basic         Basic           Low
Gender           Female  Male  Female  Male  Female  Male
Math               50.0  50.0    50.0  50.0    50.0  50.0
Literature         50.0  50.0    40.0  60.0    33.3  66.7
Physics            48.0  52.0    47.6  52.4    55.2  44.8
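The code above was tested on pandas 1.5.3. In pandas 2.1 and later, passing axis=1 to groupby is itself deprecated, so on newer versions one option is to group the transposed frame instead. This is a sketch of that alternative, not part of the original answer; it reuses the question's data and computes the same totals and percentages:

```python
import pandas as pd

iterables = [["Above basic", "Basic", "Low"], ["Female", "Male"]]
columns = pd.MultiIndex.from_product(iterables, names=["Skills", "Gender"])
df = pd.DataFrame(
    data=[[36, 36, 8, 8, 6, 6], [46, 46, 2, 3, 1, 2], [24, 26, 10, 11, 16, 13]],
    index=["Math", "Literature", "Physics"],
    columns=columns,
)

# Total per Skill: group the rows of the transposed frame, then transpose back
dfg = df.T.groupby(level="Skills").sum().T

# Same totals broadcast to the shape of df, so the division aligns cell by cell
totals = df.T.groupby(level="Skills").transform("sum").T
percent = df.div(totals).mul(100).round(1)

print(dfg)
print(percent)
```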
Optionally, melt df into a long form, and use sns.catplot with kind='bar' to plot each 'Gender' in a separate Facet.
# melt df into a long form
dfm = df.melt(ignore_index=False).reset_index(names='Subject')
# plot the melted dataframe
g = sns.catplot(kind='bar', data=dfm, x='Subject', y='value', col='Gender', hue='Skills')
# Flatten the axes for ease of use
axes = g.axes.ravel()
# relabel the yaxis
axes[0].set_ylabel('Percent')
# add bar labels
for ax in axes:
    for c in ax.containers:
        ax.bar_label(c, fmt='%0.1f%%')
Or swap x= and col= to col='Subject' and x='Gender'.
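Both snippets rely on bar_label, which is available from matplotlib 3.4. Here is a minimal matplotlib-only sketch of its two modes (the bar heights and label strings are taken from the tables above; the plot itself is only an illustration): labels= places custom strings, while fmt= formats the bar heights.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# 'Above basic' totals per subject, from dfg
bars = ax.bar(["Math", "Literature", "Physics"], [72, 92, 50])

# Custom strings in the middle of each bar (Female/Male split)
custom = ["50.0/50.0", "50.0/50.0", "48.0/52.0"]
center_texts = ax.bar_label(bars, labels=custom, label_type="center")

# Bar height formatted as a percentage on top of each bar
top_texts = ax.bar_label(bars, fmt="%g%%", weight="bold")

print([t.get_text() for t in center_texts])
print([t.get_text() for t in top_texts])
```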
I am trying to create a stacked bar chart built using time series data. My issue -- if I plot my data as time series (using lines) then everything works fine and I get a (messy) time series graph that includes correct dates. However, if I instead try to plot this as a stacked bar chart, my dates disappear and none of my bars appear.
I have tried messing with the indexing, height, and width of the bars. No luck.
Here is my code:
import pylab
import pandas as pd
import matplotlib.pyplot as plt
df1= pd.read_excel('pathway/filename.xls')
df1.set_index('TIME', inplace=True)
ax = df1.plot(kind="Bar", stacked=True)
ax.set_xlabel("Date")
ax.set_ylabel("Change in Yield")
df1.sum(axis=1).plot( ax=ax, color="k", title='Historical Decomposition -- 1 year -- One-Quarter Revision')
plt.axhline(y=0, color='r', linestyle='-')
plt.show()
If I change
ax = df1.plot(kind="Bar", stacked=True)
to ax = df1.plot(kind="line", stacked=False)
I get:
If instead I use ax = df1.plot(kind="Bar", stacked=True)
I get:
Any thoughts here?
Without knowing what the data looks like, I'd try something like this:
#Import data here and generate DataFrame
print(df.head(5))
                A     B     C     D
DATE
2020-01-01  -0.01  0.06  0.40  0.45
2020-01-02  -0.02  0.05  0.39  0.42
2020-01-03  -0.03  0.04  0.38  0.39
2020-01-04  -0.04  0.03  0.37  0.36
2020-01-05  -0.05  0.02  0.36  0.33
f, ax = plt.subplots()
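# label each artist so that ax.legend() has entries to show
ax.bar(df.index, df['A'], label='A')
ax.bar(df.index, df['B'], label='B')
ax.bar(df.index, df['C'], bottom=df['B'], label='C')
ax.plot(df.index, df['D'], color='black', linewidth=2, label='D')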
ax.set_xlabel('Date')
ax.set_ylabel('Change in Yield')
ax.axhline(y=0, color='r')
ax.set_xticks([])
ax.legend()
plt.show()
Edit: OK, I found a way by looking at this post:
Plot Pandas DataFrame as Bar and Line on the same one chart
Try resetting the index so that it is a separate column. In my example, it is called 'DATE'. Then try:
ax = df[['DATE','D']].plot(x='DATE',color='black')
df[['DATE','A','B','C']].plot(x='DATE', kind='bar',stacked=True,ax=ax)
ax.axhline(y=0, color='r')
ax.set_xticks([])
ax.set_xlabel('Date')
ax.set_ylabel('Change in Yield')
ax.legend()
plt.show()
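Another way to keep the bars and the line aligned is sketched below. It relies on the fact that pandas draws bar plots at integer positions 0..n-1 rather than at the dates themselves, so the line is drawn against those same positions and the dates are put back as tick labels afterwards. The data is the sample frame printed above; treat this as one possible approach, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data shaped like the frame printed in the answer
df = pd.DataFrame(
    {
        "A": [-0.01, -0.02, -0.03, -0.04, -0.05],
        "B": [0.06, 0.05, 0.04, 0.03, 0.02],
        "C": [0.40, 0.39, 0.38, 0.37, 0.36],
        "D": [0.45, 0.42, 0.39, 0.36, 0.33],
    },
    index=pd.date_range("2020-01-01", periods=5, name="DATE"),
)

# pandas places the stacked bars at integer positions 0..n-1
ax = df[["A", "B", "C"]].plot(kind="bar", stacked=True)

# Draw the line on the same integer positions so it lines up with the bars
positions = list(range(len(df)))
(line,) = ax.plot(positions, df["D"].to_numpy(), color="black", linewidth=2, label="D")

# Restore readable dates on the x-axis
ax.set_xticklabels(df.index.strftime("%Y-%m-%d"), rotation=45)
ax.axhline(y=0, color="r")
ax.set_xlabel("Date")
ax.set_ylabel("Change in Yield")
ax.legend()
```

Unlike the snippets above, this keeps the date tick labels instead of removing them with ax.set_xticks([]).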
I have a dataframe with variable scale data, I am trying to get a plot with subplots. something like this.
raw_data = {'strike_date': ['2019-10-31', '2019-11-31','2019-12-31','2020-01-31', '2020-02-31'],
'strike': [100.00, 113.00, 125.00, 126.00, 135.00],
'lastPrice': [42, 32, 36, 18, 23],
'volume': [4, 24, 31, 2, 3],
'openInterest': [166, 0, 0, 62, 12]}
ploty_df = pd.DataFrame(raw_data, columns = ['strike_date', 'strike', 'lastPrice', 'volume', 'openInterest'])
ploty_df
   strike_date  strike  lastPrice  volume  openInterest
0   2019-10-31   100.0         42       4           166
1   2019-11-31   113.0         32      24             0
2   2019-12-31   125.0         36      31             0
3   2020-01-31   126.0         18       2            62
4   2020-02-31   135.0         23       3            12
This is what I have tried so far with twinx. As you can see, the output is flat, with no scale difference between strike and volume.
fig, ax = plt.subplots()
fig.subplots_adjust(right=0.75)
mm = ax.twinx()
yy = ax.twinx()
for col in ploty_df.columns:
    mm.plot(ploty_df.index,ploty_df[[col]],label=col)
mm.set_ylabel('volume')
yy.set_ylabel('strike')
yy.spines["right"].set_position(("axes", 1.2))
yy.set_ylim(mm.get_ylim()[0]*12, mm.get_ylim()[1]*12)
plt.tick_params(axis='both', which='major', labelsize=16)
handles, labels = mm.get_legend_handles_labels()
mm.legend(fontsize=14, loc=6)
plt.show()
and the output
The main problem with your script is that you create three axes but plot on only one of them. Think of each axes as a separate object with its own y-scale, y-limits, and so on. For example, when you call fig, ax = plt.subplots(), you create the first axes, ax (the standard y-axis with its scale on the left side of the plot). To plot on this axes, call ax.plot(); in your script, however, everything is plotted on the axes you called mm.
I recommend going through the matplotlib documentation to understand these concepts better. For plotting on multiple y-axes, have a look at this example.
Below is a basic example that plots your data on three different y-axes; you can take it as a starting point for the graph you are looking for.
#convert the index of your dataframe to datetime
ploty_df.index = pd.DatetimeIndex(ploty_df.strike_date)
fig, ax = plt.subplots(figsize=(15, 7))
fig.subplots_adjust(right=0.75)
# first y-axis (left): strike
l1, = ax.plot(ploty_df['strike'], 'r')
ax.set_ylabel('Strike')
# second y-axis (right): lastPrice
ax2 = ax.twinx()
l2, = ax2.plot(ploty_df['lastPrice'], 'g')
ax2.set_ylabel('lastPrice')
# third y-axis (right, shifted outward): volume
ax3 = ax.twinx()
l3, = ax3.plot(ploty_df['volume'], 'b')
ax3.set_ylabel('volume')
ax3.spines["right"].set_position(("axes", 1.2))
ax3.spines["right"].set_visible(True)
ax.legend((l1, l2, l3), ('Strike', 'lastPrice', 'volume'), loc='center left')
Here is the result:
P.S. Your example dataframe contains nonexistent dates (31 November 2019 and 31 February 2020), so you have to fix those before you can convert the index to datetime.
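One way to find the offending dates before building the DatetimeIndex is sketched below: parsing with errors='coerce' turns impossible dates into NaT, which can then be listed. The strike_dates values are copied from the question; how to correct them is left open, since the intended dates are not stated.

```python
import pandas as pd

# strike_date values exactly as given in the question
strike_dates = ["2019-10-31", "2019-11-31", "2019-12-31", "2020-01-31", "2020-02-31"]

# Impossible calendar dates become NaT instead of raising an error
parsed = pd.to_datetime(pd.Series(strike_dates), format="%Y-%m-%d", errors="coerce")

# Collect the original strings that failed to parse
invalid = [d for d, bad in zip(strike_dates, parsed.isna()) if bad]
print(invalid)  # ['2019-11-31', '2020-02-31']
```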
I am trying to plot a line plot on top of a stacked bar plot in matplotlib, but cannot get them both to show up.
I have the combined dataframe already set up by pulling various information from other dataframes and with the datetime index. I am trying to plot a stacked bar plot from the activity columns (LightlyActive, FairlyActive, VeryActive) and several line plots from the minutes in each sleep cycle (wake, light, deep, rem) on one set of axes (ax1). I am then trying to plot the efficiency column as a line plot on a separate set of axes (ax2).
I cannot get both the stacked bar plot and the line plots to show up simultaneously. If I plot the bar plot second, that is the only one that shows up. If I plot the line plots first (activity and efficiency) those are the only ones that show up. It seems like whichever style of plot I plot second covers up the first one.
            LightlyActive  FairlyActive  VeryActive  efficiency  wake  light  deep   rem
dateTime
2018-04-10            314            34         123        93.0  55.0  225.0  72.0  99.0
2018-04-11            253            22         102        96.0  44.0  260.0  50.0  72.0
2018-04-12            282            26          85        93.0  47.0  230.0  60.0  97.0
2018-04-13            292            35          29        96.0  43.0  205.0  81.0  85.0
fig, ax1 = plt.subplots(figsize = (10, 10))
temp_df[['LightlyActive', 'FairlyActive', 'VeryActive']].plot(kind = 'bar', stacked = True, ax = ax1)
ax2 = plt.twinx(ax = ax1)
temp_df[['wake', 'light', 'deep', 'rem']].plot(ax = ax1)
temp_df['efficiency'].plot(ax = ax2)
plt.show()
I would like to have a single plot with a stacked bar plot of activity levels ('LightlyActive', 'FairlyActive', 'VeryActive') and sleep cycles ('wake', 'light', 'deep', 'rem') on one set of axes, and sleep efficiency on a second set of axes.
EDIT
I am not even getting it to display as Trenton did in the edited version below (designated as "Edited by Trenton M"). The 2 plots immediately below this are the versions that display for me.
This is what I get so far (Edited by Trenton M):
Note the circled areas.
Figured it out! By leaving the dates as a column (i.e. not setting them as the index), I can plot both the line plot and bar plot. I can then go back and adjust labels accordingly.
@ScottBoston your x-axis tipped me off. Thanks for looking into this.
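import pandas as pd
import matplotlib.pyplot as plt
# pd.datetime was removed in pandas 2.0; pd.Timestamp builds the same date
date1 = pd.Timestamp(2018, 4, 10)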
data = {'LightlyActive': [314, 253, 282, 292],
'FairlyActive': [34, 22, 26, 35],
'VeryActive': [123, 102, 85, 29],
'efficiency': [93.0, 96.0, 93.0, 96.0],
'wake': [55.0, 44.0, 47.0, 43.0],
'light': [225.0, 260.0, 230.0, 205.0],
'deep': [72.0, 50.0, 60.0, 81.0],
'rem': [99.0, 72.0, 97.0, 85.0],
'date': [date1 + pd.Timedelta(days = i) for i in range(4)]}
temp_df = pd.DataFrame(data)
fig, ax1 = plt.subplots(figsize = (10, 10))
ax2 = plt.twinx(ax = ax1)
temp_df[['LightlyActive', 'FairlyActive', 'VeryActive']].\
plot(kind = 'bar', stacked = True, ax = ax1)
temp_df[['wake', 'light', 'deep', 'rem']].plot(ax = ax1, alpha = 0.5)
temp_df['efficiency'].plot(ax = ax2)
ax1.set_xticklabels(labels = temp_df['date'])
plt.show()
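A likely explanation for why this works: pandas draws bar plots at integer positions 0..n-1, whereas a line plot on a datetime index uses date coordinates, so the two end up in different x-ranges on the same axes. With the dates kept as a column, the index is a plain range and both plot types share the same positions. As a sketch of an alternative that keeps the datetime index (an assumption on my part, not something tried in the question), the line plots can be drawn with use_index=False so that they also use positions 0..n-1:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Same values as in the question, with the dates kept as the index
temp_df = pd.DataFrame(
    {
        "LightlyActive": [314, 253, 282, 292],
        "FairlyActive": [34, 22, 26, 35],
        "VeryActive": [123, 102, 85, 29],
        "efficiency": [93.0, 96.0, 93.0, 96.0],
        "wake": [55.0, 44.0, 47.0, 43.0],
        "light": [225.0, 260.0, 230.0, 205.0],
        "deep": [72.0, 50.0, 60.0, 81.0],
        "rem": [99.0, 72.0, 97.0, 85.0],
    },
    index=pd.date_range("2018-04-10", periods=4, name="dateTime"),
)

fig, ax1 = plt.subplots(figsize=(10, 10))
ax2 = ax1.twinx()

# Bars are placed at integer positions 0..3
temp_df[["LightlyActive", "FairlyActive", "VeryActive"]].plot(kind="bar", stacked=True, ax=ax1)

# use_index=False makes the lines use positions 0..3 as well
temp_df[["wake", "light", "deep", "rem"]].plot(ax=ax1, use_index=False)
temp_df["efficiency"].plot(ax=ax2, use_index=False, color="black", linestyle="--")

# Put readable dates back on the shared x-axis
ax1.set_xticklabels(temp_df.index.strftime("%Y-%m-%d"), rotation=0)
```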
What about using alpha?
fig, ax1 = plt.subplots(figsize = (10, 10))
temp_df[['LightlyActive', 'FairlyActive', 'VeryActive']].plot(kind = 'bar', stacked = True, ax = ax1, alpha=.3)
ax2 = plt.twinx(ax = ax1)
temp_df[['wake', 'light', 'deep', 'rem']].plot(ax = ax1, zorder=10)
temp_df['efficiency'].plot(ax = ax2)
plt.show()
Output: