Plot multiple grouped bar charts with matplotlib in Python

Here are a few rows of my 100k-line DataFrame:
data.head()
My goal is to have 4 grouped bar charts (1 row, 4 columns) where:
Each chart corresponds to a "Product family" (I know I have 4, so I can make 4 sub-DataFrames),
"Site" and "year" are on the x axis,
"sum of Tonnage" is on the y axis.
Example of the bar chart I'm trying to get
The closest I got is the 4 plots stacked one under the other, and the code is not as elegant as I want it to be.
I'm a beginner, so this might look too easy for you. Just bear with me :)
Here's my code:
data_A=data_no_dp.loc[data_no_dp['Product family']=='A'][['id','Site','Tonnage','Année']].drop_duplicates('id')
data_B=data_no_dp.loc[data_no_dp['Product family']=='B'][['id','Site','Tonnage','Année']].drop_duplicates('id')
data_C=data_no_dp.loc[(data_no_dp['Product family']=='C') ][['id','Site','Tonnage','Année']].drop_duplicates('id')
data_D=data_no_dp.loc[(data_no_dp['Product family']=='D') ][['id','Site','Tonnage','Année','Product family']].drop_duplicates('id')
data_A_pivot=data_A.groupby(['Site','Année']).sum().unstack()
data_A_pivot=data_A_pivot['Tonnage'].replace(np.nan,0)
data_B_pivot=data_B.groupby(['Site','Année']).sum().unstack()
data_B_pivot=data_B_pivot['Tonnage'].replace(np.nan,0)
data_C_pivot=data_C.groupby(['Site','Année']).sum().unstack()
data_C_pivot=data_C_pivot['Tonnage'].replace(np.nan,0)
data_D_pivot=data_D.groupby(['Site','Année']).sum().unstack()
data_D_pivot=data_D_pivot['Tonnage'].replace(np.nan,0)
#plt.subplots(1,4, sharey=True, figsize= (20,4))
plt.subplot(2,2,1)
ax1=data_A_pivot.plot(kind='bar')
ax2=data_B_pivot.plot(kind='bar')
ax3=data_C_pivot.plot(kind='bar')
ax4=data_D_pivot.plot(kind='bar')
plt.show()

Since no data was provided, I drew multiple graphs using sample data from seaborn. To place pandas plots side by side, create the subplots first and pass each Axes to DataFrame.plot through the ax argument.
import matplotlib.pyplot as plt
# for sample data
import seaborn as sns
tips = sns.load_dataset("tips")
data_A = tips[tips['day'] == 'Sun']
data_B = tips[tips['day'] == 'Sat']
data_C = tips[tips['day'] == 'Thur']
data_D = tips[tips['day'] == 'Fri']
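# numeric_only=True skips the categorical columns (smoker, day), which cannot be summed
data_A_pivot=data_A.groupby(['time','sex']).sum(numeric_only=True).unstack().fillna(0)
data_B_pivot=data_B.groupby(['time','sex']).sum(numeric_only=True).unstack().fillna(0)
data_C_pivot=data_C.groupby(['time','sex']).sum(numeric_only=True).unstack().fillna(0)
data_D_pivot=data_D.groupby(['time','sex']).sum(numeric_only=True).unstack().fillna(0)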
fig, [ax1,ax2,ax3,ax4] = plt.subplots(nrows=1, ncols=4, figsize=(20,4))
data_A_pivot.plot(kind='bar', ax=ax1)
data_B_pivot.plot(kind='bar', ax=ax2)
data_C_pivot.plot(kind='bar', ax=ax3)
data_D_pivot.plot(kind='bar', ax=ax4)
plt.show()
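The four near-identical blocks above can also be folded into a single loop. The following is only a sketch of one possible arrangement, not the answerer's code: the data is made up, the column names ('Product family', 'Site', 'Année', 'Tonnage') are taken from the question, and the helper name plot_family_bars is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data; only the column names come from the question.
data = pd.DataFrame({
    "Product family": ["A", "A", "A", "B", "B", "C", "C", "D"],
    "Site": ["S1", "S1", "S2", "S1", "S2", "S1", "S2", "S2"],
    "Année": [2019, 2020, 2019, 2019, 2020, 2020, 2020, 2019],
    "Tonnage": [10.0, 12.0, 7.0, 5.0, 9.0, 3.0, 4.0, 8.0],
})


def plot_family_bars(df):
    """Draw one grouped bar chart per product family in a single row."""
    families = sorted(df["Product family"].unique())
    fig, axes = plt.subplots(nrows=1, ncols=len(families), sharey=True,
                             figsize=(5 * len(families), 4), squeeze=False)
    for ax, family in zip(axes[0], families):
        subset = df[df["Product family"] == family]
        # Rows: Site, columns: Année, values: summed Tonnage (0 where missing).
        pivot = subset.pivot_table(index="Site", columns="Année",
                                   values="Tonnage", aggfunc="sum",
                                   fill_value=0)
        pivot.plot(kind="bar", ax=ax, title=f"Product family {family}")
        ax.set_ylabel("Sum of Tonnage")
    fig.tight_layout()
    return fig, axes[0]
```

Calling plot_family_bars(data) followed by plt.show() displays the row of charts. sharey=True is a design choice that makes the tonnage comparable across families; drop it if the magnitudes differ too much.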

Related

Pie chart enclosed with a black line (rectangle)

Below you can see my data and facet plot in matplotlib.
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
import matplotlib.pyplot as plt
import matplotlib as mpl
# Data
data = {
'type_sale': ['g_1','g_2','g_3','g_4','g_5','g_6','g_7','g_8','g_9','g_10'],
'open':[70,20,24,150,80,90,60,90,20,20],
'closed':[30,14,20,10,20,40,10,10,10,10],
}
df = pd.DataFrame(data, columns = ['type_sale',
'open',
'closed',
])
data1 = {
'type_sale': [ 'open','closed'],
'structure':[70,30],
}
df1 = pd.DataFrame(data1, columns = ['type_sale',
'structure',
])
# Plotting
labels = ['open','closed']
fig, axs = plt.subplots(2,2, figsize=(10,8))
plt.subplots_adjust(wspace=0.2, hspace=0.6)
df1.plot(x='type_sale', y='structure',labels=labels,autopct='%1.1f%%',kind='pie', title='Stacked Bar Graph by dataframe',ax=axs[0,0])
df.plot(x='type_sale', kind='bar', stacked=True, title='Stacked Bar Graph by dataframe', ax=axs[0,1])
df.plot(x='type_sale', kind='bar', stacked=True, title='Stacked Bar Graph by dataframe',ax=axs[1,0])
df.plot(x='type_sale', kind='bar', stacked=True,title='Stacked Bar Graph by dataframe', ax=axs[1,1])
plt.suptitle(t='Stacked Bar Graph by dataframe', fontsize=16)
plt.show()
If you compare the pie plot with the others, you can spot a big difference: the pie plot is not enclosed by a black line (rectangle), while the others are.
Can anybody help me solve this problem?
After playing around with it myself, this seems to work, but I think the pie gets stretched, which doesn't look that good.
EDIT
I found a better solution using set_adjustable.
There are also two options for creating the pie chart; the frame and ticks differ a bit between them.
# 1
axs[0,0].pie(df1['structure'],labels=labels,autopct='%1.1f%%',frame=True,radius=10)
axs[0,0].set_title('Stacked Bar Graph by dataframe')
# 2
df1.plot(x='type_sale', y='structure',labels=labels,autopct='%1.1f%%',kind='pie', title='Stacked Bar Graph by dataframe',ax=axs[0,0])
axs[0,0].set_frame_on(True)
axs[0,0].set_adjustable('datalim')
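To see option 2 in isolation, here is a minimal sketch that draws a pie next to a stacked bar chart and turns the pie's frame back on. The data is a shortened, made-up version of the question's, and the function name framed_pie_grid is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def framed_pie_grid():
    """Draw a framed pie chart next to a stacked bar chart."""
    shares = pd.Series([70, 30], index=["open", "closed"], name="structure")
    bars = pd.DataFrame({"open": [70, 20, 24], "closed": [30, 14, 20]},
                        index=["g_1", "g_2", "g_3"])
    fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))
    shares.plot(kind="pie", autopct="%1.1f%%", ax=ax_pie, title="Structure")
    # Axes.pie switches the frame off by default; turn it back on.
    ax_pie.set_frame_on(True)
    # Keep the pie circular by adapting the data limits, not the box size.
    ax_pie.set_adjustable("datalim")
    ax_pie.set_ylabel("")
    bars.plot(kind="bar", stacked=True, ax=ax_bar, title="Sales by type")
    return fig, ax_pie, ax_bar
```

Call framed_pie_grid() and then plt.show() to display the figure.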

Plotting multiple lines grouped by one column dataframe, with date time as x axis [duplicate]

In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a dataframe object.
However, this is producing two plots, one for each class.
How do I force one plot with both classes in the same plot?
Version 1:
You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:
import pandas as pd
import matplotlib.pyplot as plt
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
Unfortunately, the labeling of the legend does not make too much sense here.
Version 2:
Another way would be to loop through the groups and plot the curves manually:
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
for label, df in p_df.groupby('class'):
    df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
Another approach would be to use the seaborn module. It plots the two density estimates on the same axes without needing a variable to hold the axes, as follows (using some of the data frame setup from the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame
df = pd.DataFrame({"cls": classes, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"],label='c1');
sns.kdeplot(df.vals[df.cls == "c2"],label='c2');
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.show()
This results in the following image.
There are two easy methods to plot each group in the same plot.
When using pandas.DataFrame.groupby, the column to be plotted (e.g. the aggregation column) should be specified.
Use seaborn.kdeplot or seaborn.displot and specify the hue parameter
Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
Imports and Sample Data
For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
   duration  waiting   kind
0     3.600       79   long
1     1.800       54  short
2     3.333       74   long
3     2.283       62  short
4     4.533       85   long
Plot with pandas.DataFrame.plot
Reshape the data using .groupby or .pivot
.groupby
Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
Plot with seaborn.kdeplot
Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')
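The answer notes that the same steps carry over to other plot kinds such as kind='line'. As a sketch of that case with dates on the x axis, the example below pivots made-up long-format data so that each class becomes its own column and then draws all lines on one Axes. The column names ('date', 'class', 'vals') and the function name plot_lines_by_class are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up long-format data: one row per (date, class) pair.
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"] * 2),
    "class": ["c1"] * 3 + ["c2"] * 3,
    "vals": [1.0, 3.0, 5.0, 2.0, 6.0, 7.0],
})


def plot_lines_by_class(df):
    """Plot one line per class on a shared Axes with dates on the x axis."""
    # Wide format: dates as the index, one column per class.
    wide = df.pivot(index="date", columns="class", values="vals")
    fig, ax = plt.subplots(figsize=(8, 4))
    wide.plot(kind="line", ax=ax)
    ax.set_ylabel("vals")
    return fig, ax, wide
```

Because the pivoted column names become the legend labels, no manual legend handling is needed here.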
Maybe you can try this:
fig, ax = plt.subplots(figsize=(10,8))
classes = list(df['class'].unique())
for c in classes:
    df2 = df.loc[df['class'] == c]
    df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()

Adjusting the color coding on a barplot so that all values are color coded correctly in matplotlib

I have a barplot that plots Rates by State and by Category (there are 5 categories) but the problem is that some States have more categories than other states.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"state" : ["AL","AL","AL","AK", ],
"status" : ["Booked", "Rejected","Cancelled","Rejected"],
"0" : [1.5,2.5,3.5,1.0]})
df2 = df.groupby(['state','status']).size()/df.groupby(['state']).size()
fig, ax = plt.subplots()
plt.xlabel('State')
plt.ylabel('Bookings')
my_colors = 'gyr'
df2.plot(kind='bar', color=my_colors, orientation='vertical')
plt.tight_layout()
plt.show()
This does most of what I need. However, because some states do not have every status value, the missing combinations do not appear in the plot, and the color coding becomes incorrect: the colors simply repeat every 5 colors instead of following the status, regardless of whether a value is missing. What can I do about this?
Possibly you want to show the data in a grouped fashion, namely to have 3 categories per group, such that each category has its own color.
In this case it seems this can easily be achieved by unstacking the multi-index dataframe,
df2.unstack().plot(...)
Complete example:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"state" : ["AL","AL","AL","AK", ],
"status" : ["Booked", "Rejected","Cancelled","Rejected"],
"0" : [1.5,2.5,3.5,1.0]})
df2 = df.groupby(['state','status']).size()/df.groupby(['state']).size()
fig, ax = plt.subplots()
plt.xlabel('State')
plt.ylabel('Bookings')
my_colors = 'gyr'
df2.unstack().plot(kind='bar', color=my_colors, orientation='vertical', ax=ax)
plt.tight_layout()
plt.show()
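If each status should keep the same color regardless of column order, one option is to look the colors up by status name after unstacking. This is a sketch under assumptions: the status_colors mapping and the function name plot_status_shares are hypothetical, and Series.div with level="state" is used here in place of the plain division from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "state": ["AL", "AL", "AL", "AK"],
    "status": ["Booked", "Rejected", "Cancelled", "Rejected"],
})
# Hypothetical status-to-color mapping, chosen only for illustration.
status_colors = {"Booked": "green", "Cancelled": "gold", "Rejected": "red"}


def plot_status_shares(frame, palette):
    """Grouped bars of status shares per state, one fixed color per status."""
    counts = frame.groupby(["state", "status"]).size()
    totals = frame.groupby("state").size()
    shares = counts.div(totals, level="state")
    # Rows: state, columns: status, NaN where a state lacks a status.
    wide = shares.unstack()
    colors = [palette[status] for status in wide.columns]
    fig, ax = plt.subplots()
    wide.plot(kind="bar", color=colors, ax=ax)
    ax.set_xlabel("State")
    ax.set_ylabel("Share of bookings")
    return fig, ax, wide
```

Calling plot_status_shares(df, status_colors) leaves a gap where a state has no rows for a status, so the remaining bars keep their assigned colors.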

Scatter plot from multiple columns of a pandas dataframe

I have a pandas dataframe that looks as below:
Filename GalCer(18:1/12:0)_IS GalCer(d18:1/16:0) GalCer(d18:1/18:0)
0 A-1-1 15.0 1.299366 40.662458 0.242658 6.891069 0.180315
1 A-1-2 15.0 1.341638 50.237734 0.270351 8.367316 0.233468
2 A-1-3 15.0 1.583500 47.039423 0.241681 7.902761 0.201153
3 A-1-4 15.0 1.635365 53.139610 0.322680 9.578195 0.345681
4 B-1-10 15.0 2.370330 80.209846 0.463770 13.729810 0.395355
I am trying to plot scatter subplots with a shared x-axis, with the first column "Filename" on the x-axis. While I am able to generate bar plots, the following code gives me a KeyError for a scatter plot:
import matplotlib.pyplot as plt
colnames = list (qqq.columns)
qqq.plot.scatter(x=qqq.Filename, y=colnames[1:], legend=False, subplots = True, sharex = True, figsize = (10,50))
KeyError: "['A-1-1' 'A-1-2' 'A-1-3' 'A-1-4' 'B-1-10' ] not in index"
The following code for barplots works fine. Do I need to specify something differently for the scatterplots?
import matplotlib.pyplot as plt
colnames = list (qqq.columns)
qqq.plot(x=qqq.Filename, y=colnames[1:], kind = 'bar', legend=False, subplots = True, sharex = True, figsize = (10,30))
A scatter plot requires numeric values on both axes. In this case you can use the index as the x values:
df.reset_index().plot(x="index", y="other column")
The remaining problem is that the pandas scatter plot wrapper cannot plot several columns at once. Depending on your reason for using a scatter plot, you may decide to use a line plot instead, just without lines: pass linestyle="none" and marker="o" to plot so that only points appear.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
fn = ["{}_{}".format(i,j) for i in list("ABCD") for j in range(4)]
df = pd.DataFrame(np.random.rand(len(fn), 4), columns=list("ZXYQ"))
df.insert(0,"Filename",pd.Series(fn))
colnames = list (df.columns)
df.reset_index().plot(x="index", y=colnames[1:], kind='line', legend=False,
                      subplots=True, sharex=True, figsize=(5.5, 4), ls="none", marker="o")
plt.show()
In case you absolutely need a scatter plot, you may create a subplots grid first and then iterate over the columns and axes to plot one scatter plot at a time to the respective axes.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
fn = ["{}_{}".format(i,j) for i in list("ABCD") for j in range(4)]
df = pd.DataFrame(np.random.rand(len(fn), 4), columns=list("ZXYQ"))
df.insert(0,"Filename",pd.Series(fn))
colnames = list (df.columns)
fig, axes = plt.subplots(nrows=len(colnames)-1, sharex = True,figsize = (5.5,4),)
for i, ax in enumerate(axes):
    df.reset_index().plot(x="index", y=colnames[i+1], kind='scatter', legend=False,
                          ax=ax, c=colnames[i+1], cmap="inferno")
plt.show()
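If the file names should still appear on the x axis, one possibility is to scatter against integer positions with plain matplotlib and then relabel the ticks. This is a sketch with made-up values, not part of the original answer; the function name scatter_columns and the sample columns "X" and "Y" are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample data; only the "Filename" column name is from the question.
sample = pd.DataFrame({
    "Filename": ["A-1-1", "A-1-2", "A-1-3"],
    "X": [1.30, 1.34, 1.58],
    "Y": [40.7, 50.2, 47.0],
})


def scatter_columns(df, label_col):
    """One scatter subplot per value column, sharing a labelled x axis."""
    value_cols = [c for c in df.columns if c != label_col]
    positions = np.arange(len(df))  # numeric stand-in for the labels
    fig, axes = plt.subplots(nrows=len(value_cols), sharex=True,
                             figsize=(6, 2 * len(value_cols)), squeeze=False)
    for ax, col in zip(axes[:, 0], value_cols):
        ax.scatter(positions, df[col])
        ax.set_ylabel(col)
    # The x axis is shared, so labelling the bottom Axes is enough.
    axes[-1, 0].set_xticks(positions)
    axes[-1, 0].set_xticklabels(df[label_col], rotation=90)
    return fig, axes[:, 0]
```

scatter_columns(sample, "Filename") returns the figure and its Axes; call plt.show() afterwards to display them.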

Combining FacetGrid and dual Y-axis in Pandas

I am trying to plot two different variables (linked by a relation of causality), delai_jour and date_sondage on a single FacetGrid. I can do it with this code:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True,)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
g = g.map(plt.bar, "date_sondage", "impossible")
which gives me this:
FacetGrid
(There are 33 of them in total).
I'm interested in comparing the patterns across the various prefecture, but due to the difference in magnitude I cannot see the changes in the line chart.
For this specific work, the best approach would be a secondary y axis, but I can't seem to make anything work: it doesn't look like it's possible with FacetGrid, and I neither understood the code nor was able to replicate the examples I've seen with pure matplotlib.
How should I go about it?
I got this to work by iterating through the axes and plotting a secondary axis as in a typical Seaborn graph.
Using the OP example:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
# Note: this pairing assumes the facets appear in the same order as the groupby groups.
for ax, (_, subdata) in zip(g.axes, df_verif_sum.groupby('prefecture')):
    ax2 = ax.twinx()
    subdata.plot(x='date_sondage', y='impossible', ax=ax2, legend=False, color='r')
If you do any formatting to the x-axis, you may have to do it to both ax and ax2.
Here's an example where you apply a custom mapping function to the dataframe of interest. Within the function, you can call plt.gca() to get the current axis at the facet being currently plotted in FacetGrid. Once you have the axis, twinx() can be called just like you would in plain old matplotlib plotting.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
def facetgrid_two_axes(*args, **kwargs):
    # map_dataframe passes the facet's subset as `data` and a default `color`
    data = kwargs.pop('data')
    dual_axis = kwargs.pop('dual_axis')
    alpha = kwargs.pop('alpha', 0.2)
    kwargs.pop('color')
    ax = plt.gca()
    if dual_axis:
        ax2 = ax.twinx()
        ax2.set_ylabel('Second Axis!')
    ax.plot(data['x'], data['y1'], **kwargs, color='red', alpha=alpha)
    if dual_axis:
        # plot the facet's own subset, not the global df
        ax2.plot(data['x'], data['y2'], **kwargs, color='blue', alpha=alpha)
df = pd.DataFrame()
df['x'] = np.arange(1,5,1)
df['y1'] = 1 / df['x']
df['y2'] = df['x'] * 100
df['facet'] = 'foo'
df2 = df.copy()
df2['facet'] = 'bar'
df3 = pd.concat([df,df2])
win_plot = sns.FacetGrid(df3, col='facet', height=6)
(win_plot.map_dataframe(facetgrid_two_axes, dual_axis=True)
.set_axis_labels("X", "First Y-axis"))
plt.show()
This isn't the prettiest plot as you might want to adjust the presence of the second y-axis' label, the spacing between plots, etc. but the code suffices to show how to plot two series of differing magnitudes within FacetGrids.
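For comparison, the same dual-axis layout can be sketched without seaborn by building the grid with plt.subplots and adding a twin axis per group. The data below is made up, the column names ('prefecture', 'date_sondage', 'delai_jour', 'impossible') come from the question, and the function name plot_dual_axis_facets is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data; only the column names come from the question.
polls = pd.DataFrame({
    "prefecture": ["P1"] * 3 + ["P2"] * 3,
    "date_sondage": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-03-01"] * 2),
    "delai_jour": [10, 12, 9, 40, 45, 38],
    "impossible": [100, 300, 200, 5000, 7000, 6500],
})


def plot_dual_axis_facets(df, ncols=2):
    """One panel per prefecture: bars on the left axis, a line on the right."""
    groups = list(df.groupby("prefecture"))
    nrows = -(-len(groups) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, sharex=True,
                             figsize=(6 * ncols, 3 * nrows), squeeze=False)
    twins = []
    for ax, (name, sub) in zip(axes.flat, groups):
        ax.bar(sub["date_sondage"], sub["impossible"], width=20)  # width in days
        ax.set_title(name)
        ax.set_ylabel("impossible")
        ax.tick_params(axis="x", labelrotation=45)
        twin = ax.twinx()  # second y axis sharing the same x axis
        twin.plot(sub["date_sondage"], sub["delai_jour"],
                  color="m", linewidth=2)
        twin.set_ylabel("delai_jour")
        twins.append(twin)
    for ax in axes.flat[len(groups):]:
        ax.set_visible(False)  # hide unused slots in the grid
    fig.tight_layout()
    return fig, axes, twins
```

Each twin axis gets its own y scale, so the line stays readable even when the bar values are orders of magnitude larger.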
