Remove empty bars from grouped barplot - python

I have a grouped barplot. It's working very well, but I try to remove the empty barplots. They take too much space.
I have already tried:
%matplotlib inline
import matplotlib as mpl
from matplotlib.gridspec import GridSpec
import matplotlib.pyplot as plt
import sys
import os
import glob
import seaborn as sns
import pandas as pd
sns.set(style= "whitegrid", palette="pastel", color_codes=True )
tab_folder = 'myData'
out_folder ='myData/plots'
tab = glob.glob('%s/R*.tab'%(tab_folder))
#is reading all my data
for i, tab_file in enumerate(tab):
    folder,file_name=os.path.split(tab_file)
    s_id=file_name[:-4].replace('DD','')
    # DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it
    df=pd.read_csv(tab_file, sep='\t', index_col=0)
    df_2 = df.groupby(['name','ab']).size().reset_index(name='count')
    df_2 = df_2[df_2['count'] != 0]
    table = pd.pivot_table(df_2, index='name',columns='ab', values='count' )
    # one figure per input file, so that ax exists before plotting
    fig, ax = plt.subplots()
    table.plot(kind='barh', width = 0.9, color = ['b', 'g', 'r'], ax = ax)
    for label in (ax.get_xticklabels() + ax.get_yticklabels()):
        label.set_fontsize(4)
    ax.set_title(s_id).update({'color':'black', 'size':5, 'family':'monospace'})
    ax.set_xlabel('')
    ax.set_ylabel('')
    handles, labels = ax.get_legend_handles_labels()
    ax.legend(handles[::-1], labels[::-1], bbox_to_anchor=(1, 1.05),prop= {'size': 4} )
    png_t = '%s/%s.b.png'%(out_folder,s_id)
    plt.savefig(png_t, dpi = 500)
    plt.close(fig)
But it's not working. The bars are still the same.
Is there any other method to remove empty bars?

Your question is incomplete. I don't know exactly what you're trying to accomplish, but from what you've said I'd guess that you want to avoid displaying empty pivot pairs.
This is not possible with standard pandas plotting. A grouped bar plot has to reserve a slot for every group, including the NaN cells, which show up as "empty bars".
Furthermore, after groupby every group has a size of at least one, so df_2['count'] != 0 is always true.
For example
df = pd.DataFrame([['nameA', 'abA'], ['nameB', 'abA'],['nameA','abB'],['nameD', 'abD']], columns=['names', 'ab'])
df_2 = df.groupby(['names', 'ab']).size().reset_index(name='count')
df_2 = df_2[df_2['count'] != 0] # this line has no effect
table = pd.pivot_table(df_2, index='names',columns='ab', values='count' )
table
gives
ab abA abB abD
names
nameA 1.00 1.00 NaN
nameB 1.00 NaN NaN
nameD NaN NaN 1.00
and
table.plot(kind='barh', width = 0.9, color = ['b', 'g', 'r'])
shows
And that's the way it is: the plot needs to show all groups after the pivot.
EDIT
You can also use a stacked plot to get rid of the gaps:
table.plot(kind='barh', width = 0.9, color = ['b', 'g', 'r'], stacked=True)
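If keeping the bars side by side matters more than using DataFrame.plot, one workaround (not a pandas option) is to place the bars by hand with Matplotlib's barh, so that each name only gets as many slots as it has non-NaN counts. This is a minimal sketch that assumes the toy table from the example above. The slot spacing and the way the group centres are computed are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Same toy data and pivot as in the example above
df = pd.DataFrame([['nameA', 'abA'], ['nameB', 'abA'], ['nameA', 'abB'], ['nameD', 'abD']],
                  columns=['names', 'ab'])
df_2 = df.groupby(['names', 'ab']).size().reset_index(name='count')
table = pd.pivot_table(df_2, index='names', columns='ab', values='count')

# One colour per 'ab' column, matching color=['b', 'g', 'r'] above
colors = dict(zip(table.columns, ['b', 'g', 'r']))

fig, ax = plt.subplots()
y = 0.0             # next free bar slot
group_centres = []  # y position of each name's tick label
for name, row in table.iterrows():
    present = row.dropna()  # only the (name, ab) pairs that actually have data
    start = y
    for ab, count in present.items():
        ax.barh(y, count, height=0.9, color=colors[ab], label=ab)
        y += 1
    group_centres.append((start + y - 1) / 2)  # centre of this name's bars
    y += 0.5                                   # small gap between names

ax.set_yticks(group_centres)
ax.set_yticklabels(table.index)

# Each 'ab' can appear under several names; keep one legend entry per label
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))
ax.legend(list(unique.values()), list(unique.keys()))
```

Unlike stacked=True, this keeps the bars next to each other. A name with a single non-empty pair simply gets one bar centred on its label.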

Related

Heatmap Fill empty spaces with black

Suppose this is the data at hand:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
data = {'trajectory': [101,102,102,102,102,102,102,102,104,104,104,104,104,104,104,107,107,107,107,
107,107,107,107,107,108,108,108,108,108,108,108,109,109,109,109,109,109,112,
112,112,112,112,113,113,113,113,114,114,114,114],
'segment': [1,1,1,1,2,2,3,3,1,1,2,2,2,3,3,1,1,2,2,2,2,3,3,3,1,1,1,
2,2,2,2,1,1,1,2,2,2,1,1,2,2,2,1,2,2,3,1,2,2,2],
'prediction': [3,0,0,1,3,3,2,2,0,0,4,4,2,0,0,0,0,2,2,2,3,0,0,2,0,0,1,1,
1,1,0,1,2,1,3,3,3,1,1,4,4,2,1,4,4,3,0,3,3,2]}
df = pd.DataFrame(data)
df.head(2)
trajectory segment prediction
0 101 1 3
1 102 1 0
And this is plotted like so:
plot_data = (df.value_counts()
.sort_values(ascending=False)
.reset_index()
.drop_duplicates(['trajectory', 'segment'])
.pivot_table(index='trajectory', columns='segment', values='prediction',))
cmap = mcolors.ListedColormap(['c', 'b', 'g', 'y','m', ])
fig, ax = plt.subplots(figsize=(10,6))
sns.heatmap(plot_data,vmin=-0.5, vmax=4.5,cmap=cmap, annot=True)
Giving:
I want to fill all the white cells with black. For that, I have to replace all NaN values in my plot_data with some value, say 99, and add the black color code k to cmap.
plot_data = (df.value_counts()
.sort_values(ascending=False)
.reset_index()
.drop_duplicates(['trajectory', 'segment'])
.pivot_table(index='trajectory', columns='segment', values='prediction',
fill_value=99))
cmap = mcolors.ListedColormap(['c', 'b', 'g', 'y','m', 'k'])
fig, ax = plt.subplots(figsize=(10,6))
sns.heatmap(plot_data,vmin=-0.5, vmax=4.5,cmap=cmap, annot=True)
And plot again, giving:
Confusion: 4 is coloured k (black), the same as 99, instead of m (magenta). Plus, I do not want the null-value cells annotated with 99. It is only there as a placeholder, since I cannot plot when the NaN values are replaced with a character such as -.
Intended results:
something like the following
You can use set_bad to set the color for masked values of your colorbar to opaque black:
cmap = mcolors.ListedColormap(['c', 'b', 'g', 'y','m',])
cmap.set_bad('k')
(in your colormap definition it's transparent black, that's why you can see the Axes patch in the first place).
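As for the black 4 in the question, it follows from how a ListedColormap is sampled. The value is first normalised with vmin/vmax into [0, 1]. This sketch assumes that a plain Normalize matches what the heatmap does with vmin/vmax. The colormap then picks entry int(x * N) for N colours. With six colours over [-0.5, 4.5], each bin is 5/6 wide, so the bins no longer line up with the integers 0 to 4. The value 99 lies above vmax, so it is clipped to the top colour. Here is a small check with plain Matplotlib:

```python
import matplotlib.colors as mcolors

# vmin/vmax as in the question: integers 0-4 should sit in the middle of each bin
norm = mcolors.Normalize(vmin=-0.5, vmax=4.5)

six = mcolors.ListedColormap(['c', 'b', 'g', 'y', 'm', 'k'])  # with the extra 'k'
five = mcolors.ListedColormap(['c', 'b', 'g', 'y', 'm'])      # original five colours

# Colour actually assigned to each value: a ListedColormap with N colours
# picks entry int(norm(value) * N); values above vmax get the top colour
mapping = {value: (mcolors.to_hex(six(norm(value))), mcolors.to_hex(five(norm(value))))
           for value in [0, 1, 2, 3, 4, 99]}
for value, (with_six, with_five) in mapping.items():
    print(value, with_six, with_five)
```

With six colours, 3 comes out magenta, both 4 and 99 come out black, and 'g' is never used. With five colours, each integer gets its own entry, but the placeholder 99 would then be clipped to magenta. That is why keeping the cells as NaN and colouring them through the bad colour avoids the clash.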
Ah, all I needed was to set the background to black before adding the heatmap, like so:
ax.set_facecolor('black')
sns.heatmap(plot_data, vmin=-0.5, vmax=4.5, cmap=cmap, annot=True)
And that's it.
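The following minimal sketch puts the two answers together. It leaves the empty cells as NaN, so they carry no annotation. It uses the five-colour map, so that 0 to 4 each get their own colour. It paints the masked cells black with set_bad. The data is copied from the question so the block runs on its own. The Agg backend line is only there for running it as a script.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for running as a script; omit in a notebook
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = {'trajectory': [101,102,102,102,102,102,102,102,104,104,104,104,104,104,104,107,107,107,107,
                       107,107,107,107,107,108,108,108,108,108,108,108,109,109,109,109,109,109,112,
                       112,112,112,112,113,113,113,113,114,114,114,114],
        'segment': [1,1,1,1,2,2,3,3,1,1,2,2,2,3,3,1,1,2,2,2,2,3,3,3,1,1,1,
                    2,2,2,2,1,1,1,2,2,2,1,1,2,2,2,1,2,2,3,1,2,2,2],
        'prediction': [3,0,0,1,3,3,2,2,0,0,4,4,2,0,0,0,0,2,2,2,3,0,0,2,0,0,1,1,
                       1,1,0,1,2,1,3,3,3,1,1,4,4,2,1,4,4,3,0,3,3,2]}
df = pd.DataFrame(data)

# Most frequent prediction per (trajectory, segment); missing pairs stay NaN
plot_data = (df.value_counts()
               .sort_values(ascending=False)
               .reset_index()
               .drop_duplicates(['trajectory', 'segment'])
               .pivot_table(index='trajectory', columns='segment', values='prediction'))

# Five colours for the classes 0-4; NaN (masked) cells get the "bad" colour
cmap = mcolors.ListedColormap(['c', 'b', 'g', 'y', 'm'])
cmap.set_bad('k')

fig, ax = plt.subplots(figsize=(10, 6))
sns.heatmap(plot_data, vmin=-0.5, vmax=4.5, cmap=cmap, annot=True, ax=ax)
```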

How to change the legend font size of pd.DataFrame.plot() when `secondary_y` is used?

Question
I have used the secondary_y argument in pd.DataFrame.plot().
While trying to change the fontsize of legends by .legend(fontsize=20), I ended up having only 1 column name in the legend when I actually have 2 columns to be printed on the legend.
This problem (having only 1 column name in the legend) does not occur when I do not use the secondary_y argument.
I want all the column names in my dataframe to be printed in the legend, and change the fontsize of the legend even when I use secondary_y while plotting dataframe.
Example
The following example with secondary_y shows only 1 column name, A, when I actually have 2 columns, A and B.
The fontsize of the legend is changed, but only for 1 column name.
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.randn(24*3, 2),
index=pd.date_range('1/1/2019', periods=24*3, freq='h'))
df.columns = ['A', 'B']
df.plot(secondary_y = ["B"], figsize=(12,5)).legend(fontsize=20, loc="upper right")
When I do not use secondary_y, then legend shows both of the 2 columns A and B.
import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.randn(24*3, 2),
index=pd.date_range('1/1/2019', periods=24*3, freq='h'))
df.columns = ['A', 'B']
df.plot(figsize=(12,5)).legend(fontsize=20, loc="upper right")
To customize it, you can create your graph with Matplotlib's subplots function and a twin y-axis:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
df = pd.DataFrame(np.random.randn(24*3, 2),
index=pd.date_range('1/1/2019', periods=24*3, freq='h'))
df.columns = ['A', 'B']
#define colors to use
col1 = 'steelblue'
col2 = 'red'
#define subplots
fig,ax = plt.subplots()
#add first line to plot
lns1=ax.plot(df.index,df['A'], color=col1)
#add x-axis label
ax.set_xlabel('dates', fontsize=14)
#add y-axis label
ax.set_ylabel('A', color=col1, fontsize=16)
#define second y-axis that shares x-axis with current plot
ax2 = ax.twinx()
#add second line to plot
lns2=ax2.plot(df.index,df['B'], color=col2)
#add second y-axis label
ax2.set_ylabel('B', color=col2, fontsize=16)
#legend
ax.legend(lns1+lns2,['A','B'],loc="upper right",fontsize=20)
#alternatively, create the legend on the figure:
#fig.legend(['A','B'],loc="upper right")
plt.show()
result:
This is a somewhat late response, but something that worked for me was simply calling plt.legend(fontsize=wanted_fontsize) after the plot function.
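Only A appears because pandas draws B on a twin Axes. When secondary_y is used, pandas exposes this twin Axes as ax.right_ax. Calling .legend() on the returned primary Axes only collects that Axes' own lines. The sketch below keeps df.plot and gathers the handles from both Axes before rebuilding the legend. Note that pandas labels the secondary series 'B (right)' unless mark_right=False is passed.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.randn(24 * 3, 2),
                  index=pd.date_range('1/1/2019', periods=24 * 3, freq='h'),
                  columns=['A', 'B'])

ax = df.plot(secondary_y=['B'], figsize=(12, 5))

# 'A' lives on ax, 'B' on the twin Axes pandas stores as ax.right_ax
h1, l1 = ax.get_legend_handles_labels()
h2, l2 = ax.right_ax.get_legend_handles_labels()

# Rebuild a single legend from both sets of handles, with the larger font
leg = ax.legend(h1 + h2, l1 + l2, fontsize=20, loc='upper right')
```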

Line color as a function of column values in pandas dataframe

I am trying to plot two columns of a pandas dataframe against each other, grouped by the values in a third column. The color of each line should be determined by that third column, i.e. one color per group.
For example:
import pandas as pd
from matplotlib import pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3]})
df.groupby('colors').plot('x','y',ax=ax)
If I do it this way, I end up with three different lines plotting x against y, with each line a different color. I now want to determine the color by the values in 'colors'. How do I do this using a gradient colormap?
It looks like seaborn applies the color intensity automatically based on the value in hue:
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'x': [0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3,0.1,0.2,0.3],'y':[1,2,3,2,3,4,4,3,2,3,4,2], 'colors':[0.3,0.3,0.3,0.7,0.7,0.7,1.3,1.3,1.3,1.5,1.5,1.5]})
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors')
Gives:
You can change the colors by adding the palette argument as below:
import seaborn as sns
sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
#more combinations : viridis, mako, flare, etc.
gives:
Edit (for colormap):
based on answers at Make seaborn show a colorbar instead of a legend when using hue in a bar plot?
import seaborn as sns
ax = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors', palette = 'mako')
norm = plt.Normalize(vmin = df['colors'].min(), vmax = df['colors'].max())
sm = plt.cm.ScalarMappable(cmap="mako", norm = norm)
# ax=ax tells the colorbar which Axes to take space from; newer Matplotlib versions raise an error without it
ax.figure.colorbar(sm, ax=ax)
ax.get_legend().remove()
plt.show()
gives:
Hope that helps.
Complementing Prateek's very good answer: once you have assigned the colors based on the intensity of the palette you chose (for example mako):
plots = sns.lineplot(data = df, x = 'x', y = 'y', hue = 'colors',palette='mako')
you can add a colorbar with Matplotlib's plt.colorbar() using the same palette. Give it a norm spanning the data, otherwise the colorbar runs from 0 to 1 instead of matching the 'colors' values:
sm = plt.cm.ScalarMappable(cmap='mako', norm=plt.Normalize(df['colors'].min(), df['colors'].max()))
plt.colorbar(sm, ax=plots)
After plt.show(), we get the combined output:
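For the original question (a gradient colormap without seaborn), here is a minimal Matplotlib-only sketch. It normalises the 'colors' values to [0, 1], looks up each group's value in a colormap, and adds a colorbar built from the same norm. The choice of viridis is arbitrary, and any continuous colormap should work the same way.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for running as a script; omit in a notebook
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'x': [0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
                   'y': [1, 2, 3, 2, 3, 4, 4, 3, 2],
                   'colors': [0.3, 0.3, 0.3, 0.7, 0.7, 0.7, 1.3, 1.3, 1.3]})

cmap = plt.get_cmap('viridis')  # arbitrary gradient colormap
norm = mcolors.Normalize(vmin=df['colors'].min(), vmax=df['colors'].max())

fig, ax = plt.subplots()
for value, group in df.groupby('colors'):
    # Map the group's 'colors' value to [0, 1], then to an RGBA colour
    ax.plot(group['x'], group['y'], color=cmap(norm(value)))

# Colorbar drawn from the same norm and colormap as the lines
sm = plt.cm.ScalarMappable(norm=norm, cmap=cmap)
fig.colorbar(sm, ax=ax, label='colors')
```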

matplotlib categorical bar chart creates unwanted whitespace

I have a dataframe that looks like this:
import numpy as np
import pandas as pd
location = list(range(1, 34))
location += [102, 172]
stress = np.random.randint(1,1000, len(location))
group = np.random.choice(['A', 'B'], len(location))
df = pd.DataFrame({'location':location, 'stress':stress, 'group':group})
df[['location', 'group']] = df[['location', 'group']].astype(str)
Note: location and group are both strings
I'm trying to create a bar plot with location (categorical) on the x axis and stress as the height of each bar. Furthermore, I want to colour each bar with a different colour for each group.
I've tried the following:
f, axarr = plt.subplots(1, 1)
axarr.bar(df['location'], df['stress'])
plt.xticks(np.arange(df.shape[0]) + 1, df['location'])
plt.show()
However, this produces:
I'm not sure why there are blank spaces between the end bars. I'm guessing it's because of the 102 and 172 values in location. However, that column is a string, so I expect it to be treated as a categorical variable, with all bars placed next to each other regardless of the location "value". I tried to correct for this by manually specifying the xtick locations and labels, but it didn't seem to work.
Finally, is there a quick way to colour each bar by group without having to manually iterate over each unique group value?
If location is categorical data, don't use it as the bar positions. Use np.arange(df.shape[0]) for the bar positions and set the tick labels afterwards:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
location = list(range(1, 34))
location += [102, 172]
stress = np.random.randint(1,1000, len(location))
group = np.random.choice(['A', 'B'], len(location))
df = pd.DataFrame({'location':location, 'stress':stress, 'group':group})
df[['location', 'group']] = df[['location', 'group']].astype(str)
f, axarr = plt.subplots(1, 1)
bars = axarr.bar(np.arange(df.shape[0]), df['stress'])
for b, g in zip(bars.patches, df['group']):
    if g == 'A':
        b.set_color('b')
    elif g == 'B':
        b.set_color('r')
# bars are centred on their x positions by default (Matplotlib >= 2.0), so no width/2 offset is needed
plt.xticks(np.arange(df.shape[0]), df['location'])
plt.setp(axarr.xaxis.get_ticklabels(), rotation=90)
plt.show()
I don't know if there is a concise way to set the bar colors in bulk, but an iteration is not too bad...
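For the second part of the question, Axes.bar also accepts a sequence of colours, one per bar. This means the loop can be replaced by mapping each group to a colour up front. The sketch below assumes a hypothetical palette dict {'A': 'b', 'B': 'r'} that mirrors the colours used above. The legend proxies are built with matplotlib.patches.Patch.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for running as a script; omit in a notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
import pandas as pd

location = list(range(1, 34)) + [102, 172]
stress = np.random.randint(1, 1000, len(location))
group = np.random.choice(['A', 'B'], len(location))
df = pd.DataFrame({'location': location, 'stress': stress, 'group': group})
df[['location', 'group']] = df[['location', 'group']].astype(str)

palette = {'A': 'b', 'B': 'r'}  # hypothetical: one colour per group
x = np.arange(len(df))          # evenly spaced bar positions

fig, ax = plt.subplots()
# One colour per bar, looked up from each row's group
ax.bar(x, df['stress'], color=df['group'].map(palette).tolist())
ax.set_xticks(x)
ax.set_xticklabels(df['location'], rotation=90)

# Legend entries built from the same palette
ax.legend(handles=[Patch(color=c, label=g) for g, c in palette.items()])
```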
