How to increase the available space for ylabel? - python

Good morning!
I'm making some bar plots with Seaborn, but I'm having difficulty getting a proper ylabel for them.
Here is a reproducible example:
import pandas as pd
import os
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib
from pdb import set_trace as bp
name = 'test.pdf'
data = pd.DataFrame({'Labels': ['Label', 'Longer label', 'A really really large label'], 'values': [200, 100, 300]})
sns.set_style("dark")
ax = sns.barplot(y = data['Labels'], x = data['values'], data = data)
ax.set(ylabel = 'Labels', xlabel = 'Values')
plt.savefig(name)
plt.close()
As you can see, the second and third labels ('Longer label' and 'A really really large label') are cut off, and I can't figure out how to fix it.
Furthermore, I would like to know how to remove the short black lines at the top and left of the image.
Thank you very much!

You need to specify bbox_inches='tight' when saving the figure:
plt.savefig(name, bbox_inches='tight')
If you are working in Jupyter notebooks, plt.tight_layout() also works for inline plots, as @ALollZ commented above.
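As a rough illustration of what bbox_inches='tight' changes, the sketch below saves the same figure twice into in-memory buffers and reads the pixel width from each PNG header. It uses plain matplotlib (ax.barh) rather than Seaborn so that it is self-contained; the helper png_width, the figure size and the dpi are assumptions made for this example only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

labels = ["Label", "Longer label", "A really really large label"]
values = [200, 100, 300]

fig, ax = plt.subplots(figsize=(4, 3))
ax.barh(labels, values)
ax.set(xlabel="Values", ylabel="Labels")

# Save once with the default bounding box and once with a tight one.
default_buf = io.BytesIO()
tight_buf = io.BytesIO()
fig.savefig(default_buf, format="png", dpi=100)
fig.savefig(tight_buf, format="png", dpi=100, bbox_inches="tight")
plt.close(fig)


def png_width(buf):
    """Read the image width in pixels from the PNG IHDR chunk (hypothetical helper)."""
    return int.from_bytes(buf.getvalue()[16:20], "big")


default_width = png_width(default_buf)
tight_width = png_width(tight_buf)
print(default_width, tight_width)
```

With the default bounding box the image is exactly figsize times dpi, so anything drawn outside the figure area is clipped. The tight bounding box is computed from the artists themselves, so the saved image grows to include the long tick labels.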

Related

Show median and quantiles on Seaborn pairplot (Python)

I am making a corner plot using Seaborn. I would like to display lines on each diagonal histogram showing the median value and quantiles. Example shown below.
I usually do this using the Python package 'corner', which is straightforward. I want to use Seaborn just because it has better aesthetics.
The seaborn plot was made using this code:
import pandas as pd
import seaborn as sns
df = pd.DataFrame(samples_new, columns = ['r1', 'r2', 'r3'])
cornerplot = sns.pairplot(df, corner=True, kind='kde',diag_kind="hist", diag_kws={'color':'darkslateblue', 'alpha':1, 'bins':10}, plot_kws={'color':'darkslateblue', 's':10, 'alpha':0.8, 'fill':False})
Seaborn provides test data sets that come in handy to explain something you want to change to the default behavior. That way, you don't need to generate your own test data, nor to supply your own data that can be complicated and/or sensitive.
To update the subplots in the diagonal, there is g.map_diag(...) which will call a given function for each individual column. It gets 3 parameters: the data used for the x-axis, a label and a color.
Here is an example to add vertical lines for the main quantiles, and change the title. You can add more calculations for further customizations.
import matplotlib.pyplot as plt
import seaborn as sns
def update_diag_func(data, label, color):
    for val in data.quantile([.25, .5, .75]):
        plt.axvline(val, ls=':', color=color)
    plt.title(data.name, color=color)
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, corner=True, diag_kws={'kde': True})
g.map_diag(update_diag_func)
g.fig.subplots_adjust(top=0.97) # provide some space for the titles
plt.show()
Seaborn is built on top of matplotlib, so you can try this:
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
df = pd.DataFrame(samples_new, columns = ['r1', 'r2', 'r3'])
cornerplot = sns.pairplot(df, corner=True, kind='kde',diag_kind="hist", diag_kws={'color':'darkslateblue', 'alpha':1, 'bins':10}, plot_kws={'color':'darkslateblue', 's':10, 'alpha':0.8, 'fill':False})
plt.text(300, 250, "An annotation")
plt.show()

Matplotlib inline in Jupyter - how to control when the plot is shown?

I have a function that creates a figure and for some reason it is shown in Jupyter notebook twice, even though I didn't run show at all. I pass the fig and ax as an output of this function, and plan to show it only later.
I get confused between the plt, fig and ax interfaces and guess that the answer is hidden somewhere there.
Here is an anonymised version of my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
def plot_curve(dummydata):
    # builds a chart
    fig,ax = plt.subplots(1) # get subplots
    fig.set_figheight(7)
    fig.set_figwidth(12) #set shape
    plt.plot(dummydata.x1, dummydata.y1,label = 'l1') #curve 1
    plt.plot(dummydata.x2, dummydata.y2,label = 'l2') #curve2
    plt.xlabel('xlabel') #labels
    plt.ylabel('xlabel')
    plt.yscale('linear') #scale and bounds
    plt.ylim(0,100)
    ymin,ymax= ax.get_ylim()
    ax.axhline(1, color='k', linestyle=':', label = 'lab1') #guideline - horizontal
    ax.axvline(2, color='r',linestyle='--', label = 'lab2') #guideline - vertical
    ax.axvline(3, color='g',linestyle='--', label = 'lab3') #guideline - vertical
    ax.arrow(1,2,3,0, head_width=0.1, head_length=0.01, fc='k', ec='k') # arrow
    rect = mpl.patches.Rectangle((1,2), 2,3, alpha = 0.1, facecolor='yellow',
                                 linewidth=0 , label= 'lab4') #yellow area patch
    ax.add_patch(rect)
    plt.legend()
    plt.title('title')
    return fig,ax
and then call it with:
for i in range(3):
    dummydata = pd.DataFrame({
        'x1':np.arange(1+i,100,0.1),
        'y1':np.arange(11+i,110,0.1),
        'x2':np.arange(1+i,100,0.1),
        'y2':np.arange(21+i,120,0.1)
        })
    fig,ax = plot_curve(dummydata) #get the chart
What should I change to not show the figure by default, and show it only by my command?
Thanks
Try disabling matplotlib interactive mode using plt.ioff(). With interactive mode disabled the plots will only be shown with an explicit plt.show().
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
# Deactivate interactive mode
plt.ioff()
def plot_curve(dummydata):
    # the same code as before
Then in another cell
for i in range(3):
    dummydata = pd.DataFrame({
        'x1':np.arange(1+i,100,0.1),
        'y1':np.arange(11+i,110,0.1),
        'x2':np.arange(1+i,100,0.1),
        'y2':np.arange(21+i,120,0.1)
        })
# I am assuming this should not be in the for loop
# The plot will NOT be shown because we are not in interactive mode
fig, ax = plot_curve(dummydata) #get the chart
No plot will be shown yet.
Now in another cell
# Now ANY plot (figure) which was created and not shown yet will be finally shown
plt.show()
The plot is finally shown. Note that if you have created several plots all of them will be shown now.
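The create-now, show-later pattern described above can be sketched outside a notebook as follows. This is a minimal sketch, not the asker's plot_curve: the helper make_figure is hypothetical, and the Agg backend is selected only so the code runs without a display. In a notebook you would keep your own backend and call plt.show() when you want the output.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

plt.ioff()  # interactive mode off: figures are created but not drawn automatically


def make_figure(offset):
    """Hypothetical stand-in for plot_curve: build a figure and return it without showing it."""
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [offset, offset + 1, offset + 2], label=f"curve {offset}")
    ax.legend()
    return fig, ax


# Build several figures; pyplot keeps track of all of them until they are closed.
figures = [make_figure(i)[0] for i in range(3)]
pending = plt.get_fignums()
print(plt.isinteractive(), pending)

# Later, in a notebook or GUI session, a single plt.show() would display every pending figure.
```

Because pyplot tracks every open figure, plt.show() displays all of them at once. Calling plt.close(fig) on the figures you no longer need keeps that list short.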
Try this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib
With these imports (note the bare %matplotlib magic) you should not see the figure right after plotting.
You can still display the figure by writing fig as the last line of an IPython cell:
dummydata = pd.DataFrame({
    'x1':np.arange(1,100,0.1),
    'y1':np.arange(11,110,0.1),
    'x2':np.arange(1,100,0.1),
    'y2':np.arange(21,120,0.1)
    })
fig,ax = plot_curve(dummydata) #get the chart
fig # Will now plot the figure.
Is this the desired output?

Setting different color for each class in a scatter plot which is made by using pd.pivot_table

I am new to Pandas and its libraries. By using the following code I can make a scatter plot of my 'class' in the plane 'Month' vs 'Amount'. Because I consider more than one class I would like to use colors for distinguishing each class and to see a legend in the figure.
My first attempt below draws the dots for each class in a different color, but it does not generate the right legend. The second attempt, by contrast, generates a legend, but the labeling is wrong: I only see the first letter of each class name. Moreover, the second attempt plots as many figures as there are classes. I would like to see how I can correct both attempts. Any ideas or suggestions? Thanks in advance.
ps. I wanted to use
colors = itertools.cycle(['gold','blue','red','chocolate','mediumpurple','dodgerblue'])
as well, so that I could decide the colors. I could not make it though.
Attempts:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import matplotlib.cm as cm
np.random.seed(176)
random.seed(16)
df = pd.DataFrame({'class': random.sample(['living room','dining room','kitchen','car','bathroom','office']*10, k=25),
                   'Amount': np.random.sample(25)*100,
                   'Year': random.sample(list(range(2010,2018))*50, k=25),
                   'Month': random.sample(list(range(1,12))*100, k=25)})
print(df.head(25))
print(df['class'].unique())
for cls1 in df['class'].unique():
    test1= pd.pivot_table(df[df['class']==cls1], index=['class', 'Month', 'Year'], values=['Amount'])
    print(test1)
colors = cm.rainbow(np.linspace(0,2,len(df['class'].unique())))
fig, ax = plt.subplots(figsize=(8,6))
for cls1,c in zip(df['class'].unique(),colors):
    # SCATTER PLOT
    test = pd.pivot_table(df[df['class']==cls1], index=['class', 'Month', 'Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    test.plot(kind='scatter', x='Month',y='Amount', figsize=(16,6),stacked=False,ax=ax,color=c,s=50).legend(df['class'].unique(),scatterpoints=1,loc='upper left',ncol=3,fontsize=10.5)
plt.show()
for cls2,c in zip(df['class'].unique(),colors):
    # SCATTER PLOT
    test = pd.pivot_table(df[df['class']==cls2], index=['class', 'Month', 'Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    test.plot(kind='scatter', x='Month',y='Amount', figsize=(16,6),stacked=False,color=c,s=50).legend(cls2,scatterpoints=1,loc='upper left',ncol=3,fontsize=10.5)
plt.show()
Up-to-date code
I would like to plot the following code via scatter plot.
for cls1 in df['class'].unique():
    test3= pd.pivot_table(df[df['class']==cls1], index=['class', 'Month'], values=['Amount'], aggfunc=np.sum)
    print(test3)
Unlike above here a class appears only once each month thanks to the sum over Amount.
Here my attempt:
for cls2 in df['class'].unique():
    test2= pd.pivot_table(df[df['class']==cls2], index=['class','Year'], values=['Amount'], aggfunc=np.sum).reset_index()
    print(test2)
    sns.lmplot(x='Year' , y='Amount', data=test2, hue='class',palette='hls', fit_reg=False,size= 5, aspect=5/3, legend_out=False,scatter_kws={"s": 70})
plt.show()
This gives me one plot for each class. Apart from the first one (class=car), which shows different colors, the others seem to be OK. Still, I would like to have only one plot with all classes.
After Marvin Taschenberger's helpful answer, here is the up-to-date result:
I get a white dot instead of a colored one, and the legend is placed differently than in your figure. Moreover, the year labels are not displayed correctly. Why?
An easy workaround (which unfortunately does not solve the underlying problem) is to let seaborn do the heavy lifting with a single line:
sns.lmplot(x='Month' , y='Amount', data=df, hue='class',palette='hls', fit_reg=False,size= 8, aspect=5/3, legend_out=False)
You could also plug in other colors for palette.
EDIT: how about this, then:
import pandas as pd
import numpy as np
import random
from matplotlib import pyplot as plt
import seaborn as sns
np.random.seed(176)
random.seed(16)
df = pd.DataFrame({'class': random.sample(['living room','dining room','kitchen','car','bathroom','office']*10, k=25),
                   'Amount': np.random.sample(25)*100,
                   'Year': random.sample(list(range(2010,2018))*50, k=25),
                   'Month': random.sample(list(range(1,12))*100, k=25)})
frame = pd.pivot_table(df, index=['class','Year'], values=['Amount'], aggfunc=np.sum).reset_index()
sns.lmplot(x='Year' , y='Amount', data=frame, hue='class',palette='hls', fit_reg=False,size= 5, aspect=5/3, legend_out=False,scatter_kws={"s": 70})
plt.show()
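For the wish to choose the colors by hand with itertools.cycle, one possible approach in plain matplotlib is to draw one ax.scatter call per class on a shared Axes and give each call its own label, so that the legend shows full class names. This is a minimal sketch, not the method from the answer above: the random data, the use of np.random.default_rng and the variable names are assumptions for illustration.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data in the same shape as the question (values are made up).
rng = np.random.default_rng(176)
classes = ["living room", "dining room", "kitchen", "car", "bathroom", "office"]
df = pd.DataFrame({
    "class": rng.choice(classes, size=25),
    "Amount": rng.random(25) * 100,
    "Month": rng.integers(1, 12, size=25),
})

# Sum Amount per class and month, as the pivot tables in the question do.
totals = pd.pivot_table(
    df, index=["class", "Month"], values="Amount", aggfunc="sum"
).reset_index()

# Hand-picked colors, one per class; cycle() repeats them if there are more classes.
colors = itertools.cycle(
    ["gold", "blue", "red", "chocolate", "mediumpurple", "dodgerblue"]
)

fig, ax = plt.subplots(figsize=(8, 6))
for (name, group), color in zip(totals.groupby("class"), colors):
    # One scatter call per class: the label passed here becomes the legend entry.
    ax.scatter(group["Month"], group["Amount"], s=50, color=color, label=name)

ax.set(xlabel="Month", ylabel="Amount")
legend = ax.legend(scatterpoints=1, loc="upper left", ncol=3, fontsize=10.5)
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```

Passing label= to each scatter call and then calling ax.legend() without a list of names avoids the first-letter problem, which comes from handing a single string to legend(): the string is iterated character by character.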

Change Error Bar Markers (Caplines) in Pandas Bar Plot

I am plotting error bars for a pandas DataFrame. The error bars currently end in a weird arrow at the top, but what I want is a horizontal line. For example, a figure like this:
But now my error bars end with an arrow instead of a horizontal line.
Here is the code I used to generate it:
plot = meansum.plot(
kind="bar",
yerr=stdsum,
colormap="OrRd_r",
edgecolor="black",
grid=False,
figsize=(8, 2),
ax=ax,
position=0.45,
error_kw=dict(ecolor="black", elinewidth=0.5, lolims=True, marker="o"),
width=0.8,
)
So what should I change to make the error bars look the way I want? Thanks.
Using plt.errorbar from matplotlib makes it easier as it returns several objects including the caplines which contain the marker you want to change (the arrow which is automatically used when lolims is set to True, see docs).
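The errorbar route mentioned above can be sketched like this. Axes.errorbar returns a container that unpacks into the data line, the cap lines and the bar line collections, so the arrow markers can be changed directly on the cap lines. The sample values and the bar color are assumptions for illustration; this is not the asker's meansum/stdsum data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

values = [1, 2, 3, 4]
errors = [0.4, 0.3, 0.6, 0.9]
positions = list(range(len(values)))

fig, ax = plt.subplots(figsize=(8, 2))
ax.bar(positions, values, width=0.8, color="orange", edgecolor="black")

# fmt="none" draws only the error bars; lolims=True is what produces the arrow markers.
plotline, caplines, barlinecols = ax.errorbar(
    positions, values, yerr=errors, fmt="none",
    ecolor="black", elinewidth=0.5, lolims=True,
)

# Replace the arrow marker on every cap line with a short horizontal line.
for cap in caplines:
    cap.set_marker("_")
    cap.set_markersize(10)

cap_markers = [cap.get_marker() for cap in caplines]
print(cap_markers)
```

With lolims=True only the upper half of each error bar is drawn, so the result still differs from a symmetric error bar with caps.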
Using pandas, you just need to dig the correct line in the children of plot and change its marker:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5, lolims=True),width=0.8)
for ch in plot.get_children():
    if str(ch).startswith('Line2D'): # this is silly, but it appears that the first Line in the children are the caplines...
        ch.set_marker('_')
        ch.set_markersize(10) # to change its size
        break
plt.show()
The result looks like:
Just don't set lolims=True and you are good to go. An example with sample data:
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = pd.DataFrame({"val":[1,2,3,4],"error":[.4,.3,.6,.9]})
meansum = df["val"]
stdsum = df["error"]
plot = meansum.plot(kind='bar',yerr=stdsum,colormap='OrRd_r',edgecolor='black',grid=False,figsize=(8,2),ax=ax,position=0.45,error_kw=dict(ecolor='black',elinewidth=0.5),width=0.8)
plt.show()

matplotlib one legend entry too much

I am trying to make an error bar plot with different marker colors in Python 2.7. Additionally, I am including two line plots.
I found a way here: "matplotlib errorbar plot - using a custom colormap", which uses a scatter plot for the colors and errorbar() for the bars.
As you can see in my example code, the legend always contains one entry too many (right at the top). I cannot figure out why. I tried to exclude it, which did not work, and I did not find anything helpful either, as I cannot really address the first legend entry.
Any ideas?
Here's my code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
data = pd.DataFrame()
data['x'] = range(10)
data['y'] = data.x
data['err'] = .5
data['col'] = np.where(data.y<5,'r','b')
### setup 1-1 line
lin = pd.DataFrame() # setting 1-1 line
lin['x'] = range(10)
lin['y'] = range(10)
### setup 1-2 line
lin['x2'] = lin.x
lin['y2'] = lin.y
plt.errorbar(data.x, data.y, yerr = data.err,
             xerr = .3, fmt=' ', markersize=4, zorder = 1)
plt.scatter(data.x,data.y, marker='o', color = data.col, zorder = 2)
plt.plot(lin.x,lin.y,'g-')
plt.plot(lin.x2,1.8*lin.y2,'r-')
plt.legend(['','1-1 line', '1-1.8 line','holla','molla'], loc=4)
What I get is:
Thanks for your help!
To clean this whole thing up, I am posting a proper answer instead of comments.
The problem could be solved by upgrading matplotlib from 1.3.1 to 1.5.1. Easy as that.
