Adjust spacing on X-axis in python boxplots - python

I plot boxplots using sns.boxplot and pandas.DataFrame.boxplot in Python 3.x.
Is it possible to adjust the spacing between the boxes in a boxplot, so that the box of Group_b sits farther to the right of the box of Group_a than in the output figures? Thanks.
Code:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
dict_a = {'value':[1,2,3,7,8,9],'name':['Group_a']*3+['Group_b']*3}
dataframe = pd.DataFrame(dict_a)
sns.boxplot( y="value" , x="name" , data=dataframe )
Output figure:
dataframe.boxplot("value" ,by = "name" )
Output figure 2:

The distance between the two boxes is determined by the x-axis limits. The boxes are a constant distance apart in data units; how far apart they appear depends on what fraction of the displayed x-range that distance takes up.
For example, in the seaborn case, the first box sits at x=0, the second at x=1. The difference is 1 unit. The maximal distance between the two boxplots is hence achieved by setting the x axis limits to those exact limits,
ax.set_xlim(0, 1)
Of course this will cut half of each box.
So a more useful value would be ax.set_xlim(0-val, 1+val) with val being somewhere in the range of the width of the boxes.
One needs to mention that pandas uses different units. The first box is at x=1, the second at x=2. Hence one would need something like ax.set_xlim(1-val, 2+val).
The following would add a slider to the plot to see the effect of different values.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
dict_a = {'value':[1,2,3,7,8,9],'name':['Group_a']*3+['Group_b']*3}
dataframe = pd.DataFrame(dict_a)
fig, (ax, ax2, ax3) = plt.subplots(nrows=3,
                                   gridspec_kw=dict(height_ratios=[4,4,1], hspace=1))
sns.boxplot( y="value" , x="name" , data=dataframe, width=0.1, ax=ax)
dataframe.boxplot("value", by = "name", ax=ax2)
from matplotlib.widgets import Slider
slider = Slider(ax3, "", valmin=0, valmax=3)
def update(val):
    ax.set_xlim(-val, 1+val)
    ax2.set_xlim(1-val, 2+val)
slider.on_changed(update)
plt.show()
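A non-interactive sketch of the same idea, using plain matplotlib instead of seaborn or pandas (an assumption made to keep the example small). The helper `shown_fraction` is a hypothetical name introduced here; it reports what share of the visible x-range lies between the two box positions. A smaller `val` gives a larger share, so the boxes appear farther apart.

```python
import matplotlib.pyplot as plt

def shown_fraction(ax, x0, x1):
    """Fraction of the visible x-range that lies between x0 and x1."""
    left, right = ax.get_xlim()
    return abs(x1 - x0) / abs(right - left)

group_a, group_b = [1, 2, 3], [7, 8, 9]
fig, (ax_wide, ax_tight) = plt.subplots(ncols=2, figsize=(8, 3))

for ax, val in ((ax_wide, 1.0), (ax_tight, 0.25)):
    # Positions 0 and 1 mimic the categorical coordinates described above
    ax.boxplot([group_a, group_b], positions=[0, 1], widths=0.1)
    ax.set_xlim(0 - val, 1 + val)
    ax.set_title(f"val = {val}")

frac_wide = shown_fraction(ax_wide, 0, 1)    # limits (-1, 2)
frac_tight = shown_fraction(ax_tight, 0, 1)  # limits (-0.25, 1.25)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to look at the two panels side by side.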

Related

Change the tick frequency on the x axis using a for loop [duplicate]

I do have a question with matplotlib in python. I create different figures, where every figure should have the same height to print them in a publication/poster next to each other.
If the y-axis has a label on the very top, this shrinks the height of the box with the plot. So I use MaxNLocator to remove the upper and lower y-tick. In some plots, I want to have the 1.0 as a number on the y-axis, because I have normalized data. So I need a solution, which expands in these cases the y-axis and ensures 1.0 is a y-Tick, but does not corrupt the size of the figure using tight_layout().
Here is a minimal example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
x = np.linspace(0,1,num=11)
y = np.linspace(1,.42,num=11)
fig,axs = plt.subplots(1,1)
axs.plot(x,y)
locator=MaxNLocator(prune='both',nbins=5)
axs.yaxis.set_major_locator(locator)
plt.tight_layout()
fig.show()
Here is a link to an example PDF, which shows the problem with the height of the upper box line.
I tried to work with subplots_adjust(), but this is of no use to me, because I vary the size of the figures and want to keep the same font size throughout, which changes the margins.
Question is:
How can I use MaxNLocator and specify a number which has to be in the y-axis?
Hopefully someone of you has some advice.
Greetings,
Laenan
Assuming that you know in advance how many plots there will be in one row on a page, one way to solve this is to put all those plots into one figure; matplotlib will make sure their axes are aligned:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
x = np.linspace(0, 1, num=11)
y = np.linspace(1, .42, num=11)
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(8,3), gridspec_kw={'wspace':.2})
ax1.plot(x,y)
ax2.plot(x,y)
locator=MaxNLocator(prune='both', nbins=5)
ax1.yaxis.set_major_locator(locator)
# You don't need to use tight_layout and using it might give an error
# plt.tight_layout()
fig.show()
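The answer above does not address forcing 1.0 onto the y-axis. One possible approach, sketched here as an assumption rather than as part of that answer, is to swap MaxNLocator for MultipleLocator and set the y-limits by hand so that 1.0 lies just inside the axis instead of on its edge. The limits 0.35 and 1.05 and the step 0.2 are illustrative choices for this data.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

x = np.linspace(0, 1, num=11)
y = np.linspace(1, .42, num=11)

fig, ax = plt.subplots()
ax.plot(x, y)

# Leave a little headroom above 1.0 so its label is not the topmost thing
ax.set_ylim(0.35, 1.05)
# Ticks at every multiple of 0.2, which includes 1.0
ax.yaxis.set_major_locator(MultipleLocator(0.2))

fig.canvas.draw()
lo, hi = ax.get_ylim()
visible_ticks = [t for t in ax.get_yticks() if lo <= t <= hi]
```

Because the outermost ticks sit inside the limits, no tick label sticks out above the axes box, which is what prune='both' was meant to achieve in the question.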

Is there a way to adjust the axes limits of pairplot(), but not as individual plots?

Is there a way to adjust the axes limits of pairplot(), but not as individual plots? Maybe a setting to produce better axes limits?
I would like to have the plots with a bigger range for the axes. My plots axes allows all the data to be visualized, but it is too 'zoomed in'.
My code is:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
g = sns.pairplot(iris, hue = 'species', diag_kind = 'hist', palette = 'Dark2', plot_kws={"s": 20})
The link for my plot and what I would like to plot to look like is here:
pairplot
To change the subplots, g.map(func, <parameters>) can be used. A small problem is that func needs to accept color as a parameter, and plt.margins() gives an error when color is passed. Moreover, map uses x and y to indicate the row and column variables. You could write a dummy function that simply calls plt.margins(), for example g.map(lambda *args, **kwargs: plt.margins(x=0.2, y=0.3)).
An alternative is to loop through g.axes.flat and call ax.margins() on each of them. Note that many axes are shared in x and/or y direction. The diagonal is treated differently; for some reason ax.margins needs to be called a second time on the diagonal.
To have the histogram for the different colors stacked instead of overlapping, diag_kws={"multiple": "stack"} can be set.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, hue='species', diag_kind='hist', palette='Dark2',
                 plot_kws={"s": 20}, diag_kws={"multiple": "stack"})
# g.map(plt.margins, x=0.2, y=0.2) # gives an error
for ax in g.axes.flat:
    ax.margins(x=0.2, y=0.2)
for ax in g.diag_axes:
    ax.margins(y=0.2)
plt.show()
PS: yet another option is to change the rcParams, which affects all plots created later in the code:
import matplotlib as mpl
mpl.rcParams['axes.xmargin'] = 0.2
mpl.rcParams['axes.ymargin'] = 0.2
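To see what a margin of 0.2 means in numbers, here is a minimal sketch on a plain matplotlib Axes (the line data is made up for illustration). The margin is a fraction of the data range that is added as padding on each side.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 10], [0, 5])

ax.margins(x=0.2, y=0.2)
xlim = ax.get_xlim()  # data range 0..10, padded by 0.2 * 10 on each side
ylim = ax.get_ylim()  # data range 0..5, padded by 0.2 * 5 on each side
print(xlim, ylim)
```

The same padding rule applies to each subplot of the pairplot, which is why larger margins make the plots look less zoomed in.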

How can I have two different linear scales in x-axis in python?

Is there a way that I can set two different scales at the x-axis in a python plot?
I have following code:
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd
data=pd.read_csv(file, names=['Wavenumber', 'Intensity'])
fig=plt.figure()
ax=fig.add_subplot(1,1,1)
ax.plot(data['Wavenumber'], data['Intensity'])
ax.invert_xaxis()
ax.set_xticks([4000,3000,2000,1600,1200,800,400])
plt.show()
This gives:
But I would like to have equal spacing between the ticks, so a linear scaling from 4000 to 2000 in steps of 1000, and then again linear scaling from 2000 to 400 in steps of 400. This should look like this:
Creating a custom scale in matplotlib can be quite an effort. As you only need two different linear scales, it is easier to use a workaround consisting of joining two subplots together. With many data points located near the boundary between the two scales (as in your case), the jump from one to the other will not cause any irregular space between the ticks around the boundary if you were to show many tick marks (contrary to here). All you need is to find the data point closest to the boundary to seamlessly connect both subplots, as illustrated in the following example:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset
rng = np.random.default_rng(seed=1)
x = np.linspace(4000, 400, num=50)
y = 1 - rng.exponential(scale=0.1, size=x.size)
df = pd.DataFrame(dict(Wavenumber=x, Intensity=y))
# Select data for each subplot by using a boundary point
x_boundary = min(df['Wavenumber'], key=lambda x: abs(x-2000))
df1 = df[df['Wavenumber'] >= x_boundary]
df2 = df[df['Wavenumber'] <= x_boundary]
# Select x-axis ticks for each subplot
ticks = np.array([4000, 3000, 2000, 1600, 1200, 800, 400])
tk1 = ticks[ticks >= x_boundary]
tk2 = ticks[ticks <= x_boundary]
# Create figure with 2 Axes side-by-side with no space in between
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5), sharey=True,
                               gridspec_kw=dict(wspace=0))
# Loop through both Axes to plot data, adjust x-axis limits and remove boundary spines
for ax, data, spine, tk in zip((ax1, ax2), (df1, df2), ('right','left'), (tk1, tk2)):
    data.plot(x='Wavenumber', xlabel='', ax=ax, legend=None)
    ax.set_xlim(data['Wavenumber'].iloc[[0,-1]])
    ax.spines[spine].set_visible(False)
    ax.set_xticks(tk)
# Additional formatting
ax2.tick_params(axis='y', length=0)
ax1.set_xlabel('Wavenumber', x=1, labelpad=10, size=12)
ax1.set_ylabel('Intensity', labelpad=10, size=12)
fig.suptitle('Plot with two linear x-axis scales joined together', size=16, y=0.95);
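The boundary selection step can be looked at in isolation. This sketch uses the same evenly spaced grid as the sample dataset and np.argmin instead of min(..., key=...), which is an equivalent swap made here for brevity. The variable names `left_ticks` and `right_ticks` are illustrative.

```python
import numpy as np

wavenumbers = np.linspace(4000, 400, num=50)  # same grid as the sample data
target = 2000                                 # where the two scales should meet

# Pick the sample closest to the target boundary
idx = np.argmin(np.abs(wavenumbers - target))
x_boundary = wavenumbers[idx]

# Split the requested ticks between the two subplots
ticks = np.array([4000, 3000, 2000, 1600, 1200, 800, 400])
left_ticks = ticks[ticks >= x_boundary]
right_ticks = ticks[ticks <= x_boundary]
print(x_boundary, left_ticks, right_ticks)
```

On this grid the closest sample lies slightly above 2000, so the 2000 tick ends up on the right-hand subplot only. With real data that contains a point at or very near 2000, the tick sits right at the seam.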

Superimposing plots in seaborn causes x-axis to misalign

I am having an issue trying to superimpose plots with seaborn. I am able to generate the two plots separately as
fig, (ax1,ax2) = plt.subplots(ncols=2,figsize=(30, 7))
sns.lineplot(data=data1, y='MSE',x='pct_gc',ax=ax1)
sns.boxplot(x="pct_gc", y="MSE", data=data2,ax=ax2,width=0.4)
The output looks like this:
But the problem appears when I try to superimpose both plots by assigning both to the same ax object:
fig, ax1 = plt.subplots(figsize=(30, 7))
sns.lineplot(data=data1, y='MSE',x='pct_gc',ax=ax1)
sns.boxplot(x="pct_gc", y="MSE", data=data2,ax=ax1,width=0.4)
I am not able to identify why the x-axis of the line plot changes when superimposing both plots (in the separate plots, both x-axes go from 0 to 0.069).
My goal is for both plots to be superimposed, while keeping the same X axis range.
Seaborn's boxplot creates a categorical x-axis, with all boxes placed at equal distances. Internally the boxes sit at x = 0, 1, 2, ..., but the tick labels show the category values from 0 to 0.069.
To combine a line plot with a boxplot, matplotlib's boxplot can be addressed directly, so that positions and widths can be set explicitly. When patch_artist=True, a rectangle is created (instead of just lines), for which a facecolor can be given. manage_ticks=False prevents that boxplot changes the x ticks and their limits. Optionally notch=True would accentuate the median a bit more, but depending on the data, the confidence interval might be too large and look weird.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
data1 = pd.DataFrame({'pct_gc': np.linspace(0, 0.069, 200), 'MSE': np.random.normal(0.02, 0.1, 200).cumsum()})
data1['pct_range'] = pd.cut(data1['pct_gc'], 10)
fig, ax1 = plt.subplots(ncols=1, figsize=(20, 7))
sns.lineplot(data=data1, y='MSE', x='pct_gc', ax=ax1)
for interval, color in zip(np.unique(data1['pct_range']), plt.cm.tab10.colors):
    ax1.boxplot(data1[data1['pct_range'] == interval]['MSE'],
                positions=[interval.mid], widths=0.4 * interval.length,
                patch_artist=True, boxprops={'facecolor': color},
                notch=False, medianprops={'color':'yellow', 'linewidth':2},
                manage_ticks=False)
plt.show()
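The effect of manage_ticks=False can be checked on a smaller case. This sketch uses made-up data and a single box placed at a real x value; it compares the x-limits before and after adding the box and checks that boxplot did not install a fixed tick locator.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FixedLocator

rng = np.random.default_rng(0)
x = np.linspace(0, 0.069, 200)
y = rng.normal(0.02, 0.1, 200).cumsum()

fig, ax = plt.subplots()
ax.plot(x, y)
xlim_before = ax.get_xlim()

# One box drawn at a true data position, inside the range of the line
ax.boxplot(y[80:120], positions=[x[100]], widths=0.004, manage_ticks=False)
xlim_after = ax.get_xlim()

# With manage_ticks=True, boxplot would replace the locator with a FixedLocator
kept_default_ticks = not isinstance(ax.xaxis.get_major_locator(), FixedLocator)
```

The line and the box then share one numeric x-axis, so they stay aligned.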

Second y axis and vertical line

I am creating a violinplot using the following code:
import seaborn as sns
ax = sns.violinplot(data=df[['SoundProduction','SoundForecast','diff']])
ax.set_ylabel("Sound power level [dB(A)]")
It gives me the following result:
Is there any way I can plot diff on a second y-axis so that all three series become clearly visible?
Also, is there a way to plot a vertical line in between 2 series? In this case I want a vertical line between SoundForecast and diff once they are plotted on two different axes.
You can achieve this using multiple subplots, which are easily set up using plt.subplots.
This lets you display each distribution on an appropriate scale without "wasting" display space. Most of seaborn's plotting functions accept the ax= argument, so you can choose the axes where the plot will be rendered. The axes also have clear separations between them.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# generate some random distribution data
n = 800 # samples
prod = 95 + 5 * np.random.beta(0.6, 0.5, size=n); # a bimodal distribution
forecast = prod + 3*np.random.randn(n) # forecast is noisy estimate around the "true" production
diff = prod-forecast # should be with mu 0 sigma 3
df = pd.DataFrame(np.array([prod, forecast, diff]).T, columns=['SoundProduction','SoundForecast','diff']);
# set up two subplots, with one wider than the other
fig, ax = plt.subplots(1,2, num=1, gridspec_kw={'width_ratios':[2,1]})
# plot violin distribution estimates separately so the y-scaling makes sense in each group
sns.violinplot(data=df[['SoundProduction','SoundForecast']], ax=ax[0])
sns.violinplot(data=df[['diff']], ax=ax[1])
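If a literal second y-axis and a vertical separator are preferred over two subplots, the following is a rough alternative sketch using matplotlib's own violinplot together with twinx and axvline. It is not the approach of the answer above, and the positions 0, 1, 2 and the separator at 1.5 are choices made for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
prod = 95 + 5 * rng.beta(0.6, 0.5, size=800)
forecast = prod + 3 * rng.standard_normal(800)
diff = prod - forecast

fig, ax = plt.subplots()
ax.violinplot([prod, forecast], positions=[0, 1])
ax.set_ylabel("Sound power level [dB(A)]")

# Second y-axis that shares the x-axis with the first one
ax_diff = ax.twinx()
ax_diff.violinplot([diff], positions=[2])
ax_diff.set_ylabel("diff")

# Vertical separator between SoundForecast and diff
ax.axvline(1.5, color="gray", linestyle="--")
ax.set_xticks([0, 1, 2])
ax.set_xticklabels(["SoundProduction", "SoundForecast", "diff"])
```

The left axis scales to the two sound level series and the right axis to diff, so all three violins use the full height of the plot.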
