Independent axis for each subplot in pandas boxplot

Independent axis for each subplot in pandas boxplot - python

The below code helps in obtaining subplots with unique colored boxes. But all subplots share a common set of x and y axis. I was looking forward to having independent axis for each sub-plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
df = pd.DataFrame(np.random.rand(140, 4), columns=['A', 'B', 'C', 'D'])
df['models'] = pd.Series(np.repeat(['model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'], 20))
bp_dict = df.boxplot(
by="models",layout=(2,2),figsize=(6,4),
return_type='both',
patch_artist = True,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
for row_key, (ax,row) in bp_dict.iteritems():
ax.set_xlabel('')
for i,box in enumerate(row['boxes']):
box.set_facecolor(colors[i])
plt.show()
Here is an output of the above code:
I am trying to have separate x and y axis for each subplot...

You need to create the figure and subplots before hand and pass this in as an argument to df.boxplot(). This also means you can remove the argument layout=(2,2):
fig, axes = plt.subplots(2,2,sharex=False,sharey=False)
Then use:
bp_dict = df.boxplot(
by="models", ax=axes, figsize=(6,4),
return_type='both',
patch_artist = True,
)

You may set the ticklabels visible again, e.g. via
plt.setp(ax.get_xticklabels(), visible=True)
This does not make the axes independent though, they are still bound to each other, but it seems like you are asking about the visibilty, rather than the shared behaviour here.

If you really think it is necessary to un-share the axes after the creation of the boxplot array, you can do this, but you have to do everything 'by hand'. Searching a while through stackoverflow and looking at the matplotlib documentation pages I came up with the following solution to un-share the yaxes of the Axes instances, for the xaxes, you would have to go analogously:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import PathPatch
from matplotlib.ticker import AutoLocator, AutoMinorLocator
##using differently scaled data for the different random series:
df = pd.DataFrame(
np.asarray([
np.random.rand(140),
2*np.random.rand(140),
4*np.random.rand(140),
8*np.random.rand(140),
]).T,
columns=['A', 'B', 'C', 'D']
)
df['models'] = pd.Series(np.repeat([
'model1','model2', 'model3', 'model4', 'model5', 'model6', 'model7'
], 20))
##creating the boxplot array:
bp_dict = df.boxplot(
by="models",layout = (2,2),figsize=(6,8),
return_type='both',
patch_artist = True,
rot = 45,
)
colors = ['b', 'y', 'm', 'c', 'g', 'b', 'r', 'k', ]
##adjusting the Axes instances to your needs
for row_key, (ax,row) in bp_dict.items():
ax.set_xlabel('')
##removing shared axes:
grouper = ax.get_shared_y_axes()
shared_ys = [a for a in grouper]
for ax_list in shared_ys:
for ax2 in ax_list:
grouper.remove(ax2)
##setting limits:
ax.axis('auto')
ax.relim() #<-- maybe not necessary
##adjusting tick positions:
ax.yaxis.set_major_locator(AutoLocator())
ax.yaxis.set_minor_locator(AutoMinorLocator())
##making tick labels visible:
plt.setp(ax.get_yticklabels(), visible=True)
for i,box in enumerate(row['boxes']):
box.set_facecolor(colors[i])
plt.show()
The resulting plot looks like this:
Explanation:
You first need to tell each Axes instance that it shouldn't share its yaxis with any other Axis instance. This post got me into the direction of how to do this -- Axes.get_shared_y_axes() returns a Grouper object, that holds references to all other Axes instances with which the current Axes should share its xaxis. Looping through those instances and calling Grouper.remove does the actual un-sharing.
Once the yaxis is un-shared, the y limits and the y ticks need to be adjusted. The former can be achieved with ax.axis('auto') and ax.relim() (not sure if the second command is necessary). The ticks can be adjusted by using ax.yaxis.set_major_locator() and ax.yaxis.set_minor_locator() with the appropriate Locators. Finally, the tick labels can be made visible using plt.setp(ax.get_yticklabels(), visible=True) (see here).
Considering all this, #DavidG's answer is in my opinion the better approach.

Related

How to label these points on the scatter plot

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_excel("path to the file")
fig, ax = plt.subplots()
fig.set_size_inches(7,3)
df = pd.DataFrame(data, columns = ['Player', 'Pos', 'Age'])
df.plot.scatter(x='Age',
y='Pos',
c='DarkBlue', xticks=([15,20,25,30,35,40]))
plt.show()
Got the plot but not able to label these points

Provided you'd like to label each point, you can loop over each coordinate plotted, assigning it a label using plt.text() at the plotted point's position, like so:
from matplotlib import pyplot as plt
y_points = [i for i in range(0, 20)]
x_points = [(i*3) for i in y_points]
offset = 5
plt.figure()
plt.grid(True)
plt.scatter(x_points, y_points)
for i in range(0, len(x_points)):
plt.text(x_points[i] - offset, y_points[i], f'{x_points[i]}')
plt.show()
In the above example it will give the following:
The offset is just to make the labels more readable so that they're not right on top of the scattered points.
Obviously we don't have access to your spreadsheet, but the same basic concept would apply.
EDIT
For non numerical values, you can simply define the string as the coordinate. This can be done like so:
from matplotlib import pyplot as plt
y_strings = ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
x_values = [i for i, string in enumerate(y_strings)]
# Plot coordinates:
plt.scatter(x_values, y_strings)
for i, string in enumerate(y_strings):
plt.text(x_values[i], string, f'{x_values[i]}:{string}')
plt.grid(True)
plt.show()
Which will provide the following output:

How to create two different figures based on the same previous one?

I'm trying to create two different figures based on the same previous one.
The previous one (fig) contains a line, common for both figures, from where two new figures are created (fig1 and fig2), each of them with different data (df1 and df2, respectively).
This is what I'd like to obtain:
I have tried using fig.add_subplot function, but an error is constantly raised:
ValueError: The Subplot must have been created in the present figure
I have created an example to show what I mean. The Value Error is shown when it's executed:
import pandas as pd
import matplotlib.pyplot as plt
# Data for the two different figures
df1 = pd.DataFrame({'x':np.random.rand(80), 'y':np.random.rand(80)})
df2 = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)})
fig, ax = plt.subplots()
# Line creation for both figures
ax.plot(([1,2]))
# Try of creating the two different figures from the previous one:
fig1 = fig.add_subplot(df1.plot(x = 'x', y = 'y', kind = 'scatter'))
fig2 = fig.add_subplot(df2.plot(x = 'x', y = 'y', kind = 'scatter'))
In this example would be very easy to create the line inside of each figure, but that could not be done in the case that I'm working at.

You can pass ax as an argument for df.plot
import pandas as pd
import matplotlib.pyplot as plt
# Data for the two different figures
df1 = pd.DataFrame({'x':np.random.rand(80), 'y':np.random.rand(80)})
df2 = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)})
fig, ax = plt.subplots(nrows=1,ncols=2)
# Line creation for both figures
ax[0].plot(([1,2]))
ax[1].plot(([1,2]))
# Try of creating the two different figures from the previous one:
df1.plot(x = 'x', y = 'y', kind = 'scatter', c='violet',ax=ax[0])
df2.plot(x = 'x', y = 'y', kind = 'scatter', c='navy',ax=ax[1])
ax[0].set_title('one')
ax[1].set_title('two')
the output figure is
UPDATE
The ax is an array of Axes objects. Different Axes objects are independent, they can have different labels, legends, ticks, etc. If you really need two figures for two plots
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Data for the two different figures
df1 = pd.DataFrame({'x':np.random.rand(80), 'y':np.random.rand(80)})
df2 = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)})
# fig 1
fig_0 = plt.figure(0)
ax_0 = fig_0.add_subplot(111)
ax_0.plot(([1,2]))
df1.plot(x = 'x', y = 'y', kind = 'scatter', c='violet',ax=ax_0)
ax_0.set_title('one')
fig_1 = plt.figure(1)
ax_1 = fig_1.add_subplot(111)
ax_1.plot(([1,2]))
df2.plot(x = 'x', y = 'y', kind = 'scatter', c='navy',ax=ax_1)
ax_1.set_title('two')
fig_0.savefig('one.png')
fig_1.savefig('two.png')
You will see from the two saved files, the two plots are in two different figures.

Modify plot axis so the order of its tick labels and their respective points change accordingly - without modifying the data itself

I want to reorder x-axis tick labels such that the data also changes appropriately.
Example
y = [5,8,9,10]
x = ['a', 'b', 'c', 'd']
plt.plot(y, x)
What I want the plot to look like by modifying the location of axis ticks.
Please note that I don't want to achieve this by modifying the order of my data
My Try
# attempt 1
fig, ax =plt.subplots()
plt.plot(y,x)
ax.set_xticklabels(['b', 'c', 'a', 'd'])
# this just overwrites the labels, not what we intended
# attempt2
fig, ax =plt.subplots()
plt.plot(y,x)
locs, labels = plt.xticks()
plt.xticks((1,2,0,3)); # This is essentially showing the location
# of the labels to dsiplay irrespective of the order of the tuple.
Edit:
Based on comments here are some further clarifications.
Let's say the first point (a,5) in fig 1. If I changed my x-axis definition such that a is now defined at the third position, then it gets reflected in the plot as well, which means, 5 on y-axis moves with a as shown in fig-2. One way to achieve this would be to re-order the data. However, I would like to see if it is possible to achieve it somehow by changing axis locations. To summarize, the data should be plotted based on how we define our custom axis without re-ordering the original data.
Edit2:
Based on the discussion in the comments it's not possible to do it by just modifying axis labels. Any approach would involve modifying the data. This was an oversimplification of the original problem I was facing. Finally, using dictionary-based labels in a pandas data frame helped me to sort the axis values in a specific order while also making sure that their respective values change accordingly.

Toggling between two different orders of the x axis categories could look as follows,
import numpy as np
import matplotlib.pyplot as plt
x = ['a', 'b', 'c', 'd']
y = [5,8,9,10]
order1 = ['a', 'b', 'c', 'd']
order2 = ['b', 'c', 'a', 'd']
fig, ax = plt.subplots()
line, = ax.plot(x, y, marker="o")
def toggle(order):
_, ind1 = np.unique(x, return_index=True)
_, inv2 = np.unique(order, return_inverse=True)
y_new = np.array(y)[ind1][inv2]
line.set_ydata(y_new)
line.axes.set_xticks(range(len(order)))
line.axes.set_xticklabels(order)
fig.canvas.draw_idle()
curr = [0]
orders = [order1, order2]
def onclick(evt):
curr[0] = (curr[0] + 1) % 2
toggle(orders[curr[0]])
fig.canvas.mpl_connect("button_press_event", onclick)
plt.show()
Click anywhere on the plot to toggle between order1 and order2.

Non overlapping error bars in line plot

I am using Pandas and Matplotlib to create some plots. I want line plots with error bars on them. The code I am using currently looks like this
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
df_yerr = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
fig, ax = plt.subplots()
df.plot(yerr=df_yerr, ax=ax, fmt="o-", capsize=5)
ax.set_xscale("log")
plt.show()
With this code, I get 6 lines on a single plot (which is what I want). However, the error bars completely overlap, making the plot difficult to read.
Is there a way I could slightly shift the position of each point on the x-axis so that the error bars no longer overlap?
Here is a screenshot:

One way to achieve what you want is to plot the error bars 'by hand', but it is neither straight forward nor much better looking than your original. Basically, what you do is make pandas produce the line plot and then iterate through the data frame columns and do a pyplot errorbar plot for each of them such, that the index is slightly shifted sideways (in your case, with the logarithmic scale on the x axis, this would be a shift by a factor). In the error bar plots, the marker size is set to zero:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
colors = ['red','blue','green','yellow','purple','black']
df = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
df_yerr = pd.DataFrame(index=[10,100,1000,10000], columns=['A', 'B', 'C', 'D', 'E', 'F'], data=np.random.rand(4,6))
fig, ax = plt.subplots()
df.plot(ax=ax, marker="o",color=colors)
index = df.index
rows = len(index)
columns = len(df.columns)
factor = 0.95
for column,color in zip(range(columns),colors):
y = df.values[:,column]
yerr = df_yerr.values[:,column]
ax.errorbar(
df.index*factor, y, yerr=yerr, markersize=0, capsize=5,color=color,
zorder = 10,
)
factor *= 1.02
ax.set_xscale("log")
plt.show()
As I said, the result is not pretty:
UPDATE
In my opinion a bar plot would be much more informative:
fig2,ax2 = plt.subplots()
df.plot(kind='bar',yerr=df_yerr, ax=ax2)
plt.show()

you can solve with alpha for examples
df.plot(yerr=df_yerr, ax=ax, fmt="o-", capsize=5,alpha=0.5)
You can also check this link for reference

Add Legend to Seaborn point plot

I am plotting multiple dataframes as point plot using seaborn. Also I am plotting all the dataframes on the same axis.
How would I add legend to the plot ?
My code takes each of the dataframe and plots it one after another on the same figure.
Each dataframe has same columns
date count
2017-01-01 35
2017-01-02 43
2017-01-03 12
2017-01-04 27
My code :
f, ax = plt.subplots(1, 1, figsize=figsize)
x_col='date'
y_col = 'count'
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df_3,color='red')
This plots 3 lines on the same plot. However the legend is missing. The documentation does not accept label argument .
One workaround that worked was creating a new dataframe and using hue argument.
df_1['region'] = 'A'
df_2['region'] = 'B'
df_3['region'] = 'C'
df = pd.concat([df_1,df_2,df_3])
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df,hue='region')
But I would like to know if there is a way to create a legend for the code that first adds sequentially point plot to the figure and then add a legend.
Sample output :

I would suggest not to use seaborn pointplot for plotting. This makes things unnecessarily complicated.
Instead use matplotlib plot_date. This allows to set labels to the plots and have them automatically put into a legend with ax.legend().
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np
date = pd.date_range("2017-03", freq="M", periods=15)
count = np.random.rand(15,4)
df1 = pd.DataFrame({"date":date, "count" : count[:,0]})
df2 = pd.DataFrame({"date":date, "count" : count[:,1]+0.7})
df3 = pd.DataFrame({"date":date, "count" : count[:,2]+2})
f, ax = plt.subplots(1, 1)
x_col='date'
y_col = 'count'
ax.plot_date(df1.date, df1["count"], color="blue", label="A", linestyle="-")
ax.plot_date(df2.date, df2["count"], color="red", label="B", linestyle="-")
ax.plot_date(df3.date, df3["count"], color="green", label="C", linestyle="-")
ax.legend()
plt.gcf().autofmt_xdate()
plt.show()
In case one is still interested in obtaining the legend for pointplots, here a way to go:
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df1,color='blue')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df2,color='green')
sns.pointplot(ax=ax,x=x_col,y=y_col,data=df3,color='red')
ax.legend(handles=ax.lines[::len(df1)+1], labels=["A","B","C"])
ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])
plt.gcf().autofmt_xdate()
plt.show()

Old question, but there's an easier way.
sns.pointplot(x=x_col,y=y_col,data=df_1,color='blue')
sns.pointplot(x=x_col,y=y_col,data=df_2,color='green')
sns.pointplot(x=x_col,y=y_col,data=df_3,color='red')
plt.legend(labels=['legendEntry1', 'legendEntry2', 'legendEntry3'])
This lets you add the plots sequentially, and not have to worry about any of the matplotlib crap besides defining the legend items.

I tried using Adam B's answer, however, it didn't work for me. Instead, I found the following workaround for adding legends to pointplots.
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='#bb3f3f', label='Label1')
black_patch = mpatches.Patch(color='#000000', label='Label2')
In the pointplots, the color can be specified as mentioned in previous answers. Once these patches corresponding to the different plots are set up,
plt.legend(handles=[red_patch, black_patch])
And the legend ought to appear in the pointplot.

This goes a bit beyond the original question, but also builds on #PSub's response to something more general---I do know some of this is easier in Matplotlib directly, but many of the default styling options for Seaborn are quite nice, so I wanted to work out how you could have more than one legend for a point plot (or other Seaborn plot) without dropping into Matplotlib right at the start.
Here's one solution:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# We will need to access some of these matplotlib classes directly
from matplotlib.lines import Line2D # For points and lines
from matplotlib.patches import Patch # For KDE and other plots
from matplotlib.legend import Legend
from matplotlib import cm
# Initialise random number generator
rng = np.random.default_rng(seed=42)
# Generate sample of 25 numbers
n = 25
clusters = []
for c in range(0,3):
# Crude way to get different distributions
# for each cluster
p = rng.integers(low=1, high=6, size=4)
df = pd.DataFrame({
'x': rng.normal(p[0], p[1], n),
'y': rng.normal(p[2], p[3], n),
'name': f"Cluster {c+1}"
})
clusters.append(df)
# Flatten to a single data frame
clusters = pd.concat(clusters)
# Now do the same for data to feed into
# the second (scatter) plot...
n = 8
points = []
for c in range(0,2):
p = rng.integers(low=1, high=6, size=4)
df = pd.DataFrame({
'x': rng.normal(p[0], p[1], n),
'y': rng.normal(p[2], p[3], n),
'name': f"Group {c+1}"
})
points.append(df)
points = pd.concat(points)
# And create the figure
f, ax = plt.subplots(figsize=(8,8))
# The KDE-plot generates a Legend 'as usual'
k = sns.kdeplot(
data=clusters,
x='x', y='y',
hue='name',
shade=True,
thresh=0.05,
n_levels=2,
alpha=0.2,
ax=ax,
)
# Notice that we access this legend via the
# axis to turn off the frame, set the title,
# and adjust the patch alpha level so that
# it closely matches the alpha of the KDE-plot
ax.get_legend().set_frame_on(False)
ax.get_legend().set_title("Clusters")
for lh in ax.get_legend().get_patches():
lh.set_alpha(0.2)
# You would probably want to sort your data
# frame or set the hue and style order in order
# to ensure consistency for your own application
# but this works for demonstration purposes
groups = points.name.unique()
markers = ['o', 'v', 's', 'X', 'D', '<', '>']
colors = cm.get_cmap('Dark2').colors
# Generate the scatterplot: notice that Legend is
# off (otherwise this legend would overwrite the
# first one) and that we're setting the hue, style,
# markers, and palette using the 'name' parameter
# from the data frame and the number of groups in
# the data.
p = sns.scatterplot(
data=points,
x="x",
y="y",
hue='name',
style='name',
markers=markers[:len(groups)],
palette=colors[:len(groups)],
legend=False,
s=30,
alpha=1.0
)
# Here's the 'magic' -- we use zip to link together
# the group name, the color, and the marker style. You
# *cannot* retreive the marker style from the scatterplot
# since that information is lost when rendered as a
# PathCollection (as far as I can tell). Anyway, this allows
# us to loop over each group in the second data frame and
# generate a 'fake' Line2D plot (with zero elements and no
# line-width in our case) that we can add to the legend. If
# you were overlaying a line plot or a second plot that uses
# patches you'd have to tweak this accordingly.
patches = []
for x in zip(groups, colors[:len(groups)], markers[:len(groups)]):
patches.append(Line2D([0],[0], linewidth=0.0, linestyle='',
color=x[1], markerfacecolor=x[1],
marker=x[2], label=x[0], alpha=1.0))
# And add these patches (with their group labels) to the new
# legend item and place it on the plot.
leg = Legend(ax, patches, labels=groups,
loc='upper left', frameon=False, title='Groups')
ax.add_artist(leg);
# Done
plt.show();
Here's the output:

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Independent axis for each subplot in pandas boxplot - python

You may set the ticklabels visible again, e.g. via plt.setp(ax.get_xticklabels(), visible=True) This does not make the axes independent though, they are still bound to each other, but it seems like you are asking about the visibilty, rather than the shared behaviour here.

Related

How to label these points on the scatter plot

How to create two different figures based on the same previous one?

Modify plot axis so the order of its tick labels and their respective points change accordingly - without modifying the data itself

Non overlapping error bars in line plot

Add Legend to Seaborn point plot

Categories

Resources