mplcursors interactivity with endpoints of scatterplots - python

import pandas as pd
import matplotlib.pyplot as plt
import mplcursors
df = pd.DataFrame(
{'Universe': ['Darvel', 'MC', 'MC', 'Darvel', 'MC', 'Other', 'Darvel'],
'Value': [10, 11, 13, 12, 9, 7, 10],
'Upper': [12.5, 11.3, 15.4, 12.2, 13.1, 8.8, 11.5],
'Lower': [4.5, 9.6, 11.8, 6, 6.5, 5, 8]})
df['UpperError'] = df['Upper'] - df['Value']
df['LowerError'] = df['Value'] - df['Lower']
colors = ['r', 'g', 'b']
fig, ax = plt.subplots()
for i, universe in enumerate(df['Universe'].unique()):
    to_plot = df[df['Universe'] == universe]
    ax.scatter(to_plot.index, to_plot['Value'], s=16, c=colors[i])
    error = to_plot[['LowerError', 'UpperError']].transpose().to_numpy()
    ax.errorbar(to_plot.index, to_plot['Value'], yerr=error, fmt='o',
                markersize=0, capsize=6, color=colors[i])
    ax.scatter(to_plot.index, to_plot['Upper'], c='w', zorder=-1)
    ax.scatter(to_plot.index, to_plot['Lower'], c='w', zorder=-1)
mplcursors.cursor(hover=True)
plt.show()
This does most of what I want, but I want the following changes.
I do not want the mplcursors cursor to interact with the errorbars, but just the scatter plots, including the invisible scatterplots on top and bottom of the errorbars.
I just want the y value to show. For example, the first bar should say "12.5" on the top, "10.0" in the middle, and "4.5" on the bottom.

To have mplcursors only interact with some elements, a list of those elements can be given as the first parameter to mplcursors.cursor(). The list could be built from the return values of the calls to ax.scatter.
To modify the annotation text shown, a custom function can be connected to the cursor's 'add' event. In the example below, the label and the y-position are extracted from the selected element and put into the annotation text. Such a label can be added via ax.scatter(..., label=...).
(Choosing 'none' as the color for the "invisible" elements makes them really invisible. To make the code more "Pythonic" explicit indices can be avoided, working with zip instead of with enumerate.)
import matplotlib.pyplot as plt
import mplcursors
import pandas as pd
def show_annotation(sel):
    text = f'{sel.artist.get_label()}\n y={sel.target[1]:.1f}'
    sel.annotation.set_text(text)
df = pd.DataFrame(
{'Universe': ['Darvel', 'MC', 'MC', 'Darvel', 'MC', 'Other', 'Darvel'],
'Value': [10, 11, 13, 12, 9, 7, 10],
'Upper': [12.5, 11.3, 15.4, 12.2, 13.1, 8.8, 11.5],
'Lower': [4.5, 9.6, 11.8, 6, 6.5, 5, 8]})
df['UpperError'] = df['Upper'] - df['Value']
df['LowerError'] = df['Value'] - df['Lower']
colors = ['r', 'g', 'b']
fig, ax = plt.subplots()
all_scatters = []
for universe, color in zip(df['Universe'].unique(), colors):
    to_plot = df[df['Universe'] == universe]
    all_scatters.append(ax.scatter(to_plot.index, to_plot['Value'], s=16, c=color, label=universe))
    error = to_plot[['LowerError', 'UpperError']].transpose().to_numpy()
    ax.errorbar(to_plot.index, to_plot['Value'], yerr=error, fmt='o',
                markersize=0, capsize=6, color=color)
    all_scatters.append(ax.scatter(to_plot.index, to_plot['Upper'], c='none', zorder=-1, label=universe))
    all_scatters.append(ax.scatter(to_plot.index, to_plot['Lower'], c='none', zorder=-1, label=universe))
cursor = mplcursors.cursor(all_scatters, hover=True)
cursor.connect('add', show_annotation)
plt.show()
PS: You can also show the 'Universe' via the x ticks:
ax.set_xticks(df.index)
ax.set_xticklabels(df['Universe'])
If you want to, for short functions you could use the lambda notation instead of writing a separate function:
cursor.connect('add',
lambda sel: sel.annotation.set_text(f'{sel.artist.get_label()}\n y={sel.target[1]:.1f}'))
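The question also asked for the annotation to show only the y value (for example "12.5", "10.0" and "4.5" on the first bar). A minimal sketch of such a callback follows. The names show_y_only, FakeSelection and FakeAnnotation are made up for illustration; the two stand-in classes only mimic the attributes the callback touches (sel.target and sel.annotation.set_text), so the function can be tried without opening a figure.

```python
def show_y_only(sel):
    """mplcursors 'add' callback: annotate with the y coordinate only."""
    sel.annotation.set_text(f'{sel.target[1]:.1f}')


# Hypothetical stand-ins for the objects mplcursors passes to the callback.
class FakeAnnotation:
    def __init__(self):
        self.text = ''

    def set_text(self, text):
        self.text = text


class FakeSelection:
    def __init__(self, x, y):
        self.target = (x, y)          # (x, y) of the hovered point
        self.annotation = FakeAnnotation()


sel = FakeSelection(0, 12.5)
show_y_only(sel)
print(sel.annotation.text)
```

In the real plot it would be connected in the same way as show_annotation, that is, with cursor.connect('add', show_y_only).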

Related

Drawing plot by positions with different colors using python

I have a dataframe with 3 columns which I want to make a plot with.
df = pd.DataFrame({
'position': [100,200,220, 300, 400, 500],
'1': list('xoooox'),
'2': list('oxxooo')
})
I drew a scheme below.
(Sorry for adding an image file; I had difficulty describing it in text.)
In the plot, the height of each bar doesn't matter.
All the bars are the same height.
Thank you.
You can do this by passing kind='bar' and color=some_array, for instance:
df = pd.DataFrame({
'col1': [100, 200, 300, 400],
'col2': [13, 7, 11, 17],
'col3': ['r', 'b', 'g', '0.5'],
})
ax = df.plot(x='col1', y='col2', kind='bar', color=df['col3'])
NOTE You can tinker with the style of the plot some more by calling methods on the ax object.
How about using seaborn:
import seaborn as sns
colors = df.groupby(['1','2']).ngroup().astype('category')
sns.barplot(x=df['position'], y=1, hue=colors, dodge=False)
Or you can manually plot the bars, which allows proper scaling of position:
cmap = {
('x','o'): 'b',
('o','x'): 'r',
('o','o'): 'g',
('x','x'): 'm'
}
fig, ax = plt.subplots()
for _, row in df.iterrows():
    # the column names are the strings '1' and '2'
    ax.bar(row['position'], 1, edgecolor=cmap[(row['1'], row['2'])],
           facecolor=(0,0,0,0),
           width=10)

Filling the area between two lines on different scales/subplot axes

I am trying to figure out how to fill between two lines that are on different scales & axes of subplot, however, I have not been able to figure out how to do this.
I have tried following the answer here for a similar question, but the formula supplied in the code doesn't work on my dataset and based on the responses from the author of that question the equation doesn't appear to work when the x limits are changed.
The following image is what I am after (created in Photoshop):
However, using the code below, I get:
Example Data & Code
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame({'DEPTH':[4300, 4310, 4320, 4330, 4340, 4350, 4360, 4370, 4380, 4390],
'NEUT':[45, 40, 30, 12, 6, 12, 8, 10, 20, 18],
'DENS':[2.5, 2.55, 2.32, 2.35, 2.3, 2.55, 2.58, 2.6, 2.52, 2.53]})
fig = plt.subplots(figsize=(7,20))
ax1 = plt.subplot2grid((1,1), (0,0))
ax2 = ax1.twiny()
ax1.plot('DENS', 'DEPTH', data=data, color='red')
ax1.set_xlim(1.95, 2.95)
ax1.set_xlabel('Density')
ax1.xaxis.label.set_color("red")
ax1.tick_params(axis='x', colors="red")
ax1.spines["top"].set_edgecolor("red")
ax2.plot('NEUT', 'DEPTH', data=data, color='blue')
ax2.set_xlim(45, -15)
ax2.set_xlabel('Neutron')
ax2.xaxis.label.set_color("blue")
ax2.spines["top"].set_position(("axes", 1.04))
ax2.tick_params(axis='x', colors="blue")
ax2.spines["top"].set_edgecolor("blue")
ax1.fill_betweenx(data['DEPTH'], data['DENS'], data['NEUT'], where=data['DENS']>=data['NEUT'], interpolate=True, color='green')
ax1.fill_betweenx(data['DEPTH'], data['DENS'], data['NEUT'], where=data['DENS']<=data['NEUT'], interpolate=True, color='yellow')
for ax in [ax1, ax2]:
    ax.set_ylim(4400, 4300)
    ax.xaxis.set_ticks_position("top")
    ax.xaxis.set_label_position("top")
Would anyone be able to help me with this please?
The difference between your code and the answer you linked is that your Neutron scale goes from the maximum value on the left to the minimum value on the right, which means the logic is slightly wrong. So we just need to switch a few min and max terms around.
Try this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.DataFrame({'DEPTH':[4300, 4310, 4320, 4330, 4340, 4350, 4360, 4370, 4380, 4390],
'NEUT':[45, 40, 30, 12, 6, 12, 8, 10, 20, 18],
'DENS':[2.5, 2.55, 2.32, 2.35, 2.3, 2.55, 2.58, 2.6, 2.52, 2.53]})
fig = plt.subplots(figsize=(6,8))
ax1 = plt.subplot2grid((1,1), (0,0))
ax2 = ax1.twiny()
ax1.plot('DENS', 'DEPTH', data=data, color='red')
ax1.set_xlim(1.95, 2.95)
ax1.set_xlabel('Density')
ax1.xaxis.label.set_color("red")
ax1.tick_params(axis='x', colors="red")
ax1.spines["top"].set_edgecolor("red")
ax2.plot('NEUT', 'DEPTH', data=data, color='blue')
ax2.set_xlim(45, -15)
ax2.set_xlabel('Neutron')
ax2.xaxis.label.set_color("blue")
ax2.spines["top"].set_position(("axes", 1.08))
ax2.tick_params(axis='x', colors="blue")
ax2.spines["top"].set_edgecolor("blue")
x = np.array(ax1.get_xlim())
z = np.array(ax2.get_xlim())
x1 = data['DENS']
x2 = data['NEUT']
nz=((x2-np.max(z))/(np.min(z)-np.max(z)))*(np.max(x)-np.min(x))+np.min(x)
ax1.fill_betweenx(data['DEPTH'], x1, nz, where=x1>=nz, interpolate=True, color='green')
ax1.fill_betweenx(data['DEPTH'], x1, nz, where=x1<=nz, interpolate=True, color='yellow')
for ax in [ax1, ax2]:
    ax.set_ylim(4400, 4300)
    ax.xaxis.set_ticks_position("top")
    ax.xaxis.set_label_position("top")
plt.show()
(I changed the figure size so it would fit on my screen)
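The rescaling step in the code above (the nz line) can be looked at in isolation. The sketch below is not part of the original answer: rescale is a hypothetical helper that maps a value from one axis's limits onto another's, taking the limits as (left, right) pairs in the order ax.get_xlim() returns them. Written this way, an inverted axis needs no special handling; for the limits used here it should give the same numbers as the nz expression.

```python
def rescale(value, src_lim, dst_lim):
    """Linearly map `value` from the src axis limits onto the dst axis limits.

    Limits are (left, right) pairs, so an inverted axis such as (45, -15)
    is handled without swapping min and max by hand.
    """
    src_left, src_right = src_lim
    dst_left, dst_right = dst_lim
    fraction = (value - src_left) / (src_right - src_left)
    return dst_left + fraction * (dst_right - dst_left)


neutron_lim = (45, -15)     # ax2 limits from the example (inverted)
density_lim = (1.95, 2.95)  # ax1 limits from the example

# The left edge, the middle and the right edge of the neutron axis
# land on the left edge, the middle and the right edge of the density axis.
for neut in (45, 15, -15):
    print(neut, round(rescale(neut, neutron_lim, density_lim), 2))
```

With a pandas Series as value the same arithmetic applies elementwise, so the result could be passed to fill_betweenx in place of nz.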

Create stacked bar with matplotlib

I have data displayed in the following format:
values = np.array([[10, 12,13, 5,20], [30, 7, 10, 25,2], [10, 12,13, 5,20]])
And I want to create a straight-up stacked bar chart like the following figure. Each element in the array belongs to a stacked bar.
I have searched to see how can I do this with matplotlib, but unfortunately, I still haven't found a way to do it. How can I do this?
AFAIK, there is no straightforward way to do it. You need to calculate the exact positions of the bars yourself and then normalize them.
import numpy as np
import matplotlib.pyplot as plt
values = np.array([[10, 12,13, 5,20], [30, 7, 10, 25,2], [10, 12,13, 5,20]])
values_normalized = values/np.sum(values, axis=0)
bottom_values = np.cumsum(values_normalized, axis=0)
bottom_values = np.vstack([np.zeros(values_normalized[0].size), bottom_values])
text_positions = (bottom_values[1:] + bottom_values[:-1])/2
r = [0, 1, 2, 3, 4] # position of the bars on the x-axis
names = ['A', 'B', 'C', 'D', 'E'] # names of groups
colors = ['lightblue', 'orange', 'lightgreen']
for i in range(3):
    plt.bar(r, values_normalized[i], bottom=bottom_values[i], color=colors[i], edgecolor='white', width=1, tick_label=['a','b','c','d','e'])
    for xpos, ypos, yval in zip(r, text_positions[i], values[i]):
        plt.text(xpos, ypos, "N=%d"%yval, ha="center", va="center")
# Custom X axis
plt.xticks(r, names, fontweight='bold')
plt.xlabel("group")
plt.show()
There is a source that tells how to add text on top of bars. I'm a bit in a hurry right now so I hope this is useful and I'll update my answer next day if needed.
I've updated my answer. Adding text on top of the bars is tricky, it requires some calculations of their vertical positions.
Btw, I have refactored most of the code from the link I shared.
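To make the position arithmetic easier to follow, here is a plain-Python sketch of the same calculation for a single stacked bar, without NumPy. The helper name stack_layout is made up for this illustration; it normalizes one column, takes the running sum to get the top of each segment, and places each label halfway between a segment's bottom and top.

```python
from itertools import accumulate

values = [[10, 12, 13, 5, 20],
          [30, 7, 10, 25, 2],
          [10, 12, 13, 5, 20]]

# Each column of `values` is one stacked bar.
columns = list(zip(*values))


def stack_layout(column):
    """Return (heights, bottoms, label_positions) for one normalized bar."""
    total = sum(column)
    heights = [v / total for v in column]
    tops = list(accumulate(heights))   # running sum = top edge of each segment
    bottoms = [0.0] + tops[:-1]        # a segment starts where the previous one ended
    centers = [(b + t) / 2 for b, t in zip(bottoms, tops)]
    return heights, bottoms, centers


heights, bottoms, centers = stack_layout(columns[0])
print(heights)
print(bottoms)
print(centers)
```

For the first bar (10, 30, 10) the segments take up roughly 20%, 60% and 20% of the height, so the labels sit near 0.1, 0.5 and 0.9.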
Python 3.8
matplotlib 3.3.1
numpy 1.19.1
Chart Result
import matplotlib.pyplot as plt
import numpy as np
values = np.array([[10, 12, 13, 5, 20], [30, 7, 10, 25, 2], [10, 12, 13, 5, 20]])
row, column = values.shape # (3, 5)
x_type = [x+1 for x in range(column)]
ind = [x for x, _ in enumerate(x_type)]
values_normalized = values/np.sum(values, axis=0)
value1, value2, value3 = values_normalized[0,:], values_normalized[1,:], values_normalized[2,:]
# Create figure
plt.figure(figsize=(8, 6))
plt.bar(ind, value1, width=0.8, label='Series1', color='#5B9BD5')
plt.bar(ind, value2, width=0.8, label='Series2', color='#C00000', bottom=value1)
plt.bar(ind, value3, width=0.8, label='Series3', color='#70AD47', bottom=value1 + value2)
# Show text
bottom_values = np.cumsum(values_normalized, axis=0)
bottom_values = np.vstack([np.zeros(values_normalized[0].size), bottom_values])
text_positions = (bottom_values[1:] + bottom_values[:-1])/2
c = list(range(column))
for i in range(3):
    for xpos, ypos, yval in zip(c, text_positions[i], values[i]):
        plt.text(xpos, ypos, yval, horizontalalignment='center', verticalalignment='center', color='white')
plt.xticks(ind, x_type)
plt.legend(loc='center', bbox_to_anchor=(0, 1.02, 1, 0.1), handlelength=1, handleheight=1, ncol=row)
plt.title('CHART TITLE', fontdict = {'fontsize': 16,'fontweight': 'bold', 'family': 'serif'}, y=1.1)
# Hide y-axis
plt.gca().axes.yaxis.set_visible(False)
plt.show()

Matplotlib Scatter Plot Legend Creation Mystery

I have the following snippet of code (the values for c, s, x, y are mock-ups, but the real lists follow the same format and are just much bigger; only two colors, red and green, are used, and all lists are the same size).
The issue is that the color legend fails to materialize, and I am completely at a loss as to why. The code for legend generation is basically a cut-and-paste from the docs, i.e. (https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/scatter_with_legend.html#sphx-glr-gallery-lines-bars-and-markers-scatter-with-legend-py)
Does anyone have any idea?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
c = [ 'g', 'r', 'r', 'g', 'g', 'r', 'r', 'r', 'g', 'r']
s = [ 10, 20, 10, 40, 60, 90, 90, 50, 60, 40]
x = [ 2.4, 3.0, 3.5, 3.5, 3.5, 3.5, 3.5, 2.4, 3.5, 3.5]
y = [24.0, 26.0, 20.0, 19.0, 19.0, 21.0, 20.0, 23.0, 20.0, 20.0]
fig, ax = plt.subplots()
scatter = plt.scatter(x, y, s=s, c=c, alpha=0.5)
# produce a legend with the unique colors from the scatter
handles, labels = scatter.legend_elements()
legend1 = ax.legend(handles, labels, loc="lower left", title="Colors")
ax.add_artist(legend1)
# produce a legend with a cross section of sizes from the scatter
handles, labels = scatter.legend_elements(prop="sizes", alpha=0.5)
legend2 = ax.legend(handles, labels, loc="upper right", ncol=2, title="Sizes")
plt.show()
It seems that legend_elements() is only meant to be used when c= is passed a numeric array to be mapped against a colormap.
You can test by replacing c=c by c=s in your code, and you will get the desired output.
Personally, I would have expected your code to work, and maybe it is worth bringing it up either as a bug or a feature request at matplotlib's github. EDIT: actually, there is already a discussion about this very issue on the issue tracker
One way to circumvent this limitation is to replace your array of color names with a numeric array and create a custom colormap that maps each value in your array to the desired color:
import matplotlib.colors
#c = [ 'g', 'r', 'r', 'g', 'g', 'r', 'r', 'r', 'g', 'r']
c = [0, 1, 1, 0, 0, 1, 1, 1, 0, 1]
cmap = matplotlib.colors.ListedColormap(['g','r'])
s = [ 10, 20, 10, 40, 60, 90, 90, 50, 60, 40]
x = [ 2.4, 3.0, 3.5, 3.5, 3.5, 3.5, 3.5, 2.4, 3.5, 3.5]
y = [24.0, 26.0, 20.0, 19.0, 19.0, 21.0, 20.0, 23.0, 20.0, 20.0]
fig, ax = plt.subplots()
scatter = plt.scatter(x, y, s=s, c=c, alpha=0.5, cmap=cmap)
# produce a legend with the unique colors from the scatter
handles, labels = scatter.legend_elements()
legend1 = ax.legend(handles, labels, loc="lower left", title="Colors")
ax.add_artist(legend1)
# produce a legend with a cross section of sizes from the scatter
handles, labels = scatter.legend_elements(prop="sizes", alpha=0.5)
legend2 = ax.legend(handles, labels, loc="upper right", ncol=2, title="Sizes")
plt.show()
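For longer lists, writing the numeric array by hand is impractical. A minimal sketch of deriving it from the color names follows; encode_colors is a hypothetical helper, not a matplotlib function. It assigns integer codes in first-seen order and returns the matching palette, which could then be passed as c=codes and cmap=matplotlib.colors.ListedColormap(palette).

```python
def encode_colors(color_names):
    """Return (codes, palette): one integer code per point, plus the
    unique color names in the order they first appear."""
    palette = list(dict.fromkeys(color_names))   # unique, first-seen order
    lookup = {name: code for code, name in enumerate(palette)}
    codes = [lookup[name] for name in color_names]
    return codes, palette


c = ['g', 'r', 'r', 'g', 'g', 'r', 'r', 'r', 'g', 'r']
codes, palette = encode_colors(c)
print(codes)
print(palette)
```

Because the palette order follows the data, the legend labels produced by legend_elements() are the integer codes; index i corresponds to palette[i].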

Why does using ax.twiny shift the figure mapped to second axes rightward?

I'm trying to allow my figure to share the same y axis, but have different scales along x axis. The problem is that when I try to map the second figure to the second axes (ax1 = ax.twiny), the figure seems to move forward to the right from where it should be. Here is a minimal working example that demonstrates my problem.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
import pandas as pd
r = [0,1,2,3,4]
raw_data = {'greenBars': [20, 1.5, 7, 10, 5], 'orangeBars': [5, 15, 5, 10, 15],'blueBars': [2, 15, 18, 5, 10]}
df = pd.DataFrame(raw_data)
totals = [i+j+k for i,j,k in zip(df['greenBars'], df['orangeBars'], df['blueBars'])]
greenBars = [i / j * 100 for i,j in zip(df['greenBars'], totals)]
f, ax = plt.subplots(1, figsize=(6,6))
ax.barh(r, greenBars, color='#b5ffb9', edgecolor='white', height=0.85)
df = pd.DataFrame({'group':['A', 'B', 'C', 'D', 'E'], 'values':[300,250,150,50,10] })
ax1 = ax.twiny()
ax1.hlines(y=df['group'], xmin=0, xmax=df['values'], color='black', linewidth=1.5);
plt.show()
where my expected outcome is to have the ax1.hlines move left-ward to the frame (as shown by the arrows in the image below). Does anybody have any suggestions as to how to fix this behaviour?
barh pins the lower x limit at 0, while plot, hlines and most other plotting functions leave a small margin below the smallest value for aesthetics. To fix this, manually set xlim for ax1:
...
f, ax = plt.subplots(1, figsize=(6,6))
ax.barh(r, greenBars, color='#b5ffb9', edgecolor='white', height=0.85)
df = pd.DataFrame({'group':['A', 'B', 'C', 'D', 'E'], 'values':[300,250,150,50,10] })
ax1 = ax.twiny()
ax1.hlines(y=df['group'], xmin=0, xmax=df['values'], color='black', linewidth=1.5);
# this is added
ax1.set_xlim(0)
plt.show()
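The size of the shift can be estimated by hand. The sketch below assumes matplotlib's default autoscale margin of 5% of the data range on each side (the axes.xmargin setting); padded_limits is a hypothetical helper that only reproduces that arithmetic and does not call matplotlib.

```python
def padded_limits(data_min, data_max, margin=0.05):
    """Axis limits after padding both ends by `margin` times the data range."""
    pad = (data_max - data_min) * margin
    return data_min - pad, data_max + pad


# The hlines run from 0 to 300, so the autoscaled twin axis starts
# below zero, while the bar axis starts exactly at zero.
left, right = padded_limits(0, 300)
print(left, right)
```

Under that assumption the twin axis would start at about -15 rather than 0, which is the offset seen in the figure; ax1.set_xlim(0) removes it.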
