I have a matplotlib plot where certain points get annotated. I have worked out how to do the annotations themselves, including arrows and everything. However, I need to add a line to each annotation, next to the text of the annotation. It should run parallel to the text, with a certain offset from the text in points. The length of the line is based on a percentage value that each annotated point has. Ideally I would like a line that's always the same length (roughly 15 text characters, which is the max length of the text in the annotations) but has, let's say, a red and a grey portion based on that percentage value.
Any help or suggestions are greatly appreciated.
Here is a minimal example with some mock data points:
import numpy as np
import matplotlib.pyplot as plt
x=[2, 3, 4, 6, 7, 8, 10, 11]
y=[1, 3, 4, 2, 3, 1, 5, 2]
tx=[3, 4, 5, 6, 7, 8, 9, 10]
yd=dict(zip(x, y))
plt.scatter(x, y)
plt.xlim(0, 14)
plt.ylim(0, 8)
tspace=list(np.linspace(.05, .95, len(tx)))
tsd=dict(zip(tx, tspace))
arpr = {"arrowstyle": "-",
        "connectionstyle": "arc,angleA=-90,armA=20,angleB=90,armB=20,rad=10"}
for i, j in zip(x, tx):
    plt.annotate("foo bar baz", (i, yd[i]), (tsd[j], .75),
                 textcoords="axes fraction", arrowprops=arpr,
                 annotation_clip=False, rotation="vertical")
And here is a comparison of current vs. desired output:
You can use plt.Rectangle to draw the bars: first a grey one that spans the full height of the bar, then a red one on top whose height is a percentage of that full height.
However, since the width and height of a rectangle are given in data units (the x- and y-coordinates of the plot), we need to know where your text annotations are in those coordinates.
You placed the annotations with textcoords="axes fraction", which makes it awkward to get their positions in x- and y-coordinates. So instead I defined constants x_min, x_max, y_min, y_max for the limits of the plot and computed the coordinates of both the text annotations and the bars directly from the tspace list.
The percentage of red in each bar is set in a list, so the approach generalizes to any number of points.
import numpy as np
import matplotlib.pyplot as plt
x=[2, 3, 4, 6, 7, 8, 10, 11]
y=[1, 3, 4, 2, 3, 1, 5, 2]
tx=[3, 4, 5, 6, 7, 8, 9, 10]
yd=dict(zip(x, y))
fig,ax = plt.subplots(1,1)
plt.scatter(x, y)
x_min, x_max = 0, 14
y_min, y_max = 0, 8
y_text_end = 0.75*(y_max-y_min)
plt.xlim(0, 14)
plt.ylim(0, 8)
tspace=list(np.linspace(.05, .95, len(tx)))
# tsd=dict(zip(tx, tspace))
# random percentage values to demonstrate the bar functionality
bar_percentages = [0.95, 0.9, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05]
bar_width = 0.2
bar_height = 1.9
arpr = {"arrowstyle": "-",
"connectionstyle": "arc,angleA=-90,armA=20,angleB=90,armB=20,rad=10"}
## axes fraction is convenient but it's important to be able to access the exact coordinates for the Rectangle function
for i, x_val in enumerate(x):
    plt.annotate("foo bar baz", (x_val, yd[x_val]), (tspace[i]*(x_max-x_min), y_text_end),
                 arrowprops=arpr, annotation_clip=False, rotation="vertical")
    bar_grey = plt.Rectangle((tspace[i]*(x_max-x_min)+0.4, y_text_end-0.1), bar_width, bar_height, fc='#cccccc')
    bar_red = plt.Rectangle((tspace[i]*(x_max-x_min)+0.4, y_text_end-0.1), bar_width, bar_percentages[i]*bar_height, fc='r')
    # the rectangles are only drawn once they are added to the axes (grey first, red on top)
    ax.add_patch(bar_grey)
    ax.add_patch(bar_red)
plt.show()
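The grey-then-red rectangle idea from the answer above can be packaged as a small helper. This is a sketch, not part of that answer: the name add_percent_bar and its parameters are hypothetical, and it swaps in matplotlib.transforms.offset_copy so the bar can be shifted by a fixed number of points from its data position (the question asked for an offset in points).

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from matplotlib.transforms import offset_copy


def add_percent_bar(ax, x, y, width, height, fraction, dx_points=0.0,
                    colors=("#cccccc", "r")):
    """Hypothetical helper: draw a grey bar of full `height` at (x, y) in data
    coordinates, plus a red bar covering `fraction` of that height.

    `dx_points` shifts both bars horizontally by a fixed number of points,
    independent of the axis scaling.
    """
    fraction = min(max(fraction, 0.0), 1.0)  # keep the red part inside the bar
    trans = offset_copy(ax.transData, fig=ax.figure, x=dx_points, y=0.0,
                        units="points")
    grey = Rectangle((x, y), width, height, fc=colors[0], transform=trans)
    red = Rectangle((x, y), width, fraction * height, fc=colors[1],
                    transform=trans)
    ax.add_patch(grey)  # added first, so the red bar is drawn on top of it
    ax.add_patch(red)
    return grey, red


fig, ax = plt.subplots()
ax.set_xlim(0, 14)
ax.set_ylim(0, 8)
for i, pct in enumerate([0.95, 0.6, 0.2]):
    ax.text(2 + 4 * i, 6, "foo bar baz", rotation="vertical", va="bottom")
    add_percent_bar(ax, 2 + 4 * i, 6, 0.2, 1.9, pct, dx_points=12)
plt.show()
```

Because the rectangles carry an explicit transform, the axes do not autoscale to them, so the limits are set by hand, as in the answer above.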
I have since found a solution, albeit a hacky one and without the ideal grey boxes, but it's fine for my purposes, and I'll share it here in case it helps someone. If anyone knows an improvement, please feel free to contribute. Thanks to @DerekO for providing useful input, which I incorporated into my solution.
This is adapted from this matplotlib demo. I moved the custom box outside of the text, scaled its width by an additional percentage parameter and gave it a fixed height. I had to split each annotation into two actual annotations, because with the custom box the arrow would not start at the correct location; this way it works fine. Scaling and zooming now behave well, and the bar follows the text.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.path import Path
from matplotlib.patches import BoxStyle
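# Custom box style: instead of a box around the text, draw a bar of fixed
# thickness next to it, whose length is the fraction `per` of the text length.
# (Written as a plain callable, which recent matplotlib versions expect;
# older versions subclassed BoxStyle._Base and implemented transmute().)
class MyStyle:
    def __init__(self, pad=0.3, per=1.):
        self.pad = pad
        self.per = per
    def __call__(self, x0, y0, width, height, mutation_size):
        # padding
        pad = mutation_size * self.pad
        # width with padding added, scaled by the percentage
        width = width + 2.*pad
        width *= self.per
        # fixed bar thickness
        height = 8.
        # boundary of the bar, placed just outside the padded text box
        x0, y0 = x0-pad, y0-pad
        x1, y1 = x0+width, y0-height
        cp = [(x0, y0),
              (x1, y0),
              (x1, y1),
              (x0, y1),
              (x0, y0)]
        com = [Path.MOVETO,
               Path.LINETO,
               Path.LINETO,
               Path.LINETO,
               Path.CLOSEPOLY]
        path = Path(cp, com)
        return path
# register the custom style
BoxStyle._style_list["percent"] = MyStyle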
x=[2, 3, 4, 6, 7, 8, 10, 11]
y=[1, 3, 4, 2, 3, 1, 5, 2]
tx=[3, 4, 5, 6, 7, 8, 9, 10]
yd=dict(zip(x, y))
fig,ax = plt.subplots(1,1)
plt.scatter(x, y)
x_min, x_max = 0, 14
y_min, y_max = 0, 8
y_text_end = 0.75*(y_max-y_min)
plt.xlim(0, 14)
plt.ylim(0, 8)
tspace=list(np.linspace(.05, .95, len(tx)))
# tsd=dict(zip(tx, tspace))
# random percentage values to demonstrate the bar functionality
bar_percentages = [0.95, 0.9, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05]
arpr = {"arrowstyle": "-",
"connectionstyle": "arc,angleA=-90,armA=20,angleB=90,armB=20,rad=10"}
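## data coordinates are used instead of axes fraction for the text positions
for i, x_val in enumerate(x):
    # first annotation: only the arrow, so that it starts at the right place
    plt.annotate("", (x_val, yd[x_val]), (tspace[i]*(x_max-x_min), y_text_end),
                 arrowprops=arpr, annotation_clip=False, rotation="vertical")
    # second annotation: the text, with the percentage bar drawn as its box
    plt.annotate("foo bar baz", (x_val, yd[x_val]), (tspace[i]*(x_max-x_min), y_text_end),
                 annotation_clip=False, rotation="vertical",
                 va="bottom", ha="right",
                 # assumed box arguments (pad value and colors are illustrative)
                 bbox=dict(boxstyle=f"percent,pad=0.3,per={bar_percentages[i]}",
                           fc="red", ec="none"))
plt.show()
del BoxStyle._style_list["percent"]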
I have points in the (x, y) plane: x=[x1, x2, ...] and y=[y1, y2, ...], for example:
import matplotlib.pyplot as plt
x = [0, 2, 4, 0, 2, 4, 0, 2, 4]
y = [0, 0, 0, 3, 3, 3, 7, 7, 7]
plt.scatter(x, y)
I want to draw lines between the points that run exactly parallel to the x and y axes (like in the photo), and I'd also like to hide the x and y axes on the diagram. My goal is a 2D view of the beams and columns of a 3-story building; can matplotlib get me there, or should I use other libraries?
Absolutely, matplotlib can do this. Take a look at its Rectangle patch:
Example usage (you'll have to modify this to your needs):
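import matplotlib.pyplot as plt
import matplotlib.patches as patches
fig = plt.figure()
ax = fig.add_subplot()
rect = patches.Rectangle(
    (0.1, 0.1),  # lower-left corner
    0.5, 0.5,    # width, height (placeholder values)
    fill=False,
)
ax.add_patch(rect)
ax.axis("off")  # hides the x and y axes
plt.show()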
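For the beams-and-columns view the question asks about, one option is to connect neighboring points on the same level with horizontal lines (beams) and neighboring points on the same vertical line with vertical lines (columns), then switch the axes off. This is a sketch assuming the points lie on a regular grid like the x and y lists in the question; the helper names frame_segments and draw_frame are hypothetical.

```python
import matplotlib.pyplot as plt


def frame_segments(x, y):
    """Hypothetical helper: return (beams, columns) as lists of
    ((x0, y0), (x1, y1)) segments.

    Beams join horizontally adjacent points on the same level (same y);
    columns join vertically adjacent points on the same line (same x).
    """
    pts = sorted(set(zip(x, y)))
    beams, columns = [], []
    for level in sorted(set(y)):
        xs = sorted(px for px, py in pts if py == level)
        beams += [((a, level), (b, level)) for a, b in zip(xs, xs[1:])]
    for line in sorted(set(x)):
        ys = sorted(py for px, py in pts if px == line)
        columns += [((line, a), (line, b)) for a, b in zip(ys, ys[1:])]
    return beams, columns


def draw_frame(x, y):
    beams, columns = frame_segments(x, y)
    fig, ax = plt.subplots()
    for (x0, y0), (x1, y1) in beams:
        ax.plot([x0, x1], [y0, y1], color="tab:blue", lw=3)
    for (x0, y0), (x1, y1) in columns:
        ax.plot([x0, x1], [y0, y1], color="tab:gray", lw=3)
    ax.scatter(x, y, color="k", zorder=3)  # the joints
    ax.set_aspect("equal")
    ax.axis("off")  # hides both axes
    return fig, ax


x = [0, 2, 4, 0, 2, 4, 0, 2, 4]
y = [0, 0, 0, 3, 3, 3, 7, 7, 7]
fig, ax = draw_frame(x, y)
plt.show()
```

Plain line plots are enough for a schematic 2D frame; patches such as Rectangle become useful if the members should have a visible thickness.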
Does anybody have a suggestion on the best way to present overlapping lines on a plot? I have a lot of them, and I had the idea of drawing solid lines of different colors where they don't overlap and dashed lines where they do, so that all colors remain visible and the overlaps can be seen.
But even so, how do I do that?
I have the same issue on a plot with a high degree of discretization.
Here is the starting situation:
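import matplotlib.pyplot as plt
grid=[x for x in range(10)]
graphs = [[1,1,1,4,4,4,3,5,6,0], [1,1,1,5,5,5,3,5,6,0], [1,1,1,0,0,3,3,2,4,0],
          [1,2,4,4,3,2,3,2,4,0], [1,2,3,3,4,4,3,2,6,0], [1,1,3,3,0,3,3,5,4,3]]
for gg,graph in enumerate(graphs):
    plt.plot(grid,graph,label='g'+str(gg))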
No one can say exactly where the green and blue lines run.
And here is my "solution":
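import matplotlib.pyplot as plt
grid=[x for x in range(10)]
graphs = [[1,1,1,4,4,4,3,5,6,0], [1,1,1,5,5,5,3,5,6,0], [1,1,1,0,0,3,3,2,4,0],
          [1,2,4,4,3,2,3,2,4,0], [1,2,3,3,4,4,3,2,6,0], [1,1,3,3,0,3,3,5,4,3]]
for gg,graph in enumerate(graphs):
    lw, ls = 10 - 8*gg/len(graphs), ['-', '--', '-.', ':'][gg % 4]  # hypothetical styling values
    plt.plot(grid,graph,label='g'+str(gg), linestyle=ls, linewidth=lw)
plt.show()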
I am grateful for suggestions on improvement!
Just decrease the opacity of the lines so that they are see-through. You can achieve that with the alpha parameter. Example:
plt.plot(x, y, alpha=0.7)
where alpha ranges from 0 to 1, with 0 being fully transparent (invisible) and 1 fully opaque.
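A minimal sketch of this, using the same six example curves that appear elsewhere in this thread (the line width and alpha value are arbitrary choices): with thick, half-transparent lines, overlapping stretches show up as a mix of colors instead of the topmost line hiding the others.

```python
import matplotlib.pyplot as plt

grid = list(range(10))
graphs = [[1, 1, 1, 4, 4, 4, 3, 5, 6, 0],
          [1, 1, 1, 5, 5, 5, 3, 5, 6, 0],
          [1, 1, 1, 0, 0, 3, 3, 2, 4, 0],
          [1, 2, 4, 4, 3, 2, 3, 2, 4, 0],
          [1, 2, 3, 3, 4, 4, 3, 2, 6, 0],
          [1, 1, 3, 3, 0, 3, 3, 5, 4, 3]]

fig, ax = plt.subplots()
for gg, graph in enumerate(graphs):
    # thick, semi-transparent lines: where lines overlap, their colors blend
    ax.plot(grid, graph, lw=4, alpha=0.5, label='g' + str(gg))
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.01))
plt.show()
```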
Imagine your pandas DataFrame is called response_times; then you can use alpha to set a different opacity for each of your graphs. Compare the pictures before and after using alpha.
plt.figure(figsize=(15, 7))
plt.title('a sample title')
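Only the figure setup of that answer is shown above. A sketch of the rest, assuming a hypothetical response_times DataFrame with one column per series (the column names and values below are made up for illustration), plotting each column with its own alpha:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the `response_times` DataFrame (made-up values).
response_times = pd.DataFrame({
    'server_a': [120, 135, 128, 150, 142],
    'server_b': [118, 133, 130, 149, 140],
    'server_c': [121, 134, 127, 151, 141],
})

alphas = [1.0, 0.6, 0.3]  # one opacity per column; arbitrary example values

fig, ax = plt.subplots(figsize=(15, 7))
ax.set_title('a sample title')
for column, alpha in zip(response_times.columns, alphas):
    ax.plot(response_times.index, response_times[column], alpha=alpha, label=column)
ax.legend()
plt.show()
```

If one opacity for all columns is enough, response_times.plot(alpha=0.7) passes alpha straight through to matplotlib.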
Depending on your data and use case, it might be OK to add a bit of random jitter to artificially separate the lines.
from numpy.random import default_rng
import pandas as pd
rng = default_rng()
def jitter_df(df: pd.DataFrame, std_ratio: float) -> pd.DataFrame:
    """
    Add jitter to a DataFrame.
    Adds normally distributed jitter with mean 0 to each of the
    DataFrame's columns. The jitter's std is the column's std times
    `std_ratio`.
    Returns the jittered DataFrame.
    """
    std = df.std().values * std_ratio
    jitter = pd.DataFrame(
        std * rng.standard_normal(df.shape),
        index=df.index,      # same labels as df, so that df + jitter aligns
        columns=df.columns,
    )
    return df + jitter
Here's a plot of the original data from Markus Dutschke's example:
And here's the jittered version, with std_ratio set to 0.1:
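Replacing solid lines with dots or dashes works too:
import seaborn as sns
# data: a long-form DataFrame with 'config', 'outputs', 'lag', 'correlation' and 'card' columns
g = sns.FacetGrid(data, col='config', row='outputs', sharex=False)
g.map_dataframe(sns.lineplot, x='lag', y='correlation', hue='card', linestyle='dotted')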
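The dashed-where-they-overlap idea from the question can be approximated by giving every line the same dash period but a different dash offset, so that where lines coincide their dashes take turns instead of covering each other. This is a sketch with a hypothetical helper, interleaved_dashes, using matplotlib's (offset, (on, off)) linestyle tuples. One caveat: dash phases follow the length travelled along each line, so the interleaving only holds where lines share their path from the start; after two lines diverge and rejoin, their dashes can drift into each other.

```python
import matplotlib.pyplot as plt


def interleaved_dashes(n_lines, dash=4.0):
    """Hypothetical helper: one (offset, (on, off)) style per line.

    Every line draws `dash` on and then leaves a gap long enough for the other
    n_lines - 1 lines; line i is shifted by i * dash, so each line owns its
    own slot in the shared dash period.
    """
    if n_lines < 2:
        return ['-'] * n_lines  # a single line does not need gaps
    gap = dash * (n_lines - 1)
    return [(i * dash, (dash, gap)) for i in range(n_lines)]


grid = list(range(10))
graphs = [[1, 1, 1, 4, 4, 4, 3, 5, 6, 0],
          [1, 1, 1, 5, 5, 5, 3, 5, 6, 0],
          [1, 1, 1, 0, 0, 3, 3, 2, 4, 0],
          [1, 2, 4, 4, 3, 2, 3, 2, 4, 0],
          [1, 2, 3, 3, 4, 4, 3, 2, 6, 0],
          [1, 1, 3, 3, 0, 3, 3, 5, 4, 3]]

fig, ax = plt.subplots()
styles = interleaved_dashes(len(graphs))
for gg, (graph, style) in enumerate(zip(graphs, styles)):
    ax.plot(grid, graph, linestyle=style, lw=2, label='g' + str(gg))
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.01))
plt.show()
```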
Instead of random jitter, the lines can be offset just a little bit, creating a layered appearance:
import matplotlib.pyplot as plt
from matplotlib.transforms import offset_copy
grid = list(range(10))
graphs = [[1, 1, 1, 4, 4, 4, 3, 5, 6, 0],
[1, 1, 1, 5, 5, 5, 3, 5, 6, 0],
[1, 1, 1, 0, 0, 3, 3, 2, 4, 0],
[1, 2, 4, 4, 3, 2, 3, 2, 4, 0],
[1, 2, 3, 3, 4, 4, 3, 2, 6, 0],
[1, 1, 3, 3, 0, 3, 3, 5, 4, 3]]
fig, ax = plt.subplots()
lw = 1
for gg, graph in enumerate(graphs):
trans_offset = offset_copy(ax.transData, fig=fig, x=lw * gg, y=lw * gg, units='dots')
ax.plot(grid, graph, lw=lw, transform=trans_offset, label='g' + str(gg))
ax.legend(loc='upper left', bbox_to_anchor=(1.01, 1.01))
# manually set the axes limits, because the transform doesn't set them automatically
ax.set_xlim(grid[0] - .5, grid[-1] + .5)
ax.set_ylim(min([min(g) for g in graphs]) - .5, max([max(g) for g in graphs]) + .5)
plt.show()
I have a pandas DataFrame with non-uniformly spaced data points given by an x, y and z column, where x and y are pairs of variables and z is the dependent variable. For example:
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata  # removed in matplotlib 3.1
import numpy as np
import pandas as pd
df = pd.DataFrame({'x':[0, 0, 1, 1, 3, 3, 3, 4, 4, 4],
'y':[0, 1, 0, 1, 0.2, 0.7, 1.4, 0.2, 1.4, 2],
'z':[50, 40, 40, 30, 30, 30, 20, 20, 20, 10]})
x = df['x']
y = df['y']
z = df['z']
I want to do a contour plot of the dependent variable z over x and y. For this, I create a new grid to interpolate the data on using matplotlib.mlab's griddata function.
xi = np.linspace(x.min(), x.max(), 100)
yi = np.linspace(y.min(), y.max(), 100)
z_grid = griddata(x, y, z, xi, yi, interp='linear')
plt.contourf(xi, yi, z_grid, 15)
plt.scatter(x, y, color='k') # The original data points
While this works, the output is not what I want. I do not want griddata to interpolate outside of the boundaries given by the min and max values of the x and y data. The plots below show the result, with the area of the data that I want interpolated and contoured highlighted in purple. The contour outside the purple line is supposed to be blank. How could I go about masking the outlying data?
The linked question unfortunately does not answer my question, as I don't have a clear mathematical way to define the conditions under which to do a triangulation. Is it possible to define a condition to mask the data based on the data alone, taking the above DataFrame as an example?
As seen in the answer to this question, one may introduce a condition to mask the values.
The sentence from the question
"I do not want griddata to interpolate outside of the boundaries given by the min and max values of the x and y data." implies that there is some min/max condition present, which can be used.
Should that not be the case, one may clip the contour using a path. The points of this path need to be specified as there is no generic way of knowing which points should be the edges. The code below does this for three different possible paths.
import matplotlib.pyplot as plt
from matplotlib.path import Path
from matplotlib.patches import PathPatch
from matplotlib.mlab import griddata  # removed in matplotlib 3.1
import numpy as np
import pandas as pd
df = pd.DataFrame({'x':[0, 0, 1, 1, 3, 3, 3, 4, 4, 4],
'y':[0, 1, 0, 1, 0.2, 0.7, 1.4, 0.2, 1.4, 2],
'z':[50, 40, 40, 30, 30, 30, 20, 20, 20, 10]})
x = df['x']
y = df['y']
z = df['z']
xi = np.linspace(x.min(), x.max(), 100)
yi = np.linspace(y.min(), y.max(), 100)
z_grid = griddata(x, y, z, xi, yi, interp='linear')
clipindex = [[0, 2, 4, 7, 8, 9, 6, 3, 1, 0],
             # the other two index lists were cut off in the original
             ]
fig, axes = plt.subplots(ncols=len(clipindex), sharey=True, squeeze=False)
for ax, idx in zip(axes.flat, clipindex):
    cont = ax.contourf(xi, yi, z_grid, 15)
    ax.scatter(x, y, color='k') # The original data points
    ax.plot(x[idx], y[idx], color="crimson")
    clippath = Path(np.c_[x[idx], y[idx]])
    patch = PathPatch(clippath, facecolor='none')
    ax.add_patch(patch)  # gives the patch the data transform the clipping needs
    for c in cont.collections:
        c.set_clip_path(patch)
plt.show()
Ernest's answer is a great solution, but it is very slow when there are many contours. Instead of clipping every one of them, I built a mask by constructing the complement polygon of the desired clipping mask.
Here is the code based on Ernest's accepted answer:
import numpy as np
import pandas as pd
import matplotlib.tri as tri
import matplotlib.pyplot as plt
from descartes import PolygonPatch  # descartes is unmaintained and may fail with shapely >= 2.0
from shapely.geometry import Polygon
df = pd.DataFrame({'x':[0, 0, 1, 1, 3, 3, 3, 4, 4, 4],
'y':[0, 1, 0, 1, 0.2, 0.7, 1.4, 0.2, 1.4, 2],
'z':[50, 40, 40, 30, 30, 30, 20, 20, 20, 10]})
points = df[['x', 'y']]
values = df[['z']]
xi = np.linspace(points.x.min(), points.x.max(), 100)
yi = np.linspace(points.y.min(), points.y.max(), 100)
triang = tri.Triangulation(points.x, points.y)
interpolator = tri.LinearTriInterpolator(triang, values.z)
Xi, Yi = np.meshgrid(xi, yi)
zi = interpolator(Xi, Yi)
clipindex = [[0, 2, 4, 7, 8, 9, 6, 3, 1, 0],
             # the other two index lists were cut off in the original
             ]
fig, axes = plt.subplots(ncols=len(clipindex), sharey=True, figsize=(10,4), squeeze=False)
for ax, idx in zip(axes.flat, clipindex):
    ax.set_xlim(-0.5, 4.5)
    ax.set_ylim(-0.2, 2.2)
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    cont = ax.contourf(Xi, Yi, zi, 15)
    ax.scatter(points.x, points.y, color='k', zorder=2) # The original data points
    ax.plot(points.x[idx], points.y[idx], color="crimson", zorder=1)
    #### 'Universe polygon':
    ext_bound = Polygon([(xlim[0], ylim[0]), (xlim[0], ylim[1]), (xlim[1], ylim[1]), (xlim[1], ylim[0]), (xlim[0], ylim[0])])
    #### Clipping mask as polygon:
    inner_bound = Polygon([(row.x, row.y) for _, row in points.iloc[idx].iterrows()])
    #### Mask as the symmetric difference of both polygons:
    mask = ext_bound.symmetric_difference(inner_bound)
    ax.add_patch(PolygonPatch(mask, facecolor='white', zorder=1, edgecolor='white'))
plt.show()
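On the question of masking "based on the data alone": one data-driven alternative, not taken from the answers above, is to skip the regular grid entirely, triangulate the scattered points, and drop triangles whose longest edge exceeds a threshold before calling tricontourf. Nothing is drawn outside the remaining triangles, so no clip path has to be specified by hand. This is a sketch: long_edge_mask is a hypothetical helper, and the threshold of 2.5 is an assumption picked by eye for this example data. Edge lengths are measured in data units, so if x and y have very different scales you may want to normalize them first.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as tri

x = np.array([0, 0, 1, 1, 3, 3, 3, 4, 4, 4], dtype=float)
y = np.array([0, 1, 0, 1, 0.2, 0.7, 1.4, 0.2, 1.4, 2])
z = np.array([50, 40, 40, 30, 30, 30, 20, 20, 20, 10], dtype=float)


def long_edge_mask(triang, max_edge):
    """Hypothetical helper: True for every triangle whose longest edge is
    longer than `max_edge` (in data units)."""
    corners = np.column_stack([triang.x, triang.y])[triang.triangles]  # (ntri, 3, 2)
    edges = corners - np.roll(corners, 1, axis=1)  # the three edge vectors
    lengths = np.hypot(edges[..., 0], edges[..., 1])
    return lengths.max(axis=1) > max_edge


triang = tri.Triangulation(x, y)
# 2.5 is an assumed threshold, picked by eye for this data set
triang.set_mask(long_edge_mask(triang, max_edge=2.5))

fig, ax = plt.subplots()
cont = ax.tricontourf(triang, z, 15)
ax.triplot(triang, color='0.7', lw=0.5)  # shows which triangles were kept
ax.scatter(x, y, color='k')  # the original data points
fig.colorbar(cont, ax=ax)
plt.show()
```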
I'm trying to make a scatterplot of two arrays/lists, one of which is the x coordinate and the other the y. I'm not having any trouble with that. However, I need to color-code these points based on their values at a specific point in time, based on data which I have in a 2d array. Also, this 2d array of data has a very large spread, so I'd like to color the points logarithmically (I'm not sure if this means just change the color bar labels or if there's a more fundamental difference.)
Here is my code so far:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(1)
time = 0  # I'd like to specify the time (row index) here.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
multi_array = [[1, 1, 10, 100, 1000], [10000, 1000, 100, 10, 1], [300, 400, 5000, 12, 47]]
for counter in np.arange(0, 5):
    t = multi_array[time, counter]  # I tried this, and it did not work.
    s = plt.scatter(x[counter], y[counter], c = t, marker = 's')
I followed the advice I saw elsewhere to color by a third variable, which was to set the color equal to that variable, but then when I tried that with my data set, I just got all the points as one color, and then when I try it with this mockup it gives me the following error:
TypeError: list indices must be integers, not tuple
Could someone please help me color my points the way I need to?
If I understand the question (which I'm not at all sure of), here is the answer:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(1)
time = 2 #I'd like to specify time here.
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
multi_array = np.asarray([[1, 1, 10, 100, 1000], [10000, 1000, 100, 10, 1], [300, 400, 5000, 12, 47]])
log_array = np.log10(multi_array)  # log of the data (base 10 assumed)
s = plt.scatter(x, y, c=log_array[time], marker = 's',s=100)
cb = plt.colorbar(s)
cb.set_label('log of ...')
After some tinkering, and using information learned from user4421975's answer and the link in the comments, I've puzzled it out. In short, I used the norm parameter of plt.scatter to make the color mapping logarithmic.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
fig = plt.figure(1)
time = 2
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
multi_array = np.asarray([[1, 1, 10, 100, 1000], [10000, 1000, 100, 10, 1], [300, 400, 5000, 12, 47]])
for counter in np.arange(0, 5):
    s = plt.scatter(x[counter], y[counter], c = multi_array[time, counter], cmap = 'winter', norm = LogNorm(vmin=multi_array[time].min(), vmax=multi_array[time].max()), marker = 's', )
cb = plt.colorbar(s)
cb.set_label('Log of Data')
plt.show()
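The loop is not strictly needed: scatter accepts the whole row of values at once through c. A sketch of the same plot with a single call (the row index and marker size are example choices). As for what "coloring logarithmically" means: with LogNorm the colors are spaced logarithmically while the colorbar stays labeled in the original units, whereas passing np.log10 of the data with the default linear norm gives the same colors but a colorbar labeled in exponents.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
multi_array = np.asarray([[1, 1, 10, 100, 1000],
                          [10000, 1000, 100, 10, 1],
                          [300, 400, 5000, 12, 47]])
time = 2  # row of multi_array to color by

row = multi_array[time]
fig, ax = plt.subplots()
# one scatter call: c takes the whole row, LogNorm maps it to colors on a log scale
s = ax.scatter(x, y, c=row, cmap='winter', marker='s', s=100,
               norm=LogNorm(vmin=row.min(), vmax=row.max()))
cb = fig.colorbar(s, ax=ax)
cb.set_label('Data (log scale)')
plt.show()
```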