I have a continuous input function which I would like to discretize into lets say 5-10 discrete bins between 1 and 0. Right now I am using np.digitize and rescale the output bins to 0-1. Now the problem is that sometime datasets (blue line) yield results like this:
I tried pushing up the number of discretization bins but I ended up keeping the same noise and getting just more increments. As an example where the algorithm worked with the same settings but another dataset:
this is the code I used there NumOfDisc = number of bins
intervals = np.linspace(0,1,NumOfDisc)
discretized_Array = np.digitize(Continuous_Array, intervals)
The red ilne in the graph is not important. The continuous blue line is the on I try to discretize and the green line is the discretized result.The Graphs are created with matplotlyib.pyplot using the following code:
def CheckPlots(discretized_Array, Continuous_Array, Temperature, time, PlotName)
logging.info("Plotting...")
#Setting Axis properties and titles
fig, ax = plt.subplots(1, 1)
ax.set_title(PlotName)
ax.set_ylabel('Temperature [°C]')
ax.set_ylim(40, 110)
ax.set_xlabel('Time [s]')
ax.grid(b=True, which="both")
ax2=ax.twinx()
ax2.set_ylabel('DC Power [%]')
ax2.set_ylim(-1.5,3.5)
#Plotting stuff
ax.plot(time, Temperature, label= "Input Temperature", color = '#c70e04')
ax2.plot(time, Continuous_Array, label= "Continuous Power", color = '#040ec7')
ax2.plot(time, discretized_Array, label= "Discrete Power", color = '#539600')
fig.legend(loc = "upper left", bbox_to_anchor=(0,1), bbox_transform=ax.transAxes)
logging.info("Done!")
logging.info("---")
return
Any Ideas what I could do to get sensible discretizations like in the second case?
The following solution gives the exact result you need.
Basically, the algorithm finds an ideal line, and attempts to replicate it as well as it can with less datapoints. It starts with 2 points at the edges (straight line), then adds one in the center, then checks which side has the greatest error, and adds a point in the center of that, and so on, until it reaches the desired bin count. Simple :)
import warnings
warnings.simplefilter('ignore', np.RankWarning)
def line_error(x0, y0, x1, y1, ideal_line, integral_points=100):
"""Assume a straight line between (x0,y0)->(x1,p1). Then sample the perfect line multiple times and compute the distance."""
straight_line = np.poly1d(np.polyfit([x0, x1], [y0, y1], 1))
xs = np.linspace(x0, x1, num=integral_points)
ys = straight_line(xs)
perfect_ys = ideal_line(xs)
err = np.abs(ys - perfect_ys).sum() / integral_points * (x1 - x0) # Remove (x1 - x0) to only look at avg errors
return err
def discretize_bisect(xs, ys, bin_count):
"""Returns xs and ys of discrete points"""
# For a large number of datapoints, without loss of generality you can treat xs and ys as bin edges
# If it gives bad results, you can edges in many ways, e.g. with np.polyline or np.histogram_bin_edges
ideal_line = np.poly1d(np.polyfit(xs, ys, 50))
new_xs = [xs[0], xs[-1]]
new_ys = [ys[0], ys[-1]]
while len(new_xs) < bin_count:
errors = []
for i in range(len(new_xs)-1):
err = line_error(new_xs[i], new_ys[i], new_xs[i+1], new_ys[i+1], ideal_line)
errors.append(err)
max_segment_id = np.argmax(errors)
new_x = (new_xs[max_segment_id] + new_xs[max_segment_id+1]) / 2
new_y = ideal_line(new_x)
new_xs.insert(max_segment_id+1, new_x)
new_ys.insert(max_segment_id+1, new_y)
return new_xs, new_ys
BIN_COUNT = 25
new_xs, new_ys = discretize_bisect(xs, ys, BIN_COUNT)
plot_graph(xs, ys, new_xs, new_ys, f"Discretized and Continuous comparison, N(cont) = {N_MOCK}, N(disc) = {BIN_COUNT}")
print("Bin count:", len(new_xs))
Moreover, here's my simplified plotting function I tested with.
def plot_graph(cont_time, cont_array, disc_time, disc_array, plot_name):
"""A simplified version of the provided plotting function"""
# Setting Axis properties and titles
fig, ax = plt.subplots(figsize=(20, 4))
ax.set_title(plot_name)
ax.set_xlabel('Time [s]')
ax.set_ylabel('DC Power [%]')
# Plotting stuff
ax.plot(cont_time, cont_array, label="Continuous Power", color='#0000ff')
ax.plot(disc_time, disc_array, label="Discrete Power", color='#00ff00')
fig.legend(loc="upper left", bbox_to_anchor=(0,1), bbox_transform=ax.transAxes)
Lastly, here's the Google Colab
If what I described in the comments is the problem, there are a few options to deal with this:
Do nothing: Depending on the reason you're discretizing, you might want the discrete values to reflect the continuous values accurately
Change the bins: you could shift the bins or change the number of bins, such that relatively 'flat' parts of the blue line stay within one bin, thus giving a flat green line in these parts as well, which would be visually more pleasing like in your second plot.
Related
I have data with lots of x values around zero and only a few as you go up to around 950,
I want to create a plot with a non-linear x axis so that the relationship can be seen in a 'straight line' form. Like seen in this example,
I have tried using plt.xscale('log') but it does not achieve what I want.
I have not been able to use the log scale function with a scatter plot as it then only shows 3 values rather than the thousands that exist.
I have tried to work around it using
plt.plot(retper, aep_NW[y], marker='o', linewidth=0)
to replicate the scatter function which plots but does not show what I want.
plt.figure(1)
plt.scatter(rp,aep,label="SSI sum")
plt.show()
Image 3:
plt.figure(3)
plt.scatter(rp, aep)
plt.xscale('log')
plt.show()
Image 4:
plt.figure(4)
plt.plot(rp, aep, marker='o', linewidth=0)
plt.xscale('log')
plt.show()
ADDITION:
Hi thank you for the response.
I think you are right that my x axis is truncated but I'm not sure why or how...
I'm not really sure what to post code wise as the data is all large and coming from a server so can't really give you the data to see it with.
Basically aep_NW is a one dimensional array with 951 elements, values from 0-~140, with most values being small and only a few larger values. The data represents a storm severity index for 951 years.
Then I want the x axis to be the return period for these values, so basically I made a rp array, of the same size, which is given values from 951 down decreasing my a half each time.
I then sort the aep_NW values from lowest to highest with the highest value being associated with the largest return value (951), then the second highest aep_NW value associated with the second largest return period value (475.5) ect.
So then when I plot it I need the x axis scale to be similar to the example you showed above or the first image I attatched originally.
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
rp[i] = rp[i+1]/2
i = i - 1
y = np.argsort(aep_NW)
fig, ax = plt.subplots()
ax.scatter(rp,aep_NW[y],label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
plt.title("AEP for NW Europe: total loss per entire extended winter season")
plt.show()
It looks like in your "Image 3" the x axis is truncated, so that you don't see the data you are interested in. It appears this is due to there being 0's in your 'rp' array. I updated the examples to show the error you are seeing, one way to exclude the zeros, and one way to clip them and show them on a different scale.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
n = 100
numseas = np.logspace(-5, 3, n)
aep_NW = np.linspace(0, 140, n)
rp = [0]*numseas.shape[0]
i = numseas.shape[0] - 1
rp[i] = numseas.shape[0]
i = i - 1
while i != 0:
rp[i] = rp[i+1] /2
i = i - 1
y = np.argsort(aep_NW)
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
ax = axes[0]
ax.scatter(rp, aep_NW[y], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period")
ax.set_ylabel("SSI score")
ax = axes[1]
rp = np.array(rp)[y]
mask = rp > 0
ax.scatter(rp[mask], aep_NW[y][mask], label="SSI sum")
ax.set_xscale('log')
ax.set_xlabel("Return period (0 values excluded)")
ax = axes[2]
log2_clipped_rp = np.log2(rp.clip(2**-100, None))[y]
ax.scatter(log2_clipped_rp, aep_NW[y], label="SSI sum")
xticks = list(range(-110, 11, 20))
xticklabels = [f'$2^{{{i}}}$' for i in xticks]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
ax.set_xlabel("log$_2$ Return period (values clipped to 2$^{-100}$)")
plt.show()
How to use matplotlib or pyqtgraph draw plot like this:
Line AB is a two-directions street, green part represents the direction from point A to point B, red part represents B to A, width of each part represents the traffic volume. Widths are measured in point, will not changed at different zoom levels or dpi settings.
This is only an example, in fact I have hunderds of streets. This kind of plot is very common in many traffic softwares. I tried to use matplotlib's patheffect but result is frustrated:
from matplotlib import pyplot as plt
import matplotlib.patheffects as path_effects
x=[0,1,2,3]
y=[1,0,0,-1]
ab_width=20
ba_width=30
fig, axes= plt.subplots(1,1)
center_line, = axes.plot(x,y,color='k',linewidth=2)
center_line.set_path_effects(
[path_effects.SimpleLineShadow(offset=(0, -ab_width/2),shadow_color='g', alpha=1, linewidth=ab_width),
path_effects.SimpleLineShadow(offset=(0, ba_width/2), shadow_color='r', alpha=1, linewidth=ba_width),
path_effects.SimpleLineShadow(offset=(0, -ab_width), shadow_color='k', alpha=1, linewidth=2),
path_effects.SimpleLineShadow(offset=(0, ba_width), shadow_color='k', alpha=1, linewidth=2),
path_effects.Normal()])
axes.set_xlim(-1,4)
axes.set_ylim(-1.5,1.5)
One idea came to me is to take each part of the line as a standalone line, and recalculate it's position when changing zoom level, but it's too complicated and slow.
If there any easy way to use matplotlib or pyqtgraph draw what I want? Any suggestion will be appreciated!
If you can have each independent line, this can be done easily with the fill_between function.
from matplotlib import pyplot as plt
import numpy as np
x=np.array([0,1,2,3])
y=np.array([1,0,0,-1])
y1width=-1
y2width=3
y1 = y + y1width
y2 = y + y2width
fig = plt.figure()
ax = fig.add_subplot(111)
plt.plot(x,y, 'k', x,y1, 'k',x,y2, 'k',linewidth=2)
ax.fill_between(x, y1, y, color='g')
ax.fill_between(x, y2, y, color='r')
plt.xlim(-1,4)
plt.ylim(-3,6)
plt.show()
Here I considered the center line as the reference (thus the negative y1width), but could be done differently. The result is then:
If the lines are 'complicated', eventually intersecting at some point, then the keyword argument interpolate=True must be used to fill the crossover regions properly. Another interesting argument probably useful for your use case is where, to condition the region, for instance, where=y1 < 0. For more information you can check out the documentation.
One way of solving your issue is using filled polygons, some linear algebra and some calculus. The main idea is to draw a polygon along your x and y coordinates and along shifted coordinates to close and fill the polygon.
These are my results:
And here is the code:
from __future__ import division
import numpy
from matplotlib import pyplot, patches
def road(x, y, w, scale=0.005, **kwargs):
# Makes sure input coordinates are arrays.
x, y = numpy.asarray(x, dtype=float), numpy.asarray(y, dtype=float)
# Calculate derivative.
dx = x[2:] - x[:-2]
dy = y[2:] - y[:-2]
dy_dx = numpy.concatenate([
[(y[1] - y[0]) / (x[1] - x[0])],
dy / dx,
[(y[-1] - y[-2]) / (x[-1] - x[-2])]
])
# Offsets the input coordinates according to the local derivative.
offset = -dy_dx + 1j
offset = w * scale * offset / abs(offset)
y_offset = y + w * scale
#
AB = zip(
numpy.concatenate([x + offset.real, x[::-1]]),
numpy.concatenate([y + offset.imag, y[::-1]]),
)
p = patches.Polygon(AB, **kwargs)
# Returns polygon.
return p
if __name__ == '__main__':
# Some plot initializations
pyplot.close('all')
pyplot.ion()
# This is the list of coordinates of each point
x = [0, 1, 2, 3, 4]
y = [1, 0, 0, -1, 0]
# Creates figure and axes.
fig, ax = pyplot.subplots(1,1)
ax.axis('equal')
center_line, = ax.plot(x, y, color='k', linewidth=2)
AB = road(x, y, 20, color='g')
BA = road(x, y, -30, color='r')
ax.add_patch(AB)
ax.add_patch(BA)
The first step in calculating how to offset each data point is by calculating the discrete derivative dy / dx. I like to use complex notation to handle vectors in Python, i.e. A = 1 - 1j. This makes life easier for some mathematical operations.
The next step is to remember that the derivative gives the tangent to the curve and from linear algebra that the normal to the tangent is n=-dy_dx + 1j, using complex notation.
The final step in determining the offset coordinates is to ensure that the normal vector has unity size n_norm = n / abs(n) and multiply by the desired width of the polygon.
Now that we have all the coordinates for the points in the polygon, the rest is quite straightforward. Use patches.Polygon and add them to the plot.
This code allows you also to define if you want the patch on top of your route or below it. Just give a positive or negative value for the width. If you want to change the width of the polygon depending on your zoom level and/or resolution, you adjust the scale parameter. It also gives you freedom to add additional parameters to the patches such as fill patterns, transparency, etc.
I need to plot a plot a normalized histogram (by normalized I mean divided by a fixed value) using the histtype='step' style.
The issue is that plot.bar() doesn't seem to support that style and if I use instead plot.hist() which does, I can't (or at least don't know how) plot the normalized histogram.
Here's a MWE of what I mean:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
return np.random.uniform(low=10., high=20., size=(200,))
# Generate data.
x1 = rand_data()
# Define histogram params.
binwidth = 0.25
x_min, x_max = x1.min(), x1.max()
bin_n = np.arange(int(x_min), int(x_max + binwidth), binwidth)
# Obtain histogram.
hist1, edges1 = np.histogram(x1, bins=bin_n)
# Normalization parameter.
param = 5.
# Plot histogram normalized by the parameter defined above.
plt.ylim(0, 3)
plt.bar(edges1[:-1], hist1 / param, width=binwidth, color='none', edgecolor='r')
plt.show()
(notice the normalization: hist1 / param) which produces this:
I can generate a histtype='step' histogram using:
plt.hist(x1, bins=bin_n, histtype='step', color='r')
and get:
but then it wouldn't be normalized by the param value.
The step plot will generate the appearance that you want from a set of bins and the count (or normalized count) in those bins. Here I've used plt.hist to get the counts, then plot them, with the counts normalized. It's necessary to duplicate the first entry in order to get it to actually have a line there.
(a,b,c) = plt.hist(x1, bins=bin_n, histtype='step', color='r')
a = np.append(a[0],a[:])
plt.close()
step(b,a/param,color='r')
This is not quite right, because it doesn't finish the plot correctly. the end of the line is hanging in free space rather than dropping down the x axis.
you can fix that by adding a 0 to the end of 'a' and one more bin point to b
a=np.append(a[:],0)
b=np.append(b,(2*b[-1]-b[-2]))
step(b,a/param,color='r')
lastly, the ax.step mentioned would be used if you had used
fig, ax = plt.subplots()
to give you access to the figure and axis directly. For examples, see http://matplotlib.org/examples/ticks_and_spines/spines_demo_bounds.html
Based on tcaswell's comment (use step) I've developed my own answer. Notice that I need to add elements to both the x (one zero element at the beginning of the array) and y arrays (one zero element at the beginning and another at the end of the array) so that step will plot the vertical lines at the beginning and the end of the bars.
Here's the code:
import matplotlib.pyplot as plt
import numpy as np
def rand_data():
return np.random.uniform(low=10., high=20., size=(5000,))
# Generate data.
x1 = rand_data()
# Define histogram params.
binwidth = 0.25
x_min, x_max = x1.min(), x1.max()
bin_n = np.arange(int(x_min), int(x_max + binwidth), binwidth)
# Obtain histogram.
hist1, edges1 = np.histogram(x1, bins=bin_n)
# Normalization parameter.
param = 5.
# Create arrays adding elements so plt.bar will plot the first and last
# vertical bars.
x2 = np.concatenate((np.array([0.]), edges1))
y2 = np.concatenate((np.array([0.]), (hist1 / param), np.array([0.])))
# Plot histogram normalized by the parameter defined above.
plt.xlim(min(edges1) - (min(edges1) / 10.), max(edges1) + (min(edges1) / 10.))
plt.bar(x2, y2, width=binwidth, color='none', edgecolor='b')
plt.step(x2, y2, where='post', color='r', ls='--')
plt.show()
and here's the result:
The red lines generated by step are equal to those blue lines generated by bar as can be seen.
My question is a bit similar to this question that draws line with width given in data coordinates. What makes my question a bit more challenging is that unlike the linked question, the segment that I wish to expand is of a random orientation.
Let's say if the line segment goes from (0, 10) to (10, 10), and I wish to expand it to a width of 6. Then it is simply
x = [0, 10]
y = [10, 10]
ax.fill_between(x, y - 3, y + 3)
However, my line segment is of random orientation. That is, it is not necessarily along x-axis or y-axis. It has a certain slope.
A line segment s is defined as a list of its starting and ending points: [(x1, y1), (x2, y2)].
Now I wish to expand the line segment to a certain width w. The solution is expected to work for a line segment in any orientation. How to do this?
plt.plot(x, y, linewidth=6.0) cannot do the trick, because I want my width to be in the same unit as my data.
The following code is a generic example on how to make a line plot in matplotlib using data coordinates as linewidth. There are two solutions; one using callbacks, one using subclassing Line2D.
Using callbacks.
It is implemted as a class data_linewidth_plot that can be called with a signature pretty close the the normal plt.plot command,
l = data_linewidth_plot(x, y, ax=ax, label='some line', linewidth=1, alpha=0.4)
where ax is the axes to plot to. The ax argument can be omitted, when only one subplot exists in the figure. The linewidth argument is interpreted in (y-)data units.
Further features:
It's independend on the subplot placements, margins or figure size.
If the aspect ratio is unequal, it uses y data coordinates as the linewidth.
It also takes care that the legend handle is correctly set (we may want to have a huge line in the plot, but certainly not in the legend).
It is compatible with changes to the figure size, zoom or pan events, as it takes care of resizing the linewidth on such events.
Here is the complete code.
import matplotlib.pyplot as plt
class data_linewidth_plot():
def __init__(self, x, y, **kwargs):
self.ax = kwargs.pop("ax", plt.gca())
self.fig = self.ax.get_figure()
self.lw_data = kwargs.pop("linewidth", 1)
self.lw = 1
self.fig.canvas.draw()
self.ppd = 72./self.fig.dpi
self.trans = self.ax.transData.transform
self.linehandle, = self.ax.plot([],[],**kwargs)
if "label" in kwargs: kwargs.pop("label")
self.line, = self.ax.plot(x, y, **kwargs)
self.line.set_color(self.linehandle.get_color())
self._resize()
self.cid = self.fig.canvas.mpl_connect('draw_event', self._resize)
def _resize(self, event=None):
lw = ((self.trans((1, self.lw_data))-self.trans((0, 0)))*self.ppd)[1]
if lw != self.lw:
self.line.set_linewidth(lw)
self.lw = lw
self._redraw_later()
def _redraw_later(self):
self.timer = self.fig.canvas.new_timer(interval=10)
self.timer.single_shot = True
self.timer.add_callback(lambda : self.fig.canvas.draw_idle())
self.timer.start()
fig1, ax1 = plt.subplots()
#ax.set_aspect('equal') #<-not necessary
ax1.set_ylim(0,3)
x = [0,1,2,3]
y = [1,1,2,2]
# plot a line, with 'linewidth' in (y-)data coordinates.
l = data_linewidth_plot(x, y, ax=ax1, label='some 1 data unit wide line',
linewidth=1, alpha=0.4)
plt.legend() # <- legend possible
plt.show()
(I updated the code to use a timer to redraw the canvas, due to this issue)
Subclassing Line2D
The above solution has some drawbacks. It requires a timer and callbacks to update itself on changing axis limits or figure size. The following is a solution without such needs. It will use a dynamic property to always calculate the linewidth in points from the desired linewidth in data coordinates on the fly. It is much shorter than the above.
A drawback here is that a legend needs to be created manually via a proxyartist.
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
class LineDataUnits(Line2D):
def __init__(self, *args, **kwargs):
_lw_data = kwargs.pop("linewidth", 1)
super().__init__(*args, **kwargs)
self._lw_data = _lw_data
def _get_lw(self):
if self.axes is not None:
ppd = 72./self.axes.figure.dpi
trans = self.axes.transData.transform
return ((trans((1, self._lw_data))-trans((0, 0)))*ppd)[1]
else:
return 1
def _set_lw(self, lw):
self._lw_data = lw
_linewidth = property(_get_lw, _set_lw)
fig, ax = plt.subplots()
#ax.set_aspect('equal') # <-not necessary, if not given, y data is assumed
ax.set_xlim(0,3)
ax.set_ylim(0,3)
x = [0,1,2,3]
y = [1,1,2,2]
line = LineDataUnits(x, y, linewidth=1, alpha=0.4)
ax.add_line(line)
ax.legend([Line2D([],[], linewidth=3, alpha=0.4)],
['some 1 data unit wide line']) # <- legend possible via proxy artist
plt.show()
Just to add to the previous answer (can't comment yet), here's a function that automates this process without the need for equal axes or the heuristic value of 0.8 for labels. The data limits and size of the axis need to be fixed and not changed after this function is called.
def linewidth_from_data_units(linewidth, axis, reference='y'):
"""
Convert a linewidth in data units to linewidth in points.
Parameters
----------
linewidth: float
Linewidth in data units of the respective reference-axis
axis: matplotlib axis
The axis which is used to extract the relevant transformation
data (data limits and size must not change afterwards)
reference: string
The axis that is taken as a reference for the data width.
Possible values: 'x' and 'y'. Defaults to 'y'.
Returns
-------
linewidth: float
Linewidth in points
"""
fig = axis.get_figure()
if reference == 'x':
length = fig.bbox_inches.width * axis.get_position().width
value_range = np.diff(axis.get_xlim())
elif reference == 'y':
length = fig.bbox_inches.height * axis.get_position().height
value_range = np.diff(axis.get_ylim())
# Convert length to points
length *= 72
# Scale linewidth to value range
return linewidth * (length / value_range)
Explanation:
Set up the figure with a known height and make the scale of the two axes equal (or else the idea of "data coordinates" does not apply). Make sure the proportions of the figure match the expected proportions of the x and y axes.
Compute the height of the whole figure point_hei (including margins) in units of points by multiplying inches by 72
Manually assign the y-axis range yrange (You could do this by plotting a "dummy" series first and then querying the plot axis to get the lower and upper y limits.)
Provide the width of the line that you would like in data units linewid
Calculate what those units would be in points pointlinewid while adjusting for the margins. In a single-frame plot, the plot is 80% of the full image height.
Plot the lines, using a capstyle that does not pad the ends of the line (has a big effect at these large line sizes)
Voilà? (Note: this should generate the proper image in the saved file, but no guarantees if you resize a plot window.)
import matplotlib.pyplot as plt
rez=600
wid=8.0 # Must be proportional to x and y limits below
hei=6.0
fig = plt.figure(1, figsize=(wid, hei))
sp = fig.add_subplot(111)
# # plt.figure.tight_layout()
# fig.set_autoscaley_on(False)
sp.set_xlim([0,4000])
sp.set_ylim([0,3000])
plt.axes().set_aspect('equal')
# line is in points: 72 points per inch
point_hei=hei*72
xval=[100,1300,2200,3000,3900]
yval=[10,200,2500,1750,1750]
x1,x2,y1,y2 = plt.axis()
yrange = y2 - y1
# print yrange
linewid = 500 # in data units
# For the calculation below, you have to adjust width by 0.8
# because the top and bottom 10% of the figure are labels & axis
pointlinewid = (linewid * (point_hei/yrange)) * 0.8 # corresponding width in pts
plt.plot(xval,yval,linewidth = pointlinewid,color="blue",solid_capstyle="butt")
# just for fun, plot the half-width line on top of it
plt.plot(xval,yval,linewidth = pointlinewid/2,color="red",solid_capstyle="butt")
plt.savefig('mymatplot2.png',dpi=rez)
I have a complicated curve defined as a set of points in a table like so (the full table is here):
# x y
1.0577 12.0914
1.0501 11.9946
1.0465 11.9338
...
If I plot this table with the commands:
plt.plot(x_data, y_data, c='b',lw=1.)
plt.scatter(x_data, y_data, marker='o', color='k', s=10, lw=0.2)
I get the following:
where I've added the red points and segments manually. What I need is a way to calculate those segments for each of those points, that is: a way to find the minimum distance from a given point in this 2D space to the interpolated curve.
I can't use the distance to the data points themselves (the black dots that generate the blue curve) since they are not located at equal intervals, sometimes they are close and sometimes they are far apart and this deeply affects my results further down the line.
Since this is not a well behaved curve I'm not really sure what I could do. I've tried interpolating it with a UnivariateSpline but it returns a very poor fit:
# Sort data according to x.
temp_data = zip(x_data, y_data)
temp_data.sort()
# Unpack sorted data.
x_sorted, y_sorted = zip(*temp_data)
# Generate univariate spline.
s = UnivariateSpline(x_sorted, y_sorted, k=5)
xspl = np.linspace(0.8, 1.1, 100)
yspl = s(xspl)
# Plot.
plt.scatter(xspl, yspl, marker='o', color='r', s=10, lw=0.2)
I also tried increasing the number of interpolating points but got a mess:
# Sort data according to x.
temp_data = zip(x_data, y_data)
temp_data.sort()
# Unpack sorted data.
x_sorted, y_sorted = zip(*temp_data)
t = np.linspace(0, 1, len(x_sorted))
t2 = np.linspace(0, 1, 100)
# One-dimensional linear interpolation.
x2 = np.interp(t2, t, x_sorted)
y2 = np.interp(t2, t, y_sorted)
plt.scatter(x2, y2, marker='o', color='r', s=10, lw=0.2)
Any ideas/pointers will be greatly appreciated.
If you're open to using a library for this, have a look at shapely: https://github.com/Toblerity/Shapely
As a quick example (points.txt contains the data you linked to in your question):
import shapely.geometry as geom
import numpy as np
coords = np.loadtxt('points.txt')
line = geom.LineString(coords)
point = geom.Point(0.8, 10.5)
# Note that "line.distance(point)" would be identical
print(point.distance(line))
As an interactive example (this also draws the line segments you wanted):
import numpy as np
import shapely.geometry as geom
import matplotlib.pyplot as plt
class NearestPoint(object):
def __init__(self, line, ax):
self.line = line
self.ax = ax
ax.figure.canvas.mpl_connect('button_press_event', self)
def __call__(self, event):
x, y = event.xdata, event.ydata
point = geom.Point(x, y)
distance = self.line.distance(point)
self.draw_segment(point)
print 'Distance to line:', distance
def draw_segment(self, point):
point_on_line = line.interpolate(line.project(point))
self.ax.plot([point.x, point_on_line.x], [point.y, point_on_line.y],
color='red', marker='o', scalex=False, scaley=False)
fig.canvas.draw()
if __name__ == '__main__':
coords = np.loadtxt('points.txt')
line = geom.LineString(coords)
fig, ax = plt.subplots()
ax.plot(*coords.T)
ax.axis('equal')
NearestPoint(line, ax)
plt.show()
Note that I've added ax.axis('equal'). shapely operates in the coordinate system that the data is in. Without the equal axis plot, the view will be distorted, and while shapely will still find the nearest point, it won't look quite right in the display:
The curve is by nature parametric, i.e. for each x there isn't necessary a unique y and vice versa. So you shouldn't interpolate a function of the form y(x) or x(y). Instead, you should do two interpolations, x(t) and y(t) where t is, say, the index of the corresponding point.
Then you use scipy.optimize.fminbound to find the optimal t such that (x(t) - x0)^2 + (y(t) - y0)^2 is the smallest, where (x0, y0) are the red dots in your first figure. For fminsearch, you could specify the min/max bound for t to be 1 and len(x_data)
You could try implementing a calculation of distance from point to line on incremental pairs of points on the curve and finding that minimum. This will introduce a small bit of error from the curve as drawn, but it should be very small, as the points are relatively close together.
http://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line
You can easily use the package trjtrypy in PyPI: https://pypi.org/project/trjtrypy/
All needed computations and visualizations are available in this package. You can get your answer within a line of code like:
to get the minimum distance use: trjtrypy.basedists.distance(points, curve)
to visualize the curve and points use: trjtrypy.visualizations.draw_landmarks_trajectory(points, curve)