I have some data of a particle moving in a corridor with periodic boundary conditions, i.e. the particle wraps back to the start when it crosses a boundary. Plotting it gives a zig-zag trajectory.
I would like to know how to prevent plot() from connecting the points where the particle wraps back to the start. Something like the upper part of the picture, but without the "." markers.
The first idea I had was to find the index where the numpy array a[:-1]-a[1:] becomes positive and then plot from 0 to that index. But how would I get the index of the first occurrence of a positive element of a[:-1]-a[1:]?
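For that first-occurrence lookup specifically, a minimal sketch (assuming a is the 1-D position array; note there may be no positive element at all):

import numpy as np

d = a[:-1] - a[1:]                        # positive exactly at the wrap points
jumps = np.nonzero(d > 0)[0]              # indices just before each wrap
first = jumps[0] if jumps.size else None  # first occurrence, or None if no wrap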
Maybe there are some other ideas.
I'd take a different approach. First, I wouldn't determine the jump points by looking at the sign of the derivative, since the movement might go up or down, or even have some periodicity in it. Instead, I'd look for the points with the biggest derivative.
Second, an elegant way to get breaks in a plot line is to mask one value at each jump; matplotlib then draws the segments automatically. My code is:
import matplotlib.pyplot as plt
import numpy as np

xs = np.linspace(0., 100., 1000)  # num must be an int
data = (xs*0.03 + np.sin(xs) * 0.1) % 1

plt.subplot(2,1,1)
plt.plot(xs, data, "r-")

# Make a masked array with the jump points masked
abs_d_data = np.abs(np.diff(data))
mask = np.hstack([abs_d_data > abs_d_data.mean() + 3*abs_d_data.std(), [False]])
masked_data = np.ma.MaskedArray(data, mask)

plt.subplot(2,1,2)
plt.plot(xs, masked_data, "b-")

plt.show()
And it gives us this result:
The disadvantage of course is that you lose one point at each break - but with the sampling rate you seem to have, I guess you can trade that for simpler code.
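An alternative with the same effect, in case masked arrays feel heavyweight: matplotlib also breaks lines at NaN values, so you can set the jump points to np.nan instead. A minimal sketch, reusing the sample data above:

import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(0., 100., 1000)
data = (xs*0.03 + np.sin(xs) * 0.1) % 1

nan_data = data.copy()
abs_d = np.abs(np.diff(data))
nan_data[:-1][abs_d > abs_d.mean() + 3*abs_d.std()] = np.nan  # NaN at each jump

plt.plot(xs, nan_data, "b-")
plt.show()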
To find where the particle has crossed the upper boundary, you can do something like this:
>>> import numpy as np
>>> a = np.linspace(0, 10, 50) % 5 # some sample data
>>> np.nonzero(np.diff(a) < 0)[0] + 1
array([25, 49])
>>> a[24:27]
array([ 4.89795918, 0.10204082, 0.30612245])
>>> a[48:]
array([ 4.79591837, 0. ])
np.diff(a) calculates the discrete difference of a, while np.nonzero finds where the condition np.diff(a) < 0 holds, i.e., where the difference is negative because the particle has jumped back down.
To avoid the connecting line you will have to plot by segments.
Here's a quick way to plot by segments, breaking the plot wherever a jumps back down:
import numpy as np
import matplotlib.pyplot as plt

a = np.linspace(0, 20, 50) % 5  # similar to Micheal's sample data
x = np.arange(50)               # x scale
indices = np.where(np.diff(a) < 0)[0] + 1  # the same as Micheal's np.nonzero

start = 0
for i in np.append(indices, len(a)):  # append len(a) so the last segment is drawn too
    plt.plot(x[start:i], a[start:i], 'b-')
    start = i
plt.show()
Based on Thorsten Kranz's answer, here is a version which adds points to the original data where 'y' crosses the period. This is important if the density of data points isn't very high, e.g. np.linspace(0., 100., 100) vs. the original np.linspace(0., 100., 1000). The x positions of the curve transitions are linearly interpolated. Wrapped up in a function, it's:
import numpy as np

# note: np.unwrap's period argument requires NumPy >= 1.21
def periodic2plot(x, y, period=np.pi*2.):
    indexes = np.argwhere(np.abs(np.diff(y)) > .5*period).flatten()
    index_shift = 0
    for i in indexes:
        i += index_shift
        index_shift += 3  # in every loop it adds 3 elements
        if y[i] > .5*period:
            x_transit = np.interp(period, np.unwrap(y[i:i+2], period=period), x[i:i+2])
            add = np.ma.array([period, 0., 0.], mask=[0, 1, 0])
        else:
            # np.interp needs sorted xp, hence the [::-1] reversals below
            x_transit = np.interp(0, np.unwrap(y[i:i+2], period=period)[::-1], x[i:i+2][::-1])
            add = np.ma.array([0., 0., period], mask=[0, 1, 0])
        x_add = np.ma.array([x_transit]*3, mask=[0, 1, 0])
        x = np.ma.hstack((x[:i+1], x_add, x[i+1:]))
        y = np.ma.hstack((y[:i+1], add, y[i+1:]))
    return x, y
The code below compares against Thorsten Kranz's original answer at a lower data-point density:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0., 100., 100)
y = (x*0.03 + np.sin(x) * 0.1) % 1

# Thorsten Kranz: make a masked array with the jump points masked
abs_d_data = np.abs(np.diff(y))
mask = np.hstack([abs_d_data > .5, [False]])
masked_y = np.ma.MaskedArray(y, mask)

# Plot
plt.figure()
plt.plot(*periodic2plot(x, y, period=1), label='This answer')
plt.plot(x, masked_y, label='Thorsten Kranz')
plt.autoscale(enable=True, axis='both', tight=True)
plt.legend(loc=1)
plt.tight_layout()
plt.show()
I want to draw a volume in x1,x2,x3-space. The volume is an isosurface found by the marching cubes algorithm in skimage. The function generating the volume is pdf_grid = f(x1,x2,x3), and I want to draw the volume where pdf = 60% max(pdf).
My issue is that the marching cubes algorithm generates vertices and faces, but how do I map those back to x1,x2,x3-space?
My (rather limited) understanding of marching cubes is that "vertices" refers to the indices in the volume (pdf_grid in my case). If "vertices" contained only the exact indices in the grid, this would have been easy, but "vertices" contains floats, not integers. It seems like marching cubes does some interpolation between grid points (according to https://www.cs.carleton.edu/cs_comps/0405/shape/marching_cubes.html), so the question is how to recover exactly the values of x1,x2,x3?
import numpy as np
import scipy.stats
import scipy.optimize
import matplotlib.pyplot as plt

# Make some random data
cov = np.array([[ 1.,  .2, -.5],
                [ .2, 1.2,  .1],
                [-.5,  .1,  .8]])
dist = scipy.stats.multivariate_normal(mean=[1., 3., 2.], cov=cov)
N = 500
x_samples = dist.rvs(size=N).T

# Create the kernel density estimator - approximation of a pdf
kernel = scipy.stats.gaussian_kde(x_samples)
x_mean = x_samples.mean(axis=1)

# Find the mode
res = scipy.optimize.minimize(lambda x: -kernel.logpdf(x),
                              x_mean  # x0, initial guess
                              )
x_mode = res["x"]

num_el = 50  # number of elements in the grid
x_min = np.min(x_samples, axis=1)
x_max = np.max(x_samples, axis=1)
x1g, x2g, x3g = np.mgrid[x_min[0]:x_max[0]:num_el*1j,
                         x_min[1]:x_max[1]:num_el*1j,
                         x_min[2]:x_max[2]:num_el*1j]

pdf_grid = np.zeros(x1g.shape)  # implicit function/grid for the marching cubes
for a in range(x1g.shape[0]):
    for b in range(x1g.shape[1]):
        for c in range(x1g.shape[2]):
            pdf_grid[a, b, c] = kernel(np.array([x1g[a, b, c],
                                                 x2g[a, b, c],
                                                 x3g[a, b, c]]))

from mpl_toolkits.mplot3d.art3d import Poly3DCollection
from skimage import measure

iso_level = .6  # draw a volume which contains pdf_val(mode)*60%
verts, faces, normals, values = measure.marching_cubes(pdf_grid, kernel(x_mode)*iso_level)

# How to convert the figure back to x1,x2,x3 space? I just draw the output as it
# was done in the skimage example here
# https://scikit-image.org/docs/0.16.x/auto_examples/edges/plot_marching_cubes.html#sphx-glr-auto-examples-edges-plot-marching-cubes-py
# so you can see the volume.

# Fancy indexing: `verts[faces]` to generate a collection of triangles
mesh = Poly3DCollection(verts[faces],
                        alpha=.5,
                        label=f"KDE = {iso_level}"+r"$x_{mode}$",
                        linewidth=.1)
mesh.set_edgecolor('k')
fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
c1 = ax.add_collection3d(mesh)
c1._facecolors2d = c1._facecolor3d
c1._edgecolors2d = c1._edgecolor3d

# Plot the samples. The marching cubes volume does not capture these samples
pdf_val = kernel(x_samples)  # get density value for each point (for color-coding)
x1, x2, x3 = x_samples
scatter_plot = ax.scatter(x1, x2, x3, c=pdf_val, alpha=.2, label=r"samples")
ax.scatter(x_mode[0], x_mode[1], x_mode[2], c="r", alpha=.2, label=r"$x_{mode}$")
ax.set_xlabel(r"$x_1$")
ax.set_ylabel(r"$x_2$")
ax.set_zlabel(r"$x_3$")
# ax.set_box_aspect([np.ptp(i) for i in x_samples])  # equal aspect ratio
cbar = fig.colorbar(scatter_plot, ax=ax)
cbar.set_label(r"$KDE(w) \approx pdf(w)$")
ax.legend()

# Make the axis limits so that the volume and samples are shown.
ax.set_xlim(-5, np.max(verts, axis=0)[0] + 3)
ax.set_ylim(-5, np.max(verts, axis=0)[1] + 3)
ax.set_zlim(-5, np.max(verts, axis=0)[2] + 3)
This is probably far too late an answer to help the OP, but in case anyone else comes across this post looking for a solution to this problem: the issue stems from the marching cubes algorithm outputting the relevant vertices in array space. This space is defined by the number of elements per dimension of the mesh grid, and the marching cubes algorithm does indeed do some interpolation in this space (explaining the presence of floats).
Anyway, in order to transform the vertices back into x1,x2,x3 space you just need to scale and shift them by the appropriate quantities: the range divided by the number of grid intervals (num_el - 1) in each dimension, and the minimum value in each dimension, respectively. So, using the variables defined in the OP, the following gives the actual location of the vertices:
# a grid with num_el points per axis has num_el - 1 intervals
verts_actual = verts*((x_max - x_min)/(np.array(pdf_grid.shape) - 1)) + x_min
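Equivalently - a sketch, not from the original answer - skimage's marching_cubes accepts a spacing argument, so you can let it scale the vertices for you and only shift them afterwards:

# spacing is the physical step between neighbouring grid points: range/(num_el - 1)
spacing = (x_max - x_min) / (num_el - 1)
verts_s, faces_s, normals_s, values_s = measure.marching_cubes(
    pdf_grid, kernel(x_mode)*iso_level, spacing=tuple(spacing))
verts_actual = verts_s + x_min  # shift by the grid origin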
I have the following data set, where I have to estimate the joint density of 'bwt' and 'age' using kernel density estimation with a 2-dimensional Gaussian kernel and width h=5. I can't use modules such as scipy, which have ready-made functions for this, and have to build the functions to calculate the density myself. Here's what I've got so far.
import numpy as np
import pandas as pd

babies_full = pd.read_csv("https://www2.helsinki.fi/sites/default/files/atoms/files/babies2.txt", sep='\t')

# Getting the columns I need
babies_full1 = babies_full[['gestation', 'age']]
x = np.array(babies_full1, 'int')

# 2d Gaussian kernel
def k_2dgauss(x):
    return np.exp(-np.sum(x**2, 1)/2) / np.sqrt(2*np.pi)

# Multivariate kernel density
def mv_kernel_density(t, x, h):
    d = x.shape[1]
    return np.mean(k_2dgauss((t - x)/h))/h**d

t = np.linspace(1.0, 5.0, 50)
h = 5
print(mv_kernel_density(t, x, h))
However, I get a value error 'ValueError: operands could not be broadcast together with shapes (50,) (1173,2)', which I think is because of the different shapes of the arrays. I also don't understand why k_2dgauss(x) returns an array of zeros for me, since it should only return one value. In general, I am new to the concept of kernel density estimation and I don't really know if I've written the functions right, so any hints would help!
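For reference, the broadcasting error comes from subtracting the (1173, 2) data array from the (50,)-shaped t; the kernel has to be evaluated at 2-D query points instead. A minimal sketch of the shapes only - it does not fix the kernel itself, and the query point is hypothetical:

t = np.array([280.0, 27.0])        # one hypothetical (gestation, age) query point
print(mv_kernel_density(t, x, h))  # (t - x) now broadcasts to (1173, 2)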
Following on from my comments on your original post, I think this is what you want to do, but if not then come back to me and we can try again.
# info supplied by OP
import numpy as np
import pandas as pd

babies_full = \
    pd.read_csv("https://www2.helsinki.fi/sites/default/files/atoms/files/babies2.txt", sep='\t')

# Getting the columns I need
babies_full1 = babies_full[['gestation', 'age']]
x = np.array(babies_full1, 'int')
# my contributions
from math import floor, ceil

def binMaker(arr, base):
    """Function I already use for this sort of thing.
    arr is the array I want to make bins for;
    base is the bin separation. It requires you to import floor and ceil,
    otherwise you can make these bins manually yourself."""
    binMin = floor(arr.min() / base) * base
    binMax = ceil(arr.max() / base) * base
    return np.arange(binMin, binMax + base, base)

bins1 = binMaker(x[:,0], 20.)  # bins from 140. to 360. spaced 20 apart
bins2 = binMaker(x[:,1], 5.)   # bins from 15. to 45. spaced 5 apart
counts = np.zeros((len(bins1)-1, len(bins2)-1))  # empty array for counts to go in
for i in range(0, len(bins1)-1):      # loop over the intervals, hence the -1
    boo = (x[:,0] >= bins1[i]) * (x[:,0] < bins1[i+1])
    for j in range(0, len(bins2)-1):  # loop over the intervals, hence the -1
        counts[i,j] = np.count_nonzero((x[boo,1] >= bins2[j]) *
                                       (x[boo,1] < bins2[j+1]))

# if you want your PDF to be a fraction of the total
# rather than the number of counts, do the next line
counts /= x.shape[0]
# plotting
import matplotlib.pyplot as plt
from matplotlib.colors import BoundaryNorm
# setting the levels so that each number in counts has its own colour
levels = np.linspace(-0.5, counts.max()+0.5, int(counts.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, counts, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
ax.set_xlabel('Age')
ax.set_ylabel('Gestation')
ax.set_xticks(bins2)
ax.set_yticks(bins1)
plt.title('Manually making a 2D (joint) PDF')
If this is what you wanted, then there is an easier way with np.histogram2d, although I think you specified that it had to use your own methods and not built-in functions. I've included it anyway for completeness' sake.
pdf = np.histogram2d(x[:,0], x[:,1], bins=(bins1,bins2))[0]
pdf /= x.shape[0] # again for normalising and making a percentage
levels = np.linspace(-0.5, pdf.max()+0.5, int(pdf.max())+2)
cmap = plt.get_cmap('viridis') # or any colormap you like
norm = BoundaryNorm(levels, ncolors=cmap.N, clip=True)
fig, ax = plt.subplots(1, 1, figsize=(6,5), dpi=150)
pcm = ax.pcolormesh(bins2, bins1, pdf, ec='k', lw=1)
fig.colorbar(pcm, ax=ax, label='Counts (%)')
ax.set_xlabel('Age')
ax.set_ylabel('Gestation')
ax.set_xticks(bins2)
ax.set_yticks(bins1)
plt.title('using np.histogram2d to make a 2D (joint) PDF')
Final note - in this example, the only place where counts doesn't equal pdf is the bin between 40 <= age < 45 and 280 <= gestation < 300, which I think is due to how, in my manual case, I've used >= and <, and I'm a little unsure how np.histogram2d handles values outside the bin ranges, or on the bin edges, etc. We can see the element of x that is responsible:
>>> print(x[1011])
[280 45]
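For what it's worth, numpy's documentation resolves that uncertainty: for np.histogram (and np.histogram2d) all bins are half-open except the last, which is closed. As a note in code form:

# From the np.histogram docs: "All but the last (righthand-most) bin is half-open."
# age == 45 sits exactly on the final edge of bins2, so np.histogram2d counts it
# in the last bin, while the manual `x[boo,1] < bins2[j+1]` check excludes it.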
I have some data from a bioanalyzer which gives me time (x-axis) and absorbance values (y-axis). The time is every 0.05 seconds and it runs from 32s to 138s, so you can imagine how many data points I have. I've created a graph using plotly and matplotlib, just so I have more libraries to work with to find a solution, so a solution in either library is OK! What I'm trying to do is make my script find the area under each peak and return the values.
def create_plot(sheet_name):
    sample = book.sheet_by_name(sheet_name)
    data = [[sample.cell_value(r, c) for r in range(sample.nrows)] for c in range(sample.ncols)]
    y = data[2][18:len(data[2]) - 2]
    x = np.arange(32, 138.05, 0.05)
    indices = peakutils.indexes(y, thres=0.35, min_dist=0.1)
    peaks = [y[i] for i in indices]
This snippet gets my Y values, X values and indices of the peaks. Now is there a way to get the area under each curve? Let's say that there are 15 indices.
Here's what the graph looks like:
An automated answer
Given a set of x and y values as well as a set of peaks (the x-coordinates of the peaks), here's how you can automatically find the area under each of the peaks. I'm assuming that x, y, and peaks are all Numpy arrays:
import numpy as np
# find the minima between each peak
ixpeak = x.searchsorted(peaks)
ixmin = np.array([np.argmin(i) for i in np.split(y, ixpeak)])
ixmin[1:] += ixpeak
mins = x[ixmin]
# split up the x and y values based on those minima
xsplit = np.split(x, ixmin[1:-1])
ysplit = np.split(y, ixmin[1:-1])
# find the areas under each peak
areas = [np.trapz(ys, xs) for xs,ys in zip(xsplit, ysplit)]
Output:
The example data has been set up so that the area under each peak is (more-or-less) guaranteed to be 1.0, so the results in the bottom plot are correct. The green X marks are the locations of the minimum between each two peaks. The part of the curve "belonging" to each peak is determined as the part of the curve in-between the minima adjacent to each peak.
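If trapezoidal accuracy ever becomes a concern at a given sampling rate, Simpson's rule is a near drop-in replacement for np.trapz on the same segments. A sketch, assuming the xsplit/ysplit lists from the snippet above:

from scipy.integrate import simpson  # scipy >= 1.6; older versions call it `simps`

areas_simpson = [simpson(ys, x=xs) for xs, ys in zip(xsplit, ysplit)]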
Complete code
Here's the complete code I used to generate the example data:
import numpy as np
import scipy as sp
import scipy.stats

prec = 1e5
n = 10
N = 150
r = np.arange(0, N+1, N//n)
# generate some reasonable fake data
peaks = np.array([np.random.uniform(s, e) for s,e in zip(r[:-1], r[1:])])
x = np.linspace(0, N + n, num=int(prec))
y = np.max([sp.stats.norm.pdf(x, loc=p, scale=.4) for p in peaks], axis=0)
and the code I used to make the plots:
import matplotlib.pyplot as plt
# plotting stuff
plt.figure(figsize=(5,7))
plt.subplots_adjust(hspace=.33)
plt.subplot(211)
plt.plot(x, y, label='trace 0')
plt.plot(peaks, y[ixpeak], '+', c='red', ms=10, label='peaks')
plt.plot(mins, y[ixmin], 'x', c='green', ms=10, label='mins')
plt.xlabel('dep')
plt.ylabel('indep')
plt.title('Example data')
plt.ylim(-.1, 1.6)
plt.legend()
plt.subplot(212)
plt.bar(np.arange(len(areas)), areas)
plt.xlabel('Peak number')
plt.ylabel('Area under peak')
plt.title('Area under the peaks of trace 0')
plt.show()
tl;dr: trying to create plots using a for loop in Python 3, but it results in weird, inconsistent plots.
I am trying to create plots like Figure 2 of Fox et al. 2015 (https://arxiv.org/pdf/1412.1480.pdf). I have the different ion lines saved in a dictionary with their parameters saved as lists. For example,
# The way this dictionary is defined is as follows: ion name:
# [wavelength, f-value, continuum lower 1, continuum higher 1, continuum lower 2,
#  continuum higher 2, subplot x, subplot y, ion name x, ion name y]
DictIon = {
    'C II 1334':  [1334.5323, .1278, -400, -200, 300, 500, 0, 0, -380, .2],
    'Si II 1260': [1260.4221, 1.007, -500, -370, 300, 500, 2, 0, -380, .15],
    'Si II 1193': [1193.2897, 0.4991, -500, -200, 200, 500, 3, 0, -380, .2]
}
In order to generate the plot, I have written the following code, where I first calculate the continuum of the absorption lines, then use data from a Voigt profile algorithm that gives me the zval and bval used in the Voigt profile section of the code, and finally combine these two and plot the values in the appropriate subplots. The code is:
f, axes = plt.subplots(6, 2, sharex='col', sharey='row', figsize=(15,15))

for ion in DictIon:
    w0 = DictIon[ion][0]  # Rest wavelength
    fv = DictIon[ion][1]  # Oscillator strength/f-value
    velocity = (wavelength-Lambda)/Lambda*c

    # Fit the continuum
    low1 = DictIon[ion][2]   # lower bound for continuum fit on the left side
    high1 = DictIon[ion][3]  # upper bound for continuum fit on the left side
    low2 = DictIon[ion][4]   # lower bound for continuum fit on the right side
    high2 = DictIon[ion][5]  # upper bound for continuum fit on the right side
    x1 = velocity[(velocity>=low1) & (velocity<=high1)]
    x2 = velocity[(velocity>=low2) & (velocity<=high2)]
    X = np.append(x1, x2)
    y1 = flux[(velocity>=low1) & (velocity<=high1)]
    y2 = flux[(velocity>=low2) & (velocity<=high2)]
    Y = np.append(y1, y2)
    Z = np.polyfit(X, Y, 1)

    # Generate data to plot continuum
    xp = np.linspace(-500, 501, len(flux[(velocity>=-500) & (velocity<=500)]))
    p = np.poly1d(Z)

    # Normalize flux
    norm_flux = flux[(velocity>=-500) & (velocity<=500)]/p(xp)

    # Create a line at y=1
    tmp1 = np.linspace(-500, 500, 10)
    tmp2 = np.full((1,10), 1)[0]

    # Generate Voigt profile fits
    # Initialize arrays
    vmod = (np.arange(npix+1)-(npix/pixsize))*pixsize  # -npix to npix in steps of pix
    fmodraw = np.ndarray((npix+1)); fmodraw.fill(1.0)
    ncom = len(zval)+1
    fitn = 10**(logn); fitne = 10**(logn+elogn)-10**logn
    totcol = np.log10(np.sum(fitn))
    etotcol = np.sqrt(np.sum(elogn**2))  # strictly only true if independent

    # Set up arrays
    sigma = bval/np.sqrt(2.0); tau0 = np.ndarray((ncom)); tauv = np.ndarray((ncom, npix))
    find = np.ndarray((ncom, npix)); sfind = np.ndarray((ncom, npix))  # smoothed

    # go from z to velocity
    v0 = c*(zval-zmod) / (1.0+zmod); ev0 = c*zvale / (1.0+zmod)
    bv = bval; ebv = bvale

    # generate models for each comp, where tau is Gaussian with velocity
    for k in range(0, ncom-1):
        tau0[k] = (1.497e-2*(10**logn[k])*(w0/1.0e8)*fv) / (bval[k]*1.0e5)
        for j in range(0, npix-1):
            tauv[k][j] = tau0[k]*np.exp((-1.0*(vmod[j]-v0[k])**2) / (2.0*sigma[k]**2))
            find[k][j] = np.exp(-1.0*tauv[k][j])

    # Transpose
    tauv = tauv.T
    find = find.T

    # Sum over components (pixel by pixel)
    tottauv = np.ndarray((npix+1))
    for j in range(0, npix-1):
        tottauv[j] = tauv[j,:].sum()
    fmodraw = np.exp(-1.0*tottauv)

    # create Gaussian kernel (smoothing function or LSF)
    # created on 1 km/s grid with default FWHM=20.0 km/s (UVES), integral=1
    fwhmins = 20.0
    sigins = fwhmins/(1.414*1.665); nker = 150  # NEED TO FIND NKER
    vt = np.arange(nker)-nker/2  # -75 to +75 in 1 km/s steps
    smfn = (1.0/(sigins*np.sqrt(2.0*np.pi)))*np.exp((-1.0*(vt**2))/(2.0*sigins**2))

    # convolve total and individual comps with LSF
    fmod = np.convolve(fmodraw, smfn, mode='same')

    ax = axes[DictIon[ion][6]][DictIon[ion][7]]  # shorthand for the target subplot
    ax.axis([-400, 400, -.12, 1.4])
    ax.xaxis.set_major_locator(xmajorLocator)
    ax.xaxis.set_major_formatter(xmajorFormatter)
    ax.xaxis.set_minor_locator(xminorLocator)  # was set_major_locator(xminorLocator)
    ax.yaxis.set_major_locator(ymajorLocator)
    ax.yaxis.set_minor_locator(yminorLocator)  # was set_major_locator(yminorLocator)
    ax.plot(vmod, fmod, 'r', linewidth=1.5)
    ax.plot(tmp1, tmp2, 'k--')
    ax.step(velocity[(velocity>=-500) & (velocity<=500)], norm_flux, 'k')
Instead of producing plots like Figure 2 of Fox et al. 2015, I am getting situations where the code produces different results at different times when I run it:
Bottom line, I have tried to debug this for 3 days now and I am at a loss. I suspect it may have something to do with how pyplot plots work in a for loop and the fact that I am looping through a dictionary. Any advice or suggestions would be greatly appreciated. I am using Python 3.
Edit:
Data available here:
zval,bval values: https://drive.google.com/file/d/0BxZ6b2fEZcGBX2ZTUEdDVHVWS0U/
velocity, flux values:
Si III 1206: https://drive.google.com/file/d/0BxZ6b2fEZcGBQXpmZ01kMDNHdk0/
Si IV 1393: https://drive.google.com/file/d/0BxZ6b2fEZcGBamkxVVA2dUY0Qjg/
Here's a MWE of the operation you want to perform:
import matplotlib.pyplot as plt
import numpy as np

f, axes = plt.subplots(3, 2, sharex='col', sharey='row', figsize=(15,15))
plt.subplots_adjust(hspace=0.)

data_dict = {'C II 1334': [1., 2.],
             'Si II 1260': [2., 3.],
             'Si II 1193': [3., 4.]}

for i, (key, value) in enumerate(data_dict.items()):
    print(i, key, value)
    x = np.linspace(0, 100, 10)
    y0 = value[0] * x
    y1 = value[1] * x
    axes[i, 0].plot(x, y0, label=key)
    axes[i, 1].plot(x, y1, label=key)
    axes[i, 0].legend(loc="upper right")
    axes[i, 1].legend(loc="upper right")

plt.show()
Results in
I don't see any strange behaviour after invoking plt in a for loop over a dict.
I suggest you separate data processing/calculations, i.e. calculations of the scientific quantities of interest, from plotting said data.
Note that the ordering of the dictionary items isn't guaranteed to be preserved (on Python versions before 3.7) - better to use a list or an ordered dictionary in my example, as sketched below.
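A minimal sketch of that suggestion, reusing the MWE's data:

from collections import OrderedDict

# Deterministic iteration order on any Python version
data_dict = OrderedDict([('C II 1334', [1., 2.]),
                         ('Si II 1260', [2., 3.]),
                         ('Si II 1193', [3., 4.])])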
I need to compute bspline curves in Python. I looked into scipy.interpolate.splprep and a few other scipy modules but couldn't find anything that readily gave me what I needed, so I wrote my own module below. The code works fine, but it is slow (the test function runs in 0.03s, which seems like a lot considering I'm only asking for 100 samples with 6 control vertices).
Is there a way to simplify the code below with a few scipy module calls, which would presumably speed it up? And if not, what could I do to my code to improve its performance?
import numpy as np

# cv     = np.array of 3d control vertices
# n      = number of samples (default: 100)
# d      = curve degree (default: cubic)
# closed = is the curve closed (periodic) or open? (default: open)
def bspline(cv, n=100, d=3, closed=False):

    # Create a range of u values
    count = len(cv)
    if not closed:
        u = np.arange(0, n, dtype='float')/(n-1) * (count-d)
        knots = np.array([0]*d + list(range(count-d+1)) + [count-d]*d, dtype='int')
    else:
        u = ((np.arange(0, n, dtype='float')/(n-1) * count) - (0.5 * (d-1))) % count  # keep u=0 relative to 1st cv
        knots = np.arange(0-d, count+d+d-1, dtype='int')

    # Simple Cox - DeBoor recursion
    def coxDeBoor(u, k, d):
        # Test for end conditions
        if d == 0:
            if knots[k] <= u < knots[k+1]:
                return 1
            return 0

        Den1 = knots[k+d] - knots[k]
        Den2 = knots[k+d+1] - knots[k+1]
        Eq1 = 0
        Eq2 = 0

        if Den1 > 0:
            Eq1 = ((u-knots[k]) / Den1) * coxDeBoor(u, k, d-1)
        if Den2 > 0:
            Eq2 = ((knots[k+d+1]-u) / Den2) * coxDeBoor(u, k+1, d-1)

        return Eq1 + Eq2

    # Sample the curve at each u value
    samples = np.zeros((n, 3))
    for i in range(n):
        if not closed:
            if u[i] == count-d:
                samples[i] = np.array(cv[-1])
            else:
                for k in range(count):
                    samples[i] += coxDeBoor(u[i], k, d) * cv[k]
        else:
            for k in range(count+d):
                samples[i] += coxDeBoor(u[i], k, d) * cv[k % count]

    return samples
if __name__ == "__main__":
import matplotlib.pyplot as plt
def test(closed):
cv = np.array([[ 50., 25., -0.],
[ 59., 12., -0.],
[ 50., 10., 0.],
[ 57., 2., 0.],
[ 40., 4., 0.],
[ 40., 14., -0.]])
p = bspline(cv,closed=closed)
x,y,z = p.T
cv = cv.T
plt.plot(cv[0],cv[1], 'o-', label='Control Points')
plt.plot(x,y,'k-',label='Curve')
plt.minorticks_on()
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.xlim(35, 70)
plt.ylim(0, 30)
plt.gca().set_aspect('equal', adjustable='box')
plt.show()
test(False)
The two images below show what my code returns for both the open and closed conditions:
So after obsessing a lot over my question, and much research, I finally have my answer. Everything is available in scipy, and I'm putting my code here so hopefully someone else can find this useful.
The function takes in an array of N-d points, a curve degree, a periodic state (opened or closed), and will return n samples along that curve. There are ways to make sure the curve samples are equidistant, but for the time being I'll focus on this question, as it is all about speed.
Worth noting: I can't seem to be able to go beyond a curve of 20th degree. Granted, that's overkill already, but I figured it's worth mentioning.
Also worth noting: on my machine the code below can calculate 100,000 samples in 0.017s.
import numpy as np
import scipy.interpolate as si

def bspline(cv, n=100, degree=3, periodic=False):
    """ Calculate n samples on a bspline

        cv :      Array of control vertices
        n  :      Number of samples to return
        degree:   Curve degree
        periodic: True - Curve is closed
                  False - Curve is open
    """

    # If periodic, extend the point array by count+degree+1
    cv = np.asarray(cv)
    count = len(cv)

    if periodic:
        factor, fraction = divmod(count+degree+1, count)
        cv = np.concatenate((cv,) * factor + (cv[:fraction],))
        count = len(cv)
        degree = np.clip(degree, 1, degree)

    # If opened, prevent degree from exceeding count-1
    else:
        degree = np.clip(degree, 1, count-1)

    # Calculate knot vector
    if periodic:
        kv = np.arange(0-degree, count+degree+degree-1)
    else:
        kv = np.clip(np.arange(count+degree+1)-degree, 0, count-degree)

    # Calculate query range (periodic is 0 or 1 here, so closed curves start at u=1)
    u = np.linspace(periodic, (count-degree), n)

    # Calculate result
    return np.array(si.splev(u, (kv, cv.T, degree))).T
To test it:
import matplotlib.pyplot as plt
colors = ('b', 'g', 'r', 'c', 'm', 'y', 'k')
cv = np.array([[ 50., 25.],
[ 59., 12.],
[ 50., 10.],
[ 57., 2.],
[ 40., 4.],
[ 40., 14.]])
plt.plot(cv[:,0],cv[:,1], 'o-', label='Control Points')
for d in range(1,21):
p = bspline(cv,n=100,degree=d,periodic=True)
x,y = p.T
plt.plot(x,y,'k-',label='Degree %s'%d,color=colors[d%len(colors)])
plt.minorticks_on()
plt.legend()
plt.xlabel('x')
plt.ylabel('y')
plt.xlim(35, 70)
plt.ylim(0, 30)
plt.gca().set_aspect('equal', adjustable='box')
plt.show()
Results for both opened or periodic curves:
ADDENDUM
As of scipy-0.19.0 there is a new scipy.interpolate.BSpline class that can be used.
import numpy as np
import scipy.interpolate as si

def scipy_bspline(cv, n=100, degree=3, periodic=False):
    """ Calculate n samples on a bspline

        cv :      Array of control vertices
        n  :      Number of samples to return
        degree:   Curve degree
        periodic: True - Curve is closed
    """
    cv = np.asarray(cv)
    count = cv.shape[0]

    # Closed curve
    if periodic:
        kv = np.arange(-degree, count+degree+1)
        factor, fraction = divmod(count+degree+1, count)
        cv = np.roll(np.concatenate((cv,) * factor + (cv[:fraction],)), -1, axis=0)
        degree = np.clip(degree, 1, degree)

    # Opened curve
    else:
        degree = np.clip(degree, 1, count-1)
        kv = np.clip(np.arange(count+degree+1)-degree, 0, count-degree)

    # Return samples
    max_param = count - (degree * (1-periodic))
    spl = si.BSpline(kv, cv, degree)
    return spl(np.linspace(0, max_param, n))
Testing for equivalency:
p1 = bspline(cv, n=10**6, degree=3, periodic=True)        # 1 million samples: 0.0882 sec
p2 = scipy_bspline(cv, n=10**6, degree=3, periodic=True)  # 1 million samples: 0.0789 sec
print(np.allclose(p1, p2))  # prints True
Giving optimization tips without profiling data is a bit like shooting in the dark. However, the function coxDeBoor seems to be called very often. This is where I would start optimizing.
Function calls in Python are expensive. You should try to replace the coxDeBoor recursion with iteration to avoid excessive function calls. Some general information on how to do this can be found in answers to this question. As the stack/queue you can use collections.deque.
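A lower-effort variant of the same idea - cutting the repeated work rather than rewriting the recursion - is to memoize coxDeBoor. A sketch, assuming the nested function keeps its hashable (u, k, d) arguments and still closes over knots:

from functools import lru_cache

# Inside bspline(), replace the nested coxDeBoor with a memoized version;
# each (u, k, d) triple is then evaluated only once per bspline() call.
@lru_cache(maxsize=None)
def coxDeBoor(u, k, d):
    if d == 0:
        return 1.0 if knots[k] <= u < knots[k+1] else 0.0
    den1 = knots[k+d] - knots[k]
    den2 = knots[k+d+1] - knots[k+1]
    eq1 = ((u - knots[k]) / den1) * coxDeBoor(u, k, d-1) if den1 > 0 else 0.0
    eq2 = ((knots[k+d+1] - u) / den2) * coxDeBoor(u, k+1, d-1) if den2 > 0 else 0.0
    return eq1 + eq2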