Contour without float artifacts - python

I linearly interpolate my data and then contour it. I use floats for the calculations because I do not know how many decimal places the input data will have: sometimes none, sometimes one, sometimes more than 10.
Unfortunately, because of floating-point error, interpolating and then contouring data in which all values are equal produces unwanted artifacts. How can I fix my code so that it does not draw contours where there should not be any?
Simple code example:
import numpy as np
from scipy.interpolate import griddata
import matplotlib.pyplot as plt
interval_in = np.linspace(1, 100, 10)
interval_out = np.linspace(1, 100, 100)
xin, yin = np.meshgrid(interval_in, interval_in)
zin = np.ones((10, 10))*10
xout, yout = np.meshgrid(interval_out, interval_out)
zout = griddata((xin.flatten(),yin.flatten()),zin.flatten(),(xout,yout),method='linear')
contours = plt.contour(xout, yout, zout, levels=[10])
plt.show()

In your example zout should be 10 everywhere, but it actually varies between 9.9999999999999982 and 10.000000000000002, and contour tries to plot that variation. You can round to a given precision with NumPy:
zout_ = np.round(zout, decimals=3)
contours = plt.contour(xout, yout, zout_, levels=[10])
plt.show()
although, if your data has a large range, contour should work correctly...
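To see how small the noise is without opening a plot window, here is a minimal sketch that rebuilds the grid from the question and compares the spread of values before and after rounding. The names spread and spread_rounded are made up for illustration, and decimals=3 is only an assumption about the precision you need.

```python
import numpy as np
from scipy.interpolate import griddata

# Same constant-valued input grid as in the question.
interval_in = np.linspace(1, 100, 10)
interval_out = np.linspace(1, 100, 100)
xin, yin = np.meshgrid(interval_in, interval_in)
zin = np.ones((10, 10)) * 10
xout, yout = np.meshgrid(interval_out, interval_out)

zout = griddata((xin.ravel(), yin.ravel()), zin.ravel(),
                (xout, yout), method="linear")

# Spread of the interpolated values: floating-point noise around 10.
spread = np.nanmax(zout) - np.nanmin(zout)

# Rounding collapses the noise, so no value straddles the contour level.
zout_rounded = np.round(zout, decimals=3)
spread_rounded = np.nanmax(zout_rounded) - np.nanmin(zout_rounded)

print(spread, spread_rounded)
```

The number of decimals should be coarser than the floating-point noise but finer than the real resolution of your data, otherwise rounding removes genuine detail as well.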

Related

How to find what points lie in each bin of a histogram?

I have a 2D histogram with 10 bins. Is there a numpy function (or any faster method) that tells me which points lie in each bin of the 2D grid? Is there a way to access the elements of each bin?
I hope this solves your problem. I am new to Python, though, so others can probably improve on my code.
Create Histogram with matplotlib
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.RandomState(10) # deterministic random data
a = np.hstack((rng.normal(size=100), rng.normal(loc=5, scale=2, size=1000)))
n ,bins ,patches = plt.hist(a, bins=10) # arguments are passed to np.histogram
plt.title("Histogram with '10' bins")
plt.show()
Reshape arrays and..
newbin = np.repeat(np.reshape(bins,(-1, len(bins))), a.shape, axis=0)
newa = np.repeat(np.reshape(a,(len(a),-1)),len(bins),axis=1)
#index_bin = (np.where(newbin[:,0] >np.reshape(a,(1,-1))[:,0] ) )[0][0]
index_bin = (newbin>newa).argmax(axis=1).T
test
print(a[0] , bins)
print(index_bin[0])
Output
1.331586504129518 [-2.13171211 -0.88255884 0.36659444 1.61574771 2.86490098 4.11405425
5.36320753 6.6123608 7.86151407 9.11066734 10.35982062]
3
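For the 2D case the question asks about, one possible alternative is np.digitize applied to each axis. This is only a sketch, not the method used above: the sample data, the dictionary name points_in_bin and the clipping step are assumptions made for illustration. The clip puts values that sit exactly on the last edge into the last bin, which is how np.histogram2d treats them.

```python
import numpy as np

rng = np.random.default_rng(10)          # deterministic sample data
x = rng.normal(size=1000)
y = rng.normal(loc=5, scale=2, size=1000)

nbins = 10
H, xedges, yedges = np.histogram2d(x, y, bins=nbins)

# np.digitize returns 1-based edge positions; shift to 0-based bin indices
# and clip so that the maximum value lands in the last bin.
ix = np.clip(np.digitize(x, xedges) - 1, 0, nbins - 1)
iy = np.clip(np.digitize(y, yedges) - 1, 0, nbins - 1)

# Map each (ix, iy) bin to the list of points that fall in it.
points_in_bin = {}
for i, j, px, py in zip(ix, iy, x, y):
    points_in_bin.setdefault((int(i), int(j)), []).append((px, py))

# Example lookup: points in the bin holding the first sample.
first_bin = (int(ix[0]), int(iy[0]))
print(first_bin, len(points_in_bin[first_bin]))
```

Empty bins simply have no key in the dictionary, so use points_in_bin.get((i, j), []) when looping over the whole grid.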

Filtering accelerometry data in scipy

I'm new to Python and SciPy, and I am trying to filter acceleration data recorded in 3 dimensions at 25 Hz. I'm having a weird problem: after applying the filter, the graph of my data is smoothed, but the values seem to be amplified quite a bit, depending on the order and cutoff frequencies of the filter. Here is my code:
from numpy import loadtxt  # loadtxt lives in NumPy; recent SciPy no longer re-exports it
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
my_data = loadtxt("DATA-001.CSV",delimiter=",",skiprows=8)
N, Wn = signal.buttord( [3,11], [.3,18], .1, 10, True)
print(N)
print(Wn)
b,a = signal.butter(N, Wn, 'bandpass', analog=True)
filtered_z = signal.filtfilt(a,b,[my_data[1:500,3]],)
filtered_z = np.reshape(filtered_z, (499,))
plt.figure(1)
plt.subplot(411)
plt.plot(my_data[1:500,0],my_data[1:500,3])
plt.subplot(412)
plt.plot(my_data[1:500,0], filtered_z, 'k')
plt.show()
Right now, this code returns this graph:
I'm unsure of how to get rid of this weird gain issue, if anyone has any suggestions?
Thank you!
You have your coefficients the wrong way around in signal.filtfilt. Should be:
filtered_z = signal.filtfilt(b,a,[my_data[1:500,3]],)
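filtfilt expects the numerator coefficients b first and the denominator coefficients a second. With the two swapped you apply the inverse of the filter you designed, and depending on the size and ratio of the coefficients that can amplify the signal.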
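As a rough check of the argument order, here is a minimal sketch on synthetic data rather than the asker's CSV file. It assumes that a digital band-pass design is what is wanted for data sampled at 25 Hz. The filter order (4), the 3 to 11 Hz band and the 6 Hz test tone are illustrative choices, not values taken from the buttord call above.

```python
import numpy as np
from scipy import signal

fs = 25.0                                    # sampling rate in Hz, from the question
t = np.arange(500) / fs

# Synthetic stand-in for one accelerometer axis: a constant offset
# plus a 6 Hz oscillation with amplitude 1.
raw = 5.0 + np.sin(2 * np.pi * 6.0 * t)

# Digital Butterworth band-pass; passing fs lets the cutoffs be given in Hz.
b, a = signal.butter(4, [3.0, 11.0], btype="bandpass", fs=fs)

# Numerator b first, denominator a second.
filtered = signal.filtfilt(b, a, raw)

# Away from the edges the offset is removed and the 6 Hz tone keeps
# roughly its original amplitude instead of being amplified.
middle = filtered[100:400]
print(middle.mean(), np.abs(middle).max())
```

If the filtered amplitude is far from the input amplitude for a tone inside the pass band, that points to swapped coefficients or to an analog design being applied to sampled data.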

matplotlib: plot hist2d piecewise

I would like to plot a large sample stored in the arrays a and b with matplotlib's hist2d feature. However, generating H, xedges, yedges, img does not work directly for this data, as it uses too much memory. It works for half the number of samples, though, so I would like to do something like
H_1, xedges_1, yedges_1, img_1 = plt.hist2d(a[:len(a)//2], b[:len(b)//2], bins = 10)
followed by
H_2, xedges_2, yedges_2, img_2 = plt.hist2d(a[len(a)//2:], b[len(b)//2:], bins = 10)
perhaps deleting the first half of the arrays after calculating the first set of variables. Is there a way to merge these two sets of variables and generate a combined plot for the data?
If (and only if!) you specify the bin edges manually, then your histograms will be compatible. You can simply add the occurrences in each bin for both subsets, and you'll recover the full histogram:
import numpy as np
import matplotlib.pyplot as plt
a=np.random.rand(200)*10
b=np.random.rand(200)*10
binmin=min(a.min(),b.min())
binmax=max(a.max(),b.max())
# Integer division (//) keeps the slice indices integers on Python 3.
H_1, xedges_1, yedges_1, img_1 = plt.hist2d(a[:len(a)//2], b[:len(b)//2], bins = np.linspace(binmin,binmax,10+1))
H_2, xedges_2, yedges_2, img_2 = plt.hist2d(a[len(a)//2:], b[len(b)//2:], bins = np.linspace(binmin,binmax,10+1))
H_3, xedges_3, yedges_3, img_3 = plt.hist2d(a, b, bins = np.linspace(binmin,binmax,10+1))
Result:
In [150]: (H_1+H_2==H_3).all()
Out[150]: True
You can easily plot the sum using plt.pcolor. That's what hist2d seems to use, albeit with an additional transpose of the data:
plt.figure()
plt.pcolor((H_1+H_2).T)
img_3 (left) vs (H_1+H_2).T (right):
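If memory is the real constraint, the same idea extends to any number of chunks, and np.histogram2d avoids drawing an image for each piece. A minimal sketch, assuming the data range (0 to 10 here) is known in advance so the edges can be fixed up front; chunk_size and the random sample are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(10_000) * 10
b = rng.random(10_000) * 10

# Fixed edges shared by every chunk: this is what makes the counts addable.
edges = np.linspace(0.0, 10.0, 10 + 1)

H_total = np.zeros((10, 10))
chunk_size = 2_500
for start in range(0, len(a), chunk_size):
    stop = start + chunk_size
    H_chunk, _, _ = np.histogram2d(a[start:stop], b[start:stop],
                                   bins=[edges, edges])
    H_total += H_chunk

# Reference: the histogram of the full sample in one call.
H_full, _, _ = np.histogram2d(a, b, bins=[edges, edges])
print(np.array_equal(H_total, H_full))
```

The accumulated array can then be drawn once, for example with plt.pcolormesh(edges, edges, H_total.T).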

binned_statistic_2d producing unexpected negative values

I'm using scipy.stats.binned_statistic_2d and then plotting the output. When I use stat="count", I have no problems. When I use stat="mean" (or np.max, for that matter), I end up with negative values in each bin (as shown by the color bar), which should not happen because I have constructed zvals so that it is always greater than zero. Does anyone know why this is the case? I've included the minimal code I use to generate the plots. I also get an "invalid value" RuntimeWarning, which makes me think that something strange is going on in binned_statistic_2d. The following code should just copy and run.
From the documentation:
'count' : compute the count of points within each bin. This is
identical to an unweighted histogram. `values` array is not
referenced.
which leads me to believe that there might be something going on in binned_statistic_2d and how it handles z-values.
import numbers as _numbers
import numpy as _np
import scipy as _scipy
import matplotlib as _mpl
import types as _types
import scipy.stats
from matplotlib import pyplot as _plt
norm_args = (0, 3, int(1e5)) # loc, scale, size
x = _np.random.random(norm_args[-1]) # xvals can be log scaled.
y = _np.random.normal(*norm_args) #_np.random.random(norm_args[-1]) #
z = _np.abs(_np.random.normal(1e2, *norm_args[1:]))
nbins = 1e2
kwargs = {}
stat = _np.max
fig, ax = _plt.subplots()
binned_stats = _scipy.stats.binned_statistic_2d(x, y, z, stat, nbins)
H, xedges, yedges, binnumber = binned_stats
Hplot = H
if isinstance(stat, str):
    cbar_title = stat.title()
elif isinstance(stat, _types.FunctionType):
    cbar_title = stat.__name__.title()
XX, YY = _np.meshgrid(xedges, yedges)
Image = ax.pcolormesh(XX, YY, Hplot.T) #norm=norm,
ax.autoscale(tight=True)
grid_kargs = {'orientation': 'vertical'}
cax, kw = _mpl.colorbar.make_axes_gridspec(ax, **grid_kargs)
cbar = fig.colorbar(Image, cax=cax)
cbar.set_label(cbar_title)
Here's the runtime warning:
/Users/balterma/Library/Enthought/Canopy_64bit/User/lib/python2.7/sitepackages/matplotlib/colors.py:584: RuntimeWarning: invalid value encountered in less cbook._putmask(xa, xa < 0.0, -1)
Image with mean:
Image with max:
Image with count:
Turns out the problem was interfacing with plt.pcolormesh. I had to convert the output array from binned_statistic_2d to a masked array that masked the NaNs.
Here's the question that gave me the answer:
pcolormesh with missing values?
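The masking step can be sketched as follows. This is an illustration with made-up sample sizes, not the asker's final code: with statistic="mean", bins that receive no points come back as NaN, and np.ma.masked_invalid hides them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
x = rng.random(n)
y = rng.normal(0, 3, n)
z = np.abs(rng.normal(100, 3, n))        # strictly positive values

# 50 x 50 bins for only 2000 points, so many bins stay empty.
H, xedges, yedges, binnumber = stats.binned_statistic_2d(
    x, y, z, statistic="mean", bins=50)

# Empty bins hold NaN; mask them so they are left out of the color scale.
H_masked = np.ma.masked_invalid(H)

print(int(np.isnan(H).sum()), H_masked.min(), H_masked.max())
```

Passing H_masked.T to pcolormesh in place of the raw array leaves the empty bins blank, so the color bar covers only the real values.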

Python interp1d vs. UnivariateSpline

I'm trying to port some MATLAB code over to SciPy, and I've tried two different functions from scipy.interpolate, interp1d and UnivariateSpline. The interp1d results match MATLAB's interp1 function, but the UnivariateSpline numbers come out different, in some cases very different.
f = interp1d(row1,row2,kind='cubic',bounds_error=False,fill_value=numpy.max(row2))
return f(interp)
f = UnivariateSpline(row1,row2,k=3,s=0)
return f(interp)
Could anyone offer any insight? My x vals aren't equally spaced, although I'm not sure why that would matter.
I just ran into the same issue.
Short answer
Use InterpolatedUnivariateSpline instead:
f = InterpolatedUnivariateSpline(row1, row2)
return f(interp)
Long answer
UnivariateSpline is a 'one-dimensional smoothing spline fit to a given set of data points' whereas InterpolatedUnivariateSpline is a 'one-dimensional interpolating spline for a given set of data points'. The former smooths the data whereas the latter is a more conventional interpolation method and reproduces the results expected from interp1d. The figure below illustrates the difference.
The code to reproduce the figure is shown below.
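# The original presumably ran in a pylab session; explicit imports make it standalone.
from numpy import linspace, pi, sin
from matplotlib.pyplot import (subplot, plot, ylim, legend, ylabel, xlabel,
                               tight_layout)
import scipy.interpolate as ip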
#Define independent variable
sparse = linspace(0, 2 * pi, num = 20)
dense = linspace(0, 2 * pi, num = 200)
#Define function and calculate dependent variable
f = lambda x: sin(x) + 2
fsparse = f(sparse)
fdense = f(dense)
ax = subplot(2, 1, 1)
#Plot the sparse samples and the true function
plot(sparse, fsparse, label = 'Sparse samples', linestyle = 'None', marker = 'o')
plot(dense, fdense, label = 'True function')
#Plot the different interpolation results
interpolate = ip.InterpolatedUnivariateSpline(sparse, fsparse)
plot(dense, interpolate(dense), label = 'InterpolatedUnivariateSpline', linewidth = 2)
smoothing = ip.UnivariateSpline(sparse, fsparse)
plot(dense, smoothing(dense), label = 'UnivariateSpline', color = 'k', linewidth = 2)
ip1d = ip.interp1d(sparse, fsparse, kind = 'cubic')
plot(dense, ip1d(dense), label = 'interp1d')
ylim(.9, 3.3)
legend(loc = 'upper right', frameon = False)
ylabel('f(x)')
#Plot the fractional error
subplot(2, 1, 2, sharex = ax)
plot(dense, smoothing(dense) / fdense - 1, label = 'UnivariateSpline')
plot(dense, interpolate(dense) / fdense - 1, label = 'InterpolatedUnivariateSpline')
plot(dense, ip1d(dense) / fdense - 1, label = 'interp1d')
ylabel('Fractional error')
xlabel('x')
ylim(-.1,.15)
legend(loc = 'upper left', frameon = False)
tight_layout()
The reason why the results are different (but both likely correct) is that the interpolation routines used by UnivariateSpline and interp1d are different.
interp1d constructs a smooth B-spline using the x-points you gave to it as knots
UnivariateSpline is based on FITPACK, which also constructs a smooth B-spline. However, FITPACK tries to choose new knots for the spline, to fit the data better (probably to minimize chi^2 plus some penalty for curvature, or something similar). You can find out what knot points it used via g.get_knots().
So the reason you get different results is that the interpolation algorithms differ. If you want B-splines with knots at the data points, use interp1d or splmake. If you want what FITPACK does, use UnivariateSpline. In the limit of dense data both methods give the same results, but when the data is sparse you may get different results.
(How do I know all this: I read the code :-)
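The knot selection described above can be inspected directly. The sketch below uses the same 20-point sine sample as the figure code in the earlier answer; the variable names are made up for illustration. It compares UnivariateSpline with its default smoothing factor, UnivariateSpline with s=0 (as in the question), and InterpolatedUnivariateSpline, which the SciPy documentation describes as equivalent to UnivariateSpline with s=0.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline, UnivariateSpline

x = np.linspace(0, 2 * np.pi, 20)
y = np.sin(x) + 2
dense = np.linspace(0, 2 * np.pi, 200)

smoothing = UnivariateSpline(x, y)               # default s: FITPACK picks the knots
exact = UnivariateSpline(x, y, k=3, s=0)         # s=0 forces interpolation
interpolating = InterpolatedUnivariateSpline(x, y, k=3)

# Number of knots each spline ended up with.
print(len(smoothing.get_knots()),
      len(exact.get_knots()),
      len(interpolating.get_knots()))

# Largest residual at the data points: an interpolating spline passes
# through every sample, a smoothing spline generally does not.
print(np.abs(smoothing(x) - y).max(), np.abs(exact(x) - y).max())
```

If the s=0 spline and interp1d still disagree noticeably on your data, the difference is more likely to come from the knot placement and boundary handling described above than from smoothing.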
Works for me,
from numpy import allclose, linspace  # these helpers live in NumPy, not SciPy
from scipy.interpolate import interp1d, UnivariateSpline
from numpy.random import normal
from pylab import plot, show
n = 2**5
x = linspace(0,3,n)
y = (2*x**2 + 3*x + 1) + normal(0.0,2.0,n)
i = interp1d(x,y,kind=3)
u = UnivariateSpline(x,y,k=3,s=0)
m = 2**4
t = linspace(1,2,m)
plot(x,y,'r,')
plot(t,i(t),'b')
plot(t,u(t),'g')
print(allclose(i(t),u(t))) # evaluates to True
show()
This gives me,
The documentation describes UnivariateSpline as "a more recent wrapper of the FITPACK routines", which might explain the slightly different values. (In my experience, UnivariateSpline is also much faster than interp1d.)
