I have a 3D ndarray object, which contains spectral data (i.e. spatial xy dimensions, and an energy dimension). I would like to extract and plot the spectra from each individual pixel in a line plot. At present, I am doing this using np.ndenumerate along the axis I'm interested in, but it's quite slow. I was hoping to try np.apply_along_axis, to see if it was faster, but I keep getting a strange error.
What works:
# Setup environment, and generate sample data (much smaller than real thing!)
import numpy as np
import matplotlib.pyplot as plt
ax = range(0,10) # the scale to use when plotting the axis of interest
ar = np.random.rand(4,4,10) # the 3D data volume
# Plot all lines along axis 2 (i.e. the spectrum contained in each pixel)
# on a single line plot:
for (x,y) in np.ndenumerate(ar[:,:,1]):
    plt.plot(ax,ar[x[0],x[1],:],alpha=0.5,color='black')
It is my understanding that this is basically a loop, which is less efficient than array-based methods, so I would like to try an approach using np.apply_along_axis, to see if it's faster. This is my first attempt at Python, however, and I am still finding out how it works, so please put me right if this idea is fundamentally flawed!
What I would like to try:
# define a function to pass to apply_along_axis
def pa(y,x):
    if not np.all(np.isnan(y)): # only do the plot if there is actually data there...
        plt.plot(x,y,alpha=0.15,color='black')
    return
# check that the function actually works...
pa(ar[1,1,:],ax) # should produce a plot - does for me :)
# try to apply it to the whole array, along the axis of interest:
np.apply_along_axis(pa,2,ar,ax) # does not work... booo!
The resulting error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-109-5192831ba03c> in <module>()
12 # pa(ar[1,1,:],ax)
13
---> 14 np.apply_along_axis(pa,2,ar,ax)
//anaconda/lib/python2.7/site-packages/numpy/lib/shape_base.pyc in apply_along_axis(func1d, axis, arr, *args)
101 holdshape = outshape
102 outshape = list(arr.shape)
--> 103 outshape[axis] = len(res)
104 outarr = zeros(outshape, asarray(res).dtype)
105 outarr[tuple(i.tolist())] = res
TypeError: object of type 'NoneType' has no len()
Any ideas about what's going wrong here, or advice on how to do this better, would be great.
Thanks!
apply_along_axis creates a new array from the output of your function.
You're returning None (by not returning anything). Thus the error. Numpy checks the length of the returned output to see if it makes sense for the new array.
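As a minimal sketch of what apply_along_axis is meant for, here is a function that does return something for each spectrum. The helper name peak_channel is made up for illustration; the array has the same shape as the sample data in the question.

```python
import numpy as np

ar = np.random.rand(4, 4, 10)  # same shape as the sample data in the question

def peak_channel(spectrum):
    # Hypothetical helper: return one value per spectrum, so that
    # apply_along_axis has something to store in its output array.
    return np.argmax(spectrum)

# Apply the function to every 1D slice along axis 2 (the energy axis)
peaks = np.apply_along_axis(peak_channel, 2, ar)
print(peaks.shape)  # (4, 4)
```

Because the function returns a value, NumPy can build the output array; a function that only plots and returns None gives it nothing to measure.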
Because you're not constructing a new array from the results, there's no reason to use apply_along_axis. It's not going to be any faster.
However, your current ndenumerate statement is exactly equivalent to:
import numpy as np
import matplotlib.pyplot as plt
ar = np.random.rand(4,4,10) # the 3D data volume
plt.plot(ar.reshape(-1, 10).T, alpha=0.5, color='black')
In general, you probably want to do something like:
for pixel in ar.reshape(-1, ar.shape[-1]):
    plt.plot(x_values, pixel, ...)
That way you can easily iterate over the spectra at each pixel in your hyperspectral array.
Your bottleneck here probably isn't how you're using the array. Plotting each line separately with identical parameters like this in matplotlib is going to be somewhat inefficient.
It will take slightly longer to construct, but a LineCollection will render much faster. (Basically, using a LineCollection tells matplotlib to not bother checking what the properties of each line are, and just pass them all to the low-level renderer to be drawn in the same way. You bypass a bunch of individual draw calls in favor of a single draw of a large object.)
On the downside, the code will be a bit less readable.
I'll add an example in a bit.
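A minimal sketch of the LineCollection approach, assuming the sample array from the question and an integer energy scale (x_values is an assumed name, not something from the original code):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

ar = np.random.rand(4, 4, 10)            # sample data volume, as in the question
x_values = np.arange(ar.shape[-1])       # assumed energy scale

# One row per pixel: shape (n_pixels, n_channels)
spectra = ar.reshape(-1, ar.shape[-1])

# Skip pixels that contain no data at all (all-NaN spectra)
keep = ~np.isnan(spectra).all(axis=1)
spectra = spectra[keep]

# LineCollection takes a sequence of (n_points, 2) arrays of x, y pairs;
# a single (n_lines, n_points, 2) array works as that sequence.
segments = np.stack(
    [np.broadcast_to(x_values, spectra.shape), spectra], axis=-1
)

fig, axes = plt.subplots()
lines = LineCollection(segments, colors='black', alpha=0.5)
axes.add_collection(lines)
axes.autoscale_view()  # add_collection does not rescale the view by itself
```

Call plt.show() afterwards to display the figure. All spectra are handed to the renderer as one artist instead of one Line2D per pixel.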
I want to find the derivatives of some scattered data. I have tried two different methods:
projecting the scattered data on a regular grid using scipy.interpolate.griddata, then computing the gradients with numpy.gradient, and then projecting the values back to the scattered locations.
creating a CloughTocher2DInterpolator (but I have the same issue with others) and getting the gradients out of it
The second one is an order of magnitude faster than the first one but unfortunately, it also goes crazy quite quickly when data are a bit complex. For instance starting with this signal (called F and which is a simple addition of tanh stepwise functions along x and y):
When I process F using the two methods, I get:
Method 1 gives a good approximation. Method 2 is also good, but I need to force the colormap limits because of some extreme values.
Now, if I add a small noise (i.e. of amplitude 0.1 while the signal has amplitudes between -3 and 3), the interpolator just goes crazy giving very large extreme values:
I don't know how to deal with this. I understand the interpolator won't like irregular functions or noise, but I was not expecting such a discrepancy. My first idea was to smooth the data first, but strangely I can't find any method that would help me with this. Another idea would be to make a 2D fit of F to try to remove the noise, but I'm stuck there too. Any ideas?
Here is the corresponding python example (working on python3.6.9):
import numpy as np
from scipy import interpolate
import matplotlib.pyplot as plt
plt.interactive(True)
# scattered data
N = 200
coordu = np.random.rand(N**2,2)
Xu=coordu[:,0]
Yu=coordu[:,1]
noise = 0.
noise = np.random.rand(Xu.shape[0])*0.1
Zu=np.tanh((Xu-0.25)/0.01+(Yu-0.25)/0.001)+np.tanh((Xu-0.5)/0.01+(Yu-0.5)/0.001)+np.tanh((Xu-0.75)/0.001+(Yu-0.75)/0.001)+noise
plt.figure();plt.scatter(Xu,Yu,1,Zu)
plt.title('Data signal F')
#plt.savefig('signalF_noisy.png')
### get the gradient
# using griddata and np.gradient
Xs,Ys=np.meshgrid(np.linspace(0,1,N),np.linspace(0,1,N))
coords = np.array([Xs,Ys]).T
Zs = interpolate.griddata(coordu,Zu,coords)
nearest = interpolate.griddata(coordu,Zu,coords,method='nearest')
znan = np.isnan(Zs)
Zs[znan] = nearest[znan]
dZs = np.gradient(Zs,np.min(np.diff(Xs[0,:])))
dZus = interpolate.griddata(coords.reshape(N*N,2),dZs[0].reshape(N*N),coordu)
hist_dzus = np.histogram(dZus,100)
plt.figure();plt.scatter(Xu,Yu,1,dZus)
plt.colorbar()
plt.clim([0 ,10])
plt.title('dF/dx using griddata and np.gradient')
#plt.savefig('dxF_griddata_noisy.png')
# using interpolation method Clough
interp = interpolate.CloughTocher2DInterpolator(coordu,Zu)
dZuCT = interp.grad
hist_dzct = np.histogram(dZuCT[:,0,0],100)
plt.figure();plt.scatter(Xu,Yu,1,dZuCT[:,0,0])
plt.colorbar()
plt.clim([0 ,10])
plt.title('dF/dx using CloughTocher2DInterpolator')
#plt.savefig('dxF_CT2D_noisy.png')
# histograms
plt.figure()
plt.semilogy(hist_dzus[1][:-1],hist_dzus[0],'.-')
plt.semilogy(hist_dzct[1][:-1],hist_dzct[0],'.-')
plt.title('histogram of dF/dx')
plt.legend(('griddata','CloughTocher'))
#plt.savefig('dxF_hist_noisy.png')
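One possible reading of the "smooth the data first" idea, offered only as a sketch and not as an answer from this thread: smooth the gridded field with a Gaussian filter before differentiating. The signal is simplified to a single tanh step, and the grid size and sigma=2 (in grid cells) are assumptions that would need tuning for real data.

```python
import numpy as np
from scipy import interpolate, ndimage

rng = np.random.default_rng(0)

# Scattered, noisy samples of a simplified step-like signal (assumed test data)
N = 100
coordu = rng.random((N**2, 2))
Xu, Yu = coordu[:, 0], coordu[:, 1]
Zu = np.tanh((Xu - 0.5) / 0.05) + rng.random(Xu.shape[0]) * 0.1

# Project onto a regular grid; 'nearest' avoids NaNs outside the convex hull
xs = np.linspace(0, 1, N)
Xs, Ys = np.meshgrid(xs, xs)  # Xs varies along axis 1, Ys along axis 0
Zs = interpolate.griddata(coordu, Zu, (Xs, Ys), method='nearest')

# Smooth before differentiating; sigma is in grid cells (assumed value)
Zs_smooth = ndimage.gaussian_filter(Zs, sigma=2)

# np.gradient returns one array per axis: axis 0 is y, axis 1 is x here
dZdy, dZdx = np.gradient(Zs_smooth, xs[1] - xs[0])
```

The smoothing trades sharpness of the steps for noise suppression, so sigma should stay small compared with the width of the features of interest.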
I have to plot a large amount of data in Python (a list of about 3 million elements). Are there any methods or libraries that can plot it easily? matplotlib does not seem to work.
What do you mean matplotlib does not work? It worked when I tried it. Is your data 1-dimensional or multi-dimensional? Are you expecting to see 3 million ticks on the x axis? That would not be possible.
import numpy as np
import matplotlib.pyplot as plt
d = 3*10**6
a = np.random.rand(d)
a[0] = 5
a[-1] = -5
print(a.shape)
plt.plot(a)
I use quite intensively matplotlib in order to plot arrays of size n > 10**6.
You can use plt.xscale('log'), which puts the x axis on a logarithmic scale so that the whole range is displayed.
Furthermore, if your dataset shows great disparity in value, you can use plt.yscale('log') in order to plot them nicely if you use the plt.plot() function.
If not (i.e. you use imshow, hist2d and so on), you can put this in your preamble:
from matplotlib.colors import LogNorm, and then pass the optional argument norm=LogNorm().
One last thing: you shouldn't use numpy.loadtxt if the text file is larger than your available RAM. In that case, the best option is to read the file line by line, even if it takes more time. You can speed up the process with from numba import jit and by decorating your function with @jit(nopython=True, parallel=True).
With that in mind, you should be able to plot arrays of about ten million elements in a reasonably short time.
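As a small sketch of the LogNorm suggestion, assuming two synthetic arrays of 3 million normally distributed values (the data and the bin count are made up for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)
d = 3 * 10**6
x = rng.normal(size=d)  # synthetic data, assumed for this sketch
y = rng.normal(size=d)

fig, ax = plt.subplots()
# Bin the points into a 200 x 200 histogram and color the counts on a log scale
counts, xedges, yedges, image = ax.hist2d(x, y, bins=200, norm=LogNorm())
fig.colorbar(image, ax=ax)
```

Binning first means matplotlib draws 200 x 200 cells rather than 3 million markers; call plt.show() to display the result.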
I'm trying to use imshow to plot a 2D Fourier transform of my data. However, imshow plots the data against its index in the array. I would like to plot the data against a set of arrays I have containing the corresponding frequency values (one array for each dimension), but I can't figure out how.
I have a 2D array of data (a Gaussian pulse signal) that I Fourier transform with np.fft.fft2. This all works fine. I then get the corresponding frequency bins for each dimension with np.fft.fftfreq(len(data))*sampling_rate. I can't figure out how to use imshow to plot the data against these frequencies, though. The 1D equivalent of what I'm trying to do is using plt.plot(x,y) rather than just plt.plot(y).
My first attempt was to use imshow's "extent" argument, but as far as I can tell that just changes the axis limits, not the actual bins.
My next solution was to use np.fft.fftshift to arrange the data in numerical order and then simply re-scale the axis using this answer: Change the axis scale of imshow. However, the index to frequency bin is not a pure scaling factor, there's typically a constant offset as well.
My last attempt was to use a 2D histogram instead of imshow, but that doesn't work, since a 2D histogram plots the number of times an ordered pair occurs, while I want to plot a scalar value corresponding to specific ordered pairs (i.e. the power of the signal at specific frequency combinations).
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
f = 200
st = 2500
x = np.linspace(-1,1,2*st)
y = signal.gausspulse(x, fc=f, bw=0.05)
data = np.outer(np.ones(len(y)),y) # A simple example with constant y
Fdata = np.abs(np.fft.fft2(data))**2
freqx = np.fft.fftfreq(len(x))*st # What I want to plot my data against
freqy = np.fft.fftfreq(len(y))*st
plt.imshow(Fdata)
I should see a peak at (200,0) corresponding to the frequency of my signal (with some fall-off around it corresponding to the bandwidth), but instead my maximum occurs at some seemingly random position corresponding to the frequency's index in my data array. If anyone has any ideas, fixes, or other functions to use, I would greatly appreciate it!
I cannot run your code, but I think you are looking for the extent= argument to imshow(). See the page on origin and extent for more information.
Something like this may work?
plt.imshow(Fdata, extent=(freqx[0],freqx[-1],freqy[0],freqy[-1]))
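One thing to check with this: np.fft.fftfreq returns frequencies in FFT order (zero first, then positive, then negative), so the first and last entries are not the lowest and highest frequencies. A sketch that applies np.fft.fftshift to both the data and the frequencies before passing extent, using a made-up 50 Hz cosine sampled at 256 Hz instead of the pulse from the question:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 256
sampling_rate = 256.0                  # hypothetical sampling rate in Hz
t = np.arange(n) / sampling_rate
row = np.cos(2 * np.pi * 50 * t)       # 50 Hz signal along x (assumed test data)
data = np.outer(np.ones(n), row)       # constant along y, as in the question

power = np.abs(np.fft.fft2(data)) ** 2
freq = np.fft.fftfreq(n, d=1 / sampling_rate)

# Reorder both the spectrum and the frequencies so they increase monotonically
power_shifted = np.fft.fftshift(power)
freq_shifted = np.fft.fftshift(freq)

# extent gives the outer edges of the image, so pad by half a bin on each side
df = freq_shifted[1] - freq_shifted[0]
extent = (freq_shifted[0] - df / 2, freq_shifted[-1] + df / 2,
          freq_shifted[0] - df / 2, freq_shifted[-1] + df / 2)

fig, ax = plt.subplots()
ax.imshow(power_shifted, extent=extent, origin='lower', aspect='auto')
ax.set_xlabel('frequency along x (Hz)')
ax.set_ylabel('frequency along y (Hz)')
```

With origin='lower' the first row sits at the bottom, so the y axis increases upward along with the shifted frequencies. The peaks should then appear at x = +50 and x = -50 on the y = 0 line.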
I am currently using MPL's imshow() function to display the depth image of an IFM 3D camera. I am able to display a single scene from the camera with no issues. However, I am finding that the displayed image does not differ from one scene to the next (i.e. when changing the scene that the camera is looking at), even though the actual data of the depth map is changing.
I have been looking into how to dynamically change images using MPL and I haven't found the right solution.
The depth map is found under the key distance in the result dictionary after calling the method readNextFrame(). My question is about the plotting code, though. In short, the code looks a little something like this:
import o3d3xx
import array
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
imageWidth = 176
imageHeight = 132
#create ImageClient Object
pcic = o3d3xx.ImageClient("Camera IP",50010)
#store distance array as variable 'distance'
result = pcic.readNextFrame()
distance = result["distance"]
#convert to np array and reshape
distance = np.asarray(distance)
distance = distance.reshape(imageHeight,imageWidth)
#plot distance array
plt.figure()
plt.title("Distance Image")
plt.imshow(distance)
plt.show()
After changing scene, I know that the actual distance array is changing because I have compared the data arrays from one scene to the next. The only way I can get around this issue is by creating a new ImageClient object but I would like to avoid that.
Any ideas as to how to get around this? Ultimately I would like to call readNextFrame() and use imshow() to display a new depth image once the scene has changed without creating a new ImageClient object.
Easy one:
figure, axis = plt.subplots(figsize=(7.6, 6.1))
im = axis.imshow(***SOME ARRAY***)
If you want to replace the plotted data, just call:
im.set_data(***SOME OTHER ARRAY***)
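A slightly fuller sketch of the same idea, with a hypothetical read_frame function standing in for the camera's readNextFrame() call and random data standing in for the distance map:

```python
import numpy as np
import matplotlib.pyplot as plt

image_height, image_width = 132, 176

def read_frame(rng):
    # Hypothetical stand-in for pcic.readNextFrame()["distance"],
    # already reshaped to (image_height, image_width)
    return rng.random((image_height, image_width))

rng = np.random.default_rng(0)
fig, axis = plt.subplots()

# Create the image once; fix the color limits so frames are comparable
im = axis.imshow(read_frame(rng), vmin=0.0, vmax=1.0)

for _ in range(5):
    frame = read_frame(rng)
    im.set_data(frame)        # swap the pixel data in the existing image
    fig.canvas.draw_idle()    # ask the canvas to redraw
```

Note that set_data does not rescale the color limits, which is why vmin and vmax are set explicitly here; im.autoscale() rescales them to the current data if that is preferred. In an interactive window, a short plt.pause(0.05) after each update gives the GUI time to redraw.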
I'm making a demonstration of different types of regression in numpy with ipython, and so far, I've been able to plot a simple linear regression without difficulty. Now, when I go on to make a quadratic fit to my data and go to plot it, I don't get a quadratic curve but instead get many lines. Here's the code I'm running that generates the problem:
import numpy
from numpy import random
from matplotlib import pyplot as plt
import math
# Generate random data
X = random.random((100,1))
epsilon=random.randn(100,1)
f = 3+5*X+epsilon
# least squares system
A =numpy.array([numpy.ones((100,1)),X,X**2])
A = numpy.squeeze(A)
A = A.T
quadfit = numpy.linalg.solve(numpy.dot(A.transpose(),A),numpy.dot(A.transpose(),f))
# plot the data and the fitted parabola
qdbeta0,qdbeta1,qdbeta2 = quadfit[0][0],quadfit[1][0],quadfit[2][0]
plt.scatter(X,f)
plt.plot(X,qdbeta0+qdbeta1*X+qdbeta2*X**2)
plt.show()
What I get is this picture (zoomed in to show the problem):
You can see that rather than having a single parabola that fits the data, I have a huge number of individual lines doing something that I'm not sure of. Any help would be greatly appreciated.
Your X is ordered randomly, so it's not a good set of x values to use to draw one continuous line, because it has to double back on itself. You could sort it, I guess, but TBH I'd just make a new array of x coordinates and use those:
plt.scatter(X,f)
x = numpy.linspace(0, 1, 1000)
plt.plot(x,qdbeta0+qdbeta1*x+qdbeta2*x**2)
gives me
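For completeness, the sorting alternative mentioned above could look roughly like this. It reuses the fit from the question; order and X_sorted are names introduced here for illustration.

```python
import numpy
from numpy import random
from matplotlib import pyplot as plt

# Same kind of random data as in the question
X = random.random((100, 1))
epsilon = random.randn(100, 1)
f = 3 + 5 * X + epsilon

# Least squares fit via the normal equations, as in the question
A = numpy.hstack([numpy.ones((100, 1)), X, X**2])
quadfit = numpy.linalg.solve(numpy.dot(A.T, A), numpy.dot(A.T, f))
qdbeta0, qdbeta1, qdbeta2 = quadfit.ravel()

# Sort the x values so the line is drawn left to right without doubling back
order = numpy.argsort(X[:, 0])
X_sorted = X[order]

plt.scatter(X, f)
plt.plot(X_sorted, qdbeta0 + qdbeta1 * X_sorted + qdbeta2 * X_sorted**2)
```

Sorting keeps the curve restricted to the sampled x values, while a fresh linspace grid gives evenly spaced points; either way the line is drawn in increasing x order.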