logarithmic rebinning of 2D array - python

I have a 1D array containing data like this (48000 points), spaced by one wavenumber (R = 1 cm⁻¹). The x and y arrays both have shape (48000, 1), and I want to rebin both in the same way:
xarr=[50000,49999,49998,....,2000]
yarr=[0.1,0.02,0.8,0.5....0.1]
I wish to decrease the spectral resolution to, let's say, R = 10 cm⁻¹, so I want ten times fewer points (4800) over the same range from 50000 to 2000, and the same for the y array.
How to start?
I tried taking the natural log of the wavelength scale, then rebinning onto a new log-wavelength scale generated with np.linspace():
xi=np.log(xarr[0])
xf=np.log(xarr[-1])
xnew=np.linspace(xi, xf, num=4800)
Now I need to recast the y array onto this xnew grid. I am thinking of using a rebin function (a 2D rebin), but I'm not sure how to use it. Any suggestions?
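Because the x values are evenly spaced at 1 cm⁻¹ and the target R = 10 cm⁻¹ is an integer multiple of that, a linear rebin does not need a log scale at all: averaging each run of 10 consecutive samples gives 4800 points. A minimal sketch with NumPy reshape; the xarr/yarr values below are synthetic stand-ins (50000 down to 2001, random y), not the real spectrum, and averaging rather than summing within each bin is an assumption.

```python
import numpy as np

# Synthetic stand-in for the question's data: 48000 wavenumbers spaced by 1 cm^-1,
# from 50000 down to 2001, with random y values (not the real spectrum).
xarr = np.arange(50000, 2000, -1, dtype=float).reshape(-1, 1)  # shape (48000, 1) as in the question
yarr = np.random.default_rng(0).random((48000, 1))

factor = 10  # 1 cm^-1 -> 10 cm^-1
x = xarr.ravel()
y = yarr.ravel()
n = x.size // factor * factor  # drop trailing points that would not fill a whole bin

# Put each run of `factor` consecutive samples in one row, then average across the row.
x_binned = x[:n].reshape(-1, factor).mean(axis=1)
y_binned = y[:n].reshape(-1, factor).mean(axis=1)

print(x_binned.shape, y_binned.shape)  # (4800,) (4800,)
```

If the number of points is not an exact multiple of the factor, this sketch simply drops the leftover points at the end; handle them separately if they matter.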

import numpy as np
arr1=[2,3,65,3,5...,32,2]
series=np.array(arr1)
print(series[:3])

I tried this and it seems to work!
import numpy as np
import scipy.stats as stats
#irregular x and y arrays
yirr= np.random.randint(1,101,10)
xirr=np.arange(10)
nbins=5
bin_means, bin_edges, binnumber = stats.binned_statistic(xirr,yirr, 'mean', bins=nbins)
yreg=bin_means # <== regularized yarr
xi=xirr[0]
xf=xirr[-1]
xreg=np.linspace(xi, xf, num=nbins)
print('yreg',yreg)
print('xreg',xreg) # <== regularized xarr
If anyone can find an improvement or see a problem with this, please post!
I'll try it on my logarithmically scaled data now.
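One possible improvement to the snippet above: np.linspace(xi, xf, num=nbins) places the new x values evenly between the first and last sample, which are not the centres of the bins binned_statistic actually used. Taking the centres from bin_edges keeps x and y aligned. A sketch applying this to log-spaced bins, assuming synthetic stand-in data (50000 down to 2001, random y) and a mean within each bin. Note that log bins have a constant width in log(x), not a constant 10 cm⁻¹ width; for constant width, bin on xarr directly.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in for the question's data (not the real spectrum).
xarr = np.arange(50000, 2000, -1, dtype=float)
yarr = np.random.default_rng(0).random(xarr.size)

nbins = 4800
# Bin on log(x): every bin then spans the same ratio of wavenumbers.
logx = np.log(xarr)
bin_means, bin_edges, binnumber = stats.binned_statistic(logx, yarr, statistic='mean', bins=nbins)

# Place each new x value at the centre of its bin (in log space), then undo the log.
log_centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
xreg = np.exp(log_centres)
yreg = bin_means  # NaN for any bin that received no points
```

With real data, check for NaN in yreg: near the low-wavenumber end, a log bin can be narrower than the original 1 cm⁻¹ spacing if nbins is large enough, which leaves some bins empty.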

Related

matplotlib doesn't display the correct data

I am new to Python. For some reason the plot shows all the data as if Y = 0, except for the last point, which is weird because when I print Y it displays the right values. What am I doing wrong?
import math
import numpy as np
import matplotlib.pyplot as plt
y0=2 # [m]
g=9.81 # [m/s^2]
v=20 # initial speed [m/s]
y_target=1 # [m]
x=35 # [m]
n_iter=50
theta=np.linspace(0,0.5*math.pi,n_iter) # theta input [rad]
Y=np.zeros(n_iter) # y output [m]
for i in range(n_iter):
    Y[i]=math.tan(theta[i])*x-g/(2*(v*math.cos(theta[i]))**2)*x**2+y0
plt.plot(theta,Y)
plt.ylabel('y [m]')
plt.xlabel('theta [rad]')
plt.ylim(top=max(Y),bottom=min(Y))
plt.show()
The problem is that the function blows up a bit as theta approaches π/2. Notice the little 1e33 at the top of the y-axis in the plot: the scale of that axis is huge, because the last value of y is essentially minus infinity (because of dividing by almost zero). If you change the limits of the y-axis, e.g. to (-1000, +1000), the plot looks correct.
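You can confirm this numerically without looking at the plot. A small sketch (not part of the original answer) that evaluates the same formula and prints the last value; the second theta grid uses endpoint=False, which is just one way to stop short of θ = π/2.

```python
import numpy as np

y0, g, v, x = 2, 9.81, 20, 35  # same parameters as the question

def height(theta):
    # Same formula as the question, evaluated on a whole array at once.
    return np.tan(theta) * x - g / (2 * (v * np.cos(theta))**2) * x**2 + y0

theta = np.linspace(0, 0.5 * np.pi, 50)
y = height(theta)
print(y[-1])  # roughly -4e33: np.cos(np.pi / 2) is about 6e-17 in floating point, not 0

# Stopping short of pi/2 keeps every value finite and on a sensible scale.
theta_safe = np.linspace(0, 0.5 * np.pi, 50, endpoint=False)
y_safe = height(theta_safe)
print(y_safe.min())
```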
But I can't resist helping you with something you didn't ask about... You are not using NumPy the way it is meant to be used. NumPy gives you two things: n-dimensional arrays as a data structure, and fast, optimized code for 'vectorized' computation on those arrays. In practice you rarely need an explicit Python loop with NumPy; you compute on the whole array at once. Try 10 * np.array([1, 2, 3]) and you will get the idea.
So I would write your code like this:
import numpy as np
import matplotlib.pyplot as plt
# Problem parameters.
y0 = 2 # [m]
g = 9.81 # [m/s^2]
v = 20 # initial speed [m/s]
x = 35 # [m]
# Make theta [rad].
steps = 50
theta = np.linspace(0, 0.5*np.pi, steps)
# Compute y.
y = np.tan(theta) * x - g / (2 * (v * np.cos(theta))**2) * x**2 + y0
# Plot.
plt.plot(theta, y)
plt.ylabel('y [m]')
plt.xlabel('theta [rad]')
plt.ylim(-1000, 1000)
plt.show()
Notice that there's no loop — you just use the vector theta as if it were a scalar. And the math library (which can't handle NumPy's arrays, only scalars) is not needed at all when you're using NumPy.
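To convince yourself that the vectorized line computes the same values as the original loop, you can run both and compare them. A quick sketch reusing the question's parameters:

```python
import math
import numpy as np

y0, g, v, x = 2, 9.81, 20, 35
n_iter = 50
theta = np.linspace(0, 0.5 * math.pi, n_iter)

# Original approach: one scalar math-module evaluation per element.
Y_loop = np.zeros(n_iter)
for i in range(n_iter):
    Y_loop[i] = math.tan(theta[i]) * x - g / (2 * (v * math.cos(theta[i]))**2) * x**2 + y0

# Vectorized approach: the same expression applied to the whole array at once.
Y_vec = np.tan(theta) * x - g / (2 * (v * np.cos(theta))**2) * x**2 + y0

print(np.allclose(Y_loop, Y_vec))  # True
```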

How to find what points lie in each bin of a histogram?

I have a 2D histogram with bin size 10. Is there a NumPy function (or another fast method) to find which points lie in each bin of the 2D grid? Is there a way to access the elements of each bin?
I hope this solves your problem, though others can probably improve my code since I am new to Python.
Create a histogram with matplotlib:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.RandomState(10) # deterministic random data
a = np.hstack((rng.normal(size=100), rng.normal(loc=5, scale=2, size=1000)))
n ,bins ,patches = plt.hist(a, bins=10) # arguments are passed to np.histogram
plt.title("Histogram with '10' bins")
plt.show()
Reshape the arrays and find the bin index of each point:
newbin = np.repeat(np.reshape(bins,(-1, len(bins))), a.shape, axis=0)
newa = np.repeat(np.reshape(a,(len(a),-1)),len(bins),axis=1)
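# For each point, the index of the first bin edge greater than it, i.e. the right edge
# of its bin (a 1-based bin number). The maximum value has no greater edge, so it gets 0.
index_bin = (newbin>newa).argmax(axis=1)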
Test:
print(a[0] , bins)
print(index_bin[0])
Output:
1.331586504129518 [-2.13171211 -0.88255884 0.36659444 1.61574771 2.86490098 4.11405425
5.36320753 6.6123608 7.86151407 9.11066734 10.35982062]
3
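The question asks about a 2D histogram. A different approach from the answer above is np.digitize, which maps each value to its bin along one axis; applying it per axis gives the 2D bin of every point. The data here is random and hypothetical, and points_in_bin is a helper made up for this sketch. The clip matters: np.histogram2d puts the maximum value (which sits exactly on the last edge) into the last bin, while np.digitize would report it as one past the end.

```python
import numpy as np

# Hypothetical 2D data standing in for the question's points.
rng = np.random.default_rng(10)
x = rng.normal(size=1000)
y = rng.normal(loc=5, scale=2, size=1000)

nbins = 10
counts, xedges, yedges = np.histogram2d(x, y, bins=nbins)

# np.digitize(v, edges) returns i with edges[i-1] <= v < edges[i]; subtract 1 for a
# 0-based bin index, and clip so the maximum value lands in the last bin.
ix = np.clip(np.digitize(x, xedges) - 1, 0, nbins - 1)
iy = np.clip(np.digitize(y, yedges) - 1, 0, nbins - 1)

def points_in_bin(i, j):
    """Indices of the points that fall in 2D bin (i, j) (hypothetical helper)."""
    return np.flatnonzero((ix == i) & (iy == j))

idx = points_in_bin(4, 5)
print(len(idx), counts[4, 5])  # the two numbers agree
print(x[idx], y[idx])          # the points themselves
```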

Fastest way to convert a set of 3D points into image of heights in python

I am trying to convert a set of 3D points into a heightmap (a 2D image that shows the largest displacement of the points from the floor).
The only way I can come up with is a for loop that iterates through all the points and updates the heightmap, but this is quite slow:
import numpy as np
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
heightmap = np.zeros((int(np.max(points[:,1])/heightmap_resolution) + 1,
int(np.max(points[:,0])/heightmap_resolution) + 1))
for point in points:
    y = int(point[1]/heightmap_resolution)
    x = int(point[0]/heightmap_resolution)
    if point[2] > heightmap[y][x]:
        heightmap[y][x] = point[2]
I wonder if there is a better way of doing this. Any improvement is greatly appreciated!
The intuition:
If you find yourself using a for loop with NumPy, check whether NumPy (or pandas) already has an operation for it. You want to compare values to get a maximum, and I wasn't sure whether the array structure was important, so I changed it.
Second, the heightmap array pre-allocates memory for cells you may never use. Try a dictionary keyed by (x, y) tuples, or a DataFrame like this:
import numpy as np
import pandas as pd
heightmap_resolution = 0.02
# generate some random 3D points
points = np.array([[x,y,z] for x in np.random.uniform(0,2,100) for y in np.random.uniform(0,2,100) for z in np.random.uniform(0,2,100)])
points_df = pd.DataFrame(points, columns = ['x','y','z'])
# I didn't know whether you wanted to keep the original x and y columns, so I made new ones.
points_df['x_normalized'] = (points_df['x']/heightmap_resolution).astype(int)
points_df['y_normalized'] = (points_df['y']/heightmap_resolution).astype(int)
# Maximum z per (x, y) grid cell, as a Series indexed by (x_normalized, y_normalized).
heights = points_df.groupby(['x_normalized','y_normalized'])['z'].max()
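If you would rather stay in NumPy and keep the dense 2D array, the loop can be replaced by np.maximum.at, an unbuffered ufunc method that applies the maximum correctly even when many points map to the same cell. A sketch with hypothetical random points, generated differently from the question:

```python
import numpy as np

heightmap_resolution = 0.02
# Hypothetical random points in [0, 2)^3, generated differently from the question.
rng = np.random.default_rng(0)
points = rng.uniform(0, 2, size=(50_000, 3))

# Grid cell of every point in one vectorized step; astype(int) truncates like int()
# does in the question's loop (the coordinates are non-negative).
cols = (points[:, 0] / heightmap_resolution).astype(int)
rows = (points[:, 1] / heightmap_resolution).astype(int)

heightmap = np.zeros((rows.max() + 1, cols.max() + 1))
# Unbuffered in-place max: for each point, heightmap[r, c] = max(heightmap[r, c], z).
# Plain fancy assignment (heightmap[rows, cols] = z) would not take the maximum:
# with repeated (r, c) pairs only one of the z values survives.
np.maximum.at(heightmap, (rows, cols), points[:, 2])
```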

Maximum intensity projection from image stack

I'm trying to recreate the MATLAB function
max(array, [], 3)
which takes my stack of N 300x300 px images, a 300x300xN array (I say "image" because I'm processing images; really this is just a big array of doubles), and creates a 300x300 array. What I think this function does, if it were implemented inefficiently, is go through each (x,y) point, take the maximum value at that point across the z-axis, and then normalize with the maximum and minimum values of the entire array.
I've tried recreating this in Python with:
# Shape of dataset: (300, 300, 181)
# Type of dataset: <type 'numpy.ndarray'>
for x in range(numpy.size(self.dataset, 0)):
    for y in range(numpy.size(self.dataset, 1)):
        print("Point is", x, y)
        # more would go here to find the maximum (x,y) value over the Z axis in self.dataset
This is a very simple X,Y iterator, but not only does my IDE crash a few milliseconds after I run this code, it also feels gross and inefficient.
Is there something I'm missing? I'm new to Python, and therefore the answer here isn't clear to me. Is there an existing function that does this operation?
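import numpy as np
import matplotlib.pyplot as plt
from skimage import io
path = "test.tif"
# A multi-page TIFF is typically read as (N, rows, cols), so the stack axis is 0 here;
# for an array shaped (300, 300, N) like the one in the question, use axis=2.
IM = io.imread(path)
IM_MAX = np.max(IM, axis=0)  # maximum intensity projection along the stack axis
plt.imshow(IM_MAX)
plt.show()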
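For the array in the question, shaped (300, 300, 181), MATLAB's max(array, [], 3) corresponds to np.max(..., axis=2), because MATLAB counts dimensions from 1 and NumPy counts axes from 0. The maximum itself does not normalize anything; scaling the result to [0, 1] is a separate, optional step. A sketch with a small random stand-in stack (the real data is not available here):

```python
import numpy as np

# Random stand-in for self.dataset; the real stack is (300, 300, 181).
rng = np.random.default_rng(0)
dataset = rng.random((300, 300, 18))

# Maximum intensity projection: for every (x, y), the largest value along the third axis.
mip = np.max(dataset, axis=2)  # equivalently dataset.max(axis=2)
print(mip.shape)  # (300, 300)

# Optional min-max normalization of the projection to the range [0, 1].
mip_norm = (mip - mip.min()) / (mip.max() - mip.min())
```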

Plotting masked array that has been gridded using griddata

I have a 2D array of satellite data, and two corresponding 2D arrays giving the latitude and longitude of each pixel.
The data array is a masked array.
When I plot it up using pcolormesh, it looks like this:
m.pcolormesh(lon, lat, data)
I am attempting to grid this data on to a 0.25x0.25 deg grid.
lonGrid = arange(0, 360, 0.25)
latGrid = arange(-90, 90, 0.25)
dataGridded = griddata(lon.ravel(),lat.ravel(),data.ravel(),latGrid,lonGrid, interp='linear')
m.pcolormesh(lonGrid, latGrid, dataGridded)
However, the resulting plot comes out as this:
It seems like this error has something to do with pcolormesh filling in the space between masked values. But I am unsure how to fix this.
Thanks
EDIT:
I was able to get this working with the scipy version of griddata, but it's much slower and the syntax is clunkier. I would still appreciate help getting the matplotlib version above to work:
from numpy import meshgrid, isnan, ma
from scipy.interpolate import griddata as griddata2
lonGrid,latGrid = meshgrid(lonGrid,latGrid)
dataGrid = griddata2((lon.ravel(),lat.ravel()),data.ravel(),(lonGrid,latGrid), method = 'linear')
dataGrid = ma.masked_where((dataGrid < 0) | isnan(dataGrid), dataGrid)
m.pcolormesh(lonGrid, latGrid, dataGrid)
Here are a couple initial troubleshooting ideas.
What version of NumPy are you using? In 1.09 or earlier, .ravel() will not return a masked array when given a masked array.
The data array "wind" became "data". Is "data" truly masked? What happened between the two? Some more code would be useful.
dataGridded = griddata(lon.ravel(),lat.ravel(),XXXX.ravel(),latGrid,lonGrid, interp='linear')
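One way to keep masked pixels from being filled in, sketched with scipy.interpolate.griddata (matplotlib.mlab.griddata is no longer available in current Matplotlib): interpolate only from the valid pixels, then carry the mask over to the new grid with a nearest-neighbour lookup. The swath, the lon > 7 mask, and the grid spacing below are made-up stand-ins for the real satellite data.

```python
import numpy as np
import numpy.ma as ma
from scipy.interpolate import griddata

# Hypothetical satellite swath: scattered lon/lat pixels and a masked data field.
rng = np.random.default_rng(0)
lon = rng.uniform(0, 10, size=(50, 60))
lat = rng.uniform(-5, 5, size=(50, 60))
data = ma.masked_where(lon > 7, np.sin(lon) + np.cos(lat))  # mask part of the swath

lonGrid, latGrid = np.meshgrid(np.arange(0, 10, 0.25), np.arange(-5, 5, 0.25))

mask = ma.getmaskarray(data).ravel()
points = np.column_stack((lon.ravel(), lat.ravel()))

# 1) Interpolate only from valid pixels, so values under the mask never enter the result.
dataGrid = griddata(points[~mask], data.compressed(), (lonGrid, latGrid), method='linear')

# 2) Carry the mask across: a grid cell is masked if its nearest source pixel was masked.
maskGrid = griddata(points, mask.astype(float), (lonGrid, latGrid), method='nearest') > 0.5

# Also mask cells outside the convex hull of the valid pixels (NaN from linear interpolation).
dataGrid = ma.masked_where(maskGrid | np.isnan(dataGrid), dataGrid)
```

The resulting masked array can then be passed to pcolormesh as in the question.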
