Correctly Interpolate 4D Data (on Grid) using Python

GOAL
I have values v given at specific 3D coordinates (x, y, z). The data is stored as a pandas DataFrame:
           x      y       z         v
0      -68.5  68.50  -10.00  0.297845
1      -68.5 -23.29   61.10  0.148683
2      -68.5 -23.29   63.47  0.142325
3      -68.5 -23.29   65.84  0.135908
4      -68.5 -23.29   68.21  0.129365
...      ...    ...     ...       ...
91804   68.5  23.29  151.16  0.118460
91805   68.5  23.29  153.53  0.119462
91806   68.5  23.29  155.90  0.120386
91807   68.5  23.29  139.31  0.112257
91808   68.5 -68.50  227.00  0.127948
I would like to find the values at new coordinates that are not part of the dataframe, hence I am looking into how to efficiently interpolate the data.
What I have done:
Since the coordinates are on a grid, I can use interpn:
import numpy as np
from scipy.interpolate import interpn
# Extract the coordinate arrays (I know that they are on a grid)
xs = df["x"].to_numpy()
ys = df["y"].to_numpy()
zs = df["z"].to_numpy()
# Extract the associated values
vs = df["v"].to_numpy()
Reshape the data to fit the scipy function:
points = (np.unique(xs), np.unique(ys), np.unique(zs))
values = vs.reshape(len(np.unique(xs)), len(np.unique(ys)), len(np.unique(zs)))
To test the interpolation, I check whether I get the same values back when I request the original points:
request = (xs,ys,zs)
output = interpn(points, values, request)
...BUT the interpolated values, when visualized (see below), clearly do not match the original data.
What am I doing wrong?
Other:
Dataset
Please find the complete dataset here: https://filebin.net/u10lrw956enqhg5i
Visualization
from mayavi import mlab
# Create figure
fig = mlab.figure(1, fgcolor=(0, 0, 0), bgcolor=(0, 0, 0))
mlab.points3d(xs,ys,zs,output)
mlab.view(azimuth=270, elevation=90, roll=180, figure=fig)
# View plot
mlab.show()

I strongly suspect that your data, while on a grid, is not ordered so as to allow a simple reshape of the values. You have two solutions available, both involving reordering the data in different ways.
Solution 1
Since you're already using np.unique to extract the grid, you can get the correct ordering of vs using the return_inverse parameter:
px, ix = np.unique(xs, return_inverse=True)
py, iy = np.unique(ys, return_inverse=True)
pz, iz = np.unique(zs, return_inverse=True)
points = (px, py, pz)
values = np.empty_like(vs, shape=(px.size, py.size, pz.size))
values[ix, iy, iz] = vs
return_inverse is sort of magical, largely because it's so counterintuitive. In this case, for each element of vs, it tells you which unique, sorted grid location that element corresponds to.
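A tiny demonstration with made-up values:
u, inv = np.unique([30.0, 10.0, 30.0, 20.0], return_inverse=True)
# u   -> array([10., 20., 30.])  the sorted unique values
# inv -> array([2, 0, 2, 1])     index into u for each original element,
#                                so u[inv] reconstructs the original array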
By the way, if you are missing grid elements, you may want to replace np.empty_like(vs, shape=(px.size, py.size, pz.size)) with either np.zeros_like(vs, shape=(px.size, py.size, pz.size)) or np.full_like(vs, np.nan, shape=(px.size, py.size, pz.size)). In the latter case, you could interpolate the nans in the grid first.
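A sketch of that nan-interpolation step, assuming scipy's griddata (nearest-neighbour fill shown for simplicity):
from scipy.interpolate import griddata

gx, gy, gz = np.meshgrid(px, py, pz, indexing='ij')
valid = ~np.isnan(values)
values = griddata((gx[valid], gy[valid], gz[valid]), values[valid],
                  (gx, gy, gz), method='nearest')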
Solution 2
The more obvious solution would be to rearrange the indices so you can reshape vs as you tried to do. That only works if you're sure that there are no missing grid elements. The easiest way would be to sort the whole dataframe, since the pandas methods are less annoying than np.lexsort (IMO):
df.sort_values(['x', 'y', 'z'], inplace=True, ignore_index=True)
When you extract, do it efficiently:
xs, ys, zs, vs = df.to_numpy().T
Since everything is sorted, you don't need np.unique to identify the grid any more. The number of unique x values is:
nx = np.count_nonzero(np.diff(xs)) + 1
And the unique values are:
bx = xs.size // nx
ux = xs[::bx]
y values go through a full cycle every bx elements, so
ny = np.count_nonzero(np.diff(ys[:bx])) + 1
by = bx // ny
uy = ys[:bx:by]
And for z (bz == 1):
nz = by
uz = zs[:nz]
Now you can construct your original arrays:
points = (ux, uy, uz)
values = vs.reshape(nx, ny, nz)
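With either solution in place, the round-trip test from the question should now return the original values; a minimal check, assuming the grid is complete:
output = interpn(points, values, (xs, ys, zs))
print(np.allclose(output, vs))  # True if the reordering was correct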

How to split a 3D array of positions into subvolumes

Not sure if this question has been asked before; I looked through similar examples and they weren't exactly what I need to do.
I have an array of positions (shape = (8855470, 3)) in a cube with physical coordinates between 0 and 787.5. These positions represent point masses in some space. Here's a look at the first three entries of this array:
array([[224.90635586, 720.494766  ,  19.40263367],
       [491.25279546,  41.26026654,   7.35436416],
       [407.70436788, 340.32618713, 328.88192913]])
I want to split this giant cube into a number of smaller cubes. For example, if I wanted to split it on each side into 10 cubes, making 1,000 subcubes total, then each subcube would contain only the points that have positions within that subcube. I have been experimenting with np.meshgrid to create the 3D grid necessary to conditionally apportion the appropriate entries of the positions array to subcubes:
split = np.arange(0.,(787.5+787.5/10.),step=787.5/10.)
xg,yg,zg = np.meshgrid(split,split,split,indexing='ij')
But I'm not sure if this is the way to go about this.
Let me know if this question is too vague or if you need any additional information.
For the sake of the problem I will work with toy data. I think you're close with the meshgrid. Here's a proposal:
Create the grid, with points up to 787.5 (not included), with values as you did in arange.
Reshape the grids to 1D arrays, then zip over them to build the masks, one per subcube.
Create a list to save all subcube points.
import numpy as np

data = np.random.randint(0, 787, (10000, 3))
start = 0
end = 787.5
step = (end - start) / 10
split = np.arange(start, end, step)
xg, yg, zg = np.meshgrid(split, split, split, indexing='ij')
xg = xg.reshape(-1)
yg = yg.reshape(-1)
zg = zg.reshape(-1)
subcube_data = []
for x, y, z in zip(xg, yg, zg):
    mask_x = (x <= data[:, 0]) & (data[:, 0] < x + step)  # data_x inside this subcube's x range
    mask_y = (y <= data[:, 1]) & (data[:, 1] < y + step)  # data_y inside this subcube's y range
    mask_z = (z <= data[:, 2]) & (data[:, 2] < z + step)  # data_z inside this subcube's z range
    mask = mask_x & mask_y & mask_z
    subcube_data.append(data[mask])
Now you will have a list with 1000 elements, where each element is a subcube containing an Nx3 array of points. If you want to recover the origin (lower corner) corresponding to sub_cube[i], you can just do [xg[i], yg[i], zg[i]].
Lastly, you can plot some of the subcubes together with the rest of the data:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# plot the data as a 3d scatter
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# plot subcubes 0 1 2 3 4 in colors
for i in range(5):
    ax.scatter(subcube_data[i][:, 0],
               subcube_data[i][:, 1],
               subcube_data[i][:, 2], marker='o', s=2)
# plot the rest of the data in black
for i in range(5, len(subcube_data)):
    ax.scatter(subcube_data[i][:, 0],
               subcube_data[i][:, 1],
               subcube_data[i][:, 2], marker='o', s=1, color='black')
plt.show()
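As an aside, a vectorized sketch (my own variant, not part of the original answer) that assigns every point a subcube index via floor division instead of looping over 1000 masks:
idx = (data // step).astype(int)  # per-axis subcube index (0..9) for every point
# e.g. select all points falling in subcube (3, 5, 7):
one_cube = data[(idx == np.array([3, 5, 7])).all(axis=1)]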

How to plot in 3D with a double entry table - Matplotlib

I would like to plot in 3D with Pandas / Matplotlib (wireframe or other, I do not care), but in a specific way.
I'm using RFID sensors and I'm trying to record the signal I receive at different distances and different angles. I want to see the correlation between the increase of the distance and the angle.
So that's why I want to plot in 3D:
X axis -> the distance, Y axis -> the angle, Z axis -> the received signal, which is a float
My CSV file, from which I generate my DataFrame, is organized like a double entry table:
Distance;0;23;45;90;120;180
0;-53.145;-53.08;-53.1;-53.035;-53.035;-53.035
5;-53.145;-53.145;-53.05;-53.145;-53.145;-53.145
15;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
25;-53.145;-52.145;-53.145;-53.002;-53.145;-53.145
40;-53.145;-53.002;-51.145;-53.145;-54.255;-53.145
60;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
80;-53.145;-53.145;-53.145;-53.145;-60;-53.145
100;-53.145;-52;-53.145;-54;-53.145;-53.145
120;-53.145;-53.145;-53.145;-53.145;-53.002;-53.145
140;-51.754;-53.145;-51.845;-53.145;-53.145;-53.145
160;-53.145;-53.145;-49;-53.145;-53.145;-53.145
180;-53.145;-53.145;-53.145;-53.145;-53.145;-53.002
200;-53.145;-53.145;-53.145;-53.145;-53.145;-53.145
On the first row we have the labels: the different angles 0°, 23°, 45°, ...
And the index of the DataFrame is the distance: 0 cm, 5 cm, 15 cm, ...
The matrix inside represents the signal, i.e. the values for the Z axis.
But I do not know how to generate a 3D scatter, wireframe, etc., because in every tutorial I see, people use specific columns as axes.
Indeed, in my CSV file the first row holds the labels of all columns:
Distance;0;23;45;90;120;180
And I do not know how to generate a 3D plot from such a double entry table.
Do you know how to do it? Or how to generate my CSV file in a better way to get the same result in the end?
I would be grateful if you would help me with this!
Thank you!
Maybe a contour plot is enough:
import numpy as np
import matplotlib.pyplot as plt

b = np.array([0, 5, 15, 25, 40, 60, 80, 100, 120, 140, 160, 180, 200])  # distances
a = np.array([0, 23, 45, 90, 120, 180])                                 # angles
x, y = np.meshgrid(a, b)
z = np.random.randint(-50, -40, x.shape)  # stand-in for the real signal values
scm = plt.contourf(x, y, z, cmap='inferno')
plt.colorbar(scm)
plt.xticks(a)
plt.yticks(b)
plt.xlabel('Angle')     # x comes from a (the angles)
plt.ylabel('Distance')  # y comes from b (the distances)
plt.show()
This displays a filled contour plot of the data.
You can get a contour plot with something like this (but for the data shown it is not very interesting, since almost all the values are constant around -53):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv', sep=';')  # path to the CSV shown above
df = df.set_index('Distance')
x = df.index
y = df.columns.astype(int)
z = df.values
X, Y = np.meshgrid(x, y)
Z = z.T
plt.contourf(X, Y, Z, cmap='jet')
plt.colorbar()
plt.show()
Welcome to Stack Overflow! Your question can be split into several steps:
Step 1 - read the data
I have stored your data in a file called data.txt.
I don't know Pandas very well, but this can also be handled with Numpy's nice, simple loadtxt function. Your data is a bit problematic because of the text 'Distance' in the first row and first column, but don't panic: we load the file as a matrix of strings:
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
Step 2 - transform the raw data
To extract the wanted data from the raw data we can do the following:
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
By indexing the raw data we select the parts that we want, and with astype we convert the string values to numbers.
Intermediate step - making the data a bit fancier
Your data was a bit boring, almost everywhere the same value, so I took the liberty to make it a bit fancier:
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
Step 4 - make a wireframe plot
The example at matplotlib.org looks easy enough:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z)
plt.show()
But the trick is to get the X, Y, Z parameters right...
Step 3 - make the X and Y data
The Z data is simply our data values:
Z = data
The X and Y should also be 2D arrays, such that plot_wireframe can find the x and y for each value of Z at the same locations in the 2D arrays X and Y. There is a Numpy function to create these 2D arrays:
X, Y = np.meshgrid(angle, distance)
Step 5 - fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
Putting it together
All steps together in the right order:
# necessary includes...
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
import numpy as np
raw_data = np.loadtxt('data.txt', delimiter=';', dtype=str)
angle = raw_data[0 , 1:].astype(float)
distance = raw_data[1:, 0 ].astype(float)
data = raw_data[1:, 1:].astype(float)
# make the example data a bit more interesting...
data = (50 + angle[np.newaxis,:]) / (10 + np.sqrt(distance[:,np.newaxis]))
# setting up the plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# the tricky part: creating the data that plot_wireframe wants
Z = data
X, Y = np.meshgrid(angle, distance)
ax.plot_wireframe(X, Y, Z)
# fancying it up a bit
ax.set_xticks(angle)
ax.set_yticks(distance[::2])
ax.set_xlabel('angle')
ax.set_ylabel('distance')
# and showing the plot ...
plt.show()

Method for finding max. point on linear plot

This is in astronomy, but I think my question is probably very elementary - I'm not very experienced, I apologise.
I am plotting the relationship between the colour of a star-forming galaxy (y axis) and the redshift (x axis). The plot is a line that rises from around 0 up to maybe 9, then decays again to about -2. The peak (colour ~9) lies at a redshift of around 4, and I want to locate that peak more exactly. The redshift enters through quite a complicated function, and I can't figure out how to differentiate it, or else I would just do that.
Could I maybe differentiate the complicated redshift (z) function? If so, how?
If not, how could I estimate a peak graphically/numerically?
Sorry for the very basic question and thank you very much in advance. My code is below.
import numpy as np
import matplotlib.pyplot as plt
import IGM
import scipy.integrate as integrate

SF = np.load('StarForming.npy')
lam = SF[0]
SED = SF[1]
filters = ['f435w','f606w','f814w','f105w','f125w','f140w','f160w']
filters_wl = {'f435w':0.435,'f606w':0.606,'f814w':0.814,'f105w':1.05,'f125w':1.25,'f140w':1.40,'f160w':1.60} # dictionary giving the wavelength of each filter in microns
fT = {} # this is a dictionary
for f in filters:
    data = np.loadtxt(f+'.txt').T
    fT[f] = data
fluxes = {}
for f in filters: fluxes[f] = [] # make an empty list for each filter
redshifts = np.arange(0.0, 10.0, 0.1) # redshifts going from 0 to 10
for z in redshifts:
    lamz = lam * (1. + z)
    obsSED = SED * IGM.madau(lamz, z)
    for f in filters:
        newT = np.interp(lamz, fT[f][0], fT[f][1]) # for each filter, refer back
        bb_flux = integrate.trapz((1./lamz)*obsSED*newT, x=lamz) / integrate.trapz((1./lamz)*newT, x=lamz)
        # 1st bit integrates, 2nd bit divides by the area under the filter to normalise it
        # loops over all z; for each z it creates a new SED on the redshifted wavelength grid
        fluxes[f].append(bb_flux)
for f in filters: fluxes[f] = np.array(fluxes[f])
colour = -2.5*np.log10(fluxes['f435w']/fluxes['f606w'])
plt.plot(redshifts, colour)
plt.xlabel('Redshift')
plt.ylabel('Colour')
plt.show()
I do not have high enough reputation to comment, but this may solve your problem, so I guess it's an answer. Store all your y-coordinates in a list, then use the max(list) function to find the maximum. If you want an ordered pair, store your coordinates as (y, x) tuples and use max(list):
lst = [(3, 2), (4, 1), (1, 200)]
max(lst)
yields (4, 1), since tuples are compared by their first element.
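Applied to the arrays in the question's code, a minimal sketch; np.argmax does the same job without building tuples:
peak = np.argmax(colour)              # index of the maximum colour
print(redshifts[peak], colour[peak])  # redshift and colour at the peak
# for a more exact location, rerun with a finer grid,
# e.g. redshifts = np.arange(3.5, 4.5, 0.001)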

Getting CDF of variable-sized numpy arrays in Python using same bins?

I'd like to make a set of comparable empirical CDFs for a few numpy arrays (each of different length) and store these in a pandas dataframe:
import numpy as np
from statsmodels.distributions.empirical_distribution import ECDF

a = np.random.randn(100)
b = np.random.randn(500)
# ECDF from statsmodels
cdf_a = ECDF(a)
cdf_b = ECDF(b)
The problem is that cdf_a.x, cdf_a.y will be of different lengths than cdf_b.x, cdf_b.y, and I would like these to be the same length, i.e. computed with the same bins, so that they can be plotted on the same scale from a pandas DataFrame. This is not possible:
df = pandas.DataFrame({"cdf_a": cdf_a.y, "cdf_b": cdf_b.y})
since the CDFs are not of the same length. How can I bin a and b using the same bins when computing their CDFs, so that I get comparable, same-length vectors back?
Is this the best solution?
bins = np.linspace(0, 1, 10)
v1 = cdf_a(bins)
v2 = cdf_b(bins)
The way we use it in some goodness-of-fit tests is to stack the arrays, so the CDFs are defined on all points, i.e. the points from both arrays.
Then use np.searchsorted to get the ranking: the number of points in dataset 1 below x and the number of points in dataset 2 below x.
If I remember correctly, look at scipy.stats.ks_2samp:
data1 = np.sort(data1)
data2 = np.sort(data2)
n1 = data1.shape[0]
n2 = data2.shape[0]
data_all = np.concatenate([data1, data2])
cdf1 = np.searchsorted(data1, data_all, side='right') / (1.0 * n1)
cdf2 = np.searchsorted(data2, data_all, side='right') / (1.0 * n2)
It appears that this is a good solution:
bins = np.linspace(0, 1, 10)
v1 = cdf_a(bins)
v2 = cdf_b(bins)
Then len(v1) == len(v2), and these can be plotted as the CDFs of a and b on the same scale.
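An end-to-end sketch of that approach (the bin range is my assumption, widened to cover standard-normal samples):
import numpy as np
import pandas as pd
from statsmodels.distributions.empirical_distribution import ECDF

a = np.random.randn(100)
b = np.random.randn(500)
cdf_a, cdf_b = ECDF(a), ECDF(b)
bins = np.linspace(-3, 3, 50)  # shared evaluation grid covering both samples
df = pd.DataFrame({"cdf_a": cdf_a(bins), "cdf_b": cdf_b(bins)}, index=bins)
df.plot()  # both CDFs, same length, same scale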

How to interpolate lines over nans in pylab?

Pylab inserts line breaks where there are nans in the data. How can I get continuous lines? I think Matlab does this automatically.
example:
from pylab import *

x = linspace(0, 1, 10)
y = rand(10, 2)
y[5:8, 1] = nan
plot(x, y, '.-')
I want y[:,1] interpolated but not y[:,0] since there are no points missing.
select the rows without nan
from pylab import isnan
boolind = ~isnan(y).any(1)
then do
plot(x[boolind], y[boolind])
If you want a value from linear interpolation to substitute each nan, you can simply record the positions of the nans and interpolate using the adjacent points. For plotting purposes, though, simply eliminating the nan data points is enough: the plot draws a straight line across the gap, which is the linear interpolation anyway.
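For completeness, a minimal sketch of that substitution, using interp (numpy's np.interp, available via the pylab namespace) to fill the nans in the second column:
bad = isnan(y[:, 1])
y[bad, 1] = interp(x[bad], x[~bad], y[~bad, 1]) # fill each nan from the adjacent points
plot(x, y, '.-')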
btw: presumably your y = rand((10, 50)) should be y = rand(10, 50), although I am not sure why you want to plot a 2D array against a 1D one.
EDIT
for your particular question, you can simply plot the two columns of y separately:
from pylab import *
x = linspace(0,1,10)
y = rand(10,2)
y[5:8,1] = nan
boolind = ~isnan(y)
plot(x[boolind[:,0]],y[boolind[:,0], 0],'.-')
plot(x[boolind[:,1]],y[boolind[:,1], 1],'.-')
show()
