Array Interpolation Optimization - python

My problem is mainly about Python optimization. I want to create a GeoTIFF file from an unstructured point cloud. So far, I have been able to create the TIFF file from a 2D array of my points.
However, the array has shape (10000, 9300) and contains many NaN values that I would like to interpolate.
The values to interpolate are shown in white in the screenshot.
Another constraint: I must not extrapolate, so values outside the project area (outside the convex area) must stay unfilled.
I have already managed to produce a result using griddata from scipy, but the processing time is not viable (~15 min) as I have to repeat this on a hundred files.
The piece of code I use to perform the interpolation:
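import numpy as np
from scipy.interpolate import griddata
zi = np.load('Array.npy')
x, y = np.indices(zi.shape)
valid = ~np.isnan(zi)  # True where a value is known
# Interpolate the NaN cells from the known cells
zi_i = griddata((x[valid], y[valid]), zi[valid], (x[~valid], y[~valid]))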
The result:
The array I am working with is available here: https://drive.google.com/file/d/1KvEomI3H-gow2yoF6e2zpv5OwkriG7TQ/view?usp=sharing
Thank you for your help. I hope I have provided enough information.
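A minimal sketch of writing the interpolated values back into the grid, which does not by itself make the interpolation faster. With method="linear", griddata returns NaN for query points outside the convex hull of the known points, so the no-extrapolation constraint holds without extra masking. The helper name fill_nan_inside_hull and the small demo array are assumptions made for illustration only.

```python
import numpy as np
from scipy.interpolate import griddata

def fill_nan_inside_hull(zi):
    """Return a copy of zi with NaN cells filled by linear interpolation.

    Cells outside the convex hull of the valid cells stay NaN.
    (Hypothetical helper, not part of the original post.)
    """
    valid = ~np.isnan(zi)
    rows, cols = np.indices(zi.shape)
    filled = zi.copy()
    filled[~valid] = griddata(
        (rows[valid], cols[valid]),    # coordinates of known cells
        zi[valid],                     # known values
        (rows[~valid], cols[~valid]),  # coordinates of cells to fill
        method="linear",
    )
    return filled

# Small illustrative grid: a plane with one interior hole and one missing corner.
r, c = np.indices((5, 5))
demo = (r + 2 * c).astype(float)
demo[2, 2] = np.nan  # inside the hull: gets interpolated
demo[0, 0] = np.nan  # outside the hull of the remaining cells: stays NaN
result = fill_nan_inside_hull(demo)
```

In the demo, the interior hole is recovered from its neighbours while the missing corner is left as NaN.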

Related

Slicing a multidimensional numpy array -> 3D point clusters at different time instances

I have a numpy array whose shape is:
(30,40,100,200)
Those are 3D points (30(x-axis)x40(y-axis)x100(z-axis)) for different times (200 in total):
For visualization only (this is not my dataset, the picture comes from here: http://15462.courses.cs.cmu.edu/fall2016/article/35)
Now, I have issues with understanding how I can slice it:
How do I extract a 3D cluster for one specific time, e.g. 140?
From that extracted 3D cluster, how can I plot a 2D x-z cross-section for a specific y-position, e.g. 45?
You should read up on basic numpy slicing: https://numpy.org/doc/stable/reference/arrays.indexing.html
How do I extract a 3D cluster for one specific time, e.g. 140?
Just specify the time index, e.g. data[:, :, :, 140]. Be aware that Python indexing starts from 0, so this is the 141st time step.
From that extracted 3D cluster, how can I plot a 2D x-z cross-section for a specific y-position, e.g. 45?
You can obtain a 2D cross-section with a similar slicing operation, e.g. cluster[:, 45, :]. Note that with a y-axis of length 40 the valid indices are 0 to 39, so index 45 itself would raise an IndexError; pick an index in that range. The cross-section can be plotted in various ways depending on the plotting library; imshow() from matplotlib is one possibility.
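A minimal sketch of both slicing steps, using a zero-filled placeholder array with the shape from the question. The variable names and the y index 25 are assumptions chosen for illustration; the y-axis has 40 positions, so valid indices run from 0 to 39.

```python
import numpy as np

# Placeholder data with the shape from the question: (x, y, z, time).
data = np.zeros((30, 40, 100, 200), dtype=np.float32)

# 3D cluster at time index 140: the time axis is dropped.
cluster = data[:, :, :, 140]   # shape (30, 40, 100)

# 2D x-z cross-section at y index 25: the y axis is dropped.
section = cluster[:, 25, :]    # shape (30, 100)
```

The resulting section is a plain 2D array, so it can be passed directly to a function such as matplotlib's imshow().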
Is your question about the data set (how the data is organized and how to get a 3D cluster at a specific time), or about the coding?
If it is about how to get a cluster at a specific time, then the problem is specific to your dataset, and Stack Overflow is not the right place for that type of question.
If it is about coding, then state your question clearly and provide your code and the problem you have with it.
Based on your explanation, I think that for each time step you have a complete set of xyz data, so the solution is very straightforward.

Python image filter working on N spatial and M measurement dimensions

In short:
I’m searching for a way to calculate a multidimensional custom image filter on more than one axis of values in python.
What I mean is:
With scipy’s ndimage, I can use ndimage.generic_filter to apply the custom function myfunc to an N-dimensional numpy array. In myfunc, I just need to indicate how to process the pixel neighborhood of shape (size[0],…,size[N-1]) which is passed to the function.
Slightly different from that, what I would like to do is to provide an array of shape (S1,…,SN,V1,…,VM), apply the filter only along the spatial dimensions, and interpret the remaining M axes as axes of values. The pixel neighborhood to process would then be of shape (size[0],…,size[N-1],V1,…,VM).
So far I have my own relatively naive implementation of such a filter, but it would be good to have a version that handles the general case and deals with border effects.
Thanks a lot in advance for hints or ideas! Cheers
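One possible sketch of such a filter, assuming NumPy 1.20 or later for sliding_window_view and using np.pad for the border handling. The function name spatial_generic_filter, its parameters, and the choice of a scalar-valued func are assumptions for illustration; this is not an existing scipy routine, and the Python loop over pixels makes it slow for large images.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def spatial_generic_filter(arr, size, func, mode="reflect"):
    """Apply func to every spatial neighborhood of arr (hypothetical helper).

    arr  : array of shape (S1, ..., SN, V1, ..., VM)
    size : window size along each of the N spatial axes
    func : callable mapping a block of shape
           (size[0], ..., size[N-1], V1, ..., VM) to a scalar
    mode : np.pad mode used to extend the borders
    """
    n = len(size)
    # Pad only the spatial axes so the output keeps the spatial shape.
    pad = [(s // 2, s - 1 - s // 2) for s in size] + [(0, 0)] * (arr.ndim - n)
    padded = np.pad(arr, pad, mode=mode)
    # Window axes are appended at the end: (S..., V..., size...).
    windows = sliding_window_view(padded, tuple(size), axis=tuple(range(n)))
    # Reorder to (S..., size..., V...) so func sees the layout described above.
    windows = np.moveaxis(
        windows, list(range(arr.ndim, arr.ndim + n)), list(range(n, 2 * n))
    )
    out = np.empty(arr.shape[:n], dtype=float)
    for idx in np.ndindex(*arr.shape[:n]):
        out[idx] = func(windows[idx])
    return out

# Example: 2 spatial axes, 1 value axis, mean over each 3x3x3 block.
rng = np.random.default_rng(0)
image = rng.random((6, 7, 3))
smoothed = spatial_generic_filter(image, size=(3, 3), func=np.mean)
```

The border behaviour is whatever the chosen np.pad mode provides ("reflect", "edge", "constant", and so on), which is a simplification compared with the modes of ndimage.generic_filter.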

Python-Read very large raster and plot empirical cumulative distribution function, memory error

I'm trying to plot an empirical cumulative distribution function (ECDF) of data from a 380 GB binary raster. Using just a small mask of the data, the following code works perfectly.
import numpy as np
import matplotlib.pyplot as plt
dem_name = open('./raster.dem','rb')
vals = np.fromfile(dem_name,dtype='float32')
vals = np.negative(vals[vals!=-9999])
vals = np.sort(vals)
y = np.arange(1.,len(vals)+1.)/len(vals)
plt.plot(vals,y)
However, when I try to load the whole raster using this code, it obviously gives a memory error. My computer has 9 TB of disk space but is limited to 16 GB of RAM, so I have used numpy.memmap to get the raster values into an array.
dem_name = open('./raster.dem','rb')
vals = np.memmap(dem_name, dtype='float32', mode='r')
This works, but I need to trim the nodata values (-9999) from the raster, switch the sign of the values (negative values become positive) and sort the values from lowest to highest.
vals_real = np.memmap(np.sort(np.negative(vals[vals!=-9999])))
This runs for a few hours and then gives a memory error.
The y array,
y = np.arange(1.,len(vals)+1.)/len(vals)
is also too big to be stored in RAM (gives a memory error), but I can't figure out how to store the array as a memmap object.
Is it correct that plotting also takes memory, such that I will need enough disk space for 2x the size of the raster file (2 x 380 GB)?
So to summarize, I need to read the huge raster into python and plot a CDF. It's very simple with a small raster, but I've been unsuccessful making this plot with the full raster.
I hope this question is clear. Thanks in advance.
With 380 GB of single-precision floats, you have about 95 billion values.
Don't attempt to plot the ECDF using all 95 billion values. Most plotting software can't handle that many points, and even if it could, most displays are only a few thousand pixels wide, so there is no point in plotting data with resolution much higher than that.
Instead, compute a histogram, and work in batches. If you already know reasonable lower and upper bounds for the values in the file, you can preallocate the histogram bins. Otherwise, you might need a histogram algorithm that can adapt to the new data that arrives in each batch.
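The batched histogram approach can be sketched as follows, assuming the lower and upper bounds of the sign-flipped values are known in advance. The function name ecdf_from_raster, its parameters, and the default bin count and chunk size are assumptions for illustration; values outside [lo, hi] are silently ignored by np.histogram, and the result is only accurate to the bin width.

```python
import numpy as np

def ecdf_from_raster(path, lo, hi, nbins=2000, nodata=-9999.0, chunk=10_000_000):
    """Approximate ECDF of the sign-flipped raster values (hypothetical helper).

    Reads the file in chunks through a memmap, so only `chunk` values
    are held in RAM at a time.
    """
    vals = np.memmap(path, dtype="float32", mode="r")
    edges = np.linspace(lo, hi, nbins + 1)       # preallocated, fixed bins
    counts = np.zeros(nbins, dtype=np.int64)
    for start in range(0, vals.size, chunk):
        block = np.asarray(vals[start:start + chunk])
        block = -block[block != nodata]          # drop nodata, flip the sign
        counts += np.histogram(block, bins=edges)[0]
    cdf = np.cumsum(counts) / counts.sum()       # assumes at least one valid value
    return edges[1:], cdf                        # right bin edges and ECDF values
```

The two returned arrays have only nbins entries each, so they can be passed straight to plt.plot regardless of the raster size.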

Cartopy behavior when plotting projected data

I am using cartopy to draw my maps. It's a great tool!
For some of my data I have the problem that the data is not properly mapped around 0° or the dateline. See the example below.
I know the same issue from matplotlib's basemap toolkit, where it can be solved by using the addcyclic routine. I wondered if somebody can recommend how to best fix this problem in cartopy.
Thanks!
Alex
When plotting global data like this you will always need to add a cyclic point to your input data and coordinate. I don't believe Cartopy currently includes a function to do this for you, but you can do it yourself quite simply for the time being. Assuming you have a 1d array of longitudes and a 2d array of data where the first dimension is latitude and the second is longitude:
import numpy as np
dlon = lons[1] - lons[0]
new_lons = np.concatenate((lons, lons[-1:] + dlon))
new_data = np.concatenate((data, data[:, 0:1]), axis=1)
If you have different shaped data or coordinates then you will need to adjust this to your needs.
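As a small worked example of the same idea, here is a sketch with made-up coordinates and data. The helper name add_cyclic and the sample arrays are assumptions for illustration; the helper expects longitude on the last axis and evenly spaced longitudes.

```python
import numpy as np

def add_cyclic(data, lons):
    """Append a wrap-around column to data and the matching longitude.

    Assumes evenly spaced longitudes and longitude as the last axis of data.
    (Hypothetical helper for illustration.)
    """
    dlon = lons[1] - lons[0]
    new_lons = np.concatenate((lons, lons[-1:] + dlon))
    new_data = np.concatenate((data, data[..., :1]), axis=-1)
    return new_data, new_lons

lons = np.arange(0.0, 360.0, 90.0)   # 0, 90, 180, 270
data = np.arange(8.0).reshape(2, 4)  # (latitude, longitude)
cyclic_data, cyclic_lons = add_cyclic(data, lons)
```

After the call, the longitude array ends at 360 and the last data column repeats the first one, which closes the gap at the seam. Later cartopy releases provide cartopy.util.add_cyclic_point for the same purpose.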
The cartopy development team has included the required feature in the development branch. For details see here

How to display a float matrix as elevation values in a 3D plot in Python?

I currently have a heat map which is a 2D float matrix (a list of lists of floats, to be accurate), and I can display it in 2D with matplotlib fairly easily, but I would like to display it in a 3D plot such that the column and row indices are the X and Y values respectively, and the values in the matrix are the Z (elevation) values. What can I use to do that? I tried using Axes3D but it didn't seem very suitable (or maybe I was using it wrong?). What I am looking to do is conceptually very simple: pretend the matrix is a DEM and display it as such.
Also if possible I would like to be able to change viewing angles on-the-fly, without having to re-generate the plot.
Any ideas?
These two questions are related but don't quite answer my question:
3d plotting with python
Python: 3D contour from a 2D image - pylab and contourf
NB: The float matrix is rather large, typically 100x100 or more, and the last time I tried to plot it in 3D my system ran out of memory and started thrashing.
Your use case seems like it is tailor made for mayavi/mlab, which has a function that does exactly what you are asking and by default permits interactive 3D rotation:
import numpy as np; from mayavi import mlab
data = np.random.random((100,100))
mlab.surf(data)
mlab.show()
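For comparison, a matplotlib-only sketch of the same idea, assuming a reasonably recent matplotlib (3.2 or later, where projection="3d" works without importing Axes3D explicitly). The function name plot_matrix_as_surface and the colormap are assumptions for illustration; column indices are used as X and row indices as Y, as described in the question.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_matrix_as_surface(matrix):
    """Plot a 2D matrix as a surface: X = column, Y = row, Z = value.

    (Hypothetical helper for illustration.)
    """
    z = np.asarray(matrix, dtype=float)   # also accepts a list of lists
    rows, cols = z.shape
    x, y = np.meshgrid(np.arange(cols), np.arange(rows))
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(x, y, z, cmap="terrain")
    ax.set_xlabel("column index")
    ax.set_ylabel("row index")
    ax.set_zlabel("elevation")
    return fig, ax
```

Calling plt.show() afterwards with an interactive backend lets the view be rotated with the mouse, although mayavi is likely to stay more responsive for large matrices.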
