I need to plot streamtraces from CFD analysis with Python, over a 2D contour plot. My problem is that I'm dealing with 4 large 1D arrays (x,y coordinates and u,v velocity components), say over 100k points, arising from an external CFD simulation (so I cannot manipulate them). Creating 2D arrays from them (e.g. with scipy.interpolate.griddata as I found) causes my computer to crash due to excessive memory usage.
I've also tried quiver, but I can't get an arrow size that scales with the dimensions of the plot: the arrows are either too big or too small, and in any case there are far too many of them.
I've looked at all the solutions I could find, but none of them worked.
Try to vectorize your loops; that may help greatly with large-scale data.
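For the memory side specifically: in my experience the scipy.interpolate.griddata crash usually comes from asking for too fine a target grid, not from the ~100k input points themselves, and streamplot only needs a grid about as fine as the figure. A minimal sketch with hypothetical data (the 200x200 grid size is an assumption to tune):

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# x, y, u, v stand in for the four 1-D arrays from the CFD run
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 1, 100_000), rng.uniform(0, 1, 100_000)
u, v = 0.5 - y, x - 0.5                      # a toy swirl field

# Interpolate the scattered samples onto a deliberately coarse grid.
xi = np.linspace(x.min(), x.max(), 200)
yi = np.linspace(y.min(), y.max(), 200)
X, Y = np.meshgrid(xi, yi)
U = griddata((x, y), u, (X, Y), method='linear')
V = griddata((x, y), v, (X, Y), method='linear')

# NaNs outside the data's convex hull just terminate streamlines.
plt.streamplot(X, Y, U, V, density=1.2)
plt.show()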
I'm a wee bit stuck.
I have a 3D point cloud (an array of (n,3) vertices) from which I am trying to generate a 3D triangular mesh. So far I have had no luck.
The format my data comes in:
(x,y) values in regularly spaced (z) intervals. Think of the data as closed loop planar contours stored slice by slice in the z direction.
The vertices in my data must be absolute positions for the mesh triangles (i.e. I don't want them to be smoothed out such that the volume begins to change shape, but linear interpolation between the layers is fine).
Illustration:
Z=2. : ..x-------x... <- Contour 2
Z=1.5: ...\......|... <- Join the two contours into a mesh.
Z=1. : .....x----x... <- Contour 1
Repeat for n slices, end up with an enclosed 3D triangular mesh.
Things I have tried:
Using Open3D:
The rolling ball (pivot) method can only get 75% of the mesh completed and leaves large areas incomplete (despite a range of ball sizes). It has particular problems at the top and bottom slices, where there tend to be large gaps in the middle (i.e. a flat face).
The Poisson reconstruction method smooths out the volume too much and I no longer have an accurate representation of the volume. This occurs at all depths from 3-12.
CGAL:
I cannot get this to work for the life of me. SWIG is not very good, and the CGAL implementation that uses SWIG is also not very good.
There are two PyBind implementations of CGAL; however, they have not incorporated the 3D triangulation libraries from CGAL.
I have explored other modules like PyMesh, TriMesh, TetGen, Scikit-Geometry, Shapely, etc.; I may have missed the answer somewhere along the line.
Given that my data is a list of closed-loop planar contours, it seems as though there must be some simple solution to just "joining" adjacent slice contours into one big 3D mesh, kind of like you would in Blender.
There are non-python solutions (like MeshLab) that may well solve these problems, but I require a python solution. Does anyone have any ideas? I've had a bit of a look into VTK and ITK but haven't found exactly what I'm looking for as of yet.
I'm also starting to consider that maybe I can interpolate intermediate contours between slices, and fill the contours on the top and bottom with vertices to make the data a bit more "pivot ball" method friendly.
Thank you in advance for any help, it is appreciated.
If there is a good way of doing this that isn't coded yet, I promise to code it and make it available for people in my situation :)
Actually, there are two ways of getting MeshLab functionality in Python:
The first is MeshLabXML (https://github.com/3DLIRIOUS/MeshLabXML ), a third-party Python interface to the MeshLab scripting interface.
The second is PyMeshLab (https://github.com/cnr-isti-vclab/PyMeshLab ), an ongoing effort by the MeshLab authors (currently in alpha stage) to provide direct Python bindings to all the MeshLab filters.
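For example, a minimal PyMeshLab sketch of the ball-pivoting route (the filter names here are from an early release and may have been renamed since, so check pymeshlab.print_filter_list(); the file names are placeholders):

import pymeshlab

ms = pymeshlab.MeshSet()
ms.load_new_mesh('contour_points.ply')   # the (n,3) cloud saved as PLY

# Normals are needed before ball pivoting; filter names mirror the
# MeshLab filter script names.
ms.apply_filter('compute_normals_for_point_sets')
ms.apply_filter('surface_reconstruction_ball_pivoting')

ms.save_current_mesh('mesh.ply')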
There is a very neat paper titled "Technical Note: an algorithm and software for conversion of radiotherapy contour‐sequence data to ready‐to‐print 3D structures" in the Journal of Medical Physics that describes this problem quite nicely. No Python packages are strictly required, though it is more easily implemented with NumPy, and no 3D packages are needed.
A useful excerpt is provided:
...
The number of slices (2D contours) constituting the specified structure is determined.
The number of points in each slice is determined.
Cartesian coordinates of each of the points in each slice are extracted and stored within dedicated data structures...
Numbers of points in each slice (curve) are re‐arranged in such a way, that the starting points (points with indices 0) are the closest points between the subsequent slices. Renumeration starts at point 0, slice 0 (slice with the lowest z coordinate).
Orientation (i.e., the direction determined by the increasing indices of points with relation to the interior/exterior of the curve) of each curve is determined. If differences between slices are found, numbering of points in non‐matching curves (and thus, orientation) is reversed.
The lateral surface of the considered structure is discretized. Points at the neighboring layers are arranged into threes, constituting triangular facets for the STL file. For each triangle the closest points with the subsequent indices from each layer are connected.
Lower and upper base surfaces of the considered structure are discretized. The program iterates over every subsequent three points on the curve and checks if they belong to a convex part of the edge. If yes, they are connected into a facet, and the middle point is removed from further iterations.
So basically it's a problem of re-numbering the points in each slice so that the starting points of adjacent slices are nearest each other, then matching the orientation of each contour, and then joining the points between two layers based on distance.
The paper also provides code to do this (for a DICOM file), but I re-wrote it myself and it works a charm; a condensed sketch of the core is below.
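Here is a condensed sketch of the renumbering, orientation, and lateral-surface steps for one pair of slices (a simplification of my rewrite, not the paper authors' code; contours are assumed to be (n,3) NumPy arrays of ordered points at constant z):

import numpy as np

def join_slices(c1, c2):
    """Connect two stacked closed contours into lateral-surface triangles."""
    # Roll each contour so index 0 lands on the closest pair of points
    # between the two slices.
    d = np.linalg.norm(c1[:, None, :2] - c2[None, :, :2], axis=2)
    i1, i2 = np.unravel_index(np.argmin(d), d.shape)
    c1 = np.roll(c1, -i1, axis=0)
    c2 = np.roll(c2, -i2, axis=0)

    # Match orientation via the shoelace signed area; reverse one
    # contour if the signs disagree, keeping its point 0 in place.
    def signed_area(c):
        x, y = c[:, 0], c[:, 1]
        return 0.5 * np.sum(x * np.roll(y, -1) - np.roll(x, -1) * y)
    if signed_area(c1) * signed_area(c2) < 0:
        c2 = np.roll(c2[::-1], 1, axis=0)

    # Greedily connect the closest points with subsequent indices into
    # triangular facets, always advancing on the side whose next point
    # is closer.
    tris, i, j, n, m = [], 0, 0, len(c1), len(c2)
    while i < n or j < m:
        ni, nj = (i + 1) % n, (j + 1) % m
        if j >= m or (i < n and np.linalg.norm(c1[ni] - c2[j % m])
                      <= np.linalg.norm(c1[i % n] - c2[nj])):
            tris.append((c1[i % n], c1[ni], c2[j % m])); i += 1
        else:
            tris.append((c1[i % n], c2[nj], c2[j % m])); j += 1
    return np.asarray(tris)   # (n+m, 3, 3) triangle vertex array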
I hope this helps others! Make sure you credit the authors in any work you do that uses this.
A recent feature of pymadcad can do things like this. I'm not sure, though, whether it fits your exact expectations in terms of "pivot ball" and such; check out the doc for blending.
Starting from a list of outlines, it can generate blended surfaces to join them:
For your purpose, I guess the best is one of:
blendpair(line1, line2)
junction(*lines)
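A rough sketch of how that could look (I'm going from the blending doc; the Wire construction and the toy contours here are my assumptions, so verify against the documentation):

from madcad import vec3, Wire, show
from madcad.blending import blendpair

# Two closed planar contours at z=1 and z=2 (toy rectangles standing
# in for real slice outlines).
bottom = Wire([vec3(x, y, 1) for x, y in [(0, 0), (2, 0), (2, 1), (0, 1)]])
top    = Wire([vec3(x, y, 2) for x, y in [(0, 0), (3, 0), (3, 1), (0, 1)]])

surface = blendpair(bottom, top)   # blended surface joining the outlines
show([surface])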
I'm working on a heatmap generation program which hopefully will fill in the colors based on value samples provided from a building layout (this is not GPS based).
If I have only a few known data points such as these in a large matrix of unknowns, how do I interpolate the values in between in Python? For example:
0,0,0,0,1,0,0,0,0,0,5,0,0,0,0,9
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,2,0,0,0,0,0,0,0,0,8,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,8,0,0,0,0,0,0,0,6,0,0,0,0,0,0
0,0,0,0,0,3,0,0,0,0,0,0,0,0,7,0
I understand that bilinear interpolation won't do it, and a Gaussian filter will bring all the peaks down to low values due to the sheer number of surrounding zeros. This is obviously a matrix-handling proposition, and I don't need it to be Bezier-curve smooth; just close enough for a graphic representation would be fine. My matrix will end up being about 1500×900 cells in size, with approximately 100 known points.
Once the values are interpolated, I have written code to convert it all to colors, no problem. It's just that right now I'm getting single colored pixels sprinkled over a black background.
Proposing a naive solution:
Step 1: interpolate and extrapolate existing data points onto surroundings.
This can be done using "wave propagation" type algorithm.
The known points "spread out" their values onto surroundings until all the grid is "flooded" with some known values. At the end of this stage you have a number of intersected "disks", and no zeroes left.
Step 2: smooth the result (using bilinear filtering or some other filtering).
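A minimal sketch of both steps, assuming the samples sit in a 2-D array named grid where 0 marks "unknown" (SciPy's distance transform gives the same end state the wave propagation would, and a Gaussian filter does the smoothing):

import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

# ~100 known samples scattered over a 900x1500 grid (toy data)
rng = np.random.default_rng(0)
grid = np.zeros((900, 1500))
grid[rng.integers(0, 900, 100), rng.integers(0, 1500, 100)] = \
    rng.uniform(1, 9, 100)

# Step 1: every cell takes the value of its nearest known point,
# i.e. the end state of the flood/propagation stage.
idx = distance_transform_edt(grid == 0, return_distances=False,
                             return_indices=True)
filled = grid[tuple(idx)]

# Step 2: smooth the resulting patchwork of "disks".
smooth = gaussian_filter(filled, sigma=10)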
If you are able to use SciPy, then interp2d does exactly what you want. A possible problem with it is that it seems not to extrapolate smoothly, according to this issue. This means that all values near the walls are going to be the same as their closest neighbouring points. This can be solved by putting thermometers in all 4 corners :)
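A short sketch using some of the known points from the example matrix above (note that interp2d is deprecated in recent SciPy releases, where scipy.interpolate.griddata is the usual replacement):

import numpy as np
from scipy.interpolate import interp2d

# (col, row, value) of known samples taken from the matrix above
xs = np.array([4, 10, 15, 3, 12, 1, 9, 5, 14])
ys = np.array([0, 0, 0, 2, 2, 4, 4, 5, 5])
vs = np.array([1, 5, 9, 2, 8, 8, 6, 3, 7])

f = interp2d(xs, ys, vs, kind='linear')
dense = f(np.arange(16), np.arange(6))   # full 6x16 grid of values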
I have this object/point cloud, rendered with PyOpenGL and Pygame.
My object is a numpy array of the coordinates of the points. I wish to generate a 3D triangular mesh of this object, and it would also be nice to be able to decrease the number of triangles.
I have tried scipy.spatial.Delaunay and it doesn't generate triangles for 3D objects.
Dual Contouring would probably work well here; it's an algorithm that takes voxelized data and turns it into a mesh. I don't understand it well enough to outline it here, but basically you'd place your points into a 3D grid array where a cell is set to 1 (full) if it contains a point and 0 (empty) if it doesn't, then run the DC algorithm on that grid to get a mesh out. The nice thing about this algorithm is that it supports internal cavities and concave shapes.
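A minimal sketch of that voxelization step (the grid resolution is a parameter to tune; the DC step itself is left to the links below):

import numpy as np

def voxelize(points, resolution=64):
    """Binary occupancy grid: 1 where a cell contains at least one point."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    # Map each point to an integer cell index in [0, resolution-1].
    idx = ((points - mins) / (maxs - mins + 1e-9) * (resolution - 1))
    idx = idx.astype(int)
    grid = np.zeros((resolution,) * 3, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid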
Here are some links I found that may help you if you decide to use DC:
Basic Dual Contouring Theory
http://ngildea.blogspot.com/2014/11/implementing-dual-contouring.html
This is the github repo to the source I used when I implemented this algorithm in Unity3D:
https://github.com/nickgildea/DualContouringSample
I'm trying to plot an empirical cumulative distribution function (CDF) of data from a 380Gb binary raster. Using just a small mask of the data, the following code works perfectly.
import numpy as np
import matplotlib.pyplot as plt
dem_name = open('./raster.dem','rb')
vals = np.fromfile(dem_name,dtype='float32')     # raw float32 values
vals = np.negative(vals[vals!=-9999])            # drop nodata, flip sign
vals = np.sort(vals)
y = np.arange(1.,len(vals)+1.)/len(vals)         # empirical CDF levels
plt.plot(vals,y)
However, when I try to load the whole raster using this code, it obviously gives a memory error. My computer has 9Tb of disk space but is limited to 16Gb of RAM, so I have used numpy.memmap to get the raster values into an array.
dem_name = open('./raster.dem','rb')
vals = np.memmap(dem_name, dtype='float32', mode='r')
This works, but I need to trim the nodata values (-9999) from the raster, switch the sign of the values (negative values becomes positive) and sort the values from lowest to highest.
vals_real = np.memmap(np.sort(np.negative(vals[vals!=-9999])))
This runs for a few hours and then gives a memory error.
The y array,
y = np.arange(1.,len(vals)+1.)/len(vals)
is also too big to be stored in RAM (gives a memory error), but I can't figure out how to store the array as a memmap object.
Is it correct that plotting also takes memory, such that I will need enough disk space for 2x the size of the raster file (2x 380Gb)?
So to summarize, I need to read the huge raster into python and plot a CDF. It's very simple with a small raster, but I've been unsuccessful making this plot with the full raster.
I hope this question is clear. Thanks in advance.
With 380Gb of single precision floats, you have about 95 billion values.
Don't attempt to plot the ECDF using all 95 billion values. Most plotting software can't handle that many points, and even if it could, most displays are only a few thousand pixels wide, so there is no point in plotting data with resolution much higher than that.
Instead, compute a histogram, and work in batches. If you already know reasonable lower and upper bounds for the values in the file, you can preallocate the histogram bins. Otherwise, you might need a histogram algorithm that can adapt to the new data that arrives in each batch.
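A sketch of the batched approach (the value range and bin count are assumptions to adjust; the nodata handling mirrors your code above):

import numpy as np
import matplotlib.pyplot as plt

vals = np.memmap('./raster.dem', dtype='float32', mode='r')

edges = np.linspace(0.0, 9000.0, 10001)   # assumed range of the negated values
counts = np.zeros(len(edges) - 1, dtype=np.int64)

batch = 50_000_000                        # ~200 MB of float32 at a time
for start in range(0, len(vals), batch):
    chunk = np.asarray(vals[start:start + batch])
    chunk = np.negative(chunk[chunk != -9999])    # same cleanup as above
    counts += np.histogram(chunk, bins=edges)[0]

# The cumulative histogram approximates the ECDF to bin resolution.
plt.plot(edges[1:], np.cumsum(counts) / counts.sum())
plt.show()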
I am in the process of porting code I wrote in IDL (Interactive Data Language) to Python, but I am running into a bit of a problem that I am hoping someone can help me with.
The code goes like this:
take individual classified Landsat geotiffs (say there are N individual 1-band files per scene, each representing a different day) and further reduce these images to three binary-themed 1-band images (water and not water, land and not land, water/land and not water/land). This will be done by reading the rasters as matrices and replacing values.
** I don't actually need these as image files, so I can keep them in memory or just as numpy ndarrays to move to the next step
stack these images/arrays to produce 3 different (1 for each 'element') N-band stacks (or a 3-dimensional array-- (samples, lines, N)) for each scene
total the stacks to get a total number of water/land/water&land observations per pixel (produces one 1-band total image for each scene)
other stuff
The problem I am running into is at the stacking step: the individual images for each scene vary in size, although they mostly overlap with each other. I originally used an ENVI layer-stacking routine that takes the N different-sized 1-band images for each scene, stacks them into an N-band image whose extent encompasses all of the images' extents, and then reads the resulting rasters in as 3-D arrays to do the totals. I would like to do something similar with gdal/python but am not sure how to go about it. I was thinking I would use the geotransform info of the geotiffs to find the inclusive extent, pad the edges of the images with 0's so they are all the same size, stack these arrays so that they are correctly aligned, and then compute the totals; a sketch of what I have in mind is below. Hopefully there is something more direct in gdal (or in any other open source package for Python), as I'm not sure how I would pull that off otherwise.
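Here is roughly what I mean (assuming all scenes share projection and pixel size; the file names are placeholders):

import numpy as np
from osgeo import gdal

files = ['day1.tif', 'day2.tif', 'day3.tif']   # the N 1-band scenes

infos = []
for path in files:
    ds = gdal.Open(path)
    gt = ds.GetGeoTransform()       # (ulx, xres, 0, uly, 0, -yres)
    infos.append((gt, ds.ReadAsArray()))

xres, yres = infos[0][0][1], infos[0][0][5]    # yres is negative
ulx = min(gt[0] for gt, _ in infos)
uly = max(gt[3] for gt, _ in infos)
lrx = max(gt[0] + a.shape[1] * xres for gt, a in infos)
lry = min(gt[3] + a.shape[0] * yres for gt, a in infos)

# One array covering the union extent, zero-padded, one band per day.
cols = int(round((lrx - ulx) / xres))
rows = int(round((lry - uly) / yres))
stack = np.zeros((rows, cols, len(infos)), dtype=np.uint8)

for k, (gt, arr) in enumerate(infos):
    xoff = int(round((gt[0] - ulx) / xres))
    yoff = int(round((gt[3] - uly) / yres))
    stack[yoff:yoff + arr.shape[0], xoff:xoff + arr.shape[1], k] = arr

totals = stack.sum(axis=2)   # per-pixel observation counts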
Does anyone have any suggestions or ideas as to what would be the most efficient way (or any way really), to do what I need to do? I'm open to anything.
Thanks so much,
Maggie