I am looking for a minimalistic solution for doing basic geospatial search in Python.
We have a dataset of roughly 10k locations and we need to find all locations within a radius of N kilometers from a given location. I am not looking for a full database with geospatial support; I would like to avoid yet another external dependency. Is there something that uses Python only?
Shapely seems to be a good solution. Its description seems to correspond to what you're looking for:
[Shapely] It lets you do PostGIS-ish stuff outside the context of a database using Python.
It is based on GEOS, which is a widely used C++ library.
Here is a link to the documentation
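For the radius search itself, a minimal sketch (hypothetical data, planar coordinates assumed; for real kilometres on lat/lon you would project first or use a haversine distance):
from shapely.geometry import Point

locations = [Point(22.22, 33.33), Point(8.0, 5.0), Point(3.12, 5.0)]  # hypothetical points
center = Point(3.0, 8.0)
radius = 10.0

# distance() is plain Euclidean distance between the two geometries
within_radius = [p for p in locations if center.distance(p) < radius]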
scipy.spatial has a kd-tree implementation that might be the most popular in Python.
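A minimal sketch of that route (hypothetical data, planar coordinates assumed): build the tree once, then query_ball_point returns the indices of all points within the radius.
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(10000, 2) * 100.0  # hypothetical 10k locations
tree = cKDTree(points)                     # build once, reuse for many queries

indices = tree.query_ball_point([3.0, 8.0], r=10.0)  # all points within radius 10 of (3, 8)
nearby = points[indices]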
A self made solution without any external modules could be something like this:
import numpy as np
points = np.array([[22.22, 33.33],
[08.00, 05.00],
[03.12, 05.00],
[09.00, 08.00],
[-02.5, 03.00],
[0.00, -01.00],
[-10.0,-10.00],
[12.00, 12.00],
[-4.00, -6.00]])
r = 10.0 # Radius within which the points should lie
xm = 3 # Center x coordinate
ym = 8 # Center y coordinate
points_i = points[((points[:,0] - xm)**2 + (points[:,1] - ym)**2)**(1/2.0) < r]
points_i contains those points which lie within the radius. This solution requires the data to be in a numpy array, which is to my knowledge also a very fast way to go through large data sets as opposed to for loops. I guess this solution is pretty much minimalistic. The plot below shows the outcome with the data given in the code.
I am working on a project and I have to do some material derivatives. I can't find a function in the module which can do this type of operation for me. Even if this function does not exist I can make it myself, but then there is another problem: I don't know how to extract a single component from a vector. If I have a vector (5, 10) I can't extract the y component alone (10) without bringing the x component along with it.
I read a lot about similar problems on this forum and I also read the documentation for the sympy vector module. I can't seem to find an answer.
from sympy.physics.vector import ReferenceFrame
A = ReferenceFrame('A')
v = 5*A.x + 10*A.y
I'd like to take a material derivative of the vector "v". If this isn't possible I would like to write a function myself (def fun...), but I also don't know how to get one component out of a vector.
I imagined that the component extraction would look something like v[0] or something similar, but it doesn't work.
1) As far as material derivatives go, you'd probably be best off writing your own function which shouldn't be too difficult. Otherwise you might be able to make use of other modules within sympy.physics such as the mechanics module. I'm not sure if it will help or not but it's definitely worth a look.
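As an illustration, here is a minimal sketch of such a hand-rolled function using plain sympy symbols rather than the sympy.physics.vector API (the field f and the velocity values are just placeholders): the material derivative of a scalar field f(x, y, t) carried by a velocity field (u, v) is Df/Dt = df/dt + u*df/dx + v*df/dy.
import sympy as sp

x, y, t = sp.symbols('x y t')

def material_derivative(f, u, v):
    """Df/Dt = df/dt + u*df/dx + v*df/dy for a scalar field f(x, y, t)."""
    return sp.diff(f, t) + u*sp.diff(f, x) + v*sp.diff(f, y)

f = sp.sin(x)*sp.exp(-t)              # placeholder field
print(material_derivative(f, 5, 10))  # velocity components taken from the question's vector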
2) To your second question (how to extract the components of a sympy vector object):
This is a little hacky because once you convert to matrix form the underlying reference-frame information is lost, but you could do something like this:
from sympy.physics import vector
A = vector.ReferenceFrame('A')
v = 5*A.x + 10*A.y
x_component = v.to_matrix(A)[0]
y_component = v.to_matrix(A)[1]
If you still wanted it to be in vector form, as opposed to just the scalar coefficient, you could immediately re-multiply by the basis vector, like so:
x_vec = (v.to_matrix(A)[0])*A.x
y_vec = (v.to_matrix(A)[1])*A.y
Basically, I have a corpus of ~10,000 STL files, and I need to turn them all into 32x32x32 arrays of 1's and 0's (voxels).
I already have this script that turns STL files into voxels: https://github.com/rcpedersen/stl-to-voxel , but sometimes, even though I specify that I need a 32x32x32 array, it will give me some huge array, and along with being buggy it takes FOREVER (processed ~600 files in 48 hours...).
Would it be easier to attempt to fix this script, or to write my own? It doesn't seem like voxelizing an STL would be a hard task, but I don't know any of the methods out there for this; if there are any strategies/tips, anything would be greatly appreciated.
Sorry to be a bummer, but voxelisation is actually quite a hard task, and not something Python is suited to doing quickly. Even for the simple slice/crossing test I would think a C++ implementation will beat Python by a factor of 100. I recommend libigl. Or do it on the GPU for real time :) Look for conservative rasterization. But that is for "good" meshes that are non-intersecting and closed. Otherwise it becomes a lot harder. Look for "generalized winding numbers" - also in igl.
Basically, voxelizing a facet surface means separating inside from outside. It can be done in different ways: the easiest way is to find the signed distance from each voxel, but it requires the input mesh to be closed; the other way is to find the winding number. You can find implementations of both in MeshLib. There is also a Python module that can help you:
pip install --upgrade pip
pip install meshlib
from meshlib import mrmeshpy as mm
# load mesh
mesh = mm.loadMesh(mm.Path("path_to_file.stl"))
mtvParams = mm.MeshToVolumeParams()
# signed will have negative values inside mesh and positive outside, but requires closed mesh
mtvParams.type = mm.MeshToVolumeParamsType.Signed
# voxels with precise distance: 3 inside, 3 outside
mtvParams.surfaceOffset = 3
# find correct voxel size to have 32x32x32 volume
meshBox = mesh.computeBoundingBox()
boxSize = meshBox.max-meshBox.min
mtvParams.voxelSize = boxSize / 27.0
voxels = mm.meshToVolume(mesh,mtvParams)
# save voxels as tiff slices
vsParams = mm.VoxelsSaveSavingSettings()
vsParams.path = "save_voxels_dir"
vsParams.slicePlane = mm.SlicePlane.XY
mm.saveAllSlicesToImage(voxels,vsParams)
Is there a way to convert an STL file to a numpy array?
The numpy array, resolved with x*y*z datapoints, should contain volumetric information in the sense of "inside" or "outside" the geometry, say as 0 or 1.
To my surprise I didn't find anything on this yet, although numpy2stl seems to be quite popular.
The problem is a complex geometry of porous media, so convex hull conversion does not work either.
import numpy
import stl
from stl import mesh
stl.stl.MAX_COUNT = 1e10
your_mesh = stl.mesh.Mesh.from_file('Data.stl')
print(your_mesh.data)
seems to be able to export triangles only.
In addition, even this usually leads to MemoryError messages; but numpy-stl (usually) works for loading the datapoints into numpy.
Is there a way to convert the stl data into volume data that contains information if every point is inside or outside the geometry?
The resulting 3D array could technically be of binary data type, but this isn't required.
With commercial software this conversion seems to be trivial, but it's neither Python nor free. Implementing a ray casting algorithm from scratch seems overcomplicated for a file type conversion.
I do believe that what you want to do is a voxelization of your volume. You can do that with the trimesh package at https://trimsh.org/
import trimesh
mesh = trimesh.load_mesh('path2yourstlfile.stl')
assert(mesh.is_watertight)  # you cannot build a solid if your mesh is not watertight
volume = mesh.voxelized(pitch=0.1)
mat = volume.matrix # matrix of boolean
You can also check whether a list of points is inside the volume using:
mesh.contains(points)
Small typo in [4]: trimesh has no matrix attribute, you get it from the VoxelGrid. So
mat = mesh.matrix
should be
mat = volume.matrix
We are using the shapely library to check that some random point is not in some prohibited areas stored in a shapefile.
import fiona
import shapely.geometry

with fiona.open(path) as source:
    geometry = get_exclusive_item(source[0])
    geom = shapely.geometry.shape(geometry['geometry'])

def check(lat, lng):
    point = shapely.geometry.Point(lng, lat)
    return not geom.contains(point)
But the last call, geom.contains(point), takes about a second to complete. Are there any faster libraries for Python, or could we optimize the shapefile somehow to get better speed?
Thanks to @iant for the pointer to use spatial indexes.
My shapefile was a single MultiPolygon with a lot of points, which made .contains() really slow.
I solved the issue by splitting it into smaller shapes and using an Rtree index.
To split the shapefile I used QGIS, as described here - https://gis.stackexchange.com/a/23694/65569
The core idea of how to use RTree in Python is here - https://gis.stackexchange.com/a/144764/65569
In total this gave me a 1000x speed-up for .contains() lookups!
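As a rough sketch of that second link's idea (the names here are hypothetical; `polygons` is assumed to be the list of shapely polygons obtained after splitting the shapefile): index the bounding boxes with Rtree, then run the exact .contains() test only on the candidates.
from rtree import index
import shapely.geometry

idx = index.Index()
for i, poly in enumerate(polygons):   # polygons: the split-up shapely polygons
    idx.insert(i, poly.bounds)        # index each polygon's bounding box

def check(lat, lng):
    point = shapely.geometry.Point(lng, lat)
    # only polygons whose bounding box contains the point are tested exactly
    candidates = idx.intersection((point.x, point.y, point.x, point.y))
    return not any(polygons[i].contains(point) for i in candidates)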
The situation is as follows:
Each supplier has some service areas, which the user has defined using Google Maps (polygons).
I need to store this data in the DB and make simple (but fast) queries over it.
Queries should look like: "List all suppliers with a service area containing x,y" or "Which polygons (service areas) is x,y inside?"
So far I've found GeoDjango, which looks like a very complex solution to this problem. To use it, I need a quite complex setup and I couldn't find any recent (and good) tutorial.
I came up with this solution:
Store every polygon as a Json into the database
Apply a method to determine if some x,y belongs to any polygon
The problem with this solution is quite obvious: Queries may take too long to execute, considering I need to evaluate every polygon.
Finally: I'm looking for another solution to this problem, and I hope to find something that doesn't require setting up GeoDjango on my currently running server.
Determining whether some point is inside a polygon is not a problem (I found several examples); the problem is that retrieving every single polygon from the DB and evaluating it does not scale. To solve that, I need to store the polygons in such a way that I can query them fast.
My approach:
Find the centroid of the polygon (C++ code).
Store it in the database.
Find the longest distance from any vertex to the centroid (Pythagoras).
Store it as a radius.
Search the database using the centroid & radius as a bounding box.
If there are 1 or more results, run point-in-polygon on the resulting polygons.
This solution enables you to store polygons outside of GeoDjango to dramatically speed up point-in-polygon queries.
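A minimal Python sketch of this approach (the storage layer is left abstract; `rows` and the helper names are hypothetical): precompute the centroid and bounding radius once per polygon, use the cheap circle test as the database-side filter, and only run the exact point-in-polygon test on the survivors.
from shapely.geometry import Point, Polygon

def bounding_circle(coords):
    """Centroid and the longest vertex-to-centroid distance of a polygon."""
    poly = Polygon(coords)
    c = poly.centroid
    radius = max(c.distance(Point(x, y)) for x, y in coords)
    return (c.x, c.y), radius

def matching_polygons(rows, x, y):
    """rows: iterable of (coords, cx, cy, radius) as precomputed and stored."""
    for coords, cx, cy, radius in rows:
        if (x - cx)**2 + (y - cy)**2 <= radius**2:      # cheap circle pre-filter
            if Polygon(coords).contains(Point(x, y)):   # exact test only on survivors
                yield coords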
In my case, I needed to find whether the coordinates of my numpy arrays were inside a polygon stored in my GeoDjango DB (land/water masking). This required iterating over every coordinate combination in my arrays to test whether it was inside or outside the polygon. As my arrays are large, this was taking a very long time using GeoDjango.
Using django's GEOSGeometry.contains my command looked something like this:
import numpy as np
from django.contrib.gis.geos import Point

my_polygon = model.geometry # get the model's multipolygon field
lat_lon = zip(latitude.flat, longitude.flat) # zip coordinate arrays to (lat, lon) tuples
mask = np.array([my_polygon.contains(Point(ll[1], ll[0])) for ll in lat_lon]) # boolean mask, (lon, lat) order
This was taking 20 s or more on large arrays. I tried different ways of applying the geometry.contains() function over the array (e.g. np.vectorize) but this did not lead to any improvements. I then realised it was the Django contains lookup which was taking too long. I also converted the geometry to a shapely polygon and tested shapely's polygon.contains function - no difference or worse.
The solution lay in bypassing GeoDjango by using the Polygon module's isInside method. First I created a function to build a Polygon object from my GEOS MultiPolygon.
from Polygon import Polygon

def multipolygon_to_polygon(multipolygon):
    """
    Convert a GEOS MultiPolygon to a python Polygon
    """
    polygon = multipolygon[0]            # select the first polygon object
    nrings = polygon.num_interior_rings  # number of interior rings in the polygon
    poly = Polygon()
    poly.addContour(polygon[0].coords)   # add the exterior ring's coordinate tuple
    # Add subsequent rings as holes
    if nrings > 0:
        for i in range(nrings):
            print("Adding ring %s" % str(i+1))
            hole = True
            poly.addContour(polygon[i+1].coords, hole)
    return poly
Applying this to my problem:
my_polygon = model.geometry # get the model's multipolygon field
polygon = multipolygon_to_polygon(my_polygon) # convert to a python Polygon
lat_lon = zip(bands['latitude'].flat, bands['longitude'].flat) # (lat, lon) tuples
land_mask = np.array([not polygon.isInside(ll[1], ll[0]) for ll in lat_lon])
This resulted in a roughly 20X improvement in speed. Hope this helps someone.
This was on Python 2.7.