Point in Polygon with geoJSON in Python

I have a GeoJSON database with lots of polygons (census tracts, specifically) and lots of lon/lat points.
I am hoping there is an efficient way in Python to identify which census tract a given coordinate falls in, but so far my googling hasn't turned anything up.
Thanks!

I found an interesting article describing how to do exactly what you're looking for.
TL;DR: Use Shapely
You will find this code at the end of the article:
import json
from shapely.geometry import shape, Point

# load GeoJSON file containing sectors
with open('sectors.json') as f:
    js = json.load(f)

# construct point based on lon/lat returned by geocoder
point = Point(-122.7924463, 45.4519896)

# check each polygon to see if it contains the point
for feature in js['features']:
    polygon = shape(feature['geometry'])
    if polygon.contains(point):
        print('Found containing polygon:', feature)
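If there are lots of polygons, a linear scan over every feature gets slow. Shapely also ships an STRtree spatial index that prunes the search to bounding-box candidates first. A minimal sketch continuing from the snippet above (note: in Shapely 2.x query() returns integer indices; in 1.x it returned geometries):

from shapely.strtree import STRtree

# build the tree once; each lookup then only tests nearby candidates
polygons = [shape(feature['geometry']) for feature in js['features']]
tree = STRtree(polygons)

# Shapely 2.x: query() returns integer indices of bounding-box matches
for i in tree.query(point):
    if polygons[i].contains(point):
        print('Found containing polygon:', js['features'][i])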

A great option for working with this kind of data is PostGIS, a spatial database extender for PostgreSQL. I personally keep all of my geo data in a PostGIS database and then reference it in Python using psycopg2. I know it's not pure Python, but it has unbelievable performance benefits (discussed below) over pure Python.
PostGIS has functionality built in to determine whether a point or shape is within another shape. The documentation on the ST_Within function expands upon this simple example:
SELECT
ST_WITHIN({YOUR_POINT},boundary)
FROM census;
-- returns true or false for each of your tracts
The benefit you'll gain from PostGIS that you likely won't get elsewhere is indexing, which can improve your speed 1,000x [1], making it better than even the best-written C program (unless the C program also builds an index over your data). When properly set up, the database will cache information about your tracts, and when you ask whether a point is within a tract it won't have to search everything; it can take advantage of its index.
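For reference, here is a minimal sketch of how that lookup might be driven from Python with psycopg2. The connection settings, table name, and column names are assumptions; the GiST index shown in the comment is the standard PostGIS way to get the speedup described above:

import psycopg2

# hypothetical connection settings and table layout: census(tract_id, boundary)
conn = psycopg2.connect(dbname='gis', user='gis_user')
cur = conn.cursor()

# A GiST index is what buys the speedup described above:
#   CREATE INDEX census_boundary_idx ON census USING GIST (boundary);
# ST_SetSRID/ST_MakePoint build the lon/lat point server-side.
cur.execute(
    """
    SELECT tract_id
    FROM census
    WHERE ST_Within(ST_SetSRID(ST_MakePoint(%s, %s), 4326), boundary);
    """,
    (-122.7924463, 45.4519896))
print(cur.fetchone())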
Getting data into and out of Postgres is pretty simple. A great tutorial that walks you through the basics of PostGIS, with sample datasets not too different from yours, can be found here. It's reasonably long, but if you're new to PostGIS (as I was), you'll be entertained and excited the entire time:
http://workshops.boundlessgeo.com/postgis-intro/
[1] Indexing cut a nearest-neighbor search in one of my huge databases (roughly 20 million records) from 53 seconds to 8.2 milliseconds.

Really fast geometric code in pure Python is hard to come by. The usual approach instead is to use a fast C/C++ library with Python wrappers.
For example, you can start with CGAL, a very comprehensive C++ geometry library. It has Python bindings for most of its routines; see http://code.google.com/p/cgal-bindings/.

Related

Is the geoToH3 function available as pseudo-code?

Is there a Python or pseudocode example of geoToH3 available? I just need this one function and would like to avoid installing the library in my target environment (AWS Glue, PySpark).
I tried to follow the JavaScript implementation, but even that used C magic internally.
There isn't a pseudocode implementation that I'm aware of, but there's a fairly thorough explanation in the documentation. Roughly:
Select the icosahedron face (0-19) the point lies on (using squared point distance in 3D space)
Project the point into face-oriented IJK coordinates
Convert the IJK coords to an H3 index by calculating the index digits at each resolution and setting the appropriate bits
The core logic can be found in the H3 library's C source. It's not trivial to implement; unless there's a strong reason to avoid installing the library, that would be the far easier and more reliable option.
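If installing is an option after all, the h3-py bindings reduce this to a one-liner. A minimal usage sketch (the function was renamed between h3-py versions, as noted in the comment):

import h3

# h3-py v3 naming; in h3-py >= 4 the same call is h3.latlng_to_cell(lat, lng, res)
cell = h3.geo_to_h3(37.7749, -122.4194, 9)  # lat, lng, resolution
print(cell)  # a 15-character hexadecimal cell id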

Programming approach to calculating Lowest/Highest Combined Surface(s)

I'm looking for a methodology (and preferably a software approach) to create what I'm calling the Lowest (or Highest) Combined Surface for a set of polygons.
Given a number of "surfaces" (3D polygons): if our input were two polygons that partially overlap and definitely intersect, the Lowest Combined output would be three polygons: the part covered only by the first surface, the part covered only by the second, and the overlap region taken from whichever surface is lower there.
We've gone through a variety of approaches, and the best solution we could come up with involved applying a point grid to each polygon and performing calculations to return the lowest set of points at each grid location. The problem is that the original geometry is lost in this approach, which doesn't give us a working solution.
Background
I'm looking at a variety of "surfaces" that can be represented by 3D faces (CAD speak) or polygons, usually distributed in a shapefile (.shp). When two surfaces interact, I'm interested in taking either the lowest or the highest combined surface. I'm able to do this in CAD by manually tracing out new polygons for the interaction zones, but once I get beyond a handful of surfaces this becomes too labor-intensive.
The Current Approach
My current approach, which falls somewhere in the terrible category, is to generate a point cloud from each surface on a 1 m grid and then do a grid-cell-based comparison of the points.
I do this by using AutoCAD Civil 3D's surface generation tools to create a TIN from each polygon surface. This is then exported to a 1 m DEM file, which I believe is a gridded output format.
Next, each DEM file is brought into Global Mapper, where I generate a single point at the center of each "elevation grid cell". This data is then exported to a .csv file in which each point carries a variety of attributes, such as the name of the surface it came from and its altitude.
Once I have a set of CSV files, I run them through a Python script that exports the lowest point (and its associated attributes) at each grid cell; a minimal sketch of that step follows below. I do everything in UTM because the UTM grid is based on meters, which makes everything easier.
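Assuming a hypothetical merged CSV with easting, northing, altitude, and surface-name columns (UTM meters, no header row), that step amounts to a dictionary keyed by grid cell:

import csv

lowest = {}  # (easting, northing) grid cell -> (altitude, surface_name)
with open('merged_points.csv') as f:
    for easting, northing, alt, surface in csv.reader(f):
        cell = (int(float(easting)), int(float(northing)))  # snap to the 1 m UTM grid
        alt = float(alt)
        if cell not in lowest or alt < lowest[cell][0]:
            lowest[cell] = (alt, surface)

with open('lowest_points.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for (e, n), (alt, surface) in sorted(lowest.items()):
        writer.writerow([e, n, alt, surface])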
Lastly, we bring the point file back into Global Mapper, coloring each point by the surface it started from.
There are a variety of issues with this approach: sometimes things don't line up perfectly, and there is a variety of cleanup I have to do.
The edges also end up jagged, which is to be expected, since I've converted nice straight lines into a point cloud.
Alternatively, we came up with a similar approach in ArcGIS using the Surface Comparison tool, but it had limitations similar to those we ran into with my approach.
What I'm looking for is a way to do this automatically with a variable number of inputs. I'm willing to use just about any tool to get this done, as it seems like it shouldn't be too difficult a process.
Software?
When I look at this problem from a programmer's point of view it looks rather straightforward, but I'm at a total loss for how to proceed. I'm assuming Stack Overflow is the correct Stack Exchange site for this question, but if it belongs somewhere else I'm happy to move it.
I wasn't sure whether something like Mathematica (with which I have zero experience) could handle this situation, or whether there is some fancy 3D math library in Python that could chop polygons up by how they interact and then give me the lowest for co-located polys.
In any case I'm willing to try anything, so if you have an idea of what tools and/or libraries I can use to do this, please share! I have to assume that there is SOMETHING out there that can handle this type of 3D geometric processing.
Thanks
EDIT
Because the commenters seem confused: I am not asking for code. I am asking for methodologies, libraries, supporting tools, or even software packages that can perform these operations. I plan to write the software myself; however, I am hoping I don't need to pull out my trig books and write all these operations by hand. I have to assume somebody out there has dealt with something similar before.
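For what it's worth, the footprint-splitting part of this maps directly onto Shapely's set operations (difference and intersection). A minimal sketch, under the simplifying assumption that each surface can be reduced to a flat 2D footprint with one representative elevation; real CAD faces would need plane equations evaluated over the overlap instead of constants:

from shapely.geometry import Polygon

# two toy footprints, each with one representative elevation (hypothetical;
# real faces would carry a plane equation rather than a constant height)
geom_a, z_a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]), 10.0
geom_b, z_b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)]), 8.0

pieces = [
    (geom_a.difference(geom_b), z_a),  # covered only by surface A
    (geom_b.difference(geom_a), z_b),  # covered only by surface B
]
overlap = geom_a.intersection(geom_b)
if not overlap.is_empty:
    pieces.append((overlap, min(z_a, z_b)))  # the lowest surface wins where they stack

for geom, z in pieces:
    print(round(geom.area, 2), z)  # three pieces, original edges preserved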

Geohashing vs Search API for geospatial querying using Datastore

I am creating an App Engine application in Python that will need to perform efficient geospatial queries on Datastore data. An example use case: find the first 20 posts within a 10-mile radius of the current user. Having done some research into my options, the two best approaches for achieving this type of functionality currently seem to be:
Indexing geohashed GeoPoint data using Python's GeoModel library
Creating/deleting documents of structured data using Google's newer Search API
From a high-level perspective, it seems that indexing geohashes and querying them directly would be cheaper and much faster than creating and deleting a document for every geospatial query. However, I've also read that geohashing can be very inaccurate near the equator or along the 'fault lines' created by the hashing algorithm. I've seen very few posts contrasting these methods in detail, and I think Stack Overflow is a good place to have this conversation, so my questions are as follows:
Has anyone implemented similar features and had positive experiences with either method?
Which method would be the cheaper alternative?
Which would be the faster alternative?
Is there another important method I'm leaving out?
Thanks in advance.
Geohashing does not have to be inaccurate at all; it's all in the implementation details. What I mean is that you can check the neighbouring geocells as well to handle border cases, and make sure that includes the neighbours on the other side of the equator.
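A minimal sketch of that neighbour-aware lookup, assuming the python-geohash package (any geohash library with encode/neighbors helpers would do the same job):

import geohash  # the python-geohash package; an assumption, not a requirement

def cells_to_search(lat, lon, precision=6):
    # the user's cell plus all 8 neighbours, so matches sitting just across
    # a cell border (including the equator seam) are not missed
    center = geohash.encode(lat, lon, precision)
    return [center] + geohash.neighbors(center)

# query entities whose stored geohash starts with any of these prefixes,
# then post-filter the candidates by true distance to the user
print(cells_to_search(40.7831, -73.9712))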
If your use case is finding other entities within a radius, as you suggest, I would definitely recommend using the Search API. It has a distance function tailored to exactly that.
Search API queries are more expensive than Datastore queries, yes. But if you factor in the computation time needed to do these calculations on your instance, and probably iterating through all entities for each geohash to make sure the distance really is less than the desired radius, then I would say the Search API is the winner. And don't forget about implementation time.
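A minimal sketch of such a radius query against the Search API, assuming documents were indexed with a GeoField; the index and field names here are made up:

from google.appengine.api import search

# the index name 'posts' and GeoField name 'location' are hypothetical
index = search.Index(name='posts')

# distance() works in meters; 10 miles is roughly 16093 m
query = search.Query(
    query_string='distance(location, geopoint(40.7831, -73.9712)) < 16093',
    options=search.QueryOptions(limit=20))

for doc in index.search(query):
    print(doc.doc_id)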
You can have a look at this post; it describes another great alternative.
I have used it within my app, and it works great for my requirement of finding app users within a provided radius.

PostGIS ST_Intersects Python equivalent

I currently use PostGIS as the backbone for a lot of the spatial functions I perform in Python scripts, specifically taking several shapefile geometries, seeing whether they intersect, and then sorting them into separate directories. I upload the shapefiles using shp2pgsql, correlate them using ST_Intersects, and then sort them using os/shutil functions in the script.
My problem is that one of our teams works only on government networks and cannot get PostgreSQL/PostGIS approved by their system admins. Is there a Python module/function out there that performs the same correlation of geometries as ST_Intersects without needing Postgres? Or, if I need to write this myself, is there a site with algorithms pertaining to geometry? For example, if I have an upper-left and a lower-right coordinate, how can I compute the other two corner points? I'm not asking anyone to write code for me, just some help being pointed in the right direction.
Also, all of the data is in WGS 1984.
There are many tools for reading shapefiles, which you can use to get their extents or bounds. These can be used to build an R-tree index with the Rtree package, which has some good examples in its documentation. With an R-tree index, you can use intersection queries to see where the bounding boxes intersect. This is similar to PostGIS's GiST index, except that in my experience it is much faster to build and use. And if/when you need to do a detailed intersection of the geometries, you can use Shapely, which in turn uses GEOS, the same library used by PostGIS. So they are all related in similar ways.
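A minimal sketch of that workflow, combining Fiona for reading, Rtree for the bounding-box index, and Shapely for the exact test; the shapefile names are placeholders:

import fiona
from rtree import index
from shapely.geometry import shape

# shapefile names are hypothetical
with fiona.open('parcels.shp') as src:
    parcels = [shape(f['geometry']) for f in src]

idx = index.Index()
for i, geom in enumerate(parcels):
    idx.insert(i, geom.bounds)  # index bounding boxes, much like a GiST index

with fiona.open('flood_zone.shp') as src:
    zone = shape(next(iter(src))['geometry'])

# cheap bounding-box pass first, then the exact GEOS test on the survivors
candidates = idx.intersection(zone.bounds)
hits = [i for i in candidates if parcels[i].intersects(zone)]
print(hits)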
See these related questions:
Looking for a fast way to find the polygon a point belongs to using Shapely
Faster way of polygon intersection with shapely

Performance: finding all points within a certain distance by lat/long

I have a CSV file with points tagged by lat/long (~10K points). I'd like to search for all points within a given distance of a user-specified lat/long coordinate (say, for example, the centroid of Manhattan).
I'm pretty new to programming and databases, so this may be a basic question. If so, I apologize. Is it performant to do this search in pure Python without using a database? As in, could I simply read the CSV into memory and do the search with a Python script? If it is performant, would it scale well as the number of points increases?
Or is this simply infeasible in Python, and I need to investigate using a database that supports geospatial queries?
Additionally, how do I go about understanding the performance of these types of calculations so that I can develop a good intuition for this?
This is definitely possible in Python without any database, and I would recommend using numpy. I would do the following:
read all points from the CSV into a numpy array
calculate the distance from each point to your given point
sort the distances, or simply find the one with the minimum distance using argmin
Because all calculations are vectorized, they happen at close to C speed.
With an okay computer, the I/O will take maybe 2-3 seconds and the calculation will take less than 100-200 milliseconds.
For the math, you can use the haversine formula: http://en.wikipedia.org/wiki/Haversine_formula
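A minimal numpy sketch of the whole pipeline, assuming a hypothetical points.csv with two columns (lat, lon) and no header:

import numpy as np

def haversine_km(lat1, lon1, lats, lons):
    # vectorized haversine distance from one point to many, in kilometers
    lat1, lon1 = np.radians(lat1), np.radians(lon1)
    lats, lons = np.radians(lats), np.radians(lons)
    a = (np.sin((lats - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lats) * np.sin((lons - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# points.csv is hypothetical: two columns (lat, lon), no header
pts = np.loadtxt('points.csv', delimiter=',')
dist = haversine_km(40.7831, -73.9712, pts[:, 0], pts[:, 1])  # centroid of Manhattan
within_10_miles = pts[dist <= 16.09]  # 10 miles is about 16.09 km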
