Previously I have used the OSMnx library in Python to get the closest drivable road to a particular GPS datapoint. To do so I was using the following code:
import osmnx as ox

places = ['Nebraska, USA']
G = ox.graph_from_place(places, network_type='drive')
origin_point = (lat, long)  # lat/long of the query point
nearest_edge = ox.get_nearest_edge(G, origin_point)
Now what I want to do is query OpenStreetMap with Athena for the same thing (still in Python). I want to supply a bunch of GPS datapoints and, for each datapoint, get the closest road. Does anyone know how I should do this?
Also, if you know of any documentation that could help, I would really appreciate it.
Thanks
Athena and Presto support geospatial functions, such as:
SELECT ST_Distance(ST_Point(-71.0882, 42.3607), ST_Point(-74.1197, 40.6976))
Based on the dataset that you want to focus on and its format, you can build a database in S3 for the locations you care about, such as roads in Nebraska, USA, and query against it.
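For example, here is a minimal sketch of how the query might look from Python with PyAthena, assuming you have loaded an OSM roads extract for Nebraska into S3 and registered it in Athena as a table called nebraska_roads with an osm_id column and the geometry stored as WKT in a wkt_geometry column (all of these names are assumptions):

from pyathena import connect

conn = connect(
    s3_staging_dir="s3://your-athena-query-results-bucket/",  # assumption: your results bucket
    region_name="us-east-1",
)
cursor = conn.cursor()

def nearest_road(lat, lon):
    # Order every road by its distance to the query point and keep the closest.
    # Note that ST_Point takes (longitude, latitude).
    sql = f"""
        SELECT osm_id,
               ST_Distance(ST_GeometryFromText(wkt_geometry),
                           ST_Point({lon}, {lat})) AS dist
        FROM nebraska_roads
        ORDER BY dist
        LIMIT 1
    """
    cursor.execute(sql)
    return cursor.fetchone()

for lat, lon in [(41.2565, -95.9345), (40.8136, -96.7026)]:
    print(nearest_road(lat, lon))

Note that ST_Distance on raw lon/lat coordinates returns a distance in degrees, and ordering every road by distance is a full scan; for many points or a large roads table you would want to prefilter with a bounding box around each point.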
Related
I have been doing some investigation to find a package to install and use for geospatial analytics.
The closest I got was https://github.com/harsha2010/magellan - this, however, only has a Scala interface and no documentation on how to use it with Python.
I was hoping someone knows of a package I can use?
What I am trying to do is analyse Uber's data, map it to the actual postcodes/suburbs, and run it through SGD to predict the number of trips to a particular suburb.
There is already lots of info here - http://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/#comment-606532 - and I am looking for ways to do it in Python.
In Python I'd take a look at GeoPandas. It provides a data structure called GeoDataFrame: it's a list of features, each one having a geometry and some optional attributes. You can join two GeoDataFrames together based on geometry intersection, and you can aggregate the numbers of rows (say, trips) within a single geometry (say, postcode).
1. I'm not familiar with Uber's data, but I'd try to find a way to get it into a GeoPandas GeoDataFrame.
2. Likewise, postcodes can be downloaded from places like the U.S. Census, OpenStreetMap[1], etc., and coerced into a GeoDataFrame.
3. Join #1 to #2 based on geometry intersection. You want a new GeoDataFrame with one row per Uber trip, but with the postcode attached to each. Another StackOverflow post discusses how to do this, and it's currently harder than it ought to be.
4. Aggregate this by postcode and count the trips in each. The code will look like joined_dataframe.groupby('postcode').count().
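A minimal sketch of steps 1-4, assuming the Uber data is a CSV with pickup_lat/pickup_lon columns and the postcode boundaries come as a shapefile with a postcode attribute (the file and column names are assumptions):

import geopandas as gpd
import pandas as pd

# 1. Load the Uber trips (assumed CSV with pickup lat/lon columns) into a GeoDataFrame.
trips = pd.read_csv("uber_trips.csv")
trips = gpd.GeoDataFrame(
    trips,
    geometry=gpd.points_from_xy(trips["pickup_lon"], trips["pickup_lat"]),
    crs="EPSG:4326",
)

# 2. Load the postcode polygons (assumed shapefile with a 'postcode' attribute).
postcodes = gpd.read_file("postcodes.shp").to_crs("EPSG:4326")

# 3. Join trips to postcodes on geometry intersection
#    (older GeoPandas versions use op="within" instead of predicate=).
joined = gpd.sjoin(trips, postcodes, how="inner", predicate="within")

# 4. Aggregate: number of trips per postcode.
trips_per_postcode = joined.groupby("postcode").size()
print(trips_per_postcode.head())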
My fear for the above process is if you have hundreds of thousands of very complex trip geometries, it could take forever on one machine. The link you posted uses Spark and you may end up wanting to parallelize this after all. You can write Python against a Spark cluster(!) but I'm not the person to help you with this component.
Finally, for the prediction component (e.g. SGD), check out scikit-learn: it's a pretty fully featured machine learning package, with a dead simple API.
[1]: There is a separate package called geopandas_osm that grabs OSM data and returns a GeoDataFrame: https://michelleful.github.io/code-blog/2015/04/27/osm-data/
I realize this is an old question, but to build on Jeff G's answer.
If you arrive at this page looking for help putting together a suite of geospatial analytics tools in Python, I would highly recommend this tutorial.
https://geohackweek.github.io/vector
It really picks up steam in the 3rd section.
It shows how to integrate
GeoPandas
PostGIS
Folium
rasterstats
Add in scikit-learn, NumPy, and SciPy and you can really accomplish a lot. You can grab information from this nDarray tutorial as well.
I'm building a Django app that needs to display (on a Google map) items near the user's current location (i.e. the "nearest coffee shop" kind of request).
I haven't decided how I will store those items' locations in the DB.
The requirements aren't final but I'm guessing the user will want to sort by distance, etc.
My question is what solution should I look into?
I looked at geodjango but it seems to be an overkill.
Could you please tell me what solution you would use and why?
Thanks everyone!
You will need to use an RDBMS like MySQL or PostgreSQL. Then, create your objects (e.g. coffee shops) with latitude and longitude stored as floats. Get the user's latitude and longitude and look up nearby objects via sin and cos functions (i.e. a great-circle distance formula such as the spherical law of cosines).
You will need to write a raw sql query to look up objects based on their latitude and longitude.
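For example, here is a rough sketch of such a raw query using the spherical law of cosines, run through Django's database connection; the table and column names (shops_coffeeshop, latitude, longitude) are hypothetical:

from django.db import connection

def nearest_shops(user_lat, user_lon, limit=10):
    # Spherical law of cosines: great-circle distance in km from the user to each shop.
    # Table and column names (shops_coffeeshop, latitude, longitude) are hypothetical.
    sql = """
        SELECT id, name,
               6371 * ACOS(
                   COS(RADIANS(%s)) * COS(RADIANS(latitude)) *
                   COS(RADIANS(longitude) - RADIANS(%s)) +
                   SIN(RADIANS(%s)) * SIN(RADIANS(latitude))
               ) AS distance_km
        FROM shops_coffeeshop
        ORDER BY distance_km
        LIMIT %s
    """
    with connection.cursor() as cur:
        cur.execute(sql, [user_lat, user_lon, user_lat, limit])
        return cur.fetchall()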
Read this: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
Take a look at this: Filter zipcodes by proximity in Django with the Spherical Law of Cosines
and this: latitude/longitude find nearest latitude/longitude - complex sql or complex calculation
I am wondering, is there a way to dump values from a grib1 file? My end goal is to find values for individual messages at a latitude and longitude, or at least at a grid point. I am using a Linux system. wgrib seems to do nothing except read metadata about the messages or reconstruct the messages.
I know a bit of Python, so I can use pygrib, but I don't know how to pull out the values for a specific latitude and longitude.
Here are some .grb files for everyone to play around with.
http://nomads.ncdc.noaa.gov/data/gfs-avn-hi/201402/20140218/
Thank you for your answers,
If you are interested in data from NOMADS, I would suggest going through their THREDDS Data Server, which will allow you to access data by specifying a lat/lon, and you can get that data back as a csv file, if you wish. To do so, first visit the NOMADS TDS site:
http://nomads.ncdc.noaa.gov/thredds/catalog/catalog.html
The example data files you linked to can be found here:
http://nomads.ncdc.noaa.gov/thredds/catalog/gfs-003/201402/20140218/catalog.html
Find a grid that you are interested in, say the 18Z run analysis field:
http://nomads.ncdc.noaa.gov/thredds/catalog/gfs-003/201402/20140218/catalog.html?dataset=gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb
Follow the link that says "NetcdfService":
http://nomads.ncdc.noaa.gov/thredds/ncss/grid/gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb/dataset.html
Near the top of that page, click "As Point Dataset":
http://nomads.ncdc.noaa.gov/thredds/ncss/grid/gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb/pointDataset.html
Then check the parameters you are interested in, the lat/lon (the closest grid point to that lat/lon will be chosen), and the output format type.
This web interface basically generates an access URL; for example, if I want Temperature over Boulder, CO, returned as CSV, it looks like this:
http://nomads.ncdc.noaa.gov/thredds/ncss/grid/gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb?var=Temperature_surface&latitude=40&longitude=-105&temporal=all&accept=csv&point=true
As you can see from the above URL, you can generate these pretty easily and make the request without going through all the steps above.
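For instance, a small Python sketch that builds the same request programmatically with the requests library, using the parameters from the URL above (the exact CSV layout returned may vary):

import requests

NCSS_URL = ("http://nomads.ncdc.noaa.gov/thredds/ncss/grid/"
            "gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb")

params = {
    "var": "Temperature_surface",
    "latitude": 40,        # the closest grid point to this lat/lon is returned
    "longitude": -105,
    "temporal": "all",
    "accept": "csv",
    "point": "true",
}

response = requests.get(NCSS_URL, params=params)
response.raise_for_status()
print(response.text)       # CSV rows of time, lat, lon, Temperature_surface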
This access method (NetcdfSubsetService) can be combined with Python easily. For an example, check out this ipython notebook:
http://nbviewer.ipython.org/github/Unidata/tds-python-workshop/blob/master/ncss.ipynb
Specifically, the first and second cells in the notebook.
Note that you can get recent GFS data, in which an entire model run is contained within one grib file, at:
http://thredds.ucar.edu/thredds/idd/modelsNcep.html
This would allow you to make a request, like the one above, but for multiple times using one request.
Cheers,
Sean
You can use the grib_api tools, specifically grib_ls and grib_get, to get values from the 1 or 4 grid points nearest to a specified latitude and longitude. From those you can use nearest-neighbour or bilinear interpolation, or whatever you like, to get your value.
Read this presentation; grib_ls starts on page 31:
http://nwmstest.ecmwf.int/services/computing/training/material/grib_api/grib_api_tools.pdf
When you install grib tools, you will get several tools to help you play around with GRIB files.
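If you'd rather stay in Python, here is a rough pygrib equivalent of the nearest-grid-point lookup (no interpolation); the file name and message selector are just examples based on the question's data:

import numpy as np
import pygrib

target_lat, target_lon = 40.0, 255.0   # note: GFS grids often use 0-360 longitudes

grbs = pygrib.open("gfs_3_20140218_1800_000.grb")
grb = grbs.select(name="Temperature")[0]   # pick one message; adjust the selector as needed

lats, lons = grb.latlons()                 # 2D arrays of the grid's latitudes/longitudes
dist2 = (lats - target_lat) ** 2 + (lons - target_lon) ** 2
i, j = np.unravel_index(dist2.argmin(), dist2.shape)

print("nearest grid point:", lats[i, j], lons[i, j])
print("value:", grb.values[i, j])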
I have a large (and growing) collection of geospatial (lat, lon) points (stored in MongoDB, if that helps).
I'd like to generate a choropleth map (http://vis.stanford.edu/protovis/ex/choropleth.html), which requires knowing the state that contains each point. Is there a database or algorithm that can do this without requiring calls to external APIs (i.e. I'm aware of things like geopy and the Google Maps API)?
Actually, the web app you linked to contains the data you need -
If you look at http://vis.stanford.edu/protovis/ex/us_lowres.js, for each state borders[] contains a [lat,long] polyline which outlines the state. Load this data and check for point-in-polygon - http://en.wikipedia.org/wiki/Point_in_polygon
Per Reverse Geocoding Without Web Access, you can speed it up a lot by pre-calculating a bounding box for each state and only testing point-in-polygon when the point is inside the bounding box.
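As a sketch of that approach using shapely, assuming you have already parsed the borders[] arrays into lists of (lon, lat) vertices per state (the toy Colorado rectangle below just stands in for the real data):

from shapely.geometry import Point, Polygon

# Toy stand-in for the real data parsed from us_lowres.js:
# state name -> list of (lon, lat) border vertices.
state_borders = {
    "Colorado": [(-109.05, 37.0), (-102.04, 37.0), (-102.04, 41.0), (-109.05, 41.0)],
}

state_polygons = {name: Polygon(coords) for name, coords in state_borders.items()}
state_bboxes = {name: poly.bounds for name, poly in state_polygons.items()}  # (minx, miny, maxx, maxy)

def state_for_point(lon, lat):
    pt = Point(lon, lat)
    for name, (minx, miny, maxx, maxy) in state_bboxes.items():
        # Cheap bounding-box rejection before the full point-in-polygon test.
        if minx <= lon <= maxx and miny <= lat <= maxy and state_polygons[name].contains(pt):
            return name
    return None

print(state_for_point(-105.0, 39.7))   # -> Colorado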
Here's how to do it in FORTRAN. Remember FORTRAN? Me neither. Anyway, it looks pretty simple, as every state has its own range.
EDIT: It's been pointed out to me that your starting point is lat/long, not the zipcode.
The algorithm for converting a lat/long to a political division is called "a map". Seriously, that's all an ordinary map is: a mapping of every point in some range to the division it belongs to. A detailed digital map of all 48 contiguous states would be a big database, and then you would need some (fairly simple) code to determine, for each state (described as a series of line segments outlining the border), whether a given point was inside it or not.
You can try using the GeoNames database. It has long/lat as well as city, postal, and other location-type data. It is free as well.
But if you need to host it locally or import it into your own database, the USGS and NGA provide a comprehensive list of cities with lat/lon. It's updated regularly, free, and reliable.
http://geonames.usgs.gov
http://earth-info.nga.mil/gns/html/index.html
Not sure about the quality of the data, but give this a shot: http://www.boutell.com/zipcodes/
If you don't mind a very crude solution, you could adapt the click-map here.
I am working on an application where one of the requirements is that I be able to perform realtime reverse geocoding operations based on GPS data. In particular, I must be able to determine the state/province to which a latitude, longitude pair maps and detect when we have moved from one state/province to another.
I have a couple of ideas so far but wondered if anyone had thoughts on either of the following:
What is the best approach for tackling this problem in an efficient manner?
Where is a good place to find North American state/province boundaries, and what is an appropriate format for them?
As a starter, here are the two main ideas I have:
Break North America into a grid with each rectangle in the grid mapping to a particular state province. Do a lookup on this table (which grows quickly the more precise you would like to be) based on the latitude and then the longitude (or vice versa).
Define polygons for each of the states and do some sort of calculation to determine in which polygon a lat/lon pair lies. I am not sure exactly how to go about this. HTML image maps come to mind as one way of defining the bounds for a state/province.
I am working in Python, for those interested or those who might have a nice library they would like to suggest.
To be clear... I do not have web access available to me, so using an existing reverse geocoding service is not an option at runtime.
I created an offline reverse geocoding module for countries: https://github.com/richardpenman/reverse_geocode
>>> import reverse_geocode
>>> coordinates = (-37.81, 144.96), (31.76, 35.21)
>>> reverse_geocode.search(coordinates)
[{'city': 'Melbourne', 'code': 'AU', 'country': 'Australia'},
{'city': 'Jerusalem', 'code': 'IL', 'country': 'Israel'}]
I will see if I can add data for states.
I suggest using a variant of your first idea: use a spatial index. A spatial index is a data structure built from rectangles, mapping lat/long to a payload. In this case you would probably map rectangles to state/province pairs. An R-tree may be a good option. Here's an R-tree Python package. You could detect roaming by comparing the results of consecutive searches.
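A minimal sketch with the rtree package, using hypothetical state bounding boxes as the indexed rectangles; in practice you would follow the R-tree hit(s) with an exact point-in-polygon test against the state boundary:

from rtree import index

# Hypothetical state bounding boxes (minx, miny, maxx, maxy) in lon/lat;
# real boundaries would come from whatever polygon source you settle on.
state_bboxes = {
    "Colorado": (-109.05, 37.0, -102.04, 41.0),
    "Wyoming": (-111.06, 41.0, -104.05, 45.0),
}

idx = index.Index()
names = []
for i, (name, bbox) in enumerate(state_bboxes.items()):
    idx.insert(i, bbox)
    names.append(name)

def candidate_states(lon, lat):
    # States whose bounding rectangle contains the point; follow up with an
    # exact point-in-polygon test to resolve overlaps near borders.
    return [names[i] for i in idx.intersection((lon, lat, lon, lat))]

# Detect roaming by comparing consecutive lookups.
previous = None
for lon, lat in [(-105.0, 39.7), (-105.5, 42.5)]:
    current = candidate_states(lon, lat)
    if previous is not None and current != previous:
        print("state change:", previous, "->", current)
    previous = current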
I would stay away from implementing your own solution from scratch. This is a pretty big undertaking and there are already tools out there to do this. If you're looking for an open source approach (read: free), take a look at this blog post: Using PostGIS to Reverse Geocode.
If you can get hold of state boundaries as polygons (for example, via OpenStreetMap), determining the current state is just a point-in-polygon test.
If you need address data, an offline solution would be to use Microsoft Mappoint.
You can get data for the entire United States from OpenStreetMap. You could then extract the data you need, such as city or state locations, into whatever format works best for your application. Note that although data quality is good, it isn't guaranteed to be completely accurate, so if you need complete accuracy you may have to look elsewhere.
I have a database with all of this data and some access tools. I made mine from the Census TIGER data. I imagine it'd basically be an export of my database to SQLite and a bit of code translation.
The free reverse geocoding service I developed (www.feroeg.com) is based on SpatiaLite, an SQLite extension implementing SQL spatial capabilities (R-tree).
The data are imported from OpenStreetMap (nations, cities, streets, street numbers) and OpenAddresses (street numbers) using proprietary tools.
The data for the entire world consumes about 250 GB.
There is a paper describing the architecture of the service:
https://feroeg.com/Feroeg_files/Feroeg Presentation.pdf
At the moment the project (importer and server) is closed source.
The reverse geocoding library (C++) and conversion tools are available on request.