I have a lot of UK data and what I would like to do is extract this data based upon a post code, co-ordinates, grid ref etc.
Is this possible using Python?
Yes. If you just have the postcodes, you'll most likely need to convert them to coordinates first. This can be done with third-party tools such as Google's Geocoding API or the Royal Mail's UK postcode data. Once you have coordinates, you can plot them however you like using other tools such as Highcharts, or make your own.
Related
Let's say I want to create maps of crime, education, traffic, etc. for a street or a city. Which modules do I need to learn, and which are the best ones?
For some data, I will be using Excel-like documents containing street names or building numbers that aren't linked to Google Maps directly; they will be combined later through code. For other data, I want to obtain it directly from Google Maps, such as the names of stores or street numbers. I'm a beginner and a sociologist, and this is the main reason I want to learn programming. Painting on a map image might be a lot easier, but in the long term my aim is to use Google Maps, since it can obtain data by itself. Thanks in advance.
I'm a beginner and need a long-term plan and some advice. I've watched some NumPy and pandas videos, and they seem OK and doable so far.
There are several Python modules that can be used to work with Google Maps data. Some of the most popular ones include:
Google Maps API: This is the official API for working with Google Maps data. It allows you to access a wide range of data, including street maps, satellite imagery, and places of interest. You can use the API to search for addresses, get directions, and even create custom maps.
gmaps: This is a Python wrapper for the Google Maps API. It makes it easy to work with the API by providing a simple, Pythonic interface. It also includes support for several popular Python libraries, such as Pandas and Numpy.
folium: This is a library for creating leaflet maps in Python. It allows you to create interactive maps, add markers and other data, and customize the appearance of your maps.
geopandas: This library allows you to work with geospatial data in Python. It is built on top of the popular Pandas library and includes support for working with shapefiles, geojson, and more.
geopy: This is a Python library for working with geocoding and distance calculations. It can be used to convert addresses to latitude and longitude coordinates, as well as to perform distance calculations between two points.
In general, it's recommended to start with the Google Maps API, gmaps, and folium; you can add geopandas and geopy later when you need more advanced functionality. Start with simple examples and gradually increase the complexity of your projects.
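To give a flavour of the distance calculations geopy performs, here is a standard-library-only haversine sketch; the London and Manchester coordinates are hard-coded for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Great-circle distance from London to Manchester
print(f"{haversine_km(51.5074, -0.1278, 53.4808, -2.2426):.0f} km")
```

In practice geopy's `geopy.distance` module does this (and more accurate geodesic formulas) for you; the sketch just shows what's under the hood.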
I have been doing some investigation to find a package to install and use for Geospatial Analytics
The closest I got was https://github.com/harsha2010/magellan - this, however, only has a Scala interface and no documentation on how to use it with Python.
I was hoping someone knows of a package I can use?
What I am trying to do is analyse Uber's data, map it to the actual postcodes/suburbs, and run it through SGD to predict the number of trips to a particular suburb.
There is already lots of data info here - http://hortonworks.com/blog/magellan-geospatial-analytics-in-spark/#comment-606532 and I am looking for ways to do it in Python.
In Python I'd take a look at GeoPandas. It provides a data structure called GeoDataFrame: it's a list of features, each one having a geometry and some optional attributes. You can join two GeoDataFrames together based on geometry intersection, and you can aggregate the numbers of rows (say, trips) within a single geometry (say, postcode).
I'm not familiar with Uber's data, but I'd try to find a way to get it into a GeoPandas GeoDataFrame.
Likewise, postcodes can be downloaded from places like the U.S. Census, OpenStreetMap[1], etc., and coerced into a GeoDataFrame.
Join #1 to #2 based on geometry intersection. You want a new GeoDataFrame with one row per Uber trip, but with the postcode attached to each. Another StackOverflow post discusses how to do this, and it's currently harder than it ought to be.
Aggregate this by postcode and count the trips in each. The code will look like joined_dataframe.groupby('postcode').count().
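The aggregation in that last step can be sketched with the standard library alone; the trip records below are hypothetical stand-ins for the output of the spatial join:

```python
from collections import Counter

# Hypothetical output of the spatial join: one record per trip,
# each tagged with the postcode whose polygon contained it.
joined_trips = [
    {"trip_id": 1, "postcode": "2000"},
    {"trip_id": 2, "postcode": "2000"},
    {"trip_id": 3, "postcode": "2010"},
]

# Plain-Python equivalent of joined_dataframe.groupby('postcode').count()
trips_per_postcode = Counter(t["postcode"] for t in joined_trips)
print(dict(trips_per_postcode))  # {'2000': 2, '2010': 1}
```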
My fear for the above process is if you have hundreds of thousands of very complex trip geometries, it could take forever on one machine. The link you posted uses Spark and you may end up wanting to parallelize this after all. You can write Python against a Spark cluster(!) but I'm not the person to help you with this component.
Finally, for the prediction component (e.g. SGD), check out scikit-learn: it's a pretty fully featured machine learning package, with a dead simple API.
[1]: There is a separate package called geopandas_osm that grabs OSM data and returns a GeoDataFrame: https://michelleful.github.io/code-blog/2015/04/27/osm-data/
I realize this is an old question, but I want to build on Jeff G's answer.
If you arrive at this page looking for help putting together a suite of geospatial analytics tools in python - I would highly recommend this tutorial.
https://geohackweek.github.io/vector
It really picks up steam in the 3rd section.
It shows how to integrate
GeoPandas
PostGIS
Folium
rasterstats
Add in scikit-learn, numpy, and scipy and you can really accomplish a lot. You can grab information from this nDarray tutorial as well.
I am wondering, is there a way to dump values from a grib1 file? My end goal is to find values for individual messages at latitude and longitude, or at least a grid point. I am using a linux system. Wgrib seems to do nothing except read metadata about the messages, or reconstruct the messages.
I know a bit of python, so I can use pygrib, but I don't know how to pull the values out for a specific latitude and longitude.
Here are some .grb files for everyone to play around with.
http://nomads.ncdc.noaa.gov/data/gfs-avn-hi/201402/20140218/
Thank you for your answers,
If you are interested in data from NOMADS, I would suggest going through their THREDDS Data Server, which will allow you to access data by specifying a lat/lon, and you can get that data back as a csv file, if you wish. To do so, first visit the NOMADS TDS site:
http://nomads.ncdc.noaa.gov/thredds/catalog/catalog.html
The example data files you linked to can be found here:
http://nomads.ncdc.noaa.gov/thredds/catalog/gfs-003/201402/20140218/catalog.html
Find a grid that you are interested in, say the 18Z run analysis field:
http://nomads.ncdc.noaa.gov/thredds/catalog/gfs-003/201402/20140218/catalog.html?dataset=gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb
Follow the link that says "NetcdfService":
http://nomads.ncdc.noaa.gov/thredds/ncss/grid/gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb/dataset.html
Near the top of that page, click "As Point Dataset":
http://nomads.ncdc.noaa.gov/thredds/ncss/grid/gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb/pointDataset.html
Then check the parameters you are interested in, the lat/lon (the closest grid point to that lat/lon will be chosen), and the output format type.
This web interface basically generates an access URL; for example, if I want temperature over Boulder, CO, returned as CSV, it looks like this:
http://nomads.ncdc.noaa.gov/thredds/ncss/grid/gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb?var=Temperature_surface&latitude=40&longitude=-105&temporal=all&accept=csv&point=true
As you can see from the above URL, you can generate these pretty easily and make the request without going through all the steps above.
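These access URLs are easy to assemble with the standard library; the dataset path and parameters below are the ones from the Boulder example:

```python
from urllib.parse import urlencode

# Dataset path on the NOMADS TDS (from the steps above).
base = ("http://nomads.ncdc.noaa.gov/thredds/ncss/grid/"
        "gfs-003/201402/20140218/gfs_3_20140218_1800_000.grb")

params = {
    "var": "Temperature_surface",
    "latitude": 40,       # the closest grid point to this lat/lon is chosen
    "longitude": -105,
    "temporal": "all",
    "accept": "csv",
    "point": "true",
}
url = base + "?" + urlencode(params)
print(url)
```

Fetching `url` with `urllib.request.urlopen(url).read()` then returns the CSV body, assuming the server is reachable.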
This access method (NetcdfSubsetService) can be combined with Python easily. For an example, check out this ipython notebook:
http://nbviewer.ipython.org/github/Unidata/tds-python-workshop/blob/master/ncss.ipynb
Specifically, the first and second cells in the notebook.
Note that you can get recent GFS data, in which an entire model run is contained within one grib file, at:
http://thredds.ucar.edu/thredds/idd/modelsNcep.html
This would allow you to make a request, like the one above, but for multiple times using one request.
Cheers,
Sean
You can use grib tools, specifically grib_ls and grib_get, to get values from the 1 or 4 grid points nearest to a specified latitude and longitude. You can then use nearest-neighbour or bilinear interpolation, or whatever you like, to get your value.
Read this presentation, grib_ls starts at page 31:
http://nwmstest.ecmwf.int/services/computing/training/material/grib_api/grib_api_tools.pdf
When you install grib tools, you will get several tools to help you play around with GRIB files.
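If you read the file with pygrib instead (as mentioned in the question), the nearest-grid-point lookup reduces to a nearest-index search along each axis. The grid axes and values below are made up for illustration; with a real file, `msg.latlons()` and `msg.values` supply the arrays:

```python
# Hypothetical regular grid: latitude/longitude axes plus a 2-D value array,
# standing in for what pygrib would give you for one message.
lats = [50.0, 50.5, 51.0, 51.5]
lons = [-2.0, -1.5, -1.0, -0.5]
values = [[i + j / 10 for j in range(4)] for i in range(4)]  # values[lat_idx][lon_idx]

def nearest_index(axis, target):
    """Index of the axis point closest to target (nearest neighbour)."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - target))

i = nearest_index(lats, 51.2)   # nearest latitude is 51.0
j = nearest_index(lons, -1.6)   # nearest longitude is -1.5
print(values[i][j])
```

Bilinear interpolation would instead take the four surrounding grid points and weight them by distance.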
I have a large collection (and growing) of geospatial data (lat, lon) points (stored in mongodb, if that helps).
I'd like to generate a choropleth map (http://vis.stanford.edu/protovis/ex/choropleth.html), which requires knowing the state that contains each point. Is there a database or algorithm that can do this without requiring calls to external APIs? (I'm aware of things like geopy and the Google Maps API.)
Actually, the web app you linked to contains the data you need -
If you look at http://vis.stanford.edu/protovis/ex/us_lowres.js for each state, borders[] contains a [lat,long] polyline which outlines the state. Load this data and check for point-in-polygon - http://en.wikipedia.org/wiki/Point_in_polygon
Per Reverse Geocoding Without Web Access you can speed it up a lot by pre-calculating a bounding box on each state and only testing point-in-polygon if point-in-bounding-box.
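Both checks fit in a few lines of plain Python; the square below is a toy polygon standing in for a state outline:

```python
def in_bbox(pt, poly):
    """Cheap pre-check: is the point inside the polygon's bounding box?"""
    x, y = pt
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)

def in_polygon(pt, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        # Count edge crossings of a horizontal ray extending left from pt.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(in_bbox((2, 2), square) and in_polygon((2, 2), square))  # True
```

With real state outlines you would run `in_bbox` against every state first and only call `in_polygon` on the survivors.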
Here's how to do it in FORTRAN. Remember FORTRAN? Me neither. Anyway, it looks pretty simple, as every state has its own range.
EDIT: It's been pointed out to me that your starting point is lat/long, not the zipcode.
The algorithm for converting a lat-long to a political division is called "a map". Seriously, that's all an ordinary map is: a mapping of every point in some range to the division it belongs to. A detailed digital map of all 48 contiguous states would be a big database, and you would then need some (fairly simple) code to determine, for each state (described as a series of line segments outlining its border), whether a given point is inside it or out.
You can try using the Geonames database. It has long/lat as well as city, postal, and other location-type data. It is free as well.
But if you need to host it locally or import it into your own database, the USGS and NGA provide a comprehensive list of cities with lat/lon. It's updated regularly, free, and reliable.
http://geonames.usgs.gov
http://earth-info.nga.mil/gns/html/index.html
Not sure the quality of the data, but give this a shot: http://www.boutell.com/zipcodes/
If you don't mind a very crude solution, you could adapt the click-map here.
I am working on an application where one of the requirements is that I be able to perform realtime reverse geocoding operations based on GPS data. In particular, I must be able to determine the state/province to which a latitude, longitude pair maps and detect when we have moved from one state/province to another.
I have a couple ideas so far but wondered if anyone had any ideas on either of the following:
What is the best approach for tackling this problem in an efficient manner?
Where is a good place to find North American state/province boundaries, and what is an appropriate format for them?
As a starter, here are the two main ideas I have:
Break North America into a grid with each rectangle in the grid mapping to a particular state province. Do a lookup on this table (which grows quickly the more precise you would like to be) based on the latitude and then the longitude (or vice versa).
Define polygons for each of the states and do some sort of calculation to determine in which polygon a lat/lon pair lies. I am not sure exactly how to go about this. HTML image maps come to mind as one way of defining the bounds for a state/province.
I am working in python for the interested or those that might have a nice library they would like to suggest.
To be clear... I do not have web access available to me, so using an existing reverse geocoding service is not an option at runtime
I created an offline reverse geocoding module for countries: https://github.com/richardpenman/reverse_geocode
>>> import reverse_geocode
>>> coordinates = (-37.81, 144.96), (31.76, 35.21)
>>> reverse_geocode.search(coordinates)
[{'city': 'Melbourne', 'code': 'AU', 'country': 'Australia'},
{'city': 'Jerusalem', 'code': 'IL', 'country': 'Israel'}]
I will see if I can add data for states.
I suggest using a variant of your first idea: Use a spatial index. A spatial index is a data structure built from rectangles, mapping lat/long to the payload. In this case you will probably map rectangles to state-province pairs. An R-tree may be a good option. Here's an R-tree python package. You could detect roaming by comparing the results of consecutive searches.
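A brute-force stand-in for such an index can be sketched as follows. The two bounding boxes are real (Colorado and Wyoming happen to be lat/lon rectangles), but in general a box hit only yields candidates that still need a point-in-polygon test:

```python
# Simplified spatial index: each entry maps a
# (min_lon, min_lat, max_lon, max_lat) bounding box to a state code.
# A real R-tree organises these boxes hierarchically for fast lookup.
BOXES = [
    ((-109.05, 36.99, -102.04, 41.00), "CO"),  # Colorado
    ((-111.05, 40.99, -104.05, 45.00), "WY"),  # Wyoming
]

def candidate_states(lon, lat):
    """Return every state whose bounding box contains the point."""
    return [state for (w, s, e, n), state in BOXES
            if w <= lon <= e and s <= lat <= n]

print(candidate_states(-105.0, 40.0))  # ['CO']
```

Roaming detection then falls out of comparing the results of consecutive lookups.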
I would stay away from implementing your own solution from scratch. This is a pretty big undertaking and there are already tools out there to do this. If you're looking for an open source approach (read: free), take a look at this blog post: Using PostGIS to Reverse Geocode.
If you can get hold of state boundaries as polygons (for example, via OpenStreetMap), determining the current state is just a point-in-polygon test.
If you need address data, an offline solution would be to use Microsoft Mappoint.
You can get data for the entire United States from OpenStreetMap. You could then extract the data you need, such as city or state locations, into whatever format works best for your application. Note that although the data quality is good, it isn't guaranteed to be completely accurate, so if you need complete accuracy you may have to look elsewhere.
I have a database with all of this data and some access tools. I made mine from the Census TIGER data. I imagine it'd basically be an export of my database to SQLite and a bit of code translation.
The free reverse geocoding service I developed (www.feroeg.com) is based on spatialite, a sqlite library implementing SQL spatial capabilities (r-tree).
The data is imported from OpenStreetMap (nations, cities, streets, street numbers) and OpenAddresses (street numbers) using proprietary tools.
The entire world consumes about 250 GB.
There is a paper describing the architecture of the service:
https://feroeg.com/Feroeg_files/Feroeg Presentation.pdf
At the moment the project (importer and server) is closed source.
The reverse geocoding library (C++) and conversion tools are available on request.