How to find geographically near documents in Redis, like $near in MongoDB? - python

In MongoDB, if we provide a coordinate and a distance, the $near operator finds the documents within the provided distance of that point, sorted by distance to it.
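For reference, roughly what that query looks like with pymongo (the collection and field names here are invented for illustration; $near requires a 2dsphere index):

    from pymongo import MongoClient

    client = MongoClient()
    places = client.mydb.places                 # db/collection names are made up
    places.create_index([("loc", "2dsphere")])  # $near needs a geospatial index
    cursor = places.find({
        "loc": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [-73.97, 40.77]},  # [lng, lat]
                "$maxDistance": 2000,  # meters
            }
        }
    })
    for doc in cursor:
        print(doc)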
Does Redis provide similar functions?

For the people who come across this old question now:
Redis now has geographical commands such as GEOADD and GEORADIUS, which cover this question's requirements.
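A minimal sketch with redis-py (the key and member names are invented; newer Redis versions also offer GEOSEARCH):

    import redis

    r = redis.Redis()
    # GEOADD key longitude latitude member
    r.geoadd("bars", (13.361389, 38.115556, "bar:1"))
    r.geoadd("bars", (15.087269, 37.502669, "bar:2"))

    # GEORADIUS: members within 200 km of (15, 37), nearest first, with distances
    nearby = r.georadius("bars", 15.0, 37.0, 200, unit="km", withdist=True, sort="ASC")
    print(nearby)  # e.g. [[b'bar:2', 56.44], [b'bar:1', 190.45]]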

Noelkd was right; there is no built-in function in Redis.
I found that the simplest solution is to use a geohash of the lat/lng as the key.
Geohash encodes nearby locations with similar prefixes, e.g.
if the hash of a certain location is ebc8ycq, then nearby locations can be queried with the wildcard ebc8yc* in Redis.
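A sketch of that idea, assuming the python-geohash package and redis-py (the key layout is my own invention):

    import geohash  # python-geohash; pygeohash offers a similar encode()
    import redis

    r = redis.Redis()

    def add_place(place_id, lat, lng):
        # store the place under a geohash-prefixed key, e.g. "geo:ebc8ycq:42"
        h = geohash.encode(lat, lng, precision=7)
        r.set("geo:%s:%s" % (h, place_id), place_id)

    def places_near(lat, lng, precision=6):
        # truncate the hash to widen the search box, then match by prefix
        prefix = geohash.encode(lat, lng, precision=7)[:precision]
        return [key for key in r.scan_iter(match="geo:%s*" % prefix)]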

Related

Django and location based app

I'm building a Django app that needs to display (on a Google map) items near the user's current location (i.e. the "nearest coffee shop" kind of request).
I haven't decided how I will store those items' locations in the db.
The requirements aren't final, but I'm guessing the user will want to sort by distance, etc.
My question is: what solution should I look into?
I looked at GeoDjango, but it seems to be overkill.
Could you please tell me what solution you would use and why?
Thanks everyone!
You will need to use an RDBMS like MySQL or PostgreSQL. Create your objects (e.g. coffee shops) with latitude and longitude stored as floats, then get the user's latitude and longitude and look nearby objects up via sin and cos functions.
You will need to write a raw SQL query to look up objects based on their latitude and longitude; a sketch follows the links below.
Read this: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
Take a look at this: Filter zipcodes by proximity in Django with the Spherical Law of Cosines
and this: latitude/longitude find nearest latitude/longitude - complex sql or complex calculation
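For concreteness, a sketch of such a raw query from Django, using the spherical law of cosines (the table and column names are invented; the HAVING-on-alias syntax is MySQL-flavored):

    from django.db import connection

    def shops_near(lat, lng, radius_km=5, limit=20):
        # 6371 is the Earth's radius in km; the ACOS expression is the
        # spherical law of cosines for great-circle distance
        sql = """
            SELECT id, name,
                   6371 * ACOS(
                       COS(RADIANS(%s)) * COS(RADIANS(lat)) *
                       COS(RADIANS(lng) - RADIANS(%s)) +
                       SIN(RADIANS(%s)) * SIN(RADIANS(lat))
                   ) AS distance_km
            FROM myapp_coffeeshop
            HAVING distance_km < %s
            ORDER BY distance_km
            LIMIT %s
        """
        with connection.cursor() as cursor:
            cursor.execute(sql, [lat, lng, lat, radius_km, limit])
            return cursor.fetchall()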

How would I sort a set of lat/lons by distance from a single lat/lon?

A user signs up for my site and enters in their zip code. I want to query for other users, and sort by distance.
I have a database full of zip codes with lat/lon points for each zip code.
zip_code (char)
lat (float)
lon (float)
I have a method which will calculate the distance between two lat/lon pairs, but running it against every other zip code in my db is expensive; I'd need to run it on every zip code combination. I suppose I could do it once and store it somewhere, but where would I store it? It seems strange to have a table for every zip code containing the distance to every other zip code. Is there a clean way to do this?
Doing it once and storing it somewhere sounds good to me. Here are some ideas that might give good performance and reasonable storage space without sacrificing accuracy:
There are something like 43,191 zip codes, so the full distance matrix would have 43,191² = 1,865,462,481 entries. But the distances are of course symmetric and the self-to-self ones are useless, which immediately cuts it down to 932,709,645 entries. We might also cut the space by realizing that a bunch of zip codes are either the same as each other, or one contains the other (e.g. 10178 seems to be inside 10016, and they're both geographically small). Many zip codes will have no users at all, so we might avoid populating those until they're needed (i.e. lazy-load the cache). And finally, you can probably throw away large-distance results, where "large" is defined as a distance greater than is useful for your users.
For a more algorithmic view, see this previous question: Calculate distance between zip codes and users
Bonus tip: don't forget about non-US users. Poor non-US users.
Here's a solution with a fair amount of overhead, but which will pay off as your dataset size, user base, and/or number of transactions grow:
If you don't already have one, use a database that supports spatial types and spatial indexing. I recommend the PostGIS extension for PostgreSQL, but most of these steps apply to other spatially-enabled databases:
Store your zip code location as a Point geometry type instead of two columns for lat and long.
Create a spatial index against the Point geometry column. Every time you add a new zip code, its location will automatically be added to the spatial index.
Assuming you don't want to show "nearest" neighbors that are thousands of miles away, use a Within function (ST_DWithin in PostGIS) to filter out those zip codes that are too far away. This will significantly reduce the search space for close neighbors.
Finally use a Distance function (ST_Distance in PostGIS) to calculate the distance between your zip code of interest and its closer neighbors, and use the DB to return results sorted by distance.
By using a database with spatial index and a filtering function that uses that index, you can significantly speed up your search. And when the time comes to do more spatial analysis or show maps, you'll already have a framework in place to support that new functionality.
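A sketch of those steps with psycopg2, assuming a zip_codes table with a geography(Point) column named geom (with the geography type, ST_DWithin and ST_Distance work in meters):

    import psycopg2

    conn = psycopg2.connect("dbname=geo")  # connection details assumed
    with conn.cursor() as cur:
        cur.execute("""
            SELECT z.code, ST_Distance(z.geom, ref.geom) AS meters
            FROM zip_codes z,
                 (SELECT geom FROM zip_codes WHERE code = %s) AS ref
            WHERE z.code <> %s
              AND ST_DWithin(z.geom, ref.geom, %s)  -- uses the spatial index
            ORDER BY meters
        """, ("10016", "10016", 50000))             # neighbors within 50 km
        for code, meters in cur.fetchall():
            print(code, round(meters))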

Best data structure for finding nearby xyz points by distance?

I'm working in Python, but I suppose that doesn't affect the question itself.
I'm working on a game, and I need to store entities, each of which has an [x, y, z] position in the world. I need to be able to run an "all entities within X Euclidean distance of point Y" query.
These entities will be moving fairly often.
What would be the most efficient way to store the entities to make this as fast as possible?
As an alternative to what has been suggested already, if you don't need an exact distance, you could also use spatial hashing, which is quite easy to implement.
In summary, think of your world as a grid where each cell corresponds to one bucket in a hash table. Since your entities move often, on each new frame you can clear and rebuild the whole table, putting each entity into the bucket matching its position. Then, for any given entity, you just check the nearby cells and collect their entity lists.
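A minimal sketch of that grid, with the cell size and entity format invented for illustration:

    from collections import defaultdict

    CELL = 10.0  # cell edge length; tune to your typical query radius

    def cell_of(pos):
        return tuple(int(c // CELL) for c in pos)

    def build(entities):
        # rebuilt every frame; entities are (id, (x, y, z)) pairs
        grid = defaultdict(list)
        for eid, pos in entities:
            grid[cell_of(pos)].append((eid, pos))
        return grid

    def nearby(grid, point, radius):
        cx, cy, cz = cell_of(point)
        reach = int(radius // CELL) + 1  # how many cells to scan outward
        hits = []
        for dx in range(-reach, reach + 1):
            for dy in range(-reach, reach + 1):
                for dz in range(-reach, reach + 1):
                    for eid, (x, y, z) in grid.get((cx + dx, cy + dy, cz + dz), []):
                        # drop this check for the faster, approximate variant
                        if (x - point[0])**2 + (y - point[1])**2 + (z - point[2])**2 <= radius**2:
                            hits.append(eid)
        return hits

For example, grid = build([(1, (5.0, 5.0, 5.0)), (2, (95.0, 95.0, 95.0))]) followed by nearby(grid, (0.0, 0.0, 0.0), 12.0) returns [1].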
You can use a k-d tree (the link has a photo, code, and examples) or an octree (that link is a C++ class template which you can use). Real-world usage can be seen in this open-source game engine.
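If SciPy is an option, a k-d tree radius query is only a few lines (the array layout here is invented; rebuild or update the tree when entities move):

    import numpy as np
    from scipy.spatial import cKDTree

    positions = np.random.rand(1000, 3) * 100  # 1000 entities in a 100^3 world
    tree = cKDTree(positions)
    idx = tree.query_ball_point([50.0, 50.0, 50.0], r=10.0)  # indices within 10 units
    print(len(idx), "entities within 10 units")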

How to convert from lat lon to zipcode or state to generate choropleth map

I have a large (and growing) collection of geospatial (lat, lon) points (stored in MongoDB, if that helps).
I'd like to generate a choropleth map (http://vis.stanford.edu/protovis/ex/choropleth.html), which requires knowing which state contains each point. Is there a database or algorithm that can do this without calling external APIs? (I'm aware of things like geopy and the Google Maps API.)
Actually, the web app you linked to contains the data you need.
If you look at http://vis.stanford.edu/protovis/ex/us_lowres.js, for each state borders[] contains a [lat,long] polyline which outlines the state. Load this data and check for point-in-polygon: http://en.wikipedia.org/wiki/Point_in_polygon
Per Reverse Geocoding Without Web Access, you can speed it up a lot by pre-calculating a bounding box for each state and only testing point-in-polygon when the point is inside the bounding box.
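A sketch of that combination: a cheap bounding-box rejection first, then the even-odd ray-casting point-in-polygon test. The states dict mapping a state name to its border polyline as (lat, lon) pairs is an assumed data format:

    def in_bbox(pt, poly):
        lats = [p[0] for p in poly]
        lons = [p[1] for p in poly]
        return min(lats) <= pt[0] <= max(lats) and min(lons) <= pt[1] <= max(lons)

    def in_polygon(pt, poly):
        # standard even-odd ray casting over the polygon's edges
        x, y = pt
        inside = False
        j = len(poly) - 1
        for i in range(len(poly)):
            xi, yi = poly[i]
            xj, yj = poly[j]
            if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
                inside = not inside
            j = i
        return inside

    def state_of(pt, states):
        for name, poly in states.items():
            if in_bbox(pt, poly) and in_polygon(pt, poly):
                return name
        return None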
Here's how to do it in FORTRAN. Remember FORTRAN? Me neither. Anyway, it looks pretty simple, as every state has its own range.
EDIT: It's been pointed out to me that your starting point is lat/long, not the zip code.
The algorithm for converting a lat/long to a political division is called "a map". Seriously, that's all an ordinary map is: a mapping of every point in some range to the division it belongs to. A detailed digital map of all 48 contiguous states would be a big database, and then you would need some (fairly simple) code to determine, for each state (described as a series of line segments outlining the border), whether a given point is inside it or not.
You can try the GeoNames database. It has long/lat as well as city, postal, and other location-type data, and it's free.
If you need to host it locally or import it into your own database, the USGS and NGA provide a comprehensive list of cities with lat/lon. It's updated regularly, free, and reliable.
http://geonames.usgs.gov
http://earth-info.nga.mil/gns/html/index.html
Not sure about the quality of the data, but give this a shot: http://www.boutell.com/zipcodes/
If you don't mind a very crude solution, you could adapt the click-map here.

How to best design a date/geographic proximity query on GAE?

I'm building a directory for finding athletic tournaments on GAE with web2py and a Flex front end. The user selects a location, a radius, and a maximum date from a set of choices. I have a basic version of this query implemented, but it's inefficient and slow. One way I know I can improve it is by condensing the many individual queries I'm using to assemble the objects into bulk queries; I just learned that was possible. But I'm also thinking about a more extensive redesign that utilizes memcache.
The main problem is that I can't query the datastore by location because GAE won't allow numerical comparison filters (<, <=, >=, >) on more than one property in a single query. I'm already using one for date, and I'd need TWO more to check both latitude and longitude, so it's a no-go. Currently, my algorithm looks like this:
1.) Query by date and select.
2.) Use the destination function from geopy's distance module to find the max and min latitudes and longitudes for the supplied distance.
3.) Loop through the results and remove all with a lat/lng outside the max/min.
4.) Loop through again and use the distance function to check the exact distance, because step 2 will include some areas outside the radius; remove results outside the supplied distance (is this 2/3/4 combination inefficient?).
5.) Assemble many-to-many lists and attach them to the objects (this is where I need to switch to bulk operations).
6.) Return to the client.
Here's my plan for using memcache... let me know if I'm way out in left field on this, as I have no prior experience with memcache or server caching in general.
- Keep a list in the cache filled with "geo objects" that represent all my data. These have five properties: latitude, longitude, event_id, event_type (in anticipation of expanding beyond tournaments), and start_date. This list will be sorted by date.
- Also keep a dict of pointers in the cache which represent the start and end indices in the cache for all the date ranges my app uses (next week, 2 weeks, month, 3 months, 6 months, year, 2 years).
- Have a scheduled task that updates the pointers daily at 12am.
- Add new inserts to the cache as well as the datastore, and update the pointers.
Using this design, the algorithm would now look like:
1.) Use the pointers to slice off the appropriate chunk of the list based on the supplied date.
2-4.) Same as the algorithm above, except with geo objects.
5.) Use a bulk operation to select full tournaments using the remaining geo objects' event_ids.
6.) Assemble the many-to-manys.
7.) Return to the client.
Thoughts on this approach? Many thanks for reading and any advice you can give.
-Dane
GeoModel is the best I found. You can look at how my GAE app returns geospatial queries using the geomodel library; for instance, an HTTP query for India, with the optional cc (country code) parameter, is lat=20.2095231&lon=79.560344&cc=IN
You might be interested in geohash, which enables you to do an inequality query like this:

    SELECT latitude, longitude, title FROM myMarkers
    WHERE geohash >= :sw_geohash AND geohash <= :ne_geohash

Have a look at this fine article, which was featured in this month's Google App Engine Community Update blog post.
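For illustration, the two bounds could be computed like this, assuming the python-geohash package (the Marker model and field names in the comment are invented):

    import geohash  # python-geohash

    # south-west and north-east corners of the search box
    sw_geohash = geohash.encode(37.0, -123.0)
    ne_geohash = geohash.encode(38.0, -122.0)

    # GAE allows inequality filters on a single property, so both bounds
    # can go on one indexed geohash field, e.g. with the db API:
    #   Marker.all().filter('geohash >=', sw_geohash).filter('geohash <=', ne_geohash)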
As a note on your proposed design, don't forget that entities in memcache have no guarantee of staying in memory, and that you cannot rely on them remaining there "sorted by date".
