My Profile model has this field:
location = models.PointField(geography=True, dim=2, srid=4326)
I'd like to calculate the distance between the two of these locations (taking into account that the Earth is a spheroid) using GeoDjango, so that I can store this distance in the database.
How can I calculate this distance with GeoDjango?
What units are the results in?
Is there a 'best' way to store this data? Float? Decimal?
I've reviewed previous, similar questions, and haven't found them useful. No answer gives enough explanation of what's happening or why it works.
I'm using Django 1.8 and the latest versions of required libraries for GeoDjango.
Thanks!
Following Abhyudit Jain's comment, I'm using geopy to calculate distance. I'm adding it as a property as opposed to storing it, as per e4c5's advice:
from django.contrib.gis.measure import Distance, D
from geopy.distance import distance as geopy_distance

@property
def distance(self):
    # geopy expects (latitude, longitude) pairs, while a GeoDjango Point stores (x, y) = (lon, lat)
    a = self.user_a.profile.location
    b = self.user_b.profile.location
    return Distance(m=geopy_distance((a.y, a.x), (b.y, b.x)).meters)
Geopy defaults to Vincenty's formulae (its simpler great-circle alternative can have an error of up to about 0.5%), and contains a lot of other functionality I'll use in the future.
The above returns a GeoDjango Distance object, ready for easy conversion between measurements.
Thanks for the help!
How can I calculate this distance with GeoDjango?
For two objects:
a.location.distance(b.location)
Suppose you have an object a which is an instance of your Profile model and you wish to find the distance to every other profile. You can perform the following query, as described in the GeoDjango reference:
for profile in Profile.objects.distance(a.location):
    print(profile.distance)
If you only want to compare with objects that are less than 1 km away:
from django.contrib.gis.measure import D

for profile in Profile.objects.filter(location__dwithin=(a.location, D(km=1))).distance(a.location):
    print(profile.distance)
What units are the results in?
The unit can be whatever you want it to be; what's returned is a Distance object. However, the default is meters, and that's what the print statement above will display.
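For example, a minimal sketch of the unit handling using Django's measure API (the numbers are just illustrative):

from django.contrib.gis.measure import Distance

d = Distance(m=1500)
print(d.km)  # 1.5
print(d.mi)  # roughly 0.93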
Is there a 'best' way to store this data? Float? Decimal?
The best way is not to save it. Typically one does not save in a database what can be calculated by a simple query, and the number of records would grow quadratically: if you have N profiles in your database, each has a distance to N-1 other profiles, so you end up with N(N-1) records in your 'cache table'.
To get the distance computed in a GeoQuerySet, you can combine annotate() with the GeoDjango Distance database function (not to be confused with the Distance measure):
from django.contrib.gis.db.models.functions import Distance
queryset = Profile.objects.annotate(
    distance=Distance('location', a.location)
)
The annotated distance will be a Distance measure, meaning you can do the following:
for profile in queryset:
    print(profile.distance.mi)  # or km, etc.
To filter for profiles within a certain radius you can add a filter to the QuerySet.
from django.contrib.gis.db.models.functions import Distance as DistanceDBFunction
from django.contrib.gis.measure import Distance as DistanceMeasure
queryset = Profile.objects.annotate(
    distance=DistanceDBFunction('location', a.location)
).filter(
    distance__lt=DistanceMeasure(mi=1)
)
If you do not need the annotated distance, you can simply use the distance lookups.
from django.contrib.gis.measure import Distance
queryset = Profile.objects.filter(
    location__distance_lt=(a.location, Distance(mi=1))
)
Note: Profile.objects.distance(a.location), as used in other answers, has been deprecated since Django 1.9.
Looking through the documentation:
https://scikit-criteria.readthedocs.io/en/latest/tutorial/quickstart.html
rank.e_.similarity shows the similarity index for the top case, but the documentation does not specify how to get that value for other cases. My project needs to show the index for the top 10 cases of an experiment.
I tried simple indexing, but without knowing how the object is set up I cannot retrieve the value for other cases. The values must exist somewhere in the backend, since they are needed to determine the TOPSIS best case.
The vector stored in rank.e_.similarity contains the similarity values for each of the alternatives.
Thus rank.e_.similarity[0] has the similarity index for the alternative rank.alternatives[0].
This value is calculated with the formula:
similarity = d_worst / (d_better + d_worst)
Where:
d_worst is the distance to the anti-ideal alternative.
d_better is the distance to the ideal alternative.
As of scikit-criteria 0.6 the distance metric is configurable. By default the Euclidean distance is used.
For more details regarding the ideal and anti-ideal points I recommend reading the TOPSIS wikipedia page https://en.wikipedia.org/wiki/TOPSIS
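For example, assuming rank is the result of evaluating a decision matrix with TOPSIS as in the quickstart, a minimal sketch to list the top 10 cases by similarity (the sorting code is mine, not part of scikit-criteria):

import numpy as np

# `rank` is assumed to come from TOPSIS().evaluate(dm) as in the question
sim = np.asarray(rank.e_.similarity)   # one similarity value per alternative
order = np.argsort(sim)[::-1]          # indices sorted from most to least similar
for idx in order[:10]:                 # top 10 cases
    print(rank.alternatives[idx], sim[idx])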
All,
I have been working on an index of all MTB trails worldwide. I'm a Python person so for all steps involved I try to use Python modules.
I was able to grab relations from the OSM overpass API like this:
from OSMPythonTools.overpass import Overpass
overpass = Overpass()
def fetch_relation_coords(relation):
    rel = overpass.query('rel(%s); (._;>;); out;' % relation)
    return rel
rel = fetch_relation_coords("6750628")
I'm choosing this particular relation (6750628) because it is one of several that result in discontinuous (or otherwise erroneous) plots.
I process the "rel" object to get a pandas.DataFrame like this:
import pandas as pd

elements = pd.DataFrame(rel.toJSON()['elements'])
"elements" looks like this:
The Elements pandas.DataFrame contains rows of the types "relation" (1 in this case), several of the type "way" and many of the type "node". It was my understanding that I would use the "relation" row, "members" column to extract the order of the ways (which point to the nodes), and use that order to make a list of the latitudes and longitudes of the nodes (for later use in leaflet), in the correct order, that is, the order that leads to continuous path on a map.
However, that is not the case. For this particular relation, I end up with the following plot:
If we compare that with the way the relation is displayed on openstreetmap.org itself, we see that it goes wrong (focus on the middle, eastern part of the trail). I have many examples of this happening, although there are also a lot of relations that do display correctly.
So I was wondering, what am I missing? Are there nodes with tags that need to be ignored? I already tried several things, including leaving out nodes with any tags, this does not help. Somewhere my processing is wrong but I don't understand where.
You need to sort the ways inside the relation yourself. Only a few relation types require sorted members, for example some route relations such as route=bus and route=tram. Others may have sorted members, such as route=hiking, route=bicycle etc., but they don't require them. Various other relations, such as boundary relations (type=boundary), usually don't have sorted members.
I'm pretty sure there are already various tools for sorting relation members; obviously this includes the openstreetmap.org website, where this relation is shown correctly. Unfortunately I'm not able to point you to these tools, but I guess a little bit of research will reveal them.
If I opt to just plot the different ways on top of each other, I indeed get a continuous plot (index contains the indexes of all nodes per way):
In the database I would have preferred to have the nodes sorted anyway, because I could use them to make a GPX file on the fly. But I guess I did answer my own question with this approach; thank you @scai for tipping me in this direction.
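For reference, a rough sketch of that per-way plotting approach, assuming elements is the DataFrame built from rel.toJSON()['elements'] above (the matplotlib part is just one way to draw it):

import matplotlib.pyplot as plt

# nodes indexed by OSM id, ways with their ordered node id lists
nodes = elements[elements['type'] == 'node'].set_index('id')
ways = elements[elements['type'] == 'way']

for _, way in ways.iterrows():
    lats = [nodes.loc[nid, 'lat'] for nid in way['nodes']]
    lons = [nodes.loc[nid, 'lon'] for nid in way['nodes']]
    plt.plot(lons, lats)  # each way plotted as its own chain of segments
plt.show()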
You could have a look at shapely.ops.linemerge, which seems to be smart enough to chain multiple linestrings even if the directions are inconsistent. For example (adapted from here):
from shapely import geometry, ops
line_a = geometry.LineString([[0,0], [1,1]])
line_b = geometry.LineString([[1,0], [2,5], [1,1]]) # <- switch direction
line_c = geometry.LineString([[1,0], [2,0]])
multi_line = geometry.MultiLineString([line_a, line_b, line_c])
merged_line = ops.linemerge(multi_line)
print(merged_line)
# output:
LINESTRING (0 0, 1 1, 2 5, 1 0, 2 0)
Then you just need to make sure that the endpoints match exactly.
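If the endpoints only almost match (e.g. tiny floating point differences), one possible workaround is to round the coordinates before merging; a small sketch of that idea, continuing from the example above:

from shapely import geometry, ops

def snap_line(line, ndigits=7):
    # round coordinates so nearly identical endpoints become exactly identical
    return geometry.LineString([(round(x, ndigits), round(y, ndigits)) for x, y in line.coords])

snapped = [snap_line(l) for l in [line_a, line_b, line_c]]
merged = ops.linemerge(geometry.MultiLineString(snapped))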
I'm trying to cluster trajectories, but this is not easy.
The following stream data (spatio-temporal data) exists.
Here, we can see that each Object_ID has several x, y points, and these form a trajectory.
So I want to follow these points and get the following clusters:
I have already thought of many ways. For example, DBSCAN, TRACLUS, ...
But if I use DBSCAN, I do not know what to pass as the input. In other words, how do I pass each Object_ID's line as an input value? (In what form?)
Or should I first group the multiple coordinates of each Object_ID, like this?
object_1: [{x1, y1}, {x2, y2}, {x3, y3}, ... {xn, yn}],
object_2: [{x1, y1}, {x2, y2}, {x3, y3}, ... {xn, yn}],
object_3: [{x1, y1}, {x2, y2}, {x3, y3}, ... {xn, yn}],
.
.
.
And after I get the cluster results, each cluster must retain its Object information.
Does anyone know how to do this in R or Python?
DBSCAN has no particular requirements on the data type.
You just need to be able to compute distances.
So organize the data as necessary for your time series distance function.
I would try HAC first, then DBSCAN.
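For illustration, a minimal sketch of that idea using a pairwise trajectory distance matrix with scikit-learn's DBSCAN (the Hausdorff metric, the parameters and the data are my own choices, not something prescribed above):

import numpy as np
from scipy.spatial.distance import directed_hausdorff
from sklearn.cluster import DBSCAN

# made-up trajectories: one (n_points, 2) array of x, y per Object_ID
trajectories = {
    'object_1': np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    'object_2': np.array([[0.1, 0.0], [1.1, 1.0], [2.1, 2.0]]),
    'object_3': np.array([[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]]),
}
ids = list(trajectories)

def traj_distance(a, b):
    # symmetric Hausdorff distance between two trajectories
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

# pairwise distance matrix, then DBSCAN with metric='precomputed'
n = len(ids)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = traj_distance(trajectories[ids[i]], trajectories[ids[j]])

labels = DBSCAN(eps=1.0, min_samples=1, metric='precomputed').fit_predict(dist)
print(dict(zip(ids, labels)))  # each Object_ID keeps its cluster label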
The situation is as follows:
Each supplier has some service areas, which the user has defined using Google Maps (polygons).
I need to store this data in the DB and make simple (but fast) queries over it.
Queries should look like: "List all suppliers whose service area contains x,y" or "Which polygons (service areas) is x,y inside?"
So far I've found GeoDjango, which looks like a very complex solution to this problem. To use it I need a fairly complex setup, and I couldn't find any recent (and good) tutorial.
I came up with this solution:
Store every polygon as JSON in the database
Apply a method to determine if some x,y belongs to any polygon
The problem with this solution is quite obvious: Queries may take too long to execute, considering I need to evaluate every polygon.
Finally: I'm looking for another solution to this problem, and I hope to find something that doesn't require setting up GeoDjango on my currently running server.
Determining whether some point is inside a polygon is not a problem (I found several examples); the problem is that retrieving every single polygon from the DB and evaluating it does not scale. To solve that, I need to store the polygons in such a way that I can query them fast.
My approach:
Find the centroid of the polygon (C++ code).
Store it in the database.
Find the longest distance from any vertex to the centroid (Pythagoras).
Store it as a radius.
Search the database using the centroid and radius as a bounding box.
If there are 1 or more results, run point-in-polygon on the resulting polygons.
This solution enables you to store polygons outside of GeoDjango to dramatically speed up point in polygon queries.
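A rough sketch of that idea in plain Python with shapely (the record layout and function names are mine, purely illustrative):

from shapely.geometry import Point, Polygon

def make_record(poly_coords):
    # precompute centroid and the longest centroid-to-vertex distance (the radius)
    poly = Polygon(poly_coords)
    centroid = poly.centroid
    radius = max(centroid.distance(Point(v)) for v in poly.exterior.coords)
    return {'coords': poly_coords, 'cx': centroid.x, 'cy': centroid.y, 'radius': radius}

def areas_containing(records, x, y):
    point = Point(x, y)
    hits = []
    for rec in records:
        # cheap prefilter: skip polygons whose bounding circle misses the point
        if point.distance(Point(rec['cx'], rec['cy'])) > rec['radius']:
            continue
        # exact point-in-polygon only for the remaining candidates
        if Polygon(rec['coords']).contains(point):
            hits.append(rec)
    return hits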
In my case, I needed to find whether the coordinates of my numpy arrays were inside a polygon stored in my GeoDjango DB (land/water masking). This required iterating over every coordinate combination in my arrays to test whether it was inside or outside the polygon. As my arrays are large, this was taking a very long time using GeoDjango.
Using Django's GEOSGeometry.contains, my command looked something like this:
import numpy as np
from django.contrib.gis.geos import Point

my_polygon = model.geometry  # get model multipolygon field
lat_lon = zip(latitude.flat, longitude.flat)  # zip coordinate arrays to (lat, lon) tuples
mask = np.array([my_polygon.contains(Point(ll[1], ll[0])) for ll in lat_lon])  # GEOS Point takes (x, y) = (lon, lat)
This was taking 20 s or more on large arrays. I tried different ways of applying the geometry.contains() function over the array (e.g. np.vectorize) but this did not lead to any improvements. I then realised it was the Django contains lookup which was taking too long. I also converted the geometry to a shapely polygon and tested shapely's polygon.contains function - no difference or worse.
The solution lay in bypassing GeoDjango and using the Polygon package's isInside method. First I created a function to build a Polygon object from my GEOS MultiPolygon.
from Polygon import Polygon

def multipolygon_to_polygon(multipolygon):
    """
    Convert a GEOS MultiPolygon to a python Polygon
    """
    polygon = multipolygon[0]  # select first polygon object
    nrings = polygon.num_interior_rings  # number of interior rings in the polygon
    poly = Polygon()
    poly.addContour(polygon[0].coords)  # add exterior ring coordinates tuple
    # Add subsequent (interior) rings as holes
    if nrings > 0:
        for i in range(nrings):
            print("Adding ring %s" % str(i + 1))
            hole = True
            poly.addContour(polygon[i + 1].coords, hole)
    return poly
Applying this to my problem:
my_polygon = model.geometry  # get model multipolygon field
polygon = multipolygon_to_polygon(my_polygon)  # convert to python Polygon
lat_lon = zip(bands['latitude'].flat, bands['longitude'].flat)  # points tuple
land_mask = np.array([not polygon.isInside(ll[1], ll[0]) for ll in lat_lon])
This resulted in a roughly 20X improvement in speed. Hope this helps someone.
Python 2.7.
I am looking for a minimalistic solution for doing basic geospatial search in Python.
We have a dataset of roughly 10k locations and we need to find all locations within a radius of N kilometers from a given location. I am not looking for an explicit database with geospatial support; I hope to get by without another external solution. Is there something that would use Python only?
Shapely seems to be a good solution. Its description seems to correspond to what you're looking for:
[Shapely] It lets you do PostGIS-ish stuff outside the context of a database using Python.
It is based on GEOS, which is a widely used C++ library.
Here is a link to the documentation
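For instance, a tiny sketch of a radius search with Shapely alone (made-up points in a planar coordinate system; lat/lon data would need projecting first):

from shapely.geometry import Point

locations = [Point(0, 0), Point(3, 4), Point(10, 10)]  # made-up locations
center = Point(0, 0)
radius = 6.0

within_radius = [p for p in locations if center.distance(p) <= radius]
print(within_radius)  # the first two points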
scipy.spatial has a kd-tree implementation that might be the most popular in Python.
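For example, a minimal sketch of a radius query with scipy's cKDTree (random made-up points; the tree uses Euclidean distance, so lat/lon data would first need projecting to planar coordinates):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(10000, 2) * 100.0  # ~10k made-up (x, y) locations
tree = cKDTree(points)

center = np.array([50.0, 50.0])
idx = tree.query_ball_point(center, r=5.0)  # indices of points within a 5-unit radius
nearby = points[idx]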
A self-made solution using only numpy could be something like this:
import numpy as np

points = np.array([[22.22, 33.33],
                   [ 8.00,  5.00],
                   [ 3.12,  5.00],
                   [ 9.00,  8.00],
                   [-2.50,  3.00],
                   [ 0.00, -1.00],
                   [-10.00, -10.00],
                   [12.00, 12.00],
                   [-4.00, -6.00]])

r = 10.0  # radius within which the points should lie
xm = 3    # center x coordinate
ym = 8    # center y coordinate

points_i = points[((points[:, 0] - xm)**2 + (points[:, 1] - ym)**2)**0.5 < r]
points_i contains those points which lie within the radius. This solution requires the data to be in a numpy array, which to my knowledge is also a very fast way to go through large data sets, as opposed to for loops. I guess this solution is pretty much minimalistic. The plot below shows the outcome with the data given in the code.