I am struggling with the drawmapscale function in matplotlib basemap. I can't work out exactly what the syntax means. So far I have understood the following:
e.g. map.drawmapscale(80.625, 5.75, ???, ???, 100)
As I understand it, the call above draws a map scale at longitude 80.625 and latitude 5.75, and it should represent 100 km. But what do the other two parameters mean? I played with some random numbers, but the results were not good. I have searched the web and found no satisfactory answer. Any help is appreciated.
Looking at the documentation
drawmapscale(lon, lat, lon0, lat0, length, **kwargs)
Draw a map scale at lon,lat of length length representing distance in the map projection coordinates at lon0,lat0.
From that one would assume that lon0,lat0 need to be the coordinates of the place in the map where 100km are to be measured.
As a start, one may choose lon0 == lon and lat0 == lat. The smaller the map, the smaller the error this introduces. Whether it gives good results also depends on the projection in use. One may also choose the coordinates of the middle of the map, since they are likely closest to the viewer's expectation.
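As a rough illustration of the "middle of the map" option, here is a minimal sketch. The helper name map_centre and the corner values are made up for the example, and it assumes the map does not cross the 180-degree meridian. The Basemap call is left as a comment because it needs an existing map object.

```python
def map_centre(llcrnrlon, llcrnrlat, urcrnrlon, urcrnrlat):
    """Return (lon0, lat0) at the middle of a map given its corner coordinates.

    Hypothetical helper; assumes the map does not cross the 180-degree meridian.
    """
    return (llcrnrlon + urcrnrlon) / 2.0, (llcrnrlat + urcrnrlat) / 2.0


# Example corner values (made up for illustration)
lon0, lat0 = map_centre(79.0, 5.0, 82.5, 10.0)

# With an existing Basemap instance `m`, the scale could then be drawn as:
# m.drawmapscale(80.625, 5.75, lon0, lat0, 100)
```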
I am trying to come up with a calculation that creates a column that comes up with a number that shows density for that specific location in a 5 mile radius, i.e if there are many other locations near it or not. I would like to compare these locations with themselves to achieve this.
I'm not familiar with the math needed to achieve this and have tried to find a solution for some time now.
OK, I'm not entirely clear on what your problem is, but I will try to give you my approach.
Let's first assume that the area you are querying for points is small enough to be considered flat, so the geographic coordinates of your area can be treated as Cartesian coordinates.
You choose your circle's center as (x, y) and then find which of your points are within the radius of your circle: in Cartesian coordinates, being inside a circle means that the distance of a point from the center is smaller than the given radius. You save those points in your choice of data structure, and the density is then the number of those points divided by the area of the circle.
I hope I understood the problem correctly!
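A minimal sketch of that idea, under the same flat-area assumption. The function name and the sample points are made up for illustration, and the brute-force loop is only sensible for small datasets.

```python
import math


def density_within_radius(points, radius):
    """For each (x, y) point, count the *other* points within `radius`
    and divide by the area of the circle (flat, Cartesian approximation)."""
    area = math.pi * radius ** 2
    densities = []
    for i, (xi, yi) in enumerate(points):
        neighbours = sum(
            1
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        )
        densities.append(neighbours / area)
    return densities


# Made-up coordinates, in the same unit as the radius (e.g. miles)
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
densities = density_within_radius(points, radius=5.0)
```

Each location is compared with all the others, so the result can go straight into a new column, one value per location.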
I'm trying to use the fastKDE package (https://pypi.python.org/pypi/fastkde/1.0.8) to find the KDE of a point in a 2D plot. However, I want to know the KDE beyond the limits of the data points, and cannot figure out how to do this.
Using the code listed on the site linked above:
#!python
import numpy as np
from fastkde import fastKDE
import pylab as PP
#Generate two random variables dataset (representing 200000 pairs of datapoints)
N = int(2e5)  #size must be an integer, not a float
var1 = 50*np.random.normal(size=N) + 0.1
var2 = 0.01*np.random.normal(size=N) - 300
#Do the self-consistent density estimate
myPDF,axes = fastKDE.pdf(var1,var2)
#Extract the axes from the axis list
v1,v2 = axes
#Plot contours of the PDF; these should be a set of concentric ellipsoids centered on
#(0.1, -300). Comparatively, the y axis range should be tiny and the x axis range
#should be large
PP.contour(v1,v2,myPDF)
PP.show()
I'm able to find the KDE for any point within the limits of the data, but how do I find the KDE for, say, the point (0,300) without having to include it in var1 and var2? I don't want the KDE to be calculated with this data point; I want to know the KDE at that point.
I guess what I really want to be able to do is give the fastKDE a histogram of the data, so that I can set its axes myself. I just don't know if this is possible?
Cheers
I, too, have been experimenting with this code and have run into the same issues. What I've done (in lieu of a good N-D extrapolator) is to build a KDTree (with scipy.spatial) from the grid points that fastKDE returns and find the nearest grid point to the point I want to evaluate. I then look up the corresponding pdf value at that grid point (it should be small near the edge of the pdf grid, if not identically zero) and assign that value accordingly.
I came across this post while searching for a solution to this problem. Similar to building a KDTree, you could calculate the step size in every grid dimension and then get the index of your query point by subtracting the start of the axis from the point value and dividing by the step size of that dimension. Finally, round the result and convert it to an integer, and voila. For example, in 1D:
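def fastkde_test(test_x):
    # num_p (number of grid points) is assumed to be defined elsewhere
    kde, axes = fastKDE.pdf(test_x, numPoints=num_p)
    # len(axes) grid points span len(axes) - 1 intervals
    x_step = (max(axes) - min(axes)) / (len(axes) - 1)
    # Index of the nearest grid point for every query value
    x_ind = np.int32(np.round((test_x - min(axes)) / x_step))
    return kde[x_ind]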
where test_x in this case is both the set defining the KDE and the query set. Doing it this way is faster by about a factor of 10 in my case (at least in 1D, higher dimensions not yet tested) and does basically the same thing as the KDTree query.
I hope this helps anyone coming across this problem in the future, as I just did.
Edit: if you're querying points outside the range over which the KDE was calculated, this method can of course only give you the same result as the KDTree query, namely the value at the corresponding border of your KDE grid. You would, however, have to hardcode this by clipping the resulting x_ind to the valid index range, i.e. between 0 and `len(axes)-1`.
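To make the index arithmetic and the clipping concrete, here is a small sketch that does not call fastKDE at all. The axis and pdf values are made up, the function names are only for illustration, and the axis is assumed to be evenly spaced.

```python
def nearest_grid_index(x, axis_min, axis_max, n_points):
    """Index of the grid point nearest to x on an evenly spaced axis,
    clamped to the valid range [0, n_points - 1]."""
    step = (axis_max - axis_min) / (n_points - 1)
    index = int(round((x - axis_min) / step))
    return max(0, min(index, n_points - 1))


# Made-up stand-ins for the axis and pdf that a KDE would return
axis = [0.0, 0.5, 1.0, 1.5, 2.0]
pdf = [0.05, 0.20, 0.50, 0.20, 0.05]


def lookup(x):
    """Value of the (made-up) pdf at the grid point nearest to x."""
    return pdf[nearest_grid_index(x, axis[0], axis[-1], len(axis))]
```

A query outside the grid, such as lookup(5.0), simply returns the value at the nearest border, which is not an extrapolation of the density.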
I am trying to map a dataset with associated latitude and longitude. The details of the data I am using are given below:
Variable Type Data/Info
-------------------------------
lat ndarray 1826x960, type `float64`
lon ndarray 1826x960, type `float64`
data ndarray 1826x960, type `float64`
I then created a basemap:
m = Basemap(projection='cyl', llcrnrlon=-180, urcrnrlon=180, llcrnrlat=-40, urcrnrlat=40, resolution='c')
Now, on that basemap, I plot the dataset described above using pcolormesh:
m.drawcoastlines()
m.drawcountries()
x,y = m(lon,lat)
m.pcolormesh(x,y,data)
m.colorbar()
plt.show()
This gives the following figure:
[Image: Temp Brightness plot]
But if I make a similar plot for a dataset (size 2691x960, with lon and lat the same size) covering the whole longitude range (-180 to 180), I get a 'strange bar':
[Image: strange bar]
I am pretty sure that the strange bar occurs due to the overlapping of dataset. The same plot has been performed in matlab and it works pretty fine.
Please tell me what the problem is, what can be done to remove the bar, what are the other methods of plotting this kind of data in python.
I think that you are running into a problem that I ran into a little while ago. The problem here is that, when basemap tries to create the polygons, it uses an interpolation method that does not appear to handle the prime meridian correctly. Pixels that actually cross the prime meridian get interpolated into a polygon that extends around the globe.
The solution that I have used is to split the file into two masked arrays (or just mask the original array two different ways at different times), one with the eastern hemisphere masked, and one with the western hemisphere masked, then map them both to the same axes object.
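A small sketch of the masking step, using numpy masked arrays on made-up data. The array names and values are only for illustration, and longitudes are assumed to run from -180 to 180. The plotting calls are left as comments since they need an existing Basemap instance.

```python
import numpy as np

# Made-up 1x6 example; real lon and data arrays would be 2-D grids of equal shape
lon = np.array([[-179.0, -90.0, -1.0, 1.0, 90.0, 179.0]])
data = np.arange(6, dtype=float).reshape(1, 6)

# Mask the eastern hemisphere in one copy and the western hemisphere in the other
west_only = np.ma.masked_where(lon >= 0, data)
east_only = np.ma.masked_where(lon < 0, data)

# Both halves would then be drawn on the same axes, for example:
# m.pcolormesh(x, y, west_only)
# m.pcolormesh(x, y, east_only)
```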
edit: Another solution may be to have your longitude bounds go from -179.99 to 179.99 or something similar.
I haven't worked with anything that gave me this problem, but it looks like a solution to a similar-sounding problem was offered here using the mpl_toolkits.basemap.addcyclic function.
From the docs:
arrout, lonsout = addcyclic(arrin, lonsin) adds cyclic (wraparound) point in longitude to arrin and lonsin, assumes longitude is the right-most dimension of arrin.
Background
I want to add a model manager function that filters a queryset based on the proximity to coordinates. I found this blog posting with code that is doing precisely what I want.
Code
The snippet below seems to make use of geopy functions that have since been removed. It coarsely narrows down the queryset by limiting the range of latitude and longitude.
# Prune down the set of all locations to something we can quickly check precisely
rough_distance = geopy.distance.arc_degrees(arcminutes=geopy.distance.nm(miles=distance)) * 2
queryset = queryset.filter(
latitude__range=(latitude - rough_distance, latitude + rough_distance),
longitude__range=(longitude - rough_distance, longitude + rough_distance)
)
Problem
Since some of the geopy functions used have been removed or moved, I'm trying to rewrite this stanza. However, I do not understand the calculations. I barely passed geometry, and my research has confused me more than it has helped.
Can anyone help? I would greatly appreciate it.
In case anybody else is looking at this now: I tried to use geopy and hit the same problem. The modern equivalent of the rough_distance snippet above is:
import geopy.units
rough_distance = geopy.units.degrees(arcminutes=geopy.units.nautical(miles=distance)) * 2
It looks like distance in miles is being converted to nautical miles, which are each equal to a minute of arc, which are 1/60th of an arc degree each. That value is then doubled, and then added and subtracted from a given latitude and longitude. These four values can be used to form a bounding box around the coordinates.
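The same arithmetic can be written out without geopy. This is only a sketch of the conversion described above: the function names are made up, and the doubling is kept purely because the original snippet does it.

```python
METERS_PER_STATUTE_MILE = 1609.344
METERS_PER_NAUTICAL_MILE = 1852.0


def miles_to_degrees(miles):
    """Statute miles -> nautical miles -> arcminutes -> degrees of arc."""
    nautical_miles = miles * METERS_PER_STATUTE_MILE / METERS_PER_NAUTICAL_MILE
    arcminutes = nautical_miles  # one nautical mile is one minute of arc
    return arcminutes / 60.0


def rough_bounding_box(latitude, longitude, distance_miles):
    """Return (lat_min, lat_max, lon_min, lon_max) as in the original snippet,
    including its doubling of the distance."""
    rough_distance = miles_to_degrees(distance_miles) * 2
    return (
        latitude - rough_distance,
        latitude + rough_distance,
        longitude - rough_distance,
        longitude + rough_distance,
    )


box = rough_bounding_box(10.0, 20.0, 69.0468)  # about one degree, doubled
```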
You can look up any needed conversion factors on Wikipedia. There's also a relevant article there titled Horizontal position representation, which discusses the pros and cons of alternatives to latitude and longitude that avoid some of their complexities.
The Earth is not a sphere, only approximately so. If you need a more accurate calculation, use pyproj. Then you can calculate the location based on a reference ellipsoid (e.g. WGS84).
martineau's answer is right on in terms of what the snippet actually does, but it is important to note that 1 minute of arc of longitude represents very different distances depending on location. At the equator, the query covers the smallest axis-aligned bounding box enclosing a circle of diameter distance, but off the equator, the bounding box does not completely contain that circle.
This code from the blog is sloppy:
def near(self, latitude=None, longitude=None, distance=None):
    if not (latitude and longitude and distance):
        return []
If latitude == 0 (equator) or longitude == 0 (Greenwich meridian), it returns immediately. Should be if latitude is None or longitude is None .......
@TokenMacGuy's answer is an improvement, but:
(a) The whole idea of the "bounding box" is to avoid an SQL or similar query calculating a distance to all points that otherwise satisfy the query. With appropriate indexes, the query will execute much faster. It does this at the cost of leaving the client to (1) calculate the coordinates of the bounding box (2) calculate and check the precise distance for each result returned by the query.
If step 2 is omitted, you get errors, even at the equator. For example, "find all pizza shops in a 5-mile radius" means you get answers up to 7.07 miles (that's sqrt(5**2 + 5**2)) away in the corners of the box.
Note that the code that you show seems to be arbitrarily doubling the radius. This would mean you get points 14.1 miles away.
(b) As @TokenMacGuy said, away from the equator, it gets worse. The bounding box so calculated does not include all points that you are interested in -- unless of course you are overkilling by doubling the radius.
(c) If the circle of interest includes either the North or South Pole, the calculation is horribly inexact and needs adjusting. If the circle of interest is crossed by the 180-degree meridian (i.e. the International Date Line without the zigzags), the results are nonsense; you need to detect this case and apply a 2-part query (one part for each side of the meridian).
For solutions for problems (b) and (c), see this article.
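To show the two-step pattern from (a) together with a longitude range widened by latitude as in (b), here is a minimal sketch in plain Python. It assumes a spherical Earth with a mean radius of 3958.8 miles and does not handle the pole and 180-degree meridian cases from (c). All function names and sample points are made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; spherical approximation (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def bounding_box(lat, lon, radius_miles):
    """Box enclosing the circle; the longitude half-width grows with latitude.
    Not valid if the circle touches a pole or crosses the 180-degree meridian."""
    r = radius_miles / EARTH_RADIUS_MILES  # angular radius in radians
    dlat = math.degrees(r)
    dlon = math.degrees(math.asin(math.sin(r) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def near(points, lat, lon, radius_miles):
    """Step 1: cheap box filter (what the indexed query would do).
    Step 2: precise distance check on the survivors."""
    lat_min, lat_max, lon_min, lon_max = bounding_box(lat, lon, radius_miles)
    candidates = [p for p in points
                  if lat_min <= p[0] <= lat_max and lon_min <= p[1] <= lon_max]
    return [p for p in candidates
            if haversine_miles(lat, lon, p[0], p[1]) <= radius_miles]


# Made-up points: one inside the circle, one in a corner of the box, one far away
points = [(60.0, 10.1), (60.07, 10.14), (0.0, 0.0)]
result = near(points, 60.0, 10.0, 5.0)
```

The second point passes the box filter but lies outside the 5-mile circle, which is the "corner of the box" error that step 2 removes.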
If the coordinates on the earth are known, you can use geopy to get a good estimate of the decimal degrees to miles (or any distance units) scale at that point:
from geopy import distance

# cur_lat and cur_long (decimal degrees) are assumed to be defined already
SCALE_VAL = 0.1
lat_scale_point = (cur_lat + SCALE_VAL, cur_long)
long_scale_point = (cur_lat, cur_long + SCALE_VAL)
cur_point = (cur_lat, cur_long)
lat_point_miles = distance.distance(cur_point, lat_scale_point).miles
long_point_miles = distance.distance(cur_point, long_scale_point).miles
# Assumes that `radius_miles` is the range around the point you want to look for
lat_rough_distance = (radius_miles / lat_point_miles) * SCALE_VAL
long_rough_distance = (radius_miles / long_point_miles) * SCALE_VAL
Some caveats:
Special-case handling for the scale points is needed around the poles or the prime meridian
Depending on how large or small you want your radius to be, you could pick a more appropriate SCALE_VAL
I have two dimensional discrete spatial data. I would like to make an approximation of the spatial boundaries of this data so that I can produce a plot with another dataset on top of it.
Ideally, this would be an ordered set of (x,y) points that matplotlib can plot with the plt.Polygon() patch.
My initial attempt is very inelegant: I place a fine grid over the data, and where data is found in a cell, a square matplotlib patch is created for that cell. The resolution of the boundary thus depends on the sampling frequency of the grid. Here is an example, where the grey regions are the cells containing data and the black regions are where no data exists.
1st attempt http://astro.dur.ac.uk/~dmurphy/data_limits.png
OK, problem solved - why am I still here? Well.... I'd like a more "elegant" solution, or at least one that is faster (ie. I don't want to get on with "real" work, I'd like to have some fun with this!). The best way I can think of is a ray-tracing approach - eg:
1. From xmin to xmax, at y=ymin, check if the data boundary is crossed in intervals dx
2. y=ymin+dy, do 1
3. Do 1-2, but now sample in y
An alternative is defining a centre, and sampling in r-theta space - ie radial spokes in dtheta increments.
Both would produce a set of (x,y) points, but then how do I order and link neighbouring points to create the boundary?
A nearest neighbour approach is not appropriate as, for example (to borrow from Geography), an isthmus (think of Panama connecting N&S America) could then close off and isolate regions. This also might not deal very well with the holes seen in the data, which I would like to represent as a different plt.Polygon.
The solution perhaps comes from solving an area maximisation problem. For a set of points defining the data limits, what is the maximum contiguous area contained within those points? To form the enclosed area, what are the neighbouring points for the nth point? How will the holes be treated in this scheme - is this straying into topology now?
Apologies, much of this is me thinking out loud. I'd be grateful for some hints, suggestions or solutions. I suspect this is an oft-studied problem with many solution techniques, but I'm looking for something simple to code and quick to run... I guess everyone is, really!
~~~~~~~~~~~~~~~~~~~~~~~~~
OK, here's attempt #2 using Mark's idea of convex hulls:
alt text http://astro.dur.ac.uk/~dmurphy/data_limitsv2.png
For this I used qconvex from the qhull package, getting it to return the extreme vertices. For those interested:
cat [data] | qconvex Fx > out
The sampling of the perimeter seems quite low, and although I haven't played much with the settings, I'm not convinced I can improve the fidelity.
I think what you are looking for is the convex hull of the data. That will give you a set of points that, when connected, enclose the data, so that all your points are on or inside the resulting polygon.
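For reference, a convex hull can be computed in a few lines without external tools. The sketch below uses Andrew's monotone chain algorithm (a different route from the qconvex command mentioned in the question) and returns the vertices in counter-clockwise order, so the list could be passed to plt.Polygon as is. Note that a convex hull cannot represent concave edges or holes in the data.

```python
def convex_hull(points):
    """Convex hull of 2-D points via Andrew's monotone chain.

    Returns the hull vertices in counter-clockwise order, without repeating
    the first point at the end. Collinear points on the hull are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


# Made-up points: a square with one interior point and one point on an edge
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(points)
```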
I may have missed something, but what's the motivation for not simply determining the maximum and minimum x and y values? Unless you have an enormous amount of data, you could simply iterate through your points and determine the minimum and maximum values fairly quickly.
This isn't the most efficient example, but if your data set is small this won't be particularly slow:
import random
data = [(random.randint(-100, 100), random.randint(-100, 100)) for i in range(1000)]
x_min = min([point[0] for point in data])
x_max = max([point[0] for point in data])
y_min = min([point[1] for point in data])
y_max = max([point[1] for point in data])