Finding if a line between two geo coordinates crosses land - python

Currently I'm working with a dataset that contains routes around the sea, but some of them either cross land or are on land (due to the fidelity of the data being quite low). I have been using the great https://github.com/toddkarin/global-land-mask tool from toddkarin to find which of my coordinates are on land so I can discard them (eventually I may find a way of moving them to the nearest point at sea).
My current problem is that I need a way of finding whether a line (given any two coordinates) crosses land (think of an island between two points in the sea).
My area of operation is the entire globe and I am using WGS84 if that changes anything. I have some very basic experience with matplotlib/Basemap but I'm not at all confident with it and I'm struggling to find where to start with this. Do I try to plot each coordinate along the line at a given distance/resolution and then use Todd's tool or is there a more efficient way?
Thanks in advance for any assistance. I've done a lot of digging and reading before posting but haven't found what I think I need.
I need the tool to be in python ideally but if I need to call another language/library/exe that can give me a True/False output that's good too.

A possible tool available in Python to perform these sorts of operations is Shapely.
If you're able to extract the polygon data of islands and other land masses, then you could use Shapely to perform an intersection test (see Line vs. Polygon Intersection Coordinates). This works for checking intersections between points, lines and arbitrary polygons.
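In Shapely the test itself is `LineString([a, b]).intersects(Polygon(island))`. As a dependency-free sketch of what such a test does, assuming each island is available as a list of (lon, lat) vertices and treating lon/lat as flat planar coordinates (no antimeridian handling): a segment touches a polygon if it intersects any polygon edge, or, failing that, if it lies entirely inside. The function names are illustrative, not part of Shapely.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _on_segment(p, q, r):
    """For collinear p, q, r: True if q lies within the bounding box of segment pr."""
    return (min(p[0], r[0]) <= q[0] <= max(p[0], r[0])
            and min(p[1], r[1]) <= q[1] <= max(p[1], r[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # collinear / touching special cases
    return ((o1 == 0 and _on_segment(p1, q1, p2))
            or (o2 == 0 and _on_segment(p1, q2, p2))
            or (o3 == 0 and _on_segment(q1, p1, q2))
            or (o4 == 0 and _on_segment(q1, p2, q2)))


def point_in_polygon(pt, poly):
    """Even-odd ray casting: True if pt is inside the polygon (list of vertices)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def segment_crosses_polygon(a, b, poly):
    """True if segment a-b crosses, touches, or lies inside the polygon."""
    n = len(poly)
    if any(segments_intersect(a, b, poly[i], poly[(i + 1) % n]) for i in range(n)):
        return True
    # no edge is crossed, so the segment is either entirely inside or entirely outside
    return point_in_polygon(a, poly)
```

For example, with `island = [(0, 0), (1, 0), (1, 1), (0, 1)]`, the segment from `(-1, 0.5)` to `(2, 0.5)` crosses it and the one from `(-1, 2)` to `(2, 2)` does not.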
The quick and dirty way is as you propose yourself, to discretize the line between the two points and check each of these.
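On a global dataset the discretization can follow the great circle (the shortest path on the globe) rather than a straight line in latitude/longitude, which also behaves sensibly across the antimeridian. A minimal sketch, assuming a spherical Earth of radius 6371 km instead of the WGS84 ellipsoid (close enough for choosing sample points); `great_circle_points` and `step_km` are illustrative names:

```python
import math


def great_circle_points(lat1, lon1, lat2, lon2, step_km=1.0, radius_km=6371.0):
    """Return (lat, lon) samples at most step_km apart along the great circle
    from point 1 to point 2, assuming a spherical Earth of radius radius_km."""
    def to_unit(lat, lon):
        la, lo = math.radians(lat), math.radians(lon)
        return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))

    a, b = to_unit(lat1, lon1), to_unit(lat2, lon2)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)                  # central angle between the two points
    if math.pi - omega < 1e-9:
        raise ValueError("antipodal points: the great circle is not unique")
    n = max(1, math.ceil(omega * radius_km / step_km))
    pts = []
    for i in range(n + 1):
        t = i / n
        if omega < 1e-12:                   # (nearly) identical points
            v = a
        else:                               # spherical linear interpolation (slerp)
            wa = math.sin((1 - t) * omega) / math.sin(omega)
            wb = math.sin(t * omega) / math.sin(omega)
            v = tuple(wa * x + wb * y for x, y in zip(a, b))
        lat = math.degrees(math.asin(max(-1.0, min(1.0, v[2]))))
        lon = math.degrees(math.atan2(v[1], v[0]))   # longitude in (-180, 180]
        pts.append((lat, lon))
    return pts
```

Each sample can then be checked with global-land-mask, e.g. `any(globe.is_land(lat, lon) for lat, lon in great_circle_points(13.26077, 100.81099, 13.13237, 100.82993, step_km=0.5))`.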

Thanks to some help from the answer to "How to find all coordinates efficiently between two geo points/locations with certain interval using python", I came up with the following, which is now working:
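from global_land_mask import globe

def crosses_land(lat1, lon1, lat2, lon2, step=0.0003):
    """Return True if any sampled point on the segment between two
    (lat, lon) points is on land, otherwise False."""
    # the increment step in degrees (higher = faster, but may miss small islands)
    # the sample count uses the larger coordinate span, so segments of constant
    # latitude (which used to divide by zero) and steep segments are sampled too
    # note: interpolates linearly in lat/lon, not along a great circle
    n = int(max(abs(lat2 - lat1), abs(lon2 - lon1)) / step) + 1
    for i in range(n + 1):
        t = i / n
        lat = lat1 + t * (lat2 - lat1)
        lon = lon1 + t * (lon2 - lon1)
        if globe.is_land(lat, lon):
            return True  # crosses land
    return False  # all samples in water

# example geo points (lat, lon)
print(crosses_land(13.26077, 100.81099, 13.13237, 100.82993))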

Related

Counting each line intersection on a grid of polygons in geopandas

I have a large dataset (~20000) of past storms over 40 years that have a list of central points over 3-hour intervals. I'm trying to overlay a mesh-grid onto a large area from which I would like to count the number of times each storm has passed over any given grid cell, however my current implementation only tracks the position at those three-hour intervals, leading to some instances where the track jumps a grid space when it should also be counted.
I am trying to address this problem using geopandas instead, creating a LineString for each storm track and then performing an intersection against the mesh grid; however, I cannot find any functional implementations that allow me to do so.
To create the grid in geopandas, I am using the following solution from a previous question:
import numpy as np
import geopandas as gpd
from shapely.geometry import MultiLineString
from shapely.ops import polygonize

# np.linspace needs an integer number of samples
lonCount = int(((plotExtent[1]+360) - (plotExtent[0]+360)) * gridResolution)
latCount = int((plotExtent[3] - plotExtent[2]) * gridResolution)
lons = np.linspace(plotExtent[0], plotExtent[1], lonCount)
lats = np.linspace(plotExtent[2], plotExtent[3], latCount)
# Store the meshgrid in polygon format
xlines = [((x1, yi), (x2, yi)) for x1, x2 in zip(lons[:-1], lons[1:]) for yi in lats]
ylines = [((xi, y1), (xi, y2)) for y1, y2 in zip(lats[:-1], lats[1:]) for xi in lons]
# Save as a Shapely object, then store in geopandas
grids = list(polygonize(MultiLineString(xlines + ylines)))
polyFrame = gpd.GeoDataFrame(grids)
This creates a GeoDataFrame of ~5600 polygon objects. I then loop through each of my storm objects to strip out the lat/lon list pairs and convert them into a shapely LineString object, which is then read into geopandas as such:
from shapely.geometry import LineString
polyLine = LineString(list(zip(storm_lons, storm_lats)))
coord_tests = gpd.GeoSeries(polyLine)
My goal from here is to simply do something like this:
I = coord_tests.intersects(polyFrame)
to collect a list of polygons that the LineString intersects with; however, this raises the following error:
AttributeError: No geometry data set yet (expected in column 'geometry'.)
I'm wondering if I have something formatted incorrectly here, am passing the call incorrectly to this function, or if there is a more efficient way to accomplish what I am trying to do here.
Any assistance would be greatly appreciated.
Thanks!
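Pass the polygons as the frame's geometry column:
polyFrame = gpd.GeoDataFrame(geometry=grids)
:-)
Then, to list the cells a track touches, test every cell against the single LineString (passing a GeoSeries instead aligns the two by index and compares them row by row):
hits = polyFrame[polyFrame.intersects(polyLine)]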
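As an alternative to intersecting every track with ~5600 polygons, the cells a straight track segment passes through can be enumerated directly with a grid traversal in the style of Amanatides and Woo. A sketch under these assumptions: a regular grid of square cells of size `cell` whose corner is at (`x_origin`, `y_origin`) (for the question's grid, roughly `cell = 1 / gridResolution` and origin `(plotExtent[0], plotExtent[2])`), x = longitude and y = latitude treated as planar coordinates (no antimeridian handling). `cells_crossed` and `count_cells` are illustrative names; each storm is counted at most once per cell.

```python
import math
from collections import Counter


def cells_crossed(x0, y0, x1, y1, cell=1.0, x_origin=0.0, y_origin=0.0):
    """(col, row) indices of the grid cells a straight segment passes through, in order.

    From the start cell, repeatedly step into whichever neighbour (in x or in y)
    the segment reaches first. A segment passing exactly through a grid corner
    also visits one of the two diagonal neighbours.
    """
    gx0, gy0 = (x0 - x_origin) / cell, (y0 - y_origin) / cell   # grid units
    gx1, gy1 = (x1 - x_origin) / cell, (y1 - y_origin) / cell
    col, row = math.floor(gx0), math.floor(gy0)
    end_col, end_row = math.floor(gx1), math.floor(gy1)
    dx, dy = gx1 - gx0, gy1 - gy0
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # parameter t in [0, 1] where the segment meets the next vertical / horizontal grid line
    t_max_x = ((col + (step_x > 0)) - gx0) / dx if dx else math.inf
    t_max_y = ((row + (step_y > 0)) - gy0) / dy if dy else math.inf
    t_delta_x = abs(1 / dx) if dx else math.inf
    t_delta_y = abs(1 / dy) if dy else math.inf
    cells = [(col, row)]
    # exactly |dcol| + |drow| moves are needed, which also guarantees termination
    for _ in range(abs(end_col - col) + abs(end_row - row)):
        if t_max_x < t_max_y:
            col += step_x
            t_max_x += t_delta_x
        else:
            row += step_y
            t_max_y += t_delta_y
        cells.append((col, row))
    return cells


def count_cells(tracks, cell=1.0, x_origin=0.0, y_origin=0.0):
    """For each cell, count how many tracks (lists of (x, y) fixes) pass through it."""
    counts = Counter()
    for track in tracks:
        # start with the first fix's cell so single-point tracks are counted too
        visited = {(math.floor((track[0][0] - x_origin) / cell),
                    math.floor((track[0][1] - y_origin) / cell))}
        for (xa, ya), (xb, yb) in zip(track, track[1:]):
            visited.update(cells_crossed(xa, ya, xb, yb, cell, x_origin, y_origin))
        counts.update(visited)
    return counts
```

Cells between two 3-hour fixes are picked up because every segment is traversed cell by cell rather than only at its endpoints.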

How to account for radian to degrees inaccuracy

I am trying to perform a simple task using simple math in Python, and I suspect an inherent error in converting from radians to degrees, caused by floating-point math (as garnered from another question on the topic; please don't mark this as a duplicate question, it's not).
I am trying to extend a line by 500m. To do this, I take the endpoint coordinates from a supplied line and use the existing heading of that line to generate the coordinates of the point 500m further along the same heading.
Heading is important in this case as it is the source of my error. Or so I suspect.
I use the following function to calculate the interior angle of my right angle triangle, built using the existing line, or in this case my hypotenuse:
import math

def intangle(xypoints):
    angle = []
    for i in xypoints:
        x1 = i[0][0]
        x2 = i[1][0]
        y1 = i[0][1]
        y2 = i[1][1]
        gradient = (x1 - x2)/(y1-y2)
        radangle = math.atan(gradient)
        angle.append((math.degrees(radangle)))
    return angle
My input points are, for example:
(22732.23679147904, 6284399.7935522054)
(20848.591367954294, 6281677.926560438)
I know going into this that my angle is 35° as these coordinates are programmatically generated by a separate function and when plotted are out by around 3.75" for each KM. Another error as a result of converting radians to degrees but acceptable in its scope.
The error generated by the above function however, results in an angle that plots my new endpoint in such a place that the line is no longer perfectly straight when I connect the dots and I absolutely have to have a straight line.
How can I go about doing this differently to account for the floating-point error? Is it even possible? If not, then what would be an acceptable method of extending my line by however many meters using Euclidean geometry?
To add to this, I have already done all relevant geographic conversions and I am 100% sure that I am working on a 2D plane so the ellipsoid and such do not play a role in this at all.
Using angles is unnecessary, and there are problems in the way you do it: atan only gives you angles between -pi/2 and pi/2, so you get the same angle value for opposite directions.
You should rather use Thales's (intercept) theorem:
import math

a = (22732.23679147904, 6284399.7935522054)
b = (20848.591367954294, 6281677.926560438)

def extend_line(a, b, length):
    """Returns the coordinates of point C at `length` beyond B in the direction of A->B."""
    ab = math.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)
    coeff = (ab + length)/ab
    return (a[0] + coeff*(b[0]-a[0]), a[1] + coeff*(b[1]-a[1]))

print(extend_line(a, b, 500))
# (20564.06031560228, 6281266.7792872535)
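If a heading is still wanted (for example because other code needs it), `math.atan2` avoids the atan problem: it takes dy and dx separately and returns an angle in the full (-pi, pi] range, so opposite directions get different angles. A minimal sketch that yields the same point as `extend_line` up to rounding; `extend_line_atan2` is an illustrative name:

```python
import math


def extend_line_atan2(a, b, length):
    """Point `length` beyond b in the direction a->b, computed from a heading."""
    # atan2 keeps the quadrant, unlike atan of a slope
    heading = math.atan2(b[1] - a[1], b[0] - a[0])
    return (b[0] + length * math.cos(heading), b[1] + length * math.sin(heading))


a = (22732.23679147904, 6284399.7935522054)
b = (20848.591367954294, 6281677.926560438)
c = extend_line_atan2(a, b, 500)
print(c)

# atan of the slope cannot tell opposite directions apart; atan2 can
print(math.atan(1 / 1) == math.atan(-1 / -1))   # True
print(math.atan2(1, 1) == math.atan2(-1, -1))   # False
```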

Ellipsoid equation containing numerous points

I have a large quantity of pixel colors (96 thousand different colors), and I want to get some kind of mathematically defined probability region like the one in this question.
The main obstacle I see right now – all methods on Google are mainly about visualisations and about two-dimensional spaces, yet there is no algorithm for finding coefficients of an equation like:
$$a_1x^2 + b_1y^2 + c_1z^2 + a_2xy + b_2xz + c_2yz + a_3x + b_3y + c_3z = 0$$
And this paper is too difficult for me to implement in Python. :(
Anyway, all I want is to determine whether a pixel more or less lies within the range I have.
I tried doing it with scikit clustering, but I probably failed because I have only one set of data. And creating an array of 256³ elements representing each pixel color seems the wrong way.
I wonder if there is an easy way to determine boundaries of this point cluster?
Or maybe I'm just overthinking it, and there is something like OpenCV's cv2.inRange() function?
This can be solved by optimization and fitting of the ellipsoid polynomial. However, I would start with a geometric approach, which is much faster:
1. Find the average point position
   That will be the center of your ellipsoid:
      p0 = sum(p[i]) / n          // average
      i = { 0,1,2,3,...,n-1 }     // of all points
   If your point density is not homogeneous, it is safer to use the bounding-box center instead: find xmin, ymin, zmin, xmax, ymax, zmax, and the middle between them is your center.
2. Find the most distant point from the center
   That gives you the main semi-axis:
      pa = p[j];
      |p[j]-p0| >= |p[i]-p0|      // max
      i = { 0,1,2,3,...,n-1 }     // of all points
3. Find the second semi-axis
   The vector pa-p0 is normal to the plane in which the other semi-axes lie, so find the point in that plane that is most distant from p0:
      pb = p[j];
      |p[j]-p0| >= |p[i]-p0|      // max
      dot(pa-p0,p[j]-p0) == 0     // but only if inside the plane
      i = { 0,1,2,3,...,n-1 }     // from all points
   Beware that the dot product will rarely be exactly zero, so it is better to test against a threshold like this:
      |dot(pa-p0,p[j]-p0)| <= 1e-3
   You can use any threshold you want (it should be based on the ellipsoid size).
4. Find the last semi-axis
   The last semi-axis should be perpendicular to both (pa-p0) and (pb-p0), so find the point such that:
      pc = p[j];
      |p[j]-p0| >= |p[i]-p0|      // max
      dot(pa-p0,p[j]-p0) == 0     // only if inside the plane
      dot(pb-p0,p[j]-p0) == 0     // and also perpendicular to the b semi-axis
      i = { 0,1,2,3,...,n-1 }     // from all points
5. Ellipsoid
   Now you have all the parameters you need to form your ellipsoid. The vectors (pa-p0), (pb-p0), (pc-p0) are the basis vectors of your ellipsoid (you can make them perpendicular by using the cross product). Their sizes give you the radii, and p0 is the center. You can also use this parametric equation:
      a = pa-p0;
      b = pb-p0;
      c = pc-p0;
      p(u,v) = p0 + a*cos(u)*cos(v)
                  + b*cos(u)*sin(v)
                  + c*sin(u);
      u = < -0.5*PI , +0.5*PI >
      v = < 0.0 , 2.0*PI >
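The steps above can be sketched in Python/NumPy as follows. Assumptions not in the answer: `tol` is a threshold on |cos(angle)| rather than on the raw dot product (so it does not depend on the ellipsoid size), the second axis is orthogonalized against the first and the third direction comes from the cross product, and the `inside` test, which addresses the original question of whether a color lies within the cluster, is an addition here. Function names are illustrative.

```python
import numpy as np


def rough_ellipsoid(points, tol=0.1):
    """Geometric O(n) ellipsoid estimate following the steps above.

    Returns (p0, axes, radii): the centre, a 3x3 array of unit axis
    directions (rows) and the semi-axis lengths. `tol` is an assumed
    threshold on |cos(angle)| that replaces the exact dot(...) == 0 test.
    """
    p = np.asarray(points, dtype=float)
    p0 = p.mean(axis=0)                         # centre = average point
    d = p - p0
    dist = np.linalg.norm(d, axis=1)
    safe = np.where(dist > 0, dist, np.inf)     # avoids 0/0 for points at the centre

    def farthest(mask):
        idx = np.flatnonzero(mask & (dist > 0))
        if idx.size == 0:
            raise ValueError("no candidate points; try a larger tol")
        return d[idx[np.argmax(dist[idx])]]

    a = farthest(np.ones(len(p), dtype=bool))   # main semi-axis
    a_hat = a / np.linalg.norm(a)
    cos_a = np.abs(d @ a_hat) / safe

    b = farthest(cos_a <= tol)                  # roughly perpendicular to a
    b_perp = b - (b @ a_hat) * a_hat            # make it exactly perpendicular
    b_hat = b_perp / np.linalg.norm(b_perp)
    cos_b = np.abs(d @ b_hat) / safe

    c = farthest((cos_a <= tol) & (cos_b <= tol))  # roughly perpendicular to both
    c_hat = np.cross(a_hat, b_hat)              # exact third direction

    axes = np.vstack([a_hat, b_hat, c_hat])
    radii = np.linalg.norm(np.vstack([a, b, c]), axis=1)
    return p0, axes, radii


def inside(q, p0, axes, radii):
    """True if q lies inside or on the estimated ellipsoid."""
    local = (np.asarray(q, dtype=float) - p0) @ axes.T / radii
    return float(np.sum(local ** 2)) <= 1.0
```

For colors, `points` would be the (R, G, B) triples and `q` the pixel being tested.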
This whole process is just O(n), and the results can be used as a starting point for both optimization and fitting, to speed them up without loss of accuracy. If you want to further improve accuracy, see:
How approximation search works
The sub-links show examples of fitting ...
You can also take a look at this:
Algorithms: Ellipse matching
which is basically similar to your task, only in 2D, but may still give you some ideas.
Here is a non-rigorous solution using a fast and simple random search approach*. Its best side: no heavy linear algebra library is required**. It seemed to work fine for mesh collision detection.
It assumes that the ellipsoid center matches the cloud center and then uses a sort of mirrored average to search for the main axis.
The full working code is slightly bigger and is on git; the idea of the main-axis search is here:
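import numpy as np

def proj_length(vec, axis):
    # Assumed helper (not shown in the answer): signed length of vec's projection
    # onto axis, and length of the rejection (the part perpendicular to axis)
    norm = np.linalg.norm(axis)
    proj = float(vec @ axis) / norm if norm else float(np.linalg.norm(vec))
    return proj, float(np.sqrt(max(vec @ vec - proj * proj, 0.0)))

def find_main_axis(pts):
    pts = np.array(pts, dtype=float)  # work on a float copy
    np.random.shuffle(pts)
    pts_len = len(pts)
    pt_average = np.sum(pts, axis=0) / pts_len
    vec_major = pt_average * 0
    minor_max, major_max = 0, 0
    # may be improved with overlapped pass
    for pt_cur in pts:
        vec_cur = pt_cur - pt_average
        proj_len, rej_len = proj_length(vec_cur, vec_major)
        if proj_len < 0:
            vec_cur = -vec_cur  # mirror onto the current axis' side
        vec_major += (vec_cur - vec_major) / pts_len
        major_max = max(major_max, abs(proj_len))
        minor_max = max(minor_max, rej_len)
    return pt_average, vec_major, major_max, minor_max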
It can be improved/optimized even more in places. There is also full experiment code with plots showing example results.
*i.e. adjusting code lines randomly until they work
**which was actually the reason to figure out this solution

Finding the distance of coordinates from the beginning of a route in Shapely

I have a list of coordinates (lat/lon) representing a route.
Given a certain radius and another coordinate I need to check if the coord is in the route (within the given radius from any point) and its distance from the beginning of the route.
I looked at Shapely and it looks like a good solution.
I started off by creating a LineString:
from shapely.geometry import LineString
route = LineString([(x, y), (x1, y1), ...])
Then, to check whether the point is near the route, I added a buffer and checked for intersection:
from shapely.geometry import Point
p = Point(x, y)
r = 0.5
intersection = p.buffer(r).intersection(route)
if intersection.is_empty:
    print("Point not on route")
else:
    # Calculate p's distance from the beginning of the route
    ...
I'm stuck calculating the distance. I thought of splitting the route at p and measuring the length of the first half, but the intersection result I get is a HeterogeneousGeometrySequence, which I'm not sure what to do with.
I believe I found a solution:
if p.buffer(r).intersects(route):
    return route.project(p)
Rather than buffering a geometry, which is expensive and imperfect (since buffering requires a number of segments and many other options), just see if the point is within a distance threshold:
if route.distance(p) <= r:
    return route.project(p)
Also, you probably realised by now that your distance units are in degrees. If you want linear distances, like meters, you would need to make it much more complicated using different libraries.
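A minimal sketch of that in metres, staying with the standard library. It assumes a spherical Earth (mean radius 6,371,008.8 m, not the WGS84 ellipsoid) and projects each segment onto a local equirectangular plane, which is a reasonable approximation for short segments away from the poles; for ellipsoidal accuracy a library such as pyproj (its `Geod` class) is the usual choice. `distance_along_route` and the other names are illustrative; the route and the point are (lat, lon) in degrees.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation, not WGS84


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def _local_xy(lat, lon, lat0, lon0, k):
    """Equirectangular metres relative to (lat0, lon0); k = cos(reference latitude)."""
    dlon = (lon - lon0 + 180.0) % 360.0 - 180.0      # wrap across the antimeridian
    return (math.radians(dlon) * k * EARTH_RADIUS_M,
            math.radians(lat - lat0) * EARTH_RADIUS_M)


def distance_along_route(route, point, radius_m):
    """Metres from the start of `route` to the spot nearest `point`,
    or None if `point` is more than radius_m away from the route."""
    best_offset, best_along, travelled = math.inf, None, 0.0
    for (lat1, lon1), (lat2, lon2) in zip(route, route[1:]):
        k = math.cos(math.radians((lat1 + lat2) / 2))
        bx, by = _local_xy(lat2, lon2, lat1, lon1, k)
        px, py = _local_xy(point[0], point[1], lat1, lon1, k)
        seg2 = bx * bx + by * by
        # fraction of the segment at the foot of the perpendicular, clamped to [0, 1]
        t = 0.0 if seg2 == 0 else max(0.0, min(1.0, (px * bx + py * by) / seg2))
        offset = math.hypot(px - t * bx, py - t * by)
        seg_len = haversine_m(lat1, lon1, lat2, lon2)
        if offset < best_offset:
            best_offset, best_along = offset, travelled + t * seg_len
        travelled += seg_len
    return best_along if best_offset <= radius_m else None
```

For a route along the equator from longitude 0 to 2, a point about 111 m north of longitude 1.5 lies roughly 166.8 km along the route.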

Estimating the boundary of arbitrarily distributed data

I have two dimensional discrete spatial data. I would like to make an approximation of the spatial boundaries of this data so that I can produce a plot with another dataset on top of it.
Ideally, this would be an ordered set of (x,y) points that matplotlib can plot with the plt.Polygon() patch.
My initial attempt is very inelegant: I place a fine grid over the data, and where data is found in a cell, a square matplotlib patch is created for that cell. The resolution of the boundary thus depends on the sampling frequency of the grid. Here is an example, where the grey regions are the cells containing data, and black is where no data exists.
1st attempt: http://astro.dur.ac.uk/~dmurphy/data_limits.png
OK, problem solved - why am I still here? Well.... I'd like a more "elegant" solution, or at least one that is faster (ie. I don't want to get on with "real" work, I'd like to have some fun with this!). The best way I can think of is a ray-tracing approach - eg:
1. From xmin to xmax, at y=ymin, check whether the data boundary is crossed, in intervals of dx.
2. At y=ymin+dy, do step 1 again.
3. Do steps 1-2, but now sample in y.
An alternative is defining a centre, and sampling in r-theta space - ie radial spokes in dtheta increments.
Both would produce a set of (x,y) points, but then how do I order/link neighbouring points to create the boundary?
A nearest neighbour approach is not appropriate as, for example (to borrow from Geography), an isthmus (think of Panama connecting N&S America) could then close off and isolate regions. This also might not deal very well with the holes seen in the data, which I would like to represent as a different plt.Polygon.
The solution perhaps comes from solving an area maximisation problem. For a set of points defining the data limits, what is the maximum contiguous area contained within those points? To form the enclosed area, what are the neighbouring points for the nth point? How will the holes be treated in this scheme - is this erring into topology now?
Apologies, much of this is me thinking out loud. I'd be grateful for some hints, suggestions or solutions. I suspect this is an oft-studied problem with many solution techniques, but I'm looking for something simple to code and quick to run... I guess everyone is, really!
~~~~~~~~~~~~~~~~~~~~~~~~~
OK, here's attempt #2 using Mark's idea of convex hulls:
Attempt #2: http://astro.dur.ac.uk/~dmurphy/data_limitsv2.png
For this I used qconvex from the qhull package, getting it to return the extreme vertices. For those interested:
cat [data] | qconvex Fx > out
The sampling of the perimeter seems quite low, and although I haven't played much with the settings, I'm not convinced I can improve the fidelity.
I think what you are looking for is the convex hull of the data. That will give a set of points that, if connected, will mean that all your points are on or inside the connected points.
I may have missed something, but what's the motivation for not simply determining the maximum and minimum x and y values? Unless you have an enormous amount of data, you could simply iterate through your points, determining the minimum and maximum values fairly quickly.
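Qhull (via `qconvex`, or `scipy.spatial.ConvexHull`, which wraps it) is the usual tool. As a dependency-free sketch, Andrew's monotone chain algorithm (swapped in here, not what the answers used) returns the hull vertices already ordered counter-clockwise, which is the ordered (x, y) list `plt.Polygon()` expects. Like any convex hull, it bridges concavities such as the isthmus example and ignores holes.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order,
    starting from the lowest-leftmost point, without repeating it at the end."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # > 0 if o -> a -> b turns left
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]
```

The result can be passed straight to matplotlib, e.g. `plt.gca().add_patch(plt.Polygon(convex_hull(data), fill=False))`.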
This isn't the most efficient example, but if your data set is small this won't be particularly slow:
import random
data = [(random.randint(-100, 100), random.randint(-100, 100)) for i in range(1000)]
x_min = min([point[0] for point in data])
x_max = max([point[0] for point in data])
y_min = min([point[1] for point in data])
y_max = max([point[1] for point in data])
