Working out which points a set of lat/lon coordinates are closest to - python

I currently have a list of coordinates
[(52.14847612092221, 0.33689512047881015),
(52.14847612092221, 0.33689512047881015),
(52.95756796776235, 0.38027099942700493),
(51.78723479900971, -1.4214854900618064)
...]
I would like to split this list into 3 separate lists/dataframes corresponding to which city they are closest to (in this case the coordinates are all in the UK and the 3 cities are Leeds, Cardiff and London).
So for the end result I would ideally like the single list of coordinates split into three separate lists, though a dataframe with 3 columns would also be fine, e.g.:

leeds                                      cardiff                                    london
(51.78723479900971, -1.4214854900618064)   (51.78723479900971, -1.4214854900618064)   (51.78723479900971, -1.4214854900618064)

(those are obviously not correct coordinates!)
Hope that makes sense. It doesn't have to be overly accurate (no need to take the curvature of the earth into account or anything like that!)
I'm really not sure where to start with this - I'm very new to python and would appreciate any help!
Thanks in advance

This will get you started:
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="nearest-city-example")  # recent geopy versions require a user_agent
places = ['london', 'cardiff', 'leeds']
coordinates = {}
for place in places:
    location = geolocator.geocode(place)  # geocode each place once and reuse the result
    coordinates[place] = (location.latitude, location.longitude)

>>> print(coordinates)
{'cardiff': (51.4816546, -3.1791933), 'leeds': (53.7974185, -1.543794), 'london': (51.5073219, -0.1276473)}
You can now hook this up to a pandas dataframe and calculate the distance metric between your coordinates and the city locations above.
OK, so now we want distances between your points and what is a very small array (the city coordinates).
Here's some code:
import numpy as np
single_point = [3, 4] # A coordinate
points = np.arange(20).reshape((10,2)) # Lots of other coordinates
dist = (points - single_point)**2
dist = np.sum(dist, axis=1)
dist = np.sqrt(dist)
From here there is any number of things you can do. You can sort it using numpy, or you can place it in a pandas dataframe and sort it there (though that's really just a wrapper for the numpy function I believe). Whichever you're more comfortable with.
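In fact, for picking the nearest of the three cities you don't need a full sort at all; np.argmin does it directly. Here's a minimal sketch, assuming the coordinates dict from the geopy snippet above and that points holds the question's list of (lat, lon) tuples:

import numpy as np

cities = list(coordinates)                       # ['london', 'cardiff', 'leeds']
city_arr = np.array([coordinates[c] for c in cities])
pts = np.array(points)                           # shape (N, 2): one row per (lat, lon) pair

# Distance from every point to every city, then the index of the nearest city per point.
dists = np.sqrt(((pts[:, None, :] - city_arr[None, :, :])**2).sum(axis=2))
nearest = dists.argmin(axis=1)
grouped = {city: pts[nearest == i].tolist() for i, city in enumerate(cities)}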

This is a pretty brute force approach, and not too adaptable. However, that can be the easiest to understand and might be plenty efficient for the problem at hand. It also uses only pure python, which may help you to understand some of python's conventions.
points = [(52.14847612092221, 0.33689512047881015), (52.14847612092221, 0.33689512047881015), (52.95756796776235, 0.38027099942700493), (51.78723479900971, -1.4214854900618064), ...]
cardiff = (51.4816546, -3.1791933)
leeds = (53.7974185, -1.543794)
london = (51.5073219, -0.1276473)
def distance(pt, city):
    return ((pt[0] - city[0])**2 + (pt[1] - city[1])**2)**0.5

cardiff_pts = []
leeds_pts = []
london_pts = []
undefined_pts = []  # for points equidistant between two/three cities

for pt in points:
    d_cardiff = distance(pt, cardiff)
    d_leeds = distance(pt, leeds)
    d_london = distance(pt, london)
    if (d_cardiff < d_leeds) and (d_cardiff < d_london):
        cardiff_pts.append(pt)
    elif (d_leeds < d_cardiff) and (d_leeds < d_london):
        leeds_pts.append(pt)
    elif (d_london < d_cardiff) and (d_london < d_leeds):
        london_pts.append(pt)
    else:
        undefined_pts.append(pt)
Note that this solution treats the values as coordinates on a Cartesian reference frame, which latitude/longitude pairs are not; as you said, though, that level of accuracy isn't needed here.
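If you'd rather have the 3-column dataframe from the question, here's a minimal sketch, assuming the three lists built above (the columns are padded with None because the groups won't all be the same length):

import pandas as pd

groups = {'cardiff': cardiff_pts, 'leeds': leeds_pts, 'london': london_pts}
longest = max(len(pts) for pts in groups.values())
df = pd.DataFrame({city: pts + [None] * (longest - len(pts)) for city, pts in groups.items()})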

Related

Generating geographical coordinates from a given corner

Based on a given corner on the map (position A), I want to generate more coordinates towards position B by adding some small values (distances) to the given latitude and longitude. For instance:
There are 6 houses from position A to B on a map. If I know the latitude and longitude of 1 house (143.5689855, -38.328956999999996), how can I create the coordinates for the remaining 5?
I tried to achieve this by adding some small numbers to the coordinates of the given corner, as shown in the script below. But the code only outputs coordinates for 1 house. How can I create a loop that automatically adds the given small numbers and displays the new coordinates for the rest of the houses, or even for a bigger area?
What I have tried:
from arcgis.gis import GIS
from arcgis.geocoding import geocode
from arcgis.geocoding import reverse_geocode
import pprint
# Create an anonymous connection to ArcGIS Online
gis = GIS()
#45-Stodart-St (given corner)
geocode_home = geocode(address="45 Stodart St, Colac VIC 3250")
location = [geocode_home[0]["location"]['x'], geocode_home[0]["location"]['y']]
pprint.pprint(location)
# Add some small numbers to the original location. This will give us the coordinates of the next house, i.e. 43-Stodart-St
#43-Stodart-St
new_loc = [location[0]+0.0002215*1,location[1]*0.999999]
pprint.pprint(new_loc)
Output:
Assuming you have the two locations of house A and house B as loc_A and loc_B.
Assuming you know the house numbers of A and B, you therefore know the number of houses.
The following code iterates over the house numbers and creates a list of locations, interpolating from loc_A towards loc_B:

num_houses = house_number_B - house_number_A
longitude_diff = loc_B[0] - loc_A[0]
latitude_diff = loc_B[1] - loc_A[1]
house_locations = []
for i in range(1, num_houses):
    house_locations.append([loc_A[0] + i * longitude_diff / num_houses,
                            loc_A[1] + i * latitude_diff / num_houses])
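For instance, a hedged usage example (loc_B and the house numbers are made up for illustration; loc_A is the house from the question):

loc_A = [143.5689855, -38.328956999999996]  # house 1, the known corner house
loc_B = [143.5701000, -38.3289600]          # house 6, hypothetical far end of the row
house_number_A, house_number_B = 1, 6

Running the loop above then fills house_locations with the four intermediate houses (numbers 2 to 5).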

Efficiently compute distances between thousands of coordinate pairs

I have a catalog I opened in python, which has about 70,000 rows of data (ra, dec coordinates and object name) for various objects. I also have another list of about 15,000 objects of interest, which also appear in the previously mentioned catalog. For each of these 15,000 objects, I would like to see if any other objects in the large 70,000 list have ra, dec coordinates within 10 arcseconds of the object. If this is found to be true, I'd just like to flag the object and move on to the next one. However, this process takes a long time, since the distances are computed between the current object of interest (out of 15,000) 70,000 different times. This would take days! How could I accomplish the same task more efficiently? Below is my current code, where all_objects is a list of all the 15,000 object names of interest and catalog is the previously mentioned table data for 70,000 objects.
from astropy.coordinates import SkyCoord
from astropy import units as u
for obj_name in all_objects:
    obj_ind = list(catalog['NAME']).index(obj_name)
    c1 = SkyCoord(ra=catalog['RA'][obj_ind]*u.deg, dec=catalog['DEC'][obj_ind]*u.deg, frame='fk5')
    contamination_flag = False
    for i in range(len(catalog['NAME'])):
        if i != obj_ind:
            # Compute distance between object and other source
            c2 = SkyCoord(ra=catalog['RA'][i]*u.deg, dec=catalog['DEC'][i]*u.deg, frame='fk5')
            sep = c1.separation(c2)
            if sep.arcsecond <= 10:
                contamination_flag = True
                print('CONTAMINATION FOUND')
                break
1 Create your own separation function
This step is really easy once you look at the implementation and ask yourself: "how can I make this faster?"
def separation(self, other):
    from . import Angle
    from .angle_utilities import angular_separation  # I've inlined that in the code below so it is clearer
    if not self.is_equivalent_frame(other):
        try:
            other = other.transform_to(self, merge_attributes=False)
        except TypeError:
            raise TypeError('Can only get separation to another SkyCoord '
                            'or a coordinate frame with data')
    lon1 = self.spherical.lon
    lat1 = self.spherical.lat
    lon2 = other.spherical.lon
    lat2 = other.spherical.lat
    sdlon = np.sin(lon2 - lon1)
    cdlon = np.cos(lon2 - lon1)
    slat1 = np.sin(lat1)
    slat2 = np.sin(lat2)
    clat1 = np.cos(lat1)
    clat2 = np.cos(lat2)
    num1 = clat2 * sdlon
    num2 = clat1 * slat2 - slat1 * clat2 * cdlon
    denominator = slat1 * slat2 + clat1 * clat2 * cdlon
    return Angle(np.arctan2(np.hypot(num1, num2), denominator), unit=u.degree)
It calculates a lot of sines and cosines, creates an Angle instance, converts it to degrees, and then you convert to arcseconds.
If you need performance, you might not want to use Angle, nor do the checks and conversions at the beginning, nor do the import inside the function, nor do so much variable assignment.
The separation function feels a bit heavy to me, it should just take numbers and return a number.
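As an illustration, here's a stripped-down sketch that keeps only the trigonometry from the body above, taking plain radians and returning plain radians (the function name is mine):

import numpy as np

def angular_separation_rad(lon1, lat1, lon2, lat2):
    # Same spherical formula as above, on plain floats or numpy arrays (radians in, radians out).
    sdlon = np.sin(lon2 - lon1)
    cdlon = np.cos(lon2 - lon1)
    num1 = np.cos(lat2) * sdlon
    num2 = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * cdlon
    denominator = np.sin(lat1) * np.sin(lat2) + np.cos(lat1) * np.cos(lat2) * cdlon
    return np.arctan2(np.hypot(num1, num2), denominator)

Because it broadcasts over arrays, one call can compare a single object against all 70,000 catalog rows at once instead of building a SkyCoord per row.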
2 Use a quad tree (requires a complete rewrite of your code)
That said, let's look at the complexity of your algorithm: it checks every element against every other element, so it is O(n**2) (Big O notation). Can we do better?
YES. You could use a quad tree (or a k-d tree); a tree lookup typically costs O(log n). What that basically means, if you're not familiar with Big O, is that for 15,000 elements the full search costs on the order of 15,000 * log(15,000) operations instead of 225,000,000 (15,000 squared)... quite an improvement, right? Scipy has a great spatial index, scipy.spatial.cKDTree, which is a k-d tree rather than a quad tree but serves the same purpose (I've always used my own).
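Here's a hedged sketch of that idea with scipy's k-d tree, assuming catalog['RA'] and catalog['DEC'] convert to numpy arrays of degrees (note astropy also ships a vectorized astropy.coordinates.search_around_sky that does this kind of matching for you):

import numpy as np
from scipy.spatial import cKDTree

# Project ra/dec onto the unit sphere, so that 3-D chord length is a
# monotonic proxy for angular separation.
ra = np.radians(np.asarray(catalog['RA'], dtype=float))
dec = np.radians(np.asarray(catalog['DEC'], dtype=float))
xyz = np.column_stack([np.cos(dec) * np.cos(ra),
                       np.cos(dec) * np.sin(ra),
                       np.sin(dec)])

tree = cKDTree(xyz)
sep_limit = np.radians(10.0 / 3600.0)   # 10 arcseconds, in radians
chord = 2.0 * np.sin(sep_limit / 2.0)   # chord length corresponding to that angle
pairs = tree.query_pairs(chord)         # all index pairs separated by less than 10 arcsec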

Geopandas : sort a sample of points like a cycle graph

I'm trying out geopandas to manipulate some point data. My final GeoDataFrame is shown in the figure from the original post.
In order to use another Python module, which calculates the shortest road between two points from OSM data, I must sort my points like a tour.
If I don't, that module still calculates a shortest road, but not necessarily between the nearest points, and the main problem is the constraint of a tour.
If my points were all on a line, a basic sort on the latitude and longitude of each point would be enough, like:
df1 = pd.read_csv("file.csv", sep = ",")
df1 = df1.sort_values(['Latitude','Longitude'], ascending = [1,1])
# (I'm starting with pandas df before GeoDataFrame conversion)
If we start from the "upper" point of the picture and follow this sorting, the second point of the DataFrame will be the nearest to it, and so on... until the fifth point, which is over on the right of the picture (so not the nearest any more)...
So my question is: does someone know how to achieve this special kind of sorting, or must I change my index manually?
If I understand your question correctly, you want to rearrange the order of points in a way that they would create the shortest possible path.
I have run into the same problem also.
Here is a function that accepts a regular dataframe (i.e. with a separate field for each coordinate). I am sure you will be able to modify it to accept a geodataframe, splitting the geometry field into x and y fields.
import math

import pandas as pd
from shapely.geometry import Point

def autoroute_points_df(points_df, x_col="e", y_col="n"):
    '''
    Convert a dataframe of unordered points into an ordered one by always
    hopping to the nearest unvisited point (a greedy nearest-neighbour heuristic).
    Author: Marjan Moderc, 2016
    '''
    points_list = points_df[[x_col, y_col]].values.tolist()

    # Arrange points by ascending X (west-east) and by ascending Y (south-north).
    points_we = sorted(points_list, key=lambda pt: pt[0])
    points_sn = sorted(points_list, key=lambda pt: pt[1])

    # Calculate the general direction of the points (north-south or west-east)
    # in order to decide where to start the path.
    westmost_point = points_we[0]
    eastmost_point = points_we[-1]
    deltay = eastmost_point[1] - westmost_point[1]
    deltax = eastmost_point[0] - westmost_point[0]
    alfa = math.degrees(math.atan2(deltay, deltax))
    azimut = (90 - alfa) % 360

    # If the main direction is towards the east (45°-135°), take the westmost point as the start.
    if 45 < azimut < 135:
        points_list = points_we
    elif azimut > 180:
        raise Exception("Error while computing the azimuth! It can't be bigger than 180 "
                        "since the first point is west and the second is east.")
    else:
        points_list = points_sn

    # Create the output (ordered) dataframe and seed it with the starting point.
    start_mask = (points_df[x_col] == points_list[0][0]) & (points_df[y_col] == points_list[0][1])
    ordered_points_df = points_df.loc[start_mask].copy()

    # Repeatedly append the nearest not-yet-ordered point.
    for _ in range(len(points_list) - 1):
        already_ordered = ordered_points_df[[x_col, y_col]].values.tolist()
        current_point = already_ordered[-1]
        possible_candidates = [pt for pt in points_list if pt not in already_ordered]
        best_distance = float("inf")
        best_candidate = None
        for candidate in possible_candidates:
            current_distance = Point(current_point).distance(Point(candidate))
            if current_distance < best_distance:
                best_candidate = candidate
                best_distance = current_distance
        best_mask = (points_df[x_col] == best_candidate[0]) & (points_df[y_col] == best_candidate[1])
        ordered_points_df = pd.concat([ordered_points_df, points_df.loc[best_mask]])
    return ordered_points_df
Hope it solves your problem!
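For reference, a minimal usage sketch (the column names match the function's defaults; the coordinates are made up):

sample = pd.DataFrame({"e": [5.3, 5.0, 5.6, 5.1], "n": [46.1, 46.4, 46.0, 46.2]})
ordered = autoroute_points_df(sample, x_col="e", y_col="n")
print(ordered)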

Count number of points in multipolygon shapefile using Python

I have a polygon shapefile of the U.S. made up of individual states as their attribute values. In addition, I have arrays storing latitude and longitude values of point events that I am also interested in. Essentially, I would like to 'spatial join' the points and polygons (or perform a check to see which polygon [i.e., state] each point is in), then sum the number of points in each state to find out which state has the most number of 'events'.
I believe the pseudocode would be something like:
Read in US.shp
Read in lat/lon points of events
Loop through each state in the shapefile and find number of points in each state
print('Here is a list of the number of points in each state: ')
Any libraries or syntax would be greatly appreciated.
Based on what I can tell, the OGR library is what I need, but I am having trouble with the syntax:
dsPolygons = ogr.Open('US.shp')
polygonsLayer = dsPolygons.GetLayer()

# Iterating all the polygons
polygonFeature = polygonsLayer.GetNextFeature()
k = 0
while polygonFeature:
    k = k + 1
    print("processing " + polygonFeature.GetField("STATE") + " - " + str(k) + " of " + str(polygonsLayer.GetFeatureCount()))
    geometry = polygonFeature.GetGeometryRef()

    # Read in some points?
    geomcol = ogr.Geometry(ogr.wkbGeometryCollection)
    point = ogr.Geometry(ogr.wkbPoint)
    point.AddPoint(-122.33, 47.09)
    point.AddPoint(-110.11, 33.33)
    #geomcol.AddGeometry(point)
    print(point.ExportToWkt())
    print(point)

    numCounts = 0.0
    while pointFeature:  # pointFeature and pointsLayer are never defined; this is the part I'm unsure about
        if pointFeature.GetGeometryRef().Within(geometry):
            numCounts = numCounts + 1
        pointFeature = pointsLayer.GetNextFeature()
    polygonFeature = polygonsLayer.GetNextFeature()

# Loop through to see how many events in each state
I like the question. I doubt I can give you the best answer, and definitely can't help with OGR, but FWIW I'll tell you what I'm doing right now.
I use GeoPandas, a geospatial extension of pandas. I recommend it: it's high-level and does a lot, giving you everything in Shapely and fiona for free. It is in active development by @kajord and others.
Here's a version of my working code. It assumes you have everything in shapefiles, but it's easy to generate a geopandas.GeoDataFrame from a list.
import geopandas as gpd

# Read the data.
polygons = gpd.GeoDataFrame.from_file('polygons.shp')
points = gpd.GeoDataFrame.from_file('points.shp')

# Make a copy because I'm going to drop points as I
# assign them to polys, to speed up subsequent search.
pts = points.copy()

# We're going to keep a list of how many points we find.
pts_in_polys = []

# Loop over polygons with index i.
for i, poly in polygons.iterrows():
    # Keep a list of points in this poly
    pts_in_this_poly = []

    # Now loop over all points with index j.
    for j, pt in pts.iterrows():
        if poly.geometry.contains(pt.geometry):
            # Then it's a hit! Add it to the list,
            # and drop it so we have less hunting.
            pts_in_this_poly.append(pt.geometry)
            pts = pts.drop([j])

    # We could do all sorts, like grab a property of the
    # points, but let's just append the number of them.
    pts_in_polys.append(len(pts_in_this_poly))

# Add the number of points for each poly to the dataframe.
polygons['number of points'] = gpd.GeoSeries(pts_in_polys)
The developer tells me that spatial joins are 'new in the dev version', so if you feel like poking around in there, I'd love to hear how that goes! The main problem with my code is that it's slow.
import geopandas as gpd
# Read the data.
polygons = gpd.GeoDataFrame.from_file('polygons.shp')
points = gpd.GeoDataFrame.from_file('points.shp')
# Spatial Joins
pointsInPolygon = gpd.sjoin(points, polygons, how="inner", op='intersects')
# Add a field with 1 as a constant value
pointsInPolygon['const']=1
# Group according to the column by which you want to aggregate data
pointsInPolygon.groupby(['statename']).sum()
The 'const' column then gives you the number of points in each multipolygon.
# If you want to keep other columns as well, aggregate them explicitly:
pointsInPolygon = pointsInPolygon.groupby('statename').agg({'columnA': 'first', 'columnB': 'first', 'const': 'sum'}).reset_index()

pyephem FixedObject() for given RA/Dec

I'm looking to determine the alt/az of (un-famous) stars at given RA/Dec at specific times from Mauna Kea. I'm trying to compute these parameters using pyephem, but the resulting alt/az don't agree with other sources. Here's the calculation for HAT-P-32 from Keck:
import ephem
telescope = ephem.Observer()
telescope.lat = '19.8210'
telescope.long = '-155.4683'
telescope.elevation = 4154
telescope.date = '2013/1/18 10:04:14'
star = ephem.FixedBody()
star._ra = ephem.degrees('02:04:10.278')
star._dec = ephem.degrees('+46:41:16.21')
star.compute(telescope)
print(star.alt, star.az)
which returns -28:43:54.0 73:22:55.3, though according to Stellarium, the proper alt/az should be: 62:26:03 349:15:13. What am I doing wrong?
EDIT: Corrected latitude and longitude, which were formerly reversed.
First, you've got longitude and latitude backwards; second, you need to provide the strings in sexagesimal form (degrees:minutes:seconds); and third, you need to provide the RA as hours, not degrees:
import ephem
telescope = ephem.Observer()
# Reversed longitude and latitude for Mauna Kea
telescope.lat = '19:49:28' # from Wikipedia
telescope.long = '-155:28:24'
telescope.elevation = 4154.
telescope.date = '2013/1/18 00:04:14'
star = ephem.FixedBody()
star._ra = ephem.hours('02:04:10.278') # in hours for RA
star._dec = ephem.degrees('+46:41:16.21')
star.compute(telescope)
This way, you get:
>>> print(star.alt, star.az)
29:11:57.2 46:43:19.6
PyEphem always uses UTC for time, so that programs operate the same and give the same output wherever they are run. You simply need to convert the date you are using to UTC, instead of using your local time zone, and the results agree fairly closely with Stellarium; use:
telescope.date = '2013/1/18 05:04:14'
The result is this alt/az:
62:27:19.0 349:26:19.4
To know where the small remaining difference comes from, I would have to look into how the two programs handle each step of their computation; but does this get you close enough?
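For completeness, a hedged sketch of deriving that UTC string from a Hawaii local time with the standard library (the local timestamp below is back-computed from the answer's UTC value, so treat it as illustrative):

from datetime import datetime, timezone, timedelta

hst = timezone(timedelta(hours=-10))                   # Hawaii Standard Time, no daylight saving
local = datetime(2013, 1, 17, 19, 4, 14, tzinfo=hst)   # illustrative local observation time
utc = local.astimezone(timezone.utc)
telescope.date = utc.strftime('%Y/%m/%d %H:%M:%S')     # '2013/01/18 05:04:14'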
