Anonymizing geo location coordinates in Python

I have a CSV of names, transaction amounts, and the exact longitude and latitude of the location where each transaction was performed.
I want the final document to be anonymized. For that I need to change it into a CSV where the names are hashed (that should be easy enough) and the longitude and latitude are obscured within a radius of 2 km.
I.e., changing the coordinates so they are no more than 2 km from the original location, but in a randomized way, so that the change cannot be reversed by a formula.
Does anyone know how to work with coordinates that way?

You could use locality-sensitive hashing (LSH) to map similar co-ordinates (i.e. within a 2 km radius) to the same value with high probability. Hence, co-ordinates that map to the same bucket would be located close together in Euclidean space.
Alternatively, you could use any standard hash function y = H(x) and compute y modulo N, where N is the size of the allowed offset range. Assume your co-ordinates are P = (500, 700), expressed in meters, and you would like to return a randomized value in a range of [-x, x] from P on each axis.
import random
P = (500, 700)
Range = 1000  # 1000 meters for example
# Anonymize co-ordinates to within specified range
ANON_X = hash(P[0]) % Range
ANON_Y = hash(P[1]) % Range
# Randomly add/subtract the offset on each axis (index into P; a tuple cannot be added to an int)
P = (P[0] + ANON_X * random.choice([-1, 1]), P[1] + ANON_Y * random.choice([-1, 1]))
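Note that Python's built-in hash() of an integer is deterministic, so the offset in the snippet above is fixed for a given input and only its sign is random. A minimal sketch of a fully random alternative follows, assuming lat/lon in degrees, roughly 111,300 meters per degree of latitude, and a radius small enough for a flat approximation. The names jitter and METERS_PER_DEGREE are illustrative and not from any library.

```python
import math
import random

# SystemRandom draws from OS entropy, so the offsets cannot be replayed from a seed
_rng = random.SystemRandom()

METERS_PER_DEGREE = 111_300  # approximate length of one degree of latitude


def jitter(lat, lon, max_radius_m=2000.0):
    """Return (lat, lon) moved to a uniformly random point within max_radius_m."""
    # The square root makes the points uniform over the disk's area
    # instead of clustering near the center.
    distance_m = max_radius_m * math.sqrt(_rng.random())
    bearing = 2 * math.pi * _rng.random()
    d_north = distance_m * math.cos(bearing)
    d_east = distance_m * math.sin(bearing)
    new_lat = lat + d_north / METERS_PER_DEGREE
    # A degree of longitude shrinks by cos(latitude) away from the equator
    new_lon = lon + d_east / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return new_lat, new_lon
```

Calling jitter(lat, lon) once per row gives each transaction an independent displacement of at most 2 km under this approximation.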

Related

Find the coordinate of users given a specific range

I have a list of users' latitudes and longitudes.
The input will be a user's lat/lon and a range, e.g. 500 meters.
I want to find out which users from that list are within 500 meters of that point.
Using geopy.distance I can find the distance between two points:
from geopy import distance
newport_ri = (41.49008, -71.312796, 100)
cleveland_oh = (41.499498, -81.695391, 100)
print(distance.distance(newport_ri, cleveland_oh).km)
What I want is to find the points within a given distance.
Something like this:
coor = [
    (35.441339, -88.092403),
    (35.453793, -88.061769),
    (35.559426, -88.014642),
    (35.654535, -88.060918),
    (35.812953, -88.120935),
]
def findClosest(coor, userCoor, ranges):
    pass
userCoor = [35.829042, -88.039396]
ranges = 500  # meter or km
findClosest(coor, userCoor, ranges)
## Output:The coordinates of the user are within 500 meters
For example, if the number of users is not very large, you can sort users by distances and return the closest users according to ranges. The function would begin like this:
def findClosest(coor, userCoor, ranges):
    dist = []
    # Measure from the query point to every user in the list
    for i, u in enumerate(coor):
        dist.append((distance.distance(userCoor, u).km, i))
    dist = sorted(dist)
    ...
Otherwise, if a faster solution is needed, preprocessing the users might be necessary, for example by computing a Voronoi diagram of their locations or something similar.
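The sort-and-filter idea can be sketched end to end as follows. To keep the example self-contained it swaps geopy for a plain haversine formula on a spherical Earth of radius 6,371 km, which is an assumption and slightly less accurate than a geodesic distance. The names haversine_m and find_within are illustrative.

```python
import math


def haversine_m(p, q, radius_m=6_371_000):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def find_within(coor, user_coor, range_m):
    """Return (distance_m, point) pairs within range_m of user_coor, nearest first."""
    hits = [(haversine_m(user_coor, p), p) for p in coor]
    return sorted(h for h in hits if h[0] <= range_m)


coor = [
    (35.441339, -88.092403),
    (35.453793, -88.061769),
    (35.559426, -88.014642),
    (35.654535, -88.060918),
    (35.812953, -88.120935),
]
user_coor = (35.829042, -88.039396)
nearby = find_within(coor, user_coor, 10_000)  # everything within 10 km
```

This is a linear scan, so it suits lists that are not very large; a spatial index would be the next step for bigger ones.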

Check if coordinates are within a specific area

I used various sources of information to determine the GPS coordinates of a traffic sign, and plotted them using plotly.express.scatter_mapbox and add_scattermapbox.
The orange dot is a high end, "reference" measurement and the others are from different sources.
The numeric coordinates in this example are:
red: 51.4001213° 12.4291356°
purple: 51.400127° 12.429187°
green: 51.400106346232° 12.429278003005°
orange: 51.4000684461437° 12.4292323627949°
How can I calculate an area around the orange dot (e.g. 5 meters), find the coordinates which are inside this area, and how do I plot that area on my map?
Does this answer help? https://gis.stackexchange.com/a/25883
This is tricky for two reasons: first, limiting the points to a circle instead of a square; second, accounting for distortions in the distance calculations.
Many GISes include capabilities that automatically and transparently handle both complications. However, the tags here suggest that a GIS-independent description of an algorithm may be desirable.
To generate points uniformly, randomly, and independently within a circle of radius r around a location (x0, y0), start by generating two independent uniform random values u and v in the interval [0, 1). (This is what almost every random number generator provides you.) Compute
w = r * sqrt(u)
t = 2 * Pi * v
x = w * cos(t)
y = w * sin(t)
The desired random point is at location (x+x0, y+y0).
When using geographic (lat,lon) coordinates, then x0 (longitude) and y0 (latitude) will be in degrees but r will most likely be in meters (or feet or miles or some other linear measurement). First, convert the radius r into degrees as if you were located near the equator. Here, there are about 111,300 meters in a degree.
Second, after generating x and y as in step (1), adjust the x-coordinate for the shrinking of the east-west distances:
x' = x / cos(y0)
The desired random point is at location (x'+x0, y+y0). This is an approximate procedure. For small radii (less than a few hundred kilometers) that do not extend over either pole of the earth, it will usually be so accurate you cannot detect any error even when generating tens of thousands of random points around each center (x0,y0).
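The same degree-to-meter conversion can be run in reverse to test which measurements fall inside a radius around the reference point, and to produce vertices for drawing the circle. This is a rough sketch under the approximation described above (about 111,300 meters per degree, longitude scaled by the cosine of the center latitude). The function names are illustrative, and the outline vertices could be passed as lat and lon lists to whichever map line trace is in use.

```python
import math

METERS_PER_DEGREE = 111_300  # approximate, as in the procedure above


def offset_m(center, point):
    """Approximate (east, north) offset in meters of point from center; both are (lat, lon)."""
    north = (point[0] - center[0]) * METERS_PER_DEGREE
    east = ((point[1] - center[1]) * METERS_PER_DEGREE
            * math.cos(math.radians(center[0])))
    return east, north


def within_radius(center, points, radius_m):
    """Return the names of the points lying at most radius_m from center."""
    return [name for name, p in points.items()
            if math.hypot(*offset_m(center, p)) <= radius_m]


def circle_outline(center, radius_m, n=72):
    """Return n (lat, lon) vertices approximating a circle of radius_m around center."""
    lon_scale = METERS_PER_DEGREE * math.cos(math.radians(center[0]))
    return [(center[0] + radius_m * math.sin(2 * math.pi * k / n) / METERS_PER_DEGREE,
             center[1] + radius_m * math.cos(2 * math.pi * k / n) / lon_scale)
            for k in range(n)]


orange = (51.4000684461437, 12.4292323627949)
others = {
    "red": (51.4001213, 12.4291356),
    "purple": (51.400127, 12.429187),
    "green": (51.400106346232, 12.429278003005),
}
inside = within_radius(orange, others, 5.0)
outline = circle_outline(orange, 5.0)
```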

How to transform lat/lon coordinates from degree to km to create a 1 km grid

I have a dataframe containing lat/lon coordinates in decimal degree.
My goal is to aggregate the data on a rectangular grid of 1 km². For that matter, I transformed my coordinates into km based on the method described in Convert latitude, longitude to distance from equator in kilometers and round to nearest kilometer
The method consists in calculating the distance from a reference point to the points (lat=0, lon) and (lat, lon=0).
But it doesn't work, because it seems to depend on the reference point.
By taking my reference point as (lon_ref=mean(lon), lat_ref=mean(lat)), I end up aggregating in the same tile points that are 120km away from each other.
This is the code that I am using :
# get the coordinates of my reference point
lat_ref, lon_ref = data["lat"].mean() , data["lon"].mean()
# the distance function
from pyproj import Geod
wgs84_geod = Geod(ellps='WGS84')
format = lambda x: wgs84_geod.inv(lon_ref,lat_ref,0,x)[2]/1000 #km
format = lambda x: wgs84_geod.inv(lon_ref,lat_ref,x,0)[2]/1000 #km
# Apply the function on my dataframe
data["londist"]=data['lon'].map(format)
data["latdist"]=data['lat'].map(format)
# round to the nearest km
step=1 # 1km
to_bin = lambda x: np.round(x / step) * step
data["latbin"] = data['latdist'].map(to_bin)
data["lonbin"] = data['londist'].map(to_bin)
This works for some lat/lon but not for others,
Example:
point1 (46.9574,4.29949) # lat,lon in °
point2( 46.9972 ,3.18153)
Calculate the distance and round using the above code:
point1 (latbin = 259 , lonbin=5205)
point2(latbin = 259 , lonbin=5205)
The two points will be aggregated together
However, the distance between the two points is 85 km!
dist=wgs84_geod.inv(4.29949,46.9574,3.18153,46.9972)[2]/1000
How can I solve this problem?
Is there any other efficient method to do the aggregation, given that I have 10 million lat/lon pairs in my dataframe?
It looks like you assign to the format variable twice, so the second assignment overwrites the first and both columns are mapped with the same function. Did you mean to have something like format_lon and format_lat?
Note that once you fix this, the result will still depend on the reference point. This is inevitable when you project spherical coordinates onto a flat map, but the result should be reasonable.
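One way to keep the two axes separate is sketched below. It is not the pyproj-based method from the question: it assumes a local flat approximation with about 111.3 km per degree of latitude and longitude scaled by the cosine of the reference latitude, which is only reasonable over a limited region. The names grid_cell and KM_PER_DEGREE are illustrative.

```python
import math

KM_PER_DEGREE = 111.3  # approximate length of one degree of latitude


def grid_cell(lat, lon, lat_ref, lon_ref, step_km=1.0):
    """Return the integer (row, col) of the step_km tile containing (lat, lon)."""
    # North-south offset depends only on the latitude difference
    y_km = (lat - lat_ref) * KM_PER_DEGREE
    # East-west offset depends on the longitude difference, scaled by cos(latitude)
    x_km = (lon - lon_ref) * KM_PER_DEGREE * math.cos(math.radians(lat_ref))
    return math.floor(y_km / step_km), math.floor(x_km / step_km)


# The two points from the question, with their mean as the reference point
lat_ref, lon_ref = 46.9773, 3.74051
cell1 = grid_cell(46.9574, 4.29949, lat_ref, lon_ref)
cell2 = grid_cell(46.9972, 3.18153, lat_ref, lon_ref)
```

Here the two points land in different tiles, about 85 columns apart. For millions of rows the same arithmetic could presumably be applied to whole columns at once with array operations rather than point by point.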

Dividing City by Longitude and Latitude into Matrix

How do you divide a city such as San Francisco into equally sized blocks according to longitude and latitude coordinates? The aim of this is when I receive a coordinate of a location in the city I want to assign it automatically into one of these blocks.
Here's a simple use of the publicly available python library module Geohash (https://github.com/vinsci/geohash).
A python hashmap (dictionary) is used to map Geohash strings to lists of lat/lng points which have been added in the area ("block") represented by the Geohash. A query is performed by computing a position's Geohash and accessing the hashmap for that entry to retrieve all of the entries for that "block".
# encode() comes from the Geohash module linked above
from Geohash import encode
### Add position to Geohash
### Each hash mapping contains a list of points in the "block"
###
def addToGeohash(m, latitude, longitude):
    p = (latitude, longitude)
    ph = encode(p[0], p[1], 5)
    if ph not in m:
        m[ph] = []
    m[ph].append(p)
    return
### Get positions in same "block" or empty list if none found
###
def getFromGeohash(m, latitude, longitude):
    p = (latitude, longitude)
    ph = encode(p[0], p[1], 5)
    print("query hash: " + ph)
    if ph in m:
        return m[ph]
    return []
### Test
m = dict()
# Add 2 points in general area (1st near vista del mar)
addToGeohash(m, 37.779914, -122.509431)
addToGeohash(m, 37.780546, -122.366189)
# Query a point 769 meters from vista del mar point
n = getFromGeohash(m, 37.779642, -122.502993)
print("Length of hashmap: " + str(len(m)))
print("Contents of map : " + str(m))
print("Query result : ")
print(n)
The default precision is 12 characters (5 was used in the example); the precision determines the "block" size and therefore affects how efficiently the dictionary groups points.
Note that a lat/lng based approach is non-linear, so over vast areas or closer to the poles the blocks would not be "equal sized". However, over the area of San Francisco and with sufficient precision, this non-linearity is reduced significantly.
Output
query hash: 9q8yu
Length of hashmap: 2
Contents of map : {'9q8yu': [(37.779914, -122.509431)], '9q8yz': [(37.780546, -122.366189)]}
Query result :
[(37.779914, -122.509431)]
To get a sense of the block size for various precisions use this link and enter for example 37.779914,-122.509431 and a precision of 5. Experiment with precision.
Here are the approximate box sizes for precisions 5-8:
5 ≤ 4.89km × 4.89km
6 ≤ 1.22km × 0.61km
7 ≤ 153m × 153m
8 ≤ 38.2m × 19.1m
An interesting feature of Geohash, which works in most places (and always within the San Francisco area), is that you can easily find the 8 adjacent neighbors by manipulating the last character. So, with minimal effort you could use a higher precision (e.g. 8) and reduce it to a size between 7 and 8.
The naive approach is to find a big enough rectangle around the whole city (the area you want to cover). From the number of blocks desired you can deduce how many parts to divide each edge of the rectangle into; it should be fairly basic math.
Given a point, you can assign it to its block very quickly: just check its lat and long to see where it falls in the grid.
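The rectangle approach might be sketched like this. The bounding box values are rough illustrative numbers for San Francisco, not authoritative city limits, and make_grid_locator is a hypothetical helper name.

```python
def make_grid_locator(south, west, north, east, rows, cols):
    """Return a function mapping (lat, lon) to a (row, col) block, or None if outside."""
    def locate(lat, lon):
        if not (south <= lat <= north and west <= lon <= east):
            return None
        # Fraction of the way across the rectangle, scaled to the block count.
        # min() keeps points lying exactly on the north/east edge in the last block.
        row = min(int((lat - south) / (north - south) * rows), rows - 1)
        col = min(int((lon - west) / (east - west) * cols), cols - 1)
        return row, col
    return locate


# Assumed, approximate bounding box around San Francisco, split into 10 x 10 blocks
locate = make_grid_locator(37.70, -122.52, 37.83, -122.35, rows=10, cols=10)
block = locate(37.779914, -122.509431)
```

The blocks are equal in degrees rather than in meters, which is usually close enough at city scale.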
I've written different versions of this algorithm over the years. Spent a lot of time mulling over the problem. There are two issues involved:
Mapping 2-dimensional coordinates into a 1-dimensional value. (Fundamentally, in order to do this optimally, you must know the bounds of both dimensions)
Cutting the plane covering a sphere roughly into "squares". (The squares are different sizes as you get closer to the poles. Also, they're never actually squares, but the earth is so huge this usually doesn't matter)
This is some code that I did that suited my purpose. A few things to consider:
This creates blocks of a set size that you designate. The python Geohash library does this cool thing where the hash can be truncated to produce hashes of larger sizes. This algorithm does not do that, but on the flip side you can specify your desired block size (roughly).
This actually creates two-dimensional coordinates that I treat as a 1-dimensional string. I do this because it's good enough for me and I can easily manipulate the string data to get the adjacent blocks in a way that is clear and makes sense. The string is written as "lat_offset,lng_offset", so for example the hash "-23,407" is 23 blocks south and 407 blocks east of the center point. If you want to move one block to the north, you just add 1 to -23: "-22,407".
The center point that I used here is the middle of Washington DC. You could use center point 0,0, or the middle of San Francisco, or whatever. But do not use center points near the poles, such as -90,-180, because when the algorithm goes to calculate the longitude offset in kilometers, it will calculate the distance between (-90, your-longitude) and (-90, -180). Both of these points are at the South Pole, and the distance will be infinitesimally small because at the poles all of these blocks are extremely small.
This is the main algorithm to hash the points:
# Define center point
CENTER_LAT = 38.893
CENTER_LNG = -77.084
def geohash(lat, lng, BLOCK_SIZE_KM=.05):
    # Get Latitude Offset
    lat_distance = haversine_km(
        (CENTER_LAT, CENTER_LNG),
        (lat, CENTER_LNG)
    )
    if lat < CENTER_LAT:
        lat_distance = lat_distance*-1
    lat_offset = int(lat_distance/BLOCK_SIZE_KM)
    # Get Longitude offset
    lng_distance = haversine_km(
        (CENTER_LAT, CENTER_LNG),
        (CENTER_LAT, lng)
    )
    if lng < CENTER_LNG:
        lng_distance = lng_distance*-1
    lng_offset = int(lng_distance/BLOCK_SIZE_KM)
    block_str = '%s,%s' % (lat_offset, lng_offset)
    return block_str
I included these helper functions for calculating the distance between two coordinates:
import math
def haversine_km(origin, destination):
    # 6371 is the mean Earth radius in kilometers
    return haversine(origin, destination, 6371)
def haversine(origin, destination, radius):
    lat1, lon1 = origin
    lat2, lon2 = destination
    lat1 = float(lat1)
    lon1 = float(lon1)
    lat2 = float(lat2)
    lon2 = float(lon2)
    dlat = math.radians(lat2-lat1)
    dlon = math.radians(lon2-lon1)
    a = math.sin(dlat/2) * math.sin(dlat/2) + math.cos(math.radians(lat1)) \
        * math.cos(math.radians(lat2)) * math.sin(dlon/2) * math.sin(dlon/2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
    d = radius * c
    return d
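Because each block is just a pair of integer offsets joined by a comma, the adjacent blocks can be derived from the string alone. The helper below is a hypothetical illustration of that idea and is not part of the original code; it assumes the "lat_offset,lng_offset" format produced by geohash() above.

```python
def neighbors(block_str):
    """Return the 8 block strings surrounding block_str ("lat_offset,lng_offset")."""
    lat_offset, lng_offset = (int(v) for v in block_str.split(","))
    return ["%s,%s" % (lat_offset + dlat, lng_offset + dlng)
            for dlat in (-1, 0, 1)
            for dlng in (-1, 0, 1)
            if (dlat, dlng) != (0, 0)]


around = neighbors("-23,407")
```

Looking up a point's own block plus these eight covers everything within roughly one block size of it.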

Calculate the azimuth of a line between two points in a negative coordinate system

I have a problem that I cannot seem to work out. I also cannot find a solution already given on any prior posts.
I am working in a metric coordinate system where all of the variables are negative values (example: origin = -2,-2; north = -2,-1; east = -1,-2; south = -2, -3, west = -3,-2). It's a southern hemisphere coordinate system. I need to calculate the azimuth orientation and slope of a line that passes through two points, given that the first point is the origin point.
I have been able to write a script using Python that calculates the orientations (0-360 degrees) for each pair of points, but a number of the values are 180 degrees opposite, according to a reference data set that I am comparing my results against, which already has these values calculated.
If I use ATAN2 and then convert radians to degrees, does it matter which quadrant on a 2D graph the line passes through? Do I need to add or subtract 0, 90, 180, 270, or 360 depending on the quadrant? I think this is my problem, but I am not sure.
Lastly, the above assumes that I am making the calculations for orientation and slope in 2D spaces, respectively. Is there a more parsimonious way to calculate these variables within 3D space?
I've attached my current block of code that includes the calculation of the azimuth angles per quadrant. I would really appreciate any help you all can provide.
dn = north_2 - north_1
de = east_2 - east_1
x = x + 1
if dn<0 and de<=0:
    q = "q3"
    theta = math.degrees(math.atan2(dn,de))
    orientation = 90- theta
if dn>=0 and de <0:
    q = "q4"
    theta = math.degrees(math.atan2(dn,de))
    orientation = 270-theta
if dn>0 and de>=0:
    q = "q1"
    theta = math.degrees(math.atan2(dn,de))
    orientation = 270-theta
if dn<=0 and de>0:
    q = "q2"
    theta = math.degrees(math.atan2(dn,de))
    orientation = 90-theta
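For comparison, a common way to avoid per-quadrant cases is to pass the easting difference first and the northing difference second to atan2, which measures the angle clockwise from north, and then wrap the result into 0-360 with a modulo. The sketch below assumes coordinates given as (east, north) in a metric system, as in the example points above. The elevation arguments for the slope are hypothetical, since the question does not show how heights are stored.

```python
import math


def azimuth_deg(east_1, north_1, east_2, north_2):
    """Azimuth in degrees (0 = north, 90 = east) from point 1 to point 2."""
    de = east_2 - east_1
    dn = north_2 - north_1
    # atan2(de, dn) handles every quadrant; % 360 maps negative angles to 180-360
    return math.degrees(math.atan2(de, dn)) % 360.0


def slope_deg(east_1, north_1, elev_1, east_2, north_2, elev_2):
    """Slope in degrees from point 1 to point 2 (positive when climbing)."""
    horizontal = math.hypot(east_2 - east_1, north_2 - north_1)
    return math.degrees(math.atan2(elev_2 - elev_1, horizontal))
```

Only coordinate differences enter the calculation, so the fact that every coordinate is negative does not affect the result.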
