I have tried several libraries and ways to detect faces and export them as an image.
The problem is that all of the algorithms cut off a lot of the head.
Example from the deepface doc:
While I want something like:
Is there a way of doing so? Or adding "padding" to the coordinates in a smart way?
I get start and end points.
I built a function to do that with simple math:
def increase_rectangle_size(points: list[int], increase_percentage: int):
    # points is [x1, y1, x2, y2]
    delta_x = (points[0] - points[2]) * increase_percentage // 100
    delta_y = (points[1] - points[3]) * increase_percentage // 100
    new_points = [points[0] + delta_x, points[1] + delta_y, points[2] - delta_x, points[3] - delta_y]
    return [(i > 0) * i for i in new_points]  # Clamp negative numbers to zero.
What it basically does is increase the distance between the two points (along the line through them).
I don't want values less than 0, so I check for negative numbers at the end of the function. I do not care if the box goes out of the frame (for bigger numbers).
I get the two points as a list ([x1, y1, x2, y2]) because this is how the library I use handles them, but you can change it to two points of course.
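For example, expanding a detected box by 20% on each side (the function is repeated here so the snippet is self-contained; the box coordinates are made up for illustration):

```python
def increase_rectangle_size(points: list[int], increase_percentage: int):
    # points is [x1, y1, x2, y2] (top-left and bottom-right corners)
    delta_x = (points[0] - points[2]) * increase_percentage // 100
    delta_y = (points[1] - points[3]) * increase_percentage // 100
    new_points = [points[0] + delta_x, points[1] + delta_y,
                  points[2] - delta_x, points[3] - delta_y]
    return [(i > 0) * i for i in new_points]  # clamp negative coordinates to zero

print(increase_rectangle_size([100, 120, 200, 260], 20))  # [80, 92, 220, 288]
print(increase_rectangle_size([5, 10, 50, 60], 30))       # [0, 0, 64, 75] (clamped at the frame edge)
```

The expanded box can then be used to crop the original frame instead of the detector's tight box.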
Currently I'm working with a dataset that contains routes around the sea, but some of them either cross land or are on land (due to the fidelity of the data being quite low). I have been using the great https://github.com/toddkarin/global-land-mask tool from toddkarin to find which of my coordinates are on land so I can discard them (eventually I may find a way of moving them to the nearest point at sea).
My current problem is that I need a way of finding out whether a line (given any two coordinates) crosses land (think of an island between two points in the sea).
My area of operation is the entire globe and I am using WGS84 if that changes anything. I have some very basic experience with matplotlib/Basemap but I'm not at all confident with it and I'm struggling to find where to start with this. Do I try to plot each coordinate along the line at a given distance/resolution and then use Todd's tool or is there a more efficient way?
Thanks in advance for any assistance. I've done a lot of digging and reading before posting but haven't found what I think I need.
I need the tool to be in python ideally but if I need to call another language/library/exe that can give me a True/False output that's good too.
A possible tool available in Python to perform these sorts of operations is Shapely.
If you're able to extract the polygon data of islands and other land masses, then you could use Shapely to perform an intersection test (see Line vs. Polygon Intersection Coordinates). This works for checking intersections between points, lines and arbitrary polygons.
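A minimal sketch of such a test with Shapely (the island polygon and route coordinates below are invented for illustration):

```python
from shapely.geometry import LineString, Polygon

# Hypothetical island as a lon/lat ring (made-up coordinates).
island = Polygon([(100.0, 13.0), (100.5, 13.0), (100.5, 13.5), (100.0, 13.5)])

# A route leg between two waypoints; it passes straight through the island.
route = LineString([(99.5, 13.2), (101.0, 13.2)])

print(route.intersects(island))    # True: this leg crosses land
print(route.intersection(island))  # the overlapping part, as a geometry
```

With real data you would build one polygon per land mass and test each route leg against them (or against their union).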
The quick and dirty way is as you propose yourself, to discretize the line between the two points and check each of these.
Thanks to some help from the answer here, I came up with the following, which is now working:
How to find all coordinates efficiently between two geo points/locations with certain interval using python
from global_land_mask import globe
def crosses_land(x1, y1, x2, y2):
    # the increment step (higher = faster, lower = more thorough)
    STEP = 0.0003
    if x1 > x2:  # x2 must be the bigger one here
        x1, x2 = x2, x1
        y1, y2 = y2, y1
    for i in range(int((x2 - x1) / STEP) + 1):
        try:
            x = x1 + i * STEP
            y = (y1 - y2) / (x1 - x2) * (x - x1) + y1
        except ZeroDivisionError:  # x1 == x2: a vertical segment is not sampled by this scheme
            continue
        if globe.is_land(float(x), float(y)):
            return True
    return False

# your geo points
x1, y1 = 13.26077, 100.81099
x2, y2 = 13.13237, 100.82993
print(crosses_land(x1, y1, x2, y2))
I am calculating the angle between two lines in the following image
with this code
# the lines are in the format (x1, y1, x2, y2)
def getAngle(line_1, line_2):
    angle1 = math.atan2(line_1[1] - line_1[3], line_1[0] - line_1[2])
    angle2 = math.atan2(line_2[1] - line_2[3], line_2[0] - line_2[2])
    result = math.degrees(abs(angle1 - angle2))
    if result < 0:
        result += 360
    return result
Now the function works between the two red lines (almost on top of each other) and between the red and the green line. However, between the red and the blue line the function returns 239.1083 when it should be ~300. Since it works in some cases and not in others, I am not sure what the problem is.
Some example inputs and outputs:
getAngle((316,309,316,-91), (316,309,421,209)) = 46.3971 # working
getAngle((316,309,316,-91), (199,239,316,309)) = 239.108 # should be around 300
For the example getAngle((316,309,316,-91), (199,239,316,309)), the culprit is how the angles are measured.
Angles are calculated w.r.t. the positive X axis. The angle you have defined here calculates phi in the image below rather than theta, which is what you should expect. Since the rotation is negative in nature (observe the arrow for phi), any subsequent calculation must ensure positive rotation rather than negative. Otherwise, you'll be short by roughly the complementary angle.
In the given example, the correct angle of line2 should be about +210 degrees, or about -150 degrees. Similarly, the angle of line1 could be +90 or -90 degrees. Now it's all a game of which ones to add or subtract, and how.
The 239.something (call it 240) comes from abs(90 - (-150)). The 300 you are expecting comes from abs(-90 - (+210)).
The difference of 60 degrees is the complement of theta = 30 degrees.
So it's not so much a bad formula as bad argument passing: you need to check the arguments to get positive or negative angles.
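A sketch of a variant that always measures a positive (counter-clockwise) rotation via modulo instead of abs; note how the endpoint order of each line fixes its direction, which is exactly the argument-passing issue described above (the swapped call at the end is illustrative):

```python
import math

def angle_ccw(line_1, line_2):
    # Direction angles w.r.t. the positive X axis, taken from each line's
    # second point to its first (same convention as the original getAngle).
    angle1 = math.atan2(line_1[1] - line_1[3], line_1[0] - line_1[2])
    angle2 = math.atan2(line_2[1] - line_2[3], line_2[0] - line_2[2])
    # Signed difference normalized into [0, 360): always a positive rotation.
    return math.degrees(angle2 - angle1) % 360

print(angle_ccw((316, 309, 316, -91), (316, 309, 421, 209)))  # ~46.40, the "working" case
print(angle_ccw((316, 309, 316, -91), (199, 239, 316, 309)))  # ~120.89
# Reversing the second line's endpoints flips its direction by 180 degrees
# and yields the expected reflex angle:
print(angle_ccw((316, 309, 316, -91), (316, 309, 199, 239)))  # ~300.89
```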
I have several dataframes which each contain two columns of x and y values, so each row represents a point on a curve. The different dataframes then represent contours on a map. I have another series of data points (fewer in number), and I'd like to see which contour they are closest to on average.
I would like to establish the distance from each datapoint to each point on the curve, with sqrt(x^2+y^2) - sqrt(x_1^2 + y_1^2), add them up for each point on the curve. The trouble is that there are several thousand points on the curve, and there are only a few dozen datapoints to assess, so I can't simply put these in columns next to each other.
I think I need to cycle through the datapoints, checking the sqdistance between them and each point in the curve.
I don't know whether there is an easy function or module that can do this.
Thanks in advance!
Edit: Thanks for the comments. @Alexander: I've tried the vectorize function, as follows, with a sample dataset. I'm actually using contours that comprise several thousand datapoints, and the dataset to compare against is 100+, so I'd like to automate as much as possible. I'm currently able to create a distance measurement from the first datapoint against my contour, but I would ideally like to cycle through j as well. When I try it, it comes up with an error:
import numpy as np
from numpy import vectorize
import pandas as pd
from pandas import DataFrame

df1 = {'X1': ['1', '2', '2', '3'], 'Y1': ['2', '5', '7', '9']}
df1 = DataFrame(df1, columns=['X1', 'Y1'])
df2 = {'X2': ['3', '5', '6'], 'Y2': ['10', '15', '16']}
df2 = DataFrame(df2, columns=['X2', 'Y2'])
df1 = df1.astype(float)
df2 = df2.astype(float)

Distance = pd.DataFrame()
i = range(0, len(df1))
j = range(0, len(df2))

def myfunc(x1, y1, x2, y2):
    return np.sqrt((x2-x1)**2 + np.sqrt(y2-y1)**2)

vfunc = np.vectorize(myfunc)
Distance['Distance of Datapoint j to Contour'] = vfunc(df1.iloc[i]['X1'], df1.iloc[i]['Y1'], df2.iloc[0]['X2'], df2.iloc[0]['Y2'])
Distance['Distance of Datapoint j to Contour'] = vfunc(df1.iloc[i]['X1'], df1.iloc[i]['Y1'], df2.iloc[1]['X2'], df2.iloc[1]['Y2'])
Distance
General idea
The "curve" is actually a polygon with a lot of points. There are definitely some libraries that can calculate the distance between a polygon and a point. But generally it will be something like:
Calculate an "approximate distance" to the whole polygon, e.g. to the bounding box of the polygon (from the point to 4 line segments), or to the center of the bounding box
Calculate distances to the lines of the polygon. If you have too many points, the "resolution" of the polygon can be reduced as an extra step.
The smallest distance found is the distance from the point to the polygon.
Repeat for each point and each polygon
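The "distances to the lines of a polygon" step can be sketched with plain NumPy: project the point onto each edge, clamp the projection to the segment, and take the minimum (the square polygon below is a made-up example):

```python
import numpy as np

def point_to_polygon_distance(point, polygon):
    """Smallest distance from `point` to any edge of `polygon` (a closed ring of vertices)."""
    p = np.asarray(point, dtype=float)
    verts = np.asarray(polygon, dtype=float)
    a = verts                           # segment start points
    b = np.roll(verts, -1, axis=0)      # segment end points (ring wraps around)
    ab = b - a
    # Projection parameter of p onto each segment, clamped to [0, 1].
    t = np.clip(np.einsum('ij,ij->i', p - a, ab) / np.einsum('ij,ij->i', ab, ab), 0.0, 1.0)
    closest = a + t[:, None] * ab       # closest point on each segment
    return np.min(np.linalg.norm(closest - p, axis=1))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_to_polygon_distance((2, 5), square))  # 1.0: directly above the top edge
print(point_to_polygon_distance((6, 6), square))  # ~2.83: nearest to corner (4, 4)
```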
Existing solutions
Some libraries already can do that:
shapely question, shapely Geo-Python docs
Using shapely in geopandas to calculate distance
scipy.spatial.distance: scipy can be used to calculate distance between arbitrary number of points
numpy.linalg.norm(point1-point2): some answers propose different ways to calculate distance using numpy. Some even show performance benchmarks
sklearn.neighbors: not really about curves and distances to them, but can be used if you want to check to which area a point most likely belongs
And you can always calculate distances yourself using D(x1, y1, x2, y2) = sqrt((x₂-x₁)² + (y₂-y₁)²) and search for best combination of points that gives minimal distance
Example:
# get distance from points of 1 dataset to all the points of another dataset
from scipy.spatial import distance
d = distance.cdist(df1.to_numpy(), df2.to_numpy(), 'euclidean')
print(d)
# Results will be a matrix of all possible distances:
# [[ D(Point_df1_0, Point_df2_0), D(Point_df1_0, Point_df2_1), D(Point_df1_0, Point_df2_2)]
# [ D(Point_df1_1, Point_df2_0), D(Point_df1_1, Point_df2_1), D(Point_df1_1, Point_df2_2)]
# [ D(Point_df1_2, Point_df2_0), D(Point_df1_2, Point_df2_1), D(Point_df1_2, Point_df2_2)]
# [ D(Point_df1_3, Point_df2_0), D(Point_df1_3, Point_df2_1), D(Point_df1_3, Point_df2_2)]]
[[ 8.24621125 13.60147051 14.86606875]
[ 5.09901951 10.44030651 11.70469991]
[ 3.16227766 8.54400375 9.8488578 ]
[ 1. 6.32455532 7.61577311]]
What to do next is up to you. For example as a metric of "general distance between curves" you can:
Pick the smallest values in each row and each column (if you skip some columns/rows, you might end up with a candidate that "matches only a part of the contour"), and calculate their median: np.median(np.hstack([np.amin(d, axis) for axis in range(len(d.shape))])).
Or you can calculate the median of:
all the distances: np.median(d)
the "smallest 2/3 of distances": np.median(d[d < np.percentile(d, 66, interpolation='higher')])
of "smallest distances that cover at least each rows and each columns":
for min_value in np.sort(d, None):
    chosen_indices = d <= min_value
    if np.all(np.hstack([np.amax(chosen_indices, axis) for axis in range(len(chosen_indices.shape))])):
        break
similarity = np.median(d[chosen_indices])
Or maybe you can use a different type of distance from the beginning (e.g. "correlation distance" looks promising for your task)
Maybe use "Procrustes analysis, a similarity test for two data sets" together with distances.
Maybe you can use Minkowski distance as a similarity metric.
Alternative approach
An alternative approach would be to use some "geometry" library to compare the areas of concave hulls:
Build concave hulls for the contours and for the "candidate datapoints" (not easy, but possible: using shapely, using concaveman). But if you are sure that your contours are already ordered and without overlapping segments, then you can build polygons directly from those points without needing a concave hull.
Use "intersection area" minus "non-common area" as a metric of similarity (shapely can be used for that):
The non-common area is: union - intersection, or simply the "symmetric difference"
Final metric: intersection.area - symmetric_difference.area (intersection, area)
This approach might be better than processing distances in some situations, for example:
You want to prefer "fewer points covering the whole area" over "a huge number of very close points that cover only half of the area"
It's a more obvious way to compare candidates with different numbers of points
But it has its disadvantages too (just draw some examples on paper and experiment to find them)
Other ideas:
instead of using polygons or concave hull you can:
build a linear ring from your points and then use contour.buffer(some_distance). This way you ignore the "internal area" of the contour and only compare the contour itself (with a tolerance of some_distance). The distance between centroids (or double it) may be used as a value for some_distance
You can build polygons/lines from segments using ops.polygonize
instead of using intersection.area - symmetric_difference.area you can:
Snap one object to another, and then compare snapped object to original
Before comparing real objects you can compare "simpler" versions of the objects to filter out obvious mismatches:
For example you can check if boundaries of objects intersect
Or you can simplify geometries before comparing them
For the distance, you need to change your formula to
from math import sqrt

def getDistance(x, y, x_i, y_i):
    # Note: ** is exponentiation in Python; ^ would be bitwise XOR.
    return sqrt((x_i - x)**2 + (y_i - y)**2)
with (x,y) being your datapoint and (x_i, y_i) being a point from the curve.
Consider using NumPy for vectorization. Explicitly looping through your data points will most likely be less efficient, though depending on your use case it might be quick enough. (If you need to run it on a regular basis, vectorization will easily outspeed the explicit loop.) This could look something like this:
import numpy as np # Universal abbreviation for the module
datapoints = np.random.rand(3,2) # Returns an array with randomized entries of size 3x2 (imagine it as 3 sets of x- and y-values)
contour1 = np.random.rand(1000, 2) # Other than the size (which is 1000x2) no different than datapoints
contour2 = np.random.rand(1000, 2)
contour3 = np.random.rand(1000, 2)
def squareDistanceUnvectorized(datapoint, contour):
    retVal = 0.
    print("Using datapoint with values x:{}, y:{}".format(datapoint[0], datapoint[1]))
    lengthOfContour = np.size(contour, 0) # This gets you the number of lines in the vector
    for pointID in range(lengthOfContour):
        squaredXDiff = np.square(contour[pointID,0] - datapoint[0])
        squaredYDiff = np.square(contour[pointID,1] - datapoint[1])
        retVal += np.sqrt(squaredXDiff + squaredYDiff)
    retVal = retVal / lengthOfContour # As we want the average, we divide the sum by the element count
    return retVal
if __name__ == "__main__":
    noOfDatapoints = np.size(datapoints, 0)
    contID = 0
    for currentDPID in range(noOfDatapoints):
        dist1 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour1)
        dist2 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour2)
        dist3 = squareDistanceUnvectorized(datapoints[currentDPID,:], contour3)
        # The closest contour is the one with the smallest average distance.
        if dist1 < dist2 and dist1 < dist3:
            contID = 1
        elif dist2 < dist1 and dist2 < dist3:
            contID = 2
        elif dist3 < dist1 and dist3 < dist2:
            contID = 3
        else:
            contID = 0
        if contID == 0:
            print("Datapoint {} is inbetween two contours".format(currentDPID))
        else:
            print("Datapoint {} is closest to contour {}".format(currentDPID, contID))
Okay, now moving on to vector-land.
I have taken the liberty to adjust this part to what I think is your dataset. Try it and let me know if it works.
import numpy as np
import pandas as pd
# Generate 1000 points (2-dim Vector) with random values between 0 and 1. Make them strings afterwards.
# This is the first contour
random2Ddata1 = np.random.rand(1000,2)
listOfX1 = [str(x) for x in random2Ddata1[:,0]]
listOfY1 = [str(y) for y in random2Ddata1[:,1]]
# Do the same for a second contour, except that we de-center this 255 units into the first dimension
random2Ddata2 = np.random.rand(1000,2)+[255,0]
listOfX2 = [str(x) for x in random2Ddata2[:,0]]
listOfY2 = [str(y) for y in random2Ddata2[:,1]]
# After this step, our 'contours' are basically two blobs of datapoints whose centers are approx. 255 units apart.
# Generate a set of 4 datapoints and make them a Pandas-DataFrame
datapoints = {'X': ['0.5', '0', '255.5', '0'], 'Y': ['0.5', '0', '0.5', '-254.5']}
datapoints = pd.DataFrame(datapoints, columns=['X', 'Y'])
# Do the same for the two contours
contour1 = {'Xf': listOfX1, 'Yf': listOfY1}
contour1 = pd.DataFrame(contour1, columns=['Xf', 'Yf'])
contour2 = {'Xf': listOfX2, 'Yf': listOfY2}
contour2 = pd.DataFrame(contour2, columns=['Xf', 'Yf'])
# We do now have 4 datapoints.
# - The first datapoint is basically where we expect the mean of the first contour to be.
# Contour 1 consists of 1000 points with x, y- values between 0 and 1
# - The second datapoint is at the origin. Its distances should be similar to the ones of the first datapoint
# - The third datapoint would be the result of shifting the first datapoint 255 units into the positive first dimension
# - The fourth datapoint would be the result of shifting the first datapoint 255 units into the negative second dimension
# Transformation into numpy array
# First the x and y values of the data points
dpArray = ((datapoints.values).T).astype(float)
c1Array = ((contour1.values).T).astype(float)
c2Array = ((contour2.values).T).astype(float)
# This did the following:
# - Transform the datapoints and contours into numpy arrays
# - Transpose them afterwards so that if we want all x values, we can write var[0,:] instead of var[:,0].
# A personal preference, maybe
# - Convert all the values into floats.
# Now, we iterate through the contours. If you have a lot of them, putting them into a list beforehand would do the job
for contourid, contour in enumerate([c1Array, c2Array]):
    # Now for the datapoints
    for _index, _value in enumerate(dpArray[0,:]):
        # The next two lines do vectorization magic.
        # First, we square the difference between one dpArray entry and the contour x values.
        # You might notice that contour[0,:] returns a 1x1000 vector while dpArray[0,_index] is a single float value.
        # This works because dpArray[0,_index] is broadcast to fit the size of contour[0,:].
        dx = np.square(dpArray[0,_index] - contour[0,:])
        # The same happens for dpArray[1,_index] and contour[1,:]
        dy = np.square(dpArray[1,_index] - contour[1,:])
        # Now, we take (for one datapoint and one contour) the mean value and print it.
        # You could write it into an array or do basically anything with it that you can imagine
        distance = np.mean(np.sqrt(dx+dy))
        print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
# But you want to be able to call this... so here we go, generating a function out of it!
def getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, listOfContourDataFrames):
    """Takes a DataFrame with points and a list of contours; returns the average distance for each combination"""
    dpArray = ((datapoints.values).T).astype(float)
    listOfContours = []
    for item in listOfContourDataFrames:
        listOfContours.append(((item.values).T).astype(float))
    retVal = np.zeros((np.size(dpArray, 1), len(listOfContours)))
    for contourid, contour in enumerate(listOfContours):
        for _index, _value in enumerate(dpArray[0,:]):
            dx = np.square(dpArray[0,_index] - contour[0,:])
            dy = np.square(dpArray[1,_index] - contour[1,:])
            distance = np.mean(np.sqrt(dx+dy))
            print("Mean distance between contour {} and datapoint {}: {}".format(contourid+1, _index+1, distance))
            retVal[_index, contourid] = distance
    return retVal
# And just to see that it is, indeed, returning the same results, run it once
getDistanceFromDatapointsToListOfContoursFindBetterName(datapoints, [contour1, contour2])
If we define a rectangle (x1,y1), (x2,y2) by its top left and bottom right hand corners and assume that all points are integer valued, I would like to list all points in the union of a number of rectangles.
For one rectangle, the following function returns all the points within it.
def findpoints(x1, y1, x2, y2):
    return [(x, y) for x in range(x1, x2+1) for y in range(y1, y2+1)]
I can find all the points in the union of two rectangles by,
set(findpoints(x1,y1,x2,y2)) | set(findpoints(x3,y3,x4,y4))
However I have a lot of rectangles and this is potentially very inefficient. For example, imagine if all the rectangles were almost identical. Is there a fast way of doing this?
I agree with StoryTeller, but I think it is better to write it out in more detail so it is understandable even for those of us with poor English skills.
Compute the minimal rectangle, which is the overlapped area of all rectangles to test:

x1 = max(rec[i].x1)
y1 = max(rec[i].y1)
x2 = min(rec[i].x2)
y2 = min(rec[i].y2)

where i runs over all rectangles, i = 0, ..., number of rectangles - 1.

If x1 > x2 or y1 > y2, then the rectangles do not all overlap, and so no points are inside.

Otherwise, test all points only against this new rectangle (x1, y1, x2, y2):

if (x >= x1) and (x <= x2) and (y >= y1) and (y <= y2) then point (x, y) is inside
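The steps above can be sketched in Python (rectangles as (x1, y1, x2, y2) tuples; the function names and sample data are illustrative):

```python
def intersection_rect(rects):
    """Intersection of all rectangles, or None if they do not all overlap."""
    x1 = max(r[0] for r in rects)
    y1 = max(r[1] for r in rects)
    x2 = min(r[2] for r in rects)
    y2 = min(r[3] for r in rects)
    if x1 > x2 or y1 > y2:
        return None  # no common area
    return (x1, y1, x2, y2)

def points_inside(points, rects):
    """Points lying in every rectangle, tested against the single intersection rectangle."""
    box = intersection_rect(rects)
    if box is None:
        return []
    x1, y1, x2, y2 = box
    return [(x, y) for (x, y) in points if x1 <= x <= x2 and y1 <= y <= y2]

rects = [(0, 0, 10, 10), (2, 1, 12, 8)]
print(intersection_rect(rects))                         # (2, 1, 10, 8)
print(points_inside([(3, 3), (1, 1), (11, 5)], rects))  # [(3, 3)]
```

Note that each point is compared against one rectangle regardless of how many input rectangles there are, which is what makes this faster than testing against every rectangle individually.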