Label points in section of np.meshgrid - python

I am trying to label x and y points based on which section of a meshgrid they fall in. The points are stored in a pandas DataFrame.
Here I have a scatter plot of the coordinates and above them I am plotting the grid.
The entire grid is way bigger, from the bottom left point (500,1250) to upper right point (2750, 3250), which means the whole grid is 225x200 sections.
I want to iterate through the sections of the grid and check whether a point is inside. If a point is inside a section, I want to add a label to the point; the label should be the section name.
I want to add a column to the dataframe called 'section' that stores the section a point belongs to.
In the example (picture above) I would like to label all the points with
770 <= x <= 780 and 1795 <= y <= 1805 with the section name 'A3'.
My code currently looks like this:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

df = pd.read_csv('./file.csv', sep=';')
x_min = df['X[mm]'].min()
x_max = df['X[mm]'].max()
y_min = df['Y[mm]'].min()
y_max = df['Y[mm]'].max()
#side of the square in mm:
square_side = 10
xs = np.arange(x_min, x_max+square_side, square_side)
ys = np.arange(y_min, y_max+square_side, square_side)
x_2, y_2 = np.meshgrid(xs, ys, indexing = 'ij')
fig, ax = plt.subplots(figsize=(9,9))
ax.plot(df['X[mm]'], df['Y[mm]'], linewidth=0.2, c='black')
#plot meshgrid as grid instead of points:
segs1 = np.stack((x_2[:,[0,-1]],y_2[:,[0,-1]]), axis=2)
segs2 = np.stack((x_2[[0,-1],:].T,y_2[[0,-1],:].T), axis=2)
plt.gca().add_collection(LineCollection(np.concatenate((segs1, segs2))))
ax.set_aspect('equal', 'box')
plt.show()
I also have a function that determines whether a point is inside a rectangle (this does not use meshgrid):
def is_inside_rect(M, A, B, D):
    '''Check if a point M is inside the rectangle with corners A, B, D (C is implied)'''
    # 0 <= dot(AB, AM) <= dot(AB, AB)  and  0 <= dot(BD, BM) <= dot(BD, BD)
    return 0 <= np.dot(B - A, M - A) <= np.dot(B - A, B - A) and 0 <= np.dot(D - B, M - B) <= np.dot(D - B, D - B)
I thought of using it in a while loop like this:
x = x_min
y = y_min
while (x <= x_max + square_side) and (y <= y_max + square_side):
    A = np.array([x, y])
    B = np.array([x + square_side, y])
    D = np.array([x + square_side, y + square_side])
    print(A, B, D)
    df['c'] = df[['X[mm]', 'Y[mm]']].apply(lambda coord: 'red' if is_inside_rect(np.array(coord), A, B, D) else 'black', axis=1)
    x += square_side
    y += square_side
but this is very slow, and it overwrites the colors of all the points in every iteration.

Since all your squares are equally sized, there is no need to define all of them beforehand and then determine which squares contain which points. I would use the coordinates of each point to directly determine which square it lands in.
Let's take the 1-dimensional case, for the sake of simplicity. You want to group points on the number line into "squares" (really 1-d line segments). If your first square starts at x=0, your second at x=10, your third at x=20, and so on, how do you find the square for an arbitrary point x? You know that your squares are spaced by 10 (and you know they start at 0, which makes things easier), so you can simply divide by 10 and round down to get the square index.
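In code, that one-dimensional lookup is just a floor division (a tiny illustration; the value 37.5 here is arbitrary):
# squares start at 0 and are 10 wide, so the index is floor division by the spacing
x = 37.5
square_index = int(x // 10)  # -> 3, i.e. the square covering [30, 40)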
You can just as easily do the same thing in 2 dimensions (or n dimensions).
square_side = 10
x_min = df['X[mm]'].min()
y_min = df['Y[mm]'].min()

def label_point(x, y):
    # Double forward slash is integer (round down) division
    # Add 1 here if you really want 1-based indexing
    x_label = int((x - x_min) // square_side)
    y_label = chr(ord('A') + int((y - y_min) // square_side))
    return f'{y_label}{x_label}'

df['label'] = df[['X[mm]', 'Y[mm]']].apply(lambda coord: label_point(*coord), axis=1)
As for efficiency, this solution looks at each point only once and does a constant amount of work per point, so it is O(n) in the number of points. Your solution looks at each square once, and for each square looks at every point, so it is O(n × m), where n is the number of points and m is the number of squares.
Your solution is more general, in that your is_inside_rect function works when your grid of rectangles has an arbitrary rotation. In this case, I would recommend rotating all your points about the origin, and then running my solution.
Also, your loop adds 10 to both x and y in every iteration, so you are traversing your space diagonally. I don't think you meant to do that.
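For completeness, the same lookup can be done without apply at all. Here is a sketch on a tiny stand-in DataFrame; the column names 'X[mm]'/'Y[mm]' and square_side come from the question, but the two sample rows are made up:
import pandas as pd

# stand-in for the df loaded from file.csv in the question
df = pd.DataFrame({'X[mm]': [772.0, 905.5], 'Y[mm]': [1798.0, 1810.2]})
square_side = 10

x_idx = ((df['X[mm]'] - df['X[mm]'].min()) // square_side).astype(int)
y_idx = ((df['Y[mm]'] - df['Y[mm]'].min()) // square_side).astype(int)

# build labels like 'A3' column-wise instead of row-by-row
df['section'] = (y_idx + ord('A')).map(chr) + x_idx.astype(str)
print(df)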

Related

Some points are not displayed on the graph plotted using NumPy and matplotlib

For the following code whose job is to perform Monte Carlo integration for a function f, I was wondering what would happen if I define f as y = sqrt(1-x^2), which is the equation for a unit quarter circle, and specify an endpoint that is greater than 1, since we know that f is only defined for 0<x<1.
import numpy as np
import matplotlib.pyplot as plt
def definite_integral_show(f, x0, x1, N):
    """Approximate the definite integral of f(x)dx between x0 and x1 using
    N random points

    Arguments:
    f -- a function of one real variable, must be nonnegative on [x0, x1]
    x0, x1 -- the integration limits
    N -- the number of random points to use
    """
    #First, let's compute fmax. We do that by evaluating f(x) on a grid
    #of points between x0 and x1
    #This assumes that f is generally smooth. If it's not, we're in trouble!
    x = np.arange(x0, x1, 0.01)
    y = f(x)
    print(y)
    f_max = max(y)
    #Now, let's generate the random points. The x's should be between
    #x0 and x1, so we first create points between 0 and (x1-x0), and
    #then add x0
    #The y's should be between 0 and fmax
    #
    # 0...(x1-x0)
    x_rand = x0 + np.random.random(N)*(x1-x0)
    print(x_rand)
    y_rand = 0 + np.random.random(N)*f_max
    #Now, let's find the indices of the points above and below
    #the curve. That is, for points below the curve, let's find
    #   i s.t. y_rand[i] < f(x_rand)[i]
    #And for points above the curve, find
    #   i s.t. y_rand[i] >= f(x_rand)[i]
    ind_below = np.where(y_rand < f(x_rand))
    ind_above = np.where(y_rand >= f(x_rand))
    #Finally, let's display the results
    plt.plot(x, y, color = "red")
    pts_below = plt.scatter(x_rand[ind_below[0]], y_rand[ind_below[0]], color = "green")
    pts_above = plt.scatter(x_rand[ind_above[0]], y_rand[ind_above[0]], color = "blue")
    plt.legend((pts_below, pts_above),
               ('Pts below the curve', 'Pts above the curve'),
               loc='lower left',
               ncol=3,
               fontsize=8)

def f1(x):
    return np.sqrt(1-x**2)

definite_integral_show(f1, 0, 6, 200)
To my surprise, the program still works and gives me the following picture.
I suspect that it works because in NumPy, nan's in an array are just ignored when performing operations on the array. However, I don't understand why the picture only contains points whose x and y coordinates are both between 0 and 1. Where are the points that aren't within this range, but whose values are computed by
x_rand = x0 + np.random.random(N)*(x1-x0)
y_rand = 0 + np.random.random(N)*f_max
You can just print out the arrays (for example by generating only one random point) and see that they go into neither ind_below nor ind_above...
That's because any comparison that involves nan returns False (see also: What is the rationale for all comparisons returning false for IEEE754 NaN values?), so y_rand < nan and y_rand >= nan both evaluate to False.
The easiest way to change the code is
ind_below = np.where(y_rand < f(x_rand))
ind_above = np.where(~(y_rand < f(x_rand)))
(optionally, compute f(x_rand) only once instead of twice)
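A quick standalone demonstration of this behaviour (not part of the original code):
import numpy as np

# every comparison against NaN is False, so NaN points fall into neither index set
vals = np.array([0.5, np.nan])
print(vals < 0.7)     # [ True False]
print(vals >= 0.7)    # [False False]
print(~(vals < 0.7))  # [False  True]  -> the negated test catches the NaN entries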

Is there a quick method to project points onto a certain grid?

I am trying to project n points with 3-dimensional coordinates (x, y, z) onto an xy-grid of a certain size (like 64x64); the coordinates of the n points are of course restricted to this grid.
The goal is to print the z coordinates of the points that are projected onto each grid element. I wrote two for-loops, but is there a better method that avoids the loops and runs more quickly?
for i in range(XY_grid.shape[0]):
    x = np.where((X_coordinate > i) & (X_coordinate <= i + 1), 1, 0)
    for j in range(XY_grid.shape[1]):
        y = np.where((Y_coordinate > j) & (Y_coordinate <= j + 1), 1, 0)
        print(x * y * Z_coordinate)
I think what you want is a 2D histogram:
import numpy as np
# generate some data (x, y, z)
x = np.arange(100)
y = np.random.rand(100)
z = np.arange(100)[::-1] * 1.5
# grid (x, y) onto a defined grid (0-127) in x and y
grid, xe, ye = np.histogram2d(x, y, bins=(np.arange(128), np.arange(128)), weights=None)
grid.sum()
>>> 100.0 # all data is in the grid (was only 100 points)
You can use the weight argument to add z values:
# grid (x, y) onto a defined grid (0-127) in x and y
grid, xe, ye = np.histogram2d(x, y, bins=(np.arange(128), np.arange(128)), weights=z)
grid.sum()
>>> 7425.0
z.sum()
>>> 7425.0 # all z values are in the produced grid
You can change the bin widths etc. to make them non-uniform, or keep them evenly spaced for a regular grid.
The resulting grid is a 2D numpy array which contains all of the z information that falls into each bin. You can easily just print it or loop over it to get every element in turn.
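If what you ultimately want per cell is an aggregate such as the mean z rather than the sum, one option is to divide the weighted histogram by the unweighted one. A sketch, reusing the same synthetic x, y, z as above:
import numpy as np

x = np.arange(100)
y = np.random.rand(100)
z = np.arange(100)[::-1] * 1.5

bins = (np.arange(128), np.arange(128))
counts, xe, ye = np.histogram2d(x, y, bins=bins)          # points per cell
z_sum, _, _ = np.histogram2d(x, y, bins=bins, weights=z)  # summed z per cell
with np.errstate(invalid='ignore'):
    z_mean = z_sum / counts                               # NaN where a cell is empty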
To print all the entries of Z_coordinate that correspond to a specific point in X_coordinate and Y_coordinate you can do:
for i in range(XY_grid.shape[0]):
    for j in range(XY_grid.shape[1]):
        print(Z_coordinate[np.logical_and(X_coordinate==i, Y_coordinate==j)])

Trying to generate random x,y coordinates within a ring in python

I am trying to generate random x and y coordinates within a ring, which has an outer radius of 3.5 and an inner radius of 2. Therefore the following must be true for x and y:
x**2 + y**2 < 12.25 and x**2 + y**2 > 4
I wrote the following function:
import numpy as np
import matplotlib.pyplot as plt
from random import uniform

def meteorites():
    circle = False
    while circle == False:
        r = np.array([uniform(-6., 6.), uniform(-6., 6.)])
        # we will regenerate random numbers until the coordinates
        # are within the ring x^2+y^2 < 3.5^2 and x^2+y^2 > 2^2
        if (r[0]**2+r[1]**2 < 12.25) and (r[0]**2+r[1]**2 > 4.):
            circle = True
        else:
            circle = False
    return r[0], r[1]

x = np.zeros(1000)
y = np.zeros(1000)
for i in range(1000):
    x[i] = meteorites()[0]
    y[i] = meteorites()[1]
plt.scatter(x, y)
plt.show()
When I plot the resulting coordinates I get a square from -3.5 to 3.5. I can't seem to find the problem. I'm also not sure if it's a coding error or some dumb math problem. Since you guys are usually good at both, can you see what I'm doing wrong here?
To get a uniform distribution of random points in the ring, one should take the relative areas of thin circular regions into account, just as when sampling uniformly inside a circle.
For your case, generate SquaredR uniformly in the range between the squared inner and outer radii. Pseudocode:
Fi = RandomUniform(0, 2 * Pi)
SquaredR = RandomUniform(inner*inner, outer*outer)
R = Sqrt(SquaredR)
x,y = R * Cos(Fi), R * Sin(Fi)
Take a random angle and a random distance between the two constraints; to keep the distribution uniform over the ring's area, draw the squared distance uniformly and take its square root:
from math import sin, cos, radians, pi, sqrt
from random import uniform

def meteorites():
    angle = uniform(0, 2 * pi)  # in radians
    distance = sqrt(uniform(4, 12.25))
    return distance * cos(angle), distance * sin(angle)
You're getting random points that don't fall on your ring because these two lines don't do what you want:
x[i] = meteorites()[0]
y[i] = meteorites()[1]
These assign an x value from one point on the ring to x[i], and the y value from a different point on the ring to y[i]. You get coordinates from different points because you're calling meteorites() twice.
Instead, you probably want to call the function once, and then assign to each coordinate, or do an assignment with iterable-unpacking where both targets are on the left side of the equals sign:
x[i], y[i] = meteorites()
Your implementation will also work if you correct one line: instead of calling meteorites() twice, call it just once.
x = np.zeros(1000)
y = np.zeros(1000)
for i in range(1000):
    x[i], y[i] = meteorites()
plt.scatter(x, y)
plt.show()
I would also rather run through a loop that picks a random angle and a random distance within your ring range. Then calculate the coords from that.
But in your code, the first problem I see is that you should write:
x[i],y[i] = meteorites()
instead of
x[i] = meteorites()[0]
y[i] = meteorites()[1]
In your example, you're calling meteorites() twice, so x and y come from two different meteorites.
As @Martijn Pieters suggested, simply draw the polar coordinates uniformly in the range you require.
theta = uniform(0,2*np.pi)
r = uniform(2.,3.5)
x = r*np.cos(theta)
y = r*np.sin(theta)
EDIT: Every (theta, r) pair is equally likely to occur, but that does not make every point in the ring equally likely: for a given theta there is less area the closer r is to the inner limit, so "meteorites" with smaller r will occur with higher density.
I believe this effect is negligible.
EDIT 2: MBo's answer is better. Code:
theta = uniform(0, 2 * np.pi)
r = np.sqrt(uniform(2.0 ** 2, 3.5 ** 2)) # draw from sqrt distribution
x = r * np.cos(theta)
y = r * np.sin(theta)
You could try the following to generate 1000 samples using numpy:
import numpy
n = 1000
phi = numpy.random.uniform(0, 2*numpy.pi, n)
r = numpy.random.uniform(2, 3.5, n)
Then x, y coordinates can be constructed as follows using the transformation from polar to Cartesian coordinates:
x = r * numpy.cos(phi)
y = r * numpy.sin(phi)
This shows the power of numpy, as x and y are now arrays without needing to iterate over n.
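Combining this vectorized approach with MBo's area-correct radius draw gives the following sketch (inner radius 2 and outer radius 3.5, as in the question):
import numpy as np
import matplotlib.pyplot as plt

n = 1000
phi = np.random.uniform(0, 2 * np.pi, n)
# r is the square root of a uniform draw between the squared radii,
# which makes the samples uniform over the ring's area
r = np.sqrt(np.random.uniform(2.0**2, 3.5**2, n))
x = r * np.cos(phi)
y = r * np.sin(phi)

plt.scatter(x, y, s=2)
plt.gca().set_aspect('equal')
plt.show()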

Optimizing by translation to map one x,y set of points onto another

I have a list of x,y ideal points, and a second list of x,y measured points. The latter has some offset and some noise.
I am trying to "fit" the latter to the former. So, extract the x,y offset of the latter relative to the former.
I'm following some examples of scipy.optimize.leastsq, but having trouble getting it working. Here is my code:
import random
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
# Generate fake data. Goal: Get back dx=0.1, dy=0.2 at the end of this exercise
dx = 0.1
dy = 0.2
# "Actual" (ideal) data.
xa = np.array([0,0,0,1,1,1])
ya = np.array([0,1,2,0,1,2])
# "Measured" (non-ideal) data. Add the offset and some randomness.
xm = map(lambda x: x + dx + random.uniform(0,0.01), xa)
ym = map(lambda y: y + dy + random.uniform(0,0.01), ya)
# Plot each
plt.figure()
plt.plot(xa, ya, 'b.', xm, ym, 'r.')
# The error function.
#
# Args:
# translations: A list of xy tuples, each xy tuple holding the xy offset
# between 'coords' and the ideal positions.
# coords: A list of xy tuples, each xy tuple holding the measured (non-ideal)
# coordinates.
def errfunc(translations, coords):
    sum = 0
    for t, xy in zip(translations, coords):
        dx = t[0] + xy[0]
        dy = t[1] + xy[1]
        sum += np.sqrt(dx**2 + dy**2)
    return sum
translations, coords = [], []
for xxa, yya, xxm, yym in zip(xa, ya, xm, ym):
    t = (xxm-xxa, yym-yya)
    c = (xxm, yym)
    translations.append(t)
    coords.append(c)

translation_guess = [0.05, 0.1]
out = optimize.leastsq(errfunc, translation_guess, args=(translations, coords), full_output=1)
print out
I get the error:
errfunc() takes exactly 2 arguments (3 given)
I'm not sure why it says 3 arguments as I only gave it two. Can anyone help?
====
ANSWER:
I was thinking about this wrong. All I have to do is to take the average of the dx and dy's -- that gives the correct result.
n = xa.shape[0]
dx = -np.sum(xa - xm) / n
dy = -np.sum(ya - ym) / n
print dx, dy
scipy.optimize.leastsq assumes that the function you pass in takes the parameter vector (starting from your initial guess x0) as its first input. Any other additional inputs are then listed in args.
So you are sending three arguments: translation_guess, translations, and coords.
Note that the documentation specifies that args holds "extra arguments" to the function.
Okay, I think I understand now. You have the actual locations and the measured locations and you want to figure out the constant offset, but there is noise on each pair. Correct me if I'm wrong:
xy = tuple with coordinates of measured point
t = tuple with measured offset (constant + noise)
The actual coordinates of a point are (xy - t) then?
If so, then we think it should be measured at (xy - t + guess).
If so, then our error is (xy - t + guess - xy) = (guess - t)
Where it is measured doesn't even matter! We just want to find the guess that is closest to all of the measured translations:
def errfunc(guess, translations):
    errx = 0
    erry = 0
    for t in translations:
        errx += guess[0] - t[0]
        erry += guess[1] - t[1]
    return errx, erry
What do you think? Does that make sense or did I miss something?
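If it helps, here is a minimal sketch of how that errfunc could be handed to leastsq; the noisy offsets below are made up to stand in for the question's data:
import numpy as np
from scipy import optimize

def errfunc(guess, translations):
    errx = 0
    erry = 0
    for t in translations:
        errx += guess[0] - t[0]
        erry += guess[1] - t[1]
    return errx, erry

# hypothetical measured offsets: a true shift of (0.1, 0.2) plus a little noise
rng = np.random.default_rng(0)
translations = [(0.1 + ex, 0.2 + ey) for ex, ey in rng.uniform(0, 0.01, (6, 2))]

# leastsq passes the guess first and everything in args after it
fit, ier = optimize.leastsq(errfunc, x0=[0.05, 0.1], args=(translations,))
print(fit)  # close to [0.1, 0.2] (plus the mean of the added noise)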

Locating the centroid (center of mass) of spherical polygons

I'm trying to work out how best to locate the centroid of an arbitrary shape draped over a unit sphere, with the input being ordered (clockwise or anti-cw) vertices for the shape boundary. The density of vertices is irregular along the boundary, so the arc-lengths between them are not generally equal. Because the shapes may be very large (half a hemisphere) it is generally not possible to simply project the vertices to a plane and use planar methods, as detailed on Wikipedia (sorry I'm not allowed more than 2 hyperlinks as a newcomer). A slightly better approach involves the use of planar geometry manipulated in spherical coordinates, but again, with large polygons this method fails, as nicely illustrated here. On that same page, 'Cffk' highlighted this paper which describes a method for calculating the centroid of spherical triangles. I've tried to implement this method, but without success, and I'm hoping someone can spot the problem?
I have kept the variable definitions similar to those in the paper to make it easier to compare. The input (data) is a list of longitude/latitude coordinates, converted to [x,y,z] coordinates by the code. For each of the triangles I have arbitrarily fixed one point to be the +z-pole, the other two vertices being composed of a pair of neighboring points along the polygon boundary. The code steps along the boundary (starting at an arbitrary point), using each boundary segment of the polygon as a triangle side in turn. A sub-centroid is determined for each of these individual spherical triangles and they are weighted according to triangle area and added to calculate the total polygon centroid. I don't get any errors when running the code, but the total centroids returned are clearly wrong (I have run some very basic shapes where the centroid location is unambiguous). I haven't found any sensible pattern in the location of the centroids returned...so at the moment I'm not sure what is going wrong, either in the math or code (although, the suspicion is the math).
The code below should work copy-paste as is if you would like to try it. If you have matplotlib and numpy installed, it will plot the results (it will ignore plotting if you don't). You just have to put the longitude/latitude data below the code into a text file called example.txt.
from math import *
try:
import matplotlib as mpl
import matplotlib.pyplot
from mpl_toolkits.mplot3d import Axes3D
import numpy
plotting_enabled = True
except ImportError:
plotting_enabled = False
def sph_car(point):
if len(point) == 2:
point.append(1.0)
rlon = radians(float(point[0]))
rlat = radians(float(point[1]))
x = cos(rlat) * cos(rlon) * point[2]
y = cos(rlat) * sin(rlon) * point[2]
z = sin(rlat) * point[2]
return [x, y, z]
def xprod(v1, v2):
x = v1[1] * v2[2] - v1[2] * v2[1]
y = v1[2] * v2[0] - v1[0] * v2[2]
z = v1[0] * v2[1] - v1[1] * v2[0]
return [x, y, z]
def dprod(v1, v2):
dot = 0
for i in range(3):
dot += v1[i] * v2[i]
return dot
def plot(poly_xyz, g_xyz):
fig = mpl.pyplot.figure()
ax = fig.add_subplot(111, projection='3d')
# plot the unit sphere
u = numpy.linspace(0, 2 * numpy.pi, 100)
v = numpy.linspace(-1 * numpy.pi / 2, numpy.pi / 2, 100)
x = numpy.outer(numpy.cos(u), numpy.sin(v))
y = numpy.outer(numpy.sin(u), numpy.sin(v))
z = numpy.outer(numpy.ones(numpy.size(u)), numpy.cos(v))
ax.plot_surface(x, y, z, rstride=4, cstride=4, color='w', linewidth=0,
alpha=0.3)
# plot 3d and flattened polygon
x, y, z = zip(*poly_xyz)
ax.plot(x, y, z)
ax.plot(x, y, zs=0)
# plot the alleged 3d and flattened centroid
x, y, z = g_xyz
ax.scatter(x, y, z, c='r')
ax.scatter(x, y, 0, c='r')
# display
ax.set_xlim3d(-1, 1)
ax.set_ylim3d(-1, 1)
ax.set_zlim3d(0, 1)
mpl.pyplot.show()
lons, lats, v = list(), list(), list()
# put the two-column data at the bottom of the question into a file called
# example.txt in the same directory as this script
with open('example.txt') as f:
for line in f.readlines():
sep = line.split()
lons.append(float(sep[0]))
lats.append(float(sep[1]))
# convert spherical coordinates to cartesian
for lon, lat in zip(lons, lats):
v.append(sph_car([lon, lat, 1.0]))
# z unit vector/pole ('north pole'). This is an arbitrary point selected to act as one
#(fixed) vertex of the summed spherical triangles. The other two vertices of any
#triangle are composed of neighboring vertices from the polygon boundary.
np = [0.0, 0.0, 1.0]
# Gx,Gy,Gz are the cartesian coordinates of the calculated centroid
Gx, Gy, Gz = 0.0, 0.0, 0.0
for i in range(-1, len(v) - 1):
# cycle through the boundary vertices of the polygon, from 0 to n
if all((v[i][0] != v[i+1][0],
v[i][1] != v[i+1][1],
v[i][2] != v[i+1][2])):
# this just ignores redundant points which are common in my larger input files
# A,B,C are the internal angles in the triangle: 'np-v[i]-v[i+1]-np'
A = asin(sqrt((dprod(np, xprod(v[i], v[i+1])))**2
/ ((1 - (dprod(v[i+1], np))**2) * (1 - (dprod(np, v[i]))**2))))
B = asin(sqrt((dprod(v[i], xprod(v[i+1], np)))**2
/ ((1 - (dprod(np , v[i]))**2) * (1 - (dprod(v[i], v[i+1]))**2))))
C = asin(sqrt((dprod(v[i + 1], xprod(np, v[i])))**2
/ ((1 - (dprod(v[i], v[i+1]))**2) * (1 - (dprod(v[i+1], np))**2))))
# A/B/Cbar are the vertex angles, such that if 'O' is the sphere center, Abar
# is the angle (v[i]-O-v[i+1])
Abar = acos(dprod(v[i], v[i+1]))
Bbar = acos(dprod(v[i+1], np))
Cbar = acos(dprod(np, v[i]))
# e is the 'spherical excess', as defined on wikipedia
e = A + B + C - pi
# mag1/2/3 are the magnitudes of vectors np,v[i] and v[i+1].
mag1 = 1.0
mag2 = float(sqrt(v[i][0]**2 + v[i][1]**2 + v[i][2]**2))
mag3 = float(sqrt(v[i+1][0]**2 + v[i+1][1]**2 + v[i+1][2]**2))
# vec1/2/3 are cross products, defined here to simplify the equation below.
vec1 = xprod(np, v[i])
vec2 = xprod(v[i], v[i+1])
vec3 = xprod(v[i+1], np)
# multiplying vec1/2/3 by e and respective internal angles, according to the
#posted paper
for x in range(3):
vec1[x] *= Cbar / (2 * e * mag1 * mag2
* sqrt(1 - (dprod(np, v[i])**2)))
vec2[x] *= Abar / (2 * e * mag2 * mag3
* sqrt(1 - (dprod(v[i], v[i+1])**2)))
vec3[x] *= Bbar / (2 * e * mag3 * mag1
* sqrt(1 - (dprod(v[i+1], np)**2)))
Gx += vec1[0] + vec2[0] + vec3[0]
Gy += vec1[1] + vec2[1] + vec3[1]
Gz += vec1[2] + vec2[2] + vec3[2]
approx_expected_Gxyz = (0.78, -0.56, 0.27)
print('Approximate Expected Gxyz: {0}\n'
' Actual Gxyz: {1}'
''.format(approx_expected_Gxyz, (Gx, Gy, Gz)))
if plotting_enabled:
plot(v, (Gx, Gy, Gz))
Thanks in advance for any suggestions or insight.
EDIT: Here is a figure that shows a projection of the unit sphere with a polygon and the resulting centroid I calculate from the code. Clearly, the centroid is wrong as the polygon is rather small and convex but yet the centroid falls outside its perimeter.
EDIT: Here is a highly-similar set of coordinates to those above, but in the original [lon,lat] format I normally use (which is now converted to [x,y,z] by the updated code).
-39.366295 -1.633460
-47.282630 -0.740433
-53.912136 0.741380
-59.004217 2.759183
-63.489005 5.426812
-68.566001 8.712068
-71.394853 11.659135
-66.629580 15.362600
-67.632276 16.827507
-66.459524 19.069327
-63.819523 21.446736
-61.672712 23.532143
-57.538431 25.947815
-52.519889 28.691766
-48.606227 30.646295
-45.000447 31.089437
-41.549866 32.139873
-36.605156 32.956277
-32.010080 34.156692
-29.730629 33.756566
-26.158767 33.714080
-25.821513 34.179648
-23.614658 36.173719
-20.896869 36.977645
-17.991994 35.600074
-13.375742 32.581447
-9.554027 28.675497
-7.825604 26.535234
-7.825604 26.535234
-9.094304 23.363132
-9.564002 22.527385
-9.713885 22.217165
-9.948596 20.367878
-10.496531 16.486580
-11.151919 12.666850
-12.350144 8.800367
-15.446347 4.993373
-20.366139 1.132118
-24.784805 -0.927448
-31.532135 -1.910227
-39.366295 -1.633460
EDIT: A couple more examples...with 4 vertices defining a perfect square centered at [1,0,0] I get the expected result:
However, from a non-symmetric triangle I get a centroid that is nowhere close...the centroid actually falls on the far side of the sphere (here projected onto the front side as the antipode):
Interestingly, the centroid estimation appears 'stable' in the sense that if I invert the list (go from clockwise to counterclockwise order or vice-versa) the centroid correspondingly inverts exactly.
Anybody finding this, make sure to check Don Hatch's answer which is probably better.
I think this will do it. You should be able to reproduce this result by just copy-pasting the code below.
You will need to have the latitude and longitude data in a file called longitude and latitude.txt. You can copy-paste the original sample data which is included below the code.
If you have matplotlib, it will additionally produce the plot below.
For non-obvious calculations, I included a link that explains what is going on
In the graph below, the reference vector is very short (r = 1/10) so that the 3d-centroids are easier to see. You can easily remove the scaling to maximize accuracy.
Note to OP: I rewrote almost everything, so I'm not sure exactly where the original code was not working. However, at least I think it was not taking into consideration the need to handle clockwise / counterclockwise triangle vertices.
Legend:
(black line) reference vector
(small red dots) spherical triangle 3d-centroids
(large red / blue / green dot) 3d-centroid / projected to the surface / projected to the xy plane
(blue / green lines) the spherical polygon and the projection onto the xy plane
from math import *
try:
import matplotlib as mpl
import matplotlib.pyplot
from mpl_toolkits.mplot3d import Axes3D
import numpy
plotting_enabled = True
except ImportError:
plotting_enabled = False
def main():
# get base polygon data based on unit sphere
r = 1.0
polygon = get_cartesian_polygon_data(r)
point_count = len(polygon)
reference = ok_reference_for_polygon(polygon)
# decompose the polygon into triangles and record each area and 3d centroid
areas, subcentroids = list(), list()
for ia, a in enumerate(polygon):
# build an a-b-c point set
ib = (ia + 1) % point_count
b, c = polygon[ib], reference
if points_are_equivalent(a, b, 0.001):
continue # skip nearly identical points
# store the area and 3d centroid
areas.append(area_of_spherical_triangle(r, a, b, c))
tx, ty, tz = zip(a, b, c)
subcentroids.append((sum(tx)/3.0,
sum(ty)/3.0,
sum(tz)/3.0))
# combine all the centroids, weighted by their areas
total_area = sum(areas)
subxs, subys, subzs = zip(*subcentroids)
_3d_centroid = (sum(a*subx for a, subx in zip(areas, subxs))/total_area,
sum(a*suby for a, suby in zip(areas, subys))/total_area,
sum(a*subz for a, subz in zip(areas, subzs))/total_area)
# shift the final centroid to the surface
surface_centroid = scale_v(1.0 / mag(_3d_centroid), _3d_centroid)
plot(polygon, reference, _3d_centroid, surface_centroid, subcentroids)
def get_cartesian_polygon_data(fixed_radius):
cartesians = list()
with open('longitude and latitude.txt') as f:
for line in f.readlines():
spherical_point = [float(v) for v in line.split()]
if len(spherical_point) == 2:
spherical_point.append(fixed_radius)
cartesians.append(degree_spherical_to_cartesian(spherical_point))
return cartesians
def ok_reference_for_polygon(polygon):
point_count = len(polygon)
# fix the average of all vectors to minimize float skew
polyx, polyy, polyz = zip(*polygon)
# /10 is for visualization. Remove it to maximize accuracy
return (sum(polyx)/(point_count*10.0),
sum(polyy)/(point_count*10.0),
sum(polyz)/(point_count*10.0))
def points_are_equivalent(a, b, vague_tolerance):
# vague tolerance is something like a percentage tolerance (1% = 0.01)
(ax, ay, az), (bx, by, bz) = a, b
return all(((ax-bx)/ax < vague_tolerance,
(ay-by)/ay < vague_tolerance,
(az-bz)/az < vague_tolerance))
def degree_spherical_to_cartesian(point):
rad_lon, rad_lat, r = radians(point[0]), radians(point[1]), point[2]
x = r * cos(rad_lat) * cos(rad_lon)
y = r * cos(rad_lat) * sin(rad_lon)
z = r * sin(rad_lat)
return x, y, z
def area_of_spherical_triangle(r, a, b, c):
# points abc
# build an angle set: A(CAB), B(ABC), C(BCA)
# http://math.stackexchange.com/a/66731/25581
A, B, C = surface_points_to_surface_radians(a, b, c)
E = A + B + C - pi # E is called the spherical excess
area = r**2 * E
# add or subtract area based on clockwise-ness of a-b-c
# http://stackoverflow.com/a/10032657/377366
if clockwise_or_counter(a, b, c) == 'counter':
area *= -1.0
return area
def surface_points_to_surface_radians(a, b, c):
"""build an angle set: A(cab), B(abc), C(bca)"""
points = a, b, c
angles = list()
for i, mid in enumerate(points):
start, end = points[(i - 1) % 3], points[(i + 1) % 3]
x_startmid, x_endmid = xprod(start, mid), xprod(end, mid)
ratio = (dprod(x_startmid, x_endmid)
/ ((mag(x_startmid) * mag(x_endmid))))
angles.append(acos(ratio))
return angles
def clockwise_or_counter(a, b, c):
ab = diff_cartesians(b, a)
bc = diff_cartesians(c, b)
x = xprod(ab, bc)
if x < 0:
return 'clockwise'
elif x > 0:
return 'counter'
else:
raise RuntimeError('The reference point is in the polygon.')
def diff_cartesians(positive, negative):
return tuple(p - n for p, n in zip(positive, negative))
def xprod(v1, v2):
x = v1[1] * v2[2] - v1[2] * v2[1]
y = v1[2] * v2[0] - v1[0] * v2[2]
z = v1[0] * v2[1] - v1[1] * v2[0]
return [x, y, z]
def dprod(v1, v2):
dot = 0
for i in range(3):
dot += v1[i] * v2[i]
return dot
def mag(v1):
return sqrt(v1[0]**2 + v1[1]**2 + v1[2]**2)
def scale_v(scalar, v):
return tuple(scalar * vi for vi in v)
def plot(polygon, reference, _3d_centroid, surface_centroid, subcentroids):
fig = mpl.pyplot.figure()
ax = fig.add_subplot(111, projection='3d')
# plot the unit sphere
u = numpy.linspace(0, 2 * numpy.pi, 100)
v = numpy.linspace(-1 * numpy.pi / 2, numpy.pi / 2, 100)
x = numpy.outer(numpy.cos(u), numpy.sin(v))
y = numpy.outer(numpy.sin(u), numpy.sin(v))
z = numpy.outer(numpy.ones(numpy.size(u)), numpy.cos(v))
ax.plot_surface(x, y, z, rstride=4, cstride=4, color='w', linewidth=0,
alpha=0.3)
# plot 3d and flattened polygon
x, y, z = zip(*polygon)
ax.plot(x, y, z, c='b')
ax.plot(x, y, zs=0, c='g')
# plot the 3d centroid
x, y, z = _3d_centroid
ax.scatter(x, y, z, c='r', s=20)
# plot the spherical surface centroid and flattened centroid
x, y, z = surface_centroid
ax.scatter(x, y, z, c='b', s=20)
ax.scatter(x, y, 0, c='g', s=20)
# plot the full set of triangular centroids
x, y, z = zip(*subcentroids)
ax.scatter(x, y, z, c='r', s=4)
# plot the reference vector used to find sub-centroids
x, y, z = reference
ax.plot((0, x), (0, y), (0, z), c='k')
ax.scatter(x, y, z, c='k', marker='^')
# display
ax.set_xlim3d(-1, 1)
ax.set_ylim3d(-1, 1)
ax.set_zlim3d(0, 1)
mpl.pyplot.show()
# run it in a function so the main code can appear at the top
main()
Here is the longitude and latitude data you can paste into longitude and latitude.txt
-39.366295 -1.633460
-47.282630 -0.740433
-53.912136 0.741380
-59.004217 2.759183
-63.489005 5.426812
-68.566001 8.712068
-71.394853 11.659135
-66.629580 15.362600
-67.632276 16.827507
-66.459524 19.069327
-63.819523 21.446736
-61.672712 23.532143
-57.538431 25.947815
-52.519889 28.691766
-48.606227 30.646295
-45.000447 31.089437
-41.549866 32.139873
-36.605156 32.956277
-32.010080 34.156692
-29.730629 33.756566
-26.158767 33.714080
-25.821513 34.179648
-23.614658 36.173719
-20.896869 36.977645
-17.991994 35.600074
-13.375742 32.581447
-9.554027 28.675497
-7.825604 26.535234
-7.825604 26.535234
-9.094304 23.363132
-9.564002 22.527385
-9.713885 22.217165
-9.948596 20.367878
-10.496531 16.486580
-11.151919 12.666850
-12.350144 8.800367
-15.446347 4.993373
-20.366139 1.132118
-24.784805 -0.927448
-31.532135 -1.910227
-39.366295 -1.633460
To clarify: the quantity of interest is the projection of the true 3d centroid
(i.e. 3d center-of-mass, i.e. 3d center-of-area) onto the unit sphere.
Since all you care about is the direction from the origin to the 3d centroid,
you don't need to bother with areas at all;
it's easier to just compute the moment (i.e. 3d centroid times area).
The moment of the region to the left of a closed path on the unit sphere
is half the integral of the leftward unit vector as you walk around the path.
This follows from a non-obvious application of Stokes' theorem; see Frank Jones's vector calculus book, chapter 13 Problem 13-12.
In particular, for a spherical polygon, the moment is half the sum of
(a x b) / ||a x b|| * (angle between a and b) for each pair of consecutive vertices a,b.
(That's for the region to the left of the path;
negate it for the region to the right of the path.)
(And if you really did want the 3d centroid, just compute the area and divide the moment by it. Comparing areas might also be useful in choosing which of the two regions to call "the polygon".)
Here's some code (Python 2); it's really simple:
#!/usr/bin/python
import math

def plus(a,b): return [x+y for x,y in zip(a,b)]
def minus(a,b): return [x-y for x,y in zip(a,b)]
def cross(a,b): return [a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0]]
def dot(a,b): return sum([x*y for x,y in zip(a,b)])
def length(v): return math.sqrt(dot(v,v))
def normalized(v): l = length(v); return [1,0,0] if l==0 else [x/l for x in v]

def addVectorTimesScalar(accumulator, vector, scalar):
    for i in xrange(len(accumulator)): accumulator[i] += vector[i] * scalar

def angleBetweenUnitVectors(a,b):
    # https://www.plunk.org/~hatch/rightway.html
    if dot(a,b) < 0:
        return math.pi - 2*math.asin(length(plus(a,b))/2.)
    else:
        return 2*math.asin(length(minus(a,b))/2.)

def sphericalPolygonMoment(verts):
    moment = [0.,0.,0.]
    for i in xrange(len(verts)):
        a = verts[i]
        b = verts[(i+1)%len(verts)]
        addVectorTimesScalar(moment, normalized(cross(a,b)),
                             angleBetweenUnitVectors(a,b) / 2.)
    return moment

if __name__ == '__main__':
    import sys
    def lonlat_degrees_to_xyz(lon_degrees,lat_degrees):
        lon = lon_degrees*(math.pi/180)
        lat = lat_degrees*(math.pi/180)
        coslat = math.cos(lat)
        return [coslat*math.cos(lon), coslat*math.sin(lon), math.sin(lat)]
    verts = [lonlat_degrees_to_xyz(*[float(v) for v in line.split()])
             for line in sys.stdin.readlines()]
    #print "verts = "+`verts`
    moment = sphericalPolygonMoment(verts)
    print "moment = "+`moment`
    print "centroid unit direction = "+`normalized(moment)`
For the example polygon, this gives the answer (unit vector):
[-0.7644875430808217, 0.579935445918147, -0.2814847687566214]
This is roughly the same as, but more accurate than, the answer computed by @KobeJohn's code, which uses rough tolerances and planar approximations to the sub-centroids:
[0.7628095787179151, -0.5977153368303585, 0.24669398601094406]
The directions of the two answers are roughly opposite (so I guess KobeJohn's code
decided to take the region to the right of the path in this case).
I think a good approximation would be to compute the center of mass using weighted cartesian coordinates and projecting the result onto the sphere (supposing the origin of coordinates is (0, 0, 0)^T).
Let be (p[0], p[1], ... p[n-1]) the n points of the polygon. The approximative (cartesian) centroid can be computed by:
c = 1 / w * (sum of w[i] * p[i])
where w is the sum of all weights, p[i] is a polygon point, and w[i] is the weight for that point, e.g.
w[i] = |p[i] - p[(i - 1 + n) % n]| / 2 + |p[i] - p[(i + 1) % n]| / 2
where |x| is the length of a vector x.
I.e. a point is weighted with half the length to the previous and half the length to the next polygon point.
This centroid c can now be projected onto the sphere by:
c' = r * c / |c|
where r is the radius of the sphere.
To account for the orientation of the polygon (ccw vs. cw), the result may instead be
c' = - r * c / |c|.
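A minimal sketch of that approximation, assuming points is an (n, 3) NumPy array of Cartesian boundary points (without a repeated closing vertex) on a sphere of radius r:
import numpy as np

def approx_spherical_centroid(points, r=1.0):
    # weight each vertex by half the chord length to each of its neighbours
    prev_pts = np.roll(points, 1, axis=0)
    next_pts = np.roll(points, -1, axis=0)
    w = 0.5 * (np.linalg.norm(points - prev_pts, axis=1)
               + np.linalg.norm(points - next_pts, axis=1))
    # weighted Cartesian centroid, projected back onto the sphere
    c = (w[:, None] * points).sum(axis=0) / w.sum()
    return r * c / np.linalg.norm(c)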
Sorry I (as a newly registered user) had to write a new post instead of just voting/commenting on the above answer by Don Hatch. Don's answer, I think, is the best and most elegant. It is mathematically rigorous in computing the center of mass (first moment of mass) in a simple way when applying to the spherical polygon.
KobeJohn's answer is a good approximation but only satisfactory for smaller areas. I also noticed a few glitches in the code. Firstly, the reference point should be projected onto the spherical surface to compute the actual spherical area. Secondly, the function points_are_equivalent() might need to be refined to avoid division by zero.
The approximation error in Kobe's method lies in the calculation of the centroid of spherical triangles. The sub-centroid is NOT the center of mass of the spherical triangle but the planar one. This is not an issue if one is to determine that single triangle (sign may flip, see below). It is also not an issue if triangles are small (e.g. a dense triangulation of the polygon).
A few simple tests could illustrate the approximation error. For example if we use just four points:
10 -20
10 20
-10 20
-10 -20
The exact answer is (1,0,0) and both methods are good. But if you throw in a few more points along one edge (e.g. add {10,-15},{10,-10}... to the first edge), you'll see the results from Kobe's method start to shift. Furthermore, if you increase the longitude range from [10,-10] to [100,-100], you'll see Kobe's result flip direction. A possible improvement might be to add another level (or levels) of sub-centroid calculation (basically refine/reduce the sizes of the triangles).
For our application, the spherical area boundary is composed of multiple arcs and is thus not a polygon (i.e. the arcs are not parts of great circles). But it is just a little more work to find the n-vector in the curve integration.
EDIT: Replacing the subcentroid calculation with the one given in Brock's paper should fix Kobe's method. But I did not try though.
