Calculating area under the curves

I am attempting to calculate the area of the blue region and the area of the yellow region:
In this graph: y=blue, peak_line=green, thresh=orange.
I am using this code:
idx = np.argwhere(np.diff(np.sign(y - peak_line))).flatten()
bounds = [1077.912, 1078.26, 1078.336, 1078.468, 1078.612, 1078.78, 1078.828, 1078.88, 1079.856, 1079.86]
plt.plot(x, y, x, thresh, x, peak_line)
plt.fill_between(x, y, thresh, where=(y>=peak_line),interpolate=True, color='#fff8ba')
plt.fill_between(x, thresh, peak_line, where=(y<=peak_line),interpolate=True, color='#fff8ba')
plt.fill_between(x, y, peak_line, where=(y>=peak_line) & (x>=x[idx][0]) & (x<=bounds[-1]), interpolate=True, color='#CDEAFF')
plt.plot(x[idx], y[idx], 'ro')
plt.show()
estimated_y = interp1d(x, y, kind='cubic')
estimated_peak_line = interp1d(x, peak_line, kind='cubic')
estimated_thresh = interp1d(x, thresh, kind='cubic')
yellow_areas = []
blue_areas = []
for i in range(len(bounds) - 1):
    midpoint = (bounds[i] + bounds[i+1]) / 2
    if estimated_y(midpoint) < estimated_peak_line(midpoint):
        above_peak_line = abs(integrate.quad(estimated_peak_line, bounds[i], bounds[i+1])[0])
        above_thresh_line = abs(integrate.quad(estimated_thresh, bounds[i], bounds[i+1])[0])
        yellow_areas.append(above_peak_line - above_thresh_line)
    else:
        above_peak_line = abs(integrate.quad(estimated_peak_line, bounds[i], bounds[i+1])[0])
        above_y = abs(integrate.quad(estimated_y, bounds[i], bounds[i+1])[0])
        blue_areas.append(above_peak_line - above_y)
print(sum(yellow_areas))
print(sum(blue_areas))
4.900000000000318
2.999654602006661
I thought I had calculated the area of the blue region and the area of the yellow region correctly, until I calculated the area of the polygon:
bunch_of_xs = np.linspace(min(x), max(x), num=10000, endpoint=True)
final_curve = estimated_y(bunch_of_xs)
final_thresh = estimated_thresh(bunch_of_xs)
final_peak_line = estimated_peak_line(bunch_of_xs)
def PolygonArea(corners):
    n = len(corners)  # number of corners
    area = 0.0
    for i in range(n):
        j = (i + 1) % n
        area += corners[i][0] * corners[j][1]
        area -= corners[j][0] * corners[i][1]
    area = abs(area) / 2.0
    return area
vertex1 = (bunch_of_xs[0], final_thresh[0])
vertex2 = (bunch_of_xs[-1], final_thresh[-1])
vertex3 = (x[idx][-1], y[idx][-1])
vertex4 = (x[idx][0], y[idx][0])
coords = (vertex1,vertex2,vertex3,vertex4)
plt.plot(x, y, 'o', bunch_of_xs, final_curve, '--', bunch_of_xs, final_thresh, bunch_of_xs, final_peak_line)
x_val = [x[0] for x in coords]
y_val = [x[1] for x in coords]
plt.plot(x_val,y_val,'or')
print("Coordinates of total polygon:", coords)
print("Total polygon area:", PolygonArea(coords))
Coordinates of total polygon: ((1077.728, -41.30177170550451), (1079.96, -42.254314285935834), (1079.86, -49.207348695828706), (1077.912, -48.271572477115136))
Total polygon area: 14.509708069890621
The sum of the area of the blue region and the area of yellow region should equal the total polygon area.
4.900000000000318 + 2.999654602006661 ≠ 14.509708069890621
What am I doing wrong?
Edit: This code will be used for many different graphs, and not all graphs look the same. For example, this graph has 3 blue regions, so I have to calculate the area of all 3 blue regions and add them together to get the total blue area. Every graph has a different number of blue regions (some have only 1). So the code has to be flexible enough to handle a graph with multiple blue regions, whose areas are added together to get the total blue area.

Since I don't have all of your data I will give something between pseudo-code and implementation.
Say we have arrays x (x-axis), y1 (data), y2 (some line which bounds the parts over which we want to integrate).
First step: Iterate over your bounds array and see which parts we want to integrate over. I assume that you have the bounds array already, as your question suggests.
def get_pairs_of_idxs(x, y1, y2, bounds):
    lst_pairs = []
    for i in range(len(bounds)-1):
        x0, x1 = bounds[i], bounds[i+1]
        xc = 0.5 * (x0 + x1)  # we want to see if the straight line y2 is above or below, so we take one x value and test it
        indx_xc = np.searchsorted(x, xc)  # this returns the index at which xc would be inserted into x
        y1c, y2c = y1[indx_xc], y2[indx_xc]
        if y2c < y1c:  # then the line is below the curve, so we want to integrate
            lst_pairs.append((x0, x1))
    return lst_pairs
Now we have a list of pairs of x values (the end points of each interval) between which we want to integrate.
def solution(x, y1, y2, bounds):
    tot_area = 0
    lst_pairs = get_pairs_of_idxs(x, y1, y2, bounds)
    for x0, x1 in lst_pairs:
        mask = np.logical_and(x >= x0, x <= x1)  # relevant places in x and y data
        xs = x[mask]  # the x values along which we integrate
        ys = (y1 - y2)[mask]  # curve minus line: positive on these intervals, where the curve is above the line
        tot_area += np.trapz(ys, xs)  # named np.trapezoid in NumPy 2.0 and later
    return tot_area
That's what I was thinking about.

In general, the area between two curves f(x) and g(x) is integral(g(x) - f(x)).
So say we have two curves:
xvals = np.linspace(0, 1, 100)
yvals_1 = np.sin(xvals * 10)
yvals_2 = 0.5 - 0.5 * xvals
plt.plot(xvals, yvals_1, '-b')
plt.plot(xvals, yvals_2, '-g')
The "transformed" curve becomes:
yvals_3 = yvals_1 - yvals_2
plt.plot(xvals, yvals_3, '--r')
plt.plot(xvals, np.zeros(xvals.shape), '--k')
And since we want to ignore everything under the green line,
yvals_3[yvals_3 < 0] = 0
plt.plot(xvals, yvals_3, '-r')
Since you want to impose additional constraints, such as "only the area between the first and last intersections", do that now.
# Cheating a little bit -- but you already know how to get the intersections.
first_intersection_x = xvals[4]
last_intersection_x = xvals[94]
cfilter = np.logical_and(xvals >= first_intersection_x, xvals <= last_intersection_x)
xvals_calc = xvals[cfilter]
yvals_calc = yvals_3[cfilter]
The area under this curve is easily calculated using np.trapz
area_under_curve = np.trapz(yvals_calc, xvals_calc)
Of course, this answer assumes that yvals_1 and yvals_2 are available at the same xvals. If not, interpolation is easy.
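If the two curves are sampled at different x positions, one way to bring them onto a common grid is np.interp. The sketch below is only illustrative: the helper name area_above, the grid size num and the sample arrays are assumptions, not part of the question's data. The trapezoid rule is written out by hand so the example does not depend on which NumPy version is installed.

```python
import numpy as np

def area_above(x_a, y_a, x_b, y_b, num=2001):
    """Area of the region where curve A lies above curve B (hypothetical helper).

    Both x arrays must be increasing; the curves are linearly interpolated
    onto a shared grid covering the overlap of their x ranges.
    """
    lo = max(x_a[0], x_b[0])
    hi = min(x_a[-1], x_b[-1])
    grid = np.linspace(lo, hi, num)
    diff = np.interp(grid, x_a, y_a) - np.interp(grid, x_b, y_b)
    diff = np.clip(diff, 0.0, None)  # ignore the parts where A is below B
    # Trapezoid rule: average of neighbouring heights times the grid spacing.
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid)))

# Curve A is y = x on [0, 1]; curve B is the constant 0.5 on a different grid.
x_a = np.array([0.0, 1.0])
y_a = np.array([0.0, 1.0])
x_b = np.array([0.0, 0.5, 1.0])
y_b = np.array([0.5, 0.5, 0.5])
print(area_above(x_a, y_a, x_b, y_b))  # close to 0.125, the triangle above the line
```

Linear interpolation is the simplest choice here; a spline could be swapped in if the data are smooth and sparsely sampled.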

Related

Walk along path of discrete line segments evenly distributing points

I am trying to write a program that given a list of points indicating a path and given a desired number of marks, it should distribute these marks exactly evenly along the path. As it happens the path is cyclical but given an arbitrary point to both start and end at I don't think it affects the algorithm at all.
The first step is to sum up the length of the line segments to determine the total length of the path, and then dividing that by the number of marks to get the desired distance between marks. Easy enough.
The next step is to walk along the path, storing the coordinates of each mark each time you traverse another even multiple's worth of the distance between marks.
In my code, the traversal seems correct but the distribution of marks is not even and does not exactly follow the path. I have created a visualization in matplotlib to plot where the marks are landing showing this (see last section).
path data
point_data = [
(53.8024, 50.4762), (49.5272, 51.8727), (45.0118, 52.3863), (40.5399, 53.0184), (36.3951, 54.7708),
(28.7127, 58.6807), (25.5306, 61.4955), (23.3828, 65.2082), (22.6764, 68.3316), (22.6945, 71.535),
(24.6674, 77.6427), (28.8279, 82.4529), (31.5805, 84.0346), (34.7024, 84.8875), (45.9183, 84.5739),
(57.0529, 82.9846), (64.2141, 79.1657), (71.089, 74.802), (76.7944, 69.8429), (82.1092, 64.4783),
(83.974, 63.3605), (85.2997, 61.5455), (85.7719, 59.4206), (85.0764, 57.3729), (82.0979, 56.0247),
(78.878, 55.1062), (73.891, 53.0987), (68.7101, 51.7283), (63.6943, 51.2997), (58.6791, 51.7438),
(56.1255, 51.5243), (53.8024, 50.4762), (53.8024, 50.4762)]
traversal
import math
number_of_points = 20
def euclid_dist(x1, y1, x2, y2):
    return ((x1-x2)**2 + (y1-y2)**2)**0.5
def move_point(x0, y0, d, theta_rad):
    return x0 + d*math.cos(theta_rad), y0 + d*math.sin(theta_rad)
total_dist = 0
for i in range(1, len(point_data), 1):
    x1, y1 = point_data[i - 1]
    x2, y2 = point_data[i]
    total_dist += euclid_dist(x1, y1, x2, y2)
dist_per_point = total_dist / number_of_points
length_left_over = 0  # distance left over from the last segment
# led_id = 0
results = []
for i in range(1, len(point_data), 1):
    x1, y1 = point_data[i - 1]
    x2, y2 = point_data[i]
    angle_rads = math.atan2(y1-y2, x1-x2)
    extra_rotation = math.pi / 2  # 90deg
    angle_output = math.degrees((angle_rads + extra_rotation + math.pi) % (2*math.pi) - math.pi)
    length_of_segment = euclid_dist(x1, y1, x2, y2)
    distance_to_work_with = length_left_over + length_of_segment
    current_dist = dist_per_point - length_left_over
    while distance_to_work_with > dist_per_point:
        new_point = move_point(x1, y1, current_dist, angle_rads)
        results.append((new_point[0], new_point[1], angle_output))
        current_dist += dist_per_point
        distance_to_work_with -= dist_per_point
    length_left_over = distance_to_work_with
visualization code
import matplotlib.pyplot as plt
from matplotlib import collections as mc
import numpy as np
X = np.array([x for x, _, _ in results])
Y = np.array([y for _, y, _ in results])
plt.scatter(X, Y)
for i, (x, y) in enumerate(zip(X, Y)):
    plt.text(x, y, str(i), color="red", fontsize=12)
possible_colors = [(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1)]
lines = []
colors = []
for i in range(len(point_data) - 1, 0, -1):
    x1, y1 = point_data[i - 1]
    x2, y2 = point_data[i]
    lines.append(((x1, y1), (x2, y2)))
    colors.append(possible_colors[i % 3])
lc = mc.LineCollection(lines, colors = colors, linewidths=2)
fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()
ax.margins(0.1)
plt.show()
visualization result

The key here is to find the segment on the path for each of the points we want to distribute along the path based on the cumulative distance (across segments) from the starting point on the path. Then, interpolate for the point based on the distance between the two end points of the segment in which the point is on the path. The following code does this using a mixture of numpy array processing and list comprehension:
import numpy as np
point_data = [
(53.8024, 50.4762), (49.5272, 51.8727), (45.0118, 52.3863), (40.5399, 53.0184), (36.3951, 54.7708),
(28.7127, 58.6807), (25.5306, 61.4955), (23.3828, 65.2082), (22.6764, 68.3316), (22.6945, 71.535),
(24.6674, 77.6427), (28.8279, 82.4529), (31.5805, 84.0346), (34.7024, 84.8875), (45.9183, 84.5739),
(57.0529, 82.9846), (64.2141, 79.1657), (71.089, 74.802), (76.7944, 69.8429), (82.1092, 64.4783),
(83.974, 63.3605), (85.2997, 61.5455), (85.7719, 59.4206), (85.0764, 57.3729), (82.0979, 56.0247),
(78.878, 55.1062), (73.891, 53.0987), (68.7101, 51.7283), (63.6943, 51.2997), (58.6791, 51.7438),
(56.1255, 51.5243), (53.8024, 50.4762), (53.8024, 50.4762)
]
number_of_points = 20
def euclid_dist(x1, y1, x2, y2):
    return ((x1-x2)**2 + (y1-y2)**2)**0.5
# compute distances between segment end-points (padded with 0. at the start)
# I am using the OP supplied function and list comprehension, but this
# can also be done using numpy
dist_between_points = [0.] + [euclid_dist(p0[0], p0[1], p1[0], p1[1])
for p0, p1 in zip(point_data[:-1], point_data[1:])]
cum_dist_to_point = np.cumsum(dist_between_points)
total_dist = sum(dist_between_points)
cum_dist_per_point = np.linspace(0., total_dist, number_of_points, endpoint=False)
# find the segment that the points will be in
point_line_segment_indices = np.searchsorted(cum_dist_to_point, cum_dist_per_point, side='right').astype(int)
# then do linear interpolation for the point based on distance between the two end points of the segment
# d0s: left end-point cumulative distances from start for segment containing point
# d1s: right end-point cumulative distances from start for segment containing point
# alphas: the interpolation distance in the segment
# p0s: left end-point for segment containing point
# p1s: right end-point for segment containing point
d0s = cum_dist_to_point[point_line_segment_indices - 1]
d1s = cum_dist_to_point[point_line_segment_indices]
alphas = (cum_dist_per_point - d0s) / (d1s - d0s)
p0s = [point_data[segment_index - 1] for segment_index in point_line_segment_indices]
p1s = [point_data[segment_index] for segment_index in point_line_segment_indices]
results = [(p0[0] + alpha * (p1[0] - p0[0]), p0[1] + alpha * (p1[1] - p0[1]))
for p0, p1, alpha in zip(p0s, p1s, alphas)]
The array cum_dist_to_point is the cumulative (across segments) distance along the path from the start to each point in point_data, and the array cum_dist_per_point is the cumulative distance along the path for the number of points we want to evenly distribute along the path. Note that we use np.searchsorted to identify the segment on the path (by cumulative distance from start) that the point, with a given distance from the start, lies in. According to the documentation, searchsorted:
Find the indices into a sorted array (first argument) such that, if the corresponding elements in the second argument were inserted before the indices, the order would be preserved.
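To make the role of side='right' concrete, here is a tiny illustration on made-up numbers (the arrays are hypothetical and unrelated to point_data). It shows how the returned index identifies the segment that contains each target distance.

```python
import numpy as np

# Cumulative distance from the start to each vertex of a hypothetical 3-segment path.
cum_dist_to_vertex = np.array([0.0, 2.0, 5.0, 9.0])

# Distances along the path at which marks should be placed.
targets = np.array([0.0, 1.0, 2.0, 4.5, 8.9])

# side='right' returns the index of the first vertex strictly beyond each target,
# so (index - 1) is the vertex that starts the containing segment.
end_idx = np.searchsorted(cum_dist_to_vertex, targets, side='right')
start_idx = end_idx - 1

print(end_idx)    # [1 1 2 2 3]
print(start_idx)  # [0 0 1 1 2]
```

A target that falls exactly on a vertex (2.0 here) is assigned to the segment that starts at that vertex, which keeps the interpolation denominator non-zero for segments of positive length.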
Then, using the OP's plot function (slightly modified because results no longer has an angle component):
def plot_me(results):
    # relies on plt and mc as imported in the question's visualization code
    X = np.array([x for x, _ in results])
    Y = np.array([y for _, y in results])
    plt.scatter(X, Y)
    for i, (x, y) in enumerate(zip(X, Y)):
        plt.text(x, y, str(i), color="red", fontsize=12)
    possible_colors = [(1, 0, 0, 1), (0, 1, 0, 1), (0, 0, 1, 1)]
    lines = []
    colors = []
    for i in range(len(point_data) - 1, 0, -1):
        x1, y1 = point_data[i - 1]
        x2, y2 = point_data[i]
        lines.append(((x1, y1), (x2, y2)))
        colors.append(possible_colors[i % 3])
    lc = mc.LineCollection(lines, colors=colors, linewidths=2)
    fig, ax = plt.subplots()
    ax.add_collection(lc)
    ax.autoscale()
    ax.margins(0.1)
    plt.show()
We have:
plot_me(results)
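As a possible alternative to the searchsorted bookkeeping, the same interpolation can be sketched with np.interp, treating x and y as functions of cumulative distance. The function name evenly_spaced_marks and the square test path are assumptions made for illustration. Repeated vertices are dropped first, because np.interp expects increasing sample positions.

```python
import numpy as np

def evenly_spaced_marks(points, count):
    """Return `count` marks spaced evenly by arc length along a polyline."""
    pts = np.asarray(points, dtype=float)
    seg_len = np.hypot(np.diff(pts[:, 0]), np.diff(pts[:, 1]))
    # Drop repeated vertices so the cumulative distance is strictly increasing.
    pts = pts[np.concatenate(([True], seg_len > 0))]
    seg_len = seg_len[seg_len > 0]
    cum_dist = np.concatenate(([0.0], np.cumsum(seg_len)))
    # endpoint=False: on a closed path the last mark would coincide with the first.
    targets = np.linspace(0.0, cum_dist[-1], count, endpoint=False)
    xs = np.interp(targets, cum_dist, pts[:, 0])
    ys = np.interp(targets, cum_dist, pts[:, 1])
    return np.column_stack((xs, ys))

# A closed 4x4 square has perimeter 16, so 8 marks land every 2 units.
square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
marks = evenly_spaced_marks(square, 8)
print(marks)
```

For the square, the marks fall on the corners and on the midpoints of the sides, which is an easy case to verify by eye.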

Generate random number outside of range in python

I'm currently working on a pygame game and I need to place objects randomly on the screen, except they cannot be within a designated rectangle. Is there an easy way to do this rather than continuously generating a random pair of coordinates until it's outside of the rectangle?
Here's a rough example of what the screen and the rectangle look like.
 ______________
|      __      |
|     |__|     |
|              |
|              |
|______________|
Where the screen size is 1000x800 and the rectangle is [x: 500, y: 250, width: 100, height: 75]
A more code oriented way of looking at it would be
x = random_int
0 <= x <= 1000
and
500 > x or 600 < x
y = random_int
0 <= y <= 800
and
250 > y or 325 < y

Partition the box into a set of sub-boxes.
Among the valid sub-boxes, choose which one to place your point in with probability proportional to their areas
Pick a random point uniformly at random from within the chosen sub-box.
This will generate samples from the uniform probability distribution on the valid region, based on the chain rule of conditional probability.
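The three steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name random_point_outside and its parameters are hypothetical, and it assumes integer pixel coordinates with half-open ranges, so the blocked rectangle covers rect_x <= x < rect_x + rect_w and rect_y <= y < rect_y + rect_h.

```python
import random

def random_point_outside(screen_w, screen_h, rect_x, rect_y, rect_w, rect_h, rng=random):
    """Uniformly pick an integer pixel on the screen that is not in the rectangle."""
    x_edges = [0, rect_x, rect_x + rect_w, screen_w]
    y_edges = [0, rect_y, rect_y + rect_h, screen_h]
    boxes, areas = [], []
    # Step 1: partition the screen into a 3x3 grid of sub-boxes.
    for i in range(3):
        for j in range(3):
            if i == 1 and j == 1:
                continue  # the middle cell is the blocked rectangle
            x0, x1 = x_edges[i], x_edges[i + 1]
            y0, y1 = y_edges[j], y_edges[j + 1]
            area = (x1 - x0) * (y1 - y0)
            if area > 0:  # skip empty boxes when the rectangle touches an edge
                boxes.append((x0, x1, y0, y1))
                areas.append(area)
    # Step 2: choose a sub-box with probability proportional to its area.
    x0, x1, y0, y1 = rng.choices(boxes, weights=areas, k=1)[0]
    # Step 3: choose a pixel uniformly inside that sub-box.
    return rng.randrange(x0, x1), rng.randrange(y0, y1)

print(random_point_outside(1000, 800, 500, 250, 100, 75))
```

Because each box is picked in proportion to its pixel count and each pixel inside a box is equally likely, every valid pixel ends up with the same overall probability.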

This offers an O(1) approach in terms of both time and memory.
Rationale
The accepted answer along with some other answers seem to hinge on the necessity to generate lists of all possible coordinates, or recalculate until there is an acceptable solution. Both approaches take more time and memory than necessary.
Note that depending on the requirements for uniformity of coordinate generation, there are different solutions as is shown below.
First attempt
My approach is to randomly choose only valid coordinates around the designated box (think left/right, top/bottom), then select at random which side to choose:
import random
# set bounding boxes
maxx=1000
maxy=800
blocked_box = [(500, 250), (100, 75)]
# generate left/right, top/bottom and choose as you like
def gen_rand_limit(p1, dim):
    x1, y1 = p1
    w, h = dim
    x2, y2 = x1 + w, y1 + h
    left = random.randrange(0, x1)
    right = random.randrange(x2+1, maxx-1)
    top = random.randrange(0, y1)
    bottom = random.randrange(y2, maxy-1)
    return random.choice([left, right]), random.choice([top, bottom])
# check boundary conditions are met
def check(x, y, p1, dim):
    x1, y1 = p1
    w, h = dim
    x2, y2 = x1 + w, y1 + h
    assert 0 <= x <= maxx, "0 <= x(%s) <= maxx(%s)" % (x, maxx)
    assert x1 > x or x2 < x, "x1(%s) > x(%s) or x2(%s) < x(%s)" % (x1, x, x2, x)
    assert 0 <= y <= maxy, "0 <= y(%s) <= maxy(%s)" % (y, maxy)
    assert y1 > y or y2 < y, "y1(%s) > y(%s) or y2(%s) < y(%s)" % (y1, y, y2, y)
# sample
points = []
for i in range(1000):  # range instead of xrange, so this also runs on Python 3
    x, y = gen_rand_limit(*blocked_box)
    check(x, y, *blocked_box)
    points.append((x, y))
Results
Given the constraints as outlined in the OP, this does produce random coordinates (blue) around the designated rectangle (red) as desired, but it leaves out all valid points that lie outside the rectangle yet fall within the rectangle's x or y range:
# visual proof via matplotlib
import matplotlib
from matplotlib import pyplot as plt
from matplotlib.patches import Rectangle
X,Y = zip(*points)
fig = plt.figure()
ax = plt.scatter(X, Y)
p1 = blocked_box[0]
w,h = blocked_box[1]
rectangle = Rectangle(p1, w, h, fc='red', zorder=2)
ax = plt.gca()
plt.axis((0, maxx, 0, maxy))
ax.add_patch(rectangle)
Improved
This is easily fixed by limiting only either the x or the y coordinate (note that check is no longer valid; comment it out to run this part):
def gen_rand_limit(p1, dim):
    x1, y1 = p1
    w, h = dim
    x2, y2 = x1 + w, y1 + h
    # should we limit x or y?
    limitx = random.choice([0,1])
    limity = not limitx
    # generate x, y O(1)
    if limitx:
        left = random.randrange(0, x1)
        right = random.randrange(x2+1, maxx-1)
        x = random.choice([left, right])
        y = random.randrange(0, maxy)
    else:
        x = random.randrange(0, maxx)
        top = random.randrange(0, y1)
        bottom = random.randrange(y2, maxy-1)
        y = random.choice([top, bottom])
    return x, y
Adjusting the random bias
As pointed out in the comments this solution suffers from a bias given to points outside the rows/columns of the rectangle. The following fixes that in principle by giving each coordinate the same probability:
def gen_rand_limit(p1, dim):
    x1, y1 = p1
    w, h = dim
    x2, y2 = x1 + w, y1 + h
    # generate x, y O(1)
    # --x
    left = random.randrange(0, x1)
    right = random.randrange(x2+1, maxx)
    withinx = random.randrange(x1, x2+1)
    # adjust probability of a point outside the box columns
    # a point outside has probability (1/(maxx-w)) vs. a point inside has 1/w
    # the same is true for rows. adjpx/adjpy adjust for this probability
    adjpx = (maxx - w) // w // 2  # integer division: list repetition needs an int
    x = random.choice([left, right] * adjpx + [withinx])
    # --y
    top = random.randrange(0, y1)
    bottom = random.randrange(y2+1, maxy)
    withiny = random.randrange(y1, y2+1)
    if x == left or x == right:
        adjpy = (maxy - h) // h // 2
        y = random.choice([top, bottom] * adjpy + [withiny])
    else:
        y = random.choice([top, bottom])
    return x, y
The following plot has 10,000 points to illustrate the uniform placement of points (the points overlaying the box's border are due to point size).
Disclaimer: Note that this plot places the red box in the very middle, such that top/bottom and left/right have the same probability among each other. The adjustment is thus relative to the blocking box, but not for all areas of the graph. A final solution would need to adjust the probabilities for each of these separately.
Simpler solution, yet slightly modified problem
It turns out that adjusting the probabilities for different areas of the coordinate system is quite tricky. After some thinking I came up with a slightly modified approach:
Note that on any 2D coordinate system, blocking out a rectangle divides the area into N sub-areas (N=8 in the case of the question) in which a valid coordinate can be chosen. Looking at it this way, we can define the valid sub-areas as boxes of coordinates. Then we can choose a box at random and a coordinate at random from within that box:
def gen_rand_limit(p1, dim):
    x1, y1 = p1
    w, h = dim
    x2, y2 = x1 + w, y1 + h
    # generate x, y O(1)
    boxes = (
        ((0,0),(x1,y1)), ((x1,0),(x2,y1)), ((x2,0),(maxx,y1)),
        ((0,y1),(x1,y2)), ((x2,y1),(maxx,y2)),
        ((0,y2),(x1,maxy)), ((x1,y2),(x2,maxy)), ((x2,y2),(maxx,maxy)),
    )
    box = boxes[random.randrange(len(boxes))]
    x = random.randrange(box[0][0], box[1][0])
    y = random.randrange(box[0][1], box[1][1])
    return x, y
Note that this is not generalized: the blocked box may not be in the middle, in which case the boxes would look different. Since each box is chosen with the same probability, we get about the same number of points in each box, so the density is higher in smaller boxes:
If the requirement is to generate a uniform distribution among all possible coordinates, the solution is to calculate boxes such that each box is about the same size as the blocking box. YMMV

I've already posted a different answer that I still like, as it is simple and
clear, and not necessarily slow... at any rate it's not exactly what the OP asked for.
I thought about it and I devised an algorithm for solving the OP's problem within their constraints:
partition the screen into 9 rectangles around and including the "hole".
consider the 8 rectangles ("tiles") around the central hole.
for each tile, compute the origin (x, y), the height and the area in pixels.
compute the cumulative sum of the areas of the tiles, as well as the total area of the tiles.
for each extraction, choose a random number between 0 (inclusive) and the total area of the tiles (exclusive).
using the cumulative sums, determine in which tile the random pixel lies.
using divmod, determine the column and the row (dx, dy) in the tile.
using the origin of the tile in screen coordinates, compute the random pixel in screen coordinates.
To implement the ideas above, in which there is an initialization phase that computes static data and a phase that repeatedly uses those data, the natural data structure is a class, and here is my implementation:
from random import randrange
class make_a_hole_in_the_screen():
    def __init__(self, screen, hole_orig, hole_sizes):
        xs, ys = screen
        x, y = hole_orig
        wx, wy = hole_sizes
        # each tile is stored as (height, area); index 4 is the hole itself
        tiles = [(_y,_x*_y) for _x in [x,wx,xs-x-wx] for _y in [y,wy,ys-y-wy]]
        self.tiles = tiles[:4] + tiles[5:]
        self.pixels = [tile[1] for tile in self.tiles]
        self.total = sum(self.pixels)
        self.boundaries = [sum(self.pixels[:i+1]) for i in range(8)]
        self.x = [0, 0, 0,
                  x, x,
                  x+wx, x+wx, x+wx]
        self.y = [0, y, y+wy,
                  0, y+wy,
                  0, y, y+wy]
    def choose(self):
        n = randrange(self.total)
        for i, tile in enumerate(self.tiles):
            if n < self.boundaries[i]: break
        n1 = n - ([0]+self.boundaries)[i]
        dx, dy = divmod(n1,self.tiles[i][0])
        return self.x[i]+dx, self.y[i]+dy
To test the correctness of the implementation, here is a rough check that I ran on Python 2.7:
drilled_screen = make_a_hole_in_the_screen((200,100),(30,50),(20,30))
for i in range(1000000):
    x, y = drilled_screen.choose()
    # print() in function form also runs on Python 3
    if 30<=x<50 and 50<=y<80: print("***", x, y)
    if x<0 or x>=200 or y<0 or y>=100: print("+++", x, y)
A possible optimization consists in using a bisection algorithm to find the relevant tile in place of the simpler linear search that I've implemented.
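A sketch of that bisection idea, using the standard-library bisect module, could look like this. The tile areas are the ones the class above would compute for a 200x100 screen with a 20x30 hole at (30, 50); the function names are hypothetical, and the linear version is included only for comparison.

```python
from bisect import bisect_right
from itertools import accumulate

# Pixel counts of the 8 tiles, in the same order the class builds them.
tile_areas = [1500, 900, 600, 1000, 400, 7500, 4500, 3000]
boundaries = list(accumulate(tile_areas))  # cumulative sums, like self.boundaries

def tile_index_linear(n):
    """First index i with n < boundaries[i], found by scanning."""
    for i, bound in enumerate(boundaries):
        if n < bound:
            return i
    raise ValueError("n must be smaller than the total area")

def tile_index_bisect(n):
    """Same index, found by binary search on the sorted cumulative sums."""
    return bisect_right(boundaries, n)

print(tile_index_linear(1500), tile_index_bisect(1500))  # 1 1
```

With only 8 tiles the gain is negligible; bisection pays off mainly when the free area is split into many tiles.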

It requires a bit of thought to generate a uniformly random point with these constraints. The simplest brute force way I can think of is to generate a list of all valid points and use random.choice() to select from this list. This uses a few MB of memory for the list, but generating a point is very fast:
import random
screen_width = 1000
screen_height = 800
rect_x = 500
rect_y = 250
rect_width = 100
rect_height = 75
valid_points = []
for x in range(screen_width):
    if rect_x <= x < (rect_x + rect_width):
        for y in range(rect_y):
            valid_points.append( (x, y) )
        for y in range(rect_y + rect_height, screen_height):
            valid_points.append( (x, y) )
    else:
        for y in range(screen_height):
            valid_points.append( (x, y) )
for i in range(10):
    rand_point = random.choice(valid_points)
    print(rand_point)
It is possible to generate a random number and map it to a valid point on the screen, which uses less memory, but it is a bit messy and takes more time to generate the point. There might be a cleaner way to do this, but one approach using the same screen size variables as above is here:
rand_max = (screen_width * screen_height) - (rect_width * rect_height)
def rand_point():
    rand_raw = random.randint(0, rand_max-1)
    x = rand_raw % screen_width
    y = rand_raw // screen_width
    if rect_y <= y < rect_y+rect_height and rect_x <= x < rect_x+rect_width:
        rand_raw = rand_max + (y-rect_y) * rect_width + (x-rect_x)
        x = rand_raw % screen_width
        y = rand_raw // screen_width
    return (x, y)
The logic here is similar to the inverse of the way that screen addresses are calculated from x and y coordinates on old 8 and 16 bit microprocessors. The variable rand_max is equal to the number of valid screen coordinates. The x and y co-ordinates of the pixel are calculated, and if it is within the rectangle the pixel is pushed above rand_max, into the region that couldn't be generated with the first call.
If you don't care too much about the point being uniformly random, this solution is easy to implement and very quick. The x values are uniformly random, but the y value is constrained if the chosen x is in a column that crosses the rectangle, so the pixels above and below the rectangle will have a higher probability of being chosen than pixels to the left and right of the rectangle:
def pseudo_rand_point():
    x = random.randint(0, screen_width-1)
    if rect_x <= x < rect_x + rect_width:
        y = random.randint(0, screen_height-rect_height-1)
        if y >= rect_y:
            y += rect_height
    else:
        y = random.randint(0, screen_height-1)
    return (x, y)
Another answer calculated the probability that the pixel is in certain regions of the screen, but that answer isn't quite correct yet. Here's a version using a similar idea: calculate the probability that the pixel is in a given region, and then calculate where it is within that region:
valid_screen_pixels = screen_width*screen_height - rect_width * rect_height
prob_left = float(rect_x * screen_height) / valid_screen_pixels
prob_right = float((screen_width - rect_x - rect_width) * screen_height) / valid_screen_pixels
prob_above_rect = float(rect_y) / (screen_height-rect_height)
def generate_rand():
    ymin, ymax = 0, screen_height-1
    xrand = random.random()
    if xrand < prob_left:
        xmin, xmax = 0, rect_x-1
    elif xrand > (1-prob_right):
        xmin, xmax = rect_x+rect_width, screen_width-1
    else:
        xmin, xmax = rect_x, rect_x+rect_width-1
        yrand = random.random()
        if yrand < prob_above_rect:
            ymax = rect_y-1
        else:
            ymin = rect_y+rect_height
    # the bounds above are inclusive, so use randint (randrange would exclude the upper bound)
    x = random.randint(xmin, xmax)
    y = random.randint(ymin, ymax)
    return (x, y)

If it's the generation of random numbers you want to avoid, rather than the loop, you can do the following:
1. Generate a pair of random floating point coordinates in [0,1].
2. Scale the coordinates to give a point in the outer rectangle.
3. If your point is outside the inner rectangle, return it.
4. Rescale to map the inner rectangle to the outer rectangle.
5. Go to step 3.
This will work best if the inner rectangle is small compared to the outer rectangle. It should probably also be limited to going through the loop some maximum number of times before generating new random numbers and trying again.
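A rough sketch of that loop, under stated assumptions, might look like the following. The function name point_outside, the max_rescales cap and the half-open test for the inner rectangle are all illustrative choices, and the result is a floating point position rather than an integer pixel.

```python
import random

def point_outside(outer_w, outer_h, rx, ry, rw, rh, max_rescales=8, rng=random):
    """Return a float point in the outer rectangle that is outside the inner one."""
    while True:
        # One pair of random numbers per attempt.
        u, v = rng.random(), rng.random()
        for _ in range(max_rescales + 1):
            # Scale the unit-square coordinates to the outer rectangle.
            x, y = u * outer_w, v * outer_h
            inside = rx <= x < rx + rw and ry <= y < ry + rh
            if not inside:
                return x, y
            # The point is uniform within the inner rectangle, so stretching the
            # inner rectangle back to the unit square gives a fresh uniform pair.
            u = (x - rx) / rw
            v = (y - ry) / rh
        # Each rescale uses up floating point precision; start over with new numbers.

print(point_outside(1000, 800, 500, 250, 100, 75))
```

The cap matters because every rescale discards some of the bits of the original random numbers, so after several passes the coordinates carry very little randomness.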

Locating the centroid (center of mass) of spherical polygons

I'm trying to work out how best to locate the centroid of an arbitrary shape draped over a unit sphere, with the input being ordered (clockwise or anti-cw) vertices for the shape boundary. The density of vertices is irregular along the boundary, so the arc-lengths between them are not generally equal. Because the shapes may be very large (half a hemisphere) it is generally not possible to simply project the vertices to a plane and use planar methods, as detailed on Wikipedia (sorry I'm not allowed more than 2 hyperlinks as a newcomer). A slightly better approach involves the use of planar geometry manipulated in spherical coordinates, but again, with large polygons this method fails, as nicely illustrated here. On that same page, 'Cffk' highlighted this paper which describes a method for calculating the centroid of spherical triangles. I've tried to implement this method, but without success, and I'm hoping someone can spot the problem?
I have kept the variable definitions similar to those in the paper to make it easier to compare. The input (data) is a list of longitude/latitude coordinates, converted to [x,y,z] coordinates by the code. For each of the triangles I have arbitrarily fixed one point to be the +z-pole, the other two vertices being composed of a pair of neighboring points along the polygon boundary. The code steps along the boundary (starting at an arbitrary point), using each boundary segment of the polygon as a triangle side in turn. A sub-centroid is determined for each of these individual spherical triangles and they are weighted according to triangle area and added to calculate the total polygon centroid. I don't get any errors when running the code, but the total centroids returned are clearly wrong (I have run some very basic shapes where the centroid location is unambiguous). I haven't found any sensible pattern in the location of the centroids returned...so at the moment I'm not sure what is going wrong, either in the math or code (although, the suspicion is the math).
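Since each sub-centroid is weighted by triangle area, it can help to check the spherical excess independently of the arcsine expressions. The sketch below uses a standard solid-angle identity for three unit vectors, tan(E/2) = |a·(b×c)| / (1 + a·b + b·c + c·a). It is not taken from the paper mentioned above, and the helper names are hypothetical.

```python
from math import atan2, pi, sqrt

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def spherical_excess(a, b, c):
    """Area of the spherical triangle with unit-vector vertices a, b, c on the unit sphere."""
    triple = dot(a, cross(b, c))
    denom = 1.0 + dot(a, b) + dot(b, c) + dot(c, a)
    # atan2 keeps the result valid when denom is zero or negative (large triangles).
    return 2.0 * atan2(abs(triple), denom)

# One octant of the sphere: its area is 4*pi / 8 = pi / 2.
octant = spherical_excess([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
print(octant, pi / 2)
```

Comparing this value with A + B + C - pi for a few triangles is a quick way to see whether the interior angles are being computed in the right branch of asin.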
The code below should work copy-paste as is if you would like to try it. If you have matplotlib and numpy installed, it will plot the results (it will ignore plotting if you don't). You just have to put the longitude/latitude data below the code into a text file called example.txt.
from math import *
try:
    import matplotlib as mpl
    import matplotlib.pyplot
    from mpl_toolkits.mplot3d import Axes3D
    import numpy
    plotting_enabled = True
except ImportError:
    plotting_enabled = False
def sph_car(point):
    if len(point) == 2:
        point.append(1.0)
    rlon = radians(float(point[0]))
    rlat = radians(float(point[1]))
    x = cos(rlat) * cos(rlon) * point[2]
    y = cos(rlat) * sin(rlon) * point[2]
    z = sin(rlat) * point[2]
    return [x, y, z]
def xprod(v1, v2):
    x = v1[1] * v2[2] - v1[2] * v2[1]
    y = v1[2] * v2[0] - v1[0] * v2[2]
    z = v1[0] * v2[1] - v1[1] * v2[0]
    return [x, y, z]
def dprod(v1, v2):
    dot = 0
    for i in range(3):
        dot += v1[i] * v2[i]
    return dot
def plot(poly_xyz, g_xyz):
    fig = mpl.pyplot.figure()
    ax = fig.add_subplot(111, projection='3d')
    # plot the unit sphere
    u = numpy.linspace(0, 2 * numpy.pi, 100)
    v = numpy.linspace(-1 * numpy.pi / 2, numpy.pi / 2, 100)
    x = numpy.outer(numpy.cos(u), numpy.sin(v))
    y = numpy.outer(numpy.sin(u), numpy.sin(v))
    z = numpy.outer(numpy.ones(numpy.size(u)), numpy.cos(v))
    ax.plot_surface(x, y, z, rstride=4, cstride=4, color='w', linewidth=0,
                    alpha=0.3)
    # plot 3d and flattened polygon
    x, y, z = zip(*poly_xyz)
    ax.plot(x, y, z)
    ax.plot(x, y, zs=0)
    # plot the alleged 3d and flattened centroid
    x, y, z = g_xyz
    ax.scatter(x, y, z, c='r')
    ax.scatter(x, y, 0, c='r')
    # display
    ax.set_xlim3d(-1, 1)
    ax.set_ylim3d(-1, 1)
    ax.set_zlim3d(0, 1)
    mpl.pyplot.show()
lons, lats, v = list(), list(), list()
# put the two-column data at the bottom of the question into a file called
# example.txt in the same directory as this script
with open('example.txt') as f:
    for line in f.readlines():
        sep = line.split()
        lons.append(float(sep[0]))
        lats.append(float(sep[1]))
# convert spherical coordinates to cartesian
for lon, lat in zip(lons, lats):
    v.append(sph_car([lon, lat, 1.0]))
# z unit vector/pole ('north pole'). This is an arbitrary point selected to act as one
#(fixed) vertex of the summed spherical triangles. The other two vertices of any
#triangle are composed of neighboring vertices from the polygon boundary.
np = [0.0, 0.0, 1.0]
# Gx,Gy,Gz are the cartesian coordinates of the calculated centroid
Gx, Gy, Gz = 0.0, 0.0, 0.0
for i in range(-1, len(v) - 1):
    # cycle through the boundary vertices of the polygon, from 0 to n
    if all((v[i][0] != v[i+1][0],
            v[i][1] != v[i+1][1],
            v[i][2] != v[i+1][2])):
        # this just ignores redundant points which are common in my larger input files
        # A,B,C are the internal angles in the triangle: 'np-v[i]-v[i+1]-np'
        A = asin(sqrt((dprod(np, xprod(v[i], v[i+1])))**2
                      / ((1 - (dprod(v[i+1], np))**2) * (1 - (dprod(np, v[i]))**2))))
        B = asin(sqrt((dprod(v[i], xprod(v[i+1], np)))**2
                      / ((1 - (dprod(np , v[i]))**2) * (1 - (dprod(v[i], v[i+1]))**2))))
        C = asin(sqrt((dprod(v[i + 1], xprod(np, v[i])))**2
                      / ((1 - (dprod(v[i], v[i+1]))**2) * (1 - (dprod(v[i+1], np))**2))))
        # A/B/Cbar are the vertex angles, such that if 'O' is the sphere center, Abar
        # is the angle (v[i]-O-v[i+1])
        Abar = acos(dprod(v[i], v[i+1]))
        Bbar = acos(dprod(v[i+1], np))
        Cbar = acos(dprod(np, v[i]))
        # e is the 'spherical excess', as defined on wikipedia
        e = A + B + C - pi
        # mag1/2/3 are the magnitudes of vectors np,v[i] and v[i+1].
        mag1 = 1.0
        mag2 = float(sqrt(v[i][0]**2 + v[i][1]**2 + v[i][2]**2))
        mag3 = float(sqrt(v[i+1][0]**2 + v[i+1][1]**2 + v[i+1][2]**2))
        # vec1/2/3 are cross products, defined here to simplify the equation below.
        vec1 = xprod(np, v[i])
        vec2 = xprod(v[i], v[i+1])
        vec3 = xprod(v[i+1], np)
        # multiplying vec1/2/3 by e and respective internal angles, according to the
        # posted paper
        for x in range(3):
            vec1[x] *= Cbar / (2 * e * mag1 * mag2
                               * sqrt(1 - (dprod(np, v[i])**2)))
            vec2[x] *= Abar / (2 * e * mag2 * mag3
                               * sqrt(1 - (dprod(v[i], v[i+1])**2)))
            vec3[x] *= Bbar / (2 * e * mag3 * mag1
                               * sqrt(1 - (dprod(v[i+1], np)**2)))
        Gx += vec1[0] + vec2[0] + vec3[0]
        Gy += vec1[1] + vec2[1] + vec3[1]
        Gz += vec1[2] + vec2[2] + vec3[2]
approx_expected_Gxyz = (0.78, -0.56, 0.27)
print('Approximate Expected Gxyz: {0}\n'
      '              Actual Gxyz: {1}'
      ''.format(approx_expected_Gxyz, (Gx, Gy, Gz)))
if plotting_enabled:
    plot(v, (Gx, Gy, Gz))
Thanks in advance for any suggestions or insight.
EDIT: Here is a figure that shows a projection of the unit sphere with a polygon and the resulting centroid I calculate from the code. Clearly, the centroid is wrong as the polygon is rather small and convex but yet the centroid falls outside its perimeter.
EDIT: Here is a highly-similar set of coordinates to those above, but in the original [lon,lat] format I normally use (which is now converted to [x,y,z] by the updated code).
-39.366295 -1.633460
-47.282630 -0.740433
-53.912136 0.741380
-59.004217 2.759183
-63.489005 5.426812
-68.566001 8.712068
-71.394853 11.659135
-66.629580 15.362600
-67.632276 16.827507
-66.459524 19.069327
-63.819523 21.446736
-61.672712 23.532143
-57.538431 25.947815
-52.519889 28.691766
-48.606227 30.646295
-45.000447 31.089437
-41.549866 32.139873
-36.605156 32.956277
-32.010080 34.156692
-29.730629 33.756566
-26.158767 33.714080
-25.821513 34.179648
-23.614658 36.173719
-20.896869 36.977645
-17.991994 35.600074
-13.375742 32.581447
-9.554027 28.675497
-7.825604 26.535234
-7.825604 26.535234
-9.094304 23.363132
-9.564002 22.527385
-9.713885 22.217165
-9.948596 20.367878
-10.496531 16.486580
-11.151919 12.666850
-12.350144 8.800367
-15.446347 4.993373
-20.366139 1.132118
-24.784805 -0.927448
-31.532135 -1.910227
-39.366295 -1.633460
EDIT: A couple more examples...with 4 vertices defining a perfect square centered at [1,0,0] I get the expected result:
However, from a non-symmetric triangle I get a centroid that is nowhere close...the centroid actually falls on the far side of the sphere (here projected onto the front side as the antipode):
Interestingly, the centroid estimation appears 'stable' in the sense that if I invert the list (go from clockwise to counterclockwise order or vice-versa) the centroid correspondingly inverts exactly.

Anybody finding this, make sure to check Don Hatch's answer which is probably better.
I think this will do it. You should be able to reproduce this result by just copy-pasting the code below.
You will need to have the latitude and longitude data in a file called longitude and latitude.txt. You can copy-paste the original sample data, which is included below the code.
If you have matplotlib, it will additionally produce the plot below.
For non-obvious calculations, I included a link that explains what is going on.
In the graph below, the reference vector is very short (r = 1/10) so that the 3d-centroids are easier to see. You can easily remove the scaling to maximize accuracy.
Note to op: I rewrote almost everything so I'm not sure exactly where the original code was not working. However, at least I think it was not taking into consideration the need to handle clockwise / counterclockwise triangle vertices.
Legend:
(black line) reference vector
(small red dots) spherical triangle 3d-centroids
(large red / blue / green dot) 3d-centroid / projected to the surface / projected to the xy plane
(blue / green lines) the spherical polygon and the projection onto the xy plane
from math import *
try:
import matplotlib as mpl
import matplotlib.pyplot
from mpl_toolkits.mplot3d import Axes3D
import numpy
plotting_enabled = True
except ImportError:
plotting_enabled = False
def main():
# get base polygon data based on unit sphere
r = 1.0
polygon = get_cartesian_polygon_data(r)
point_count = len(polygon)
reference = ok_reference_for_polygon(polygon)
# decompose the polygon into triangles and record each area and 3d centroid
areas, subcentroids = list(), list()
for ia, a in enumerate(polygon):
# build an a-b-c point set
ib = (ia + 1) % point_count
b, c = polygon[ib], reference
if points_are_equivalent(a, b, 0.001):
continue # skip nearly identical points
# store the area and 3d centroid
areas.append(area_of_spherical_triangle(r, a, b, c))
tx, ty, tz = zip(a, b, c)
subcentroids.append((sum(tx)/3.0,
sum(ty)/3.0,
sum(tz)/3.0))
# combine all the centroids, weighted by their areas
total_area = sum(areas)
subxs, subys, subzs = zip(*subcentroids)
_3d_centroid = (sum(a*subx for a, subx in zip(areas, subxs))/total_area,
sum(a*suby for a, suby in zip(areas, subys))/total_area,
sum(a*subz for a, subz in zip(areas, subzs))/total_area)
# shift the final centroid to the surface
surface_centroid = scale_v(1.0 / mag(_3d_centroid), _3d_centroid)
plot(polygon, reference, _3d_centroid, surface_centroid, subcentroids)
def get_cartesian_polygon_data(fixed_radius):
cartesians = list()
with open('longitude and latitude.txt') as f:
for line in f.readlines():
spherical_point = [float(v) for v in line.split()]
if len(spherical_point) == 2:
spherical_point.append(fixed_radius)
cartesians.append(degree_spherical_to_cartesian(spherical_point))
return cartesians
def ok_reference_for_polygon(polygon):
point_count = len(polygon)
# fix the average of all vectors to minimize float skew
polyx, polyy, polyz = zip(*polygon)
# /10 is for visualization. Remove it to maximize accuracy
return (sum(polyx)/(point_count*10.0),
sum(polyy)/(point_count*10.0),
sum(polyz)/(point_count*10.0))
def points_are_equivalent(a, b, vague_tolerance):
# vague tolerance is something like a percentage tolerance (1% = 0.01)
(ax, ay, az), (bx, by, bz) = a, b
return all(((ax-bx)/ax < vague_tolerance,
(ay-by)/ay < vague_tolerance,
(az-bz)/az < vague_tolerance))
def degree_spherical_to_cartesian(point):
rad_lon, rad_lat, r = radians(point[0]), radians(point[1]), point[2]
x = r * cos(rad_lat) * cos(rad_lon)
y = r * cos(rad_lat) * sin(rad_lon)
z = r * sin(rad_lat)
return x, y, z
def area_of_spherical_triangle(r, a, b, c):
# points abc
# build an angle set: A(CAB), B(ABC), C(BCA)
# http://math.stackexchange.com/a/66731/25581
A, B, C = surface_points_to_surface_radians(a, b, c)
E = A + B + C - pi # E is called the spherical excess
area = r**2 * E
# add or subtract area based on clockwise-ness of a-b-c
# http://stackoverflow.com/a/10032657/377366
if clockwise_or_counter(a, b, c) == 'counter':
area *= -1.0
return area
def surface_points_to_surface_radians(a, b, c):
"""build an angle set: A(cab), B(abc), C(bca)"""
points = a, b, c
angles = list()
for i, mid in enumerate(points):
start, end = points[(i - 1) % 3], points[(i + 1) % 3]
x_startmid, x_endmid = xprod(start, mid), xprod(end, mid)
ratio = (dprod(x_startmid, x_endmid)
/ ((mag(x_startmid) * mag(x_endmid))))
angles.append(acos(ratio))
return angles
def clockwise_or_counter(a, b, c):
ab = diff_cartesians(b, a)
bc = diff_cartesians(c, b)
x = xprod(ab, bc)
if x < 0:
return 'clockwise'
elif x > 0:
return 'counter'
else:
raise RuntimeError('The reference point is in the polygon.')
def diff_cartesians(positive, negative):
return tuple(p - n for p, n in zip(positive, negative))
def xprod(v1, v2):
x = v1[1] * v2[2] - v1[2] * v2[1]
y = v1[2] * v2[0] - v1[0] * v2[2]
z = v1[0] * v2[1] - v1[1] * v2[0]
return [x, y, z]
def dprod(v1, v2):
dot = 0
for i in range(3):
dot += v1[i] * v2[i]
return dot
def mag(v1):
return sqrt(v1[0]**2 + v1[1]**2 + v1[2]**2)
def scale_v(scalar, v):
return tuple(scalar * vi for vi in v)
def plot(polygon, reference, _3d_centroid, surface_centroid, subcentroids):
fig = mpl.pyplot.figure()
ax = fig.add_subplot(111, projection='3d')
# plot the unit sphere
u = numpy.linspace(0, 2 * numpy.pi, 100)
v = numpy.linspace(-1 * numpy.pi / 2, numpy.pi / 2, 100)
x = numpy.outer(numpy.cos(u), numpy.sin(v))
y = numpy.outer(numpy.sin(u), numpy.sin(v))
z = numpy.outer(numpy.ones(numpy.size(u)), numpy.cos(v))
ax.plot_surface(x, y, z, rstride=4, cstride=4, color='w', linewidth=0,
alpha=0.3)
# plot 3d and flattened polygon
x, y, z = zip(*polygon)
ax.plot(x, y, z, c='b')
ax.plot(x, y, zs=0, c='g')
# plot the 3d centroid
x, y, z = _3d_centroid
ax.scatter(x, y, z, c='r', s=20)
# plot the spherical surface centroid and flattened centroid
x, y, z = surface_centroid
ax.scatter(x, y, z, c='b', s=20)
ax.scatter(x, y, 0, c='g', s=20)
# plot the full set of triangular centroids
x, y, z = zip(*subcentroids)
ax.scatter(x, y, z, c='r', s=4)
# plot the reference vector used to find sub-centroids
x, y, z = reference
ax.plot((0, x), (0, y), (0, z), c='k')
ax.scatter(x, y, z, c='k', marker='^')
# display
ax.set_xlim3d(-1, 1)
ax.set_ylim3d(-1, 1)
ax.set_zlim3d(0, 1)
mpl.pyplot.show()
# run it in a function so the main code can appear at the top
main()
Here is the longitude and latitude data you can paste into longitude and latitude.txt
-39.366295 -1.633460
-47.282630 -0.740433
-53.912136 0.741380
-59.004217 2.759183
-63.489005 5.426812
-68.566001 8.712068
-71.394853 11.659135
-66.629580 15.362600
-67.632276 16.827507
-66.459524 19.069327
-63.819523 21.446736
-61.672712 23.532143
-57.538431 25.947815
-52.519889 28.691766
-48.606227 30.646295
-45.000447 31.089437
-41.549866 32.139873
-36.605156 32.956277
-32.010080 34.156692
-29.730629 33.756566
-26.158767 33.714080
-25.821513 34.179648
-23.614658 36.173719
-20.896869 36.977645
-17.991994 35.600074
-13.375742 32.581447
-9.554027 28.675497
-7.825604 26.535234
-7.825604 26.535234
-9.094304 23.363132
-9.564002 22.527385
-9.713885 22.217165
-9.948596 20.367878
-10.496531 16.486580
-11.151919 12.666850
-12.350144 8.800367
-15.446347 4.993373
-20.366139 1.132118
-24.784805 -0.927448
-31.532135 -1.910227
-39.366295 -1.633460

To clarify: the quantity of interest is the projection of the true 3d centroid
(i.e. 3d center-of-mass, i.e. 3d center-of-area) onto the unit sphere.
Since all you care about is the direction from the origin to the 3d centroid,
you don't need to bother with areas at all;
it's easier to just compute the moment (i.e. 3d centroid times area).
The moment of the region to the left of a closed path on the unit sphere
is half the integral of the leftward unit vector as you walk around the path.
This follows from a non-obvious application of Stokes' theorem; see Frank Jones's vector calculus book, chapter 13 Problem 13-12.
In particular, for a spherical polygon, the moment is half the sum of
(a x b) / ||a x b|| * (angle between a and b) for each pair of consecutive vertices a,b.
(That's for the region to the left of the path;
negate it for the region to the right of the path.)
(And if you really did want the 3d centroid, just compute the area and divide the moment by it. Comparing areas might also be useful in choosing which of the two regions to call "the polygon".)
Here's some code; it's really simple:
#!/usr/bin/env python3
import math

def plus(a, b): return [x + y for x, y in zip(a, b)]
def minus(a, b): return [x - y for x, y in zip(a, b)]
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def length(v): return math.sqrt(dot(v, v))
def normalized(v):
    l = length(v)
    return [1, 0, 0] if l == 0 else [x / l for x in v]
def addVectorTimesScalar(accumulator, vector, scalar):
    for i in range(len(accumulator)):
        accumulator[i] += vector[i] * scalar
def angleBetweenUnitVectors(a, b):
    # https://www.plunk.org/~hatch/rightway.html
    if dot(a, b) < 0:
        return math.pi - 2*math.asin(length(plus(a, b))/2.)
    else:
        return 2*math.asin(length(minus(a, b))/2.)
def sphericalPolygonMoment(verts):
    # half the sum of unit(a x b) * angle(a, b) over consecutive vertices
    moment = [0., 0., 0.]
    for i in range(len(verts)):
        a = verts[i]
        b = verts[(i+1) % len(verts)]
        addVectorTimesScalar(moment, normalized(cross(a, b)),
                             angleBetweenUnitVectors(a, b) / 2.)
    return moment
if __name__ == '__main__':
    import sys
    def lonlat_degrees_to_xyz(lon_degrees, lat_degrees):
        lon = lon_degrees*(math.pi/180)
        lat = lat_degrees*(math.pi/180)
        coslat = math.cos(lat)
        return [coslat*math.cos(lon), coslat*math.sin(lon), math.sin(lat)]
    # read "lon lat" pairs (in degrees) from stdin, skipping blank lines
    verts = [lonlat_degrees_to_xyz(*[float(v) for v in line.split()])
             for line in sys.stdin if line.strip()]
    #print("verts = " + repr(verts))
    moment = sphericalPolygonMoment(verts)
    print("moment = " + repr(moment))
    print("centroid unit direction = " + repr(normalized(moment)))
For the example polygon, this gives the answer (unit vector):
[-0.7644875430808217, 0.579935445918147, -0.2814847687566214]
This is roughly the same as, but more accurate than, the answer computed by @KobeJohn's code, which uses rough tolerances and planar approximations to the sub-centroids:
[0.7628095787179151, -0.5977153368303585, 0.24669398601094406]
The directions of the two answers are roughly opposite (so I guess KobeJohn's code
decided to take the region to the right of the path in this case).
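As a quick sanity check of the moment formula, here is a minimal Python 3 sketch. It is not part of the original script; the name spherical_polygon_moment and the octant test case are my own choices for illustration. It uses one octant of the unit sphere, where the moment can be worked out by hand: each component equals the area of the quarter unit disk that the octant projects onto, i.e. pi/4.

```python
import math

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def spherical_polygon_moment(verts):
    """Half the sum of unit(a x b) * angle(a, b) over consecutive unit vertices."""
    moment = [0.0, 0.0, 0.0]
    n = len(verts)
    for i in range(n):
        a, b = verts[i], verts[(i + 1) % n]
        c = cross(a, b)
        length = norm(c)
        if length == 0.0:
            continue  # assumption: repeated (or antipodal) vertices are skipped
        # atan2 form of the angle between a and b
        angle = math.atan2(length, sum(x * y for x, y in zip(a, b)))
        for k in range(3):
            moment[k] += 0.5 * angle * c[k] / length
    return moment

# One octant of the unit sphere, vertices ordered counterclockwise seen from outside.
octant = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
moment = spherical_polygon_moment(octant)
direction = [m / norm(moment) for m in moment]
print(moment)     # each component should be close to pi/4
print(direction)  # each component should be close to 1/sqrt(3)
```

Reversing the vertex order negates the moment, which matches the "left of the path" versus "right of the path" remark.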

I think a good approximation would be to compute the center of mass using weighted Cartesian coordinates and then project the result onto the sphere (assuming the origin of coordinates is (0, 0, 0)^T).
Let (p[0], p[1], ... p[n-1]) be the n points of the polygon. The approximate (Cartesian) centroid can be computed by:
c = 1 / w * (sum of w[i] * p[i])
where w is the sum of all weights, p[i] is a polygon point, and w[i] is the weight for that point, e.g.
w[i] = |p[i] - p[(i - 1 + n) % n]| / 2 + |p[i] - p[(i + 1) % n]| / 2
where |x| is the length of a vector x.
I.e. a point is weighted with half the length to the previous and half the length to the next polygon point.
This centroid c can now be projected onto the sphere by:
c' = r * c / |c|
where r is the radius of the sphere.
To account for the orientation of the polygon (ccw, cw), the result may instead be
c' = - r * c / |c|.
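A minimal Python sketch of this weighting scheme, assuming Python 3.8+ for math.dist. The function names and the four-point test polygon are illustrative, not from the original post, and the sketch does not attempt the orientation handling mentioned above.

```python
import math

def lonlat_to_xyz(lon_deg, lat_deg, r=1.0):
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))

def approx_centroid_on_sphere(points, r=1.0):
    """Edge-length-weighted vertex average, projected back onto the sphere."""
    n = len(points)
    c = [0.0, 0.0, 0.0]
    for i in range(n):
        p = points[i]
        # w[i]: half the chord length to the previous plus half to the next vertex
        w = 0.5 * math.dist(p, points[(i - 1) % n]) \
            + 0.5 * math.dist(p, points[(i + 1) % n])
        for k in range(3):
            c[k] += w * p[k]
    # dividing by the total weight is unnecessary: only the direction of c matters
    length = math.sqrt(sum(x * x for x in c))
    return tuple(r * x / length for x in c)

# Symmetric four-point polygon around (1, 0, 0), given as (lon, lat) in degrees.
square = [lonlat_to_xyz(lon, lat) for lon, lat in
          [(10, -20), (10, 20), (-10, 20), (-10, -20)]]
centroid = approx_centroid_on_sphere(square)
print(centroid)  # should be close to (1, 0, 0) by symmetry
```

Because only the polygon's vertices enter the sum, this stays an approximation: vertices clustered along one edge pull the result toward that edge.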

Sorry I (as a newly registered user) had to write a new post instead of just voting/commenting on the above answer by Don Hatch. Don's answer, I think, is the best and most elegant. It is mathematically rigorous in computing the center of mass (first moment of mass) in a simple way when applying to the spherical polygon.
Kobe John's answer is a good approximation but only satisfactory for smaller areas. I also noticed a few glitches in the code. Firstly, the reference point should be projected onto the spherical surface to compute the actual spherical area. Secondly, the function points_are_equivalent() might need to be refined to avoid division by zero.
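One possible way to address both glitches is sketched below. This is my own assumption about a reasonable fix, not code from either answer: the comparison uses an absolute per-coordinate tolerance instead of the original relative one (so the default value of 1e-9 is a placeholder to tune), and project_to_sphere is a hypothetical helper for moving the reference point onto the surface.

```python
import math

def points_are_equivalent(a, b, tolerance=1e-9):
    """True if every coordinate of a and b differs by at most `tolerance`.

    Uses an absolute difference, so coordinates equal to zero are safe.
    """
    return all(abs(ai - bi) <= tolerance for ai, bi in zip(a, b))

def project_to_sphere(v, r=1.0):
    """Scale a non-zero vector so that it has length r."""
    length = math.sqrt(sum(x * x for x in v))
    if length == 0.0:
        raise ValueError("cannot project the zero vector onto a sphere")
    return tuple(r * x / length for x in v)

print(points_are_equivalent((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
print(project_to_sphere((0.1, 0.0, 0.0)))
```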
The approximation error in Kobe's method lies in the calculation of the centroid of spherical triangles. The sub-centroid is NOT the center of mass of the spherical triangle but the planar one. This is not an issue if one is to determine that single triangle (sign may flip, see below). It is also not an issue if triangles are small (e.g. a dense triangulation of the polygon).
A few simple tests could illustrate the approximation error. For example if we use just four points:
10 -20
10 20
-10 20
-10 -20
The exact answer is (1,0,0) and both methods are good. But if you throw in a few more points along one edge (e.g. add {10,-15},{10,-10}... to the first edge), you'll see the results from Kobe's method start to shift. Furthermore, if you increase the longitude from [10,-10] to [100,-100], you'll see Kobe's result flip direction. A possible improvement might be to add one or more further levels to the sub-centroid calculation (basically refining/reducing the sizes of the triangles).
For our application, the spherical area boundary is composed of multiple arcs and is thus not a polygon (i.e. the arcs are not parts of great circles). But it will just take a little more work to find the n-vector in the curve integration.
EDIT: Replacing the subcentroid calculation with the one given in Brock's paper should fix Kobe's method. I have not tried it, though.

Drawing diagonal lines on an image

Hi, I'm trying to draw diagonal lines across an image, from top right to bottom left. Here is my code so far.
width = getWidth(picture)
height = getHeight(picture)
for x in range(0, width):
    for y in range(0, height):
        pixel = getPixel(picture, x, y)
        setColor(pixel, black)
Thanks

Most graphic libraries have some way to draw a line directly.
In JES there is the addLine function, so you could do
addLine(picture, 0, 0, width, height)
If you're stuck with setting single pixels, you should have a look at Bresenham's line algorithm, which is one of the most efficient algorithms for drawing lines.
A note on your code: what you're doing with the two nested loops is the following
for each column in the picture
    for each row in the current column
        set the pixel in the current column and current row to black
so basically you're filling the entire image with black pixels.
EDIT
To draw multiple diagonal lines across the whole image (leaving a space between them), you could use the following loop
width = getWidth(picture)
height = getHeight(picture)
space = 10
for x in range(0, 2*width, space):
    addLine(picture, x, 0, x-width, height)
This gives you an image like (the example is hand-drawn ...)
This makes use of the clipping functionality that most graphics libraries provide, i.e. parts of the line that are not within the image are simply ignored. Note that without 2*width (i.e. if x goes only up to width), only the upper left half of the lines would be drawn...

I would like to add some math considerations to the discussion...
(Just because it is sad that JES's addLine function draws black lines only and is quite limited...)
Note: The following code uses Bresenham's line algorithm, pointed out by MartinStettner (so thanks to him).
Bresenham's line algorithm determines which points of a raster should be selected in order to form a close approximation to a straight line between two given points. Since a pixel is an atomic entity, a line can only be drawn on a computer screen by using some kind of approximation.
Note: To understand the following code, you will need to remember a little bit of your basic school math courses (line equation & trigonometry).
Code :
# The following is a quick implementation and contains side effects...
import random
from math import pi, tan  # needed by drawInfiniteLineA below
# Draw point, with check if the point is in the image area
def drawPoint(pic, col, x, y):
if (x >= 0) and (x < getWidth(pic)) and (y >= 0) and (y < getHeight(pic)):
px = getPixel(pic, x, y)
setColor(px, col)
# Draw line segment, given two points
# From Bresenham's line algorithm
# http://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm
def drawLine(pic, col, x0, y0, x1, y1):
dx = abs(x1-x0)
dy = abs(y1-y0)
sx = sy = 0
#sx = 1 if x0 < x1 else -1
#sy = 1 if y0 < y1 else -1
if (x0 < x1):
sx = 1
else:
sx = -1
if (y0 < y1):
sy = 1
else:
sy = -1
err = dx - dy
while (True):
drawPoint(pic, col, x0, y0)
if (x0 == x1) and (y0 == y1):
break
e2 = 2 * err
if (e2 > -dy):
err = err - dy
x0 = x0 + sx
if (x0 == x1) and (y0 == y1):
drawPoint(pic, col, x0, y0)
break
if (e2 < dx):
err = err + dx
y0 = y0 + sy
# Draw infinite line from segment
def drawInfiniteLine(pic, col, x0, y0, x1, y1):
# y = m * x + b
m = (y0-y1) / (x0-x1)
# y0 = m * x0 + b => b = y0 - m * x0
b = y0 - m * x0
x0 = 0
y0 = int(m*x0 + b)
# get a 2nd point far away from the 1st one
x1 = getWidth(pic)
y1 = int(m*x1 + b)
drawLine(pic, col, x0, y0, x1, y1)
# Draw infinite line from origin point and angle
# Angle 'theta' expressed in degrees
def drawInfiniteLineA(pic, col, x, y, theta):
# y = m * x + b
dx = y * tan(theta * pi / 180.0) # (need radians)
dy = y
if (dx == 0):
dx += 0.000000001 # Avoid dividing by zero
m = dy / dx
# y = m * x + b => b = y - m * x
b = y - m * x
# get a 2nd point far away from the 1st one
x1 = 2 * getWidth(pic)
y1 = m*x1 + b
drawInfiniteLine(pic, col, x, y, x1, y1)
# Draw multiple parallel lines, given offset and angle
def multiLines(pic, col, offset, theta, randOffset = 0):
# Range is [-2*width, 2*width] to cover the whole surface
for i in xrange(-2*getWidth(pic), 2*getWidth(pic), offset):
drawInfiniteLineA(pic, col, i + random.randint(0, randOffset), 1, theta)
# Draw multiple lines, given offset, angle and angle offset
def multiLinesA(pic, col, offsetX, offsetY, theta, offsetA):
j = 0
# Range is [-2*width, 2*width] to cover the whole surface
for i in xrange(-2*getWidth(pic), 2*getWidth(pic), offsetX):
drawInfiniteLineA(pic, col, i, j, theta)
j += offsetY
theta += offsetA
file = pickAFile()
picture = makePicture(file)
color = makeColor(0, 65, 65) #pickAColor()
#drawline(picture, color, 10, 10, 100, 100)
#drawInfiniteLine(picture, color, 10, 10, 100, 100)
#drawInfiniteLineA(picture, color, 50, 50, 135.0)
#multiLines(picture, color, 20, 56.0)
#multiLines(picture, color, 10, 56.0, 15)
multiLinesA(picture, color, 10, 2, 1.0, 1.7)
show(picture)
Output (Painting by Pierre Soulages) :
Hope this gave some fun and ideas to JES students... And to others as well...

Where does your picture object come from? What is it? What is not working so far? And what library for image access are you trying to use? (I mean, where do you get, or intend to get, getWidth, getHeight, getPixel and setColor from?)
I don't think any library exists that gives you a "pixel" as a whole object which can be used in a setColor call, and if one does, it would be the slowest thing in the world - maybe in the galaxy.
On the other hand, if these methods and your picture did exist, the code above would cover the whole image in black - you are getting all possible y values (from 0 to height) inside all possible x values (from 0 to width) of the image, and coloring each one black.
Drawing a line would require you to change x and y at the same time, more like this
(using another "imaginary library", but a more plausible one):
for x, y in zip(range(0, width), range(0, height)):
    picture.setPixel((x, y), Black)
This would sort of work, but the line would not be perfect unless the image was perfectly square - otherwise zip stops at the shorter dimension and the line never reaches the opposite corner. To solve that, a more refined algorithm is needed - but that is secondary to having a real way to access the pixels of an image - like using the Python Imaging Library (PIL or Pillow), pygame, or some other library.
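To illustrate the non-square case without depending on any particular imaging library, here is a small sketch that uses a plain list of lists as the "image". The function name diagonal_pixels is made up for this example, and the stepping is simple linear interpolation along the longer dimension rather than Bresenham's algorithm.

```python
def diagonal_pixels(width, height):
    """Pixel coordinates of a line from the top-right to the bottom-left corner."""
    steps = max(width, height)
    if steps == 1:
        return [(0, 0)]
    pixels = []
    for i in range(steps):
        t = i / (steps - 1)                # runs from 0.0 to 1.0
        x = round((width - 1) * (1 - t))   # right edge -> left edge
        y = round((height - 1) * t)        # top edge -> bottom edge
        pixels.append((x, y))
    return pixels

# A plain list-of-lists "image": 0 = white, 1 = black.
width, height = 8, 4
image = [[0] * width for _ in range(height)]
pixels = diagonal_pixels(width, height)
for x, y in pixels:
    image[y][x] = 1
for row in image:
    print("".join("#" if v else "." for v in row))
```

Stepping along the longer dimension means every column (or row) of that dimension gets exactly one pixel, so the line has no gaps and ends in the opposite corner.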

Finding the full width half maximum of a peak

I have been trying to figure out the full width half maximum (FWHM) of the blue peak (see image). The green peak and the magenta peak combined make up the blue peak. I have been using the following equation to find the FWHM of the green and magenta peaks: fwhm = 2*np.sqrt(2*(math.log(2)))*sd where sd = standard deviation. I created the green and magenta peaks and I know the standard deviation, which is why I can use that equation.
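For reference, the relation between FWHM and standard deviation for a single Gaussian can be checked numerically. The sketch below uses illustrative values (mean 50, sd 10) that are not taken from the data in the question: at a distance of fwhm/2 on either side of the mean, the curve should sit at half its peak height.

```python
import math

def gaussian(x, mean, sd):
    return 1.0 / (sd * math.sqrt(2 * math.pi)) * math.exp(-(x - mean)**2 / (2 * sd**2))

mean, sd = 50.0, 10.0
fwhm = 2 * math.sqrt(2 * math.log(2)) * sd

peak = gaussian(mean, mean, sd)
left = gaussian(mean - fwhm / 2, mean, sd)
right = gaussian(mean + fwhm / 2, mean, sd)
print(fwhm, left / peak, right / peak)  # both ratios should be close to 0.5
```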
I created the green and magenta peaks using the following code:
def make_norm_dist(self, x, mean, sd):
import numpy as np
norm = []
for i in range(x.size):
norm += [1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x[i] - mean)**2/(2*sd**2))]
return np.array(norm)
If I did not know the blue peak was made up of two peaks and I only had the blue peak in my data, how would I find the FWHM?
I have been using this code to find the peak top:
peak_top = 0.0e-1000
for i in x_axis:
if i > peak_top:
peak_top = i
I could divide the peak_top by 2 to find the half height and then try and find y-values corresponding to the half height, but then I would run into trouble if there are no x-values exactly matching the half height.
I am pretty sure there is a more elegant solution to the one I am trying.

You can use a spline to fit [blue curve - peak/2], and then find its roots:
import numpy as np
from scipy.interpolate import UnivariateSpline
def make_norm_dist(x, mean, sd):
return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))
x = np.linspace(10, 110, 1000)
green = make_norm_dist(x, 50, 10)
pink = make_norm_dist(x, 60, 10)
blue = green + pink
# create a spline of x and blue-np.max(blue)/2
spline = UnivariateSpline(x, blue-np.max(blue)/2, s=0)
r1, r2 = spline.roots() # find the roots
import pylab as pl
pl.plot(x, blue)
pl.axvspan(r1, r2, facecolor='g', alpha=0.5)
pl.show()
Here is the result:

This worked for me in iPython (quick and dirty, can be reduced to 3 lines):
import numpy as np
def FWHM(X, Y):
    Y = np.asarray(Y)
    half_max = np.max(Y) / 2.
    # find where the function crosses the half_max line (where the sign of the difference flips)
    # take the 'derivative' of signum(half_max - Y[])
    d = np.sign(half_max - Y[0:-1]) - np.sign(half_max - Y[1:])
    #plot(X[0:len(d)],d) #if you are interested
    # find the leftmost and rightmost indexes
    left_idx = np.where(d > 0)[0][0]
    right_idx = np.where(d < 0)[0][-1]
    return X[right_idx] - X[left_idx] # return the difference (full width)
Some additions can be made to make the resolution more accurate, but in the limit that there are many samples along the X axis and the data is not too noisy, this works great.
Even when the data are not Gaussian and a little noisy, it worked for me (I just take the first and last time half max crosses the data).

If your data has noise (and it always does in the real world), a more robust solution would be to fit a Gaussian to the data and extract FWHM from that:
import numpy as np
import scipy.optimize as opt
def gauss(x, p): # p[0]==mean, p[1]==stdev
return 1.0/(p[1]*np.sqrt(2*np.pi))*np.exp(-(x-p[0])**2/(2*p[1]**2))
# Create some sample data
known_param = np.array([2.0, .7])
xmin,xmax = -1.0, 5.0
N = 1000
X = np.linspace(xmin,xmax,N)
Y = gauss(X, known_param)
# Add some noise
Y += .10*np.random.random(N)
# Renormalize to a proper PDF
Y /= ((xmax-xmin)/N)*Y.sum()
# Fit a Gaussian
p0 = [0,1] # Initial guess is a normal distribution
errfunc = lambda p, x, y: gauss(x, p) - y # Distance to the target function
p1, success = opt.leastsq(errfunc, p0[:], args=(X, Y))
fit_mu, fit_stdev = p1
FWHM = 2*np.sqrt(2*np.log(2))*fit_stdev
print("FWHM", FWHM)
The plotted image can be generated by:
from pylab import *
plot(X,Y)
plot(X, gauss(X,p1),lw=3,alpha=.5, color='r')
axvspan(fit_mu-FWHM/2, fit_mu+FWHM/2, facecolor='g', alpha=0.5)
show()
An even better approximation would filter out the noisy data below a given threshold before the fit.

Here is a nice little function using the spline approach.
from numpy import amax
from scipy.interpolate import splrep, sproot
class MultiplePeaks(Exception): pass
class NoPeaksFound(Exception): pass
def fwhm(x, y, k=3):
    """
    Determine full-width-half-maximum of a peaked set of points, x and y.
    Assumes that there is only one peak present in the dataset. The function
    uses a spline interpolation of order k (sproot needs a cubic spline, k=3).
    """
    half_max = amax(y)/2.0
    s = splrep(x, y - half_max, k=k)
    roots = sproot(s)
    if len(roots) > 2:
        raise MultiplePeaks("The dataset appears to have multiple peaks, and "
                            "thus the FWHM can't be determined.")
    elif len(roots) < 2:
        raise NoPeaksFound("No proper peaks were found in the data set; likely "
                           "the dataset is flat (e.g. all zeros).")
    else:
        return abs(roots[1] - roots[0])

You can use scipy.signal to solve it: first find_peaks and then peak_widths.
With the default rel_height=0.5, you're measuring the width at half of the peak's prominence, which is the width at half maximum when the baseline is at zero.
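A minimal sketch of that approach on synthetic data (a single Gaussian with sd = 10 on a uniform grid; these values are assumptions for illustration). Note that peak_widths reports widths in units of samples, so the result is multiplied by the grid spacing, and that it measures relative to the peak's prominence, which only matches the half-maximum width when the baseline is near zero as it is here.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

# Synthetic single peak on a uniform grid
x = np.linspace(0, 100, 1001)
sd = 10.0
y = np.exp(-(x - 50.0)**2 / (2 * sd**2))

peaks, _ = find_peaks(y)
# widths are measured at rel_height * prominence below each peak, in samples
widths, width_heights, left_ips, right_ips = peak_widths(y, peaks, rel_height=0.5)

dx = x[1] - x[0]
fwhm = widths[0] * dx
print(fwhm, 2 * np.sqrt(2 * np.log(2)) * sd)  # the two values should nearly agree
```

For noisy data, find_peaks will usually report many small peaks, so some filtering (for example through its prominence or height arguments) is likely needed before picking the peak of interest.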

If you prefer interpolation over fitting:
import numpy as np
def get_full_width(x: np.ndarray, y: np.ndarray, height: float = 0.5) -> float:
height_half_max = np.max(y) * height
index_max = np.argmax(y)
x_low = np.interp(height_half_max, y[:index_max+1], x[:index_max+1])
x_high = np.interp(height_half_max, np.flip(y[index_max:]), np.flip(x[index_max:]))
return x_high - x_low

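For single-peaked data (monotonic on either side of the maximum) on a uniformly spaced grid with many data points, and if there's no need for perfect accuracy, I would use:
import numpy as np
def FWHM(X, Y):
    X, Y = np.asarray(X), np.asarray(Y)
    deltax = X[1] - X[0]
    half_max = np.max(Y) / 2.
    l = np.where(Y > half_max, 1, 0)
    return np.sum(l) * deltax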

I implemented an empirical solution which works for noisy and not-quite-Gaussian data fairly well in haggis.math.full_width_half_max. The usage is extremely straightforward:
fwhm = full_width_half_max(x, y)
The function is robust: it simply finds the maximum of the data and the nearest points crossing the "halfway down" threshold using the requested interpolation scheme.
Here are a couple of examples using data from the other answers.
#HYRY's smooth data
def make_norm_dist(x, mean, sd):
return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))
x = np.linspace(10, 110, 1000)
green = make_norm_dist(x, 50, 10)
pink = make_norm_dist(x, 60, 10)
blue = green + pink
# create a spline of x and blue-np.max(blue)/2
spline = UnivariateSpline(x, blue-np.max(blue)/2, s=0)
r1, r2 = spline.roots() # find the roots
# Compute using my function
fwhm, (x1, y1), (x2, y2) = full_width_half_max(x, blue, return_points=True)
# Print comparison
print('HYRY:', r2 - r1, 'MP:', fwhm)
plt.plot(x, blue)
plt.axvspan(r1, r2, facecolor='g', alpha=0.5)
plt.plot(x1, y1, 'r.')
plt.plot(x2, y2, 'r.')
For smooth data, the results are pretty exact:
HYRY: 26.891157007233254 MP: 26.891193606203814
#Hooked's Noisy Data
def gauss(x, p): # p[0]==mean, p[1]==stdev
return 1.0/(p[1]*np.sqrt(2*np.pi))*np.exp(-(x-p[0])**2/(2*p[1]**2))
# Create some sample data
known_param = np.array([2.0, .7])
xmin,xmax = -1.0, 5.0
N = 1000
X = np.linspace(xmin,xmax,N)
Y = gauss(X, known_param)
# Add some noise
Y += .10*np.random.random(N)
# Renormalize to a proper PDF
Y /= ((xmax-xmin)/N)*Y.sum()
# Fit a Gaussian
p0 = [0,1] # Initial guess is a normal distribution
errfunc = lambda p, x, y: gauss(x, p) - y # Distance to the target function
p1, success = opt.leastsq(errfunc, p0[:], args=(X, Y))
fit_mu, fit_stdev = p1
FWHM = 2*np.sqrt(2*np.log(2))*fit_stdev
# Compute using my function
fwhm, (x1, y1), (x2, y2) = full_width_half_max(X, Y, return_points=True)
# Print comparison
print('Hooked:', FWHM, 'MP:', fwhm)
plt.plot(X, Y)
plt.plot(X, gauss(X, p1), lw=3, alpha=.5, color='r')
plt.axvspan(fit_mu - FWHM / 2, fit_mu + FWHM / 2, facecolor='g', alpha=0.5)
plt.plot(x1, y1, 'r.')
plt.plot(x2, y2, 'r.')
For noisy data (with a biased baseline), the results are not as consistent.
Hooked: 1.9903193212254346 MP: 1.5039676990530118
On the one hand the Gaussian fit is not very optimal for the data, but on the other hand, the strategy of picking the nearest point that intersects the half-max threshold is likely not optimal either.
