I am trying to write a program that computes the Pearson correlation coefficient, using the population standard deviation, in Python. I thought this would be pretty trivial until I got to the part where I was summing (yi - μy)*(xi - μx). Here is my full code:
def r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sdx = (sum([(xi-mx)**2 for xi in x]) / len(x))**0.5
    sdy = (sum([(yi-my)**2 for yi in y]) / len(y))**0.5
    res = ((sum([(xi-mx)*(yi-my) for xi in x for yi in y]))/(len(x)*sdx*sdy))**0.5
    return res
I noticed the result was super small, so I checked out the sum of (xi-mx):
sum([(xi-mx) for xi in x])
and the result was -9.769962616701378e-15. Here are the values in the list:
print([(xi-mx) for xi in x])
[3.2699999999999987, 3.0699999999999994, 1.2699999999999987, 1.0699999999999985, 0.9699999999999989, 0.2699999999999987, -0.7300000000000013, -1.7300000000000013, -2.7300000000000013, -4.730000000000001]
Can anyone explain why Python is behaving so strangely with this?
res = (sum([(xi-mx)*(yi-my) for xi in x for yi in y]))/(len(x)*sdx*sdy)
That isn't doing what you think it does. When calculating the numerator of Pearson's correlation coefficient, (xi - mx) * (yi - my) should be paired sequentially.
Using zip should fix it.
res = (sum([(xi-mx)*(yi-my) for xi, yi in zip(x, y)]))/(len(x)*sdx*sdy)
This is what I'm getting:
def r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sdx = (sum([(xi-mx)**2 for xi in x]) / len(x))**0.5
    sdy = (sum([(yi-my)**2 for yi in y]) / len(y))**0.5
    res = (sum([(xi-mx)*(yi-my) for xi, yi in zip(x, y)]))/(len(x)*sdx*sdy)
    return res
r(x, y) # 0.6124721937208479
What does for xi in x for yi in y really do?
>>> x, y = [1, 2, 3], [4, 5, 6]
>>> [(xi, yi) for xi in x for yi in y]
[(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6)]
So there's repetition going on: the nested loops generate every combination of an x with a y (the Cartesian product). You can use zip to pair the values up by position instead:
>>> [*zip(x, y)]
[(1, 4), (2, 5), (3, 6)]
The sum of the numbers you showed is in fact close to 0. Why is that strange? In fact, it must be close to 0. Regardless of the values in x to begin with, mathematically
sum(xi - mean(x) for xi in x) =
sum(xi for xi in x) - sum(mean(x) for xi in x) =
len(x) * mean(x) - len(x) * mean(x) =
0
That the numeric result isn't exactly 0 is simply due to floating-point rounding errors.
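This also explains why the nested-loop version came out so small: summing over every (xi, yi) combination factorizes into the product of the two deviation sums, each of which is (numerically almost) zero. A small sketch with made-up data (the x and y values are illustrative, not the ones from the question):

```python
# Illustrative data only; not the values from the question.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 8.0]
mx, my = sum(x) / len(x), sum(y) / len(y)

# Sum of deviations from the mean: mathematically 0, numerically ~0.
dev_sum_x = sum(xi - mx for xi in x)
dev_sum_y = sum(yi - my for yi in y)

# Nested loops visit every (xi, yi) combination, so the total equals
# dev_sum_x * dev_sum_y, which is (almost) zero.
nested = sum((xi - mx) * (yi - my) for xi in x for yi in y)

# zip pairs the i-th x with the i-th y, which is what the formula needs.
paired = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

print(dev_sum_x, dev_sum_y, nested, paired)
```

When comparing such sums against 0 in real code, a tolerance (for example math.isclose with abs_tol) is usually safer than an exact equality test.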
Trying to create a list of tuples like so:
m=[-1,0,1]
[(self._x+x,self._y) for x in m for y in m]
But I want to exclude the tuple when both x and y are equal to 0. I've tried:
[(self._x+x,self._y) for x in m for y in m if x!=0 and y!=0]
but it doesn't work.
Thanks.
m = [-1, 0, 1]
# so you want all combinations x and y out of m
# but only if x and y are not both 0 at the same time?
[ (x, y) for x in m for y in m if not (x == 0 and y == 0) ]
## => [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
One problem in your code is self._x and self._y: self only exists inside the methods of a class.
self._x + x is the loop variable x added to whatever self._x holds, and self._y is not y.
What do you actually want to achieve? Your code never uses the y from the second for loop.
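If the goal is the eight cells surrounding a position, a sketch might look like the following. It assumes the offsets are meant to be added to both coordinates; the function name and the cx, cy parameters are placeholders standing in for self._x and self._y. Note that the condition x != 0 and y != 0 rejects every tuple in which either offset is 0, which is why the original filter dropped too many tuples.

```python
def surrounding(cx, cy):
    """Return the 8 cells around (cx, cy), excluding (cx, cy) itself."""
    offsets = [-1, 0, 1]
    return [(cx + dx, cy + dy)
            for dx in offsets
            for dy in offsets
            if (dx, dy) != (0, 0)]  # skip only the "both zero" case

print(surrounding(5, 5))
```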
I have a matrix of variable size in which 1 marks the cell of interest, for example:
Cells = [[0,0,0,0,0],
         [0,0,0,0,0],
         [0,0,1,0,0],
         [0,0,0,0,0],
         [0,0,0,0,0],
        ]
I need to find its neighbours in a diamond shape of parametric size: not a box, and not a fixed diamond of size 1, as in the answers given to other questions. For example, for N=2 I want the columns and rows of the cells marked below:
Mask = [[0,0,1,0,0],
        [0,1,1,1,0],
        [1,1,0,1,1],
        [0,1,1,1,0],
        [0,0,1,0,0],
       ]
The function should receive x and y for the requested column and row (for the example above I would pass 2, 2) and N (here 2), the size of the diamond. It should return a list of (x, y) tuples for the given diamond size.
I struggled to define the shape as a function of x, y and k in for loops. I would like both a NumPy solution (if NumPy offers anything that helps) and a non-NumPy one.
For an iterative approach where you just construct the diamond:
def get_neighbors(center, n=1):
    ret = []
    for dx in range(-n, n + 1):
        ydiff = n - abs(dx)
        for dy in range(-ydiff, ydiff + 1):
            ret.append((center[0] + dx, center[1] + dy))
    return ret
Result of get_neighbors((2, 2), 2):
[(0, 2), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (4, 2)]
Or, for a recursive approach:
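dirs = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def add_tuples(a, b):
    return tuple(x + y for x, y in zip(a, b))

def get_neighbors(center, n=1, seen=None):
    # Create a fresh set per top-level call; a mutable default (seen=set())
    # would be shared between calls.
    if seen is None:
        seen = set()
    seen.add(center)
    if n <= 0:
        return seen
    for d in dirs:
        # Always recurse: skipping already-seen cells can miss points that
        # were first reached with fewer steps remaining (this shows up for n >= 3).
        get_neighbors(add_tuples(center, d), n - 1, seen)
    return seen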
I would start by taking out a "sub-matrix": the smallest square that can contain your result cells. This is the part that numpy should be able to help with.
Then define a function that calculates the Manhattan distance between two cells (abs(x - x_p) + abs(y - y_p)), iterate through the cells of your sub-matrix, and return those whose Manhattan distance from your origin is at most N.
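As a rough, NumPy-free sketch of the Manhattan-distance idea (the function name and the width/height parameters are assumptions made for illustration): cells at distance exactly n are included, and the centre itself is excluded, so that the result matches the N=2 mask in the question.

```python
def diamond_cells(x, y, n, width, height):
    """Cells within Manhattan distance n of (x, y), excluding (x, y) itself."""
    cells = []
    # Only scan the bounding square around (x, y), clipped to the grid.
    for cx in range(max(0, x - n), min(width, x + n + 1)):
        for cy in range(max(0, y - n), min(height, y + n + 1)):
            dist = abs(cx - x) + abs(cy - y)
            if 0 < dist <= n:
                cells.append((cx, cy))
    return cells

print(diamond_cells(2, 2, 2, 5, 5))
```

Clipping the scan to the grid means cells near an edge simply get a truncated diamond rather than out-of-range coordinates.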
1. Make the mask by rotation.
2. Convolve the cells with the mask.
3. Fix up the result.
import numpy as np
from scipy.ndimage import rotate, convolve
import matplotlib.pyplot as plt

def diamond_filter(radius):
    s = radius * 2 + 1
    x = np.ones((s, s), dtype=int)
    x[radius, radius] = 0
    return rotate(x, angle=45)

def make_diamonds(x, radius):
    filter = diamond_filter(radius)
    out = convolve(x, filter)
    out[out > 1] = 1
    out -= x
    out[out < 0] = 0
    return out

def plot(x):
    plt.imshow(x)
    plt.show()
    plt.close()

def main():
    cell = np.random.choice([0, 1], size=(200, 200), p=[0.95, 0.05])
    plot(diamond_filter(2))
    plot(cell)
    result = make_diamonds(cell, 2)
    plot(result)

if __name__ == '__main__':
    main()
I need to find all the lattice points inside and on a polygon.
Input:
from shapely.geometry import Polygon, mapping
sh_polygon = Polygon(((0,0), (2,0), (2,2), (0,2)))
Output:
(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)
Please suggest if there is a way to get the expected result with or without using Shapely.
I have written this piece of code that gives points inside the polygon, but it doesn't give points on it. Also is there a better way to do the same thing:
from shapely.geometry import Polygon, Point

def get_random_point_in_polygon(poly):
    (minx, miny, maxx, maxy) = poly.bounds
    minx = int(minx)
    miny = int(miny)
    maxx = int(maxx)
    maxy = int(maxy)
    print("poly.bounds:", poly.bounds)
    a = []
    for x in range(minx, maxx+1):
        for y in range(miny, maxy+1):
            p = Point(x, y)
            if poly.contains(p):
                a.append([x, y])
    return a

p = Polygon([(0,0), (2,0), (2,2), (0,2)])
point_in_poly = get_random_point_in_polygon(p)
print(len(point_in_poly))
print(point_in_poly)
Output:
poly.bounds: (0.0, 0.0, 2.0, 2.0)
1
[[1, 1]]
I have simplified my problem. Actually, I need to find all points inside and on a square with corners: (77,97), (141,101), (136,165), (73,160).
I would approach the problem as follows.
First, define a grid of lattice points. One could use, for example, itertools.product:
from itertools import product
from shapely.geometry import MultiPoint
points = MultiPoint(list(product(range(5), repeat=2)))  # square 5 x 5 grid
# or, for a rectangular grid:
points = MultiPoint(list(product(range(10), range(5))))
Alternatively, use any NumPy approach that turns the Cartesian product of x and y arrays into a single array of 2D points:
import numpy as np
x = np.linspace(0, 1, 5)
y = np.linspace(0, 1, 10)
points = MultiPoint(np.transpose([np.tile(x, len(y)), np.repeat(y, len(x))]))
Then, using Shapely's intersection method, we can get the lattice points that lie inside or on the boundary of the given polygon.
For your given example:
p = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
xmin, ymin, xmax, ymax = p.bounds
x = np.arange(np.floor(xmin), np.ceil(xmax) + 1) # array([0., 1., 2.])
y = np.arange(np.floor(ymin), np.ceil(ymax) + 1) # array([0., 1., 2.])
points = MultiPoint(np.transpose([np.tile(x, len(y)), np.repeat(y, len(x))]))
result = points.intersection(p)
And for a slightly more sophisticated example:
p = Polygon([(-4.85571368308564, 37.1753007358263),
(-4.85520937147867, 37.174925051829),
(-4.85259349198842, 37.1783463712614),
(-4.85258684662671, 37.1799609243756),
(-4.85347524651836, 37.1804461589773),
(-4.85343407576431, 37.182006629169),
(-4.85516283166052, 37.1842384372115),
(-4.85624511894443, 37.1837967179202),
(-4.85533824429553, 37.1783762575331),
(-4.85674599573635, 37.177038261295),
(-4.85571368308564, 37.1753007358263)])
xmin, ymin, xmax, ymax = p.bounds # -4.85674599573635, 37.174925051829, -4.85258684662671, 37.1842384372115
n = 1e3
x = np.arange(np.floor(xmin * n) / n, np.ceil(xmax * n) / n, 1 / n) # array([-4.857, -4.856, -4.855, -4.854, -4.853])
y = np.arange(np.floor(ymin * n) / n, np.ceil(ymax * n) / n, 1 / n) # array([37.174, 37.175, 37.176, 37.177, 37.178, 37.179, 37.18 , 37.181, 37.182, 37.183, 37.184, 37.185])
points = MultiPoint(np.transpose([np.tile(x, len(y)), np.repeat(y, len(x))]))
result = points.intersection(p)
Is there not a function that will find lattice points that lie on a line? Those are the only ones you're missing. They are simply solutions to the line segment's defining equation. If not, it's easy enough to write the algorithm yourself, finding the points by brute force.
Do the following for each edge (p1, p2) of the polygon.
x1, y1 = p1
x2, y2 = p2
xdiff = x2 - x1
ydiff = y2 - y1
# Find the line's equation, y = mx + b
m = ydiff / xdiff
b = y1 - m*x1
for xval in range(x1+1, x2):
    yval = m * xval + b
    if int(yval) == yval:
        # add (xval, yval) to your list of points (here called points)
        points.append((xval, int(yval)))
I've left details up to you: make sure that x1 < x2 (or adapt otherwise), handle a vertical segment, etc. This isn't particularly elegant, but it's fast, easy to implement, and easy to debug.
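As an alternative to the slope-based loop, here is a sketch that stays in integer arithmetic and so avoids both the vertical-segment special case and floating-point comparisons. It swaps in a different technique (stepping by the greatest common divisor of the coordinate differences) and assumes the endpoints have integer coordinates; the function name is made up for illustration.

```python
from math import gcd

def lattice_points_on_segment(p1, p2):
    """Integer points on the segment p1-p2, endpoints included.

    Assumes both endpoints have integer coordinates.
    """
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    g = gcd(abs(dx), abs(dy))
    if g == 0:  # p1 == p2: the segment is a single point
        return [(x1, y1)]
    # Smallest integer step that stays on the line.
    step_x, step_y = dx // g, dy // g
    return [(x1 + k * step_x, y1 + k * step_y) for k in range(g + 1)]

print(lattice_points_on_segment((0, 0), (4, 2)))
```

Running this over every edge of the polygon and merging the results with the interior points (for example in a set, since neighbouring edges share a vertex) gives the points "inside and on" the polygon.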
I have a working nearest-neighbors function, but I don't know how to make it work only horizontally and vertically. Right now it works in all directions. Code below:
nnlst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
MAP_WIDTH = 3
MAP_HEIGHT = 3

def nearest_neighbors(map_x, map_y):
    coordinates_list = []
    for x_ in range(max(0, map_x - 1), min(MAP_WIDTH, map_x + 2)):
        for y_ in range(max(0, map_y - 1), min(MAP_HEIGHT, map_y + 2)):
            # skip the cell we were asked about
            if (map_x, map_y) == (x_, y_):
                continue
            coordinates_list.append([x_, y_])
    return coordinates_list

print("function result")
print("nearest neighbors of", nnlst[0][1])
nearest_neighbor_coordinates_list = nearest_neighbors(0, 1)
for coordinates in nearest_neighbor_coordinates_list:
    print(coordinates, "=", nnlst[coordinates[0]][coordinates[1]])
As you can see right now it works in all directions.
You need to add one more condition to prevent inclusion of the diagonal ones:
def nearest_neighbors(map_x, map_y):
    coordinates_list = []
    for x_ in range(max(0, map_x - 1), min(MAP_WIDTH, map_x + 2)):
        for y_ in range(max(0, map_y - 1), min(MAP_HEIGHT, map_y + 2)):
            # skip the cell we were asked about, and also the diagonal
            # neighbors, which differ in both the x and y coordinates
            if (map_x, map_y) == (x_, y_) or (map_x != x_ and map_y != y_):
                continue
            coordinates_list.append([x_, y_])
    return coordinates_list
to get the desired result:
function result
nearest neighbors of 2
[0, 0] = 1
[0, 2] = 3
[1, 1] = 5
Alternatively, you could list all "admissible" displacements explicitly:
for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
    # clamp to the last valid index (MAP_WIDTH - 1, MAP_HEIGHT - 1)
    x_ = min(MAP_WIDTH - 1, max(0, map_x + dx))
    y_ = min(MAP_HEIGHT - 1, max(0, map_y + dy))
    if (map_x, map_y) == (x_, y_):
        continue
...
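Filled out as a complete function, the displacement idea might look like the sketch below. It drops out-of-range candidates with a bounds check rather than clamping them, which is a small variation on the snippet above; MAP_WIDTH and MAP_HEIGHT are the constants from the question.

```python
MAP_WIDTH = 3
MAP_HEIGHT = 3

def nearest_neighbors(map_x, map_y):
    """Horizontal and vertical neighbors of (map_x, map_y) inside the map."""
    coordinates_list = []
    for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        x_, y_ = map_x + dx, map_y + dy
        # Drop candidates that fall outside the map instead of clamping them.
        if 0 <= x_ < MAP_WIDTH and 0 <= y_ < MAP_HEIGHT:
            coordinates_list.append([x_, y_])
    return coordinates_list

print(nearest_neighbors(0, 1))
```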
For a problem with such a small number of possibilities, I would just spell them all out and pre-compute the function results for every position. That way the function can be eliminated and the problem reduced to a simple table look-up.
Here's what I mean:
nnlist = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
MAP_WIDTH = len(nnlist[0])
MAP_HEIGHT = len(nnlist)

nearest_neighbors = {}  # is now a dictionary
for x in range(MAP_WIDTH):
    for y in range(MAP_HEIGHT):
        neighbors = [[nx, ny] for nx, ny in [(x-1, y), (x+1, y), (x, y-1), (x, y+1)]
                     if -1 < nx < MAP_WIDTH and -1 < ny < MAP_HEIGHT]
        nearest_neighbors[(x, y)] = neighbors

print("look-up result")
print("nearest neighbors of", nnlist[0][1])
nearest_neighbor_coordinates_list = nearest_neighbors[(0, 1)]
for coordinates in nearest_neighbor_coordinates_list:
    print(coordinates, "=", nnlist[coordinates[0]][coordinates[1]])
Suppose we have two polygons p1 and p2, where p2 is completely inside p1:
p1 = [(0, 10), (10, 10), (10, 0), (0, 0)]
p2 = [(2, 6), (6, 6), (6, 2), (2, 2)]
degree_of_contact = 0
xyarrays = [p1,p2]
p1_degree_of_contact = 0
for x,y in xyarrays[0]:
    if point_inside_polygon(x,y,xyarrays[1]):
        p1_degree_of_contact += 1
p2_degree_of_contact = 0
for x,y in xyarrays[1]:
    if point_inside_polygon(x,y,xyarrays[0]):
        p2_degree_of_contact += 1
degree_of_contact = p1_degree_of_contact + p2_degree_of_contact
where point_inside_polygon decides whether a point is inside a polygon (True) or not (False), and poly is a list of (x, y) pairs containing the coordinates of the polygon's vertices. The algorithm is called the "Ray Casting Method".
I would like to combine both loops into one in an elegant way that saves lines of code.
The following should work:
degree_of_contact = 0
for tmp1, tmp2 in [(p1, p2), (p2, p1)]:
    for x,y in tmp1:
        if point_inside_polygon(x, y, tmp2):
            degree_of_contact += 1
Or, as a single expression:
degree_of_contact = sum(point_inside_polygon(x, y, i) for i, j in ((p1, p2), (p2, p1)) for x, y in j)
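The question does not show point_inside_polygon itself, so to try either version you need some implementation of it. The following is only a minimal sketch of even-odd ray casting under that assumption, not the asker's actual function; it does not treat points lying exactly on an edge specially.

```python
def point_inside_polygon(x, y, poly):
    """Even-odd ray casting: True if (x, y) is inside poly.

    poly is a list of (x, y) vertex pairs. Points exactly on an edge are
    not handled specially in this sketch.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

p1 = [(0, 10), (10, 10), (10, 0), (0, 0)]
p2 = [(2, 6), (6, 6), (6, 2), (2, 2)]

# Test each polygon's vertices against the other polygon in one pass.
degree_of_contact = sum(
    point_inside_polygon(x, y, other)
    for pts, other in ((p1, p2), (p2, p1))
    for x, y in pts
)
print(degree_of_contact)
```

Summing the booleans works because True counts as 1 and False as 0.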