How can I iterate over a function - python

I'm simulating a forest fire, and one of my tasks is to plot the density of trees against the cells that are currently burning or empty. I have the separate parts, but I can't work out how to put them together. Currently, I have my initial conditions
p, f = 0.5, 0.3
nx, ny = 100, 100
X = np.zeros((ny, nx))
adjacent = ((-1,0), (0,-1), (0, 1), (1,0))
E, T, F = 0, 1, 2
xvalues = [0]
yvalues = [0]
My function that generates the next frame (the distribution of fire) is
def iterate(X):
    Xnew = np.zeros((ny, nx))
    for ix in range(1, nx-1):
        for iy in range(1, ny-1):
            if X[iy,ix] == E and np.random.random() <= p:
                Xnew[iy,ix] = T
            if X[iy,ix] == T:
                Xnew[iy,ix] = T
                for dx,dy in adjacent:
                    if X[iy+dy,ix+dx] == F:
                        Xnew[iy,ix] = F
                        break  # a burning neighbour ignites this tree
                else:
                    # no burning neighbour: ignite spontaneously with probability f
                    if np.random.random() <= f:
                        Xnew[iy,ix] = F
    return Xnew
The bit I'm struggling with is how to write the following correctly, using the material above, so that I can go up to Xn where n is about 1000:
X1 = iterate(X)
X2 = iterate(X1)
X3 = iterate(X2)
and so on, and for each iteration calculate
num_empty = (Xn == 0).sum()
num_tree = (Xn == 1).sum()
num_fire = (Xn == 2).sum()
density = num_tree/(num_fire+num_empty)
xvalues.append(i)
yvalues.append(density)
print(density)
Any help would be appreciated!

I think you need to iterate over a range of n integers rather than over the "function" itself.
i = 0
_X = iterate(X)
num_empty = (_X == 0).sum()
num_tree = (_X == 1).sum()
num_fire = (_X == 2).sum()
density = num_tree / (num_fire + num_empty)
print(i, density)
xvalues.append(i)
yvalues.append(density)

n = 1000
for i in range(1, n):
    _X = iterate(_X)
    num_empty = (_X == 0).sum()
    num_tree = (_X == 1).sum()
    num_fire = (_X == 2).sum()
    density = num_tree / (num_fire + num_empty)
    print(i, density)
    xvalues.append(i)
    yvalues.append(density)
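The duplicated block before the loop can also be folded away by seeding with the initial grid; a minimal sketch of the same loop, assuming the definitions above:

_X = X
n = 1000
for i in range(n):
    _X = iterate(_X)
    # count the cell states and record the density for this step
    num_empty = (_X == 0).sum()
    num_tree = (_X == 1).sum()
    num_fire = (_X == 2).sum()
    density = num_tree / (num_fire + num_empty)
    xvalues.append(i)
    yvalues.append(density)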


Check if a set of points describes a triangle

I tried to solve this question, but couldn't find a simple solution that avoids walking over all rows to find which numbers are on the same line.
Is there a simple way to find triangles?
This is my solution for finding a triangle:
How can I change it to be more "pythonic"? (Or is there an even better method for solving it?)
from sympy.solvers import solve
from sympy import Symbol
from collections import Counter

vals = [8, 17, 19]  # the triangle
dicl = []  # list of dicts
for v in vals:
    dic = {}
    dic['val'] = v
    v1 = v
    done = 0
    stepsb = 0
    while done == 0:  # going backward until reaching the big triangle's edge
        x = Symbol('x')
        k = solve((x**2 + x)/2 + 1 - v1, x)
        k = list(filter(lambda x: x > 0, k))
        if k[0] % 1 == 0:
            done = 1
        else:
            v1 -= 1
            stepsb += 1
    dic['line'] = k[0]
    dic['stepsb'] = stepsb  # distance from the left edge
    dic['stepsf'] = (k[0]**2 + 3*k[0] + 2)/2 - v  # distance from the right edge
    dicl.append(dic)
    print(dic)
lines = [l['line'] for l in dicl]
mc = Counter(lines).most_common(1)[0][0]  # finding the numbers on the same line
minv = min([l['val'] for l in dicl if l['line'] == mc])
maxv = max([l['val'] for l in dicl if l['line'] == mc])
stb = [l['stepsb'] for l in dicl if l['val'] == minv][0]
stf = [l['stepsf'] for l in dicl if l['val'] == maxv][0]
for k in dicl:
    if k['stepsb'] == stb and k['stepsf'] == stf:
        print("good")
        break
A first step could be to search for a formula that translates the one-dimensional point number t to an x,y coordinate.
So, search for an n such that n*(n+1)/2 < t:
from sympy import solve, Eq
from sympy.abc import n, t
f = Eq(n * (n + 1), 2 * t)
print(solve(f, n))
This shows the positive root: (sqrt(8*t + 1) - 1)/2.
To keep the inequality strict, a formula that copes with small approximation errors could be:
floor((sqrt(8*t + 1) - 1)/2 - 0.0000001)
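A quick numerical sanity check of that formula, assuming the triangular numbering 1; 2, 3; 4, 5, 6; ... (the epsilon guards against the square root landing exactly on an integer):

import math

def largest_n(t):
    # largest n with n*(n+1)/2 strictly smaller than t
    return math.floor((math.sqrt(8 * t + 1) - 1) / 2 - 0.0000001)

assert largest_n(8) == 3  # 3*4/2 = 6 < 8, while 4*5/2 = 10 is not
assert largest_n(6) == 2  # strict: 3*4/2 = 6 itself is excluded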
The following idea is, given a list of indices:
- convert them to xy coordinates
- find their center (sum and divide by the length of the list)
- find the distance of each xy to the center
- check that all distances are equal
To convert to an xy position, note that the height of an equilateral triangle with base 1 is sqrt(3)/2, so the distances between the y-positions should be multiplied by that factor. The x-positions need to be centered, which can be achieved by subtracting n/2.
import math

def find_xy(t):
    # convert the numerical position into an xy coordinate in the plane
    # first find largest n such that n*(n+1)/2 < t
    n = math.floor((math.sqrt(8 * t + 1) - 1) / 2 - 0.0000001)
    return (n + 1) * math.sqrt(3) / 2, t - n * (n + 1) // 2 - n / 2

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def center(points):
    # find the center of a list of points
    l = len(points)
    x = sum(p[0] for p in points)
    y = sum(p[1] for p in points)
    return x / l, y / l

def is_regular(tri_points):
    points = [find_xy(t) for t in tri_points]
    cent = center(points)
    dists = [sq_dist(cent, p) for p in points]
    return max(dists) - min(dists) < 0.000001
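A quick check of is_regular, assuming the usual numbering that starts with 1 at the apex:

print(is_regular([1, 2, 3]))  # True: the small triangle at the top
print(is_regular([4, 5, 6]))  # False: three collinear points on one row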
Note that this code finds geometric figures for which all the points lie on a circle. This doesn't work for the parallelogram. The actual question also has some extra criteria: all edges should follow the grid lines, and all edges need to be equal in length.
Therefore, it is useful to have 3 coordinates for each point: the row, the column and the diagonal (the 3 directions of the grid).
The length in each direction, is just the maximum minus the minimum for that direction. These lengths are called d_r, d_c and d_d in the code below.
Checking for a valid triangle, the 3 lengths need to be equal. One way to check this, is to check that the minimum of the lengths is equal to the maximum.
For a valid parallelogram, two lengths need to be equal, and the third should be the double. Checking that the maximum length is twice the minimum length should cover this. But, because this can already be reached using 3 points, we should also check that for a given direction, there are exactly 2 points at the minimum and 2 at the maximum. Summing all points and comparing twice the sum of maximum and minimum should accomplish this.
For a valid hexagon, the 3 lengths should be equal, so the same test as for the triangle applies: the minimum of the lengths equal to the maximum. An extra test is needed as well, since 4 points can already fulfil the length conditions; the code below checks that each direction has exactly 3 distinct values.
import math

def find_row_col_diag(t):
    # convert the numerical position into a row,col,diag coordinate in the plane
    # first find largest n such that n*(n+1)/2 < t
    n = math.floor((math.sqrt(8 * t + 1) - 1) / 2 - 0.0000001)
    row, col = n + 1, t - n * (n + 1) // 2
    return row, col, row - col

def check_valid_figure(tri_points):
    points = [find_row_col_diag(t) for t in tri_points]
    rs = [r for (r, c, d) in points]
    cs = [c for (r, c, d) in points]
    ds = [d for (r, c, d) in points]
    sum_r = sum(rs)
    min_r = min(rs)
    max_r = max(rs)
    d_r = max_r - min_r
    sum_c = sum(cs)
    min_c = min(cs)
    max_c = max(cs)
    d_c = max_c - min_c
    sum_d = sum(ds)
    min_d = min(ds)
    max_d = max(ds)
    d_d = max_d - min_d
    if len(points) == 3:
        is_ok = max(d_r, d_c, d_d) == min(d_r, d_c, d_d)
    elif len(points) == 4:
        is_ok = max(d_r, d_c, d_d) == 2 * min(d_r, d_c, d_d) \
                and sum_r == 2 * (min_r + max_r) and sum_c == 2 * (min_c + max_c) and sum_d == 2 * (min_d + max_d)
    elif len(points) == 6:
        is_ok = max(d_r, d_c, d_d) == min(d_r, d_c, d_d) \
                and len(set(rs)) == 3 and len(set(cs)) == 3 and len(set(ds)) == 3
    else:
        is_ok = False
    print(" ".join([str(t) for t in tri_points]), end=" ")
    if is_ok:
        print("are the vertices of a",
              "triangle" if len(points) == 3 else "parallelogram" if len(points) == 4 else "hexagon")
    else:
        print("are not the vertices of an acceptable figure")

tri_point_lists = [[1, 2, 3],
                   [11, 13, 22, 24],
                   [11, 13, 29, 31],
                   [11, 13, 23, 25],
                   [26, 11, 13, 24],
                   [22, 23, 30],
                   [4, 5, 9, 13, 12, 7]]
for lst in tri_point_lists:
    check_valid_figure(lst)
The last code can be further compressed using list comprehensions:
def check_valid_figure_bis(tri_points):
    points = [find_row_col_diag(t) for t in tri_points]
    rs, cs, ds = [[p[i] for p in points] for i in range(3)]
    sums = [sum(xs) for xs in (rs, cs, ds)]
    mins = [min(xs) for xs in (rs, cs, ds)]
    maxs = [max(xs) for xs in (rs, cs, ds)]
    lens = [ma - mi for mi, ma in zip(mins, maxs)]
    if len(points) == 3:
        is_ok = max(lens) == min(lens)
    elif len(points) == 4:
        is_ok = max(lens) == 2 * min(lens) and all([su == 2 * (mi + ma) for su, mi, ma in zip(sums, mins, maxs)])
    elif len(points) == 6:
        is_ok = max(lens) == min(lens) and all([len(set(xs)) == 3 for xs in (rs, cs, ds)])
    else:
        is_ok = False
    return is_ok
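Since the compressed version returns the flag instead of printing, a small usage sketch reusing the test lists from above:

for lst in tri_point_lists:
    print(lst, check_valid_figure_bis(lst))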

Implementing subgradient stochastic descent in Python

I want to implement subgradient and stochastic descent using a cost function, calculate the number of iterations it takes to find a perfect classifier for the data, and also find the weights (w) and bias (b).
The dataset is four-dimensional.
This is my cost function:
I have taken the derivative of the cost function; here it is:
When I run my code I get a lot of errors; can someone please help?
Here is my code in Python:
import numpy as np
learn_rate = 1
w = np.zeros((4,1))
b = 0
M = 1000
data = '/Users/labuew/Desktop/dataset.data'

#calculating the gradient
def cal_grad_w(data, w, b):
    for i in range(M):
        sample = data[i,:]
        Ym = sample[-1]
        Xm = sample[0:4]
        if -Ym[i]*(w*Xm+b) >= 0:
            tmp = 1.0
        else:
            tmp = 0
        value = Ym[i]*Xm*tmp
        sum = sum + value
    return sum

def cal_grad_b(data, w, b):
    for i in range(M):
        sample = data[i,:]
        Ym = sample[-1]
        Xm = sample[0:4]
        if -Ym*(w*Xm+b) >= 0:
            tmp = 1.0
        else:
            tmp = 0
        value = Ym[i]*x*tmp
        sum = sum + value
    return sum

if __name__ == '__main__':
    counter = 0
    while 1:
        counter += 1
        dw = cal_grad_w(data, w, b)
        db = cal_grad_b(data, w, b)
        if dw == 0 and db == 0:
            break
        w = w - learn_rate*dw
        b = b - learn_rate*dw
        print(counter, w, b)
Are you missing the numpy load function?
data = np.load('/Users/labuew/Desktop/dataset.data')
It looks like you're doing the numerics on the string.
Also:
Ym = sample[-1]
Xm = sample[0:4]
Four dimensions implies that Ym = Xm[3]? Is your data rank 2, with the second rank being dimension 5? [0:4] includes the fourth dimension, i.e.
z = [1,2,3,4]
z[0:4] == [1,2,3,4]
This would be my best guess; I'm taking a few educated guesses about your data format.
import numpy as np
learn_rate = 1
w = np.zeros((1,4))
b = 0
M = 1000
#Possible format
#data = np.load('/Users/labuew/Desktop/dataset.data')
#Assumed format
data = np.ones((1000,5))

#calculating the gradient
def cal_grad_w(data, w, b):
    sum = 0
    for i in range(M):
        sample = data[i,:]
        Ym = sample[-1]
        Xm = sample[0:4]
        if -1*Ym*(np.matmul(w, Xm.reshape(4,1)) + b) >= 0:
            tmp = 1.0
        else:
            tmp = 0
        value = Ym*Xm*tmp
        sum = sum + value
    return sum.reshape(1,4)

def cal_grad_b(data, w, b):
    sum = 0
    for i in range(M):
        sample = data[i,:]
        Ym = sample[-1]
        Xm = sample[0:4]
        if -1*Ym*(np.matmul(w, Xm.reshape(4,1)) + b) >= 0:
            tmp = 1.0
        else:
            tmp = 0
        value = Ym*tmp
        sum = sum + value
    return sum

if __name__ == '__main__':
    counter = 0
    while 1:
        counter += 1
        dw = cal_grad_w(data, w, b)
        db = cal_grad_b(data, w, b)
        if dw.all() == 0 and db == 0:
            break
        w = w - learn_rate*dw
        b = b - learn_rate*db
        print([counter, w, b])
I put in dummy data because I don't know the format.
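One caveat worth adding: np.load only reads NumPy's binary .npy/.npz formats, so if dataset.data is a plain text file, np.loadtxt is the more likely fit. A hedged sketch (the delimiter is a guess about the file format):

# whitespace-separated text file assumed; pass delimiter=',' for CSV
data = np.loadtxt('/Users/labuew/Desktop/dataset.data')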

Value Error, Shapes do Not Align Python

Yeah, so this is my code for multiclass logistic regression, but when I run it, it raises a "ValueError: shapes not aligned" error.
import numpy
import matplotlib.pyplot as plt
import math as mt
#normalized and feature scaled
Just loading the data set
def load():
    data = numpy.loadtxt(open("housing.data.txt", "rb"), dtype="float")
    m, n = data.shape
    first_col = numpy.ones((m, 1))
    #create new array using new parameters
    data = numpy.hstack((first_col, data))
    #divide each X with the max in the column
    #subtract the mean of X from each element
    for l in range(1, n):
        max = 0.0
        sum = 0.0
        for j in range(0, m):
            if max < data[j, l]:
                max = data[j, l]
            sum += data[j, l]
        avg = sum / m
        for j in range(0, m):
            data[j, l] -= avg
            data[j, l] /= max
    return data

def logistic(z):
    z = z[0,0]
    z = z * -1
    return 1.0 / (1.0 + mt.exp(z))

def hyp(theta, x):
    x = numpy.mat(x)
    theta = numpy.mat(theta)
    return logistic(theta * x.T)

#cost and derivative functions: TO REWRITE
#regularize using "-1000/m (hyp(theta, data[x, :-1]))"
def derv(theta, data, j):
    sum = 0.0
    last = data.shape[1] - 1
    m = data.shape[0]
    for x in range(0, m):
        sum += (hyp(theta, data[x, :-1]) - numpy.mat(data[x, last])) + numpy.mat(data[x, j])
    return sum[0,0] / m

#regularize using " + 1000/2m(hyp(theta, data[x, :-1]))"
def cost(theta, data):
    sum = 0.0
    last = data.shape[1] - 1
    m = data.shape[0]
    for x in range(0, m):
        y = data[x, last]
        sum += y * mt.log(hyp(theta, data[x, :-1])) + (1 - y) * mt.log(1 - hyp(theta, data[x, :-1]))
    return -1 * (sum / m)

data = load()
data1 = data[:, [10]]
data2 = data[:, [13]]
d12 = numpy.hstack((data1, data2))
data3 = data[:, [14]]
pdata = numpy.hstack((d12, data3))
print(pdata)
alpha = 0.01
theta = [10,10,10,10]
ntheta = [0,0,0,0]
delta = 50
x = 0
for l in range(0, 1000):
    old_cost = cost(theta, pdata)
    for y in range(0, data.shape[1] - 1):
        ntheta[y] = theta[y] - alpha * derv(theta, data1, y)
    for k in range(0, data.shape[1] - 1):
        theta[k] = ntheta[k]
    new_cost = cost(theta, data1)
    delta = new_cost - old_cost
    print("Cost: " + str(new_cost))
    print("Delta: " + str(delta))
for r in range(0, data.shape[1]):
    if hyp(theta, data1[r, :-1]) >= 0.5:
        print("Predicted: 1 Actual: " + str(data1[r, data1.shape[1] - 1]))
    else:
        print("Predicted: 0 Actual: " + str(data1[r, data1.shape[1] - 1]))
plt.scatter(data1[:, 1], data1[:, 2])
x1 = (-1 * theta[0]) / theta[1]
x2 = (-1 * theta[0]) / theta[1]
x = range(-2, 2)
y = [((-1 * theta[0]) - (theta[1] * z)) for z in x]
plt.plot(x, y)
plt.show()
I'm guessing it can't be plotted like this, but I don't know.
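A hedged debugging sketch, not a fix: printing the operand shapes right before the failing product shows where they disagree. pdata has 3 columns, so data[x, :-1] has 2 entries while theta has 4, and theta * x.T cannot align:

theta_m = numpy.mat([10, 10, 10, 10])  # shape (1, 4)
row = numpy.mat(pdata[0, :-1])         # shape (1, 2)
print(theta_m.shape, row.T.shape)      # (1, 4) times (2, 1): not aligned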

Replacing multiprocessing pool.map with mpi4py

I'm a beginner at MPI, and I'm still going through the documentation. However, there's very little to work from when it comes to mpi4py. I have written code that currently uses the multiprocessing module to run on many cores, but I need to replace this with mpi4py so that I can use more than one node. My code is below, with the multiprocessing module, and also without.
With multiprocessing,
import time
import random

import numpy as np
import multiprocessing

start_time = time.time()
E = 0.1
M = 5
n = 1000
G = 1
c = 1
stretch = [10, 1]

#Point-Distribution Generator Function
def CDF_inv(x, e, m):
    A = 1/(1 + np.log(m/e))
    if x == 1:
        return m
    elif 0 <= x <= A:
        return e * x / A
    elif A < x < 1:
        return e * np.exp((x / A) - 1)

#Elliptical point distribution Generator Function
def get_coor_ellip(dist=CDF_inv, params=[E, M], stretch=stretch):
    R = dist(random.random(), *params)
    theta = random.random() * 2 * np.pi
    return (R * np.cos(theta) * stretch[0], R * np.sin(theta) * stretch[1])

def get_dist_sq(x_array, y_array):
    return x_array**2 + y_array**2

#Function to obtain alpha
def get_alpha(args):
    zeta_list_part, M_list_part, X, Y = args
    alpha_x = 0
    alpha_y = 0
    for key in range(len(M_list_part)):
        z_m_z_x = X - zeta_list_part[key][0]
        z_m_z_y = Y - zeta_list_part[key][1]
        dist_z_m_z = get_dist_sq(z_m_z_x, z_m_z_y)
        alpha_x += M_list_part[key] * z_m_z_x / dist_z_m_z
        alpha_y += M_list_part[key] * z_m_z_y / dist_z_m_z
    return (alpha_x, alpha_y)

#The part of the process containing the loop that needs to be parallelised, where I use pool.map()
if __name__ == '__main__':
    # n processes, scale accordingly
    num_processes = 10
    pool = multiprocessing.Pool(processes=num_processes)
    random_sample = [CDF_inv(x, E, M)
                     for x in [random.random() for e in range(n)]]
    zeta_list = [get_coor_ellip() for e in range(n)]
    x1, y1 = zip(*zeta_list)
    zeta_list = np.column_stack((np.array(x1), np.array(y1)))
    x = np.linspace(-3, 3, 100)
    y = np.linspace(-3, 3, 100)
    X, Y = np.meshgrid(x, y)
    print(len(x)*len(y)*n, 'calculations to be carried out.')
    M_list = np.array([.001 for i in range(n)])
    # split zeta_list, M_list, X, and Y
    zeta_list_split = np.array_split(zeta_list, num_processes, axis=0)
    M_list_split = np.array_split(M_list, num_processes)
    X_list = [X for e in range(num_processes)]
    Y_list = [Y for e in range(num_processes)]
    alpha_list = pool.map(
        get_alpha, zip(zeta_list_split, M_list_split, X_list, Y_list))
    alpha_x = 0
    alpha_y = 0
    for e in alpha_list:
        alpha_x += e[0] * 4 * G / (c**2)
        alpha_y += e[1] * 4 * G / (c**2)
    print("%f seconds" % (time.time() - start_time))
Without multiprocessing,
import numpy as np

E = 0.1
M = 5
G = 1
c = 1
M_list = [.1 for i in range(n)]

#Point-Distribution Generator Function
def CDF_inv(x, e, m):
    A = 1/(1 + np.log(m/e))
    if x == 1:
        return m
    elif 0 <= x <= A:
        return e * x / A
    elif A < x < 1:
        return e * np.exp((x / A) - 1)

n = 1000
random_sample = [CDF_inv(x, E, M)
                 for x in [random.random() for e in range(n)]]
stretch = [5, 2]

#Elliptical point distribution Generator Function
def get_coor_ellip(dist=CDF_inv, params=[E, M], stretch=stretch):
    R = dist(random.random(), *params)
    theta = random.random() * 2 * np.pi
    return (R * np.cos(theta) * stretch[0], R * np.sin(theta) * stretch[1])

#zeta_list is the list of coordinates of a distribution of points
zeta_list = [get_coor_ellip() for e in range(n)]
x1, y1 = zip(*zeta_list)
zeta_list = np.column_stack((np.array(x1), np.array(y1)))

#Creation of a X-Y Grid
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)

def get_dist_sq(x_array, y_array):
    return x_array**2 + y_array**2

#Calculation of alpha, containing the loop that needs to be parallelised.
alpha_x = 0
alpha_y = 0
for key in range(len(M_list)):
    z_m_z_x = X - zeta_list[key][0]
    z_m_z_y = Y - zeta_list[key][1]
    dist_z_m_z = get_dist_sq(z_m_z_x, z_m_z_y)
    alpha_x += M_list[key] * z_m_z_x / dist_z_m_z
    alpha_y += M_list[key] * z_m_z_y / dist_z_m_z
alpha_x *= 4 * G / (c**2)
alpha_y *= 4 * G / (c**2)
Basically, what my code does is first generate a list of points that follow a certain distribution. Then I apply an equation to obtain the quantity 'alpha' using different relations between the distances of the points. The part that requires parallelisation is the single for loop involved in the calculation of alpha. What I want to do is use mpi4py instead of multiprocessing for this, and I am not sure how to get it going.
Transforming the multiprocessing.map version to MPI can be done using scatter / gather. In your case it helps that you already prepare the input list as one chunk per rank. The main difference is that all code gets executed by all ranks, so you must make everything that should be done only by the master rank 0 conditional.
from mpi4py import MPI

if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    if comm.rank == 0:
        random_sample = [CDF_inv(x, E, M)
                         for x in [random.random() for e in range(n)]]
        zeta_list = [get_coor_ellip() for e in range(n)]
        x1, y1 = zip(*zeta_list)
        zeta_list = np.column_stack((np.array(x1), np.array(y1)))
        x = np.linspace(-3, 3, 100)
        y = np.linspace(-3, 3, 100)
        X, Y = np.meshgrid(x, y)
        print(len(x)*len(y)*n, 'calculations to be carried out.')
        M_list = np.array([.001 for i in range(n)])
        # split zeta_list, M_list, X, and Y
        zeta_list_split = np.array_split(zeta_list, comm.size, axis=0)
        M_list_split = np.array_split(M_list, comm.size)
        X_list = [X for e in range(comm.size)]
        Y_list = [Y for e in range(comm.size)]
        work_list = list(zip(zeta_list_split, M_list_split, X_list, Y_list))
    else:
        work_list = None
    my_work = comm.scatter(work_list)
    my_alpha = get_alpha(my_work)
    alpha_list = comm.gather(my_alpha)
    if comm.rank == 0:
        alpha_x = 0
        alpha_y = 0
        for e in alpha_list:
            alpha_x += e[0] * 4 * G / (c**2)
            alpha_y += e[1] * 4 * G / (c**2)
This works fine as long as each processor gets a similar amount of work. If communication becomes an issue, you might want to split up the data generation among processors instead of doing it all on the master rank 0.
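For the latter, a minimal sketch, assuming each rank also builds the X, Y grid locally and that get_alpha is defined on every rank (remainder handling is kept simplistic):

# each rank draws its own share of the n points instead of receiving
# them via scatter from rank 0
my_n = n // comm.size + (comm.rank < n % comm.size)
my_zeta = np.array([get_coor_ellip() for e in range(my_n)])
my_M = np.full(my_n, .001)
my_alpha = get_alpha((my_zeta, my_M, X, Y))
alpha_list = comm.gather(my_alpha)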
Note: Some things about the code are bogus, e.g. alpha_[xy] ends up as np.ndarray. The serial version runs into an error.
For people who are still interested in similar subjects, I highly recommend having a look at the MPIPoolExecutor() class in mpi4py.futures and its documentation.
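As a hedged sketch of that route (MPIPoolExecutor spawns its workers, so the script is typically launched as mpiexec -n 10 python -m mpi4py.futures script.py; get_alpha and the prepared work_list are assumed from above):

from mpi4py.futures import MPIPoolExecutor

if __name__ == '__main__':
    # drop-in analogue of multiprocessing.Pool.map
    with MPIPoolExecutor() as executor:
        alpha_list = list(executor.map(get_alpha, work_list))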

Increasing performance with octant search algorithm

I am working on an octant search to find the n (e.g. 8) points (+) closest to my circular point (o) in each octant. This would mean that my points (+) are reduced to only 64 (8 per octant).
The first thing I did was divide my region into octants, with my point (o) as reference.
data = array containing (x, y, z) for all points (+)
gdata = array containing (x, y) for point (o)
import tkinter as tk
from tkinter import filedialog
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist
from collections import defaultdict

root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
data = pd.read_excel(file_path)
data = np.array(data, dtype=np.float)
nrow, cols = data.shape
file_path1 = filedialog.askopenfilename()
gdata = pd.read_excel(file_path1)
gdata = np.array(gdata, dtype=np.float)

pwangle = np.zeros(nrow)
for j in range(nrow):
    delta_x = gdata[:,0] - data[:,0][j]
    delta_y = gdata[:,1] - data[:,1][j]
    if delta_x != 0:
        pwangle[j] = np.rad2deg(np.arctan(delta_y/delta_x))
    else:
        if delta_y > 0:
            pwangle[j] = 90
        elif delta_y < 0:
            pwangle[j] = 270
    if (delta_x < 0) & (delta_y > 0):
        pwangle[j] = 180 + pwangle[j]
    elif (delta_x < 0) & (delta_y < 0):
        pwangle[j] = 270 - pwangle[j]
    elif (delta_x > 0) & (delta_y < 0):
        pwangle[j] = 360 + pwangle[j]
vecangle = pwangle.ravel()

sortdata = defaultdict(list)
count = -1
get_anglesector = 45
N = 8
d = cdist(data[:,:2], gdata)
P = np.hstack((data, d))
for j in range(0, 360, get_anglesector):
    count += 1
    get_data = []
    for k, dummy_val in enumerate(vecangle):
        if j <= vecangle[k] < j + get_anglesector:
            get_data.append(P[k,::])
    sortdata[count] = np.array(get_data)
After the data have been grouped into octants, I then sort the data in each octant to obtain the 8 points closest to point (o).
for i, j in enumerate(sortdata):
    octantsort = defaultdict(list)
    for i in range(8):
        octantsort[i] = np.array(sortdata[i][sortdata[i][:,3].argsort()[:N]])
Is there an efficient and pythonic way of doing this to increase performance?
This works fine, but when I have more than one 'o' point (e.g. 10000 of them) and have to run the above code for each one, it becomes very time consuming.
The job gets a lot easier if you use arctan2 instead of arctan. Then, vectorizing for speed, we may get something like this:
import numpy as np
from scipy.spatial.distance import cdist

delta = gdata - data[:,:2]
angles = np.arctan2(delta[:,1], delta[:,0])
bins = np.linspace(-np.pi, np.pi, 9)
bins[-1] = np.inf  # handle edge case
octantsort = []
for i in range(8):
    data_i = data[(bins[i] <= angles) & (angles < bins[i+1])]
    dist_order = np.argsort(cdist(data_i, gdata))
    octantsort.append(data_i[dist_order[:N]])
Thank you @user7138814; apart from some slight changes, your code is faster:
N = 8
delta = gdata - data[:,:2]
angles = np.arctan2(delta[:,1], delta[:,0])
bins = np.linspace(-np.pi, np.pi, 9)
bins[-1] = np.inf  # handle edge case
octantsort = []
for i in range(8):
    data_i = data[(bins[i] <= angles) & (angles < bins[i+1])]
    dist_order = np.argsort(cdist(data_i[:,:2], gdata), axis=0)
    [octantsort.append(data_i[dist_order[:N][j]]) for j in range(8)]
final = np.vstack(octantsort)
Time of execution of the previous code (code in the question):
---- 0.021449804306030273 seconds ------
Time of execution of the code in this post:
---- 0.0015172958374023438 seconds ------
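For the many-'o' case mentioned in the question, a hedged sketch is to wrap the vectorized search in a function and call it once per 'o' point (the name many_gdata below is hypothetical, a (k, 2) array of 'o' points):

def octant_neighbours(o_point, data, N=8):
    # nearest N points per octant around a single (x, y) point 'o'
    delta = o_point - data[:, :2]
    angles = np.arctan2(delta[:, 1], delta[:, 0])
    bins = np.linspace(-np.pi, np.pi, 9)
    bins[-1] = np.inf  # handle edge case
    picked = []
    for i in range(8):
        data_i = data[(bins[i] <= angles) & (angles < bins[i + 1])]
        order = np.argsort(cdist(data_i[:, :2], o_point[None, :]), axis=0)
        picked.append(data_i[order[:N, 0]])
    return np.vstack(picked)

# results = [octant_neighbours(o, data) for o in many_gdata]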
