For loop on two arrays of points - python

I have two arrays of points: list1 with list1.shape = [N, 3] and list2 with list2.shape = [M, 3], where N and M are the total numbers of points and (x, y, z) are the 3 coordinates in 3D.
Now I want to check whether each point of list1 is within a distance r of each point of list2. A natural way to do this is a for loop:
within = np.zeros((N, M), dtype=bool)
for i in range(N):
    for j in range(M):
        # squared distance between list1[i] and list2[j]
        d2 = ((list1[i, 0] - list2[j, 0])**2 + (list1[i, 1] - list2[j, 1])**2
              + (list1[i, 2] - list2[j, 2])**2)
        # True if list1[i] is within r of list2[j], False otherwise
        within[i, j] = d2 < r**2
But it's horribly slow. Is there a more efficient way?

You can use NumPy's outer operations to calculate the squared distance matrix without the for loops:
import numpy as np
s = np.subtract.outer
d_matrix = s(list1[:,0], list2[:,0])**2
d_matrix += s(list1[:,1], list2[:,1])**2
d_matrix += s(list1[:,2], list2[:,2])**2
Here row i of d_matrix holds the squared distances from point i of list1 to every point of list2. To find out whether point i is close to any point using your criterion:
a = np.zeros_like(list1[:,0])
a[np.any(d_matrix < r**2, axis=1)] = 1
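For comparison, here is a sketch of the same computation written with NumPy broadcasting instead of three outer calls. It assumes list1 and list2 are float arrays of shape (N, 3) and (M, 3); within_r is a hypothetical helper name. It returns the full (N, M) boolean matrix that the question's double loop describes, and .any(axis=1) reduces it to the per-point answer above:

```python
import numpy as np

def within_r(list1, list2, r):
    """Return an (N, M) boolean matrix: True where list1[i] is within r of list2[j]."""
    # Broadcast (N, 1, 3) - (1, M, 3) -> (N, M, 3) coordinate differences
    diff = list1[:, np.newaxis, :] - list2[np.newaxis, :, :]
    # Squared Euclidean distance for every (i, j) pair, shape (N, M)
    d2 = np.sum(diff**2, axis=-1)
    # Compare squared distances so no square root is needed
    return d2 < r**2

list1 = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
list2 = np.array([[0.5, 0.0, 0.0], [10.0, 0.0, 0.0], [5.0, 5.0, 4.5]])
close = within_r(list1, list2, r=1.0)
print(close)
print(close.any(axis=1))  # per point: is list1[i] close to any point of list2?
```

Note that the (N, M, 3) intermediate array grows with N * M, so for very large inputs a spatial index such as scipy.spatial.cKDTree (with its query_ball_point method) may be a better fit.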

Related

Problem implementing Merge Sort from pseudo code python

I'm trying to implement merge sort in Python based on the following pseudocode. I know there are many implementations out there, but I have not been able to find one that follows this pattern, with a for loop at the end as opposed to while loop(s). Also, setting the last values in the subarrays to infinity is something I haven't seen in other implementations. NOTE: The following pseudocode uses 1-based indexing, i.e. indices start at 1. So I think my biggest issue is getting the indexing right. Right now it's just not sorting properly, and it's really hard to follow with the debugger. My implementation is at the bottom.
Current Output:
Input: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
Merge Sort: [0, 0, 0, 3, 0, 5, 5, 5, 8, 0]
def merge_sort(arr, p, r):
    if p < r:
        q = (p + (r - 1)) // 2
        merge_sort(arr, p, q)
        merge_sort(arr, q + 1, r)
        merge(arr, p, q, r)
def merge(A, p, q, r):
    n1 = q - p + 1
    n2 = r - q
    L = [0] * (n1 + 1)
    R = [0] * (n2 + 1)
    for i in range(0, n1):
        L[i] = A[p + i]
    for j in range(0, n2):
        R[j] = A[q + 1 + j]
    L[n1] = 10000000 #dont know how to do infinity for integers
    R[n2] = 10000000 #dont know how to do infinity for integers
    i = 0
    j = 0
    for k in range(p, r):
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
    return A
First of all, you need to determine whether the interval represented by p and r is open or closed at its endpoints. The pseudocode (whose for loops include the last index) establishes that the interval is closed at both endpoints: [p, r].
With that in mind, note that for k in range(p, r): never processes the last index, so the correct line is for k in range(p, r + 1):.
You can represent "infinity" in your problem by using the maximum element of A in the range [p, r] plus one. That will do the job.
You do not need to return the array A, because all changes are made in place through its reference.
Also, q = (p + (r - 1)) // 2 isn't wrong (because p < r), but the usual formula is q = (p + r) // 2, since you want the integer midpoint of the two endpoints.
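Applying the four fixes above to the question's code gives something like the following sketch: the closed interval [p, r] is used throughout, max + 1 serves as the sentinel, the merge loop runs to r + 1, and nothing is returned. It is one possible reading of the advice above, not the only correct version:

```python
def merge_sort(arr, p, r):
    """Sort arr[p..r] in place; p and r are both inclusive (closed interval)."""
    if p < r:
        q = (p + r) // 2            # integer midpoint of [p, r]
        merge_sort(arr, p, q)
        merge_sort(arr, q + 1, r)
        merge(arr, p, q, r)

def merge(A, p, q, r):
    """Merge the sorted runs A[p..q] and A[q+1..r] (both inclusive) in place."""
    sentinel = max(A[p:r + 1]) + 1  # larger than every element in [p, r]
    L = A[p:q + 1] + [sentinel]
    R = A[q + 1:r + 1] + [sentinel]
    i = j = 0
    for k in range(p, r + 1):       # r + 1 so that index r is included
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1

data = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
merge_sort(data, 0, len(data) - 1)
print(data)
```

The top-level call passes len(data) - 1 as r because, under the closed-interval convention, r is the last valid index rather than one past it.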
Here is a rewrite of the algorithm with “modern” conventions, which are the following:
Indices are 0-based
The end of a range is not part of that range; in other words, intervals are closed on the left and open on the right.
This is the resulting code:
INF = float('inf')
def merge_sort(A, p=0, r=None):
    if r is None:
        r = len(A)
    if r - p > 1:
        q = (p + r) // 2
        merge_sort(A, p, q)
        merge_sort(A, q, r)
        merge(A, p, q, r)
def merge(A, p, q, r):
    L = A[p:q]; L.append(INF)
    R = A[q:r]; R.append(INF)
    i = 0
    j = 0
    for k in range(p, r):
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
A = [433, 17, 585, 699, 942, 483, 235, 736, 629, 609]
merge_sort(A)
print(A)
# → [17, 235, 433, 483, 585, 609, 629, 699, 736, 942]
Notes:
Python's slice syntax (A[p:q]) makes copying a subrange easy.
There is no int infinity in Python, but we can use the float one, because ints and floats can always be compared.
There is one difference between this algorithm and the original one, but it is irrelevant. Since the “midpoint” q does not belong to the left range, L is shorter than R when the sum of their lengths is odd. In the original algorithm, q belongs to L, so L is the longer of the two in this case. This does not change the correctness of the algorithm, since it simply swaps the roles of L and R. If for some reason you need to avoid this difference, calculate q like this:
q = (p + r + 1) // 2
In mathematics, we represent all real numbers which are greater than or equal to i and smaller than j by [i, j). Notice the use of [ and ) brackets here. I have used i and j in the same way in my code to represent the region that I am dealing with currently.
The region [i, j) of an array covers all indexes (integer values) of this array that are greater than or equal to i and smaller than j. i and j are 0-based indexes. Ignore first_array and second_array for the time being.
Note that i and j define the region of the array that I am currently dealing with.
Examples to help understand this:
If your region spans the whole array, then i should be 0 and j should be the length of the array: [0, length).
The region [i, i + 1) has only index i in it.
The region [i, i + 2) has index i and i + 1 in it.
def mergeSort(first_array, second_array, i, j):
    if j > i + 1:
        mid = (i + j + 1) // 2
        mergeSort(second_array, first_array, i, mid)
        mergeSort(second_array, first_array, mid, j)
        merge(first_array, second_array, i, mid, j)
I calculated the midpoint as mid = (i + j + 1) // 2, but mid = (i + j) // 2 works as well. I use this mid value to divide the region of the array that I am currently dealing with into 2 smaller regions.
In line 4 of the code, mergeSort is called on the region [i, mid), and in line 5, mergeSort is called on the region [mid, j).
You can access the whole code here.
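The answer's merge function is not shown above, so the sketch below pairs that mergeSort with a hypothetical merge(dest, src, i, mid, j) that merges the sorted runs src[i:mid] and src[mid:j] into dest[i:j]. This is the "ping-pong" style of top-down merge sort: each level of recursion sorts into the other array, so both arrays must start out as copies of the input:

```python
def merge(dest, src, i, mid, j):
    """Hypothetical merge: combine sorted src[i:mid] and src[mid:j] into dest[i:j]."""
    left, right = i, mid
    for k in range(i, j):
        # Take from the left run while it has items and its head is not larger
        if left < mid and (right >= j or src[left] <= src[right]):
            dest[k] = src[left]
            left += 1
        else:
            dest[k] = src[right]
            right += 1

def mergeSort(first_array, second_array, i, j):
    """Sort the region [i, j) so that the result ends up in first_array."""
    if j > i + 1:
        mid = (i + j + 1) // 2
        # Sort both halves into second_array (the roles swap one level down)
        mergeSort(second_array, first_array, i, mid)
        mergeSort(second_array, first_array, mid, j)
        # Merge the sorted halves from second_array back into first_array
        merge(first_array, second_array, i, mid, j)

data = [5, 2, 9, 1, 5, 6, 0]
work = data[:]  # both arrays must start as copies of the input
mergeSort(data, work, 0, len(data))
print(data)
```

Alternating the roles of the two arrays means no per-call copy of the subarrays is needed; the cost is one extra array of the same length allocated up front.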

Minimum swaps to sort a string of X,Y and Z chars

I need to find the minimum number of swaps required to sort a string that contains only the letters X, Y and Z, in random order and quantity. Any two characters can be swapped, not only adjacent ones.
For example the string ZYXZYX will be sorted in 3 swaps: ZYXZYX -> XYXZYZ -> XXYZYZ -> XXYYZZ
ZZXXYY - in 4 swaps, XXXX - in 0 swaps.
So far I have this solution, but it does not sort the characters in the optimal way, so the result is not always the minimum number of swaps. Also, the solution should be O(n log n).
def solve(s):
    n = len(s)
    newS = [*enumerate(s)]
    sortedS = sorted(newS, key = lambda item:item[1])
    counter = 0
    vis = {v:False for v in range(n)}
    print(newS)
    print(sortedS)
    for i in range(n):
        if vis[i] or sortedS[i][0] == i:
            continue
        cycle_size = 0
        j = i
        while not vis[j]:
            vis[j] = True
            j = sortedS[j][0]
            cycle_size += 1
        if cycle_size > 0:
            counter += (cycle_size - 1)
    return counter
First perform an O(n) pass through the array and count the X's, Y's, and Z's. Based on the counts, we can define three regions in the array: Rx, Ry, and Rz. Rx represents the range of indexes in the array where the X's should go. Likewise for Ry and Rz.
Then there are exactly 6 permutations that need to be considered:
Rx  Ry  Rz
X   Y   Z   no swaps needed
X   Z   Y   1 swap: YZ
Y   X   Z   1 swap: XY
Y   Z   X   2 swaps: XZ and XY
Z   X   Y   2 swaps: XZ and YZ
Z   Y   X   1 swap: XZ
So all you need is five more O(n) passes to fix each possible permutation. Start with the cases where 1 swap is needed. Then fix the 2 swap cases, if any remain.
For example, the pseudocode for finding and fixing the XZY permutation is:
y = Ry.start
z = Rz.start
while y <= Ry.end && z <= Rz.end
    if array[y] == 'Z' && array[z] == 'Y'
        array[y] <--> array[z]
        swapCount++
    if array[y] != 'Z'
        y++
    if array[z] != 'Y'
        z++
The running time for each permutation is O(n), and the overall running time is O(n).
Formal proof of correctness is left as an exercise for the reader. I'll only note that cases XZY, YXZ, and ZYX fix two elements at a cost of one swap (efficiency 2), whereas cases YZX and ZXY fix three elements at a cost of two swaps (efficiency 1.5). So finding and fixing the efficient cases first (and performing inefficient cases only as needed) should give the optimal answer.
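The region argument can also be turned into a count without performing the swaps. The sketch below (function and variable names are hypothetical) tallies, for each region, how many of its positions hold each wrong letter; it first pairs off mutual misplacements (one swap fixes two letters), then charges two swaps for every remaining group of three, following the efficiency ordering described above:

```python
from collections import Counter

def min_swaps_xyz(s):
    """Minimum number of arbitrary swaps needed to sort a string of X, Y and Z."""
    letters = "XYZ"
    counts = Counter(s)
    # Region boundaries [start, end) where each letter belongs once sorted
    bounds, start = {}, 0
    for ch in letters:
        bounds[ch] = (start, start + counts[ch])
        start += counts[ch]
    # wrong[a][b] = number of positions in region a that currently hold letter b
    wrong = {a: Counter() for a in letters}
    for a in letters:
        lo, hi = bounds[a]
        for ch in s[lo:hi]:
            if ch != a:
                wrong[a][ch] += 1
    swaps = 0
    # Efficient cases first: one swap fixes two misplaced letters
    for a, b in (("X", "Y"), ("X", "Z"), ("Y", "Z")):
        k = min(wrong[a][b], wrong[b][a])
        swaps += k
        wrong[a][b] -= k
        wrong[b][a] -= k
    # Whatever is left forms 3-cycles: two swaps fix three letters
    leftover = sum(sum(c.values()) for c in wrong.values())
    swaps += 2 * (leftover // 3)
    return swaps

print(min_swaps_xyz("ZYXZYX"), min_swaps_xyz("ZZXXYY"), min_swaps_xyz("XXXX"))
```

Both passes are linear in the length of the string, so this stays within the O(n) bound stated above.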
This can be solved using breadth-first search, for example with Raymond Hettinger's puzzle solver (the full code of the solver is at the bottom of the page):
class SwapXYZ(Puzzle):
    def __init__(self, pos):
        self.pos = list(pos)
        self.goal = sorted(pos)
    def __repr__(self):
        return repr(''.join(self.pos))
    def isgoal(self):
        return self.pos == self.goal
    def __iter__(self):
        for i in range(len(self.pos)):
            for j in range(i+1, len(self.pos)):
                move = self.pos[:]
                move[i], move[j] = move[j], move[i]  # swap positions i and j
                yield SwapXYZ(''.join(move))
SwapXYZ("ZYXZYX").solve()
# ['ZYXZYX', 'XYXZYZ', 'XXYZYZ', 'XXYYZZ']
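The Puzzle base class is not reproduced here, so as a self-contained alternative, the same breadth-first search can be sketched with collections.deque (bfs_swaps is a hypothetical name). Since it explores every string reachable by swaps, it is only practical for short inputs, but it is handy for checking a faster method on small cases:

```python
from collections import deque

def bfs_swaps(start):
    """Shortest list of states from start to its sorted form, one swap per step."""
    goal = "".join(sorted(start))
    parent = {start: None}          # visited set plus back-pointers for the path
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            # Follow back-pointers to rebuild the path from start to goal
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        chars = list(cur)
        for i in range(len(chars)):
            for j in range(i + 1, len(chars)):
                chars[i], chars[j] = chars[j], chars[i]
                nxt = "".join(chars)
                chars[i], chars[j] = chars[j], chars[i]  # undo the swap
                if nxt not in parent:
                    parent[nxt] = cur
                    queue.append(nxt)
    return None  # not reached: every string can be sorted by swaps

path = bfs_swaps("ZYXZYX")
print(len(path) - 1, path)  # number of swaps, then the states visited
```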
IMVHO, the number of swaps needed to sort such a string is zero.
We get the counts of X, Y and Z (nx, ny, nz).
Then we fill the first nx elements with 'X', the next ny elements with 'Y' and the rest with 'Z'. The complexity is O(n).
def sortxyz(a):
    nx = ny = nz = 0
    for i in range(len(a)):
        if a[i] == 'X':
            nx += 1
        elif a[i] == 'Y':
            ny += 1
        else:
            nz += 1
    return ''.join(['X'] * nx + ['Y'] * ny + ['Z'] * nz)
print(sortxyz('YXXZXZYYX'))
# → XXXXYYYZZ
For the more general case, where the elements of the list can take m distinct values, the complexity will be O(m * n).

How to probabilistically populate a list in python?

I want to use a basic for loop to populate a list of values in Python, but I would like the values to be calculated probabilistically, such that p% of the time the values are calculated with (toy) equation 1 and 100-p% of the time with equation 2.
Here's what I've got so far:
import numpy as np
# generate list of random probabilities
p_list = np.random.uniform(low=0.0, high=1.0, size=(500,))
my_list = []
# loop through but where to put 'p'? append() should probably only appear once
for p in p_list:
    calc1 = x*y  # equation 1
    calc2 = (x-y)  # equation 2
    my_list.append(calc1)
    my_list.append(calc2)
You've already generated a list of probabilities, p_list, that correspond to each value in my_list you want to generate. The Pythonic way to do so is via a conditional expression (Python's ternary operator) inside a list comprehension:
from random import random
my_list = [(x*y if random() < p else x-y) for p in p_list]
If we were to expand this into a proper for loop:
my_list = []
for p in p_list:
    if random() < p:
        my_list.append(x*y)
    else:
        my_list.append(x-y)
If we wanted to be even more pythonic, regarding calc1 and calc2, we could make them into lambdas:
calc1 = lambda x,y: x*y
calc2 = lambda x,y: x-y
...
my_list = [calc1(x,y) if random() < p else calc2(x,y) for p in p_list]
or, depending on how x and y vary for your function (assuming they're not static), you could even do the comprehension in two steps:
calc_list = [calc1 if random() < p else calc2 for p in p_list]
my_list = [calc(x,y) for calc in calc_list]
I took the approach of making minimal changes to the original code, with easy-to-understand syntax:
import numpy as np
p_list = np.random.uniform(low=0.0, high=1.0, size=(500,))
my_list = []
# uncomment the 2 lines below so that x and y are defined and the code runs
#x = 1
#y = 2
for p in p_list:
    # randoms are uniformly distributed over the half-open interval [low, high)
    # so check if p is in [0, 0.5) for equation 1 or [0.5, 1) for equation 2
    if p < 0.5:
        calc1 = x*y # equation 1
        my_list.append(calc1)
    else:
        calc2 = (x-y) # equation 2
        my_list.append(calc2)
The other answers seem to assume you want to keep the calculated chances around. If all you are after is a list of results for which equation 1 was used p% of the time and equation 2 100-p% of the time, this is all you need:
from random import random, seed
inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# change the seed to see different 'random' outcomes
seed(1)
results = [x * x if random() > 0.5 else 2 * x for x in inputs]
print(results)
If you are OK with using NumPy, the choice method is worth trying:
https://docs.scipy.org/doc/numpy-1.14.1/reference/generated/numpy.random.choice.html
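A sketch along those lines, assuming a single fixed probability p for equation 1 and using NumPy's Generator API (numpy.random.default_rng and its choice method) rather than the legacy function linked above. The name populate is hypothetical, and x*y and x-y are the toy equations from the question:

```python
import numpy as np

def populate(x, y, p, size, seed=None):
    """Return `size` values: x*y with probability p, otherwise x-y."""
    rng = np.random.default_rng(seed)
    # Choose equation 1 (True) with probability p, equation 2 (False) otherwise
    use_eq1 = rng.choice([True, False], size=size, p=[p, 1 - p])
    # Evaluate both equations and pick element-wise according to the mask
    return np.where(use_eq1, x * y, x - y)

values = populate(x=3, y=2, p=0.3, size=10_000, seed=0)
print(np.mean(values == 6))  # fraction produced by equation 1, roughly 0.3
```

This draws all the choices in one vectorized call instead of one random number per loop iteration; if x and y vary per element, they can be passed as arrays of length size.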

Navigating a grid

I stumbled upon a problem at Project Euler, https://projecteuler.net/problem=15. I solved it using combinatorics but was left wondering whether there is a dynamic programming solution to this problem, or to these kinds of problems in general. And what if some squares of the grid are taken off: is it still possible to navigate? I am using Python. How should I do that? Any tips are appreciated. Thanks in advance.
You can do a simple backtracking search over an implicit graph, memoizing the results, like this (the comments explain most of it):
def explore(r, c, n, memo):
    """
    explore right and down from position (r,c)
    report a route once position (n,n) is reached
    memo is a matrix which saves how many routes exist from each position to (n,n)
    """
    if r == n and c == n:
        # one path has been found
        return 1
    elif r > n or c > n:
        # crossing the border, go back
        return 0
    if memo[r][c] is not None:
        return memo[r][c]
    a = explore(r+1, c, n, memo)  # move down
    b = explore(r, c+1, n, memo)  # move right
    # return total paths found from this (r,c) position
    memo[r][c] = a + b
    return a + b
if __name__ == '__main__':
    n = 20
    memo = [[None] * (n+1) for _ in range(n+1)]
    paths = explore(0, 0, n, memo)
    print(paths)
The most straightforward way is Python's built-in memoization utility functools.lru_cache. You can encode missing squares as a frozenset (hashable) of missing grid points (pairs):
from functools import lru_cache
@lru_cache(None)
def paths(m, n, missing=None):
    missing = missing or frozenset()
    if (m, n) in missing:
        return 0
    if (m, n) == (0, 0):
        return 1
    over = paths(m, n-1, missing=missing) if n else 0
    down = paths(m-1, n, missing=missing) if m else 0
    return over + down
>>> paths(2, 2)
6
# middle grid point missing: only two paths
>>> paths(2, 2, frozenset([(1, 1)]))
2
>>> paths(20, 20)
137846528820
There is also a mathematical solution (which is probably what you used):
def factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result
def paths(w, h):
    # integer division keeps the result an exact int
    return factorial(w + h) // (factorial(w) * factorial(h))
This works because the number of paths is the same as the number of ways to choose to go right or down over w + h steps, where you go right w times, which is equal to w + h choose w, or (w + h)! / (w! * h!).
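On Python 3.8 and later, the standard library already provides the binomial coefficient as math.comb, so the formula above can be sketched without a hand-written factorial (lattice_paths is a hypothetical name):

```python
import math

def lattice_paths(w, h):
    """Number of right/down routes through a w-by-h grid: C(w + h, w)."""
    # math.comb (Python 3.8+) returns the binomial coefficient as an exact int
    return math.comb(w + h, w)

print(lattice_paths(2, 2))    # → 6
print(lattice_paths(20, 20))  # → 137846528820
```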
With missing grid squares, I think there is a combinatoric solution, but it's very slow if there are many missing squares, so dynamic programming would probably be better there.
For example, the following should work:
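missing = [
    [0, 1],
    [0, 0],
    [0, 0],
]
def paths_helper(x, y, path_grid, missing):
    # Off the grid: no paths (checked first, since negative indexes would wrap around)
    if x < 0 or y < 0:
        return 0
    if path_grid[x][y] is not None:
        return path_grid[x][y]
    # Removed square: no path can pass through it
    if missing[x][y]:
        path_grid[x][y] = 0
        return 0
    # Start corner: exactly one path
    if x == 0 and y == 0:
        path_grid[x][y] = 1
        return 1
    path_count = (paths_helper(x - 1, y, path_grid, missing) +
                  paths_helper(x, y - 1, path_grid, missing))
    path_grid[x][y] = path_count
    return path_count
def paths(missing):
    h = len(missing)
    w = len(missing[0])
    arr = [[None] * w for _ in range(h)]
    # x indexes rows and y indexes columns; count paths to the bottom-right cell
    return paths_helper(h - 1, w - 1, arr, missing)
print(paths(missing))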
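For comparison, a bottom-up (tabulated) version of the same dynamic programming idea fills a table row by row instead of recursing, which avoids recursion-depth concerns on large grids. This sketch uses the same grid-point convention as the lru_cache answer, with missing points given as a set of (x, y) pairs; count_paths is a hypothetical name:

```python
def count_paths(w, h, missing=frozenset()):
    """Count right/down routes from (0, 0) to (w, h) over grid points,
    skipping every point listed in `missing` (a set of (x, y) pairs)."""
    # ways[y][x] = number of routes from (0, 0) to point (x, y)
    ways = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h + 1):
        for x in range(w + 1):
            if (x, y) in missing:
                continue                # blocked point: stays 0
            if x == 0 and y == 0:
                ways[y][x] = 1          # the start point
                continue
            from_left = ways[y][x - 1] if x > 0 else 0
            from_above = ways[y - 1][x] if y > 0 else 0
            ways[y][x] = from_left + from_above
    return ways[h][w]

print(count_paths(2, 2))            # → 6
print(count_paths(2, 2, {(1, 1)}))  # → 2
print(count_paths(20, 20))          # → 137846528820
```

Each point is visited once, so the running time is O(w * h) regardless of how many points are missing.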

Randomly generating 3-tuples with distinct elements in python

I am trying to generate 3-tuples (x, y, z) in Python such that no two of x, y and z have the same value. Furthermore, the variables x, y and z can be defined over separate ranges (0, p), (0, q) and (0, r). I would like to be able to generate n such tuples. One obvious way is to call random.random() for each variable and check every time whether any two of them are equal. Is there a more efficient way to do this?
You can write a generator that yields desired elements, for example:
import itertools
def product_no_repeats(*args):
    # all combinations, keeping only those whose elements are pairwise distinct
    for p in itertools.product(*args):
        if len(set(p)) == len(p):
            yield p
and apply reservoir sampling to it:
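import random
def reservoir(it, k):
    # fill the reservoir with the first k items
    ls = [next(it) for _ in range(k)]
    # item number i (0-based) replaces a random slot with probability k / (i + 1)
    for i, x in enumerate(it, k):
        j = random.randint(0, i)
        if j < k:
            ls[j] = x
    return ls
xs = range(0, 3)
ys = range(0, 4)
zs = range(0, 5)
size = 4
print(reservoir(product_no_repeats(xs, ys, zs), size))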
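If the ranges are large, enumerating every combination with itertools.product becomes expensive. A sketch of the rejection approach mentioned in the question, using integer draws from random.randrange instead of random.random(), redraws a tuple only when two coordinates collide (distinct_triples is a hypothetical name). Unlike the reservoir version, it samples with replacement, so the same tuple can appear more than once, and it loops forever if no valid tuple exists (for example p = q = r = 1):

```python
import random

def distinct_triples(n, p, q, r, rng=random):
    """Draw n random (x, y, z) with x in range(p), y in range(q), z in range(r),
    rejecting any draw in which two coordinates are equal."""
    out = []
    while len(out) < n:
        t = (rng.randrange(p), rng.randrange(q), rng.randrange(r))
        if len(set(t)) == 3:   # keep only tuples with three distinct values
            out.append(t)
    return out

print(distinct_triples(5, 3, 4, 5, random.Random(42)))
```

When the ranges are wide, collisions are rare and almost every draw is accepted, so the expected cost is close to one draw per tuple.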
