I have to evaluate the following expression, given two quite large matrices A, B and a very complicated function F:
(The expression itself was given as an image and is not reproduced here; the nested-loop code below computes the index set it involves.)
I was wondering whether there is an efficient way to first find the indices i, j that give a non-zero element after the matrix multiplication, so that I can avoid the quite slow for loops.
Current working code
import numpy as np

# Starting with 4 random matrices
A = np.random.randint(0, 2, size=(50, 50))
B = np.random.randint(0, 2, size=(50, 50))
C = np.random.randint(0, 2, size=(50, 50))
D = np.random.randint(0, 2, size=(50, 50))

indices = []
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        if A[i, j] != 0:
            for k in range(B.shape[1]):
                if B[j, k] != 0:
                    for l in range(C.shape[1]):
                        if A[i, j] * B[j, k] * C[k, l] * D[l, i] != 0:
                            indices.append((i, j, k, l))
print(indices)
As you can see, in order to get the indices I need, I have to use nested loops, which means huge computational time.
My guess would be NO: you cannot avoid the for loops entirely. To find all the indices i, j you would need to visit every element anyway, which defeats the purpose of the check. Therefore, you should go ahead and use plain element-wise multiplication and dot products in numpy - that should be quite fast, with the loops handled internally by numpy.
However, if what you want to avoid is explicit Python loops, then the answer is YES: you can push them into numpy, using the following pseudo-code (= hand-waving):
i, j = np.indices((N, M)) # CAREFUL: you may need to swap i<->j or N<->M
fs = F(i, j, z) # array of values of function F
# for a given z over the index grid
R = np.dot(A*fs, B) # summation over j
# return R # if necessary do a summation over i: np.sum(R, axis=...)
If the issue is that computing fs = F(i, j, z) itself is the slow part, then you will want to evaluate F only at the elements of A that are non-zero, identified with loops built into numpy (so they are quite fast):
good = np.nonzero(A)               # hidden double loop (for 2D data)
fs = np.zeros(A.shape)             # float zeros; np.zeros_like(A) would truncate F to A's integer dtype
fs[good] = F(i[good], j[good], z)  # compute F only where A != 0
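For concreteness, here is a minimal runnable sketch of that idea; the function F below is a made-up stand-in for the question's slow function:

import numpy as np

def F(i, j, z):
    # Hypothetical stand-in for the question's complicated function F.
    return np.sin(i * z) + np.cos(j * z)

N = M = 50
z = 0.3
A = np.random.randint(0, 2, size=(N, M))
B = np.random.randint(0, 2, size=(M, N))

i, j = np.indices(A.shape)
good = np.nonzero(A)               # positions where A is non-zero
fs = np.zeros(A.shape)             # float array of F values
fs[good] = F(i[good], j[good], z)  # evaluate F only where A != 0
R = np.dot(A * fs, B)              # contraction over j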
Related
I am trying to write code where I have a list of vectors and I have to find the angle between every vector and all the others. (I am working with MediaPipe's hand landmarks.)
My code so far is this:
vectors = [thumb_cmc_vec, thumb_mcp_vec, thumb_ip_vec, thumb_tip_vec, index_mcp_vec, index_pip_vec,
           index_dip_vec, index_tip_vec, middle_mcp_vec, middle_pip_vec, middle_dip_vec, middle_tip_vec,
           ring_mcp_vec, ring_pip_vec, ring_dip_vec, ring_tip_vec, pinky_mcp_vec, pinky_pip_vec,
           pinky_dip_vec, pinky_tip_vec]

for vector in vectors:
    next_vector = vector + 1
    print(vector)
    for next_vector in vectors:
        print(next_vector)
        M = (np.linalg.norm(vector) * np.linalg.norm(next_vector))
        ES = np.dot(vector, next_vector)
        th = math.acos(ES / M)
        list.append(th)
print(list)
where M is the product of the norms of the current pair of vectors, ES is the scalar product of the vectors, and th is the angle between the vectors.
My problem is that next_vector always starts the inner for loop from the first vector of the list, even though I want it to start from the vector after the outer loop's current one, so that I don't get duplicate results. Also, when both of the loops are on the 3rd vector (thumb_ip_vec) I get this error:
th = math.acos(ES / M)
ValueError: math domain error
Is there any way to solve this? Thank you!
I think you can iterate over the list indices (using range(len(vectors) - 1)) and access the elements through their indices instead of looping over the elements directly:
angles = []
for i in range(len(vectors) - 1):
    # Iterate i from 0 to len(vectors) - 2
    vector = vectors[i]
    for j in range(i + 1, len(vectors)):
        # Iterate j from i + 1 to len(vectors) - 1
        next_vector = vectors[j]
        M = np.linalg.norm(vector) * np.linalg.norm(next_vector)
        ES = np.dot(vector, next_vector)
        th = math.acos(ES / M)
        angles.append(th)
print(angles)
The efficient solution here is to iterate over combinations of vectors:
from itertools import combinations  # at top of file

angles = []
for vector, next_vector in combinations(vectors, 2):
    M = np.linalg.norm(vector) * np.linalg.norm(next_vector)
    ES = np.dot(vector, next_vector)
    th = math.acos(ES / M)
    angles.append(th)
It's significantly faster than looping over indices and indexing, reduces the level of loop nesting, and makes it clearer what you're trying to do (working with every unique pairing of the input).
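For completeness, here is a minimal self-contained sketch of that approach (the vectors are random stand-ins for the landmark vectors). Clamping the cosine also guards against the math domain error from the question, which comes from floating-point round-off pushing ES / M just outside [-1, 1]:

import math
from itertools import combinations
import numpy as np

vectors = [np.random.rand(3) for _ in range(20)]  # stand-ins for the 20 landmark vectors

angles = []
for vector, next_vector in combinations(vectors, 2):
    M = np.linalg.norm(vector) * np.linalg.norm(next_vector)
    ES = np.dot(vector, next_vector)
    cos_th = max(-1.0, min(1.0, ES / M))  # clamp round-off outside [-1, 1]
    angles.append(math.acos(cos_th))
print(angles)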
I'm not sure I understand your question, but consider using ranges instead.
Ranges let you iterate by index rather than by value, which means you can manipulate the index to access neighboring values.
for i in range(len(iterables) - 1):
    initial_value = iterables[i]
    for ii in range(i + 1, len(iterables)):
        next_value = iterables[ii]
        # do the rest of the work with initial_value and next_value
Sort of like the mailman, you can reach someone's neighbor without knowing the neighbor's address.
The structure above generally works, but you will need to tweak it to meet your needs.
In the snippet of Python code below, fun iterates through the array arr and counts the number of identical integers in two array sections for every section pair (the flat array simulates a matrix). This makes n*(n-1)/2*m comparisons in total, giving a time complexity of O(n^2).
Are there programming solutions or ways of reframing this problem that would yield equivalent results but have reduced time complexity?
# n > 500000, 0 < i < n, m = 100
# dim(arr) = n*m, 0 < arr[x] < 4294967311
import ctypes
import multiprocessing as mp

arr = mp.RawArray(ctypes.c_uint, n*m)

def fun(i):
    for j in range(i-1, 0, -1):
        count = 0
        for k in range(0, m):
            count += (arr[i*m+k] == arr[j*m+k])
        if count/m > 0.7:
            return (i, j)
    return ()
arr is a shared-memory array, so it's best kept read-only for simplicity and performance reasons.
arr is implemented as a 1D RawArray from multiprocessing, because that had by far the fastest performance in my tests. Using a numpy 2D array instead, for example, like this:
arr = np.ctypeslib.as_array(mp.RawArray(ctypes.c_uint, n*m)).reshape(n,m)
would provide vectorization capabilities, but increases the total runtime by an order of magnitude: 250 s vs. 30 s for n = 1500, an increase of roughly 733%.
Since you can't change the array characteristics at all, I think you're stuck with O(n^2). numpy would give you some vectorization, but would change the array access for the other processes sharing it. Start with the innermost operation:
for k in range(0, m):
    count += (arr[i][k] == arr[j][k])
Change this to a one-line assignment:
count = sum(arr[i][k] == arr[j][k] for k in range(m))
Now, if this is truly an array rather than a list of lists, use the array package's vectorization to simplify the loops, one at a time:
count = sum(arr[i] == arr[j]) # results in a vector of counts
You can now return the j indices where count[j] / m > 0.7. Note that there's no real need to return i for each one: it's constant within the function, and the calling program already has the value. Your array package likely has a pair of vectorized indexing operations that can return those indices. If you're using numpy, those are easy enough to look up on this site.
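If you do end up with NumPy available, one hedged sketch of that vectorized step (assuming arr2d is the n-by-m view of arr, both hypothetical names here) might look like:

import numpy as np

def fun_vec(i, arr2d, m):
    # Vectorized version of fun's inner loops: compare row i against all rows j < i at once.
    counts = (arr2d[1:i] == arr2d[i]).sum(axis=1)  # matches per row j in [1, i)
    js = np.nonzero(counts > 0.7 * m)[0] + 1       # all qualifying j indices at once
    return (i, js[-1]) if js.size else ()          # largest j, matching the original scan order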
So after fiddling around some more, I was able to cut down the running time greatly with help from NumPy's vectorization and Numba's JIT compiler. Going back to the original code:
arr = mp.RawArray(ctypes.c_uint, n*m)

def fun(i):
    for j in range(i-1, 0, -1):
        count = 0
        for k in range(0, m):
            count += (arr[i*m+k] == arr[j*m+k])
        if count/m > 0.7:
            return (i, j)
    return ()
We can leave out the bottom return statement as well as dismiss the idea of using count entirely, leaving us with:
def fun(i):
    for j in range(i-1, 0, -1):
        if sum(arr[i*m+k] == arr[j*m+k] for k in range(m)) > 0.7*m:
            return (i, j)
Then, we wrap arr in a NumPy view:
np_arr = np.frombuffer(arr, dtype=np.uint32).reshape(n, m)  # c_uint is 32-bit unsigned; shape (n, m) so np_arr[i] is section i
The important thing to note here is that we do not use a NumPy array as a shared memory array to be written from multiple processes, avoiding the overhead pitfall.
Finally, we apply Numba's decorator and rewrite the sum function in vector form so that it works with the new array:
import numba as nb

@nb.njit(fastmath=True, parallel=True)
def fun(i):
    for j in range(i-1, 0, -1):
        if np.sum(np_arr[i] == np_arr[j]) > 0.7*m:
            return (i, j)
This reduced the running time to 7.9s, which is definitely a victory for me.
I am trying to do spatial derivatives and almost managed to get all the loops out of my code, but when I try to sum everything up at the end I have a problem.
I have a set of N~=250k nodes. I have found indices i,j of node pairs with i.size=j.size=~7.5M that are within a certain search distance, originally coming from np.triu_indices(n,1) and passed through a series of boolean masks to wash out nodes not influencing each other. Now I want to sum up the influences on each node from the other nodes.
I currently have this:
def sparseSum(a, i, j, n):
    return np.array([np.sum(a[np.logical_or(i == k, j == k)], axis=0) for k in range(n)])
This is very slow. What I would like is something vectorized. If I had scipy I could do
def sparseSum(a, i, j, n):
    sp = scipy.sparse.csr_matrix((a, (i, j)), shape=(n, n)) + scipy.sparse.csr_matrix((a, (j, i)), shape=(n, n))
    return np.sum(sp, axis=0)
But I'm doing this all within an Abaqus implementation that doesn't include scipy. Is there any way to do this numpy-only?
Approach #1 : Here's an approach making use of matrix-multiplication and broadcasting -
K = np.arange(n)[:,None]
mask = (i == K) | (j == K)
out = np.dot(mask,a)
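A quick toy check of Approach #1 (the values are made up): three pair influences a among n = 4 nodes, with endpoints i and j:

import numpy as np

a = np.array([1.0, 2.0, 3.0])   # influence values
i = np.array([0, 1, 2])         # first endpoint of each pair
j = np.array([1, 2, 3])         # second endpoint of each pair
n = 4

K = np.arange(n)[:, None]
mask = (i == K) | (j == K)      # mask[k, e] is True if pair e touches node k
out = np.dot(mask, a)           # per-node sums: [1. 3. 5. 3.]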
Approach #2 : For cases with a small number of columns, we can use np.bincount for such bin-based summing along each column, like so -
def sparseSum(a, i, j, n):
    if len(a.shape) == 1:
        out = np.bincount(i, a, minlength=n) + np.bincount(j, a, minlength=n)
    else:
        ncols = a.shape[1]
        out = np.empty((n, ncols))
        for k in range(ncols):
            out[:, k] = np.bincount(i, a[:, k], minlength=n) + np.bincount(j, a[:, k], minlength=n)
    return out
Here's not a turn-key solution, but one that adds the columns of a sparse matrix. It essentially computes and utilises the CSC representation:
def sparse_col_sums(i, j, a, N):
    order = np.lexsort((i, j))                # sort by column j (the last key is primary)
    jo, ao = j[order], a[order]
    col_bnds = jo.searchsorted(np.arange(N))  # start of each column's run
    return np.add.reduceat(ao, col_bnds)
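A toy usage (values made up; this assumes every column index 0..N-1 occurs at least once, since np.add.reduceat does not handle empty segments):

import numpy as np

i = np.array([0, 1, 0, 2])
j = np.array([0, 0, 1, 2])
a = np.array([1.0, 2.0, 3.0, 4.0])
print(sparse_col_sums(i, j, a, 3))   # [3. 3. 4.]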
I have the following code
l = len(time)     # time is a 300 element list
ll = len(sample)  # sample has 3 sublists each with 300 elements
w, h = ll, l
Matrix = [[0 for x in range(w)] for y in range(h)]

for n in range(0, l):
    for m in range(0, ll):
        x = sample[m]
        Matrix[m][n] = x
When I run the code to fill the matrix I get an error message saying "list index out of range". I put in a print statement to see where the error happens, and when m=0 and n=3 the matrix indexing goes out of range.
From what I understand, on the fourth line of the code I initialize a 3x300 matrix, so why does it go out of range at 0x3?
You need to change Matrix[m][n] = x to Matrix[n][m] = x.
The indexing of nested lists happens from the outside in. So for your code, you'll probably want:
Matrix[n][m] = x
If you prefer the other order, you can build the matrix differently (swap w and h in the list comprehensions).
Note that if you're going to be doing mathematical operations with this matrix, you may want to be using numpy arrays instead of Python lists. They're almost certainly going to be much more efficient at doing math operations than anything you can write yourself in pure Python.
Note that indexing in nested lists in Python happens from outside in, and so you'll have to change the order in which you index into your array, as follows:
Matrix[n][m] = x
For mathematical operations and matrix manipulations, numpy two-dimensional arrays are almost always a better choice. You can read more about them in the NumPy documentation.
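For instance, a minimal sketch of the same fill with NumPy (sample here is a stand-in for the question's 3 sublists of 300 elements each):

import numpy as np

sample = [list(range(300)) for _ in range(3)]  # stand-in data
Matrix = np.array(sample).T                    # shape (300, 3): rows indexed by n, columns by m
print(Matrix.shape)                            # (300, 3)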
What's an efficient way, given a NumPy matrix (2D array), to return the minimum/maximum n values (along with their indices) in the array?
Currently I have:
import bisect

def n_max(arr, n):
    res = [(0, (0, 0))] * n
    for y in range(len(arr)):
        for x in range(len(arr[y])):
            val = float(arr[y, x])
            el = (val, (y, x))
            i = bisect.bisect(res, el)
            if i > 0:
                res.insert(i, el)
                del res[0]
    return res
This takes three times as long as the pyopencv template-matching step that generates the array I want to run this on, which seems silly.
Since the other answer was written, NumPy has added the numpy.partition and numpy.argpartition functions for partial sorting, allowing you to do this in O(arr.size) time, or O(arr.size + n*log(n)) if you need the elements in sorted order.
numpy.partition(arr, n) returns an array the size of arr where the nth element is what it would be if the array were sorted. All smaller elements come before that element and all greater elements come afterward.
numpy.argpartition is to numpy.partition as numpy.argsort is to numpy.sort.
Here's how you would use these functions to find the indices of the minimum n elements of a two-dimensional arr:
flat_indices = numpy.argpartition(arr.ravel(), n-1)[:n]
row_indices, col_indices = numpy.unravel_index(flat_indices, arr.shape)
And if you need the indices in order, so row_indices[0] is the row of the minimum element instead of just one of the n minimum elements:
min_elements = arr[row_indices, col_indices]
min_elements_order = numpy.argsort(min_elements)
row_indices, col_indices = row_indices[min_elements_order], col_indices[min_elements_order]
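As a quick sanity check of the 2-D recipe above (made-up data):

import numpy as np

arr = np.array([[9., 1., 7.],
                [4., 8., 2.]])
n = 3
flat_indices = np.argpartition(arr.ravel(), n - 1)[:n]
row_indices, col_indices = np.unravel_index(flat_indices, arr.shape)
order = np.argsort(arr[row_indices, col_indices])
row_indices, col_indices = row_indices[order], col_indices[order]
print(arr[row_indices, col_indices])   # [1. 2. 4.]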
The 1D case is a lot simpler:
# Unordered:
indices = numpy.argpartition(arr, n-1)[:n]
# Extra code if you need the indices in order:
min_elements = arr[indices]
min_elements_order = numpy.argsort(min_elements)
ordered_indices = indices[min_elements_order]
Since there is no heap implementation in NumPy, your best bet is probably to sort the whole array and take the last n elements:
def n_max(arr, n):
    indices = arr.ravel().argsort()[-n:]
    indices = (numpy.unravel_index(i, arr.shape) for i in indices)
    return [(arr[i], i) for i in indices]
(This will probably return the list in reverse order compared to your implementation - I did not check.)
A more efficient solution that works with newer versions of NumPy is given in this answer.
I just met the exact same problem and solved it.
Here is my solution, wrapping np.argpartition:
It works along an arbitrary axis.
It is fast when K << array.shape[axis]: O(N).
It returns both the sorted values and the corresponding indices in the original array.
def get_sorted_smallest_K(array, K, axis=-1):
    # Find the smallest K values of array along the given axis.
    # Only efficient when K << array.shape[axis].
    # Returns:
    #   top_sorted_scores: np.array. The smallest K values, sorted.
    #   top_sorted_indices: np.array. The indices of those values in the original array.
    partition_index = np.take(np.argpartition(array, K, axis), range(0, K), axis)
    top_scores = np.take_along_axis(array, partition_index, axis)
    sorted_index = np.argsort(top_scores, axis=axis)
    top_sorted_scores = np.take_along_axis(top_scores, sorted_index, axis)
    top_sorted_indices = np.take_along_axis(partition_index, sorted_index, axis)
    return top_sorted_scores, top_sorted_indices
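An example call (made-up data), taking the two smallest values per row:

import numpy as np

arr = np.array([[5, 1, 4],
                [2, 9, 0]])
scores, indices = get_sorted_smallest_K(arr, 2, axis=1)
print(scores)    # [[1 4]
                 #  [0 2]]
print(indices)   # [[1 2]
                 #  [2 0]]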