I really love data structures and algorithms.
I am working with an 80000 x 80000 matrix that I need to fill with data. I am using numpy, and my code looks like this:
import numpy as np

n = 80000
similarity = np.zeros((n, n), dtype='int8')

for i, photo_i in enumerate(photos):
    for j, photo_j in enumerate(photos[i:], start=i):  # upper triangle only
        similarity[i, j] = score(photo_i, photo_j)
    if i % 100 == 0:
        print(i)
This piece of code is taking too much time. The score function is O(1). I was wondering if there is a better way to do this: I want to plot the data in this matrix in as short a time as possible, but the way I am filling it has O(n^2) complexity.
Is there anything here that can be optimized, perhaps by using a different data structure?
I have already read similar questions on SO and they mention pytables. I will definitely try it, but I don't yet know how. Any suggestion is welcome.
Thanks in advance.
There are a bunch of different things you could do. They all revolve around avoiding the explicit for-loops, which are slow in Python, and delegating the work to C-level code (either Python's underlying C runtime or numpy's built-in array-creation routines).
Using fromfunction
Numpy has a built-in function for populating a matrix from a function of the coordinates: numpy.fromfunction. It builds arrays of indices and calls your function once with them, so the iteration happens in vectorized numpy code rather than a Python loop; the catch is that the function has to be able to handle whole index arrays.
You'd have to supply it a score-by-coordinates function, e.g.:
def similarity_value(i, j, photos=photos):
    return score(photos[i], photos[j])

# note: the dtype argument sets the dtype of the index arrays passed in
similarity = numpy.fromfunction(similarity_value, (n, n), dtype=int)
The photos=photos in the function definition makes the photos array a local of the function and saves some time accessing it on each invocation; this is a common Python micro-optimization technique.
Note that this computes the similarity for the entire matrix instead of just a triangle. To fix this, you could do:
def similarity_value(i, j, photos=photos):
    return score(photos[i], photos[j]) if i < j else 0

similarity = numpy.fromfunction(similarity_value, (n, n), dtype=int)
similarity += similarity.T  # fill in the other triangle from the transposed matrix
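If score itself only accepts a single pair of photos (not whole index arrays), one way to make the snippets above work is to wrap the per-element logic with numpy.vectorize. This is only a sketch, and np.vectorize is essentially a convenience loop under the hood, so it buys correctness and readability more than raw speed:
import numpy as np

# hypothetical wrapper: evaluates the scalar score() element by element,
# returning 0 below and on the diagonal so the transpose trick still works
vec_score = np.vectorize(
    lambda i, j: score(photos[i], photos[j]) if i < j else 0,
    otypes=['int8'])

similarity = np.fromfunction(vec_score, (n, n), dtype=int)
similarity += similarity.T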
Using comprehensions
You could also try creating the similarity matrix from a generator expression (or even a list comprehension). This still loops in Python, but comprehensions tend to be somewhat faster than explicit for-loops; you do give up the triangle optimization, though:
similarity = numpy.fromiter((score(photo_i, photo_j)
                             for photo_i in photos
                             for photo_j in photos),
                            dtype='int8', count=n * n).reshape(n, n)
# or:
similarity = numpy.array([score(photo_i, photo_j)
                          for photo_i in photos
                          for photo_j in photos],
                         dtype='int8').reshape(n, n)
To re-introduce the triangle optimization, you could do something like:
similarity = numpy.array([score(photo_i, photo_j) if i < j else 0
                          for i, photo_i in enumerate(photos)
                          for j, photo_j in enumerate(photos)],
                         dtype='int8').reshape(n, n)
similarity += similarity.T
Using triu_indices to populate a triangle directly
Finally, you could use numpy.triu_indices to assign directly into the matrix's upper (and then lower) triangle:
xs, ys = np.triu_indices(n, 1)
similarity_values = np.fromiter((score(photos[i], photos[j])
                                 for i, j in zip(xs, ys)),   # only computing values for the triangle
                                dtype='int8', count=len(xs))
similarity = np.zeros((n, n), dtype='int8')
similarity[xs, ys] = similarity_values
similarity[ys, xs] = similarity_values
similarity[np.diag_indices(n)] = 1  # assuming score(x, x) == 1
This approach is inspired by this related question: https://codereview.stackexchange.com/questions/107094/create-symmetrical-matrix-from-list-of-values
I don't have a means of benchmarking which of these approaches would work best, but you could experiment and find out. Good luck!
I am learning how to code and wondered how to take the mean without using a builtin function (I know these are optimized and they should be used in real life, this is more of a thought experiment for myself).
For example, this works for vectors:
def take_mean(arr):
    sum = 0
    for i in arr:
        sum += i
    mean = sum / np.size(arr)
    return mean
But, of course, if I try to pass a matrix, it already fails. Clearly, I can change the code to work for matrices by doing:
def take_mean(arr):
    sum = 0
    for i in arr:
        for j in i:
            sum += j
    mean = sum / np.size(arr)
    return mean
And this fails for vectors and any >=3 dimensional arrays.
So I'm wondering how I can sum over a n-dimensional array without using any built-in functions. Any tips on how to achieve this?
You can use a combination of recursion and a loop to achieve your objective without using any of numpy's methods.
import numpy as np

def find_mean_of_arrays(array):
    sum = 0
    for element in array:
        if isinstance(element, np.ndarray):
            # recurse into sub-arrays; their means all carry equal weight
            # because the rows of a numpy array have the same length
            sum += find_mean_of_arrays(element)
        else:
            sum += element
    return sum / len(array)
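A quick sanity check (the mean-of-means shortcut is valid here precisely because every row of a numpy array has the same length):
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
print(find_mean_of_arrays(arr))  # 3.5
print(arr.mean())                # 3.5, for comparison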
Recursion is a powerful tool, and it makes code more elegant and readable; this is yet another example of that.
Unless you need the mean across a specific axis, the shape of the array does not matter when computing the mean, which makes your first solution workable:
def take_mean(arr):
    sum = 0
    for i in arr.reshape(-1):  # or arr.flatten()
        sum += i
    mean = sum / np.size(arr)
    return mean
In the snippet of Python code below, fun iterates through the array arr and, for every pair of array sections, counts the number of identical integers in the two sections (the flat array simulates a matrix). This makes n*(n-1)/2*m comparisons in total, giving a time complexity of O(n^2).
Are there programming solutions or ways of reframing this problem that would yield equivalent results but have reduced time complexity?
import ctypes
import multiprocessing as mp

# n > 500000, 0 < i < n, m = 100
# dim(arr) = n*m, 0 < arr[x] < 4294967311
arr = mp.RawArray(ctypes.c_uint, n*m)

def fun(i):
    for j in range(i-1, 0, -1):
        count = 0
        for k in range(0, m):
            count += (arr[i*m+k] == arr[j*m+k])
        if count/m > 0.7:
            return (i, j)
    return ()
arr is a shared-memory array, so it's best kept read-only for simplicity and performance reasons.
arr is implemented as a 1D RawArray from multiprocessing, because that has by far the fastest performance according to my tests. Using a numpy 2D array instead, for example like this:
arr = np.ctypeslib.as_array(mp.RawArray(ctypes.c_uint, n*m)).reshape(n,m)
would provide vectorization capabilities, but increases the total runtime by an order of magnitude: 250 s vs. 30 s for n = 1500, an increase of roughly 733%.
Since you can't change the array characteristics at all, I think you're stuck with O(n^2). numpy would gain you some vectorization, but would change the access pattern for the other processes sharing the array. Start with the innermost operation:
for k in range(0, m):
    count += (arr[i][k] == arr[j][k])
Change this to a one-line assignment:
count = sum(arr[i][k] == arr[j][k] for k in range(m))
Now, if this is truly an array, rather than a list of lists, use the array package's vectorization to simplify the loops, one at a time:
count = sum(arr[i] == arr[j]) # results in a vector of counts
You can now return the j indices where count[j] / m > 0.7. Note that there's no real need to return i for each one: it's constant within the function, and the calling program already has the value. Your array package likely has a pair of vectorized indexing operations that can return those indices. If you're using numpy, those are easy enough to look up on this site.
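In numpy terms this could look like the following sketch (an assumption on my part: arr has already been exposed as an (n, m) numpy array). The whole inner loop collapses to one broadcasted comparison, and the qualifying j indices come straight out of np.nonzero:
import numpy as np

def fun(i):
    # compare row i against every row below it in one shot: (i, m) == (m,) -> (i, m)
    counts = (arr[:i] == arr[i]).sum(axis=1)
    # return every j < i whose match fraction exceeds 0.7
    return np.nonzero(counts / m > 0.7)[0]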
So after fiddling around some more, I was able to cut down the running time greatly with help from NumPy's vectorization and Numba's JIT compiler. Going back to the original code:
arr = mp.RawArray(ctypes.c_uint, n*m)

def fun(i):
    for j in range(i-1, 0, -1):
        count = 0
        for k in range(0, m):
            count += (arr[i*m+k] == arr[j*m+k])
        if count/m > 0.7:
            return (i, j)
    return ()
We can leave out the bottom return statement as well as dismiss the idea of using count entirely, leaving us with:
def fun(i):
    for j in range(i-1, 0, -1):
        if sum(arr[i*m+k] == arr[j*m+k] for k in range(m)) > 0.7*m:
            return (i, j)
Then, we wrap arr in a NumPy view:
np_arr = np.frombuffer(arr, dtype='uint32').reshape(n, m)  # one row of length m per section
The important thing to note here is that we do not use the NumPy array as a shared-memory array to be written to from multiple processes, which avoids that overhead pitfall.
Finally, we apply Numba's decorator and rewrite the sum function in vector form so that it works with the new array:
import numba as nb

@nb.njit(fastmath=True, parallel=True)
def fun(i):
    for j in range(i-1, 0, -1):
        if np.sum(np_arr[i] == np_arr[j]) > 0.7*m:
            return (i, j)
This reduced the running time to 7.9s, which is definitely a victory for me.
I have to evaluate the following expression, given two quite large matrices A,B and a very complicated function F:
[the mathematical expression, shown as an image in the original post]
I was wondering whether there is an efficient way to first find the indices i, j that give a non-zero element after the matrix multiplication, so that I can avoid the rather slow for-loops.
Current working code
import numpy as np

# Starting with 4 random matrices
A = np.random.randint(0, 2, size=(50, 50))
B = np.random.randint(0, 2, size=(50, 50))
C = np.random.randint(0, 2, size=(50, 50))
D = np.random.randint(0, 2, size=(50, 50))

indices = []
for i in range(A.shape[0]):
    for j in range(A.shape[0]):
        if A[i, j] != 0:
            for k in range(B.shape[1]):
                if B[j, k] != 0:
                    for l in range(C.shape[1]):
                        if A[i, j]*B[j, k]*C[k, l]*D[l, i] != 0:
                            indices.append((i, j, k, l))
print(indices)
As you can see, in order to get the indices I need I have to use nested loops (= huge computational time).
My guess would be NO: you cannot avoid the loops entirely. To find all the indices i, j you need to look at every element, which defeats the purpose of the check. Therefore, you should go ahead and use simple elementwise multiplication and dot products in numpy; it should be quite fast, with the for-loops taken care of by numpy.
However, if the alternative is a Python-level loop, then the answer is YES: you can avoid it by using numpy, along the lines of the following pseudo-code (= hand-waving):
i, j = np.indices((N, M)) # CAREFUL: you may need to swap i<->j or N<->M
fs = F(i, j, z) # array of values of function F
# for a given z over the index grid
R = np.dot(A*fs, B) # summation over j
# return R # if necessary do a summation over i: np.sum(R, axis=...)
If the issue is that computing fs = F(i, j, z) is a very slow operation, then you will have to identify the elements of A that are non-zero, using loops that are built into numpy (so they are quite fast):
good = np.nonzero(A) # hidden double loop (for 2D data)
fs = np.zeros_like(A)
fs[good] = F(i[good], j[good], z) # compute F only where A != 0
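For the concrete four-matrix check in the question's code (the 50x50 case), the whole quadruple loop can also be replaced by one broadcasted product followed by np.nonzero. A sketch, assuming the A, B, C, D defined in the question; note it materializes a 50^4-element array, so it only pays off while that fits comfortably in memory:
import numpy as np

# prod[i, j, k, l] = A[i, j] * B[j, k] * C[k, l] * D[l, i]
prod = (A[:, :, None, None]       # shape (i, j, 1, 1)
        * B[None, :, :, None]     # shape (1, j, k, 1)
        * C[None, None, :, :]     # shape (1, 1, k, l)
        * D.T[:, None, None, :])  # D[l, i] -> D.T[i, l] -> shape (i, 1, 1, l)

indices = list(zip(*np.nonzero(prod)))  # all (i, j, k, l) with a non-zero product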
I am trying to do spatial derivatives and almost managed to get all the loops out of my code, but when I try to sum everything up at the end I have a problem.
I have a set of N~=250k nodes. I have found indices i,j of node pairs with i.size=j.size=~7.5M that are within a certain search distance, originally coming from np.triu_indices(n,1) and passed through a series of boolean masks to wash out nodes not influencing each other. Now I want to sum up the influences on each node from the other nodes.
I currently have this:
def sparseSum(a, i, j, n):
    return np.array([np.sum(a[np.logical_or(i == k, j == k)], axis=0) for k in range(n)])
This is very slow. What I would like is something vectorized. If I had scipy I could do
def sparseSum(a, i, j, n):
    sp = scipy.sparse.csr_matrix((a, (i, j)), shape=(n, n)) + scipy.sparse.csr_matrix((a, (j, i)), shape=(n, n))
    return np.sum(sp, axis=0)
But I'm doing this all within an Abaqus implementation that doesn't include scipy. Is there any way to do this numpy-only?
Approach #1 : Here's an approach making use of matrix-multiplication and broadcasting -
K = np.arange(n)[:,None]
mask = (i == K) | (j == K)
out = np.dot(mask,a)
Approach #2 : For cases with a small number of columns, we can use np.bincount for such bin-based summing along each column, like so -
def sparseSum(a, i, j, n):
    if len(a.shape) == 1:
        out = np.bincount(i, a, minlength=n) + np.bincount(j, a, minlength=n)
    else:
        ncols = a.shape[1]
        out = np.empty((n, ncols))
        for k in range(ncols):
            out[:, k] = np.bincount(i, a[:, k], minlength=n) + np.bincount(j, a[:, k], minlength=n)
    return out
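A quick check of Approach #2 on a toy example (hypothetical small arrays, not the 250k-node case from the question):
import numpy as np

a = np.array([1.0, 2.0, 3.0])
i = np.array([0, 0, 2])
j = np.array([1, 2, 3])
print(sparseSum(a, i, j, 4))   # [3. 1. 5. 3.]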
Here's not a turn-key solution, but one that adds up the columns of a sparse matrix. It essentially computes and utilises the CSC representation:
def sparse_col_sums(i, j, a, N):
    order = np.lexsort((i, j))                # sort by column j, then row i
    jo, ao = j[order], a[order]
    col_bnds = jo.searchsorted(np.arange(N))  # start of each column's run
    return np.add.reduceat(ao, col_bnds)
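For example (a toy case of my own; note that np.add.reduceat misbehaves for empty groups, so this assumes every column has at least one entry):
import numpy as np

i = np.array([0, 2, 1, 3])          # rows
j = np.array([0, 0, 1, 2])          # columns
a = np.array([1.0, 2.0, 3.0, 4.0])  # values
print(sparse_col_sums(i, j, a, 3))  # [3. 3. 4.]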
Let's say I have these arrays:
a = array((1,2,3,4,5))
indices = array((1,1,1,1))
and I perform the operation:
a[indices] += 1
The result is:
array([1, 3, 3, 4, 5])
In other words, the duplicates in indices are ignored.
If I wanted the duplicates not to be ignored, so the result would be:
array([1, 6, 3, 4, 5])
how would I go about this?
The example above is somewhat trivial; what follows is exactly what I am trying to do:
def inflate(self, pressure):
    faceforces = pressure * cross(self.verts[self.faces[:,1]]-self.verts[self.faces[:,0]], self.verts[self.faces[:,2]]-self.verts[self.faces[:,0]])
    self.verts[self.faces[:,0]] += faceforces
    self.verts[self.faces[:,1]] += faceforces
    self.verts[self.faces[:,2]] += faceforces

def constrain_lengths(self):
    vectors = self.verts[self.constraints[:,1]] - self.verts[self.constraints[:,0]]
    lengths = sqrt(sum(square(vectors), axis=1))
    correction = 0.5 * (vectors.T * (1 - (self.restlengths / lengths))).T
    self.verts[self.constraints[:,0]] += correction
    self.verts[self.constraints[:,1]] -= correction

def compute_normals(self):
    facenormals = cross(self.verts[self.faces[:,1]]-self.verts[self.faces[:,0]], self.verts[self.faces[:,2]]-self.verts[self.faces[:,0]])
    self.normals.fill(0)
    self.normals[self.faces[:,0]] += facenormals
    self.normals[self.faces[:,1]] += facenormals
    self.normals[self.faces[:,2]] += facenormals
    lengths = sqrt(sum(square(self.normals), axis=1))
    self.normals = (self.normals.T / lengths).T
I've been getting some very buggy results because duplicates are ignored in my indexed assignment operations.
numpy's histogram function is a scatter operation.
a += histogram(indices, bins=a.size, range=(0, a.size))[0]
You may need to take some care here: since indices contains integers that fall exactly on bin edges, small rounding errors in the edge positions could put values into the wrong bucket. In that case use:
a += histogram(indices, bins=a.size, range=(-0.5, a.size-0.5))[0]
to get each index into the centre of each bin.
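Applied to the arrays from the question, that looks like this (each of the four 1s lands in the bin centred on index 1, so a[1] gains 4):
from numpy import array, histogram

a = array((1, 2, 3, 4, 5))
indices = array((1, 1, 1, 1))

a += histogram(indices, bins=a.size, range=(-0.5, a.size - 0.5))[0]
print(a)   # [1 6 3 4 5]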
Update: this works. But I recommend using @Eelco Hoogendoorn's answer based on numpy.add.at.
Slightly late to the party, but seeing how commonly this operation is required, and the fact that it still does not seem to be a part of standard numpy, I'll put my solution here for reference:
from scipy.sparse import coo_matrix

def scatter(rowidx, vals, target):
    """compute target[rowidx] += vals, allowing for repeated values in rowidx"""
    rowidx = np.ravel(rowidx)
    vals = np.ravel(vals)
    cols = len(vals)
    data = np.ones(cols)
    colidx = np.arange(cols)
    rows = len(target)
    M = coo_matrix((data, (rowidx, colidx)), shape=(rows, cols))
    target += M * vals

def gather(idx, vals):
    """for symmetry with scatter"""
    return vals[idx]
A custom C routine in numpy could easily be twice as fast still, eliminating the superfluous allocation of and multiplication with ones, for starters, but it makes a world of difference in performance versus a loop in python.
Aside from performance considerations, it is stylistically much more in line with other numpy-vectorized code to use a scatter operation, rather than mash some for loops in your code.
Edit:
OK, forget about the above. As of the latest 1.8 release, scatter operations are directly supported in numpy at optimal efficiency.
def scatter(idx, vals, target):
    """target[idx] += vals, but allowing for repeats in idx"""
    np.add.at(target, idx.ravel(), vals.ravel())
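On the arrays from the question this gives the desired result:
import numpy as np

a = np.array([1, 2, 3, 4, 5])
indices = np.array([1, 1, 1, 1])

scatter(indices, np.ones_like(indices), a)
print(a)   # [1 6 3 4 5]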
I don't know of a way to do it that is any faster than:
for face in self.faces[:,0]:
    self.verts[face] += faceforces
You could also make self.faces into an array of 3 dictionaries where the keys correspond to the face and the value to the number of times it needs to be added. You'd then get code like:
for face in self.faces[0]:
    self.verts[face] += self.faces[0][face]*faceforces
which might be faster. I do hope that someone comes up with a better way because I wanted to do this when trying to help someone speed-up their code earlier today.