I am trying to do spatial derivatives and almost managed to get all the loops out of my code, but when I try to sum everything up at the end I have a problem.
I have a set of N~=250k nodes. I have found indices i,j of node pairs with i.size=j.size=~7.5M that are within a certain search distance, originally coming from np.triu_indices(n,1) and passed through a series of boolean masks to wash out nodes not influencing each other. Now I want to sum up the influences on each node from the other nodes.
I currently have this:
def sparseSum(a,i,j,n):
    return np.array([np.sum(a[np.logical_or(i==k,j==k)],axis=0) for k in range(n)])
This is very slow. What I would like is something vectorized. If I had scipy I could do
def sparseSum(a,i,j,n):
    sp = scipy.sparse.csr_matrix((a,(i,j)),shape=(n,n)) + scipy.sparse.csr_matrix((a,(j,i)),shape=(n,n))
    return np.sum(sp, axis=0)
But I'm doing this all within an Abaqus implementation that doesn't include scipy. Is there any way to do this numpy-only?
Approach #1 : Here's an approach making use of matrix-multiplication and broadcasting -
K = np.arange(n)[:,None]
mask = (i == K) | (j == K)
out = np.dot(mask,a)
Approach #2 : For cases with a small number of columns, we can use np.bincount for such bin-based summing along each column, like so -
def sparseSum(a,i,j,n):
    if a.ndim == 1:
        out = np.bincount(i,a,minlength=n) + np.bincount(j,a,minlength=n)
    else:
        ncols = a.shape[1]
        out = np.empty((n,ncols))
        for k in range(ncols):
            out[:,k] = np.bincount(i,a[:,k],minlength=n) + np.bincount(j,a[:,k],minlength=n)
    return out
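As a hedged sanity check (toy sizes and made-up data; the pairs are filtered so that i != j, as they would be when coming from np.triu_indices), the bincount version can be compared against the original loop-based sparseSum:
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 100
i = rng.integers(0, n, m)
j = rng.integers(0, n, m)
keep = i != j                       # mimic pairs coming from np.triu_indices(n, 1)
i, j = i[keep], j[keep]
a = rng.random((keep.sum(), 3))

def sparseSum_loop(a, i, j, n):
    return np.array([np.sum(a[np.logical_or(i == k, j == k)], axis=0) for k in range(n)])

def sparseSum_bincount(a, i, j, n):
    out = np.empty((n, a.shape[1]))
    for k in range(a.shape[1]):
        out[:, k] = np.bincount(i, a[:, k], minlength=n) + np.bincount(j, a[:, k], minlength=n)
    return out

print(np.allclose(sparseSum_loop(a, i, j, n), sparseSum_bincount(a, i, j, n)))  # True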
Here's not a turn-key solution, but one that sums the columns of a sparse matrix. It essentially computes and utilises the CSC representation:
def sparse_col_sums(i, j, a, N):
    # sort entries by column index (the last key passed to lexsort is the primary key)
    order = np.lexsort((i, j))
    jo, ao = j[order], a[order]
    # start offset of each column in the sorted data (CSC-style indptr)
    col_bnds = jo.searchsorted(np.arange(N))
    return np.add.reduceat(ao, col_bnds)
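A small usage sketch (made-up data; note that np.add.reduceat implicitly assumes every column 0..N-1 occurs at least once, so empty columns would need extra handling):
import numpy as np

i = np.array([0, 2, 1, 0])
j = np.array([1, 0, 1, 2])
a = np.array([10., 20., 30., 40.])

print(sparse_col_sums(i, j, a, 3))  # [20. 40. 40.] -- sums of columns 0, 1 and 2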
This is basically an interview-style question that I need to solve, but I have only been able to find an O(m*n) solution. Is there any way this can be optimized further?
Write a function that takes a pair of indices for a 2D array and prints out the number at the provided indices, where each number is the sum of the value to its left and the value above it. The first row and the first column are filled with 1s, so the value at (row, col) is (row-1, col) + (row, col-1): (0,0) is 1, (1,1) is 2, (2,1) is 3, (5,3) is 56, and so on.
Currently, I have an O(m*n) solution, but I need to optimize this further. This is my O(m*n) solution using dynamic programming:
from functools import lru_cache

def solution(m: int, n: int):
    @lru_cache(maxsize=None)
    def dp(row, col):
        if row == 0 or col == 0:
            return 1
        return dp(row - 1, col) + dp(row, col - 1)
    return dp(m, n)
row = int(input('row?\n'))
col = int(input('col?\n'))
print(solution(row, col))
Some hints on optimization
dp[i][j] = dp[j][i], as there is symmetry in the structure. You can use this to roughly halve the work, but it will not change the complexity of the algorithm.
Alternate Solution:
If you study the values carefully, you can find that
dp[i][j] = C(i+j, i) = C(i+j, j)
Hope this helps you reach the optimization you were looking for. It can be done in roughly O(N log N), where N = i+j.
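For reference, a minimal sketch of that closed form using the standard library (math.comb is available from Python 3.8 on):
from math import comb

def solution(row: int, col: int) -> int:
    # dp[row][col] equals the binomial coefficient C(row + col, row)
    return comb(row + col, row)

print(solution(5, 3))  # 56, matching the example from the question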
Could someone please help me out with the following?
I have one dataframe with two columns: products and webshops (n x 2) with n products. Now I would like to obtain a binary (n x n) matrix with all products listed as the indices and all products listed as the column names. Then each cell should contain a 1 or 0 denoting whether the product in the index and column name came from the same webshop.
The following code is returning what I would like to achieve.
dist = np.empty((len(df_title), len(df_title)), int)
for i in range(0, len(df_title)):
    for j in range(0, len(df_title)):
        boolean = df_title.values[i][1] == df_title.values[j][1]
        dist[i][j] = boolean
df = pd.DataFrame(dist)
However, this code already takes a significant amount of time for n = 1624, so I was wondering if someone has an idea for a faster algorithm.
Thanks!
It seems like you're only interested in the element at position 1 of every row anyway, so creating a temp variable for easier lookup could help:
lookup = df_title.values[:, 1]
Also, since you want to interpret the resulting matrix as a bool matrix, you should probably specify dtype=bool (1 byte per field) instead of dtype=int (8 bytes per field), which also cuts memory consumption by a factor of 8.
dist = np.empty((len(df_title), len(df_title)), dtype=bool)
Your matrix will be symmetric along the diagonal anyway, so you only need to compute "half" of the matrix; also, if i == j, we know the corresponding field in the matrix should be True.
lookup = df_title.values[:, 1]
dist = np.empty((len(df_title), len(df_title)), dtype=bool)
for i in range(len(df_title)):
    # diagonal
    dist[i, i] = True
    for j in range(i + 1, len(df_title)):
        # symmetric along diagonal
        dist[i, j] = dist[j, i] = lookup[i] == lookup[j]
Also, using numpy broadcasting you can transform all of that into a single line of code that is orders of magnitude faster than the double-for-loop solution:
lookup = df_title.values[:, 1]
dist = lookup[None, :] == lookup[:, None]
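For instance, with a made-up two-column frame (names are only illustrative):
import numpy as np
import pandas as pd

df_title = pd.DataFrame({'product': ['tv-a', 'tv-b', 'tv-c'],
                         'webshop': ['amazon', 'bestbuy', 'amazon']})

lookup = df_title.values[:, 1]
dist = lookup[None, :] == lookup[:, None]

# wrap the boolean matrix back into a labelled DataFrame, products as index and columns
df = pd.DataFrame(dist.astype(int), index=df_title['product'], columns=df_title['product'])
print(df)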
I really love data structures and algorithms.
I am working with an 80000 x 80000 matrix and inserting data into it. I am using numpy, and my code looks like this:
n = 80000
similarity = np.zeros((n, n), dtype='int8')
for i, photo_i in enumerate(photos):
    for j, photo_j in enumerate(photos[i:], start=i):  # start=i so the column index lines up with the photo
        similarity[i, j] = score(photo_i, photo_j)
    if i % 100 == 0:
        print(i)
This piece of code is taking too much time. The score function is O(1). I was wondering if there could be a better way to do this. I want to plot the data of this matrix in as short a time as possible, but the way I am doing it has O(n^2) complexity.
Is there anything with which it can be optimized, maybe by using a different data structure?
I have already read similar questions on SO and they have mentioned pytables. I will definitely try it but don't know yet how. Any suggestion is welcome.
Thanks in advance.
There's a bunch of different things you could do, which all revolve around avoiding the explicit for-loops, which are slow in Python, and delegating to C-level code (either using Python's underlying C runtime or numpy's builtin array creation methods).
Using fromfunction
Numpy has a built-in function for populating a matrix from a function of the coordinates: numpy.fromfunction. One caveat: it calls the function once with full index arrays (built via np.indices) rather than element by element, so the supplied function has to accept arrays; a scalar score can be adapted with numpy.vectorize, as below. The iteration is genuinely pushed down to C only if score itself is vectorized.
You'd have to supply it a score-by-coordinates function, e.g.:
def similarity_value(i, j, photos=photos):
    return score(photos[i], photos[j])

# np.vectorize lets a scalar similarity_value handle the index arrays fromfunction passes in;
# the dtype argument is the dtype of those index arrays, so it must be able to hold 0..n-1
similarity = numpy.fromfunction(numpy.vectorize(similarity_value), (n, n), dtype=int)
The photos=photos in the function definition makes the photos array a local of the function and saves some time accessing it on each invocation; this is a common Python micro-optimization technique.
Note that this computes the similarity for the entire matrix instead of just a triangle. To fix this, you could do:
def similarity_value(i, j, photos=photos):
    return score(photos[i], photos[j]) if i < j else 0

similarity = numpy.fromfunction(numpy.vectorize(similarity_value), (n, n), dtype=int)
similarity += similarity.T  # fill in other triangle from transposed matrix
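A minimal end-to-end sketch with a toy score (everything here, including the tiny n, is made up purely to show that the pieces fit together):
import numpy as np

photos = np.array([3, 1, 4, 1, 5])   # stand-ins for real photo objects
n = len(photos)

def score(a, b):
    # toy scalar similarity: 1 if equal, 0 otherwise
    return int(a == b)

def similarity_value(i, j, photos=photos):
    return score(photos[i], photos[j]) if i < j else 0

similarity = np.fromfunction(np.vectorize(similarity_value), (n, n), dtype=int)
similarity += similarity.T
np.fill_diagonal(similarity, 1)      # assuming score(x, x) == 1
print(similarity)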
Using comprehensions
You could also try creating the similarity matrix from a generator comprehension (or even a list comprehension), again avoiding the explicit for-loops in favor of a comprehension which is faster, but sacrificing the triangle optimization:
# (fromiter and array take no shape argument, so reshape the flat result instead)
similarity = numpy.fromiter((score(photo_i, photo_j)
                             for photo_i in photos
                             for photo_j in photos),
                            dtype='int8', count=n*n).reshape(n, n)
# or:
similarity = numpy.array([score(photo_i, photo_j)
                          for photo_i in photos
                          for photo_j in photos],
                         dtype='int8').reshape(n, n)
To re-introduce the triangle optimization, you could do something like:
similarity = numpy.array([score(photo_i, photo_j) if i < j else 0
                          for i, photo_i in enumerate(photos)
                          for j, photo_j in enumerate(photos)],
                         dtype='int8').reshape(n, n)
similarity += similarity.T
Using triu_indices to populate a triangle directly
Finally, you could use numpy.triu_indices to assign directly into the matrix's upper (and then lower) triangle:
similarity_values = [score(photos[i], photos[j])
                     for i in range(n)
                     for j in range(i + 1, n)]  # only computing values for the upper triangle
similarity = np.zeros((n, n), dtype='int8')
xs, ys = np.triu_indices(n, 1)
similarity[xs, ys] = similarity_values
similarity[ys, xs] = similarity_values
similarity[np.diag_indices(n)] = 1  # assuming score(x, x) == 1
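Note that np.triu_indices(n, 1) enumerates the upper triangle in row-major order, i.e. (0,1), (0,2), ..., (1,2), ..., so the comprehension above produces its values in exactly the order the fancy-indexed assignment expects.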
This approach is inspired by this related question: https://codereview.stackexchange.com/questions/107094/create-symmetrical-matrix-from-list-of-values
I don't have a means of benchmarking which of these approaches would work best, but you could experiment and find out. Good luck!
I have to evaluate the following expression, given two quite large matrices A,B and a very complicated function F:
(The mathematical expression is given as an image in the original post.)
I was wondering whether there is an efficient way to first find the indices i,j that give a non-zero element after the matrix multiplication, so that I can avoid the quite slow for loops.
Current working code
# Starting with 4 random matrices
A = np.random.randint(0,2,size=(50,50))
B = np.random.randint(0,2,size=(50,50))
C = np.random.randint(0,2,size=(50,50))
D = np.random.randint(0,2,size=(50,50))
indices = []
for i in range(A.shape[0]):
    for j in range(A.shape[0]):
        if A[i,j] != 0:
            for k in range(B.shape[1]):
                if B[j,k] != 0:
                    for l in range(C.shape[1]):
                        if A[i,j]*B[j,k]*C[k,l]*D[l,i] != 0:
                            indices.append((i,j,k,l))
print(indices)
As you can see, in order to get the indices I need I have to use nested loops (= huge computational time).
My guess would be NO: you cannot avoid the for-loops. In order to find all the indices i,j you need to loop through all the elements, which defeats the purpose of this check. Therefore, you should go ahead and use plain element-wise multiplication and dot products in numpy; it should be quite fast, with the for loops taken care of by numpy.
However, if the question is whether you can avoid writing Python loops yourself, then the answer is YES: you can push them into numpy, as in the following pseudo-code (= hand-waving):
i, j = np.indices((N, M)) # CAREFUL: you may need to swap i<->j or N<->M
fs = F(i, j, z) # array of values of function F
# for a given z over the index grid
R = np.dot(A*fs, B) # summation over j
# return R # if necessary do a summation over i: np.sum(R, axis=...)
If the issue is that computing fs = F(i, j, z) is a very slow operation, then you will have to identify the elements of A that are non-zero, using two loops built into numpy (so they are quite fast):
good = np.nonzero(A) # hidden double loop (for 2D data)
fs = np.zeros_like(A)
fs[good] = F(i[good], j[good], z) # compute F only where A != 0
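A self-contained toy version of that masked evaluation (F, the sizes and z are all made up for illustration):
import numpy as np

def F(i, j, z):
    # toy stand-in for the real, expensive function
    return np.sin(i * z) + np.cos(j * z)

N, M, z = 4, 5, 0.3
A = np.random.randint(0, 2, size=(N, M))
B = np.random.randint(0, 2, size=(M, 3))

i, j = np.indices((N, M))
fs = np.zeros(A.shape)
good = np.nonzero(A)          # only evaluate F where A is non-zero
fs[good] = F(i[good], j[good], z)

R = np.dot(A * fs, B)         # summation over j
print(R.shape)                # (4, 3)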
Say I would like to remove the diagonal from a scipy.sparse.csr_matrix. Is there an efficient way of doing so? I saw that in the sparsetools module there are C functions to return the diagonal.
Based on other SO answers here and here my current approach is the following:
def csr_setdiag_val(csr, value=0):
    """Set all diagonal nonzero elements
    (elements currently in the sparsity pattern)
    to the given value. Useful to set to 0 mostly.
    """
    if csr.format != "csr":
        raise ValueError('Matrix given must be of CSR format.')
    csr.sort_indices()
    pointer = csr.indptr
    indices = csr.indices
    data = csr.data

    for i in range(min(csr.shape)):
        ind = indices[pointer[i]: pointer[i + 1]]
        j = ind.searchsorted(i)
        # matrix has only elements up until the diagonal (in row i)
        if j == len(ind):
            continue
        j += pointer[i]
        # in case the matrix has only elements after the diagonal (in row i)
        if indices[j] == i:
            data[j] = value
which I then follow with
csr.eliminate_zeros()
Is that the best I can do without writing my own Cython code?
Based on @hpaulj's comment, I created an IPython Notebook which can be seen on nbviewer. This shows that, of all the methods mentioned, the following is the fastest (assuming that mat is a sparse CSR matrix):
mat - scipy.sparse.dia_matrix((mat.diagonal()[scipy.newaxis, :], [0]), shape=(one_dim, one_dim))
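As a hedged, self-contained sketch of that approach (assuming one_dim is just the matrix dimension, and using np.newaxis in place of the older scipy.newaxis alias):
import numpy as np
import scipy.sparse

mat = scipy.sparse.random(5, 5, density=0.5, format='csr')
mat.setdiag(1.0)                      # make sure there is something on the diagonal

one_dim = mat.shape[0]
diag = scipy.sparse.dia_matrix((mat.diagonal()[np.newaxis, :], [0]),
                               shape=(one_dim, one_dim))
no_diag = mat - diag                  # diagonal entries cancel to zero
no_diag.eliminate_zeros()             # drop any explicit zeros left in the sparsity pattern

print(no_diag.diagonal())             # all zeros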