Numpy submatrix (selected random index) calculation performance - python

I am trying to compute a sub-matrix update using Numpy.
The shapes of the matrices are
A : (15000, 100)
B : (15000, 100)
B_ : (3000, 100)
C : (100, 100)
sample_index = np.random.choice(np.arange(int(15000*0.2)), size=int(15000*0.2), replace=False)
and the first code is
for k in range(100):
    self.A[sample_index, k] += B_[:, k] - np.dot(self.A[sample_index, :], C[:, k])
which only updates the sub-matrix selected by sample_index,
and the second code is
for k in range(100):
    self.A[:, k] += B[:, k] - np.dot(self.A[:, :], C[:, k])
which updates the whole matrix.
But the first code runs slower than the second one. Do you know the reason, or any way to speed it up?

With sample_index you are using fancy indexing, which actually copies the sub-matrix each time, whereas a plain slice only creates a view. If you are just reading the input, you don't have to pay for that copy.
import numpy as np
a = np.random.rand(10000).reshape(100, 100)
b = np.random.rand(10000).reshape(100, 100)
i = list(range(10))
a_sub0 = a[:10] # view
a_sub1 = a[i] # copying
# you can change the original matrix from the view
a_sub0[0, 0] = 100
(a[0, 0] == 100.0) and (a_sub1[0, 0] != 100.0) # True
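Building on that, here is a sketch (mine, not from the answer above) that keeps the loop's column-by-column update but pays for the fancy-indexed copy only once per sweep instead of once per column; A_sub is a hypothetical working copy of the selected rows:
import numpy as np
# toy data standing in for the question's arrays
A = np.random.rand(15000, 100)
B_ = np.random.rand(3000, 100)
C = np.random.rand(100, 100)
sample_index = np.random.choice(3000, size=3000, replace=False)
A_sub = A[sample_index, :]  # one fancy-indexed copy per sweep
for k in range(A.shape[1]):
    A_sub[:, k] += B_[:, k] - np.dot(A_sub, C[:, k])
A[sample_index, :] = A_sub  # one scatter back into A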

Update 2D NumPy array indexed with cross product with repeated indices

I'm interested in a version of Increment Numpy multi-d array with repeated indices where the array is indexed with a cross product.
In particular, I want to perform the operation done by the following code using matrix operations to accelerate it:
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    for w in range(W):
        for h in range(H):
            i, j = int(w / grid_size), int(h / grid_size)
            s[i, j] += image[w, h]
    return s
My idea was to compute all the (i, j) indices at once and use NumPy's ix_ method to index the matrix s:
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    w_idx, h_idx = np.arange(W), np.arange(H)
    x_idx, y_idx = np.trunc(w_idx / grid_size).astype(int), np.trunc(h_idx / grid_size).astype(int)
    s[np.ix_(x_idx, y_idx)] += image
    return s
It is easier to understand the code above with NumPy's example:
Using ix_ one can quickly construct index arrays that will index the cross product. a[np.ix_([1,3],[2,5])] returns the array [[a[1,2] a[1,5]], [a[3,2] a[3,5]]].
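A quick runnable version of that documentation example (the array contents are arbitrary, just for illustration):
import numpy as np
a = np.arange(36).reshape(6, 6)
print(a[np.ix_([1, 3], [2, 5])])
# [[ 8 11]
#  [20 23]]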
In my case it's likely that some indices will be repeated (for example, with grid_size=2 we get int(0 / grid_size) == int(1 / grid_size)), and that's where the Increment Numpy multi-d array with repeated indices question comes in.
When indices are repeated, I would like the matrix to be updated with the image value once per repetition. I cannot find a solution to this without additional loops (e.g. zipping the indices, but then you essentially have to build the actual cross product of the indices for s and the image).
I don't think this is the best way to do it but here's one way.
import numpy as np
image = np.arange(9).reshape(3, 3)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2])
# find unique destinations
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
# create mask for the source and sum the source pixels headed to the same destination
idxs_repeated = idxs[None, :, :].repeat(len(idxs_unique), axis = 0)
image_mask = (idxs_repeated == idxs_unique[:, None, :]).all(-1)
pixel_sum = (image.flatten()[None, :]*image_mask).sum(-1)
# assign summed sources to destination
s[tuple(idxs_unique.T)] += pixel_sum
EDIT 1:
If you run into problems caused by memory constraints you can do the image masking and summation in batches as done in the following implementation. I set the batch size to 10 but that parameter can be set to whatever works on your machine.
import numpy as np
image = np.arange(12).reshape(3, 4)
s = np.zeros((5, 5))
x_idx, y_idx = np.meshgrid([0, 0, 2], [1, 1, 2, 1])
idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
batch_size = 10
pixel_sum = []
for i in range(len(idxs_unique)//batch_size + ((len(idxs_unique)%batch_size) != 0)):
    batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
    idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
    image_mask = (idxs_repeated == batch).all(-1)
    pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
pixel_sum = np.concatenate(pixel_sum)
s[tuple(idxs_unique.T)] += pixel_sum
EDIT 2:
OP's method seems to be faster by far if you use numba.
import numpy as np
from numba import jit
@jit(nopython=True)
def get_s(image, grid_size):
    W, H = image.shape
    s = np.zeros((W, H))
    for w in range(W):
        for h in range(H):
            i, j = int(w / grid_size), int(h / grid_size)
            s[i, j] += image[w, h]
    return s

def get_s_vec(image, grid_size, batch_size = 10):
    W, H = image.shape
    s = np.zeros((W, H))
    w_idx, h_idx = np.arange(W), np.arange(H)
    x_idx, y_idx = np.trunc(w_idx / grid_size).astype(int), np.trunc(h_idx / grid_size).astype(int)
    y_idx, x_idx = np.meshgrid(y_idx, x_idx)
    idxs = np.stack((x_idx.flatten(), y_idx.flatten())).T
    idxs_unique, counts = np.unique(idxs, axis = 0, return_counts = True)
    pixel_sum = []
    for i in range(len(idxs_unique)//batch_size + ((len(idxs_unique)%batch_size) != 0)):
        batch = idxs_unique[i*batch_size:(i+1)*batch_size, None, :]
        idxs_repeated = idxs[None, :, :].repeat(len(batch), axis = 0)
        image_mask = (idxs_repeated == batch).all(-1)
        pixel_sum.append((image.flatten()[None, :]*image_mask).sum(-1))
    pixel_sum = np.concatenate(pixel_sum)
    s[tuple(idxs_unique.T)] += pixel_sum
    return s
print(f'loop result = {get_s(image, 2)}')
print(f'vector result = {get_s_vec(image, 2)}')
%timeit get_s(image, 2)
%timeit get_s_vec(image, 2)
output:
loop result = [[10. 18. 0. 0.]
[17. 21. 0. 0.]
[ 0. 0. 0. 0.]]
vector result = [[10. 18. 0. 0.]
[17. 21. 0. 0.]
[ 0. 0. 0. 0.]]
The slowest run took 15.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 751 ns per loop
1000 loops, best of 5: 195 µs per loop
Does skimage.measure.block_reduce do what you want?
from skimage.measure import block_reduce
s = block_reduce(image, block_size=(grid_size, grid_size), func=np.sum)
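For reference (my addition, not part of either answer), NumPy's unbuffered np.add.at also accumulates correctly over repeated indices; a minimal sketch applied to the grid-sum idea from the question:
import numpy as np
def get_s_add_at(image, grid_size):
    # np.add.at performs unbuffered in-place addition, so repeated
    # destination indices accumulate instead of being overwritten
    W, H = image.shape
    s = np.zeros((W, H))
    x_idx = np.arange(W) // grid_size
    y_idx = np.arange(H) // grid_size
    np.add.at(s, (x_idx[:, None], y_idx[None, :]), image)
    return s
print(get_s_add_at(np.arange(12).reshape(3, 4), 2))  # matches the loop result above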

2D indexing of scipy sparse matrix

import numpy as np
import scipy.sparse
x = np.random.randint(0, 1000, (1000, 100))
# prob better way to do this
d = np.random.random((1000,1000))
d[d < 0.99] = 0
y = scipy.sparse.csr_matrix(d)
What I would like to do is to create a new matrix z containing the values of y at the indices in x.
i.e. z[0, 0] should contain y[0, x[0, 0]],
z[0, 1] should contain y[0, x[0, 1]], and so on.
%time for i in range(1000): y[i, x[i]].todense()
~247ms
%time for i in range(1000): np.take(y[i].todense(), x[i])
~150ms
Both of the above work, but I am looking for a faster method; this is currently the bottleneck in my code.
Please assume that representing the whole scipy.sparse matrix as dense isn't feasible.
edit:
%time z = np.vstack([q.todense()[0, p] for q, p in zip(y, x)])
is ~110ms
The answer seems to be to use an appropriately shaped broadcasting index, as outlined here: How to generate multi-dimensional 2D numpy index using a sub-index for one dimension (that answer deserves more upvotes!)
%time res = y[np.arange(0, 1000).reshape((-1, 1)), x].todense()
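A small self-contained check of that pattern (sizes shrunk and data random, just for illustration; variable roles mirror the question):
import numpy as np
import scipy.sparse
rng = np.random.default_rng(0)
x = rng.integers(0, 50, (50, 10))  # column indices to pick per row
d = rng.random((50, 50))
d[d < 0.9] = 0
y = scipy.sparse.csr_matrix(d)
# broadcast the row index against the 2D column-index array
res = y[np.arange(y.shape[0]).reshape(-1, 1), x].todense()
# reference: row-by-row loop
ref = np.vstack([y[i].todense()[0, x[i]] for i in range(y.shape[0])])
print(np.allclose(res, ref))  # True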

Minimization model with initial starting values

I'm trying to solve a minimization problem where an initial solution is already present and the objective function is based on this initial solution.
I have some sort of line y_line which is an initial mapping of resources and stations:
y_line = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
Additionally, I have an array S of savings for selling resources from the line, an array EC of costs for buying new ones, and an array P of processing costs:
S = np.array([[-260., -260., -260.],
              [-30., -30., -30.],
              [360., 360., 360.]], dtype=int)
EC = np.array([[1000, 1000, 1000],
               [2000, 2000, 2000],
               [5000, 5000, 5000]], dtype=int)
P = np.array([[720., 720., 720.],
              [1440., 1440., 1440.],
              [3600., 3600., 3600.]], dtype=int)
Using just a simplified constraint: every workstation i must have at least one resource j -> sum(y[i, j] for j in j_idx) == 1 for all i in i_idx.
My objective is that every resource sold off from the initial y_line brings us savings, every newly bought one costs us, and the new line y has a processing cost for operating. I have defined the objective as follows:
y_delta = y - y_line # delta between new line (y) and old line (y_line)
y_delta_plus = np.zeros(y.shape, dtype=object) # 1
y_delta_minus = np.zeros(y.shape, dtype=object) # 2
# I -> new bought resources
y_delta_plus[y_delta >= 0] = y_delta[y_delta >= 0]
# II -> sold resources
y_delta_minus[y_delta <= 0] = y_delta[y_delta <= 0]
c_i = y_delta_plus * EC # invest
c_s = y_delta_minus * S # savings
c_p = y * P # processing cost
c_y = np.sum(c_s + c_i + c_p)
However, if I solve this model (full code below), the objective value (5760) doesn't match my sanity-check calculation (12430). Would it be possible to set initial values for y[i, j]? Or is there another function to achieve this?
from ortools.linear_solver import pywraplp
import numpy as np
y_line = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
S = np.array([[-260., -260., -260.],
              [-30., -30., -30.],
              [360., 360., 360.]], dtype=int)
EC = np.array([[1000, 1000, 1000],
               [2000, 2000, 2000],
               [5000, 5000, 5000]], dtype=int)
P = np.array([[720., 720., 720.],
              [1440., 1440., 1440.],
              [3600., 3600., 3600.]], dtype=int)
solver = pywraplp.Solver('stack', pywraplp.Solver.SAT_INTEGER_PROGRAMMING)
y = np.zeros_like(y_line, dtype=object)
i_idx = range(y_line.shape[0])
j_idx = range(y_line.shape[1])
for i in i_idx:
    for j in j_idx:
        y[i, j] = solver.IntVar(0, 1, 'y[%i_%i]' % (i, j))
for i in i_idx:
    solver.Add(
        sum(y[i, j] for j in j_idx) == 1
    )
def objective(y, y_line):
    y_delta = y - y_line  # delta between new line (y) and old line (y_line)
    y_delta_plus = np.zeros(y.shape, dtype=object)   # 1
    y_delta_minus = np.zeros(y.shape, dtype=object)  # 2
    # I -> new bought resources
    y_delta_plus[y_delta >= 0] = y_delta[y_delta >= 0]
    # II -> sold resources
    y_delta_minus[y_delta <= 0] = y_delta[y_delta <= 0]
    c_i = y_delta_plus * EC   # invest
    c_s = y_delta_minus * S   # savings
    c_p = y * P               # processing
    return np.sum(c_s + c_i + c_p)
c_y = objective(y=y, y_line=y_line)
solver.Minimize(
    c_y
)
# [START solve]
print("Number of constraints:", solver.NumConstraints())
print("Number of variables:", solver.NumVariables())
status = solver.Solve()
# [END solve]
y_new = np.zeros_like(y)
for i in range(y_line.shape[0]):
    for j in range(y_line.shape[1]):
        if y[i, j].solution_value() > 0:
            y_new[i, j] = y[i, j].solution_value()
print(f"Objective sat: {solver.Objective().Value()}")
print(y_new)
# Number of constraints: 3
# Number of variables: 9
# Objective sat: 5760.0
# [[1.0 0 0]
# [1.0 0 0]
# [1.0 0 0]]
# %%
c_y_test = objective(y=y_new, y_line=y_line)
c_y_test # -> 12430.0
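For what it's worth, evaluating the objective by hand at the returned solution reproduces the 12430 sanity-check figure (my own arithmetic, using the arrays above):
# y_new = [[1,0,0],[1,0,0],[1,0,0]], y_line = identity
# invest:  EC[1,0] + EC[2,0]         = 2000 + 5000       = 7000
# savings: (-1)*S[1,1] + (-1)*S[2,2] = 30 - 360          = -330
# process: P[0,0] + P[1,0] + P[2,0]  = 720 + 1440 + 3600 = 5760
# total:   7000 - 330 + 5760         = 12430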
The model can be solved, however not with the approach I chose in the first place. With a pywraplp model it didn't work, yet with a cp_model it can be solved using predefined variables (as mentioned by @sascha). The arrays y_line, S, EC and P are the same as above, and the sole constraint is the same as well. The "filtering", however, I could solve using:
for i in range(len(y_cp.flatten())):
    model.AddElement(i, y_delta.flatten().tolist(), y_cp.flatten().tolist()[i] - y_line.flatten().tolist()[i])
for i in i_idx:
    for j in j_idx:
        model.AddMaxEquality(y_delta_plus[i, j], [y_delta[i, j], model.NewConstant(0)])
        model.AddMinEquality(y_delta_minus[i, j], [y_delta[i, j], model.NewConstant(0)])
model.Minimize(
    np.sum(y_delta_plus * EC) + np.sum(y_delta_minus * S) + np.sum(y_cp * P)
)
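The snippet above does not show how y_cp, y_delta, y_delta_plus and y_delta_minus are created; a minimal sketch of a setup that would fit it (bounds and names are my guesses, not from the original answer):
from ortools.sat.python import cp_model
import numpy as np
y_line = np.eye(3, dtype=int)  # same initial line as above
model = cp_model.CpModel()
n_i, n_j = y_line.shape
y_cp = np.zeros(y_line.shape, dtype=object)
y_delta = np.zeros(y_line.shape, dtype=object)
y_delta_plus = np.zeros(y_line.shape, dtype=object)
y_delta_minus = np.zeros(y_line.shape, dtype=object)
for i in range(n_i):
    for j in range(n_j):
        y_cp[i, j] = model.NewBoolVar(f'y_cp[{i}_{j}]')
        y_delta[i, j] = model.NewIntVar(-1, 1, f'y_delta[{i}_{j}]')
        y_delta_plus[i, j] = model.NewIntVar(0, 1, f'y_delta_plus[{i}_{j}]')
        y_delta_minus[i, j] = model.NewIntVar(-1, 0, f'y_delta_minus[{i}_{j}]')
# every workstation gets exactly one resource, as in the question
for i in range(n_i):
    model.Add(sum(y_cp[i, j] for j in range(n_j)) == 1)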
Solving and the sanity check yield:
solver_cp = cp_model.CpSolver()
solver_cp.Solve(model)
y_new_cp = np.zeros_like(y_cp)
for i in i_idx:
    for j in j_idx:
        if solver_cp.Value(y_cp[i, j]) > 0:
            y_new_cp[i, j] = solver_cp.Value(y_cp[i, j])
print(f"Objective cp: {solver_cp.ObjectiveValue()}")
print(y_new_cp)
# Objective cp: 5760.0
# [[1 0 0]
# [0 1 0]
# [1 0 0]]
c_y_test = objective(y=y_new_cp, y_line=y_line)
c_y_test # -> 5760 -> Correct
The cp_model could solve it and matches the sanity check; with the pywraplp model I couldn't figure out how to do it.

Efficient way to perform if condition nested in for loop in python

Is there an efficient pythonic way to perform if conditions in nested for loops:
import numpy as np
big = 3
med = 2
small = 5
mat1 = np.zeros((big, 3))
mat2 = np.zeros((big, med, 3))
mat3 = np.zeros((big, med, small))
mat1 = np.array([[0, 0, 0],
                 [1.0, 0.5, 0.2],
                 [0.2, 0.1, -0.1]])
mat2 = np.array([[[1.0, 0.5, 0.2],
                  [0.1, 0.1, 0.1]],
                 [[0.2, 0.2, 0.2],
                  [1.0, -0.5, -0.2]],
                 [[1.0, -0.5, -0.2],
                  [-1.0, 0.5, -0.2]]])
mat3 = np.array([[[1, 1, 1, 1, 1],
                  [0, 21, 1, 3, 5]],
                 [[1, 2, 3, 4, 5],
                  [-1, -2, -2, -3, -4]],
                 [[1.0, 1.2, 1.3, 1.4, 1.5],
                  [5, 4, 3, 2, 1]]])
sol = np.zeros((small))
for ii in np.arange(big):
    found = False
    for jj in np.arange(big):
        for kk in np.arange(med):
            if all(abs(mat1[ii, :] - mat2[jj, kk, :]) < 1E-8):
                found = True
                sol = mat3[jj, kk, :]
                print(sol)
                break
        if found:
            break
where big and med can be much bigger. The above dummy code works but is very slow. Is there a way to speed it up?
Note: mat1, mat2 and mat3 contain floats (not integers) and are not all zeros in practice.
Solution:
The solution for me was the following (greatly benefiting from @LRRR's answer):
for ii in np.arange(big):
    tmp = mat1[ii, :]
    A = np.tile(tmp[:], (med, 1))
    AA = np.repeat(A[np.newaxis, :], big, 0)
    sub = abs(AA - mat2) < 1E-8
    tmp2 = mat3[sub.all(axis=2)]
    if (len(tmp2) > 0):
        val = tmp2[0, :]
Note that because I had other complications I kept the outer loop.
The if statement is required as I want the first occurrence of a match.
Also worth noting: this is significantly faster, but it could probably be made faster still, since we could stop at the first match rather than computing all matches.
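For completeness, a fully vectorized variant (my own sketch, not from the post) that keeps only the first match per row, reusing mat1, mat2 and mat3 from above:
import numpy as np
# match[ii, jj, kk] is True when mat1[ii] equals mat2[jj, kk] within 1E-8
match = np.all(np.abs(mat1[:, None, None, :] - mat2[None, :, :, :]) < 1E-8, axis=-1)
flat = match.reshape(match.shape[0], -1)        # shape (big, big*med)
has_match = flat.any(axis=1)
first = flat.argmax(axis=1)                     # index of the first True per row (0 if none)
sols = mat3.reshape(-1, mat3.shape[-1])[first]  # candidate solution per row of mat1
sols[~has_match] = 0                            # rows without a match stay at zero
It still materializes the full (big, big, med) mask, so for very large big and med the looped version above may use less memory.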
If I understand correctly, your goal is: for each row of mat1, subtract each row in each matrix of mat2, check if all values in the resulting vector are negative, and if true then use that index to return the values from mat3?
Here's an example on smaller data:
import numpy as np
np.random.seed(10)
big = 5
med = 3
small = 2
mat1 = np.random.randint(0, 10, (big, 3))
mat2 = np.random.randint(0, 10, (big, med, 3))
mat3 = np.random.randint(0, 10, (big, med, small))
# Row subtractions
A = abs(np.repeat(mat1[:, np.newaxis], med, 1) - mat2) < 1E-8
# Extract from mat3
mat3[A.all(axis = 2)]
Breaking it down mat1[:, np.newaxis] increases the array by another dimension and np.repeat() will duplicate each row, so the sizes of mat1 and mat2 will line up to do a simple subtraction between the two.
Note: I left out the abs() from your original code on the line if all(abs(mat1[ii, :] - mat2[jj, kk, :]) < 1E-8):. It seems that by taking the absolute value, the condition < 1E-8 will never be satisfied.
Update:
Here's the redo using the new data added to the original post:
# Repeat each row of mat1 for rows in mat2
A = np.repeat(mat1, big * med, 0)
# Reshape mat2 to match matrix A
B = mat2.reshape(big*med, 3)
C = np.tile(B, (big, 1))
# Subtraction rows
sub = abs(A - C) < 1E-8
# Find values from tiled mat2
values = C[sub.all(axis = 1)]
# Get indices on reshaped mat2
indices = np.all(B == values, axis=1)
# Reshape mat3
M = mat3.reshape(big * med, small)
# Result
M[indices]
output: array([[1., 1., 1., 1., 1.]])

Sparse Scipy/Numpy: an efficient way to implement sum of pairwise mins operation

Computing the sum of pairwise mins between vectors is very popular in natural language processing (NLP) and is used in computing the histogram intersection kernel [1]. However, in NLP we frequently deal with sparse matrices.
Here is an inefficient way to compute this operation using slow for loops:
import numpy as np
from scipy.sparse import csr_matrix
# Initialize sparse matrices
A = csr_matrix(np.clip(np.random.randn(100, 64) - 1, 0, np.inf))
B = csr_matrix(np.clip(np.random.randn(64, 100) - 1, 0, np.inf))
# For each row, col vector i,j in A and B respectively
G = np.zeros((100, 100))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        G[i, j] = A[i].minimum(B[:, j].T).sum()
Is there a way to do this without the for loops?
I wouldn't mind a for loop if it can be compiled, e.g. with numba's jit.
A fast dense version of this is given here: Numpy: an efficient way to implement sum of pairwise mins operation
Thanks.
[1] http://blog.datadive.net/histogram-intersection-for-change-detection/
Here is an implementation that should be reasonably efficient, leveraging sparseness as well as it can. There is a loop, but only along one dimension, so it should not be too bad.
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix
M, N, K = 640, 100, 650
B1 = csr_matrix(np.clip(np.random.randn(N, K) - 1, 0, np.inf))
B2 = csr_matrix(np.clip(np.random.randn(N, K) - 1, 0, np.inf))
B = B1-B2
A1 = csc_matrix(np.clip(np.random.randn(M, N) - 1, 0, np.inf))
A2 = csc_matrix(np.clip(np.random.randn(M, N) - 1, 0, np.inf))
A = A1-A2
result = np.zeros((M, K))
for j in range(N):
    ia = A.indices[A.indptr[j] : A.indptr[j+1]]
    ib = B.indices[B.indptr[j] : B.indptr[j+1]]
    IA, IB = np.ix_(ia, ib)
    da = A.data[A.indptr[j] : A.indptr[j+1]]
    db = B.data[B.indptr[j] : B.indptr[j+1]]
    # both nonzero
    result[IA, IB] += np.minimum.outer(da, db)
    # one negative ...
    am = da < 0
    iam, dam = ia[am], da[am]
    bm = db < 0
    ibm, dbm = ib[bm], db[bm]
    # ... the other zero
    za = np.ones((M,), dtype=bool)
    za[ia] = False
    zb = np.ones((K,), dtype=bool)
    zb[ib] = False
    IA, IB = np.ix_(iam, zb)
    result[IA, IB] += dam[:, None]
    IA, IB = np.ix_(za, ibm)
    result[IA, IB] += dbm
# compare with dense method
print(np.allclose(result, np.minimum(A.A[..., None], B.A).sum(axis=1)))
Prints
True
Well, at least in recent versions of SciPy there is scipy.sparse.csr_matrix.minimum (see the SciPy documentation), which is the equivalent of numpy.minimum in terms of element-wise minimum. However, I don't know how computationally efficient it is.
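A tiny example of that element-wise minimum (note it computes the element-wise operation, not the pairwise-min kernel itself):
import numpy as np
from scipy.sparse import csr_matrix
X = csr_matrix(np.array([[0., 2.], [3., 0.]]))
Y = csr_matrix(np.array([[1., 1.], [0., 4.]]))
print(X.minimum(Y).toarray())
# [[0. 1.]
#  [0. 0.]]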
