I am trying to compute a linear combination of numpy arrays.
I have three lists of numpy arrays:
a = [np.random.normal(0,1, [1,2]), np.random.normal(0,1, [3,4]), np.random.normal(0,1, [10,11])]
b = [np.random.normal(0,1, [1,2]), np.random.normal(0,1, [3,4]), np.random.normal(0,1, [10,11])]
c = [np.random.normal(0,1, [1,2]), np.random.normal(0,1, [3,4]), np.random.normal(0,1, [10,11])]
I want to element-wise combine each array in lists a and b, weighted by the corresponding array in c, to get a new list d: say d_i = a_i * c_i + (1 - c_i) * b_i (a linear combination).
What I thought was to pick each element of each array in a, find the corresponding elements in b and c, and then combine them. However, I found this troublesome, inefficient and a bit stupid. Could anyone suggest a better way?
Well, assuming all of your lists are the same length, I don't think there is going to be anything much more efficient than
d = [a[i] * c[i] + (1-c[i]) * b[i] for i in range(len(a))]
Now if all you need to do is iterate over d one time, then maybe you could speed things up with a generator expression:
d = (a[i] * c[i] + (1-c[i]) * b[i] for i in range(len(a)))
But at the end of the day there is no way to create a linear combination of elements in less than linear time.
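For readability you could also iterate the three lists in lockstep with zip instead of indexing; a small equivalent sketch, assuming the lists are the same length as above:

d = [ai * ci + (1 - ci) * bi for ai, bi, ci in zip(a, b, c)]

Each ai, bi, ci triple holds same-shaped arrays, so the arithmetic stays element-wise.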
Related
Suppose A and B are two 4-dimensional numpy arrays with the same shape.
A = np.random.rand(5,5,2,10)
B = np.random.rand(5,5,2,10)
a, b, c, d = A.shape
dat = []
for k in range(d):
    sum = 0
    for l in range(c):
        sum = sum + np.einsum('ij,ji->', A[:,:,l,k], B[:,:,l,k])
    dat.append(sum)
I was wondering whether I can use einsum to replace the inner for loop, maybe even the outer for loop, or maybe some matrix manipulation to replace all of it, because the data set is large.
Is there any faster way to achieve this?
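For what it's worth, the double loop computes dat[k] as the sum over i, j, l of A[i,j,l,k] * B[j,i,l,k], so a single einsum call should be able to absorb both loops. A minimal sketch, worth checking numerically against the loop version:

import numpy as np

A = np.random.rand(5, 5, 2, 10)
B = np.random.rand(5, 5, 2, 10)

# dat[k] = sum over i, j, l of A[i,j,l,k] * B[j,i,l,k]
dat = np.einsum('ijlk,jilk->k', A, B)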
I have 2 large arrays, A and B, and I want to find where the rows of B occur in A. I have to locate 10,000 vectors of length 800 among 40,000 vectors of the same size.
Example
A = [[1,2],[2,3],[4,5]]
B = [[2,3],[4,5]]
Desired Output:
[1,2]
I can find a single vector using np.argwhere((A == B[0]).all(-1)), but I am not sure how to shape the arrays to find the indices of each vector. I can use a for loop, but that is too slow. For example:
np.asarray([np.argwhere((A == B[i]).all(-1)) for i in range(B.shape[0])])
Setup
import numpy as np
rows_a = 40000
rows_b = 10000
size = 800
a = np.arange(rows_a * size).reshape((rows_a, size))
np.random.shuffle(a)
b = np.arange(rows_b * size).reshape((rows_b, size))
Solution
d = {tuple(v): i for i, v in enumerate(a)}
idx = [d[tuple(row)] for row in b]
Let's say that a has size m and b has size n.
d creates a mapping of the rows in a to their index. tuple(v) is necessary because v, an ndarray, is not hashable. This has O(m) time complexity because you iterate over the rows once.
idx iterates over the rows in b and checks the dictionary to fetch the corresponding index in a. A dictionary lookup has O(1) time complexity and the loop is O(n). All in all, you're looking at O(m+n), which is linear.
What you are doing instead is for each row in b, you check every row in a to find its index. This has O(m*n) complexity, which is quadratic.
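One caveat with the dictionary approach: if a row of b does not occur in a, the lookup raises a KeyError. A small sketch using dict.get instead, where the -1 sentinel for misses is my own choice and not part of the answer above:

# rows of b that are absent from a map to -1 instead of raising KeyError
idx = [d.get(tuple(row), -1) for row in b]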
I am trying to conduct something similar to searchsorted, but in the case where the array is not completely monotonic. Say I have a scalar, c and a 1D array x, I want to find the indices i of all elements such that x[i] < c <= x[i + 1]. Importantly, x is not completely monotonic.
The following code works, but I would just like to know if this is the most efficient way to do it, or if there is a simpler way:
x = np.array([1,2,3,1,2,3,1,2,3])
c = 2.5
t = c > x[:-1]
u = c <= x[1:]
v = t*u
i = v.nonzero()[0]
Or in one line of code:
i = ((c > x[:-1]) * (c <= x[1:])).nonzero()[0]
Is this the most efficient way to recover these indices?
Two additional questions.
Is there an easy way to extend this to the case where c is a 1D array and x is a 2D array, where c has as many elements as "rows" in x, and I perform this search for each element of c in the corresponding "row" of x?
My ultimate goal is to do this with a three dimensional case. That is, suppose c is still a 1D vector with n elements. Now, let x be a 3D array, with dimensions j by n by k. Is there a way to do #1 above for each "submatrix" in x? Basically, performing #1 above j times.
For example:
x1 = np.array([[1,2,3,1,2,3],[1,2,3,1,2,3],[1,2,3,1,2,3]])
x2 = x1 + 1
x = np.array([x1,x2])
c = np.array([1.5,2.5,3.5])
Under #1 above, when we compare c and x1, we would get: [[0,3],[1,4],[]]
When we compare c and x2, we would get: [[],[0,3],[1,4]]
Finally, under #2, I would like to get:
[[[0,3],[1,4],[]],
[[],[0,3],[1,4]]]
We could compare once to give us the boolean mask and re-use it with negation to get the other comparison array and also use slicing -
m = c > x
i = np.flatnonzero( m[:-1] & ~m[1:] )
We can extend it to the case of 2D x and 1D c with a loop, but keep the per-row work minimal by pre-computing the masks in a vectorized manner, like so -
m = c[:,None] > x
m2 = m[:,:-1] & ~m[:,1:]
i = [np.flatnonzero( mi ) for mi in m2]
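The same trick should carry over to the 3D case asked about in #2, where x has shape (j, n, k) and c has n elements, assuming the search runs along the last axis; a sketch -

m = c[None, :, None] > x        # broadcast c across the first and last axes
m2 = m[:, :, :-1] & ~m[:, :, 1:]
i = [[np.flatnonzero(row) for row in sub] for sub in m2]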
On a task like this, numpy performs too many comparisons. You can gain about a 5x speedup with Numba, and it is straightforward to adapt to three dimensions.
import numba
import numpy as np

@numba.njit
def ind(x, c):
    # collect every index i with x[i] < c <= x[i+1]
    res = np.empty(x.size - 1, dtype=np.int64)
    i = j = 0
    while i < x.size - 1:
        if x[i] < c and c <= x[i + 1]:
            res[j] = i
            j += 1
        i += 1
    return res[:j]
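For example, on the 1D array from the question, this sketch would give:

x = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3], dtype=np.float64)
print(ind(x, 2.5))  # -> [1 4 7]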
I have to evaluate the following expression, given two quite large matrices A,B and a very complicated function F:
[image: the mathematical expression]
I was wondering whether there is an efficient way to first find the indices i, j that will give a non-zero element after the multiplication of the matrices, so that I can avoid the quite slow for loops.
Current working code
# Starting with 4 random matrices
A = np.random.randint(0,2,size=(50,50))
B = np.random.randint(0,2,size=(50,50))
C = np.random.randint(0,2,size=(50,50))
D = np.random.randint(0,2,size=(50,50))
indices = []
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        if A[i, j] != 0:
            for k in range(B.shape[1]):
                if B[j, k] != 0:
                    for l in range(C.shape[1]):
                        if A[i, j] * B[j, k] * C[k, l] * D[l, i] != 0:
                            indices.append((i, j, k, l))
print(indices)
As you can see, in order to get the indices I need I have to use nested loops (= huge computational time).
My guess would be NO: you cannot avoid the loops entirely. In order to find all the indices i, j you need to visit every element, which defeats the purpose of the check. Therefore, you should go ahead and use simple element-wise multiplication and dot products in numpy - it should be quite fast, with the loops taken care of by numpy.
However, if you were planning to use Python loops, then the answer is YES: you can avoid them with numpy, using the following pseudo-code (= hand-waving):
i, j = np.indices((N, M)) # CAREFUL: you may need to swap i<->j or N<->M
fs = F(i, j, z) # array of values of function F
# for a given z over the index grid
R = np.dot(A*fs, B) # summation over j
# return R # if necessary do a summation over i: np.sum(R, axis=...)
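To make the hand-waving concrete, here is a runnable instantiation with a toy F; the function below is just an illustrative placeholder, not the asker's actual F:

import numpy as np

N, M = 50, 50
A = np.random.randint(0, 2, size=(N, M))
B = np.random.randint(0, 2, size=(M, N))
z = 0.5

def F(i, j, z):
    # hypothetical stand-in for the "very complicated function"
    return np.sin(i * z) + np.cos(j * z)

i, j = np.indices((N, M))
fs = F(i, j, z)        # F evaluated over the whole index grid at once
R = np.dot(A * fs, B)  # the dot product performs the summation over j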
If the issue is that computing fs = F(i, j, z) is a very slow operation, then you will have to restrict the computation to the elements of A that are nonzero, using two loops built into numpy (so they are quite fast):
good = np.nonzero(A) # hidden double loop (for 2D data)
fs = np.zeros_like(A)
fs[good] = F(i[good], j[good], z) # compute F only where A != 0
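As for the original four-index search itself, the nested loops can also be replaced by one einsum plus argwhere. A sketch - note that it materializes the full (i, j, k, l) array, so it trades memory for speed and is only viable for moderate sizes:

# P[i,j,k,l] = A[i,j] * B[j,k] * C[k,l] * D[l,i]
P = np.einsum('ij,jk,kl,li->ijkl', A, B, C, D)
indices = list(map(tuple, np.argwhere(P != 0)))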
I am very new to numpy and I am trying to achieve the following in the most pythonic way. So, I have two arrays:
a = np.array([[0, 1, 2], [3, 4, 5]])
b = np.zeros(a.shape)
Now, what I would like is for each element in b to be one greater than the value of the corresponding element in a, i.e. b = a + 1.
I was wondering how this can be achieved in numpy.
The easiest way is the following:
b = a + 1
But if you want to iterate over the array yourself (although not recommended):
for i in range(len(a)):
    for j in range(len(a[i])):
        b[i][j] = a[i][j] + 1