Loop through varied number of matrices using numpy - python

Here is the functionality demonstrated on a fixed number of matrices:
import numpy as np

x = np.matrix('0.5')
y = np.matrix('0.5 0.5; 0.5 0.5')
z = np.matrix('0.75 0.25; 0.34 0.66')

output = []
for i in x.flat:
    for j in y.flat:
        for k in z.flat:
            output.append(i * j * k)
I need help extending this to a variable number of matrices. I have tried using
reduce(np.dot, arr)
but this is not what I want to do.

With A holding the list of input matrices, we could iteratively use np.outer. Since np.outer flattens its inputs on its own, we don't need to do that ourselves; only a final flattening step is needed.
Thus, the solution would be -
A = [x, y, z, w]

out = A[0]
for i in A[1:]:
    out = np.outer(out, i)
out = out.ravel()
Note that the output would be an array. If needed as a matrix, simply wrap it with np.matrix() at the end.
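If a one-liner is preferred, the same chain can be collapsed with functools.reduce, reusing the reduce pattern from the question but with np.outer in place of np.dot -
from functools import reduce

# np.outer is applied pairwise down the list; a final ravel flattens the result
out = reduce(np.outer, A).ravel()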
Sample run for 4 matrices -
In [38]: x = np.matrix('0.5')
    ...: y = np.matrix('0.15 0.25; 0.35 0.45')
    ...: z = np.matrix('0.75 0.25; 0.34 0.66')
    ...: w = np.matrix('0.45 0.15; 0.8 0.2')
    ...:
    ...: output = []
    ...: for i in x.flat:
    ...:     for j in y.flat:
    ...:         for k in z.flat:
    ...:             for l in w.flat:
    ...:                 output.append(i * j * k * l)
    ...:

In [64]: A = [x, y, z, w]
    ...: out = A[0]
    ...: for i in A[1:]:
    ...:     out = np.outer(out, i)
    ...: out = out.ravel()
    ...:

In [65]: np.allclose(output, out)
Out[65]: True

Related

Convert Python For-Loop to NumPy Operations

I have a NumPy array full of indices:
size = 100000
idx = np.random.randint(0, size, size=size)
And I have a simple function that loops over the indices and does:
out = np.zeros(size, dtype=int)
for i in range(size):
    j = idx[i]
    out[min(i, j)] = out[min(i, j)] + 1
    out[max(i, j)] = out[max(i, j)] - 1
return np.cumsum(out)
This is quite slow when size is large and I am hoping to find a faster way to accomplish this. I've tried this but it isn't quite right:
out = np.zeros(size, dtype=int)
i = np.arange(size)
j = idx[i]
mini = np.minimum(i, j)
maxi = np.maximum(i, j)
out[mini] = out[mini] + 1
out[maxi] = out[maxi] - 1
return np.cumsum(out)
The attempted fancy-indexed assignment isn't quite right because repeated indices in mini/maxi only register once (the assignment is buffered), so duplicate counts are lost. We can make use of np.bincount, which does accumulate repeats -
R = np.arange(size)
out = np.bincount(np.minimum(R,idx),minlength=size)
out -= np.bincount(np.maximum(R,idx),minlength=size)
final_out = out.cumsum()
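If you would rather stay close to the attempted fancy indexing, np.add.at performs unbuffered (accumulating) assignment and fixes the repeated-index problem; a quick sketch, reusing R, idx and size from above, that also verifies against the bincount result -
out2 = np.zeros(size, dtype=int)
np.add.at(out2, np.minimum(R, idx), 1)    # accumulates even for repeated indices
np.add.at(out2, np.maximum(R, idx), -1)
assert np.array_equal(out, out2)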
Timings -
All posted solutions use cumsum at the end. So, let's time these skipping that last step -
In [25]: np.random.seed(0)
    ...: size = 100000
    ...: idx = np.random.randint(0, size, size=size)

# From this post
In [27]: %%timeit
    ...: R = np.arange(size)
    ...: out = np.bincount(np.minimum(R, idx), minlength=size)
    ...: out -= np.bincount(np.maximum(R, idx), minlength=size)
1000 loops, best of 3: 643 µs per loop

# #slaw's solution
In [28]: %%timeit
    ...: i = np.arange(size)
    ...: j = idx[i]
    ...: mini = np.minimum(i, j)
    ...: maxi = np.maximum(i, j)
    ...:
    ...: unique_mini, mini_counts = np.unique(mini, return_counts=True)
    ...: unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)
    ...:
    ...: out = np.zeros(size, dtype=int)
    ...: out[unique_mini] = out[unique_mini] + mini_counts
    ...: out[unique_maxi] = out[unique_maxi] - maxi_counts
100 loops, best of 3: 13.3 ms per loop

# Loopy one from question
In [29]: %%timeit
    ...: out = np.zeros(size, dtype=int)
    ...:
    ...: for i in range(size):
    ...:     j = idx[i]
    ...:     out[min(i, j)] = out[min(i, j)] + 1
    ...:     out[max(i, j)] = out[max(i, j)] - 1
10 loops, best of 3: 141 ms per loop
This seems to give the same answer as the for-loop:
i = np.arange(size)
j = idx[i]
mini = np.minimum(i, j)
maxi = np.maximum(i, j)

unique_mini, mini_counts = np.unique(mini, return_counts=True)
unique_maxi, maxi_counts = np.unique(maxi, return_counts=True)

out = np.zeros(size, dtype=int)
out[unique_mini] = out[unique_mini] + mini_counts
out[unique_maxi] = out[unique_maxi] - maxi_counts
return np.cumsum(out)

Refactor matrix permutations in numpy's style

I wrote the following code to do multiplication of matrix permutations, and I was wondering whether it can be written in NumPy style, such that I can get rid of the two for loops:
Z = np.empty([new_d, X.shape[1]])
Z = np.ndarray(shape=(new_d, X.shape[1]))
Z = np.concatenate((X, X**2))
res = []
for i in range(0, d):
    for j in range(i+1, d):
        res.append(np.array(X.T[:,i] * X.T[:,j]))
Z = np.concatenate((Z, res))
where X's shape is (7, 1000), d = 7 and new_d = 35.
Any suggestions?
Approach #1
We could use np.triu_indices to get those pairwise permutation indices and then simply perform elementwise multiplications of the row-indexed arrays -
r,c = np.triu_indices(d,1)
res = X[r]*X[c]
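With those pairwise products in hand, Z from the question can be assembled in one shot without any loop (a sketch, assuming X and d as defined in the question) -
r, c = np.triu_indices(d, 1)
# stack X, X**2 and all pairwise row products to get the (new_d, 1000) output
Z = np.concatenate((X, X**2, X[r]*X[c]))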
Approach #2
For memory efficiency, and hence performance, especially on large arrays, we are better off slicing the input array and running a single loop in which each iteration works on a chunk of data, like so -
n = d-1
idx = np.concatenate(( [0], np.arange(n,0,-1).cumsum() ))
start, stop = idx[:-1], idx[1:]
L = n*(n+1)//2
res_out = np.empty((L, X.shape[1]), dtype=X.dtype)
for i, (s0, s1) in enumerate(zip(start, stop)):
    res_out[s0:s1] = X[i] * X[i+1:]
To get Z directly and thus avoid all those concatenations, we could modify the earlier posted approach, like so -
n = d-1
N = len(X)
idx = 2*N + np.concatenate(( [0], np.arange(n,0,-1).cumsum() ))
start, stop = idx[:-1], idx[1:]
L = n*(n+1)//2
Z_out = np.empty((2*N + L, X.shape[1]), dtype=X.dtype)
Z_out[:N] = X
Z_out[N:2*N] = X**2
for i, (s0, s1) in enumerate(zip(start, stop)):
    Z_out[s0:s1] = X[i] * X[i+1:]
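As a quick sanity check, the chunked fills agree with Approach #1 row-for-row, since the loop walks the same upper-triangular order (a sketch, assuming the snippets above were run on the same X) -
assert np.allclose(res, res_out)        # same pairwise products, same order
assert np.allclose(Z_out[2*N:], res)    # tail of Z_out holds those products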

Efficiently compute pairwise equal for NumPy arrays

Given two NumPy arrays, say:
import numpy as np
import numpy.random as rand
n = 1000
x = rand.binomial(n=1, p=.5, size=(n, 10))
y = rand.binomial(n=1, p=.5, size=(n, 10))
Is there a more efficient way to compute X in the following:
X = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        X[i, j] = 1 * np.all(x[i] == y[j])
Approach #1 : Input arrays with 0s & 1s
For input arrays with 0s and 1s only, we can reduce each row to a scalar, collapsing each input array to 1D, and then leverage broadcasting, like so -
n = x.shape[1]
s = 2**np.arange(n)
x1D = x.dot(s)
y1D = y.dot(s)
Xout = (x1D[:,None] == y1D).astype(float)
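The dot product with s packs each row of 0s and 1s into an integer, using the row entries as bits, so two rows match exactly when their packed integers do. A tiny illustration (hypothetical row) -
row = np.array([1, 0, 1])
s = 2**np.arange(3)    # [1, 2, 4]
row.dot(s)             # 1*1 + 0*2 + 1*4 = 5, a unique code per bit pattern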
Approach #2 : Generic case
For a generic case, we can use views -
# https://stackoverflow.com/a/45313353/ #Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
x1D, y1D = view1D(x, y)
Xout = (x1D[:,None] == y1D).astype(float)
Runtime test
# Setup
In [287]: np.random.seed(0)
     ...: n = 1000
     ...: x = rand.binomial(n=1, p=.5, size=(n, 10))
     ...: y = rand.binomial(n=1, p=.5, size=(n, 10))

# Original approach
In [288]: %%timeit
     ...: X = np.zeros((n, n))
     ...: for i in range(n):
     ...:     for j in range(n):
     ...:         X[i, j] = 1 * np.all(x[i] == y[j])
1 loop, best of 3: 4.69 s per loop

# Approach #1
In [290]: %%timeit
     ...: n = x.shape[1]
     ...: s = 2**np.arange(n)
     ...: x1D = x.dot(s)
     ...: y1D = y.dot(s)
     ...: Xout = (x1D[:,None] == y1D).astype(float)
1000 loops, best of 3: 1.42 ms per loop

# Approach #2
In [291]: %%timeit
     ...: x1D, y1D = view1D(x, y)
     ...: Xout = (x1D[:,None] == y1D).astype(float)
100 loops, best of 3: 18.5 ms per loop

Vectorization/optimising for loop with numpy in Python

I'm writing a script to handle some data from a sensor, represented in the signal_gen function. As you can see in the testing function, it is quite loop-centered. Since this function is called many times, it is a bit slow, and a push in the right direction for optimising it would be lovely.
I have read that it is possible to replace the for loop with a vectorized array operation, but I can't get my head around how the I_avg[i] line should be written, since we have a single element y[i] multiplied with the whole array x inside np.cos, and all of this is just one iteration of I_avg.
def testing(signal):
    # y changes over time; set to constants here for easier reading
    y = np.arange(0.0108, 0.0135, 0.001)
    x = np.arange(0, len(signal))
    I_avg = np.zeros(len(y))
    Q_avg = np.zeros_like(I_avg)
    for i in range(0, len(y)):
        I_avg[i] = np.array(signal * (np.cos(2 * np.pi * y[i] * x))).sum()
        Q_avg[i] = np.array(signal * (np.sin(2 * np.pi * y[i] * x))).sum()
    D = np.power(I_avg, 2) + np.power(Q_avg, 2)
    max_index = np.argmax(D)
    phaseOut = np.arctan2(Q_avg[max_index], I_avg[max_index])

# just a test signal
def signal_gen():
    signal = np.random.random(size=251)
    return signal
One vectorized approach is matrix multiplication with numpy.dot, combined with NumPy broadcasting, to replace the nested loop and compute I_avg and Q_avg more efficiently, like so -
mult = 2*np.pi*y[:,None]*x
I_avg, Q_avg = np.cos(mult).dot(signal), np.sin(mult).dot(signal)
Please note that for the given sample we are competing against a loopy version that only has to run 3 iterations (y being of length 3). As such, we won't see a huge speedup here.
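Putting it together, the whole testing function could look like this (a sketch; testing_vectorized is a hypothetical name, the variables follow the question) -
def testing_vectorized(signal):
    y = np.arange(0.0108, 0.0135, 0.001)
    x = np.arange(0, len(signal))
    # broadcast y against x to build the full (len(y), len(x)) phase grid
    mult = 2 * np.pi * y[:,None] * x
    I_avg = np.cos(mult).dot(signal)
    Q_avg = np.sin(mult).dot(signal)
    D = I_avg**2 + Q_avg**2
    max_index = np.argmax(D)
    return np.arctan2(Q_avg[max_index], I_avg[max_index])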
Runtime test -
In [9]: # just a test signal
   ...: signal = np.random.random(size=251)
   ...: y = np.arange(0.0108, 0.0135, 0.001)
   ...: x = np.arange(0, len(signal))
   ...:

# Original approach
In [10]: %%timeit I_avg = np.zeros(len(y))
    ...: Q_avg = np.zeros_like(I_avg)
    ...: for i in range(0, len(y)):
    ...:     I_avg[i] = np.array(signal * (np.cos(2 * np.pi * y[i] * x))).sum()
    ...:     Q_avg[i] = np.array(signal * (np.sin(2 * np.pi * y[i] * x))).sum()
    ...:
10000 loops, best of 3: 68 µs per loop

# Proposed approach
In [11]: %%timeit mult = 2*np.pi*y[:,None]*x
    ...: I_avg, Q_avg = np.cos(mult).dot(signal), np.sin(mult).dot(signal)
    ...:
10000 loops, best of 3: 34.8 µs per loop
You can use np.einsum to build the broadcasted phase grid, then reduce against signal:
yx = 2*np.pi*np.einsum("i,j->ij", y, x)
I_avg = np.cos(yx) @ signal
Q_avg = np.sin(yx) @ signal

Vectorizing NumPy covariance for 3D array

I have a 3D numpy array of shape (t, n1, n2):
x = np.random.rand(10, 2, 4)
I need to calculate another 3D array y which is of shape (t, n1, n1) such that:
y[0] = np.cov(x[0,:,:])
...and so on for all slices along the first axis.
So, a loopy implementation would be:
y = np.zeros((10, 2, 2))
for i in np.arange(x.shape[0]):
    y[i] = np.cov(x[i, :, :])
Is there any way to vectorize this so I can calculate all covariance matrices in one go? I tried doing:
x1 = x.swapaxes(1, 2)
y = np.dot(x, x1)
But it didn't work.
Hacked into the numpy.cov source code and tried using the default parameters. As it turns out, np.cov(x[i,:,:]) would simply be:
N = x.shape[2]
m = x[i,:,:]
m = m - np.sum(m, axis=1, keepdims=True) / N  # np.cov works on a copy; avoid in-place mutation of the view
cov = np.dot(m, m.T) / (N - 1)
So, the task was to vectorize this loop, iterating through i and processing all of the data from x in one go. For that, we can use broadcasting at the mean-subtraction step. The final step performs a sum-reduction along all slices in the first axis, which can be implemented efficiently in a vectorized manner with np.einsum. Thus, the final implementation comes to this -
N = x.shape[2]
m1 = x - x.sum(2,keepdims=1)/N
y_out = np.einsum('ijk,ilk->ijl',m1,m1) /(N - 1)
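The einsum call is just a batched matrix product: for each slice i it computes m1[i] @ m1[i].T by contracting over the last axis. An equivalent matmul spelling (a sketch) -
# 'ijk,ilk->ijl' contracts over k, i.e. m1[i] @ m1[i].T for every slice i
y_out = m1 @ m1.transpose(0, 2, 1) / (N - 1)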
Runtime test
In [155]: def original_app(x):
     ...:     n = x.shape[0]
     ...:     y = np.zeros((n, 2, 2))
     ...:     for i in np.arange(x.shape[0]):
     ...:         y[i] = np.cov(x[i,:,:])
     ...:     return y
     ...:
     ...: def proposed_app(x):
     ...:     N = x.shape[2]
     ...:     m1 = x - x.sum(2,keepdims=1)/N
     ...:     out = np.einsum('ijk,ilk->ijl', m1, m1) / (N - 1)
     ...:     return out
     ...:

In [156]: # Setup inputs
     ...: n = 10000
     ...: x = np.random.rand(n, 2, 4)
     ...:

In [157]: np.allclose(original_app(x), proposed_app(x))
Out[157]: True  # Results verified

In [158]: %timeit original_app(x)
1 loops, best of 3: 610 ms per loop

In [159]: %timeit proposed_app(x)
100 loops, best of 3: 6.32 ms per loop
Huge speedup there!
