numpy - tensor multiplication product


I have a 4 x 4 matrix
import numpy as np
c = np.random.rand(4, 4)
I want to create a 100 x 4 x 4 x 100 tensor such that when the first and last indices are equal, I get back my matrix; otherwise I get zeros.
I can do this in a loop as
Z = np.zeros((100, 4, 4, 100))
for i in range(100):
    Z[i, :, :, i] = c
Is there a better way to do this? I tried looking at np.tensordot and np.einsum but could not figure it out.
Thanks,
Sahil

Use advanced-indexing -
n = 100
Zout = np.zeros((n, 4, 4, n))
I = np.arange(n)
Zout[I,:,:,I] = c
With eye-masking -
n = 100
mask = np.eye(n, dtype=bool)
Zout = np.zeros((n, 4, 4, n))
Zout.transpose(0,3,1,2)[mask] = c
Timings -
In [72]: c = np.random.rand(4,4)
In [73]: %%timeit
...: n = 100
...: Zout = np.zeros((n, 4, 4, n))
...: I = np.arange(n)
...: Zout[I,:,:,I] = c
10000 loops, best of 3: 47.5 µs per loop
In [74]: %%timeit
...: n = 100
...: mask = np.eye(n, dtype=bool)
...: Zout = np.zeros((n, 4, 4, n))
...: Zout.transpose(0,3,1,2)[mask] = c
10000 loops, best of 3: 73.1 µs per loop
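Since the question mentions np.einsum, here is a minimal sketch of how the same tensor could be built as an outer product of an identity matrix with c. The subscript string and the variable names Z_loop, Z_einsum and Z_index are my own choices, not part of the answer above; the result is compared against the loop from the question and against the advanced-indexing version.

```python
import numpy as np

n = 100
c = np.random.rand(4, 4)

# Reference: the explicit loop from the question.
Z_loop = np.zeros((n, 4, 4, n))
for i in range(n):
    Z_loop[i, :, :, i] = c

# Z[i, j, k, l] = eye[i, l] * c[j, k], which is nonzero only where i == l.
Z_einsum = np.einsum('il,jk->ijkl', np.eye(n), c)

# Advanced indexing, as in the answer above.
Z_index = np.zeros((n, 4, 4, n))
I = np.arange(n)
Z_index[I, :, :, I] = c

print(np.array_equal(Z_loop, Z_einsum), np.array_equal(Z_loop, Z_index))
```

The einsum form multiplies every element of the output, so I would not expect it to beat the indexing versions; it is shown only because it answers the einsum part of the question.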

Related

Double loop vectorization and reshape for Kronecker product in numpy

I've got two matrices
import numpy as np
n = 10
a = 2*np.ones((n,n,3))
b = 3*np.ones((n,n,3))
I want to multiply them in a way that resembles a Kronecker product and then sum up the results:
s = 0
for i in range(n):
    for j in range(n):
        s += a*b[i,j]
Does there exist a method to vectorize it in numpy?

Perhaps this can be written more elegantly with np.einsum():
import numpy as np
n = 10
a = 2 * np.ones((n, n, 3))
b = 3 * np.ones((n, n, 3))
s = 0
for i in range(n):
    for j in range(n):
        s += a * b[i, j]
print(s.shape)
# (10, 10, 3)
ss = a * np.einsum('ijk->k', b)
print(ss.shape)
# (10, 10, 3)
print(np.all(s == ss))
# True
or even with just np.sum():
sss = a * np.sum(b, axis=(0, 1))
print(sss.shape)
# (10, 10, 3)
print(np.all(s == sss))
# True
but np.einsum() seems to be faster:
n = 100
a = 2 * np.ones((n, n, 3))
b = 3 * np.ones((n, n, 3))
%timeit f_with_loops(a, b)
# 1 loop, best of 3: 787 ms per loop
%timeit a * np.einsum('ijk->k', b)
# 10000 loops, best of 3: 121 µs per loop
%timeit a * np.sum(b, axis=(0, 1))
# 1000 loops, best of 3: 254 µs per loop

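Because a does not depend on i or j, it factors out of the double sum, so your loop computes a multiplied elementwise by the sum of b over its first two axes.
Thus, this should work: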
s = a * np.sum(np.sum(b,axis=1),axis=0)

Numpy array creation using a sequence

I have seen this, but it doesn't quite answer my question.
I have an array:
x = np.array([0, 1, 2])
I want this:
y = np.array([[0,1], [0,2], [1,0], [1,2], [2,0], [2,1]])
That is, I want to take each value (let's call it i) of the array x and create x.shape[0]-1 new arrays with all of the other values of x, excluding i.
Essentially y contains the indices of a 3x3 matrix without any diagonal elements.
I have a feeling there's an easy, pythonic way of doing this that's just not coming to me.

Approach #1 : One approach would be -
x[np.argwhere(~np.eye(len(x),dtype=bool))]
Approach #2 : In two steps -
r = np.arange(len(x))
out = x[np.argwhere(r[:,None]!=r)]
Approach #3 : For performance, it might be better to create those pairwise coordinates and then mask. To get the pairwise coordinates, let's use cartesian_product_transpose (a helper that is not defined in this post), like so -
r = np.arange(len(x))
mask = r[:,None]!=r
out = cartesian_product_transpose(x,x)[mask.ravel()]
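Approach #3 relies on cartesian_product_transpose, which this page does not define. The function below is only a stand-in written for illustration, assuming the helper returns one row per combination with the first array varying slowest; the original helper may be implemented differently and may be faster.

```python
import numpy as np

def cartesian_product_transpose(*arrays):
    # Stand-in (assumption): one row per combination, first array varies slowest.
    grids = np.meshgrid(*arrays, indexing='ij')
    return np.stack([g.ravel() for g in grids], axis=1)

x = np.array([0, 1, 2])
r = np.arange(len(x))
mask = r[:, None] != r  # False on the diagonal, True elsewhere

# Keep every pair except those that combine an element with itself.
out = cartesian_product_transpose(x, x)[mask.ravel()]
print(out)
```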
Approach #4 : Another with np.broadcast_to that avoids making copies until masking, again meant as a performance measure -
n = len(x)
r = np.arange(n)
mask = r[:,None]!=r
c0 = np.broadcast_to(x[:,None], (n, n))[mask]
c1 = np.broadcast_to(x, (n,n))[mask]
out = np.column_stack((c0,c1))
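As a sanity check, the approaches that need no external helper can be compared against itertools.permutations on the small input from the question. This is a sketch; the names out1, out2, out4 and expected are mine.

```python
import itertools
import numpy as np

x = np.array([0, 1, 2])
n = len(x)
r = np.arange(n)
mask = r[:, None] != r  # off-diagonal positions

# Approach #1 and #2: off-diagonal index pairs, then look up the values in x.
out1 = x[np.argwhere(~np.eye(n, dtype=bool))]
out2 = x[np.argwhere(mask)]

# Approach #4: broadcast views of x, masked, then stacked as two columns.
c0 = np.broadcast_to(x[:, None], (n, n))[mask]
c1 = np.broadcast_to(x, (n, n))[mask]
out4 = np.column_stack((c0, c1))

expected = np.array(list(itertools.permutations(x.tolist(), 2)))
print(all(np.array_equal(expected, o) for o in (out1, out2, out4)))
```

Note that all of these pair up positions, not distinct values, so repeated values in x produce repeated rows, just as itertools.permutations does.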
Runtime test -
In [382]: x = np.random.randint(0,9,(1000))
# @tom10's soln
In [392]: %timeit list(itertools.permutations(x, 2))
10 loops, best of 3: 62 ms per loop
In [383]: %%timeit
...: x[np.argwhere(~np.eye(len(x),dtype=bool))]
100 loops, best of 3: 11.4 ms per loop
In [384]: %%timeit
...: r = np.arange(len(x))
...: out = x[np.argwhere(r[:,None]!=r)]
100 loops, best of 3: 12.9 ms per loop
In [388]: %%timeit
...: r = np.arange(len(x))
...: mask = r[:,None]!=r
...: out = cartesian_product_transpose(x,x)[mask.ravel()]
100 loops, best of 3: 16.5 ms per loop
In [389]: %%timeit
...: n = len(x)
...: r = np.arange(n)
...: mask = r[:,None]!=r
...: c0 = np.broadcast_to(x[:,None], (n, n))[mask]
...: c1 = np.broadcast_to(x, (n,n))[mask]
...: out = np.column_stack((c0,c1))
100 loops, best of 3: 6.72 ms per loop

This is a case where, unless you really need the speed, etc., of numpy, pure Python gives a cleaner solution:
import itertools
y = list(itertools.permutations([0, 1, 2], 2))
# [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]

How to rotate a square numpy array with different times efficiently by `np.rot90`?

I have a 2d numpy array, for example:
a = np.array([
[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
and another 1d array:
I = np.array([0, 2, 3, 1, 0, 2, 0, 1])
I want to rotate a with the np.rot90 function, as follows:
b = np.zeros((len(I), 3, 3))
for i, k in enumerate(I):
    b[i] = np.rot90(a, k=k)
Can I do it more efficiently without the for loop?

Approach #1
Generate a 3D array of all possible 4 rotations and simply index into it with I and thus have a vectorized solution -
P = np.empty((4,) + a.shape, dtype=a.dtype)
P[0] = a # For np.rot90(a, k=0)
P[1] = a.T[::-1] # For np.rot90(a, k=1)
P[2] = a[::-1,::-1] # For np.rot90(a, k=2)
P[3] = a.T[:,::-1] # For np.rot90(a, k=3)
out = P[I]
Approach #2
Another way to create P would be with -
P = np.array([np.rot90(a, k=i) for i in range(4)])
and as with the previous method simply index into P with I for final output.
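The slicing expressions used to fill P can be checked against np.rot90 directly. The sketch below does that for the example array from the question and then confirms that indexing P with I reproduces the loop result; the names views and expected are mine.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
I = np.array([0, 2, 3, 1, 0, 2, 0, 1])

# The four rotations written as transposes and flips, in order k = 0, 1, 2, 3.
views = [a, a.T[::-1], a[::-1, ::-1], a.T[:, ::-1]]
for k, v in enumerate(views):
    assert np.array_equal(v, np.rot90(a, k=k))

P = np.stack(views)  # shape (4, 3, 3)
out = P[I]           # one lookup per entry of I

expected = np.stack([np.rot90(a, k=k) for k in I])
print(np.array_equal(out, expected))
```

If I may contain values outside 0..3, indexing with P[I % 4] should give the same result as np.rot90, which also reduces k modulo 4. The lookup table only works for square a, because odd rotations of a non-square array have a different shape.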
Runtime test
Approaches -
def org_app(a, I):
    m,n = a.shape
    b = np.zeros((len(I), m, n), dtype=a.dtype)
    for i, k in enumerate(I):
        b[i] = np.rot90(a, k=k)
    return b
def app1(a, I):
    P = np.empty((4,) + a.shape, dtype=a.dtype)
    P[0] = a
    P[1] = a.T[::-1]
    P[2] = a[::-1,::-1]
    P[3] = a.T[:,::-1]
    return P[I]
def app2(a, I):
    P = np.array([np.rot90(a, k=i) for i in range(4)])
    return P[I]
Timings -
In [54]: a = np.random.randint(0,9,(10,10))
In [55]: I = np.random.randint(0,4,(10000))
In [56]: %timeit org_app(a, I)
10 loops, best of 3: 51 ms per loop
In [57]: %timeit app1(a, I)
1000 loops, best of 3: 469 µs per loop
In [58]: %timeit app2(a, I)
1000 loops, best of 3: 549 µs per loop
100x+ speedup!

One more efficient way that I can think of (still not vectorized) is using a list comprehension, in one line:
np.array([np.rot90(a, k=i) for i in I])

Vectorized assignment for numpy array with repeated indices (d[i,j,i,j] = s[i,j])

How can I set
d[i,j,i,j] = s[i,j]
using NumPy and without a for loop?
I've tried the following:
l1=range(M)
l2=range(N)
d[l1,l2,l1,l2] = s[l1,l2]

If you think about it, that would be the same as creating a 2D array of shape (m*n, m*n) and assigning the values from s to its diagonal. To have the final output as 4D, we just need a reshape at the end. That is what is implemented below -
m,n = s.shape
d = np.zeros((m*n,m*n),dtype=s.dtype)
d.ravel()[::m*n+1] = s.ravel()
d.shape = (m,n,m,n)
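To see why the stride of m*n+1 works: in a C-ordered (m, n, m, n) array, element [i, j, i, j] sits at flat offset (i*n + j)*(m*n) + (i*n + j), which is (i*n + j)*(m*n + 1). The sketch below checks the flattened-diagonal assignment against a plain double loop on a small example of my own choosing.

```python
import numpy as np

m, n = 3, 4
s = np.arange(1, m * n + 1).reshape(m, n)  # nonzero values, so placement is visible

# Flattened-diagonal assignment, as in the answer above.
d = np.zeros((m * n, m * n), dtype=s.dtype)
d.ravel()[::m * n + 1] = s.ravel()  # ravel() of a contiguous array is a view
d = d.reshape(m, n, m, n)

# Reference: explicit loops.
expected = np.zeros((m, n, m, n), dtype=s.dtype)
for i in range(m):
    for j in range(n):
        expected[i, j, i, j] = s[i, j]

print(np.array_equal(d, expected))
```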
Runtime test
Approaches -
# @MSeifert's solution
def assign_vals_ix(s):
    m,n = s.shape
    d = np.zeros((m, n, m, n), dtype=s.dtype)
    l1 = range(m)
    l2 = range(n)
    d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
    return d
# Proposed in this post
def assign_vals(s):
    m,n = s.shape
    d = np.zeros((m*n,m*n),dtype=s.dtype)
    d.ravel()[::m*n+1] = s.ravel()
    return d.reshape(m,n,m,n)
# Using a strides based approach
def assign_vals_strides(a):
    m,n = a.shape
    p,q = a.strides
    d = np.zeros((m,n,m,n),dtype=a.dtype)
    out_strides = (q*(n*m*n+n),(m*n+1)*q)
    d_view = np.lib.stride_tricks.as_strided(d, (m,n), out_strides)
    d_view[:] = a
    return d
Timings -
In [285]: m,n = 10,10
...: s = np.random.rand(m,n)
...: d = np.zeros((m,n,m,n))
...:
In [286]: %timeit assign_vals_ix(s)
10000 loops, best of 3: 21.3 µs per loop
In [287]: %timeit assign_vals_strides(s)
100000 loops, best of 3: 9.37 µs per loop
In [288]: %timeit assign_vals(s)
100000 loops, best of 3: 4.13 µs per loop
In [289]: m,n = 20,20
...: s = np.random.rand(m,n)
...: d = np.zeros((m,n,m,n))
In [290]: %timeit assign_vals_ix(s)
10000 loops, best of 3: 60.2 µs per loop
In [291]: %timeit assign_vals_strides(s)
10000 loops, best of 3: 41.8 µs per loop
In [292]: %timeit assign_vals(s)
10000 loops, best of 3: 35.5 µs per loop

You can use integer array indexing (creating the broadcasted indices with np.ix_):
d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
The first time the indices have to be duplicated (you want [i, j, i, j] instead of just [i, j]) that's why I multiplied the tuple returned by np.ix_ with 2.
For example:
>>> d = np.zeros((10, 10, 10, 10), dtype=int)
>>> s = np.arange(100).reshape(10, 10)
>>> l1 = range(3)
>>> l2 = range(5)
>>> d[np.ix_(l1,l2)*2] = s[np.ix_(l1,l2)]
And to make sure that the correct values were assigned:
>>> # Assert equality for the given condition
>>> for i in l1:
...     for j in l2:
...         assert d[i, j, i, j] == s[i, j]
>>> # Interactive tests
>>> d[0, 0, 0, 0], s[0, 0]
(0, 0)
>>> d[1, 2, 1, 2], s[1, 2]
(12, 12)
>>> d[2, 0, 2, 0], s[2, 0]
(20, 20)
>>> d[2, 4, 2, 4], s[2, 4]
(24, 24)
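To make the tuple-doubling step more concrete, the sketch below prints what np.ix_ returns for the ranges used in the example and shows that multiplying the tuple by 2 simply repeats the two index arrays. The names ix and idx4 are mine.

```python
import numpy as np

l1, l2 = range(3), range(5)

ix = np.ix_(l1, l2)             # tuple of two open-mesh index arrays
print([t.shape for t in ix])    # shapes (3, 1) and (1, 5)

idx4 = ix * 2                   # tuple repetition: (rows, cols, rows, cols)
print(len(idx4))

d = np.zeros((10, 10, 10, 10), dtype=int)
s = np.arange(100).reshape(10, 10)

# The four index arrays broadcast to (3, 5), so element (i, j) of the
# right-hand side lands at d[i, j, i, j].
d[idx4] = s[ix]
print(d[2, 4, 2, 4], s[2, 4])
```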

Efficient numpy subarrays extraction from a mask

I am looking for a pythonic way to extract multiple subarrays from a given array using a mask, as shown in the example:
a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])
The output should be a collection of arrays like the following, where each contiguous "region" of True values (True values next to each other) in the mask m gives the indices of one subarray.
L[0] = np.array([10, 5])
L[1] = np.array([2, 1])

Here's one approach -
def separate_regions(a, m):
    m0 = np.concatenate(( [False], m, [False] ))
    idx = np.flatnonzero(m0[1:] != m0[:-1])
    return [a[idx[i]:idx[i+1]] for i in range(0,len(idx),2)]
Sample run -
In [41]: a = np.array([10, 5, 3, 2, 1])
...: m = np.array([True, True, False, True, True])
...:
In [42]: separate_regions(a, m)
Out[42]: [array([10, 5]), array([2, 1])]
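The intermediate values may help explain how separate_regions works. Padding the mask with False on both sides guarantees that every True run has both a rising and a falling edge, so idx always holds an even number of entries that alternate between start and stop positions. The sketch below walks through the sample input; the names bounds and regions are mine.

```python
import numpy as np

a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])

# Pad with False so runs touching either end still produce two edges.
m0 = np.concatenate(([False], m, [False]))

# Positions where the padded mask changes value: start, stop, start, stop, ...
idx = np.flatnonzero(m0[1:] != m0[:-1])
print(idx)  # [0 2 3 5]

bounds = idx.reshape(-1, 2)  # one (start, stop) row per True run
regions = [a[start:stop] for start, stop in bounds]
print(regions)
```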
Runtime test
Other approach(es) -
# @kazemakase's soln
def zip_split(a, m):
    d = np.diff(m)
    cuts = np.flatnonzero(d) + 1
    asplit = np.split(a, cuts)
    msplit = np.split(m, cuts)
    L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
    return L
Timings -
In [49]: a = np.random.randint(0,9,(100000))
In [50]: m = np.random.rand(100000)>0.2
# @kazemakase's solution
In [51]: %timeit zip_split(a,m)
10 loops, best of 3: 114 ms per loop
# @Daniel Forsman's solution
In [52]: %timeit splitByBool(a,m)
10 loops, best of 3: 25.1 ms per loop
# Proposed in this post
In [53]: %timeit separate_regions(a, m)
100 loops, best of 3: 5.01 ms per loop
Increasing the average length of islands -
In [58]: a = np.random.randint(0,9,(100000))
In [59]: m = np.random.rand(100000)>0.1
In [60]: %timeit zip_split(a,m)
10 loops, best of 3: 64.3 ms per loop
In [61]: %timeit splitByBool(a,m)
100 loops, best of 3: 14 ms per loop
In [62]: %timeit separate_regions(a, m)
100 loops, best of 3: 2.85 ms per loop

def splitByBool(a, m):
    if m[0]:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[::2]
    else:
        return np.split(a, np.nonzero(np.diff(m))[0] + 1)[1::2]
This will return a list of arrays, split into chunks of True in m
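A short usage sketch, with the function restated in condensed form so the block runs on its own. After splitting at every change in the mask, the pieces alternate between True runs and False runs, so which parity to keep depends only on m[0]. The second input, whose mask starts with False, is an example of my own and assumes m is a boolean array.

```python
import numpy as np

def splitByBool(a, m):
    # Condensed restatement of the function above.
    cuts = np.nonzero(np.diff(m))[0] + 1  # positions where the mask changes
    pieces = np.split(a, cuts)            # alternating True / False runs
    return pieces[::2] if m[0] else pieces[1::2]

a = np.array([10, 5, 3, 2, 1])

m_true_first = np.array([True, True, False, True, True])
res1 = splitByBool(a, m_true_first)
print(res1)

m_false_first = np.array([False, True, True, False, True])
res2 = splitByBool(a, m_false_first)
print(res2)
```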

Sounds like a natural application for np.split.
You first have to figure out where to cut the array, which is where the mask changes between True and False. Next discard all elements where the mask is False.
a = np.array([10, 5, 3, 2, 1])
m = np.array([True, True, False, True, True])
d = np.diff(m)
cuts = np.flatnonzero(d) + 1
asplit = np.split(a, cuts)
msplit = np.split(m, cuts)
L = [aseg for aseg, mseg in zip(asplit, msplit) if np.all(mseg)]
print(L[0]) # [10 5]
print(L[1]) # [2 1]
