I have N matrices with dimensions R x R and one 'Weight matrix' with dimension R x N.
Now I want to combine those N matrices row-wise by weighting them with the 'Weight matrix'. In the end I want an R x R matrix.
Let me show you an example:
In the following example my initial matrices are a and b and my weight matrix is c. The desired output is matrix r.
The first row of r is the first row of a, because c[0,0] is 1 and c[0,1] is 0, so we just consider the first row of matrix a.
The second row of r is a weighted average of row 2 from both matrix a and b (because c[1,0]= 0.5 and c[1,1] = 0.5).
The third row of r is the third row of b, because c[2,0] is 0 and c[2,1] is 1, so we just consider the third row of matrix b.
How can I do this in Python (preferably with a numpy function)?
We can use np.einsum -
In [57]: A # 3D input array
Out[57]:
array([[[0.2, 0. , 0.8],
[0. , 0. , 1. ],
[0. , 0.2, 0.8]],
[[1. , 0. , 0. ],
[0. , 0.2, 0.8],
[0.2, 0. , 0.8]]])
In [58]: c # 2D weight array
Out[58]:
array([[1. , 0. ],
[0.5, 0.5],
[0. , 1. ]])
In [59]: np.einsum('ijk,ji->jk',A,c)
Out[59]:
array([[0.2, 0. , 0.8],
[0. , 0.1, 0.9],
[0.2, 0. , 0.8]])
Alternatively with np.matmul -
In [142]: (np.matmul(A.transpose(1,2,0),c[...,None]))[...,0]
Out[142]:
array([[0.2, 0. , 0.8],
[0. , 0.1, 0.9],
[0.2, 0. , 0.8]])
Note: on Python 3.5+, np.matmul can be replaced by the @ operator.
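As a sanity check, here is a minimal, self-contained sketch that compares the einsum and matmul forms with an explicit loop over rows and matrices. The loop, the broadcasting variant and the names expected, via_einsum, via_matmul and via_broadcast are illustrative additions, not part of the answer above.

```python
import numpy as np

# Stack of N = 2 matrices, each R x R with R = 3 (values from the sample run).
A = np.array([[[0.2, 0.0, 0.8],
               [0.0, 0.0, 1.0],
               [0.0, 0.2, 0.8]],
              [[1.0, 0.0, 0.0],
               [0.0, 0.2, 0.8],
               [0.2, 0.0, 0.8]]])

# R x N weights: c[j, i] is the weight given to row j of matrix i.
c = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])

# Reference result: accumulate each weighted row explicitly.
expected = np.zeros((3, 3))
for j in range(3):          # output row
    for i in range(2):      # source matrix
        expected[j] += c[j, i] * A[i, j]

via_einsum = np.einsum('ijk,ji->jk', A, c)
via_matmul = (A.transpose(1, 2, 0) @ c[..., None])[..., 0]
# Plain broadcasting: weights reshaped to (N, R, 1), then summed over N.
via_broadcast = (A * c.T[:, :, None]).sum(axis=0)

print(np.allclose(via_einsum, expected),
      np.allclose(via_matmul, expected),
      np.allclose(via_broadcast, expected))
```

All three vectorized forms compute r[j, k] = sum over i of c[j, i] * A[i, j, k]; which one is fastest depends on the array sizes, so it is worth timing them on real data.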
I have a NumPy array made of ragged nested sequences such as the following:
arr = np.array((
np.random.random((2, 2, 2)),
np.random.random((4, 4, 4)),
np.random.random((2, 2, 2))
), dtype=object)  # dtype=object is required for ragged input on NumPy >= 1.24
I want to resize each of the nested arrays to the shape (4, 4, 4) by filling it with zeros.
I initially looked at the post "numpy - resize array filling with 0", which works for 2D NumPy arrays, but I have struggled to adapt it to a 3D NumPy array.
So far I have tried iterating over the individual nested arrays; however, even fairly basic code such as
for i, a in enumerate(arr[0]):
    arr[0][i] = np.hstack([a, np.zeros([a.shape[0], 2])])
still raises an error:
ValueError: could not broadcast input array from shape (2,4) into shape (2,2)
I could create separate variables for every nested array, but this feels very slow and inefficient, and I'd need even messier code to extend this to all 3 dimensions.
An example of a test:
arr = [[[0.1, 0.4],
[0.3, 0.7]],
[[0.5, 0.2],
[0.8, 0.1]]]
If I wanted it to have the shape (2, 3, 4) the output would be the following
[[[0.1, 0.4, 0.0, 0.0],
[0.3, 0.7, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0]],
[[0.5, 0.2, 0.0, 0.0],
[0.8, 0.1, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0]]]
UPDATE:
Don't even need to use pad then:
def pad_3d(arr: np.ndarray, out_shape: tuple[int, int, int]) -> np.ndarray:
    x, y, z = arr.shape
    output = np.zeros(out_shape, dtype=arr.dtype)
    output[:x, :y, :z] = arr
    return output
test_arr = np.array(
[[[0.1, 0.4],
[0.3, 0.7]],
[[0.5, 0.2],
[0.8, 0.1]]]
)
desired_shape = (2, 3, 4)
expected_output = np.array(
[[[0.1, 0.4, 0.0, 0.0],
[0.3, 0.7, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0]],
[[0.5, 0.2, 0.0, 0.0],
[0.8, 0.1, 0.0, 0.0],
[0.0, 0.0, 0.0, 0.0]]]
)
assert np.all(expected_output == pad_3d(test_arr, desired_shape)) # True
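The same slice-assignment idea extends to any number of dimensions and to a whole list of ragged arrays. The following is only a sketch, assuming every target dimension is at least as large as the source; pad_to_shape and the sample list arrays are hypothetical names introduced for illustration.

```python
import numpy as np

def pad_to_shape(arr, out_shape):
    """Copy arr into the low-index corner of a zero array of shape out_shape."""
    if arr.ndim != len(out_shape):
        raise ValueError("out_shape must have one entry per array dimension")
    if any(s > t for s, t in zip(arr.shape, out_shape)):
        raise ValueError("out_shape must not be smaller than arr.shape")
    out = np.zeros(out_shape, dtype=arr.dtype)
    # Build the index [:s0, :s1, ..., :sk] for an arbitrary number of axes.
    out[tuple(slice(0, s) for s in arr.shape)] = arr
    return out

# Ragged input like in the question (ones instead of random values).
arrays = [np.ones((2, 2, 2)), np.ones((4, 4, 4)), np.ones((2, 2, 2))]

# After padding, all pieces share one shape and can be stacked.
padded = np.stack([pad_to_shape(x, (4, 4, 4)) for x in arrays])
print(padded.shape)
```

Stacking the padded pieces gives one regular 4D array instead of an object array of differently sized blocks.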
Original answer:
It's not entirely clear how you want to fill the resulting arrays with zeros around your data. Only on one side along each axis? Or do you want to essentially "center" your original data amidst the zeros?
Either way, I see no way around creating new arrays. The pad function does what you want, I think. Here is a simplified example for one array, where I "pad around" the data:
import numpy as np
a = np.arange(2*2*2).reshape((2, 2, 2))
x = np.pad(a, 1)  # pad_width=1: one zero before and after the data along every axis
If you want to pad on one side with zeros:
x = np.pad(a, (0, 2))
Assuming your arrays are always cubic, i.e. of the shape (n, n, n), you can generalize like this:
def pad_with_zeros(arr, target_size):
    return np.pad(arr, (0, target_size - arr.shape[0]))
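If the arrays are not cubic, np.pad also accepts one (before, after) pair per axis. A minimal sketch under that assumption; pad_to_target is a hypothetical name, and the test data is the 2 x 2 x 2 example from the question:

```python
import numpy as np

def pad_to_target(arr, target_shape):
    """Zero-pad arr at the end of each axis until it has target_shape."""
    # One (before, after) pair per axis: nothing before, the shortfall after.
    pad_width = [(0, t - s) for s, t in zip(arr.shape, target_shape)]
    return np.pad(arr, pad_width)  # default mode pads with constant zeros

test_arr = np.array([[[0.1, 0.4],
                      [0.3, 0.7]],
                     [[0.5, 0.2],
                      [0.8, 0.1]]])

result = pad_to_target(test_arr, (2, 3, 4))
print(result.shape)
```

np.pad raises a ValueError for negative pad widths, so a target that is smaller than the input along some axis fails loudly rather than silently truncating.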
IIUC, here is one way to do it:
Assuming your arr is actually a list or a tuple:
arr = (
np.random.random((2, 2, 2)),
np.random.random((4, 4, 4)),
np.random.random((2, 2, 2)),
)
# new shape: max length in each dimension:
shape = np.c_[[x.shape for x in arr]].max(0)
>>> shape
array([4, 4, 4])
# pad all arrays
new = [np.pad(x, np.c_[[0]*len(shape), shape - x.shape]) for x in arr]
>>> new[0].shape
(4, 4, 4)
>>> new[0]
array([[[0.5488135 , 0.71518937, 0. , 0. ],
[0.60276338, 0.54488318, 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]],
[[0.4236548 , 0.64589411, 0. , 0. ],
[0.43758721, 0.891773 , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]],
[[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]],
[[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. ]]])
I have two arrays in python.
For a, it looks like
array([[0. , 0.08],
[0.12, 0. ],
[0.12, 0.08]])
For b, it looks like
array([[0.88, 0. ],
[0. , 0.92],
[0. , 0. ]])
I want to do the multiplication for these two arrays like below:
array([[0.08*0.88], ### 1st row of a multiplies 1st row of b without zeros
[0.12*0.92], ### 2nd row of a multiplies 2nd row of b without zeros
[0.12*0.08]]) ### multiplies 0.12 and 0.08 together in 3rd row of a without zeros in 3rd row of b
And the final desired result is:
array([[0.0704],
[0.1104],
[0.0096]])
How can I achieve this? I could really use your help.
Just replace zero values by 1 on both the arrays, then pass a*b to np.prod with axis=1, and keepdims=True:
>>> a[a==0] = 1
>>> b[b==0] = 1
>>> np.prod(a*b, axis=1, keepdims=True)
#output:
array([[0.0704],
[0.1104],
[0.0096]])
Consider the following strategy:
a = np.array([[0. , 0.08],
[0.12, 0. ],
[0.12, 0.08]])
b = np.array([[0.88, 0. ],
[0. , 0.92],
[0. , 0. ]])
c = np.hstack([a, b]) # stick a and b together along axis 1
d = np.where(c == 0, 1, c) # turn the 0s into 1s
result = np.prod(d, axis=1) # calculate the product along axis 1
# array([0.0704, 0.1104, 0.0096])
You can do it like this:
# First concatenate both arrays (a and b from the question)
temp = np.concatenate((a, b), axis=1)
'''
the result will be like this
array([[0. , 0.08, 0.88, 0. ],
[0.12, 0. , 0. , 0.92],
[0.12, 0.08, 0. , 0. ]])
'''
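# Sort each row in ascending order (in place); this relies on every row having exactly two nonzero, positive values, which then end up last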
temp.sort()
'''
result: array([[0. , 0. , 0.08, 0.88],
[0. , 0. , 0.12, 0.92],
[0. , 0. , 0.08, 0.12]])
'''
res = temp[:, -1] * temp[:, -2]
'''
result: array([0.0704, 0.1104, 0.0096])
'''
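Another option that leaves a and b untouched and does not depend on how many nonzero entries each row has is the where argument of np.prod (available since NumPy 1.17). This is a sketch rather than one of the answers above; note that a row consisting only of zeros would yield 1, the empty product.

```python
import numpy as np

a = np.array([[0.  , 0.08],
              [0.12, 0.  ],
              [0.12, 0.08]])
b = np.array([[0.88, 0.  ],
              [0.  , 0.92],
              [0.  , 0.  ]])

# Put both arrays side by side so each row holds all candidate factors.
c = np.hstack([a, b])

# Multiply only the nonzero entries of each row; zeros are skipped.
result = np.prod(c, axis=1, keepdims=True, where=(c != 0))
print(result)
```

Because nothing is overwritten, the original zeros in a and b are still available afterwards, unlike with the in-place a[a==0] = 1 approach.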
I would like to calculate the sum of every two adjacent columns in a matrix (the sum of columns 0 and 1, of columns 2 and 3, and so on).
I tried to do this with nested "for" loops, but I never get the right result.
For example:
c = np.array([[0,0,0.25,0.5],[0,0.5,0.25,0],[0.5,0,0,0]],float)
freq=np.zeros(6,float).reshape((3, 2))
# I calculate the sum of the first and second columns, and of the third and fourth columns
for i in range(0,4,2):
    for j in range(1,4,2):
        for p in range(0,2):
            freq[:,p]=(c[:,i]+c[:,j])
But the result is:
print freq
array([[ 0.75, 0.75],
[ 0.25, 0.25],
[ 0. , 0. ]])
Normally the correct result should be (0., 0.5, 0.5) and (0.75, 0.25, 0.), so I think the problem is in the nested "for" loops.
Does anyone know how I can calculate the sum of every two columns? My real matrix has 400 columns.
You can simply reshape to split the last dimension into two dimensions, with the last dimension of length 2 and then sum along it, like so -
freq = c.reshape(c.shape[0],-1,2).sum(2).T
Reshaping only creates a view into the array, so effectively the only real work is the summation, which should make this efficient.
Sample run -
In [17]: c
Out[17]:
array([[ 0. , 0. , 0.25, 0.5 ],
[ 0. , 0.5 , 0.25, 0. ],
[ 0.5 , 0. , 0. , 0. ]])
In [18]: c.reshape(c.shape[0],-1,2).sum(2).T
Out[18]:
array([[ 0. , 0.5 , 0.5 ],
[ 0.75, 0.25, 0. ]])
Add the slices c[:, ::2] and c[:, 1::2]:
In [62]: c
Out[62]:
array([[ 0. , 0. , 0.25, 0.5 ],
[ 0. , 0.5 , 0.25, 0. ],
[ 0.5 , 0. , 0. , 0. ]])
In [63]: c[:, ::2] + c[:, 1::2]
Out[63]:
array([[ 0. , 0.75],
[ 0.5 , 0.25],
[ 0.5 , 0. ]])
Here is one way using np.split():
In [36]: np.array(np.split(c, np.arange(2, c.shape[1], 2), axis=1)).sum(axis=-1)
Out[36]:
array([[ 0. , 0.5 , 0.5 ],
[ 0.75, 0.25, 0. ]])
Or as a more general way even for odd length arrays:
In [87]: def vertical_adder(array):
   ....:     return np.column_stack([np.sum(arr, axis=1) for arr in np.array_split(array, np.arange(2, array.shape[1], 2), axis=1)])
   ....:
In [88]: vertical_adder(c)
Out[88]:
array([[ 0. , 0.75],
[ 0.5 , 0.25],
[ 0.5 , 0. ]])
In [94]: a
Out[94]:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
In [95]: vertical_adder(a)
Out[95]:
array([[ 1, 5, 4],
[11, 15, 9],
[21, 25, 14]])
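For completeness, np.add.reduceat can express the same pairwise sum without a Python-level loop and also copes with an odd number of columns. This is a sketch that is not taken from the answers above; the names starts, pair_sums and odd_sums are illustrative.

```python
import numpy as np

c = np.array([[0, 0, 0.25, 0.5],
              [0, 0.5, 0.25, 0],
              [0.5, 0, 0, 0]], float)

# Start index of every pair of columns: 0, 2, 4, ...
starts = np.arange(0, c.shape[1], 2)

# Sum c[:, start:next_start] for each start; the last group runs to the end.
pair_sums = np.add.reduceat(c, starts, axis=1)
print(pair_sums)

# With an odd number of columns the last column simply forms its own group.
a = np.arange(15).reshape(3, 5)
odd_sums = np.add.reduceat(a, np.arange(0, a.shape[1], 2), axis=1)
print(odd_sums)
```

The result has one row per input row, like the slicing answer; transpose it with .T to get the layout produced by the reshape answer.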
Suppose we had two arrays: some values, e.g. array([1.2, 1.4, 1.6]), and some indices (let's say, array([0, 2, 1])) Our output is expected to be the values put into a bigger array, "addressed" by the indices, so we would get
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
Is there a way to do this without loops, in a nice, fast way?
With
from numpy import zeros, array, r_
a = zeros((3,3))
b = array([0, 2, 1])
vals = array([1.2, 1.4, 1.6])
You just need to index it (with the help of arange or r_):
>>> a[r_[:len(b)], b] = vals
>>> a
array([[ 1.2, 0. , 0. ],
[ 0. , 0. , 1.4],
[ 0. , 1.6, 0. ]])
How do we modify this for higher dimensions? For example, a is a 5x4x3 array and b and vals are 5x4 arrays.
How would the statement a[r_[:len(b)], b] = vals need to change then?
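One possible way to handle the higher-dimensional case, sketched here under the assumption that b holds indices into the last axis of a: either use np.put_along_axis (NumPy 1.15+), or build broadcastable index grids for the leading axes with np.ogrid. The random test data and the names a2, i, j are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.zeros((5, 4, 3))
b = rng.integers(0, 3, size=(5, 4))   # index into the last axis of a
vals = rng.random((5, 4))             # value to store at that index

# Variant 1: put_along_axis needs indices and values with the same ndim as a.
np.put_along_axis(a, b[..., None], vals[..., None], axis=-1)

# Variant 2: open index grids for the first two axes, broadcast against b.
a2 = np.zeros((5, 4, 3))
i, j = np.ogrid[:5, :4]               # shapes (5, 1) and (1, 4)
a2[i, j, b] = vals

print(np.array_equal(a, a2))
```

Variant 2 is the direct generalization of a[r_[:len(b)], b] = vals: each leading axis gets its own index array, shaped so that all of them broadcast to the shape of b.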
I have a matrix that should have ones on the diagonal but the columns are mixed up.
But I don't know how, without the obvious for loop, to efficiently interchange rows to get unity on the diagonals. I'm not even sure what key I would pass to sort on.
Any suggestions?
You can use numpy's argmax to determine the goal column ordering and reorder your matrix using the argmax results as column indices:
>>> import numpy
>>> z = numpy.array([[ 0.1 , 0.1 , 1. ],
... [ 1. , 0.1 , 0.09],
... [ 0.1 , 1. , 0.2 ]])
>>> numpy.argmax(z, axis=1)  # goal column indices
array([2, 0, 1])
>>> z[:, numpy.argmax(z, axis=1)]
array([[ 1. , 0.1 , 0.1 ],
[ 0.09, 1. , 0.1 ],
[ 0.2 , 0.1 , 1. ]])
>>> import numpy as np
>>> a = np.array([[ 1. , 0.5, 0.5, 0. ],
... [ 0.5, 0.5, 1. , 0. ],
... [ 0. , 1. , 0. , 0.5],
... [ 0. , 0.5, 0.5, 1. ]])
>>> np.array(sorted(a, key=lambda row: list(row).index(1)))  # cmp= was removed in Python 3; key= works on Python 2 and 3
array([[ 1. , 0.5, 0.5, 0. ],
[ 0. , 1. , 0. , 0.5],
[ 0.5, 0.5, 1. , 0. ],
[ 0. , 0.5, 0.5, 1. ]])
It actually sorts by rows, not columns (but the result is the same). It works by sorting by the index of the column the 1 is in.
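The same row ordering can be obtained without sorted() by combining argmax and argsort. A minimal sketch, assuming every row contains the value 1 exactly once and the positions of the ones form a permutation; the names ones_position, order and result are illustrative.

```python
import numpy as np

a = np.array([[1. , 0.5, 0.5, 0. ],
              [0.5, 0.5, 1. , 0. ],
              [0. , 1. , 0. , 0.5],
              [0. , 0.5, 0.5, 1. ]])

# Column index of the 1 in each row (first True of the boolean mask).
ones_position = (a == 1).argmax(axis=1)

# Order the rows so that the row with its 1 in column k ends up at position k.
order = np.argsort(ones_position)
result = a[order]
print(result)
```

Comparing floats with == is fine for exact ones like these; for values that are only approximately 1, np.isclose(a, 1) would be a safer mask.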