I have a batch of two samples, each holding two 3x3 matrices (shape (2, 3, 3) per sample), as follows:
batch= np.asarray([
[
[[1,2,3],
[3,1,1,],
[4,9,0,]],
[[2,2,2],
[5,6,7],
[3,3,3]]
],
[
[[2,2,2],
[5,6,7],
[3,3,3]],
[[1,2,3],
[3,1,1],
[4,9,0]]
]
])
Correspondingly, I have a batch of two samples of scalers, each of shape (2, 1, 1), as follows:
scalers = np.asarray([
[
[[1]],
[[2]]
],
[
[[0]],
[[3]]
]
])
Each matrix in the batch should be multiplied by its corresponding scaler in the scalers array. For example:
# the first matrix
1 * [[1,2,3],
     [3,1,1],
     [4,9,0]]
# the second matrix
2 * [[2,2,2],
     [5,6,7],
     [3,3,3]]
.
.
.
# the last matrix
3 * [[1,2,3],
     [3,1,1],
     [4,9,0]]
So the expected output should be the following:
[
[
[[1 2 3],
[3 1 1 ],
[4 9 0 ]],
[[4 4 4],
[10 12 14],
[6 6 6]]
],
[
[[0 0 0],
[0 0 0],
[0 0 0]],
[[3 6 9],
[9 3 3],
[12 27 0]]
]
]
I was trying to avoid any loops by doing
batch * scalers
but it did not seem correct. How can I get the behavior above?
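As a sanity check (assuming the arrays are exactly as posted, i.e. batch has shape (2, 2, 3, 3) and scalers has shape (2, 2, 1, 1)), plain elementwise multiplication does broadcast the trailing (1, 1) axes of scalers against each (3, 3) matrix:

```python
import numpy as np

# batch has shape (2, 2, 3, 3): two samples, each holding two 3x3 matrices
batch = np.asarray([
    [[[1, 2, 3], [3, 1, 1], [4, 9, 0]],
     [[2, 2, 2], [5, 6, 7], [3, 3, 3]]],
    [[[2, 2, 2], [5, 6, 7], [3, 3, 3]],
     [[1, 2, 3], [3, 1, 1], [4, 9, 0]]],
])

# scalers has shape (2, 2, 1, 1); the trailing (1, 1) axes broadcast
# against each (3, 3) matrix in batch
scalers = np.asarray([
    [[[1]], [[2]]],
    [[[0]], [[3]]],
])

out = batch * scalers
print(out[0, 1])  # second matrix of the first sample, scaled by 2
```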
I have a NumPy array X with shape (2, 2, 3), like this:
X = [[[1, 2, 3],
      [4, 5, 6]],
     [[7, 8, 9],
      [10, 11, 12]]]
I want to flatten all subarrays and turn X into shape (2, 6), represented like this:
X = [[ 1 , 2 , 3 , 4, 5 , 6 ] ,
[ 7 , 8 , 9 , 10, 11 , 12 ] ]
But when I used X.flatten(), it just turned out like this:
X = [ 1, 2, 3, 4, 5, ..., 12]
Is there a function to transform the array the way I described?
Just reshape....
import numpy as np
arr = np.array([[[1 , 2 , 3 ],
[4 , 5 , 6]],
[[7 , 8 , 9 ],
[10, 11, 12 ]]])
arr.reshape(2, 6)
result:
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12]])
Iterate over the array and flatten the sublists
arr = np.array([x.flatten() for x in X])
Or, for a NumPy solution, you can also use np.hstack():
arr = np.hstack(X)
Output
print(arr)
#[[ 1 2 3 4 5 6]
# [ 7 8 9 10 11 12]]
Loop through the array and flatten the subcomponents:
x = np.array([[[1,2],[3,4]], [[5,6],[7,8]]])
y = np.array([i.flatten() for i in x])
print(x)
print(y)
[[[1 2]
[3 4]]
[[5 6]
[7 8]]]
[[1 2 3 4]
[5 6 7 8]]
If it's a NumPy array you can use reshape:
x.reshape(2,6)
Input:
x = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
x.reshape(2,6)
Output:
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12]])
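If the number of trailing elements isn't known in advance, a -1 lets NumPy infer it; a small generalization of the answers above:

```python
import numpy as np

x = np.array([[[1, 2, 3], [4, 5, 6]],
              [[7, 8, 9], [10, 11, 12]]])

# keep the first axis, let NumPy infer the rest (-1 resolves to 6 here)
flat = x.reshape(x.shape[0], -1)
print(flat)
```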
Broadcasting is only possible (as far as I know) between arrays whose shapes match from the end (shape [4,3,2] is broadcastable with shapes [2], [3,2], and [4,3,2]). But why?
Consider the following example:
np.zeros([4,3,2])
[[[0 0]
[0 0]
[0 0]]
[[0 0]
[0 0]
[0 0]]
[[0 0]
[0 0]
[0 0]]
[[0 0]
[0 0]
[0 0]]]
Why broadcasting with [1,2,3], or [1,2,3,4] isn't possible?
Adding with [1,2,3] (shape: [3], target shape: [4,3,2]) expected result:
[[[1 1]
[2 2]
[3 3]]
[[1 1]
[2 2]
[3 3]]
[[1 1]
[2 2]
[3 3]]
[[1 1]
[2 2]
[3 3]]]
Adding with [1,2,3,4] (shape: [4], target shape: [4,3,2]) expected result:
[[[1 1]
[1 1]
[1 1]]
[[2 2]
[2 2]
[2 2]]
[[3 3]
[3 3]
[3 3]]
[[4 4]
[4 4]
[4 4]]]
Or, if there are concerns about multidimensional broadcasting this way, adding with:
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]
(shape: [4,3], target shape: [4,3,2]) expected result:
[[[ 1 1]
[ 2 2]
[ 3 3]]
[[ 4 4]
[ 5 5]
[ 6 6]]
[[ 7 7]
[ 8 8]
[ 9 9]]
[[10 10]
[11 11]
[12 12]]]
So basically what I'm saying is that I can't see a reason why it couldn't find the matching shape, and do the operations respectively. If there's multiple dimensions matching in the target matrix, just select the last one automatically, or have the option to specify which dimension we want to perform the operation.
Any ideas/suggestions?
The broadcasting rules are simple and unambiguous:
1. add leading size-1 dimensions as needed to match the total number of dimensions
2. expand every size-1 dimension as needed to match
With (4,3,2):
(2,)  => (1,1,2) => (4,3,2)
(3,2) => (1,3,2) => (4,3,2)
(3,)  => (1,1,3) => ERROR (trailing 3 does not match 2)
(4,)  => (1,1,4) => ERROR (trailing 4 does not match 2)
(4,3) => (1,4,3) => ERROR (3 does not match 2)
With reshape or np.newaxis we can add explicit new dimensions in the right place:
(3,1) => (1,3,1) => (4,3,2)
(4,1,1) => (4,3,2)
(4,3,1) => (4,3,2)
Why doesn't it add those dimensions automatically? Potential ambiguity. Without those rules, especially the 'add only leading', it would be possible to add the extra dimension in several different places.
e.g.
(2,3,3) + (3,) => is that (1,1,3) or (1,3,1)?
(2,3,3,3) + (3,3)
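The expected results from the question can be reproduced by placing the new axes explicitly with np.newaxis, which resolves the ambiguity by hand:

```python
import numpy as np

z = np.zeros((4, 3, 2))
v3 = np.array([1, 2, 3])
v4 = np.array([1, 2, 3, 4])

# z + v3 would raise: (3,) pads to (1,1,3), and 3 != 2 on the last axis.
# Inserting the new axes explicitly says which dimension v3 runs along:
a = z + v3[np.newaxis, :, np.newaxis]   # shape (1, 3, 1) -> broadcasts to (4, 3, 2)
b = z + v4[:, np.newaxis, np.newaxis]   # shape (4, 1, 1) -> broadcasts to (4, 3, 2)
print(a[0, :, 0])  # [1. 2. 3.]
print(b[:, 0, 0])  # [1. 2. 3. 4.]
```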
I have a matrix of shape (150, 2) and I want to duplicate each row N times. Here is what I mean with an example.
Input:
a = [[2, 3], [5, 6], [7, 9]]
Suppose N = 3; I want this output:
[[2 3]
[2 3]
[2 3]
[5 6]
[5 6]
[5 6]
[7 9]
[7 9]
[7 9]]
Thank you.
Use np.repeat with parameter axis=0 as:
a = np.array([[2, 3],[5, 6],[7, 9]])
print(a)
[[2 3]
[5 6]
[7 9]]
r_a = np.repeat(a, repeats=3, axis=0)
print(r_a)
[[2 3]
[2 3]
[2 3]
[5 6]
[5 6]
[5 6]
[7 9]
[7 9]
[7 9]]
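Worth noting the difference from np.tile, which repeats the whole block rather than each row; the two are easy to mix up when duplicating rows:

```python
import numpy as np

a = np.array([[2, 3], [5, 6], [7, 9]])

rep = np.repeat(a, repeats=3, axis=0)  # each row repeated three times in a row
til = np.tile(a, (3, 1))               # the whole matrix stacked three times
print(rep[:3])  # [[2 3] [2 3] [2 3]]
print(til[:3])  # [[2 3] [5 6] [7 9]]
```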
To create an empty multidimensional array in NumPy (e.g. a 2D m*n array to store your matrix), in case you don't know how many rows m you will append and don't care about the computational cost Stephen Simmons mentioned (namely re-building the array at each append), you can squeeze to 0 the dimension you want to append to: X = np.empty(shape=[0, n]).
This way you can, for example, do the following (here m = 5, which we assume we didn't know when creating the empty matrix, and n = 2):
import numpy as np
n = 2
X = np.empty(shape=[0, n])
for i in range(5):
    for j in range(2):
        X = np.append(X, [[i, j]], axis=0)
print(X)
which will give you:
[[ 0. 0.]
[ 0. 1.]
[ 1. 0.]
[ 1. 1.]
[ 2. 0.]
[ 2. 1.]
[ 3. 0.]
[ 3. 1.]
[ 4. 0.]
[ 4. 1.]]
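Since np.append rebuilds the array on every call, a common alternative (a sketch, not from the answer above) is to collect rows in a Python list and convert once at the end:

```python
import numpy as np

rows = []
for i in range(5):
    for j in range(2):
        rows.append([i, j])

# one allocation at the end instead of a copy per append
X = np.array(rows, dtype=float)  # shape (10, 2)
print(X)
```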
If your input is a vector, use atleast_2d first.
a = np.atleast_2d([2, 3]).repeat(repeats=3, axis=0)
print(a)
# [[2 3]
# [2 3]
# [2 3]]
I have two 3-D arrays of the same shape, a and b:
np.random.seed([3,14159])
a = np.random.randint(10, size=(4, 3, 2))
b = np.random.randint(10, size=(4, 3, 2))
print(a)
[[[4 8]
[1 1]
[9 2]]
[[8 1]
[4 2]
[8 2]]
[[8 4]
[9 4]
[3 4]]
[[1 5]
[1 2]
[6 2]]]
print(b)
[[[7 7]
[1 1]
[7 8]]
[[7 4]
[8 0]
[0 9]]
[[3 8]
[7 7]
[2 6]]
[[3 1]
[9 3]
[0 5]]]
I want to take the first array from a
a[0]
[[4 8]
[1 1]
[9 2]]
And the first one from b
b[0]
[[7 7]
[1 1]
[7 8]]
And return this
a[0].T.dot(b[0])
[[ 92 101]
[ 71 73]]
But I want to do this over the entire first dimension. I thought I could use np.einsum
np.einsum('abc,ade->ace', a, b)
[[[210 224]
[165 176]]
[[300 260]
[ 75 65]]
[[240 420]
[144 252]]
[[ 96 72]
[108 81]]]
This is the correct shape, but not values.
I expect to get this:
np.array([x.T.dot(y).tolist() for x, y in zip(a, b)])
[[[ 92 101]
[ 71 73]]
[[ 88 104]
[ 23 22]]
[[ 93 145]
[ 48 84]]
[[ 12 34]
[ 33 21]]]
The matrix multiplication amounts to a sum of products where the sum is taken over the middle axis, so the index b should be the same for both arrays (i.e. change ade to abe):
In [40]: np.einsum('abc,abe->ace', a, b)
Out[40]:
array([[[ 92, 101],
[ 71, 73]],
[[ 88, 104],
[ 23, 22]],
[[ 93, 145],
[ 48, 84]],
[[ 12, 34],
[ 33, 21]]])
When the input arrays have index subscripts that are missing in the output array,
they are summed over independently. That is,
np.einsum('abc,ade->ace', a, b)
is equivalent to
In [44]: np.einsum('abc,ade->acebd', a, b).sum(axis=-1).sum(axis=-1)
Out[44]:
array([[[210, 224],
[165, 176]],
[[300, 260],
[ 75, 65]],
[[240, 420],
[144, 252]],
[[ 96, 72],
[108, 81]]])
Here's one with np.matmul as we need to push back the second axis of a to the end, so that it would get sum-reduced against the second axis from b, while keeping their first axes aligned -
np.matmul(a.swapaxes(1,2),b)
Schematically put:
At start:
a : M x N x R1
b : M x N x R2
With axes swapped on a:
a : M x R1 x [N]
b : M x [N] x R2
The bracketed axes get sum-reduced, leaving us with:
out : M x R1 x R2
On Python 3.x, matmul is taken care of with the @ operator -
a.swapaxes(1,2) @ b
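A quick check (on small random arrays, as an illustration) that the explicit loop, the einsum form, and the matmul form all agree:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(4, 3, 2))
b = rng.integers(0, 10, size=(4, 3, 2))

loop = np.array([x.T.dot(y) for x, y in zip(a, b)])  # per-slice a[i].T @ b[i]
ein = np.einsum('abc,abe->ace', a, b)                # sum over the shared middle axis
mat = a.swapaxes(1, 2) @ b                           # batched matmul

print(np.array_equal(loop, ein), np.array_equal(loop, mat))  # True True
```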
I ran a Random Forest classifier on my multi-class, multi-output variable and got the output below.
My y_test values
Degree Nature
762721 1 7
548912 0 6
727126 1 12
14880 1 12
189505 1 12
657486 1 12
461004 1 0
31548 0 6
296674 1 7
121330 0 17
predicted output :
[[ 1. 7.]
[ 0. 6.]
[ 1. 12.]
[ 1. 12.]
[ 1. 12.]
[ 1. 12.]
[ 1. 0.]
[ 0. 6.]
[ 1. 7.]
[ 0. 17.]]
Now I want to check the performance of my classifier. I found that for multiclass multi-output problems, Hamming loss or jaccard_similarity_score are good metrics. I tried to calculate them but got a ValueError.
Error:
ValueError: multiclass-multioutput is not supported
These are the lines I tried:
print(hamming_loss(y_test, RF_predicted))
print(jaccard_similarity_score(y_test, RF_predicted))
Thanks,
To calculate the unsupported hamming loss for multiclass / multilabel, you could:
import numpy as np
y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])
np.sum(np.not_equal(y_true, y_pred))/float(y_true.size)
0.75
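As an alternative workaround (an assumption on my part, since hamming_loss does not accept multiclass-multioutput input directly), you could apply it per output column and average; this matches the manual calculation above:

```python
import numpy as np
from sklearn.metrics import hamming_loss

y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])

# hamming_loss handles each 1-D multiclass column on its own; average them
per_col = [hamming_loss(y_true[:, i], y_pred[:, i])
           for i in range(y_true.shape[1])]
print(sum(per_col) / len(per_col))  # 0.75
```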
You can also get the confusion_matrix for each of the two labels like so:
from sklearn.metrics import confusion_matrix, precision_score
np.random.seed(42)
y_true = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
[[0 4]
[1 4]
[0 4]
[0 4]
[0 2]
[1 4]
[0 3]
[0 2]
[0 3]
[1 3]]
y_pred = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
[[1 2]
[1 2]
[1 4]
[1 4]
[0 4]
[0 3]
[1 4]
[1 3]
[1 3]
[0 4]]
confusion_matrix(y_true[:, 0], y_pred[:, 0])
[[1 6]
[2 1]]
confusion_matrix(y_true[:, 1], y_pred[:, 1])
[[0 1 1]
[0 1 2]
[2 1 2]]
You could also calculate the precision_score like so (or the recall_score in a similar way):
precision_score(y_true[:, 0], y_pred[:, 0])
0.142857142857