Getting the index of the minimum value in each slice of `ndarray` - python

I am trying to do something that should be straightforward and can be accomplished in a for-loop but I am trying to avoid that.
I would like to get the index of the minimum value in each slice along a certain axis of a numpy.ndarray, a. I am more interested in the index than the value itself. I use the index to get a value from another 2D array with shape equal to the first 2 dimensions of a.
Here is a naive implementation using a for-loop:
import numpy as np
a = np.random.randint(0, 10, 60).reshape(3, 4, 5)
print(a)
for i in range(a.shape[-1]):
    # argmin() without an axis returns the index into the flattened 2D slice
    idx = a[..., i].argmin()
    print('Slice:', i, '| Index:', idx, '| min value:',
          a[..., i].flat[idx])
Out:
[[[1 9 4 0 7]
[6 3 1 6 8]
[7 8 2 0 2]
[8 6 1 6 5]]
[[8 7 0 6 9]
[7 2 6 4 5]
[3 4 9 2 9]
[1 4 8 0 7]]
[[1 4 6 6 2]
[9 9 5 6 7]
[6 2 8 9 9]
[3 9 8 5 4]]]
Slice: 0 | Index: 0 | min value: 1
Slice: 1 | Index: 5 | min value: 2
Slice: 2 | Index: 4 | min value: 0
Slice: 3 | Index: 0 | min value: 0
Slice: 4 | Index: 2 | min value: 2
I realise I can pass an axis keyword argument to argmin but that does not produce the result I am looking for.

For the specific case given in your question, you can reshape your array, then use argmin:
>>> import numpy as np
>>> a = np.array([[[1, 9, 4, 0, 7],
... [6, 3, 1, 6, 8],
... [7, 8, 2, 0, 2],
... [8, 6, 1, 6, 5]],
...
... [[8, 7, 0, 6, 9],
... [7, 2, 6, 4, 5],
... [3, 4, 9, 2, 9],
... [1, 4, 8, 0, 7]],
...
... [[1, 4, 6, 6, 2],
... [9, 9, 5, 6, 7],
... [6, 2, 8, 9, 9],
... [3, 9, 8, 5, 4]]])
>>> a.reshape(-1, a.shape[2]).min(axis=0)
array([1, 2, 0, 0, 2])
>>> a.reshape(-1, a.shape[2]).argmin(axis=0)
array([0, 5, 4, 0, 2])
>>>
a.shape[2] is the size of the last axis, the one you do not want to reduce over. The reshape collapses the first two dimensions into one, so argmin(axis=0) finds the minimum across both of them for each slice, and the indices it returns are positions in the flattened slice, just like the flat index in your loop.
You also need the slice number, which is just the index along the last axis. That is easy, since it is sequential:
slices = np.arange(a.shape[2])
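To then look up values in the other 2D array mentioned in the question, the flat indices can be converted back to (row, column) pairs with np.unravel_index. This is a minimal sketch, assuming a hypothetical lookup array called other whose shape equals a.shape[:2]; the name and its contents are made up for illustration.

```python
import numpy as np

a = np.array([[[1, 9, 4, 0, 7],
               [6, 3, 1, 6, 8],
               [7, 8, 2, 0, 2],
               [8, 6, 1, 6, 5]],
              [[8, 7, 0, 6, 9],
               [7, 2, 6, 4, 5],
               [3, 4, 9, 2, 9],
               [1, 4, 8, 0, 7]],
              [[1, 4, 6, 6, 2],
               [9, 9, 5, 6, 7],
               [6, 2, 8, 9, 9],
               [3, 9, 8, 5, 4]]])

# Flat index of the minimum within each slice a[..., i]
flat_idx = a.reshape(-1, a.shape[2]).argmin(axis=0)

# Convert flat indices into (row, column) pairs for the first two axes
rows, cols = np.unravel_index(flat_idx, a.shape[:2])

# Hypothetical second array with shape equal to a.shape[:2]
other = np.arange(12).reshape(a.shape[:2])
picked = other[rows, cols]

# Sanity check: indexing a with (rows, cols, slice) recovers the minima
slices = np.arange(a.shape[2])
min_values = a[rows, cols, slices]
print(picked)
print(min_values)
```

Because other here is just np.arange(12) reshaped, picked happens to equal the flat indices themselves; with real data it would hold whatever values sit at those positions.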

Related

How can I generate a matrix with random values based from a larger matrix in Python?

I would like to know if there was a way to generate a matrix with values based from a larger matrix. For example, if I have
larger_matrix = np.random.randint(10, size=(10,5))
Out[1]:
array([[0, 9, 0, 0, 3],
[9, 4, 7, 7, 0],
[9, 4, 5, 6, 9],
[6, 3, 1, 7, 3],
[8, 4, 6, 9, 7],
[8, 1, 5, 8, 8],
[9, 9, 6, 0, 9],
[9, 9, 6, 8, 7],
[5, 5, 6, 6, 4],
[4, 4, 7, 0, 7]])
and I want to create smaller_matrix of size (4, 5), with values randomly sampled from larger_matrix, how should I go about this? I'm aware that the function np.random.choice() exists, but I'm quite unsure if it would be helpful for my problem because I'm dealing with matrices instead of lists. Thank you.
Use flatten to convert the 2D larger_matrix to 1D.
Then use np.random.choice to draw a random sample from it (by default it samples with replacement, so the same element can be picked more than once).
Finally, use reshape to convert the 1D array into a 2D matrix.
Code:
import numpy as np
larger_matrix = np.random.randint(10, size=(10,5))
print(larger_matrix)
n = 4
m = 5
print(np.reshape(np.random.choice(larger_matrix.flatten(),size = n*m),(n,m)))
result:
[[7 4 4 6 0]
[5 7 0 6 8]
[9 9 0 0 5]
[9 8 0 6 7]
[0 9 8 8 1]
[3 7 1 0 0]
[8 9 2 3 8]
[6 3 7 2 9]
[9 7 5 9 3]
[8 8 3 5 8]]
[[0 0 8 0 9]
[6 9 2 7 0]
[8 7 6 0 7]
[7 4 9 3 7]]
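If each element of larger_matrix should be used at most once, one option is to sample positions without replacement. This is a sketch rather than part of the answer above: it assumes NumPy's Generator API (np.random.default_rng) is available, and the seed is only there to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
larger_matrix = rng.integers(10, size=(10, 5))
n, m = 4, 5

# Pick n*m distinct flat positions, so no element is used twice
flat_positions = rng.choice(larger_matrix.size, size=n * m, replace=False)
smaller_matrix = larger_matrix.ravel()[flat_positions].reshape(n, m)
print(smaller_matrix)
```

Values can still repeat in the result, because larger_matrix itself contains repeated values; only the positions are guaranteed to be distinct.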
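You can run a for loop inside a for loop and use it to fill the smaller matrix with values taken from random indexes of the larger matrix (smaller_matrix is assumed to be preallocated with the target shape):
import random
for i in range(len(smaller_matrix)):
    for j in range(len(smaller_matrix[0])):
        rand1 = random.randrange(len(larger_matrix))     # random row index
        rand2 = random.randrange(len(larger_matrix[0]))  # random column index
        smaller_matrix[i][j] = larger_matrix[rand1][rand2]
That should cover it. Just make sure you generate 2 new numbers each time.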
You could do it like this, but bear in mind that this picks whole rows from the large array and the same row may be chosen more than once:
import numpy as np
import random
R1 = 10
R2 = 4
C = 5
m = np.random.randint(R1, size=(R1, C))
print(m)
print()
n = []
for _ in range(R2):
    n.append(random.choice(m))
print(np.array(n))

Find size of numpy array within a pandas DataFrame in python

I would like to get the size of each numpy array within a pandas DataFrame. How do I do this?
I have:
x y z
0 [1, 2, 3, 4] [8, 9, 7] [8, 9, 7]
1 [2, 3, 4, 8] [9, 8, 1] [9, 8, 1, 6, 7, 8, 9]
2 [5, 6, 7] [3, 4, 1] [3, 4, 1]
import pandas as pd
cars = pd.DataFrame({'x': [[1,2,3,4],[2,3,4,8],[5,6,7]],
                     'y': [[8,9,7],[9,8,1],[3,4,1]],
                     'z': [[8,9,7],[9,8,1,6,7,8,9],[3,4,1]]})
I want:
x y z
0 4 3 3
1 4 3 7
2 3 3 3
I know how to get the shape and size of the entire DataFrame, but not how to combine them with size of each block.
print(cars)
print(cars.size)
print(cars.shape)
Use Series.str.len in DataFrame.apply to process all columns:
df = cars.apply(lambda x: x.str.len())
print (df)
x y z
0 4 3 3
1 4 3 7
2 3 3 3
If there are no missing values, use DataFrame.applymap to apply len element-wise (in pandas 2.1 and later this method is deprecated in favor of DataFrame.map):
df = cars.applymap(len)
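For reference, here is a self-contained sketch of both approaches on the data from the question. The second variant uses Series.map inside apply as a stand-in for the element-wise call, which is my substitution so that the example does not depend on whether the installed pandas version offers applymap or DataFrame.map.

```python
import pandas as pd

cars = pd.DataFrame({'x': [[1, 2, 3, 4], [2, 3, 4, 8], [5, 6, 7]],
                     'y': [[8, 9, 7], [9, 8, 1], [3, 4, 1]],
                     'z': [[8, 9, 7], [9, 8, 1, 6, 7, 8, 9], [3, 4, 1]]})

# Column-wise: Series.str.len also works on lists, not only on strings
lengths_str = cars.apply(lambda col: col.str.len())

# Element-wise: map the built-in len over every cell of every column
lengths_map = cars.apply(lambda col: col.map(len))

print(lengths_str)
print(lengths_map)
```

With missing values the two differ: str.len returns NaN for a missing cell, while len raises a TypeError on it.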

Python program to delete all the rows and columns with all zeros and print the remaining matrix

Is there any program possible with a complexity less than O(mn)? The first line of the input contains M and N, and the next M lines each contain N integers.
For example
4 4
1 0 3 4
0 0 0 0
4 0 6 8
4 0 2 4
The output should be:
1 3 4
4 6 8
4 2 4
You can do this by filtering out first the rows and then the columns whose values are all equal to 0, by checking whether set(row or column) != {0}:
arr = [[1, 0, 3, 4],
[0, 0, 0, 0],
[4, 0, 6, 8],
[4, 0, 2, 4]]
rows = [i for i in arr if set(i)!={0}]
cols = [i for i in zip(*rows) if set(i)!={0}]
arr_new = [list(i) for i in zip(*cols)]
print(arr_new)
[[1, 3, 4],
[4, 6, 8],
[4, 2, 4]]
EDIT:
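If you are OK with using numpy then you can do this a bit more easily. Note that all(1) tests each row and all(0) tests each column, so the row mask must come from all(1) and the column mask from all(0):
import numpy as np
arr = np.array(arr)
arr[~(arr==0).all(1)][:,~(arr==0).all(0)]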
array([[1, 3, 4],
[4, 6, 8],
[4, 2, 4]])
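The axis choice is easier to check on a non-square input. The following is a small sketch with a hypothetical helper named drop_zero_rows_cols (not from the answer above) that builds both masks from the original array and applies them in one step with np.ix_. On the complexity question: every entry has to be read at least once to know whether a row or column is all zeros, so in general this stays O(mn).

```python
import numpy as np

def drop_zero_rows_cols(arr):
    """Remove rows and columns that consist only of zeros (illustrative helper)."""
    arr = np.asarray(arr)
    keep_rows = (arr != 0).any(axis=1)  # one flag per row
    keep_cols = (arr != 0).any(axis=0)  # one flag per column
    # np.ix_ turns the two boolean masks into an open mesh for indexing
    return arr[np.ix_(keep_rows, keep_cols)]

# A 3x4 example, so mixing up the axes would fail with a shape error
example = [[1, 0, 3, 4],
           [0, 0, 0, 0],
           [4, 0, 6, 8]]
result = drop_zero_rows_cols(example)
print(result)
```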

How to get the max of each column from a 2D array + index of the max value

I have for example
A = [[1 2 3 4 5]
[2 4 5 8 7]
[9 8 4 5 2]
[1 2 4 7 2]
[5 9 8 7 6]
[1 2 5 4 3]]
So the shape of A is (6, 5).
What I want now is the max of each column, returned as e.g.:
A = [[9 9 8 8 7]] with shape (1, 5)
And at the same time I would like to receive the index of the max value from each column.
Is this possible? I can't immediately find the solution in the basic np.array docs.
You could use ndarray.max().
The axis keyword argument describes what axis you want to find the maximum along.
keepdims=True lets you keep the input's dimensions.
To get the indices of the maxima in the columns, you can use the ndarray.argmax() function.
You can also pass the axis argument to this function; keepdims is only supported by argmax in newer NumPy versions (1.22 and later).
In both commands axis=0 reduces over the rows and gives one result per column, while axis=1 gives one result per row.
The default value axis=None would search for the maximum in the entire flattened array.
Example:
import numpy as np
A = np.asarray(
[[1, 2, 3, 4, 5],
[2, 4, 5, 8, 7],
[9, 8, 4, 5, 2],
[1, 2, 4, 7, 2],
[5, 9, 8, 7, 6],
[1, 2, 5, 4, 3]])
print(A)
max_values = A.max(axis=0, keepdims=True)  # avoid shadowing the built-in max
max_index = A.argmax(axis=0)
print('Max:', max_values)
print('Max Index:', max_index)
This prints:
[[1 2 3 4 5]
[2 4 5 8 7]
[9 8 4 5 2]
[1 2 4 7 2]
[5 9 8 7 6]
[1 2 5 4 3]]
Max: [[9 9 8 8 7]]
Max Index: [2 4 4 1 1]
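The two results are linked: the indices from argmax can be used to pull the maxima back out of the array. This is a short sketch of that round trip, using np.take_along_axis and, as an alternative, plain integer indexing; neither call appears in the answer above.

```python
import numpy as np

A = np.asarray(
    [[1, 2, 3, 4, 5],
     [2, 4, 5, 8, 7],
     [9, 8, 4, 5, 2],
     [1, 2, 4, 7, 2],
     [5, 9, 8, 7, 6],
     [1, 2, 5, 4, 3]])

max_index = A.argmax(axis=0)  # row index of the maximum in each column

# take_along_axis expects an index array with the same number of dimensions as A
max_from_index = np.take_along_axis(A, max_index[np.newaxis, :], axis=0)

# Equivalent: pair each row index with its column number
max_fancy = A[max_index, np.arange(A.shape[1])]

print(max_from_index)
print(max_fancy)
```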
You can also convert the list to a NumPy array and call max on it.
Example:
import numpy as np
A = [[1, 2, 3, 4, 5],
[2, 4, 5, 8, 7],
[9, 8, 4, 5, 2],
[1, 2, 4, 7, 2],
[5, 9, 8, 7, 6],
[1, 2, 5, 4, 3]]
print(A)
A=np.array(A)
print(A.max(axis=0))

numpy.tile did not work as Matlab repmat

According to "What is the equivalent of MATLAB's repmat in NumPy", I tried to build a 3x3x5 array from a 3x3 array using Python.
In Matlab, this works as I expected.
a = [1,1,1;1,2,1;1,1,1];
a_= repmat(a,[1,1,5]);
size(a_) = 3 3 5
But for numpy.tile
b = numpy.array([[1,1,1],[1,2,1],[1,1,1]])
b_ = numpy.tile(b, [1,1,5])
b_.shape = (1, 3, 15)
If I want to generate the same array as in Matlab, what is the equivalent?
Edit 1
The output I would expect to get is
b_(:,:,1) =
1 1 1
1 2 1
1 1 1
b_(:,:,2) =
1 1 1
1 2 1
1 1 1
b_(:,:,3) =
1 1 1
1 2 1
1 1 1
b_(:,:,4) =
1 1 1
1 2 1
1 1 1
b_(:,:,5) =
1 1 1
1 2 1
1 1 1
but what @farenorth's answer and numpy.dstack give is
[[[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]]
[[1 1 1 1 1]
[2 2 2 2 2]
[1 1 1 1 1]]
[[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]]]
NumPy functions are not, in general, 'drop-in' replacements for Matlab functions. Often there are subtle differences in how the 'equivalent' functions are used. It does take time to adapt, but I've found the transition to be very worthwhile.
In this case, the np.tile documentation indicates what happens when you are trying to tile an array to higher dimensions than it is defined,
numpy.tile(A, reps)
Construct an array by repeating A the number of times given by reps.
If reps has length d, the result will have dimension of max(d, A.ndim).
If A.ndim < d, A is promoted to be d-dimensional by prepending new axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication, or shape (1, 1, 3) for 3-D replication. If this is not the desired behavior, promote A to d-dimensions manually before calling this function.
In this case, your array is being promoted to a shape of (1, 3, 3), then being tiled, which gives (1, 3, 15). So, to get your desired behavior just be sure to append a new singleton dimension to the array where you want it:
>>> b_ = numpy.tile(b[..., None], [1, 1, 5])
>>> print(b_.shape)
(3, 3, 5)
Note here that I've used None (i.e. np.newaxis) and ellipsis notation to specify a new dimension at the end of the array. You can find out more about these capabilities in the NumPy indexing documentation.
Another option, which is inspired by the OP's comment would be:
b_ = np.dstack((b, ) * 5)
In this case, I've used tuple multiplication to 'repmat' the array, which is then constructed by np.dstack.
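As a quick check, the two approaches can be compared directly. This sketch also includes np.repeat along a new last axis, a third option that I am adding for comparison and that is not part of the answer above.

```python
import numpy as np

b = np.array([[1, 1, 1], [1, 2, 1], [1, 1, 1]])

tiled = np.tile(b[..., None], [1, 1, 5])              # new axis at the end, then tile
stacked = np.dstack((b,) * 5)                         # stack five copies along axis 2
repeated = np.repeat(b[:, :, np.newaxis], 5, axis=2)  # repeat along the new axis

print(tiled.shape, stacked.shape, repeated.shape)

# Every "page" along the last axis is the original matrix
all_pages_match = all((tiled[:, :, k] == b).all() for k in range(5))
print(all_pages_match)
```

All three produce the same (3, 3, 5) array, so the choice is mostly a matter of readability.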
As @hpaulj indicated, Matlab and NumPy display matrices differently. To replicate the Matlab output you can do something like:
>>> for idx in range(b_.shape[2]):
...     print('b_[:, :, {}] = \n{}\n'.format(idx, b_[:, :, idx]))
...
b_[:, :, 0] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 1] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 2] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 3] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 4] =
[[1 1 1]
[1 2 1]
[1 1 1]]
Good luck!
Let's try the comparison, taking care to diversify the shapes and values.
octave:7> a=reshape(0:11,3,4)
a =
0 3 6 9
1 4 7 10
2 5 8 11
octave:8> repmat(a,[1,1,2])
ans =
ans(:,:,1) =
0 3 6 9
1 4 7 10
2 5 8 11
ans(:,:,2) =
0 3 6 9
1 4 7 10
2 5 8 11
numpy equivalent - more or less:
In [61]: a=np.arange(12).reshape(3,4)
In [62]: np.tile(a,[2,1,1])
Out[62]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
numpy again, but with order F to better match the MATLAB Fortran-derived layout
In [63]: a=np.arange(12).reshape(3,4,order='F')
In [64]: np.tile(a,[2,1,1])
Out[64]:
array([[[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]],
[[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]]])
I'm adding the new numpy dimension at the start, because in many ways it better replicates the MATLAB practice of adding it at the end.
Try adding the new dimension at the end. The shape is (3,4,2), but you might not like the display.
np.tile(a[:,:,None],[1,1,2])
Another consideration - what happens when you flatten the tile?
octave:10> repmat(a,[1,1,2])(:).'
ans =
0 1 2 3 4 5 6 7 8 9 10 11
0 1 2 3 4 5 6 7 8 9 10 11
With the order F a:
In [78]: np.tile(a[:,:,None],[1,1,2]).flatten()
Out[78]:
array([ 0, 0, 3, 3, 6, 6, 9, 9, 1, 1, 4, 4, 7, 7, 10, 10, 2,
2, 5, 5, 8, 8, 11, 11])
In [79]: np.tile(a,[2,1,1]).flatten()
Out[79]:
array([ 0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11, 0, 3, 6, 9, 1,
4, 7, 10, 2, 5, 8, 11])
with a C order array:
In [80]: a=np.arange(12).reshape(3,4)
In [81]: np.tile(a,[2,1,1]).flatten()
Out[81]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11])
This last one matches the Octave layout.
So does:
In [83]: a=np.arange(12).reshape(3,4,order='F')
In [84]: np.tile(a[:,:,None],[1,1,2]).flatten(order='F')
Out[84]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11])
Confused yet?
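The flattening comparisons above can be condensed into one runnable sketch. It only restates the In [83] and In [84] session as a script; the variable names end, flat_f and flat_c are mine.

```python
import numpy as np

# Column-major fill, so the 2D array looks like Octave's reshape(0:11, 3, 4)
a = np.arange(12).reshape(3, 4, order='F')

# New axis at the end, as MATLAB's repmat(a, [1, 1, 2]) does
end = np.tile(a[:, :, None], [1, 1, 2])
print(end.shape)

flat_f = end.flatten(order='F')  # column-major traversal, like Octave's (:)
flat_c = end.flatten()           # default row-major (C order) traversal

print(flat_f)
print(flat_c)
```

Only the order='F' traversal reproduces the Octave sequence 0 to 11 twice; the default C-order flatten interleaves the two copies.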
