Returning columns in a numpy array given a boolean index - python

I have the given dataset:
data = np.array([
[1, 2, 1, 3, 1, 2, 1],
[3, 4, 1, 5, 2, 7, 2],
[2, 1, 2, 1, 1, 4, 5],
[6, 1, 2, 3, 1, 3, 1]])
cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])
I want to return columns from data where cols_idx == 1. For that I used:
data[:, np.nonzero(cols_idx)]
But it returns a 3D array instead of a 2D array:
data[:, np.nonzero(cols_idx)]
array([[[1, 1]],
[[1, 2]],
[[2, 1]],
[[2, 1]]])
data[:, np.nonzero(cols_idx)].shape
(4, 1, 2)
I would like the output to be:
data[:, np.nonzero(cols_idx)]
array([[1, 1],
[1, 2],
[2, 1],
[2, 1]])
data[:, np.nonzero(cols_idx)].shape
(4, 2)
How can I achieve that?

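np.nonzero returns a tuple with one index array per dimension: print(np.nonzero(cols_idx)) gives (array([2, 4]),), a tuple rather than just an array. Used as a column index, that tuple is turned into an array of shape (1, 2), which adds the extra axis. So take its first element, np.nonzero(cols_idx)[0] (which gives [2 4]), to get what you want: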
Full code:
import numpy as np
data = np.array([
[1, 2, 1, 3, 1, 2, 1],
[3, 4, 1, 5, 2, 7, 2],
[2, 1, 2, 1, 1, 4, 5],
[6, 1, 2, 3, 1, 3, 1]])
cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])
new_data = data[:, np.nonzero(cols_idx)[0]]
print(new_data)
'''[[1 1]
[1 2]
[2 1]
[2 1]]'''
print(new_data.shape) # (4,2)

From numpy documentation:
While the nonzero values can be obtained with a[nonzero(a)], it is recommended to use x[x.astype(bool)] or x[x != 0] instead, which will correctly handle 0-d arrays.
So it's better to use:
data[:, cols_idx.astype(bool)]
or
data[:, cols_idx != 0]
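Here is a quick check, using the data from the question, that the boolean-mask forms return the same 2D result as the np.nonzero(...)[0] version. It is a minimal sketch; the variable names by_mask, by_compare and by_nonzero are just for illustration:

```python
import numpy as np

data = np.array([
    [1, 2, 1, 3, 1, 2, 1],
    [3, 4, 1, 5, 2, 7, 2],
    [2, 1, 2, 1, 1, 4, 5],
    [6, 1, 2, 3, 1, 3, 1]])
cols_idx = np.array([0, 0, 1, 0, 1, 0, 0])

# A 1D boolean array in the column position selects columns directly,
# so the result stays 2D: (rows, number of True entries).
by_mask = data[:, cols_idx.astype(bool)]
by_compare = data[:, cols_idx != 0]
by_nonzero = data[:, np.nonzero(cols_idx)[0]]

print(by_mask.shape)  # (4, 2)
print(np.array_equal(by_mask, by_compare) and np.array_equal(by_mask, by_nonzero))  # True
```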

Related

How to convert a 1D numpy array to a lower triangular matrix?

I have a numpy array like:
np.array([1,2,3,4])
and I want to convert it to a lower triangular matrix like
np.array([
[4, 0, 0, 0],
[3, 4, 0, 0],
[2, 3, 4, 0],
[1, 2, 3, 4]
])
How can I do this without a for loop?
A solution similar to the one proposed in a comment by Michael Szczesny:
import numpy as np
a = np.array([1, 2, 3, 4])
b = np.arange(a.size)
result = np.tril(np.take(a, b - b[:,None] + a.size - 1, mode='clip'))
The result is:
array([[4, 0, 0, 0],
[3, 4, 0, 0],
[2, 3, 4, 0],
[1, 2, 3, 4]])
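The same matrix can also be built without np.take or np.tril. You compute how far each cell sits below the diagonal and use that offset to index the reversed array. This is a variant formulation, not the one from the comment, and it assumes a is 1D:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
n = a.size

# diff[i, j] = i - j: how far below the main diagonal each cell sits
diff = np.arange(n)[:, None] - np.arange(n)
rev = a[::-1]  # [4, 3, 2, 1]: value for diagonal offset 0, 1, 2, 3

# On and below the diagonal (diff >= 0) pick rev[i - j]; above it, use 0.
# np.clip keeps the indices valid for the cells that np.where discards.
result = np.where(diff >= 0, rev[np.clip(diff, 0, None)], 0)
print(result)
```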

Find most common value in numpy 2d array rows, otherwise return maximum

I have an array like this
Nbank = np.array([[2, 3, 1],
[1, 2, 2],
[3, 2, 1],
[3, 2, 1],
[2, 3, 2],
[2, 2, 3],
[1, 1, 3],
[2, 1, 1],
[2, 2, 3],
[1, 1, 1],
[2, 1, 1],
[2, 3, 1],
[1, 2, 1]])
I want to return an array with only one column. The condition is to return the most common value in each row; if multiple values have the same number of occurrences, just return the maximum of them.
I used this code:
most_f = np.array([np.bincount(row).argmax() for row in Nbank])
If multiple values have the same number of occurrences, it returns the first (smallest) of them instead of the maximum. How can I work around this?
You could use a Counter after sorting each row in descending order. Counter.most_common orders elements with equal counts in the order they were first encountered. Because each row is already sorted in descending order, a tie therefore resolves to the largest value, and otherwise the most frequent value is returned.
import numpy as np
from collections import Counter
Nbank = np.array([[2, 3, 1],
[1, 2, 2],
[3, 2, 1],
[3, 2, 1],
[2, 3, 2],
[2, 2, 3],
[1, 1, 3],
[2, 1, 1],
[2, 2, 3],
[1, 1, 1],
[2, 1, 1],
[2, 3, 1],
[1, 2, 1]])
np.array([Counter(sorted(row, reverse=True)).most_common(1)[0][0] for row in Nbank])
Output
array([3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1])
I believe this will solve the problem. You could probably turn it into a one-liner with a list comprehension, but I don't think that would be worthwhile.
import numpy as np

most_f = []
for n in Nbank:  # iterate over the rows
    counts = np.bincount(n)  # count how often each value 0..max(n) occurs
    # values that reach the maximum count, in ascending order;
    # take the last one, i.e. the largest tied value
    most_f.append(np.argwhere(counts == np.max(counts))[-1][0])
You can cheat a little and reverse each row's bincount, so that np.argmax (which returns the first maximum) finds the rightmost occurrence of the highest count, which corresponds to the largest value:
arr = Nbank
N = np.max(arr)
>>> [N - np.argmax(np.bincount(row, minlength=N+1)[::-1]) for row in Nbank]
[3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1]
You might also want to avoid Python loops entirely, which is advisable if you want to take full advantage of NumPy. Unfortunately, np.bincount only accepts 1D arrays, but you can build a 2D bincount manually with np.add.at:
N, M = arr.shape[0], np.max(arr)+1
bincount_2D = np.zeros(shape=(N, M), dtype=int)
advanced_indexing = np.repeat(np.arange(N), arr.shape[1]), arr.ravel()
np.add.at(bincount_2D, advanced_indexing, 1)
>>> bincount_2D
array([[0, 1, 1, 1],
[0, 1, 2, 0],
[0, 1, 1, 1],
[0, 1, 1, 1],
[0, 0, 2, 1],
[0, 0, 2, 1],
[0, 2, 0, 1],
[0, 2, 1, 0],
[0, 0, 2, 1],
[0, 3, 0, 0],
[0, 2, 1, 0],
[0, 1, 1, 1],
[0, 2, 1, 0]])
And then repeat the process for all the rows simultaneously:
>>> M -1 - np.argmax(bincount_2D[:,::-1], axis=1)
array([3, 2, 3, 3, 2, 2, 1, 1, 2, 1, 1, 3, 1], dtype=int64)
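The two vectorized steps above can be wrapped into one self-contained function. This is a minimal sketch that assumes the array holds non-negative integers, which is the same requirement as np.bincount. The function name most_common_prefer_max is hypothetical:

```python
import numpy as np

def most_common_prefer_max(arr):
    """Per-row mode of a 2D array of non-negative ints; ties go to the largest value."""
    n_rows, n_cols = arr.shape
    m = arr.max() + 1  # number of possible values 0..max
    counts = np.zeros((n_rows, m), dtype=int)
    # Row indices repeated once per column, paired with the flattened values
    np.add.at(counts, (np.repeat(np.arange(n_rows), n_cols), arr.ravel()), 1)
    # argmax returns the first maximum; on the reversed counts that is the
    # largest tied value, mapped back to a value with m - 1 - index
    return m - 1 - np.argmax(counts[:, ::-1], axis=1)

Nbank = np.array([[2, 3, 1], [1, 2, 2], [3, 2, 1], [3, 2, 1], [2, 3, 2],
                  [2, 2, 3], [1, 1, 3], [2, 1, 1], [2, 2, 3], [1, 1, 1],
                  [2, 1, 1], [2, 3, 1], [1, 2, 1]])
print(most_common_prefer_max(Nbank))
# [3 2 3 3 2 2 1 1 2 1 1 3 1]
```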

Fast merge elements of two arrays of arrays only if the element is different from zero

So I have two numpy arrays of arrays
a = [[[1, 2, 3, 4], [3, 3, 3, 3], [4, 4, 4, 4]]]
b = [[[0, 0, 4, 0], [0, 0, 0, 0], [0, 1, 0, 1]]]
Both arrays are always of the same size.
The result should be like
c = [[[1, 2, 4, 4], [3, 3, 3, 3], [4, 1, 4, 1]]]
How can I do that in a very fast way in numpy?
Use numpy.where:
import numpy as np
a = np.array([[1, 2, 3, 4], [3, 3, 3, 3], [4, 4, 4, 4]])
b = np.array([[0, 0, 4, 0], [0, 0, 0, 0], [0, 1, 0, 1]])
res = np.where(b == 0, a, b)
print(res)
Output
[[1 2 4 4]
[3 3 3 3]
[4 1 4 1]]
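For better speed, pass b itself as the condition, because nonzero entries are treated as True. (The timings below use from timeit import timeit.)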
Instead of
np.where(b == 0, a, b)
# array([[1, 2, 4, 4],
# [3, 3, 3, 3],
# [4, 1, 4, 1]])
timeit(lambda:np.where(b==0,a,b))
# 2.6133874990046024
better do
np.where(b,b,a)
# array([[1, 2, 4, 4],
# [3, 3, 3, 3],
# [4, 1, 4, 1]])
timeit(lambda:np.where(b,b,a))
# 1.5850481310044415
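If it is acceptable to build the result from a copy of a, np.copyto with a where mask is another option. It writes b only where b is nonzero. This is a sketch. Whether it beats np.where depends on your data, so time it yourself:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [3, 3, 3, 3], [4, 4, 4, 4]])
b = np.array([[0, 0, 4, 0], [0, 0, 0, 0], [0, 1, 0, 1]])

# Start from a copy of a, then overwrite only the positions where b is nonzero.
res = a.copy()
np.copyto(res, b, where=b != 0)
print(res)
```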

Using function with entries from two 3D arrays (Python)

So let's say I have two arrays (numpy arrays that is):
array1 =
[[[1, 0, 0], [0, 6, 0], [3, 0, 0]],
[[0, 2, 4], [0, 4, 0], [0, 4, 0]],
[[0, 0, 2], [1, 3, 2], [3, 4, 0]]]
and
array2 =
[[[2, 4, 0], [0, 4, 0], [3, 0, 0]],
[[0, 0, 3], [1, 4, 3], [2, 4, 3]],
[[0, 0, 1], [0, 2, 1], [1, 0, 2]]]
I then make a function like:
def array_calc(x, y, z):
    return x*y + z
I would like the x-values to come from array1, the y-values from array2, and z to be a constant I choose (say z = 0). The calculation should then be applied to each entry, ending up with a new array like:
array_result =
[[[2, 0, 0], [0, 24, 0], [9, 0, 0]],
[[0, 0, 12], [0, 16, 0], [0, 16, 0]],
[[0, 0, 2], [0, 6, 2], [3, 0, 0]]]
But I'm not quite sure how to do that.
If your arrays are numpy arrays, it is as simple as:
import numpy as np
x = np.array([[1,0],[0,1]])
y = np.array([[4,1],[0,2]])
z = 1
result = x*y + z
# result = array([[5, 1], [1, 3]])
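Applied to the arrays from the question, the question's own array_calc (with a return added) works unchanged, because * and + act element-wise on NumPy arrays and the scalar z is broadcast:

```python
import numpy as np

def array_calc(x, y, z):
    # Element-wise product of x and y, plus the scalar z broadcast to every entry
    return x * y + z

array1 = np.array([[[1, 0, 0], [0, 6, 0], [3, 0, 0]],
                   [[0, 2, 4], [0, 4, 0], [0, 4, 0]],
                   [[0, 0, 2], [1, 3, 2], [3, 4, 0]]])
array2 = np.array([[[2, 4, 0], [0, 4, 0], [3, 0, 0]],
                   [[0, 0, 3], [1, 4, 3], [2, 4, 3]],
                   [[0, 0, 1], [0, 2, 1], [1, 0, 2]]])

array_result = array_calc(array1, array2, 0)
print(array_result)
```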
Using simple for loops:
import numpy as np
def array_calc(x, y, z):
"""Returns x * y + z with x and y 3D Numpy arrays and z a number"""
new_arr = x.copy()
for i in np.arange(x.shape[0]):
for k in np.arange(x.shape[1]):
for j in np.arange(x.shape[2]):
new_arr[i, k, j] = x[i, k, j] * y[i, k, j] + z
return new_arr
With:
array1 = np.array([[[1, 0, 0], [0, 6, 0], [3, 0, 0]],
[[0, 2, 4], [0, 4, 0], [0, 4, 0]],
[[0, 0, 2], [1, 3, 2], [3, 4, 0]]])
array2 = np.array([[[2, 4, 0], [0, 4, 0], [3, 0, 0]],
[[0, 0, 3], [1, 4, 3], [2, 4, 3]],
[[0, 0, 1], [0, 2, 1], [1, 0, 2]]])
Calling array_calc(array1, array2, 1) returns:
array([[[ 3, 1, 1],
[ 1, 25, 1],
[10, 1, 1]],
[[ 1, 1, 13],
[ 1, 17, 1],
[ 1, 17, 1]],
[[ 1, 1, 3],
[ 1, 7, 3],
[ 4, 1, 1]]])
One way is to flatten the arrays by one level and iterate through them, performing your calculation on each entry.
This could be done on the 3D arrays directly as well, but I found it easier with 2D lists. Nested for loops are not the most efficient solution, but they get the job done.
The code is here:
from functools import reduce

array1 = [[[1, 0, 0], [0, 6, 0], [3, 0, 0]], [[0, 2, 4], [0, 4, 0], [0, 4, 0]], [[0, 0, 2], [1, 3, 2], [3, 4, 0]]]
array2 = [[[2, 4, 0], [0, 4, 0], [3, 0, 0]], [[0, 0, 3], [1, 4, 3], [2, 4, 3]], [[0, 0, 1], [0, 2, 1], [1, 0, 2]]]
z = 0
# Concatenate the outer lists: a 3x3x3 nested list becomes 9 rows of 3
array_1 = reduce(list.__add__, array1)
array_2 = reduce(list.__add__, array2)
len_array = len(array_1)  # 9
array_3 = [[0, 0, 0] for _ in range(len_array)]
for i in range(len_array):
    for l in range(3):
        array_3[i][l] = array_1[i][l] * array_2[i][l] + z
print(array_3)

Is there a way to loop through the return value of np.where?

Is there a way to loop through this tuple, where the left array holds positions in an array and the right array holds the values I would like to insert at those positions:
(array([ 0, 4, 6, ..., 9992, 9996, 9997]), array([3, 3, 3, ..., 3, 3, 3]))
The output above is generated from the following piece of code:
np.where(h2 == h2[i,:].max())[1]
I would like the result to be like this:
array[0] = 3
array[4] = 3
...
array[9997] = 3
Just use simple indexing:
indices, values = my_tuple
array[indices] = values
If you don't have the final array yet, you can create it with a function such as np.zeros or np.ones, with a length of the maximum index plus one.
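A minimal sketch of that approach. The indices and values below are hypothetical stand-ins for the (truncated) tuple in the question:

```python
import numpy as np

# Hypothetical stand-in for the (indices, values) tuple in the question
indices = np.array([0, 4, 6, 9])
values = np.array([3, 3, 3, 3])

# Size the target to the largest index + 1 so every index is valid
array = np.zeros(indices.max() + 1, dtype=values.dtype)
array[indices] = values  # one vectorized assignment, no Python loop
print(array)  # [3 0 0 0 3 0 3 0 0 3]
```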
I think you want the transpose of the where tuple:
In [204]: x=np.arange(1,13).reshape(3,4)
In [205]: x
Out[205]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
In [206]: idx=np.where(x)
In [207]: idx
Out[207]:
(array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2], dtype=int32),
array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int32))
In [208]: ij=np.transpose(idx)
In [209]: ij
Out[209]:
array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[2, 0],
[2, 1],
[2, 2],
[2, 3]], dtype=int32)
In fact there's a function that does just that:
np.argwhere(x)
Iterating on ij, I can print:
In [213]: for i,j in ij:
...: print('array[{}]={}'.format(i,j))
...:
array[0]=0
array[0]=1
array[0]=2
zip(*idx) is a pure-Python equivalent of the transpose:
for i, j in zip(*idx):
    print(i, j)
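Here is a short check that the three forms mentioned above give the same (row, column) pairs. The NumPy documentation describes np.argwhere(a) as almost the same as np.transpose(np.nonzero(a)). The int() conversion is only there to print plain Python integers:

```python
import numpy as np

x = np.arange(1, 13).reshape(3, 4)
idx = np.where(x)  # with a single argument, np.where behaves like np.nonzero

ij = np.transpose(idx)  # shape (number of nonzeros, x.ndim)
aw = np.argwhere(x)     # the same (row, col) pairs
print(np.array_equal(ij, aw))  # True

# zip(*idx) yields the same pairs as plain Python tuples
pairs = [(int(i), int(j)) for i, j in zip(*idx)]
print(pairs[:3])  # [(0, 0), (0, 1), (0, 2)]
```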
