find top_k elements of numpy ndarray and ignore zeros - python

Given a numpy ndarray like the following
x = [[4.,0.,2.,0.,8.],
[1.,3.,0.,9.,5.],
[0.,0.,4.,0.,1.]]
I want to find the indices of the top k (e.g. k=3) elements of each row, excluding 0, if possible. If there are fewer than k positive elements, then just return their indices (sorted by value, in descending order).
The result should look like this (a list of arrays):
res = [[4, 0, 2],
[3, 4, 1],
[2, 4]]
or just one flattened array:
res = [4,0,2,3,4,1,2,4]
I know argsort can find the indices of the top k elements in sorted order, but I am not sure how to filter out the zeros.

You can use numpy.argsort on -x to get the indices in descending order of value, then use numpy.take_along_axis to get the values of each row in that sorted order.
Because you want to ignore zeros, you can set every sorted value after the first k columns (k=3, as in the question) to zero. At the end, return the sorted indices whose values are not zero.
import numpy as np
k = 3
x = np.array([[4.,0.,2.,0.,8.],[1.,3.,0.,9.,5.],[0.,0.,4.,0.,1.]])
idx_srt = np.argsort(-x)  # column indices of each row, largest value first
val_srt = np.take_along_axis(x, idx_srt, axis=-1)  # the values in that order
val_srt[:, k:] = 0  # discard everything past the top k
res = idx_srt[val_srt != 0]  # keep only indices whose value is nonzero (flattened)
print(res)
# [4 0 2 3 4 1 2 4]

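Try one of these two:
import numpy as np
k = 3
# sort column indices by value (descending) and keep at most min(k, number of positive entries)
res = [sorted(range(len(r)), key=(lambda i: r[i]), reverse=True)[:min(k, len([n for n in r if n > 0]))] for r in x]
or
res1 = [np.argsort(r)[::-1][:min(k, len([n for n in r if n > 0]))] for r in x]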

I came up with the following solution:
import numpy as np
top_index = score.argsort(axis=1)  # score here is my x; ascending order per row
positive = (score > 0).sum(axis=1)  # number of positive entries per row
positive = np.minimum(positive, k)  # keep at most k per row
# broadcasting trick to get a mask matrix that selects the top entries (per row: min(k, number of positive scores))
r = np.arange(score.shape[1])
mask = (positive[:, None] > r)
top_index_flatten = top_index[:, ::-1][mask]  # reverse to descending order, then apply the mask
I compared my result with the one suggested by @I'mahdi and they are consistent.
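None of the answers above returns the first requested form, a list of per-row arrays. A minimal sketch that combines the descending argsort with the per-row count of positive entries; top_k_nonzero is a hypothetical helper name, and a stable sort is assumed so that ties keep their column order. Because positive values always sort before zeros and negatives in descending order, the first count entries of each row are exactly its largest positive ones:

```python
import numpy as np

def top_k_nonzero(x, k):
    """Per row, return the indices of up to k largest positive entries, largest first."""
    order = np.argsort(-x, axis=1, kind="stable")  # descending order of values
    counts = np.minimum((x > 0).sum(axis=1), k)    # how many indices to keep per row
    return [row[:c] for row, c in zip(order, counts)]

x = np.array([[4., 0., 2., 0., 8.],
              [1., 3., 0., 9., 5.],
              [0., 0., 4., 0., 1.]])
res = top_k_nonzero(x, 3)   # list of arrays: [4 0 2], [3 4 1], [2 4]
flat = np.concatenate(res)  # flattened form: [4 0 2 3 4 1 2 4]
```

Returning a list rather than a 2D array is deliberate: rows can have different lengths when they contain fewer than k positive values.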

Related

Numpy array operation to shift index

I have a very specific situation: I have a long 1-D numpy array (arr). I am interested in the elements that are greater than a number (n). So I am using idx = np.argwhere(arr > n) and val = arr[idx] to get the elements and their indices. Now the problem: I am adding an integer offset (ofs) to the indices (idx) and wrapping the overflowing indices back to the front using idx = (idx + ofs) % len(arr) (as if the original array (arr) had been rolled and argwhere used again). If this is correct so far, what exactly should I use to get the updated val (the array that corresponds to the new idx)? Thanks in advance.
Example: let arr=[2,5,8,4,9] and n=4, so idx=[1,2,4] and val=[5,8,9]. Now let ofs=3; then idx=[4,5,7]%5=[4,0,2]. I expect val=[8,9,5].
I don't know if I understand the aim of this question correctly, but if we want to reorder val according to the new order of idx, it can be done with np.argsort:
import numpy as np
arr = np.array([2, 5, 8, 4, 9])
n, ofs = 4, 3
mask_idx = np.where(arr > n)[0]  # indices in arr whose elements are bigger than n
val = arr[mask_idx]  # corresponding values
mask_updated_idx = (mask_idx + ofs) % len(arr)  # --> [4 0 2]
idx_sorted = mask_updated_idx.argsort()  # --> [1 2 0], the order that sorts the shifted indices
val = val[idx_sorted]  # --> [8 9 5]
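The question describes the shifted indices as what you would get by rolling arr and calling argwhere again. A minimal sketch that does exactly that with np.roll, assuming "rolling by ofs" means moving each element ofs positions to the right (which is what np.roll does for a positive shift); it produces the sorted new indices and the matching values in one step:

```python
import numpy as np

arr = np.array([2, 5, 8, 4, 9])
n, ofs = 4, 3

rolled = np.roll(arr, ofs)            # element at index i moves to (i + ofs) % len(arr)
new_idx = np.flatnonzero(rolled > n)  # equals the sorted (idx + ofs) % len(arr)
new_val = rolled[new_idx]

print(new_idx, new_val)  # [0 2 4] [8 9 5]
```

This costs a full copy of arr, so for a very long array the argsort approach above, which only touches the selected elements, may be cheaper.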

Compare and multiply elements in a list

I'm trying to write a function that:
Check if all elements in a list are different
Multiply all elements in the list, except the zeros
But I can't find a way to compare all elements in one list. Do you have any idea?
Thanks!
PS: I use arr = np.random.randint(10, size=a) to create a random list
EDIT:
More precisely, I'm trying to check whether all the elements of a numpy array are the same or different; if they are all different, it should return True.
Then, once that is done, multiply all elements in the array except the zeros.
For example:
If I have the array [4,2,6,8,9,0], the algorithm first returns True because all elements are different, then it multiplies them, 4*2*6*8*9, skipping the 0.
To check if all elements in a list are different, you can convert the list into a set, which removes duplicates, and compare the length of the set to the length of the original list. If the length of the set is different from the length of the list, then there are duplicates.
import numpy as np
x = np.random.randint(10, size=10)
len(set(x)) == len(x)
To multiply all values except 0, you can use a list comprehension to remove the 0s and np.prod to multiply the values in the new list.
np.prod([i for i in x if i != 0])
Example:
x = np.random.randint(10, size=10)
if len(set(x)) == len(x):
    print(np.prod([i for i in x if i != 0]))
else:
    print("array is not unique")
You can use numpy.unique.
The following code snippet checks whether all elements in the array are unique (different from each other) and, if so, multiplies the non-zero values by factor:
import numpy as np
arr = np.random.randint(10, size=5)
factor = 5
if np.unique(arr).size == arr.size:
    arr[arr != 0] = arr[arr != 0] * factor
You can use collections.Counter to find the unique numbers. I have included code that solves your problem.
import numpy as np
from collections import Counter
a = 5
arr = np.random.randint(10, size=a)
result = 1  # variable that will store the product
flag = 0  # set to 1 as soon as a duplicate is found
counts = Counter(arr)  # maps each value to the number of times it occurs
# Check if all the numbers are unique
for i in counts:
    if counts[i] > 1:
        flag = 1
        break
# Convert the dictionary keys into a list
l = [i for i in counts]
# Compute the product of all the numbers in the list except 0
if flag == 0:
    for i in l:
        if i != 0:
            result = result * i
    print(result)
else:
    print("The numbers are not unique")
Just for fun, here's a one-liner:
import numpy as np
arr = np.array([1, 2, 3, 4, 0])
np.prod(arr[arr != 0]) if np.unique(arr).size == arr.size else False
# 24
If the array is [1, 2, 3, 4, 4], the result is False.
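Another NumPy-only way to test uniqueness, sketched as an alternative to set() and np.unique: after sorting, equal values sit next to each other, so the array is all-distinct exactly when no two neighbours in the sorted copy are equal. all_distinct_product is a hypothetical helper name; it returns the product of the non-zero entries, or None when there are duplicates:

```python
import numpy as np

def all_distinct_product(arr):
    """Return the product of the non-zero entries if all entries are distinct, else None."""
    arr = np.asarray(arr)
    s = np.sort(arr)
    # duplicates end up adjacent after sorting, so a zero difference means a repeat
    if np.any(np.diff(s) == 0):
        return None
    return np.prod(arr[arr != 0])

print(all_distinct_product([4, 2, 6, 8, 9, 0]))  # 3456
print(all_distinct_product([1, 2, 3, 4, 4]))     # None
```

Returning None instead of False avoids confusing "not unique" with a legitimate product, although np.prod on an integer array can still overflow silently for long arrays.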

How to calculate numbers of "uninterrupted" repeats in an array in python?

I have a numpy array of 0s and 1s like this:
[0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0]
I want a function that tells me that the number 1 is repeated 3, 2 and 4 times in a row in this array, respectively. Is there a simple numpy function for this?
One way to do it is to first find the clusters and then get their sizes using Counter. The first part is inspired by this answer for 2D arrays; I added the second Counter part to get the desired answer.
If you find the linked original answer helpful, please visit it and upvote it.
import numpy as np
from scipy import ndimage
from collections import Counter
arr = np.array([0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0])
labels, num_clusters = ndimage.label(arr)  # each run of 1s gets its own label 1..num_clusters; 0s stay 0
counts = Counter(labels)
print([counts[i] for i in range(1, num_clusters + 1)])  # skip label 0 (the background)
# [3, 2, 4]
Assume you only have 0s and 1s:
import numpy as np
a = np.array([0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0])
# pad a with 0 at both sides for edge cases when a starts or ends with 1
d = np.diff(np.pad(a, pad_width=1, mode='constant'))
# subtract indices when value changes from 0 to 1 from indices where value changes from 1 to 0
np.flatnonzero(d == -1) - np.flatnonzero(d == 1)
# array([3, 2, 4])
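The pad/diff trick above relies on the array containing only 0s and 1s. A sketch of a more general run-length encoding in the same spirit (run_lengths is a hypothetical helper name): find the positions where the value changes, take differences of those start positions to get the run lengths, then filter by the value you care about:

```python
import numpy as np

def run_lengths(a):
    """Run-length encode a 1-D array: return (values, lengths) of consecutive runs."""
    a = np.asarray(a)
    if a.size == 0:
        return a[:0], np.array([], dtype=int)
    # a run starts at index 0 and wherever a value differs from its predecessor
    starts = np.flatnonzero(np.r_[True, a[1:] != a[:-1]])
    lengths = np.diff(np.r_[starts, a.size])
    return a[starts], lengths

a = np.array([0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0])
values, lengths = run_lengths(a)
ones = lengths[values == 1]  # [3 2 4]
```

Because it keeps the value of every run, the same call also answers "how long are the runs of 0s" (lengths[values == 0]) without redoing any work.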
A custom implementation?
def count_consecutives(predicate, iterable):
    """Yield the length of every run of consecutive elements that satisfy predicate."""
    tmp = []
    for e in iterable:
        if predicate(e):
            tmp.append(e)
        else:
            if len(tmp) > 0:  # use > 1 if you want at least two consecutive
                yield len(tmp)
            tmp = []
    if len(tmp) > 0:  # use > 1 if you want at least two consecutive
        yield len(tmp)
So you can do:
array = [0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0]
list(count_consecutives(lambda x: x == 1, array))
#=> [3, 2, 4]
And also:
array = [0,0,0,1,2,3,0,0,3,2,1,0,0,1,11,10,10,0,0,100]
list(count_consecutives(lambda x: x > 1, array))
# => [2, 2, 3, 1]
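For comparison, the standard library's itertools.groupby already splits a sequence into runs of consecutive equal keys, so the same counts can be sketched with a comprehension. This is an alternative to the answers above, not taken from them; the key argument plays the role of the predicate:

```python
from itertools import groupby

array = [0,0,0,1,1,1,0,0,1,1,0,0,0,1,1,1,1,0,0,0]
# group consecutive equal values, keep only the groups of 1s, and count their lengths
ones = [len(list(g)) for value, g in groupby(array) if value == 1]
print(ones)  # [3, 2, 4]

array2 = [0,0,0,1,2,3,0,0,3,2,1,0,0,1,11,10,10,0,0,100]
# with key=, runs are formed by consecutive equal key values (True/False here)
big = [len(list(g)) for is_big, g in groupby(array2, key=lambda x: x > 1) if is_big]
print(big)  # [2, 2, 3, 1]
```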

Repeat numbers according to pattern numpy

I have an array, say p = [2,3,2,4], and a number, say n = 4. I need to generate an array of ones and zeros according to the pattern p, n-p; that is, for each element u in p, there are u ones followed by n-u zeros. This is easy to do with np.insert, but Theano doesn't have an insert op. Is it possible to achieve this without using a loop? Also, given multiple such ps and corresponding ns, is it possible to generate the patterns of ones and zeros without using a loop?
Here's an example:
A single p:
p = [2,3,2,4,1], n=4
n-p = [2,1,2,0,3]
result = [1,1,0,0,1,1,1,0,1,1,0,0,1,1,1,1,1,0,0,0]
Multiple values of p (in this case all p's have the same length, so p is a 2D array):
p = [[2,3,2,4,1],[2,2,3,5,4]], n = [4, 5]
n-p = [[2,1,2,0,3],[3,3,2,0,1]]
result = [[1,1,0,0,1,1,1,0,1,1,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0],[1,1,0,0,0,1,1,0,0,0,1,1,1,0,0,1,1,1,1,1,1,1,1,1,0]]
Please note that I've padded result[0] with 0s at the end so that result[0] and result[1] have the same length.
import numpy
p = numpy.array([2, 3, 2, 4])
n = 4
result = (p[:, None] > numpy.arange(n)).ravel().astype(int)
We compare the column vector p[:, None], i.e. [[2], [3], [2], [4]], to [0 1 2 3]. Broadcasting gives a 4x4 array of booleans whose row i starts with p[i] True values; we then flatten it and convert it to integers to get the output you want.
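The answer above covers a single p. For the second part of the question (several p's with different n's, padded with zeros), one possible loop-free extension in NumPy is sketched below. It has not been checked against Theano, so whether each operation used here (broadcast comparison, argsort, take_along_axis) has a Theano counterpart is an assumption. It also assumes every entry of a row of p is at most that row's n. The idea: build every block with the largest n, then use a stable argsort to push each row's unused positions (column index >= that row's n) to the end. Those positions are always 0 because p <= n, so they become exactly the trailing padding:

```python
import numpy as np

p = np.array([[2, 3, 2, 4, 1],
              [2, 2, 3, 5, 4]])
n = np.array([4, 5])

m, L = p.shape
N = n.max()
cols = np.arange(N)

# blocks[r, i, j] is True when j < p[r, i]: p[r, i] ones followed by zeros
blocks = (p[:, :, None] > cols).reshape(m, L * N)
# valid[r, i, j] is True when j < n[r], i.e. the position belongs to row r's pattern
valid = np.broadcast_to(cols < n[:, None, None], (m, L, N)).reshape(m, L * N)

# a stable argsort of ~valid puts the valid positions first, in their original order
order = np.argsort(~valid, axis=1, kind="stable")
result = np.take_along_axis(blocks, order, axis=1).astype(int)
print(result)
```

Each output row has length L * max(n), which matches the padded layout in the question.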

How to multiply matrices using for loops - Python

I have no idea how to even begin doing this
It needs to use for loops to multiply matrices.
For example:
[[1,2],[3,4]] * [[3,4],[5,6]]
[1 , 2]   [3 , 4]
[3 , 4] * [5 , 6]
Any help is much appreciated.
I know 90% of you don't want to write the code for me, so that's OK.
It only needs to work for two square matrices.
I'm pretty sure the pattern, looking at it as nested lists (with 1-based indices), is:
a[1][1]*b[1][1]+a[1][2]*b[2][1]   a[1][1]*b[1][2]+a[1][2]*b[2][2]
a[2][1]*b[1][1]+a[2][2]*b[2][1]   a[2][1]*b[1][2]+a[2][2]*b[2][2]
A = [[1, 2], [3, 4]]
B = [[3, 4], [5, 6]]
result = []  # final result
for i in range(len(A)):
    row = []  # the new row in the new matrix
    for j in range(len(B[0])):
        product = 0  # the new element in the new row
        for v in range(len(A[i])):
            product += A[i][v] * B[v][j]
        row.append(product)  # append the sum of products to the new row
    result.append(row)  # append the new row to the final result
print(result)
Break it down. Before you try to write a function that multiplies matrices, write one that multiplies vectors (a dot product). If you can do that, multiplying two matrices is just a matter of taking the dot product of row i and column j for every element i,j of the resulting matrix.
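Following that advice, a minimal sketch that first writes a vector dot product and then builds matrix multiplication on top of it. dot and matmul_via_dot are hypothetical names chosen for illustration, and the shape checks are an addition not mentioned in the answer:

```python
def dot(u, v):
    """Dot product of two equal-length vectors (lists of numbers)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0
    for a, b in zip(u, v):
        total += a * b
    return total


def matmul_via_dot(A, B):
    """Multiply matrix A (n x m) by B (m x p), both given as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    result = []
    for i in range(len(A)):
        row = []
        for j in range(len(B[0])):
            column_j = [B[k][j] for k in range(len(B))]  # pull out column j of B
            row.append(dot(A[i], column_j))  # element (i, j) = row i . column j
        result.append(row)
    return result


print(matmul_via_dot([[1, 2], [3, 4]], [[3, 4], [5, 6]]))  # [[13, 16], [29, 36]]
```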
If you look at how matrix multiplication works:
[ 1 2 ]   [ 5 6 ]   [ 1*5+2*7  1*6+2*8 ]
[ 3 4 ] x [ 7 8 ] = [ 3*5+4*7  3*6+4*8 ]
then you can work out a method to calculate it: for element i, j of the output matrix, you multiply each element of row i of the LHS matrix by the corresponding element of column j of the RHS matrix and add up the products. That is a single for loop (since row i has as many elements as column j).
You also need to cover every combination of i and j for the dimensions of the output matrix, which is a for loop for the columns nested inside a for loop for the rows.
The actual code is, of course, an exercise for you to implement.
>>> A=[[1,2],[3,4]]
>>> B=[[3,4],[5,6]]
>>> n=2
>>> ans=[[0]*n for i in range(n)]
>>> ans
[[0, 0], [0, 0]]
>>> for i in range(n):
...     for j in range(n):
...         ans[i][j] = sum(A[i][v] * B[v][j] for v in range(n))
...
>>> ans
[[13, 16], [29, 36]]
I think you just need to simplify the formula of matrix multiplication.
We have A*B = C, where:
Cij = the value in the ith row and jth column of the answer. For the example above, C12 = 16 and C11 = 13 (note that in the Python list C11 is at position [0][0], because indexing starts from 0 instead of 1).
Cij = dot_product(row_i_of_A, column_j_of_B) = sum(row_i_of_A(v) * column_j_of_B(v) for v in range(n))
Because we want the whole answer (all of C), we need to work out every Cij. This means we need to try all possible pairs i, j, so we loop through i in range(n) and j in range(n) and compute Cij for each pair.
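The formula Cij = dot_product(row_i_of_A, column_j_of_B) maps almost word for word onto a nested comprehension. A sketch relying on the standard idiom that zip(*B) iterates over the columns of B (it transposes a list of rows); matmul_comprehension is a hypothetical name:

```python
def matmul_comprehension(A, B):
    # zip(*B) yields the columns of B as tuples, so each entry is row_i . column_j
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2], [3, 4]]
B = [[3, 4], [5, 6]]
print(matmul_comprehension(A, B))  # [[13, 16], [29, 36]]
```

It still performs the same three nested loops, just written as generator expressions; note that zip silently truncates if the inner dimensions don't match, so it does not detect incompatible shapes.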
import numpy as np
m1 = np.array([[1, 2, 3], [4, 5, 6]])
m2 = np.array([[7, 8], [9, 10], [11, 12]])
rows, inner = m1.shape  # 2 x 3
cols = m2.shape[1]  # 3 x 2, so the result is 2 x 2
r = np.zeros((rows, cols), dtype=m1.dtype)
for i in range(rows):
    for j in range(cols):
        s = 0
        for k in range(inner):
            s = s + m1[i][k] * m2[k][j]
        r[i][j] = s
print(r)
I think append doesn't work on a two-dimensional array when you use the numpy module (NumPy arrays have no list-style append method), so this is how I solved it, with plain Python lists:
def matmul(matrix1_, matrix2_):
    result = []  # final result
    for i in range(len(matrix1_)):
        row = []  # the new row in the new matrix
        for j in range(len(matrix2_[0])):
            product = 0  # the new element in the new row
            for v in range(len(matrix1_[i])):
                product += matrix1_[i][v] * matrix2_[v][j]
            row.append(product)  # append the sum of products to the new row
        result.append(row)  # append the new row to the final result
    return result
u and v are constructed for visualization purposes.
from typing import List
A = [[1,0,0],[-1,0,3]]
B = [[7,0,0],[0,0,0],[0,0,1]]
def mult_mat(A: List[List[int]], B: List[List[int]]) -> List[List[int]]:
    n = len(A)  # number of rows in matrix A
    m = len(B[0])  # number of columns in matrix B
    ret = [[0 for i in range(m)] for j in range(n)]
    for row in range(n):
        u = A[row]  # the current row of A
        for col in range(m):
            v = [B[i][col] for i in range(len(B))]  # the current column of B
            # Here you can calculate ret[row][col] directly without v,
            # but v is constructed for visualization purposes
            ret[row][col] = sum([x * y for x, y in zip(u, v)])
    return ret
if __name__ == '__main__':
    print(mult_mat(A, B))
