Count unique elements row wise in an ndarray - python

An extension to this question. In addition to having the unique elements row-wise, I want to have a similarly shaped array that gives me the count of unique values. For example, if the initial array looks like this:
a = np.array([[1, 2, 2, 3, 4, 5],
[1, 2, 3, 3, 4, 5],
[1, 2, 3, 4, 4, 5],
[1, 2, 3, 4, 5, 5],
[1, 2, 3, 4, 5, 6]])
I would like to get this as the output from the function:
np.array([[1, 2, 0, 1, 1, 1],
[1, 1, 2, 0, 1, 1],
[1, 1, 1, 2, 0, 1],
[1, 1, 1, 1, 2, 0],
[1, 1, 1, 1, 1, 1]])
In numpy v.1.9 there seems to be an additional argument return_counts that can return the counts in a flattened array. Is there some way this can be re-constructed into the original array dimensions with zeros where values were duplicated?

The idea behind this answer is very similar to the one used here. I'm adding a unique imaginary number to each row. Therefore, no two numbers from different rows can be equal. Thus, you can find all the unique values in a 2D array per row with just one call to np.unique.
The index, ind, returned when return_index=True gives you the location of the first occurrence of each unique value.
The count, cnt, returned when return_counts=True gives you the count.
np.put(b, ind, cnt) places the count in the location of the first occurrence of each unique value.
One obvious limitation of the trick used here is that the original array must have an int or float dtype. It cannot have a complex dtype to start with, since adding a unique imaginary number to each row may then produce duplicate values across different rows.
import numpy as np
a = np.array([[1, 2, 2, 3, 4, 5],
[1, 2, 3, 3, 4, 5],
[1, 2, 3, 4, 4, 5],
[1, 2, 3, 4, 5, 5],
[1, 2, 3, 4, 5, 6]])
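def count_unique_by_row(a):
    # a distinct imaginary offset per row keeps values from different rows apart
    weight = 1j*np.linspace(0, a.shape[1], a.shape[0], endpoint=False)
    b = a + weight[:, np.newaxis]
    # ind holds positions in the flattened array, which is what np.put expects
    u, ind, cnt = np.unique(b, return_index=True, return_counts=True)
    b = np.zeros_like(a)
    np.put(b, ind, cnt)
    return b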
yields
In [79]: count_unique_by_row(a)
Out[79]:
array([[1, 2, 0, 1, 1, 1],
[1, 1, 2, 0, 1, 1],
[1, 1, 1, 2, 0, 1],
[1, 1, 1, 1, 2, 0],
[1, 1, 1, 1, 1, 1]])
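As a cross-check, the same result can be obtained with a plain loop that calls np.unique once per row. This is only a minimal sketch for testing the vectorized version; the name count_unique_by_row_loop is made up here and does not come from the answer.

```python
import numpy as np

def count_unique_by_row_loop(a):
    # Hypothetical reference implementation: one np.unique call per row.
    out = np.zeros(a.shape, dtype=int)
    for i, row in enumerate(a):
        # ind is the column of the first occurrence of each unique value
        _, ind, cnt = np.unique(row, return_index=True, return_counts=True)
        out[i, ind] = cnt
    return out

a = np.array([[1, 2, 2, 3, 4, 5],
              [1, 2, 3, 3, 4, 5],
              [1, 2, 3, 4, 4, 5],
              [1, 2, 3, 4, 5, 5],
              [1, 2, 3, 4, 5, 6]])
result = count_unique_by_row_loop(a)
print(result)
```

It is slower than a single np.unique call, but it has no restriction on the dtype and is easy to reason about.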

This method does the same as np.unique for each row, by sorting each row and measuring the length of each run of consecutive equal values. For an array with N rows and M columns this has complexity O(NM log(M)), which is better than running unique on the whole array, since that has complexity O(NM log(NM)).
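def row_unique_count(a):
    # a stable sort keeps equal values in their original order, so the first
    # element of each run is the first occurrence in the original row
    args = np.argsort(a, kind="stable")
    unique = a[np.indices(a.shape)[0], args]
    # True wherever a new run of equal values starts in the sorted row
    changes = np.pad(unique[:, 1:] != unique[:, :-1], ((0, 0), (1, 0)), mode="constant", constant_values=1)
    idxs = np.nonzero(changes)
    tmp = np.hstack((idxs[-1], 0))
    # run length: distance to the next run start, or to the end of the row
    counts = np.where(tmp[1:], np.diff(tmp), a.shape[-1]-tmp[:-1])
    count_array = np.zeros(a.shape, dtype="int")
    count_array[(idxs[0], args[idxs])] = counts
    return count_array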
Running times:
In [162]: b = np.random.random(size=100000).reshape((100, 1000))
In [163]: %timeit row_unique_count(b)
100 loops, best of 3: 10.4 ms per loop
In [164]: %timeit count_unique_by_row(b)
100 loops, best of 3: 19.4 ms per loop
In [165]: assert np.all(row_unique_count(b) == count_unique_by_row(b))

Related

Splitting a sorted array of repeated elements

I have an array of repeated elements, where each repeated element represents a class. What I would like to do is obtain the indices of the repeated elements and partition them, in order of occurrence, into 3 slices. For example:
np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
split by the first occurrences into 3:
[0, 2, 1] [2, 0, 1], [2, 1, 0, 0]
I would like to find the indices of the repeated elements and split the array into 3 parts, where each part contains the indices of the next occurrence of each of the 3 repeated elements.
So for the array and its splits, I'd like to obtain the following:
array[0, 2, 2, 1, 0, 1, 2, 1, 0, 0]
indices:[0, 1, 3], [2, 4, 5], [6, 7, 8, 9]
I've tried the following:
a = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
length = np.arange(len(a))
array_set = (([length[a ==unique] for unique in np.unique(a)]))
But I can't figure out how to split the partitions in order of the first occurrences like in the examples above.
This is a way to split the array in proportions of 3, that is, the last 0 will be left out:
# unique values
uniques = np.unique(a)
# counting occurrence of each unique value
occ = np.cumsum(a == uniques[:,None], axis=1)
# maximum common occurrence
max_occ = occ.max(axis=1).min()
# masking the first occurrences
u = (occ[None,...] == (np.arange(max_occ)+1)[:,None, None])
# the indexes
idx = np.sort(np.argmax(u, axis=-1), axis=-1)
# the partitions
partitions = a[idx]
Output:
# idx
array([[0, 1, 3],
[2, 4, 5],
[6, 7, 8]])
# partitions
array([[0, 2, 1],
[2, 0, 1],
[2, 1, 0]])
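If the leftover occurrences should stay in the last partition, as in the question's expected output, one possible sketch is to rank each element by how many times its value has already appeared and then group by that rank. This is a different technique from the masking approach above, and the helper name occurrence_rank is made up for illustration.

```python
import numpy as np

def occurrence_rank(a):
    # Hypothetical helper: rank[i] = number of earlier elements equal to a[i].
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    # start position of each run of equal values in the sorted array
    starts = np.flatnonzero(np.r_[True, sorted_a[1:] != sorted_a[:-1]])
    run_lengths = np.diff(np.r_[starts, len(a)])
    rank_sorted = np.arange(len(a)) - np.repeat(starts, run_lengths)
    rank = np.empty(len(a), dtype=int)
    rank[order] = rank_sorted
    return rank

a = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
rank = occurrence_rank(a)
# number of partitions = smallest count among the classes
n_groups = np.unique(a, return_counts=True)[1].min()
# occurrences beyond the last full round are folded into the last partition
group = np.minimum(rank, n_groups - 1)
parts = [np.flatnonzero(group == g) for g in range(n_groups)]
print(parts)
```

For the sample array this yields the index groups [0, 1, 3], [2, 4, 5] and [6, 7, 8, 9].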
This is a problem where np.concatenate(...) + some algorithm + np.split(...) does the trick, though these are slow methods.
Let's start with the concatenation and the indexes at which to split:
classes = [[0, 2, 1], [2, 0, 1], [2, 1, 0, 0]]
split_idx = np.cumsum(list(map(len, classes[:-1])))
flat_classes = np.concatenate(classes)
Next, we need the indexes that sort the initial array and the indexes at which each group starts. In this case the sorted array is [0,0,0,0,1,1,1,2,2,2] and the distinct groups start at 0, 4 and 7.
c = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
idx = np.argsort(c)
u, cnt = np.unique(c, return_counts=True)
marker_idx = np.r_[0, np.cumsum(cnt[:-1])]
Now this is the trickiest part. Exactly one of the indexes 0, 4 or 7 advances at each step (while you iterate over flat_classes), so you can accumulate these changes in a special array called counter, which has one column per class (3 here), and then read off only the entry that changed at each step:
take = np.zeros((len(flat_classes), len(u)), dtype=int)
take[np.arange(len(flat_classes)), flat_classes] = 1
counter = np.cumsum(take, axis=0)
counter = counter + marker_idx - np.ones(len(u), dtype=int)
active_idx = counter[np.arange(len(flat_classes)), flat_classes]
splittable = idx[active_idx]  # remember that we are working on the indices that sort the array
output = np.split(splittable, split_idx)
Output
[array([0, 1, 3], dtype=int64),
array([2, 4, 5], dtype=int64),
array([6, 7, 8, 9], dtype=int64)]
Remark: the main idea of the solution is to track how positions within the sorting indexes advance at each step. This is how counter changes for this problem:
>>> counter
array([[0, 3, 6],
[0, 3, 7],
[0, 4, 7],
[0, 4, 8],
[1, 4, 8],
[1, 5, 8],
[1, 5, 9],
[1, 6, 9],
[2, 6, 9],
[3, 6, 9]])

How to find the indices of the maximum in each line, a concatenation of rows, in numpy?

I don't know if this is simple or not or if it is asked before or not. (I searched but did not find the correct way to do it. I have found numpy.argmax and numpy.amax but I am not able to use them correctly.)
I have a numpy array (it is a CxKxN matrix) as follows (C=K=N=3):
array([[[1, 2, 3],
[2, 1, 4],
[4, 3, 3]],
[[2, 1, 1],
[1, 3, 1],
[3, 4, 2]],
[[5, 2, 1],
[3, 3, 3],
[4, 1, 2]]])
I would like to find the indices of the maximum elements across each line. A line is the concatenation of the three (C) rows of each matrix. In other words, the i-th line is the concatenation of the i-th row in the first matrix, the i-th row in the second matrix, ..., until the i-th row in the C-th matrix.
For example, the first line is
[1, 2, 3, 2, 1, 1, 5, 2, 1]
So I would like to return
[2, 0, 0] # the index of the maximum in the first line
and
[0, 1, 2] # the index of the maximum in the second line
and
[0, 2, 0] # the index of the maximum in the third line
or
[1, 2, 1] # the index of the maximum in the third line
or
[2, 2, 0] # the index of the maximum in the third line
Now, I am trying this
np.argmax(a[:,0,:], axis=None) # for the first line
It returns 6 and
np.argmax(a[:,1,:], axis=None)
and it returns 2 and
np.argmax(a[:,2,:], axis=None)
and it returns 0
but I am not able to convert these numbers to indices like 6 = (2,0,0), etc.
With a transpose and reshape I get your 'rows':
In [367]: arr.transpose(1,0,2).reshape(3,9)
Out[367]:
array([[1, 2, 3, 2, 1, 1, 5, 2, 1],
[2, 1, 4, 1, 3, 1, 3, 3, 3],
[4, 3, 3, 3, 4, 2, 4, 1, 2]])
In [368]: np.argmax(_, axis=1)
Out[368]: array([6, 2, 0])
These maxima are the same as yours. The same indices, but in a (3,3) array:
In [372]: np.unravel_index([6,2,0],(3,3))
Out[372]: (array([2, 0, 0]), array([0, 2, 0]))
Join them with middle dimension range:
In [373]: tup = (_[0],np.arange(3),_[1])
In [374]: np.transpose(tup)
Out[374]:
array([[2, 0, 0],
[0, 1, 2],
[0, 2, 0]])
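The steps above can be collected into one function. This is a minimal sketch that assumes a 3-d input of shape (C, K, N); the function name line_argmax is made up for illustration.

```python
import numpy as np

def line_argmax(arr):
    # Hypothetical wrapper around the transpose/reshape/argmax/unravel steps.
    C, K, N = arr.shape
    # line i is the concatenation of row i from each of the C matrices
    lines = arr.transpose(1, 0, 2).reshape(K, C * N)
    flat = lines.argmax(axis=1)
    # a flat position inside a line maps back to (matrix, column)
    c, n = np.unravel_index(flat, (C, N))
    return np.stack([c, np.arange(K), n], axis=1)

arr = np.array([[[1, 2, 3], [2, 1, 4], [4, 3, 3]],
                [[2, 1, 1], [1, 3, 1], [3, 4, 2]],
                [[5, 2, 1], [3, 3, 3], [4, 1, 2]]])
result = line_argmax(arr)
print(result)
```

When a line contains ties, argmax reports the first maximum, so only one of the equally valid index triples is returned.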

Clone items in a list by index

I have a numpy array
np.array([[1,4,3,5,2],
[3,2,5,2,3],
[5,2,4,2,1]])
and I want to clone items by their indexes. For example, I have an index of
np.array([[1,4],
[2,4],
[1,4]])
These correspond to the positions of the items at each row. e.g. the first [1,4] are the indexes for 4, 2 in the first row.
I want in the end returning a new numpy array giving initial array and the index array.
np.array([[1,4,4,3,5,2,2],
[3,2,5,5,2,3,3],
[5,2,2,4,2,1,1]])
The effect is the selected column values are repeated once. Any way to do this? Thanks.
I commented that this could be viewed as a 1d problem. There's nothing 2d about it, except that you are adding 2 values per row, so you end up with a 2d array. The other key idea is that np.repeat lets us repeat selected elements several times.
In [70]: arr =np.array([[1,4,3,5,2],
...: [3,2,5,2,3],
...: [5,2,4,2,1]])
...:
In [71]: idx = np.array([[1,4],
...: [2,4],
...: [1,4]])
...:
Make an array of 'repeat' counts - start with 1 for everything, and add 1 for the elements we want to duplicate:
In [72]: repeats = np.ones_like(arr)
In [73]: repeats
Out[73]:
array([[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1]])
In [74]: for i,j in enumerate(idx):
...:     repeats[i,j] += 1
...:
In [75]: repeats
Out[75]:
array([[1, 2, 1, 1, 2],
[1, 1, 2, 1, 2],
[1, 2, 1, 1, 2]])
Now just apply repeat to the flattened arrays, and reshape:
In [76]: np.repeat(arr.ravel(),repeats.ravel())
Out[76]: array([1, 4, 4, 3, 5, 2, 2, 3, 2, 5, 5, 2, 3, 3, 5, 2, 2, 4, 2, 1, 1])
In [77]: _.reshape(3,-1)
Out[77]:
array([[1, 4, 4, 3, 5, 2, 2],
[3, 2, 5, 5, 2, 3, 3],
[5, 2, 2, 4, 2, 1, 1]])
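The loop that builds the repeat counts can also be replaced by a single indexed update. This is only a sketch, assuming every row of idx has the same number of entries (otherwise the final reshape fails); the name clone_columns is made up for illustration.

```python
import numpy as np

def clone_columns(arr, idx):
    # Hypothetical helper: repeat arr[i, idx[i]] once more in every row i.
    repeats = np.ones(arr.shape, dtype=int)
    rows = np.arange(arr.shape[0])[:, None]
    # np.add.at also counts an index that appears twice in the same row
    np.add.at(repeats, (rows, idx), 1)
    return np.repeat(arr.ravel(), repeats.ravel()).reshape(arr.shape[0], -1)

arr = np.array([[1, 4, 3, 5, 2],
                [3, 2, 5, 2, 3],
                [5, 2, 4, 2, 1]])
idx = np.array([[1, 4],
                [2, 4],
                [1, 4]])
result = clone_columns(arr, idx)
print(result)
```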
I may add a list solution, once I work that out.
A row-by-row np.insert solution (fleshing out the concept suggested by @f5r5e5d):
Test with one row:
In [81]: row=arr[0]
In [82]: i=idx[0]
In [83]: np.insert(row,i,row[i])
Out[83]: array([1, 4, 4, 3, 5, 2, 2])
Now apply iteratively to all rows. The list of arrays can then be turned back into an array:
In [84]: [np.insert(row,i,row[i]) for i,row in zip(idx,arr)]
Out[84]:
[array([1, 4, 4, 3, 5, 2, 2]),
array([3, 2, 5, 5, 2, 3, 3]),
array([5, 2, 2, 4, 2, 1, 1])]
np.insert may help
a = np.array([[1,4,3,5,2],
[3,2,5,2,3],
[5,2,4,2,1]])
i = np.array([[1,4],
[2,4],
[1,4]])
np.insert(a[0], 4, a[0,4])
Out[177]: array([1, 4, 3, 5, 2, 2])
As mentioned, np.insert can insert more than one element at a time when obj is one-dimensional:
np.insert(a[0], i[0], a[0,i[0]])
Out[187]: array([1, 4, 4, 3, 5, 2, 2])

Get array of indices of first zero in every row of numpy array

I have a numpy array of 1650 rows and 1275 columns containing 0s and 255s.
I want to get the index of every first zero in the row and store it in an array.
I used a for loop to achieve that. Here is the example code:
# new_arr is a numpy array and k is an empty list
for i in range(new_arr.shape[0]):
    if not np.all(new_arr[i, :] == 255):
        x = np.where(new_arr[i, :] == 0)[0][0]
        k.append(x)
    else:
        k.append(-1)
It takes around 1.3 seconds for 1650 rows. Is there any other way or function to get the indices array in a much faster way?
One approach would be to get a mask of matches with ==0 and then take argmax along each row, i.e. argmax(axis=1), which gives us the first matching index for each row -
(arr==0).argmax(axis=1)
Sample run -
In [443]: arr
Out[443]:
array([[0, 1, 0, 2, 2, 1, 2, 2],
[1, 1, 2, 2, 2, 1, 0, 1],
[2, 1, 0, 1, 0, 0, 2, 0],
[2, 2, 1, 0, 1, 2, 1, 0]])
In [444]: (arr==0).argmax(axis=1)
Out[444]: array([0, 6, 2, 3])
Catching non-zero rows (if we can!)
To handle rows that don't have any zero, we need one more step, with some masking -
In [445]: arr[2] = 9
In [446]: arr
Out[446]:
array([[0, 1, 0, 2, 2, 1, 2, 2],
[1, 1, 2, 2, 2, 1, 0, 1],
[9, 9, 9, 9, 9, 9, 9, 9],
[2, 2, 1, 0, 1, 2, 1, 0]])
In [447]: mask = arr==0
In [448]: np.where(mask.any(1), mask.argmax(1), -1)
Out[448]: array([ 0, 6, -1, 3])
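The masking step can be wrapped in a small function. This is a minimal sketch; the name first_zero_per_row and its fill parameter are made up for illustration.

```python
import numpy as np

def first_zero_per_row(arr, fill=-1):
    # Hypothetical helper: column of the first zero in each row, or
    # `fill` for rows that contain no zero at all.
    mask = arr == 0
    # argmax on a boolean row returns the position of the first True
    return np.where(mask.any(axis=1), mask.argmax(axis=1), fill)

arr = np.array([[0, 1, 0, 2, 2, 1, 2, 2],
                [1, 1, 2, 2, 2, 1, 0, 1],
                [9, 9, 9, 9, 9, 9, 9, 9],
                [2, 2, 1, 0, 1, 2, 1, 0]])
result = first_zero_per_row(arr)
print(result)
```

Without the mask.any check, a row with no zero would report index 0, because argmax of an all-False row is 0.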

function to find minimum number greater than zero from rows of a array and store into a list

I want to find the minimum number in selected rows of a 9x9 array A and store it in a list m, but I want to exclude zero. The program that I made is returning zero as the minimum.
m = []
def find(L):
    for i in range(len(L)):
        m.append(A[L[i]].min())
c = [2,3,4,5,6]
find(c)
print(m)
Here's a NumPy solution -
np.where(a>0,a,a.max()).min(1)
Sample run -
In [45]: a
Out[45]:
array([[0, 4, 6, 6, 1],
[3, 1, 5, 0, 0],
[6, 3, 6, 0, 0],
[0, 6, 3, 5, 2]])
In [46]: np.where(a>0,a,a.max()).min(1)
Out[46]: array([1, 1, 3, 2])
If you want to perform this operation along selected rows only specified by row indices in L -
def find(a,L):
    return np.where(a[L]>0,a[L],a.max()).min(1)
Sample run -
In [62]: a
Out[62]:
array([[0, 4, 6, 6, 1],
[3, 1, 5, 0, 0],
[6, 3, 6, 0, 0],
[0, 6, 3, 5, 2]])
In [63]: L = [2,3]
In [64]: find(a,L)
Out[64]: array([3, 2])
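Note that replacing non-positive entries with a.max() means a row with no positive entry reports the global maximum. A possible variant, sketched here under the assumption that a float result is acceptable, substitutes np.inf so that such rows are easy to detect. The name min_positive_per_row is made up for illustration.

```python
import numpy as np

def min_positive_per_row(a):
    # Hypothetical variant: smallest strictly positive value per row.
    # Rows without any positive value come back as inf.
    a = np.asarray(a, dtype=float)
    return np.where(a > 0, a, np.inf).min(axis=1)

a = np.array([[0, 4, 6, 6, 1],
              [3, 1, 5, 0, 0],
              [6, 3, 6, 0, 0],
              [0, 6, 3, 5, 2]])
result = min_positive_per_row(a)
print(result)

# a row of zeros is flagged with inf instead of a misleading number
flagged = min_positive_per_row(np.array([[0, 0, 0], [0, 2, 7]]))
print(flagged)
```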
