Find the counts of unique values row-wise using numpy - python

I am finding the counts of unique values row-wise:
import numpy as np
np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
arr
Output:
array([[ 9,  9,  4,  8,  8,  1,  5,  3,  6,  3],
       [ 3,  3,  2,  1,  9,  5,  1, 10,  7,  3],
       [ 5,  2,  6,  4,  5,  5,  4,  8,  2,  2],
       [ 8,  8,  1,  3, 10, 10,  4,  3,  6,  9],
       [ 2,  1,  8,  7,  3,  1,  9,  3,  6,  2],
       [ 9,  2,  6,  5,  3,  9,  4,  6,  1, 10]])
My code to find the counts of unique values row-wise:
row, col = np.shape(arr)
for i in range(row):
    a, c = np.unique(arr[i], return_counts=True)
    print(c)
Output:
[1 2 1 1 1 2 2]
[2 1 3 1 1 1 1]
[3 2 3 1 1]
[1 2 1 1 2 1 2]
[2 2 2 1 1 1 1]
[1 1 1 1 1 2 2 1]
But I want my output to look like this:
[[1, 0, 2, 1, 1, 1, 0, 2, 2, 0],
[2, 1, 3, 0, 1, 0, 1, 0, 1, 1],
[0, 3, 0, 2, 3, 1, 0, 1, 0, 0],
[1, 0, 2, 1, 0, 1, 0, 2, 1, 2],
[2, 2, 2, 0, 0, 1, 1, 1, 1, 0],
[1, 1, 1, 1, 1, 2, 0, 0, 2, 1]]

I think what you want is np.bincount:
row, col = np.shape(arr)
for i in range(row):
    c = np.bincount(arr[i], minlength=11)[1:]  # [1:] drops the count for the value 0, which never occurs here
    print(c)
This gives you a bunch of 1D arrays, not the 2D array you wanted. However, have a look at, for example, this question: Can numpy bincount work with 2D arrays?
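To collect the counts for all rows into a single 2D array, one option (a sketch along the lines of the linked question) is to apply np.bincount along axis 1:

```python
import numpy as np

np.random.seed(100)
arr = np.random.randint(1, 11, size=(6, 10))

# Apply bincount to each row; minlength=11 pads every row's counts to the
# same length, and [:, 1:] drops the (always zero) count for the value 0.
counts = np.apply_along_axis(np.bincount, 1, arr, minlength=11)[:, 1:]
print(counts)
```

Each row of `counts` then matches the per-row output of the loop above, stacked into one (6, 10) array.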

Related

In Python, why did I get -9223372036854775808 when I set one row of a 2D array as np.nan?

In Python, if I define a 2D array and set the second row to np.nan, the second row becomes all -9223372036854775808 rather than missing values. An example:
b = np.array(
    [[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
     [0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
     [0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
     [0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
     [0, 1, 2, 4, 4, 4, 4, 4, 4, 4]])
b[1, :] = np.nan
print(b)
[[0 0 0 0 0 0 0 0 0 5]
 [-9223372036854775808 -9223372036854775808 -9223372036854775808 -9223372036854775808 -9223372036854775808 -9223372036854775808 -9223372036854775808 -9223372036854775808 -9223372036854775808 -9223372036854775808]
 [0 0 0 3 6 6 6 6 6 6]
 [0 0 3 4 6 6 6 6 6 6]
 [0 1 2 4 4 4 4 4 4 4]]
Does anyone have any idea? And how should I correctly assign one row to np.nan?
For your reference, I am running this code in a Python 3.7.10 environment created by mamba on Ubuntu 16.04.7 LTS (GNU/Linux 4.15.0-132-generic x86_64).
np.nan is a special floating-point value that cannot be stored in integer arrays. Since b is an array of integers, the code b[1, :] = np.nan attempts to convert np.nan to an integer, which is undefined behavior. See this for a discussion of a similar issue.
You initialised your array with integers. Integers have no possible "nan" value, so the assignment falls back to the minimal integer value. A quick fix is to initialise your array with floats, which are allowed to be "nan":
b = np.array(
    [[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
     [0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
     [0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
     [0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
     [0, 1, 2, 4, 4, 4, 4, 4, 4, 4]], dtype=float)  # np.float was removed in recent NumPy; plain float works
b[1, :] = np.nan
print(b)
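If the integer array already exists, a minimal sketch of the same fix is to cast it to float before assigning:

```python
import numpy as np

b = np.array([[0, 1, 2],
              [3, 4, 5]])   # integer dtype by default
b = b.astype(float)         # NaN is only representable in float arrays
b[1, :] = np.nan            # now works as intended
print(b)
```

Note that astype returns a new array, so the original integer data is copied rather than modified in place.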
First of all, nan is a special value for float arrays only.
I tried running your code on my Python 3.8 (64-bit) environment on x64-based Windows.
b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
              [0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
              [0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
              [0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
              [0, 1, 2, 4, 4, 4, 4, 4, 4, 4]])
b[1, :] = np.nan
print(b)
This is what I got
[[0 0 0 0 0 0 0 0 0 5]
 [-2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648 -2147483648]
 [0 0 0 3 6 6 6 6 6 6]
 [0 0 3 4 6 6 6 6 6 6]
 [0 1 2 4 4 4 4 4 4 4]]
With an int array, I got the lower bound of the platform's default integer type in place of NaN; you are getting the same thing, with the exact value depending on your environment (int32 here on Windows, int64 on your Linux setup).
So instead of an int array you can use a float array:
b = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
              [0, 3, 4, 4, 6, 6, 6, 5, 4, 5],
              [0, 0, 0, 3, 6, 6, 6, 6, 6, 6],
              [0, 0, 3, 4, 6, 6, 6, 6, 6, 6],
              [0, 1, 2, 4, 4, 4, 4, 4, 4, 4]], dtype=float)

selecting certain indices in Numpy ndarray using another array

I'm trying to get the values and indices of the maxima in a 3D array, taking the max along the third axis.
Now this would have been obvious in a lower dimension:
argmaxes = np.argmax(array)
maximums = array[argmaxes]
but NumPy doesn't understand the second syntax properly for higher than 1D.
Let's say my 3D array has shape (8, 8, 250). argmaxes = np.argmax(array, axis=-1) would return an (8, 8) array with numbers between 0 and 249. Now my expected output is an (8, 8) array containing the maximum number along the third dimension. I can achieve this with maxes = np.max(array, axis=-1), but that repeats the same scan of the data (and I need both the values and the indices for later calculations).
I can also just do a crude nested loop:
for i in range(8):
    for j in range(8):
        maxes[i, j] = array[i, j, argmaxes[i, j]]
But is there a nicer way to do this?
You can use advanced indexing. This is a simpler case when the shape is (8, 8, 3):
arr = np.random.randint(99, size=(8, 8, 3))
x, y = np.indices(arr.shape[:-1])
arr[x, y, np.argmax(arr, axis=-1)]
Sample run:
>>> x
array([[0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3, 3, 3],
[4, 4, 4, 4, 4, 4, 4, 4],
[5, 5, 5, 5, 5, 5, 5, 5],
[6, 6, 6, 6, 6, 6, 6, 6],
[7, 7, 7, 7, 7, 7, 7, 7]])
>>> y
array([[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7]])
>>> np.argmax(arr,axis=-1)
array([[2, 1, 1, 2, 0, 0, 0, 1],
[2, 2, 2, 1, 0, 0, 1, 0],
[1, 2, 0, 1, 1, 1, 2, 0],
[1, 0, 0, 0, 2, 1, 1, 0],
[2, 0, 1, 2, 2, 2, 1, 0],
[2, 2, 0, 1, 1, 0, 2, 2],
[1, 1, 0, 1, 1, 2, 1, 0],
[2, 1, 1, 1, 0, 0, 2, 1]], dtype=int64)
(The original answer included a visual illustration of the array here to help understand it.)
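As a side note, NumPy (1.15+) also provides np.take_along_axis, which handles the index expansion for you; a sketch for the (8, 8, 250) case:

```python
import numpy as np

arr = np.random.randint(99, size=(8, 8, 250))
argmaxes = np.argmax(arr, axis=-1)        # shape (8, 8)
# take_along_axis wants the index array to have the same ndim as arr,
# so add a trailing axis, then drop it again.
maxes = np.take_along_axis(arr, argmaxes[..., None], axis=-1)[..., 0]
```

This avoids building the x, y index grids explicitly and gives both the indices and the values from a single argmax pass.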

Count specific values in a pandas series

I have a pandas series in python.
Is there a function/easy way to construct a series which contains the number of appearances of given values?
For demonstration, suppose I have the following Series: 1, 3, 1, 5, 10.
I want to count how many appearances each value has, from the following list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].
The series that should be returned is 2, 0, 1, 0, 1, 0, 0, 0, 0, 1.
We can do value_counts + reindex:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l1 = [1, 3, 1, 5, 10]
pd.Series(l1).value_counts().reindex(l, fill_value=0).tolist()
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
Use numpy.bincount:
import numpy as np
import pandas as pd

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s = pd.Series([1, 3, 1, 5, 10])
out = list(np.bincount(s, minlength=11)[l])  # minlength guarantees the result is long enough to index with l
out
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]
With map:
s = pd.Series([1, 3, 1, 5, 10])
inp_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pd.Series(inp_list).map(s.value_counts()).fillna(0).astype(int).tolist()
Or a list comprehension with get:
c = s.value_counts()
[c.get(i, 0) for i in inp_list]
# or [*map(lambda x: c.get(x, 0), inp_list)]
[2, 0, 1, 0, 1, 0, 0, 0, 0, 1]

Replacing specific values of a 2d numpy array, but only at the edges

To illustrate my point, lets take this 2d numpy array:
array([[1, 1, 5, 1, 1, 5, 4, 1],
       [1, 5, 6, 1, 5, 4, 1, 1],
       [5, 1, 5, 6, 1, 1, 1, 1]])
I want to replace the value 1 with some other value, let's say 0, but only at the edges. This is the desired result:
array([[0, 0, 5, 1, 1, 5, 4, 0],
       [0, 5, 6, 1, 5, 4, 0, 0],
       [5, 1, 5, 6, 0, 0, 0, 0]])
Note that the 1's surrounded by other values are not changed.
I could implement this by iterating over every row and element, but I feel like that would be very inefficient. Normally I would use the np.where function to replace a specific value, but I don't think you can add positional conditions?
m = row != 1
w1 = m.argmax() - 1
w2 = m.size - m[::-1].argmax()
These three lines give you the indices bounding the leading and trailing ones in a row. The idea is taken from counting trailing zeroes.
Try:
arr = np.array([[1, 1, 5, 1, 1, 5, 4, 1],
                [1, 5, 6, 1, 5, 4, 1, 1],
                [5, 1, 5, 6, 1, 1, 1, 1]])
for row in arr:
    m = row != 1
    w1 = m.argmax() - 1
    w2 = m.size - m[::-1].argmax()
    # print(w1, w2)
    row[0:w1 + 1] = 0
    row[w2:] = 0
    # print(row)
arr:
array([[0, 0, 5, 1, 1, 5, 4, 0],
       [0, 5, 6, 1, 5, 4, 0, 0],
       [5, 1, 5, 6, 0, 0, 0, 0]])
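If you want to avoid the Python-level loop entirely, one vectorised sketch uses cumulative sums of the "not 1" mask to find the leading and trailing stretches of each row in one go:

```python
import numpy as np

arr = np.array([[1, 1, 5, 1, 1, 5, 4, 1],
                [1, 5, 6, 1, 5, 4, 1, 1],
                [5, 1, 5, 6, 1, 1, 1, 1]])

not_one = arr != 1
# cumsum is still 0 before the first non-1 in a row (leading edge);
# the same trick on the reversed rows marks everything after the last non-1.
leading = np.cumsum(not_one, axis=1) == 0
trailing = np.cumsum(not_one[:, ::-1], axis=1)[:, ::-1] == 0
arr[leading | trailing] = 0
print(arr)
```

The interior 1's stay untouched because both cumulative sums are nonzero there.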

Create a new series with same index

I have a series with the following:
2 [2, 2, 1, 2, 0, 0, 5, 8, 7, 1, 2, 1, 0, 8, 4, ...
5 [3, 1, 5, 0]
8 [9, 0, 0, 0, 9, 0, 6, 1, 7, 0, 1, 4, 6, 1, 3, ...
9 [1, 1, 0, 8, 0, 0, 2, 9, 8, 6, 0, 3, 0]
11 [1, 0, 0, 2, 0, 0, 0, 0, 1, 1, 8, 7, 5, 7, 5, ...
I want to create a new series that keeps the index (2, 5, 8, 9, 11), with values equal to the length of the list in each row.
The result would be:
2 25
5 4
8 20
9 13
11 18
list(map(lambda x: (x, len(object[x])), indices))
It's somewhat pseudocode, because you haven't specified your data type or variable names, but the general approach is: you have an object of data indexed by some index x, so loop over all the x's and take the length of the resulting data structure.
Edit: since you stated it was pandas Series of integer lists try this:
import pandas as pd

S = pd.Series([[1, 2, 3], [2, 3]], index=[2, 4])
print(S)
# 2    [1, 2, 3]
# 4       [2, 3]

lengths = list(map(lambda x: len(S[x]), S.index))
S2 = pd.Series(lengths, index=S.index)
print(S2)
# 2    3
# 4    2
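Since the values are lists, pandas can also do this directly; Series.apply(len) keeps the original index for you without any explicit loop (s.str.len() gives the same result for list-valued entries):

```python
import pandas as pd

S = pd.Series([[1, 2, 3], [2, 3]], index=[2, 4])
S2 = S.apply(len)   # the index [2, 4] is preserved automatically
print(S2)
```

apply maps a function over the values while leaving the index untouched, which is exactly the "same index, new values" shape asked for here.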
