Finding NaN Values in a Pandas MultiIndex

I'm trying to find the difference between two Pandas MultiIndex objects of different shapes. I've used:

df1.index.difference(df2)

and receive:

TypeError: '<' not supported between instances of 'float' and 'str'

My indices are str and datetime, but I suspect there are NaNs hiding in there (the floats). Hence my question: what's the best way to find the NaNs somewhere in the MultiIndex? How does one iterate through the levels and names? Can I use something like isna()?

Many functions are not implemented for MultiIndex. You need to convert the MultiIndex to a DataFrame with MultiIndex.to_frame first:
# W-B's sample
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])
print (idx.to_frame())
         0  1
NaN 1  NaN  1
1   1  1.0  1
    2  1.0  2

print (idx.to_frame().isnull())
           0      1
NaN 1   True  False
1   1  False  False
    2  False  False
Or use the DataFrame constructor:

print (pd.DataFrame(idx.tolist()))
     0  1
0  NaN  1
1  1.0  1
2  1.0  2
This is necessary because isna is not defined for a MultiIndex directly:

print (pd.isnull(idx))
NotImplementedError: isna is not defined for MultiIndex
EDIT: To check for at least one True per row, use any with boolean indexing:

df = idx.to_frame()
print (df[df.isna().any(axis=1)])
         0  1
NaN 1  NaN  1
It is also possible to filter the MultiIndex itself, but it is necessary to add MultiIndex.remove_unused_levels:

print (idx[idx.to_frame().isna().any(axis=1)].remove_unused_levels())
MultiIndex(levels=[[], [1]],
           labels=[[-1], [0]])
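Applied back to the original question, a minimal sketch (df1 and df2 here are assumed stand-in frames, not the asker's data): drop the NaN rows from each index before calling difference, so the sort inside difference never compares a float NaN against a str:

import numpy as np
import pandas as pd

def dropna_index(df):
    # Keep only rows whose MultiIndex contains no NaN in any level.
    mask = df.index.to_frame().notna().all(axis=1)
    return df[mask.values]

df1 = pd.DataFrame({'x': [1, 2, 3]},
                   index=pd.MultiIndex.from_tuples(
                       [('a', 1), (np.nan, 2), ('b', 3)]))
df2 = pd.DataFrame({'x': [1]},
                   index=pd.MultiIndex.from_tuples([('a', 1)]))

print (dropna_index(df1).index.difference(dropna_index(df2).index))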

We can use reset_index, then isna:

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)])
df = pd.DataFrame([1, 2, 3], index=idx)
df.reset_index().filter(like='level_').isna()
Out[304]:
   level_0  level_1
0     True    False
1    False    False
2    False    False
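The question also asks how to iterate through the levels and names. While isna is not implemented for the MultiIndex as a whole, it does work on each flat level returned by get_level_values; a small sketch (the level names here are assumptions for illustration):

import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples([(np.nan, 1), (1, 1), (1, 2)],
                                names=['key', 'num'])

# get_level_values returns a flat Index, where isna is implemented.
for i, name in enumerate(idx.names):
    print (name, idx.get_level_values(i).isna().any())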

Related

Pandas: change NaN column values to True or False

I need to change a column to either True or False based on the NaN value.
Here is the df:

  missing
0     NaN
1       b
2     NaN
4       y
5     NaN

which would become:

  missing
0   False
1    True
2   False
4    True
5   False

Yes, I could do a loop, but there has to be a simple way to do it in a single line of code.
Thank you.
You can do:

df['missing'].notna() # or notnull()

You need to overwrite the column values with the boolean result applied to the same column, which can be achieved with notna():

df['missing'] = df['missing'].notna()
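A self-contained sketch reproducing the frame from the question:

import numpy as np
import pandas as pd

df = pd.DataFrame({'missing': [np.nan, 'b', np.nan, 'y', np.nan]},
                  index=[0, 1, 2, 4, 5])

# notna is True where a value is present, False where it is NaN.
df['missing'] = df['missing'].notna()
print (df)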

Why do we sometimes have to add .values when doing elementwise operations in pandas?

Suppose I have a dataframe that looks like:

   A
0  0
1  1
2  2
3  3

and when I run:

a = df.loc[np.arange(0, 2)] / df.loc[np.arange(2, 4)]

I get:

    A
0 NaN
1 NaN
2 NaN
3 NaN

I know I could get the right result by writing:

a = df.loc[np.arange(0, 2)].values / df.loc[np.arange(2, 4)]
b = df.loc[np.arange(0, 2)] / df.loc[np.arange(2, 4)].values

Can anyone explain why?
Pandas is index- and column-sensitive: when you do a calculation, those hidden keys get matched first. If you only want the values to be matched positionally, removing the impact of index and columns, add .values or to_numpy(); however, index alignment also brings advantages.
Example 1: the indices do not match, so every value comes back NaN:
s1 = pd.Series([1], index=[1])
s2 = pd.Series([1], index=[999])

s1/s2
1     NaN
999   NaN
dtype: float64

s1.values/s2.values
array([1.])
Example 2: the indices partially match, so pandas returns a value wherever they do:
s1 = pd.Series([1], index=[1])
s2 = pd.Series([1, 999], index=[1, 999])

s1/s2
1      1.0
999    NaN
dtype: float64
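For the original division, an alternative to .values is to realign one side's index instead of dropping to raw arrays; a small sketch built on the question's frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3]})

top = df.loc[np.arange(0, 2)]
bottom = df.loc[np.arange(2, 4)]

# reset_index(drop=True) gives the denominator the labels 0 and 1,
# so the two halves align positionally.
print (top / bottom.reset_index(drop=True))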

Comparing two columns and replacing NaN with numbers

for i in range(len(df1)-1):
    if (df1['overall_rating'][i]==np.nan) and (df1['recommended'][i]==0):
        df1['overall_rating']=df1['overall_rating'][i].replace(np.nan,1)
    else:
        df1['overall_rating']
print(df1['overall_rating'])

I am comparing the overall_rating and recommended columns in a pandas DataFrame. If both conditions happen to be true, then I should replace the NaN in the rating column with 1. But I am getting neither an answer nor an error. Can anyone please let me know where I am going wrong?
Use DataFrame.loc to set 1 by the two conditions; to test for missing values, use the Series.isna function (comparing with ==np.nan is always False, which is why your loop never matches):

df1 = pd.DataFrame({'overall_rating':[np.nan, 2, 4, np.nan],
                    'recommended':[0, 0, 1, 1]})

df1.loc[df1['overall_rating'].isna() & (df1['recommended']==0), 'overall_rating'] = 1
print (df1)
   overall_rating  recommended
0             1.0            0
1             2.0            0
2             4.0            1
3             NaN            1
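An equivalent sketch with numpy.where (the same logic, not from the original answer):

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'overall_rating': [np.nan, 2, 4, np.nan],
                    'recommended': [0, 0, 1, 1]})

# Where the rating is missing and recommended is 0, fill with 1;
# otherwise keep the existing rating.
df1['overall_rating'] = np.where(
    df1['overall_rating'].isna() & df1['recommended'].eq(0),
    1,
    df1['overall_rating'])
print (df1)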

How can I get the index of rows having null values in all columns?

I'd like to get the index of rows which have only null values, straight in pandas, Python 3.
Thanks.
Use:

i = df.index[df.isna().all(axis=1)]

For a large DataFrame, this is the slower alternative:

i = df[df.isna().all(axis=1)].index

Sample:

df = pd.DataFrame({"a": [np.nan, 0, 1],
                   "b": [np.nan, 1, np.nan]})
print (df)
     a    b
0  NaN  NaN
1  0.0  1.0
2  1.0  NaN

i = df.index[df.isna().all(axis=1)]
print (i)
Int64Index([0], dtype='int64')
Explanation:
First, compare for missing values with DataFrame.isna:

print (df.isna())
       a      b
0   True   True
1  False  False
2  False   True

Then check whether all values per row are True with DataFrame.all:

print (df.isna().all(axis=1))
0     True
1    False
2    False
dtype: bool

And finally, filter the index values by boolean indexing.
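If the end goal is to drop those all-null rows rather than just locate them, dropna can do it in one step; a small sketch on the same sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 0, 1],
                   "b": [np.nan, 1, np.nan]})

# how='all' drops only rows in which every column is NaN.
print (df.dropna(how='all'))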

What is the Right Syntax When Using .notnull() in Pandas?

I want to use .notnull() on several columns of a dataframe to eliminate the rows which contain NaN values.
Let's say I have the following df:

     A    B    C
0    1    1    1
1    1  NaN    1
2    1  NaN  NaN
3  NaN    1    1

I tried to use this syntax, but it does not work. Do you know what I am doing wrong?

df[[df.A.notnull()],[df.B.notnull()],[df.C.notnull()]]

I get this error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

What should I do to get the following output?

   A  B  C
0  1  1  1

Any idea?
You can first select the subset of columns with df[['A','B','C']], then apply notnull and check whether all values in the mask per row are True:

print (df[['A','B','C']].notnull())
       A      B      C
0   True   True   True
1   True  False   True
2   True  False  False
3  False   True   True

print (df[['A','B','C']].notnull().all(1))
0     True
1    False
2    False
3    False
dtype: bool

print (df[df[['A','B','C']].notnull().all(1)])
     A    B    C
0  1.0  1.0  1.0
Another solution, from Ayhan's comment, uses dropna:

print (df.dropna(subset=['A', 'B', 'C']))
     A    B    C
0  1.0  1.0  1.0

which is the same as:

print (df.dropna(subset=['A', 'B', 'C'], how='any'))

and means: drop all rows where there is at least one NaN value among those columns.
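As a related sketch (an extension, not part of the original answer): dropna also accepts a thresh parameter, which keeps rows with at least a given number of non-NaN values among the subset:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, np.nan],
                   'B': [1, np.nan, np.nan, 1],
                   'C': [1, 1, np.nan, 1]})

# Keep rows with at least 2 non-NaN values among A, B and C.
print (df.dropna(subset=['A', 'B', 'C'], thresh=2))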
You can apply multiple conditions by combining them with the & operator (this works not only for the notnull() function):

df[(df.A.notnull() & df.B.notnull() & df.C.notnull())]
     A    B    C
0  1.0  1.0  1.0
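If the column list grows, the chained & conditions can be built programmatically; a small sketch (the frame is rebuilt here so the snippet stands alone):

import operator
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, np.nan],
                   'B': [1, np.nan, np.nan, 1],
                   'C': [1, 1, np.nan, 1]})

# Build one combined mask instead of chaining & by hand.
cols = ['A', 'B', 'C']
mask = reduce(operator.and_, (df[c].notnull() for c in cols))
print (df[mask])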
Alternatively, you can just drop all rows which contain NaN. The original DataFrame is not modified; instead a copy is returned.

df.dropna()
