I'm trying to assign an empty cell (blank/NaN) as the else value of an np.where condition in my pandas DataFrame, but nothing seems to work.
The reason for this is so I can run fillna with ffill on the missing values afterwards.
np.where code:
df['x'] = np.where(df['y']>0.05,1,np.nan)
fillna code:
df['x'] = df['x'].fillna(method="ffill")
Anybody know where I'm going wrong?
This line of code works:
df['x'] = np.where(df['y']>0.05,1,np.nan)
Just remove the unneeded parentheses on the right.
I was able to fix it by using pd.NA instead, which fillna, for some reason, recognizes as a blank to fill with ffill.
Fix:
df['x'] = np.where(df['y']>0.05,1,pd.NA)
df['x'] = df['x'].fillna(method="ffill")
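As a side note, on recent pandas versions fillna(method="ffill") is deprecated in favour of Series.ffill(), so the same fix can be written as below; a minimal sketch with made-up data:
import numpy as np
import pandas as pd

# Made-up stand-in for the question's DataFrame
df = pd.DataFrame({'y': [0.10, 0.02, 0.20, 0.03]})

# Same idea as above: 1 where y > 0.05, missing otherwise, then forward-fill
df['x'] = np.where(df['y'] > 0.05, 1, pd.NA)
df['x'] = df['x'].ffill()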
Related
I'm trying to write fillna() or a lambda function in pandas that checks whether the 'user_score' column is NaN and, if so, uses that column's data from another DataFrame. I tried two options:
games_data['user_score'].fillna(
genre_score[games_data['genre']]['user_score']
if np.isnan(games_data['user_score'])
else games_data['user_score'],
inplace = True
)
# but here is 'ValueError: The truth value of a Series is ambiguous'
and
games_data['user_score'] = games_data.apply(
lambda row:
genre_score[row['genre']]['user_score']
if np.isnan(row['user_score'])
else row['user_score'],
axis=1
)
# but here is 'KeyError' with another column from games_data
My DataFrames are games_data and genre_score.
I will be glad for any help!
You can also fillna() directly with the user_score_by_genre mapping:
user_score_by_genre = games_data.genre.map(genre_score.user_score)
games_data.user_score = games_data.user_score.fillna(user_score_by_genre)
BTW, if games_data.user_score never deviates from the genre_score values, you can skip the fillna() and just assign directly to games_data.user_score:
games_data.user_score = games_data.genre.map(genre_score.user_score)
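For illustration, a self-contained sketch with made-up data (assuming genre_score is indexed by genre, as it appears to be in the question):
import pandas as pd

# Made-up stand-ins for the question's DataFrames
games_data = pd.DataFrame({'genre': ['rpg', 'shooter', 'rpg'],
                           'user_score': [8.0, None, None]})
genre_score = pd.DataFrame({'user_score': [7.5, 6.9]},
                           index=['rpg', 'shooter'])

# Map each game's genre to its genre-level score, then fill only the gaps
user_score_by_genre = games_data['genre'].map(genre_score['user_score'])
games_data['user_score'] = games_data['user_score'].fillna(user_score_by_genre)
# user_score is now [8.0, 6.9, 7.5]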
Pandas' built-in Series.where also works and is a bit more concise (keep the existing value where it is not NaN, otherwise take it from df2):
df1.user_score.where(df1.user_score.notna(), df2.user_score, inplace=True)
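Equivalently, Series.mask replaces values where the condition is True, which reads a bit more directly here; a sketch keeping the answer's df1/df2 names:
# Take the value from df2 only where user_score is currently NaN
df1['user_score'] = df1['user_score'].mask(df1['user_score'].isna(), df2['user_score'])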
Use numpy.where:
import numpy as np
df1['user_score'] = np.where(df1['user_score'].isna(), df2['user_score'], df1['user_score'])
I found part of the solution here.
I used Series.map:
user_score_by_genre = games_data['genre'].map(genre_score['user_score'])
And after that I used @MayankPorwal's answer:
games_data['user_score'] = np.where(games_data['user_score'].isna(), user_score_by_genre, games_data['user_score'])
I'm not sure that it is the best way but it works for me.
I'm getting a warning when trying to replace values:
table.loc[table['Column1'].str.contains('Unnamed'), 'Column1'] = np.NaN
A value is trying to be set on a copy of a slice from a DataFrame
Any suggestion?
You could use the apply method:
def changer(x):
    if 'Unnamed' in x:
        x = np.nan
    return x

df['column'] = df['column'].apply(changer)
Solved it by including
table = table.copy()
before the code above.
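Putting both answers together, a minimal sketch of the workaround (table and Column1 as in the question):
import numpy as np

# Work on an explicit copy so the .loc assignment does not hit a view of another frame
table = table.copy()
table.loc[table['Column1'].str.contains('Unnamed'), 'Column1'] = np.nan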
Assume I have a DataFrame df where column A contains 10 None values and the rest is something else.
If I do the slicing df = df[df["A"] == None] I get the wrong result. I figured out that df["A"] == None returns False for every row (even where the elements are None), but df["A"].values == None returns the correct result.
How come? Shouldn't we be able to slice the first way?
You should use the isna() method on the Series.
For your case:
df = df.loc[df['A'].isna()]
You can use isnull() as follows:
df = df[df['A'].isnull()]
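A small self-contained demo of the difference, with made-up data:
import pandas as pd

df = pd.DataFrame({'A': [None, 'x', None], 'B': [1, 2, 3]})

print(df['A'].isna())          # True, False, True
print(df.loc[df['A'].isna()])  # only the rows where A is missing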
Below is my dataframe:
Id,ReturnCreated,ReturnTime,TS_startTime
O108808972773560,Return Not Created,nan,2018-08-23 12:30:41
O100497888936380,Return Not Created,nan,2018-08-18 14:57:20
O109648374050370,Return Not Created,nan,2018-08-16 13:50:06
O112787613729150,Return Not Created,nan,2018-08-16 13:15:26
O110938305325240,Return Not Created,nan,2018-08-22 11:03:37
O110829757146060,Return Not Created,nan,2018-08-21 16:10:37
I want to replace the nan values with blanks. I tried the code below, but it's not working.
import pandas as pd
import numpy as np
df = pd.concat({k:pd.Series(v) for k, v in ordercreated.items()}).unstack().astype(str).sort_index()
df.columns = 'ReturnCreated ReturnTime TS_startTime'.split()
df1 = df.replace(np.nan,"", regex=True)
df1.to_csv('OrderCreationdetails.csv')
Kindly help me understand where I am going wrong and how I can fix it.
You should try the DataFrame.fillna() method:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
In your case:
df1 = df.fillna("")
should work I think
I think the nans are strings because of .astype(str), so you need:
df1 = df.replace('nan',"")
You can either use df.fillna("") (I think that will perform better) or simply replace those values with blanks:
df1 = df.replace('nan',"")
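To see why the string replacement is needed, a quick sketch: .astype(str) turns real NaN values into the literal string 'nan', which fillna no longer treats as missing:
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan]).astype(str)
print(s.tolist())            # ['1.5', 'nan'] -- the NaN is now a plain string
print(s.replace('nan', ''))  # so the string has to be replaced explicitly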
This has been killing me!
Any idea how to convert this to a list comprehension?
for x in dataframe:
    if dataframe[x].value_counts().sum() <= 1:
        dataframe.drop(x, axis=1, inplace=True)
[dataframe.drop(x, axis=1, inplace=True) for x in dataframe if dataframe[x].value_counts().sum() <= 1]
I have not used pandas yet, but the documentation on dataframe.drop says it returns a new object, so I assume it will work.
I would probably suggest going the other way and filtering instead. I don't know your DataFrame, but something like this should work:
counts_valid = df.T.apply(pd.value_counts).sum() > 1
df = df[counts_valid]
Or, if I understand what you are doing, you may be better off with:
counts_valid = df.T.nunique() > 1
df = df[counts_valid]
That will just keep rows that have more than one unique value.
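If the goal is still to drop columns as in the original loop, a filtering one-liner in the same spirit might be (assuming value_counts().sum() is just counting non-null entries, which DataFrame.count() also does per column):
# Keep only the columns that have more than one non-null value
dataframe = dataframe.loc[:, dataframe.count() > 1]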