I'm trying to assign an empty cell (blank/NaN) as the else value of an np.where condition in my pandas DataFrame, but nothing seems to work.
The reason for this is so I can run fillna with ffill on the missing values afterwards.
np.where code:
df['x'] = np.where(df['y']>0.05,1,np.nan)
fillna code:
df['x'] = df['x'].fillna(method="ffill")
Anybody know where I'm going wrong?
This line of code works:
df['x'] = np.where(df['y']>0.05,1,np.nan)
Just remove the unneeded parentheses on the right.
I was able to fix it by using pd.NA instead, which fillna, for some reason, recognizes as a blank to fill with ffill.
Fix:
df['x'] = np.where(df['y']>0.05,1,pd.NA)
df['x'] = df['x'].fillna(method="ffill")
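As a side note, on recent pandas versions fillna(method="ffill") is deprecated in favour of Series.ffill(), so the same fix can be written as below; a minimal sketch with made-up data:
import numpy as np
import pandas as pd

# Made-up stand-in for the question's DataFrame
df = pd.DataFrame({'y': [0.10, 0.02, 0.20, 0.03]})

# Same idea as above: 1 where y > 0.05, missing otherwise, then forward-fill
df['x'] = np.where(df['y'] > 0.05, 1, pd.NA)
df['x'] = df['x'].ffill()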
Related
I'm trying to write fillna() or a lambda function in pandas that checks whether the 'user_score' column is NaN and, if so, uses that column's data from another DataFrame. I tried two options:
games_data['user_score'].fillna(
genre_score[games_data['genre']]['user_score']
if np.isnan(games_data['user_score'])
else games_data['user_score'],
inplace = True
)
# but here is 'ValueError: The truth value of a Series is ambiguous'
and
games_data['user_score'] = games_data.apply(
lambda row:
genre_score[row['genre']]['user_score']
if np.isnan(row['user_score'])
else row['user_score'],
axis=1
)
# but here is 'KeyError' with another column from games_data
My DataFrames are games_data and genre_score.
I will be glad for any help!
You can also fillna() directly with the user_score_by_genre mapping:
user_score_by_genre = games_data.genre.map(genre_score.user_score)
games_data.user_score = games_data.user_score.fillna(user_score_by_genre)
BTW, if games_data.user_score never deviates from the genre_score values, you can skip the fillna() and just assign directly to games_data.user_score:
games_data.user_score = games_data.genre.map(genre_score.user_score)
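For illustration, a self-contained sketch with made-up data (assuming genre_score is indexed by genre, as it appears to be in the question):
import pandas as pd

# Made-up stand-ins for the question's DataFrames
games_data = pd.DataFrame({'genre': ['rpg', 'shooter', 'rpg'],
                           'user_score': [8.0, None, None]})
genre_score = pd.DataFrame({'user_score': [7.5, 6.9]},
                           index=['rpg', 'shooter'])

# Map each game's genre to its genre-level score, then fill only the gaps
user_score_by_genre = games_data['genre'].map(genre_score['user_score'])
games_data['user_score'] = games_data['user_score'].fillna(user_score_by_genre)
# user_score is now [8.0, 6.9, 7.5]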
Pandas' built-in Series.where also works and is a bit more concise (keep the existing value where it is not NaN, otherwise take it from df2):
df1.user_score.where(df1.user_score.notna(), df2.user_score, inplace=True)
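Equivalently, Series.mask replaces values where the condition is True, which reads a bit more directly here; a sketch keeping the answer's df1/df2 names:
# Take the value from df2 only where user_score is currently NaN
df1['user_score'] = df1['user_score'].mask(df1['user_score'].isna(), df2['user_score'])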
Use numpy.where:
import numpy as np
df1['user_score'] = np.where(df1['user_score'].isna(), df2['user_score'], df1['user_score'])
I found part of the solution here.
I used Series.map:
user_score_by_genre = games_data['genre'].map(genre_score['user_score'])
And after that I used @MayankPorwal's answer:
games_data['user_score'] = np.where(games_data['user_score'].isna(), user_score_by_genre, games_data['user_score'])
I'm not sure that it is the best way but it works for me.
I'm getting a warning when trying to replace values:
table.loc[table['Column1'].str.contains('Unnamed'), 'Column1'] = np.NaN
A value is trying to be set on a copy of a slice from a DataFrame
Any suggestion?
You could use the apply method:
def changer(x):
    if 'Unnamed' in x:
        x = np.nan
    return x

df['column'] = df['column'].apply(changer)
Solved it by including
table = table.copy()
before the code above.
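Putting both answers together, a minimal sketch of the workaround (table and Column1 as in the question):
import numpy as np

# Work on an explicit copy so the .loc assignment does not hit a view of another frame
table = table.copy()
table.loc[table['Column1'].str.contains('Unnamed'), 'Column1'] = np.nan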
Assume I have a DataFrame df where column A contains 10 None values and the rest is something else.
If I do the slicing df = df[df["A"] == None] I get the wrong result. I figured out that df["A"] == None returns False for every row (even where the elements are None), but df["A"].values == None returns the correct result.
How come? Shouldn't we be able to slice the first way?
You should use the isna() method on the Series.
For your case:
df = df.loc[df['A'].isna()]
You can use isnull() as follows:
df = df[df['A'].isnull()]
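A small self-contained demo of the difference, with made-up data:
import pandas as pd

df = pd.DataFrame({'A': [None, 'x', None], 'B': [1, 2, 3]})

print(df['A'].isna())          # True, False, True
print(df.loc[df['A'].isna()])  # only the rows where A is missing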
Below is my dataframe:
Id,ReturnCreated,ReturnTime,TS_startTime
O108808972773560,Return Not Created,nan,2018-08-23 12:30:41
O100497888936380,Return Not Created,nan,2018-08-18 14:57:20
O109648374050370,Return Not Created,nan,2018-08-16 13:50:06
O112787613729150,Return Not Created,nan,2018-08-16 13:15:26
O110938305325240,Return Not Created,nan,2018-08-22 11:03:37
O110829757146060,Return Not Created,nan,2018-08-21 16:10:37
I want to replace the nan values with blanks. I tried the code below, but it's not working.
import pandas as pd
import numpy as np
df = pd.concat({k:pd.Series(v) for k, v in ordercreated.items()}).unstack().astype(str).sort_index()
df.columns = 'ReturnCreated ReturnTime TS_startTime'.split()
df1 = df.replace(np.nan,"", regex=True)
df1.to_csv('OrderCreationdetails.csv')
Kindly help me understand where I am going wrong and how I can fix it.
You should try the DataFrame.fillna() method:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
In your case:
df1 = df.fillna("")
should work I think
I think the nans are strings because of .astype(str), so you need:
df1 = df.replace('nan',"")
You can either use df.fillna("") (I think that will perform better) or simply replace those values with blanks:
df1 = df.replace('nan',"")
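To see why the string replacement is needed, a quick sketch: .astype(str) turns real NaN values into the literal string 'nan', which fillna no longer treats as missing:
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan]).astype(str)
print(s.tolist())            # ['1.5', 'nan'] -- the NaN is now a plain string
print(s.replace('nan', ''))  # so the string has to be replaced explicitly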
This has been killing me!
Any idea how to convert this to a list comprehension?
for x in dataframe:
    if dataframe[x].value_counts().sum() <= 1:
        dataframe.drop(x, axis=1, inplace=True)
[dataframe.drop(x, axis=1, inplace=True) for x in dataframe if dataframe[x].value_counts().sum() <= 1]
I have not used pandas yet, but the documentation on dataframe.drop says it returns a new object, so I assume it will work.
I would probably suggest going the other way and filtering instead. I don't know your DataFrame, but something like this should work:
counts_valid = df.T.apply(pd.value_counts).sum() > 1
df = df[counts_valid]
Or, if I understand what you are doing, you may be better off with:
counts_valid = df.T.nunique() > 1
df = df[counts_valid]
That will just keep rows that have more than one unique value.
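If the goal is still to drop columns as in the original loop, a filtering one-liner in the same spirit might be (assuming value_counts().sum() is just counting non-null entries, which DataFrame.count() also does per column):
# Keep only the columns that have more than one non-null value
dataframe = dataframe.loc[:, dataframe.count() > 1]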