fillna doesn't fill the NaN cells in the dataframe - python

What am I missing? fillna doesn't fill NaN values:
# fill a multi-column df with values
df.fillna(method='ffill', inplace=True)
df.fillna(method='bfill', inplace=True)
# just for kicks
df = df.fillna(method='ffill')
df = df.fillna(method='bfill')
# returns True
print(df.isnull().values.any())
I verified it: I can still see NaN values in some of the first cells.
Edit
So I'm trying to write it myself:
def bfill(df):
    for column in df:
        for cell in df[column]:
            if cell is not None:
                tmpValue = cell
                break
        for cell in df[column]:
            if cell is not None:
                break
            cell = tmpValue
However, it doesn't work. Isn't the cell passed by reference?

ffill fills each NaN with the last valid value above it, and bfill fills each NaN with the next valid value below it. So ffill alone cannot fill NaNs at the top of a column, and bfill alone cannot fill NaNs at the bottom. Try doing both, one after the other. If any column is entirely NaN, you will need to fill again with axis=1 (although I get a NotImplementedError when I try to do this with inplace=True on Python 3.6, which is super annoying, pandas!).
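To make that concrete, here is a minimal sketch on made-up data (the columns a, b and c are assumptions, not taken from the question). It uses the df.ffill() and df.bfill() methods, which do the same job as fillna(method=...); recent pandas releases deprecate the method= argument, so this spelling is probably the safer one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" starts with NaN, "b" ends with NaN, "c" is all NaN.
df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan, 3.0],
    "b": [5.0, np.nan, 7.0, np.nan],
    "c": [np.nan] * 4,
})

forward = df.ffill()        # the leading NaN in "a" has nothing above it
both = df.ffill().bfill()   # bfill then takes care of that leading NaN

print(forward.isna().sum().to_dict())
print(both.isna().sum().to_dict())
```

After both passes only the all-NaN column "c" still has missing values, which is the case the axis=1 remark above is about.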

So, I don't know why, but taking the fillna outside the function fixed it.
Original:
def doWork(df):
    ...
    df = df.fillna(method='ffill')
    df = df.fillna(method='bfill')
def main():
    ..
    doWork(df)
    print(df.head(5))  # shows NaN
Solution:
def doWork(df):
    ...
def main():
    ..
    doWork(df)
    df = df.fillna(method='ffill')
    df = df.fillna(method='bfill')
    print(df.head(5))  # no NaN
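A likely explanation for the behaviour above: df = df.fillna(...) inside a function only rebinds the function's local name to a new frame, so the caller's frame never changes. The sketch below shows this with two hypothetical helpers (do_work_rebinding and do_work_returning are made-up names); returning the filled frame is one way to keep the fill inside the function.

```python
import numpy as np
import pandas as pd

def do_work_rebinding(df):
    # Rebinds the local name only; the caller's frame is left untouched.
    df = df.ffill().bfill()

def do_work_returning(df):
    # Hands the filled frame back so the caller can keep it.
    return df.ffill().bfill()

frame = pd.DataFrame({"x": [np.nan, 1.0, np.nan]})

do_work_rebinding(frame)
still_has_nan = frame.isna().values.any()        # NaNs are still there

frame = do_work_returning(frame)
has_nan_after_return = frame.isna().values.any() # NaNs are gone
```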

Related

Replace NaN values from DataFrame with values from series

I am trying to implement code which will do the following with pandas.
def fill_in_capabilities(df):
    capacity_means = df.groupby("LV_Name").mean(["LEO_Capa", "GTO_Capa"])
    for row in df:
        if np.isnan(row["LEO_Capa"]):
            row["LEO_Capa"] = capacity_means[row["LV_Name"]]
    return df
Basically, for the rows in df where the value in column "LEO_Capa" is NaN, I would like to replace it with the value from the series capacity_means, indexed by that row's "LV_Name". How would one do this with pandas? The code above does not work. Thanks.
You can use a function:
def fill_in_capabilities(df: pd.DataFrame) -> pd.DataFrame:
    df[["LEO_Capa", "GTO_Capa"]] = df[["LEO_Capa", "GTO_Capa"]].fillna(
        df.groupby("LV_Name")[["LEO_Capa", "GTO_Capa"]].transform("mean")
    )
    return df
df = fill_in_capabilities(df)
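Why this works: transform("mean") returns a frame with the same index as df, holding each row's group mean, so fillna can line the two up cell by cell. A small sketch on invented data (the values and the two launcher names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the column names come from the question.
df = pd.DataFrame({
    "LV_Name": ["A", "A", "B", "B"],
    "LEO_Capa": [10.0, np.nan, 4.0, 6.0],
    "GTO_Capa": [np.nan, 2.0, 1.0, np.nan],
})
cols = ["LEO_Capa", "GTO_Capa"]

# One mean per row, aligned with df's index (NaNs are skipped in the mean).
group_means = df.groupby("LV_Name")[cols].transform("mean")

filled = df.copy()
filled[cols] = df[cols].fillna(group_means)
print(filled)
```

Only the missing cells are touched; existing values such as 4.0 and 6.0 stay as they are.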

df.fillna() not working but df.dropna() working

In my code the df.fillna() method is not working, while the df.dropna() method is. I don't want to drop the column, though. What can I do to make the fillna() method work?
def preprocess_df(df):
    for col in df.columns:  # go through all of the columns
        if col != "target":  # normalize all ... except for the target itself!
            df[col] = df[col].pct_change()  # pct change "normalizes" the different currencies (each crypto coin has vastly diff values, we're really more interested in the other coin's movements)
            # df.dropna(inplace=True)  # remove the nas created by pct_change
            df.fillna(method="ffill", inplace=True)
            print(df)
            break
            df[col] = preprocessing.scale(df[col].values)  # standardize to zero mean and unit variance
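You were almost there. If you want to assign the result back to df, drop inplace=True:
df = df.fillna(method="ffill")
Don't combine assignment with inplace=True: with inplace=True, fillna has traditionally returned None, so df would end up as None. Also note that pct_change leaves NaN in the first row, and ffill cannot fill it because there is no earlier value to copy. That first row needs bfill, fillna(0) or dropna.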
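A quick way to check where the remaining NaNs come from is to run pct_change on a tiny frame. The sketch below uses an invented close column; it assumes the NaNs in the question are the ones pct_change produces in the first row.

```python
import pandas as pd

# Hypothetical prices; the column name is an assumption.
prices = pd.DataFrame({"close": [100.0, 110.0, 99.0]})

changes = prices.pct_change()    # first row is NaN: nothing to compare with
ffilled = changes.ffill()        # still NaN: nothing above to copy from
zero_filled = changes.fillna(0)  # one option: treat the first change as 0

print(changes["close"].isna().tolist())
print(ffilled["close"].isna().tolist())
print(zero_filled["close"].isna().tolist())
```

Whether 0, a back-fill, or dropping the row is the right choice depends on what the downstream model expects.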

Replace values with nan in python

I have to set the values of the first 3 rows of dataset in column "alcohol" as NaN
newdf=pd.DataFrame({'alcohol':[np.nan]},index=[0,1,2])
wine.update(newdf)
wine
After running the code, no error is raised, but the dataframe is not updated either.
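Assuming alcohol is a column and the index is the default 0, 1, 2, ... (loc slices by label and includes the end label):
df.loc[:2, "alcohol"] = np.nan
# alternative (chained assignment: may raise SettingWithCopyWarning, and has no effect when copy-on-write is enabled)
df.alcohol.iloc[:3] = np.nan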
Use iloc with get_loc for position for column alcohol:
wine.iloc[:3, wine.columns.get_loc('alcohol')] = np.nan
Or use loc with first values of index:
wine.loc[wine.index[:3], 'alcohol'] = np.nan
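As for why the original attempt changed nothing: DataFrame.update only copies non-NA values from the other frame, so a frame full of NaN is effectively ignored. A minimal sketch with a made-up wine frame (the values and the ash column are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the wine dataset.
wine = pd.DataFrame({
    "alcohol": [14.2, 13.2, 13.1, 14.3],
    "ash": [2.4, 2.1, 2.6, 2.5],
})

# update() skips NaN values in the other frame, so nothing changes.
attempt = wine.copy()
attempt.update(pd.DataFrame({"alcohol": [np.nan] * 3}, index=[0, 1, 2]))

# Direct positional assignment does set the first three rows to NaN.
fixed = wine.copy()
fixed.iloc[:3, fixed.columns.get_loc("alcohol")] = np.nan

print(attempt["alcohol"].isna().sum(), fixed["alcohol"].isna().sum())
```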

Fill nan with zero python pandas

This is my code:
for col in df:
    if col.startswith('event'):
        df[col].fillna(0, inplace=True)
        df[col] = df[col].map(lambda x: re.sub(r"\D", "", str(x)))
I have event columns numbered 0 to 10 ("event_0", "event_1", ...).
When I fill NaN with this code, it sets the NaN cells under all event columns to 0, except in event_0, the first column of that selection, which still contains nan.
I made these columns from the 'events' column with the following code:
event_seperator = lambda x: pd.Series(
    [i for i in str(x).strip().split('\n')]
).add_prefix('event_')
df_events = df['events'].apply(event_seperator)
df = pd.concat([df.drop(columns=['events']), df_events], axis=1)
Please tell me what is wrong. You can see the dataframe before the change in the picture.
"I don't know why that happened since I made all those columns the same."
Your data suggests this is precisely what has not been done.
You have a few options depending on what you are trying to achieve.
1. Convert all non-numeric values to 0
Use pd.to_numeric with errors='coerce':
df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
2. Replace either string ('nan') or null (NaN) values with 0
Use pd.Series.replace followed by the previous method:
df[col] = df[col].replace('nan', np.nan).fillna(0)
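One plausible source of the problem is the str(x) call in the splitting code: str() turns a missing value into the text 'nan', which fillna does not treat as missing. The sketch below uses invented event columns stored as object dtype (an assumption about the real data) to show the difference between a plain fillna and option 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: event_0 holds the *string* 'nan', event_1 a real NaN.
df = pd.DataFrame(
    {
        "event_0": ["12", "nan", "7"],
        "event_1": ["3", np.nan, "x9"],
    },
    dtype=object,
)

# A plain fillna leaves the string 'nan' alone.
plain_fill = df["event_0"].fillna(0)

# Option 1: anything that is not a number becomes NaN, then 0.
for col in df.columns:
    if col.startswith("event"):
        df[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)

print(plain_fill.tolist())
print(df)
```

Note that option 1 also turns values like 'x9' into 0; if the digits should be kept, strip the non-digit characters first, as the question's re.sub call does.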

Python Pandas replace multiple columns zero to Nan

List with attributes of persons loaded into pandas dataframe df2. For cleanup I want to replace value zero (0 or '0') by np.nan.
df2.dtypes
ID           object
Name         object
Weight      float64
Height      float64
BootSize     object
SuitSize     object
Type         object
dtype: object
Working code to set value zero to np.nan:
df2.loc[df2['Weight'] == 0,'Weight'] = np.nan
df2.loc[df2['Height'] == 0,'Height'] = np.nan
df2.loc[df2['BootSize'] == '0','BootSize'] = np.nan
df2.loc[df2['SuitSize'] == '0','SuitSize'] = np.nan
Believe this can be done in a similar/shorter way:
df2[["Weight","Height","BootSize","SuitSize"]].astype(str).replace('0',np.nan)
However, the above does not work: the zeros remain in df2. How can I tackle this?
I think you need replace by dict:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace({'0':np.nan, 0:np.nan})
You could use the 'replace' method and pass the values that you want to replace in a list as the first parameter along with the desired one as the second parameter:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace(['0', 0], np.nan)
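Two things seem to go wrong in the one-liner from the question: its result is never assigned back to df2, and astype(str) turns the float 0.0 into the text '0.0', which does not match '0'. A small sketch on made-up data (only two of the columns, with invented values) illustrates both points and the list-based replace:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a float column and an object column, as in df2.dtypes.
df2 = pd.DataFrame({
    "Weight": [70.0, 0.0, 82.5],
    "BootSize": pd.Series(["42", "0", "44"], dtype=object),
})
cols = ["Weight", "BootSize"]

# astype(str) renders the float zero as '0.0', so '0' does not match it.
as_text = df2[cols].astype(str)

# The result is discarded here, so df2 itself is unchanged.
df2[cols].astype(str).replace("0", np.nan)
zeros_left = int((df2["Weight"] == 0).sum())

# Replace both spellings of zero and assign the result back.
df2[cols] = df2[cols].replace(["0", 0], np.nan)
print(df2)
```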
Try (replace returns a new dataframe, so assign the result back):
df2 = df2.replace(to_replace={
    'Weight': {0: np.nan},
    'Height': {0: np.nan},
    'BootSize': {'0': np.nan},
    'SuitSize': {'0': np.nan},
})
data['amount']=data['amount'].replace(0, np.nan)
data['duration']=data['duration'].replace(0, np.nan)
In column "age", replace zero with blanks
df['age'] = df['age'].replace(['0', 0], '')
Replace zero with nan for single column
df['age'] = df['age'].replace(0, np.nan)
Replace zero with nan for multiple columns
cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
df[cols] = df[cols].replace(['0', 0], np.nan)
Replace zero with nan for dataframe
df.replace(0, np.nan, inplace=True)
If you just want to replace the zeros in the whole dataframe, you can do so directly without specifying any columns:
df = df.replace({0:pd.NA})
Another alternative way:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].mask(df2[cols].eq(0) | df2[cols].eq('0'))
