I am trying to implement code which will do the following with pandas.
def fill_in_capabilities(df):
    capacity_means = df.groupby("LV_Name").mean(["LEO_Capa", "GTO_Capa"])
    for row in df:
        if np.isnan(row["LEO_Capa"]):
            row["LEO_Capa"] = capacity_means[row["LV_Name"]]
    return df
Basically, for the rows in df where the value in the column "LEO_Capa" is NaN, I would like to replace it with the value from the series capacity_means, indexed by that row's "LV_Name". How would one do this with pandas? The code above does not work. Thanks.
You can use a function:
def fill_in_capabilities(df: pd.DataFrame) -> pd.DataFrame:
    # Group-wise means, broadcast back to the original shape with transform
    df[["LEO_Capa", "GTO_Capa"]] = df[["LEO_Capa", "GTO_Capa"]].fillna(
        df.groupby("LV_Name")[["LEO_Capa", "GTO_Capa"]].transform("mean")
    )
    return df
df = fill_in_capabilities(df)
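For reference, here is a minimal sketch of how the group-wise fill behaves, on a small made-up launch-vehicle table (the names and capacity values are invented for illustration):
import numpy as np
import pandas as pd

# Hypothetical example data: two launch vehicles with some missing capacities
df = pd.DataFrame({
    "LV_Name": ["Falcon", "Falcon", "Atlas", "Atlas"],
    "LEO_Capa": [22.0, np.nan, 18.0, 18.5],
    "GTO_Capa": [np.nan, 8.3, 8.9, np.nan],
})

df = fill_in_capabilities(df)
print(df)
# The missing LEO_Capa for "Falcon" becomes 22.0 (the Falcon group mean),
# and each missing GTO_Capa is filled with its own group's mean (8.3 and 8.9).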
In my code the df.fillna() method is not working when the df.dropna() method is working. I don't want to drop the column though. What can I do that the fillna() method works?
def preprocess_df(df):
    for col in df.columns:  # go through all of the columns
        if col != "target":  # normalize all ... except for the target itself!
            df[col] = df[col].pct_change()  # pct change "normalizes" the different currencies (each crypto coin has vastly diff values, we're really more interested in the other coin's movements)
            # df.dropna(inplace=True)  # remove the nas created by pct_change
            df.fillna(method="ffill", inplace=True)
            print(df)
            break
            df[col] = preprocessing.scale(df[col].values)  # scale between 0 and 1.
You were almost there:
df = df.fillna(method="ffill")
You have to assign the result back to df (or keep inplace=True and drop the assignment, since fillna returns None when inplace=True).
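A minimal sketch of the two ways to apply the forward fill, using a tiny made-up frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({"close": [1.0, np.nan, 3.0]})

# Option 1: assign the result back (no inplace)
df = df.fillna(method="ffill")

# Option 2: fill in place and do NOT reassign,
# because fillna returns None when inplace=True
# df.fillna(method="ffill", inplace=True)

print(df)
#    close
# 0    1.0
# 1    1.0
# 2    3.0
On recent pandas versions, df = df.ffill() does the same thing and avoids the deprecated method= argument.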
I have to set the values of the first 3 rows of dataset in column "alcohol" as NaN
newdf=pd.DataFrame({'alcohol':[np.nan]},index=[0,1,2])
wine.update(newdf)
wine
After running the code, no error is raised, but the dataframe is not updated either.
Assuming alcohol is a column and the index is the default RangeIndex:
wine.loc[:2, "alcohol"] = np.nan
Note that .loc slicing is label-based and inclusive, so :2 covers the first three rows here. Avoid the chained form wine.alcohol.iloc[:3] = np.nan, which may assign to a temporary copy instead of the original dataframe.
Use iloc with get_loc to get the position of the alcohol column:
wine.iloc[:3, wine.columns.get_loc('alcohol')] = np.nan
Or use loc with the first values of the index:
wine.loc[wine.index[:3], 'alcohol'] = np.nan
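Both variants give the same result; here is a quick sketch on a made-up wine frame (the column values are invented for illustration):
import numpy as np
import pandas as pd

wine = pd.DataFrame({"alcohol": [12.8, 13.1, 11.9, 12.2],
                     "quality": [5, 6, 5, 7]})

wine.iloc[:3, wine.columns.get_loc('alcohol')] = np.nan
# or: wine.loc[wine.index[:3], 'alcohol'] = np.nan

print(wine)
#    alcohol  quality
# 0      NaN        5
# 1      NaN        6
# 2      NaN        7
# 3     12.2        7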
this is my code:
for col in df:
    if col.startswith('event'):
        df[col].fillna(0, inplace=True)
        df[col] = df[col].map(lambda x: re.sub("\D", "", str(x)))
I have event columns 0 to 10 ("event_0", "event_1", ...).
When I fill NaN with this code, it fills all NaN cells under the event columns with 0, but it does not change event_0, the first column of that selection, even though it is also filled with NaN.
I made these columns from the 'events' column with the following code:
event_seperator = lambda x: pd.Series([i for i in str(x).strip().split('\n')]).add_prefix('event_')
df_events = df['events'].apply(event_seperator)
df = pd.concat([df.drop(columns=['events']), df_events], axis=1)
Please tell me what is wrong. You can see the dataframe before the change in the picture.
I don't know why that happened since I made all those columns the same.
Your data suggests this is precisely what has not been done.
You have a few options depending on what you are trying to achieve.
1. Convert all non-numeric values to 0
Use pd.to_numeric with errors='coerce':
df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
2. Replace either string ('nan') or null (NaN) values with 0
Use pd.Series.replace to turn the 'nan' strings into real NaN, then apply fillna as before:
df[col] = df[col].replace('nan', np.nan).fillna(0)
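As a quick sketch of both options on a made-up event column (the literal 'nan' strings appear because str(x) was applied to missing values, as in the question's code):
import numpy as np
import pandas as pd

s = pd.Series(["12", "nan", "7", np.nan], name="event_0")

# Option 1: coerce anything non-numeric to NaN, then fill with 0
print(pd.to_numeric(s, errors='coerce').fillna(0).tolist())
# [12.0, 0.0, 7.0, 0.0]

# Option 2: turn the literal string 'nan' back into a real NaN, then fill
print(s.replace('nan', np.nan).fillna(0).tolist())
# ['12', 0, '7', 0]  (non-missing values stay as strings; only the missing cells become 0)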
I have a list with attributes of persons loaded into the pandas dataframe df2. For cleanup I want to replace the value zero (0 or '0') with np.nan.
df2.dtypes
ID object
Name object
Weight float64
Height float64
BootSize object
SuitSize object
Type object
dtype: object
Working code to set value zero to np.nan:
df2.loc[df2['Weight'] == 0,'Weight'] = np.nan
df2.loc[df2['Height'] == 0,'Height'] = np.nan
df2.loc[df2['BootSize'] == '0','BootSize'] = np.nan
df2.loc[df2['SuitSize'] == '0','SuitSize'] = np.nan
I believe this can be done in a similar/shorter way:
df2[["Weight","Height","BootSize","SuitSize"]].astype(str).replace('0',np.nan)
However, the above does not work; the zeros remain in df2. How do I tackle this?
I think you need to replace using a dict:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace({'0':np.nan, 0:np.nan})
You could use the replace method and pass the values that you want to replace as a list in the first parameter, along with the desired value as the second parameter:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace(['0', 0], np.nan)
Try (note that replace returns a new DataFrame, so assign it back):
df2 = df2.replace(to_replace={
    'Weight': {0: np.nan},
    'Height': {0: np.nan},
    'BootSize': {'0': np.nan},
    'SuitSize': {'0': np.nan},
})
data['amount']=data['amount'].replace(0, np.nan)
data['duration']=data['duration'].replace(0, np.nan)
In column "age", replace zero with blanks:
df['age'].replace(['0', 0], '', inplace=True)
Replace zero with nan for single column
df['age'] = df['age'].replace(0, np.nan)
Replace zero with nan for multiple columns
cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
df[cols] = df[cols].replace(['0', 0], np.nan)
Replace zero with nan for dataframe
df.replace(0, np.nan, inplace=True)
If you just want to replace the zeros in the whole dataframe, you can replace them directly without specifying any columns:
df = df.replace({0:pd.NA})
Another alternative way:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].mask(df2[cols].eq(0) | df2[cols].eq('0'))
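For completeness, a short sketch of the mask approach on a small made-up df2 (mask replaces cells where the condition is True with NaN by default):
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "Weight": [0.0, 80.5],
    "Height": [1.75, 0.0],
    "BootSize": ["0", "43"],
    "SuitSize": ["52", "0"],
    "Type": ["A", "B"],
})

cols = ["Weight", "Height", "BootSize", "SuitSize", "Type"]
df2[cols] = df2[cols].mask(df2[cols].eq(0) | df2[cols].eq('0'))
print(df2)
#    Weight  Height BootSize SuitSize Type
# 0     NaN    1.75      NaN       52    A
# 1    80.5     NaN       43      NaN    B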