I have to set the values of the first 3 rows of the dataset in column "alcohol" to NaN.
newdf=pd.DataFrame({'alcohol':[np.nan]},index=[0,1,2])
wine.update(newdf)
wine
After running the code, no error is raised, but the dataframe is not updated either.
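The likely cause is not the indexing: `DataFrame.update` only takes non-NA values from the frame you pass, so an update whose payload is all NaN is a silent no-op. A minimal sketch, using a toy stand-in for the wine data:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the wine dataset
wine = pd.DataFrame({'alcohol': [14.2, 13.2, 13.1, 14.4]})

newdf = pd.DataFrame({'alcohol': [np.nan] * 3}, index=[0, 1, 2])
wine.update(newdf)                     # NaNs in `other` are skipped by design
print(wine['alcohol'].isna().sum())    # 0 -- nothing changed

# Direct assignment does what was intended (the label slice :2 is inclusive)
wine.loc[:2, 'alcohol'] = np.nan
print(wine['alcohol'].isna().sum())    # 3
```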
Assuming alcohol is a column:
df.loc[:2, "alcohol"] = np.nan
# alternative -- note this chained assignment may trigger SettingWithCopyWarning
df.alcohol.iloc[:3] = np.nan
Use iloc with get_loc to get the position of column alcohol:
wine.iloc[:3, wine.columns.get_loc('alcohol')] = np.nan
Or use loc with the first values of the index:
wine.loc[wine.index[:3], 'alcohol'] = np.nan
I have a pandas dataframe and want to create a new column.
This new column would return 1 if all columns in the row have a value (are not NaN).
If there was a NaN in any one of the columns in the row, it would return 0.
Does anyone have guidance on how to go about this?
I have used the below to count the non-NaN instances in the row, which could possibly be used in an if statement, or is there a simpler way?
code_count.apply(lambda x: x.count(), axis=1)
code_count['count_languages'] = code_count.apply(lambda x: x.count(), axis=1)
Use DataFrame.notna to test for non-missing values, with DataFrame.all to test whether all values per row are True, then convert the mask to 1/0 with Series.view:
code_count['count_languages'] = code_count.notna().all(axis=1).view('i1')
Or Series.astype:
code_count['count_languages'] = code_count.notna().all(axis=1).astype('int')
Or numpy.where:
code_count['count_languages'] = np.where(code_count.notna().all(axis=1), 1, 0)
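All three variants produce the same 0/1 column; a quick sketch on a hypothetical two-column frame (`Series.view` is deprecated in recent pandas, so `astype` or `np.where` are the safer choices):

```python
import numpy as np
import pandas as pd

code_count = pd.DataFrame({'python': [1.0, np.nan, 3.0],
                           'rust':   [2.0, 5.0,    np.nan]})

mask = code_count.notna().all(axis=1)   # True only for fully-populated rows
via_astype = mask.astype('int')
via_where = np.where(mask, 1, 0)

print(via_astype.tolist())   # [1, 0, 0]
print(list(via_where))       # [1, 0, 0]
```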
This is my code:
import re

for col in df:
    if col.startswith('event'):
        df[col].fillna(0, inplace=True)
        df[col] = df[col].map(lambda x: re.sub(r"\D", "", str(x)))
I have event columns 0 to 10 ("event_0", "event_1", ...).
When I fill NaN with this code, it fills all NaN cells under all event columns with 0, but it does not change event_0, which is the first column of that selection and is also filled with NaN.
I made these columns from 'events' column with following code:
event_seperator = lambda x: pd.Series(
    str(x).strip().split('\n')).add_prefix('event_')
df_events = df['events'].apply(event_seperator)
df = pd.concat([df.drop(columns=['events']), df_events], axis=1)
Please tell me what is wrong. You can see the dataframe before the change in the picture.
I don't know why that happened, since I made all those columns the same way.
Your data suggests this is precisely what has not been done.
You have a few options depending on what you are trying to achieve.
1. Convert all non-numeric values to 0
Use pd.to_numeric with errors='coerce':
df[col] = pd.to_numeric(df[col], errors='coerce').fillna(0)
2. Replace either string ('nan') or null (NaN) values with 0
Use pd.Series.replace followed by the previous method:
df[col] = df[col].replace('nan', np.nan).fillna(0)
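The likely root of the original symptom: splitting with `str(x)` turns real NaN into the literal string 'nan', which `fillna` then ignores. A sketch of both options on hypothetical column values:

```python
import numpy as np
import pandas as pd

col = pd.Series(['7', 'nan', np.nan, '12'])   # digits, 'nan' string, real NaN

# Option 1: coerce everything non-numeric to NaN, then fill
opt1 = pd.to_numeric(col, errors='coerce').fillna(0)
print(opt1.tolist())   # [7.0, 0.0, 0.0, 12.0]

# Option 2: map the 'nan' string back to real NaN first, then fill
opt2 = col.replace('nan', np.nan).fillna(0)
print(opt2.tolist())   # ['7', 0, 0, '12'] -- non-missing values stay strings
```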
The code below will generate only one value of a normal distribution, and fill in all the missing values with this same value:
helper_df = df.dropna()
df = df.fillna(numpy.random.normal(loc=helper_df.mean(), scale=numpy.std(helper_df)))
What can we do to generate a value for each missing value?
You can create a series with normal values. You need to extract the index of the NaN values in the column you are working on.
df: your dataframe
col: the column containing NaN values
index = df[df[col].isna()].index
values = np.random.normal(loc=df[col].mean(), scale=df[col].std(), size=df[col].isna().sum())
df[col].fillna(pd.Series(values, index=index), inplace=True)
You can create a series of random variables with the same length as your column, then apply fillna (note that passing a Series to DataFrame.fillna aligns its index with column labels, so do this one column at a time):
df['col'] = df['col'].fillna(pd.Series([np.random.normal() for x in range(len(df))]))
If a value in a row is not missing, fillna just ignores it.
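A sketch of the per-value idea on a single column: the `size=` argument draws one value per missing cell, and the index alignment routes each draw to the right row.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # reproducible draws
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

idx = s[s.isna()].index
draws = np.random.normal(loc=s.mean(), scale=s.std(), size=s.isna().sum())
filled = s.fillna(pd.Series(draws, index=idx))

print(filled.isna().sum())    # 0 -- every gap filled
print(filled[1] != filled[3]) # True -- each gap got its own draw
```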
What am I missing? fillna doesn't fill NaN values:
# filling multi-column df with values..
df.fillna(method='ffill', inplace=True)
df.fillna(method='bfill', inplace=True)
# just for kicks
df = df.fillna(method='ffill')
df = df.fillna(method='bfill')
# returns True
print(df.isnull().values.any())
I verified it; I actually see NaN values in some of the first cells.
Edit
So I'm trying to write it myself:
def bfill(df):
    for column in df:
        for cell in df[column]:
            if cell is not None:
                tmpValue = cell
                break
        for cell in df[column]:
            if cell is not None:
                break
            cell = tmpValue
However it doesn't work... Isn't cell passed by reference?
ffill fills rows with values from the previous row if they weren't NaN; bfill fills rows with the values from the NEXT row if they weren't NaN. In both cases, if you have NaNs on the first and/or last row, they won't get filled, so try doing both one after the other. If any columns have entirely NaN values, you will need to fill again with axis=1 (although I get a NotImplementedError when I try to do this with inplace=True on Python 3.6, which is super annoying, pandas!).
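A sketch of that edge case, using the modern `.ffill()`/`.bfill()` methods (the `method=` argument to `fillna` is deprecated in recent pandas):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 2.0, np.nan])

print(s.ffill().tolist())          # [nan, 1.0, 1.0, 2.0, 2.0] -- leading NaN survives
print(s.bfill().tolist())          # [1.0, 1.0, 2.0, 2.0, nan] -- trailing NaN survives
print(s.ffill().bfill().tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0] -- both covered
```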
So, I don't know why, but taking the fillna outside the function fixed it.
Original:
def doWork(df):
    ...
    df = df.fillna(method='ffill')
    df = df.fillna(method='bfill')

def main():
    ..
    doWork(df)
    print(df.head(5))  # shows NaN
Solution:
def doWork(df):
    ...

def main():
    ..
    doWork(df)
    df = df.fillna(method='ffill')
    df = df.fillna(method='bfill')
    print(df.head(5))  # no NaN
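The likely reason for the difference: `df = df.fillna(...)` inside the function rebinds the local name `df` to a new frame, and the caller's frame is untouched. A sketch with hypothetical function names:

```python
import numpy as np
import pandas as pd

def doWork_broken(df):
    df = df.ffill()         # rebinds the LOCAL name only
    df = df.bfill()         # the caller's frame never changes

def doWork_fixed(df):
    df.ffill(inplace=True)  # mutates the shared object...
    df.bfill(inplace=True)  # ...so the caller sees the change
    # (or: return the new frame and reassign at the call site)

df = pd.DataFrame({'a': [np.nan, 1.0, np.nan]})
doWork_broken(df)
print(df['a'].isna().sum())  # 2 -- still has NaNs

doWork_fixed(df)
print(df['a'].isna().sum())  # 0
```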
A list with attributes of persons is loaded into pandas dataframe df2. For cleanup, I want to replace the value zero (0 or '0') with np.nan.
df2.dtypes
ID object
Name object
Weight float64
Height float64
BootSize object
SuitSize object
Type object
dtype: object
Working code to set value zero to np.nan:
df2.loc[df2['Weight'] == 0,'Weight'] = np.nan
df2.loc[df2['Height'] == 0,'Height'] = np.nan
df2.loc[df2['BootSize'] == '0','BootSize'] = np.nan
df2.loc[df2['SuitSize'] == '0','SuitSize'] = np.nan
I believe this can be done in a similar, shorter way:
df2[["Weight","Height","BootSize","SuitSize"]].astype(str).replace('0',np.nan)
However, the above does not work; the zeros remain in df2. How can I tackle this?
I think you need replace with a dict:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace({'0':np.nan, 0:np.nan})
You could use the 'replace' method and pass the values that you want to replace in a list as the first parameter along with the desired one as the second parameter:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].replace(['0', 0], np.nan)
Try:
df2.replace(to_replace={
    'Weight': {0: np.nan},
    'Height': {0: np.nan},
    'BootSize': {'0': np.nan},
    'SuitSize': {'0': np.nan},
})
data['amount']=data['amount'].replace(0, np.nan)
data['duration']=data['duration'].replace(0, np.nan)
In column "age", replace zero with blanks:
df['age'].replace(['0', 0], '', inplace=True)
Replace zero with nan for single column
df['age'] = df['age'].replace(0, np.nan)
Replace zero with nan for multiple columns
cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
df[cols] = df[cols].replace(['0', 0], np.nan)
Replace zero with nan for dataframe
df.replace(0, np.nan, inplace=True)
If you just want to replace the zeros in the whole dataframe, you can replace them directly without specifying any columns:
df = df.replace({0:pd.NA})
Another alternative way:
cols = ["Weight","Height","BootSize","SuitSize","Type"]
df2[cols] = df2[cols].mask(df2[cols].eq(0) | df2[cols].eq('0'))
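`mask` replaces values where the condition is True with NaN by default, so this handles the numeric 0 and the string '0' in one pass. A quick check on toy data:

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'Weight': [0, 70.5], 'BootSize': ['0', '42']})

cols = ['Weight', 'BootSize']
df2[cols] = df2[cols].mask(df2[cols].eq(0) | df2[cols].eq('0'))

print(df2['Weight'].isna().sum())    # 1 -- numeric 0 became NaN
print(df2['BootSize'].isna().sum())  # 1 -- string '0' became NaN
```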