I'm doing the following conditional fill in PySpark; how would I do this in pandas?
colIsAcceptable = when(col("var") < 0.9, 1).otherwise(0)
You can use:
df['new_col'] = df['col'].lt(0.9).astype(int)
or with numpy.where:
import numpy as np
df['new_col'] = np.where(df['col'].lt(0.9), 1, 0)
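For example, a minimal sketch on made-up data (the column name 'var' is taken from the question):
import numpy as np
import pandas as pd
df = pd.DataFrame({'var': [0.5, 0.95, 0.89, 1.2]})  # hypothetical sample data
df['colIsAcceptable'] = df['var'].lt(0.9).astype(int)
# equivalently, with numpy.where
df['colIsAcceptable'] = np.where(df['var'].lt(0.9), 1, 0)
print(df)
#     var  colIsAcceptable
# 0  0.50                1
# 1  0.95                0
# 2  0.89                1
# 3  1.20                0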
You can use numpy.where.
import numpy as np
df['colIsAcceptable'] = np.where(df['col'] < 0.9, 1, 0)
colIsAcceptable = df['var'].apply(lambda x: 1 if x < 0.9 else 0)
apply can be slow on very large datasets, and the vectorized approaches above are more efficient, but it is good for general purposes.
Assuming the first column of your dataframe is named 'var' and the second is 'colIsAcceptable', you can use the .map() function:
df['colIsAcceptable'] = df['var'].map(lambda x: 1 if x < 0.9 else 0)
df['col2'] = 0
df.loc[df['col1'] < 0.9, 'col2'] = 1
This is a simple example of doing what you are asking.
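Put together as a runnable sketch (the sample data here is made up):
import pandas as pd
df = pd.DataFrame({'col1': [0.5, 0.95, 0.89]})
df['col2'] = 0
df.loc[df['col1'] < 0.9, 'col2'] = 1
print(df)
#    col1  col2
# 0  0.50     1
# 1  0.95     0
# 2  0.89     1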
I have a df as below
I want to make this df binary as follows
I tried
df[:]=np.where(df>0, 1, 0)
but with this I am losing my df index.
I could apply this to each column one at a time or use a loop, but I think there should be an easier and quicker way to do this.
You can convert the boolean mask from DataFrame.gt to integers:
df1 = df.gt(0).astype(int)
Or use DataFrame.clip if the values are integers with no negatives:
df1 = df.clip(upper=1)
Your solution should work with loc:
df.loc[:] = np.where(df>0, 1, 0)
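For a quick check, all three options on a small made-up frame (note that clip only matches when the values are non-negative integers):
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [0, 2, 5], 'b': [3, 0, 1]}, index=['x', 'y', 'z'])
print(df.gt(0).astype(int))          # boolean mask cast to int
print(df.clip(upper=1))              # same result here: non-negative ints
df.loc[:] = np.where(df > 0, 1, 0)   # in place, index preserved
print(df)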
Of course it is possible with a function, but it can also be done with just an operator:
(df > 0) * 1
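Multiplying the boolean frame by 1 casts True/False to 1/0 while keeping the index and column labels intact.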
Without using numpy (this sets the positive values to 1; it assumes the remaining values are already 0):
df[df > 0] = 1
I have a dataframe df_ac, and the logic for this dataframe is:
df_ac['annfact'] = np.where((df_ac['annfact'] == 0) & (df_ac['cert'] == 0), 1, df_ac['annfact'])
How can I use pandas filtering to express the above logic? Something like this:
df_ac['annfact'] = df_ac[(df_ac['annfact'] == 0) & (df_ac['cert'] == 0)] =1 ?
And I hope the pandas filtering way will be faster than np.where.
Can anyone help convert the code, or offer a suggestion?
You can use a boolean mask to update certain values. This will modify the "annfact" column directly.
mask = (df_ac["annfact"] == 0) & (df_ac["cert"] == 0)
df_ac.loc[mask, "annfact"] = 1
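As a self-contained sketch with toy data standing in for df_ac:
import pandas as pd
df_ac = pd.DataFrame({'annfact': [0.0, 0.0, 2.5], 'cert': [0, 1, 0]})  # made-up data
mask = (df_ac['annfact'] == 0) & (df_ac['cert'] == 0)
df_ac.loc[mask, 'annfact'] = 1
print(df_ac)
#    annfact  cert
# 0      1.0     0
# 1      0.0     1
# 2      2.5     0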
         A             B
0    0.119  5.344960e+08
1    0.008  7.950629e+09
2  318.575  1.996548e+05
3  153.644  4.139767e+05
my_sum = 63605028.818
df['B'] = df['A'].rdiv(my_sum).replace(np.inf, 0).round(3)
This gives values in exponential (scientific) notation as a Series; I want plain numeric values in column B, like 534496040.49.
You can do something like this:
df['B'] = df['A'].rdiv(my_sum).replace(np.inf, 0).astype('int64')
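Note that astype('int64') drops the decimal part entirely; if you want to keep decimals while avoiding scientific notation, use the display option below instead.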
You can also change the view option of pandas:
pd.set_option('display.float_format', lambda x: '%.3f' % x)
import pandas as pd
pd.options.display.float_format = '{:,.3f}'.format
Set pandas' float_format option and it will display all floats in this format; you won't need to round each column explicitly.
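For example, using the values from column B above:
import pandas as pd
pd.options.display.float_format = '{:,.3f}'.format
s = pd.Series([5.344960e+08, 7.950629e+09])
print(s)
# 0      534,496,000.000
# 1    7,950,629,000.000
# dtype: float64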
Alternatively, use map()
df['B'] = df['A'].rdiv(my_sum).replace(np.inf, 0)
df['B'] = df['B'].map('{:,.3f}'.format)
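Keep in mind that map with a format string converts the column to strings (object dtype), so this is best left as a final display step rather than done before further computation.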
If I want to apply a lambda with multiple conditions, how should I do it?
df.train.age.apply(lambda x:0 (if x>=0 and x<500))
Or is there a much better method?
Create a mask, select from your Series with the mask, and apply only to the result:
mask = (df.train.age >=0) & (df.train.age < 500)
df.train.age[mask].apply(something)
If you just need to set the ones that don't match to zero, that's even easier:
df.train.loc[~mask, 'age'] = 0
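Putting this together as a minimal sketch (a plain DataFrame named df_train with an 'age' column is assumed here, since the question's df.train attribute access is hard to reproduce):
import pandas as pd
df_train = pd.DataFrame({'age': [-10, 10, 20, 600]})  # made-up data
mask = (df_train['age'] >= 0) & (df_train['age'] < 500)
df_train.loc[~mask, 'age'] = 0  # zero out rows outside the valid range
print(df_train)
#    age
# 0    0
# 1   10
# 2   20
# 3    0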
Your syntax needs to have an else:
df.train.age.apply(lambda x: x if x >= 0 and x < 500 else 0)
This is a good way to do it.
The same result can be obtained without apply, using np.where as below.
import numpy as np
import pandas as pd
df = pd.DataFrame({'age': [-10, 10, 20, 30, 40, 100, 110]})
df['age'] = np.where((df['age'] >= 100) | (df['age'] < 0), 0, df['age'])
df
If anything about the above code is unclear, please post your sample dataframe and I'll update my answer.
I have a DataFrame, and I want to keep only the columns whose mean is over a certain threshold.
My code looks like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((20, 20)))
mean_keep = (df.mean() > 0.5)
mean_keep = mean_keep[mean_keep == True]
df_new = df[mean_keep.index]
and it works. However, I wonder if there is a function like "TAKE_ONLY_COLUMNS" that could reduce this to one line, like
df_new = df[TAKE_ONLY_COLUMNS(df.mean() > 0.5)]
Use df.loc[] here:
df_new = df.loc[:, df.mean() > 0.5]
print(df_new)
This will automatically keep the columns where the condition is True.
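A runnable version of the whole thing (with the numpy import added and the same random frame as the question):
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((20, 20)))
df_new = df.loc[:, df.mean() > 0.5]  # keep only columns whose mean exceeds 0.5
print(df_new.shape)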