I have a Dataframe for instance:
df = pd.DataFrame({"condition": ["a","a","a","b","b","b","a","b"], "dv1": [7,8,6,3,2,1,5,4]})
and I want to subtract 10 from column dv1 only where the value in column condition equals "a". Is there a way to do this in Python, for example with a def function and an if statement? I have tried the following, but it doesn't work:
def recode():
for i in df["condition']:
if i == "a":
return abs(10-df["dv1"])
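Indent the code, update the matching row inside the loop, and return only after the loop has finished (a return inside the loop exits on the first match):
def recode(df):
    for i in df.index:
        if df.loc[i, "condition"] == "a":
            df.loc[i, "dv1"] = abs(10 - df.loc[i, "dv1"])
    return df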
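A loop is not strictly needed here. The sketch below is a vectorized alternative, assuming the goal is the recode abs(10 - dv1) from the attempt above, applied only where condition is "a" (swap in dv1 - 10 if a plain subtraction is intended). It builds a boolean mask and updates only those rows with .loc.

```python
import pandas as pd

df = pd.DataFrame({
    "condition": ["a", "a", "a", "b", "b", "b", "a", "b"],
    "dv1": [7, 8, 6, 3, 2, 1, 5, 4],
})

# Boolean mask: True for rows whose condition is "a"
mask = df["condition"] == "a"

# Recode dv1 only on the masked rows; "b" rows are left unchanged
df.loc[mask, "dv1"] = (10 - df.loc[mask, "dv1"]).abs()

print(df)
```

The mask is computed once for the whole column, so there is no per-row Python loop and no risk of returning early.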
Consider the following dataframe
Y = pd.DataFrame([("2021-10-11","john"),("2021-10-12","wick")],columns = ['Date','Name'])
Y['Date'] = pd.to_datetime(Y['Date'])
Now consider the following code snippet, in which I try to print slices of the DataFrame filtered on the column "Date". However, it prints an empty DataFrame:
for date in set(Y['Date']):
    print(Y.query(f'Date == {date.date()}'))
Essentially, I want to filter the DataFrame on the column "Date" and do some processing on each slice in the loop. How do I achieve that?
Reference the loop variable inside the query string with @, which tells query to look up a local Python variable:
Y = pd.DataFrame([("2021-10-11","john"),("2021-10-12","wick")],columns = ['Date','Name'])
for date in set(Y['Date']):
    print(Y.query('Date == @date'))
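If the query string is not needed at all, a plain boolean mask sidesteps the quoting problem. A minimal sketch, assuming the Date column has been converted with pd.to_datetime: comparing the column to the loop variable selects the matching rows directly. The slices dictionary is only an illustrative place to keep each subset.

```python
import pandas as pd

Y = pd.DataFrame([("2021-10-11", "john"), ("2021-10-12", "wick")],
                 columns=["Date", "Name"])
Y["Date"] = pd.to_datetime(Y["Date"])

slices = {}
# drop_duplicates keeps the first-seen order, unlike set()
for date in Y["Date"].drop_duplicates():
    # Boolean mask: rows whose Date equals the current Timestamp
    subset = Y[Y["Date"] == date]
    slices[date] = subset  # store or process each slice here
    print(subset)
```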
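Alternatively, put quotes around the value. The f-string inserts the bare value without quotes, so query does not see a date string: 2021-10-11 is evaluated as the subtraction 2021 - 10 - 11, so nothing matches, and a full timestamp such as 2021-10-11 00:00:00 raises a syntax error:
Y = pd.DataFrame([("2021-10-11","john"),("2021-10-12","wick")],columns = ['Date','Name'])
Y['Date'] = pd.to_datetime(Y['Date'])
for date in set(Y['Date']):
    print(Y.query(f'Date == "{date}"'))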
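Since the goal is to process each date's rows inside a loop, groupby can do the splitting in one pass. A minimal sketch: iterating over Y.groupby("Date") yields each distinct date together with the sub-DataFrame holding that date's rows. The names_by_date dictionary is a hypothetical stand-in for whatever processing is needed.

```python
import pandas as pd

Y = pd.DataFrame([("2021-10-11", "john"), ("2021-10-12", "wick")],
                 columns=["Date", "Name"])
Y["Date"] = pd.to_datetime(Y["Date"])

names_by_date = {}
for date, group in Y.groupby("Date"):
    # date is a Timestamp; group holds only the rows for that date
    names_by_date[date.date()] = group["Name"].tolist()
    print(date.date(), group, sep="\n")
```

groupby sorts the keys by default, so the dates are visited in chronological order.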
I know how to apply an IF condition in a pandas DataFrame.
However, my question is how to do the following:
if (df[df['col1'] == 0]):
    sys.path.append("/desktop/folder/")
    import self_module as sm
    df = sm.call_function(df)
What I really want is to call call_function() on the rows where col1 equals 0.
def call_function(ds):
    ds['new_age'] = (ds['age'] * 0.012345678901).round(12)
    return ds
The call_function() above is a simplified example.
Since your function works with multiple columns and returns a whole DataFrame, run the conditional logic inside the function:
import numpy as np
def call_function(ds):
    ds['new_age'] = np.nan
    # compute new_age only for rows where col1 == 0; other rows stay NaN
    ds.loc[ds['col1'] == 0, 'new_age'] = ds['age'].mul(0.012345678901).round(12)
    return ds
df = call_function(df)
If you can't modify the function, run it on one split of the DataFrame and concatenate the splits back together. Rows in the split that bypasses the function get NaN in any new columns.
import pandas as pd
def call_function(ds):
    ds['new_age'] = (ds['age'] * 0.012345678901).round(12)
    return ds
df = pd.concat([call_function(df[df['col1'] == 0].copy()),
                df[df['col1'] != 0].copy()]).sort_index()  # restore original row order
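Another option when call_function cannot be changed: run it on a copy of just the matching rows and assign the new column back to the full DataFrame. Index alignment places the results on the original rows and fills NaN elsewhere, without reordering anything. This is a sketch assuming the question's column names col1 and age; the sample data is hypothetical.

```python
import pandas as pd

def call_function(ds):
    ds['new_age'] = (ds['age'] * 0.012345678901).round(12)
    return ds

# Hypothetical sample data using the question's column names
df = pd.DataFrame({"col1": [0, 1, 0, 2], "age": [10, 20, 30, 40]})

mask = df["col1"] == 0
# Run the unmodified function on a copy of the matching rows only
result = call_function(df[mask].copy())
# Assignment aligns on the index: non-matching rows get NaN
df["new_age"] = result["new_age"]
print(df)
```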
I am writing a function to filter the rows that I want to use.
The sample DataFrame is as follows:
df = pd.DataFrame()
df['Xstart'] = [1,2.5,3,4,5]
df['Xend'] = [6,8,9,10,12]
df['Ystart'] = [0,1,2,3,4]
df['Yend'] = [6,8,9,10,12]
df['GW'] = [1,1,2,3,4]
def filter(data,Game_week):
    pass_data = data [(data['GW'] == Game_week)]
When I call the function filter as follows, I get an error:
df1 = filter(df,1)
The error message is:
AttributeError: 'NoneType' object has no attribute 'head'
But when I filter manually, it works:
pass_data = df [(df['GW'] == [1])]
This is my first issue.
My second issue is that I want to filter rows for multiple GW values (1, 2, 3, etc.).
I can do it manually as follows:
pass_data = df [(df['GW'] == [1])|(df['GW'] == [2])|(df['GW'] == [3])]
How can I write the function so that it accepts a list such as [1,2,3] as input, i.e. a range of 1 to 3?
Could anyone please advise?
Thanks,
Zep
Use isin to pass a list of values instead of a scalar. Also, filter is a built-in Python function, so it is better to give yours a different name:
def filter_vals(data, Game_week):
    return data[data['GW'].isin(Game_week)]
df1 = filter_vals(df, range(1, 4))
Because the function has no return statement, it returns None rather than the filtered DataFrame. Return the result instead (the parentheses inside data[...] are also unnecessary):
def filter(data, Game_week):
    return data[data['GW'] == Game_week]
For a list of game weeks, isin is the better choice:
def filter(data, Game_week):
    return data[data['GW'].isin(Game_week)]
For the first part, use return to pass the data back from the function. For the second, use isin:
def filter(data, Game_week):
    return data[data['GW'].isin(Game_week)]
Then apply the filter function:
df1 = filter(df,[1,2])
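To let the same function accept either a single game week or several (a list, tuple, or range), one option is to wrap scalars in a list before calling isin. A minimal sketch; the name filter_gw is hypothetical, and pd.api.types.is_scalar is used to detect single values.

```python
import pandas as pd

def filter_gw(data, game_week):
    """Return rows whose GW equals game_week (scalar) or is in game_week (list-like)."""
    if pd.api.types.is_scalar(game_week):
        game_week = [game_week]  # isin expects a list-like, so wrap single values
    return data[data["GW"].isin(game_week)]

df = pd.DataFrame({
    "Xstart": [1, 2.5, 3, 4, 5],
    "Xend": [6, 8, 9, 10, 12],
    "Ystart": [0, 1, 2, 3, 4],
    "Yend": [6, 8, 9, 10, 12],
    "GW": [1, 1, 2, 3, 4],
})

one_week = filter_gw(df, 1)
weeks_1_to_3 = filter_gw(df, range(1, 4))
print(weeks_1_to_3.head())
```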
I wrote this function to calculate a person's age from two columns in a pandas DataFrame. Unfortunately, if I use return, the function returns the same value for all rows, but if I use print, it gives me the right values.
Here is the code:
def calc_age(dataset):
    index = dataset.index
    for element in index:
        year_nasc = train['DT_NASCIMENTO_BENEFICIARIO'][element][6:]
        year_insc = train['ANO_CONCESSAO_BOLSA'][element]
        age = int(year_insc) - int(year_nasc)
        print ('Age: ', age)
        #return age
train['DT_NASCIMENTO_BENEFICIARIO'] = '03-02-1987'
train['ANO_CONCESSAO_BOLSA'] = 2009
What am I doing wrong?!
If you want to subtract the birth year in DT_NASCIMENTO_BENEFICIARIO from ANO_CONCESSAO_BOLSA, and df is your DataFrame:
import pandas as pd
# cast to datetime (format assumes dd-mm-yyyy dates such as 03-02-1987)
df["DT_NASCIMENTO_BENEFICIARIO"] = pd.to_datetime(df["DT_NASCIMENTO_BENEFICIARIO"], format="%d-%m-%Y")
df["age"] = df["ANO_CONCESSAO_BOLSA"].astype(int) - df["DT_NASCIMENTO_BENEFICIARIO"].dt.year
# print the result, or do something else with it:
print(df["age"])
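The reason return seems to give one value is that return exits the function on the first pass through the loop, so only the first row's age is ever produced. If the loop is kept, one sketch collects every age and returns them together as a Series aligned to the DataFrame's index. It assumes, as in the question, that the birth date is a dd-mm-yyyy string whose last four characters are the year, and it reads from the dataset argument rather than the global train. The sample rows are hypothetical.

```python
import pandas as pd

def calc_age(dataset):
    ages = []
    for element in dataset.index:
        # Characters from position 6 of "dd-mm-yyyy" are the birth year
        year_nasc = dataset.loc[element, "DT_NASCIMENTO_BENEFICIARIO"][6:]
        year_insc = dataset.loc[element, "ANO_CONCESSAO_BOLSA"]
        ages.append(int(year_insc) - int(year_nasc))
    # Return after the loop, once every row has been processed
    return pd.Series(ages, index=dataset.index)

# Hypothetical sample rows in the question's format
train = pd.DataFrame({
    "DT_NASCIMENTO_BENEFICIARIO": ["03-02-1987", "15-07-1990"],
    "ANO_CONCESSAO_BOLSA": [2009, 2015],
})
train["age"] = calc_age(train)
print(train)
```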
I am new to Python and I am trying to figure out the Python equivalent of the VBA Cells command.
I want to loop through all the rows in a DataFrame and replace values based on a condition. In VBA, the code would look something like this:
for i = 1 to 100
    if range.cells(i,1) <> "" then
        range.cells(i,1) = 100
    end if
next i
However, in Python I believe I need to use df.iloc, but I can't figure out how to iterate over the rows and replace a row's value based on a condition.
For example, in the DataFrame below I would like d['test'] to contain the string 'answer' every time the rightmost digit of d['val_1'] is 4.
import pandas as pd
d = {'val_1': range(0,100), 'val_2': 4}
d = pd.DataFrame(d)
d['sum'] = d['val_1'] + d['val_2']
d['test'] = ""
for i in range(0,len(d)):
    if right(d.iloc[i,'val_1'],1) == 4:
        d.iloc[i,'test'] = 'answer'
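In pandas the loop is usually unnecessary: the rightmost digit of a non-negative integer is its remainder modulo 10, so a boolean mask can select every matching row at once. The sketch below shows both forms. Python has no built-in right() function, and iloc accepts only integer positions, so the loop version uses d.at for label-based single-cell access and string slicing in place of VBA's Right.

```python
import pandas as pd

d = pd.DataFrame({"val_1": range(0, 100), "val_2": 4})
d["sum"] = d["val_1"] + d["val_2"]
d["test"] = ""

# Vectorized: val_1 % 10 is the rightmost digit of a non-negative integer
d.loc[d["val_1"] % 10 == 4, "test"] = "answer"

# Equivalent explicit loop, closer to the VBA version, on a fresh copy
d2 = d.assign(test="")
for i in d2.index:
    if str(d2.at[i, "val_1"])[-1] == "4":  # last character mimics VBA Right(x, 1)
        d2.at[i, "test"] = "answer"
```

The vectorized form is the idiomatic choice for large frames; the loop is shown only to map the VBA pattern onto pandas.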