How to apply multiple conditions in a dataframe - python

I have an issue applying a condition and then assigning a value to the rows that match it.
I'm new to this and just practicing with a dataset.
I need to give a bonus mark to the students who scored above 77 in 'math'.
I used an if condition to check whether the mark is above 77, and then tried to create a column holding the bonus mark for those students.
[Image: the rows for students who scored more than 77 in 'math_score']
Next, to give these students the bonus score, I looped through each student and wrote an if condition, which gives me this error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
[Image: the error raised when the condition is applied]
Please help me achieve this, and correct me where I'm wrong. Any suggestion would be helpful.

You can use np.where(), which picks one of two values for each row depending on the condition:
import numpy as np
total = df['math_score'] + df['reading_score'] + df['writing_score']
df['bonus'] = np.where(df['math_score'] > 77, total + 10, total)
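A self-contained sketch of this approach on a small made-up frame may help. The sample values, the three column names, and the intermediate `total` variable are assumptions for illustration, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are assumed, values are made up.
df = pd.DataFrame({
    "math_score": [72, 90, 78, 77],
    "reading_score": [72, 95, 80, 60],
    "writing_score": [74, 93, 79, 65],
})

# Sum the three scores once instead of repeating the expression.
total = df["math_score"] + df["reading_score"] + df["writing_score"]

# np.where evaluates the condition per row: add 10 only where math_score > 77.
df["bonus"] = np.where(df["math_score"] > 77, total + 10, total)

print(df)
```

Because the condition is applied to the whole column at once, no loop or `if` statement is needed, which is what avoids the "truth value is ambiguous" error.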

Related

Create Json File if Dataframe > Column > Criteria matches

I am struggling with the following:
Row1 Row2
A 10
B 10
C 10
D 11
F 12
I have a large dataset and want to create a JSON file depending on whether the value in Row2 matches. (Row2 has object dtype.)
if df['Row2'] == '10':
    df.to_json(filelocation)
else:
    df.to_json(diff_filelocation)
The error I receive is: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I used bool() and still got the same error message. When I tried any(), only the first file was created. I have checked multiple posts, but nothing seems to work.
I have also tried the following:
if df[df['Row2'] == '10']
or
if df.loc[(df.Row2=='10')]
but those aren't working either.
I am also confused as something like df[df["Row2"]] works, but not in an if statement.
Thanks in advance.
You need to split df into two parts using a boolean mask. Since Row2 has object dtype and holds strings, compare against the string '10':
m = df['Row2'].eq('10')
df[m].to_json(filelocation)
df[~m].to_json(diff_filelocation)
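As a minimal sketch of the mask-based split, the example below rebuilds the question's table and returns the JSON as strings rather than writing files. Storing Row2 as strings and using `orient="records"` are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question; Row2 holds strings (object dtype).
df = pd.DataFrame({
    "Row1": ["A", "B", "C", "D", "F"],
    "Row2": ["10", "10", "10", "11", "12"],
})

# Boolean mask: True for rows where Row2 equals the string '10'.
m = df["Row2"].eq("10")

# With no path argument, to_json returns the JSON text instead of writing a file.
matching_json = df[m].to_json(orient="records")
other_json = df[~m].to_json(orient="records")

print(matching_json)
print(other_json)
```

Passing a file path as the first argument of `to_json` writes each part to disk instead, as in the answer.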

Python - Panel data create indicator with if statement

I am trying to create an indicator equal to 1 if my meeting_date variable matches my date variable, and zero otherwise. I am getting an error in my code that consists of the following:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please let me know what I am doing wrong! Here is my code:
if crsp_12['meeting_date'] == crsp_12['date']:
    crsp_12['i_meeting_date_dayof'] == 1
else:
    crsp_12['i_meeting_date_dayof'] == 0
With pandas, avoid classical if/for constructs wherever a vectorized expression exists. (Note also that == inside your branches compares rather than assigns.) Use vectorized code:
crsp_12['i_meeting_date_dayof'] = crsp_12['meeting_date'].eq(crsp_12['date']).astype(int)
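To see the result on concrete rows, here is a small sketch with made-up dates. Only the column names come from the question; the values are hypothetical.

```python
import pandas as pd

# Hypothetical panel rows; the column names follow the question.
crsp_12 = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-28", "2020-01-29", "2020-01-30"]),
    "meeting_date": pd.to_datetime(["2020-01-29", "2020-01-29", "2020-01-29"]),
})

# Element-wise comparison gives a boolean Series; astype(int) maps True/False to 1/0.
crsp_12["i_meeting_date_dayof"] = (
    crsp_12["meeting_date"].eq(crsp_12["date"]).astype(int)
)

print(crsp_12)
```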

The truth value of a Series is ambiguous when I use if-else in Python

I am doing a conditional calculation within a data frame, using the following syntax:
if(df_merged1["region_id"]=="EMEA"):
    df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]-df_merged1["TX_f"]
else:
    df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]
I want the tax to be subtracted only when the region is EMEA, but I get the following error:
ValueError: The truth value of a {type(self).__name__} is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think there is some problem in the if condition, but I have no idea how to resolve it.
There is no problem with the comparison itself - df_merged1["region_id"] == "EMEA" returns a pd.Series instance populated with boolean values, not a single boolean that an if statement can handle. Pandas will not guess which reduction you meant, such as pd.Series.any() or pd.Series.all(), so it raises the error instead.
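The behaviour can be reproduced in a few lines. This is a sketch on made-up data: only the `region_id` column name comes from the question, and the `raised` flag is a helper added for illustration.

```python
import pandas as pd

# Hypothetical data; only the region_id column name comes from the question.
df_merged1 = pd.DataFrame({"region_id": ["EMEA", "APAC", "EMEA"]})

# The comparison is evaluated per row and yields a boolean Series.
is_emea = df_merged1["region_id"] == "EMEA"

# Using that Series directly in `if` raises the ValueError from the question.
try:
    if is_emea:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# An explicit reduction collapses the Series to a single truth value.
print(is_emea.any())  # is at least one row EMEA?
print(is_emea.all())  # is every row EMEA?
```

Neither reduction is what the question needs, though: the goal is a different result per row, not one decision for the whole frame.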
To achieve what you meant to do on reasonably sized dataframes, use pd.DataFrame.apply with axis=1, a lambda expression and a conditional expression. That way you populate the column "fcst_gr" based on the value in column "region_id" for each individual row:
df_merged1["fcst_gr"] = df_merged1.apply(
    lambda row: row["plan_price_amount"] * (row["Enr"] - row["FM_f"])
    + row["OA_f"]
    - row["TX_f"]
    if row["region_id"] == "EMEA"
    else row["plan_price_amount"] * (row["Enr"] - row["FM_f"]) + row["OA_f"],
    axis=1,
)
For bigger dataframes or more complex scenarios, consider a vectorized solution (for example np.where or boolean masks), since apply with axis=1 calls the lambda once per row.
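One possible vectorized variant is sketched below with np.where. It is an alternative to the apply call, not the answerer's code; the sample values and the `base` and `is_emea` names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the column names follow the question.
df_merged1 = pd.DataFrame({
    "region_id": ["EMEA", "APAC"],
    "plan_price_amount": [10.0, 10.0],
    "Enr": [5.0, 5.0],
    "FM_f": [1.0, 1.0],
    "OA_f": [2.0, 2.0],
    "TX_f": [3.0, 3.0],
})

# Part of the formula shared by every region.
base = (
    df_merged1["plan_price_amount"] * (df_merged1["Enr"] - df_merged1["FM_f"])
    + df_merged1["OA_f"]
)

# Subtract tax only on EMEA rows; other rows keep the base value.
is_emea = df_merged1["region_id"].eq("EMEA")
df_merged1["fcst_gr"] = np.where(is_emea, base - df_merged1["TX_f"], base)

print(df_merged1[["region_id", "fcst_gr"]])
```

Computing the shared part once keeps the two branches from drifting apart, and the whole column is filled in a single pass.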

Comparison within a Dataframe

I have a list of data from a CSV file (the dataset was attached; it is not reproduced here).
I wish to find all members whose values lie within an interval, e.g. all warriors whose powerlevels lie between 675000 and 750000.
In the code below, the operators 'and', 'or', '&' and '|' are not working and return a ValueError.
strong = df[['name', 'attack', 'defense', 'HP','armour','powerlevel']][df.powerlevel > 675000 & df.powerlevel < 750000]
print(strong)
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I get around this issue without creating a different dataframe each time?
You can use loc. Wrap each comparison in parentheses, because & binds more tightly than > and <:
strong = df.loc[(df.powerlevel > 675000) & (df.powerlevel < 750000)]
strong = strong[['name', 'attack', 'defense', 'HP','armour','powerlevel']]
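A runnable sketch of the interval filter on made-up rows follows. The names and values are hypothetical, and the `between` variant is an alternative not mentioned in the answer; its string `inclusive` argument assumes pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical rows; the column names follow the question.
df = pd.DataFrame({
    "name": ["Ares", "Bia", "Cato"],
    "powerlevel": [600000, 700000, 800000],
})

# Parenthesize each comparison so it is evaluated before &.
mask = (df["powerlevel"] > 675000) & (df["powerlevel"] < 750000)

# loc can filter rows and select columns in one step.
strong = df.loc[mask, ["name", "powerlevel"]]

# Series.between expresses the same open interval (both ends excluded).
also_strong = df.loc[df["powerlevel"].between(675000, 750000, inclusive="neither")]

print(strong)
```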

How to loop through a pandas array?

I read several questions about for loops over a pandas DataFrame, but couldn't work it out for my case:
px=pd.read_sql()
for i, row in px.iterrows():
    if x == 1 :
        if px['first'] <= S1 :
            S1 = px['first'];
        if px['second'] > S2 :
            previous_value = last_value;
            last_value = px['second'];
            x = 0;
    else :
        .....
This is, of course, only part of the code, to show the looping logic. I expected the rows to be read one by one, so that I could compare the values of each row with the previous row, but I get:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You're accessing the entire column px['first'] from inside the loop that's intended to access only one entry at a time.
To fix your current loop, it might be enough just to change px['first'] to row['first'], and also px['second'] to row['second'].
It would be better to replace this manual looping with equivalent pandas expressions, which will be much faster and more readable. If you post the full code (edit it into the question, not as comments!), we might be able to help.
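Since the full logic is not shown, the sketch below covers only the `first` comparison, which tracks a running minimum. The sample data and the starting value of `S1` are assumptions, and `cummin` is one possible vectorized replacement for that part, not a translation of the whole loop.

```python
import pandas as pd

# Hypothetical data standing in for the pd.read_sql() result.
px = pd.DataFrame({"first": [5, 3, 4, 2], "second": [1, 6, 2, 7]})

# Row-wise version: inside the loop, read scalars from `row`, not columns from `px`.
S1 = float("inf")  # assumed starting value
for i, row in px.iterrows():
    if row["first"] <= S1:  # scalar comparison, so `if` works
        S1 = row["first"]

# Vectorized running minimum of the same column, computed without a loop.
running_min = px["first"].cummin()

print(S1)
print(running_min.tolist())
```

The last element of `running_min` matches the final `S1`, and the intermediate values give the minimum seen up to each row.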
