Could someone help me with the proper format of a Python ternary operation on a vector? I have two temperature dataframes: df_today and df_yesterday. I am trying to calculate a new column for df_today that indicates whether the temperature is warmer than yesterday:
df["warmer_than_yesterday"] = 'yes, warmer' if df["temp_celsius"] > df_yesterday["temp_celsius"] and df["temp_celsius"] > 10 else 'nah, not warmer'
However, I keep getting the error ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Does anyone know what I might be doing wrong?
Thanks in advance!
First, you can combine your two if conditions into one using np.maximum, which is more concise and should also be more performant.
m = df["temp_celsius"] > np.maximum(10, df_yesterday["temp_celsius"])
Now, pass this mask to np.where,
df["warmer_than_yesterday"] = np.where(m, 'yes', 'no')
Or, use loc to set only the matching rows:
df["warmer_than_yesterday"] = 'no'
df.loc[m, "warmer_than_yesterday"] = 'yes'
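As a minimal sketch of the two approaches above, here is a self-contained run on made-up data. The temperature values are invented for illustration, the extra column name warmer_alt is hypothetical, and the frame and column names follow the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"temp_celsius": [12.0, 8.0, 15.0, 11.0]})
df_yesterday = pd.DataFrame({"temp_celsius": [11.0, 7.0, 16.0, 9.0]})

# Element-wise: today must exceed both yesterday's value and 10.
m = df["temp_celsius"] > np.maximum(10, df_yesterday["temp_celsius"])

# Option 1: build the whole column at once.
df["warmer_than_yesterday"] = np.where(m, "yes", "no")

# Option 2: set a default, then overwrite the rows selected by the mask.
# "warmer_alt" is a hypothetical column name used only for comparison.
df["warmer_alt"] = "no"
df.loc[m, "warmer_alt"] = "yes"

print(df)
```

The comparison assumes both frames share the same index, since pandas matches Series rows by index label.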
I am trying to create an indicator equal to 1 if my meeting_date variable matches my date variable, and zero otherwise. I am getting an error in my code that consists of the following:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please let me know what I am doing wrong! Here is my code:
if crsp_12['meeting_date'] == crsp_12['date']:
    crsp_12['i_meeting_date_dayof'] == 1
else:
    crsp_12['i_meeting_date_dayof'] == 0
You should always avoid classical if/for constructs with pandas. Use vectorized code:
crsp_12['i_meeting_date_dayof'] = crsp_12['meeting_date'].eq(crsp_12['date']).astype(int)
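To see what that one-liner produces, here is a small sketch on invented dates (the sample values are assumptions; the column names follow the question):

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
crsp_12 = pd.DataFrame({
    "meeting_date": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "date": pd.to_datetime(["2020-01-02", "2020-01-04", "2020-01-06"]),
})

# eq() compares row by row and returns a boolean Series;
# astype(int) turns True/False into 1/0.
crsp_12["i_meeting_date_dayof"] = (
    crsp_12["meeting_date"].eq(crsp_12["date"]).astype(int)
)

print(crsp_12["i_meeting_date_dayof"].tolist())
```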
I am doing a conditional multiplication within a dataframe using the following syntax:
if(df_merged1["region_id"]=="EMEA"):
    df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]-df_merged1["TX_f"]
else:
    df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]
I want tax to be subtracted only when the region is EMEA, but I am getting the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think there is a problem with how I wrote the if condition, but I have no idea how to resolve it.
There is no problem here - df_merged1["region_id"] == "EMEA" returns a pd.Series instance populated with boolean values, not a boolean that can be handled using conditional statements. Pandas is reluctant to automatically run a method that would convert a pd.Series instance to a boolean like pd.Series.any() or pd.Series.all(), hence the error.
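The behaviour described above can be reproduced with a short sketch. The region values below are invented for illustration; only the "EMEA" comparison comes from the question:

```python
import pandas as pd

# Hypothetical region values for illustration.
region = pd.Series(["EMEA", "APAC", "EMEA"])
mask = region == "EMEA"  # a boolean Series, one value per row

try:
    if mask:  # pandas refuses to pick a single truth value
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# Explicit reductions state which question is being asked.
print(mask.any())  # is at least one row EMEA?
print(mask.all())  # are all rows EMEA?
```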
To achieve what you meant to do for reasonably sized dataframes, use pd.DataFrame.apply with axis=1, a lambda expression, and a ternary operator. That way you populate the column "fcst_gr" based on the value in the column "region_id" for each individual row:
df_merged1["fcst_gr"] = df_merged1.apply(
    lambda row: row["plan_price_amount"] * (row["Enr"] - row["FM_f"])
    + row["OA_f"]
    - row["TX_f"]
    if row["region_id"] == "EMEA"
    else row["plan_price_amount"] * (row["Enr"] - row["FM_f"]) + row["OA_f"],
    axis=1,
)
For bigger dataframes or more complex scenarios, consider more efficient vectorized solutions, such as np.where or boolean-mask assignment with loc.
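One possible vectorized alternative, as a sketch rather than a definitive implementation: compute the shared part of the formula once and let np.where decide where tax is subtracted. The sample values and the helper names base and is_emea are assumptions; the column names follow the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df_merged1 = pd.DataFrame({
    "region_id": ["EMEA", "APAC"],
    "plan_price_amount": [10.0, 10.0],
    "Enr": [5.0, 5.0],
    "FM_f": [1.0, 1.0],
    "OA_f": [2.0, 2.0],
    "TX_f": [3.0, 3.0],
})

# The part of the formula shared by all regions.
base = (
    df_merged1["plan_price_amount"] * (df_merged1["Enr"] - df_merged1["FM_f"])
    + df_merged1["OA_f"]
)

# Subtract tax only where the region is EMEA.
is_emea = df_merged1["region_id"] == "EMEA"
df_merged1["fcst_gr"] = np.where(is_emea, base - df_merged1["TX_f"], base)

print(df_merged1[["region_id", "fcst_gr"]])
```

This avoids calling a Python function once per row, which is where apply with axis=1 spends most of its time.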
while (i< len(df)):
    if (df['ID'][i] == df['ID'][i+1]) & (df['Week_start'] == df['Week_end']):
        if (df['ship'][i] > df['ship'][i+1] ):
            df['radar'][i] =df['radar'][i+1] + df['parked'][i] - df['parked'][i+1]
        else:
            df['radar'][i] =df['radar'][i+1]
    else:
        df['radar'][i] = df['ship'][i]
    i = i+1
I tried to get this code running but I keep on getting an error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What do you recommend? Essentially I want to fill the radar column based on conditions. I think everything but that part works.
You are getting the error in this line:
df['Week_start'] == df['Week_end']
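That expression compares two whole columns, so the result is a Series rather than a single True or False. Specify an index to compare single values, like: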
df['Week_start'][i]== df['Week_end'][i+1]
Hope this will help!
I have a list of data from a CSV file like this:
I wish to find a list of all members whose values lie within an interval. For example, from the attached dataset, I want to find all warriors whose power levels lie between 675000 and 750000.
In the following code, the operators 'and', 'or', '&', '|' are not working and return a ValueError.
strong = df[['name', 'attack', 'defense', 'HP','armour','powerlevel']][df.powerlevel > 675000 & df.powerlevel < 750000]
print(strong)
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I get around this issue without creating a different dataframe each time?
You can use loc. Wrap each comparison in parentheses, because & binds more tightly than > and <:
strong = df.loc[(df.powerlevel > 675000) & (df.powerlevel < 750000)]
strong = strong[['name', 'attack', 'defense', 'HP','armour','powerlevel']]
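The same selection can also be written in one step, since loc accepts a row mask and a column list together. A minimal sketch on invented data (the sample rows and the name in_range are assumptions, and only two of the question's columns are used to keep it short):

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "powerlevel": [600000, 700000, 800000],
})

# Without parentheses, 675000 & df.powerlevel would be evaluated first,
# leaving a chained comparison between Series that raises the ValueError.
in_range = (df.powerlevel > 675000) & (df.powerlevel < 750000)

# Row mask and column list in a single loc call.
strong = df.loc[in_range, ["name", "powerlevel"]]
print(strong)
```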
I have a dataset with two timestamp columns, one for the start time and one for the end time. I have calculated the difference and stored it in another column of the dataset. Based on the difference column, I want to fill in a value in another column. I am using a for loop with if/else for this, but upon execution the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" appears.
Time_df = pd.read_excel('filepath')
print(Time_df.head(20))
for index, rows in Time_df.head().iterrows():
    if(Time_df["Total Time"] < 6.00 ):
        Time_df["Code"] = 1
print(Time_df.head(20))
Wherever a value less than 6 is encountered in Total Time, it should put 1 in the Code column. However, I get the error stated in the question.
Try with np.where():
df["Code"]= np.where(df["Total Time"]<6.00,1,df["Code"])
Explanation:
#np.where(condition, choice if condition is met, choice if condition is not met)
#returns an array explained above
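A small sketch of that call on invented data (the sample values are assumptions; the column names follow the question, and the Code column is assumed to exist already, as the np.where line above requires):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({"Total Time": [2.5, 7.0, 5.9], "Code": [0, 0, 0]})

# Where the condition holds, take 1; elsewhere keep the existing Code value.
df["Code"] = np.where(df["Total Time"] < 6.00, 1, df["Code"])

print(df["Code"].tolist())
```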
To fix your code:
print(Time_df.head(20))
for index, rows in Time_df.head().iterrows():
    if(rows["Total Time"] < 6.00 ):
        Time_df.loc[index,"Code"] = 1
print(Time_df.head(20))
This happens to me a lot. In if (Time_df["Total Time"] < 6.00), the expression Time_df["Total Time"] < 6.00 is a Series, and Python does not know how to evaluate a Series as a Boolean. It depends on what you want, but most likely you want to do:
Time_df.loc[Time_df["Total Time"] < 6.00, "Code"] = 1
which puts 1 in column "Code" wherever "Total Time" is < 6.
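def myfn(row):
    # Return 1 when "Total Time" is below 6; other rows get no value (missing).
    if row['Total Time'] < 6:
        return 1
# Apply the function row by row (axis=1).
Time_df['Code'] = Time_df.apply(myfn, axis=1)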