How to loop through a pandas array? - python

I have read several questions about looping over a pandas DataFrame, but I couldn't work it out for my case.
px = pd.read_sql()
for i, row in px.iterrows():
    if x == 1:
        if px['first'] <= S1:
            S1 = px['first']
        if px['second'] > S2:
            previous_value = last_value
            last_value = px['second']
            x = 0
    else:
        .....
This is, of course, only part of the code, to show the looping logic. I expected the rows to be read one by one so that I could compare the values of each row with the previous row, but I get:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

You're accessing the entire column px['first'] from inside the loop that's intended to access only one entry at a time.
To fix your current loop, it might be enough just to change px['first'] to row['first'], and also px['second'] to row['second'].
Better would be to replace this manual looping with equivalent pandas expressions, which will be much faster and more readable. If you post the full code (edit it into the question, not as comments!), we might be able to help.
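As a rough illustration of the fix described above, here is a minimal sketch. The sample data and the starting value of S1 are assumptions made up for the example, since the question does not show them; only the switch from px[...] to row[...] comes from the answer.

```python
import pandas as pd

# Hypothetical stand-in for the result of pd.read_sql(...)
px = pd.DataFrame({"first": [5, 3, 4], "second": [1, 7, 2]})

S1 = float("inf")  # assumed starting value for the running minimum

for i, row in px.iterrows():
    # row["first"] is a single value, so the comparison yields one boolean
    if row["first"] <= S1:
        S1 = row["first"]

# For this particular piece of logic, a column method gives the same result
assert S1 == px["first"].min()
```

Inside the loop, px["first"] would be the whole column and the if statement would raise the ValueError again; row["first"] is just the entry for the current row.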

Related

Create Json File if Dataframe > Column > Criteria matches

I am struggling with the following:
Row1 Row2
A 10
B 10
C 10
D 11
F 12
I have a large dataset and want to create a JSON file depending on whether the value in Row2 meets a criterion. (Row2 has object dtype.)
if df['Row2'] == '10':
    df.to_json(filelocation)
else:
    df.to_json(diff_filelocation)
The error I receive is: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). I used bool() and still got the same error message. When I tried any(), only the first file got created. I have checked multiple posts, but nothing seems to work.
I have tried the following method as well
if df[df['Row2'] == '10']
or
if df.loc[(df.Row2=='10')]
but those aren't working either.
I am also confused as something like df[df["Row2"]] works, but not in an if statement.
Thanks in advance.
You need to split df into two segments based on a boolean mask. Since Row2 holds strings, compare against '10':
m = df['Row2'].eq('10')
df[m].to_json(filelocation)
df[~m].to_json(diff_filelocation)
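A small self-contained sketch of the mask-based split, using the sample table from the question. It assumes Row2 holds strings (the question says object dtype), and it calls to_json without a path so that the JSON comes back as a string instead of being written to a file; orient="records" is chosen only to make the output easy to inspect.

```python
import json

import pandas as pd

df = pd.DataFrame(
    {"Row1": list("ABCDF"), "Row2": ["10", "10", "10", "11", "12"]}
)

# Boolean Series with one True/False entry per row
m = df["Row2"].eq("10")

# Without a path argument, to_json returns the JSON text
matching = df[m].to_json(orient="records")
others = df[~m].to_json(orient="records")

print(len(json.loads(matching)), len(json.loads(others)))
```

The mask is never used in an if statement; it selects rows, which is why no ambiguity error arises.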

The truth value of a Series is ambiguous when I used if-else in Python

I am doing a conditional multiplication within a data frame, using the following syntax:
if(df_merged1["region_id"]=="EMEA"):
    df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]-df_merged1["TX_f"]
else:
    df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]
I want the tax to be subtracted only when the region is EMEA, but I am getting the following error:
ValueError: The truth value of a {type(self).__name__} is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think there is some problem in how I wrote the if condition, but I have no idea how to resolve it.
There is no problem here - df_merged1["region_id"] == "EMEA" returns a pd.Series instance populated with boolean values, not a boolean that can be handled using conditional statements. Pandas is reluctant to automatically run a method that would convert a pd.Series instance to a boolean like pd.Series.any() or pd.Series.all(), hence the error.
To achieve what you have meant to do for reasonably sized dataframes, use pd.DataFrame.apply, axis=1 with a lambda expression and a ternary operator. That way you populate a column ["fcst_gr"] based on value in column ["region_id"] for each individual row:
df_merged1["fcst_gr"] = df_merged1.apply(
    lambda row: row["plan_price_amount"] * (row["Enr"] - row["FM_f"])
    + row["OA_f"]
    - row["TX_f"]
    if row["region_id"] == "EMEA"
    else row["plan_price_amount"] * (row["Enr"] - row["FM_f"]) + row["OA_f"],
    axis=1,
)
For bigger dataframes or more complex scenarios, consider vectorized solutions such as np.where or boolean-mask assignment, which avoid calling a Python function once per row.
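One possible vectorized variant, shown as a sketch: the column names are taken from the question, but the sample values are invented for illustration. The shared part of the formula is computed once, and np.where picks the tax only for EMEA rows.

```python
import numpy as np
import pandas as pd

# Made-up sample data using the column names from the question
df_merged1 = pd.DataFrame(
    {
        "region_id": ["EMEA", "APAC"],
        "plan_price_amount": [10.0, 10.0],
        "Enr": [5.0, 5.0],
        "FM_f": [1.0, 1.0],
        "OA_f": [2.0, 2.0],
        "TX_f": [3.0, 3.0],
    }
)

# Part of the formula that is identical in both branches
base = (
    df_merged1["plan_price_amount"] * (df_merged1["Enr"] - df_merged1["FM_f"])
    + df_merged1["OA_f"]
)

# Tax to subtract: TX_f where the region is EMEA, 0 elsewhere
tax = np.where(df_merged1["region_id"] == "EMEA", df_merged1["TX_f"], 0.0)

df_merged1["fcst_gr"] = base - tax
print(df_merged1[["region_id", "fcst_gr"]])
```

This works on whole columns at once, so the condition is evaluated per row without any if statement.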

How to apply a function with several dataframe columns as arguments?

I'm trying to compute a new column in a pandas dataframe, based on other columns and a function I created. Instead of using a for loop, I would prefer to apply the function to entire dataframe columns.
My code is like this :
df['po'] = vect.func1(df['gra'],
Se,
df['p_a'],
df['t'],
Tc)
where df['gra'], df['p_a'], and df['t'] are my dataframe columns (parameters), and Se and Tc are other (real-valued) parameters. df['po'] is my new column.
func1 is a function described in my vect package.
This function is :
def func1(g, surf_e, Pa, t, Tco):
    if (t <= Tco):
        pos = (g-(Pa*surf_e*g))
    else:
        pos = 0.0
    return(pos)
When implemented this way, I get an error message concerning the line: if (t <= Tco):
The error is :
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I read the pandas documentation, but didn't find the solution. Can anybody explain what the problem is?
I tried to use apply :
for example :
df['po'] = df['gra'].apply(vect.func1)
but I don't know how to use apply with multiple columns as parameters.
Thank you in advance.
Use np.where with the required condition, the value to use where the condition is True, and the default value.
df['po'] = np.where(
    df['t'] <= Tc,                             # Condition
    df['gra'] - (df['p_a'] * Se * df['gra']),  # Value if True
    0                                          # Value if False
)
EDIT:
Don't forget to import numpy as np.
Also, you get the error because comparing a Series to a value produces a Series of boolean values, not the single boolean that an if condition needs.
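To make the comparison concrete, here is a sketch that runs the np.where approach next to a row-wise apply using the original scalar function. The sample values for the columns, Se, and Tc are invented; func1 is copied from the question, and the lambda is one possible way (not the only one) to pass several columns to it.

```python
import numpy as np
import pandas as pd


def func1(g, surf_e, Pa, t, Tco):
    """Scalar function from the question: works on single values only."""
    if t <= Tco:
        return g - (Pa * surf_e * g)
    return 0.0


# Made-up sample data and parameters
df = pd.DataFrame({"gra": [10.0, 20.0], "p_a": [0.5, 0.5], "t": [1.0, 9.0]})
Se, Tc = 0.2, 5.0

# Vectorized: the condition and both outcomes are evaluated on whole columns
df["po"] = np.where(df["t"] <= Tc, df["gra"] - df["p_a"] * Se * df["gra"], 0.0)

# Row-wise: apply with axis=1 hands each row to the scalar function
df["po_rowwise"] = df.apply(
    lambda row: func1(row["gra"], Se, row["p_a"], row["t"], Tc), axis=1
)

print(df)
```

Both columns should agree; the vectorized form is usually the faster of the two on large frames because it avoids one Python call per row.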

How to fix "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" in Python Pandas? [duplicate]

I have a dataset with two timestamp columns, one for the start time and one for the end time. I have calculated the difference and stored it in another column of the dataset. Based on that difference column, I want to fill in a value in another column. I am using a for loop with if/else for this, but on execution the error "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" appears:
Time_df = pd.read_excel('filepath')
print(Time_df.head(20))
for index, rows in Time_df.head().iterrows():
    if(Time_df["Total Time"] < 6.00 ):
        Time_df["Code"] = 1
print(Time_df.head(20))
In Total Downtime, wherever a value less than 6 is encountered, it should put 1 in the column Code. However, I get the error stated in the question.
Try with np.where():
Time_df["Code"] = np.where(Time_df["Total Time"] < 6.00, 1, Time_df["Code"])
Explanation:
# np.where(condition, value where the condition is met, value where it is not met)
# returns an array built element by element from those two choices
To fix your code:
print(Time_df.head(20))
for index, rows in Time_df.head().iterrows():
    if(rows["Total Time"] < 6.00 ):
        Time_df.loc[index,"Code"] = 1
print(Time_df.head(20))
This happens to me a lot. In if (Time_df["Total Time"] < 6.00 ), (Time_df["Total Time"] < 6.00 ) is a series and Python does not know how to evaluate the series as a Boolean. Depending on what you want, but most likely you want to do:
Time_df.loc[Time_df["Total Time"] < 6.00, "Code"] = 1
which puts 1 in column "Code" wherever "Total Time" is < 6.
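A minimal runnable sketch of the boolean-mask assignment. The sample values and the default of 0 for "Code" are assumptions for illustration; the question does not say what the column should hold elsewhere.

```python
import pandas as pd

# Made-up sample data; "Code" starts at an assumed default of 0
Time_df = pd.DataFrame({"Total Time": [2.5, 7.0, 5.9, 6.0]})
Time_df["Code"] = 0

# The comparison builds a boolean mask; .loc assigns only where it is True
Time_df.loc[Time_df["Total Time"] < 6.00, "Code"] = 1

print(Time_df)
```

Rows where "Total Time" is exactly 6.0 are left unchanged, because the comparison is strictly less than.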
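Another option is to apply a row-wise function:
def myfn(row):
    # Return 1 for rows under the threshold; other rows get no value (missing)
    if row['Total Time'] < 6:
        return 1
Time_df['Code'] = Time_df.apply(myfn, axis=1)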

Python-pandas: the truth value of a series is ambiguous

I am currently trying to compare values from a JSON file (which I can already work with) to values from a CSV file (which might be the issue). My current code looks like this:
for data in trades['timestamp']:
    data = pd.to_datetime(data)
    print(data)
    if data == ask_minute['lastUpdated']:
        #....'do something'
Which gives:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
My current print(data) looks like this:
2018-10-03 18:03:38.067000
2018-10-03 18:03:38.109000
2018-10-03 18:04:28
2018-10-03 18:04:28.685000
However, I am still unable to compare these timestamps from my CSV file to those of my Json file. Does someone have an idea?
Let's reduce it to a simpler example. By doing for instance the following comparison:
3 == pd.Series([3,2,4,1])
0 True
1 False
2 False
3 False
dtype: bool
The result you get is a Series of booleans, equal in size to the pd.Series in the right hand side of the expression. So really what's happening here is that the integer is being broadcast across the series, and then they are compared. So when you do:
if 3 == pd.Series([3,2,4,1]):
    pass
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You get an error. The problem here is that you are comparing a pd.Series with a value, so the result can contain both True and False values, as in the case above. This of course is ambiguous, since the condition as a whole is neither True nor False.
So you need to further aggregate the result so that a single boolean value results from the operation. For that you'll have to use either any or all depending on whether you want at least one (any) or all values to satisfy the condition.
(3 == pd.Series([3,2,4,1])).all()
# False
or
(3 == pd.Series([3,2,4,1])).any()
# True
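Applied to the timestamps in the question, the aggregation could look like the sketch below. The contents of trades and ask_minute are invented for the example, and .any() is used on the assumption that the goal is to find trade timestamps that appear anywhere in lastUpdated.

```python
import pandas as pd

# Made-up stand-ins for the JSON and CSV data in the question
trades = pd.DataFrame(
    {"timestamp": ["2018-10-03 18:03:38.067000", "2018-10-03 18:04:28.000000"]}
)
ask_minute = pd.DataFrame(
    {"lastUpdated": pd.to_datetime(["2018-10-03 18:04:28", "2018-10-03 18:05:00"])}
)

matches = []
for data in trades["timestamp"]:
    data = pd.to_datetime(data)
    # The comparison yields a boolean Series; .any() reduces it to one boolean
    if (data == ask_minute["lastUpdated"]).any():
        matches.append(data)

print(matches)
```

Use .all() instead if the condition should hold for every entry of lastUpdated.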
The problem I see is that even if a Series or DataFrame holds only one row, pandas knows it could hold many, so it does not assume you mean that single row. You have to select it explicitly. The way I solved it was like this:
if data == ask_minute['lastUpdated'].iloc[0]:
Then the comparison is between two single values and the result is a plain boolean.
