Python-pandas: the truth value of a series is ambiguous - python

I am currently trying to compare values from a json file(on which I can already work on) to values from a csv file(which might be the issue). My current code looks like this:
for data in trades['timestamp']:
data = pd.to_datetime(data)
print(data)
if data == ask_minute['lastUpdated']:
#....'do something'
Which gives:
":The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
My current print(data) looks like this:
2018-10-03 18:03:38.067000
2018-10-03 18:03:38.109000
2018-10-03 18:04:28
2018-10-03 18:04:28.685000
However, I am still unable to compare these timestamps from my CSV file to those of my Json file. Does someone have an idea?

Let's reduce it to a simpler example. By doing for instance the following comparison:
3 == pd.Series([3,2,4,1])
0 True
1 False
2 False
3 False
dtype: bool
The result you get is a Series of booleans, equal in size to the pd.Series in the right hand side of the expression. So really what's happening here is that the integer is being broadcast across the series, and then they are compared. So when you do:
if 3 == pd.Series([3,2,4,1]):
pass
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You get an error. The problem here is that you are comparing a pd.Series with a value, so you'll have multiple True and multiple False values, as in the case above. This of course is ambiguous, since the condition is neither True or False.
So you need to further aggregate the result so that a single boolean value results from the operation. For that you'll have to use either any or all depending on whether you want at least one (any) or all values to satisfy the condition.
(3 == pd.Series([3,2,4,1])).all()
# False
or
(3 == pd.Series([3,2,4,1])).any()
# True

The problem I see is that even if you are evaluating one row in a dataframe, the code knows that a dataframe has the ability to have many rows. The code doesn't just assume you want the only row that exists. You have to tell it explicitly. The way I solved it was like this:
if data.iloc[0] == ask_minute['lastUpdated']:
then the code knows you are selecting the one row that exists.

Related

Create Json File if Dataframe > Column > Criteria matches

I am struggling with the following:
Row1 Row2
A 10
B 10
C 10
D 11
F 12
I have a large data and want to create a json file if its meets Row2. (It's an Object dtype)
if df['Row2'] == '10':
df.to_json(filelocation)
else:
df.to_json(diff_filelocation)
The error is receive is: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all. I used bool and still get the same error message. When I tried any(), then only the first file gets created. I have checked multiple posts, but nothing seems to work.
I have tried the following method as well
if df[df['Row2'] == '10']
or
if df.loc[(df.Row2=='10')]
but those aren't working either.
I am also confused as something like df[df["Row2"]] works, but not in an if statement.
Thanks in advance.
You need to separate df on two different segments basing on boolean mask:
m = df['Row2'].eq(10)
d[m].to_json(filelocation)
d[~m].to_json(diff_filelocation)

The truth value of a is ambiguous when I used Iflese in Python

I am using conditional multiplication within data frame and using following syntax:
if(df_merged1["region_id"]=="EMEA"):
df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]-df_merged1["TX_f"]
else:
df_merged1["fcst_gr"] = df_merged1["plan_price_amount"]*(df_merged1["Enr"]-df_merged1["FM_f"])+df_merged1["OA_f"]
i want tax to be substracted only when region is EMEA. but getting following error
ValueError: The truth value of a {type(self).__name__} is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think there is some problem in proving the if condition but how to resolve it not getting any idea
There is no problem here - df_merged1["region_id"] == "EMEA" returns a pd.Series instance populated with boolean values, not a boolean that can be handled using conditional statements. Pandas is reluctant to automatically run a method that would convert a pd.Series instance to a boolean like pd.Series.any() or pd.Series.all(), hence the error.
To achieve what you have meant to do for reasonably sized dataframes, use pd.DataFrame.apply, axis=1 with a lambda expression and a ternary operator. That way you populate a column ["fcst_gr"] based on value in column ["region_id"] for each individual row:
df_merged1["fcst_gr"] = df_merged1.apply(
lambda row: row["plan_price_amount"] * (row["Enr"] - row["FM_f"])
+ row["OA_f"]
- row["TX_f"]
if row["region_id"] == "EMEA"
else row["plan_price_amount"] * (row["Enr"] - row["FM_f"]) + row["OA_f"],
axis=1,
)
For bigger dataframes or more complex scenarios, consider more efficient solutions.

How to include a string being equal to itself shifted as a coniditon in a function definition?

I'm defining a simple if xxxx return y - else return NaN function. If the record, ['Product'], equals ['Product'] offset by 8 then the if condition is true.
I've tried calling the record and setting it equal to itself offset by 8 using == and .shift(8). ['Product'] is a string and ['Sales'] is an integer.
def Growth (X):
if X['Product'] == X['Product'].shift(8):
return (1+ X['Sales'].shift(4)) / (1+ X['Sales'].shift(8) - 1)
else:
return 'NaN'
I expect the output to be NaN for the first 8 records, and then to have numbers at record 9, but I receive the error instead.
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Firstly a general comment from StackOverflow's Truth value of a Series is ambiguous...:
The or and and python statements require truth-values. For pandas these are considered ambiguous so you should use "bitwise" | (or) or & (and) operations.
Secondly, you use == on Series objects. For this Pandas tries to convert the first object to a truth value - and fails, because this is ambiguous.
use X['Product'].equals(X['Product'].shift(8))

How to loop through a pandas array?

I read several questions about for loop of pandas Dataframe, but couldn't work it out for my case
px=pd.read_sql()
for i, row in px.iterrows():
if x == 1 :
if px['first'] <= S1 :
S1 = px['first'];
if px['second'] > S2 :
previous_value = last_value;
last_value = px['second'];
x = 0;
else :
.....
This is, of course, part of the code to show the looping logic. I expected that the rows are read one by one as I can compare the values of each row with the previous row, but
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You're accessing the entire column px['first'] from inside the loop that's intended to access only one entry at a time.
To fix your current loop, it might be enough just to change px['first'] to row['first'], and also px['second'] to row['second'].
Better would be to replace this manual looping with equivalent pandas expressions, which will be much faster and readable. If you post the full code (edit into question, not as comments!), we might be able to help.

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()?

I'm noob with pandas, and recently, I got that 'ValueError' when I'm trying to modify the columns that follows some rules, as:
csv_input = pd.read_csv(fn, error_bad_lines=False)
if csv_input['ip.src'] == '192.168.1.100':
csv_input['flow_dir'] = 1
csv_input['ip.src'] = 1
csv_input['ip.dst'] = 0
else:
if csv_input['ip.dst'] == '192.168.1.100':
csv_input['flow_dir'] = 0
csv_input['ip.src'] = 0
csv_input['ip.dst'] = 1
I was searching about this error and I guess that it's because the 'if' statement and the '==' operator, but I don't know how to fix this.
Thanks!
So Andrew L's comment is correct, but I'm going to expand on it a bit for your benefit.
When you call, e.g.
csv_input['ip.dst'] == '192.168.1.100'
What this returns is a Series, with the same index as csv_input, but all the values in that series are boolean, and represent whether the value in csv_input['ip.dst'] for that row is equal to '192.168.1.100'.
So, when you call
if csv_input['ip.dst'] == '192.168.1.100':
You're asking whether that Series evaluates to True or False. Hopefully that explains what it meant by The truth value of a Series is ambiguous., it's a Series, it can't be boiled down to a boolean.
Now, what it looks like you're trying to do is set the values in the flow_dir,ip.src & ip.dst columns, based on the value in the ip.src column.
The correct way to do this is would be with .loc[], something like this:
#equivalent to first if statement
csv_input.loc[
csv_input['ip.src'] = '192.168.1.100',
('ip.src','ip.dst','flow_dir')
] = (1,0,1)
#equivalent to second if statement
csv_input.loc[
csv_input['ip.dst'] = '192.168.1.100',
('ip.src','ip.dst','flow_dir')
] = (0,1,0)

Categories

Resources