I want to write an if statement that performs a calculation when three conditions on other columns of a dataframe are all true. I tried the code below, which seems to have worked for others on Stack Overflow, but it raises an error for me. Note that the 'check', 'sqm' and 'sqft' columns are float64.
if ((merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)):
    merge['checksqm'] == merge['sqft']/10.7639
#Error below:
ValueError Traceback (most recent call last)
<ipython-input-383-e84717fde2c0> in <module>
----> 1 if ((merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)):
2 merge['checksqm'] == merge['sqft']/10.7639
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
1327
1328 def __nonzero__(self):
-> 1329 raise ValueError(
1330 f"The truth value of a {type(self).__name__} is ambiguous. "
1331 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Each condition in your code evaluates to a Series of boolean values, one per row. Combining the 3 conditions with & also yields a boolean Series. A Python if statement needs a single truth value; it cannot evaluate each element of a Pandas Series and run the following statement row by row. Hence the error ValueError: The truth value of a Series is ambiguous.
To solve the problem, use Pandas syntax, like the following:
mask = (merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)
merge.loc[mask, 'checksqm'] = merge['sqft']/10.7639
or, combine in one statement, as follows:
merge.loc[(merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0), 'checksqm'] = merge['sqft']/10.7639
This way, Pandas evaluates the boolean Series and assigns only to the rows where the combined 3 conditions are True, taking the corresponding values from those same rows. An ordinary Python statement such as if does not support this kind of vectorized operation under the hood.
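A minimal, self-contained sketch of the mask approach. The sample values are made up for illustration and are not the asker's data; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
merge = pd.DataFrame({
    "check": [1.0, 1.0, 0.0, 1.0],
    "sqft":  [1076.39, 0.0, 500.0, 215.278],
    "sqm":   [0.0, 0.0, 0.0, 20.0],
})

# Element-wise conditions combined with & give one boolean per row.
mask = (merge["check"] == 1) & (merge["sqft"] > 0) & (merge["sqm"] == 0)

# Assign only on rows where the mask is True; the other rows get NaN.
merge.loc[mask, "checksqm"] = merge["sqft"] / 10.7639

print(merge)
```

Only the first row satisfies all three conditions here, so it is the only row that receives a converted value.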
You are trying to use a pd.Series as the condition inside the if clause. This series is a mask of True/False values. To use it in an if, reduce it to a single bool with series.any() or series.all(); note that this tests the series as a whole rather than row by row.
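As a small illustration of what any() and all() return, here is a sketch on a hand-made mask (the values are an assumption chosen for the example):

```python
import pandas as pd

mask = pd.Series([True, False, True])

print(mask.any())  # True: at least one element is True
print(mask.all())  # False: not every element is True

# Each call returns a single boolean, so it can drive an if statement,
# but the branch then runs once for the whole Series, not once per row.
if mask.any():
    print("at least one row matches")
```

If the goal is a per-row calculation, a boolean mask with .loc is usually the better fit than reducing the Series to one value.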
Related
I am trying to detect the candlestick pattern called Doji. It requires checking two conditions.
values is a pandas dataframe with the historical stock data with columns Date, High, Low, Open and Close.
In the if condition I tried to explicitly make condition1 and condition2 bool, and also tried typecasting with any(). Neither gave the desired result. Printing condition1 and condition2 separately gives the appropriate boolean values, but combining them with '&' goes horribly wrong.
51315 True
51316 True
51317 True
51318 True
51319 True
...
53790 True
53791 True
53792 True
53793 True
53794 True
Length: 2480, dtype: bool
ValueError Traceback (most recent call last)
<ipython-input-58-3f42eed169f4> in <module>
4 values = pd.DataFrame(stocks_data.loc[stocks_data['Company']=='TCS'])
5 values.reset_index()
----> 6 study_candlesticks(values)
<ipython-input-57-fd67b4117699> in study_candlesticks(values)
21 # for row in values
22
---> 23 if calc_doji(values):
24 values['Pattern']='Doji'
25
<ipython-input-57-fd67b4117699> in calc_doji(values)
81 condition2=((values['High'] - values['Low'])>values['Close']*min_candle_size)
82 print(condition2)
---> 83 if ((condition1).bool()&(condition2).any()):
84 return True
85 else:
~\Anaconda3\lib\site-packages\pandas\core\generic.py in bool(self)
1581 )
1582
-> 1583 self.__nonzero__()
1584
1585 def __abs__(self):
~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1553 "The truth value of a {0} is ambiguous. "
1554 "Use a.empty, a.bool(), a.item(), a.any() or a.all().".format(
-> 1555 self.__class__.__name__
1556 )
1557 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I am not sure where I am going wrong. Any suggestions?
Below is the code
calc_doji(values)
def calc_doji(values):
    max_candle_size=0.1/100
    min_candle_size=1/100
    condition1 =(abs(values['Open'] - values['Close'])<values['Close']*max_candle_size)
    print(condition1)
    condition2=((values['High'] - values['Low'])>values['Close']*min_candle_size)
    print(condition2)
    if ((condition1).bool()&(condition2).any()):
        return True
    else:
        return False
If you have two pd.Series of dtype bool, you can combine them in the following way. Without knowing what your data looks like, I've created two pd.Series containing either True or False.
import pandas as pd
import numpy as np
condition1= pd.Series(np.random.choice([True, False], 100))
condition2= pd.Series(np.random.choice([True, False], 100))
Then you can compare by doing the following.
(condition1) & (condition2) # which returns a `pd.Series` where each row is either `True` or `False`.
To check whether both values are True at any index position:
((condition1) & (condition2)).any() # Which returns either `True` or `False`
From your code, I would guess this line is the issue.
if ((condition1).bool()&(condition2).any()):
which should be
if ((condition1) & (condition2)).any():
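Note that .any() answers "is there at least one Doji anywhere in the data". If the intent is to label each row separately (an assumption about the goal, not something the question confirms), a sketch using the combined mask directly could look like this. The prices are invented; the thresholds and column names are taken from the question.

```python
import pandas as pd

# Hypothetical prices; column names follow the question.
values = pd.DataFrame({
    "Open":  [100.0, 100.0, 100.0],
    "High":  [102.0, 100.2, 105.0],
    "Low":   [99.0, 99.9, 95.0],
    "Close": [100.05, 100.05, 103.0],
})

max_candle_size = 0.1 / 100
min_candle_size = 1 / 100

# Small body relative to the close price.
condition1 = (values["Open"] - values["Close"]).abs() < values["Close"] * max_candle_size
# High-low range large enough relative to the close price.
condition2 = (values["High"] - values["Low"]) > values["Close"] * min_candle_size

# One boolean per row; label only the rows where both conditions hold.
is_doji = condition1 & condition2
values.loc[is_doji, "Pattern"] = "Doji"

print(values)
```

Rows that do not match keep NaN in the Pattern column, so other patterns could be filled in later with further masks.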
while (i< len(df)):
    if (df['ID'][i] == df['ID'][i+1]) & (df['Week_start'] == df['Week_end']):
        if (df['ship'][i] > df['ship'][i+1] ):
            df['radar'][i] =df['radar'][i+1] + df['parked'][i] - df['parked'][i+1]
        else:
            df['radar'][i] =df['radar'][i+1]
    else:
        df['radar'][i] = df['ship'][i]
    i = i+1
I tried to get this code running but I keep on getting an error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What do you recommend? Essentially I want to fill the column radar based on conditions. I think everything except that part works.
You are getting the error in this line:
df['Week_start'] == df['Week_end']
Specify an index, for example:
df['Week_start'][i]== df['Week_end'][i+1]
Hope this will help!
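To make the difference concrete, here is a sketch on invented data showing that comparing whole columns gives a Series while comparing indexed elements gives one boolean. Which rows should be compared depends on the intent; comparing the same row i here is an assumption.

```python
import pandas as pd

# Hypothetical data; column names follow the question.
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Week_start": [10, 11, 12],
    "Week_end": [10, 12, 12],
})

i = 0
# Comparing whole columns yields a Series of booleans, one per row.
whole_column = df["Week_start"] == df["Week_end"]
# Comparing single elements yields one boolean.
single_row = df["Week_start"].iloc[i] == df["Week_end"].iloc[i]

# With scalars on both sides, plain `and` works inside an if statement.
if df["ID"].iloc[i] == df["ID"].iloc[i + 1] and single_row:
    print("row", i, "matches")
```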
I have a dataframe df with a column "A". How do I choose a subset of df based on multiple conditions? I am trying:
train.loc[(train["A"] != 2) or (train["A"] != 10)]
The or operator doesn't seem to be working. How can I fix this? I got the error:
ValueError Traceback (most recent call last)
<ipython-input-30-e949fa2bb478> in <module>
----> 1 sub_train.loc[(sub_train["primary_use"] != 2) or (sub_train["primary_use"] != 10), "year_built"]
/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
1553 "The truth value of a {0} is ambiguous. "
1554 "Use a.empty, a.bool(), a.item(), a.any() or a.all().".format(
-> 1555 self.__class__.__name__
1556 )
1557 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Use | for bitwise OR or & for bitwise AND, also loc is not necessary:
#filter 2 or 10
train[(train["A"] == 2) | (train["A"] == 10)]
#filter not 2 and not 10
train[(train["A"] != 2) & (train["A"] != 10)]
If you also want to select specific columns, then loc is necessary:
train.loc[(train["A"] == 2) | (train["A"] == 10), 'B']
You need | instead of or to do logic with Series:
train.loc[(train["A"] != 2) | (train["A"] != 10)]
To not worry about parentheses use Series.ne.
loc here in principle is not necessary if you do not want to select a specific column:
train[train["A"].ne(2) | train["A"].ne(10)]
But I think your logic is wrong, since this mask does not filter anything: if the value is 2 it is kept because it is different from 10, and vice versa. I think you want Series.isin + ~:
train[~train["A"].isin([2,10])]
or use &:
train[train["A"].ne(2) & train["A"].ne(10)]
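The point about the logic can be checked on a tiny made-up frame (the values are assumptions for illustration): the != ... | != ... mask is True for every row, while the & form and the isin form give the same filtered result.

```python
import pandas as pd

train = pd.DataFrame({"A": [1, 2, 10, 5]})

# Every value differs from 2 or from 10, so nothing is filtered out.
always_true = (train["A"] != 2) | (train["A"] != 10)
print(always_true.tolist())  # [True, True, True, True]

# Both of these keep only the rows where A is neither 2 nor 10.
with_and = train[(train["A"] != 2) & (train["A"] != 10)]
with_isin = train[~train["A"].isin([2, 10])]
print(with_and["A"].tolist())      # [1, 5]
print(with_and.equals(with_isin))  # True
```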
I'm defining a simple if xxxx return y - else return NaN function. If the record, ['Product'], equals ['Product'] offset by 8 then the if condition is true.
I've tried calling the record and setting it equal to itself offset by 8 using == and .shift(8). ['Product'] is a string and ['Sales'] is an integer.
def Growth (X):
    if X['Product'] == X['Product'].shift(8):
        return (1+ X['Sales'].shift(4)) / (1+ X['Sales'].shift(8) - 1)
    else:
        return 'NaN'
I expect the output to be NaN for the first 8 records, and then to have numbers at record 9, but I receive the error instead.
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Firstly a general comment from StackOverflow's Truth value of a Series is ambiguous...:
The or and and python statements require truth-values. For pandas these are considered ambiguous so you should use "bitwise" | (or) or & (and) operations.
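Secondly, you use == on Series objects. This returns a Series of element-wise results, and the if statement then tries to convert that Series to a single truth value and fails, because this is ambiguous.
Use X['Product'].equals(X['Product'].shift(8)), which compares the two Series as a whole and returns a single boolean.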
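Since the question expects NaN for the first 8 records and numbers afterwards, a per-row result may be what is wanted. The following is a sketch of one way to get that with an element-wise mask and Series.where; this is a swapped-in technique, not the method from the answer above. The data is invented, and the formula is copied from the question unchanged.

```python
import pandas as pd

# Hypothetical data: a single product over 10 periods.
X = pd.DataFrame({
    "Product": ["A"] * 10,
    "Sales": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# Element-wise comparison: False for the first 8 rows, where the shifted value is missing.
same_product = X["Product"] == X["Product"].shift(8)

# Formula copied from the question as written.
ratio = (1 + X["Sales"].shift(4)) / (1 + X["Sales"].shift(8) - 1)

# Keep the ratio where the products match, NaN elsewhere.
X["Growth"] = ratio.where(same_product)

print(X)
```

Using a real NaN rather than the string 'NaN' keeps the Growth column numeric.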
I have the same code; I'm trying to create a new field in a pandas dataframe with simple conditions:
if df_reader['email1_b']=='NaN':
    df_reader['email1_fin']=df_reader['email1_a']
else:
    df_reader['email1_fin']=df_reader['email1_b']
But I get this strange error:
ValueError Traceback (most recent call last)
<ipython-input-92-46d604271768> in <module>()
----> 1 if df_reader['email1_b']=='NaN':
2 df_reader['email1_fin']=df_reader['email1_a']
3 else:
4 df_reader['email1_fin']=df_reader['email1_b']
/home/user/GL-env_py-gcc4.8.5/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anybody explain what I need to do about this?
df_reader['email1_b']=='NaN' is a vector of Boolean values (one per row), but you need one Boolean value for if to work. Use this instead:
df_reader['email1_fin'] = np.where(df_reader['email1_b']=='NaN',
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
As a side note, are you sure about 'NaN'? Is it not NaN? In the latter case, your expression should be:
df_reader['email1_fin'] = np.where(df_reader['email1_b'].isnull(),
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
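The difference between the string 'NaN' and a real missing value can be seen in a short sketch. The email addresses are placeholders invented for the example; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one real missing value and one literal 'NaN' string.
df_reader = pd.DataFrame({
    "email1_a": ["a1@example.com", "a2@example.com", "a3@example.com"],
    "email1_b": ["b1@example.com", np.nan, "NaN"],
})

# String comparison only catches the literal text 'NaN'.
print((df_reader["email1_b"] == "NaN").tolist())  # [False, False, True]
# isnull() only catches real missing values.
print(df_reader["email1_b"].isnull().tolist())    # [False, True, False]

# Fall back to email1_a where email1_b is really missing.
df_reader["email1_fin"] = np.where(df_reader["email1_b"].isnull(),
                                   df_reader["email1_a"],
                                   df_reader["email1_b"])
print(df_reader["email1_fin"].tolist())
```

If the data can contain both forms, the two conditions can be combined with | before passing them to np.where.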
if expects a scalar value; it doesn't understand an array of booleans, which is what your condition returns. If you think about it, what should it do if only some values in this array are True?
To do this properly you can do the following:
df_reader['email1_fin'] = np.where(df_reader['email1_b'] == 'NaN', df_reader['email1_a'], df_reader['email1_b'] )
Also, you seem to be comparing against the str 'NaN' rather than the numerical NaN. Is this intended?