Working with pandas dataframe and Combining conditions gives ambiguity error - python

I am trying to detect the candlestick pattern called Doji, which requires checking two conditions.
values is a pandas DataFrame of historical stock data with the columns Date, High, Low, Open and Close.
In the if statement I tried to make condition1 and condition2 explicitly boolean with bool(), and I also tried reducing them with any(). Neither gave the desired result. Printing condition1 and condition2 separately gives the expected boolean values, but combining them with '&' fails.
51315 True
51316 True
51317 True
51318 True
51319 True
...
53790 True
53791 True
53792 True
53793 True
53794 True
Length: 2480, dtype: bool
ValueError Traceback (most recent call last)
<ipython-input-58-3f42eed169f4> in <module>
4 values = pd.DataFrame(stocks_data.loc[stocks_data['Company']=='TCS'])
5 values.reset_index()
----> 6 study_candlesticks(values)
<ipython-input-57-fd67b4117699> in study_candlesticks(values)
21 # for row in values
22
---> 23 if calc_doji(values):
24 values['Pattern']='Doji'
25
<ipython-input-57-fd67b4117699> in calc_doji(values)
81 condition2=((values['High'] - values['Low'])>values['Close']*min_candle_size)
82 print(condition2)
---> 83 if ((condition1).bool()&(condition2).any()):
84 return True
85 else:
~\Anaconda3\lib\site-packages\pandas\core\generic.py in bool(self)
1581 )
1582
-> 1583 self.__nonzero__()
1584
1585 def __abs__(self):
~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1553 "The truth value of a {0} is ambiguous. "
1554 "Use a.empty, a.bool(), a.item(), a.any() or a.all().".format(
-> 1555 self.__class__.__name__
1556 )
1557 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I am not sure where I am going wrong. Any suggestions?
Below is the code:
calc_doji(values)
def calc_doji(values):
    max_candle_size=0.1/100
    min_candle_size=1/100
    condition1 =(abs(values['Open'] - values['Close'])<values['Close']*max_candle_size)
    print(condition1)
    condition2=((values['High'] - values['Low'])>values['Close']*min_candle_size)
    print(condition2)
    if ((condition1).bool()&(condition2).any()):
        return True
    else:
        return False

If you have two pd.Series of dtype bool, you can combine them element-wise as follows. Without knowing what your data looks like, I've created two pd.Series containing random True and False values.
import pandas as pd
import numpy as np
condition1= pd.Series(np.random.choice([True, False], 100))
condition2= pd.Series(np.random.choice([True, False], 100))
Then you can combine them element-wise by doing the following.
(condition1) & (condition2) # which returns a `pd.Series` where each row is either `True` or `False`.
To check whether there is any index position at which both pd.Series are True:
((condition1) & (condition2)).any() # Which returns either `True` or `False`
From your code, I would guess this line is the issue.
if ((condition1).bool()&(condition2).any()):
which should be
if ((condition1) & (condition2)).any():
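If the goal is to label individual candles rather than the whole frame, a row-wise mask may fit better than a single True/False. The following is only a sketch under assumptions: the helper name doji_mask, its parameter names, and the sample prices are made up for illustration, and the thresholds are copied from the question.

```python
import pandas as pd


def doji_mask(values, max_body=0.1 / 100, min_range=1 / 100):
    """Return a boolean Series that is True on rows meeting both conditions."""
    body = (values["Open"] - values["Close"]).abs()
    candle_range = values["High"] - values["Low"]
    small_body = body < values["Close"] * max_body          # condition1
    wide_range = candle_range > values["Close"] * min_range  # condition2
    return small_body & wide_range


# Made-up prices, only to exercise the function.
values = pd.DataFrame({
    "Open":  [100.0, 100.0, 100.0],
    "High":  [102.0, 100.5, 103.0],
    "Low":   [99.0, 99.9, 98.0],
    "Close": [100.05, 100.05, 102.0],
})

mask = doji_mask(values)

# Label only the matching rows instead of the whole column.
values["Pattern"] = ""
values.loc[mask, "Pattern"] = "Doji"

# A single boolean, if one is still needed for an if statement.
has_any_doji = bool(mask.any())
```

With this shape, `if has_any_doji:` is unambiguous, and the Pattern column is set per row instead of for every row at once.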

Related

ValueError while iterating over dataframe: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() [duplicate]

I have a dataframe in which I am trying to convert the values in "LoginTime" to a 24HR format based on whether the "Timing" contains "am" or "pm".
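from io import StringIO
import pandas as pd
data = """
LoginDate LoginTime Timing StudentId
2021-03-23 12 am 3574
2021-03-23 12 am 3574
2021-03-23 12 am 2512
2021-03-23 12 am 2692
2021-03-23 12 am 3064
"""
# A raw string keeps \s from being treated as a string escape.
df = pd.read_csv(StringIO(data.strip()), sep=r'\s+')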
I am using the following logic to convert the values:
for index in df.index:
    if (df.loc[index,"Timing"] == "pm"):
        df.loc[index, "LoginTime"] = df.loc[index, "LoginTime"] + 12
However, this gives me the following error:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11688/1623466071.py in <module>
1 for index in df.index:
----> 2 if (df.loc[index,"Timing"] == "pm"):
3 df.loc[index, "LoginTime"] = df.loc[index, "LoginTime"] + 12
c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1535 #final
1536 def __nonzero__(self):
-> 1537 raise ValueError(
1538 f"The truth value of a {type(self).__name__} is ambiguous. "
1539 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It is worth noting that I have set the index of the DataFrame to "LoginDate", which is of datetime type. However, when I change the index to plain integer values (0,1,2,3,...) and keep "LoginDate" as a normal column, the above error disappears and the code executes properly.
How do I make the code work while keeping "LoginDate" as the index?
Do not use a loop for this operation; use a vectorized approach:
df['LoginTime'] = df['LoginTime'].where(df['Timing'].ne('pm'), df['LoginTime']+12)
This is simpler to read and more efficient.
You could try this:
df["LoginTime"] = np.where(df["Timing"] == "pm", df["LoginTime"] + 12, df["LoginTime"])
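As for why the loop only fails with "LoginDate" as the index: the likely reason is that the dates repeat, so a single label matches several rows and df.loc[index, "Timing"] returns a Series instead of one string. The sketch below illustrates this; the sample rows (including the "pm" ones) are made up for the demonstration and are not the asker's data.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Made-up rows: the first date appears twice, the second date once.
data = """
LoginDate LoginTime Timing StudentId
2021-03-23 12 am 3574
2021-03-23 1 pm 3574
2021-03-24 2 pm 2512
"""
df = pd.read_csv(StringIO(data.strip()), sep=r"\s+", parse_dates=["LoginDate"])
df = df.set_index("LoginDate")

# A repeated label selects several rows, so the result is a Series.
dup = df.loc[pd.Timestamp("2021-03-23"), "Timing"]
# A unique label selects one row, so the result is a plain string.
uniq = df.loc[pd.Timestamp("2021-03-24"), "Timing"]

# The vectorized form never looks rows up by label, so repeated
# index values do not matter.
is_pm = df["Timing"].eq("pm")
df["LoginTime"] = np.where(is_pm, df["LoginTime"] + 12, df["LoginTime"])
```

With an integer index every label is unique, which would explain why the loop ran without error in that case.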

Python error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

With more than 250 independent variables, I am trying to find variables that are statistically significant. For this, I am trying to build a for loop which will only return the variables whose P-value is less than alpha.
cols = x2.columns
alpha = 0.05
for i in cols:
    if (est2.pvalues[i] < alpha) == True:
        print(i)
where est2 = sm.OLS(y,x2).fit(). This is the output that I get:
LotArea
OverallQual
OverallCond
YearBuilt
YearRemodAdd
BsmtFinSF1
TotalBsmtSF
1stFlrSF
2ndFlrSF
GrLivArea
BsmtFullBath
HalfBath
GarageArea
WoodDeckSF
EnclosedPorch
ScreenPorch
MSZoning_FV
MSZoning_RH
MSZoning_RL
MSZoning_RM
LotConfig_FR2
LotConfig_Inside
LandSlope_Sev
Neighborhood_Crawfor
Neighborhood_Edwards
Neighborhood_MeadowV
Neighborhood_NridgHt
Neighborhood_StoneBr
Condition1_Norm
Condition1_PosN
Condition2_PosN
Condition2_RRAe
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-192-8387e9b8424a> in <module>
2 alpha = 0.05
3 for i in cols:
----> 4 if (est2.pvalues[i] < alpha) == True:
5 print(i)
6 #print(i, est2.pvalues[i] > alpha)
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1440 #final
1441 def __nonzero__(self):
-> 1442 raise ValueError(
1443 f"The truth value of a {type(self).__name__} is ambiguous. "
1444 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It stops in the middle like this.
First, the == True is superfluous, but that has nothing to do with the error.
The error indicates that for one of the variable names i, the expression est2.pvalues[i] is a pandas Series rather than a single value. (The name i is a bit misleading, by the way, since i would usually be an integer in a loop like this.) Why exactly that happens is impossible to tell without seeing the problematic variable name.
In any case, est2.pvalues is a pandas Series, so you can get all the low p-values (and the corresponding variable names) with boolean indexing like this:
est2.pvalues[est2.pvalues < 0.05]
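One possible cause, and this is only a guess since the question does not show it, is that x2 contains two columns with the same name. Indexing a Series by a repeated label returns a Series, which would trigger exactly this error. The sketch below uses a hand-made Series named pvalues as a stand-in for est2.pvalues; the labels and numbers are invented.

```python
import pandas as pd

# Stand-in for est2.pvalues with made-up numbers.
# The label "x1" is deliberately duplicated.
pvalues = pd.Series([0.01, 0.20, 0.03, 0.70], index=["x1", "x2", "x1", "x3"])

single = pvalues["x2"]    # unique label: a single float
repeated = pvalues["x1"]  # repeated label: a Series with two entries

# List the labels that occur more than once.
duplicated_labels = pvalues.index[pvalues.index.duplicated()].tolist()

# Boolean indexing works either way and needs no loop.
significant = pvalues[pvalues < 0.05]
```

Running the same duplicated() check on x2.columns would confirm or rule out this explanation.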

Multiple if conditions pandas

Looking to write an if statement which does a calculation based on if 3 conditions across other columns in a dataframe are true. I have tried the below code which seems to have worked for others on stackoverflow but kicks up an error for me. Note the 'check', 'sqm' and 'sqft' columns are in float64 format.
if ((merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)):
    merge['checksqm'] == merge['sqft']/10.7639
#Error below:
ValueError Traceback (most recent call last)
<ipython-input-383-e84717fde2c0> in <module>
----> 1 if ((merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)):
2 merge['checksqm'] == merge['sqft']/10.7639
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
1327
1328 def __nonzero__(self):
-> 1329 raise ValueError(
1330 f"The truth value of a {type(self).__name__} is ambiguous. "
1331 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Each condition in your code evaluates to a Series of boolean values, and the combined result of the three conditions is also a boolean Series. A Python if statement needs a single True or False; it cannot evaluate each element of a pandas Series and run the following statement row by row. Hence the error ValueError: The truth value of a Series is ambiguous.
To solve the problem, you have to code it using Pandas syntax, like the following:
mask = (merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)
merge.loc[mask, 'checksqm'] = merge['sqft']/10.7639
or, combine in one statement, as follows:
merge.loc[(merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0), 'checksqm'] = merge['sqft']/10.7639
This way, pandas evaluates the boolean Series and assigns only to the rows where the combined three conditions are True, taking the corresponding values of sqft from those same rows. An ordinary Python statement such as if does not perform this kind of vectorized, row-aligned operation.
You are trying to use a pd.Series as the condition inside the if clause. This Series is a mask of True and False values. If you really need a single boolean for the if, reduce the Series with series.any() or series.all().
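To make the difference between the two suggestions concrete, here is a small sketch with invented numbers (the column names come from the question, the values do not). It shows the mask, the single booleans that any() and all() produce, and the row-wise assignment with .loc.

```python
import pandas as pd

# Made-up data using the question's column names.
merge = pd.DataFrame({
    "check": [1.0, 1.0, 0.0, 1.0],
    "sqft":  [1076.39, 0.0, 500.0, 200.0],
    "sqm":   [0.0, 0.0, 0.0, 18.0],
})

# One boolean per row.
mask = (merge["check"] == 1) & (merge["sqft"] > 0) & (merge["sqm"] == 0)

# Reductions give one boolean for the whole frame.
any_match = bool(mask.any())  # is at least one row a match?
all_match = bool(mask.all())  # is every row a match?

# Row-wise assignment: only rows where mask is True get a value,
# the remaining rows of the new column are NaN.
merge.loc[mask, "checksqm"] = merge["sqft"] / 10.7639
```

Here any() and all() answer a question about the whole frame, while .loc with the mask performs the per-row calculation the question seems to be after.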

How to update a Python Dataframe column dependent on the presence of a substring in another column

So I have a dataframe containing a float64 type column and an object type column containing a string.
If object column contains substring 'abc' I want to subtract 12 from the float column. If object column contains substring 'def' I want to subtract 24 from the float column. If object column contains neither 'abc' or 'def', I want to leave float column as is.
Example:
Nmbr Strng
52 abcghi
80 defghi
10 ghijkl
Expected output:
Nmbr Strng
40 abcghi
56 defghi
10 ghijkl
I have tried the following but keep getting an error:
if df.Strng.str.contains("abc"):
    df.Nmbr = (df.Nmbr - 12)
elif df.Strng.str.contains("def"):
    df.Nmbr = (df.Nmbr - 24)
else:
    df.Nmbr = df.Nmbr
The error I'm getting is as follows:
915 raise ValueError("The truth value of a {0} is ambiguous. "
916 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
917 .format(self.__class__.__name__))
918
919 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Note: line 917 is the one that's highlighted as the error.
Your error occurs because you are testing whether a Boolean series is True or False. This is not possible. You could test if all or any values are True, to return a single Boolean, but this isn't what you are looking for.
It is good practice to vectorize your calculations rather than introduce loops. Below is how you can implement your logic via the .loc accessor.
df.loc[df['Strng'].str.contains('abc', regex=False, na=False), 'Nmbr'] -= 12
df.loc[df['Strng'].str.contains('def', regex=False, na=False), 'Nmbr'] -= 24
Result:
Nmbr Strng
0 40 abcghi
1 56 defghi
2 10 ghijkl
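One caveat: the two .loc statements are independent, so a string containing both 'abc' and 'def' would have 36 subtracted, whereas the original if/elif would stop at the first match. If the elif behaviour matters, numpy's np.select is an alternative, since it picks the first condition that is True for each row. This is a sketch of that swapped-in technique, not the method from the answer above; the variable names has_abc, has_def and offset are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# The example data from the question.
df = pd.DataFrame({
    "Nmbr": [52.0, 80.0, 10.0],
    "Strng": ["abcghi", "defghi", "ghijkl"],
})

has_abc = df["Strng"].str.contains("abc", regex=False, na=False)
has_def = df["Strng"].str.contains("def", regex=False, na=False)

# For each row the first True condition wins, like if/elif;
# rows matching neither condition get the default of 0.
offset = np.select(
    [has_abc.to_numpy(), has_def.to_numpy()],
    [12, 24],
    default=0,
)
df["Nmbr"] = df["Nmbr"] - offset
```

For the sample data both approaches give the same result, because no string contains both substrings.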

The truth value of a Series is ambiguous in dataframe

I have the following code; I'm trying to create a new field in a pandas DataFrame with a simple condition:
if df_reader['email1_b']=='NaN':
    df_reader['email1_fin']=df_reader['email1_a']
else:
    df_reader['email1_fin']=df_reader['email1_b']
But I get this strange error:
ValueError Traceback (most recent call last)
<ipython-input-92-46d604271768> in <module>()
----> 1 if df_reader['email1_b']=='NaN':
2 df_reader['email1_fin']=df_reader['email1_a']
3 else:
4 df_reader['email1_fin']=df_reader['email1_b']
/home/user/GL-env_py-gcc4.8.5/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anybody explain what I need to do about this?
df_reader['email1_b']=='NaN' is a vector of Boolean values (one per row), but you need one Boolean value for if to work. Use this instead:
df_reader['email1_fin'] = np.where(df_reader['email1_b']=='NaN',
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
As a side note, are you sure about 'NaN'? Is it not NaN? In the latter case, your expression should be:
df_reader['email1_fin'] = np.where(df_reader['email1_b'].isnull(),
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
if expects a single scalar value; it doesn't understand an array of booleans, which is what your condition returns. If you think about it, what should it do if only some of the values in this array are True?
To do this properly you can do the following:
df_reader['email1_fin'] = np.where(df_reader['email1_b'] == 'NaN', df_reader['email1_a'], df_reader['email1_b'])
Also, you seem to be comparing against the str 'NaN' rather than the numerical NaN. Is this intended?
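The difference between the text 'NaN' and a real missing value is easy to check. In the sketch below the e-mail addresses are invented, and the column deliberately contains one of each so that both tests can be compared side by side.

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 holds a real missing value, row 2 the text "NaN".
df_reader = pd.DataFrame({
    "email1_a": ["a1@example.com", "a2@example.com", "a3@example.com"],
    "email1_b": ["b1@example.com", np.nan, "NaN"],
})

matches_text = df_reader["email1_b"] == "NaN"   # True only for the text "NaN"
matches_missing = df_reader["email1_b"].isna()  # True only for the real NaN

# Fall back to email1_a in either case.
use_a = matches_text | matches_missing
df_reader["email1_fin"] = np.where(
    use_a, df_reader["email1_a"], df_reader["email1_b"]
)
```

If the column only ever contains real missing values, the isna() test alone is enough.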
