The truth value of a Series is ambiguous in dataframe - python

I'm trying to create a new field in a pandas DataFrame with a simple condition:
if df_reader['email1_b']=='NaN':
    df_reader['email1_fin']=df_reader['email1_a']
else:
    df_reader['email1_fin']=df_reader['email1_b']
But I get this strange error:
ValueError Traceback (most recent call last)
<ipython-input-92-46d604271768> in <module>()
----> 1 if df_reader['email1_b']=='NaN':
2 df_reader['email1_fin']=df_reader['email1_a']
3 else:
4 df_reader['email1_fin']=df_reader['email1_b']
/home/user/GL-env_py-gcc4.8.5/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anybody explain what I need to do about this?

df_reader['email1_b']=='NaN' is a vector of Boolean values (one per row), but you need one Boolean value for if to work. Use this instead:
df_reader['email1_fin'] = np.where(df_reader['email1_b']=='NaN',
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
As a side note, are you sure about 'NaN'? Is it not NaN? In the latter case, your expression should be:
df_reader['email1_fin'] = np.where(df_reader['email1_b'].isnull(),
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])

An if statement expects a single Boolean value. It cannot interpret an array of Booleans, which is what your comparison returns. Think about it: what should it do if only some of the values in the array are True?
To do this properly, you can use np.where:
df_reader['email1_fin'] = np.where(df_reader['email1_b'] == 'NaN', df_reader['email1_a'], df_reader['email1_b'])
Also, you seem to be comparing against the string 'NaN' rather than the numerical NaN. Is this intended?
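To make the difference between the two answers concrete, here is a minimal sketch. The sample rows are made up for illustration; only the column names come from the question. It shows what the comparison actually returns, and how to treat both a real missing value and the literal string 'NaN' as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df_reader = pd.DataFrame({
    "email1_a": ["a1@example.com", "a2@example.com", "a3@example.com"],
    "email1_b": ["b1@example.com", np.nan, "NaN"],
})

# The comparison returns one Boolean per row, not a single True/False.
mask = df_reader["email1_b"] == "NaN"
print(mask.tolist())  # [False, False, True]

# Treat both real missing values and the literal string "NaN" as missing.
missing = df_reader["email1_b"].isna() | mask

# Row-wise choice: take email1_a where email1_b is missing, else email1_b.
df_reader["email1_fin"] = np.where(
    missing, df_reader["email1_a"], df_reader["email1_b"]
)
print(df_reader["email1_fin"].tolist())
```

Note that the real missing value in the second row is not caught by the string comparison, which is why the isna() check is combined with it here.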


How do you plot information from a dataframe based on the categorical data within the rows?

I am trying to plot a scatterplot using certain categorical information from a DataFrame, within the column 'Equipment'. I want to plot only the rows where the df['Equipment'] == 'Raw'. I have tried to use an if statement, but have come across an error.
Here is the if statement I used,
if df.Equipment == 'Raw':
    plt.scatter(df['Bench'], df['Total'])
    plt.show()
Here is the error code,
ValueError Traceback (most recent call last)
/tmp/ipykernel_2869/841785767.py in <module>
----> 1 if df.Equipment == 'Raw':
2 plt.scatter(df['Bench'], df['Total'])
3 plt.show()
~/anaconda3/lib/python3.9/site-packages/pandas/core/generic.py in __nonzero__(self)
1535 #final
1536 def __nonzero__(self):
-> 1537 raise ValueError(
1538 f"The truth value of a {type(self).__name__} is ambiguous. "
1539 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
MY SOLUTION
I found a good workaround for my problem: instead of using an if statement, I create a new DataFrame that contains only the matching rows. The main problem was that the comparison produces a Boolean Series (one value per row), which an if statement cannot evaluate as a single condition. The code I used is:
df_raw = df[df.Equipment.str.contains('Raw',case=False)]
Here is a link to the website that helped me with my problem. It has useful solutions to similar problems too.
https://kanoki.org/2019/03/27/pandas-select-rows-by-condition-and-string-operations/
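For comparison, here is a small sketch of the same idea with an exact-match mask instead of str.contains. The rows are invented for illustration and only the column names come from the question. An exact match may be preferable if other categories could contain the substring "raw".

```python
import pandas as pd

# Hypothetical rows; only the column names follow the question.
df = pd.DataFrame({
    "Equipment": ["Raw", "Wraps", "Raw", "Single-ply"],
    "Bench": [100.0, 120.0, 95.0, 150.0],
    "Total": [400.0, 480.0, 390.0, 600.0],
})

# Exact match: one Boolean per row, used as a row filter rather than in `if`.
df_raw = df[df["Equipment"] == "Raw"]
print(df_raw[["Bench", "Total"]])

# The filtered frame can then be plotted, for example:
# plt.scatter(df_raw["Bench"], df_raw["Total"]); plt.show()
```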

ValueError while iterating over dataframe: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() [duplicate]

I have a dataframe in which I am trying to convert the values in "LoginTime" to a 24HR format based on whether the "Timing" contains "am" or "pm".
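from io import StringIO
import pandas as pd

data = """
LoginDate LoginTime Timing StudentId
2021-03-23 12 am 3574
2021-03-23 12 am 3574
2021-03-23 12 am 2512
2021-03-23 12 am 2692
2021-03-23 12 am 3064
"""
# Raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(StringIO(data.strip()), sep=r'\s+')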
I am using the following logic to convert the values:
for index in df.index:
    if (df.loc[index,"Timing"] == "pm"):
        df.loc[index, "LoginTime"] = df.loc[index, "LoginTime"] + 12
However, this gives me the following error:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11688/1623466071.py in <module>
1 for index in df.index:
----> 2 if (df.loc[index,"Timing"] == "pm"):
3 df.loc[index, "LoginTime"] = df.loc[index, "LoginTime"] + 12
c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1535 #final
1536 def __nonzero__(self):
-> 1537 raise ValueError(
1538 f"The truth value of a {type(self).__name__} is ambiguous. "
1539 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It is worth noting that I have set the index of the Dataframe as "LoginDate" which is of datetime format. However, when I change the index to normal integer values (0,1,2,3,...) and keep "LoginDate" as a normal column label, the above error disappears and the code executes properly.
How do I make the code work while keeping the index as "LoginDate" ?
Do not use a loop for your operation, use a vector approach:
df['LoginTime'] = df['LoginTime'].where(df['Timing'].ne('pm'), df['LoginTime']+12)
This is simpler to read and more efficient.
You could try this:
df["LoginTime"] = np.where(df["Timing"] == "pm", df["LoginTime"] + 12, df["LoginTime"])
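Neither answer says why the loop fails only with "LoginDate" as the index. A likely explanation is that the dates repeat, so a label lookup with .loc matches several rows and returns a Series instead of a scalar. The sketch below uses made-up rows in the same layout as the question to show this, and then applies the vectorized fix, which never looks rows up by label.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Hypothetical rows in the question's layout; two rows share a date.
data = """
LoginDate LoginTime Timing StudentId
2021-03-23 11 am 3574
2021-03-23 1 pm 2512
2021-03-24 2 pm 2692
"""
df = pd.read_csv(StringIO(data.strip()), sep=r"\s+", parse_dates=["LoginDate"])
df = df.set_index("LoginDate")

# Two rows carry the label 2021-03-23, so .loc returns a Series for it.
first_label = df.index[0]
print(type(df.loc[first_label, "Timing"]).__name__)  # Series

# A label that occurs once still returns a scalar.
print(type(df.loc[df.index[2], "Timing"]).__name__)  # str

# The vectorized update works on whole columns, so duplicate labels are harmless.
is_pm = df["Timing"].eq("pm")
df["LoginTime"] = np.where(is_pm, df["LoginTime"] + 12, df["LoginTime"])
print(df["LoginTime"].tolist())  # [11, 13, 14]
```

This also explains why the error disappears with a default integer index: every label is then unique, so each lookup yields a single value.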

Python error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

With more than 250 independent variables, I am trying to find variables that are statistically significant. For this, I am trying to build a for loop which will only return the variables whose P-value is less than alpha.
cols = x2.columns
alpha = 0.05
for i in cols:
    if (est2.pvalues[i] < alpha) == True:
        print(i)
where est2 = sm.OLS(y,x2).fit(). This is the output that I get:
LotArea
OverallQual
OverallCond
YearBuilt
YearRemodAdd
BsmtFinSF1
TotalBsmtSF
1stFlrSF
2ndFlrSF
GrLivArea
BsmtFullBath
HalfBath
GarageArea
WoodDeckSF
EnclosedPorch
ScreenPorch
MSZoning_FV
MSZoning_RH
MSZoning_RL
MSZoning_RM
LotConfig_FR2
LotConfig_Inside
LandSlope_Sev
Neighborhood_Crawfor
Neighborhood_Edwards
Neighborhood_MeadowV
Neighborhood_NridgHt
Neighborhood_StoneBr
Condition1_Norm
Condition1_PosN
Condition2_PosN
Condition2_RRAe
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-192-8387e9b8424a> in <module>
2 alpha = 0.05
3 for i in cols:
----> 4 if (est2.pvalues[i] < alpha) == True:
5 print(i)
6 #print(i, est2.pvalues[i] > alpha)
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1440 #final
1441 def __nonzero__(self):
-> 1442 raise ValueError(
1443 f"The truth value of a {type(self).__name__} is ambiguous. "
1444 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It stops in the middle like this.
First, the == True is superfluous, but that has nothing to do with the error.
The error indicates that for one of the variable names i, the expression est2.pvalues[i] is a pandas Series, not a single value. (The name i is a bit misleading, by the way, since i is usually an integer in a loop like this.) Why exactly that happens is impossible to tell without seeing the problematic variable name.
In any case, est2.pvalues is a pandas Series, so you can get all the low p-values (and the corresponding variable names) with Boolean indexing like this:
est2.pvalues[est2.pvalues < 0.05]
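One possible cause, which the question does not confirm, is a column name that appears more than once in x2 (for example after one-hot encoding). Indexing a Series by a repeated label returns a Series rather than a scalar. The sketch below uses a hand-made Series as a hypothetical stand-in for est2.pvalues to show the effect and how to list the repeated labels.

```python
import pandas as pd

# Hypothetical stand-in for est2.pvalues; note the repeated label "HalfBath".
pvalues = pd.Series(
    [0.001, 0.200, 0.030, 0.700],
    index=["LotArea", "HalfBath", "HalfBath", "PoolArea"],
)

# A repeated label returns a Series, which `if` cannot evaluate.
print(type(pvalues["HalfBath"]).__name__)  # Series

# Labels that occur more than once in the index.
print(pvalues.index[pvalues.index.duplicated()].tolist())  # ['HalfBath']

# Boolean indexing works regardless of duplicates.
alpha = 0.05
significant = pvalues[pvalues < alpha]
print(significant.index.tolist())  # ['LotArea', 'HalfBath']
```

Running the duplicated() check on est2.pvalues.index (or on x2.columns) would show whether this is what happens in the real data.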

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() for np.argmax

I'm getting the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
This happens even though I'm not evaluating any condition. The error comes up when I run this line:
max_sharpe_idx = np.argmax(results[2])
where results was previously created as
results = np.zeros((3,num_portfolios), object)
and results[2] is an array of floats.
I can't understand why it raises this error. Any thoughts?
I can provide the whole functions if needed.
EDIT: Function that fills results:
def random_portfolios(num_portfolios, mean_returns, cov_matrix, risk_free_rate):
    results = np.zeros((3,num_portfolios), object)
    weights_record = []
    for i in range(num_portfolios):
        weights = np.random.random(12)
        weights /= np.sum(weights)
        weights_record.append(weights)
        portfolio_std_dev, portfolio_return = portfolio_annualised_performance(weights, mean_returns, cov_matrix)
        results[0,i] = portfolio_std_dev
        results[1,i] = portfolio_return
        results[2,i] = (portfolio_return - risk_free_rate) / portfolio_std_dev
    return results, weights_record
UPDATE: When printing the type of results and results[2,0] this is the output:
results: <class 'numpy.ndarray'>
results[2,0]: <class 'pandas.core.series.Series'>
The variable that is probably raising a problem is:
portfolio_return <class 'pandas.core.series.Series'>
The output of portfolio_return looks like this:
ABB.ST 0.043190
ALFA.ST 0.015955
AMD 0.031319
SAAB-B.ST 0.018625
ERIC-B.ST 0.080382
FORTUM.HE 0.013456
INVE-B.ST 0.044658
NDA-SE.ST 0.027568
NOKIA-SEK.ST 0.040725
SWED-A.ST 0.013694
TEL2-B.ST 0.038682
VOLV-B.ST 0.003941
dtype: float64
portfolio_return is derived from mean_returns, which is itself a pandas.core.series.Series:
mean_returns = returns.mean()
How do I get around this?
Full code if needed: https://github.com/timoudas/PortfolioOpt
My conclusion is that there are underlying data-structure issues that I do not know how to solve.
It seems like results[2] is not an array of floats, otherwise what you provided would work. If conversion is possible,
np.argmax(results[2].astype(float))
should do it. If this yields a
ValueError: setting an array element with a sequence.
then I think the reason is that your NumPy array contains not only numbers but also other objects, such as strings. The fact that your array was created with dtype object makes this very likely. I would recommend taking a look at this post and also making sure your array contains nothing other than floats and integers.
Let's take the error message seriously:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The problem must be an array element that is a pandas Series!
In [145]: import pandas as pd
In [146]: S = pd.Series(np.arange(10))
In [148]: x = np.empty(3,object)
In [150]: x[:]=[S,S,S]
In [151]: x
Out[151]:
array([0 0
1 1
2 2
3 3
4 4
5 5
6 6
....
9 9
dtype: int64], dtype=object)
Now I can recreate your error message:
In [152]: np.argmax(x)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-152-81bcc042be54> in <module>
----> 1 np.argmax(x)
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in argmax(a, axis, out)
1101
1102 """
-> 1103 return _wrapfunc(a, 'argmax', axis=axis, out=out)
1104
1105
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
54 def _wrapfunc(obj, method, *args, **kwds):
55 try:
---> 56 return getattr(obj, method)(*args, **kwds)
57
58 # An AttributeError occurs if the object does not have
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __nonzero__(self)
1477 raise ValueError("The truth value of a {0} is ambiguous. "
1478 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1479 .format(self.__class__.__name__))
1480
1481 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So you must have an object array that contains one or more Series.
You give some code, but don't specify the type of the variables used in it:
results[2,i] = (portfolio_return - risk_free_rate) / portfolio_std_dev
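A rough sketch of one way out, under the assumption that the portfolio return is meant to be a single number (the weighted sum of the per-asset mean returns). The asset labels, weights, and figures below are invented, and whether this matches what portfolio_annualised_performance is supposed to compute is a guess. The point is only that once each cell holds a scalar, a plain float array is enough and np.argmax works.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: three assets with made-up mean returns and weights.
mean_returns = pd.Series([0.04, 0.02, 0.03], index=["A", "B", "C"])
weights = np.array([0.5, 0.3, 0.2])
risk_free_rate = 0.01
portfolio_std_dev = 0.1

# Series arithmetic keeps one value per asset, so this is still a Series.
per_asset = (mean_returns - risk_free_rate) / portfolio_std_dev
print(type(per_asset).__name__)  # Series

# Weighting and summing collapses it to a single float for the portfolio.
portfolio_return = float(np.dot(weights, mean_returns))
sharpe = (portfolio_return - risk_free_rate) / portfolio_std_dev

# With scalars only, a plain float array is enough and argmax works.
results = np.zeros((3, 2))
results[2, 0] = sharpe
results[2, 1] = sharpe / 2
print(np.argmax(results[2]))  # 0
```

Creating results with the default float dtype instead of object also has the advantage that accidentally assigning a Series to a cell fails immediately, at the assignment, rather than later inside np.argmax.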

How to retrieve column value from Pandas dataframe and check condition

The DataFrame column Class contains two values, 0 and 1. I want to count how many rows there are for Class 0 and how many for Class 1. I wrote code like this:
genuine_count=0
fraud_count=0
if credit_card_df['Class'] == 1:
    fraud_count +=1
else:
    genuine_count +=1
print "Genuine transactions"+genuine_count
print "Fraud transactions"+fraud_count
I am getting this error:
ValueError Traceback (most recent call last)
<ipython-input-12-2e8ec920b69d> in <module>()
1 genuine_count=0
2 fraud_count=0
----> 3 if credit_card_df['Class'] == 1:
4 fraud_count +=1
5 else:
C:\Users\JAYASHREE\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
890 raise ValueError("The truth value of a {0} is ambiguous. "
891 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892 .format(self.__class__.__name__))
893
894 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Kindly help me resolve this. Thanks.
Thankfully, pandas has already written this for you:
credit_card_df['Class'].value_counts()
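Alternatively, if you want to print in your own format (this unpacking assumes genuine transactions are the more frequent class, because value_counts sorts by count in descending order):
genuine_count, fraud_count = credit_card_df['Class'].value_counts(sort=True)
print "Genuine transactions: {}".format(genuine_count)
print "Fraud transactions: {}".format(fraud_count)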
Just do:
fraud_count = (credit_card_df['Class'] == 1).sum()
genuine_count = (credit_card_df['Class'] == 0).sum()
print "Genuine transactions {}.".format(genuine_count)
print "Fraud transactions {}.".format(fraud_count)
I hope this helps.
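If you prefer value_counts but do not want to depend on which class is more frequent, the counts can be looked up by label. This is a small sketch with invented data; print is called with a single argument so the snippet runs on both Python 2 and Python 3.

```python
import pandas as pd

# Hypothetical data: 0 = genuine, 1 = fraud, as in the question.
credit_card_df = pd.DataFrame({"Class": [0, 0, 1, 0, 1, 0]})

# Reindexing by label makes the order explicit and fills absent classes with 0.
counts = credit_card_df["Class"].value_counts().reindex([0, 1], fill_value=0)
genuine_count = int(counts.loc[0])
fraud_count = int(counts.loc[1])

print("Genuine transactions: {}".format(genuine_count))  # Genuine transactions: 4
print("Fraud transactions: {}".format(fraud_count))      # Fraud transactions: 2
```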
