add one column (subtraction) to a dataframe with python - python

I am trying to add a column to a DataFrame df2 that contains 0 if (df2['P_ACT_KW'] - df2['P_SOUSCR']) < 0, and df2['P_ACT_KW'] - df2['P_SOUSCR'] otherwise.
if (df2['P_ACT_KW'] - df2['P_SOUSCR']) < 0:
    df2['depassement'] = 0
else:
    df2['depassement'] = (df2['P_ACT_KW'] - df2['P_SOUSCR'])
I got this error message:
ValueError                                Traceback (most recent call last)
in ()
----> 1 if (df2['P_ACT_KW'] - df2['P_SOUSCR']) <0:
      2     df2['depassement']=0
      3 else:
      4     df2['depassement']= (df2['P_ACT_KW'] - df2['P_SOUSCR'])
C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    890         raise ValueError("The truth value of a {0} is ambiguous. "
    891                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892                          .format(self.__class__.__name__))
    893
    894     __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any idea please?
Thank you

If I understand correctly:
df2['depassement'] = df2['P_ACT_KW'] - df2['P_SOUSCR']
df2.loc[df2['depassement'] < 0, 'depassement'] = 0
This should also work:
df2['depassement'] = df2.P_ACT_KW.sub(df2.P_SOUSCR).apply(lambda x: max(x, 0))

You need to do:
df2['depassement'] = np.where((df2['P_ACT_KW'] - df2['P_SOUSCR']) < 0, 0, df2['P_ACT_KW'] - df2['P_SOUSCR'])
The if statement does not know how to evaluate a comparison on an array-like structure, hence the error. np.where instead evaluates the condition for every row to produce a Boolean mask, writing 0 where the condition is true and the result of the subtraction elsewhere.
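As a rough illustration of the np.where approach described above, here is a minimal sketch. The sample values are made up; only the column names come from the question. The Series.clip call at the end is an alternative not mentioned in the answers, included on the assumption that flooring at 0 is all that is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; only the column names come from the question.
df2 = pd.DataFrame({"P_ACT_KW": [120.0, 80.0, 100.0],
                    "P_SOUSCR": [100.0, 100.0, 100.0]})

diff = df2["P_ACT_KW"] - df2["P_SOUSCR"]

# Element-wise choice: 0 where the difference is negative, else the difference.
df2["depassement"] = np.where(diff < 0, 0, diff)

# Same result by flooring the difference at 0.
clipped = diff.clip(lower=0)
```

Both forms operate on the whole column at once, so no Python-level if is evaluated on a Series.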

Related

ValueError while iterating over dataframe: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() [duplicate]

I have a dataframe in which I am trying to convert the values in "LoginTime" to a 24HR format based on whether the "Timing" contains "am" or "pm".
import pandas as pd
from io import StringIO

data = """
LoginDate LoginTime Timing StudentId
2021-03-23 12 am 3574
2021-03-23 12 am 3574
2021-03-23 12 am 2512
2021-03-23 12 am 2692
2021-03-23 12 am 3064
"""
df = pd.read_csv(StringIO(data.strip()), sep=r'\s+')
I am using the following logic to convert the values:
for index in df.index:
    if (df.loc[index,"Timing"] == "pm"):
        df.loc[index, "LoginTime"] = df.loc[index, "LoginTime"] + 12
However, this gives me the following error:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11688/1623466071.py in <module>
1 for index in df.index:
----> 2 if (df.loc[index,"Timing"] == "pm"):
3 df.loc[index, "LoginTime"] = df.loc[index, "LoginTime"] + 12
c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1535 #final
1536 def __nonzero__(self):
-> 1537 raise ValueError(
1538 f"The truth value of a {type(self).__name__} is ambiguous. "
1539 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It is worth noting that I have set the index of the Dataframe as "LoginDate" which is of datetime format. However, when I change the index to normal integer values (0,1,2,3,...) and keep "LoginDate" as a normal column label, the above error disappears and the code executes properly.
How do I make the code work while keeping the index as "LoginDate" ?
Do not use a loop for this operation; use a vectorized approach:
df['LoginTime'] = df['LoginTime'].where(df['Timing'].ne('pm'), df['LoginTime']+12)
This is simpler to read and more efficient.
You could try this:
df["LoginTime"] = np.where(df["Timing"] == "pm", df["LoginTime"] + 12, df["LoginTime"])
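Neither answer spells out why the loop fails only with "LoginDate" as the index. A likely explanation is that the dates repeat: when an index label occurs more than once, df.loc[label, column] returns a Series rather than a single value, and if cannot evaluate it. The sketch below uses made-up rows to show this, then applies the np.where fix, which works by position and is unaffected by repeated labels.

```python
import numpy as np
import pandas as pd

# Hypothetical rows modelled on the question; two rows share one LoginDate.
df = pd.DataFrame({
    "LoginDate": pd.to_datetime(["2021-03-23", "2021-03-23", "2021-03-24"]),
    "LoginTime": [1, 2, 3],
    "Timing": ["pm", "am", "pm"],
}).set_index("LoginDate")

# A repeated label selects several rows, so .loc returns a Series, not a scalar.
dup = df.loc[pd.Timestamp("2021-03-23"), "Timing"]
dup_is_series = isinstance(dup, pd.Series)

# Vectorized update: add 12 where Timing is "pm", leave other rows unchanged.
df["LoginTime"] = np.where(df["Timing"] == "pm", df["LoginTime"] + 12, df["LoginTime"])
```

With a default integer index every label is unique, which would explain why the loop runs without error in that case.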

Python error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

With more than 250 independent variables, I am trying to find variables that are statistically significant. For this, I am trying to build a for loop which will only return the variables whose P-value is less than alpha.
cols = x2.columns
alpha = 0.05
for i in cols:
    if (est2.pvalues[i] < alpha) == True:
        print(i)
where est2 = sm.OLS(y,x2).fit(). This is the output that I get:
LotArea
OverallQual
OverallCond
YearBuilt
YearRemodAdd
BsmtFinSF1
TotalBsmtSF
1stFlrSF
2ndFlrSF
GrLivArea
BsmtFullBath
HalfBath
GarageArea
WoodDeckSF
EnclosedPorch
ScreenPorch
MSZoning_FV
MSZoning_RH
MSZoning_RL
MSZoning_RM
LotConfig_FR2
LotConfig_Inside
LandSlope_Sev
Neighborhood_Crawfor
Neighborhood_Edwards
Neighborhood_MeadowV
Neighborhood_NridgHt
Neighborhood_StoneBr
Condition1_Norm
Condition1_PosN
Condition2_PosN
Condition2_RRAe
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-192-8387e9b8424a> in <module>
2 alpha = 0.05
3 for i in cols:
----> 4 if (est2.pvalues[i] < alpha) == True:
5 print(i)
6 #print(i, est2.pvalues[i] > alpha)
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1440 #final
1441 def __nonzero__(self):
-> 1442 raise ValueError(
1443 f"The truth value of a {type(self).__name__} is ambiguous. "
1444 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It stops in the middle like this.
First, the == True is superfluous, but that shouldn't have anything to do with the error.
The error indicates that for one of the variable names i, the expression est2.pvalues[i] is a pandas Series rather than a single value. (The name i is a bit misleading, by the way, since it would usually denote an integer in a loop like this.) Why exactly that happens is impossible to tell without seeing the problematic variable name.
In any case, est2.pvalues is a pandas series, so you can get all the low p-values (and the corresponding variable names) with Boolean indexing like this:
est2.pvalues[est2.pvalues < 0.05]
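One possible cause, which the question does not confirm, is a column name that appears more than once in x2 (for example after one-hot encoding). The sketch below uses a plain pandas Series as a hypothetical stand-in for est2.pvalues to show how a repeated label yields a Series, how to list such labels, and how Boolean indexing sidesteps the loop entirely.

```python
import pandas as pd

# Hypothetical stand-in for est2.pvalues; "x2" appears twice on purpose.
pvalues = pd.Series([0.01, 0.20, 0.03, 0.50], index=["x1", "x2", "x2", "x3"])
alpha = 0.05

# A repeated label returns a Series, which `if` cannot evaluate.
repeated = pvalues["x2"]

# Labels that occur more than once in the index.
duplicated_labels = pvalues.index[pvalues.index.duplicated()].unique().tolist()

# Boolean indexing filters every entry without an explicit loop.
significant = pvalues[pvalues < alpha]
```

Checking the index for duplicates first helps locate the variable name on which the loop stops.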

Multiple if conditions pandas

Looking to write an if statement which does a calculation based on if 3 conditions across other columns in a dataframe are true. I have tried the below code which seems to have worked for others on stackoverflow but kicks up an error for me. Note the 'check', 'sqm' and 'sqft' columns are in float64 format.
if ((merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)):
    merge['checksqm'] == merge['sqft']/10.7639
#Error below:
ValueError Traceback (most recent call last)
<ipython-input-383-e84717fde2c0> in <module>
----> 1 if ((merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)):
2 merge['checksqm'] == merge['sqft']/10.7639
~/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
1327
1328 def __nonzero__(self):
-> 1329 raise ValueError(
1330 f"The truth value of a {type(self).__name__} is ambiguous. "
1331 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Each of your conditions evaluates to a Series of Boolean values, and combining the three with & also yields a Boolean Series. A Python if statement needs a single True or False; it cannot evaluate each element of a pandas Series and run the following block once per row. Hence the error ValueError: The truth value of a Series is ambiguous.
To solve the problem, write it using pandas syntax, as follows:
mask = (merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0)
merge.loc[mask, 'checksqm'] = merge['sqft']/10.7639
or, combined into one statement:
merge.loc[(merge['check'] == 1) & (merge['sqft'] > 0) & (merge['sqm'] == 0), 'checksqm'] = merge['sqft']/10.7639
This way, pandas evaluates the Boolean Series and assigns only to the rows where the combined condition is True, taking the corresponding sqft value from each of those rows. An ordinary Python statement such as if does not perform this kind of vectorized, row-aligned operation behind the scenes.
You are trying to use a pd.Series as the condition of the if clause. This Series is a mask of True and False values. To get a single Boolean for if, reduce the Series with series.any() or series.all(); note that the if body then runs once for the whole DataFrame, not once per row.
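The two suggestions above behave differently, and a small sketch may make that concrete. The values are invented; only the column names and the 10.7639 conversion factor come from the question. any() and all() collapse the mask to one Boolean, whereas .loc with the mask assigns row by row.

```python
import pandas as pd

# Hypothetical values; only the column names come from the question.
merge = pd.DataFrame({"check": [1.0, 1.0, 0.0],
                      "sqft": [107.639, 0.0, 50.0],
                      "sqm": [0.0, 12.0, 0.0]})

mask = (merge["check"] == 1) & (merge["sqft"] > 0) & (merge["sqm"] == 0)

any_row = mask.any()   # True if at least one row satisfies all three conditions
all_rows = mask.all()  # True only if every row does

# Row-wise assignment: only rows where mask is True receive a value.
merge.loc[mask, "checksqm"] = merge["sqft"] / 10.7639
```

Rows where the mask is False are left as NaN in the new column, which is usually what a per-row condition is meant to express.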

The truth value of a Series is ambiguous in dataframe

I have this code. I'm trying to create a new field in a pandas DataFrame with a simple condition:
if df_reader['email1_b']=='NaN':
    df_reader['email1_fin']=df_reader['email1_a']
else:
    df_reader['email1_fin']=df_reader['email1_b']
But I get this strange error:
ValueError Traceback (most recent call last)
<ipython-input-92-46d604271768> in <module>()
----> 1 if df_reader['email1_b']=='NaN':
2 df_reader['email1_fin']=df_reader['email1_a']
3 else:
4 df_reader['email1_fin']=df_reader['email1_b']
/home/user/GL-env_py-gcc4.8.5/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anybody explain what I need to do about this?
df_reader['email1_b']=='NaN' is a vector of Boolean values (one per row), but you need one Boolean value for if to work. Use this instead:
df_reader['email1_fin'] = np.where(df_reader['email1_b']=='NaN',
df_reader['email1_a'],
df_reader['email1_b'])
As a side note, are you sure about 'NaN'? Is it not NaN? In the latter case, your expression should be:
df_reader['email1_fin'] = np.where(df_reader['email1_b'].isnull(),
df_reader['email1_a'],
df_reader['email1_b'])
if expects a single scalar value; it does not understand the array of Booleans that your condition returns. If you think about it, what should it do when only some of the values in this array are True?
To do this properly, you can do the following:
df_reader['email1_fin'] = np.where(df_reader['email1_b'] == 'NaN', df_reader['email1_a'], df_reader['email1_b'] )
Also, you seem to be comparing against the string 'NaN' rather than the numerical NaN. Is this intended?
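To make the 'NaN' versus NaN distinction concrete, here is a small sketch with invented addresses (only the column names come from the question). It shows that the string comparison and isnull() flag different rows. The final line uses Series.fillna, an alternative to np.where that is not in the answers and assumes the missing values are real NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical addresses; only the column names come from the question.
df_reader = pd.DataFrame({
    "email1_a": ["a@x.org", "b@x.org", "c@x.org"],
    "email1_b": [np.nan, "NaN", "d@x.org"],
})

is_text_nan = df_reader["email1_b"] == "NaN"   # matches only the literal string
is_missing = df_reader["email1_b"].isnull()    # matches only real missing values

# Take email1_b, falling back to email1_a where email1_b is missing.
df_reader["email1_fin"] = df_reader["email1_b"].fillna(df_reader["email1_a"])
```

If the column really holds the text 'NaN', the np.where comparison from the answers is the form to use.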

How to retrieve column value from Pandas dataframe and check condition

The DataFrame column Class contains two values, 0 and 1. I want to count how many rows have Class 0 and how many have Class 1. I wrote code like this:
genuine_count=0
fraud_count=0
if credit_card_df['Class'] == 1:
    fraud_count +=1
else:
    genuine_count +=1
print "Genuine transactions"+genuine_count
print "Fraud transactions"+fraud_count
I am getting this error:
ValueError Traceback (most recent call last)
<ipython-input-12-2e8ec920b69d> in <module>()
1 genuine_count=0
2 fraud_count=0
----> 3 if credit_card_df['Class'] == 1:
4 fraud_count +=1
5 else:
C:\Users\JAYASHREE\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
890 raise ValueError("The truth value of a {0} is ambiguous. "
891 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892 .format(self.__class__.__name__))
893
894 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Kindly help me resolve this. Thanks.
Thankfully, pandas has already written this for you:
credit_card_df['Class'].value_counts()
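Alternatively, if you want to print in your own format (value_counts sorts by frequency, so this unpacking assumes genuine transactions outnumber fraudulent ones):
genuine_count, fraud_count = credit_card_df['Class'].value_counts(sort=True)
print("Genuine transactions: {}".format(genuine_count))
print("Fraud transactions: {}".format(fraud_count))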
Just do:
fraud_count = (credit_card_df['Class'] == 1).sum()
genuine_count = (credit_card_df['Class'] == 0).sum()
print("Genuine transactions {}.".format(genuine_count))
print("Fraud transactions {}.".format(fraud_count))
I hope this helps.
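For reference, a short sketch of why the Boolean sum works, using made-up data: comparing the column to a value gives a Boolean Series, and summing it counts the True entries. Looking the counts up by class label with Series.get, as done here, is an assumption-free way to avoid relying on sort order or on both classes being present.

```python
import pandas as pd

# Hypothetical data; 0 = genuine and 1 = fraud, as in the question.
credit_card_df = pd.DataFrame({"Class": [0, 0, 1, 0, 1]})

# True counts as 1 when summed, so this counts the matching rows.
fraud_by_sum = int((credit_card_df["Class"] == 1).sum())

counts = credit_card_df["Class"].value_counts()

# Look counts up by class label; default to 0 if a class never occurs.
genuine_count = int(counts.get(0, 0))
fraud_count = int(counts.get(1, 0))

print("Genuine transactions: {}".format(genuine_count))
print("Fraud transactions: {}".format(fraud_count))
```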
