How to retrieve column value from Pandas dataframe and check condition - python

The DataFrame column Class contains two values, 0 and 1. I want to count how many rows have Class 0 and how many have Class 1. I wrote this code:
genuine_count=0
fraud_count=0
if credit_card_df['Class'] == 1:
    fraud_count +=1
else:
    genuine_count +=1
print "Genuine transactions"+genuine_count
print "Fraud transactions"+fraud_count
I am getting this error
ValueError Traceback (most recent call last)
<ipython-input-12-2e8ec920b69d> in <module>()
1 genuine_count=0
2 fraud_count=0
----> 3 if credit_card_df['Class'] == 1:
4 fraud_count +=1
5 else:
C:\Users\JAYASHREE\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
890 raise ValueError("The truth value of a {0} is ambiguous. "
891 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892 .format(self.__class__.__name__))
893
894 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Kindly help me resolve this. Thanks.

Thankfully, pandas has already written this for you:
credit_card_df['Class'].value_counts()
Alternatively, if you want to print in your own format:
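counts = credit_card_df['Class'].value_counts()
# Look the counts up by class label instead of relying on sort order
genuine_count, fraud_count = counts.loc[0], counts.loc[1]
# str + int raises TypeError, so format the numbers into the message
print("Genuine transactions: {}".format(genuine_count))
print("Fraud transactions: {}".format(fraud_count))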

Just do:
fraud_count = (credit_card_df['Class'] == 1).sum()
genuine_count = (credit_card_df['Class'] == 0).sum()
print("Genuine transactions {}.".format(genuine_count))
print("Fraud transactions {}.".format(fraud_count))
I hope this helps.
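For context, the original error comes from the fact that credit_card_df['Class'] == 1 produces one boolean per row rather than a single True/False, so if cannot decide which branch to take. Here is a minimal sketch on a made-up five-row frame (the data is hypothetical, not the asker's) showing that summing the mask and calling value_counts() agree:

```python
import pandas as pd

# Hypothetical stand-in for credit_card_df: 0 = genuine, 1 = fraud.
credit_card_df = pd.DataFrame({"Class": [0, 0, 1, 0, 1]})

# The comparison is element-wise and returns a boolean Series,
# which is why using it directly in `if` raises ValueError.
is_fraud = credit_card_df["Class"] == 1

# True counts as 1, so summing the mask counts the matching rows.
fraud_count = int(is_fraud.sum())
genuine_count = int((~is_fraud).sum())

# value_counts() gives the same numbers, keyed by class label.
counts = credit_card_df["Class"].value_counts()

print("Genuine transactions: {}".format(genuine_count))
print("Fraud transactions: {}".format(fraud_count))
```

Looking counts up by label (counts.loc[0], counts.loc[1]) avoids depending on which class happens to be more frequent.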

Related

ValueError while iterating over dataframe: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() [duplicate]

I have a dataframe in which I am trying to convert the values in "LoginTime" to a 24HR format based on whether the "Timing" contains "am" or "pm".
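import pandas as pd
from io import StringIO

data = """
LoginDate LoginTime Timing StudentId
2021-03-23 12 am 3574
2021-03-23 12 am 3574
2021-03-23 12 am 2512
2021-03-23 12 am 2692
2021-03-23 12 am 3064
"""
# Raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(StringIO(data.strip()), sep=r'\s+')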
I am using the following logic to convert the values:
for index in df.index:
    if (df.loc[index,"Timing"] == "pm"):
        df.loc[index, "LoginTime"] = df.loc[index, "LoginTime"] + 12
However, this gives me the following error:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11688/1623466071.py in <module>
1 for index in df.index:
----> 2 if (df.loc[index,"Timing"] == "pm"):
3 df.loc[index, "LoginTime"] = df.loc[index, "LoginTime"] + 12
c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1535 #final
1536 def __nonzero__(self):
-> 1537 raise ValueError(
1538 f"The truth value of a {type(self).__name__} is ambiguous. "
1539 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It is worth noting that I have set the index of the Dataframe as "LoginDate" which is of datetime format. However, when I change the index to normal integer values (0,1,2,3,...) and keep "LoginDate" as a normal column label, the above error disappears and the code executes properly.
How do I make the code work while keeping the index as "LoginDate" ?
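The error appears because the "LoginDate" index contains duplicate values, so df.loc[index, "Timing"] returns a Series holding every row for that date rather than a single value. Do not use a loop for this operation; use a vectorized approach: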
df['LoginTime'] = df['LoginTime'].where(df['Timing'].ne('pm'), df['LoginTime']+12)
This is simpler to read and more efficient.
You could try this:
df["LoginTime"] = np.where(df["Timing"] == "pm", df["LoginTime"] + 12, df["LoginTime"])
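To see why the index matters, here is a small sketch under assumed data (three hypothetical rows sharing one LoginDate, mirroring the question's setup). It shows that a label lookup on a repeated index value returns a Series, and that the np.where version is unaffected:

```python
import numpy as np
import pandas as pd

# Hypothetical data: every row shares the same LoginDate, as in the question.
df = pd.DataFrame(
    {
        "LoginDate": pd.to_datetime(["2021-03-23"] * 3),
        "LoginTime": [12, 3, 7],
        "Timing": ["am", "pm", "pm"],
    }
).set_index("LoginDate")

# With a repeated label, .loc returns every matching row, i.e. a Series.
label = df.index[0]
looked_up = df.loc[label, "Timing"]
print(type(looked_up).__name__)

# The vectorized form never looks rows up by label, so duplicates are harmless.
df["LoginTime"] = np.where(df["Timing"] == "pm", df["LoginTime"] + 12, df["LoginTime"])
print(df["LoginTime"].tolist())
```

With a unique integer index, each lookup returns a scalar, which is why the loop ran once the index was reset.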

Python error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

With more than 250 independent variables, I am trying to find variables that are statistically significant. For this, I am trying to build a for loop which will only return the variables whose P-value is less than alpha.
cols = x2.columns
alpha = 0.05
for i in cols:
    if (est2.pvalues[i] < alpha) == True:
        print(i)
where est2 = sm.OLS(y,x2).fit(). This is the output that I get:
LotArea
OverallQual
OverallCond
YearBuilt
YearRemodAdd
BsmtFinSF1
TotalBsmtSF
1stFlrSF
2ndFlrSF
GrLivArea
BsmtFullBath
HalfBath
GarageArea
WoodDeckSF
EnclosedPorch
ScreenPorch
MSZoning_FV
MSZoning_RH
MSZoning_RL
MSZoning_RM
LotConfig_FR2
LotConfig_Inside
LandSlope_Sev
Neighborhood_Crawfor
Neighborhood_Edwards
Neighborhood_MeadowV
Neighborhood_NridgHt
Neighborhood_StoneBr
Condition1_Norm
Condition1_PosN
Condition2_PosN
Condition2_RRAe
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-192-8387e9b8424a> in <module>
2 alpha = 0.05
3 for i in cols:
----> 4 if (est2.pvalues[i] < alpha) == True:
5 print(i)
6 #print(i, est2.pvalues[i] > alpha)
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1440 #final
1441 def __nonzero__(self):
-> 1442 raise ValueError(
1443 f"The truth value of a {type(self).__name__} is ambiguous. "
1444 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
It stops in the middle like this.
First, the == True is superfluous, but that shouldn't have anything to do with the error.
The error indicates that for one of the variable names i, the expression est2.pvalues[i] is a pandas Series, not a single value. (The name i is a bit misleading, by the way, since i would usually be an integer in a loop like this.) Why exactly that happens is impossible to tell without seeing the problematic variable name.
In any case, est2.pvalues is a pandas series, so you can get all the low p-values (and the corresponding variable names) with Boolean indexing like this:
est2.pvalues[est2.pvalues < 0.05]
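As a rough illustration of the Boolean indexing above, here is a sketch that uses a hand-built Series in place of est2.pvalues (the names and p-values are made up). It also shows one way a label lookup can return a Series, namely a repeated label, though nothing in the question confirms that this is what happened there:

```python
import pandas as pd

# Hypothetical p-values standing in for est2.pvalues
# (a Series indexed by variable name).
pvalues = pd.Series(
    [0.001, 0.20, 0.03, 0.70],
    index=["LotArea", "Street", "OverallQual", "Street"],
)
alpha = 0.05

# A repeated label makes pvalues[label] a Series instead of a scalar,
# which is one way the loop in the question could hit the ValueError.
repeated = pvalues["Street"]

# Boolean indexing filters all entries in one step, duplicates or not.
significant = pvalues[pvalues < alpha]
print(significant.index.tolist())
```

If duplicated names are the cause, x2.columns[x2.columns.duplicated()] would list them.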

The truth value of a Series is ambiguous in dataframe

I have a similar problem. I'm trying to create a new field in a pandas DataFrame with a simple condition:
if df_reader['email1_b']=='NaN':
    df_reader['email1_fin']=df_reader['email1_a']
else:
    df_reader['email1_fin']=df_reader['email1_b']
But I see this strange error:
ValueError Traceback (most recent call last)
<ipython-input-92-46d604271768> in <module>()
----> 1 if df_reader['email1_b']=='NaN':
2 df_reader['email1_fin']=df_reader['email1_a']
3 else:
4 df_reader['email1_fin']=df_reader['email1_b']
/home/user/GL-env_py-gcc4.8.5/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anybody explain what I need to do about this?
df_reader['email1_b']=='NaN' is a vector of Boolean values (one per row), but you need one Boolean value for if to work. Use this instead:
df_reader['email1_fin'] = np.where(df_reader['email1_b']=='NaN',
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
As a side note, are you sure about 'NaN'? Is it not NaN? In the latter case, your expression should be:
df_reader['email1_fin'] = np.where(df_reader['email1_b'].isnull(),
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
if expects a single scalar value; it cannot interpret the array of booleans that your condition returns. Think about it: what should it do if only some of the values in that array are True?
To do this properly, you can do the following:
df_reader['email1_fin'] = np.where(df_reader['email1_b'] == 'NaN', df_reader['email1_a'], df_reader['email1_b'])
Also, you seem to be comparing against the string 'NaN' rather than the numerical NaN. Is this intended?
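Since it is unclear whether the column holds real missing values or the literal text 'NaN', here is a minimal sketch that handles both. The three rows are invented for illustration, and treating both cases as "no email" is an assumption about what the asker wants:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one real missing value, one literal string 'NaN'.
df_reader = pd.DataFrame(
    {
        "email1_a": ["a1@example.com", "a2@example.com", "a3@example.com"],
        "email1_b": [np.nan, "NaN", "b3@example.com"],
    }
)

# Treat both the missing value and the string 'NaN' as "no email".
missing = df_reader["email1_b"].isnull() | (df_reader["email1_b"] == "NaN")

# Pick email1_a where email1_b is missing, otherwise keep email1_b.
df_reader["email1_fin"] = np.where(missing, df_reader["email1_a"], df_reader["email1_b"])
print(df_reader["email1_fin"].tolist())
```

If only real NaN values occur, df_reader['email1_b'].fillna(df_reader['email1_a']) does the same job without np.where.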

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). for a string comparison

for Country in energy:
    if energy[Country] == 'United States':
This traverses a pandas DataFrame called energy, which lists all the countries in alphabetical order in a Country column. It always gives me a ValueError if I write the comparison in an if statement, but if I just return, it works.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-61-e820508b0b91> in <module>()
20
21 return energy
---> 22 answer_one()
<ipython-input-61-e820508b0b91> in answer_one()
16
17 for Country in energy:
---> 18 if energy[Country] == 'United States':
19 return
20
/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
890 raise ValueError("The truth value of a {0} is ambiguous. "
891 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892 .format(self.__class__.__name__))
893
894 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Please try this:
a = energy[Country] == 'United States'
if a.any():  # True if at least one value is True
    pass
if a.all():  # True only if every value is True
    pass
Maybe it can help you.
You probably want if Country == 'United States':, not if energy[Country] == 'United States':. The latter compares every value in the column named by Country to the string 'United States' and returns a Series.
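It may help to see what the loop variable actually holds. The sketch below uses a tiny invented energy frame (the column names and numbers are assumptions) to show that iterating over a DataFrame yields column labels, and how a boolean mask finds the row instead:

```python
import pandas as pd

# Hypothetical miniature of the energy frame, with Country as a column.
energy = pd.DataFrame(
    {
        "Country": ["Canada", "United States", "Uruguay"],
        "Energy Supply": [10.4, 90.8, 0.2],
    }
)

# Iterating over a DataFrame yields column labels, not rows.
labels = [label for label in energy]

# To find a row, build a boolean mask, then reduce it or index with it.
mask = energy["Country"] == "United States"
has_us = bool(mask.any())
us_rows = energy[mask]
print(labels, has_us, len(us_rows))
```

So in the question's loop, Country takes the values of the column names, and energy[Country] is a whole column each time.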

add one column (subtraction) to a dataframe with python

I am trying to add a column to the DataFrame df2 that contains 0 if (df2['P_ACT_KW'] - df2['P_SOUSCR']) < 0, and df2['P_ACT_KW'] - df2['P_SOUSCR'] otherwise.
if (df2['P_ACT_KW'] - df2['P_SOUSCR']) <0:
    df2['depassement']=0
else:
    df2['depassement']= (df2['P_ACT_KW'] - df2['P_SOUSCR'])
I got this error message :
ValueError Traceback (most recent call last)
in <module>()
----> 1 if (df2['P_ACT_KW'] - df2['P_SOUSCR']) <0:
2 df2['depassement']=0
3 else:
4 df2['depassement']= (df2['P_ACT_KW'] - df2['P_SOUSCR'])
C:\Users\Demonstrator\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
890 raise ValueError("The truth value of a {0} is ambiguous. "
891 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892 .format(self.__class__.__name__))
893
894 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any idea please?
Thank you
IIUC:
df2['depassement'] = df2['P_ACT_KW'] - df2['P_SOUSCR']
df2.loc[df2['depassement'] < 0, 'depassement'] = 0
This should also work:
df2['depassement'] = df2.P_ACT_KW.sub(df2.P_SOUSCR).apply(lambda x: max(x, 0))
You need to do:
df2['depassement'] = np.where((df2['P_ACT_KW'] - df2['P_SOUSCR']) < 0, 0, df2['P_ACT_KW'] - df2['P_SOUSCR'])
An if statement cannot evaluate an array-like structure as a single condition, hence the error. np.where instead evaluates the condition for every row to produce a mask: where the condition is true the value is set to 0, otherwise the result of the subtraction is used.
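Another option, not mentioned in the answers above, is Series.clip, which floors the values at 0 directly. A short sketch with made-up readings (only the column names come from the question):

```python
import pandas as pd

# Hypothetical readings; column names follow the question.
df2 = pd.DataFrame({"P_ACT_KW": [5.0, 12.0, 8.0], "P_SOUSCR": [9.0, 10.0, 8.0]})

# Subtract row by row, then raise any negative result to 0.
df2["depassement"] = (df2["P_ACT_KW"] - df2["P_SOUSCR"]).clip(lower=0)
print(df2["depassement"].tolist())
```

This gives the same values as the np.where version while computing the subtraction only once.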
