I get an error stating

ValueError: The truth value of a Series is ambiguous.

for the if condition in the following function:
for i, row in train_news.iterrows():
    if train_news.iloc[:,0].isin(['mostly-true','half-true','true']):
        train_news.iloc[:,0] = "true"
    else:
        train_news.iloc[:,0] = "false"
The problem is in your if statement -
if train_news.iloc[:,0].isin(['mostly-true','half-true','true'])
Think about what this does -
Let's say train_news.iloc[:,0] looks like this -
mostly-true
not-true
half-true
Now if you do train_news.iloc[:,0].isin(['mostly-true','half-true','true']), this will check iteratively whether each element is present in the list ['mostly-true','half-true','true']
So, this will yield another pandas.Series which looks like this -
True
False
True
The if statement in Python, being a simpleton, expects one bool value, and you are confusing it by handing it a whole Series of boolean values. So you need to reduce the Series with .all() or .any() (those are the usual to-dos) at the end, depending on what you want.
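In fact, for this question no loop or if is needed at all: the boolean Series from isin() can be mapped straight to the two strings. A minimal sketch (the column name "label" and the sample data here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question's setup
train_news = pd.DataFrame({"label": ["mostly-true", "not-true", "half-true"]})

# Build the boolean mask once, then map True/False to "true"/"false" in one shot
mask = train_news.iloc[:, 0].isin(["mostly-true", "half-true", "true"])
train_news.iloc[:, 0] = np.where(mask, "true", "false")

print(train_news["label"].tolist())  # ['true', 'false', 'true']
```

This is also much faster than iterrows(), since the comparison and assignment happen as single vectorized operations.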
Related
In a given dataframe in pandas, is there a way to see all the booleans present in filt in the code below?
filt = dataframe['tag1'] =='ABC'
filt
TLDR
It's possible. I think you should use indexing, which is extensively described here. To be more specific, you can use boolean indexing.
Code should look like this
filt = df[df.loc[:,"tag1"] == 'ABC']
Now, what actually happens here:
df.loc[:,"tag1"] returns all rows (the : character) but limits the columns to just "tag1". Next, df.loc[:,"tag1"] == 'ABC' compares the returned rows with the value "ABC"; as a result, a grid of True/False values is created: True where the row was equal to "ABC", and so on. Now the grand finale: whenever you pass a grid of logical values to a dataframe, they are treated as indicators of whether or not to include each row in the result. So if the value at [0,0] in the passed grid is True, that row will be included in the result.
I understand it's hard to wrap one's head around this concept, but once you get it, it's super useful. The best way is to just play around with the iloc[] and loc[] functions.
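For example, a minimal sketch of boolean indexing (the column name "tag1" comes from the question; the data is made up):

```python
import pandas as pd

df = pd.DataFrame({"tag1": ["ABC", "XYZ", "ABC"]})

# The comparison produces a boolean Series, one True/False per row
filt = df.loc[:, "tag1"] == "ABC"
print(filt.tolist())   # [True, False, True]

# Passing that Series back into df keeps only the rows marked True
print(df[filt])
```

So to "see all the booleans present in filt", just inspect the Series itself (print it, or call filt.tolist()); to get the matching rows, pass it to df[...].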
I want to loop over all the rows in a df, checking that two conditions hold and, if they do, replace the value in a column with something else. I've attempted to do this two different ways:
if (sales.iloc[idx]['shelf'] in ("DRY BLENDS","LIQUID BLENDS")) & np.isnan(sales.iloc[idx]['traceable_blend']):
    sales.iloc[idx]['traceable_blend'] = False
and:
if (sales.iloc[idx]['shelf'] in ("DRY BLENDS","LIQUID BLENDS")) & (sales.iloc[idx]['traceable_blend'] == np.NaN):
    sales.iloc[idx]['traceable_blend'] = False
By including print statements we've verified that the if statement is actually functional, but no assignment ever takes place. Once we've run the loop, there are True and NaN values in the 'traceable_blend' column, but never False. Somehow the assignment is failing.
It looks like this might've worked:
if (sales.iloc[idx]['shelf'] in ("DRY BLENDS","LIQUID BLENDS")) & np.isnan(sales.iloc[idx]['traceable_blend']):
    sales.at[idx, 'traceable_blend'] = False
But I would still like to understand what's happening.
This, sales.iloc[idx]['traceable_blend'] = False, is chained indexing, and will almost never work: the assignment lands on a temporary copy, not on the original dataframe. In fact, you don't need to loop at all:
sales['traceable_blend'] = sales['traceable_blend'].fillna(sales['shelf'].isin(['DRY BLENDS', 'LIQUID BLENDS']))
Pandas offers two functions for checking for missing data (NaN or null): isnull() and notnull(). They return a boolean value. I suggest trying these instead of isnan().
You can also determine if any value is missing in your series by chaining .values.any()
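Putting the two answers together, here is a hedged sketch of the vectorized approach (the column names come from the question; the sample data is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: traceable_blend is missing wherever it was never set
sales = pd.DataFrame({
    "shelf": ["DRY BLENDS", "TEAS", "JUICES"],
    "traceable_blend": [np.nan, True, np.nan],
})

# isnull() marks the missing entries; chaining .values.any() tells you
# whether anything at all needs filling
print(sales["traceable_blend"].isnull().values.any())  # True

# fillna() touches only the NaN rows, and the isin() mask supplies the
# per-row fill value (True for blends, False otherwise)
sales["traceable_blend"] = sales["traceable_blend"].fillna(
    sales["shelf"].isin(["DRY BLENDS", "LIQUID BLENDS"])
)
print(sales["traceable_blend"].tolist())  # [True, True, False]
```

Because everything goes through a single column assignment, there is no chained indexing and the result is written back to the real dataframe.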
Trying to create multiple lists that are dependent on the previous list.
So for example list 1 would read a specific file and return either a number or the boolean false based on a comparison.
The second list would then compare the number that appears in the same position as those in the previous list (if the value from the previous list was not false) and return the value or false based on the same comparison as the first list
I created a function that carries out these comparisons and creates a list
def generic_state_machine(file,obs_nums):
    return file.ix[:,0][obs_nums] if file.ix[:,0][obs_nums] > 0.2 else False
Note: obs_nums looks at the position of the item in a list
I then created the lists that look at different files
session_to_leads = []
lead_to_opps = []
for i in range(1,len(a)):
    session_to_leads.append(generic_state_machine(file=a,obs_nums=i))
    lead_to_opps.append(generic_state_machine(file=b,obs_nums=i)) if session_to_leads != False else lead_to_opps.append(False)
Given
a = pd.DataFrame([0,0.9,0.6,0.7,0.8])
b = pd.DataFrame([0.7,0.51,0.3,0.7,0.2])
I managed to sort out the initial error I encountered. The only problem now is that the list lead_to_opps is not dependent on session_to_leads, so if there is a False value in position 1, lead_to_opps will not automatically return False in the same position. Assuming that random.uniform(0,1) generates 0.5 all the time, this is my current outcome:
session_to_leads = [False,0.9,0.6,0.7,0.8]
lead_to_opps = [0.7,0.51,False,0.7,False]
whereas my desired outcome would be
session_to_leads = [False,0.9,0.6,0.7,0.8]
lead_to_opps = [False,0.51,False,0.7,False]
"During handling of the above exception, another exception occurred:"
This is not an error; it basically means "based on the previous error, this new error occurred".
Please post the error before this one, it will help a lot.
Also, I did not get what [obs_nums] is.
It looks like
file.ix[:, 0][obs_nums]
Is the problem, assuming .ix behaves like .loc (it seems .ix is deprecated)
>>> help(pd.DataFrame.loc)
Allowed inputs are...
- A slice object with labels, e.g. 'a':'f'
  .. warning:: Note that contrary to usual python slices,
  **both** the start and the stop are included
It's a bit difficult to follow the indexing but do you need to slice at all? Would just:
file.loc[obs_nums]
return the number or Boolean you are looking for?
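As a sketch of that idea, here is the question's code with the deprecated .ix replaced by .iloc (a scalar lookup, so no slicing), the loop started at 0 so the lists cover every row, and the stages chained so that a False propagates; note the threshold is fixed at 0.2 here, so the exact values differ from the question's randomized run:

```python
import pandas as pd

a = pd.DataFrame([0, 0.9, 0.6, 0.7, 0.8])
b = pd.DataFrame([0.7, 0.51, 0.3, 0.7, 0.2])

def generic_state_machine(file, obs_nums):
    value = file.iloc[obs_nums, 0]   # scalar at row obs_nums, column 0
    return value if value > 0.2 else False

session_to_leads = []
lead_to_opps = []
for i in range(len(a)):
    s = generic_state_machine(file=a, obs_nums=i)
    session_to_leads.append(s)
    # Propagate False: only evaluate b when the previous stage passed
    lead_to_opps.append(generic_state_machine(file=b, obs_nums=i) if s is not False else False)

print(session_to_leads)  # [False, 0.9, 0.6, 0.7, 0.8]
print(lead_to_opps)      # [False, 0.51, 0.3, 0.7, False]
```

The key change is testing the current element s rather than the whole list (session_to_leads != False compares the list itself, which is always True).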
For a given series, e.g.
s = pd.Series([0,0,0])
I would like to check whether ALL elements in this series are equal to a specific value (we can use 0 in this example) and return TRUE if that is the case, and FALSE otherwise.
is there a handy way to do those in Pandas/numpy?
You should use the following syntax:
s = pd.Series([0,0,0])
print(s.eq(0).all())
True
Another way to do the same would be:
print((s==0).all())
I want to know if any of the values contained in a 1024-length array are greater than the value 1.2. I've found the median value of the array and it's 1.1, so I know the array contains values that are higher and lower than 1.1. The code that I'm using is shown below, and the resulting message I'm getting is "No signal is present".
if in1_norm.any()>=1.2: ## Comparison of array to threshold. Using
                        ## a generic value for now
    print "A signal is present"
else:
    print "No signal is present"
I've read in a previous post that any() evaluates as a value of 1 or "true", so I believe I'm not getting the correct result because the comparison is viewed as 1 >= 1.2, which is false. Is there any other way of doing this?
Thanks
The part in1_norm.any() >= 1.2 will not do what you intend. The any() function returns True if any of the array's items evaluates as True, otherwise it returns False. You need to compare your items with 1.2 first, then call any() on the result.
(in1_norm >= 1.2).any()
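For example, a minimal sketch with made-up data in place of the 1024-length array:

```python
import numpy as np

in1_norm = np.array([1.1, 0.9, 1.3, 1.0])  # hypothetical signal values

# Compare first, then reduce: the comparison yields a boolean array,
# and .any() checks whether at least one element passed the threshold
if (in1_norm >= 1.2).any():
    print("A signal is present")   # printed here, since 1.3 >= 1.2
else:
    print("No signal is present")
```

The original in1_norm.any() >= 1.2 reduces the array to a single True first (any nonzero value is truthy), and True >= 1.2 is always false, which is why "No signal is present" was printed every time.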