Using any() function for value of 1.5 - python

I want to know if any of the values contained in a 1024-length array are greater than 1.2. I've found the median value of the array and it's 1.1, so I know the array contains values both above and below that. The code I'm using is shown below, and the resulting message I'm getting is "No signal is present".
if in1_norm.any() >= 1.2:  ## Comparison of array to threshold.
                           ## Using a generic value for now
    print "A signal is present"
else:
    print "No signal is present"
I've read in a previous post that any() evaluates to 1, or "true", so I believe I'm not getting the correct result because the comparison is viewed as 1 >= 1.2, which is false. Is there another way of doing this?
Thanks

The part in1_norm.any() >= 1.2 will not do what you intended. The any() method returns True if any of the array's items evaluates as True, and otherwise returns False. You need to first compare your items with 1.2, then call any() on the result:
(in1_norm >= 1.2).any()
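To see the difference concretely, here is a small sketch with a made-up array (in1_norm here is just an illustrative stand-in for the asker's 1024-length data):

```python
import numpy as np

# Illustrative stand-in for the signal array
in1_norm = np.array([0.9, 1.1, 1.3, 0.8])

# Wrong: any() first collapses the whole array to a single True,
# so the comparison becomes True >= 1.2, i.e. 1 >= 1.2 -> False
print(in1_norm.any() >= 1.2)    # False

# Right: compare element-wise first, then ask whether any element passed
print((in1_norm >= 1.2).any())  # True, because of 1.3
```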

Related

How to drop elements from a series by using a Pandas for loop index as the index parameter for the drop function?

I am attempting to run a loop that filters certain elements based on a condition and removes those that match, as shown below:
for index, value in enumerate(some_dataset.iloc):
    if min(some_dataset.iloc[index]) >= some_dataset.iloc[0].values[index]:
        dataset_filtered = some_dataset.drop(index=index)
However, the value being passed to the index parameter in the variable index does not seem to behave as an integer. Instead, I receive the following error for the first value that attempts to be dropped:
KeyError: '[1] not found in axis'
Thinking it was a Series element, I attempted to cast it as an integer by setting index = index.astype(int) in the parameters for the drop() function, but in this case, it does seem to behave as an integer, producing the following error message:
AttributeError: 'int' object has no attribute 'astype'
To solve this problem, I looked at Anton Protopopov's answer to this question asked by jjjayn, but it did not help in my situation as specific elements were referenced in place of an iterating index.
For context, the if statement is in place to filter out any samples whose lowest values are at the 0th index (thus, where the min() value of a sample transect is equal to the value at index 0). Essentially, it would tell me that values in the sample only grow larger for increasing x, which here is wavelength. When I print a table to see which samples this applies to, the results are what I expect (100 nm wavelengths are the 0th index):
Sample Value (100 nm) Value (minima) Min (λ)
#2 0.0050 0.0050 100
#3 0.0060 0.0060 100
#14 0.0025 0.0025 100
...
So, with these results printed, I don't think the condition is the issue. Indeed, the first index that should be getting dropped is one I'd expect to be dropped: sample 2, which corresponds to [1], is getting passed, but I think the brackets are being passed along with it (at least, that's my guess). So, in sum, the issue is that a single-element list/series [n] is being passed to the index parameter instead of the integer n, which is what I want.
I reproduced your error. I had initially thought it was something related to an indexing error, which the following code would have forced to occur.
import pandas as pd

df = pd.DataFrame({'A': range(0, 3)})
for i, r in enumerate(df.iloc):
    print(i, type(r), df.iloc[0].values[i])
This code will throw an IndexError, complaining that the value in i is greater than the number of columns present. But the error that you report, a KeyError, only occurs if you try something like this:
>>> df.drop(index=15)
KeyError: '[15] not found in axis'
The reason is that when you drop with the index keyword, pandas is not dropping based on your iloc positions (i.e. 0 ... n) but on the standard loc index labels, which can be in arbitrary order, have gaps, etc. The underlying reason why the integer 15 is turned into [15] is that it is automatically wrapped into a list by the line that raises the error: raise KeyError(f'{list(labels[mask])} not found in axis').
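The label/position mismatch is easy to reproduce with a frame whose index labels are not 0 ... n (the frame below is illustrative, not from the question):

```python
import pandas as pd

# Frame whose positional (iloc) and label (loc) indexes differ
df = pd.DataFrame({'A': [10, 20, 30]}, index=[5, 6, 7])

print(df.iloc[1])       # fine: second row, selected by position

try:
    df.drop(index=1)    # drop() matches index *labels*, and 1 is not a label
except KeyError as err:
    print(err)          # '[1] not found in axis'
```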
Use only one form of indexer. In this instance, I would use for label, row in df.iterrows() rather than enumerating over the df.iloc property, since iterrows() yields the real index labels that drop() expects.
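A minimal sketch of that approach, with an illustrative frame and filter condition (the names and data are not from the question):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2]}, index=[10, 20, 30])

# Collect the real index labels of rows matching the condition,
# then drop them all at once; labels are what drop() expects.
to_drop = [label for label, row in df.iterrows() if row['A'] == 1]
dataset_filtered = df.drop(index=to_drop)

print(dataset_filtered)     # rows with labels 10 and 30 remain
```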

Positional comparison in multiple lists

I'm trying to create multiple lists, each dependent on the previous list.
So, for example, list 1 would read a specific file and return either a number or the boolean False, based on a comparison.
The second list would then look at the number in the same position as in the previous list (if the value from the previous list was not False) and return either the value or False, based on the same comparison as the first list.
I created a function that carries out these comparisons and creates a list
def generic_state_machine(file, obs_nums):
    return file.ix[:, 0][obs_nums] if file.ix[:, 0][obs_nums] > 0.2 else False
Note: obs_nums looks at the position of the item in a list
I then created the lists that look at different files
session_to_leads = []
lead_to_opps = []
for i in range(1, len(a)):
    session_to_leads.append(generic_state_machine(file=a, obs_nums=i))
    lead_to_opps.append(generic_state_machine(file=b, obs_nums=i)) if session_to_leads != False else lead_to_opps.append(False)
Given
a = pd.DataFrame([0,0.9,0.6,0.7,0.8])
b = pd.DataFrame([0.7,0.51,0.3,0.7,0.2])
I managed to sort out the initial error I encountered; the only problem now is that the list lead_to_opps is not dependent on session_to_leads, so if there is a False value in position 1, lead_to_opps will not automatically return False in the same position. So, assuming that random.uniform(0,1) generates 0.5 all the time, this is my current outcome:
session_to_leads = [False,0.9,0.6,0.7,0.8]
lead_to_opps = [0.7,0.51,False,0.7,False]
whereas my desired outcome would be
session_to_leads = [False,0.9,0.6,0.7,0.8]
lead_to_opps = [False,0.51,False,0.7,False]
"During handling of the above exception, another exception occurred:"
This is not an error, this is basically "based on the previous error, this new error occurred.
Please post the error before this one, it will help a lot.
Also, I did not got what is [obs_nums]
It looks like
file.ix[:, 0][obs_nums]
is the problem, assuming .ix behaves like .loc (.ix is deprecated).
>>> help(pd.DataFrame.loc)
Allowed inputs are...
- A slice object with labels, e.g. 'a':'f'
.. warning:: Note that contrary to usual python slices,
   **both** the start and the stop are included
It's a bit difficult to follow the indexing but do you need to slice at all? Would just:
file.loc[obs_nums]
return the number or Boolean you are looking for?
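To make the dependency between the two lists explicit, here is one possible sketch using .iloc in place of the deprecated .ix. The helper name, threshold, and inputs come from the question; the restructured loop (keeping the current position's result in a variable and checking it, rather than checking the whole list) is an assumption about the intent:

```python
import pandas as pd

a = pd.DataFrame([0, 0.9, 0.6, 0.7, 0.8])
b = pd.DataFrame([0.7, 0.51, 0.3, 0.7, 0.2])

def generic_state_machine(file, obs_nums):
    # .iloc[obs_nums, 0] selects row obs_nums of the first column (replaces .ix)
    value = file.iloc[obs_nums, 0]
    return value if value > 0.2 else False

session_to_leads = []
lead_to_opps = []
for i in range(len(a)):
    s = generic_state_machine(file=a, obs_nums=i)
    session_to_leads.append(s)
    # Depend on the value just computed for *this* position,
    # not on the truthiness of the whole list
    lead_to_opps.append(generic_state_machine(file=b, obs_nums=i) if s is not False else False)

print(session_to_leads)  # [False, 0.9, 0.6, 0.7, 0.8]
print(lead_to_opps)      # position 0 is now False because session_to_leads[0] is False
```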

Why are floating point numbers rounded automatically in python?

I am trying to assign a certain list position value to a certain value in another list. The value in the second list is 24.199999999999996. However, when I assign that value to a certain index in the first list, I get the value 24.2 when I print it. How do I keep the value as it is?
The value is only getting rounded when it is printed, for display purposes; the actual value in the list does not change. If you actually check it:
value == 24.2
False will be returned.
If you do not want this printing behaviour, you have to round your numbers to a reasonable precision yourself, e.g. with the built-in function round():
>>> round(1.33333333, 5)
1.33333
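A quick check of both claims, using the literal value from the question:

```python
value = 24.199999999999996

print(value == 24.2)     # False: the stored float is not exactly 24.2
print(round(value, 5))   # 24.2, a rounded copy; value itself is unchanged
```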

The truth value of a series is ambiguous: use

I get an error stating
ValueError: The truth value of a Series is ambiguous
for the if condition, with the following function:
for i, row in train_news.iterrows():
    if train_news.iloc[:, 0].isin(['mostly-true', 'half-true', 'true']):
        train_news.iloc[:, 0] = "true"
    else:
        train_news.iloc[:, 0] = "false"
The problem is in your if statement -
if train_news.iloc[:,0].isin(['mostly-true','half-true','true'])
Think about what this does -
Let's say train_news.iloc[:,0] looks like this -
mostly-true
not-true
half-true
Now if you do train_news.iloc[:,0].isin(['mostly-true','half-true','true']), this will check iteratively whether each element is present in the list ['mostly-true','half-true','true']
So, this will yield another pandas.Series which looks like this -
True
False
True
The if statement in Python, being a simpleton, expects a single bool value, and you are confusing it by providing a whole Series of booleans. So you need to use .all() or .any() (those are the usual fixes) at the end, depending on what you want.
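A small sketch of what that looks like; the column name and values below are made up for illustration:

```python
import pandas as pd

train_news = pd.DataFrame({'label': ['mostly-true', 'not-true', 'half-true']})

mask = train_news.iloc[:, 0].isin(['mostly-true', 'half-true', 'true'])

print(mask.any())   # True: at least one element matched
print(mask.all())   # False: 'not-true' did not match

# If the goal is per-row replacement rather than one yes/no answer,
# the boolean mask itself can drive a vectorized assignment instead of a loop:
train_news.iloc[:, 0] = mask.map({True: 'true', False: 'false'})
print(train_news['label'].tolist())   # ['true', 'false', 'true']
```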

Empty numpy array boolean contradiction [duplicate]

This question already has answers here:
How do Python's any and all functions work?
(10 answers)
Closed 4 years ago.
I accidentally found something in Numpy, which I can't really understand. If I check an empty Numpy array for any true value
np.array([]).any()
it will evaluate to false, whereas if I check all values to be true
np.array([]).all()
it evaluates to true. This appears weird to me since no value is true but at the same time all values are true...
This isn't a bug: all() returns True because all values are "not equal to zero", which is the criterion for counting as True. See the following note from the NumPy docs:
Not a Number (NaN), positive infinity and negative infinity evaluate
to True because these are not equal to zero.
Compare with the following:
In [102]: np.array([True]).all()
Out[102]: True
This would be equivalent to an array full of NaN, which would also return True.
The logic you are seeing is not specific to NumPy. This is a Python convention which has been implemented in NumPy:
any returns True if any value is True, and otherwise False.
all returns True if no value is False, and otherwise False.
See the pseudo-code in the docs to see the logic in pure Python.
In the case of np.array([]).any() or any([]), there are no True values, because you have a zero-length array or a zero-length list. Therefore, the result is False.
In the case of np.array([]).all() or all([]), there are no False values, for the same reason. Therefore, the result is True.
This is normal behavior.
It is not possible to find a value that is true, so np.array([]).any() is False.
Likewise, "every value in the array is true" holds for np.array([]).all(), so it is True (it is easy to check: there are no values in the array, so there is nothing to check).
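Both behaviours are easy to confirm, and the Python built-ins agree with NumPy here:

```python
import numpy as np

empty = np.array([])          # zero-length array

print(empty.any())            # False: no element is True
print(empty.all())            # True: no element is False (vacuous truth)

# The built-in functions follow the same convention on an empty list
print(any([]), all([]))       # False True
```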
