Why does boolean output turn into numerical output once I create a loop? - python

I am using the scikit-image library, and I got it working correctly for different input data files.
To explain it briefly, alpha_time is a level set function structured as [time,x,y,z], so alpha_time[0,:,:,:] is the level set function at time = 0.
Here is the code that works:
gaz0 = alpha_0[0,:,:,:] >= 0
labels_gaz0 = measure.label(gaz0)
props_gaz0 = measure.regionprops_table(labels_gaz0, properties=['label','area'])
df0 = pandas.DataFrame(props_gaz0)
This code works correctly.
Now, rather than repeating it each time, I created a for loop over time. I started with these lines (let's say I have 10 files, and the shape of gaz0 is (11,12,13)):
gaz = numpy.zeros((time,11,12,13))
for counter in range(0,10):
    gaz[counter,:,:,:] = alpha_time[counter,:,:,:] >= 0
I did not get an error; however, when I do print(gaz[counter,:,:,:]) I get a numerical matrix,
and when I do print(gaz0) I get boolean output: True where alpha_0[0,:,:,:] >= 0 and False elsewhere.
I think the output from the loop should look like the output I got before looping. Where is this difference coming from?

The problem is that gaz is defined as a NumPy array of floats (the default dtype of np.zeros), while gaz0 is a boolean mask produced directly by the comparison, so it contains booleans.
If you want gaz0 to contain floats instead of booleans, you need to cast it as follows:
gaz0 = gaz0.astype(np.float64)
Note that you can cast it to whatever dtype you need. True values are cast to 1 and False values to 0.
This is exactly the rule being applied implicitly in your assignment:
gaz[counter,:,:,:] = alpha_time[counter,:,:,:] >=0
Conversely, if you want booleans in your gaz array, you just need to create it with the proper dtype:
gaz = numpy.zeros((time,11,12,13), dtype=bool)
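A minimal sketch of the difference, using a small random array as a stand-in for alpha_time (the 2x2x2 grid, the name n_times, and the names gaz_float, gaz_bool and gaz_all are assumptions made for illustration, not part of the original code):

```python
import numpy as np

# Stand-in for the level set data: 3 time steps on a 2x2x2 grid (assumed shape).
n_times = 3
rng = np.random.default_rng(0)
alpha_time = rng.standard_normal((n_times, 2, 2, 2))

# Default dtype is float64, so the boolean mask is cast to 0.0 / 1.0 on assignment.
gaz_float = np.zeros((n_times, 2, 2, 2))
# With dtype=bool the mask is stored unchanged.
gaz_bool = np.zeros((n_times, 2, 2, 2), dtype=bool)

for counter in range(n_times):
    mask = alpha_time[counter] >= 0
    gaz_float[counter] = mask
    gaz_bool[counter] = mask

print(gaz_float.dtype)  # float64
print(gaz_bool.dtype)   # bool

# The loop is not strictly needed: the comparison is applied element-wise
# to the whole 4-D array at once.
gaz_all = alpha_time >= 0
```

If the per-time-step results are only needed as masks, comparing the whole array in one step, as in the last line, is probably the simplest option.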

Related

Assigning values to subset of an array defined by multiple conditions

I want to assign values to part of an array which is specified by multiple conditions.
For example:
import numpy as np
temp = np.arange(120).reshape((2,3,4,5))
mask = temp > 22
submask = temp[mask] < 43
(temp[mask])[submask] = 0 # assign zeros to the part of the array specified via mask and submask
print(temp) # notice that temp is unchanged - I want it to be changed
This example is artificial. Generally I want to do something more complex which involves a combination of indexing and boolean masks. Using a list index fails in similar circumstances. For example: temp[:,:,0,[1,3,2]]=0 is a valid assignment, but temp[:,:,0,[1,3,2]][mask]=0 will fail.
My understanding is that the assignment is failing because the complex indexing is prompting numpy to make a copy of the array object and assigning to that, rather than to the original array. So it isn't that the assignment is failing per se, just that the assignment is directed towards the "wrong" object.
I have tried using functions such as np.copyto and np.putmask but these also fail, presumably because the backend implementation mimics the original problem. For example: np.putmask(temp, mask, 0) will work but np.putmask(temp[mask], submask, 0) will not.
Is there a good way to do what I want to do?
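One possible approach, sketched here rather than offered as the definitive answer, is to fold the sub-mask back into a full-shape mask so that the final assignment uses a single indexing step. The name combined is introduced here for illustration:

```python
import numpy as np

temp = np.arange(120).reshape((2, 3, 4, 5))
mask = temp > 22
submask = temp[mask] < 43      # 1-D, one entry per True element of mask

# Fold the sub-mask back into a full-shape mask. Indexing on the left-hand
# side of an assignment writes in place, so no temporary copy is involved.
combined = mask.copy()
combined[mask] = submask

temp[combined] = 0             # a single indexing step, so temp itself changes
print(temp.ravel()[20:46])
```

For this artificial example the same effect is available more directly with temp[(temp > 22) & (temp < 43)] = 0; the folded mask is mainly useful when the second condition can only be computed on the already-selected elements.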

What Is the Function of Less Than (<) Operator in numpy Array?

I'm learning Python right now and I'm stuck on this line of code I found on the internet. I cannot understand what this line of code actually does.
Suppose I have this array:
import numpy as np
x = np.array([[1,5],[8,1],[10,0.5]])
y = x[np.sqrt(x[:,0]**2+x[:,1]**2) < 1]
print (y)
The result is an empty array. What I want to know is: what does y actually do? I've never encountered this kind of code before. It seems like the square brackets act like an if-conditional statement. If, instead of that code, I write this:
import numpy as np
x = np.array([[1,5],[8,1],[10,0.5]])
y = x[0 < 1]
print (y)
It will return exactly what x is (because zero IS less than one).
Assuming that it is a way to write if-conditional statement, I find it really absurd because I'm comparing an array with an integer.
Thank you for your answer!
In NumPy:
np.array([1,1,2,3,4]) < 2
is (very roughly) equivalent to something like:
[x<2 for x in [1,1,2,3,4]]
for vanilla Python lists. And as such, in both cases, the result would be:
[True, True, False, False, False]
The same holds true for some other functions, like addition, multiplication and so on. Broadcasting is actually a major selling point for Numpy.
Now, another thing you can do in NumPy is boolean indexing, which means providing an array of bools that are interpreted as 'Keep this value Y/N?'. So:
arr = np.array([1,1,2,3,4])
res = arr[arr<2]
# evaluates to:
# array([1, 1])
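Applied to the array from the question, the same two steps (element-wise comparison, then boolean indexing) can be sketched as follows. The names norms and mask and the threshold 6 are introduced only for illustration:

```python
import numpy as np

x = np.array([[1, 5], [8, 1], [10, 0.5]])

# Euclidean norm of each row, computed element-wise on whole columns.
norms = np.sqrt(x[:, 0]**2 + x[:, 1]**2)

mask = norms < 1          # one boolean per row
print(mask)               # [False False False]
print(x[mask].shape)      # (0, 2): no row has norm below 1

# A looser threshold (6 is an arbitrary value chosen for illustration)
# keeps the rows whose mask entry is True.
print(x[norms < 6])       # [[1. 5.]]
```

So the empty result in the question simply means that none of the three rows lies inside the unit circle.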
NumPy behaves differently when you index an array with a boolean than when you index it with an int.
From the docs:
This advanced indexing occurs when obj is an array object of Boolean type, such as may be returned from comparison operators. A single
boolean index array is practically identical to x[obj.nonzero()]
where, as described above, obj.nonzero() returns a tuple (of length
obj.ndim) of integer index arrays showing the True elements of obj.
However, it is faster when obj.shape == x.shape.
If obj.ndim == x.ndim, x[obj] returns a 1-dimensional array filled
with the elements of x corresponding to the True values of obj. The
search order will be row-major, C-style. If obj has True values at
entries that are outside of the bounds of x, then an index error will
be raised. If obj is smaller than x it is identical to filling it with
False.
When you index an array using booleans, you are telling NumPy to select the data corresponding to True, so array[True] is not the same as array[1]. NumPy interprets a bare True as a zero-dimensional boolean array, which, based on how masks work, is the same as selecting all the data.
Therefore:
x[True]
will return the full array (wrapped in a new leading axis of length 1), just as
x[False]
will return an empty array.
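A short check of this behaviour on the array from the question, as observed with current NumPy releases (older versions may differ):

```python
import numpy as np

x = np.array([[1, 5], [8, 1], [10, 0.5]])

print(x[True].shape)   # (1, 3, 2): every element, plus a new leading axis
print(x[False].shape)  # (0, 3, 2): nothing selected
print(x[1])            # [8. 1.]: an integer index picks the second row
print(x[0 < 1].shape)  # (1, 3, 2): 0 < 1 is evaluated to True before indexing
```

The last line is why x[0 < 1] in the question printed all of x: Python reduces 0 < 1 to True first, and NumPy never sees a comparison.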

Using np.any() to capture outliers

I am trying to pull all upper and lower outliers from a data frame. I can do it in separate lines, assigning a variable to the uppers (ex. hi_outs = (sepal_outliers > 4.05)) and another variable to the lowers (ex. lo_outs = (sepal_outliers < 2.05)). But I wanted to try and grab both in one variable.
sepal_outliers = x[:,1]
outliers = np.any(sepal_outliers < 2.05, sepal_outliers > 4.05)
df[outliers]
I'm not sure why I've gotten the following error.
TypeError: only integer scalar arrays can be converted to a scalar index
any thoughts? How might I make this work? I'll keep reading up in the meantime...
You probably want to use NumPy's boolean indexing:
outliers = sepal_outliers[(sepal_outliers < 2.05) | (sepal_outliers > 4.05)]
That is, construct a boolean array holding True or False for each element according to the condition you set, and use it to index into the same array, sepal_outliers.
Note that for this, you need sepal_outliers to be a NumPy array rather than a common-or-garden Python list.
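As for the error itself: np.any treats its second positional argument as axis, so passing a second array there is the likely source of the TypeError. A small sketch of the element-wise alternative, with made-up values standing in for x[:,1] and the names is_outlier and same_mask introduced for illustration:

```python
import numpy as np

# Hypothetical sepal widths standing in for x[:, 1] from the question.
sepal_outliers = np.array([3.5, 2.0, 4.4, 3.0, 2.9, 4.1])

# Element-wise OR of the two conditions; the parentheses are required
# because | binds more tightly than < and >.
is_outlier = (sepal_outliers < 2.05) | (sepal_outliers > 4.05)
# np.logical_or is the function form of the same operation.
same_mask = np.logical_or(sepal_outliers < 2.05, sepal_outliers > 4.05)

print(is_outlier)                  # [False  True  True False False  True]
print(sepal_outliers[is_outlier])  # [2.  4.4 4.1]
```

To select rows of df rather than values, the mask itself (is_outlier here) is what would go inside df[...], assuming df has one row per element of sepal_outliers.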

Creating logical array from numpy array

I have a very large numpy array in Python full of meteorological data. In order to observe flawed data, I would like to look at every value and test it to see if it is less than -1. Eventually I would like to represent this with a logical array of 0's and 1's, with 1 representing indices where the value is less than -1 and zeros representing all others. I have tried using the numpy.where function as follows:
logarr = np.where(metdat < -1)
which returns the original array and the array of zeros for when this condition is true (around 200 times). I have tried using the numpy.where syntax laid out on SciPy.org:
logarr = np.where(metdat < -1 [1,0])
but my program dislikes the syntax. What am I doing wrong or would anyone recommend a better way of going about this?
Thanks,
jmatt
This works for your case; it converts the boolean result directly to int:
(metdat < -1).astype(int)
Or for np.where, the syntax needs to be:
np.where(metdat < -1, 1, 0)
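Both forms can be compared on a small made-up array (the values and the names flags_cast, flags_where, rows and cols are assumptions for illustration):

```python
import numpy as np

# Small stand-in for the meteorological array (values are made up).
metdat = np.array([[0.5, -3.0, 2.1], [-1.0, -1.5, 7.0]])

flags_cast = (metdat < -1).astype(int)     # boolean mask cast to 0/1
flags_where = np.where(metdat < -1, 1, 0)  # same result via three-argument where

print(flags_cast)
# [[0 1 0]
#  [0 1 0]]

# With a single argument, np.where returns index arrays instead of a 0/1 array.
rows, cols = np.where(metdat < -1)
print(rows, cols)  # [0 1] [1 1]
```

The single-argument form explains the output seen in the question: it reports where the condition holds rather than building a 0/1 array.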

Getting numbers within a range from a gaussian_kde_resample array

I have a gaussian_kde.resample array. I don't know if it is a numpy array so that I can use numpy functions.
I had the data 0<x<=0.5 of 3000 variables and I used
kde = scipy.stats.gaussian_kde(x) # can also mention bandwidth here (x,bandwidth)
sample = kde.resample(100000) # returns 100,000 values that follow the prob distribution of "x"
This gave me a sample of data that follows the probability distribution of "x". But the problem is, no matter what bandwidth I try to select, I get very few negative values in my "sample". I only want values within the range 0 < sample <= 0.5
I tried to do:
sample = np.array(sample) # to convert this to a numpy array
keep = 0<sample<=0.5
sample = sample[keep] # using the binary conditions
But this does not work! How can I remove the negative values in my array?
Firstly, you can check what type it is by calling type() in Python:
>>> x = kde.resample(10000)
>>> type(x)
numpy.ndarray
Secondly, write the condition explicitly as two comparisons joined with &, each in its own parentheses:
>>> x
array([[ 1.42935658, 4.79293343, 4.2725778 , ..., 2.35775067, 1.69647609]])
>>> x.size
10000
>>> y = x[(x>1.5) & (x<4)]
which, as you can see, applies both conditions and keeps only the values between 1.5 and 4:
>>> y
array([ 2.95451084, 2.62400183, 2.79426449, ..., 2.35775067, 1.69647609])
>>> y.size
5676
I know I'm answering about 3 years late, but this may be useful for future reference.
The catch is that while kde.resample(100000) technically returns a NumPy array, this array is two-dimensional: it contains the samples as another array(!) in its first row, and that gets in the way of attempts to use indexing to get subsets of the sample. To get the array that the resample() method probably should have returned all along, do this instead:
sample = kde.resample(100000)[0]
The array variable sample should then have all 100000 samples, and indexing this array should work as expected.
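A sketch of the row extraction followed by the range filter. To keep it self-contained, a random single-row array stands in for the output of kde.resample; the names resampled, keep and filtered, and the distribution parameters, are assumptions for illustration:

```python
import numpy as np

# Stand-in for kde.resample(8) on one-dimensional data: a 2-D array
# with a single row (random values are used here instead of a real KDE).
rng = np.random.default_rng(0)
resampled = rng.normal(0.25, 0.2, size=(1, 8))

sample = resampled[0]                 # 1-D array of the drawn values
print(resampled.shape, sample.shape)  # (1, 8) (8,)

# Keep only the values in the range 0 < sample <= 0.5.
keep = (sample > 0) & (sample <= 0.5)
filtered = sample[keep]
print(filtered.size <= sample.size)   # True
```

Note that filtering discards draws, so filtered will generally hold fewer values than were requested from resample.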
Why SciPy does it this way, I don't know. This misfeature doesn't even appear to be documented.
First of all, the return value of kde.resample is a numpy array, so you do not need to reconvert it.
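The problem lies in the line
keep = 0 < sample <= 0.5
It does not do what you would think: Python evaluates the chained comparison as (0 < sample) and (sample <= 0.5), and the "and" raises a ValueError for an array with more than one element. Try:
keep = (0 < sample) * (sample <= 0.5)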
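A quick way to check how the chained comparison behaves on an array, using a few made-up values; here & is used for the element-wise AND, which gives the same mask as * for boolean arrays:

```python
import numpy as np

sample = np.array([-0.1, 0.2, 0.5, 0.7])

# The chained form expands to (0 < sample) and (sample <= 0.5); `and` needs
# a single truth value, which a multi-element array cannot provide.
try:
    keep = 0 < sample <= 0.5
except ValueError as err:
    print("chained comparison failed:", err)

# Element-wise AND of two separate masks works.
keep = (0 < sample) & (sample <= 0.5)
print(sample[keep])  # [0.2 0.5]
```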
