I'm plotting diet information using matplotlib, where the x-axis represents a range of dates, and the y-axis represents the number of calories consumed. Nothing too complicated, but there is one snag: not all dates have calorie information, and it would make most sense to leave those out rather than do some sort of interpolation/smoothing.
I found several good examples of using numpy masks for such situations, but it seems I'm not getting something straight, as the code that I think should produce the results I want doesn't change anything.
Have a look:
calories_list_ma = np.ma.masked_where(calories_list == 0, calories_list)
plt.plot(datetimes_list, calories_list_ma, marker = 'x', color = 'r', ls = '-')
Which produces this:
I just want there to be an unplotted gap in the line for 9-23.
And actually, I know my use of masked_where must be incorrect, because when I print calories_list_ma.mask, the result is just 'False', not an array of True/False values showing which entries are masked, as it should be.
Can someone set me straight?
Thanks so much!
I'm guessing from the name that your calories_list is a list. If it is a list, calories_list == 0 will return a single value, namely False, since the list as a whole does not equal 0. masked_where will then dutifully set the mask to False, resulting in an unmasked copy of your list.
You need to do calories_list = np.array(calories_list) first to make it into a numpy array. Unlike lists, numpy arrays have the "broadcasting" feature whereby calories_list == 0 compares each element individually to zero.
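For example, reusing the names from your snippet (so calories_list and datetimes_list are assumed to already exist), something like this should give a per-element mask and leave a gap at the zero-calorie dates:

import numpy as np
import matplotlib.pyplot as plt

calories_list = np.array(calories_list)    # convert the plain list to an ndarray first
calories_list_ma = np.ma.masked_where(calories_list == 0, calories_list)
print(calories_list_ma.mask)               # now an array of booleans, one per element
plt.plot(datetimes_list, calories_list_ma, marker='x', color='r', ls='-')
plt.show()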
Try using np.ma.masked_values, which masks the entries equal to a given value (and converts the input to an array in the process):
calories_list_ma = np.ma.masked_values(calories_list, 0)
I am using the skimage library, and it works correctly for different input data files.
Here is the code that works:
To explain it briefly, alpha_time is a level set function structured as [time,x,y,z], so alpha_time[0,:,:,:] is the level set function at time = 0.
import pandas
from skimage import measure

gaz0 = alpha_0[0,:,:,:] >= 0                     # boolean mask of the gas phase at time 0
labels_gaz0 = measure.label(gaz0)                # label the connected regions of the mask
props_gaz0 = measure.regionprops_table(labels_gaz0, properties=['label','area'])
df0 = pandas.DataFrame(props_gaz0)
This code works correctly.
Now, rather than repeating this for each time step, I want to create a for loop over time. I started with this (let's say I have 10 files, and that the shape of gaz0 is (11,12,13)):
gaz = numpy.zeros((time,11,12,13))
for counter in range(0,10):
    gaz[counter,:,:,:] = alpha_time[counter,:,:,:] >= 0
I did not get an error, but when I do print(gaz[counter,:,:,:]) I get a numerical matrix, ....
whereas when I do print(gaz0) I get a boolean output: True where alpha_0[0,:,:,:] >= 0 and False elsewhere.
I think the output from the loop should look like the example before looping, but I can't figure out where this difference is coming from.
The problem is that gaz is defined as a numpy array of floats (the default dtype for np.zeros), while gaz0 is a boolean mask and therefore contains booleans.
If you want gaz0 to contain floats instead of booleans, you need to cast it as follows:
gaz0 = gaz0.astype(np.float64)
Note that you can cast it to whatever dtype you need. True values are cast to 1 and False to 0.
This is exactly the rule being applied implicitly in your assignment:
gaz[counter,:,:,:] = alpha_time[counter,:,:,:] >=0
If, instead, you want booleans in your gaz numpy array, you just need to define it with the proper dtype:
gaz = numpy.zeros((time,11,12,13), dtype=bool)
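A minimal sketch of the loop with that dtype, reusing alpha_time from the question (so it is assumed to already exist with shape (time, 11, 12, 13)):

import numpy

time = 10                                            # number of time steps, as in the question
gaz = numpy.zeros((time, 11, 12, 13), dtype=bool)    # preallocate a boolean array
for counter in range(time):
    gaz[counter, :, :, :] = alpha_time[counter, :, :, :] >= 0
print(gaz.dtype)                                     # bool, matching gaz0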
I believe that my problem is really straightforward and there must be an easy way to solve it; however, as I am quite new to Python, I could not sort it out on my own.
I made up the following example, which is naturally a much simpler scenario than what I have been working on, so I am looking for a general solution that applies to other cases as well. So, please, consider:
import numpy as np
x = np.array([300,300,450,500,510,750,600,300])
x_validate1 = np.array([0,27,3,4,6,4,13,5])
x_validate2 = np.array([0,27,3,4,6,4,3,5])
x_validate3 = np.array([0,7,3,14,6,16,6,5])
x_validate4 = np.array([0,3,3,5,7,4,9,5])
What I need is to extract the maximum value in x whose index in the other arrays (x_validate1, 2, 3, 4) corresponds to elements between 5 and 10. In other words, the maximum of x alone would logically be 750, but with this condition applied the script should return 510, since only for that index do all the other arrays meet the condition.
Hope that I managed to be succinct and precise. I would really appreciate your help on this one!
Here's one approach:
# combine all the above arrays into one
a = np.array([x_validate1, x_validate2, x_validate3, x_validate4])
# check in which columns all rows satisfy the condition
m = ((a > 5) & (a < 10)).all(0)
# index x, and find the maximum value
x[m].max()
# 510
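One caveat worth noting (my addition, not from the original answer): if no column satisfies the condition, x[m] is empty and .max() raises a ValueError, so a small guard can help:

if m.any():
    result = x[m].max()
else:
    result = None   # no column met the condition; use whatever sentinel suits your case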
In Python, when using np.empty(), for example np.empty((3,1)), we get an array of shape (3,1), but in reality it is not empty: it contains arbitrary leftover values (e.g., tiny numbers like 1.7e-315). Is it possible to create an array that really is empty/has no values but has the given dimensions/shape?
I'd suggest using np.full to choose the fill-value directly...
x = np.full((3, 1), None, dtype=object)
... of course the dtype you chose kind of defines what you mean by "empty"
I am guessing that by empty, you mean an array filled with zeros.
Use np.zeros() to create an array with zeros. np.empty() just allocates the array, so the numbers in there are garbage; it exists to avoid even the cost of setting the values to zero. But it is generally safer to use np.zeros().
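A quick illustration of the difference:

import numpy as np

a = np.zeros((3, 1))   # guaranteed to contain 0.0
b = np.empty((3, 1))   # uninitialized memory; contents are arbitrary leftovers
print(a)
print(b)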
I suggest using np.nan, as shown below:
yourdata = np.empty((3,1)) * np.nan
(Or)
you can use np.zeros((3,1)), but it will fill all the values with zero, which does not convey "no value" very intuitively. I feel that using np.nan is the better practice here.
It's all up to you and depends on your requirements.
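As a small usage sketch (the np.full variant is my addition, not part of the answer above):

import numpy as np

yourdata = np.empty((3, 1)) * np.nan    # every element becomes nan
# a more direct equivalent:
yourdata = np.full((3, 1), np.nan)
print(np.isnan(yourdata).all())         # True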
I want to select one row of an array by the median value in one of the columns.
My method does not work the way I expect it to work, and it could be related to the representation/precision of the value returned by the numpy.median() function.
Here is a minimal working example and a workaround that I found:
import numpy as np
# Create an array with random numbers
some_array = np.random.rand(100)
# Try to select
selection = (some_array == np.median(some_array))
print len(some_array[selection]),len(some_array[~selection]) # Gives: 0, 100 -> selection fails
# Work-around
abs_dist_from_median = np.abs(some_array-np.median(some_array))
selection = (abs_dist_from_median == np.min(abs_dist_from_median))
print len(some_array[selection]),len(some_array[~selection]) # Gives: 1, 99 -> selection succeeded
It seems that the np.median() function returns a different representation of the number, thereby leading to a mismatch in the selection.
I find this behaviour strange, since by definition the median value of an array should be contained in the array. Any help/clarification would be appreciated!
First, your array has an even number of values; for an even-length array such as [1, 2, 3, 4], the median is (2+3)/2, which is neither 2 nor 3, so it is not contained in the array. If you change 100 to 101, it works properly. So your second approach is more appropriate for your purpose.
However, the best solution seems to be to use argsort:
some_array[some_array.argsort()[len(some_array) // 2]]
Also, do not use == when comparing two float values; use np.isclose instead.
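A small runnable sketch of both suggestions (note that for an even-length array the argsort shortcut returns the upper of the two middle values rather than their average):

import numpy as np

some_array = np.random.rand(101)        # odd length, so the median is an actual element
median_value = np.median(some_array)

# compare with a tolerance instead of ==
selection = np.isclose(some_array, median_value)
print(some_array[selection])

# argsort shortcut: the middle element after sorting
print(some_array[some_array.argsort()[len(some_array) // 2]])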
I have this:
from numpy import random

npoints = 10
vectorpoint = random.uniform(-1, 1, [1, 2])       # one random 2-D point in [-1, 1]^2
experiment = random.uniform(-1, 1, [npoints, 2])  # npoints random 2-D points
and now I want to create an array with dimensions [1,npoints].
I can't figure out how to do this.
For example table=[1,npoints]
Also, i want to evaluate this:
for i in range(1,npoints):
    if experiment[i,0]**2 + experiment[i,1]**2 > 1:
        table[i] = 0
    else:
        table[i] = 1
I am trying to evaluate experiment[:,0]**2 + experiment[:,1]**2, and if it is > 1 the corresponding element of table becomes 0, otherwise it becomes 1.
The table should give me something like [1,1,1,1,0,1,0,1,1,0].
I can't try it because I can't create the array "table".
Also, is there a better way (with list comprehensions, say) to produce this?
Thanks!
Try:
table = (experiment[:,0]**2 + experiment[:,1]**2 <= 1).astype(int)
You can leave off the astype(int) call if you're happy with an array of booleans rather than an array of integers. As Joe Kington points out, this can be simplified to:
table = 1 - (experiment**2).sum(axis=1).astype(int)
If you really need to create the table array up front, you could do:
table = zeros(npoints, dtype=int)
(assuming that you've already imported zeros from numpy). Then your for loop should work as written.
Aside: I suspect that you want range(npoints) rather than range(1, npoints) in your for statement.
Edit: just noticed that I had the 1s and 0s backwards. Now fixed.
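Putting it together as a minimal runnable sketch (using numpy's random.uniform, as in the question):

import numpy as np

npoints = 10
experiment = np.random.uniform(-1, 1, (npoints, 2))   # npoints random 2-D points in [-1, 1]^2

# 1 where the point lies inside the unit circle, 0 otherwise
table = (experiment[:, 0]**2 + experiment[:, 1]**2 <= 1).astype(int)
print(table)   # e.g. [1 1 1 1 0 1 0 1 1 0]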