What is the expected effect of where in numpy's negative?

As far as I understand the documentation of numpy's negative function, its where option allows you to leave some array components unnegated:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1.,  2.],
       [-3.,  4.],
       [-5.,  6.]])
However, when I try it, it seems that those values are (almost) zeroed instead:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1.00000000e+000,  6.92885436e-310],
       [-3.00000000e+000,  6.92885377e-310],
       [-5.00000000e+000,  6.92885375e-310]])
So how should I interpret the where option?

The documentation describes where like this:
Values of True indicate to calculate the ufunc at that position, values of False indicate to leave the value in the output alone.
Let's try an example using the out parameter:
x = np.ones(3)
np.negative(np.array([4.,5.,6.]), where=np.array([False,True,False]), out=x)
This sets x to [1., -5., 1.], and returns the same.
This makes some amount of sense once you realize that "leave the value in the output alone" literally means the output value is "don't care", rather than "same as the input" (the latter interpretation was how I read it the first time, too).
The problem comes in when you specify where but not out. Apparently the "ufunc machinery" (which is not visible in the implementation of np.negative()) creates an empty output array, meaning the values are indeterminate. So the locations at which where is False will have uninitialized values, which could be anything.
This seems pretty wrong to me, but there was a NumPy issue filed about it last year, and closed. It seems unlikely to change, so you'll have to work around it (e.g. by creating the output array yourself using zeros).
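For example, a minimal sketch of that workaround: pre-fill the output yourself so the positions where where is False hold a defined value (here a copy of the inputs; np.zeros_like(x) would give zeros there instead):
import numpy as np

x = np.array([[1., 2.], [3., 4.], [5., 6.]])
out = x.copy()                                 # defined values where `where` is False
np.negative(x, where=[True, False], out=out)
# out is now:
# array([[-1.,  2.],
#        [-3.,  4.],
#        [-5.,  6.]])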

Related

How to modify a Numpy 2D array in every row but specific column indexes?

This is what I am doing now to achieve what I want.
In:
a=numpy.zeros((3,2))
a[range(a.shape[0]),[0,0,1]] = 1
a
Out:
array([[ 1.,  0.],
       [ 1.,  0.],
       [ 0.,  1.]])
As you can see, I used range function to select all the rows in a. Is there any other cleaner way to select every row?
This specific thing can be done more cleanly, as you're just constructing a one-hot array, and there are good answers for "clean" ways to do that in numpy. I recommend this one by @MartinThoma:
a = np.eye(2)[[0,0,1]]
Also, most machine learning packages have their own one-hot encoding method that will be even more efficient and "clean" than this one, as machine learning is the most common use of one-hot encoding.
However, in general, eliminating the range coordinate when doing this sort of fancy indexing is not possible. There really isn't a clear, explicit way to represent that operation that wouldn't be confusing when you want to extend the representation beyond 2 dimensions, and working in more than 2 dimensions at a time is the whole point of numpy.
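As a minimal sketch (same 3x2 example), np.arange is the usual spelling of that row coordinate, and the np.eye trick gives the same one-hot result:
import numpy as np

a = np.zeros((3, 2))
a[np.arange(a.shape[0]), [0, 0, 1]] = 1   # explicit row coordinate
b = np.eye(2)[[0, 0, 1]]                  # one-hot rows via the identity matrix
print(np.array_equal(a, b))               # True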

Testing the equality of two numpy 2d arrays

I have been trying to copy the individual elements from one 2D array to another. My code is as follows:
tp_matrix = np.array(tp_matrix)
my_array = np.empty(shape = (tp_matrix.shape))
for x in range(tp_matrix.shape[0]):
    for y in range(tp_matrix.shape[1]):
        my_array[x][y] = tp_matrix[x][y]
if(np.array_equal(my_array, tp_matrix)):
    print('Equal')
else:
    print('Not equal')
However the two arrays are not equal for some reason. What is the problem here and what can I do to solve it?
I cannot use numpy's copy function, as I want to make modifications later to some of the elements of my_array, with the other values staying the same as those of tp_matrix.
Edit: On running the code I get the following message:
FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
Does this mean there is something wrong with the dataset (tp_matrix)?
Edit 2: I have tried the allclose and isclose functions but I get this error:
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The data is stored as floats. Also it is a bit large (399 x 5825).
Edit 3: Solved. I had to reinstall python.
Use np.allclose to test the (almost) equality of float arrays, because of the way float numbers are represented in a computer.
For more details, you could read, for instance, "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
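For instance, a small sketch of the difference between exact and tolerant comparison:
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])
print(np.array_equal(a, b))   # False: 0.1 + 0.2 is not exactly 0.3
print(np.allclose(a, b))      # True: equal within the default tolerances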
I tried to mimic what you are experiencing and did the following:
one = np.array([1,2,3])
two = np.array([one,one,one])
three = np.empty(shape=(two.shape))
for x in range(two.shape[0]):
    for y in range(two.shape[1]):
        three[x][y] = two[x][y]
Printing the contents of 'two' and 'three' gives the following result
print(three)
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])
print(two)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
Although for this small example numpy returns True if I test equality using np.array_equal, it is possible that rounding errors cause the test to be False in your case.
A workaround for this could be the following test:
sum(sum(two==three)) == two.shape[0]*three.shape[1]
Although there are probably more efficient ways.
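One such alternative (still an exact comparison, so np.allclose remains the safer test for floats) is this small sketch:
print((two == three).all())         # True only if every element matches exactly
print(np.array_equal(two, three))   # equivalent, and also checks the shapes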

How do I efficiently fill a numpy ndarray with a function called at each row and column?

I want to create a numpy ndarray by specifying a function of row and column to define the values.
For example, I'd like to be able to do something like this (completely incorrect syntax, but you should get the idea):
>>> np.ndarray((2,3), lambda r,c: 3*r+c)
[[ 0 1 2]
 [ 3 4 5]]
Searching online has yielded nothing, though I've had trouble thinking of exactly how to search for it...
Right now I've got (the equivalent of) the following code, but it's horribly inefficient:
def ndarrayFuncFill(size, func):
    z = np.empty(size)
    for r in np.arange(size[0]):
        for c in np.arange(size[1]):
            z[r][c] = func(r,c)
    return z
>>> ndarrayFuncFill((2,3), lambda r,c: 3*r+c)
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.]])
Unfortunately, the function I particularly want to use this with right now is not something I can easily rewrite as a ufunc or anything like that. I pretty much have to treat it as a black box.
The function I'm actually interested in using this with (not something so simple as the above lambda), is not one I have permission to post. However, it essentially does interpolation on a lookup table. So you give it a row and column, and then it translates that to indices in a lookup table -- but there's some tricky stuff going on where it's not just a one-to-one lookup, it sometimes does a combination of 'nearby' values, and that sort of thing. So it's not the most efficient function either, but I'd rather not have too many other silly sources of waste like nested for-loops.
Any suggestions?
You could try using index arrays.
For your simple example, using np.indices you could do something like:
import numpy as np
r, c = 2, 3
a = np.empty((r, c))
b = np.indices((r, c))
a[b[0], b[1]] = 3 * b[0] + b[1]
So then we have:
>>> a
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.]])
The fastest solution for your particular example is np.arange(6).reshape(2, 3). In general you could use np.vectorize for 1D arrays and then reshape if necessary, but this isn't optimized ("The implementation is essentially a for loop.").
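As a minimal sketch of treating the function as a black box (black_box here is just a placeholder for the real lookup-table interpolation), np.vectorize can be applied directly to index grids from np.indices, though it still loops in Python under the hood:
import numpy as np

def black_box(r, c):
    return 3 * r + c              # stand-in for the real per-element function

rows, cols = np.indices((2, 3))   # row and column index grids
result = np.vectorize(black_box)(rows, cols)
# array([[0, 1, 2],
#        [3, 4, 5]])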

index Nd array with list of length N in python

This seems like a simple problem but I can't figure it out.
I have a numpy array of an arbitrary dimension (rank) N. I need to set a single element in the array to 0 given by the index values in a 1D array of length N. So for example:
import numpy as np
A=np.ones((2,2,2))
b=[1,1,1]
so at first I thought
A[b]=0
would do the job, but it did not.
If I knew A had a rank of 3 it would be a simple case of doing this:
A[b[0],b[1],b[2]]=0
but the rank of A is not known until runtime. Any thoughts?
Indexing in numpy has somewhat complicated rules. In your particular case this warning applies:
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
Also recognize that x[[1,2,3]] will trigger advanced indexing, whereas x[[1,2,slice(None)]] will trigger basic slicing.
You want simple indexing (addressing a particular element), so you'll have to cast your list to a tuple:
A[tuple(b)] = 0
Result:
>>> A
array([[[ 1.,  1.],
        [ 1.,  1.]],

       [[ 1.,  1.],
        [ 1.,  0.]]])
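A small sketch of the difference between the list and the tuple forms of indexing:
import numpy as np

A = np.ones((2, 2, 2))
b = [1, 1, 1]
print(A[b].shape)   # (3, 2, 2): the list triggers advanced indexing along axis 0
A[tuple(b)] = 0     # the tuple addresses the single element A[1, 1, 1]
print(A[1, 1, 1])   # 0.0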

In numpy, how can I assign an array of size N to a larger array with a boolean mask

I want to assign a matrix N values long to N entries in a column of a much longer matrix, where a boolean mask selects the N entries. I am doing it wrong, because the large matrix remains unchanged. Please see the next example:
Each entry in a large matrix contains a timestamp, a valid flag and an empty field to be filled with the time since the previous valid entry. I want to compute these time lapses:
a = np.array([(0,0,0),
              (1,0,0),
              (2,1,0),
              (3,1,0),
              (4,1,0),
              (5,0,0),
              (6,0,0),
              (7,0,0),
              (8,1,0),
              (9,1,0)],
             dtype=np.dtype([('time', '<i4'), ('ena', '|b1'), ('elapsed', '<i4')]))
To calculate the time difference with the previous valid entries:
elapsed = a[a['ena']]['time'][1:] - a[a['ena']]['time'][0:-1]
elapsed will be [1, 1, 4, 1] (which is what I wanted).
Now I want to write elapsed seconds to the original array:
a[a['ena']]['elapsed'][1:] = elapsed
there is no warning or error, but a remains unchanged, although I expected:
a = np.array([
    (0,0,0),
    (1,0,0),
    (2,1,0),
    (3,1,1),
    (4,1,1),
    (5,0,0),
    (6,0,0),
    (7,0,0),
    (8,1,4),
    (9,1,1)])
How should I do it? Many thanks.
The numpy folks have done some amazing magic to make fancy indexing (which includes boolean indexing) work as well as it does. This magic is pretty impressive, but it still cannot handle fancy indexing followed by more indexing on the left side of the assignment, for example a[fancy][index2] = something. Here is a simple example:
>>> a = np.zeros(3)
>>> b = np.array([True, False, True])
>>> a[b][1:] = 2
>>> a
array([ 0.,  0.,  0.])
>>> a[1:][b[1:]] = 2
>>> a
array([ 0.,  0.,  2.])
I think this is a bug, and I wonder if it is possible to catch it and raise an error instead of letting it silently fail. But getting back to your question, the easiest solution seems to be to replace:
a[a['ena']]['elapsed'][1:] = elapsed
with:
tmp = a['ena'][1:]
a['elapsed'][1:][tmp] = elapsed
or maybe:
a['elapsed'][1:][a['ena'][1:]] = elapsed
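Note that with this sample data a['ena'][1:] has five True entries while elapsed holds only four values (the first valid entry has no predecessor), so the mask and the values still need to line up. A minimal sketch of one way to do the whole thing with integer indices from np.flatnonzero, which keeps the assignment on the original array:
import numpy as np

a = np.array([(0,0,0), (1,0,0), (2,1,0), (3,1,0), (4,1,0),
              (5,0,0), (6,0,0), (7,0,0), (8,1,0), (9,1,0)],
             dtype=np.dtype([('time', '<i4'), ('ena', '|b1'), ('elapsed', '<i4')]))

valid = np.flatnonzero(a['ena'])        # indices of valid entries: [2 3 4 8 9]
elapsed = np.diff(a['time'][valid])     # [1 1 4 1]
a['elapsed'][valid[1:]] = elapsed       # every valid entry except the first
# a['elapsed'] is now [0 0 0 1 1 0 0 0 4 1]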
