Index an N-d array with a list of length N in Python

This seems like a simple problem but I can't figure it out.
I have a numpy array of an arbitrary dimension (rank) N. I need to set a single element in the array to 0 given by the index values in a 1D array of length N. So for example:
import numpy as np
A=np.ones((2,2,2))
b=[1,1,1]
so at first I thought
A[b]=0
would do the job, but it did not.
If I knew A had a rank of 3 it would be a simple case of doing this:
A[b[0],b[1],b[2]]=0
but the rank of A is not known until runtime, any thoughts?

Indexing in numpy has somewhat complicated rules. In your particular case this warning applies:
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
Also recognize that x[[1,2,3]] will trigger advanced indexing, whereas x[[1,2,slice(None)]] will trigger basic slicing.
You want simple indexing (addressing a particular element), so you'll have to cast your list to a tuple:
A[tuple(b)] = 0
Result:
>>> A
array([[[ 1., 1.],
        [ 1., 1.]],
       [[ 1., 1.],
        [ 1., 0.]]])
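To show that the tuple conversion works for any rank, here is a minimal sketch. The helper name zero_at is made up for illustration and is not part of NumPy; it only assumes that the index has one integer per axis.

```python
import numpy as np

def zero_at(arr, index):
    """Set the single element of arr addressed by index (one integer per axis) to 0."""
    # Converting to a tuple makes NumPy treat this as arr[i, j, k, ...],
    # i.e. basic indexing of one element, whatever the rank is.
    arr[tuple(index)] = 0
    return arr

A = np.ones((2, 2, 2))
zero_at(A, [1, 1, 1])           # rank 3, index given as a list

B = np.ones((3, 4))
zero_at(B, np.array([2, 0]))    # rank 2, index given as a 1D array
```

Only one element changes in each array; the rank never has to be written out in the code.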

Related

What is the expected effect of where in numpy's negative?

As far as I understand the documentation of numpy's negative function, its where option allows you to leave some array components unnegated:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1., 2.],
       [-3., 4.],
       [-5., 6.]])
However, when I try it, it seems that those values are (almost) zeroed instead:
>>> import numpy as np
>>> np.negative(np.array([[1.,2.],[3.,4.],[5.,6.]]), where=[True, False])
array([[-1.00000000e+000, 6.92885436e-310],
       [-3.00000000e+000, 6.92885377e-310],
       [-5.00000000e+000, 6.92885375e-310]])
So how should I see the where option?
The documentation describes where like this:
Values of True indicate to calculate the ufunc at that position, values of False indicate to leave the value in the output alone.
Let's try an example using the out parameter:
x = np.ones(3)
np.negative(np.array([4.,5.,6.]), where=np.array([False,True,False]), out=x)
This sets x to [1., -5., 1.], and returns the same.
This makes some amount of sense once you realize that "leave the value in the output alone" literally means the output value is "don't care", rather than "same as the input" (the latter interpretation was how I read it the first time, too).
The problem comes in when you specify where but not out. Apparently the "ufunc machinery" (which is not visible in the implementation of np.negative()) creates an empty output array, meaning the values are indeterminate. So the locations at which where is False will have uninitialized values, which could be anything.
This seems pretty wrong to me, but there was a NumPy issue filed about it last year, and closed. It seems unlikely to change, so you'll have to work around it (e.g. by creating the output array yourself using zeros).
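As a sketch of the workaround applied to the array from the question: if the goal is to keep the original values where the mask is False, the output array can start as a copy of the input rather than zeros. The variable names here are my own. np.where is shown as an alternative that avoids the out parameter entirely.

```python
import numpy as np

x = np.array([[1., 2.], [3., 4.], [5., 6.]])
mask = [True, False]   # broadcast over the rows: negate column 0 only

# Option 1: pre-fill the output with the input values, then let the ufunc
# overwrite only the positions where the mask is True.
out = x.copy()
np.negative(x, where=mask, out=out)

# Option 2: compute the negation everywhere and select per element.
alt = np.where(mask, -x, x)
```

Both give the result the question expected, with the second column left as 2., 4., 6.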

How to modify a Numpy 2D array in every row but specific column indexes?

This is what I am doing now to achieve what I want.
In:
a=numpy.zeros((3,2))
a[range(a.shape[0]),[0,0,1]] = 1
a
Out:
array([[ 1., 0.],
       [ 1., 0.],
       [ 0., 1.]])
As you can see, I used range function to select all the rows in a. Is there any other cleaner way to select every row?
Doing this specific thing can be done more cleanly, as you're just constructing a one-hot array, and there are good answers for "clean" ways to do this in numpy. I recommend this one by @MartinThoma:
a = np.eye(2)[[0,0,1]]
Also, most machine learning packages have their own one-hot encoding method that will be even more efficient and "clean" than this one, as machine learning is the most common use of one-hot encoding.
However, in general, eliminating the range coordinate when doing this sort of fancy indexing is not possible. There really isn't a clear, explicit way to represent that operation that wouldn't be confusing when you want to extend the representation beyond 2 dimensions. And working in more than 2 dimensions at a time is the whole point of numpy.
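If the explicit row range is the part that bothers you, one alternative worth knowing is np.put_along_axis (available in NumPy 1.15 and later), which takes one column index per row. This is a sketch of that swapped-in technique, not the fancy-indexing form discussed above; the name cols is my own.

```python
import numpy as np

a = np.zeros((3, 2))
cols = np.array([0, 0, 1])   # column to set in each row

# The index array needs the same number of dimensions as `a`,
# so the column indices are reshaped to (3, 1).
np.put_along_axis(a, cols[:, None], 1, axis=1)
```

This modifies a in place and gives the same array as a[range(a.shape[0]), [0,0,1]] = 1.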

Testing the equality of two numpy 2d arrays

I have been trying to copy the individual elements from one 2D array to another. My code is as follows:
tp_matrix = np.array(tp_matrix)
my_array = np.empty(shape = (tp_matrix.shape))
for x in range(tp_matrix.shape[0]):
    for y in range(tp_matrix.shape[1]):
        my_array[x][y] = tp_matrix[x][y]
if(np.array_equal(my_array, tp_matrix)):
    print('Equal')
else:
    print('Not equal')
However the two arrays are not equal for some reason. What is the problem here and what can I do to solve it?
I cannot use numpy's copy function as I want to make modifications later to some of the elements of my_array, with the other values being the same as those of tp_matrix.
Edit: On running the code I get the following message:
FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
Does this mean there is something wrong with the dataset (tp_matrix)?
Edit 2: I have tried the allclose and isclose functions but I get this error:
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The data is stored as floats. Also it is a bit large (399 x 5825).
Edit 3: Solved. I had to reinstall python.
Use np.allclose to test the (almost) equality of float arrays, because of the way float numbers are represented in a computer.
For more details, you could read for instance "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
I tried to mimic what you are experiencing and did the following:
one = np.array([1,2,3])
two = np.array([one,one,one])
three = np.empty(shape=(two.shape))
for x in range(two.shape[0]):
    for y in range(two.shape[1]):
        three[x][y] = two[x][y]
Printing the contents of 'two' and 'three' gives the following result
print(three)
array([[ 1., 2., 3.],
       [ 1., 2., 3.],
       [ 1., 2., 3.]])
print(two)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
Although for this small example numpy returns True if I test equality using np.array_equal, it is possible that rounding errors cause the test to be False in your case.
A workaround for this could be the following test:
sum(sum(two==three)) == two.shape[0]*three.shape[1]
Although there are probably more efficient ways.
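A few of those more efficient ways, as a minimal sketch. The values in tp_matrix below are made-up stand-in data, since the real dataset is not shown. Note that .copy() returns an independent array, so later edits to the copy do not touch the original. The dtype check is included because an 'isfinite' TypeError is typically raised for non-numeric dtypes such as object or string arrays; whether that was the cause here is an assumption.

```python
import numpy as np

# Stand-in data (hypothetical values, not the original dataset)
tp_matrix = np.array([[1.0, 2.5], [3.0, 4.0]])

# Comparisons like allclose need a numeric dtype; kind is 'f', 'i' or 'u' for numbers
is_numeric = tp_matrix.dtype.kind in "fiu"

# An independent copy: changing my_array leaves tp_matrix untouched
my_array = tp_matrix.copy()
exactly_equal = np.array_equal(my_array, tp_matrix)
nearly_equal = np.allclose(my_array, tp_matrix)

# Vectorized version of the sum(sum(...)) test
all_match = bool((my_array == tp_matrix).all())

# Modify one element of the copy only
my_array[0, 0] = 99.0
```

If the dtype turns out to be object, converting with tp_matrix.astype(float) before comparing is one thing to try.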

How do I efficiently fill a numpy ndarray with a function called at each row and column?

I want to create a numpy ndarray by specifying a function of row and column to define the values.
For example, I'd like to be able to do something like this (completely incorrect syntax, but you should get the idea):
>>> np.ndarray((2,3), lambda r,c: 3*r+c)
[[ 0 1 2]
[ 3 4 5]]
Searching online has yielded nothing, though I've had trouble thinking of exactly how to search for it...
Right now I've got (the equivalent of) the following code, but it's horribly inefficient:
def ndarrayFuncFill(size, func):
    z = np.empty(size)
    for r in np.arange(size[0]):
        for c in np.arange(size[1]):
            z[r][c] = func(r,c)
    return z
>>> ndarrayFuncFill((2,3), lambda r,c: 3*r+c)
array([[ 0., 1., 2.],
       [ 3., 4., 5.]])
Unfortunately, the function I particularly want to use this with right now is not something I can easily rewrite as a ufunc or anything like that. I pretty much have to treat it as a black box.
The function I'm actually interested in using this with (not something so simple as the above lambda), is not one I have permission to post. However, it essentially does interpolation on a lookup table. So you give it a row and column, and then it translates that to indices in a lookup table -- but there's some tricky stuff going on where it's not just a one-to-one lookup, it sometimes does a combination of 'nearby' values, and that sort of thing. So it's not the most efficient function either, but I'd rather not have too many other silly sources of waste like nested for-loops.
Any suggestions?
You could try using index arrays.
For your simple example, using np.indices you could do something like:
import numpy as np
r, c = 2, 3
a = np.empty((r, c))
b = np.indices((r, c))
a[b[0], b[1]] = 3 * b[0] + b[1]
So then we have:
>>> a
array([[ 0., 1., 2.],
[ 3., 4., 5.]])
The fastest solution for your particular example is np.arange(6).reshape(2, 3). In general you could use np.vectorize for 1D arrays and then reshape if necessary, but this isn't optimized ("The implementation is essentially a for loop.").
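Two related options, sketched under assumptions. np.fromfunction matches the syntax the question asked for, but it calls the function once with whole index arrays, so it only helps if the function accepts arrays. For a scalar-only black box, np.vectorize applied to np.indices hides the loops without speeding them up. The function black_box below is a hypothetical stand-in for the real lookup function.

```python
import numpy as np

# Works when the function can operate on whole index arrays at once
a = np.fromfunction(lambda r, c: 3 * r + c, (2, 3))

def black_box(r, c):
    # Hypothetical scalar-only function standing in for the real one
    return 3 * int(r) + int(c)

# np.indices((2, 3)) yields the row-index and column-index arrays;
# np.vectorize calls black_box once per element (still a Python-level loop)
rows, cols = np.indices((2, 3))
b = np.vectorize(black_box)(rows, cols)
```

Both a and b hold the values 0 to 5 in a 2 by 3 layout; a is a float array because np.fromfunction uses float indices by default.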

Just curious about result from NumPy function!

I have used NumPy for my Master thesis. I've converted parts of the code from MATLAB code, but I have doubts in NumPy/Python when I reference:
m = numpy.ones((10,2))
m[:,0]
which returns:
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
and when I ref to:
m[:,0:1]
it returns:
array([[ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.],
       [ 1.]])
which is what I think it should be, because it is the same result as MATLAB gives. Why do the two differ?
This is because NumPy has the concept of 1d arrays, which Matlab doesn't have. Coupled with NumPy's broadcasting, this provides a powerful simplification (less worrying about inserting transposes everywhere) but does mean you have to think a little bit about translating from Matlab. In this case, when you extract a single column with a scalar index, NumPy simplifies the result to a 1d array, but with a slice it preserves the original number of dimensions. If you want to stay closer to Matlab semantics you could try using the matrix class. See the NumPy for Matlab Users page for details. In this case, you could do either of the following:
m[:,0][:,np.newaxis] # gives same as matlab
np.matrix(m)[:,0] # gives same as matlab
But remember if you use matrix class * becomes matrix multiplication and you need to use multiply() for elementwise. (This is all covered in NumPy for Matlab Users page). Generally I would recommend trying to get used to using 1d arrays where you would have column or row vector in matlab and generally things just work. You only need to worry about column vs row when reassembling them into a 2d array.
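To make the shape difference concrete, here is a small sketch comparing the ways of selecting the first column. The variable names are my own.

```python
import numpy as np

m = np.ones((10, 2))

col_1d = m[:, 0]                    # scalar index drops the axis: shape (10,)
col_slice = m[:, 0:1]               # slice keeps the axis: shape (10, 1)
col_new = m[:, 0][:, np.newaxis]    # re-add the axis by hand: shape (10, 1)
col_list = m[:, [0]]                # list index also keeps the axis: shape (10, 1)

shapes = (col_1d.shape, col_slice.shape, col_new.shape, col_list.shape)
```

Only the scalar index produces a 1d array; the other three give the Matlab-style column.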
You may be interested in automated matlab to python converters such as OMPC (paper) (I think there are others as well).
I'm still learning Python myself, but I think the way that slicing works is that indices point to in-between locations, therefore 0:1 only gets you the first column. Is this what you were asking about?
This is what the documentation has to say:
One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:
 +---+---+---+---+---+
 | H | e | l | p | A |
 +---+---+---+---+---+
 0   1   2   3   4   5
-5  -4  -3  -2  -1
I forget what numpy does, but Matlab indexes vectors from 1, not 0. So array(:,0) is an error in Matlab.
