numpy ravel versus flat in slice assignment - python

According to the documentation, ndarray.flat is an iterator over the array while ndarray.ravel returns a flattened array (when possible). So my question is, when should we use one or the other?
Which one would be preferred as the rvalue in an assignment like the one in the code below?
import numpy as np
x = np.arange(2).reshape((2,1,1))
y = np.arange(3).reshape((1,3,1))
z = np.arange(5).reshape((1,1,5))
mask = np.random.choice([True, False], size=(2,3,5))
# netCDF4 module wants this kind of boolean indexing:
nc4slice = tuple(mask.any(axis=axis) for axis in ((1,2),(2,0),(0,1)))
indices = np.ix_(*nc4slice)
ncrds = 3
npnts = (np.broadcast(*indices)).size
points = np.empty((npnts, ncrds))
for i,crd in enumerate(np.broadcast_arrays(x,y,z)):
# Should we use ndarray.flat ...
points[:,i] = crd[indices].flat
# ... or ndarray.ravel():
points[:,i] = crd[indices].ravel()

You don't need either: crd[mask] is already 1-D. If you did need one, NumPy always calls np.asarray(rhs) first, so the two are identical whenever ravel needs no copy. When a copy is needed, I would guess that ravel is currently faster (I have not timed it).
If you knew that a copy might be needed (and here you know that none is needed), reshaping points instead could actually be the fastest. Since you usually don't need the fastest option, I would say it is more a matter of taste, and I would personally use ravel.
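As a minimal sketch (not from the original answer) of when ravel can avoid the copy:
import numpy as np

a = np.arange(6).reshape(2, 3)
print(np.shares_memory(a, a.ravel()))      # True: a is contiguous, so ravel returns a view
print(np.shares_memory(a.T, a.T.ravel()))  # False: a.T is non-contiguous, so ravel must copy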

Assigning values to subset of an array defined by multiple conditions

I want to assign values to part of an array which is specified by multiple conditions.
For example:
import numpy as np
temp = np.arange(120).reshape((2,3,4,5))
mask = temp > 22
submask = temp[mask] < 43
(temp[mask])[submask] = 0 # assign zeros to the part of the array specified via mask and submask
print(temp) # notice that temp is unchanged - I want it to be changed
This example is artificial. Generally I want to do something more complex which involves a combination of indexing and boolean masks. Using a list index fails in similar circumstances. For example: temp[:,:,0,[1,3,2]]=0 is a valid assignment, but temp[:,:,0,[1,3,2]][mask]=0 will fail.
My understanding is that the assignment is failing because the complex indexing is prompting numpy to make a copy of the array object and assigning to that, rather than to the original array. So it isn't that the assignment is failing per se, just that the assignment is directed towards the "wrong" object.
I have tried using functions such as np.copyto and np.putmask but these also fail, presumably because the backend implementation mimics the original problem. For example: np.putmask(temp, mask, 0) will work but np.putmask(temp[mask], submask, 0) will not.
Is there a good way to do what I want to do?
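One common workaround (a sketch, not from the original thread) is to fold the sub-mask back into a full-sized mask, so that the final assignment targets temp itself:
import numpy as np

temp = np.arange(120).reshape((2,3,4,5))
mask = temp > 22
submask = temp[mask] < 43
full_mask = np.zeros_like(mask)
full_mask[mask] = submask   # True only where both conditions hold
temp[full_mask] = 0         # a single __setitem__ call on temp itself
# For this artificial example, combining the conditions up front also works:
# temp[(temp > 22) & (temp < 43)] = 0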

Understanding NumPy's copy behaviour when using advanced indexing

I am currently struggling writing tidy code in NumPy using advanced indexing.
arr = np.arange(100).reshape(10,10) # array I want to manipulate
sl1 = arr[:,-1] # basic indexing
# Do stuff with sl1...
sl1[:] = -1
# arr has been changed as well
sl2 = arr[arr >= 50] # advanced indexing
# Do stuff with sl2...
sl2[:] = -2
# arr has not been changed,
# changes must be written back into it
arr[arr >= 50] = sl2 # What I'd like to avoid
I'd like to avoid this "write back" operation because it feels superfluous and I often forget it. Is there a more elegant way to accomplish the same thing?
Both boolean and integer array indexing fall under the category of advanced indexing. In the second example (boolean indexing), the original array is not updated because advanced indexing always returns a copy of the data (see the second paragraph of the advanced indexing section of the docs). This means that arr[arr >= 50] is already a copy of arr, and whatever changes you apply to it won't affect arr.
The reason it does not return a view is that advanced indexing cannot be expressed as a slice, and hence cannot be addressed with offsets, strides, and counts, which is what is required to take a view of an array's elements.
We can easily verify that we are viewing different objects in the case of advanced indexing with:
np.shares_memory(arr, arr[arr>50])
# False
np.shares_memory(arr, arr[:,-1])
# True
Views are only returned when performing basic slicing operations, so you'll have to assign back as you do in the last example. In reference to the question in the comments about assigning back in the same expression:
arr[arr >= 50] = -2
This is translated by the python interpreter as:
arr.__setitem__(arr >= 50, -2)
Here the thing to understand is that the expression can be evaluated in place, so there is no new object creation involved, since none is needed.
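To make this concrete, assigning in a single expression modifies arr in place with no write-back step:
import numpy as np

arr = np.arange(100).reshape(10,10)
arr[arr >= 50] = -2        # one __setitem__ call, writes into arr directly
print((arr == -2).sum())   # 50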

Numpy fill_diagonal return None

I want to generate symmetric matrices with a zero diagonal. The symmetric part works, but when I use fill_diagonal from numpy, the result I get is None. My code is below. Thank you for reading.
import numpy as np
matrix_size = int(input("Size of the matrix \n"))
random_matrix = np.random.randint(-4, 5, size=(matrix_size, matrix_size))  # randint's upper bound is exclusive; random_integers is removed in modern NumPy
symmetric_matrix = (random_matrix + random_matrix.T)/2
print(symmetric_matrix)
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
print(zero_diogonal_matrix)
np.fill_diagonal(), like many other functions across Python and NumPy, works in place (compare: Why does “return list.sort()” return None, not the list?). That is, it directly alters the object in memory and does not create a new object; the return value of such functions is None. Therefore, change:
zero_diogonal_matrix = np.fill_diagonal(symmetric_matrix,0)
To just:
np.fill_diagonal(symmetric_matrix,0)
You will then see the change reflected in symmetric_matrix.
It's probably overkill, but in case you want to preserve the tenet of minimising surprise, you could wrap this (and other functions like it) in a function that takes care of preserving the original array:
def fill_diagonal(source_array, diagonal):
    copy = source_array.copy()
    np.fill_diagonal(copy, diagonal)
    return copy
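For example, a quick usage sketch of this wrapper:
zeroed = fill_diagonal(symmetric_matrix, 0)  # returns a modified copy
print(symmetric_matrix)  # the original is untouched
print(zeroed)            # the diagonal is now zero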
But the question then becomes "who exactly is going to be least surprised by doing it this way?"

Perform a reverse cumulative sum on a numpy array

Can anyone recommend a way to do a reverse cumulative sum on a numpy array?
Where 'reverse cumulative sum' is defined as below (I welcome any corrections on the name for this procedure):
if
x = np.array([0,1,2,3,4])
then
np.cumsum(x)
gives
array([0,1,3,6,10])
However, I would like to get
array([10, 10, 9, 7, 4])
Can anyone suggest a way to do this?
This does it:
np.cumsum(x[::-1])[::-1]
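For example:
import numpy as np

x = np.array([0,1,2,3,4])
print(np.cumsum(x[::-1])[::-1])  # [10 10  9  7  4]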
You can use np.flipud() for this as well; it is equivalent to [::-1] along the first axis:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.flipud.html
In [0]: x = np.array([0,1,2,3,4])
In [1]: np.flipud(np.flipud(x).cumsum())
Out[1]: array([10, 10,  9,  7,  4])
np.flip() is new as of NumPy 1.12 and combines .flipud() and .fliplr() into one API:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.flip.html
This is equivalent, and has fewer function calls:
np.flip(np.flip(x, 0).cumsum(), 0)
The answers given so far are all inefficient if you want the result stored in the original array. Also, if you want a copy, keep in mind that they return a view, not a contiguous array, and np.ascontiguousarray() is still needed.
How about
x = np.array([0, 1, 2, 3, 4])
view = np.flip(x, 0)          # non-copying reversed view of x
np.cumsum(view, 0, out=view)  # accumulate in place through the view
# x now contains the reverse cumsum result and remains contiguous and unflipped
This modifies the flipped view of x, which writes the data back into the original x in reverse order. It leaves no non-contiguous views at the end of execution and is about as efficient as possible. I am guessing NumPy will never add a reversecumsum method, because the technique described here is so trivially and efficiently possible; at best an explicit method would be only slightly faster.
Otherwise, if a copy is desired, the extra flip is required, plus conversion back to a contiguous array, especially if the result will be used in many vector operations afterwards. Views and contiguity are a tricky part of NumPy, but something to be careful with if you are seriously interested in performance.
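A sketch of that copy variant, flipping back and forcing contiguity:
import numpy as np

x = np.array([0,1,2,3,4])
result = np.ascontiguousarray(np.flip(np.flip(x, 0).cumsum(), 0))
print(result, result.flags['C_CONTIGUOUS'])  # [10 10  9  7  4] True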

Apply function to 3 elements at a time in numpy

I would like to apply a function to a monodimensional array 3 elements at a time, and output for each of them a single element.
for example I have an array of 13 elements:
a = np.arange(13)**2
and I want to apply a function, let's say np.std as an example.
Here is the equivalent list comprehension:
[np.std(a[i:i+3]) for i in range(0, len(a),3)]
[1.6996731711975948,
6.5489609014628334,
11.440668201153674,
16.336734339790461,
0.0]
Does anyone know a more efficient way using numpy functions?
The simplest way is to reshape it and apply the function along an axis.
import numpy as np
a = np.arange(12)**2
b = a.reshape(4,3)
print(np.std(b, axis=1))
If you need a little better performance than that, you could try stride_tricks. Below is the same as above, except using stride_tricks. I was wrong about the performance gain: as you can see below, b becomes exactly the same view as the b above. I wouldn't be surprised if they compiled to exactly the same thing.
import numpy as np
a = np.arange(12)**2
b = np.lib.stride_tricks.as_strided(a, shape=(4,3), strides=(a.itemsize*3, a.itemsize))
print(np.std(b, axis=1))
Are you talking about something like vectorize? http://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
You can reshape it, but that requires that the size not change. If you can tack some bogus entries onto the end, you can do this:
[np.std(s) for s in a.reshape(-1,3)]
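If the length is not a multiple of 3, one padding sketch (the filler values are an assumption; pick ones that suit your statistic):
import numpy as np

a = np.arange(13)**2
pad = (-len(a)) % 3                        # bogus entries needed to reach a multiple of 3
padded = np.pad(a, (0, pad), mode='edge')  # repeat the last value as filler
print([np.std(s) for s in padded.reshape(-1, 3)])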
