Recently I used the numpy argmax function, which gives the index of the maximum value in a numpy array.
By chance I found out that when it is used with a scalar it simply returns 0, like this:
np.argmax(3) # equals 0
np.argmax(1000) # equals 0
which makes sense of course, since there is only one index. But is there an actual application where one needs the argmax of a scalar?
I think this is just for consistency, as explained in the documentation on scalars:
Array scalars have the same attributes and methods as ndarrays. This allows one to treat items of an array partly on the same footing as arrays, smoothing out rough edges that result when mixing scalar and array operations.
When you don't specify axis, argmax returns the index into the flattened array, so even in this case the scalar is internally viewed as a 0-d array.
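A short demonstration of that 0-d view (example value mine):
import numpy as np
a = np.asarray(3)        # a Python scalar becomes a 0-d array
print(a.ndim, a.shape)   # 0 ()
print(np.argmax(a))      # 0: the only index into the flattened view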
As the title indicates I want to use numpy.argpartition on a masked array. I can, but .argpartition does not honor the mask (it emits a warning message notifying the user as well). This is not useful since the masked data corrupt the results of .argpartition.
Any suggestions for replacement methods? I need to know the indices of the k smallest values in a large 1D array.
Current ideas:
a) write my own implementation of .argpartition for masked arrays
b) my current data set has the feature that the masked values are all negative (which is why they corrupt the search for smallest values).
which leads to two solutions:
I could go through the array and assign a very large number to the masked values (sketched below)... If I do this, I feel I could just drop the use of masked arrays altogether.
I could count the number of masked elements, p, and then argpartition for the p+k smallest elements, removing the p masked entries from the result afterwards.
Neither a) nor b) seems very pythonic or elegant.
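For reference, a minimal sketch of the fill-value variant mentioned above, assuming a 1D float masked array (the helper name and example data are mine):
import numpy as np

def masked_argpartition_smallest(arr, k):
    # masked slots become +inf, so they can never rank among the k smallest
    filled = arr.filled(np.inf)
    # indices of the k smallest unmasked values, in no particular order
    return np.argpartition(filled, k)[:k]

data = np.ma.masked_array([3., -9., 1., -9., 2.], mask=[False, True, False, True, False])
print(masked_argpartition_smallest(data, 2))  # e.g. [2 4]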
There is this great Question/Answer about slicing the last dimension:
Numpy slice of arbitrary dimensions: to slice a numpy array so as to obtain the i-th index in the last dimension, one can use ... (Ellipsis):
slice = myarray[...,i]
What if the first N dimensions are needed?
For 3D myarray, N=2:
slice = myarray[:,:,0]
For 4D myarray, N=2:
slice = myarray[:,:,0,0]
Can this be generalized to an arbitrary number of dimensions?
I don't think there's any built-in syntactic sugar for that, but slices are just objects like anything else. The slice(None) object is what : creates, and otherwise just picking the index 0 works fine.
myarray[(slice(None),)*N+(0,)*(myarray.ndim-N)]
Note the comma in (slice(None),). Python doesn't create tuples from parentheses by default unless the parentheses are empty. The comma signifies that you don't just want to compute whatever's on the inside.
Slices are nice because they give you a view into the object instead of a copy of the object. You can use the same idea to, e.g., iterate over everything except the N-th dimension on the N-th dimension. There have been some stackoverflow questions about that, and they've almost unanimously resorted to rolling the indices and other things that I think are hard to reason about in high-dimensional spaces. Slice tuples are your friend.
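A quick check of the expression above (the array and N are my own example values):
import numpy as np
myarray = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
N = 2
view = myarray[(slice(None),) * N + (0,) * (myarray.ndim - N)]
print(view.shape)                       # (2, 3)
print(np.shares_memory(myarray, view))  # True: a view, not a copy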
From the comments, @PaulPanzer points out another technique that I rather like.
myarray.T[(myarray.ndim-N)*(0,)].T
Transposes in numpy are view operations, not copy operations, so this isn't inefficient in the slightest. Here's how it works:
Start with myarray with dimensions (0,...,k)
The transpose myarray.T reorders those to (k,...,0)
The whole goal is to fix the last myarray.ndim-N dimensions from the original array, so we select those with [(myarray.ndim-N)*(0,)], which grabs the first myarray.ndim-N dimensions from this array.
They're in the wrong order. We have dimensions (N-1,...,0). Use another transpose with .T to get the ordering (0,...,N-1) instead.
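Continuing the same example (same myarray and N as in the sketch above):
view2 = myarray.T[(myarray.ndim - N) * (0,)].T
print(view2.shape)                                 # (2, 3)
print(np.array_equal(view2, myarray[:, :, 0, 0]))  # True
print(np.shares_memory(myarray, view2))            # True: still a view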
In order to create an empty array for some results, I need to know the resulting dtype for a certain operation (e.g. multiply) when doing the operation based on two other arrays.
How can I determine the resulting dtype of a numpy array operation in advance?
If a and b are the argument arrays, I can for example determine the resulting dtype of the multiplication (*) by making two zero values (0) and doing a trial operation, like:
dtype=(a.dtype.type(0) * b.dtype.type(0)).dtype
However, it seems a little awkward... or maybe I am doing this the wrong way around...
So, using result_type as given in the accepted answer, the code can be like:
dtype=numpy.result_type(a, b)
Use numpy.result_type(), available in numpy >= 1.6.0:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html
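For example (arrays mine), result_type applies numpy's type-promotion rules without computing anything:
import numpy as np
a = np.zeros(3, dtype=np.int32)
b = np.zeros(3, dtype=np.float64)
print(np.result_type(a, b))  # float64
# the promoted dtype can be used to allocate the result up front
out = np.empty(a.shape, dtype=np.result_type(a, b))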
I am testing some edge cases of my program and observed a strange fact. When I create a scalar numpy array, it has size==1 and ndim==0.
>>> A=np.array(1.0)
>>> A.ndim # returns 0
>>> A.size # returns 1
But when I create an empty array with no elements, it has size==0 but ndim==1.
>>> A=np.array([])
>>> A.ndim # returns 1
>>> A.size # returns 0
Why is that? I would expect the ndim to also be 0. Or is there another way to create a 'really' empty array with both size and ndim equal to 0?
UPDATE: even A=np.empty(shape=None) does not create a dimensionless array of size 0...
I believe the answer is that "No, you can't create an ndarray with both ndim and size of zero". As you've already found out yourself, the (ndim,size) pairs of (1,0) and (0,1) are as low as you can go.
This very nice answer explains a lot about numpy scalar types, and why they're a bit odd to have around. This explanation makes it clear that scalar numpy arrays like array(1) are a very special kind of beast. They only have a single value (causing size==1), but by definition they don't have a sense of dimensionality, hence ndim==0. Non-scalar numpy arrays, on the other hand, can be empty, but they contain at least a pair of square brackets, leading to a minimal ndim of 1, even if their size can be 0 if they are made up of empty lists. (This is how I think about the situation: ndarrays are in a way lists of lists of lists of ..., on as many levels as there are dimensions. 1d arrays are compatible with lists, so an empty list, being still a list, also has a defining dimension.)
The only way to come up with an empty scalar would be to call np.array() with no argument at all, but arrays can only be initialized from some actual object, so that call simply raises a TypeError. So I believe your program is safe from this edge case.
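A short summary of the reachable corner cases, and why (0, 0) is out of reach (examples mine):
import numpy as np
print(np.array(1.0).ndim, np.array(1.0).size)        # 0 1 : 0-d scalar array
print(np.array([]).ndim, np.array([]).size)          # 1 0 : empty 1-d array
print(np.empty((0, 3)).ndim, np.empty((0, 3)).size)  # 2 0 : empty but 2-d
# (ndim, size) == (0, 0) is unreachable: size is the product of the shape,
# and the empty shape () of a 0-d array has product 1.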
I have the following challenge in a simulation for my PhD thesis:
I need to optimize the following code:
repelling_forces = repelling_force_prefactor * np.exp(-(height_r_t/potential_steepness))
In this code snippet 'height_r_t' is a real Numpy array and 'potential_steepness' is a scalar. 'repelling_force_prefactor' is also a Numpy array, which is mostly ZERO, but ONE at pre-calculated positions, which do NOT change during runtime (i.e. a mask).
Obviously the code is inefficient as it would make much more sense to only calculate the exponential function at the positions, where 'repelling_force_prefactor' is non-zero.
The question is how do I do this in the most efficient manner?
The only idea I have up to now would be to define a slice into 'height_r_t' using 'repelling_force_prefactor' and apply 'np.exp' to that slice. However, in my experience slicing is slow (not sure if this is generally correct) and the solution seems awkward.
Just as a side note, the ratio of 1's to 0's in 'repelling_force_prefactor' is about 1/1000, and I am running this in a loop, so efficiency is very important.
(Comment: I wouldn't have a problem with resorting to Cython, as I will need/want to learn it at some point anyway... but I am a novice, so I'd need a good pointer/explanation.)
Masked arrays are implemented exactly for your purpose.
Performance is the same as Sven's answer:
height_r_t = np.ma.masked_where(repelling_force_prefactor == 0, height_r_t)
repelling_forces = np.ma.exp(-(height_r_t/potential_steepness))
The advantage of masked arrays is that you do not have to slice and re-expand your array: the size is always the same, but numpy automatically knows not to compute the exp where the array is masked.
Also, you can add arrays with different masks, and the result is masked wherever either operand is masked (the set of valid entries is the intersection of the two valid sets).
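A quick illustration of how the masks combine (example values mine):
import numpy as np
a = np.ma.masked_array([1., 2., 3.], mask=[True, False, False])
b = np.ma.masked_array([4., 5., 6.], mask=[False, False, True])
print(a + b)  # [-- 7.0 --] : masked wherever either operand is masked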
Slicing is probably much faster than computing all the exponentials. Instead of using the mask repelling_force_prefactor for slicing directly, I suggest to precompute the indices where it is non-zero and use them for slicing:
# before the loop
indices = np.nonzero(repelling_force_prefactor)
# inside the loop
repelling_forces = np.exp(-(height_r_t[indices]/potential_steepness))
Now repelling_forces will contain only the results that are non-zero. If you have to update some array of the original shape of height_r_t with these values, you can use slicing with indices again, or use np.put() or a similar function.
Slicing with the list of indices will be more efficient than slicing with a boolean mask in this case, since the list of indices is shorter by a factor of a thousand. Actually measuring the performance is of course up to you.
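For completeness, a minimal sketch of that write-back step, continuing the snippet above (the name forces_full is mine):
# scatter the compact results back to their original positions
forces_full = np.zeros_like(height_r_t)
forces_full[indices] = repelling_forces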