In Python, when using np.empty(), for example np.empty((3,1)), we get an array of shape (3,1), but in reality it is not empty: it contains arbitrary leftover values (e.g., very small numbers like 1.7e-315). Is it possible to create an array that is really empty/has no values, but has given dimensions/shape?
I'd suggest using np.full to choose the fill value directly (np.full_like does the same if you already have a prototype array)...
x = np.full((3, 1), None, dtype=object)
... of course the dtype you choose kind of defines what you mean by "empty"
I am guessing that by empty, you mean an array filled with zeros.
Use np.zeros() to create an array filled with zeros. np.empty() just allocates the array, so the numbers in there are garbage. It is provided as a way to avoid even the small cost of setting the values to zero. But it is generally safer to use np.zeros().
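For comparison, a quick illustration of the difference (the exact garbage values from np.empty will vary from run to run):
import numpy as np
np.empty((3, 1))   # uninitialised memory: contents are arbitrary leftovers
np.zeros((3, 1))   # array([[0.], [0.], [0.]])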
I suggest using np.nan, as shown below:
yourdata = np.empty((3,1)) * np.nan
(Or)
You can use np.zeros((3,1)), but it will fill all the values with zero, which is not an intuitive marker for "no value". I feel like using np.nan is best in practice.
It's all up to you and depends on your requirements.
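For reference, the same NaN-filled array can also be built directly with np.full, which avoids relying on uninitialised memory (an alternative not mentioned in the answers above):
yourdata = np.full((3, 1), np.nan)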
There are several different types of NaN possible in most floating point representations (e.g. quiet NaNs, signalling NaNs, etc.). I assume this is also true in numpy. I have a specific bit representation of a NaN, defined in C and imported into python. I wish to test whether an array contains entirely this particular floating point bit pattern. Is there any way to do that?
Note that I want to test whether the array contains this particular NaN, not whether it has NaNs in general.
Numpy allows you to have direct access to the bytes in your array. For a simple case you can view nans directly as integers:
quiet_nan1 = np.uint64(0b0111111111111000000000000000000000000000000000000000000000000000)
x = np.arange(10, dtype=np.float64)
x.view(np.uint64)[5] = quiet_nan1
x.view(np.uint64)
Now you can just compare the elements for the bit-pattern of your exact NaN. This version will preserve shape since the elements are the same size.
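The test the question asks for is then a plain comparison on the integer view (a sketch reusing the names above):
matches = x.view(np.uint64) == quiet_nan1   # boolean array, same shape as x
matches.all()   # True only if every element carries exactly this bit pattern
matches.any()   # True if at least one element does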
A more general solution, which would let you work with types like float128 that don't have a corresponding integer analog on most systems, is to use bytes:
quiet_nan1l = np.frombuffer((0b011111111111111111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000).to_bytes(16, 'big'), dtype=np.uint8)
x = np.arange(3 * 4 * 5, dtype=np.float128).reshape(3, 4, 5)
x.view(np.uint8).reshape(*x.shape, 16)[2, 2, 3, :] = quiet_nan1l
x.view(np.uint8).reshape(*x.shape, 16)
The final reshape is not strictly necessary, but it is very convenient, since it isolates the original array elements along the last dimension.
In both cases, modifying the view modifies the original array. That's the point of a view.
And of course it goes without saying (which is why I'm saying it) that this applies to any other bit pattern you may want to assign or test for, not just NaNs.
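For the byte-level variant, the same test collapses the trailing byte axis (again just a sketch built on the arrays above):
matches = (x.view(np.uint8).reshape(*x.shape, 16) == quiet_nan1l).all(axis=-1)
matches.all()   # True only if every element carries exactly this 16-byte pattern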
I have a 2D numpy array with binary data, i.e. 0s and 1s (not observed or observed). For some instances, that information is missing (NaN). Since the missing values are random in the data set, I think the best way to replace them would be using random 0s and 1s.
Here is some example code:
import numpy as np
row, col = 10, 5
matrix = np.random.randint(2, size=(row,col))
matrix = matrix.astype(float)
matrix[1,2] = np.nan
matrix[5,3] = np.nan
matrix[8,0] = np.nan
matrix[np.isnan(matrix)] = np.random.randint(2)
The problem with this is that all NaNs are replaced with the same value, either 0 or 1, while I would like both. Is there a simpler solution than for example a for loop calling each NaN separately? The data set I'm working on is a lot bigger than this example.
Try
nan_mask = np.isnan(matrix)
matrix[nan_mask] = np.random.randint(0, 2, size=np.count_nonzero(nan_mask))
You can use a vectorized function:
random_replace = np.vectorize(lambda x: np.random.randint(2) if np.isnan(x) else x)
random_replace(matrix)
Since the missing values are random in the data set, I think the best way to replace them would be using random 0s and 1s.
I'd heartily contradict you here. Unless you have a stochastic model that justifies assuming each missing element is 0 or 1 with equal probability, this will bias your observations.
Now, I don't know where your data comes from, but "2D array" sure sounds like an image signal, or something of the like. You can find that most of the energy in many signal types is in low frequencies; if something of the like is the case for you, you can probably get lesser distortion by replacing the missing values with an element of a low-pass filtered version of your 2D array.
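If that applies to your data, a rough sketch of the idea (assuming scipy is available; the 3x3 uniform_filter window and the 0.5 threshold are arbitrary illustrative choices, not something from the question):
import numpy as np
from scipy import ndimage

nan_mask = np.isnan(matrix)
filled = np.where(nan_mask, 0.0, matrix)                  # observed values, NaNs replaced by 0
weights = (~nan_mask).astype(float)                       # 1 where observed, 0 where missing
local_avg = ndimage.uniform_filter(filled, size=3)        # local mean, missing entries counted as 0
local_frac = ndimage.uniform_filter(weights, size=3)      # local fraction of observed entries
local_mean = local_avg / np.maximum(local_frac, 1e-12)    # NaN-aware local mean (all-missing windows need extra care)
matrix[nan_mask] = local_mean[nan_mask] > 0.5             # threshold back to 0/1 for binary data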
Either way, since you need to call numpy.isnan from python to check whether a value is NaN, I think the only way to solve this is writing an efficient loop, unless you want to senselessly calculate a huge random 2D array just to fill in a few missing numbers.
EDIT: oh, I like the vectorized version; it's effectively what I'd call an efficient loop, since it does the looping without interpreting a python loop iteration each time.
EDIT2: the mask method with counting nonzeros is even more effective, I guess :)
Can anyone recommend a way to do a reverse cumulative sum on a numpy array?
Where 'reverse cumulative sum' is defined as below (I welcome any corrections on the name for this procedure):
if
x = np.array([0,1,2,3,4])
then
np.cumsum(x)
gives
array([0,1,3,6,10])
However, I would like to get
array([10, 10, 9, 7, 4])
Can anyone suggest a way to do this?
This does it:
np.cumsum(x[::-1])[::-1]
You can use .flipud() for this as well, which is equivalent to [::-1]
https://docs.scipy.org/doc/numpy/reference/generated/numpy.flipud.html
In [0]: x = np.array([0,1,2,3,4])
In [1]: np.flipud(np.flipud(x).cumsum())
Out[1]: array([10, 10, 9, 7, 4])
.flip() is new as of NumPy 1.12, and combines .flipud() and .fliplr() into one API.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.flip.html
This is equivalent, and has fewer function calls:
np.flip(np.flip(x, 0).cumsum(), 0)
The answers given so far all seem to be inefficient if you want the result stored in the original array. Also, if you want a copy, keep in mind that these approaches return a view rather than a contiguous array, so np.ascontiguousarray() is still needed.
How about
view = np.flip(x, 0)
np.cumsum(view, 0, out=view)
# x now contains the reverse cumsum result and remains contiguous and unflipped
This modifies the flipped view of x, which writes the data properly, in reverse order, back into the original x variable. It requires no non-contiguous views at the end of execution and is about as speed-efficient as possible. I am guessing numpy will never add a reverse-cumsum method, mainly because the technique I describe is so trivially and efficiently possible. Although it might be ever so slightly more efficient to have an explicit method.
Otherwise, if a copy is desired, then the extra flip is required AND the conversion back to a contiguous array, especially if it will be used in many vector operations afterwards. A tricky part of numpy, but views and contiguity are something to be careful with if you are seriously interested in performance.
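For reference, the copying variant described above could look like this (a sketch, not from the original answer):
y = np.ascontiguousarray(np.flip(np.flip(x, 0).cumsum(), 0))   # contiguous copy holding the reverse cumsum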
I need to slice an array in a way that assumes index zero for every dimension except the first.
Given an array:
x = numpy.zeros((3,3,3))
I would like the following behavior, but without needing to know the number of dimensions beforehand:
y = x[:,0,0]
Essentially I am looking for something that would take the place of Ellipsis, but instead of expanding to the needed number of : objects, it would expand into the needed number of zeros.
Is there anything built in for this? If not, what is the best way to get the functionality that I need?
Edit:
One way to do this is to use:
y = x.ravel(order='F')[0:x.shape[0]]
This works fine; however, in some cases (such as mine) ravel will need to create a copy of the array instead of a view. Since I am working with large arrays, I want a more memory-efficient way of doing this.
You could create an indexing tuple, like this:
x = np.arange(3*3*3).reshape(3,3,3)
s = (slice(None),) + (0,)*(x.ndim-1)
print(x[s])       # array([ 0,  9, 18])
print(x[:,0,0])   # array([ 0,  9, 18])
I guess you could also do:
x.transpose().flat[:3]
but I prefer the first approach, since it works for any dimension (rather than only the first), and it's just as efficient as writing x[:,0,0] directly, since it's just a different syntax.
I usually use tom10's method, but here's another:
for i in range(x.ndim-1):
    x = x[...,0]
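Either approach can be wrapped in a small helper if you need it often (a hypothetical convenience function, not from the answers above):
def first_axis_line(a):
    # Index 0 along every axis except the first, e.g. a[:, 0, 0] for a 3-D array.
    return a[(slice(None),) + (0,) * (a.ndim - 1)]

y = first_axis_line(np.zeros((3, 3, 3)))   # shape (3,)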
I have the following challenge in a simulation for my PhD thesis:
I need to optimize the following code:
repelling_forces = repelling_force_prefactor * np.exp(-(height_r_t/potential_steepness))
In this code snippet 'height_r_t' is a real NumPy array and 'potential_steepness' is a scalar. 'repelling_force_prefactor' is also a NumPy array, which is mostly ZERO, but ONE at pre-calculated positions, which do NOT change during runtime (i.e. a mask).
Obviously the code is inefficient as it would make much more sense to only calculate the exponential function at the positions, where 'repelling_force_prefactor' is non-zero.
The question is how do I do this in the most efficient manner?
The only idea I have so far is to slice 'height_r_t' using 'repelling_force_prefactor' and apply 'np.exp' to those slices. However, in my experience slicing is slow (not sure if this is generally correct), and the solution seems awkward.
Just as a side note: the ratio of 1's to 0's in 'repelling_force_prefactor' is about 1/1000, and I am running this in a loop, so efficiency is very important.
(Comment: I wouldn't have a problem with resorting to Cython, as I will need/want to learn it at some point anyway... but I am a novice, so I'd need a good pointer/explanation.)
masked arrays are implemented exactly for your purposes.
Performance is the same as Sven's answer:
height_r_t = np.ma.masked_where(repelling_force_prefactor == 0, height_r_t)
repelling_forces = np.ma.exp(-(height_r_t/potential_steepness))
The advantage of masked arrays is that you do not have to slice and re-expand your array: the size stays the same, but numpy automatically knows not to compute the exp where the array is masked.
Also, you can add arrays with different masks, and the resulting array is masked wherever either operand is masked (so the valid entries are the intersection of the unmasked entries).
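A small illustration of that mask propagation (example values invented here):
a = np.ma.masked_where([True, False, False], [1.0, 2.0, 3.0])
b = np.ma.masked_where([False, True, False], [10.0, 20.0, 30.0])
print(a + b)   # [-- -- 33.0]: masked wherever either operand is masked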
Slicing is probably much faster than computing all the exponentials. Instead of using the mask repelling_force_prefactor for slicing directly, I suggest precomputing the indices where it is non-zero and using them for slicing:
# before the loop
indices = np.nonzero(repelling_force_prefactor)
# inside the loop
repelling_forces = np.exp(-(height_r_t[indices]/potential_steepness))
Now repelling_forces will contain only the results that are non-zero. If you have to update some array of the original shape of height_r_t with these values, you can use slicing with indices again, or use np.put() or a similar function.
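For example, the write-back could look like this (a sketch; 'full_forces' is a name invented here):
full_forces = np.zeros_like(height_r_t)
full_forces[indices] = repelling_forces   # scatter the non-zero results back to their positions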
Slicing with the list of indices will be more efficient than slicing with a boolean mask in this case, since the list of indices is shorter by a factor of a thousand. Actually measuring the performance is of course up to you.