I have a simple problem that is causing me a lot of trouble: I have a big 2D array that is a mixture of datetime.timedelta objects and np.nan; simplified, it looks like this:
import datetime as dt
import numpy as np

tdarray = np.array([dt.timedelta(days=5), np.nan])
Now I want to get the days as a float/integer from the timedelta objects while leaving the np.nan as it is, i.e. the result should be np.array([ 5., nan]).
Getting the days from a timedelta object is easy with .days, and applying that function to the array should work, e.g. with np.fromiter and then reshaping. But how do I catch the error that occurs when trying to get the days from NaN? I tried masking, but this also fails with an AttributeError saying that the MaskedArray has no attribute days. Is there any simple solution?
Make use of the fact that np.nan is not equal to itself. Note that if your array contains other objects, they should have a well-defined equality operator, otherwise this approach can fail.
tdarray = np.asarray([dt.timedelta(days=5), np.nan])
mask = tdarray == tdarray # This gives array([True, False])
tdarray[mask] = [x.days for x in tdarray[mask]]
# Optionally cast to float
tdarray = tdarray.astype(np.float64)
Or you can simply rebuild the array:
tdarray = np.asarray([x.days if x == x else x for x in tdarray],
                     dtype=np.float64)
And if tdarray is an N-D array (N > 1), then:
shape = tdarray.shape
tdarray = np.asarray([x.days if x == x else x
                      for x in tdarray.ravel()],
                     dtype=np.float64).reshape(shape)
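For instance, a quick check of the N-D version on a small 2x2 object array (a sketch; the shape and values here are made up):
import datetime as dt
import numpy as np

tdarray = np.array([[dt.timedelta(days=5), np.nan],
                    [np.nan, dt.timedelta(days=2)]], dtype=object)
shape = tdarray.shape
result = np.asarray([x.days if x == x else x
                     for x in tdarray.ravel()],
                    dtype=np.float64).reshape(shape)
print(result)
# [[ 5. nan]
#  [nan  2.]]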
I have a piece of code that is running, but is currently a bottleneck, and I was wondering whether there is a smarter way to do it.
I have a 1D array of integers between 0-20, with a length of 20-1000 (it varies) and I'm trying to compare it to a set of 1D arrays that are stored in a 2D array.
I wish to find any row in the 2D array that completely matches the 1D array.
My current approach to do this is the following:
res = np.mean(one_d_array == two_d_array,axis=1) == 1
The problem with this approach is that it compares all elements in all rows, even if those rows don't match on the first element, the second, etc., which is of course very inefficient.
I could remedy this by looping through the rows and comparing each row individually; then I could stop the comparison as soon as one element differs. However, then I would be stuck with a slow for loop, which would also not be ideal.
So I'm wondering is there some other clever way to get the best of both of these approaches?
numpy has a few useful built-in functions for checking matrix/vector equality; this is about twice as fast:
import numpy as np
import time
x = np.random.random((1, 1000))
y = np.random.random((10000, 1000))
y[53] = x
t = time.time()
x_in_y = np.equal(x, y).all(axis=1)  # equal(x, y) broadcasts to a rows-by-columns matrix of True where elements match; .all(axis=1) reduces each row to a single bool
idx = np.where(x_in_y)  # returns the indices where x_in_y is True (here 53)
print(time.time() - t) # 0.019975900650024414
t = time.time()
res = np.mean(x == y, axis=1) == 1
print(time.time() - t) # 0.03999614715576172
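If you do want behaviour closer to the short-circuiting you describe, one option (a sketch, not part of the timing above) is to pre-filter on a single column, so the full comparison only runs on the few candidate rows:
# Cheaply discard rows whose first element already differs,
# then run the full comparison on the survivors only.
candidates = np.flatnonzero(y[:, 0] == x[0, 0])
matches = candidates[np.equal(x, y[candidates]).all(axis=1)]
print(matches)  # [53]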
I have this function:
if elem < 0:
    elem = 0
else:
    elem = 1
I want to apply this function to every element in a NumPy array. With a fixed number of dimensions this could be done with a for loop, but in this case it needs to work regardless of the array's dimensions and shape. Is there any way this can be achieved in Python with NumPy?
Or is there any general way to apply an arbitrary function to every element of a NumPy n-dimensional array?
Isn't it
arr = (arr >= 0).astype(int)
Use np.where:
np.where(arr < 0, 0, 1)
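For example (a quick sketch with made-up values; np.where works on arrays of any shape, which covers the n-dimensional case):
import numpy as np

arr = np.array([[-1.5, 0.0], [2.0, -3.0]])
print(np.where(arr < 0, 0, 1))
# [[0 1]
#  [1 0]]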
You can use a boolean mask to define an array of decisions. Let's work through a concrete example. You have an array of positive and negative numbers and you want to take the square root only at non-negative locations:
arr = np.random.normal(size=100)
You compute a mask like
mask = arr >= 0
The most straightforward way to apply the mask is to create an output array, and fill in the required elements:
result = np.empty(arr.shape)
result[mask] = np.sqrt(arr[mask])
result[~mask] = arr[~mask]
This is not super efficient because you have to compute the inverse of the mask and apply it multiple times. For this specific example, you can take advantage of the fact that np.sqrt is a ufunc and use its where keyword:
result = arr.copy()
np.sqrt(arr, where=mask, out=result)
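Here the out=result buffer supplies the values at the positions where mask is False, which is why result starts as a copy of arr.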
One popular way to apply the mask would be to use np.where but I specifically constructed this example to show the caveats. The simplistic approach would be to compute
result = np.where(mask, np.sqrt(arr), arr)
where chooses the value from either np.sqrt(arr) or arr depending on whether mask is truthy or not. This is a very good method in many cases, but you have to have the values pre-computed for both branches, which is exactly what you want to avoid with a square root.
TL;DR
Your specific example is looking for a representation of the mask itself. If you don't care about the type:
result = arr >= 0
If you do care about the type:
result = (arr >= 0).astype(int)
OR
result = 1 + np.clip(arr, -1, 0)  # integer arrays only
These solutions create a different array from the input. If you want to replace values in the same buffer,
mask = arr >= 0
arr[mask] = 1
arr[~mask] = 0
You can do something like this:
import numpy as np
a = np.array([-2, -1, 0, 1, 2])
a[a >= 0] = 1  # do this step first: these entries stay non-negative
a[a < 0] = 0
>>> a
array([0, 0, 1, 1, 1])
An alternative to the above solutions is to combine a list comprehension with a ternary operator.
my_array = np.array([-1.2, 3.0, -10.11, 5.2])
sol = np.asarray([0 if val < 0 else 1 for val in my_array])
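Note that iterating over a NumPy array only walks its first axis, so for an n-dimensional input you would ravel first and reshape after (a sketch with made-up values):
my_2d = np.array([[-1.2, 3.0], [-10.11, 5.2]])
sol = np.asarray([0 if val < 0 else 1
                  for val in my_2d.ravel()]).reshape(my_2d.shape)
print(sol)
# [[0 1]
#  [0 1]]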
Take a look at these sources:
https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
https://book.pythontips.com/en/latest/ternary_operators.html
Use numpy.vectorize():
import numpy as np
def unit(elem):
    # return 0 for negative inputs, 1 otherwise
    if elem < 0:
        return 0
    else:
        return 1
a = np.array([[1, 2, -0.5], [0.5, 2, 3]])
vfunc = np.vectorize(unit)
vfunc(a)
# array([[1, 1, 0], [1, 1, 1]])
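Note that np.vectorize is essentially a for loop under the hood; the NumPy documentation states it is provided primarily for convenience, not for performance, so prefer the vectorized expressions above for large arrays.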
I have a long list whose elements are of type int. I want to find the indices of the elements that equal a certain number, and I use np.where to achieve this.
The following is my original code,
# suppose x is [1, 1, 2, 3]
y = np.array(x, dtype=np.float32)
idx = list(np.where(y==1)[0])
# output is [0, 1]
After inspecting the code some time later, I realized that I should not use dtype=np.float32 because it changes the datatype of y to float. The correct code should be the following:
# suppose x is [1, 1, 2, 3]
y = np.array(x)
idx = list(np.where(y==1)[0])
# output is also [0, 1]
Surprisingly, these two code snippets produce exactly the same result.
My question
How is the test for equality handled in np.where when the datatypes of the array and the target are not the same (e.g. int vs. float)?
NumPy's where is not concerned with the comparison of data types: its first argument is an array of bool type. When you write y == 1, this is an array comparison operation which returns a Boolean array, and that array is then passed as an argument to where.
The relevant method is equal, which you implicitly invoke by writing y == 1. Its documentation says:
What is compared are values, not types.
For example,
x, y, z = np.float64(0.25), np.float32(0.25), 0.25
These are all of different types (numpy.float64, numpy.float32, float), but x == y, y == z, and x == z are all True. Here it is important that 0.25 is exactly representable in binary (it is 1/4).
With
x, y, z = np.float64(0.2), np.float32(0.2), 0.2
we see that x == y is False and y == z is False but x == z is True, because Python floats are 64-bit just like np.float64. Since 1/5 is not exactly represented in binary, using 32 bits vs 64 bits results in two different approximations to 1/5, which is why equality fails: not because of types, but because np.float64(0.2) and np.float32(0.2) are actually different values (their difference is about 3e-9).
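To see this concretely (a small sketch reproducing the claim above):
import numpy as np

x, y, z = np.float64(0.2), np.float32(0.2), 0.2
print(x == y, y == z, x == z)  # False False True
print(float(x) - float(y))     # about -3e-9: the gap between the two rounded values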
I'm trying to plot some complex functions using numpy. Example of some working code:
import numpy as np
from PIL import Image
size = 1000
w = np.linspace(-10, 10, size)
x, y = np.meshgrid(w, w)
r = x + 1j*y
def f(q):
    return np.angle(q)
z = f(r)
normalized = ((255/(np.amax(z) - np.amin(z)))*(z+abs(np.amin(z)))).astype(int)
data = [i for j in normalized for i in j]
img = Image.new('L', (size, size))
img.putdata(data[::-1]) #pixels are done bottom to top
img.show()
However, suppose I want the function f to have a simple comparison in it, like this:
def f(q):
    if np.abs(q) < 4:
        return 1
    else:
        return 0
I get the error
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
For the np.abs(q) < 4 check.
I did some digging and realized it's because Python is doing the operation on the entire r array, and it can't compare an array to an integer. So, I tried looking for ways to do element-wise comparisons.
This page looked promising: it says I can do element-wise comparisons by using np.less(a, b), so I tried
def f(q):
    if np.less(np.abs(q), 4):
        return 1
    else:
        return 0
and got the same ValueError. It seems as though both arguments for np.less() need to be arrays of the same size.
What I want is to compare each element of my array to a single, non-array quantity. I suppose I could make a dummy array of the same size filled with identical 4's, but there has to be a more elegant way of doing this.
The key is to return an array value instead of trying to coerce an array into a single bool, which is what if some_array: keeps trying to do. Since there is no unambiguous way to decide what single boolean np.array([True, False]) should convert to, NumPy doesn't even try.
So don't even branch:
def f(q):
return abs(q) < 4
gives an array like
>>> f(np.array([1,3,5]))
array([ True, True, False], dtype=bool)
which as numbers will behave like
>>> f(np.array([1,3,5])).astype(int)
array([1, 1, 0])
and can be fed straight into your normalization and plotting code.
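If you prefer an explicit 0/1 array inside f, here is a small sketch of the same idea using np.where, with the cast built in:
import numpy as np

def f(q):
    # elementwise test: 1 inside the disc |q| < 4, else 0
    return np.where(np.abs(q) < 4, 1, 0)

print(f(np.array([1, 3, 5])))  # [1 1 0]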
For example, let's consider this toy code
import numpy as np
import numpy.random as rnd
a = rnd.randint(0,10,(10,10))
k = (1,2)
b = a[:,k]
for col in np.arange(np.size(b, 1)):
    b[:, col] = b[:, col] + col*100
This code works when k has more than one element. However, when k is a single scalar index, the sub-matrix extracted from a collapses into a 1-D vector, and applying the operation in the for loop throws an error.
Of course, I could fix this by checking the dimension of b and reshaping:
if b.ndim == 1:
    b = np.reshape(b, (np.size(b), 1))
in order to obtain a column vector, but this is expensive.
So, the question is: what is the best way to handle this situation?
This seems like something that would arise quite often and I wonder what is the best strategy to deal with it.
If you index with a list or tuple, the 2d shape is preserved:
In [638]: a=np.random.randint(0,10,(10,10))
In [639]: a[:,(1,2)].shape
Out[639]: (10, 2)
In [640]: a[:,(1,)].shape
Out[640]: (10, 1)
And I think the b iteration can be simplified to:
a[:,k] += np.arange(len(k))*100
This sort of calculation will also be easier if k is always a list or tuple, and never a scalar (a scalar does not have a len).
np.column_stack ensures its inputs are 2d (and expands at the end if not) with:
if arr.ndim < 2:
    arr = array(arr, copy=False, subok=True, ndmin=2).T
np.atleast_2d does
elif len(ary.shape) == 1:
    result = ary[newaxis, :]
which of course could changed in this case to
if b.ndim == 1:
    b = b[:, None]
Anyway, I think it is better to ensure that k is a tuple rather than adjust b's shape afterwards. But keep both options in your toolbox.
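As a quick sketch tying both pieces together (made-up data; note that fancy indexing returns a copy, so a itself is unchanged, just as in the original loop):
import numpy as np

a = np.random.randint(0, 10, (10, 10))
k = (1, 2)                    # always a tuple, even for a single column: k = (1,)
b = a[:, k]                   # shape (10, 2); with k = (1,) it would be (10, 1)
b += np.arange(len(k)) * 100  # vectorized replacement for the column loop
print(b.shape)                # (10, 2)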