Numpy "Where" function can not avoid evaluate Sqrt(negative) - python

It seems that the np.where function evaluates all the possible outcomes first and only applies the condition afterwards. In my case, that means it evaluates the square root of -5, -4, -3, -2 and -1 even though those results are never used.
My code runs and works, but my problem is the warning. I avoided looping over each element, because a loop would run much slower than np.where.
So, here, I am asking:
1. Is there any way to make np.where evaluate the condition first?
2. Can I turn off just this specific warning? How?
3. Is there a better way to do this altogether?
Here is a short example corresponding to my real code, which is gigantic but has essentially the same problem.
Input:
import numpy as np
c = np.arange(10) - 5
d = np.where(c >= 0, np.sqrt(c), c)
Output:
RuntimeWarning: invalid value encountered in sqrt
d = np.where(c >= 0, np.sqrt(c), c)

There is a much better way of doing this. Let's take a look at what your code is doing to see why.
np.where accepts three arrays as inputs. Arrays do not support lazy evaluation.
d = np.where(c >= 0, np.sqrt(c), c)
This line is therefore equivalent to doing
a = (c >= 0)
b = np.sqrt(c)
d = np.where(a, b, c)
Notice that the inputs are computed immediately, before where ever gets called.
Luckily, you don't need to use where at all. Instead, just use a boolean mask:
mask = (c >= 0)
d = np.empty_like(c, dtype=float)  # float dtype, so the sqrt results are not truncated to integers
d[mask] = np.sqrt(c[mask])
d[~mask] = c[~mask]
If you expect a lot of negatives, you can copy all the elements instead of just the negative ones:
d = c.astype(float)  # a float copy, so the sqrt results are kept
d[mask] = np.sqrt(c[mask])
An even better solution might be to use masked arrays:
d = np.ma.masked_array(c, c < 0)
d = np.ma.sqrt(d)
To access the whole data array, with the masked portion unaltered, use d.data.
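For the question's array, a quick sketch of what this produces (my own illustration, using a float copy of c so the data round-trips cleanly):
import numpy as np
c = np.arange(10) - 5.0
d = np.ma.masked_array(c, c < 0)  # mask the negative entries
d = np.ma.sqrt(d)                 # sqrt is only applied where unmasked
print(d)       # masked entries display as --
print(d.data)  # negatives preserved as-is, sqrt applied to the rest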

np.sqrt is a ufunc and accepts a where parameter. It can be used as a mask in this case:
In [61]: c = np.arange(10)-5.0
In [62]: d = c.copy()
In [63]: np.sqrt(c, where=c>=0, out=d);
In [64]: d
Out[64]:
array([-5.        , -4.        , -3.        , -2.        , -1.        ,
        0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ])
In contrast to the np.where case, this does not evaluate the function at the ~where elements.

This is an answer to your 2nd question.
Yes, you can turn off the warnings with the warnings module:
import warnings
warnings.filterwarnings("ignore")
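Note that the call above silences every warning globally. If you only want to suppress this particular numpy warning, a narrower option (my addition, not part of the answer above) is numpy's errstate context manager:
import numpy as np
c = np.arange(10) - 5
# "invalid value encountered in sqrt" is a floating-point "invalid" error,
# so it can be silenced just for this block:
with np.errstate(invalid='ignore'):
    d = np.where(c >= 0, np.sqrt(c), c)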

One solution is to not use np.where and to use indexing instead:
c = np.arange(10)-5
d = c.astype(float)  # float copy, so the sqrt results are not truncated
c_positive = c > 0
d[c_positive] = np.sqrt(c[c_positive])

Related

How to apply a natural logarithm to a matrix and obtain zero for when the matrix entry is zero

In Python I have a matrix with some zero values. How can I apply a natural logarithm and obtain zero when the matrix entry is zero? I am using numpy.log(matrix) to apply the natural logarithm, but I am getting nan when the matrix entry is equal to zero, and I would like it to be zero instead.
You can do something like this:
arr = numpy.nan_to_num(numpy.log(matrix), neginf=0)
nan_to_num replaces all the NaNs by zeroes. Note that numpy.log(0) actually yields -inf rather than NaN, and by default nan_to_num turns -inf into a very large negative finite number, so the neginf=0 argument (available in numpy 1.17+) is needed to actually get zeroes there.
You can find more information here:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.nan_to_num.html
Another alternative is to pass a mask to the where= argument of the np.log function.
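For instance (my own sketch; a float matrix so that out matches the result dtype):
import numpy as np
matrix = np.array([[10., 0, 5], [0, 10, 12]])
out = np.zeros_like(matrix)                 # the zeros stay wherever log is skipped
np.log(matrix, out=out, where=matrix != 0)  # log only where the entry is non-zero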
You can use np.where. The seterr call turns off the warning that would otherwise appear:
RuntimeWarning: divide by zero encountered in log
In:
np.seterr(divide='ignore')
matrix = np.array([[10,0,5], [0,10,12]])
np.where(matrix == 0, 0, np.log(matrix))
Out:
array([[2.30258509, 0.        , 1.60943791],
       [0.        , 2.30258509, 2.48490665]])
You can use numpy.log1p; it evaluates to zero if the entry is zero (since the log of 1 is zero), and the reverse operation is numpy.expm1. Note that log1p computes log(1 + x), so the non-zero entries come out differently than with plain log.
You can find more information in the documentation:
Log1p
Expm1
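A quick illustration of the round trip (my own sketch):
import numpy as np
m = np.array([0.0, 1.0, 10.0])
np.log1p(m)            # array([0.        , 0.69314718, 2.39789527])
np.expm1(np.log1p(m))  # recovers m: array([ 0.,  1., 10.])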
np.log is a ufunc that takes a where parameter. That tells it which elements of x will be used in the calculation; the rest are skipped. This is best used with an out parameter, as follows:
In [25]: x = np.array([1.,2,0,3,10,0])
In [26]: res = np.zeros_like(x)
In [27]: idx = x>0
In [28]: np.log(x)
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in log
#!/usr/bin/python3
Out[28]:
array([ 0.        ,  0.69314718,        -inf,  1.09861229,  2.30258509,
              -inf])
In [29]: np.log(x, out=res, where=idx)
Out[29]:
array([0.        , 0.69314718, 0.        , 1.09861229, 2.30258509,
       0.        ])

How to find negative imaginary parts of values in an array and turn them positive?

I have a function a = x*V, where x takes thousands of values as x = arange(1, 1000, 0.1) and V is a combination of other constants. These make a always complex (it has nonzero real and imaginary parts). However, because a depends on other values, imag(a) can be negative for some x's.
For what I am doing, however, I need imag(a) to be always positive, so I need to take the negative values and turn them into positive.
I have tried doing
if imag(a) < 0:
    imag(a) = -1*imag(a)
That didn't seem to work because it gives me the error: SyntaxError: Can't assign to function call. I thought it was because it's an array so I tried any() and all(), but that didn't work either.
I'm out of options now.
IIUC:
In [35]: a = np.array([1+1j, 2-2j, 3+3j, 4-4j])
In [36]: a.imag *= np.where(a.imag < 0, -1, 1)
In [37]: a
Out[37]: array([ 1.+1.j, 2.+2.j, 3.+3.j, 4.+4.j])
You can't redefine a function that way. It would be like saying
sqrt(x) = 2*sqrt(x)
What you can do is reassign the value of a itself (not imag(a)). For a scalar:
if np.imag(a) < 0:
    a = a - 2*np.imag(a)*1j
For example, if a = 3 - 5j, then it would give you
3 - 5j - 2(-5)j = 3 + 5j
Using np.conj on the masked elements appears to be faster than doing the subtraction. As a full function:
import numpy as np
def imag_abs(x):
    mask = x.imag < 0
    x[mask] = np.conj(x[mask])
    return x
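Usage sketch (my addition; note that this modifies x in place as well as returning it):
a = np.array([1+1j, 2-2j, 3+3j, 4-4j])
imag_abs(a)  # array([1.+1.j, 2.+2.j, 3.+3.j, 4.+4.j])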

Understanding the runtime of numpy.where and equivalent alternatives

According to http://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html, if x and y are given and the input arrays are 1-D, where is equivalent to [xv if c else yv for (c, xv, yv) in zip(x!=0, 1/x, x)]. When doing runtime benchmarks, however, they have significantly different speeds:
x = np.array(range(-500, 500))
%timeit np.where(x != 0, 1/x, x)
10000 loops, best of 3: 23.9 µs per loop
%timeit [xv if c else yv for (c,xv, yv) in zip(x!=0, 1/x, x)]
1000 loops, best of 3: 232 µs per loop
Is there a way I can rewrite the second form so that it has a similar runtime to the first? The reason I ask is because I'd like to use a slightly modified version of the second case to avoid division by zero errors:
[1 / xv if c else xv for (c,xv) in zip(x!=0, x)]
Another question: the first case returns a numpy array while the second case returns a list. Is the most efficient way to have the second case return an array is to first make a list and then convert the list to an array?
np.array([xv if c else yv for (c,xv, yv) in zip(x!=0, 1/x, x)])
Thanks!
You just asked about 'delaying' the 'where':
numpy.where : how to delay evaluating parameters?
and someone else just asked about divide by zero:
Replace all elements of a matrix by their inverses
When people say that where is similar to the list comprehension, they attempt to describe the action, not the actual implementation.
np.where called with just one argument is the same as np.nonzero. This quickly (in compiled code) loops through the argument, and collects the indices of all non-zero values.
np.where, when called with 3 arguments, returns a new array, collecting values from the 2nd and 3rd arguments based on the condition. But it's important to realize that those arguments must be actual arrays; they are not functions that it evaluates element by element.
So the where is more like:
m1 = 1/x
m2 = x
[v1 if c else v2 for (c, v1, v2) in zip(x != 0, m1, m2)]
It's easy to run this iteration in compiled code because it just involves 3 arrays of matching size (matching via broadcasting).
np.array([...]) is a reasonable way of converting a list (or list comprehension) into an array. It may be a little slower than some alternatives because np.array is a powerful general-purpose function. np.fromiter([], dtype) may be faster in some cases, because it isn't as general (you have to specify the dtype, and it only works with 1d data).
There are two time-proven strategies for getting more speed in element-by-element calculations:
use packages like numba and cython to rewrite the problem as compiled code (a sketch follows this list)
rework your calculations to use existing numpy methods. The use of masking to avoid divide by zero is a good example of this.
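As an illustration of the first strategy, a minimal numba sketch (my addition; assumes numba is installed and a 1-D input):
import numpy as np
from numba import njit

@njit
def safe_reciprocal(x):
    # compiled element-by-element loop: the per-element branch is cheap here
    out = np.empty(x.shape, dtype=np.float64)
    for i in range(x.shape[0]):
        out[i] = 1.0 / x[i] if x[i] != 0 else x[i]
    return out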
=====================
np.ma.where, the version for masked arrays, is written in Python. Its code might be instructive; note in particular this piece:
# Construct an empty array and fill it
d = np.empty(fc.shape, dtype=ndtype).view(MaskedArray)
np.copyto(d._data, xv.astype(ndtype), where=fc)
np.copyto(d._data, yv.astype(ndtype), where=notfc)
It makes a target, and then selectively copies values from the two input arrays, based on the condition array.
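The same pattern can be applied directly to the 1/x problem (my own sketch, using np.divide's where= to keep the division itself safe):
import numpy as np
x = np.arange(-500, 500)
fc = x != 0
inv = np.divide(1.0, x, out=np.zeros(x.shape), where=fc)  # skips the zero entry
d = np.empty(x.shape)
np.copyto(d, inv, where=fc)   # values from the "1/x" branch
np.copyto(d, x, where=~fc)    # values from the "x" branch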
You can avoid division by zero while maintaining performance by using advanced indexing:
x = np.arange(-500, 500)
result = np.empty(x.shape, dtype=float) # set the dtype to whatever is appropriate
nonzero = x != 0
result[nonzero] = 1/x[nonzero]
result[~nonzero] = 0
If you for some reason want to bypass an error with numpy, it might be worth looking into the errstate context:
x = np.array(range(-500, 500))
with np.errstate(divide='ignore'):  # ignore the zero-division warning
    x = 1/x
x[~np.isfinite(x)] = 0  # zero out inf and NaN entries (x[x!=x] would only catch NaN, not inf)
Consider changing the array in place by using np.put():
In [56]: x = np.linspace(-1, 1, 5)
In [57]: x
Out[57]: array([-1. , -0.5, 0. , 0.5, 1. ])
In [58]: indices = np.argwhere(x != 0)
In [59]: indices
Out[59]:
array([[0],
       [1],
       [3],
       [4]], dtype=int64)
In [60]: np.put(x, indices, 1/x[indices])
In [61]: x
Out[61]: array([-1., -2., 0., 2., 1.])
The approach above does not create a new array, which could be very convenient if x is a large array.

Numpy.where() with an array in its conditional

I don't know how to describe this well so I'll just show it.
How do I do this...
for iy in random_y:
    print(x[np.where(y == iy)], iy)
  x            y
[ 0.5] :  0.247403959255
[ 2. ] :  0.841470984808
[49.5] : -0.373464754784
without for loops, and get the result as a single array, like when you use np.where() or array[cond]. Since, you know, this is Python B)
NOTE: The reason why I want to do this is because I have a random subset of the Y values and I want to find the corresponding X values.
If you are looking for exact matches, you can simply use np.in1d as this is a perfect scenario for its usage, like so -
first_output = x[np.in1d(y, random_y)]
second_output = random_y[np.in1d(random_y, y)]
If you are dealing with floating-point numbers, you might want to build some tolerance into the comparisons. For such cases, you can use NumPy broadcasting and then np.where, like so -
tol = 1e-5 # Edit this to change tolerance
R,C = np.where(np.abs(random_y[:,None] - y)<=tol)
first_output = x[C]
second_output = random_y[R]
Maybe this could do the trick (not tested):
print("\n".join(str(x[np.where(y == iy)]) + " " + str(iy) for iy in random_y))

Find float in ndarray

I tried to find a float number in ndarray. Due to the software package I am using (Abaqus), the precision it outputs is a little bit low. For example, 10 is something like 10.00003. Therefore, I was wondering whether there is a "correct" way to do it, that is neater than my code.
Example code:
import numpy as np
array = np.arange(10)
number = 5.00001
If I do this:
idx = np.where(number==array)[0][0]
Then the result is empty because 5.00001 does not equal to 5.
Now I am doing:
atol = 1e-3 # Absolute tolerance
idx = np.where(abs(number-array) < atol)[0][0]
which works, and is not too messy... Yet I was wondering whether there is a neater way to do it. Thanks!
PS: numpy.allclose() is another way to do it, but I need to use number * np.ones([array.shape[0], array.shape[1]]) and it still seems verbose to me...
Edit: Thank you all so much for the fantastic answers! np.isclose() is the exact function that I am looking for, and I missed it since it is not in the docs... I wouldn't have realized this until they updated the docs, if it weren't for you guys. Thank you again!
PS: numpy.allclose() is another way to do it, but I need to use number * np.ones([array.shape[0], array.shape[1]]) and it still seems verbose to me...
You almost never need to do anything like number * np.ones([array.shape[0], array.shape[1]]). Just as you can multiply that scalar number by that ones array to multiply all of its 1 values by number, you can pass that scalar number to allclose to compare all of the original array's values to number. For example:
>>> a = np.array([[2.000000000001, 2.0000000002], [2.000000000001, 1.999999999]])
>>> np.allclose(a, 2)
True
As a side note, if you really do need an array of all 2s, there's an easier way to do it than multiplying 2 by ones:
>>> np.tile(2, array.shape)
array([[2, 2],
       [2, 2]])
For that matter, I don't know why you need to do [array.shape[0], array.shape[1]]. If the array is 2D, that's exactly the same thing as array.shape. If the array might be larger, it's exactly the same as array.shape[:2].
I'm not sure this solves your actual problem, because it seems like you want to know which ones are close and not close, rather than just whether or not they all are. But you said you could use allclose if it weren't for the fact that it's too verbose to create the array to compare with, so hopefully this helps.
So, if you need whereclose rather than allclose… well, there's no such function. But it's pretty easy to build yourself, and you can always wrap it up if you're doing it repeatedly.
If you had an isclose method—like allclose, but returning a bool array instead of a single bool—you could just write:
idx = np.where(isclose(a, b, 0, atol))[0][0]
… or, if you're doing it over and over:
def whereclose(a, b, rtol=1e-05, atol=1e-08):
    return np.where(isclose(a, b, rtol, atol))
idx = whereclose(a, b, 0, atol)[0][0]
As it turns out, version 1.7 of numpy does have exactly that function (see also here), but it doesn't appear to be in the docs. If you don't want to rely on a possibly-undocumented function, or need to work with numpy 1.6, you can write it yourself trivially:
def isclose(a, b, rtol=1e-05, atol=1e-08):
    return np.abs(a-b) <= (atol + rtol * np.abs(b))
If you have up-to-date numpy (1.7), then the best way is to use np.isclose which will broadcast the shapes together automatically:
import numpy as np
a = np.arange(10)
n = 5.000001
np.isclose(a, n).nonzero()
#(array([5]),)
or, if you expect only one match:
np.isclose(a, n).nonzero()[0][0]
#5
(np.nonzero is basically the same thing as np.where, except that it doesn't have the three-argument then/else form)
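To see the equivalence (my own sketch, continuing the example above):
cond = np.isclose(a, n)
np.where(cond)   # (array([5]),)
cond.nonzero()   # (array([5]),) - identical for the single-argument form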
The method you use above, specifically abs(A - B) < atol, is standard for doing floating point comparisons across many languages. Obviously when using numpy A and/or B can be arrays or numbers.
Here is another approach that might be useful to look at. I'm not sure it applies to your case, but it could be very helpful if you're looking for more than one number in the array (which is a common use case). It's inspired by this question which is kind of similar.
import numpy as np
def find_close(a, b, rtol=1e-05, atol=1e-08):
    tol = atol + abs(b) * rtol
    lo = b - tol
    hi = b + tol
    order = a.argsort()
    a_sorted = a[order]
    left = a_sorted.searchsorted(lo)
    right = a_sorted.searchsorted(hi, 'right')
    return [order[L:R] for L, R in zip(left, right)]

a = np.array([2., 3., 3., 4., 0., 1.])
b = np.array([1.01, 3.01, 100.01])
print(find_close(a, b, atol=.1))
# [array([5]), array([1, 2]), array([], dtype=int64)]
