I have a big data set of floating point numbers. I iterate through them and evaluate np.log(x) for each of them.
I get
RuntimeWarning: divide by zero encountered in log
I would like to get around this and return 0 if this error occurs.
I am thinking of defining a new function:
def safe_ln(x):
    #returns: ln(x) but replaces -inf with 0
    l = np.log(x)
    #if l = -inf:
    #    l = 0
    return l
Basically, I need a way of testing whether the output is -inf, but I don't know how to proceed.
Thank you for your help!
You are using an np function, so I assume you are working with a numpy array.
In that case, the most efficient way to do this is to use the where function instead of a for loop:
myarray= np.random.randint(10,size=10)
result = np.where(myarray>0, np.log(myarray), 0)
otherwise you can simply use the log function and then patch the hole:
myarray= np.random.randint(10,size=10)
result = np.log(myarray)
result[result==-np.inf]=0
The np.log function correctly returns -inf for an input of 0, so are you sure you want to return 0 instead? If you ever have to convert back to the original values, you will run into problems, because exp(0) = 1 turns the original zeros into ones.
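The "patch the hole" variant above can also be written with np.isneginf, which answers the question of how to test for -inf directly. The following is a minimal sketch, not the answerer's code; wrapping the call in np.errstate to silence the warning is an added assumption about what is wanted.

```python
import numpy as np

def safe_ln(x):
    """ln(x) with -inf (the result for x == 0) replaced by 0."""
    x = np.asarray(x, dtype=float)
    # Silence only the divide-by-zero warning, and only inside this block.
    with np.errstate(divide="ignore"):
        result = np.log(x)
    # np.isneginf marks exactly the entries that came out as -inf.
    return np.where(np.isneginf(result), 0.0, result)

values = safe_ln([0.0, 1.0, np.e])
```

Negative inputs still produce nan (with the usual "invalid value" warning), since only the zero case is patched here.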
Since the log for x=0 is minus infinite, I'd simply check if the input value is zero and return whatever you want there:
import math
def safe_ln(x):
    if x <= 0:
        return 0
    return math.log(x)
EDIT: small edit: you should check for all values smaller than or equal to 0.
EDIT 2: np.log is of course a function to calculate on a numpy array, for single values you should use math.log. This is how the above function looks with numpy:
def safe_ln(x, minval=0.0000000001):
    return np.log(x.clip(min=minval))
You can do this.
def safe_ln(x):
    # np.log only warns by default, so ask numpy to raise instead
    try:
        with np.errstate(divide='raise'):
            l = np.log(x)
    except FloatingPointError:
        l = 0
    return l
I like to use sys.float_info.min as follows:
>>> import numpy as np
>>> import sys
>>> arr = np.linspace(0.0, 1.0, 3)
>>> print(arr)
[0. 0.5 1. ]
>>> arr[arr < sys.float_info.min] = sys.float_info.min
>>> print(arr)
[2.22507386e-308 5.00000000e-001 1.00000000e+000]
>>> np.log10(arr)
array([-3.07652656e+02, -3.01029996e-01, 0.00000000e+00])
Other answers have also introduced small positive values, but I prefer to use the smallest possible value to make the approximation more accurate.
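To make the effect of the floor concrete, here is a small sketch comparing the stand-in value that log(0) receives under two different floors. The name floor_small and its value 1e-10 are illustrative (taken from the clip-based answer above), not part of this answer.

```python
import sys
import numpy as np

floor_small = 1e-10              # an arbitrary small floor
floor_min = sys.float_info.min   # smallest positive normalized double

# The floor decides what finite value stands in for log(0) = -inf.
log_small = np.log(floor_small)
log_min = np.log(floor_min)
print(log_small, log_min)

# NumPy exposes the same constant as np.finfo(float).tiny.
same_constant = np.finfo(float).tiny == sys.float_info.min
```

The smaller the floor, the more negative the stand-in value, which is the sense in which the smallest possible value is the closer approximation to -inf.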
The answer given by Enrico is nice, but both solutions result in a warning:
RuntimeWarning: divide by zero encountered in log
As an alternative, we can still use the where function but only execute the main computation where it is appropriate:
# answer from Enrico...
myarray = np.random.randint(10, size=10)
result = np.where(myarray > 0, np.log(myarray), 0)
# alternative implementation -- a bit more typing but avoids warnings.
loc = np.where(myarray > 0)
result2 = np.zeros_like(myarray, dtype=float)
result2[loc] = np.log(myarray[loc])
# check it is giving right solution:
print(np.allclose(result, result2))
My use case was for division, but the principle is clearly the same:
x = np.random.randint(10, size=10)
divisor = np.ones(10,)
divisor[3] = 0 # make one divisor invalid
y = np.zeros_like(divisor, dtype=float)
loc = np.where(divisor>0) # (or !=0 if your data could have -ve values)
y[loc] = x[loc] / divisor[loc]
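The same "compute only where it is valid" idea can also be expressed with the where= and out= arguments that NumPy ufuncs accept. This is a sketch under the assumption that 0 is the desired fill value; the sample arrays are made up for illustration.

```python
import numpy as np

myarray = np.array([0, 1, 2, 0, 5])

# Entries where the mask is False keep whatever `out` already holds (here 0.0),
# so np.log is never evaluated at zero and no warning is emitted.
result = np.log(myarray, out=np.zeros(myarray.shape, dtype=float), where=myarray > 0)

# The division use case follows the same pattern.
x = np.arange(5, dtype=float)
divisor = np.array([1.0, 2.0, 0.0, 4.0, 5.0])
y = np.divide(x, divisor, out=np.zeros_like(x), where=divisor != 0)
```

Passing out= matters here: without it, the skipped entries are left uninitialized rather than zero.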
Use exception handling:
In [27]: def safe_ln(x):
   ....:     try:
   ....:         return math.log(x)
   ....:     except ValueError:  # np.log(x) might raise some other error though
   ....:         return float("-inf")
   ....:
In [28]: safe_ln(0)
Out[28]: -inf
In [29]: safe_ln(1)
Out[29]: 0.0
In [30]: safe_ln(-100)
Out[30]: -inf
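You could do:
import warnings
def safe_ln(x):
    #returns: ln(x) but replaces -inf with 0
    with warnings.catch_warnings():
        # a warning is not an exception unless the filter turns it into one
        warnings.simplefilter("error", RuntimeWarning)
        try:
            l = np.log(x)
        except RuntimeWarning:
            l = 0
    return l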
For those looking for an np.log solution that takes an np.ndarray and nudges up only the zero values:
import numpy as np
def smarter_nextafter(x: np.ndarray) -> np.ndarray:
    safe_x = np.where(x != 0, x, np.nextafter(x, 1))
    return np.log(safe_x)
def clip_usage(x: np.ndarray, safe_min: float | None = None) -> np.ndarray:
    # Inspiration: https://stackoverflow.com/a/13497931/
    # finfo.tiny is the smallest positive normal value (finfo.min is the most negative value)
    clipped_x = x.clip(min=safe_min or np.finfo(x.dtype).tiny)
    return np.log(clipped_x)
def inplace_usage(x: np.ndarray, safe_min: float | None = None) -> np.ndarray:
    # Inspiration: https://stackoverflow.com/a/62292638/
    # Note: modifies x in place
    x[x == 0] = safe_min or np.finfo(x.dtype).tiny
    return np.log(x)
Or if you don't mind nudging all values and like bad big-O runtimes:
def brute_nextafter(x: np.ndarray) -> np.ndarray:
    # Just for reference, don't use this
    while not x.all():
        x = np.nextafter(x, 1)
    return np.log(x)
Related
I have a NumPy array with the following properties:
shape: (9986080, 2)
dtype: np.float32
I have a method that loops over the rows of the array, performs an operation and then stores the result in a new array:
def foo(arr):
    n_rows = arr.shape[0]  # arr.size would count both columns
    new_arr = np.empty(n_rows, dtype=np.uint64)
    for i in range(n_rows):
        x, y = arr[i]
        if x < 0:
            e = '1'
        else:
            e = '2'
        if y > 0:
            n = '3'
        else:
            n = '4'
        new_arr[i] = int(f'{abs(x)}{e}{abs(y)}{n}'.replace('.', ''))
    return new_arr
I agree with Iguananaut's comment that this data structure seems a bit odd. My biggest problem with it is that it is really tricky to try and vectorize the putting together of integers in a string and then re-converting that to an integer. Still, this will certainly help speed up the function:
def foo(arr):
    x_values = arr[:,0]
    y_values = arr[:,1]
    ones = np.ones(arr.shape[0], dtype=np.uint64)
    e = np.char.array(np.where(x_values < 0, ones, ones * 2))
    # the question uses y > 0 for the '3' case
    n = np.char.array(np.where(y_values > 0, ones * 3, ones * 4))
    x_values = np.char.array(np.absolute(x_values))
    y_values = np.char.array(np.absolute(y_values))
    x_values = np.char.replace(x_values, '.', '')
    y_values = np.char.replace(y_values, '.', '')
    new_arr = np.char.add(np.char.add(x_values, e), np.char.add(y_values, n))
    return new_arr.astype(np.uint64)
Here, the x and y values of the input array are first split up. Then we use a vectorized computation to determine where e and n should be 1 or 2, 3 or 4. The last line uses a standard list comprehension to do the string merging bit, which is still undesirably slow for super large arrays but faster than a regular for loop. Also vectorizing the previous computations should speed the function up hugely.
Edit:
I was mistaken before. Numpy does have a nice way of handling string concatenation using the np.char.add() method. This requires converting x_values and y_values to Numpy character arrays using np.char.array(). Also for some reason, the np.char.add() method only takes two arrays as inputs, so it is necessary to first concatenate x_values and e and y_values and n and then concatenate these results. Still, this vectorizes the computations and should be pretty fast. The code is still a bit clunky because of the rather odd operation you are after, but I think this will help you speed up the function greatly.
You may use np.apply_along_axis. When you pass it a function that takes a row (or column) as its argument, it applies that function along the given axis, which is what you want here.
For your case, you may rewrite the function as below:
def foo(row):
    x, y = row
    if x < 0:
        e = '1'
    else:
        e = '2'
    if y > 0:
        n = '3'
    else:
        n = '4'
    return int(f'{abs(x)}{e}{abs(y)}{n}'.replace('.', ''))
# Where you want to use it (arr is the input array from the question).
new_arr = np.apply_along_axis(foo, 1, arr)
I have to write a function, s(x) = x * sin(3/x) in python that is capable of taking single values or vectors/arrays, but I'm having a little trouble handling the cases when x is zero (or has an element that's zero). This is what I have so far:
def s(x):
    result = zeros(size(x))
    for a in range(0,size(x)):
        if (x[a] == 0):
            result[a] = 0
        else:
            result[a] = float(x[a] * sin(3.0/x[a]))
    return result
Which...doesn't work for x = 0. And it's kinda messy. Even worse, I'm unable to use sympy's integrate function on it, or use it in my own simpson/trapezoidal rule code. Any ideas?
When I use integrate() on this function, I get the following error message: "Symbol" object does not support indexing.
This takes about 30 seconds per integrate call:
import sympy as sp
x = sp.Symbol('x')
int2 = sp.integrate(x*sp.sin(3./x),(x,0.000001,2)).evalf(8)
print(int2)
int1 = sp.integrate(x*sp.sin(3./x),(x,0,2)).evalf(8)
print(int1)
The results are:
1.0996940
-4.5*Si(zoo) + 8.1682775
Clearly you want to start the integration from a small positive number to avoid the problem at x = 0.
You can also assign x*sin(3./x) to a variable, e.g.:
s = x*sp.sin(3./x)
int1 = sp.integrate(s, (x, 0.00001, 2))
My original answer using scipy to compute the integral:
import scipy.integrate
import math
def s(x):
    if abs(x) < 0.00001:
        return 0
    else:
        return x*math.sin(3.0/x)
s_exact = scipy.integrate.quad(s, 0, 2)
print(s_exact)
See the scipy docs for more integration options.
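For the other half of the question (accepting both single values and arrays), a vectorized s(x) can be sketched with a boolean mask instead of an element-wise loop. This is one possible shape for it, assuming s(0) should be defined as 0 as in the question's own code.

```python
import numpy as np

def s(x):
    """x * sin(3/x), with s(0) defined as 0; accepts scalars or arrays."""
    arr = np.atleast_1d(np.asarray(x, dtype=float))
    result = np.zeros_like(arr)
    mask = arr != 0
    # Only the nonzero entries are divided, so no division by zero occurs.
    result[mask] = arr[mask] * np.sin(3.0 / arr[mask])
    # Return a plain scalar when a scalar was passed in.
    return result[0] if np.ndim(x) == 0 else result

at_zero = s(0)
on_array = s(np.array([0.0, 1.0, -2.0]))
```

Because it returns a float for scalar input, this version can be passed directly to scipy.integrate.quad or to a hand-written Simpson or trapezoidal rule; it is still a numeric function, so it does not help with SymPy's symbolic integrate.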
If you want to use SymPy's integrate, you need a symbolic function. A wrong value at a point doesn't really matter for integration (at least mathematically), so you shouldn't worry about it.
It seems there is a bug in SymPy that gives an answer in terms of zoo at 0, because it isn't using limit correctly. You'll need to compute the limits manually. For example, the integral from 0 to 1:
In [14]: res = integrate(x*sin(3/x), x)
In [15]: ans = limit(res, x, 1) - limit(res, x, 0)
In [16]: ans
Out[16]: -9*pi/4 + 3*cos(3)/2 + sin(3)/2 + 9*Si(3)/2
In [17]: ans.evalf()
Out[17]: -0.164075835450162
Suppose I have a function whose range is a scalar but whose domain is a vector. For example:
def func(x):
    return x[0] + 1 + x[1]**2
What's a good way to find a root of this function? scipy.optimize.fsolve and scipy.optimize.root expect func to return a vector (rather than a scalar), and scipy.optimize.newton only takes scalar arguments. I can redefine func as
def func(x):
    return [x[0] + 1 + x[1]**2, 0]
Then root and fsolve can find a root, but the zeros in the Jacobian means it won't always do a good job. For example:
fsolve(func, array([0,2]))
=> array([-5, 2])
It'll only vary the first parameter but not the second, meaning that it often finds a zero that's far away.
EDIT: it looks like the following redefinition of func works better:
def func(x):
    fx = x[0] + 1 + x[1]**2
    return [fx, fx]
fsolve(func, array([0,5]))
=> array([-16.27342781, 3.90812331])
So it's now willing to change both parameters. The code is still kind of ugly though.
Have you tried the minimization of the absolute value of your function using fmin?
For example:
>>> import scipy.optimize as op
>>> import numpy as np
>>> def func(x):
...     return x[0] + 1 + x[1]**2
...
>>> func1 = lambda x: np.abs(func(x))
>>> tmp = op.fmin(func1, [10000., 10000.])
>>> func(tmp)
0.0
>>> print(tmp)
[-8346.12025122 91.35162971]
Since, for my problem, I have a good initial guess and a non-crazy function, Newton's method works well. For a scalar-valued function of a vector argument, Newton's method becomes:
$x_{n+1} = x_n - \dfrac{f(x_n)}{\lVert \nabla f(x_n) \rVert^2}\, \nabla f(x_n)$
Here's a rough code example:
from numpy import array
from numpy.linalg import norm
def func(x): #the function to find a root of
    return x[0] + 1 + x[1]**2
def dfunc(x): #the gradient of that function
    return array([1, 2*x[1]])
def newtRoot(x0, func, dfunc):
    x = array(x0)
    for n in range(100): # do at most 100 iterations
        f = func(x)
        df = dfunc(x)
        if abs(f) < 1e-6: # exit function if we're close enough
            break
        x = x - df*f/norm(df)**2 # update guess
    return x
In use:
newtRoot([0,2],func,dfunc)
=> array([-1.0052546 , 0.07248865])
func([-1.0052546 , 0.07248865])
=> 4.3788225025098715e-09
Not bad! Of course, this function is very rough, but you get the idea. It also won't work well for "tricky" functions or where you don't have a good starting guess. I think I'll use something like this but then fall back to fsolve or root if Newton's method doesn't converge.
When I use this random generator: numpy.random.multinomial, I keep getting:
ValueError: sum(pvals[:-1]) > 1.0
I am always passing the output of this softmax function:
def softmax(w, t = 1.0):
    e = numpy.exp(numpy.array(w) / t)
    dist = e / numpy.sum(e)
    return dist
except now that I am getting this error, I also added this for the parameter (pvals):
while numpy.sum(pvals) > 1:
    pvals /= (1+1e-5)
but that didn't solve it. What is the right way to make sure I avoid this error?
EDIT: here is function that includes this code
def get_MDN_prediction(vec):
    coeffs = vec[::3]
    means = vec[1::3]
    stds = np.log(1+np.exp(vec[2::3]))
    stds = np.maximum(stds, min_std)
    coe = softmax(coeffs)
    while np.sum(coe) > 1-1e-9:
        coe /= (1+1e-5)
    coeff = unhot(np.random.multinomial(1, coe))
    return np.random.normal(means[coeff], stds[coeff])
I also encountered this problem during my language modelling work.
The root of this problem is numpy's implicit type casting: the output of my softmax() is float32, but numpy.random.multinomial() IMPLICITLY casts pvals to float64. This cast can make pvals.sum() exceed 1.0 because of numerical rounding.
This issue is recognized and posted here
I know the question is old but since I faced the same problem just now, it seems to me it's still valid. Here's the solution I've found for it:
a = np.asarray(a).astype('float64')
a = a / np.sum(a)
b = np.random.multinomial(1, a, 1)
The important part is the astype('float64') cast on the first line. If you omit it, the problem you've mentioned will happen from time to time. But if you change the type of the array to float64, it will never happen.
Something that few people noticed: a robust version of the softmax can be easily obtained by removing the logsumexp from the values:
import numpy as np
from scipy.special import logsumexp  # formerly scipy.misc.logsumexp
def log_softmax(vec):
    return vec - logsumexp(vec)
def softmax(vec):
    return np.exp(log_softmax(vec))
Just check it:
print(softmax(np.array([1.0, 0.0, -1.0, 1.1])))
Simple, isn't it?
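If SciPy is not available, a similar stabilization can be sketched with NumPy alone by subtracting the maximum before exponentiating. This swaps in the max-subtraction trick for logsumexp; the name softmax_stable and the sample logits are illustrative, and the float64 conversion is included because of the casting issue described above.

```python
import numpy as np

def softmax_stable(w, t=1.0):
    """Softmax that avoids overflow by shifting the logits so the largest is 0."""
    z = np.asarray(w, dtype=np.float64) / t
    z = z - np.max(z)      # shifting by a constant does not change the softmax
    e = np.exp(z)          # every exponent is <= 0, so exp cannot overflow
    return e / np.sum(e)

# Logits this large overflow a naive exp(w) / sum(exp(w)).
p = softmax_stable([1000.0, 1001.0, 999.0])
sample = np.random.multinomial(1, p)
```

This guards against overflow and the resulting NaNs; it does not by itself remove last-bit rounding in the sum, which is why the result is kept in float64.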
The softmax implementation I was using is not stable enough for the values I was using it with. As a result, sometimes the output has a sum greater than 1 (e.g. 1.0000024...).
This case should be handled by the while loop. But sometimes the output contains NaNs, in which case the loop is never triggered, and the error persists.
Also, numpy.random.multinomial doesn't raise an error if it sees a NaN.
Here is what I'm using right now, instead:
def softmax(vec):
    # A is not defined in this snippet; presumably an alias for np.array
    vec -= min(A(vec))
    if max(vec) > 700:
        a = np.argsort(vec)
        aa = np.argsort(a)
        vec = vec[a]
        i = 0
        while max(vec) > 700:
            i += 1
            vec -= vec[i]
        vec = vec[aa]
    e = np.exp(vec)
    return e/np.sum(e)
def sample_multinomial(w):
    """
    Sample multinomial distribution with parameters given by softmax of w
    Returns an int
    """
    p = softmax(w)
    x = np.random.uniform(0,1)
    for i,v in enumerate(np.cumsum(p)):
        if x < v: return i
    return len(p)-1 # shouldn't happen...
Can someone help me to find a solution on how to calculate a cubic root of the negative number using python?
>>> math.pow(-3, float(1)/3)
nan
It does not work. The cube root of a negative number is a negative number. Any solutions?
A simple use of De Moivre's formula is sufficient to show that the cube root of a value, regardless of sign, is a multi-valued function. That means that for any nonzero input value there are three solutions. Most of the solutions presented so far only return the principal root. A solution that returns all valid roots is shown below.
import numpy
import math
def cuberoot( z ):
    z = complex(z)
    x = z.real
    y = z.imag
    mag = abs(z)
    arg = math.atan2(y,x)
    return [ mag**(1./3) * numpy.exp( 1j*(arg+2*n*math.pi)/3 ) for n in range(1,4) ]
Edit: As requested, in cases where it is inappropriate to have dependency on numpy, the following code does the same thing.
def cuberoot( z ):
    z = complex(z)
    x = z.real
    y = z.imag
    mag = abs(z)
    arg = math.atan2(y,x)
    resMag = mag**(1./3)
    resArg = [ (arg+2*math.pi*n)/3. for n in range(1,4) ]
    return [ resMag*(math.cos(a) + math.sin(a)*1j) for a in resArg ]
You could use:
-math.pow(3, float(1)/3)
Or more generally:
if x > 0:
    return math.pow(x, float(1)/3)
elif x < 0:
    return -math.pow(abs(x), float(1)/3)
else:
    return 0
math.pow(abs(x),float(1)/3) * (1,-1)[x<0]
You can get the complete (all n roots) and more general (any sign, any power) solution using:
import cmath
x, t = -3., 3 # x**(1/t)
a = cmath.exp((1./t)*cmath.log(x))
p = cmath.exp(1j*2*cmath.pi*(1./t))
r = [a*(p**i) for i in range(t)]
Explanation:
a uses the identity x^u = exp(u*log(x)). This gives one of the roots; to get the others, rotate it in the complex plane by (full rotation)/t.
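As a quick check of the rotation argument, the sketch below wraps the same cmath calls in a helper (all_roots is a hypothetical name, not part of the answer) and then picks out the real root for the cube-root case.

```python
import cmath

def all_roots(x, t):
    """Return all t complex t-th roots of a nonzero x."""
    a = cmath.exp(cmath.log(x) / t)     # one root, via x**u = exp(u*log(x))
    p = cmath.exp(2j * cmath.pi / t)    # rotation by (full turn)/t
    return [a * p**i for i in range(t)]

roots = all_roots(-3.0, 3)

# Each root cubed should give back -3, up to rounding error.
residuals = [abs(r**3 - (-3.0)) for r in roots]

# For a negative real x and odd t, exactly one root is (numerically) real.
real_root = min(roots, key=lambda r: abs(r.imag))
```

The real root carries a tiny imaginary part from rounding, so take real_root.real if a plain float is needed.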
Taking the earlier answers and making it into a one-liner:
import math
def cubic_root(x):
    return math.copysign(math.pow(abs(x), 1.0/3.0), x)
The cubic root of a negative number is just the negative of the cubic root of the absolute value of that number.
i.e. x^(1/3) for x < 0 is the same as (-1)*(|x|)^(1/3)
Just make your number positive, and then perform cubic root.
You can also wrap the libm library that offers a cbrt (cube root) function:
from ctypes import *
libm = cdll.LoadLibrary('libm.so.6')
libm.cbrt.restype = c_double
libm.cbrt.argtypes = [c_double]
libm.cbrt(-8.0)
gives the expected
-2.0
numpy has an inbuilt cube root function cbrt that handles negative numbers fine:
>>> import numpy as np
>>> np.cbrt(-8)
-2.0
This was added in version 1.10.0 (released 2015-10-06).
Also works for numpy array / list inputs:
>>> np.cbrt([-8, 27])
array([-2., 3.])
You can use cbrt from scipy.special:
>>> from scipy.special import cbrt
>>> cbrt(-3)
-1.4422495703074083
This also works for arrays.
this works with numpy array as well:
cbrt = lambda n: n/abs(n)*abs(n)**(1./3)
Primitive solution:
def cubic_root(nr):
    if nr<0:
        return -math.pow(-nr, float(1)/3)
    else:
        return math.pow(nr, float(1)/3)
Probably massively non-pythonic, but it should work.
I just had a very similar problem and found the NumPy solution from this forum post.
In a nutshell, we can make use of the NumPy sign and absolute functions to help us out. Here is an example that has worked for me:
import numpy as np
x = np.array([-81,25])
print(x)
#>>> [-81 25]
xRoot5 = np.sign(x) * np.absolute(x)**(1.0/5.0)
print(xRoot5)
#>>> [-2.40822469 1.90365394]
print(xRoot5**5)
#>>> [-81. 25.]
So going back to the original cube root problem:
import numpy as np
y = -3.
np.sign(y) * np.absolute(y)**(1./3.)
#>>> -1.4422495703074083
I hope this helps.
For an arithmetic, calculator-like answer in Python 3:
>>> -3.0**(1/3)
-1.4422495703074083
or -3.0**(1./3) in Python 2.
For the algebraic solution of x**3 + (0*x**2 + 0*x) + 3 = 0 use numpy:
>>> p = [1,0,0,3]
>>> numpy.roots(p)
[-1.44224957+0.j 0.72112479+1.24902477j 0.72112479-1.24902477j]
New in Python 3.11
There is now math.cbrt which handles negative roots seamlessly:
>>> import math
>>> math.cbrt(-3)
-1.4422495703074083