Line of code in question:
summing += yval * np.log(sigmoid(np.dot(w.transpose(), xi.transpose()))) \
    + (1 - yval) * np.log(max(0.001, 1 - sigmoid(np.dot(w.transpose(), xi.transpose()))))
Error:
File "classify.py", line 67, in sigmoid
return 1/(1+ math.exp(-gamma))
OverflowError: math range error
The sigmoid function is just 1/(1+ math.exp(-gamma)).
I'm getting a math range error. Does anyone see why?
You can avoid this problem by using different cases for positive and negative gamma:
def sigmoid(gamma):
    if gamma < 0:
        return 1 - 1 / (1 + math.exp(gamma))
    else:
        return 1 / (1 + math.exp(-gamma))
The math range error is likely because your gamma argument is a large negative value, so you are calling exp() with a large positive value. It is very easy to exceed your floating point range that way.
The problem is that when the argument to math.exp() becomes large (here, when gamma is a large negative number, so -gamma is large and positive), the call overflows. You can avoid this problem by noticing that
sigmoid(x) = 1 / (1 + exp(-x))
           = exp(x) / (exp(x) + 1)
           = 1 - 1 / (1 + exp(x))
           = 1 - sigmoid(-x)
This gives you a numerically stable implementation of sigmoid which guarantees you never even call math.exp with a positive value:
def sigmoid(gamma):
    if gamma < 0:
        return 1 - 1 / (1 + math.exp(gamma))
    return 1 / (1 + math.exp(-gamma))
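For example, with this version, arguments that overflow the naive formula are handled fine:

>>> sigmoid(-1000)
0.0
>>> sigmoid(1000)
1.0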
Related
I am working on a neural network from scratch, and when I try to implement the stable sigmoid function, numpy's where seems to behave strangely. Both functions here raise RuntimeWarning: overflow encountered in exp:
# Original function
def sigmoid(x):
    return np.where(x >= 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))

# Dummy function that is also misbehaving
def sigmoid(x):
    return np.where(x >= 0, 1 / (1 + np.exp(-x)), 0)
This is the result:
RuntimeWarning: overflow encountered in exp
It is a runtime warning, not an error. Your code works perfectly fine. The warning appears because you are trying to calculate exp(-(-1000)), which overflows float capacity (it essentially returns inf). Since that value ends up in the denominator, I would not worry about it, because 1/inf = 0.
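If you want to silence the warning anyway, one option (a minimal sketch, keeping your np.where formulation) is to suppress the floating point warnings locally with np.errstate; the overflowed branch is never the one np.where selects:

import numpy as np

def sigmoid(x):
    # Both branches are evaluated, but np.where keeps only the stable one,
    # so the overflow/invalid warnings can safely be silenced here.
    with np.errstate(over='ignore', invalid='ignore'):
        return np.where(x >= 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))

print(sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5 1. ]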
I keep getting this error when I execute annuity_rate(5, 100, 510), or when I try negative values. How can I fix this?
It works well with large numbers, but somehow not for negative and small ones.
def pv_annuity(r, n, pmt):
    """Return the present value of an annuity of pmt to be received
    each period for n periods."""
    pv = pmt * (1 - (1 + r) ** (-n)) / r
    return pv

def annuity_rate(n, pmt, pv):
    """Return the rate of interest required to amortize the pv in n periods
    with equal periodic payments of pmt."""
    rate_low, rate_high = 0, 1
    while True:
        rate = (rate_high + rate_low) / 2
        #print('trying rate', rate)
        test_pv = pv_annuity(rate, n, pmt)
        #print(test_pv)
        if abs(pv - test_pv) <= 0.01:
            break
        if test_pv > pv:
            rate_low = (rate_high + rate_low) / 2
        if test_pv < pv:
            rate_high = (rate_high + rate_low) / 2
    return rate
Using your example of annuity_rate(5, 100, 510):
rate_high keeps decreasing no matter what: test_pv can never exceed pmt * n = 500, while the target pv is 510, so the test_pv > pv branch is never taken and every iteration halves rate_high.
Once the rate is small enough, test_pv = pmt * (1 - (1 + r) ** (-n)) / r collapses to zero, because 1 + r rounds to 1.0 in floating point and the numerator vanishes.
After that, the rate keeps halving with test_pv stuck at zero (rate_high = (rate_high + rate_low) / 2).
The rate finally reaches 5e-324, the smallest positive float; one more halving makes it exactly zero, and pmt * (1 - (1 + r) ** (-n)) / r becomes a division by zero.
Suggested solution:
Change the payment passed to pv_annuity(rate, n, pmt) so that it reflects the changes in the rate.
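Alternatively, here is a minimal sketch of a guarded version (the limit check and the ValueError are my additions, not the original code): since the present value approaches pmt * n as r approaches 0, a target pv of 510 with n=5 and pmt=100 can never be reached, and the search can refuse it up front instead of dividing by zero:

def pv_annuity(r, n, pmt):
    # As r -> 0, (1 - (1 + r) ** (-n)) / r -> n, so return that limit directly.
    if r == 0:
        return pmt * n
    return pmt * (1 - (1 + r) ** (-n)) / r

def annuity_rate(n, pmt, pv):
    if pv >= pmt * n:
        raise ValueError("pv must be below pmt * n to be reachable at a positive rate")
    rate_low, rate_high = 0.0, 1.0
    while True:
        rate = (rate_low + rate_high) / 2
        test_pv = pv_annuity(rate, n, pmt)
        if abs(pv - test_pv) <= 0.01:
            return rate
        if test_pv > pv:
            rate_low = rate    # present value too high -> rate is too low
        else:
            rate_high = rate   # present value too low -> rate is too high

print(annuity_rate(5, 100, 400))  # roughly 0.079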
I'm practicing using Binet's formula to compute Fibonacci numbers. Following the formula, I came up with the following code, which passes the test cases on LeetCode:
class Solution(object):
    def fib(self, N):
        goldenratio = (1 + 5 ** 0.5) / 2
        ratio2 = (1 - 5 ** 0.5) / 2
        return int((goldenratio ** N - ratio2 ** N) / (5 ** 0.5))
But I don't understand the solution given by LeetCode (which of course produces correct results):
class Solution:
    def fib(self, N):
        golden_ratio = (1 + 5 ** 0.5) / 2
        return int((golden_ratio ** N + 1) / 5 ** 0.5)
My question about the LeetCode solution is: why do they add 1 after golden_ratio ** N? According to Binet's formula, I think my code is correct, but I want to know why LeetCode uses a different way and still gets correct results.
Here is the link for Binet formula:
https://artofproblemsolving.com/wiki/index.php/Binet%27s_Formula
Your code is a digital rendering of the exact formula (φ^n − ψ^n) / √5; it is correct up to the precision limits of your floating point representation, but fails once the result grows beyond that point.
The given solution is a reasonable attempt to correct that fault: instead of subtracting the precise correction term ψ^n / √5, whose magnitude is easily shown to be less than 1, it merely adds 1 and truncates to the floor integer, yielding the correct result further out than your "exact" implementation.
Try generating some results:
def fib_exact(n):
    goldenratio = (1 + 5 ** 0.5) / 2
    ratio2 = (1 - 5 ** 0.5) / 2
    return int((goldenratio ** n - ratio2 ** n) / (5 ** 0.5))

def fib_trunc(n):
    golden_ratio = (1 + 5 ** 0.5) / 2
    return int((golden_ratio ** n + 1) / 5 ** 0.5)

for n in range(100):
    a = fib_trunc(n)
    b = fib_exact(n)
    print(n, a - b, a, b)
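To make the breaking point concrete, you can also compare both against an exact integer implementation (a quick sketch; fib_iter and first_mismatch are my additions):

def fib_iter(n):
    # Exact Fibonacci via integer arithmetic, as the reference.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def first_mismatch(f):
    n = 0
    while f(n) == fib_iter(n):
        n += 1
    return n

# Both float-based versions start to drift once the true value outgrows
# double precision (for n around 70 on typical hardware).
print(first_mismatch(fib_exact), first_mismatch(fib_trunc))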
I have implemented the following logic (I previously asked about a different part of it, an array-range question). I'm getting output, but it is not going through the for loop iterations because of how I have called frange(start, stop, step).
Explanation
"""Approximate definite integral of function from a to b using Simpson's method.
This function is vectorized, it uses numpy array operations to calculate the approximation.
This is an adaptive implementation, the method starts out with N=2 intervals, and try
successive sizes of N (by doubling the size), until the desired precision, is reached.
This adaptive solution uses our improved approach/equation for Simpson's method, to
avoid unnecessary recalculations of the integrand function.
a, b - Scalar float values, the begin, and endpoints of the interval we are to
integrate the function over.
f - A vectorized function, should accept a numpy array of x values, and compute the
corresponding y values for all points according to some function.
epsilon - The desired precision to calculate the integral to. Default is 8 decimal places
of precision (1e-8)
returns - A tuple, (ival, error). A scalar float value, the approximated integral of
the function over the given interval, and a scaler float value of the
approximation error on the integral
"""
Code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
import pylab as pl

def simpsons_adaptive_approximation(a, b, f, epsilon=1e-8):
    N_prev = 2                 # the previous number of slices
    h_prev = (b - a) / N_prev  # previous interval width
    x = np.arange(a + h_prev, b, h_prev)  # x locations of the previous interval
    I_prev = h_prev * (0.5 * f(a) + 0.5 * f(b) + np.sum(f(x)))

    # set up variables to adaptively iterate successively better approximations
    N_cur = 2    # the current number of slices
    I_cur = 0.0  # calculated in loop iteration
    error = 1.0  # calculated in loop iteration
    itr = 1      # keep track of the number of iterations we perform, for display/debug

    h = (b - a) / float(epsilon)
    I_cur = f(a) + f(b)
    while error > epsilon:
        for i in pl.frange(1, epsilon, 1):
            print('Hello')
            if i % 2 == 0:
                print('Hello')
                I_cur = I_cur + (2 * (f(a + i * h)))
            else:
                I_cur = I_cur + (4 * (f(a + i * h)))
        error = np.abs((1.0 / 3.0) * (I_cur - I_prev))
        print("At iteration %d (N=%d), val=%0.16f prev=%0.16f error=%e" % (itr, N_cur, I_cur, I_prev, error))
        I_cur *= (h / 3.0)
        I_prev = I_cur
        N_prev = N_cur
        N_cur *= 2
        itr += 1
    return (I_cur, error)
Another function that calls the above-mentioned function:
def f2(x):
    return x**4 - 2*x + 1

a = 0.0
b = 2.0
eps = 1e-10

(val, err) = simpsons_adaptive_approximation(a, b, f2, eps)
print("Calculated value: %0.16f error: %e for an epsilon of: %e" % (val, err, eps))
Following is the outcome:
At iteration 1 (N=2), val=14.0000000000000000 prev=7.0000000000000000 error=2.333333e+00
At iteration 2 (N=4), val=93333333333.3333435058593750 prev=93333333333.3333435058593750 error=0.000000e+00
Calculated value: 622222222222222295040.0000000000000000 error: 0.000000e+00 for an epsilon of: 1.000000e-10
It should run more iterations.
Can anyone help me get the for loop to iterate so that it produces more results?
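For comparison, here is a minimal sketch of a plain adaptive composite Simpson loop: double N until the estimated error drops below epsilon. It is not the improved no-recalculation variant the docstring describes (it re-evaluates the integrand each pass), and it uses the standard Simpson Richardson estimate (I_cur - I_prev) / 15 (the 1/3 factor in the code above is the usual trapezoid-rule estimate):

import numpy as np

def simpson(f, a, b, N):
    # Composite Simpson's rule with N slices (N must be even).
    x, h = np.linspace(a, b, N + 1, retstep=True)
    y = f(x)
    return (h / 3.0) * (y[0] + y[-1] + 4.0 * np.sum(y[1:-1:2]) + 2.0 * np.sum(y[2:-2:2]))

def simpsons_adaptive_approximation(a, b, f, epsilon=1e-8):
    N = 2
    I_prev = simpson(f, a, b, N)
    while True:
        N *= 2
        I_cur = simpson(f, a, b, N)
        error = np.abs(I_cur - I_prev) / 15.0  # Richardson estimate for Simpson's rule
        if error <= epsilon:
            return (I_cur, error)
        I_prev = I_cur

def f2(x):
    return x**4 - 2*x + 1

print(simpsons_adaptive_approximation(0.0, 2.0, f2, 1e-10))  # exact value is 4.4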
For a scalar variable x, we know how to write down a numerically stable sigmoid function in python:
def sigmoid(x):
    if x >= 0:
        return 1. / (1. + np.exp(-x))
    else:
        return np.exp(x) / (1. + np.exp(x))
For a list of scalars, say z = [x_1, x_2, x_3, ...], and suppose we don't know the sign of each x_i beforehand, we could generalize the above definition and try:
def sigmoid(z):
    result = []
    for x in z:
        if x >= 0:
            result.append(1. / (1. + np.exp(-x)))
        else:
            result.append(np.exp(x) / (1. + np.exp(x)))
    return result
This seems to work. However, I feel this is perhaps not the most pythonic way. How should I improve the definition in terms of 'cleanness'? Say, is there a way to use comprehension to shorten the function definition?
I'm sorry if this has been asked, because I cannot find similar questions on SO. Thank you very much for your time and help!
You are right, you can do better by using np.where, the numpy equivalent of if:
def sigmoid(x):
    return np.where(x >= 0,
                    1 / (1 + np.exp(-x)),
                    np.exp(x) / (1 + np.exp(x)))
This function takes a numpy array x and returns a numpy array, too:
data = np.arange(-5,5)
sigmoid(data)
#array([0.00669285, 0.01798621, 0.04742587, 0.11920292, 0.26894142,
# 0.5 , 0.73105858, 0.88079708, 0.95257413, 0.98201379])
A fully correct answer (no warnings) was provided by @hao peng, but the solution wasn't explained clearly. This would be too long for a comment, so I'll go for an answer.
Let's start with an analysis of a few answers (pure numpy answers only):
@DYZ's accepted answer
This one is mathematically correct but still gives us a warning. Let's look at the code:
def sigmoid(x):
    return np.where(
        x >= 0,                      # condition
        1 / (1 + np.exp(-x)),        # for positive values
        np.exp(x) / (1 + np.exp(x))  # for negative values
    )
As both branches are evaluated (they are arguments, so they have to be), the first branch will give us a warning for large negative values and the second for large positive ones.
Although the warnings are raised, the results of the overflows are never selected, hence the result is correct.
Downsides:

- unnecessary evaluation of both branches (twice as many operations as needed)
- warnings are thrown
@ynn's answer
This one is almost correct, but it will work only on floating point values. See below:
def sigmoid(x):
    return np.piecewise(
        x,
        [x > 0],
        [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))],
    )

sigmoid(np.array([0.0, 1.0]))  # [0.5 0.73105858] correct
sigmoid(np.array([0, 1]))      # [0, 0] incorrect
Why? A longer answer was provided by @mhawke in another thread, but the main point is:

It seems that piecewise() converts the return values to the same type as the input, so when an integer is input, an integer conversion is performed on the result, which is then returned.
Downsides:

- no automatic casting, due to the strange behavior of the piecewise function
@hao peng's improved answer
The idea of the stable sigmoid comes from the fact that:

sigmoid(x) = 1 / (1 + exp(-x))
           = exp(x) / (1 + exp(x))

Both versions are equally efficient in terms of operations if coded correctly (one exp evaluation is enough). Now:

- e^x will overflow when x is large and positive
- e^(-x) will overflow when x is large and negative

Hence we have to branch at x = 0. Using numpy's masking, we can transform only the part of the array which is positive or negative with the appropriate sigmoid implementation.
See code comments for additional points:
def _positive_sigmoid(x):
    return 1 / (1 + np.exp(-x))

def _negative_sigmoid(x):
    # Cache exp so you won't have to calculate it twice
    exp = np.exp(x)
    return exp / (exp + 1)

def sigmoid(x):
    positive = x >= 0
    # Boolean array inversion is faster than another comparison
    negative = ~positive

    # empty contains junk, hence it is faster to allocate;
    # zeros has to zero out the array after allocation, and there is no need for that.
    # See the comments to the answer regarding dtype.
    result = np.empty_like(x, dtype=float)
    result[positive] = _positive_sigmoid(x[positive])
    result[negative] = _negative_sigmoid(x[negative])
    return result
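A quick usage check with extreme inputs (no warnings are emitted):

sigmoid(np.array([-1000.0, 0.0, 1000.0]))
# array([0. , 0.5, 1. ])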
Time measurements

Results (ynn's test case, run 50 times):
289.5070939064026 #DYZ
222.49267292022705 #ynn
230.81086134910583 #this
Indeed, piecewise seems faster (I'm not sure of the reasons; maybe the masking and the additional mask operations make this version slower).
Code below was used:
import time
import numpy as np

def _positive_sigmoid(x):
    return 1 / (1 + np.exp(-x))

def _negative_sigmoid(x):
    # Cache exp so you won't have to calculate it twice
    exp = np.exp(x)
    return exp / (exp + 1)

def sigmoid(x):
    positive = x >= 0
    # Boolean array inversion is faster than another comparison
    negative = ~positive
    # empty contains junk, hence it is faster to allocate than zeros
    result = np.empty_like(x)
    result[positive] = _positive_sigmoid(x[positive])
    result[negative] = _negative_sigmoid(x[negative])
    return result

N = int(1e4)
x = np.random.uniform(size=(N, N))

start: float = time.time()
for _ in range(50):
    y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
    y1 += 1
end: float = time.time()
print(end - start)

start: float = time.time()
for _ in range(50):
    y2 = np.piecewise(
        x,
        [x > 0],
        [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))],
    )
    y2 += 1
end: float = time.time()
print(end - start)

start: float = time.time()
for _ in range(50):
    y2 = sigmoid(x)
    y2 += 1
end: float = time.time()
print(end - start)
def sigmoid(x):
    """
    A numerically stable version of the logistic sigmoid function.
    """
    pos_mask = (x >= 0)
    neg_mask = (x < 0)
    z = np.zeros_like(x)
    z[pos_mask] = np.exp(-x[pos_mask])
    z[neg_mask] = np.exp(x[neg_mask])
    top = np.ones_like(x)
    top[neg_mask] = z[neg_mask]
    return top / (1 + z)
This piece of code comes from assignment 3 of cs231n. I don't really understand why we should calculate it this way, but I know this may be the code you are looking for. Hope it helps.
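One possible reading (my interpretation, not part of the original answer): in both branches z holds exp(-|x|), which never overflows; for x >= 0 the result is 1 / (1 + e^(-x)), and for x < 0 it is e^x / (1 + e^x), so this is the same two-branch trick expressed with masks. A quick check:

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(sigmoid(x))  # [0.         0.26894142 0.5        0.73105858 1.        ]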
The accepted answer is correct but, as pointed out by this comment, it calculates both branches and is thus problematic.
Rather, you may want to use np.piecewise(). This is much faster, meaningful (np.where is not intended to define a piecewise function), and free of the misleading warnings caused by evaluating both branches.
Benchmark
Source Code
import numpy as np
import time
N: int = int(1e+4)
np.random.seed(0)
x: np.ndarray = np.random.random((N, N))
x *= 1e+3
start: float = time.time()
y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
end: float = time.time()
print()
print(end - start)
start: float = time.time()
y2 = np.piecewise(x, [x > 0], [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))])
end: float = time.time()
print(end - start)
assert (np.array_equal(y1, y2))
Result
np.piecewise() is silent and twice as fast!
test.py:12: RuntimeWarning: overflow encountered in exp
y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
test.py:12: RuntimeWarning: invalid value encountered in true_divide
y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
6.32736349105835
3.138420343399048
Another alternative to your code is the following:

def sigmoid(z):
    return [1. / (1. + np.exp(-x)) if x >= 0 else np.exp(x) / (1. + np.exp(x)) for x in z]

Note that, like your loop version, this returns a Python list rather than a numpy array.
I wrote one trick; I guess np.where or torch.where are implemented in the same manner to deal with binary conditions:

def sigmoid(x, max_v=1.0):
    # sign is 1 where x >= 0 and 0 where x < 0
    sign = (torch.sign(x) + 3) // 3
    x = torch.abs(x)
    # evaluate the stable branch (x >= 0), then mirror it for negative inputs
    res = max_v / (1 + torch.exp(-x))
    res = res * sign + (1 - sign) * (max_v - res)
    return res
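A quick sanity check against torch.sigmoid (with the default max_v=1.0):

import torch

t = torch.tensor([-1000.0, -1.0, 0.0, 1.0, 1000.0])
print(sigmoid(t))        # tensor([0.0000, 0.2689, 0.5000, 0.7311, 1.0000])
print(torch.sigmoid(t))  # same values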