Unexpected output for keras ReLU layer - python

In the keras documentation, the function keras.activations.relu(x, alpha=0.0, max_value=None, threshold=0.0) is defined as:
f(x) = max_value for x >= max_value,
f(x) = x for threshold <= x < max_value,
f(x) = alpha * (x - threshold) otherwise.
I did a small test with alpha=0.01, threshold=5.0 and max_value=100.0 and for x=5.0 the output I get is f(x)=0.0.
If I am not mistaken, since x == threshold, I should get f(x)=x=5.0.
Can anyone explain please?
Thanks,
Julien

The documentation in the source code is wrong. (Also, you should be moving to tf.keras instead of keras.) It should be:
f(x) = max_value for x >= max_value,
--> f(x) = x for threshold < x < max_value,
f(x) = alpha * (x - threshold) otherwise.
So when your x == threshold, it falls into the third case, where x - threshold is 0, so alpha * (x - threshold) is 0 as well. This is why you get 0.
If you need the documented behavior, this line in the source needs to change to:
x = x * tf.cast(tf.greater_equal(x, threshold), floatx())
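You can confirm this by calling the activation directly; a minimal sketch, assuming TensorFlow 2.x and tf.keras:
import tensorflow as tf

x = tf.constant([4.9, 5.0, 5.1, 200.0])
y = tf.keras.activations.relu(x, alpha=0.01, threshold=5.0, max_value=100.0)
print(y.numpy())  # approximately [-0.001  0.  5.1  100.]; x == 5.0 lands in the alpha branch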

Related

optimal way of defining a numerically stable sigmoid function for a list in python

For a scalar variable x, we know how to write down a numerically stable sigmoid function in python:
def sigmoid(x):
    if x >= 0:
        return 1. / (1. + np.exp(-x))
    else:
        return np.exp(x) / (1. + np.exp(x))
For a list of scalars, say z = [x_1, x_2, x_3, ...], and suppose we don't know the sign of each x_i beforehand, we could generalize the above definition and try:
def sigmoid(z):
    result = []
    for x in z:
        if x >= 0:
            result.append(1. / (1. + np.exp(-x)))
        else:
            result.append(np.exp(x) / (1. + np.exp(x)))
    return result
This seems to work. However, I feel this is perhaps not the most pythonic way. How should I improve the definition in terms of 'cleanness'? Say, is there a way to use comprehension to shorten the function definition?
I'm sorry if this has been asked, because I cannot find similar questions on SO. Thank you very much for your time and help!
You are right, you can do better by using np.where, the numpy equivalent of if:
def sigmoid(x):
    return np.where(x >= 0,
                    1 / (1 + np.exp(-x)),
                    np.exp(x) / (1 + np.exp(x)))
This function takes a numpy array x and returns a numpy array, too:
data = np.arange(-5,5)
sigmoid(data)
#array([0.00669285, 0.01798621, 0.04742587, 0.11920292, 0.26894142,
# 0.5 , 0.73105858, 0.88079708, 0.95257413, 0.98201379])
A fully correct answer (no warnings) was provided by @hao peng, but the solution wasn't explained clearly. This would be too long for a comment, so I'll go for an answer.
Let's start with an analysis of a few answers (pure numpy answers only):
@DYZ's accepted answer
This one is mathematically correct but still gives us warnings. Let's look at the code:
def sigmoid(x):
    return np.where(
        x >= 0,                       # condition
        1 / (1 + np.exp(-x)),         # for positive values
        np.exp(x) / (1 + np.exp(x)),  # for negative values
    )
Because both branches are evaluated (they are function arguments, so they have to be), the first branch warns for large negative values and the second for large positive ones. Although the warnings are raised, the overflowed results are never selected, so the output is still correct.
Downsides
unnecessary evaluation of both branches (twice as many operations as needed)
warnings are thrown
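If you only want to silence those (harmless) warnings for this variant, numpy's errstate context manager can do it; a minimal sketch:
import numpy as np

def sigmoid(x):
    # Both branches are still evaluated; this only suppresses the
    # overflow/invalid warnings coming from the unselected branch
    with np.errstate(over='ignore', invalid='ignore'):
        return np.where(x >= 0,
                        1 / (1 + np.exp(-x)),
                        np.exp(x) / (1 + np.exp(x)))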
@ynn's answer
This one is almost correct, BUT it will work only on floating point values; see below:
def sigmoid(x):
    return np.piecewise(
        x,
        [x > 0],
        [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))],
    )

sigmoid(np.array([0.0, 1.0]))  # [0.5 0.73105858] correct
sigmoid(np.array([0, 1]))      # [0 0] incorrect
Why? A longer answer was provided by @mhawke in another thread, but the main point is:
It seems that piecewise() converts the return values to the same type
as the input so, when an integer is input an integer conversion is
performed on the result, which is then returned.
Downsides
no automatic casting, due to the strange behavior of the piecewise function (casting the input yourself, e.g. sigmoid(np.array([0, 1], dtype=float)), avoids the problem)
Improved @hao peng answer
The idea of the stable sigmoid comes from the fact that
sigmoid(x) = 1 / (1 + e^(-x)) = e^x / (1 + e^x)
Both versions are equally efficient in terms of operations if coded correctly (one exp evaluation is enough). Now:
e^x will overflow for large positive x
e^(-x) will overflow for large negative x
Hence we have to branch on the sign of x. Using numpy's masking we can transform only the part of the array which is positive or negative with the appropriate sigmoid implementation.
See code comments for additional points:
def _positive_sigmoid(x):
    return 1 / (1 + np.exp(-x))

def _negative_sigmoid(x):
    # Cache exp so you won't have to calculate it twice
    exp = np.exp(x)
    return exp / (exp + 1)

def sigmoid(x):
    positive = x >= 0
    # Boolean array inversion is faster than another comparison
    negative = ~positive
    # empty contains junk, hence it is faster to allocate than zeros,
    # which has to zero out the array after allocation
    # (np.float is deprecated; np.float64 keeps the intended dtype)
    result = np.empty_like(x, dtype=np.float64)
    result[positive] = _positive_sigmoid(x[positive])
    result[negative] = _negative_sigmoid(x[negative])
    return result
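A quick check that this version stays silent even for extreme inputs:
x = np.array([-1000.0, 0.0, 1000.0])
print(sigmoid(x))  # [0.  0.5 1. ], with no overflow warnings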
Time measurements
Results (50 repetitions of ynn's test case):
289.5070939064026 #DYZ
222.49267292022705 #ynn
230.81086134910583 #this
Indeed, piecewise seems a bit faster here (I am not sure why; perhaps the masking and the extra fancy-indexing operations slow the masked version down).
Code below was used:
import time
import numpy as np

def _positive_sigmoid(x):
    return 1 / (1 + np.exp(-x))

def _negative_sigmoid(x):
    # Cache exp so you won't have to calculate it twice
    exp = np.exp(x)
    return exp / (exp + 1)

def sigmoid(x):
    positive = x >= 0
    # Boolean array inversion is faster than another comparison
    negative = ~positive
    # empty contains junk, hence it is faster to allocate than zeros
    result = np.empty_like(x)
    result[positive] = _positive_sigmoid(x[positive])
    result[negative] = _negative_sigmoid(x[negative])
    return result

N = int(1e4)
x = np.random.uniform(size=(N, N))

start: float = time.time()
for _ in range(50):
    y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
    y1 += 1
end: float = time.time()
print(end - start)

start: float = time.time()
for _ in range(50):
    y2 = np.piecewise(
        x,
        [x > 0],
        [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))],
    )
    y2 += 1
end: float = time.time()
print(end - start)

start: float = time.time()
for _ in range(50):
    y3 = sigmoid(x)
    y3 += 1
end: float = time.time()
print(end - start)
def sigmoid(x):
    """
    A numerically stable version of the logistic sigmoid function.
    """
    pos_mask = (x >= 0)
    neg_mask = (x < 0)
    z = np.zeros_like(x)
    z[pos_mask] = np.exp(-x[pos_mask])  # e^(-x) cannot overflow when x >= 0
    z[neg_mask] = np.exp(x[neg_mask])   # e^x cannot overflow when x < 0
    top = np.ones_like(x)
    top[neg_mask] = z[neg_mask]
    return top / (1 + z)                # 1/(1+e^(-x)) for x >= 0, e^x/(1+e^x) for x < 0
This piece of code comes from assignment 3 of cs231n. I don't really understand why it is calculated this way, but I know this may be the code that you are looking for. Hope it helps.
The accepted answer is correct but, as pointed out by this comment, it evaluates both branches and is thus problematic.
Rather, you may want to use np.piecewise(). This is much faster, more meaningful (np.where is not intended to define a piecewise function), and free of misleading warnings caused by evaluating both branches.
Benchmark
Source Code
import numpy as np
import time

N: int = int(1e+4)
np.random.seed(0)

x: np.ndarray = np.random.random((N, N))
x *= 1e+3

start: float = time.time()
y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
end: float = time.time()
print()
print(end - start)

start: float = time.time()
y2 = np.piecewise(x, [x > 0], [lambda i: 1 / (1 + np.exp(-i)), lambda i: np.exp(i) / (1 + np.exp(i))])
end: float = time.time()
print(end - start)

assert np.array_equal(y1, y2)
Result
np.piecewise() is silent and twice as fast!
test.py:12: RuntimeWarning: overflow encountered in exp
y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
test.py:12: RuntimeWarning: invalid value encountered in true_divide
y1 = np.where(x > 0, 1 / (1 + np.exp(-x)), np.exp(x) / (1 + np.exp(x)))
6.32736349105835
3.138420343399048
Another alternative to your code is the following:
def sigmoid(z):
    return [1. / (1. + np.exp(-x)) if x >= 0 else np.exp(x) / (1. + np.exp(x)) for x in z]
I wrote one trick; I guess np.where or torch.where are implemented in the same manner to deal with binary conditions:
import torch

def sigmoid(x, max_v=1.0):
    sign = (torch.sign(x) + 3) // 3    # 1 for x >= 0, 0 for x < 0
    x = torch.abs(x)
    res = max_v / (1 + torch.exp(-x))  # stable, since -x <= 0 here
    res = res * sign + (1 - sign) * (max_v - res)
    return res
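A quick sanity check of this trick against torch.sigmoid (assuming PyTorch is installed):
x = torch.tensor([-500.0, -1.0, 0.0, 1.0, 500.0])
print(torch.allclose(sigmoid(x), torch.sigmoid(x)))  # True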

How to vectorize hinge loss gradient computation

I'm computing thousands of gradients and would like to vectorize the computations in Python. The context is SVM and the loss function is Hinge Loss. Y is Mx1, X is MxN and w is Nx1.
L(w) = lam/2 * ||w||^2 + 1/m Sum i=1:m ( max(0, 1-y[i]X[i]w) )
The gradient of this is
grad = lam*w + 1/m Sum i=1:m {-y[i]X[i].T if y[i]*X[i]*w < 1, else 0}
Instead of looping through each element of the sum and evaluating the max function, is it possible to vectorize this? I want to use something like np.where like the following
grad = np.where(y*X.dot(w) < 1, -X.T.dot(y), 0)
This does not work because where the condition is true, -X.T*y is the wrong dimension.
Edit: here is a list comprehension version; I would like to know if there is a cleaner or more optimal way:
def grad(X, y, w, lam):
    # Cache y[i]*X[i].dot(w); each row of X.dot(w) is multiplied by a single element of y
    yXw = y * X.dot(w)
    # Cache y[i]*X[i]; note each row of X is multiplied by a single element of y
    yX = X * y[:, np.newaxis]
    # Return the average of this max function
    return lam * w + np.mean([-yX[i] if yXw[i] < 1 else np.zeros_like(w) for i in range(len(y))], axis=0)
You have two vectors A and B, and you want to return an array C such that C[i] = A[i] if B[i] < 1 and 0 otherwise; consequently, all you need to do is
C := A * sign(max(0, 1-B)) # surprisingly similar to the original hinge loss, right? :)
since
if B < 1 then 1-B > 0, thus max(0, 1-B) > 0 and sign(max(0, 1-B)) == 1
if B >= 1 then 1-B <= 0, thus max(0, 1-B) = 0 and sign(max(0, 1-B)) == 0
so in your code, with A holding the values -y[i]*X[i] and B holding the margins y[i]*X[i].dot(w), it will be something like
A = -(X * y[:, np.newaxis])  # rows are -y[i] * X[i]
B = y * X.dot(w)             # margins y[i] * X[i].dot(w)
C = A * np.sign(np.maximum(0, 1 - B))[:, np.newaxis]
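Putting this together, a sketch of a fully vectorized version of the grad function from the question (averaging the rows of C and adding the regularization term):
def grad(X, y, w, lam):
    yX = X * y[:, np.newaxis]                   # rows are y[i] * X[i]
    margins = yX.dot(w)                         # y[i] * X[i].dot(w)
    mask = np.sign(np.maximum(0, 1 - margins))  # 1 where the hinge is active, else 0
    return lam * w - (yX * mask[:, np.newaxis]).mean(axis=0)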

Use Newton's method to find square root of a number?

Here is my code thus far. I don't know why it doesn't print anything. I hope it isn't because of some stupid mistake.
y = float(raw_input("Enter a number you want square rooted: "))
x = 0
# Newton's Method: y = (x+y)/x + 2
while y > x:
    x += 0.1
    if x == y/(2*y-1):
        print x
    else:
        pass
Any suggestions or alternatives?
Any help would be greatly appreciated.
Your code doesn't resemble Newton's method at all. Here is code with rewritten logic:
y = float(raw_input("Enter a number you want square rooted: "))

# Solve f(x) = x^2 - y = 0 for x.
# Newton's method: iterate new_x = x - f(x)/f'(x).
# Note that f'(x) = 2x. Thus new_x = x - (x^2 - y)/(2x).
prevx = -1.0
x = 1.0
while abs(x - prevx) > 1e-10:  # loop until x stabilizes
    prevx = x
    x = x - (x*x - y) / (2*x)
print(x)
Side note: An alternate but similar way of iteratively approximating the square root is the Babylonian method.
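For reference, a minimal sketch of the Babylonian method, assuming y >= 0 (it is just Newton's method for f(x) = x^2 - y written as an averaging step):
def babylonian_sqrt(y, tol=1e-10):
    x = max(y, 1.0)        # any positive starting guess works
    while abs(x*x - y) > tol:
        x = (x + y/x) / 2  # average the guess with y/x
    return x

print(babylonian_sqrt(2.0))  # ~1.41421356237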

Defining an ellipse around data points

I have a question/idea that I am not sure how to implement.
I have a scatter plot of X vs. Y.
I can draw a rectangle and then pick all the points within it.
Ideally I want to define an ellipse, as it better captures the shape, and exclude all the points outside it.
How does one do this? Is it even possible? I drew the plot using matplotlib.
I used Linear Regression (LR) to fit the points, but that's not really what I am looking for.
I want to define an ellipse that APPROXIMATELY covers as many points as possible, and then exclude the points outside it. How can I define an equation/code to pick the ones inside?
If you have the data structure that is represented in the graph, you can do this with a function and a list comprehension.
If you have the data in a list like this:
# Made up data
lst = [
    # First element is X, second is Y.
    (0, 0),
    (92, 20),
    (10, 0),
    (13, 40),
    (27, 31),
    (.5, .5),
]
from math import sqrt

def shape_bounds(x):
    """
    Function that returns lower and upper bounds for y based on x.
    Using a circle as an example here.
    """
    r = 4
    # A circle is x**2 + y**2 = r**2, r = radius
    if -r <= x <= r:
        y = sqrt(r**2 - x**2)
        return -y, y
    else:
        # Remember, returns lower, upper.
        # This will fail any lower < y < upper test.
        return 1, -1

def in_shape(elt):
    """
    Unpacks a pair and tests if y is inside the shape bounds given by x
    """
    x, y = elt
    lower_bound, upper_bound = shape_bounds(x)
    if lower_bound < y < upper_bound:
        return True
    else:
        return False

# Demo walkthrough
for elt in lst:
    x, y = elt
    print x, y
    lower_bound, upper_bound = shape_bounds(x)
    if lower_bound < y < upper_bound:
        print "X: {0}, Y: {1} is in the circle".format(x, y)

# New list of only points inside the shape
new_lst = [x for x in lst if in_shape(x)]
As for an ellipse, try changing the shape equation based on this
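For example, shape_bounds for an axis-aligned ellipse centered at (cx, cy) with semi-axes a and b might look like this (a sketch; the center and semi-axis values are made up and would need tuning to your data):
def ellipse_bounds(x, cx=20.0, cy=15.0, a=40.0, b=25.0):
    """
    Lower and upper y bounds of the ellipse
    ((x-cx)/a)**2 + ((y-cy)/b)**2 = 1 at a given x.
    """
    if -a <= x - cx <= a:
        dy = b * sqrt(1 - ((x - cx)/a)**2)
        return cy - dy, cy + dy
    return 1, -1  # fails any lower < y < upper test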

Overflow Error when using Newton's Method in Python

I am trying to implement Newton's method in Python to solve a problem. I have followed the approach of some examples, but I am getting an Overflow Error. Do you have any idea what is causing this?
def f1(x):
    return x**3 - (2.*x) - 5.

def df1(x):
    return (3.*x**2) - 2.

def Newton(f, df, x, tol):
    while True:
        x1 = f(x) - (f(x)/df(x))
        t = abs(x1 - x)
        if t < tol:
            break
        x = x1
    return x

init = 2
print Newton(f1, df1, init, 0.000001)
Newton's method iterates x_new = x - f(x)/f'(x), so
x1 = f(x) - (f(x)/df(x))
should be
x1 = x - (f(x)/df(x))
There is a bug in your code. It should be
def Newton(f, df, x, tol):
    while True:
        x1 = x - (f(x)/df(x))  # it was f(x) - (f(x)/df(x))
        t = abs(x1 - x)
        if t < tol:
            break
        x = x1
    return x
The equation you're solving is cubic, so there are two values of x where df(x)=0. Dividing by zero or a value close to zero will cause an overflow, so you need to avoid doing that.
One practical consideration for Newton's algorithm is how to handle values of x near local maxima or minima, where the derivative is close to zero; the overflow is most likely caused by dividing by something near zero. You can see this by adding a print statement before the x = x1 line to print x and df(x). To avoid the problem, calculate df(x) before dividing, and if it's below some threshold, bump the value of x up or down a small amount and try again.
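A sketch of that guard (the eps threshold, the nudge size, and the iteration cap are arbitrary choices):
def safe_newton(f, df, x, tol=1e-6, eps=1e-8, max_iter=100):
    for _ in range(max_iter):
        d = df(x)
        if abs(d) < eps:
            x += 10 * eps  # bump x away from the flat spot and retry
            continue
        x1 = x - f(x)/d
        if abs(x1 - x) < tol:
            return x1
        x = x1
    return x

print(safe_newton(f1, df1, 2.0))  # ~2.0945515, the real root of x**3 - 2x - 5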
