Understanding np.where() in the context of Logistic Regression - python

I am currently studying the Deep Learning specialization taught on Coursera by Andrew Ng. In the first assignment, I have to define a prediction function, and wanted to know if my alternative solution is as valid as the actual solution.
Please let me know if my understanding of the np.where() function is correct as I have commented on this in the code under "ALTERNATIVE SOLUTION COMMENTS". Also, it would be much appreciated if my understanding under the "ACTUAL SOLUTION COMMENTS" could be checked as well.
The alternative solution that uses np.where() also works when I increase the number of examples/inputs in X from the current amount (m = 3) to 4, to 5, and so on.
Let me know what you think, and if both solutions are just as good as the other! Thanks.
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))  # Initialize Y_prediction as an array of zeros
    w = w.reshape(X.shape[0], 1)
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)  # Note: The shape of A will always be a (1,m) row vector
    for i in range(A.shape[1]):  # for i in range(# of examples in A = # of examples in our set)
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
    # ACTUAL SOLUTION COMMENTS
    # The above reads as:
    # Change/update the i-th value of Y_prediction to 1 if the corresponding i-th value in A is > 0.5.
    # Otherwise, change/update the i-th value of Y_prediction to 0.
    #
    # ALTERNATIVE SOLUTION COMMENTS
    # To condense this code, you could delete the for loop and the Y_prediction variable from the top,
    # and then use the following one line:
    #     return np.where(A > 0.5, np.ones((1,m)), np.zeros((1,m)))
    # This reads as:
    # Given the condition A > 0.5, return np.ones((1,m)) if True,
    # or return np.zeros((1,m)) if False.
    # Another way to understand this is as follows:
    # Tell me where in the array A the entries satisfy the condition A > 0.5.
    # At those positions, give me np.ones((1,m)); otherwise, give me np.zeros((1,m)).
    assert(Y_prediction.shape == (1, m))
    return Y_prediction
w = np.array([[0.1124579],[0.23106775]])
b = -0.3
X = np.array([[1.,-1.1,-3.2],[1.2,2.,0.1]])
print(sigmoid(np.dot(w.T, X) + b))
print ("predictions = " + str(predict(w, b, X))) # Output gives 1,1,0 as expected

Your alternative approach seems fine. As a remark, I'll add that you don't even need np.ones and np.zeros; you can specify the integers 1 and 0 directly. When using np.where, as long as x and y (the values to choose from according to the condition) and the condition itself are broadcastable against each other, it should work fine.
Here's a simple example:
y_pred = np.random.rand(1,6).round(2)
# array([[0.53, 0.54, 0.68, 0.34, 0.53, 0.46]])
np.where(y_pred> 0.5, np.ones((1,6)), np.zeros((1,6)))
# array([[1., 1., 1., 0., 1., 0.]])
And using integers:
np.where(y_pred> 0.5,1,0)
# array([[1, 1, 1, 0, 1, 0]])
As for your comments on how the function works, it does indeed work as you describe. One small point: rather than saying it merely "condenses" the code, I'd argue that using numpy makes it more efficient, and also more intelligible in this case.
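To check the equivalence discussed above, here is a minimal sketch that runs the loop and the np.where variants on the test values from the question. The sigmoid helper is an assumption (a plain logistic function standing in for the one supplied by the assignment), and the names loop_pred, where_arrays, where_scalars and as_float are made up for this illustration.

```python
import numpy as np

def sigmoid(z):
    # Plain logistic function; assumed stand-in for the assignment's helper.
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])

A = sigmoid(np.dot(w.T, X) + b)  # probabilities, shape (1, m)
m = X.shape[1]

# Loop version from the question.
loop_pred = np.zeros((1, m))
for i in range(m):
    loop_pred[0, i] = 1 if A[0, i] > 0.5 else 0

# Vectorized versions: full arrays, broadcast scalars, or a cast of the boolean mask.
where_arrays = np.where(A > 0.5, np.ones((1, m)), np.zeros((1, m)))
where_scalars = np.where(A > 0.5, 1, 0)
as_float = (A > 0.5).astype(float)

print(loop_pred)
print(np.array_equal(loop_pred, where_arrays),
      np.array_equal(loop_pred, where_scalars),
      np.array_equal(loop_pred, as_float))
```

The scalar form returns an integer array while the other two return floats; the values are the same either way.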


Layer normalization in pytorch

I'm trying to test the layer normalization function of PyTorch, but I don't know why b[0] and result have different values here.
Did I do something wrong?
import numpy as np
import torch
import torch.nn as nn
a = torch.randn(1, 5)
m = nn.LayerNorm(a.size()[1:], elementwise_affine= False)
b = m(a)
input: a[0] = tensor([-1.3549, 0.3857, 0.1110, -0.8456, 0.1486])
output: b[0] = tensor([-1.5561, 1.0386, 0.6291, -0.7967, 0.6851])
mean = torch.mean(a[0])
var = torch.var(a[0])
result = (a[0]-mean)/(torch.sqrt(var+1e-5))
result = tensor([-1.3918, 0.9289, 0.5627, -0.7126, 0.6128])
And for n*2 normalization, the result of PyTorch's layer norm is always [1.0, -1.0] (or [-1.0, 1.0]). I can't understand why. Please let me know if you have any hints.
a = torch.randn(1, 2)
m = nn.LayerNorm(a.size()[1:], elementwise_affine= False)
b = m(a)
b = tensor([-1.0000, 1.0000])
To calculate the variance, use torch.var(a[0], unbiased=False). Then you will get the same result. By default, torch.var computes the unbiased estimate of the variance, whereas LayerNorm uses the biased one.
For your 1st question, as @Theodor said, you need to use unbiased=False when calculating the variance.
Only if you want to explore more: as your input size is 5, the unbiased estimate of the variance will be 5/4 = 1.25 times the biased estimate, because the unbiased estimate uses N-1 instead of N in the denominator. As a result, each value of result that you generated is sqrt(4/5) = 0.8944 times the corresponding value of b[0].
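The factor of sqrt(4/5) can be checked numerically. The sketch below uses NumPy rather than PyTorch (an assumption made to keep it dependency-light): ddof=0 plays the role of unbiased=False and ddof=1 the role of the default unbiased estimate. The input values are the ones printed in the question; the helper name normalize is made up for this illustration.

```python
import numpy as np

a = np.array([-1.3549, 0.3857, 0.1110, -0.8456, 0.1486])  # a[0] from the question
eps = 1e-5

def normalize(v, ddof):
    # ddof=0 divides by N (biased), ddof=1 divides by N-1 (unbiased).
    return (v - v.mean()) / np.sqrt(v.var(ddof=ddof) + eps)

biased = normalize(a, ddof=0)    # should reproduce b[0] up to rounding
unbiased = normalize(a, ddof=1)  # should reproduce the question's `result`

print(biased)
print(unbiased)
print(unbiased / biased)         # every entry close to sqrt(4/5)
```

Because the inputs were copied from rounded printouts, the match with the values in the question is only good to about three decimal places.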
About your 2nd question:
And, for n*2 normalization , the result of pytorch layer norm is always [1.0 , -1.0]
This is reasonable. Suppose the only two elements are a and b. Then the mean is (a+b)/2 and the (biased) variance is ((a-b)^2)/4, so sqrt(variance) = |a-b|/2. The normalized result is therefore [((a-b)/2) / sqrt(variance), ((b-a)/2) / sqrt(variance)] = [(a-b)/|a-b|, (b-a)/|a-b|], which is essentially [1, -1] or [-1, 1] depending on whether a > b or a < b.
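The two-element case can also be tried out directly. This is a small NumPy sketch under the same assumption as the formula above (biased variance plus an eps of 1e-5); the function name layer_norm_pair is hypothetical.

```python
import numpy as np

def layer_norm_pair(pair, eps=1e-5):
    # np.var divides by N by default, i.e. it is the biased estimate.
    pair = np.asarray(pair, dtype=float)
    return (pair - pair.mean()) / np.sqrt(pair.var() + eps)

rng = np.random.default_rng(0)
for _ in range(3):
    pair = rng.normal(size=2)
    print(pair, "->", layer_norm_pair(pair))

print(layer_norm_pair([3.0, 1.0]))
print(layer_norm_pair([0.2, 0.7]))
```

The outputs are only approximately +1 and -1 because of eps; if the two inputs are almost equal, eps dominates the denominator and the result moves noticeably away from 1 in magnitude.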

How can I code if statement in a TensorFlow graph?

In the model I want to build, there are two placeholders
x = tf.placeholder('float32', shape=[1000, 10])
tags = tf.placeholder('int32', shape=[1000, 1])
(1000 is just the number of examples)
x holds the inputs to neural networks, tags determines which one of the three neural networks will be used to compute the output.
w1 = tf.get_variable('w1', [10, 1], tf.truncated_normal_initializer())
w2 = tf.get_variable('w2', [10, 1], tf.truncated_normal_initializer())
w3 = tf.get_variable('w3', [10, 1], tf.truncated_normal_initializer())
def nn_1(): return tf.matmul(x, w1)
def nn_2(): return tf.matmul(x, w2)
def nn_3(): return tf.matmul(x, w3)
I want to find an elegant way to implement a TensorFlow graph which can compute the output of an x given its tag.
[x1, x2, x3, ..., xn]
[1, 2, 3, ..., 1]
[nn_1(x), nn_2(x), nn_3(x), ..., nn_1(x)]
If x and tags are not arrays, I can implement it with tf.case, for example,
a = tf.placeholder('int32')
b = tf.placeholder('int32')
result = tf.case({
    tf.equal(b, 1): lambda: a + 1,
    tf.equal(b, 2): lambda: a + 2
}, exclusive=True)
But I have no idea how to do this when x and tags are arrays.
You can use a math trick to do the job.
Let's say you want to implement the same logic as your code, but with a and b being arrays.
First, you compute an array of conditions.
For each element, this is the condition that must be true in order to apply the operation.
Conditions typically use "less", "equal" or "greater" operations, or a logical composition of those.
You can use tf.bitwise or tf.math.logical_* for the logical operations and tf.math for the others.
The condition must be a boolean array, which is then cast to numbers: 1 where the condition is true, 0 where it is false.
After that, you initialize the result array with the default value (what would be in the "else" branch).
To apply the condition, you simply multiply the condition array by the value you want to assign.
The code would be something like this.
# default value (the "else" branch)
result = tf.zeros_like(a)
for index in (1, 2):
    # 1 where b == index, 0 elsewhere, in the same dtype as a
    condition = tf.cast(tf.equal(b, index), a.dtype)
    result = result + tf.multiply(condition, a + index)
If you want to use the tag as an index into the array of functions, you need a 2D array. Create a 2D array of all possible combinations of nn and x.
This array will contain nn_j(x[i]) for each (i, j) pair.
To do this you need to create an array of shape len(x) x len(nn) x 2.
First, expand x into an array of shape len(x) x len(nn).
If your x is x = [0,2,1] and len(nn) = 2, then you need to have x_nn = [[0,0], [2,2], [1,1]].
# nn is assumed to be the Python list of network functions, e.g. [nn_1, nn_2]
nn_x = x
nn_x = tf.expand_dims(nn_x, 1)
nn_x = tf.tile(nn_x, [1, len(nn)])
Then you create a 2D array of the same shape holding the indices of nn.
For the array used earlier, index2d = [[0,1],[0,1],[0,1]].
index = tf.range(len(nn), dtype=x.dtype)
index2d = tf.expand_dims(index, 0)
index2d = tf.tile(index2d, [tf.shape(x)[0], 1])
Then you need to stack these two arrays, move the first dimension to the last place, and flatten along axes 0 and 1.
This way you will have map2d = [[0,0],[0,1],[2,0],[2,1],[1,0],[1,1]].
In each pair, the first entry is the value of x and the second is the index of the nn.
Then you map over this 2D array using the tf.map_fn function. Write something like the following (pseudocode: a Python list cannot actually be indexed by a tensor, so in practice nn[t[1]] would have to go through something like tf.case):
tf.map_fn(lambda t: [nn[t[1]](t[0]), t[1]], map2d)
Now you have all possible values of nn for each x.
At this point, you can reshape map2d back, compare map2d[:,:,1] with your tag, and select the entries that are equal.
# reshape map2d
# ...
# transform tag
tag2d = tag
tag2d = tf.expand_dims(tag2d, 1)
tag2d = tf.tile(tag2d, [1, len(nn)])
# the mask will have only one non-zero value for each row
mask = tf.cast(tf.equal(tag2d, map2d[:,:,1]), map2d.dtype)
result = tf.multiply(mask, map2d[:,:,0])
# sum rather than max, so that negative outputs are kept as well
result = tf.reduce_sum(result, [1])
I didn't try the code, but the mechanism should work.
Hope this helps.
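The masking mechanism described in this answer can be sketched compactly: evaluate every network on every example, then keep only the output selected by each tag. The sketch below uses NumPy as a stand-in for TensorFlow (an assumption, so that it runs without a session); random matrices play the role of w1, w2 and w3, and the names all_outputs and mask are made up for this illustration. In TensorFlow, tf.concat, tf.one_hot and tf.reduce_sum would play the corresponding roles.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 10))                    # inputs, one row per example
tags = rng.integers(1, 4, size=(1000, 1))          # tag 1, 2 or 3 for each example
w = [rng.normal(size=(10, 1)) for _ in range(3)]   # stand-ins for w1, w2, w3

# Output of every network for every example: shape (1000, 3).
all_outputs = np.concatenate([x @ w_k for w_k in w], axis=1)

# One-hot mask: mask[i, k] is 1 where tags[i] == k + 1, else 0.
mask = (tags == np.arange(1, 4)).astype(all_outputs.dtype)

# Exactly one entry per row survives the multiplication, so a sum selects it.
result = (all_outputs * mask).sum(axis=1, keepdims=True)  # shape (1000, 1)

print(result[:5].ravel())
```

This computes all three networks for every row, which is wasteful but keeps the whole computation a fixed set of array operations with no per-example branching.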

Tensorflow: use different expression for forward and backward pass

I have a tensorflow expression where I want to use a different expression depending on whether I'm computing the forward or backward (gradient) pass. Specifically, I want to ignore the effects of some randomness (noise) added into the network during the backwards pass.
Here's a simplified example
import numpy as np
import tensorflow as tf
x = tf.placeholder(tf.float32)
y = x**2
u = tf.random_uniform(tf.shape(x), minval=0.9, maxval=1.1)
yu = y * u
z = tf.sqrt(yu)
g = tf.gradients(z, x)[0]
with tf.Session() as sess:
    yv, yuv, zv, gv = sess.run([y,yu,z,g], {x: [-2, -1, 1]})
    for v in (yv, yuv, zv, gv):
        print(v)
which outputs something like
[4. 1. 1.]
[4.1626534 0.9370764 1.0806011]
[2.0402582 0.96802706 1.0395197 ]
[-1.0201291 -0.96802706 1.0395197 ]
The last values here are the derivative of z with respect to x. I would like them to not include the multiplicative noise term u, i.e. they should consistently be [-1, -1, 1] for these input values of x.
Is there a way to do such a thing only using Python? I know I can make a custom operator in C and define a custom gradient for it, but I'd like to avoid this if possible.
Also, I'm hoping to use this as part of a Keras layer, so a Keras-based solution would be an alternative (i.e. if one could define a different expression for the forwards and backwards pass through a Keras layer). This does mean that just defining a second expression z2 = tf.sqrt(y) and calling gradients on that isn't a solution for me, though, because I don't know how I would put that in Keras (since in Keras, it will be part of a very long computational graph).
The short answer is that Sergey Ioffe's trick, which you mentioned above, will only work if it's applied at the very end of the graph, right before the gradient computation.
I am assuming that you tried the following, which will not work:
yu_fixed = tf.stop_gradient(yu - y) + y
z = tf.sqrt(yu_fixed)
This still outputs random-tainted gradients.
To see why, let's follow along the gradient computation. Let's use s as shorthand for tf.stop_gradient. The way this works is that when TensorFlow needs to compute s(expr), it just returns expr, but when it needs to compute the gradient of s(expr), it returns 0.
We want to compute the gradient of z = sqrt(s(yu - y) + y). Now, because
d/dx sqrt(f(x)) = f'(x) / (2 * sqrt(f(x))),
we find that the gradient of z contains both a term with the derivative of s() and a term containing s() itself. Stopping the gradient zeroes only the derivative of s(); the latter term still carries the value of s(yu - y), so the computed derivative of z depends (in an odd and incorrect way) on the value of yu. This is why the above solution still contains randomness in the gradient.
As far as I can see, the only way to work around this is to apply Ioffe's trick as the last stage before the tf.gradients call. In other words, if you do something like the following, you will get the expected result:
x = tf.placeholder(tf.float32)
y = x**2
u = tf.random_uniform(tf.shape(x), minval=0.9, maxval=1.1)
yu = y * u
z = tf.sqrt(yu)
z_fixed = tf.stop_gradient(z - tf.sqrt(y)) + tf.sqrt(y)
g = tf.gradients(z_fixed, x)[0]
with tf.Session() as sess:
    yv, yuv, zv, gv = sess.run([y,yu,z_fixed,g], {x: [-2, -1, 1]})
    for v in (yv, yuv, zv, gv):
        print(v)
[ 4. 1. 1.]
[ 3.65438652 1.07519293 0.94398856]
[ 1.91164494 1.03691506 0.97159076]
[-1. -1. 1.]
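The argument in this answer can be followed by hand without TensorFlow. The sketch below is a toy model, not TensorFlow's implementation: a tiny Dual class carries a value together with its derivative with respect to x, and a hypothetical stop_gradient keeps the value but zeroes the derivative. The noise values passed in are arbitrary numbers chosen for illustration.

```python
import math

class Dual:
    """A value together with its derivative with respect to x."""
    def __init__(self, val, grad=0.0):
        self.val = val
        self.grad = grad

    def __add__(self, other):
        return Dual(self.val + other.val, self.grad + other.grad)

    def __sub__(self, other):
        return Dual(self.val - other.val, self.grad - other.grad)

    def __mul__(self, other):
        # Product rule.
        return Dual(self.val * other.val,
                    self.val * other.grad + self.grad * other.val)

def sqrt(d):
    root = math.sqrt(d.val)
    return Dual(root, d.grad / (2.0 * root))  # chain rule for sqrt

def stop_gradient(d):
    # Same value on the forward pass, zero derivative on the backward pass.
    return Dual(d.val, 0.0)

def compare(x_val, u_val):
    x = Dual(x_val, 1.0)  # dx/dx = 1
    u = Dual(u_val, 0.0)  # the noise does not depend on x
    y = x * x
    yu = y * u
    z = sqrt(yu)                                    # plain noisy expression
    z_inner = sqrt(stop_gradient(yu - y) + y)       # trick applied before the sqrt
    z_fixed = stop_gradient(z - sqrt(y)) + sqrt(y)  # trick applied as the last stage
    return z, z_inner, z_fixed

for x_val, u_val in [(-2.0, 1.04), (-1.0, 0.94), (1.0, 1.08)]:
    z, z_inner, z_fixed = compare(x_val, u_val)
    print(x_val, z.val, z.grad, z_inner.grad, z_fixed.grad)
```

In this toy model z_fixed keeps the noisy forward value of z while its derivative is that of sqrt(y) alone, whereas z_inner still divides by the noisy square root and so its derivative still depends on u.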

Weighted smoothing of a 1D array - Python

I am quite new to Python and I have an array of some parameter detections; some of the values were detected incorrectly (like 4555555):
array = [1, 20, 55, 33, 4555555, 1]
And I want to somehow smooth it. Right now I'm doing that with a weighted mean:
def smoothify(array):
    for i in range(1, len(array) - 2):
        array[i] = 0.7 * array[i] + 0.15 * (array[i - 1] + array[i + 1])
    return array
But it works pretty badly. Of course, we can take a weighted mean of more than 3 elements, but that results in copy-pasting... I tried to find some native functions for this, but I failed.
Could you please help me with that?
P.S. Sorry if it's a noob question :(
Thanks for your time,
Best regards, Anna
For weighted smoothing purposes, you are basically looking to perform convolution. For our case, since we are dealing with 1D arrays, we can simply use NumPy's 1D convolution function : np.convolve for a vectorized solution. The only important thing to remember here is that the weights are to be reversed given the nature of convolution that uses a reversed version of the kernel that slides across the main input array. Thus, the solution would be -
weights = [0.7,0.15,0.15]
out = np.convolve(array,np.array(weights)[::-1],'same')
If you were looking to get a weighted mean, you could get it with out/sum(weights). In our case, since the sum of the given weights is already 1, the output would stay the same as out.
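As a cross-check on how np.convolve aligns the kernel, here is a minimal sketch that assumes the intended smoothing is the symmetric one from the question's formula (0.7 for the current element, 0.15 for each neighbour), which corresponds to the kernel [0.15, 0.7, 0.15]. The variable names kernel, smoothed and reference are made up for this illustration.

```python
import numpy as np

array = np.array([1, 20, 55, 33, 4555555, 1], dtype=float)

# Centered kernel: 0.15 * left neighbour + 0.7 * current + 0.15 * right neighbour.
# It is symmetric, so reversing it makes no difference.
kernel = np.array([0.15, 0.7, 0.15])

# mode='same' keeps the output as long as the input; the edges are zero-padded.
smoothed = np.convolve(array, kernel, mode='same')

# Reference: the same weighted mean written with slices (interior points only).
reference = 0.7 * array[1:-1] + 0.15 * (array[:-2] + array[2:])

print(smoothed)
print(np.allclose(smoothed[1:-1], reference))
```

Unlike the loop in the question, this does not update the array in place, so each output value is computed from the original neighbours rather than from already smoothed ones.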
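Let's plot the output along with the input for graphical debugging -
import numpy as np
import matplotlib.pyplot as plt
# Input array and weights
array = [1, 20, 55, 33, 455, 200, 100, 20 ]
weights = [0.7,0.15,0.15]
out = np.convolve(array,np.array(weights)[::-1],'same')
x = np.arange(len(array))
f, axarr = plt.subplots(2, sharex=True, sharey=True)
# assumed plotting calls (the plotting lines of this snippet were lost): original on top, smoothed below
axarr[0].plot(x, array)
axarr[0].set_title('Original and smoothened arrays')
axarr[1].plot(x, out)
plt.show()
Output - (figure not preserved)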
I would suggest numpy.average to help you with this. The trick is getting the weights calculated: below I zip up three lists, one the same as the original array, the next one step ahead, and the next one step behind. Once we have the weights, we feed them into the np.average function.
import numpy as np
array = [1, 20, 55, 33, 4555555, 1]
arrayCompare = zip(array, array[1:] + [0], [0] + array)
weights = [.7 * x + .15 * (y + z) for x, y, z in arrayCompare]
avg = np.average(array, weights=weights)
Maybe you want to have a look at numpy and in particular at numpy.average.
Also, did you see this question Weighted moving average in python? Might be helpful, too.
Since you tagged this with numpy I wrote how I would do this with numpy:
import numpy as np
def smoothify(thisarray):
    """
    returns moving average of input using:
    out(n) = .7*in(n) + 0.15*( in(n-1) + in(n+1) )
    """
    # make sure we got a numpy array, else make it one
    if isinstance(thisarray, list):
        thisarray = np.array(thisarray)
    # do the moving average by adding three slices of the original array
    # returns a numpy array,
    # could be modified to return whatever type we put in...
    return 0.7 * thisarray[1:-1] + 0.15 * ( thisarray[2:] + thisarray[:-2] )
myarray = [1, 20, 55, 33, 4555555, 1]
smootharray = smoothify(myarray)
Instead of looping through the original array, with numpy you can get "slices" by indexing. The output array will be two items shorter than the input array. The central points (n) are thisarray[1:-1] : "From item index 1 until the last item (not inclusive)". The other slices are "From index 2 until the end" and "Everything except the last two"

What is the difference between np.mean and tf.reduce_mean?

In the MNIST beginner tutorial, there is the statement
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
tf.cast basically changes the type of tensor the object is, but what is the difference between tf.reduce_mean and np.mean?
Here is the doc on tf.reduce_mean:
reduce_mean(input_tensor, reduction_indices=None, keep_dims=False, name=None)
input_tensor: The tensor to reduce. Should have numeric type.
reduction_indices: The dimensions to reduce. If None (the default), reduces all dimensions.
# 'x' is [[1., 1.]
#         [2., 2.]]
tf.reduce_mean(x) ==> 1.5
tf.reduce_mean(x, 0) ==> [1.5, 1.5]
tf.reduce_mean(x, 1) ==> [1., 2.]
For a 1D vector, it looks like np.mean == tf.reduce_mean, but I don't understand what's happening in tf.reduce_mean(x, 1) ==> [1., 2.]. tf.reduce_mean(x, 0) ==> [1.5, 1.5] kind of makes sense, since mean of [1, 2] and [1, 2] is [1.5, 1.5], but what's going on with tf.reduce_mean(x, 1)?
The functionality of numpy.mean and tensorflow.reduce_mean is the same. You can see that from the documentation for numpy and tensorflow. Let's look at an example:
c = np.array([[3.,4], [5.,6], [6.,7]])
print(np.mean(c,1))
Mean = tf.reduce_mean(c,1)
with tf.Session() as sess:
    result = sess.run(Mean)
    print(result)
[ 3.5 5.5 6.5]
[ 3.5 5.5 6.5]
Here you can see that when axis (numpy) or reduction_indices (tensorflow) is 1, it computes the mean across (3,4), (5,6) and (6,7), so 1 defines the axis across which the mean is computed. When it is 0, the mean is computed across (3,5,6) and (4,6,7), and so on. I hope you get the idea.
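For the 2x2 array from the question, the effect of the axis argument can be seen directly in NumPy. This is a small sketch; the variable names are made up for this illustration.

```python
import numpy as np

x = np.array([[1., 1.],
              [2., 2.]])

mean_all = np.mean(x)             # mean over every element
mean_axis0 = np.mean(x, axis=0)   # collapse the rows: one mean per column
mean_axis1 = np.mean(x, axis=1)   # collapse the columns: one mean per row

print(mean_all)
print(mean_axis0)
print(mean_axis1)
```

The axis you pass is the one that disappears from the result: with axis=1 each row [1., 1.] and [2., 2.] is averaged on its own, which is where [1., 2.] comes from.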
Now what are the differences between them?
You can compute the numpy operation anywhere in Python. But in order to do a tensorflow operation, it must be done inside a tensorflow Session (this applies to the graph-based TensorFlow 1.x API used here). You can read more about it here. So when you need to perform any computation for your tensorflow graph (or structure, if you will), it must be done inside a tensorflow Session.
Let's look at another example.
npMean = np.mean(c)
print(npMean+1)
tfMean = tf.reduce_mean(c)
Add = tfMean + 1
with tf.Session() as sess:
    result = sess.run(Add)
    print(result)
We could increase the mean by 1 in numpy as you naturally would, but in order to do it in tensorflow, you need to perform that in a Session; without using a Session you can't do that. In other words, when you write tfMean = tf.reduce_mean(c), tensorflow doesn't compute it then. It only computes it in a Session. But numpy computes it instantly, when you write np.mean().
I hope it makes sense.
The key here is the word reduce, a concept from functional programming, which makes it possible for reduce_mean in TensorFlow to keep a running average of the results of computations from a batch of inputs.
If you are not familiar with functional programming, this can seem mysterious. So first let us see what reduce does. If you were given a list like [1,2,5,4] and were told to compute the mean, that is easy - just pass the whole array to np.mean and you get the mean. However what if you had to compute the mean of a stream of numbers? In that case, you would have to first assemble the array by reading from the stream and then call np.mean on the resulting array - you would have to write some more code.
An alternative is to use the reduce paradigm. As an example, look at how we can use reduce in python to calculate the sum of numbers:
reduce(lambda x,y: x+y, [1,2,5,4])   # in Python 3: from functools import reduce
It works like this:
Step 1: Read 2 numbers from the list - 1,2. Evaluate lambda 1,2. reduce stores the result 3. Note - this is the only step where 2 numbers are read off the list.
Step 2: Read the next number from the list - 5. Evaluate lambda 3,5 (3 being the result from step 1, that reduce stored). reduce stores the result 8.
Step 3: Read the next number from the list - 4. Evaluate lambda 8,4 (8 being the result of step 2, that reduce stored). reduce stores the result 12.
Step 4: Read the next number from the list - there are none, so return the stored result of 12.
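The steps above can be traced in plain Python. The sketch below uses functools.reduce with a helper that prints each call, and then, as an illustration of the streaming idea, a second reducer that carries a (count, mean) pair so the mean is updated one item at a time. The helper names add_and_trace and update_mean are made up for this illustration.

```python
from functools import reduce

def add_and_trace(acc, item):
    # acc is the stored result so far, item is the next value read from the list.
    print(f"lambda {acc},{item} -> {acc + item}")
    return acc + item

total = reduce(add_and_trace, [1, 2, 5, 4])
print(total)

def update_mean(state, item):
    # Incremental mean: new_mean = old_mean + (item - old_mean) / new_count.
    count, mean = state
    count += 1
    return count, mean + (item - mean) / count

count, mean = reduce(update_mean, [1, 2, 5, 4], (0, 0.0))
print(count, mean)
```

The third argument to reduce is the initial stored value; without it, reduce starts from the first two items of the list, as in Step 1.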
Read more here Functional Programming in Python
To see how this applies to TensorFlow, look at the following block of code, which defines a simple graph, that takes in a float and computes the mean. The input to the graph however is not a single float but an array of floats. The reduce_mean computes the mean value over all those floats.
import tensorflow as tf
inp = tf.placeholder(tf.float32)
mean = tf.reduce_mean(inp)
x = [1,2,3,4,5]
with tf.Session() as sess:
    print(mean.eval(feed_dict={inp : x}))
This pattern comes in handy when computing values over batches of images. Look at The Deep MNIST Example where you see code like:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
The new documentation states that tf.reduce_mean() produces the same results as np.mean:
Equivalent to np.mean
It also has absolutely the same parameters as np.mean. But here is an important difference: they produce the same results only on float values:
import tensorflow as tf
import numpy as np
from random import randint
num_dims = 10
rand_dim = randint(0, num_dims - 1)
c = np.random.randint(50, size=tuple([5] * num_dims)).astype(float)
with tf.Session() as sess:
    r1 = sess.run(tf.reduce_mean(c, rand_dim))
    r2 = np.mean(c, rand_dim)
    is_equal = np.array_equal(r1, r2)
    print(is_equal)
    if not is_equal:
        print(r1)
        print(r2)
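If you remove the type conversion (.astype(float)), you will see different results: on integer input, tf.reduce_mean infers an integer output type and returns an integer mean, whereas np.mean returns floats.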
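The kind of difference to expect on integer input can be illustrated without running TensorFlow. The sketch below is only an emulation, based on the assumption (taken from the tf.reduce_mean documentation) that the output keeps the integer type of the input: np.mean promotes to float, while the integer-typed mean is imitated with integer division. The names c_int, float_mean and int_mean are made up for this illustration.

```python
import numpy as np

c_int = np.array([1, 0, 1, 0])

# NumPy promotes integer input to float before averaging.
float_mean = np.mean(c_int)

# Emulation of a mean that stays in the integer type: sum, then integer division.
int_mean = c_int.sum() // c_int.size

print(float_mean)
print(int_mean)
```

Casting the input to float first, as the snippet above does with .astype(float), removes the discrepancy.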
In addition to this, many other tf.reduce_ functions such as reduce_all, reduce_any, reduce_min, reduce_max and reduce_prod produce the same values as their numpy analogs. Because they are operations, they can be executed only from inside a session.

