What does numpy.gradient do?


So I know what the gradient of a (mathematical) function is, so I feel like I should know what numpy.gradient does. But I don't. The documentation is not really helpful either:
Return the gradient of an N-dimensional array.
What is the gradient of an array? When is numpy.gradient useful?

Also in the documentation[1]:
>>> y = np.array([1, 2, 4, 7, 11, 16], dtype=float)
>>> j = np.gradient(y)
>>> j
array([ 1. , 1.5, 2.5, 3.5, 4.5, 5. ])
Gradient is defined as (change in y)/(change in x).
x, here, is the list index, so the difference between adjacent values is 1.
At the boundaries, the first difference is calculated. This means that at each end of the array, the gradient given is simply the difference between the two end values (divided by 1).
Away from the boundaries, the gradient at a particular index is given by taking the difference between the values on either side and dividing by 2.
So, the gradient of y, above, is calculated thus:
j[0] = (y[1]-y[0])/1 = (2-1)/1 = 1
j[1] = (y[2]-y[0])/2 = (4-1)/2 = 1.5
j[2] = (y[3]-y[1])/2 = (7-2)/2 = 2.5
j[3] = (y[4]-y[2])/2 = (11-4)/2 = 3.5
j[4] = (y[5]-y[3])/2 = (16-7)/2 = 4.5
j[5] = (y[5]-y[4])/1 = (16-11)/1 = 5
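As a quick check of the hand calculation above, here is a minimal sketch that rebuilds the same array with plain NumPy slicing (first differences at the two ends, central differences inside, unit spacing assumed) and compares it with np.gradient:

```python
import numpy as np

y = np.array([1, 2, 4, 7, 11, 16], dtype=float)

manual = np.empty_like(y)
manual[0] = y[1] - y[0]                # first (forward) difference at the left end
manual[-1] = y[-1] - y[-2]             # first (backward) difference at the right end
manual[1:-1] = (y[2:] - y[:-2]) / 2.0  # central differences in the interior

print(manual)          # values: 1, 1.5, 2.5, 3.5, 4.5, 5
print(np.gradient(y))  # same values
```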
You could find the minima of all the absolute values in the resulting array to find the turning points of a curve, for example.
[1] The array is actually called x in the example in the docs; I've changed it to y to avoid confusion.

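The documentation says:
The gradient is computed using central differences in the interior and first differences at the boundaries.
and
The default distance is 1
This means that in the interior it is computed as
$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$$
where h = 1.0, and at the boundaries as
$$f'(x_0) \approx \frac{f(x_0+h) - f(x_0)}{h}, \qquad f'(x_n) \approx \frac{f(x_n) - f(x_n-h)}{h}.$$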

Here is what is going on. The Taylor series expansion tells us how to approximate the derivative, given the values at nearby points. The simplest estimate comes from the first-order Taylor expansion (with remainder) for a C^2 function (two continuous derivatives):
$$f(x+h) = f(x) + f'(x)h + \frac{f''(\xi)}{2}h^2.$$
Solving for f'(x):
$$f'(x) = \frac{f(x+h) - f(x)}{h} + O(h).$$
Can we do better? Yes, indeed. If we assume C^3, then the Taylor expansions are
$$f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2}h^2 + \frac{f'''(\xi_1)}{6}h^3,$$
$$f(x-h) = f(x) - f'(x)h + \frac{f''(x)}{2}h^2 - \frac{f'''(\xi_2)}{6}h^3.$$
Subtracting these (both the h^0 and h^2 terms drop out!) and solving for f'(x) gives
$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} + O(h^2).$$
So, if we have a discretized function defined on equally spaced points
x_0, x_1 = x_0 + h, ..., x_n = x_0 + nh, then numpy.gradient will yield a "derivative" array using the first-order estimate at the ends and the better (second-order) estimate in the middle.
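A small experiment makes the difference in accuracy visible. The sketch below (the test function exp and the grid sizes are arbitrary choices for illustration) samples exp on [0, 1], differentiates it with np.gradient, and measures the error separately at the interior points and at the two end points. If the analysis above holds, halving h should roughly quarter the interior error (O(h^2)) but only halve the end-point error (O(h)).

```python
import numpy as np

def gradient_errors(n):
    """Max abs error of np.gradient for f = exp on n+1 equally spaced points in [0, 1]."""
    x = np.linspace(0.0, 1.0, n + 1)
    h = x[1] - x[0]
    # the exact derivative of exp is exp itself
    err = np.abs(np.gradient(np.exp(x), h) - np.exp(x))
    interior = err[1:-1].max()   # central differences
    ends = max(err[0], err[-1])  # first differences
    return interior, ends

results = {n: gradient_errors(n) for n in (10, 20, 40)}
for n, (interior, ends) in results.items():
    print(f"h = 1/{n:<3} interior error = {interior:.2e}   end-point error = {ends:.2e}")
```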
Example 1. If you don't specify any spacing, the interval is assumed to be 1, so if you call
f = np.array([5, 7, 4, 8])
what you are saying is that f(0) = 5, f(1) = 7, f(2) = 4, and f(3) = 8. Then
np.gradient(f)
will be: f'(0) = (7 - 5)/1 = 2, f'(1) = (4 - 5)/(2*1) = -0.5, f'(2) = (8 - 7)/(2*1) = 0.5, f'(3) = (8 - 4)/1 = 4.
Example 2. If you specify a single spacing, the spacing is uniform but not 1.
For example, if you call
np.gradient(f, 0.5)
this is saying that h = 0.5, not 1, i.e., the function is really f(0) = 5, f(0.5) = 7, f(1.0) = 4, f(1.5) = 8. The net effect is to replace h = 1 with h = 0.5 and all the results will be doubled.
Example 3. Suppose the discretized function f(x) is not defined on uniformly spaced intervals, for instance f(0) = 5, f(1) = 7, f(3) = 4, f(3.5) = 8, then there is a messier discretized differentiation function that the numpy gradient function uses and you will get the discretized derivatives by calling
np.gradient(f, np.array([0,1,3,3.5]))
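The three examples can be reproduced directly. For the non-uniform case, the sketch also recomputes the interior values by hand, assuming NumPy's second-order formula for uneven spacing, which weights each one-sided slope by the spacing on the opposite side; the ends still use first differences with the default edge_order=1.

```python
import numpy as np

f = np.array([5, 7, 4, 8], dtype=float)

print(np.gradient(f))        # Example 1, h = 1: 2, -0.5, 0.5, 4
print(np.gradient(f, 0.5))   # Example 2, h = 0.5: every value doubled
x = np.array([0, 1, 3, 3.5])
g = np.gradient(f, x)        # Example 3, non-uniform sample points

# Interior points: weight each one-sided slope by the *opposite* spacing,
# e.g. at x = 3: (2 * 8 + 0.5 * (-1.5)) / 2.5 = 6.1
hd, hs = np.diff(x)[:-1], np.diff(x)[1:]   # backward / forward spacings
left = (f[1:-1] - f[:-2]) / hd
right = (f[2:] - f[1:-1]) / hs
interior = (hs * left + hd * right) / (hd + hs)

print(g)         # ends are still first differences: (7-5)/1 = 2 and (8-4)/0.5 = 8
print(interior)  # matches g[1:-1]
```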
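Lastly, if your input is a 2-D array, then you are thinking of a function f of x, y defined on a grid. numpy.gradient will output a list of arrays of "discretized" partial derivatives, one per axis: the first is taken along axis 0 (down the rows) and the second along axis 1 (across the columns), so which one is "x" depends on how your grid is laid out.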
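For the 2-D case, here is a minimal sketch (the function F = x^2 + 3y and the grid are made up for illustration). Building the grid with indexing="ij" makes axis 0 correspond to x and axis 1 to y, so the two returned arrays are dF/dx and dF/dy in that order.

```python
import numpy as np

x = np.linspace(0.0, 2.0, 5)              # spacing 0.5 along axis 0
y = np.linspace(0.0, 1.0, 3)              # spacing 0.5 along axis 1
X, Y = np.meshgrid(x, y, indexing="ij")   # X[i, j] = x[i], Y[i, j] = y[j]
F = X**2 + 3 * Y

# One array per axis; passing the coordinate arrays sets the spacing per axis
dF_dx, dF_dy = np.gradient(F, x, y)

print(dF_dy)        # 3 everywhere (F is linear in y)
print(dF_dx[:, 0])  # 2x in the interior, first-difference values at the two ends
```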

Think of an N-dimensional array as a matrix of sampled function values.
Then the gradient is nothing other than the numerical derivative of that matrix along each of its dimensions.
For a good explanation, look at the description of gradient in the MATLAB documentation.

Related

Python array optimization with two constraints

I have an optimization problem where I'm trying to find an array that needs to optimize two functions simultaneously.
In the minimal example below I have two known arrays w and x and an unknown array y. I initialize array y to contain only 1s.
I then specify the function np.sqrt(np.sum((x - array)**2)) and want to find the array y where
np.sqrt(np.sum((x-y)**2)) approaches 5
np.sqrt(np.sum((w-y)**2)) approaches 8
The code below can be used to successfully optimize y with respect to a single array, but I would like to find the solution that optimizes y with respect to both x and w simultaneously, and I am unsure how to specify the two constraints.
y should only consist of values greater than 0.
Any ideas on how to go about this ?
import numpy as np
import scipy.optimize as opt
w = np.array([6, 3, 1, 0, 2])
x = np.array([3, 4, 5, 6, 7])
y = np.array([1, 1, 1, 1, 1])
def func(y):
    # root() passes only the current guess for y; x comes from the enclosing scope
    z = np.sqrt(np.sum((x - y)**2)) - 5
    return np.zeros(x.shape[0]) + z
r = opt.root(func, x0=y, method='hybr')
print(r.x)
# array([1.97522498 3.47287981 5.1943792 2.10120135 4.09593969])
print(np.sqrt(np.sum((x - r.x)**2)))
# 5.0

One option is to use scipy.optimize.minimize instead of root. Here you have multiple solver options, and some of them (e.g. SLSQP) allow you to specify multiple constraints. Note that I changed the variable names so that x is the array you want to optimise and y and z define the constraints.
from scipy.optimize import minimize
import numpy as np
x0 = np.array([1, 1, 1, 1, 1])
y = np.array([6, 3, 1, 0, 2])  # the question's w (target distance 8)
z = np.array([3, 4, 5, 6, 7])  # the question's x (target distance 5)
constraint_x = dict(type='ineq',
                    fun=lambda x: x)  # fulfilled if >= 0
constraint_y = dict(type='eq',
                    fun=lambda x: np.linalg.norm(x-y) - 8)  # fulfilled if == 0
constraint_z = dict(type='eq',
                    fun=lambda x: np.linalg.norm(x-z) - 5)  # fulfilled if == 0
res = minimize(fun=lambda x: np.linalg.norm(x),
               constraints=[constraint_x, constraint_y, constraint_z], x0=x0,
               method='SLSQP', options=dict(ftol=1e-8))  # default 1e-6
print(res.x)  # all entries >= 0
print(np.linalg.norm(res.x-y))  # approximately 8
print(np.linalg.norm(res.x-z))  # approximately 5
This is a minimizer, so besides the constraints it also needs a function to minimize. I chose just the norm of x, but setting the function to a constant (e.g. lambda x: 1) would also have worked.
Note also that the constraints are not exactly fulfilled; you can increase the accuracy by setting the optional argument ftol to a smaller value, e.g. 1e-10.
For more information see also the documentation and the corresponding sections for each solver.
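A different technique that may also fit, sketched here as an alternative (it swaps in scipy.optimize.least_squares, which is not what the answer above uses): treat each target distance as a residual and let least_squares drive both residuals to zero, with bounds=(0, np.inf) keeping every entry non-negative. The names w, x and y follow the question.

```python
import numpy as np
from scipy.optimize import least_squares

w = np.array([6, 3, 1, 0, 2], dtype=float)
x = np.array([3, 4, 5, 6, 7], dtype=float)

def residuals(y):
    # one residual per target distance; both should reach zero
    return np.array([np.linalg.norm(x - y) - 5.0,
                     np.linalg.norm(w - y) - 8.0])

# x0 must lie inside the bounds; all ones is strictly positive
res = least_squares(residuals, x0=np.ones(5), bounds=(0.0, np.inf))
y = res.x
print(y)
print(np.linalg.norm(x - y), np.linalg.norm(w - y))
```

With only two equations and five unknowns there are many solutions; which one is returned depends on the starting point.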

Building N-th order Markovian transition matrix from a given sequence

I am trying to create a function which can transform a given input sequence to a transition matrix of the requested order. I found an implementation for the first-order Markovian transition matrix.
Now, I want to be able to come up with a solution which can calculate 2nd and 3rd order transition matrices.
Example of the 1st order matrix implementation:
import numpy as np
# sequence with 3 states -> 0, 1, 2
a = [0, 1, 0, 0, 0, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 2]
def transition_matrix_first_order(seq):
    M = np.full((3, 3), fill_value=1/3, dtype=np.float64)
    for (i, j) in zip(seq, seq[1:]):
        M[i, j] += 1
    M = M / M.sum(axis=1, keepdims=True)
    return M
print(transition_matrix_first_order(a))
Which gives me this:
[[0.61111111 0.19444444 0.19444444]
[0.38888889 0.38888889 0.22222222]
[0.22222222 0.22222222 0.55555556]]
When making a 2nd order matrix, it should have unique_state_count ** order rows and unique_state_count columns. In the example above, I have 3 unique states, so the matrix will have 9x3 structure.
Desired function signature:
cal_tr_matrix(seq, unique_state_count, order)

I think you have a slight misunderstanding about the Markov chains and their transition matrices.
First of all, the estimated transition matrix your function produces is unfortunately not correct. Why? Let's refresh.
A discrete Markov chain in discrete time with N different states has a transition matrix P of size N x N, where the (i, j) element is P(X_1=j | X_0=i), i.e. the probability of transition from state i to state j in a single time step.
Now a transition matrix of order n, denoted P^{(n)}, is once again a matrix of size N x N, where the (i, j) element is P(X_n=j | X_0=i), i.e. the probability of transition from state i to state j in n time steps.
A wonderful result (the Chapman–Kolmogorov equations) says: P^{(n)} = P^n, i.e. taking the n-th power of the single-step transition matrix gives you the n-step transition matrix.
Now with this recap, all that is needed is to estimate P from the given sequence; then, to estimate P^{(n)}, one can just take the n-th power of the estimated P. So how to estimate the matrix P? If we denote by N_{ij} the number of observed transitions from state i to state j and by N_{i*} the number of observed transitions out of state i, then P_{ij} = N_{ij} / N_{i*}.
Overall here in Python:
import numpy as np
def transition_matrix(arr, n=1):
    """
    Computes the transition matrix from Markov chain sequence of order `n`.
    :param arr: Discrete Markov chain state sequence in discrete time with states in 0, ..., N
    :param n: Transition order
    """
    M = np.zeros(shape=(max(arr) + 1, max(arr) + 1))
    for (i, j) in zip(arr, arr[1:]):
        M[i, j] += 1
    T = (M.T / M.sum(axis=1)).T
    return np.linalg.matrix_power(T, n)
>>> transition_matrix(arr=a, n=1)
array([[0.63636364, 0.18181818, 0.18181818],
       [0.4       , 0.4       , 0.2       ],
       [0.2       , 0.2       , 0.6       ]])
>>> transition_matrix(arr=a, n=2)
array([[0.51404959, 0.22479339, 0.26115702],
       [0.45454545, 0.27272727, 0.27272727],
       [0.32727273, 0.23636364, 0.43636364]])
>>> transition_matrix(arr=a, n=3)
array([[0.46927122, 0.23561232, 0.29511645],
       [0.45289256, 0.24628099, 0.30082645],
       [0.39008264, 0.24132231, 0.36859504]])
Interestingly, when you set the order n to a fairly high number, the higher and higher powers of the P matrix seem to converge to some very specific values. This is known as the stationary/invariant distribution of the Markov chain, and it gives a very good indication of how the chain behaves over a long period of time/transitions. Also:
P = transition_matrix(a, 1)
P111 = transition_matrix(a, 111)
print(P)
print(P111.dot(P))
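The stationary distribution can also be computed directly rather than by taking a high matrix power. A minimal sketch, assuming the chain is irreducible and aperiodic (true for the sequence above, since every state can reach every other and has self-transitions): the stationary distribution pi satisfies pi P = pi, so it is the eigenvector of P.T for eigenvalue 1, normalized to sum to 1.

```python
import numpy as np

a = [0, 1, 0, 0, 0, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 2]

# Estimate the one-step transition matrix P from transition counts
n_states = max(a) + 1
counts = np.zeros((n_states, n_states))
for i, j in zip(a, a[1:]):
    counts[i, j] += 1
P = counts / counts.sum(axis=1, keepdims=True)

# Stationary distribution: left eigenvector of P with eigenvalue 1 (pi @ P == pi)
eigvals, eigvecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, k])
pi = pi / pi.sum()   # also fixes an arbitrary sign from eig

print(pi)
print(np.linalg.matrix_power(P, 111))  # every row is approximately pi
```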
EDIT: Now to the tweaked solution based on your comment: I'd suggest using higher-dimensional matrices for higher orders instead of exploding the number of rows. One way would be like this:
def cal_tr_matrix(arr, order):
    _shape = (max(arr) + 1,) * (order + 1)
    M = np.zeros(_shape)
    for _ind in zip(*[arr[_x:] for _x in range(order + 1)]):
        M[_ind] += 1
    return M
res1 = cal_tr_matrix(a, 1)
res2 = cal_tr_matrix(a, 2)
Now the element res1[i, j] says how many times transition i->j happened, while the element res2[i, j, k] says how many times transition i->j->k happened.
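If you do want the layout from the question (unique_state_count ** order rows and unique_state_count columns, holding probabilities rather than counts), the count tensor can be flattened and normalized row by row. A sketch along those lines, using the question's desired signature; leaving rows of never-observed histories as zeros is an assumption on my part.

```python
import numpy as np

def cal_tr_matrix(seq, unique_state_count, order):
    """Estimate an order-`order` transition matrix with
    unique_state_count**order rows (one per history) and
    unique_state_count columns (next state)."""
    counts = np.zeros((unique_state_count,) * (order + 1))
    # each window of order+1 consecutive states is (history..., next state)
    for window in zip(*[seq[k:] for k in range(order + 1)]):
        counts[window] += 1
    # C-order reshape: history (i, j) of a 2nd-order chain becomes row i*N + j
    counts = counts.reshape(-1, unique_state_count)
    row_sums = counts.sum(axis=1, keepdims=True)
    # histories that never occur get a zero row instead of a division by zero
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

a = [0, 1, 0, 0, 0, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 0, 2]
print(cal_tr_matrix(a, 3, 1))  # same as transition_matrix(a, 1)
print(cal_tr_matrix(a, 3, 2))  # 9 x 3
```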

Shortest distance between a point and a line in 3 d space

I am trying to find the minimum distance from a point (x0, y0, z0) to a line through (x1, y1, z1) and (x2, y2, z2) using numpy or anything else in Python. Unfortunately, all I can find on the net is related to 2-D spaces, and I am fairly new to Python. Any help will be appreciated. Thanks in advance!

StackOverflow doesn't support Latex, so I'm going to gloss over some of the math. One solution comes from the idea that if your line spans the points p and q, then every point on that line can be represented as t*(p-q)+q for some real-valued t. You then want to minimize the distance between your given point r and any point on that line, and distance is conveniently a function of the single variable t, so standard calculus tricks work fine. Consider the following example, which calculates the minimum distance between r and the line spanned by p and q. By hand, we know the answer should be 1.
import numpy as np
p = np.array([0, 0, 0])
q = np.array([0, 0, 1])
r = np.array([0, 1, 1])
def t(p, q, r):
    x = p-q
    return np.dot(r-q, x)/np.dot(x, x)
def d(p, q, r):
    return np.linalg.norm(t(p, q, r)*(p-q)+q-r)
print(d(p, q, r))
# Prints 1.0
This works fine in any number of dimensions, including 2, 3, and a billion. The only real constraint is that p and q have to be distinct points so that there is a unique line between them.
I broke the code down in the above example in order to show the two distinct steps arising from the way I thought about it mathematically (finding t and then computing the distance). That isn't necessarily the most efficient approach, and it certainly isn't if you want to know the minimum distance for a wide variety of points and the same line -- doubly so if the number of dimensions is small. For a more efficient approach, consider the following:
import numpy as np
p = np.array([0, 0, 0]) # p and q can have shape (n,) for any
q = np.array([0, 0, 1]) # n>0, and rs can have shape (m,n)
rs = np.array([ # for any m,n>0.
[0, 1, 1],
[1, 0, 1],
[1, 1, 1],
[0, 2, 1],
])
def d(p, q, rs):
    x = p-q
    return np.linalg.norm(
        np.outer(np.dot(rs-q, x)/np.dot(x, x), x)+q-rs,
        axis=1)
print(d(p, q, rs))
# Prints [1.         1.         1.41421356 2.        ]
There may well be some simplifications I'm missing or other things that could speed that up, but it should be a good start at least.
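For the 3-D case specifically there is also a closed form via the cross product (a standard vector identity, not taken from the answers here): the distance from r to the line through p and q is |(r - q) x (p - q)| / |p - q|. Unlike the approach above, it only works in 3 dimensions, because np.cross is defined for 3-vectors (and 2-vectors).

```python
import numpy as np

def point_line_distance_3d(p, q, r):
    """Distance from point r to the infinite line through p and q (3-D only)."""
    p, q, r = (np.asarray(v, dtype=float) for v in (p, q, r))
    direction = p - q
    # |(r - q) x (p - q)| is the area of the parallelogram spanned by the two vectors;
    # dividing by the base length |p - q| leaves its height, i.e. the distance
    return np.linalg.norm(np.cross(r - q, direction)) / np.linalg.norm(direction)

print(point_line_distance_3d([0, 0, 0], [0, 0, 1], [0, 1, 1]))  # 1.0
print(point_line_distance_3d([0, 0, 0], [2, 0, 0], [1, 5, 0]))  # 5.0
```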

This duplicates @Hans Musgrave's solution, but imagine we know nothing of the 'standard calculus tricks' that 'work fine' and are also very bad at linear algebra.
All we know is:
how to calculate the distance between two points
that a point on the line can be represented as a function of two points and a parameter
how to find a function's minimum
import numpy as np
from scipy.optimize import minimize
def distance(a, b):
    """Calculate a distance between two points."""
    return np.linalg.norm(a-b)
def line_to_point_distance(p, q, r):
    """Calculate a distance between point r and a line crossing p and q."""
    def foo(t: float):
        # x is point on line, depends on t
        x = t * (p-q) + q
        # we return a distance, which also depends on t
        return distance(x, r)
    # which t minimizes distance?
    t0 = minimize(foo, 0.1).x[0]
    return foo(t0)
# in this example the distance is 5
p = np.array([0, 0, 0])
q = np.array([2, 0, 0])
r = np.array([1, 5, 0])
assert abs(line_to_point_distance(p, q, r) - 5) < 0.00001
You should not use this method for real calculations, because it uses numerical approximation where you have a closed-form solution, but maybe it is helpful to reveal some of the logic behind the neighbouring answer.

How to implement the Softmax function in Python

From Udacity's deep learning class, the softmax of y_i is simply the exponential divided by the sum of exponentials of the whole Y vector:
$$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$
where S(y_i) is the softmax function of y_i, e is the exponential, and j runs over the columns of the input vector Y.
I've tried the following:
import numpy as np
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
scores = [3.0, 1.0, 0.2]
print(softmax(scores))
which returns:
[ 0.8360188 0.11314284 0.05083836]
But the suggested solution was:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
which produces the same output as the first implementation, even though the first implementation explicitly takes the difference of each column and the max and then divides by the sum.
Can someone show mathematically why? Is one correct and the other one wrong?
Are the implementations similar in terms of code and time complexity? Which is more efficient?

They're both correct, but yours is preferred from the point of view of numerical stability.
You start with
$$\frac{e^{x - \max(x)}}{\sum e^{x - \max(x)}}$$
By using the fact that a^(b - c) = (a^b)/(a^c) we have
$$= \frac{e^{x}}{e^{\max(x)} \sum \left( e^{x} / e^{\max(x)} \right)} = \frac{e^{x}}{\sum e^{x}}$$
Which is what the other answer says. You could replace max(x) with any variable and it would cancel out.
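A quick numerical check of that cancellation: the sketch below (the helper softmax_shifted and the shift values are made up for illustration) subtracts an arbitrary constant c before exponentiating and gets the same probabilities every time.

```python
import numpy as np

x = np.array([3.0, 1.0, 0.2])

def softmax_shifted(x, c):
    # subtract an arbitrary constant c before exponentiating
    e = np.exp(x - c)
    return e / e.sum()

shifts = (0.0, np.max(x), -7.5, 42.0)
results = [softmax_shifted(x, c) for c in shifts]
for c, r in zip(shifts, results):
    print(c, r)  # the same probabilities for every c
```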

(Well... much confusion here, both in the question and in the answers...)
To start with, the two solutions (i.e. yours and the suggested one) are not equivalent; they happen to be equivalent only for the special case of 1-D score arrays. You would have discovered it if you had tried also the 2-D score array in the Udacity quiz provided example.
Results-wise, the only actual difference between the two solutions is the axis=0 argument. To see that this is the case, let's try your solution (your_softmax) and one where the only difference is the axis argument:
import numpy as np
# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)  # only difference
As I said, for a 1-D score array, the results are indeed identical:
scores = [3.0, 1.0, 0.2]
print(your_softmax(scores))
# [ 0.8360188 0.11314284 0.05083836]
print(softmax(scores))
# [ 0.8360188 0.11314284 0.05083836]
your_softmax(scores) == softmax(scores)
# array([ True, True, True], dtype=bool)
Nevertheless, here are the results for the 2-D score array given in the Udacity quiz as a test example:
scores2D = np.array([[1, 2, 3, 6],
[2, 4, 5, 6],
[3, 8, 7, 6]])
print(your_softmax(scores2D))
# [[ 4.89907947e-04 1.33170787e-03 3.61995731e-03 7.27087861e-02]
# [ 1.33170787e-03 9.84006416e-03 2.67480676e-02 7.27087861e-02]
# [ 3.61995731e-03 5.37249300e-01 1.97642972e-01 7.27087861e-02]]
print(softmax(scores2D))
# [[ 0.09003057 0.00242826 0.01587624 0.33333333]
# [ 0.24472847 0.01794253 0.11731043 0.33333333]
# [ 0.66524096 0.97962921 0.86681333 0.33333333]]
The results are different - the second one is indeed identical with the one expected in the Udacity quiz, where all columns indeed sum to 1, which is not the case with the first (wrong) result.
So, all the fuss was actually for an implementation detail - the axis argument. According to the numpy.sum documentation:
The default, axis=None, will sum all of the elements of the input array
while here we want to sum over the rows (i.e. down each column, so that every column sums to 1), hence axis=0. For a 1-D array, the sum of the (only) row and the sum of all the elements happen to be identical, hence your identical results in that case...
The axis issue aside, your implementation (i.e. your choice to subtract the max first) is actually better than the suggested solution! In fact, it is the recommended way of implementing the softmax function - see here for the justification (numeric stability, also pointed out by some other answers here).
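Both points can be combined: subtract the max and sum along the same axis. A minimal sketch for the column-per-sample layout used in the Udacity example; taking the max per column (with keepdims=True so it broadcasts) keeps each column numerically stable on its own and, by the shift invariance, gives the same result as the softmax above.

```python
import numpy as np

def softmax_columns(x):
    x = np.asarray(x, dtype=float)
    # each column's own max; keepdims keeps shape (1, n_cols) for broadcasting
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]])
p = softmax_columns(scores2D)
print(p)
print(p.sum(axis=0))  # every column sums to 1
```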

So, this is really a comment on desertnaut's answer, but I can't comment on it yet due to my reputation. As he pointed out, your version is only correct if your input consists of a single sample. If your input consists of several samples, it is wrong. However, desertnaut's solution is also wrong. The problem is that he uses a 1-dimensional input in one case and a 2-dimensional input in the other. Let me show this to you.
import numpy as np
# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
# desertnaut solution (copied from his answer):
def desertnaut_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)  # only difference
# my (correct) solution:
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis]  # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis]  # ditto
    return e_x / div
Let's take desertnaut's example:
x1 = np.array([[1, 2, 3, 6]]) # notice that we put the data into 2 dimensions(!)
This is the output:
your_softmax(x1)
array([[ 0.00626879, 0.01704033, 0.04632042, 0.93037047]])
desertnaut_softmax(x1)
array([[ 1., 1., 1., 1.]])
softmax(x1)
array([[ 0.00626879, 0.01704033, 0.04632042, 0.93037047]])
You can see that desertnaut's version would fail in this situation. (It would not if the input was just one-dimensional, like np.array([1, 2, 3, 6]).)
Let's now use 3 samples, since that's the reason why we use a 2-dimensional input. The following x2 is not the same as the one from desertnaut's example.
x2 = np.array([[1, 2, 3, 6], # sample 1
[2, 4, 5, 6], # sample 2
[1, 2, 3, 6]]) # sample 1 again(!)
This input consists of a batch with 3 samples. But sample one and three are essentially the same. We now expect 3 rows of softmax activations where the first should be the same as the third and also the same as our activation of x1!
your_softmax(x2)
array([[ 0.00183535, 0.00498899, 0.01356148, 0.27238963],
[ 0.00498899, 0.03686393, 0.10020655, 0.27238963],
[ 0.00183535, 0.00498899, 0.01356148, 0.27238963]])
desertnaut_softmax(x2)
array([[ 0.21194156, 0.10650698, 0.10650698, 0.33333333],
[ 0.57611688, 0.78698604, 0.78698604, 0.33333333],
[ 0.21194156, 0.10650698, 0.10650698, 0.33333333]])
softmax(x2)
array([[ 0.00626879, 0.01704033, 0.04632042, 0.93037047],
[ 0.01203764, 0.08894682, 0.24178252, 0.65723302],
[ 0.00626879, 0.01704033, 0.04632042, 0.93037047]])
I hope you can see that this is only the case with my solution.
softmax(x1) == softmax(x2)[0]
array([[ True, True, True, True]], dtype=bool)
softmax(x1) == softmax(x2)[2]
array([[ True, True, True, True]], dtype=bool)
Additionally, here are the results of TensorFlow's softmax implementation (TensorFlow 1.x API):
import tensorflow as tf
import numpy as np
batch = np.asarray([[1,2,3,6],[2,4,5,6],[1,2,3,6]])
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.nn.softmax(x)
init = tf.initialize_all_variables()
sess = tf.Session()
sess.run(y, feed_dict={x: batch})
And the result:
array([[ 0.00626879, 0.01704033, 0.04632042, 0.93037045],
[ 0.01203764, 0.08894681, 0.24178252, 0.657233 ],
[ 0.00626879, 0.01704033, 0.04632042, 0.93037045]], dtype=float32)

I would say that while both are correct mathematically, implementation-wise, first one is better. When computing softmax, the intermediate values may become very large. Dividing two large numbers can be numerically unstable. These notes (from Stanford) mention a normalization trick which is essentially what you are doing.

sklearn also offers an implementation of softmax:
from sklearn.utils.extmath import softmax
import numpy as np
x = np.array([[ 0.50839931, 0.49767588, 0.51260159]])
softmax(x)
# output
array([[ 0.3340521 , 0.33048906, 0.33545884]])

From a mathematical point of view, both sides are equal.
And you can easily prove this. Let m = max(x). Now your function softmax returns a vector whose i-th coordinate is equal to
$$\frac{e^{x_i - m}}{\sum_j e^{x_j - m}} = \frac{e^{-m} e^{x_i}}{e^{-m} \sum_j e^{x_j}} = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
Notice that this works for any m, because for all (even complex) numbers e^m != 0.
From a computational complexity point of view they are also equivalent, and both run in O(n) time, where n is the size of the vector.
From a numerical stability point of view, the first solution is preferred, because e^x grows very fast and overflows for moderately large x (above about 709 in float64). Subtracting the maximum value gets rid of this overflow. To experience this in practice, try feeding x = np.array([1000, 5]) into both of your functions. One will return the correct probabilities; the other will overflow and return nan.
Your solution works only for vectors (the Udacity quiz wants you to calculate it for matrices as well). In order to fix it, you need to use sum(axis=0).
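The x = np.array([1000, 5]) experiment can be run as below. The naive version computes inf / inf for the first entry, which gives nan; the shifted version only exponentiates values that are at most 0. The np.errstate block merely silences the overflow/invalid warnings for the demonstration.

```python
import numpy as np

x = np.array([1000.0, 5.0])

with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(x) / np.sum(np.exp(x))   # exp(1000) overflows to inf

shifted = np.exp(x - x.max())               # largest exponent is now 0
stable = shifted / shifted.sum()

print(naive)   # first entry is nan
print(stable)  # 1 and 0 (exp(-995) underflows to 0)
```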

EDIT. As of version 1.2.0, scipy includes softmax as a special function:
https://scipy.github.io/devdocs/generated/scipy.special.softmax.html
I wrote a function applying the softmax over any axis:
def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.
    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.
    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """
    # make X at least 2d
    y = np.atleast_2d(X)
    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)
    # multiply y against the theta parameter,
    y = y * float(theta)
    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)
    # exponentiate y
    y = np.exp(y)
    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)
    # finally: divide elementwise
    p = y / ax_sum
    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()
    return p
Subtracting the max, as other users described, is good practice. I wrote a detailed post about it here.
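The scipy.special.softmax function mentioned in the EDIT above takes an axis argument, so the one-sample-per-row case discussed in other answers can be written without a hand-rolled function (requires SciPy 1.2.0 or later):

```python
import numpy as np
from scipy.special import softmax

x2 = np.array([[1, 2, 3, 6],
               [2, 4, 5, 6],
               [1, 2, 3, 6]], dtype=float)

rows = softmax(x2, axis=1)   # one probability distribution per row (sample)
single = softmax(x2[0])      # 1-D input: softmax over all elements

print(rows)
print(rows.sum(axis=1))      # each row sums to 1
print(single)                # same as rows[0]
```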

Here you can find out why they subtract the max.
From there:
"When you’re writing code for computing the Softmax function in practice, the intermediate terms may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick."

I was curious to see the performance difference between these
import numpy as np
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)
def softmaxv2(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
def softmaxv3(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / np.sum(e_x, axis=0)
def softmaxv4(x):
    """Compute softmax values for each sets of scores in x."""
    return np.exp(x - np.max(x)) / np.sum(np.exp(x - np.max(x)), axis=0)
x=[10,10,18,9,15,3,1,2,1,10,10,10,8,15]
Using
print("----- softmax")
%timeit a=softmax(x)
print("----- softmaxv2")
%timeit a=softmaxv2(x)
print("----- softmaxv3")
%timeit a=softmaxv3(x)
print("----- softmaxv4")
%timeit a=softmaxv4(x)
Increasing the values inside x (+100 +200 +500...) I get consistently better results with the original numpy version (here is just one test)
----- softmax
The slowest run took 8.07 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 17.8 µs per loop
----- softmaxv2
The slowest run took 4.30 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23 µs per loop
----- softmaxv3
The slowest run took 4.06 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23 µs per loop
----- softmaxv4
10000 loops, best of 3: 23 µs per loop
Until.... the values inside x reach ~800, then I get
----- softmax
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4: RuntimeWarning: overflow encountered in exp
after removing the cwd from sys.path.
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in true_divide
after removing the cwd from sys.path.
The slowest run took 18.41 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.6 µs per loop
----- softmaxv2
The slowest run took 4.18 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 22.8 µs per loop
----- softmaxv3
The slowest run took 19.44 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 23.6 µs per loop
----- softmaxv4
The slowest run took 16.82 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 22.7 µs per loop
As some said, your version is more numerically stable 'for large numbers'. For small numbers it could be the other way around.

A more concise version is:
def softmax(x):
    return np.exp(x) / np.exp(x).sum(axis=0)

To offer an alternative solution, consider the cases where your arguments are extremely large in magnitude such that exp(x) would underflow (in the negative case) or overflow (in the positive case). Here you want to remain in log space as long as possible, exponentiating only at the end where you can trust the result will be well-behaved.
import scipy.special as sc
import numpy as np
def softmax(x: np.ndarray) -> np.ndarray:
    return np.exp(x - sc.logsumexp(x))
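Staying in log space also gives log-probabilities directly, which is often what a loss function needs anyway. A small sketch of the underflow case (the inputs are made up): every exp(x) underflows to 0, so the naive division is 0/0, while x - logsumexp(x) stays finite.

```python
import numpy as np
from scipy.special import logsumexp

x = np.array([-1000.0, -1001.0, -1002.0])

with np.errstate(invalid="ignore"):
    naive = np.exp(x) / np.exp(x).sum()   # exp(-1000) underflows to 0 -> 0/0 = nan

log_p = x - logsumexp(x)                  # log-softmax, computed without exp first
print(naive)                              # all nan
print(log_p)                              # finite log-probabilities
print(np.exp(log_p))                      # same as softmax([0, -1, -2])
```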

I needed something compatible with the output of a dense layer from Tensorflow.
The solution from @desertnaut does not work in this case because I have batches of data. Therefore, I came up with another solution that should work in both cases:
def softmax(x, axis=-1):
    # subtract the max along the same axis for numerical stability
    e_x = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e_x / e_x.sum(axis=axis, keepdims=True)
Results:
logits = np.asarray([
[-0.0052024, -0.00770216, 0.01360943, -0.008921], # 1
[-0.0052024, -0.00770216, 0.01360943, -0.008921] # 2
])
print(softmax(logits))
#[[0.2492037 0.24858153 0.25393605 0.24827873]
# [0.2492037 0.24858153 0.25393605 0.24827873]]
Ref: Tensorflow softmax

I would suggest this:
def softmax(z):
    z_norm = np.exp(z - np.max(z, axis=0, keepdims=True))
    return np.divide(z_norm, np.sum(z_norm, axis=0, keepdims=True))
It will work for stochastic as well as the batch.
For more detail see :
https://medium.com/#ravish1729/analysis-of-softmax-function-ad058d6a564d

To maintain numerical stability, max(x) should be subtracted. The following is the code for the softmax function:
def softmax(x):
    x = np.asarray(x, dtype=float)  # avoid modifying the caller's array in place
    if len(x.shape) > 1:
        tmp = np.max(x, axis = 1)
        x = x - tmp.reshape((x.shape[0], 1))
        x = np.exp(x)
        tmp = np.sum(x, axis = 1)
        x /= tmp.reshape((x.shape[0], 1))
    else:
        tmp = np.max(x)
        x = x - tmp
        x = np.exp(x)
        tmp = np.sum(x)
        x /= tmp
    return x

Already answered in much detail in above answers. max is subtracted to avoid overflow. I am adding here one more implementation in python3.
import numpy as np
def softmax(x):
    mx = np.amax(x, axis=1, keepdims=True)
    x_exp = np.exp(x - mx)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    res = x_exp / x_sum
    return res
x = np.array([[3,2,4],[4,5,6]])
print(softmax(x))

Everybody seems to post their solution so I'll post mine:
def softmax(x):
    e_x = np.exp(x.T - np.max(x, axis = -1))
    return (e_x / e_x.sum(axis=0)).T
I get exactly the same results as with the one imported from sklearn:
from sklearn.utils.extmath import softmax

import tensorflow as tf
import numpy as np
def softmax(x):
    return (np.exp(x).T / np.exp(x).sum(axis=-1)).T
logits = np.array([[1, 2, 3], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])
sess = tf.Session()
print(softmax(logits))
print(sess.run(tf.nn.softmax(logits)))
sess.close()

Based on all the responses and CS231n notes, allow me to summarise:
def softmax(x, axis):
    x = x - np.max(x, axis=axis, keepdims=True)  # new array; the caller's x is left untouched
    e_x = np.exp(x)
    return e_x / e_x.sum(axis=axis, keepdims=True)
Usage:
x = np.array([[1, 0, 2,-1],
[2, 4, 6, 8],
[3, 2, 1, 0]])
softmax(x, axis=1).round(2)
Output:
array([[0.24, 0.09, 0.64, 0.03],
[0. , 0.02, 0.12, 0.86],
[0.64, 0.24, 0.09, 0.03]])

The softmax function is an activation function that turns numbers into probabilities which sum to one. The softmax function outputs a vector that represents the probability distribution of a list of outcomes. It is also a core element used in deep learning classification tasks.
The softmax function is used when we have multiple classes.
It is useful for finding out the class which has the maximum probability.
The softmax function is ideally used in the output layer, where we are actually trying to obtain the probabilities to define the class of each input.
Its outputs range from 0 to 1.
The softmax function turns logits [2.0, 1.0, 0.1] into probabilities of roughly [0.7, 0.2, 0.1], and the probabilities sum to 1. Logits are the raw scores output by the last layer of a neural network, before activation takes place. To understand the softmax function, we must look at the output of the (n-1)th layer.
The softmax function is, in fact, a soft (smooth) version of the arg max function. That means that it does not return the largest value from the input; instead, it puts most of the probability mass on the position of the largest value, and almost all of it when that value clearly stands out.
For example:
Before softmax
X = [13, 31, 5]
After softmax
array([1.52299795e-08, 9.99999985e-01, 5.10908895e-12])
Code:
import numpy as np
# your solution:
def your_softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
# correct solution:
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)  # only difference
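The two versions agree on 1-D input; `axis=0` only matters for 2-D input laid out as (classes, samples), where each column is a separate set of scores. A minimal sketch comparing the two behaviours; the names `softmax_all`/`softmax_cols` and the test matrix are made up for illustration.

```python
import numpy as np

def softmax_all(x):
    """Normalizes over every element of x (like `your_softmax`)."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def softmax_cols(x):
    """Normalizes each column of x separately (like `softmax` with axis=0)."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

v = np.array([3.0, 1.0, 0.2])
scores = np.array([[1.0, 2.0, 3.0],
                   [2.0, 4.0, 8.0]])  # 2 classes (rows) x 3 samples (columns)

print(np.allclose(softmax_all(v), softmax_cols(v)))  # True: identical for 1-D input
print(softmax_all(scores).sum())                     # whole matrix sums to 1
print(softmax_cols(scores).sum(axis=0))              # each column sums to 1
```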

This also works with np.reshape:
def softmax(scores):
    """
    Compute softmax scores given the raw output from the model
    :param scores: raw scores from the model (N, num_classes)
    :return:
        prob: softmax probabilities (N, num_classes)
    """
    # subtract the largest number in each row https://jamesmccaffrey.wordpress.com/2016/03/04/the-max-trick-when-computing-softmax/
    exponential = np.exp(
        scores - np.max(scores, axis=1).reshape(-1, 1)
    )
    prob = exponential / exponential.sum(axis=1).reshape(-1, 1)
    return prob

I would like to add a little more to the understanding of the problem. Subtracting the max of the array is correct here. But if you run the code from the other post, you will find that it does not give the right answer when the array is 2-D or higher-dimensional.
Here are some suggestions:
Take the max along the axis you are normalizing over; you will get a 1-D array.
Reshape the max array so that it broadcasts against the original shape.
Apply np.exp to get the exponentials.
Apply np.sum along the same axis.
Divide to get the final result.
Following these steps gives the correct answer with vectorized code. Since this is related to college homework, I cannot post the exact code here, but I am happy to give more suggestions if anything is unclear.
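A minimal sketch of those steps for an N-D array, assuming `keepdims=True` is an acceptable stand-in for the explicit reshape step; `softmax_nd` is a hypothetical name.

```python
import numpy as np

def softmax_nd(x, axis=-1):
    """Softmax along `axis` of an N-D array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Steps 1-2: max along the axis, kept as a size-1 dimension so it broadcasts
    x_max = np.max(x, axis=axis, keepdims=True)
    # Step 3: exponentiate the shifted values (all exponents are <= 0)
    e_x = np.exp(x - x_max)
    # Step 4: sum along the same axis, again keeping the dimension
    total = np.sum(e_x, axis=axis, keepdims=True)
    # Step 5: normalize
    return e_x / total

batch = np.array([[1.0, 2.0, 3.0],
                  [1000.0, 1001.0, 1002.0]])
p = softmax_nd(batch, axis=1)
print(p.sum(axis=1))            # each row sums to 1
print(np.allclose(p[0], p[1]))  # True: softmax ignores a constant shift
```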

The goal was to get the same results from NumPy and TensorFlow. The only change from the original answer is the axis parameter of np.sum.
Initial approach: axis=0. This does not give the intended results when the input has N dimensions.
Modified approach: always sum over the last dimension (axis=-1, with keepdims=True so the division broadcasts correctly). This gives the same results as TensorFlow's softmax function.
def softmax_fn(input_array):
    """
    | **#author**: Prathyush SP
    |
    | Calculate Softmax for a given array
    :param input_array: Input Array
    :return: Softmax Score
    """
    e_x = np.exp(input_array - np.max(input_array, axis=-1, keepdims=True))
    return e_x / e_x.sum(axis=-1, keepdims=True)

Here is a generalized solution using NumPy, with a correctness comparison against TensorFlow and SciPy:
Data preparation:
import numpy as np
np.random.seed(2019)
batch_size = 1
n_items = 3
n_classes = 2
logits_np = np.random.rand(batch_size,n_items,n_classes).astype(np.float32)
print('logits_np.shape', logits_np.shape)
print('logits_np:')
print(logits_np)
Output:
logits_np.shape (1, 3, 2)
logits_np:
[[[0.9034822 0.3930805 ]
[0.62397 0.6378774 ]
[0.88049906 0.299172 ]]]
Softmax using tensorflow:
import tensorflow as tf
logits_tf = tf.convert_to_tensor(logits_np, np.float32)
scores_tf = tf.nn.softmax(logits_tf, axis=-1)
print('logits_tf.shape', logits_tf.shape)
print('scores_tf.shape', scores_tf.shape)
with tf.Session() as sess:
    scores_np = sess.run(scores_tf)
print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)
print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np,axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))
Output:
logits_tf.shape (1, 3, 2)
scores_tf.shape (1, 3, 2)
scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
[0.4965232 0.5034768 ]
[0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]
Softmax using scipy:
from scipy.special import softmax
scores_np = softmax(logits_np, axis=-1)
print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)
print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))
Output:
scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
[0.4965232 0.5034768 ]
[0.6413727 0.35862732]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]
Softmax using numpy (https://nolanbconaway.github.io/blog/2017/softmax-numpy):
def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.
    Parameters
    ----------
    X: ND-Array. Probably should be floats.
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the
        first non-singleton axis.
    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """
    # make X at least 2d
    y = np.atleast_2d(X)
    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)
    # multiply y against the theta parameter
    y = y * float(theta)
    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)
    # exponentiate y
    y = np.exp(y)
    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)
    # finally: divide elementwise
    p = y / ax_sum
    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()
    return p
scores_np = softmax(logits_np, axis=-1)
print('scores_np.shape', scores_np.shape)
print('scores_np:')
print(scores_np)
print('np.sum(scores_np, axis=-1).shape', np.sum(scores_np, axis=-1).shape)
print('np.sum(scores_np, axis=-1):')
print(np.sum(scores_np, axis=-1))
Output:
scores_np.shape (1, 3, 2)
scores_np:
[[[0.62490064 0.37509936]
[0.49652317 0.5034768 ]
[0.64137274 0.3586273 ]]]
np.sum(scores_np, axis=-1).shape (1, 3)
np.sum(scores_np, axis=-1):
[[1. 1. 1.]]

The purpose of the softmax function is to preserve the ratios between the vector's components, as opposed to squashing the end-points with a sigmoid as the values saturate (i.e. tend to +/- 1 for tanh, or to 0 and 1 for the logistic function). It preserves more information about the rate of change at the end-points and is therefore better suited to neural nets with 1-of-N output encoding (if we squashed the end-points, it would be harder to tell the 1-of-N output classes apart, because we could not tell which one is the "biggest" or "smallest" once they are squished). It also makes the total output sum to 1: the clear winner will be close to 1, while outputs that are close to each other will each be about 1/p, where p is the number of output neurons with similar values.
The purpose of subtracting the max value from the vector is that e^y can become so large that it overflows the float range (becoming inf), and the division then yields nan instead of probabilities. Subtracting the max does not change the answer: e^(a-m) / e^(b-m) = e^(a-b), so the ratios are preserved, and every exponent becomes <= 0, so the largest term is exactly e^0 = 1. Very negative exponents can underflow to 0, but only for terms that are negligible next to the largest one.
The answer supplied by Udacity is HORRIBLY inefficient. The first thing we need to do is calculate e^y_j for all vector components, KEEP THOSE VALUES, then sum them up, and divide. Where Udacity messed up is they calculate e^y_j TWICE!!! Here is the correct answer:
def softmax(y):
    e_to_the_y_j = np.exp(y)
    return e_to_the_y_j / np.sum(e_to_the_y_j, axis=0)
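A small sketch of the overflow point: both functions below compute `np.exp` only once, but only the shifted one survives large inputs. The function names and test values are illustrative assumptions, not from the original answer.

```python
import numpy as np

def softmax_naive(y):
    e = np.exp(y)                        # overflows to inf once y exceeds about 709
    return e / np.sum(e, axis=0)

def softmax_shifted(y):
    e = np.exp(y - np.max(y, axis=0))    # largest exponent is now exactly 0
    return e / np.sum(e, axis=0)

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])

print(np.allclose(softmax_naive(small), softmax_shifted(small)))  # True: same ratios
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(large))   # all nan, because inf / inf is undefined
print(softmax_shifted(large))     # same values as softmax_shifted(small)
```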

This generalizes and assumes you are normalizing the trailing dimension.
def softmax(x: np.ndarray) -> np.ndarray:
    e_x = np.exp(x - np.max(x, axis=-1)[..., None])
    e_y = e_x.sum(axis=-1)[..., None]
    return e_x / e_y

I used these three simple lines:
x_exp=np.exp(x)
x_sum=np.sum(x_exp, axis = 1, keepdims = True)
s=x_exp / x_sum

Partial convolution / correlation with numpy [duplicate]

I am learning numpy/scipy, coming from a MATLAB background. The xcorr function in Matlab has an optional argument "maxlag" that limits the lag range from –maxlag to maxlag. This is very useful if you are looking at the cross-correlation between two very long time series but are only interested in the correlation within a certain time range. The performance increases are enormous considering that cross-correlation is incredibly expensive to compute.
In numpy/scipy it seems there are several options for computing cross-correlation. numpy.correlate, numpy.convolve, scipy.signal.fftconvolve. If someone wishes to explain the difference between these, I'd be happy to hear, but mainly what is troubling me is that none of them have a maxlag feature. This means that even if I only want to see correlations between two time series with lags between -100 and +100 ms, for example, it will still calculate the correlation for every lag between -20000 and +20000 ms (which is the length of the time series). This gives a 200x performance hit! Do I have to recode the cross-correlation function by hand to include this feature?

Here are a couple of functions to compute auto- and cross-correlation with limited lags. The order of multiplication (and conjugation, in the complex case) was chosen to match the corresponding behavior of numpy.correlate.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def _check_arg(x, xname):
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError('%s must be one-dimensional.' % xname)
    return x
def autocorrelation(x, maxlag):
    """
    Autocorrelation with a maximum number of lags.
    `x` must be a one-dimensional numpy array.
    This computes the same result as
        numpy.correlate(x, x, mode='full')[len(x)-1:len(x)+maxlag]
    The return value has length maxlag + 1.
    """
    x = _check_arg(x, 'x')
    p = np.pad(x.conj(), maxlag, mode='constant')
    T = as_strided(p[maxlag:], shape=(maxlag+1, len(x) + maxlag),
                   strides=(-p.strides[0], p.strides[0]))
    return T.dot(p[maxlag:].conj())
def crosscorrelation(x, y, maxlag):
    """
    Cross correlation with a maximum number of lags.
    `x` and `y` must be one-dimensional numpy arrays with the same length.
    This computes the same result as
        numpy.correlate(x, y, mode='full')[len(x)-maxlag-1:len(x)+maxlag]
    The return value has length 2*maxlag + 1.
    """
    x = _check_arg(x, 'x')
    y = _check_arg(y, 'y')
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    return T.dot(px)
For example,
In [367]: x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
In [368]: autocorrelation(x, 3)
Out[368]: array([ 20.5, 5. , -3.5, -1. ])
In [369]: np.correlate(x, x, mode='full')[7:11]
Out[369]: array([ 20.5, 5. , -3.5, -1. ])
In [370]: y = np.arange(8)
In [371]: crosscorrelation(x, y, 3)
Out[371]: array([ 5. , 23.5, 32. , 21. , 16. , 12.5, 9. ])
In [372]: np.correlate(x, y, mode='full')[4:11]
Out[372]: array([ 5. , 23.5, 32. , 21. , 16. , 12.5, 9. ])
(It would be nice to have such a feature in numpy itself.)

Until numpy implements the maxlag argument, you can use the function ucorrelate from the pycorrelate package. ucorrelate operates on numpy arrays and has a maxlag keyword. It implements the correlation using a for-loop and optimizes the execution speed with numba.
Example - autocorrelation with 3 time lags:
import numpy as np
import pycorrelate as pyc
x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
c = pyc.ucorrelate(x, x, maxlag=3)
c
Result:
Out[1]: array([20, 5, -3])
The pycorrelate documentation contains a notebook showing a perfect match between pycorrelate.ucorrelate and numpy.correlate.

matplotlib.pyplot provides MATLAB-like syntax for computing and plotting cross-correlation, autocorrelation, etc.
You can use xcorr which allows to define the maxlags parameter.
import matplotlib.pyplot as plt
import numpy as np
data = np.arange(0,2*np.pi,0.01)
y1 = np.sin(data)
y2 = np.cos(data)
coeff = plt.xcorr(y1,y2,maxlags=10)
print(*coeff)
[-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7
8 9 10] [ -9.81991753e-02 -8.85505028e-02 -7.88613080e-02 -6.91325329e-02
-5.93651264e-02 -4.95600447e-02 -3.97182508e-02 -2.98407146e-02
-1.99284126e-02 -9.98232812e-03 -3.45104289e-06 9.98555430e-03
1.99417667e-02 2.98641953e-02 3.97518558e-02 4.96037706e-02
5.94189688e-02 6.91964864e-02 7.89353663e-02 8.86346584e-02
9.82934198e-02] <matplotlib.collections.LineCollection object at 0x00000000074A9E80> Line2D(_line0)

Warren Weckesser's answer is the best, as it leverages numpy for performance savings (rather than just calling corr for each lag). Nonetheless, it returns the cross-product (i.e. the dot product between the inputs at various lags). To get the actual cross-correlation, I modified his answer with an optional mode argument which, if set to 'corr', returns the cross-correlation as follows:
import numpy as np
from numpy.lib.stride_tricks import as_strided
def crosscorrelation(x, y, maxlag, mode='corr'):
    """
    Cross correlation with a maximum number of lags.
    `x` and `y` must be one-dimensional numpy arrays with the same length.
    With mode='dot', this computes the same result as
        numpy.correlate(x, y, mode='full')[len(x)-maxlag-1:len(x)+maxlag]
    With mode='corr', each lagged product is normalized to a Pearson correlation.
    The return value has length 2*maxlag + 1.
    """
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    if mode == 'dot':  # get lagged dot product
        return T.dot(px)
    elif mode == 'corr':  # gets Pearson correlation
        return (T.dot(px)/px.size - (T.mean(axis=1)*px.mean())) / \
               (np.std(T, axis=1) * np.std(px))
    else:
        raise ValueError("mode must be 'dot' or 'corr'")

I ran into the same problem some time ago and paid particular attention to computational efficiency. Based on the source code of MATLAB's xcorr.m, I wrote a simple version:
import numpy as np
from scipy import fftpack
import math
import time
def nextpow2(x):
    if x == 0:
        y = 0
    else:
        y = math.ceil(math.log2(x))
    return y
def xcorr(x, y, maxlag):
    m = max(len(x), len(y))
    mx1 = min(maxlag, m - 1)
    ceilLog2 = nextpow2(2 * m - 1)
    m2 = 2 ** ceilLog2
    X = fftpack.fft(x, m2)
    Y = fftpack.fft(y, m2)
    c1 = np.real(fftpack.ifft(X * np.conj(Y)))
    index1 = np.arange(1, mx1+1, 1) + (m2 - mx1 -1)
    index2 = np.arange(1, mx1+2, 1) - 1
    c = np.hstack((c1[index1], c1[index2]))
    return c
if __name__ == "__main__":
    s = time.perf_counter()  # time.clock() was removed in Python 3.8
    a = [1, 2, 3, 4, 5]
    b = [6, 7, 8, 9, 10]
    c = xcorr(a, b, 3)
    e = time.perf_counter()
    print(c)
    print(e - s)  # elapsed time in seconds
Here are the results of one run, as an example:
[ 29. 56. 90. 130. 110. 86. 59.]
0.0001745000000001884
Compared with the MATLAB code:
clear;close all;clc
tic
a = [1, 2, 3, 4, 5];
b = [6, 7, 8, 9, 10];
c = xcorr(a, b, 3)
toc
29.0000 56.0000 90.0000 130.0000 110.0000 86.0000 59.0000
Elapsed time is 0.000279 seconds.
If anyone can give a rigorous mathematical derivation of this, that would be very helpful.
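A sketch of why the FFT route works (the correlation theorem for zero-padded sequences), assuming x and y are zero outside indices 0..m-1, the FFT length L satisfies L >= 2m-1 (L is m2 in the code), R_xy follows MATLAB's xcorr convention, and c is the inverse FFT (c1 in the code). The middle step groups terms by k = n - p; the notation is introduced here for illustration.

```latex
\begin{aligned}
R_{xy}(k) &= \sum_{n} x[n+k]\,\overline{y[n]}, \qquad |k| \le m-1,\\
X[f] &= \sum_{n=0}^{L-1} x[n]\,e^{-2\pi i f n/L}, \qquad Y[f] = \sum_{n=0}^{L-1} y[n]\,e^{-2\pi i f n/L},\\
X[f]\,\overline{Y[f]} &= \sum_{n=0}^{L-1}\sum_{p=0}^{L-1} x[n]\,\overline{y[p]}\,e^{-2\pi i f (n-p)/L}
  = \sum_{k=-(m-1)}^{m-1} R_{xy}(k)\,e^{-2\pi i f k/L},\\
c[j] &= \frac{1}{L}\sum_{f=0}^{L-1} X[f]\,\overline{Y[f]}\,e^{2\pi i f j/L}
  = \sum_{k \equiv j \pmod{L}} R_{xy}(k),\\
\Rightarrow\quad c[j] &= R_{xy}(j)\ \ (0 \le j \le m-1), \qquad c[L-j] = R_{xy}(-j)\ \ (1 \le j \le m-1).
\end{aligned}
```

Because L >= 2m-1, the lags -(m-1)..(m-1) fall in distinct residue classes mod L, so no lags alias onto each other. In the code, index2 picks c[0..mx1] (lags 0..mx1) and index1 picks c[L-mx1..L-1] (lags -mx1..-1), so stacking them gives lags -mx1..mx1 in order; nextpow2 only rounds L up to a power of two for FFT speed.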

I think I have found a solution, as I was facing the same problem:
If you have two vectors x and y of the same length N, and want the cross-correlation only for lags between -window and +window, you can do:
import numpy as np
window = 100  # maximum lag (example value)
x = np.random.randn(20000)  # placeholder: <some_data>
y = np.random.randn(20000)  # placeholder: <some_data>
# Trim your variables
x_short = x[window:]
y_short = y[window:]
# do two cross-correlations, lagging x and y respectively
left_xcorr = np.correlate(x, y_short)   # defaults to 'valid'; lags -window..0
right_xcorr = np.correlate(x_short, y)  # defaults to 'valid'; lags 0..window
# combine the cross-correlations
# note the first value of right_xcorr is the same as the last of left_xcorr (lag 0)
xcorr = np.concatenate((left_xcorr, right_xcorr[1:]))
Remember you might need to normalise the variables if you want a bounded correlation.

Here is another answer, sourced from here; it seems marginally faster than np.correlate and has the benefit of returning a normalised correlation:
import numpy as np
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
def xcorr(x, y):
    x = np.asarray(x)
    y = np.asarray(y)
    N = len(x)
    M = len(y)
    meany = np.mean(y)
    stdy = np.std(y)
    tmp = rolling_window(x, M)
    c = np.sum((y - meany) * (tmp - np.reshape(np.mean(tmp, -1), (N - M + 1, 1))), -1) / (M * np.std(tmp, -1) * stdy)
    return c
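The same normalised computation can be sketched with `numpy.lib.stride_tricks.sliding_window_view` (available since NumPy 1.20) in place of the hand-built `as_strided` call, which avoids getting the shape/stride bookkeeping wrong. The function name `normalized_xcorr` is hypothetical; the check compares each entry with `np.corrcoef` on the matching window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def normalized_xcorr(x, y):
    """Pearson correlation between y and every len(y)-long window of x.

    Assumes 1-D inputs with len(x) >= len(y); requires NumPy >= 1.20.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = len(y)
    windows = sliding_window_view(x, m)   # shape (len(x) - m + 1, m), read-only view
    w_centered = windows - windows.mean(axis=1, keepdims=True)
    y_centered = y - y.mean()
    return (w_centered @ y_centered) / (m * windows.std(axis=1) * y.std())

rng = np.random.default_rng(1)
x = rng.standard_normal(200)
y = rng.standard_normal(20)
c = normalized_xcorr(x, y)

# each entry should equal the Pearson coefficient for the corresponding window
expected = [np.corrcoef(x[i:i + len(y)], y)[0, 1] for i in range(len(c))]
print(np.allclose(c, expected))  # True
```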

As I answered here, https://stackoverflow.com/a/47897581/5122657
matplotlib's xcorr has the maxlags parameter. It is actually a wrapper around numpy.correlate, so there is no performance saving. Nevertheless, it gives exactly the same result as MATLAB's cross-correlation function. Below I edited the code from matplotlib so that it returns only the correlation. The reason is that if we use matplotlib's xcorr as it is, it will draw the plot as well. The problem is that if we pass complex data as the arguments, we get a "casting complex to real datatype" warning when matplotlib tries to draw the plot.
import numpy as np
def xcorr(x, y, maxlags=10):
    Nx = len(x)
    if Nx != len(y):
        raise ValueError('x and y must be equal length')
    c = np.correlate(x, y, mode='full')
    if maxlags is None:
        maxlags = Nx - 1
    if maxlags >= Nx or maxlags < 1:
        raise ValueError('maxlags must be None or strictly positive < %d' % Nx)
    c = c[Nx - 1 - maxlags:Nx + maxlags]
    return c


