I'm writing a moving average function that uses numpy's convolve function, which should be equivalent to a weighted moving average. When my weights are all equal (as in a simple arithmetic average), it works fine:
import numpy
data = numpy.arange(1,11)
numdays = 5
w = [1.0/numdays]*numdays
numpy.convolve(data,w,'valid')
gives
array([ 3., 4., 5., 6., 7., 8.])
However, when I try to use a weighted average
w = numpy.cumsum(numpy.ones(numdays,dtype=float),axis=0); w = w/numpy.sum(w)
instead of the values I expect for the same data (3.667, 4.667, 5.667, 6.667, ...), I get
array([ 2.33333333, 3.33333333, 4.33333333, 5.33333333, 6.33333333,
7.33333333])
If I remove the 'valid' flag, I don't even see the correct values. I would really like to use convolve for the WMA as well as the MA, as it makes the code cleaner (same code, different weights); otherwise I think I'll have to loop through all the data and take slices.
Any ideas about this behavior?
What you want is np.correlate. In a convolution the second argument is reversed, so your expected result would be given by np.convolve(data, w[::-1], 'valid').
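A minimal sketch comparing the two spellings on the question's data, assuming the same linearly increasing weights 1..5 normalized to sum to 1. Correlation slides the weights over the data as given; convolution flips them first, so flipping them yourself beforehand cancels that out:

```python
import numpy as np

data = np.arange(1, 11, dtype=float)
numdays = 5
# Linearly increasing weights 1..5, normalized to sum to 1
w = np.arange(1, numdays + 1, dtype=float)
w /= w.sum()

# correlate applies w to each window without flipping it
wma = np.correlate(data, w, 'valid')
# convolve flips its second argument, so pre-flip w to get the same result
wma_conv = np.convolve(data, w[::-1], 'valid')

print(np.allclose(wma, wma_conv))  # True
print(np.round(wma, 3))            # [3.667 4.667 5.667 6.667 7.667 8.667]
```

The first window is (1*1 + 2*2 + 3*3 + 4*4 + 5*5) / 15 = 55/15, which is the 3.667 the question expects; each later window adds exactly 1 because the weights sum to 1.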
I have a set of many matrices, each corresponding to a vector. I want to multiply each matrix by its vector efficiently. I know I can put all the matrices in a big block diagonal form and multiply it by a big combined vector.
I want to know if there is a way to use numpy.dot to multiply all of them in an efficient way.
I have tried numpy.stack followed by numpy.dot, but I can't extract just the vectors I want.
To be more specific, my matrices look like:
R_stack = np.stack((R, R2, R3))
which is
array([[[-0.60653066, 1.64872127],
[ 0.60653066, -1.64872127]],
[[-0.36787944, 2.71828183],
[ 0.36787944, -2.71828183]],
[[-0.22313016, 4.48168907],
[ 0.22313016, -4.48168907]]])
and my vectors look like:
p_stack = np.stack((p0, p0_2, p0_3))
which is
array([[[0.73105858],
[0.26894142]],
[[0.88079708],
[0.11920292]],
[[0.95257413],
[0.04742587]]])
I want to multiply the following: R*p0, R2*p0_2, R3*p0_3.
When I take the dot product:
np.dot(R_stack, p_stack)[:,:,:,0]
I get
array([[[ 0. , -0.33769804, -0.49957337],
[ 0. , 0.33769804, 0.49957337]],
[[ 0.46211716, 0. , -0.22151555],
[-0.46211716, 0. , 0.22151555]],
[[ 1.04219061, 0.33769804, 0. ],
[-1.04219061, -0.33769804, 0. ]]])
The 3 vectors I'm interested in are the 3 [0,0] vectors on the diagonal. How can I get them?
You are almost there. You need to add a diagonal index on 1st and 3rd dimensions like so:
np.dot(R_stack, p_stack)[np.arange(3),:,np.arange(3),0]
Every row in the result will correspond to one of your desired vectors:
array([[-3.48805945e-09, 3.48805945e-09],
[-5.02509157e-09, 5.02509157e-09],
[-1.48245199e-08, 1.48245199e-08]])
Another way I found is to use numpy.diagonal
np.diagonal(np.dot(R_stack, p_stack)[:,:,:,0], axis1=0, axis2=2)
which gives a vector in each column:
array([[0., 0., 0.],
[0., 0., 0.]])
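A possibly simpler route, sketched here under the assumption of NumPy 1.10+ and Python 3.5+: np.matmul (the @ operator) treats the leading axis of a 3-D array as a stack of matrices, so it multiplies each matrix only by its own vector and never computes the 3x3 cross products that np.dot produces. Random stand-in data replaces the question's R_stack and p_stack here; np.einsum is shown as an equivalent spelling with the summation written out:

```python
import numpy as np

rng = np.random.default_rng(0)
R_stack = rng.standard_normal((3, 2, 2))  # stand-in for three 2x2 matrices
p_stack = rng.standard_normal((3, 2, 1))  # stand-in for three 2x1 column vectors

# matmul broadcasts over the leading (stack) axis: R_stack[i] @ p_stack[i]
batched = R_stack @ p_stack  # shape (3, 2, 1)

# Same thing with einsum: for each i, sum over k of R[i, j, k] * p[i, k, l]
batched_einsum = np.einsum('ijk,ikl->ijl', R_stack, p_stack)

# For comparison, the np.dot approach builds all pairings, then takes the diagonal
full = np.dot(R_stack, p_stack)                # shape (3, 2, 3, 1)
diag = full[np.arange(3), :, np.arange(3), 0]  # shape (3, 2)

print(np.allclose(batched[..., 0], diag))  # True
```

With many matrices, avoiding the n-by-n intermediate that np.dot creates is the main benefit of this form.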
I am using python 2.7 with scipy to calculate a distance matrix for an array.
I don't understand how to find the distance values I want in the returned condensed matrix.
See example
from scipy.spatial.distance import pdist
import numpy as np
a = np.array([[1],[4],[0],[5]])
print(a)
print(pdist(a))
will print
[ 3. 1. 4. 4. 1. 5.]
I found here that the ij entry in the condensed matrix should store the distance between the i and j entries, where i < j. I'm wondering whether they mean ij as i*j or as string concatenation (str.join), e.g. 1,2 -> 2 or 12.
I can't find a consistent way to work out the index I want.
Looking at my example, if the first option were valid you would expect all of the distances from entry 0 to anywhere else to be stored in entry 0.
Can anyone shed some light on how I can extract the distance from entry x to entry y? Which index am I looking for?
Thanks!
This vector is in condensed form. It enumerates all pairs of indices in a natural order (in your example (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)) and yields the distance between the elements at these array entries.
There is also the squareform function (also in scipy.spatial.distance), which transforms the condensed form into a square matrix form (and vice versa). The square matrix form is exactly what you expect, i.e. at entry ij (row i, column j), it stores the distance between the i-th and j-th entries. For example, if you import squareform and add print(squareform(pdist(a))) at the end of your code, the output will be:
array([[ 0., 3., 1., 4.],
[ 3., 0., 4., 1.],
[ 1., 4., 0., 5.],
[ 4., 1., 5., 0.]])
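If you only need single lookups and would rather not build the full square matrix, the position can be computed directly. For i < j < n (n observations), recent SciPy documentation for pdist gives the index as n*i + j - ((i + 2)*(i + 1)) // 2. The helper condensed_index below is a hypothetical name, a minimal sketch of that formula checked against squareform:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def condensed_index(n, i, j):
    """Position of the (i, j) distance in pdist's condensed output.

    Hypothetical helper; n is the number of observations and i != j.
    """
    if i == j:
        raise ValueError("the diagonal (distance 0) is not stored")
    if i > j:
        i, j = j, i  # distances are symmetric, so swap to get i < j
    return n * i + j - ((i + 2) * (i + 1)) // 2


a = np.array([[1], [4], [0], [5]])
d = pdist(a)
n = len(a)
print(d[condensed_index(n, 1, 3)])  # 1.0, the distance between 4 and 5
```

The diagonal is excluded because pdist never stores self-distances, which are always 0.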
Is there a way to determine at runtime if a function requires numpy.vectorize() to behave as expected?
For background, I ask this because I'm using NumPy in a program to calculate phase diagrams from thermodynamic functions available in the literature (CALPHAD-based). For a given temperature, one evaluates the free energy functions and determines the common tangent curves touching concave-up (second derivative > 0) to define composition ranges of phase coexistence. For this, it was nice to directly define the second derivative function. All was going well with real free energy functions (not hard to get derivatives of) until I tried to test with a simple parabolic free energy, which has a constant second derivative. This blew up my algorithm since I had not expected the numpy broadcasting to look inside the function and decide it did not need to broadcast.
The difficulty comes down to this behavior:
import numpy as np
def f(x):
    return(x*x)
def g(x):
    return(3.0)
def h(x):
    return(0*x+3.0)
def i(x):
    return(x-x+3.0)
x = np.linspace(1.0, 5.0, 5)
Running in IPython 3.3.2 results in these outputs:
f(x) ->
array([ 1., 4., 9., 16., 25.]) -- what I expected
g(x) ->
3.0 (note only 1 element, and a float, not ndarray) -- not naively expected
h(x) ->
array([ 3., 3., 3., 3., 3.]) -- OK, fooled the broadcasting by having x do something
i(x) ->
array([ 3., 3., 3., 3., 3.]) -- same as h(x), avoiding the multiply but with possible round-off issues
Now I could use
gv = np.vectorize(g)
and get
gv(x) -> array([ 3., 3., 3., 3., 3.]) -- expected behavior
If my program is to (eventually) accept arbitrary user-entered free energy functions, this will cause problems unless all users understand numpy's internal broadcasting magic.
Or, I could reflexively np.vectorize everything to prevent this. The problem is the cost if the function will "just work" in numpy.
That is, using %timeit in IPython,
h(x) -> 100000 loops, best of 3: 3.45 µs per loop
If I vectorize h(x) needlessly (i.e. hv = np.vectorize(h)), I get
hv(x) -> 10000 loops, best of 3: 43.2 µs per loop
So, needlessly vectorizing is a huge penalty (40 microseconds for 5 function evals).
I guess I could do an initial test on the return value of a function evaluated on a small ndarray to see whether the return type is array or float, and then define a new function if it is float, like:
def gv(x):
    return(g(x)+0.0*x)
That just seems like a horrible kludge.
So - is there a better way to 'fool' numpy into efficiently broadcasting in this case?
To solve the problem shown, if you want a new array:
def g(x):
    return np.ones_like(x)*3
or if you want to set all elements in an array to 3 in place:
def g(x):
    x[:] = 3
Note there is no return statement here as you are simply updating array x so that all elements are 3.
The issue with def g(x): return(3.0) as shown is that the function never uses x, so it returns a plain Python float regardless of the input. Writing x = 3 inside the function runs into a similar problem: it only rebinds the local name x to the integer 3 and leaves the numpy array untouched. By contrast, the statement x[:] = 3 is slice assignment, which calls the array's __setitem__ method and writes 3 into every element of the existing array instead of rebinding a name.
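The difference between rebinding a name and assigning into a slice is easy to see by running both on the same array. A minimal sketch; the function names rebind and fill are hypothetical:

```python
import numpy as np


def rebind(x):
    # Rebinds the local name x only; the caller's array is untouched
    x = 3


def fill(x):
    # Slice assignment writes 3 into every element of the caller's array
    x[:] = 3


arr = np.linspace(1.0, 5.0, 5)
rebind(arr)
print(arr)  # [1. 2. 3. 4. 5.]
fill(arr)
print(arr)  # [3. 3. 3. 3. 3.]
```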
As others have suggested, you could wrap the user-provided functions to make sure the output shape is correct. For example:
def wrap_user_function(func, x):
    out = func(x)
    if np.isscalar(out):
        return np.zeros_like(x) + out
    return out
This only handles the scalar output case specially, but it should at least take care of your g(x) issue, without imposing much of a performance hit.
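A variation on the same idea, sketched as an assumption rather than an established recipe: a hypothetical decorator, broadcast_output, that uses np.broadcast_to to expand any result that broadcasts against x (including a scalar) to x's shape. Unlike np.vectorize, it calls the user function only once per array, so a function that is already vectorized still runs at numpy speed, and an output of an incompatible shape raises a ValueError instead of passing silently:

```python
import numpy as np


def broadcast_output(func):
    """Hypothetical decorator: make func's result match the shape of x."""
    def wrapper(x):
        x = np.asarray(x, dtype=float)
        out = np.asarray(func(x), dtype=float)
        # broadcast_to returns a read-only view; copy so callers can modify it
        return np.broadcast_to(out, x.shape).copy()
    return wrapper


@broadcast_output
def g(x):
    return 3.0  # constant second derivative of a parabola


@broadcast_output
def f(x):
    return x * x  # already vectorized, passed through unchanged


x = np.linspace(1.0, 5.0, 5)
print(g(x))  # [3. 3. 3. 3. 3.]
print(f(x))  # [ 1.  4.  9. 16. 25.]
```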
I'm porting some simple IDL code to Python. However, the FFT values returned by the SciPy/NumPy packages are different from the IDL ones, and I can't find out why.
Reducing it all to a simple example of 8 elements, I found that the SciPy/NumPy routines return values that are 8 (2^3) times bigger than the IDL ones (a normalization problem, I thought).
Here is the example code (copied from here) in both languages:
IDL
signal = ([-2., 8., -6., 4., 1., 0., 3., 5.])
fourier = fft(signal)
print, fourier
returns
( 1.62500, 0.00000) ( 0.420495, 0.506282) ( 0.250000, 0.125000) ( -1.17050, -1.74372) ( -2.62500, -0.00000) ( -1.17050, 1.74372) ( 0.250000, -0.125000) ( 0.420495, -0.506282)
Python
from scipy.fftpack import fft
import numpy as N
…
signal = N.array([-2., 8., -6., 4., 1., 0., 3., 5.])
fourier = fft(signal)
print(fourier)
returns
[ 13. +0.j , 3.36396103 +4.05025253j, 2. +1.j , -9.36396103-13.94974747j, -21. +0.j , -9.36396103+13.94974747j, 2. -1.j , 3.36396103 -4.05025253j]
I did it with the NumPy package and got the same results. I also tried print fft(signal, 8) just in case, but it returned the same result, as expected.
However, that's not all: coming back to my real array of 256 elements, I found that the difference was no longer 8 or 256, but 256*8! It's just insane.
Although I worked around the problem I NEED to know why there is that difference.
Solved: it was just the normalization. At some point I had divided the IDL 256-element array by a factor of 8 and forgot to remove it. Dougal's answer contains the documentation that I missed.
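IDL and numpy use slightly different definitions of the DFT. Numpy's is (from the documentation):
$$A_k = \sum_{m=0}^{n-1} a_m \exp\left\{-2\pi i \frac{mk}{n}\right\}, \qquad k = 0, \ldots, n-1$$
while IDL's is (from here):
$$F(u) = \frac{1}{N} \sum_{x=0}^{N-1} f(x) \exp\left(-2\pi i \frac{ux}{N}\right)$$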
Numpy's m is the same as IDL's x, k is u, and n is N. I think a_m and f(x) are the same thing as well. So the factor of 1/N is the obvious difference, explaining the difference of 8 in your 8-element case.
I'm not sure about the 256*8 one for the 256-element case; could you maybe post the original array and both outputs somewhere? (Does this happen for all 256-element arrays? What about other sizes? I don't have IDL...)
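To reproduce IDL's forward-transform scaling in NumPy, divide by the number of samples. A minimal sketch using the question's 8-element signal; the norm="forward" option used for comparison assumes NumPy 1.20 or later, which applies the same 1/n factor on the forward transform:

```python
import numpy as np

signal = np.array([-2., 8., -6., 4., 1., 0., 3., 5.])
n = signal.size

# NumPy's forward FFT is unnormalized; IDL's forward FFT includes 1/N
idl_style = np.fft.fft(signal) / n

# NumPy 1.20+ can apply the 1/n scaling itself
idl_style_forward = np.fft.fft(signal, norm="forward")

print(np.allclose(idl_style, idl_style_forward))  # True
print(idl_style)
```

The first coefficient is then the mean of the signal, 13/8 = 1.625, matching IDL's first output.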
Is it possible to do array broadcasting in numpy with parameters that are vectors?
For example, I know that I can do this
import numpy
from numpy import inf, log
def bernoulli_fraction_to_logodds(fraction):
    if fraction == 1.0:
        return inf
    return log(fraction / (1 - fraction))
bernoulli_fraction_to_logodds = numpy.frompyfunc(bernoulli_fraction_to_logodds, 1, 1)
and have it work with the whole array. What if I have a function that takes a 2-element vector and returns a 2-element vector? Can I pass it an array of 2-element vectors? E.g.,
def beta_ml_fraction(beta):
    a = beta[0]
    b = beta[1]
    return a / (a + b)
beta_ml_fraction = numpy.frompyfunc(beta_ml_fraction, 1, 1)
Unfortunately, this doesn't work. Is there a function similar to frompyfunc that does? I can hack around this when they are 2-element vectors, but what about when they are n-element vectors?
That is, an input of (2,3) should give 0.4, but an input of [[2,3], [3,3]] should give [0.4, 0.5].
I don't think frompyfunc can do this, though I could be wrong.
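For reference, newer versions of NumPy (1.12 and later) can express this through the signature argument of np.vectorize, which declares that the function consumes a whole core vector and returns a scalar. A minimal sketch reusing the question's beta_ml_fraction; it still loops in Python under the hood, so it is a convenience rather than a speedup:

```python
import numpy as np


def beta_ml_fraction(beta):
    # beta is one core vector of any length n; only the first two entries are used
    a, b = beta[0], beta[1]
    return a / (a + b)


# '(n)->()' means: consume a 1-D vector, return a scalar;
# any leading axes are looped over automatically
beta_ml_fraction_v = np.vectorize(beta_ml_fraction, signature='(n)->()')

print(beta_ml_fraction_v(np.array([[2.0, 3.0], [3.0, 3.0]])))  # [0.4 0.5]
```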
Regarding np.vectorize, A. M. Archibald wrote:
In fact, anything that goes through python code for the "combine two scalars" will be slow. The slowness of looping in python is not because python's looping constructs are slow, it's because executing python code is slow. So vectorize is kind of a cheat - it doesn't actually run fast, but it is convenient.
So np.frompyfunc (and np.vectorize) are just syntactic sugar -- they don't make Python functions run any faster.
After realizing that, my interest in frompyfunc flagged (to near zero).
There is nothing unreadable about a Python loop, so either use one explicitly or rewrite the function to truly leverage numpy (by writing truly vectorized equations).
import numpy as np
def beta_ml_fraction(beta):
    a = beta[:,0]
    b = beta[:,1]
    return a / (a + b)
arr=np.array([(2,3)],dtype=float)
print(beta_ml_fraction(arr))
# [ 0.4]
arr=np.array([(2,3),(3,3)],dtype=float)
print(beta_ml_fraction(arr))
# [ 0.4 0.5]
When dealing with arrays of two-dimensional vectors, I like to keep the x and y components on the first index. For this I make heavy use of transpose():
def beta_ml_fraction(beta):
    a = beta[0]
    b = beta[1]
    return a / (a + b)
arr=np.array([(2,3),(3,3)],dtype=float)
print(beta_ml_fraction(arr.transpose()))
# [ 0.4 0.5]
The advantage of this approach is that handling multidimensional arrays of two-dimensional vectors becomes easier:
x = np.arange(18,dtype=float).reshape(2,3,3)
print(x)
#array([[[ 0., 1., 2.],
# [ 3., 4., 5.],
# [ 6., 7., 8.]],
#
# [[ 9., 10., 11.],
# [ 12., 13., 14.],
# [ 15., 16., 17.]]])
print(beta_ml_fraction(x))
#array([[ 0. , 0.09090909, 0.15384615],
# [ 0.2 , 0.23529412, 0.26315789],
# [ 0.28571429, 0.30434783, 0.32 ]])