I want to do a rolling window calculation in pandas that needs to deal with two columns at the same time. I'll use a simple example to state the problem clearly:
import pandas as pd
df = pd.DataFrame({
'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4
result = []
for i in range(1, len(df)+1):
    if i < windowSize:
        result.append(None)
    else:
        x = df.x.iloc[i-windowSize:i]
        y = df.y.iloc[i-windowSize:i]
        m = y.mean()
        r = sum(x[y > m]) / sum(x[y <= m])
        result.append(r)
print(result)
Is there any way to solve this in pandas without a for loop? Any help is appreciated.
You can use the rolling window trick for numpy arrays and apply it to the array underlying the DataFrame.
import pandas as pd
import numpy as np
def rolling_window(a, window):
    # returns a view of `a` with an extra trailing axis of length `window` (no copy)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
df = pd.DataFrame({
'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4
rw = rolling_window(df.values.T, windowSize)
m = np.mean(rw[1], axis=-1, keepdims=True)
a = np.sum(rw[0] * (rw[1] > m), axis=-1)
b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
result = a / b
The result lacks the leading None values, but these should be easy to prepend (as np.nan, or after converting the result to a list).
This is probably not the pandas-level solution you were looking for, but it will get the job done without loops.
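As a side note (my addition, not part of the original answer): on NumPy 1.20+ the same windowing is available without manual stride arithmetic via sliding_window_view, e.g.:

from numpy.lib.stride_tricks import sliding_window_view

# windows over the last axis of the (2, n) transposed values: shape (2, n - windowSize + 1, windowSize)
rw = sliding_window_view(df.values.T, windowSize, axis=-1)
m = rw[1].mean(axis=-1, keepdims=True)
result = np.sum(rw[0] * (rw[1] > m), axis=-1) / np.sum(rw[0] * (rw[1] <= m), axis=-1)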
Here's one vectorized approach using NumPy tools -
windowSize = 4
a = df.values
X = strided_app(a[:,0],windowSize,1)
Y = strided_app(a[:,1],windowSize,1)
M = Y.mean(1)
mask = Y>M[:,None]
sums = np.einsum('ij,ij->i',X,mask)
rest_sums = X.sum(1) - sums
out = sums/rest_sums
strided_app is taken from here.
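For reference, strided_app is commonly defined along these lines (a sketch reproducing the linked utility: windows of length L taken with stride S):

def strided_app(a, L, S):
    # rolling windows over 1D array `a` as a strided view (no copy)
    nrows = ((a.size - L) // S) + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))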
Runtime test -
Approaches -
# @kazemakase's solution
def rolling_window_sum(df, windowSize=4):
    rw = rolling_window(df.values.T, windowSize)
    m = np.mean(rw[1], axis=-1, keepdims=True)
    a = np.sum(rw[0] * (rw[1] > m), axis=-1)
    b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
    result = a / b
    return result

# Proposed in this post
def strided_einsum(df, windowSize=4):
    a = df.values
    X = strided_app(a[:,0],windowSize,1)
    Y = strided_app(a[:,1],windowSize,1)
    M = Y.mean(1)
    mask = Y>M[:,None]
    sums = np.einsum('ij,ij->i',X,mask)
    rest_sums = X.sum(1) - sums
    out = sums/rest_sums
    return out
Timings -
In [46]: df = pd.DataFrame(np.random.randint(0,9,(1000000,2)))
In [47]: %timeit rolling_window_sum(df)
10 loops, best of 3: 90.4 ms per loop
In [48]: %timeit strided_einsum(df)
10 loops, best of 3: 62.2 ms per loop
To squeeze out more performance, note that the Y.mean(1) part is basically a windowed summation, which can be done with SciPy's 1D uniform filter. Thus, M could alternatively be computed for windowSize=4 as -
from scipy.ndimage.filters import uniform_filter1d as unif1d
M = unif1d(a[:,1].astype(float),windowSize)[2:-1]
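For completeness, the strided_einsum_unif_filter timed below is presumably strided_einsum with M swapped out for the filtered version; this is a reconstruction of mine (the [2:-1] slice is specific to windowSize=4):

def strided_einsum_unif_filter(df, windowSize=4):
    a = df.values
    X = strided_app(a[:,0], windowSize, 1)
    Y = strided_app(a[:,1], windowSize, 1)
    # windowed means of column y; the slice aligns the centered filter
    # output with the trailing full windows (valid for windowSize=4)
    M = unif1d(a[:,1].astype(float), windowSize)[2:-1]
    mask = Y > M[:,None]
    sums = np.einsum('ij,ij->i', X, mask)
    rest_sums = X.sum(1) - sums
    return sums / rest_sums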
The performance gains are significant -
In [65]: %timeit strided_einsum(df)
10 loops, best of 3: 61.5 ms per loop
In [66]: %timeit strided_einsum_unif_filter(df)
10 loops, best of 3: 49.4 ms per loop
Numpy's arange accepts only single scalar values for start/stop/step. Is there a multi version of this function that can accept array inputs for start/stop/step? E.g. given a 2D input array like:
[[1 5 1], # start/stop/step first
[3 8 2]] # start/stop/step second
it should create an array consisting of the concatenation of aranges for every row of the input (each a start/stop/step triple); the input above should produce the 1D array
1 2 3 4 3 5 7
i.e. we need to design a function that does the following:
print(np.multi_arange(np.array([[1,5,1],[3,8,2]])))
# prints:
# array([1, 2, 3, 4, 3, 5, 7])
And this function should be efficient (pure numpy), i.e. it should process an input array of shape (10000, 3) very fast, without pure-Python looping.
Of course it is possible to write a pure-Python loop (or listcomp) that creates an arange for each row and concatenates the results. But I have very many rows of start/stop/step triples and need efficient, fast code, hence the search for a pure numpy function.
Why do I need it? I need this for several tasks. One of them is indexing - suppose I have a 1D array a and I need to extract many (possibly intersecting) subranges of it. With a multi version of arange I would just do:
values = a[np.multi_arange(starts_stops_steps)]
Maybe it is possible to build a multi-arange function from some combination of numpy functions? Any suggestions?
Also, maybe there are more efficient solutions for the specific case of extracting subranges of a 1D array (see the last line of code above) without creating all the indexes via multi_arange?
Here's a vectorized one with cumsum that accounts for positive and negative stepsizes. The idea: lay down each range's step repeated the right number of times, overwrite the first slot of every range with the jump from the previous range's end to the new start, and then a single cumsum reproduces all the ranges at once -
def multi_arange(a):
    steps = a[:,2]
    # number of elements each range produces (works for either sign of step)
    lens = ((a[:,1]-a[:,0]) + steps-np.sign(steps))//steps
    b = np.repeat(steps, lens)
    # last value of each range
    ends = (lens-1)*steps + a[:,0]
    b[0] = a[0,0]
    # at each range boundary, replace the step with the jump to the next start
    b[lens[:-1].cumsum()] = a[1:,0] - ends[:-1]
    return b.cumsum()
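To see why this works, here is a hand trace (my annotation) for the input [[1, 5, 1], [3, 8, 2]]:

# steps = [1, 2]                       per-row step sizes
# lens  = [4, 3]                       elements produced by each range
# b     = [1, 1, 1, 1, 2, 2, 2]        steps repeated lens times
# ends  = [4, 7]                       last value of each range
# b[0]  = 1                            seed with the first start
# b[4]  = 3 - 4 = -1                   boundary: jump from previous end to next start
# b.cumsum() -> [1, 2, 3, 4, 3, 5, 7]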
If you need to filter for valid ranges only (start < stop when step > 0, and start > stop when step < 0), use a pre-processing step:
a = a[((a[:,1] > a[:,0]) & (a[:,2]>0) | (a[:,1] < a[:,0]) & (a[:,2]<0))]
Sample run -
In [17]: a
Out[17]:
array([[ 1,  5,  1],
       [ 3,  8,  2],
       [18,  6, -2]])
In [18]: multi_arange(a)
Out[18]: array([ 1, 2, 3, 4, 3, 5, 7, 18, 16, 14, 12, 10, 8])
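For the indexing use case from the question, the output plugs straight in (hypothetical names, mirroring the question's last snippet):

idx = multi_arange(starts_stops_steps)   # rows of (start, stop, step) triples
values = arr[idx]                        # gathers all (possibly overlapping) subranges of arr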
np.r_ supports exactly this start:stop:step syntax, and so does concatenating explicit aranges:

In [1]: np.r_[1:5:1, 3:8:2]
Out[1]: array([1, 2, 3, 4, 3, 5, 7])
In [2]: np.hstack((np.arange(1,5,1),np.arange(3,8,2)))
Out[2]: array([1, 2, 3, 4, 3, 5, 7])
The r_ version is nice and compact, but not faster:
In [3]: timeit np.r_[1:5:1, 3:8:2]
23.9 µs ± 34.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [4]: timeit np.hstack((np.arange(1,5,1),np.arange(3,8,2)))
11.2 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I've just come up with my own solution using numba. Still, I would prefer a numpy-only solution, if we can find a good one, so as not to carry the heavy numba JIT compiler around.
I've also tested @Divakar's solution in my code.
The output of the code below is:
naive_multi_arange 0.76601 sec
arty_multi_arange 0.01801 sec 42.52 speedup
divakar_multi_arange 0.05504 sec 13.92 speedup
Meaning my numba solution has a 42x speedup and @Divakar's numpy solution has a 14x speedup.
The code below can also be run online here.
import time, random
import numpy as np, numba

@numba.jit(nopython = True)
def arty_multi_arange(a):
    starts, stops, steps = a[:, 0], a[:, 1], a[:, 2]
    pos = 0
    cnt = np.sum((stops - starts + steps - np.sign(steps)) // steps, dtype = np.int64)
    res = np.zeros((cnt,), dtype = np.int64)
    for i in range(starts.size):
        v, stop, step = starts[i], stops[i], steps[i]
        if step > 0:
            while v < stop:
                res[pos] = v
                pos += 1
                v += step
        elif step < 0:
            while v > stop:
                res[pos] = v
                pos += 1
                v += step
    assert pos == cnt
    return res
def divakar_multi_arange(a):
    steps = a[:,2]
    lens = ((a[:,1]-a[:,0]) + steps-np.sign(steps))//steps
    b = np.repeat(steps, lens)
    ends = (lens-1)*steps + a[:,0]
    b[0] = a[0,0]
    b[lens[:-1].cumsum()] = a[1:,0] - ends[:-1]
    return b.cumsum()
random.seed(0)
neg_prob = 0.5
N = 100000
minv, maxv, maxstep = -100, 300, 15
steps = [random.randrange(1, maxstep + 1) * ((1, -1)[random.random() < neg_prob]) for i in range(N)]
starts = [random.randrange(minv + 1, maxv) for i in range(N)]
stops = [random.randrange(*(((starts[i] + 1, maxv + 1), (minv, starts[i]))[steps[i] < 0])) for i in range(N)]
joined = np.array([starts, stops, steps], dtype = np.int64).T
tb = time.time()
aref = np.concatenate([np.arange(joined[i, 0], joined[i, 1], joined[i, 2], dtype = np.int64) for i in range(N)])
npt = time.time() - tb
print('naive_multi_arange', round(npt, 5), 'sec')
for func in ['arty_multi_arange', 'divakar_multi_arange']:
    globals()[func](joined)   # warm-up call, so numba's JIT compilation is not timed
    tb = time.time()
    a = globals()[func](joined)
    myt = time.time() - tb
    print(func, round(myt, 5), 'sec', round(npt / myt, 2), 'speedup')
    assert a.size == aref.size, (a.size, aref.size)
    assert np.all(a == aref), np.vstack((np.flatnonzero(a != aref)[:5], a[a != aref][:5], aref[a != aref][:5])).T
Say I have
q = np.array(['a', 'b'])
terms = np.array(['a', 'b', 'c', 'd'])
How can I create an n-hot vector v, like [1, 1, 0, 0], such that every item that appears in q has its index set to 1 in a zero vector of length len(terms)?
You can use np.isin, and turn it into an array of int:
>>> np.isin(terms,q).astype(int)
array([1, 1, 0, 0])
If you have pandas, you can use the pd.Index API for very fast (constant time) searching per term:
>>> idx = pd.Index(q)
>>> (idx.get_indexer_for(terms) >= 0).astype(int)
array([1, 1, 0, 0])
Another option is broadcasted comparison:
>>> (q == terms[:, None]).any(1).astype(int)
array([1, 1, 0, 0])
This is fast, but you should prefer the first option, or @sacul's answer, for large (~1M) data.
Here is a searchsorted-based method that is fast and readily applicable to batches of vectors.
Timings for 4, 12 and 26 classes, with batches of 1000 vectors of length 2; broadcast is @coldspeed's method.
4
broadcast : 0.248 ms
searchsorted: 0.095 ms
12
broadcast : 0.468 ms
searchsorted: 0.119 ms
26
broadcast : 0.748 ms
searchsorted: 0.137 ms
Code:
import numpy as np
from string import ascii_lowercase

def broadcast(test, classes):
    return (test[..., None] == classes).any(-2).view(np.uint8)

def searchsorted(test, classes):
    X = classes.argsort()
    out = np.zeros((*test.shape[:-1], classes.size), np.uint8)
    idx = np.ogrid[tuple(map(slice, out.shape))]
    idx = *idx[:-1], X[classes[X].searchsorted(test)]
    out[idx] = 1
    return out

letters = np.fromiter(ascii_lowercase, 'U1', 26)
np.random.shuffle(letters)

def make_test(n=26, shp=(1000,)):
    v = np.random.randint(0, n, shp)
    w = (np.random.randint(0, n-1, shp) + 1 + v) % n
    d = len(shp)
    # the np.r_ string spec promotes v and w to columns and stacks them,
    # yielding length-2 test vectors drawn from the first n letters
    return letters[:n], letters[np.r_[f'{d},{d+1},0', v, w]]

from timeit import timeit

def test_it(f, args, n=1000, format='{0.__name__:12s}: {1:10.3f} ms'.format):
    res = timeit('f(*args)', globals=dict(f=f, args=args), number=n) * 1000/n
    return res, format(f, res)

for k in [4, 12, 26]:
    T, L = make_test(k)
    print(k)
    for f in [broadcast, searchsorted]:
        t, msg = test_it(f, (L, T))
        print(msg)
I have a 4 x 4 matrix
import numpy as np
c = np.random.rand(4, 4)
I want to create a 100 x 4 x 4 x 100 tensor such that when the first and last indices are equal, I get back my matrix, and zeros otherwise.
I can do this in a loop as
Z = np.zeros((100, 4, 4, 100))
for i in range(100):
    Z[i, :, :, i] = c
Is there a better way to do this? I tried looking at np.tensordot and np.einsum but could not figure it out.
Thanks,
Sahil
Use advanced-indexing -
n = 100
Zout = np.zeros((n, 4, 4, n))
I = np.arange(n)
Zout[I,:,:,I] = c
With eye-masking - transposing Zout to shape (n, n, 4, 4) so that the boolean identity mask selects the (i, i) positions -
n = 100
mask = np.eye(n, dtype=bool)
Zout = np.zeros((n, 4, 4, n))
Zout.transpose(0,3,1,2)[mask] = c
Timings -
In [72]: c = np.random.rand(4,4)
In [73]: %%timeit
...: n = 100
...: Zout = np.zeros((n, 4, 4, n))
...: I = np.arange(n)
...: Zout[I,:,:,I] = c
10000 loops, best of 3: 47.5 µs per loop
In [74]: %%timeit
...: n = 100
...: mask = np.eye(n, dtype=bool)
...: Zout = np.zeros((n, 4, 4, n))
...: Zout.transpose(0,3,1,2)[mask] = c
10000 loops, best of 3: 73.1 µs per loop
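Since the question mentions np.einsum: the same tensor can also be written as an outer product of c with the identity matrix. This is a sketch of mine rather than part of the answer above, and it is slower than the assignment-based approaches because it actually multiplies everything out:

n = 100
Zout = np.einsum('ij,ab->iabj', np.eye(n), c)   # Zout[i,a,b,j] = eye[i,j] * c[a,b]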
I have a 2d numpy array, for example:
a = np.array([
[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
and another 1d array:
I = np.array([0, 2, 3, 1, 0, 2, 0, 1])
I want to rotate a with the np.rot90 function, like the following:
b = np.zeros((len(I), 3, 3))
for i, k in enumerate(I):
    b[i] = np.rot90(a, k=k)
Can I do it more efficiently, without the for loop?
Approach #1
Generate a 3D array of all 4 possible rotations and simply index into it with I, giving a vectorized solution -
P = np.empty((4,) + a.shape, dtype=a.dtype)
P[0] = a # For np.rot90(a, k=0)
P[1] = a.T[::-1] # For np.rot90(a, k=1)
P[2] = a[::-1,::-1] # For np.rot90(a, k=2)
P[3] = a.T[:,::-1] # For np.rot90(a, k=3)
out = P[I]
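As a quick sanity check (my addition, not part of the original answer), each precomputed slice can be compared against np.rot90 directly:

assert all(np.array_equal(P[k], np.rot90(a, k=k)) for k in range(4))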
Approach #2
Another way to create P would be with -
P = np.array([np.rot90(a, k=i) for i in range(4)])
and, as with the previous method, simply index into P with I for the final output.
Runtime test
Approaches -
def org_app(a, I):
    m,n = a.shape
    b = np.zeros((len(I), m, n), dtype=a.dtype)
    for i, k in enumerate(I):
        b[i] = np.rot90(a, k=k)
    return b

def app1(a, I):
    P = np.empty((4,) + a.shape, dtype=a.dtype)
    P[0] = a
    P[1] = a.T[::-1]
    P[2] = a[::-1,::-1]
    P[3] = a.T[:,::-1]
    return P[I]

def app2(a, I):
    P = np.array([np.rot90(a, k=i) for i in range(4)])
    return P[I]
Timings -
In [54]: a = np.random.randint(0,9,(10,10))
In [55]: I = np.random.randint(0,4,(10000))
In [56]: %timeit org_app(a, I)
10 loops, best of 3: 51 ms per loop
In [57]: %timeit app1(a, I)
1000 loops, best of 3: 469 µs per loop
In [58]: %timeit app2(a, I)
1000 loops, best of 3: 549 µs per loop
100x+ speedup!
One more compact way that I can think of (still not vectorized) is using a list comprehension, in one line:
np.array([np.rot90(a, k=i) for i in I])
I'm trying to compute the circular cross-correlation of two signals with Theano, to use it in the further calculation of a loss that I will optimize over. But I'm not quite sure how to do that.
It is defined as follows:

(f * g)[n] = sum_k f[k] g[k+n]        # ordinary (linear) cross-correlation
ccc[n] = sum_k (f * g)[n - kN]        # "periodic" summation with period N

i.e. the linear cross-correlation folded back onto one period.
I could do an ordinary correlation and then perform the periodic summation, but it's not quite clear how to do that (the periodic summation) symbolically (using scan, probably?):
conv2d = T.signal.conv.conv2d
x = T.dmatrix()
y = T.dmatrix()
veclen = x.shape[1]
corr_expr = conv2d(x, y[:, ::-1], image_shape=(1, veclen), border_mode='full')
# circ_corr = T.sum([corr_expr[k::veclen] for k in T.arange(veclen)])
corr = theano.function([x, y], outputs=circ_corr)
corr( np.array([[2, 3, 5]]), np.array([[7, 11, 13]]) )
Or use the circular cross-correlation theorem and compute it as iFFT(FFT(x)*FFT(y)):
import theano.sandbox.fourier as dft
x = T.dmatrix()
y = T.dvector()
veclen = x.shape[1]
exp = T.real(
dft.ifft(
dft.fft(x, veclen, axis=1)
* dft.fft(y[::-1], y.shape[0], axis=1).reshape((1, -1)),
veclen, axis=1
)
)[:, ::-1]
f = theano.function([x, y], outputs=exp)
f(np.array([[2, 3, 5], [3, 4, 4], [5, 6, 7]]), np.array([7, 11, 13]) )
but in this case I can't actually compute the gradient, because the gradient for ifft (and, afaik, for all functions that deal with complex numbers in general) is not implemented yet, I guess (it aborts with the error: Elemwise{real,no_inplace}.grad illegally returned an integer-valued variable. (Input index 0, dtype complex128))
Here's a working solution I came up with (definitely not optimal, since FFT is not used):
def circular_crosscorelation(X, y):
    """
    Input:
        symbols for X [n, m]
        and y [1, m]
    Returns:
        symbol for the circular cross-correlation of each row of X with y,
        cc [n, m]
    """
    n, m = X.shape
    corr_expr = T.signal.conv.conv2d(X, y[::-1].reshape((1, -1)), image_shape=(1, m), border_mode='full')
    corr_len = corr_expr.shape[1]
    # zero-pad the full correlation so its length is a multiple of m
    pad = m - corr_len % m
    v_padded = T.concatenate([corr_expr, T.zeros((n, pad))], axis=1)
    # periodic summation: fold the length-m chunks on top of each other
    circ_corr_exp = T.sum(v_padded.reshape((n, v_padded.shape[1] / m, m)), axis=1)
    return circ_corr_exp[:, ::-1]
X = T.dmatrix()
y = T.dmatrix()
cc = theano.function([X, y], circular_crosscorelation(X, y))
print cc( np.array([[2, 3, 5], [4, 5, 6]]), np.array([[7, 11, 13]]) )
returns
[[ 94. 108. 108.]
[ 149. 157. 159.]]
as expected.
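To check the padding/reshape logic outside Theano, here is a NumPy transcription of the same steps (my own sketch, not from the original post; note that Theano's conv2d flips its kernel, so conv2d(X, y[::-1]) here corresponds to np.convolve(row, y)):

import numpy as np

def circ_cc_np(X, y):
    n, m = X.shape
    corr = np.array([np.convolve(row, y) for row in X])   # full correlation, shape (n, 2m - 1)
    pad = m - corr.shape[1] % m                           # zero-pad to a multiple of m
    padded = np.hstack([corr, np.zeros((n, pad))])
    # periodic summation: fold the length-m chunks onto each other
    return padded.reshape(n, -1, m).sum(axis=1)[:, ::-1]

print(circ_cc_np(np.array([[2, 3, 5], [4, 5, 6]]), np.array([7, 11, 13])))
# [[  94. 108. 108.]
#  [ 149. 157. 159.]]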
And can be analytically differentiated:
circ_corr_exp = circular_crosscorelation(X, y)
score = T.sum(circ_corr_exp**2)
grad = T.grad(score, X)
g = theano.function([X, y], outputs=grad)
print g( np.array([[2, 3, 5], [4, 5, 6]]), np.array([[7, 11, 13]]) )
>> [[ 6332. 6388. 6500.]
>> [ 9554. 9610. 9666.]]
Here are also a few more options (via direct circulant-matrix calculation) and a timing comparison:
def circulant_np(v):
    # circulant matrix of v: idx[i, j] = (i - j) % n
    row = np.arange(len(v))
    col = -np.arange(len(v))
    idx = (row[:, np.newaxis] + col)%len(v)
    return v[idx]

print circulant_np(np.array([1, 2, 3, 5]))

def c_corr_np(a, b):
    return circulant_np(a).dot(b[::-1])

def circulant_t(v):
    row = T.arange(v.shape[0])
    col = -T.arange(v.shape[0])
    idx = (row.reshape((-1, 1)) + col)%v.shape[0]
    return v[idx]

def c_corr_t_f(a, b):
    """ 1d correlation using circulant matrix """
    return circulant_t(a).dot(b[::-1])
a = T.dvector('a')
b = T.dvector('b')
c_corr_t = theano.function([a, b], c_corr_t_f(a, b))
print c_corr_np(np.array([2, 3, 5]), np.array([7, 11, 13]))
print c_corr_t(np.array([2, 3, 5]), np.array([7, 11, 13]))
print cc( np.array([[2, 3, 5]]), np.array([[7, 11, 13]]) )
%timeit c_corr_np(np.array([2, 3, 5]), np.array([7, 11, 13]))
%timeit c_corr_t(np.array([2, 3, 5]), np.array([7, 11, 13]))
%timeit cc( np.array([[2, 3, 5]]), np.array([[7, 11, 13]]) ) # = circular_crosscorelation
which gives
10000 loops, best of 3: 30.6 µs per loop
10000 loops, best of 3: 132 µs per loop
10000 loops, best of 3: 149 µs per loop
And the inverse cross-correlation:
def inverse_circular_crosscorelation(y):
    """
    Input:
        symbol for y [1, m]
    Returns:
        symbol for y_inv s.t.
        cc( y, y_inv ) = (1, 0 ... 0)
    """
    A = circulant_t(y.reshape((-1, )))
    b = T.concatenate([T.zeros((y.shape[1] - 1, )), T.ones((1, ))]).reshape((-1, 1))
    return T.nlinalg.matrix_inverse(A).dot(b).reshape((1, -1))[:, ::-1]
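A hypothetical usage sketch (names assumed, relying only on the docstring's claim): compile the expression and check that correlating y with y_inv recovers the unit impulse:

y_sym = T.dmatrix('y')
y_inv_f = theano.function([y_sym], inverse_circular_crosscorelation(y_sym))
y_val = np.array([[2., 3., 5.]])
print cc(y_val, y_inv_f(y_val))   # expect approximately [[ 1.  0.  0.]]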