NumPy's arange accepts only scalar values for start/stop/step. Is there a multi version of this function that can accept array inputs for start/stop/step? E.g. given a 2D input array like:
[[1 5 1], # start/stop/step first
[3 8 2]] # start/stop/step second
it should create an array that is the concatenation of the aranges for every row of the input (one start/stop/step triple per row). The input above should produce the 1D array
1 2 3 4 3 5 7
i.e. we need to design a function that does the following:
print(np.multi_arange(np.array([[1,5,1],[3,8,2]])))
# prints:
# array([1, 2, 3, 4, 3, 5, 7])
And this function should be efficient (pure numpy), i.e. able to process an input array of shape (10000, 3) very fast, without pure-Python looping.
Of course it is possible to write a pure-Python loop (or list comprehension) that creates an arange for each row and concatenates the results. But I have very many rows of start/stop/step triples and need efficient, fast code, hence I am looking for a pure-numpy function.
Why do I need it? I needed this for several tasks. One of them is indexing: suppose I have a 1D array a and I need to extract many (possibly intersecting) subranges of it. If I had that multi version of arange I would just do:
values = a[np.multi_arange(starts_stops_steps)]
Maybe it is possible to create a multi-arange function using some combination of numpy functions? Can you suggest one?
Also, maybe there are more efficient solutions for the specific case of extracting subranges of a 1D array (see the last line of code above) that avoid creating all the indices with multi_arange?
Here's a vectorized one with cumsum that accounts for positive and negative step sizes -
def multi_arange(a):
    steps = a[:,2]
    # Number of elements in each range; valid for positive and negative steps
    lens = ((a[:,1]-a[:,0]) + steps-np.sign(steps))//steps
    # Fill with the step sizes so that a cumsum walks through each range
    b = np.repeat(steps, lens)
    # Last value of each range
    ends = (lens-1)*steps + a[:,0]
    # Seed: the cumsum must start at the first range's start
    b[0] = a[0,0]
    # At each range boundary, replace the step with a jump from the
    # previous range's end to the next range's start
    b[lens[:-1].cumsum()] = a[1:,0] - ends[:-1]
    return b.cumsum()
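To see what the cumsum trick is doing, here is a trace of the intermediate arrays for the example input from the question:

a = np.array([[1, 5, 1], [3, 8, 2]])
steps = a[:,2]                                            # [1 2]
lens = ((a[:,1]-a[:,0]) + steps-np.sign(steps))//steps    # [4 3]
b = np.repeat(steps, lens)                                # [1 1 1 1 2 2 2]
ends = (lens-1)*steps + a[:,0]                            # last value of each range: [4 7]
b[0] = a[0,0]                                             # seed with the first start
b[lens[:-1].cumsum()] = a[1:,0] - ends[:-1]               # [1 1 1 1 -1 2 2]: jump from 4 down to 3
print(b.cumsum())                                         # [1 2 3 4 3 5 7]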
If you need to validate for valid ranges (start < stop when step > 0, and start > stop when step < 0), use a pre-processing step:
a = a[((a[:,1] > a[:,0]) & (a[:,2]>0) | (a[:,1] < a[:,0]) & (a[:,2]<0))]
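For instance, with this mask a row like [5, 1, 2] (start > stop despite a positive step) is dropped before multi_arange runs (a small illustrative check):

a = np.array([[1, 5, 1], [5, 1, 2], [3, 8, 2]])
a = a[((a[:,1] > a[:,0]) & (a[:,2]>0) | (a[:,1] < a[:,0]) & (a[:,2]<0))]
# a is now array([[1, 5, 1], [3, 8, 2]])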
Sample run -
In [17]: a
Out[17]:
array([[ 1, 5, 1],
[ 3, 8, 2],
[18, 6, -2]])
In [18]: multi_arange(a)
Out[18]: array([ 1, 2, 3, 4, 3, 5, 7, 18, 16, 14, 12, 10, 8])
In [1]: np.r_[1:5:1, 3:8:2]
Out[1]: array([1, 2, 3, 4, 3, 5, 7])
In [2]: np.hstack((np.arange(1,5,1),np.arange(3,8,2)))
Out[2]: array([1, 2, 3, 4, 3, 5, 7])
The r_ version is nice and compact, but not faster:
In [3]: timeit np.r_[1:5:1, 3:8:2]
23.9 µs ± 34.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [4]: timeit np.hstack((np.arange(1,5,1),np.arange(3,8,2)))
11.2 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I've just come up with my own solution using Numba. Still, I'd prefer a numpy-only solution, if we can find a good one, so as not to carry the heavy Numba JIT compiler.
I've also tested @Divakar's solution in my code.
The output of the code below is:
naive_multi_arange 0.76601 sec
arty_multi_arange 0.01801 sec 42.52 speedup
divakar_multi_arange 0.05504 sec 13.92 speedup
Meaning my Numba solution has a 42x speedup and @Divakar's numpy solution has a 14x speedup.
The code below can also be run online here.
import time, random
import numpy as np, numba
@numba.jit(nopython = True)
def arty_multi_arange(a):
    starts, stops, steps = a[:, 0], a[:, 1], a[:, 2]
    pos = 0
    cnt = np.sum((stops - starts + steps - np.sign(steps)) // steps, dtype = np.int64)
    res = np.zeros((cnt,), dtype = np.int64)
    for i in range(starts.size):
        v, stop, step = starts[i], stops[i], steps[i]
        if step > 0:
            while v < stop:
                res[pos] = v
                pos += 1
                v += step
        elif step < 0:
            while v > stop:
                res[pos] = v
                pos += 1
                v += step
    assert pos == cnt
    return res
def divakar_multi_arange(a):
    steps = a[:,2]
    lens = ((a[:,1]-a[:,0]) + steps-np.sign(steps))//steps
    b = np.repeat(steps, lens)
    ends = (lens-1)*steps + a[:,0]
    b[0] = a[0,0]
    b[lens[:-1].cumsum()] = a[1:,0] - ends[:-1]
    return b.cumsum()
random.seed(0)
neg_prob = 0.5
N = 100000
minv, maxv, maxstep = -100, 300, 15
steps = [random.randrange(1, maxstep + 1) * ((1, -1)[random.random() < neg_prob]) for i in range(N)]
starts = [random.randrange(minv + 1, maxv) for i in range(N)]
stops = [random.randrange(*(((starts[i] + 1, maxv + 1), (minv, starts[i]))[steps[i] < 0])) for i in range(N)]
joined = np.array([starts, stops, steps], dtype = np.int64).T
tb = time.time()
aref = np.concatenate([np.arange(joined[i, 0], joined[i, 1], joined[i, 2], dtype = np.int64) for i in range(N)])
npt = time.time() - tb
print('naive_multi_arange', round(npt, 5), 'sec')
for func in ['arty_multi_arange', 'divakar_multi_arange']:
    globals()[func](joined)  # warm-up run (triggers Numba JIT compilation)
    tb = time.time()
    a = globals()[func](joined)
    myt = time.time() - tb
    print(func, round(myt, 5), 'sec', round(npt / myt, 2), 'speedup')
    assert a.size == aref.size, (a.size, aref.size)
    assert np.all(a == aref), np.vstack((np.flatnonzero(a != aref)[:5], a[a != aref][:5], aref[a != aref][:5])).T
I have a 4 x 4 matrix
import numpy as np
c = np.random.rand(4, 4)
I want to create a 100 x 4 x 4 x 100 tensor such that when the first and last index are equal, I get back my matrix, else I get zeros.
I can do this in a loop as
Z = np.zeros((100, 4, 4, 100))
for i in range(100):
    Z[i, :, :, i] = c
Is there a better way to do this? I tried looking at np.tensordot and np.einsum but could not figure it out.
Use advanced-indexing -
n = 100
Zout = np.zeros((n, 4, 4, n))
I = np.arange(n)
Zout[I,:,:,I] = c
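A quick sanity check of what the paired index arrays do (a minimal sketch, reusing the names above):

# I appears twice, so the two axes are indexed in lockstep, selecting the
# (i, :, :, i) "diagonal" slices; c broadcasts from (4, 4) to (n, 4, 4)
assert np.allclose(Zout[7, :, :, 7], c)
assert np.allclose(Zout[3, :, :, 9], 0)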
With eye-masking - a boolean identity mask selects the (i, i) positions of the first and last axes, brought adjacent by a transposed view, and c is assigned to each selected 4 x 4 slice -
n = 100
mask = np.eye(n, dtype=bool)
Zout = np.zeros((n, 4, 4, n))
Zout.transpose(0,3,1,2)[mask] = c
Timings -
In [72]: c = np.random.rand(4,4)
In [73]: %%timeit
...: n = 100
...: Zout = np.zeros((n, 4, 4, n))
...: I = np.arange(n)
...: Zout[I,:,:,I] = c
10000 loops, best of 3: 47.5 µs per loop
In [74]: %%timeit
...: n = 100
...: mask = np.eye(n, dtype=bool)
...: Zout = np.zeros((n, 4, 4, n))
...: Zout.transpose(0,3,1,2)[mask] = c
10000 loops, best of 3: 73.1 µs per loop
I have a 2d numpy array, for example:
a = np.array([
[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
and another 1d array:
I = np.array([0, 2, 3, 1, 0, 2, 0, 1])
I want to rotate a by np.rot90 function like following:
b = np.zeros((len(I), 3, 3))
for i, k in enumerate(I):
    b[i] = np.rot90(a, k=k)
Can I do it more efficiently without the for loop?
Approach #1
Generate a 3D array of all 4 possible rotations and simply index into it with I, thus getting a vectorized solution -
P = np.empty((4,) + a.shape, dtype=a.dtype)
P[0] = a # For np.rot90(a, k=0)
P[1] = a.T[::-1] # For np.rot90(a, k=1)
P[2] = a[::-1,::-1] # For np.rot90(a, k=2)
P[3] = a.T[:,::-1] # For np.rot90(a, k=3)
out = P[I]
Approach #2
Another way to create P would be with -
P = np.array([np.rot90(a, k=i) for i in range(4)])
and as with the previous method simply index into P with I for final output.
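A quick check that the precomputed table P matches np.rot90 for every k, using the example arrays from the question (a small verification sketch):

a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])
I = np.array([0, 2, 3, 1, 0, 2, 0, 1])
P = np.array([np.rot90(a, k=i) for i in range(4)])
out = P[I]  # fancy indexing gathers one precomputed rotation per entry of I
assert all(np.array_equal(out[n], np.rot90(a, k=k)) for n, k in enumerate(I))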
Runtime test
Approaches -
def org_app(a, I):
    m,n = a.shape
    b = np.zeros((len(I), m, n), dtype=a.dtype)
    for i, k in enumerate(I):
        b[i] = np.rot90(a, k=k)
    return b

def app1(a, I):
    P = np.empty((4,) + a.shape, dtype=a.dtype)
    P[0] = a
    P[1] = a.T[::-1]
    P[2] = a[::-1,::-1]
    P[3] = a.T[:,::-1]
    return P[I]

def app2(a, I):
    P = np.array([np.rot90(a, k=i) for i in range(4)])
    return P[I]
Timings -
In [54]: a = np.random.randint(0,9,(10,10))
In [55]: I = np.random.randint(0,4,(10000))
In [56]: %timeit org_app(a, I)
10 loops, best of 3: 51 ms per loop
In [57]: %timeit app1(a, I)
1000 loops, best of 3: 469 µs per loop
In [58]: %timeit app2(a, I)
1000 loops, best of 3: 549 µs per loop
100x+ speedup!
One more efficient way that I can think of (still not vectorized) is using a list comprehension, in one line:
np.array([np.rot90(a, k=i) for i in I])
I want to do some rolling window calculations in pandas that need to deal with two columns at the same time. I'll use a simple example to state the problem clearly:
import pandas as pd
df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4
result = []
for i in range(1, len(df)+1):
    if i < windowSize:
        result.append(None)
    else:
        x = df.x.iloc[i-windowSize:i]
        y = df.y.iloc[i-windowSize:i]
        m = y.mean()
        r = sum(x[y > m]) / sum(x[y <= m])
        result.append(r)
print(result)
Is there any way to solve this in pandas without the for loop? Any help is appreciated.
You can use the rolling window trick for numpy arrays and apply it to the array underlying the DataFrame.
import pandas as pd
import numpy as np
def rolling_window(a, window):
    # Classic stride trick: windows of length `window` over the last axis, as a zero-copy view
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
df = pd.DataFrame({
    'x': [1, 2, 3, 2, 1, 5, 4, 6, 7, 9],
    'y': [4, 3, 4, 6, 5, 9, 1, 3, 1, 2]
})
windowSize = 4
rw = rolling_window(df.values.T, windowSize)
m = np.mean(rw[1], axis=-1, keepdims=True)
a = np.sum(rw[0] * (rw[1] > m), axis=-1)
b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
result = a / b
The result lacks the leading None values, but they are easy to prepend (in the form of np.nan, or after converting the result to a list).
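For example, one way to pad the front with NaNs so the output lines up with the DataFrame index (a small sketch, reusing the names above):

# the first windowSize - 1 windows are incomplete
result = np.concatenate((np.full(windowSize - 1, np.nan), result))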
This is probably not the pandas-level solution you were looking for, but it gets the job done without loops.
Here's one vectorized approach using NumPy tools -
windowSize = 4
a = df.values
X = strided_app(a[:,0],windowSize,1)
Y = strided_app(a[:,1],windowSize,1)
M = Y.mean(1)
mask = Y>M[:,None]
sums = np.einsum('ij,ij->i',X,mask)
rest_sums = X.sum(1) - sums
out = sums/rest_sums
strided_app is taken from here.
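strided_app itself is not reproduced in the post; this is a minimal sketch of the usual stride-tricks recipe it refers to (the linked original may differ in details):

def strided_app(a, L, S):
    # Rolling windows of length L with hop size S over a 1D array, as a zero-copy view
    nrows = (a.size - L) // S + 1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows, L), strides=(S * n, n))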
Runtime test -
Approaches -
# #kazemakase's solution
def rolling_window_sum(df, windowSize=4):
    rw = rolling_window(df.values.T, windowSize)
    m = np.mean(rw[1], axis=-1, keepdims=True)
    a = np.sum(rw[0] * (rw[1] > m), axis=-1)
    b = np.sum(rw[0] * (rw[1] <= m), axis=-1)
    result = a / b
    return result
# Proposed in this post
def strided_einsum(df, windowSize=4):
    a = df.values
    X = strided_app(a[:,0],windowSize,1)
    Y = strided_app(a[:,1],windowSize,1)
    M = Y.mean(1)
    mask = Y>M[:,None]
    sums = np.einsum('ij,ij->i',X,mask)
    rest_sums = X.sum(1) - sums
    out = sums/rest_sums
    return out
Timings -
In [46]: df = pd.DataFrame(np.random.randint(0,9,(1000000,2)))
In [47]: %timeit rolling_window_sum(df)
10 loops, best of 3: 90.4 ms per loop
In [48]: %timeit strided_einsum(df)
10 loops, best of 3: 62.2 ms per loop
To squeeze out more performance, we can compute the Y.mean(1) part, which is basically a windowed average, with SciPy's 1D uniform filter. Thus, M could alternatively be computed for windowSize=4 as -
from scipy.ndimage import uniform_filter1d as unif1d
M = unif1d(a[:,1].astype(float),windowSize)[2:-1]
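The strided_einsum_unif_filter function timed below is not shown in the post; a plausible reconstruction, assuming it differs from strided_einsum only in how M is computed, would be:

def strided_einsum_unif_filter(df, windowSize=4):
    a = df.values
    X = strided_app(a[:,0], windowSize, 1)
    Y = strided_app(a[:,1], windowSize, 1)
    # Windowed mean via the uniform filter; the [2:-1] slice aligns the
    # centered filter output with the trailing windows for windowSize=4
    M = unif1d(a[:,1].astype(float), windowSize)[2:-1]
    mask = Y > M[:,None]
    sums = np.einsum('ij,ij->i', X, mask)
    rest_sums = X.sum(1) - sums
    return sums / rest_sums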
The performance gains are significant -
In [65]: %timeit strided_einsum(df)
10 loops, best of 3: 61.5 ms per loop
In [66]: %timeit strided_einsum_unif_filter(df)
10 loops, best of 3: 49.4 ms per loop
A similar question has been asked, but none of the answers quite do what I need - some allow multidimensional searches (aka the 'rows' option in Matlab) but don't return the index; some return the index but don't allow rows. My arrays are very large (1M x 2) and I have been successful in writing a loop that works, but obviously that is very slow. In Matlab, the built-in ismember function takes about 10 seconds.
Here is what I am looking for:
a=np.array([[4, 6],[2, 6],[5, 2]])
b=np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
The exact Matlab function that does the trick is:
[~,index] = ismember(a,b,'rows')
where
index = [6, 3, 9]
import numpy as np
def asvoid(arr):
    """
    View the array as dtype np.void (bytes).
    This views the last axis of ND-arrays as bytes so you can perform comparisons on
    the entire row.
    http://stackoverflow.com/a/16840350/190597 (Jaime, 2013-05)

    Warning: When using asvoid for comparison, note that float zeros may compare UNEQUALLY:
    >>> asvoid([-0.]) == asvoid([0.])
    array([False], dtype=bool)
    """
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

def in1d_index(a, b):
    voida, voidb = map(asvoid, (a, b))
    return np.where(np.in1d(voidb, voida))[0]
a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
print(in1d_index(a, b))
prints
[2 5 8]
This would be equivalent to Matlab's [3, 6, 9], since Python uses 0-based indexing.
Some caveats:
The indices are returned in increasing order. They do not correspond to the location of the items of a in b.
asvoid will work for integer dtypes, but be careful if using asvoid on float dtypes, since asvoid([-0.]) == asvoid([0.]) returns array([False]).
asvoid works best on contiguous arrays. If the arrays are not contiguous, the data will be copied to a contiguous array, which will slow down the performance.
Despite the caveats, one might choose to use in1d_index anyway for the sake of speed:
def ismember_rows(a, b):
    # http://stackoverflow.com/a/22705773/190597 (ashg)
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]
In [41]: a2 = np.tile(a,(2000,1))
In [42]: b2 = np.tile(b,(2000,1))
In [46]: %timeit in1d_index(a2, b2)
100 loops, best of 3: 8.49 ms per loop
In [47]: %timeit ismember_rows(a2, b2)
1 loops, best of 3: 5.55 s per loop
So in1d_index is ~650x faster (for arrays of length in the low thousands), but again note the comparison is not exactly apples-to-apples since in1d_index returns the indices in increasing order, while ismember_rows returns the indices in the order rows of a show up in b.
import numpy as np
def ismember_rows(a, b):
    '''Equivalent of 'ismember' from Matlab
    a.shape = (nRows_a, nCol)
    b.shape = (nRows_b, nCol)
    return the idx where b[idx] == a
    '''
    return np.nonzero(np.all(b == a[:,np.newaxis], axis=2))[1]
a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
idx = ismember_rows(a, b)
print(idx)
print(np.all(b[idx] == a))
array([5, 2, 8])
True
I used broadcasting.
--------------------------[update]------------------------------
def ismember(a, b):
    return np.flatnonzero(np.in1d(b[:,0], a[:,0]) & np.in1d(b[:,1], a[:,1]))
a = np.array([[4, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
a2 = np.tile(a,(2000,1))
b2 = np.tile(b,(2000,1))
%timeit in1d_index(a2, b2)
# 100 loops, best of 3: 8.74 ms per loop
%timeit ismember(a2, b2)
# 100 loops, best of 3: 8.5 ms per loop
np.all(in1d_index(a2, b2) == ismember(a2, b2))
# True
As unutbu said, the indices are returned in increasing order.
This function first turns multiple columns of elements into a single column of strings, then numpy.in1d can be used to find the desired answer. Please try the following code:
import numpy as np

def ismemberRow(A, B):
    '''
    This function finds which rows in A can also be found in B.
    It first turns multiple columns of elements into a single column of strings, then numpy.in1d can be used.
    Input: m x n numpy array (A), and p x q array (B)
    Output: numpy array of length m, storing either True or False; True for rows found in both A and B
    '''
    # unicode=True and the builtin str (np.str was removed in NumPy 1.24) keep this running on Python 3
    sa = np.chararray((A.shape[0], 1), unicode=True)
    sa[:] = '-'
    sb = np.chararray((B.shape[0], 1), unicode=True)
    sb[:] = '-'
    ba = A.astype(str)
    # Join the columns of each row into one '-'-separated string
    sa2 = np.expand_dims(ba[:,0], axis=1) + sa + np.expand_dims(ba[:,1], axis=1)
    na = A.shape[1] - 2
    for i in range(0, na):
        sa2 = sa2 + sa + np.expand_dims(ba[:,i+2], axis=1)
    bb = B.astype(str)
    sb2 = np.expand_dims(bb[:,0], axis=1) + sb + np.expand_dims(bb[:,1], axis=1)
    nb = B.shape[1] - 2
    for i in range(0, nb):
        sb2 = sb2 + sb + np.expand_dims(bb[:,i+2], axis=1)
    return np.in1d(sa2, sb2)
A = np.array([[1, 3, 4],[2, 4, 3],[7, 4, 3],[1, 1, 1],[1, 3, 4],[5, 3, 4],[1, 1, 1],[2, 4, 3]])
B = np.array([[1, 3, 4],[1, 1, 1]])
d = ismemberRow(A,B)
print(A[np.where(d)[0],:])
#results:
#[[1 3 4]
# [1 1 1]
# [1 3 4]
# [1 1 1]]
Here's a function based on libigl's igl::ismember_rows which closely mimics the behavior of Matlab's ismember(A,B,'rows'):
def ismember_rows(A, B, return_index=False):
    """
    Return whether each row in A occurs as a row in B

    Parameters
    ----------
    A : #A by dim array
    B : #B by dim array
    return_index : {True,False}, optional.

    Returns
    -------
    IA : #A 1D array, IA[i] == True if and only if
        there exists j = LOCB[i] such that B[j,:] == A[i,:]
    LOCB : #A 1D array of indices. LOCB[i] == -1 if IA[i] == False,
        only returned if return_index=True
    """
    IA = np.full(A.shape[0], False)
    LOCB = np.full(A.shape[0], -1)
    if len(A) == 0: return (IA, LOCB) if return_index else IA
    if len(B) == 0: return (IA, LOCB) if return_index else IA
    # Get rid of any duplicates
    uA, uIuA = np.unique(A, axis=0, return_inverse=True)
    uB, uIB = np.unique(B, axis=0, return_index=True)
    # Sort both
    sIA = np.lexsort(uA.T[::-1])
    sA = uA[sIA, :]
    sIB = np.lexsort(uB.T[::-1])
    sB = uB[sIB, :]
    uF = np.full(sA.shape[0], False)
    uLOCB = np.full(sA.shape[0], -1)
    def row_greater_than(a, b):
        for c in range(sA.shape[1]):
            if sA[a, c] > sB[b, c]: return True
            if sA[a, c] < sB[b, c]: return False
        return False
    # Loop over sA, advancing a pointer into sB (both are sorted)
    bi = 0
    past = False
    for a in range(sA.shape[0]):
        while not past and row_greater_than(a, bi):
            bi += 1
            past = bi >= sB.shape[0]
        if not past and np.all(sA[a, :] == sB[bi, :]):
            uF[sIA[a]] = True
            uLOCB[sIA[a]] = uIB[sIB[bi]]
    # Map results on unique rows back to the original rows of A
    for a in range(A.shape[0]):
        IA[a] = uF[uIuA[a]]
        LOCB[a] = uLOCB[uIuA[a]]
    return (IA, LOCB) if return_index else IA
For example,
a=np.array([[4, 6],[6,6],[2, 6],[5, 2]])
b=np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
(flag,index) = ismember_rows(a,b,return_index=True)
produces
>>> flag
array([ True, False, True, True])
>>> index
array([ 5, -1, 2, 8])
Update: Here's a faster version that makes better use of numpy.unique based on array_correspondence in gpytoolbox.
def ismember_rows(A, B, return_index=False):
    """
    Return whether each row in A occurs as a row in B

    Parameters
    ----------
    A : #A by dim array
    B : #B by dim array
    return_index : {True,False}, optional.

    Returns
    -------
    IA : #A 1D array, IA[i] == True if and only if
        there exists j = LOCB[i] such that B[j,:] == A[i,:]
    LOCB : #A 1D array of indices. LOCB[i] == -1 if IA[i] == False,
        only returned if return_index=True
    """
    if len(A) == 0 or len(B) == 0:
        IA = np.full(A.shape[0], False)
        LOCB = np.full(A.shape[0], -1)
        return (IA, LOCB) if return_index else IA
    uB, mapB = np.unique(B, axis=0, return_index=True)
    # Unique over the stacked rows: a row of A that also occurs in uB gets the
    # same unique index, so idx points back at the uB copy of that row
    uU, idx, inv = np.unique(np.vstack((uB, A)), axis=0, return_index=True, return_inverse=True)
    imap = idx[inv[uB.shape[0]:]]
    imap[imap >= uB.shape[0]] = -1  # rows of A not found in uB
    LOCB = np.where(imap < 0, -1, mapB[imap])
    IA = LOCB >= 0
    return (IA, LOCB) if return_index else IA
Seems to be a bit faster on my laptop.
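A quick check that this version reproduces the earlier example (a minimal sketch):

a = np.array([[4, 6],[6, 6],[2, 6],[5, 2]])
b = np.array([[1, 7],[1, 8],[2, 6],[2, 1],[2, 4],[4, 6],[4, 7],[5, 9],[5, 2],[5, 1]])
flag, index = ismember_rows(a, b, return_index=True)
assert np.array_equal(flag, [True, False, True, True])
assert np.array_equal(index, [5, -1, 2, 8])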