Does NumPy have a function equivalent to Matlab's buffer?

I see there are array_split and split methods, but these are not very handy when you have to split an array whose length is not an integer multiple of the chunk size. Moreover, these methods take the number of slices as input rather than the slice size. I need something more like Matlab's buffer method, which is more suitable for signal processing.
For example, if I want to buffer a signal into chunks of size 60, I need to do: np.vstack(np.hsplit(x.iloc[0:((len(x)//60)*60)], len(x)//60)), which is cumbersome.
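For the non-overlapping case described above, a plain reshape may already be enough. The following is a minimal sketch, not a drop-in replacement for MATLAB's buffer: the helper name `chunk` and its `pad` flag are hypothetical, and the chunks come out as rows (as in the vstack/hsplit expression) rather than as columns.

```python
import numpy as np

def chunk(x, size, pad=False):
    """Split a 1-D signal into rows of length `size` (hypothetical helper).

    With pad=False the incomplete tail is dropped; with pad=True it is
    zero-padded to a full chunk.
    """
    x = np.asarray(x)
    n_full, rem = divmod(len(x), size)
    if pad and rem:
        # Extend the signal with zeros so the tail forms a full chunk
        x = np.concatenate([x, np.zeros(size - rem, dtype=x.dtype)])
        n_full += 1
    return x[:n_full * size].reshape(n_full, size)

truncated = chunk(np.arange(7), 3)           # tail element 6 is dropped
padded = chunk(np.arange(7), 3, pad=True)    # tail becomes [6, 0, 0]
```

Taking the chunk size as the argument, rather than the number of chunks, is the design choice that array_split and split lack.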

I wrote the following routine to handle the use cases I needed, but I have not implemented/tested for "underlap".
Please feel free to make suggestions for improvement.
def buffer(X, n, p=0, opt=None):
    '''Mimic MATLAB routine to generate buffer array

    MATLAB docs here:

    Parameters
    ----------
    x: ndarray
        Signal array
    n: int
        Length of each data segment
    p: int
        Number of values to overlap
    opt: str
        Initial condition options. default sets the first `p` values to zero,
        while 'nodelay' begins filling the buffer immediately.

    Returns
    -------
    result : (n,m) ndarray
        Buffer array created from X
    '''
    import numpy as np

    if opt not in [None, 'nodelay']:
        raise ValueError('{} not implemented'.format(opt))

    i = 0
    first_iter = True
    while i < len(X):
        if first_iter:
            if opt == 'nodelay':
                # No zeros at array start
                result = X[:n]
                i = n
            else:
                # Start with `p` zeros
                result = np.hstack([np.zeros(p), X[:n-p]])
                i = n-p
            # Make 2D array and pivot
            result = np.expand_dims(result, axis=0).T
            first_iter = False
            continue

        # Create next column, add `p` results from last col if given
        col = X[i:i+(n-p)]
        if p != 0:
            col = np.hstack([result[:,-1][-p:], col])
        i += n-p

        # Append zeros if last row and not length `n`
        if len(col) < n:
            col = np.hstack([col, np.zeros(n-len(col))])

        # Combine result with next row
        result = np.hstack([result, np.expand_dims(col, axis=0).T])

    return result

import numpy as np

def buffer(X = np.array([]), n = 1, p = 0):
    #buffers data vector X into length n column vectors with overlap p
    #excess data at the end of X is discarded
    n = int(n) #length of each data vector
    p = int(p) #overlap of data vectors, 0 <= p < n-1
    L = len(X) #length of data to be buffered
    m = int(np.floor((L-n)/(n-p)) + 1) #number of sample vectors (no padding)
    data = np.zeros([n,m]) #initialize data matrix
    #stop at L-n+1 so that the last full frame (start index L-n) is included
    for startIndex,column in zip(range(0,L-n+1,n-p),range(0,m)):
        data[:,column] = X[startIndex:startIndex + n] #fill in by column
    return data

The Keras function timeseries_dataset_from_array may be considered a Python equivalent of MATLAB's buffer().
Sample code:
import numpy as np
import tensorflow.keras.preprocessing as kp
S = np.arange(1,99) # a demo array
# windows of length 7 with stride 7 (no overlap), grouped into batches of 5
list(kp.timeseries_dataset_from_array(S, targets = None,sequence_length=7,sequence_stride=7,batch_size=5))

Same as the other answer, but faster.
def buffer(X, n, p=0):
    '''
    Parameters
    ----------
    x: ndarray
        Signal array
    n: int
        Number of data segments
    p: int
        Number of values to overlap

    Returns
    -------
    result : (n,m) ndarray
        Buffer array created from X
    '''
    import numpy as np

    d = n - p
    m = len(X)//d
    if m * d != len(X):
        m = m + 1
    Xn = np.zeros(d*m)
    Xn[:len(X)] = X
    Xn = np.reshape(Xn,(m,d))
    Xne = np.concatenate((Xn,np.zeros((1,d))))
    Xn = np.concatenate((Xn,Xne[1:,0:p]), axis = 1)
    return np.transpose(Xn[:-1])

ryanjdillon's answer rewritten for a significant performance improvement: it appends to a list instead of concatenating arrays. Concatenation copies the whole array on every iteration and is much slower.
import numpy as np

def buffer(x, n, p=0, opt=None):
    if opt not in ('nodelay', None):
        raise ValueError('{} not implemented'.format(opt))

    i = 0
    if opt == 'nodelay':
        # No zeros at array start
        result = x[:n]
        i = n
    else:
        # Start with `p` zeros
        result = np.hstack([np.zeros(p), x[:n-p]])
        i = n-p

    # Make 2D array, cast to list for .append()
    result = list(np.expand_dims(result, axis=0))

    while i < len(x):
        # Create next column, add `p` results from last col if given
        col = x[i:i+(n-p)]
        if p != 0:
            col = np.hstack([result[-1][-p:], col])

        # Append zeros if last row and not length `n`
        if len(col) < n:
            col = np.hstack([col, np.zeros(n - len(col))])

        # Combine result with next row
        result.append(np.array(col))
        i += (n - p)

    return np.vstack(result).T

def buffer(X, n, p=0):
    '''
    x: ndarray, Signal array, input a long vector as raw speech wav
    n: int, frame length
    p: int, Number of values to overlap
    result : (n,m) ndarray, Buffer array created from X
    '''
    import numpy as np

    d = n - p
    m = len(X)//d
    c = n//d
    if m * d != len(X):
        m = m + 1
    Xn = np.zeros(d*m)
    Xn[:len(X)] = X
    Xn = np.reshape(Xn,(m,d))
    Xn_out = Xn
    # append whole blocks of d samples, each shifted by one more row
    for i in range(c-1):
        Xne = np.concatenate((Xn,np.zeros((i+1,d))))
        Xn_out = np.concatenate((Xn_out, Xne[i+1:,:]),axis=1)
    # append the remaining n-d*c samples needed to reach frame length n
    if n-d*c>0:
        Xne = np.concatenate((Xn, np.zeros((c,d))))
        Xn_out = np.concatenate((Xn_out,Xne[c:,:n-d*c]),axis=1)
    return np.transpose(Xn_out)
Here is an improved version of Ali Khodabakhsh's sample code, which did not work in my cases. Feel free to comment on it and use it.

Comparing the execution time of the proposed answers by running
from timeit import default_timer as timer  # assumed source of timer()
x = np.arange(1,200000)
start = timer()
y = buffer(x,60,20)
end = timer()
print(end - start)
the results are:
Andrzej May, 0.005595300000095449
OverLordGoldDragon, 0.06954789999986133
ryanjdillon, 2.427092700000003
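Another loop-free variant builds the whole frame matrix with one index array. This is only a sketch, assuming MATLAB's default behaviour (the first frame starts with `p` zeros and the last frame is zero-padded); the name `buffer_indexed` is hypothetical, and the function was not part of the timings above.

```python
import numpy as np

def buffer_indexed(x, n, p=0):
    """Frames of length n with overlap p, one frame per column."""
    if not 0 <= p < n:
        raise ValueError('overlap p must satisfy 0 <= p < n')
    x = np.asarray(x, dtype=float)
    step = n - p
    m = -(-len(x) // step)            # ceil(len(x) / step) frames
    total = (m - 1) * step + n        # samples needed to fill every frame
    padded = np.zeros(total)
    padded[p:p + len(x)] = x          # p leading zeros, zero-padded tail
    # idx[r, k] is the position in `padded` of row r of frame k
    idx = np.arange(n)[:, None] + step * np.arange(m)[None, :]
    return padded[idx]

frames = buffer_indexed(np.arange(1, 11), 4, 1)
```

Fancy indexing copies the data, so memory use grows with the overlap; for very long signals a strided view might be preferable.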


Matlab operation to Python

I'm having trouble translating this operation from MatLab to Python:
"aa" is a vector of 1x1000 elements.
"ncomp" = 128
"k" is a variable for a loop cycle.
The problem is ... I don't understand how does it work.
I'm posting the whole section of the algorithm:
while(testnorm>0.0001 && epoca<maxit)
if (funz==1)
w = w / norm(w);
Can you help?
Essentially, this line fetches ncomp elements from the vector aa, from index 1+k through index ncomp+k, and transposes them (from a single row to ncomp rows). The transposed data is then inserted into xup as ncomp rows, where each row has a single element. Similar Python code would be:
import numpy as np
aa = np.array([i for i in range(1000)])
ncomp = 128
k = 0
xup = [[0] for i in range(ncomp)]
data = (np.transpose(aa[k:ncomp+k]))
for i in range(ncomp):
    xup[i] = data[i]
k += 1
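The slice-and-transpose step described above can also be written without a loop. A small sketch, assuming the goal is an ncomp-by-1 column: note that transposing a 1-D NumPy array is a no-op, so an explicit reshape is used instead.

```python
import numpy as np

aa = np.arange(1000)
ncomp = 128
k = 0

# MATLAB's 1-based, inclusive range 1+k:ncomp+k maps to the
# 0-based, half-open slice k:k+ncomp in NumPy.
row = aa[k:k + ncomp]

# .T leaves a 1-D array unchanged; reshape produces the column vector.
still_1d = row.T
col = row.reshape(-1, 1)
```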

Numpy is not swapping elements

I am trying to swap two indices in the 2D array of NumPy. Unfortunately, only one element is getting swapped. Here is the code:
n = len(A)
perMatrix = np.zeros((n,n))
np.fill_diagonal(perMatrix, 1)
perMatrix = A
# swapping the row
temp = perMatrix[switchIndex1]
# perMatrix[switchIndex1][0] = 14
perMatrix[switchIndex1], perMatrix[switchIndex2] = perMatrix[switchIndex2], perMatrix[switchIndex1]
Here's what the code is outputting:
You could just add (on the line after perMatrix is created):
sigma = [switchIndex1, switchIndex2]
tau = [switchIndex2, switchIndex1]
perMatrix[sigma,:] = perMatrix[tau,:]
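The reason the tuple swap fails is that indexing a row of a 2D array returns a view, not a copy, so the first assignment overwrites the data the second one reads. A short sketch of both behaviours on a made-up 3x3 array:

```python
import numpy as np

A = np.arange(9).reshape(3, 3)

# Tuple swap: both right-hand sides are views into B.
# Row 0 is overwritten first, then row 2 is "restored" from the
# already overwritten row 0, so both rows end up identical.
B = A.copy()
B[0], B[2] = B[2], B[0]

# Fancy indexing copies the selected rows before assigning,
# so the swap works as intended.
C = A.copy()
C[[0, 2]] = C[[2, 0]]
```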

Optimize code for step function using only NumPy

I'm trying to optimize the function 'pw' in the following code using only NumPy functions (or perhaps list comprehensions).
from time import time
import numpy as np
def pw(x, udata):
    """
    Creates the step function
                 | 1, if d0 <= x < d1
                 | 2, if d1 <= x < d2
    pw(x,data) =   ...
                 | N, if d(N-1) <= x < dN
                 | 0, otherwise
    where di is the ith element in data.
    INPUT:  x     -- interval which the step function is defined over
            data  -- an ordered set of data (without repetitions)
    OUTPUT: pw_func -- an array of size x.shape[0]
    """
    vals = np.arange(1,udata.shape[0]+1).reshape(udata.shape[0],1)
    pw_func = np.sum(np.where(np.greater_equal(x,udata)*np.less(x,np.roll(udata,-1)),vals,0),axis=0)
    return pw_func
N = 50000
x = np.linspace(0,10,N)
data = [1,3,4,5,5,7]
udata = np.unique(data)
ti = time()
# udata is passed as a column so that it broadcasts against x
pw(x,udata.reshape(udata.shape[0],1))
tf = time()
print(tf - ti)
import cProfile
cProfile.run('pw(x,udata.reshape(udata.shape[0],1))')
The profiler is telling me that most of the overhead is coming from np.where (about 1 ms), but I'd like to create faster code if possible. It seems that performing the operations row-wise versus column-wise makes some difference, unless I'm mistaken, but I think I've accounted for it. I know that sometimes list comprehensions can be faster, but I couldn't figure out a faster way than what I'm doing using them.
Searchsorted seems to yield better performance but that 1 ms still remains on my computer:
def pw(xx, uu):
    """
    Creates the step function
                 | 1, if d0 <= x < d1
                 | 2, if d1 <= x < d2
    pw(x,data) =   ...
                 | N, if d(N-1) <= x < dN
                 | 0, otherwise
    where di is the ith element in data.
    INPUT:  x     -- interval which the step function is defined over
            data  -- an ordered set of data (without repetitions)
    OUTPUT: pw_func -- an array of size x.shape[0]
    """
    inds = np.searchsorted(uu, xx, side='right')
    vals = np.arange(1,uu.shape[0]+1)
    pw_func = vals[inds[inds != uu.shape[0]]]
    num_mins = np.sum(xx < np.min(uu))
    num_maxs = np.sum(xx > np.max(uu))
    pw_func = np.concatenate((np.zeros(num_mins), pw_func, np.zeros(xx.shape[0]-pw_func.shape[0]-num_mins)))
    return pw_func
This answer using piecewise seems pretty close, but that's on a scalar x0 and x1. How would I do it on arrays? And would it be more efficient?
Understandably, x may be pretty big but I'm trying to put it through a stress test.
I am still learning though so some hints or tricks that can help me out would be great.
There seems to be a mistake in the second function since the resulting array from the second function doesn't match the first one (which I'm confident that it works):
N1 = pw1(x,udata.reshape(udata.shape[0],1)).shape[0]
N2 = np.sum(pw1(x,udata.reshape(udata.shape[0],1)) == pw2(x,udata))
print(N1 - N2)
This prints the number of data points that are not the same, and it is not zero. So it seems that I don't know how to use 'searchsorted'.
Actually I fixed it:
pw_func = vals[inds[inds != uu.shape[0]]]
was changed to
pw_func = vals[inds[inds[(inds != uu.shape[0])*(inds != 0)]-1]]
so at least the resulting arrays match. But the question still remains on whether there's a more efficient way of going about doing this.
Thanks Tin Lai for pointing out the mistake. This one should work
pw_func = vals[inds[(inds != uu.shape[0])*(inds != 0)]-1]
Maybe a more readable way of presenting it would be
non_endpts = (inds != uu.shape[0])*(inds != 0) # only consider the points in between the min/max data values
shift_inds = inds[non_endpts]-1 # searchsorted side='right' includes the left end point and not right end point so a shift is needed
pw_func = vals[shift_inds]
I think I got lost in all those brackets! I guess that's the importance of readability.
A very abstract yet interesting problem! Thanks for entertaining me, I had fun :)
P.S. I'm not sure about your pw2; I wasn't able to get it to output the same as pw1.
For reference the original pws:
def pw1(x, udata):
    vals = np.arange(1,udata.shape[0]+1).reshape(udata.shape[0],1)
    pw_func = np.sum(np.where(np.greater_equal(x,udata)*np.less(x,np.roll(udata,-1)),vals,0),axis=0)
    return pw_func
def pw2(xx, uu):
    inds = np.searchsorted(uu, xx, side='right')
    vals = np.arange(1,uu.shape[0]+1)
    pw_func = vals[inds[inds[(inds != uu.shape[0])*(inds != 0)]-1]]
    num_mins = np.sum(xx < np.min(uu))
    num_maxs = np.sum(xx > np.max(uu))
    pw_func = np.concatenate((np.zeros(num_mins), pw_func, np.zeros(xx.shape[0]-pw_func.shape[0]-num_mins)))
    return pw_func
My first attempt utilised a lot of broadcasting operations from numpy:
def pw3(x, udata):
    # the None slice is to create new axis
    step_bool = x >= udata[None,:].T
    # we exploit the fact that bools are integer value of 1s
    # skipping the last value in "data"
    step_vals = np.sum(step_bool[:-1], axis=0)
    # for the step_bool that we skipped from previous step (last index)
    # we set it to zero so that we can negate the step_vals once we reached
    # the last value in "data"
    step_vals[step_bool[-1]] = 0
    return step_vals
After looking at the searchsorted from your pw2, I had a new approach that utilises it with much higher performance:
def pw4(x, udata):
    inds = np.searchsorted(udata, x, side='right')
    # fix-ups the last data if x is already out of range of data[-1]
    if x[-1] > udata[-1]:
        inds[inds == inds[-1]] = 0
    return inds
Plots with:
plt.plot(pw1(x,udata.reshape(udata.shape[0],1)), label='pw1')
plt.plot(pw2(x,udata), label='pw2')
plt.plot(pw3(x,udata), label='pw3')
plt.plot(pw4(x,udata), label='pw4')
with data = [1,3,4,5,5,7]:
with data = [1,3,4,5,5,7,11]
pw1,pw3,pw4 are all identical
print(np.all(pw1(x,udata.reshape(udata.shape[0],1)) == pw3(x,udata)))
>>> True
print(np.all(pw1(x,udata.reshape(udata.shape[0],1)) == pw4(x,udata)))
>>> True
Performance (repeat ran 3 times here; each value is the total time in seconds for number=1000 calls, not an average):
print(timeit.Timer('pw1(x,udata.reshape(udata.shape[0],1))', "from __main__ import pw1, x, udata").repeat(number=1000))
>>> [3.1938983199979702, 1.6096494779994828, 1.962694135003403]
print(timeit.Timer('pw2(x,udata)', "from __main__ import pw2, x, udata").repeat(number=1000))
>>> [0.6884554479984217, 0.6075002400029916, 0.7799002879983163]
print(timeit.Timer('pw3(x,udata)', "from __main__ import pw3, x, udata").repeat(number=1000))
>>> [0.7369808239964186, 0.7557657590004965, 0.8088172269999632]
print(timeit.Timer('pw4(x,udata)', "from __main__ import pw4, x, udata").repeat(number=1000))
>>> [0.20514375300263055, 0.20203858999957447, 0.19906871100101853]
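The searchsorted idea can be made independent of the ordering of x by zeroing every index that falls past the last data value, instead of inspecting x[-1]. The following is a sketch under that assumption; the names `pw_searchsorted` and `pw_broadcast` are hypothetical, and the broadcasting version is included only as a reference to compare against.

```python
import numpy as np

def pw_searchsorted(x, udata):
    """Step function via searchsorted; x does not need to be sorted."""
    x = np.asarray(x)
    udata = np.asarray(udata)
    # side='right' returns i for d(i-1) <= x < d(i), 0 for x < d0,
    # and len(udata) for x >= the last data value
    inds = np.searchsorted(udata, x, side='right')
    inds[inds == udata.shape[0]] = 0
    return inds

def pw_broadcast(x, udata):
    """Reference implementation using broadcasting, as in pw1."""
    col = np.asarray(udata).reshape(-1, 1)
    vals = np.arange(1, col.shape[0] + 1).reshape(-1, 1)
    mask = (x >= col) & (x < np.roll(col, -1))
    return np.sum(np.where(mask, vals, 0), axis=0)

x = np.linspace(0, 10, 1001)
udata = np.unique([1, 3, 4, 5, 5, 7])
same = np.array_equal(pw_searchsorted(x, udata), pw_broadcast(x, udata))
unsorted = pw_searchsorted(np.array([8.0, 0.5, 3.0]), udata)
```

Comparing against the array length also covers the case where the largest x equals the last data value exactly, which the x[-1] > udata[-1] check skips.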

Loop and extract window

I am trying to create a function (or series of functions), that perform the following operations:
Having an input array(A), for each cell A[i,j], extract a window (W), of custom size, where the value 'min' will be:
min = np.min(W)
The output matrix (H) will store the values as:
H[i,j] = A[i,j] - min(W)
For an easier understanding of the issue, I attached a picture (Example):
My current code is this:
def res_array(matrix, size):
result = []
sc.generic_filter(matrix, nothing, size, extra_arguments=(result,), mode = 'nearest')
mat_out = result
return mat_out
def local(window):
H = np.empty_like(window)
w = res_array(window, 3)
win_min = np.apply_along_axis(min, 1, w)
# This is where I think it's broken
for k in win_min:
for i in range(window.shape[0]):
for j in range(window.shape[1]):
h[i, j] = window[i,j] - k
k += 1
return h
def nothing(window, out):
list = []
for i in range(window.shape[0]):
return 0
test = np.ones((10, 10)) * np.arange(10)
a = local(test)
I need the code to pass to the next value in 'for k in win_min', for each cell of the input matrix A, or test.
Edit: I thought of something like directly accessing the index of the 'win_min', and increment by one, like I saw here: Increment the value inside a list element, but I don't know how to do that.
Thanks for any help!
import numpy as np
from numpy.random import random

N=4 #matrix size
a=random((N,N)) #input
#--window size
wl=1 #left
wr=1 #right
wt=1 #top
wb=1 #bottom
H=np.zeros((N,N)) #output
def h(k,l): #individual cell function
    #--- checks to not run out of array
    k1=max(k-wt,0)
    k2=min(k+wb+1,N)
    l1=max(l-wl,0)
    l2=min(l+wr+1,N)
    return a[k,l]-np.amin(a[k1:k2,l1:l2])
H=np.array([[h(k,l) for l in range(N)] for k in range(N)]) #running over all matrix elements
print(a)
print(H)
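The per-cell loop can also be replaced by a vectorized sliding window. This is a sketch assuming a square window of side 2*radius+1 and NumPy 1.20 or newer (for sliding_window_view); the helper name `subtract_local_min` is hypothetical. Edge padding repeats border values, so the minimum over each window equals the minimum over the window clipped to the array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def subtract_local_min(a, radius=1):
    """Return H with H[i, j] = a[i, j] - min of the window around (i, j)."""
    a = np.asarray(a)
    size = 2 * radius + 1
    # Repeat border values so every cell has a full window
    padded = np.pad(a, radius, mode='edge')
    # windows[i, j] is the size-by-size neighbourhood of a[i, j]
    windows = sliding_window_view(padded, (size, size))
    return a - windows.min(axis=(-2, -1))

test = np.ones((10, 10)) * np.arange(10)
H = subtract_local_min(test)
```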

"shape mismatch" error using numpy in python

I am trying to generate a random array of 0s and 1s, and I am getting the error: shape mismatch: objects cannot be broadcast to a single shape. The error seems to be occurring in the line randints = np.random.binomial(1,p,(n,n)). Here is the function:
import numpy as np
def rand(n,p):
    '''Generates a random array subject to parameters p and N.'''
    # Create an array using a random binomial with one trial and specified
    # parameters.
    randints = np.random.binomial(1,p,(n,n))
    # Use nested while loops to go through each element of the array
    # and assign True to 1 and False to 0.
    i = 0
    j = 0
    rand = np.empty(shape = (n,n),dtype = bool)
    while i < n:
        while j < n:
            if randints[i][j] == 0:
                rand[i][j] = False
            if randints[i][j] == 1:
                rand[i][j] = True
            j = j+1
        i = i +1
        j = 0
    # Return the new array.
    return rand
print rand
When I run it by itself, it returns <function rand at 0x1d00170>. What does this mean? How should I convert it to an array that can be worked with in other functions?
You needn't go through all of that,
randints = np.random.binomial(1,p,(n,n))
produces your array of 0 and 1 values,
rand_true_false = randints == 1
will produce another array, just with the 1s replaced with True and 0s with False.
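Put together, the two lines above fit in a small function. A minimal sketch, assuming NumPy's newer Generator API is acceptable; the name `random_bool_array` and the `seed` parameter are hypothetical additions for reproducibility.

```python
import numpy as np

def random_bool_array(n, p, seed=None):
    """Return an n-by-n boolean array whose entries are True with probability p."""
    rng = np.random.default_rng(seed)
    # One Bernoulli trial per cell: 1 with probability p, else 0
    randints = rng.binomial(1, p, size=(n, n))
    return randints.astype(bool)

sample = random_bool_array(5, 0.3, seed=0)
```

The function must be called, for example `random_bool_array(5, 0.3)`, to get an array; printing the bare function name only shows the function object.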
Obviously, the answer by @danodonovan is the most Pythonic, but if you really want something closer to your looping code, here is an example that fixes the name conflicts and loops more simply.
import numpy as np
def my_rand(n,p):
    '''Generates a random array subject to parameters p and N.'''
    # Create an array using a random binomial with one trial and specified
    # parameters.
    randInts = np.random.binomial(1,p,(n,n))
    # Use nested for loops to go through each element of the array
    # and assign True to 1 and False to 0.
    randBool = np.empty(shape = (n,n),dtype = bool)
    for i in range(n):
        for j in range(n):
            if randInts[i][j] == 0:
                randBool[i][j] = False
            else:
                randBool[i][j] = True
    return randBool
newRand = my_rand(5,0.3)

