MATLAB operation to Python

I'm having trouble translating this operation from MATLAB to Python:
xup(1:ncomp,1)=aa(1+k:ncomp+k).';
"aa" is a vector of 1x1000 elements.
"ncomp" = 128
"k" is a variable for a loop cycle.
The problem is that I don't understand how it works.
I'm posting the whole section of the algorithm:
while (testnorm > 0.0001 && epoca < maxit)
    k = 0;
    xup = [];
    while (k <= npatt)
        xup(1:ncomp,1) = aa(1+k:ncomp+k).';
        if (funz == 1)
            gy = tanh(alpha.*xup.'*w);
        else
            gy = sign(xup.'*w).*log(1+abs(alpha*xup.'*w));
        end
        w = w + lr.*(xup*gy - w*triu((xup.'*w).'*gy));
        w = w / norm(w);
        k = k + 1;
    end
    [...]
end
Can you help?

Essentially, what this line
xup(1:ncomp,1)=aa(1+k:ncomp+k).';
is doing is fetching data from the vector aa, from index 1+k to index ncomp+k (i.e. ncomp elements in total), and transposing those elements from a single row into ncomp rows. The transposed data is then inserted into xup as ncomp rows, where each row holds a single element. Similar Python code would be:
import numpy as np

aa = np.array([i for i in range(1000)])
ncomp = 128
k = 0
xup = [[0] for i in range(ncomp)]
while k < 10:
    # note: np.transpose is a no-op on a 1-D array
    data = np.transpose(aa[k:ncomp + k])
    for i in range(ncomp):
        xup[i] = data[i]
    k += 1
print(xup[:128])
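For what it's worth, the whole assignment can also be written as a single slice plus reshape; this is a sketch of my own, assuming 0-based Python indexing replaces MATLAB's 1-based 1+k:

import numpy as np

aa = np.arange(1000)
ncomp, k = 128, 0
# MATLAB: xup(1:ncomp,1) = aa(1+k:ncomp+k).'
xup = aa[k:ncomp + k].reshape(ncomp, 1)  # column vector, shape (128, 1)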


Optimize code for step function using only NumPy

I'm trying to optimize the function 'pw' in the following code using only NumPy functions (or perhaps list comprehensions).
from time import time
import numpy as np

def pw(x, udata):
    """
    Creates the step function
                 | 1, if d0 <= x < d1
                 | 2, if d1 <= x < d2
    pw(x,data) = | ...
                 | N, if d(N-1) <= x < dN
                 | 0, otherwise
    where di is the ith element in data.
    INPUT:  x    -- interval which the step function is defined over
            data -- an ordered set of data (without repetitions)
    OUTPUT: pw_func -- an array of size x.shape[0]
    """
    vals = np.arange(1, udata.shape[0]+1).reshape(udata.shape[0], 1)
    pw_func = np.sum(np.where(np.greater_equal(x, udata)*np.less(x, np.roll(udata, -1)), vals, 0), axis=0)
    return pw_func
N = 50000
x = np.linspace(0,10,N)
data = [1,3,4,5,5,7]
udata = np.unique(data)
ti = time()
pw(x,udata)
tf = time()
print(tf - ti)
import cProfile
cProfile.run('pw(x,udata)')
cProfile.run tells me that most of the overhead is coming from np.where (about 1 ms), but I'd like to make the code faster if possible. It seems that performing the operations row-wise versus column-wise makes some difference, unless I'm mistaken, but I think I've accounted for it. I know that sometimes list comprehensions can be faster, but I couldn't figure out a faster way than what I'm doing using them.
searchsorted seems to yield better performance, but that 1 ms still remains on my computer (modified version):
def pw(xx, uu):
    """
    Creates the step function
                 | 1, if d0 <= x < d1
                 | 2, if d1 <= x < d2
    pw(x,data) = | ...
                 | N, if d(N-1) <= x < dN
                 | 0, otherwise
    where di is the ith element in data.
    INPUT:  x    -- interval which the step function is defined over
            data -- an ordered set of data (without repetitions)
    OUTPUT: pw_func -- an array of size x.shape[0]
    """
    inds = np.searchsorted(uu, xx, side='right')
    vals = np.arange(1, uu.shape[0]+1)
    pw_func = vals[inds[inds != uu.shape[0]]]
    num_mins = np.sum(xx < np.min(uu))
    num_maxs = np.sum(xx > np.max(uu))
    pw_func = np.concatenate((np.zeros(num_mins), pw_func, np.zeros(xx.shape[0]-pw_func.shape[0]-num_mins)))
    return pw_func
This answer using piecewise seems pretty close, but that's on a scalar x0 and x1. How would I do it on arrays? And would it be more efficient?
Understandably, x may be pretty big but I'm trying to put it through a stress test.
I am still learning though so some hints or tricks that can help me out would be great.
EDIT
There seems to be a mistake in the second function, since the resulting array from the second function doesn't match the first one (which I'm confident works):
N1 = pw1(x,udata.reshape(udata.shape[0],1)).shape[0]
N2 = np.sum(pw1(x,udata.reshape(udata.shape[0],1)) == pw2(x,udata))
print(N1 - N2)
yields
15000
data points that are not the same. So it seems that I don't know how to use 'searchsorted'.
EDIT 2
Actually I fixed it:
pw_func = vals[inds[inds != uu.shape[0]]]
was changed to
pw_func = vals[inds[inds[(inds != uu.shape[0])*(inds != 0)]-1]]
so at least the resulting arrays match. But the question remains whether there's a more efficient way of doing this.
EDIT 3
Thanks Tin Lai for pointing out the mistake. This one should work:
pw_func = vals[inds[(inds != uu.shape[0])*(inds != 0)]-1]
Maybe a more readable way of presenting it would be
non_endpts = (inds != uu.shape[0])*(inds != 0) # only consider the points in between the min/max data values
shift_inds = inds[non_endpts]-1 # searchsorted side='right' includes the left end point and not right end point so a shift is needed
pw_func = vals[shift_inds]
I think I got lost in all those brackets! I guess that's the importance of readability.
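To make the shift concrete, here is a tiny worked example of what searchsorted with side='right' returns (the values are chosen arbitrarily for illustration):

import numpy as np

uu = np.array([1., 3., 4., 5., 7.])       # sorted data
xx = np.array([0., 1., 2., 6.9, 7., 8.])  # query points
inds = np.searchsorted(uu, xx, side='right')
print(inds)  # [0 1 1 4 5 5]
# inds == 0 (x below the min) and inds == len(uu) (x at/above the max) are
# the endpoint cases that get filtered out; vals[inds - 1] then gives the
# step value, e.g. x = 2 -> inds = 1 -> vals[0] = 1, since 1 <= 2 < 3.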
A very abstract yet interesting problem! Thanks for entertaining me, I had fun :)
P.S. I'm not sure about your pw2; I wasn't able to get it to output the same as pw1.
For reference, the original pws:
def pw1(x, udata):
    vals = np.arange(1, udata.shape[0]+1).reshape(udata.shape[0], 1)
    pw_func = np.sum(np.where(np.greater_equal(x, udata)*np.less(x, np.roll(udata, -1)), vals, 0), axis=0)
    return pw_func

def pw2(xx, uu):
    inds = np.searchsorted(uu, xx, side='right')
    vals = np.arange(1, uu.shape[0]+1)
    pw_func = vals[inds[inds[(inds != uu.shape[0])*(inds != 0)]-1]]
    num_mins = np.sum(xx < np.min(uu))
    num_maxs = np.sum(xx > np.max(uu))
    pw_func = np.concatenate((np.zeros(num_mins), pw_func, np.zeros(xx.shape[0]-pw_func.shape[0]-num_mins)))
    return pw_func
My first attempt utilised a lot of broadcasting operations from numpy:
def pw3(x, udata):
    # the None slice creates a new axis
    step_bool = x >= udata[None, :].T
    # we exploit the fact that bools have integer value 1,
    # skipping the last value in "data"
    step_vals = np.sum(step_bool[:-1], axis=0)
    # for the step_bool that we skipped in the previous step (the last index)
    # we set it to zero, so that we can negate the step_vals once we reach
    # the last value in "data"
    step_vals[step_bool[-1]] = 0
    return step_vals
After looking at the searchsorted in your pw2, I took a new approach that utilises it with much higher performance:
def pw4(x, udata):
    inds = np.searchsorted(udata, x, side='right')
    # fix up the trailing values if x already goes past the range of udata[-1]
    if x[-1] > udata[-1]:
        inds[inds == inds[-1]] = 0
    return inds
Plots with:
plt.plot(pw1(x,udata.reshape(udata.shape[0],1)), label='pw1')
plt.plot(pw2(x,udata), label='pw2')
plt.plot(pw3(x,udata), label='pw3')
plt.plot(pw4(x,udata), label='pw4')
with data = [1,3,4,5,5,7]: (plot not reproduced here)
with data = [1,3,4,5,5,7,11]: (plot not reproduced here)
pw1, pw3, and pw4 are all identical:
print(np.all(pw1(x,udata.reshape(udata.shape[0],1)) == pw3(x,udata)))
>>> True
print(np.all(pw1(x,udata.reshape(udata.shape[0],1)) == pw4(x,udata)))
>>> True
Performance (Timer.repeat runs 3 trials by default; each value is the total time for number=1000 executions):
print(timeit.Timer('pw1(x,udata.reshape(udata.shape[0],1))', "from __main__ import pw1, x, udata").repeat(number=1000))
>>> [3.1938983199979702, 1.6096494779994828, 1.962694135003403]
print(timeit.Timer('pw2(x,udata)', "from __main__ import pw2, x, udata").repeat(number=1000))
>>> [0.6884554479984217, 0.6075002400029916, 0.7799002879983163]
print(timeit.Timer('pw3(x,udata)', "from __main__ import pw3, x, udata").repeat(number=1000))
>>> [0.7369808239964186, 0.7557657590004965, 0.8088172269999632]
print(timeit.Timer('pw4(x,udata)', "from __main__ import pw4, x, udata").repeat(number=1000))
>>> [0.20514375300263055, 0.20203858999957447, 0.19906871100101853]

Can anybody explain this simple Python code to me?

I have a simple function which returns a matrix of zeros and ones. I can't understand how the line out[range(n), vec] = 1 works. Vector v can have values from 0 to 9.
import numpy as np

def one_hot_encode(vec, vals=10):
    n = len(vec)
    out = np.zeros((n, vals))
    out[range(n), vec] = 1
    return out

v = [1,2,3,1,3,5,7,8,9,1,2,3,4,5,6,7,8,9,0,1,2,3,1,3,5,7,8,9,1,2,3]
one_hot_encode(v, 10)
The line out[range(n), vec] = 1 places the 1s according to the values of vec: if the first value in vec is 1, then in the out matrix the first row, second column (value + 1) is set to 1; if the 4th value is 1, then the 4th row, second column is set to 1.
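To see NumPy's integer (fancy) indexing at work, here is a minimal example of my own:

import numpy as np

out = np.zeros((3, 10))
vec = [1, 4, 0]
# The two index arrays are paired element-wise:
# out[0, 1] = 1; out[1, 4] = 1; out[2, 0] = 1
out[range(3), vec] = 1
print(out.argmax(axis=1))  # [1 4 0]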

Pandas/Numpy Changed File Type Read in Now Get Calculation Error

I was reading an Excel file with Pandas, extracting a NumPy array from 2 sets of 9 columns, and creating a complex-number array for complex matrix multiplication. The original data is just decimal values, but the column a value comes from decides whether it is real or imaginary. I originally wrote the code reading from an xlsx Excel file; then I changed the source to a csv file. I get the same individual 3x3 matrices in both cases, which I am combining into a complex matrix. When I run the code, I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-107-e87255037c7f> in <module>()
11 while counter < count_net_r:
12 n = counter # int
---> 13 net = (net_r[n] + 1.0j * net_x[n])
14 counter = counter + 1
15 net_seq.append(net)
TypeError: can't multiply sequence by non-int of type 'complex'
My code for reading in the file:
df = pd.read_csv('Report.csv')

# uses Pandas .loc to extract and create the network equivalent R per phase components into an array
# for multiplication to calculate the sequence matrix - REAL Part
r = df.loc[df['Code\n'] == 'Network Equivalent', 'NetEq Z R AA':'NetEq Z R CC']
rr = r.shape[0]
net_r = r.values.reshape(rr,3,3)

# uses Pandas .loc to extract and create the network equivalent X per phase components into an array
# for multiplication to calculate the sequence matrix - COMPLEX Part
x = df.loc[df['Code\n'] == 'Network Equivalent', 'NetEq Z X AA':'NetEq Z X CC']
xx = x.shape[0]
net_x = x.values.reshape(xx,3,3)

# loop to concatenate the R & X into one array of complex numbers
# if the R and X matrices are of unequal lengths then prints a message so it can be solved and attempted again
count_net_r = len(net_r)  # int
count_net_x = len(net_x)  # int
net = []      # list
net_seq = []  # list
counter = 0
if count_net_r != count_net_x:
    print('Network Equivalent matrices are not equivalent')
else:
    while counter < count_net_r:
        n = counter  # int
        net = (net_r[n] + 1.0j * net_x[n])
        counter = counter + 1
        net_seq.append(net)
net_seq = np.array(net_seq)
All I changed is how the file is read in. So what do I need to change to get this code to work? Or, is there a better way?
So I have no clue why, but all I did was change the type on creation of the two arrays.
# uses Pandas .loc to extract and create the network equivalent R per phase components into an array
# for multiplication to calculate the sequence matrix - REAL Part
r = df.loc[df['Code\n'] == 'Network Equivalent', 'NetEq Z R AA':'NetEq Z R CC'].astype('float')
rr = r.shape[0]
net_r = r.values.reshape(rr,3,3)
# uses Pandas .loc to extract and create the network equivalent X per phase components into an array
# for multiplication to calculate the sequence matrix - COMPLEX Part
x = df.loc[df['Code\n'] == 'Network Equivalent', 'NetEq Z X AA':'NetEq Z X CC'].astype('float')
xx = x.shape[0]
net_x = x.values.reshape(xx,3,3)
I thought that even with a csv file a number is a number. Guess I was wrong: read_csv evidently parsed those columns as strings, giving the arrays object dtype, so 1.0j * net_x[n] was multiplying a string (a Python sequence) by a complex number, which is exactly the TypeError above. .astype('float') converts the values to actual numbers first.
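For illustration, a minimal sketch of the failure mode, assuming the CSV columns came in as strings (the values here are made up):

import numpy as np

# strings parsed from a CSV end up in an object-dtype array
net_x = np.array([['1.5', '2.0']], dtype=object)
# 1.0j * net_x[0] -> TypeError: can't multiply sequence by non-int of type 'complex'
print(1.0j * net_x[0].astype(float))  # [0.+1.5j 0.+2.j]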

efficient way to split temporal Numpy vector automatically

I have a temporal vector as in the following image (NumPy vector):
https://drive.google.com/file/d/0B4Jac-wNMDxHS3BnUzBoUkdmOGs/view?usp=sharing
I would like to know an efficient way to split the vector in numpy, and extract the 5 chunks of the signals that drop in amplitude significantly.
I could separate them by taking 2.302 as the cut-off amplitude, splitting at the initial index where the signal drops below this value and the final index where it goes back above it.
Any efficient way to do this in numpy?
So I've programmed the solution in pure Python with lists:
import numpy as np
import matplotlib.pyplot as plt

vec = np.load('vector_numpy.npy')
# plt.plot(vec)
# plt.show()
print(vec.shape)

temporal_vec = []
flag = 0
flag_start = 0
flag_end = 0
all_vectors = []
all_index = []
count = -1
for element in vec:
    count = count + 1
    # print(element)
    if element < 2.302:
        if flag_start == 0:
            all_index.append(count)
            flag_start = 1
        temporal_vec.append(element)
        flag = 1
    if flag == 1:
        if element >= 2.302:
            if flag_start == 1:
                all_index.append(count)
                flag_start = 0
            all_vectors.append(temporal_vec)
            temporal_vec = []
            flag = 0

print(all_vectors)
for element in all_vectors:
    print(len(element))
    plt.plot(element)
    plt.show()
print(all_index)
Any fancier way in NumPy, or better/shorter Python code?
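One vectorized way to do it, as a sketch assuming the same 2.302 cut-off and the same vector_numpy.npy file:

import numpy as np

vec = np.load('vector_numpy.npy')
below = vec < 2.302                      # True inside the low-amplitude chunks
# np.diff flags the positions where the mask flips; +1 moves from the last
# index before a transition to the first index after it
change = np.flatnonzero(np.diff(below.astype(np.int8))) + 1
if below[0]:                             # vector starts inside a chunk
    change = np.r_[0, change]
if below[-1]:                            # vector ends inside a chunk
    change = np.r_[change, len(vec)]
starts, ends = change[0::2], change[1::2]
all_vectors = [vec[s:e] for s, e in zip(starts, ends)]
all_index = change.tolist()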

Does NumPy have a function equivalent to Matlab's buffer?

I see there are array_split and split methods, but these are not very handy when you have to split an array whose length is not an integer multiple of the chunk size. Moreover, these methods' input is the number of slices rather than the slice size. I need something more like MATLAB's buffer method, which is more suitable for signal processing.
For example, if I want to buffer a signal into chunks of size 60 I need to do: np.vstack(np.hsplit(x.iloc[0:((len(x)//60)*60)], len(x)//60)) which is cumbersome.
I wrote the following routine to handle the use cases I needed, but I have not implemented/tested for "underlap".
Please feel free to make suggestions for improvement.
def buffer(X, n, p=0, opt=None):
    '''Mimic MATLAB routine to generate buffer array

    MATLAB docs here: https://se.mathworks.com/help/signal/ref/buffer.html

    Parameters
    ----------
    X : ndarray
        Signal array
    n : int
        Length of each data segment
    p : int
        Number of values to overlap
    opt : str
        Initial condition options. The default sets the first `p` values
        to zero, while 'nodelay' begins filling the buffer immediately.

    Returns
    -------
    result : (n, m) ndarray
        Buffer array created from X
    '''
    import numpy as np

    if opt not in [None, 'nodelay']:
        raise ValueError('{} not implemented'.format(opt))

    i = 0
    first_iter = True
    while i < len(X):
        if first_iter:
            if opt == 'nodelay':
                # No zeros at array start
                result = X[:n]
                i = n
            else:
                # Start with `p` zeros
                result = np.hstack([np.zeros(p), X[:n-p]])
                i = n - p
            # Make 2D array and pivot
            result = np.expand_dims(result, axis=0).T
            first_iter = False
            continue

        # Create next column, add `p` results from last col if given
        col = X[i:i+(n-p)]
        if p != 0:
            col = np.hstack([result[:, -1][-p:], col])
        i += n - p

        # Append zeros if last column is not length `n`
        if len(col) < n:
            col = np.hstack([col, np.zeros(n - len(col))])

        # Combine result with next column
        result = np.hstack([result, np.expand_dims(col, axis=0).T])

    return result
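A quick usage check of this routine (my own example, mirroring MATLAB's buffer(1:10, 4, 2)):

import numpy as np

print(buffer(np.arange(1, 11), 4, 2))
# [[ 0.  1.  3.  5.  7.]
#  [ 0.  2.  4.  6.  8.]
#  [ 1.  3.  5.  7.  9.]
#  [ 2.  4.  6.  8. 10.]]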
def buffer(X=np.array([]), n=1, p=0):
    # buffers data vector X into length-n column vectors with overlap p
    # excess data at the end of X is discarded
    n = int(n)                           # length of each data vector
    p = int(p)                           # overlap of data vectors, 0 <= p < n-1
    L = len(X)                           # length of data to be buffered
    m = int(np.floor((L-n)/(n-p)) + 1)   # number of sample vectors (no padding)
    data = np.zeros([n, m])              # initialize data matrix
    for startIndex, column in zip(range(0, L-n, n-p), range(0, m)):
        data[:, column] = X[startIndex:startIndex + n]  # fill in by column
    return data
This Keras function may be considered a Python equivalent of MATLAB's buffer(). See the sample code:
import numpy as np
import tensorflow.keras.preprocessing as kp

S = np.arange(1, 99)  # a demo array
list(kp.timeseries_dataset_from_array(S, targets=None, sequence_length=7, sequence_stride=7, batch_size=5))
Same as the other answer, but faster.
def buffer(X, n, p=0):
    '''
    Parameters
    ----------
    X : ndarray
        Signal array
    n : int
        Length of each data segment
    p : int
        Number of values to overlap

    Returns
    -------
    result : (n, m) ndarray
        Buffer array created from X
    '''
    import numpy as np

    d = n - p
    m = len(X)//d
    if m * d != len(X):
        m = m + 1
    Xn = np.zeros(d*m)
    Xn[:len(X)] = X
    Xn = np.reshape(Xn, (m, d))
    Xne = np.concatenate((Xn, np.zeros((1, d))))
    Xn = np.concatenate((Xn, Xne[1:, 0:p]), axis=1)
    return np.transpose(Xn[:-1])
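A quick check of this version (my own example): note that, unlike the routine above, it has no leading zeros (i.e. it behaves like the 'nodelay' case), and the final frame is always dropped:

import numpy as np

print(buffer(np.arange(1, 11), 4, 2))
# [[ 1.  3.  5.  7.]
#  [ 2.  4.  6.  8.]
#  [ 3.  5.  7.  9.]
#  [ 4.  6.  8. 10.]]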
ryanjdillon's answer rewritten for a significant performance improvement; it appends to a list instead of concatenating arrays, the latter of which copies the array on every iteration and is much slower.
def buffer(x, n, p=0, opt=None):
    import numpy as np

    if opt not in ('nodelay', None):
        raise ValueError('{} not implemented'.format(opt))

    i = 0
    if opt == 'nodelay':
        # No zeros at array start
        result = x[:n]
        i = n
    else:
        # Start with `p` zeros
        result = np.hstack([np.zeros(p), x[:n-p]])
        i = n - p
    # Make 2D array, cast to list for .append()
    result = list(np.expand_dims(result, axis=0))

    while i < len(x):
        # Create next column, add `p` results from last col if given
        col = x[i:i+(n-p)]
        if p != 0:
            col = np.hstack([result[-1][-p:], col])

        # Append zeros if last column is not length `n`
        if len(col) < n:
            col = np.hstack([col, np.zeros(n - len(col))])

        # Combine result with next column
        result.append(np.array(col))
        i += (n - p)

    return np.vstack(result).T
def buffer(X, n, p=0):
    '''
    Parameters
    ----------
    X : ndarray
        Signal array; input a long vector, e.g. a raw speech wav
    n : int
        Frame length
    p : int
        Number of values to overlap

    Returns
    -------
    result : (n, m) ndarray
        Buffer array created from X
    '''
    import numpy as np

    d = n - p
    m = len(X)//d
    c = n//d
    if m * d != len(X):
        m = m + 1
    Xn = np.zeros(d*m)
    Xn[:len(X)] = X
    Xn = np.reshape(Xn, (m, d))
    Xn_out = Xn
    for i in range(c-1):
        Xne = np.concatenate((Xn, np.zeros((i+1, d))))
        Xn_out = np.concatenate((Xn_out, Xne[i+1:, :]), axis=1)
    if n - d*c > 0:
        Xne = np.concatenate((Xn, np.zeros((c, d))))
        Xn_out = np.concatenate((Xn_out, Xne[c:, :n-p*c]), axis=1)
    return np.transpose(Xn_out)
This is an improved version of Ali Khodabakhsh's sample code, which did not work in my case. Feel free to comment and use it.
Comparing the execution times of the proposed answers, by running
import numpy as np
from timeit import default_timer as timer  # assuming timeit's default_timer was intended

x = np.arange(1, 200000)
start = timer()
y = buffer(x, 60, 20)
end = timer()
print(end - start)
the results are:
Andrzej May, 0.005595300000095449
OverLordGoldDragon, 0.06954789999986133
ryanjdillon, 2.427092700000003
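As a footnote, on NumPy 1.20+ the no-padding 'nodelay' case can also be written as a zero-copy strided view (a sketch of my own, assuming the trailing partial frame may be dropped):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def buffer_nodelay(x, n, p=0):
    # Each column is a length-n frame; consecutive frames overlap by p
    # samples. Trailing samples that don't fill a whole frame are dropped.
    return sliding_window_view(x, n)[::n - p].T

x = np.arange(1, 200000)
y = buffer_nodelay(x, 60, 20)  # shape (60, m), no data copied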
