What is the best way to implement 1D-Convolution in python? - python

I am trying to implement 1D-convolution for signals.
It should have the same output as:
ary1 = np.array([1, 1, 2, 2, 1])
ary2 = np.array([1, 1, 1, 3])
conv_ary = np.convolve(ary2, ary1, 'full')
print(conv_ary)
# [1 2 4 8 8 9 7 3]
I came up with this approach:
def convolve_1d(signal, kernel):
    n_sig = signal.size
    n_ker = kernel.size
    n_conv = n_sig - n_ker + 1
    # by a factor of 3.
    rev_kernel = kernel[::-1].copy()
    result = np.zeros(n_conv, dtype=np.double)
    for i in range(n_conv):
        result[i] = np.dot(signal[i: i + n_ker], rev_kernel)
    return result
But my result is [8, 8]. I might have to zero-pad my array instead and change its indexing.
Is there a smoother way to achieve the desired outcome?

Here is a possible solution:
def convolve_1d(signal, kernel):
    kernel = kernel[::-1]
    return [
        np.dot(
            signal[max(0,i):min(i+len(kernel),len(signal))],
            kernel[max(-i,0):len(signal)-i*(len(signal)-len(kernel)<i)],
        )
        for i in range(1-len(kernel),len(signal))
    ]
Here is an example:
>>> convolve_1d([1, 1, 2, 2, 1], [1, 1, 1, 3])
[1, 2, 4, 8, 8, 9, 7, 3]
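The zero-padding idea raised in the question can also be written out directly. The following is a minimal sketch, not the answerer's method: the function name convolve_1d_padded is made up for illustration, and it assumes 1-D numeric inputs. It pads the signal with len(kernel) - 1 zeros on each side, so that the sliding-window loop from the question covers every partial overlap.

```python
import numpy as np

def convolve_1d_padded(signal, kernel):
    """Full 1D convolution via explicit zero padding (illustrative sketch)."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    n_ker = kernel.size
    # n_ker - 1 zeros on each side let the window hang off both ends.
    padded = np.pad(signal, n_ker - 1, mode="constant")
    rev_kernel = kernel[::-1]
    n_out = signal.size + n_ker - 1  # length of the 'full' output
    return np.array(
        [np.dot(padded[i:i + n_ker], rev_kernel) for i in range(n_out)]
    )

result = convolve_1d_padded([1, 1, 2, 2, 1], [1, 1, 1, 3])
print(result)
```

Because the window always has the kernel's length, this version needs no special-case slicing, and it should also cope with a kernel that is longer than the signal.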

Related

How to create a sequence of sequences of numbers in NumPy?

Inspired by the post How to create a sequence of sequences of numbers in R?.
Question:
I would like to make the following sequence in NumPy.
[1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
I have tried the following:
Non-generic and hard coding using np.r_
np.r_[1:6, 2:6, 3:6, 4:6, 5:6]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
Pure Python to generate the desired array.
n = 5
a = np.r_[1:n+1]
[i for idx in range(a.shape[0]) for i in a[idx:]]
# [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
Create a 2D array and take the upper triangle from it.
n = 5
a = np.r_[1:n+1]
arr = np.tile(a, (n, 1))
print(arr)
# [[1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]
# [1 2 3 4 5]]
o = np.triu(arr).flatten()
# array([1, 2, 3, 4, 5,
# 0, 2, 3, 4, 5,
# 0, 0, 3, 4, 5, # This is 1D array
# 0, 0, 0, 4, 5,
# 0, 0, 0, 0, 5])
out = o[o > 0]
# array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
The above solution is generic but I want to know if there's a more efficient way to do it in NumPy.
I'm not sure if this is a good idea but I tried running it against your python method and it seems to be faster.
np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
Here is the full code:
import numpy as np
from time import time
n = 5000
t = time()
c = np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
print(time() - t)
# 0.039876699447631836
t = time()
a = np.r_[1:n+1]
b = np.array([i for idx in range(a.shape[0]) for i in a[idx:]])
print(time() - t)
# 2.0875167846679688
print(all(b == c))
# True
A really plain Python (no numpy) way is:
n = 5
a = [r for start in range(1, n+1) for r in range(start, n+1)]
This will be faster for small n (up to about 150) but slower than @tangolin's solution for larger n. It is still faster than the OP's "pure Python" way.
A faster implementation prepares the data in advance, avoiding creating a new range each time:
source = np.arange(1, n+1)
d = np.concatenate([source[i: n+1] for i in range(0, n)])
NOTE
My original implementation both allocated space for the return value and prepared the data in advance, but it was not Pythonic. I changed it to use concatenate after reading @tangolin's answer and noticing that concatenate does the same.
Original implementation:
e = np.empty((n*(n+1)//2, ), dtype='int64')
source = np.arange(1, n+1)
for i in range(n):
    init = n * i - i*(i-1)//2
    end = n - i + init
    e[init:end] = source[i:n]
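For completeness, here is a loop-free variant that none of the answers above proposed. It is only a sketch and has not been benchmarked against them. It relies on np.triu_indices returning the upper-triangle indices in row-major order, so the column indices already form the wanted pattern, shifted by one. As far as I know np.triu_indices builds an n-by-n mask internally, so memory use grows quadratically with n.

```python
import numpy as np

n = 5
# Column indices of the upper triangle, read row by row, are
# 0..n-1, 1..n-1, ..., n-1; adding 1 gives the wanted values.
rows, cols = np.triu_indices(n)
seq = cols + 1
print(seq)
```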

How to replace all the items of an array with the indexes of specified lists?

I want to replace every item of sequence with an id that tells which list of labeller it belongs to. Assume that all values are distinct in both sequence and labeller, and that the union of the lists in labeller contains the same items as sequence. lsizes holds the sizes of the lists in labeller; it is redundant for a pure-Python solution but might be needed for a fully vectorised one.
sequence = [1, 2, 10, 5, 6, 4, 3, 8, 7, 9]
labeller = [[1, 2, 10], [3, 4, 5, 6, 7], [8, 9]]
lsizes = [3, 5, 2]
I know how to solve it in a simple way:
idx = {u:i for i, label in enumerate(labeller) for u in label}
tags = [idx[u] for u in sequence]
And the output is:
tags = [0, 0, 0, 1, 1, 1, 1, 2, 1, 2]
After that I put all my effort into doing it in a vectorised way, which is quite complicated for me. This is my attempt, done rather by guesswork; unfortunately, it doesn't pass all my tests. I hope I'm close:
sequence = np.array(sequence)
cl = np.concatenate(labeller)
_, cl_idx = np.unique(cl, return_index=True)
_, idx = np.unique(sequence[cl_idx], return_index=True)
tags = np.repeat(np.arange(len(lsizes)), lsizes)[idx]
#output: [0 0 1 1 0 1 1 1 2 2]
How can I finish it? I would also like to see a rigorous explanation of what it does and how to understand it better. Any sources are also welcome.
Approach #1
For those tracing back problems, searchsorted seems to be the way to go and works here too, re-using your cl -
cl = np.concatenate(labeller)
sidx = cl.argsort()
idx = np.searchsorted(cl, sequence, sorter=sidx)
idx0 = sidx[idx]
l = list(map(len, labeller))
r = np.repeat(np.arange(len(l)), l)
out = r[idx0]
Using lsizes for l makes it fully vectorized. But, I suspect the concatenation step might be heavy. Whether this is worth it or not would depend a lot on the lengths of the subarrays.
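To make the steps of Approach #1 easier to follow, here is the same code run on the question's data with each intermediate array commented. It is a sketch for illustration only; the only change is that lsizes is used directly in place of l.

```python
import numpy as np

sequence = np.array([1, 2, 10, 5, 6, 4, 3, 8, 7, 9])
labeller = [[1, 2, 10], [3, 4, 5, 6, 7], [8, 9]]
lsizes = [3, 5, 2]

cl = np.concatenate(labeller)   # all labels in one flat array
sidx = cl.argsort()             # positions in cl, ordered by value
# Where each sequence item sits in the *sorted* view of cl.
idx = np.searchsorted(cl, sequence, sorter=sidx)
idx0 = sidx[idx]                # map back to positions in the unsorted cl
# List id for every position in cl: 0,0,0,1,1,1,1,1,2,2
r = np.repeat(np.arange(len(lsizes)), lsizes)
tags = r[idx0]
print(tags)
```

The key invariant is cl[idx0] == sequence: idx0 holds, for every item of sequence, its position in the concatenated labels, and r turns that position into a list id.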
Approach #2
For positive numbers, here's one with array-indexing as a hashing mechanism -
N = max(map(max, labeller))+1
id_ar = np.zeros(N, dtype=int) # use np.empty for perf. boost
for i,l in enumerate(labeller):
    id_ar[l] = i
out = id_ar[sequence]
sequence = [1, 2, 10, 5, 6, 4, 3, 8, 7, 9]
labeller = [[1, 2, 10], [3, 4, 5, 6, 7], [8, 9]]
lsizes = [3, 5, 2]
sequence_array = np.array(sequence)
labeller_array = np.concatenate(labeller)  # flatten the lists; np.array(labeller) fails on ragged lists in recent NumPy
index_array = np.repeat(list(range(len(lsizes))), lsizes)
np.apply_along_axis(lambda num : index_array[np.where(labeller_array == num)[0]], 0, sequence_array[None, :])
# output: array([[0, 0, 0, 1, 1, 1, 1, 2, 1, 2]])
Alternative, using pandas:
import pandas as pd
label_df = pd.DataFrame({'label':labeller_array, 'index':index_array})
seq_df = pd.DataFrame({'seq':sequence_array})
seq_df.merge(label_df, left_on = 'seq', right_on = 'label')['index'].tolist()
#output: [0, 0, 0, 1, 1, 1, 1, 2, 1, 2]

Why are my answer's decimals incorrect in Python?

I am working on a question that requires solving Ax = b using the Jacobi method, where the function must return x and the 2-norm of x.
The question states that when b is input as b = [71; 42; -11; -37; -61; 52]^T and N = 2, the answer that I am supposed to get is x = [2.73186728; 1.44791667; 0.62885802; 6.32696759; 6.390625; 3.33012821], and the norm of x is 10.0953.
However, I get x = [3.07507642; 0.58675203; -0.64849988; 5.33343053; 6.66765397; 4.16161712], and the 2-norm of x is 10.0221.
I am trying to find the error in my code but am finding it difficult. Below is my code:
import numpy as np
from numpy.linalg import norm
from numpy import array
from scipy.linalg import solve
def jacobi(A, b, x0, N):
    n = A.shape[0]
    x = x0.copy()
    k = 0
    x_prev= x0.copy()
    for i in range(0, n):
        subs = 0.0
        for j in range(0, n):
            if i != j:
                subs += np.matrix(A[i,j])*np.matrix(x_prev[j])
        x[i] = (b[i]-subs)/np.matrix(A[i,i])
        k += 1
    return(x)
A = array([[18, 1, 4, 3, -1, 2],
[2, 12, -1, 7, -2, 1],
[-1, 1, -9, 2, -5, 2],
[2, 4, 1, -12, 1, 3],
[1, 3, 1, 7, -16, 1],
[-2, 1, 7, -1, 2, 13]])
x0 = array([[0],[0],[0],[0],[0],[0]])
elements_of_b_and_N = list(map(float, input().split(' ')))
b_and_N = array(elements_of_b_and_N).reshape(A.shape[0]+1, )
b = b_and_N[:A.shape[0]]
N = b_and_N[A.shape[0]]
x = jacobi(A, b, x0, N)
print((solve(A, b)))
print(round(norm((solve(A,b)), 2), 4))
How did you compute the true value?
The question states that when b is inputted as b = [71; 42; -11; -37; -61; 52]^T and N = 2, the answer that I am supposed to get is x = [2.73186728; 1.44791667; 0.62885802; 6.32696759; 6.390625; 3.33012821] and the norm of x 10.0953
When I execute:
x0 = array([[0], [0], [0], [0], [0], [0]], dtype=float)
A = array([[18, 1, 4, 3, -1, 2],
[2, 12, -1, 7, -2, 1],
[-1, 1, -9, 2, -5, 2],
[2, 4, 1, -12, 1, 3],
[1, 3, 1, 7, -16, 1],
[-2, 1, 7, -1, 2, 13]])
b = array([[71], [42], [-11], [-37], [-61], [52]], dtype=float)
print(solve(A, b))
I get :
[[ 3.07507642]
[ 0.58675203]
[-0.64849988]
[ 5.33343053]
[ 6.66765397]
[ 4.16161712]]
As you do with Jacobi.
Hope this helps :)
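One reading that fits the numbers, offered as an assumption and not something stated in the thread: the expected x looks like the result of exactly N = 2 Jacobi sweeps starting from x0 = 0, not the exact solution of the system. Note that the script in the question prints solve(A, b) and never prints the x returned by jacobi, and that its jacobi function never uses N. The sketch below uses a vectorised update; the parameter name n_iter is mine.

```python
import numpy as np

def jacobi(A, b, x0, n_iter):
    """Run n_iter Jacobi sweeps starting from x0 (illustrative sketch)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.asarray(x0, dtype=float).copy()
    d = np.diag(A)            # diagonal entries A[i, i]
    R = A - np.diagflat(d)    # off-diagonal part of A
    for _ in range(int(n_iter)):
        # Every component is updated from the previous iterate only.
        x = (b - R @ x) / d
    return x

A = np.array([[18, 1, 4, 3, -1, 2],
              [2, 12, -1, 7, -2, 1],
              [-1, 1, -9, 2, -5, 2],
              [2, 4, 1, -12, 1, 3],
              [1, 3, 1, 7, -16, 1],
              [-2, 1, 7, -1, 2, 13]])
b = np.array([71, 42, -11, -37, -61, 52], dtype=float)

x = jacobi(A, b, np.zeros(6), 2)
print(x)
print(round(np.linalg.norm(x, 2), 4))
```

Using float arrays throughout also avoids the truncation that an integer x0 would cause when results are written back into x.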

How to use tf.unique_with_counts for each row/column of a tensor

I'm trying to solve KNN using TensorFlow. After I get the K neighbours for N vectors, I have an N by K tensor. Now, for each vector in N, I need to use tf.unique_with_counts to find the majority vote. However, I cannot iterate over a tensor, and I cannot run tf.unique_with_counts on a multi-dimensional tensor. It keeps giving me InvalidArgumentError (see above for traceback): unique expects a 1D vector.
Example:
def knnVote():
    '''
    KNN using majority vote
    '''
    # nearest indices
    A = tf.constant([1, 1, 2, 4, 4, 4, 7, 8, 8])
    print(A.shape)
    nearest_k_y, idx, votes = tf.unique_with_counts(A)
    print("y", nearest_k_y.eval())
    print("idx", idx.eval())
    print("votes", votes.eval())
    majority = tf.argmax(votes)
    predict_res = tf.gather(nearest_k_y, majority)
    print("majority", majority.eval())
    print("predict", predict_res.eval())
    return predict_res
Result:
y [1 2 4 7 8]
idx [0 0 1 2 2 2 3 4 4]
votes [2 1 3 1 2]
majority 2
predict 4
But how can I extend this to an N by D input A, such as the case when
A = tf.constant([[1, 1, 2, 4, 4, 4, 7, 8, 8],
                 [2, 2, 3, 3, 3, 4, 4, 5, 6]])
You can use tf.while_loop to iterate over A rows and process each row independently. This requires a little bit of dark magic with shape_invariants (to accumulate the results) and careful processing in a loop body. But it becomes more or less clear after you stare at it for some time.
Here's the code:
def multidimensionalKnnVote():
    A = tf.constant([
        [1, 1, 2, 4, 4, 4, 7, 8, 8],
        [2, 2, 3, 3, 3, 4, 4, 5, 6],
    ])
    def cond(i, all_idxs, all_vals):
        return i < A.shape[0]
    def body(i, all_idxs, all_vals):
        nearest_k_y, idx, votes = tf.unique_with_counts(A[i])
        majority_idx = tf.argmax(votes)
        majority_val = nearest_k_y[majority_idx]
        majority_idx = tf.reshape(majority_idx, shape=(1,))
        majority_val = tf.reshape(majority_val, shape=(1,))
        new_idxs = tf.cond(tf.equal(i, 0),
                           lambda: majority_idx,
                           lambda: tf.concat([all_idxs, majority_idx], axis=0))
        new_vals = tf.cond(tf.equal(i, 0),
                           lambda: majority_val,
                           lambda: tf.concat([all_vals, majority_val], axis=0))
        return i + 1, new_idxs, new_vals
    # This means: starting from 0, apply the `body`, while the `cond` is true.
    # Note that `shape_invariants` allow the 2nd and 3rd tensors to grow.
    i0 = tf.constant(0)
    idx0 = tf.constant(0, shape=(1,), dtype=tf.int64)
    val0 = tf.constant(0, shape=(1,), dtype=tf.int32)
    _, idxs, vals = tf.while_loop(cond, body,
                                  loop_vars=(i0, idx0, val0),
                                  shape_invariants=(i0.shape, tf.TensorShape([None]), tf.TensorShape([None])))
    print('majority:', idxs.eval())
    print('predict:', vals.eval())
You can use tf.map_fn to apply a function to each row of a matrix variable:
def knnVote(A):
    nearest_k_y, idx, votes = tf.unique_with_counts(A)
    majority = tf.argmax(votes)
    predict_res = tf.gather(nearest_k_y, majority)
    return predict_res
sess = tf.Session()
with sess.as_default():
    B = tf.constant([[1, 1, 2, 4, 4, 4, 7, 8, 8],
                     [2, 2, 3, 3, 3, 4, 4, 5, 6]])
    C = tf.map_fn(knnVote, B)
    print(C.eval())
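As a way to check what a per-row vote should return, here is the same computation in plain NumPy. This is a swapped-in reference, not TensorFlow code, and the function name majority_vote_rows is made up for illustration. One difference to keep in mind: np.unique returns values in sorted order, while tf.unique_with_counts keeps the order of first occurrence, so the two can break ties differently.

```python
import numpy as np

def majority_vote_rows(neighbours):
    """Most frequent value in each row of a 2D integer array (illustrative)."""
    neighbours = np.asarray(neighbours)
    winners = []
    for row in neighbours:
        values, counts = np.unique(row, return_counts=True)
        # argmax returns the first maximum, so ties go to the smallest value.
        winners.append(values[np.argmax(counts)])
    return np.array(winners)

B = np.array([[1, 1, 2, 4, 4, 4, 7, 8, 8],
              [2, 2, 3, 3, 3, 4, 4, 5, 6]])
votes = majority_vote_rows(B)
print(votes)
```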

Quantile/Median/2D binning in Python

Do you know a quick/elegant Python/SciPy/NumPy solution for the following problem?
You have a set of x, y coordinates with associated values w (all 1D arrays). Now bin x and y onto a 2D grid (size BINSxBINS) and calculate quantiles (like the median) of the w values for each bin, which should in the end result in a BINSxBINS 2D array with the required quantiles.
This is easy to do with some nested loops, but I am sure there is a more elegant solution.
Thanks,
Mark
This is what I came up with, I hope it's useful. It's not necessarily cleaner or better than using a loop, but maybe it'll get you started toward something better.
import numpy as np
bins_x, bins_y = 1., 1.
x = np.array([1,1,2,2,3,3,3])
y = np.array([1,1,2,2,3,3,3])
w = np.array([1,2,3,4,5,6,7], 'float')
# You can get a bin number for each point like this
x = (x // bins_x).astype('int')
y = (y // bins_y).astype('int')
shape = [x.max()+1, y.max()+1]
bin = np.ravel_multi_index([x, y], shape)
# You could get the mean by doing something like:
mean = np.bincount(bin, w) / np.bincount(bin)
# Median is a bit harder
order = bin.argsort()
bin = bin[order]
w = w[order]
edges = (bin[1:] != bin[:-1]).nonzero()[0] + 1
med_index = (np.r_[0, edges] + np.r_[edges, len(w)]) // 2
median = w[med_index]
# But that's not quite right, so maybe
median2 = [np.median(i) for i in np.split(w, edges)]
Also take a look at numpy.histogram2d
I'm just trying to do this myself, and it sounds like you want scipy.stats.binned_statistic_2d, with which you can find the mean, median, standard deviation, or any user-defined function of the third parameter within the given bins.
I realise this question has already been answered but I believe this is a good built in solution.
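A minimal sketch of how scipy.stats.binned_statistic_2d could be applied to this problem, assuming SciPy is installed. The toy data and the 10x10 grid over [0, 10] are illustrative choices, not values prescribed by the question. Bins that receive no points come back as NaN for the median statistic.

```python
import numpy as np
from scipy import stats

BINS = 10
x = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=float)
y = x.copy()
w = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)

# Median of w in each cell of a BINS x BINS grid over [0, 10] x [0, 10].
result = stats.binned_statistic_2d(
    x, y, w, statistic="median", bins=BINS, range=[[0, 10], [0, 10]]
)
medians = result.statistic  # shape (BINS, BINS); NaN where a bin is empty
print(medians)
```

Other quantiles can be requested by passing a callable as statistic, for example lambda v: np.percentile(v, 25).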
Thanks a lot for your code. Based on it, I found the following solution to my problem (only a minor modification of your code):
import numpy as np
BINS=10
boxsize=10.0
bins_x, bins_y = boxsize/BINS, boxsize/BINS
x = np.array([0,0,0,1,1,1,2,2,2,3,3,3])
y = np.array([0,0,0,1,1,1,2,2,2,3,3,3])
w = np.array([0,1,2,0,1,2,0,1,2,0,1,2], 'float')
# You can get a bin number for each point like this
x = (x // bins_x).astype('int')
y = (y // bins_y).astype('int')
shape = [BINS, BINS]
bin = np.ravel_multi_index([x, y], shape)
# Median
order = bin.argsort()
bin = bin[order]
w = w[order]
edges = (bin[1:] != bin[:-1]).nonzero()[0] + 1
median = [np.median(i) for i in np.split(w, edges)]
#construct BINSxBINS matrix with median values
binvals=np.unique(bin)
medvals=np.zeros([BINS*BINS])
medvals[binvals]=median
medvals=medvals.reshape([BINS,BINS])
print(medvals)
With numpy/scipy it goes like this:
import numpy as np
import scipy.stats as stats
x = np.random.uniform(0,200,100)
y = np.random.uniform(0,200,100)
w = np.random.uniform(1,10,100)
h = np.histogram2d(x,y,bins=[10,10], weights=w,range=[[0,200],[0,200]])
hist, bins_x, bins_y = h
q = stats.mstats.mquantiles(hist,prob=[0.25, 0.5, 0.75])
>>> q.round(2)
array([ 512.8 , 555.41, 592.73])
q1 = np.where(hist<q[0],1,0)
q2 = np.where(np.logical_and(q[0]<=hist,hist<q[1]),2,0)
q3 = np.where(np.logical_and(q[1]<=hist,hist<=q[2]),3,0)
q4 = np.where(q[2]<hist,4,0)
>>> q1 + q2 + q3 + q4
array([[4, 3, 4, 3, 1, 1, 4, 3, 1, 2],
[1, 1, 4, 4, 2, 3, 1, 3, 3, 3],
[2, 3, 3, 2, 2, 2, 3, 2, 4, 2],
[2, 2, 3, 3, 3, 1, 2, 2, 1, 4],
[1, 3, 1, 4, 2, 1, 3, 1, 1, 3],
[4, 2, 2, 1, 2, 1, 3, 2, 1, 1],
[4, 1, 1, 3, 1, 3, 4, 3, 2, 1],
[4, 3, 1, 4, 4, 4, 1, 1, 2, 4],
[2, 4, 4, 4, 3, 4, 2, 2, 2, 4],
[2, 2, 4, 4, 3, 3, 1, 3, 4, 4]])
prob = [0.25, 0.5, 0.75] is the default value for the quantile settings; you can change it or leave it out.
