I have a relatively simple block of code that loops through two arrays, multiplies, and adds cumulatively:
import numpy as np
a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
c = []
d = 0
for i, val in enumerate(a):
    d += val
    c.append(d)
    d *= b[i]
Is there a way to do this without iterating? I imagine cumsum/cumprod could be used but I'm having trouble figuring out how. When you break down what's happening step by step, it looks like this:
# 0: 0 + a[0]
# 1: ((0 + a[0]) * b[0]) + a[1]
# 2: ((((0 + a[0]) * b[0]) + a[1]) * b[1]) + a[2]
Edit for clarification: I am interested in the list (or array) c.
In each iteration, you have -
d[n+1] = d[n] + a[n]
d[n+1] = d[n+1] * b[n]
Thus, essentially -
d[n+1] = (d[n] + a[n]) * b[n]
i.e. -
d[n+1] = (d[n] * b[n]) + K[n]   # where K[n] = a[n] * b[n]
Now, using this formula, if you write out the expressions up to n = 2, you would have -
d[1] = d[0]*b[0] + K[0]
d[2] = d[0]*b[0]*b[1] + K[0]*b[1] + K[1]
d[3] = d[0]*b[0]*b[1]*b[2] + K[0]*b[1]*b[2] + K[1]*b[2] + K[2]
Scalars      :  b[0]*b[1]*b[2]   b[1]*b[2]   b[2]   1
Coefficients :  d[0]             K[0]        K[1]   K[2]
Thus, you would need the reversed cumprod of b, multiplied elementwise with the K array. Finally, to get c, take the cumsum of that product; and since c is stored before d is scaled down by b, you need to divide this cumsum by the reversed cumprod of b.
The final implementation would look like this -
# Get reversed cumprod of b and pad with `1` at the end
b_rev_cumprod = b[::-1].cumprod()[::-1]
B = np.hstack((b_rev_cumprod,1))
# Get K
K = a*b
# Prepend 0, corresponding to the starting value of d
K_ext = np.hstack((0,K))
# Perform elementwise multiplication and cumsum, then scale down to get the final c
sums = (B*K_ext).cumsum()
c = sums[1:]/b_rev_cumprod
Runtime tests and output verification
Function definitions -
def original_approach(a,b):
    c = []
    d = 0
    for i, val in enumerate(a):
        d = d+val
        c.append(d)
        d = d*b[i]
    return c
def vectorized_approach(a,b):
    b_rev_cumprod = b[::-1].cumprod()[::-1]
    B = np.hstack((b_rev_cumprod,1))
    K = a*b
    K_ext = np.hstack((0,K))
    sums = (B*K_ext).cumsum()
    return sums[1:]/b_rev_cumprod
Runtimes and verification
Case #1: OP Sample case
In [301]: # Inputs
...: a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
...: b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
...:
In [302]: original_approach(a,b)
Out[302]:
[1,
2.0099999999999998,
4.4020000000000001,
6.1320600000000001,
7.6132059999999999,
8.7613205999999995,
14.256792359999999,
18.128396179999999]
In [303]: vectorized_approach(a,b)
Out[303]:
array([ 1. , 2.01 , 4.402 , 6.13206 ,
7.613206 , 8.7613206 , 14.25679236, 18.12839618])
Case #2: Large input case
In [304]: # Inputs
...: N = 1000
...: a = np.random.randint(0,100000,N)
...: b = np.random.rand(N)+0.1
...:
In [305]: np.allclose(original_approach(a,b),vectorized_approach(a,b))
Out[305]: True
In [306]: %timeit original_approach(a,b)
1000 loops, best of 3: 746 µs per loop
In [307]: %timeit vectorized_approach(a,b)
10000 loops, best of 3: 76.9 µs per loop
Please be mindful that for extremely large input arrays, if the b elements are very small fractions, the cumulative product can underflow: the leading entries of b_rev_cumprod come out as zeros, resulting in NaNs in the corresponding positions of c.
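Here is a small illustration of that caveat (my own sketch, not part of the answer above): with enough tiny b values, the leading entries of b_rev_cumprod underflow to exactly zero, and the final division then produces NaNs where the loop version would still return finite numbers.

import numpy as np

# illustrative only: 400 factors of 1e-3 underflow to 0 in the cumulative product
a = np.ones(400)
b = np.full(400, 1e-3)

b_rev_cumprod = b[::-1].cumprod()[::-1]
print(b_rev_cumprod[:3])   # [0. 0. 0.]  <- underflow

B = np.hstack((b_rev_cumprod, 1))
K_ext = np.hstack((0, a * b))
c = (B * K_ext).cumsum()[1:] / b_rev_cumprod   # NumPy warns about 0/0 here
print(c[:3])               # [nan nan nan], although the loop result is finite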
Let's see if we can get even faster. I am now leaving the pure Python world to show that this purely numeric problem can be optimized even further.
The two contenders are @Divakar's fast vectorized version:
def vectorized_approach(a,b):
    b_rev_cumprod = b[::-1].cumprod()[::-1]
    B = np.hstack((b_rev_cumprod,1))
    K = a*b
    K_ext = np.hstack((0,K))
    sums = (B*K_ext).cumsum()
    return sums[1:]/b_rev_cumprod
and a cython version:
%%cython
import numpy as np

def cython_approach(long[:] a, double[:] b):
    cdef double d
    cdef size_t i, n
    n = a.shape[0]
    cdef double[:] c = np.empty(n)
    d = 0
    for i in range(n):
        d += a[i]
        c[i] = d
        d *= b[i]
    return c
The cython version is about 5x faster than the vectorized version:
%timeit vectorized_approach(a,b) -> 10000 loops, best of 3: 43.4 µs per loop
%timeit cython_approach(a,b) -> 100000 loops, best of 3: 7.7 µs per loop
Another plus of the cython version is that it is much more readable.
The big downside is that you are leaving pure Python, and depending on your use case, compiling an extension module may not be an option for you.
This works for me and is vectorized:
b_mat = np.tile(b,(b.size,1)).T
b_mat = np.vstack((np.ones(b.size),b_mat))
np.fill_diagonal(b_mat,1)
b_mat[np.triu_indices(b.size)]=1
b_prod_mat = np.cumprod(b_mat,axis=0)
b_prod_mat[np.triu_indices(b.size)] = 0
np.fill_diagonal(b_prod_mat,1)
c = np.dot(b_prod_mat,a)
c
# output
array([ 1. , 2.01 , 4.402, 6.132, 7.613, 8.761, 14.257,
18.128, 16.316])
I agree it is not easy to see what's going on. Your array c can be written as a matrix-vector product b_prod_mat * a, where a is your array and b_prod_mat consists of specific products of b. The main effort goes into building b_prod_mat.
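To make the structure concrete, here is a toy run of the same construction for three elements (a sketch I added, reusing the code above on the first three values of the sample arrays):

import numpy as np

a = np.array([1, 2, 4])
b = np.array([0.01, 0.2, 0.03])

b_mat = np.tile(b, (b.size, 1)).T
b_mat = np.vstack((np.ones(b.size), b_mat))
np.fill_diagonal(b_mat, 1)
b_mat[np.triu_indices(b.size)] = 1
b_prod_mat = np.cumprod(b_mat, axis=0)
b_prod_mat[np.triu_indices(b.size)] = 0
np.fill_diagonal(b_prod_mat, 1)

print(b_prod_mat)
# rows: [1, 0, 0], [b0, 1, 0], [b0*b1, b1, 1], [b0*b1*b2, b1*b2, b2]
print(b_prod_mat.dot(a))
# [1.0, 2.01, 4.402, 0.13206] -> the first len(a) entries are c;
# the extra last row gives d after the final scaling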
I am not sure that's better than a for loop but here is a way:
a.dot([np.concatenate((np.zeros(i), (1, ), np.cumprod(b[i:-1]))) for i in range(len(b))])
What it does is create the rows of a big matrix A like this:
1 b0 b0b1 b0b1b2 ... b0b1..bn-1
0 1 b1 b1b2 ... b1..bn-1
0 0 1 b2 ...
...
0 0 0 0 ... 1
Then you simply multiply the vector a with the matrix A and you get your expected result.
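For the sample arrays from the question, a quick check (a small addition of mine) shows the matrix-vector product reproduces c:

import numpy as np

a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])

A = [np.concatenate((np.zeros(i), (1,), np.cumprod(b[i:-1]))) for i in range(len(b))]
print(a.dot(A))
# [1. 2.01 4.402 6.13206 7.613206 8.7613206 14.25679236 18.12839618]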
I'm counting the occurrences of non-overlapping grouped subsequences of length i in a binary list, so for example if I have a list:
[0, 1, 0, 1, 1, 0, 0, 0, 1, 1], I want to count occurrences of [0,0] (one), [0,1] (two), [1,0] (one), [1,1] (one).
I have created a function that accomplishes this (see below). However, I would like to see if there is anything that can be done to speed up its execution time. I've already got it to be pretty quick (over previous versions of the same function), and it currently takes about 0.03 seconds for a list of length 100,000 and i=2, and about 30 seconds for a list of length 100,000,000 and i=2 (a seemingly linear increase in time with respect to sequence length). However, my end goal is to do this with functions for multiple values of i, with sequences of lengths near 15 billion, which, assuming linearity holds, would take about 4.2 hours for just i=2 (higher values of i take longer, as there are more unique subsequences to count).
I'm unsure if there is much more speed that can be gained here (at least while still working in Python), but I am open to suggestions on how to accomplish this faster (with any method or language).
from collections import Counter

def subseq_counter(i,l):
    """counts the frequency of unique, non-overlapping, grouped subsequences of length i in a binary list l"""
    grouped = [str(l[k:k + i]) for k in range(0, len(l), i)]
    # groups terms into i-length subsequences
    if len(grouped[len(grouped) - 1]) != len(grouped[0]):
        grouped.pop(len(grouped) - 1)
    # removes any subsequence at the end that is not of length i
    grouped_sort = sorted(grouped)
    # necessary so the output frequencies correlate to the ascending binary order of the subsequences
    grouped_sort_values = Counter(grouped_sort).values()
    # counts the elements' frequency
    freq_list = list(grouped_sort_values)
    return freq_list
I know that a marginally faster execution time can be obtained by removing the grouped_sort line; however, I need to be able to access the frequencies in the ascending binary order of the subsequences (so for i=2 that would be [0,0], [0,1], [1,0], [1,1]) and have not figured out a better way around this.
I don't know if it is faster, but try:
import numpy as np

# create data
bits = np.random.randint(0, 2, 10000)

def subseq_counter(i: int, l: np.array):
    """
    Counts the occurrences of each subsequence of length i in the array l
    """
    # the array l is reshaped as a matrix of i columns, and
    # matrix-multiplied by the binary weights (powers of 2)
    #              |  [[2**2],
    #              |   [2**1],
    #              |   [2**0]]
    #              |____________________
    #  [[1,0,1],   |  1*4 + 0*2 + 1*1 = 5
    #   [0,1,0],   |  0*4 + 1*2 + 0*1 = 2
    #   ...,       |  ....
    #   [1,1,1]]   |  1*4 + 1*2 + 1*1 = 7
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1, -1, -1).T)
    unique, counts = np.unique(iBits, return_counts=True)
    print(f"Counts for {i} bits:")
    for u, c in zip(unique, counts):
        print(f"{u:0{i}b}:{c}")
    return unique, counts
subseq_counter(2,bits)
subseq_counter(3,bits)
>>> Counts for 2 bits:
>>> 00:1264
>>> 01:1279
>>> 10:1237
>>> 11:1220
>>> Counts for 3 bits:
>>> 000:425
>>> 001:429
>>> 010:411
>>> 011:395
>>> 100:437
>>> 101:412
>>> 110:407
>>> 111:417
What it does is reshape the list into an array of n rows by i columns and convert each row to an integer by multiplying by the powers of 2 (so 00 becomes 0, 01 becomes 1, 10 becomes 2 and 11 becomes 3), then do the counting with np.unique().
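A tiny worked example of that reshape-and-weight step (my own illustration, not from the answer above):

import numpy as np

bits = np.array([1, 0, 0, 1, 1, 0], dtype=np.int8)
i = 2
weights = 2 ** np.arange(i - 1, -1, -1)        # [2, 1]
codes = bits[:i * (bits.size // i)].reshape(-1, i) @ weights
print(codes)                                   # [2 1 2]  ("10", "01", "10")
print(np.unique(codes, return_counts=True))    # (array([1, 2]), array([1, 2]))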
Benchmark including some new solutions from me:
For i=2:
2.9 s ± 0.0 s Kelly_NumPy
3.7 s ± 0.0 s Kelly_bytes_count
6.6 s ± 0.0 s Kelly_zip
7.8 s ± 0.1 s Colim_numpy
8.4 s ± 0.0 s Paul_genzip
8.6 s ± 0.0 s Kelly_bytes_split2
10.5 s ± 0.0 s Kelly_bytes_slices2
10.6 s ± 0.1 s Kelly_bytes_split1
16.1 s ± 0.0 s Kelly_bytes_slices1
20.9 s ± 0.1 s constantstranger
45.1 s ± 0.3 s original
For i=5:
2.3 s ± 0.0 s Kelly_NumPy
3.8 s ± 0.0 s Kelly_zip
4.5 s ± 0.0 s Paul_genzip
4.5 s ± 0.0 s Kelly_bytes_split2
5.2 s ± 0.0 s Kelly_bytes_split1
5.4 s ± 0.0 s Kelly_bytes_slices2
7.1 s ± 0.0 s Colim_numpy
7.2 s ± 0.0 s Kelly_bytes_slices1
9.3 s ± 0.0 s constantstranger
20.6 s ± 0.0 s Kelly_bytes_count
25.3 s ± 0.1 s original
This is for a list of length n=1e6, with times multiplied by 100 so they somewhat reflect your timings with length 1e8. I minimally modified the other solutions so they do what your original does, i.e., take a list of ints and return a list of ints in the correct order. One or two of my slower solutions only work if the length is a multiple of their block size; I didn't bother making them work for all lengths since they're slower anyway.
Full code (Try it online!):
def Kelly_NumPy(i, l):
    a = np.frombuffer(bytes(l), np.int8)
    stop = a.size // i * i
    s = a[:stop:i]
    for j in range(1, i):
        s = (s << 1) | a[j:stop:i]
    return np.unique(s, return_counts=True)[1].tolist()

def Kelly_zip(i, l):
    ctr = Counter(zip(*[iter(l)]*i))
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_slices1(i, l):
    a = bytes(l)
    slices = [a[j:j+i] for j in range(0, len(a)//i*i, i)]
    ctr = Counter(slices)
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_slices2(i, l):
    a = bytes(l)
    ig = itemgetter(*(slice(j, j+i) for j in range(0, 1000*i, i)))
    ctr = Counter(chain.from_iterable(
        ig(a[k:k+1000*i])
        for k in range(0, len(l), 1000*i)
    ))
    return [v for k, v in sorted(ctr.items())]
def Kelly_bytes_count(i, l):
    n = len(l)
    a = bytes(l)
    b = bytearray([2]) * (n + n//i)
    for j in range(i):
        b[j+1::i+1] = a[j::i]
    a = b
    ss = [bytes([2])]
    for _ in range(i):
        ss = [s+b for s in ss for b in [bytes([0]), bytes([1])]]
    return [a.count(s) for s in ss]

def Kelly_bytes_split1(i, l):
    n = len(l) // i
    stop = n * i
    a = bytes(l)
    sep = bytearray([2])
    b = sep * (stop + n - 1)
    for j in range(i):
        b[j::i+1] = a[j::i]
    ctr = Counter(bytes(b).split(sep))
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_split2(i, l):
    n = len(l) // i
    stop = n * i
    a = bytes(l)
    sep = bytearray([2])
    b = sep * (5000*i + 4999)
    ctr = Counter()
    for k in range(0, stop, 5000*i):
        for j in range(i):
            b[j::i+1] = a[k+j:k+5000*i+j:i]
        ctr.update(bytes(b).split(sep))
    return [v for k, v in sorted(ctr.items())]
def original(i,l):
    grouped = [str(l[k:k + i]) for k in range(0, len(l), i)]
    if len(grouped[len(grouped) - 1]) != len(grouped[0]):
        grouped.pop(len(grouped) - 1)
    grouped_sort = sorted(grouped)
    grouped_sort_values = Counter(grouped_sort).values()
    freq_list = list(grouped_sort_values)
    return freq_list

def Paul_genzip(subseq_len, sequence):
    ctr = Counter(subseq for subseq in zip(*[iter(sequence)] * subseq_len))
    return [v for k, v in sorted(ctr.items())]

def constantstranger(i,l):
    freq_list = [0] * 2 ** i
    binaryTupToInt = {binTup:j for j, binTup in enumerate(product((0,1),repeat=i))}
    c = Counter(binaryTupToInt[tuple(l[k:k+i])] for k in range(0, len(l) // i * i, i))
    for k, v in c.items():
        freq_list[k] = v
    return freq_list
def Colim_numpy(i: int, l):
    l = np.array(l)
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1,-1,-1).T)
    unique, counts = np.unique(iBits, return_counts=True)
    return counts.tolist()
funcs = [
    original,
    Colim_numpy,
    Paul_genzip,
    constantstranger,
    Kelly_NumPy,
    Kelly_bytes_count,
    Kelly_zip,
    Kelly_bytes_slices1,
    Kelly_bytes_slices2,
    Kelly_bytes_split1,
    Kelly_bytes_split2,
]
from time import time
import os
from collections import Counter
from itertools import repeat, chain, product
import numpy as np
from operator import itemgetter
from statistics import mean, stdev
n = 10**6
i = 2
times = {f: [] for f in funcs}
def stats(f):
    ts = [t/n*1e8 for t in sorted(times[f])[:3]]
    return f'{mean(ts):4.1f} s ± {stdev(ts):3.1f} s '

for _ in range(10):
    l = [b % 2 for b in os.urandom(n)]
    expect = None
    for f in funcs:
        t = time()
        result = f(i, l)
        t = time() - t
        times[f].append(t)
        if expect is None:
            expect = result
        else:
            assert result == expect

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)
Not really sure I understood that last part about the order. It seems unnecessary to build a giant list of subsequences. Use a generator to yield the subsequences to the counter - that way you also don't have to fiddle with indices:
from collections import Counter
def count_subsequences(sequence, subseq_len=2):
    return Counter(subseq for subseq in zip(*[iter(sequence)] * subseq_len))
sequence = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
counter = count_subsequences(sequence)
for subseq in (0, 0), (0, 1), (1, 0), (1, 1):
    print("{}: {}".format(subseq, counter[subseq]))
Output:
(0, 0): 1
(0, 1): 2
(1, 0): 1
(1, 1): 1
In this case, the function returns the counter object itself, and the calling code displays the results in some order.
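If the frequencies are needed in the ascending binary order the question asks for, the counter can still be read out in that order; here is a small sketch of what that could look like (my addition; subsequences that never occur simply count as 0):

from collections import Counter
from itertools import product

def count_subsequences(sequence, subseq_len=2):
    return Counter(subseq for subseq in zip(*[iter(sequence)] * subseq_len))

sequence = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
counter = count_subsequences(sequence)
ordered = [counter[subseq] for subseq in product((0, 1), repeat=2)]
print(ordered)  # [1, 2, 1, 1]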
This is much faster. It uses Kelly's idea of using numpy.frombuffer instead of converting the list to a NumPy array, and uses pandas to count the unique values, which is faster than numpy.unique for more than 100,000 results.
import numpy as np
import pandas as pd

def subseq_counter(i: int, l):
    l = np.frombuffer(bytes(l), np.int8)
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2 ** np.arange(i-1, -1, -1).T).astype(np.int8)
    # bug fix: when there is not enough data (highly probable for large i),
    # iBits may not contain every possible value, so returning the unique
    # values as a list may lose information
    answer = [0]*2**i  # empty counter including all possible values
    if len(iBits) > 100000:
        for i, v in pd.value_counts(iBits).items():
            answer[i] = v
    else:
        unique, count = np.unique(iBits, return_counts=True)
        for i, v in zip(unique, count):
            answer[i] = v
    return answer
This is a way to do it:
from collections import Counter
from itertools import product
def subseq_counter(i,l):
    freq_list = [0] * 2 ** i
    binaryTupToInt = {binTup:j for j, binTup in enumerate(product((0,1),repeat=i))}
    c = Counter(binaryTupToInt[tuple(l[k:k+i])] for k in range(0, len(l) // i * i, i))
    for k, v in c.items():
        freq_list[k] = v
    return freq_list
l = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
i = 2
print(subseq_counter(i, l))
Output:
[1, 2, 1, 1]
Notes:
Using the above code and changing i to 3 gives:
[0, 1, 1, 0, 0, 0, 1, 0]
This is showing the frequency for all possible binary values of length 3 in ascending order beginning with 0 (binary 0,0,0) and ending with 7 (binary 1,1,1). In other words, 0,0,0 occurs 0 times, 0,0,1 occurs 1 time, 0,1,0 occurs 1 time, 0,1,1 occurs 0 times, etc., through 1,1,1 which occurs 0 times.
Using the code in the question with i changed to 3 gives:
[1, 1, 1]
This output seems hard to decipher, as it isn't labeled so that we can easily see that the results with a non-zero value correspond to the 3-digit binary values 0,0,1, 0,1,0 and 1,1,0.
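To make that easier to read, the frequencies can be printed next to the subsequences they count (a short sketch I added, using the i=3 output shown above):

from itertools import product

freq_list = [0, 1, 1, 0, 0, 0, 1, 0]  # the i=3 output shown above
for subseq, freq in zip(product((0, 1), repeat=3), freq_list):
    print(subseq, freq)
# (0, 0, 0) 0
# (0, 0, 1) 1
# (0, 1, 0) 1
# ...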
UPDATE:
Here's a benchmark of several approaches on an input list of length 55 million (with i set to 2) including OP's, counting sort (this answer), numpy including list-to-ndarray conversion overhead, and numpy without the overhead:
foo_1 output:
[10000000, 15000000, 15000000, 15000000]
foo_2 output:
[10000000, 15000000, 15000000, 15000000]
foo_3 output:
[10000000 15000000 15000000 15000000]
foo_4 output:
[10000000 15000000 15000000 15000000]
Timeit results:
foo_1 (OP) ran in 32.20719700001064 seconds using 1 iterations
foo_2 (counting sort) ran in 17.91718759998912 seconds using 1 iterations
foo_3 (numpy with list-to-array conversion) ran in 9.713831000000937 seconds using 1 iterations
foo_4 (numpy) ran in 1.695262699999148 seconds using 1 iterations
The clear winner is numpy, though unless the calling program can easily be changed to use ndarrays, the required conversion slows things down by a factor of about 5x in this example.
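The foo_* implementations are not reproduced here; as a rough sketch of my own (the count_pairs helper below is mine, not the benchmarked code), the difference between the last two timings comes down to whether the list-to-ndarray conversion happens inside the timed call:

import numpy as np

def count_pairs(arr, i=2):
    # assumes arr is already a 1-D integer ndarray of 0s and 1s
    codes = arr[:i * (arr.size // i)].reshape(-1, i) @ (2 ** np.arange(i - 1, -1, -1))
    return np.unique(codes, return_counts=True)[1]

l = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
print(count_pairs(np.array(l)))         # conversion inside the timed call (what foo_3 measures)
arr = np.frombuffer(bytes(l), np.int8)  # conversion done once, up front
print(count_pairs(arr))                 # only the counting is timed (what foo_4 measures)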
I have two arrays of size 15 : A = [a_0, ... , a_14] and B = [b_0, ..., b_14]
Goal: obtain the array C of size 8 resulting from
C = [a_0] * [b_7, ..., b_14] + [a_1, a_2] * [b_3, b_4, b_5, b_6] + [a_3, a_4, a_5, a_6] * [b_1, b_2] + [a_7, ..., a_14] * [b_0]
where * is the outer product np.outer. Note that:
each sub-array is of length 2^i for i between 0 and 3.
from the outer products, we obtain two vectors of size 8 and two matrices of shapes (2, 4) and (4, 2). We flatten immediately after each product, so that the four results can be summed into a single long vector of size 8.
My implementation is the following:
inds = [0, 1, 3, 7, 15]
C = np.zeros(8)
d = 4
for i in range(d):
    left = A[inds[i]:inds[i+1]]
    right = B[inds[d-i-1]:inds[d-i]]
    C += (left[:, None]*right[None, :]).ravel()  # same as np.outer(left, right).ravel()
Question: what is the fastest way to obtain C? That is, is there a way to avoid this for loop to perform the summation?
If not, what are my options? Code in C++? Cython?
NB: this is to be generalized for loops of range(L+1) with L any integer. In the example above I have illustrated the case L=3 for better comprehension. FYI, the generalized code would look like this:
L = 3
inds = np.cumsum([2**k for k in range(0, L+1)])
inds = np.concatenate(([0], inds))
# Input arrays A and B are of size inds[-1]
C = np.zeros(2**L)
d = L+1
for i in range(d):
    left = A[inds[i]:inds[i+1]]
    right = B[inds[d-i-1]:inds[d-i]]
    C += (left[:, None]*right[None, :]).ravel()  # same as np.outer(left, right).ravel()
I think you can simply do:
C = np.outer(A[0], B[7:]).ravel() + \
    np.outer(A[[1,2]], B[[3,4,5,6]]).ravel() + \
    np.outer(A[[3,4,5,6]], B[[1,2]]).ravel() + \
    np.outer(A[7:], B[0]).ravel()
Am I wrong?
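A quick self-contained check (my addition) that this direct expression matches the loop for random inputs:

import numpy as np

A = np.random.random(15)
B = np.random.random(15)

inds = [0, 1, 3, 7, 15]
C_loop = np.zeros(8)
d = 4
for i in range(d):
    left = A[inds[i]:inds[i+1]]
    right = B[inds[d-i-1]:inds[d-i]]
    C_loop += (left[:, None] * right[None, :]).ravel()

C_direct = (np.outer(A[0], B[7:]).ravel()
            + np.outer(A[[1, 2]], B[[3, 4, 5, 6]]).ravel()
            + np.outer(A[[3, 4, 5, 6]], B[[1, 2]]).ravel()
            + np.outer(A[7:], B[0]).ravel())

print(np.allclose(C_loop, C_direct))  # True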
I have the following for loop that operates over three numpy arrays of the same length:
n = 100
a = np.random.random(n)
b = np.random.random(n)
c = np.random.random(n)
valid = np.empty(n)
for i in range(n):
    valid[i] = np.any(a[i] > b[i:] + c[i:].cumsum())
Is there a way to replace this for loop with some vectorized numpy operations?
For example, because I only care if a[i] is larger than any value in b[i:], I can do np.minimum.accumulate(b[::-1])[::-1] which gets the smallest value of b at every index and onwards, and then compare it to a like this:
a > np.minimum.accumulate(b[::-1])[::-1]
but I still would need a way to vectorize the c[i:].cumsum() into a single array calculation.
Your goal is to find the minimum of b[i:] + c[i:].cumsum() for each i. Clearly you can compare that to a directly.
You can write the elements of c[i:].cumsum() as the upper triangle of a matrix. Let's look at a toy case with n = 3:
c = [c1, c2, c3]
s1 = c.cumsum()
s0 = np.r_[0, s1[:-1]]
You can write the elements of the cumulative sum as
c1,  c1 + c2,  c1 + c2 + c3      s1[0:]                  s1[0:] - s0[0]
     c2,       c2 + c3        =  s1[1:] - c1          =  s1[1:] - s0[1]
               c3                s1[2:] - (c1 + c2)      s1[2:] - s0[2]
You can use np.triu_indices to construct these sums as a raveled array:
r, c = np.triu_indices(n)
diff = s1[c] - s0[r] + b[c]
Since np.minimum is a ufunc, you can accumulate diff for each run defined by r using minimum.reduceat. The locations are given roughly by np.flatnonzero(np.diff(r)) + 1, but you can generate them faster with np.arange:
m = np.minimum.reduceat(diff, np.r_[0, np.arange(n, 1, -1).cumsum()])
So finally, you have:
valid = a > m
TL;DR
s1 = c.cumsum()
s0 = np.r_[0, s1[:-1]]
r, c = np.triu_indices(n)
valid = a > np.minimum.reduceat(s1[c] - s0[r] + b[c], np.r_[0, np.arange(n, 1, -1).cumsum()])
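A quick end-to-end check against the original loop (my own sketch; the triu_indices outputs are named rows/cols here purely for clarity):

import numpy as np

n = 100
a = np.random.random(n)
b = np.random.random(n)
c = np.random.random(n)

valid_loop = np.empty(n, dtype=bool)
for i in range(n):
    valid_loop[i] = np.any(a[i] > b[i:] + c[i:].cumsum())

s1 = c.cumsum()
s0 = np.r_[0, s1[:-1]]
rows, cols = np.triu_indices(n)
mins = np.minimum.reduceat(s1[cols] - s0[rows] + b[cols],
                           np.r_[0, np.arange(n, 1, -1).cumsum()])
valid_vec = a > mins

print(np.array_equal(valid_loop, valid_vec))  # expect True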
I assume you want to vectorize it to decrease the running time. Since you are only using pure NumPy operations, you can use numba: see 5 Minutes Guide to Numba
it will look something like this:
import numba

@numba.njit()
def valid_for_single_idx(idx, a, b, c):
    return np.any(a[idx] > b[idx:] + c[idx:].cumsum())

valid = np.empty(n)
for i in range(n):
    valid[i] = valid_for_single_idx(i, a, b, c)
So far it isn't really vectorization (the loop still happens), but it translates the NumPy line into LLVM, so it runs about as fast as possible.
Although it doesn't increase the speed, it looks a bit nicer if you use map:
import numba
from functools import partial

@numba.njit()
def valid_for_single_idx(idx, a, b, c):
    return np.any(a[idx] > b[idx:] + c[idx:].cumsum())

valid = map(partial(valid_for_single_idx, a=a, b=b, c=c), range(n))
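One small caveat (my note, not from the answer): map returns a lazy iterator, so to get an array like the original loop produced you would materialize it, e.g. continuing from the snippet above:

valid = np.fromiter(map(partial(valid_for_single_idx, a=a, b=b, c=c), range(n)),
                    dtype=bool, count=n)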
What is the most efficient way of concatenating two numbers into one number in Python?
The numbers are always between 0 and 255. I have tested a few ways of concatenating as strings and casting back to int, but they are very costly time-wise for my code.
Example:
a = 152
c = 255
d = concat(a,c)
answer:
d = 152255
If the numbers are bounded, just multiply and add:
>>> a = 152
>>> c = 255
>>> d = a*1000+c
>>> d
152255
>>>
This is pretty fast:
def concat(a, b):
    return 10**int(log(b, 10)+1)*a+b
It uses the logarithm to find how many times the first number must be multiplied by 10 for the sum to work as a concatenation
In [1]: from math import log
In [2]: a = 152
In [3]: b = 255
In [4]: def concat(a, b):
   ...:     return 10**int(log(b, 10)+1)*a+b
   ...:
In [5]: concat(a, b)
Out[5]: 152255
In [6]: %timeit concat(a, b)
1000000 loops, best of 3: 1.18 us per loop
Yeah, there you go:
a = 152
b = 255
def concat(a, b):
    n = next(x for x in range(10) if 10**x > b)  # number of digits in b; works for b up to 10**10
    return a * 10**n + b

print(concat(a, b))  # -> 152255
I have an array A and a reference array B. The size of A is at least as big as that of B, e.g.
A = [2,100,300,793,1300,1500,1810,2400]
B = [4,305,789,1234,1890]
B is in fact the position of peaks in a signal at a specified time, and A contains the positions of peaks at a later time. But some of the elements in A are actually not the peaks I want (possibly due to noise, etc.), and I want to find the 'real' ones in A based on B. The 'real' elements in A should be close to those in B, and in the example given above, the 'real' ones in A should be A' = [2, 300, 793, 1300, 1810]. It should be obvious in this example that 100, 1500, 2400 are not the ones we want, as they are quite far off from any of the elements in B. How can I code this in the most efficient/accurate way in Python/MATLAB?
Approach #1: With NumPy broadcasting, we can look for absolute element-wise subtractions between the input arrays and use an appropriate threshold to filter out unwanted elements from A. It seems for the given sample inputs, a threshold of 90 works.
Thus, we would have an implementation, like so -
thresh = 90
Aout = A[(np.abs(A[:,None] - B) < thresh).any(1)]
Sample run -
In [69]: A
Out[69]: array([ 2, 100, 300, 793, 1300, 1500, 1810, 2400])
In [70]: B
Out[70]: array([ 4, 305, 789, 1234, 1890])
In [71]: A[(np.abs(A[:,None] - B) < 90).any(1)]
Out[71]: array([ 2, 300, 793, 1300, 1810])
Approach #2: Based on this post, here's a memory efficient approach using np.searchsorted, which could be crucial for large arrays -
def searchsorted_filter(a, b, thresh):
    choices = np.sort(b)  # if b is already sorted, skip it
    lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
    ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
    cl = np.take(choices,lidx)  # Or choices[lidx]
    cr = np.take(choices,ridx)  # Or choices[ridx]
    return a[np.minimum(np.abs(a - cl), np.abs(a - cr)) < thresh]
Sample run -
In [95]: searchsorted_filter(A,B, thresh = 90)
Out[95]: array([ 2, 300, 793, 1300, 1810])
Runtime test
In [104]: A = np.sort(np.random.randint(0,100000,(1000)))
In [105]: B = np.sort(np.random.randint(0,100000,(400)))
In [106]: out1 = A[(np.abs(A[:,None] - B) < 10).any(1)]
In [107]: out2 = searchsorted_filter(A,B, thresh = 10)
In [108]: np.allclose(out1, out2) # Verify results
Out[108]: True
In [109]: %timeit A[(np.abs(A[:,None] - B) < 10).any(1)]
100 loops, best of 3: 2.74 ms per loop
In [110]: %timeit searchsorted_filter(A,B, thresh = 10)
10000 loops, best of 3: 85.3 µs per loop
Jan 2018 Update with further performance boost
We can avoid the second call to np.searchsorted(..., 'right') by reusing the indices obtained from np.searchsorted(..., 'left'), and also avoid the absolute-value computations, like so -
def searchsorted_filter_v2(a, b, thresh):
    N = len(b)
    choices = np.sort(b)  # if b is already sorted, skip it
    l = np.searchsorted(choices, a, 'left')
    l_invalid_mask = l==N
    l[l_invalid_mask] = N-1
    left_offset = choices[l]-a
    left_offset[l_invalid_mask] *= -1
    r = (l - (left_offset!=0))
    r_invalid_mask = r<0
    r[r_invalid_mask] = 0
    r += l_invalid_mask
    right_offset = a-choices[r]
    right_offset[r_invalid_mask] *= -1
    out = a[(left_offset < thresh) | (right_offset < thresh)]
    return out
Updated timings to test the further speedup -
In [388]: np.random.seed(0)
...: A = np.random.randint(0,1000000,(100000))
...: B = np.unique(np.random.randint(0,1000000,(40000)))
...: np.random.shuffle(B)
...: thresh = 10
...:
...: out1 = searchsorted_filter(A, B, thresh)
...: out2 = searchsorted_filter_v2(A, B, thresh)
...: print np.allclose(out1, out2)
True
In [389]: %timeit searchsorted_filter(A, B, thresh)
10 loops, best of 3: 24.2 ms per loop
In [390]: %timeit searchsorted_filter_v2(A, B, thresh)
100 loops, best of 3: 13.9 ms per loop
Digging deeper -
In [396]: a = A; b = B
In [397]: N = len(b)
...:
...: choices = np.sort(b) # if b is already sorted, skip it
...:
...: l = np.searchsorted(choices, a, 'left')
In [398]: %timeit np.sort(B)
100 loops, best of 3: 2 ms per loop
In [399]: %timeit np.searchsorted(choices, a, 'left')
100 loops, best of 3: 10.3 ms per loop
It seems searchsorted and sort take almost all of the runtime, and they are essential to this method. So it doesn't seem like it can be improved much further while staying with this sort-based approach.
You could find the distance of each point in A from each value in B using bsxfun and then find the index of the point in A which is closest to each value in B using min.
[dists, ind] = min(abs(bsxfun(@minus, A, B.')), [], 2)
If you're on R2016b, bsxfun can be removed thanks to automatic broadcasting
[dists, ind] = min(abs(A - B.'), [], 2);
If you suspect that some values in B are not real peaks, then you can set a threshold value and remove any distances that were greater than this value.
threshold = 90;
ind = ind(dists < threshold);
Then we can use ind to index into A
output = A(ind);
You can use the MATLAB interp1 function, which does exactly what you want.
The 'nearest' option finds the nearest points, and there is no need to specify a threshold.
out = interp1(A, A, B, 'nearest', 'extrap');
Comparing with the other method:
A = sort(randi([0,1000000],1,10000));
B = sort(randi([0,1000000],1,4000));
disp('---interp1----------------')
tic
out = interp1(A, A, B, 'nearest', 'extrap');
toc
disp('---subtraction with threshold------')
%numpy version is the same
tic
[dists, ind] = min(abs(bsxfun(@minus, A, B.')), [], 2);
toc
Result:
---interp1----------------
Elapsed time is 0.00778699 seconds.
---subtraction with threshold------
Elapsed time is 0.445485 seconds.
interp1 can be used for inputs larger than 10000 and 4000, but the subtraction method runs out of memory.