Fastest way to count frequencies of ordered list entries - python

I'm counting the occurrences of non-overlapping grouped subsequences of length i in a binary list, so for example if I have a list:
[0, 1, 0, 1, 1, 0, 0, 0, 1, 1], I want to count occurrences of [0,0] (one), [0,1] (two), [1,0] (one), [1,1] (one).
I have created a function that accomplishes this (see below). However, I would like to see if there is anything that can be done to speed up its execution time. I've already got it to be pretty quick (compared with previous versions of the same function): it currently takes about 0.03 seconds for a list of length 100,000 with i=2, and about 30 seconds for a list of length 100,000,000 with i=2 (a seemingly linear increase in time with sequence length). However, my end goal is to do this for multiple values of i, with sequences of length near 15 billion. Assuming linearity holds, that would take about 4.2 hours for just i=2 (higher values of i take longer, as there are more unique subsequences to count).
I'm unsure whether much more speed can be gained here (at least while still working in Python), but I am open to suggestions on how to accomplish this faster (with any method or language).
from collections import Counter

def subseq_counter(i, l):
    """Counts the frequency of unique, non-overlapping, grouped subsequences of length i in a binary list l."""
    grouped = [str(l[k:k + i]) for k in range(0, len(l), i)]
    # groups terms into i-length subsequences
    if len(grouped[len(grouped) - 1]) != len(grouped[0]):
        grouped.pop(len(grouped) - 1)
    # removes any subsequence at the end that is not of length i
    grouped_sort = sorted(grouped)
    # necessary so as to make sure the output frequencies correlate to the ascending binary order of the subsequences
    grouped_sort_values = Counter(grouped_sort).values()
    # counts the elements' frequency
    freq_list = list(grouped_sort_values)
    return freq_list
I know that a marginally faster execution time can be obtained by removing the grouped_sort line; however, I need to be able to access the frequencies in correlation to the ascending binary order of the subsequences (so for i=2 that would be [0,0], [0,1], [1,0], [1,1]) and have not figured out a better way around this.
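For reference, the function above applied to the example list returns the four frequencies in ascending binary order:
l = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
print(subseq_counter(2, l))
# [1, 2, 1, 1]  -> counts for [0,0], [0,1], [1,0], [1,1]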

I don't know if it is faster, but try:
import numpy as np
# create data
bits = np.random.randint(0, 2, 10000)
def subseq_counter(i: int, l: np.array):
    """
    Counts the occurrences of each non-overlapping subsequence of length i in the array l
    """
    # the array l is reshaped as a matrix of i columns, and
    # matrix-multiplied by the binary weights "powers of 2"
    #              | [[2**2],
    #              |  [2**1],
    #              |  [2**0]]
    #              |____________________
    #  [[1,0,1],   |  1*4 + 0*2 + 1*1 = 5
    #   [0,1,0],   |  0*4 + 1*2 + 0*1 = 2
    #   ...,       |  ....
    #   [1,1,1]]   |  1*4 + 1*2 + 1*1 = 7
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1, -1, -1).T)
    unique, counts = np.unique(iBits, return_counts=True)
    print(f"Counts for {i} bits:")
    for u, c in zip(unique, counts):
        print(f"{u:0{i}b}:{c}")
    return unique, counts
subseq_counter(2,bits)
subseq_counter(3,bits)
>>> Counts for 2 bits:
>>> 00:1264
>>> 01:1279
>>> 10:1237
>>> 11:1220
>>> Counts for 3 bits:
>>> 000:425
>>> 001:429
>>> 010:411
>>> 011:395
>>> 100:437
>>> 101:412
>>> 110:407
>>> 111:417
What it does is reshape the list into an array of n rows by i columns, convert each row to an integer by multiplying by powers of 2 (so 00 becomes 0, 01 becomes 1, 10 becomes 2 and 11 becomes 3), and then do the counting with np.unique().
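As a small illustration of that packing step (a sketch using the example list from the question, with the same power-of-two weights as above):
import numpy as np

l = np.array([0, 1, 0, 1, 1, 0, 0, 0, 1, 1])
i = 2
# pack each non-overlapping pair of bits into one integer: 01 -> 1, 01 -> 1, 10 -> 2, 00 -> 0, 11 -> 3
iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1, -1, -1))
print(iBits)                                 # [1 1 2 0 3]
print(np.unique(iBits, return_counts=True))  # (array([0, 1, 2, 3]), array([1, 2, 1, 1]))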

Benchmark including some new solutions from me:
For i=2:
2.9 s ± 0.0 s Kelly_NumPy
3.7 s ± 0.0 s Kelly_bytes_count
6.6 s ± 0.0 s Kelly_zip
7.8 s ± 0.1 s Colim_numpy
8.4 s ± 0.0 s Paul_genzip
8.6 s ± 0.0 s Kelly_bytes_split2
10.5 s ± 0.0 s Kelly_bytes_slices2
10.6 s ± 0.1 s Kelly_bytes_split1
16.1 s ± 0.0 s Kelly_bytes_slices1
20.9 s ± 0.1 s constantstranger
45.1 s ± 0.3 s original
For i=5:
2.3 s ± 0.0 s Kelly_NumPy
3.8 s ± 0.0 s Kelly_zip
4.5 s ± 0.0 s Paul_genzip
4.5 s ± 0.0 s Kelly_bytes_split2
5.2 s ± 0.0 s Kelly_bytes_split1
5.4 s ± 0.0 s Kelly_bytes_slices2
7.1 s ± 0.0 s Colim_numpy
7.2 s ± 0.0 s Kelly_bytes_slices1
9.3 s ± 0.0 s constantstranger
20.6 s ± 0.0 s Kelly_bytes_count
25.3 s ± 0.1 s original
This is for a list of length n=1e6, with times multiplied by 100 so they somewhat reflect your timings with length 1e8. I minimally modified the other solutions so they do what your original does, i.e., take a list of ints and return a list of ints in the correct order. One or two of my slower solutions only work if the length is a multiple of their block size; I didn't bother making them work for all lengths since they're slower anyway.
Full code (Try it online!):
def Kelly_NumPy(i, l):
    a = np.frombuffer(bytes(l), np.int8)
    stop = a.size // i * i
    s = a[:stop:i]
    for j in range(1, i):
        s = (s << 1) | a[j:stop:i]
    return np.unique(s, return_counts=True)[1].tolist()

def Kelly_zip(i, l):
    ctr = Counter(zip(*[iter(l)]*i))
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_slices1(i, l):
    a = bytes(l)
    slices = [a[j:j+i] for j in range(0, len(a)//i*i, i)]
    ctr = Counter(slices)
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_slices2(i, l):
    a = bytes(l)
    ig = itemgetter(*(slice(j, j+i) for j in range(0, 1000*i, i)))
    ctr = Counter(chain.from_iterable(
        ig(a[k:k+1000*i])
        for k in range(0, len(l), 1000*i)
    ))
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_count(i, l):
    n = len(l)
    a = bytes(l)
    b = bytearray([2]) * (n + n//i)
    for j in range(i):
        b[j+1::i+1] = a[j::i]
    a = b
    ss = [bytes([2])]
    for _ in range(i):
        ss = [s+b for s in ss for b in [bytes([0]), bytes([1])]]
    return [a.count(s) for s in ss]

def Kelly_bytes_split1(i, l):
    n = len(l) // i
    stop = n * i
    a = bytes(l)
    sep = bytearray([2])
    b = sep * (stop + n - 1)
    for j in range(i):
        b[j::i+1] = a[j::i]
    ctr = Counter(bytes(b).split(sep))
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_split2(i, l):
    n = len(l) // i
    stop = n * i
    a = bytes(l)
    sep = bytearray([2])
    b = sep * (5000*i + 4999)
    ctr = Counter()
    for k in range(0, stop, 5000*i):
        for j in range(i):
            b[j::i+1] = a[k+j:k+5000*i+j:i]
        ctr.update(bytes(b).split(sep))
    return [v for k, v in sorted(ctr.items())]

def original(i, l):
    grouped = [str(l[k:k + i]) for k in range(0, len(l), i)]
    if len(grouped[len(grouped) - 1]) != len(grouped[0]):
        grouped.pop(len(grouped) - 1)
    grouped_sort = sorted(grouped)
    grouped_sort_values = Counter(grouped_sort).values()
    freq_list = list(grouped_sort_values)
    return freq_list

def Paul_genzip(subseq_len, sequence):
    ctr = Counter(subseq for subseq in zip(*[iter(sequence)] * subseq_len))
    return [v for k, v in sorted(ctr.items())]

def constantstranger(i, l):
    freq_list = [0] * 2 ** i
    binaryTupToInt = {binTup: j for j, binTup in enumerate(product((0, 1), repeat=i))}
    c = Counter(binaryTupToInt[tuple(l[k:k+i])] for k in range(0, len(l) // i * i, i))
    for k, v in c.items():
        freq_list[k] = v
    return freq_list

def Colim_numpy(i: int, l):
    l = np.array(l)
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1, -1, -1).T)
    unique, counts = np.unique(iBits, return_counts=True)
    return counts.tolist()

funcs = [
    original,
    Colim_numpy,
    Paul_genzip,
    constantstranger,
    Kelly_NumPy,
    Kelly_bytes_count,
    Kelly_zip,
    Kelly_bytes_slices1,
    Kelly_bytes_slices2,
    Kelly_bytes_split1,
    Kelly_bytes_split2,
]

from time import time
import os
from collections import Counter
from itertools import repeat, chain, product
import numpy as np
from operator import itemgetter
from statistics import mean, stdev

n = 10**6
i = 2

times = {f: [] for f in funcs}

def stats(f):
    ts = [t/n*1e8 for t in sorted(times[f])[:3]]
    return f'{mean(ts):4.1f} s ± {stdev(ts):3.1f} s '

for _ in range(10):
    l = [b % 2 for b in os.urandom(n)]
    expect = None
    for f in funcs:
        t = time()
        result = f(i, l)
        t = time() - t
        times[f].append(t)
        if expect is None:
            expect = result
        else:
            assert result == expect

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)

Not really sure I understood that last part about the order. It seems unnecessary to build a giant list of subsequences. Use a generator to yield the subsequences to the counter - that way you also don't have to fiddle with indices:
from collections import Counter

def count_subsequences(sequence, subseq_len=2):
    return Counter(subseq for subseq in zip(*[iter(sequence)] * subseq_len))

sequence = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
counter = count_subsequences(sequence)
for subseq in (0, 0), (0, 1), (1, 0), (1, 1):
    print("{}: {}".format(subseq, counter[subseq]))
Output:
(0, 0): 1
(0, 1): 2
(1, 0): 1
(1, 1): 1
In this case, the function returns the counter object itself, and the calling code displays the results in some order.
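If the counts are needed as a plain list in ascending binary order (as in the question), one option is to sort the counter items by key, the same way the other answers here do; note that subsequences which never occur will simply be absent from the counter:
freq_list = [count for subseq, count in sorted(counter.items())]
print(freq_list)   # [1, 2, 1, 1]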

This is much faster. It uses Kelly's idea of using numpy.frombuffer instead of converting the list to a numpy array, and uses Pandas to count unique values, which is faster than numpy.unique for more than 100,000 results.
import numpy as np
import pandas as pd

def subseq_counter(i: int, l):
    l = np.frombuffer(bytes(l), np.int8)
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1, -1, -1).T).astype(np.int8)
    # bug fix: when there is not enough data (highly probable for large i),
    # iBits does not contain every possible value, so returning the unique
    # values as a list may lose information
    answer = [0]*2**i  # empty counter including all possible values
    if len(iBits) > 100000:
        for val, cnt in pd.value_counts(iBits).items():
            answer[val] = cnt
    else:
        unique, count = np.unique(iBits, return_counts=True)
        for val, cnt in zip(unique, count):
            answer[val] = cnt
    return answer
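A small usage sketch (the random bit list is just an assumed test input, similar to the one in the earlier numpy answer):
bits = np.random.randint(0, 2, 1_000_000).tolist()
print(subseq_counter(2, bits))   # four counts in ascending binary order: [0,0], [0,1], [1,0], [1,1]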

This is a way to do it:
from collections import Counter
from itertools import product

def subseq_counter(i, l):
    freq_list = [0] * 2 ** i
    binaryTupToInt = {binTup: j for j, binTup in enumerate(product((0, 1), repeat=i))}
    c = Counter(binaryTupToInt[tuple(l[k:k+i])] for k in range(0, len(l) // i * i, i))
    for k, v in c.items():
        freq_list[k] = v
    return freq_list
l = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
i = 2
print(subseq_counter(i, l))
Output:
[1, 2, 1, 1]
Notes:
Using the above code and changing i to 3 gives:
[0, 1, 1, 0, 0, 0, 1, 0]
This is showing the frequency for all possible binary values of length 3 in ascending order beginning with 0 (binary 0,0,0) and ending with 7 (binary 1,1,1). In other words, 0,0,0 occurs 0 times, 0,0,1 occurs 1 time, 0,1,0 occurs 1 time, 0,1,1 occurs 0 times, etc., through 1,1,1 which occurs 0 times.
Using the code in the question with i changed to 3 gives:
[1, 1, 1]
This output seems hard to decipher, as it isn't labeled so that we can easily see that the results with a non-zero value correspond to the 3-digit binary values 0,0,1, 0,1,0 and 1,1,0.
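If labeled output is wanted, each index can be mapped back to its binary pattern using the same itertools.product ordering the function relies on (a small sketch):
for pattern, count in zip(product((0, 1), repeat=3), subseq_counter(3, l)):
    print(pattern, count)
# (0, 0, 0) 0
# (0, 0, 1) 1
# (0, 1, 0) 1
# ... and so on through (1, 1, 1) 0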
UPDATE:
Here's a benchmark of several approaches on an input list of length 55 million (with i set to 2) including OP's, counting sort (this answer), numpy including list-to-ndarray conversion overhead, and numpy without the overhead:
foo_1 output:
[10000000, 15000000, 15000000, 15000000]
foo_2 output:
[10000000, 15000000, 15000000, 15000000]
foo_3 output:
[10000000 15000000 15000000 15000000]
foo_4 output:
[10000000 15000000 15000000 15000000]
Timeit results:
foo_1 (OP) ran in 32.20719700001064 seconds using 1 iterations
foo_2 (counting sort) ran in 17.91718759998912 seconds using 1 iterations
foo_3 (numpy with list-to-array conversion) ran in 9.713831000000937 seconds using 1 iterations
foo_4 (numpy) ran in 1.695262699999148 seconds using 1 iterations
The clear winner is numpy, though unless the calling program can easily be changed to use ndarrays, the required conversion slows things down by a factor of about 5x in this example.
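If the calling program has to keep plain lists of 0/1 ints, one way to avoid most of that conversion overhead is the bytes/np.frombuffer trick from Kelly's answer above (a sketch; the helper name here is made up for illustration):
import numpy as np

def as_int8_array(l):
    # reinterpret a list of 0/1 ints as an int8 ndarray;
    # the earlier benchmarks suggest this is much cheaper than np.array(l) for this use case
    return np.frombuffer(bytes(l), dtype=np.int8)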

Related

Taking equal number of elements from two arrays, such that the taken values have as few duplicates as possible

Consider we have 2 arrays of size N, with their values in the range [0, N-1]. For example:
a = np.array([0, 1, 2, 0])
b = np.array([2, 0, 3, 3])
I need to produce a new array c which contains exactly N/2 elements from a and b respectively, i.e. the values must be taken evenly/equally from both parent arrays.
(For odd length, this would be (N-1)/2 and (N+1)/2. Can also ignore odd length case, not important).
Taking equal number of elements from two arrays is pretty trivial, but there is an additional constraint: c should have as many unique numbers as possible / as few duplicates as possible.
For example, a solution to a and b above is:
c = np.array([b[0], a[1], b[2], a[3]])
>>> c
array([2, 1, 3, 0])
Note that the position/order is preserved. Each element of a and b that we took to form c is in the same position: if element i in c is from a, then c[i] == a[i], and the same for b.
A straightforward solution for this is simply a sort of path traversal, easy enough to implement recursively:
def traverse(i, a, b, path, n_a, n_b, best, best_path):
    if n_a == 0 and n_b == 0:
        score = len(set(path))
        return (score, path.copy()) if score > best else (best, best_path)
    if n_a > 0:
        path.append(a[i])
        best, best_path = traverse(i + 1, a, b, path, n_a - 1, n_b, best, best_path)
        path.pop()
    if n_b > 0:
        path.append(b[i])
        best, best_path = traverse(i + 1, a, b, path, n_a, n_b - 1, best, best_path)
        path.pop()
    return best, best_path
Here n_a and n_b are how many values we will take from a and b respectively, it's 2 and 2 as we want to evenly take 4 items.
>>> score, best_path = traverse(0, a, b, [], 2, 2, 0, None)
>>> score, best_path
(4, [2, 1, 3, 0])
Is there a way to implement the above in a more vectorized/efficient manner, possibly through numpy?
The algorithm is slow mainly because it runs in exponential time. There is no straightforward way to vectorize it using only Numpy because of the recursion. Even if that were possible, the huge number of combinations would cause most Numpy implementations to be inefficient (due to the large Numpy arrays to compute). Additionally, there is AFAIK no vectorized operation to count the number of unique values of many rows efficiently (the usual way is to use np.unique, which is not efficient in this case and cannot be used without a loop). As a result, there are two possible strategies to speed this up:
trying to find an algorithm with a reasonable complexity (e.g. <= O(n^4));
using compilation methods, micro-optimizations and tricks to write a faster brute-force implementation.
Since finding a correct sub-exponential algorithm turns out not to be easy, I chose the second approach (though the first approach would be the best).
The idea is to:
remove the recursion by generating all possible solutions using a loop iterating over integers;
write a fast way to count unique items of an array;
use the Numba JIT compiler to optimize code that is only efficient once compiled.
Here is the final code:
import numpy as np
import numba as nb
# Naive way to count unique items.
# This is a slow fallback implementation.
#nb.njit
def naive_count_unique(arr):
count = 0
for i in range(len(arr)):
val = arr[i]
found = False
for j in range(i):
if arr[j] == val:
found = True
break
if not found:
count += 1
return count
# Optimized way to count unique items on small arrays.
# Count items 2 by 2.
# Fast on small arrays.
#nb.njit
def optim_count_unique(arr):
count = 0
for i in range(0, len(arr), 2):
if arr[i] == arr[i+1]:
tmp = 1
for j in range(i):
if arr[j] == arr[i]: tmp = 0
count += tmp
else:
val1, val2 = arr[i], arr[i+1]
tmp1, tmp2 = 1, 1
for j in range(i):
val = arr[j]
if val == val1: tmp1 = 0
if val == val2: tmp2 = 0
count += tmp1 + tmp2
return count
#nb.njit
def count_unique(arr):
if len(arr) % 2 == 0:
return optim_count_unique(arr)
else:
# Odd case: not optimized yet
return naive_count_unique(arr)
# Count the number of bits in a 32-bit integer
# See https://stackoverflow.com/questions/71097470/msb-lsb-popcount-in-numba
#nb.njit('int_(uint32)', inline='always')
def popcount(v):
v = v - ((v >> 1) & 0x55555555)
v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
c = np.uint32((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24
return c
# Count the number of bits in a 64-bit integer
#nb.njit(inline='always')
def bit_count(n):
if n < (1 << 30):
return popcount(np.uint32(n))
else:
return popcount(np.uint32(n)) + popcount(np.uint32(n >> 32))
# Mutate `out` so not to create an expensive new temporary array
#nb.njit
def int_to_path(n, out, a, b):
for i in range(len(out)):
out[i] = a[i] if ((n >> i) & 1) else b[i]
#nb.njit(['(int32[:], int32[:], int64, int64)', '(int64[:], int64[:], int64, int64)'])
def traverse_fast(a, b, n_a, n_b):
# This assertion is needed because the paths are encoded using 64-bit.
# This should not be a problem in practice since the number of solutions to
# test would be impracticably huge to test using this algorithm anyway.
assert n_a + n_b < 62
max_iter = 1 << (n_a + n_b)
path = np.empty(n_a + n_b, dtype=a.dtype)
score, best_score, best_i = 0, 0, 0
# Iterate over all cases (more than the set of possible solution)
for i in range(max_iter):
# Filter the possible solutions
if bit_count(i) != n_b:
continue
# Analyse the score of the solution
int_to_path(i, path, a, b)
score = count_unique(path)
# Store it if it better than the previous one
if score > best_score:
best_score = score
best_i = i
int_to_path(best_i, path, a, b)
return best_score, path
This implementation is about 30 times faster on arrays of size 8 on my machine. One could use several cores to speed this up even further. However, I think it is better to focus on finding a sub-exponential implementation so as to avoid wasting more computing resources. Note that the path is different from the one found by the initial function, but the score is the same on random arrays. This can help others test their implementations on larger arrays without waiting a long time.
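A minimal usage sketch on the arrays from the question, assuming the code above has been run (so Numba has compiled the functions):
a = np.array([0, 1, 2, 0], dtype=np.int64)
b = np.array([2, 0, 3, 3], dtype=np.int64)
best_score, path = traverse_fast(a, b, 2, 2)
print(best_score)  # 4
print(path)        # an optimal path, e.g. [2 1 3 0]; may differ from the recursive version's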
Test this heavily.
import numpy as np
from numpy.random._generator import default_rng

rand = default_rng(seed=1)
n = 16
a = rand.integers(low=0, high=n, size=n)
b = rand.integers(low=0, high=n, size=n)
uniques = np.setxor1d(a, b)
print(a)
print(b)
print(uniques)

def limited_uniques(arr: np.ndarray) -> np.ndarray:
    choose = np.zeros(shape=n, dtype=bool)
    _, idx, _ = np.intersect1d(arr, uniques, return_indices=True)
    idx = idx[:n//2]
    choose[idx] = True
    n_missing = n//2 - len(idx)
    counts = choose.cumsum()
    diffs = np.arange(n) - counts
    at = np.searchsorted(diffs, n_missing)
    choose[:at] = True
    return arr[choose]

a_half = limited_uniques(a)
uniques = np.union1d(uniques, np.setdiff1d(a, a_half))
interleaved = np.empty_like(a)
interleaved[0::2] = a_half
interleaved[1::2] = limited_uniques(b)
print(interleaved)
[ 7 8 12 15 0 2 13 15 3 4 13 6 4 13 4 6]
[10 8 1 0 13 12 13 8 13 5 7 12 1 4 1 7]
[ 1 2 3 5 6 10 15]
[ 7 10 8 8 12 1 15 0 0 13 2 12 3 5 6 4]

Finding a maximum of the product of three integers in a list of any size

I just finished working on this practice problem:
Given a list of integers, return the largest product that can be made by multiplying any three integers.
For example, if the list is [-10, -10, 5, 2], we should return 500, since that's -10 * -10 * 5.
I've written the following code and it seems to work, but I feel like it could be simpler somehow. Any ideas?
def maxProduct(lst):
    combo = []
    for i in range(0, len(lst)):
        for j in range(0, len(lst)):
            for k in range(0, len(lst)):
                if i != j and j != k and i != k:
                    x = sorted([lst[i]] + [lst[j]] + [lst[k]])
                    if x not in combo:
                        combo.append(x)
    final = []
    for i in combo:
        result = 1
        for j in i:
            result = result * j
        final.append(result)
    return max(final)
Taking a similar "brute force" approach, you can use some built-ins to make this a one-liner (and some imports):
from itertools import combinations
from functools import reduce
from operator import mul
result = max(reduce(mul, p, 1) for p in combinations(arr, 3))
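For the example input from the question this gives:
arr = [-10, -10, 5, 2]
print(max(reduce(mul, p, 1) for p in combinations(arr, 3)))  # 500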
There are only two non-degenerate possibilities, given that the list is of integers:
The three maximum positive integers.
The maximum positive integer and the two minimum negative integers.
Sort the list.
values = sorted(lst)
return max(values[-1] * values[-2] * values[-3],
           values[-1] * values[ 0] * values[ 1])
Change x = sorted([lst[i]] + [lst[j]] + [lst[k]]) to x = sorted([lst[i], lst[j], lst[k]])
Here is a generic and yet very fast way (O(n) for small k) of doing this (< 6ms for 100K elements and k=3).
It is generic in that you can look for the max product of any k>0 values (even or odd) if you so chose, not just k=3.
The idea is to look for the top k and bottom k values (hi, resp. lo). Instead of sorting (O(n log n)), we use heapq (O(n), used twice) to find hi and lo.
Usually hi will have the most positive values and lo will have the most negative values, but it works if there are negatives in hi or positives in lo. Then we look for k combinations among the 2k-list hi + lo.
from itertools import combinations
from functools import reduce
from operator import mul
import heapq

def prod(lst):
    return reduce(mul, lst, 1)

def find_max_prod(lst, k):
    hi = heapq.nlargest(k, lst)
    # take at most k smallest values, but never more than the elements not already in hi
    lo = heapq.nsmallest(min(k, len(lst) - len(hi)), lst)
    return max((prod(vals), vals) for vals in combinations(hi + lo, r=k))
This also returns both the product and the values that contributed to it (for inspection):
>>> find_max_prod([-10, -10, 5, 2], k=3)
(500, (-10, -10, 5))
>>> find_max_prod([-10, -10, 7, 3, 1, 20, -30, 5, 2], k=3)
(6000, (-30, -10, 20))
>>> find_max_prod([-10, -10, 7, 3, 1, 20, -30, 5, 2], k=4)
(42000, (-30, -10, 20, 7))
Speed
n = 100_000
lst = np.random.randint(-20, 20, size=n).tolist()
%timeit find_max_prod(lst, 3)
# 5.7 ms ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Notes
It works if there are no positive numbers (or no negative ones):
>>> find_max_prod([-1,-2,-1,-5], 3)
(-2, (-1, -1, -2))
>>> find_max_prod([5,6,7,8], 3)
(336, (8, 7, 6))
It works fine with floats as well:
>>> lst = np.random.normal(size=n).tolist()
... lst[:4]
[-1.2645437227178908,
0.04542859270503795,
-0.17997935575118532,
-0.03485546753207921]
>>> find_max_prod(lst, 3)
(72.00185172194192,
(-4.159094140171658, -4.145875048073711, 4.175694390863968))
Simple naive implementation (O(n³)):
import itertools

def max_product_of_three(values):
    return max(
        x * y * z
        for x, y, z in itertools.combinations(values, 3)
    )
You can probably improve this by sorting values first.
Edit: here is a much more performant (O(n)) solution
import heapq

def max_product_of_three(values):
    x, y, z = heapq.nlargest(3, values)
    a, b = heapq.nsmallest(2, values)
    return max(
        x * a * b,
        x * y * z
    )
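For the example from the question:
print(max_product_of_three([-10, -10, 5, 2]))  # 500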

Fastest and most computationally efficient way of generating several unique random ints within a range, excluding a list of ints

I want to generate a list of unique numbers from 0 to 2 million, excluding several numbers. The best solution I came up with is this
excludez = [34, 394849, 2233, 22345, 95995, 2920]
random.sample([i for i in range(0,2000000) if i not in excludez ], 64)
This generates 64 random ints from 0 to 2 million, excluding values in the list excludez.
This builds the whole candidate list with a comprehension, so I am wondering if there is a faster solution. I am open to using any library, especially numpy.
Edit:
The generated samples should contain unique numbers.
Edit 2:
I tested all the solutions using
print(timeit(lambda: solnX(), number=256))
And then did 3 samples of that code.
Here are the average results:
Original: 135.838 seconds
@inspectorG4dget: 0.02750687366665261 seconds
@jdehesa 1st solution: 150.08836392466674 seconds (surprising since it was a numpy solution)
@jdehesa 2nd solution: 0.022973252333334433 seconds
@Andrej Kesely: 0.016359308333373217 seconds
@Divakar: 39.05853628633334 seconds
I timed in google colab, here's a link to the notebook.
I rearranged the code a bit so that all solutions had a level playing field.
https://colab.research.google.com/drive/1ITYNrSTEVR_M5QZhqaSDmM8Q06IHsE73
Here's one with masking -
import numpy as np

def random_uniq(excludez, maxnum, num_samples):
    m = np.ones(maxnum, dtype=bool)
    m[excludez] = 0
    c = np.count_nonzero(m)
    idx = np.random.choice(c, num_samples, replace=False)
    m2 = np.ones(c, dtype=bool)
    m2[idx] = 0
    mc = m.copy()
    m[m] = m2
    out = np.flatnonzero(m != mc)
    return out
excludez = [34, 394849, 2233, 22345, 95995, 2920]
out = random_uniq(excludez, maxnum=2000000, num_samples=64)
In [85]: excludez = set([34, 394849, 2233, 22345, 95995, 2920]) # faster lookups
In [86]: answer = set() # since you don't really care about order
In [87]: while len(answer) < 64:
    ...:     r = random.randrange(0, 2000000)
    ...:     if r not in excludez and r not in answer: answer.add(r)
    ...:
This is one method to do it with NumPy:
import numpy as np
np.random.seed(0)
excludez = np.sort([2, 3, 6, 7, 13])
n = 15
size = 5
# Get unique integers in a reduced range
r = np.random.choice(n - len(excludez), size, replace=False)
# Shift values accordingly so excluded values are avoided
shift = np.arange(len(excludez) + 1)
r += shift[np.searchsorted(excludez - shift[:-1], r, 'right')]
print(r)
# [ 4 12 8 14 1]
Here is the same algorithm with plain Python:
import random
import bisect
random.seed(0)
excludez = [2, 3, 6, 7, 13]
n = 15
size = 5
shift = range(len(excludez) + 1)
search = [exc - i for i, exc in enumerate(excludez)]
r = random.sample(range(n - len(excludez)), size)
r = [v + shift[bisect.bisect_right(search, v)] for v in r]
print(r)
# [10, 14, 0, 4, 8]
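A quick sanity check of the result (a sketch): none of the sampled values should be excluded and all should be unique:
assert not set(r) & set(excludez)
assert len(set(r)) == len(r)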
One possible solution; method2 might contain duplicates, method3 does not:
from timeit import timeit
import random

excludez = [34, 394849, 2233, 22345, 95995, 2920]

def method1():
    return random.sample([i for i in range(0, 2000000) if i not in excludez], 64)

def method2():
    out = []
    while len(out) < 64:
        i = int(random.random() * 2000000)
        if i in excludez:
            continue
        out.append(i)
    return out

def method3():
    out = []
    while len(out) < 64:
        i = int(random.random() * 2000000)
        if i in excludez or i in out:
            continue
        out.append(i)
    return out

print(timeit(lambda: method1(), number=10))
print(timeit(lambda: method2(), number=10))
print(timeit(lambda: method3(), number=10))
Prints:
1.865599181000107
0.0002175730000999465
0.00039564000007885625
EDIT: Added int()

Cumulative addition/multiplication in NumPy

Have a relatively simple block of code that loops through two arrays, multiplies, and adds cumulatively:
import numpy as np

a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])

c = []
d = 0
for i, val in enumerate(a):
    d += val
    c.append(d)
    d *= b[i]
Is there a way to do this without iterating? I imagine cumsum/cumprod could be used but I'm having trouble figuring out how. When you break down what's happening step by step, it looks like this:
# 0: 0 + a[0]
# 1: ((0 + a[0]) * b[0]) + a[1]
# 2: ((((0 + a[0]) * b[0]) + a[1]) * b[1]) + a[2]
Edit for clarification: Am interested in the list (or array) c.
In each iteration, you have -
d[n+1] = d[n] + a[n]
d[n+1] = d[n+1] * b[n]
Thus, essentially -
d[n+1] = (d[n] + a[n]) * b[n]
i.e. -
d[n+1] = (d[n]* b[n]) + K[n] #where `K[n] = a[n] * b[n]`
Now, using this formula, if you write down the expressions for the first few values of n, you would have -
d[1] = d[0]*b[0] + K[0]
d[2] = d[0]*b[0]*b[1] + K[0]*b[1] + K[1]
d[3] = d[0]*b[0]*b[1]*b[2] + K[0]*b[1]*b[2] + K[1]*b[2] + K[2]
Scalars : b[0]*b[1]*b[2] b[1]*b[2] b[2] 1
Coefficients : d[0] K[0] K[1] K[2]
Thus, you would need the reversed cumprod of b, multiplied elementwise with the K array. Finally, to get c, perform a cumsum; since c is stored before scaling down by b, you would need to scale the cumsum version back down by the reversed cumprod of b.
The final implementation would look like this -
# Get reversed cumprod of b and pad with `1` at the end
b_rev_cumprod = b[::-1].cumprod()[::-1]
B = np.hstack((b_rev_cumprod,1))
# Get K
K = a*b
# Append with 0 at the start, corresponding starting d
K_ext = np.hstack((0,K))
# Perform elementwise multiplication and cumsum and scale down for final c
sums = (B*K_ext).cumsum()
c = sums[1:]/b_rev_cumprod
Runtime tests and verify output
Function definitions -
def original_approach(a, b):
    c = []
    d = 0
    for i, val in enumerate(a):
        d = d + val
        c.append(d)
        d = d * b[i]
    return c

def vectorized_approach(a, b):
    b_rev_cumprod = b[::-1].cumprod()[::-1]
    B = np.hstack((b_rev_cumprod, 1))
    K = a*b
    K_ext = np.hstack((0, K))
    sums = (B*K_ext).cumsum()
    return sums[1:]/b_rev_cumprod
Runtimes and verification
Case #1: OP Sample case
In [301]: # Inputs
...: a = np.array([1, 2, 4, 6, 7, 8, 9, 11])
...: b = np.array([0.01, 0.2, 0.03, 0.1, 0.1, 0.6, 0.5, 0.9])
...:
In [302]: original_approach(a,b)
Out[302]:
[1,
2.0099999999999998,
4.4020000000000001,
6.1320600000000001,
7.6132059999999999,
8.7613205999999995,
14.256792359999999,
18.128396179999999]
In [303]: vectorized_approach(a,b)
Out[303]:
array([ 1. , 2.01 , 4.402 , 6.13206 ,
7.613206 , 8.7613206 , 14.25679236, 18.12839618])
Case #2: Large input case
In [304]: # Inputs
...: N = 1000
...: a = np.random.randint(0,100000,N)
...: b = np.random.rand(N)+0.1
...:
In [305]: np.allclose(original_approach(a,b),vectorized_approach(a,b))
Out[305]: True
In [306]: %timeit original_approach(a,b)
1000 loops, best of 3: 746 µs per loop
In [307]: %timeit vectorized_approach(a,b)
10000 loops, best of 3: 76.9 µs per loop
Please be mindful that for extremely large input arrays, if the b elements are very small fractions, the cumulative operations can make the initial entries of b_rev_cumprod come out as zeros, resulting in NaNs in those initial places.
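A minimal sketch of that caveat (toy input chosen only to force the underflow): with enough small b values, the leading entries of the reversed cumprod become exactly zero and the final division produces NaNs (numpy also emits an 'invalid value' RuntimeWarning):
a = np.ones(400)
b = np.full(400, 0.1)   # 0.1**400 underflows to 0.0
c = vectorized_approach(a, b)
print(c[:3])            # [nan nan nan]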
Let's see if we can get even faster. I am now leaving the pure python world and showing that this purely numeric problem can be optimized even further.
The two players are @Divakar's fast vectorized version:
def vectorized_approach(a, b):
    b_rev_cumprod = b[::-1].cumprod()[::-1]
    B = np.hstack((b_rev_cumprod, 1))
    K = a*b
    K_ext = np.hstack((0, K))
    sums = (B*K_ext).cumsum()
    return sums[1:]/b_rev_cumprod
and a cython version:
%%cython
import numpy as np

def cython_approach(long[:] a, double[:] b):
    cdef double d
    cdef size_t i, n

    n = a.shape[0]
    cdef double[:] c = np.empty(n)

    d = 0
    for i in range(n):
        d += a[i]
        c[i] = d
        d *= b[i]
    return c
The cython version is about 5x faster than the vectorized version:
%timeit vectorized_approach(a,b) -> 10000 loops, best of 3: 43.4 µs per loop
%timeit cython_approach(a,b) -> 100000 loops, best of 3: 7.7 µs per loop
Another plus of the cython version is that it is much more readable.
The big downside is that you are leaving pure python and, depending on your use case, compiling an extension module may not be an option for you.
This here works for me and is vectorized
b_mat = np.tile(b,(b.size,1)).T
b_mat = np.vstack((np.ones(b.size),b_mat))
np.fill_diagonal(b_mat,1)
b_mat[np.triu_indices(b.size)]=1
b_prod_mat = np.cumprod(b_mat,axis=0)
b_prod_mat[np.triu_indices(b.size)] = 0
np.fill_diagonal(b_prod_mat,1)
c = np.dot(b_prod_mat,a)
c
# output
array([ 1. , 2.01 , 4.402, 6.132, 7.613, 8.761, 14.257,
18.128, 16.316])
I agree it is not easy to see what's going on. Your array c can be written as a matrix-vector multiplication b_prod_mat * a, where a is your array and b_prod_mat consists of specific products of b. Essentially all the work goes into creating b_prod_mat.
I am not sure that's better than a for loop but here is a way:
a.dot([np.concatenate((np.zeros(i), (1, ), b[i:-1])) for i in range(len(b))])
What it does is create the lines of a big matrix A like this:
1 b0 b0b1 b0b1b2 ... b0b1..bn-1
0 1 b1 b1b2 ... b1..bn-1
0 0 1 b2 ...
...
0 0 0 0 ... 1
Then you simply multiply the vector a with the matrix A and you get your expected result.

Elegant way of reducing list by averaging?

Is there a more elegant way of writing this function?
def reduce(li):
    result = [0 for i in xrange((len(li)/2)+(len(li)%2))]
    for i, e in enumerate(li):
        result[int(i/2)] += e
    for i in range(len(result)):
        result[i] /= 2
    if (len(li)%2 == 1):
        result[len(result)-1] *= 2
    return result
Here is what it does:
a = [0,2,10,12]
b = [0,2,10,12,20]
reduce(a)
>>> [1,11]
reduce(b)
>>> [1,11,20]
It takes the average of each even/odd index pair, and leaves the last element as is if the list has an odd number of elements.
What you actually want to do is apply a moving average of 2 samples through your list: mathematically, you convolve with a window of [.5, .5] and then take just the even samples. To avoid dividing the last element of odd-length arrays by two, you should duplicate it; this does not affect even-length arrays.
Using numpy it gets pretty elegant:
import numpy as np
np.convolve(a + [a[-1]], [.5,.5], mode='valid')[::2]
array([ 1., 11.])
np.convolve(b + [b[-1]], [.5,.5], mode='valid')[::2]
array([ 1., 11., 20.])
You can convert back to a list using list(outputarray).
Using numpy is very useful if performance matters, since optimized C math code is doing the work:
In [10]: %time a=reduce(list(np.arange(1000000))) #chosen answer
CPU times: user 6.38 s, sys: 0.08 s, total: 6.46 s
Wall time: 6.39 s
In [11]: %time c=np.convolve(list(np.arange(1000000)), [.5,.5], mode='valid')[::2]
CPU times: user 0.59 s, sys: 0.01 s, total: 0.60 s
Wall time: 0.61 s
def reduce(li):
    result = [(x+y)/2.0 for x, y in zip(li[::2], li[1::2])]
    if len(li) % 2:
        result.append(li[-1])
    return result
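A quick check against the examples from the question:
print(reduce([0, 2, 10, 12]))      # [1.0, 11.0]
print(reduce([0, 2, 10, 12, 20]))  # [1.0, 11.0, 20]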
Note that your original code had two bugs (both caused by integer division in Python 2): [0,1] would give 0 rather than 0.5, and [5] would give [4] instead of [5].
Here's a one-liner:
[(0.5*(x+y) if y != None else x) for x,y in map(None, *(iter(b),) * 2)]
where b is your original list that you want to reduce.
Edit: Here's a variant on the code I have above that maybe is a bit clearer and relies on itertools:
from itertools import izip_longest
[(0.5*(x+y) if y != None else x) for x,y in izip_longest(*[iter(b)]* 2)]
Here's another attempt at it that seems more straightforward to me because it's all one pass:
def reduce(li):
result = []
it = iter(li)
try:
for i in it:
result.append((i + next(it)) / 2)
except StopIteration:
result.append(li[-1])
return result
Here's my try, using itertools:
import itertools

def reduce(somelist):
    odds = itertools.islice(somelist, 0, None, 2)
    evens = itertools.islice(somelist, 1, None, 2)
    for (x, y) in itertools.izip(odds, evens):
        yield((x + y) / 2.0)
    if len(somelist) % 2 != 0: yield(somelist[-1])
>>> [x for x in reduce([0, 2, 10, 12, 20]) ]
[1, 11, 20]
See also: itertools documentation.
Update: Fixed to divide by float rather than int.
