Elegant way of reducing list by averaging? - python

Is there a more elegant way of writing this function?
def reduce(li):
    result = [0 for i in xrange((len(li)/2)+(len(li)%2))]
    for i, e in enumerate(li):
        result[int(i/2)] += e
    for i in range(len(result)):
        result[i] /= 2
    if (len(li)%2 == 1):
        result[len(result)-1] *= 2
    return result
Here's what it does:
a = [0,2,10,12]
b = [0,2,10,12,20]
reduce(a)
>>> [1,11]
reduce(b)
>>> [1,11,20]
It takes the average of each adjacent pair of elements (even and odd indexes), and leaves the last one as-is if the list has an odd number of elements.

What you actually want to do is apply a moving average of 2 samples through your list: mathematically, you convolve with a window of [.5, .5], then take just the even samples. To avoid dividing the last element of odd-length arrays by two, you should duplicate it; this does not affect even-length arrays.
Using numpy it gets pretty elegant:
import numpy as np
>>> np.convolve(a + [a[-1]], [.5,.5], mode='valid')[::2]
array([ 1., 11.])
>>> np.convolve(b + [b[-1]], [.5,.5], mode='valid')[::2]
array([ 1., 11., 20.])
You can convert back to a list using list(outputarray).
Using numpy is very useful if performance matters, since optimized C math code is doing the work:
In [10]: %time a=reduce(list(np.arange(1000000))) #chosen answer
CPU times: user 6.38 s, sys: 0.08 s, total: 6.46 s
Wall time: 6.39 s
In [11]: %time c=np.convolve(list(np.arange(1000000)), [.5,.5], mode='valid')[::2]
CPU times: user 0.59 s, sys: 0.01 s, total: 0.60 s
Wall time: 0.61 s

def reduce(li):
    result = [(x+y)/2.0 for x, y in zip(li[::2], li[1::2])]
    if len(li) % 2:
        result.append(li[-1])
    return result
Note that your original code had two bugs: [0,1] would give 0 rather than 0.5, and [5] would give [4] instead of [5].
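A quick check of those two cases against the fixed version above:
>>> reduce([0, 1])
[0.5]
>>> reduce([5])
[5]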

Here's a one-liner (Python 2 only, since map(None, ...) relies on Python 2's zip-with-padding behavior):
[(0.5*(x+y) if y is not None else x) for x,y in map(None, *(iter(b),) * 2)]
where b is your original list that you want to reduce.
Edit: Here's a variant on the code I have above that maybe is a bit clearer and relies on itertools (izip_longest was renamed zip_longest in Python 3):
from itertools import izip_longest
[(0.5*(x+y) if y is not None else x) for x,y in izip_longest(*[iter(b)] * 2)]

Here's another attempt at it that seems more straightforward to me because it's all one pass:
def reduce(li):
    result = []
    it = iter(li)
    try:
        for i in it:
            result.append((i + next(it)) / 2)
    except StopIteration:
        result.append(li[-1])
    return result

Here's my try, using itertools:
import itertools

def reduce(somelist):
    odds = itertools.islice(somelist, 0, None, 2)
    evens = itertools.islice(somelist, 1, None, 2)
    for (x, y) in itertools.izip(odds, evens):
        yield (x + y) / 2.0
    if len(somelist) % 2 != 0:
        yield somelist[-1]
>>> [x for x in reduce([0, 2, 10, 12, 20]) ]
[1, 11, 20]
See also: itertools documentation.
Update: Fixed to divide by float rather than int.

Related

Fastest way to count frequencies of ordered list entries

I'm counting the occurrences of non-overlapping grouped subsequences of length i in a binary list, so for example if I have a list:
[0, 1, 0, 1, 1, 0, 0, 0, 1, 1], I want to count occurrences of [0,0] (one), [0,1] (two), [1,0] (one), [1,1] (one).
I have created a function that accomplishes this (see below). However, I would like to see if there is anything that can be done to speed up the execution time of the function. I've already got it to be pretty quick (over previous versions of the same function), and it currently takes about ~0.03 seconds for a list of length=100,000 and i=2, and about 30 seconds for a list of length=100,000,000 and i=2. (This is a seemingly linear increase in time in relation to sequence length.) However, my end goal is to do this with functions for multiple values of i, with sequences of lengths near 15 billion. Which, assuming linearity holds, would take about 4.2 hours for just i=2 (higher values of i take longer, as they have to count more unique subsequences).
I'm unsure if there is much more speed that can be gained here (at least, while still working in Python), but I am open to suggestions on how to accomplish this faster (with any method or language)?
def subseq_counter(i, l):
    """counts the frequency of unique, non-overlapping, grouped subsequences of length i in a binary list l"""
    grouped = [str(l[k:k + i]) for k in range(0, len(l), i)]
    # groups terms into i-length subsequences
    if len(grouped[len(grouped) - 1]) != len(grouped[0]):
        grouped.pop(len(grouped) - 1)
    # removes any subsequences at the end that are not of length i
    grouped_sort = sorted(grouped)
    # necessary so as to make sure the output frequencies correlate to the ascending binary order of the subsequences
    grouped_sort_values = Counter(grouped_sort).values()
    # counts the elements' frequency
    freq_list = list(grouped_sort_values)
    return freq_list
I know that a marginally faster execution time can be obtained by removing the grouped_sort line; however, I need to be able to access the frequencies in correlation to the ascending binary order of the subsequences (so for i=2 that would be [0,0], [0,1], [1,0], [1,1]) and have not figured out a better way around this.
I don't know if it is faster, but try:
import numpy as np

# create data
bits = np.random.randint(0, 2, 10000)

def subseq_counter(i: int, l: np.array):
    """
    Counts the frequency of subsequences of length i in the array l
    """
    # the array l is reshaped as a matrix of i columns, and
    # matrix-multiplied by the binary weights "powers of 2":
    #                | [[2**2],
    #                |  [2**1],
    #                |  [2**0]]
    #                |____________________
    # [[1,0,1],      | 1*4 + 0*2 + 1*1 = 5
    #  [0,1,0],      | 0*4 + 1*2 + 0*1 = 2
    #  ...,          | ....
    #  [1,1,1]]      | 1*4 + 1*2 + 1*1 = 7
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1, -1, -1).T)
    unique, counts = np.unique(iBits, return_counts=True)
    print(f"Counts for {i} bits:")
    for u, c in zip(unique, counts):
        print(f"{u:0{i}b}:{c}")
    return unique, counts

subseq_counter(2, bits)
subseq_counter(3, bits)
>>> Counts for 2 bits:
>>> 00:1264
>>> 01:1279
>>> 10:1237
>>> 11:1220
>>> Counts for 3 bits:
>>> 000:425
>>> 001:429
>>> 010:411
>>> 011:395
>>> 100:437
>>> 101:412
>>> 110:407
>>> 111:417
What it does is reshape the list into an array of n rows by i columns and convert each row to an integer by multiplying by powers of 2 (turning 00 into 0, 01 into 1, 10 into 2, and 11 into 3), then do the counting with np.unique().
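For instance, here is a minimal illustration of that packing step on a six-element list (my example, not from the answer above):
import numpy as np

bits = np.array([1, 0, 1, 0, 1, 1])
# rows of 2 bits, dotted with the weights [2, 1] to get integer codes
codes = bits.reshape(-1, 2) @ (2 ** np.arange(1, -1, -1))
print(codes)                                 # [2 2 3]
print(np.unique(codes, return_counts=True))  # (array([2, 3]), array([2, 1]))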
Benchmark including some new solutions from me:
For i=2:
2.9 s ± 0.0 s Kelly_NumPy
3.7 s ± 0.0 s Kelly_bytes_count
6.6 s ± 0.0 s Kelly_zip
7.8 s ± 0.1 s Colim_numpy
8.4 s ± 0.0 s Paul_genzip
8.6 s ± 0.0 s Kelly_bytes_split2
10.5 s ± 0.0 s Kelly_bytes_slices2
10.6 s ± 0.1 s Kelly_bytes_split1
16.1 s ± 0.0 s Kelly_bytes_slices1
20.9 s ± 0.1 s constantstranger
45.1 s ± 0.3 s original
For i=5:
2.3 s ± 0.0 s Kelly_NumPy
3.8 s ± 0.0 s Kelly_zip
4.5 s ± 0.0 s Paul_genzip
4.5 s ± 0.0 s Kelly_bytes_split2
5.2 s ± 0.0 s Kelly_bytes_split1
5.4 s ± 0.0 s Kelly_bytes_slices2
7.1 s ± 0.0 s Colim_numpy
7.2 s ± 0.0 s Kelly_bytes_slices1
9.3 s ± 0.0 s constantstranger
20.6 s ± 0.0 s Kelly_bytes_count
25.3 s ± 0.1 s original
This is for a list of length n=1e6, with times multiplied by 100 so they somewhat reflect your timings with length 1e8. I minimally modified the other solutions so they do what your original does, i.e., take a list of ints and return a list of ints in the correct order. One or two of my slower solutions only work if the length is a multiple of their block size; I didn't bother making them work for all lengths since they're slower anyway.
Full code (Try it online!):
def Kelly_NumPy(i, l):
    a = np.frombuffer(bytes(l), np.int8)
    stop = a.size // i * i
    s = a[:stop:i]
    for j in range(1, i):
        s = (s << 1) | a[j:stop:i]
    return np.unique(s, return_counts=True)[1].tolist()

def Kelly_zip(i, l):
    ctr = Counter(zip(*[iter(l)]*i))
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_slices1(i, l):
    a = bytes(l)
    slices = [a[j:j+i] for j in range(0, len(a)//i*i, i)]
    ctr = Counter(slices)
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_slices2(i, l):
    a = bytes(l)
    ig = itemgetter(*(slice(j, j+i) for j in range(0, 1000*i, i)))
    ctr = Counter(chain.from_iterable(
        ig(a[k:k+1000*i])
        for k in range(0, len(l), 1000*i)
    ))
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_count(i, l):
    n = len(l)
    a = bytes(l)
    b = bytearray([2]) * (n + n//i)
    for j in range(i):
        b[j+1::i+1] = a[j::i]
    a = b
    ss = [bytes([2])]
    for _ in range(i):
        ss = [s+b for s in ss for b in [bytes([0]), bytes([1])]]
    return [a.count(s) for s in ss]

def Kelly_bytes_split1(i, l):
    n = len(l) // i
    stop = n * i
    a = bytes(l)
    sep = bytearray([2])
    b = sep * (stop + n - 1)
    for j in range(i):
        b[j::i+1] = a[j::i]
    ctr = Counter(bytes(b).split(sep))
    return [v for k, v in sorted(ctr.items())]

def Kelly_bytes_split2(i, l):
    n = len(l) // i
    stop = n * i
    a = bytes(l)
    sep = bytearray([2])
    b = sep * (5000*i + 4999)
    ctr = Counter()
    for k in range(0, stop, 5000*i):
        for j in range(i):
            b[j::i+1] = a[k+j:k+5000*i+j:i]
        ctr.update(bytes(b).split(sep))
    return [v for k, v in sorted(ctr.items())]

def original(i, l):
    grouped = [str(l[k:k + i]) for k in range(0, len(l), i)]
    if len(grouped[len(grouped) - 1]) != len(grouped[0]):
        grouped.pop(len(grouped) - 1)
    grouped_sort = sorted(grouped)
    grouped_sort_values = Counter(grouped_sort).values()
    freq_list = list(grouped_sort_values)
    return freq_list

def Paul_genzip(subseq_len, sequence):
    ctr = Counter(subseq for subseq in zip(*[iter(sequence)] * subseq_len))
    return [v for k, v in sorted(ctr.items())]

def constantstranger(i, l):
    freq_list = [0] * 2 ** i
    binaryTupToInt = {binTup: j for j, binTup in enumerate(product((0, 1), repeat=i))}
    c = Counter(binaryTupToInt[tuple(l[k:k+i])] for k in range(0, len(l) // i * i, i))
    for k, v in c.items():
        freq_list[k] = v
    return freq_list

def Colim_numpy(i: int, l):
    l = np.array(l)
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1, -1, -1).T)
    unique, counts = np.unique(iBits, return_counts=True)
    return counts.tolist()

funcs = [
    original,
    Colim_numpy,
    Paul_genzip,
    constantstranger,
    Kelly_NumPy,
    Kelly_bytes_count,
    Kelly_zip,
    Kelly_bytes_slices1,
    Kelly_bytes_slices2,
    Kelly_bytes_split1,
    Kelly_bytes_split2,
]

from time import time
import os
from collections import Counter
from itertools import repeat, chain, product
import numpy as np
from operator import itemgetter
from statistics import mean, stdev

n = 10**6
i = 2
times = {f: [] for f in funcs}

def stats(f):
    ts = [t/n*1e8 for t in sorted(times[f])[:3]]
    return f'{mean(ts):4.1f} s ± {stdev(ts):3.1f} s '

for _ in range(10):
    l = [b % 2 for b in os.urandom(n)]
    expect = None
    for f in funcs:
        t = time()
        result = f(i, l)
        t = time() - t
        times[f].append(t)
        if expect is None:
            expect = result
        else:
            assert result == expect

for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)
Not really sure I understood that last part about the order. It seems unnecessary to build a giant list of subsequences. Use a generator to yield the subsequences to the counter - that way you also don't have to fiddle with indices:
from collections import Counter

def count_subsequences(sequence, subseq_len=2):
    return Counter(subseq for subseq in zip(*[iter(sequence)] * subseq_len))

sequence = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
counter = count_subsequences(sequence)
for subseq in (0, 0), (0, 1), (1, 0), (1, 1):
    print("{}: {}".format(subseq, counter[subseq]))
Output:
(0, 0): 1
(0, 1): 2
(1, 0): 1
(1, 1): 1
In this case, the function returns the counter object itself, and the calling code displays the results in some order.
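If the flat frequency list in ascending binary order from the question is still needed, the counter can be post-processed afterwards; a small sketch for subseq_len=2 (missing patterns count as 0, since Counter returns 0 for absent keys):
from itertools import product

freq_list = [counter[key] for key in product((0, 1), repeat=2)]
print(freq_list)  # [1, 2, 1, 1] for the example sequence above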
This is much faster. It uses Kelly's idea of using numpy.frombuffer instead of converting the list to a numpy array, and uses Pandas to count unique values, which is faster than numpy.unique for more than 100,000 results:
import pandas as pd

def subseq_counter(i: int, l):
    l = np.frombuffer(bytes(l), np.int8)
    iBits = l[:i*(l.size//i)].reshape(-1, i) @ (2**np.arange(i-1, -1, -1).T).astype(np.int8)
    # bug fix: when there is not enough data (highly probable for large i),
    # iBits does not contain every possible value, so returning the unique
    # values as a list may lose information
    answer = [0]*2**i  # empty counter including all possible values
    if len(iBits) > 100000:
        for i, v in pd.value_counts(iBits).items():
            answer[i] = v
    else:
        unique, count = np.unique(iBits, return_counts=True)
        for i, v in zip(unique, count):
            answer[i] = v
    return answer
This is a way to do it:
from collections import Counter
from itertools import product

def subseq_counter(i, l):
    freq_list = [0] * 2 ** i
    binaryTupToInt = {binTup: j for j, binTup in enumerate(product((0, 1), repeat=i))}
    c = Counter(binaryTupToInt[tuple(l[k:k+i])] for k in range(0, len(l) // i * i, i))
    for k, v in c.items():
        freq_list[k] = v
    return freq_list

l = [0, 1, 0, 1, 1, 0, 0, 0, 1, 1]
i = 2
print(subseq_counter(i, l))
Output:
[1, 2, 1, 1]
Notes:
Using the above code and changing i to 3 gives:
[0, 1, 1, 0, 0, 0, 1, 0]
This is showing the frequency for all possible binary values of length 3 in ascending order beginning with 0 (binary 0,0,0) and ending with 7 (binary 1,1,1). In other words, 0,0,0 occurs 0 times, 0,0,1 occurs 1 time, 0,1,0 occurs 1 time, 0,1,1 occurs 0 times, etc., through 1,1,1 which occurs 0 times.
Using the code in the question with i changed to 3 gives:
[1, 1, 1]
This output seems hard to decipher, as it isn't labeled so that we can easily see that the results with a non-zero value correspond to the 3-digit binary values 0,0,1, 0,1,0 and 1,1,0.
UPDATE:
Here's a benchmark of several approaches on an input list of length 55 million (with i set to 2) including OP's, counting sort (this answer), numpy including list-to-ndarray conversion overhead, and numpy without the overhead:
foo_1 output:
[10000000, 15000000, 15000000, 15000000]
foo_2 output:
[10000000, 15000000, 15000000, 15000000]
foo_3 output:
[10000000 15000000 15000000 15000000]
foo_4 output:
[10000000 15000000 15000000 15000000]
Timeit results:
foo_1 (OP) ran in 32.20719700001064 seconds using 1 iterations
foo_2 (counting sort) ran in 17.91718759998912 seconds using 1 iterations
foo_3 (numpy with list-to-array conversion) ran in 9.713831000000937 seconds using 1 iterations
foo_4 (numpy) ran in 1.695262699999148 seconds using 1 iterations
The clear winner is numpy, though unless the calling program can easily be changed to use ndarrays, the required conversion slows things down by a factor of about 5x in this example.

Finding a maximum of the product of three integers in a list of any size

I just finished working on this practice problem:
Given a list of integers, return the largest product that can be made by multiplying any three integers.
For example, if the list is [-10, -10, 5, 2], we should return 500, since that's -10 * -10 * 5.
I've written the following code and it seems to work, but I feel like it could be simpler somehow. Any ideas?
def maxProduct(lst):
    combo = []
    for i in range(0, len(lst)):
        for j in range(0, len(lst)):
            for k in range(0, len(lst)):
                if i != j and j != k and i != k:
                    x = sorted([lst[i]] + [lst[j]] + [lst[k]])
                    if x not in combo:
                        combo.append(x)
    final = []
    for i in combo:
        result = 1
        for j in i:
            result = result * j
        final.append(result)
    return max(final)
Taking a similar "brute force" approach, you can use some built-ins to make this a one-liner (and some imports):
from itertools import combinations
from functools import reduce
from operator import mul
result = max(reduce(mul, p, 1) for p in combinations(arr, 3))
There are only two non-degenerate possibilities, given that the list is of integers:
The three largest values.
The largest value together with the two smallest (most negative) values.
Sort the list, then:
return max( values[-1] * values[-2] * values[-3],
            values[-1] * values[ 0] * values[ 1])
Change x = sorted([lst[i]] + [lst[j]] + [lst[k]]) to x = sorted([lst[i], lst[j], lst[k]])
Here is a generic and yet very fast way (O(n) for small k) of doing this (< 6ms for 100K elements and k=3).
It is generic in that you can look for the max product of any k>0 values (even or odd) if you so chose, not just k=3.
The idea is to look for the top k and bottom k values (hi, resp. lo). Instead of sorting (O(n log n)), we use heapq (O(n), used twice) to find hi and lo.
Usually hi will have the most positive values and lo will have the most negative values, but it works if there are negatives in hi or positives in lo. Then we look for k combinations among the 2k-list hi + lo.
from itertools import combinations
from functools import reduce
from operator import mul
import heapq
def prod(lst):
    return reduce(mul, lst, 1)

def find_max_prod(lst, k):
    hi = heapq.nlargest(k, lst)
    lo = heapq.nsmallest(min(k, len(lst) - len(hi)), lst)
    return max((prod(vals), vals) for vals in combinations(hi + lo, r=k))
This also returns both the product and the values that contributed to it (for inspection):
>>> find_max_prod([-10, -10, 5, 2], k=3)
(500, (-10, -10, 5))
>>> find_max_prod([-10, -10, 7, 3, 1, 20, -30, 5, 2], k=3)
(6000, (-30, -10, 20))
>>> find_max_prod([-10, -10, 7, 3, 1, 20, -30, 5, 2], k=4)
(42000, (-30, -10, 20, 7))
Speed
n = 100_000
lst = np.random.randint(-20, 20, size=n).tolist()
%timeit find_max_prod(lst, 3)
# 5.7 ms ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Notes
It works if there are no positive numbers (or no negative ones):
>>> find_max_prod([-1,-2,-1,-5], 3)
(-2, (-1, -1, -2))
>>> find_max_prod([5,6,7,8], 3)
(336, (8, 7, 6))
It works fine with floats as well:
>>> lst = np.random.normal(size=n).tolist()
... lst[:4]
[-1.2645437227178908,
0.04542859270503795,
-0.17997935575118532,
-0.03485546753207921]
>>> find_max_prod(lst, 3)
(72.00185172194192,
(-4.159094140171658, -4.145875048073711, 4.175694390863968))
Simple naive implementation (O(n³)):
import itertools

def max_product_of_three(values):
    return max(
        x * y * z
        for x, y, z in itertools.combinations(values, 3)
    )
You can probably improve this by sorting values first.
Edit: here is a much more performant (O(n)) solution
import heapq

def max_product_of_three(values):
    x, y, z = heapq.nlargest(3, values)
    a, b = heapq.nsmallest(2, values)
    return max(
        x * a * b,
        x * y * z
    )
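For instance, with the example from the question:
>>> max_product_of_three([-10, -10, 5, 2])
500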

Strange results while using numpy arrays

I am getting two different results for some inputs but not others. Let me explain using the concrete example. I have the following function:
In [86]: def f(x, p):
    ...:     n = len(p)
    ...:     tot = 0
    ...:     for i in range(n):
    ...:         tot += p[i] * x**(n-i-1)
    ...:     return tot
p is an array with very small values:
In [87]: p
Out[87]:
array([ -3.93107522e-45, 9.17048746e-40, -8.11593366e-35,
3.05584286e-30, -1.06065846e-26, -3.03946945e-21,
1.05944707e-16, -1.56986924e-12, 1.07293061e-08,
-3.22670121e-05, 1.12072912e-01])
Now consider the outputs:
In [90]: [f(i, p) for i in range(11, 20)]
Out[90]:
[0.11171927108787173,
0.1116872502272328,
0.1116552507123586,
0.11162327253386167,
0.11159131568235707,
0.11155938014846242,
0.1115274659227979,
0.11149557299598616,
0.11146370135865244]
In [88]: [f(i, p) for i in np.array(range(11, 20))]
Out[88]:
[0.11171927108787173,
0.1116872502272328,
0.1116552507123586,
0.11162327253386167,
0.11159131568235707,
0.11155938014846242,
0.1115274659227979,
0.11149557299598616,
0.11146370135865244]
As you can see, these outputs are exactly the same, as they should be. The only difference is that in one case I am using range(a, b), while in the other case I am converting that range to a numpy array.
But now, let us change the values inside the range:
In [91]: [f(i, p) for i in range(50001, 50010)]
Out[91]:
[-0.011943965521167818,
-0.011967640114171604,
-0.011991315947644229,
-0.012014993019120554,
-0.012038671327427961,
-0.012062350870605351,
-0.012086031644648818,
-0.012109713648648865,
-0.012133396879791744]
In [92]: [f(i, p) for i in np.array(range(50001, 50010))]
Out[92]:
[491.26519430165808,
491.32457916465478,
491.38395932037008,
491.38726606180143,
491.44663641006275,
491.50600185375316,
491.56536239249812,
491.56864971072332,
491.6280006336612]
And they are not even close! Am I missing something ridiculously simple?
You're missing the fact that ordinary Python integers are arbitrary-precision, while NumPy integers are fixed-size.
This:
x**(n-i-1)
overflows with the NumPy inputs.
The values of x in the error case are of type numpy.int32, which can overflow. The fix in this case is relatively straightforward: convert the value back to an arbitrary-precision Python int before exponentiating:
tot += p[i] * int(x) ** (n - i - 1)
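A minimal illustration of the difference (my example; NumPy may also emit an overflow warning depending on the version):
import numpy as np

x_py = 50001             # arbitrary-precision Python int
x_np = np.int32(50001)   # fixed-size NumPy integer

print(x_py * x_py * x_py)  # 125007500150001, exact
print(x_np * x_np * x_np)  # wraps around modulo 2**32: a wrong result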

What is the quickest way to get a number with unique digits in python?

Lemme clarify:
What would be the fastest way to get every number with all unique digits between two numbers. For example, 10,000 and 100,000.
Some obvious ones would be 12,345 or 23,456. I'm trying to find a way to gather all of them.
for i in xrange(LOW, HIGH):
    str_i = str(i)
    ...?
Use itertools.permutations:
from itertools import permutations

result = [
    a * 10000 + b * 1000 + c * 100 + d * 10 + e
    for a, b, c, d, e in permutations(range(10), 5)
    if a != 0
]
I used the facts that:
numbers between 10000 and 100000 have either 5 or 6 digits, and the only 6-digit number in this range (100000) does not have unique digits,
itertools.permutations creates all combinations, with all orderings (so both 12345 and 54321 will appear in the result), of the given length,
you can do permutations directly on a sequence of integers (so no overhead for converting the types).
EDIT:
Thanks for accepting my answer, but here is the data for the others, comparing mentioned results:
>>> from timeit import timeit
>>> stmt1 = '''
a = []
for i in xrange(10000, 100000):
    s = str(i)
    if len(set(s)) == len(s):
        a.append(s)
'''
>>> stmt2 = '''
result = [
    int(''.join(digits))
    for digits in permutations('0123456789', 5)
    if digits[0] != '0'
]
'''
>>> setup2 = 'from itertools import permutations'
>>> stmt3 = '''
result = [
    x for x in xrange(10000, 100000)
    if len(set(str(x))) == len(str(x))
]
'''
>>> stmt4 = '''
result = [
    a * 10000 + b * 1000 + c * 100 + d * 10 + e
    for a, b, c, d, e in permutations(range(10), 5)
    if a != 0
]
'''
>>> setup4 = setup2
>>> timeit(stmt1, number=100)
7.955858945846558
>>> timeit(stmt2, setup2, number=100)
1.879319190979004
>>> timeit(stmt3, number=100)
8.599710941314697
>>> timeit(stmt4, setup4, number=100)
0.7493319511413574
So, to sum up:
solution no. 1 took 7.96 s,
solution no. 2 (my original solution) took 1.88 s,
solution no. 3 took 8.6 s,
solution no. 4 (my updated solution) took 0.75 s.
The last solution is around 10x faster than the solutions proposed by others.
Note: My solution has some imports that I did not measure. I assumed your imports will happen once, and code will be executed multiple times. If it is not the case, please adapt the tests to your needs.
EDIT #2: I have added another solution, as operating on strings is not even necessary - it can be achieved by having permutations of real integers. I bet this can be sped up even more.
Cheap way to do this:
for i in xrange(LOW, HIGH):
    s = str(i)
    if len(set(s)) == len(s):
        # number has unique digits
This uses a set to collect the unique digits, then checks to see that there are as many unique digits as digits in total.
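For example, with a number that has a repeated digit:
>>> s = '12325'
>>> len(set(s)), len(s)
(4, 5)
Only 4 unique digits out of 5, so 12325 is rejected.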
List comprehension will work a treat here (logic stolen from nneonneo):
[x for x in xrange(LOW,HIGH) if len(set(str(x)))==len(str(x))]
And a timeit for those who are curious:
> python -m timeit '[x for x in xrange(10000,100000) if len(set(str(x)))==len(str(x))]'
10 loops, best of 3: 101 msec per loop
Here is an answer from scratch:
def permute(L, max_len):
    allowed = L[:]
    results, seq = [], range(max_len)
    def helper(d):
        if d == 0:
            results.append(''.join(seq))
        else:
            for i in xrange(len(L)):
                if allowed[i]:
                    allowed[i] = False
                    seq[d-1] = L[i]
                    helper(d-1)
                    allowed[i] = True
    helper(max_len)
    return results

A = permute(list("1234567890"), 5)
print A
print len(A)
print all(map(lambda a: len(set(a)) == len(a), A))
It perhaps could be further optimized by using an interval representation of the allowed elements, although for n=10, I'm not sure it will make a difference. I could also transform the recursion into a loop, but in this form it is more elegant and clear.
Edit: Here are the timings of the various solutions
2.75808000565 (My solution)
8.22729802132 (Sol 1)
1.97218298912 (Sol 2)
9.659760952 (Sol 3)
0.841020822525 (Sol 4)
no_list = ['115432', '555555', '1234567', '5467899', '3456789', '987654', '444444']
rep_list = []
nonrep_list = []
for no in no_list:
    u = []
    for digit in no:
        if digit not in u:
            u.append(digit)
    # if there is a repetition
    if len(no) != len(u):
        rep_list.append(no)
    # if there is no repetition
    else:
        nonrep_list.append(no)
print('Numbers which have repetition are =', rep_list)
print('Numbers which have no repetition are =', nonrep_list)

Fibonacci numbers, with an one-liner in Python 3?

I know there is nothing wrong with writing with proper function structure, but I would like to know how I can find the nth Fibonacci number in the most Pythonic way, with a one-liner.
I wrote this code, but it didn't seem to me the best way:
>>> fib = lambda n:reduce(lambda x, y: (x[0]+x[1], x[0]), [(1,1)]*(n-2))[0]
>>> fib(8)
13
How could it be better and simpler?
fib = lambda n:reduce(lambda x,n:[x[1],x[0]+x[1]], range(n),[0,1])[0]
(this maintains a pair [a, b], mapped to [b, a+b] on each step, initialized to [0, 1], iterated N times, then takes the first element)
>>> fib(1000)
43466557686937456435688527675040625802564660517371780402481729089536555417949051
89040387984007925516929592259308032263477520968962323987332247116164299644090653
3187938298969649928516003704476137795166849228875L
(note that in this numbering, fib(0) = 0, fib(1) = 1, fib(2) = 1, fib(3) = 2, etc.)
(also note: reduce is a builtin in Python 2.7 but not in Python 3; you'd need to execute from functools import reduce in Python 3.)
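To see what the reduce step does, here it is unrolled as a plain loop (a sketch):
x = [0, 1]
for _ in range(5):
    x = [x[1], x[0] + x[1]]  # the reduce step: [a, b] -> [b, a+b]
print(x[0])  # fib(5) = 5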
A rarely seen trick is that a lambda function can refer to itself recursively:
fib = lambda n: n if n < 2 else fib(n-1) + fib(n-2)
By the way, it's rarely seen because it's confusing, and in this case it is also inefficient. It's much better to write it on multiple lines:
def fibs():
    a = 0
    b = 1
    while True:
        yield a
        a, b = b, a + b
I recently learned about using matrix multiplication to generate Fibonacci numbers, which was pretty cool. You take a base matrix:
[1, 1]
[1, 0]
and multiply it by itself N times to get:
[F(N+1), F(N)]
[F(N), F(N-1)]
This morning, doodling in the steam on the shower wall, I realized that you could cut the running time in half by starting with the second matrix, and multiplying it by itself N/2 times, then using N to pick an index from the first row/column.
With a little squeezing, I got it down to one line:
import numpy

def mm_fib(n):
    return (numpy.matrix([[2, 1], [1, 1]]) ** (n // 2))[0, (n + 1) % 2]
>>> [mm_fib(i) for i in range(20)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
This is a closed expression for the Fibonacci series that uses integer arithmetic, and is quite efficient.
fib = lambda n:pow(2<<n,n+1,(4<<2*n)-(2<<n)-1)%(2<<n)
>>> fib(1000)
4346655768693745643568852767504062580256466051737178
0402481729089536555417949051890403879840079255169295
9225930803226347752096896232398733224711616429964409
06533187938298969649928516003704476137795166849228875L
It computes the result in O(log n) arithmetic operations, each acting on integers with O(n) bits. Given that the result (the nth Fibonacci number) is O(n) bits, the method is quite reasonable.
It's based on genefib4 from http://fare.tunes.org/files/fun/fibonacci.lisp , which in turn was based on a less efficient closed-form integer expression of mine (see: http://paulhankin.github.io/Fibonacci/).
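A quick cross-check of the formula against a plain iterative reference (my sketch, not part of the original answer):
fib = lambda n: pow(2 << n, n + 1, (4 << 2 * n) - (2 << n) - 1) % (2 << n)

def fib_iter(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert all(fib(n) == fib_iter(n) for n in range(500))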
If we consider the "most Pythonic way" to be elegant and effective, then:
def fib(nr):
    return int(((1 + math.sqrt(5)) / 2) ** nr / math.sqrt(5) + 0.5)
wins hands down. Why use an inefficient algorithm (and if you start using memoization we can forget about the one-liner) when you can solve the problem just fine in O(1) by approximating the result with the golden ratio? Though in reality I'd obviously write it in this form:
def fib(nr):
    ratio = (1 + math.sqrt(5)) / 2
    return int(ratio ** nr / math.sqrt(5) + 0.5)
More efficient and much easier to understand.
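One caveat worth adding (echoed by another answer further down, which finds the closed form accurate only up to about the 72nd number): floating-point rounding eventually breaks the approximation. A quick way to locate the first failure on a given platform:
import math

def fib_approx(nr):
    return int(((1 + math.sqrt(5)) / 2) ** nr / math.sqrt(5) + 0.5)

def fib_exact(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# typically prints a value around 70, depending on the platform's doubles
print(next(n for n in range(200) if fib_approx(n) != fib_exact(n)))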
This is a non-recursive (anonymous) memoizing one-liner:
fib = lambda x,y=[1,1]:([(y.append(y[-1]+y[-2]),y[-1])[1] for i in range(1+x-len(y))],y[x])[1]
fib = lambda n, x=0, y=1 : x if not n else fib(n-1, y, x+y)
run time O(n), fib(0) = 0, fib(1) = 1, fib(2) = 1 ...
I'm a Python newcomer, but I did some measurements for learning purposes. I've collected some Fibonacci algorithms and took some measurements:
from datetime import datetime
import matplotlib.pyplot as plt
from functools import wraps
from functools import reduce
from functools import lru_cache
import numpy

def time_it(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        start_time = datetime.now()
        f(*args, **kwargs)
        end_time = datetime.now()
        elapsed = end_time - start_time
        elapsed = elapsed.microseconds
        return elapsed
    return wrapper

@time_it
def fibslow(n):
    if n <= 1:
        return n
    else:
        return fibslow(n-1) + fibslow(n-2)

@time_it
@lru_cache(maxsize=10)
def fibslow_2(n):
    if n <= 1:
        return n
    else:
        return fibslow_2(n-1) + fibslow_2(n-2)

@time_it
def fibfast(n):
    if n <= 1:
        return n
    a, b = 0, 1
    for i in range(1, n+1):
        a, b = b, a + b
    return a

@time_it
def fib_reduce(n):
    return reduce(lambda x, n: [x[1], x[0]+x[1]], range(n), [0, 1])[0]

@time_it
def mm_fib(n):
    return (numpy.matrix([[2, 1], [1, 1]])**(n//2))[0, (n+1) % 2]

@time_it
def fib_ia(n):
    return pow(2 << n, n+1, (4 << 2 * n) - (2 << n)-1) % (2 << n)

if __name__ == '__main__':
    X = range(1, 200)
    # fibslow_times = [fibslow(i) for i in X]
    fibslow_2_times = [fibslow_2(i) for i in X]
    fibfast_times = [fibfast(i) for i in X]
    fib_reduce_times = [fib_reduce(i) for i in X]
    fib_mm_times = [mm_fib(i) for i in X]
    fib_ia_times = [fib_ia(i) for i in X]
    # print(fibslow_times)
    # print(fibfast_times)
    # print(fib_reduce_times)
    plt.figure()
    # plt.plot(X, fibslow_times, label='Slow Fib')
    plt.plot(X, fibslow_2_times, label='Slow Fib w cache')
    plt.plot(X, fibfast_times, label='Fast Fib')
    plt.plot(X, fib_reduce_times, label='Reduce Fib')
    plt.plot(X, fib_mm_times, label='Numpy Fib')
    plt.plot(X, fib_ia_times, label='Fib ia')
    plt.xlabel('n')
    plt.ylabel('time (microseconds)')
    plt.legend()
    plt.show()
The result is usually the same.
Fibslow_2 (recursion with a cache), fib_ia (integer arithmetic), and fibfast seem to be the best ones. Maybe my decorator isn't the best way to measure performance, but for an overview it seemed good.
Another example, taking the cue from Mark Byers's answer:
fib = lambda n,a=0,b=1: a if n<=0 else fib(n-1,b,a+b)
I wanted to see if I could create an entire sequence, not just the final value.
The following will generate a list of length 100. It excludes the leading [0, 1] and works for both Python2 and Python3. No other lines besides the one!
(lambda i, x=[0,1]: [(x.append(x[y+1]+x[y]), x[y+1]+x[y])[1] for y in range(i)])(100)
Output
[1,
2,
3,
...
218922995834555169026,
354224848179261915075,
573147844013817084101]
Here's an implementation that doesn't use recursion, and only memoizes the last two values instead of the whole sequence history.
nthfib() below is the direct solution to the original problem (as long as imports are allowed).
It's less elegant than using the reduce methods above, but, although slightly different from what was asked for, it gains the ability to be used more efficiently as an infinite generator if one needs to output the sequence up to the nth number as well (re-written slightly as fibgen() below).
from itertools import imap, islice, repeat
nthfib = lambda n: next(islice((lambda x=[0, 1]: imap((lambda x: (lambda setx=x.__setitem__, x0_temp=x[0]: (x[1], setx(0, x[1]), setx(1, x0_temp+x[1]))[0])()), repeat(x)))(), n-1, None))
>>> nthfib(1000)
43466557686937456435688527675040625802564660517371780402481729089536555417949051
89040387984007925516929592259308032263477520968962323987332247116164299644090653
3187938298969649928516003704476137795166849228875L
from itertools import imap, islice, repeat
fibgen = lambda:(lambda x=[0,1]: imap((lambda x: (lambda setx=x.__setitem__, x0_temp=x[0]: (x[1], setx(0, x[1]), setx(1, x0_temp+x[1]))[0])()), repeat(x)))()
>>> list(islice(fibgen(),12))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
def fib(n):
    x = [0, 1]
    for i in range(n):
        x = [x[1], x[0] + x[1]]
    return x[0]
Taking the cue from Jason S, I think my version is easier to understand.
Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), we can use and update a variable within a list comprehension:
fib = lambda n,x=(0,1):[x := (x[1], sum(x)) for i in range(n+1)][-1][0]
This:
Initializes the duo n-1 and n-2 as a tuple x=(0, 1)
As part of a list comprehension looping n times, x is updated via an assignment expression (x := (x[1], sum(x))) to the new n-1 and n-2 values
Finally, we return, from the last iteration, the first part of x
To solve this problem I got inspired by a similar question here on Stack Overflow, Single Statement Fibonacci, and I got this single-line function that can output a list of the Fibonacci sequence. Though, this is a Python 2 script, not tested on Python 3:
(lambda n, fib=[0,1]: fib[:n]+[fib.append(fib[-1] + fib[-2]) or fib[-1] for i in range(n-len(fib))])(10)
assign this lambda function to a variable to reuse it:
fib = (lambda n, fib=[0,1]: fib[:n]+[fib.append(fib[-1] + fib[-2]) or fib[-1] for i in range(n-len(fib))])
fib(10)
output is a list of fibonacci sequence:
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
I don't know if this is the most Pythonic method, but this is the best I could come up with:
Fibonacci = lambda x, y=[1,1]: [1]*x if (x < 2) else ([y.append(y[q-1] + y[q-2]) for q in range(2, x)], y)[1]
The above code doesn't use recursion, just a list to store the values.
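One caveat worth noting (my addition): the default argument y=[1,1] is created once and shared between calls, so calling the lambda a second time keeps appending to the same list:
print(Fibonacci(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
print(Fibonacci(10))  # the shared default list has already grown, so this is wrong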
My 2 cents
# One Liner
def nthfibonacci(n):
    return long(((((1+5**.5)/2)**n)-(((1-5**.5)/2)**n))/5**.5)
OR
# Steps
def nthfibonacci(nth):
    sq5 = 5**.5
    phi1 = (1+sq5)/2
    phi2 = -1 * (phi1 - 1)
    n1 = phi1**(nth+1)
    n2 = phi2**(nth+1)
    return long((n1 - n2)/sq5)
Why not use a list comprehension?
from math import sqrt, floor
[floor(((1+sqrt(5))**n-(1-sqrt(5))**n)/(2**n*sqrt(5))) for n in range(100)]
Without math imports, but less pretty:
[int(((1+(5**0.5))**n-(1-(5**0.5))**n)/(2**n*(5**0.5))) for n in range(100)]
import math
sqrt_five = math.sqrt(5)
phi = (1 + sqrt_five) / 2
fib = lambda n : int(round(pow(phi, n) / sqrt_five))
print([fib(i) for i in range(1, 26)])
A single-line lambda Fibonacci, but with some extra variables.
Similar:
def fibonacci(n):
    f = [1] + [0]
    for i in range(n):
        f = [sum(f)] + f[:-1]
    print f[1]
A simple Fibonacci number generator using recursion:
fib = lambda x: 1-x if x < 2 else fib(x-1) + fib(x-2)
print fib(100)
This takes forever to calculate fib(100) on my computer.
There is also a closed form of Fibonacci numbers:
fib = lambda n: int(1/sqrt(5)*((1+sqrt(5))**n-(1-sqrt(5))**n)/2**n)
print fib(50)
This works nearly up to 72 numbers due to precision problems.
Lambda with logical operators
fibonacci_oneline = lambda n = 10, out = []: [ out.append(i) or i if i <= 1 else out.append(out[-1] + out[-2]) or out[-1] for i in range(n)]
Here is how I do it; however, the function returns None for the list comprehension part, to allow me to insert a loop inside.
So basically what it does is append new elements of the fib sequence to a list which must already have two or more elements:
>>> f = lambda list, x: print('The list must be of 2 or more') if len(list) < 2 else [list.append(list[-1] + list[-2]) for i in range(x)]
>>> a = [1, 2]
>>> f(a, 7)
You can generate once a list with some values and use as needed:
fib_fix = []
fib = lambda x: 1 if x <=2 else fib_fix[x-3] if x-2 <= len(fib_fix) else (fib_fix.append(fib(x-2) + fib(x-1)) or fib_fix[-1])
fib_x = lambda x: [fib(n) for n in range(1,x+1)]
fib_100 = fib_x(100)
Then, for example:
a = fib_fix[76]
