Efficient methods of accessing the outputs of generators - python

I'm working through some exercises in my python book and was asked to implement an infinite product to estimate the sine function depending on some amount of iterations k. The code I wrote is the following:
def approx_sin(x,k):
def u(k):
if k==0:
return x
else:
return (1-((x**2)/((k**2)*(pi**2))))
u0=u(0)
yield u0
for n in range(1,k+1):
u0=u(n)*u0
yield u0
Which works as intended. Now I'm asked to sketch this function on some interval for different values of k, meaning we have to extract some values from the generator. To do this, I create the following function:
def sin_sketch(x,k):
return list(itertools.islice(approx_sin(x,k),k,k+1))[0]
And I can now plot sin_sketch(x,n) for some linspace x and a given value for n. My question boils down to, there has to be a better/more time efficient way to extract the value produced by itertools.islice(approx_sin(x,k),k-1,k) rather then converting it to a list, then taking its only element. Any tips?

Creating a list of a single element should not be much of a problem, performance-wise, but you could e.g. also use next to get the first (and only) element in that islice. In this case, the upper bound does not matter at all.
def sin_sketch(x, k):
return next(itertools.islice(approx_sin(x, k), k-1, k))
Or, instead of islice, use next with enumerate and check the index. I did not time which of those is faster, but this might be a bit clearer.
def sin_sketch(x, k):
return next(v for i, v in enumerate(approx_sin(x, k)) if i == k-1)

It seems like a case of Get the nth item of a generator in Python[*]. But I don't particularly like approaches described there, so here's mine.
From what I understand, you want to get k'th value generated by approx_sin(x, k). So let's do just that:
def sin_sketch(x, k):
gen = approx_sin(x, k)
for _ in range(k):
v = next(gen)
return v
This might look a bit bulkier, but I think it's way more explicit and transparent, without requiring itertools.
[*]: Looking through answers there, I found the one that I like even more than my own: https://stackoverflow.com/a/72817329/9288580

I would simplify your appox function a little to become
from math import pi
from itertools import count
def approx_sin(x):
product = 1
for n in count(1):
product = product * (1-((x**2) / ((n**2) * (pi**2))))
yield x * product
that might make the second part of the question more straightforward.

Related

Time complexity of this recursive python k-combination generator function

I was looking for a python k-combination algorithm and found this little beauty here https://stackoverflow.com/a/2837693/553383
Any idea about its T(n) and/or time complexity?
Here is the code that you'll find in above link:
def choose_iter(elements, length):
for i in xrange(len(elements)):
if length == 1:
yield (elements[i],)
else:
for next in choose_iter(elements[i+1:len(elements)], length-1):
yield (elements[i],) + next
def choose(l, k):
return list(choose_iter(l, k))
Assuming this function indeed generates all possible combinations of length k, its time complexity of this function is O(n!/[(n-k)!k!] * k^2).
There are exactly O(n!/[(n-k)!k!]) k-combinations, and we generate each of them.
Let's look on the geration of each. It is done by creating a tuple iteratively. First the 1st element is added, then the 2nd, then the 3rd and so on.
However, cretaing a tuple of length k is O(k), and we actually get O(1+2+...+k) for each tuple creation. Since O(1+2+...+k)=O(k^2), and we do that for each tuple, we can conclude that the total complexity of this function is O(n!/[(n-k)!k!] * k^2).

Python lists, dictionary optimization

I have been attending a couple of hackathons. I am beginning to understand that writing code is not enough. The code has to be optimized. That brings me to my question. Here are two questions that I faced.
def pairsum(numbers, k)
"""Write a function that returns two values in numbers whose sum is K"""
for i, j in numbers:
if i != j:
if i+j == k
return i, j
I wrote this function. And I was kind of stuck with optimization.
Next problem.
string = "ksjdkajsdkajksjdalsdjaksda"
def dedup(string):
""" write a function to remove duplicates in the variable string"""
output = []
for i in string:
if i not in output:
output.append(i)
These are two very simple programs that I wrote. But I am stuck with optimization after this. More on this, when we optimize code, how does the complexity reduce? Any pointers will help. Thanks in advance.
Knowing the most efficient Python idioms and also designing code that can reduce iterations and bail out early with an answer is a major part of optimization. Here are a few examples:
List list comprehensions and generators are usually fastest:
With a straightforward nested approach, a generator is faster than a for loop:
def pairsum(numbers, k):
"""Returns two unique values in numbers whose sum is k"""
return next((i, j) for i in numbers for j in numbers if i+j == k and i != j)
This is probably faster on average since it only goes though one iteration at most and does not check if a possible result is in numbers unless k-i != i:
def pairsum(numbers, k):
"""Returns two unique values in numbers whose sum is k"""
return next((k-i, i) for i in numbers if k-i != i and k-i in numbers)
Ouput:
>>> pairsum([1,2,3,4,5,6], 8)
(6, 2)
Note: I assumed numbers was a flat list since the doc string did not mention tuples and it makes the problem more difficult which is what I would expect in a competition.
For the second problem, if you are to create your own function as opposed to just using ''.join(set(s)) you were close:
def dedup(s):
"""Returns a string with duplicate characters removed from string s"""
output = ''
for c in s:
if c not in output:
output += c
return output
Tip: Do not use string as a name
You can also do:
def dedup(s):
for c in s:
s = c + s.replace(c, '')
return s
or a much faster recursive version:
def dedup(s, out=''):
s0, s = s[0], s.replace(s[0], '')
return dedup(s, n + s0) if s else out + s0
but not as fast as set for strings without lots of duplicates:
def dedup(s):
return ''.join(set(s))
Note: set() will not preserve the order of the remaining characters while the other approaches will preserve the order based on first occurrence.
Your first program is a little vague. I assume numbers is a list of tuples or something? Like [(1,2), (3,4), (5,6)]? If so, your program is pretty good, from a complexity standpoint - it's O(n). Perhaps you want a little more Pythonic solution? The neatest way to clean this up would be to join your conditions:
if i != j and i + j == k:
But this simply increases readability. I think it may also add an additional boolean operation, so it might not be an optimization.
I am not sure if you intended for your program to return the first pair of numbers which sum to k, but if you wanted all pairs which meet this requirement, you could write a comprehension:
def pairsum(numbers, k):
return list(((i, j) for i, j in numbers if i != j and i + j == k))
In that example, I used a generator comprehension instead of a list comprehension so as to conserve resources - generators are functions which act like iterators, meaning that they can save memory by only giving you data when you need it. This is called lazy iteration.
You can also use a filter, which is a function which returns only the elements from a set for which a predicate returns True. (That is, the elements which meet a certain requirement.)
import itertools
def pairsum(numbers, k):
return list(itertools.ifilter(lambda t: t[0] != t[1] and t[0] + t[1] == k, ((i, j) for i, j in numbers)))
But this is less readable in my opinion.
Your second program can be optimized using a set. If you recall from any discrete mathematics you may have learned in grade school or university, a set is a collection of unique elements - in other words, a set has no duplicate elements.
def dedup(mystring):
return set(mystring)
The algorithm to find the unique elements of a collection is generally going to be O(n^2) in time if it is O(1) in space - if you allow yourself to allocate more memory, you can use a Binary Search Tree to reduce the time complexity to O(n log n), which is likely how Python sets are implemented.
Your solution took O(n^2) time but also O(n) space, because you created a new list which could, if the input was already a string with only unique elements, take up the same amount of space - and, for every character in the string, you iterated over the output. That's essentially O(n^2) (although I think it's actually O(n*m), but whatever). I hope you see why this is. Read the Binary Search Tree article to see how it improves your code. I don't want to re-implement one again... freshman year was so grueling!
The key to optimization is basically to figure out a way to make the code do less work, in terms of the total number of primitive steps that needs to be performed. Code that employs control structures like nested loops quickly contributes to the number of primitive steps needed. Optimization is therefore often about replacing loops iterating over the a full list with something more clever.
I had to change the unoptimized pairsum() method sligtly to make it usable:
def pairsum(numbers, k):
"""
Write a function that returns two values in numbers whose sum is K
"""
for i in numbers:
for j in numbers:
if i != j:
if i+j == k:
return i,j
Here we see two loops, one nested inside the other. When describing the time complexity of a method like this, we often say that it is O(n²). Since when the length of the numbers array passed in grows proportional to n, then the number of primitive steps grows proportional to n². Specifically, the i+j == k conditional is evaluated exactly len(number)**2 times.
The clever thing we can do here is to presort the array at the cost of O(n log(n)) which allows us to hone in on the right answer by evaluating each element of the sorted array at most one time.
def fast_pairsum(numbers, k):
sortedints = sorted(numbers)
low = 0
high = len(numbers) - 1
i = sortedints[0]
j = sortedints[-1]
while low < high:
diff = i + j - k
if diff > 0:
# Too high, let's lower
high -= 1
j = sortedints[high]
elif diff < 0:
# Too low, let's increase.
low += 1
i = sortedints[low]
else:
# Just right
return i, j
raise Exception('No solution')
These kinds of optimization only begin to really matter when the size of the problem becomes large. On my machine the break-even point between pairsum() and fast_pairsum() is with a numbers array containing 13 integers. For smaller arrays pairsum() is faster, and for larger arrays fast_pairsum() is faster. As the size grows fast_pairsum() becomes drastically faster than the unoptimized pairsum().
The clever thing to do for dedup() is to avoid having to linearly scan through the output list to find out if you've already seen a character. This can be done by storing information about which characters you've seen in a set, which has O(log(n)) look-up cost, rather than the O(n) look-up cost of a regular list.
With the outer loop, the total cost becomes O(n log(n)) rather than O(n²).
def fast_dedup(string):
# if we didn't care about the order of the characters in the
# returned string we could simply do
# return set(string)
seen = set()
output = []
seen_add = seen.add
output_append = output.append
for i in string:
if i not in seen:
seen_add(i)
output_append(i)
return output
On my machine the break-even point between dedup() and fast_dedup() is with a string of length 30.
The fast_dedup() method also shows another simple optimization trick: Moving as much of the code out of the loop bodies as possible. Since looking up the add() and append() members in the seen and output objects takes time, it is cheaper to do it once outside the loop bodies and store references to the members in variables that is used repeatedly inside the loop bodies.
To properly optimize Python, one needs to find a good algorithm for the problem and a Python idiom close to that algorithm. Your pairsum example is a good case. First, your implementation appears wrong — numbers is most likely a sequence of numbers, not a sequence of pairs of numbers. Thus a naive implementation would look like this:
def pairsum(numbers, k)
"""Write a function that returns two values in numbers whose sum is K"""
for i in numbers:
for j in numbers:
if i != j and i + j != k:
return i, j
This will perform n^2 iterations, n being the length of numbers. For small ns this is not a problem, but once n gets into hundreds, the nested loops will become visibly slow, and once n gets into thousands, they will become unusable.
An optimization would be to recognize the difference between the inner and the outer loops: the outer loop traverses over numbers exactly once, and is unavoidable. The inner loop, however, is only used to verify that the other number (which has to be k - i) is actually present. This is a mere lookup, which can be made extremely fast by using a dict, or even better, a set:
def pairsum(numbers, k)
"""Write a function that returns two values in numbers whose sum is K"""
numset = set(numbers)
for i in numbers:
if k - i in numset:
return i, k - i
This is not only faster by a constant because we're using a built-in operation (set lookup) instead of a Python-coded loop. It actually does less work because set has a smarter algorithm of doing the lookup, it performs it in constant time.
Optimizing dedup in the analogous fashion is left as an excercise for the reader.
Your string one, order preserving is most easily and should be fairly efficient written as:
from collections import OrderedDict
new_string = ''.join(OrderedDict.fromkeys(old_string))

How to do a sum in python

I was wondering how one can represent a sum in python without loops like here
where we have:
def rosen(x):
"""The Rosenbrock function"""
return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)
My function is the following: V(theta) = Sum(i=1->N)[a0*(cos(i*theta)]
Thank you in advance for your help :):)
Your formula is:
V(theta) = Sum(i=1->N)[a0*(cos(i*theta)]
which means: sum all values of a0*(cos(i*theta) for a given value theta in the range 1 to and including N.
This becomes something like this in Python:
def V(theta, N):
return sum(a0*(cos(i*theta)) for i in range(1, N + 1))
Note that you have to pass theta and N to the function. Also note that we are using N + 1 to make sure N is included (as range iterates over the values until, but not including, the last value).
something like:
def V(theta,N):
return sum(a0*(cos(i*theta) for i in range(1,N+1))
print V(theta,N)
or you can use lambda:
V =lambda theta,N : sum(a0*(cos(i*theta) for i in range(1,N+1))
print V(theta,N)
Your shown example uses no math functions, just basic arithmetical operations. That's why it works as shown, but math.cos doesn't support lists and so will not work this way.
If you really want to get around without any for, you should use numpy. Numpy's math functions support lists (actually arrays).
This way you can write something like:
from numpy import *
def fun(theta):
return a0*sum(cos(arange(1,N+1)*theta))
Should you be doing a lot of this kind of calculations, it is best to use numpy.

Question on a solution from Google python class day

Hey,
I'm trying to learn a bit about Python so I decided to follow Google's tutorial. Anyway I had a question regarding one of their solution for an exercise.
Where I did it like this way.
# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
# +++your code here+++
return sorted(list1 + list2)
However they did it in a more complicated way. So is Google's solution quicker? Because I noticed in the comment lines that the solution should work in "linear" time, which mine probably isn't?
This is their solution
def linear_merge(list1, list2):
# +++your code here+++
# LAB(begin solution)
result = []
# Look at the two lists so long as both are non-empty.
# Take whichever element [0] is smaller.
while len(list1) and len(list2):
if list1[0] < list2[0]:
result.append(list1.pop(0))
else:
result.append(list2.pop(0))
# Now tack on what's left
result.extend(list1)
result.extend(list2)
return result
this could be another soln?
#
def linear_merge(list1, list2):
tmp = []
while len(list1) and len(list2):
#print list1[-1],list2[-1]
if list1[-1] > list2[-1]:
tmp.append(list1.pop())
else:
tmp.append(list2.pop())
#print "tmp = ",tmp
#print list1,list2
tmp = tmp + list1
tmp = tmp + list2
tmp.reverse()
return tmp
Yours is not linear, but that doesn't mean it's slower. Algorithmic complexity ("big-oh notation") is often only a rough guide and always only tells one part of the story.
However, theirs isn't linear either, though it may appear to be at first blush. Popping from a list requires moving all later items, so popping from the front requires moving all remaining elements.
It is a good exercise to think about how to make this O(n). The below is in the same spirit as the given solution, but avoids its pitfalls while generalizing to more than 2 lists for the sake of exercise. For exactly 2 lists, you could remove the heap handling and simply test which next item is smaller.
import heapq
def iter_linear_merge(*args):
"""Yield non-decreasing items from sorted a and b."""
# Technically, [1, 1, 2, 2] isn't an "increasing" sequence,
# but it is non-decreasing.
nexts = []
for x in args:
x = iter(x)
for n in x:
heapq.heappush(nexts, (n, x))
break
while len(nexts) >= 2:
n, x = heapq.heappop(nexts)
yield n
for n in x:
heapq.heappush(nexts, (n, x))
break
if nexts: # Degenerate case of the heap, not strictly required.
n, x = nexts[0]
yield n
for n in x:
yield n
Instead of the last if-for, the while loop condition could be changed to just "nexts", but it is probably worthwhile to specially handle the last remaining iterator.
If you want to strictly return a list instead of an iterator:
def linear_merge(*args):
return list(iter_linear_merge(*args))
With mostly-sorted data, timsort approaches linear. Also, your code doesn't have to screw around with the lists themselves. Therefore, your code is possibly just a bit faster.
But that's what timing is for, innit?
I think the issue here is that the tutorial is illustrating how to implement a well-known algorithm called 'merge' in Python. The tutorial is not expecting you to actually use a library sorting function in the solution.
sorted() is probably O(nlgn); then your solution cannot be linear in the worst case.
It is important to understand how merge() works because it is useful in many other algorithms. It exploits the fact the input lists are individually sorted, moving through each list sequentially and selecting the smallest option. The remaining items are appended at the end.
The question isn't which is 'quicker' for a given input case but about which algorithm is more complex.
There are hybrid variations of merge-sort which fall back on another sorting algorithm once the input list size drops below a certain threshold.

Nicest, efficient way to get result tuple of sequence items fulfilling and not fulfilling condition

(This is professional best practise/ pattern interest, not home work request)
INPUT: any unordered sequence or generator items, function myfilter(item) returns True if filter condition is fulfilled
OUTPUT: (filter_true, filter_false) tuple of sequences of
original type which contain the
elements partitioned according to
filter in original sequence order.
How would you express this without doing double filtering, or should I use double filtering? Maybe filter and loop/generator/list comprehencion with next could be answer?
Should I take out the requirement of keeping the type or just change requirement giving tuple of tuple/generator result, I can not return easily generator for generator input, or can I? (The requirements are self-made)
Here test of best candidate at the moment, offering two streams instead of tuple
import itertools as it
from sympy.ntheory import isprime as myfilter
mylist = xrange(1000001,1010000,2)
left,right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p,x in left if p)
filter_false = (x for p,x in right if not p)
print 'Hundred primes and non-primes odd numbers'
print '\n'.join( " Prime %i, not prime %i" %
(next(filter_true),next(filter_false))
for i in range(100))
Here is a way to do it which only calls myfilter once for each item and will also work if mylist is a generator
import itertools as it
left,right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p,x in left if p)
filter_false = (x for p,x in right if not p)
Let's suppose that your probleme is not memory but cpu, myfilter is heavy and you don't want to iterate and filter the original dataset twice. Here are some single pass ideas :
The simple and versatile version (memoryvorous) :
filter_true=[]
filter_false=[]
for item in items:
if myfilter(item):
filter_true.append(item)
else:
filter_false.append(item)
The memory friendly version : (doesn't work with generators (unless used with list(items)))
while items:
item=items.pop()
if myfilter(item):
filter_true.append(item)
else:
filter_false.append(item)
The generator friendly version :
while True:
try:
item=next(items)
if myfilter(item):
filter_true.append(item)
else:
filter_false.append(item)
except StopIteration:
break
The easy way (but less efficient) is to tee the iterable and filter both of them:
import itertools
left, right = itertools.tee( mylist )
filter_true = (x for x in left if myfilter(x))
filter_false = (x for x in right if myfilter(x))
This is less efficient than the optimal solution, because myfilter will be called repeatedly for each element. That is, if you have tested an element in left, you shouldn't have to re-test it in right because you already know the answer. If you require this optimisation, it shouldn't be hard to implement: have a look at the implementation of tee for clues. You'll need a deque for each returned iterable which you stock with the elements of the original sequence that should go in it but haven't been asked for yet.
I think your best bet will be constructing two separate generators:
filter_true = (x for x in mylist if myfilter(x))
filter_false = (x for x in mylist if not myfilter(x))

Categories

Resources