Python Generator/Iterator

I am working on improving my Python and getting up to speed on generators. I have an object that I am working on to process a series of events. I want the list of events to be pulled sequentially and through various methods. I want to use generators for this purpose (I know I can write something else to do this without them).
Here is the sample code that I've been working on:
def _get_next_event():
    def gen():
        for i, event in enumerate(range(1, 10)):
            yield event
    iterator = gen()
    def run():
        return iterator
    run.next = iterator.__next__
    return run
t = _get_next_event()
t.next()
for x in t():
    if x < 5:
        print(x)
    else:
        break
t.next()
This lets me do a for loop over the events as well as pull the next one individually via the function's next method.
I am implementing this in my class, it looks like this:
def _get_next_event(self):
    def gen():
        print(self.all_sessions)
        for event in self.all_sessions:
            yield event['event']
    iterator = gen()
    def run():
        return iterator
    run.next = iterator.__next__
    return run
However, before it works in the class I have to run it once; for example, before the for loop I have one of these:
self._get_next_event = self._get_next_event()
I think there should be a more elegant way of doing this... what am I missing?

Usually, generators are... not written like that.
Ordinarily, you'd just use yield in the top-level function:
def _get_next_event():
    for i, event in enumerate(range(1, 10)):
        yield event
You can then just write this:
for event in _get_next_event():
    # do something with event
Perhaps you had some reason for doing it the way you've shown, but that reason is not evident from the code you've posted.
(for the record, I'm assuming your generator does not literally look like that, or else I'd tell you to change the whole function body to return range(1, 10))
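If you still want to mix pulling single events with a for loop, note that a plain generator already supports both: next() and the loop advance the same iterator, so the loop simply picks up wherever the iterator left off. A small sketch using the toy events from above:
events = _get_next_event()

next(events)          # pull one event individually
for event in events:
    if event < 5:
        print(event)
    else:
        break
next(events)          # keep pulling after the loop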

Related

Two infinite loops alternately?

I'm looking for a solution to this problem:
I have two functions:
def foo1():
    while True:
        print("x")

def foo2():
    while True:
        print("y")
I would like to run them alternately, like:
run foo1()
stop foo1() run foo2()
stop foo2() run foo1()
Functions can't work simultaneously, at the moment only one of them can work.
Edit: Leave it to @StefanPochmann to shoot down my beautiful answer: It doesn't matter for the question as it is posed, but it is certainly worth mentioning that zip works pair-by-pair. So while you can formally pause alter (defined below) after an odd number of iterations, at that point 'y' will have been printed already.
If you care about this, the solution, also due to Stefan, is
>>> alter = map(next, it.cycle((foo1(), foo2())))
which is actually pretty clever.
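To see it in action, here is a small sketch (assuming the generator versions of foo1 and foo2 shown below, which print and then yield):
>>> import itertools as it
>>> alter = map(next, it.cycle((foo1(), foo2())))
>>> next(alter)
x
>>> next(alter)
y
>>> next(alter)   # pausing after an odd number of calls is now safe
x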
Edit ends
You could convert the functions into generators, then zip them.
>>> import itertools as it
>>>
>>> def foo1():
...     while True:
...         print("x")
...         yield None
...
>>> def foo2():
...     while True:
...         print("y")
...         yield None
...
>>> alter = it.chain.from_iterable(zip(foo1(), foo2()))
>>> while True:
...     next(alter)
x
y
x
y
...
or:
>>> for a in alter:
...     pass
x
y
x
y
...
I personally favour the itertools approach by Paul Panzer. However, if it is an option, I'd strongly suggest moving the "infinite loop" out of these functions and instead putting it in another "driver" function that calls these two in an alternating fashion.
def f1():
    print('x')

def f2():
    print('y')

def driver():
    while True:
        f1()
        f2()
I get that this won't be possible to implement in every situation, but it does lead to better design in the end if you can.
There's no way to do what you want with the functions you've shown. However, there are some approaches that might come close, to differing degrees, depending on what details you're willing to change in your current code.
One option to get sort-of parallel execution is to use threads. Python doesn't actually allow more than one thread to be running Python code at once, so this isn't really useful in most situations, but it might be good enough for your specific use case. Unfortunately it does very badly with your current code, since one thread will tend to block the other for a long time, so you'll get lots of xs followed by lots of ys, as the threads only trade the GIL at irregular intervals.
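For illustration, a minimal sketch of that naive threading approach (expect bursty output, for the reasons just described):
import threading
import time

def foo1():
    while True:
        print("x")

def foo2():
    while True:
        print("y")

# Both threads run "at once", but the GIL is traded irregularly,
# so the output arrives in bursts of x's and y's.
threading.Thread(target=foo1, daemon=True).start()
threading.Thread(target=foo2, daemon=True).start()
time.sleep(1)  # let the threads run briefly before the script exits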
Another option is to use multiprocessing, which does allow parallel execution. It would do a better job of interleaving your two functions, but the exact timing of the swaps between the two processes will still not be exactly consistent, so you won't always get "xyxyxyxy", you might get something more like "xyxxyyxy" (the exact details may depend on your OS, CPU and how busy your machine is at the time you run the code).
Now we get into solutions that require modifications to the functions. An obvious solution is to simply rewrite them into a single function that does both steps:
def foo_combined():
    while True:
        print('x')
        print('y')
Another option is to make the functions into generators, which you can combine yourself using zip:
def foo1_gen():
    while True:
        yield 'x'

def foo2_gen():
    while True:
        yield 'y'
You could consume the generators like this:
for x, y in zip(foo1_gen(), foo2_gen()):
    print(x)
    print(y)
Or, if you want a single iterable:
import itertools

for val in itertools.chain.from_iterable(zip(foo1_gen(), foo2_gen())):
    print(val)
From what you say, you don't really want or need them both to be running at the same time, but rather to be able to switch execution back and forth between them. What you've described is called co-operative multitasking.
This can be accomplished by making each function a coroutine. In Python that means turning them into generator functions, which is done simply by adding a yield statement at the right spot. That way they will each pause execution until next() is called on them again.
The only slightly tricky part is remembering that each generator needs to be "primed" after the generator function is first called (calling the function by itself doesn't execute any of the code in its body). The first call to next() on the value returned advances execution to the location of the first yield in it.
Here's what I mean:
def foo1():
    while True:
        yield
        print("x")

def foo2():
    while True:
        yield
        print("y")
# Sample usage.
def process():
    f1 = foo1()
    next(f1)  # Prime generator.
    f2 = foo2()
    next(f2)  # Prime generator.
    while True:
        next(f1)
        next(f2)

process()
Output:
x
y
x
y
x
...
If you're going to do a lot of this, you can create a decorator that will make generator functions used as coroutines prime themselves automatically (taken from a tutorial David Beazley gave at Pycon 2009 titled Curious Course on Coroutines and Concurrency).
def coroutine(func):
    """ Decorator for wrapping coroutine functions so they automatically
    prime themselves.
    """
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start
@coroutine
def foo1():
    while True:
        yield
        print("x")

@coroutine
def foo2():
    while True:
        yield
        print("y")

# Sample usage.
def process():
    f1, f2 = foo1(), foo2()  # Each will be "primed" automatically.
    for _ in range(10):
        next(f1)
        next(f2)
So this is my personal way of doing this but there's probably a better one.
def foo(i):
    while True:
        i += 1
        if i % 2 == 0:
            print('x')
        else:
            print('y')
Here i works as a counter: it is incremented by 1 on every pass with i += 1, and the parity check alternates the output, so every even value of the counter prints 'x' and every odd value prints 'y'. Hope this helps!

Python: How to return data more than once during a function call

Is there any way that a function can be called once and then return data multiple times, at distinct times?
For example, suppose I had the following code:
def do_something():
    for i in range(1, 10):
        return 1
However, I want to be able to return more than one piece of data from a single function call, but at asynchronous times. Is this possible?
For context, I have a program that generates word documents, converts them into pdfs and then combines them into a single pdf document. I want to be able to call an external function from the GUI to create the documents, and then display a progress bar that displays the current progress through the function.
Edit:
I am already aware of yield. I thought my specific problem at the bottom of the question would help. To be clearer, what I am looking for is a way to return multiple values from a function and cause a different event for each value returned. Although it may be a poor example, what I want is to be able to do something similar to a .then(){} in JavaScript, but to perform the .then(){} using multiple returned values.
yield is the thing, as mentioned by almost everyone, for returning or getting multiple values from a function.
Having read your problem statement, here is the solution I would devise for you:
Create a function to update the status bar, where the value of the status bar is fetched from a global variable. So start with global x = 0, and in the update function first update x = x + 1 and then increment the status bar.
def do_something():
    for i in range(1, 10):
        # fetch and perform operation on that Doc for PDF
        update_status_bar()
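A minimal sketch of that hypothetical update_status_bar helper, assuming the global counter described above (a real GUI would update its progress widget instead of printing):
x = 0          # global progress counter
TOTAL = 9      # hypothetical number of documents

def update_status_bar():
    global x
    x = x + 1
    # Stand-in for a real progress-bar update.
    print(f"progress: {x}/{TOTAL}")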
You want a generator:
def do_something():
    for i in range(1, 10):
        yield i
nums = do_something()
Each time you call next on nums, the body of do_something will continue executing up to the next yield statement, at which point it returns that value.
>>> print(next(nums))  # Outputs 1
>>> print(next(nums))  # Outputs 2
>>> ...
You are looking for generators.
Instead of returning you yield (read What does the "yield" keyword do in Python?) from your function.
def do_something():
    for i in range(1, 10):
        yield i
If you want this function to be called repeatedly you will need to have a wrapper that calls this function repeatedly. Somewhat similar to:
from threading import Thread
from time import sleep

def worker():
    for i in do_something():
        UpdateProgress(i)
        sleep(progressInterval)

thread = Thread(target=worker)
thread.start()

How could I pass a block to a function in Python, the way you can pass a block in Ruby

In Ruby, I can pass a block of code to a method.
For example, I can pass different code blocks to the get_schedules_with_retries method,
and invoke the block by calling block.call.
I'd like to know how I could implement that logic in Python,
because I have lots of code blocks that need the retry pattern,
and I don't want to copy and paste the retry logic into each of them.
Example:
def get_schedules_with_retries(&block)
  max_retry_count = 3
  retry_count = 0
  while (retry_count < max_retry_count)
    begin
      schedules = get_more_raw_schedules
      block.call(schedules)
    rescue Exception => e
      print_error(e)
    end
    if schedules.count > 0
      break
    else
      retry_count += 1
    end
  end
  return schedules
end
get_schedules_with_retries do |schedules|
  # do something here
end

get_schedules_with_retries do |schedules|
  # do another thing here
end
In Python, a block is a syntactic feature (an indentation under block opening statements like if or def) and not an object. The feature you expect may be a closure (which can access variables outside of the block), which you can achieve using inner functions, but any callable could be used. Because of how lambda works in Python, the inline function definition you've shown with do |arg| is limited to a single expression.
Here's a rough rewrite of your sample code in Python.
import traceback

def get_schedules_with_retries(func, max_retry_count=3):
    retry_count = 0
    while retry_count < max_retry_count:
        schedules = get_more_raw_schedules()
        try:
            func(schedules)
        except Exception:  # Note: could filter types, bind name etc.
            traceback.print_exc()
        if schedules.count > 0:
            break
        else:
            retry_count += 1
    return schedules
get_schedules_with_retries(lambda schedules: single_expression)
def more_complex_function(schedules):
    pass  # do another thing here

get_schedules_with_retries(more_complex_function)
One variant uses a for loop to make it clear the loop is finite:
def call_with_retries(func, args=(), tries=3):
    last_exc = None
    for attempt in range(tries):
        try:
            result = func(*args)
            break
        except Exception as exc:
            traceback.print_exc()
            last_exc = exc
            continue
    else:  # break never reached, so the call failed every time
        raise last_exc  # Re-raises the last exception we printed above
    return result
Frequently when passing callables like this, you'll already have the function you want available somewhere and won't need to redefine it. For instance, methods on objects (bound methods) are perfectly valid callables.
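For instance, a small sketch (the class and method names here are made up for illustration):
class ScheduleFetcher:
    def process(self, schedules):
        print(f"processing {len(schedules)} schedules")

fetcher = ScheduleFetcher()
# A bound method is a callable like any other.
call_with_retries(fetcher.process, args=([1, 2, 3],))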
You could do it like this:
def codeBlock(parameter1, parameter2):
    print("I'm a code block")

def passMeABlock(block, *args):
    block(*args)

# pass the block like this
passMeABlock(codeBlock, 1, 2)
You do so by defining a function, either by using the def statement or a lambda expression.
There are other techniques however, that may apply here. If you need to apply common logic to the input or output of a function, write a decorator. If you need to handle exceptions in a block of code, perhaps creating a context manager is applicable.
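For example, a minimal sketch of the decorator idea (the decorator name and retry policy are assumptions, not a standard API):
import functools
import traceback

def with_retries(tries=3):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(tries):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    traceback.print_exc()
                    last_exc = exc
            raise last_exc  # every attempt failed
        return wrapper
    return decorator

@with_retries(tries=3)
def fetch_and_process():
    return get_more_raw_schedules()  # hypothetical flaky call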

Structuring Python Code for Data Analysis

I wrote code for a data analysis project, but it's becoming unwieldy and I'd like to find a better way of structuring it so I can share it with others.
For the sake of brevity, I have something like the following:
def process_raw_text(txt_file):
    # do stuff
    return token_text

def tag_text(token_text):
    # do stuff
    return tagged

def bio_tag(tagged):
    # do stuff
    return bio_tagged

def restructure(bio_tagged):
    # do stuff
    return restructured

print(restructured)
Basically I'd like the program to run through all of the functions sequentially and print the output.
In looking into ways to structure this, I read up on classes like the following:
class Calculator():
    def add(x, y):
        return x + y
    def subtract(x, y):
        return x - y
This seems useful when structuring a project to allow individual functions to be called separately, such as the add function with Calculator.add(x,y), but I'm not sure it's what I want.
Is there something I should be looking into for a sequential run of functions (that are meant to structure the data flow and provide readability)? Ideally, I'd like all functions to be within "something" I could call once, that would in turn run everything within it.
Chain together the output from each function as the input to the next:
def main():
    print(restructure(bio_tag(tag_text(process_raw_text(txt_file)))))

if __name__ == '__main__':
    main()
@SvenMarnach makes a nice suggestion. A more general solution is to realise that this idea of repeatedly using the output as the input for the next function in a sequence is exactly what the reduce function does. We want to start with some input txt_file:
from functools import reduce

def main():
    pipeline = [process_raw_text, tag_text, bio_tag, restructure]
    print(reduce(lambda data, func: func(data), pipeline, txt_file))
There's nothing preventing you from creating a class (or set of classes) that represents the pipeline you want to manage, with implementations that call the functions you need in sequence.
class DataAnalyzer():
    # ...
    def your_method(self, **kwargs):
        # call sequentially, or use the 'magic' proposed by others,
        # but internally to your class and not visible to clients
        pass
The functions themselves could remain private within the module, since they seem to be implementation details.
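A hedged sketch of what that might look like, reusing the four pipeline functions from the question (the method name analyze is an assumption):
class DataAnalyzer:
    """Facade over the module-private pipeline functions."""
    def analyze(self, txt_file):
        token_text = process_raw_text(txt_file)
        tagged = tag_text(token_text)
        bio_tagged = bio_tag(tagged)
        return restructure(bio_tagged)

# Clients call one method and never see the individual steps.
print(DataAnalyzer().analyze('input.txt'))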
You can implement a simple dynamic pipeline using just modules and functions.
my_module.py
# Note: identifiers can't start with a digit, so prefix the step number.
def step_01_process_raw_text(txt_file):
    # do stuff
    return token_text

def step_02_tag_text(token_text):
    # do stuff
    return tagged
my_runner.py
import re
import my_module

if __name__ == '__main__':
    funcs = sorted(x for x in vars(my_module) if re.match(r'step_\d+', x))
    data = initial_data
    for f in funcs:
        data = getattr(my_module, f)(data)

Efficient way of having a function only execute once in a loop

At the moment, I'm doing stuff like the following, which is getting tedious:
run_once = 0
while 1:
    if run_once == 0:
        myFunction()
        run_once = 1
I'm guessing there is some more accepted way of handling this stuff?
What I'm looking for is having a function execute once, on demand. For example, at the press of a certain button. It is an interactive app which has a lot of user controlled switches. Having a junk variable for every switch, just for keeping track of whether it has been run or not, seemed kind of inefficient.
I would use a decorator on the function to handle keeping track of how many times it runs.
def run_once(f):
    def wrapper(*args, **kwargs):
        if not wrapper.has_run:
            wrapper.has_run = True
            return f(*args, **kwargs)
    wrapper.has_run = False
    return wrapper

@run_once
def my_function(foo, bar):
    return foo + bar
Now my_function will only run once. Other calls to it will return None. Just add an else clause to the if if you want it to return something else (see the sketch below). From your example, it doesn't need to return anything ever.
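A minimal sketch of that else variant (the default parameter is an assumption):
def run_once_returning(f, default=None):
    def wrapper(*args, **kwargs):
        if not wrapper.has_run:
            wrapper.has_run = True
            return f(*args, **kwargs)
        else:
            return default  # returned on every call after the first
    wrapper.has_run = False
    return wrapper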
If you don't control the creation of the function, or the function needs to be used normally in other contexts, you can just apply the decorator manually as well.
action = run_once(my_function)
while 1:
    if predicate:
        action()
This will leave my_function available for other uses.
Finally, if you need it to run once now and once again later, you can just do
action = run_once(my_function)
action() # run once the first time
action.has_run = False
action() # run once the second time
Another option is to set the __code__ attribute (func_code in Python 2) of your function to the code object of a function that does nothing. This should be done at the end of your function body.
For example:
def run_once():
    # Code for something you only want to execute once
    run_once.__code__ = (lambda: None).__code__
Here run_once.__code__ = (lambda: None).__code__ replaces your function's executable code with the code for lambda: None, so all subsequent calls to run_once() will do nothing.
This technique is less flexible than the decorator approach suggested in the accepted answer, but may be more concise if you only have one function you want to run once.
Run the function before the loop. Example:
myFunction()
while True:
    # all the other code being executed in your loop
This is the obvious solution. If there's more than meets the eye, the solution may be a bit more complicated.
I'm assuming this is an action that you want to be performed at most one time, if some conditions are met. Since you won't always perform the action, you can't do it unconditionally outside the loop. Something like lazily retrieving some data (and caching it) if you get a request, but not retrieving it otherwise.
def do_something():
    [x() for x in expensive_operations]
    global action
    action = lambda: None

action = do_something
while True:
    # some sort of complex logic...
    if foo:
        action()
There are many ways to do what you want; however, do note that it is quite possible that —as described in the question— you don't have to call the function inside the loop.
If you insist in having the function call inside the loop, you can also do:
needs_to_run = expensive_function
while 1:
    …
    if needs_to_run: needs_to_run(); needs_to_run = None
    …
I've thought of another—slightly unusual, but very effective—way to do this that doesn't require decorator functions or classes. Instead it just uses a mutable keyword argument, which ought to work in most versions of Python. Most of the time these are something to be avoided since normally you wouldn't want a default argument value to change from call-to-call—but that ability can be leveraged in this case and used as a cheap storage mechanism. Here's how that would work:
def my_function1(_has_run=[]):
    if _has_run: return
    print("my_function1 doing stuff")
    _has_run.append(1)

def my_function2(_has_run=[]):
    if _has_run: return
    print("my_function2 doing some other stuff")
    _has_run.append(1)

for i in range(10):
    my_function1()
    my_function2()

print('----')
my_function1(_has_run=[])  # Force it to run.
Output:
my_function1 doing stuff
my_function2 doing some other stuff
----
my_function1 doing stuff
This could be simplified a little further by doing what @gnibbler suggested in his answer and using an iterator (iterators were introduced in Python 2.2):
from itertools import count

def my_function3(_count=count()):
    if next(_count): return
    print("my_function3 doing something")

for i in range(10):
    my_function3()

print('----')
my_function3(_count=count())  # Force it to run.
Output:
my_function3 doing something
----
my_function3 doing something
Here's an answer that doesn't involve reassignment of functions, yet still prevents the need for that ugly "is first" check.
__missing__ is supported by Python 2.5 and above.
def do_once_varname1():
    print('performing varname1')
    return 'only done once for varname1'

def do_once_varname2():
    print('performing varname2')
    return 'only done once for varname2'

class cdict(dict):
    def __missing__(self, key):
        val = self['do_once_' + key]()
        self[key] = val
        return val

cache_dict = cdict(do_once_varname1=do_once_varname1,
                   do_once_varname2=do_once_varname2)

if __name__ == '__main__':
    print(cache_dict['varname1'])  # causes 2 prints
    print(cache_dict['varname2'])  # causes 2 prints
    print(cache_dict['varname1'])  # just 1 print
    print(cache_dict['varname2'])  # just 1 print
Output:
performing varname1
only done once for varname1
performing varname2
only done once for varname2
only done once for varname1
only done once for varname2
One object-oriented approach is to make your function a class, aka a "functor", whose instances automatically keep track of whether they've been run or not when each instance is created.
Since your updated question indicates you may need many of them, I've updated my answer to deal with that by using a class factory pattern. This is a bit unusual, and it may have been down-voted for that reason (although we'll never know for sure because they never left a comment). It could also be done with a metaclass, but it's not much simpler.
def RunOnceFactory():
    class RunOnceBase(object):  # abstract base class
        _shared_state = {}  # shared state of all instances (borg pattern)
        has_run = False
        def __init__(self, *args, **kwargs):
            self.__dict__ = self._shared_state
            if not self.has_run:
                self.stuff_done_once(*args, **kwargs)
                self.has_run = True
    return RunOnceBase

if __name__ == '__main__':
    class MyFunction1(RunOnceFactory()):
        def stuff_done_once(self, *args, **kwargs):
            print("MyFunction1.stuff_done_once() called")

    class MyFunction2(RunOnceFactory()):
        def stuff_done_once(self, *args, **kwargs):
            print("MyFunction2.stuff_done_once() called")

    for _ in range(10):
        MyFunction1()  # will only call its stuff_done_once() method once
        MyFunction2()  # ditto
Output:
MyFunction1.stuff_done_once() called
MyFunction2.stuff_done_once() called
Note: You could make a function/class able to do stuff again by adding a reset() method to its subclass that resets the shared has_run attribute. It's also possible to pass regular and keyword arguments to the stuff_done_once() method when the functor is created and the method is called, if desired.
And, yes, it would be applicable given the information you added to your question.
Assuming there is some reason why myFunction() can't be called before the loop
from itertools import count

for i in count():
    if i == 0:
        myFunction()
Here's an explicit way to code this up, where the state of which functions have been called is kept locally (so global state is avoided). I don't much like the non-explicit forms suggested in other answers: it's too surprising to see f() and for this not to mean that f() gets called.
This works by using dict.pop which looks up a key in a dict, removes the key from the dict, and takes a default value to use in case the key isn't found.
def do_nothing(*args, **kwargs):
    pass

# A list of all the functions you want to run just once.
actions = [
    my_function,
    other_function,
]
actions = dict((action, action) for action in actions)

while True:
    if some_condition:
        actions.pop(my_function, do_nothing)()
    if some_other_condition:
        actions.pop(other_function, do_nothing)()
I use the cached_property decorator from functools to run a function just once and save the value. Here is the example from the official documentation: https://docs.python.org/3/library/functools.html
from functools import cached_property
import statistics

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = tuple(sequence_of_numbers)

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)
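Usage might look like this (the sample numbers are arbitrary); the statistics.stdev computation runs only on the first attribute access:
ds = DataSet([4, 8, 15, 16, 23, 42])
print(ds.stdev)  # computed on first access
print(ds.stdev)  # served from the cache; stdev() is not recomputed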
You can also use one of the standard library functools.lru_cache or functools.cache decorators in front of the function:
from functools import lru_cache

@lru_cache
def expensive_function():
    return None
https://docs.python.org/3/library/functools.html
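A small sketch of the effect (the print is just a stand-in side effect to show the body runs only once):
from functools import lru_cache

@lru_cache
def expensive_function():
    print("doing the expensive work")  # stand-in side effect
    return None

expensive_function()  # prints, then caches the result
expensive_function()  # returns the cached result; nothing printed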
If I understand the updated question correctly, something like this should work
def function1():
    print("function1 called")

def function2():
    print("function2 called")

def function3():
    print("function3 called")

called_functions = set()
while True:
    n = input("choose a function: 1, 2 or 3 ")
    func = {"1": function1,
            "2": function2,
            "3": function3}.get(n)
    if func in called_functions:
        print("That function has already been called")
    else:
        called_functions.add(func)
        func()
You have all those 'junk variables' outside of your mainline while True loop. To make the code easier to read those variables can be brought inside the loop, right next to where they are used. You can also set up a variable naming convention for these program control switches. So for example:
# _already_done checkpoint logic
try:
    ran_this_user_request_already_done
except NameError:
    this_user_request()
    ran_this_user_request_already_done = 1
Note that on the first execution of this code the variable ran_this_user_request_already_done is not defined until after this_user_request() is called.
A simple function you can reuse in many places in your code (based on the other answers here):
def firstrun(keyword, _keys=[]):
    """Returns True only the first time it's called with each keyword."""
    if keyword in _keys:
        return False
    else:
        _keys.append(keyword)
        return True
or equivalently (if you like to rely on other libraries):
from collections import defaultdict
from itertools import count

def firstrun(keyword, _keys=defaultdict(count)):
    """Returns True only the first time it's called with each keyword."""
    return not next(_keys[keyword])
Sample usage:
for i in range(20):
    if firstrun('house'):
        build_house()  # runs only once
    if firstrun(42):   # True
        print('This will print.')
    if firstrun(42):   # False
        print('This will never print.')
I've taken a more flexible approach, inspired by the functools.partial function:
DO_ONCE_MEMORY = []

def do_once(id, func, *args, **kwargs):
    if id not in DO_ONCE_MEMORY:
        DO_ONCE_MEMORY.append(id)
        return func(*args, **kwargs)
    else:
        return None
With this approach you are able to have more complex and explicit interactions:
do_once('foobar', print, "first try")
do_once('foobar', print, "first try")  # skipped: 'foobar' has already run
do_once('bar', print, "second try")
# first try
# second try
The exciting part about this approach is that it can be used anywhere and does not require factories - it's just a small memory tracker.
Depending on the situation, an alternative to the decorator could be the following:
from itertools import chain, repeat

func_iter = chain((myFunction,), repeat(lambda *args, **kwds: None))
while True:
    next(func_iter)()
The idea is based on iterators, which yield the function once (or, using repeat(myFunction, n), n times), and then endlessly yield the lambda doing nothing.
The main advantage is that you don't need a decorator which sometimes complicates things, here everything happens in a single (to my mind) readable line. The disadvantage is that you have an ugly next in your code.
Performance wise there seems to be not much of a difference, on my machine both approaches have an overhead of around 130 ns.
If the condition check needs to happen only once while you are in the loop, a flag signaling that you have already run the function helps. In this case you used a counter, but a boolean variable would work just as well.
signal = False
count = 0

def callme():
    print("I am being called")

while count < 2:
    if not signal:
        callme()
        signal = True
    count += 1
I'm not sure that I understood your problem, but I think you can divide the loop into the part with the function call and the part without it, and keep the two loops separate.
