Python concurrent.futures

I have multiprocessing code, and each process has to analyse the same data differently.
I have implemented:
with concurrent.futures.ProcessPoolExecutor() as executor:
    res = executor.map(goal_fcn, p, [global_DataFrame], [global_String])
    for f in concurrent.futures.as_completed(res):
        fp = res
and function:
def goal_fcn(x, DataFrame, String):
    return heavy_calculation(x, DataFrame, String)
The problem is that goal_fcn is called only once, while it should be called multiple times.
In the debugger, I checked how the variable p looks, and it has multiple columns and rows. Inside goal_fcn, the variable x has only the first row - that looks good.
But the function is called only once. There is no error; the code just executes the next steps.
Even if I simplify the variable to p = [1, 3, 4, 5] (and adapt the code accordingly), goal_fcn is executed only once.
I have to use map() because keeping the order between input and output is required.

map works like zip: it terminates as soon as the shortest input sequence is exhausted. Your [global_DataFrame] and [global_String] lists have one element each, so that is where map ends.
There are two ways around this:
Use itertools.product. This is the equivalent of running "for all data frames, for all strings, for all p". Something like this:
def goal_fcn(x_DataFrame_String):
    x, DataFrame, String = x_DataFrame_String
    ...

executor.map(goal_fcn, itertools.product(p, [global_DataFrame], [global_String]))
Bind the fixed arguments instead of abusing the sequence arguments.
def goal_fcn(x, DataFrame, String):
    pass

bound = functools.partial(goal_fcn, DataFrame=global_DataFrame, String=global_String)
executor.map(bound, p)
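For completeness, here is a minimal self-contained sketch of the second approach. The data values and the body of goal_fcn are placeholders of my own, standing in for the real DataFrame and heavy_calculation; the point is only to show that goal_fcn is now called once per element of p, with results in input order.
import concurrent.futures
import functools

def goal_fcn(x, DataFrame, String):
    # stand-in for heavy_calculation(x, DataFrame, String)
    return x * len(String)

if __name__ == "__main__":
    global_DataFrame = None            # placeholder for the real DataFrame
    global_String = "some string"
    p = [1, 3, 4, 5]
    bound = functools.partial(goal_fcn, DataFrame=global_DataFrame, String=global_String)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(bound, p))   # one call per element of p, results in input order
    print(results)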

Related

How to wrap a generator with a filter?

I have a series of connected generators and I want to create a filter that can be used to wrap one of the generators. This filter wrapper should take a generator and a function as parameters. If a data item in the incoming stream does not pass the requirements of the filter, it should be passed downstream to the next generator without going through the wrapped generator. I have made a working example here that should make it more clear as to what I am trying to achieve:
import functools

is_less_than_three = lambda x : True if x < 3 else False

def add_one(numbers):
    print("new generator created")
    for number in numbers:
        yield number + 1

def wrapper(generator1, filter_):
    #functools.wraps(generator1)
    def wrapped(generator2):
        for data in generator2:
            if filter_(data):
                yield from generator1([data])
            else:
                yield data
    return wrapped

add_one_to_numbers_less_than_three = wrapper(add_one, is_less_than_three)
answers = add_one_to_numbers_less_than_three(range(6))
for answer in answers:
    print(answer)
#new generator created
#1
#new generator created
#2
#new generator created
#3
#3
#4
#5
The problem with this is that it requires creating a new generator for each data item. There must be a better way? I have also tried using itertools.tee and splitting the generator, but this causes memory problems when the generators yield values at different rates (they do). How can I accomplish what the above code does without re-creating generators and without causing memory problems?
edited to add background information below
As input I will receive large video streams. The video streams may or may not end (could be a webcam). Users are able to choose which image processing steps are carried out on the video frames, thus the order and number of functions will change. Subsequently, the functions should be able to take each other's outputs as inputs.
I have accomplished this by using a series of generators. The input:output ratio of the generators/functions is variable - it could be 1:n, 1:1, or n:1 (for example, extracting several objects (subimages) from an image to be processed separately).
Currently these generators take a few parameters that are repeated among them (not DRY) and I am trying to decrease the number of parameters by refactoring them into separate generators or wrappers. One of the more difficult ones is a filter on the data stream to determine whether or not a function should be applied to the frame (the function could be cpu-intensive and not needed on all frames).
The number of parameters makes the usage of the function more difficult for the user to understand. It also makes it more difficult for me in that whenever I want to make a change to one of the common parameters, I have to edit it for all functions.
edit2 renamed function to generator in example code to make it more clear
edit3: the solution
Thank you @Blckknght. This can be solved by creating an infinite iterator that passes the value of a local variable to the generator. I modified my example slightly, changing add_one into a 1:n generator instead of a 1:1 generator, to show how this solution can also work for 1:n generators.
import functools

is_less_than_three = lambda x : True if x < 3 else False

def add_one(numbers):
    print("new generator created")
    for number in numbers:
        if number == 0:
            yield number - 1
            yield number
        else:
            yield number

def wrapper(generator1, filter_):
    #functools.wraps(generator1)
    def wrapped(generator2):
        local_variable_passer = generator1(iter(lambda: data, object()))
        for data in generator2:
            if filter_(data):
                next_data = next(local_variable_passer)
                if data == 0:
                    yield next_data
                    next_data = next(local_variable_passer)
                    yield next_data
                else:
                    yield next_data
            else:
                yield data
    return wrapped

add_one_to_numbers_less_than_three = wrapper(add_one, is_less_than_three)
answers = add_one_to_numbers_less_than_three(range(6))
for answer in answers:
    print(answer)
#new generator created
#-1
#0
#1
#2
#3
#3
#4
#5
As I understand your problem, you have a stream of video frames, and you're trying to create a pipeline of processing functions that modify the stream. Different processing functions might change the number of frames, so a single input frame could result in multiple output frames, or multiple input frames could be consumed before a single output frame is produced. Some functions might be 1:1, but that's not something you can count on.
Your current implementation uses generator functions for all the processing. The output function iterates on the chain, and each processing step in the pipeline requests frames from the one before it using iteration.
The function you're trying to write right now is a sort of selective bypass. You want some frames (those meeting some condition) to be passed into an already existing generator function, while other frames skip the processing and go directly to the output. Unfortunately, that's probably not possible to do with Python generators. The iteration protocol is just not sophisticated enough to support it.
First off, it is possible to do this for 1:1 with generators, but you can't easily generalize to n:1 or 1:n cases. Here's what it might look like for 1:1:
def selective_processing_1to1(processing_func, condition, input_iterable):
    processing_iterator = processing_func(iter(lambda: input_value, object()))
    for input_value in input_iterable:
        if condition(input_value):
            yield next(processing_iterator)
        else:
            yield input_value
There's a lot of work being done in the processing_iterator creation step. By using the two-argument form of iter with a lambda function and a sentinel object (that will never be yielded), I'm creating an infinite iterator that always yields the current value of the local variable input_value. Then I pass that iterator to the processing_func function. I can selectively call next on the generator object if I want to apply the processing the filter represents to the current value, or I can just yield the value myself without processing it.
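If the two-argument form of iter is unfamiliar, here is a minimal standalone demonstration of the trick (the variable names here are just for illustration, not part of the answer's code):
current = 1
watcher = iter(lambda: current, object())   # the sentinel object() is never produced, so this never stops
print(next(watcher))   # 1
current = 42
print(next(watcher))   # 42 - each next() re-reads the variable as it is bound right now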
But because this only works on one frame at a time, it won't do for n:1 or 1:n filters (and I don't even want to think about m:n kinds of scenarios).
A "peekable" iterator that lets you see what the next value is going to be before you iterate onto it might let you support a limited form of selective filtering for n:1 processes (that is, where a possibly-variable n input frames go into one output frame). The limitation is that you can only do the selective filtering on the first of the n frames that is going to be consumed by the processing, the others will get taken without you getting a chance to check them first. Maybe that's good enough?
Anyway, here's what that looks like:
_sentinel = object()

class PeekableIterator:
    def __init__(self, input_iterable):
        self.iterator = iter(input_iterable)
        self.next_value = next(self.iterator, _sentinel)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_value is not _sentinel:
            return_value = self.next_value
            self.next_value = next(self.iterator, _sentinel)
            return return_value
        raise StopIteration

    def peek(self):  # this is not part of the iteration protocol!
        if self.next_value is not _sentinel:
            return self.next_value
        raise ValueError("input exhausted")

def selective_processing_Nto1(processing_func, condition, input_iterable):
    peekable = PeekableIterator(input_iterable)
    processing_iter = processing_func(peekable)
    while True:
        try:
            value = peekable.peek()
            print(value, condition(value))
        except ValueError:
            return
        try:
            yield next(processing_iter) if condition(value) else next(peekable)
        except StopIteration:
            return
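A quick way to try this out is with the add_one and is_less_than_three helpers from the question (this usage sketch is mine, not part of the original answer; note that the debug print inside the loop above will also show each peeked value):
for value in selective_processing_Nto1(add_one, is_less_than_three, range(6)):
    print(value)
# add_one is created only once; 0, 1 and 2 come out incremented, while 3, 4 and 5 pass through untouched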
This is as good as we can practically do when the processing function is a generator. If we wanted to do more, such as supporting 1:n processing, we'd need some way to know how large the n was going to be, so we could get that many values before deciding if we will pass on the next input value or not. While you could write a custom class for the processing that would report that, it is probably less convenient than just calling the processing function repeatedly as you do in the question.
The architecture is a conditional map - as such, each item must be mapped individually. This means the function should receive one number, not many numbers.
As long as there is a stateless 1:1 connection, use a function instead of a generator.
def add_one(number):  # takes one number
    return number + 1  # provides one number

def conditional_map(function, condition):
    #functools.wraps(function)
    def wrapped(generator):
        return (
            function(item) if condition(item)
            else item for item in generator
        )
    return wrapped

for answer in conditional_map(add_one, lambda x: x < 3)(range(6)):
    print(answer)
If data must be passed to a stateful "generator", it is a coroutine and should be designed as such. This means that yield is used both to receive and provide data.
from itertools import count

def add_increment(start=0):
    # initially receive data
    number = yield
    for increment in count(start):
        # provide and receive data
        number = yield number + increment
Since this is still a 1:1 connection, it can be used with the previous conditional_map.
mapper = add_increment()
next(mapper)  # prime the coroutine - this could be done with a decorator
for answer in conditional_map(mapper.send, lambda x: x < 3)(range(6)):
    print(answer)
If 1:n connections are needed, expect to receive a generator for each input.
def add_some(number):  # takes one number
    yield number - 1
    yield number
    yield number + 1

def conditional_map(function, condition):
    #functools.wraps(function)
    def wrapped(generator):
        for data in generator:
            if condition(data):
                yield from function(data)  # passes on *one* item
            else:
                yield data
    return wrapped
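Used with the add_some helper above, that might look like this (a usage sketch of my own, relying only on the definitions just given):
for answer in conditional_map(add_some, lambda x: x < 3)(range(6)):
    print(answer)
# 0, 1 and 2 each expand to three numbers; 3, 4 and 5 pass through unchanged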
If a stateful 1:n connection is required, a coroutine that produces a generator/iterable can be used.
def add_increments(start=0):
    # initially receive data
    number = yield
    for increment in count(start):
        # provide and receive data
        number = yield (number + increment + i for i in (-1, 0, 1))
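Driving that coroutine follows the same pattern as before: prime it, then hand its send method to the 1:n conditional_map defined above. This is a sketch under the assumption that the two definitions are used together:
mapper = add_increments()
next(mapper)   # prime the coroutine so it reaches its first yield
for answer in conditional_map(mapper.send, lambda x: x < 3)(range(6)):
    print(answer)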
It really seems like you all are making this too complicated.
If you think of a data processing pipeline as
source -> transform -> filter -> sink
where source, transform, filter are all generators.
this is similar to Unix pipelines
cat f | tr 'a' 'A' | grep 'word' > /dev/null
then you can see how a pipeline works (conceptually).
One big difference is that Unix pipelines push data, where with Python generators you pull data.
Using some of your functions:
# this is a source
def add_one(numbers):
    print("new generator created")
    # this is the output that becomes the next function's input
    for number in numbers:
        yield number + 1

# this is a transform
def filter(input, predicate):
    for item in input:
        if predicate(item):
            yield item

# this is the sink
def save(input, filename):
    with open(filename, 'w') as f:
        for item in input:
            f.write(str(item))
To put the pipeline of generators together in python you start with the source, then pass it to a transform or filter as a parameter that can be iterated over. Of course each of the generators has a yield statement. Finally the outermost function is the sink and it consumes the values while it iterates.
It looks like this. You can see how the predicate function is passed to the filter function in addition to the "source" of its data.
# now run the pipeline
save(filter(add_one(range(20)), is_less_than_three), 'myfile')
Some find that this looks awkward but if you think of mathematical notation it is easier. I am sure you have seen f(g(x)) which is exactly the same notation.
You could also write it as:
save(filter(add_one(range(20)),
            is_less_than_three),
     'myfile')
which shows better how the parameters are used.
To recap
The pipeline is a generator. In this case it won't have a generator as its source. It may have non-generator input such as a list of numbers (your example), or create them some other way such as reading a file.
Transform generators always have a generator for their source, and they yield output to their "sink". In other words a transform acts like a sink to its source and like a source to its sink.
The sink is the final part of the pipeline: it just iterates over its source input. It consumes its input and doesn't yield any output. Its job is to consume the items, by processing, saving, printing or whatever.
A transform is an m-to-n function that for m inputs produces n outputs, meaning it can filter out some inputs and not pass them on, or produce multiple outputs by creating new items. An example might be transforming a video stream from 60fps to 30fps. For every two input frames it produces one output frame.
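That 60fps-to-30fps example could be written as just another transform in the same style. A sketch (the name halve_framerate is mine, not from the question):
# a 2:1 transform: for every two input frames it yields one output frame
def halve_framerate(frames):
    frames = iter(frames)
    for frame in frames:
        next(frames, None)   # consume and drop the second frame of the pair
        yield frame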

creating generator function from ordinary function

I'm reading about generators on http://www.dabeaz.com/generators/
(which is a very fine, informative article even if it's a ppt slide deck)
It has the following section about creating generators:
Any single-argument function is easy to turn into a generator function
def generate(func):
    def gen_func(s):
        for item in s:
            yield func(item)
    return gen_func
• Example:
gen_sqrt = generate(math.sqrt)
for x in gen_sqrt(range(100)):
    print(x)
I don't see the point of this slide (it's on p. 114 of the slides).
Isn't it just (math.sqrt(e) for e in range(100))?
What is he accomplishing with the generate function?
The point of such higher-order functions is to allow multiple inputs to a function to be chosen at different times/places:
def filter_lines(f, filt):
    with open(f) as f:
        for l in f:
            print(' '.join(map(str, filt(map(float, l.split())))))
This can accept any kind of iterable-transformer as filt, like
def ints(it):
    for f in it:
        if f == int(f):
            yield f
or the result of generate:
filter_lines("…",ints)
filter_lines("…",list) # the identity: print all
filter_lines("…",generate(math.sqrt))
filter_lines("…",generate(abs))
Therefore we can see that generate transforms a function of one element into a function of iterables of elements. (This is what is meant by “turn into a generator function”.) We can go one further:
import functools
filter_lines("…",functools.partial(map,math.sqrt))
from which we can conclude that generate itself is equivalent to functools.partial(functools.partial,map). Applying partial twice like that splits a parameter list in two, changing a normal function into a higher-order function.
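A quick way to convince yourself of that equivalence (a minimal check of my own, not part of the slides; the name generate2 is arbitrary):
import functools
import math

generate2 = functools.partial(functools.partial, map)
gen_sqrt = generate2(math.sqrt)        # equivalent to functools.partial(map, math.sqrt)
print(list(gen_sqrt(range(5))))        # the square roots of 0..4, the same values generate(math.sqrt) would yield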

list of functions Python

I have a list of patterns:
patterns_trees = [response.css("#Header").xpath("//a/img/@src"),
                  response.css("#HEADER").xpath("//a/img/@src"),
                  response.xpath("//header//a/img/@src"),
                  response.xpath("//a[@href='" + response.url + '/' + "']/img/@src"),
                  response.xpath("//a[@href='/']/img/@src")
                  ]
After I traverse it and find the right pattern I have to send the pattern as an argument to a callback function
for pattern_tree in patterns_trees:
    ...
    pattern_response = scrapy.Request(..., ..., meta={"pattern_tree": pattern_tree.extract_first()})
By doing this I get the extracted value, not the pattern itself.
THINGS I TRIED:
I tried isolating the patterns in a separate class, but I still have the problem that I cannot store them as patterns, only as values.
I tried saving them as strings, and maybe I could make that work, but:
What is the most efficient way of storing a list of functions?
UPDATE: Possible solution but too hardcoded and it's too problematic when I want to add more patterns:
def patter_0(response):
    return response.css("#Header").xpath("//a/img/@src")

def patter_1(response):
    return response.css("#HEADER").xpath("//a/img/@src")

.....

class patternTrees:
    patterns = [patter_0, ..., patter_n]

    def length_patterns(self):
        return len(self.patterns)
If you're willing to consider reformatting your list of operations, then this is a somewhat neat solution. I've changed the list of operations to a list of tuples. Each tuple contains (a ref to) the appropriate function, and another tuple consisting of arguments.
It's fairly easy to add new operations to the list: just specify what function to use, and the appropriate arguments.
If you want to use the result from one operation as an argument in the next: You will have to return the value from execute() and process it in the for loop.
I've replaced the calls to response with prints() so that you can test it easily.
def response_css_ARG_xpath_ARG(args):
    return "response.css(\"%s\").xpath(\"%s\")" % (args[0], args[1])
    #return response.css(args[0]).xpath(args[1])

def response_xpath_ARG(arg):
    return "response.xpath(\"%s\")" % (arg)
    #return response.xpath(arg)

def execute(function, args):
    response = function(args)
    # do whatever with response
    return response

response_url = "https://whatever.com"

patterns_trees = [(response_css_ARG_xpath_ARG, ("#Header", "//a/img/@src")),
                  (response_css_ARG_xpath_ARG, ("#HEADER", "//a/img/@src")),
                  (response_xpath_ARG, ("//header//a/img/@src")),
                  (response_xpath_ARG, ("//a[@href='" + response_url + "/" + "']/img/@src")),
                  (response_xpath_ARG, ("//a[@href='/']/img/@src"))]

for pattern_tree in patterns_trees:
    print(execute(pattern_tree[0], pattern_tree[1]))
Note that execute() can be omitted, depending on whether you need to process the result or not. Without it, you may just call the function directly from the loop:
for pattern_tree in patterns_trees:
    print(pattern_tree[0](pattern_tree[1]))
Not sure I understand what you're trying to do, but could you make your list a list of lambda functions like so:
patterns_trees = [
    lambda response: response.css("#Header").xpath("//a/img/@src"),
    ...
]
And then, in your loop:
for pattern_tree in patterns_trees:
    intermediate_response = scrapy.Request(...)  # without meta kwarg
    pattern_response = pattern_tree(intermediate_response)
Or does leaving meta out have an impact on the response object?

Small python program involving newton method

I'm trying to write a small program in Python that involves (among other things) Newton's method, but I'm encountering several problems that are probably pretty basic; since I'm new at programming, I can't overcome them.
First I defined the function and its derivative:
import math

def f(x, e, m):
    return x - e*math.sin(x) - m

def df(x, e):
    return 1 - e*math.cos(x)

def newtons_method(E0, m, e, q):  # q is the error
    while abs(f(E0, e, m)) > q:
        E = E0 - f(E0, e, m)/df(E0, e)
        E0 = E
    return (E0)

def trueanomaly(e, E):
    ta = 2*math.arctan(math.sqrt((1+e)/(1-e))*math.tan(E))
    return (ta)

def keplerianfunction(T, P, e, K, y, w):
    for t in frange(0, 100, 0.5):
        m = (2*math.pi*((t-T)/P))
        E0 = m + e*math.sin(m) + ((e**2)/2)*math.sin(2*m)
        newtons_method(E0, m, e, 0.001)
        trueanomaly(e, E0)
        rv = y + K*(e*math.cos(w) + math.cos(w+ta))
        return (ta)","(rv)

def frange(start, stop, step):
    i = start
    while i < stop:
        yield i
        i += step
The problem is that this keeps giving me errors, indentation errors and such, especially in keplerianfunction... Can someone help me? What am I doing wrong here?
Thank you in advance!
Many things are wrong with this code, and I don't know what the desired behaviour is, so I can't guarantee that this will help, but I'm going to try and help you debug (although it looks like you mostly need to re-read your Python coursebook...).
First, in most languages if not all, there is a thing called the scope: a variable, function, or any other object, exists only within a certain scope. In particular, variables exist only in the scope of the function that they are defined in. This means that, to use the result of a function, you first need to return that result (which you are doing), and when you call that function you need to store that result into a variable, for example ta = trueanomaly(e, E0).
Then, you don't really need to use brackets when returning values, even if you want to return multiple values. If you do want to return multiple values though, you just need to separate them with a comma, but not with a string character of a comma: write return ta, rv instead of return ta","rv.
Finally, you seem to be iterating over a range of values, yet you don't return the whole range of values but either the first value (if your return is in the for loop), or the last one (if your return is under the for loop). Instead, you may want to store all the ta and rv values into one/two lists, and return that/those lists in the end, for example:
def keplerianfunction(T, P, e, K, y, w):
    # Initialise two empty lists
    tas = []
    rvs = []
    for t in frange(0, 100, 0.5):
        m = 2*math.pi*((t-T)/P)
        E0 = m + e*math.sin(m) + ((e**2)/2)*math.sin(2*m)
        E0 = newtons_method(E0, m, e, 0.001)
        ta = trueanomaly(e, E0)
        rv = y + K*(e*math.cos(w) + math.cos(w+ta))
        # At each step save the value for ta and rv into the appropriate list
        tas.append(ta)
        rvs.append(rv)
    # And finally return the lists
    return (tas, rvs)
One last remark: the custom frange helper is only needed because the built-in range does not accept a float step such as 0.5; for integer steps, plain range would do the same job and is probably more efficient.

Python Newbie: Returning Multiple Int/String Results in Python

I have a function that has several outputs, all of which "native", i.e. integers and strings. For example, let's say I have a function that analyzes a string, and finds both the number of words and the average length of a word.
In C/C++ I would use & to pass the 2 parameters to the function. In Python I'm not sure what the right solution is, because integers and strings are not passed by reference but by value (at least this is what I understand from trial and error), so the following code won't work:
def analyze(string, number_of_words, average_length):
    ... do some analysis ...
    number_of_words = ...
    average_length = ...
If I do the above, the values outside the scope of the function don't change. What I currently do is use a dictionary, like so:
def analyze(string, result):
    ... do some analysis ...
    result['number_of_words'] = ...
    result['average_length'] = ...
And I use the function like this:
s = "hello goodbye"
result = {}
analyze(s, result)
However, that does not feel right. What's the correct Pythonian way to achieve this? Please note I'm referring only to cases where the function returns 2-3 results, not tens of results. Also, I'm a complete newbie to Python, so I know I may be missing something trivial here...
Thanks
Python has a return statement, which allows you to do the following:
def func(input):
    # do calculation on input
    return result

s = "hello goodbye"
res = func(s)  # res now a result dictionary
But you don't need a result dictionary at all; you can return a few values like so:
def func(input):
    # do work
    return length, something_else  # one might be an integer, another a string, etc.

s = "hello goodbye"
length, something = func(s)
If you return the variables in your function like this:
def analyze(s, num_words, avg_length):
    # do something
    return s, num_words, avg_length
Then you can call it like this to update the parameters that were passed:
s, num_words, avg_length = analyze(s, num_words, avg_length)
But, for your example function, this would be better:
def analyze(s):
    # do something
    return num_words, avg_length
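For example, a complete version of that signature might look like this (the implementation details are my own guess at the intended word counting):
def analyze(s):
    words = s.split()
    num_words = len(words)
    avg_length = sum(len(w) for w in words) / num_words if num_words else 0.0
    return num_words, avg_length

num_words, avg_length = analyze("hello goodbye")   # 2, 6.0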
In Python you don't modify parameters in the C/C++ way (passing them by reference or through a pointer and doing modifications in situ). There are reasons for this, such as the fact that string objects are immutable in Python. The right thing to do is to return the modified parameters in a tuple (as SilentGhost suggested) and rebind the variables to the new values.
If you need to use method arguments in both directions, you can encapsulate the arguments in a class, pass an object to the method, and let the method use its attributes.
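A minimal sketch of that class-based approach (the class and attribute names here are illustrative, not from the question):
class Analysis:
    def __init__(self, text):
        self.text = text
        self.number_of_words = 0
        self.average_length = 0.0

    def analyze(self):
        words = self.text.split()
        self.number_of_words = len(words)
        if words:
            self.average_length = sum(len(w) for w in words) / len(words)

a = Analysis("hello goodbye")
a.analyze()
print(a.number_of_words, a.average_length)   # 2 6.0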
