I'm wondering what the difference is between this code:
import multiprocessing

g = {}  # global data dictionary

class Foo:
    @staticmethod
    def bar():
        elems = [1, 2, 3, 4, 5]
        g["important_data"] = get_important_data()
        with multiprocessing.Pool(10) as p:
            for res in p.imap(f, elems):
                pass  # do whatever
where f is a function that will use g["important_data"] in a read-only manner; and this code:
import multiprocessing

class Foo:
    @staticmethod
    def bar():
        elems = [1, 2, 3, 4, 5]
        important_data = get_important_data()
        with multiprocessing.Pool(10) as p:
            for res in p.imap(f, [(e, important_data) for e in elems]):
                pass  # do whatever
where f does exactly the same computation on elems as above, but is handed the additional data as the parameter important_data rather than accessing it via the global variable g.
I'm currently working with code where the original authors wrote the comment # Globals for multiprocessing above g to prevent shared memory, but I don't know what this means. I know that different processes in Python have their own copies of global variables by default (so the first implementation would not work at all if f were meant to write to g; as mentioned, however, this is read-only). But the second implementation seems to copy important_data as well, so where's the difference?
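For what it's worth, the usual way to hand read-only data to workers exactly once per process, rather than once per task, is a pool initializer. Here is a minimal sketch of that pattern, with get_important_data's payload and f's body as stand-ins for the real code:

import multiprocessing

g = {}  # each worker process gets its own copy of this module-level dict

def init_worker(important_data):
    # Runs once in each worker process, before any tasks are dispatched.
    g["important_data"] = important_data

def f(elem):
    # Reads the per-process copy; important_data is not re-pickled per task.
    return elem * len(g["important_data"])

if __name__ == "__main__":
    data = list(range(1000))  # stand-in for get_important_data()
    with multiprocessing.Pool(10, initializer=init_worker, initargs=(data,)) as p:
        for res in p.imap(f, [1, 2, 3, 4, 5]):
            print(res)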
Related
I ran some tests on the following setup, which unexpectedly presented itself as a quick fix for my problem:
I want to call multiprocessing.Pool.map() from inside a main function (that sets up the parameters). However, it is simpler for me to give a locally defined function as one of the args. Since the latter can't be pickled, I tried the laziest solution of declaring it as global. Should I expect some weird results? Would you advise a different strategy?
Here is an example (dummy) script:
#!/usr/bin/env python3

import random
import multiprocessing as mp

def processfunc(arg_and_func):
    arg, func = arg_and_func
    return "%7.4f:%s" % (func(arg), arg)

def main(*args):
    # the content of var depends on main:
    var = random.random()

    # Now I need to pass a func that uses `var`
    global thisfunc
    def thisfunc(x):
        return x + var

    # Test regular use
    for x in range(-5, 0):
        print(processfunc((x, thisfunc)))

    # Test parallel runs
    with mp.Pool(2) as pool:
        for r in pool.imap_unordered(processfunc, [(x, thisfunc) for x in range(20)]):
            print(r)

if __name__ == '__main__':
    main()
PS: I know I could define thisfunc at module level, and pass the var argument through processfunc, but since my actual processfunc in real life already takes a lot of arguments, it seemed more readable to pass a single object thisfunc instead of many parameters...
What you have now looks OK, but might be fragile for later changes.
I might use partial in order to simplify the explicit passing of var to a globally defined function.
import random
import multiprocessing as mp
from functools import partial

def processfunc(arg_and_func):
    arg, func = arg_and_func
    return "%7.4f:%s" % (func(arg), arg)

def thisfunc(var, x):
    return x + var

def main(*args):
    # the content of var depends on main:
    var = random.random()
    f = partial(thisfunc, var)

    # Test regular use
    for x in range(-5, 0):
        print(processfunc((x, f)))

    # Test parallel runs
    with mp.Pool(2) as pool:
        for r in pool.imap_unordered(processfunc, [(x, f) for x in range(20)]):
            print(r)

if __name__ == '__main__':
    main()
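The reason the partial approach works where the nested function needed the global trick is picklability: multiprocessing must pickle whatever it sends to the workers, and a partial over a module-level function pickles fine while a locally defined function does not. A quick illustration (the names here are just for demonstration):

import pickle
from functools import partial

def thisfunc(var, x):
    return x + var

f = partial(thisfunc, 0.5)
pickle.dumps(f)  # fine: a partial over a module-level function pickles

def main():
    var = 0.5
    def local(x):
        return x + var
    try:
        pickle.dumps(local)
    except (pickle.PicklingError, AttributeError) as e:
        print("cannot pickle a locally defined function:", e)

main()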
Consider
def f(x, *args):
    intermediate = computationally_expensive_fct(x)
    return do_stuff(intermediate, *args)
The problem: For the same x, this function might be called thousands of times with different arguments (other than x), and each time it is called, intermediate is recomputed (a Cholesky factorisation, cost O(n^3)). In principle, however, it is enough to compute intermediate only once per x and then reuse that result in f with different args.
My idea: To remedy this, I tried to create a global dictionary in which the function looks up whether the expensive computation has already been done and stored for its parameter x, or whether it still has to compute it:
if all_intermediates not in globals():
    global all_intermediates = {}
if all_intermediates.has_key(x):
    pass
else:
    global all_intermediates[x] = computationally_expensive_fct(x)
It turns out I can't do this because globals() is a dict itself and you can't hash dicts in Python. I'm a novice programmer and would be happy if someone could point me towards a pythonic way to do what I want to achieve.
Solution
A bit more lightweight than writing a decorator and without accessing globals:
def f(x, *args):
    if not hasattr(f, 'all_intermediates'):
        f.all_intermediates = {}
    if x not in f.all_intermediates:
        f.all_intermediates[x] = computationally_expensive_fct(x)
    intermediate = f.all_intermediates[x]
    return do_stuff(intermediate, *args)
Variation
A variation that avoids the if not hasattr check but needs all_intermediates to be set as an attribute of f after it is defined:
def f(x, *args):
    if x not in f.all_intermediates:
        f.all_intermediates[x] = computationally_expensive_fct(x)
    intermediate = f.all_intermediates[x]
    return do_stuff(intermediate, *args)

f.all_intermediates = {}
This caches all_intermediates as an attribute of the function itself.
Explanation
Functions are objects and can have attributes, so you can store the dictionary all_intermediates as an attribute of the function f. This makes the function self-contained, meaning you can move it to another module without worrying about module globals. Using the variation shown above, you need to move f.all_intermediates = {} along with the function.
Putting things into globals() doesn't feel right. I recommend against doing this.
I don't get why you are trying to use globals(). Instead of using globals(), you can simply keep computed values in your own module-level dictionary and have a wrapper function that looks up whether intermediate has already been calculated. Something like this:
computed_intermediate = {}

def get_intermediate(x):
    if x not in computed_intermediate:
        computed_intermediate[x] = computationally_expensive_fct(x)
    return computed_intermediate[x]

def f(x, *args):
    intermediate = get_intermediate(x)
    return do_stuff(intermediate, *args)
In this way computationally_expensive_fct(x) will be calculated only once for each x, namely the first time it is accessed.
This is often implemented with the @memoized decorator on the expensive function.
It is described at https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize and brief enough to duplicate here in case of link rot:
import functools

class memoized(object):
    '''Decorator. Caches a function's return value each time it is called.
    If called later with the same arguments, the cached value is returned
    (not reevaluated).
    '''
    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args):
        try:
            hash(args)
        except TypeError:
            # uncacheable -- e.g. an argument is a list.
            # better to not cache than to blow up.
            return self.func(*args)
        if args in self.cache:
            return self.cache[args]
        else:
            value = self.func(*args)
            self.cache[args] = value
            return value

    def __repr__(self):
        '''Return the function's docstring.'''
        return self.func.__doc__

    def __get__(self, obj, objtype):
        '''Support instance methods.'''
        return functools.partial(self.__call__, obj)
Once the expensive function is memoized, the caching is invisible at the call site:
@memoized
def expensive_function(n):
    # expensive stuff
    return something

p = expensive_function(n)
q = expensive_function(n)
assert p is q
Do note that if the arguments to expensive_function are not hashable (a list is a common example), there will be no performance gain; it will still work, but act as if it weren't memoized.
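For completeness: since Python 3.2 the standard library has offered functools.lru_cache, which covers the same use case without a hand-rolled class (note that, unlike the snippet above, it raises TypeError on unhashable arguments instead of silently skipping the cache):

from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None caches every distinct argument tuple
def expensive_function(n):
    # stand-in for the expensive stuff
    return [n]

p = expensive_function(3)
q = expensive_function(3)
assert p is q  # the second call returned the cached object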
I am trying to produce a better answer to the frequently-asked question "How do I do function-local static variables in Python?" (1, 2, 3, ...) "Better" means completely encapsulated in a decorator, that can be used in any context where a function definition may appear. In particular, it must DTRT when applied to methods and nested functions; it must play nice with other decorators applied to the same function (in any order); it must accept arbitrary initializers for the static variables, and it must not modify the formal parameter list of the decorated function. Basically, if this were to be proposed for inclusion in the standard library, nobody should be able to object on quality-of-implementation grounds.
Ideal surface syntax would be
@static_vars(a=0, b=[])
def test():
    b.append(a)
    a += 1
    sys.stdout.write(repr(b) + "\n")
I would also accept
@static_vars(a=0, b=[])
def test():
    static.b.append(static.a)
    static.a += 1
    sys.stdout.write(repr(static.b) + "\n")
or similar, as long as the namespace for the static variables is not the name of the function! (I intend to use this in functions that may have very long names.)
A slightly more motivated example involves precompiled regular expressions that are only relevant to one function:
@static_vars(encode_re = re.compile(
    br'[\x00-\x20\x7F-\xFF]|'
    br'%(?!(?:[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}))'))
def encode_nonascii_and_percents(segment):
    segment = segment.encode("utf-8", "surrogateescape")
    return encode_re.sub(
        lambda m: "%{:02X}".format(ord(m.group(0))).encode("ascii"),
        segment).decode("ascii")
Now, I already have a mostly-working implementation. The decorator rewrites each function definition as if it had read like so (using the first example):
def _wrap_test_():
    a = 0
    b = []
    def test():
        nonlocal a, b
        b.append(a)
        a += 1
        sys.stdout.write(repr(b) + "\n")
    return test

test = _wrap_test_()
del _wrap_test_
It seems that the only way to accomplish this is to munge the AST. I have code that works for simple cases (see below) but I strongly suspect it is wrong in more complicated cases. For instance, I think it will break if applied to a method definition, and of course it also breaks in any situation where inspect.getsource() fails.
So the question is, first, what should I do to make it work in more cases, and second, is there a better way to define a decorator with the same black-box effects?
Note 1: I only care about Python 3.
Note 2: Please assume that I have read all of the proposed solutions in all of the linked questions and found all of them inadequate.
#! /usr/bin/python3

import ast
import functools
import inspect
import textwrap

def function_skeleton(name, args):
    """Return the AST of a function definition for a function named NAME,
       which takes keyword-only args ARGS, and does nothing.  Its
       .body field is guaranteed to be an empty array.
    """
    fn = ast.parse("def foo(*, {}): pass".format(",".join(args)))

    # The return value of ast.parse, as used here, is a Module object.
    # We want the function definition that should be the Module's
    # sole descendant.
    assert isinstance(fn, ast.Module)
    assert len(fn.body) == 1
    assert isinstance(fn.body[0], ast.FunctionDef)
    fn = fn.body[0]

    # Remove the 'pass' statement.
    assert len(fn.body) == 1
    assert isinstance(fn.body[0], ast.Pass)
    fn.body.clear()

    fn.name = name
    return fn

class static_vars:
    """Decorator which provides functions with static variables.
       Usage:
           @static_vars(foo=1, bar=2, ...)
           def fun():
               foo += 1
               return foo + bar

       The variables are implemented as upvalues defined by a wrapper
       function.

       Uses introspection to recompile the decorated function with its
       context changed, and therefore may not work in all cases.
    """
    def __init__(self, **variables):
        self._variables = variables

    def __call__(self, func):
        if func.__name__ in self._variables:
            raise ValueError(
                "function name {} may not be the same as a "
                "static variable name".format(func.__name__))

        fname = inspect.getsourcefile(func)
        lines, first_lineno = inspect.getsourcelines(func)
        mod = ast.parse(textwrap.dedent("".join(lines)), filename=fname)

        # The return value of ast.parse, as used here, is a Module
        # object.  Save that Module for use later and extract the
        # function definition that should be its sole descendant.
        assert isinstance(mod, ast.Module)
        assert len(mod.body) == 1
        assert isinstance(mod.body[0], ast.FunctionDef)
        inner_fn = mod.body[0]
        mod.body.clear()

        # Don't apply decorators twice.
        inner_fn.decorator_list.clear()

        # Fix up line numbers.  (Why the hell doesn't ast.parse take a
        # starting-line-number argument?)
        ast.increment_lineno(inner_fn, first_lineno - inner_fn.lineno)

        # Inject a 'nonlocal' statement declaring the static variables.
        svars = sorted(self._variables.keys())
        inner_fn.body.insert(0, ast.Nonlocal(svars))

        # Synthesize the wrapper function, which will take the static
        # variables as arguments.
        outer_fn_name = ("_static_vars_wrapper_" +
                         inner_fn.name + "_" +
                         hex(id(self))[2:])
        outer_fn = function_skeleton(outer_fn_name, svars)
        outer_fn.body.append(inner_fn)
        outer_fn.body.append(
            ast.Return(value=ast.Name(id=inner_fn.name, ctx=ast.Load())))
        mod.body.append(outer_fn)
        ast.fix_missing_locations(mod)

        # The new function definition must be evaluated in the same context
        # as the original one.  FIXME: supply locals if appropriate.
        context = func.__globals__
        exec(compile(mod, filename="<static-vars>", mode="exec"),
             context)

        # extract the function we just defined ...
        outer_fn = context[outer_fn_name]
        del context[outer_fn_name]

        # ... and call it, supplying the static vars' initial values; this
        # returns the adjusted inner function.
        adjusted_fn = outer_fn(**self._variables)
        functools.update_wrapper(adjusted_fn, func)
        return adjusted_fn

if __name__ == "__main__":
    import sys

    @static_vars(a=0, b=[])
    def test():
        b.append(a)
        a += 1
        sys.stdout.write(repr(b) + "\n")

    test()
    test()
    test()
    test()
Isn't this what classes are for?
import sys

class test_class:
    a = 0
    b = []

    def test(self):
        test_class.b.append(test_class.a)
        test_class.a += 1
        sys.stdout.write(repr(test_class.b) + "\n")

t = test_class()
t.test()
t.test()
[0]
[0, 1]
Here is a version of your regexp encoder:
import re

class encode:
    encode_re = re.compile(
        br'[\x00-\x20\x7F-\xFF]|'
        br'%(?!(?:[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4}))')

    def encode_nonascii_and_percents(self, segment):
        segment = segment.encode("utf-8", "surrogateescape")
        return encode.encode_re.sub(
            lambda m: "%{:02X}".format(ord(m.group(0))).encode("ascii"),
            segment).decode("ascii")

e = encode()
print(e.encode_nonascii_and_percents('foo bar'))
foo%20bar
There is always the singleton class.
Is there a simple, elegant way to define Singletons in Python?
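For instance, here is a decorator-based singleton sketch in the spirit of the linked question, applied to the test() example above; the names are illustrative:

def singleton(cls):
    # replace the class by its single instance at definition time
    return cls()

@singleton
class test:
    def __init__(self):
        self.a = 0
        self.b = []

    def __call__(self):
        self.b.append(self.a)
        self.a += 1
        print(repr(self.b))

test()  # [0]
test()  # [0, 1]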
I would like to create temporary variables visible in a limited scope.
It seems likely to me that you can do this with a "with" statement, and I would think there is a construct that makes it easy to do, but I cannot seem to find it.
I would like something like the following (but it does not work this way of course):
pronunciation = "E_0 g z #_1 m p l"
# ...
with pronunciation.split() as phonemes:
if len(phonemes) > 2 or phonemes[0].startswith('E'):
condition = 1
elif len(phonemes) < 3 and phonemes[-1] == '9r':
condition = 2
So is there a simple way to make this work, using built-ins?
Thanks!
Python creates local variables with function scope: once a name is bound, it stays alive until the end of the function.
If you really want to limit scope, then del the variable when you want it explicitly discarded, or create a separate function to act as a container for a more limited scope.
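A minimal sketch of the del route, reusing the question's example:

pronunciation = "E_0 g z #_1 m p l"

phonemes = pronunciation.split()
if len(phonemes) > 2 or phonemes[0].startswith('E'):
    condition = 1
elif len(phonemes) < 3 and phonemes[-1] == '9r':
    condition = 2

del phonemes  # explicitly discard the temporary; using it afterwards raises NameError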
You can define a function:
def process_pronunciation(pronunciation):
    phonemes = pronunciation.split()
    if len(phonemes) > 2 or phonemes[0].startswith('E'):
        condition = 1
    elif len(phonemes) < 3 and phonemes[-1] == '9r':
        condition = 2
    return condition
When you call the function, the local variable phonemes won't be available in the global namespace.
pronunciation = "E_0 g z #_1 m p l"
condition = process_phonemes(pronunciation)
You could do it with with, but I don't think it's worth the trouble. Basically, in a Python function you have two scopes - global or local - and that's it. If you want a symbol to have a lifespan shorter than the function, you'll have to delete it afterwards using del. You could define your own context manager to make this happen:
class TempVar:
    def __init__(self, loc, name, val):
        self.loc = loc
        self.name = name
        self.val = val

    def __enter__(self):
        if self.name in self.loc:
            self.old = self.loc[self.name]
        self.loc[self.name] = self.val

    def __exit__(self, *exc):
        if hasattr(self, "old"):
            self.loc[self.name] = self.old
        else:
            del self.loc[self.name]
then you can use it to get a temporary variable:
with TempVar(locals(), "tempVar", 42):
    print(tempVar)
The way it works is that it modifies the dict containing the local variables (supplied to the constructor via locals()) on entry, and restores it on exit. Please note that this relies on modifying the result returned by locals() actually modifying the local namespace - the specification does NOT guarantee this behaviour.
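A quick way to see that caveat in CPython (inside a function body, the write to locals() is silently discarded):

def demo():
    locals()["tempVar"] = 42  # writes to a snapshot of the local namespace ...
    print(tempVar)            # ... so this raises NameError in CPython

demo()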
Another (and safer) alternative that was pointed out is to define a separate function, which has its own scope. Remember it's perfectly legal to nest functions. For example:
def outer():
    def inner(tempVar):
        # here tempVar is in scope
        print(tempVar)

    inner(tempVar=42)
    # here tempVar is out of scope
A with statement does not have its own scope; it uses the surrounding scope. If the with statement is directly at module level, it uses the global namespace; if it is inside a function, it uses that function's scope.
If you want the statements inside a with block to run in their own local scope, one possible way is to move the logic into a function; that way it runs in its own scope rather than the scope surrounding the with.
Example -
def function_for_with(phonemes):
    # Do something.
    ...

phonemes = pronunciation.split()
function_for_with(phonemes)
Please note, the above will not stop phonemes from being defined in the surrounding scope.
If you want that as well (moving phonemes into its own scope), you can move the split itself inside a function. Example -
def function_with(pronunciation):
    phonemes = pronunciation.split()
    # do stuff

pronunciation = "E_0 g z #_1 m p l"
function_with(pronunciation)
Expanding on @skyking's answer, here's an even more magical implementation of the same idea that reads almost exactly like what you wrote. Introducing: the with var statement! [1]
class var:
    def __init__(self, value):
        import inspect
        self.scope = inspect.currentframe().f_back.f_locals
        self.old_vars = set(self.scope.keys())
        self.value = value

    def __enter__(self):
        return self.value

    def __exit__(self, type, value, traceback):
        for name in set(self.scope.keys()) - self.old_vars:
            del self.scope[name]
### Usage:
line = 'a b c'

with var (line.split()) as words:
    # Prints "['a', 'b', 'c']"
    print(words)

# Causes a NameError
print(words)
It does all the nasty extracting of local variables and names for you! How swell. If you space it quirkily like I did and hide the definition in a from boring_stuff import * statement, you can even pretend var is a keyword to all of your confused co-workers.
[1] If you actually use this, the ghost of a dead parrot will probably haunt you forever. The other answers provide much saner solutions; this one is more of a joke.
I'm attempting to create broadcast variables from within Python methods (trying to abstract some utility methods I'm creating that rely on distributed operations). However, I can't seem to access the broadcast variables from within the Spark workers.
Let's say I have this setup:
def main():
    sc = SparkContext()
    SomeMethod(sc)

def SomeMethod(sc):
    someValue = rand()
    V = sc.broadcast(someValue)
    A = sc.parallelize().map(worker)

def worker(element):
    element *= V.value  ### NameError: global name 'V' is not defined ###
However, if I instead eliminate the SomeMethod() middleman, it works fine.
def main():
    sc = SparkContext()
    someValue = rand()
    V = sc.broadcast(someValue)
    A = sc.parallelize().map(worker)

def worker(element):
    element *= V.value  # works just fine
I'd rather not have to put all my Spark logic in the main method, if I can. Is there any way to broadcast variables from within local functions and have them be globally visible to the Spark workers?
Alternatively, what would be a good design pattern for this kind of situation--e.g., I want to write a method specifically for Spark which is self-contained and performs a specific function I'd like to re-use?
I am not sure I completely understood the question, but if you need the V object inside the worker function, then you definitely should pass it as a parameter; otherwise the method is not really self-contained:
def worker(V, element):
    element *= V.value
Now, in order to use it in map you need a partial, so that map only sees a one-parameter function:
from functools import partial

def SomeMethod(sc):
    someValue = rand()
    V = sc.broadcast(someValue)
    A = sc.parallelize().map(partial(worker, V))
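For reference, a hedged end-to-end sketch under those assumptions; the RDD contents and the random value are stand-ins, and worker returns its result so there is something to collect:

from functools import partial
from random import random as rand
from pyspark import SparkContext

def worker(V, element):
    # V is the broadcast handle; .value reads the broadcast payload
    return element * V.value

def SomeMethod(sc):
    V = sc.broadcast(rand())
    # partial(worker, V) binds V positionally, leaving a one-argument
    # function for map to call with each element
    return sc.parallelize(range(10)).map(partial(worker, V)).collect()

if __name__ == "__main__":
    print(SomeMethod(SparkContext()))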