ProcessPoolExecutor and global variables - python

I have a question regarding global variables and using different processes.
In my main Python script, a main function calls Initialize() to initialize a global variable in Metrics.py. I've also created a getter function to retrieve this variable.
Main.py:
from Metrics import *
import concurrent.futures
import pprint

def doMultiProcess(Files):
    result = {}
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(ProcessFile, file) for file in Files]
        for f in concurrent.futures.as_completed(futures):
            # f.result() needs to be evaluated to catch any exception
            try:
                filename, distribution = f.result()
                result[filename] = distribution
            except Exception as e:
                pprint.pprint(e)
    return result

Files = ["A.txt", "B.txt", "C.txt"]

def main():
    Initialize()
    results = doMultiProcess(Files)

if __name__ == '__main__':
    main()
Metrics.py:
Keywords = ['pragma', 'contract']

def Initialize():
    global Keywords
    Keywords = ['pragma', 'contract', 'function', 'is']

def GetList():
    global Keywords  # I believe this is not necessary.
    return Keywords

def ProcessFile(filename):
    # Read the data and extract all the words from the file, then determine the
    # frequency distribution of the words using nltk.FreqDist; stored in: freqDistTokens
    distribution = {keyword: freqDistTokens[keyword] for keyword in Keywords}
    return (filename, distribution)
I hope I've simplified the example enough without leaving out important information. Now, what I don't understand is why the worker processes keep working with the initial value of Keywords, which contains only 'pragma' and 'contract'. I call Initialize() before actually running the processes, so the global variable should be set to something different than the initial value, right? What am I missing here?
I've worked around this by actually supplying the Keywords list to the process using the GetList() function, but I would like to understand why this is happening.
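For reference, here is a minimal sketch of that workaround (it assumes ProcessFile is changed to accept the keyword list as a second parameter, which is not shown in the original code):

def doMultiProcess(Files):
    result = {}
    keywords = GetList()  # evaluated in the parent process, after Initialize()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Each worker receives the up-to-date list as an ordinary argument
        # instead of relying on the module-level global in its own process.
        futures = [executor.submit(ProcessFile, file, keywords) for file in Files]
        ...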

Related

Python: Is there a way to set up an environment before calling a function and restore the previous environment afterwards?

My scenario:
I have a variable holding a link, e.g. REMOTE_API = "http://<site>/api/a/b/c".
This link stays the same all the time, so it can be thought of as a constant.
It is used in many parts of the program.
But there are a few parts of the program where the link needs to be changed, e.g. to REMOTE_API = "http://<site>/api/<user_name>/a/b/c", but only if some condition is met. This condition is dictated by a config file that may change without notice.
Is there a way to change the variable's default before running a function and switch it back at the end of the function?
e.g.
@prepare_env(<if condition is met>)
def func():
    <...>
    call_api(REMOTE_API)  # "http://<site>/api/<user_name>/a/b/c"
    <...>

if __name__ == "__main__":
    call_api_with_default(REMOTE_API)  # REMOTE_API = "http://<site>/api/a/b/c"
    func()  # condition is met: REMOTE_API = "http://<site>/api/<user_name>/a/b/c"
    an_other_call_with_default(REMOTE_API)  # REMOTE_API = "http://<site>/api/a/b/c"
You may write a function which takes a function and calls it, setting and restoring the environment variable as needed. E.g.:
#!/usr/bin/env python3
# content of env.py
import os

def call_with_env_var(f, var_name, var_value):
    old_value = os.environ[var_name]
    os.environ[var_name] = var_value
    try:
        return f()  # pass f's result back to the caller
    finally:
        os.environ[var_name] = old_value  # restore even if f() raises

def print_var(name):
    print(f'value of {name}: {os.environ[name]}')

if __name__ == '__main__':
    print_var('HOME')
    call_with_env_var(
        lambda: print_var('HOME'),
        'HOME',
        'xyz'
    )
    print_var('HOME')
 
$ ./env.py
value of HOME: /home/etuardu
value of HOME: xyz
value of HOME: /home/etuardu
$ echo $HOME
/home/etuardu
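If you prefer a with-block over a wrapper function, the same set-and-restore logic can be written as a context manager with contextlib. This is a sketch under the assumption that restoring should also handle a previously unset variable; env_var is a made-up name:

import os
from contextlib import contextmanager

@contextmanager
def env_var(name, value):
    old = os.environ.get(name)  # None means the variable was not set before
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            del os.environ[name]
        else:
            os.environ[name] = old

with env_var('HOME', 'xyz'):
    print(os.environ['HOME'])  # prints: xyz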

How to pass a string as an object to getattr in Python

I have a number of functions that need to get called from various imported files.
The functions are formatted along the lines of this:
a.foo
b.foo2
a.bar.foo4
a.c.d.foo5
and they are passed in to my script as a raw string.
I'm looking for a clean way to run these, with arguments, and get the return values.
Right now I have a messy system of splitting the strings and then feeding them to the right getattr call, but this feels clumsy and is very un-scalable. Is there a way I can just pass the object portion of getattr as a string? Or some other way of doing this?
import a, b, a.bar, a.c.d

if "." in raw_script:
    split_script = raw_script.split(".")
    if 'a' in raw_script:
        if 'a.bar' in raw_script:
            out = getattr(a.bar, split_script[-1])(args)
        if 'a.c.d' in raw_script:
            out = getattr(a.c.d, split_script[-1])(args)
        else:
            out = getattr(a, split_script[-1])(args)
    elif 'b' in raw_script:
        out = getattr(b, split_script[-1])(args)
It's hard to tell from your question, but it sounds like you have a command line tool you run as my-tool <function> [options]. You could use importlib like this, avoiding most of the getattr calls:
import importlib

def run_function(name, args):
    module, function = name.rsplit('.', 1)
    module = importlib.import_module(module)
    function = getattr(module, function)
    return function(*args)

if __name__ == '__main__':
    # Elided: retrieve function name and args from command line
    run_function(name, args)
Try this:
def lookup(path):
    obj = globals()
    for element in path.split('.'):
        try:
            obj = obj[element]  # works while obj is still the globals() dict
        except (KeyError, TypeError):
            # modules and other objects are not subscriptable,
            # so fall back to attribute access
            obj = getattr(obj, element)
    return obj
Note that this will handle a path starting with ANY global name, not just your a and b imported modules. If there are any possible concerns with untrusted input being provided to the function, you should start with a dict containing the allowed starting points, not the entire globals dict.
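For illustration, a quick usage sketch with a standard-library module (the names are only examples). Once the traversal moves past the globals() dict, the attribute-access fallback does the work:

import os.path

# lookup() as defined above
join = lookup('os.path.join')  # globals()['os'], then .path, then .join
print(join('a', 'b'))          # prints: a/b (on POSIX)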

Pythonic way to dynamically load and call modules

I have working code, but I would like to know what the proper Pythonic approach is.
My goal: have a directory of "plugins" (one module per plugin) which is dynamically loaded when the program runs. All of the modules will have a function defined which acts as an "entrypoint".
The aim is to have a script which is easily extended with some extra functionality.
What I have come up with is the following (reporter = plugin in this case).
import os
import importlib
import reporters  # Package where plugins (reporters) will reside

def find_reporters():
    # Find all modules in directory "reporters" which look like "*_reporter.py"
    reporters = [rep.rsplit('.py', 1)[0] for rep in os.listdir('reporters') if rep.endswith('_reporter.py')]
    functions = []
    for reporter in reporters:
        module = importlib.import_module('.' + reporter, 'reporters')
        try:
            func = getattr(module, 'entry_function')  # Read the entry_function if present
            functions.append(func)  # Add the function to the list to be returned
        except AttributeError as e:
            print(e)
    return functions

def main():
    funcs = find_reporters()
    for func in funcs:
        func()  # Execute all collected functions
I am not too seasoned in Python, so is this an acceptable solution?
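For illustration, a minimal plugin that this loader would pick up could look like the following (the file name and body are made up for the example):

# reporters/console_reporter.py -- a hypothetical plugin
def entry_function():
    # the "entrypoint" the loader retrieves via getattr(module, 'entry_function')
    print('console reporter running')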

How to mock open differently depending on the parameters passed to open()

My question is how to mock open in Python such that it reacts differently depending on the argument open() is called with. These are some different scenarios that should be possible:
open a mocked file and read preset contents (the basic scenario);
open two mocked files and have them return different values for the read() method. The order in which the files are opened/read should not influence the results.
Furthermore, if I call open('actual_file.txt') to open an actual file, I want the actual file to be opened, and not a MagicMock with mocked behavior. Or if I just don't want access to a certain file mocked, but do want other files mocked, this should be possible.
I know about this question: Python mock builtin 'open' in a class using two different files.
But that answer only partially covers the requirements: the part about order-independent results is not included, and it does not specify how to mock only some calls while allowing other calls to go through to the actual files (default behavior).
A bit late, but I just recently happened upon the same need, so I'd like to share my solution, based upon this answer from the referred-to question:
import pytest
from unittest.mock import mock_open
from functools import partial
from pathlib import Path

# Reference to the real open(), captured before the fixture patches builtins.open;
# without it, the do_not_mock branch below would recurse into the mock itself.
_real_open = open

mock_file_data = {
    "file1.txt": "some text 1",
    "file2.txt": "some text 2",
    # ... and so on ...
}

do_not_mock = {
    # If you need an exact match (see the note in mocked_file),
    # you should replace these with the correct Path() invocations
    "notmocked1.txt",
    "notmocked2.txt",
    # ... and so on ...
}

# Ref: https://stackoverflow.com/a/38618056/149900
def mocked_file(m, fn, *args, **kwargs):
    m.opened_file = Path(fn)
    fn = Path(fn).name  # If you need an exact path match, remove this line
    if fn in do_not_mock:
        return _real_open(fn, *args, **kwargs)
    if fn not in mock_file_data:
        raise FileNotFoundError
    data = mock_file_data[fn]
    file_obj = mock_open(read_data=data).return_value
    file_obj.__iter__.return_value = data.splitlines(True)
    return file_obj

def assert_opened(m, fn):
    fn = Path(fn)
    assert m.opened_file == fn

@pytest.fixture()
def mocked_open(mocker):
    m = mocker.patch("builtins.open")
    m.side_effect = partial(mocked_file, m)
    m.assert_opened = partial(assert_opened, m)
    return m

def test_something(mocked_open):
    ...
    # Something that should NOT invoke open()
    mocked_open.assert_not_called()
    ...
    # Something that SHOULD invoke open()
    mocked_open.assert_called_once()
    mocked_open.assert_opened("file1.txt")
    # Depending on how the tested unit handles "naked" filenames,
    # you might have to change the arg to:
    #     Path.cwd() / "file1.txt"
    # ... and so on ...
Do note that (1) I am using Python 3, and (2) I am using pytest.
This can be done by following the approach in the other question's accepted answer (Python mock builtin 'open' in a class using two different files) with a few alterations.
First off, instead of just specifying a side_effect that can be popped, we need to make sure the side_effect returns the correct mocked file depending on the parameters used in the open call.
Then, if the file we wish to open is not among the files we wish to mock, we call the original open() on that file instead of returning any mocked behavior.
The code below demonstrates how this can be achieved in a clean, repeatable way. I, for instance, have this code inside a file that provides some utility functions to make testing easier.
from mock import MagicMock
from mock import patch
import __builtin__
import sys

# Reference to the original open function.
g__test_utils__original_open = open
g__test_utils__file_spec = None

def create_file_mock(read_data):
    # Create file_spec such as in mock.mock_open
    global g__test_utils__file_spec
    if g__test_utils__file_spec is None:
        # set on first use
        if sys.version_info[0] == 3:
            import _io
            g__test_utils__file_spec = list(set(dir(_io.TextIOWrapper)).union(set(dir(_io.BytesIO))))
        else:
            g__test_utils__file_spec = file
    file_handle = MagicMock(spec=g__test_utils__file_spec)
    file_handle.write.return_value = None
    file_handle.__enter__.return_value = file_handle
    file_handle.read.return_value = read_data
    return file_handle

def flexible_mock_open(file_map):
    def flexible_side_effect(file_name):
        if file_name in file_map:
            return file_map[file_name]
        else:
            global g__test_utils__original_open
            return g__test_utils__original_open(file_name)

    global g__test_utils__original_open
    return_value = MagicMock(name='open', spec=g__test_utils__original_open)
    return_value.side_effect = flexible_side_effect
    return return_value

if __name__ == "__main__":
    a_mock = create_file_mock(read_data="a mock - content")
    b_mock = create_file_mock(read_data="b mock - different content")
    mocked_files = {
        'a': a_mock,
        'b': b_mock,
    }
    with patch.object(__builtin__, 'open', flexible_mock_open(mocked_files)):
        with open('a') as file_handle:
            print file_handle.read()  # prints a mock - content
        with open('b') as file_handle:
            print file_handle.read()  # prints b mock - different content
        with open('actual_file.txt') as file_handle:
            print file_handle.read()  # prints actual file contents
This borrows some code straight from mock.py (Python 2.7) for creating the file_spec.
Side note: if anybody can help me hide these globals, that would be very helpful.

Preprocessing function text at runtime before compilation

I decided to try to preprocess a function's text before its compilation into byte-code and subsequent execution. This is merely for training; I can hardly imagine situations where it would be a satisfactory solution. I faced one problem which I wanted to solve in this way, but eventually a better way was found. So this is just for training and to learn something new, not for real usage.
Assume we have a function whose source code we want to modify quite a bit before compilation:
def f():
    1;a()
    print('Some statements 1')
    1;a()
    print('Some statements 2')
Let us, for example, mark some lines of it with 1;, for them to be sometimes commented out and sometimes not. I take this just as an example; the modifications to the function may be different.
To comment out these lines I made a decorator. The whole code is below:
from __future__ import print_function

def a():
    print('a()')

def comment_1(s):
    lines = s.split('\n')
    return '\n'.join(line.replace(';', '#;', 1) if line.strip().startswith('1;') else line for line in lines)

def remove_1(f):
    import inspect
    source = inspect.getsource(f)
    new_source = comment_1(source)
    with open('temp.py', 'w') as file:
        file.write(new_source)
    from temp import f as f_new
    return f_new

def f():
    1;a()
    print('Some statements 1')
    1;a()
    print('Some statements 2')

f = remove_1(f)  # If the decorator @remove_1 is used above f(), inspect.getsource includes the decorator line in the source.
f()
I used inspect.getsource to retrieve the code of f. Then I did some text processing (in this case, commenting out lines starting with 1;). After that I saved the result to the module temp.py, which is then imported. And then the function f is decorated in the main module.
The output when the decorator is applied is this:
Some statements 1
Some statements 2
and when it is NOT applied is this:
a()
Some statements 1
a()
Some statements 2
What I don't like is that I have to use the hard drive to load the compiled function. Can it be done without writing to the temporary module temp.py and importing from it?
The second question is about placing the decorator above f: @remove_1. When I do this, inspect.getsource returns f's text with this decorator. The decorator could be deleted from f's text manually, but that would be quite dangerous, as there may be more than one decorator applied. So I resorted to the old-style decoration syntax f = remove_1(f), which does the job. But still, is it possible to allow the normal decoration technique with @remove_1?
One can avoid creating a temporary file by invoking the exec statement on the source. (You can also explicitly call compile prior to exec if you want additional control over compilation, but exec will do the compilation for you, so it's not necessary.) Correctly calling exec has the additional benefit that the function will work correctly if it accesses global variables from the namespace of its module.
The problem described in the second question can be resolved by temporarily blocking the decorator while it is running. That way the decorator remains, along with all the other ones, but is a no-op.
Here is the updated source.
from __future__ import print_function
import sys

def a():
    print('a()')

def comment_1(s):
    lines = s.split('\n')
    return '\n'.join(line.replace(';', '#;', 1) if line.strip().startswith('1;') else line for line in lines)

_blocked = False
def remove_1(f):
    global _blocked
    if _blocked:
        return f
    import inspect
    source = inspect.getsource(f)
    new_source = comment_1(source)
    env = sys.modules[f.__module__].__dict__
    _blocked = True
    try:
        exec new_source in env
    finally:
        _blocked = False
    return env[f.__name__]

@remove_1
def f():
    1;a()
    print('Some statements 1')
    1;a()
    print('Some statements 2')

f()
If you would rather not modify the module's namespace, a variant is to execute the new source in a copy of it:
def remove_1(f):
    import inspect
    source = inspect.getsource(f)
    new_source = comment_1(source)
    env = sys.modules[f.__module__].__dict__.copy()
    exec new_source in env
    return env[f.__name__]
I'll leave here a modified version of the solution given in the answer by user4815162342. It uses the ast module to delete some parts of f, as was suggested in a comment to the question. To make it I relied heavily on the information in this article.
This implementation deletes all standalone-expression calls to a.
from __future__ import print_function
import sys
import ast
import inspect

def a():
    print('a() is called')

_blocked = False
def remove_1(f):
    global _blocked
    if _blocked:
        return f
    source = inspect.getsource(f)
    tree = ast.parse(source)  # get the AST of f

    class Transformer(ast.NodeTransformer):
        '''Deletes all top-level expressions that are calls to a function named a.'''
        def visit_Expr(self, node):  # visit all expression statements
            try:
                if node.value.func.id == 'a':  # the expression is a call to a function named a
                    return None  # delete it
            except AttributeError:
                # not a call to a plain name (e.g. a literal or an attribute call)
                pass
            return node  # return the node unchanged

    transformer = Transformer()
    new_tree = transformer.visit(tree)
    f_new_compiled = compile(new_tree, '<string>', 'exec')
    env = sys.modules[f.__module__].__dict__
    _blocked = True
    try:
        exec(f_new_compiled, env)
    finally:
        _blocked = False
    return env[f.__name__]

@remove_1
def f():
    a();a()
    print('Some statements 1')
    a()
    print('Some statements 2')

f()
The output is:
Some statements 1
Some statements 2
