Pythonic way to dynamically load and call modules

I have working code, but I would like to know the proper Pythonic approach.
My goal: have a directory of "plugins" (one module per plugin), which is dynamically loaded when the program runs. All of the modules will have a function defined, which will act as an "entrypoint".
The aim is to have a script which is easily extended by some extra functionality.
What I have come up with is the following (reporter = plugin in this case):
import os
import importlib

import reporters  # Package where plugins (reporters) will reside


def find_reporters():
    # Find all modules in directory "reporters" which look like "*_reporter.py"
    reporters = [rep.rsplit('.py', 1)[0] for rep in os.listdir('reporters') if rep.endswith('_reporter.py')]
    functions = []
    for reporter in reporters:
        module = importlib.import_module('.' + reporter, 'reporters')
        try:
            func = getattr(module, 'entry_function')  # Read the entry_function if present
            functions.append(func)                    # Add the function to the list to be returned
        except AttributeError as e:
            print(e)
    return functions


def main():
    funcs = find_reporters()
    for func in funcs:
        func()  # Execute all collected functions
I am not too seasoned in Python, so is this an acceptable solution?
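For comparison, here is one possible refinement of the same idea as a minimal sketch: it discovers the plugins through the package's own search path with pkgutil.iter_modules instead of a hard-coded os.listdir('reporters'), so discovery does not depend on the current working directory. The entry_function convention is kept from the code above.
import importlib
import pkgutil

import reporters  # the plugin package from the question


def find_reporters():
    functions = []
    # Walk the package's __path__ rather than listing a relative directory.
    for _, mod_name, _ in pkgutil.iter_modules(reporters.__path__):
        if not mod_name.endswith('_reporter'):
            continue
        module = importlib.import_module('.' + mod_name, 'reporters')
        entry = getattr(module, 'entry_function', None)  # None if the module has no entry point
        if entry is not None:
            functions.append(entry)
    return functions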

Related

Is it possible to mock os.scandir and its attributes?

for entry in os.scandir(document_dir):
    if os.path.isdir(entry):
        pass  # some code goes here
    else:
        # else the file needs to be in a folder
        file_path = entry.path.replace(os.sep, '/')
I am having trouble mocking os.scandir and the path attribute within the else statement. I am not able to mock the mock object's property I created in my unit tests.
with patch("os.scandir") as mock_scandir:
# mock_scandir.return_value = ["docs.json", ]
# mock_scandir.side_effect = ["docs.json", ]
# mock_scandir.return_value.path = PropertyMock(return_value="docs.json")
These are all the options I've tried. Any help is greatly appreciated.
It depends on what you really need to mock. The problem is that os.scandir returns entries of type os.DirEntry. One possibility is to use your own mock DirEntry and implement only the methods that you need (in your example, only path). For your example, you also have to mock os.path.isdir. Here is a self-contained example of how you can do this:
import os
from unittest.mock import patch


def get_paths(document_dir):
    # example function containing your code
    paths = []
    for entry in os.scandir(document_dir):
        if os.path.isdir(entry):
            pass
        else:
            # else the file needs to be in a folder
            file_path = entry.path.replace(os.sep, '/')
            paths.append(file_path)
    return paths


class DirEntry:
    # Minimal stand-in for os.DirEntry; only the "path" attribute is needed here.
    def __init__(self, path):
        self.path = path


@patch("os.scandir")
@patch("os.path.isdir")
def test_sut(mock_isdir, mock_scandir):
    mock_isdir.return_value = False
    mock_scandir.return_value = [DirEntry("docs.json")]
    assert get_paths("anydir") == ["docs.json"]
Depending on your actual code, you may have to do more.
If you want to patch more file system functions, you may consider using pyfakefs instead, which patches the whole file system. This will be overkill for a single test, but can be handy for a test suite relying on file system functions.
Disclaimer: I'm a contributor to pyfakefs.
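For completeness, here is roughly what the pyfakefs route could look like as a pytest test. This is a sketch only, assuming pyfakefs is installed (it provides the fs fixture and fakes os.scandir, os.path.isdir and friends) and using the get_paths function defined above with a POSIX-style path:
def test_get_paths_with_pyfakefs(fs):
    # "fs" is the pyfakefs fixture; everything below runs against a fake file system.
    fs.create_file("/data/docs.json")
    assert get_paths("/data") == ["/data/docs.json"]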

How to dynamically reload a function in Python?

I'm trying to create a process that dynamically watches Jupyter notebooks, compiles them on modification, and imports them into my current file; however, I can't seem to execute the updated code. It only executes the first version that was loaded.
There's a file called producer.py that calls this function repeatedly:
import fs.fs_util as fs_util

while True:
    fs_util.update_feature_list()
In fs_util.py I do the following:
from fs.feature import Feature
import inspect
from importlib import reload
import os


def is_subclass_of_feature(o):
    return inspect.isclass(o) and issubclass(o, Feature) and o is not Feature


def get_instances_of_features(name):
    module = __import__(COMPILED_MODULE, fromlist=[name])
    module = reload(module)
    feature_members = getattr(module, name)
    all_features = inspect.getmembers(feature_members, predicate=is_subclass_of_feature)
    return [f[1]() for f in all_features]
This function is called by:
def update_feature_list(name):
    os.system("jupyter nbconvert --to script {}{} --output {}{}"
              .format(PATH + "/" + s3.OUTPUT_PATH, name + JUPYTER_EXTENSION, PATH + "/" + COMPILED_PATH, name))
    features = get_instances_of_features(name)
    for f in features:
        try:
            feature = f.create_feature()
        except Exception as e:
            print(e)
There is other irrelevant code that checks whether a file has been modified, etc.
I can tell the file is being reloaded correctly, because when I use inspect.getsource(f.create_feature) on the class it displays the updated source code; however, during execution it returns older values. I've verified this by changing print statements as well as comparing the return values.
Also, for some more context, here is the file I'm trying to import:
from fs.feature import Feature


class SubFeature(Feature):
    def __init__(self):
        Feature.__init__(self)

    def create_feature(self):
        return "hello"
I was wondering what I was doing incorrectly?
So I found out what I was doing wrong.
When calling reload, I was reloading the module I had newly imported, which was fairly idiotic I suppose. The correct solution (in my case) was to reload the module from sys.modules, so it would be something like reload(sys.modules[COMPILED_MODULE + "." + name]).
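In other words, the fix looks roughly like this (a sketch based on the question's code; COMPILED_MODULE and name come from there):
import sys
from importlib import reload

qualified_name = COMPILED_MODULE + "." + name
__import__(qualified_name)                    # make sure the submodule has been imported once
module = reload(sys.modules[qualified_name])  # reload the submodule itself, not the package
feature_members = getattr(module, name)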

Call a function from a different file where the file name and function name are read from a list

I have multiple functions stored in different files; both file names and function names are stored in lists. Is there any option to call the required function without conditional statements?
For example, file1 has functions function11 and function12:
def function11():
    pass


def function12():
    pass
file2 has functions function21 and function22:
def function21():
    pass


def function22():
    pass
and I have the lists:
file_name = ["file1", "file2", "file1"]
function_name = ["function12", "function22", "function12"]
I will get the list index from a different function; based on that, I need to call the function and get the output.
If the other function will give you a list index directly, then you don't need to deal with the function names as strings. Instead, directly store (without calling) the functions in the list:
import file1, file2
functions = [file1.function12, file2.function22, file1.function12]
And then call them once you have the index:
functions[index]()
There are ways to do what is called "reflection" in Python and get from the string to a matching-named function. But they solve a problem that is more advanced than what you describe, and they are more difficult (especially if you also have to work with the module names).
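For reference, the reflection route would look roughly like this (a sketch only; file_name, function_name and index are the names from the question):
import importlib

# Resolve the module and function from their string names at run time.
module = importlib.import_module(file_name[index])
func = getattr(module, function_name[index])
func()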
If you have a "whitelist" of functions and modules that are allowed to be called from the config file, but still need to find them by string, you can explicitly create the mapping with a dict:
allowed_functions = {
    'file1': {
        'function11': file1.function11,
        'function12': file1.function12
    },
    'file2': {
        'function21': file2.function21,
        'function22': file2.function22
    }
}
And then invoke the function:
try:
    func = allowed_functions[module_name][function_name]
except KeyError:
    raise ValueError("this function/module name is not allowed")
else:
    func()
The most advanced approach is needed if you have to load code from a "plugin" module created by the author. You can use the standard library importlib package to find the file from its path string and import it dynamically as a module. It looks something like:
import os
from importlib.util import spec_from_file_location, module_from_spec


# Look for the file at the specified path, figure out the module name
# from the base file name, import it and make a module object.
def load_module(path):
    folder, filename = os.path.split(path)
    basename, extension = os.path.splitext(filename)
    spec = spec_from_file_location(basename, path)
    module = module_from_spec(spec)
    spec.loader.exec_module(module)
    assert module.__name__ == basename
    return module
This is still unsafe, in the sense that it can look anywhere on the file system for the module. Better if you specify the folder yourself, and only allow a filename to be used in the config file; but then you still have to protect against hacking the path by using things like ".." and "/" in the "filename".
(I have a project that does something like this. It chooses the paths from a whitelist that is also under the user's control, so I have to warn my users not to trust the path-whitelist file from each other. I also search the directories for modules, and then make a whitelist of plugins that may be used, based only on plugins that are in the directory - so no funny games with "..". And I'm still worried I forgot something.)
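One illustrative way to pin lookups to a fixed folder might look like this; PLUGIN_DIR and the checks are my own sketch under that assumption, not part of the project described above:
import os

PLUGIN_DIR = os.path.abspath("plugins")  # hypothetical fixed plugin folder


def safe_plugin_path(filename):
    # Reject anything that is not a bare file name, so "../evil.py" or an
    # absolute path coming from the config file cannot escape PLUGIN_DIR.
    if os.path.basename(filename) != filename:
        raise ValueError("plugin must be a bare file name")
    path = os.path.join(PLUGIN_DIR, filename)
    if not os.path.isfile(path):
        raise ValueError("no such plugin: " + filename)
    return path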
Once you have a module name, you can get a function from it by name like:
dynamic_module = load_module(some_path)
try:
    func = getattr(dynamic_module, function_name)
except AttributeError:
    raise ValueError("function not in module")
At any rate, there is no reason to eval anything, or generate and import code based on user input. That is most unsafe of all.
Another alternative. This is not much safer than an eval(), however.
Someone with access to the lists you read from the config file could inject malicious code into the lists you import.
I.e.
'from subprocess import call; subprocess.call(["rm", "-rf", "./*" stdout=/dev/null, stderr=/dev/null, shell=True)'
Code:
import re

# You must first create a directory named "test_module"
# You can do this with code if needed.
# Python recognizes a "module" as a module by the existence of an __init__.py
# It will load that __init__.py at the "import" command, and you can access the methods it imports

m = ["os", "sys", "subprocess"]  # Modules to import from
f = ["getcwd", "exit", "call; call('do', '---terrible-things')"]  # Methods to import

# Create an __init__.py
with open("./test_module/__init__.py", "w") as FH:
    for count in range(0, len(m), 1):
        # Writes "from module import method" to __init__.py
        line = "from {} import {}\n".format(m[count], f[count])
        # !!!! SANITIZE THE LINE !!!!!
        if not re.match("^from [a-zA-Z0-9._]+ import [a-zA-Z0-9._]+$", line):
            print("The line '{}' is suspicious. Will not be entered into __init__.py!!".format(line))
            continue
        FH.write(line)

import test_module
print(test_module.getcwd())
OUTPUT:
The line 'from subprocess import call; call('do', '---terrible-things')' is suspicious. Will not be entered into __init__.py!!
/home/rightmire/eclipse-workspace/junkcode
I'm not 100% sure I understand the need; maybe add more detail to the question.
Is something like this what you're looking for?
m = ["os"]
f = ["getcwd"]
command = ''.join([m[0], ".", f[0], "()"])
# Put in some minimum sanity checking and sanitization!!!
if ";" in command or <other dangerous string> in command:
print("The line '{}' is suspicious. Will not run".format(command))
sys.exit(1)
print("This will error if the method isnt imported...")
print(eval(''.join([m[0], ".", f[0], "()"])) )
OUTPUT:
This will error if the method isnt imported...
/home/rightmire/eclipse-workspace/junkcode
As pointed out by @KarlKnechtel, having commands come in from an external file is a gargantuan security risk!

Dynamically reload a class definition in Python

I've written an IRC bot using Twisted and now I've gotten to the point where I want to be able to dynamically reload functionality.
In my main program, I do from bots.google import GoogleBot and I've looked at how to use reload to reload modules, but I still can't figure out how to do dynamic re-importing of classes.
So, given a Python class, how do I dynamically reload the class definition?
Reload is unreliable and has many corner cases where it may fail. It is suitable for reloading simple, self-contained scripts. If you want to dynamically reload your code without a restart, consider using forkloop instead:
http://opensourcehacker.com/2011/11/08/sauna-reload-the-most-awesomely-named-python-package-ever/
You cannot reload the module using reload(module) when using the from X import Y form. You'd have to do something like reload(sys.modules['module']) in that case.
This might not necessarily be the best way to do what you want, but it works!
import bots.google


class BotClass(irc.IRCClient):
    def __init__(self):
        global plugins
        plugins = [bots.google.GoogleBot()]

    def privmsg(self, user, channel, msg):
        global plugins
        parts = msg.split(' ')
        trigger = parts[0]
        if trigger == '!reload':
            reload(bots.google)
            plugins = [bots.google.GoogleBot()]
            print "Successfully reloaded plugins"
I figured it out, here's the code I use:
import re

def reimport_class(self, cls):
    """
    Reload and reimport class "cls". Return the new definition of the class.
    """
    # Get the fully qualified name of the class.
    from twisted.python import reflect
    full_path = reflect.qual(cls)

    # Naively parse the module name and class name.
    # Can be done much better...
    match = re.match(r'(.*)\.([^\.]+)', full_path)
    module_name = match.group(1)
    class_name = match.group(2)

    # This is where the good stuff happens.
    mod = __import__(module_name, fromlist=[class_name])
    reload(mod)

    # The (reloaded definition of the) class itself is returned.
    return getattr(mod, class_name)
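A possible usage sketch (assuming the method lives on the bot object and GoogleBot was imported earlier):
# Rebind the name to the freshly reloaded class before creating new instances;
# objects built from the old definition keep using it until they are recreated.
GoogleBot = self.reimport_class(GoogleBot)
plugins = [GoogleBot()]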
Better yet, run the plugins in a subprocess and supervise that subprocess; when the files change, restart the plugin process.
Edit: cleaned up.
You can use sys.modules to dynamically reload modules based on user input.
Say that you have a folder with multiple plugins such as:
module/
    cmdtest.py
    urltitle.py
    ...
You can use sys.modules in this way to load/reload modules based on user input:
import sys

if 'module.' + userinput in sys.modules:
    reload(sys.modules['module.' + userinput])
else:
    # Module not loaded yet; cannot reload, so import it instead.
    try:
        module = __import__("module." + userinput)
        module = sys.modules["module." + userinput]
    except ImportError:
        print('error when trying to load %s' % userinput)
When you do a from ... import ..., it binds the object into the local namespace, so all you need to do is re-import it. However, since the module is already loaded, it will just re-import the same version of the class, so you need to reload the module too. So this should do it:
from bots.google import GoogleBot
...
# do stuff
...
reload(bots.google)
from bots.google import GoogleBot
If for some reason you don't know the module name, you can get it from GoogleBot.__module__.
import string
import sys

def reload_class(class_obj):
    module_name = class_obj.__module__
    module = sys.modules[module_name]
    pycfile = module.__file__
    modulepath = string.replace(pycfile, ".pyc", ".py")
    code = open(modulepath, 'rU').read()
    compile(code, module_name, "exec")  # syntax check only; the result is discarded
    module = reload(module)
    return getattr(module, class_obj.__name__)
There is a lot of error checking you can do on this; if you're using global variables, you will probably have to figure out what happens to them.

python windows directory mtime: how to detect package directory new file?

I'm working on an auto-reload feature for WHIFF
http://whiff.sourceforge.net
(so you have to restart the HTTP server less often, ideally never).
I have the following code to reload a package module "location"
if a file is added to the package directory. It doesn't work on Windows XP.
How can I fix it? I think the problem is that getmtime(dir) doesn't
change on Windows when the directory content changes?
I'd really rather not compare an os.listdir(dir) with the last directory
content every time I access the package...
if not do_reload and hasattr(location, "__path__"):
    path0 = location.__path__[0]
    if os.path.exists(path0):
        dir_mtime = int(os.path.getmtime(path0))
        if fn_mtime < dir_mtime:
            print "dir change: reloading package root", location
            do_reload = True
            md_mtime = dir_mtime
In the code the "fn_mtime" is the recorded mtime from the last (re)load.
... added comment: I came up with the following workaround, which I think may work, but I don't care for it too much since it involves code generation. I dynamically generate a code fragment to load a module, and if it fails, it tries again after a reload. Not tested yet.
GET_MODULE_FUNCTION = """
def f():
    import %(parent)s
    try:
        from %(parent)s import %(child)s
    except ImportError:
        # one more time...
        reload(%(parent)s)
        from %(parent)s import %(child)s
    return %(child)s
"""
def my_import(partname, parent):
    f = None  # for pychecker
    parentname = parent.__name__
    defn = GET_MODULE_FUNCTION % {"parent": parentname, "child": partname}
    #pr "executing"
    #pr defn
    try:
        exec(defn)  # defines function f()
    except SyntaxError:
        raise ImportError, "bad function name " + repr(partname) + "?"
    partmodule = f()
    #pr "got", partmodule
    setattr(parent, partname, partmodule)
    #pr "setattr", parent, ".", partname, "=", getattr(parent, partname)
    return partmodule
Other suggestions welcome. I'm not happy about this...
long time no see. I'm not sure exactly what you're doing, but the equivalent of your code:
GET_MODULE_FUNCTION = """
def f():
    import %(parent)s
    try:
        from %(parent)s import %(child)s
    except ImportError:
        # one more time...
        reload(%(parent)s)
        from %(parent)s import %(child)s
    return %(child)s
"""
to be execed with:
defn = GET_MODULE_FUNCTION % {"parent": parentname, "child": partname}
exec(defn)
is (per the docs), assuming parentname names a package and partname names a module in that package (if partname is a top-level name of the parentname package, such as a function or class, you'll have to use a getattr at the end):
import sys

def f(parentname, partname):
    name = '%s.%s' % (parentname, partname)
    try:
        __import__(name)
    except ImportError:
        parent = __import__(parentname)
        reload(parent)
        __import__(name)
    return sys.modules[name]
without exec or anything weird, just call this f appropriately.
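For example (names are illustrative), the call site corresponding to the question's my_import would be:
# parent is the already-imported package object, partname the submodule name.
partmodule = f(parent.__name__, partname)
setattr(parent, partname, partmodule)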
you can try using getatime() instead.
I'm not understanding your question completely...
Are you calling getmtime() on a directory or an individual file?
There are two things about your first code snippet that concern me:
You cast the float from getmtime to int. Depending on how frequently this code is run, you might get unreliable results.
At the end of the code you assign dir_mtime to a variable md_mtime, but fn_mtime, which you check against, never seems to be updated.
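Putting those two points together, a corrected version of the question's snippet might look like this (a sketch only, keeping the original Python 2 style):
if not do_reload and hasattr(location, "__path__"):
    path0 = location.__path__[0]
    if os.path.exists(path0):
        dir_mtime = os.path.getmtime(path0)  # keep the float, do not truncate
        if fn_mtime < dir_mtime:
            print "dir change: reloading package root", location
            do_reload = True
            fn_mtime = dir_mtime  # update the value that is compared next time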
