Python: intercept a class loading action - python

Summary: when a certain python module is imported, I want to be able to intercept this action, and instead of loading the required class, I want to load another class of my choice.
Reason: I am working on some legacy code. I need to write some unit test code before I start some enhancement/refactoring. The code imports a certain module which will fail in a unit test setting, however. (Because of database server dependency)
Pseudo code:
from LegacyDataLoader import load_me_data
...
def do_something():
    data = load_me_data()
So, ideally, when Python executes the import line above in a unit test, an alternative class, say MockDataLoader, is loaded instead.
I am still using 2.4.3. I suppose there is an import hook I can manipulate
Edit
Thanks a lot for the answers so far. They are all very helpful.
One particular type of suggestion is about manipulation of PYTHONPATH. It does not work in my case. So I will elaborate my particular situation here.
The original codebase is organised in this way
./dir1/myapp/database/LegacyDataLoader.py
./dir1/myapp/database/Other.py
./dir1/myapp/database/__init__.py
./dir1/myapp/__init__.py
My goal is to enhance the Other class in the Other module. But since it is legacy code, I do not feel comfortable working on it without strapping a test suite around it first.
Now I introduce this unit test code
./unit_test/test.py
The content is simply:
from myapp.database.Other import Other
def test1():
    o = Other()
    o.do_something()
if __name__ == "__main__":
    test1()
When the CI server runs the above test, the test fails because class Other uses LegacyDataLoader, and LegacyDataLoader cannot establish a database connection to the db server from the CI box.
Now let's add a fake class as suggested:
./unit_test_fake/myapp/database/LegacyDataLoader.py
./unit_test_fake/myapp/database/__init__.py
./unit_test_fake/myapp/__init__.py
Modify the PYTHONPATH to
export PYTHONPATH=unit_test_fake:dir1:unit_test
Now the test fails for another reason
  File "unit_test/test.py", line 1, in <module>
    from myapp.database.Other import Other
ImportError: No module named Other
It has something to do with the way python resolves classes/attributes in a module

You can intercept import and from ... import statements by defining your own __import__ function and assigning it to __builtin__.__import__ (make sure to save the previous value, since your override will no doubt want to delegate to it; and you'll need to import __builtin__ to get the builtin-objects module).
For example (Py2.4 specific, since that's what you're asking about), save in aim.py the following:
import __builtin__
realimp = __builtin__.__import__
def my_import(name, globals={}, locals={}, fromlist=[]):
    print 'importing', name, fromlist
    return realimp(name, globals, locals, fromlist)
__builtin__.__import__ = my_import
from os import path
and now:
$ python2.4 aim.py
importing os ('path',)
So this lets you intercept any specific import request you want, and alter the imported module[s] as you wish before you return them -- see the specs here. This is the kind of "hook" you're looking for, right?

There are cleaner ways to do this, but I'll assume that you can't modify the file containing from LegacyDataLoader import load_me_data.
The simplest thing to do is probably to create a new directory called testing_shims, and create LegacyDataLoader.py file in it. In that file, define whatever fake load_me_data you like. When running the unit tests, put testing_shims into your PYTHONPATH environment variable as the first directory. Alternately, you can modify your test runner to insert testing_shims as the first value in sys.path.
This way, your file will be found when importing LegacyDataLoader, and your code will be loaded instead of the real code.
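A rough sketch of the sys.path variant follows. It builds the shim directory in a temporary location only so that the example is self-contained; in a real project testing_shims would be a checked-in folder, and the fake return value here is an assumption.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway shim directory holding a fake LegacyDataLoader.py.
shim_dir = tempfile.mkdtemp()
with open(os.path.join(shim_dir, "LegacyDataLoader.py"), "w") as f:
    f.write("def load_me_data():\n    return ['fake row']\n")

# Put the shim first so it shadows any real LegacyDataLoader on the path.
sys.path.insert(0, shim_dir)
importlib.invalidate_caches()

from LegacyDataLoader import load_me_data

rows = load_me_data()
```

Note that this shadows a top-level module only. If the module lives inside a package, the first package directory found on the path is the only one searched for its submodules.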

The import statement just grabs stuff from sys.modules if a matching name is found there, so the simplest thing is to make sure you insert your own module into sys.modules under the target name before anything else tries to import the real thing.
# in test code
import sys
import MockDataLoader
sys.modules['LegacyDataLoader'] = MockDataLoader
import module_under_test
There are a handful of variations on the theme, but that basic approach should work fine to do what you describe in the question. A slightly simpler approach would be this, using just a mock function to replace the one in question:
# in test code
import module_under_test
def mock_load_me_data():
    # do mock stuff here
    pass
module_under_test.load_me_data = mock_load_me_data
That simply replaces the appropriate name right in the module itself, so when you invoke the code under test, presumably do_something() in your question, it calls your mock routine.
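If no real MockDataLoader module exists yet, the stub can be built on the fly. The following is a sketch for Python 3 that uses unittest.mock.patch.dict so the sys.modules entry is removed again afterwards; the stub's return value is an assumption chosen for illustration.

```python
import sys
import types
from unittest import mock

# Build a stub module object in memory instead of importing a real file.
fake = types.ModuleType("LegacyDataLoader")
fake.load_me_data = lambda: {"rows": []}

# patch.dict installs the stub for the duration of the block and then
# restores sys.modules to its previous state.
with mock.patch.dict(sys.modules, {"LegacyDataLoader": fake}):
    from LegacyDataLoader import load_me_data
    result = load_me_data()
```

The module under test would be imported inside the with block, so that its own import of LegacyDataLoader resolves to the stub.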

Well, if the import fails by raising an exception, you could put it in a try...except block:
try:
    from LegacyDataLoader import load_me_data
except ImportError: # replace with the error that actually occurs, so as not to mask real problems
    from MockDataLoader import load_me_data
Is that what you're looking for? If it fails, but doesn't raise an exception, you could have it run the unit test with a special command line tag, like --unittest, like this:
import sys
if "--unittest" in sys.argv:
    from MockDataLoader import load_me_data
else:
    from LegacyDataLoader import load_me_data

Related

Passing imports to a script from inside an external method

I have kind of a tricky question, so that it is difficult to even describe it.
Suppose I have this script, which we will call master:
#in master.py
import slave as slv
def import_func():
    import time
slv.method(import_func)
I want to make sure method in slave.py, which looks like this:
#in slave.py
def method(import_func):
    import_func()
    time.sleep(10)
actually runs as if I had imported the time package. Currently it does not work, I believe because the import exists only in the scope of import_func().
Keep in mind that the rules of the game are:
I cannot import anything in slave.py outside method
I need to pass the imports which method needs through import_func() in master.py
the procedure must work for a variable number of imports inside method. In other words, method cannot know how many imports it will receive but needs to work nonetheless.
the procedure needs to work for any import possible. So options like pyforest are not suitable.
I know it can theoretically be done through importlib, but I would prefer a more straightforward idea, because if we have a lot of imports with different 'as' labels it would become extremely tedious and convoluted with importlib.
I know it is kind of a quirky question but I'd really like to know if it is possible. Thanks
What you can do is this in the master file:
#in master.py
import slave as slv
def import_func():
    import time
    return time
slv.method(import_func)
Now use time return value in the slave file:
#in slave.py
def method(import_func):
    time = import_func()
    time.sleep(10)
Why would you have to do this? It's a matter of scope. The statement import time inside import_func() binds the name time only in that function's local namespace. When the function returns, that local name disappears (the module itself stays loaded and cached in sys.modules), so method() in slave.py has no name time to refer to.
By returning time from import_func(), you hand the module object to the caller, which can then bind it to a name of its own.
Now, to import more modules? Simple. Return a list with multiple modules inside, or a dictionary for easy access by name. That's one way of doing it.
[Edit] Using a dictionary and importlib to pass multiple imports to slave.py:
master.py:
import slave as slv
import importlib
def master_import(packname, imports):
    # store the imported module under its own name
    imports[packname] = importlib.import_module(packname)
def import_func():
    imports = {}
    master_import('time', imports)
    return imports
slv.method(import_func)
slave.py:
#in slave.py
def method(import_func):
    imports = import_func()
    imports['time'].sleep(10)
This way, you can literally import any modules you want on master.py side, using master_import() function, and pass them to slave script.
Check this answer on how to use importlib.
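Since the question mentions imports with different 'as' labels, one possible way to keep that manageable is a single alias-to-module mapping. This is only a sketch: the aliases and the modules chosen are arbitrary examples, and both functions are shown in one file for brevity rather than split across master.py and slave.py.

```python
import importlib

def import_func():
    # alias -> module name; extend this mapping as needed
    wanted = {"t": "time", "js": "json", "osp": "os.path"}
    return {alias: importlib.import_module(name) for alias, name in wanted.items()}

def method(import_func):
    # method() does not need to know how many modules it receives
    mods = import_func()
    t, js = mods["t"], mods["js"]
    started = t.time()
    return js.dumps({"started": started > 0})

result = method(import_func)
```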

python: dynamically loading one-time plugins?

I'm writing a python application in which I want to make use of dynamic, one-time-runnable plugins.
By this I mean that at various times during the running of this application, it looks for python source files with special names in specific locations. If any such source file is found, I want my application to load it, run a pre-named function within it (if such a function exists), and then forget about that source file.
Later during the running of the application, that file might have changed, and I want my python application to reload it afresh, execute its method, and then forget about it, like before.
The standard import system keeps the module resident after the initial load, and this means that subsequent "import" or "__import__" calls won't reload the same module after its initial import. Therefore, any changes to the python code within this source file are ignored during its second through n-th imports.
In order for such packages to be loaded uniquely each time, I came up with the following procedure. It works, but it seems kind of "hacky" to me. Are there any more elegant or preferred ways of doing this? (note that the following is an over-simplified, illustrative example)
import sys
import imp
# The following module name can be anything, as long as it doesn't
# change throughout the life of the application ...
modname = '__whatever__'
def myimport(path):
    '''Dynamically load python code from "path"'''
    # get rid of previous instance, if it exists
    try:
        del sys.modules[modname]
    except KeyError:
        pass
    # load the module
    try:
        return imp.load_source(modname, path)
    except Exception as e:
        print('exception: {}'.format(e))
        return None
plugin_path = '/path/to/plugin.py'
mymod = myimport(plugin_path)
if mymod is not None:
    # call the plugin function:
    try:
        mymod.func()
    except AttributeError:
        print('func() not defined in plugin: {}'.format(plugin_path))
Addendum: one problem with this is that func() runs within a separate module context, and it has no access to any functions or variables within the caller's space. I therefore have to do inelegant things like the following if I want func_one(), func_two() and abc to be accessible within the invocation of func():
def func_one():
    pass # whatever
def func_two():
    pass # whatever
abc = '123'
# Load the module as shown above, but before invoking mymod.func(),
# the following has to be done ...
mymod.func_one = func_one
mymod.func_two = func_two
mymod.abc = abc
# This is a PITA, and I'm hoping there's a better way to do all of
# this.
Thank you very much.
I use the following code to do this sort of thing.
Note that I don't actually import the code as a module, but instead execute the code in a particular context. This lets me define a bunch of api functions automatically available to the plugins without users having to import anything.
def load_plugin(filename, context):
    with open(filename) as f:
        source = f.read()
    code = compile(source, filename, 'exec')
    exec(code, context)
    return context['func']
context = { 'func_one': func_one, 'func_two': func_two, 'abc': abc }
func = load_plugin(filename, context)
func()
This method works in python 2.6+ and python 3.3+
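Here is a self-contained sketch of that idea, showing that a changed plugin file is picked up on the next load. The temporary plugin file, the api dictionary and the choice to copy the context on each load are assumptions added for the demonstration.

```python
import os
import tempfile

def load_plugin(filename, context):
    with open(filename) as f:
        source = f.read()
    # Use a fresh copy per load so nothing lingers between runs.
    namespace = dict(context)
    exec(compile(source, filename, "exec"), namespace)
    # Return None if the plugin does not define func().
    return namespace.get("func")

calls = []
api = {"func_one": lambda: calls.append("one")}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "plugin.py")
    # Write two versions of the plugin in turn and run each once.
    for body in ("func_one()", "func_one(); func_one()"):
        with open(path, "w") as f:
            f.write("def func():\n    %s\n" % body)
        func = load_plugin(path, api)
        if func is not None:
            func()
```

Because nothing is registered in sys.modules, there is no cached module to delete before reloading.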
The approach you use is totally fine. For this question
one problem with this is that func() runs within a separate module context, and it has no access to any functions or variables within the caller's space.
It may be better to use the execfile function (exec in Python 3):
# main.py
def func1():
    print('func1 called')
# similar to import, except everything runs in the current module's namespace
exec(open('/path/to/plugin.py', 'r').read(), globals())
#execfile('/path/to/plugin.py', globals()) # python 2 version
func()
Test it:
#/path/to/plugin.py
def func():
    func1()
Result:
python main.py
# func1 called
One potential problem with this approach is namespace pollution: every file runs in the current namespace, which increases the chance of name conflicts.

Access part of module from pytest

I have an issue accessing part of imported module from the pytest.
Here is branch with code referenced below: https://github.com/asvc/snapshotr/tree/develop
In particular, when running this test, it works as expected for test_correct_installation() but test_script_name_checking() fails with AttributeError.
import main as ss
import os
class TestInit:
    def test_correct_installation(self):
        assert os.path.exists(ss.snapr_path)
        assert os.path.isfile(ss.snapr_path + "/main/markup.py")
        assert os.path.isfile(ss.snapr_path + "/main/scandir.py")
    def test_script_name_checking(self):
        assert ss.ssPanel.check_script('blah') is None # Here it fails
Link to the main which is being tested
What I'm trying to do is to "extract" isolated piece of code, run it with known data and compare result to some reference. Seems like extraction part doesn't work quite well, best practises for such cases would be greatly appreciated.
Traceback:
AttributeError: 'module' object has no attribute 'ssPanel'
I have tried a small hack in the test_init.py:
class dummy():
    pass
nuke = dummy()
nuke.GUI = True
But it (obviously) doesn't work as nuke.GUI is being redefined in __init__.py upon every launch.
This is quite a complex situation. When you import main in test_init.py, Python imports main/__init__.py and executes all of its code. That imports nuke, and if nuke.GUI is False, ssPanel is never defined, as you can see.
The problem is that you can't fake a dummy nuke in the test script: by the time the test runs, the real nuke has already been imported.
My suggestion would be to separate ssPanel into another Python file. Then in __init__.py we can do:
if nuke.GUI:
    from sspanel import ssPanel
And in test scripts, we can also easily import it using:
from main.sspanel import ssPanel

Pytest and Dynamic fixture modules

I am writing functional tests using pytest for a software that can run locally and in the cloud. I want to create 2 modules, each with the same module/fixture names, and have pytest load one or the other depending if I'm running tests locally or in the cloud:
/fixtures
/fixtures/__init__.py
/fixtures/local_hybrids
/fixtures/local_hybrids/__init__.py
/fixtures/local_hybrids/foo.py
/fixtures/cloud_hybrids
/fixtures/cloud_hybrids/__init__.py
/fixtures/cloud_hybrids/foo.py
/test_hybrids/test_hybrids.py
foo.py (both of them):
import pytest
@pytest.fixture()
def my_fixture():
    return True
/fixtures/__init__.py:
if True:
    import local_hybrids as hybrids
else:
    import cloud_hybrids as hybrids
/test_hybrids/test_hybrids.py:
from fixtures.hybrids.foo import my_fixture
def test_hybrid(my_fixture):
    assert my_fixture
The last code block doesn't work of course, because import fixtures.hybrids is looking at the file system instead of __init__.py's "fake" namespace, which isn't like from fixtures import hybrids, which works (but then you cannot use the fixtures as the names would involve dot notation).
I realize that I could play with pytest_generate_test to alter the fixture dynamically (maybe?) but I'd really hate managing each fixture manually from within that function... I was hoping the dynamic import (if x, import this, else import that) was standard Python, unfortunately it clashes with the fixtures mechanism:
import fixtures
def test(fixtures.hybrids.my_fixture): # of course it doesn't work :)
    ...
I could also import each fixture function one after the other in init; more legwork, but still a viable option to fool pytest and get fixture names without dots.
Show me the black magic. :) Can it be done?
I think in your case it's better to define a fixture - environment or other nice name.
This fixture can be just a getter from os.environ['KEY'] or you can add custom command line argument like here
then use it like here
and the final use is here.
What I'm trying to say is that you need to switch to thinking in terms of dependency injection: everything should be a fixture. In your case (and in my plugin as well), the runtime environment should be a fixture, which is checked in all other fixtures that depend on the environment.
You might be missing something here: If you want to re-use those fixtures you need to say it explicitly:
import pytest
from fixtures.hybrids.foo import my_fixture
@pytest.mark.usefixtures('my_fixture')
def test_hybrid(my_fixture):
    assert my_fixture
In that case you could tweak pytest as following:
import pytest
from local_hybrids import local_hybrids_fixture
from cloud_hybrids import cloud_hybrids_fixture
fixtures_to_test = {
    "local": None,
    "cloud": None
}
@pytest.mark.usefixtures("local_hybrids_fixture")
def test_add_local_fixture(local_hybrids_fixture):
    fixtures_to_test["local"] = local_hybrids_fixture
@pytest.mark.usefixtures("cloud_hybrids_fixture")
def test_add_cloud_fixture(cloud_hybrids_fixture):
    fixtures_to_test["cloud"] = cloud_hybrids_fixture
def test_on_fixtures():
    if cloud_enabled:
        fixture = fixtures_to_test["cloud"]
    else:
        fixture = fixtures_to_test["local"]
    ...
If there are better solutions around I am also interested ;)
I don't really think there is a "good way" of doing that in Python, but it is still possible with a little hacking. You can add the subfolder with the fixtures you would like to use to sys.path and import the fixtures directly. In its crudest form it looks like this:
for your fixtures/__init__.py:
if True:
    import local as hybrids
else:
    import cloud as hybrids
def update_path(module):
    from sys import path
    from os.path import join, pardir, abspath
    mod_dir = abspath(join(module.__file__, pardir))
    path.insert(0, mod_dir)
update_path(hybrids)
and in the client code (test_hybrids/test_hybrids.py) :
import fixtures
from foo import spam
spam()
In other cases you can use much more complex actions to perform a fake move of all modules/packages/functions etc. from your cloud/local folder directly into the fixtures package's __init__.py. Still, I don't think it is worth trying.
One more thing: black magic is not the best thing to use. I would recommend using dotted notation with "from X import Y"; it is a much more stable solution.
Use the pytest plugins feature and put your fixtures in separate modules. Then at runtime select which plug-in you’ll be drawing from via a command line argument or an environment variable. It needs to be something global because you need to place different pytest_plugins list assignments based on the global value.
Take a look at the section Conditional Plugins from this repo https://github.com/jxramos/pytest_behavior/tree/main/conditional_plugins

import module_name Vs __import__('module_name')

I am writing a python module and I am using many imports of other different modules.
I am a bit confused about whether I should import all the necessary dependent modules at the top of the file or only when they are needed.
I would also like to know the implications of both.
I come from a C++ background, so I am really thrilled with this feature and do not see any reason not to use __import__(), importing modules only when needed inside my function.
Kindly throw some light on this.
To write less code, import a module at the top of the script, e.g.:
#File1.py
import os
#use os somewhere:
os.chdir(some_dir)
...
...
#use os somewhere else, you don't need to "import os" everywhere
os.environ.update(some_dict)
While sometimes you may need to import a module locally (e.g., in a function):
abc=3
def foo():
    from some_module import abc #importing inside foo avoids naming conflicts
    abc(...) #call the function, nothing to do with the variable "abc" outside "foo"
Don't worry about the time cost of calling foo() multiple times, since an import statement loads a module only once. Once a module is imported, the module object is stored in the dictionary sys.modules, which serves as a lookup table that speeds up later runs of the same import statement.
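A quick way to see this caching in action is the sketch below, which uses the standard json module purely as an example:

```python
import sys

def foo():
    import json  # executed on every call, but the module is loaded only once
    return json

first = foo()
second = foo()

# Both calls return the very same module object, taken from sys.modules.
same_object = first is second
cached = sys.modules["json"] is first
```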
As @bruno desthuilliers mentioned, importing inside functions may not be that pythonic and goes against PEP8 (here's a discussion I found), so you should stick to importing at the top of the file most of the time.
First, __import__ isn't usually needed anywhere. Its main purpose is to support dynamic importing of things that you don't know ahead of time (think plug-ins). You can easily use the import statement inside your function:
import sys
def foo():
    import this
if __name__ == "__main__":
    print(sys.version_info)
    foo()
The main advantage to importing everything up-front is that it is most customary. That's where people reading your code will go to see if something is imported or not. Also, you don't need to write import os in every function that uses os. The main downsides of this approach are that:
you can get yourself into unresolvable import loops (A imports B which imports A)
that you pull everything into memory even if you aren't going to use it.
The second problem isn't typically an issue -- very rarely do you notice the performance or memory impact of an import.
If you run into the first problem, it's likely a symptom of poorly grouped code and the common stuff should be factored into a new module C which both A and B can use.
Firstly, using imports inside functions is a violation of PEP8.
An import statement still has a cost EVEN if the module is already loaded (Python has to look it up in sys.modules and rebind the name on every call), so if your function is going to be called many times, that repeated cost can outweigh whatever you gained by deferring the import.
Also, when you write "import test", Python does roughly this:
test = __import__('test')
The only downside of imports at the top of the file is that the namespace gets polluted very quickly, depending on the complexity of the file, but if your file is that complex, it's a sign of bad design.
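For comparison, here is a small sketch of how __import__ and importlib.import_module behave when called directly. The modules are arbitrary standard-library examples. The difference with dotted names is one reason importlib.import_module is usually the more convenient choice for dynamic imports.

```python
import importlib
import json
import os

# For a plain name, __import__ returns the same object the statement binds.
mod = __import__("json")
same_as_statement = mod is json

# For a dotted name, __import__ returns the top-level package,
# while importlib.import_module returns the submodule itself.
top = __import__("os.path")
sub = importlib.import_module("os.path")
returns_top_level = top is os
returns_submodule = sub is os.path
```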
