Some automated tests in a larger system need to be able to import a module, and then restore sys.modules to its original condition.
But this code fragment:
import sys
sys.modules = dict(sys.modules)
import pickle
causes this KeyError in Python 3.6-3.8:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "[...]/python3.6/pickle.py", line 1562, in <module>
from _pickle import (
KeyError: '_compat_pickle'
It seems as if only pickle and modules that depend on it like multiprocessing are affected. I've investigated _compat_pickle - it's a module for pickling compatibility with Python 2 - but nothing jumps out that would cause this.
Is there a safe way to restore sys.modules back to an earlier state? And what is the mechanism behind this unexpected KeyError?
The problem is that sys.modules is a lie (I think). It is not actually the true source of the modules dict. The real dict is stored at the C level in the current interpreter, and sys.modules is just a reference to it. _pickle is special, since it imports a module from C code, which I assume leads to this error: once sys.modules is rebound to a new dict, what tstate->interp->modules says is imported no longer matches what sys.modules says is imported.
This might be considered a bug in Python. A bug report already exists: https://bugs.python.org/issue12633 .
You could instead record which keys are in sys.modules before the code runs and, afterwards, delete every entry that was added, mutating the existing dict rather than rebinding sys.modules.
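The key-snapshot idea above could be sketched roughly as follows. The name restored_modules is hypothetical, and this version only removes entries that were added; it does not try to restore entries that were replaced or deleted in the meantime.

```python
import sys
from contextlib import contextmanager


@contextmanager
def restored_modules():
    """Remove any sys.modules entries added inside the with-block.

    The dict object itself is never rebound, only mutated in place.
    """
    saved_names = set(sys.modules)
    try:
        yield
    finally:
        # Delete only the names that were not present at entry.
        for name in set(sys.modules) - saved_names:
            del sys.modules[name]


with restored_modules():
    import colorsys  # any module imported here is forgotten again on exit
```

Code that still holds a reference to a module imported inside the block keeps working; only the cache entry goes away, so a later import statement loads the module afresh.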
My file is named "foo.py". It has only two lines.
import random
print(random.randint(10))
The error is...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/random.py", line 45, in <module>
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
File "math.py", line 2, in <module>
from random import randint
ImportError: cannot import name randint
I am on MacOS 10.14.6
I note that I did not call random.randint() in this script, even
though it showed up in the error text.
I ran the script with $python
My /usr/bin/python is linked to python 2.7
I've tried this with python 3 as well, with the same error.
EDIT:
My script was originally named "math.py", but I changed it in response to another solution that pointed out the name conflict with the math.py library (even though my script was not importing that library). Even after my script name change, I'm still seeing --File "math.py"-- errors. Even after I'm no longer using random.randint(), I'm still seeing that function referenced in my errors.
I've tried deleting random.pyc and math.pyc to purge the artifacts of previous executions, but this does not seem to eliminate the remnants of earlier errors.
Read the traceback:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/random.py"
Python tries to do something inside the standard library random module...
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
in particular, it tries to import the standard library math module...
File "math.py", line 2, in <module>
but it gets yours instead (notice there's no path on the filename this time; it's just math.py in the current directory); i.e. the script you started from. Python detects the circular import and fails:
ImportError: cannot import name randint
Actually using randint doesn't matter, because this is a problem with the actual import of the module.
This happens because Python is configured by default (using sys.path, which is a list of paths to try, in order) to try to import scripts from the current working directory before looking anywhere else. It's convenient when you just want to write a few source files in the same folder and have them work with each other, but it causes these problems.
The expected solution is to just rename your file. Unfortunately there isn't an obvious list of names to avoid, although you could peek at your installation folder to be sure (or just check the online library reference, though that's not quite so direct).
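As a rough way to check a folder for such name clashes, something like the sketch below may help. It assumes Python 3.10 or newer, where sys.stdlib_module_names exists; on older versions this sketch simply reports nothing. The function name shadowing_files is made up for illustration.

```python
import sys
from pathlib import Path


def shadowing_files(directory="."):
    """Return the .py files in `directory` whose names match a standard library module."""
    # sys.stdlib_module_names was added in Python 3.10; fall back to an empty set before that.
    stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(
        path.name
        for path in Path(directory).glob("*.py")
        if path.stem in stdlib_names
    )


print(shadowing_files("."))
```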
I guess you could also modify sys.path:
import sys
sys.path.append(sys.path.pop(0)) # move the first entry (the script's directory, or '' in an interactive session) to the end
import random # now Python will look for modules in the standard library first,
# and only in the current folder as a last resort.
However, this is an ugly hack. It might break something else (and it can't save you if you have a local sys.py).
Code
def test_get_network_info(self):
    with open(dirname(abspath(__file__)) + '/files/fake_network_info.txt', 'r') as mock_network_info:
        with patch('subprocess.check_output', Mock(return_value=mock_network_info.read())):
            self.assertEqual('192.168.1.100', get_network_info()[0])
            self.assertEqual('255.255.255.0', get_network_info()[1])
            self.assertEqual('192.168.1.0', get_network_info()[2])
Error
======================================================================
ERROR: test_get_network_info (tests.test_tools.ToolsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tim/Documents/overseer/app/tests/test_tools.py", line 21, in test_get_network_info
with patch('subprocess.check_output', Mock(return_value=mock_network_info.read())):
File "/usr/local/lib/python2.7/dist-packages/mock.py", line 1268, in __enter__
original, local = self.get_original()
File "/usr/local/lib/python2.7/dist-packages/mock.py", line 1242, in get_original
"%s does not have the attribute %r" % (target, name)
AttributeError: <module 'subprocess' from '/usr/local/lib/python2.7/dist-packages/twill/other_packages/subprocess.pyc'> does not have the attribute 'check_output'
What I understand
My understanding of the problem is that mock is trying to mock twill's subprocess module instead of the python one.
Questions
Am I doing something wrong?
How can I specify that I want to patch Python's subprocess module and not twill's (which may have been imported earlier in the test suite)?
Is there another way to patch the subprocess module?
What I tried
I tried with patch('tools.subprocess.check_output', ...
Doesn't work.
I tried to use a decorator ...
Doesn't work either.
I tried to patch the subprocess module directly: subprocess.check_output = Mock( ...
Works, but it's not good since it doesn't undo the patching.
Some more information
If I run just this test and no other tests, it works because twill's subprocess module never got imported. But as soon as I run a test using twill, the above test will fail.
Here is twill's version of subprocess, which looks like it has been copied and pasted from an old version of Python. It doesn't have a check_output function, and that's why the test fails.
Twill's package comes from the Flask-Testing plugin which I use extensively. I submitted an issue on github here.
I hope someone from the lovely python community can help. :)
See my comment above: because of bad practices in twill, the proper way would be either to fix twill, which may take some work, or to move to something else; but since you now depend heavily on Flask-Testing, that's not a cheap move either.
So this leaves us with a dirty trick: make sure to import subprocess somewhere before twill is imported. Internally, this adds a reference to the right subprocess module in sys.modules. Once a module is loaded, subsequent imports no longer search sys.path; they just use the reference already cached in sys.modules.
Unfortunately, this may not be the end of the problem. Apparently twill uses a patched version of subprocess for some reason, and those patches won't be available to it, since the plain standard library subprocess will be loaded instead. It's very likely to crash or behave in unexpected ways. If that's the case, well ... back to the suggestions above.
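The caching behaviour this trick relies on can be illustrated with a small sketch. The temporary directory and its empty subprocess.py are hypothetical stand-ins for twill's bundled copy, not twill's actual layout.

```python
import subprocess  # import the real module first, so it is cached in sys.modules
import sys
import tempfile
from pathlib import Path

real_subprocess = sys.modules["subprocess"]

# Hypothetical stand-in for an outdated bundled copy: a subprocess.py without check_output.
shadow_dir = Path(tempfile.mkdtemp())
(shadow_dir / "subprocess.py").write_text("# outdated copy, no check_output here\n")
sys.path.insert(0, str(shadow_dir))

# This import is served from sys.modules; sys.path is not searched again.
import subprocess as imported_again

same_module = imported_again is real_subprocess
has_check_output = hasattr(imported_again, "check_output")
print(same_module, has_check_output)

sys.path.remove(str(shadow_dir))
```

If the shadowing directory were on sys.path before the very first import of subprocess, the outdated copy would be the one that gets cached, which is the situation the question describes.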
I have a segmenting.py module in a package called processing.
I am trying to call a function in the module in my main. It is extremely simple.
In main.py
from processing import segmenting
segmenting.test()
In segmenting.py
def test():
    print 'succeed'
However, I end up with errors as follows:
>>> from processing import segmenting
>>>
>>> segmenting.test()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'test'
>>>
What went wrong?
The most likely cause is that you didn't restart your interactive interpreter after editing (and saving!) segmenting.py. Modules are imported only once and cached. If you edit the source code and then run the import statement again, the module is simply retrieved from the cache and doesn't pick up your changes. See also the reload() built-in (importlib.reload() in Python 3).
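The caching described above can be reproduced with a throwaway module. This is a sketch for Python 3 using importlib.reload; the module name segmenting_demo and the temporary directory are made up for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp_dir = tempfile.mkdtemp()
module_file = Path(tmp_dir) / "segmenting_demo.py"
module_file.write_text("VERSION = 1\n")  # first version: no test() yet

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import segmenting_demo
has_test_before = hasattr(segmenting_demo, "test")

# "Edit and save" the file, adding the function.
module_file.write_text("VERSION = 2\n\ndef test():\n    return 'succeed'\n")

import segmenting_demo  # does nothing new: the cached module is returned
still_missing = not hasattr(segmenting_demo, "test")

importlib.reload(segmenting_demo)  # re-executes the edited source
result = segmenting_demo.test()
print(has_test_before, still_missing, result)
```

Note that reload() updates the existing module object in place, so names bound earlier with from segmenting_demo import ... still point at the old objects.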
What is the precise rule under which ImportError is raised by the Python 3 interpreter?
There are plenty of SO questions about that, with excellent answers, but I could not find one that gave a clear, general, and logically precise definition of the circumstances when this exception occurs.
The documentation doesn't seem to be clear either. It says:
exception ImportError
Raised when an import statement fails to find
the module definition or when a from ... import fails to find a name
that is to be imported.
But this seems inconsistent with the following example.
I meant to ask for a general definition rather than a specific case, but to clarify my concerns, here's an example:
# code/t.py:
from code import d
# code/d.py
from code import t
Running module t.py from the command line results in ImportError: cannot import name d.
On the other hand, the following code doesn't raise exceptions:
# code/t.py:
import code.d
# code/d.py
import code.t
At all times, __init__.py is empty.
In this example, the only modules or names mentioned in the import statement are t and d, and they were both clearly found. If the documentation implies that some name within the d module isn't found, it's certainly not obvious; and on top of that, I'd expect it to raise NameError: name ... is not defined exception rather than ImportError.
If abc is a package and xyz is a module, and if abc's __init__.py defines an __all__ that does not include xyz, then from abc import * won't give you xyz, but you'll still be able to do import abc.xyz.
Edit: The short answer is that your imports are circular. Modules t and d try to import each other. This won't work. Don't do it. I'm going to explain the whole thing below, but the explanation is pretty long.
To understand why it gives an ImportError, try to follow the code execution. If you look at the full traceback instead of just the final part, you can see what it's doing. With your setup I get a traceback like this (I called the package "testpack" instead of "code"):
Traceback (most recent call last):
File "t.py", line 1, in <module>
from testpack import d
File "C:\Documents and Settings\BrenBarn\My Documents\Python\testpack\d.py", line 1, in <module>
from testpack import t
File "C:\Documents and Settings\BrenBarn\My Documents\Python\testpack\t.py", line 1, in <module>
from testpack import d
ImportError: cannot import name d
You can see what Python is doing here.
1. In loading t.py, the first thing it sees is from testpack import d.
2. At that point, Python executes the d.py file to load that module.
3. But the first thing it finds there is from testpack import t.
4. It already is loading t.py once, but t as the main script is different than t as a module, so it tries to load t.py again.
5. The first thing it sees is from testpack import d, which would mean it should try to load d.py . . . but it already was trying to load d.py back in step 2. Since trying to import d led back to trying to import d again, Python realizes it can't import d and throws ImportError.
Step 4 is kind of anomalous here because you ran a file in the package directly, which isn't the usual way to do things. See this question for an explanation of why importing a module is different from running it directly. If you try to import t instead (with from testpack import t), Python realizes the circularity one step sooner, and you get a simpler traceback:
>>> from testpack import t
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
from testpack import t
File "C:\Documents and Settings\BrenBarn\My Documents\Python\testpack\t.py", line 1, in <module>
from testpack import d
File "C:\Documents and Settings\BrenBarn\My Documents\Python\testpack\d.py", line 1, in <module>
from testpack import t
ImportError: cannot import name t
Notice that here the error is that it can't import t. It knows it can't, because when I told it to import t, it found itself looping back to import t again. In your original example, it didn't notice it was running t.py twice, because the first time was the main script and the second was an import, so it took one more step and tried to import d again.
Now, why doesn't this happen when you do import code.d? The answer is simply that you don't actually try to use the imported modules. In this case, it happens as follows (I'm going to explain as if you did from code import t rather than running it as a script):
1. It starts to import t. When it does this, it provisionally marks the module code.t as imported, even though it's not done importing yet.
2. It finds it has to do import code.d, so it runs d.
3. In d, it finds import code.t, but since code.t is already marked as imported, it doesn't try to import it again.
4. Since d finished without actually using t, it gets to go back and finish loading t. No problem.
The key difference is that the names t and d are not directly accessible to each other here; they are mediated by the package code, so Python doesn't actually have to finish "deciding what t is" until it is actually used. With from code import t, since the value has to be assigned to the variable t, Python has to know what it is right away.
You can see the problem, though, if you make d.py look like this:
import code.t
print code.t
Now, after step 2, while running d, it actually tries to access the half-imported module t. This will raise an AttributeError because, since the module hasn't been fully imported yet, it hasn't been attached to the package code.
Note that it would be fine as long as the use of code.t didn't happen until after d finished running. This will work fine in d.py:
import code.t
def f():
    print code.t
You can call f later and it will work. The reason is that it doesn't need to use code.t until after d finished executing, and after d finishes executing, it can go back and finish executing t.
To reiterate, the main moral of the story is don't use circular imports. It leads to all kinds of headaches. Instead, factor out common code into a third module imported by both modules.
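The difference between using a name while the import is still in progress and deferring its use until later can be seen in a small Python 3 sketch. It uses plain top-level modules with made-up names (eager_t, eager_d, lazy_t, lazy_d) written to a temporary directory, rather than the package layout from the question, and it pulls a function out of the half-imported module instead of the module itself.

```python
import importlib
import sys
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())

# Eager pair: each module needs a name from the other while it is still loading.
(tmp / "eager_t.py").write_text("from eager_d import g\ndef f():\n    return 'f'\n")
(tmp / "eager_d.py").write_text("from eager_t import f\ndef g():\n    return 'g'\n")

# Deferred pair: lazy_d only touches lazy_t inside a function body.
(tmp / "lazy_t.py").write_text("import lazy_d\ndef f():\n    return 'f'\n")
(tmp / "lazy_d.py").write_text("import lazy_t\ndef g():\n    return lazy_t.f()\n")

sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

try:
    import eager_t
    eager_failed = False
except ImportError as exc:
    # eager_d asked for eager_t.f before eager_t had defined it.
    eager_failed = True
    print("eager import failed:", exc)

import lazy_t
import lazy_d
print(lazy_d.g())  # works: lazy_t.f is looked up only when g() is called
```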
from abc import xyz
is equivalent to doing
xyz = __import__('abc').xyz
Since if you merely import abc, abc.xyz won't exist without a separate import (unless abc/__init__.py contains an explicit import for xyz), what you're seeing is expected behavior.
The problem is that abc is a predefined standard library module, and just creating a subdirectory of the same name with an __init__.py in it doesn't change that fact. Rename the folder that contains the __init__.py file to something that does not clash with a standard library module (and is not a Python keyword), and then both forms of import should execute without error.
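The point about the submodule attribute not existing until the submodule has been imported can be checked with a throwaway package. This is a sketch; demo_pkg and its xyz module are hypothetical names created in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")        # empty __init__.py, as in the question
(pkg_dir / "xyz.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg
attr_before = hasattr(demo_pkg, "xyz")  # the submodule has not been loaded yet

import demo_pkg.xyz
attr_after = hasattr(demo_pkg, "xyz")   # importing the submodule binds it on the package

print(attr_before, attr_after, demo_pkg.xyz.VALUE)
```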
I just started experimenting with a new technique I name (for the moment at least) "module duck typing".
Example:
Main Module
import somepackage.req ## module required by all others
import abc
import Xyz
Module abc
import sys

__all__=[]

def getBus():
    """ Locates the `req` for this application """
    for mod_name in sys.modules:
        if mod_name.find("req") > 0:
            return sys.modules[mod_name].__dict__["Bus"]
    raise RuntimeError("cannot find `req` module")

Bus=getBus()
In module abc I do not need to explicitly import req: it could be anywhere in the package hierarchy. Of course this requires some discipline...
With this technique, it is easy to relocate packages within the hierarchy.
Are there pitfalls awaiting me? e.g. moving to Python 3K
Updated: after some more testing, I decided to go back to inserting package dependencies directly in sys.path.
There might be all kinds of modules imported that contain "req" and you don't know if it's the module you are actually looking for:
>>> import urllib.request
>>> import tst
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "tst.py", line 12, in <module>
Bus=getBus()
File "tst.py", line 9, in getBus
return sys.modules[mod_name].__dict__["Bus"]
KeyError: 'Bus'
The whole point of packages is that there are namespaces for module hierarchies. Looking up module names "from any package" just causes your code to break randomly if the user happens to import some library that happens to contain a module with a conflicting name.
This technique is dangerous and error prone. It could work with your tests until the day that someone imports a new something.req and gets a confusing, far-off error. (This is the best-case scenario; the current implementation would jump on many other modules.) If you restructure packages, it's easy enough to update your code at that time in an automated fashion, without any magic. Python makes it possible to do all sorts of magical, dynamic things, but that doesn't mean we should.
I think this is more like duck typing. I would also recommend using a more distinctive identifier than "Bus".
def getBus():
    """ Locates the Bus for this application """
    for mod in sys.modules.values():
        if hasattr(mod, 'Bus') and type(mod.Bus) is...: # check other stuff about mod.Bus
            return mod.Bus
    raise RuntimeError("cannot find Bus")
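A runnable variant of this lookup might look like the sketch below. The marker attribute is_application_bus is an assumption introduced here to stand in for the elided check; it is not part of the original code, and the function is renamed find_bus to keep the two apart.

```python
import sys
import types


def find_bus(marker="is_application_bus"):
    """Return the first `Bus` attribute found in a loaded module that carries the marker."""
    # Iterate over a copy: attribute access on some modules can trigger further imports.
    for mod in list(sys.modules.values()):
        try:
            bus = getattr(mod, "Bus", None)
            if bus is not None and getattr(bus, marker, False) is True:
                return bus
        except Exception:
            # Some lazily loaded modules raise on attribute access; skip them.
            continue
    raise RuntimeError("cannot find Bus")


# Usage: register a hypothetical module that provides a marked Bus.
demo_module = types.ModuleType("demo_req_module")


class Bus:
    is_application_bus = True


demo_module.Bus = Bus
sys.modules["demo_req_module"] = demo_module
print(find_bus() is Bus)
```

This still scans every loaded module, so the earlier objections about fragility apply; an explicit import remains the more predictable choice.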