Module "duck typing" pitfalls?

I just started experimenting with a new technique I name (for the moment at least) "module duck typing".
Example:
Main Module
import somepackage.req  ## module required by all others
import abc
import Xyz
Module abc
import sys
__all__ = []
def getBus():
    """ Locates the `req` for this application """
    for mod_name in sys.modules:
        if mod_name.find("req") > 0:
            return sys.modules[mod_name].__dict__["Bus"]
    raise RuntimeError("cannot find `req` module")
Bus = getBus()
In module abc I do not need to explicitly import req: it could be anywhere in the package hierarchy. Of course this requires some discipline...
With this technique, it is easy to relocate packages within the hierarchy.
Are there pitfalls awaiting me? e.g. moving to Python 3K
Updated: after some more testing, I decided to go back to inserting package dependencies directly in sys.path.

There might be all kinds of modules imported that contain "req" and you don't know if it's the module you are actually looking for:
>>> import urllib.request
>>> import tst
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tst.py", line 12, in <module>
    Bus=getBus()
  File "tst.py", line 9, in getBus
    return sys.modules[mod_name].__dict__["Bus"]
KeyError: 'Bus'
The whole point of packages is that there are namespaces for module hierarchies. Looking up module names "from any package" just causes your code to break randomly if the user happens to import some library that happens to contain a module with a conflicting name.
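A minimal sketch of why the substring test is fragile, reusing the question's `mod_name.find("req") > 0` check. The helper name `find_req_candidates` is hypothetical and exists only for this illustration.

```python
import sys
import urllib.request  # unrelated standard library module whose name contains "req"


def find_req_candidates(module_names):
    """Return every module name that the question's substring test accepts."""
    return sorted(name for name in module_names if name.find("req") > 0)


# Any imported module with "req" somewhere after the first character matches.
candidates = find_req_candidates(sys.modules)
print(candidates)

# A top-level module literally named "req" is NOT matched, because
# str.find() returns 0 for it and the test requires a value > 0.
top_level_match = find_req_candidates(["req"])
print(top_level_match)
```

So the lookup both over-matches (unrelated modules) and under-matches (a top-level `req`), which is a good reason to prefer an explicit import.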

This technique is dangerous and error-prone. It could work with your tests until the day that someone imports a new something.req and gets a confusing, far-off error. (This is the best-case scenario; the current implementation would also match many other modules.) If you restructure packages, it's easy enough to update your code at that time in an automated fashion, without any use of magic. Python makes it possible to do all sorts of magical, dynamic things, but that doesn't mean we should.

I think this is more like duck typing. I would also recommend using a more unique identifier than "Bus".
def getBus():
    """ Locates the Bus for this application """
    for mod in sys.modules.values():
        if hasattr(mod, 'Bus') and type(mod.Bus) is...: # check other stuff about mod.Bus
            return mod.Bus
    raise RuntimeError("cannot find Bus")
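One possible way to fill in the "check other stuff" part, as a rough sketch only. The marker attribute `is_application_bus` and the function name `find_bus` are assumptions made up for this example; the original answer does not say what the check should be.

```python
import sys
import types

BUS_MARKER = "is_application_bus"  # hypothetical marker attribute, not from the question


def find_bus(modules=None):
    """Return the first module-level `Bus` that carries the marker attribute."""
    # Copy the values first so the dict cannot change size while we iterate.
    candidates = list((sys.modules if modules is None else modules).values())
    for mod in candidates:
        bus = getattr(mod, "Bus", None)
        if bus is not None and getattr(bus, BUS_MARKER, False) is True:
            return bus
    raise RuntimeError("cannot find Bus")


# Usage with two fake modules: only one of them holds a real, marked Bus.
class Bus:
    is_application_bus = True


real = types.ModuleType("somepackage.req")
real.Bus = Bus
decoy = types.ModuleType("urllib.request")
decoy.Bus = "not a bus"

found = find_bus({"urllib.request": decoy, "somepackage.req": real})
print(found)
```

This narrows the match, but it still depends on import order and on nobody else defining a marked `Bus`, so the objections above still apply.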

Related

Custom typing in Python [duplicate]

I'm getting this error
Traceback (most recent call last):
  File "/Users/alex/dev/runswift/utils/sim2014/simulator.py", line 3, in <module>
    from world import World
  File "/Users/alex/dev/runswift/utils/sim2014/world.py", line 2, in <module>
    from entities.field import Field
  File "/Users/alex/dev/runswift/utils/sim2014/entities/field.py", line 2, in <module>
    from entities.goal import Goal
  File "/Users/alex/dev/runswift/utils/sim2014/entities/goal.py", line 2, in <module>
    from entities.post import Post
  File "/Users/alex/dev/runswift/utils/sim2014/entities/post.py", line 4, in <module>
    from physics import PostBody
  File "/Users/alex/dev/runswift/utils/sim2014/physics.py", line 21, in <module>
    from entities.post import Post
ImportError: cannot import name Post
and you can see that I use the same import statement further up and it works. Is there some unwritten rule about circular importing? How do I use the same class further down the call stack?
See also What happens when using mutual or circular (cyclic) imports in Python? for a general overview of what is allowed and what causes a problem WRT circular imports. See What can I do about "ImportError: Cannot import name X" or "AttributeError: ... (most likely due to a circular import)"? for techniques for resolving and avoiding circular dependencies.

I think the answer by jpmc26, while by no means wrong, comes down too heavily on circular imports. They can work just fine, if you set them up correctly.
The easiest way to do so is to use import my_module syntax, rather than from my_module import some_object. The former will almost always work, even if my_module imports us back. The latter only works if some_object is already defined in my_module, which in a circular import may not be the case.
To be specific to your case: Try changing entities/post.py to do import physics and then refer to physics.PostBody rather than just PostBody directly. Similarly, change physics.py to do import entities.post and then use entities.post.Post rather than just Post.
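Here is a small self-contained sketch of that suggestion. The modules `demo_post` and `demo_physics` are hypothetical stand-ins for entities/post.py and physics.py, written to a temporary directory so the example can run on its own.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-ins for entities/post.py and physics.py.
tmp = Path(tempfile.mkdtemp())
(tmp / "demo_post.py").write_text(textwrap.dedent("""\
    import demo_physics  # circular, but this only binds the module object

    class Post:
        def make_body(self):
            # The attribute is looked up when the method runs, not at import time.
            return demo_physics.PostBody()
"""))
(tmp / "demo_physics.py").write_text(textwrap.dedent("""\
    import demo_post  # demo_post is only partially initialised here; that is fine

    class PostBody:
        def owner_type(self):
            return demo_post.Post
"""))

sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
demo_post = importlib.import_module("demo_post")
body = demo_post.Post().make_body()
print(type(body).__name__, body.owner_type().__name__)
```

Both modules import each other, yet nothing touches the other module's attributes until a method is actually called, by which time both modules have finished loading.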

When you import a module (or a member of it) for the first time, the code inside the module is executed sequentially like any other code; i.e., it is not treated any differently than the body of a function. An import is just a statement like any other (assignment, a function call, def, class). Assuming your imports occur at the top of the script, here's what's happening:
1. When you try to import World from world, the world script gets executed.
2. The world script imports Field, which causes the entities.field script to get executed.
3. This process continues until you reach the entities.post script, because you tried to import Post.
4. The entities.post script causes the physics module to be executed, because it tries to import PostBody.
5. Finally, physics tries to import Post from entities.post.
6. At this point entities.post is already registered in sys.modules, but it is only partially initialized: it hasn't finished executing, so it doesn't yet have a Post member.
7. An error therefore occurs because Post is not there to be imported.
So no, it's not "working further up in the call stack". This is a stack trace of where the error occurred, which means it errored out trying to import Post in that module. You shouldn't use circular imports. At best, they have negligible benefit (typically, no benefit), and they cause problems like this. They burden any developer maintaining the code, forcing them to walk on eggshells to avoid breaking it. Refactor your module organization.
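The failure described above can be reproduced in a few lines. This is only a sketch: `cyc_post` and `cyc_physics` are hypothetical modules created in a temporary directory, mirroring the shape of entities/post.py and physics.py.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
(tmp / "cyc_post.py").write_text(textwrap.dedent("""\
    from cyc_physics import PostBody  # starts executing cyc_physics first

    class Post:
        pass
"""))
(tmp / "cyc_physics.py").write_text(textwrap.dedent("""\
    from cyc_post import Post  # cyc_post is in sys.modules but has no Post yet

    class PostBody:
        pass
"""))

sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
try:
    importlib.import_module("cyc_post")
    error = None
except ImportError as exc:
    error = exc

print(type(error).__name__, error)
```

The exception is raised from inside `cyc_physics`, at the point where it asks the half-executed `cyc_post` for a name that has not been defined yet.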

To understand circular dependencies, you need to remember that Python is essentially a scripting language. Statements outside functions are executed when the module is first imported or run, which this answer loosely calls "compile time". Import statements are executed just like function calls, and to understand them you should think about them like function calls.
When you do an import, what happens depends on whether the module you are importing already exists in the module table (sys.modules). If it does, Python uses whatever is currently in that module's symbol table. If not, Python begins reading the module file, compiling/executing/importing whatever it finds there. Symbols referenced at compile time are found or not, depending on whether they have already been defined or are yet to be defined.
Imagine you have two source files:
File X.py
def X1():
    return "x1"
from Y import Y2
def X2():
    return "x2"
File Y.py
def Y1():
    return "y1"
from X import X1
def Y2():
    return "y2"
Now suppose you import X, for example by running import X in a fresh interpreter. Python begins by defining the function X1, and then hits the import statement in X.py. This causes it to pause execution of X.py and begin executing Y.py. Shortly thereafter it hits the import statement in Y.py. Since X is already in the module table, Python uses the existing, incomplete symbol table of X to satisfy any references requested. Any symbols appearing before the import statement in X.py are now in the symbol table, but any symbols after it are not. Since X1 appears before the import statement, it is successfully imported. Python then resumes executing Y.py. In doing so it defines Y2 and finishes Y.py. It then resumes X.py, and finds Y2 in the symbol table of Y. The import eventually completes without error.
Something very different happens if you import Y first. While executing Y.py, Python hits the import statement before it defines Y2. Then it starts executing X.py. Soon it hits the import statement in X.py that requires Y2. But Y2 is undefined, so the import fails with an ImportError.
Please note that if you modify X.py to import Y1, the import will always succeed, no matter which module is imported first. However, if you modify Y.py to import symbol X2, neither import order will work.
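The order dependence can be checked with a short experiment. This is a sketch under the assumption that the two files are imported as modules (not run as scripts); they are named `demo_x` and `demo_y` here, hypothetical names chosen to avoid clashing with anything real.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
(tmp / "demo_x.py").write_text(textwrap.dedent("""\
    def X1():
        return "x1"

    from demo_y import Y2

    def X2():
        return "x2"
"""))
(tmp / "demo_y.py").write_text(textwrap.dedent("""\
    def Y1():
        return "y1"

    from demo_x import X1

    def Y2():
        return "y2"
"""))
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()


def import_first(name):
    """Import `name` from a clean slate and report whether it succeeded."""
    for mod in ("demo_x", "demo_y"):
        sys.modules.pop(mod, None)
    try:
        importlib.import_module(name)
        return "ok"
    except ImportError:
        return "ImportError"


results = {name: import_first(name) for name in ("demo_x", "demo_y")}
print(results)
```

Importing `demo_x` first succeeds because `X1` is defined before the circular import; importing `demo_y` first fails because `Y2` is defined after it.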
Whenever module X, or any module imported by X, might import the current module, do NOT use:
from X import Y
Whenever you think there may be a circular import, you should also avoid import-time (module-level) references to variables in other modules. Consider the innocent-looking code:
import X
z = X.Y
Suppose module X imports this module before this module imports X. Further suppose Y is defined in X after the import statement. Then Y will not be defined when this module is imported, and you will get an AttributeError. If this module imports X first, you can get away with it. But when one of your co-workers innocently changes the order of definitions in a third module, the code will break.
In some cases you can resolve circular dependencies by moving an import statement down below symbol definitions needed by other modules. In the examples above, definitions before the import statement never fail. Definitions after the import statement sometimes fail, depending on the order of imports. You can even put import statements at the end of a file, so long as none of the imported symbols are needed at import time.
Note that moving import statements down in a module obscures what you are doing. Compensate for this with a comment at the top of your module something like the following:
#import X (actual import moved down to avoid circular dependency)
In general this is a bad practice, but sometimes it is difficult to avoid.

For those of you who, like me, come to this issue from Django, you should know that the docs provide a solution:
https://docs.djangoproject.com/en/1.10/ref/models/fields/#foreignkey
"...To refer to models defined in another application, you can explicitly specify a model with the full application label. For example, if the Manufacturer model above is defined in another application called production, you’d need to use:
class Car(models.Model):
    manufacturer = models.ForeignKey(
        'production.Manufacturer',
        on_delete=models.CASCADE,
    )
This sort of reference can be useful when resolving circular import dependencies between two applications...."

I was able to import the module only within the function that requires the objects from this module:
def my_func():
    import foo  # deferred until my_func() is called
    foo_instance = foo.Foo()
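A runnable sketch of the function-level import, assuming two hypothetical modules `lazy_a` and `lazy_b` that depend on each other. Only `lazy_b` imports at the top of the file; `lazy_a` defers its import until the function is called.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
(tmp / "lazy_a.py").write_text(textwrap.dedent("""\
    class A:
        pass

    def make_b():
        from lazy_b import B  # executed only when make_b() is called
        return B()
"""))
(tmp / "lazy_b.py").write_text(textwrap.dedent("""\
    from lazy_a import A  # safe: lazy_a has no top-level import of lazy_b

    class B:
        partner = A
"""))

sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
lazy_a = importlib.import_module("lazy_a")
b = lazy_a.make_b()
print(type(b).__name__, b.partner.__name__)
```

By the time `make_b()` runs, `lazy_a` has finished loading, so the top-level `from lazy_a import A` inside `lazy_b` finds everything it needs.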

If you run into this issue in a fairly complex app it can be cumbersome to refactor all your imports. PyCharm offers a quickfix for this that will automatically change all usage of the imported symbols as well.

I was using the following:
from module import Foo
foo_instance = Foo()
but to get rid of the circular reference I did the following, and it worked:
import module
foo_instance = module.Foo()

According to this answer, we can import another module's object inside a block (such as a function or method) without a circular import error occurring. For example, to import the Simple object from the another.py module, you can use this:
def get_simple_obj():
    from another import Simple
    return Simple
class Example(get_simple_obj()):
    pass
class NotCircularImportError:
    pass
In this situation, the another.py module can import NotCircularImportError. Be aware that get_simple_obj() is called while this module is still being imported, because the base class of Example is evaluated when the class statement runs, so it still requires another.py to have defined Simple by then.

Also check that your file does not have the same name as the library you are importing. For example, a file named sympy.py that contains:
import sympy as sym
will import itself instead of the library.

Changing sys.modules caused an unexpected KeyError

Some automated tests in a larger system need to be able to import a module, and then restore sys.modules to its original condition.
But this code fragment:
import sys
sys.modules = dict(sys.modules)
import pickle
causes this KeyError in Python 3.6-3.8:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "[...]/python3.6/pickle.py", line 1562, in <module>
    from _pickle import (
KeyError: '_compat_pickle'
It seems as if only pickle and modules that depend on it like multiprocessing are affected. I've investigated _compat_pickle - it's a module for pickling compatibility with Python 2 - but nothing jumps out that would cause this.
Is there a safe way to restore sys.modules back to an earlier state? And what is the mechanism behind this unexpected KeyError?

The problem is that sys.modules is a lie (I think). It is not actually the true source of the modules dict. That is stored at the C level in the current interpreter, and sys.modules is just a reference to it, so rebinding sys.modules to a new dict does not change the dict the interpreter itself uses. _pickle is special, since it imports a module from C source, which I assume leads to this error (a mismatch between what tstate->interp->modules says is imported and what sys.modules thinks is imported).
This might be considered a bug in Python. There is an existing bug report: https://bugs.python.org/issue12633 .
You could just save which keys are in modules before and after the code, and delete all other entries afterwards.
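A minimal sketch of that suggestion, written as a context manager. The name `restore_imported_modules` is made up for this example. It mutates the existing sys.modules dict in place instead of rebinding it, which is the part that matters for the KeyError above.

```python
import sys
from contextlib import contextmanager


@contextmanager
def restore_imported_modules():
    """Remove any sys.modules entries that were added inside the block."""
    before = set(sys.modules)
    try:
        yield
    finally:
        # Delete only the new keys; the dict object itself is never replaced.
        for name in set(sys.modules) - before:
            sys.modules.pop(name, None)


was_loaded = "colorsys" in sys.modules
with restore_imported_modules():
    import colorsys  # any module that is probably not loaded yet

    loaded_inside = "colorsys" in sys.modules
loaded_after = "colorsys" in sys.modules
print(was_loaded, loaded_inside, loaded_after)
```

Note that this only forgets the entries; existing references to the module objects stay valid, and a later import simply executes the module again.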

"ImportError: cannot import name ..." - raised on "import from" but not on direct import

What is the precise rule under which this exception is raised by the Python 3 interpreter?
There are plenty of SO questions about that, with excellent answers, but I could not find one that gave a clear, general, and logically precise definition of the circumstances when this exception occurs.
The documentation doesn't seem to be clear either. It says:
exception ImportError
Raised when an import statement fails to find
the module definition or when a from ... import fails to find a name
that is to be imported.
But this seems inconsistent with the following example.
I meant to ask for a general definition rather than a specific case, but to clarify my concerns, here's an example:
# code/t.py:
from code import d
# code/d.py
from code import t
Running module t.py from the command line results in ImportError: cannot import name d.
On the other hand, the following code doesn't raise exceptions:
# code/t.py:
import code.d
# code/d.py
import code.t
At all times, __init__.py is empty.
In this example, the only modules or names mentioned in the import statement are t and d, and they were both clearly found. If the documentation implies that some name within the d module isn't found, it's certainly not obvious; and on top of that, I'd expect it to raise NameError: name ... is not defined exception rather than ImportError.

If abc is a package and xyz is a module, and if abc's __init__.py defines an __all__ that does not include xyz, then you won't be able to do from abc import xyz, but you'll still be able to do import abc.xyz.
Edit: The short answer is: your problem is that your imports are circular. Modules t and d try to import each other. This won't work. Don't do it. I'm going to explain the whole thing below, but the explanation is pretty long.
To understand why it gives an ImportError, try to follow the code execution. If you look at the full traceback instead of just the final part, you can see what it's doing. With your setup I get a traceback like this (I called the package "testpack" instead of "code"):
Traceback (most recent call last):
  File "t.py", line 1, in <module>
    from testpack import d
  File "C:\Documents and Settings\BrenBarn\My Documents\Python\testpack\d.py", line 1, in <module>
    from testpack import t
  File "C:\Documents and Settings\BrenBarn\My Documents\Python\testpack\t.py", line 1, in <module>
    from testpack import d
ImportError: cannot import name d
You can see what Python is doing here.
1. In loading t.py, the first thing it sees is from testpack import d.
2. At that point, Python executes the d.py file to load that module.
3. But the first thing it finds there is from testpack import t.
4. It is already loading t.py, but t as the main script is different from t as a module, so it tries to load t.py again.
5. The first thing it sees is from testpack import d, which would mean it should try to load d.py . . . but it already was trying to load d.py back in step 2. Since trying to import d led back to trying to import d again, Python realizes it can't import d and throws ImportError.
Step 4 is kind of anomalous here because you ran a file in the package directly, which isn't the usual way to do things. See this question for an explanation of why importing a module is different from running it directly. If you try to import t instead (with from testpack import t), Python realizes the circularity one step sooner, and you get a simpler traceback:
>>> from testpack import t
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    from testpack import t
  File "C:\Documents and Settings\BrenBarn\My Documents\Python\testpack\t.py", line 1, in <module>
    from testpack import d
  File "C:\Documents and Settings\BrenBarn\My Documents\Python\testpack\d.py", line 1, in <module>
    from testpack import t
ImportError: cannot import name t
Notice that here the error is that it can't import t. It knows it can't, because when I told it to import t, it found itself looping back to import t again. In your original example, it didn't notice it was running t.py twice, because the first time was the main script and the second was an import, so it took one more step and tried to import d again.
Now, why doesn't this happen when you do import code.d? The answer is simply that you don't actually try to use the imported modules. In this case, it happens as follows (I'm going to explain as if you did from code import t rather than running it as a script):
1. It starts to import t. When it does this, it provisionally marks the module code.t as imported, even though it's not done importing yet.
2. It finds it has to do import code.d, so it runs d.
3. In d, it finds import code.t, but since code.t is already marked as imported, it doesn't try to import it again.
4. Since d finished without actually using t, it gets to go back and finish loading t. No problem.
The key difference is that the names t and d are not directly accessible to each other here; they are mediated by the package code, so Python doesn't actually have to finish "deciding what t is" until it is actually used. With from code import t, since the value has to be assigned to the variable t, Python has to know what it is right away.
You can see the problem, though, if you make d.py look like this:
import code.t
print(code.t)
Now, after step 2, while running d, it actually tries to access the half-imported module t. This will raise an AttributeError because, since the module hasn't been fully imported yet, it hasn't been attached to the package code.
Note that it would be fine as long as the use of code.t didn't happen until after d finished running. This will work fine in d.py:
import code.t
def f():
    print(code.t)
You can call f later and it will work. The reason is that it doesn't need to use code.t until after d finished executing, and after d finishes executing, it can go back and finish executing t.
To reiterate, the main moral of the story is don't use circular imports. It leads to all kinds of headaches. Instead, factor out common code into a third module imported by both modules.
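As a rough illustration of that last piece of advice, here is a sketch with three hypothetical modules: `common_base` holds the shared code, and `side_t` and `side_d` each import from it instead of from each other.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

tmp = Path(tempfile.mkdtemp())
(tmp / "common_base.py").write_text(textwrap.dedent("""\
    class Shared:
        kind = "shared"
"""))
(tmp / "side_t.py").write_text(textwrap.dedent("""\
    from common_base import Shared  # depends only on the third module

    class T(Shared):
        pass
"""))
(tmp / "side_d.py").write_text(textwrap.dedent("""\
    from common_base import Shared  # no import of side_t, so no cycle

    class D(Shared):
        pass
"""))

sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
side_t = importlib.import_module("side_t")
side_d = importlib.import_module("side_d")
print(side_t.T.kind, side_d.D.kind)
```

Because the dependency graph is now a tree rather than a cycle, either module can be imported first, and from ... import works without surprises.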

from abc import xyz
is equivalent to doing
xyz = __import__('abc').xyz
Since if you merely import abc, abc.xyz won't exist without a separate import (unless abc/__init__.py contains an explicit import for xyz), what you're seeing is expected behavior.

The problem is that abc is a predefined standard library module, and just creating a subdirectory of that same name with an __init__.py in it doesn't change that fact. Rename your package by renaming the folder containing the __init__.py file to something that does not clash with a standard library module (and is not a reserved word such as def), and then both forms of import should execute without error.

Python import hooks: no filenames in traceback of import errors

I've written an import hook according to PEP 302 and it seems to work fine except for one annoying detail.
When there's an import error, say code that tries to import a module that doesn't exist, I get a traceback with lines like:
  File "<string>", line 10, in helloEnv
Line 10 is where the import of the non-existent module resides, but there is no file name, just <string>.
My import hook looks pretty much like the minimal one in PEP 302. When creating the module I always set a proper string value to __file__ and even check that new_module() sets a correct value to __name__. Also, both str() and repr() of the module return something informative.
These nameless files in the traceback make it hard to debug import errors. Where does the traceback take its filenames from? Why doesn't it see the names of the modules?
EDIT - thinking about it some more, it's probably because the module code is executed using exec(). Is it possible to give exec() a filename?

OK, so that was simple enough. Instead of
exec(code, mod.__dict__)
write:
exec(compile(code, fullname, "exec"), mod.__dict__)
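A small sketch showing the effect: the filename passed to compile() is what ends up in the traceback. The module name `hooked_module`, the path `/virtual/hooked_module.py` and the deliberately missing import are all invented for this illustration, and the example passes the module's __file__ rather than fullname.

```python
import traceback
import types

# Source that a hypothetical import hook might have loaded from somewhere.
source = (
    "def hello_env():\n"
    "    import module_that_does_not_exist_12345\n"
)

mod = types.ModuleType("hooked_module")
mod.__file__ = "/virtual/hooked_module.py"  # made-up path, used only as a label

# compile() records the given filename on the code object; a plain
# exec(source, ...) would label these frames "<string>" instead.
exec(compile(source, mod.__file__, "exec"), mod.__dict__)

try:
    mod.hello_env()
except ImportError:
    report = traceback.format_exc()

print(report)
```

The traceback gets its filenames from the code objects, not from the module's __file__ attribute, which is why setting __file__ alone makes no difference.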
