I have a testing environment to try to understand how python circular dependencies can be avoided importing the modules with an import x statement, instead of using a from x import y:
test/
__init__.py
testing.py
a/
__init__.py
m_a.py
b/
__init__.py
m_b.py
The files have the following content:
testing.py:
from a.m_a import A
m_a.py:
import b.m_b
print b.m_b
class A:
pass
m_b.py:
import a.m_a
print a.m_a
class B:
pass
There is a situation which I can't understand:
If I remove the print statements from modules m_a.py and m_b.py or only from m_b.py this works OK, but if the print is present at m_b.py, then the following error is thrown:
File "testing.py", line 1, in <module>
from a.m_a import A
File "/home/enric/test/a/m_a.py", line 1, in <module>
import b.m_b
File "/home/enric/test/b/m_b.py", line 3, in <module>
print a.m_a
AttributeError: 'module' object has no attribute 'm_a'
Do you have any ideas?
It only "works" with the print statements removed because you're not actually doing anything that depends on the imports. It's still a broken circular import.
Either run this in the debugger, or add a print statement after each line, and you'll see what happens:
testing.py: from a.m_a import A
a.m_a: import b.m_b
b.m_b: import a.m_a
b.m_b: print a.m_a
It's clearly trying to access a.m_a before the module finished importing. (In fact, you can see the rest of a.m_a on the stack in your backtrace.)
If you dump out sys.modules at this point, you'll find two partial modules named a and a.m_a, but if you dir(a), there's no m_a there yet.
As far as I can tell, the fact that m_a doesn't get added to a until m_a.py finishes evaluating is not documented anywhere in the Python 2.7 documentation. (3.x has much a more complete specification of the import process—but it's also a very different import process.) So, you can't rely on this either failing or succeeding; either one is perfectly legal for an implementation. (But it happens to fail in at least CPython and PyPy…)
More generally, using import foo instead of from foo import bar doesn't magically solve all circular-import problems. It just solves one particular class of circular-import problems (or, rather, makes that class moot). (I realize there is some misleading text in the FAQ about this.)
There are various tricks to work around circular imports while still letting you have circular top-level dependencies. But really, it's almost always simpler to get rid of the circular top-level dependencies.
In this toy case, there's really no reason for a.m_a to depend on b.m_b at all. If you need some that prints out a.m_a, there are better ways to get it than from a completely independent package!
In real-life code, there probably is some stuff in m_a that m_b needs and vice-versa. But usually, you can separate it out into two levels: stuff in m_a that needs m_b, and stuff in m_a that's needed by m_b. So, just split it into two modules. It's really the same thing as the common fix for a bunch of modules that try to reach back up and import main: split a utils off main.
What if there really is something that m_b needs from m_a, that also needs m_b? Well, in that case, you may have to insert a level of indirection. For example, maybe you can pass the thing-from-m_b into the function/constructor/whatever from m_a, so it can access it as a local parameter value instead of as a global. (It's hard to be more specific without a more specific problem.)
If worst comes to worst, and you can't remove the import via indirection, you have to move the import out of the way. That may again mean doing an import inside a function call, etc. (as explained in the FAQ immediately after the paragraph that set you off), or just moving some code above the import, or all kinds of other possibilities. But consider these last-ditch solutions to something which just can't be designed cleanly, not a roadmap to follow for your designs.
Related
I'm working with a project that contains about 30 unique modules. It wasn't designed too well, so it's common that I create circular imports when adding some new functionality to the project.
Of course, when I add the circular import, I'm unaware of it. Sometimes it's pretty obvious I've made a circular import when I get an error like AttributeError: 'module' object has no attribute 'attribute' where I clearly defined 'attribute'. But other times, the code doesn't throw exceptions because of the way it's used.
So, to my question:
Is it possible to programmatically detect when and where a circular import is occuring?
The only solution I can think of so far is to have a module importTracking that contains a dict importingModules, a function importInProgress(file), which increments importingModules[file], and throws an error if it's greater than 1, and a function importComplete(file) which decrements importingModules[file]. All other modules would look like:
import importTracking
importTracking.importInProgress(__file__)
#module code goes here.
importTracking.importComplete(__file__)
But that looks really nasty, there's got to be a better way to do it, right?
To avoid having to alter every module, you could stick your import-tracking functionality in a import hook, or in a customized __import__ you could stick in the built-ins -- the latter, for once, might work better, because __import__ gets called even if the module getting imported is already in sys.modules, which is the case during circular imports.
For the implementation I'd simply use a set of the modules "in the process of being imported", something like (benjaoming edit: Inserting a working snippet derived from original):
beingimported = set()
originalimport = __import__
def newimport(modulename, *args, **kwargs):
if modulename in beingimported:
print "Importing in circles", modulename, args, kwargs
print " Import stack trace -> ", beingimported
# sys.exit(1) # Normally exiting is a bad idea.
beingimported.add(modulename)
result = originalimport(modulename, *args, **kwargs)
if modulename in beingimported:
beingimported.remove(modulename)
return result
import __builtin__
__builtin__.__import__ = newimport
Not all circular imports are a problem, as you've found when an exception is not thrown.
When they are a problem, you'll get an exception the next time you try to run any of your tests. You can change the code when this happens.
I don't see any change required from this situation.
Example of when it's not a problem:
a.py
import b
a = 42
def f():
return b.b
b.py
import a
b = 42
def f():
return a.a
Circular imports in Python are not like PHP includes.
Python imported modules are loaded the first time into an import "handler", and kept there for the duration of the process. This handler assigns names in the local namespace for whatever is imported from that module, for every subsequent import. A module is unique, and a reference to that module name will always point to the same loaded module, regardless of where it was imported.
So if you have a circular module import, the loading of each file will happen once, and then each module will have names relating to the other module created into its namespace.
There could of course be problems when referring to specific names within both modules (when the circular imports occur BEFORE the class/function definitions that are referenced in the imports of the opposite modules), but you'll get an error if that happens.
import uses __builtin__.__import__(), so if you monkeypatch that then every import everywhere will pick up the changes. Note that a circular import is not necessarily a problem though.
main:
import fileb
favouritePizza = "pineapple"
fileb.eatPizza
fileb:
from main import favouritePizza
def eatPizza():
print("favouritePizza")
This is what I tried, however, I get no such attribute. I looked at other problems and these wouldn't help.
This is classic example of circular dependency. main imports fileb, while fileb requires main.
Your case is hard (impossible?) to solve even in theory. In reality, python import machinery does even less expected thing — every time you import anything from some module, whole module is read and imported into global(per process) module namespace. Actually, from module import function is just a syntax sugar that gives you ability to not litter your namespace with everything form particular module (from module import *), but behind the scenes, its the almost the same as import module; module.function(...).
From composition/architecture point of view, basic program structure is:
top modules import bottom
bottom modules get parameters from top when being called
You probably want to use favouritePizza variable somehow in fileb, i.e. in eatPizza functions. Good thing to do is to make this function accept parameter, that will be passed from any place that uses it:
# fileb.py
def eatPizza(name):
print(name)
And call it with
# main.py
import fileb
favouritePizza = "pineapple"
fileb.eatPizza(favouritePizza)
I have this code
mainModule
from src.comp.mypackage.wait import Wait
from src.comp.mypackage.men import Men, MenItem
""" Code and stuff using Men and MenItem """
if __name__ == '__main__':
MenuTestDrive.main()
men
from abc import ABCMeta, abstractmethod
from src.comp.mypackage.util import NullUtil, CompUtil
util
from src.comp.mypackage.stack import Stack
from src.comp.mypackage.men import Men
""" Code and stuff using Men and MenItem """
and when running mainModule I'm being given this error:
Traceback (most recent call last):
File "/home/user/PycharmProjects/pythonProj/pythonDesignPatterns/src/comp/mypackage/mainModule.py", line 2, in <module>
from pythonDesignPatterns.src.comp.mypackage.men import Men, MenItem
File "/home/user/PycharmProjects/pythonProj/pythonDesignPatterns/src/comp/mypackage/men.py", line 2, in <module>
from pythonDesignPatterns.src.comp.mypackage.iterator import NullUtil, CompUtil
File "/home/user/PycharmProjects/pythonProj/pythonDesignPatterns/src/comp/mypackage/util.py", line 2, in <module>
from pythonDesignPatterns.src.comp.mypackage.men import Men
ImportError: cannot import name 'Men'
I'm using pyCharm, but the error at the command line is the same.
I could provide more code, but I don't think there is anything fancy with the use of classes and would only distract.
Where should I being to look for failures?
TL;DR : python doesn't allow circular imports, so you can't have module men importing from module util if module util imports from module men.
Longer answer:
You have to understand that in Python import, class, def etc are actually executable statements and everything (or almost) happens at runtime. When a module is imported for the first time in a given process, all the top-level statements are executed sequentially, a module instance object is created with all top-level names as attributes (note that class, def and import all bind names) and inserted into the sys.modules cache dict so next import of the same module will retrieve it directly from the cache.
In your case, when first imported, the men module tries to import the util module, which is not in sys.modules yet so the Python runtime locates the util.py (or .pyc) file and executes it. Then it reaches the from src.comp.mypackage.men import Men. At this point, men.py has not yet been fully executed and so has no Men attribute.
The canonical solution is to either extract circular dependancies into a third module or merge the two modules into a single one, depending one what makes sense for your concrete case (the goal being, as always, to have modules with low coupling and high cohesion). FWIW, circular dependancies are considered bad design whatever the language and even if the language supports them.
Sometimes (most often in complex frameworks that impose some particular structure to your code and a specific import order), you can end up with circular dependancies chains that are either much more complex (like A.funcX depends on B.y, B depends on C which depends on D which depends on E which finally depends on A.funcZ) and/or very difficult to cleanly refactor in a way that makes sense. As a last resort you still have the possibility to defer some import statement within the function (in the above it would be inside A.funcX). This is still considered a bad practice and should really only be used as the very last resort.
As a side note: from your packages naming scheme I can smell a strong Java influence. Python is not Java ! Not that there's anything wrong with Java, it's just that these are two wildly different languages, with wildly different designs, idioms and philosopies.
Trying to forcefit Java idioms and habits in Python will be an experience in pain and frustration at best (been here, done that...), so my advice here would be: forget about mostly anything you learned with Java and start learning Python instead (not just the syntax - syntax is actually only a part of a language and not necessarily the most important one). In Python we favor flat over nested, and don't try to have one module per class, you can have a whole (micro)framework in one single module, that's ok if there's no practical reason to split it into submodules.
I have file A.py:
...
from System.IO import Directory
...
execfile('B.py')
and file B.py:
...
result = ['a','b'].Contains('a')
Such combination works fine.
But if I comment this particular import line at A.py then B.py complains:
AttributeError: 'list' object has no attribute 'Contains'
It looks strange for me, especially B.py alone runs fine.
Is there some 'list' override in module System.IO? Is there way to determine what is changed during this import? Or avoid such strange behavior?
B.py on its own (on IPy 2.7.4) also results in the error you provided. It should as there is no built-in method that would bind for that call. You could also try to reproduce this on standard python to make sure.
The way you "include" B.py into A.py has inherent risks and problems at least because of missing control over the scope the code in B is executed against. If you wrap your code in a proper module/classes you can be sure that there are no issues of that sort.
This could (simplified, just a function, no classes etc.) look like:
from B import getResult
getResult()
from System.IO import Directory
getResult()
and
def getResult():
from System.IO import Directory
result = ['a','b'].Contains('a')
return result
As you can see the function getResult can ensure that all imports and other scope properties are correct and it will work whether or not the call site has the given import.
As for why the import of System.IO.Directory causes binding to IronPython.dll!IronPython.Runtime.List.Contains(object value) is not clear and would require a close look at IronPython's internal implementation.
If you mind the actual import to System.IO.Directory but don't want to refactor the code you could go with LINQ for IronPython/ImportExtensions as that would provide a Contains method as well. This will not reduce but actually increase your importing boilerplate but be more to the point.
import clr
clr.AddReference("System.Core")
import System
clr.ImportExtensions(System.Linq)
result = ['a','b'].Contains('a')
Out of the various ways to import code, are there some ways that are preferable to use, compared to others? This link http://effbot.org/zone/import-confusion.htm in short
states that
from foo.bar import MyClass
is not the preferred way to import MyClass under normal circumstances or unless you know what you are doing. (Rather, a better way would like:
import foo.bar as foobaralias
and then in the code, to access MyClass use
foobaralias.MyClass
)
In short, it seems that the above-referenced link is saying it is usually better to import everything from a module, rather than just parts of the module.
However, that article I linked is really old.
I've also heard that it is better, at least in the context of Django projects, to instead only import the classes you want to use, rather than the whole module. It has been said that this form helps avoid circular import errors or at least makes the django import system less fragile. It was pointed out that Django's own code seems to prefer "from x import y" over "import x".
Assuming the project I am working on doesn't use any special features of __init__.py ... (all of our __init__.py files are empty), what import method should I favor, and why?
First, and primary, rule of imports: never ever use from foo import *.
The article is discussing the issue of cyclical imports, which still exists today in poorly-structured code. I dislike cyclical imports; their presence is a strong sign that some module is doing too much, and needs to be split up. If for whatever reason you need to work with code with cyclical imports which cannot be re-arranged, import foo is the only option.
For most cases, there's not much difference between import foo and from foo import MyClass. I prefer the second, because there's less typing involved, but there's a few reasons why I might use the first:
The module and class/value have different names. It can be difficult for readers to remember where a particular import is coming from, when the imported value's name is unrelated to the module.
Good: import myapp.utils as utils; utils.frobnicate()
Good: import myapp.utils as U; U.frobnicate()
Bad: from myapp.utils import frobnicate
You're importing a lot of values from one module. Save your fingers, and reader's eyes.
Bad: from myapp.utils import frobnicate, foo, bar, baz, MyClass, SomeOtherClass, # yada yada
For me, it's dependent on the situation. If it's a uniquely named method/class (i.e., not process() or something like that), and you're going to use it a lot, then save typing and just do from foo import MyClass.
If you're importing multiple things from one module, it's probably better to just import the module, and do module.bar, module.foo, module.baz, etc., to keep the namespace clean.
You also said
It has been said that this form helps avoid circular import errors or at least makes the django import system less fragile. It was pointed out that Django's own code seems to prefer "from x import y" over "import x".
I don't see how one way or the other would help prevent circular imports. The reason is that even when you do from x import y, ALL of x is imported. Only y is brought into the current namespace, but the entire module x is processed. Try out this example:
In test.py, put the following:
def a():
print "a"
print "hi"
def b():
print "b"
print "bye"
Then in 'runme.py', put:
from test import b
b()
Then just do python runme.py
You'll see the following output:
hi
bye
b
So everything in test.py was run, even though you only imported b
The advantage of the latter is that the origin of MyClass is more explicit. The former puts MyClass in the current namespace so the code can just use MyClass unqualified. So it's less obvious to someone reading the code where MyClass is defined.