I'm working with a project that contains about 30 unique modules. It wasn't designed too well, so it's common that I create circular imports when adding some new functionality to the project.
Of course, when I add the circular import, I'm unaware of it. Sometimes it's pretty obvious I've made a circular import when I get an error like AttributeError: 'module' object has no attribute 'attribute' where I clearly defined 'attribute'. But other times, the code doesn't throw exceptions because of the way it's used.
So, to my question:
Is it possible to programmatically detect when and where a circular import is occuring?
The only solution I can think of so far is to have a module importTracking that contains a dict importingModules, a function importInProgress(file), which increments importingModules[file], and throws an error if it's greater than 1, and a function importComplete(file) which decrements importingModules[file]. All other modules would look like:
import importTracking
importTracking.importInProgress(__file__)
#module code goes here.
importTracking.importComplete(__file__)
But that looks really nasty, there's got to be a better way to do it, right?
To avoid having to alter every module, you could stick your import-tracking functionality in a import hook, or in a customized __import__ you could stick in the built-ins -- the latter, for once, might work better, because __import__ gets called even if the module getting imported is already in sys.modules, which is the case during circular imports.
For the implementation I'd simply use a set of the modules "in the process of being imported", something like (benjaoming edit: Inserting a working snippet derived from original):
beingimported = set()
originalimport = __import__
def newimport(modulename, *args, **kwargs):
if modulename in beingimported:
print "Importing in circles", modulename, args, kwargs
print " Import stack trace -> ", beingimported
# sys.exit(1) # Normally exiting is a bad idea.
beingimported.add(modulename)
result = originalimport(modulename, *args, **kwargs)
if modulename in beingimported:
beingimported.remove(modulename)
return result
import __builtin__
__builtin__.__import__ = newimport
Not all circular imports are a problem, as you've found when an exception is not thrown.
When they are a problem, you'll get an exception the next time you try to run any of your tests. You can change the code when this happens.
I don't see any change required from this situation.
Example of when it's not a problem:
a.py
import b
a = 42
def f():
return b.b
b.py
import a
b = 42
def f():
return a.a
Circular imports in Python are not like PHP includes.
Python imported modules are loaded the first time into an import "handler", and kept there for the duration of the process. This handler assigns names in the local namespace for whatever is imported from that module, for every subsequent import. A module is unique, and a reference to that module name will always point to the same loaded module, regardless of where it was imported.
So if you have a circular module import, the loading of each file will happen once, and then each module will have names relating to the other module created into its namespace.
There could of course be problems when referring to specific names within both modules (when the circular imports occur BEFORE the class/function definitions that are referenced in the imports of the opposite modules), but you'll get an error if that happens.
import uses __builtin__.__import__(), so if you monkeypatch that then every import everywhere will pick up the changes. Note that a circular import is not necessarily a problem though.
Related
Let's say i have 3 modules within the same directory. (module1,module2,module3)
Suppose the 2nd module imports the 3rd module then if i import module2 in module 1. Does that automatically import module 3 to module 1 ?
Thanks
No. The imports only work inside a module. You can verify that by creating a test.
Saying,
# module1
import module2
# module2
import module3
# in module1
module3.foo() # oops
This is reasonable because you can think in reverse: if imports cause a chain of importing, it'll be hard to decide which function is from which module, thus causing complex naming conflicts.
No, it will not be imported unless you explicitly specify python to, like so:
from module2 import *
What importing does conceptually is outlined below.
import some_module
The statement above is equivalent to:
module_variable = import_module("some_module")
All we have done so far is bind some object to a variable name.
When it comes to the implementation of import_module it is also not that hard to grasp.
def import_module(module_name):
if module_name in sys.modules:
module = sys.modules[module_name]
else:
filename = find_file_for_module(module_name)
python_code = open(filename).read()
module = create_module_from_code(python_code)
sys.modules[module_name] = module
return module
First, we check if the module has been imported before. If it was, then it will be available in the global list of all modules (sys.modules), and so will simply be reused. In the case that the module is not available, we create it from the code. Once the function returns, the module will be assigned to the variable name that you have chosen. As you can see the process is not inefficient or wasteful. All you are doing is creating an alias for your module. In most cases, transparency is prefered, hence having a quick look at the top of the file can tell you what resources are available to you. Otherwise, you might end up in a situation where you are wondering where is a given resource coming from. So, that is why you do not get modules inherently "imported".
Resource:
Python doc on importing
main:
import fileb
favouritePizza = "pineapple"
fileb.eatPizza
fileb:
from main import favouritePizza
def eatPizza():
print("favouritePizza")
This is what I tried, however, I get no such attribute. I looked at other problems and these wouldn't help.
This is classic example of circular dependency. main imports fileb, while fileb requires main.
Your case is hard (impossible?) to solve even in theory. In reality, python import machinery does even less expected thing — every time you import anything from some module, whole module is read and imported into global(per process) module namespace. Actually, from module import function is just a syntax sugar that gives you ability to not litter your namespace with everything form particular module (from module import *), but behind the scenes, its the almost the same as import module; module.function(...).
From composition/architecture point of view, basic program structure is:
top modules import bottom
bottom modules get parameters from top when being called
You probably want to use favouritePizza variable somehow in fileb, i.e. in eatPizza functions. Good thing to do is to make this function accept parameter, that will be passed from any place that uses it:
# fileb.py
def eatPizza(name):
print(name)
And call it with
# main.py
import fileb
favouritePizza = "pineapple"
fileb.eatPizza(favouritePizza)
I have file A.py:
...
from System.IO import Directory
...
execfile('B.py')
and file B.py:
...
result = ['a','b'].Contains('a')
Such combination works fine.
But if I comment this particular import line at A.py then B.py complains:
AttributeError: 'list' object has no attribute 'Contains'
It looks strange for me, especially B.py alone runs fine.
Is there some 'list' override in module System.IO? Is there way to determine what is changed during this import? Or avoid such strange behavior?
B.py on its own (on IPy 2.7.4) also results in the error you provided. It should as there is no built-in method that would bind for that call. You could also try to reproduce this on standard python to make sure.
The way you "include" B.py into A.py has inherent risks and problems at least because of missing control over the scope the code in B is executed against. If you wrap your code in a proper module/classes you can be sure that there are no issues of that sort.
This could (simplified, just a function, no classes etc.) look like:
from B import getResult
getResult()
from System.IO import Directory
getResult()
and
def getResult():
from System.IO import Directory
result = ['a','b'].Contains('a')
return result
As you can see the function getResult can ensure that all imports and other scope properties are correct and it will work whether or not the call site has the given import.
As for why the import of System.IO.Directory causes binding to IronPython.dll!IronPython.Runtime.List.Contains(object value) is not clear and would require a close look at IronPython's internal implementation.
If you mind the actual import to System.IO.Directory but don't want to refactor the code you could go with LINQ for IronPython/ImportExtensions as that would provide a Contains method as well. This will not reduce but actually increase your importing boilerplate but be more to the point.
import clr
clr.AddReference("System.Core")
import System
clr.ImportExtensions(System.Linq)
result = ['a','b'].Contains('a')
I'm noticing some weird situations where tests like the following fail:
x = <a function from some module, passed around some big application for a while>
mod = __import__(x.__module__)
x_ref = getattr(mod, x.__name__)
assert x_ref is x # Fails
(Code like this appears in the pickle module)
I don't think I have any import hooks, reload calls, or sys.modules manipulation that would mess with python's normal import caching behavior.
Is there any other reason why a module would be loaded twice? I've seen claims about this (e.g, https://stackoverflow.com/a/10989692/1332492), but I haven't been able to reproduce it in a simple, isolated script.
I believe you misunderstood how __import__ works:
>>> from my_package import my_module
>>> my_module.function.__module__
'my_package.my_module'
>>> __import__(my_module.function.__module__)
<module 'my_package' from './my_package/__init__.py'>
From the documentation:
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not
the module named by name. However, when a non-empty fromlist
argument is given, the module named by name is returned.
As you can see __import__ does not return the sub-module, but only the top package. If you have function also defined at package level you will indeed have different references to it.
If you want to just load a module you should use importlib.import_module instead of __import__.
As to answer you actual question: AFAIK there is no way to import the same module, with the same name, twice without messing around with the importing mechanism. However, you could have a submodule of a package that is also available in the sys.path, in this case you can import it twice using different names:
from some.package import submodule
import submodule as submodule2
print(submodule is submodule2) # False. They have *no* relationships.
This sometimes can cause problems with, e.g., pickle. If you pickle something referenced by submodule you cannot unpickle it using submodule2 as reference.
However this doesn't address the specific example you gave us, because using the __module__ attribute the import should return the correct module.
Similar Question: Understanding A Chain of Imports in Python
NB: I'm using Python 3.3
I have setup the following two files in the same directory to explain importing to myself, however I still don't get exactly what it's doing. I understand function and class definitions are statements that need to run.
untitled.py:
import string
class testing:
def func(self):
try:
print(string.ascii_lowercase)
except:
print('not imported')
class second:
x=1
print('print statement in untitled executed')
stuff.py:
from untitled import testing
try:
t=testing()
t.func()
except NameError:
print('testing not imported')
try:
print(string.ascii_uppercase)
except NameError:
print('string not imported')
try:
print(untitled.string.ascii_uppercase)
except NameError:
print('string not imported in untitled')
try:
s=second()
print(s.x)
except NameError:
print('second not imported')
This is the output I get from running stuff.py:
print statement in untitled executed
abcdefghijklmnopqrstuvwxyz
string not imported
string not imported in untitled
second not imported
The print statement in untitled.py is executed despite the import in stuff.py specifying only the testing class. Moreover what is the string module's relation inside stuff.py, as it can be called from within the testing class yet not from the outside.
Could somebody please explain this behaviour to me, what exactly does a "from import" statment do (what does it run)?
You can think of python modules as namespaces. Keep in mind that imports are not includes:
modules are only imported once
the first time, the top level code is executed
any imports, variable, function or class declarations affects only the module local namespace
Suppose you have a module called foo.py:
import eggs
bar = "Lets drink, it's a bar'
So when you do a from foo import bar in another module, you will make bar available in the current namespace. The module eggs will be available under foo.eggs if you do an import foo. If you do a from foo import *, then eggs, bar and everything else in the module namespace will be also in the current namespace - but never do that, wildcard imports are frowned upon in Python.
If you do a import foo and then import eggs, the top level code at eggs will be executed once and the module namespace will be stored in the module cache: if another module imports it the information will be pulled from this cache. If you are going to use it, then import it - no need to worry about multiple imports executing the top level code multiple times.
Python programmers are very fond of namespaces; I always try to use import foo and then foo.bar instead of from foo import bar if possible - it keeps the namespace clean and prevent name clashes.
That said, the import mechanism is hackable, you can make python import statement work even with files that are not python.
The from statement isn't any different to import with regard to loading behaviour. Always the top level code is executed, when loading the module. from just controls which parts of the loaded module are being added to the current scope (the first point is most important):
The from form uses a slightly more complex process:
find the module specified in the from clause loading and initializing it if necessary;
for each of the identifiers specified in the import clauses:
check if the imported module has an attribute by that name
if not, attempt to import a submodule with that name and then check the imported module again for that attribute
if the attribute is not found, ImportError is raised.
otherwise, a reference to that value is bound in the local namespace, using the name in the as clause if it is present, otherwise using the attribute name
Thus you can access the contents of a module partially imported with from with this inelegant trick:
print(sys.modules['untitled'].string.ascii_uppercase)
In your first file (untitled.py), when python compiler parses(since you called it in import) this file It will create 2 class code objects and execute the print statement. Note that it will even print it if you run untitled.py from command line.
In your second file(stuff.py), to add to #Paulo comments, you have only imported testing class in your namspace, so only that will be available, from the 2 code objects from untitled.py
However if you just say
import untitled
your 3rd "try" statement will work, since it will have untitled in its namespace.
Next thing. try importing untitled.testing :)