Does python ever multiply-load a module? - python

I'm noticing some weird situations where tests like the following fail:
x = <a function from some module, passed around some big application for a while>
mod = __import__(x.__module__)
x_ref = getattr(mod, x.__name__)
assert x_ref is x # Fails
(Code like this appears in the pickle module)
I don't think I have any import hooks, reload calls, or sys.modules manipulation that would mess with python's normal import caching behavior.
Is there any other reason why a module would be loaded twice? I've seen claims about this (e.g, https://stackoverflow.com/a/10989692/1332492), but I haven't been able to reproduce it in a simple, isolated script.

I believe you misunderstood how __import__ works:
>>> from my_package import my_module
>>> my_module.function.__module__
'my_package.my_module'
>>> __import__(my_module.function.__module__)
<module 'my_package' from './my_package/__init__.py'>
From the documentation:
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not
the module named by name. However, when a non-empty fromlist
argument is given, the module named by name is returned.
As you can see __import__ does not return the sub-module, but only the top package. If you have function also defined at package level you will indeed have different references to it.
If you want to just load a module you should use importlib.import_module instead of __import__.
As to answer you actual question: AFAIK there is no way to import the same module, with the same name, twice without messing around with the importing mechanism. However, you could have a submodule of a package that is also available in the sys.path, in this case you can import it twice using different names:
from some.package import submodule
import submodule as submodule2
print(submodule is submodule2) # False. They have *no* relationships.
This sometimes can cause problems with, e.g., pickle. If you pickle something referenced by submodule you cannot unpickle it using submodule2 as reference.
However this doesn't address the specific example you gave us, because using the __module__ attribute the import should return the correct module.

Related

Detect circular imports in Python [duplicate]

I'm working with a project that contains about 30 unique modules. It wasn't designed too well, so it's common that I create circular imports when adding some new functionality to the project.
Of course, when I add the circular import, I'm unaware of it. Sometimes it's pretty obvious I've made a circular import when I get an error like AttributeError: 'module' object has no attribute 'attribute' where I clearly defined 'attribute'. But other times, the code doesn't throw exceptions because of the way it's used.
So, to my question:
Is it possible to programmatically detect when and where a circular import is occuring?
The only solution I can think of so far is to have a module importTracking that contains a dict importingModules, a function importInProgress(file), which increments importingModules[file], and throws an error if it's greater than 1, and a function importComplete(file) which decrements importingModules[file]. All other modules would look like:
import importTracking
importTracking.importInProgress(__file__)
#module code goes here.
importTracking.importComplete(__file__)
But that looks really nasty, there's got to be a better way to do it, right?
To avoid having to alter every module, you could stick your import-tracking functionality in a import hook, or in a customized __import__ you could stick in the built-ins -- the latter, for once, might work better, because __import__ gets called even if the module getting imported is already in sys.modules, which is the case during circular imports.
For the implementation I'd simply use a set of the modules "in the process of being imported", something like (benjaoming edit: Inserting a working snippet derived from original):
beingimported = set()
originalimport = __import__
def newimport(modulename, *args, **kwargs):
if modulename in beingimported:
print "Importing in circles", modulename, args, kwargs
print " Import stack trace -> ", beingimported
# sys.exit(1) # Normally exiting is a bad idea.
beingimported.add(modulename)
result = originalimport(modulename, *args, **kwargs)
if modulename in beingimported:
beingimported.remove(modulename)
return result
import __builtin__
__builtin__.__import__ = newimport
Not all circular imports are a problem, as you've found when an exception is not thrown.
When they are a problem, you'll get an exception the next time you try to run any of your tests. You can change the code when this happens.
I don't see any change required from this situation.
Example of when it's not a problem:
a.py
import b
a = 42
def f():
return b.b
b.py
import a
b = 42
def f():
return a.a
Circular imports in Python are not like PHP includes.
Python imported modules are loaded the first time into an import "handler", and kept there for the duration of the process. This handler assigns names in the local namespace for whatever is imported from that module, for every subsequent import. A module is unique, and a reference to that module name will always point to the same loaded module, regardless of where it was imported.
So if you have a circular module import, the loading of each file will happen once, and then each module will have names relating to the other module created into its namespace.
There could of course be problems when referring to specific names within both modules (when the circular imports occur BEFORE the class/function definitions that are referenced in the imports of the opposite modules), but you'll get an error if that happens.
import uses __builtin__.__import__(), so if you monkeypatch that then every import everywhere will pick up the changes. Note that a circular import is not necessarily a problem though.

Import method from Python submodule in __init__, but not submodule itself

I have a Python module with the following structure:
mymod/
__init__.py
tools.py
# __init__.py
from .tools import foo
# tools.py
def foo():
return 42
Now, when import mymod, I see that it has the following members:
mymod.foo()
mymod.tools.foo()
I don't want the latter though; it just pollutes the namespace.
Funnily enough, if tools.py is called foo.py you get what you want:
mymod.foo()
(Obviously, this only works if there is just one function per file.)
How do I avoid importing tools? Note that putting foo() into __init__.py is not an option. (In reality, there are many functions like foo which would absolutely clutter the file.)
The existence of the mymod.tools attribute is crucial to maintaining proper function of the import system. One of the normal invariants of Python imports is that if a module x.y is registered in sys.modules, then the x module has a y attribute referring to the x.y module. Otherwise, things like
import x.y
x.y.y_function()
break, and depending on the Python version, even
from x import y
can break. Even if you don't think you're doing any of the things that would break, other tools and modules rely on these invariants, and trying to remove the attribute causes a slew of compatibility problems that are nowhere near worth it.
Trying to make tools not show up in your mymod module's namespace is kind of like trying to not make "private" (leading-underscore) attributes show up in your objects' namespaces. It's not how Python is designed to work, and trying to force it to work that way causes more problems than it solves.
The leading-underscore convention isn't just for instance variables. You could mark your tools module with a leading underscore, renaming it to _tools. This would prevent it from getting picked up by from mymod import * imports (unless you explicitly put it in an __all__ list), and it'd change how IDEs and linters treat attempts to access it directly.
You are not importing the tools module, it's just available when you import the package like you're doing:
import mymod
You will have access to everything defined in the __init__ file and all the modules of this package:
import mymod
# Reference a module
mymod.tools
# Reference a member of a module
mymod.tools.foo
# And any other modules from this package
mymod.tools.subtools.func
When you import foo inside __init__ you are are just making foo available there just like if you have defined it there, but of course you defined it in tools which is a way to organize your package, so now since you imported it inside __init__ you can:
import mymod
mymod.foo()
Or you can import foo alone:
from mymod import foo
foo()
But you can import foo without making it available inside __init__, you can do the following which is exactly the same as the example above:
from mymod.tools import foo
foo()
You can use both approaches, they're both right, in all these example you are not "cluttering the file" as you can see accessing foo using mymod.tools.foo is namespaced so you can have multiple foos defined in other modules.
Try putting this in your __init__.py file:
from .tools import foo
del tools

Importing modules in python (3 modules)

Let's say i have 3 modules within the same directory. (module1,module2,module3)
Suppose the 2nd module imports the 3rd module then if i import module2 in module 1. Does that automatically import module 3 to module 1 ?
Thanks
No. The imports only work inside a module. You can verify that by creating a test.
Saying,
# module1
import module2
# module2
import module3
# in module1
module3.foo() # oops
This is reasonable because you can think in reverse: if imports cause a chain of importing, it'll be hard to decide which function is from which module, thus causing complex naming conflicts.
No, it will not be imported unless you explicitly specify python to, like so:
from module2 import *
What importing does conceptually is outlined below.
import some_module
The statement above is equivalent to:
module_variable = import_module("some_module")
All we have done so far is bind some object to a variable name.
When it comes to the implementation of import_module it is also not that hard to grasp.
def import_module(module_name):
if module_name in sys.modules:
module = sys.modules[module_name]
else:
filename = find_file_for_module(module_name)
python_code = open(filename).read()
module = create_module_from_code(python_code)
sys.modules[module_name] = module
return module
First, we check if the module has been imported before. If it was, then it will be available in the global list of all modules (sys.modules), and so will simply be reused. In the case that the module is not available, we create it from the code. Once the function returns, the module will be assigned to the variable name that you have chosen. As you can see the process is not inefficient or wasteful. All you are doing is creating an alias for your module. In most cases, transparency is prefered, hence having a quick look at the top of the file can tell you what resources are available to you. Otherwise, you might end up in a situation where you are wondering where is a given resource coming from. So, that is why you do not get modules inherently "imported".
Resource:
Python doc on importing

Is there a module cache in python? [duplicate]

If a large module is loaded by some submodule of your code, is there any benefit to referencing the module from that namespace instead of importing it again?
For example:
I have a module MyLib, which makes extensive use of ReallyBigLib. If I have code that imports MyLib, should I dig the module out like so
import MyLib
ReallyBigLib = MyLib.SomeModule.ReallyBigLib
or just
import MyLib
import ReallyBigLib
Python modules could be considered as singletons... no matter how many times you import them they get initialized only once, so it's better to do:
import MyLib
import ReallyBigLib
Relevant documentation on the import statement:
https://docs.python.org/2/reference/simple_stmts.html#the-import-statement
Once the name of the module is known (unless otherwise specified, the term “module” will refer to both packages and modules), searching for the module or package can begin. The first place checked is sys.modules, the cache of all modules that have been imported previously. If the module is found there then it is used in step (2) of import.
The imported modules are cached in sys.modules:
This is a dictionary that maps module names to modules which have already been loaded. This can be manipulated to force reloading of modules and other tricks. Note that removing a module from this dictionary is not the same as calling reload() on the corresponding module object.
As others have pointed out, Python maintains an internal list of all modules that have been imported. When you import a module for the first time, the module (a script) is executed in its own namespace until the end, the internal list is updated, and execution of continues after the import statement.
Try this code:
# module/file a.py
print "Hello from a.py!"
import b
# module/file b.py
print "Hello from b.py!"
import a
There is no loop: there is only a cache lookup.
>>> import b
Hello from b.py!
Hello from a.py!
>>> import a
>>>
One of the beauties of Python is how everything devolves to executing a script in a namespace.
It makes no substantial difference. If the big module has already been loaded, the second import in your second example does nothing except adding 'ReallyBigLib' to the current namespace.
WARNING: Python does not guarantee that module will not be initialized twice.
I've stubled upon such issue. See discussion:
http://code.djangoproject.com/ticket/8193
The internal registry of imported modules is the sys.modules dictionary, which maps module names to module objects. You can look there to see all the modules that are currently imported.
You can also pull some useful tricks (if you need to) by monkeying with sys.modules - for example adding your own objects as pseudo-modules which can be imported by other modules.
It is the same performancewise. There is no JIT compiler in Python yet.

Why import when you need to use the full name?

In python, if you need a module from a different package you have to import it. Coming from a Java background, that makes sense.
import foo.bar
What doesn't make sense though, is why do I need to use the full name whenever I want to use bar? If I wanted to use the full name, why do I need to import? Doesn't using the full name immediately describe which module I'm addressing?
It just seems a little redundant to have from foo import bar when that's what import foo.bar should be doing. Also a little vague why I had to import when I was going to use the full name.
The thing is, even though Python's import statement is designed to look similar to Java's, they do completely different things under the hood. As you know, in Java an import statement is really little more than a hint to the compiler. It basically sets up an alias for a fully qualified class name. For example, when you write
import java.util.Set;
it tells the compiler that throughout that file, when you write Set, you mean java.util.Set. And if you write s.add(o) where s is an object of type Set, the compiler (or rather, linker) goes out and finds the add method in Set.class and puts in a reference to it.
But in Python,
import util.set
(that is a made-up module, by the way) does something completely different. See, in Python, packages and modules are not just names, they're actual objects, and when you write util.set in your code, that instructs Python to access an object named util and look for an attribute on it named set. The job of Python's import statement is to create that object and attribute. The way it works is that the interpreter looks for a file named util/__init__.py, uses the code in it to define properties of an object, and binds that object to the name util. Similarly, the code in util/set.py is used to initialize an object which is bound to util.set. There's a function called __import__ which takes care of all of this, and in fact the statement import util.set is basically equivalent to
util = __import__('util.set')
The point is, when you import a Python module, what you get is an object corresponding to the top-level package, util. In order to get access to util.set you need to go through that, and that's why it seems like you need to use fully qualified names in Python.
There are ways to get around this, of course. Since all these things are objects, one simple approach is to just bind util.set to a simpler name, i.e. after the import statement, you can have
set = util.set
and from that point on you can just use set where you otherwise would have written util.set. (Of course this obscures the built-in set class, so I don't recommend actually using the name set.) Or, as mentioned in at least one other answer, you could write
from util import set
or
import util.set as set
This still imports the package util with the module set in it, but instead of creating a variable util in the current scope, it creates a variable set that refers to util.set. Behind the scenes, this works kind of like
_util = __import__('util', fromlist='set')
set = _util.set
del _util
in the former case, or
_util = __import__('util.set')
set = _util.set
del _util
in the latter (although both ways do essentially the same thing). This form is semantically more like what Java's import statement does: it defines an alias (set) to something that would ordinarily only be accessible by a fully qualified name (util.set).
You can shorten it, if you would like:
import foo.bar as whateveriwant
Using the full name prevents two packages with the same-named submodules from clobbering each other.
There is a module in the standard library called io:
In [84]: import io
In [85]: io
Out[85]: <module 'io' from '/usr/lib/python2.6/io.pyc'>
There is also a module in scipy called io:
In [95]: import scipy.io
In [96]: scipy.io
Out[96]: <module 'scipy.io' from '/usr/lib/python2.6/dist-packages/scipy/io/__init__.pyc'>
If you wanted to use both modules in the same script, then namespaces are a convenient way to distinguish the two.
In [97]: import this
The Zen of Python, by Tim Peters
...
Namespaces are one honking great idea -- let's do more of those!
in Python, importing doesn't just indicate you might use something. The import actually executes code at the module level. You can think of the import as being the moment where the functions are 'interpreted' and created. Any code that is in the _____init_____.py level or not inside a function or class definition happens then.
The import also makes an inexpensive copy of the whole module's namespace and puts it inside the namespace of the file / module / whatever where it is imported. An IDE then has a list of the functions you might be starting to type for command completion.
Part of the Python philosophy is explicit is better than implicit. Python could automatically import the first time you try to access something from a package, but that's not explicit.
I'm also guessing that package initialization would be much more difficult if the imports were automatic, as it wouldn't be done consistently in the code.
You're a bit confused about how Python imports work. (I was too when I first started.) In Python, you can't simply refer to something within a module by the full name, unlike in Java; you HAVE to import the module first, regardless of how you plan on referring to the imported item. Try typing math.sqrt(5) in the interpreter without importing math or math.sqrt first and see what happens.
Anyway... the reason import foo.bar has you required to use foo.bar instead of just bar is to prevent accidental namespace conflicts. For example, what if you do import foo.bar, and then import baz.bar?
You could, of course, choose to do import foo.bar as bar (i.e. aliasing), but if you're doing that you may as well just use from foo import bar. (EDIT: except when you want to import methods and variables. Then you have to use the from ... import ... syntax. This includes instances where you want to import a method or variable without aliasing, i.e. you can't simply do import foo.bar if bar is a method or variable.)
Other than in Java, in Python import foo.bar declares, that you are going to use the thing referred to by foo.bar.
This matches with Python's philosophy that explicit is better than implicit. There are more programming languages that make inter-module dependencies more explicit than Java, for example Ada.
Using the full name makes it possible to disambiguate definitions with the same name coming from different modules.
You don't have to use the full name. Try one of these
from foo import bar
import foo.bar as bar
import foo.bar
bar = foo.bar
from foo import *
A few reasons why explicit imports are good:
They help signal to humans and tools what packages your module depends on.
They avoid the overhead of dynamically determining which packages have to be loaded (and possibly compiled) at run time.
They (along with sys.path) unambiguously distinguish symbols with conflicting names from different namespaces.
They give the programmer some control of what enters the namespace within which he is working.

Categories

Resources