http://docs.python.org/2/library/runpy.html#runpy.run_module
My question is about this part of the run_module documentation.
... and then executed in a fresh module namespace.
What is a "module namespace" in python? In what ways does runpy differ from import?
Every module executes with its own set of global variables, which become the module's attributes. A module namespace is where a module's globals go; "executed in a fresh module namespace" means "executed with its own global variable environment".
A Python interpreter only executes a module's code the first time it's imported in any given program. Further import statements simply return the existing module object. This prevents exponential import explosion when modules a and b both import modules c and d, which both import e and f, etc. It also means that all modules see the same versions of, say, collections.defaultdict, so type checks behave intuitively. runpy.run_module says "run the code in this module, whether or not it was imported already, and don't count it as an import." If you run_module a module and then __import__ it, the dict you got from run_module will contain objects very similar to, but distinct from, the objects in the module you got from __import__.
Related
From Dynamic linking in C/C++ (dll) vs JAVA (JAR)
when i want to use this jar file in another project we use "package" or "import" keyword
You don't have to. This is just a short hand. You can use full
package.ClassName and there is no need for an import. Note: this
doesn't import any code or data, just allow you to use a shorter name
for the class.
e.g. there is no difference between
java.util.Date date = new java.util.Date();
and
import java.util.Date();
Date date = new Date(); // don't need to specify the full package name.
Is it the same case for import in Python3?
Can we use a identifier defined in a module, without importing its module? Did I miss something in the following to make that happen?
What differences are between Java and Python's import?
>>> random.randint(1,25)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'random' is not defined
>>> import random
>>> random.randint(1,25)
18
Python is not Java. In Python you can only access names that are either builtins or defined in the current scope or it's parents scopes - the "top-level" scope being the module namespace (AKA "global" namespace).
The import statement (which is an executable statement FWIW) does two things: first load the module (this actually happens only once per process, then the module is cached in sys.modules), then bind the imported name(s) in the current scope. IOW this:
import foo
is syntaxic sugar for
foo = __import__("foo")
and
from foo import bar
is syntaxic sugar for
foo = __import__("foo")
bar = getattr(foo, "bar")
del foo
Also you have to understand what "loading a module" really means: executing all the code at the module's top-level.
As I mentionned, import is an executable statement, but so are class and def - the def statement creates a code object from the function's body and signature, then creates a function object with this code object, and finally bind this function object to the function's name in the current scope, the class statement does the same thing for a class (executing all the code at the "class" statement's top-level in a temporary namespace and using this namespace to create the "class" object, then binding the class object to it's name).
IOW, all happens at runtime, everything is an object (including functions, classes and modules), and everything you do with an import, class or def statement can be done "manually" too (more or less easily though - manually creating a function is quite an involved process).
So as you can see, this really has nothing to do with how either Java or C++ work.
Short answer: No, you can't implicitly import a module in Python by using a fully qualified name.
Slightly longer answer:
In Python, importing a module can have side effects: a module can have module-level code, not all of its code has to be wrapped in functions or classes. Therefore, importing modules at arbitrary and unexpected locations could be confusing, since it would trigger those side effects when you don't expect them.
The recommended style (see https://www.python.org/dev/peps/pep-0008/ for details) is to put all your imports at the top of your module, and not to hide imports at unexpected places.
Given a class or function, is there a way to find the full path of the module where it is originally defined? (I.e. using def xxx or class xxx.)
I'm aware that there is sys.modules[func.__module__]. However, if func is imported in a package's __init__.py, then sys.modules will simply redirect to that __init__.py, because the function has been brought into that namespace, as far as my understanding goes.
A concrete example:
>>> import numpy as np
>>> import sys
>>> np.broadcast.__module__
'numpy'
>>> sys.modules[np.broadcast.__module__]
<module 'numpy' from '/Users/brad/.../site-packages/numpy/__init__.py'>
Obviously, broadcast is not defined in __init__.py; it is just brought into the namespace with one of these from module import * statements.
It would be nice to see where in the source np.broadcast is defined (regardless of the file extension, be it .c or .py). Is this possible?
Your understanding:
However, if func is imported in a package's __init__.py, then
sys.modules will simply redirect to that __init__.py, because the
function has been brought into that namespace, as far as my
understanding goes.
is wrong. __init__.py importing a thing has no effect on that thing's __module__.
The behavior you're seeing with numpy.broadcast happens because C types don't really have a "defining module" the same way types written in Python do. numpy.broadcast.__module__ == 'numpy' because numpy.broadcast is written in C and declares its name to be "numpy.broadcast", and a C type's __module__ is determined from its name.
As for how to get a class or function's "module of original definition", the best you really have is __module__ and other functions that go through __module__.
If a large module is loaded by some submodule of your code, is there any benefit to referencing the module from that namespace instead of importing it again?
For example:
I have a module MyLib, which makes extensive use of ReallyBigLib. If I have code that imports MyLib, should I dig the module out like so
import MyLib
ReallyBigLib = MyLib.SomeModule.ReallyBigLib
or just
import MyLib
import ReallyBigLib
Python modules could be considered as singletons... no matter how many times you import them they get initialized only once, so it's better to do:
import MyLib
import ReallyBigLib
Relevant documentation on the import statement:
https://docs.python.org/2/reference/simple_stmts.html#the-import-statement
Once the name of the module is known (unless otherwise specified, the term “module” will refer to both packages and modules), searching for the module or package can begin. The first place checked is sys.modules, the cache of all modules that have been imported previously. If the module is found there then it is used in step (2) of import.
The imported modules are cached in sys.modules:
This is a dictionary that maps module names to modules which have already been loaded. This can be manipulated to force reloading of modules and other tricks. Note that removing a module from this dictionary is not the same as calling reload() on the corresponding module object.
As others have pointed out, Python maintains an internal list of all modules that have been imported. When you import a module for the first time, the module (a script) is executed in its own namespace until the end, the internal list is updated, and execution of continues after the import statement.
Try this code:
# module/file a.py
print "Hello from a.py!"
import b
# module/file b.py
print "Hello from b.py!"
import a
There is no loop: there is only a cache lookup.
>>> import b
Hello from b.py!
Hello from a.py!
>>> import a
>>>
One of the beauties of Python is how everything devolves to executing a script in a namespace.
It makes no substantial difference. If the big module has already been loaded, the second import in your second example does nothing except adding 'ReallyBigLib' to the current namespace.
WARNING: Python does not guarantee that module will not be initialized twice.
I've stubled upon such issue. See discussion:
http://code.djangoproject.com/ticket/8193
The internal registry of imported modules is the sys.modules dictionary, which maps module names to module objects. You can look there to see all the modules that are currently imported.
You can also pull some useful tricks (if you need to) by monkeying with sys.modules - for example adding your own objects as pseudo-modules which can be imported by other modules.
It is the same performancewise. There is no JIT compiler in Python yet.
This question already has answers here:
Does Python have “private” variables in classes?
(15 answers)
Closed 7 years ago.
(I've checked out Does Python have “private” variables in classes? -- it asks about classes rather than modules. As such, answers there don't cover import which is what I'm interested in.)
Consider, there is a module called X with variable y. If any other module tries to import the module X, how to avoid loading the variable y in Python?
For example:
# x.py
y=10
We then use this in some other module:
import x
print x.y
How to avoid accessing x.y from the module x.py ?
If you do import module, there's no way to hide any global members, and that's intentional: the module object returned is the "true" one, the very same that's used by the module's members. Dirty hacks like __getattr__ are also prohibited for modules.
One way is to mark "internal" entities with a leading underscore to hint the user they are not intended for external use. This isn't necessary for references to other modules imported by yours since the guidelines explicitly discourage external use of them (the only exception is if the referenced module is inaccessible the normal way).
When doing from module import *, however, you don't get a module reference but import things from it directly into the current namespace. By default, everything except names starting from an underscore is imported. You can override this by defining the __all__ module attribute.
In normal use import foo only imports the module once; all other imports see that it has been imported, and doesn't load it again.
Quoth https://docs.python.org/3/reference/import.html#the-module-cache:
During import, the module name is looked up in sys.modules and if
present, the associated value is the module satisfying the import, and
the process completes. However, if the value is None, then an
ImportError is raised. If the module name is missing, Python will
continue searching for the module.
This has been long been a feature of the interpreter going back at least to version 2.7 and probably earlier.
Speaking to your specific question, there is no variable y, there is x.y because that's what you imported. If you do the highly unrecommended from module import * you can end up with a y, but you shouldn't do that.
I'm noticing some weird situations where tests like the following fail:
x = <a function from some module, passed around some big application for a while>
mod = __import__(x.__module__)
x_ref = getattr(mod, x.__name__)
assert x_ref is x # Fails
(Code like this appears in the pickle module)
I don't think I have any import hooks, reload calls, or sys.modules manipulation that would mess with python's normal import caching behavior.
Is there any other reason why a module would be loaded twice? I've seen claims about this (e.g, https://stackoverflow.com/a/10989692/1332492), but I haven't been able to reproduce it in a simple, isolated script.
I believe you misunderstood how __import__ works:
>>> from my_package import my_module
>>> my_module.function.__module__
'my_package.my_module'
>>> __import__(my_module.function.__module__)
<module 'my_package' from './my_package/__init__.py'>
From the documentation:
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not
the module named by name. However, when a non-empty fromlist
argument is given, the module named by name is returned.
As you can see __import__ does not return the sub-module, but only the top package. If you have function also defined at package level you will indeed have different references to it.
If you want to just load a module you should use importlib.import_module instead of __import__.
As to answer you actual question: AFAIK there is no way to import the same module, with the same name, twice without messing around with the importing mechanism. However, you could have a submodule of a package that is also available in the sys.path, in this case you can import it twice using different names:
from some.package import submodule
import submodule as submodule2
print(submodule is submodule2) # False. They have *no* relationships.
This sometimes can cause problems with, e.g., pickle. If you pickle something referenced by submodule you cannot unpickle it using submodule2 as reference.
However this doesn't address the specific example you gave us, because using the __module__ attribute the import should return the correct module.