I've stumbled across some odd python (2.7) import behaviour, which, whilst easy to work around, has me scratching my head.
Given the following folder structure:
test/
    __init__.py
    x.py
    package/
        __init__.py
        x.py
Where test/package/__init__.py contains the following
from .. import x
print x
from .x import hello
print x
print x.hello
And test/package/x.py contains the following
hello = 1
Why would running import test.package from a REPL result in the following output?
<module 'test.x' from 'test/x.pyc'>
<module 'test.package.x' from 'test/package/x.pyc'>
1
I would have expected x to still reference the top-level x module. Instead, the second import loads the whole local x module (not just hello, as I expected) and binds it to the name x, effectively trampling on the first import.
Can anyone explain the mechanics of the import here?
The from .x import name realizes that test.package.x needs to be a module. It then checks the corresponding entry in sys.modules; if it is found there, then sys.modules['test.package.x'].hello is imported into the calling module.
However, if sys.modules['test.package.x'] does not exist yet, the module is loaded; and as the last step of loading, sys.modules['test.package'].x is set to point to the newly loaded module, even though you did not explicitly ask for it. Because the body of test/package/__init__.py runs in that very namespace, the second import overrides the name bound by the first import.
This is by design, otherwise
import foo.bar.baz
foo.bar.baz.x()
and
from foo.bar import baz
baz.x()
wouldn't be interchangeable.
I am unable to find good documentation on this behaviour in the Python 2 documentation, but the Python 3 behaviour is essentially the same in this case:
When a submodule is loaded using any mechanism (e.g. importlib APIs, the import or import-from statements, or built-in __import__()) a binding is placed in the parent module’s namespace to the submodule object. For example, if package spam has a submodule foo, after importing spam.foo, spam will have an attribute foo which is bound to the submodule.
[...]
The invariant holding is that if you have sys.modules['spam'] and sys.modules['spam.foo'] (as you would after the above import), the latter must appear as the foo attribute of the former.
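The behaviour described in this answer can be reproduced with a throwaway package. This is only a sketch for Python 3: the package name shadow_demo and the helper variables first and second are made up for the demonstration and stand in for the question's test package and its print statements.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a temporary copy of the layout from the question (names are hypothetical).
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "shadow_demo" / "package"
pkg.mkdir(parents=True)
(root / "shadow_demo" / "__init__.py").write_text("")
(root / "shadow_demo" / "x.py").write_text("")
(pkg / "x.py").write_text("hello = 1\n")
(pkg / "__init__.py").write_text(
    "from .. import x\n"
    "first = x   # the top-level shadow_demo.x\n"
    "from .x import hello\n"
    "second = x  # rebound by the import system to shadow_demo.package.x\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
package = importlib.import_module("shadow_demo.package")

print(package.first.__name__)   # shadow_demo.x
print(package.second.__name__)  # shadow_demo.package.x
# The invariant quoted above: the submodule is an attribute of its parent.
print(sys.modules["shadow_demo.package"].x is sys.modules["shadow_demo.package.x"])  # True
```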
I have a Python module with the following structure:
mymod/
__init__.py
tools.py
# __init__.py
from .tools import foo
# tools.py
def foo():
    return 42
Now, when I import mymod, I see that it has the following members:
mymod.foo()
mymod.tools.foo()
I don't want the latter though; it just pollutes the namespace.
Funnily enough, if tools.py is called foo.py you get what you want:
mymod.foo()
(Obviously, this only works if there is just one function per file.)
How do I avoid importing tools? Note that putting foo() into __init__.py is not an option. (In reality, there are many functions like foo which would absolutely clutter the file.)
The existence of the mymod.tools attribute is crucial to maintaining proper function of the import system. One of the normal invariants of Python imports is that if a module x.y is registered in sys.modules, then the x module has a y attribute referring to the x.y module. Otherwise, things like
import x.y
x.y.y_function()
break, and depending on the Python version, even
from x import y
can break. Even if you don't think you're doing any of the things that would break, other tools and modules rely on these invariants, and trying to remove the attribute causes a slew of compatibility problems that are nowhere near worth it.
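A minimal sketch of that breakage on Python 3, assuming a hypothetical package named deldemo whose __init__.py deletes the submodule attribute after importing from it. The submodule stays registered in sys.modules, so a later import deldemo.tools has nothing to load and does not restore the attribute.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical package whose __init__.py removes the submodule attribute.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "deldemo"
pkg.mkdir()
(pkg / "tools.py").write_text("def foo():\n    return 42\n")
(pkg / "__init__.py").write_text("from .tools import foo\ndel tools\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import deldemo
import deldemo.tools  # already in sys.modules, so nothing is loaded or rebound

registered = "deldemo.tools" in sys.modules
try:
    deldemo.tools.foo()
    attribute_access_ok = True
except AttributeError:
    attribute_access_ok = False

print(registered, attribute_access_ok)  # True False
```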
Trying to make tools not show up in your mymod module's namespace is kind of like trying to not make "private" (leading-underscore) attributes show up in your objects' namespaces. It's not how Python is designed to work, and trying to force it to work that way causes more problems than it solves.
The leading-underscore convention isn't just for instance variables. You could mark your tools module with a leading underscore, renaming it to _tools. This would prevent it from getting picked up by from mymod import * imports (unless you explicitly put it in an __all__ list), and it'd change how IDEs and linters treat attempts to access it directly.
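As a rough illustration of the underscore suggestion (Python 3, with a made-up package name mymod_demo): the submodule attribute still exists on the package, but a star import skips it because its name starts with an underscore and no __all__ is defined.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical package that keeps its helpers in a "private" submodule.
root = pathlib.Path(tempfile.mkdtemp())
pkg = root / "mymod_demo"
pkg.mkdir()
(pkg / "_tools.py").write_text("def foo():\n    return 42\n")
(pkg / "__init__.py").write_text("from ._tools import foo\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run the star import in a fresh namespace so we can inspect what it binds.
namespace = {}
exec("from mymod_demo import *", namespace)

print("foo" in namespace)     # True
print("_tools" in namespace)  # False
print(hasattr(sys.modules["mymod_demo"], "_tools"))  # True
```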
You are not importing the tools module explicitly; it becomes available as a side effect when you import the package like this:
import mymod
You will have access to everything defined in the __init__ file and to every submodule that has already been imported (here tools, because __init__ imports from it):
import mymod
# Reference a module
mymod.tools
# Reference a member of a module
mymod.tools.foo
# And any other modules from this package
mymod.tools.subtools.func
When you import foo inside __init__, you are just making foo available there, as if you had defined it there. Of course, you actually defined it in tools, which is a way to organize your package. Since you imported it inside __init__, you can now do:
import mymod
mymod.foo()
Or you can import foo alone:
from mymod import foo
foo()
You can also import foo without making it available inside __init__. The following gives you the same function as the example above:
from mymod.tools import foo
foo()
You can use both approaches; they're both right. In none of these examples are you "cluttering the file": as you can see, accessing foo through mymod.tools.foo is namespaced, so you can have multiple foos defined in other modules.
Try putting this in your __init__.py file:
from .tools import foo
del tools
It seems that entities imported using two different PYTHONPATHs are not the same objects.
I have encountered a little problem in my code and I want to explain it with a little test case.
I created the source tree:
a/
__init__.py
b/
__init__.py
example.py
in example.py:
class Example:
    pass
and from the parent of folder a, I run python and this test:
>>> import sys
>>> sys.path.append("/home/marco/temp/a")
>>>
>>> import a.b.example as example1
>>> import b.example as example2
>>>
>>> example1.Example is example2.Example
False
So the question is: why is the result False? Even if imported via two different paths, the class is the same. This is a complete mess if the class is a custom exception and you try to catch it with except.
Tested with python 3.4.3
In Python the class statement is an executable statement, so each time you execute it you will create a new class.
When you import a module, Python checks sys.modules to see whether a module with that fully qualified name has already been loaded. If it has, you just get back the same module; if not, Python tries to load the module and execute the code it contains.
So importing the same file under two different module names loads the code twice, which executes the class statement twice, and you get two independent classes.
This usually hits people when they have a file a.py which they run as a script and then in another module attempt to import a. The script is loaded as __main__ so has different classes and different global variables than the imported module.
The moral is, always be consistent in how you reference a module.
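Here is a self-contained sketch of the situation from the question, for Python 3. The directory names a_demo and b_demo are placeholders for the question's a and b, and Example is made an Exception subclass only so the except problem can be shown as well.

```python
import importlib
import pathlib
import sys
import tempfile

# Recreate the question's layout under hypothetical names.
root = pathlib.Path(tempfile.mkdtemp())
inner = root / "a_demo" / "b_demo"
inner.mkdir(parents=True)
(root / "a_demo" / "__init__.py").write_text("")
(inner / "__init__.py").write_text("")
(inner / "example.py").write_text("class Example(Exception):\n    pass\n")

# Two sys.path entries make the same file importable under two names.
sys.path.insert(0, str(root))
sys.path.insert(0, str(root / "a_demo"))
importlib.invalidate_caches()

example1 = importlib.import_module("a_demo.b_demo.example")
example2 = importlib.import_module("b_demo.example")

print(example1 is example2)                  # False
print(example1.Example is example2.Example)  # False

# An exception raised from one copy is not caught by the other copy's class.
try:
    raise example1.Example()
except example2.Example:
    caught_by_other_copy = True
except example1.Example:
    caught_by_other_copy = False
print(caught_by_other_copy)  # False
```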
The is operator is used to check if two names are pointing to the same object (memory location).
example1.Example is example2.Example
are obviously not pointing at the same location, since the module was loaded twice and each load created its own class object.
But, if you did something like:
a, b = example1.Example, example1.Example
a is b # True
Switching to the == operator does not help here either:
example1.Example == example2.Example
False
Note that if you don't implement __eq__ or __hash__, the default behavior of == is the same as is, and that is the case for ordinary class objects.
I'm noticing some weird situations where tests like the following fail:
x = <a function from some module, passed around some big application for a while>
mod = __import__(x.__module__)
x_ref = getattr(mod, x.__name__)
assert x_ref is x # Fails
(Code like this appears in the pickle module)
I don't think I have any import hooks, reload calls, or sys.modules manipulation that would mess with python's normal import caching behavior.
Is there any other reason why a module would be loaded twice? I've seen claims about this (e.g, https://stackoverflow.com/a/10989692/1332492), but I haven't been able to reproduce it in a simple, isolated script.
I believe you misunderstood how __import__ works:
>>> from my_package import my_module
>>> my_module.function.__module__
'my_package.my_module'
>>> __import__(my_module.function.__module__)
<module 'my_package' from './my_package/__init__.py'>
From the documentation:
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not
the module named by name. However, when a non-empty fromlist
argument is given, the module named by name is returned.
As you can see, __import__ does not return the submodule, but only the top-level package. If you also have function defined at package level, you will indeed get a different reference to it.
If you want to just load a module you should use importlib.import_module instead of __import__.
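A small sketch of the difference, using the standard library's json.decoder as a stand-in for my_package.my_module (the choice of JSONDecoder as the target object is only for illustration):

```python
import importlib
import json.decoder

target = json.decoder.JSONDecoder  # target.__module__ is "json.decoder"

# Without a fromlist, __import__ returns the top-level package.
top = __import__(target.__module__)
print(top.__name__)  # json

# With a non-empty fromlist, it returns the module that was named.
sub = __import__(target.__module__, fromlist=["JSONDecoder"])
print(sub.__name__)  # json.decoder

# importlib.import_module always returns the module that was named.
mod = importlib.import_module(target.__module__)
print(getattr(mod, target.__name__) is target)  # True
```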
As for your actual question: AFAIK there is no way to import the same module, with the same name, twice without messing around with the import mechanism. However, you could have a submodule of a package that is also directly reachable through sys.path; in that case you can import it twice under different names:
from some.package import submodule
import submodule as submodule2
print(submodule is submodule2) # False. They have *no* relationships.
This sometimes can cause problems with, e.g., pickle. If you pickle something referenced by submodule you cannot unpickle it using submodule2 as reference.
However this doesn't address the specific example you gave us, because using the __module__ attribute the import should return the correct module.
I am currently doing a python tutorial, but they use IDLE, and I opted to use the interpreter on terminal. So I had to find out how to import a module I created. At first I tried
import my_file
then I tried calling the function inside the module by itself, and it failed. I looked around and doing
my_file.function
works. I am very confused why this needs to be done if it was imported. Also, is there a way around it so that I can just call the function? Can anyone point me in the right direction. Thanks in advance.
If you wanted to use my_file.function by just calling function, try using the from keyword.
Instead of import my_file try from my_file import *.
You can also do this to only import parts of a module like so :
from my_file import function1, function2, class1
To avoid clashes in names, you can import things with a different name:
from my_file import function as awesomePythonFunction
EDIT:
Be careful with this: if you import two modules (my_file, my_file2) that both define a function with the same name, function will point to the function from whichever module you imported last. This can produce surprising behaviour if you are unaware of it.
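To see the clash in action, here is a small sketch that fabricates two in-memory modules instead of real files. The module names clash_one and clash_two and the make_module helper are invented for the demo.

```python
import sys
import types


def make_module(name, source):
    """Create an in-memory module from source text and register it (demo helper)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


make_module("clash_one", "def function():\n    return 'from clash_one'\n")
make_module("clash_two", "def function():\n    return 'from clash_two'\n")

# Run both star imports in a fresh namespace; the later one wins.
namespace = {}
exec("from clash_one import *\nfrom clash_two import *", namespace)
print(namespace["function"]())  # from clash_two
```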
This is a central concept in Python. It uses namespaces (see the last line of import this). The idea is that with thousands of people writing many different modules, the likelihood of a name collision is reasonably high. For example, I write module foo which provides function baz, and Joe Smith writes module bar which also provides a function baz. My baz is not the same as Joe Smith's, so in order to differentiate the two, we put them in namespaces (foo and bar): mine can be called as foo.baz() and Joe's as bar.baz().
Of course, typing foo.baz() all the time gets annoying if you just want baz() and are sure that none of your other modules imported will provide any problems... That is why python provides the from foo import * syntax, or even from foo import baz to only import the function/object/constant baz (as others have already noted).
Note that things can get even more complex:
Assume you have a module foo which provides function bar and baz, below are a few ways to import and then call the functions contained inside foo...
import foo # >>> foo.bar();foo.baz()
import foo as bar # >>> bar.bar();bar.baz()
from foo import bar,baz # >>> bar(); baz()
from foo import * # >>> bar(); baz()
from foo import bar as cow # >>> cow() # This calls bar(), baz() is not available
...
A basic import statement is an assignment of the module object (everything's an object in Python) to the specified name. I mean this literally: you can use an import anywhere in your program you can assign a value to a variable, because they're the same thing. Behind the scenes, Python is calling a built-in function called __import__() to do the import, then returning the result and assigning it to the variable name you provided.
import foo
means "import module foo and assign it the name foo in my namespace." This is the same as:
foo = __import__("foo")
Similarly, you can do:
import foo as f
which means "import module foo and assign it the name f in my namespace." This is the same as:
f = __import__("foo")
Since in this case, you have only a reference to the module object, referring to things contained by the module requires attribute access: foo.bar etc.
You can also do from foo import bar. This creates a variable named bar in your namespace that points to the bar function in the foo module. It's syntactic sugar for:
bar = __import__("foo").bar
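These equivalences can be checked directly. The sketch below uses the standard math module in place of foo and sqrt in place of bar; it only confirms that each spelling ends up with the same objects.

```python
import math
import math as m
from math import sqrt

# "import math" and "import math as m" bind the same module object.
print(__import__("math") is math)  # True
print(m is math)                   # True

# "from math import sqrt" binds the attribute looked up on that module.
print(__import__("math").sqrt is sqrt)  # True
```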
I don't really understand your confusion. You've imported the name my_file, not anything underneath it, so that's how you reference it.
If you want to import functions or classes inside a module directly, you can use:
from my_file import function
I'm going to incorporate many of the comments already posted.
To have access to function without having to refer to the module my_file, you can do one of the following:
from my_file import function
or
from my_file import *
For a more in-depth description of how modules work, I would refer to the documentation on python modules.
The first is the preferred solution, and the second is not recommended for many reasons:
It pollutes your namespace
It is not good practice for maintainability (it becomes more difficult to find where specific names reside).
You typically don't know exactly what is imported
You can't use tools such as pyflakes to statically detect errors in your code
Python imports work differently than the #includes/imports in a static language like C or Java, in that Python executes the statements in a module. Thus if two modules need to import a specific name (or *) out of each other, you can run into circular-reference problems, such as an ImportError when importing a specific name, or simply not getting the expected names defined (in the case of from ... import *). When you don't request specific names, you don't run the risk of circular references, as long as the name is defined by the time you actually want to use it.
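A sketch of both situations on Python 3, using four made-up modules written to a temporary directory: circ_a and circ_b import specific names from each other, while lazy_a and lazy_b import each other as whole modules and only look names up when a function is called.

```python
import importlib
import pathlib
import sys
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
# Each module asks for a specific name from the other at import time.
(root / "circ_a.py").write_text("from circ_b import b_name\na_name = 1\n")
(root / "circ_b.py").write_text("from circ_a import a_name\nb_name = 2\n")
# Each module imports the other as a whole and resolves names later.
(root / "lazy_a.py").write_text(
    "import lazy_b\na_name = 1\ndef peek():\n    return lazy_b.b_name\n"
)
(root / "lazy_b.py").write_text(
    "import lazy_a\nb_name = 2\ndef peek():\n    return lazy_a.a_name\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    importlib.import_module("circ_a")
    circular_failed = False
except ImportError:
    circular_failed = True

lazy_a = importlib.import_module("lazy_a")
lazy_b = sys.modules["lazy_b"]

print(circular_failed)  # True
print(lazy_a.peek())    # 2
print(lazy_b.peek())    # 1
```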
The from ... import * form also doesn't guarantee you get everything. As stated in the documentation on Python modules, a module can define the __all__ name, which causes from ... import * statements to import only the names listed in __all__ and skip everything else, including submodules.
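The effect of __all__ is easy to check with two in-memory modules. The module names with_all and without_all and both helper functions are invented for this sketch.

```python
import sys
import types


def make_module(name, source):
    """Create an in-memory module from source text and register it (demo helper)."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


def star_import(name):
    """Return the sorted names that 'from <name> import *' binds in a fresh namespace."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    return sorted(key for key in namespace if key != "__builtins__")


make_module("with_all", "__all__ = ['public']\npublic = 1\nother = 2\n_private = 3\n")
make_module("without_all", "public = 1\nother = 2\n_private = 3\n")

print(star_import("with_all"))     # ['public']
print(star_import("without_all"))  # ['other', 'public']
```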
Take the following code example:
File package1/__init__.py:
from moduleB import foo
print moduleB.__name__
File package1/moduleB.py:
def foo(): pass
Then from the current directory:
>>> import package1
package1.moduleB
This code works in CPython. What surprises me about it is that the from ... import statement in __init__.py makes the moduleB name visible. According to the Python documentation, this should not be the case:
The from form does not bind the module name
Could someone please explain why CPython works that way? Is there any documentation describing this in detail?
The documentation misled you as it is written to describe the more common case of importing a module from outside of the parent package containing it.
For example, using "from example import submodule" in my own code, where "example" is some third party library completely unconnected to my own code, does not bind the name "example". It does still import both the example/__init__.py and example/submodule.py modules, create two module objects, and assign example.submodule to the second module object.
But, "from..import" of names from a submodule must set the submodule attribute on the parent package object. Consider if it didn't:
1. package/__init__.py executes when package is imported.
2. That __init__ does "from submodule import name".
3. At some point later, other completely different code does "import package.submodule".
At step 3, either sys.modules["package.submodule"] doesn't exist, in which case loading it again will give you two different module objects in different scopes; or sys.modules["package.submodule"] will exist but "submodule" won't be an attribute of the parent package object (sys.modules["package"]), and "import package.submodule" will do nothing. However, if it does nothing, the code using the import cannot access submodule as an attribute of package!
Theoretically, how importing a submodule works could be changed if the rest of the import machinery was changed to match.
If you just need to know what importing a submodule S from package P will do, then in a nutshell:
Ensure P is imported, or import it otherwise. (This step recurses to handle "import A.B.C.D".)
Execute S.py to get a module object. (Skipping details of .pyc files, etc.)
Store module object in sys.modules["P.S"].
setattr(sys.modules["P"], "S", sys.modules["P.S"])
If that import was of the form "import P.S", bind "P" in local scope.
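The steps above can be sketched by hand in Python 3. This is a deliberately simplified stand-in, not what the real import machinery does internally: load_submodule and the steps_pkg package are hypothetical, finders, .pyc handling and locking are ignored, and (as the real system does) the module is registered in sys.modules before its code runs.

```python
import importlib
import importlib.util
import pathlib
import sys
import tempfile


def load_submodule(parent_name, child_name):
    """Simplified sketch of loading submodule S of package P."""
    parent = importlib.import_module(parent_name)   # ensure P is imported
    full_name = f"{parent_name}.{child_name}"
    if full_name in sys.modules:                    # already loaded: reuse it
        return sys.modules[full_name]
    path = pathlib.Path(parent.__path__[0]) / f"{child_name}.py"
    spec = importlib.util.spec_from_file_location(full_name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[full_name] = module                 # store in sys.modules["P.S"]
    spec.loader.exec_module(module)                 # execute S.py
    setattr(parent, child_name, module)             # bind S as an attribute of P
    return module


# Hypothetical throwaway package used only for this demonstration.
root = pathlib.Path(tempfile.mkdtemp())
(root / "steps_pkg").mkdir()
(root / "steps_pkg" / "__init__.py").write_text("")
(root / "steps_pkg" / "s.py").write_text("value = 7\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

sub = load_submodule("steps_pkg", "s")
print(sub.value)                          # 7
print(sys.modules["steps_pkg"].s is sub)  # True
```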
This is because __init__.py is executed as the package1 module object at runtime, so every .py file in the directory is treated as a submodule of it, and rewriting __all__ makes no difference. You can create another file, e.g. example.py, fill it with the same code as __init__.py, and it will raise NameError.
I think the CPython runtime uses a special lookup algorithm for variables in __init__.py that differs from other Python files, maybe something like this:
looking for variable named "moduleB"
if not found:
    if __file__ == '__init__.py':  # don't raise NameError, look for a file named moduleB.py
        if current dir contains file named "moduleB.py":
            import moduleB
    else:
        raise NameError