How is the __name__ variable in a Python module defined? - python

I'm aware of the standard example: if you execute a module directly then its __name__ global variable is defined as "__main__". However, nowhere in the documentation can I find a precise description of how __name__ is defined in the general case. The module documentation says...
Within a module, the module's name (as a string) is available as the value of the global variable __name__.
...but what does it mean by "the module's name"? Is it just the name of the module (the filename with .py removed), or does it include the fully-qualified package name as well?
How is the value of the __name__ variable in a Python module determined? For bonus points, indicate precisely where in the Python source code this operation is performed.

It is set to the absolute name of the module as imported. If you imported it as foo.bar, then __name__ is set to 'foo.bar'.
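For instance, using the standard-library package logging and its submodule handlers (chosen here only as a convenient dotted example):
>>> import logging.handlers
>>> logging.handlers.__name__
'logging.handlers'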
The name is determined in the import.c source file, but because that file handles several different types of imports (including zip imports, bytecode-only imports and extension modules) there are several code paths to trace through.
Normally, import statements are translated to a call to __import__, which is by default implemented as a call to PyImport_ImportModuleLevelObject. See the __import__() documentation to get a feel for what the arguments mean. Within PyImport_ImportModuleLevelObject relative names are resolved, so you can chase down the name variables there if you want to.
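As a rough sketch of that translation (simplified from the __import__() documentation; the real call also receives the importing module's globals so that relative names can be resolved):
# "import foo.bar" is roughly equivalent to:
foo = __import__('foo.bar', globals(), locals(), [], 0)
# foo is bound to the top-level package; foo.bar.__name__ == 'foo.bar'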
The rest of the file handles the actual imports, with PyImport_AddModuleObject creating the actual namespace object and setting the name key, but you can trace that name value back to PyImport_ImportModuleLevelObject. When the module object is created, its __name__ value is set in the moduleobject.c constructor.

The __name__ variable is an attribute of the module object, holding the name under which the module is known:
import os
assert os.__name__ == 'os'
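The same string is also the key under which the module is stored in sys.modules:
import sys
import os
assert sys.modules[os.__name__] is os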
Example of a self-created module that sketches the import mechanism:
>>> import types
>>> m = types.ModuleType("name of module") # create new module with name
>>> exec("source_of_module = __name__", m.__dict__) # execute source in module
>>> m.source_of_module
'name of module'
Lines from types module:
import sys
ModuleType = type(sys)

Python 'from x import z' imports more than just 'z' [duplicate]

I've noticed that asyncio/__init__.py from Python 3.6 uses the following construct:
from .base_events import *
...
__all__ = (base_events.__all__ + ...)
The base_events symbol is not imported anywhere in the source code, yet the module still contains a local variable for it.
I've checked this behavior with the following code, put into an __init__.py with a dummy test.py next to it:
test = "not a module"
print(test)
from .test import *
print(test)
not a module
<module 'testpy.test' from 'C:\Users\MrM\Desktop\testpy\test.py'>
Which means that the test variable got shadowed after using a star import.
I fiddled with it a bit, and it turns out that it doesn't have to be a star import, but it has to be inside an __init__.py, and it has to be relative. Otherwise the module object is not being assigned anywhere.
Without the assignment, running the above example from a file that isn't an __init__.py will raise a NameError.
Where is this behavior coming from? Has this been outlined in the spec for import system somewhere? What's the reason behind __init__.py having to be special in this way? It's not in the reference, or at least I couldn't find it.
This behavior is defined in the import system documentation, section 5.4.2 (Submodules):
When a submodule is loaded using any mechanism (e.g. importlib APIs,
the import or import-from statements, or built-in __import__()) a
binding is placed in the parent module’s namespace to the submodule
object. For example, if package spam has a submodule foo, after
importing spam.foo, spam will have an attribute foo which is bound to
the submodule.
A package namespace includes the names created in __init__.py plus extras added by the import system. The reason is namespace consistency:
Given Python’s familiar name binding rules this might seem surprising,
but it’s actually a fundamental feature of the import system. The
invariant holding is that if you have sys.modules['spam'] and
sys.modules['spam.foo'] (as you would after the above import), the
latter must appear as the foo attribute of the former.
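That invariant is easy to check with the asyncio example from the question (a quick session, assuming Python 3.6+):
>>> import sys
>>> import asyncio.base_events
>>> sys.modules['asyncio'].base_events is sys.modules['asyncio.base_events']
True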
This appears to have everything to do with how the interpreter resolves name bindings at the module/submodule level. We may be able to acquire additional information if we interrogate the bindings using code executed outside the module we are trying to inspect.
In my example, I have the following:
Code listing for src/example/package/module.py:
from logging import getLogger
__all__ = ['fn1']
logger = getLogger(__name__)
def fn1():
    logger.warning('running fn1')
    return 'fn1'
Code listing for src/example/package/__init__.py:
def print_module():
    print("`module` is assigned with %r" % module)
Now execute the following in the interactive interpreter:
>>> from example.package import print_module
>>> print_module()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tmp/example.package/src/example/package/__init__.py", line 2, in print_module
print("`module` is assigned with %r" % module)
NameError: name 'module' is not defined
So far so good, the exception looks perfectly normal. Now let's see what happens if example.package.module gets imported:
>>> import example.package.module
>>> print_module()
`module` is assigned with <module 'example.package.module' from '/tmp/example.package/src/example/package/module.py'>
Given that a relative import is shorthand syntax for the full import, let's modify __init__.py to use an absolute import, like the one just done in the interactive interpreter, rather than a relative one:
import example.package.module
def print_module():
    print("`module` is assigned with %r" % module)
Launch the interactive interpreter once more, import print_module as before, and we see this:
>>> print_module()
`module` is assigned with <module 'example.package.module' from '/tmp/example.package/src/example/package/module.py'>
Note that __init__.py actually provides the module binding for example.package; an intuition might be that if example.package.module is imported, the interpreter will then bind module inside example.package to aid with the resolution of example.package.module, regardless of whether the import was absolute or relative. This seems to be a particular quirk of executing code in a module that may have submodules (i.e. an __init__.py).
Actually, one more test. Let's see if there is just something weird to do with variable assignments. Modify src/example/package/__init__.py to:
import example.package.module
def print_module():
    print("`module` is assigned with %r" % module)

def delete_module():
    del module
The new function would test whether or not module was actually assigned to the scope at __init__.py. Executing this we learn that:
>>> from example.package import print_module, delete_module
>>> print_module()
`module` is assigned with <module 'example.package.module' from '/tmp/example.package/src/example/package/module.py'>
>>> delete_module()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tmp/example.package/src/example/package/__init__.py", line 7, in delete_module
del module
UnboundLocalError: local variable 'module' referenced before assignment
The UnboundLocalError is actually a red herring: del module inside delete_module makes module a local name of that function, so the del never reaches the package namespace at all. The binding really is there: once example.package.module is imported, the import system places a module attribute on example.package, and that namespace is exactly the global scope of __init__.py, which is why print_module can resolve the name even though nothing in __init__.py ever assigns it. So the earlier intuition was essentially right; the assignment is simply performed by the import system rather than by any code written in __init__.py.
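A more direct way to see where the binding lives (a hypothetical session against the same example.package layout used above):
>>> import example.package.module
>>> 'module' in vars(example.package)   # vars(example.package) is __init__.py's global namespace
True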
I haven't looked at the specific PEPs that deal with assignment/name resolution for modules and imports, but this little exercise shows that the behavior is not reliant on relative imports, and that the binding appears regardless of when or where the import is done. Hopefully this provides a greater understanding of how Python's import system resolves names relating to imported modules.

Python - Calling a module function which uses globals() but it only pulls globals() from the module not the current python instance

python version: 3.8.1
platform = Windows 10 pro
dev environment : visual studio code, jupyter notebook, command line
I have a function that I import from a personal module to find all the current pandas DataFrames in memory, i.e. in globals(). However, I only get the globals() from the module. (I know now that globals() only applies to the module it is called in) <--- This is my problem!
As stated above, I know this is not even remotely able to be done with the normal methodology. Here is my code from start to finish. Note that I am calling my function from the module, and this will only return the module's globals(), not the globals() of my current Python instance.
Code from "my_module" module:
# my_module.py #
import pandas as pd
module_dataframe = pd.DataFrame({'yourName'}, columns=['Name']) # creates dataframe in my module
def curDFs():
    i = '' # create the variable before looping through globals() to prevent errors when initiating the loop
    dfList = []
    for i in globals():
        if str(type(globals()[i])) == "<class 'pandas.core.frame.DataFrame'>":
            dfList.append(i)
    return dfList
So you can see I am creating a DataFrame in my module and defining the function curDFs() to return the names of the variables in globals() that are DataFrames.
Below is the code from a brand new python session:
# new_window.py #
import my_module as mm
#not importing pandas since already imported into "my_module"
new_dataframe = mm.pd.DataFrame({'name'},columns = ['YourName'])
#calling globals() here will return the "local global variable"->"new_dataframe" created above
#if i call my "curDFs()" function from the module, I return not 'new_dataframe' but the 'module_dataframe' from the module
mm.curDFs()
#returns
['module_dataframe']
I need this to return only the "new_dataframe" variable. How would I do this? I am so stuck; every other post just goes over global and local scope, or how to create a config.py file of global variables. However, this has to be dynamic rather than static, which is how I see the config.py approach.
I think it would be a hassle to have to rebuild this function in every single instance I spin up. I want to be able to import it so I can share it with others and minimize the repetitive typing caused by having to copy-paste or retype it in every new Python instance.
Any comments or help here is much appreciated.
Sorry, it seems I did not read your question carefully enough.
For your problem I found a hint in this question.
It seems that the inspect module can do what you want:
# my_module.py #
import pandas as pd
import inspect
module_dataframe = pd.DataFrame({'yourName'}, columns=['Name']) # creates dataframe in my module
def curDFs():
    caller_frame = inspect.stack()[1]
    caller_module = inspect.getmodule(caller_frame[0])
    return [name for name, entry in caller_module.__dict__.items() if str(type(entry)) == "<class 'pandas.core.frame.DataFrame'>"]
This works on my tests, but please note that this may fail under various conditions (some caveats are at the linked post).
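If the inspect.getmodule lookup ever comes back empty (it can for frames that don't belong to an importable module, e.g. some interactive sessions), an alternative sketch is to read the caller's globals straight off the frame; the pandas import and the DataFrame check are the same as in the question:
# my_module.py (alternative sketch)
import inspect
import pandas as pd

def curDFs():
    # f_back is the frame of whoever called curDFs(); f_globals is that
    # caller's module-level (or interactive) namespace.
    caller_globals = inspect.currentframe().f_back.f_globals
    return [name for name, value in caller_globals.items()
            if isinstance(value, pd.DataFrame)]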
This is the expected behaviour. According to the Python Language Reference, Chapter 4.2.2 (Resolution of Names) (emphasis added):
If the global statement occurs within a block, all uses of the name specified in the statement refer to the binding of that name in the top-level namespace. Names are resolved in the top-level namespace by searching the global namespace, i.e. the namespace of the module containing the code block, and the builtins namespace, the namespace of the module builtins. The global namespace is searched first. If the name is not found there, the builtins namespace is searched. The global statement must precede all uses of the name.
The namespace of the module containing the code block in your case is the namespace of my_module.py and only of my_module.py.
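A minimal way to see that rule in action (a hypothetical helper added to my_module.py):
# my_module.py
def where_am_i():
    # globals() here is always my_module's namespace,
    # no matter which module calls this function
    return globals()['__name__']

# From new_window.py:
#   import my_module as mm
#   mm.where_am_i()   # -> 'my_module'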

Given an imported module, how can I determine the import path?

I have a Python function that takes an imported module as a parameter:
def printModule(module):
    print("That module is named '%s'" % magic(module))
import foo.bar.baz
printModule(foo.bar.baz)
What I want is to be able to extract the module name (in this case foo.bar.baz) from a passed reference to the module. In the above example, the magic() function is a stand-in for the function I want.
__name__ would normally work, but that requires executing in the context of the passed module, and I'm trying to extract this information from merely a reference to the module.
What is the proper procedure here?
All I can think of is either doing str(module) and then some text hacking, or trying to inject a function into the module that I can then call to return __name__, and neither of those solutions is elegant. (I tried both; neither actually works. Modules can apparently alter their name in the str() representation, and injecting a function into the module and then calling it just returns the caller's context.)
The __name__ attribute seems to work:
def magic(m):
    return m.__name__
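Applied to the question's example (assuming the foo.bar.baz package actually exists on the path):
>>> import foo.bar.baz
>>> magic(foo.bar.baz)
'foo.bar.baz'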
If you have a string with the module name, you can use pkgutil.
import pkgutil
pkg = pkgutil.get_loader(module_name)
print pkg.fullname
From the module itself,
import pkgutil
pkg = pkgutil.get_loader(module.__name__)
print pkg.fullname

Does python ever multiply-load a module?

I'm noticing some weird situations where tests like the following fail:
x = <a function from some module, passed around some big application for a while>
mod = __import__(x.__module__)
x_ref = getattr(mod, x.__name__)
assert x_ref is x # Fails
(Code like this appears in the pickle module)
I don't think I have any import hooks, reload calls, or sys.modules manipulation that would mess with python's normal import caching behavior.
Is there any other reason why a module would be loaded twice? I've seen claims about this (e.g, https://stackoverflow.com/a/10989692/1332492), but I haven't been able to reproduce it in a simple, isolated script.
I believe you misunderstood how __import__ works:
>>> from my_package import my_module
>>> my_module.function.__module__
'my_package.my_module'
>>> __import__(my_module.function.__module__)
<module 'my_package' from './my_package/__init__.py'>
From the documentation:
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not
the module named by name. However, when a non-empty fromlist
argument is given, the module named by name is returned.
As you can see, __import__ does not return the sub-module, but only the top-level package. If you also have the function defined at the package level, you will indeed get different references to it.
If you want to just load a module you should use importlib.import_module instead of __import__.
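A minimal sketch of the difference, reusing the my_package.my_module layout from above:
import importlib

top = __import__('my_package.my_module')               # returns the top-level package
sub = importlib.import_module('my_package.my_module')  # returns the submodule itself

print(top.__name__)   # 'my_package'
print(sub.__name__)   # 'my_package.my_module'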
To answer your actual question: AFAIK there is no way to import the same module, with the same name, twice without messing around with the import machinery. However, you could have a submodule of a package whose directory is also on sys.path; in this case you can import it twice under different names:
from some.package import submodule
import submodule as submodule2
print(submodule is submodule2) # False. They have *no* relationships.
This sometimes can cause problems with, e.g., pickle. If you pickle something referenced by submodule you cannot unpickle it using submodule2 as reference.
However this doesn't address the specific example you gave us, because using the __module__ attribute the import should return the correct module.
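Applying that to the original identity test, a version that should hold (a sketch; x is any module-level function, as in the question):
import importlib

def same_object(x):
    # import_module resolves the dotted name to the actual submodule,
    # so the getattr looks in the right namespace
    mod = importlib.import_module(x.__module__)
    return getattr(mod, x.__name__) is x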

Some confusion regarding imports in Python

I'm new to Python and there's something that's been bothering me for quite some time. I read in "Learning Python" by Mark Lutz that when we use a from statement to import a name present in a module, it first imports the module, then assigns a new name to it (i.e. the name of the function, class, etc. present in the imported module) and then deletes the module object with the del statement. However what happens if I try to import a name using from that references a name in the imported module that itself is not imported? Consider the following example in which there are two modules mod1.py and mod2.py:
#mod1.py
from mod2 import test
test('mod1.py')
#mod2.py
def countLines(name):
    print len(open(name).readlines())
def countChars(name):
    print len(open(name).read())
def test(name):
    print 'loading...'
    countLines(name)
    countChars(name)
    print '-'*10
Now see what happens when I run or import mod1:
>>>import mod1
loading...
3
44
----------
Here when I imported and ran the test function, it ran successfully although I didn't even import countChars or countLines, and the from statement had already deleted the mod2 module object.
So I basically need to know why this code works even though considering the problems I mentioned it shouldn't.
EDIT: Thanx alot to everyone who answered :)
Every function has a __globals__ attribute which holds a reference to the namespace where it looks up global variables and functions.
The test function is then linked to the global variables of mod2. So when it calls countLines the interpreter will always find the right function even if you wrote a new one with the same name in the module importing the function.
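A quick check of that link, using the mod2 from the question:
>>> from mod2 import test
>>> 'countLines' in test.__globals__
True
>>> test.__globals__['__name__']
'mod2'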
I think you're wrestling with the way python handles namespaces. when you type from module import thing you are bringing thing from module into your current namespace. So, in your example, when mod1 gets imported, the code is evaluated in the following order:
from mod2 import test #Import mod2, bring test function into current module namespace
test("mod1.py") #run the test function (defined in mod2)
And now for mod2:
#create a new function named 'test' in the current (mod2) namespace
#the first time this module is imported. Note that this function has
#access to the entire namespace where it is defined (mod2).
def test(name):
    print 'loading...'
    countLines(name)
    countChars(name)
    print '-'*10
The reason that all of this is important is because Python lets you choose exactly what you want to pull into your namespace. For example, say you have a module1 which defines a function cool_func. Now you are writing another module (module2) and it makes sense for module2 to have a function cool_func also. Python allows you to keep those separate. In module3 you could do:
import module1
import module2
module1.cool_func()
module2.cool_func()
Or, you could do:
from module1 import cool_func
import module2
cool_func() #module1
module2.cool_func()
or you could do:
from module1 import cool_func as cool
from module2 import cool_func as cooler
cool() #module1
cooler() #module2
The possibilities go on ...
Hopefully my point is clear. When you import an object from a module, you are choosing how you want to reference that object in your current namespace.
The other answers are better articulated than this one, but if you run the following you can see that countChars and countLines are actually both defined in test.__globals__:
from pprint import pprint
from mod2 import test
pprint(test.__globals__)
test('mod1.py')
You can see that importing test brings along the other globals defined in mod2, letting you run the function without worrying about having to import everything you need.
Each module has its own scope. Within mod1, you cannot use the names countLines or countChars (or mod2).
mod2 itself isn't affected in the least by how it happens to be imported elsewhere; all names defined in it are available within the module.
If the book you reference really says that the module object is deleted with the del statement, it's wrong. del only removes names, it doesn't delete objects.
From A GUIDE TO PYTHON NAMESPACES,
Even though modules have their own global namespaces, this doesn’t mean that all names can be used from everywhere in the module. A scope refers to a region of a program from where a namespace can be accessed without a prefix. Scopes are important for the isolation they provide within a module. At any time there are a number of scopes in operation: the scope of the current function you’re in, the scope of the module and then the scope of the Python builtins. This nesting of scopes means that one function can’t access names inside another function.
Namespaces are also searched for names inside out. This means that if there is a certain name declared in the module’s global namespace, you can reuse the name inside a function while being certain that any other function will get the global name. Of course, you can force the function to use the global name by prefixing the name with the ‘global’ keyword. But if you need to use this, then you might be better off using classes and objects.
An import statement loads the whole module into memory, which is why the test() function ran successfully.
But because you used a from statement, you can't use countLines and countChars directly, although test can still call them.
A from statement loads the whole module and then binds only the requested names (functions, variables, etc.) into the importer's global namespace.
for eg.
>>> from math import sin
>>> sin(90) #now sin() is a global name in the module and can be accessed directly
0.89399666360055785
>>> math
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
math
NameError: name 'math' is not defined
>>> vars() #shows the current namespace, and there's sin() in it
{'__builtins__': <module '__builtin__' (built-in)>, '__file__': '/usr/bin/idle', '__package__': None, '__name__': '__main__', 'main': <function main at 0xb6ac702c>, 'sin': <built-in function sin>, '__doc__': None}
consider a simple file, file.py:
def f1():
    print 2+2
def f2():
    f1()
import only f2:
>>> from file import f2
>>> f2()
4
Though I only imported f2(), not f1(), it still ran f1() successfully. That's because the whole module is loaded into memory; we can only access f2() by name, but f2() can still access the other parts of the module.
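The same __globals__ check from earlier shows why:
>>> from file import f2
>>> 'f1' in f2.__globals__
True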
