In the Python C API, I already know how to import a module via PyImport_ImportModule, as described in Python Documentation: Importing Modules. I also know that there are many ways to create, allocate, or initialize a module, along with some functions for operating on a module, as described in Python Documentation: Module Objects.
But how can I get a function from a module (and call it), or, get a type/class from a module (and instantiate it), or, get an object from a module (and operate on it), or get anything from a module and do anything I want to do?
I think this may be a foolish question, but I really cannot find any tutorial or documentation. The only way I can think of to achieve this is to use PyModule_GetDict to get the __dict__ property of the module and fetch what I want, as described in the latter documentation I mentioned. But the documentation also recommends that one should not use this function to operate on the module.
So is there any "official way" or best practice for getting something from a module?
According to the documentation for PyModule_GetDict:
It is recommended extensions use other PyModule_*() and PyObject_*() functions rather than directly manipulate a module’s __dict__.
The functions you need are generic object functions (PyObject_*) rather than module functions (PyModule_*), and I suspect this is where you were looking in the wrong place.
You want to use PyObject_GetAttr or PyObject_GetAttrString.
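The pattern is easiest to see at the Python level, where getattr plays the role that PyObject_GetAttrString plays in C. This is only a sketch: the math and fractions modules are stand-ins chosen for illustration, not anything from the question. In C the corresponding calls would be PyImport_ImportModule, then PyObject_GetAttrString, then one of the PyObject_Call* functions.

```python
import importlib

# Stand-in module for illustration; any importable module works the same way.
module = importlib.import_module("math")

# Fetch a function by name and call it.
sqrt = getattr(module, "sqrt")
root = sqrt(16.0)

# Fetch a plain object (here a float constant) by name.
pi = getattr(module, "pi")

# Fetch a type by name and instantiate it.
fractions = importlib.import_module("fractions")
Fraction = getattr(fractions, "Fraction")
half = Fraction(1, 2)
```

Functions, types, and other objects are all fetched the same way; what you do with the result afterwards is what differs.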
I've been using Python for a good while, but I have never found out how built-in functions work. In other words, how are they available without importing any module? And what if I want to add to them (locally)?
This may seem naive. But, I haven't really found any answer that explains comprehensively how do we have built-in functions, global variables, etc., available to us when developing a script.
In a nutshell, where do we include the builtins module?
I have encountered this question, but it gives only a partial answer to mine.
The not-implementation-details part of the answer is that the builtins module, or __builtin__ in Python 2, provides access to the built-ins namespace. If you want to modify the built-ins (you usually shouldn't), setting attributes on builtins is how you'd go about it.
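As a small sketch of what setting an attribute on builtins looks like (Python 3; the shout helper is a hypothetical name made up for this example):

```python
import builtins

def shout(text):
    """Hypothetical helper, used only for this illustration."""
    return text.upper() + "!"

# Setting an attribute on the builtins module adds a name to the built-ins namespace.
builtins.shout = shout
try:
    # A fresh global namespace with no 'shout' of its own still resolves the
    # name, because lookup falls back to the built-ins after the globals.
    namespace = {}
    exec("result = shout('hello')", namespace)
    result = namespace["result"]
finally:
    # Undo the change so the rest of the program is unaffected.
    del builtins.shout
```

The try/finally is there because a change like this is visible to every module in the process, which is the main reason it is usually discouraged.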
The implementation details part of the answer is that Python keeps track of built-ins in multiple ways. For example, each frame object keeps track of the built-in namespace it's using, which may be different from other frames' built-in namespaces. You can access this through a frame's f_builtins attribute. When a LOAD_GLOBAL instruction fails to find a name in the frame's globals, it looks in the frame's builtins. There's also a __builtins__ global variable in most global namespaces, but it's not directly used for built-in variable lookup; instead, it's used to initialize f_builtins in certain situations during frame object creation. There's also a builtins reference in the global PyInterpreterState, which is used as default builtins if there's no current frame object.
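The following sketch pokes at two of those details in CPython. It relies on sys._getframe and f_builtins, which are implementation details, and fake_builtins is a hypothetical replacement namespace invented for the example:

```python
import sys

def current_builtins():
    # f_builtins is the dict this frame consults when a global lookup misses.
    return sys._getframe().f_builtins

# The 'len' we see as a bare name is the one stored in the frame's builtins.
same_len = current_builtins()["len"] is len

# A '__builtins__' entry in the globals passed to exec() determines which
# built-ins the executed code sees.
fake_builtins = {"len": lambda obj: 42}  # hypothetical replacement namespace
namespace = {"__builtins__": fake_builtins}
exec("answer = len('abc')", namespace)
answer = namespace["answer"]
```

Here the executed code gets the replacement len rather than the real one, which illustrates how __builtins__ in the globals feeds the frame's built-ins.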
I'm wondering whether Python offers something similar to the package keyword in Perl. This keyword in effect creates a labeled namespace anywhere in the code.
As far as I know, similar namespacing in Python is only possible by putting that code into a file and importing it. But what if I have the code in a variable (e.g. read from some configuration file of my script)?
So in other words: Is there a way to eval Python code within an arbitrary namespace? In Perl I would just add
package my_pack;
at the beginning of that code and then eval it (within a namespace called my_pack)
Thanks for any help.
No, Perl's and Python's module systems work very differently. It is not possible to explicitly declare a specific Python module.
For a Python eval() or exec() that should execute the code within the context of a particular module, consider which aspects define this module for your purposes – the important aspect is likely that module's global variables. You can provide these explicitly, and capture the current environment via the globals() function. The environment is just a dict, which you can copy if you want to avoid modifications of the module's environment.
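A minimal sketch of that idea, assuming the code arrives as a string; the name my_pack is borrowed from the question, and using types.ModuleType to hold the namespace is one option among several (a plain dict works too):

```python
import types

# Code held in a variable, e.g. read from a configuration file.
source = """
counter = 0

def bump():
    global counter
    counter += 1
    return counter
"""

# Create an empty module object and run the code with its __dict__ as globals.
my_pack = types.ModuleType("my_pack")
exec(source, my_pack.__dict__)

first = my_pack.bump()
second = my_pack.bump()

# Copying the dict gives an isolated environment: changes made by the
# executed code do not leak back into my_pack.
isolated = dict(my_pack.__dict__)
exec("counter = 100", isolated)
```

Functions defined by the executed code keep using the namespace they were defined in, so bump() updates my_pack.counter rather than anything in the calling script.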
I wasn't looking at what I had previously written on the line, so I accidentally declared a variable in ipython as:
np.zerosn=10
Surprisingly, this was allowed. So I thought that maybe it was because you can use periods in your variable names, but that is not the case. So I'm wondering what is actually happening. Is this adding a new variable to the numpy module?
Yes.
In general, (most/many) python objects have dynamic attribute spaces, and you can stick whatever you want onto them whenever you want. And modules are just objects. Their attribute space is essentially the same as their global scope.
Pure python functions are another (perhaps surprising) example of something onto which you can stick arbitrary attributes, though these are not associated with the function's local scope.
Most 'builtin' types (i.e. those which are implemented in extension modules, rather than those that are found in the builtins module) and their instances do not have dynamic attribute spaces. Neither do instances of pure python types with __slots__.
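Here is a quick way to check these cases for yourself. This is only a sketch: demo_module stands in for numpy so the example needs no third-party install, and can_set_attribute and Slotted are hypothetical names.

```python
import types

# A module is an ordinary object; its attributes live in its global namespace.
demo_module = types.ModuleType("demo_module")  # stand-in for numpy
demo_module.zerosn = 10
in_module_globals = demo_module.__dict__["zerosn"]

def plain_function():
    return None

# Function attributes are stored on the function object, not in its locals.
plain_function.note = "attached later"
is_local = "note" in plain_function.__code__.co_varnames

class Slotted:
    __slots__ = ("x",)

def can_set_attribute(obj):
    """Report whether an arbitrary new attribute can be attached to obj."""
    try:
        obj.extra = 1
    except AttributeError:
        return False
    return True

results = {
    "module": can_set_attribute(demo_module),
    "function": can_set_attribute(plain_function),
    "int instance": can_set_attribute(5),
    "object instance": can_set_attribute(object()),
    "slotted instance": can_set_attribute(Slotted()),
}
```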
For efficiency's sake I am trying to figure out how python works with its heap of objects (and system of namespaces, but it is more or less clear). So, basically, I am trying to understand when objects are loaded into the heap, how many of them are there, how long they live etc.
And my question is when I work with a package and import something from it:
from pypackage import pymodule
what objects get loaded into the memory (into the object heap of the python interpreter)? And more generally: what happens? :)
I guess the above example does something like:
Some object for the package pypackage is created in memory (containing some information about the package, but not much), the module pymodule is loaded into memory, and a reference to it is created in the local namespace. The important thing here is that no other modules of pypackage (or other objects) are created in memory unless this is stated explicitly (in the module itself, or somewhere in the package initialization tricks and hooks, which I am not familiar with). In the end, the only big thing in memory is pymodule (i.e. all the objects that were created when the module was imported). Is that so? I would appreciate it if someone clarified this matter a little. Maybe you could recommend a useful article about it? (The documentation covers more specific things.)
I have found the following to the same question about the modules import:
When Python imports a module, it first checks the module registry (sys.modules) to see if the module is already imported. If that’s the case, Python uses the existing module object as is.
Otherwise, Python does something like this:
1. Create a new, empty module object (this is essentially a dictionary)
2. Insert that module object in the sys.modules dictionary
3. Load the module code object (if necessary, compile the module first)
4. Execute the module code object in the new module’s namespace. All variables assigned by the code will be available via the module object.
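The steps above can be imitated by hand. This is a simplified sketch, not the real import machinery: import_from_source and the module name hypothetical_demo_mod are made up for the example, and the source comes from a string rather than a file.

```python
import sys
import types

def import_from_source(name, source):
    """Simplified illustration of the listed steps; not the real importer."""
    # Reuse the module if it is already registered.
    if name in sys.modules:
        return sys.modules[name]
    module = types.ModuleType(name)              # new, empty module object
    sys.modules[name] = module                   # register it before running code
    code = compile(source, f"<{name}>", "exec")  # obtain the module code object
    try:
        exec(code, module.__dict__)              # run it in the module's namespace
    except BaseException:
        del sys.modules[name]                    # drop the half-initialised module
        raise
    return module

demo = import_from_source("hypothetical_demo_mod", "ANSWER = 6 * 7\n")
# The second call finds the registry entry and does not run the new source.
again = import_from_source("hypothetical_demo_mod", "ANSWER = 0\n")
```

Registering the module before executing its code is what lets circular imports find a (possibly partially initialised) module instead of recursing forever.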
And would be grateful for the same kind of explanation about packages.
By the way, with packages a module name is added into the sys.modules oddly:
>>> import sys
>>> from pypacket import pymodule
>>> "pymodule" in sys.modules.keys()
False
>>> "pypacket" in sys.modules.keys()
True
And also there is a practical question concerning the same matter.
When I build a set of tools that might be used in different processes and programs, I put them in modules. I then have no choice but to load a full module even when all I want is to use a single function declared there. As I see it, one can make this problem less painful by making small modules and putting them into a package (if a package doesn't load all of its modules when you import only one of them).
Is there a better way to make such libraries in Python? (With the mere functions, which don't have any dependencies within their module.) Is it possible with C-extensions?
PS sorry for such a long question.
You have a few different questions here. . .
About importing packages
When you import a package, the sequence of steps is the same as when you import a module. The only difference is that the package's code (i.e., the code that creates the "module code object") is the code of the package's __init__.py.
So yes, the sub-modules of the package are not loaded unless the __init__.py does so explicitly. If you do from package import module, only module is loaded, unless of course it imports other modules from the package.
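You can check this with a throwaway package. The sketch below builds one in a temporary directory; demo_pkg, used, and unused are hypothetical names, and the __init__.py is deliberately empty.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Lay out a tiny package with two sub-modules and an empty __init__.py.
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "used.py").write_text("VALUE = 1\n")
    (pkg / "unused.py").write_text("VALUE = 2\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        from demo_pkg import used
        # Collect everything from the package that ended up in sys.modules.
        loaded = sorted(n for n in sys.modules if n.startswith("demo_pkg"))
    finally:
        sys.path.remove(tmp)
```

Only the package itself and the requested sub-module end up in loaded; demo_pkg.unused is never executed.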
sys.modules names of modules loaded from packages
When you import a module from a package, the name that is added to sys.modules is the "qualified name" that specifies the module name together with the dot-separated names of any packages you imported it from. So if you do from package.subpackage import mod, what is added to sys.modules is "package.subpackage.mod".
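To see the qualified name in practice, here is a short sketch using the standard library's json package as a stand-in for pypacket:

```python
import sys
from json import decoder

# The sub-module is registered under its dotted, fully qualified name.
qualified_present = "json.decoder" in sys.modules
same_object = sys.modules["json.decoder"] is decoder

# The parent package is registered as well.
package_present = "json" in sys.modules

# The bare name is normally absent, unless an unrelated top-level module
# called 'decoder' happens to be loaded in this process.
bare_present = "decoder" in sys.modules
```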
Importing only part of a module
It is usually not a big concern to have to import the whole module instead of just one function. You say it is "painful" but in practice it almost never is.
If, as you say, the functions have no external dependencies, then they are just pure Python and loading them will not take much time. Usually, if importing a module takes a long time, it's because it loads other modules, which means it does have external dependencies and you have to load the whole thing.
If your module has expensive operations that happen on module import (i.e., they are global module-level code and not inside a function), but aren't essential for use of all functions in the module, then you could, if you like, redesign your module to defer that loading until later. That is, if your module does something like:
def simpleFunction():
    pass

# open files, read huge amounts of data, do slow stuff here
you can change it to
def simpleFunction():
    pass

def loadData():
    # open files, read huge amounts of data, do slow stuff here
    pass
and then tell people "call someModule.loadData() when you want to load the data". Or, as you suggested, you could put the expensive parts of the module into their own separate module within a package.
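A variation on the same idea loads the data automatically the first time it is needed, so callers don't have to remember to call anything. This is a sketch under assumptions: the snake_case names, the load_calls list, and the dummy data are all invented for illustration.

```python
import functools

load_calls = []  # only here so the example can show the slow work runs once

@functools.lru_cache(maxsize=None)
def load_data():
    """Stand-in for opening files and reading huge amounts of data."""
    load_calls.append("loaded")
    return {"rows": list(range(5))}

def simple_function():
    # Cheap function that never touches the expensive data.
    return "cheap"

def total_rows():
    # The first call triggers the load; later calls reuse the cached result.
    return sum(load_data()["rows"])

cheap = simple_function()
loads_before = len(load_calls)
first_total = total_rows()
second_total = total_rows()
loads_after = len(load_calls)
```

Importing a module written this way costs almost nothing, and the slow step is paid only by callers that actually use the data.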
I've never found it to be the case that importing a module caused a meaningful performance impact unless the module was already large enough that it could reasonably be broken down into smaller modules. Making tons of tiny modules that each contain one function is unlikely to gain you anything except maintenance headaches from having to keep track of all those files. Do you actually have a specific situation where this makes a difference for you?
Also, regarding your last point, as far as I'm aware, the same all-or-nothing load strategy applies to C extension modules as for pure Python modules. Obviously, just like with Python modules, you could split things up into smaller extension modules, but you can't do from someExtensionModule import someFunction without also running the rest of the code that was packaged as part of that extension module.
The approximate sequence of steps that occurs when a module is imported is as follows:
1. Python tries to locate the module in sys.modules and does nothing else if it is found. Modules inside packages are keyed by their full name, so while pymodule is missing from sys.modules, pypacket.pymodule will be there (and can be obtained as sys.modules["pypacket.pymodule"]).
2. Python locates the file that implements the module. If the module is part of a package, as determined by the x.y syntax, it will look for directories named x that contain both an __init__.py and y.py (or further subpackages). The bottom-most file located will be either a .py file, a .pyc file, or a .so/.pyd file. If no file that fits the module is found, an ImportError will be raised.
3. An empty module object is created and placed in sys.modules, and the code in the module is executed with the module's __dict__ as the execution namespace.1
4. The module object is injected into the importer's namespace.
Step 3 is the point at which "objects get loaded into memory": the objects in question are the module object, and the contents of the namespace contained in its __dict__. This dict typically contains top-level functions and classes created as a side effect of executing all the def, class, and other top-level statements normally contained in each module.
Note that the above only describes the default implementation of import. There are a number of ways one can customize import behavior, for example by overriding the __import__ built-in or by implementing import hooks.
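As a small illustration of an import hook, the sketch below installs a finder on sys.meta_path that only records which names the import system asks about and then defers to the regular finders. LoggingFinder and the missing module name are hypothetical.

```python
import importlib
import sys
from importlib.abc import MetaPathFinder

class LoggingFinder(MetaPathFinder):
    """Records requested module names, then lets the normal finders run."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None  # None means "not mine", so the next finder is tried

finder = LoggingFinder()
sys.meta_path.insert(0, finder)
try:
    try:
        # A name that should not exist, so the lookup runs through every finder.
        importlib.import_module("hypothetical_missing_module_xyz")
    except ImportError:
        missing = True
    else:
        missing = False
finally:
    sys.meta_path.remove(finder)
```

Finders are only consulted for modules that are not already in sys.modules, which is why the example asks for a name that cannot have been imported before.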
1 If the module file is a .py source file, it will be compiled into memory first, and the code objects resulting from the compilation will be executed. If it is a .pyc, the code objects will be obtained by deserializing the file contents. If the module is a .so or a .pyd shared library, it will be loaded using the operating system's shared-library loading facility, and the init<module> C function (PyInit_<module> in Python 3) will be invoked to initialize the module.
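The compile-then-execute and deserialize-then-execute paths can be imitated in a few lines. This is a sketch only: in CPython a .pyc file is a small header followed by the marshalled code object, and here the header is skipped and the bytes stay in memory.

```python
import marshal

source = "def double(x):\n    return 2 * x\n"

# .py path: compile the source text into a code object.
code = compile(source, "<demo>", "exec")

# .pyc path: the code object is serialized with marshal and read back later.
payload = marshal.dumps(code)
restored = marshal.loads(payload)

# Either way, the resulting code object is executed in a namespace dict.
namespace = {}
exec(restored, namespace)
value = namespace["double"](21)
```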