Should a module export a function named `exit()`?

I'm currently working on a module that wraps a C library (let's call it foo).
The C library has its functions prefixed with foo_ to avoid name clashes with other libraries:
int foo_dothis(int x, int y);
void foo_dothat(struct foo_struct_ *s);
In Python, the foo_ prefix makes little sense, as we have namespaces for that kind of thing:
import foo
foo.dothis(42)
The C library also has functions for initializing/deinitializing the entire library:
int foo_init(void);
void foo_exit(void);
Now I'm wondering whether I should strip the foo_ prefix for those as well, in order to prevent confusion with the built-in exit():
from foo import *
exit()
I guess it is OK, as:
- being consistent is important
- exit() is easier to remember than foo_exit()
- foo.exit() is prettier than foo.foo_exit()
- people are generally discouraged from using exit() in production code (and should only use it in the interactive interpreter)
- importing all symbols from a module asks for trouble anyhow
So what is the common approach here (best practice, ...)?

Since the role of foo_exit() is to uninitialise the library, and this is kind of the inverse of foo_init(), you could simply use the name foo.uninit() for the Python function. This avoids name clashes and confusion with the builtin exit(), and its purpose should be obvious to users of the module.
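For instance, the renaming can happen directly in the binding layer's method table, so the C names never leak into Python at all. A minimal sketch, assuming a hand-written CPython extension (the py_foo_* wrapper functions and the foo.h header are hypothetical):

#include <Python.h>
#include "foo.h"   /* hypothetical header of the wrapped C library */

static PyObject* py_foo_init(PyObject* self, PyObject* noargs) {
    if (foo_init() != 0) {
        PyErr_SetString(PyExc_RuntimeError, "foo_init() failed");
        return NULL;
    }
    Py_RETURN_NONE;
}

static PyObject* py_foo_uninit(PyObject* self, PyObject* noargs) {
    foo_exit();
    Py_RETURN_NONE;
}

/* The Python-visible names drop the foo_ prefix, and foo_exit is
   exposed as uninit to sidestep the builtin exit() entirely. */
static PyMethodDef foo_methods[] = {
    {"init",   py_foo_init,   METH_NOARGS, "Initialize the foo library."},
    {"uninit", py_foo_uninit, METH_NOARGS, "Uninitialize the foo library."},
    {NULL, NULL, 0, NULL}
};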

Related

What's the performance of PyImport_ImportModule(), and what are the best practices for using it?

I have an inner loop C function which (in my case) constructs a Python datetime object:
PyObject* my_inner_loop_fn(void* some_data) {
    PyObject* datetime = PyImport_ImportModule("datetime");
    if (datetime == NULL) return NULL;
    PyObject* datetime_date = PyObject_GetAttrString(datetime, "date");
    PyObject* result = NULL;
    if (datetime_date != NULL) {
        /* long long my_year, my_month, my_day = ... ; */
        PyObject* args = Py_BuildValue("(LLL)", my_year, my_month, my_day);
        result = PyObject_Call(datetime_date, args, NULL);
        Py_XDECREF(args);
    }
    Py_XDECREF(datetime_date);
    Py_DECREF(datetime);
    return result;  /* NULL on failure, with the Python error set */
}
My questions:
- Does the PyImport_ImportModule("datetime") reload the entire module from scratch every time, or is it cached?
- If it is not cached:
  - What is the preferred method of caching it?
  - When is the earliest time it is safe to try importing the other module? Can I assign it to a global variable, for example?
- I want to avoid paying a heavy cost for the import, since the function runs frequently. Is the above expected to be performant code?
Does the PyImport_ImportModule("datetime") reload the entire module from scratch every time, or is it cached?
The standard behaviour is to first check sys.modules to see if the module has already been imported, and to return the cached module if possible. The module is only loaded from scratch if it hasn't already been imported successfully.
You can easily test that yourself by putting some code with a visible side effect (e.g. a print statement) in a module and importing it multiple times.
The module import system is customizable, however, so I believe it's possible for another module to modify that behaviour (for example, the pyximport module has an option to always reload). Therefore, it's not 100% guaranteed.
It may still be worth caching, because there's some cost in doing the sys.modules look-up - it's a balance between the convenience of not having to cache it yourself and speed.
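If you do decide to cache, one common pattern is to stash the looked-up object in a static variable so the import and the attribute lookup happen only once. A minimal sketch (the helper name is made up; it is not thread-safe as written, so guard it with a lock or do the lookup at module initialization if that matters):

#include <Python.h>

/* Returns datetime.date, importing and caching it on first use. */
static PyObject* get_date_type(void) {
    static PyObject* date_type = NULL;   /* strong reference, kept for the life of the process */
    if (date_type == NULL) {
        PyObject* datetime = PyImport_ImportModule("datetime");
        if (datetime == NULL) return NULL;
        date_type = PyObject_GetAttrString(datetime, "date");
        Py_DECREF(datetime);
    }
    return date_type;  /* callers treat this as borrowed; do not DECREF */
}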
When is the earliest time it is safe to try importing the other module?
It's safe after the Python interpreter has been initialized. If you're embedding Python in a C/C++ program this is something you need to think about. If you're writing a Python extension module then you can be confident that the interpreter is initialized for your module to be imported.
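In an embedding scenario, that ordering looks like this (a minimal sketch):

#include <Python.h>

int main(void) {
    Py_Initialize();                      /* no imports are safe before this point */
    PyObject* mod = PyImport_ImportModule("datetime");
    if (mod == NULL) PyErr_Print();
    Py_XDECREF(mod);
    Py_Finalize();
    return 0;
}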
Can I assign it to a global variable, for example?
Yes. However, global variables make it a little difficult for your module to support being unloaded and reloaded cleanly. Lots of C extensions choose not to worry about this. However, the PyModule_GetState mechanism is designed to support this use-case so you might choose to put your cache in the extension module state.
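A rough sketch of the module-state approach (the module and function names are made up, and a production module would also implement m_traverse/m_clear so the cached object cooperates with the garbage collector):

#include <Python.h>

typedef struct {
    PyObject* date_type;   /* cached datetime.date, filled in lazily */
} module_state;

/* For module-level functions, self is the module object itself, so the
   state is always reachable without any C-level globals. */
static PyObject* make_date(PyObject* self, PyObject* args) {
    module_state* state = (module_state*)PyModule_GetState(self);
    if (state->date_type == NULL) {
        PyObject* datetime = PyImport_ImportModule("datetime");
        if (datetime == NULL) return NULL;
        state->date_type = PyObject_GetAttrString(datetime, "date");
        Py_DECREF(datetime);
        if (state->date_type == NULL) return NULL;
    }
    return PyObject_Call(state->date_type, args, NULL);
}

static PyMethodDef methods[] = {
    {"make_date", make_date, METH_VARARGS, "Build a datetime.date."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    "mymodule",
    NULL,
    sizeof(module_state),   /* m_size: CPython zero-initializes this block */
    methods
};

PyMODINIT_FUNC PyInit_mymodule(void) {
    return PyModule_Create(&moduledef);
}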

Constant integer Attributes in Python Module written in C

I implemented a Python extension module in C according to https://docs.python.org/3.3/extending/extending.html
Now I want to have integer constants in that module, so I did:
module = PyModule_Create(&myModuleDef);
...
PyModule_AddIntConstant(module, "VAR1", 1);
PyModule_AddIntConstant(module, "VAR2", 2);
...
return module;
This works. But I can modify the "constants" from Python:
import myModule
myModule.VAR1 = 10
I tried to overload __setattr__, but this function is not called upon assignment.
Is there a solution?
You can't define module-level "constants" in Python as you would in C(++). The Python way is to expect everyone to behave like responsible adults: if something is written in all caps with underscores (as PEP 8 dictates for constants), you shouldn't change it.

How should I perform imports in a python module without polluting its namespace?

I am developing a Python package for dealing with some scientific data. There are multiple frequently-used classes and functions from other modules and packages, including numpy, that I need in virtually every function defined in any module of the package.
What would be the Pythonic way to deal with them? I have considered multiple variants, but each has its own drawbacks.
Import the classes at module-level with from foreignmodule import Class1, Class2, function1, function2
Then the imported functions and classes are easily accessible from every function. On the other hand, they pollute the module namespace, making dir(package.module) and help(package.module) cluttered with imported functions.
Import the classes at function-level with from foreignmodule import Class1, Class2, function1, function2
The functions and classes are easily accessible and do not pollute the module, but imports from up to a dozen modules in every function look like a lot of duplicate code.
Import the modules at module-level with import foreignmodule
There is not much pollution, but this is offset by the need to prepend the module name to every function or class call.
Use some artificial workaround like using a function body for all these manipulations and returning only the objects to be exported... like this
def _export():
    from foreignmodule import Class1, Class2, function1, function2
    def myfunc(x):
        return function1(x, function2(x))
    return myfunc
myfunc = _export()
del _export
This manages to solve both problems, module namespace pollution and ease of use for functions... but it seems to be not Pythonic at all.
So what solution is the most Pythonic? Is there another good solution I overlooked?
Go ahead and do your usual from W import X, Y, Z and then use the __all__ special symbol to define what actual symbols you intend people to import from your module:
__all__ = ('MyClass1', 'MyClass2', 'myvar1', …)
This defines the symbols that will be imported into a user's module if they import * from your module.
In general, Python programmers should not be using dir() to figure out how to use your module, and if they are doing so it might indicate a problem somewhere else. They should be reading your documentation or typing help(yourmodule) to figure out how to use your library. Or they could browse the source code themselves, in which case (a) the difference between things you import and things you define is quite clear, and (b) they will see the __all__ declaration and know which toys they should be playing with.
If you try to support dir() in a situation like this for a task for which it was not designed, you will have to place annoying limitations on your own code, as I hope is clear from the other answers here. My advice: don't do it! Take a look at the Standard Library for guidance: it does from … import … whenever code clarity and conciseness require it, and provides (1) informative docstrings, (2) full documentation, and (3) readable code, so that no one ever has to run dir() on a module and try to tell the imports apart from the stuff actually defined in the module.
One technique I've seen used, including in the standard library, is to use import module as _module or from module import var as _var, i.e. assigning imported modules/variables to names starting with an underscore.
The effect is that other code, following the usual Python convention, treats those members as private. This applies even for code that doesn't look at __all__, such as IPython's autocomplete function.
An example from Python 3.3's random module:
from warnings import warn as _warn
from types import MethodType as _MethodType, BuiltinMethodType as _BuiltinMethodType
from math import log as _log, exp as _exp, pi as _pi, e as _e, ceil as _ceil
from math import sqrt as _sqrt, acos as _acos, cos as _cos, sin as _sin
from os import urandom as _urandom
from collections.abc import Set as _Set, Sequence as _Sequence
from hashlib import sha512 as _sha512
Another technique is to perform imports in function scope, so that they become local variables:
"""Some module"""
# imports conventionally go here
def some_function(arg):
"Do something with arg."
import re # Regular expressions solve everything
...
The main rationale for doing this is that it is effectively lazy, delaying the importing of a module's dependencies until they are actually used. Suppose one function in the module depends on a particular huge library. Importing the library at the top of the file would mean that importing the module would load the entire library. This way, importing the module can be quick, and only client code that actually calls that function incurs the cost of loading the library. Further, if the dependency library is not available, client code that doesn't need the dependent feature can still import the module and call the other functions. The disadvantage is that using function-level imports obscures what your code's dependencies are.
Example from Python 3.3's os.py:
def get_exec_path(env=None):
    """[...]"""
    # Use a local import instead of a global import to limit the number of
    # modules loaded at startup: the os module is always loaded at startup by
    # Python. It may also avoid a bootstrap issue.
    import warnings
Import the module as a whole: import foreignmodule. What you claim as a drawback is actually a benefit. Namely, prepending the module name makes your code easier to maintain and makes it more self-documenting.
Six months from now when you look at a line of code like foo = Bar(baz) you may ask yourself which module Bar came from, but with foo = cleverlib.Bar it is much less of a mystery.
Of course, the fewer imports you have, the less of a problem this is. For small programs with few dependencies it really doesn't matter all that much.
When you find yourself asking questions like this, ask yourself what makes the code easier to understand, rather than what makes the code easier to write. You write it once but you read it a lot.
For this situation I would go with an all_imports.py file which has all the
from foreignmodule import .....
from anothermodule import .....
and then in your working modules:
import all_imports as fgn # or whatever you want to prepend
...
something = fgn.Class1()
Another thing to be aware of:
__all__ = ['func1', 'func2', 'this', 'that']
Now, any functions/classes/variables/etc. that are in your module but not in your module's __all__ will not show up in help(), and won't be imported by from mymodule import *. See Making python imports more structured? for more info.
I would compromise and just pick a short alias for the foreign module:
import foreignmodule as fm
It saves you completely from the pollution (probably the bigger issue) and at least reduces the prepending burden.
I know this is an old question. It may not be 'Pythonic', but the cleanest way I've discovered for exporting only certain module definitions is, really as you've found, to wrap the whole module in a function. But instead of returning the names you want to export, you can simply globalize them (global thus in essence becomes a kind of 'export' keyword):
def module():
    global MyPublicClass, ExportedModule
    import somemodule as ExportedModule
    import anothermodule as PrivateModule

    class MyPublicClass:
        def __init__(self):
            pass

    class MyPrivateClass:
        def __init__(self):
            pass

module()
del module
I know it's not much different than your original conclusion, but frankly to me this seems to be the cleanest option. The other advantage is, you can group any number of modules written this way into a single file, and their private terms won't overlap:
def module():
    global A
    i, j, k = 1, 2, 3
    class A:
        pass
module()
del module

def module():
    global B
    i, j, k = 7, 8, 9  # doesn't overwrite previous declarations
    class B:
        pass
module()
del module
Though, keep in mind their public definitions will, of course, overlap.

Dynamic module creation

I'd like to dynamically create a module from a dictionary, and I'm wondering if adding an element to sys.modules is really the best way to do this. E.g.:
context = {'a': 1, 'b': 2}
import types
test_context_module = types.ModuleType('TestContext', 'Module created to provide a context for tests')
test_context_module.__dict__.update(context)
import sys
sys.modules['TestContext'] = test_context_module
My immediate goal in this regard is to be able to provide a context for timing test execution:
import timeit
timeit.Timer('a + b', 'from TestContext import *')
It seems that there are other ways to do this, since the Timer constructor takes objects as well as strings. I'm still interested in learning how to do this though, since a) it has other potential applications; and b) I'm not sure exactly how to use objects with the Timer constructor; doing so may prove to be less appropriate than this approach in some circumstances.
EDITS/REVELATIONS/PHOOEYS/EUREKA:
I've realized that the example code relating to running timing tests won't actually work, because import * only works at the module level, and the context in which that statement is executed is that of a function in the testit module. In other words, the globals dictionary used when executing that code is that of __main__, since that's where I was when I wrote the code in the interactive shell. So that rationale for figuring this out is a bit botched, but it's still a valid question.
I've discovered that the code run in the first set of examples has the undesirable effect that the namespace in which the newly created module's code executes is that of the module in which it was declared, not its own module. This is like way weird, and could lead to all sorts of unexpected rattlesnakeic sketchiness. So I'm pretty sure that this is not how this sort of thing is meant to be done, if it is in fact something that the Guido doth shine upon.
The similar-but-subtly-different case of dynamically loading a module from a file that is not on Python's module search path is quite easily accomplished using imp.load_source('NewModuleName', 'path/to/module/module_to_load.py'). This does load the module into sys.modules. However, this doesn't really answer my question, because really, what if you're running Python on an embedded platform with no filesystem?
I'm battling a considerable case of information overload at the moment, so I could be mistaken, but there doesn't seem to be anything in the imp module that's capable of this.
But the question, essentially, at this point is how to set the global (ie module) context for an object. Maybe I should ask that more specifically? And at a larger scope, how to get Python to do this while shoehorning objects into a given module?
Hmm, well one thing I can tell you is that timeit actually executes its code using the timeit module's global variables. So in your example, you could write
import timeit
timeit.a = 1
timeit.b = 2
timeit.Timer('a + b').timeit()
and it would work. But that doesn't address your more general problem of defining a module dynamically.
Regarding the module definition problem, it's definitely possible and I think you've stumbled on to pretty much the best way to do it. For reference, the gist of what goes on when Python imports a module is basically the following:
module = imp.new_module(name)
execfile(file, module.__dict__)
That's kind of the same thing you do, except that you load the contents of the module from an existing dictionary instead of a file. (I don't know of any difference between types.ModuleType and imp.new_module other than the docstring, so you can probably use them interchangeably.) What you're doing is somewhat akin to writing your own importer, and when you do that, you can certainly expect to mess with sys.modules.
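Incidentally, the same trick covers the no-filesystem case from the question, since no file is involved at any point: build an empty module, fill in its dict, and register it in sys.modules by hand. A rough C sketch in the same Python 2 vintage as the rest of this question (error handling omitted; the function name is made up):

PyObject* make_context_module(void) {
    PyObject* mod = PyModule_New("TestContext");  /* empty module, no file behind it */
    PyObject* d = PyModule_GetDict(mod);          /* borrowed reference */
    PyObject* one = PyInt_FromLong(1);
    PyObject* two = PyInt_FromLong(2);
    PyDict_SetItemString(d, "a", one);
    PyDict_SetItemString(d, "b", two);
    Py_DECREF(one);
    Py_DECREF(two);
    /* the equivalent of sys.modules['TestContext'] = mod */
    PyDict_SetItemString(PyImport_GetModuleDict(), "TestContext", mod);
    return mod;
}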
As an aside, even if your import * thing was legal within a function, you might still have problems because oddly enough, the statement you pass to the Timer doesn't seem to recognize its own local variables. I invoked a bit of Python voodoo by the name of extract_context() (it's a function I wrote) to set a and b at the local scope and ran
print timeit.Timer('print locals(); a + b', 'sys.modules["__main__"].extract_context()').timeit()
Sure enough, the printout of locals() included a and b:
{'a': 1, 'b': 2, '_timer': <built-in function time>, '_it': repeat(None, 999999), '_t0': 1277378305.3572791, '_i': None}
but it still complained NameError: global name 'a' is not defined. Weird.

Combining C and Python functions in a module

I have a C extension module, to which I would like to add some Python utility functions. Is there a recommended way of doing this?
For example:
import my_module
my_module.super_fast_written_in_C()
my_module.written_in_Python__easy_to_maintain()
I'm primarily interested in Python 2.x.
The usual way of doing this is: mymod.py contains the utility functions written in Python and imports the goodies from the _mymod module, which is written in C and is imported from _mymod.so or _mymod.pyd. For example, look at .../Lib/csv.py in your Python distribution.
Prefix your native extension with an underscore.
Then, in Python, create a wrapper module that imports that native extension and adds some other non-native routines on top of that.
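On the C side, only the extension's name carries the underscore. A minimal Python 2 style sketch (assuming a mymod_methods table is already defined elsewhere):

/* Entry point for the extension imported as _mymod. */
PyMODINIT_FUNC init_mymod(void) {
    Py_InitModule3("_mymod", mymod_methods, "C core of mymod; import mymod instead.");
}

The pure-Python mymod.py then does something like from _mymod import * and defines the extra utility functions alongside.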
The existing answers describe the method most often used: it has the potential advantage of allowing pure-Python (or other-language) implementations on platforms in which the compiled C extension is not available (including Jython and IronPython).
In a few cases, however, it may not be worth splitting the module into a C layer and a Python layer just to provide a few extras that are more sensibly written in Python than in C. For example, gmpy (lines 7113 ff. at this time), in order to enable pickling of instances of gmpy's types, uses:
copy_reg_module = PyImport_ImportModule("copy_reg");
if (copy_reg_module) {
    char* enable_pickle =
        "def mpz_reducer(an_mpz): return (gmpy.mpz, (an_mpz.binary(), 256))\n"
        "def mpq_reducer(an_mpq): return (gmpy.mpq, (an_mpq.binary(), 256))\n"
        "def mpf_reducer(an_mpf): return (gmpy.mpf, (an_mpf.binary(), 0, 256))\n"
        "copy_reg.pickle(type(gmpy.mpz(0)), mpz_reducer)\n"
        "copy_reg.pickle(type(gmpy.mpq(0)), mpq_reducer)\n"
        "copy_reg.pickle(type(gmpy.mpf(0)), mpf_reducer)\n"
        ;
    PyObject* namespace = PyDict_New();
    PyObject* result = NULL;
    if (options.debug)
        fprintf(stderr, "gmpy_module imported copy_reg OK\n");
    PyDict_SetItemString(namespace, "copy_reg", copy_reg_module);
    PyDict_SetItemString(namespace, "gmpy", gmpy_module);
    PyDict_SetItemString(namespace, "type", (PyObject*)&PyType_Type);
    result = PyRun_String(enable_pickle, Py_file_input,
                          namespace, namespace);
If you want those few extra functions to "stick around" in your module (not necessary in this example case), you would of course use your module object as built by Py_InitModule3 (or whatever other method) and its PyModule_GetDict rather than a transient dictionary as the namespace in which to PyRun_String. And of course there are more sophisticated approaches than to PyRun_String the def and class statements you need, but, for simple enough cases, this simple approach may in fact be sufficient.
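In sketch form, that variant looks like this (reusing the gmpy_module object from the snippet above; the helper's name and body are made up):

/* Run the Python source directly in the module's own namespace so the
   resulting function sticks around as a module attribute. */
PyObject* module_dict = PyModule_GetDict(gmpy_module);  /* borrowed reference */
PyObject* run_result = PyRun_String(
    "def written_in_python(x):\n"
    "    return x * 2\n",
    Py_file_input, module_dict, module_dict);
if (run_result == NULL) {
    /* the def failed; the Python error is set for the caller to handle */
}
Py_XDECREF(run_result);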
