Design pattern for setup.py to avoid duplicated metadata - python

I have written some python scripts, and embedded some metadata at the top, i.e.:
__version__ = '0.6.1'
Someone contributed a setup.py, mostly just copying the metadata from the script, so we have:
setup(
(...)
version='0.4.0',
I dislike such duplication a lot; the reason should be obvious from the mismatch above. Updating the metadata is done by people, and not all people know/remember that it needs to be updated in both places. Mistakes are bound to happen. Not to forget that it's also easier to update the version number in just one place. I've attempted to deduplicate the metadata by removing it from the code itself, but ... it's actually referenced in the script (it is a script and it accepts --version and --help after all). It's also considered "best practice" to embed such metadata in the script.
I can certainly do some tricks to let setup.py read the metadata from the script. Indeed, it would probably take me less time than writing this question - what I really want to know is what is considered "best practice". Is it really considered "best practice" to duplicate this information in both setup.py and the library/script? Does there exist some ready-made cookbook boilerplate code for reading the metadata from the script/package itself? Wouldn't it make sense if the setup method could do this by itself?

You could just import your package and get the metadata from there.
This does assume that your file structure looks like this:
root/
code.py
setup.py
Or
root/
code/
__init__.py
setup.py
And either code.py or __init__.py has the metadata.
from os import path
import sys

from setuptools import setup

__dir__ = path.abspath(path.dirname(__file__))  # The directory of setup.py, root/
sys.path.insert(0, __dir__)  # Search __dir__ first for the package
try:
    import code
finally:
    # Restore sys.path even if the import fails, so setup.py does not
    # permanently alter the import path
    sys.path.pop(0)

setup_args = dict(
    name='code',
    # ...
)
# Map the module-level dunders onto the corresponding setup() keywords
# (note that __email__ maps onto author_email)
for metadata, kwarg in (('version', 'version'), ('author', 'author'),
                        ('email', 'author_email'), ('license', 'license')):
    if hasattr(code, '__{}__'.format(metadata)):
        setup_args[kwarg] = getattr(code, '__{}__'.format(metadata))
setup(**setup_args)

After a bit of research into this, I'll answer myself ...
First of all - apparently there doesn't exist any convention to embed metadata in the script or library itself - I can only guess that it was my own invention to add __author__, __license__ etc. into the script.
OTOH, there exists a convention to add __version__ to scripts and libraries. PEP 8 describes that this should be automatically set to the VCS revision number on every commit - though it is commonly set to a semantic version number. PEP 396 was an attempt to clear this up, but was deferred due to a lack of interest or time.
Hence, when having semantic version numbers in __version__ and also supplying the project with a setup.py, one will either need to update the version number in two places, or do some tricks to get it auto-updated in one place (or possibly let the VCS auto-update it in both places).
As for myself, I ended up importing the script into the setup.py and reading __version__ from there -
https://github.com/tobixen/calendar-cli/commit/854948ddc057f0ce7cc047b3618e8d5f6a1a292c
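For completeness, a common alternative is to read __version__ out of the file with a regular expression instead of importing it, which avoids running any of the script's code at setup time. This is a minimal sketch, not what the commit above does; run.py and the project name are placeholders:
import re
from os import path

from setuptools import setup

here = path.abspath(path.dirname(__file__))
with open(path.join(here, 'run.py')) as f:
    # Look for a line like: __version__ = '0.6.1'
    version = re.search(r"^__version__\s*=\s*['\"]([^'\"]*)['\"]",
                        f.read(), re.M).group(1)

setup(
    name='example',
    version=version,
    # ...
)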

Related

Share 50+ constants between modules in a pythonic way

I understand this is not a 'coding' question but I need a coding solution.
I have multiple packages I wrote, all supposed to be encapsulated and independent of external parameters other than a few input arguments.
On the other side, I have a general file constants.py with 50+ constants which those packages should use in order to provide an output dictionary without hardcoded names:
A PACKAGE OUTPUT:
{
'sub_name':xyz,
'sub_type':yzg
}
Here, sub_name should be given to the package as input so the general program will know what to do with the sub_name output.
How should I share constants.py with the packages ?
The obvious way is to just import constants.py, which makes the package dependent on an external file somewhere else in the program.
The other way is to keep constants in some class Keys and send it as argument.
Could/should I send constants.py as an argument?
I find it hard to understand how packages should be written and organized when inside a larger project, in a way they can be reused by other devs independently.
You can store constants in __init__.py and import them in submodules.
Example:
main_module/__init__.py
# inside this file
CONSTANT_A = 42
CONSTANT_B = 69
then:
main_module/submodule.py
# inside this file
from main_module import CONSTANT_A
print(CONSTANT_A)  # prints 42
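A small variant, assuming submodule.py is only ever imported as part of the package: a relative import avoids hard-coding the package name, so renaming main_module later doesn't break the submodules.
# main_module/submodule.py
from . import CONSTANT_A
print(CONSTANT_A)  # prints 42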

How to debug a PyInstaller .spec file?

Suppose I have a minimal PyInstaller spec file, e.g. the hello_world.spec that is created when I run pyinstaller hello_world.py (also see docs). This spec file has python code but no import statements.
Now suppose I customize this file, e.g. using the Tree and TOC classes, but something is going wrong and I need to find out what.
I am aware of the PyInstaller --log-level options and warn*.txt files, but I would prefer to place some break-points in my IDE and debug the spec file (or maybe just play around with the Tree class in the console). However, debugging does not work out-of-the-box because there are no import statements in the spec file. I can add those, as below, for example:
from PyInstaller.building.build_main import Analysis, PYZ, EXE, COLLECT
from PyInstaller.building.datastruct import TOC, Tree
But then it turns out some configuration info is required, as I keep running into KeyErrors related to CONF. I tried adding those key/value-pairs manually, based on a list of globals from the docs, which appears to work, up to a point, but I cannot help thinking I'm doing something wrong...
import PyInstaller.config
PyInstaller.config.CONF['specnm'] = 'hello_world'
... etc. ...
Could someone tell me what is the right way to do this? Should I just stick with the pyinstaller --log-level approach?
A couple of years later, still no answer, so here's one alternative I came up with:
One approach is to use unittest.mock.Mock to mock the PyInstaller classes that are not relevant for the problem at hand.
For example, my spec file has some custom code that generates some of the values passed into the Analysis class. In order to debug that code, I just mock out all the PyInstaller classes and run the spec file, using runpy.run_path from the standard library, as follows:
from runpy import run_path
from unittest.mock import Mock
mocks = dict(Analysis=Mock(), EXE=Mock(), PYZ=Mock(), COLLECT=Mock())
run_path(path_name="path/to/my.spec", init_globals=mocks)
This is also very useful to extract the values of arguments defined in the spec file. For example, we can extract the name value passed into EXE() as follows:
exe_name = mocks["EXE"].call_args.kwargs["name"]
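In the same spirit, the script list handed to Analysis can be pulled out of the positional arguments. This assumes a default-style spec, where the scripts are the first positional argument to Analysis, and Python 3.8+ for the .args/.kwargs accessors on call_args:
scripts = mocks["Analysis"].call_args.args[0]  # e.g. ['hello_world.py']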
Having found this unresolved thread, I'll add that putting an old-style Python breakpoint in the spec file works with PyInstaller 4.8:
...
import pdb; pdb.set_trace()
...
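On Python 3.7+ the built-in breakpoint() should behave the same way, since by default it calls pdb.set_trace() (an assumption on my part; I have not checked it against every PyInstaller version):
...
breakpoint()
...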

Python Import Conventions: explicit bin/lib imports

ADVICE REQUEST: Best Practices for Python Imports
I need advice re: how major projects do Python imports and the standard / Pythonic way to set this up.
I have a project Foobar that's a subproject of another, much larger and well-used project Dammit. My project directory looks like this:
/opt/foobar
/opt/foobar/bin/startFile.py
/opt/foobar/bin/tests/test_startFile.py
/opt/foobar/foobarLib/someModule.py
/opt/foobar/foobarLib/tests/test_someModule.py
So, for the Dammit project, I add to PYTHONPATH:
export PYTHONPATH=$PYTHONPATH:/opt/foobar
This lets me add a very clear, easy-to-track-down import of:
from foobarLib.someModule import SomeClass
to all 3 of:
* Dammit - using foobarLib in the import makes it clear to everyone that it's not in the Dammit project, it's in Foobar.
* My startFile.py is an adjunct process that has the same pythonpath import.
* test_someModule.py has an explicit import, too.
PROBLEM: This only works for foobarLib, not bin.
I have tests I want to run in /opt/foobar/bin/tests/test_startFile.py. But how do I set up the path and do the import? Should I do a relative import like this:
import os, sys
PROJ_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
sys.path.insert(0, PROJ_ROOT)
Or, should I rename the bin dir to be foobarBin so I do the import as:
from foobarBin.startFile import StartFile
I'm tempted to do the following:
/opt/foobarProject/foobar
/opt/foobarProject/foobar/bin/startFile.py
/opt/foobarProject/foobar/bin/tests/test_startFile.py
/opt/foobarProject/foobar/foobarLib/someModule.py
/opt/foobarProject/foobar/foobarLib/tests/test_someModule.py
then, I can do all my imports with:
import foobar.bin.startFile
import foobar.foobarLib.someModule
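For those imports to resolve, foobar and its subdirectories would need to be packages, with only /opt/foobarProject on PYTHONPATH. A sketch of the hypothetical layout (with Python 3 namespace packages the __init__.py files are optional, but being explicit is clearer):
/opt/foobarProject/foobar/__init__.py
/opt/foobarProject/foobar/bin/__init__.py
/opt/foobarProject/foobar/bin/tests/__init__.py
/opt/foobarProject/foobar/foobarLib/__init__.py
/opt/foobarProject/foobar/foobarLib/tests/__init__.py
export PYTHONPATH=$PYTHONPATH:/opt/foobarProject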
Basically, for large Python projects (AND THEIR ADJUNCTS), it seems to me we have these goals:
* minimize the number of dirs added to PYTHONPATH;
* make it obvious where imports are coming from. That is, people use 'import lib.thing' a lot, and if there's more than one directory in the pythonpath named 'lib', it's troublesome / non-obvious where that is, short of invoking python and searching sys.modules, etc.;
* minimize the number of times we add paths to sys.path, since that happens at runtime and is somewhat non-obvious.
I've used Python a long time, and been part of projects that do it various ways, some of them more stupid and some less so. I guess I'm wondering if there is a best practice here, and the reason for it? Is my convention of adding the foobarProject layer okay, good, bad, or just one more way among many?

Scalability of Python Module / Package system to large projects

I would like to have a nice hierarchy of modules for a large project (Python seems to get in the way of this). I am confused about the distinction between modules and packages and how they relate to the C++ concept of a namespace. For concreteness, my project is a compiler, and the code generation phases want to query properties from some set of abstract representations which are maintained in a different directory (actually far away in the hierarchy).
The problem can be stated as:
Assumption: Let a.py and b.py be two source files somewhere in the project hierarchy.
Then: I want to refer to the functions defined in b.py from a.py -- ideally with a relative path from the well-defined root directory of the project (which is /src). We want a general-purpose solution for this, something which will always work.
Dirty hack: It sounds absurd, but by putting all sub-directories of this project that contain .py files into PYTHONPATH we will be able to reference them by their name, but with this the reader of the code loses any sense of hierarchy & relation between the different project classes etc.
Note: The tutorial on Python.org only mentions the special case of referring from a file c.py to a file d.py placed in its parent directory. Where is the generality that makes
Python scale to really large projects here?
I am not sure if this answers the question, but let us see.
Suppose I have the following package scheme (__init__.py files excluded for readability):
foo/baz/quux/b.py
foo/baz/quux/quuuux/c.py
foo/bar/a.py
My foo/baz/quux/b.py file contains this:
def func_b():
    print('func b')
and my foo/baz/quux/quuuux/c.py is:
def func_c():
    print('func c')
If the root directory which contains foo (in your case, src*) is in the Python path, our foo/bar/a.py file can import any other module starting from foo:
import foo.baz.quux.b as b
import foo.baz.quux.quuuux.c as c
def func_a():
    b.func_b()
    c.func_c()
And one can use the foo/bar/a.py this way:
import foo.bar.a as a
a.func_a()
Have you tried it? Did you get some error?
* When you deploy your project, I do not believe the root will be src, but let us keep it simple by omitting it :)
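A relative-import variant of the same thing (a sketch, assuming a.py is always imported as part of the foo package rather than run directly as a script, since relative imports do not work in top-level scripts):
# foo/bar/a.py
from ..baz.quux import b
from ..baz.quux.quuuux import c

def func_a():
    b.func_b()
    c.func_c()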

Python: Importing an "import file"

I am importing a lot of different scripts, so at the top of my file it gets cluttered with import statements, i.e.:
from somewhere.fileA import ...
from somewhere.fileB import ...
from somewhere.fileC import ...
...
Is there a way to move all of these somewhere else and then all I have to do is import that file instead so it's just one clean import?
I strongly advise against what you want to do. You are making the global include file mistake again. Although in this case only one module is importing all your modules (as opposed to all modules importing a global one), the point remains: if there's a valid reason for all those modules to be collected under a common name, fine; if there's no reason, they should be kept as separate imports.
The reason is documentation. If I open your file and see only one import, I don't get any information about what is imported and where it comes from. If, on the other hand, I have the list of imports, I know at a glance what is needed and what is not.
Also, there's another important mistake I assume you are making. When you say
from somewhere.fileA import ...
from somewhere.fileB import ...
from somewhere.fileC import ...
I assume you are importing, for example, a class, like this
from somewhere.fileA import MyClass
this is wrong. This alternative solution is much better:
from somewhere import fileA
<later>
a=fileA.MyClass()
Why? Two reasons: first, namespacing. If you have two modules each defining a class named MyClass, you would have a clash. Second, documentation. Suppose you use the first option, and I find in your code the following line
a=MyClass()
now I have no idea where this MyClass comes from, and I will have to grep around all your files in order to find it. Having it qualified with the module name allows me to immediately understand where it comes from, and to immediately find, via a simple search, where stuff coming from the fileA module is used in your program.
Final note: when you say "fileA" you are making a mistake. They are modules (or packages), not files. Modules map to files, and packages map to directories, but they may also map to egg files, and you may even create a module having no file at all. This is about naming concepts correctly, and it's a side issue.
Of course there is; just create a file called myimports.py in the same directory where your main file is and put your imports there. Then you can simply use from myimports import * in your main script.
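For illustration, a minimal sketch of that second approach (the module, class, and function names here are hypothetical); note that the namespacing caveat from the first answer still applies when using import *:
# myimports.py
from somewhere.fileA import MyClass
from somewhere.fileB import other_function
from somewhere.fileC import SOME_CONSTANT

# main.py
from myimports import *

obj = MyClass()
other_function(SOME_CONSTANT)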
