I have read through a couple of dozen writeups on how to equip a modern Python project to automate linting, testing, coverage, type checking, etc. (and eventually deploy it on cloud servers, but I'm not interested in that part yet).
I am thinking about using conda as my environment manager. This would let me install non-Python packages if needed, and since I would create the project environment with a specified Python version, I believe it would also replace pyenv.
Within the newly created conda environment I would use poetry to manage dependencies: initialize its TOML file and add the other Python packages I need, which would be installed by pip from PyPI.
Did I get the above right?
To complete the "building" phase I am also looking into using pytest and pytest-cov for code coverage, mypy/pydantic for type checking and black for formatting.
All of this should work on my local development machine, and then, when I push to GitHub, trigger an Action that performs the same checks, so that any contributor will go through them. So far I have managed to do this with pylint, pytest and coverage for very simple projects without requirements.
Does all of this make sense? Am I missing some important component/step? For example, I'm trying to understand whether tox would help me in this workflow by automating testing on different Python versions, but I haven't come to grips with integrating this flood of (new to me) concepts. Thanks
Should/does a utility Python project need a setup.py file? My utility project will train a computer vision model with sample images. It depends on a computer vision python module/package. It will be used internally and not publicly distributed.
Is a setup.py file useful or applicable for this kind of Python project?
Yes, it is certainly useful to set up packaging even for internal libraries. You'll want to write a setup.py if you'd like any of the benefits below:
Users to be able to install your stuff with pip install myutility
Different versions, e.g. one user on myutility==1.0.1 and another on myutility==1.2.1
Control / management of dependencies (see install_requires)
To use continuous integration and software dev best practices (e.g. the tests must have passed for code to be released)
The only time I would consider omitting setup.py is for quick throwaway scripts that have no external dependencies.
When working with JVM languages a pattern commonly followed is to use a build system (ant+ivy / maven / gradle), where using a build file, the dependencies of your code can be defined. The build system is able to fetch these dependencies when you build your code. Moreover IDEs like Eclipse/IntelliJ are also able to read these build files and continuously build/verify your code as you write it.
How is something similar done while developing in Python? While there may not necessarily be a build step, I want a developer to be able to check out my code and then run a single bootstrap command that will set up a virtualenv and pull in any third-party dependencies necessary to run the code. I could include some sort of script to do this, but I am wondering if there is a tool for it. Most of my search so far has led me to packaging tools, which are more for distribution to end users than for this purpose (or so I understand).
This is managed by virtualenv and the pip install -r requirements.txt command. More info here: Virtual Environments
I guess requirements.txt is what you are looking for. For example, PyCharm IDE will definitely see it as a dependency list.
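For the "single bootstrap command" the question asks about, a minimal sketch in Python could look like the following. It is only an illustration under stated assumptions: it uses the standard-library venv module rather than the virtualenv package named above, and the directory name .venv, the file name requirements.txt, and the function names venv_python and bootstrap are all hypothetical choices, not part of any tool.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the path of the interpreter inside the virtual environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def bootstrap(env_dir: Path = Path(".venv"),
              requirements: Path = Path("requirements.txt")) -> None:
    """Create the environment if it is missing, then install the dependencies.

    Assumption: a requirements.txt sits next to this script.
    """
    if not venv_python(env_dir).exists():
        # Stdlib stand-in for running the virtualenv command by hand.
        venv.create(env_dir, with_pip=True)
    # Equivalent of `pip install -r requirements.txt`, run with the env's own pip.
    subprocess.check_call(
        [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]
    )
```

Calling bootstrap() from the project root would then create .venv on first use and install whatever requirements.txt lists. Invoking pip through the environment's own interpreter avoids having to activate the environment first.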
I'm developing a distribution for the Python package I'm writing so I can post
it on PyPI. It's my first time working with distutils, setuptools, distribute,
pip, setup.py and all that and I'm struggling a bit with a learning curve
that's quite a bit steeper than I anticipated :)
I was having a little trouble getting some of my test data files to be
included in the tarball by specifying them in the data_files parameter in setup.py until I came across a different post here that pointed me
toward the MANIFEST.in file. Just then I snapped to the notion that what you
include in the tarball/zip (using MANIFEST.in) and what gets installed in a
user's Python environment when they do easy_install or whatever (based on what
you specify in setup.py) are two very different things; in general there being
a lot more in the tarball than actually gets installed.
This immediately triggered a code-smell for me and the realization that there
must be more than one use case for a distribution; I had been fixated on the
only one I've really participated in, using easy_install or pip to install a
library. And then I realized I was developing work product where I had only a
partial understanding of the end-users I was developing for.
So my question is this: "What are the use cases for a Python distribution
other than installing it in one's Python environment? Who else am I serving
with this distribution and what do they care most about?"
Here are some of the working issues I haven't figured out yet that bear on the
answer:
Is it a sensible thing to include everything that's under source control
(git) in the source distribution? In the age of github, does anyone download
a source distribution to get access to the full project source? Or should I
just post a link to my github repo? Won't including everything bloat the
distribution and make it take longer to download for folks who just want to
install it?
I'm going to host the documentation on readthedocs.org. Does it make any
sense for me to include HTML versions of the docs in the source
distribution?
Does anyone use python setup.py test to run tests on a source
distribution? If so, what role are they in and what situation are they in? I
don't know if I should bother with making that work and if I do, who to make
it work for.
Some things that you might want to include in the source distribution but maybe not install include:
the package's license
a test suite
the documentation (possibly a processed form like HTML in addition to the source)
possibly any additional scripts used to build the source distribution
Quite often this will be the majority or all of what you are managing in version control and possibly a few generated files.
The main reason why you would do this when those files are available online or through version control is so that people know they have the version of the docs or tests that matches the code they're running.
If you only host the most recent version of the docs online, then they might not be useful to someone who has to use an older version for some reason. And the test suite on the tip in version control may not be compatible with the version of the code in the source distribution (e.g. if it tests features added since then). To get the right version of the docs or tests, they would need to comb through version control looking for a tag that corresponds to the source distribution (assuming the developers bothered tagging the tree). Having the files available in the source distribution avoids this problem.
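One way to check that the license, tests and docs really end up in the tarball is to list the archive after building it. The sketch below uses only the standard-library tarfile module. The function names sdist_files and missing_from_sdist are hypothetical, and it assumes the usual sdist layout with a single top-level "name-version/" directory.

```python
import tarfile
from pathlib import PurePosixPath


def sdist_files(sdist_path):
    """Return the archive's file paths with the top-level directory removed.

    Assumption: every member lives under one 'name-version/' directory.
    """
    with tarfile.open(sdist_path) as archive:
        names = [m.name for m in archive.getmembers() if m.isfile()]
    return sorted(str(PurePosixPath(*PurePosixPath(n).parts[1:])) for n in names)


def missing_from_sdist(sdist_path, expected):
    """List the expected relative paths that the archive does not contain."""
    present = set(sdist_files(sdist_path))
    return [path for path in expected if path not in present]
```

For example, missing_from_sdist("dist/mypackage-1.0.tar.gz", ["LICENSE", "tests/test_core.py"]) (both the archive name and the file names are made up here) returns the entries that MANIFEST.in failed to pull in, or an empty list if everything is there.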
As for people wanting to run the test suite, I have a number of my Python modules packaged in various Linux distributions and occasionally get bug reports related to test failures in their environments. I've also used the test suites of other people's modules when I encounter a bug and want to check whether the external code is behaving as the author expects in my environment.
As a long-time Python programmer, I wonder whether a central aspect of Python culture has eluded me all this time: what do we do instead of Makefiles?
Most Ruby projects I've seen (not just Rails) use Rake, and shortly after node.js became popular, there was cake. In many other languages (compiled and non-compiled) there are classic Makefiles.
But in Python, no one seems to need such infrastructure. I randomly picked Python projects on GitHub, and they had no automation besides the installation provided by setup.py.
What's the reason behind this?
Is there nothing to automate? Do most programmers prefer to run style checks, tests, etc. manually?
Some examples:
dependencies sets up a virtualenv and installs the dependencies
check calls the pep8 and pylint command-line tools
the test task depends on dependencies, enables the virtualenv, starts selenium-server for the integration tests, and calls nosetest
the coffeescript task compiles all coffeescripts to minified javascript
the runserver task depends on dependencies and coffeescript
the deploy task depends on check and test and deploys the project.
the docs task calls sphinx with the appropriate arguments
Some of them are just one- or two-liners, but IMHO, they add up. Thanks to the Makefile, I don't have to remember them.
To clarify: I'm not looking for a Python equivalent of Rake. I'm happy with paver. I'm looking for the reasons.
Actually, automation is useful to Python developers too!
Invoke is probably the closest tool to what you have in mind, for automation of common repetitive Python tasks: https://github.com/pyinvoke/invoke
With invoke, you can create a tasks.py like this one (borrowed from the invoke docs)
from invoke import task

# Each task receives a context object (c) as its first argument;
# shell commands are run through it.
@task
def clean(c, docs=False, bytecode=False, extra=''):
    patterns = ['build']
    if docs:
        patterns.append('docs/_build')
    if bytecode:
        patterns.append('**/*.pyc')
    if extra:
        patterns.append(extra)
    for pattern in patterns:
        c.run("rm -rf %s" % pattern)

@task
def build(c, docs=False):
    c.run("python setup.py build")
    if docs:
        c.run("sphinx-build docs docs/_build")
You can then run the tasks at the command line, for example:
$ invoke clean
$ invoke build --docs
Another option is to simply use a Makefile. For example, a Python project's Makefile could look like this:
docs:
	$(MAKE) -C docs clean
	$(MAKE) -C docs html
	open docs/_build/html/index.html

release: clean
	python setup.py sdist upload

sdist: clean
	python setup.py sdist
	ls -l dist
Setuptools can automate a lot of things, and for things that aren't built-in, it's easily extensible.
To run unittests, you can use the setup.py test command after having added a test_suite argument to the setup() call. (documentation)
Dependencies (even if not available on PyPI) can be handled by adding an install_requires/extras_require/dependency_links argument to the setup() call. (documentation)
To create a .deb package, you can use the stdeb module.
For everything else, you can add custom setup.py commands.
But I agree with S.Lott: most of the tasks you'd wish to automate (except maybe dependency handling, the only one I find really useful) are tasks you don't run every day, so there wouldn't be any real productivity improvement from automating them.
There are a number of options for automation in Python. I don't think there is a culture against automation; there is just no single dominant way of doing it. The common denominator is distutils.
The one that is closest to your description is buildout. This is mostly used in the Zope/Plone world.
I myself use a combination of the following: Distribute, pip and Fabric. I mostly develop with Django, which has manage.py for automation commands.
It is also being actively worked on in Python 3.3
Any decent test tool has a way of running the entire suite in a single command, and nothing is stopping you from using rake, make, or anything else, really.
There is little reason to invent a new way of doing things when existing methods work perfectly well - why re-invent something just because YOU didn't invent it? (NIH).
The make utility is an optimization tool which reduces the time spent building a software image. The reduction in time is obtained when all of the intermediate materials from a previous build are still available, and only a small change has been made to the inputs (such as source code). In this situation, make is able to perform an "incremental build": rebuild only a subset of the intermediate pieces that are impacted by the change to the inputs.
When a complete build takes place, all that make effectively does is to execute a set of scripting steps. These same steps could just be deposited into a flat script. The -n option of make will in fact print these steps, which makes this possible.
A Makefile isn't "automation"; it's "automation with a view toward optimized incremental rebuilds." Anything scripted with any scripting tool is automation.
So, why would Python projects eschew tools like make? Probably because Python projects don't struggle with long build times that they are eager to optimize. Also, the compilation of a .py to a .pyc file does not have the same web of dependencies as that of a .c to a .o.
A C source file can #include hundreds of dependent files; a one-character change in any one of these files can mean that the source file must be recompiled. A properly written Makefile will detect when that is or is not the case.
A big C or C++ project without an incremental build system would mean that a developer has to wait hours for an executable image to pop out for testing. Fast, incremental builds are essential.
In the case of Python, probably all you have to worry about is when a .py file is newer than its corresponding .pyc, which can be handled by simple scripting: loop over all the files, and recompile anything newer than its byte code. Moreover, compilation is optional in the first place!
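The "simple scripting" described above can be sketched in a few lines of standard-library Python. This is a make-style illustration only: it compares file modification times, whereas the interpreter's own check uses the metadata stored inside the .pyc file. The function names is_stale and recompile_stale are hypothetical.

```python
import importlib.util
import py_compile
from pathlib import Path


def is_stale(source: Path) -> bool:
    """True when the cached bytecode is missing or older than the source."""
    cached = Path(importlib.util.cache_from_source(str(source)))
    return not cached.exists() or source.stat().st_mtime > cached.stat().st_mtime


def recompile_stale(root: Path) -> list:
    """Recompile every stale .py file under root and return the ones rebuilt."""
    rebuilt = []
    for source in sorted(root.rglob("*.py")):
        if is_stale(source):
            # Writes the .pyc to the default cache location (__pycache__).
            py_compile.compile(str(source), doraise=True)
            rebuilt.append(source)
    return rebuilt
```

Running recompile_stale(Path(".")) twice in a row rebuilds nothing the second time, which is the whole of the "incremental build" logic a Python project would need here.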
So the reason Python projects tend not to use make is that their need to perform incremental rebuild optimization is low, and they use other tools for automation; tools that are more familiar to Python programmers, like Python itself.
The original PEP where this was raised can be found here. Distutils has become the standard method for distributing and installing Python modules.
Why? It just happens that python is a wonderful language to perform the installation of Python modules with.
Here are a few examples of Makefile usage with Python:
https://blog.horejsek.com/makefile-with-python/
https://krzysztofzuraw.com/blog/2016/makefiles-in-python-projects.html
I think that most people are not aware of the "Makefile for Python" use case. It could be useful, but its "sexiness ratio" is too small for it to spread rapidly (just my PPOV).
Is there nothing to automate?
Not really. All but two of the examples are one-line commands.
tl;dr Very little of this is really interesting or complex. Very little of this seems to benefit from "automation".
Due to documentation, I don't have to remember the commands to do this.
Do most programmers prefer to run style checks, tests, etc. manually?
Yes.
generating documentation,
the docs task calls sphinx with the appropriate arguments
It's one line of code. Automation doesn't help much.
sphinx-build -b html source build/html. That's a script. Written in Python.
We do this rarely. A few times a week. After "significant" changes.
running style checks (Pylint, Pyflakes and the pep8 command-line tool).
check calls the pep8 and pylint command-line tools
We don't do this. We use unit testing instead of pylint.
You could automate that three-step process.
But I can see how SCons or make might help someone here.
tests
There might be space for "automation" here. It's two lines: the non-Django unit tests (python test/main.py) and the Django tests (manage.py test). Automation could be applied to run both lines.
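Running "both lines" from one script could look roughly like this. It is a sketch, not anyone's actual setup: the two commands are the ones quoted above, while the names TEST_COMMANDS and run_all are hypothetical.

```python
import subprocess
import sys

# The two test commands mentioned above, run with the current interpreter.
TEST_COMMANDS = [
    [sys.executable, "test/main.py"],
    [sys.executable, "manage.py", "test"],
]


def run_all(commands):
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Stop at the first failure, as `make` would.
            return result.returncode
    return 0
```

A wrapper script would end with sys.exit(run_all(TEST_COMMANDS)) so that a failing suite is also reported through the exit status.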
We do this dozens of times each day. We never knew we needed "automation".
dependencies sets up a virtualenv and installs the dependencies
Done so rarely that a simple list of steps is all that we've ever needed. We track our dependencies very, very carefully, so there are never any surprises.
We don't do this.
the test task depends on dependencies, enables the virtualenv, starts selenium-server for the integration tests, and calls nosetest
The start server & run nosetest as a two-step "automation" makes some sense. It saves you from entering the two shell commands to run both steps.
the coffeescript task compiles all coffeescripts to minified javascript
This is something that's very rare for us. I suppose it's a good example of something to be automated. Automating the one-line script could be helpful.
I can see how SCons or make might help someone here.
the runserver task depends on dependencies and coffeescript
Except the dependencies change so rarely that this seems like overkill. I suppose it can be a good idea if you're not tracking dependencies well in the first place.
the deploy task depends on check and test and deploys the project.
It's an svn co and python setup.py install on the server, followed by a bunch of customer-specific copies from the subversion area to the customer /www area. That's a script. Written in Python.
It's not a general make or SCons kind of thing. It has only one actor (a sysadmin) and one use case. We wouldn't ever mingle deployment with other development, QA or test tasks.