Integrate multiple Python projects - python

I have 2 Python projects:
Proj1(/var/www/proj1)
venv
requirments.txt
app
fun.py
fun2.py
app2
pdf.py
somefun2.py
Proj2(/var/www/proj2)
venv
requirments.txt
another
anotherfun.py
anotherfun2.py
someanother
someanotherfun.py
pdfproj2.py
Both work individually and both have different set of requirements.
Lets say pdf.py from proj1 has a function generate which will generate some PDFs. It will take all other modules(app/fun2 etc) in same project for it.
Now what I want is this functionality(pdf.py->generate) I want to call in pdfproj2.py in proj2.
How is this possible?
Nb: I am not using any frameworks like flask/django etc

There are at least three approaches.
1. external call
Change nothing. Pretty much.
Command line callers are already able to take advantage
of $ python proj1/app2/pdf.py arg..., invoking generate().
Arrange for proj2pdf.py to fork off a subprocess
and do exactly that.
Nothing changes in project1,
since its public API already supports this use case.
Notice that you might need to carefully finesse the PATH & PYTHONPATH
env vars, as part of correctly invoking that pdf.py command.
That's the sort of setup that conda and venv are good at.
2. merge projects
This is the quick-n-dirty approach. I do not recommend it.
Create project3, and incorporate source code from both existing projects.
Take the union of all library dependencies.
Now you can call generate() in the same address space,
the same process, as the calling python code.
Downside is: ugliness.
The bigger project's codebase is not as easily maintainable.
3. packaging
The "right" way to make generate() available to project2,
or to any project, is to package it up.
Pretend you're going to publish on pypi.
Doesn't matter if you actually do,
let's just prepare for such a possibility.
Create setup.py or similar, maybe use setuptools,
and create a wheel (or at least a tar) of project1.
There are many ways to do this, and best practices
continue to evolve, so I won't delve into details here.
Now you can list project1 as a dependency in project2's requirements.txt,
and import it just like any other dep. Problem solved!
This is the best approach.
It does involve a bit of work, and a gentle learning curve.

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

I want to know if I can create a python script with a folder in the same directory with all the assets of a python module, so when someone wants to use it, they would not have to pip install module, because it would import from the directory.
Yes, you can, but it doesn't mean that you should.
First, ask yourself who is suposed to use that code.
If you plan to give it to consumers, it would be a good idea to use a tool like py2exe and create executable file which would include all modules and not allow for code to be changed.
If you plan to share it with another developer, you might want to look into virtual environments and requirements.txt file.
There are multiple reasons why sharing modules is bad idea:
It is harder to update modules later, at least without upgrading whole project.
It uses more space on version control, which can create issues on huge projects with hundreds of modules and branches
It might be illegal as some licenses specifically forbid including their code in your source code.
The pip install of some module might do different things depending on operating system version or installed packages. The modules on your machine might be suboptimal on someone else's machine, and in some instances might not even work.
And probably more that I can't think of right now.
The only situation where I saw this being unavoidable was when the module didn't support python implementation the application was running on. The module was changed, and its source was put under lib folder with the rest of the libraries.
I think you can add the directory with python modules into PYTHONPATH. Then people want to use those modules just need has this envvar set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH

Is there a way to package tests in python

In my CI/CD environment I have a multiple projects that use mostly the same tests, with a bit of variation. Since all of them are mostly the same, just different projects/builds use them a bit differently, I am looking for a way (if there is one) to package the tests themselves to pass around the projects. EDIT: Packaging tested code is not possible.
The ultimate usage will be something like this:
pip install <test-package>
pytest -m <some-mark-depending-on-build/project> --<additional-variables>
Is there a way to do this?
EDIT: If there is, please point me out toward a solution.
Thanks in advance.
Keeping it here for references.
The way to do this is to create a test package that can run as python module, from main.py.
After researching and some testing, I've concluded that in my case this will create more code to maintain than I would otherwise properly reuse.

Reading setup.py via API to test requirements

I'm working on a plugin system and was thinking of simply using a setup.py file for each 'plugin' since it's already an existing dependency. The thing is, I need a way to test requirements.
Is there already an existing API in place for this, or would it make more sense just to roll a custom system and check it manually?
setup.py is a script, and you can't generally parse that to figure out requirements, especially since some setup scripts will change the requirements depending on the python version used to run them.
There is an upcoming standard that will fix this: PEP 345. At this point in time very few packages make use of this though. For more information on this topic you can look at the distutils-sig list archives where this topic has come up several times.
Have you looked at egg entry points? They basically implement a plugin system that you can use directly. This stackoverflow question has some information that might be interesting.

Why are there no Makefiles for automation in Python projects?

As a long time Python programmer, I wonder, if a central aspect of Python culture eluded me a long time: What do we do instead of Makefiles?
Most ruby-projects I've seen (not just rails) use Rake, shortly after node.js became popular, there was cake. In many other (compiled and non-compiled) languages there are classic Make files.
But in Python, no one seems to need such infrastructure. I randomly picked Python projects on GitHub, and they had no automation, besides the installation, provided by setup.py.
What's the reason behind this?
Is there nothing to automate? Do most programmers prefer to run style checks, tests, etc. manually?
Some examples:
dependencies sets up a virtualenv and installs the dependencies
check calls the pep8 and pylint commandlinetools.
the test task depends on dependencies enables the virtualenv, starts selenium-server for the integration tests, and calls nosetest
the coffeescript task compiles all coffeescripts to minified javascript
the runserver task depends on dependencies and coffeescript
the deploy task depends on check and test and deploys the project.
the docs task calls sphinx with the appropiate arguments
Some of them are just one or two-liners, but IMHO, they add up. Due to the Makefile, I don't have to remember them.
To clarify: I'm not looking for a Python equivalent for Rake. I'm glad with paver. I'm looking for the reasons.
Actually, automation is useful to Python developers too!
Invoke is probably the closest tool to what you have in mind, for automation of common repetitive Python tasks: https://github.com/pyinvoke/invoke
With invoke, you can create a tasks.py like this one (borrowed from the invoke docs)
from invoke import run, task
#task
def clean(docs=False, bytecode=False, extra=''):
patterns = ['build']
if docs:
patterns.append('docs/_build')
if bytecode:
patterns.append('**/*.pyc')
if extra:
patterns.append(extra)
for pattern in patterns:
run("rm -rf %s" % pattern)
#task
def build(docs=False):
run("python setup.py build")
if docs:
run("sphinx-build docs docs/_build")
You can then run the tasks at the command line, for example:
$ invoke clean
$ invoke build --docs
Another option is to simply use a Makefile. For example, a Python project's Makefile could look like this:
docs:
$(MAKE) -C docs clean
$(MAKE) -C docs html
open docs/_build/html/index.html
release: clean
python setup.py sdist upload
sdist: clean
python setup.py sdist
ls -l dist
Setuptools can automate a lot of things, and for things that aren't built-in, it's easily extensible.
To run unittests, you can use the setup.py test command after having added a test_suite argument to the setup() call. (documentation)
Dependencies (even if not available on PyPI) can be handled by adding a install_requires/extras_require/dependency_links argument to the setup() call. (documentation)
To create a .deb package, you can use the stdeb module.
For everything else, you can add custom setup.py commands.
But I agree with S.Lott, most of the tasks you'd wish to automate (except dependencies handling maybe, it's the only one I find really useful) are tasks you don't run everyday, so there wouldn't be any real productivity improvement by automating them.
There is a number of options for automation in Python. I don't think there is a culture against automation, there is just not one dominant way of doing it. The common denominator is distutils.
The one which is closed to your description is buildout. This is mostly used in the Zope/Plone world.
I myself use a combination of the following: Distribute, pip and Fabric. I am mostly developing using Django that has manage.py for automation commands.
It is also being actively worked on in Python 3.3
Any decent test tool has a way of running the entire suite in a single command, and nothing is stopping you from using rake, make, or anything else, really.
There is little reason to invent a new way of doing things when existing methods work perfectly well - why re-invent something just because YOU didn't invent it? (NIH).
The make utility is an optimization tool which reduces the time spent building a software image. The reduction in time is obtained when all of the intermediate materials from a previous build are still available, and only a small change has been made to the inputs (such as source code). In this situation, make is able to perform an "incremental build": rebuild only a subset of the intermediate pieces that are impacted by the change to the inputs.
When a complete build takes place, all that make effectively does is to execute a set of scripting steps. These same steps could just be deposited into a flat script. The -n option of make will in fact print these steps, which makes this possible.
A Makefile isn't "automation"; it's "automation with a view toward optimized incremental rebuilds." Anything scripted with any scripting tool is automation.
So, why would Python project eschew tools like make? Probably because Python projects don't struggle with long build times that they are eager to optimize. And, also, the compilation of a .py to a .pyc file does not have the same web of dependencies like a .c to a .o.
A C source file can #include hundreds of dependent files; a one-character change in any one of these files can mean that the source file must be recompiled. A properly written Makefile will detect when that is or is not the case.
A big C or C++ project without an incremental build system would mean that a developer has to wait hours for an executable image to pop out for testing. Fast, incremental builds are essential.
In the case of Python, probably all you have to worry about is when a .py file is newer than its corresponding .pyc, which can be handled by simple scripting: loop over all the files, and recompile anything newer than its byte code. Moreover, compilation is optional in the first place!
So the reason Python projects tend not to use make is that their need to perform incremental rebuild optimization is low, and they use other tools for automation; tools that are more familiar to Python programmers, like Python itself.
The original PEP where this was raised can be found here. Distutils has become the standard method for distributing and installing Python modules.
Why? It just happens that python is a wonderful language to perform the installation of Python modules with.
Here are few examples of makefile usage with python:
https://blog.horejsek.com/makefile-with-python/
https://krzysztofzuraw.com/blog/2016/makefiles-in-python-projects.html
I think that a most of people is not aware "makefile for python" case. It could be useful, but "sexiness ratio" is too small to propagate rapidly (just my PPOV).
Is there nothing to automate?
Not really. All but two of the examples are one-line commands.
tl;dr Very little of this is really interesting or complex. Very little of this seems to benefit from "automation".
Due to documentation, I don't have to remember the commands to do this.
Do most programmers prefer to run stylechecks, tests, etc. manually?
Yes.
generation documentation,
the docs task calls sphinx with the appropiate arguments
It's one line of code. Automation doesn't help much.
sphinx-build -b html source build/html. That's a script. Written in Python.
We do this rarely. A few times a week. After "significant" changes.
running stylechecks (Pylint, Pyflakes and the pep8-cmdtool).
check calls the pep8 and pylint commandlinetools
We don't do this. We use unit testing instead of pylint.
You could automate that three-step process.
But I can see how SCons or make might help someone here.
tests
There might be space for "automation" here. It's two lines: the non-Django unit tests (python test/main.py) and the Django tests. (manage.py test). Automation could be applied to run both lines.
We do this dozens of times each day. We never knew we needed "automation".
dependecies sets up a virtualenv and installs the dependencies
Done so rarely that a simple list of steps is all that we've ever needed. We track our dependencies very, very carefully, so there are never any surprises.
We don't do this.
the test task depends on dependencies enables the virtualenv, starts selenium-server for the integration tests, and calls nosetest
The start server & run nosetest as a two-step "automation" makes some sense. It saves you from entering the two shell commands to run both steps.
the coffeescript task compiles all coffeescripts to minified javascript
This is something that's very rare for us. I suppose it's a good example of something to be automated. Automating the one-line script could be helpful.
I can see how SCons or make might help someone here.
the runserver task depends on dependencies and coffeescript
Except. The dependencies change so rarely, that this seems like overkill. I supposed it can be a good idea of you're not tracking dependencies well in the first place.
the deploy task depends on check and test and deploys the project.
It's an svn co and python setup.py install on the server, followed by a bunch of customer-specific copies from the subversion area to the customer /www area. That's a script. Written in Python.
It's not a general make or SCons kind of thing. It has only one actor (a sysadmin) and one use case. We wouldn't ever mingle deployment with other development, QA or test tasks.

Organizing Python projects with shared packages

What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like
trunk
libs
python
utilities
projects
projA
projB
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that, often one project (e.g. projA) will be running, and projB will need to be updated. This could cause the working copy revision to appear mixed to projA during runtime. (The code takes hours to run, and can be used as inputs for processes that take days to run, hence the strong traceability goal.)
My current workaround is, if necessary, check out another copy of the trunk to a different location, and run off there. But then I need to remember to change my PYTHONPATH to point to the second version of lib/python, not the one in the first tree.
There's not likely to be a perfect answer. But there must be a better way.
Should we be using subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be going more towards a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).
The much better solution involves not storing all your projects and their shared dependencies in the same repository.
Use one repository for each project, and externals for the shared libraries.
Make use of tags in the shared library repositories, so consumer projects may use exactly the version they need in their external.
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.

Categories

Resources