Setup.py install_requires to find local src packages - python

I have multiple shared internal libraries that many repositories depend on. Right now, these libraries live in the same git repository and are added as a submodule to each application that needs them. At build time, I pip install the libraries. The problem I am facing is that these internal libraries also depend on each other, but the dependencies can't be resolved since they are in local folders.
For example, I have a local library A that depends on B. This will NOT work:
setup(
    name='A_package',
    install_requires=[
        'B_package',  # source file in local folder
    ],
    ...
)
since pip tries to find B_package on PyPI.
I have searched for many solutions; however, I can't seem to find a straightforward one such as
install_requires=[
    '/commonlib/path/B_package',
],
This way, I could just pip install A_package and B_package would also be located and installed.
The reason I would like to keep the shared library source code as a submodule is to make development easier, so engineers can modify and commit the libraries whenever needed. I am open to any other suggestions.

Cost of Publishing Packages vs. Speed of Development
The tradeoff between these two is the key here.
Using a Git Repo: Good Speed at Small Scale, Problems as It Grows
This is what I first tried at my company. We didn't use submodules; we just put the git repo under the directory where pip installs packages, like /Users/xxx/miniconda3/lib/python3.6/site-packages. pip will always treat that package as installed, and we synchronize the git repo to update the package.
This works great at small scale, but it brings problems as the project grows. When you use a git repo, you are working with git revisions instead of PyPI package versions, so you have to maintain version dependencies manually. Suppose project A uses package B and each has its own versions; there are two options for maintaining the dependency:
Always use the latest version of B and keep B backward compatible - easy for A, but it puts a burden on B.
Just like a Python package version, pin a git revision or tag of B in A; people then need to check out B every time they switch to a different branch of A - painful for a large project with multiple branches (see the example below).
And if you have multiple virtualenvs, you have extra work to do.
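For reference, pinning B at a git revision or tag in A's requirements usually ends up as a line like this (the host, path and tag are made up for illustration):
git+https://gitlab.example.com/libs/B_package.git@v1.2.0#egg=B_package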
Publish Packages, but Minimize the Cost: The Way to Go
This is what I ended up with at my company. I set up our own PyPI server and a GitLab CI job that publishes a package whenever a tag is pushed. It doesn't have the problems of the previous approach and still supports fast development iteration.
For Developer
$ git commit ...
$ git tag ...
$ git push && git push --tags
A commit, a tag and a push are all they need to do to publish a package; it's cheap. And we actually use bumpversion to manage the version instead of tagging manually.
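On the CI side, the job triggered by the pushed tag essentially just builds and uploads the package to the internal index; a minimal sketch of what it runs (the index URL is an assumption, and credentials are omitted) is:
$ python setup.py sdist bdist_wheel
$ twine upload --repository-url https://pypi.internal.example.com/ dist/*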
For User
$ pip install -r requirements.txt
Every time they switch to another branch of A, or someone fixes a bug in B, they only need to run pip install again.
EDIT 2019.05.27
It's possible to do what you want with setup.py. When you run pip install, it downloads the package, unpacks it and runs python setup.py install, so you can add custom logic in setup.py:
install_requires = ['b', 'c', 'd']

# make sure package b is on the Python path and has been checked out as a git repo
if is_package_b_installed_as_git_repo():
    install_requires.remove('b')

setup(
    name='A_package',
    install_requires=install_requires,
    ...
)
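The helper above is left abstract; a minimal sketch of one way to write it, assuming "checked out" simply means package b is already importable from the current environment, could be:
import importlib.util

def is_package_b_installed_as_git_repo():
    # Treat b as already provided if it can be imported, e.g. from a
    # checked-out git repo that has been placed on the Python path.
    return importlib.util.find_spec('b') is not None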

Related

Dependency management in Python packages

I'm currently developing a CI process for a microservices Python application. It is built as follows: each microservice is packaged as a Docker image, and there are several of those. In addition, there's some common code which is packaged as a PyPI package and consumed by the services.
For the sake of the discussion, let's say we have a service called foo and the common code is called lib.
In day-to-day development, we want foo to consume the latest version of lib. But once we want to release a version of foo, we want to merge the code to the main branch and record the exact version of lib in foo's requirements.txt.
The idea that came up is to work in the following manner: in foo, we'll have develop and master branches. On each push to develop, we'll build an image with the latest version of lib. When the developers merge the code to master, we run pip freeze > requirements.txt and push it again, so that when we want to come back to this version, we'll have a requirements.txt file pinned to a specific version of lib (and the rest of the dependencies).
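Concretely, that release step might boil down to something like this (a sketch using the branch names above; the commit message and remote are illustrative):
$ git checkout master
$ git merge develop
$ pip freeze > requirements.txt
$ git commit -am "Pin dependencies for release"
$ git push origin master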
That sounds OK overall, but let's add a complication:
Our lib, in turn, depends on another PyPI package; let's call it utils. The setup.py of lib contains an install_requires field which specifies utils. Again, day to day we want it to consume the latest version of utils, but when we merge the code to main (in lib), we want to pin a specific version.
The question is: is there a way to automatically update the install_requires section of setup.py, the way pip freeze automatically updates requirements.txt?
But really my question is: does this process make sense? Maybe we're missing something here?

Python: incorporate git repo in project

What is the best practice for incorporating git repositories that do not contain a setup.py into your own project? There are several ways I can imagine, but it isn't clear which is best.
Copy the relevant code and include it in your own project
pro:
Only use the relevant code
con:
the upstream git repo might be updated and your copy won't follow
need to do this again for every project
feels like stealing
Clone the repository, write a setup.py, and install it with pip
pro:
easy
package can be updated
can use package like any normal pip package
con:
feels weird
Clone the repository and add its path to the project's search path
pro:
easy
package can be updated
con:
needing to adjust the search path also feels strange
In my opinion, you forgot the best option: ask the original project maintainer to make the package available via pip. Since pip can install directly from git repositories, this doesn't take more than a setup.py -- in particular, you don't need a PyPI account, you don't need to tag releases, etc.
If that's not possible then I would opt for your second option, i.e. provide my own setup.py file in a fork of the project. This makes incorporating upstream changes pretty easy (basically you simply git pull them from the upstream repo) and gives you all the benefits of package management (automatic installation, dependency management, etc.).
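For reference, such a setup.py doesn't need much. A minimal sketch for a fork (the package name is made up) might be:
from setuptools import setup, find_packages

setup(
    name='theirlib',
    version='0.1.0',
    packages=find_packages(),
)
After that, installing straight from the fork is a single command along the lines of pip install git+https://github.com/yourfork/theirlib.git (URL illustrative).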

Why should I create a node_modules folder for Node.js dependencies for each Express.js app

I really don't get it. When I run npm install in the main folder, why does it have to download all the dependencies into node_modules, and why does this need to be done for every single project? In Sinatra (a Ruby microframework), I never had to do this; it is easy to use the gems that are installed globally without having to download and save each one into the project folder again.
I read somewhere that it is done to avoid version mismatch issues, but if installing globally and simply 'require'-ing works in many other languages like Python (which uses virtualenv to tackle version issues), Ruby, etc., why can't it be the same for Node.js?
Whatever happened to DRY?
You can do npm install --global (or -g for short) to install globally. But that creates a problem when two projects reference different versions of the same dependency: you will have conflicts that are hard to trace. Installing locally also makes the project more portable. You can refer to the documentation for details.
This problem does not only exist in the Node.js world; different languages just approach it differently. I don't know Ruby well, but:
In Python, people use virtualenv to separate dependencies, which takes more effort.
Java's Maven caches artifacts in the $HOME/.m2 folder, but when compiling a project, it copies the bytecode from the .m2 folder into a local target folder.
There are times you do want npm install --global though, for tools such as grunt.js/gulp.js.
I just think it has nothing to do with DRY, because you didn't write code twice. It's just downloaded twice.
That being said, you still can install everything globally.

Use setuptools to install from location

I have a framework for a site that I want to use in multiple projects, but I don't want to submit the framework to PyPI. Is there any way I can tell my setup.py to install the framework from a specific location?
Here is my current setup.py:
from setuptools import setup

setup(
    name='Website',
    version='0.2.1',
    install_requires=[
        'boto>=2.6',
        'fabric>=1.4',
        'lepl>=5.1',
        'pygeoip>=0.2.4',
        'pylibmc>=1.2.3',
        'pymongo>=2.2',
        'pyyaml>=3.1',
        'requests>=0.12',
        'slimit>=0.7.4',
        'thrift>=0.8.0',
        'tornado>=2.3',
    ],
)
Those are actually all the dependencies of my framework, so if I could include the framework somehow, I would only have to list it here.
It looks like all of your requirements are public (on PyPI), and you don't need specific versions of them, just "new enough". In 2016, when you can count on everyone having a recent-ish version of pip, there's really nothing to do. If you just pip install . from the source directory or pip install git+https://url/to/package or similar, it will just pull the latest versions of the dependencies off the net. The fact that your package isn't on PyPI won't stop pip from finding its dependencies there.
Or, if you want to stash them all locally, you can set up a local PyPI index. Although in that case, it probably would be simpler to push your package to that same local index, and install it from there.
If you need anything more complicated, a requirements file can take care of that for you.
In particular, if you need to distribute the package to other people in your organization who may not have your team's local index set up, or for some reason you can't set up a local index in the first place, you can put all the necessary information in the requirements file--or, if it's more appropriate, on the command line used to install your package (which even works if you're stuck with easy_install or ancient versions of pip).
The documentation gives full details, and this blog post explains it very nicely, but the short version is this:
If you have a local PyPI index, provide --extra-index-url=http://my.server/path/to/my/pypi/.
If you've got an HTTP server that you can drop the packages on, and you can enable the "auto index directory contents" option in your server, just provide --find-links=http://my.server/path/to/my/packages/.
If you want to use local files (or SMB/AFP/etc. file sharing), create a trivial HTML file with nothing but links to all local packages, and provide --find-links=file:///path/to/my/index.html.
Again, these can go on the command line of a "to install this package, run this" (or a curl | sh install script), but usually you just want to put them in a requirements file. If so, make sure to use only one value per option (e.g., if you want to add two extra indexes, add two --extra-index-url params) and put each one on its own line.
A requirements file also lets you specify specific versions of each package, so you can be sure people are deploying with the same code you developed and tested with, which is often useful in these kinds of situations.
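Put together, a requirements file pointing at a local index with pinned versions might look roughly like this (the index URL is the placeholder from above; the package versions are illustrative):
--extra-index-url=http://my.server/path/to/my/pypi/
Website==0.2.1
boto==2.6.0
tornado==2.3
After that, pip install -r requirements.txt pulls the framework from the local index and everything else from PyPI.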

Django and VirtualEnv Development/Deployment Best Practices

Just curious how people are deploying their Django projects in combination with virtualenv.
More specifically, how do you keep your production virtualenvs synced correctly with your development machine?
I use git for SCM, but I don't have my virtualenv inside the git repo - should I, or is it best to use pip freeze and then re-create the environment on the server from the freeze output? (If you do this, could you please describe the steps? I am finding very little good documentation on the unfreezing process - is something like pip install -r freeze_output.txt possible?)
I just set something like this up at work using pip, Fabric and git. The flow is basically like this, and borrows heavily from this script:
In our source tree, we maintain a requirements.txt file. We'll maintain this manually.
When we do a new release, the Fabric script creates an archive based on whatever treeish we pass it.
Fabric will find the SHA for what we're deploying with git log -1 --format=format:%h TREEISH. That gives us SHA_OF_THE_RELEASE
Fabric will get the last SHA for our requirements file with git log -1 --format=format:%h SHA_OF_THE_RELEASE requirements.txt. This spits out the short version of the hash, like 1d02afc which is the SHA for that file for this particular release.
The Fabric script will then look into a directory where our virtualenvs are stored on the remote host(s).
If there is not a directory named 1d02afc, a new virtualenv is created and set up with pip install -E /path/to/venv/1d02afc -r /path/to/requirements.txt
If there is an existing path/to/venv/1d02afc, nothing is done
The little magic part of this is passing whatever tree-ish you want to git, and having it do the packaging (from Fabric). By using git archive my-branch, git archive 1d02afc or whatever else, I'm guaranteed to get the right packages installed on my remote machines.
I went this route since I really didn't want to have extra virtualenvs floating around if the packages hadn't changed between releases. I also don't like the idea of having the actual packages I depend on in my own source tree.
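A rough shell equivalent of that one-virtualenv-per-requirements-SHA logic, as a sketch (paths are illustrative, and it uses a plain virtualenv plus pip install -r rather than the older pip install -E form):
# one virtualenv per revision of requirements.txt
REQ_SHA=$(git log -1 --format=format:%h "$RELEASE_SHA" -- requirements.txt)
if [ ! -d "/path/to/venvs/$REQ_SHA" ]; then
    virtualenv "/path/to/venvs/$REQ_SHA"
    "/path/to/venvs/$REQ_SHA/bin/pip" install -r /path/to/requirements.txt
fi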
I use this bootstrap.py: http://github.com/ccnmtl/ccnmtldjango/blob/master/ccnmtldjango/template/bootstrap.py
which expects a directory called 'requirements' that looks something like this: http://github.com/ccnmtl/ccnmtldjango/tree/master/ccnmtldjango/template/requirements/
There's an apps.txt, a libs.txt (which apps.txt includes -- I just like to keep Django apps separate from other Python modules) and a src directory which contains the actual tarballs.
When ./bootstrap.py is run, it creates the virtualenv (wiping a previous one if it exists) and installs everything from requirements/apps.txt into it. I never install anything into the virtualenv otherwise. If I want to include a new library, I put the tarball into requirements/src/, add a line to one of the text files and re-run ./bootstrap.py.
bootstrap.py and requirements get checked into version control (also a copy of pip.py so I don't even have to have that installed system-wide anywhere). The virtualenv itself isn't. The scripts that I have that push out to production run ./bootstrap.py on the production server each time I push. (bootstrap.py also goes to some lengths to ensure that it's sticking to Python 2.5 since that's what we have on the production servers (Ubuntu Hardy) and my dev machine (Ubuntu Karmic) defaults to Python 2.6 if you're not careful)
