What is the best practice for incorporating git repositories that do not contain a setup.py into your own project? There are several ways I could imagine, but it isn't clear which is best.
Copy the relevant code and include it in your own project
pro:
Only use the relevant code,
con:
the upstream git repo might be updated and the copied code goes stale
needs to be done again for every project
feels like stealing
Clone the repository, write a setup.py, and install it with pip
pro:
easy
package can be updated
can use the package like any normal pip package
con:
feels weird
Clone the repository and add the path to the project's search path
pro:
easy
package can be updated
con:
needing to adjust the search path also feels strange (see the sketch below)
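For illustration, option 3 usually amounts to something like this sketch; the clone location and module name are hypothetical:

import sys
from pathlib import Path

# Hypothetical location where the repository was cloned alongside this project.
CLONED_REPO = Path(__file__).resolve().parent / "vendor" / "some_repo"
sys.path.insert(0, str(CLONED_REPO))

# import some_module  # now resolvable from the cloned repository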
In my opinion, you forgot the best option: ask the original project maintainer to make the package available via pip. Since pip can install directly from git repositories, this doesn't take more than a setup.py -- in particular, you don't need a PyPI account, you don't need to tag releases, etc.
If that's not possible then I would opt for your second option, i.e. provide my own setup.py file in a fork of the project. This makes incorporating upstream changes pretty easy (basically you simply git pull them from the upstream repo) and gives you all the benefits of package management (automatic installation, dependency management, etc.).
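If you do write that setup.py yourself, a minimal one might look like the sketch below; the distribution name, version, and layout are placeholders, not the upstream project's real metadata:

from setuptools import setup, find_packages

setup(
    name="someproject",          # placeholder name for the forked project
    version="0.1.0",             # any versioning scheme you like for the fork
    packages=find_packages(),    # picks up the importable packages in the repo
    install_requires=[],         # list the project's runtime dependencies here
)

With that committed to the fork, pip can install it straight from git, for example pip install git+https://github.com/yourname/someproject.git (a hypothetical URL), and you can pull upstream changes into the fork and reinstall the same way.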
I have multiple shared internal libraries that many repositories depend on. Right now, these libraries live in the same git repository and are added as a submodule to each application that needs them. At build time, I pip install the libraries. The problem I am facing is that these internal libraries also depend on each other, but those dependencies can't be resolved since the libraries sit in local folders.
For example, I have a local library A that depends on B. This will NOT work:
setup(
    name='A_package',
    install_requires=[
        'B_package',  # source file in local folder
    ],
    ...
)
since pip tries to find B_package on PyPI.
I have searched for solutions; however, I can't seem to find a straightforward one such as
install_requires=[
    '/commonlib/path/B_package',
],
This way, I could just pip install A_package and B_package would also be located and installed.
The reason I would like to keep the shared library source code in submodules is to make development easier, so engineers can modify and commit the libraries whenever needed. I am open to any other suggestions.
Cost of Publishing Packages vs. Speed of Development
The tradeoff between these two is the key here.
Using a Git Repo: Fast at Small Scale, Problematic as It Grows
This is what I first tried at my company. We don't use submodules; we just put the git repo where pip installs packages, e.g. /Users/xxx/miniconda3/lib/python3.6/site-packages. pip then always treats that package as installed, and we sync the git repo to update the package.
This works great at small scale, but it brings problems as the project grows. When you use a git repo, you are tracking git revisions instead of PyPI package versions, so you have to maintain version dependencies manually. Suppose project A uses package B and both have their own versions; there are two options for maintaining that dependency:
Always use the latest version of B and keep B backward compatible - easy for A, but puts the burden on B.
Pin a git revision or tag of B in A, just like a Python package version; people then need to check out B every time they switch to a different branch of A - painful for a large project with multiple branches.
And if you have multiple virtualenvs, there is extra work to do.
Publish Packages but Minimize the Cost: The Way to Go
This is what I ended up with at my company. I set up our own PyPI server and a GitLab CI job that publishes a package whenever a tag is pushed. It doesn't have the problems of the previous approach and still supports fast development iteration.
For Developers
$ git commit ...
$ git tag ...
$ git push && git push --tags
A tag and a push is all they need to publish a package; it's cheap. And we actually use bumpversion to manage versions instead of tagging manually.
For Users
$ pip install -r requirements.txt
Every time they switch to another branch of A, or someone fixes a bug in B, all they need to do is pip install.
EDIT 2019.05.27
It's possible to do what you want with setup.py. When you run pip install, it downloads the package, unpacks it, and runs python setup.py install, so
you can add custom logic to setup.py:
from setuptools import setup

install_requires = ['b', 'c', 'd']

# make sure package b is on the Python path and has been checked out
if is_package_b_installed_as_git_repo():
    install_requires.remove('b')

setup(
    name='A_package',
    install_requires=install_requires,
    ...
)
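The check above is only a placeholder; a minimal sketch of one possible implementation, assuming "installed as a git repo" means that package b is importable and lives inside a git checkout, could be:

import importlib.util
import os

def is_package_b_installed_as_git_repo():
    # Hypothetical check: is package 'b' importable from a git checkout?
    spec = importlib.util.find_spec('b')
    if spec is None or spec.origin is None:
        return False
    package_dir = os.path.dirname(spec.origin)
    # Treat it as a git checkout if a .git directory sits next to the package
    # or one level above it.
    return (os.path.isdir(os.path.join(package_dir, '.git'))
            or os.path.isdir(os.path.join(package_dir, os.pardir, '.git')))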
Using GitHub's .gitignore template, I was able to filter out some files and directories. However, there are a few things that left me a little confused:
GitHub's .gitignore did not include /bin and /share created by venv. I assumed they should be ignored by git, however, as the user is meant to build the virtual environment themselves.
Pip generated a pip-selfcheck.json file, which seemed mostly like clutter. I assume it usually does this, and I just haven't seen the file before because it's been placed with my global pip.
pyvenv.cfg is what I really can't make sense of, though. On one hand, it specifies the Python version, which others who want to use the project ought to know. On the other hand, it also specifies home = /usr/bin, which, while probably correct on a lot of Linux distributions, won't necessarily apply to all systems.
Are there any other files/directories I missed? Are there any stricter guidelines for how to structure a project and what to include?
Although venv is a very useful tool, you should not assume (unless you have good reason to do so) that everyone who looks at your repository uses it. Avoid committing any files used only by venv; these are not strictly necessary to be able to run your code and they are confusing to people who don't use venv.
The only configuration file you need to include in your repository is the requirements.txt file generated by pip freeze > requirements.txt which lists package dependencies. You can then add a note in your readme instructing users to install these dependencies with the command pip install -r requirements.txt. It would also be a good idea to specify the required version of Python in your readme.
I really don't get it. When I run npm install in the main folder, why does it have to download all the dependencies into node_modules, and why does this need to be done for every single project? In Sinatra (a Ruby microframework), I never had to do this; it is easy to use gems that are installed globally without having to download and save each one into the project folder again.
I read somewhere that it is done to avoid version mismatch issues, but if installing globally and simply requiring the package works in many other languages like Python (which uses virtualenv to tackle version issues), Ruby, etc., why can't it be the same for Node.js?
Whatever happened to DRY?
You can do npm install --global (or -g for short) to install globally. But that creates a problem when two projects reference different versions of the same dependency: you get conflicts that are hard to trace. Installing locally also makes the project more portable. You can refer to this document.
This problem doesn't exist only in the Node.js world; different languages just approach it differently. I don't know Ruby well, but:
In Python, people use virtualenv to separate dependencies, which takes more effort.
Maven (for Java) caches artifacts in the $HOME/.m2 folder, but when compiling a project it copies that bytecode from .m2 into the local target folder.
There are times you do want npm install --global, though, for tools such as grunt.js/gulp.js.
I just think it has nothing to do with DRY, because you didn't write the code twice; it's just downloaded twice.
That being said, you can still install everything globally.
I have a number of Python "script suites" (as I call them) which I would like to make easy to install for my colleagues. I have looked into pip, and that seems really nice, but in that regimen (as I understand it) I would have to submit a static version and update it on every change.
As it happens, I am going to be adding and changing a lot of things in my script suites along the way, and whenever someone installs them, I would like them to get the newest version. With pip, that means that on every commit to my repository I would also have to re-submit a package to the PyPI index. That's a lot of unnecessary work.
Is there any way to provide an easy cross-platform installation (via pip or otherwise) which pulls the files directly from my github repo?
I'm not sure if I understand your problem entirely, but you might want to use pip's editable installs [1].
Here's a brief example. In this artificial example, let's suppose you want to use git as your version control system.
git clone url_to_myrepo.git path/to/local_repository
pip install [--user] -e path/to/local_repository
The installation of the package will reflect the state of your local repository. Therefore there is no need to reinstall the package with pip when the remote repository gets updated. Whenever you pull changes to your local repository, the installation will be up-to-date as well.
[1] http://pip.readthedocs.org/en/latest/reference/pip_install.html#editable-installs
I want to distribute some Python code, with a few external dependencies, to machines that have only core Python installed (and users who are unfamiliar with easy_install etc.).
I was wondering if perhaps virtualenv can be used for this purpose? I should be able to write some bash scripts that trigger the virtualenv (with the suitable packages) and then run my code, but this seems somewhat messy, and I'm wondering if I'm re-inventing the wheel.
Are there any simple solutions to distributing Python code with dependencies that ideally don't require sudo on client machines?
Buildout - http://pypi.python.org/pypi/zc.buildout
As a sample, look at my clean project: http://hg.jackleo.info/hyde-0.5.3-buildout-enviroment/src - it's only 2 files that do the magic. Moreover, the Makefile is optional, but then you'll need bootstrap.py (the Makefile downloads it, but it runs only on Linux). buildout.cfg is the main file where you declare the dependencies and configure how the project is laid out.
To get bootstrap.py, just download it from http://svn.zope.org/repos/main/zc.buildout/trunk/bootstrap/bootstrap.py
Then run python bootstrap.py and bin/buildout. I do not recommend installing buildout locally, although it is possible; just use the one that bootstrap downloads.
I must admit that buildout is not the easiest solution, but it's really powerful, so learning it is worth the time.
UPDATE 2014-05-30
Since it was recently up-voted and (probably) used as an answer, I want to point out a few changes.
First of all, buildout is now downloaded from GitHub: https://raw.githubusercontent.com/buildout/buildout/master/bootstrap/bootstrap.py
That hyde project would probably fail now due to buildout 2's breaking changes.
Here you can find better samples: http://www.buildout.org/en/latest/docs/index.html - I also suggest looking at the "collection of links related to Buildout" part; it might contain info for your project.
Secondly, I am personally more in favor of a setup.py script that can be installed using Python. More about the egg structure can be found at http://peak.telecommunity.com/DevCenter/PythonEggs, and if that looks too scary, look it up on Google (query for "python egg"). It's actually simpler than buildout in my opinion (and definitely easier to debug), as well as probably more useful, since it can be distributed more easily and installed anywhere with the help of virtualenv or globally, whereas with buildout you have to ship all of the building scripts with the source all of the time.
You can use a tool like PyInstaller for this purpose. Your application will appear as a single executable on all platforms, and include dependencies. The user doesn't even need Python installed!
See as an example my logview package, which has dependencies on PyQt4 and ZeroMQ and includes distributions for Linux, Mac OSX and Windows all created using PyInstaller.
You don't want to distribute your virtualenv, if that's what you're asking. But you can use pip to create a requirements file - typically called requirements.txt - and tell your users to create a virtualenv then run pip install -r requirements.txt, which will install all the dependencies for them.
See the pip docs for a description of the requirements file format, and the Pinax project for an example of a project that does this very well.