I feel like I am doing something wrong. We have some projects that produce pip packages in CI whenever we push a commit. I am using setuptools_scm to produce a version number based upon the last tag. I have two problems that I am struggling to solve.
Let's say we have a scenario where two developers are working in two different feature branches. Whenever either of them commits their code, our CI produces a new pip package and pushes it to a development pypi server. The version contains information about the previous tag and the commit hash, but it doesn't contain any information about the feature branch that produced it. If I look at the pypi server I will see packages from both developers. As far as I can see, I can't tell which packages came from which feature branch without significant effort.
If someone wants to test out the feature branch, then they need to figure out the exact version number produced by setuptools_scm - something like package-0.1.dev41+gabcdef12. This is painful to communicate everytime someone pushes a new commit. It would be nice if the branch name was somehow part of the version. (Something like package-0.1.branch.dev41+gabcdef12 Then the user could do a pip install package==0.1.branch to get the latest from my branch. But I see that this is not a valid version.)
I've looked at https://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/specification.html and the various PEPs that it references. The only place where I could reasonable put a branch name would be in the local section. This would solve the first problem. I could easily see which feature branch each package came from. But it doesn't help me with testing out a feature branch.
I know that I could produce an alpha/beta/rc tag and use that. But this doesn't map to their intended use. An rc would normally have several commits from many feature branches that were merged since the last release, not a new rc for every commit on a feature branch.
I know that I'm not the only one using git and pip packages. Since I can't find a solution to the problem, I worry that I might be thinking about it wrong. Are there commonly used or standardized ways to handle these issues?
For those who may come across this in the future, I think the best solution is to not package feature branches. Pip allows us to install from a feature branch via pip install git+${REPO_URL}#branch syntax. This syntax works on the command line, and requirements.txt files, and with tools like pip-compile. The user can tie themselves to the head of a particular branch or a specific commit.
The syntax is not the easiest to remember, but it does a very effective job of allowing me to share a feature branch. When I want to make a release, I can then tag the repo and create a package that is more publicly consumable.
Related
for this Git issue I saw that the the gitrepo updated a file for TensorFlow. Now I want to check if the changes can be found in my installation.
I am using conda and installed the specific TensorFlow version in an environment. The file should be here: tensorflow/lite/interpreter.h
However, going down the side package route ~/anaconda3/envs/AI2.6/lib/python3.6/site-packages/tensorflow/lite/, I cannot find the file.
find | grep interpreter in this folder tree gives me
./python/interpreter.py
./python/interpreter_wrapper
./python/interpreter_wrapper/init.py
./python/interpreter_wrapper/pycache
./python/interpreter_wrapper/pycache/init.cpython-36.pyc
./python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
./python/pycache/interpreter.cpython-36.pyc
Could you give me a hint where to find the file, or how to check if a specific commit made it into the stable version of TensorFlow?
Thanks
edit: While typing, I got the answer that the version is in the nightly version, however, it would still be interesting to learn how to find out if a commit made it into a stable release. And why I cannot find the file which should be there.
From the git side, the answer to the question is easy, provided:
that you know the commit's hash ID; and
that the answer you want is is this specific commit in a repository?
The reason for this is that Git commit hash IDs are universally unique. If some repository has some commit, it has that hash ID, in that repository and in every other repository. So you just inspect the repository to see if it has that commit, with that hash ID, and you're done.
In practice—since you've scattered this across a wide rang of tags (I plucked off the linux one since we're not talking about Linux programming APIs here)—this answer isn't useful, not even in the git arena, because commits get copied and modified, and the new-and-improved—or older and worsened, or whatever—version of some commit will have a different hash ID. You often care whether you have some version of some commit, rather than some specific commit.
For this other purpose ("do I have some version of this commit?"), you can sometimes use what Git calls a patch-ID. To find the patch ID of some commit, run the commit through the git patch-ID program (read the linked documentation for details). Then, run potentially matching commits through git patch-ID as well. If they produce the same patch ID, they are equivalent commits, even if they are technically different and therefore have different hash IDs.
A more general, more useful, and more portable way to find out if you have some particular feature requires effort on the part of the maintainers: changelogs, feature tests, and documentation. If something brings new behavior, or new files, or whatever, it should be documented, and in some cases you might want to have, in your programming language, a way to test for the existence of this feature. In python in particular, the core documentation has, for instance, things like this:
subprocess.run(args, *, stdin=None, ...
...
New in version 3.5.
Changed in version 3.6: Added encoding and errors parameters
...
You can also use Python constructs like:
try:
import what.ever
except ImportError:
... do whatever you need here ...
and similar tricks, and import sys and inspect sys.version and so on.
The file should be here: tensorflow/lite/interpreter.h
The OS-specific methods for testing the existence of a file in a path depend on the OS, but when using github, you can construct the URL from the file's name knowing the systematic scheme that the GitHub folks use. For instance, https://github.com/git/git/blob/seen/Makefile is the URL to view the version of Makefile at the tip commit of branch seen in the Git repository mirror for Git itself on GitHub.
I searched a bit but could not find a clear answer.
The goal is, to have two pip indexes, one is a private index, that will be a first priority. And one is the standard PyPI. The priority is there to prevent the security risk of code injection.
Say I have library named lib, and I configure index_url = http://my_private_pypi_repo and extra_index_url = https://pypi.org/simple
If I pip install lib, and lib exists in both indexes. What index will get the priority? From where it is going to be installed from?
Also, if I pip install lib=0.0.2 but lib exists in my private index at version 0.0.1. Is it going to look at PyPI as well?
And what is a good way to be in control, that certain libraries will only be fetched from the private index if they exists there, and will not be looked for at PyPI?
The short answer is: there is no prioritization and you probably should avoid using --extra-index-url entirely.
This is asked and answered here: https://github.com/pypa/pip/issues/5045#issuecomment-369521345
Question:
I have this in my pip.conf:
[global]
index-url = https://myregistry-xyz.com
extra-index-url = https://pypi.python.org/pypi
Let's assume packageX exists in both registries and I run pip install packageX.
I expect pip to install packageX from https://myregistry-xyz.com, but pip will use https://pypi.python.org/pypi instead.
If I switch the values for index-url and extra-index-url I get the same result. pypi is always prioritized.
Answer:
Packages are expected to be unique up to name and version, so two wheels with the same package name and version are treated as indistinguishable by pip. This is a deliberate feature of the package metadata, and not likely to change.
I would also recommend reading this discussion: https://discuss.python.org/t/dependency-notation-including-the-index-url/5659
There are quite a lot of things that are addressed in this discussion, some that is clearly out of scope for this question, but everything is very informative anyway.
In there, there should be the key takeaway for you:
Pip does not really prioritize one index over the other in theory. In practice, because of a coincidence in the way things are implemented in code, it might be that one is always checked first, but it is not a behavior you should rely on.
And what is a good way to be in control, that certain libraries will only be fetched from the private index if they exists there, and will not be looked for at PyPI?
You should setup and curate your own package index (devpi, pydist, jfrog artifactory, sonatype nexus, etc.) and use it exclusively, meaning: never use --extra-index-url. This is the only way you can have exact control over what gets downloaded. This custom repository might function mostly a proxy for the public PyPI, except for a couple of dependencies.
Related:
pip: selecting index url based on package name?
The title of this question feels a bit like an instance of XY problem. If you would elaborate more on what you want to achieve and what your constraints are we may be able to give you a better answer.
That said, sinoroc's suggestion to curate your own package index and use only that is a good one. A few other ideas also come to mind:
Update: Turns out pip may run distributions other than those in the constraints file so this method should probably be considered insecure. Additionally hashes are kind of broken on recent releases of pip.
Using a constraints file with hashes. This file can be generated using pip-tools like pip-compile --generate-hashes assuming you have documented your dependencies in a file named requirements.in. You can then install packages like pip install -c requirements.txt some_package.
Pro: What may be installed is documented alongside your code in your VCS.
Con: Controlling what is downloaded the first time is either tricky or laborious.
Con: Hash checking can be slow.
Con: You run into issues more frequently than when not using hashes. Some can be worked around others cannot; it is for instance not possible to combine constraints like -e file://` with hashes.
Use an alternative packaging tool like pipenv. It works similarly to the previous suggestion.
Pro: Easy to use
Con: Harder to integrate into your workflow if it does not fit naturally.
Curate packages locally. Packages and dependencies can be downloaded like pip download --dest some_dir some_package and installed like pip install --no-index --find-links some_dir.
Pro: What may be installed can be documented alongside your code, if you track the artifacts in VCS e.g. git lfs.
Con: Either all packages are downloaded or none are.
Use a hermetic build system. I know bazel advertise this as a feature, not sure about others like pants and buck.
Pro: May be the ultimate solution if you want control over your builds.
Con: Does not integrate well with open source python ecosystem afaik.
Con: A lot of overhead.
1: https://en.wikipedia.org/wiki/XY_proble
So far I know requirements.txt like this: Django==2.0. Now I saw this style of writing Django>=1.8,<2.1.99
Can you explain to me what it means?
requirements.txt is a file where one specifies dependencies. For example your program will here depend on Django (well you probably do not want to implement Django yourself).
In case one only writes a custom application, and does not plan to export it (for example as a library) to other programmers, one can pin the version of the library, for example Django==2.0.1. Then you can always assume (given pip manages to install the correct package) that your environment will ave the correct version, and thus that if you follow the correct documentation, no problems will (well should) arise.
If you however implement a library, for example mygreatdjangolibrary, then you probably do not want to pin the version: it would mean that everybody that wants to use your library would have to install Django==2.0.1. Imagine that they want a feature that is only available in django-2.1, then they can - given they follow the dependencies strictly - not do this: your library requires 2.0.1. This is of course not manageable.
So typically in a library, one aims to give as much freedom to a user of a library. It would be ideal if regardless of the Django version the user installed, your library could work.
Unfortunately this would result in a lot of trouble for the library developer. Imagine that you have to take into account that a user can use Django-1.1 up to django-2.1. Through the years, several features have been introduced that the library then can not use, since the programmer should be conservative, and take into account that it is possible that these features do not exist in the library the user installed.
It becomes even worse since Django went through some refactoring: some features have later been removed, so we can not simply program on django-1.1 and hope that everything works out.
So in that case, it makes sense to specify a range of versions we support. For example we can read the documentation of django-2.0, and look to the release notes to see if something relevant changed in django-2.1, and let tox test both versions for the tests we write. We thus then can specify a range like Django>=2.0,<2.1.99.
This is also important if you depend on several libraries that each a common requirement. Say for example you want to install a library liba, and a library libb, both depend on Django, bot the two have a different range, for example:
liba:
Django>=1.10, <2.1
libb:
Django>=1.9, <1.11
Then this thus means that we can only install a Django version between >=1.10 and <1.11.
The above even easily gets more complex. Since liba and libb of course have versions as well, for example:
liba-0.1:
Django>=1.10, <2.1
liba-0.2:
Django>=1.11, <2.1
liba-0.3:
Django>=1.11, <2.2
libb-0.1:
Django>=1.5, <1.8
libb-0.2:
Django>=1.10, <2.0
So if we now want to install any liba, and any libb, we need to find a version of liba and libb that "allows" us to install a Django version, and that is not that trivial since for example if we would pick libb-0.1, then there is no version of liba that supports an "overlapping" Django version.
To the best of my knowledge, pip currently has no dependency resolution algorithm. It looks at the specification, and each time aims to pick the most recent that is satisfying the constraints, and recursively installs the dependencies of these packages.
Therefore it is up to the user to make sure that (sub)dependencies do not conflict: if we would specify liba libb==0.1, then pip will probably install Django-2.1, and then find out that libb can not work with this.
There are some dependency resolution programs. But the problem turns out to be quite hard (it is NP-hard if I recall correctly). So that means that for a given dependency tree, it can takes years to find a valid configuration.
Say someone owns a repository with only one master hosting code that is compatible with Python 2.7.X. I would like to contribute to that repository with my own changes to a new branch new_branch to offer a variant of the repository that is compatible with Python 3.
I followed the steps here:
I forked the repository on GitHub on my account
I cloned my fork on my local machine
I created a new branch new_branch locally
I made the relevant changes
I committed and pushed the changes to my own fork on GitHub
I went on the browser to the GitHub page of the official repository, and asked for a pull request
The above worked, but it did a pull request from "my_account:new_branch" to "official_account:master". This is not what I want, since Python 2.7.x and Python 3 are incompatible with each other. What I would like to do is create a PR to a new branch on the official repository (e.g. with the same name "new_branch"). How can I do that? Is this possible at all?
You really don't want to do things this way. But first I'll explain how to do it, then I'll come back to explain why not to.
Using Pull Requests at GitHub has a pretty good overview, in particular the section "Changing the branch range and destination repository." It's easiest if you use a topic branch, and have the upstream owner create a topic branch of the same name; then you just pull down the menu where it says "base: master" and the choice will be right there, and he can just click the "merge" button and have no surprises.
So, why don't you want to do things this way?
First, it doesn't fit the GitHub model. Topic branches that live forever in parallel with the master branch and have multiple forks make things harder to maintain and visualize.
Second, you need both a git URL and an https URL for you code. You need people to be able to share links, pip install from top of tree, just clone the repo instead of cloning and then checking out a different branch, etc. This all means your code has to be on the master branch.
Third, if you want people to be able to install your 3.x version off PyPI, find docs at readthedocs, etc., you need a single project with a single source tree. Most such sites have a single latest version, not a latest version for each Python version, and definitely not multiple variations of the same version. (You could install completely fork the project, and create a separate foo3 project. But it's much easier for people to be able to pip install foo than to have them try that, fail, come to SO and ask why it doesn't work, and get told they probably have Python 3 and need to pip install foo3 instead.)
How do you merge two versions into a single package? The porting docs should have the most up-to-date advice, but briefly: If it's at all possible to create a single codebase that runs on both versions, that's ideal; if not, and if you can't make things work by running 2to3 or 3to2 at install time, create a parallel directory for the 3.x code (e.g., a foo3 alongside foo) and pick the appropriate directory at install time. (You can always start with that and gradually work toward a unified codebase.)
One issue that comes up during Pinax development is dealing with development versions of external apps. I am trying to come up with a solution that doesn't involve bringing in the version control systems. Reason being I'd rather not have to install all the possible version control systems on my system (or force that upon contributors) and deal the problems that might arise during environment creation.
Take this situation (knowing how Pinax works will be beneficial to understanding):
We are beginning development on a new version of Pinax. The previous version has a pip requirements file with explicit versions set. A bug comes in for an external app that we'd like to get resolved. To get that bug fix in Pinax the current process is to simply make a minor release of the app assuming we have control of the app. Apps we don't have control we just deal with the release cycle of the app author or force them to make releases ;-) I am not too fond of constantly making minor releases for bug fixes as in some cases I'd like to be working on new features for apps as well. Of course branching the older version is what we do and then do backports as we need.
I'd love to hear some thoughts on this.
Could you handle this using the "==dev" version specifier? If the distribution's page on PyPI includes a link to a .tgz of the current dev version (such as both github and bitbucket provide automatically) and you append "#egg=project_name-dev" to the link, both easy_install and pip will use that .tgz if ==dev is requested.
This doesn't allow you to pin to anything more specific than "most recent tip/head", but in a lot of cases that might be good enough?
I meant to mention that the solution I had considered before asking was to put up a Pinax PyPI and make development releases on it. We could put up an instance of chishop. We are already using pip's --find-links to point at pypi.pinaxproject.com for packages we've had to release ourselves.
Most open source distributors (the Debians, Ubuntu's, MacPorts, et al) use some sort of patch management mechanism. So something like: import the base source code for each package as released, as a tar ball, or as a SCM snapshot. Then manage any necessary modifications on top of it using a patch manager, like quilt or Mercurial's Queues. Then bundle up each external package with any applied patches in a consistent format. Or have URLs to the base packages and URLs to the individual patches and have them applied during installation. That's essentially what MacPorts does.
EDIT: To take it one step further, you could then version control the set of patches across all of the external packages and make that available as a unit. That's quite easy to do with Mercurial Queues. Then you've simplified the problem to just publishing one set of patches using one SCM system, with the patches applied locally as above or available for developers to pull and apply to their copies of the base release packages.
EDIT: I am not sure I am reading your question correctly so the following may not answer your question directly.
Something I've considered, but haven't tested, is using pip's freeze bundle feature. Perhaps using that and distributing the bundle with Pinax would work? My only concern would be how different OS's are handled. For example, I've never used pip on Windows, so I wouldn't know how a bundle would interact there.
The full idea I hope to try is creating a paver script that controls management of the bundles, making it easy for users to upgrade to newer versions. This would require a bit of scaffolding though.
One other option may be you keeping a mirror of the apps you don't control, in a consistent vcs, and then distributing your mirrored versions. This would take away the need for "everyone" to have many different programs installed.
Other than that, it seems the only real solution is what you guys are doing, there isn't a hassle-free way that I've been able to find.