Contributing to a repository on GitHub on a new branch - python

Say someone owns a repository with only one branch, master, hosting code that is compatible with Python 2.7.x. I would like to contribute my own changes on a new branch new_branch, to offer a variant of the repository that is compatible with Python 3.
I followed these steps:
I forked the repository on GitHub on my account
I cloned my fork on my local machine
I created a new branch new_branch locally
I made the relevant changes
I committed and pushed the changes to my own fork on GitHub
I went in the browser to the GitHub page of the official repository and opened a pull request
The above worked, but it opened a pull request from "my_account:new_branch" to "official_account:master". This is not what I want, since Python 2.7.x and Python 3 are incompatible with each other. What I would like instead is a PR to a new branch on the official repository (e.g., one with the same name, new_branch). How can I do that? Is this possible at all?

You really don't want to do things this way. But first I'll explain how to do it, then I'll come back to explain why not to.
GitHub's "Using Pull Requests" documentation has a pretty good overview; see in particular the section "Changing the branch range and destination repository." It's easiest if you use a topic branch and have the upstream owner create a topic branch of the same name; then you just pull down the menu where it says "base: master", the choice will be right there, and the owner can click the "merge" button with no surprises.
So, why don't you want to do things this way?
First, it doesn't fit the GitHub model. Topic branches that live forever in parallel with the master branch and have multiple forks make things harder to maintain and visualize.
Second, you need both a git URL and an https URL for your code. You need people to be able to share links, pip install from the top of the tree, just clone the repo instead of cloning and then checking out a different branch, etc. All of this means your code has to be on the master branch.
Third, if you want people to be able to install your 3.x version off PyPI, find docs at readthedocs, etc., you need a single project with a single source tree. Most such sites have a single latest version, not a latest version for each Python version, and definitely not multiple variations of the same version. (You could instead completely fork the project and create a separate foo3 project. But it's much easier for people to be able to pip install foo than to have them try that, fail, come to SO and ask why it doesn't work, and get told they probably have Python 3 and need to pip install foo3 instead.)
How do you merge two versions into a single package? The porting docs should have the most up-to-date advice, but briefly: If it's at all possible to create a single codebase that runs on both versions, that's ideal; if not, and if you can't make things work by running 2to3 or 3to2 at install time, create a parallel directory for the 3.x code (e.g., a foo3 alongside foo) and pick the appropriate directory at install time. (You can always start with that and gradually work toward a unified codebase.)
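A minimal sketch of the install-time directory selection mentioned above, reusing the foo/foo3 names from this answer (the rest of the setup() metadata is hypothetical):
import sys
from setuptools import setup

# On Python 3, take package "foo" from the foo3/ tree; otherwise from foo/.
package_dir = {"foo": "foo3" if sys.version_info[0] >= 3 else "foo"}

setup(
    name="foo",
    version="1.0",
    packages=["foo"],
    package_dir=package_dir,
)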

Related

Pip Whl naming conventions for git branches

I feel like I am doing something wrong. We have some projects that produce pip packages in CI whenever we push a commit. I am using setuptools_scm to produce a version number based upon the last tag. I have two problems that I am struggling to solve.
Let's say we have a scenario where two developers are working in two different feature branches. Whenever either of them commits their code, our CI produces a new pip package and pushes it to a development PyPI server. The version contains information about the previous tag and the commit hash, but it doesn't contain any information about the feature branch that produced it. If I look at the PyPI server, I will see packages from both developers. As far as I can see, I can't tell which packages came from which feature branch without significant effort.
If someone wants to test out the feature branch, then they need to figure out the exact version number produced by setuptools_scm - something like package-0.1.dev41+gabcdef12. This is painful to communicate every time someone pushes a new commit. It would be nice if the branch name were somehow part of the version. (Something like package-0.1.branch.dev41+gabcdef12; then the user could do a pip install package==0.1.branch to get the latest from my branch. But I see that this is not a valid version.)
I've looked at https://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/specification.html and the various PEPs that it references. The only place where I could reasonably put a branch name is the local version segment. This would solve the first problem: I could easily see which feature branch each package came from. But it doesn't help me with testing out a feature branch.
I know that I could produce an alpha/beta/rc tag and use that. But this doesn't map to their intended use. An rc would normally have several commits from many feature branches that were merged since the last release, not a new rc for every commit on a feature branch.
I know that I'm not the only one using git and pip packages. Since I can't find a solution to the problem, I worry that I might be thinking about it wrong. Are there commonly used or standardized ways to handle these issues?
For those who may come across this in the future, I think the best solution is to not package feature branches. Pip allows us to install from a feature branch via the pip install git+${REPO_URL}@branch syntax. This syntax works on the command line, in requirements.txt files, and with tools like pip-compile. The user can pin themselves to the head of a particular branch or to a specific commit.
The syntax is not the easiest to remember, but it does a very effective job of allowing me to share a feature branch. When I want to make a release, I can then tag the repo and create a package that is more publicly consumable.
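For example (the repository URL, branch name, and commit hash below are hypothetical):
pip install "git+https://github.com/example/package.git@feature-branch"
pip install "git+https://github.com/example/package.git@abcdef12"
The same direct reference works as a requirements.txt line, in PEP 508 form:
package @ git+https://github.com/example/package.git@feature-branch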

How to find out if a commit made it into the stable version (TensorFlow)

For this Git issue, I saw that the Git repo updated a file for TensorFlow. Now I want to check whether the changes can be found in my installation.
I am using conda and installed the specific TensorFlow version in an environment. The file should be here: tensorflow/lite/interpreter.h
However, going down the site-packages route, ~/anaconda3/envs/AI2.6/lib/python3.6/site-packages/tensorflow/lite/, I cannot find the file.
find | grep interpreter in this folder tree gives me
./python/interpreter.py
./python/interpreter_wrapper
./python/interpreter_wrapper/__init__.py
./python/interpreter_wrapper/__pycache__
./python/interpreter_wrapper/__pycache__/__init__.cpython-36.pyc
./python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
./python/__pycache__/interpreter.cpython-36.pyc
Could you give me a hint where to find the file, or how to check if a specific commit made it into the stable version of TensorFlow?
Thanks
Edit: While typing, I got the answer that the change is in the nightly version. However, it would still be interesting to learn how to find out whether a commit made it into a stable release, and why I cannot find the file that should be there.
From the git side, the answer to the question is easy, provided:
that you know the commit's hash ID; and
that the answer you want is to the question "is this specific commit in a repository?"
The reason for this is that Git commit hash IDs are universally unique. If some repository has some commit, it has that hash ID, in that repository and in every other repository. So you just inspect the repository to see if it has that commit, with that hash ID, and you're done.
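For instance, with a hypothetical hash ID abc1234, run inside a clone of the repository in question:
git cat-file -e abc1234^{commit} && echo "this exact commit exists here"
git tag --contains abc1234
The second command lists the release tags whose history contains that commit, which answers "did it make it into the stable version?" for projects that tag their releases.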
In practice, since you've scattered this across a wide range of tags (I plucked off the linux one, since we're not talking about Linux programming APIs here), this answer isn't useful, not even in the git arena, because commits get copied and modified, and the new-and-improved (or older-and-worsened, or whatever) version of some commit will have a different hash ID. You often care whether you have some version of some commit, rather than some specific commit.
For this other purpose ("do I have some version of this commit?"), you can sometimes use what Git calls a patch ID. To find the patch ID of some commit, run the commit through the git patch-id program (read the linked documentation for details). Then run potentially matching commits through git patch-id as well. If they produce the same patch ID, they are equivalent commits, even if they are technically different and therefore have different hash IDs.
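For example (the hash IDs are hypothetical; git patch-id reads a diff on standard input and prints a patch ID followed by the commit ID):
git show abc1234 | git patch-id
git show 5678def | git patch-id
If the first field of the two output lines matches, the commits are equivalent.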
A more general, more useful, and more portable way to find out whether you have some particular feature requires effort on the part of the maintainers: changelogs, feature tests, and documentation. If something brings new behavior, or new files, or whatever, it should be documented, and in some cases you might want to have, in your programming language, a way to test for the existence of this feature. In Python in particular, the core documentation has, for instance, things like this:
subprocess.run(args, *, stdin=None, ...
     ...
New in version 3.5.
Changed in version 3.6: Added encoding and errors parameters
...
You can also use Python constructs like:
try:
    import what.ever
except ImportError:
    ... do whatever you need here ...
and similar tricks, and import sys and inspect sys.version and so on.
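A minimal sketch of such a check, using the subprocess.run() excerpt above as the feature in question:
import subprocess
import sys

# subprocess.run() was added in Python 3.5, per the documentation quoted above.
if sys.version_info >= (3, 5) and hasattr(subprocess, "run"):
    run = subprocess.run
else:
    run = None  # fall back to subprocess.call() or similar here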
The file should be here: tensorflow/lite/interpreter.h
The OS-specific methods for testing the existence of a file in a path depend on the OS, but when using GitHub, you can construct the URL from the file's name, knowing the systematic scheme that the GitHub folks use. For instance, https://github.com/git/git/blob/seen/Makefile is the URL to view the version of Makefile at the tip commit of branch seen in the Git repository mirror for Git itself on GitHub.
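A rough sketch of automating that check from Python; GitHub's blob URLs are a site convention rather than a formal API, and the tag v2.6.0 below is a hypothetical example:
import urllib.error
import urllib.request

# Does tensorflow/lite/interpreter.h exist at a given ref of tensorflow/tensorflow?
url = "https://github.com/tensorflow/tensorflow/blob/v2.6.0/tensorflow/lite/interpreter.h"
try:
    urllib.request.urlopen(urllib.request.Request(url, method="HEAD"))
    print("file exists at that ref")
except urllib.error.HTTPError as err:
    print("not found at that ref" if err.code == 404 else "HTTP error %d" % err.code)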

Creating a local python pip repo

I have been tasked with creating and using a local pip repo.
(The reason is that we'll be using Python 2.7 for at least one more year, and we fear packages or older versions being removed.)
I am looking at bandersnatch, and it is not clear to me whether it is an online mirroring tool that I need to run as a service, or whether it can be used to make a one-off copy.
I'd prefer the second option (I don't want to complicate the system unnecessarily) and would be satisfied with running an update, say, daily or even weekly.
An alternative approach would be to download only the packages and versions we actually use by looking at the requirements.txt file, but this would require running an update every time a developer wants to add or update a package.
A way to create a local Python package repository is through Sonatype Nexus. With Nexus you can create several kinds of repos:
Hosted repo (your own, internal repo)
Proxy repo (a proxy for other, remote repos)
Group repo (groups, and priority-sorts, a list of hosted and proxied repos)
For example, you can create a group repo with the following lookup order:
- First, search for the package in your own hosted repo.
- If it does not exist there, search for it in the global public repo.
This is all transparent to your app.
https://help.sonatype.com/repomanager3/formats/pypi-repositories
There is also a Docker image if you want: https://hub.docker.com/r/sonatype/nexus3
I have used it before for different purposes, and I find it very mature and complete.
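As a sketch, client machines could then point pip at the group repo's index; the host and repository names below are hypothetical, and /repository/<name>/simple is the endpoint layout Nexus 3 uses for PyPI repos:
# ~/.pip/pip.conf (or /etc/pip.conf)
[global]
index-url = https://nexus.example.com/repository/pypi-all/simple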
There is also a script that generates a simple repository with the N most recent versions of the 4000 most-used packages on PyPI. An advantage is that it can hold multiple versions of a package, as PyPI does: https://gist.github.com/harisankar-krishna-swamy/cac5d1e6c1ae074b39286c1336bff63d

Automated deployment and update of Python-based Software with pip

We have a relatively large number of machines in a cluster that run a certain Python-based software for computation in an academic research environment. In order to keep the code base up to date, we use a build server that makes the current code base available in a directory each time we update a dedicated deployment tag on our Mercurial server. Each machine in the cluster runs a daily rsync script that just synchronises with the deployment directory on the build server (if there's anything to sync) and restarts the process if the code base was updated.
Now, I find this approach a bit dated and slightly overkill, and would like to optimise it in the following ways:
Get rid of the build server as all it actually does is clone the latest code base that has a certain tag attached - it doesn't actually compile or do any additional checks (such as testing) on the code base at all. This would also reduce some pain for us as it'd be one less server to maintain and worry about.
Instead of having the build server, I would like to pull straight from our Mercurial server, which hosts the code already. This would remove the need to duplicate the code base each time we update the deployment tag.
Now I had a bit of a read before on how to install / deploy Python-based software with pip (e.g., How to point pip at a Mercurial branch?). It seems to be the right choice as it supports installing packages straight from a code repository. However, I ran into a few problems that I would need help with. The requirements I have are as follows:
Use Mercurial as a source.
Automated background process to update and install into a custom directory on the file system.
Only pull and update from the repository if there is a new version available.
The following command seems to almost do what I need:
pip install -e hg+https://authkey:anypw@mymercurialserver.hostname.com/Code/package@deployment#egg=package --upgrade --src ~/proj
It pulls the package from the Mercurial server, picks the code base with the tag "deployment" and installs it into proj inside the user's home directory.
The problem, however, is that regardless of whether there is an update available, pip always uninstalls package and reinstalls it. This makes it difficult to decide whether the process needs to be restarted if nothing actually changed. In addition, pip always gets stuck with the message that hg clone in ./proj/yarely exists with URL... and asks me: What to do? (s)witch, (i)gnore, (w)ipe, (b)ackup. This is not ideal, as (1) it would be an automated process without a user prompt, and (2) it should only pull the repository if there was an update in the first place, to reduce traffic in the network and not overload our Mercurial server. I believe that a pull, instead of a clone, when a local copy of the repository already exists would be more appropriate here and would potentially solve the problem.
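(A sketch of that pull-only-when-updated idea, assuming the clone from the command above already exists; the directory and tag follow the question, and the script relies on hg incoming exiting with status 0 only when there are incoming changes:)
cd ~/proj/package
if hg incoming -q -r deployment > /dev/null; then
    hg pull -u -r deployment
    pip install --upgrade -e .
    # restart the process here, since something was pulled
fi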
I wasn't able to find an elegant and nice solution to this problem. Does anyone have a pointer or suggestion how this could be achieved?

Easy-install live python libraries/scripts

I have a number of Python "script suites" (as I call them) that I would like to make easy to install for my colleagues. I have looked into pip, and that seems really nice, but in that model (as I understand it) I would have to publish a static version and update it on every change.
As it happens I am going to be adding and changing a lot of stuff in my script suites along the way, and whenever someone installs it, I would like them to get the newest version. With pip, that means that on every commit to my repository, I will also have to re-submit a package to the PyPI index. That's a lot of unnecessary work.
Is there any way to provide an easy cross-platform installation (via pip or otherwise) which pulls the files directly from my github repo?
I'm not sure if I understand your problem entirely, but you might want to use pip's editable installs[1]
Here's a brief example. In this artificial example, let's suppose you use git as your VCS.
git clone url_to_myrepo.git path/to/local_repository
pip install [--user] -e path/to/local_repository
The installation of the package will reflect the state of your local repository. Therefore there is no need to reinstall the package with pip when the remote repository gets updated. Whenever you pull changes to your local repository, the installation will be up-to-date as well.
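For instance, updating the installation is then just a matter of pulling (the path is the placeholder from above):
cd path/to/local_repository
git pull  # the pip-installed package immediately reflects the pulled changes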
[1] http://pip.readthedocs.org/en/latest/reference/pip_install.html#editable-installs
