Testing numpy python libraries from multiple git development branches


I'm trying to develop a few enhancements for the numpy library. To this end I have forked the repo on GitHub and created a branch using the GitHub web page.
Next I ran the following commands:
$ git clone https://github.com/staticd-growthecommons/numpy.git
$ cd numpy/
$ git remote add upstream https://github.com/numpy/numpy.git
$ git branch -a
* master
remotes/origin/HEAD -> origin/master
remotes/origin/choice-unweighted-no-replace
remotes/origin/enable_separate_by_default
remotes/origin/maintenance/1.0.3.x
[....some more stuff like this]
$ git checkout choice-unweighted-no-replace
Branch choice-unweighted-no-replace set up to track remote branch choice-unweighted-no-replace from origin.
Switched to a new branch 'choice-unweighted-no-replace'
$ git branch -a
* choice-unweighted-no-replace
master
remotes/origin/HEAD -> origin/master
remotes/origin/choice-unweighted-no-replace
remotes/origin/enable_separate_by_default
remotes/origin/maintenance/1.0.3.x
OK here my n00bness begins to shine like a thousand splendid suns. Despite reading all the tutorials I could find I'm still not sure what I'm supposed to do now.
What I want to achieve is this:
1. I want to add/modify three new algorithms to the random library in numpy. Am I correct in assuming that since they are three separate unrelated enhancements, the correct way to go about this is to make three parallel branches based on the master? (and then submit pull requests for each branch so they can be reviewed independently)
2. Once I have run the commands shown above, do I just go about editing the source files found in the numpy directory? Will they automatically be joined to the choice-unweighted-no-replace branch?
3. Can I switch to another branch for a while to work on another feature before I commit changes and push the current branch to the repo?
4. What is the best way to test each of these branches? I couldn't figure out how to use virtualenv with git.
5. Is it possible to import the libraries from two branches into a single python program? Like import branch1.numpy, branch2.numpy or something like that.
Update: partial answer figured out:
At least for testing numpy, it's fairly trivial: just run ./runtests.py -i from the numpy directory. It builds numpy and opens an IPython shell with the PYTHONPATH set. If you now do import numpy, it imports the development branch in that directory.
To test multiple branches, just make copies of the git folder and checkout a different branch in each. Then you can open IPython shells for each branch.

First and foremost, I strongly recommend the Pro Git book. It should answer most of the questions you will run into later on.
Yes, it is good practice to separate work on different topics into different branches. That way you can later make a pull request that covers only the code involved in adding or changing that one piece of functionality.
Git works with something called an index (also known as the staging area). Merely changing a file does not automatically save it on a branch; you have to tell git that you want to save it. To do so, you first stage the file and then make a commit.
git add modifiedfile
git commit -m "A message about my changes"
This will add a new commit to the branch you currently have checked out. If you want to make a commit on a different branch, you need to switch branches first.
git checkout branchname
If you want to create a new branch and switch to it in one step:
git checkout -b branchname
You can switch between branches at any time, but you should save your work first. You can either make a temporary commit that you later reset, or put the changes aside with git stash.
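The stage, commit, and switch steps above can also be scripted. Here is a minimal sketch, assuming git is on the PATH and a commit identity is configured. The helper names (git, current_branch, commit_file, switch_with_stash) are made up for illustration and are not part of any library:

```python
import subprocess


def git(repo_dir, *args):
    """Run one git command inside repo_dir and return its stdout."""
    result = subprocess.run(
        ["git", *args],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git reports a failure
    )
    return result.stdout.strip()


def current_branch(repo_dir):
    """Name of the branch that is checked out (needs at least one commit)."""
    return git(repo_dir, "rev-parse", "--abbrev-ref", "HEAD")


def commit_file(repo_dir, path, message):
    """Stage one file and commit it on the currently checked-out branch."""
    git(repo_dir, "add", path)
    git(repo_dir, "commit", "-m", message)


def switch_with_stash(repo_dir, branch):
    """Put uncommitted changes aside with git stash, then check out branch."""
    git(repo_dir, "stash")
    git(repo_dir, "checkout", branch)
```

After switch_with_stash, the set-aside work can be brought back with git stash pop once you return to the original branch.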
I'm not really familiar with virtualenv, so that part might be better asked as a separate question.
To use two branches from one program, you would need two clones of the repository in two different directories: one with the first branch checked out and the other with the second. That way your script will be able to use both libraries.
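One cautious way to use two checkouts from a single script is to run the same snippet in a separate interpreter per checkout, since two copies of a package with the same name cannot easily live in one process. This is only a sketch: it assumes each checkout is importable from its top-level directory (for a compiled package like numpy that means it has already been built there), and the function names are hypothetical.

```python
import os
import subprocess
import sys


def run_against_checkout(checkout_dir, snippet):
    """Run snippet in a fresh interpreter that searches checkout_dir first."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Prepend the checkout so its copy of the package shadows installed ones.
    env["PYTHONPATH"] = (
        checkout_dir if not existing else checkout_dir + os.pathsep + existing
    )
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def compare_checkouts(checkouts, snippet):
    """Map each label to the output of snippet run against that checkout."""
    return {
        label: run_against_checkout(path, snippet)
        for label, path in checkouts.items()
    }
```

For example, compare_checkouts({"branch1": "/path/to/clone1", "branch2": "/path/to/clone2"}, "import numpy; print(numpy.__file__)") would show which copy each interpreter picked up; the two paths are placeholders for your own clones.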

Related

Is it possible to emulate `git add -A` in GitPython?

I've recently discovered GitPython, and, given that I'm currently trying to create a Python script which pushes to and pulls from Git repositories automatically, I was really excited to try it out.
When committing to a repository using command line Git, I call git add -A, pretty much to the exclusion of all other arguments. I know that you can call git add . instead, or add/remove files by name; I've just never felt the need to use that functionality. (Is that bad practice on my part?) However, I've been trying to put together a GitPython script today, and, despite combing through the API reference, I can't find any straightforward way of emulating the git add -A command.
This is a snippet from my efforts so far:
repo = Repo(absolute_path)
repo.index.add("-A")
repo.index.commit("Commit message.")
repo.remotes.origin.push()
This throws the following error: FileNotFoundError: [Errno 2] No such file or directory: '-A'. If, instead, I try to call repo.index.add(), I get: TypeError: add() missing 1 required positional argument: 'items'. I understand that .add() wants me to specify the files I want to add by name, but the whole point of GitPython, from my point of view, is that it's automated! Having to name the files manually defeats the purpose of the module!
Is it possible to emulate git add -A in GitPython? If so, how?

The API you linked goes to a version of GitPython that supports invoking the Git binaries themselves directly, so you could just have it run git add -A for you.
That aside, git add -A means:
Update the index not only where the working tree has a file matching <pathspec> but also where the index already has an entry. This adds, modifies, and removes index entries to match the working tree.
If no <pathspec> is given when -A option is used, all files in the entire working tree are updated (old versions of Git used to limit the update to the current directory and its subdirectories).
So git add -A is just the same as git add . from the top level of the working tree. If you want the old (pre-2.0) git add -A behavior, run git add . from a lower level of the working tree; to get the 2.0-or-later git add -A behavior, run git add . from the top level of the working tree. But see also --no-all:
This option is primarily to help users who are used to older versions of Git, whose "git add <pathspec>…" was a synonym for "git add --no-all <pathspec>…", i.e. ignored removed files.
So, if you want the pre-2.0 behavior, you will also need --no-all.
If you intend to do all of these within GitPython without using the git.cmd.Git class, I'll also add that in my experience, the various Python implementations of bits of Git vary in their fidelity to fiddly matters like --no-all (and/or their mapping to pre-2.0 Git, post-2.0 Git, post-2.23 Git, etc.), so if you intend to depend on these behaviors, you should test them.
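As a rough illustration of the first suggestion (letting GitPython invoke the git binary), here is a minimal sketch. It assumes GitPython is installed and git is on the PATH; stage_all_and_commit is a made-up helper name, and the check that skips empty commits is my own addition rather than anything from the question.

```python
def stage_all_and_commit(repo_path, message):
    """Stage every addition, modification, and deletion, then commit."""
    # GitPython is imported lazily so this sketch loads even without it.
    from git import Repo

    repo = Repo(repo_path)
    # repo.git forwards to the git executable; A=True becomes the -A flag.
    repo.git.add(A=True)
    # Assumption: skip the commit when nothing ended up staged.
    if repo.is_dirty(index=True, working_tree=False):
        return repo.index.commit(message)
    return None
```

Calling stage_all_and_commit(absolute_path, "Commit message.") would then replace the repo.index.add("-A") line from the question; pushing is left out here.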

python, project structure to test separate git library

Suppose I have a private git repo(A library repo) which I want to use from my project. (B project repo)
I clone A to my ~/workspace/A
and I work on my project at ~/workspace/B
B 's virtualenv resides in ~/virtualenvs/B
In order to modify A and test the modified version from B, I currently have to:
1. modify A
2. commit and push to A's origin
3. pip install A git+http://A-repository
This is very time consuming. Can I reduce the above steps to just
1. modify A
by placing A somewhere inside project B's virtualenv, and commit and push only after I have tested the modified A from B?
Edit:
I can think of two ways, and I wonder if there's a better one.
1. Add A as a git submodule to B, somewhere under ~/workspace/B that is on the Python module path. I just haven't liked submodules whenever I've used them; they are hard to grasp and manage.
2. Add ~/workspace/parent-of-A/ to the Python path ahead of the virtualenv's Python path, so that when I edit ~/workspace/parent-of-A/A the change is immediately visible to B. The production server and other people who don't modify A could use the pip-installed version in the virtualenv.
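The second idea (putting the parent directory of A ahead of everything else on the Python path) can be sketched in a few lines. This is only an illustration under that assumption; prefer_working_copy is a hypothetical helper, and it has to run before A is imported for the first time.

```python
import importlib
import sys


def prefer_working_copy(parent_dir):
    """Move parent_dir to the front of sys.path so packages inside it win."""
    if parent_dir in sys.path:
        sys.path.remove(parent_dir)
    sys.path.insert(0, parent_dir)
    # Drop cached finder state so the new directory is noticed immediately.
    importlib.invalidate_caches()
```

In B's development entry point this would be called as prefer_working_copy(os.path.expanduser("~/workspace/parent-of-A")) before import A. pip's editable install (pip install -e) is another common way to get a similar effect without touching sys.path by hand.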

Contributing to a repository on GitHub on a new branch

Say someone owns a repository with only one branch, master, hosting code that is compatible with Python 2.7.x. I would like to contribute to that repository with my own changes on a new branch new_branch, to offer a variant of the repository that is compatible with Python 3.
I followed the steps here:
I forked the repository on GitHub on my account
I cloned my fork on my local machine
I created a new branch new_branch locally
I made the relevant changes
I committed and pushed the changes to my own fork on GitHub
I went on the browser to the GitHub page of the official repository, and asked for a pull request
The above worked, but it did a pull request from "my_account:new_branch" to "official_account:master". This is not what I want, since Python 2.7.x and Python 3 are incompatible with each other. What I would like to do is create a PR to a new branch on the official repository (e.g. with the same name "new_branch"). How can I do that? Is this possible at all?

You really don't want to do things this way. But first I'll explain how to do it, then I'll come back to explain why not to.
Using Pull Requests at GitHub has a pretty good overview, in particular the section "Changing the branch range and destination repository." It's easiest if you use a topic branch, and have the upstream owner create a topic branch of the same name; then you just pull down the menu where it says "base: master" and the choice will be right there, and he can just click the "merge" button and have no surprises.
So, why don't you want to do things this way?
First, it doesn't fit the GitHub model. Topic branches that live forever in parallel with the master branch and have multiple forks make things harder to maintain and visualize.
Second, you need both a git URL and an https URL for your code. You need people to be able to share links, pip install from top of tree, just clone the repo instead of cloning and then checking out a different branch, etc. This all means your code has to be on the master branch.
Third, if you want people to be able to install your 3.x version off PyPI, find docs at readthedocs, etc., you need a single project with a single source tree. Most such sites have a single latest version, not a latest version for each Python version, and definitely not multiple variations of the same version. (You could instead completely fork the project and create a separate foo3 project. But it's much easier for people to be able to pip install foo than to have them try that, fail, come to SO and ask why it doesn't work, and get told they probably have Python 3 and need to pip install foo3 instead.)
How do you merge two versions into a single package? The porting docs should have the most up-to-date advice, but briefly: If it's at all possible to create a single codebase that runs on both versions, that's ideal; if not, and if you can't make things work by running 2to3 or 3to2 at install time, create a parallel directory for the 3.x code (e.g., a foo3 alongside foo) and pick the appropriate directory at install time. (You can always start with that and gradually work toward a unified codebase.)
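The "pick the appropriate directory at install time" idea could look roughly like this in a setup script. It is a sketch only: pick_package_dir is a made-up helper, and the directory names foo and foo3 are simply the example names used above.

```python
import sys


def pick_package_dir(version_info=None):
    """Return the source directory matching the running Python major version."""
    if version_info is None:
        version_info = sys.version_info
    # "foo" holds the 2.x sources and "foo3" the 3.x sources (example names).
    return "foo3" if version_info[0] >= 3 else "foo"


# In setup.py this would feed the package_dir mapping, for example:
#     setup(name="foo", packages=["foo"],
#           package_dir={"foo": pick_package_dir()})
```

Either way the installed package is importable as foo, so users never have to know which directory it came from.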

pulling and integrating remote changes with pygit2

I have the following problem. I'm writing a script which searches a folder for repositories, looks up the remotes on the net and pulls all new data into the repository, notifying me about new changes. The main idea is clear. I'm using Python 2.7 on Windows 7 x64, using pygit2 to access the git features. The command line supports the simple command "git pull 'origin'", but the git API is more complicated and I don't see how to do the same thing. Okay, I got this far:
import pygit2
orepository = pygit2.Repository("path/to/repository/.git")
oremote = orepository.remotes[0]
result = oremote.fetch()
This code retrieves the new objects and downloads them into the repository, but doesn't update the master branch or check the new data out. By inspecting the repository with TortoiseGit I see that nothing was checked out; even the new log messages don't appear when showing the log. I still need to use the git pull command to refresh the repository and working copy. Now my question: what do I need to do to achieve all of that with pygit2? I download the changes by fetching them, but what do I need to do then? I want to update the master branch and the working copy too...
Thank you in advance for helping me with my problem.
Best Regards.

Remote.fetch() does not update the files in the workdir because that's very far from its job. If you want to update the current branch and checkout those files, you need to also perform those steps, via Repository.create_reference() or Reference.target= depending on what data you have at the time, and then e.g. Repository.checkout_head() if you did decide to update.
git-pull is a script that performs very many different steps depending on the configuration and flags passed. When you're writing a tool to simulate it over multiple repositories, you need to figure out what it is that you want to do, rather than hoping everything is set up just so that git-pull won't surprise you.
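For the simplest case, where the local branch has no commits of its own and can just be fast-forwarded, the steps described above might be sketched as follows. This assumes pygit2 is installed and that the named branch is the one currently checked out. fast_forward is a hypothetical helper, it uses checkout_tree and Reference.set_target rather than the exact calls named above, and it deliberately does nothing when the histories have diverged.

```python
def fast_forward(repo_path, remote_name="origin", branch="master"):
    """Fetch, then fast-forward the checked-out branch if that is possible."""
    # pygit2 is imported lazily so this sketch loads even without it.
    import pygit2

    repo = pygit2.Repository(repo_path)
    repo.remotes[remote_name].fetch()

    # Where the remote-tracking branch points after the fetch.
    remote_ref = repo.lookup_reference(
        "refs/remotes/%s/%s" % (remote_name, branch)
    )
    remote_id = remote_ref.target

    analysis, _preference = repo.merge_analysis(remote_id)
    if analysis & pygit2.GIT_MERGE_ANALYSIS_UP_TO_DATE:
        return "up-to-date"
    if analysis & pygit2.GIT_MERGE_ANALYSIS_FASTFORWARD:
        # Update the working copy and index first, then move the branch.
        repo.checkout_tree(repo.get(remote_id))
        local_ref = repo.lookup_reference("refs/heads/%s" % branch)
        local_ref.set_target(remote_id)
        return "fast-forwarded"
    # Assumption: a real merge is out of scope; report it and change nothing.
    return "diverged"
```

The returned string can drive the "notify me about new changes" part of the script; handling the diverged case is exactly the kind of decision the answer says you have to make yourself.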

Git: Merge one folder inside a repo

I have an unusual need, and I'm wondering whether Git could fill it.
I want to port my Python package, python_toolbox to Python 3. But I don't like the idea of using 2to3, nor supporting both Python 2 and Python 3 using the same code. (Because it's important for me that my code will be beautiful, and I don't find code written for both Python 2 and Python 3 to be beautiful.)
What I want is to have 2 separate source folders, one for Python 2.x and one for Python 3.x. This will allow me to write each version of the code tailored to the respective major Python version. I want both folders to be in the same repo, and setup.py will choose between them dynamically depending on the version of Python running it. So far so good.
Now, here is where I need help: I want to be able to do merges from my Python 2.x source folder to my Python 3.x source folder. Why? When I develop a feature in the Python 2.x folder, I want to have that feature in the Python 3.x version too. I don't want to copy it manually. I want to merge it into the Python 3.x folder, and I fully expect to have wonderful merge failures where I'll have to use my judgement to decide how to merge features that were implemented for Python 2.x into code that was modified for Python 3.x.
The question is: How can I do that? Those folders are folders inside a Git repo, they're not Git repos themselves. I thought about using Git submodules, which I've never used before, but reading about them online paints a scary picture. (The term "sobmodules" had been thrown around.)
Any other ideas how I could merge between these folders in my Git repo?

I recommend using branches. Dedicate one branch to each version. You may use git checkout --orphan to create a fully independent branch. (That may make merging harder, as git won't be able to find a common ancestor.)
Anyway, if you go with that solution you will be able to merge from one version into the other. You will also be able to clone both versions in one command (as they are in the same repo).
However, to be able to have both versions open at the same time, you will need to clone the repo twice so that you can have two different branches checked out at once.

You could create branches by having the two versions in separate repositories and using each one as a remote of the other. The top-level directory with setup.py and any PyPI meta information, readmes, etc., would also be a repository. The directory layout would look like this:
/root/
    .git/
    setup.py
    read.me
    python2/
        .git/
        source.py
    python3/
        .git/
        source.py
The two sub repositories can be linked so that you can merge between them with e.g.
cd /root/python2
git remote add python3 ../python3
cd /root/python3
git remote add python2 ../python2
Then you can do the usual git fetch, cherry-pick, or even merge between them.
In the main repo, and for releasing things, you use the git submodules feature to coordinate which version of each sub repository you'd like to have checked out, so that you get a consistent view of the project.
There's lots of material on the internet about git's submodules. I'd start with this question on nested repos and work your way through the links and docs.
Here's an explanation of subtree merges that compares them to working with submodules. Basically, subtree merges would combine the idea of having ordinary branches for Py2 and Py3 in one repo (like in the answer by Oznerol256) with the idea of having a hierarchically organized repo.
