Git: Merge one folder inside a repo - python

I have an unusual need, and I'm wondering whether Git could fill it.
I want to port my Python package, python_toolbox, to Python 3. But I don't like the idea of using 2to3, nor of supporting both Python 2 and Python 3 with the same code. (Because it's important to me that my code be beautiful, and I don't find code written for both Python 2 and Python 3 to be beautiful.)
What I want is to have 2 separate source folders, one for Python 2.x and one for Python 3.x. This will allow me to write each version of the code tailored to the respective major Python version. I want both folders to be in the same repo, and setup.py will choose between them dynamically depending on the version of Python running it. So far so good.
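For illustration, a minimal setup.py along those lines might look like this. (This is only a sketch; the source_py2 and source_py3 folder names are placeholders.)
import sys
from setuptools import setup

# Pick the source folder matching the interpreter that runs setup.py.
# (source_py2/ and source_py3/ are hypothetical names for the two folders.)
source_dir = 'source_py3' if sys.version_info[0] >= 3 else 'source_py2'

setup(
    name='python_toolbox',
    version='0.1',
    packages=['python_toolbox'],
    package_dir={'python_toolbox': source_dir + '/python_toolbox'},
)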
Now, here is where I need help: I want to be able to do merges from my Python 2.x source folder to my Python 3.x source folder. Why? When I develop a feature in the Python 2.x folder, I want to have that feature in the Python 3.x version too. I don't want to copy it manually. I want to merge it into the Python 3.x folder, and I fully expect to have wonderful merge fails where I'll have to use my judgement to decide how to merge features that were implemented for Python 2.x into code that was modified for Python 3.x.
The question is: How can I do that? Those folders are folders inside a Git repo; they're not Git repos themselves. I thought about using Git submodules, which I've never used before, but reading about them online paints a scary picture. (The term "sobmodules" has been thrown around.)
Any other ideas how I could merge between these folders in my Git repo?

I recommend using branches. Dedicate one branch to each version. You can use git checkout --orphan to create a fully independent branch. (That may make merging harder, as Git won't be able to find a common ancestor.)
Anyway, if you go with that solution you will be able to merge from one version into the other. You will also be able to clone both versions in one command (as they are in the same repo).
However, to be able to have both versions open at the same time, you will need to clone the repo twice, so that you can have two different branches checked out simultaneously.
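A minimal sketch of that workflow (branch names are illustrative; on newer Git, merging two histories with no common ancestor also requires the --allow-unrelated-histories flag):
git checkout --orphan python3    # start a branch with no common ancestor
git rm -rf .                     # drop the files carried over from the old branch
# ...add the Python 3 sources and commit; later, to pull Python 2 work over:
git checkout python3
git merge master --allow-unrelated-histories    # resolve conflicts by hand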

You could get the effect of branches by keeping the two versions in separate repositories, each using the other one as a remote. The top-level dir with setup.py and any PyPI meta information, readmes, etc. would also be a repository. The directory layout would look like this:
/root/
    .git/
    setup.py
    read.me
    python2/
        .git/
        source.py
    python3/
        .git/
        source.py
The two sub repositories can be linked so that you can merge between them with e.g.
cd /root/python2
git remote add python3 ../python3
cd /root/python3
git remote add python2 ../python2
Then you can do the usual git fetch, cherry-pick, or even merge between them.
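For example (assuming the default branch in each sub-repository is called master):
cd /root/python3
git fetch python2
git merge python2/master    # or: git cherry-pick <commit>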
In the main repo, and for releasing things, you use the Git submodules feature to coordinate which versions of the individual sub-repositories should be checked out, so you get a consistent view of the project.
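One possible way to wire this up, sketched under the assumption that the sub-repositories already live at these paths (in practice the recorded URLs would usually point at hosted copies):
cd /root
git submodule add ./python2 python2
git submodule add ./python3 python3
git commit -m "Record which sub-repo commits belong together"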
There's lots of material on the internet about Git's submodules. I'd start with this question on nested repos and work your way through the links and docs.
Here's an explanation of subtree merges that compares them to working with submodules. Basically, subtree merges would combine the idea of having ordinary branches for Py2 and Py3 (as in the answer by Oznerol256) with the idea of having a hierarchically organized repo.
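A sketch of such a subtree merge (py2 and py3 are hypothetical branch names):
git checkout py3
git merge -s subtree py2    # let Git detect the directory shift
# if the automatic detection fails, name the prefix explicitly:
# git merge -X subtree=python3 py2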

Related

Python - replace specific file in module

When someone installs my GitHub repo, I'd like it to install specific files that override files from another module. There's a public library, but there are a couple of bugs we want to fix, and the creator doesn't want to merge in a pull request.
I could copy the ENTIRE library into our repo, and that does work. But that seems cumbersome for 5 lines of changes in 2 files.
Is there any easy way to replace specific module files with a different version when my github repo gets installed? If I delete anything but the changed files, I get import errors.
Ideally, the user would install the public library, install our project, and the bugfixes would take effect (at least when running scripts from our project).

Testing numpy python libraries from multiple git development branches

I'm trying to develop a few enhancements for the numpy library. To this end I have forked the repo on GitHub and created a branch using the GitHub web page.
Next I ran the following commands:
$ git clone https://github.com/staticd-growthecommons/numpy.git
$ cd numpy/
$ git remote add upstream https://github.com/numpy/numpy.git
$ git branch -a
* master
remotes/origin/HEAD -> origin/master
remotes/origin/choice-unweighted-no-replace
remotes/origin/enable_separate_by_default
remotes/origin/maintenance/1.0.3.x
[....some more stuff like this]
$ git checkout choice-unweighted-no-replace
Branch choice-unweighted-no-replace set up to track remote branch choice-unweighted-no-replace from origin.
Switched to a new branch 'choice-unweighted-no-replace'
$ git branch -a
* choice-unweighted-no-replace
master
remotes/origin/HEAD -> origin/master
remotes/origin/choice-unweighted-no-replace
remotes/origin/enable_separate_by_default
remotes/origin/maintenance/1.0.3.x
OK, here my n00bness begins to shine like a thousand splendid suns. Despite reading all the tutorials I could find, I'm still not sure what I'm supposed to do now.
What I want to achieve is this:
1. I want to add/modify three new algorithms in numpy's random library. Am I correct in assuming that since they are three separate, unrelated enhancements, the correct way to go about this is to make three parallel branches based on master? (And then submit pull requests for each branch so they can be reviewed independently.)
2. Once I have run the commands shown above, do I just go about editing the source files found in the numpy directory? Will the changes automatically be joined to the choice-unweighted-no-replace branch?
3. Can I switch to another branch for a while to work on another feature before I commit changes and push the current branch to the repo?
4. What is the best way to test each of these branches? I couldn't figure out how to use virtualenv with git.
5. Is it possible to import the libraries from two branches into a single Python program? Like import branch1.numpy, branch2.numpy or something like that.
Update: partial answer figured out:
At least for testing numpy, it's fairly trivial: just run ./runtests.py -i from the numpy directory. It builds numpy and opens an IPython shell with PYTHONPATH set. If you then do import numpy, it imports the development branch in that directory.
To test multiple branches, just make copies of the git folder and check out a different branch in each. Then you can open IPython shells for each branch.
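For example, to test a second branch side by side (the clone URL and branch name are taken from the listing above; the directory name is arbitrary):
$ git clone https://github.com/staticd-growthecommons/numpy.git numpy-branch2
$ cd numpy-branch2
$ git checkout enable_separate_by_default
$ ./runtests.py -i    # IPython shell importing this branch's numpy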
First and foremost, I strongly recommend the Pro Git book. It should answer most of the questions you will have later on.
Yes, it is good practice to separate work on different topics in different branches. That way you can make a pull request later that will only cover the code involved in adding/changing this functionality.
Git works with something called the index. Merely changing a file does not automatically save it on a branch; you have to tell Git that you want to save it. To do so you first stage the file, and then make a commit.
git add modifiedfile
git commit -m "A message about my changes"
This will add a new commit to the branch you are currently on. If you want to make a commit on a different branch, you need to switch branches first.
git checkout branchname
If you want to create a new branch and switch to it:
git checkout -b branchname
You can switch between branches at any time, but you should save your work first. You can either make a commit that you will later reset, or use git stash.
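For example, with stash:
git stash                                    # shelve uncommitted changes
git checkout master                          # switch away, do other work
git checkout choice-unweighted-no-replace    # come back
git stash pop                                # restore the shelved changes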
I'm not really familiar with virtualenv, so maybe you should ask that as a separate question.
To import from two branches, you would have two clones of the repository in two different directories: one with the first branch checked out, the other with the second. This way your script will be able to use both libraries.

Contributing to a repository on GitHub on a new branch

Say someone owns a repository with only a master branch, hosting code that is compatible with Python 2.7.x. I would like to contribute my own changes to that repository on a new branch new_branch, to offer a variant of the repository that is compatible with Python 3.
I followed the steps here:
1. I forked the repository on GitHub to my account
2. I cloned my fork onto my local machine
3. I created a new branch new_branch locally
4. I made the relevant changes
5. I committed and pushed the changes to my own fork on GitHub
6. In the browser, I went to the GitHub page of the official repository and asked for a pull request
The above worked, but it did a pull request from "my_account:new_branch" to "official_account:master". This is not what I want, since Python 2.7.x and Python 3 are incompatible with each other. What I would like to do is create a PR to a new branch on the official repository (e.g. with the same name "new_branch"). How can I do that? Is this possible at all?
You really don't want to do things this way. But first I'll explain how to do it, then I'll come back to explain why not to.
Using Pull Requests at GitHub has a pretty good overview, in particular the section "Changing the branch range and destination repository." It's easiest if you use a topic branch, and have the upstream owner create a topic branch of the same name; then you just pull down the menu where it says "base: master" and the choice will be right there, and he can just click the "merge" button and have no surprises.
So, why don't you want to do things this way?
First, it doesn't fit the GitHub model. Topic branches that live forever in parallel with the master branch and have multiple forks make things harder to maintain and visualize.
Second, you need both a git URL and an https URL for your code. You need people to be able to share links, pip install from the top of tree, just clone the repo instead of cloning and then checking out a different branch, etc. This all means your code has to be on the master branch.
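(For instance, pip can install straight from the default branch of a repository; user/project is a placeholder here:)
pip install git+https://github.com/user/project.git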
Third, if you want people to be able to install your 3.x version off PyPI, find docs at readthedocs, etc., you need a single project with a single source tree. Most such sites have a single latest version, not a latest version per Python version, and definitely not multiple variations of the same version. (You could instead completely fork the project and create a separate foo3 project. But it's much easier for people to be able to pip install foo than to have them try that, fail, come to SO and ask why it doesn't work, and get told they probably have Python 3 and need to pip install foo3 instead.)
How do you merge two versions into a single package? The porting docs should have the most up-to-date advice, but briefly: If it's at all possible to create a single codebase that runs on both versions, that's ideal; if not, and if you can't make things work by running 2to3 or 3to2 at install time, create a parallel directory for the 3.x code (e.g., a foo3 alongside foo) and pick the appropriate directory at install time. (You can always start with that and gradually work toward a unified codebase.)
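If you take the 2to3-at-install-time route, setuptools of that era supported it directly; a minimal sketch (note that the use_2to3 flag has since been removed from modern setuptools):
from setuptools import setup

setup(
    name='foo',
    version='1.0',
    packages=['foo'],
    use_2to3=True,   # run 2to3 on the sources when installing under Python 3
)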

Locally modify package from pip

I've locally installed a Python package via pip in a virtualenv. I'd like to modify it (not monkey-patch or subclass, but deeply modify) and keep it in my source control system, referencing it from there without installing it. Maybe later I'd like to package it again, so I'd like to keep all the files needed for creating the package, not only the Python sources.
Should I just copy it into my project folder and uninstall it from the virtualenv?
Two points. One: are the changes you're planning to make useful for anyone else? If so, you might consider cloning the source repo, making your changes, and submitting a PR. Even if it's not immediately merged, you can make use of setup.py to create a local package and install that in your virtualenv.
And two: are you planning to use these changes in just one project, or in many? If it's just for one project, throwing it in your repo and deeply modifying it is probably an OK thing (although you need to confirm you're allowed to do so by the license). If you can foresee using this in multiple projects, you're probably better off creating a repo for it and packaging it via setup.py.
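A minimal sketch of that first option (the URL is a placeholder):
git clone https://github.com/someone/somelib.git
cd somelib
# ...apply your changes...
pip install -e .    # editable install into the active virtualenv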

Organizing Python projects with shared packages

What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like:
trunk/
    libs/
        python/
        utilities/
    projects/
        projA/
        projB/
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that often one project (e.g. projA) will be running while projB needs to be updated. This could cause the working-copy revision to appear mixed to projA at runtime. (The code takes hours to run, and can be used as input for processes that take days to run, hence the strong traceability goal.)
My current workaround is, if necessary, to check out another copy of the trunk to a different location and run from there. But then I need to remember to change my PYTHONPATH to point to the second version of libs/python, not the one in the first tree.
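That is, something like this (the paths and script name are illustrative):
PYTHONPATH=/path/to/second_checkout/libs/python python projects/projA/main.py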
There's not likely to be a perfect answer. But there must be a better way.
Should we be using Subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be moving more toward a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).
A much better solution is not to store all your projects and their shared dependencies in the same repository.
Use one repository for each project, and externals for the shared libraries.
Make use of tags in the shared-library repositories, so consumer projects can pin exactly the version they need via their externals.
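For example, a pinned external might look like this (the tag path is hypothetical; this uses the SVN 1.5+ externals format):
svn propset svn:externals '^/libs/tags/1.2.0/python libs/python' .
svn commit -m "Pin shared python library at tag 1.2.0"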
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.
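A quick sketch (the environment name is arbitrary):
pip install virtualenvwrapper
source virtualenvwrapper.sh    # installed location varies
mkvirtualenv projA-env         # create and activate an isolated environment
workon projA-env               # reactivate it later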
