Is it possible to emulate `git add -A` in GitPython?

I've recently discovered GitPython, and, given that I'm currently trying to create a Python script which pushes to and pulls from Git repositories automatically, I was really excited to try it out.
When committing to a repository using command line Git, I call git add -A, pretty much to the exclusion of all other arguments. I know that you can call git add . instead, or add/remove files by name; I've just never felt the need to use that functionality. (Is that bad practice on my part?) However, I've been trying to put together a GitPython script today, and, despite combing through the API reference, I can't find any straightforward way of emulating the git add -A command.
This is a snippet from my efforts so far:
repo = Repo(absolute_path)
repo.index.add("-A")
repo.index.commit("Commit message.")
repo.remotes.origin.push()
This throws the following error: FileNotFoundError: [Errno 2] No such file or directory: '-A'. If, instead, I try to call repo.index.add(), I get: TypeError: add() missing 1 required positional argument: 'items'. I understand that .add() wants me to specify the files I want to add by name, but the whole point of GitPython, from my point of view, is that it's automated! Having to name the files manually defeats the purpose of the module!
Is it possible to emulate git add -A in GitPython? If so, how?

The API you linked goes to a version of GitPython that supports invoking the Git binaries themselves directly, so you could just have it run git add -A for you.
That aside, git add -A means:
Update the index not only where the working tree has a file matching <pathspec> but also where the index already has an entry. This adds, modifies, and removes index entries to match the working tree.
If no <pathspec> is given when -A option is used, all files in the entire working tree are updated (old versions of Git used to limit the update to the current directory and its subdirectories).
So git add -A is just the same as git add . from the top level of the working tree. If you want the old (pre-2.0) git add -A behavior, run git add . from a lower level of the working tree; to get the 2.0-or-later git add -A behavior, run git add . from the top level of the working tree. But see also --no-all:
This option is primarily to help users who are used to older versions of Git, whose "git add <pathspec>…​" was a synonym for "git add --no-all <pathspec>…​", i.e. ignored removed files.
So, if you want the pre-2.0 behavior, you will also need --no-all.
If you intend to do all of these within GitPython without using the git.cmd.Git class, I'll also add that in my experience, the various Python implementations of bits of Git vary in their fidelity to fiddly matters like --no-all (and/or their mapping to pre-2.0 Git, post-2.0 Git, post-2.23 Git, etc.), so if you intend to depend on these behaviors, you should test them.
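For the first suggestion (letting GitPython run the Git binary for you), a minimal sketch might look like the following. It assumes a GitPython version where repo.git exposes Git commands as methods and passes a single-letter keyword such as A=True through as the -A flag. The helper name add_all_and_commit is made up for illustration.

```python
def add_all_and_commit(repo, message):
    """Stage everything the way `git add -A` does, then commit.

    `repo` is expected to be a GitPython `git.Repo` object.
    The function name is hypothetical; only the two calls inside matter.
    """
    # repo.git runs the real git executable; A=True is turned into "-A".
    repo.git.add(A=True)
    # Commit whatever is now staged in the index.
    return repo.index.commit(message)


# Typical use (requires GitPython and a real repository):
#
#     from git import Repo
#     repo = Repo(absolute_path)
#     add_all_and_commit(repo, "Commit message.")
#     repo.remotes.origin.push()
```

Because the staging step is delegated to Git itself, it behaves however the installed Git's add -A behaves, which sidesteps the fidelity question raised above.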

Removing pycache in git

How can I remove existing and future __pycache__ files from a Git repository on Windows? The commands I found online are not working. For example, when I run "git rm -r --cached __pycache__" I get the error "pathspec '__pycache__' did not match any files".
The __pycache__ folders that you are seeing are not in your current and future Git commits. Because of the way Git works internally—which Git forces you to know, at least if you're going to understand it—understanding this is a bit tricky, even once we get past the "directory / folder confusion" we saw in your comments.
The right place to start, I believe, is at the top. Git isn't about files (or even files-and-folders / files-and-directories). Those new to Git see it as storing files, so they think it's about files, but that's just not true. Or, they note the importance of the ideas behind branches, and think that Git is about branches, and that too is not really true, because people confuse one kind of "branch" (that does matter) with branch names (which don't matter). The first thing to know, then, is that Git is really all about commits.
This means that you really need to know:
what a commit is, and
what a commit does for you
(these two overlap but are not identical). We won't really cover what a commit is here, for space reasons, but let's look at the main thing that one does for you: Each commit stores a full snapshot of every file.
We now need a small digression into files and folders and how Git and your OS differ in terms of how they organize files. Your computer insists that a file has a name like file.ext and lives in a folder or directory—the two terms are interchangeable—such as to, which in turn lives in another folder such as path. This produces path/to/file.ext or, on Windows, path\to\file.ext.
Git, by contrast, has only files, and their names always use forward slashes and include the slashes. The file named path/to/file.ext is literally just the file, with that name. But Git does understand that your computer demands the file-in-folder format, and will convert back and forth as needed. If Git needs to extract a file whose name is some/long/file/name.ext, Git will create folders some, some/long, and so on when it must, all automatically.
The strange side effect of this is that because Git stores only the files, not the folders, Git is unable to store an empty folder. This distinction actually occurs in Git's index aka staging area, which we won't get into in any detail, but it explains the problem whose answers are given in How do I add an empty directory to a Git repository?
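One way to picture this is a toy model (not Git's real data structures) in which a snapshot is just a mapping from slash-separated names to contents, and folders only appear when the files are written out. The function and variable names here are invented for the illustration.

```python
import os
import tempfile


def extract_snapshot(snapshot, work_tree):
    """Write each {"path/to/name": bytes} entry under work_tree."""
    for name, content in snapshot.items():
        # Names always use forward slashes; convert to the OS convention.
        target = os.path.join(work_tree, *name.split("/"))
        # Folders are created only because some file needs them.
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as handle:
            handle.write(content)


snapshot = {
    "README": b"top-level file\n",
    "some/long/file/name.ext": b"nested file\n",
}
work_tree = tempfile.mkdtemp()
extract_snapshot(snapshot, work_tree)
# The folders some/, some/long/ and some/long/file/ now exist, but the
# mapping itself has no way to mention a folder that contains no file.
print(sorted(os.listdir(work_tree)))
```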
In any case, commits in Git store files, using these path names. Each commit has a full copy of every file—but the files' contents are stored in a special, Git-ized, read-only, Git-only format in which the contents are de-duplicated. So if a million commits store one particular version of one particular file, there's really only one copy, shared between all million commits. Git can do this kind of sharing because, unlike regular files on your computer, files stored in a commit, in Git, literally can't be changed.
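The de-duplication can be sketched with a toy content-addressed store. The blob ID computed below follows the scheme Git uses for file contents in SHA-1 repositories (a hash over a short "blob <size>" header plus the bytes). The dictionary-based store and the class name are simplifications invented here, not how Git lays out its object database.

```python
import hashlib


def blob_id(content):
    """Return the SHA-1 object ID Git assigns to a file with these bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """Keeps one copy of each distinct content, keyed by its ID."""

    def __init__(self):
        self.objects = {}

    def store(self, content):
        oid = blob_id(content)
        # Identical content hashes to the same ID, so it is stored once.
        self.objects.setdefault(oid, content)
        return oid


store = ToyObjectStore()
# Three "commits" whose snapshots all contain the same version of a file.
commits = [
    {"path/to/file.ext": store.store(b"same contents\n")} for _ in range(3)
]
print(len(commits), "snapshots,", len(store.objects), "stored copy")
```

Since a stored object's ID is derived from its bytes, changing the bytes would produce a different ID, which is one way to see why stored contents cannot be edited in place.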
Going back to the commits now: each commit contains a full snapshot of every file (that it had when you, or whoever, made the commit). But these files are read-only—they literally can't have their contents replaced, which is what enables that sharing—and only Git itself can even read them. This makes them useless for actually getting any work done. They're fine as archives, but no good for real work.
The solution to this problem is simple (and the same as in almost all other version control systems): when you select some commit to work on / with, Git will extract the files from that commit. This creates ordinary files, in ordinary folders, in an ordinary area in which you can do your work (whether that's ordinary or substandard or exemplary work—that's all up to you, not to Git 😀). What this means is that you do your work in a working tree or work-tree (Git uses these two terms interchangeably). More importantly, it means this: The files you see and work on / with are not in Git. They may have just been extracted by Git, from some commit. But now they're ordinary files and you use them without Git being aware of what you're doing.
Since Git has extracted these files into ordinary folders, you can create new files and/or new folders if you like. When you run Python programs, Python itself will, at various times, create __pycache__ folders and stuff *.pyc and/or *.pyo files into them. Python does this without Git's knowledge or understanding.
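To watch this happen outside of Git, the sketch below byte-compiles a throwaway module and prints where Python put the result. It assumes a standard CPython setup with no custom cache location configured; the module name is arbitrary.

```python
import os
import py_compile
import sys
import tempfile

# Create a throwaway source file in a scratch directory.
scratch = tempfile.mkdtemp()
source = os.path.join(scratch, "example_module.py")
with open(source, "w") as handle:
    handle.write("ANSWER = 42\n")

# Byte-compile it, roughly what the import system does on first import.
cached = py_compile.compile(source, doraise=True)

# The cache file normally lives in a __pycache__ folder, and its name
# carries the interpreter's cache tag (something like "cpython-39").
print(os.path.relpath(cached, scratch))
print(sys.implementation.cache_tag)
```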
Because these files are generated by Python, based on your source, and just used to speed up Python, it's a good idea to avoid putting them into the commits. There's no need to save a permanent snapshot of these files, especially since the format and contents may depend on the specific Python version (e.g., Python 3.7 generates *.cpython-37.pyc files, Python 3.9 generates *.cpython-39.pyc files, and so on). So we tell Git two things:
Don't complain about the existence of these particular untracked files in the working tree.
When I use an en-masse "add everything" operation like git add ., don't add these files to the index / staging-area, so that they won't go into the next commit either.
We generally do this with the (poorly named) .gitignore file. Listing a file name in a .gitignore does not make Git ignore it; instead, it has the effect of doing the two things I listed here.
This uses the Git-specific term untracked file, which has a simple definition that has a complex back-story. An untracked file is simply any file in your working tree that is not currently in Git's index (staging area). Since we're not going to get into a discussion of Git's index here, we have to stop there for now, but the general idea is that we don't allow the __pycache__ files to get into the index, which keeps them untracked, which keeps Git from committing them, which keeps them from getting into Git's index. It's all a bit circular here, and if you accidentally do get these files into Git's index, that's when you need the git rm -r --cached __pycache__ command.
Since that command is failing, it means you don't have the problem this command is meant to solve. That's good!
Well, you don't need __pycache__ files in your Git repositories, and you'd do better to ignore everything related to them by adding __pycache__/ to your .gitignore file.

How to find out if a commit made it into the stable version (TensorFlow)

For this Git issue, I saw that the Git repo updated a file for TensorFlow. Now I want to check whether the changes can be found in my installation.
I am using conda and installed the specific TensorFlow version in an environment. The file should be here: tensorflow/lite/interpreter.h
However, going down the site-packages route ~/anaconda3/envs/AI2.6/lib/python3.6/site-packages/tensorflow/lite/, I cannot find the file.
find | grep interpreter in this folder tree gives me
./python/interpreter.py
./python/interpreter_wrapper
./python/interpreter_wrapper/__init__.py
./python/interpreter_wrapper/__pycache__
./python/interpreter_wrapper/__pycache__/__init__.cpython-36.pyc
./python/interpreter_wrapper/_pywrap_tensorflow_interpreter_wrapper.so
./python/__pycache__/interpreter.cpython-36.pyc
Could you give me a hint where to find the file, or how to check if a specific commit made it into the stable version of TensorFlow?
Thanks
edit: While typing, I got the answer that the version is in the nightly version, however, it would still be interesting to learn how to find out if a commit made it into a stable release. And why I cannot find the file which should be there.
From the git side, the answer to the question is easy, provided:
that you know the commit's hash ID; and
that the question you want answered is "is this specific commit in a repository?"
The reason for this is that Git commit hash IDs are universally unique. If some repository has some commit, it has that hash ID, in that repository and in every other repository. So you just inspect the repository to see if it has that commit, with that hash ID, and you're done.
In practice, since you've scattered this across a wide range of tags (I plucked off the linux one since we're not talking about Linux programming APIs here), this answer isn't useful, not even in the git arena, because commits get copied and modified, and the new-and-improved (or older and worsened, or whatever) version of some commit will have a different hash ID. You often care whether you have some version of some commit, rather than some specific commit.
For this other purpose ("do I have some version of this commit?"), you can sometimes use what Git calls a patch ID. To find the patch ID of some commit, run the commit through the git patch-id program (read the linked documentation for details). Then, run potentially matching commits through git patch-id as well. If they produce the same patch ID, they are equivalent commits, even if they are technically different and therefore have different hash IDs.
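As a rough sketch of that comparison from Python, assuming the git executable is on PATH and the script is pointed at a clone that contains both commits. The helper names are made up here, and the parsing relies on git patch-id printing the patch ID followed by the commit ID on one line.

```python
import subprocess


def parse_patch_id(output):
    """Return the patch ID from `git patch-id` output, or None if empty."""
    fields = output.split()
    # Output looks like "<patch-id> <commit-id>"; it is empty when
    # there was no diff to hash.
    return fields[0] if fields else None


def patch_id_of(commit, repo_dir="."):
    """Pipe `git show <commit>` into `git patch-id --stable`."""
    shown = subprocess.run(
        ["git", "-C", repo_dir, "show", commit],
        check=True, capture_output=True,
    ).stdout
    result = subprocess.run(
        ["git", "-C", repo_dir, "patch-id", "--stable"],
        input=shown, check=True, capture_output=True,
    ).stdout
    return parse_patch_id(result.decode("ascii", "replace"))


def same_change(commit_a, commit_b, repo_dir="."):
    """True when both commits have a patch ID and the IDs are equal."""
    first = patch_id_of(commit_a, repo_dir)
    return first is not None and first == patch_id_of(commit_b, repo_dir)
```

Calling same_change with the upstream commit's hash and a candidate commit's hash then answers "do I have some version of this commit?" for that pair.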
A more general, more useful, and more portable way to find out if you have some particular feature requires effort on the part of the maintainers: changelogs, feature tests, and documentation. If something brings new behavior, or new files, or whatever, it should be documented, and in some cases you might want to have, in your programming language, a way to test for the existence of this feature. In Python in particular, the core documentation has, for instance, things like this:
subprocess.run(args, *, stdin=None, ...
     ...
New in version 3.5.
Changed in version 3.6: Added encoding and errors parameters
...
You can also use Python constructs like:
try:
    import what.ever
except ImportError:
    ...  # do whatever you need here
and similar tricks, and import sys and inspect sys.version and so on.
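Spelled out a little more, using the subprocess entries quoted above as the features being probed. This is only a sketch: optional_import is a made-up helper name, and inspecting Popen's signature for encoding is one possible check rather than an officially recommended one.

```python
import importlib
import inspect
import subprocess
import sys


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Test for the feature itself rather than trusting a version number.
has_run = hasattr(subprocess, "run")  # documented as new in 3.5
popen_params = inspect.signature(subprocess.Popen.__init__).parameters
has_encoding = "encoding" in popen_params  # documented as added in 3.6

# Or fall back to comparing the interpreter version.
new_enough = sys.version_info >= (3, 6)

print(has_run, has_encoding, new_enough)
print(optional_import("what.ever"))  # None unless such a package exists
```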
The file should be here: tensorflow/lite/interpreter.h
The OS-specific methods for testing the existence of a file in a path depend on the OS, but when using GitHub, you can construct the URL from the file's name, knowing the systematic scheme that the GitHub folks use. For instance, https://github.com/git/git/blob/seen/Makefile is the URL to view the version of Makefile at the tip commit of branch seen in the Git repository mirror for Git itself on GitHub.
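That scheme is simple enough to capture in a small helper. The function name below is invented for illustration, and the second call assumes the TensorFlow repository lives at tensorflow/tensorflow with a branch named master.

```python
from urllib.parse import quote


def github_blob_url(owner, repo, ref, path):
    """Build the web URL for `path` as of branch, tag, or commit `ref`."""
    return "https://github.com/{}/{}/blob/{}/{}".format(
        quote(owner), quote(repo), quote(ref), quote(path)
    )


print(github_blob_url("git", "git", "seen", "Makefile"))
# Assumed owner/repo/branch names for the file from the question.
print(github_blob_url("tensorflow", "tensorflow", "master",
                      "tensorflow/lite/interpreter.h"))
```

Substituting a release tag for the branch name should show the file as of that release, which is one way to check whether a change reached a given version.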

python, project structure to test separate git library

Suppose I have a private git repo(A library repo) which I want to use from my project. (B project repo)
I clone A to my ~/workspace/A
and I work on my project at ~/workspace/B
B 's virtualenv resides in ~/virtualenvs/B
In order to modify A and test the modified from B,
modify A
commit push to A's origin
pip install A git+http://A-repository
Which is very time consuming.. Can I reduce the above steps to
modify A
by placing A inside somewhere in project B's virtualenv?
and commit & push only after I test the modified A from B?
** Edit
I could think of two ways and wonder if there's better way.
add A as a git submodule to B somewhere (which is in python module path) under ~/workspace/B
: I just didn't like submodule whenever I used it.. hard to grasp and manage them.
add ~/workspace/parent-of-A/ to python-path before virtualenv python-path
So when I edit ~/workspace/parent-of-A/A, it is readily seen by B.
And production server and other people who don't modify A could use pip install-ed version in virtualenv.
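The second idea could be sketched as below, for example in a development-only startup file of project B. It is just an illustration of putting a directory ahead of the virtualenv's site-packages on the import path; the helper name is invented, and the path in the usage comment is the one from the question.

```python
import importlib
import os
import sys


def prefer_local_checkout(parent_dir):
    """Put parent_dir first on sys.path so packages inside it win."""
    parent_dir = os.path.abspath(os.path.expanduser(parent_dir))
    # Avoid duplicates if this is called more than once.
    if parent_dir in sys.path:
        sys.path.remove(parent_dir)
    sys.path.insert(0, parent_dir)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return parent_dir


# Development machines only; must run before the first `import A`:
#
#     prefer_local_checkout("~/workspace/parent-of-A")
#     import A   # now resolved from ~/workspace/parent-of-A/A
```

Machines that never call the helper keep importing the pip-installed copy from the virtualenv. A module that was already imported before the call stays as it is, so the call has to come first.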

Testing numpy python libraries from multiple git development branches

I'm trying to develop a few enhancements for the numpy library. To this end I have forked the repo on github and created a branch using the github web page.
Next I ran the following commands:
$ git clone https://github.com/staticd-growthecommons/numpy.git
$ cd numpy/
$ git remote add https://github.com/numpy/numpy.git
$ git remote add upstream https://github.com/numpy/numpy.git
$ git branch -a
* master
remotes/origin/HEAD -> origin/master
remotes/origin/choice-unweighted-no-replace
remotes/origin/enable_separate_by_default
remotes/origin/maintenance/1.0.3.x
[....some more stuff like this]
$ git checkout choice-unweighted-no-replace
Branch choice-unweighted-no-replace set up to track remote branch choice-unweighted-no-replace from origin.
Switched to a new branch 'choice-unweighted-no-replace'
$ git branch -a
* choice-unweighted-no-replace
master
remotes/origin/HEAD -> origin/master
remotes/origin/choice-unweighted-no-replace
remotes/origin/enable_separate_by_default
remotes/origin/maintenance/1.0.3.x
OK here my n00bness begins to shine like a thousand splendid suns. Despite reading all the tutorials I could find I'm still not sure what I'm supposed to do now.
What I want to achieve is this:
I want to add/modify three new algorithms to the random library in numpy. Am I correct in assuming that since they are three separate unrelated enhancements, the correct way to go about this is to make three parallel branches based on the master? (and then submit pull requests for each branch so they can be reviewed independently)
Once I have run the commands shown above, do I just go about editing the source files found in the numpy directory? Will they automatically be joined to the choice-unweighted-no-replace branch?
Can I switch to another branch for a while to work on another feature before I commit changes and push the current branch to the repo?
What is the best way to test each of these branches? I couldn't figure out how to use virtualenv with git.
Is it possible to import the libraries from two branches into a single python program? like import branch1.numpy, branch2.numpy or something like that
Update: partial answer figured out:
At least for testing numpy, it's fairly trivial: just run ./runtests.py -i from the numpy directory. It builds numpy and opens a ipython shell with the PYTHONPATH set. If you now do import numpy it imports the development branch in that directory.
To test multiple branches, just make copies of the git folder and checkout a different branch in each. Then you can open IPython shells for each branch.
First and foremost, I strongly recommend the Pro Git book. It should answer most of the questions you will have later on.
Yes, it is good practice to separate work on different topics in different branches. That way you can make a pull request later that will only cover the code involved in adding/changing this functionality.
Git works with something called an index. Merely changing a file does not automatically save it on a branch; you have to tell Git that you want to save it. To do so, you first stage the file and then make a commit.
git add modifiedfile
git commit -m "A message about my changes"
This will add a new commit to the branch you are currently on. If you want to make a commit on a different branch, you need to switch branches first.
git checkout branchname
If you want to create a new branch and switch to it:
git checkout -b branchname
You can switch between branches any time, but you should save your work first. You can make a commit which you will later reset, or stash.
Not really familiar with virtualenv, so maybe you should make a separate question.
To do this, you would have 2 repositories in 2 different directories. One will have the first branch checked out and the other would have the second one. This way your script will be able to use both libraries.
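Since both checkouts provide a package with the same name, only one of them can be imported under that name in a single process. One hedged way to compare them from one script is to run a snippet in a separate interpreter per checkout, as sketched below. The helper name is invented, and the sketch assumes each checkout is importable straight from its directory (for numpy that means it has been built in place).

```python
import os
import subprocess
import sys


def run_with_checkout(checkout_dir, code):
    """Run `code` in a fresh interpreter with checkout_dir first on the path."""
    env = dict(os.environ)
    # Prepend the checkout so it shadows any installed copy.
    env["PYTHONPATH"] = os.pathsep.join(
        part for part in (checkout_dir, env.get("PYTHONPATH")) if part
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Hypothetical directories, one clone per branch:
#
#     snippet = "import numpy; print(numpy.__version__)"
#     print(run_with_checkout("/path/to/numpy-branch1", snippet))
#     print(run_with_checkout("/path/to/numpy-branch2", snippet))
```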

Git: Merge one folder inside a repo

I have an unusual need, and I'm wondering whether Git could fill it.
I want to port my Python package, python_toolbox to Python 3. But I don't like the idea of using 2to3, nor supporting both Python 2 and Python 3 using the same code. (Because it's important for me that my code will be beautiful, and I don't find code written for both Python 2 and Python 3 to be beautiful.)
What I want is to have 2 separate source folders, one for Python 2.x and one for Python 3.x. This will allow me to write each version of the code tailored to the respective major Python version. I want both folders to be in the same repo, and setup.py will choose between them dynamically depending on the version of Python running it. So far so good.
Now, here is where I need help: I want to be able to do merges from my Python 2.x source folder to my Python 3.x source folder. Why? When I develop a feature in the Python 2.x folder, I want to have that feature in the Python 3.x version too. I don't want to copy it manually. I want to merge it into the Python 3.x folder, and I fully expect to have wonderful merge fails where I'll have to use my judgement to decide how to merge features that were implemented for Python 2.x into code that was modified for Python 3.x.
The question is: How can I do that? Those folders are folders inside a Git repo, they're not Git repos themselves. I thought about using Git submodules, which I've never used before, but reading about them online paints a scary picture. (The term "sobmodules" had been thrown around.)
Any other ideas how I could merge between these folders in my Git repo?
I recommend using branches. Dedicate a branch to each version. You may use git checkout --orphan to create a fully independent branch. (That may make merging harder, as Git won't be able to find a common ancestor.)
Anyway, if you go with that solution you will be able to merge from one version into another. You will also be able to clone both versions in one command (as they are in the same repo).
However, to be able to have both versions open at the same time, you will need to clone the repo twice so that a different branch can be checked out in each clone.
You could create branches by having the two versions in separate repositories and using each as a remote of the other. The top-level directory with setup.py and any PyPI meta information, READMEs, etc., would also be a repository. The directory layout would look like this:
/root/
    .git/
    setup.py
    read.me
    python2/
        .git/
        source.py
    python3/
        .git/
        source.py
The two sub repositories can be linked so that you can merge between them with e.g.
cd /root/python2
git remote add python3 ../python3
cd /root/python3
git remote add python2 ../python2
Then you can do the usual git fetch, cherry-pick, or even merge between them.
In the main repo, and for releasing things, you use the Git submodules feature to coordinate which version of each sub-repository you'd like to have checked out, so that you have a consistent view of the project.
There's lots of material on the internet about Git's submodules. I'd start with this question on nested repos and work your way through the links and docs.
Here's an explanation of subtree merges that compares them to working with submodules. Basically, subtree merges would combine the idea of having ordinary branches for Py2 and Py3 (like in the answer by Oznerol256) in one repo, with the idea of having a hierarchically organized repo.
