GitPython How to clone all branches? - python

Working with GitPython to create a complete backup of a repository, I can't seem to figure out how to get all the branches without having to check them out manually.
Repo.clone_from(repo, folder)
I'm using this line in a loop to copy all the files from the repos. I found a Bash script which accomplishes the same task, but in Bash.

I recently stumbled upon this thread but did not find any solution, so I brewed my own. It is neither optimal nor pretty, but at least it works:
from git import Repo

# Clone the master branch
repo = Repo.clone_from('https://github.com/user/stuff.git', 'work_dir', branch='master')
# Check out the other branches as needed and set them up to track the remote
for b in repo.remote().fetch():
    repo.git.checkout('-B', b.name.split('/')[1], b.name)
Explanation:
Clone the master branch
Fetch the remote branch names
Checkout branches and create them as needed
PS: I don't think GitPython has a built-in method to do this with clone_from.
UPDATE:
The no_single_branch=True option in GitPython is equivalent to --no-single-branch in the git CLI (other 'valueless' arguments can also be supplied with a True value):
repo = Repo.clone_from('https://github.com/user/stuff.git', 'work_dir', branch='master', no_single_branch=True)

for b in Repo.clone_from(repo, folder).remotes[0].fetch():
    print(b.name)
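Putting the pieces together, here is a self-contained sketch of the same recipe using the plain git CLI through subprocess (so it runs even without GitPython installed). The clone_all_branches helper, the run helper, and the throwaway local repository are my own illustration, not part of the original answers:

```python
import os
import subprocess
import tempfile

def run(args, cwd=None):
    """Run a git command, raise on failure, return its stdout as text."""
    return subprocess.run(args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

def clone_all_branches(url, dest):
    """Clone `url` into `dest` and create a local branch per remote branch."""
    run(['git', 'clone', '--no-single-branch', url, dest])
    created = []
    for line in run(['git', 'branch', '-r'], cwd=dest).splitlines():
        name = line.strip()
        if '->' in name:                   # skip the symbolic origin/HEAD entry
            continue
        local = name.split('/', 1)[1]      # 'origin/feature' -> 'feature'
        run(['git', 'checkout', '-B', local, name], cwd=dest)
        created.append(local)
    return sorted(created)

# Demo against a throwaway local repository, so no network is needed
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'src')
run(['git', 'init', src])
run(['git', 'checkout', '-B', 'master'], cwd=src)
run(['git', '-c', 'user.email=a@b', '-c', 'user.name=a',
     'commit', '--allow-empty', '-m', 'init'], cwd=src)
run(['git', 'branch', 'feature'], cwd=src)

branches = clone_all_branches(src, os.path.join(tmp, 'clone'))
print(branches)    # -> ['feature', 'master']
```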

Related

Finding first and last commit for every subfolder in directory

I need to loop over every folder in a directory and find the user responsible for the first and last commit in each. Is there any smart way to do this in Git Bash? I tried looking into this with the subprocess module in Python, using it to loop through the folders, but I'm not sure that is a good approach.
What I have tried:
git log -- path/to/folder: this just lists all commits to that subfolder, but I wish to filter only the first and last commit. I also wish to loop through all folders in the directory.
The replies in this Stack Overflow thread: they didn't seem to work for me (either printing nothing, or giving an error).
Assuming you are interested in the current branch only, you can get the first commit via Git Bash with
git rev-list HEAD -- path/to/folder | tail -1
and the last commit with
git rev-list HEAD -- path/to/folder | head -1
git rev-list is similar to git log, but it is a "plumbing" command. "Plumbing" commands are a bit less user-friendly than "porcelain" commands like git log, but they are guaranteed to behave consistently regardless of your personal settings, whereas "porcelain" commands may produce different output depending on your config. Because of this, it's usually a good idea to use "plumbing" commands when writing scripts and programs.
git rev-list returns only the commit hash by default, but you can use --pretty/--format options similar to git log.
head and tail take a longer input—in this case, the entire list of commits for a path—and return only the first/last n lines, where n is whatever number you give as the parameter. git log and git rev-list show the most recent commit first, so you need tail to get the first commit and head to get the last.
You could also get the last commit using
git rev-list HEAD -1 -- path/to/folder
without piping to head. However, you cannot get the first commit using Git's built-in commit-limiting options, because e.g.
git rev-list HEAD --reverse -1 -- path/to/folder
applies the -1 limiter first, returning only the last commit, before applying --reverse.
Finally, it's worth noting that Git doesn't truly track directories, only files. If you create a folder with no files in it, it's not possible to commit that folder, and if you delete all the files within a folder, then as far as Git is concerned that folder doesn't exist anymore. The upshot is: these commands will get you the first and last commits that touch any file within the directory (and its subdirectories) as opposed to the directory itself. This distinction may or may not be important for your scenario.
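To tie the answer together, here is a minimal sketch of the rev-list approach driven from Python via subprocess; the first_and_last_commit helper and the throwaway demo repository are assumptions for illustration, not part of the original answer:

```python
import os
import subprocess
import tempfile

def run(args, cwd):
    return subprocess.run(args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

def first_and_last_commit(repo, folder):
    """Hashes of the first and last commit touching `folder` on HEAD."""
    # git rev-list prints newest first, so the first commit is the last line
    hashes = run(['git', 'rev-list', 'HEAD', '--', folder], cwd=repo).split()
    return hashes[-1], hashes[0]

# Throwaway repo with two commits touching sub/
repo = tempfile.mkdtemp()
run(['git', 'init'], cwd=repo)
os.makedirs(os.path.join(repo, 'sub'))
for i in range(2):
    with open(os.path.join(repo, 'sub', 'f.txt'), 'w') as f:
        f.write(f'revision {i}\n')
    run(['git', 'add', 'sub'], cwd=repo)
    run(['git', '-c', 'user.email=a@b', '-c', 'user.name=a',
         'commit', '-m', f'commit {i}'], cwd=repo)

first, last = first_and_last_commit(repo, 'sub')
print(first != last)    # -> True: two distinct commits touched sub/
```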
I solved my issue with subprocess in the end:
import subprocess
import os

dir_path = os.path.normpath('C:/folder_path')
for f in os.listdir(dir_path):
    subpath = os.path.join(dir_path, f)
    subprocess_args = ['git', 'log', "--pretty=format:{'author': '%aN', 'date': '%as', 'email': '%ce'}", subpath]
    # git log prints newest first: commits[0] is the last commit, commits[-1] the first
    commits = subprocess.check_output(subprocess_args).decode().split('\n')
    print(f'{f} -- first: {commits[-1]}, last: {commits[0]}')

Using python to stash git changes, switch branches, commit files, switch back and undo stash

I am about to automate adding a large number of files to a specific branch of my git repository. I want to be certain I'm not about to cause major problems for myself.
The issue is I have a code base with which I run several hundreds of experiments. I want the results to be automatically stored to their own branch, while leaving the master branch unaffected (i.e. the master branch will NOT track experimental results). I am not as familiar with the stash command as I'd like to be, and I want to be certain I'm using it correctly.
import subprocess
from git import Repo
# Stash changes and switch to result_branch
subprocess.run(["git", "stash"])
subprocess.run(["git", "checkout", "result_branch"], check=True)
#add_results_to_repo() -- Calls method that finds result files and uses Repo.git.add to add them to repo
#git_commit() -- Calls method that uses Repo.git.commit to commit branches
# Return to master branch and undo stash
subprocess.run(["git", "checkout", "master"], check=True)
subprocess.run(["git", "stash", "pop"], check=True)
I use subprocess to switch branches, because I had trouble using Repo. I use subprocess to stash, because I'm lazy and I'm familiar with subprocess.run. Perhaps:
repo.git.stash() # To create stash, and
repo.git.stash('pop') # To restore stash??
Is the approach I'm taking a valid one, or do I risk causing all sorts of repository problems for myself?
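For what it's worth, the stash / checkout / commit / checkout / pop sequence can be exercised end to end against a throwaway repository. This sketch uses the plain git CLI via subprocess with check=True everywhere (so any failed step raises instead of silently continuing); the git and write helpers and the file names are hypothetical:

```python
import os
import subprocess
import tempfile

def git(repo, *args):
    """Run git in `repo` with a throwaway identity; raise on any failure."""
    return subprocess.run(['git', '-c', 'user.email=a@b', '-c', 'user.name=a',
                           *args], cwd=repo, check=True,
                          capture_output=True, text=True).stdout

def write(repo, name, text, mode='w'):
    with open(os.path.join(repo, name), mode) as f:
        f.write(text)

# Throwaway repo with a master branch and a result_branch
repo = tempfile.mkdtemp()
git(repo, 'init')
git(repo, 'checkout', '-B', 'master')
write(repo, 'code.py', 'print("v1")\n')
git(repo, 'add', '.')
git(repo, 'commit', '-m', 'initial')
git(repo, 'branch', 'result_branch')

# Uncommitted work on master that must survive the round trip
write(repo, 'code.py', '# WIP\n', mode='a')

git(repo, 'stash')                      # note: a no-op if the tree is clean,
                                        # which would make 'stash pop' fail
git(repo, 'checkout', 'result_branch')
write(repo, 'results.txt', 'experiment output\n')
git(repo, 'add', 'results.txt')
git(repo, 'commit', '-m', 'add results')
git(repo, 'checkout', 'master')
git(repo, 'stash', 'pop')               # restore the WIP edit

with open(os.path.join(repo, 'code.py')) as f:
    wip_restored = '# WIP' in f.read()
print(wip_restored)    # -> True
```

One caveat worth noting: `git stash` exits 0 even when there is nothing to stash, so in a real automation script you would want to check its output before relying on a later `stash pop`.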

call python code of different git branch other than the current repository without switching branch

So, basically I have 2 versions of a project, and for some users I want to use the latest version while for others I want to use the older version. Both have the same file names, and multiple users will use them simultaneously. To accomplish this, I want to call a function from a different git branch without actually switching the branch.
Is there a way to do so?
for eg., when my current branch is v1 and the other branch is v2; depending on the value of variable flag, call the function
if flag == 1:
    # import function f1() from branch v2
    return f1()
else:
    # use current branch v1
Without commenting on why you need to do that, you can simply check out your repo twice: once for branch1 and once for branch2 (without cloning twice).
See "git working on two branches simultaneously".
You can then make your script aware of its current path (/path/to/branch1) and of the relative path to the other branch (../branch2/...).
You must have both versions of the code present / accessible in order to invoke both versions of the code dynamically.
The by-far-simplest way to accomplish this is to have both versions of the code present in different locations, as in VonC's answer.
Since Python is what it is, though, you could dynamically extract specific versions of specific source files, compile them on the fly (using dynamic imports and temporary files, or exec and internal strings), and hence run code that does not show up in casual perusal of the program source. I do not encourage this approach: it is difficult (though not very difficult) and error-prone, tends towards security holes, and is overall a terrible way to work unless you're writing something like a Python debugger or IDE. But if this is what you want to do, you simply decompose the problem into:
examine and/or extract specific files from specific commits (git show, git cat-file -p, etc.), and
dynamically load or execute code from file in file system or from string in memory.
The first is a Git programming exercise (and is pretty trivial, git show 1234567:foo.py or git show branch:foo.py: you can redirect the output to a file using either shell redirection or Python's subprocess module), and when done with files, the second is a Python programming exercise of moderate difficulty: see the documentation, paying particularly close attention to importlib.
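As a concrete sketch of those two steps (extract a file from another branch with git show, then load it dynamically via importlib), here is a self-contained example against a throwaway repository; the branch names, module name, and git helper are hypothetical, and the usual caveats about executing dynamically extracted code apply:

```python
import importlib.util
import os
import subprocess
import tempfile

def git(repo, *args):
    return subprocess.run(['git', '-c', 'user.email=a@b', '-c', 'user.name=a',
                           *args], cwd=repo, check=True,
                          capture_output=True, text=True).stdout

# Throwaway repo: mod.py differs between branches v1 and v2
repo = tempfile.mkdtemp()
git(repo, 'init')
git(repo, 'checkout', '-B', 'v1')
with open(os.path.join(repo, 'mod.py'), 'w') as f:
    f.write('def f1():\n    return "version 1"\n')
git(repo, 'add', '.')
git(repo, 'commit', '-m', 'v1')
git(repo, 'checkout', '-b', 'v2')
with open(os.path.join(repo, 'mod.py'), 'w') as f:
    f.write('def f1():\n    return "version 2"\n')
git(repo, 'add', '.')
git(repo, 'commit', '-m', 'v2')
git(repo, 'checkout', 'v1')            # stay on v1 the whole time

# Step 1: extract mod.py as it exists on v2, without switching branches
source = git(repo, 'show', 'v2:mod.py')
path = os.path.join(tempfile.mkdtemp(), 'mod_v2.py')
with open(path, 'w') as f:
    f.write(source)

# Step 2: load the extracted file as a module and call its function
spec = importlib.util.spec_from_file_location('mod_v2', path)
mod_v2 = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod_v2)
print(mod_v2.f1())    # -> version 2
```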

How to pull with GitPython?

I am using GitPython to clone a repository from a Gitlab server.
git.Repo.clone_from(gitlab_ssh_URL, local_path)
Later I have another script that tries to update this repo.
try:
    my_repo = git.Repo(local_path)
    my_repo.remotes.origin.pull()
except (git.exc.InvalidGitRepositoryError, git.exc.NoSuchPathError):
    print("Invalid repository: {}".format(local_path))
This works great, except if in between I check out a tag like this:
tag_id = choose_tag()  # Return the position of an existing tag in my_repo.tags
my_repo.head.reference = my_repo.tags[tag_id]
my_repo.head.reset(index=True, working_tree=True)
In this case I get a GitCommandError when pulling:
git.exc.GitCommandError: 'git pull -v origin' returned with exit code 1
I read the documentation twice already and I don't see what the problem is, especially since if I try to pull this repo with a dedicated tool like SourceTree, it works without error or warning.
I don't understand how checking out a tagged version, even with a detached HEAD, prevents me from pulling.
What should I do to pull in this case?
What is happening here, and what am I missing?
Edit: as advised, I tried to look at exception.stdout and exception.stderr, and there is nothing useful there (b'' and None, respectively). That's why I'm having a hard time understanding what's wrong.
I think it's a good idea to learn more about what is happening first (question 2: what is going on?), and that should guide you to the answer to question 1 (how do I fix this?).
To know more about what went wrong you can print out stdout and stderr from the exception. Git normally prints error details to the console, so something should be in stdout or stderr.
try:
    git.Repo.clone_from(gitlab_ssh_URL, local_path)
except git.GitCommandError as exception:
    print(exception)
    if exception.stdout:
        print('!! stdout was:')
        print(exception.stdout)
    if exception.stderr:
        print('!! stderr was:')
        print(exception.stderr)
As a side note, I myself have had issues a few times when I performed many operations on the git.Repo object before using it to interact with the back-end (i.e. Git itself). In my opinion there sometimes seem to be data caching issues on the GitPython side, and a lack of synchronisation between the data in the repository (the .git directory) and the data structures in the git.Repo object.
EDIT:
Ok, the problem seems to be with pulling on a detached HEAD, which is probably not what you want to do anyway.
Still, you can work around your problem. Since from the detached HEAD you would do git checkout master only in order to do git pull and then go back to the detached HEAD, you can skip pulling and instead use git fetch <remote> <source>:<destination>, like this: git fetch origin master:master. This will fetch the remote and merge your local master branch with its tracking branch without checking it out, so you can remain in the detached HEAD state all along without any problems. See this SO answer for more details on this unconventional use of the fetch command: https://stackoverflow.com/a/23941734/4973698
With GitPython, the code could look something like this:
my_repo = git.Repo(local_path)
tag_id = choose_tag()  # Return the position of an existing tag in my_repo.tags
my_repo.head.reference = my_repo.tags[tag_id]
my_repo.head.reset(index=True, working_tree=True)
fetch_info = my_repo.remotes.origin.fetch('master:master')
for info in fetch_info:
    print('{} {} {}'.format(info.ref, info.old_commit, info.flags))
and it would print something like this:
master 11249124f123a394132523513 64
... so flags equals 64. What does that mean? When you do print(git.FetchInfo.FAST_FORWARD) the result is 64, which means the fetch was of the fast-forward type: your local branch was successfully merged with the remote tracking branch, i.e. you have effectively executed git pull origin master without checking out master.
Important note: this kind of fetch works only if your branch can be merged with the remote branch using a fast-forward merge.
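The same fetch-into-a-local-branch trick can be demonstrated with the plain git CLI against a throwaway local "remote". This sketch (with hypothetical paths and a hypothetical git helper) stays on a detached HEAD the whole time and still fast-forwards the local master:

```python
import os
import subprocess
import tempfile

def git(repo, *args):
    return subprocess.run(['git', '-c', 'user.email=a@b', '-c', 'user.name=a',
                           *args], cwd=repo, check=True,
                          capture_output=True, text=True).stdout.strip()

tmp = tempfile.mkdtemp()

# A local "remote" with one commit on master
origin = os.path.join(tmp, 'origin')
os.makedirs(origin)
git(origin, 'init')
git(origin, 'checkout', '-B', 'master')
git(origin, 'commit', '--allow-empty', '-m', 'c1')

# Clone it, then detach HEAD (as checking out a tag would)
clone = os.path.join(tmp, 'clone')
subprocess.run(['git', 'clone', origin, clone],
               check=True, capture_output=True)
git(clone, 'checkout', '--detach', 'HEAD')

# A new commit appears upstream
git(origin, 'commit', '--allow-empty', '-m', 'c2')

# Fast-forward the local master without ever checking it out
git(clone, 'fetch', 'origin', 'master:master')

local = git(clone, 'rev-parse', 'master')
upstream = git(origin, 'rev-parse', 'master')
print(local == upstream)    # -> True, and HEAD is still detached
```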
The answer is that trying to pull onto a detached HEAD is not a good idea, even if some smart clients handle this case nicely.
So my solution in this case is to check out the latest version on my branch (master), pull, then check out the desired tagged version again.
It is a pity, though, that GitPython gives such a poor error message.

Using GitPython, how do I do git submodule update --init

My code so far is working by doing the following, but I'd like to get rid of the subprocess.call() stuff:
import os
import git
from subprocess import call

repo = git.Repo(repo_path)
repo.remotes.origin.fetch(prune=True)
repo.head.reset(commit='origin/master', index=True, working_tree=True)

# I don't know how to do this using GitPython yet.
os.chdir(repo_path)
call(['git', 'submodule', 'update', '--init'])
My short answer: it's convenient and simple.
Full answer follows. Suppose you have your repo variable:
repo = git.Repo(repo_path)
Then, simply do:
for submodule in repo.submodules:
    submodule.update(init=True)
And via submodule.module() (which is of type git.Repo) you can do all the things with your submodule that you do with your ordinary repo, like this:
sub_repo = submodule.module()
sub_repo.git.checkout('devel')
sub_repo.git.remote('maybeorigin').fetch()
I use this kind of thing in my own porcelain, built on top of Git's porcelain, to manage some projects.
Also, to do it more directly, you can, instead of using call() or subprocess, just do this:
repo = git.Repo(repo_path)
output = repo.git.submodule('update', '--init')
print(output)
You can print it because the method returns the output that you usually get by running git submodule update --init (obviously the print() part depends on your Python version).
Short answer: You can’t.
Full answer: You can’t, and there is also no point. GitPython is not a complete implementation of Git. It just provides a high-level interface to some common operations. While a few operations are implemented directly in Python, a lot of calls actually use the Git command-line interface to process stuff.
Your fetch line, for example, does this. Under the hood, a trick is used to make such calls look like Python even though they invoke the Git executable to do the work, using subprocess as well.
So you could either figure out how the git command interface that GitPython offers supports those calls (you can access that command handler via repo.git), or just continue using the “boring” subprocess calls directly.
