gitpython: determine the remote associated to a (tracking) branch - python

I would like to determine the remote branch, that is associated to the current (tracking branch)
The solution, that I found is working, but it feels strange, that I have to parse the configuration to achieve what I want.
Is there any more elegant solution?
repo = git.Repo(path)
branch = repo.active_branch
cfg = branch.config_reader().config
# hand crafting the section name in next line just seems clumsy
remote= cfg.get(f'branch "{branch.name}"', "remote")

Perhaps git.Head.tracking_branch() and git.Reference.remote_name can give you what you're looking for?
e.g.
repo = git.Repo(path)
branch = repo.active_branch
remote_name = branch.tracking_branch().remote_name

Related

Delete Specific branch with Python-Gitlab module

I am extreme beginner in writing python scripts as I am learning it currently.
I am writing a code where I am going to extract the branches which I have which are named something like tobedeleted_branch1 , tobedeleted_branch2 etc with the help of python-gitlab module.
With so much of research and everything, I was able to extract the names of the branch with the script I have given bellow, but now what I want is, I need to delete the branches which are getting printed.
So I had a plan that, I will go ahead and store the print output in a variable and will delete them in a go, but I am still not able to store them in a variable.
Once I store the 'n' number of branches in that variable, I want to delete them.
I went through the documentation but I couldn't figure out how can I make use of it in python script.
Module: https://python-gitlab.readthedocs.io/en/stable/index.html
Delete branch with help of module REF: https://python-gitlab.readthedocs.io/en/stable/gl_objects/branches.html#branches
Any help regarding this is highly appreciated.
import gitlab, os
TOKEN = "MYTOKEN"
GITLAB_HOST = 'MYINSTANCE'
gl = gitlab.Gitlab(GITLAB_HOST, private_token=TOKEN)
# set gitlab group id
group_id = 6
group = gl.groups.get(group_id, lazy=True)
#get all projects
projects = group.projects.list(include_subgroups=True, all=True)
#get all project ids
project_ids = []
for project in projects:
project_ids.append((project.id))
print(project_ids)
for project in project_ids:
project = gl.projects.get(project)
branches = project.branches.list()
for branch in branches:
if "tobedeleted" in branch.attributes['name']:
print(branch.attributes['name'])
Also, I am very sure this is not the clean way to write the script. Can you please drop your suggestions on how to make it better ?
Thanks
Branch objects have a delete method.
for branch in project.branches.list(as_list=False):
if 'tobedeleted' in branch.name:
branch.delete()
You can also delete a branch by name if you know its exact name already:
project.branches.delete('exact-branch-name')
As a side note:
The other thing you'll notice I've done is add the as_list=False argument to .list(). This will make sure that you paginate through all branches. Otherwise, you'll only get the first page (default 20 per page) of branches. The same is true for most list methods.

Checking out a branch with GitPython. making commits, and then switching back to the previous branch

I'm using the GitPython library to do some simple Git manipulation and I'd like to checkout a branch, make a commit, and then checkout the previous branch. The docs are a little confusing on how to do this. So far, I have this:
import git
repo = git.Repo()
previous_branch = repo.active_branch
new_branch_name = "foo"
new_branch = repo.create_head(new_branch_name)
new_branch.checkout()
repo.index.commit("my commit message") # this seems wrong
## ?????
I can tell that this works by verifying it via git commands, but I get the feeling that I'm doing this incorrectly. I'm not how to switch back to the previous branch safely sort of using the raw git commands (from within the library directly).
From http://gitpython.readthedocs.io/en/stable/tutorial.html
Switching Branches
To switch between branches similar to git checkout, you effectively need to point your HEAD symbolic reference to the new branch and reset your index and working copy to match. A simple manual way to do it is the following one
# Reset our working tree 10 commits into the past
past_branch = repo.create_head('past_branch', 'HEAD~10')
repo.head.reference = past_branch
assert not repo.head.is_detached
# reset the index and working tree to match the pointed-to commit
repo.head.reset(index=True, working_tree=True)
# To detach your head, you have to point to a commit directy
repo.head.reference = repo.commit('HEAD~5')
assert repo.head.is_detached
# now our head points 15 commits into the past, whereas the working tree
# and index are 10 commits in the past
The previous approach would brutally overwrite the user’s changes in the working copy and index though and is less sophisticated than a git-checkout. The latter will generally prevent you from destroying your work. Use the safer approach as follows.
# checkout the branch using git-checkout. It will fail as the working tree appears dirty
self.failUnlessRaises(git.GitCommandError, repo.heads.master.checkout)
repo.heads.past_branch.checkout()
Or, just below that:
Using git directly
In case you are missing functionality as it has not been wrapped, you may conveniently use the git command directly. It is owned by each repository instance.
git = repo.git
git.checkout('HEAD', b="my_new_branch") # create a new branch
git.branch('another-new-one')
git.branch('-D', 'another-new-one') # pass strings for full control over argument order
git.for_each_ref() # '-' becomes '_' when calling it
And simply do the git.checkout() approach

Get the diff details of first commit in GitPython

In GitPython, I can iterate separately the diff information for every change in the tree by calling the diff() method between different commit objects. If I call diff() with the create_patch=True keyword argument, a patch string is created for every change (additions, deletions, renames) which I can access through the created diff object, and dissect for the changes.
However, I don't have a parent to compare to with the first commit.
import git
from git.compat import defenc
repo = git.Repo("path_to_my_repo")
commits = list(repo.iter_commits('master'))
commits.reverse()
for i in commits:
if not i.parents:
# First commit, don't know what to do
continue
else:
# Has a parent
diff = i.diff(i.parents[0], create_patch=True)
for k in diff:
try:
# Get the patch message
msg = k.diff.decode(defenc)
print(msg)
except UnicodeDecodeError:
continue
You can use the method
diff = repo.git.diff_tree(i.hexsha, '--', root=True)
But this calls git diff on the whole tree with the given arguments, returns a string and I cannot get the information for every file separately.
Maybe, there is a way to create a root object of some sorts. How can I get the first changes in a repository?
EDIT
A dirty workaround seems to be comparing to the empty tree by directly using its hash:
EMPTY_TREE_SHA = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
....
if not i.parents:
diff = i.diff(EMPTY_TREE_SHA, create_patch=True, **diffArgs)
else:
diff = i.diff(i.parents[0], create_patch=True, **diffArgs)
But this hardly seems like a real solution. Other answers are still welcome.
The short answer is you can't.
GitPython does not seem to support this method.
It would work to do a git show on the commit, but GitPython does not support that.
You can on the other hand use the stats functionality in GitPython to get something that will allow you to get the information you need:
import git
repo = git.Repo(".")
commits = list(repo.iter_commits('master'))
commits.reverse()
print(commits[0])
print(commits[0].stats.total)
print(commits[0].stats.files)
This might solve your problem. If this does not solve your problem you would probably be better off trying to use pygit2 which is based on libgit2 - The library that VSTS, Bitbucket and GitHub use to handle Git on their backends. That is probably more feature complete. Good luck.
the proposed solution of the OP works, but it has the disadvantage that the diff is inverse (added files in the diff are marked as delete, etc). However, one can simply reverse the logic:
from gitdb.util import to_bin_sha
empty_tree = git.Tree(self.repo, to_bin_sha("4b825dc642cb6eb9a060e54bf8d69288fbee4904"))
diff = empty_tree.diff(i)
Be aware that with sha256, the empty tree id is 6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321
You can check the type of the repo with GitPython like so:
def is_sha1(repo):
format = repo.git.rev_parse("--show-object-format")
return format == "sha1"

Check status of local Python relative to remote with GitPython

How can I use GitPython to determine whether:
My local branch is ahead of the remote (I can safely push)
My local branch is behind the remote (I can safely pull)
My local branch has diverged from the remote?
To check if the local and remote are the same, I'm doing this:
def local_and_remote_are_at_same_commit(repo, remote):
local_commit = repo.commit()
remote_commit = remote.fetch()[0].commit
return local_commit.hexsha == remote_commit.hexsha
See https://stackoverflow.com/a/15862203/197789
E.g.
commits_behind = repo.iter_commits('master..origin/master')
and
commits_ahead = repo.iter_commits('origin/master..master')
Then you can use something like the following to go from iterator to a count:
count = sum(1 for c in commits_ahead)
(You may want to fetch from the remotes before running iter_commits, eg: repo.remotes.origin.fetch())
This was last checked with GitPython 1.0.2.
The following worked better for me, which I got from this stackoverflow answer
commits_diff = repo.git.rev_list('--left-right', '--count', f'{branch}...{branch}#{{u}}')
num_ahead, num_behind = commits_diff.split('\t')
print(f'num_commits_ahead: {num_ahead}')
print(f'num_commits_behind: {num_behind}')

GitPython: get list of remote commits not yet applied

I am writing a Python script to get a list of commits that are about to be applied by a git pull operation. The excellent GitPython library is a great base to start, but the subtle inner workings of git are killing me. Now, here is what I have at the moment (simplified and annotated version):
repo = git.Repo(path) # get the local repo
local_commit = repo.commit() # latest local commit
remote = git.remote.Remote(repo, 'origin') # remote repo
info = remote.fetch()[0] # fetch changes
remote_commit = info.commit # latest remote commit
if local_commit.hexsha == remote_commit.hexsha: # local is updated; end
return
# for every remote commit
while remote_commit.hexsha != local_commit.hexsha:
authors.append(remote_commit.author.email) # note the author
remote_commit = remote_commit.parents[0] # navigate up to the parent
Essentially it gets the authors for all commits that will be applied in the next git pull. This is working well, but it has the following problems:
When the local commit is ahead of the remote, my code just prints all commits to the first.
A remote commit can have more than one parent, and the local commit can be the second parent. This means that my code will never find the local commit in the remote repository.
I can deal with remote repositories being behind the local one: just look in the other direction (local to remote) at the same time, the code gets messy but it works. But this last problem is killing me: now I need to navegate a (potentially unlimited) tree to find a match for the local commit. This is not just theoretical: my latest change was a repo merge which presents this very problem, so my script is not working.
Getting an ordered list of commits in the remote repository, such as repo.iter_commits() does for a local Repo, would be a great help. But I haven't found in the documentation how to do that. Can I just get a Repo object for the Remote repository?
Is there another approach which might get me there, and I am using a hammer to nail screws?
I know this is ages old but I just had to do this for a project and…
head = repo.head.ref
tracking = head.tracking_branch()
return tracking.commit.iter_items(repo, f'{head.path}..{tracking.path}')
(conversely to know how many local commits you have pending to push, just invert it: head.commit.iter_items(repo, f'{tracking.path}..{head.path}'))
I realized that the tree of commits was always like this: one commit has two parents, and both parents have the same parent. This means that the first commit has two parents but only one grandparent.
So it was not too hard to write a custom iterator to go over commits, including diverging trees. It looks like this:
def repo_changes(commit):
"Iterator over repository changes starting with the given commit."
number = 0
next_parent = None
yield commit # return the first commit itself
while len(commit.parents) > 0: # iterate
same_parent(commit.parents) # check only one grandparent
for parent in commit.parents: # go over all parents
yield parent # return each parent
next_parent = parent # for the next iteration
commit = next_parent # start again
The function same_parent() alerts when there are two parents and more than one grandparent. Now it is a simple matter to iterate over the unmerged commits:
for commit in repo_changes(remote_commit):
if commit.hexsha == local_commit.hexsha:
return
authors.append(remote_commit.author.email)
I have left a few details out for clarity. I never return more than a preestablished number of commits (20 in my case), to avoid going to the end of the repo. I also check beforehand that the local repo is not ahead of the remote repo. Other than that, it is working great! Now I can alert all commit authors that their changes are being merged.

Categories

Resources