Python Git Module experiences? [closed]


Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
What are people's experiences with any of the Git modules for Python? (I know of GitPython, PyGit, and Dulwich - feel free to mention others if you know of them.)
I am writing a program which will have to interact (add, delete, commit) with a Git repository, but have no experience with Git, so one of the things I'm looking for is ease of use/understanding with regards to Git.
The other things I'm primarily interested in are maturity and completeness of the library, a reasonable lack of bugs, continued development, and helpfulness of the documentation and developers.
If you think of something else I might want/need to know, please feel free to mention it.

While this question was asked a while ago and I don't know the state of the libraries at that point, it is worth mentioning for searchers that GitPython does a good job of abstracting the command line tools so that you don't need to use subprocess. There are some useful built-in abstractions you can use, but for everything else you can do things like:
import git
repo = git.Repo('/home/me/repodir')
print(repo.git.status())
# checkout and track a remote branch
print(repo.git.checkout('origin/somebranch', b='somebranch'))
# add a file
print(repo.git.add('somefile'))
# commit
print(repo.git.commit(m='my commit message'))
# now we are one commit ahead
print(repo.git.status())
Everything else in GitPython just makes it easier to navigate. I'm fairly well satisfied with this library and appreciate that it is a wrapper on the underlying git tools.
UPDATE: I've switched to using the sh module for not just git but most command-line utilities I need in Python. To replicate the above I would do this instead:
import sh
git = sh.git.bake(_cwd='/home/me/repodir')
print(git.status())
# checkout and track a remote branch
print(git.checkout('-b', 'somebranch', 'origin/somebranch'))
# add a file
print(git.add('somefile'))
# commit
print(git.commit(m='my commit message'))
# now we are one commit ahead
print(git.status())

I thought I would answer my own question, since I'm taking a different path than suggested in the answers. Nonetheless, thanks to those who answered.
First, a brief synopsis of my experiences with GitPython, PyGit, and Dulwich:
GitPython: After downloading, I got this imported and the appropriate object initialized. However, trying to do what was suggested in the tutorial led to errors. Lacking more documentation, I turned elsewhere.
PyGit: This would not even import, and I could find no documentation.
Dulwich: Seems to be the most promising (at least for what I wanted and saw). I made some progress with it, more than with GitPython, since its egg comes with Python source. However, after a while, I decided it might just be easier to call git directly.
Also, StGit looks interesting, but I would need the functionality extracted into a separate module and do not want to wait for that to happen right now.
In (much) less time than I spent trying to get the three modules above working, I managed to get git commands working via the subprocess module, e.g.
import subprocess
def gitAdd(fileName, repoDir):
    cmd = ['git', 'add', fileName]
    p = subprocess.Popen(cmd, cwd=repoDir)
    p.wait()  # block until git has finished
gitAdd('exampleFile.txt', '/usr/local/example_git_repo_dir')
This isn't fully incorporated into my program yet, but I'm not anticipating a problem, except maybe speed (since I'll be processing hundreds or even thousands of files at times).
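On the speed worry: starting one git process per file is likely to dominate the run time for thousands of files, so one option is to hand many paths to a single `git add`. A minimal sketch, assuming the paths are relative to the repository; `add_commands`, `git_add_many` and the default `batch_size` of 500 are illustrative choices (the cap only keeps each command line well under OS length limits, it is not a git requirement).

```python
import subprocess
from typing import Iterable, Iterator, List


def add_commands(paths: Iterable[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield one `git add` argument list per batch of at most batch_size paths."""
    batch: List[str] = []
    for path in paths:
        batch.append(path)
        if len(batch) == batch_size:
            # "--" stops git from reading a path that starts with "-" as an option
            yield ["git", "add", "--", *batch]
            batch = []
    if batch:
        yield ["git", "add", "--", *batch]


def git_add_many(paths: Iterable[str], repo_dir: str, batch_size: int = 500) -> None:
    """Stage many files while starting as few git processes as possible."""
    for cmd in add_commands(paths, batch_size):
        # check=True raises CalledProcessError if git exits with an error
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, `git_add_many(['exampleFile.txt', 'other.txt'], '/usr/local/example_git_repo_dir')` stages both files with one git process.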
Maybe I just didn't have the patience to get things going with Dulwich or GitPython. That said, I'm hopeful the modules will get more development and be more useful soon.

I'd recommend pygit2, the Python bindings to the excellent libgit2 library.

This is a pretty old question, and while looking for Git libraries, I found one that was made this year (2013) called Gittle.
It worked great for me (where the others I tried were flaky), and seems to cover most of the common actions.
Some examples from the README:
from gittle import Gittle
# Clone a repository
repo_path = '/tmp/gittle_bare'
repo_url = 'git://github.com/FriendCode/gittle.git'
repo = Gittle.clone(repo_url, repo_path)
# Stage multiple files
repo.stage(['other1.txt', 'other2.txt'])
# Do the commit
repo.commit(name="Samy Pesse", email="samy@friendco.de", message="This is a commit")
# Authentication with RSA private key
key_file = open('/Users/Me/keys/rsa/private_rsa')
repo.auth(pkey=key_file)
# Do push
repo.push()

In case it helps, Bazaar and Mercurial both use Dulwich for their Git interoperability.
Dulwich probably differs from the others in that it is a reimplementation of Git in Python. The others may just be wrappers around Git's commands, which could make them simpler to use at a high level (commit/add/delete), but it probably also means their API is very close to Git's command line, so you'll need to gain some experience with Git.

For the sake of completeness, http://github.com/alex/pyvcs/ is an abstraction layer for all DVCSs. It uses Dulwich, but provides interop with the other DVCSs.

An updated answer reflecting changed times:
GitPython is currently the easiest to use. It wraps many git plumbing commands, has a pluggable object database (Dulwich being one of the options), and, if a command isn't implemented, provides an easy API for shelling out to the command line. For example:
from git import Repo
repo = Repo('.')
repo.git.checkout(b='new_branch')
This calls:
bash$ git checkout -b new_branch
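The translation from keyword arguments to flags follows a simple convention: one-letter names become a short option followed by the value, longer names become `--long-name=value`, and `True` becomes a bare flag. The function below is a simplified illustration of that convention, not GitPython's own code (`to_cli_args` is a hypothetical name), and it leaves out edge cases such as list values.

```python
from typing import List


def to_cli_args(**kwargs: object) -> List[str]:
    """Simplified illustration of how keyword arguments become git flags."""
    args: List[str] = []
    for name, value in kwargs.items():
        if value is False or value is None:
            continue  # false/None options are left out entirely
        if len(name) == 1:
            # one-letter names become short options: b='x' -> ['-b', 'x']
            args += [f"-{name}"] if value is True else [f"-{name}", str(value)]
        else:
            # longer names become long options, with underscores turned into dashes
            flag = "--" + name.replace("_", "-")
            args += [flag] if value is True else [f"{flag}={value}"]
    return args


print(to_cli_args(b="new_branch"))  # ['-b', 'new_branch']
print(to_cli_args(max_count=2))     # ['--max-count=2']
```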
Dulwich is also good but much lower level. It's somewhat of a pain to use because it requires operating on git objects at the plumbing level and doesn't have the nice porcelain you'd normally want. However, if you plan on modifying any parts of git, or using git-receive-pack and git-upload-pack, you need to use Dulwich.

PTBNL's answer works well for me. I extended it a little for Windows users.
import subprocess
import time
def runGit(args, repoDir):
    # Passing a list (no shell=True) avoids quoting problems, e.g. in commit messages
    pipe = subprocess.Popen(['git'] + args, cwd=repoDir,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, error = pipe.communicate()  # communicate() also waits for git to exit
    print(out, error)
def gitAdd(fileName, repoDir):
    runGit(['add', fileName], repoDir)
def gitCommit(commitMessage, repoDir):
    runGit(['commit', '-am', commitMessage], repoDir)
def gitPush(repoDir):
    runGit(['push'], repoDir)
temp = time.localtime()
uploaddate = '_'.join(str(x) for x in temp[:5])  # year_month_day_hour_minute, used as the commit message
repoDir = r'd:\c_Billy\vfat\Programming\Projector\billyccm'  # your git repository; the r'' prefix keeps Windows backslashes literal
gitAdd('.', repoDir)
gitCommit(uploaddate, repoDir)
gitPush(repoDir)

Here's a really quick implementation of "git status":
# Note: this parses the "#"-prefixed output of older git versions; newer git
# no longer starts status lines with "#", so these lists will come back empty there.
from subprocess import Popen, PIPE
repoDir = '/Users/foo/project'
def command(x):
    # run the command inside repoDir and return its output as text
    return Popen(x.split(' '), stdout=PIPE, cwd=repoDir).communicate()[0].decode()
def rm_empty(L):
    return [l for l in L if l]
def getUntracked():
    status = command("git status")
    if "# Untracked files:" in status:
        untf = status.split("# Untracked files:")[1][1:].split("\n")
        return rm_empty([x[2:] for x in untf if x.strip() != "#" and x.startswith("#\t")])
    else:
        return []
def getNew():
    status = command("git status").split("\n")
    return [x[14:] for x in status if x.startswith("#\tnew file:   ")]
def getModified():
    status = command("git status").split("\n")
    return [x[14:] for x in status if x.startswith("#\tmodified:   ")]
print("Untracked:")
print(getUntracked())
print("New:")
print(getNew())
print("Modified:")
print(getModified())
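Because the human-readable `git status` output changes between git versions, `git status --porcelain` is the format intended for scripts: two status characters (index, then work tree), a space, then the path. A minimal sketch of the same three lists built from it, assuming plain paths; in this format unusual file names are quoted and renames appear as `old -> new`, and neither case is handled here. `parse_porcelain` and `git_status` are illustrative names.

```python
import subprocess
from typing import Dict, List


def parse_porcelain(output: str) -> Dict[str, List[str]]:
    """Group `git status --porcelain` (v1) lines into untracked/new/modified."""
    result: Dict[str, List[str]] = {"untracked": [], "new": [], "modified": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue
        xy, path = line[:2], line[3:]  # two status chars, a space, then the path
        if xy == "??":
            result["untracked"].append(path)
        elif xy[0] == "A":
            result["new"].append(path)  # newly added to the index
        elif "M" in xy:
            result["modified"].append(path)  # modified in index or work tree
    return result


def git_status(repo_dir: str) -> Dict[str, List[str]]:
    """Run git status in repo_dir and return the grouped paths."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_porcelain(out)
```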

The git interaction library part of StGit is actually pretty good. It isn't broken out as a separate package yet, but if there is sufficient interest, I'm sure that can be fixed.
It has very nice abstractions for representing commits, trees, etc., and for creating new commits and trees.

For the record, none of the aforementioned Python Git libraries seems to contain a "git status" equivalent, which is really the only thing I would want, since dealing with the rest of the git commands via subprocess is so easy.

Related

How to use GitPython to perform "git push" when using SSH Keys?

I am trying to write a Python controller to help me automate Git usage. I've gotten all the other commands to work, but I am having difficulty with the git push equivalent when using the GitPython library.
This is where I am right now. This should work without the SSH key identification, but I have to squeeze that in.
def push(self, repo_path, branch, commit_message, user):
    """ Execute Git Push with GitPython Library.
    Hardcoded values: 'branch' environment.
    TODO: This is not working. """
    repo = Repo(repo_path)
    repo.git.add('--all')
    repo.git.commit('-m', commit_message)
    origin = repo.remote(name=branch)
    origin.push()
This is what I have in my initialization (some values cleared for privacy):
load_dotenv()
self.BRANCH = "TBD" # Hardcoded Value
self.REPO_PATH = os.getenv('REPO_PATH')
self.REPO = Repo(self.REPO_PATH)
self.COMMIT_MESSAGE = '"Commit from Controller."'
# TODO: These should be changed, when deployed.
self.GIT_SSH_KEY = os.path.expanduser('/home/user/.ssh/id_rsa')
self.GIT_SSH_CMD = "ssh -i %s" % self.GIT_SSH_KEY
self.GIT_USER = "user" # This needs to be changed.
From my understanding of this (GitPython and SSH Keys?), the tactic here is to use the GIT_SSH environment variable to provide an executable that calls ssh, but since I am a beginner, I am having trouble understanding what exactly that environment variable should contain and how to wrap it around the push function.
Thank you in advance!

First, setting values on self isn't going to accomplish anything by itself, unless there are parts of your code you're not showing us. If you need to set the GIT_SSH environment variable, then you would need to set os.environ['GIT_SSH'].
In general, you shouldn't need to set GIT_SSH unless you require a non-default ssh command line. That is, if I have:
$ git remote -v
origin ssh://git@github.com/larsks/gnu-hello (fetch)
origin ssh://git@github.com/larsks/gnu-hello (push)
Then I can write:
>>> import git
>>> repo = git.Repo('.')
>>> origin = repo.remote('origin')
>>> res = origin.push()
>>> res[0].summary
'[up to date]\n'
I didn't have to set anything special here; the defaults were entirely appropriate. Under the hood, GitPython just calls the git command line, so anything that works with the cli should work fine without special configuration.
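When you do need a specific key (the asker's case), `GIT_SSH_COMMAND` (git 2.3 and later) is usually simpler than `GIT_SSH`, because it takes a whole command line rather than the path to an executable. A minimal sketch using plain subprocess, assuming the key path from the question; `ssh_env`, `push_with_key` and the `"main"` branch default are hypothetical placeholders.

```python
import os
import shlex
import subprocess
from typing import Dict


def ssh_env(key_path: str) -> Dict[str, str]:
    """Copy of the current environment that tells git which SSH key to use."""
    key = os.path.expanduser(key_path)
    # IdentitiesOnly=yes stops ssh from offering every key held by ssh-agent
    command = f"ssh -i {shlex.quote(key)} -o IdentitiesOnly=yes"
    return {**os.environ, "GIT_SSH_COMMAND": command}


def push_with_key(repo_dir: str, key_path: str,
                  remote: str = "origin", branch: str = "main") -> None:
    """Run `git push remote branch` using the given key ("main" is a placeholder)."""
    subprocess.run(["git", "push", remote, branch],
                   cwd=repo_dir, env=ssh_env(key_path), check=True)
```

With GitPython, wrapping the push in `with repo.git.custom_environment(GIT_SSH_COMMAND=cmd):` should have the same effect without leaving GitPython.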

Iterate commits b/w 2 specified commits in GitPython

import git
repo = git.Repo(repo_dir)
ref_name = 'master'
for commit in repo.iter_commits(rev=ref_name):
    pass  # <some code here>
This code iterates through all the commits. I want to iterate between 2 commits,
just like git log commit1...commit2.
How can I do the same using GitPython's iter_commits() method?

repo.iter_commits(rev='1234abc..5678def') works for me in GitPython==2.1.11
Example:
repo = git.Repo(repo_dir)
for commit in repo.iter_commits(rev='master..HEAD'):
    pass  # <some code here>
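Note that `A..B` (used above) and the question's `A...B` are different ranges: two dots selects commits reachable from B but not from A, while three dots selects the symmetric difference (reachable from either side, but not from both). A toy model of both on a hand-written parent graph with made-up commit names; since `iter_commits` passes its `rev` string on to `git rev-list`, the three-dot form should be accepted there as well, though that has not been checked across GitPython versions.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # commit -> list of its parent commits


def reachable(graph: Graph, start: str) -> Set[str]:
    """All commits reachable from start by following parent links (inclusive)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def two_dot(graph: Graph, a: str, b: str) -> Set[str]:
    """A..B: reachable from B but not from A."""
    return reachable(graph, b) - reachable(graph, a)


def three_dot(graph: Graph, a: str, b: str) -> Set[str]:
    """A...B: reachable from A or B, but not from both."""
    return reachable(graph, a) ^ reachable(graph, b)


# Two branches forked from "base": m1 <- m2 on one side, f1 <- f2 on the other
history: Graph = {"m2": ["m1"], "m1": ["base"], "f2": ["f1"], "f1": ["base"], "base": []}
print(sorted(two_dot(history, "m2", "f2")))    # ['f1', 'f2']
print(sorted(three_dot(history, "m2", "f2")))  # ['f1', 'f2', 'm1', 'm2']
```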

You can use pure GitPython for that.
If you want to traverse a certain number of commits (starting from the given revision), just use max_count. See The Commit object in the GitPython tutorial.
two_commits = list(repo.iter_commits('master', max_count=2))
assert len(two_commits) == 2
if you want similar ability to git log commit1...commit2 as you mentioned:
logs = repo.git.log("--oneline", "f5035ce..f63d26b")
will give you:
>>> logs
'f63d26b Fix urxvt name to match debian repo\n571f449 Add more key for helm-org-rifle\nbea2697 Drop bm package'
You can also use logs = repo.git.log("f5035ce..f63d26b"), but it will give you the full log entries (just like running git log without --oneline).
If you want nicer output, use pprint:
from pprint import pprint as pp
>>> pp(logs)
('f63d26b Fix urxvt name to match debian repo\n'
'571f449 Add more key for helm-org-rifle\n'
'bea2697 Drop bm package')
For more explanation about repo.git.log, see https://stackoverflow.com/a/55545500/6000005
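`repo.git.log` returns a single string, so the `--oneline` output usually needs splitting before use. A small sketch, assuming each line is an abbreviated hash, one space, then the subject; if `log.decorate` is configured, branch names can appear after the hash, and passing `--no-decorate` avoids that. `parse_oneline` is an illustrative name.

```python
from typing import List, Tuple


def parse_oneline(log_output: str) -> List[Tuple[str, str]]:
    """Split `git log --oneline` output into (abbreviated hash, subject) pairs."""
    pairs = []
    for line in log_output.splitlines():
        if line.strip():
            sha, _, subject = line.partition(" ")  # split at the first space only
            pairs.append((sha, subject))
    return pairs
```

For example, with `logs = repo.git.log("--oneline", "f5035ce..f63d26b")` from above, `parse_oneline(logs)[0]` would be `('f63d26b', 'Fix urxvt name to match debian repo')`.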

I would suggest using PyDriller (a wrapper around GitPython that makes things easier). What you asked can be done like this:
from pydriller import RepositoryMining  # in PyDriller 2.x this class is named Repository
for commit in RepositoryMining("path_to_repo", from_commit="first", to_commit="second").traverse_commits():
    # your code
    pass

First, make a function to run the git command.
import subprocess
def execute_gitcmd(cmd, repo):
    pipe = subprocess.Popen(cmd, shell=True, cwd=repo, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (out, error) = pipe.communicate()  # waits for the command to finish
    return out, error
Then write any git command as you use on terminal, for example:
gitcmd = "git log -n1 --oneline"
Finally, call your function:
log = (execute_gitcmd(gitcmd, your_repository))
Hope this can help.

Add usage help of command line tool to README.rst

I wrote a little command line tool, and want to add the "--help" usage message to the docs.
Since I am lazy, I would like to make the update procedure as simple as possible. Here is what I want the update workflow to look like:
1. Update code, which results in an updated usage message.
2. Run a script which updates the docs: the new usage message should be visible in the docs.
In other words: I don't want to copy and paste the usage message.
Step 1 comes from my own brain, but I want to reuse existing tools for Step 2.
Up to now the docs are just a simple README.rst file.
I would like to stick with a simple solution, where the docs are visible directly on GitHub. So far I don't need a more complicated solution (like Read the Docs).
How can I avoid copy+pasting the --help usage message?
Here is the tool I am working on: https://github.com/guettli/reprec

As suggested in the comments, you could use a git pre-commit hook to generate the README.rst file on commit. You could use an existing tool such as cog, or you could just do something very simple with bash.
For example, create a RST "template" file:
README.rst.tmpl
Test Git pre-commit hook project
--------------------------------
>>> INSERTION POINT FOR HELP OUTPUT <<<
.git/hooks/pre-commit
#!/usr/bin/env bash
# Sensible to set -e to ensure we exit if anything fails
set -e
# Get the output from your tool.
# Paths are relative to the root of the repo
output=$(tools/my-cmd-line-tool --help)
cat README.rst.tmpl |
while IFS= read -r line
do
if [[ $line == ">>> INSERTION POINT FOR HELP OUTPUT <<<" ]]
then
echo "$output"
else
echo "$line"
fi
done > README.rst
git add README.rst
This runs before you are prompted for a commit message (if you didn't pass one on the command line). So when the commit takes place, if there were any changes to either README.rst.tmpl or the output from your tool, README.rst will be updated with them. The hook file must be executable (chmod +x .git/hooks/pre-commit), otherwise git ignores it.
Edit
I believe this should work on Windows too, or something very similar, since git comes with a bash implementation on Windows, but I haven't tested it.
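Since git runs any executable hook, the same substitution can also be written in Python if you'd rather not depend on bash. A minimal sketch, reusing the template name and marker line from above; `render_readme` and `update_readme` are hypothetical helpers, and the help text would come from something like `subprocess.run(['tools/my-cmd-line-tool', '--help'], capture_output=True, text=True).stdout`.

```python
from pathlib import Path

MARKER = ">>> INSERTION POINT FOR HELP OUTPUT <<<"


def render_readme(template: str, help_text: str, marker: str = MARKER) -> str:
    """Return the template with the marker line replaced by the help text."""
    lines = []
    for line in template.splitlines():
        # Replace only a line that matches the marker exactly
        lines.append(help_text.rstrip("\n") if line == marker else line)
    return "\n".join(lines) + "\n"


def update_readme(repo_root: Path, help_text: str) -> None:
    """Regenerate README.rst from README.rst.tmpl in repo_root."""
    template = (repo_root / "README.rst.tmpl").read_text()
    (repo_root / "README.rst").write_text(render_readme(template, help_text))
```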

Consider using cog. It's meant for exactly this job.
Here's something that might just work (untested), and there's a lot of scope for improvement.
reprec
======
The tool reprec replaces strings in text files:
.. [[[cog
.. import subprocess
.. import cog
..
.. def indent(text, width=4):
..     return "\n".join((" " * width + line) for line in text.splitlines())
..
.. text = subprocess.check_output(["reprec", "--help"], universal_newlines=True)
.. cog.outl("::")
.. cog.outl()
.. cog.outl(indent("$ reprec --help\n" + text))
.. cog.outl()
.. ]]]
::

    $ reprec --help
    <all-help-text>

.. [[[end]]]

For getting the usage text at Step 2, you can use the subprocess module:
import subprocess
usage_text = subprocess.check_output(["reprec", "--help"], universal_newlines=True)

I would approach this quite differently, from another side. I think the workflow you described could be greatly simplified if you switched to argparse instead of the getopt you use now. With it you get:
simpler code in your argument parsing function (in my opinion), and probably safer code, because argparse can verify many conditions on the given arguments as long as you declare them (data types, number of arguments, etc.);
and you can use argparse features to document the arguments directly in the code, right where you declare them (e.g. help, usage, epilog and others); this effectively means you could delete your own usage function entirely, because argparse handles this task for you (just run with --help to see the result).
To sum up, arguments, their contracts and their help documentation become mostly declarative and are managed together in one place.
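A small illustration of the declarative style, assuming a hypothetical argument layout for reprec (its real options may differ): each help string sits next to its argument, and `format_help()` returns the same text that `--help` prints, so a script can write it into the README without running the tool as a subprocess.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare each argument together with its help text (hypothetical options)."""
    parser = argparse.ArgumentParser(
        prog="reprec",
        description="Replace strings in text files.",
        epilog="The usage text above is generated by argparse.",
    )
    parser.add_argument("old", help="string to search for")
    parser.add_argument("new", help="replacement string")
    parser.add_argument("files", nargs="+", help="files to modify")
    parser.add_argument("--dry-run", action="store_true",
                        help="only report what would be changed")
    return parser


# format_help() returns the same text that `reprec --help` would print
usage_text = build_parser().format_help()
print(usage_text)
```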
OK, OK, I know the original question is how to update the README. I understand that your intention is to take the laziest approach, so I think it is lazy enough to:
maintain all your arguments and their documentation once, in a single place, as above
then run something like myprogram --help > README.rst
commit ;)
OK, you will probably need something a little more complex than just > README.rst. Here we can be as creative as we want, so the fun starts here. For example:
with a README.template.rst (where you actually maintain the README content) that has a ## Usage header somewhere in it:
$ myprogram --help > USAGE.rst
$ sed -e '/## Usage/r USAGE.rst' -e '$G' README.template.rst > README.rst
And you get everything working from same source code!
I think it will still need some polishing up, in order to generate valid rst document, but I hope it shows the idea in general.
Gist: Include generated help into README

hglib: show patches for a revision, possible?

I'm trying to get the patches for a given revision using hglib. I know the hg command is
hg log -pr rev
but I can't find how to do this or an equivalent with hglib. It seems there is no functionality for that, unless I hack the code myself to run the above command. Any help would be greatly appreciated.

The hglib client.log() interface doesn't support what I wanted to do, but I found a simple way to run an arbitrary hg command. These two lines print the patch of revision rev:
out = client.rawcommand([b'log', b'-pr', b'%i'%rev])
print(str(out, 'utf-8'))

Maybe this is the actual answer:
import hglib
client = hglib.open(<path>)
client.export (revs = str(<revision number>), output = <output file path>)
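You can also run the same command yourself with the subprocess package. rawcommand just builds the argument list from the parameters we pass; hglib then sends it to a long-running Mercurial command-server process rather than starting a new hg process each time, so calling hg directly is not necessarily faster.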
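For reference, the subprocess route could look like this. A minimal sketch, assuming `hg` is on the PATH; `hg_patch_command` and `hg_patch` are hypothetical helper names, and `text=True` decodes the patch with the locale's encoding, so drop it if you need the raw bytes.

```python
import subprocess
from typing import List


def hg_patch_command(rev: str) -> List[str]:
    """Argument list equivalent to `hg log -p -r REV` in a shell."""
    return ["hg", "log", "-p", "-r", rev]


def hg_patch(repo_dir: str, rev: str) -> str:
    """Return the patch for one revision by calling hg directly."""
    return subprocess.run(
        hg_patch_command(rev), cwd=repo_dir,
        capture_output=True, text=True, check=True,
    ).stdout
```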

Using set input for stdin based on output from stdout with python subprocess

I would like to install some software automatically from Python using subprocess.Popen. During installation, the software outputs some information and then asks the user a couple of questions (e.g., whether to accept the software license and whether to install the software). I would like to answer these questions automatically from the Python code. I have tried several ways, but none of them worked for me. Any idea how to do it? The expected pseudocode is as follows (this code definitely does not work).
import subprocess
p = subprocess.Popen(['myprogram'], stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)
while p.poll() is None:
    line = p.stdout.readline()
    if not line:
        break
    line = line.strip()
    print(line)
    if line == 'Do you accept license agreement (YES/NO)':
        p.stdin.write('YES\n')
        p.stdin.flush()
    elif line == 'Do you want to install this software (YES/NO)':
        p.stdin.write('YES\n')
        p.stdin.flush()

Thanks Thomas Fenz. pypi.python.org/pypi/pexpect is excellent in this case. In my previous comment I said it didn't work, but that was because I made a mistake in the pattern specification. In particular, I used p.expect('Do you accept license agreement? (YES/NO)'). The question mark was the mistake: it is a regex metacharacter (as are the parentheses), so the pattern could not be matched and hence a TIMEOUT happened.
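The pitfall is easy to reproduce with Python's `re` module, which is what pexpect uses for `expect` patterns: `?` makes the preceding character optional and the parentheses form a group, so the prompt does not match itself until it is escaped. pexpect also offers `expect_exact`, which matches plain strings. A sketch using the prompt texts from the question (the real installer's wording may differ); `accept_all` is a hypothetical helper.

```python
import re

PROMPT = "Do you accept license agreement? (YES/NO)"

# Used as a regex, "?" makes the "t" before it optional and "(...)" is a group,
# so the prompt text fails to match itself ...
print(bool(re.search(PROMPT, PROMPT)))             # False
# ... until the metacharacters are escaped.
print(bool(re.search(re.escape(PROMPT), PROMPT)))  # True


def accept_all(command: str) -> None:
    """Answer both installer prompts with YES (hypothetical helper)."""
    import pexpect  # imported here so the regex demo above runs without pexpect

    child = pexpect.spawn(command, encoding="utf-8")
    child.expect_exact(PROMPT)  # literal match, no escaping needed
    child.sendline("YES")
    child.expect_exact("Do you want to install this software (YES/NO)")
    child.sendline("YES")
    child.expect(pexpect.EOF, timeout=None)  # wait for the installer to finish
```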
