How to pull with GitPython? - python

I am using GitPython to clone a repository from a Gitlab server.
git.Repo.clone_from(gitlab_ssh_URL, local_path)
Later I have another script that tries to update this repo.
try:
    my_repo = git.Repo(local_path)
    my_repo.remotes.origin.pull()
except (git.exc.InvalidGitRepositoryError, git.exc.NoSuchPathError):
    print("Invalid repository: {}".format(local_path))
This works great, except if in between I check out a tag like this:
tag_id = choose_tag()  # Returns the position of an existing tag in my_repo.tags
my_repo.head.reference = my_repo.tags[tag_id]
my_repo.head.reset(index=True, working_tree=True)
In this case I get a GitCommandError when pulling:
git.exc.GitCommandError: 'git pull -v origin' returned with exit code 1
I have read the documentation twice already and I don't see where the problem is, especially since pulling this repo with a dedicated tool like SourceTree works without any error or warning.
I don't understand how checking out a tagged version, even with a detached HEAD, prevents me from pulling.
What should I do to pull in this case?
What is happening here and what do I miss?
Edit: as advised, I looked at exception.stdout and exception.stderr and there is nothing useful there (b'' and None respectively). That's why I have a hard time understanding what's wrong.

I think it's a good idea to learn more about what is happening first (question 2: what is going on?), and that should guide you to the answer to question 1 (how to fix this?).
To know more about what went wrong you can print out stdout and stderr from the exception. Git normally prints error details to the console, so something should be in stdout or stderr.
try:
    git.Repo.clone_from(gitlab_ssh_URL, local_path)
except git.GitCommandError as exception:
    print(exception)
    if exception.stdout:
        print('!! stdout was:')
        print(exception.stdout)
    if exception.stderr:
        print('!! stderr was:')
        print(exception.stderr)
As a side note, I myself have had issues a few times when I performed many operations on a git.Repo object before using it to interact with the back-end (i.e. git itself). In my opinion there sometimes seem to be data caching issues on the GitPython side, and a lack of synchronisation between the data in the repository (the .git directory) and the data structures in the git.Repo object.
EDIT:
Ok, the problem seems to be with pulling on detached head - which is probably not what you want to do anyway.
Still, you can work around the problem. Since from the detached HEAD you would do git checkout master only in order to do git pull, and then go back to the detached HEAD, you can skip pulling and instead use git fetch <remote> <source>:<destination>, like this: git fetch origin master:master. This fetches the remote and merges your local master branch with its remote tracking branch without checking it out, so you can stay in the detached HEAD state all along without any problems. See this SO answer for more details on this unconventional use of the fetch command: https://stackoverflow.com/a/23941734/4973698
With GitPython, the code could look something like this:
my_repo = git.Repo(local_path)
tag_id = choose_tag()  # Returns the position of an existing tag in my_repo.tags
my_repo.head.reference = my_repo.tags[tag_id]
my_repo.head.reset(index=True, working_tree=True)
fetch_info = my_repo.remotes.origin.fetch('master:master')
for info in fetch_info:
    print('{} {} {}'.format(info.ref, info.old_commit, info.flags))
and it would print something like this:
master 11249124f123a394132523513 64
... so flags equal 64. What does that mean? When you do print(git.FetchInfo.FAST_FORWARD) the result is 64, so the fetch was a fast-forward, which means your local branch was successfully merged with the remote tracking branch; in other words, you have effectively executed git pull origin master without checking out master.
Important note: this kind of fetch works only if your branch can be merged with the remote branch using a fast-forward merge.
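If you want to guard against the non-fast-forward case programmatically, you can inspect the flags on the fetch results; a minimal sketch (FetchInfo.ERROR and FetchInfo.REJECTED are standard GitPython flags):
for info in my_repo.remotes.origin.fetch('master:master'):
    if info.flags & git.FetchInfo.FAST_FORWARD:
        print('fast-forwarded {}'.format(info.ref))
    elif info.flags & (git.FetchInfo.ERROR | git.FetchInfo.REJECTED):
        print('{} was not updated; a non-fast-forward merge would be needed'.format(info.ref))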

The answer is that trying to pull on a detached HEAD is not a good idea, even if some smart clients handle this case nicely.
So my solution in this case is to check out the latest version of my branch (master), pull, then check out the desired tagged version again.
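In GitPython, that workaround could look roughly like this (a minimal sketch, assuming the branch is master):
# Re-attach HEAD to the local branch, pull, then detach again on the tag.
my_repo = git.Repo(local_path)
my_repo.heads.master.checkout()
my_repo.remotes.origin.pull()
tag_id = choose_tag()  # Returns the position of an existing tag in my_repo.tags
my_repo.head.reference = my_repo.tags[tag_id]
my_repo.head.reset(index=True, working_tree=True)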
However, the poor error message given by GitPython is regrettable.

Related

How do you perform a git command on the root commit in git using system calls in code?

I would like to be able to get the SHA of the root commit in a git repository.
The catch is that I am using a script to automate a certain git task that needs to be performed many times on various repositories.
I am using the function system(), C's standard library function for making system calls, and most languages have an equivalent.
The following process does not work with system():
get SHAs of all commits with system("<git command for listing SHAs here>") ← this outputs text to the command line rather than returning a list of values to the code
find SHA of root commit ← this cannot happen if the code cannot get a list of all commits
run system("<git command here> <SHA of root commit>")
It is possible the command I am looking for looks like this:
system("git checkout root");
If this is the case, what is the command? If this is not the case, what is the appropriate solution? Is there a better alternative to this that doesn't use system() (the function for executing commands in C)?
First, note that there is not necessarily a single root commit: given N ≥ 1 commits there is at least one root, but there could be more than one.
That said, each commit has a backwards link to its parent(s), unless it is a root commit, which by definition has no parent. So given any commit hash, you can find its root(s) by walking the graph backwards. If you start at all reachable commits and walk all paths, you will find all root commits.
There is a Git command that does precisely that: git rev-list. You give it some set of starting point commit specifiers, and it walks the graph. By default, it emits every commit hash ID as it comes across it, but it takes many options, including those that limit its output. For instance, it has the --min-parents and --max-parents options that tell it to emit only commits that have at least min, and at most max, parents. Hence:
git rev-list --all --max-parents=0
emits all root commits, as found from all references (--all).
[git rev-list] outputs text to the command line rather than returning a list data structure to code
It outputs text to standard output. Any sensible programming language and operating system offers a way to capture that output:
import subprocess

proc = subprocess.Popen(['git', 'rev-list', '--all', '--max-parents=0'],
                        stdout=subprocess.PIPE)
output = proc.stdout.read()
result = proc.wait()
for instance. (If using Python 3, note that output is made up of bytes rather than str.) You can then parse the output into a series of lines, to find the root commits. If there is more than one root, it's up to you to decide what to do about this.
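For example, a minimal continuation of the snippet above that turns the captured output into a list of hash strings:
# Decode the bytes and keep one hash per non-empty line.
root_hashes = [line for line in output.decode('utf-8').splitlines() if line]
if len(root_hashes) != 1:
    print('warning: {} root commits found'.format(len(root_hashes)))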
Since git rev-list is a plumbing command, its output is generally designed to be machine readable.
system("git rebase <SHA of root commit>")
It's rarely sensible to rebase a complex history, but if you have a simple history, this could be fine. Having a simple history may also guarantee you a single root commit: it could be wise to verify (using the output of git rev-list --parents, for instance) that you do in fact have a simple history.
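Such a check might look like this (a sketch, assuming the current directory is the repository; more than two fields on a --parents line means a merge commit):
import subprocess

# git rev-list --parents prints: <commit> [<parent> [<parent> ...]]
out = subprocess.check_output(['git', 'rev-list', '--parents', '--all'])
for line in out.decode().splitlines():
    if len(line.split()) > 2:
        raise SystemExit('history contains merge commits; rebasing it is risky')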

Find all deleted files in a git repository along with who deleted them

I have a project under version control with Git. In this project there is a "grid" of files which are organized like
/parts
/a
01.src
02.src
...
90.src
/b
01.src
02.src
...
90.src
/...
(It doesn't matter for the question, but maybe it helps to know that these numbered files are small excisions from a musical score.)
These numbered files are generated by a script, and one part of our work is deleting those files that are not used in the musical score.
Now I would like to retrieve information on who deleted each file (as part of our project documentation and workflow). Information retrieval is done from a Python script.
I have a working approach, but it is extremely inefficient because it calls Git as a subprocess for each file in question, which may be well over 1,000 times.
What I can do is call the following for each file that is missing from the directory tree:
git log --pretty=format:"%an" --diff-filter=D -- FILENAME
This gives me the author name of the last commit affecting the file, i.e. the deleting one. This works correctly, but as said, I have to spawn a new subprocess for each deleted file.
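For reference, the per-file version looks roughly like this in Python (a sketch; deleted_files stands in for however you collect the missing paths):
import subprocess

# One git subprocess per deleted file -- correct, but very slow.
for filename in deleted_files:  # deleted_files: hypothetical list of missing paths
    authors = subprocess.check_output(
        ['git', 'log', '--pretty=format:%an', '--diff-filter=D', '--', filename])
    # The first line is the author of the most recent deleting commit.
    print('{}: {}'.format(filename, authors.decode().splitlines()[0]))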
I can do the same with a for loop on the shell:
for delfile in $(git log --all --pretty=format: --name-only --diff-filter=D | sort -u); do echo $delfile: $( git log --pretty=format:"%an" --diff-filter=D -- $delfile); done
But this is really slow, which is understandable because it spawns a new git call for every single file (just as if I'd do it from Python).
So the bottom line is: Is there an efficient way to ask Git about
all files that have been deleted from the repository
(possibly restricted to a subdirectory)
along with the author name of the last commit touching each file
(or actually: The author who deleted the file)
?
It seems my last comment put me on the right track:
git log --diff-filter='D|R' --pretty=format:'%an' --name-only parts
gives me the right thing:
--diff-filter filters the right commits
--pretty=format:'%an' returns only the author
--name-only returns a list of deleted files
So as a result I get something like
Author-1
deleted-file-1
deleted-file-2
Author-2
deleted-file-3
deleted-file-4
Author-1
deleted-file-5
This doesn't give me any more information on the commits, but I don't need that for my use case. This result can easily be processed from within Python.
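A minimal parsing sketch (assuming the layout shown above, and that in this project deleted paths contain a '/' while author names do not; real git log output may also contain blank separator lines, which are skipped here):
import subprocess

out = subprocess.check_output(
    ['git', 'log', '--diff-filter=D', '--pretty=format:%an', '--name-only', 'parts'])
deleted_by = {}
author = None
for line in out.decode().splitlines():
    if not line.strip():
        continue                             # skip blank separator lines
    elif '/' not in line:
        author = line                        # an author line starts a new group
    else:
        deleted_by.setdefault(line, author)  # first hit = most recent deletion
print(deleted_by)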
(For anybody else landing on this page: if you need a similar thing but also want more information on the commits, you can modify the --pretty=format:'..' option. See http://git-scm.com/book/en/Git-Basics-Viewing-the-Commit-History for a list of items that can be displayed.)

Automating commit and push through mercurial from script

What I would like is to run a script that automatically checks for new assets (files that aren't code) that have been submitted to a specific directory, and then every so often automatically commits those files and pushes them.
I could make a script that does this through the command line, but I was mostly curious whether Mercurial offers any special functionality for this. Specifically, I'd really like some kind of return/error code so that my script knows if the process breaks at any point, so I can send an email with the error to specific developers. For example, if the push fails because a pull is necessary first, I'd like the script to get a code so that it knows this and can handle it properly.
I've tried researching this and can only find things like automatically doing a push after a commit, which isn't exactly what I'm looking for.
You can always check the exit codes of the commands you use:
hg add (if new, unversioned files appeared in WC) "Returns 0 if all files are successfully added": non-zero means "some troubles here, not all files added"
hg commit "Returns 0 on success, 1 if nothing changed": 1 means "no commit, nothing to push"
hg push "Returns 0 if push was successful, 1 if nothing to push"
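A minimal sketch of that flow in Python (send_error_mail is a hypothetical notification helper you would supply):
import subprocess

def run(*cmd):
    # subprocess.call returns the command's exit code.
    return subprocess.call(cmd)

run('hg', 'add')                                   # 0 = all new files added
rc = run('hg', 'commit', '-m', 'auto: new assets')
if rc == 0:                                        # 0 = a commit was made
    if run('hg', 'push') not in (0, 1):            # 1 only means nothing to push
        send_error_mail('hg push failed; a pull may be needed first')  # hypothetical helper
# rc == 1 means nothing changed, so there is nothing to push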

How do I modify gitstats to only utilize a specified file extension for its statistics?

The website of the statistics generator in question is:
http://gitstats.sourceforge.net/
Its git repository can be cloned from:
git clone git://repo.or.cz/gitstats.git
What I want to do is something like:
./gitstats --ext=".py" /input/foo /output/bar
Failing being able to easily pass the above option without heavy modification, I'd just hard-code the file extension I want to be included.
However, I'm unsure of the relevant section of code to modify and even if I did know, I'm unsure of how to start such modifications.
It seems like it'd be rather simple, but alas...
I found this question today while looking for the same thing. After reading sinelaw's answer I looked into the code and ended up forking the project.
https://github.com/ShawnMilo/GitStats
I added an "exclude_extensions" config option. It doesn't affect all parts of the output, but it's getting there.
I may end up doing a pretty extensive rewrite once I fully understand everything it's doing with the git output. The original project was started almost exactly four years ago today and there's a lot of clean-up that can be done due to many updates to the standard library and the Python language.
EDIT: apparently even the previous solution below only affects the "Files" stats page, which is not that interesting. I'm trying to find something better. The line we need to fix is 254, this one:
lines = getpipeoutput(['git rev-list --pretty=format:"%%at %%ai %%aN <%%aE>" %s' % getcommitrange('HEAD'), 'grep -v ^commit']).split('\n')
Previous attempt was:
Unfortunately, it seems git does not provide options for easily filtering by the files in a commit (in git log and git rev-list). This solution doesn't really filter all the statistics for certain file types (such as the statistics on tags), but it does so for the part that calculates activity by number of lines changed.
So the best I could come up with is at line 499 of gitstats (the main script):
res = int(getpipeoutput(['git ls-tree -r --name-only "%s"' % rev, 'wc -l']).split('\n')[0])
You can change that by either adding a pipe into grep in the command, like this:
res = int(getpipeoutput(['git ls-tree -r --name-only "%s"' % rev, 'grep \\.py$', 'wc -l']).split('\n')[0])
Or, you could split out the 'wc -l' part, get the output of git ls-tree into a list of strings, and filter the resulting file names using the fnmatch module (and then count the lines in each file, possibly using 'wc -l'), but that sounds like overkill for the specific problem you're trying to solve.
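A sketch of that fnmatch variant (not a drop-in patch; getpipeoutput and rev come from the surrounding gitstats code, and the extension is hard-coded):
import fnmatch

# List the tree at this revision ourselves and count only matching files,
# producing the same number the grep pipeline above would.
names = getpipeoutput(['git ls-tree -r --name-only "%s"' % rev]).split('\n')
res = len(fnmatch.filter(names, '*.py'))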
Still doesn't solve the problem (the rest of the stats will ignore this filter), but hopefully helpful.

PySVN error: URL doesn't exist

I got an SVN repository copied onto my computer using svnsync. Now when I try to replay it using PySVN it fails at a specific revision (29762) with the message:
pysvn._pysvn_2_6.ClientError: URL 'svn://svn.zope.org/repos/main/ZODB/trunk/src/Persistence' doesn't exist
I can check out or update up to the previous revision (29761) fine, but after that I get this error.
My objective is to analyze the code structure and its evolution, so I have
client.update(path,
              revision=pysvn.Revision(pysvn.opt_revision_kind.number, RevNumber),
              ignore_externals=False)
within a for loop that increments RevNumber
I am ok with ignoring this specific revision, so if there is a way around it that will allow my checked-out code to progress and be analyzed, that will be fine (as long as there aren't many more instances of this happening).
Nevertheless, if my repo is a copy of a working repo, why doesn't it work and how does the original one function properly?
Although the error message doesn't hint at it, I believe it was caused by running out of disk space. After deleting other files on the drive and re-running the script, it worked fine.
try:
    client.update(path,
                  revision=pysvn.Revision(pysvn.opt_revision_kind.number, RevNumber),
                  ignore_externals=False)
except pysvn.ClientError:
    print("Revision skipped at", RevNumber)
    continue
This does not solve the underlying problem, but you can use try/except so your code can go on, if you are OK with omitting some revisions, as you've said.
