How can I use GitPython to determine whether:
My local branch is ahead of the remote (I can safely push)
My local branch is behind the remote (I can safely pull)
My local branch has diverged from the remote?
To check if the local and remote are the same, I'm doing this:
def local_and_remote_are_at_same_commit(repo, remote):
    local_commit = repo.commit()
    remote_commit = remote.fetch()[0].commit
    return local_commit.hexsha == remote_commit.hexsha
See https://stackoverflow.com/a/15862203/197789
E.g.
commits_behind = repo.iter_commits('master..origin/master')
and
commits_ahead = repo.iter_commits('origin/master..master')
Then you can use something like the following to go from iterator to a count:
count = sum(1 for c in commits_ahead)
(You may want to fetch from the remote before running iter_commits, e.g. repo.remotes.origin.fetch())
This was last checked with GitPython 1.0.2.
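Putting the two ranges together gives a complete ahead/behind/diverged check. A minimal sketch, assuming the branch is master and the remote is origin (adjust the names to your setup):

def classify_against_remote(repo):
    repo.remotes.origin.fetch()
    ahead = sum(1 for _ in repo.iter_commits('origin/master..master'))
    behind = sum(1 for _ in repo.iter_commits('master..origin/master'))
    if ahead and behind:
        return 'diverged'
    if ahead:
        return 'ahead'    # safe to push
    if behind:
        return 'behind'   # safe to pull
    return 'up to date'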
The following worked better for me; I got it from this Stack Overflow answer:
commits_diff = repo.git.rev_list('--left-right', '--count', f'{branch}...{branch}@{{u}}')
num_ahead, num_behind = commits_diff.split('\t')
print(f'num_commits_ahead: {num_ahead}')
print(f'num_commits_behind: {num_behind}')
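Wrapped into a reusable helper (the function name is mine, and this assumes the branch has an upstream configured; fetch first so the counts are current):

def ahead_behind(repo, branch='master'):
    repo.remotes.origin.fetch()
    counts = repo.git.rev_list('--left-right', '--count', f'{branch}...{branch}@{{u}}')
    ahead, behind = (int(n) for n in counts.split('\t'))
    return ahead, behind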
Related
I would like to determine the remote branch that is associated with the current branch (its tracking branch).
The solution that I found works, but it feels strange that I have to parse the configuration to achieve this.
Is there any more elegant solution?
repo = git.Repo(path)
branch = repo.active_branch
cfg = branch.config_reader().config
# hand-crafting the section name in the next line just seems clumsy
remote = cfg.get(f'branch "{branch.name}"', "remote")
Perhaps git.Head.tracking_branch() and git.Reference.remote_name can give you what you're looking for?
e.g.
repo = git.Repo(path)
branch = repo.active_branch
remote_name = branch.tracking_branch().remote_name
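Note that tracking_branch() returns None when the active branch has no upstream configured, so a guard may be needed; a minimal sketch:

repo = git.Repo(path)
branch = repo.active_branch
tracking = branch.tracking_branch()
remote_name = tracking.remote_name if tracking is not None else None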
I have the following case:
Two servers running each their own name server (NS)
Each of these servers registers the same object under a different URI. The URI includes the local server's hostname to make it unique.
A third server, the client, tries to target the right server to query information that is available only on that server.
Please note that all of these 3 servers can communicate with each other.
My questions and issues:
The requests from the third server always go to the first server, no matter what, except when I shut down the first NS. Is there something fundamentally wrong with what I'm doing? I guess there is, but I can't figure it out...
Is running separate name servers the root cause? What would be the alternative if this is not allowed? I run multiple name servers for redundancy, since some other upcoming operations can run on either of the first two servers. When I list the contents of each name server (locally on each server), I get the right registration (which includes the hostname).
Is my use of the Pyro4.config.NS_HOST parameter wrong (see its usage in the code below)? What would be the alternative?
My configuration:
Pyro 4-4.63-1.1
Python 2.7.13
Linux OpenSuse (kernel version 4.4.92)
The test code is listed below; I left out details like try blocks, imports, etc.
My server skeleton code (which runs on the first two servers):
daemon = Pyro4.Daemon(local_ip_address)
ns = Pyro4.locateNS()
uri = daemon.register(TestObject())
ns.register("test.object@%s" % socket.gethostname(), uri)
daemon.requestLoop()
The local_ip_address is the one supplied by the user (see below) to contact the correct name server (on the correct server).
The name server is started on each of the first two servers as follows:
python -m Pyro4.naming -n local_ip_address
The local_ip_address is the same as above.
My client skeleton code (which runs on the third server):
target_server_hostname = user_provided_hostname
target_server_ip = user_provided_ip
Pyro4.config.NS_HOST = target_server_ip
uri = "test.object#%s" % target_server_hostname
proxy = Pyro4.Proxy(uri)
proxy._pyroTimeout = my_timeout
proxy._pyroMaxRetries = my_retries
rc, reason = proxy.testAction(target_server_hostname)
if rc != 0:
    print reason
else:
    print "Hostname matches"
If more information is required, please let me know.
Djurdjura.
I think I figured it out. Hope this will be useful to anyone else looking for a similar use case.
You just need to specify where to look for the name server itself. The client code becomes something like the following:
target_server_hostname = user_provided_hostname
target_server_ip = user_provided_ip
# The following statement finds the correct name server
ns = Pyro4.locateNS(host=target_server_ip)
name = "test.object#%s" % target_server_hostname
uri = ns.lookup(name)
proxy = Pyro4.Proxy(uri)
proxy._pyroTimeout = my_timeout
proxy._pyroMaxRetries = my_retries
rc, reason = proxy.testAction(target_server_hostname)
if rc != 0:
    print reason
else:
    print "Hostname matches"
In my case, I guess the alternative would be using a single common name server running on ... the third server (the client server). This server is always on and ready before the other ones. I haven't tried this approach yet.
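For reference, a minimal sketch of that untested alternative: each object server registers with the single name server running on the client host (client_ip is illustrative):

daemon = Pyro4.Daemon(local_ip_address)
ns = Pyro4.locateNS(host=client_ip)  # the one common name server
uri = daemon.register(TestObject())
ns.register("test.object@%s" % socket.gethostname(), uri)
daemon.requestLoop()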
Regards.
D.
PS. Thanks Irmen for your answer.
I'm currently running pygit2 0.24.1 (along with libgit2 0.24.1), working on a repository where I have two branches (say prod and dev).
Every change is first committed to the dev branch and pushed to the remote repository. To do that, I have this piece of code:
repo = Repository('/foo/bar')
repo.checkout('refs/heads/dev')
index = repo.index
index.add('any_file')
index.write()
tree = index.write_tree()
author = Signature('foo', 'foo@bar')
committer = Signature('foo', 'foo@bar')
repo.create_commit('refs/heads/dev', author, committer, 'Just another commit', tree, [repo.head.get_object().hex])
up = UserPass('foo', '***')
rc = RemoteCallbacks(credentials=up)
repo.remotes['origin'].push(['refs/heads/dev'], rc)
This works fine: I can see the local commit and also the remote commit, and the local repo remains clean:
nothing to commit, working directory clean
Next, I check out the prod branch and want to merge the HEAD commit of dev. To do so, I use this other piece of code (assuming I always start checked out on the dev branch):
head_commit = repo.head
repo.checkout('refs/heads/prod')
prod_branch_tip = repo.lookup_reference('HEAD').resolve()
prod_branch_tip.set_target(head_commit.target)
rc = RemoteCallbacks(credentials=up)
repo.remotes['origin'].push(['refs/heads/prod'], rc)
repo.checkout('refs/heads/dev')
I actually can see the branch being merged both locally and remotely, but after this piece of code runs, the committed file always remains in a modified state on branch dev.
On branch dev
Changes to be committed:
(use "git reset HEAD ..." to unstage)
modified: any_file
I'm completely sure no one is modifying that file, though. Actually, a git diff shows nothing. This issue happens only with already committed files (i.e., files that have been committed at least once previously). When files are new, this works perfectly and leaves the file in a clean state.
I'm sure I'm missing some detail but I'm unable to find out what is it. Why is the file left as modified?
EDIT: Just to clarify, my aim is to do a FF (fast-forward) merge. I know the Pygit2 documentation covers doing a non-FF merge, but I'd prefer the first method because it keeps commit hashes across branches.
EDIT 2: After @Leon's comment, I double checked and indeed, git diff shows no output while git diff --cached shows the content that the file had before committing. That's odd, since I can see the change successfully committed in the local and remote repositories, but it looks like afterwards the file is changed back to the previous content...
An example of that:
Having a file with content '12345' committed and pushed, I replace that string with '54321'
I run the code above
git log shows the file committed correctly; on the remote repo I see the file with content '54321', while locally git diff --cached shows this:
@@ -1 +1 @@
-54321
+12345
I would explain the observed problem as follows:
head_commit = repo.head
# This resets the index and the working tree to the old state
# and records that we are in a state corresponding to the commit
# pointed to by refs/heads/prod
repo.checkout('refs/heads/prod')
prod_branch_tip = repo.lookup_reference('HEAD').resolve()
# This changes where refs/heads/prod points. The index and
# the working tree are not updated, but (probably due to a bug in pygit2)
# they are not marked as gone-out-of-sync with refs/heads/prod
prod_branch_tip.set_target(head_commit.target)
rc = RemoteCallbacks(credentials=up)
repo.remotes['origin'].push(['refs/heads/prod'], rc)
# Now we must switch to a state corresponding to refs/heads/dev. It turns
# out that refs/heads/dev points to the same commit as refs/heads/prod.
# But we are already in the (clean) state corresponding to refs/heads/prod!
# Therefore there is no need to update the index and/or the working tree.
# So this simply changes HEAD to refs/heads/dev
repo.checkout('refs/heads/dev')
The solution is to fast-forward the branch without checking it out. The following code is devoid of the described problem:
head_commit = repo.head
prod_branch_tip = repo.lookup_branch('prod')
prod_branch_tip.set_target(head_commit.target)
rc = RemoteCallbacks(credentials=up)
repo.remotes['origin'].push(['refs/heads/prod'], rc)
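As an extra safety net (my addition, not part of the original answer), pygit2's merge_base can confirm that the move really is a fast-forward before retargeting the branch:

head_commit = repo.head
prod_branch = repo.lookup_branch('prod')
# a fast-forward is only valid if prod's tip is an ancestor of the new target
if repo.merge_base(prod_branch.target, head_commit.target) == prod_branch.target:
    prod_branch.set_target(head_commit.target)
else:
    raise Exception('prod has diverged from dev; a real merge is needed')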
I am writing a Python script to get a list of commits that are about to be applied by a git pull operation. The excellent GitPython library is a great base to start, but the subtle inner workings of git are killing me. Now, here is what I have at the moment (simplified and annotated version):
repo = git.Repo(path)                       # get the local repo
local_commit = repo.commit()                # latest local commit
remote = git.remote.Remote(repo, 'origin')  # remote repo
info = remote.fetch()[0]                    # fetch changes
remote_commit = info.commit                 # latest remote commit
if local_commit.hexsha == remote_commit.hexsha:  # local is updated; end
    return
# for every remote commit
while remote_commit.hexsha != local_commit.hexsha:
    authors.append(remote_commit.author.email)  # note the author
    remote_commit = remote_commit.parents[0]    # navigate up to the parent
Essentially it gets the authors for all commits that will be applied in the next git pull. This is working well, but it has the following problems:
When the local commit is ahead of the remote, my code just walks back through all the commits down to the first one.
A remote commit can have more than one parent, and the local commit can be the second parent. This means that my code will never find the local commit in the remote repository.
I can deal with remote repositories being behind the local one: just look in the other direction (local to remote) at the same time; the code gets messy but it works. But this last problem is killing me: now I need to navigate a (potentially unlimited) tree to find a match for the local commit. This is not just theoretical: my latest change was a repo merge which presents this very problem, so my script is not working.
Getting an ordered list of commits in the remote repository, such as repo.iter_commits() does for a local Repo, would be a great help. But I haven't found in the documentation how to do that. Can I just get a Repo object for the Remote repository?
Is there another approach which might get me there, and I am using a hammer to nail screws?
I know this is ages old but I just had to do this for a project and…
head = repo.head.ref
tracking = head.tracking_branch()
return tracking.commit.iter_items(repo, f'{head.path}..{tracking.path}')
(conversely to know how many local commits you have pending to push, just invert it: head.commit.iter_items(repo, f'{tracking.path}..{head.path}'))
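To get the authors of the pending commits, as the question wanted, the same range works with repo.iter_commits (a sketch along the same lines):

head = repo.head.ref
tracking = head.tracking_branch()
authors = [c.author.email for c in repo.iter_commits(f'{head.path}..{tracking.path}')]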
I realized that the tree of commits was always like this: one commit has two parents, and both parents have the same parent. This means that the first commit has two parents but only one grandparent.
So it was not too hard to write a custom iterator to go over commits, including diverging trees. It looks like this:
def repo_changes(commit):
    "Iterator over repository changes starting with the given commit."
    next_parent = None
    yield commit                       # return the first commit itself
    while len(commit.parents) > 0:     # iterate
        same_parent(commit.parents)    # check that there is only one grandparent
        for parent in commit.parents:  # go over all parents
            yield parent               # return each parent
            next_parent = parent       # remember it for the next iteration
        commit = next_parent           # start again from the last parent
The function same_parent() alerts when there are two parents and more than one grandparent. Now it is a simple matter to iterate over the unmerged commits:
for commit in repo_changes(remote_commit):
    if commit.hexsha == local_commit.hexsha:
        return
    authors.append(commit.author.email)
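The same_parent() helper was left out of the answer; judging from its description, it might look roughly like this (hypothetical sketch):

def same_parent(parents):
    "Alert when multiple parents do not share a single grandparent."
    grandparents = {gp.hexsha for p in parents for gp in p.parents}
    if len(parents) > 1 and len(grandparents) > 1:
        print('warning: commit has %d grandparents, tree diverges further' % len(grandparents))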
I have left a few details out for clarity. I never return more than a preestablished number of commits (20 in my case), to avoid going to the end of the repo. I also check beforehand that the local repo is not ahead of the remote repo. Other than that, it is working great! Now I can alert all commit authors that their changes are being merged.
Trying to set up the svn commit with trac using this script.
It is being called without issue, but the problem is this line here:
144 repos = self.env.get_repository()
Because I am calling this remotely, self.env.get_repository() looks for the repository using the server drive and not the local drive mapping. That is, it is looking for E:/Projects/svn/InfoProj and not Y:/Projects/svn/InfoProj.
I noticed a changeset on the Trac site for being able to call get_repository() and pass the path in as a variable, but it seems this hasn't made it into the latest stable release yet.
This version of the script (the one submitted by code monkey) appears to do things differently, but is throwing an error that seems related:
154 if url is None:
155 url = self.env.config.get('project', 'url')
156 self.env.href = Href(url)
157 self.env.abs_href = Href(url)
Lines 156/157 throw this error: Warning: TypeError: 'str' object is not callable
The 10.3 stable version of the script throws a completely different error:
Warning: NameError: global name 'core' is not defined
I'm setting up trac for the first time on a Windows box with a remote repository. I'm using trac 0.11 stable with Python 2.6.
I thought there would have been a lot more people out there trying to commit across servers who had come across this problem. I've looked around and couldn't find a solution. I'm supposing Linux has a more graceful way of handling this.
Thanks in advance.
This is totally do-able and just requires a couple of small hacks... woo hoo!
The problem I was having is that get_repository reads the value of the svn repository from the trac.ini file. This was pointing at E:/ and not at Y:/. The simple fix involves a check to see if the repository is at repository_dir and, if not, falling back to a new variable, remote_repository_dir. The second part of the fix involves removing the error message from cache.py that checks whether the current repository address matches the one being passed in.
As always, use this at your own risk and back everything up before hand!!!
First, open your trac.ini file and add a new variable 'remote_repository_dir' underneath the 'repository_dir' variable. The remote repository dir will point to the mapped drive on your local machine. It should now look something like this:
repository_dir = E:/Projects/svn/InfoProj
remote_repository_dir = Y:/Projects/svn/InfoProj
Next we will modify the api.py file to check for the new variable if it can't find the repository at the repository_dir location. Around :71 you should have something like this:
repository_dir = Option('trac', 'repository_dir', '',
    """Path to local repository. This can also be a relative path
    (''since 0.11'').""")
Underneath this line add:
remote_repository_dir = Option('trac', 'remote_repository_dir', '',
    """Path to remote repository.""")
Next near :156 you will have this:
rtype, rdir = self.repository_type, self.repository_dir
if not os.path.isabs(rdir):
    rdir = os.path.join(self.env.path, rdir)
Change that to this:
rtype, rdir = self.repository_type, self.repository_dir
if not os.path.isdir(rdir):
    rdir = self.remote_repository_dir
if not os.path.isabs(rdir):
    rdir = os.path.join(self.env.path, rdir)
Finally you will need to remove the alert in the cache.py file (note this is not the best way to do this, you should be able to include the remote variable as part of the check, but for now it works).
In cache.py near :97 it should look like this:
if repository_dir:
    # directory part of the repo name can vary on case insensitive fs
    if os.path.normcase(repository_dir) != os.path.normcase(self.name):
        self.log.info("'repository_dir' has changed from %r to %r"
                      % (repository_dir, self.name))
        raise TracError(_("The 'repository_dir' has changed, a "
                          "'trac-admin resync' operation is needed."))
elif repository_dir is None: #
    self.log.info('Storing initial "repository_dir": %s' % self.name)
    cursor.execute("INSERT INTO system (name,value) VALUES (%s,%s)",
                   (CACHE_REPOSITORY_DIR, self.name,))
else: # 'repository_dir' cleared by a resync
    self.log.info('Resetting "repository_dir": %s' % self.name)
    cursor.execute("UPDATE system SET value=%s WHERE name=%s",
                   (self.name, CACHE_REPOSITORY_DIR))
We are going to remove the first part of the if statement so it now should look like this:
if repository_dir is None: #
    self.log.info('Storing initial "repository_dir": %s' % self.name)
    cursor.execute("INSERT INTO system (name,value) VALUES (%s,%s)",
                   (CACHE_REPOSITORY_DIR, self.name,))
else: # 'repository_dir' cleared by a resync
    self.log.info('Resetting "repository_dir": %s' % self.name)
    cursor.execute("UPDATE system SET value=%s WHERE name=%s",
                   (self.name, CACHE_REPOSITORY_DIR))
Warning! Doing this will mean that it no longer gives you an error if your directory has changed and you need a resync.
Hope this helps someone.