My code so far is working, doing the following. I'd like to get rid of the subprocess.call() stuff:
import os
import git
from subprocess import call

repo = git.Repo(repo_path)
repo.remotes.origin.fetch(prune=True)
repo.head.reset(commit='origin/master', index=True, working_tree=True)
# I don't know how to do this using GitPython yet.
os.chdir(repo_path)
call(['git', 'submodule', 'update', '--init'])
My short answer: it's convenient and simple.
Full answer follows. Suppose you have your repo variable:
repo = git.Repo(repo_path)
Then, simply do:
for submodule in repo.submodules:
    submodule.update(init=True)
You can then do all the things with your submodule that you do with your ordinary repo, via submodule.module() (which is of type git.Repo), like this:
sub_repo = submodule.module()
sub_repo.git.checkout('devel')
sub_repo.remote('maybeorigin').fetch()
I use such things in my own porcelain, built on top of Git's porcelain, to manage some projects.
Also, to do it more directly, you can, instead of using call() or subprocess, just do this:
repo = git.Repo(repo_path)
output = repo.git.submodule('update', '--init')
print(output)
You can print it because the method returns the output you usually get by running git submodule update --init (obviously, the print() part depends on your Python version).
Short answer: You can’t.
Full answer: You can't, and there is also no point. GitPython is not a complete reimplementation of Git. It just provides a high-level interface to some common operations. While a few operations are implemented directly in Python, a lot of calls actually use the Git command-line interface to do the work.
Your fetch line, for example, does this. Under the hood, a trick is used to make some calls look like Python, although they invoke the Git executable to produce the result, using subprocess as well.
So you could try to figure out how the git cmd interface that GitPython offers works to support those calls (you can access the instance of that cmd handler as repo.git), or you can just continue using the "boring" subprocess calls directly.
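For example, the handler turns attribute access into git subcommands, so all three steps from the question can go through it. This is a minimal sketch, assuming repo_path is defined as in the question:
import git

repo = git.Repo(repo_path)  # assumes repo_path points at an existing clone

# Each attribute of repo.git is a git subcommand; positional arguments
# become CLI arguments, so these mirror the original snippets:
repo.git.fetch('--prune', 'origin')
repo.git.reset('--hard', 'origin/master')
repo.git.submodule('update', '--init')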
I am writing a watchman command with watchman-make and I'm at a loss when trying to access exactly what was changed in the directory. I want to run my upload.py script, and inside the script I would like to access the filenames of newly created files in /var/spool/cups-pdf/ANONYMOUS.
so far I have
$ watchman-make -p '/var/spool/cups-pdf/ANONYMOUS' --run 'python /home/pi/upload.py'
I'd like to add another argument to python upload.py so that I have the exact filepath to the newly created file and can send the new file over to my database in upload.py.
I've been looking at the docs of watchman and the closest thing I can think to use is a trigger object. Please help!
Solution with watchman-wait:
Assuming project layout like this:
/posts/_SUBDIR_WITH_POST_NAME_/index.md
/Scripts/convert.sh
And the shell script like this:
#!/bin/bash
# File: convert.sh
SrcDirPath=$(cd "$(dirname "$0")/../"; pwd)
cd "$SrcDirPath"
echo "Converting: $SrcDirPath/$1"
Then we can launch watchman-wait like this:
watchman-wait . --max-events 0 -p 'posts/**/*.md' | while read line; do ./Scripts/convert.sh "$line"; done
When we change the file /posts/_SUBDIR_WITH_POST_NAME_/index.md, the output will look like this:
...
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
...
watchman-make is intended to be used together with tools that will perform a follow-up query of their own to discover what they want to do as a next step. For example, running the make tool will cause make to stat the various deps to bring things up to date.
That means that your upload.py script needs to know how to do this for itself if you want to use it with watchman.
You have a couple of options, depending on how sophisticated you want things to be:
Use pywatchman to issue an ad-hoc query
If you want to be able to run upload.py whenever you want and have it figure out the right thing (just like make would do), then you can have it ask watchman directly. You can have upload.py use pywatchman (the python watchman client) to do this. pywatchman will get installed if the watchman configure script thinks you have a working python installation. You can also pip install pywatchman. Once you have it available and in your PYTHONPATH:
import os
import pywatchman

client = pywatchman.client()
client.query('watch-project', os.getcwd())
result = client.query('query', os.getcwd(), {
    "since": "n:pi_upload",
    "fields": ["name"]})
print(result["files"])
This snippet uses the since generator with a named cursor to discover the list of files that changed since the last query was issued using that same named cursor. Watchman will remember the associated clock value for you, so you don't need to complicate your script with state tracking. We're using the name pi_upload for the cursor; the name needs to be unique among the watchman clients that might use named cursors, so naming it after your tool is a good idea to avoid potential conflict.
This is probably the most direct way to extract the information you need without requiring that you make more invasive changes to your upload script.
Use pywatchman to initiate a long running subscription
This approach will transform your upload.py script so that it knows how to directly subscribe to watchman, so instead of using watchman-make you'd just directly run upload.py and it would keep running and performing the uploads. This is a bit more invasive and is a bit too much code to try and paste in here. If you're interested in this approach then I'd suggest that you take the code behind watchman-wait as a starting point. You can find it here:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait
The key piece of this that you might want to modify is this line:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait#L169
which is where it receives the list of files.
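If you go down that road, a heavily simplified subscription loop might look like this. This is only a sketch modeled on watchman-wait: the subscription name 'pi_upload_sub' and process_files() are placeholders, and error handling is omitted:
import os
import pywatchman

client = pywatchman.client(timeout=None)  # no timeout: block waiting for events
watch = client.query('watch-project', os.getcwd())

client.query('subscribe', watch['watch'], 'pi_upload_sub', {'fields': ['name']})

def process_files(files):
    print('changed:', files)  # replace with the actual upload logic

while True:
    client.receive()  # block until watchman sends something
    for item in client.getSubscription('pi_upload_sub') or []:
        if 'files' in item:
            process_files(item['files'])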
Why not triggers?
You could use triggers for this, but we're steering folks away from triggers because they are hard to manage. A trigger runs in the background and sends its output to the watchman log file, and it can be difficult to tell whether it is running, or to stop it.
Speaking of unix, what about watchman-wait?
We also have a command that emits the list of changed files as they change. You could potentially stream the output from watchman-wait in your upload.py. This would give it some similarities with the subscription approach, but without directly using the pywatchman client. The interface is closer to the unix model: it feeds the list of changed files to your script on stdin.
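For instance, here is a minimal sketch of upload.py consuming watchman-wait as a subprocess; the watched directory is taken from the question, and handle_new_file() is a hypothetical placeholder:
import subprocess

proc = subprocess.Popen(
    ['watchman-wait', '/var/spool/cups-pdf/ANONYMOUS', '--max-events', '0'],
    stdout=subprocess.PIPE,
)
for raw_line in proc.stdout:
    # watchman-wait prints one changed path per line
    handle_new_file(raw_line.decode().strip())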
I found this script, which accomplishes the same task, but in Bash.
I'm working with GitPython to create a complete backup of a repository, but I can't seem to figure out how to get all the branches to show up without having to manually check them out.
Repo.clone_from(repo, folder)
I'm using this line in a loop to copy all the files from the repos.
I have recently stumbled upon this thread, but have not found any solution, so I have brewed my own. It is better than nothing, neither optimal nor pretty, but at least it works:
from git import Repo

# Clone master
repo = Repo.clone_from('https://github.com/user/stuff.git', 'work_dir', branch='master')
# Clone the other branches as needed and set them up to track the remote
for b in repo.remote().fetch():
    repo.git.checkout('-B', b.name.split('/')[1], b.name)
Explanation:
Clone the master branch
Fetch the remote branch names
Check out the branches, creating them locally as needed
PS: I think GitPython does not have a built-in method to do this with clone_from.
UPDATE:
The no_single_branch=True option in GitPython is equivalent to --no-single-branch in the git CLI (other 'valueless' arguments can also be supplied with a True value):
repo = Repo.clone_from('https://github.com/user/stuff.git', 'work_dir', branch='master', no_single_branch=True)
for b in repo.remotes[0].fetch():
    print(b.name)
So, basically I have 2 versions of a project, and for some users I want to use the latest version, while for others I want to use the older version. Both of them have the same file names, and multiple users will use them simultaneously. To accomplish this, I want to call a function from a different git branch without actually switching the branch.
Is there a way to do so?
For example, when my current branch is v1 and the other branch is v2, then depending on the value of the variable flag, call the function:
if flag == 1:
    # import function f1() from branch v2 and use it
    return f1()
else:
    # use f1() from the current branch v1
    return f1()
Without commenting on why you need to do that, you can simply check out your repo twice: once for branch1 and once for branch2 (without cloning twice).
See "git working on two branches simultaneously".
You can then make your script aware of its current path (/path/to/branch1) and of the relative path to the other branch (../branch2/...).
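For example, here is a minimal sketch of loading the other checkout's copy; the sibling-checkout layout and the funcs.py/f1 names are assumptions for illustration:
import importlib.util
import os

# Assumed layout: this script lives in the branch1 checkout, and a second
# checkout of branch2 sits next to it; both contain a hypothetical funcs.py.
here = os.path.dirname(os.path.abspath(__file__))
v2_module_path = os.path.join(here, '..', 'branch2', 'funcs.py')

spec = importlib.util.spec_from_file_location('funcs_v2', v2_module_path)
funcs_v2 = importlib.util.module_from_spec(spec)
spec.loader.exec_module(funcs_v2)

result = funcs_v2.f1()  # the branch-v2 version of f1()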
You must have both versions of the code present / accessible in order to invoke both versions of the code dynamically.
The by-far-simplest way to accomplish this is to have both versions of the code present in different locations, as in VonC's answer.
Since Python is what it is, though, you could dynamically extract specific versions of specific source files, compile them on the fly (using dynamic imports and temporary files, or exec and internal strings), and hence run code that does not show up in casual perusal of the program source. I do not encourage this approach: it is difficult (though not very difficult) and error-prone, tends towards security holes, and is overall a terrible way to work unless you're writing something like a Python debugger or IDE. But if this is what you want to do, you simply decompose the problem into:
examine and/or extract specific files from specific commits (git show, git cat-file -p, etc.), and
dynamically load or execute code from file in file system or from string in memory.
The first is a Git programming exercise (and is pretty trivial, git show 1234567:foo.py or git show branch:foo.py: you can redirect the output to a file using either shell redirection or Python's subprocess module), and when done with files, the second is a Python programming exercise of moderate difficulty: see the documentation, paying particularly close attention to importlib.
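Put together, a rough sketch of both steps looks like this; the branch and file names are illustrative, and this is the discouraged dynamic approach, not a recommendation:
import subprocess

# Step 1: extract foo.py as it exists on branch v2, without switching branches.
source = subprocess.check_output(['git', 'show', 'v2:foo.py'], text=True)

# Step 2: execute the extracted source into its own namespace.
namespace = {}
exec(compile(source, '<v2:foo.py>', 'exec'), namespace)

result = namespace['f1']()  # assumes foo.py defines f1()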
All I can find is this reference:
Is it possible to use POD (plain old documentation) with Python?
which looks like you have to generate a whole separate set of docs to go with the code.
I would like to try Python for making cmdline utils, but when I do this with Perl I can embed the docs directly in the source, and use the Pod2Usage module along with Getopt so that any of my scripts can be run like this:
cmd --man
and this triggers the pod system to dump documentation that is embedded in the script in man-page format. It can also generate shorter (synopsis), or medium formats.
It looks like I could use the pydoc code and kind of reverse engineer it to sort-of do the task (at least showing the full documentation), but I am hoping something better already exists.
The python-modargs package lets you create self-documenting command line interfaces. You define a function for each command you want to make available, and the function's docstring becomes the help text for that command. The function's keyword arguments become named arguments, and python-modargs will parse inline comments after the keyword arguments as help text for each argument.
I use python-modargs to generate the command line interface for dexy, here is the module which defines the commands:
https://github.com/ananelson/dexy/blob/027954f9234363d506225d40b675b3d6478994f4/dexy/commands.py#L144
You need to implement a help_command method to get the generated help; it's a one-liner.
I think pydoc may be what you're looking for.
It certainly isn't quite the same as POD, as you have to call pydoc itself (e.g. pydoc myscript.py), but I guess it can be a good starting point.
Of course, you can always add pydoc support to your script by importing from it and using its functions/classes.
Check out pydoc's own CLI implementation for the best example.
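As a rough sketch of the --man behavior (assuming the documentation lives in the script's own docstring), you could page the module's documentation through pydoc:
#!/usr/bin/env python
"""My command line tool.

The long documentation goes here, playing the role of the POD body.
"""
import pydoc
import sys

if __name__ == '__main__':
    if '--man' in sys.argv:
        # Page this module's own docs, roughly like Perl's pod2usage -man.
        pydoc.pager(pydoc.render_doc(sys.modules[__name__]))
        sys.exit(0)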
I am trying to get the mercurial revision number/id (it's a hash, not a number) programmatically in python.
The reason is that I want to add it to the css/js files on our website like so:
<link rel="stylesheet" href="example.css?{% mercurial_revision "example.css" %}" />
So that whenever a change is made to the stylesheet, it will get a new url and no longer use the old cached version.
OR if you know where to find good documentation for the mercurial python module, that would also be helpful. I can't seem to find it anywhere.
My Solution
I ended up using subprocess to just run a command that gets the hg node. I chose this solution because the API is not guaranteed to stay the same, but the bash interface probably will:
import subprocess

def get_hg_rev(file_path):
    pipe = subprocess.Popen(
        ["hg", "log", "-l", "1", "--template", "{node}", file_path],
        stdout=subprocess.PIPE
    )
    return pipe.stdout.read()
example use:
>>> path_to_file = "/home/jim/workspace/lgr/pinax/projects/lgr/site_media/base.css"
>>> get_hg_rev(path_to_file)
'0ed525cf38a7b7f4f1321763d964a39327db97c4'
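To wire that into the {% mercurial_revision %} tag from the question, a minimal sketch of a custom Django template tag could look like this; the module layout and the import path of get_hg_rev are assumptions:
# templatetags/mercurial_tags.py (hypothetical location inside a Django app)
from django import template

from myutils import get_hg_rev  # wherever the function above lives

register = template.Library()

@register.simple_tag
def mercurial_revision(file_path):
    # Cache-bust with the hash of the changeset that last touched the file.
    return get_hg_rev(file_path)
After a {% load mercurial_tags %}, the template usage from the question should then work as written.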
It's true there's no official API, but you can get an idea about best practices by reading other extensions, particularly those bundled with hg. For this particular problem, I would do something like this:
from mercurial import ui, hg
from mercurial.node import hex
repo = hg.repository('/path/to/repo/root', ui.ui())
fctx = repo.filectx('/path/to/file', 'tip')
hexnode = hex(fctx.node())
Update: at some point the parameter order changed; now it's like this:
repo = hg.repository(ui.ui(), '/path/to/repo/root' )
Do you mean this documentation?
Note that, as stated in that page, there is no official API, because they still reserve the right to change it at any time. But you can see the list of changes in the last few versions, it is not very extensive.
An updated, cleaner subprocess version (uses .check_output(), added in Python 2.7/3.1) that I use in my Django settings file for a crude end-to-end deployment check (I dump it into an HTML comment):
import subprocess
HG_REV = subprocess.check_output(['hg', 'id', '--id']).strip()
You could wrap it in a try if you don't want some strange hiccup to prevent startup:
try:
HG_REV = subprocess.check_output(['hg', 'id', '--id']).strip()
except OSError:
HG_REV = "? (Couldn't find hg)"
except subprocess.CalledProcessError as e:
HG_REV = "? (Error {})".format(e.returncode)
except Exception: # don't use bare 'except', mis-catches Ctrl-C and more
# should never have to deal with a hangup
HG_REV = "???"
If you are using Python 2, you want to use hglib.
I don't know what to use if you're using Python 3, sorry. Probably hgapi.
Contents of this answer
Mercurial's APIs
How to use hglib
Why hglib is the best choice for Python 2 users
If you're writing a hook, that discouraged internal interface is awfully convenient
Mercurial's APIs
Mercurial has two official APIs.
The Mercurial command server. You can talk to it from Python 2 using the hglib (wiki, PyPI) package, which is maintained by the Mercurial team.
Mercurial's command-line interface. You can talk to it via subprocess, or hgapi, or somesuch.
How to use hglib
Installation:
pip install python-hglib
Usage:
import hglib
client = hglib.open("/path/to/repo")
commit = client.log("tip")[0]
print commit.author
More usage information on the hglib wiki page.
Why hglib is the best choice for Python 2 users
Because it is maintained by the Mercurial team, and it is what the Mercurial team recommend for interfacing with Mercurial.
From Mercurial's wiki, the following statement on interfacing with Mercurial:
For the vast majority of third party code, the best approach is to use Mercurial's published, documented, and stable API: the command line interface. Alternately, use the CommandServer or the libraries which are based on it to get a fast, stable, language-neutral interface.
From the command server page:
[The command server allows] third-party applications and libraries to communicate with Mercurial over a pipe that eliminates the per-command start-up overhead. Libraries can then encapsulate the command generation and parsing to present a language-appropriate API to these commands.
The Python interface to the Mercurial command-server, as said, is hglib.
The per-command overhead of the command line interface is no joke, by the way. I once built a very small test suite (only about 5 tests) that used hg via subprocess to create, commit by commit, a handful of repos with e.g. merge situations. Throughout the project, the runtime of the suite stayed between 5 and 30 seconds, with nearly all the time spent in the hg calls.
If you're writing a hook, that discouraged internal interface is awfully convenient
The signature of a Python hook function is like so:
# In the hgrc:
# [hooks]
# preupdate.my_hook = python:/path/to/file.py:my_hook
def my_hook(
        ui, repo, hooktype,
        ... hook-specific args, find them in `hg help config` ...,
        **kwargs):
ui and repo are part of the aforementioned discouraged unofficial internal API. The fact that they are right there in your function args makes them terribly convenient to use, such as in this example of a preupdate hook that disallows merges between certain branches.
def check_if_merge_is_allowed(ui, repo, hooktype, parent1, parent2, **kwargs):
    from_ = repo[parent2].branch()
    to_ = repo[parent1].branch()
    ...
    # return True if the hook fails and the merge should not proceed.
If your hook code is not so important, and you're not publishing it, you might choose to use the discouraged unofficial internal API. If your hook is part of an extension that you're publishing, better use hglib.
Give the keyword extension a try.
FWIW, to avoid fetching that value on every page/view render, I just have my deploy step put it into the settings.py file. Then I can reference settings.REVISION without all the overhead of accessing mercurial and/or another process. Do you ever have this value change without reloading your server?
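For example, a deploy step along these lines writes the value out once (the hg_revision.py module name is an assumption):
import subprocess

# Run once at deploy time, from the repo root.
rev = subprocess.check_output(['hg', 'id', '--id']).strip().decode()
with open('hg_revision.py', 'w') as f:
    f.write('REVISION = {!r}\n'.format(rev))
settings.py can then simply do from hg_revision import REVISION.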
I wanted to do the same thing the OP wanted to do: get hg id -i from a script (the tip revision of the whole REPOSITORY, not of a single FILE in that repo), but I did not want to use popen. The code from brendan got me started, but it wasn't what I wanted.
So I wrote this... Comments/criticism welcome. This gets the tip rev in hex as a string.
from mercurial import ui, hg
# from mercurial.node import hex  # should I have used this?

def getrepohex(reporoot):
    repo = hg.repository(ui.ui(), reporoot)
    revs = repo.revs('tip')
    if len(revs) == 1:
        return str(repo.changectx(revs[0]))
    else:
        raise Exception("Internal failure in getrepohex")