What I would like is to run a script that automatically checks for new assets (files that aren't code) that have been submitted to a specific directory, and then every so often automatically commits those files and pushes them.
I could make a script that does this through the command line, but I was mostly curious whether Mercurial offers any special functionality for this. Specifically, I'd really like some kind of return code so that my script knows if the process breaks at any point and can send an email with the error to specific developers. For example, if the push fails because a pull is necessary first, I'd like the script to get a code telling it so, so it can handle the situation properly.
I've tried researching this and can only find things like automatically doing a push after a commit, which isn't exactly what I'm looking for.
You can always check the exit code of each command you use:
hg add (if new, unversioned files appeared in the working copy): "Returns 0 if all files are successfully added"; a non-zero code means trouble, i.e. not all files were added
hg commit: "Returns 0 on success, 1 if nothing changed"; 1 means there was no commit, hence nothing to push
hg push: "Returns 0 if push was successful, 1 if nothing to push"
A feature will be launched which will create logs as the run proceeds. So I have to write a script which keeps checking a directory during the runtime of the above feature to see whether any log files have been created, and in case I see logs being created, I will take further actions. The tricky part here is that I do not have access to watch, cron jobs or anything like that at the customer site, so any other suggestion would be appreciated. Also, I cannot install any Python libraries, so I need something very basic.
I haven't tried anything yet, but I am looking to see whether any suitable function exists. I am planning to use a while loop to keep monitoring the directory.
If you cannot convince the client to install inotify-tools in order to have access to inotifywait, then you need to track not only the existence of an output file but, more importantly, whether the output file is closed (whether the process is finished with that file).
In other words, the process creating the file would have a second "flag file" (call it ${process_name}.writing) which is created before output begins and removed when output is complete.
As for the conditional logic: if output.txt exists and ${process_name}.writing does not, then output.txt is complete and usable.
You could also consider using the flock utility to test/reserve exclusive use of a file, in order to ensure there are no conflicts when acquiring "closed" files.
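A minimal polling sketch of that idea, using only the standard library since nothing can be installed (the directory, the .log suffix and the flag-file name are placeholders):

import os
import time

LOG_DIR = "/path/to/logs"        # placeholder: the directory being watched
FLAG = "myprocess.writing"       # placeholder: the writer's flag file

seen = set()
while True:
    names = set(os.listdir(LOG_DIR))
    # only act while the writer is not in the middle of writing
    if FLAG not in names:
        for name in sorted(names - seen):
            if name.endswith(".log"):
                print("new log:", name)   # do the further actions here
                seen.add(name)
    time.sleep(5)                # poll interval; tune to taste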
I am writing a watchman command with watchman-make and I'm at a loss when trying to access exactly what was changed in the directory. I want to run my upload.py script, and inside the script I would like to access the filenames of newly created files in /var/spool/cups-pdf/ANONYMOUS.
So far I have:
$ watchman-make -p '/var/spool/cups-pdf/ANONYMOUS' --run 'python /home/pi/upload.py'
I'd like to add another argument to python upload.py so that I have the exact filepath of the newly created file and can send the new file over to my database in upload.py.
I've been looking at the docs of watchman and the closest thing I can think to use is a trigger object. Please help!
Solution with watchman-wait:
Assuming project layout like this:
/posts/_SUBDIR_WITH_POST_NAME_/index.md
/Scripts/convert.sh
And a shell script like this:
#!/bin/bash
# File: convert.sh
# Resolve the project root: one level above this script's directory
SrcDirPath=$(cd "$(dirname "$0")/../"; pwd)
cd "$SrcDirPath"
echo "Converting: $SrcDirPath/$1"
Then we can launch watchman-wait like this:
watchman-wait . --max-events 0 -p 'posts/**/*.md' | while read -r line; do ./Scripts/convert.sh "$line"; done
When we change the file /posts/_SUBDIR_WITH_POST_NAME_/index.md, the output will look like this:
...
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
Converting: /Users/.../Angular/dartweb_quickstart/posts/swift-on-android-building-toolchain/index.md
...
watchman-make is intended to be used together with tools that will perform a follow-up query of their own to discover what they want to do as a next step. For example, running the make tool will cause make to stat the various deps to bring things up to date.
That means that your upload.py script needs to know how to do this for itself if you want to use it with watchman.
You have a couple of options, depending on how sophisticated you want things to be:
Use pywatchman to issue an ad-hoc query
If you want to be able to run upload.py whenever you want and have it figure out the right thing (just like make would do) then you can have it ask watchman directly. You can have upload.py use pywatchman (the python watchman client) to do this. pywatchman will get installed if the watchman configure script thinks you have a working python installation. You can also pip install pywatchman. Once you have it available and in your PYTHONPATH:
import os
import pywatchman

client = pywatchman.client()
client.query('watch-project', os.getcwd())
result = client.query('query', os.getcwd(), {
    "since": "n:pi_upload",
    "fields": ["name"]})
print(result["files"])
This snippet uses the since generator with a named cursor to discover the list of files that changed since the last query was issued using that same named cursor. Watchman will remember the associated clock value for you, so you don't need to complicate your script with state tracking. We're using the name pi_upload for the cursor; the name needs to be unique among the watchman clients that might use named cursors, so naming it after your tool is a good idea to avoid potential conflict.
This is probably the most direct way to extract the information you need without requiring that you make more invasive changes to your upload script.
Use pywatchman to initiate a long running subscription
This approach will transform your upload.py script so that it knows how to directly subscribe to watchman, so instead of using watchman-make you'd just directly run upload.py and it would keep running and performing the uploads. This is a bit more invasive and is a bit too much code to try and paste in here. If you're interested in this approach then I'd suggest that you take the code behind watchman-wait as a starting point. You can find it here:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait
The key piece of this that you might want to modify is this line:
https://github.com/facebook/watchman/blob/master/python/bin/watchman-wait#L169
which is where it receives the list of files.
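For a sense of the shape of it, here is a minimal subscription sketch (a sketch, not the real thing: the subscription name 'pi-upload' is arbitrary, and production code would also handle errors and the relative_path that watch-project can return, as watchman-wait does):

import pywatchman

client = pywatchman.client(timeout=600)
watch = client.query('watch-project', '/var/spool/cups-pdf/ANONYMOUS')
client.query('subscribe', watch['watch'], 'pi-upload', {'fields': ['name']})

while True:
    try:
        client.receive()              # blocks until watchman pushes an update
    except pywatchman.SocketTimeout:
        continue                      # no events within the timeout; keep waiting
    for update in client.getSubscription('pi-upload') or []:
        for name in update.get('files', []):
            print('changed:', name)   # call the upload logic here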
Why not triggers?
You could use triggers for this, but we're steering folks away from triggers because they are hard to manage. A trigger will run in the background and have its output go to the watchman log file. It can be difficult to tell if it is running, or to stop it running.
Speaking of unix, what about watchman-wait?
We also have a command that emits the list of changed files as they change. Its interface is closer to the unix model: it prints changed filenames as they happen, so you can feed that list to your script on stdin. You could potentially stream the output from watchman-wait into your upload.py. This would make it have some similarities with the subscription approach, but without directly using the pywatchman client.
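For example (a sketch; the pattern is a guess at your layout), you could run watchman-wait /var/spool/cups-pdf/ANONYMOUS --max-events 0 -p '**/*' | python /home/pi/upload.py and have upload.py read one filename per line:

import sys

def upload(path):
    print("uploading", path)      # placeholder: the real database upload

# watchman-wait prints one changed path per line on stdout
for line in sys.stdin:
    path = line.strip()
    if path:
        upload(path)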
I am using GitPython to clone a repository from a Gitlab server.
git.Repo.clone_from(gitlab_ssh_URL, local_path)
Later I have another script that tries to update this repo.
try:
    my_repo = git.Repo(local_path)
    my_repo.remotes.origin.pull()
except (git.exc.InvalidGitRepositoryError, git.exc.NoSuchPathError):
    print("Invalid repository: {}".format(local_path))
This works great, except if in between I check out a tag like this:
tag_id = choose_tag()  # returns the position of an existing tag in my_repo.tags
my_repo.head.reference = my_repo.tags[tag_id]
my_repo.head.reset(index=True, working_tree=True)
In this case I get a GitCommandError when pulling:
git.exc.GitCommandError: 'git pull -v origin' returned with exit code 1
I read the documentation twice already and I don't see where the problem is. Especially since if I try to pull this repo with a dedicated tool like SourceTree, it works without error or warning.
I don't understand how the fact that I checked out a tagged version, even with a detached HEAD, prevents me from pulling.
What should I do to pull in this case?
What is happening here and what do I miss?
Edit: as advised, I tried to look at exception.stdout and exception.stderr and there is nothing useful there (respectively b'' and None). That's why I have a hard time understanding what's wrong.
I think it's a good idea to learn more about what is happening first (question 2: what is going on?), and that should guide you to the answer to question 1 (how to fix this?).
To know more about what went wrong you can print out stdout and stderr from the exception. Git normally prints error details to the console, so something should be in stdout or stderr.
try:
    git.Repo.clone_from(gitlab_ssh_URL, local_path)
except git.GitCommandError as exception:
    print(exception)
    if exception.stdout:
        print('!! stdout was:')
        print(exception.stdout)
    if exception.stderr:
        print('!! stderr was:')
        print(exception.stderr)
As a side note, I myself have had issues a few times when I did many operations on the git.Repo object before using it to interact with the back-end (i.e. git itself). In my opinion there sometimes seem to be caching issues on the GitPython side: the data in the repository (the .git directory) and the data structures in the git.Repo object can get out of sync.
EDIT:
Ok, the problem seems to be pulling on a detached HEAD, which is probably not what you want to do anyway.
Still, you can work around the problem. Since from the detached HEAD you would do git checkout master only in order to do git pull and then go back to the detached HEAD, you can skip pulling and instead use git fetch <remote> <source>:<destination>, like this: git fetch origin master:master. This fetches the remote and merges your local master branch with its tracking branch without checking it out, so you can remain in the detached HEAD state all along without any problems. See this SO answer for more details on this unconventional use of the fetch command: https://stackoverflow.com/a/23941734/4973698
With GitPython, the code could look something like this:
my_repo = git.Repo(local_path)
tag_id = choose_tag() # Return the position of an existing tag in my_repo.tags
my_repo.head.reference = my_repo.tags[tag_id]
my_repo.head.reset(index=True, working_tree=True)
fetch_info = my_repo.remotes.origin.fetch('master:master')
for info in fetch_info:
    print('{} {} {}'.format(info.ref, info.old_commit, info.flags))
and it would print something like this:
master 11249124f123a394132523513 64
... so flags equal 64. What does that mean? When you do print(git.FetchInfo.FAST_FORWARD) the result is 64, so the fetch was a fast-forward: your local branch was successfully merged with the remote tracking branch, i.e. you have effectively executed git pull origin master without checking out master.
Important note: this kind of fetch works only if your branch can be merged with remote branch using fast-forward merge.
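In code, a bitmask test against git.FetchInfo.FAST_FORWARD is more robust than comparing the raw value 64 (a small sketch, reusing my_repo from above):

import git

for info in my_repo.remotes.origin.fetch('master:master'):
    if info.flags & git.FetchInfo.FAST_FORWARD:
        print('{}: fast-forwarded'.format(info.ref))
    elif info.flags & git.FetchInfo.ERROR:
        print('{}: fetch failed'.format(info.ref))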
The answer is that trying to pull on a detached HEAD is not a good idea, even if some smart clients handle this case nicely.
So my solution in this case is to check out the latest version on my branch (master), pull, then check out the desired tagged version again.
However, one can regret the poor error message given by GitPython.
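For reference, a minimal sketch of that workaround in GitPython (assuming the branch is master and reusing local_path and choose_tag from above):

import git

my_repo = git.Repo(local_path)
my_repo.heads.master.checkout()                    # leave the detached HEAD
my_repo.remotes.origin.pull()                      # pull now works, on a branch
tag_id = choose_tag()
my_repo.head.reference = my_repo.tags[tag_id]      # back to the desired tag
my_repo.head.reset(index=True, working_tree=True)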
I have downloaded and installed the Perforce API for Python.
I'm able to run the examples on this page:
http://www.perforce.com/perforce/doc.current/manuals/p4script/03_python.html#1127434
But unfortunately the documentation seems incomplete. For example, the P4 class has a method called run_sync, but it's not documented anywhere (in fact, it doesn't even show up if you run dir(p4) in the Python interactive interpreter, even though you can use the method there just fine).
So I'm struggling with figuring out how to use the API for anything beyond the trivial examples on the page I linked to above.
I would like to write a script which simply downloads the latest revision of a subdirectory to the filesystem of the computer running it, and does nothing else. I don't want the server to change in any way. I don't want there to be any indication that the files came from Perforce (as opposed to getting the files via the Perforce application, which marks the files in your filesystem as read-only until you check them out or whatever). That's silly; I just need to pull down a snapshot of what the subdirectory looked like at the moment the script was run.
The Python API follows the same basic structure as the command line client (both are very thin wrappers over the same underlying API), so you'll want to look at the command line client documentation; for example, look at "p4 sync" to understand how "run_sync" in P4Python works:
http://www.perforce.com/perforce/r14.2/manuals/cmdref/p4_sync.html
For the task you're describing I would do the following (I'll describe it in terms of Perforce commands since my Python is a little rusty; once you know what commands you're running it should be pretty simple to translate into Python, since the P4Python doc has examples of things like creating and modifying a client spec, which is the hardest part; a rough P4Python sketch follows the steps below):
1) Create a client that maps the desired depot directory to the desired local filesystem location, e.g. if you want the directory "//depot/foo/..." downloaded to "/usr/team/foo" you'd make a client that looks like:
Client: mytempclient123847
Root: /usr/team/foo
View:
//depot/foo/... //mytempclient123847/...
You should set the "allwrite" option on the client since you said you don't want the synced files to be read-only:
Options: allwrite noclobber nocompress unlocked nomodtime rmdir
2) Sync, using the "-p" option to minimize server impact (the server will not record that you "have" the files).
3) Delete the client.
(I'm omitting some details like making sure that you're authenticated correctly -- that's a whole other potential challenge depending on your server's security and whether it's using external authentication, but it sounds like that's not the part you're having trouble with.)
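A rough P4Python sketch of those three steps (hedged: the server address and user are placeholders, and the Options manipulation assumes the default options string of a freshly fetched client):

from P4 import P4

p4 = P4()
p4.port = "perforce.example.com:1666"   # placeholder server address
p4.user = "someuser"                    # placeholder user
p4.client = "mytempclient123847"
p4.connect()
try:
    # 1) create the temporary client mapping //depot/foo/... to /usr/team/foo
    client = p4.fetch_client("mytempclient123847")
    client["Root"] = "/usr/team/foo"
    client["View"] = ["//depot/foo/... //mytempclient123847/..."]
    client["Options"] = client["Options"].replace("noallwrite", "allwrite")
    p4.save_client(client)
    # 2) sync without recording on the server that we "have" the files
    p4.run_sync("-p", "//depot/foo/...")
    # 3) delete the temporary client again
    p4.delete_client("mytempclient123847")
finally:
    p4.disconnect()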
I have a project under version control with Git. In this project there is a "grid" of files which are organized like
/parts
    /a
        01.src
        02.src
        ...
        90.src
    /b
        01.src
        02.src
        ...
        90.src
    /...
(It doesn't matter for the question, but maybe it helps to know that these numbered files are small excisions from a musical score.)
These numbered files are generated by a script, and one part of our work is deleting those files that are not used in the musical score.
Now I would like to retrieve information on who deleted each file (as part of our project documentation and workflow). Information retrieval is done from a Python script.
I have a working approach, but it is extremely inefficient because it calls Git in a subprocess for each file in question, which may be far more than 1,000 times.
What I can do is call, for each file that is missing from the directory tree:
git log --pretty=format:"%an" --diff-filter=D -- FILENAME
This gives me the author name of the last commit affecting the file, i.e. the commit that deleted it. This works correctly, but as said, I have to spawn a new subprocess for each deleted file.
I can do the same with a for loop on the shell:
for delfile in $(git log --all --pretty=format: --name-only --diff-filter=D | sort -u); do echo "$delfile: $(git log --pretty=format:'%an' --diff-filter=D -- "$delfile")"; done
But this is really slow, which is understandable because it spawns a new git call for every single file (just as if I'd do it from Python).
So the bottom line is: is there an efficient way to ask Git for
all files that have been deleted from the repository (possibly restricted to a subdirectory),
along with the author name of the last commit touching each file (or actually: the author who deleted the file)?
It seems my last comment put me on the right track myself:
git log --diff-filter=DR --pretty=format:'%an' --name-only parts
gives me the right thing:
--diff-filter filters the right commits
--pretty=format:'%an' returns only the author
--name-only returns a list of deleted files
So as a result I get something like
Author-1
deleted-file-1
deleted-file-2
Author-2
deleted-file-3
deleted-file-4
Author-1
deleted-file-5
This doesn't give me any more information on the commits, but I don't need that for my use-case. This result can easily be processed from within Python.
(For anybody else landing on this page: if you need something similar but also want more information on each commit, you can modify the --pretty=format:'..' option. See http://git-scm.com/book/en/Git-Basics-Viewing-the-Commit-History for a list of items that can be displayed.)
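A sketch of that processing in Python (hedged: it prefixes the author with an '@' marker so author lines can't be mistaken for file names; 'parts' is the subdirectory from this example):

import subprocess

def deleted_files_with_authors(path="parts"):
    # '@%an' marks author lines so they cannot be confused with file names
    out = subprocess.run(
        ["git", "log", "--diff-filter=DR", "--pretty=format:@%an",
         "--name-only", "--", path],
        capture_output=True, text=True, check=True).stdout
    authors = {}
    author = None
    for line in out.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            author = line[1:]
        else:
            # git log is newest-first, so keep the first (latest) deletion
            authors.setdefault(line, author)
    return authors

print(deleted_files_with_authors())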