How to merge the display of logs from several Mercurial repositories - python

Is there a way to merge the change logs from several different Mercurial repositories? By "merge" here I just mean integrate into a single display; this is nothing to do with merging in the source control sense.
In other words, I want to run hg log on several different repositories at once. The entries should be sorted by date regardless of which repository they're from, but be limited to the last n days (configurable), and should include entries from all branches of all the repositories. It would also be nice to filter by author and do this in a graphical client like TortoiseHg. Does anyone know of an existing tool or script that would do this? Or, failing that, a good way to access the log entries programmatically? (Mercurial is written in Python, which would be ideal, but I can't find any information on a simple API for this.)
Background: We are gradually beginning to transition from SVN to Mercurial. The old repository was not just monolithic in the sense of one server, but also in the sense that there was one huge repository for all projects (albeit with a sensible directory structure). Our new Mercurial repositories are more focused! In general, this works much better, but we miss one useful feature from SVN: being able to use svn log at the root of the repository to see everything we have been working on recently. It's very useful for filling in timesheets, giving yourself a sense of purpose, etc.

I figured out a way of doing this myself. In short, I merge all the revisions into one mega-repo, which I can then look at in TortoiseHg. Of course, it's a total mess, but it's good enough to get a summary of what happened recently.
I do this in three steps:
(Optional) Run hg convert on each source repository using the branchmap feature to rename each branch from original to reponame/original. This makes it easier later to identify which revision came from which source repository. (More faithful to SVN would be to use the filemap feature instead.)
On a new repository, run hg pull -f to force-pull from the individual repositories into one big repository. This gets all the revisions in one place, but they show up in the wrong order.
Use the method described in this answer to create yet another repository that contains all the changes from the one created in step 2, sorted into the right order. (Actually I use a slight variant: I get the hashes, check that the destination's list is a prefix of the source's, and only copy the new ones across.)
This is all done from a Python script, but although Mercurial is written in Python, I just drive its command-line interface via the subprocess module. Running through the three steps only copies the new revisions rather than rebuilding everything from scratch, unless you add a new repo.
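Here's a rough sketch of steps 1 and 2 driven from Python (the repository paths are placeholders; the convert extension must be available, and hg branches -T needs a reasonably recent Mercurial):
import subprocess

SOURCES = ["/repos/projectA", "/repos/projectB"]  # placeholder source repos
MERGED = "/repos/merged"  # the combined mega-repo (already created with hg init)

def run(*args, cwd=None):
    subprocess.run(args, cwd=cwd, check=True)

for src in SOURCES:
    name = src.rstrip("/").rsplit("/", 1)[-1]
    converted = f"/tmp/{name}-converted"
    # Step 1 (optional): build a branchmap renaming each branch to reponame/original.
    branches = subprocess.check_output(
        ["hg", "branches", "-T", "{branch}\n"], cwd=src, text=True
    ).splitlines()
    branchmap = f"/tmp/{name}.branchmap"
    with open(branchmap, "w") as f:
        for branch in branches:
            f.write(f"{branch} {name}/{branch}\n")
    run("hg", "--config", "extensions.convert=", "convert",
        "--branchmap", branchmap, src, converted)
    # Step 2: force-pull the unrelated converted repo into the merged repo.
    run("hg", "pull", "-f", converted, cwd=MERGED)
# Step 3 (re-ordering by date into yet another repo) follows the linked answer.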

Related

How to distinguish several small changes from daily changes within git?

I have been working on a major private Python project in an exploratory way for several months, using PyCharm.
I use git to track the changes in that project.
I am the only person that contributes to that project.
Normally I commit changes to that project roughly once a day.
Now I want to track changes to my code every time I execute it (the background is that I sometimes lose track of which intermediate result was achieved with which version of my code).
Therefore I want to perform a git commit of all changed files at the end of the script execution.
Since every such commit just gets a 'technical' commit message, I would like to distinguish these 'technical' commits from the other commits I make roughly once a day (see above). The background is that I still want to see and compare the daily differences. The technical commits might add up to several dozen per day and would make it hard to see the major changes over the course of time.
Which techniques does git offer to distinguish the technical commits from the daily commits I do?
Would branching perhaps be a valid approach for this? If so, would I delete these branches later on? (I am a git novice.)
You could use a branch for that, yes. Just use a working branch when doing your scripted autocommits, and then when you want to make a commit for the history, switch to your main branch.
To re-add the final changes as a single commit, one way would be to soft-reset the history when you are done with the changes. So you would:
git reset prev-real-commit
This jumps the history back to before your new batch of WIP autocommits, but does not touch the files, so you don't lose work. Then you can make a new commit for the changes as usual.
That technique also works without a branch. Using a branch might still be nice though so you can easily check what the version was before your new wip commits.
Git also has rebasing, which would allow squashing multiple commits into one and rewriting the messages. But for the workflow you describe, I think simply resetting the autocommits away and redoing a normal commit is better.
Also the suggestion to add some tag to the message of the autocommits is good.
That said, I usually just commit the checkpoints that I need in normal dev flow. It can be nicer to have a commit e.g. every hour instead of only once a day. Small atomic commits are good. You can use feature branches and, on GitHub, pull requests if you want to record and manage larger units of work.
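As a concrete sketch of that flow (prev-real-commit stands for your last real commit, as above; the messages are placeholders):
# each scripted run ends with an automatic checkpoint, e.g.:
git commit -am "wip: auto checkpoint"
# ...later, when you want to record the real daily commit:
git reset prev-real-commit
git add -A   # pick up any files added during the checkpoints
git commit -m "real commit describing the day's work"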
I think that even if you work on this project alone, it might still be a good idea to adopt the typical GitHub flow approach and start using branches.
The idea is that you distinguish your "technical" commits (many issued throughout the day) from your daily commits (rarely more than one) in terms of Git entities used:
your main code stays in master branch
your daily commits remain 'normal' commits going into a specific long-running branch (develop is a common name)
your 'once-a-day' commit becomes a merge commit, pushing all the changes in develop into the master branch
This allows you to keep the whole history, yet see a clear distinction between the two types of commit. You can opt for a 'no fast forward' approach, so that each merge commit is clearly distinct from 'regular' ones.
And if you actually don't want all the history to be there (as @antont said, there might be a LOT of commits), you might consider 'squashing' those commits when either merging or rebasing, as described here.
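A sketch of that flow (branch names as above; messages are placeholders):
git checkout develop
git commit -am "technical checkpoint"   # many of these per day
git checkout master
git merge --no-ff develop -m "daily merge commit"
# or, to collapse the checkpoints into a single regular commit instead:
git merge --squash develop
git commit -m "daily summary"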

How to verify if two systems are in sync

I have a requirement to test two applications (via automation using Python).
The requirement is, for example: we have a system called "www.abc.com"
where we develop and merge code every 2 weeks, and we have another system called "www.xyz.com" (basically a backup of the first system); every time we do a release and add/edit something in the main system, we update our backup system.
Now the question is: I need to test both systems after every release (every 2 weeks) to see if they are in sync (identical).
How do I write a Python automation test script (multiple tests) to check whether, for example, the databases, servers, UI, front end, and code base are the same in both systems? If this is possible, any help and advice would be appreciated so that I can implement a solution.
There are several ways you could approach this:
Assuming you are using some sort of source control, you could write a script to make sure that the repo is up to date and then report back the results. See here and here. This probably won't cover the data in your databases, but there are numerous ways to compare databases, and which one suits you will depend on what programs you are using.
Another (or additional) way you might check is to write a script that gathers a list of hashes or checksums of all the files you care about on both systems and then compares the lists for differences.
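A minimal sketch of the checksum idea, assuming the files of both systems are reachable as local paths (the mount points are hypothetical):
import hashlib
import os

def tree_hashes(root):
    """Map each file path (relative to root) to a SHA-256 digest of its contents."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                hashes[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return hashes

abc = tree_hashes("/mnt/abc")  # hypothetical mount of www.abc.com's files
xyz = tree_hashes("/mnt/xyz")  # hypothetical mount of www.xyz.com's files
print("only in abc:", sorted(abc.keys() - xyz.keys()))
print("only in xyz:", sorted(xyz.keys() - abc.keys()))
print("content differs:", sorted(p for p in abc.keys() & xyz.keys() if abc[p] != xyz[p]))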

call python code of different git branch other than the current repository without switching branch

So, basically I have 2 versions of a project, and for some users I want to use the latest version while for others I want to use the older version. Both versions have the same file names, and multiple users will use them simultaneously. To accomplish this, I want to call a function from a different git branch without actually switching the branch.
Is there a way to do so?
For example, when my current branch is v1 and the other branch is v2, call the function depending on the value of the variable flag:
def f(flag):
    if flag == 1:
        # import function f1() from branch v2
        return f1()
    else:
        # use f1() from the current branch v1
        return f1()
Without commenting on why you need to do that: you can simply check out your repo twice, once for branch1 and once for branch2 (without cloning twice).
See "git working on two branches simultaneously".
You can then make your script aware of its current path (/path/to/branch1) and of the relative path to the other branch (../branch2/...).
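With a reasonably recent Git, git worktree gives you the second checkout without a second clone (the path and branch name are illustrative):
git worktree add ../branch2 v2   # second working tree, checked out at branch v2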
You must have both versions of the code present/accessible in order to invoke them dynamically.
The by-far-simplest way to accomplish this is to have both versions of the code present in different locations, as in VonC's answer.
Since Python is what it is, though, you could dynamically extract specific versions of specific source files, compile them on the fly (using dynamic imports and temporary files, or exec and internal strings), and hence run code that does not show up in casual perusal of the program source. I do not encourage this approach: it is difficult (though not very difficult) and error-prone, tends towards security holes, and is overall a terrible way to work unless you're writing something like a Python debugger or IDE. But if this is what you want to do, you simply decompose the problem into:
examine and/or extract specific files from specific commits (git show, git cat-file -p, etc.), and
dynamically load or execute code from file in file system or from string in memory.
The first is a Git programming exercise (and is pretty trivial: git show 1234567:foo.py or git show branch:foo.py; you can redirect the output to a file using either shell redirection or Python's subprocess module), and once you have the files, the second is a Python programming exercise of moderate difficulty: see the documentation, paying particularly close attention to importlib.
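Putting the two steps together, a sketch might look like this (the branch name v2, file mymodule.py, and function f1 are hypothetical; error handling and cleanup of the temporary file are omitted):
import importlib.util
import subprocess
import tempfile

def load_from_branch(branch, path, module_name):
    # Step 1: extract one file from the given branch without switching to it.
    source = subprocess.check_output(["git", "show", f"{branch}:{path}"])
    with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as tmp:
        tmp.write(source)
        tmp_path = tmp.name
    # Step 2: dynamically import the extracted file as a module.
    spec = importlib.util.spec_from_file_location(module_name, tmp_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

v2 = load_from_branch("v2", "mymodule.py", "mymodule_v2")
result = v2.f1()  # call the v2 version of f1()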

Is there a way to switch code indentation from tabs to spaces across the project, and to keep 'hg annotate' functionality?

There's a rather large, oldish Python project that historically has most (95%+) of the code base using tabs for indentation. Mercurial is used as the VCS.
There are several inconveniences in using tabs. 4 spaces seem to have become the prevailing indentation style within the Python community, and most code-analysing/formatting software mishandles tabs one way or another. Also, most (pretty much all, actually) of the team members working on the project prefer spaces to tabs and would like to switch.
So, there's this fear of losing the ability to track who last modified a specific line of code, because if all the lines are converted from tab-based indentation to space-based and the change gets committed to the Mercurial repository, that's exactly what's going to happen. And this feature (hg annotate) is too useful to consider sacrificing it.
Is there a way to switch the indentation method across the project without losing the Mercurial hg annotate functionality? If there is, what would be the most painless way?
If you were to replace each tab with 4 spaces, you could still get a reasonably correct result from annotate; just use the switch that ignores changes in whitespace:
hg annotate -b text.txt
You could also use -w to ignore all whitespace in the comparison, but -b appears to be the better match: it ignores the case where some whitespace was changed into different whitespace.
This would, however, skip all lines where only whitespace had been altered, so changes in indentation stay attributed to the previous alteration of each line.
See hg help annotate for more.
You could create a new repository and, using suitable scripts, populate it with each commit of the previous history BUT with the files automatically modified so that tabs are replaced. Basically, your script would need to check out the initial file set, get the commit details, replace any tabs in the file set, and then commit to the new repository with the original commit details. It would then move on to the next changeset, generate and apply patches, filter for tabs again, commit, and so on. There is a blog here about doing something similar.
You could do this offline and automatically, and on an agreed-upon date replace the repositories on your server (keeping a copy, of course) with the modified ones; just remember to tell your team that they need to pull before doing any work the next day.
I would strongly recommend implementing pre-commit hooks to ensure that the new repository does not get polluted should anybody try checking in an old-format file. They would probably be worth having in place before starting the process.
UPDATE
Having written the above, I finally came up with the right search terms and found hg-clone, which should do exactly what you need. To quote the opening comments:
# Usage: hg-clone.rb SOURCE-DIR DEST-DIR --target TARGET-REVISION --cutoff CUTOFF-REVISION --filter FILTER
#
# This program clones a mercurial repository to a new directory, changeset by changeset
# with the option of running a filter before each commit. The filter can be used for
# example to strip out secret data (such as code for unused platforms) from the code.
#
# --target TARGET-REVISION
# The revision that the DEST-DIR should be updated to. This revision, and all its parent revisions
# will be copied over to the dest dir. If no TARGET-REVISION is specified, the latest revision in
# the repository will be used.
#
# --cutoff CUTOFF-REVISION
# If specified, this should be a short revision number that acts as a cutoff for synching. If you
# specify 100 for instance, no revisions before 100 will be brought over to DEST-DIR. Revisions
# that have parents earlier than revision 100 will be reparented to have 100 as their revision.
#
# --filter FILTER
# Specifies a program that should be run in the DEST-DIR before committing each revision.
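The FILTER program could then be a small Python script along these lines (a sketch assuming .py files and a tab width of 4; adjust both to your project's conventions):
import os

def detab(line, width=4):
    """Replace leading tabs with spaces, leaving embedded tabs alone."""
    stripped = line.lstrip("\t")
    return " " * (width * (len(line) - len(stripped))) + stripped

for dirpath, dirnames, filenames in os.walk("."):
    dirnames[:] = [d for d in dirnames if d != ".hg"]  # never touch repo metadata
    for name in filenames:
        if name.endswith(".py"):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                lines = f.readlines()
            fixed = [detab(line) for line in lines]
            if fixed != lines:
                with open(path, "w", encoding="utf-8") as f:
                    f.writelines(fixed)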

How to see changes for a Mercurial file context?

I'm currently trying to write a script that will find all the files changed given a certain # in the task description, and I have gotten the script to work for that. But now I'm trying to sort it by whether the file was added, modified or removed. I've looked through the Mercurial API, but I can't find anything that can do what I want.
My code currently uses repo[revnum].description() and parses that to find which descriptions contain the #; if they do, it adds the file context to a list.
This works fine and I can print a list of files, but I can't find a method to see what was done with each context. Can anyone help me out here, or point me to some better documentation?
Do you need to work with the Mercurial API? It is possible to do what you need by working with the output of hg log.
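For example, the log templates can already split the files of a revision by change type (see hg help templates; REV is a placeholder):
hg log -r REV --template "added: {file_adds}\nmodified: {file_mods}\nremoved: {file_dels}\n"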
In general, you should avoid writing scripts directly using the Mercurial API. It is better to write your scripts to use the CLI or perhaps even use hglib. As stated on the MercurialApi wiki:
For the vast majority of third party code, the best approach is to use
Mercurial's published, documented, and stable API: the command line
interface.
That being said, if you really need to use the API, you can use repo.status() to find the info you asked about:
modified, added, removed, deleted, unknown, ignored, clean = repo.status(revnum-1, revnum)
I ended up using something similar to what Tim said, although I did still use the API.
I imported commands from mercurial, and then called commands.status(repo.ui, repo, change=revnum)
I captured the output of this using repo.ui.pushbuffer() and repo.ui.popbuffer(), which was in the form:
A file_path1
R file_path2
R file_path3
A file_path4
M file_path5
I parsed this output and sorted it into added, removed, modified, etc.
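For reference, a sketch of that capture (Mercurial's internal API is not stable, so the exact setup calls and bytes/str handling vary between versions):
from mercurial import commands, hg, ui as uimod

ui = uimod.ui.load()             # older Mercurials use uimod.ui() instead
repo = hg.repository(ui, b".")   # recent Mercurials expect bytes paths
revnum = 123                     # hypothetical revision number
repo.ui.pushbuffer()             # start capturing command output
commands.status(repo.ui, repo, change=str(revnum))
output = repo.ui.popbuffer()     # e.g. "A file_path1\nM file_path5\n..."
for line in output.splitlines():
    code, path = line.split(None, 1)
    # code is one of A (added), M (modified), R (removed), ...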
