I have a Python project that I'm working on for research. I've been working on two different machines, and I recently discovered that half of my files used tabs and the other half used spaces.
Python objected to this when I attempted to edit and run a file from one machine on the other, so I'd like to switch everything to spaces instead of tabs. However, this seems like a waste of a Git commit - running 'git diff' on the uncommitted-but-correct files makes it look like I'm wiping out and replacing the entire file.
Is there a way around this? That is, is there some way that I can "hide" these (IMO) frivolous changes?
There is: it's called a rebase. However, you'd probably want to apply the tabs-to-spaces change retroactively in each commit where you edited each file, which would be extremely tedious.
However, there is absolutely nothing wrong with having a commit like this. A commit represents a change to a distinct, functioning state of your project, and replacing tabs with spaces definitely produces such a state.
One place where you would want to use rebase is if you accidentally make a non-functional commit. For example, you might have committed only half the files you need to.
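For instance, if the broken commit is the most recent one, a quick (hedged) example of fixing it would be something like this, where the file name is just a placeholder:
git add forgotten_file.py
git commit --amend --no-edit
For an older commit you would reach for an interactive rebase (git rebase -i) instead and mark the offending commit for editing.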
One last thing: never edit history (i.e. with rebase) once you've pushed your changes to another machine. The machines will get out of sync, and your repo will start slowly exploding.
Unfortunately there is no way around the fact that, at the textual level, this is a big change. The best you can do is not mix whitespace changes with any other changes. The topic of such a commit should be nothing but the whitespace change.
If this screwup is unpublished (only in your private repos), you can go back in time and fix the mess at the point in the history where it was introduced, and then go through the pain of fixing up the subsequent changes (which have to be re-worked in the correct indentation style). For the effort, you end up with a clean history.
It is perfectly valid. Coupling a whitespace reformat with other changes in the same file could obfuscate the non-whitespace changes. The commit has a single responsibility: reformatting whitespace.
I have been working on a major private Python project in an exploratory way for several months, using PyCharm.
I use git to track the changes in that project.
I am the only person that contributes to that project.
Normally I commit changes to that project roughly once a day.
Now I want to track the changes to my code every time I execute it (the background is that I sometimes lose track of which intermediate result was achieved with which version of my code).
Therefore I want to perform a git commit of all changed files at the end of the script execution.
Since every such commit just gets a 'technical' commit message, I would like to distinguish these 'technical' commits from the other commits I make roughly once a day (see above). The background is that I would still like to see and compare the daily differences with each other. The technical commits might add up to several dozen per day and would make it hard for me to see the major changes over the course of time.
Which techniques does git offer to distinguish the technical commits from the daily commits I do?
Is maybe branching a valid approach for this? If yes, would I delete these branches later on? (I am a git novice)
You could use a branch for that, yes. Just use a working branch when doing your scripted autocommits, and then when you want to make a commit for the history, switch to your main branch.
To re-add the final changes as a single commit, one way would be to soft-reset the history once you are done with the changes. So you would run:
git reset prev-real-commit
This jumps the history back to before your new batch of WIP autocommits, but it does not touch the files, so you don't lose any work. Then you can make a new commit for the changes as usual.
That technique also works without a branch. Using a branch might still be nice, though, so you can easily check what the version was before your new WIP commits.
Git also has rebasing, which would allow squashing multiple commits into one and rewriting the messages. But for the workflow you describe, I think simply resetting the autocommits away and redoing a normal commit is better.
The suggestion to add some tag to the autocommit messages is also a good one.
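For example, here is a minimal sketch of such an end-of-script autocommit in Python; the '[auto]' tag, the timestamp format, and the use of subprocess are just one possible choice, and it assumes the script runs from the repository root:
import datetime
import subprocess

def autocommit(tag="[auto]"):
    """Commit everything in the working tree with a tagged, timestamped message."""
    subprocess.run(["git", "add", "-A"], check=True)
    # only commit if something is actually staged
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"])
    if staged.returncode != 0:
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        subprocess.run(["git", "commit", "-m", f"{tag} run at {stamp}"], check=True)

autocommit()  # call this at the very end of the analysis script
You can then isolate or hide these commits later with git log --grep (matching the '[auto]' prefix, or adding --invert-grep to see only the 'real' daily commits).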
That said, I usually just commit the checkpoints that I need in a normal dev flow. It can be nicer to have a commit e.g. every hour instead of only once a day. Small atomic commits are good. You can use feature branches and, on GitHub, pull requests if you want to record and manage larger chunks of work.
I think that even if you work on this project alone, it might still be a good idea to adopt the typical GitHub-flow approach and start using branches.
The idea is that you distinguish your "technical" commits (many issued throughout the day) from your daily commits (rarely more than one) in terms of Git entities used:
your main code stays in the master branch
your daily commits remain 'normal' commits going into a specific long-running branch (develop is a common name)
your 'once-a-day' commit becomes a merge commit, bringing all the changes from develop into the master branch
This allows you to keep the history, yet see a clear distinction between the two types of commit. You can opt for a 'no fast forward' (--no-ff) approach, so that each merge commit stays clearly distinct from 'regular' ones.
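A hedged illustration of that flow, using the conventional branch names (the commit and merge messages are just placeholders):
git checkout develop
git commit -am "intermediate checkpoint"
git checkout master
git merge --no-ff develop -m "Daily merge: summary of the day"
git log --first-parent master then shows roughly one entry per day, while git log develop still shows every checkpoint.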
And if you actually don't want all the history to be there (as @antont said, there might be a LOT of commits), you might consider 'squashing' those commits when either merging or rebasing, as described here.
Is there a way to merge the change logs from several different Mercurial repositories? By "merge" here I just mean integrate into a single display; this has nothing to do with merging in the source-control sense.
In other words, I want to run hg log on several different repositories at once. The entries should be sorted by date regardless of which repository they're from, but be limited to the last n days (configurable), and should include entries from all branches of all the repositories. It would also be nice to filter by author and do this in a graphical client like TortoiseHg. Does anyone know of an existing tool or script that would do this? Or, failing that, a good way to access the log entries programmatically? (Mercurial is written in Python, which would be ideal, but I can't find any information on a simple API for this.)
Background: We are gradually beginning to transition from SVN to Mercurial. The old repository was not just monolithic in the sense of one server, but also in the sense that there was one huge repository for all projects (albeit with a sensible directory structure). Our new Mercurial repositories are more focused! In general, this works much better, but we miss one useful feature from SVN: being able to use svn log at the root of the repository to see everything we have been working on recently. It's very useful for filling in timesheets, giving yourself a sense of purpose, etc.
I figured out a way of doing this myself. In short, I merge all the revisions into one mega-repo, which I can then look at in TortoiseHg. Of course, it's a total mess, but it's good enough to get a summary of what happened recently.
I do this in three steps:
(Optional) Run hg convert on each source repository using the branchmap feature to rename each branch from original to reponame/original. This makes it easier later to identify which revision came from which source repository. (More faithful to SVN would be to use the filemap feature instead.)
On a new repository, run hg pull -f to force-pull from the individual repositories into one big one. This gets all the revisions in one place, but they show up in the wrong order.
Use the method described in this answer to create yet another repository that contains all the changes from the one created in step 2, but sorted into the right order. (Actually I use a slight variant: I get the hashes, compare them against the hashes in the destination, check that the destination's list is a prefix of the source's, and only copy the new ones across.)
This is all done from a Python script; although Mercurial is written in Python, I just drive the command-line interface via the subprocess module. Running through the three steps only copies the new revisions rather than rebuilding everything from scratch, unless you add a new repo.
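In case it helps, a rough sketch of what the driving script for steps 1 and 2 could look like; the repository paths and branch-map contents are made up for illustration, and step 3 (the reordering) is left out:
import os
import subprocess

SOURCE_REPOS = ["../projA", "../projB"]  # hypothetical source repositories
MERGED = "merged-log-repo"               # the combined repository

def run(*cmd, cwd=None):
    subprocess.run(cmd, cwd=cwd, check=True)

# Step 1: convert each source repo, prefixing its branch names with the repo name.
for src in SOURCE_REPOS:
    name = os.path.basename(os.path.abspath(src))
    branchmap = f"{name}-branchmap.txt"
    with open(branchmap, "w") as f:
        f.write(f"default {name}/default\n")  # one line per branch to rename
    run("hg", "convert", "--config", "extensions.convert=",
        "--branchmap", branchmap, src, f"{name}-converted")

# Step 2: force-pull every converted repo into one big repository.
if not os.path.isdir(MERGED):
    run("hg", "init", MERGED)
for src in SOURCE_REPOS:
    name = os.path.basename(os.path.abspath(src))
    run("hg", "pull", "-f", os.path.abspath(f"{name}-converted"), cwd=MERGED)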
I'm using Atom at work for editing Python code, and I'm running up against a painful interaction between muscle memory and a labor-saving feature.
Near as I can tell, Atom will, when you paste a snippet of code, redo the indentation so that it's consistent with the indentation of the line it was pasted into, preserving relative indents.
If I didn't have any baggage from using editors without this feature, I'm pretty sure it'd be great, but as it is, I can't break my habit of selecting back to the preceding newline, and pasting that, which tends to do crazy things when pasting to or from the first line of a block.
I've tried to turn off Auto Indent on Paste, but I can't find any such setting, and I'm not even sure it's the same feature; it's just the name I hear from people complaining about Atom going crazy when they paste Python.
So, where do I look to disable this? I'm willing to work up from no extensions back to what I've got installed, so assume a vanilla install.
I guess the workflow I'm looking for is "paste, manual re-indent", because at least that way I know what I'm getting and my response is always the same. As it stands, I don't have to think about it until it converts simple line rearrangements into syntactic garbage, which is worse than just adjusting things every time.
EDIT: In response to Milo Price, I have just now tried setting both autoIndentOnPaste and normalizeIndentOnPaste to false. The behavior is unchanged.
FURTHER EDIT: I had to reload the configuration for it to take. It's working now.
You have to set the config options autoIndentOnPaste and normalizeIndentOnPaste both to false, and then reload the configuration.
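For reference, a minimal config.cson fragment with those two settings (global '*' scope assumed) looks roughly like this:
"*":
  editor:
    autoIndentOnPaste: false
    normalizeIndentOnPaste: false
After saving it, reload the window (e.g. Window: Reload in the command palette) so the new values are picked up.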
There's a rather large, oldish Python project in which historically most (95%+) of the code base uses tabs for indentation. Mercurial is used as a VCS.
There are several inconveniences in using tabs. Four spaces have become the prevailing indentation style within the Python community, and most code-analysing/formatting software messes up tabs one way or another. Also, most (pretty much all, actually) of the team members working on the project prefer spaces to tabs and would like to switch.
So, there's this fear of losing the ability to track who last modified a specific line of code... because if all of the lines are converted from tab-based to space-based indentation, and the change is then committed to the Mercurial repository, that's exactly what's going to happen. And this feature (hg annotate) is too useful to consider sacrificing.
Is there a way to switch the indentation method across the project without losing the Mercurial hg annotate functionality? If there is, what would be the most painless way?
If you were to do the replacement of each tab with 4 spaces, you could still get a reasonably correct result from annotate, just use the switch that ignores changes in whitespace:
hg annotate -b text.txt
You could also use -w to ignore all whitespace in the comparison, but -b appears to be the best match: it ignores the case where some whitespace was changed into different whitespace.
This would, however, skip any line where only whitespace had been altered, so pure indentation changes would stay attributed to the previous modification of that line.
See hg help annotate for more.
You could create a new repository and, using suitable scripts, populate it with each commit of the previous history, BUT with the files automatically modified from what was actually committed so that the tabs are replaced. Basically, your script would need to check out the initial file set, get the commit details, replace any tabs in the file set, and then commit to the new repository with the original commit details. It would then move on to the next changeset, generate and apply patches, filter for tabs again, commit, and so on. There is a blog here about doing something similar.
You could do this offline and automatically, and on an agreed-upon date replace the repositories on your server (keeping a copy, of course) with the modified one - just remember to tell your team that they need to pull before doing any work the next day.
I would strongly recommend implementing pre-commit hooks so as to ensure that you do not get polluted should anybody try checking in an old-format file. They would probably be worth having in place on the new repository before starting the process.
UPDATE
Having written the above, I finally came up with the correct search terms and found hg_clone for you, which should do exactly what you need; to quote the opening comments:
# Usage: hg-clone.rb SOURCE-DIR DEST-DIR --target TARGET-REVISION --cutoff CUTOFF-REVISION --filter FILTER
#
# This program clones a mercurial repository to a new directory, changeset by changeset
# with the option of running a filter before each commit. The filter can be used for
# example to strip out secret data (such as code for unused platforms) from the code.
#
# --target TARGET-REVISION
# The revision that the DEST-DIR should be updated to. This revision, and all its parent revisions
# will be copied over to the dest dir. If no TARGET-REVISION is specified, the latest revision in
# the repository will be used.
#
# --cutoff CUTOFF-REVISION
# If specified, this should be a short revision number that acts as a cutoff for synching. If you
# specify 100 for instance, no revisions before 100 will be brought over to DEST-DIR. Revisions
# that have parents earlier than revision 100 will be reparented to have 100 as their revision.
#
# --filter FILTER
# Specifies a program that should be run in the DEST-DIR before committing each revision.
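The FILTER is where the actual tab conversion would happen. Here is a minimal sketch of such a filter in Python, assuming it is run inside DEST-DIR, that only leading tabs matter, and that one tab maps to four spaces:
import os

TAB = "\t"
INDENT = "    "  # four spaces per tab

def convert_file(path):
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    converted = []
    for line in lines:
        # only touch leading whitespace; tabs inside strings are left alone
        stripped = line.lstrip(TAB)
        depth = len(line) - len(stripped)
        converted.append(INDENT * depth + stripped)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(converted)

for root, dirs, files in os.walk("."):
    if ".hg" in dirs:
        dirs.remove(".hg")  # never touch repository metadata
    for name in files:
        if name.endswith(".py"):
            convert_file(os.path.join(root, name))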
I need to modify a text file at runtime but restore its original state later (even if the computer crashes).
My program runs in regular sessions. Once a session has ended, the original state of that file may be changed, but the original state won't change at runtime.
There are several instances of this text file, with the same name, in several directories. My program runs in each directory (but not in parallel), and depending on the directory's contents it does different things. The order in which working directories are chosen is completely arbitrary.
Since the file's name is the same in each directory, it seems a good idea to store the backed-up files in slightly different places (i.e. the parent directory name could be appended to the backup target path).
What I do now is backup and restore the file with a self-written class, and also check at startup if the previous backup for the current directory was properly restored.
But my implementation needs serious refactoring, and now I'm interested if there are libraries already implemented for this kind of task.
edit
version control seems like a good idea, but it's actually a bit of overkill, since it requires a network connection and often a server. Other VCSs need clients to be installed. I would be happier with a pure-Python solution, but at least it should be cross-platform, portable and small enough (<10 MB, for example).
Why not just do what every Unix, Mac, and Windows editor has done for years -- use a lockfile/working-file setup?
When a file is selected for edit:
Check to see if there is an active lock or a crashed backup.
If the file is locked or crashed, give a "recover" option
Otherwise, begin editing the file...
The editing tends to do one or more of a few things:
Copy the original file into a ".%(filename)s.backup"
Create a ".%(filename)s.lock" to prevent others from working on it
When editing finishes successfully, the lock goes away and the .backup is removed
Sometimes things are slightly reversed, and the original stays in place while a .backup is the active edit; on success the .backup replaces the original
If you crash vi or some other text editor on a Linux box, you'll see these files created. Note that they usually have a dot (.) prefix, so they're normally hidden on the command line. Word/PowerPoint/etc. all do similar things.
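A hedged sketch of that pattern in Python, using only the standard library (the helper-file names are just the convention described above):
import os
import shutil

def backup_and_lock(path):
    """Copy the original aside and drop a lock file before editing starts."""
    directory, name = os.path.split(path)
    backup = os.path.join(directory, f".{name}.backup")
    lock = os.path.join(directory, f".{name}.lock")
    if os.path.exists(lock) or os.path.exists(backup):
        raise RuntimeError(f"{path}: a previous session did not finish, recover first")
    shutil.copy2(path, backup)  # preserves metadata as well as content
    open(lock, "w").close()

def recover(path):
    """Put the original back; run this at startup if a crash is detected."""
    directory, name = os.path.split(path)
    backup = os.path.join(directory, f".{name}.backup")
    lock = os.path.join(directory, f".{name}.lock")
    if os.path.exists(backup):
        shutil.move(backup, path)
    if os.path.exists(lock):
        os.remove(lock)
Since you want the original restored at the end of every session anyway, the normal end-of-session path in your case is the same as recover(); the lock file only exists to detect a crash.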
Implement version control... for example svn (see pysvn). It should be fast as long as the repo is on the same server... and it allows rollbacks to any version of the file... maybe overkill, but it will make everything reversible.
http://pysvn.tigris.org/docs/pysvn_prog_guide.html
You don't need a server... you can have local version control and it should be fine...
Git, Subversion or Mercurial is your friend.