Interruption of an os.rename in Python

I made a script in Python that renames all files and folders (it does not recurse) in the "." directory, i.e. the directory the script is kept in. I happened to run the script in a directory which contained no files and only one directory, let's say imp, with path .\imp. While the program was renaming it, the electricity went off and the job was interrupted (sorry, I didn't have a UPS).
Now, as the name suggests, assume imp contains important data. The renaming also took quite a while (compared to the others) before the electricity went off, even though all it was renaming was one folder. After this episode, is any data corrupted, lost, or otherwise affected?
Just to make this more useful: what happens if os.rename is forced to stop while it is doing its job? How does the effect differ between files and folders?
Details
Python Version - 2.7.10
Operating System - Windows 10 Pro

You are using Windows, which means you are (probably) on NTFS. NTFS is a modern, journaling file system. It should not corrupt or lose any data, though it's possible that only some of the changes that constitute a rename have been applied (for instance, the filename might change without updating the modification time, or vice-versa). It is also possible that none of those changes have been applied.
Note the word "should" is not the same as "will." NTFS should not lose data in this fashion, and if it does, it's a bug. But because all software has bugs, it is important to keep backups of files you care about.

Related

Python Setup Disabling Path Length Limit Pros and Cons?

I recently installed Python 3.7 and at the end of the setup, there is the option to "Disable path length limit". I don't know whether or not I should do this.
What are the pros and cons of doing this? Just from the sound of it, it seems like you should always disable it.
I recommend selecting that option and thereby removing the path length limit. It will potentially save you time in the future debugging an avoidable issue.
Here is an anecdote of how I came to know about it:
During the compilation of my program (C# code on a Windows machine), I started getting the following error:
error MSB3541: Files has invalid value "long\path\filename". The specified path,
file name, or both are too long. The fully qualified file name must be less than
260 characters, and the directory name must be less than 248 characters.
This error was preventing me from building my project, and the only apparent solution was to shorten my path/file names. It turns out that this is a long-standing limitation of the Windows API (the 260-character MAX_PATH limit) rather than of NTFS itself: Why does the 260 character path length limit exist in Windows?
After a couple of decades with the limitation baked into the Windows API, it has finally been addressed (Unix-based systems did not have it) in Windows 10 (https://learn.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file#maximum-path-length-limitation), but it is not enabled by default and needs a registry (or group policy) change to turn on. The Python installer option flips that switch for you, saving you a lot of headache.
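If you want to check whether that system-wide switch is already on, the value lives in the registry; here is a small sketch (Windows only, and the helper name is our own):

import winreg

def long_paths_enabled():
    # True if the Windows 10 "LongPathsEnabled" registry value is set to 1.
    key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")
        return value == 1
    except OSError:
        return False

print(long_paths_enabled())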
Do note that enabling this option (i.e. allowing long paths) will:
a) break compatibility of your programs on older builds of Windows 10 and earlier versions of Windows, when they use long file/directory names and paths.
b) break your programs on Windows 10 machines that do not have this option enabled, when they use long file/directory names and paths.
To answer both of your questions:
Should you disable this?
The quick answer is that it doesn't matter that much, since this only matters when working with paths longer than 260 characters, not something most people do.
What are the pros and cons of disabling the path length limit?
Pros
you won't get an error when working with file paths longer than 260 characters, so there is less to worry about regarding path length
it can make debugging easier
disabling the limit has no negative technical side effects
Cons
if you work in a team, it might introduce bugs where code works on your machine but not on theirs, because you have the path limit disabled and they don't
disabling it can have negative human-behaviour side effects: allowing long paths could promote bad naming behaviour in your team regarding path names and folder structure, whereas a limit forces people to shorten their paths.
E.g. I've worked in teams with paths like this, and allowing longer names would have resulted in even less readable file paths:
c:/project_name/unity/files/assets/UI/UI_2.0/levelname/season2_levelname/release_season2_levelname_ui_2/PROJECT_S2_MENU_UI/PROJECT_S2_hover_button_shadow_ui/PROJECT_S2_hover_button_shadow_ui_blue/PROJECT_S2_hover_button_shadow_ui_blue.asset
Explanation
To understand the pros and cons, it helps to understand what the path length limit is.
Windows path length
You probably already know that a Windows path is a string that represents where to find a file or folder,
e.g. C:\Program Files\7-Zip
Longer folder or file names result in a longer string,
e.g. C:\Program Files\Microsoft Update Health Tools
More folders nested inside other folders also result in a longer string,
e.g. C:\Program Files\Microsoft Update Health Tools\Logs
file path length errors
If you have a lot of folders nested inside each other, with long names, you might run into an error when trying to use such a path in your code.
This is because Windows has a path length limit. An update in Windows 10 lets you disable this limitation, but it is not disabled by default.
Disabling this limitation allows your computer to use longer paths without errors.
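As a small illustration of what "too long" means here, the classic limit applies to the whole string, drive letter and separators included (the 260 figure also counts a terminating character):

import os

MAX_PATH = 260  # the classic Win32 limit, counting the terminating NUL

path = os.path.join(r"C:\Program Files", "Microsoft Update Health Tools",
                    "Logs", "report.txt")
status = "would hit the limit" if len(path) >= MAX_PATH else "is comfortably short"
print("%d characters: this path %s" % (len(path), status))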
Why does this happen?
The old Windows API promised that if you wrote your application correctly, it would continue to work in the future.
If Windows were to allow file names longer than 260 characters, then your existing application (which used the Windows API correctly) could fail.
Microsoft did create a way to use paths of up to roughly 32,767 characters, but they had to create a new API contract to do it. That is the opt-in change in Windows 10.
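Code that must still run on machines where the limit is in force can also opt into the extended-length form per path by using the documented \\?\ prefix (it only works with absolute paths). A minimal sketch, with a helper name of our own choosing:

import os

def extended_length(path):
    # Prefix an absolute Windows path with \\?\ so that API calls which
    # support the extended-length form accept paths longer than 260 characters.
    path = os.path.abspath(path)
    if os.name == "nt" and not path.startswith("\\\\?\\"):
        return "\\\\?\\" + path
    return path

# usage: open(extended_length(some_very_long_path), "rb")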
I am keeping this simple and straightforward.
The "Disable path length limit" option refers to the maximum length of the file paths that Windows can handle. Disabling this limit can allow for longer file paths, which can be useful if you are working with files that have very long names or are stored in deeply nested directories. However, it can also cause compatibility issues with some programs, particularly older ones that may not designed to support long file paths.
In general, it's usually not necessary to disable the path length limit unless you have a specific need for it. If you're not sure whether you need it or not, it's probably best to leave it enabled.
Generally, it's not a good idea to disable it, especially if you have programs that could potentially break upon disabling it.
I have a lot of older programs, and the combination of potentially forgetting that I disabled it, having to work out how to re-enable it, and the fact that re-enabling it could then break any program that has come to rely on long file paths in its scripts, makes having it off unhelpful for me, and quite possibly a waste of time and debugging.
But to defend its existence: in certain environments it can be helpful, especially ones where creating subfolders upon subfolders is key. In particular, this is helpful when making a game with a lot of assets. But again, there are many ways to shorten subfolder (and file) names, and doing that generally makes the path easier to type out if you aren't copy-and-pasting everywhere. (For example, C:\my_game\assts\01\plyr\walk_01.png is easier to type than C:\my_epic_game_featuring_my_awesome_character\assets\…)
If you have a virtual machine, or just another OS to try this on, where you don't have to worry about specific programs breaking when the path limit is disabled, it would probably be useful to have the limit off; for everything else, just be wary of its potential to create more bugs than it fixes.

Should I delete temporary files created by my script?

It's a common question, not specific to any language or platform: who is responsible for a file created in the system's $TEMP folder?
If it's my duty, why should I care where I put this file? I could place it anywhere with the same result.
If it's the OS's responsibility, can I forget about this file right after use?
Thanks, and sorry for my basic English.
As a general rule, you should remove the temporary files that you create.
Recall that the $TEMP directory is a shared resource that other programs can use. Failure to remove the temporary files will have an impact on the other programs that use $TEMP.
What kind of impact? That will depend on the other programs. If those other programs create a lot of temporary files, their execution will be slower: creating a new temporary file takes longer, because the directory has to be scanned on each creation to ensure that the file name is unique.
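In Python, the easiest way to honour this rule is to let the tempfile module do the cleanup for you; a minimal sketch (Python 3, where TemporaryDirectory is available):

import tempfile

# NamedTemporaryFile deletes the file when the handle is closed (delete=True
# is the default); TemporaryDirectory removes the whole tree on exit.
with tempfile.NamedTemporaryFile(prefix="myapp_", suffix=".dat") as tmp:
    tmp.write(b"scratch data")
    tmp.flush()
    # ... work with tmp.name here ...

with tempfile.TemporaryDirectory(prefix="myapp_") as tmp_dir:
    pass  # ... create as many scratch files under tmp_dir as you like ...

Both the file and the directory are gone once their with-blocks exit.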
Consider the following (based on real events) ...
In years past, my group at work had to use the Intel C Compiler. We found that over time, it appeared to be slowing down. That is, the time it took to run our sanity tests using it took longer and longer. This also applied to building/compiling a single C file. We tracked the problem down.
ICC was opening, stat'ing and reading every file under $TEMP. For what purpose, I know not. Although the argument can be made that the problem lay with ICC, the existence of the files under $TEMP was slowing both it and our development team down. Deleting those temporary files got the sanity checks running in less than half an hour instead of over two hours, a significant time saver.
Hope this helps.
There is no standard and no common rules. In most OSs, the files in the temporary folder will pile up. Some systems try to prevent this by deleting files in there automatically after some time but that sometimes causes grief, for example with long running processes or crash backups.
The reason for $TEMP to exist is that many programs (especially in early times when RAM was scarce) needed a place to store temporary data since "super computers" in the 1970s had only a few KB of RAM (yes, N*1024 bytes where N is << 100 - you couldn't even fit the image of your mouse cursor into that). Around 1980, 64KB was a lot.
The solution was a folder where anyone could write. Security wasn't an issue at the time, memory was.
Over time, OSs started to get better systems to create temporary files and to clean them up but backwards compatibility prevented a clean, "work for all" solution.
So even though you know where the data ends up, you are responsible to clean up the files after yourself. To make error analysis easier, I tend to write my code in such a way that files are only deleted when everything is fine - that way, I can look at intermediate results to figure out what is wrong. But logging is often a better and safer solution.
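A sketch of that delete-only-on-success idea (the work function is hypothetical):

import shutil
import tempfile

def run_job(workdir):
    # Remove scratch files only when the job succeeds; on failure they stay
    # on disk so the intermediate results can be inspected.
    scratch = tempfile.mkdtemp(prefix="job_", dir=workdir)
    try:
        produce_results(scratch)   # hypothetical work step
    except Exception:
        print("job failed; intermediate files kept in", scratch)
        raise
    shutil.rmtree(scratch)         # reached only if everything went fine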
Related: Memory prices 1957-2014 (12 KB of RAM cost US $4,680 in 1973).

Where should I write a user specific log file to (and be XDG base directory compatible)

By default, pip logs errors into "~/.pip/pip.log". Pip has an option to change the log path, and I'd like to put the log file somewhere other than ~/.pip so as not to clutter up my home directory. Where should I put it to be XDG base directory compatible?
Right now I'm considering one of these:
$XDG_DATA_HOME (typically $HOME/.local/share)
$XDG_CACHE_HOME (typically $HOME/.cache)
This is, for the moment, unclear.
Different programs seem to handle this in different ways (imsettings puts it in $XDG_CACHE_HOME, profanity in $XDG_DATA_HOME).
Debian, however, has a proposal which I can get behind (emphasis mine):
This is a recurring request/complaint (see this or this) on the xdg-freedesktop mailing list to introduce another directory for state information that does not belong in any of the existing categories (see also home-dir.proposal). Examples of this information are:
history files of shells, repls, anything that uses libreadline
logfiles
state of application windows on exit
recently opened files
last time application was run
emacs: bookmarks, ido last directories, backups, auto-save files, auto-save-list
The above example information is not essential data. However it should still persist on reboots of the system unlike cache data that a user might consider putting in a TMPFS. On the other hand the data is rather volatile and does not make sense to be checked into a VCS. The files are also not the data files that an application works on.
A default folder for a future STATE category might be: $HOME/.local/state
This would effectively introduce another environment variable since $XDG_DATA_HOME usually points to $HOME/.local/share and this hypothetical environment variable ($XDG_STATE_HOME?) would point to $HOME/.local/state
If you really want to adhere to the current standard, I would place my log files in $XDG_CACHE_HOME, since log files aren't required to run the program.
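For what it's worth, newer revisions of the XDG Base Directory Specification did add $XDG_STATE_HOME, with exactly that ~/.local/state default, so a resolver can prefer it and still work where it is unset; a minimal sketch:

import os

def user_log_dir(app_name):
    # Per-user log directory: $XDG_STATE_HOME if set, else ~/.local/state.
    base = os.environ.get("XDG_STATE_HOME") or os.path.expanduser("~/.local/state")
    path = os.path.join(base, app_name, "log")
    os.makedirs(path, exist_ok=True)
    return path

print(user_log_dir("pip"))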

Disk usage of a directory in Python

I have some bash code which moves files and directory to /tmp/rmf rather than deleting them, for safety purposes.
I am migrating the code to Python to add some functionality. One of the added features is checking the available space on /tmp and asserting that the moved directory can fit there.
Checking for available space is done using os.statvfs, but how can I measure the disk usage of the directory being moved?
I could either call du using subprocess, or recursively iterate over the directory tree and sum the sizes of each file. Which approach would be better?
I think you might want to reconsider your strategy. Two reasons:
Checking whether you can move a file, asserting that you can, and then moving it builds a race condition into the operation: a big file can appear in /tmp after you have asserted but before you have moved your file. Doh.
Moving the file across filesystems will result in a huge amount of overhead. This is why on OS X each volume has its own 'Trash' directory: instead of copying the blocks that make up the file, you just create a new directory entry that points to the existing data.
I'd consider how long the file needs to be available and its visibility to consumers of the files. If it's all automated stuff happening on the back end, renaming a file to 'hide' it from computer and human consumers is easy enough in most cases and has the added benefit of being an atomic operation.
Occasionally scan the filesystem for 'old' files to cull and rm them after some grace period. No drama. It also makes restoring files a lot easier, since a restore is just another rename.
This gives you the free space on the volume the directory lives on (note that statvfs reports filesystem-level numbers, not the size of the directory itself):
import os

path = 'THE PATH OF THE DIRECTORY YOU WANT TO FETCH'
st = os.statvfs(path)
# f_bavail * f_frsize = bytes available to unprivileged users on that volume
print(st.f_bavail * st.f_frsize)
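For the other half of the question, how much space the directory itself occupies, a pure-Python walk that sums file sizes is a reasonable sketch; note that it counts apparent sizes rather than allocated blocks, so it can differ slightly from du:

import os

def dir_usage(path):
    # Approximate disk usage of a directory tree, in bytes (apparent sizes).
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):   # don't follow symlinks
                total += os.path.getsize(file_path)
    return total

print(dir_usage('/tmp/rmf'))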

Are there a modules for temporarily backup and restore text files in Python

I need to modify a text file at runtime but restore its original state later (even if the computer crashes).
My program runs in regular sessions. Once a session has ended, the original state of that file may be changed, but it won't change at runtime.
There are several instances of this text file, with the same name, in several directories. My program runs in each directory (but not in parallel), and depending on the directory's contents it does different things. The order in which working directories are chosen is completely arbitrary.
Since the file's name is the same in each directory, it seems a good idea to store the backed-up file in slightly different places (i.e. the parent directory name could be appended to the backup target path).
What I do now is backup and restore the file with a self-written class, and also check at startup if the previous backup for the current directory was properly restored.
But my implementation needs serious refactoring, and now I'm interested if there are libraries already implemented for this kind of task.
edit
Version control seems like a good idea, but it's actually a bit overkill, since it requires a network connection and often a server. Other VCSs need clients to be installed. I would be happier with a pure-Python solution, but at least it should be cross-platform, portable and small enough (<10 MB, for example).
Why not just do what every Unix, Mac and Windows program has done for years: use a lockfile/working-file concept.
When a file is selected for edit:
Check to see if there is an active lock or a crashed backup.
If the file is locked or a crashed backup exists, give a "recover" option
Otherwise, begin editing the file...
The editing tends to do one or more of a few things:
Copy the original file into a ".%(filename)s.backup"
Create a ".%(filename)s.lock" to prevent others from working on it
When editing is finished, the lock goes away and the .backup is removed
Sometimes things are slightly reversed, and the original stays in place while a .backup is the active edit; on success the .backup replaces the original
If you crash vi or some other text editor on a Linux box, you'll see these files created. Note that they usually have a dot (.) prefix, so they're normally hidden on the command line. Word/PowerPoint/etc. all do similar things.
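A minimal sketch of that backup/lock dance in Python (the file naming and the recovery policy here are just one reasonable choice):

import os
import shutil

def begin_edit(path):
    # Back the file up and take a lock before editing it in place.
    directory, name = os.path.split(os.path.abspath(path))
    backup = os.path.join(directory, ".%s.backup" % name)
    lock = os.path.join(directory, ".%s.lock" % name)
    if os.path.exists(lock):
        # A previous session crashed mid-edit: restore the original first.
        if os.path.exists(backup):
            shutil.copy2(backup, path)
        os.remove(lock)
    shutil.copy2(path, backup)
    open(lock, "w").close()
    return backup, lock

def end_edit(backup, lock):
    # Editing finished cleanly: drop the lock, then the backup.
    os.remove(lock)
    os.remove(backup)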
Implement version control, e.g. svn (see pysvn). It should be fast as long as the repo is on the same server, and it allows rollbacks to any version of the file. Maybe overkill, but it will make everything reversible.
http://pysvn.tigris.org/docs/pysvn_prog_guide.html
You don't need a server: you can have local version control and it should be fine.
Git, Subversion or Mercurial is your friend.
