I have a Python test suite that creates and deletes many temporary files. Under Windows 7, the shutil.rmtree operations sometimes fail (<1% of the time). The failures appear random: not always on the same files, and not always in the same way, but always on rmtree operations. It seems to be some kind of timing issue. It is also reminiscent of Windows 7's increased vigilance about permissions and administrator rights, but there are no permission issues here (since the code had just created the files), and there are no administrator rights in the mix.
It also looks like a timing issue between two threads or processes, but there is no concurrency here either.
Two examples of (partial) stack traces:
File "C:\ned\coverage\trunk\test\test_farm.py", line 298, in clean
shutil.rmtree(cleandir)
File "c:\python23\lib\shutil.py", line 142, in rmtree
raise exc[0], (exc[1][0], exc[1][1] + ' removing '+arg)
WindowsError: [Errno 5] Access is denied removing xml_1
File "C:\ned\coverage\trunk\test\test_farm.py", line 298, in clean
shutil.rmtree(cleandir)
File "c:\python23\lib\shutil.py", line 142, in rmtree
raise exc[0], (exc[1][0], exc[1][1] + ' removing '+arg)
WindowsError: [Errno 3] The system cannot find the path specified removing out
On Windows XP, it never failed. On Windows 7, it fails like this, across a few different Python versions (2.3-2.6, not sure about 3.1).
Anyone seen anything like this and have a solution? The code itself is on bitbucket for the truly industrious.
It's a long shot, but are you running anything that scans directories in the background? I'm thinking antivirus/backup (maybe Windows 7 has something like that built in? I don't know). I have experienced occasional glitches when deleting/moving files, caused by the TSVNCache.exe process that TortoiseSVN starts: it seems to watch directories for changes, then presumably opens them to scan the files.
We had similar problems with shutil.rmtree on Windows, particularly looking like your first stack trace. We solved it by using an exception handler with rmtree. See this answer for details.
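As a sketch of that approach (the exact handler is in the linked answer), an onerror callback clears the read-only attribute that makes Windows report "Access is denied", then retries the failed operation:
import os
import shutil
import stat

def handle_remove_readonly(func, path, exc_info):
    # Clear the read-only bit and retry the os.remove/os.rmdir that failed.
    os.chmod(path, stat.S_IWRITE)
    func(path)

shutil.rmtree(cleandir, onerror=handle_remove_readonly)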
Just a thought, but if the test behavior (creating and deleting lots of temp files) isn't typical of what the app actually does, maybe you could move those test file operations to (c)StringIO, and keep a suite of function tests that exercises your app's actual file creation/deletion behavior.
That way, you can make sure that your app behaves correctly, without introducing extra complexity not related to the app.
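For instance, a minimal sketch (using cStringIO, to match the Python 2 versions in the question) where an in-memory buffer stands in for a real temp file:
from cStringIO import StringIO

buf = StringIO()        # behaves like a writable file, but lives in memory
buf.write("test data\n")
buf.seek(0)
assert buf.read() == "test data\n"   # and there is nothing on disk to clean up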
My guess is that you should check up on the code that creates the files, and make SURE they are closed explicitly before moving on to delete them. If nothing is obvious there in the code, download a copy of Process Monitor and watch what's happening on the file system there. This tool will give you the exact error code coming from Windows, and should shed some light on the situation.
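For example, a sketch of the explicit-close pattern (try/finally, since the Python 2.3/2.4 versions in the question predate the with statement; path and data are placeholders): a handle left open is exactly what makes a later rmtree fail on Windows.
f = open(path, "w")
try:
    f.write(data)
finally:
    f.close()    # guarantees the handle is released before rmtree runs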
That "The system cannot find the path specified:" error will appear intermittently if the path is too long for Windows (260 chars). Automated tasks often create folder hierarchies using relative references that produce fully qualified paths longer than 260 characters. Any script that attempts to delete those folders using fully qualified paths will fail.
I built a quick workaround that used relative path references, but don't have generic code solution to offer, just a warning that the excellent answers provided might not help you.
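For what it's worth, another commonly cited workaround (a sketch, not what I used) is the \\?\ extended-length prefix, which tells the Windows APIs to accept paths longer than 260 characters; it requires an absolute, unicode path:
import os
import shutil

def rmtree_long(path):
    # \\?\ only works with absolute paths; a unicode string selects the
    # wide-character Windows APIs that honor the prefix.
    shutil.rmtree(u"\\\\?\\" + os.path.abspath(path))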
I ran into the same problem with shutil.rmtree, and in my case it was caused by special characters in the directory name (e.g. from another language: леме / Ö).
Deleting the directory with a unicode path worked, because Python then uses the wide-character Windows APIs:
import sys
import shutil

# Decoding with the filesystem encoding yields a unicode path, so Python
# calls the wide-character Windows APIs, which handle non-ASCII names.
shutil.rmtree("<folder_name>".decode(sys.getfilesystemencoding()))
Enjoy it!
I am working in Python, and sometimes with os.rename() I run into the problem that if the file or folder to rename is open in Windows, I get a PermissionError: [WinError 5] error.
So if I close the folder and rerun the script, everything works.
In any case I am working on Windows, but I think it is good practice to take into account that the script could also be run on a Macintosh.
I don't know what the best practice is for this, but please have a little patience; I'm still learning Python and really don't know how to ask the question any better than that.
In general, no. This is a violation of system security: another entity (user, session, process, etc.) is using the resource (file or folder), in a way that requires exclusive rights (typically update / modify rights). If you steal the resource, how is that other entity supposed to react or adapt to the change?
This is why an OS has these privileges and locks: to manage system resources. Since you already have user control over the resource, you are supposed to use that authority to release the file lock, not crack the security from outside.
However, as the controlling user, you do likely have rights to view your own sessions and jobs, to inquire which one owns the resource, and then terminate the job or otherwise force it to release the resource.
In that fashion, it is possible to steal the resource from the other process. If you want to do that, you need to educate yourself on your OS's capabilities and rights. The most useful ones will be available through Python's os package. Enjoy the learning.
To echo Prune's answer, figuring out where it's in use and closing it sounds difficult, and probably not a great idea anyway. Imagine if it's not available because some other program is currently saving to one of the files — you could end up with corrupted data.
That said, you could make your Python script smart enough to notice there's a permission error, then pause and let you know so that you could try and close things on your own before telling it to try again and continue.
import os

def try_to_rename(src, dst):
    # Keep retrying until the rename succeeds; on PermissionError, pause so
    # the user can close whatever has the file or folder open.
    while True:
        try:
            os.rename(src, dst)
            break
        except PermissionError:
            input(f"Unable to rename {src} to {dst}. If one or both "
                  "files/folders are open, please close them. Press Enter to "
                  "continue.")
P.S. It makes little difference for this simple example, but for working extensively with paths and files, I'd recommend pathlib over os. It can really make things a lot more convenient and readable.
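For instance, the same rename with pathlib (a minimal sketch; the file names are placeholders):
from pathlib import Path

src = Path("old_name.txt")
src.rename(Path("new_name.txt"))   # raises the same PermissionError when locked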
An intern of mine is trying to use deeptools' bamCoverage function, and it throws a "'/my/dir/data.bam' file does not exist" error despite the file being there. Yes, the file does exist; we can manipulate it using bash commands just fine, so there's no real reason for it to throw that error.
According to this thread, it could be an issue with pysam or python. Both are fully up to date. Do you know how I could investigate issues with pysam or python IO further?
All of this is happening on a server that our team uses. Could it be an issue with python's paths for his user session?
For reference, here is the code I'm running in my bash terminal. It's pretty basic:
my.name@server:/mnt/data1/my.name/PROJET_X/DATA/BIGWIG$ bamCoverage -p 8 -b /mnt/data1/my.name/PROJET_X/DATA/BAM_sort/G0-G00.Inputs.sorted.bam -o /mnt/data1/my.name/PROJET_X/DATA/BIGWIG/G0-G00.Inputs.RPGC.bw --normalizeUsing RPGC --effectiveGenomeSize 2913022398 -bs 10
The file '/mnt/data1/my.name/PROJET_X/DATA/BAM_sort/G0-G00.Inputs.sorted.bam' does not exist
my.name@server:/mnt/data1/my.name/PROJET_X/DATA/BIGWIG$ ll /mnt/data1/my.name/PROJET_X/DATA/BAM_sort/G0-G00.Inputs.sorted.bam
-rwxr-xr-x 1 my.name bioinfo 4171366400 juin 22 10:14 /mnt/data1/my.name/PROJET_X/DATA/BAM_sort/G0-G00.Inputs.sorted.bam
I have used this line probably 100 times this past year, but now it won't find the file.
I've also tried installing deeptools in my home directory using conda, but that gives the same errors.
EDIT: Apparently, it was just a problem with that one file. bamCoverage will work on other data files. It would be nice if deeptools would tell you that instead of just "file not found"...
It would be helpful to see what code is being run here. Please provide a minimal reproducible example, including the full output / error, if at all possible.
Right now, I can think of four possible solutions to common problems (a quick diagnostic sketch follows the list):
As mentioned in the comments, providing the full path might be one solution. You can get the full path by using the pwd bash command in the folder where the file is stored and then add the file name at the end of the string.
Another thing that sometimes helps with relative (local) paths is to start the path with a ".", so the path would be something like this: './my/dir/data.bam'.
Another problem I've encountered (mostly when running things in Windows environments) was that backslashes had to be used instead: '\\my\\dir\\data.bam' (escaped, or as a raw string r'\my\dir\data.bam', so Python doesn't interpret the backslashes as escape sequences).
For the sake of completeness I'll mention that, especially with beginning programmers, the path is often erroneously provided without quotation marks (which are what make it a string), but that does not seem to be the case here.
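As the promised diagnostic sketch (the path is the placeholder from the question), checking what Python itself sees often pinpoints which of the above applies:
import os

path = '/my/dir/data.bam'     # placeholder from the question
print(repr(path))             # reveals stray whitespace or odd characters
print(os.path.abspath(path))  # what a relative path actually resolves to
print(os.path.exists(path))   # does this process see the file at all?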
Hope this helps!
I need to modify a text file at runtime but restore its original state later (even if the computer crashes).
My program runs in regular sessions. Once a session has ended, the original state of that file can be changed, but the original state won't change at runtime.
There are several instances of this text file with the same name in several directories. My program runs in each directory (but not in parallel), and depending on the directory's contents it does different things. The order of choosing a working directory is completely arbitrary.
Since the file's name is the same in each directory, it seems a good idea to store the backed-up file in slightly different places (i.e. the parent directory name could be appended to the backup target path).
What I do now is backup and restore the file with a self-written class, and also check at startup if the previous backup for the current directory was properly restored.
But my implementation needs serious refactoring, and now I'm interested if there are libraries already implemented for this kind of task.
edit
Version control seems like a good idea, but it's actually a bit overkill, since it requires a network connection and often a server. Other VCSs need clients to be installed. I would be happier with a pure-Python solution, but at least it should be cross-platform, portable and small enough (<10 MB, for example).
Why not just do what every Unix, Mac, and Windows editor has done for years: use the lockfile/working-file concept.
When a file is selected for edit:
Check to see if there is an active lock or a crashed backup.
If the file is locked or a crashed backup exists, give a "recover" option
Otherwise, begin editing the file...
The editing tends to do one or more of a few things:
Copy the original file into a ".%(filename)s.backup"
Create a ".%(filename)s.lock" to prevent others from working on it
When editing is achieved, the lock goes away and the .backup is removed
Sometimes things are slightly reversed, and the original stays in place while a .backup is the active edit; on success the .backup replaces the original
If you crash vi or some other text program on a Linux box, you'll see these files created. Note that they usually have a dot (.) prefix, so they're normally hidden on the command line. Word/PowerPoint/etc. all do similar things.
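A minimal sketch of the pattern in Python (the names and layout are illustrative, not any particular library's API):
import os
import shutil

def begin_edit(path):
    # Back up the original and drop a lock file before any modification.
    d, name = os.path.split(path)
    backup = os.path.join(d, ".%s.backup" % name)
    lock = os.path.join(d, ".%s.lock" % name)
    if os.path.exists(lock) or os.path.exists(backup):
        raise RuntimeError("previous session crashed; recover %s first" % path)
    shutil.copy2(path, backup)   # crash recovery restores from this copy
    open(lock, "w").close()      # signals that an edit is in progress
    return backup, lock

def end_edit(backup, lock):
    # The edit succeeded: discard the backup and release the lock.
    os.remove(backup)
    os.remove(lock)
At startup, a leftover .backup with no active editor is the crash signal: copy it back over the original, then delete it and the lock.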
Implement version control, e.g. with svn (see pysvn). It should be fast as long as the repo is on the same machine, and it allows rollbacks to any version of the file. Maybe overkill, but it makes everything reversible.
http://pysvn.tigris.org/docs/pysvn_prog_guide.html
You don't need a server: you can have local version control and it should be fine.
Git, Subversion or Mercurial is your friend.
We have an application based on Excel 2003 and Python 2.4 on Windows XP 32bit. The application consists of a large collection of Python functions which can be called from a number of excel worksheets.
We've noticed an anomalous behavior, which is that sometimes, in the middle of one of these calls, the Python interpreter will start hunting around for modules which almost certainly are already loaded and in memory.
We know this because we were able to hook up Sysinternals' Process Monitor to the process and observe that from time to time the process (when called) starts hunting around a bunch of directories and eggs for certain .py files.
The obvious thing to try is to see whether the Python search path had been modified; however, we found this not to be the case. It's exactly what we'd expect. The odd things are that:
The occasions on which this searching behavior was triggered appear to be random, i.e. it did not happen every time or with any noticeable pattern.
The behavior did not affect the result of the function. It returned the same value irrespective of whether this file searching behavior was triggered.
The folders that were being scanned were non-existent (e.g. J:/python-eggs) on a machine where the J: drive contained no such folder. Naturally Procmon reports that this generated a file-not-found error.
It's all very mysterious, so I don't expect anybody to be able to provide a definitive answer as to what might be going wrong. I would appreciate any suggestions about how this problem might be debugged.
Thanks!
Answers to comments
All the things that are being searched for are actual, known Python files which exist in the main project .egg file. The odd thing is that at the time they are being searched for, those particular modules have already been imported. They must be in memory in order for the process to work.
Yes, this affects performance, because sometimes this searching behavior tries to hit network drives. Also, by searching eggs which couldn't possibly contain certain modules, the process gets interrupted by the corporate-mandated virus scanner. That slows down what would normally be a harmless and instant interruption.
This is stock python 2.4.4. No modifications.
Python programs can import modules at any time, not just during program load. Try searching the modules you are using for import statements.
If this doesn't work, you can write an import hook to catch and report all attempted imports before they occur. For example, if you run this before everything else, you will get a dump of every attempted import and its source:
import sys, traceback

class ImportDebugger:
    def find_module(self, fullname, path=None):
        # Report the attempted import and where it came from; returning
        # None lets the normal import machinery continue unchanged.
        print "Attempting to import %s:" % fullname
        traceback.print_stack()

sys.meta_path.insert(0, ImportDebugger())
"Python functions which can be called from a number of excel worksheets"
And you're not blaming Excel for randomly running Python modules? Why not? How have you proven that Excel is behaving properly?
file_read = open("/var/www/rajaneesh/file/_config.php", "r")
contents = file_read.read()
print contents
file_read.close()
The output is empty, but the file does have contents. Please help me read and replace a string in _config.php.
Usually, when there is this kind of issue, it is very useful to start the interactive shell and analyze the commands one by one.
For instance, it could be that the file does not exist (see the comment from freiksenet), or you do not have privileges on it, or it is locked by another process.
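A quick check in the interactive shell can rule those out (the path is taken from the question; Python 2 syntax, to match the code above):
import os

path = "/var/www/rajaneesh/file/_config.php"
print os.path.exists(path)        # is the path really there?
print os.access(path, os.R_OK)    # do we have read permission?
print os.path.getsize(path)       # a size of 0 would explain the empty read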
If you execute the script in some system (like a web server, as the path could suggest), the exception could go to a log - or simply be swallowed by other components in the system.
By contrast, if you execute it in the interactive shell, you can immediately see what the problem is, and inspect the objects involved (using help(), dir() or the inspect module). By the way, this is also a good method for developing a script: tinker with the concept in the shell, then put it all together.
While we are here, I strongly suggest you use IPython. It is an evolution of the standard shell, with powerful aids for introspection (just press Tab, or put a question mark after an object). Unfortunately the site has often been unavailable in recent weeks, but there is a good chance you already have it installed on your system.
I copied your code onto my own system, and changed the filename so that it works on my system. Also, I changed the indenting (putting everything at the same level) from what shows in your question. With those changes, the code worked fine.
Thus, I think it's something else specific to your system that we probably cannot solve here (easily).
Would it be possible that you don't have read access to the file you are trying to open?