os.walk() not processing subdirectories when using UNC paths

I'm having trouble with os.walk() in Python 2.7.8 on Windows.
When I supply it with a 'normal' path such as "D:\Test\master", it works as expected. However, when I supply it with a UNC path such as "\\?\D:\Test\master", it reports the root directory as expected but does not drill down into the subdirectories, nor does it raise an exception.
My research: the help page says os.walk() accepts a function argument (onerror) to handle errors. By default this argument is None, so no error is reported.
I passed a simple function to report the error and received the following for every directory:
```python
def WalkError(Error):
    raise Exception(Error)
```
Stack trace:
```
Traceback (most recent call last):
  File "Compare.py", line 988, in StartServer
    for root, dirs, files in os.walk(ROOT_DIR,True,WalkError):
  File "C:\Program Files (x86)\Python2.7.8\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Program Files (x86)\Python2.7.8\lib\os.py", line 281, in walk
    onerror(err)
  File "Compare.py", line 62, in WalkError
    raise Exception(Error)
Exception: [Error 123] The filename, directory name, or volume label syntax is incorrect: '\\\\?\\D:\\Test\\master\\localization/*.*'
```
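Raising from the onerror callback, as WalkError does, stops the walk at the first directory that cannot be listed. A minimal sketch of an alternative, assuming you want every failure reported while the rest of the tree is still visited: pass a list's append method as onerror so os.walk() hands each OSError to it. The helper name walk_collecting_errors is hypothetical.

```python
import os


def walk_collecting_errors(top):
    """Walk `top` and return (file_paths, errors).

    os.walk() calls onerror with the OSError raised when a directory cannot
    be listed; collecting them makes the failures visible instead of having
    those directories silently skipped (the default when onerror is None).
    """
    errors = []
    paths = []
    for root, dirs, files in os.walk(top, onerror=errors.append):
        for name in files:
            paths.append(os.path.join(root, name))
    return paths, errors
```

Each collected error carries the offending path in its filename attribute, which is usually enough to spot a malformed path like the one in the traceback above.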

Answer from the original author (originally posted as an edit to the question):
Instant update: while inspecting \lib\os.py, I discovered that the error stems from os.listdir(). I searched for the above error message in relation to os.listdir() and found this solution, which worked for me. (The error message itself hints at the cause: '/*.*' was appended to the directory name, and Windows does not convert / to \ in paths that start with \\?\.)
It looks like if you're going to use UNC-style paths with the os module, they need to be Unixised (have their \ converted to /): \\\\?\\D:\\Test\\master\\ becomes //?/D:/Test/master/ (note: you no longer need to escape the \, which is handy).
This runs counter to the UNC 'spec', so be careful if you're working with other modules that respect Microsoft's UNC implementation.
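The workaround above boils down to a string replacement before calling os.walk(). A minimal sketch, assuming the Python 2.7-on-Windows behaviour the author describes; the helper names are hypothetical, and the forward-slash form has only been reported to work in that setting.

```python
import os


def to_forward_slashes(path):
    r"""Replace every backslash with a forward slash, so that the path
    \\?\D:\Test\master becomes //?/D:/Test/master (the author's workaround)."""
    return path.replace("\\", "/")


def walk_extended(path):
    """Run os.walk() on the forward-slash form of `path`."""
    return os.walk(to_forward_slashes(path))
```

Keep the original backslash form around if you also pass the path to other modules that expect Microsoft's \\?\ syntax.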
(Sorry for the self-solution, I was going to close the tab but felt there was knowledge here which couldn't be found elsewhere.)

Related

Errno 22 Invalid argument - Zipfile Is Skipped

I am working on a project in Python in which I am parsing data from a zipped folder containing log files. The code works fine for most zips, but occasionally this exception is thrown:
[Errno 22] Invalid argument
As a result, the entire file is skipped, thus excluding the data in the desired log files from the results. When I try to extract the zipped file using the default Windows utility, I am met with this error:
Zip error
However, when I try to extract the file with 7zip, it does so successfully, save 2 errors:
1. <path> Unexpected End of Data
2. Data error: x.csv
x.csv is totally unrelated to the log I am trying to parse, and as such, I need to write code that is resilient to the point where if an unrelated file is corrupted, it will still be able to parse the other logs that are not.
At the moment, I am using the zipfile module to extract the files into memory. Is there a robust way to do this without the entire file being skipped?
Update 1: I believe the problem is that the zip file is missing its footer (the end-of-central-directory record); I noticed this when looking at it in a hex editor. I don't really know how to safely edit the actual file using Python.
Here is the code that I am using to extract zips into memory:
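```python
import os
import zipfile

for zip in os.listdir(directory):
    try:
        if zip.lower().endswith('.zip'):
            if os.path.isfile(directory + "\\" + zip):
                logs = zipfile.ZipFile(directory + "\\" + zip)
                for log in logs.namelist():
                    if log.endswith('log.txt'):
                        data = logs.read(log)
    except zipfile.BadZipFile:
        # The except clause was not shown in the post; this one is assumed.
        # Any error raised above skips the whole archive.
        continue
```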
Edit 2: Traceback for the error:
```
Traceback (most recent call last):
  File "c:/Users/xxx/Desktop/Python Porjects/PE/logParse.py", line 28, in parse
    logs = zipfile.ZipFile(directory + "\\" + zip)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1222, in __init__
    self._RealGetContents()
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python37\lib\zipfile.py", line 1289, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
```
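When the archive does open but one member is damaged (the "Data error: x.csv" case from 7-Zip), guarding each member read separately keeps the other logs. This is a sketch, not the asker's code; read_logs is a hypothetical helper. Note that it does not help with the traceback above, where ZipFile() itself fails before any member can be read.

```python
import zipfile
import zlib


def read_logs(zip_path, suffix="log.txt"):
    """Return {member_name: bytes} for members whose name ends with `suffix`.

    Each read is wrapped in its own try block, so one corrupt member does
    not discard the rest. zip_path may be a path or a file-like object.
    """
    results = {}
    with zipfile.ZipFile(zip_path) as logs:
        for name in logs.namelist():
            if not name.endswith(suffix):
                continue
            try:
                results[name] = logs.read(name)
            except (zipfile.BadZipFile, zlib.error, EOFError) as exc:
                # Bad CRC, corrupt deflate data, or a truncated member.
                print("skipping %s in %s: %s" % (name, zip_path, exc))
    return results
```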

The stack trace shows that it's not your code that fails to read the file: the error is raised by Python's zipfile module itself.
It looks like Python's zipfile module is stricter than other programs (see this bug, where a user reports a difference between Python's behaviour and other programs such as GNOME Archive Manager).
It may be worth filing a bug report.
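If the footer really is missing, zipfile cannot open the archive at all, because it locates the members through the central directory at the end of the file. A different approach, not from the answer above and offered only as a sketch: scan the raw bytes for local file headers (signature PK\x03\x04) and decompress each member independently. salvage_zip is a hypothetical helper; it handles only stored and deflated members, ignores ZIP64 and encryption, and could be misled by a header signature that happens to occur inside stored data.

```python
import struct
import zlib

# ZIP local file header: signature, version, flags, method, time, date,
# crc32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # 30 bytes, little-endian
LOCAL_SIG = b"PK\x03\x04"


def salvage_zip(blob):
    """Return (recovered, skipped) from a ZIP whose footer may be missing.

    recovered maps member name -> bytes for members that decompressed
    cleanly; skipped lists the names that could not be read.
    """
    recovered, skipped = {}, []
    pos = blob.find(LOCAL_SIG)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(blob):
        (_, _, flags, method, _, _, crc, comp_size, _,
         name_len, extra_len) = LOCAL_HEADER.unpack_from(blob, pos)
        name_start = pos + LOCAL_HEADER.size
        raw_name = blob[name_start:name_start + name_len]
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        data_start = name_start + name_len + extra_len
        next_search = data_start  # fallback: rescan right after the header
        try:
            if method == 0:  # stored
                data = blob[data_start:data_start + comp_size]
                if len(data) < comp_size:
                    raise ValueError("truncated stored member")
                next_search = data_start + comp_size
            elif method == 8:  # deflate: raw stream, no zlib header
                d = zlib.decompressobj(-15)
                rest = blob[data_start:]
                data = d.decompress(rest)
                if not d.eof:
                    raise ValueError("truncated deflate stream")
                next_search = data_start + len(rest) - len(d.unused_data)
            else:
                raise ValueError("unsupported compression method %d" % method)
            # The local-header CRC is only meaningful when flag bit 3 is clear.
            if not flags & 0x08 and zlib.crc32(data) != crc:
                raise ValueError("CRC mismatch")
            recovered[name] = data
        except (zlib.error, ValueError):
            skipped.append(name)
        pos = blob.find(LOCAL_SIG, next_search)
    return recovered, skipped
```

Usage would be along the lines of `recovered, skipped = salvage_zip(open(path, "rb").read())`, keeping only the names that end with log.txt.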

Python getting WindowsError 5 when deleting a file even though I have full permissions

Quick question about Python on Windows.
I have a script that compiles a program (using an install rule), and then moves the build products to a remote destination over the network.
However, I keep getting WindowsError 5 Access Denied.
All files are created from the script context, I have ownership and full control on all of them. The copy to the remote destination succeeds, but the failure is during the deletion process.
If I try to delete or rename the file manually in Windows, I get no errors. Only shutil.move fails.
I'm thinking maybe the API is trying to delete the file when the network operation is not yet complete?
Any input is very appreciated.
```python
import shutil

try:
    shutil.move(directory, destination)
except OSError:
    print "Failed to move %s to %s." % (directory, destination)
    raise
```
...
```
Traceback (most recent call last):
  File "C:\WIP\BuildMachine\build_machine.py", line 176, in <module>
    main()
  File "C:\WIP\BuildMachine.hg\BuilderInstance.py", line 496, in deployVersion
    shutil.move(directory, destination)
  File "C:\Python27\lib\shutil.py", line 300, in move
    rmtree(src)
  File "C:\Python27\lib\shutil.py", line 252, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "C:\Python27\lib\shutil.py", line 250, in rmtree
    os.remove(fullname)
WindowsError: [Error 5] Access is denied: '3_54_7__B_1\\Application_Release_Note.doc'
```

The problem with shutil.move on Windows is that it doesn't handle the case where:
- the source and destination aren't on the same drive, AND
- some files in the source directory are write-protected.
If both conditions are met, shutil.move cannot perform an os.rename; instead it has to:
- copy the files (which isn't an issue), then
- delete the source files (which is an issue because of a limitation of shutil).
To fix that, I made myself a copy of the shutil module (under a different name) and added that line (for you it'll be right before line 250):
```python
os.chmod(fullname, 0o777)  # <-- add that line
os.remove(fullname)  # some versions have "unlink" instead
```
The rmtree function has the same issue on Windows.
On Linux this doesn't happen, because permission to delete a file is governed by its directory rather than by the file itself. On Windows it doesn't work that way: a read-only file cannot be deleted. Adding os.chmod does the trick (even if it's a hack), since on Windows os.chmod can only change the read-only flag, and clearing it lets os.remove succeed (unless the file is open in Word or some other program).
Note that the shutil authors encourage you to copy and improve the functions. Also, a note from the documentation of shutil.move:
"A lot more could be done here... A look at a mv.c shows a lot of the issues this implementation glosses over."
If you don't want to modify shutil, you can run a recursive chmod on the source files to make sure that shutil.move will work, for instance like this:
```python
for root, dirs, files in os.walk(path):
    for f in dirs + files:
        os.chmod(os.path.join(root, f), 0o777)
```
You could also use shutil.copytree and then a modified version of shutil.rmtree (since you know that the source and destination aren't on the same filesystem).
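The copytree-then-rmtree idea can be sketched without a patched copy of shutil by using rmtree's onerror hook, which is called with the failing function and path. The handler clears the read-only flag and retries, like the chmod line above; move_tree and _clear_readonly_and_retry are hypothetical names. On Python 3.12+, onerror still works but is deprecated in favour of onexc.

```python
import os
import shutil
import stat


def _clear_readonly_and_retry(func, path, exc_info):
    """rmtree error handler: clear the read-only flag, then retry once.

    On Windows, os.chmod can only toggle the read-only attribute, which is
    what makes os.remove fail with 'Access is denied' in this situation.
    """
    os.chmod(path, stat.S_IWRITE)
    func(path)


def move_tree(src, dst):
    """Copy src to dst, then delete src even if some files are read-only.

    Assumes dst does not exist yet (shutil.copytree requires that).
    """
    shutil.copytree(src, dst)
    shutil.rmtree(src, onerror=_clear_readonly_and_retry)
```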

File does not exist error with 'w' mode

I am encountering odd behaviour from the file() builtin. I am using the unittest-xml-reporting Python package to generate results for my unit tests. Here are the lines that open a file for writing, a file which (obviously) does not exist:
```python
report_file = file('%s%sTEST-%s.xml' % \
    (test_runner.output, os.sep, suite), 'w')
```
(code is taken from the package's Github page)
However, I am given the following error:
```
...
  File "/home/[...]/django-cms/.tox/pytest/local/lib/python2.7/site-packages/xmlrunner/__init__.py", line 240, in generate_reports
    (test_runner.output, os.sep, suite), 'w')
IOError: [Errno 2] No such file or directory: './TEST-cms.tests.page.NoAdminPageTests.xml'
```
I found this weird because, as the Python docs state, if the w mode is used, the file should be created if it doesn't exist. Why is this happening and how can I fix this?

From man 2 open:
"ENOENT O_CREAT is not set and the named file does not exist. Or, a directory component in pathname does not exist or is a dangling symbolic link."
Take your pick :)
In human terms:
- your current working directory (./) has been removed by the time this command runs;
- ./TEST-cms.tests.page.NoAdminPageTests.xml exists but is a symlink pointing to nowhere;
- the "w" in your open/file call is somehow messed up, e.g. because you redefined the file builtin.

file() will create a file, but not a directory. You have to create the directory first, as seen here.

It seems the file was being created in a directory that had already been deleted: the path was given as ., and the test directory had most probably been removed by that point.
I managed to fix this by supplying an absolute path to test_runner.output and the result files are successfully created now.
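Both answers point to the same fix: make sure the output directory exists, and pin it to an absolute path while the working directory still exists. A minimal sketch with hypothetical helpers; the TEST-<suite>.xml naming mirrors the snippet in the question.

```python
import os


def resolve_output_dir(output_dir):
    """Resolve output_dir to an absolute path now, while the current
    working directory still exists, and create it if it is missing."""
    output_dir = os.path.abspath(output_dir)
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


def open_report(output_dir, suite):
    """Open TEST-<suite>.xml for writing. Mode 'w' creates the file,
    but not any missing parent directories."""
    path = os.path.join(output_dir, "TEST-%s.xml" % suite)
    return open(path, "w")
```

The point of splitting the two steps is that resolve_output_dir should run early (for example at start-up), before any test has a chance to delete the working directory.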

Python doctest example failure

This is probably a silly question.
I am experimenting with Python doctest, and I am trying to run this example, which ends with:
```python
if __name__ == "__main__":
    import doctest
    doctest.testfile("example.txt")
```
I have put "example.txt" in the same folder as the source file containing the example code, but I get the following error:
```
Traceback (most recent call last):
  File "test_av_funktioner.py", line 61, in <module>
    doctest.testfile("example.txt")
  File "C:\Python26\lib\doctest.py", line 1947, in testfile
    text, filename = _load_testfile(filename, package, module_relative)
  File "C:\Python26\lib\doctest.py", line 219, in _load_testfile
    return open(filename).read(), filename
IOError: [Errno 2] No such file or directory: 'example.txt'
```
Can I somehow tell/set where the doctest module is searching for the specified file?

Doctest searches relative to the calling module's directory by default (but you can override this).
Quoting the docs for doctest.testfile:
Optional argument module_relative specifies how the filename should be interpreted:
If module_relative is True (the default), then filename specifies an OS-independent module-relative path. By default, this path is relative to the calling module’s directory; but if the package argument is specified, then it is relative to that package. To ensure OS-independence, filename should use / characters to separate path segments, and may not be an absolute path (i.e., it may not begin with /).
If module_relative is False, then filename specifies an OS-specific path. The path may be absolute or relative; relative paths are resolved with respect to the current working directory.
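Using the documented module_relative=False option, a small sketch (run_doctest_file is a hypothetical helper) that resolves the file against an explicit directory, such as the script's own directory, instead of relying on how doctest locates the calling module:

```python
import doctest
import os


def run_doctest_file(filename, base_dir=None):
    """Run the doctests in `filename` and return doctest's TestResults.

    module_relative=False makes doctest treat `filename` as an ordinary OS
    path (absolute, or relative to the current working directory).
    """
    if base_dir is not None:
        filename = os.path.join(base_dir, filename)
    return doctest.testfile(filename, module_relative=False)


# Typical use from a script, resolving example.txt next to the script:
#     here = os.path.dirname(os.path.abspath(__file__))
#     run_doctest_file("example.txt", base_dir=here)
```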

Using Python, How to copy files in 'temporary internet files' folder in Windows

I am using this code to find files, recursively in a folder, with a size greater than 500000 bytes:
```python
import os

def listall(parent):
    lis = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            if os.path.getsize(os.path.join(root, name)) > 500000:
                lis.append(os.path.join(root, name))
    return lis
```
This is working fine.
But when I use this on the 'Temporary Internet Files' folder in Windows, I get this error:
```
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    listall(a)
  File "<pyshell#2>", line 5, in listall
    if os.path.getsize(os.path.join(root,name))>500000:
  File "C:\Python26\lib\genericpath.py", line 49, in getsize
    return os.stat(filename).st_size
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Documents and Settings\\khedarnatha\\Local Settings\\Temporary Internet Files\\Content.IE5\\EDS8C2V7\\??????+1[1].jpg'
```
I think this is because Windows gives the files in this particular folder names with special characters...
Please help me sort out this issue.

It's because the saved file ‘(something)+1[1].jpg’ has non-ASCII characters in its name, characters that don't fit into the ‘system default code page’ (also misleadingly known as ‘ANSI’).
Programs like Python that use the byte-based C standard library (stdio) file access functions have big problems with Unicode filenames. On other platforms they can just use UTF-8 and everyone's happy, but on Windows the system default code page is not UTF-8 by default, so there will always be characters that can't be represented in the given encoding. They'll get replaced with ? or sometimes other similar-looking characters, and then when you try to read the files with mangled names you'll get errors like the one above.
Which code page you get depends on your locale: on Western Windows installs it'll be cp1252 (similar to ISO-8859-1, ‘Latin-1’), so you'll only be able to use those characters.
Luckily, reasonably recent versions of Python (2.3+, according to PEP 277) can also directly support Unicode filenames by using the native Win32 APIs instead of stdio. If you pass a Unicode string into os.listdir(), Python will use these native Unicode APIs and you'll get Unicode strings back, which include the original characters in the filename instead of mangled ones. Since os.walk() calls os.listdir() internally, if you call listall with a Unicode pathname:
```python
listall(ur'C:\Documents and Settings\khedarnatha\Local Settings\Temporary Internet Files')
```
it should Just Work.
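For comparison, on Python 3 the str type is already Unicode, so passing an ordinary string path to os.walk() makes Python use the wide Win32 APIs and the mangling described above does not occur. A sketch of listall for Python 3; the min_size parameter and the OSError handling are additions, not part of the original question.

```python
import os


def listall(parent, min_size=500000):
    """Return the paths of files under `parent` larger than min_size bytes."""
    big = []
    for root, dirs, files in os.walk(parent):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.getsize(path) > min_size:
                    big.append(path)
            except OSError:
                # Cache files can vanish or be locked mid-walk; skip them
                # instead of aborting the whole scan.
                continue
    return big
```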
