Pathname too long to open? - python

This is a screenshot of the execution:
As you see, the error says that the directory "JSONFiles/Apartment/Rent/dubizzleabudhabiproperty" is not there.
But look at my files, please:
The folder is definitely there.
Update 2
The code
self.file = open("JSONFiles/"+ item["category"]+"/" + item["action"]+"/"+ item['source']+"/"+fileName + '.json', 'wb') # Create a new JSON file with the name = fileName parameter
line = json.dumps(dict(item)) # Change the item to a JSON format in one line
self.file.write(line) # Write the item to the file
UPDATE
When I change the file name to a smaller one, it works, so the problem is because of the length of the path. what is the solution please?

Regular DOS paths are limited to MAX_PATH (260) characters, including the string's terminating NUL character. You can exceed this limit by using an extended-length path that starts with the \\?\ prefix. This path must be a Unicode string, fully qualified, and only use backslash as the path separator. Per Microsoft's file system functionality comparison, the maximum extended path length is 32760 characters. A individual file or directory name can be up to 255 characters (127 for the UDF filesystem). Extended UNC paths are also supported as \\?\UNC\server\share.
For example:
import os
def winapi_path(dos_path, encoding=None):
if (not isinstance(dos_path, unicode) and
encoding is not None):
dos_path = dos_path.decode(encoding)
path = os.path.abspath(dos_path)
if path.startswith(u"\\\\"):
return u"\\\\?\\UNC\\" + path[2:]
return u"\\\\?\\" + path
path = winapi_path(os.path.join(u"JSONFiles",
item["category"],
item["action"],
item["source"],
fileName + ".json"))
>>> path = winapi_path("C:\\Temp\\test.txt")
>>> print path
\\?\C:\Temp\test.txt
See the following pages on MSDN:
Naming Files, Paths, and Namespaces
Defining an MS-DOS Device Name
Kernel object namespaces
Background
Windows calls the NT runtime library function RtlDosPathNameToRelativeNtPathName_U_WithStatus to convert a DOS path to a native NT path. If we open (i.e. CreateFile) the above path with a breakpoint set on the latter function, we can see how it handles a path that starts with the \\?\ prefix.
Breakpoint 0 hit
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus:
00007ff9`d1fb5880 4883ec58 sub rsp,58h
0:000> du #rcx
000000b4`52fc0f60 "\\?\C:\Temp\test.txt"
0:000> r rdx
rdx=000000b450f9ec18
0:000> pt
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x66:
00007ff9`d1fb58e6 c3 ret
The result replaces \\?\ with the NT DOS devices prefix \??\, and copies the string into a native UNICODE_STRING:
0:000> dS b450f9ec18
000000b4`536b7de0 "\??\C:\Temp\test.txt"
If you use //?/ instead of \\?\, then the path is still limited to MAX_PATH characters. If it's too long, then RtlDosPathNameToRelativeNtPathName returns the status code STATUS_NAME_TOO_LONG (0xC0000106).
If you use \\?\ for the prefix but use slash in the rest of the path, Windows will not translate the slash to backslash for you:
Breakpoint 0 hit
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus:
00007ff9`d1fb5880 4883ec58 sub rsp,58h
0:000> du #rcx
0000005b`c2ffbf30 "\\?\C:/Temp/test.txt"
0:000> r rdx
rdx=0000005bc0b3f068
0:000> pt
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x66:
00007ff9`d1fb58e6 c3 ret
0:000> dS 5bc0b3f068
0000005b`c3066d30 "\??\C:/Temp/test.txt"
Forward slash is a valid object name character in the NT namespace. It's reserved by Microsoft filesystems, but you can use a forward slash in other named kernel objects, which get stored in \BaseNamedObjects or \Sessions\[session number]\BaseNamedObjects. Also, I don't think the I/O manager enforces the policy on reserved characters in device and filenames. It's up to the device. Maybe someone out there has a Windows device that implements a namespace that allows forward slash in names. At the very least you can create DOS device names that contain a forward slash. For example:
>>> kernel32 = ctypes.WinDLL('kernel32')
>>> kernel32.DefineDosDeviceW(0, u'My/Device', u'C:\\Temp')
>>> os.path.exists(u'\\\\?\\My/Device\\test.txt')
True
You may be wondering what \?? signifies. This used to be an actual directory for DOS device links in the object namespace, but starting with NT 5 (or NT 4 w/ Terminal Services) this became a virtual prefix. The object manager handles this prefix by first checking the logon session's DOS device links in the directory \Sessions\0\DosDevices\[LOGON_SESSION_ID] and then checking the system-wide DOS device links in the \Global?? directory.
Note that the former is a logon session, not a Windows session. The logon session directories are all under the DosDevices directory of Windows session 0 (i.e. the services session in Vista+). Thus if you have a mapped drive for a non-elevated logon, you'll discover that it's not available in an elevated command prompt, because your elevated token is actually for a different logon session.
An example of a DOS device link is \Global??\C: => \Device\HarddiskVolume2. In this case the DOS C: drive is actually a symbolic link to the HarddiskVolume2 device.
Here's a brief overview of how the system handles parsing a path to open a file. Given we're calling WinAPI CreateFile, it stores the translated NT UNICODE_STRING in an OBJECT_ATTRIBUTES structure and calls the system function NtCreateFile.
0:000> g
Breakpoint 1 hit
ntdll!NtCreateFile:
00007ff9`d2023d70 4c8bd1 mov r10,rcx
0:000> !obja #r8
Obja +000000b450f9ec58 at 000000b450f9ec58:
Name is \??\C:\Temp\test.txt
OBJ_CASE_INSENSITIVE
NtCreateFile calls the I/O manager function IoCreateFile, which in turn calls the undocumented object manager API ObOpenObjectByName. This does the work of parsing the path. The object manager starts with \??\C:\Temp\test.txt. Then it replaces that with \Global??\C:Temp\test.txt. Next it parses up to the C: symbolic link and has to start over (reparse) the final path \Device\HarddiskVolume2\Temp\test.txt.
Once the object manager gets to the HarddiskVolume2 device object, parsing is handed off to the I/O manager, which implements the Device object type. The ParseProcedure of an I/O Device creates the File object and an I/O Request Packet (IRP) with the major function code IRP_MJ_CREATE (an open/create operation) to be processed by the device stack. This is sent to the device driver via IoCallDriver. If the device implements reparse points (e.g. junction mountpoints, symbolic links, etc) and the path contains a reparse point, then the resolved path has to be resubmitted to the object manager to be parsed from the start.
The device driver will use the SeChangeNotifyPrivilege (almost always present and enabled) of the process token (or thread if impersonating) to bypass access checks while traversing directories. However, ultimately access to the device and target file has to be allowed by a security descriptor, which is verified via SeAccessCheck. Except simple filesystems such as FAT32 don't support file security.

below is Python 3 version regarding #Eryk Sun's solution.
def winapi_path(dos_path, encoding=None):
if (not isinstance(dos_path, str) and encoding is not None):
dos_path = dos_path.decode(encoding)
path = os.path.abspath(dos_path)
if path.startswith(u"\\\\"):
return u"\\\\?\\UNC\\" + path[2:]
return u"\\\\?\\" + path
#Python 3 renamed the unicode type to str, the old str type has been replaced by bytes. NameError: global name 'unicode' is not defined - in Python 3

Adding the solution that helped me fix a similar issue:
python version = 3.9, windows version = 10 pro.
I had an issue with the filename itself as it was too long for python's open built-in method. The error I got is that the path simply doesn't exist, although I use the 'w+' mode for open (which is supposed to open a new file regardless whether it exists or not).
I found this guide which solved the problem with a quick change to window's Registry Editor (specifically the Group Policy). Scroll down to the 'Make Windows 10 Accept Long File Paths' headline.
Don't forget to update your OS group policy to take effect immediately, a guide can be found here.
Hope this helps future searches as this post is quite old.

There can be multiple reasons for you getting this error. Please make sure of the following:
The parent directory of the folder (JSONFiles) is the same as the directory of the Python script.
Even though the folder exists it does not mean the individual file does. Verify the same and make sure the exact file name matches the one that your Python code is trying to access.
If you still face an issue, share the result of "dir" command on the innermost folder you are trying to access.

it works for me
import os
str1=r"C:\Users\sandeepmkwana\Desktop\folder_structure\models\manual\demodfadsfljdskfjslkdsjfklaj\inner-2djfklsdfjsdklfj\inner3fadsfksdfjdklsfjksdgjl\inner4dfhasdjfhsdjfskfklsjdkjfleioreirueewdsfksdmv\anotherInnerfolder4aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\5qbbbbbbbbbbbccccccccccccccccccccccccsssssssssssssssss\tmp.txt"
print(len(str1)) #346
path = os.path.abspath(str1)
if path.startswith(u"\\\\"):
path=u"\\\\?\\UNC\\"+path[2:]
else:
path=u"\\\\?\\"+path
with open(path,"r+") as f:
print(f.readline())
If you get a long path (more than 258 characters) issue in Windows, then try this.

Related

FileNotFoundError when writing txt file [duplicate]

This is a screenshot of the execution:
As you see, the error says that the directory "JSONFiles/Apartment/Rent/dubizzleabudhabiproperty" is not there.
But look at my files, please:
The folder is definitely there.
Update 2
The code
self.file = open("JSONFiles/"+ item["category"]+"/" + item["action"]+"/"+ item['source']+"/"+fileName + '.json', 'wb') # Create a new JSON file with the name = fileName parameter
line = json.dumps(dict(item)) # Change the item to a JSON format in one line
self.file.write(line) # Write the item to the file
UPDATE
When I change the file name to a smaller one, it works, so the problem is because of the length of the path. what is the solution please?
Regular DOS paths are limited to MAX_PATH (260) characters, including the string's terminating NUL character. You can exceed this limit by using an extended-length path that starts with the \\?\ prefix. This path must be a Unicode string, fully qualified, and only use backslash as the path separator. Per Microsoft's file system functionality comparison, the maximum extended path length is 32760 characters. A individual file or directory name can be up to 255 characters (127 for the UDF filesystem). Extended UNC paths are also supported as \\?\UNC\server\share.
For example:
import os
def winapi_path(dos_path, encoding=None):
if (not isinstance(dos_path, unicode) and
encoding is not None):
dos_path = dos_path.decode(encoding)
path = os.path.abspath(dos_path)
if path.startswith(u"\\\\"):
return u"\\\\?\\UNC\\" + path[2:]
return u"\\\\?\\" + path
path = winapi_path(os.path.join(u"JSONFiles",
item["category"],
item["action"],
item["source"],
fileName + ".json"))
>>> path = winapi_path("C:\\Temp\\test.txt")
>>> print path
\\?\C:\Temp\test.txt
See the following pages on MSDN:
Naming Files, Paths, and Namespaces
Defining an MS-DOS Device Name
Kernel object namespaces
Background
Windows calls the NT runtime library function RtlDosPathNameToRelativeNtPathName_U_WithStatus to convert a DOS path to a native NT path. If we open (i.e. CreateFile) the above path with a breakpoint set on the latter function, we can see how it handles a path that starts with the \\?\ prefix.
Breakpoint 0 hit
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus:
00007ff9`d1fb5880 4883ec58 sub rsp,58h
0:000> du #rcx
000000b4`52fc0f60 "\\?\C:\Temp\test.txt"
0:000> r rdx
rdx=000000b450f9ec18
0:000> pt
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x66:
00007ff9`d1fb58e6 c3 ret
The result replaces \\?\ with the NT DOS devices prefix \??\, and copies the string into a native UNICODE_STRING:
0:000> dS b450f9ec18
000000b4`536b7de0 "\??\C:\Temp\test.txt"
If you use //?/ instead of \\?\, then the path is still limited to MAX_PATH characters. If it's too long, then RtlDosPathNameToRelativeNtPathName returns the status code STATUS_NAME_TOO_LONG (0xC0000106).
If you use \\?\ for the prefix but use slash in the rest of the path, Windows will not translate the slash to backslash for you:
Breakpoint 0 hit
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus:
00007ff9`d1fb5880 4883ec58 sub rsp,58h
0:000> du #rcx
0000005b`c2ffbf30 "\\?\C:/Temp/test.txt"
0:000> r rdx
rdx=0000005bc0b3f068
0:000> pt
ntdll!RtlDosPathNameToRelativeNtPathName_U_WithStatus+0x66:
00007ff9`d1fb58e6 c3 ret
0:000> dS 5bc0b3f068
0000005b`c3066d30 "\??\C:/Temp/test.txt"
Forward slash is a valid object name character in the NT namespace. It's reserved by Microsoft filesystems, but you can use a forward slash in other named kernel objects, which get stored in \BaseNamedObjects or \Sessions\[session number]\BaseNamedObjects. Also, I don't think the I/O manager enforces the policy on reserved characters in device and filenames. It's up to the device. Maybe someone out there has a Windows device that implements a namespace that allows forward slash in names. At the very least you can create DOS device names that contain a forward slash. For example:
>>> kernel32 = ctypes.WinDLL('kernel32')
>>> kernel32.DefineDosDeviceW(0, u'My/Device', u'C:\\Temp')
>>> os.path.exists(u'\\\\?\\My/Device\\test.txt')
True
You may be wondering what \?? signifies. This used to be an actual directory for DOS device links in the object namespace, but starting with NT 5 (or NT 4 w/ Terminal Services) this became a virtual prefix. The object manager handles this prefix by first checking the logon session's DOS device links in the directory \Sessions\0\DosDevices\[LOGON_SESSION_ID] and then checking the system-wide DOS device links in the \Global?? directory.
Note that the former is a logon session, not a Windows session. The logon session directories are all under the DosDevices directory of Windows session 0 (i.e. the services session in Vista+). Thus if you have a mapped drive for a non-elevated logon, you'll discover that it's not available in an elevated command prompt, because your elevated token is actually for a different logon session.
An example of a DOS device link is \Global??\C: => \Device\HarddiskVolume2. In this case the DOS C: drive is actually a symbolic link to the HarddiskVolume2 device.
Here's a brief overview of how the system handles parsing a path to open a file. Given we're calling WinAPI CreateFile, it stores the translated NT UNICODE_STRING in an OBJECT_ATTRIBUTES structure and calls the system function NtCreateFile.
0:000> g
Breakpoint 1 hit
ntdll!NtCreateFile:
00007ff9`d2023d70 4c8bd1 mov r10,rcx
0:000> !obja #r8
Obja +000000b450f9ec58 at 000000b450f9ec58:
Name is \??\C:\Temp\test.txt
OBJ_CASE_INSENSITIVE
NtCreateFile calls the I/O manager function IoCreateFile, which in turn calls the undocumented object manager API ObOpenObjectByName. This does the work of parsing the path. The object manager starts with \??\C:\Temp\test.txt. Then it replaces that with \Global??\C:Temp\test.txt. Next it parses up to the C: symbolic link and has to start over (reparse) the final path \Device\HarddiskVolume2\Temp\test.txt.
Once the object manager gets to the HarddiskVolume2 device object, parsing is handed off to the I/O manager, which implements the Device object type. The ParseProcedure of an I/O Device creates the File object and an I/O Request Packet (IRP) with the major function code IRP_MJ_CREATE (an open/create operation) to be processed by the device stack. This is sent to the device driver via IoCallDriver. If the device implements reparse points (e.g. junction mountpoints, symbolic links, etc) and the path contains a reparse point, then the resolved path has to be resubmitted to the object manager to be parsed from the start.
The device driver will use the SeChangeNotifyPrivilege (almost always present and enabled) of the process token (or thread if impersonating) to bypass access checks while traversing directories. However, ultimately access to the device and target file has to be allowed by a security descriptor, which is verified via SeAccessCheck. Except simple filesystems such as FAT32 don't support file security.
below is Python 3 version regarding #Eryk Sun's solution.
def winapi_path(dos_path, encoding=None):
if (not isinstance(dos_path, str) and encoding is not None):
dos_path = dos_path.decode(encoding)
path = os.path.abspath(dos_path)
if path.startswith(u"\\\\"):
return u"\\\\?\\UNC\\" + path[2:]
return u"\\\\?\\" + path
#Python 3 renamed the unicode type to str, the old str type has been replaced by bytes. NameError: global name 'unicode' is not defined - in Python 3
Adding the solution that helped me fix a similar issue:
python version = 3.9, windows version = 10 pro.
I had an issue with the filename itself as it was too long for python's open built-in method. The error I got is that the path simply doesn't exist, although I use the 'w+' mode for open (which is supposed to open a new file regardless whether it exists or not).
I found this guide which solved the problem with a quick change to window's Registry Editor (specifically the Group Policy). Scroll down to the 'Make Windows 10 Accept Long File Paths' headline.
Don't forget to update your OS group policy to take effect immediately, a guide can be found here.
Hope this helps future searches as this post is quite old.
There can be multiple reasons for you getting this error. Please make sure of the following:
The parent directory of the folder (JSONFiles) is the same as the directory of the Python script.
Even though the folder exists it does not mean the individual file does. Verify the same and make sure the exact file name matches the one that your Python code is trying to access.
If you still face an issue, share the result of "dir" command on the innermost folder you are trying to access.
it works for me
import os
str1=r"C:\Users\sandeepmkwana\Desktop\folder_structure\models\manual\demodfadsfljdskfjslkdsjfklaj\inner-2djfklsdfjsdklfj\inner3fadsfksdfjdklsfjksdgjl\inner4dfhasdjfhsdjfskfklsjdkjfleioreirueewdsfksdmv\anotherInnerfolder4aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\5qbbbbbbbbbbbccccccccccccccccccccccccsssssssssssssssss\tmp.txt"
print(len(str1)) #346
path = os.path.abspath(str1)
if path.startswith(u"\\\\"):
path=u"\\\\?\\UNC\\"+path[2:]
else:
path=u"\\\\?\\"+path
with open(path,"r+") as f:
print(f.readline())
If you get a long path (more than 258 characters) issue in Windows, then try this.

Python : Problem reading filename with brackets/long path name

I am trying to read excel file with pandas.
df=pd.read_excel('abcd (xyz-9) Interim Report 01-03-18.xlsx')
which gives me file not found error. If I remove brackets and rename file to 'abcd Interim Report 01-03-18.xlsx', then it works fine.
I tried renaming with shutil but it gives me the same error
shutil.copyfile('abcd (xyz-9) Interim Report 01-03-18.xlsx','test.xlsx')
I tried
1. pd.read_excel('abcd ^(xyz-9) Interim Report 01-03-18.xlsx')
2. pd.read_excel('abcd \\(xyz-9\\) Interim Report 01-03-18.xlsx')
EDIT:
The file seem to work on local drive but not on network drive even if I change the cwd to the file location.
On using glob and os.path.exists:
for i in range(0,1):
for filename in glob.glob(fpath+"\\"+ldir[i]+"\\"+"*Interim*.xlsx"):
print(filename)
print(os.path.exists(filename))
\\Africa-me.xxx.com\Africa-me\xxx\xxx\xxx\xxx\06 xxx\02 xxx, xxx and xxxx xxx\03 xxx\04 xxx\05 xx xx & xx\12 2018 xx\06 xx xxx\\\AAA-61\abcd (xyz-9) Interim Report 01-03-18.xlsx
False
\\Africa-me.xxx.com\Africa-me\xxx\xxx\xxx\xxx\06 xxx\02 xxx, xxx and xxxx xxx\03 xxx\04 xxx\05 xx xx & xx\12 2018 xx\06 xx xxx\\\AAA-61\abcd Interim Report 01-03-18.xlsx
True
On using glob and os.stat:
import ctypes
for i in range(0,1):
for filename in glob.glob(fpath+"\\"+ldir[i]+"\\"+"*Interim*.xlsx"):
print(filename)
try:
print(os.stat(filename))
except OSError as e:
ntstatus = ctypes.windll.ntdll.RtlGetLastNtStatus()
print('winerror:', e.winerror)
print('ntstatus:', hex(ntstatus & (2**32-1)))
\\Africa-me.xxx.com\Africa-me\xxx\xxx\xxx\xxx\06 xxx\02 xxx, xxx and xxxx xxx\03 xxx\04 xxx\05 xx xx & xx\12 2018 xx\06 xx xxx\\\AAA-61\abcd (xyz-9) Interim Report 01-03-18.xlsx
winerror: 3
ntstatus: 0x80000006
\\Africa-me.xxx.com\Africa-me\xxx\xxx\xxx\xxx\06 xxx\02 xxx, xxx and xxxx xxx\03 xxx\04 xxx\05 xx xx & xx\12 2018 xx\06 xx xxx\\\AAA-61\abcd Interim Report 01-03-18.xlsx
os.stat_result(st_mode=33206, st_ino=15624813576354602, st_dev=3657573641, st_nlink=1, st_uid=0, st_gid=0, st_size=726670, st_atime=1563172745, st_mtime=1523347973, st_ctime=1563170560)
The os.stat test shows that accessing the path with brackets fails with ERROR_PATH_NOT_FOUND (3), which is either from a missing path component or a path that's too long. We know it's not a problem with finding the final path component, since in this case we expect the error to be ERROR_FILE_NOT_FOUND (2). We know it's not a problem with reserved characters, since in this case we expect the error to be ERROR_INVALID_NAME (123). Also, in both of these cases, the Windows API would have to make an NT system call, but we see that the last NT status is STATUS_NO_MORE_FILES (0x80000006), which is from the os.listdir call in glob.glob. Thus the problem is likely that the path is too long.
For brevity or privacy, the question seems to shorten some path component names to "xxx". Probably if expanded out to the true names, we'd see that the path with "(xyz-9)" is at least 260 characters, which is one more than the maximum allowed length for a DOS path, MAX_PATH - 1 (259) characters.
We can access a long path by converting it to an extended path, which is a Unicode string that begins with the "\\?\" device-path prefix, or "\\?\UNC\" for a UNC path. First, since bytes paths in Windows are limited to MAX_PATH characters, we have to decode a bytes path to Unicode. Next we have to normalize and qualify the path via os.path.abspath because extended paths bypass normalization when accessed. Here's an extpath function to convert a DOS path to an extended path:
import os
try:
from os import fsdecode
except ImportError: # Probably Python 2.x
import sys
def fsdecode(filename):
if isinstance(filename, type(u'')):
return filename
elif isinstance(filename, bytes):
return filename.decode(sys.getfilesystemencoding(), 'strict')
raise TypeError('expected string, not {}'.format(
type(filename).__name__))
def extpath(path):
path = os.path.abspath(fsdecode(path))
if not path.startswith(u'\\\\?\\'):
if path.startswith(u'\\\\.\\'):
path = u'\\\\?\\' + path[4:]
elif path.startswith(u'\\\\'):
path = u'\\\\?\\UNC\\' + path[2:]
else:
path = u'\\\\?\\' + path
return path
Background
At the core of Windows NT (i.e. all Windows versions since XP) is the NTOS operating system, which uses the NT kernel. This is similar to the way 16-bit Windows was layered over DOS in the 1980s and early 1990s. But NTOS is more tightly coupled to Windows than DOS was, and it's a more capable OS (e.g. support for symmetric multiprocessing, preemptive multithreading, virtual memory, asynchronous I/O, secured objects, and multiple users with simultaneous logons and sessions).
In some ways Windows still maintains its MS-DOS roots. In particular, for disk device and filesystem paths, the Windows API uses DOS paths instead of NT object paths. This includes DOS drives "A:" through "Z:" and UNC paths such as "\\server\share\path". The Windows API normalizes DOS paths to replace forward slash with backslash; resolve relative paths (i.e. paths with no root directory or no drive) using the process working directory or per-drive working directory; resolve "." and ".." components; and trim trailing spaces and dots from the final component.
When accessing a DOS path, Windows converts it to a device path that begins with one of the WINAPI device prefixes, "\\.\" and "\\?\" (e.g. "C:\Windows" -> "\\?\C:\Windows"). For drive-letter paths and relative paths (but not UNC paths), it reserves a small set of DOS device names in the final path component (e.g. "C:\Temp\con" -> "\\.\con", and "nul" -> "\\.\nul"). For UNC paths, it uses the "UNC" device mountpoint (e.g. \\server\share\path -> "\\?\UNC\server\share\path"). Before a call passes to the NT domain, the WINAPI device prefix ("\\.\" or "\\?\") gets replaced by the NTAPI device prefix ("\??\").
These device-namespace prefixes are shorthand for the caller's local mountpoint directory in NT's object namespace (i.e. "\Sessions\0\DosDevices\<Caller's Logon Session ID>"). The local mountpoint directory implicitly shadows the global mountpoint directory (i.e. "\Global??"). For example, "C:\Temp" becomes "\??\C:\Temp", which evaluates as "\Global??\C:\Temp" if global "C:" isn't locally shadowed. The global mountpoint can be referenced explicitly via the "Global" object link (e.g. "\\?\Global\C:\Temp"), which should always be available.
Classically, DOS path normalization uses string buffers with space for no more than MAX_PATH (260) characters. In Windows 10, this legacy limit is lifted (in most cases) if long paths are enabled for the system and the application manifest declares that it's long-path aware. The
"python[w].exe" executable in Python 3.6+ has this manifest setting.
If long DOS paths aren't enabled, most file API functions still support long paths. We just have to use a WINAPI device path that begins with the "\\?\" prefix, which is called an extended path. This is similar to a regular device path that begins with the "\\.\" prefix, except an extended path does not get normalized when accessed. The downside is that we have to implement our own path normalization. In Python this is implemented by os.path.abspath. We still have to manually rewrite UNC paths, but that's a simple matter of replacing a leading "\\" with "\\?\UNC\".
Note that the working directory does not support device paths, extended or not. The system has undefined behavior if we set a device path as the working directory. So don't use them with Python's os.chdir or the cwd parameter of subprocess.Popen. This means we can't get around the limit on using long relative paths by setting an extended path as the working directory. In Windows 10, if long DOS paths are enabled for the process, the working directory does support long paths, but it still only supports regular DOS paths (UNC and drive-letter paths), not device paths.
I always try to use python's pathlib module whenever I deal with files. There seems to be an error with the initial a in the filename
import pandas as pd
import pathlib
path_to_file = pathlib.Path("g:\Python\abcd (xyz-9) Interim Report 01-03-18.xlsx")
df = pd.read_exces(path_to_file)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'pandas' has no attribute 'read_exces'
df = pd.read_excel(path_to_file)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "C:\Program Files\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "C:\Program Files\Python37\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Program Files\Python37\lib\site-packages\pandas\io\excel.py", line 653, in __init__
self._reader = self._engines[engine](self._io)
File "C:\Program Files\Python37\lib\site-packages\pandas\io\excel.py", line 424, in __init__
self.book = xlrd.open_workbook(filepath_or_buffer)
File "C:\Program Files\Python37\lib\site-packages\xlrd\__init__.py", line 111, in open_workbook
with open(filename, "rb") as f:
OSError: [Errno 22] Invalid argument: 'g:\\Python\x07bcd (xyz-9) Interim Report 01-03-18.xlsx'
It seems that because of the preceding backslash, the initial a was at some point interpreted as the control symbol \x07 U+0007 : ALERT [BEL].
that's why it is necessary to use the raw string when defining the path, as Dawid suggested
path_to_file = pathlib.Path(r"g:\Python\abcd (xyz-9) Interim Report 01-03-18.xlsx")

Long paths for python on windows - os.stat() fails for relative paths?

I want to access some long UNC paths on Windows. I know that I need to use the "\\?\UNC\" prefix (which is "\\\\?\\UNC\\" if you escape the slashes). That works fine:
os.stat('\\\\?\\UNC\\server.example.com\\that\\has\\long\\path\\aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb.txt')
# works, returns os.stat_result
However, it seems to fail with a relative path:
os.chdir('\\\\?\\UNC\\server.example.com\\that\\has\\long\\path')
os.getcwd()
# returns '\\\\?\\UNC\\server.example.com\\that\\has\\long\\path'
os.stat('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb.txt')
# fails with [WinError 3] The system cannot find the path specified: 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\\bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb.txt'
Might this be a bug in Python, or is my code wrong?
Sidenote - a workaround is os.stat(os.path.abspath('aaa\\bbb.txt')).
In Windows 10, you can enable long-path support for the system by setting a DWORD named "LongPathsEnabled" in "HKLM\System\CurrentControlSet\Control\FileSystem". This allows applications that declare support for long paths in their manifest to use the maximum path length that's supported by the kernel (about 32,760 characters, depending on the final resolved path), without even requiring the "\\?\" prefix. Python 3.6+ is manifested to support long paths.
That said, prior to Windows 10, the working directory and relative paths cannot exceed MAX_PATH (260) characters, which includes the trailing backslash and NUL terminator. The current documentation is misleading on this point. Apparently someone added the disclaimer "to extend this limit to 32,767 wide characters..." to the docs for SetCurrentDirectory. No, there's no extending the limit. Here's what it said circa 2016.
The current working directory of a process is a DOS path, as opposed to a native kernel path (*). A DOS path is any path that's non-Unicode, or uses forward slashes, DOS devices (e.g. logical drive letters, CON, NUL, etc), or UNC syntax. DOS paths have to be translated to native paths by runtime-library functions in ntdll.dll. If long-path support isn't available, this implicit translation is limited to at most MAX_PATH characters.
Working around this requires using a fully-qualified Unicode path that begins with the "\\?\" prefix. This prefix tells the runtime-library to bypass path translation. Instead it simply replaces the "\\?\" prefix with the kernel's "\??\" virtual directory for DOS device links, and the path ultimately resolves to a real NT device (e.g. "\\?\UNC" => "\??\UNC" => "\Device\Mup").
(*) The kernel namespace uses a singly-rooted tree for all kernel objects, not just device objects. It also has a more reliable way of handling relative paths; see the RootDirectory field of OBJECT_ATTRIBUTES.

Opening filenames with colon (":") in Windows 7

I am writing a Python app that should run in both Windows and Linux, but am having an issue with one of the file naming conventions. I need to load a JSON file that has a colon in its name. However, with Windows 7 it doesn't seem to be possible, at least not directly.
These files are stored on a NFS drive thus we are able to see it in Windows 7, but cannot open them.
Does anyone have a workaround as to how it may be possible to read the JSON file containing a colon in Windows 7 using Python? One possible workaround we have (that we'd like to avoid) is to SSH into a Linux box, echo the contents and send it back.
Obviously if anyone else has another method that would be great. Windows XP is able to open them and read them just fine - this is just an issue with Win 7.
As an update, we found out that we can access our NFS/AFS servers through the web. So we ended up using urllib2 urlopen for all the JSON files that contain invalid characters. Seems to be working well so far.
To quote from http://support.microsoft.com/kb/289627:
Windows and UNIX operating systems have restrictions on valid characters that can be used in a file name. The list of illegal characters for each operating system, however, is different. For example, a UNIX file name can use a colon (:), but a Windows file name cannot use a colon (:). ...
To enable file name character mapping, create a character translation file and add a registry entry.
For example, the following maps the UNIX colon (:) to a Windows dash (-):
0x3a : 0x2d ; replace client : with - on server
When you have created the file name character translation file, you must specify its name location in the system registry. To register the path and name of the file:
Use Registry Editor to locate the following registry key:
HKEY_LOCAL_MACHINE\Software\Microsoft\Server For NFS\CurrentVersion\Mapping
Edit the CharacterTranslation (REG_SZ) value.
Enter the fully qualified path name of the file name character translation file. For example, C:\Sfu\CTrans.txt.

Determine if a path is legal

How can I check a user-supplied path is sanitised?
I want to ensure it has no wildcards nor any shenanigans. Right now, I'm checking that it is not escaping the correct folder so:
if os.path.commonprefix([os.path.abspath(path),os.getcwd()]) != os.getcwd():
# raise error etc..
But like all self-written security check code, I want it held up to better scrutiny! And it doesn't address that the path is actually legal after all that.
I will then be using the path to create assets and such.
You could use Werkzeug's secure_filename:
werkzeug.utils.secure_filename(filename)
Pass it a filename and it will return a secure version of it. This
filename can then safely be stored on a regular file system and passed
to os.path.join(). The filename returned is an ASCII only string for
maximum portability.
On windows system the function also makes sure that the file is not
named after one of the special device files.
>>> secure_filename("My cool movie.mov")
'My_cool_movie.mov'
>>> secure_filename("../../../etc/passwd")
'etc_passwd'
>>> secure_filename(u'i contain cool \xfcml\xe4uts.txt')
'i_contain_cool_umlauts.txt'

Categories

Resources