python open() escape backslash


I have a path to a file containing $ signs, which I escaped with \ so that open() can handle the path. But now open() turns each \$ into \\$ automatically.
For example:
open("/home/test/\$somedir\$/file.txt", "r")
result in an error message
IOError: [Errno 2] No such file or directory: '/home/test/\\$somedir\\$/file.txt'
Can I suppress this? Why does open() do that? I can't find anything in the documentation of open() that describes this.

open() doesn't do that. It's Python, which escapes any special characters when representing a string:
>>> path = '\$'
>>> path
'\\$'
>>> print path
\$
In a regular Python string literal, \ has a special meaning, so the interpreter escapes it when echoing a value back (it shows the repr() of the value). The echoed text can be pasted straight back into a Python script or interpreter session to recreate the same value.
On Linux or Mac, you generally do not need to escape a $ in a filename; $ has no special meaning in a regular Python string, nor in most Linux or Mac filenames:
>>> os.listdir('/tmp/$somedir$')
['test']
>>> open('/tmp/$somedir$/test')
<open file '/tmp/$somedir$/test', mode 'r' at 0x105579390>
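A minimal Python 3 sketch of why the escaped path fails: writing \$ in a literal puts a real backslash into the string, so the name no longer matches a directory called $somedir$. The temporary directory below is a stand-in for the asker's /home/test layout, and the False result assumes Linux or macOS, where \ is an ordinary filename character.

```python
import os
import tempfile

# Create a throwaway directory literally named "$somedir$" (stand-in layout).
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "$somedir$"))

plain = os.path.join(base, "$somedir$")        # $ needs no escaping
escaped = os.path.join(base, "\\$somedir\\$")  # what "\$" in a literal really produces

print(len("\\$"))               # 2: a real backslash plus the dollar sign
print(os.path.exists(plain))    # True
print(os.path.exists(escaped))  # False: the name now contains backslashes
```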

Try using a raw string literal, by adding r before the string literal, to avoid dealing with more complex escaping situations. For example:
print('C:\test')
#C: est
print(r'C:\test')
#C:\test
where \t is interpreted as a tab.
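The same comparison in a form that can be checked rather than eyeballed (Python 3): the normal literal is one character shorter because \t collapses into a single TAB, while the raw literal keeps the backslash and the t.

```python
# In a normal literal, "\t" is one TAB character, so the path silently changes.
normal = "C:\test"
# In a raw literal, the backslash is kept as an ordinary character.
raw = r"C:\test"

print(len(normal), repr(normal))  # 6 'C:\test'  (the \t here is a TAB)
print(len(raw), repr(raw))        # 7 'C:\\test' (a real backslash)
```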

Related

"No such file or directory" error with both file in same directory AND when using absolute path

Hi all. I'm running into a "no such file or directory" issue in Python that has stumped me.
Things I've tried so far:
Closing any program that I could think might have the file open
Having the file in the same directory as the program I'm running
Using the absolute path name
Escaping the backslashes
Escaping the backslashes and spaces
Changing the backslashes to forward slashes
Removing all spaces, special symbols, and numbers from the filename
I even checked with os.getcwd and os.path.abspath and copy-pasted the path exactly.
I'm not sure what's going on here. I'm at a loss now. Would I get this same error if the file is still open in some elusive background program?
This is the relevant bit of code:
print(os.getcwd())
print(os.path.abspath('RainyGenki.json'))
deckName = "C:\Users\myName\My Documents\LiClipse Workspace\KanjiDrag\RainyGenki.json"
deck = open(deckName, 'r') #opens card deck
This is the error message:
C:\Users\myName\My Documents\LiClipse Workspace\KanjiDrag
C:\Users\myName\My Documents\LiClipse Workspace\KanjiDrag\RainyGenki.json
Traceback (most recent call last):
File "C:\Users\myName\My Documents\LiClipse Workspace\KanjiDrag\kanji_drag\kanji_main.py", line 79, in <module>
deck = open(deckName, 'r') #opens card deck
IOError: [Errno 2] No such file or directory: 'C:\\Users\\myName\\My Documents\\LiClipse Workspace\\KanjiDrag\\RainyGenki.json'

Using a raw string...
If you are sure that your path is OK, use Python's raw string syntax. From the Python 2 language reference:
String literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences. A prefix of 'u' or 'U' makes the string a Unicode string. Unicode strings use the Unicode character set as defined by the Unicode Consortium and ISO 10646. Some additional escape sequences, described below, are available in Unicode strings. A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.
In plain English, this means that to stop backslashes from being treated as escape sequences, you just need to put an r before the string literal:
deckName = r"C:\Users\myName\My Documents\LiClipse Workspace\KanjiDrag\RainyGenki.json"
deck = open(deckName, "r")
And even though you say you have tried it, escaping the backslashes should also work:
deckName = "C:\\Users\\myName\\My Documents\\LiClipse Workspace\\KanjiDrag\\RainyGenki.json"
deck = open(deckName, "r")
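A quick Python 3 check that the spellings above are interchangeable. In Python 3 the question's unescaped literal would not even compile, because \U starts a Unicode escape, so one of these forms is required there. pathlib's PureWindowsPath is used only to show that forward slashes normalise to the same string; it never touches the disk.

```python
from pathlib import PureWindowsPath

raw = r"C:\Users\myName\My Documents\LiClipse Workspace\KanjiDrag\RainyGenki.json"
doubled = "C:\\Users\\myName\\My Documents\\LiClipse Workspace\\KanjiDrag\\RainyGenki.json"
forward = "C:/Users/myName/My Documents/LiClipse Workspace/KanjiDrag/RainyGenki.json"

# The raw and doubled spellings produce the identical string.
print(raw == doubled)  # True

# PureWindowsPath converts forward slashes to backslashes (pure string work).
print(str(PureWindowsPath(forward)) == raw)  # True
```

Note that none of these spellings helps if the directory itself is wrong, which turned out to be the real problem here.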

I had a folder called KanjiDrag, and within that I had the actual source folder kanji_drag which is where the json file and the main module were. The path I was using was accessing the KanjiDrag folder, but not the kanji_drag folder, and I didn't catch the different names. This is my dumbest file IO mistake yet. Thanks for all the replies, many of which I'll still refer to later when I refine this part of my program.
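A mix-up like this can be caught by searching for the file instead of guessing its folder. The helper below, find_file, is a hypothetical sketch built on os.walk; the demo recreates the KanjiDrag/kanji_drag layout in a temporary directory.

```python
import os
import tempfile

def find_file(name, root):
    """Return every path under `root` whose final component equals `name`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            matches.append(os.path.join(dirpath, name))
    return matches

# Demo: a throwaway tree mimicking the layout described above.
root = tempfile.mkdtemp()
src = os.path.join(root, "KanjiDrag", "kanji_drag")
os.makedirs(src)
open(os.path.join(src, "RainyGenki.json"), "w").close()

hits = find_file("RainyGenki.json", root)
print(hits)  # the only match is inside KanjiDrag/kanji_drag, not KanjiDrag
```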

python magic can't identify unicode filename

In my small project I had to identify the type of the files in a directory, so I went with the python-magic module and did the following:
import os
import magic
from Tkinter import Tk
from tkFileDialog import askdirectory
def getDirInput():
    root = Tk()
    root.withdraw()
    return askdirectory()
di = getDirInput()
print('Selected Directory: ' + di)
for f in os.listdir(di):
    m = magic.Magic(magic_file='magic')
    print 'Type of ' + f + ' --> ' + m.from_file(f)
But it seems that python-magic can't take Unicode filenames as-is when I pass them to the from_file() function. Here's a sample output:
Selected Directory: C:/Users/pruthvi/Desktop/vidrec/temp
Type of log.txt --> ASCII English text, with very long lines, with CRLF, CR line terminators
Type of TAEYEON 태연_ I (feat. Verbal Jint)_Music Video.mp4 --> cannot open `TAEYEON \355\234\227_ I (feat. Verbal Jint)_Music Video.mp4' (No such file or directory)
Type of test.py --> a python script text executable
You can observe that python-magic failed to identify the type of the second file, TAEYEON..., because it has Unicode characters in its name. It shows the 태연 characters as \355\234\227 instead, even though I passed the filenames the same way in both cases. How can I overcome this problem and find the type of files with Unicode characters in their names too? Thank you

But It seems that python-magic can't take unicode filenames
Correct. In fact most cross-platform software on Windows can't handle non-ASCII characters in filenames.
This is because the C standard library uses byte strings for all filenames but Windows uses Unicode strings (technically, UTF-16 code unit strings, but the difference isn't important here). When software using the C standard library opens a file by byte-based string, the MS C runtime converts that to a Unicode string automatically, using an encoding (the confusingly-named ‘ANSI’ code page) that depends on the locale of the Windows installation. Your ANSI code page is probably 1252, which can't encode Korean characters, so it's impossible to use that filename. The ANSI code page is unfortunately never anything sensible like UTF-8, so it can never include all possible Unicode characters.
Python is special in that it has extra support for Windows Unicode filenames, which bypasses the C standard library and calls the underlying Win32 APIs directly for Unicode filenames. So you can pass a unicode string to, e.g., open() and it will work for all filenames.
However python-magic's from_file call doesn't open the file from Python. Instead it passes the filename to the libmagic library which is written in pure C. libmagic doesn't have the special Windows-filename code path for Unicode so this fails.
I suggest opening the file yourself from Python and using magic.from_buffer instead.
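A minimal Python 3 sketch of that suggestion, assuming python-magic's module-level magic.from_buffer (with a custom magic.Magic(...) instance you would pass its from_buffer method instead). Reading only the first 2048 bytes is an assumption borrowed from common python-magic usage; some formats may need more. The helper names read_head and identify_dir are hypothetical, and the demo uses a stand-in detector so it runs without libmagic.

```python
import os
import tempfile

def read_head(path, size=2048):
    """Read the first `size` bytes of a file through Python's own open()."""
    # Python opens the file itself, so a non-ASCII (e.g. Korean) name never
    # has to pass through libmagic's byte-string filename handling.
    with open(path, "rb") as fh:
        return fh.read(size)

def identify_dir(directory, detect=None):
    """Map each regular file in `directory` to the description from `detect`."""
    if detect is None:
        import magic  # python-magic, assumed to be installed
        detect = magic.from_buffer
    results = {}
    for name in os.listdir(directory):
        full = os.path.join(directory, name)  # prefix the directory explicitly
        if os.path.isfile(full):
            results[name] = detect(read_head(full))
    return results

# Demo with a stand-in detector (bytes 4..8 of an MP4 header are b"ftyp").
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "TAEYEON 태연_I.mp4"), "wb") as fh:
    fh.write(b"\x00\x00\x00\x18ftypmp42")
result = identify_dir(demo_dir, detect=lambda head: head[4:8])
print(result)  # {'TAEYEON 태연_I.mp4': b'ftyp'}
```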

The response from the magic module suggests that your characters were mangled somewhere along the way: only part of the name is shown, and the bytes shown for 태 are wrong. Its UTF-8 encoding is \355\203\234.
As this is on Windows, this raises UTF-16 byte-order alarm bells.
It might be possible to work around this by encoding to UTF-16. As suggested by other commenters, you need to prefix the directory.
import locale
import os
import magic
# di is the directory chosen in the question's code
input_encoding = locale.getpreferredencoding()
u_di = di.decode(input_encoding)
m = magic.Magic(magic_file='magic') # only needs to be initialised once
for f in os.listdir(u_di):
    fq_f = os.path.join(u_di, f)
    utf16_fq_f = fq_f.encode("UTF-16LE")
    print m.from_file(utf16_fq_f)
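To compare libmagic's message against what the name should look like, the expected bytes can be printed in the same octal style. This Python 3 sketch uses a hypothetical helper, octal_escape; neither character's UTF-8 encoding matches the \355\234\227 shown in the question.

```python
def octal_escape(data):
    """Render bytes as backslash-octal escapes, like libmagic's error text."""
    return "".join("\\%03o" % b for b in data)

print(hex(ord("태")))                       # 0xd0dc
print(octal_escape("태".encode("utf-8")))   # \355\203\234
print(octal_escape("연".encode("utf-8")))   # \354\227\260
```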

EOL Error with file path string in Python

I am trying to create a simple string that points to a folder which contains a file on my C drive. The string is as follows:
filelocation = "C:\Documents\Folder\"
I am getting an EOL error which I think is being caused by the backslashes. Is it possible to have these backslashes in the string or is there another way of achieving this?
Thanks

Python on Windows supports forward slashes:
filelocation = "C:/Documents/Folder/"
Alternatively, escape each of your \ characters:
filelocation = "C:\\Documents\\Folder\\"
The reason you're getting the error is because of the final \ character - Python is interpreting that as an escape sequence, and it thinks the string has not been terminated. To work around it, use one of my solutions above, or just omit the final \.
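A Python 3 sketch comparing the working spellings. A raw string does not rescue this case on its own: r"...\" is also a syntax error, because a raw literal cannot end with an odd number of backslashes. ntpath is used so the results are Windows-style on any OS.

```python
import ntpath

forward = "C:/Documents/Folder/"
doubled = "C:\\Documents\\Folder\\"
# A raw literal cannot end in a single backslash, so append it separately.
raw = r"C:\Documents\Folder" + "\\"

print(doubled == raw)  # True

# ntpath.join adds the separator when combining the folder with a file name.
print(ntpath.join(r"C:\Documents\Folder", "file.txt"))  # C:\Documents\Folder\file.txt
# normpath converts forward slashes and drops the trailing separator.
print(ntpath.normpath(forward))  # C:\Documents\Folder
```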

With escaped backslashes: filelocation = "C:\\Documents\\Folder\\"
With forward slashes (the Linux separator, which Windows also accepts): filelocation = "C:/Documents/Folder/"

Get properties of a file whose name contains special (non-ASCII) characters

I'm using python and having some trouble reading the properties of a file, when the filename includes non-ASCII characters.
One of the files for example is named:
0-Channel-https∺∯∯services.apps.microsoft.com∯browse∯6.2.9200-1∯615∯Channel.dat
When I run this:
list2 = os.listdir('C:\\Users\\James\\AppData\\Local\\Microsoft\\Windows Store\\Cache Medium IL\\0\\')
for data in list2:
    print os.path.getmtime(data) + '\n'
I get the error:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: '0-Channel-https???services.apps.microsoft.com?browse?6.2.9200-1?615?Channel.dat'
I assume it's caused by the special characters, because the code works fine with other filenames that contain only ASCII characters.
Does anyone know of a way to query the filesystem properties of a file named like this?

If this is Python 2.x, it's an encoding issue. If you pass a unicode string to os.listdir, such as u'C:\\my\\pathname', it will return unicode strings, and the non-ASCII characters should be decoded correctly. See Unicode Filenames in the docs.
Quoting the doc:
os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames. For example, assuming the default filesystem encoding is UTF-8, running the following program:
This code should work:
directory_name = u'C:\\Users\\James\\AppData\\Local\\Microsoft\\Windows Store\\Cache Medium IL\\0\\'
list2 = os.listdir(directory_name)
for data in list2:
    print data, os.path.getmtime(os.path.join(directory_name, data))
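For comparison, a Python 3 sketch: str paths are already Unicode there, so the u prefix is unnecessary, and os.scandir yields entries whose .path already includes the directory. The helper name mtimes and the shortened sample filename are illustrative assumptions.

```python
import os
import tempfile

def mtimes(directory):
    """Return {name: modification time} for every entry in `directory`."""
    result = {}
    for entry in os.scandir(directory):
        # entry.path includes the directory prefix, unlike the bare names
        # returned by os.listdir(), so no os.path.join() is needed here.
        result[entry.name] = os.path.getmtime(entry.path)
    return result

# Demo: a throwaway directory holding a file with non-ASCII characters in its name.
demo = tempfile.mkdtemp()
name = "0-Channel-https∺∯∯services.apps.microsoft.com∯Channel.dat"
open(os.path.join(demo, name), "w").close()
times = mtimes(demo)
print(list(times))
```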

Since you are on Windows, you could try the ntpath module instead of os.path:
from ntpath import getmtime
As I don't have Windows, I can't test it. Every OS has a different path convention, so Python provides a specific module for each of the most common operating systems.
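For context, a Python 3 sketch of how these modules relate: os.path is an alias for the module matching the running OS (ntpath on Windows, posixpath elsewhere), so on Windows ntpath.getmtime is the same function as os.path.getmtime. Both modules import on any platform, which helps when manipulating Windows-style path strings elsewhere.

```python
import ntpath
import os
import posixpath

# os.path is whichever of these matches the current OS.
print(os.path.__name__)  # 'ntpath' on Windows, 'posixpath' on Linux/macOS

# Pure string operations work with either module on any platform.
print(ntpath.join("C:\\Users\\James", "AppData"))     # C:\Users\James\AppData
print(ntpath.basename("C:\\Users\\James\\file.dat"))  # file.dat
print(posixpath.join("/home/james", "file.dat"))      # /home/james/file.dat
```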

Strange path separators on Windows

I am running this code:
#!/usr/bin/python coding=utf8
# test.py = to demo fault
def loadFile(path):
    f = open(path,'r')
    text = f.read()
    return text
if __name__ == '__main__':
    path = 'D:\work\Kindle\srcs\test1.html'
    document = loadFile(path)
    print len(document)
It gives me a traceback:
D:\work\Kindle\Tests>python.exe test.py
Traceback (most recent call last):
File "test.py", line 11, in <module>
document = loadFile(path)
File "test.py", line 5, in loadFile
f = open(path,'r')
IOError: [Errno 22] invalid mode ('r') or filename: 'D:\\work\\Kindle\\srcs\test1.html'
D:\work\Kindle\Tests>
If I change the path line to
path = 'D:\work\Kindle\srcs\\test1.html'
(note the double \\) it all works fine.
Why? Either the separator is '\' or it is not, not a mix?
System: Windows 7, 64-bit
Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Checked - and all the backslashes appear correctly.

The backslash is an escape character when the next character combination would result in a special meaning. Take the following examples:
>>> '\r'
'\r'
>>> '\n'
'\n'
>>> '\b'
'\x08'
>>> '\c'
'\\c'
>>>
r, n, and b all have special meanings when preceded by a backslash, and so does t, which produces a tab. You either need to (A) double all your backslashes for consistency, since '\\' produces a single backslash, or (B) use raw strings: r'c:\path\to\my\file.txt'. The leading r tells the interpreter not to treat backslashes as escape sequences, so \t stays a backslash followed by t instead of becoming a tab.
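One way to spot this class of bug is to scan a path for control characters, since a stray TAB or backspace almost always comes from an unintended escape. control_chars is a hypothetical helper, sketched in Python 3; the broken string is written with explicit doubling so that it equals what the question's literal actually produced.

```python
def control_chars(path):
    """Return (index, char) pairs for control characters hiding in `path`."""
    return [(i, ch) for i, ch in enumerate(path) if ord(ch) < 32]

# Equivalent to the question's literal: "\t" became a single TAB character.
broken = "D:\\work\\Kindle\\srcs\test1.html"
fixed = r"D:\work\Kindle\srcs\test1.html"

print(control_chars(broken))  # [(19, '\t')]
print(control_chars(fixed))   # []
```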

You need to escape backslashes in paths with an extra backslash... like you've done for '\\test1.html'.
'\t' is the escape sequence for a tab character.
'D:\work\Kindle\srcs\test1.html' is therefore really 'D:\work\Kindle\srcs<tab>est1.html'.
You could also use raw literals: r'\test1.html' is equivalent to
'\\test1.html'

Use raw strings for Windows paths:
path = r'D:\work\Kindle\srcs\test1.html'
Otherwise the \t piece of your string will be interpreted as a Tab character.

The backslash \ is an escape character in Python. So your actual filepath is going to be D:\work\Kindle\srcs<tab>est1.html. Use os.sep, escape the backslashes with \\ or use a raw string by having r'some text'.

In addition to using a raw string (prefix string with the r character), the os.path module may be helpful to automatically provide OS-correct slashes when building a pathname.
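A Python 3 sketch of building the question's path from parts rather than typing separators by hand. ntpath is used so the output is Windows-style on any OS (on Windows, os.path.join behaves the same). Note the drive-letter pitfall: joining onto a bare "D:" gives a drive-relative path.

```python
import ntpath
from pathlib import PureWindowsPath

# Start from "D:\\" (drive plus root) so the result is an absolute path.
joined = ntpath.join("D:\\", "work", "Kindle", "srcs", "test1.html")
print(joined)  # D:\work\Kindle\srcs\test1.html

# Pitfall: a bare drive letter yields a drive-relative path, not D:\work.
print(ntpath.join("D:", "work"))  # D:work

# pathlib's Windows flavour accepts forward slashes and normalises them.
print(str(PureWindowsPath("D:/work/Kindle/srcs") / "test1.html") == joined)  # True
```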

Gotcha — backslashes in Windows filenames provides an interesting overview.
