String literals for file names

I am new to Python - but not to programming, and on a bit of a steep learning curve.
I have a programme that reads several input files - the first input file contains (amongst other things) the path and name of the other files.
I can open the file and read the name OK. If I print the string it looks like this
'Z:\\python\\rb_data.dat\n'
all my "\" become "\\". I think I can fix this by using the "r" prefix to convert it to a literal.
My question is how do I attach the prefix to a string variable ??
This is what I want to do :
modat = open('z:\\python\mot1 input.txt') # first input file containing names of other file
rbfile = modat.readline() # get new file name
rbdat = open(rbfile) # open new file

The \\ is an escape sequence for the backslash character \. When you specify a string literal, they are enquoted by either ' or ". Because there are some characters you might need to specify to be part of the string which you cannot enter like this—for example the quotation marks themselves—escape sequences allow you to do it. They usually are \x where x is something you want to enter. Now because all escape sequences start with a backslash, the backslash itself also turns into a special character which you cannot specify directly within a string literal. So you need to escape it too.
That means that the string literal '\\' actually represents a string with a single character: the backslash. Raw strings, which are string literals with an r in front of the opening quotation character, ignore (most) escape sequences. So r'\\x' is actually the string where two backslashes are followed by an x. So it’s identical to the string described by the non-raw string literal '\\\\x'.
All this only applies to string literals though. The string itself holds no information about whether it was created with a raw string literal or not, or whether some escape sequence was needed or not. It just contains all the characters that make up the string.
That also means that as soon as you get a string from somewhere, for example by reading it from a file, then you don’t need to worry about escaping something in there to make sure that it’s a correct string. It just is.
So in your code, when you open the file at z:\python\mot1 input.txt, you need to specify that filename as a string first. So you have to use a string literal, either with escaping the backslashes, or by using a raw string.
Then, when you read the new filename from that file, you already have a real string, and don’t need to bother with anything more. Assuming that it was correctly written to the file, you can just use it like that.
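To see this in practice, here is a minimal sketch. It writes a stand-in for the first input file into a temporary directory (the file and the path inside it are made up for illustration), reads the line back, and strips the trailing newline that readline() keeps.

```python
import os
import tempfile

# Hypothetical stand-in for the first input file; the path inside it is made up.
with tempfile.TemporaryDirectory() as tmp:
    first_input = os.path.join(tmp, "mot1 input.txt")
    with open(first_input, "w") as out:
        out.write("Z:\\python\\rb_data.dat\n")  # literal, so backslashes are escaped

    with open(first_input) as modat:
        rbfile = modat.readline()

# repr() shows the string as a literal would be written (doubled backslashes);
# print() shows the characters that are really in the string.
print(repr(rbfile))
print(rbfile.strip())

# Only the newline has to go before the name can be passed to open().
rbfile = rbfile.strip()
```

The doubled backslashes only appear in the repr() output. The string itself contains single backslashes, so the only clean-up needed before open(rbfile) is strip().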

The backslash \ in Python strings (and in code blocks on StackOverflow!) means, effectively, "treat the next character differently". As it is reserved for this purpose, when you actually have a backslash in your strings, it must be "escaped" by a preceding backslash:
>>> myString = "\\" # the first one "escapes" the second
>>> myString = "\" # no escape, so...
SyntaxError: EOL while scanning string literal
>>> print("\\") # when we actually print out the string
\
The short story is, you can basically ignore this in your strings. If you pass rbfile to open, Python will interpret it correctly.

Why not use os.path.normcase, like this:
import os

with open(r'z:\python\mot1 input.txt') as f:
    for line in f:
        if line.strip():
            if os.path.isfile(os.path.normcase(line.strip())):
                with open(line.strip()) as f2:
                    # do something with
                    # f2
                    pass
From the documentation of os.path.normcase:
Normalize the case of a pathname. On Unix and Mac OS X, this returns
the path unchanged; on case-insensitive filesystems, it converts the
path to lowercase. On Windows, it also converts forward slashes to
backward slashes.
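As a rough illustration of what that normalisation does, the sketch below uses ntpath, which is the Windows implementation behind os.path and can be imported on any platform. The input line is made up.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

line = "Z:/Python/RB_Data.dat\n"  # made-up line as it might come from the file
path = ntpath.normcase(line.strip())
print(path)
```

Note that the result is lowercased as well as converted to backslashes, which is harmless on Windows because its filesystems are normally case-insensitive.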

Related

Trouble loading csv file into Jupyter Notebooks [duplicate]

I'm trying to read a CSV file into Python (Spyder), but I keep getting an error. My code:
import csv
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)
I get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 2-3: truncated \UXXXXXXXX escape
I have tried to replace the \ with \\ or with / and I've tried to put an r before "C.., but all these things didn't work.

This error occurs because you are writing the path as a normal string literal, in which \U starts a Unicode escape. You can use any one of the following three solutions to fix your problem:
1: Put r before the string literal, which makes it a raw string literal:
pandas.read_csv(r"C:\Users\DeePak\Desktop\myac.csv")
2: Use forward slashes instead of backslashes:
pandas.read_csv("C:/Users/DeePak/Desktop/myac.csv")
3: Double every backslash:
pandas.read_csv("C:\\Users\\DeePak\\Desktop\\myac.csv")
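A quick way to convince yourself that these are just different spellings of the same path is to compare the resulting strings. This is only a sketch; no file is opened and the path does not need to exist.

```python
raw = r"C:\Users\DeePak\Desktop\myac.csv"
doubled = "C:\\Users\\DeePak\\Desktop\\myac.csv"
forward = "C:/Users/DeePak/Desktop/myac.csv"

# The raw and the doubled literal produce exactly the same characters.
print(raw == doubled)

# The forward-slash version differs only in the separator character.
print(forward.replace("/", "\\") == raw)
```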

The first backslash in your string is being interpreted as a special character. In fact, because it's followed by a "U", it's being interpreted as the start of a \UXXXXXXXX Unicode escape, which expects eight hexadecimal digits after the U.
To fix this, you need to escape the backslashes in the string. The direct way to do this is by doubling the backslashes:
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
If you don't want to escape backslashes in a string, and you don't have any need for escape codes or quotation marks in the string, you can instead use a "raw" string, using "r" just before it, like so:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

You can just put r in front of the string with your actual path, which denotes a raw string. For example:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

Consider it as a raw string. Just as a simple answer, add r before your Windows path.
import csv
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)

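Try writing the file path as "C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener", i.e. with every backslash doubled, as opposed to "C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener". Doubling only the backslash after the drive letter gets rid of the \U error, but \2 and \v further along the path would still be interpreted as escape sequences.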

Add r before your string literal. That makes it a raw string, in which backslashes are not treated as the start of an escape sequence.

As per String literals:
String literals can be enclosed within single quotes (i.e. '...') or double quotes (i.e. "..."). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings).
The backslash character (i.e. \) is used to escape characters which otherwise will have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter r or R. Such strings are called raw strings and use different rules for backslash escape sequences.
In triple-quoted strings, unescaped newlines and quotes are allowed, except that the three unescaped quotes in a row terminate the string.
Unless an r or R prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
So ideally you need to replace the line:
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
with any one of the following:
Using raw prefix and single quotes (i.e. '...'):
data = open(r'C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener')
Using double quotes (i.e. "...") and escaping backslash character (i.e. \):
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
Using double quotes (i.e. "...") and forwardslash character (i.e. /):
data = open("C:/Users/miche/Documents/school/jaar2/MIK/2.6/vektis_agb_zorgverlener")

Just putting an r in front works well.
eg:
white = pd.read_csv(r"C:\Users\hydro\a.csv")

It worked for me by neutralizing the '\' with a second backslash: f = open('F:\\file.csv')

The double \ should work for Windows, but you still need to take care of the folders you mention in your path. All of them (except the filename) must exist. Otherwise you will get an error.

python 3: quoting result of random string generation

I'm new to python and things do not always work as I expect... but I am learning, slowly. Here is a case in point. If I randomly create a string via:
thing = ''.join([
    random.SystemRandom().choice(
        "{}{}{}".format(
            string.ascii_letters, string.digits, string.punctuation
        )
    ) for i in range(63)
])
then I could end up with a string with single quotes as well as backslashes. I assume that I should then go through the string and quote the possibly problematic characters. So, for example: if I generate the (short) string:
cs]b77e\IM>&4/,u.s_jr"xmMdHD7a'wrEw(
my instinct tells me that I should quote that into:
cs]b77e\\IM>&4/,u.s_jr"xmMdHD7a\'wrEw(
It looks like the string.replace() method is my friend...
thing = ''.join([
    random.SystemRandom().choice(
        "{}{}{}".format(
            string.ascii_letters, string.digits, string.punctuation
        )
    ) for i in range(63)
]).replace('\\', '\\').replace('\'', '\'')
but is there a better way?
Also, in the replace() methods the meaning of the single quoted strings seems to change depending on context. Coming from Perl this seems strange to me. My initial attempts had me doing things like replace('\\', '\\\\') thinking that I had to quote the characters going into the replacement string. Is this normal or am I missing something else?
Edit
My goal here is to end up with 63 characters in a string. I don't really think that I have to quote any generated single quotes but my thought is that if I later use the string and it has generated backslashes then the next character after the backslash would act like it was quoted, right? I mean:
len('1234')
yields 4 but
len('12\4')
yields 3, so I need to post-process the generated string to at least quote the backslashes, right? Is there a better way to quote problematic characters than a chain of replace() calls?

A string can contain any valid characters; the quotes and backslashes are only useful or special when representing a string in Python code. So you don't normally need to do anything like this when you already have a string which contains the characters you want.
If you want a representation which can be parsed by Python (e.g. by writing it to a .py file), repr() does that.
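For instance, a small sketch of that round trip, using the example string from the question written out as a literal. ast.literal_eval is used here only to show that Python can parse the repr() output back into the same string.

```python
import ast

# The example string from the question, spelled as a literal
# (so the backslash and the single quote are escaped here).
thing = 'cs]b77e\\IM>&4/,u.s_jr"xmMdHD7a\'wrEw('

source_form = repr(thing)                   # text you could paste into a .py file
round_trip = ast.literal_eval(source_form)  # parse that text back into a string

print(source_form)
print(round_trip == thing)
```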

You don't have to escape characters unless they are part of code you are writing or from an input from a user. If the backslash character or a quote character is generated by a Python program, then it is already stored as that character in memory. There is no need to do any additional escaping.
Why? Because Python is not interpreting a string literal, it is simply generating characters, which are stored as numbers in memory. When you ask Python to display a string containing one of the characters such as a single quote or a backslash, it will automatically escape them.
Here is an example. A double quote is 34, single quote is character 39, and backslash is 92.
'a'+chr(34)+'b'+chr(39)+'c'+chr(92)+'d'
# returns:
'a"b\'c\\d'
Because I included a double quote and a single quote, Python will use a single quote to surround the string, an unescaped double quote within the string, an escaped single quote, and an escaped backslash.
So there is no need to escape characters that are generated within a Python program, it does it for you.
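Applied to the original question, a sketch along these lines shows that the generated string always has 63 characters, whatever was drawn, and that a backslash built at run time counts as one ordinary character. The variable names are just for illustration.

```python
import random
import string

alphabet = string.ascii_letters + string.digits + string.punctuation
rng = random.SystemRandom()
thing = "".join(rng.choice(alphabet) for _ in range(63))

print(len(thing))   # the length does not depend on which characters were drawn
print(thing)        # the characters themselves
print(repr(thing))  # the same string in literal form, with escapes added for display

# A backslash created at run time is a plain character, not an escape prefix.
built = "12" + chr(92) + "4"
print(len(built))
```

The len('12\4') result in the question comes from the literal being parsed, where \4 is an octal escape. A string assembled at run time never goes through that parsing step.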

Using os.chdir to access a file in which a folder starts with '\f'

I know that \f is a form feed. I want to access my folder the following way:
os.chdir("C:\Python27\BGT_Python\skills\fuzzymatching")
The folder 'fuzzymatching' starts with the \f symbol which breaks the string.
What's the easiest way to get around these types of symbols?

Add an r character in front of the string:
os.chdir(r"C:\Python27\BGT_Python\skills\fuzzymatching")
See the Python docs.
In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A "quote" is the character used to open the string, i.e. either ' or ".)
and
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.

For completeness, I'll add:
os.chdir("C:/Python27/BGT_Python/skills/fuzzymatching")
About the only part of Windows that actually requires backslashes is the command line.
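Here is a small sketch of both points. It uses ntpath (the Windows implementation behind os.path, importable on any platform) so that it does not depend on the directory actually existing or on the operating system it runs on.

```python
import ntpath  # Windows path rules, importable on any platform

broken = "skills\fuzzymatching"   # \f is turned into a single form-feed character
raw = r"skills\fuzzymatching"     # backslash and f stay two separate characters
forward = "C:/Python27/BGT_Python/skills/fuzzymatching"

print("\f" in broken, len(raw) - len(broken))

# normpath converts the forward slashes to the backslash form Windows displays.
print(ntpath.normpath(forward))
```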

This should work:
os.chdir("C:\Python27\BGT_Python\skills\\fuzzymatching")
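I just added a \ to escape \f. The other backslashes only work as they are because \P, \B and \s are not recognised escape sequences.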

Escape Windows's Path Delimiter

I need to change this string by escaping the windows path delimiters. I don't define the original string myself, so I can't pre-pend the raw string 'r'.
I need this:
s = 'C:\foo\bar'
to be this:
s = 'C:\\foo\\bar'
Everything I can find here and elsewhere says to do this:
s.replace( r'\\', r'\\\\' )
(Why I should have to escape the character inside a raw string I can't imagine)
But printing the string results in this. Obviously something has decided to re-interpret the escapes in the modified string:
C:♀oar
This would be so simple in Perl. How do I solve this in Python?

After a bunch of questions back and forth, the actual problem is this:
You have a file with contents like this:
C:\foo\bar
C:\spam\eggs
You want to read the contents of that file, and use it as pathnames, and you want to know how to escape things.
The answer is that you don't have to do anything at all.
Backslash sequences are processed in string literals, not in string objects that you read from a file, or from input (in 3.x; in 2.x that's raw_input), etc. So, you don't need to escape those backslash sequences.
If you think about it, you don't need to add quotes around a string to turn it into a string. And this is exactly the same case. The quotes and the escaping backslashes are both part of the string's representation, not the string itself.
In other words, if you save that example file as paths.txt, and you run the following code:
with open('paths.txt') as f:
    file_paths = [line.strip() for line in f]
literal_paths = [r'C:\foo\bar', r'C:\spam\eggs']
print(file_paths == literal_paths)
… it will print out True.
Of course if your file was generated incorrectly and is full of garbage like this:
C:♀oar
Then there is no way to "escape the backslashes", because they're not there to escape. You can try to write heuristic code to reconstruct the original data that was supposed to be there, but that's the best you can do.
For example, you could do something like this:
backslash_map = { '\a': r'\a', '\b': r'\b', '\f': r'\f',
                  '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v' }
def reconstruct_broken_string(s):
    for key, value in backslash_map.items():
        s = s.replace(key, value)
    return s
But this won't help if there were any hex, octal, or Unicode escape sequences to undo. For example, 'C:\foo\x08' and 'C:\foo\b' both represent the exact same string, so if you get that string, there's no way to know which one you're supposed to convert to. That's why the best you can do is a heuristic.
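As a usage sketch, here is that heuristic applied to the string from the question. The damaged value is produced on purpose by a non-raw literal. The last line shows why this is only a guess: the backspace character can be spelled as a hex escape as well.

```python
backslash_map = {'\a': r'\a', '\b': r'\b', '\f': r'\f',
                 '\n': r'\n', '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def reconstruct_broken_string(s):
    # Replace each control character with the two-character escape that
    # most likely produced it. This is a guess, not a true inverse.
    for key, value in backslash_map.items():
        s = s.replace(key, value)
    return s

damaged = 'C:\foo\bar'  # deliberately broken literal: \f and \b are interpreted
repaired = reconstruct_broken_string(damaged)
print(repaired)

# Two different spellings of the same character cannot be told apart afterwards.
print('\x08' == '\b')
```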

Don't do s.replace(anything). Just stick an r in front of the string literal, before the opening quote, so you have a raw string. Anything based on string replacement would be a horrible kludge, since s doesn't actually have backslashes in it; your code has backslashes in it, but those don't become backslashes in the actual string.
If the string actually has backslashes in it, and you want the string to have two backslashes wherever there once was one, you want this:
s = s.replace('\\', r'\\')
That'll replace any single backslash with two backslashes. If the string literally appears in the source code as s = 'C:\foo\bar', though, the only reasonable solution is to change that line. It's broken, and anything you do to the rest of the code won't make it not broken.
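To make the difference between the two arguments visible, here is a short sketch. It starts from a string that really does contain backslashes (written as a raw literal). As far as I know, the reason for writing the first argument as '\\' rather than as a raw literal is that a raw string literal cannot end in a single backslash.

```python
one = '\\'    # non-raw literal: a single backslash character
two = r'\\'   # raw literal: two backslash characters

s = r'C:\foo\bar'              # a string that really contains backslashes
doubled = s.replace(one, two)  # every backslash becomes two

print(len(one), len(two))
print(s)
print(doubled)
```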

python replace backslashes to slashes

How can I escape the backslashes in the string: 'pictures\12761_1.jpg'?
I know about raw string. How can I convert str to raw if I take 'pictures\12761_1.jpg' value from xml file for example?

You can use the string .replace() method along with a raw string.
Python 2:
>>> print r'pictures\12761_1.jpg'.replace("\\", "/")
pictures/12761_1.jpg
Python 3:
>>> print(r'pictures\12761_1.jpg'.replace("\\", "/"))
pictures/12761_1.jpg
There are two things to notice here:
First, the text is written as a raw string by putting r before the opening quote. If you leave that out, \127 is read as an octal escape (the character 'W'), so the backslash silently disappears from the string.
Second, there are two backslashes in the first argument of replace(). A backslash in a normal string literal starts an escape sequence: a sequence of two or more characters, beginning with a backslash, that stands for something other than itself. For example, '\n' represents a newline. To write a backslash itself, which would otherwise start an escape sequence, we escape it with another backslash.
I know the second part is a bit confusing, but I hope it made some sense.
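For the case in the question, where the value comes from an XML file rather than from a literal, no raw prefix is involved at all. A minimal sketch, assuming a made-up XML snippet with a hypothetical image element:

```python
import xml.etree.ElementTree as ET

# Made-up XML snippet; the element name is an assumption for illustration.
# The raw literal is only needed because the XML is written inline here.
document = r"<image>pictures\12761_1.jpg</image>"

value = ET.fromstring(document).text  # already a normal string with one backslash
fixed = value.replace("\\", "/")
print(fixed)
```

When the XML is read from a real file, the value arrives as a string with the backslash intact, and the replace() call is the only step needed.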

You can also use split/join:
print("/".join(r'pictures\12761_1.jpg'.split("\\")))
EDITED:
Another way is to convert the data while retrieving it, that is, to update the string before assigning it to a variable. For example:
f = open('c:\\tst.txt', "r")
print(repr(f.readline().replace('\\', '/')))
'pictures/12761_1.jpg\n'

I know it is not what you asked exactly, but I think this will work better.
It's better to just have the names of your directories and use os.path.join(directory, filename)
"os.path.join(path, *paths)
Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component"
https://docs.python.org/2/library/os.path.html
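A short sketch of that behaviour, using ntpath (the Windows implementation of os.path, importable on any platform) so the separator is a backslash wherever the code runs. The directory and file names are the ones from the question.

```python
import ntpath  # os.path as it behaves on Windows, importable anywhere

directory = "pictures"
filename = "12761_1.jpg"

# The separator is inserted for you, so no backslash appears in your own code.
joined = ntpath.join(directory, filename)
print(joined)

# An absolute component discards everything before it, as the quoted docs say.
restarted = ntpath.join("pictures", r"C:\foo", "bar")
print(restarted)
```

In real code you would normally call os.path.join directly and let Python pick the separator for the platform it runs on.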
