Trouble loading csv file into Jupyter Notebooks [duplicate]

Trouble loading csv file into Jupyter Notebooks [duplicate] - python

This question already has answers here:
How should I write a Windows path in a Python string literal?
(5 answers)
Closed 3 years ago.
I'm trying to read a CSV file into Python (Spyder), but I keep getting an error. My code:
import csv
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)
I get the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 2-3: truncated \UXXXXXXXX escape
I have tried to replace the \ with \\ or with / and I've tried to put an r before "C.., but all these things didn't work.

This error occurs, because you are using a normal string as a path. You can use one of the three following solutions to fix your problem:
1: Just put r before your normal string. It converts a normal string to a raw string:
pandas.read_csv(r"C:\Users\DeePak\Desktop\myac.csv")
2:
pandas.read_csv("C:/Users/DeePak/Desktop/myac.csv")
3:
pandas.read_csv("C:\\Users\\DeePak\\Desktop\\myac.csv")

The first backslash in your string is being interpreted as a special character. In fact, because it's followed by a "U", it's being interpreted as the start of a Unicode code point.
To fix this, you need to escape the backslashes in the string. The direct way to do this is by doubling the backslashes:
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
If you don't want to escape backslashes in a string, and you don't have any need for escape codes or quotation marks in the string, you can instead use a "raw" string, using "r" just before it, like so:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

You can just put r in front of the string with your actual path, which denotes a raw string. For example:
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")

Consider it as a raw string. Just as a simple answer, add r before your Windows path.
import csv
data = open(r"C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
data = csv.reader(data)
print(data)

Try writing the file path as "C:\\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener" i.e with double backslash after the drive as opposed to "C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener"

Add r before your string. It converts a normal string to a raw string.

As per String literals:
String literals can be enclosed within single quotes (i.e. '...') or double quotes (i.e. "..."). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings).
The backslash character (i.e. \) is used to escape characters which otherwise will have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter r or R. Such strings are called raw strings and use different rules for backslash escape sequences.
In triple-quoted strings, unescaped newlines and quotes are allowed, except that the three unescaped quotes in a row terminate the string.
Unless an r or R prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
So ideally you need to replace the line:
data = open("C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener")
To any one of the following characters:
Using raw prefix and single quotes (i.e. '...'):
data = open(r'C:\Users\miche\Documents\school\jaar2\MIK\2.6\vektis_agb_zorgverlener')
Using double quotes (i.e. "...") and escaping backslash character (i.e. \):
data = open("C:\\Users\\miche\\Documents\\school\\jaar2\\MIK\\2.6\\vektis_agb_zorgverlener")
Using double quotes (i.e. "...") and forwardslash character (i.e. /):
data = open("C:/Users/miche/Documents/school/jaar2/MIK/2.6/vektis_agb_zorgverlener")

Just putting an r in front works well.
eg:
white = pd.read_csv(r"C:\Users\hydro\a.csv")

It worked for me by neutralizing the '' by f = open('F:\\file.csv')

The double \ should work for Windows, but you still need to take care of the folders you mention in your path. All of them (except the filename) must exist. Otherwise you will get an error.

Related

Why aren't all the backslashes being replaced as per replace() statement?

I'm trying to replace backslashes (from this polyline) with double backslashes.
My Script:
txt = "qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\Zl#d#|Ad#`#jAzBf#nATn#\xAD~#Jt#TJTv#^b#f#^b#b#f#^r#nA`#^\f#`#VVp#Lx#Vb#d#n#\l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\`#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\X~#b#^^PfAr#\Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA"
x = txt.replace("\\", "\\\\")
print(x)
Output (top string has spaces added to highlight the differences to the orginal string below)
qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\\h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\\Zl#d#|Ad#`#jAzBf#nATn#~ #Jt#TJTv#^b#f#^b#b#f#^r#nA`#^#` #VVp#Lx#Vb#d#n#\\ l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\\`#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\\wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\\X~#b#^^PfAr#\\Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\\kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA
qnkyHbgFYhBi#lFA|#Dz#K~#Oz#An#Dz#A|#D|#M|BJZd#KDHRzBf#lMNrBB~#Lz#\ h#^`#Lt#?L]`#GH?FJl#Xl#NNLXVpBHV^LZf#Jt##j#Kx#OjB#l#b#ZJDv#d#`#RXFp#\ Zl#d#|Ad#`#jAzBf#nATn#\ xAD~#Jt#TJTv#^b#f#^b#b#f#^r#nA`#^\f#`#VVp#Lx#Vb#d#n#\ l#V|BLx#Hz#Pp#Lz#Fl#ErAFf#K~#IvBQr#m#hBFfAC|#g#nC?z#ANi#hBGjAMx#OvBa#hBWfBSx#?r#M|#Qv#Wn#q#tAMt##z#Px#kAvDWj#u#pAYr#Mx#q#fA_#d#K~#YTAr#C##B#`#ITSL}#I]vBz#pA#z#KVGDk##g#Eg#Kg##g#Ec#Fe#YSZGFEUUg#Gz#i#j#S|#Wr#Sp#[hBDx#BvBD~#BvCNx#Zj#Z^Zf#TXb#R^Z\ `#f#FXj#BTOdA]`BUx#Wl#Oz#G|#Hn#At#m#lI#XXf#?v#IBCq#M_AV{B\ wGLs#To#|#Th#In#`##cBWmAuAeE[k#e#YYg#a#_#c#YQ[y#aAWa#Us#?iAEk#?{#KaFNy#d#qBh#eDl#s#J#JMLKLT`#R`#\ X~#b#^^PfAr#\ Lf#Fd#m#J{ABkALw#d#a#?_AMoA?yCbAqC#s#hBwBXi#Ng#Nu#n#sAMuBJ}#\ kACs#[e#Wk#e#c#k#YYYIBi#eAMy#A}#Im#G}#JyBDyB?{#IyBC_AKc#_#QgAa#]m#eAo#]e#Om#SsAIy#GwDIwB#}BF}BC]AcEIq#Wq#i#eBMy#AkBUyDIKEvAIZUNK?MFc#ZUGKeAG}#M{DLgAD_ADiFHKtBITm#By#OS[QAUF}#PeBA}AIe#GQc#g#UaBUs#}#gAKi#Ns#A[Kw#GS]g#Ws#I}BOwB?{#SqDA_AQwDAw#g#DEGAu#B}BA}#Fy#Re#Hw#A_AUiAH{#RYXk#EwBG{#OiAAmA
So you can see
#\xAD~#
becomes
#~#
when I would expect it to be
#\\xAD~#.

The basic problem is that there are no backslashes in your string. Your source code has backslashes, but they are all escape signals. If you want to retain the backslashes in "WYSIWYG" style, use raw string mode:
txt = r"qnkyHbgFYhB..."
This will retain the characters as seen, without processing the usual escape sequences.

python 3: quoting result of random string generation

I'm new to python and things do not always work as I expect... but I am learning, slowly. Here is a case in point. If I randomly create a string via:
thing = ''.join([
random.SystemRandom().choice(
"{}{}{}".format(
string.ascii_letters, string.digits, string.punctuation
)
) for i in range(63)
])
then I could end up with a string with single quotes as well as backslashes. I assume that I should then go through the string and quote the possibly problematic characters. So, for example: if I generate the (short) string:
cs]b77e\IM>&4/,u.s_jr"xmMdHD7a'wrEw(
my instinct tells me that I should quote that into:
cs]b77e\\IM>&4/,u.s_jr"xmMdHD7a\'wrEw(
It looks like the string.replace() method is my friend...
thing = ''.join([
random.SystemRandom().choice(
"{}{}{}".format(
string.ascii_letters, string.digits, string.punctuation
)
) for i in range(63)
]).replace('\\', '\\').replace('\'', '\'')
but is there a better way?
Also, in the replace() methods the meaning of the single quoted strings seems to change depending on context. Coming from Perl this seems strange to me. My initial attempts had me doing things like replace('\\', '\\\\') thinking that I had to quote the characters going into the replacement string. Is this normal or am I missing something else?
Edit
My goal here is to end up with 63 characters in a string. I don't really think that I have to quote any generated single quotes but my thought is that if I later use the string and it has generated backslashes then the next character after the backslash would act like it was quoted, right? I mean:
len('1234')
yields 4 but
len('12\4')
yields 3 so I need to post-process the generated string to at least quote the backslashes, right? Is there a better way to quote problematic characters than a chain of replaces() methods?

A string can contain any valid characters; the quotes and backslashes are only useful or special when representing a string in Python code. So you don't normally need to do anything like this when you already have a string which contains the characters you want.
If you want a representation which can be parsed by Python (e.g. by writing it to a .py file), repr() does that.

You don't have to escape characters unless they are part of code you are writing or from an input from a user. If the backslash character or a quote character is generated by a Python program, then it is already stored as that character in memory. There is no need do any additional escaping.
Why? Because Python is not interpreting a string literal, it is simply generating characters, which are stored as numbers in memory. When you ask Python to display a string containing one of the characters such as a single quote or a backslash, it will automatically escape them.
Here is an example. A double quote is 34, single quote is character 39, and backslash is 92.
'a'+chr(34)+'b'+chr(39)+'c'+chr(92)+'d'
# returns:
'a"b\'c\\d'
Because I included a double quote and a single quote Python will use a single quote to surround the string, an unescaped double quote within the string, an escaped single quote, and and escaped backslash.
So there is no need to escape characters that are generated within a Python program, it does it for you.

Defining file paths in python with EOL string literal errors [duplicate]

Technically, any odd number of backslashes, as described in the documentation.
>>> r'\'
File "<stdin>", line 1
r'\'
^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
File "<stdin>", line 1
r'\\\'
^
SyntaxError: EOL while scanning string literal
It seems like the parser could just treat backslashes in raw strings as regular characters (isn't that what raw strings are all about?), but I'm probably missing something obvious.

The whole misconception about python's raw strings is that most of people think that backslash (within a raw string) is just a regular character as all others. It is NOT. The key to understand is this python's tutorial sequence:
When an 'r' or 'R' prefix is present, a character following a
backslash is included in the string without change, and all
backslashes are left in the string
So any character following a backslash is part of raw string. Once parser enters a raw string (non Unicode one) and encounters a backslash it knows there are 2 characters (a backslash and a char following it).
This way:
r'abc\d' comprises a, b, c, \, d
r'abc\'d' comprises a, b, c, \, ', d
r'abc\'' comprises a, b, c, \, '
and:
r'abc\' comprises a, b, c, \, ' but there is no terminating quote now.
Last case shows that according to documentation now a parser cannot find closing quote as the last quote you see above is part of the string i.e. backslash cannot be last here as it will 'devour' string closing char.

The reason is explained in the part of that section which I highlighted in bold:
String quotes can be escaped with a
backslash, but the backslash remains
in the string; for example, r"\"" is a
valid string literal consisting of two
characters: a backslash and a double
quote; r"\" is not a valid string
literal (even a raw string cannot end
in an odd number of backslashes).
Specifically, a raw string cannot end
in a single backslash (since the
backslash would escape the following
quote character). Note also that a
single backslash followed by a newline
is interpreted as those two characters
as part of the string, not as a line
continuation.
So raw strings are not 100% raw, there is still some rudimentary backslash-processing.

That's the way it is! I see it as one of those small defects in python!
I don't think there's a good reason for it, but it's definitely not parsing; it's really easy to parse raw strings with \ as a last character.
The catch is, if you allow \ to be the last character in a raw string then you won't be able to put " inside a raw string. It seems python went with allowing " instead of allowing \ as the last character.
However, this shouldn't cause any trouble.
If you're worried about not being able to easily write windows folder pathes such as c:\mypath\ then worry not, for, you can represent them as r"C:\mypath", and, if you need to append a subdirectory name, don't do it with string concatenation, for it's not the right way to do it anyway! use os.path.join
>>> import os
>>> os.path.join(r"C:\mypath", "subfolder")
'C:\\mypath\\subfolder'

In order for you to end a raw string with a slash I suggest you can use this trick:
>>> print r"c:\test"'\\'
test\
It uses the implicit concatenation of string literals in Python and concatenates one string delimited with double quotes with another that is delimited by single quotes. Ugly, but works.

Another trick is to use chr(92) as it evaluates to "\".
I recently had to clean a string of backslashes and the following did the trick:
CleanString = DirtyString.replace(chr(92),'')
I realize that this does not take care of the "why" but the thread attracts many people looking for a solution to an immediate problem.

Since \" is allowed inside the raw string. Then it can't be used to identify the end of the string literal.
Why not stop parsing the string literal when you encounter the first "?
If that was the case, then \" wouldn't be allowed inside the string literal. But it is.

The reason for why r'\' is syntactical incorrect is that although the string expression is raw the used quotes (single or double) always have to be escape since they would mark the end of the quote otherwise. So if you want to express a single quote inside single quoted string, there is no other way than using \'. Same applies for double quotes.
But you could use:
'\\'

Another user who has since deleted their answer (not sure if they'd like to be credited) suggested that the Python language designers may be able to simplify the parser design by using the same parsing rules and expanding escaped characters to raw form as an afterthought (if the literal was marked as raw).
I thought it was an interesting idea and am including it as community wiki for posterity.

Naive raw strings
The naive idea of a raw string is
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
and it will mean itself.
Unfortunately, this does not work, because if the whatever
happens to contain a quote, the raw string would end at that point.
It is simply impossible that I can put "whatever I want"
between fixed delimiters, because some of it could look like
the terminating delimiter -- no matter what that delimiter is.
Real-world raw strings (variant 1)
One possible approach to this problem would be to say
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
as long as it does not contain a quote
and it will mean itself.
This restriction sounds harsh, until one recognizes that
Python's large offering of quotes can accommodate most situations
with this rule. The following are all valid Python quotes:
'
"
'''
"""
With this many possibilities for the delimiter, almost anything
can be made to work.
About the only exception would be if the string
literal is supposed to contain a complete list of all allowed
Python quotes.
Real-world raw strings (variant 2, as in Python)
Python, however, takes a different route using
an extended version of the above rule.
It effectively states
If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
as long as it does not contain a quote
and it will mean itself.
If I insist on including a quote, even that is allowed,
but I have to put a backslash before it.
So the Python approach is, in a sense, even more liberal
than variant 1 above -- but it has the side effect of
"mis"interpreting the closing quote as part of the string
if the last intended character of the string is a backslash.
Variant 2 is not helpful:
If I want the quote in my string,
but not the backslash, the allowed version of my string literal
will not be what I need.
However, given the three different other kinds of quotes I have
at my disposal, I will probably just pick one of those and my
problem will be solved -- so this is not problematic case.
The problematic case is this one:
If I want my string to end with a backslash, I am at a loss.
I need to resort to concatenating a non-raw string literal
containing the backslash.
Conclusion
After writing this, I go with several of the other posters
that variant 1 would have been easier to understand and to accept
and therefore more pythonic. That's life!

Comming from C it pretty clear to me that a single \ works as escape character allowing you to put special characters such as newlines, tabs and quotes into strings.
That does indeed disallow \ as last character since it will escape the " and make the parser choke. But as pointed out earlier \ is legal.

some tips :
1) if you need to manipulate backslash for path then standard python module os.path is your friend. for example :
os.path.normpath('c:/folder1/')
2) if you want to build strings with backslash in it BUT without backslash at the END of your string then raw string is your friend (use 'r' prefix before your literal string). for example :
r'\one \two \three'
3) if you need to prefix a string in a variable X with a backslash then you can do this :
X='dummy'
bs=r'\ ' # don't forget the space after backslash or you will get EOL error
X2=bs[0]+X # X2 now contains \dummy
4) if you need to create a string with a backslash at the end then combine tip 2 and 3 :
voice_name='upper'
lilypond_display=r'\DisplayLilyMusic \ ' # don't forget the space at the end
lilypond_statement=lilypond_display[:-1]+voice_name
now lilypond_statement contains "\DisplayLilyMusic \upper"
long live python ! :)
n3on

Despite its role, even a raw string cannot end in a single
backslash, because the backslash escapes the following quote
character—you still must escape the surrounding quote character to
embed it in the string. That is, r"...\" is not a valid string
literal—a raw string cannot end in an odd number of backslashes.
If you need to end a raw string with a single backslash, you can use
two and slice off the second.

Given the confusion around the arbitrary-seeming restriction against an odd number of backslashes at the end of a Python raw-string, it's fair to say that this is a design mistake or legacy issue originating in a desire to have a simpler parser.
While workarounds (such as r'C:\some\path' '\\' yielding 'C:\\some\\path\\' (in Python notation) or C:\some\path\ (verbatim)) are simple, it's counterintuitive to be needing them. For comparison, let's have a look at C++ and Perl.
In C++, we can straightforwardly use raw string literal syntax
#include <iostream>
int main() {
std::cout << R"(Hello World!)" << std::endl;
std::cout << R"(Hello World!\)" << std::endl;
std::cout << R"(Hello World!\\)" << std::endl;
std::cout << R"(Hello World!\\\)" << std::endl;
}
to get the following output:
Hello World!
Hello World!\
Hello World!\\
Hello World!\\\
If we want to use the closing delimiter (above: )) within the string literal, we can even extend the syntax in an ad-hoc way to R"delimiterString(quotedMaterial)delimiterString". For example, R"asdf(some random delimiters: ( } [ ] { ) < > just for fun)asdf" produces the string some random delimiters: ( } [ ] { ) < > just for fun in the output. (Ain't that a good use of "asdf"!)
In Perl, this code
my $str = q{This is a test.\\};
print ($str);
print ("This is another test.\n");
will output the following: This is a test.\This is another test.
Replacing the first line by
my $str = q{This is a test.\};
would lead to an error message: Can't find string terminator "}" anywhere before EOF at main.pl line 1.
However, Perl treating a pre-delimiter \ as an escape character doesn't prevent the user from having an odd number of backslashes at the end of the resulting string; eg to place 3 backslashes \\\ into the end of $str, simply end the code with 6 backslashes: my $str = q{This is a test.\\\\\\};. Importantly, while we need to double the backslashes in the input, there is no Python-like inconsistent-seeming syntactic restriction.
Another way of looking at things is that these 3 languages use different ways to address the parsing issue of interaction between escape characters and closing delimiters:
Python: disallows an odd number of backslashes just before the closing delimiter; a simple workaround is r'stringWithoutFinalBackslash' '\\'
C++: allows essentially¹ everything between the delimiters
Perl: allows essentially² everything between the delimiters, but backslashes need to be consistently doubled
¹ The custom delimiterString itself cannot be more than 16 characters long, but that's hardly a limitation.
² If you need the delimiter itself, just escape it with \.
However, to be fair in a comparison to Python, we need to acknowledge that (1) C++ didn't have such string literals until C++11 and is famously hard to parse and (2) Perl is even harder to parse.

I encountered this problem and found a partial solution which is good for some cases. Despite python not being able to end a string with a single backslash, it can be serialized and saved in a text file with a single backslash at the end. Therefore if what you need is saving a text with a single backslash on you computer, it is possible:
x = 'a string\\'
x
'a string\\'
# Now save it in a text file and it will appear with a single backslash:
with open("my_file.txt", 'w') as h:
h.write(x)
BTW it is not working with json if you dump it using python's json library.
Finally, I work with Spyder, and I noticed that if I open the variable in spider's text editor by double clicking on its name in the variable explorer, it is presented with a single backslash and can be copied to the clipboard that way (it's not very helpful for most needs but maybe for some..).

Understanding file locations in python - unexpected errors

I am learning python 3.3 in windows 7. I have a two text files - lines.txt and raven.txt in a folder. Both contain the same text for the first example.
When I try to access ravens, using the code below, I get the error -
OSError: [Errno 22] Invalid argument: 'C:\\Python\raven.txt'
I know that the above error can be fixed by using an escape character like this -
C:\\Python\\raven.txt
C:\Python\\raven.txt
Why do both methods work ? Strangely, when I access lines.txt in the same folder, I get no error ! Why ?
import re
def main():
print('')
fh = open('C:\Python\lines.txt')
for line in fh:
if re.search('(Len|Neverm)ore', line):
print(line, end = '')
if __name__ == '__main__':main()
Also, when I use the line below, I get a completely different error - TypeError: embedded NUL character. Why ?
fh = open('C:\Python\Exercise Files\09 Regexes\raven.txt')
I can rectify this by using \ before every \ in the file path.

\r is an escape character, but \l is not. So, lines is interpreted as lines while raven is interpreted as aven, since \r is escaped.
In [1]: len('\l')
Out[1]: 2
In [2]: len('\r')
Out[2]: 1
You should always escape backslashes with \\. In cases your string doesn't have quotes, you can also use raw strings:
In [9]: len(r'\r')
Out[9]: 2
In [10]: r'\r'
Out[10]: '\\r'
See: https://docs.python.org/3/reference/lexical_analysis.html

maybe you can use raw string.
just like this open(r'C:\Python\Exercise Files\09 Regexes\raven.txt').
When an r' orR' prefix is present, backslashes are still used to
quote the following character, but all backslashes are left in the
string. For example, the string literal r"\n" consists of two
characters: a backslash and a lowercase `n'. String quotes can be
escaped with a backslash, but the backslash remains in the string; for
example, r"\"" is a valid string literal consisting of two characters:
a backslash and a double quote; r"\" is not a value string literal
(even a raw string cannot end in an odd number of backslashes).
Specifically, a raw string cannot end in a single backslash (since the
backslash would escape the following quote character). Note also that
a single backslash followed by a newline is interpreted as those two
characters as part of the string, not as a line continuation.

You can actually use forward slashes instead of backward ones, that way you don't have to escape them at all, which would save you a lot of headaches. Like this: 'C:/Python/raven.txt', I can guarantee that it works on Windows.

String literals for file names

I am new to Python - but not to programming, and on a bit of a steep learning curve.
I have a programme that reads several input files - the first input file contains (amongst other things) the path and name the other files.
I can open the file and read the name OK. If I print the string it looks like this
Z:\ \python\ \rb_data.dat\n'
all my "\" become "\ \" I think I can fix this by using the "r" prefix to convert it to a literal.
My question is how do I attach the prefix to a string variable ??
This is what I want to do :
modat = open('z:\\python\mot1 input.txt') # first input file containing names of other file
rbfile = modat.readline() # get new file name
rbdat = open(rbfile) # open new file

The \\ is an escape sequence for the backslash character \. When you specify a string literal, they are enquoted by either ' or ". Because there are some characters you might need to specify to be part of the string which you cannot enter like this—for example the quotation marks themselves—escape sequences allow you to do it. They usually are \x where x is something you want to enter. Now because all escape sequences start with a backslash, the backslash itself also turns into a special character which you cannot specify directly within a string literal. So you need to escape it too.
That means that the string literal '\\' actually represents a string with a single character: The backslash. Raw strings, that are string literals with an r in front of the opening quotation character, ignore (most) escape sequences. So r'\\x' is actually the string where two backslashes are followed by an x. So it’s identical to the string described by the non-raw string literal '\\\\x'.
All this only applies to string literals though. The string itself holds no information about whether it was created with a raw string literal or not, or whether there was some escape sequence need or not. It just contains all the characters that make out the string.
That also means that as soon as you get a string from somewhere, for example by reading it from a file, then you don’t need to worry about escaping something in there to make sure that it’s a correct string. It just is.
So in your code, when you open the file at z:\python\mot1 input.txt, you need to specify that filename as a string first. So you have to use a string literal, either with escaping the backslashes, or by using a raw string.
Then, when you read the new filename from that file, you already have a real string, and don’t need to bother with anything more. Assuming that it was correctly written to the file, you can just use it like that.

The backslash \ in Python strings (and in code blocks on StackOverflow!) means, effectively, "treat the next character differently". As it is reserved for this purpose, when you actually have a backslash in your strings, it must be "escaped" by a preceding backslash:
>>> myString = "\\" # the first one "escapes" the second
>>> myString = "\" # no escape, so...
SyntaxError: EOL while scanning string literal
>>> print("\\") # when we actually print out the string
\
The short story is, you can basically ignore this in your strings. If you pass rbfile to open, Python will interpret it correctly.

Why not use os.path.normcase, like this:
with open(r'z:\python\mot1 input.txt') as f:
for line in f:
if line.strip():
if os.path.isfile(os.path.normcase(line.strip())):
with open(line.strip()) as f2:
# do something with
# f2
From the documentation of os.path.normcase:
Normalize the case of a pathname. On Unix and Mac OS X, this returns
the path unchanged; on case-insensitive filesystems, it converts the
path to lowercase. On Windows, it also converts forward slashes to
backward slashes.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.