Why are Python strings behaving funny? - python

I'm so confused... why/how is a different from b?! Why don't they print the same thing?
>>> a = '"'
>>> a
'"'
>>> b = "'"
>>> b
"'"

Python isn't treating the two strings any differently. When it echoes a value (using its repr), it just picks whichever delimiter avoids having to escape the quote inside. Both ' and " are legal string literal delimiters.
Note that the contents of the string are very different. " is not the same string as '; a == b is (patently) False.
Otherwise Python would have to use a \ backslash to escape the " or ' character. If a string contains both characters, Python is forced to use escaping:
>>> '\'"'
'\'"'
>>> """Triple quoted means you can use both without escaping them: "'"""
'Triple quoted means you can use both without escaping them: "\''
As you can see, the string representation used by Python still uses single quotes and a backslash to represent that last string.
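A quick way to check this yourself (a small sketch, assuming Python 3) is to compare the values and their repr() directly. The delimiter repr() picks is only a display choice, not part of the string's contents:

```python
a = '"'   # one-character string holding a double quote
b = "'"   # one-character string holding a single quote

print(len(a), len(b))  # 1 1
print(a == b)          # False: the contents differ
print(repr(a))         # '"'  (single quotes, so the " needs no escape)
print(repr(b))         # "'"  (double quotes, so the ' needs no escape)
print(repr('\'"'))     # '\'"'  (both kinds present, so one must be escaped)
```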

Related

Python str.format with string concatenation and continuation

I'd like to specify a string with both line continuation and concatenation characters. This is really useful if I'm echoing a bunch of related values. Here is a simple example with only two parameters:
temp = "here is\n"\
+"\t{}\n"\
+"\t{}".format("foo","bar")
print(temp)
Here's what I get:
here is
{}
foo
And here is what I expect:
here is
foo
bar
What gives?
You can try something like this:
temp = ("here is\n"
"\t{}\n"
"\t{}".format("foo","bar"))
print(temp)
Or like this:
# the \t have been replaced with
# 4 spaces just as an example
temp = '''here is
    {}
    {}'''.format
print(temp('foo', 'bar'))
vs. what you have:
a = "here is\n"
b = "\t{}\n"
c = "\t{}".format("foo","bar")
print( a + b + c)
str.format is called before your strings are concatenated. Think of it like 1 + 2 * 3, where the multiplication is evaluated before the addition.
Just wrap the whole string in parentheses to indicate that you want the strings concatenated before calling str.format:
temp = ("here is\n"
+ "\t{}\n"
+ "\t{}").format("foo","bar")
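The precedence point can be checked directly. In this minimal sketch (Python 3 assumed), the method call binds tighter than +, so in the first expression only the last literal is formatted and the unused "bar" is silently ignored by str.format:

```python
# .format() binds tighter than +, so only the last literal is formatted
wrong = "here is\n" + "\t{}\n" + "\t{}".format("foo", "bar")

# Parentheses force the concatenation to happen first
right = ("here is\n"
         + "\t{}\n"
         + "\t{}").format("foo", "bar")

print(repr(wrong))  # 'here is\n\t{}\n\tfoo'
print(repr(right))  # 'here is\n\tfoo\n\tbar'
```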
Python in effect sees this:
Concatenate the result of
"here is\n"
with the result of
"\t{}\n"
with the result of
"\t{}".format("foo","bar")
You have 3 separate string literals, and only the last one has the str.format() method applied.
Note that the Python interpreter is concatenating the strings at runtime.
You should instead use implicit string literal concatenation. Whenever you place two string literals side by side in an expression with no other operators in between, you get a single string:
"This is a single" " long string, even though there are separate literals"
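This is stored in the bytecode as a single constant, whereas the + version leaves two constants to be joined at runtime (the output below comes from an older CPython; newer versions may constant-fold the second example into one string as well):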
>>> compile('"This is a single" " long string, even though there are separate literals"', '', 'single').co_consts
('This is a single long string, even though there are separate literals', None)
>>> compile('"This is two separate" + " strings added together later"', '', 'single').co_consts
('This is two separate', ' strings added together later', None)
From the String literal concatenation documentation:
Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld".
When you use implicit string literal concatenation, any .format() call at the end is applied to that whole, single string.
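Because compiled bytecode can differ between CPython versions, a more version-independent way to see this is to inspect the parse tree with the standard ast module. This is a sketch assuming Python 3.8 or later, where string literals parse to ast.Constant nodes:

```python
import ast

# Adjacent literals: the parser already merges them into one constant
joined = ast.parse('"This is a single" " long string"', mode="eval").body
print(type(joined).__name__, repr(joined.value))  # Constant 'This is a single long string'

# Literals joined with +: the tree still holds a BinOp to evaluate
added = ast.parse('"two separate" + " strings"', mode="eval").body
print(type(added).__name__)  # BinOp

# .format() is called on the single merged literal
call = ast.parse('"a{}" "b{}".format(1, 2)', mode="eval").body
print(repr(call.func.value.value))  # 'a{}b{}'
```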
Next, you don't want to use \ backslash line continuation. Use parentheses instead; it is cleaner:
temp = (
"here is\n"
"\t{}\n"
"\t{}".format("foo","bar"))
This is called implicit line joining.
You might also want to learn about multiline string literals, where you use three quotes at the start and end. Newlines are allowed in such strings and remain part of the value:
temp = """\
here is
\t{}
\t{}""".format("foo","bar")
I used a \ backslash after the opening """ to escape the first newline.
The format function is only being applied to the last string.
temp = "here is\n"\
+"\t{}\n"\
+"\t{}".format("foo","bar")
Is doing this:
temp = "here is\n" + "\t{}\n" + "\t{}".format("foo","bar")
The key is that .format() is only applied to the last string:
"\t{}".format("foo","bar")
You can obtain the desired result using parentheses:
temp = ("here is\n"\
+"\t{}\n"\
+"\t{}").format("foo","bar")
print(temp)
#here is
# foo
# bar

python replaces single backslash with double when using extend [duplicate]

This question already has answers here:
Why do backslashes appear twice?
How should I write a Windows path in a Python string literal?
I have a dictionary:
my_dictionary = {"058498":"table", "064165":"pen", "055123":"pencil"}
I iterate over it:
for item in my_dictionary:
    PDF = r'C:\Users\user\Desktop\File_%s.pdf' %item
    doIt(PDF)
def doIt(PDF):
    part = MIMEBase('application', "octet-stream")
    part.set_payload( open(PDF,"rb").read() )
But I get this error:
IOError: [Errno 2] No such file or directory: 'C:\\Users\\user\\Desktop\\File_055123.pdf'
It can't find my file. Why does it think there are double backslashes in the file path?
The double backslash is not wrong; it's just how Python represents a backslash to the user. In each double backslash \\, the first backslash escapes the second, meaning one actual backslash. If a = r'raw s\tring' and b = 'raw s\\tring' (no r prefix, and an explicit double backslash), they are both represented as 'raw s\\tring'.
>>> a = r'raw s\tring'
>>> b = 'raw s\\tring'
>>> a
'raw s\\tring'
>>> b
'raw s\\tring'
For clarification: when you print the string, you see it as it would be used, for example in a path, with just one backslash:
>>> print(a)
raw s\tring
>>> print(b)
raw s\tring
And in this printed string, the \t doesn't mean a tab; it's a backslash \ followed by the letter 't'.
Without the r prefix, on the other hand, a single backslash escapes the character after it, so the backslash and the 't' following it together become a tab:
>>> t = 'not raw s\tring' # here '\t' = tab
>>> t
'not raw s\tring'
>>> print(t) # will print a tab (and no letter 't' in 's\tring')
not raw s ring
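The difference between the three literals can be confirmed by comparing contents and lengths. A minimal sketch, assuming Python 3 (the variable names mirror the REPL session above):

```python
a = r'raw s\tring'      # raw literal: backslash + 't' stay as two characters
b = 'raw s\\tring'      # normal literal: \\ is one escaped backslash
t = 'not raw s\tring'   # normal literal: \t is a single tab character

print(a == b)                  # True
print(len(r'\t'), len('\t'))   # 2 1
print('\t' == chr(9))          # True: tab is code point 9
print('\t' in t, '\\' in t)    # True False
```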
So in the PDF path+name:
>>> item = 'xyz'
>>> PDF = r'C:\Users\user\Desktop\File_%s.pdf' % item
>>> PDF # the representation of the string, also in error messages
'C:\\Users\\user\\Desktop\\File_xyz.pdf'
>>> print(PDF) # "as used"
C:\Users\user\Desktop\File_xyz.pdf
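The same effect explains the error message: an OSError formats the file name with its repr, so every backslash shows up doubled even though the string holds only one. A sketch assuming Python 3 and that no file by that name exists on the machine running it:

```python
item = 'xyz'
PDF = r'C:\Users\user\Desktop\File_%s.pdf' % item

print(PDF.count('\\'))        # 4: the real backslashes in the string
print(repr(PDF).count('\\'))  # 8: repr escapes each one

message = None
try:
    open(PDF, 'rb')  # assumes this file does not exist here
except OSError as exc:
    message = str(exc)  # e.g. [Errno 2] No such file or directory: 'C:\\Users\\...'
print(message)
```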
More info about escape sequences in the table here. Also see __str__ vs __repr__.
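The double backslashes come from the r (raw string) prefix:
r'C:\Users\user\Desktop\File_%s.pdf'
The r keeps every \ as a literal backslash, which the repr then shows doubled. It is used because a plain \ might otherwise escape the character after it.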
>>> strs = "c:\desktop\notebook"
>>> print strs  # the \n in \notebook became a newline when the literal was parsed
c:\desktop
otebook
>>> strs = r"c:\desktop\notebook"  # the r prefix keeps each \ as a literal backslash
>>> print strs
c:\desktop\notebook
>>> print repr(strs)  # actual content of strs, with each backslash shown escaped
'c:\\desktop\\notebook'
To save yourself a headache, you can also use the other kind of slash, the forward slash (/); Windows file APIs generally accept it as a path separator.
You're using now:
PDF = r'C:\Users\user\Desktop\File_%s.pdf' %item
Try using:
PDF = 'C:/Users/user/Desktop/File_%s.pdf' %item
A forward slash is not treated as an escape character, so no r prefix is needed.
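To see that a forward-slash path still means the same Windows location, the standard library's pathlib.PureWindowsPath can be used to normalize it. This is only an illustrative sketch (PureWindowsPath does no file system access, so it runs on any OS); the 'xyz' item is a made-up value:

```python
from pathlib import PureWindowsPath

item = 'xyz'
PDF = 'C:/Users/user/Desktop/File_%s.pdf' % item  # no raw prefix needed

print('\\' in PDF)                 # False: no backslashes, nothing to escape
print(str(PureWindowsPath(PDF)))   # C:\Users\user\Desktop\File_xyz.pdf
```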
It doesn't. A double backslash is just the computer's way of saying backslash. Yes, I know this sounds weird, but think of it this way: in order to represent special characters, backslash was chosen as an escape character (e.g. \n means newline, not the backslash character followed by the n character). But what happens if you actually want to print (or use) a backslash, possibly followed by more characters, without the computer treating it as an escape character? In that case we escape the backslash itself: a double backslash tells the computer it's a single literal backslash.
It's done automatically in your case because of the r you added before the string.
I dare say: "I found the bug..."
Replace
    PDF = r'C:\Users\user\Desktop\File_%s.pdf' %item
    doIt(PDF)
with
for item in my_dictionary:
    PDF = r'C:\Users\user\Desktop\File_%s.pdf' % my_dictionary[item]
    doIt(PDF)
In fact, you were really looking for File_pencil.pdf (not File_055123.pdf).
You were iterating over the dictionary's keys, not its values.
The double backslashes this question is about may just be a side effect.
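The keys-versus-values distinction can be sketched as follows (Python 3.7+ assumed, so dicts keep insertion order). Which file name is actually correct depends on how the files on disk are named, which the question doesn't say:

```python
my_dictionary = {"058498": "table", "064165": "pen", "055123": "pencil"}

# Iterating a dict directly yields its keys
names_from_keys = [r'C:\Users\user\Desktop\File_%s.pdf' % key
                   for key in my_dictionary]

# .items() yields (key, value) pairs, so the value can be used directly
names_from_values = [r'C:\Users\user\Desktop\File_%s.pdf' % value
                     for key, value in my_dictionary.items()]

print(names_from_keys[2])    # ends with File_055123.pdf
print(names_from_values[2])  # ends with File_pencil.pdf
```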

remove special escape python

I have the string
a = 'ddd\ttt\nnn'
I want to remove the '\' from the string, so that it becomes
a = 'dddtttnnn'
How can I do that in Python, since '\t' and '\n' have special meanings?
Assuming you want to turn the tab and newline characters back into their \t and \n escapes and then drop the backslashes (removing the special meaning of \ in the string in general), you can do:
>>> a = 'ddd\ttt\nnn'
>>> print a
ddd tt
nn
>>> repr(a)[1:-1].replace('\\','')
'dddtttnnn'
>>> print repr(a)[1:-1].replace('\\','')
dddtttnnn
If it is a raw string (i.e., the \ is not interpolated to a single character), you do not need the repr:
>>> a = r'ddd\ttt\nnn'
>>> a.replace('\\','')
'dddtttnnn'
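The same repr trick works in Python 3, where print is a function. The sketch below also shows an alternative using the standard 'unicode_escape' codec, which produces the escaped text as bytes. Note that both approaches also strip any backslash that was genuinely part of the string, and non-ASCII characters may be affected differently by each:

```python
a = 'ddd\ttt\nnn'

# repr() turns the tab and newline back into the two-character escapes \t and \n;
# [1:-1] drops the surrounding quotes, then every backslash is removed
cleaned = repr(a)[1:-1].replace('\\', '')
print(cleaned)  # dddtttnnn

# Alternative: the 'unicode_escape' codec yields the same escapes as bytes
alt = a.encode('unicode_escape').decode('ascii').replace('\\', '')
print(alt)  # dddtttnnn
```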

How do I use both single and double quotes in a python string

I am going a little crazy trying out all combinations. I need a string variable whose value is set to: r'" a very long string \r"'
This long string is given across multiple lines. My code looks like this:
str = r'" a very
long
string \r"'
This introduces "\n" into the str variable. I tried the """ ... """ syntax too, but I get a syntax error. Can someone help me, please? I saw the other questions on Stack Overflow, but they don't seem to match this requirement.
You can use multiple string literals; as long as they are on the same logical line they'll be concatenated to one long string. You can extend the logical line with parentheses:
yourstr = (
'" a very '
'long '
r'string \r"')
Note that I mixed string literal types here. The first two parts are normal string literals; the last part is a raw string literal, so you don't have to double the \ in \r. If you really wanted a CR (carriage return) character, omit the r prefix.
Demo:
>>> yourstr = (
... '" a very '
... 'long '
... r'string \r"')
>>> yourstr
'" a very long string \\r"'
>>> print yourstr
" a very long string \r"
The Python compiler concatenates adjacent string literals. The trick is to tell it that it should be considered a single line of code.
S = ('" a very '
'long '
r'string \r"')
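A quick check that the concatenated value matches the string asked for, as a sketch in Python 3 syntax (note the trailing space in each of the first two pieces, since adjacent literals are joined with nothing in between):

```python
yourstr = ('" a very '
           'long '
           r'string \r"')  # raw literal: \r stays as backslash + 'r'

print(yourstr)                                 # " a very long string \r"
print(yourstr == r'" a very long string \r"')  # True
```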

lexical analysis python

Recently I made the following observation:
>>> x= "\'"
>>> x
"'"
>>> y="'"
>>> y
"'"
>>> print x
'
>>> print y
'
Can anyone please explain why this is so? I am using Python 2.7.x, and I know escape sequences well.
I want to do the following:
I have a string with single quotes in it that I have to enter into a database, so I need to replace every single quote (') with a backslash followed by a single quote (\'). How can I achieve this?
Inside a pair of "", you don't need to escape the ' character. You can, of course, but as you've seen it's unnecessary and has no effect whatsoever.
It'd be necessary to escape if you were to write a ' inside a pair of '' or a " inside a pair of "":
x = '\''
y = "\""
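Comparing the values shows that the escape is simply redundant where it isn't needed. A minimal sketch in Python 3 syntax:

```python
x = "\'"  # escaped single quote inside double quotes: allowed but unnecessary
y = "'"
print(x == y, len(x))      # True 1

x2 = '\''  # inside single quotes the escape is required
y2 = "\""  # inside double quotes a " must be escaped
print(x2 == y, y2 == '"')  # True True
```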
EDIT:
Regarding the last part in the question, added after the edit:
I have a string with single quotes in it that I have to enter into a database, so I need to replace every single quote (') with a backslash followed by a single quote (\'). How can I achieve this?
Any of the following will work; notice how the raw strings avoid the need to escape special characters:
v = "\\'"
w = '\\\''
x = r'\''
y = r"\'"
print v, w, x, y
> \' \' \' \'
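Any of those literals can be passed to str.replace to do the substitution asked about. A sketch in Python 3 syntax; the sample string is made up for illustration:

```python
s = "it's Bob's"  # hypothetical input

# Replace every ' with a backslash followed by '
escaped = s.replace("'", r"\'")
print(escaped)  # it\'s Bob\'s
```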
