regex windows path incomplete escape '\U' [duplicate] - python

Is there a way to declare a string variable in Python such that everything inside of it is automatically escaped, or has its literal character value?
I'm not asking how to escape the quotes with backslashes, that's obvious. What I'm asking for is a general-purpose way to make everything in a string literal so that I don't have to manually go through and escape everything in very large strings.

Raw string literals:
>>> r'abc\dev\t'
'abc\\dev\\t'
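The title's error comes from exactly this situation: a Windows path written as a normal literal or used directly as a regex. The sketch below uses a hypothetical path `C:\Users\name\file.txt`. It shows that `\U` is rejected in a normal literal, that the raw literal keeps every backslash, and that the raw path still needs `re.escape()` before it can be used as a regular expression.

```python
import re

# Hypothetical Windows path; in a normal literal "\U" would start a \UXXXXXXXX escape.
path = r"C:\Users\name\file.txt"

# Compiling the same text as a *normal* literal fails: "\U" needs 8 hex digits.
try:
    compile(r'"C:\Users\name"', "<example>", "eval")
    normal_literal_ok = True
except SyntaxError:
    normal_literal_ok = False

# Used directly as a regex, "\U" is also an incomplete escape for the re module.
try:
    re.compile(path)
    unescaped_regex_ok = True
except re.error:
    unescaped_regex_ok = False

# re.escape() escapes each backslash so the pattern matches the path literally.
text = r"log: opened C:\Users\name\file.txt at 10:00"
match = re.search(re.escape(path), text)

print(len(path), path.count("\\"))  # 22 3
print(normal_literal_ok, unescaped_regex_ok, match is not None)  # False False True
```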

If you're dealing with very large strings, specifically multiline strings, be aware of the triple-quote syntax:
a = r"""This is a multiline string
with more than one line
in the source code."""

There is no such thing. It looks like you want something like "here documents" in Perl and the shells, but Python doesn't have that.
Using raw strings or multiline strings only means that there are fewer things to worry about. If you use a raw string then you still have to work around a terminal "\" and with any string solution you'll have to worry about the closing ", ', ''' or """ if it is included in your data.
That is, there's no way to have the string
' ''' """ " \
properly stored in any Python string literal without internal escaping of some sort.
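To make the point concrete, here is one way to build that exact string. It is a sketch, not the only option. Implicit concatenation of differently quoted pieces keeps the escaping to a minimum, but the trailing backslash still has to be escaped.

```python
# Target: ' ''' """ " \
tricky = (
    "' ''' "  # single quotes are safe inside a double-quoted literal
    '""" " '  # double quotes are safe inside a single-quoted literal
    "\\"      # the trailing backslash must be escaped (r"\" is a syntax error)
)

print(tricky)       # ' ''' """ " \
print(len(tricky))  # 13
```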

You will find Python's string literal documentation here:
http://docs.python.org/tutorial/introduction.html#strings
and here:
http://docs.python.org/reference/lexical_analysis.html#literals
The simplest example would be using the 'r' prefix:
ss = r'Hello\nWorld'
print(ss)
Hello\nWorld

(Assuming you are not required to input the string directly within Python code)
To get around the issue Andrew Dalke pointed out, simply type the literal string into a text file and then use this:
input_ = '/directory_of_text_file/your_text_file.txt'
with open(input_, 'r') as input_open:
    input_string = input_open.read()
print(input_string)
This will print the literal text of whatever is in the text file, even if it is:
' ''' """ " \
Not fun or optimal, but it can be useful, especially if you have three pages of code that would otherwise need character escaping.
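A self-contained sketch of the file approach is below. A temporary file stands in for `your_text_file.txt`. Because a script can't open an editor, the text is written to the file first, and that write is the only part that needs escaping. Reading it back performs no escape processing at all.

```python
import tempfile
from pathlib import Path

# Stand-in for text you would normally type into the file with an editor.
tricky = "' ''' \"\"\" \" \\"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "your_text_file.txt"  # hypothetical file name
    path.write_text(tricky, encoding="utf-8")

    # read() / read_text() return the characters exactly as stored in the file.
    loaded = path.read_text(encoding="utf-8")

print(loaded)            # ' ''' """ " \
print(loaded == tricky)  # True
```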

Use print and repr:
>>> s = '\tgherkin\n'
>>> s
'\tgherkin\n'
>>> print(s)
gherkin
>>> repr(s)
"'\\tgherkin\\n'"
# print(repr(..)) gets literal
>>> print(repr(s))
'\tgherkin\n'
>>> repr('\tgherkin\n')
"'\\tgherkin\\n'"
>>> print('\tgherkin\n')
gherkin
>>> print(repr('\tgherkin\n'))
'\tgherkin\n'
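The relationship between the two is easier to see with lengths. The sketch below shows that `repr()` turns each control character into a two-character escape, and that `ast.literal_eval()` (from the standard library) parses that text back into the original string.

```python
import ast

s = "\tgherkin\n"

# repr() returns source-code text: the tab and newline become \t and \n, plus quotes.
r = repr(s)
print(r)              # '\tgherkin\n'
print(len(s), len(r))  # 9 13

# Parsing the repr text as a literal recovers the original string.
print(ast.literal_eval(r) == s)  # True
```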

Related

How can I stop Python from recognizing string literals such as "\n" or "\b"?

I am using an API to convert LaTeX to PNG format. The strings I am converting are written in LaTeX (.tex), so they contain sequences such as '\n'.
One example of a string that I have is
query = "$\displaystyle \binom n r = \dfrac{n!}{r!(n-r)!}$"
However, Python recognizes the \b in \binom as an escape sequence (a backspace character), even though all I want it to do is just recognize the individual characters.
If at all possible, I would like to not have to modify the string itself, as the string too is taken from an API. So is there any way to ignore these string literals such as '\b' or '\n'?
Use r"$\displaystyle \binom n r = \dfrac{n!}{r!(n-r)!}$". This is called a raw string. You can read more about it here
Generally, you can use raw strings in the following format:
Normal string:
'Hi\nHow are you?'
output:
Hi
How are you?
Raw string:
r'Hi\nHow are you?'
output:
Hi\nHow are you?
If the string comes directly from an API then it should already be in a raw format (or the rawest that will be accessible to you), such as r"$\displaystyle \binom n r = \dfrac{n!}{r!(n-r)!}$". Therefore, Python won't be assuming escaped characters and there shouldn't be an issue.
To answer your other question about raw strings - to print a string as a raw string in Python try the repr function, which returns a printable representational string of the given object.
query = "$\displaystyle \binom n r = \dfrac{n!}{r!(n-r)!}$"
print(repr(query))
Here is the output:
'$\\displaystyle \x08inom n r = \\dfrac{n!}{r!(n-r)!}$'
Note that in the actual data of query above, \b is stored as a single backspace character (\x08), not as two separate characters. Why isn't \d converted too, you may ask? Because \d is not a valid escape sequence, so Python leaves it alone and treats the \ as an ordinary character. (Ahh... silently ignoring parsing errors, isn't this why we love Python?)
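That pass-through is no longer entirely silent. Since Python 3.6, an unrecognized escape such as `\d` triggers a `DeprecationWarning`, and since Python 3.12 a `SyntaxWarning`. The sketch below assumes Python 3.6 or later. It compiles both forms and records which warning categories the compiler emits. The helper name `escape_warnings` is made up for this example.

```python
import warnings

def escape_warnings(source: str) -> list:
    """Compile a snippet and return the warning categories emitted (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        compile(source, "<example>", "eval")
    return [w.category for w in caught]

# Normal literal: "\d" is not a valid escape, so the compiler warns.
normal = escape_warnings(r'"\displaystyle"')
# Raw literal: the backslash is intentional, so no warning is expected.
raw = escape_warnings(r'r"\displaystyle"')

print(normal)
print(raw)  # []
```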
Then what about this example?
query = r"$\displaystyle \binom n r = \dfrac{n!}{r!(n-r)!}$"
print(repr(query))
Looks good, but wait, Python prints '$\\displaystyle \\binom n r = \\dfrac{n!}{r!(n-r)!}$'.
Why the \\? Well, the repr function returns a printable representational string of the given object, so to avoid any confusion the \ character is properly escaped with \, creating \\.
All of that coming full circle and back to your question: if the value of a string comes directly from an API call, then the string data should already be decoded from its binary encoding, and things like escape sequences shouldn't be an issue (because they aren't in the raw data). But in the example you provided you declared a string in the query = "st\ring" format, which unfortunately isn't equivalent to retrieving a string from an API, and the obvious solution would be to use the query = r"st\ring" format.
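The three cases can be compared side by side. In this sketch, `json.loads` stands in for a hypothetical API response, since data parsed at runtime never goes through Python's literal escape processing. To avoid invalid-escape warnings, the non-raw version writes `\\d` explicitly, which gives the same value as the question's literal.

```python
import json

# Same value as the question's non-raw literal: \b becomes a backspace (U+0008).
cooked = "$\\displaystyle \binom n r = \\dfrac{n!}{r!(n-r)!}$"
# Raw literal: every backslash is kept.
raw = r"$\displaystyle \binom n r = \dfrac{n!}{r!(n-r)!}$"
# Hypothetical API payload; JSON uses \\ for one backslash.
payload = r'{"query": "$\\displaystyle \\binom n r = \\dfrac{n!}{r!(n-r)!}$"}'
from_api = json.loads(payload)["query"]

print("\x08" in cooked, "\x08" in raw)          # True False
print(cooked.count("\\"), raw.count("\\"))      # 2 3
print(from_api == raw)                          # True
```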

How can I print this string with backslash

Update for clarification
I have to replicate the functionality from a server. One of the responses of this old server is the one seen here http://test.muchticket.com/api/?token=carlos&method=ventas&ESP=11, except that the double slashes should be single ones.
End of update
Update No.2 for clarification
This variable then goes to a dictionary which is dumped to an HttpResponse with this
return HttpResponse(json.dumps(response_data,sort_keys=True), content_type="application/json")
I hate my job.
End of update
I need to store 'http:\/\/shop.muchticket.com\/' in a variable. And then save it in a dictionary. I have tried several different methods, but none of them seems to work, here are some examples of what I've tried:
url = 'http:\/\/shop.muchticket.com\/'
print url
>> http:\\/\\/shop.muchticket.com\\/
With raw
url = r'http:\/\/shop.muchticket.com\/'
print url
>> http:\\/\\/shop.muchticket.com\\/
With the escape character
url = 'http:\\/\\/shop.muchticket.com\\/'
print url
>> http:\\/\\/shop.muchticket.com\\/
Raw and escape character
url = r'http:\\/\\/shop.muchticket.com\\/'
print url
>> http:\\\\/\\\\/shop.muchticket.com\\\\/
Escape character and decode
url = 'http:\\/\\/shop.muchticket.com\\/'
print url.decode('string_escape')
>> http:\\/\\/shop.muchticket.com\\/
Decode only
url = 'http:\/\/shop.muchticket.com\/'
print url.decode('string_escape')
>> http:\\/\\/shop.muchticket.com\\/
The best way is not to use any escape sequences
>>> s = 'http://shop.muchticket.com/'
>>> s
'http://shop.muchticket.com/'
>>> print(s)
http://shop.muchticket.com/
Unlike "other" languages, you do not need to escape the forward slash (/) in Python!
If you need a backslash before each forward slash, then
>>> s = 'http:\/\/shop.muchticket.com\/'
>>> print(s)
http:\/\/shop.muchticket.com\/
Note: when you just type s in the interpreter, it shows the repr() output, so each backslash appears doubled:
>>> s
'http:\\/\\/shop.muchticket.com\\/' # repr() doubles each backslash; the string itself holds single ones
>>> print(repr(s))
'http:\\/\\/shop.muchticket.com\\/'
Therefore, a single \ is enough to store it in a variable.
As J F S says:
To avoid ambiguity, either use raw string literals or escape the backslashes if you want a literal backslash in the string.
Thus your string would be
s = 'http:\\/\\/shop.muchticket.com\\/' # Escape the \ literal
s = r'http:\/\/shop.muchticket.com\/' # Make it a raw string
If you need two characters in the string, the backslash (REVERSE SOLIDUS) and the forward slash (SOLIDUS), then all four Python string literals below produce the same string object:
>>> '\/' == r'\/' == '\\/' == '\x5c\x2f'
True
>>> len(r'\/') == 2
True
The preferable way to write it is: r'\/' or '\\/'.
The reason is that the backslash is a special character in a string literal (something that you write in Python source code (usually by hand)) if it is followed by certain characters e.g., '\n' is a single character (newline) and '\\' is also a single character (the backslash). But '\/' is not an escape sequence and therefore it is two characters. To avoid ambiguity, use raw string literals r'\/' where the backslash has no special meaning.
The REPL calls repr on a string to print it:
>>> r'\/'
'\\/'
>>> print r'\/'
\/
>>> print repr(r'\/')
'\\/'
repr() shows you the Python string literal (how you would write it in Python source code). '\\/' is a two-character string, not three. Don't confuse a string literal that is used to create a string with the string object itself.
And to test the understanding:
>>> repr(r'\/')
"'\\\\/'"
It shows the representation of the representation of the string.
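Counting characters makes the layers explicit. In this sketch, each `repr()` call produces the source text for the previous string. Because `once` contains single quotes and no double quotes, the second `repr()` switches to double quotes.

```python
s = r"\/"          # two characters: backslash, forward slash
once = repr(s)     # the literal you would type for s: '\\/'
twice = repr(once) # the literal you would type for *that* text

print(len(s), len(once), len(twice))  # 2 5 9
print([hex(ord(c)) for c in s])       # ['0x5c', '0x2f']
```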
For Python 2.7.9, ran:
url = "http:\/\/shop.muchticket.com\/"
print url
With the result of:
>> http:\/\/shop.muchticket.com\/
What is the version of Python you are using? From Bhargav Rao's answer, it seems that it should work in Python 3.X as well, so maybe it's a case of some weird imports?
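Update No. 2 says the value ends up in json.dumps, and that explains part of the confusion. Python's json module does not escape forward slashes, but it does escape every backslash as `\\`. JSON itself treats `\/` as an escaped `/`. The sketch below uses Python 3 syntax. It suggests that the old server's `http:\/\/...` text decodes to the plain URL. Whether the old clients actually parse the JSON, rather than compare the raw text, is an assumption you would need to check.

```python
import json

plain = "http://shop.muchticket.com/"
with_backslashes = "http:\\/\\/shop.muchticket.com\\/"  # backslash + slash pairs

# json.dumps leaves "/" alone but escapes each backslash as "\\".
print(json.dumps({"url": plain}))             # {"url": "http://shop.muchticket.com/"}
print(json.dumps({"url": with_backslashes}))  # {"url": "http:\\/\\/shop.muchticket.com\\/"}

# In JSON, "\/" is an escape for "/", so the old server's output decodes to the plain URL.
old_server_text = r'{"url": "http:\/\/shop.muchticket.com\/"}'
print(json.loads(old_server_text)["url"] == plain)  # True
```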

latexcodec stripping slashes but not translating characters (Python)

I'm trying to process some Bibtex entries converted to an XML tree via Pybtex. I'd like to go ahead and process all the special characters from their LaTeX specials to unicode characters, via latexcodec. Via question Does pybtex support accent/special characters in .bib file? and the documentation I have checked the syntax, however, I am not getting the correct output.
>>> import latexcodec
>>> name = 'Br\"{u}derle'
>>> name.decode('latex')
u'Br"{u}derle'
I have tested this across different strings and special characters, and it always just strips off the first backslash without translating the character. Should I be using latexcodec differently to get the correct output?
Your backslash is not included in the string at all because it is treated as a string escape, so the codec never sees it:
>>> print 'Br\"{u}derle'
Br"{u}derle
Use a raw string:
name = r'Br\"{u}derle'
Alternatively, try reading actual data from a file, in which case the raw/non-raw distinction will not matter. (The distinction only applies to literal strings in Python source code.)
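The difference can be checked without latexcodec at all. latexcodec is a third-party package and is not called in this sketch. The non-raw literal has already lost its backslash before any codec could see it, while the raw literal keeps it.

```python
cooked = 'Br\"{u}derle'  # \" is a valid escape for ", so the backslash disappears
raw = r'Br\"{u}derle'    # raw literal keeps the backslash for the codec to translate

print(cooked, len(cooked))  # Br"{u}derle 11
print(raw, len(raw))        # Br\"{u}derle 12
```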

In Python, is it possible to escape newline characters when printing a string?

I want the newline \n to show up explicitly when printing a string retrieved from elsewhere. So if the string is 'abc\ndef' I don't want this to happen:
>>> print(line)
abc
def
but instead this:
>>> print(line)
abc\ndef
Is there a way to modify print, or modify the argument, or maybe another function entirely, to accomplish this?
Just encode it with the 'string_escape' codec (Python 2 only):
>>> print "foo\nbar".encode('string_escape')
foo\nbar
In Python 3, 'string_escape' has become 'unicode_escape'. We also need to be a little more careful about bytes vs. str, so the encoding is followed by a decoding:
>>> print("foo\nbar".encode("unicode_escape").decode("utf-8"))
foo\nbar
See the unicode_escape codec reference.
Another way that you can stop python using escape characters is to use a raw string like this:
>>> print(r"abc\ndef")
abc\ndef
or
>>> string = "abc\ndef"
>>> print(repr(string))
'abc\ndef'
The only problem with using repr() is that it wraps your string in quotes (single quotes, or double quotes if the string contains a single quote but no double quotes), which may or may not be what you want.
Simplest method:
str_object.replace("\n", "\\n")
The other methods are better if you want to show all escape characters, but if all you care about is newlines, just use a direct replace.
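The three Python 3 approaches differ mainly in what else they escape. In the sketch below, the second sample string uses a non-ASCII character to show the trade-off. `unicode_escape` output is pure ASCII, so decoding it with "ascii" is enough.

```python
line = "abc\ndef"

via_codec = line.encode("unicode_escape").decode("ascii")  # escapes all specials
via_repr = repr(line)                                       # same style, plus quotes
via_replace = line.replace("\n", "\\n")                     # newlines only

print(via_codec)    # abc\ndef
print(via_repr)     # 'abc\ndef'
print(via_replace)  # abc\ndef

# With non-ASCII text, unicode_escape also escapes the accented letter; repr() keeps it.
sample = "café\n"
sample_codec = sample.encode("unicode_escape").decode("ascii")
sample_repr = repr(sample)
print(sample_codec)  # caf\xe9\n
print(sample_repr)   # 'café\n'
```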

Python: Writing raw strings to a file

I want to generate C code with a Python script, and not have to escape things. For example, I have tried:
myFile.write(someString + r'\r\n\')
hoping that a r prefix would make things work. However, I'm still getting the error:
myFile.write(someString + ur'\r\n\')
^
SyntaxError: EOL while scanning string literal
How can I write raw strings to a file in Python?
Python raw strings can't end with an odd number of backslashes (for example, a single one).
However, there are workarounds.
You can just add a whitespace at the end of the string:
>>> with open("c:\\tmp\\test.txt", "w") as myFile:
... myFile.write(someString + r'\r\n\ ')
You probably won't mind that, so that may be a solution.
Assume someString is 'Hallo'.
This will write Hallo\r\n\_ to the file, where _ is a space.
If you don't like the extra space, you can remove it like this:
>>> with open("c:\\tmp\\test.txt", "w") as myFile:
... myFile.write(someString + r'\r\n\ '[:-1])
This will write Hallo\r\n\ to the file, without the extra whitespace, and without escaping the backslash.
You need to escape the last \ so it doesn't escape the end of string, but if you put it as part of a raw string, it won't get you exactly what you want:
>>> r'\r\n\\'
'\\r\\n\\\\'
Python's string literal concatenation, however, lets you mix raw and normal strings:
>>> r'\r\n' '\\'
'\\r\\n\\'
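The workarounds from these answers can be checked against each other. The sketch below uses the placeholder value `Hallo` from above and writes to a temporary file instead of `c:\tmp\test.txt`. All three expressions produce the same ten characters, `Hallo\r\n\`, ending in one literal backslash.

```python
import tempfile
from pathlib import Path

some_string = "Hallo"  # placeholder value from the answer above

a = some_string + r'\r\n' '\\'    # raw + normal literal concatenation
b = some_string + r'\r\n\ '[:-1]  # raw literal with a throwaway trailing space
c = some_string + '\\r\\n\\'      # normal literal, every backslash escaped

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "test.txt"
    out.write_text(a, encoding="utf-8")
    written = out.read_text(encoding="utf-8")

print(a == b == c)   # True
print(written)       # Hallo\r\n\
print(len(written))  # 10
```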
You could insert the raw string into the string via the format method. format() copies the inserted string's characters as-is, neither adding nor removing escaping, so the inserted value must already contain exactly the backslashes you want.
Example:
mystring = "some string content {0}"
# insert \r\n\ : raw literal for \r\n, escaped normal literal for the final backslash
mystring = mystring.format(r"\r\n" "\\")
with open("test.txt", "w") as myfile:
    myfile.write(mystring)
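myFile.write(someString + '\\r\\n\\')
Just escape your strings ^^ (in a normal literal each \\ is one backslash, whereas the raw literal r'\r\n\\' would end with two).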
There is no way to have a string literal with arbitrary contents without escaping. You will always run into problems, because there is no way to include, for example, the end-of-literal character (in this case ') without escaping it: Python would be unable to tell whether it ends the string literal or is part of it.
And that's the entire reason why we have escaping in the first place. So the answer to your question is: You can't. Not in Python nor any other language.
