Python: How do I prevent escape sequences in a string variable?

>>> m = "\frac{7x+5}{1+y^2}"
>>> print(m)
rac{7x+5}{1+y^2}
>>> print(r""+m)
rac{7x+5}{1+y^2}
>>> print(r"{}".format(m))
rac{7x+5}{1+y^2}
>>> print(repr(m))
'\x0crac{7x+5}{1+y^2}'
I want the result: \frac{7x+5}{1+y^2}
It must work with a string variable.

You need the string literal that contains the backslash to be a raw string.
m = r"\frac{7x+5}{1+y^2}"
Raw strings are just another way of writing strings; they aren't a different type. For example, r"" is exactly the same as "" because there are no characters to escape. It doesn't produce some kind of raw empty string, and adding it to another string changes nothing.
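A minimal sketch of the point above; the variable names are illustrative only. It compares a raw literal, an escaped literal, and the original literal in which "\f" is read as a form feed.

```python
# The r prefix only changes how the literal is read; the result is an ordinary str.
raw = r"\frac{7x+5}{1+y^2}"
escaped = "\\frac{7x+5}{1+y^2}"
broken = "\frac{7x+5}{1+y^2}"   # "\f" is read as a form feed (U+000C)

print(type(raw) is str)         # True
print(raw == escaped)           # True
print(r"" + broken == broken)   # True: concatenating r"" changes nothing
print(broken[0] == "\x0c")      # True
```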

Another option is to escape the backslash with a second backslash, so that it is treated as a literal backslash character:
m = "\\frac{7x+5}{1+y^2}"
print(m)
print(r""+m)
print(r"{}".format(m))
print(repr(m))

A good place to start is the Python documentation on string literals. You can either escape the backslash with another "\", as here:
>>> m = "\\frac{7x+5}{1+y^2}"
>>> print(m)
\frac{7x+5}{1+y^2}
or use a raw string literal, which takes the string as is:
>>> m = r"\frac{7x+5}{1+y^2}"
>>> print(m)
\frac{7x+5}{1+y^2}
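If the value is already stored in a variable and the form feed is already in it, the original spelling cannot be recovered in general. For the handful of single-letter escapes, though, the control characters can be mapped back. This is only a sketch: restore_backslashes and CONTROL_TO_ESCAPE are hypothetical names, and the helper also rewrites any newline or tab that was meant to be a real one.

```python
# Hypothetical helper: map control characters back to their backslash spellings.
CONTROL_TO_ESCAPE = str.maketrans({
    "\a": r"\a", "\b": r"\b", "\f": r"\f", "\n": r"\n",
    "\r": r"\r", "\t": r"\t", "\v": r"\v",
})

def restore_backslashes(text):
    # Replaces each control character with a backslash plus its letter.
    return text.translate(CONTROL_TO_ESCAPE)

m = "\frac{7x+5}{1+y^2}"        # starts with a form feed, not a backslash
print(restore_backslashes(m))   # \frac{7x+5}{1+y^2}
```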

Related

How to remove a set of characters when a string contains "\" and special characters in Python

a = "\Virtual Disks\DG2_ASM04\ACTIVE"
From the above string I would like to get the part "DG2_ASM04" alone. I cannot split or strip as it has the special characters "\", "\D" and "\A" in it.
I have tried the following but can't get the desired output.
a.lstrip("\Virtual Disks\\").rstrip("\ACTIVE")
the output I have got is: 'G2_ASM04' instead of "DG2_ASM04"
Simply split on an escaped backslash ("\\") and index the result:
>>> a.split("\\")[-2]
'DG2_ASM04'
In your case the D is removed as well because lstrip treats its argument as a set of characters to strip, not as a prefix, and D occurs in "Virtual Disks". If you tweak your string you can see what is happening:
>>> a = "\Virtual Disks\XG2_ASM04\ACTIVE"
>>> a.lstrip('\\Virtual Disks\\').rstrip("\\ACTIVE")
'XG2_ASM04'
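A short sketch of the behaviour described above, with the backslashes written as "\\" so that none of them depends on how Python treats unrecognised escapes. The variable name parts is illustrative.

```python
# Each "\\" is one literal backslash character.
a = "\\Virtual Disks\\DG2_ASM04\\ACTIVE"

# lstrip removes any leading characters found in the given set, so the
# backslash and the "D" after "Virtual Disks" are stripped too.
print(a.lstrip("\\Virtual Disks"))   # G2_ASM04\ACTIVE

# Splitting on the backslash keeps each path component intact.
parts = a.split("\\")
print(parts)      # ['', 'Virtual Disks', 'DG2_ASM04', 'ACTIVE']
print(parts[2])   # DG2_ASM04
```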

Removing escape characters from a string

How can I remove the escape characters in Python 2.7 and Python 3?
Example:
a = "\u00E7a\u00E7a\u00E7a=http\://\u00E1\u00E9\u00ED\u00F3\u00FA\u00E7/()\=)(){[]}"
decoded = a.decode('unicode_escape')
print decoded
Result:
çaçaça=http\://áéíóúç/()\=)(){[]}
Expected result
çaçaça=http://áéíóúç/()=)(){[]}
EDIT: To avoid unnecessary downvotes: using .replace isn't our primary focus, since this problem comes from a legacy solution owned by other teams (a db table of reference data that contains Portuguese characters and regular expressions).
You're looking for a simple str.replace
>>> print decoded.replace('\\', '')
çaçaça=http://áéíóúç/()=)(){[]}
The remaining \ is actually a literal backslash, not an escape sequence.
You can simply remove the unnecessary escape characters from your string, i.e.
>>> a = "\u00E7a\u00E7a\u00E7a=http://\u00E1\u00E9\u00ED\u00F3\u00FA\u00E7/()=)(){[]}"
>>> decoded = a.decode('unicode_escape')
>>> print decoded
çaçaça=http://áéíóúç/()=)(){[]}
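If removing every backslash is too broad, for example because the data also holds regular expressions, a narrower option is to drop the backslash only before specific characters. The sketch below is for Python 3 and assumes that ':' and '=' are the only characters escaped this way; that set is a guess based on the example, not something the question confirms.

```python
import re

# In Python 3 the \uXXXX escapes in a normal literal are decoded by the parser.
# The two "\\" are the literal backslashes left over in the data.
decoded = "\u00E7a\u00E7a\u00E7a=http\\://\u00E1\u00E9\u00ED\u00F3\u00FA\u00E7/()\\=)(){[]}"

# Drop a backslash only when it directly precedes ':' or '=' (assumed set).
cleaned = re.sub(r"\\([:=])", r"\1", decoded)
print(cleaned)   # çaçaça=http://áéíóúç/()=)(){[]}
```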

Dealing with doubly escaped unicode string

I have a database of badly formatted strings. The data looks like this:
"street"=>"\"\\u4e2d\\u534e\\u8def\""
when it should be like this:
"street"=>"中华路"
The problem is that when these doubly escaped strings come from the database, they are not decoded to the Chinese characters they should be. Suppose I have the variable street="\"\\u4e2d\\u534e\\u8def\""; if I print it with print(street), the result is a string of code points: "\u4e2d\u534e\u8def"
What can I do at this point to convert "\u4e2d\u534e\u8def" to actual Unicode characters?
First encode this string as UTF-8 and then decode it with unicode-escape, which will handle the \\ for you:
>>> line = "\"\\u4e2d\\u534e\\u8def\""
>>> line.encode('utf8').decode('unicode-escape')
'"中华路"'
You can then strip the " if necessary
You could remove the quotation marks with strip and split at every '\\u'. This would give you the characters as strings representing hex numbers. Then for each string you could convert it to int and back to string with chr:
>>> street = "\"\\u4e2d\\u534e\\u8def\""
>>> ''.join(chr(int(x, 16)) for x in street.strip('"').split('\\u') if x)
'中华路'
Based on what you wrote, the database appears to be storing an eval-able ASCII representation of a string with non-ASCII characters.
>>> eval("\"\\u4e2d\\u534e\\u8def\"")
'中华路'
Python has a built-in function for this.
>>> ascii('中华路')
"'\\u4e2d\\u534e\\u8def'"
The only difference is the use of \" instead of ' for the needed internal quote.
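Since the stored value is a double-quoted literal with \uXXXX escapes, it may be possible to parse it without running eval on database content. A sketch using two standard-library parsers, assuming every stored value has the same shape as the example:

```python
import ast
import json

# The text "\u4e2d\u534e\u8def", including the surrounding double quotes.
street = "\"\\u4e2d\\u534e\\u8def\""

# Both calls parse a quoted literal and do not execute arbitrary code.
print(json.loads(street))         # 中华路
print(ast.literal_eval(street))   # 中华路
```

json.loads only accepts double-quoted strings, while ast.literal_eval accepts any Python literal, so the better fit depends on what else is in the column.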

Hexadecimal file is loading with 2 back slashes before each byte instead of one

I have a hex file in this format: \xda\xd8\xb8\x7d
When I load the file with Python, it loads with two back slashes instead of one.
with open('shellcode.txt', 'r') as file:
    shellcode = file.read().replace('\n', '')
Like this: \\xda\\xd8\\xb8\\x7d
I've tried using hex.replace("\\", "\"), but I'm getting an error
EOL while scanning string literal
What is the proper way to replace \\ with \?
Here is an example
>>> h = "\\x123"
>>> h
'\\x123'
>>> print h
\x123
The two backslashes are needed because \ is an escape character, and so it needs to be escaped. When you print h, it shows what you want.
Backslash (\) is an escape character. It is used for changing the meaning of the character(s) following it.
For example, if you want to create a string which contains a quote, you have to escape it:
s = "abc\"def"
print s # prints: abc"def
If there was no backslash, the first quote would be interpreted as the end of the string.
Now, if you really wanted that backslash in the string, you would have to escape the backslash using another backslash:
s = "abc\\def"
print s # prints: abc\def
However, if you look at the representation of the string, it will be shown with the escape characters:
print repr(s) # prints: 'abc\\def'
Therefore, this line should include escapes for each backslash:
hex.replace("\\", "\") # wrong
hex.replace("\\\\", "\\") # correct
But that is not the solution to the problem!
There is no way that file.read().replace('\n', '') introduced additional backslashes. What probably happened is that OP printed the representation of the string with backslashes (\) which ended up printing escaped backslashes (\\).
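A small check of that explanation in Python 3 syntax. The string text stands in for what file.read() would return for a file containing \xda\xd8; no file is opened here.

```python
# Stand-in for the file contents: 8 characters, two of them backslashes.
text = "\\xda\\xd8"

print(text)               # \xda\xd8
print(repr(text))         # '\\xda\\xd8'  (repr doubles each backslash for display)
print(len(text))          # 8
print(text.count("\\"))   # 2
```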
You can make a bytes object with a utf-8 encoding, and then decode as unicode-escape.
>>> x = "\\x61\\x62\\x63"
>>> y = bytes(x, "utf-8").decode("unicode-escape")
>>> print(x)
\x61\x62\x63
>>> print(y)
abc
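For shellcode-style data the goal is usually a bytes object rather than a str. One possible sketch, assuming the file holds only ASCII text of the form \xNN: decode the escapes as above, then encode with latin-1, which maps code points 0 to 255 onto the same byte values.

```python
# The four escapes from the question, as they would be read from the file.
text = "\\xda\\xd8\\xb8\\x7d"

# unicode-escape turns each \xNN into the character with that code point;
# latin-1 then turns each character back into a single byte.
raw = text.encode("ascii").decode("unicode-escape").encode("latin-1")

print(raw)         # b'\xda\xd8\xb8}'
print(list(raw))   # [218, 216, 184, 125]
```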

Using raw literal representations when working with variable strings

I am trying to use a for loop to copy files in different source folders into the same destination folder. I like to use literal representations to avoid issues with multiple backslashes in the file paths. I could not find the proper way to get a literal representation of a variable. Any tip would be appreciated. The code is below:
import os
import shutil
destination_folder = DF
for i in range(1, 3):
    new_folder = 'folder_' + str(i)
    new_path = os.path.join('C:\foo', new_folder, file_to_copy)
    source_file = r(new_path)  # WRONG
    destination = r(destination_folder)  # WRONG
    shutil.copy(source_file, destination)
r is not a function that applies to string objects, it's a modifier that applies to string literals. It changes how the literal gets interpreted as a value. But once it's done, the value is just a plain old string value. In particular:
>>> a = '\n'
>>> b = '''
... '''
>>> a == b
True
So, if a and b are the same value, how can Python possibly know that you want to turn it into r'\n'?
For that matter, imagine this:
>>> c = sys.stdin.readline()
>>> c == a
True
Or this:
>>> d = chr(10)
>>> d == a
True
You can't go back and re-interpret the string literal as a raw string in any of these other cases—in b it would be unchanged, and in c and d there was no string literal in the first place.
If you want to escape all special characters in a string value, without caring where they came from, you can do that by asking Python to escape the string. For example:
>>> e = a.encode('unicode-escape').decode('ascii')
But you definitely don't want to do that for constructing filenames to pass to the shutil.copy function.
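To make the effect of that escaping concrete, here is a short sketch of what it produces for a newline. The result is a new two-character string, not a reinterpretation of the original one.

```python
a = "\n"                                          # one character: a newline
e = a.encode("unicode-escape").decode("ascii")    # two characters: backslash, n

print(len(a), len(e))   # 1 2
print(e)                # \n
print(e == r"\n")       # True
```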
If you have a string literal in your code, and you want it to be treated as a raw string literal, just write it as a raw string literal. So:
new_path=os.path.join(r'C:\foo', new_folder, file_to_copy)
source_file= new_path
destination= destination_folder
You could instead manually escape the backslash in your literal, or use forward slashes instead of backslashes, etc. But those are all things you do to the literal before it gets evaluated by Python, not to the string after the fact.
The concept of a "literal representation" of a variable string doesn't really make sense.
If you have a variable called new_path, the value of this variable is simply a string value. The r prefix only applies to string literals.
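A sketch of the difference at the literal level. It uses ntpath, the Windows flavour of os.path, so that the join behaves the same on any platform; "data.txt" is a placeholder for file_to_copy, and no files are touched.

```python
import ntpath  # Windows implementation of os.path, importable everywhere

bad = "C:\foo"     # "\f" becomes a form feed
good = r"C:\foo"   # raw literal keeps the backslash

print(repr(bad))    # 'C:\x0coo'
print(repr(good))   # 'C:\\foo'

new_folder = "folder_" + str(1)
# "data.txt" is a placeholder file name, not taken from the question.
new_path = ntpath.join(good, new_folder, "data.txt")
print(new_path)     # C:\foo\folder_1\data.txt
```

Once new_path holds the right characters, it can be passed to shutil.copy directly; there is nothing left to "make raw".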
