Python 3.5 splitlines for multiline string containing backslashes

How can I split a multiline string into separate lines when it contains backslashes that get interpreted as unwanted escape characters?
Here is an example input I'm dealing with:
strInput = '''signalArr(0)="ASCB D\axx\bxx\fxx\nxx"
signalArr(1)="root\rxx\txx\vxx"'''
I've tried this, to turn each single backslash into a double one so that the backslash itself is escaped and the following character is treated normally:
def doubleBackslash(inputString):
    inputString.replace('\\','\\\\')
    inputString.replace('\a','\\a')
    inputString.replace('\b','\\b')
    inputString.replace('\f','\\f')
    inputString.replace('\n','\\n')
    inputString.replace('\r','\\r')
    inputString.replace('\t','\\t')
    inputString.replace('\v','\\v')
    return inputString
strInputProcessed = doubleBackslash(strInput)
I'd like to get:
lineList = strInputProcessed.splitlines()
>> ['signalArr(0)="ASCB D\axx\bxx\fxx\nxx"','signalArr(1)="root\rxx\txx\vxx"']
What I got:
>> ['signalArr(0)="ASCB D\x07xx\x08xx', 'xx', 'xx"', 'signalArr(1)="root', 'xx\txx', 'xx"']

Try storing your input as a raw string literal. Backslash sequences such as '\n' are then kept as two literal characters instead of being interpreted:
>>> var = r'''abc\n
... cba'''
>>> print(var)
abc\n
cba
>>> var.splitlines()
['abc\\n', 'cba']
(Note the r before the opening quote. It marks the literal as a raw string.)
As an extra, if you wish to escape an existing string, instead of the replace calls you used above you can use encode with the 'string-escape' codec. Note that this codec exists only in Python 2.
>>> s = 'abc\nabc\nabc'
>>> s.encode('string-escape')
'abc\\nabc\\nabc'
Similarly, if needed, you can undo the escaping of a string:
>>> s.decode('string-escape')
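For Python 3, which the question targets, a comparable round trip can be sketched with the 'unicode_escape' codec. This is only a sketch: it assumes ASCII-only text, and the helper names escape_text and unescape_text are made up for illustration.

```python
def escape_text(text):
    # Turn control characters such as a real newline into visible backslash sequences.
    return text.encode('unicode_escape').decode('ascii')


def unescape_text(text):
    # Reverse direction: interpret backslash sequences as the characters they name.
    # Assumes ASCII-only input.
    return text.encode('ascii').decode('unicode_escape')


s = 'abc\nabc\nabc'
escaped = escape_text(s)
print(escaped)                       # abc\nabc\nabc (on a single line)
print(unescape_text(escaped) == s)   # True
```

After escaping, the string no longer contains a real newline, so splitlines() returns it as one line.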
Finally, applied to your input:
>>> strInput = r'''signalArr(0)="ASCB D\axx\bxx\fxx\nxx"
... signalArr(1)="root\rxx\txx\vxx"'''
>>> strInput.splitlines()
['signalArr(0)="ASCB D\\axx\\bxx\\fxx\\nxx"', 'signalArr(1)="root\\rxx\\txx\\vxx"']
Even though extra \ characters appear when the strings are displayed (in their repr), they don't exist in memory. Iterating over the string proves this: it yields no extra \ character for the escaping.
>>> s = r'\a\b\c'
>>> for c in s:
...     print(c)
...
\
a
\
b
\
c
>>> list(s)
['\\', 'a', '\\', 'b', '\\', 'c']
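Escape processing only applies to string literals in source code. If the text actually comes from a file or another external source (an assumption, since the question only shows a literal), the backslashes arrive as ordinary characters and splitlines() already behaves as wanted. A small sketch, using io.StringIO as a stand-in for a real file:

```python
import io

# Stand-in for a real file: the raw literals put single backslashes in the data,
# exactly as they would sit on disk.
fake_file = io.StringIO(
    r'signalArr(0)="ASCB D\axx\bxx\fxx\nxx"' + '\n' +
    r'signalArr(1)="root\rxx\txx\vxx"' + '\n'
)

text = fake_file.read()
lines = text.splitlines()

print(len(lines))   # 2
for line in lines:
    print(line)     # each line is shown with single backslashes
```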

Related

Rstrip not removing correct backslashes or giving position

I have a string that looks like \uisfhb\dfjn
It will vary in length. I'm struggling to get my head around rsplit and the fact that backslash is an escape character. I only want "dfjn".
I currently have:
more = "\\\\uisfhb\dfjn"
more = more.replace(r'"\\\\', r"\\")
sharename = more.rsplit(r'\\', 2)
print(sharename)
and I'm getting back:
['', 'uisfhb\dfjn']
If you want to partition a string on a literal backslash, you need to escape the backslash with another backslash in the separator.
>>> more.split('\\')
['', '', 'uisfhb', 'dfjn']
>>> more.rsplit('\\', 1)
['\\\\uisfhb', 'dfjn']
>>> more.rpartition('\\')
('\\\\uisfhb', '\\', 'dfjn')
Once the string has been split, the last element can be accessed using the index -1:
>>> sharename = more.rsplit('\\', 1)[-1]
>>> sharename
'dfjn'
or using sequence-unpacking syntax (the * operator)
>>> *_, sharename = more.rpartition('\\')
>>> sharename
'dfjn'
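The rpartition approach above can be wrapped in a small helper. This is a sketch, and the name last_component is hypothetical:

```python
def last_component(path, sep='\\'):
    # rpartition splits at the last separator and returns (head, sep, tail);
    # when the separator is absent, tail is the whole string.
    return path.rpartition(sep)[2]


more = r'\\uisfhb\dfjn'   # raw literal: two backslashes, a name, one backslash, a name
print(last_component(more))     # dfjn
print(last_component('dfjn'))   # dfjn
```

Using rpartition rather than split means a string without any backslash is returned unchanged instead of raising an IndexError.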
I think this is an issue with raw strings. Try this:
more = "\\\\uisfhb\dfjn"
more = more.replace("\\\\", "\\")
sharename = more.split("\\")[2] # using split and not rsplit
print(sharename)
If sharename is the last node in the tree, this will get it:
>>> more = "\\\\uisfhb\dfjn"
>>> sharename = more.split('\\')[-1]
>>> sharename
'dfjn'

Hexadecimal file is loading with 2 back slashes before each byte instead of one

I have a hex file in this format: \xda\xd8\xb8\x7d
When I load the file with Python, it loads with two back slashes instead of one.
with open('shellcode.txt', 'r') as file:
shellcode = file.read().replace('\n', '')
Like this: \\xda\\xd8\\xb8\\x7d
I've tried using hex.replace("\\", "\"), but I'm getting an error
EOL while scanning string literal
What is the proper way to replace \\ with \?
Here is an example
>>> h = "\\x123"
>>> h
'\\x123'
>>> print h
\x123
>>>
The two backslashes are needed because \ is an escape character, so it has to be escaped itself. When you print h, it shows what you want.
Backslash (\) is an escape character. It changes the meaning of the character(s) following it.
For example, if you want to create a string which contains a quote, you have to escape it:
s = "abc\"def"
print s # prints: abc"def
If there was no backslash, the first quote would be interpreted as the end of the string.
Now, if you really wanted that backslash in the string, you would have to escape the backslash using another backslash:
s = "abc\\def"
print s # prints: abc\def
However, if you look at the representation of the string, it will be shown with the escape characters:
print repr(s) # prints: 'abc\\def'
Therefore, this line should include escapes for each backslash:
hex.replace("\\", "\") # wrong
hex.replace("\\\\", "\\") # correct
But that is not the solution to the problem!
There is no way that file.read().replace('\n', '') introduced additional backslashes. What probably happened is that OP printed the representation of the string with backslashes (\) which ended up printing escaped backslashes (\\).
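A quick way to check this is to compare the length of the string with its printed form and its repr. In the sketch below the raw literal stands in for the result of file.read(), and the variable name is reused from the question:

```python
shellcode = r'\xda\xd8\xb8\x7d'   # stand-in for file.read(): 16 characters

print(len(shellcode))          # 16
print(shellcode.count('\\'))   # 4
print(shellcode)               # \xda\xd8\xb8\x7d
print(repr(shellcode))         # '\\xda\\xd8\\xb8\\x7d'
```

The string holds four backslashes. Only the repr, which is what the interactive prompt echoes, shows each of them doubled.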
You can make a bytes object with a utf-8 encoding, and then decode as unicode-escape.
>>> x = "\\x61\\x62\\x63"
>>> y = bytes(x, "utf-8").decode("unicode-escape")
>>> print(x)
\x61\x62\x63
>>> print(y)
abc
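If the goal is the raw bytes rather than a str (an assumption, since the question does not say how the data is used afterwards), the decoded text can be encoded with latin-1, which maps code points 0 to 255 one-to-one onto byte values. A sketch for Python 3.5+:

```python
text = r'\xda\xd8\xb8\x7d'   # the file content: literal backslash, x, two hex digits

# Step 1: interpret the \xNN sequences; each becomes one character in U+0000..U+00FF.
decoded = text.encode('ascii').decode('unicode_escape')

# Step 2: latin-1 turns each of those characters into the byte with the same value.
raw = decoded.encode('latin-1')

print(len(raw))    # 4
print(raw.hex())   # dad8b87d
```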

Replacing unknown characters in a string Python 2.7

How can I define a set of characters (in a list or a string) and have every other character replaced with, let's say, a '?'?
Example:
strinput = "abcdefg#~"
legal = '.,/?~abcdefg' #legal characters
while i not in legal:
#Turn i into '?'
print output
Put the legal characters in a set then use in to test each character of the string. Construct the new string using the str.join() method and a conditional expression.
>>> s = "test.,/?~abcdefgh"
>>> legal = set('.,/?~abcdefg')
>>> s = ''.join(char if char in legal else '?' for char in s)
>>> s
'?e??.,/?~abcdefg?'
>>>
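The same idea as a reusable function, applied to the input from the question. This is a sketch, and the name replace_illegal and its placeholder parameter are made up for illustration:

```python
def replace_illegal(text, legal, placeholder='?'):
    # Build the set once so each membership test is O(1).
    legal_set = set(legal)
    return ''.join(ch if ch in legal_set else placeholder for ch in text)


result = replace_illegal("abcdefg#~", '.,/?~abcdefg')
print(result)   # abcdefg?~
```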
If this is a large file, read it in chunks and apply re.sub() as below. A ^ at the start of a character class (square brackets) negates it, meaning "any character other than these".
>>> import re
>>> char = '.,/?~abcdefg'
>>> re.sub(r'[^' + char +']', '?', "test.,/?~abcdefgh")
'?e??.,/?~abcdefg?'
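Concatenating the legal characters straight into the character class breaks as soon as they include ], ^, - or a backslash. A hedged variant that passes them through re.escape first, assuming the set of legal characters is non-empty (the function name is hypothetical):

```python
import re


def replace_illegal_re(text, legal, placeholder='?'):
    # re.escape neutralises characters that are special inside a character class.
    pattern = '[^' + re.escape(legal) + ']'
    return re.sub(pattern, placeholder, text)


print(replace_illegal_re("test.,/?~abcdefgh", '.,/?~abcdefg'))   # ?e??.,/?~abcdefg?
print(replace_illegal_re("a-b]c^d", '-]^'))                      # ?-?]?^?
```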

remove special escape python

I have the string
a = 'ddd\ttt\nnn'
I want to remove the '\' from the string, so that it becomes:
a = 'dddtttnnn'
How can I do that in Python, since '\t' and '\n' have special meaning?
Assuming you want to remove the backslash from escape sequences such as \t and \n (which here represent a tab and a newline) and strip the special meaning of \ in the string in general, you can do:
>>> a = 'ddd\ttt\nnn'
>>> print a
ddd tt
nn
>>> repr(a)[1:-1].replace('\\','')
'dddtttnnn'
>>> print repr(a)[1:-1].replace('\\','')
dddtttnnn
If it is a raw string (i.e., the \ is not interpreted together with the next character as a single character), you do not need the repr:
>>> a = r'ddd\ttt\nnn'
>>> a.replace('\\','')
'dddtttnnn'
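The repr trick also rewrites anything else that repr escapes, such as quotes. A more explicit alternative, sketched here for Python 3 and assuming only tab, newline and carriage return matter, maps each control character to the letter that names it:

```python
# Map each control character to the letter used in its escape sequence.
# The table name is hypothetical; extend the mapping as needed.
CONTROL_TO_LETTER = str.maketrans({'\t': 't', '\n': 'n', '\r': 'r'})

a = 'ddd\ttt\nnn'   # contains a real tab and a real newline
cleaned = a.translate(CONTROL_TO_LETTER)
print(cleaned)      # dddtttnnn
```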

Why are Python strings behaving funny?

I'm so confused... why/how is a different from b?! Why don't they print the same thing?
>>> a = '"'
>>> a
'"'
>>> b = "'"
>>> b
"'"
The two strings are handled the same way. Their representation is just adjusted to avoid having to escape the contained quote. Both ' and " are legal string literal delimiters.
Note that the contents of the strings are different: " is not the same string as ', so a == b is False.
Otherwise Python would have to use a \ backslash for the " or ' character. If a string contains both characters, Python is forced to escape one of them:
>>> '\'"'
'\'"'
>>> """Triple quoted means you can use both without escaping them: "'"""
'Triple quoted means you can use both without escaping them: "\''
As you can see, the string representation used by Python still uses single quotes and a backslash to represent that last string.
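The delimiter choice can be observed directly by printing the length and repr of a few strings. A small sketch, with sample values chosen for illustration:

```python
samples = ['"', "'", '\'"', 'plain']
reprs = [repr(s) for s in samples]

for s, r in zip(samples, reprs):
    # The length counts the characters in memory; the repr only decides how to show them.
    print(len(s), r)
```

The first two strings have length 1 and differ only in which delimiter repr picks. The third has length 2 and is the only one that needs a backslash in its repr.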
