Unable to search for '\\n' using regular expressions in python3 [duplicate] - python

This question already has answers here:
Confused about backslashes in regular expressions [duplicate]
(3 answers)
Closed 5 years ago.
>>> import re
>>> a='''\\n5
... 8'''
>>> b=re.findall('\\n[0-9]',a)
>>> print(b)
['\n8']
Why does it show \n8 and not \n5?
I used a \ in front of \n the first time.
I find the use of raw strings in Python regexes a bit confusing. To me, they do not seem to change the result.

This is because there are two layers of escaping: Python string literals and the regex engine each interpret backslashes.
In the string literal '''\\n5 ...''', the \\ is an escaped backslash, so the string contains a literal backslash followed by n and 5, not a newline. The only real newline is the line break before 8.
In the pattern '\\n[0-9]', Python likewise turns \\ into a single backslash, so the regex engine receives \n[0-9]. The engine then interprets \n as a newline. That matches the actual newline before 8, but not the backslash and n before 5, which are two separate characters.
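A minimal sketch of the two layers at work, reusing the string from the question. The names newline_hits and literal_hits are illustrative only; the second pattern uses a raw string to show one way to match the literal backslash-n instead.

```python
import re

# The literal '\\n5' holds three characters: backslash, 'n', '5'.
# The line break inside the triple quotes adds a real newline before '8'.
a = '''\\n5
8'''
assert len(a) == 5  # backslash, n, 5, newline, 8

# '\\n[0-9]' reaches the regex engine as \n[0-9]; the engine reads \n as a newline.
newline_hits = re.findall('\\n[0-9]', a)

# r'\\n[0-9]' reaches the engine as \\n[0-9]: a literal backslash, then 'n'.
literal_hits = re.findall(r'\\n[0-9]', a)

print(newline_hits)  # ['\n8']
print(literal_hits)  # ['\\n5']
```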

\\n is not a newline, it's an escaped backslash with an n.
>>> import re
>>> a = '''\n5
... 8'''
>>> a=re.findall('\\n[0-9]',a)
>>> print(a)
['\n5', '\n8']

Because \\n5 does not contain a newline escape, it prints as \n5.

Related

Python Regular Expression newline is matched [duplicate]

This question already has answers here:
REGEX - Differences between `^`, `$` and `\A`, `\Z`
(1 answer)
Checking whole string with a regex
(5 answers)
Closed 5 years ago.
I want to match a string that has alphanumerics and some special characters but not the newline. But whenever my string has a newline, the pattern still matches. I checked the documentation for flags, but none of them looked relevant.
The following is sample code from the Python 3.6.2 REPL:
>>> import re
>>> s = "do_not_match\n"
>>> p = re.compile(r"^[a-zA-Z\+\-\/\*\%\_\>\<=]*$")
>>> p.match(s)
<_sre.SRE_Match object; span=(0, 12), match='do_not_match'>
The expected result is that it shouldn't match, as I have a newline at the end.
https://regex101.com/r/qyRw5s/1
I am a bit confused on what I am missing here.
The problem is that $ matches at the end of the string before the newline (if any).
If you don't want to match the newline at the end, use \Z instead of $ in your regex.
See the re module's documentation:
'$'
Matches the end of the string or just before the newline at the end of the string,
\Z
Matches only at the end of the string.
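As a rough illustration of the difference, the sketch below runs the question's input against both anchors. The character class is a slightly simplified version of the one in the question, and re.fullmatch is shown as a further option that the answer does not mention.

```python
import re

s = "do_not_match\n"
charset = r"[a-zA-Z+\-/*%_><=]*"

# $ also matches just before a trailing newline, so this still succeeds.
with_dollar = re.match("^" + charset + "$", s)

# \Z matches only at the very end of the string, so the newline blocks it.
with_z = re.match("^" + charset + r"\Z", s)

# fullmatch requires the whole string to match, with no end anchor needed.
with_fullmatch = re.fullmatch(charset, s)

print(with_dollar is not None)  # True
print(with_z)                   # None
print(with_fullmatch)           # None
```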

Removing wrapped line returns [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 7 months ago.
I want to remove the line returns of a text that is wrapped to a certain width. e.g.
import re
x = 'the meaning\nof life'
re.sub("([,\w])\n(\w)", "\1 \2", x)
'the meanin\x01 \x02f life'
I want to return the meaning of life. What am I doing wrong?
You need to escape the \ like this:
>>> import re
>>> x = 'the meaning\nof life'
>>> re.sub("([,\w])\n(\w)", "\1 \2", x)
'the meanin\x01 \x02f life'
>>> re.sub("([,\w])\n(\w)", "\\1 \\2", x)
'the meaning of life'
>>> re.sub("([,\w])\n(\w)", r"\1 \2", x)
'the meaning of life'
>>>
If you don't escape it, Python reads \1 in the string literal as an octal escape, so the replacement contains the character \x01:
>>> '\1'
'\x01'
>>>
That's why we need to use '\\\\' or r'\\' to match a single \ in a Python regex.
On that point, from this answer:
If you're putting this in a string within a program, you may actually need to use four backslashes (because the string parser will remove two of them when "de-escaping" it for the string, and then the regex needs two for an escaped regex backslash).
And from the documentation:
As stated earlier, regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python's usage of the same character for the same purpose in string literals.
Let's say you want to write a RE that matches the string \section, which might be found in a LaTeX file. To figure out what to write in the program code, start with the desired string to be matched. Next, you must escape any backslashes and other metacharacters by preceding them with a backslash, resulting in the string \\section. The resulting string that must be passed to re.compile() must be \\section. However, to express this as a Python string literal, both backslashes must be escaped again.
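The \section case from the quoted documentation can be sketched as follows. The sample text is made up for illustration; both patterns are expected to compile to the same regex.

```python
import re

text = r"\section{Intro}"  # a backslash followed by "section{Intro}"

# Regular string literal: four backslashes in source -> two in the pattern
# -> one literal backslash matched by the regex engine.
plain = re.compile("\\\\section")

# Raw string literal: two backslashes in source stay two in the pattern.
raw = re.compile(r"\\section")

assert plain.pattern == raw.pattern
print(raw.search(text).group())  # \section
```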
Alternatively, as brittenb suggested, you don't need a regex in this case:
>>> x = 'the meaning\nof life'
>>> x.replace("\n", " ")
'the meaning of life'
>>>
Use raw string literals. Both Python string literal syntax and the regex engine interpret backslashes; \1 in a Python string literal is interpreted as an octal escape, but not in a raw string literal:
re.sub(r"([,\w])\n(\w)", r"\1 \2", x)
The alternative would be to double all backslashes so that they reach the regex engine as such.
See the Backslash plague section of the Python regex HOWTO.
Demo:
>>> import re
>>> x = 'the meaning\nof life'
>>> re.sub(r"([,\w])\n(\w)", r"\1 \2", x)
'the meaning of life'
It might be easier just to split on newlines; use the str.splitlines() method, then re-join with spaces using str.join():
' '.join(x.splitlines())
but admittedly this won't distinguish between newlines between words and extra newlines elsewhere.

Including \n in a string in python [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 7 years ago.
How do I escape \n in a string in Python?
How do I write the string "abc\ndef" to stdout in Python as a single line?
sys.stdout.write("abc\ndef")
Current output:
>>> import sys
>>> sys.stdout.write("abc\ndef")
abc
def
I would like it to be abc\ndef
You should escape the backslash so that it's not treated as escaping character itself:
sys.stdout.write("abc\\ndef")
Background
The backslash \ tells the parser that the next character is something special and must be treated differently. That's why \n will not print as \n but as a newline. But how do we write a backslash then? We need to escape it, too, resulting in \\ for a single backslash and \\n for the output \n.
Docs here, also see this SO question
Alternatively, you can use "raw" strings, i.e. prefix your strings with an r, to disable the interpretation of escape sequences in your strings:
sys.stdout.write(r"abc\ndef")
As an alternative to escaping the backslash, you can disable backslash-escaping entirely by using a raw string literal:
>>> print(r"abc\ndef")
abc\ndef
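A quick way to check what each literal actually contains is to compare lengths. This is only a sketch; the variable names are illustrative.

```python
newline = "abc\ndef"     # \n is one newline character
escaped = "abc\\ndef"    # \\ is one backslash, followed by 'n'
raw = r"abc\ndef"        # raw literal: backslash and 'n' kept as written

print(len(newline))      # 7
print(len(escaped))      # 8
print(escaped == raw)    # True
print(escaped)           # abc\ndef
```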

Can '\' be in a Python string? [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 8 years ago.
I program in Python in PyCharm and whenever I write '\' as a string it says that the following statements do nothing. For example:
Is there a way to fix this and make it work?
Thanks.
You need to double the backslash:
'/-\\'
as a single backslash has special meaning in a Python string as the start of an escape sequence. A double \\ results in the string containing a single backslash:
>>> print '/-\\'
/-\
If the backslash is not the last character in the string, you could use a r'' raw string as well:
>>> print r'\-/'
\-/
You need to escape them to be in the string, for example:
>>> s = '\\'
>>> print s
\
You can also use the r (raw string) modifier in front of the string to include them easily, but a raw string can't end with an odd number of backslashes. You can read more about string literals in the docs.
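The restriction on trailing backslashes can be worked around in a couple of ways. The sketch below uses Python 3 print() and illustrative variable names; concatenating a separate '\\' is one common workaround, not the only one.

```python
# A raw literal may contain backslashes anywhere except as an odd-length run at the end.
middle = r'\-/'
print(middle)             # \-/

# r'/-\' would be a syntax error, so escape the backslash in a normal literal...
doubled = '/-\\'

# ...or append it to a raw literal separately.
joined = r'/-' + '\\'

print(doubled)            # /-\
print(doubled == joined)  # True
```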

How to escape special characters of a string with single backslashes [duplicate]

This question already has answers here:
Escaping regex string
(4 answers)
Closed 7 months ago.
I'm trying to escape the characters -]\^$*. each with a single backslash \.
For example the string: ^stack.*/overflo\w$arr=1 will become:
\^stack\.\*/overflo\\w\$arr=1
What's the most efficient way to do that in Python?
re.escape double escapes which isn't what I want:
'\\^stack\\.\\*\\/overflow\\$arr\\=1'
I need this to escape for something else (nginx).
This is one way to do it (in Python 3.x):
escaped = a_string.translate(str.maketrans({"-": r"\-",
"]": r"\]",
"\\": r"\\",
"^": r"\^",
"$": r"\$",
"*": r"\*",
".": r"\."}))
For reference, for escaping strings to use in regex:
import re
escaped = re.escape(a_string)
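For the single-backslash escaping asked about in the question, the translate-based mapping can be wrapped in a small helper and tried on the question's sample string. The helper name escape_single is hypothetical, and the table is built with a comprehension rather than spelled out entry by entry.

```python
# Hypothetical helper; covers the characters listed in the question: - ] \ ^ $ * .
_ESCAPES = str.maketrans({c: "\\" + c for c in "-]\\^$*."})

def escape_single(a_string):
    """Prefix each of the listed special characters with one backslash."""
    return a_string.translate(_ESCAPES)

sample = r"^stack.*/overflo\w$arr=1"
print(escape_single(sample))  # \^stack\.\*/overflo\\w\$arr=1
```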
Just assuming this is for a regular expression, use re.escape.
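In Python 3.7, we could use the built-in function repr() to get a representation in which every backslash \ is doubled (note that the result is also wrapped in quotes). An fr'{my_string}' literal, by contrast, interpolates the value unchanged.
repr(my_string)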
Check the Link: https://docs.python.org/3/library/functions.html#repr
re.escape doesn't double escape. It just looks like it does if you run in the repl. The second layer of escaping is caused by outputting to the screen.
When using the repl, try using print to see what is really in the string.
$ python
>>> import re
>>> re.escape("\^stack\.\*/overflo\\w\$arr=1")
'\\\\\\^stack\\\\\\.\\\\\\*\\/overflo\\\\w\\\\\\$arr\\=1'
>>> print re.escape("\^stack\.\*/overflo\\w\$arr=1")
\\\^stack\\\.\\\*\/overflo\\w\\\$arr\=1
>>>
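The same check can be sketched in Python 3, using the unescaped sample string from the question. The exact set of characters that re.escape escapes differs between Python versions, so no output is shown here; the assertions only cover behaviour that should hold regardless.

```python
import re

s = r"^stack.*/overflo\w$arr=1"
escaped = re.escape(s)

# repr() doubles every backslash for display; print() shows the actual characters.
print(repr(escaped))
print(escaped)

# The displayed form is longer only because of the extra display-level escaping.
assert len(repr(escaped)) > len(escaped)

# The escaped pattern matches the original text literally.
assert re.fullmatch(escaped, s) is not None
```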
Simply using re.sub instead of str.maketrans might also work, and this would also work in Python 2.x:
>>> print(re.sub(r'(\-|\]|\^|\$|\*|\.|\\)',lambda m:{'-':'\-',']':'\]','\\':'\\\\','^':'\^','$':'\$','*':'\*','.':'\.'}[m.group()],"^stack.*/overflo\w$arr=1"))
\^stack\.\*/overflo\\w\$arr=1
To handle \r, \n, and \t as well, take the output of the built-in repr(), strip its surrounding quotes, and post-process the output of re.escape:
re.escape(repr(a)[1:-1]).replace('\\\\', '\\')
