How to treat \t as a regular string in Python [duplicate]

This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 8 years ago.
I need to remove the literal text \t (a backslash followed by the letter t) from a string that is being written out. However, whenever I do
str(contents).replace('\t', ' ')
it just removes all of the tab characters. I understand this is because \t is how you write a tab, but I want to know how to treat it as a regular two-character string.

You can prefix the string with r and create a raw-string:
str(contents).replace(r'\t', ' ')
Raw-strings do not process escape sequences. Below is a demonstration:
>>> mystr = r'a\t\tb' # Escape sequences are ignored
>>> print(mystr)
a\t\tb
>>> print(mystr.replace('\t', ' ')) # This replaces tab characters
a\t\tb
>>> print(mystr.replace(r'\t', ' ')) # This replaces the string '\t'
a  b
>>>
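As a small sketch of the same point, the snippet below uses a made-up sample string (`contents` here is purely illustrative) that contains both a literal backslash-t pair and a real tab. It assumes nothing beyond built-in string methods.

```python
# Hypothetical sample: one literal backslash-t pair, then one real tab character.
contents = 'col1\\tcol2\tcol3'

# r'\t' and '\\t' are the same two-character string: a backslash and a 't'.
assert r'\t' == '\\t'
assert len(r'\t') == 2 and len('\t') == 1

# Only the literal backslash-t pair is replaced; the real tab is left alone.
cleaned = contents.replace(r'\t', ' ')
print(repr(cleaned))
```

Printing with repr() makes the remaining tab visible as \t instead of as whitespace.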

Related

Seek clarity on regex with $ and \Z [duplicate]

This question already has answers here:
Dollar sign in regular expression and new line character
(2 answers)
Closed 5 months ago.
I am a bit confused about the difference between $ and \Z in regex. As I understand the underlying concept,
\Z matches the end of the string regardless of multiline mode, whereas
$ matches the end of the string or just before "\n" in multiline mode.
import re
items = ['lovely', '1\dentist', '2 lonely', 'eden', 'fly\n', 'dent']
# res = [e for e in items if re.search(r'\Aden|ly\Z', e)]
t = re.compile(r"^den|ly$")
res = [e for e in items if t.search(e)]
print(res)
res = ['lovely', '2 lonely', 'fly\n', 'dent']
Why am I matching "fly\n"? It ends with "\n", so isn't $ supposed to ignore it? By contrast, r"^den|ly\Z" gets me the desired result.
Note the Python documentation for the $ special character:
Matches the end of the string or just before the newline at the end of the string [emphasis added] […]
In "fly\n", the newline is at the end of the string, so '$' can match just before it, even without multiline mode. If instead the string were "fly\n\n", then the regex would not match.
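The behaviour described above can be checked with a short sketch. The sample strings are chosen for illustration; only the standard re module is used.

```python
import re

dollar = re.compile(r'ly$')   # '$' may also match just before one trailing newline
strict = re.compile(r'ly\Z')  # '\Z' matches only at the very end of the string

samples = ['fly', 'fly\n', 'fly\n\n']
dollar_hits = [bool(dollar.search(s)) for s in samples]
strict_hits = [bool(strict.search(s)) for s in samples]

for s, d, z in zip(samples, dollar_hits, strict_hits):
    print(repr(s), '$ matches:', d, '| \\Z matches:', z)
```

Only 'fly\n' distinguishes the two anchors: '$' accepts it and '\Z' rejects it.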

how to replace CRLF in python? [duplicate]

This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 2 years ago.
>>> print('aaa\rbbb'.replace('\r','ccc'))
aaacccbbb
>>> print('aaa\rbbb'.replace('\\r','ccc'))
bbb
>>> print('aaa\rbbb'.replace(r'\r','ccc'))
bbb
>>>
I am wondering about the reason for the output of the last two statements, and I am confused about what
The last two variations do not replace the carriage return character, so each prints the original 'aaa\rbbb'. It first prints aaa, then the carriage return moves the cursor back to the beginning of the line, and then bbb is printed over the existing aaa.
The reason '\\r' and r'\r' don't replace '\r' is that '\r' is the carriage return character, whereas both '\\r' and r'\r' are a backslash followed by the letter "r".
That's because '\\r' is literally the character '\' followed by the character 'r'.
The same goes for r'\r'; the r prefix marks a raw string.
Neither will match the character \r, because that is a single character.
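A minimal sketch of the distinction, using a made-up sample string that contains an actual CR LF pair. Replacing it with '\n' is just one possible normalization, assumed here for illustration.

```python
text = 'aaa\r\nbbb'  # hypothetical sample containing a real CR LF pair

# '\r' is one character; '\\r' and r'\r' are the same two characters.
assert len('\r') == 1
assert '\\r' == r'\r' and len(r'\r') == 2

# Searching for backslash + 'r' finds nothing, so the text is unchanged.
unchanged = text.replace(r'\r', 'ccc')

# Searching for the real control characters replaces the CR LF pair.
normalized = text.replace('\r\n', '\n')

print(repr(unchanged))
print(repr(normalized))
```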

RegEx: Extract Unknown # of Numbers of Unknown Length, With Separators, and Characters to Ignore [duplicate]

This question already has answers here:
How to extract numbers from a string in Python?
(19 answers)
Closed 4 years ago.
I am looking to extract numbers in the format:
[number]['/' or ' ' or '\' possible, ignore]:['/' or ' ' or '\'
possible, ignore][number]['/' or ' ' or '\' possible, ignore]:...
For example:
"4852/: 5934: 439028/:\23"
Would extract: ['4852', '5934', '439028', '23']
Use re.findall to extract all occurrences of a pattern. Note that you should use double backslash to represent a literal backslash in quotes.
>>> import re
>>> re.findall(r'\d+', '4852/: 5934: 439028/:\\23')
['4852', '5934', '439028', '23']
>>>
Python has a regex module, re, in both 2.7 and 3.x.
The function that you would probably want to use is re.split().
A code snippet would be
import re
numbers = re.split(r'[/:\\ ]+', your_string)
The code above works if you only want to split on runs of those specific separator characters ('/', ':', '\' and space). But you could split on all non-numeric characters too, like this:
numbers = re.split(r'\D+', your_string)
or you could do
numbers = re.findall(r'\d+', your_string)
Kudos!
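To compare the two approaches, here is a short sketch. The second input, `trailing`, is a hypothetical string added to show one difference: re.split can return empty strings when the input starts or ends with separators, while re.findall does not.

```python
import re

your_string = '4852/: 5934: 439028/:\\23'  # '\\' is one literal backslash

by_findall = re.findall(r'\d+', your_string)
by_split = re.split(r'\D+', your_string)

# Hypothetical input that ends with a separator.
trailing = '12/: 34/'
split_trailing = re.split(r'\D+', trailing)         # leaves an empty last item
filtered = [part for part in split_trailing if part]  # drop empty strings

print(by_findall)
print(by_split)
print(split_trailing, filtered)
```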

Weird behavior of strip() [duplicate]

This question already has answers here:
How do the .strip/.rstrip/.lstrip string methods work in Python?
(4 answers)
Closed 6 years ago.
>>> adf = "123 ABCD#"
>>> df = "<ABCD#>"
>>> adf.strip(df)
'123 '
>>> xc = "dfdfd ABCD#!"
>>> xc.strip(df)
'dfdfd ABCD#!'
Why does strip() take out ABCD# in adf?
Does strip completely ignore "<" and ">"? Why does it remove the characters when there is no "<" or ">" in the original string?
The strip() method returns a copy of the string with leading and trailing characters removed. The optional argument is a set of characters to remove (whitespace by default), not a prefix or suffix to match as a whole.
The characters A, B, C, D and # all appear in df, and they occur at the end of the string adf, so they are stripped. This is not the case for the string xc, whose first and last characters are d and !, neither of which is in df.
str.strip([chars]) => If the first or last character of str occurs in chars, that character is stripped off. The check is then repeated, and it stops as soon as neither end character is in chars.
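The set-of-characters behaviour can be sketched as follows. The `remove_suffix` helper is a hypothetical name introduced only for illustration; it shows one way to remove an exact trailing substring instead.

```python
adf = "123 ABCD#"
df = "<ABCD#>"
xc = "dfdfd ABCD#!"

# The argument is a set of characters, so order and extra characters don't matter.
same_set = adf.strip("#DCBA<>")
stripped = adf.strip(df)

# Stripping stops at the first end character that is not in the set ('d' and '!').
untouched = xc.strip(df)

def remove_suffix(text, suffix):
    """Hypothetical helper: drop `suffix` only if `text` ends with it exactly."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

exact = remove_suffix(adf, "ABCD#")      # exact suffix present, so it is removed
no_match = remove_suffix(adf, "<ABCD#>")  # not a suffix of adf, so nothing changes
print(repr(stripped), repr(untouched), repr(exact), repr(no_match))
```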

Python: Split a string, respect and preserve quotes [duplicate]

This question already has answers here:
Split a string by spaces -- preserving quoted substrings -- in Python
(16 answers)
Closed 7 years ago.
Using python, I want to split the following string:
a=foo, b=bar, c="foo, bar", d=false, e="false"
This should result in the following list:
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']
When using shlex in posix mode and splitting on ", ", the argument for c gets treated correctly. However, it removes the quotes. I need them because false is not the same as "false", for instance.
My code so far:
import shlex
mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
splitter = shlex.shlex(mystring, posix=True)
splitter.whitespace += ','
splitter.whitespace_split = True
print(list(splitter)) # ['a=foo', 'b=bar', 'c=foo, bar', 'd=false', 'e=false']
>>> import re
>>> s = r'a=foo, b=bar, c="foo, bar", d=false, e="false", f="foo\", bar"'
>>> re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', s)
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"', 'f="foo\\", bar"']
1. The regex pattern "[^"]*" matches a simple quoted string.
2. "(?:\\.|[^"])*" matches a quoted string and skips over escaped quotes, because \\. consumes two characters: a backslash and any character.
3. [^\s,"] matches a non-delimiter.
Combining patterns 2 and 3 inside (?: | )+ matches a sequence of non-delimiters and quoted strings, which is the desired result.
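As a runnable sketch, the combined pattern can be applied to the string from the question. The names `TOKEN` and `split_keep_quotes` are hypothetical, chosen here for illustration.

```python
import re

# A run of non-delimiters and/or double-quoted strings (with backslash escapes).
TOKEN = re.compile(r'(?:[^\s,"]|"(?:\\.|[^"])*")+')

def split_keep_quotes(text):
    """Hypothetical helper: split on commas/whitespace, keeping quoted parts intact."""
    return TOKEN.findall(text)

mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
parts = split_keep_quotes(mystring)
print(parts)
```

Unlike the shlex approach in the question, the quotes stay in the tokens, so d=false and e="false" remain distinguishable.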
