Weird behavior of strip() [duplicate] - python

>>> adf = "123 ABCD#"
>>> df = "<ABCD#>"
>>> adf.strip(df)
'123 '
>>> xc = "dfdfd ABCD#!"
>>> xc.strip(df)
'dfdfd ABCD#!'
Why does strip() remove ABCD# from adf?
Does strip() completely ignore "<" and ">"? Why does it remove the characters when there is no "<" or ">" in the original string?

The strip() method returns a copy of the string with leading and trailing characters removed. The optional argument is a set of characters to remove, not a prefix or suffix; it defaults to whitespace.
The characters A, B, C, D and # all occur in df, and they sit at the end of adf, so they are stripped. The space before them is not in df, so stripping stops there. This is not the case for xc, whose first and last characters are d and !, neither of which occurs in df.
str.strip([chars]): if the first or last character of str occurs in chars, that character is removed, and the check is repeated. Stripping stops at each end as soon as it reaches a character that is not in chars.
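The rule described above can be sketched by hand. The helper name strip_chars is hypothetical and only illustrates the behaviour; it is not how CPython implements str.strip.

```python
def strip_chars(text, chars):
    """Illustrative re-implementation of str.strip(chars) (hypothetical helper)."""
    start, end = 0, len(text)
    # Advance past leading characters that belong to the set `chars`.
    while start < end and text[start] in chars:
        start += 1
    # Retreat past trailing characters that belong to the set `chars`.
    while end > start and text[end - 1] in chars:
        end -= 1
    return text[start:end]


df = "<ABCD#>"
for sample in ("123 ABCD#", "dfdfd ABCD#!"):
    # Both columns should agree: the argument acts as a set of characters.
    print(repr(strip_chars(sample, df)), repr(sample.strip(df)))
```

The order of the characters in df is irrelevant, and "<" and ">" simply never get a chance to match because they do not appear at either end of the input.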

Related

Seek clarity on regex with $ and \Z [duplicate]

I am a little confused about $ and \Z in regex. As I understand the underlying concept,
\Z matches the end of the string regardless of multiline mode, whereas
$ matches the end of the string or just before "\n" in multiline mode.
import re
items = ['lovely', '1\dentist', '2 lonely', 'eden', 'fly\n', 'dent']
# res = [e for e in items if re.search(r'\Aden|ly\Z', e)]
t = re.compile(r"^den|ly$")
res = [e for e in items if t.search(e)]
print(res)
# prints ['lovely', '2 lonely', 'fly\n', 'dent']
Why am I matching "fly\n"? It ends with "\n", so isn't it supposed to be ignored? In contrast, r"^den|ly\Z" gets me the desired result.
Note the Python documentation for the $ special character:
Matches the end of the string or just before the newline at the end of the string [emphasis added] […]
In "fly\n", the newline is at the end of the string, so '$' can match just before it. If instead the string were "fly\n\n", then the regex would fail to match, because "ly" would no longer sit directly before the final newline.
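A small check of the difference, as a sketch; the names dollar and end_only are made up for this example.

```python
import re

dollar = re.compile(r"ly$")     # end of string, or just before a final newline
end_only = re.compile(r"ly\Z")  # only the very end of the string

for text in ("fly", "fly\n", "fly\n\n"):
    # Show whether each pattern finds a match in the given string.
    print(repr(text), bool(dollar.search(text)), bool(end_only.search(text)))
```

Only "fly\n" separates the two patterns: $ accepts it, \Z does not.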

how to replace CRLF in python? [duplicate]

>>> print 'aaa\rbbb'.replace('\r','ccc')
aaacccbbb
>>> print 'aaa\rbbb'.replace('\\r','ccc')
bbb
>>> print 'aaa\rbbb'.replace(r'\r','ccc')
bbb
>>>
I am wondering about the reason for the output of the last two statements.
The last two variations do not replace the carriage return character, so the original 'aaa\rbbb' is printed. The terminal first prints aaa, then the carriage return moves the cursor back to the beginning of the line, and then bbb is printed over the existing aaa.
'\\r' and r'\r' don't replace '\r' because '\r' is the single carriage return character, whereas both '\\r' and r'\r' are a backslash followed by the letter "r".
That's because '\\r' is literally the character '\' followed by the character 'r'.
The same goes for r'\r': the r prefix marks a raw string, in which backslashes are not treated as escapes.
Neither will match the character \r, because that is a single character.
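Comparing lengths and using repr() makes the difference visible without the terminal overwriting the output. This is only a sketch; the variable names are chosen for illustration.

```python
carriage_return = '\r'   # one character: CR
escaped = '\\r'          # two characters: backslash, then r
raw = r'\r'              # the same two characters, written as a raw string

print(len(carriage_return), len(escaped), len(raw))
print(escaped == raw)

text = 'aaa\rbbb'
# repr() shows the CR as an escape instead of letting it move the cursor.
print(repr(text.replace('\r', 'ccc')))
print(repr(text.replace(r'\r', 'ccc')))
```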

Remove chars from string using Regular Expression [duplicate]

I have an array of strings that contain alphanumeric characters but also punctuation that has to be deleted. For instance, the string x="0-001" should be converted into x="0001".
For this purpose I have:
punctuations = list(string.punctuation)
which contains all the characters that have to be removed from the strings. I'm trying to solve this using regular expressions in Python. Any suggestions on how to proceed?
import string
punctuations = list(string.punctuation)
test = "0000.1111"
for i, char in enumerate(test):
    if char in punctuations:
        test = test[:i] + test[i + 1:]
If all you want to do is remove non-alphanumeric characters from a string, you can do it simply with re.sub:
>>> import re
>>> re.sub(r'\W', '', '0-001')
'0001'
Note that \W matches any character that is not a Unicode word character; it is the opposite of \w. For ASCII strings it's equivalent to [^a-zA-Z0-9_], so underscores are kept even though "_" is in string.punctuation.
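If exactly the characters in string.punctuation should go (underscore included), one option is to build a character class from it. This is a sketch of an alternative, not part of the answer above; punct_pattern and strip_punctuation are hypothetical names.

```python
import re
import string

# re.escape protects characters such as ], ^, - and \ inside the class.
punct_pattern = re.compile("[" + re.escape(string.punctuation) + "]")


def strip_punctuation(text):
    """Remove every character listed in string.punctuation."""
    return punct_pattern.sub("", text)


print(strip_punctuation("0-001"))
print(strip_punctuation("a_b.c"))
```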

How to treat \t as a regular string in python [duplicate]

I need to remove the literal text \t from a string that is being written; however, whenever I do
str(contents).replace('\t', ' ')
it just removes all of the tabs. I understand this is because \t is how you write a tab, but I want to know how to treat it as a regular string.
You can prefix the string with r to create a raw string:
str(contents).replace(r'\t', ' ')
Raw strings do not process escape sequences. Below is a demonstration:
>>> mystr = r'a\t\tb' # Escape sequences are ignored
>>> print(mystr)
a\t\tb
>>> print(mystr.replace('\t', ' ')) # This replaces tab characters
a\t\tb
>>> print(mystr.replace(r'\t', ' ')) # This replaces the string '\t'
a  b
>>>

Python: Split a string, respect and preserve quotes [duplicate]

Using python, I want to split the following string:
a=foo, b=bar, c="foo, bar", d=false, e="false"
This should result in the following list:
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']
When using shlex in posix mode and splitting on ", ", the argument for c gets treated correctly. However, it removes the quotes. I need them because false is not the same as "false", for instance.
My code so far:
import shlex
mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
splitter = shlex.shlex(mystring, posix=True)
splitter.whitespace += ','
splitter.whitespace_split = True
print(list(splitter)) # ['a=foo', 'b=bar', 'c=foo, bar', 'd=false', 'e=false']
>>> import re
>>> s = r'a=foo, b=bar, c="foo, bar", d=false, e="false", f="foo\", bar"'
>>> re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', s)
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"', 'f="foo\\", bar"']
1. The regex pattern "[^"]*" matches a simple quoted string.
2. "(?:\\.|[^"])*" matches a quoted string and skips over escaped quotes, because \\. consumes two characters: a backslash and any character.
3. [^\s,"] matches a non-delimiter.
Combining patterns 2 and 3 inside (?: | )+ matches a sequence of non-delimiters and quoted strings, which is the desired result.
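Wrapped up as a reusable function, the same pattern might look like the sketch below. The names TOKEN and split_preserving_quotes are made up here, and the sketch assumes the input uses only double quotes, as in the question.

```python
import re

# One or more of: a non-delimiter character, or a double-quoted string
# in which a backslash escapes the following character.
TOKEN = re.compile(r'(?:[^\s,"]|"(?:\\.|[^"])*")+')


def split_preserving_quotes(text):
    """Split on commas and whitespace, keeping quoted substrings intact."""
    return TOKEN.findall(text)


mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
print(split_preserving_quotes(mystring))
```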
