Remove chars from string using Regular Expression [duplicate] - python

This question already has answers here:
Remove specific characters from a string in Python
(26 answers)
Closed 5 years ago.
I have an array of strings that contain alphanumeric characters but also punctuation that has to be deleted. For instance, the string x="0-001" should be converted into x="0001".
For this purpose I have:
punctuations = list(string.punctuation)
This contains all the characters that have to be removed from the strings. I'm trying to solve this using regular expressions in Python. Any suggestions on how to proceed using regular expressions?
import string
punctuations = list(string.punctuation)
test = "0000.1111"
for i, char in enumerate(test):
    if char in punctuations:
        test = test[:i] + test[i + 1:]

If all you want to do is remove non-alphanumeric characters from a string, you can do it simply with re.sub:
>>> import re
>>> re.sub(r'\W', '', '0-001')
'0001'
Note that \W matches any character that is not a Unicode word character; it is the opposite of \w. For ASCII strings it is equivalent to [^a-zA-Z0-9_], so underscores are kept even though '_' is part of string.punctuation.
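If the goal is to remove exactly the characters in string.punctuation (including the underscore), one possible approach is to build a character class from that constant. This is only a sketch: the names PUNCT_RE and strip_punctuation are made up for illustration and are not part of the question or the answer above.

```python
import re
import string

# Character class built from every ASCII punctuation character.
# re.escape keeps characters such as ']', '\' and '-' literal inside the class.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")


def strip_punctuation(text):
    """Return text with all string.punctuation characters removed."""
    return PUNCT_RE.sub("", text)


samples = ["0-001", "0000.1111", "a_b!c"]
cleaned = [strip_punctuation(s) for s in samples]
print(cleaned)  # ['0001', '00001111', 'abc']
```

Unlike \W, this leaves whitespace and non-ASCII characters alone and only removes the listed punctuation.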

Related

RegEx: Extract Unknown # of Numbers of Unknown Length, With Separators, and Characters to Ignore [duplicate]

This question already has answers here:
How to extract numbers from a string in Python?
(19 answers)
Closed 4 years ago.
I am looking to extract numbers in the format:
[number]['/' or ' ' or '\' possible, ignore]:['/' or ' ' or '\'
possible, ignore][number]['/' or ' ' or '\' possible, ignore]:...
For example:
"4852/: 5934: 439028/:\23"
Would extract: ['4852', '5934', '439028', '23']
Use re.findall to extract all occurrences of a pattern. Note that you should use double backslash to represent a literal backslash in quotes.
>>> import re
>>> re.findall(r'\d+', '4852/: 5934: 439028/:\\23')
['4852', '5934', '439028', '23']
Python ships with a regular expression module, re, in both 2.7 and 3.x.
The function you probably want is re.split().
A code snippet would be:
import re
numbers = re.split(r'[/:\\ ]+', your_string)
The code above works if you only want to split on those specific separator characters. You could also split on all non-numeric characters, like this:
numbers = re.split(r'\D+', your_string)
Or you could extract the runs of digits directly:
numbers = re.findall(r'\d+', your_string)
Kudos!
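One difference between the two approaches is worth seeing in practice: re.split returns empty strings when the input starts or ends with a separator, while re.findall does not. A small sketch, where the padded input is a made-up variation of the string from the question:

```python
import re

raw = '4852/: 5934: 439028/:\\23'
padded = '/' + raw + ':'  # hypothetical input with separators at both ends

split_tokens = re.split(r'\D+', padded)
found_tokens = re.findall(r'\d+', padded)

print(split_tokens)  # ['', '4852', '5934', '439028', '23', '']
print(found_tokens)  # ['4852', '5934', '439028', '23']

# Dropping the empty strings makes the split result match findall.
filtered = [token for token in split_tokens if token]
numbers = [int(token) for token in found_tokens]
print(numbers)  # [4852, 5934, 439028, 23]
```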

My regular expression is not getting matched exactly in python [duplicate]

This question already has answers here:
Checking whole string with a regex
(5 answers)
Closed 6 years ago.
Here's my code...
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
    matchobj=re.match(r'chap[0-9]',i,re.M|re.I)
    if matchobj:
        print(i)
Since the pattern is chap[0-9], I expected it to match only those strings that have exactly one digit after chap.
So I should get the following output:
chap3
chap2
chap4
but I am getting the following output...
chap11
chap3
chap2
chap4
chap55
chap33
chap54
re.match only anchors your pattern at the beginning of the string; anything after the matched part is ignored. Append an end-of-string anchor '$' or a word boundary '\b' to your pattern:
matchobj=re.match(r'chap\d$',i,re.M|re.I)
# \d (digit) is shortcut for [0-9]
From the docs on re.match:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance.
You should add a dollar sign to the end of your regex. The dollar ($) means the end of the string and, for future reference, the caret (^) signifies the beginning.
import re
l=["chap","chap11","chapa","chapb","chapc","chap3","chap2","chapf","chap4","chap55","chapf","chap33","chap54","chapgk"]
for i in l:
    matchobj=re.match(r'chap[0-9]$',i,re.M|re.I)
    if matchobj:
        print(i)
Output
chap3
chap2
chap4
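As an alternative to adding '$', Python 3.4 and later provide re.fullmatch, which succeeds only if the whole string matches the pattern. The sketch below uses a shortened, made-up version of the list from the question; the variable names are illustrative.

```python
import re

chapters = ["chap", "chap11", "chapa", "chap3", "chap2", "chap4", "chap55"]

# re.fullmatch requires the pattern to cover the entire string.
single_digit = [name for name in chapters if re.fullmatch(r'chap\d', name, re.I)]
print(single_digit)  # ['chap3', 'chap2', 'chap4']

# '$' also matches just before a trailing newline, fullmatch does not.
with_dollar = re.match(r'chap\d$', 'chap3\n')
with_fullmatch = re.fullmatch(r'chap\d', 'chap3\n')
print(with_dollar is not None, with_fullmatch is not None)  # True False
```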

Weird behavior of strip() [duplicate]

This question already has answers here:
How do the .strip/.rstrip/.lstrip string methods work in Python?
(4 answers)
Closed 6 years ago.
>>> adf = "123 ABCD#"
>>> df = "<ABCD#>"
>>> adf.strip(df)
'123 '
>>> xc = "dfdfd ABCD#!"
>>> xc.strip(df)
'dfdfd ABCD#!'
Why does strip() take out ABCD# in adf?
Does strip completely ignore "<" and ">"? Why does it remove the characters when there is no "<" or ">" in the original string?
The strip() method returns a copy of the string with the given characters stripped from the beginning and the end (by default, whitespace characters). The argument is treated as a set of characters, not as a substring.
The characters A, B, C, D and # are all in df, and they occur at the end of adf, so they are removed. This is not the case for xc, whose first and last characters are d and !, neither of which is in df.
str.strip([chars]): if the first or last character of str occurs in chars, that character is stripped off. The check is then repeated, and it stops once neither end character is in chars.
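The behaviour described above can be sketched in plain Python. The helpers strip_chars and remove_suffix are hypothetical names written for illustration; strip_chars mimics what str.strip(chars) does, and remove_suffix shows one way to remove an exact trailing substring instead.

```python
def strip_chars(text, chars):
    """Mimic str.strip(chars): drop leading and trailing characters found in chars."""
    start, end = 0, len(text)
    while start < end and text[start] in chars:
        start += 1
    while end > start and text[end - 1] in chars:
        end -= 1
    return text[start:end]


def remove_suffix(text, suffix):
    """Remove suffix only if text ends with it as a whole substring."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(repr(strip_chars("123 ABCD#", "<ABCD#>")))    # '123 '
print(repr(strip_chars("dfdfd ABCD#!", "<ABCD#>")))  # 'dfdfd ABCD#!'
print(repr(remove_suffix("123 ABCD#", "<ABCD#>")))   # '123 ABCD#'
print(repr(remove_suffix("123 ABCD#", "ABCD#")))     # '123 '
```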

Regex can't escape question mark? [duplicate]

This question already has an answer here:
match trailing slash with Python regex
(1 answer)
Closed 8 years ago.
I can't match the question mark character although I escaped it.
I tried escaping with multiple backslashes and also using re.escape().
What am I missing?
Code:
import re
text = 'test?'
result = ''
result = re.match(r'\?',text)
print ("input: "+text)
print ("found: "+str(result))
Output:
input: test?
found: None
re.match only matches a pattern at the beginning of the string; as the docs say:
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.
So, either match everything up to the question mark:
>>> re.match(r'.*\?', text).group(0)
'test?'
or use re.search:
>>> re.search(r'\?', text).group(0)
'?'
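Because both functions return None when nothing matches, calling .group() directly on the result can raise an AttributeError. A minimal sketch of a safer check, with illustrative variable names:

```python
import re

text = 'test?'

at_start = re.match(r'\?', text)   # None: the string does not start with '?'
anywhere = re.search(r'\?', text)  # match object for the '?' at the end

# Guard against None before using the match object.
position = anywhere.start() if anywhere else -1
print(at_start, position)  # None 4

# re.escape produces the same escaped pattern from a literal string.
escaped = re.escape('?')
print(escaped, re.search(escaped, text) is not None)
```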

Python: Split a string, respect and preserve quotes [duplicate]

This question already has answers here:
Split a string by spaces -- preserving quoted substrings -- in Python
(16 answers)
Closed 7 years ago.
Using python, I want to split the following string:
a=foo, b=bar, c="foo, bar", d=false, e="false"
This should result in the following list:
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']
When using shlex in POSIX mode and splitting on ", ", the argument for c gets treated correctly. However, it removes the quotes. I need them because false is not the same as "false", for instance.
My code so far:
import shlex
mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
splitter = shlex.shlex(mystring, posix=True)
splitter.whitespace += ','
splitter.whitespace_split = True
print(list(splitter)) # ['a=foo', 'b=bar', 'c=foo, bar', 'd=false', 'e=false']
>>> import re
>>> s = r'a=foo, b=bar, c="foo, bar", d=false, e="false", f="foo\", bar"'
>>> re.findall(r'(?:[^\s,"]|"(?:\\.|[^"])*")+', s)
['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"', 'f="foo\\", bar"']
1. The regex pattern "[^"]*" matches a simple quoted string.
2. "(?:\\.|[^"])*" matches a quoted string and skips over escaped quotes because \\. consumes two characters: a backslash and any character.
3. [^\s,"] matches a single non-delimiter character.
Combining patterns 2 and 3 inside (?: | )+ matches a sequence of non-delimiters and quoted strings, which is the desired result.
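For reuse, the pattern can be compiled once and wrapped in a function. This is only a sketch: TOKEN_RE, split_preserving_quotes and to_pairs are hypothetical names, and the key/value step assumes every token contains an '=' sign, as in the example string.

```python
import re

# A run of non-delimiter characters and/or double-quoted strings
# (quoted strings may contain backslash-escaped characters).
TOKEN_RE = re.compile(r'(?:[^\s,"]|"(?:\\.|[^"])*")+')


def split_preserving_quotes(text):
    """Split on commas and whitespace, keeping quoted parts intact."""
    return TOKEN_RE.findall(text)


def to_pairs(text):
    """Turn each key=value token into a (key, value) tuple, quotes kept."""
    return [tuple(token.split('=', 1)) for token in split_preserving_quotes(text)]


mystring = 'a=foo, b=bar, c="foo, bar", d=false, e="false"'
tokens = split_preserving_quotes(mystring)
print(tokens)  # ['a=foo', 'b=bar', 'c="foo, bar"', 'd=false', 'e="false"']

pairs = to_pairs(mystring)
print(pairs[2], pairs[3], pairs[4])
```

Keeping the quotes in the value makes it possible to tell the bare word false apart from the quoted string "false" later on.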
