In Python, ignore digits in string but remove pure digits [duplicate] - python

I am processing a string like:
This is python3 and learning it takes 100 hours
I want to remove standalone numbers like 100 but keep digits that are part of a word, like the 3 in python3.
I am trying this regex:
text = re.sub('[0-9]', '', text)
but it removes every digit, including the one in python3. Help is appreciated.

You can add a space to both sides of your regex and use a single space as the replacement. Remember to also add a + to match one or more digits:
import re
text = 'This is python3 and learning it takes 100 hours'
text = re.sub(r' [0-9]+ ', ' ', text)
print(text)
Output:
This is python3 and learning it takes hours

Try the following:
text = re.sub(r' [0-9]{1,} ', ' ', text)
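One caveat with the space-delimited patterns above: they need a literal space on both sides, so a number at the very start or end of the string is left alone. A minimal sketch of an alternative, assuming whitespace lookarounds are acceptable for your data (the sample string is made up for illustration):

```python
import re

# Illustrative input with numbers at both edges of the string
text = "100 hours of python3 takes 100"

# Space-delimited pattern: misses numbers at the very start or end
spaced = re.sub(r" [0-9]+ ", " ", text)

# Lookarounds: "not preceded and not followed by a non-space character"
lookaround = re.sub(r"(?<!\S)[0-9]+(?!\S)", "", text)

print(repr(spaced))      # '100 hours of python3 takes 100'
print(repr(lookaround))  # ' hours of python3 takes '
```

The lookarounds consume no spaces, so the spaces around each removed number stay in the result.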

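You can use the \b word boundary (\d is shorthand for a digit, like [0-9]):
import re

def clean(value):
    # Remove runs of digits that form a whole word on their own
    return re.sub(r"\b\d+\b", "", value)

if __name__ == "__main__":
    # Prints: This is python3 and learning it takes  hours (two spaces remain where 100 was)
    print(clean("This is python3 and learning it takes 100 hours"))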
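Removing a number with \b leaves the spaces on both sides of it behind. A small sketch of one way to tidy that up, assuming it is fine to collapse every run of whitespace into a single space (the function name is hypothetical):

```python
import re

def remove_standalone_numbers(text):
    # Drop digit runs that form a whole word on their own
    without_numbers = re.sub(r"\b\d+\b", "", text)
    # Collapse leftover whitespace runs and trim the edges
    return re.sub(r"\s+", " ", without_numbers).strip()

print(remove_standalone_numbers("This is python3 and learning it takes 100 hours"))
# This is python3 and learning it takes hours
print(remove_standalone_numbers("100 hours of python3"))
# hours of python3
```

Note that \b also treats punctuation as a boundary, so in a string such as "v 3.5" both 3 and 5 would be removed; whether that is desirable depends on the input.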

Related

Split string at a specific number that is also contained in a larger number in the same string [duplicate]

I have the string: 'This line is 14 1400'
I would like to split it keeping everything to the right of 14.
I have tried:
split2 = re.split('14', string)[2]
This returns: 00
I would like it to return 1400
How would I modify this to get this output? I've experimented with expression operations to only consider 14 but can't seem to get this to work.
To split only on 14 and not on 1400, use the word boundary metacharacter \b.
Make sure to use a raw string to avoid having to escape the \.
>>> split2 = re.split(r'\b14\b', string)
>>> split2
['This line is ', ' 1400']
>>> split2[1]
' 1400'
Alternatively, to also get rid of the leading space in ' 1400', do not split only on 14 but also on any spaces surrounding it:
>>> re.split(r'\s*\b14\b\s*', string)
['This line is', '1400']
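If the goal is only "everything to the right of the first standalone 14", the split can be limited to one cut. A minimal sketch, assuming the token may contain regex metacharacters (hence re.escape); the helper name text_after is made up for illustration:

```python
import re

def text_after(text, token):
    # Match the token only as a whole word, plus any surrounding whitespace
    pattern = r"\s*\b" + re.escape(token) + r"\b\s*"
    # maxsplit=1 stops after the first standalone occurrence
    parts = re.split(pattern, text, maxsplit=1)
    return parts[1] if len(parts) == 2 else None

print(text_after("This line is 14 1400", "14"))  # 1400
print(text_after("This line is 1400", "14"))     # None
```

Returning None when the token never appears as a whole word is a design choice of this sketch; raising an exception would work just as well.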

Need to find '$word;' pattern in string [duplicate]

I have a big text file and I need to find all words that start with '$' and end with ';', like $word;.
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
x = re.findall("$..;", text)
print(x)
I want my output like ['$h;', '$h_end;'] How can I do that?
I have to find all words starts with '$' and ends with ';' like $word;.
I would do:
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
result = re.findall(r'\$[^;]+;', text)
print(result)
Output:
['$h;', '$h_end;']
Note that $ needs to be escaped (\$) because it is one of the special characters. The pattern then matches 1 or more occurrences of anything but ; and finally the ; itself.
You may use
\$\w+;
Details:
\$ - a $ char
\w+ - 1+ letters, digits, _ (=word) chars
; - a semi-colon.
Python demo:
import re
text = "$h;BREWERY$h_end;You've built yourself a brewery."
x = re.findall(r"\$\w+;", text)
print(x) # => ['$h;', '$h_end;']
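To see why the pattern in the question finds nothing, and how to get just the names without the $ and ; delimiters, here is a short sketch. The capture group is an addition for illustration, not part of the answers above:

```python
import re

text = "$h;BREWERY$h_end;You've built yourself a brewery."

# Unescaped, $ is an end-of-string anchor, so nothing can follow it
unescaped = re.findall("$..;", text)

# Escaped \$ is a literal dollar sign; the group captures only the name
names = re.findall(r"\$(\w+);", text)

print(unescaped)  # []
print(names)      # ['h', 'h_end']
```

When a pattern contains exactly one group, re.findall returns the group contents rather than the whole match.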

RegEx: Extract Unknown # of Numbers of Unknown Length, With Separators, and Characters to Ignore [duplicate]

I am looking to extract numbers in the format:
[number]['/' or ' ' or '\' possible, ignore]:['/' or ' ' or '\'
possible, ignore][number]['/' or ' ' or '\' possible, ignore]:...
For example:
"4852/: 5934: 439028/:\23"
Would extract: ['4852', '5934', '439028', '23']
Use re.findall to extract all occurrences of a pattern. Note that you should use double backslash to represent a literal backslash in quotes.
>>> import re
>>> re.findall(r'\d+', '4852/: 5934: 439028/:\\23')
['4852', '5934', '439028', '23']
Python ships with a regex module, re, in both 2.7 and 3.x.
One function you could use is re.split().
A code snippet would be
import re
# Split on runs of '/', ':', '\' and spaces; the backslash must be escaped inside the character class
numbers = re.split(r'[/:\\ ]+', your_string)
The code above works if you only want to split on those specific non-alphanumeric characters. You could also split on all non-numeric characters, like this:
numbers = re.split(r'\D+', your_string)
Or you could extract the digit runs directly:
numbers = re.findall(r'\d+', your_string)
Kudos!
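The two approaches behave differently at the edges of the string. A small sketch, assuming the extracted values should end up as integers (the second input is made up to show a leading separator):

```python
import re

# Raw string, so \23 is a literal backslash followed by 23
raw = r"4852/: 5934: 439028/:\23"

# findall returns only the digit runs, ready to convert
numbers = [int(n) for n in re.findall(r"\d+", raw)]
print(numbers)  # [4852, 5934, 439028, 23]

# split leaves an empty string when the input starts with a separator
pieces = re.split(r"\D+", "/: 4852/: 5934")
print(pieces)  # ['', '4852', '5934']

# Filter the empty pieces before converting
print([int(p) for p in pieces if p])  # [4852, 5934]
```

For pure extraction, re.findall is usually the simpler choice because it never produces empty pieces.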

Working With Python Regex [duplicate]

I am trying to compile a regex in Python but am having limited success. I am doing the following:
import re
pattern = re.compile("([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
m = pattern.match("gb,&^(#)")
if m: print(1)
else: print(2)
I expect the above to print 2, but instead it prints 1. The regex should match strings as follows:
The first letter is alphanumeric or an underscore. All characters after that can be alphanumeric, an underscore, or a dash and there can be 0 or more characters after the first.
I was thinking that this thing should fail as soon as it sees the comma, but it is not.
What am I doing wrong here?
Anchor the pattern with $ so that it has to match the whole string:
import re

# Without $ at the end, match() succeeds as soon as a prefix of the string satisfies the regex
pattern = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)$")
m = pattern.match("gb,&^(#)")
if m:
    print(1)
else:
    print(2)  # this branch runs: the comma cannot match before $

# Your original pattern: it matches any string that merely starts with an alphanumeric character or underscore
pat = re.compile("^([a-zA-Z0-9_])([a-zA-Z0-9_-]*)")
if pat.match("_#"):
    print('match')  # this branch runs: "_" alone satisfies the pattern
else:
    print('no match 1')
This example should also help you understand the explanation by @Wiktor.
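On Python 3.4 and later, re.fullmatch expresses the "whole string must match" requirement without explicit anchors. A minimal sketch, reusing the character classes from the question; the helper and constant names are hypothetical:

```python
import re

# Hypothetical names; the character classes come from the question
NAME_RE = re.compile(r"[a-zA-Z0-9_][a-zA-Z0-9_-]*")

def is_valid_name(value):
    # fullmatch succeeds only if the entire string matches the pattern
    return NAME_RE.fullmatch(value) is not None

print(is_valid_name("gb,&^(#)"))  # False
print(is_valid_name("gb"))        # True
print(is_valid_name("_a-b"))      # True
print(is_valid_name("gb\n"))      # False
```

The last case is a subtle difference: with match() and a trailing $, "gb\n" would be accepted, because $ also matches just before a final newline.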

Remove words of length less than 4 from string [duplicate]

I am trying to remove words of length less than 4 from a string.
I use this regex:
re.sub(' \w{1,3} ', ' ', c)
This removes some words, but it fails when two or three words shorter than 4 characters appear in a row. For example:
I am in a bank.
It gives me:
I in bank.
How to resolve this?
Don't include the spaces; use \b word boundary anchors instead:
re.sub(r'\b\w{1,3}\b', '', c)
This removes words of up to 3 characters entirely, but leaves the surrounding spaces in place:
>>> import re
>>> re.sub(r'\b\w{1,3}\b', '', 'The quick brown fox jumps over the lazy dog')
' quick brown  jumps over  lazy '
>>> re.sub(r'\b\w{1,3}\b', '', 'I am in a bank.')
'    bank.'
If you want an alternative to regex:
new_string = ' '.join([w for w in old_string.split() if len(w)>3])
Martijn has already answered, but I wanted to explain why your regex doesn't work. The pattern ' \w{1,3} ' matches a space, followed by 1-3 word characters, followed by another space. The I isn't matched because it has no space in front of it. The ' am ' is replaced, and the regex engine resumes at the next unconsumed character: the i in in. It can't use the space before in, because that space was already consumed by the previous match (the space now in the output comes from the replacement). So the next match it finds is ' a ', which produces your output string.
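The difference between the two patterns, plus a cleanup step for the leftover spaces, can be sketched as follows. The function name is hypothetical, and collapsing all whitespace is an assumption about the desired output:

```python
import re

sentence = "I am in a bank."

# Space-delimited pattern: each match consumes the space the next word needs
spaced = re.sub(r" \w{1,3} ", " ", sentence)
print(spaced)  # I in bank.

def drop_short_words(text):
    # Remove whole words of 1 to 3 word characters
    stripped = re.sub(r"\b\w{1,3}\b", "", text)
    # Collapse the leftover runs of whitespace into single spaces
    return " ".join(stripped.split())

print(drop_short_words(sentence))  # bank.
print(drop_short_words("The quick brown fox jumps over the lazy dog"))
# quick brown jumps over lazy
```

Because \b treats an apostrophe as a boundary, a word such as "don't" would lose both "don" and "t"; the split-based alternative above avoids that by measuring whole whitespace-separated tokens.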
