Remove just the alphabet characters from a string - python

In Python, I have a string like
"dsafsadf_afasa_2.2.14_43.33_dsfd"
I need to get just
"2.2.14_43.33"
How do I do it?

It seems you're trying to remove all letters and all underscores, except underscores that sit between two digits:
>>> import re
>>> s = "dsafsadf_afasa_2.2.14_43.33_dsfd"
>>> re.sub(r'[a-z]|(?<=\D)_(?=\d)|(?<=\d)_(?=\D)|(?<=\D)_(?=\D)|^_+|_+$', '', s)
'2.2.14_43.33'
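The five-way alternation above can probably be collapsed. A sketch, assuming the rule is "drop every letter, and drop every underscore that does not sit between two digits": because re.sub evaluates lookarounds against the original string, two negative lookarounds express that rule in one pass. This sketch also swaps [a-z] for [A-Za-z] so uppercase letters are removed too, which is an assumption about what you want.

```python
import re

s = "dsafsadf_afasa_2.2.14_43.33_dsfd"

# Remove ASCII letters, plus any underscore that is not preceded by a digit
# or not followed by a digit (so only digit_digit underscores survive).
pattern = r'[A-Za-z]|(?<!\d)_|_(?!\d)'
result = re.sub(pattern, '', s)
print(result)  # 2.2.14_43.33
```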

If you just want to remove the letters, you can use str.translate. In Python 2:
s = "dsafsadf_afasa_2.2.14_43.33_dsfd"
from string import ascii_letters
print(s.translate(None,ascii_letters))
which outputs:
__2.2.14_43.33_
For Python 3:
from string import ascii_letters
print(s.translate({ord(ch):"" for ch in ascii_letters}))
If you also want to remove the underscores from both ends, use strip (again in Python 2 syntax):
s = "dsafsadf_afasa_2.2.14_43.33_dsfd"
from string import ascii_letters
print(s.translate(None,ascii_letters).strip("_"))
Output:
2.2.14_43.33
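In Python 3, str.translate no longer takes a deletechars argument, so the translate-and-strip version would look something like this sketch, which builds the deletion table with str.maketrans (its third argument lists the characters to delete):

```python
from string import ascii_letters

s = "dsafsadf_afasa_2.2.14_43.33_dsfd"

# Map every ASCII letter to None, i.e. delete it.
table = str.maketrans('', '', ascii_letters)
result = s.translate(table).strip("_")  # "__2.2.14_43.33_" -> "2.2.14_43.33"
print(result)
```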

You can simply use re.findall:
import re
p = re.compile(r'\d+(?:[\W_]\d+)*')
test_str = "dsafsadf_afasa_2.2.14_43.33_dsfd"
print(p.findall(test_str))  # ['2.2.14_43.33']
See demo.
https://regex101.com/r/hF1wE3/2
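If you only want the first such chunk as a string rather than a list, a compiled pattern's search method is enough. A small sketch; first_numeric_chunk is a hypothetical helper name, and it returns None when the string contains no digits:

```python
import re

# Same pattern as above: a run of digits, optionally followed by more
# digit runs, each joined by one non-word character or an underscore.
NUMERIC = re.compile(r'\d+(?:[\W_]\d+)*')

def first_numeric_chunk(text):
    """Return the first digit-and-separator chunk in text, or None."""
    m = NUMERIC.search(text)
    return m.group() if m else None

print(first_numeric_chunk("dsafsadf_afasa_2.2.14_43.33_dsfd"))  # 2.2.14_43.33
print(first_numeric_chunk("no digits here"))  # None
```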

Related

Remove n characters after certain character

I have a string that looks something like this:
*45hello I'm a string *2jwith some *plweird things
I need to remove every * together with the two characters that follow it, to get this:
hello I'm a string with some weird things
Is there a practical way to do it without iterating over the string?
Thanks!
Using a regular expression:
import re
s = "*45hello I'm a string *2jwith some *plweird things"
s = re.sub(r'\*..', '', s)  # a literal * followed by any two characters
print(s)  # hello I'm a string with some weird things
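The question's title asks about n characters in general. A sketch that parameterizes the marker and the count; remove_after is a hypothetical helper, and re.escape is used so that markers with a regex meaning (like *) are treated literally:

```python
import re

def remove_after(text, marker, n):
    """Remove each occurrence of marker together with the n characters after it."""
    # re.escape makes the marker literal; re.DOTALL lets '.' consume newlines too.
    pattern = re.escape(marker) + '.{%d}' % n
    return re.sub(pattern, '', text, flags=re.DOTALL)

s = "*45hello I'm a string *2jwith some *plweird things"
print(remove_after(s, '*', 2))  # hello I'm a string with some weird things
```

With .{n}, a marker that has fewer than n characters after it (at the very end of the string) is left untouched; using .{0,n} instead would remove it anyway.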
You can also use a regex with an explicit repetition count:
import re
regex = r"\*(.{2})"
test_str = "*45hello I'm a string *2jwith some *plweird things"
# The count argument sets the maximum number of replacements; 0 means replace all.
# Pass it by keyword: passing count positionally is deprecated since Python 3.13.
result = re.sub(regex, '', test_str, count=0)
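To illustrate the count argument mentioned in the comment, here is a small sketch comparing count=1 with the default of replacing every match:

```python
import re

test_str = "*45hello I'm a string *2jwith some *plweird things"

# count=1 stops after the first match; the default (0) replaces all matches.
first_only = re.sub(r"\*.{2}", '', test_str, count=1)
all_of_them = re.sub(r"\*.{2}", '', test_str)
print(first_only)   # hello I'm a string *2jwith some *plweird things
print(all_of_them)  # hello I'm a string with some weird things
```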

Python Regex: Remove optional characters

I have a regex pattern with optional characters, but in the output I want those optional characters removed. Example:
string = 'a2017a12a'
pattern = re.compile("((20[0-9]{2})(.?)(0[1-9]|1[0-2]))")
result = pattern.search(string)
print(result)
This gives me a match, but the output I want is:
desired output = '201712'
Thank you.
You've already captured the data you want in groups, so you can use re.sub to replace the whole match with just the contents of group 1 and group 2.
Try this modified version of your code:
import re
string = 'a2017a12a'
pattern = re.compile(".*(20[0-9]{2}).?(0[1-9]|1[0-2]).*")
result = re.sub(pattern, r'\1\2', string)
print(result)
Notice how I've added .* around the pattern, so any extra characters around your data are matched as well and get removed. I also dropped the parentheses that were not needed. This also works with strings that have other digits around that text, such as hello123 a2017a12a some other 99 numbers.
Output:
201712
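An alternative that keeps the question's original pattern unchanged: instead of re.sub, search once and concatenate only the groups you want. A minimal sketch; it assumes the first match in the string is the one you want:

```python
import re

string = 'a2017a12a'
# The question's original pattern: group 1 is the whole match, group 2 the
# year, group 3 the optional separator and group 4 the month.
pattern = re.compile("((20[0-9]{2})(.?)(0[1-9]|1[0-2]))")

match = pattern.search(string)
# Join year and month, skipping the separator captured in group 3.
result = match.group(2) + match.group(4) if match else None
print(result)  # 201712
```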
You can just use re.sub with the pattern \D (which matches any character that is not a digit):
>>> import re
>>> string = 'a2017a12a'
>>> re.sub(r'\D', '', string)
'201712'
Try this one:
import re
string = 'a2017a12a'
digits = re.findall(r"(\d+)", string)  # captures each run of consecutive digits
print("".join(digits))  # concatenate the runs
Output:
201712
If you want to remove all letters from the string, you can do this:
import re
string = 'a2017a12a'
re.sub(r'[A-Za-z]+', '', string)
Output:
'201712'
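The [A-Za-z]+ version and the \D version give the same result here, but they differ when the string contains characters that are neither letters nor digits. A quick comparison sketch; the helper names and the extra sample strings are made up for illustration:

```python
import re

def drop_letters(s):
    # Removes ASCII letters only; separators such as '-' or '/' survive.
    return re.sub(r'[A-Za-z]+', '', s)

def keep_digits(s):
    # Removes every non-digit character.
    return re.sub(r'\D', '', s)

for s in ['a2017a12a', 'a2017-12a', 'x2017/12 ']:
    print(repr(s), repr(drop_letters(s)), repr(keep_digits(s)))
```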
You can use functions from the re module to get the required output, for example:
import re
# method 1: delete every non-digit character
string = 'a2017a12a'
print(re.sub(r'\D', '', string))
# method 2: find the runs of digits and join them
digits = re.findall(r"(\d+)", string)
print("".join(digits))
You can also refer to the documentation below for further details.
https://docs.python.org/3/library/re.html

Replace string with double quotes + string using regex

I want to wrap each word in a string in double quotes. I need to do this in Python.
Input : {responseHeader:{status:0,QTime:94}}
Output : {"responseHeader":{"status":0,"QTime":94}}
I tried the regex /[^\d\W]+/g to match only the words, but I don't know how to replace each match with the same word wrapped in double quotes.
Try this:
>>> import re
>>> inp = '{responseHeader:{status:0,QTime:94}}'
>>> re.sub(r'([a-zA-Z]+)',r'"\1"',inp)
'{"responseHeader":{"status":0,"QTime":94}}'
([a-zA-Z]+)
Try this pattern and replace each match with "\1". See demo.
https://regex101.com/r/sJ9gM7/18#python
import re
p = re.compile(r'([a-zA-Z]+)', re.MULTILINE)
test_str = "{responseHeader:{status:0,QTime:94}}"
subst = r'"\1"'  # raw string: in a plain string, "\1" would be the control character chr(1)
result = re.sub(p, subst, test_str)
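The [a-zA-Z]+ pattern quotes every word, including any unquoted string values. If only the keys should be quoted, one option is a lookahead; this is a sketch assuming keys are identifiers followed by optional whitespace and a colon, and json.loads is used only to confirm the result parses. Keys that are already quoted are left alone, because a quote sits between them and the colon. A word followed by a colon inside a string value would still get quoted, so this only suits simple input like the example.

```python
import json
import re

inp = '{responseHeader:{status:0,QTime:94}}'

# Quote a bare identifier only when it is followed by ':' (i.e. it is a key).
quoted = re.sub(r'([A-Za-z_]\w*)(?=\s*:)', r'"\1"', inp)
print(quoted)              # {"responseHeader":{"status":0,"QTime":94}}
print(json.loads(quoted))  # {'responseHeader': {'status': 0, 'QTime': 94}}
```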

Python converting string to latex using regular expression

Say I have a string
string = "{1/100}"
I want to use regular expressions in Python to convert it into
new_string = r"\frac{1}{100}"
I think I would need to use something like this
new_string = re.sub(r'{.+/.+}', r'', string)
But I'm stuck on what I would put in order to preserve the characters in the fraction, in this example 1 and 100.
You can use () to capture the numbers. Then use \1 and \2 to refer to them:
new_string = re.sub(r'{(.+)/(.+)}', r'\\frac{\1}{\2}', string)
# \frac{1}{100}
Note: don't forget to escape the backslash as \\ in the replacement; otherwise re.sub interprets \f as a form feed.
Capture the numbers using parentheses and then reference them in the replacement text using \1 and \2. For example:
>>> print(re.sub(r'{(.+)/(.+)}', r'\\frac{\1}{\2}', "{1/100}"))
\frac{1}{100}
If the braces always contain number/number, you can use [0-9]+ instead of .+ (dot) in the regex:
>>> import re
>>> string = "{1/100}"
>>> new = re.sub(r'{([0-9]+)/([0-9]+)}', r'\\frac{\1}{\2}', string)
>>> print(new)
\frac{1}{100}
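Alternatively, use re.match, which is more flexible: it gives you the captured groups, and you can build the output from them however you like: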
>>> m = re.match(r'{(.+)/(.+)}', string)
>>> m.groups()
('1', '100')
>>> new_string = "\\frac{%s}{%s}"%m.groups()
>>> print(new_string)
\frac{1}{100}
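The .+ patterns above work for a single fraction, but because matching is greedy they run across several fractions in one string. A sketch showing the problem and one fix (excluding braces and / from each group); it also shows passing a function as the replacement, which sidesteps escaping the backslash. The two-fraction input string is an assumed example.

```python
import re

text = "{1/100} + {3/4}"

# Greedy '.+' spans both fractions, so the braces end up mismatched.
greedy = re.sub(r'{(.+)/(.+)}', r'\\frac{\1}{\2}', text)

# Excluding braces and '/' keeps every match inside a single fraction.
FRACTION = re.compile(r'\{([^{}/]+)/([^{}/]+)\}')
fixed = FRACTION.sub(r'\\frac{\1}{\2}', text)

# A function's return value is inserted as-is, so no backslash escaping is needed.
fixed_fn = FRACTION.sub(lambda m: '\\frac{%s}{%s}' % m.groups(), text)

print(greedy)    # \frac{1/100} + {3}{4}
print(fixed)     # \frac{1}{100} + \frac{3}{4}
print(fixed_fn)  # \frac{1}{100} + \frac{3}{4}
```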

How can I remove all the punctuations from a string?

I want to remove all punctuation from a string, x.
I want to use re.findall(), but I've been struggling to work out what to pass to it.
I know that I can get all the punctuation characters by writing:
import string
y = string.punctuation
but if I write:
re.findall(y,x)
it says:
raise error("multiple repeat")
sre_constants.error: multiple repeat
Can someone explain what exactly I should pass to re.findall?
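You may not even need a regex for this. You can simply use str.translate, like this (Python 3):
import string
print(x.translate(str.maketrans('', '', string.punctuation)))
# In Python 2 the equivalent was: print x.translate(None, string.punctuation)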
Several characters in string.punctuation have special meanings in regular expressions (for example *, +, ?, parentheses and brackets), so the string cannot be used as a pattern directly. Escape them with re.escape (Python 3.7+ output shown; older versions also escaped characters such as ! and @):
>>> import re
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> re.escape(string.punctuation)
'!"\\#\\$%\\&\'\\(\\)\\*\\+,\\-\\./:;<=>\\?@\\[\\\\\\]\\^_`\\{\\|\\}\\~'
And if you want to match any one of them, use a character class ([...]):
>>> '[{}]'.format(re.escape(string.punctuation))
'[!"\\#\\$%\\&\'\\(\\)\\*\\+,\\-\\./:;<=>\\?@\\[\\\\\\]\\^_`\\{\\|\\}\\~]'
>>> pattern = '[{}]'.format(re.escape(string.punctuation))
>>> re.sub(pattern, '', 'Hell,o World.')
'Hello World'
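Since the question asked specifically about re.findall, here is a sketch of that route: match the runs of characters that are not punctuation, using a negated character class ([^...]), and join them back together:

```python
import re
import string

# One or more characters that are NOT in string.punctuation.
not_punct = '[^{}]+'.format(re.escape(string.punctuation))

x = 'Hell,o World.'
pieces = re.findall(not_punct, x)  # ['Hell', 'o World']
cleaned = ''.join(pieces)
print(cleaned)  # Hello World
```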
