Python: replace only part of a re.sub match

The following Python script: re.sub("[^a-zA-Z]pi[^a-zA-Z]", "(math.pi)", "2pi3 + supirse")
results in: '(math.pi) + supirse'
While matching a non-letter before and after pi is critical, I do not want those non-letter characters to be replaced. I would like to see the following output: '2(math.pi)3 + supirse'
Note: a previous suggestion, re.sub(r"\Bpi\B", "(math.pi)", "2pi3 + supirse"),
results in every instance being replaced: '2(math.pi)3 + su(math.pi)rse', which is also NOT what I am looking for.

Use this instead: re.sub("(?<=[^a-zA-Z])pi(?=[^a-zA-Z])", "(math.pi)", "2pi3 + supirse")
Visualization: http://regex101.com/r/fX5wX3
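Another way to keep the surrounding characters, without lookarounds, is to capture them and write them back with backreferences in the replacement string. A minimal sketch using the example from the question:

```python
import re

expr = "2pi3 + supirse"

# Capture the non-letter on each side of "pi" as groups 1 and 2,
# then put them back around the replacement with \1 and \2.
result = re.sub(r"([^a-zA-Z])pi([^a-zA-Z])", r"\1(math.pi)\2", expr)
print(result)  # 2(math.pi)3 + supirse
```

Because the neighbouring characters are consumed by the match, two occurrences that share a single separator (e.g. "2pi*pi3") are not both replaced; the lookaround version does not have that limitation.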

Use lookahead/lookbehind:
import re
print(re.sub("(?<=[^a-zA-Z])pi(?=[^a-zA-Z])", "(math.pi)", "2pi3 + supirse"))
The output can be seen here: http://ideone.com/rSd8H
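A positive lookbehind such as (?<=[^a-zA-Z]) needs an actual character to inspect, so a pi at the very start or end of the string (e.g. "pi*2") is left alone. Negative lookarounds only assert that no letter is adjacent, so they also cover the string edges. A sketch (the helper name replace_pi is just for illustration):

```python
import re

# (?<![a-zA-Z]) : not preceded by a letter (also true at the start of the string)
# (?![a-zA-Z])  : not followed by a letter (also true at the end of the string)
PI_PATTERN = re.compile(r"(?<![a-zA-Z])pi(?![a-zA-Z])")

def replace_pi(expr):
    return PI_PATTERN.sub("(math.pi)", expr)

print(replace_pi("2pi3 + supirse"))  # 2(math.pi)3 + supirse
print(replace_pi("pi*2"))            # (math.pi)*2
```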

In fact you need a lowercase \b, which means word boundary, whereas \B means not a word boundary.
Try this:
import re
re.sub(r"\bpi\b", "(math.pi)", "2pi3 + supirse")
That would yield '2pi3 + supirse', i.e. the input unchanged.
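\b sits between a word character (letters, digits, underscore) and a non-word character. With digits on both sides of pi in "2pi3" there is no boundary to match. A quick check of where \b does and does not fire:

```python
import re

pattern = r"\bpi\b"

# Digits are word characters, so "2pi3" has no word boundary around "pi".
print(re.sub(pattern, "(math.pi)", "2pi3 + supirse"))  # 2pi3 + supirse

# With spaces (non-word characters) around it, \b does match.
print(re.sub(pattern, "(math.pi)", "2 * pi * 3"))  # 2 * (math.pi) * 3
```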

Related

replace before and after a string using re in python

I have a string like this: 'approved:rakeshc#IAD.GOOGLE.COM'
I would like to extract the text after ':' and before '#'.
In this case the text to be extracted is rakeshc.
It can be done using the split method: 'approved:rakeshc#IAD.GOOGLE.COM'.split(':')[1].split('#')[0]
but I would like it to be done using a regular expression.
This is what I have tried so far:
import re
iptext = 'approved:rakeshc#IAD.GOOGLE.COM'
re.sub('^(.*approved:)', "", iptext)  # gives everything after ':'
re.sub('(#IAD.GOOGLE.COM)$', "", iptext)  # gives everything before '#'
I would like to get the result with a single expression, one that replaces the whole string with just the middle part.
Here is a regex one-liner:
import re
inp = "approved:rakeshc#IAD.GOOGLE.COM"
output = re.sub(r'^.*:|#.*$', '', inp)
print(output) # rakeshc
The approach above strips all text from the start up to and including the :, as well as all text from the # to the end. This leaves behind the email ID.
Use a capture group to copy the part between the matches to the result.
result = re.sub(r'.*approved:(.*)#IAD\.GOOGLE\.COM$', r'\1', iptext)
Hope this works for you:
import re
input_text = "approved:rakeshc#IAD.GOOGLE.COM"
out = re.search(':(.+?)#', input_text)
if out:
    found = out.group(1)
    print(found)
You can use this one-liner:
re.sub(r'^.*:(\w+)#.*$', r'\1', iptext)
Output:
rakeshc
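One thing to keep in mind with the re.sub one-liners: if the pattern does not match, re.sub returns the input unchanged, so a malformed string silently comes back whole. A small helper built on re.fullmatch makes the failure explicit; the name extract_between and the choice to return None on failure are illustrative assumptions:

```python
import re
from typing import Optional

def extract_between(text: str) -> Optional[str]:
    """Return the text between the first ':' and the following '#', or None."""
    # [^:]*   : everything up to the first ':'
    # ([^#]*) : the captured part, up to the next '#'
    # #.*     : the '#' and the rest of the string
    m = re.fullmatch(r"[^:]*:([^#]*)#.*", text)
    return m.group(1) if m else None

print(extract_between("approved:rakeshc#IAD.GOOGLE.COM"))  # rakeshc
print(extract_between("no separators here"))               # None

# Compare: the re.sub one-liner just hands back the original string.
print(re.sub(r'^.*:|#.*$', '', "no separators here"))      # no separators here
```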

Trying to remove all punctuation characters from a string, but I keep getting // left in

I am trying to write a function to remove all punctuation characters from a string. I've tried several permutations of translate, replace, strip, etc. My latest attempt uses a brute-force approach:
import string
def clean_lower(sample):
    punct = list(string.punctuation)
    for c in punct:
        sample.replace(c, ' ')
    return sample.split()
That gets rid of almost all of the punctuation, but I'm left with // in front of one of the words. I can't seem to find any way to remove it. I've even tried explicitly replacing it with sample.replace('//', ' ').
What do I need to do?
Using translate is generally the fastest way to remove punctuation, and it removes // too:
import string
s = "This is! a string, with. punctuations? //"
def clean_lower(s):
    # the third argument of str.maketrans lists the characters to delete
    return s.translate(str.maketrans('', '', string.punctuation))
s = clean_lower(s)
print(s)
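Use regular expressions:
import re
def clean_lower(s):
    # [^\w\s] matches anything that is neither a word character nor whitespace
    return re.sub(r'[^\w\s]', '', s)
The function above removes every symbol except the underscore (which counts as a word character), while keeping letters, digits and whitespace.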
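If the goal is to drop exactly the characters in string.punctuation (underscore included), the character class can be built from that constant. re.escape takes care of characters such as ], \ and ^ that are special inside a class. A sketch, with strip_punctuation as a hypothetical helper name:

```python
import re
import string

# Character class containing every ASCII punctuation character, escaped.
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")

def strip_punctuation(s):
    return PUNCT_RE.sub("", s)

print(strip_punctuation("// Hello?) // World!").split())  # ['Hello', 'World']
```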
Perhaps you should approach it from the perspective of what you want to keep:
For example:
import string
toKeep = set(string.ascii_letters + string.digits + " ")
toRemove = set(string.printable) - toKeep
cleanUp = str.maketrans('', '', "".join(toRemove))
Usage:
s = "Hello! world of / and dice".translate(cleanUp)
# s will be 'Hello world of  and dice' (the spaces on both sides of '/' remain)
As suggested by @jasonharper, you need to reassign sample, because str.replace returns a new string instead of modifying the original. Then it works:
import string
sample='// Hello?) // World!'
print(sample)
punct=list(string.punctuation)
for c in punct:
    sample=sample.replace(c,'')
print(sample.split())
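A minimal demonstration of the difference between discarding and keeping the return value of str.replace (Python strings are immutable, so nothing is changed in place):

```python
sample = "// Hello?"

# The new string is created and immediately thrown away.
sample.replace("/", "")
print(repr(sample))  # '// Hello?'

# Rebinding the name keeps the new string.
sample = sample.replace("/", "")
print(repr(sample))  # ' Hello?'
```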

Print only alphabetics in a string using Regular Expression

Goal: I want only the letters in a string to be printed.
# Input
string = ' 529Wind3#. '
# Needed output
# 'Wind'
I tried coding this with the code below:
import re
string=re.sub('[^a-z]+[^A-Z]',' ',string)
print(string)
The output I'm getting is
ind
But this code only applies to lowercase letters.
Can you please tell me how to write code for both upper and lowercase?
Try using a list comprehension that checks whether each character is in string.ascii_letters and keeps it only if it is:
import string
String = ' 529Wind3#. '
print(''.join([i for i in String if i in string.ascii_letters]))
Output:
Wind
I agree with @U8-Forward's point, but I think you may also want to know why your regular expression isn't working. This
[^a-z]+[^A-Z]
doesn't do what you want because W is not a lowercase letter, so it matches [^a-z]+ and gets removed.
Put all of the characters you don't want in a single character class:
[^a-zA-Z]+
You need to write [^a-zA-Z] instead of [^a-z]+[^A-Z]. The + quantifier means "one or more of the preceding item"; it does not combine multiple conditions.
Try the code below for your requirement:
import re
string=re.sub('[^a-zA-Z]','',string)
print(string)
You can use re.findall:
import re
String = ' 529Wind3#. '
string = re.findall('[a-zA-Z]+', String)
print(''.join(string))
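For comparison, here is the same filter written three ways. One difference the question does not settle: str.isalpha also accepts non-ASCII letters such as é, whereas [a-zA-Z] and string.ascii_letters keep only ASCII letters. The function names are illustrative:

```python
import re
import string

def letters_regex(s):
    # Delete every run of characters that is not an ASCII letter.
    return re.sub(r"[^a-zA-Z]+", "", s)

def letters_ascii(s):
    # Keep only characters from string.ascii_letters.
    return "".join(ch for ch in s if ch in string.ascii_letters)

def letters_unicode(s):
    # str.isalpha() is true for letters in any script, not just ASCII.
    return "".join(ch for ch in s if ch.isalpha())

for f in (letters_regex, letters_ascii, letters_unicode):
    print(f.__name__, repr(f(" 529Wind3#. ")), repr(f("Café 2")))
```

All three return 'Wind' for the question's input; only letters_unicode keeps the é in "Café".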
print(re.sub('[^a-zA-Z]', '', string))

Parsing a MAC address with python

How can I convert a hex value "0000.0012.13a4" into "00:00:00:12:13:A4"?
text = '0000.0012.13a4'
text = text.replace('.', '').upper() # a little pre-processing
# chunk into groups of 2 and re-join
out = ':'.join([text[i : i + 2] for i in range(0, len(text), 2)])
print(out)
00:00:00:12:13:A4
import re
old_string = "0000.0012.13a4"
new_string = ':'.join(s for s in re.split(r"(\w{2})", old_string.upper()) if s.isalnum())
print(new_string)
OUTPUT
> python3 test.py
00:00:00:12:13:A4
>
Without modification, this approach can also handle some other MAC formats you might run into, like "00-00-00-12-13-a4".
Try the following code:
import re
hx = '0000.0012.13a4'.replace('.','')
print(':'.join(re.findall('..', hx)))
Output: 00:00:00:12:13:a4
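The answers here assume the input is well formed. A slightly more defensive sketch removes the usual separators, checks that exactly 12 hex digits remain, then groups them in uppercase pairs. The name format_mac and the choice to raise ValueError are assumptions, not part of the question:

```python
import re

def format_mac(raw: str) -> str:
    """Return raw as colon-separated uppercase pairs, e.g. '00:00:00:12:13:A4'."""
    # Remove the usual separators: dots, colons, hyphens and whitespace.
    digits = re.sub(r"[.:\-\s]", "", raw)
    # Reject anything that is not exactly 12 hexadecimal digits.
    if not re.fullmatch(r"[0-9a-fA-F]{12}", digits):
        raise ValueError(f"not a MAC address: {raw!r}")
    pairs = [digits[i:i + 2] for i in range(0, 12, 2)]
    return ":".join(pairs).upper()

print(format_mac("0000.0012.13a4"))     # 00:00:00:12:13:A4
print(format_mac("00-00-00-12-13-a4"))  # 00:00:00:12:13:A4
```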
There is a pretty simple three-step solution (hexStrBad below holds the input string, e.g. '0000.0012.13a4').
First we strip those pesky periods.
step1 = hexStrBad.replace('.','')
Then, if the formatting is consistent, insert a colon after every two characters:
step2 = step1[0:2] + ':' + step1[2:4] + ':' + step1[4:6] + ':' + step1[6:8] + ':' + step1[8:10] + ':' + step1[10:12]
Finally, convert it to uppercase:
step3 = step2.upper()
It's not the prettiest, but it will do what you need!
It's unclear what you're asking exactly, but if all you want is to make a string all uppercase, use .upper()
Try to clarify your question somewhat, because if you're asking about converting some weirdly formatted string into what looks like a MAC address, we need to know that to answer your question.

Python - get string between parentheses

Example (suppose this is a string):
0.1+0.5*(sign(t-0.5)+1)*t + sign(t+0.8)
I need to get the strings 't-0.5' and 't+0.8' between the parentheses of the sign functions (which here are just text inside a string), so that after substitution I get, for example:
0.1+0.5*(copysign(1,t-0.5)+1)*t + copysign(1,t+0.8)
Any help would be appreciated.
Your question is not very clear, but if you want a solution for this specific case, then here it is:
>>> s = '0.1+0.5*(sign(t-0.5)+1)*t + sign(t+0.8)'
>>> s.replace('sign(', 'copysign(1,')
'0.1+0.5*(copysign(1,t-0.5)+1)*t + copysign(1,t+0.8)'
If you want a more general replacement, it could be quite tricky; for instance, str.replace would also rewrite names that merely end in sign(, such as copysign(.
You can use regex:
>>> import re
>>> s = '0.1+0.5*(sign(t-0.5)+1)*t + sign(t+0.8)'
>>> print(re.sub(r'sign\((t.*?)\)', r'copysign(1,\1)', s))
0.1+0.5*(copysign(1,t-0.5)+1)*t + copysign(1,t+0.8)
Someone could probably come up with a better expression; I'm not that great at regex. But it works :).
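The non-greedy (t.*?)\) stops at the first closing parenthesis, so it breaks as soon as an argument contains parentheses of its own (e.g. sign((t-1)*2)), and it also requires the argument to start with t. Python's re module has no recursive patterns to balance arbitrary nesting, but a small scanner that counts parenthesis depth can. A sketch, assuming every sign( call is properly closed; replace_sign is a hypothetical name:

```python
import re

# \b avoids matching the tail of a longer name such as copysign( or mysign(.
SIGN_CALL = re.compile(r"\bsign\(")

def replace_sign(expr: str) -> str:
    """Rewrite every sign(x) as copysign(1,x); x may contain nested parentheses."""
    out = []
    pos = 0
    while True:
        m = SIGN_CALL.search(expr, pos)
        if m is None:
            out.append(expr[pos:])
            return "".join(out)
        # Scan forward from just after "sign(" to the matching ")".
        depth = 1
        i = m.end()
        while depth:
            if i >= len(expr):
                raise ValueError(f"unbalanced parentheses in {expr!r}")
            if expr[i] == "(":
                depth += 1
            elif expr[i] == ")":
                depth -= 1
            i += 1
        arg = expr[m.end():i - 1]  # text between the parentheses
        out.append(expr[pos:m.start()])
        # Recurse so that sign() calls inside the argument are rewritten too.
        out.append("copysign(1," + replace_sign(arg) + ")")
        pos = i

s = '0.1+0.5*(sign(t-0.5)+1)*t + sign(t+0.8)'
print(replace_sign(s))  # 0.1+0.5*(copysign(1,t-0.5)+1)*t + copysign(1,t+0.8)
```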
