replace before and after a string using re in python - python

I have a string like this: 'approved:rakeshc#IAD.GOOGLE.COM'
I would like to extract the text after ':' and before '#'.
In this case the text to be extracted is rakeshc.
It can be done using the split method: 'approved:rakeshc#IAD.GOOGLE.COM'.split(':')[1].split('#')[0]
But I want to do this with a regular expression.
This is what I have tried so far:
import re
iptext = 'approved:rakeshc#IAD.GOOGLE.COM'
re.sub('^(.*approved:)', "", iptext)  # gives everything after ':'
re.sub('(#IAD.GOOGLE.COM)$', "", iptext)  # gives everything before '#'
I would like to get the result with a single expression, one that replaces the whole string with only the middle part.

Here is a regex one-liner:
inp = "approved:rakeshc#IAD.GOOGLE.COM"
output = re.sub(r'^.*:|#.*$', '', inp)
print(output) # rakeshc
This approach strips all text from the start up to and including the ':', and all text from the '#' to the end. What is left is the user ID between them (rakeshc).

Use a capture group to copy the part between the matches to the result.
result = re.sub(r'.*approved:(.*)#IAD\.GOOGLE\.COM$', r'\1', iptext)

Hope this works for you:
import re
input_text = "approved:rakeshc#IAD.GOOGLE.COM"
out = re.search(':(.+?)#', input_text)
if out:
    found = out.group(1)
    print(found)

You can use this one-liner:
re.sub(r'^.*:(\w+)#.*$', r'\1', iptext)
Output:
rakeshc
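If the goal is only to pull out the middle part, re.search with a capture group avoids one pitfall of the re.sub one-liners: when the pattern does not match, re.sub returns the input unchanged instead of signalling a failure. A minimal sketch, assuming a hypothetical helper name extract_between and default delimiters ':' and '#' (the helper is not from the question):

```python
import re

def extract_between(text, start=":", end="#"):
    """Return the text between the first `start` and the next `end`, or None."""
    # re.escape keeps the delimiters literal even if they are regex metacharacters.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text)
    return match.group(1) if match else None

iptext = "approved:rakeshc#IAD.GOOGLE.COM"
print(extract_between(iptext))           # rakeshc
print(extract_between("no delimiters"))  # None
```

Returning None on a failed match lets the caller tell "nothing found" apart from a real result, which the re.sub versions cannot do.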

Related

Regex : replace url inside string

i have
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
i need a python regex expression to identify xxx-zzzzzzzzz.eeeeeeeeeee.fr to do a sub-string function to it
Expected output :
string : 'Server:PIPELININGSIZE'
the URL is inside a string, i tried a lot of regex expressions
Not sure if this helps, because your question was quite vaguely formulated. :)
import re
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
string_1 = re.search('[a-z.-]+([A-Z]+)', string).group(1)
print(f'string: Server:{string_1}')
Output:
string: Server:PIPELININGSIZE
No regex needed: split on the target substring and slice up to the colon.
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
last = string.split("fr", 1)[1]
first = string[:string.index(":")]
print(f'{first}:{last}')
Output:
Server:PIPELININGSIZE
The wording of the question suggests that you wish to find the hostname in the string, but the expected output suggests that you want to remove it. The following regular expression will create a tuple and allow you to do either.
import re
text = "Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE"  # avoid shadowing the built-in str
p = re.compile('^([A-Za-z]+[:])(.*?)([A-Z]+)$')
m = p.search(text)
result = m.groups()
# ('Server:', 'xxx-zzzzzzzzz.eeeeeeeeeee.fr', 'PIPELININGSIZE')
Remove the hostname:
print(f'{result[0]}{result[2]}')
# Output: Server:PIPELININGSIZE
Extract the hostname:
print(result[1])
# Output: 'xxx-zzzzzzzzz.eeeeeeeeeee.fr'
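To produce the expected output in one step, the same grouping idea can be combined with backreferences in re.sub. This is only a sketch; it assumes the hostname contains no uppercase letters and the trailing keyword is all uppercase, as in the sample string:

```python
import re

text = "Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE"

# Keep group 1 (the "Server:" prefix) and group 2 (the trailing uppercase word);
# the lazy .*? between them consumes the hostname.
cleaned = re.sub(r"^([A-Za-z]+:).*?([A-Z]+)$", r"\1\2", text)
print(cleaned)  # Server:PIPELININGSIZE
```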

replace a string after substring found in jython/python

I have a string like this
ABC/AAAA DEF/78kkk OBJ/89KKK KLE/67899
and I pass in the substring to find together with its new value. So if I pass DEF/00012, the original string
should become:
ABC/AAAA DEF/00012 OBJ/89KKK KLE/67899
I have tried with string.replace('DEF', 'DEF/00012')
I would get the output as
ABC/AAAA DEF/00012/78kkk OBJ/89KKK KLE/67899
any suggestions would be highly appreciated.
Thanks
I would do:
txt = 'ABC/AAAA DEF/78kkk OBJ/89KKK KLE/67899'
change = 'DEF'
changeto = 'DEF/00012'
newtxt = ' '.join(changeto if i.startswith(change) else i for i in txt.split(' '))
print(newtxt)
Output:
ABC/AAAA DEF/00012 OBJ/89KKK KLE/67899
I split the text at spaces and replaced the part beginning with DEF.
string.replace('DEF/78kkk', 'DEF/00012')
If by "substring" you mean that the characters following "DEF" are not fixed to a specific value, use a regular expression instead:
result = re.sub(r"DEF/\w+", "DEF/00012", string)
Assuming there really is a blank space after every "substring", you can use re to match everything up to the next whitespace:
import re
your_string = re.sub(r"DEF/\S*", "DEF/00012", your_string)
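The regex answers above can be generalized so that the key (here DEF) is taken from the replacement itself. A minimal sketch, assuming the tokens are separated by whitespace and have the form KEY/VALUE; the helper name replace_token is hypothetical:

```python
import re

def replace_token(text, new_token):
    """Replace the whitespace-delimited token that shares new_token's KEY/ prefix."""
    key = new_token.split("/", 1)[0]
    # (?<!\S) anchors the key at the start of a token; \S* consumes the old value.
    pattern = r"(?<!\S)" + re.escape(key) + r"/\S*"
    # A function replacement inserts new_token literally (no backslash processing).
    return re.sub(pattern, lambda m: new_token, text)

txt = "ABC/AAAA DEF/78kkk OBJ/89KKK KLE/67899"
print(replace_token(txt, "DEF/00012"))
# ABC/AAAA DEF/00012 OBJ/89KKK KLE/67899
```

If no token starts with the key, the text comes back unchanged.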

How to use a regex variable in a regular expression?

I am using the following pattern to clean a piece of text (replacing the matches with null):
{\s{\s\"[A-Za-z0-9.,\-:]*(?<!\bbecause\b)(?<!\bsince\b)\"\s}\s\"[A-Za-z0-9.,\-:]*\"\s}
I have a list of relators like "because" and "since" that could change every time. So I created a separate string which is a regex itself like:
lookahead_string = (?<!\bbecause\b)(?<!\bsince\b)
And put it in my original regex pattern and changed it like the following:
{\s{\s\"[A-Za-z0-9.,\-:]*'+lookahead_string+r'\"\s}\s\"[A-Za-z0-9.,\-:]*\"\s}
But the new pattern does not match the parts of the input text that could be matched using the original regex pattern. The code I am using is:
lookahead_string = ''
relators = ["because", "since"]
for rel in relators:
    lookahead_string += '(?<!\b'+rel+'\b)'
text = re.sub(r'{\s{\s\"[A-Za-z0-9.,\-:]*'+lookahead_string+r'\"\s}\s\"[A-Za-z0-9.,\-:]*\"\s}', "", text)
text = ' '.join(text.split())
What should I do to make it work?! I have already tried using re.escape and format string but none of them works in my case.
Edit: I removed the input output text because I thought it is a little confusing. However, I thank #DYZ for the good suggestion.
A suggestion: Instead of messing up with the complex string syntax, convert the string to a Python list.
import ast
l = ast.literal_eval("[" + s.replace("}", "],").replace("{", "[") + "]")
#[[[[['I'], 'PRP'], 'NP'], [[[[['did'], 'VBD'], [['not'], 'RB'], 'VP'],
# ..., 'S'], '']
Now you can apply simple list functions to your data and, when done, transform the list to a bracketed string.
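Staying with the regex approach: in an ordinary (non-raw) Python string literal, '\b' is the backspace character, not the regex word-boundary token, so the fragments built in the question's loop never contain a word boundary. Building them from raw strings avoids that. The sketch below uses a deliberately simplified pattern (a quoted word) rather than the full pattern from the question, and the sample text is made up for illustration:

```python
import re

relators = ["because", "since"]

# '\b' is a single backspace character; r'\b' is backslash + 'b' (word boundary).
assert "\b" == "\x08" and len(r"\b") == 2

# One negative lookbehind per relator, built from raw-string pieces.
lookbehinds = "".join(r"(?<!\b" + re.escape(rel) + r"\b)" for rel in relators)

# Simplified stand-in for the original pattern: a quoted word that is not a relator.
pattern = r'"[A-Za-z]*' + lookbehinds + r'"'

sample = '"because" "hello" "since" "other"'
print(re.findall(pattern, sample))  # ['"hello"', '"other"']
```

re.escape is applied to each relator so that a list entry containing regex metacharacters is still matched literally.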

Extract substring between specific characters

I have some strings like:
\i{}Agrostis\i0{} <L.>
I would like to get rid of the '\i{}' and '\i0{}' markers, so that I get just:
Agrostis <L.>
I've tried the following code (adapted from here):
m = re.search('\i{}(.+?)\i0', item_name)
if m:
    name = m.group(1).strip('\\')
else:
    name = item_name
It works in part, because when I run it I get just:
Agrostis
without the
<L.>
part (which I want to keep).
Any hints?
Thanks in advance for any assistance you can provide!
Use s.replace(r'\i{}', '') and s.replace(r'\i0{}', '')
You can do this in different ways.
The simplest one is to use str.replace:
s = r'\i{}Agrostis\i0{} <L.>'
s2 = s.replace(r'\i{}', '').replace(r'\i0{}', '')
Another way is to use re.sub()
You need to use the re.sub function.
In [34]: import re
In [35]: s = "\i{}Agrostis\i0{} <L.>"
In [36]: re.sub(r'\\i\d*{}', '', s)
Out[36]: 'Agrostis <L.>'
You could use a character class along with re.sub()
import re
regex = r'\\i[\d{}]+'
string = r"\i{}Agrostis\i0{} <L.>"
string = re.sub(regex, '', string)
print(string)
See a demo on ideone.com.
You can either use s.replace(r'\i{}', '') and s.replace(r'\i0{}', ''), as Julien said, or, continuing with the regex approach, change your pattern so that it also captures the text after the second marker:
m = re.search(r'\\i\{\}(.+?)\\i0\{\}(.+)', item_name)
And use m.group(1) + m.group(2) as the result.

Python : How to ignore a delimited part of a sentence?

I have the following line :
CommonSettingsMandatory = #<Import Project="[\\.]*Shared(\\vc10\\|\\)CommonSettings\.targets," />#,true
and I want the following output:
['commonsettingsmandatory', '<Import Project="[\\\\.]*Shared(\\\\vc10\\\\|\\\\)CommonSettings\\.targets," />', 'true']
If I do a simple split on the comma, it also splits inside the value whenever the value itself contains a comma, such as the one I wrote after targets.
So I want to ignore the text between the two # signs to make sure no splitting happens there.
I really don't know how to do it!
http://docs.python.org/library/re.html#re.split
import re
string = 'CommonSettingsMandatory = #toto,tata#, true'
splitlist = re.split(r'\s?=\s?#(.*?)#,\s?', string)
Then splitlist contains ['CommonSettingsMandatory', 'toto,tata', 'true'].
While you might be able to use split with a lookbehind, I would use the groups captured by this expression.
(\S+)\s*=\s*#([^#]+)#,\s*(.*)
m = re.search(expression, myString). Use m.group(1) for the first string, m.group(2) for the second, etc.
If I understand you correctly, you're trying to split the string using spaces as delimiters, but you want to also remove any text between pound signs?
If that's correct, why not simply remove the pound sign-delimited text before splitting the string?
import re
myString = re.sub(r'#.*?#', '', myString)
myArray = myString.split(' ')
EDIT: (based on revised question)
import re
myArray = re.findall(r'^(.*?) = #(.*?)#,(.*?)$', myString)
That will actually return a list of tuples containing your matches, in the form of:
[
    (
        'CommonSettingsMandatory',
        '<Import Project="[\\\\.]*Shared(\\\\vc10\\\\|\\\\)CommonSettings\\.targets," />',
        'true'
    )
]
(spacing added to illustrate the format better; call .lower() on the first item if you need it lowercased)
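Putting the capture-group idea together with the lowercase key from the expected output, a minimal sketch could look like this. The function name parse_setting is hypothetical, and it assumes every line has exactly the form key = #value#,flag with no '#' inside the value:

```python
import re

# key, then '=', then a '#'-delimited value (commas allowed inside), then ',flag'
SETTING_RE = re.compile(r"\s*(\S+)\s*=\s*#([^#]*)#\s*,\s*(.*?)\s*")

def parse_setting(line):
    """Split 'key = #value#,flag' into [key.lower(), value, flag]."""
    match = SETTING_RE.fullmatch(line)
    if match is None:
        raise ValueError("unexpected line format: %r" % line)
    key, value, flag = match.groups()
    return [key.lower(), value, flag]

line = r'CommonSettingsMandatory = #<Import Project="[\\.]*Shared(\\vc10\\|\\)CommonSettings\.targets," />#,true'
print(parse_setting(line))
```

Because the value is matched as "anything except #", the comma after targets stays inside the second item instead of splitting it.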
