How to get specific parts from a text? Python - python

I have a string like this.
'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'
I want to get only the values that begin with "up:".
Like this:
up:A0A0S2Z391
up:Q9Y263
up:Q9UL54.
How can I do that with Python?

You can use the re module for regular expressions. The dot in the pattern does not match a newline, so each match stops at the end of its line.
import re
text = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'
pattern = r'up:.*'
values = re.findall(pattern, text)
print(values)
Output:
['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263', 'up:Q9UL54', 'up:P04049', 'up:L7RRS6', 'up:P15056']

You could use the split() method for that.
Here is a link to the documentation:
https://docs.python.org/3/library/stdtypes.html?#str.split
Something like this could work for the string you posted:
s = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'
res = []
for i in s.split('up')[1:]:
    res.append('up' + i.split()[0])
print(res)
output:
['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263', 'up:Q9UL54', 'up:P04049', 'up:L7RRS6', 'up:P15056']
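If each line really is a tab-separated pair, as in the string posted in the question, another option is to split line by line and keep only the fields that start with "up:". This is a minimal sketch under that assumption, using a shortened copy of the input; the variable names are illustrative.

```python
s = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\n'

# Walk the lines, split each one on the tab, and keep fields starting with 'up:'.
up_values = [
    field
    for line in s.splitlines()
    for field in line.split('\t')
    if field.startswith('up:')
]
print(up_values)  # ['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263']
```

Unlike s.split('up'), this does not break if the letters "up" happen to appear inside an identifier.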

Related

replace before and after a string using re in python

I have a string like this: 'approved:rakeshc#IAD.GOOGLE.COM'
I would like to extract the text after ':' and before '#'.
In this case, the text to be extracted is rakeshc.
It can be done using the split method: 'approved:rakeshc#IAD.GOOGLE.COM'.split(':')[1].split('#')[0]
But I want to do this using a regular expression.
This is what I have tried so far:
import re
iptext = 'approved:rakeshc#IAD.GOOGLE.COM'
re.sub('^(.*approved:)', "", iptext)  # gives everything after ':'
re.sub('(#IAD.GOOGLE.COM)$', "", iptext)  # gives everything before '#'
I want to get the result with a single expression. The expression would be used to replace the string with only its middle part.
Here is a regex one-liner:
inp = "approved:rakeshc#IAD.GOOGLE.COM"
output = re.sub(r'^.*:|#.*$', '', inp)
print(output) # rakeshc
The approach above strips all text from the start up to and including the :, and all text from the # to the end. This leaves behind only the middle part, rakeshc.
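To see which pieces the two alternatives in that pattern remove, you can list the matches with re.findall. This is only an illustrative check on the sample string from the question.

```python
import re

inp = "approved:rakeshc#IAD.GOOGLE.COM"

# Each alternative matches one of the pieces that re.sub deletes.
removed = re.findall(r'^.*:|#.*$', inp)
print(removed)  # ['approved:', '#IAD.GOOGLE.COM']

# Deleting those pieces leaves the middle part.
middle = re.sub(r'^.*:|#.*$', '', inp)
print(middle)  # rakeshc
```

Note that ^.*: is greedy and runs to the last colon in the string, so input with a colon after the # would give a different result.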
Use a capture group to copy the part between the matches to the result.
result = re.sub(r'.*approved:(.*)#IAD\.GOOGLE\.COM$', r'\1', iptext)
Hope this works for you:
import re
input_text = "approved:rakeshc#IAD.GOOGLE.COM"
out = re.search(':(.+?)#', input_text)
if out:
    found = out.group(1)
    print(found)
You can use this one-liner:
re.sub(r'^.*:(\w+)#.*$', r'\1', iptext)
Output:
rakeshc

how to use python re findall and regex so that 2 conditions can be run simultaneously?

Here is my data string:
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
I use this code, but when I run it, it shows nothing for DATANORMAL:
mydata = re.findall(r'MYDATA=(.*)' r'_.*', mystring)
print(mydata)
It just shows: NOTNORMAL
I want both to work and to display the data like this:
DATANORMAL
NOTNORMAL
How do I do it? Thanks.
import re
mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
"""
mydata = re.findall(r'^\s*MYDATA=(?:.+_)?(.+?)\s*$', mystring, re.M)
print(mydata)
If you need the word before the _ rather than the one after it, use the regex r'^\s*MYDATA=(.+?)(?:_.+)?\s*$' in the code above.
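For reference, here is the variant regex mentioned above applied to the same sample data. It is a sketch that reuses the mystring value from this answer; the name before_underscore is illustrative.

```python
import re

mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
"""

# (.+?) captures lazily; the optional (?:_.+)? swallows an underscore
# and everything after it, so only the part before '_' is captured.
before_underscore = re.findall(r'^\s*MYDATA=(.+?)(?:_.+)?\s*$', mystring, re.M)
print(before_underscore)  # ['DATANORMAL', 'DATA']
```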
Based on what you describe, you might want to use an alternation here:
\bMYDATA=((?:DATA|(?:DATA_))\S+)\b
Script:
import re
inp = "some text MYDATA=DATANORMAL more text MYDATA=DATA_NOTNORMAL"
mydata = re.findall(r'\bMYDATA=((?:DATA|(?:DATA_))\S+)\b', inp)
print(mydata)
This prints:
['DATANORMAL', 'DATA_NOTNORMAL']
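The key change is to make the DATA_ prefix optional with a non-capturing group, (?:DATA_)?, so that both lines match. flags=re.M is not strictly needed here, since the pattern has no ^ or $ anchors: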
import re
mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL"""
pattern = re.compile(r"MYDATA=(?:DATA_)?(\w+)", flags=re.M)
print(pattern.findall(mystring))

Extract everything before a particular string in python

Let's say I have a string
s = 'ab#cD!.2e.cp'
I want to extract only ab#cD!.2e out of it. I am trying this:
print(re.search(r'^(.*?)\.cp',s).group())
But I am still getting the output ab#cD!.2e.cp. Can someone please tell me what I am doing wrong and what the correct regex should be?
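You probably meant to pass 1 as the argument to group. Called with no argument, group() is the same as group(0) and returns the whole match; group(1) returns the first capture group: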
import re
s = 'ab#cD!.2e.cp'
re.search(r'^(.*?)\.cp',s).group() # 'ab#cD!.2e.cp'
re.search(r'^(.*?)\.cp',s).group(0) # 'ab#cD!.2e.cp'
re.search(r'^(.*?)\.cp',s).group(1) # 'ab#cD!.2e'
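Instead of re.search, use re.findall. When the pattern contains a single capture group, re.findall returns the captured text rather than the whole match: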
import re
s = 'ab#cD!.2e.cp'
print(re.findall(r'^(.*?)\.cp',s)[0])
Output:
ab#cD!.2e
If it is really just about extracting everything before a certain string - as your title suggests - you don't need a regex at all but a simple split will do:
res = s.split('.cp')[0]
yields
'ab#cD!.2e'
Please be aware that this will return the original string if .cp was not found:
s = 'foo'
s.split('.cp')[0]
will return
'foo'
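If you need to tell "marker not found" apart from a real result, str.partition is one option, since it also returns the separator it found (an empty string when there is none). A minimal sketch; the helper name text_before is made up for illustration.

```python
def text_before(s, marker):
    """Return the part of s before marker, or None if marker is absent."""
    head, sep, _tail = s.partition(marker)
    # sep is '' when marker does not occur in s
    return head if sep else None

print(text_before('ab#cD!.2e.cp', '.cp'))  # ab#cD!.2e
print(text_before('foo', '.cp'))           # None
```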

How to use regex to find the middle of a string

I'm trying to extract certain values from a Blogger response; I want to get my blog names. How would I go about that with regex? I've tried Googling my issue, but unfortunately none of the answers helped in my case.
So my response looks something like this:
\\x22http://emyblog.blogspot.com/
So it always starts with \\x22http:// and ends with .blogspot.com/
I've tried the following re:
regEx = re.findall(b"""\x22http://(.*)\.blogspot\.com""", r)
But unfortunately it returned an empty list. Any ideas on how to solve this problem?
Use a raw string; otherwise the \x22 in your pattern is interpreted as the character " instead of as literal text. Also, re.findall is not necessarily the best fit here; re.search should suffice.
Assuming your byte-string is:
>>> r = rb'\\x22http://emyblog.blogspot.com/'
With byte-strings:
>>> res = re.search(rb'\\x22http://(.*)\.blogspot\.com/', r)
>>> res.group(1)
b'emyblog'
With normal strings:
>>> res = re.search(r'\\\\x22http://(.*)\.blogspot\.com/', r.decode('utf-8'))
>>> res.group(1)
'emyblog'
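The difference between the raw and non-raw spellings can be checked directly. A small illustration, with variable names chosen only for this example:

```python
import re

plain = '\x22'   # escape is processed: this is the single character "
raw = r'\x22'    # raw string: backslash, x, 2, 2

print(plain == '"', len(plain))  # True 1
print(len(raw))                  # 4

# The re module understands \x22 itself, so the raw pattern
# still matches a double quote in the searched text.
found = re.search(r'\x22http', '"http://emyblog.blogspot.com/')
print(found is not None)  # True
```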
Use r'' (a raw string literal) instead of b'':
import re
pattern = re.compile(r'\x22http://(.*)\.blogspot\.com')
match = pattern.match('\x22http://emyblog.blogspot.com/')
match.group(1)
# 'emyblog'
This seems to be working!
import re
text = "\x22http://emyblog.blogspot.com/"
regex = re.compile(r'\x22http://(.*)\.blogspot\.com')
print(regex.findall(text))

Complex regex in Python

I am trying to write a generic pattern using regex so that it fetches only particular things from the string. Let's say we have strings like GigabitEthernet0/0/0/0 or FastEthernet0/4 or Ethernet0/0.222. The regex should fetch the first 2 characters and all the numerals. Therefore, the fetched result should be something like Gi0000 or Fa04 or Et00222 depending on the above cases.
x = 'GigabitEthernet0/0/0/2'
m = re.search('([\w+]{2}?)[\\\.(\d+)]{0,}',x)
I don't understand how I should write the regular expression. The values can also be fetched in the form of a list. I wrote a few more patterns, but they aren't helping.
With regex, you can use the re.findall function.
>>> import re
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join(re.findall(r'\d', s))
'Gi0000'
OR
>>> ''.join(re.findall(r'^..|\d', s))
'Gi0000'
>>> ''.join(re.findall(r'^..|\d', 'Ethernet0/0.222'))
'Et00222'
OR
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join([i for i in s if i.isdigit()])
'Gi0000'
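The first variant above can be wrapped in a small helper and checked against the three examples from the question. This is a sketch; the function name abbreviate is an illustrative choice.

```python
import re

def abbreviate(name):
    """First two characters of name followed by every digit in it."""
    return name[:2] + "".join(re.findall(r"\d", name))

for name in ("GigabitEthernet0/0/0/0", "FastEthernet0/4", "Ethernet0/0.222"):
    print(name, "->", abbreviate(name))
# GigabitEthernet0/0/0/0 -> Gi0000
# FastEthernet0/4 -> Fa04
# Ethernet0/0.222 -> Et00222
```

This assumes the first two characters are letters; a digit among them would show up twice in the result.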
import re
z = "Ethernet0/0.222."
print(z[:2] + "".join(re.findall(r"(\d+)(?=[\d\W]*$)", z)))
You can try this. The lookahead makes sure that only the digits at the end of the string come into play.
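To see what the lookahead changes, compare it with a plain \d search on a name that has a digit in the middle. The interface name below is hypothetical, made up for the comparison.

```python
import re

z = "Port1Ethernet0/4"  # hypothetical name with a digit before the trailing numbers

# Plain \d picks up every digit, including the one inside the name.
all_digits = re.findall(r"\d", z)
print(all_digits)  # ['1', '0', '4']

# The lookahead only accepts digit runs that are followed by nothing but
# digits and non-word characters up to the end of the string.
trailing = re.findall(r"(\d+)(?=[\d\W]*$)", z)
print(trailing)  # ['0', '4']
print(z[:2] + "".join(trailing))  # Po04
```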
Here is another option:
s = 'Ethernet0/0.222'
"".join(re.findall(r'^\w{2}|\d+', s))
