Extract everything before a particular string in Python

Let's say I have a string
s = 'ab#cD!.2e.cp'
I want to extract only ab#cD!.2e out of it. I am trying this:
print(re.search(r'^(.*?)\.cp',s).group())
But I am still getting ab#cD!.2e.cp as the output. Can someone please tell me what I am doing wrong and what the correct regex would be?

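You probably meant to pass 1 as the argument to group. Without an argument, group() returns the whole match (group 0), including the trailing .cp; group(1) returns only the first capture group: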
import re
s = 'ab#cD!.2e.cp'
re.search(r'^(.*?)\.cp',s).group() # 'ab#cD!.2e.cp'
re.search(r'^(.*?)\.cp',s).group(0) # 'ab#cD!.2e.cp'
re.search(r'^(.*?)\.cp',s).group(1) # 'ab#cD!.2e'
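A minimal sketch of the same idea wrapped in a function, assuming you also want to cope with input that does not contain the marker. The name prefix_before is hypothetical and only for illustration. re.search returns None when nothing matches, so calling .group(1) directly on the result would raise an AttributeError in that case.

```python
import re

def prefix_before(text, marker):
    """Return the text before the first occurrence of marker, or None if absent."""
    # re.escape makes the marker literal, so the '.' in '.cp' is not a wildcard.
    match = re.search(r'^(.*?)' + re.escape(marker), text, flags=re.DOTALL)
    # group(0) would be the whole match; group(1) is only the captured prefix.
    return match.group(1) if match else None

print(prefix_before('ab#cD!.2e.cp', '.cp'))  # ab#cD!.2e
print(prefix_before('foo', '.cp'))           # None
```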

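Instead of re.search, use re.findall. When the pattern contains exactly one capture group, findall returns the captured text rather than the whole match: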
import re
s = 'ab#cD!.2e.cp'
print(re.findall(r'^(.*?)\.cp',s)[0])
Output:
ab#cD!.2e

If it is really just about extracting everything before a certain string - as your title suggests - you don't need a regex at all but a simple split will do:
res = s.split('.cp')[0]
yields
'ab#cD!.2e'
Please be aware that this will return the original string if .cp was not found:
s = 'foo'
s.split('.cp')[0]
will return
'foo'
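If you need to tell "marker not found" apart from "nothing to cut", str.partition may be a better fit than split, because it also returns the separator it found. A small sketch; the function name before_marker is hypothetical:

```python
def before_marker(text, marker):
    """Text before the first marker, or None when the marker is absent."""
    head, sep, _tail = text.partition(marker)
    # sep is the empty string when the marker was not found.
    return head if sep else None

print(before_marker('ab#cD!.2e.cp', '.cp'))  # ab#cD!.2e
print(before_marker('foo', '.cp'))           # None
```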

Related

Regex : replace url inside string

I have
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
I need a Python regex to identify xxx-zzzzzzzzz.eeeeeeeeeee.fr so that I can apply a substring operation to it.
Expected output:
string : 'Server:PIPELININGSIZE'
The URL is inside a string; I have tried a lot of regular expressions.
Not sure if this helps, because your question was quite vaguely formulated. :)
import re
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
string_1 = re.search('[a-z.-]+([A-Z]+)', string).group(1)
print(f'string: Server:{string_1}')
Output:
string: Server:PIPELININGSIZE
No regex needed. Just split on the text that ends the hostname:
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
last = string.split("fr", 1)[1]
first = string[:string.index(":")]
print(f'{first}:{last}')
This gives:
Server:PIPELININGSIZE
The wording of the question suggests that you wish to find the hostname in the string, but the expected output suggests that you want to remove it. The following regular expression will create a tuple and allow you to do either.
import re
s = "Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE"  # named s to avoid shadowing the built-in str
p = re.compile(r'^([A-Za-z]+[:])(.*?)([A-Z]+)$')
m = p.search(s)
result = m.groups()
# ('Server:', 'xxx-zzzzzzzzz.eeeeeeeeeee.fr', 'PIPELININGSIZE')
Remove the hostname:
print(f'{result[0]}{result[2]}')
# Output: Server:PIPELININGSIZE
Extract the hostname:
print(result[1])
# Output: xxx-zzzzzzzzz.eeeeeeeeeee.fr
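If removing the hostname is all that is needed, re.sub can do it in one step. This is only a sketch under an assumption that is not stated in the question: the hostname consists of lowercase letters, digits, dots and hyphens and sits directly after the "Server:" prefix.

```python
import re

line = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'

# Assumption: hostname = lowercase letters, digits, dots and hyphens,
# directly after "Server:". The prefix is captured and written back via \1.
cleaned = re.sub(r'^(Server:)[a-z0-9.-]+', r'\1', line)
print(cleaned)  # Server:PIPELININGSIZE
```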

How to get specific parts from a text? Python

I have a string like this.
'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'
I want to get only the values that begin with "up:".
Like this:
up:A0A0S2Z391
up:Q9Y263
up:Q9UL54.
How can I do that with Python?
You can use the re module (regular expressions):
import re
text = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'
pattern = r'up:.*'
values = re.findall(pattern, text)
print(values)
Output:
['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263', 'up:Q9UL54', 'up:P04049', 'up:L7RRS6', 'up:P15056']
You could use the split() method for that.
Here is a link to the documentation:
https://docs.python.org/3/library/stdtypes.html?#str.split
Something like this could work for the string you posted:
s = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\nhsa:9344\tup:Q9UL54\nhsa:5894\tup:P04049\nhsa:5894\tup:L7RRS6\nhsa:673\tup:P15056\n'
res = []
for i in s.split('up')[1:]:
    res.append('up' + i.split()[0])
print(res)
output:
['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263', 'up:Q9UL54', 'up:P04049', 'up:L7RRS6', 'up:P15056']
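Since the data looks like tab-separated fields, one pair per line, another option is to walk the lines and keep the fields that start with "up:". A sketch on a shortened sample of the string; the function name up_ids is hypothetical:

```python
def up_ids(text):
    """Collect every tab-separated field that starts with 'up:'."""
    ids = []
    for line in text.splitlines():
        for field in line.split('\t'):
            if field.startswith('up:'):
                ids.append(field)
    return ids

sample = 'hsa:578\tup:Q16611\nhsa:578\tup:A0A0S2Z391\nhsa:9373\tup:Q9Y263\n'
print(up_ids(sample))  # ['up:Q16611', 'up:A0A0S2Z391', 'up:Q9Y263']
```

Unlike splitting on the bare text 'up', this does not break if "up" happens to appear inside an identifier.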

How to use Python re.findall and a regex so that 2 conditions can be applied simultaneously?

Here is my data string:
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
I use this code, but when I run it, it shows nothing for DATANORMAL:
mydata = re.findall(r'MYDATA=(.*)' r'_.*', mystring)
print(mydata)
It only shows NOTNORMAL.
I want both to work and the data displayed like this:
DATANORMAL
NOTNORMAL
How do I do it? Thanks.
import re
mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
"""
mydata = re.findall(r'^\s*MYDATA=(?:.+_)?(.+?)\s*$', mystring, re.M)
print(mydata)
If you need the word before the _ rather than the one after it, use the regex r'^\s*MYDATA=(.+?)(?:_.+)?\s*$' in the code above.
Based on what you describe, you might want to use an alternation here:
\bMYDATA=((?:DATA|(?:DATA_))\S+)\b
Script:
import re
inp = "some text MYDATA=DATANORMAL more text MYDATA=DATA_NOTNORMAL"
mydata = re.findall(r'\bMYDATA=((?:DATA|(?:DATA_))\S+)\b', inp)
print(mydata)
This prints:
['DATANORMAL', 'DATA_NOTNORMAL']
I guess you need to add flags=re.M?
import re
mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL"""
pattern = re.compile(r"MYDATA=(?:DATA_)?(\w+)", flags=re.M)  # raw string so \w is not treated as a string escape
print(pattern.findall(mystring))
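For comparison, here is a sketch without a regex. It assumes every relevant line has the form MYDATA=value and that you want the part after the last underscore, or the whole value when there is no underscore. The function name last_part is hypothetical.

```python
def last_part(line):
    """Return the value after '=', reduced to the part after the last '_'."""
    _key, _eq, value = line.strip().partition('=')
    # rpartition returns ('', '', value) when there is no '_', so [2] is the whole value.
    return value.rpartition('_')[2]

mystring = """
MYDATA=DATANORMAL
MYDATA=DATA_NOTNORMAL
"""

values = [last_part(line)
          for line in mystring.splitlines()
          if line.strip().startswith('MYDATA=')]
print(values)  # ['DATANORMAL', 'NOTNORMAL']
```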

How to use regex to find the middle of a string

I'm trying to get certain results out of the response from Blogger. I want to get my blog names. How would I go about that with a regex? I've tried Googling my issue, but unfortunately none of the answers helped in my case.
So my response looks something like this:
\\x22http://emyblog.blogspot.com/
So it always starts with \\x22http:// and ends with .blogspot.com/
I've tried the following re:
regEx = re.findall(b"""\x22http://(.*)\.blogspot\.com""", r)
But unfortunately it returned an empty list. Any ideas on how to solve this problem?
Thanks.
Use a raw string; otherwise \x22 is interpreted as the character " instead of as the literal characters backslash, x, 2, 2. I am also not sure that re.findall is the right method here; re.search should suffice.
Assuming your byte-string is:
>>> r = rb'\\x22http://emyblog.blogspot.com/'
With byte-strings:
>>> res = re.search(rb'\\x22http://(.*)\.blogspot\.com/', r)
>>> res.group(1)
b'emyblog'
With normal strings:
>>> res = re.search(r'\\\\x22http://(.*)\.blogspot\.com/', r.decode('utf-8'))
>>> res.group(1)
'emyblog'
Use r'' (a raw string literal) instead of b'':
import re
pattern = re.compile(r'\x22http://(.*)\.blogspot\.com')
match = pattern.match('\x22http://emyblog.blogspot.com/')
match.group(1)
# 'emyblog'
This seems to be working!
import re
text = "\x22http://emyblog.blogspot.com/"
regex = re.compile(r'\x22http://(.*)\.blogspot\.com')  # raw string; the re module reads \x22 as the " character
print(regex.findall(text))
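The answers above differ on whether the response holds a real double quote or the literal text backslash-x-2-2. The sketch below shows the difference; it assumes, as the first answer does, that the response contains the escape as literal text. The variable names are made up for illustration.

```python
import re

# In a normal bytes literal Python itself turns \x22 into one byte: the double quote.
plain = b"\x22http://"
# In a raw bytes literal the backslash, 'x', '2', '2' stay as four separate bytes.
raw = rb"\x22http://"
print(len(plain), len(raw))  # 8 11

# Assumption: the response contains the escape as literal text, not a real quote.
response = rb"\x22http://emyblog.blogspot.com/"

# A pattern that looks for a real double quote finds nothing in that response...
no_match = re.findall(b"\x22http://(.*?)\\.blogspot\\.com", response)
print(no_match)  # []

# ...while escaping the backslash for the regex engine matches the literal text.
found = re.findall(rb"\\x22http://(.*?)\.blogspot\.com", response)
print(found)  # [b'emyblog']
```

Printing repr() of the real response is a quick way to check which of the two cases applies.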

Extracting text in the middle of a string - python

I was wondering if anyone has a simpler solution for extracting a few letters from the middle of a string. I want to retrieve the 3 letters (in this case, GMB), and all the entries follow the same pattern. I'm struggling to find a simpler way of doing this.
Here is an example of what I've been using:
entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"
symbol = entry.strip('entries-alphabetical.jsp?raceid13=')
symbol = symbol[0:3]
print(symbol)
Thanks.
First of all, the argument passed to str.strip is not a prefix or suffix; it is a set of characters, and any of them are stripped from both ends of the string.
Since the string looks like a URL, you can use urllib.parse.parse_qsl (urlparse.parse_qsl in Python 2):
>>> from urllib.parse import parse_qsl
>>> parse_qsl(entry)
[('entries-alphabetical.jsp?raceid13', 'GMB$20140313A')]
>>> parse_qsl(entry)[0][1][:3]
'GMB'
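To make the point about str.strip concrete, here is a short sketch. It also shows str.partition as one way to cut at a known prefix; it assumes the text "raceid13=" always precedes the symbol.

```python
entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"

# strip() removes any of the listed characters from both ends, not a prefix:
# every character of "banana" is in the set, so nothing is left.
stripped = "banana".strip("ban")
print(repr(stripped))  # ''

# Assumption: "raceid13=" always precedes the symbol.
_before, _sep, after = entry.partition("raceid13=")
symbol = after[:3]
print(symbol)  # GMB
```

The strip call in the question only works because neither G nor the trailing A appears in the character set.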
This is what regular expressions are for. http://docs.python.org/2/library/re.html
import re
val = re.search(r'=([A-Z]{3})', entry)  # capture the three capital letters that follow '='
print(val.group(1))
