How to output only the match string using Python? - python

I want to match a string and then print the string that matched.
I need to find the string mapping = "C111" in every item of the list.
Here is what I tried. I can find the match, but I cannot print the whole string that matched.
import re
AllString = ["123A", "B456", r"AGHF\C111\B321", "3FEW/D654"]
print(type(AllString))
for str in AllString:
    mapping = "C111"
    findid = [re.match(mapping, str)]
    for f in findid:
        if f is not None:
            print(f)
The output looks like this:
<re.Match object; span=(0, 4), match='C111'>
The result I expect is the whole string, "AGHF\C111\B321".
Can anyone help, please? Thank you so much.

import re
AllString = ["123A", "B456", r"C111\B321", "3FEW/D654"]
print(type(AllString))
for str in AllString:
    mapping = "C111"
    findid = [re.match(mapping, str)]
    for f in findid:
        if f is not None:
            print(f.string)  # output: C111\B321, the whole input string of the match
Or:
import re
AllString = ["123A", "B456", r"C111\B321", "3FEW/D654"]
print(type(AllString))
for str in AllString:
    mapping = "C111"
    findid = [re.match(mapping, str)]
    for f in findid:
        if f is not None:
            print(mapping)  # meets the requirement, but only prints the pattern itself
Update:
import re
AllString = ["123A", "B456", r"AGHF\C111\B321", "3FEW/D654"]
print(type(AllString))
for str in AllString:
    # re.match anchors at the start of the string, so the pattern must cover the prefix
    # (.+ also requires at least one character before and after C111)
    mapping = r".+C111.+"
    findid = [re.match(mapping, str)]
    for f in findid:
        if f is not None:
            print(f.string)

One problem with the code is that re.match only matches at the start of the string. You could use re.search instead, but there is no need for a regular expression here; use the in operator:
strings = ['123A', 'B456', r'AGHF\C111\B321', '3FEW/D654']
for s in strings:
    if 'C111' in s:
        print(s)
Output:
AGHF\C111\B321
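To make the re.match versus re.search difference concrete, here is a small sketch on the same list. re.match only tries position 0, while re.search scans the whole string; on a successful match, the match object's string attribute holds the entire input string, which is what the question asks for.

```python
import re

strings = ["123A", "B456", r"AGHF\C111\B321", "3FEW/D654"]

# re.match only succeeds if the pattern matches at index 0.
print(re.match("C111", r"AGHF\C111\B321"))  # None

# re.search scans the whole string, so it finds C111 in the middle.
m = re.search("C111", r"AGHF\C111\B321")
print(m.group())  # C111 (only the matched text)
print(m.string)   # AGHF\C111\B321 (the whole input string)

# Collect every string that contains the target anywhere.
matches = [s for s in strings if re.search("C111", s)]
print(matches)  # ['AGHF\\C111\\B321'] (the list repr doubles the backslashes)
```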
If you need to match the exact alphanumeric sequence with no extra letters or digits around it, use re.search with \b (word boundaries):
import re
strings = ["123A", "B456", r"C111\B321", "3FEW/ABC111DEF/D654", r"ABC\C111/DEF"]
for s in strings:
    if re.search(r'\bC111\b', s):
        print(s)
Output:
C111\B321
ABC\C111/DEF
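If the token comes from a variable rather than a literal, it is worth passing it through re.escape so that regex metacharacters in it are treated literally. A minimal sketch; the helper name find_exact is hypothetical:

```python
import re

def find_exact(strings, token):
    """Return the strings containing `token` as a whole word (hypothetical helper)."""
    # re.escape neutralises metacharacters such as '.', '(' or '+' in the token.
    pattern = re.compile(r"\b" + re.escape(token) + r"\b")
    return [s for s in strings if pattern.search(s)]

strings = ["123A", "B456", r"C111\B321", "3FEW/ABC111DEF/D654", r"ABC\C111/DEF"]
print(find_exact(strings, "C111"))
```

Note that \b assumes the token begins and ends with a word character; for a token that starts or ends with punctuation, lookarounds such as (?<!\w) and (?!\w) would be needed instead.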

Related

Capture substring within a string - dynamically

I have a string:
ostring = "Ref('r1_featuring', ObjectId('5f475')"
What I am trying to do is check whether the string starts with Ref and, if it does, remove everything except the substring 5f475.
I know this can be done using a simple replace like so:
string = ostring.replace("Ref('r1_featuring', ObjectId('", '').replace("')", '')
But I cannot do it this way, because it needs to be dynamic: the string will be different each time. So I need to check whether the string starts with Ref and, if it does, extract the alphanumeric value.
Desired Output:
5f475
Any help will be appreciated.
Like that?
>>> import re
>>> pattern = r"Ref.*'(.*)'\)$"
>>> m = re.match(pattern, "Ref('r1_featuring', ObjectId('5f475')")
>>> if m:
...     print(m.group(1))
...
5f475
>>> # Python 3.8+ only (assignment expression :=)
>>> if m := re.match(pattern, "Ref('r1_featuring', ObjectId('5f475')"):
...     print(m.group(1))
...
5f475
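The same pattern can be wrapped in a small function that returns None when the string does not start with Ref. This is a sketch; the function name extract_ref_id and the second sample string are made up for illustration.

```python
import re

# Same pattern as above: greedy prefix, then the last quoted value before the final ')'.
PATTERN = re.compile(r"Ref.*'(.*)'\)$")

def extract_ref_id(text):
    """Return the last quoted value of a Ref(...) string, or None (hypothetical helper)."""
    m = PATTERN.match(text)  # match() anchors at the start, so text must begin with Ref
    return m.group(1) if m else None

print(extract_ref_id("Ref('r1_featuring', ObjectId('5f475')"))  # 5f475
print(extract_ref_id("Other('x', ObjectId('abc')"))             # None
```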
A regex-free solution:
ostring = "Ref('r1_featuring', ObjectId('5f475')"
if ostring.startswith("Ref"):
    desired_part = ostring.rpartition("('")[-1].rpartition("')")[0]
    print(desired_part)  # 5f475
See the documentation for str.rpartition.
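To see what the two rpartition calls do, here are the intermediate values. str.rpartition(sep) splits at the last occurrence of sep and returns a 3-tuple (before, sep, after).

```python
ostring = "Ref('r1_featuring', ObjectId('5f475')"

if ostring.startswith("Ref"):
    # Split at the LAST "('" and keep what follows it.
    before, sep, after = ostring.rpartition("('")
    print(after)         # 5f475')
    # Split the remainder at the last "')" and keep the part before it.
    desired_part, _, _ = after.rpartition("')")
    print(desired_part)  # 5f475
```

If the separator is not found, rpartition returns ('', '', s), so the [-1] in the one-liner leaves the string unchanged instead of raising an error.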

How to remove characters from a str in python?

I have the following strings, and I want to delete characters from them.
For example:
from str1 = "A.B.1912/2013(H-0)02322"
to 1912/2013
from str2 = "I.M.1591/2017(I-299)17529"
to 1591/2017
from str3 = "I.M.C.15/2017(I-112)17529"
to 15/2017
I tried this, but I still need to remove everything from ( to the right:
newStr = str1.strip('A.B.')
'1912/2013(H-0)02322'
For the moment I'm doing it with slice notation
str1 = "A.B.1912/2013(H-0)02322"
str1 = str1[4:13]
'1912/2013'
But not all have the same length.
Any ideas or suggestions?
With some (modest) assumptions about the format of the strings, here's a solution without using regex:
First split the string on the ( character, keeping the substring on the left:
left = str1.split('(')[0]  # "A.B.1912/2013"
Then, split the result on the last . (i.e. split from the right just once), keeping the second component:
cut = left.rsplit('.', 1)[1] # "1912/2013"
Or, combining the two steps into a function:
def extract(s):
    return s.split('(')[0].rsplit('.', 1)[1]
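As a quick check, here is the function applied to the three example strings. It assumes every string contains at least one '.' before the first '('; otherwise rsplit(...) returns a one-element list and [1] raises IndexError.

```python
def extract(s):
    # Drop everything from the first '(' onwards, then keep what follows the last '.'.
    return s.split('(')[0].rsplit('.', 1)[1]

samples = ["A.B.1912/2013(H-0)02322", "I.M.1591/2017(I-299)17529", "I.M.C.15/2017(I-112)17529"]
for s in samples:
    print(extract(s))
# 1912/2013
# 1591/2017
# 15/2017
```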
Use a regex instead:
import re
regex = re.compile(r'\d+/\d+')
print(regex.search(str1).group())
print(regex.search(str2).group())
print(regex.search(str3).group())
Output:
1912/2013
1591/2017
15/2017
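One caveat with calling .group() directly: regex.search returns None when nothing matches, and None.group() raises AttributeError. A defensive sketch; the helper name extract_pair and the sample "no-slash-here" are illustrative only.

```python
import re

regex = re.compile(r'\d+/\d+')

def extract_pair(s):
    """Return the first 'digits/digits' run in s, or None if there is none (hypothetical helper)."""
    m = regex.search(s)
    return m.group() if m else None

samples = ["A.B.1912/2013(H-0)02322", "I.M.C.15/2017(I-112)17529", "no-slash-here"]
for s in samples:
    print(extract_pair(s))
# 1912/2013
# 15/2017
# None
```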
We can also use re.sub with a capture group:
import re
str1 = "A.B.1912/2013(H-0)02322"
output = re.sub(r'.*\b(\d+/\d+)\b.*', r'\1', str1)
print(output)
1912/2013
A regular expression with re.findall also works:
import re
pattern = r'\d+/\d+'
str1 = "A.B.1912/2013(H-0)02322"
str2 = "I.M.1591/2017(I-299)17529"
str3 = "I.M.C.15/2017(I-112)17529"
print(*re.findall(pattern, str1))
print(*re.findall(pattern, str2))
print(*re.findall(pattern, str3))
Output:
1912/2013
1591/2017
15/2017

How to parse values appear after the same string in python?

I have input text like this (the actual text file also contains lots of garbage characters around these two strings):
(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)
I am trying to parse the text to store something like this:
value1="xxx" and value2="yyy".
I wrote python code as follows:
value1_start = content.find('value')
value1_end = content.find(';', value1_start)
value2_start = content.find('value')
value2_end = content.find(';', value2_start)
print "%s" %(content[value1_start:value1_end])
print "%s" %(content[value2_start:value2_end])
But it always returns:
value=xxx
value=xxx
Could anyone tell me how I can parse the text so that the output is:
value=xxx
value=yyy
Use a regex approach:
re.findall(r'\bvalue=[^;]*', s)
Or, if the key can be any run of one or more word characters (letters, digits, underscore):
re.findall(r'\b\w+=[^;]*', s)
Details:
\b - word boundary
value= - a literal char sequence value=
[^;]* - zero or more chars other than ;.
For example:
import re
rx = re.compile(r"\bvalue=[^;]*")
s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"
res = rx.findall(s)
print(res)  # ['value=xxx', 'value=yyy']
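Since the goal is value1 = "xxx" and value2 = "yyy", a capture group around the value lets findall return only the values, which can then be unpacked. A sketch that assumes exactly two matches (the unpacking raises ValueError otherwise):

```python
import re

s = "$%$%&^(&value=xxx;$%^$%^$&^%^*value=yyy;%$#^%"

# With one capture group, findall returns only the captured text.
values = re.findall(r"\bvalue=([^;]*)", s)
print(values)  # ['xxx', 'yyy']

value1, value2 = values
```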
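Use a regex to pull the data you want out of the junk characters:
>>> import re
>>> _input = '#4#5%value=xxx38u952035983049;3^&^*(^%$3value=yyy#%$#^&*^%;$#%$#^'
>>> matches = re.findall(r'[a-zA-Z0-9]+=[a-zA-Z0-9]+', _input)
>>> matches
['value=xxx38u952035983049', '3value=yyy']
>>> for match in matches:
...     print(match)
...
value=xxx38u952035983049
3value=yyy
Note that alphanumeric junk directly adjacent to the key or the value is swept into the match, so this pattern only gives clean results when the pair is delimited by non-alphanumeric characters.
Summary of the regular expression: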
[a-zA-Z0-9]+: One or more alphanumeric characters
=: literal equal sign
[a-zA-Z0-9]+: One or more alphanumeric characters
For this input:
content = '(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)'
use a simple regex and manually strip off the first and last two characters:
import re
values = [x[2:-2] for x in re.findall(r'\*\*value=.*?\*\*', content)]
for value in values:
    print(value)
Output:
value=xxx
value=yyy
Here the assumption is that there are always two leading and two trailing * as in **value=xxx**.
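The slicing can also be avoided by putting a capture group around the part to keep; findall then returns only the group. This sketch makes the same assumption about the surrounding ** markers:

```python
import re

content = '(random_garbage_char_here)**value=xxx**;(random_garbage_char_here)**value=yyy**;(random_garbage_char_here)'

# The group excludes the ** delimiters, so no [2:-2] slicing is needed.
found = re.findall(r'\*\*(value=.*?)\*\*', content)
for value in found:
    print(value)
# value=xxx
# value=yyy
```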
You already have good answers based on the re module; that is certainly the simplest way.
If for any reason (performance?) you prefer to use str methods, that is possible too, but you must start the search for the second value after the end of the first one:
value2_start = content.find('value', value1_end)
value2_end = content.find(';', value2_start)
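Extending that idea to any number of occurrences, a loop can keep moving the start offset past each ';'. A sketch; find_all_values is a hypothetical name, the sample content is a simplified stand-in for the real file, and like the original it assumes every value is terminated by ';':

```python
def find_all_values(content, key="value"):
    """Collect every 'key=...' substring up to the next ';' using str.find (hypothetical helper)."""
    results = []
    start = content.find(key)
    while start != -1:
        end = content.find(';', start)
        if end == -1:  # no terminator: stop rather than slicing to index -1
            break
        results.append(content[start:end])
        start = content.find(key, end)  # resume the search after this value
    return results

content = '(garbage)value=xxx;(garbage)value=yyy;(garbage)'
for item in find_all_values(content):
    print(item)
# value=xxx
# value=yyy
```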

Regex Python capture string in quotes

I have a file with lines of this form:
ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName
and I would like to capture the names in quotes "" after ClientsName(0) = and ClientsName(1) =.
So far, I came up with this code:
import re
f = open('corrected_clients_data.txt', 'r')
result = ''
re_name = "ClientsName\(0\) = (.*)"
for line in f:
    name = re.search(line, re_name)
    print(name)
which returns None for every line.
I suspect two possible sources of error: the backslashes and the capture group (.*).
You can do that more easily using re.findall and using \d instead of 0 to make it more general:
import re
s = '''ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName'''
print(re.findall(r'ClientsName\(\d\) = "([^"]*)"', s))
['SUPERBRAND', 'GREATSTUFF']
Another thing to note is that the order of the arguments to search() and findall() in your code is wrong. It should be re.search(pattern, string).
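Putting both fixes together for the line-by-line loop from the question (pattern first, then the line). In this sketch io.StringIO stands in for corrected_clients_data.txt so it runs on its own; with the real file, opening it with `with open('corrected_clients_data.txt') as f:` would take its place.

```python
import io
import re

# Stand-in for the real file: one line copied from the question.
f = io.StringIO('ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName\n')

# Raw string keeps the backslashes intact; the group captures the text between the quotes.
re_name = re.compile(r'ClientsName\(\d\) = "([^"]*)"')

all_names = []
for line in f:
    names = re_name.findall(line)  # compiled pattern: only the string is passed
    print(names)                   # ['SUPERBRAND', 'GREATSTUFF']
    all_names.extend(names)
```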
You can use re.findall and just take the first two matches:
>>> import re
>>> s = '''ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName'''
>>> re.findall(r'\"([^"]+)\"' , s)[:2]
['SUPERBRAND', 'GREATSTUFF']
Try this:
import re
with open("corrected_clients_data.txt", "r") as text_file:
    text = text_file.read()
matches = re.findall(r'"(.+?)"', text)
The question mark makes .+? non-greedy, so each match stops at the first closing double quote it encounters. Note that this returns every quoted string in the file, including "SUPER" from Key:="SUPER".
Hope this helps.
Use lookbehinds with re.findall to get the values of ClientsName(0) and ClientsName(1):
>>> import re
>>> s = '''ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName'''
>>> m = re.findall(r'(?<=ClientsName\(0\) = \")[^"]*|(?<=ClientsName\(1\) = \")[^"]*', s)
>>> m
['SUPERBRAND', 'GREATSTUFF']
Explanation:
(?<=ClientsName\(0\) = \") Positive lookbehind is used to set the matching marker just after to the string ClientsName(0) = "
[^"]* Then it matches any character not of " zero or more times. So it match the first value ie, SUPERBRAND
| Logical OR operator used to combine two regexes.
(?<=ClientsName\(1\) = \")[^"]* Matches any character just after to the string ClientsName(1) = " upto the next ". Now it matches the second value ie, GREATSTUFF
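If the index should be kept alongside the name, re.finditer with named groups can build a mapping. This is an alternative to the lookbehind approach rather than a refinement of it; the group names idx and name are arbitrary choices.

```python
import re

s = '''ClientsName(0) = "SUPERBRAND": ClientsName(1) = "GREATSTUFF": cClientsNames.Add Key:="SUPER", Item:=ClientsName'''

# idx captures the number in parentheses, name captures the quoted value.
pattern = re.compile(r'ClientsName\((?P<idx>\d+)\) = "(?P<name>[^"]*)"')
clients = {int(m.group('idx')): m.group('name') for m in pattern.finditer(s)}
print(clients)  # {0: 'SUPERBRAND', 1: 'GREATSTUFF'}
```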

search patterns with variable gaps in python

I am looking for patterns in a list containing different strings, such as:
names = ['TAATGH', 'GHHKLL', 'TGTHA', 'ATGTTKKKK', 'KLPPNF']
I would like to select the strings that contain the pattern T--T (anywhere in the string) and append them to a new list:
namesSelected = ['TAATGH', 'ATGTTKKKK']
Using grep I could:
grep "T[[:alpha:]]\{2\}T"
Is there an equivalent in Python's re module?
Thanks for any help!
I think this is most likely what you want:
re.search(r'T[A-Z]{2}T', inputString)
The closest Python equivalent of [[:alpha:]] is [a-zA-Z] (for ASCII letters). Replace [A-Z] with [a-zA-Z] in the snippet above if you also want to allow lowercase letters.
Documentation for re.search.
Yep, you can use re.search:
>>> import re
>>> names = ['TAATGH', 'GHHKLL', 'TGTHA', 'ATGTTKKKK', 'KLPPNF']
>>> reslist = []
>>> for i in names:
...     res = re.search(r'T[A-Z]{2}T', i)
...     if res:
...         reslist.append(i)
...
>>>
>>> print(reslist)
['TAATGH', 'ATGTTKKKK']
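The same filter can be condensed into a list comprehension with a precompiled pattern. In this sketch [A-Za-z]{2} plays the role of grep's [[:alpha:]]\{2\} for ASCII letters.

```python
import re

names = ['TAATGH', 'GHHKLL', 'TGTHA', 'ATGTTKKKK', 'KLPPNF']

# T, then exactly two ASCII letters, then T, anywhere in the string.
pattern = re.compile(r'T[A-Za-z]{2}T')
names_selected = [n for n in names if pattern.search(n)]
print(names_selected)  # ['TAATGH', 'ATGTTKKKK']
```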
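import re
names = ['TAATGH', 'GHHKLL', 'TGTHA', 'ATGTTKKKK', 'KLPPNF']
def grep(l, pattern):
    r = re.compile(pattern)
    return [s for s in l if r.search(s)]  # search each item, not the pattern itself
nameSelected = grep(names, r"T\w{2}T")
Note the use of \w in place of [[:alpha:]]; \w also matches digits and underscores, so it is broader than [[:alpha:]].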
