How to get "clean" match results in Python

I am a total noob, coding for the first time and trying to learn by doing.
I'm using this:
import re
with open('aaa.txt', 'r') as f:
    string = f.read()
c = re.findall(r"Guest last name: (.*)", string)
print("Dear Mr.", c)
That prints:
Dear Mr. ['XXXX']
I was wondering: is there any way to get the result as
Dear Mr. XXXX
instead?
Thanks in advance.

You need to take the first item in the list that findall returns:
print("Dear Mr.", c[0])

Yes, use re.search if you only expect one match:
re.search(r"Guest last name: (.*)", string).group(1)
findall is for when you expect multiple matches; it always returns a list. You may also want to be more specific about where the name ends. A non-greedy capture (.*?) only helps if the pattern continues with whatever comes right after the name or phrase you want, because a lazy group at the very end of a pattern matches the empty string.
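A minimal sketch of both approaches. It uses an inline sample string in place of aaa.txt (the sample text, including the name Smith, is made up for illustration), guards against re.search returning None when the label is missing, and assumes the last name is a single whitespace-free token, hence (\S+) instead of (.*).

```python
import re

# Made-up sample standing in for the contents of aaa.txt.
text = "Booking ref: 1234\nGuest last name: Smith\nRoom: 12\n"

# findall always returns a list, so take the first element if there is one.
names = re.findall(r"Guest last name: (\S+)", text)
if names:
    print("Dear Mr.", names[0])

# search returns a match object, or None when nothing matches.
match = re.search(r"Guest last name: (\S+)", text)
if match is not None:
    print("Dear Mr.", match.group(1))
```

Both print statements produce the same line; the difference is only whether you get a list back or a single match object.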

Related

Regex to match a word and the first parenthesis I find

I need a regex that matches a word like 'estabilidade' and then matches anything until it reaches the first parenthesis.
I already tried some regexes that I found on the internet, but I have difficulty writing my own, as I don't understand very well how they work.
Can someone help me?
The regexes I already tried were:
re.search(r"([^\(]+)", resultado) -> trying to get just the parenthesis.
and
re.search(r"estabilidade((\s*|.*))\(+", resultado).group(1)
Real example (I need to pick up all the numbers inside the parentheses, but knowing which words each number is related to. For instance, the first 7 is related to the phrase 'Procura por estabilidade'):
Procura por
estabilidade
(7)
É assertivo(a)
com os outros
(5)
Procura convencer
os outros
(7)
Espontaneamente
se aproxima
dos outros
LIDERANÇA INFLUÊ
10
9
(6)
Demonstra
diplomacia
(5)

As you didn't specify which part of the matched string you want, I included some extra groups.
import re
s = 'hello there estabilidade this is just some text (yes it is)'
# [^(]*? lazily takes everything up to (but not including) the whitespace before "("
r = re.search(r"(estabilidade([^(]*?))\s*\(", s)
print(r.group(1))  # "estabilidade this is just some text"
print(r.group(2))  # " this is just some text"

Something like this?
In [1]: import re
In [2]: re.findall(r'([^()]+)\((\d+)\)', 'estabilidade_smth(10) estabilidade_other(20)')
Out[2]: [('estabilidade_smth', '10'), (' estabilidade_other', '20')]

This should do it:
estabilidade([^(]+)
It uses a negated character class; that's the key takeaway, and a good tool to have in your bag. [] is a character class: a set of characters, any one of which can match. If you put ^ as the first character inside the brackets, it becomes the set of characters not listed. So [^(] means any character that isn't (. Adding + means one or more of the item to its left. Putting it all together, we want at least one character that is not (.
Here it is in Python:
import re
text = "hello estabilidade how are you today (at the farm)"
# .strip() removes the spaces around the captured text
print(re.search(r"estabilidade([^(]+)", text).group(1).strip())
Output:
how are you today
Example to play with:
https://regex101.com/r/2qxa0y/1/
Here is a good site for learning the basic regex tricks; it will go a long way: https://www.regular-expressions.info/tutorial.html

I solved my problem with the following regex, using the tool one of the users here suggested (https://regex101.com/r/2qxa0y/1/):
((|.|[(]|\s)*)\((\d*)\)
Thanks to everyone!!
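For the original goal (each number in parentheses together with the words before it), here is a sketch that swaps in re.split instead of a single findall. Splitting on \((\d+)\) keeps the captured numbers in the result, so it alternates label, number, label, number. Unlike a [^()]+ label, this doesn't lose text such as 'É assertivo(a)' that contains its own non-numeric parentheses. The input is a subset of the sample from the question, and label_scores is a hypothetical helper name.

```python
import re

# Subset of the sample text from the question, line breaks kept as-is.
text = """Procura por
estabilidade
(7)
É assertivo(a)
com os outros
(5)
Procura convencer
os outros
(7)
Demonstra
diplomacia
(5)"""


def label_scores(text):
    # A capturing group in the split pattern keeps the numbers in the result:
    # [label0, num0, label1, num1, ..., trailing_text]
    parts = re.split(r"\((\d+)\)", text)
    pairs = []
    # zip stops at the shorter slice, so the trailing text is ignored.
    for label, number in zip(parts[0::2], parts[1::2]):
        # Collapse line breaks and repeated spaces inside each label.
        pairs.append((" ".join(label.split()), int(number)))
    return pairs


for label, score in label_scores(text):
    print(score, label)
```

On the full sample, text that isn't a label (such as the "LIDERANÇA INFLUÊ 10 9" lines) would end up in the label of the following number, so some cleanup may still be needed.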

Python negative regex

I have a string such as:
s = "The code for the product is A8H4DKE3SP93W6J and you can buy it here."
The text in this string will not always be in the same format, it will be dynamic, so I can't do a simple find and replace to obtain the product code.
I can see that:
re.sub(r'A[0-9a-zA-Z_]{14} ', '', s)
will get rid of the product code. How do I go about doing the opposite of this, i.e. deleting all of the text apart from the product code? The product code will always be a 15-character string starting with the letter A.
I have been racking my brain and Googling to find a solution, but can't seem to figure it out.
Thanks

Instead of substituting the rest of the string, use re.search() to search for the product number:
In [1]: import re
In [2]: s = "The code for the product is A8H4DKE3SP93W6J and you can buy it here."
In [3]: re.search(r"A[0-9a-zA-Z_]{14}", s).group()
Out[3]: 'A8H4DKE3SP93W6J'

In regex, you can capture the portion you want to keep by wrapping it in parentheses, and then refer to it in the replacement string with a backslash followed by the group number. In the code below, "(A[0-9A-Za-z_]{14})" is the portion you want to keep, and "\1" inserts it into the resulting string.
re.sub(r'.*(A[0-9A-Za-z_]{14}).*', r'\1', s)
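If the text may contain zero or several codes, a findall-based sketch may be more convenient. It assumes, as the question states, that a code is "A" followed by exactly 14 letters, digits or underscores, and adds \b word boundaries (an addition, not part of the answers above) so a longer token isn't cut down to 15 characters. CODE and extract_codes are illustrative names.

```python
import re

# Assumed format: "A" plus exactly 14 letters, digits or underscores,
# standing as its own word (\b on both sides).
CODE = re.compile(r"\bA[0-9A-Za-z_]{14}\b")


def extract_codes(text):
    # Returns every code found, or an empty list if there are none.
    return CODE.findall(text)


s = "The code for the product is A8H4DKE3SP93W6J and you can buy it here."
print(extract_codes(s))  # ['A8H4DKE3SP93W6J']
print(extract_codes("No code in this sentence."))  # []
```

One practical difference: when nothing matches, re.sub returns the input string unchanged, whereas findall returns an empty list, which is easier to test for.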

Regular expression: repeating patterns in the beginning of string

For example, consider the following string: "apple1: apple2: apple3: some random words here apple4:"
I want to match only apple1, apple2 and apple3, but not apple4. I am having a hard time figuring out how to achieve this.
Any help is appreciated.
Thanks.

If you are using .NET, you can match the pattern below and then use the Captures property of the group to get all the different apples matched along the way.
(?:(apple\d).*?){3}
If you only want to match the first one:
apple\d
Sweet and simple. Just call match on this once.

So, maybe something like this:
^([A-Za-z]+)[^A-Za-z]+(\1[^A-Za-z]+)+
http://regexr.com/38vvb

From your comment, it sounds like you want to match the occurrences of apple followed by a digit throughout the string except an occurrence of apple followed by a digit at the end of the string.
>>> import re
>>> text = 'apple1: apple2: apple3: some random words here apple4:'
>>> re.findall(r'(\bapple\d+):(?!$)', text)
['apple1', 'apple2', 'apple3']

Sorry, guys, I didn't format my question properly, so it wasn't clear.
I found the solution:
r'\s*((apple)\d+[ \:\,]*)+'
Thanks for all your help!
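A Python sketch of the "only at the beginning of the string" reading of the question. Python's re keeps only the last repetition of a repeated capturing group (there is no .NET-style Captures collection), so this matches the leading run of appleN: tokens first and then pulls the names out of that prefix. It assumes each token is "apple", digits, a colon and optional whitespace.

```python
import re

text = "apple1: apple2: apple3: some random words here apple4:"

# re.match only tries position 0, so this covers just the leading run of "appleN:" tokens.
prefix = re.match(r"(?:apple\d+:\s*)+", text)

# Pull the individual names out of that prefix (empty list if the string
# doesn't start with an appleN: token).
names = re.findall(r"apple\d+", prefix.group()) if prefix else []
print(names)  # ['apple1', 'apple2', 'apple3']

# For comparison: a capturing group inside the repetition keeps only its last value.
print(re.match(r"(?:(apple\d+):\s*)+", text).group(1))  # apple3
```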

How can I extract two values from a string like this using a regular expression?

How can I get the value from the following strings using one regular expression?
/*##debug_string:value/##*/
or
/*##debug_string:1234/##*/
or
/*##debug_string:http://stackoverflow.com//##*/
The result should be
value
1234
http://stackoverflow.com/

Trying to infer the pattern from your examples:
re.findall(r"/\*##debug_string:(.*?)/##\*/", your_string)
Note that your variations cannot work because you didn't escape the *. In regular expressions, * means zero or more repetitions of the preceding character or group. If you really mean the * character, you must write \*.
import re
print(re.findall(r"/\*##debug_string:(.*?)/##\*/", "/*##debug_string:value/##*/"))
print(re.findall(r"/\*##debug_string:(.*?)/##\*/", "/*##debug_string:1234/##*/"))
print(re.findall(r"/\*##debug_string:(.*?)/##\*/", "/*##debug_string:http://stackoverflow.com//##*/"))
Executes as:
['value']
['1234']
['http://stackoverflow.com/']
EDIT: OK, I see that the value can be a URL. I've amended the pattern to take that into account.
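A variant sketched with re.escape: instead of escaping each * by hand, build the pattern from the literal markers and let re.escape backslash-escape the special characters. PREFIX, SUFFIX and PATTERN are illustrative names; the lazy (.*?) group is the same one used in the findall pattern.

```python
import re

# Literal markers around the value; re.escape escapes regex-special
# characters such as "*" so they match literally.
PREFIX = "/*##debug_string:"
SUFFIX = "/##*/"
PATTERN = re.compile(re.escape(PREFIX) + r"(.*?)" + re.escape(SUFFIX))

samples = [
    "/*##debug_string:value/##*/",
    "/*##debug_string:1234/##*/",
    "/*##debug_string:http://stackoverflow.com//##*/",
]

for sample in samples:
    match = PATTERN.search(sample)
    # Print the captured value, or None if the markers aren't found.
    print(match.group(1) if match else None)
```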

Use this regex:
[^:]+:([^/]+)
And use capture group #1 for your value.
Live Demo: http://www.rubular.com/r/FxFnpfPHFn

Your regex will be something like: .*:(.*)/.+. Group 1 will be what you are looking for. However, this is a REALLY inclusive regex; you might want to post more details so the pattern can be made more restrictive.

Assuming that the format stays consistent:
re.findall(r'debug_string:([^/]+)/##', string)

Match a string that does not end with a list of known strings

I want to match street names, which may end in " St", " Ave" or " Road". The postfix may not exist at all, so it may just be "1st". I also want to know what the postfix is. What is a suitable regex for it? I tried:
(.+)(\s+(St|Ave|Road))?
But it seems like the first group greedily matches the entire string. I tried a lookbehind (?<!...), but couldn't get it to work properly, as it kept throwing errors like "look-behind requires fixed-width pattern".
If it matters at all, I'm using Python.
Any suggestions?

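Just make your first group non-greedy by adding a question mark, and anchor the pattern at the end with $ (since the suffix group is optional, without the anchor the lazy group stops after a single character):
(.+?)(\s+(St|Ave|Road))?$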
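A sketch of the non-greedy approach anchored with $, using a non-capturing group for the optional part so the suffix lands in group 2 (None when there is no suffix). STREET and split_street are illustrative names, and the example street names are made up.

```python
import re

# Lazy name group, optional suffix group, anchored at the end with $.
STREET = re.compile(r"(.+?)(?:\s+(St|Ave|Road))?$")


def split_street(s):
    m = STREET.match(s)
    if m is None:
        return None  # only happens for an empty string
    # group(2) is None when no known suffix is present
    return m.group(1), m.group(2)


for street in ["Main St", "5th Ave", "Abbey Road", "1st"]:
    print(split_street(street))
```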

As an alternative to regex-based solutions, how about:
s = "Main Road"  # example input
suffix = s.split(' ')[-1]
if suffix in ('St', 'Ave', 'Road'):
    print('suffix is', suffix)
else:
    print('no suffix')
If you do have to use regular expressions, simply make the first match non-greedy, like so: r'(.*?)\s+(St|Ave|Road)$'
In [28]: print(re.match(r'(.*?)\s+(St|Ave|Road)$', 'Main Road'))
<re.Match object; span=(0, 9), match='Main Road'>
In [29]: print(re.match(r'(.*?)\s+(St|Ave|Road)$', 'nothing here'))
None

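You wanted a negative lookahead. Put it at the start of the pattern so it checks the whole string; placed right before $, it has nothing left to look at and always succeeds:
^(?!.*\s(?:St|Ave|Road)$).+$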

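How about a negative lookbehind? The syntax is (?<!...), and because Python requires lookbehinds to be fixed-width, each alternative needs its own:
^.*(?<!St)(?<!Ave)(?<!Road)$
It seems to express the requirement closely.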
