For example, consider the following string: "apple1: apple2: apple3: some random words here apple4:"
I want to match only apple1, apple2 and apple3, but not apple4. I am having a hard time figuring out how to achieve this.
Any help is appreciated.
Thanks.
If you are using .NET, you can match the pattern below and then use the Captures property of the group to get each apple matched along the way.
(?:(apple\d).*?){3}
If you only want to match the first one:
apple\d
Sweet and simple. Just call match on this once.
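The Captures property is specific to .NET. As a rough comparison, here is a minimal sketch of how the same pattern behaves in Python, assuming the standard re module (which keeps only the last capture of a repeated group) and the sample string from the question. Taking the first three matches with islice is my own substitution, not part of the answer above.

```python
import re
from itertools import islice

text = "apple1: apple2: apple3: some random words here apple4:"

# In Python, a group inside a repetition remembers only its final capture.
m = re.match(r"(?:(apple\d).*?){3}", text)
last_capture = m.group(1)

# To get every apple "along the way", iterate over the matches instead
# and stop after the first three.
first_three = [hit.group() for hit in islice(re.finditer(r"apple\d", text), 3)]

print(last_capture)
print(first_three)
```

Note that stopping after three matches only helps if you know the count in advance; it does not detect where the leading run of apples ends.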
So, maybe something like this:
^([A-Za-z]+)[^A-Za-z]+(\1[^A-Za-z]+)+
http://regexr.com/38vvb
From your comment, it sounds like you want to match the occurrences of apple followed by a digit throughout the string except an occurrence of apple followed by a digit at the end of the string.
>>> import re
>>> text = 'apple1: apple2: apple3: some random words here apple4:'
>>> re.findall(r'(\bapple\d+):(?!$)', text)
['apple1', 'apple2', 'apple3']
Sorry guys, I did not phrase my question properly, so it wasn't clear.
I found the solution:
r'\s*((apple)\d+[ \:\,]*)+'
Thanks for all your help!
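For anyone reading later, here is a minimal sketch of how that pattern could be used, assuming it is applied with re.match so it only consumes the run of apples at the start of the string. The second findall step and the variable names are my own illustration.

```python
import re

text = "apple1: apple2: apple3: some random words here apple4:"

# re.match anchors at the start, so the repeated group stops at the
# first token that is not "apple<digits>" plus separators.
run = re.match(r"\s*((apple)\d+[ \:\,]*)+", text)
leading = run.group(0)

# Pull the individual names out of the leading run only.
names = re.findall(r"apple\d+", leading)

print(repr(leading))
print(names)
```

Because apple4 comes after "some random words here", it is never part of the leading run and is left out.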
I have a string containing words in the form word1_word2, word3_word4, word5_word1 (so a word can appear at the left or at the right). I want a regex that looks for all the occurrences of a specific word, and returns the "super word" containing it. So if I'm looking for word1, I expect my regex to return word1_word2, word5_word1. Since the word can appear on the left or on the right, I wrote this:
re.findall("( {}_)?[\u0061-\u007a\u00e0-\u00e1\u00e8-\u00e9\u00ec\u00ed\u00f2-\u00f3\u00f9\u00fa]*(_{} )?".format("w1", "w1"), string)
The optional blocks sit at the beginning and at the end of the pattern. However, it takes forever to execute, and I think something is not right: when I removed the optional blocks and wrote two separate regexes, one for the beginning and one for the end, they were much faster (but I don't want to use two regexes). Am I missing something, or is this normal?
This would be the regex solution to your problem:
re.findall(rf'\b({yourWord}_\w+?|\w+?_{yourWord})\b', yourString)
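A minimal sketch of the same idea wrapped in a function, assuming Python 3. The function name find_super_words is hypothetical, and re.escape is added so that a search word containing regex metacharacters is treated literally. In Python 3, \w on str patterns is Unicode-aware, so accented letters are covered too.

```python
import re

def find_super_words(word, text):
    """Return every left_right token in which `word` is the left or right part."""
    w = re.escape(word)  # treat the search word literally
    # Non-capturing group, so findall returns the whole match.
    pattern = rf"\b(?:{w}_\w+?|\w+?_{w})\b"
    return re.findall(pattern, text)

s = "word1_word2, word3_word4, word5_word1"
print(find_super_words("word1", s))
print(find_super_words("word4", s))
```

The two alternatives are combined into one pattern, so the string is scanned only once.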
Python provides string methods that can do this without a regex:
a=['word1_word2', 'word3_word4', 'word5_word1']
b = [x for x in a if x.startswith("word1") or x.endswith('word1')]
print(b) # ['word1_word2', 'word5_word1']
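One caveat: startswith and endswith compare prefixes and suffixes, so "word1" would also select something like word10_word6. Here is a minimal sketch of a stricter variant, assuming the tokens are comma-separated and each has the form left_right; the extra token word10_word6 is made up for illustration.

```python
text = "word1_word2, word3_word4, word5_word1, word10_word6"
target = "word1"

tokens = [t.strip() for t in text.split(",")]

# Prefix/suffix test: also picks up word10_word6.
by_prefix = [t for t in tokens if t.startswith(target) or t.endswith(target)]

# Exact test: split on the underscore and compare whole parts.
exact = [t for t in tokens if target in t.split("_")]

print(by_prefix)
print(exact)
```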
s = 'word1_word2, word3_word4, word5_word1'
matches = re.finditer(r'(\w+_word1)|(word1_\w+)', s)
result = list(map(lambda x: x.group(), matches))
['word1_word2', 'word5_word1']
This is one method, but having seen @Carl's answer, I voted for it. That is a faster and cleaner method. I will just leave this here as one of many regex options.
This regex will do the job for word1 (the groups are non-capturing so that findall returns the whole match):
regex = r'(?:word\d_)*word1(?:_word\d)*'
re.findall(regex, string)
You can also use this:
re.findall(rf'\b(word{number}_\w+?|\w+?_word{number})\b', string)
Try the following regex.
In the following, replace word1 with the word you're looking for. This is assuming that the word you are looking for consists of only alphanumeric characters.
([a-zA-Z0-9]*_word1)|(word1_[a-zA-Z0-9]*)
I'm very new to regex syntax, although I have already read a little about the library. I'm trying to extract names from a simple sentence, but I've run into trouble. Below is an example of what I've done.
x = 'Fred used to play with his brother, Billy, both are 10 and their parents Jude and Edde have two more kids.'
import re
re.findall('^[A-Za-z ]+$',x)
Can anyone explain what is wrong and how to proceed?
Use
re.findall(r'\b[A-Z]\w*', x)
See proof. It matches words that start with an uppercase letter, followed by any number of letters, digits or underscores.
I think your regex has two problems.
You want to extract names from within a sentence, so you need to remove the ^ (start of line) and $ (end of line) anchors.
A name starts with an uppercase letter and does not contain spaces, so you should remove the space from your character class.
You could use the following regex.
\b[A-Z][A-Za-z]+\b
I also tested the result in Python.
x = 'Fred used to play with his brother, Billy, both are 10 and their parents Jude and Edde have two more kids.'
import re
result = re.findall('\\b[A-Z][A-Za-z]+\\b',x)
print(result)
Result.
['Fred', 'Billy', 'Jude', 'Edde']
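To make the difference concrete, here is a minimal sketch comparing the anchored pattern from the question with the word-boundary pattern. The second sentence is a made-up example showing that this is only a capitalization heuristic, not real name detection.

```python
import re

x = ('Fred used to play with his brother, Billy, both are 10 and their '
     'parents Jude and Edde have two more kids.')

# ^...$ demands that the WHOLE string consist of letters and spaces;
# the commas, the digits and the final period make it fail.
anchored = re.findall(r'^[A-Za-z ]+$', x)

# Without anchors, each capitalized word is matched on its own.
names = re.findall(r'\b[A-Z][A-Za-z]+\b', x)

# Any capitalized word matches, including one that merely starts a sentence.
not_only_names = re.findall(r'\b[A-Z][A-Za-z]+\b', 'The kids met Billy.')

print(anchored, names, not_only_names)
```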
I'm trying to create a regex to catch all hexadecimal colors in a string literal. I'm using Python 3, and that's what I have:
import re
pattern = re.compile(r"#[a-fA-F\d]{3}([a-fA-F\d]{3})?")
However, when I apply the findall regex method on #abcdef here's what I get:
>>> re.findall(pattern,"#abcdef")
['def']
Can someone explain why I get that? I actually need to get ['#abcdef'].
Thank you in advance
According to http://regex101.com:
It looks like this regex matches # followed by three characters (a through f, A through F, or a digit), then an optional group of three more such characters. If that optional group is present, it is what gets returned from the match.
If you want to match the whole six-character color, I would recommend this instead:
#[a-fA-F\d]{6}
Thanks to Andrej Kesely, I got the answer to my question:
When the pattern contains a capturing group, findall returns the group instead of the whole match.
To avoid this, make the group non-capturing by changing the regex from:
r"#[a-fA-F\d]{3}([a-fA-F\d]{3})?"
to:
r"#[a-fA-F\d]{3}(?:[a-fA-F\d]{3})?"
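A minimal sketch of the difference, assuming Python 3 and a made-up sample string containing one six-digit and one three-digit color. It also shows re.finditer as an alternative that works even with the capturing pattern.

```python
import re

sample = "color: #abcdef; border: #FFF"

capturing = r"#[a-fA-F\d]{3}([a-fA-F\d]{3})?"
non_capturing = r"#[a-fA-F\d]{3}(?:[a-fA-F\d]{3})?"

# With one capturing group, findall returns that group's text
# (an empty string where the optional group did not take part).
groups_only = re.findall(capturing, sample)

# With a non-capturing group, findall returns the whole match.
whole = re.findall(non_capturing, sample)

# finditer gives match objects, so group(0) is always the whole match.
via_finditer = [m.group(0) for m in re.finditer(capturing, sample)]

print(groups_only, whole, via_finditer)
```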
I am a total noob, coding for the first time and trying to learn by doing.
I'm using this:
import re
f = open('aaa.txt', 'r')
string=f.read()
c = re.findall(r"Guest last name: (.*)", string)
print "Dear Mr.", c
that returns
Dear Mr. ['XXXX']
I was wondering, is there any way to get the result like
Dear Mr. XXXX
instead?
Thanks in advance.
You need to take the first item in the list:
print "Dear Mr.", c[0]
Yes, use re.search if you only expect one match:
re.search(r"Guest last name: (.*)", string).group(1)
findall is for when you expect multiple matches. You probably also want to add ? to your regex, (.*?), for a non-greedy capture, but you probably also want to be a little more specific and capture up to the next possible character after the name/phrase you want.
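A minimal sketch of the re.search approach, written for Python 3 (the question uses the Python 2 print statement). The sample text stands in for the contents of aaa.txt, which is not shown in the question, and the helper name guest_last_name is hypothetical. re.search returns None when nothing matches, so the sketch checks for that before calling group.

```python
import re

# Stand-in for the file contents; the real aaa.txt is not shown.
text = "Booking 1234\nGuest last name: Smith\nNights: 2\n"

def guest_last_name(text):
    """Return the first captured last name, or None if the line is missing."""
    match = re.search(r"Guest last name: (.*)", text)
    # '.' does not match a newline, so the capture stops at the end of the line.
    return match.group(1).strip() if match else None

name = guest_last_name(text)
if name is not None:
    print("Dear Mr.", name)
```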
Suppose I have string='IT:A12-IT:B23:REMOVE-IT:C45-IT:A67:ME'
I want the end result of the string, using re.sub, to be string='IT:A12-IT:B23-IT:C45-IT:A67'.
I want to remove the :yyy part (any characters, any number of them) in IT:xxx:yyy.
I tried something like re.sub(r':.+-','',string). However, it removes far too much. Please help, thanks.
First things first, your question is crazy confusing. But it looks like this is the regex that you are looking for. Change the number inside \w{} to match items with a particular number of characters. This example matches anything with 2 characters:
/\bIT:\w+:(?<removeme>\w{2})\b/
You want output like this:
>>> s = "IT:A12-IT:B23:REMOVE-IT:C45-IT:A67:ME"
>>> import re
>>> re.sub(r"\:[A-Z]{2,}", r"",s)
'IT:A12-IT:B23-IT:C45-IT:A67'
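A minimal sketch of why the attempt in the question removes too much, plus a more structural alternative of my own. It assumes every item has the form IT:xxx with an optional :yyy suffix and that items are separated by hyphens. Unlike the [A-Z]{2,} pattern, it does not rely on the suffix being uppercase letters.

```python
import re

s = "IT:A12-IT:B23:REMOVE-IT:C45-IT:A67:ME"

# ':.+-' is greedy: it runs from the first ':' to the LAST '-'.
greedy = re.sub(r":.+-", "", s)

# Keep 'IT:xxx' (group 1) and drop a following ':yyy' up to the next '-'.
cleaned = re.sub(r"(IT:[^:-]+):[^-]*", r"\1", s)

print(greedy)
print(cleaned)
```

Items without a suffix, such as IT:A12, do not match the pattern at all and are left untouched.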