What is wrong with this Python regular expression? [duplicate] - python

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 4 years ago.
Given a string, I want to find all the substrings consisting of two or three '4,'.
For example, given '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'
I want to get ['4,4,', '4,4,', '4,4,4'].
import re
str_ = '1,4,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,3,2,'
m = re.findall(r"(4,){2,3}", str_)
What I get is:
['4,', '4,', '4,', '4,']
What's wrong?
It seems to me that the parentheses around '4,' are interpreted as grouping rather than telling Python that '4' and ',' should occur together, but I don't know how to express that.

Just use a non-capturing group:
import re
s = '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'
print(re.findall(r'(?:4,?){2,3}', s))
Prints:
['4,4,', '4,4,', '4,4,4,']
EDIT:
Edited the regex to match 2 or 3 "4," elements.
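The root cause is how re.findall treats groups: when the pattern contains exactly one capturing group, it returns that group's text (from its last repetition) instead of the whole match. A minimal sketch on the string from the question's example, contrasting that with re.finditer, whose match objects expose the whole match through group(0). The strict pattern (?:4,){2,3}, which requires a comma after every 4, is shown as an assumption about what "two or three '4,'" should mean:

```python
import re

s = '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'

# One capturing group: findall returns only that group's last capture per match
captured = re.findall(r'(4,){2,3}', s)

# finditer yields match objects; group(0) is always the whole match
whole = [m.group(0) for m in re.finditer(r'(4,){2,3}', s)]

# Non-capturing group with a mandatory comma after each 4
strict = re.findall(r'(?:4,){2,3}', s)

print(captured)  # ['4,', '4,', '4,']
print(whole)     # ['4,4,', '4,4,', '4,4,4,']
print(strict)    # ['4,4,', '4,4,', '4,4,4,']
```

Unlike (?:4,?){2,3}, the strict form will not match the 44 in a field such as 1,44,3, because it requires a comma after every 4.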

Related

How to find 6 digits in a string with a particular pattern on the first 3 digits in Python? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
I am trying to write a regular expression that returns all 6 digits, where the first 3 digits are a fixed pattern.
Example:
import re
string_ex = 'docs/data/622999/2013904065003_file.bin'
re.findall(r'622(\d{3})',string_ex)
results in just ['999']
but I want the result to be ['622999']
Thanks!
You should include 622 inside the parentheses too:
>>> import re
>>> string_ex = 'docs/data/622999/2013904065003_file.bin'
>>> re.findall(r'(622\d{3})',string_ex)
['622999']
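For reference, a short sketch of two other ways to get the full six digits from the same path: drop the group entirely, since re.findall then returns the whole match, or keep the group and read group(0) from re.finditer when you also want the last three digits separately.

```python
import re

string_ex = 'docs/data/622999/2013904065003_file.bin'

# No capturing group: findall returns the entire match
full = re.findall(r'622\d{3}', string_ex)

# Keep the group, but read both the whole match (group 0) and the group (group 1)
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r'622(\d{3})', string_ex)]

print(full)   # ['622999']
print(pairs)  # [('622999', '999')]
```

Neither pattern checks what surrounds the six digits; wrapping the pattern in \b ... \b would be one way to require a standalone number.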
You can also use "index" on the string directly:
i = string_ex.index("622")  # raises ValueError if "622" is not present
found = string_ex[i:i+6]    # the 3 fixed digits plus the 3 that follow
https://www.tutorialspoint.com/python/string_index.htm
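str.index raises ValueError when "622" is absent, and slicing never checks that the next three characters are digits. A minimal guarded sketch; the function name find_six_digits is hypothetical, and it only inspects the first occurrence of the prefix:

```python
def find_six_digits(s, prefix="622"):
    """Return the 6-character block starting at the first prefix if it is all digits, else None."""
    i = s.find(prefix)  # find returns -1 instead of raising ValueError
    if i == -1:
        return None
    candidate = s[i:i + 6]
    # Slicing past the end yields a shorter string, so check the length as well
    if len(candidate) == 6 and candidate.isdigit():
        return candidate
    return None

print(find_six_digits('docs/data/622999/2013904065003_file.bin'))  # 622999
print(find_six_digits('docs/data/622/file.bin'))                   # None
```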

How to check and remove '/' and '-' from a list of words at the same time [duplicate]

This question already has answers here:
Split Strings into words with multiple word boundary delimiters
(31 answers)
Split string with multiple delimiters in Python [duplicate]
(5 answers)
Closed 3 years ago.
I have the following words in a list:
listx=['info/base','tri-gen']
I am trying to remove both the '/' and '-' at the same time.
Currently I have two separate blocks of code (below) that do this:
listx=['info/base','tri-gen']
if '/' in listx:
    listmain= '/'.join(listx).split('/')
    listmain = list(filter(None, listmain))
if '-' in listx:
    listmain= '-'.join(listx).split('-')
    listmain = list(filter(None, listmain))
How do I achieve this in a single if condition, or is there a way to handle several delimiters at once, e.g. like below?
'-','/'.join(listx).split('-','/')
Expected output:
listx=['info base','tri gen']
The quickest way to do this is with the re module, which provides regular-expression operations. See the documentation: https://docs.python.org/3/library/re.html
import re
listx=['info/base','tri-gen']
print([re.sub(r"[/-]", " ", i) for i in listx])  # replace either delimiter with a space
Output:
['info base', 'tri gen']
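If the goal is replacement rather than splitting, a regex isn't strictly needed. A sketch using str.maketrans and str.translate, which map each delimiter character to a space in one pass; re.split with the same character class [/-] is shown for the case where you actually want the separate pieces, which is what the attempted split('-','/') was reaching for:

```python
import re

listx = ['info/base', 'tri-gen']

# str.maketrans builds a table mapping each delimiter character to a space
table = str.maketrans({'/': ' ', '-': ' '})
replaced = [word.translate(table) for word in listx]

# re.split with a character class splits on either delimiter
pieces = [re.split(r'[/-]', word) for word in listx]

print(replaced)  # ['info base', 'tri gen']
print(pieces)    # [['info', 'base'], ['tri', 'gen']]
```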
EDIT
Regarding your comment, I think you can get away without an if statement.
This regex finds all the words you need while ignoring the ones in parentheses:
\b\w+\b(?![\(\w+\)])
See it at work: https://regex101.com/r/YqhJDb/1
You can implement something like this:
[" ".join(re.findall(r"\b\w+\b(?![\(\w+\)])", i)) for i in listx]
Output:
['info base', 'tri gen', 'century tech limited']
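The comment that introduced the third list item isn't shown, so the string 'century-tech (ctl) limited' below is a hypothetical stand-in. The sketch shows what the lookahead actually does: [\(\w+\)] is a character class, so (?!...) rejects a word only when the very next character is (, ) or + (a word character can never follow \w+\b anyway). That is how the word inside the parentheses gets dropped:

```python
import re

# Hypothetical third item; the comment that added it is not shown
listx = ['info/base', 'tri-gen', 'century-tech (ctl) limited']

# Words not immediately followed by '(', ')' or '+'
pattern = r"\b\w+\b(?![\(\w+\)])"
result = [" ".join(re.findall(pattern, i)) for i in listx]
print(result)  # ['info base', 'tri gen', 'century tech limited']
```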

Why is my regex returning 2 results if there is only one in the string? [duplicate]

This question already has answers here:
strange behavior of parenthesis in python regex
(3 answers)
Closed 4 years ago.
I am trying to extract an ID from a string with Python 3. The regex returns more than one item, even though the text contains only one:
text_total = 'Lore Ippsum Ref. 116519LN Perlmutt'
>>> re.findall(r"Ref\.? ?(([A-Z\d\.]+)|([\d.]+))", text_total)
[('116519LN', '116519LN', '')]
I am looking for a single trimmed result, ideally not wrapped in a list.
That's why my original line is:
[x for x in re.findall(r"Ref\.? ?(([A-Z\d\.]+)|([\d.]+))", text_total)][0]
The regex has an OR because I am also trying to match:
Lore Ippsum Ref. 1166AB.39AZU2.123 Lore Ippsum
How can I retrieve just one result from the text and match both conditions?
The groups inside your OR group are themselves capturing groups, so findall returns a tuple with one entry per group. Make them non-capturing with the ?: syntax and leave the outer group as the only capturing group.
import re
text_total = 'Lore Ippsum Ref. 116519LN Perlmutt'
re.findall(r"Ref\.? ?((?:[A-Z\d\.]+)|(?:[\d.]+))", text_total)
#result ['116519LN']
Note that this still returns multiple matches if there are several. Use re.search if you only want the first match.
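A minimal sketch of the re.search route, which returns a single string instead of a list, checked on both example texts. The helper name extract_ref is hypothetical. Because [\d.]+ only matches characters that [A-Z\d.]+ already matches, the second alternative in the question's OR is never used, so the sketch collapses it into one character class:

```python
import re

def extract_ref(text):
    """Return the first ID after 'Ref' / 'Ref.', or None if there is none."""
    # [\d.]+ is a subset of [A-Z\d.]+, so the OR reduces to a single class
    m = re.search(r"Ref\.? ?([A-Z\d.]+)", text)
    return m.group(1) if m else None

print(extract_ref('Lore Ippsum Ref. 116519LN Perlmutt'))              # 116519LN
print(extract_ref('Lore Ippsum Ref. 1166AB.39AZU2.123 Lore Ippsum'))  # 1166AB.39AZU2.123
print(extract_ref('no reference here'))                                # None
```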
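You don't necessarily need an OR. You can use Ref\.? ?([a-zA-Z.0-9]+) followed by a space: the space at the end of the regex marks where the match ends, so it must not also be inside the character class (otherwise the capture runs on to the last space in the text).
import re
pattern = r"Ref\.? ?([a-zA-Z.0-9]+) "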
text_total = "Lore Ippsum Ref. 116519LN Perlmutt"
results = re.findall(pattern, text_total)
print(results[0])

re.findall only finding half the patterns [duplicate]

This question already has answers here:
Why doesn't [01-12] range work as expected?
(7 answers)
Closed 4 years ago.
I'm using re.findall to parse the year and month from a string, but it only finds matches in half of the string. Why is this?
import re
date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
find_year_and_month = re.findall('[1-2][0-9][0-9][0-9]-[1-12]', date_string)
print(find_year_and_month)
and my output is this:
['2011-1', '2012-1']
The first two dates are matched correctly, but why am I only getting matches for half of the string?
[1-12] doesn't do what you think it does. It is a character class, not a numeric range: it matches anything in the range 1 to 1, or a 2. The 2015 dates have month 3, so they never match.
See this question for some replacement regex options, like ([1-9]|1[0-2]): How to represent regex number ranges (e.g. 1 to 12)?
If you want an interactive tool for experimenting with regexes, I personally recommend Regexr.
Adjust your regex pattern as shown below:
import re
date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'
find_year_and_month = re.findall('([1-2][0-9]{3}-(?:1[0-2]|[1-9]))', date_string)
print(find_year_and_month)
The output:
['2011-1', '2012-1', '2015-3', '2015-3']
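Two further points, sketched below. With two capturing groups, re.findall returns (year, month) tuples, which saves splitting the strings afterwards. And alternation order matters: the engine takes the first alternative that succeeds, so with [1-9]|1[0-2] a month like 12 is cut to 1 unless something after it forces a longer match; that is why the pattern above puts 1[0-2] first. The date '2011-12-1' is a made-up test value:

```python
import re

date_string = '2011-1-1_2012-1-3,2015-3-1_2015-3-3'

# Two groups -> findall returns (year, month) tuples; convert them to ints
pairs = [(int(y), int(m))
         for y, m in re.findall(r'([12]\d{3})-(1[0-2]|[1-9])', date_string)]
print(pairs)  # [(2011, 1), (2012, 1), (2015, 3), (2015, 3)]

# Alternation order: the first alternative that matches wins
print(re.findall(r'\d{4}-([1-9]|1[0-2])', '2011-12-1'))  # ['1']
print(re.findall(r'\d{4}-(1[0-2]|[1-9])', '2011-12-1'))  # ['12']
```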

Capture repeated characters and split using Python [duplicate]

This question already has answers here:
How can I tell if a string repeats itself in Python?
(13 answers)
Closed 3 years ago.
I need to split a string into its repeated substrings.
For example:
My string is "howhowhow"
I need the output to be 'how,how,how'.
I can't use 'how' directly in my regex because my input varies. I need to check whether the string repeats a substring and, if so, split it into those substrings.
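import re
string = "howhowhow"
unit = re.search(r"(.+?)\1", string).group(1)  # shortest substring repeated back-to-back
print(','.join(re.findall(re.escape(unit), string)))  # escape in case unit contains regex metacharacters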
OUTPUT
howhowhow -> how,how,how
howhowhowhow -> how,how,how,how
testhowhowhow -> how,how,how # not clearly defined by OP
The pattern is non-greedy so that howhowhowhow doesn't map to howhow,howhow, which would also be a legitimate split. Remove the ? if you prefer the longest match.
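If, unlike the testhowhowhow case, the whole string must consist of back-to-back copies, re.fullmatch makes that requirement explicit. A sketch under that assumption; split_repeats is a hypothetical helper name, and it returns None when the string has no repeating unit:

```python
import re

def split_repeats(s):
    """Split s into copies of its shortest repeating unit, or return None.

    Assumes the entire string must be made of repeats (stricter than the
    re.search answer, which also accepts a non-repeating prefix).
    """
    m = re.fullmatch(r'(.+?)\1+', s)  # non-greedy unit, repeated to the end
    if m is None:
        return None
    unit = m.group(1)
    return [unit] * (len(s) // len(unit))

print(','.join(split_repeats('howhowhow')))  # how,how,how
print(split_repeats('testhowhowhow'))        # None
```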
lengthofRepeatedChar = 3
str1 = 'howhowhow'
HowmanyTimesRepeated = len(str1) // lengthofRepeatedChar
((str1[:lengthofRepeatedChar]+',')*HowmanyTimesRepeated)[:-1]
'how,how,how'
This works when you know the length of the repeated substring.
