Logical or groups regex - python

I am trying to make a regex that will find certain cases of incorrectly entered fractions, and return the numerator and denominator as groups.
These cases involve a space between the slash and a number: such as either 1 /2 or 1/ 2.
I use a logical-or operator in the regex, since I'd rather not have 2 separate patterns to check for:
r'(\d) /(\d)|(\d)/ (\d)'
(I'm not using \d+ since I'm more interested in the numbers directly bordering the division sign, though \d+ would work as well).
The problem is, when it matches one of the cases, say the second (1/ 2), looking at all the groups gives (None, None, '1', '2'), but I would like to have a regex that only returns 2 groups--in both cases, I would like the groups to be ('1', '2'). Is this possible?
Edit:
I would also like it to return groups ('1', '2') for the case 1 / 2, but to not capture anything for well-formed fractions like 1/2.

(\d)(?: /|/ | / )(\d) should do it (and only return incorrectly entered fractions). Note the non-capturing group (?:...): it lets the alternation sit between the two capturing groups without adding groups of its own.
Edit: updated with comments below.
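As a quick check of the pattern above, here is a minimal sketch that runs it against the four inputs mentioned in the question. The helper name malformed_fraction is only an illustration and does not come from the original answer.

```python
import re

# Pattern from the answer above: exactly one space before and/or after the slash.
MALFORMED = re.compile(r'(\d)(?: /|/ | / )(\d)')

def malformed_fraction(text):
    """Return (numerator, denominator) for a badly spaced fraction, else None."""
    m = MALFORMED.search(text)
    return m.groups() if m else None

for sample in ('1 /2', '1/ 2', '1 / 2', '1/2'):
    print(repr(sample), malformed_fraction(sample))
```

Only the first three inputs produce a pair of groups; the well-formed 1/2 yields None.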

What about just using (\d)\s*/\s*(\d)?
That way you will always have only two groups (note that this also matches well-formed fractions such as 1/2):
>>> import re
>>> regex = r'(\d)\s*/\s*(\d)'
>>> re.findall(regex, '1/2')
[('1', '2')]
>>> re.findall(regex, '1 /2')
[('1', '2')]
>>> re.findall(regex, '1/ 2')
[('1', '2')]
>>> re.findall(regex, '1 / 2')
[('1', '2')]
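Since the \s* version also matches the well-formed 1/2, one possible variation (an assumption on my part, not part of the answer) is to require at least one whitespace character on at least one side of the slash. The name spaced_fraction is hypothetical.

```python
import re

# Either whitespace before the slash (optionally after it too),
# or no whitespace before it but some after it.
SPACED = re.compile(r'(\d)(?:\s+/\s*|/\s+)(\d)')

def spaced_fraction(text):
    """Return the two digits around a slash that has stray whitespace, else None."""
    m = SPACED.search(text)
    return m.groups() if m else None

for sample in ('1/2', '1 /2', '1/ 2', '1 / 2', '1  /   2'):
    print(repr(sample), spaced_fraction(sample))
```

Unlike the literal-space alternation, this sketch also tolerates tabs and runs of several spaces.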


Python regex named result with variable

I'm writing code to parse SVG's transform command in Python 3.7:
import re
t = "translate(44,22) rotate(55,6,7) scale(2)"
num = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
types = "matrix|translate|rotate|scale|skewX|skewY"
regex = rf"({types})\((?P<arg1>{num})(?:,?(?P<argi>{num}))*\)" # <- 'i' as an increasing number
for match in re.finditer(regex, t):
    print(match.groupdict())
The types in input string t could have up to 6 parameters inside of the parentheses ('matrix' has 6, others have fewer). I'd like to use groupdict() to give me numbered arguments arg-1, arg-2, arg-3, etc. depending on how many finditer has found. That means that the named match needs to be a variable that's increasing.
I've tried some obvious stuff and looked at the docs. Neither got it working for me.
So... is it possible? Am I thinking about this the wrong way? Thanks!
If there can only be up to 6 arguments inside the parentheses, use six (?:,(?P<argX>{num}))? optional groups (where X is a digit from 1 to 6) to match 1 to 6 patterns matching the arguments, and then discard all the groupdict items that have None value:
import re
t = "translate(44,22) rotate(55,6,7) scale(2)"
num = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
types = "matrix|translate|rotate|scale|skewX|skewY"
regex = rf"({types})\((?P<arg1>{num})(?:,(?P<arg2>{num}))?(?:,(?P<arg3>{num}))?(?:,(?P<arg4>{num}))?(?:,(?P<arg5>{num}))?(?:,(?P<arg6>{num}))?\)"
for match in re.finditer(regex, t):
    # keep only the arguments that actually matched
    print({k: v for k, v in match.groupdict().items() if v is not None})
This prints:
{'arg1': '44', 'arg2': '22'}
{'arg1': '55', 'arg2': '6', 'arg3': '7'}
{'arg1': '2'}
Maybe you can use ast.literal_eval with re to parse the parameters, for example:
import re
from ast import literal_eval
t = "translate(44,22) rotate(55,6,7) scale(2)"
types = "matrix|translate|rotate|scale|skewX|skewY"
print([(f, literal_eval('(' + s + ',)')) for f, s in re.findall(fr'({types})\(([^)]+)', t)])
Prints:
[('translate', (44, 22)), ('rotate', (55, 6, 7)), ('scale', (2,))]
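If the goal is specifically a dict with keys arg1, arg2, and so on, another option is to parse in two steps: match the transform name and its raw argument list first, then number the arguments with enumerate. The following is a sketch under that assumption; parse_transforms and the constant names are made up for illustration.

```python
import re

NUM = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
TYPES = "matrix|translate|rotate|scale|skewX|skewY"
# Group 1: transform name, group 2: everything between the parentheses.
TRANSFORM = re.compile(rf"({TYPES})\(([^)]*)\)")

def parse_transforms(text):
    """Return a list of (name, {'arg1': ..., 'arg2': ...}) pairs."""
    parsed = []
    for m in TRANSFORM.finditer(text):
        numbers = re.findall(NUM, m.group(2))
        # Build the numbered keys in Python instead of inside the regex.
        args = {f"arg{i}": n for i, n in enumerate(numbers, start=1)}
        parsed.append((m.group(1), args))
    return parsed

print(parse_transforms("translate(44,22) rotate(55,6,7) scale(2)"))
```

This avoids hard-coding six optional groups, at the cost of not validating the separators between the numbers.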

How to Identify Repetitive Characters in a String Using Python?

I am new to Python and I want to write a program that determines if a string consists of repetitive characters. The strings that I want to test are:
Str1 = "AAAA"
Str2 = "AGAGAG"
Str3 = "AAA"
The pseudo-code that I come up with:
WHEN len(str) % 2 with zero remainder:
- Divide the string into two sub-strings.
- Then, compare the two sub-strings and check if they have the same characters, or not.
- if the two sub-strings are not the same, divide the string into three sub-strings and compare them to check if repetition occurs.
I am not sure if this is a workable way to solve the problem. Any ideas on how to approach it?
Thank you!
You could use the Counter class from the collections module to count how often each character occurs.
>>> from collections import Counter
>>> s = 'abcaaada'
>>> c = Counter(s)
>>> c.most_common()
[('a', 5), ('b', 1), ('c', 1), ('d', 1)]
To get the single most repetitive (common) character:
>>> c.most_common(1)
[('a', 5)]
You could do this using regex backreferences.
To find a pattern in Python, you can use regular expressions from the re module. A search is typically written as:
match = re.search(pat, string)
This is usually followed by an if statement to determine whether the search succeeded.
For example, this is how you would find the pattern "AAAA" in a string:
import re
string = ' blah blahAAAA this is an example'
match = re.search(r'AAAA', string)
if match:
    print('found', match.group())
else:
    print('did not find')
This prints "found AAAA".
Do the same for your other two strings and it will work the same way.
Regular expressions can do a lot more than this, so experiment with them and see what else they can do.
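The answer mentions backreferences without showing one, so here is a minimal sketch of how a backreference could test whether a whole string is one unit repeated at least twice. The function name repeated_unit is hypothetical, and re.fullmatch requires Python 3.4 or later.

```python
import re

def repeated_unit(s):
    """Return the shortest unit whose repetition (2+ times) makes up all of s, else None."""
    # (.+?) captures a candidate unit lazily; \1+ demands one or more further copies of it.
    m = re.fullmatch(r'(.+?)\1+', s)
    return m.group(1) if m else None

for sample in ("AAAA", "AGAGAG", "AAA", "ABC"):
    print(sample, repeated_unit(sample))
```

Because the capture is lazy, the shortest repeating unit is reported first (for example "A" rather than "AA" for "AAAA").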
Assuming you mean the whole string is a repeating pattern, this answer has a good solution:
def principal_period(s):
    i = (s+s).find(s, 1, -1)
    return None if i == -1 else s[:i]
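To see how principal_period behaves on the strings from the question, a short usage sketch follows. The function is repeated so the block runs on its own; the extra "ABC" input is added here only to show the non-repeating case.

```python
def principal_period(s):
    # Look for s inside s+s, skipping the trivial matches at offsets 0 and len(s).
    i = (s + s).find(s, 1, -1)
    return None if i == -1 else s[:i]

for candidate in ("AAAA", "AGAGAG", "AAA", "ABC"):
    print(candidate, principal_period(candidate))
```

A string that is a rotation of itself by some offset smaller than its length must consist of a repeated unit, which is why the doubled-string search works.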

Zip string-subset from tuples in a list

With a structure like this
hapts = [('1|2', '1|2'), ('3|4', '3|4')]
I need to zip it (sort of...) to get the following:
end = ['1|1', '2|2', '3|3', '4|4']
I started working with the following code:
zipped = []
for i in hapts:
    tete = zip(i[0][0], i[1][0])
    zipped.extend(tete)
    some = zip(i[0][2], i[1][2])
    zipped.extend(some)
... and got it zipped like this:
zipped = [('1', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
Any suggestions on how to continue? Furthermore, I'm sure there should be a more elegant way to do this, but it is hard to give Google an accurate definition of the question ;)
Thx!
You are very close to solving this. I would argue the best solution here is a simple str.join() in a list comprehension:
["|".join(values) for values in zipped]
This also has the bonus of working nicely with (potentially) more values, without modification.
If you wanted tuples (which is not what your requested output shows, as brackets don't make a tuple, a comma does), then it is trivial to add that in:
[("|".join(values), ) for values in zipped]
Also note that zipped can be produced more directly:
>>> import itertools
>>> zipped = itertools.chain.from_iterable(zip(*[part.split("|") for part in group]) for group in hapts)
>>> ["|".join(values) for values in zipped]
['1|1', '2|2', '3|3', '4|4']
And to show what I meant before about handling more values elegantly:
>>> hapts = [('1|2|3', '1|2|3', '1|2|3'), ('3|4|5', '3|4|5', '3|4|5')]
>>> zipped = itertools.chain.from_iterable(zip(*[part.split("|") for part in group]) for group in hapts)
>>> ["|".join(values) for values in zipped]
['1|1|1', '2|2|2', '3|3|3', '3|3|3', '4|4|4', '5|5|5']
The problem in this context is to:
- unfold the list
- reformat it
- fold it
Here is how you may approach the problem
>>> from itertools import chain
>>> reformat = lambda t: map('|'.join, zip(*(e.split("|") for e in t)))
>>> list(chain(*(reformat(t) for t in hapts)))
['1|1', '2|2', '3|3', '4|4']
You don't need to rewrite your working code in this context.
Instead, if you would rather work on your existing output, just iterate over it again and join each pair with "|":
>>> ['{}|{}'.format(*t) for t in zipped]
['1|1', '2|2', '3|3', '4|4']
Note
Parentheses are redundant in your output
Your code basically works, but here's a more elegant way to do it.
First define a transposition function that takes an entry of hapts and flips it:
>>> from itertools import chain
>>> transpose = lambda tup: list(zip(*(y.split("|") for y in tup)))
Then map that function over hapts:
>>> list(map(transpose, hapts))
[[('1', '1'), ('2', '2')], [('3', '3'), ('4', '4')]]
and then, if you want to flatten this into one list:
>>> y = list(chain.from_iterable(map(transpose, hapts)))
>>> y
[('1', '1'), ('2', '2'), ('3', '3'), ('4', '4')]
Finally, to join it back up into strings again:
>>> list(map("|".join, y))
['1|1', '2|2', '3|3', '4|4']
end = []
for groups in hapts:
    end.extend('|'.join(regrouped) for regrouped in zip(*[group.split('|') for group in groups]))
This should also continue to work with n-length groups of n-length pipe-delimited characters, and n-length groups of groups, though it will truncate the regrouped values to the shortest group of characters in each group of character groups.
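The truncation mentioned above comes from zip stopping at the shortest input. The sketch below shows that behaviour and, as an assumption about what one might want instead, an optional padding mode based on itertools.zip_longest. The function name regroup and its fill parameter are hypothetical.

```python
from itertools import zip_longest

def regroup(hapts, fill=None):
    """Regroup pipe-delimited strings column by column.

    With fill=None, uneven groups are truncated (plain zip);
    otherwise missing values are padded with the given fill string.
    """
    end = []
    for groups in hapts:
        columns = [group.split('|') for group in groups]
        if fill is None:
            rows = zip(*columns)
        else:
            rows = zip_longest(*columns, fillvalue=fill)
        end.extend('|'.join(row) for row in rows)
    return end

print(regroup([('1|2', '1|2'), ('3|4', '3|4')]))
print(regroup([('1|2|3', '1|2')]))
print(regroup([('1|2|3', '1|2')], fill='?'))
```

Whether padding or truncation is appropriate depends on the data; the question only shows evenly sized groups.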

Detect repetitions in string

I have a simple problem, but can't come up with a simple solution :)
Let's say I have a string. I want to detect if there is a repetition in it.
I'd like:
"blablabla" # => (bla, 3)
"rablabla" # => (bla, 2)
The thing is I don't know what pattern I am searching for (I don't have "bla" as input).
Any idea?
EDIT:
After seeing the comments, I think I should clarify what I have in mind:
In a string, there is either a pattern that is repeated or not.
The repeated pattern can be of any length.
If there is a pattern, it is repeated over and over again until the end. But the string can end in the middle of the pattern.
Example:
"testblblblblb" # => ("bl",4)
import re
def repetitions(s):
    r = re.compile(r"(.+?)\1+")
    for match in r.finditer(s):
        # integer division keeps the count an int on Python 3
        yield (match.group(1), len(match.group(0)) // len(match.group(1)))
finds all non-overlapping repeating matches, using the shortest possible unit of repetition:
>>> list(repetitions("blablabla"))
[('bla', 3)]
>>> list(repetitions("rablabla"))
[('abl', 2)]
>>> list(repetitions("aaaaa"))
[('a', 5)]
>>> list(repetitions("aaaaablablabla"))
[('a', 5), ('bla', 3)]
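Since the question asks for a single (pattern, count) result, one way to reduce the generator's output is to keep the repetition that covers the most characters. This selection rule is an assumption, not something the answer specifies, and longest_repetition is a hypothetical name.

```python
import re

REPEAT = re.compile(r"(.+?)\1+")

def longest_repetition(s):
    """Return (unit, count) for the repeated run covering the most characters, or None."""
    best = None
    best_span = 0
    for m in REPEAT.finditer(s):
        span = len(m.group(0))
        if span > best_span:
            best = (m.group(1), span // len(m.group(1)))
            best_span = span
    return best

for sample in ("blablabla", "rablabla", "testblblblblb", "abc"):
    print(sample, longest_repetition(sample))
```

For "testblblblblb" the trailing partial "b" is simply left unmatched, which fits the case where the string ends in the middle of the pattern.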

Regular Expression to test the presence and match a string at the same time

I would like to determine whether a string S contains a substring MYSUBSTRING somewhere after two consecutive digits, and I also need to capture those two digits.
For example:
'aaa79bbbMYSUBSTRINGccc'
==> I want 7, 9 and True (or 7, 9 and MYSUBSTRING)
'aaa79bbbccc'
==> I want 7, 9 and False (or 7, 9 and None)
Can I do that with a SINGLE regex? If so, which one?
The following regex should do it:
(\d)(\d)(?:.*?(MYSUBSTRING))?
>>> re.search(r'(\d)(\d)(?:.*?(MYSUBSTRING))?', 'aaa79bbbMYSUBSTRINGccc').groups()
('7', '9', 'MYSUBSTRING')
>>> re.search(r'(\d)(\d)(?:.*?(MYSUBSTRING))?', 'aaa79bbbccc').groups()
('7', '9', None)
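To get the "7, 9 and True/False" shape the question asks for, the pattern can be wrapped in a small helper. This is only a sketch: find_digits_and_flag and its needle parameter are illustrative names, and re.escape is used on the assumption that the substring should be matched literally.

```python
import re

def find_digits_and_flag(s, needle="MYSUBSTRING"):
    """Return (digit1, digit2, found) for the first two consecutive digits, or None."""
    # The optional non-capturing group lets the search succeed with or without the needle.
    pattern = rf"(\d)(\d)(?:.*?({re.escape(needle)}))?"
    m = re.search(pattern, s)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3) is not None

print(find_digits_and_flag('aaa79bbbMYSUBSTRINGccc'))
print(find_digits_and_flag('aaa79bbbccc'))
```

The helper returns None when the string contains no pair of consecutive digits at all, a case the question does not cover.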
A fun problem. This monstrosity:
(\d)(\d)(.(?!(MYSUBSTRING)))*.?(MYSUBSTRING)?
Seems to work for me.
Broken down:
(\d)(\d) # capture 2 digits
(.(?!(MYSUBSTRING)))* # any characters not followed by MYSUBSTRING
.? # the character immediately before MYSUBSTRING
(MYSUBSTRING)? # MYSUBSTRING, if it exists
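Sure, you can use (\d)(\d)(?:.*?(MYSUBSTRING))?. Note that the shorter (\d)(\d).*?(MYSUBSTRING)? does not work: the lazy .*? is satisfied by matching nothing and the optional group is then allowed to match nothing as well, so group 3 is None unless MYSUBSTRING immediately follows the digits. In Python, you would use the pattern in re.search like so:
import re
s = ... # your string
m = re.search(r'(\d)(\d)(?:.*?(MYSUBSTRING))?', s)
m.group(1) # first digit
m.group(2) # second digit
m.group(3) # the substring, or None if it didn't match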
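When an optional capture follows a lazy .*?, it is worth running the candidate patterns side by side, because the lazy quantifier and the optional group can both settle for matching nothing. The sketch below compares the unwrapped form with the form that wraps both in an optional non-capturing group; the variable names are illustrative.

```python
import re

text = 'aaa79bbbMYSUBSTRINGccc'

# Optional group directly after a lazy quantifier: both may match the empty string.
loose = re.search(r'(\d)(\d).*?(MYSUBSTRING)?', text)

# Lazy quantifier and capture wrapped together in one optional non-capturing group.
wrapped = re.search(r'(\d)(\d)(?:.*?(MYSUBSTRING))?', text)

print(loose.groups())    # third group is None even though MYSUBSTRING is present
print(wrapped.groups())  # third group is 'MYSUBSTRING'
```

In the wrapped form the regex engine first tries to satisfy the whole optional group, so .*? is forced to expand until MYSUBSTRING is found before the engine falls back to skipping the group.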
