Related
I have a list of strings. Each element represents a field as a key and a value separated by a space:
listA = [
'abcd1-2 4d4e',
'xyz0-1 551',
'foo 3ea',
'bar1 2bd',
'mc-mqisd0-2 77a'
]
Behavior
I need to build a dict from this list, expanding any key that ends in a range such as 'abcd1-2' into one key per number in that range (abcd1 and abcd2), each with the same value (4d4e).
It should run as part of an Ansible plugin, where Python 2.7 is used.
Expected
The end result would look like the dict below:
{
    'abcd1': '4d4e',
    'abcd2': '4d4e',
    'xyz0': '551',
    'xyz1': '551',
    'foo': '3ea',
    'bar1': '2bd',
    'mc-mqisd0': '77a',
    'mc-mqisd1': '77a',
    'mc-mqisd2': '77a',
}
Code
I have created the function below. It works with Python 3.
def listFln(listA):
    import re
    fL = []
    for i in listA:
        aL = i.split()[0]
        bL = i.split()[1]
        comp = re.sub('^(.+?)(\d+-\d+)?$',r'\1',aL)
        cmpCountR = re.sub('^(.+?)(\d+-\d+)?$',r'\2',aL)
        if cmpCountR.strip():
            nStart = int(cmpCountR.split('-')[0])
            nEnd = int(cmpCountR.split('-')[1])
            for j in range(nStart,nEnd+1):
                fL.append(comp + str(j) + ' ' + bL)
        else:
            fL.append(i)
    return(dict([k.split() for k in fL]))
Error
In older Python versions such as Python 2.7, this code throws an "unmatched group" error:
cmpCountR = re.sub('^(.+?)(\d+-\d+)?$',r'\2',aL)
File "/usr/lib64/python2.7/re.py", line 151, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/lib64/python2.7/re.py", line 275, in filter
return sre_parse.expand_template(template, match)
File "/usr/lib64/python2.7/sre_parse.py", line 800, in expand_template
raise error, "unmatched group"
Anything wrong with the regex here?
Here's a simpler version using findall instead of sub, successfully tested on 2.7. It also builds the dict directly instead of first building a list:
mylist=[
'abcd1-2 4d4e',
'xyz0-1 551',
'foo 3ea',
'bar1 2bd',
'mc-mqisd0-2 77a'
]
def listFln(listA):
    import re
    fL = {}
    for i in listA:
        aL = i.split()[0]
        bL = i.split()[1]
        comp = re.findall('^(.+?)(\d+-\d+)?$',aL)[0]
        if comp[1]:
            nStart = int(comp[1].split('-')[0])
            nEnd = int(comp[1].split('-')[1])
            for j in range(nStart,nEnd+1):
                fL[comp[0]+str(j)] = bL
        else:
            fL[comp[0]] = bL
    return fL
print(listFln(mylist))
# {'abcd1': '4d4e',
# 'abcd2': '4d4e',
# 'xyz0': '551',
# 'xyz1': '551',
# 'foo': '3ea',
# 'bar1': '2bd',
# 'mc-mqisd0': '77a',
# 'mc-mqisd1': '77a',
# 'mc-mqisd2': '77a'}
I used Python 2.7 to reproduce the problem. This answer shows why re.sub in Python 2.7 fails on backreferences to groups that were not found, and a pattern to fix it.
Both patterns compile
import re
# both seem identical
regex1 = '^(.+?)(\d+-\d+)?$'
regex2 = '^(.+?)(\d+-\d+)?$'
# the compiled patterns are also the same object (re caches compiled patterns), see the address
re.compile(regex1) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
re.compile(regex2) # <_sre.SRE_Pattern object at 0x7f575ef8fd40>
Note: compiling the pattern once with re.compile() and reusing the pattern object can save time when it is used many times, as in this loop.
Fix: test for groups found
The error message indicates that some groups were not matched.
Put another way: the replacement template passed to re.sub (see the Python 2.7 docs) refers to a group, here the second capturing group (\2), that did not capture anything in the given input string:
sre_constants.error: unmatched group
To fix this, we should check which groups were found in the match.
Therefore we use re.match(regex, str), or the compiled variant pattern.match(str), to create a Match object, then Match.groups() to return all groups as a tuple, with None for any group that did not participate in the match.
import re

regex = r'^(.+?)(\d+-\d+)?$'  # a key followed by an optional digit range
pattern = re.compile(regex)  # <_sre.SRE_Pattern object at 0x7f575ef8fd40>

def dict_with_expanded_digits(fields_list):
    entry_list = []
    for fields in fields_list:
        (key_digits_range, value) = fields.split()  # a pair like ('key0-1', 'value')
        # test for a match and the groups found
        match = pattern.match(key_digits_range)
        # skip to the next iteration if the key does not match the pattern
        # (this check must come before match.groups(), since match may be None)
        if not match:
            print('ERROR: no valid key! Will not add to dict.', fields)
            continue
        print("DEBUG: groups:", match.groups())  # tuple containing all the subgroups of the match
        # note: the 3rd iteration has only group(1), while group(2) is None
        # if there is no 2nd group, add a single key-value pair
        if not match.group(2):
            print('WARN: key without range! Will add as single entry:', fields)
            entry_list.append((key_digits_range, value))
            continue  # stop this iteration and continue with the next
        key = pattern.sub(r'\1', key_digits_range)
        index_range = pattern.sub(r'\2', key_digits_range)  # safe: group 2 matched here
        # no strip needed here
        (start, end) = index_range.split('-')
        for index in range(int(start), int(end) + 1):
            expanded_key = "{}{}".format(key, index)
            entry = (expanded_key, value)  # use a tuple (key, value) for each entry
            entry_list.append(entry)
    return dict(entry_list)
list_a = [
'abcd1-2 4d4e', # 2 entries
'xyz0-1 551', # 2 entries
'foo 3ea', # 1 entry
'bar1 2bd', # 1 entry
'mc-mqisd0-2 77a' # 3 entries
]
dict_a = dict_with_expanded_digits(list_a)
print("INFO: resulting dict with length: ", len(dict_a), dict_a)
assert len(dict_a) == 9
Prints:
('DEBUG: groups:', ('abcd', '1-2'))
('DEBUG: groups:', ('xyz', '0-1'))
('DEBUG: groups:', ('foo', None))
('WARN: key without range! Will add as single entry:', 'foo 3ea')
('DEBUG: groups:', ('bar1', None))
('WARN: key without range! Will add as single entry:', 'bar1 2bd')
('DEBUG: groups:', ('mc-mqisd', '0-2'))
('INFO: resulting dict with length: ', 9, {'bar1': '2bd', 'foo': '3ea', 'mc-mqisd2': '77a', 'mc-mqisd0': '77a', 'mc-mqisd1': '77a', 'xyz1': '551', 'xyz0': '551', 'abcd1': '4d4e', 'abcd2': '4d4e'})
Note on added improvements
renamed the function and variables to express intent
used tuples where possible, e.g. the assignment (start, end)
used the methods of the compiled pattern object pattern instead of the module-level re functions
the guard statement if not match.group(2): skips the expansion and adds the key-value pair as is
added an assert to verify that the given list of 5 fields expands to a dict of 9 entries as expected
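Another option, not used in the answer above, is to pass a function instead of a template string as the replacement: re.sub calls it with the Match object, so an unmatched group (None) can be mapped to '' by hand, which sidesteps the "unmatched group" error on Python 2.7. A minimal sketch; the names PATTERN, range_part and expand are hypothetical:

```python
import re

# Same pattern as in the question: a key followed by an optional digit range
PATTERN = re.compile(r'^(.+?)(\d+-\d+)?$')


def range_part(key):
    # The lambda receives the Match object; "or ''" turns an unmatched
    # group 2 (None) into an empty string instead of raising an error.
    return PATTERN.sub(lambda m: m.group(2) or '', key)


def expand(fields_list):
    result = {}
    for fields in fields_list:
        key, value = fields.split()
        prefix = PATTERN.sub(lambda m: m.group(1), key)  # key without the range
        digits = range_part(key)                        # e.g. '0-2' or ''
        if digits:
            start, end = digits.split('-')
            for index in range(int(start), int(end) + 1):
                result[prefix + str(index)] = value
        else:
            result[key] = value
    return result
```

The same code runs unchanged on Python 2.7 and Python 3, since it never relies on how a template string treats unmatched groups.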
You could use a single pattern with 4 capture groups, and check if the 3rd capture group value is not empty.
^(\S*?)(?:(\d+)-(\d+))?\s+(.*)
The pattern matches:
^ Start of string
(\S*?) Capture group 1, match optional non-whitespace chars, as few as possible
(?:(\d+)-(\d+))? Optionally capture 1+ digits in group 2 and group 3 with a - in between
\s+ Match 1+ whitespace chars
(.*) Capture group 4, match the rest of the line
Code example (works on Python 2 and Python 3)
import re
strings = [
'abcd1-2 4d4e',
'xyz0-1 551',
'foo 3ea',
'bar1 2bd',
'mc-mqisd0-2 77a'
]
def listFln(listA):
    dct = {}
    for s in listA:
        lst = sum(re.findall(r"^(\S*?)(?:(\d+)-(\d+))?\s+(.*)", s), ())
        if not lst:
            continue  # no key/value pair on this line
        if lst[2]:
            for i in range(int(lst[1]), int(lst[2]) + 1):
                dct[lst[0] + str(i)] = lst[3]
        else:
            dct[lst[0]] = lst[3]
    return dct
print(listFln(strings))
Output
{
'abcd1': '4d4e',
'abcd2': '4d4e',
'xyz0': '551',
'xyz1': '551',
'foo': '3ea',
'bar1': '2bd',
'mc-mqisd0': '77a',
'mc-mqisd1': '77a',
'mc-mqisd2': '77a'
}
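If you prefer not to flatten the findall result with sum(), re.match(...).groups() returns the four groups directly; with match(), a group that did not participate is None rather than ''. A sketch under that assumption, using a hypothetical function name expand_fields:

```python
import re

# The same 4-group pattern as above, compiled once
PATTERN = re.compile(r"^(\S*?)(?:(\d+)-(\d+))?\s+(.*)")


def expand_fields(strings):
    dct = {}
    for s in strings:
        m = PATTERN.match(s)
        if m is None:
            continue  # skip lines without a key and a value
        key, start, end, value = m.groups()
        if end is not None:
            # groups 2 and 3 are None when the optional range did not match
            for i in range(int(start), int(end) + 1):
                dct[key + str(i)] = value
        else:
            dct[key] = value
    return dct
```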
With text, type, scope being strings and val, altval being ints, why is the following code syntactically incorrect? (I know this isn't the aesthetically correct way to do it, but would that affect syntax?)
result = [(i[val:] if scope=="before" else i[:val] if scope=="after" else i[val:altval] if scope=="beforeafter" else i) if j<=until for j,i in enumerate(text.split("\n"))]
Broken down into lines:
result = [
(i[val:] if scope=="before"
else i[:val] if scope=="after"
else i[val:altval] if scope=="beforeafter"
else i) if j<=until
for j,i in enumerate(text.split("\n"))]
With lines split up as this, the SyntaxError is at the last line:
for j,i in enumerate(text.split("\n"))]
^
Version: Python 3.x
System: Windows
What am I missing? Is it a list comprehension limitation?
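Move the last if condition after the for clause. A trailing if without an else is a filter, and filters are only allowed after for; before for, only a complete conditional expression (a if cond else b) is allowed: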
result = [
(
i[val:] if scope=="before"
else i[:val] if scope=="after"
else i[val:altval] if scope=="beforeafter"
else i
) for j, i in enumerate(text.split("\n"))
if j <= until
]
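The two forms side by side, with made-up lines, val and until values: a conditional expression before for keeps every element, while an if after for drops elements:

```python
lines = ["abcdef", "ghijkl", "mnopqr"]
val, until = 2, 1

# Conditional expression before "for": needs an else, every element produces a value
transformed = [line[val:] if j <= until else line for j, line in enumerate(lines)]

# Trailing "if" after "for": a filter that drops elements where j > until
filtered = [line[val:] for j, line in enumerate(lines) if j <= until]
```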
I have a list of filenames: some of them contain only text, some contain text and a number, and some contain text, numbers and underscores.
Example:
[ 'mango_1.jpg', 'dog005.jpg', 'guru_2018_01_01.png', 'dog008.jpg', 'mango_6.jpg', 'guru_2018_5_23.png', 'dog01.png', 'mango_11.jpg', 'mango2.jpg', 'guru_2018_02_5.png', 'guru_2019_08_23.jpg', 'dog9.jpg', 'mango05.jpg' ]
My code is:
import re
## ref: https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
def sort_nicely( l ):
    """ Sort the given list in the way that humans expect.
    """
    convert = lambda text: int(text) if text.isdigit() else text
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ]
    l.sort( key=alphanum_key )
    return print(l)
Actual output:
['dog01.png', 'dog005.jpg', 'dog008.jpg', 'dog9.jpg', 'guru_2018_01_01.png', 'guru_2018_02_5.png', 'guru_2018_5_23.png', 'guru_2019_08_23.jpg', 'mango2.jpg', 'mango05.jpg', 'mango_1.jpg', 'mango_6.jpg', 'mango_11.jpg']
Expected output:
['dog01.png', 'dog005.jpg', 'dog008.jpg', 'dog9.jpg', 'guru_2018_01_01.png', 'guru_2018_02_5.png', 'guru_2018_5_23.png', 'guru_2019_08_23.jpg', 'mango_1.jpg', 'mango2.jpg', 'mango05.jpg', 'mango_6.jpg', 'mango_11.jpg']
How do I get the expected output?
It looks like you don't want the _ character to have any significance here, so modify your code to remove it before comparing:
import re
## ref: https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
def sort_nicely( l ):
    """ Sort the given list in the way that humans expect.
    """
    convert = lambda text: int(text) if text.isdigit() else text
    alphanum_key = lambda key: [ convert(c.replace("_","")) for c in re.split('([0-9]+)', key) ]
    l.sort( key=alphanum_key )
    return print(l)
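The same idea can be written as a key function that returns a value instead of printing. A sketch (the name natural_key is hypothetical) that drops underscores from the text parts, run against the list from the question:

```python
import re


def natural_key(name):
    # Split into text and digit runs; remove "_" from the text parts so that
    # "mango_1" and "mango2" share the same text prefix "mango".
    parts = re.split(r'([0-9]+)', name)
    return [int(p) if p.isdigit() else p.replace('_', '') for p in parts]


files = ['mango_1.jpg', 'dog005.jpg', 'guru_2018_01_01.png', 'dog008.jpg',
         'mango_6.jpg', 'guru_2018_5_23.png', 'dog01.png', 'mango_11.jpg',
         'mango2.jpg', 'guru_2018_02_5.png', 'guru_2019_08_23.jpg',
         'dog9.jpg', 'mango05.jpg']
result = sorted(files, key=natural_key)
```

Because re.split with a capturing group always puts text at even positions and digit runs at odd positions, two keys never compare an int against a str at the same position.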
From what I understand, you want to sort according to the "text" and then the "date" that may exist in each filename. So first you need a function that can split filenames into those two components:
def split(n):
    # back-to-front, find first letter index
    for (i, c) in enumerate(reversed(n)):
        if not (c.isdigit() or c == '_'):
            break
    # proper (non-reversed) index
    i = len(n) - i
    # split into name and date
    (n, t) = (n[:i], n[i:])
    # split and remove extra underscores
    t = filter(bool, t.split('_'))
    # convert to integers and return
    return n, tuple(map(int, t))
Then you need a function to get rid of any unwanted parts in filenames (like extensions):
import os
def parse(n):
    # drop the extension, e.g. '.jpg'
    (n, e) = os.path.splitext(n)
    return split(n)
Now you can simply use this as a key in the built-in sorted function:
>>> sorted(l, key = parse)
['dog01.png', 'dog005.jpg', 'dog008.jpg', 'dog9.jpg', 'guru_2018_01_01.png', 'guru_2018_02_5.png', 'guru_2018_5_23.png', 'guru_2019_08_23.jpg', 'mango_1.jpg', 'mango2.jpg', 'mango05.jpg', 'mango_6.jpg', 'mango_11.jpg']
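For comparison, the backwards scan in split() can also be expressed with one regular expression. This is an alternative sketch, not the answer's code; parse_name and TRAILING are made-up names:

```python
import os
import re

# The name is everything before a trailing run of digits and underscores;
# that run is split on "_" and parsed as integers.
TRAILING = re.compile(r'^(.*?)([\d_]*)$')


def parse_name(filename):
    stem = os.path.splitext(filename)[0]
    name, tail = TRAILING.match(stem).groups()
    numbers = tuple(int(part) for part in tail.split('_') if part)
    return name, numbers


files = ['mango_1.jpg', 'dog005.jpg', 'guru_2018_01_01.png', 'dog008.jpg',
         'mango_6.jpg', 'guru_2018_5_23.png', 'dog01.png', 'mango_11.jpg',
         'mango2.jpg', 'guru_2018_02_5.png', 'guru_2019_08_23.jpg',
         'dog9.jpg', 'mango05.jpg']
result = sorted(files, key=parse_name)
```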
I'm trying to replace '=' with '==' in the following string:
log="[x] = '1' and [y] <> '7' or [z]='51'".
Unfortunately, only the second '=' is getting replaced. Why is the first one not being replaced and how do I replace the first one as well?
import re

def subs_equal_sign(logic):
    y = re.compile(r'\]\s?\=\s?')
    iterator = y.finditer(logic)
    for match in iterator:
        j = str(match.group())
    return logic.replace(j, ']==')
The output should be:
log="[x] == '1' and [y] <> '7' or [z]=='51'".
This is what I get instead:
log="[x] = '1' and [y] <> '7' or [z]=='51'".
for match in iterator:
    j = str(match.group())
return logic.replace(j, ']==')
This part goes through the matches and doesn't do any replacing.
The replacing happens only after you leave the loop, using the last match found; that's why only the last one changes. ;)
Also, the replacing doesn't use the regex at all: a plain str.replace replaces every occurrence of the substring. So if your first = had no space before it, it would have been changed too!
Looking at your regex, ] and = can be separated by at most one whitespace character, so why not handle those two cases directly instead of using regexes? ;)
def subs_equal_sign(logic):
    return logic.replace(']=', ']==').replace('] =', ']==')
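To see the loop problem in isolation, this small sketch collects what finditer yields for the string from the question and then repeats the question's final replace with the last match only:

```python
import re

logic = "[x] = '1' and [y] <> '7' or [z]='51'"
pattern = re.compile(r'\]\s?\=\s?')

# finditer yields every match, but the loop in the question only keeps the last one
matches = [m.group() for m in pattern.finditer(logic)]

# str.replace then replaces that last substring wherever it occurs
last = matches[-1]
result = logic.replace(last, ']==')
```

Here matches is ['] = ', ']='], so only the ']=' in [z]='51' gets rewritten.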
Maybe str.replace() is what you are looking for:
log="[x] = '1' and [y] <> '7' or [z]='51'"
log = log.replace("=", "==")
Change your function to use re.sub, which replaces every match:
import re

def subs_equal_sign(logic):
    y = re.compile(r'\]\s?\=\s?')
    return y.sub("]==", logic)
and the output will now be
>>> subs_equal_sign('''log="[x] = '1' and [y] <> '7' or [z]='51'".''')
'log="[x]==\'1\' and [y] <> \'7\' or [z]==\'51\'".'
as expected.
@h4z3 correctly pointed out that your key problem is iterating through the matches without doing anything with them. You can make it work by simply using re.sub() to replace all occurrences at once.
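If the original spacing should be kept, which matches the expected output in the question exactly, a variant (a sketch, not taken from the answers) captures the optional whitespace and reinserts it with a backreference. The group (\s?) can match an empty string, so it always participates and never triggers the Python 2.7 "unmatched group" error:

```python
import re


def subs_equal_sign(logic):
    # Capture the optional whitespace between ']' and '=' and put it back,
    # so "[x] = '1'" becomes "[x] == '1'" and "[z]='51'" becomes "[z]=='51'".
    return re.sub(r'\](\s?)=', r']\1==', logic)
```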
A quick way to deal with this is to remove the whitespace between ] and = first, so that every match is the same substring:
import re

def subs_equal_sign(logic):
    # strings are immutable, so keep the result of replace()
    logic = logic.replace('] =', ']=')
    y = re.compile(r'\]\s?\=')
    j = ']='  # fallback if there is no match
    for match in y.finditer(logic):
        j = str(match.group())
    return logic.replace(j, ']==')
Does the string represent the branching logic for a REDCap variable? If so, I wrote a function a while back that should convert REDCap's SQL-like syntax to a pythonic form. Here it is:
import re

def make_pythonic(logic):
    """
    Takes the branching logic string of a field name
    and converts the syntax to that of Python.
    """
    # make list of all checkbox vars in branching_logic string
    # NOTE: items in list have the same serialization (ordering)
    # as in the string.
    checkbox_snoop = re.findall(r'[a-z0-9_]*\([0-9]*\)', logic)
    # if there are entries in checkbox_snoop
    if len(checkbox_snoop) > 0:
        # serially replace "[mycheckboxvar(888)]" syntax of each
        # checkbox var in the logic string with the appropriate
        # "record['mycheckboxvar___888']" syntax
        for item in checkbox_snoop:
            item = re.sub(r'\)', '', item)
            item = re.sub(r'\(', '___', item)
            # count=1: replace only the next occurrence, in order
            logic = re.sub(r'[a-z0-9_]*\([0-9]*\)', item, logic, count=1)
    # mask and substitute
    logic = re.sub('<=', 'Z11Z', logic)
    logic = re.sub('>=', 'X11X', logic)
    logic = re.sub('=', '==', logic)
    logic = re.sub('Z11Z', '<=', logic)
    logic = re.sub('X11X', '>=', logic)
    logic = re.sub('<>', '!=', logic)
    logic = re.sub(r'\[', "record['", logic)
    logic = re.sub(r'\]', "']", logic)
    # return the string
    return logic
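The masking steps (Z11Z, X11X) exist because replacing = would otherwise also hit <= and >=. A single regex with alternation, listing the two-character operators first, avoids the placeholders. This sketch only covers the operator part, and like the original it would also touch = signs inside quoted values; OPERATORS, OPERATOR_RE and translate_operators are hypothetical names:

```python
import re

# Two-character operators come first in the alternation so that "<=" is
# matched as a whole instead of as "<" followed by "=".
OPERATORS = {'<=': '<=', '>=': '>=', '<>': '!=', '=': '=='}
OPERATOR_RE = re.compile(r'<=|>=|<>|=')


def translate_operators(logic):
    # Each match is looked up in the table and replaced in a single pass
    return OPERATOR_RE.sub(lambda m: OPERATORS[m.group()], logic)
```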
This replaces every occurrence of the given substring with the new one throughout the string:
log = log.replace("=", "==")  # replace each "=" with "=="
print(log)  # display the result
I am a beginner in Python. I found this exercise: given a string in the format "%0 is a %1 %2" and a tuple ("Ram", "good", "boy"), replace each %x in the string with the tuple element at index x. The result must be "Ram is a good boy".
(After edit:) I forgot to mention: if the given tuple is ("Ram", "good"), the answer must be "Ram is a good %2", i.e. any remaining %x should be left as it is.
I did it as shown in the code below, but I was told it could be written more efficiently, in fewer lines. Could you please help me with how? Thanks in advance.
format = "%0 is a %1 %2"
args = ("Ram", "good", "boy")
count = 0
for i in range(0, len(format) + 1):
    if format[i] == '%':
        b= '%'
        b = b + format[i + 1]
        format = format.replace(b, args[int(format[i+1])])
        count+= 1
        if count == len(args):
            break
print format
I would use str.format, you can simply unpack the tuple:
args = ("Ram", "good", "boy")
print("{} is a {} {}".format(*args))
Ram is a good boy
If you need to convert the original string first, use re.sub:
import re
s = "%2 and %1 and %0"
args = ("one", "two", "three")
print(re.sub(r"%\d+", lambda x: "{"+x.group()[1:]+"}", s).format(*args))
Output:
In [6]: s = "%2 and %1 and %0"
In [7]: re.sub(r"%\d+", lambda x: "{"+x.group()[1:]+"}", s).format(*args)
Out[7]: 'three and two and one'
In [8]: s = "%1 and %0 and %2"
In [9]: re.sub(r"%\d+",lambda x: "{"+x.group()[1:]+"}", s).format(*args)
Out[9]: 'two and one and three'
%\d+ matches a percent sign followed by one or more digits. The x in the lambda is a match object: we use .group() to get the matched string, slice off the %, and wrap the digits in {} so they become placeholders for str.format.
Regarding the comment that there can be more placeholders than args: re.sub takes a count argument, the maximum number of replacements to make (this works when the placeholders appear in index order):
s = "%0 is a %1 %2"
args = ("Ram", "Good")
sub = re.sub(r"%\d+\b", lambda x: "{"+x.group()[1:]+"}", s,count=len(args)).format(*args)
print(sub)
Output:
Ram is a Good %2
To work for arbitrary order, it is going to take more logic:
s = "%2 is a %1 %0"
args = ("Ram", "Good")
sub = re.sub(r"%\d+\b", lambda x: "{"+x.group()[1:]+"}" if int(x.group()[1:]) < len(args) else x.group(), s).format(*args)
print(sub)
Output:
%2 is a Good Ram
Moving the lambda logic to a function is a little nicer:
s = "%2 is a %1 %0"
args = ("Ram", "Good")
def f(x):
    g = x.group()
    return "{"+g[1:]+"}" if int(g[1:]) < len(args) else g
sub = re.sub(r"%\d+\b",f, s).format(*args)
Or using split and join if the placeholders are always separate words:
s = "%2 and %1 and %0"
args = ("one", "two", "three")
print(" ".join(["{"+w[1:]+"}" if w.startswith("%") else w for w in s.split(" ")]).format(*args))
three and two and one
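A variation on the function-based approach is to return the argument itself from the replacement function, so no str.format pass is needed. That also means a literal { or } in the input string cannot be mistaken for a format field. A minimal sketch (fill_placeholders is a made-up name) that handles arbitrary order and fewer args than placeholders:

```python
import re


def fill_placeholders(template, args):
    # Replace %N with args[N] when N is a valid index; otherwise keep %N as is.
    def repl(match):
        index = int(match.group(1))
        return args[index] if index < len(args) else match.group(0)

    return re.sub(r'%(\d+)', repl, template)
```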
Maybe use str.replace to replace the various %x placeholders with their tuple counterparts, like:
format = "%0 is a %1 %2"
args = ("Ram", "good", "boy")
result = format # Set it here in case args is the empty tuple
for index, arg in enumerate(args):
    formatter = '%' + str(index)  # "%0", "%1", etc
    result = result.replace(formatter, arg)
print(result)
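One caveat with this loop: once there are ten or more arguments, replacing %1 also rewrites the start of %10. Iterating over the indices from highest to lowest avoids that; a sketch with a hypothetical helper name replace_placeholders:

```python
def replace_placeholders(template, args):
    result = template
    # Go from the highest index down so "%10" is replaced before "%1";
    # otherwise replacing "%1" would also change the prefix of "%10".
    for index in reversed(range(len(args))):
        result = result.replace('%' + str(index), args[index])
    return result
```

Placeholders without a matching argument, such as %2 with only two args, are left untouched.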
Use the built-in % string formatting:
>>> print('%s is a %s %s' % ('Ram','good','boy'))
Ram is a good boy
According to your edit, you are looking for something different. You can use re.findall and re.sub to accomplish this:
>>> import re
>>> formatstring, args = "%0 is a %1 %2", ("Ram", "good", "boy")
>>> for x in re.findall(r'(%\d+)', formatstring):
...     if int(x[1:]) < len(args):  # leave %x as is when there is no such argument
...         formatstring = re.sub(x, args[int(x[1:])], formatstring)
...
>>> formatstring
'Ram is a good boy'