Python regex named result with variable - python

I'm writing code to parse SVG's transform command in Python 3.7:
import re
t = "translate(44,22) rotate(55,6,7) scale(2)"
num = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
types = "matrix|translate|rotate|scale|skewX|skewY"
regex = rf"({types})\((?P<arg1>{num})(?:,?(?P<argi>{num}))*\)" # <- 'i' as an increasing number
for match in re.finditer(regex, t):
    print(match.groupdict())
The types in input string t could have up to 6 parameters inside of the parentheses ('matrix' has 6, others have fewer). I'd like to use groupdict() to give me numbered arguments arg-1, arg-2, arg-3, etc. depending on how many finditer has found. That means that the named match needs to be a variable that's increasing.
I've tried some obvious stuff and looked at the docs. Neither got it working for me.
So... is it possible? Am I thinking about this the wrong way? Thanks!

If there can only be up to 6 arguments inside the parentheses, use six (?:,(?P<argX>{num}))? optional groups (where X is a digit from 1 to 6) to match 1 to 6 patterns matching the arguments, and then discard all the groupdict items that have None value:
import re
t = "translate(44,22) rotate(55,6,7) scale(2)"
num = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
types = "matrix|translate|rotate|scale|skewX|skewY"
# arg1 is required; arg2..arg6 are optional, so unmatched groups come back as None
regex = rf"({types})\((?P<arg1>{num})(?:,(?P<arg2>{num}))?(?:,(?P<arg3>{num}))?(?:,(?P<arg4>{num}))?(?:,(?P<arg5>{num}))?(?:,(?P<arg6>{num}))?\)"
for match in re.finditer(regex, t):
    # keep only the groups that actually matched
    print({k:v for k,v in match.groupdict().items() if v is not None})
Running this prints:
{'arg1': '44', 'arg2': '22'}
{'arg1': '55', 'arg2': '6', 'arg3': '7'}
{'arg1': '2'}
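Writing six nearly identical optional groups by hand is easy to get wrong, so the pattern can also be assembled in a loop. The following is only a sketch of that idea: the helper names `build_transform_regex` and `parse_transforms`, the `MAX_ARGS` constant, and the extra named group `type` are assumptions introduced here for illustration and are not part of the original answer.

```python
import re

NUM = r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?"
TYPES = "matrix|translate|rotate|scale|skewX|skewY"
MAX_ARGS = 6  # assumption: 'matrix' takes the most arguments (six)


def build_transform_regex(max_args=MAX_ARGS):
    """Build the pattern with arg1 required and arg2..argN optional."""
    optional = "".join(
        rf"(?:,(?P<arg{i}>{NUM}))?" for i in range(2, max_args + 1)
    )
    # 'type' is a hypothetical group name added so the command is in groupdict()
    return re.compile(rf"(?P<type>{TYPES})\((?P<arg1>{NUM}){optional}\)")


def parse_transforms(text):
    """Return one dict per transform, dropping groups that did not match."""
    pattern = build_transform_regex()
    return [
        {k: v for k, v in m.groupdict().items() if v is not None}
        for m in pattern.finditer(text)
    ]


for parsed in parse_transforms("translate(44,22) rotate(55,6,7) scale(2)"):
    print(parsed)
```

The group names still have to be fixed when the pattern is compiled; the loop only saves typing them out.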

Maybe you can use ast.literal_eval with re to parse the parameters, for example:
import re
from ast import literal_eval
t = "translate(44,22) rotate(55,6,7) scale(2)"
types = "matrix|translate|rotate|scale|skewX|skewY"
print([(f, literal_eval('(' + s + ',)')) for f, s in re.findall(fr'({types})\(([^)]+)', t)])
Prints:
[('translate', (44, 22)), ('rotate', (55, 6, 7)), ('scale', (2,))]

Related

Python package for converting finite regex to a text array? [duplicate]

Suppose I have the following string:
trend = '(A|B|C)_STRING'
I want to expand this to:
A_STRING
B_STRING
C_STRING
The OR condition can be anywhere in the string. i.e STRING_(A|B)_STRING_(C|D)
would expand to
STRING_A_STRING_C
STRING_B_STRING_C
STRING_A_STRING_D
STRING_B_STRING_D
I also want to cover the case of an empty conditional:
(|A_)STRING would expand to:
A_STRING
STRING
Here's what I've tried so far:
def expandOr(trend):
    parenBegin = trend.index('(') + 1
    parenEnd = trend.index(')')
    orExpression = trend[parenBegin:parenEnd]
    originalTrend = trend[0:parenBegin - 1]
    expandedOrList = []
    for oe in orExpression.split("|"):
        expandedOrList.append(originalTrend + oe)
But this is obviously not working.
Is there any easy way to do this using regex?
Here's a pretty clean way. You'll have fun figuring out how it works :-)
def expander(s):
    import re
    from itertools import product
    pat = r"\(([^)]*)\)"
    pieces = re.split(pat, s)
    pieces = [piece.split("|") for piece in pieces]
    for p in product(*pieces):
        yield "".join(p)
Then:
for s in ('(A|B|C)_STRING',
          '(|A_)STRING',
          'STRING_(A|B)_STRING_(C|D)'):
    print(s, "->")
    for t in expander(s):
        print(" ", t)
displays:
(A|B|C)_STRING ->
A_STRING
B_STRING
C_STRING
(|A_)STRING ->
STRING
A_STRING
STRING_(A|B)_STRING_(C|D) ->
STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D
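For anyone who would rather not work it out alone, here is a step-by-step sketch of what the generator above does. It relies on the documented behaviour that `re.split` keeps the text of capturing groups in its result; the variable names `alternatives` and `expanded` are introduced here only for illustration.

```python
import re
from itertools import product

s = "STRING_(A|B)_STRING_(C|D)"

# Splitting on a pattern that has a capturing group keeps the captured text,
# so literal pieces and group contents alternate in the result.
pieces = re.split(r"\(([^)]*)\)", s)
print(pieces)        # ['STRING_', 'A|B', '_STRING_', 'C|D', '']

# Every piece becomes a list of choices; a literal piece has one choice.
alternatives = [piece.split("|") for piece in pieces]
print(alternatives)  # [['STRING_'], ['A', 'B'], ['_STRING_'], ['C', 'D'], ['']]

# The Cartesian product picks one choice per piece; joining rebuilds a string.
expanded = ["".join(combo) for combo in product(*alternatives)]
print(expanded)
```

One consequence of this design is that a `|` outside parentheses would also be split, so the approach assumes alternation only ever appears inside a group.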
Another option is the exrex package:
import exrex
trend = '(A|B|C)_STRING'
trend2 = 'STRING_(A|B)_STRING_(C|D)'
>>> list(exrex.generate(trend))
[u'A_STRING', u'B_STRING', u'C_STRING']
>>> list(exrex.generate(trend2))
[u'STRING_A_STRING_C', u'STRING_A_STRING_D', u'STRING_B_STRING_C', u'STRING_B_STRING_D']
I would do this to extract the groups:
def extract_groups(trend):
    l_parens = [i for i,c in enumerate(trend) if c == '(']
    r_parens = [i for i,c in enumerate(trend) if c == ')']
    assert len(l_parens) == len(r_parens)
    return [trend[l+1:r].split('|') for l,r in zip(l_parens,r_parens)]
And then you can evaluate the product of those extracted groups using itertools.product:
expr = 'STRING_(A|B)_STRING_(C|D)'
from itertools import product
list(product(*extract_groups(expr)))
Out[92]: [('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D')]
Now it's just a question of splicing those back onto your original expression. I'll use re for that :)
# Python 3.3+
import re
def _gen(it):
    yield from it
p = re.compile(r'\(.*?\)')
for tup in product(*extract_groups(expr)):
    gen = _gen(tup)
    # each match pulls the next replacement from the generator
    print(p.sub(lambda x: next(gen), expr))
STRING_A_STRING_C
STRING_A_STRING_D
STRING_B_STRING_C
STRING_B_STRING_D
There's probably a more readable way to get re.sub to sequentially substitute things from an iterable, but this is what came off the top of my head.
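One possibly more readable variant, sketched here under the assumption that groups are never nested, wraps each combination in a plain iterator with `iter` and lets the replacement callback call `next` on it. The function name `expand_groups` is hypothetical.

```python
import re
from itertools import product

GROUP = re.compile(r"\(([^)]*)\)")  # one non-nested parenthesised group


def expand_groups(trend):
    """Expand every (a|b|...) group in trend into all combinations."""
    # findall returns the captured text of each group, e.g. ['A|B', 'C|D']
    choices = [group.split("|") for group in GROUP.findall(trend)]
    results = []
    for combo in product(*choices):
        replacements = iter(combo)
        # sub() calls the function once per match, left to right,
        # so each group receives the next item of the combination.
        results.append(GROUP.sub(lambda m: next(replacements), trend))
    return results


print(expand_groups("STRING_(A|B)_STRING_(C|D)"))
print(expand_groups("(|A_)STRING"))
```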
It is easy to achieve with sre_yield module:
>>> import sre_yield
>>> trend = '(A|B|C)_STRING'
>>> strings = list(sre_yield.AllStrings(trend))
>>> print(strings)
['A_STRING', 'B_STRING', 'C_STRING']
The goal of sre_yield is to efficiently generate all values that can match a given regular expression, or count possible matches efficiently... It does this by walking the tree as constructed by sre_parse (same thing used internally by the re module), and constructing chained/repeating iterators as appropriate. There may be duplicate results, depending on your input string though -- these are cases that sre_parse did not optimize.

pattern match get list and dict from string

I have the string below, and I want to extract the lists, dicts, and plain variables from it.
How can I split this string into that specific format?
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
import re
m1 = re.findall (r'(?=.*,)(.*?=\[.+?\],?)',s)
for i in m1:
    print('m1:',i)
Only the first result is correct.
Does anyone know how to do this?
m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],
Split on '=' instead; then you can work with each variable name and its value.
You still need to handle type casting for the values (a regex, split, or try/except around the cast may help).
Also, as others commented, using a dict may be easier to handle:
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])
    value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])
output = dict(zip(var_l, value_l))
print(output)
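The type casting mentioned above could be handled roughly as follows. This is a sketch, not part of the original answer: the helper `cast_value` is hypothetical, and it assumes that anything `ast.literal_eval` cannot parse (such as `abch` or `{a:2,b:3}` with unquoted keys) should simply stay a string.

```python
import ast


def cast_value(text):
    """Convert text to a Python literal when possible, else return it unchanged."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a valid literal (bare identifier, dict with unquoted keys, ...)
        return text


# The dict produced by the split-based code above
output = {
    'list_c': '[1,2]', 'a': '3', 'b': '1.3',
    'c': 'abch', 'list_a': '[1,2]', 'dict_a': '{a:2,b:3}',
}
typed = {name: cast_value(value) for name, value in output.items()}
print(typed)
```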
You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:
re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
# ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
The solution I ended up with is below:
import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
temp_d = {}
for i,j in m1:
    temp = i.strip(',').split(',')
    if len(temp)>1:
        for k in temp[:-1]:
            temp_d[k]=''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)
The output is:
{'Record': '',
'Save': '',
'a': '3',
'b': '1.3',
'c': 'abch',
'dict_a': '{a:2,b:3}',
'list_a': '[1]',
'list_c': '[1,2]'}
Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):
regex = re.compile(r'([a-z]|_)+=')
#note if you want all valid variable names: r'([a-z]|[A-Z]|[0-9]|_)+'
cases = [x.group() for x in re.finditer(regex, s)]
This gives a list of all the identifiers in the string:
['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']
We can now define a function to sequentially chop up s using the
above list to partition the string sequentially:
def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1]) #strip leading bits
    return temp[cut:], mylist[1:]
mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)
This (convoluted) slicing operation gives this list of strings:
['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']
Now cut off the ends using each successive entry:
result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1 #-1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop()) #get the last item
Now we have the full list:
['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
Each element is easily parsable into key:value pairs (and is also executable via exec).

How to handle missing data in split?

I want to split a string where a part may be missing.
E.g., "foo-bar" should be split into "foo" and "bar" while "zot" into "zot" and None.
foo,bar = line.split('-',1)
works for the first case but not for the second one:
ValueError: need more than 1 value to unpack
I can go, of course, the long way:
foobar = line.split('-',1)
if len(foobar) == 2:
    foo,bar = foobar
else:
    foo,bar = foobar[0],None
but I wonder if this is the most "pythonic" way.
Catch the exception:
try:
    foo, bar = line.split('-', 1)
except ValueError:
    # not enough values
    foo, bar = line, None
Note that you'd need to split once to get two values, not two times.
For this exact example, I'd use the partition method.
>>> 'foo-bar'.partition('-')
('foo', '-', 'bar')
>>> 'foobar'.partition('-')
('foobar', '', '')
>>> 'foo-bar-baz'.partition('-')
('foo', '-', 'bar-baz')
For the general case where there's more than one split, but still a known number, I usually check the length of the result of split, but Martijn is (unsurprisingly) right that catching the exception is fine too and is probably a better choice if strings missing the delimiter are uncommon.
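For that general case, one way to avoid both the length check and the exception is to pad the result with `None` up to the expected number of fields. This is a minimal sketch; the helper name `split_padded` is an assumption made for illustration.

```python
def split_padded(line, sep, count):
    """Split line into exactly count parts, padding with None if parts are missing."""
    parts = line.split(sep, count - 1)  # at most count - 1 splits
    return parts + [None] * (count - len(parts))


foo, bar = split_padded("foo-bar", "-", 2)
print(foo, bar)

foo, bar = split_padded("zot", "-", 2)
print(foo, bar)

a, b, c = split_padded("x-y", "-", 3)
print(a, b, c)
```

Because the number of splits is capped at `count - 1`, any extra delimiters stay in the last field instead of breaking the unpacking.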
Using a list comprehension:
i=['ff-bb','cc','dd-ss-vv']
[string+[None] if len(string)==1 else string for string in [x.split('-') for x in i]]
returns
[['ff', 'bb'], ['cc', None], ['dd', 'ss', 'vv']]

How to convert a malformed string to a dictionary?

I have a string s (note that the a and b are not enclosed in quotation marks, so it can't directly be evaluated as a dict):
s = '{a:1,b:2}'
I want to convert this variable to a dict like this:
{'a':1,'b':2}
How can I do this?
This will work with your example:
import ast
def elem_splitter(s):
    return s.split(':',1)
s = '{a:1,b:2}'
s_no_braces = s.strip()[1:-1] #s.translate(None,'{}') is more elegant, but can fail if you can have strings with '{' or '}' enclosed.
elements = (elem_splitter(ss) for ss in s_no_braces.split(','))
d = dict((k,ast.literal_eval(v)) for k,v in elements)
Note that this will fail if you have a string formatted as:
'{s:"foo,bar",ss:2}' #comma in string is a problem for this algorithm
or:
'{s,ss:1,v:2}'
but it will pass a string like:
'{s ss:1,v:2}' #{"s ss":1, "v":2}
You may also want to modify elem_splitter slightly, depending on your needs:
def elem_splitter(s):
    k,v = s.split(':',1)
    return k.strip(),v # maybe `v.strip() also?`
*Somebody else might cook up a better example using more of the ast module, but I don't know its internals very well, so I doubt I'll have time to make that answer.
Because your string is malformed as both JSON and a Python dict literal, you can use neither json.loads nor ast.literal_eval to convert the data directly.
In this particular case, you have to translate it to a Python dictionary manually, using your knowledge of the input data:
>>> foo = '{a:1,b:2}'
>>> dict(e.split(":") for e in foo.translate(None,"{}").split(","))
{'a': '1', 'b': '2'}
As Tim pointed out, I overlooked the fact that the values should be integers. Here is an alternate implementation:
>>> {k: int(v) for e in foo.translate(None,"{}").split(",")
for k, v in [e.split(":")]}
{'a': 1, 'b': 2}
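The two-argument form `translate(None, "{}")` used above only exists on Python 2 byte strings. On Python 3, a roughly equivalent sketch would build a deletion table with `str.maketrans`; the variable names `body` and `d` are chosen here for illustration.

```python
foo = '{a:1,b:2}'

# Python 3: the third argument of str.maketrans lists characters to delete.
body = foo.translate(str.maketrans('', '', '{}'))

# Split each "key:value" item once and convert the value to int.
d = {
    key.strip(): int(value)
    for key, value in (item.split(':', 1) for item in body.split(','))
}
print(d)
```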
import re,ast
regex = re.compile('([a-z])')
ast.literal_eval(regex.sub(r'"\1"', s))
out:
{'a': 1, 'b': 2}
EDIT:
If you happen to have something like {foo1:1,bar:2} add an additional capture group to the regex:
regex = re.compile(r'(\w+)(:)')
ast.literal_eval(regex.sub(r'"\1"\2', s))
You can do it simply with this:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
to_int = lambda x: int(x) if x.isdigit() else x
d = dict((to_int(i) for i in pair.split(":", 1)) for pair in content.split(","))
For simplicity I've omitted exception handling if the string doesn't contain a valid specification, and also this version doesn't strip whitespace, which you may want. If the interpretation you prefer is that the key is always a string and the value is always an int, then it's even easier:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
d = dict((k.strip(), int(v)) for k, v in (pair.split(":", 1) for pair in content.split(",")))
As a bonus, this version also strips whitespace from the key to show how simple it is.
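import re
import simplejson
s = '{a:1,b:2}'
# JSON requires double-quoted keys, so quote the bare keys before parsing
a = simplejson.loads(re.sub(r'(\w+):', r'"\1":', s))
print(a)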

Inserting a character at regular intervals in a list

I am trying to convert 10000000C9ABCDEF to 10:00:00:00:c9:ab:cd:ef
This is needed because the 10000000C9ABCDEF format is how I see HBAs (host bus adapters) when I log in to my storage arrays, but the SAN switches understand the 10:00:00:00:c9:ab:cd:ef notation.
I have only been able to accomplish till the following:
#script to convert WWNs to lowercase and add the :.
def wwn_convert():
    while True:
        wwn = (input('Enter the WWN or q to quit- '))
        list_wwn = list(wwn)
        list_wwn = [x.lower() for x in list_wwn]
        lower_wwn = ''.join(list_wwn)
        print(lower_wwn)
        if wwn == 'q':
            break
wwn_convert()
I tried ':'.join, but that inserts : after each character, so I get 1:0:0:0:0:0:0:0:c:9:a:b:c:d:e:f
I want the .join to go through a loop where I can say something like for i in range (0, 15, 2) so that it inserts the : after two characters, but not quite sure how to go about it. (Good that Python offers me to loop in steps of 2 or any number that I want.)
Additionally, I will be thankful if someone could direct me to pointers where I could script this better...
Please help.
I am using Python Version 3.2.2 on Windows 7 (64 Bit)
Here is another option:
>>> s = '10000000c9abcdef'
>>> ':'.join(a + b for a, b in zip(*[iter(s)]*2))
'10:00:00:00:c9:ab:cd:ef'
Or even more concise:
>>> import re
>>> ':'.join(re.findall('..', s))
'10:00:00:00:c9:ab:cd:ef'
>>> s = '10000000C9ABCDEF'
>>> ':'.join([s[x:x+2] for x in range(0, len(s)-1, 2)])
'10:00:00:00:C9:AB:CD:EF'
Explanation:
':'.join(...) returns a new string inserting ':' between the parts of the iterable
s[x:x+2] returns a substring of length 2 starting at x from s
range(0, len(s) - 1, 2) yields the integers 0, 2, 4, ... up to the last start index (a list in Python 2, a lazy range object in Python 3)
so the list comprehension would split the string s in substrings of length 2, then the join would put them back together but inserting ':' between them.
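Putting the pieces of that explanation together with the lowercasing the question asks for, a small helper might look like the sketch below. The function name `format_wwn`, its `sep` parameter, and the even-length check are assumptions added for illustration.

```python
def format_wwn(wwn, sep=":"):
    """Lowercase a WWN and insert sep between every pair of characters."""
    cleaned = wwn.strip().lower()
    if len(cleaned) % 2:
        # assumption: a valid WWN always has an even number of hex digits
        raise ValueError("expected an even number of hex digits")
    # s[i:i+2] takes two characters at a time; join puts sep between pairs
    return sep.join(cleaned[i:i + 2] for i in range(0, len(cleaned), 2))


print(format_wwn("10000000C9ABCDEF"))
```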
>>> s='10000000C9ABCDEF'
>>> si=iter(s)
>>> ':'.join(c.lower()+next(si).lower() for c in si)
'10:00:00:00:c9:ab:cd:ef'
In lambda form:
>>> (lambda x: ':'.join(c.lower()+next(x).lower() for c in x))(iter(s))
'10:00:00:00:c9:ab:cd:ef'
I think what would help you the most is a Python construct called a slice. Slices work on any sequence, including strings, lists, and tuples, which makes them well worth knowing.
>>> s = '10000000C9ABCDEF'
>>> [s.lower()[i:i+2] for i in range(0, len(s)-1, 2)]
['10', '00', '00', '00', 'c9', 'ab', 'cd', 'ef']
>>> ':'.join([s.lower()[i:i+2] for i in range(0, len(s)-1, 2)])
'10:00:00:00:c9:ab:cd:ef'
If you'd like to read some more about slices, they're explained very nicely in this question, as well as a part of the actual python documentation.
It may be done using grouper recipe from here.
from itertools import zip_longest  # named izip_longest in Python 2
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)
Using this function, the code will look like:
def join(it):
    for el in it:
        yield ''.join(el)
':'.join(join(grouper(2, s)))
It works this way:
grouper(2,s) returns tuples '1234...' -> ('1','2'), ('3','4') ...
def join(it) does this: ('1','2'), ('3','4') ... -> '12', '34' ...
':'.join(...) creates a string from iterator: '12', '34' ... -> '12:34...'
Also, it may be rewritten as:
':'.join(''.join(el) for el in grouper(2, s))
Here is my simple, straightforward solution:
s = '10000000c9abcdef'
new_s = str()
for i in range(0, len(s)-1, 2):
    new_s += s[i:i+2]
    if i+2 < len(s):
        new_s += ':'
>>> new_s
'10:00:00:00:c9:ab:cd:ef'
