Parsing devicetree with pyparsing into structured dictionary


For my C++ RTOS I'm writing a parser for devicetree "source" files (.dts) in Python using the pyparsing module. I'm able to parse the structure of the devicetree into a (nested) dictionary, where property names and node names are the dictionary keys (strings), and property values and nodes are the dictionary values (either a string or a nested dictionary).
Let's assume I have the following example devicetree structure:
/ {
    property1 = "string1";
    property2 = "string2";
    node1 {
        property11 = "string11";
        property12 = "string12";
        node11 {
            property111 = "string111";
            property112 = "string112";
        };
    };
    node2 {
        property21 = "string21";
        property22 = "string22";
    };
};
I'm able to parse that into something like this:
{'/': {'node1': {'node11': {'property111': ['string111'], 'property112': ['string112']},
'property11': ['string11'],
'property12': ['string12']},
'node2': {'property21': ['string21'], 'property22': ['string22']},
'property1': ['string1'],
'property2': ['string2']}}
However for my needs I would prefer to have this data structured differently. I would like to have all properties as a nested dictionary for key "properties", and all child nodes as a nested dictionary for key "children". The reason is that the devicetree (especially nodes) have some "metadata" which I would like to have just as key-value pairs, which requires me to move actual "contents" of the node one level "lower" to avoid any name conflicts for the key. So I would prefer the example above to look like this:
{'/': {
    'properties': {
        'property1': ['string1'],
        'property2': ['string2']
    },
    'children': {
        'node1': {
            'properties': {
                'property11': ['string11'],
                'property12': ['string12']
            },
            'children': {
                'node11': {
                    'properties': {
                        'property111': ['string111'],
                        'property112': ['string112']
                    },
                    'children': {}
                }
            }
        },
        'node2': {
            'properties': {
                'property21': ['string21'],
                'property22': ['string22']
            },
            'children': {}
        }
    }
}}
I've tried to add a results name to the parsing tokens, but this results in "doubled" dictionary elements (which is expected, as this behaviour is described in the pyparsing documentation). This might not be a problem, but technically a node or property can be named "properties" or "children" (or whatever I choose), so I don't think such a solution is robust.
I've also tried to use setParseAction() to convert the token into a dictionary fragment (I hoped that I could transform {'key': 'value'} into {'properties': {'key': 'value'}}), but this did not work at all.
Is this at all possible directly with pyparsing? I'm prepared to do a second phase that transforms the original dictionary into whatever structure I need, but as a perfectionist I would prefer a single-run, pyparsing-only solution, if possible.
For reference, here's sample code (Python 3) which transforms the devicetree source into an "unstructured" dictionary. Please note that this code is a simplification which doesn't support all the features found in .dts (any data type other than string, value lists, unit-addresses, labels and so on). It supports only string properties and node nesting.
#!/usr/bin/env python
import pyparsing
import pprint
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + pyparsing.Group(pyparsing.Literal('=').suppress() +
propertyValue) + pyparsing.Literal(';').suppress()))
childNode = pyparsing.Forward()
rootNode = pyparsing.Dict(pyparsing.Group(pyparsing.Literal('/') + pyparsing.Literal('{').suppress() +
pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
pyparsing.Literal('};').suppress()))
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + pyparsing.Literal('{').suppress() +
pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
pyparsing.Literal('};').suppress()))
dictionary = rootNode.parseString("""
/ {
property1 = "string1";
property2 = "string2";
node1 {
property11 = "string11";
property12 = "string12";
node11 {
property111 = "string111";
property112 = "string112";
};
};
node2 {
property21 = "string21";
property22 = "string22";
};
};
""").asDict()
pprint.pprint(dictionary, width = 120)
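For comparison, the two-pass fallback mentioned above (transforming the flat dictionary after parsing) could look roughly like the sketch below. It is not part of the original code: the function name restructure is hypothetical, and it assumes that every node is a dict and every property value is a list, as in the asDict() output shown earlier. An empty node may not come out of asDict() as a dict, so that case would need extra handling.

```python
def restructure(node):
    """Split a flat node dict into 'properties' and 'children' sub-dicts."""
    properties = {}
    children = {}
    for key, value in node.items():
        if isinstance(value, dict):
            # A nested dict is a child node: restructure it recursively.
            children[key] = restructure(value)
        else:
            # Anything else (here: a list of strings) is a property value.
            properties[key] = value
    return {'properties': properties, 'children': children}


# Shortened version of the flat dictionary from the question.
flat = {'/': {'node1': {'node11': {'property111': ['string111']},
                        'property11': ['string11']},
              'property1': ['string1']}}

structured = {name: restructure(node) for name, node in flat.items()}
```

Because properties and children are told apart by value type rather than by name, a node or property that happens to be called "properties" or "children" does not cause a conflict.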

You are really so close. I just did the following:
added Groups and results names for your "properties" and "children" sub-sections
changed some of the punctuation literals to CONSTANTS (Literal("};") will fail to match if there is space between the closing brace and semicolon, but RBRACE + SEMI will accommodate whitespace)
removed the outermost Dict on rootNode
Code:
import pyparsing
LBRACE,RBRACE,SLASH,SEMI,EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + EQ
                                          + pyparsing.Group(propertyValue)
                                          + SEMI))
childNode = pyparsing.Forward()
rootNode = pyparsing.Group(SLASH + LBRACE
                           + pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
                           + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                           + RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
                                             + pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
                                             + pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
                                             + RBRACE + SEMI))
Parsing the example source with result = rootNode.parseString(...), converting to a dict with asDict and printing with pprint gives:
pprint.pprint(result[0].asDict())
{'children': {'node1': {'children': {'node11': {'children': [],
'properties': {'property111': ['string111'],
'property112': ['string112']}}},
'properties': {'property11': ['string11'],
'property12': ['string12']}},
'node2': {'children': [],
'properties': {'property21': ['string21'],
'property22': ['string22']}}},
'properties': {'property1': ['string1'], 'property2': ['string2']}}
You can also use the dump() method that is included with pyparsing's ParseResults class to help visualize the list and dict/namespace-style access to the results as-is, without any conversion call necessary:
print(result[0].dump())
[[['property1', ['string1']], ['property2', ['string2']]], [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]]
- children: [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]
  - node1: [[['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]]
    - children: [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]
      - node11: [[['property111', ['string111']], ['property112', ['string112']]], []]
        - children: []
        - properties: [['property111', ['string111']], ['property112', ['string112']]]
          - property111: ['string111']
          - property112: ['string112']
    - properties: [['property11', ['string11']], ['property12', ['string12']]]
      - property11: ['string11']
      - property12: ['string12']
  - node2: [[['property21', ['string21']], ['property22', ['string22']]], []]
    - children: []
    - properties: [['property21', ['string21']], ['property22', ['string22']]]
      - property21: ['string21']
      - property22: ['string22']
- properties: [['property1', ['string1']], ['property2', ['string2']]]
  - property1: ['string1']
  - property2: ['string2']

Related

How to parse file with different structures in python

I am working on a file whose data comes in a lot of different structures, and I cannot figure out an efficient way to handle all of them. My idea is to read it line by line and find the braces in pairs. Is there an efficient way to match the braces so that I can then handle each type with its own logic?
Here is the file I am facing:
.....
# some header info that can be discarded
object node {
name R2-12-47-3_node_453;
phases ABCN;
voltage_A 7200+0.0j;
voltage_B -3600-6235j;
voltage_C -3600+6235j;
nominal_voltage 7200;
bustype SWING;
}
...
# a lot of objects node
object triplex_meter {
name R2-12-47-3_tm_403;
phases AS;
voltage_1 120;
voltage_2 120;
voltage_N 0;
nominal_voltage 120;
}
....
# a lot of object triplex_meter
object triplex_line {
groupid Triplex_Line;
name R2-12-47-3_tl_409;
phases AS;
from R2-12-47-3_tn_409;
to R2-12-47-3_tm_409;
length 30;
configuration triplex_line_configuration_1;
}
...
# a lot of object triplex_meter
#some nested objects...awh...
So my question is: is there a way to quickly match "{" and "}" so that I can focus on the type inside?
After parsing the file, I am expecting some logic like this:
if obj_type == "node":
    # to do 1
elif obj_type == "triplex_meter":
    # to do 2
It seems easy to deal with this structure, but I am not sure exactly where to get started.

Code with comments
file = """
object node {
    name R2-12-47-3_node_453
    phases ABCN
    voltage_A 7200+0.0j
    voltage_B - 3600-6235j
    voltage_C - 3600+6235j
    nominal_voltage 7200
    bustype SWING
}
object triplex_meter {
    name R2-12-47-3_tm_403
    phases AS
    voltage_1 120
    voltage_2 120
    voltage_N 0
    nominal_voltage 120
}
object triplex_line {
    groupid Triplex_Line
    name R2-12-47-3_tl_409
    phases AS
    from R2-12-47-3_tn_409
    to R2-12-47-3_tm_409
    length 30
    configuration triplex_line_configuration_1
}"""
# New python dict
data = {}
# Generate a list with all object taken from file
x = file.replace('\n', '').replace(' - ', ' ').strip().split('object ')
for i in x:
    # Exclude null items in the list to avoid errors
    if i != '':
        # Hard split
        a, b = i.split('{')
        c = b.split(' ')
        # Generate a new list with non null elements
        d = [e.replace('}', '') for e in c if e != '' and e != ' ']
        # Needing a sub dict here for paired values
        sub_d = {}
        # Iterating over list to get paired values
        for index in range(len(d)):
            # We are working with paired values so we unpack only pair indexes
            if index % 2 == 0:
                # Inserting paired values in sub_dict
                sub_d[d[index]] = d[index+1]
        # Inserting sub_dict in main dict "data" using object name
        data[a.strip()] = sub_d
print(data)
Output
{'node': {'name': 'R2-12-47-3_node_453', 'phases': 'ABCN', 'voltage_A': '7200+0.0j', 'voltage_B': '3600-6235j', 'voltage_C': '3600+6235j', 'nominal_voltage': '7200', 'bustype': 'SWING'}, 'triplex_meter': {'name': 'R2-12-47-3_tm_403', 'phases': 'AS', 'voltage_1': '120', 'voltage_2': '120', 'voltage_N': '0', 'nominal_voltage': '120'}, 'triplex_line': {'groupid': 'Triplex_Line', 'name': 'R2-12-47-3_tl_409', 'phases': 'AS', 'from': 'R2-12-47-3_tn_409', 'to': 'R2-12-47-3_tm_409', 'length': '30', 'configuration': 'triplex_line_configuration_1'}}
You can now use the Python dict however you want.
For example:
print(data['triplex_meter']['name'])
EDIT
If your file has lots of "triplex_meter" objects, group them in a Python list before inserting them into the main dict; otherwise each object overwrites the previous one stored under the same key.
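The grouping suggested in the EDIT could be sketched as follows. This is only an illustration under some assumptions: the sample text is made up in the question's format (lines terminated by semicolons), the names block_re, pair_re and objects are hypothetical, and nested objects are not handled because the block pattern only matches bodies without inner braces.

```python
import re
from collections import defaultdict

# Hypothetical sample in the question's format (semicolon-terminated lines).
text = """
object node {
    name R2-12-47-3_node_453;
    phases ABCN;
}
object triplex_meter {
    name R2-12-47-3_tm_403;
    phases AS;
}
object triplex_meter {
    name R2-12-47-3_tm_404;
    phases AS;
}
"""

# One match per non-nested "object <type> { ... }" block.
block_re = re.compile(r'object\s+(\w+)\s*\{([^{}]*)\}')
# One match per "key value;" line inside a block.
pair_re = re.compile(r'^[ \t]*(\S+)[ \t]+(.*?);[ \t]*$', re.MULTILINE)

# Map each object type to a list of property dicts, one per object.
objects = defaultdict(list)
for obj_type, body in block_re.findall(text):
    objects[obj_type].append(dict(pair_re.findall(body)))

for meter in objects['triplex_meter']:
    print(meter['name'])
```

With this layout the per-type logic from the question becomes a loop over objects['node'], objects['triplex_meter'] and so on.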

Python ruamel.yaml dumps tags with quotes

I'm trying to use ruamel.yaml to modify an AWS CloudFormation template on the fly using Python. I added the following code to make safe_load work with CloudFormation functions such as !Ref. However, when I dump the data back out, values containing !Ref (or any other function) are wrapped in quotes, and CloudFormation is not able to recognize them.
See example below:
import sys, json, io, boto3
import ruamel.yaml
def funcparse(loader, node):
    node.value = {
        ruamel.yaml.ScalarNode: loader.construct_scalar,
        ruamel.yaml.SequenceNode: loader.construct_sequence,
        ruamel.yaml.MappingNode: loader.construct_mapping,
    }[type(node)](node)
    node.tag = node.tag.replace(u'!Ref', 'Ref').replace(u'!', u'Fn::')
    return dict([ (node.tag, node.value) ])
funcnames = [ 'Ref', 'Base64', 'FindInMap', 'GetAtt', 'GetAZs', 'ImportValue',
              'Join', 'Select', 'Split', 'Split', 'Sub', 'And', 'Equals', 'If',
              'Not', 'Or' ]
for func in funcnames:
    ruamel.yaml.SafeLoader.add_constructor(u'!' + func, funcparse)
txt = open("/space/tmp/a.template","r")
base = ruamel.yaml.safe_load(txt)
base["foo"] = {
    "name": "abc",
    "Resources": {
        "RouteTableId" : "!Ref aaa",
        "VpcPeeringConnectionId" : "!Ref bbb",
        "yourname": "dfw"
    }
}
ruamel.yaml.safe_dump(
    base,
    sys.stdout,
    default_flow_style=False
)
The input file is like this:
foo:
  bar: !Ref barr
  aa: !Ref bb
The output is like this:
foo:
  Resources:
    RouteTableId: '!Ref aaa'
    VpcPeeringConnectionId: '!Ref bbb'
    yourname: dfw
  name: abc
Notice that the !Ref values have been wrapped in single quotes, so CloudFormation does not recognize them. Is there a way to configure the dumper so that the output looks like this:
foo:
  Resources:
    RouteTableId: !Ref aaa
    VpcPeeringConnectionId: !Ref bbb
    yourname: dfw
  name: abc
Other things I have tried:
the pyyaml library: works the same
using Ref:: instead of !Ref: works the same

Essentially you tweak the loader, to load tagged (scalar) objects as if they were mappings, with the tag the key and the value the scalar. But you don't do anything to distinguish the dict loaded from such a mapping from other dicts loaded from normal mappings, nor do you have any specific code to represent such a mapping to "get the tag back".
When you try to "create" a scalar with a tag, you just make a string starting with an exclamation mark, and that needs to get dumped quoted to distinguish it from real tagged nodes.
What obfuscates all of this is that your example overwrites the loaded data by assigning to base["foo"], so the only thing you can derive from the safe_load, and all your code before that, is that it doesn't throw an exception. I.e. if you leave out the lines starting with base["foo"] = { your output will look like:
foo:
  aa:
    Ref: bb
  bar:
    Ref: barr
And in that Ref: bb is not distinguishable from a normal dumped dict. If you want to explore this route, then you should make a subclass TagDict(dict), and have funcparse return that subclass, and also add a representer for that subclass that re-creates the tag from the key and then dumps the value. Once that works (round-trip equals input), you can do:
"RouteTableId" : TagDict('Ref', 'aaa')
If you do that, you should, apart from removing unused libraries, also change your code to close the file pointer txt, as leaving it open can lead to problems. You can do this elegantly by using the with statement:
with open("/space/tmp/a.template","r") as txt:
    base = ruamel.yaml.safe_load(txt)
(I would also leave out the "r" (or put a space before it), and replace txt with a more appropriate variable name that indicates this is an (input) file pointer.)
You also have the entry 'Split' twice in your funcnames, which is superfluous.
A more generic solution can be achieved by using a multi-constructor that matches any tag and having three basic types to cover scalars, mappings and sequences.
import sys
import ruamel.yaml
yaml_str = """\
foo:
    scalar: !Ref barr
    mapping: !Select
        a: !Ref 1
        b: !Base64 A413
    sequence: !Split
    - !Ref baz
    - !Split Multi word scalar
"""
class Generic:
    def __init__(self, tag, value, style=None):
        self._value = value
        self._tag = tag
        self._style = style
class GenericScalar(Generic):
    @classmethod
    def to_yaml(self, representer, node):
        return representer.represent_scalar(node._tag, node._value)
    @staticmethod
    def construct(constructor, node):
        return constructor.construct_scalar(node)
class GenericMapping(Generic):
    @classmethod
    def to_yaml(self, representer, node):
        return representer.represent_mapping(node._tag, node._value)
    @staticmethod
    def construct(constructor, node):
        return constructor.construct_mapping(node, deep=True)
class GenericSequence(Generic):
    @classmethod
    def to_yaml(self, representer, node):
        return representer.represent_sequence(node._tag, node._value)
    @staticmethod
    def construct(constructor, node):
        return constructor.construct_sequence(node, deep=True)
def default_constructor(constructor, tag_suffix, node):
    generic = {
        ruamel.yaml.ScalarNode: GenericScalar,
        ruamel.yaml.MappingNode: GenericMapping,
        ruamel.yaml.SequenceNode: GenericSequence,
    }.get(type(node))
    if generic is None:
        raise NotImplementedError('Node: ' + str(type(node)))
    style = getattr(node, 'style', None)
    instance = generic.__new__(generic)
    yield instance
    state = generic.construct(constructor, node)
    instance.__init__(tag_suffix, state, style=style)
ruamel.yaml.add_multi_constructor('', default_constructor, Loader=ruamel.yaml.SafeLoader)
yaml = ruamel.yaml.YAML(typ='safe', pure=True)
yaml.default_flow_style = False
yaml.register_class(GenericScalar)
yaml.register_class(GenericMapping)
yaml.register_class(GenericSequence)
base = yaml.load(yaml_str)
base['bar'] = {
    'name': 'abc',
    'Resources': {
        'RouteTableId' : GenericScalar('!Ref', 'aaa'),
        'VpcPeeringConnectionId' : GenericScalar('!Ref', 'bbb'),
        'yourname': 'dfw',
        's' : GenericSequence('!Split', ['a', GenericScalar('!Not', 'b'), 'c']),
    }
}
yaml.dump(base, sys.stdout)
which outputs:
bar:
  Resources:
    RouteTableId: !Ref aaa
    VpcPeeringConnectionId: !Ref bbb
    s: !Split
    - a
    - !Not b
    - c
    yourname: dfw
  name: abc
foo:
  mapping: !Select
    a: !Ref 1
    b: !Base64 A413
  scalar: !Ref barr
  sequence: !Split
  - !Ref baz
  - !Split Multi word scalar
Please note that sequences and mappings are handled correctly and that they can be created as well. There is however no check that:
the tag you provide is actually valid
the value associated with the tag is of the proper type for that tag name (scalar, mapping, sequence)
If you want GenericMapping to behave more like a dict, then you probably want to make it a subclass of dict (and not of Generic) and provide the appropriate __init__ (likewise for GenericSequence and list).
When the assignment is changed to something more close to yours:
base["foo"] = {
    "name": "abc",
    "Resources": {
        "RouteTableId" : GenericScalar('!Ref', 'aaa'),
        "VpcPeeringConnectionId" : GenericScalar('!Ref', 'bbb'),
        "yourname": "dfw"
    }
}
the output is:
foo:
  Resources:
    RouteTableId: !Ref aaa
    VpcPeeringConnectionId: !Ref bbb
    yourname: dfw
  name: abc
which is exactly the output you want.

Apart from Anthon's detailed answer above, for the specific question in terms of CloudFormation template, I found another very quick & sweet workaround.
Still using the constructor snippet to load the YAML.
def funcparse(loader, node):
    node.value = {
        ruamel.yaml.ScalarNode: loader.construct_scalar,
        ruamel.yaml.SequenceNode: loader.construct_sequence,
        ruamel.yaml.MappingNode: loader.construct_mapping,
    }[type(node)](node)
    node.tag = node.tag.replace(u'!Ref', 'Ref').replace(u'!', u'Fn::')
    return dict([ (node.tag, node.value) ])
funcnames = [ 'Ref', 'Base64', 'FindInMap', 'GetAtt', 'GetAZs', 'ImportValue',
              'Join', 'Select', 'Split', 'Split', 'Sub', 'And', 'Equals', 'If',
              'Not', 'Or' ]
for func in funcnames:
    ruamel.yaml.SafeLoader.add_constructor(u'!' + func, funcparse)
When we manipulate the data, instead of doing
base["foo"] = {
    "name": "abc",
    "Resources": {
        "RouteTableId" : "!Ref aaa",
        "VpcPeeringConnectionId" : "!Ref bbb",
        "yourname": "dfw"
    }
}
which will wrap the value !Ref aaa with quotes, we can simply do:
base["foo"] = {
    "name": "abc",
    "Resources": {
        "RouteTableId" : {
            "Ref" : "aaa"
        },
        "VpcPeeringConnectionId" : {
            "Ref" : "bbb"
        },
        "yourname": "dfw"
    }
}
Similarly, for other functions in CloudFormation, such as !GetAtt, we should use their long form Fn::GetAtt and use them as the key of a JSON object. Problem solved easily.
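To avoid typing the long-form keys by hand, the dictionaries could be built with two small helpers. This is just a sketch: the helper names ref and fn are made up for illustration and are not part of ruamel.yaml or any AWS library, and json is used here only to print the resulting structure.

```python
import json


def ref(logical_name):
    """Long form of !Ref: {"Ref": "<name>"}."""
    return {"Ref": logical_name}


def fn(name, argument):
    """Long form of a !<name> function: {"Fn::<name>": <argument>}."""
    return {"Fn::" + name: argument}


resources = {
    "RouteTableId": ref("aaa"),
    "VpcPeeringConnectionId": ref("bbb"),
    # Long form of !GetAtt aaa.Arn
    "Arn": fn("GetAtt", ["aaa", "Arn"]),
    "yourname": "dfw",
}

print(json.dumps(resources, indent=2))
```

Since these are plain dicts, they dump as ordinary mappings and nothing needs to be quoted.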

create a dictionary from file python

I am new to python and am trying to read a file and create a dictionary from it.
The format is as follows:
.1.3.6.1.4.1.14823.1.1.27 {
TYPE = Switch
VENDOR = Aruba
MODEL = ArubaS3500-48T
CERTIFICATION = CERTIFIED
CONT = Aruba-Switch
HEALTH = ARUBA-Controller
VLAN = Dot1q INSTRUMENTATION:
Card-Fault = ArubaController:DeviceID
CPU/Memory = ArubaController:DeviceID
Environment = ArubaSysExt:DeviceID
Interface-Fault = MIB2
Interface-Performance = MIB2
Port-Fault = MIB2
Port-Performance = MIB2
}
The first line OID (.1.3.6.1.4.1.14823.1.1.27 { ) I want this to be the key and the remaining lines are the values until the }
I have tried a few combinations but am not able to get the correct regex to match these
Any help please?
I have tried something like
lines = cache.readlines()
for line in lines:
    searchObj = re.search(r'(^.\d.*{)(.*)$', line)
    if searchObj:
        (oid, cert ) = searchObj.groups()
        results[searchObj(oid)] = ", ".join(line[1:])
        print("searchObj.group() : ", searchObj.group(1))
        print("searchObj.group(1) : ", searchObj.group(2))

You can try this:
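import re
# read the whole file and make sure it is closed afterwards
with open('filename.txt') as f:
    data = f.read()
# the key: leading newlines followed by dots and digits
the_key = re.findall(r"^\n*[.\d]+", data)
# the values: every "name = value" pair made of letters and digits
values = [re.split(r"\s+=\s+", i) for i in re.findall(r"[a-zA-Z0-9]+\s*=\s*[a-zA-Z0-9]+", data)]
final_data = {the_key[0]: dict(values)}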
Output:
{'\n.1.3.6.1.4.1.14823.1.1.27': {'VENDOR': 'Aruba', 'CERTIFICATION': 'CERTIFIED', 'Fault': 'MIB2', 'VLAN': 'Dot1q', 'Environment': 'ArubaSysExt', 'HEALTH': 'ARUBA', 'Memory': 'ArubaController', 'Performance': 'MIB2', 'CONT': 'Aruba', 'MODEL': 'ArubaS3500', 'TYPE': 'Switch'}}

You could use a nested dict comprehension along with an outer and inner regex.
Your blocks can be separated by
.numbers...numbers.. {
// values here
}
In terms of regular expression this can be formulated as
^\s* # start of line + whitespaces, eventually
(?P<key>\.[\d.]+)\s* # the key
{(?P<values>[^{}]+)} # everything between { and }
As you see, we split the parts into key/value pairs.
Your "inner" structure can be formulated like
(?P<key>\b[A-Z][-/\w]+\b) # the "inner" key
\s*=\s* # whitespaces, =, whitespaces
(?P<value>.+) # the value
Now let's build the "outer" and "inner" expressions together:
import re
# `string` is assumed to hold the contents of the file
rx_outer = re.compile(r'^\s*(?P<key>\.[\d.]+)\s*{(?P<values>[^{}]+)}', re.MULTILINE)
rx_inner = re.compile(r'(?P<key>\b[A-Z][-/\w]+\b)\s*=\s*(?P<value>.+)')
result = {item.group('key'):
          {match.group('key'): match.group('value')
           for match in rx_inner.finditer(item.group('values'))}
          for item in rx_outer.finditer(string)}
print(result)
A demo can be found on ideone.com.
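For a quick local check, the two expressions can be run against a shortened sample. The sample string below is an assumption made up from the question's format (with indentation added), and the .strip() on each value is an extra precaution that is not in the code above.

```python
import re

# Shortened, hypothetical sample in the question's format.
string = """
.1.3.6.1.4.1.14823.1.1.27 {
    TYPE = Switch
    VENDOR = Aruba
    MODEL = ArubaS3500-48T
    Card-Fault = ArubaController:DeviceID
    CPU/Memory = ArubaController:DeviceID
}
"""

# Outer expression: one match per "OID { ... }" block.
rx_outer = re.compile(r'^\s*(?P<key>\.[\d.]+)\s*{(?P<values>[^{}]+)}', re.MULTILINE)
# Inner expression: one match per "NAME = value" line inside a block.
rx_inner = re.compile(r'(?P<key>\b[A-Z][-/\w]+\b)\s*=\s*(?P<value>.+)')

result = {
    item.group('key'): {
        match.group('key'): match.group('value').strip()
        for match in rx_inner.finditer(item.group('values'))
    }
    for item in rx_outer.finditer(string)
}
print(result)
```

Unlike a pattern limited to letters and digits, the inner expression keeps keys such as Card-Fault and CPU/Memory whole, and values such as ArubaS3500-48T are not cut off at the hyphen.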

Python - append to dictionary by name with multilevels 1, 1.1, 1.1.1, 1.1.2 (hierarchical)

I use openpyxl to read data from excel files to provide a json file at the end. The problem is that I cannot figure out an algorithm to do a hierarchical organisation of the json (or python dictionary).
The data form is like the following:
The output should be like this:
{
    'id' : '1',
    'name' : 'first',
    'value' : 10,
    'children': [ {
            'id' : '1.1',
            'name' : 'ab',
            'value': 25,
            'children' : [
                {
                    'id' : '1.1.1',
                    'name' : 'abc' ,
                    'value': 16,
                    'children' : []
                }
            ]
        },
        {
            'id' : '1.2',
            ...
    ]
}
Here is what I have come up with, but I can't go beyond '1.1', because '1.1.1', '1.1.1.1' and so on end up at the same level as '1.1'.
from openpyxl import load_workbook
import re
from json import dumps
wb = load_workbook('resources.xlsx')
sheet = wb.get_sheet_by_name(wb.get_sheet_names()[0])
resources = {}
prev_dict = {}
list_rows = [ row for row in sheet.rows ]
for nrow in range(list_rows.__len__()):
    id = str(list_rows[nrow][0].value)
    val = {
        'id' : id,
        'name' : list_rows[nrow][1].value ,
        'value' : list_rows[nrow][2].value ,
        'children' : []
    }
    if id[:-2] == str(list_rows[nrow-1][0].value):
        prev_dict['children'].append(val)
    else:
        resources[nrow] = val
        prev_dict = resources[nrow]
print(dumps(resources))

You need to access your data by ID, so the first step is to create a dictionary where the IDs are the keys. For easier data manipulation, the string "1.2.3" is converted to the tuple ("1","2","3"). (Lists are not allowed as dict keys.) This makes computing a parent key very easy (key[:-1]).
With this preparation, we could simply populate the children list of each item's parent. But before doing that a special ROOT element needs to be added. It is the parent of top-level items.
That's all. The code is below.
Note #1: It expects that every item has a parent. That's why 1.2.2 was added to the test data. If it is not the case, handle the KeyError where noted.
Note #2: The result is a list.
import json
testdata="""
1 first 20
1.1 ab 25
1.1.1 abc 16
1.2 cb 18
1.2.1 cbd 16
1.2.1.1 xyz 19
1.2.2 NEW -1
1.2.2.1 poz 40
1.2.2.2 pos 98
2 second 90
2.1 ezr 99
"""
datalist = [line.split() for line in testdata.split('\n') if line]
datadict = {tuple(item[0].split('.')): {
                'id': item[0],
                'name': item[1],
                'value': item[2],
                'children': []}
            for item in datalist}
ROOT = ()
datadict[ROOT] = {'children': []}
for key, value in datadict.items():
    if key != ROOT:
        datadict[key[:-1]]['children'].append(value)
        # KeyError = parent does not exist
result = datadict[ROOT]['children']
print(json.dumps(result, indent=4))
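If some parents can be missing (the situation from Note #1), one possible way to deal with it is to create placeholder parents on demand instead of catching the KeyError. The sketch below is an assumption-laden variant, not the code above: the rows are made-up data with '1.2' deliberately left out, and the names nodes and get_node are hypothetical.

```python
import json

# Hypothetical rows where '1.2' is missing, so '1.2.1' has no direct parent.
rows = [('1', 'first', 20), ('1.1', 'ab', 25), ('1.2.1', 'cbd', 16)]

ROOT = ()
nodes = {ROOT: {'children': []}}


def get_node(key):
    """Return the node for key, creating it and any missing ancestors."""
    if key not in nodes:
        # Placeholder until (and unless) a real row fills in name and value.
        nodes[key] = {'id': '.'.join(key), 'name': None, 'value': None, 'children': []}
        get_node(key[:-1])['children'].append(nodes[key])
    return nodes[key]


for id_, name, value in rows:
    node = get_node(tuple(id_.split('.')))
    node['name'] = name
    node['value'] = value

result = nodes[ROOT]['children']
print(json.dumps(result, indent=4))
```

A placeholder keeps name and value as None, so missing rows remain visible in the output instead of silently disappearing.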

parsing linux iscsi multipath.conf into python nested dictionaries

I am writing a script that involves adding/removing multipath "objects" from the standard multipath.conf configuration file; an example is below:
# This is a basic configuration file with some examples, for device mapper
# multipath.
## Use user friendly names, instead of using WWIDs as names.
defaults {
user_friendly_names yes
}
##
devices {
device {
vendor "SolidFir"
product "SSD SAN"
path_grouping_policy multibus
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
path_selector "service-time 0"
path_checker tur
hardware_handler "0"
failback immediate
rr_weight uniform
rr_min_io 1000
rr_min_io_rq 1
features "0"
no_path_retry 24
prio const
}
}
multipaths {
multipath {
wwid 36f47acc1000000006167347a00000041
alias dwqa-ora-fs
}
multipath {
wwid 36f47acc1000000006167347a00000043
alias dwqa-ora-grid
}
multipath {
wwid 36f47acc1000000006167347a00000044
alias dwqa-ora-dwqa1
}
multipath {
wwid 36f47acc1000000006167347a000000ae
alias dwqa-ora-dwh2d10-1
}
multipath {
wwid 36f47acc1000000006167347a000000f9
alias dwqa-ora-testdg-1
}
}
So what I'm trying to do is read this file in and store it in a nested python dictionary (or list of nested dictionaries). We can ignore the comments lines (starting with #) for now. I have not come up with a clear/concise solution for this.
Here is my partial solution (doesn't give me the expected output yet, but it's close)
def nonblank_lines(f):
    for l in f:
        line = l.rstrip()
        if line:
            yield line
def __parse_conf__(self):
    conf = []
    with open(self.conf_file_path) as f:
        for line in nonblank_lines(f):
            if line.strip().endswith("{"): # opening bracket, start of new list of dictionaries
                current_dictionary_key = line.split()[0]
                current_dictionary = { current_dictionary_key : None }
                conf.append(current_dictionary)
            elif line.strip().endswith("}"): # closing bracket, end of new dictionary
                pass
                # do nothing...
            elif not line.strip().startswith("#"):
                if current_dictionary.values() == [None]:
                    # New dictionary... we should be appending to this one
                    current_dictionary[current_dictionary_key] = [{}]
                    current_dictionary = current_dictionary[current_dictionary_key][0]
                key = line.strip().split()[0]
                val = " ".join(line.strip().split()[1:])
                current_dictionary[key] = val
And this is the resulting dictionary (the list 'conf'):
[{'defaults': [{'user_friendly_names': 'yes'}]},
{'devices': None},
{'device': [{'failback': 'immediate',
'features': '"0"',
'getuid_callout': '"/lib/udev/scsi_id --whitelisted --device=/dev/%n"',
'hardware_handler': '"0"',
'no_path_retry': '24',
'path_checker': 'tur',
'path_grouping_policy': 'multibus',
'path_selector': '"service-time 0"',
'prio': 'const',
'product': '"SSD SAN"',
'rr_min_io': '1000',
'rr_min_io_rq': '1',
'rr_weight': 'uniform',
'vendor': '"SolidFir"'}]},
{'multipaths': None},
{'multipath': [{'alias': 'dwqa-ora-fs',
'wwid': '36f47acc1000000006167347a00000041'}]},
{'multipath': [{'alias': 'dwqa-ora-grid',
'wwid': '36f47acc1000000006167347a00000043'}]},
{'multipath': [{'alias': 'dwqa-ora-dwqa1',
'wwid': '36f47acc1000000006167347a00000044'}]},
{'multipath': [{'alias': 'dwqa-ora-dwh2d10-1',
'wwid': '36f47acc1000000006167347a000000ae'}]},
{'multipath': [{'alias': 'dwqa-ora-testdg-1',
'wwid': '36f47acc1000000006167347a000000f9'}]},
{'multipath': [{'alias': 'dwqa-ora-testdp10-1',
'wwid': '"SSolidFirSSD SAN 6167347a00000123f47acc0100000000"'}]}]
Obviously the "None"s should be replaced with nested dictionary below it, but I can't get this part to work.
Any suggestions? Or better ways to parse this file and store it in a python data structure?

Try something like this:
def parse_conf(conf_lines):
    config = []
    # iterate on config lines
    for line in conf_lines:
        # remove left and right spaces
        line = line.strip()
        if line.startswith('#'):
            # skip comment lines
            continue
        elif line.endswith('{'):
            # new dict (notice the recursion here)
            config.append({line.split()[0]: parse_conf(conf_lines)})
        else:
            # inside a dict
            if line.endswith('}'):
                # end of current dict
                break
            else:
                # parameter line
                line = line.split()
                if len(line) > 1:
                    config.append({line[0]: " ".join(line[1:])})
    return config
The function will get into the nested levels on the configuration file (thanks to recursion and the fact that the conf_lines object is an iterator) and make a list of dictionaries that contain other dictionaries. Unfortunately, you have to put every nested dictionary inside a list again, because in the example file you show how multipath can repeat, but in Python dictionaries a key must be unique. So you make a list.
You can test it with your example configuration file, like this:
with open('multipath.conf','r') as conf_file:
    config = parse_conf(conf_file)
# show multipath config lines as an example
for item in config:
    if 'multipaths' in item:
        for multipath in item['multipaths']:
            print(multipath)
            # or do something more useful
And the output would be:
{'multipath': [{'wwid': '36f47acc1000000006167347a00000041'}, {'alias': 'dwqa-ora-fs'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a00000043'}, {'alias': 'dwqa-ora-grid'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a00000044'}, {'alias': 'dwqa-ora-dwqa1'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a000000ae'}, {'alias': 'dwqa-ora-dwh2d10-1'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a000000f9'}, {'alias': 'dwqa-ora-testdg-1'}]}
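Because every parameter ends up in its own single-key dict, looking values up takes one extra step. A small sketch of that step follows, using two entries copied from the output above; the helper name merge_pairs and the by_alias mapping are hypothetical and only meant as an illustration.

```python
# Result shape produced for the 'multipaths' section (two entries shown).
multipaths = [
    {'multipath': [{'wwid': '36f47acc1000000006167347a00000041'}, {'alias': 'dwqa-ora-fs'}]},
    {'multipath': [{'wwid': '36f47acc1000000006167347a00000043'}, {'alias': 'dwqa-ora-grid'}]},
]


def merge_pairs(pairs):
    """Merge a list of single-key dicts into one dict (later keys win)."""
    merged = {}
    for pair in pairs:
        merged.update(pair)
    return merged


# Build an alias -> wwid lookup from the repeated 'multipath' blocks.
by_alias = {}
for item in multipaths:
    entry = merge_pairs(item['multipath'])
    by_alias[entry['alias']] = entry['wwid']

print(by_alias)
```

Merging is safe inside a single multipath block, where each parameter appears once; the outer list has to stay a list because the multipath key repeats.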

If you don't use recursion, you will need some way of keeping track of your level. But even then it is difficult to have references to parents or siblings in order to add data (I failed). Here's another take based on Daniele Barresi's mention of recursion on the iterable input:
Data:
inp = """
# This is a basic configuration file with some examples, for device mapper
# multipath.
## Use user friendly names, instead of using WWIDs as names.
defaults {
user_friendly_names yes
}
##
devices {
device {
vendor "SolidFir"
product "SSD SAN"
path_grouping_policy multibus
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
path_selector "service-time 0"
path_checker tur
hardware_handler "0"
failback immediate
rr_weight uniform
rr_min_io 1000
rr_min_io_rq 1
features "0"
no_path_retry 24
prio const
}
}
multipaths {
multipath {
wwid 36f47acc1000000006167347a00000041
alias dwqa-ora-fs
}
multipath {
wwid 36f47acc1000000006167347a00000043
alias dwqa-ora-grid
}
multipath {
wwid 36f47acc1000000006167347a00000044
alias dwqa-ora-dwqa1
}
multipath {
wwid 36f47acc1000000006167347a000000ae
alias dwqa-ora-dwh2d10-1
}
multipath {
wwid 36f47acc1000000006167347a000000f9
alias dwqa-ora-testdg-1
}
}
"""
Code:
import re
level = 0
def recurse( data ):
    """ """
    global level
    out = []
    level += 1
    for line in data:
        l = line.strip()
        if l and not l.startswith('#'):
            match = re.search(r"\s*(\w+)\s*(?:{|(?:\"?\s*([^\"]+)\"?)?)", l)
            if not match:
                if l == '}':
                    level -= 1
                    return out # recursion, up one level
            else:
                key, value = match.groups()
                if not value:
                    print( " "*level, level, key )
                    value = recurse( data ) # recursion, down one level
                else:
                    print( " "*level, level, key, value)
                out.append( [key,value] )
    return out # once
result = recurse( iter(inp.split('\n')) )
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(result)
Resulting list with nested ["key", value] pairs:
[ ['defaults', [['user_friendly_names', 'yes']]],
[ 'devices',
[ [ 'device',
[ ['vendor', 'SolidFir'],
['product', 'SSD SAN'],
['path_grouping_policy', 'multibus'],
[ 'getuid_callout',
'/lib/udev/scsi_id --whitelisted --device=/dev/%n'],
['path_selector', 'service-time 0'],
['path_checker', 'tur'],
['hardware_handler', '0'],
['failback', 'immediate'],
['rr_weight', 'uniform'],
['rr_min_io', '1000'],
['rr_min_io_rq', '1'],
['features', '0'],
['no_path_retry', '24'],
['prio', 'const']]]]],
[ 'multipaths',
[ [ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000041'],
['alias', 'dwqa-ora-fs']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000043'],
['alias', 'dwqa-ora-grid']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000044'],
['alias', 'dwqa-ora-dwqa1']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a000000ae'],
['alias', 'dwqa-ora-dwh2d10-1']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a000000f9'],
['alias', 'dwqa-ora-testdg-1']]]]]]
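For what it's worth, keeping track of the level without recursion seems workable with an explicit stack of the blocks that are currently open. The sketch below is only one possible approach under stated assumptions: the function name parse_blocks and the shortened sample are made up, braces are assumed to be balanced, and every opening brace is assumed to end its line.

```python
def parse_blocks(text):
    """Parse brace-delimited config text into nested [key, value] lists."""
    root = []
    stack = [root]                  # stack[-1] is the block being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue                # skip blank lines and comments
        if line.endswith('{'):
            block = []
            stack[-1].append([line[:-1].strip(), block])
            stack.append(block)     # descend into the new block
        elif line == '}':
            stack.pop()             # back up to the parent block
        else:
            parts = line.split(None, 1)
            value = parts[1].strip().strip('"') if len(parts) > 1 else ''
            stack[-1].append([parts[0], value])
    return root


# Shortened, hypothetical sample in the multipath.conf format.
sample = """
defaults {
    user_friendly_names yes
}
multipaths {
    multipath {
        wwid 36f47acc1000000006167347a00000041
        alias dwqa-ora-fs
    }
}
"""

result = parse_blocks(sample)
print(result)
```

The top of the stack plays the role of the "reference to the parent": appending to stack[-1] always adds to the innermost open block, and len(stack) is the current level.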

Multipath conf is a bit of a pig to parse. This is what I use (originally based on the answer from daniele-barresi), the output is easier to work with than the other examples.
def get_multipath_conf():
    def parse_conf(conf_lines, parent=None):
        config = {}
        for line in conf_lines:
            line = line.split('#',1)[0].strip()
            if line.endswith('{'):
                key = line.split('{', 1)[0].strip()
                value = parse_conf(conf_lines, parent=key)
                if key+'s' == parent:
                    if type(config) is dict:
                        config = []
                    config.append(value)
                else:
                    config[key] = value
            else:
                # inside a dict
                if line.endswith('}'):
                    # end of current dict
                    break
                else:
                    # parameter line
                    line = line.split(' ',1)
                    if len(line) > 1:
                        key = line[0]
                        value = line[1].strip().strip("'").strip('"')
                        config[key] = value
        return config
    return parse_conf(open('/etc/multipath.conf','r'))
This is the output:
{'blacklist': {'devnode': '^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda|sdb)[0-9]*$'},
'defaults': {'find_multipaths': 'yes',
'max_polling_interval': '4',
'polling_interval': '2',
'reservation_key': '0x1'},
'devices': [{'detect_checker': 'no',
'hardware_handler': '1 alua',
'no_path_retry': '5',
'path_checker': 'tur',
'prio': 'alua',
'product': 'iSCSI Volume',
'user_friendly_names': 'yes',
'vendor': 'StorMagic'}]}
