Parsing Linux iSCSI multipath.conf into Python nested dictionaries

I am writing a script that involves adding/removing multipath "objects" from the standard multipath.conf configuration file, example below:
# This is a basic configuration file with some examples, for device mapper
# multipath.
## Use user friendly names, instead of using WWIDs as names.
defaults {
user_friendly_names yes
}
##
devices {
device {
vendor "SolidFir"
product "SSD SAN"
path_grouping_policy multibus
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
path_selector "service-time 0"
path_checker tur
hardware_handler "0"
failback immediate
rr_weight uniform
rr_min_io 1000
rr_min_io_rq 1
features "0"
no_path_retry 24
prio const
}
}
multipaths {
multipath {
wwid 36f47acc1000000006167347a00000041
alias dwqa-ora-fs
}
multipath {
wwid 36f47acc1000000006167347a00000043
alias dwqa-ora-grid
}
multipath {
wwid 36f47acc1000000006167347a00000044
alias dwqa-ora-dwqa1
}
multipath {
wwid 36f47acc1000000006167347a000000ae
alias dwqa-ora-dwh2d10-1
}
multipath {
wwid 36f47acc1000000006167347a000000f9
alias dwqa-ora-testdg-1
}
}
So what I'm trying to do is read this file in and store it in a nested Python dictionary (or a list of nested dictionaries). We can ignore the comment lines (starting with #) for now. I have not come up with a clear, concise solution for this.
Here is my partial solution (doesn't give me the expected output yet, but it's close)
def nonblank_lines(f):
    for l in f:
        line = l.rstrip()
        if line:
            yield line
def __parse_conf__(self):
    conf = []
    with open(self.conf_file_path) as f:
        for line in nonblank_lines(f):
            if line.strip().endswith("{"): # opening bracket, start of new list of dictionaries
                current_dictionary_key = line.split()[0]
                current_dictionary = { current_dictionary_key : None }
                conf.append(current_dictionary)
            elif line.strip().endswith("}"): # closing bracket, end of new dictionary
                pass
                # do nothing...
            elif not line.strip().startswith("#"):
                if current_dictionary.values() == [None]:
                    # New dictionary... we should be appending to this one
                    current_dictionary[current_dictionary_key] = [{}]
                    current_dictionary = current_dictionary[current_dictionary_key][0]
                key = line.strip().split()[0]
                val = " ".join(line.strip().split()[1:])
                current_dictionary[key] = val
And this is the resulting dictionary (the list 'conf'):
[{'defaults': [{'user_friendly_names': 'yes'}]},
{'devices': None},
{'device': [{'failback': 'immediate',
'features': '"0"',
'getuid_callout': '"/lib/udev/scsi_id --whitelisted --device=/dev/%n"',
'hardware_handler': '"0"',
'no_path_retry': '24',
'path_checker': 'tur',
'path_grouping_policy': 'multibus',
'path_selector': '"service-time 0"',
'prio': 'const',
'product': '"SSD SAN"',
'rr_min_io': '1000',
'rr_min_io_rq': '1',
'rr_weight': 'uniform',
'vendor': '"SolidFir"'}]},
{'multipaths': None},
{'multipath': [{'alias': 'dwqa-ora-fs',
'wwid': '36f47acc1000000006167347a00000041'}]},
{'multipath': [{'alias': 'dwqa-ora-grid',
'wwid': '36f47acc1000000006167347a00000043'}]},
{'multipath': [{'alias': 'dwqa-ora-dwqa1',
'wwid': '36f47acc1000000006167347a00000044'}]},
{'multipath': [{'alias': 'dwqa-ora-dwh2d10-1',
'wwid': '36f47acc1000000006167347a000000ae'}]},
{'multipath': [{'alias': 'dwqa-ora-testdg-1',
'wwid': '36f47acc1000000006167347a000000f9'}]},
{'multipath': [{'alias': 'dwqa-ora-testdp10-1',
'wwid': '"SSolidFirSSD SAN 6167347a00000123f47acc0100000000"'}]}]
Obviously the "None"s should be replaced with nested dictionary below it, but I can't get this part to work.
Any suggestions? Or better ways to parse this file and store it in a python data structure?

Try something like this:
def parse_conf(conf_lines):
    config = []
    # iterate on config lines
    for line in conf_lines:
        # remove left and right spaces
        line = line.strip()
        if line.startswith('#'):
            # skip comment lines
            continue
        elif line.endswith('{'):
            # new dict (notice the recursion here)
            config.append({line.split()[0]: parse_conf(conf_lines)})
        else:
            # inside a dict
            if line.endswith('}'):
                # end of current dict
                break
            else:
                # parameter line
                line = line.split()
                if len(line) > 1:
                    config.append({line[0]: " ".join(line[1:])})
    return config
The function descends into the nested levels of the configuration file (thanks to recursion and the fact that conf_lines is an iterator, so each nested call continues from the line where its caller stopped) and builds a list of dictionaries that contain further lists of dictionaries. Unfortunately, every nested block has to go inside a list, because, as your example file shows, multipath can repeat, and keys in a Python dictionary must be unique.
You can test it with your example configuration file, like this:
with open('multipath.conf','r') as conf_file:
    config = parse_conf(conf_file)
# show multipath config lines as an example
for item in config:
    if 'multipaths' in item:
        for multipath in item['multipaths']:
            print(multipath)
            # or do something more useful
And the output would be:
{'multipath': [{'wwid': '36f47acc1000000006167347a00000041'}, {'alias': 'dwqa-ora-fs'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a00000043'}, {'alias': 'dwqa-ora-grid'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a00000044'}, {'alias': 'dwqa-ora-dwqa1'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a000000ae'}, {'alias': 'dwqa-ora-dwh2d10-1'}]}
{'multipath': [{'wwid': '36f47acc1000000006167347a000000f9'}, {'alias': 'dwqa-ora-testdg-1'}]}
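The recursion only works because every call pulls lines from the same iterator, so a nested call resumes where its caller stopped. A file object is already an iterator; a plain list is not, and passing one would restart each nested loop from the first line. Here is a minimal sketch of that point, assuming the parse_conf logic shown above (condensed slightly) and a short in-memory sample named `sample` that is only illustrative:

```python
def parse_conf(conf_lines):
    """Same logic as the answer above; conf_lines must be an iterator."""
    config = []
    for line in conf_lines:
        line = line.strip()
        if line.startswith('#'):
            continue
        elif line.endswith('{'):
            # the nested call consumes lines from the shared iterator
            config.append({line.split()[0]: parse_conf(conf_lines)})
        elif line.endswith('}'):
            break
        else:
            parts = line.split()
            if len(parts) > 1:
                config.append({parts[0]: " ".join(parts[1:])})
    return config

sample = """
multipaths {
    multipath {
        wwid 36f47acc1000000006167347a00000041
        alias dwqa-ora-fs
    }
}
"""

# iter() turns the list of lines into a one-pass iterator
parsed = parse_conf(iter(sample.splitlines()))
print(parsed)
```

Wrapping any list of lines in iter() is enough to make the function usable on strings as well as open files.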

If you don't use recursion, you will need some way of keeping track of your level. But even then it is difficult to have references to parents or siblings in order to add data (I failed). Here's another take based on Daniele Barresi's mention of recursion on the iterable input:
Data:
inp = """
# This is a basic configuration file with some examples, for device mapper
# multipath.
## Use user friendly names, instead of using WWIDs as names.
defaults {
user_friendly_names yes
}
##
devices {
device {
vendor "SolidFir"
product "SSD SAN"
path_grouping_policy multibus
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
path_selector "service-time 0"
path_checker tur
hardware_handler "0"
failback immediate
rr_weight uniform
rr_min_io 1000
rr_min_io_rq 1
features "0"
no_path_retry 24
prio const
}
}
multipaths {
multipath {
wwid 36f47acc1000000006167347a00000041
alias dwqa-ora-fs
}
multipath {
wwid 36f47acc1000000006167347a00000043
alias dwqa-ora-grid
}
multipath {
wwid 36f47acc1000000006167347a00000044
alias dwqa-ora-dwqa1
}
multipath {
wwid 36f47acc1000000006167347a000000ae
alias dwqa-ora-dwh2d10-1
}
multipath {
wwid 36f47acc1000000006167347a000000f9
alias dwqa-ora-testdg-1
}
}
"""
Code:
import re
level = 0
def recurse( data ):
    """ """
    global level
    out = []
    level += 1
    for line in data:
        l = line.strip()
        if l and not l.startswith('#'):
            match = re.search(r"\s*(\w+)\s*(?:{|(?:\"?\s*([^\"]+)\"?)?)", l)
            if not match:
                if l == '}':
                    level -= 1
                    return out # recursion, up one level
            else:
                key, value = match.groups()
                if not value:
                    print( " "*level, level, key )
                    value = recurse( data ) # recursion, down one level
                else:
                    print( " "*level, level, key, value)
                out.append( [key,value] )
    return out # once
result = recurse( iter(inp.split('\n')) )
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(result)
Resulting list with nested ["key", value] pairs:
[ ['defaults', [['user_friendly_names', 'yes']]],
[ 'devices',
[ [ 'device',
[ ['vendor', 'SolidFir'],
['product', 'SSD SAN'],
['path_grouping_policy', 'multibus'],
[ 'getuid_callout',
'/lib/udev/scsi_id --whitelisted --device=/dev/%n'],
['path_selector', 'service-time 0'],
['path_checker', 'tur'],
['hardware_handler', '0'],
['failback', 'immediate'],
['rr_weight', 'uniform'],
['rr_min_io', '1000'],
['rr_min_io_rq', '1'],
['features', '0'],
['no_path_retry', '24'],
['prio', 'const']]]]],
[ 'multipaths',
[ [ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000041'],
['alias', 'dwqa-ora-fs']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000043'],
['alias', 'dwqa-ora-grid']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a00000044'],
['alias', 'dwqa-ora-dwqa1']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a000000ae'],
['alias', 'dwqa-ora-dwh2d10-1']]],
[ 'multipath',
[ ['wwid', '36f47acc1000000006167347a000000f9'],
['alias', 'dwqa-ora-testdg-1']]]]]]
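For comparison, tracking the level without recursion is possible with an explicit stack: the top of the stack is always the list belonging to the block that is currently open. The following is only a sketch under the assumption that braces are balanced and that every opening line ends with "{"; the function name parse_with_stack and the short sample are made up for illustration. It produces the same nested [key, value] shape as above.

```python
def parse_with_stack(lines):
    """Build nested [key, value] pairs using a stack instead of recursion."""
    root = []
    stack = [root]  # stack[-1] is the list for the block currently open
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        if line.endswith("{"):
            children = []
            stack[-1].append([line[:-1].strip(), children])
            stack.append(children)  # go down one level
        elif line == "}":
            stack.pop()  # go up one level
        else:
            parts = line.split(None, 1)
            value = parts[1].strip().strip('"') if len(parts) > 1 else ""
            stack[-1].append([parts[0], value])
    return root

sample = """
defaults {
    user_friendly_names yes
}
multipaths {
    multipath {
        wwid 1
        alias a
    }
    multipath {
        wwid 2
        alias b
    }
}
"""
pairs = parse_with_stack(sample.splitlines())
print(pairs)
```

Because the parent list is always stack[-2], siblings and parents stay reachable without any global level counter.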

Multipath conf is a bit of a pig to parse. This is what I use (originally based on the answer from daniele-barresi); the output is easier to work with than in the other examples.
def get_multipath_conf():
    def parse_conf(conf_lines, parent=None):
        config = {}
        for line in conf_lines:
            # drop trailing comments, then surrounding whitespace
            line = line.split('#',1)[0].strip()
            if line.endswith('{'):
                key = line.split('{', 1)[0].strip()
                value = parse_conf(conf_lines, parent=key)
                # e.g. "multipath" inside "multipaths": collect repeats in a list
                if key+'s' == parent:
                    if isinstance(config, dict):
                        config = []
                    config.append(value)
                else:
                    config[key] = value
            else:
                # inside a dict
                if line.endswith('}'):
                    # end of current dict
                    break
                else:
                    # parameter line
                    line = line.split(' ',1)
                    if len(line) > 1:
                        key = line[0]
                        value = line[1].strip().strip("'").strip('"')
                        config[key] = value
        return config
    # close the file once parsing is done
    with open('/etc/multipath.conf','r') as conf_file:
        return parse_conf(conf_file)
This is the output:
{'blacklist': {'devnode': '^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda|sdb)[0-9]*$'},
'defaults': {'find_multipaths': 'yes',
'max_polling_interval': '4',
'polling_interval': '2',
'reservation_key': '0x1'},
'devices': [{'detect_checker': 'no',
'hardware_handler': '1 alua',
'no_path_retry': '5',
'path_checker': 'tur',
'prio': 'alua',
'product': 'iSCSI Volume',
'user_friendly_names': 'yes',
'vendor': 'StorMagic'}]}
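Since the original goal was adding and removing multipath entries, here is a sketch of how that could look on a structure of this shape. It assumes the parser stored the "multipaths" section as a list of dicts (which is what the key+'s' rule produces when the section contains at least one multipath block); the helper names add_multipath and remove_multipath are hypothetical, and the sample dict is hand-written rather than parsed from a real host.

```python
def add_multipath(config, wwid, alias):
    """Append a multipath entry unless that wwid is already present."""
    entries = config.setdefault("multipaths", [])
    if any(entry.get("wwid") == wwid for entry in entries):
        return False
    entries.append({"wwid": wwid, "alias": alias})
    return True

def remove_multipath(config, wwid):
    """Drop every entry with the given wwid; return how many were removed."""
    entries = config.get("multipaths", [])
    kept = [entry for entry in entries if entry.get("wwid") != wwid]
    config["multipaths"] = kept
    return len(entries) - len(kept)

# hand-written sample in the same shape as the parser output
config = {
    "defaults": {"user_friendly_names": "yes"},
    "multipaths": [
        {"wwid": "36f47acc1000000006167347a00000041", "alias": "dwqa-ora-fs"},
    ],
}

added = add_multipath(config, "36f47acc1000000006167347a00000043", "dwqa-ora-grid")
removed = remove_multipath(config, "36f47acc1000000006167347a00000041")
print(added, removed, config["multipaths"])
```

Writing the modified structure back to the file is a separate step that this sketch does not cover.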

Related

How to parse file with different structures in python

I am working on a file that contains data in many different structures, and I cannot figure out an efficient way to handle all of them. My idea is to read it line by line and find matching pairs of braces. Is there an efficient way to match the braces so that I can then handle each type with its own logic?
Here is the file I am facing:
.....
# some header info that can be discarded
object node {
name R2-12-47-3_node_453;
phases ABCN;
voltage_A 7200+0.0j;
voltage_B -3600-6235j;
voltage_C -3600+6235j;
nominal_voltage 7200;
bustype SWING;
}
...
# a lot of objects node
object triplex_meter {
name R2-12-47-3_tm_403;
phases AS;
voltage_1 120;
voltage_2 120;
voltage_N 0;
nominal_voltage 120;
}
....
# a lot of object triplex_meter
object triplex_line {
groupid Triplex_Line;
name R2-12-47-3_tl_409;
phases AS;
from R2-12-47-3_tn_409;
to R2-12-47-3_tm_409;
length 30;
configuration triplex_line_configuration_1;
}
...
# a lot of object triplex_meter
#some nested objects...awh...
So my question is: is there a way to quickly match "{" and "}" so that I can focus on the type inside?
After parsing the file, I am expecting logic like this:
if obj_type == "node":
    # to do 1
elif obj_type == "triplex_meter":
    # to do 2
It seems easy to deal with this structure, but I am not sure exactly where to get started.
Code with comments
file = """
object node {
    name R2-12-47-3_node_453
    phases ABCN
    voltage_A 7200+0.0j
    voltage_B - 3600-6235j
    voltage_C - 3600+6235j
    nominal_voltage 7200
    bustype SWING
}
object triplex_meter {
    name R2-12-47-3_tm_403
    phases AS
    voltage_1 120
    voltage_2 120
    voltage_N 0
    nominal_voltage 120
}
object triplex_line {
    groupid Triplex_Line
    name R2-12-47-3_tl_409
    phases AS
    from R2-12-47-3_tn_409
    to R2-12-47-3_tm_409
    length 30
    configuration triplex_line_configuration_1
}"""
# New python dict
data = {}
# Generate a list with all object taken from file
x = file.replace('\n', '').replace(' - ', ' ').strip().split('object ')
for i in x:
    # Exclude null items in the list to avoid errors
    if i != '':
        # Hard split
        a, b = i.split('{')
        c = b.split(' ')
        # Generate a new list with non null elements
        d = [e.replace('}', '') for e in c if e != '' and e != ' ']
        # Needing a sub dict here for paired values
        sub_d = {}
        # Iterating over list to get paired values
        for index in range(len(d)):
            # We are working with paired values so we unpack only pair indexes
            if index % 2 == 0:
                # Inserting paired values in sub_dict
                sub_d[d[index]] = d[index+1]
        # Inserting sub_dict in main dict "data" using object name
        data[a.strip()] = sub_d
print(data)
Output
{'node': {'name': 'R2-12-47-3_node_453', 'phases': 'ABCN', 'voltage_A': '7200+0.0j', 'voltage_B': '3600-6235j', 'voltage_C': '3600+6235j', 'nominal_voltage': '7200', 'bustype': 'SWING'}, 'triplex_meter': {'name': 'R2-12-47-3_tm_403', 'phases': 'AS', 'voltage_1': '120', 'voltage_2': '120', 'voltage_N': '0', 'nominal_voltage': '120'}, 'triplex_line': {'groupid': 'Triplex_Line', 'name': 'R2-12-47-3_tl_409', 'phases': 'AS', 'from': 'R2-12-47-3_tn_409', 'to': 'R2-12-47-3_tm_409', 'length': '30', 'configuration': 'triplex_line_configuration_1'}}
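You can now use the Python dict however you want.
For example:
print(data['triplex_meter']['name'])
EDIT
If your file has many "triplex_meter" objects, the code above keeps only the last one, because each object overwrites the previous entry stored under the same key. Group them in a Python list before inserting them into the main dict.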
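One way to do that grouping is sketched below, assuming the objects are not nested and each property sits on its own line, optionally ending with a semicolon as in the question. The function name parse_objects, the regular expression and the second meter name in the sample are illustrative assumptions, not part of the original file.

```python
import re
from collections import defaultdict

# Matches "object <type> { ... }" where the body contains no nested braces.
OBJECT_RE = re.compile(r"object\s+(\w+)\s*\{([^{}]*)\}")

def parse_objects(text):
    """Return {object_type: [properties_dict, ...]} for every flat object."""
    grouped = defaultdict(list)
    for obj_type, body in OBJECT_RE.findall(text):
        props = {}
        for raw in body.splitlines():
            line = raw.strip().rstrip(";")
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition(" ")
            props[key] = value.strip()
        grouped[obj_type].append(props)
    return dict(grouped)

sample = """
object triplex_meter {
    name R2-12-47-3_tm_403;
    phases AS;
}
object triplex_meter {
    name R2-12-47-3_tm_404;
    phases AS;
}
object node {
    name R2-12-47-3_node_453;
    bustype SWING;
}
"""
objects = parse_objects(sample)
for obj_type, items in objects.items():
    print(obj_type, len(items))
```

With this shape, the per-type logic from the question becomes a loop over objects["node"], objects["triplex_meter"] and so on. Nested objects would need a real brace-matching parser, since this pattern only sees innermost blocks.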

How do I convert my text file to nested JSON in Python

I have a text file which I want to convert to a nested json structure. The text file is :
Report_for Reconciliation
Execution_of application_1673496470638_0001
Spark_version 2.4.7-amzn-0
Java_version 1.8.0_352 (Amazon.com Inc.)
Start_time 2023-01-12 09:45:13.360000
Spark Properties:
Job_ID 0
Submission_time 2023-01-12 09:47:20.148000
Run_time 73957ms
Result JobSucceeded
Number_of_stages 1
Stage_ID 0
Number_of_tasks 16907
Number_of_executed_tasks 16907
Completion_time 73207ms
Stage_executed parquet at RawDataPublisher.scala:53
Job_ID 1
Submission_time 2023-01-12 09:48:34.177000
Run_time 11525ms
Result JobSucceeded
Number_of_stages 2
Stage_ID 1
Number_of_tasks 16907
Number_of_executed_tasks 0
Completion_time 0ms
Stage_executed parquet at RawDataPublisher.scala:53
Stage_ID 2
Number_of_tasks 300
Number_of_executed_tasks 300
Completion_time 11520ms
Stage_executed parquet at RawDataPublisher.scala:53
Job_ID 2
Submission_time 2023-01-12 09:48:46.908000
Run_time 218358ms
Result JobSucceeded
Number_of_stages 1
Stage_ID 3
Number_of_tasks 1135
Number_of_executed_tasks 1135
Completion_time 218299ms
Stage_executed parquet at RawDataPublisher.scala:53
I want the output to be :
{
"Report_for": "Reconciliation",
"Execution_of": "application_1673496470638_0001",
"Spark_version": "2.4.7-amzn-0",
"Java_version": "1.8.0_352 (Amazon.com Inc.)",
"Start_time": "2023-01-12 09:45:13.360000",
"Job_ID 0": {
"Submission_time": "2023-01-12 09:47:20.148000",
"Run_time": "73957ms",
"Result": "JobSucceeded",
"Number_of_stages": "1",
"Stage_ID 0": {
"Number_of_tasks": "16907",
"Number_of_executed_tasks": "16907",
"Completion_time": "73207ms",
"Stage_executed": "parquet at RawDataPublisher.scala:53"
"Stage": "parquet at RawDataPublisher.scala:53",
},
},
}
I tried the defaultdict approach, but it generated JSON whose values are lists, which is not suitable for building a table. Here's what I did:
import json
from collections import defaultdict
INPUT = 'demofile.txt'
dict1 = defaultdict(list)
def convert():
    with open(INPUT) as f:
        for line in f:
            command, description = line.strip().split(None, 1)
            dict1[command].append(description.strip())
    OUTPUT = open("demo1file.json", "w")
    json.dump(dict1, OUTPUT, indent = 4, sort_keys = False)
and was getting this:
"Report_for": [ "Reconciliation" ],
"Execution_of": [ "application_1673496470638_0001" ],
"Spark_version": [ "2.4.7-amzn-0" ],
"Java_version": [ "1.8.0_352 (Amazon.com Inc.)" ],
"Start_time": [ "2023-01-12 09:45:13.360000" ],
"Job_ID": [
"0",
"1",
"2", ....
]]]
I just want to convert my text to the above json format so that I can build a table on top of it.
There's no way Python or one of its libraries can figure out your nesting requirements if flat text is given as input. How should it know that Stages are inside Jobs, for example?
You will have to programmatically tell your application how it works.
I hacked an example which should work, you can go from there (assuming input_str is what you posted as your file content):
# define your nesting structure
nesting = {'Job_ID': {'Stage_ID': {}}}
upper_nestings = []
upper_nesting_keys = []
# your resulting dictionary
result_dict = {}
# your "working" dictionaries
current_nesting = nesting
working_dict = result_dict
# parse each line of the input string
for line_str in input_str.split('\n'):
    # key is the first word, value are all consecutive words
    line = line_str.split(' ')
    # if key is in nesting, create new sub-dict, all consecutive entries are part of the sub-dict
    if line[0] in current_nesting.keys():
        current_nesting = current_nesting[line[0]]
        upper_nestings.append(line[0])
        upper_nesting_keys.append(line[1])
        working_dict[line_str] = {}
        working_dict = working_dict[line_str]
    else:
        # if a new "parallel" or "upper" nesting is detected, reset your nesting structure
        if line[0] in upper_nestings:
            nests = upper_nestings[:upper_nestings.index(line[0])]
            keys = upper_nesting_keys[:upper_nestings.index(line[0])]
            working_dict = result_dict
            for nest in nests:
                working_dict = working_dict[' '.join([nest, keys[nests.index(nest)]])]
            upper_nestings = upper_nestings[:upper_nestings.index(line[0])+1]
            upper_nesting_keys = upper_nesting_keys[:upper_nestings.index(line[0])]
            upper_nesting_keys.append(line[1])
            current_nesting = nesting
            for nest in upper_nestings:
                current_nesting = current_nesting[nest]
            working_dict[line_str] = {}
            working_dict = working_dict[line_str]
            continue
        working_dict[line[0]] = ' '.join(line[1:])
print(result_dict)
Results in:
{
'Report_for': 'Reconciliation',
'Execution_of': 'application_1673496470638_0001',
'Spark_version': '2.4.7-amzn-0',
'Java_version': '1.8.0_352 (Amazon.com Inc.)',
'Start_time': '2023-01-12 09:45:13.360000',
'Spark': 'Properties: ',
'Job_ID 0': {
'Submission_time': '2023-01-12 09:47:20.148000',
'Run_time': '73957ms',
'Result': 'JobSucceeded',
'Number_of_stages': '1',
'Stage_ID 0': {
'Number_of_tasks': '16907',
'Number_of_executed_tasks': '16907',
'Completion_time': '73207ms',
'Stage_executed': 'parquet at RawDataPublisher.scala:53'
}
},
'Job_ID 1': {
'Submission_time': '2023-01-12 09:48:34.177000',
'Run_time': '11525ms',
'Result': 'JobSucceeded',
'Number_of_stages': '2',
'Stage_ID 1': {
'Number_of_tasks': '16907',
'Number_of_executed_tasks': '0',
'Completion_time': '0ms',
'Stage_executed': 'parquet at RawDataPublisher.scala:53'
},
'Stage_ID 2': {
'Number_of_tasks': '300',
'Number_of_executed_tasks': '300',
'Completion_time': '11520ms',
'Stage_executed': 'parquet at RawDataPublisher.scala:53'
}
},
'Job_ID 2': {
'Submission_time':
'2023-01-12 09:48:46.908000',
'Run_time': '218358ms',
'Result': 'JobSucceeded',
'Number_of_stages': '1',
'Stage_ID 3': {
'Number_of_tasks': '1135',
'Number_of_executed_tasks': '1135',
'Completion_time': '218299ms',
'Stage_executed': 'parquet at RawDataPublisher.scala:53'
}
}
}
and should pretty much be generically usable for all kinds of nesting definitions from a flat input. Let me know if it works for you!
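If the nesting is a simple chain (each level contains only the next one), the same bookkeeping can be sketched more compactly with a stack of open dictionaries. This is an alternative sketch, not the code above: the LEVELS list, the function name nest_lines and the shortened sample are assumptions made for illustration, and json.dumps is used at the end because the question asks for JSON.

```python
import json

# Assumption: section keys listed from outermost to innermost.
LEVELS = ["Job_ID", "Stage_ID"]

def nest_lines(text, levels=LEVELS):
    """Turn flat 'key value' lines into nested dicts using a stack."""
    root = {}
    stack = [root]  # stack[d] is the dict that is open at depth d
    for raw in text.splitlines():
        parts = raw.split(None, 1)
        if len(parts) < 2:
            continue  # skip blank or value-less lines
        key, value = parts[0], parts[1].strip()
        if key in levels:
            depth = levels.index(key)
            del stack[depth + 1:]  # close deeper or parallel sections
            child = {}
            stack[-1][f"{key} {value}"] = child
            stack.append(child)
        else:
            stack[-1][key] = value
    return root

sample = """Report_for Reconciliation
Job_ID 0
Run_time 73957ms
Stage_ID 0
Number_of_tasks 16907
Job_ID 1
Run_time 11525ms
Stage_ID 1
Number_of_tasks 16907
Stage_ID 2
Number_of_tasks 300"""

nested = nest_lines(sample)
print(json.dumps(nested, indent=4))
```

The trade-off is that a plain list cannot express sibling section types at the same depth, which the nested dict definition above can.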

Parsing devicetree with pyparsing into structured dictionary

For my C++ RTOS I'm writing a parser of devicetree "source" files (.dts) in Python using pyparsing module. I'm able to parse the structure of the devicetree into a (nested) dictionary, where the property name or node name are dictionary keys (strings), and property values or nodes are dictionary values (either string or a nested dictionary).
Let's assume I have the following example devicetree structure:
/ {
property1 = "string1";
property2 = "string2";
node1 {
property11 = "string11";
property12 = "string12";
node11 {
property111 = "string111";
property112 = "string112";
};
};
node2 {
property21 = "string21";
property22 = "string22";
};
};
I'm able to parse that into something like that:
{'/': {'node1': {'node11': {'property111': ['string111'], 'property112': ['string112']},
'property11': ['string11'],
'property12': ['string12']},
'node2': {'property21': ['string21'], 'property22': ['string22']},
'property1': ['string1'],
'property2': ['string2']}}
However for my needs I would prefer to have this data structured differently. I would like to have all properties as a nested dictionary for key "properties", and all child nodes as a nested dictionary for key "children". The reason is that the devicetree (especially nodes) have some "metadata" which I would like to have just as key-value pairs, which requires me to move actual "contents" of the node one level "lower" to avoid any name conflicts for the key. So I would prefer the example above to look like this:
{'/': {
'properties': {
'property1': ['string1'],
'property2': ['string2']
},
'nodes': {
'node1': {
'properties': {
'property11': ['string11'],
'property12': ['string12']
}
'nodes': {
'node11': {
'properties': {
'property111': ['string111'],
'property112': ['string112']
}
'nodes': {
}
}
}
},
'node2': {
'properties': {
'property21': ['string21'],
'property22': ['string22']
}
'nodes': {
}
}
}
}
}
I've tried to add "name" to the parsing tokens, but this results in "doubled" dictionary elements (which is expected, as this behaviour is described in pyparsing documentation). This might not be a problem, but technically a node or property can be named "properties" or "children" (or whatever I choose), so I don't think such solution is robust.
I've also tried to use setParseAction() to convert the token into a dictionary fragment (I hoped that I could transform {'key': 'value'} into {'properties': {'key': 'value'}}), but this did not work at all...
Is this at all possible directly with pyparsing? I'm prepared to just do a second phase to transform the original dictionary to whatever structure I need, but as a perfectionist I would prefer to use a single-run pyparsing-only solution - if possible.
For a reference here's a sample code (Python 3) which transforms the devicetree source into an "unstructured" dictionary. Please note that this code is just a simplification which doesn't support all the features found in .dts (any data type other than string, value lists, unit-addresses, labels and so on) - it just supports string properties and node nesting.
#!/usr/bin/env python
import pyparsing
import pprint
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + pyparsing.Group(pyparsing.Literal('=').suppress() +
propertyValue) + pyparsing.Literal(';').suppress()))
childNode = pyparsing.Forward()
rootNode = pyparsing.Dict(pyparsing.Group(pyparsing.Literal('/') + pyparsing.Literal('{').suppress() +
pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
pyparsing.Literal('};').suppress()))
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + pyparsing.Literal('{').suppress() +
pyparsing.ZeroOrMore(property) + pyparsing.ZeroOrMore(childNode) +
pyparsing.Literal('};').suppress()))
dictionary = rootNode.parseString("""
/ {
property1 = "string1";
property2 = "string2";
node1 {
property11 = "string11";
property12 = "string12";
node11 {
property111 = "string111";
property112 = "string112";
};
};
node2 {
property21 = "string21";
property22 = "string22";
};
};
""").asDict()
pprint.pprint(dictionary, width = 120)
You are really so close. I just did the following:
- added Groups and results names for your "properties" and "nodes" sub-sections
- changed some of the punctuation literals to CONSTANTS (Literal("};") will fail to match if there is space between the closing brace and semicolon, but RBRACE + SEMI will accommodate whitespace)
- removed the outermost Dict on rootNode
Code:
LBRACE,RBRACE,SLASH,SEMI,EQ = map(pyparsing.Suppress, "{}/;=")
nodeName = pyparsing.Word(pyparsing.alphas, pyparsing.alphanums + ',._+-', max = 31)
propertyName = pyparsing.Word(pyparsing.alphanums + ',._+?#', max = 31)
propertyValue = pyparsing.dblQuotedString.setParseAction(pyparsing.removeQuotes)
property = pyparsing.Dict(pyparsing.Group(propertyName + EQ
+ pyparsing.Group(propertyValue)
+ SEMI))
childNode = pyparsing.Forward()
rootNode = pyparsing.Group(SLASH + LBRACE
+ pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
+ pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
+ RBRACE + SEMI)
childNode <<= pyparsing.Dict(pyparsing.Group(nodeName + LBRACE
+ pyparsing.Group(pyparsing.ZeroOrMore(property))("properties")
+ pyparsing.Group(pyparsing.ZeroOrMore(childNode))("children")
+ RBRACE + SEMI))
Converting to a dict with asDict and printing with pprint gives:
pprint.pprint(result[0].asDict())
{'children': {'node1': {'children': {'node11': {'children': [],
'properties': {'property111': ['string111'],
'property112': ['string112']}}},
'properties': {'property11': ['string11'],
'property12': ['string12']}},
'node2': {'children': [],
'properties': {'property21': ['string21'],
'property22': ['string22']}}},
'properties': {'property1': ['string1'], 'property2': ['string2']}}
You can also use the dump() method that is included with pyparsing's ParseResults class to help visualize the list and dict/namespace-style access to the results as-is, without any conversion call:
print(result[0].dump())
[[['property1', ['string1']], ['property2', ['string2']]], [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]]
- children: [['node1', [['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]], ['node2', [['property21', ['string21']], ['property22', ['string22']]], []]]
- node1: [[['property11', ['string11']], ['property12', ['string12']]], [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]]
- children: [['node11', [['property111', ['string111']], ['property112', ['string112']]], []]]
- node11: [[['property111', ['string111']], ['property112', ['string112']]], []]
- children: []
- properties: [['property111', ['string111']], ['property112', ['string112']]]
- property111: ['string111']
- property112: ['string112']
- properties: [['property11', ['string11']], ['property12', ['string12']]]
- property11: ['string11']
- property12: ['string12']
- node2: [[['property21', ['string21']], ['property22', ['string22']]], []]
- children: []
- properties: [['property21', ['string21']], ['property22', ['string22']]]
- property21: ['string21']
- property22: ['string22']
- properties: [['property1', ['string1']], ['property2', ['string2']]]
- property1: ['string1']
- property2: ['string2']
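For completeness, the two-phase fallback mentioned in the question is also short in plain Python. The sketch below assumes the "unstructured" dictionary from the question, where every property value is a list and every child node is a dict; the function name restructure is made up, and the sample is a trimmed version of the question's output.

```python
def restructure(node):
    """Split a flat node dict into 'properties' and 'children' sub-dicts."""
    properties = {}
    children = {}
    for name, value in node.items():
        if isinstance(value, dict):
            # nested dict means a child node: convert it recursively
            children[name] = restructure(value)
        else:
            # anything else (here: a list of strings) is a property value
            properties[name] = value
    return {"properties": properties, "children": children}

flat = {
    "/": {
        "node1": {
            "node11": {"property111": ["string111"]},
            "property11": ["string11"],
        },
        "property1": ["string1"],
    }
}

structured = {name: restructure(node) for name, node in flat.items()}
print(structured)
```

Unlike the grammar-level approach, this cannot tell a property from a node if both ever parse to the same Python type, so it only fits the simplified grammar shown here.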

What is the pythonic way to extract values from dict

What is the best way to extract values from a dictionary? Let's suppose we have a list of dicts:
projects = [{'project': 'project_name1',
'dst-repo': 'some_dst_path',
'src-repo': 'some_src_path',
'branches': ['*']},
{...},
{...}]
Now I just iterate through this list and get the values, something like:
for project in projects:
    project_name = project.get('project')
    project_src = ....
    project_dst = ....
    ....
    ....
So the question is: "Are there any more pythonic approaches to extract values by key from dictionary that allow not making so many lines of code for new variable assignment?"
There's nothing wrong with what you're doing, but you can make it more compact by using a list comprehension to extract the values from the current dictionary. E.g.,
projects = [
{
'project': 'project_name1',
'dst-repo': 'some_dst_path',
'src-repo': 'some_src_path',
'branches': ['*']
},
]
keys = ['project', 'src-repo', 'dst-repo', 'branches']
for project in projects:
    name, src, dst, branches = [project[k] for k in keys]
    # Do stuff with the values
    print(name, src, dst, branches)
output
project_name1 some_src_path some_dst_path ['*']
However, this approach gets unwieldy if the number of keys is large.
If keys are sometimes absent from the dict, then you will need to use the .get method, which returns None for missing keys (unless you pass it a default arg):
name, src, dst, branches = [project.get(k) for k in keys]
If you need a specific default for each key, you could put them into a dict, e.g.
defaults = {
'project': 'NONAME',
'src-repo': 'NOSRC',
'dst-repo': 'NODEST',
'branches': ['*'],
}
projects = [
{
'project': 'project_name1',
'src-repo': 'some_src_path',
},
]
keys = ['project', 'src-repo', 'dst-repo', 'branches']
for project in projects:
    name, src, dst, branches = [project.get(k, defaults[k]) for k in keys]
    # Do stuff with the values
    print(name, src, dst, branches)
output
project_name1 some_src_path NODEST ['*']
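When every key is guaranteed to be present, operator.itemgetter from the standard library does the same multi-key extraction without a comprehension. A small sketch, reusing the sample project from above; the name get_fields and the rows list are only illustrative:

```python
from operator import itemgetter

projects = [
    {
        'project': 'project_name1',
        'dst-repo': 'some_dst_path',
        'src-repo': 'some_src_path',
        'branches': ['*'],
    },
]

# Build the getter once; calling it returns a tuple in the requested key order.
get_fields = itemgetter('project', 'src-repo', 'dst-repo', 'branches')

rows = []
for project in projects:
    name, src, dst, branches = get_fields(project)
    rows.append((name, src, dst, branches))
    print(name, src, dst, branches)
```

Note that itemgetter raises KeyError for a missing key, so the .get-based variants above remain the better fit for incomplete dicts.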
out = [elt.values() for elt in projects]
for project in projects:
    project_name = project['project']
    project_src = ....
    project_dst = ....
    ....
    ....
I'm not sure you can get less typing
//EDIT:
Ok, it looks I misunderstood the question:
Assume we have a list of dicts like this:
projects = [ {'project': "proj1", 'other': "value1", 'other2': "value2"},
{'project': "proj2", 'other': "value3", 'other2': "value4"},
{'project': "proj2", 'other': "value3", 'other2': "value4"} ]
To extract the list of project fields, you can use the following expression:
projects_names = [x['project'] for x in projects]
This will iterate over project list, extracting the value of 'project' key from each dictionary.

How can I change the value of a node in a python dictionary by following a list of keys?

I have a bit of a complex question that I can't seem to get to the bottom of. I have a list of keys corresponding to a position in a Python dictionary. I would like to be able to dynamically change the value at the position (found by the keys in the list).
For example:
listOfKeys = ['car', 'ford', 'mustang']
I also have a dictionary:
DictOfVehiclePrices = {'car':
{'ford':
{'mustang': 'expensive',
'other': 'cheap'},
'toyota':
{'big': 'moderate',
'small': 'cheap'}
},
'truck':
{'big': 'expensive',
'small': 'moderate'}
}
Via my list, how could I dynamically change the value of DictOfVehiclePrices['car']['ford']['mustang']?
In my actual problem, I need to follow the list of keys through the dictionary and change the value at the end position. How can this be done dynamically (with loops, etc.)?
Thank you for your help! :)
Use reduce and operator.getitem:
>>> from operator import getitem
>>> lis = ['car', 'ford', 'mustang']
Update value:
>>> reduce(getitem, lis[:-1], DictOfVehiclePrices)[lis[-1]] = 'cheap'
Fetch value:
>>> reduce(getitem, lis, DictOfVehiclePrices)
'cheap'
Note that in Python 3 reduce has been moved to functools module.
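Put together for Python 3, the same idea can be wrapped in two small helpers. This is a sketch: the names get_by_path and set_by_path and the trimmed prices dict are illustrative, while reduce and getitem are the standard-library functions used above.

```python
from functools import reduce
from operator import getitem

def get_by_path(root, keys):
    """Follow keys one by one into nested dicts and return the final value."""
    return reduce(getitem, keys, root)

def set_by_path(root, keys, value):
    """Walk to the parent of the last key, then assign the value there."""
    get_by_path(root, keys[:-1])[keys[-1]] = value

prices = {
    'car': {'ford': {'mustang': 'expensive', 'other': 'cheap'}},
    'truck': {'big': 'expensive'},
}
path = ['car', 'ford', 'mustang']

before = get_by_path(prices, path)
set_by_path(prices, path, 'cheap')
after = get_by_path(prices, path)
print(before, after)
```

Both helpers raise KeyError if any key along the path is missing, which is usually what you want when updating an existing value.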
A very simple approach would be:
DictOfVehiclePrices[listOfKeys[0]][listOfKeys[1]][listOfKeys[2]] = 'new value'
from functools import reduce  # built in on Python 2; required on Python 3
print(reduce(lambda x, y: x[y], listOfKeys, DictOfVehiclePrices))
Output
expensive
In order to change the values,
result = DictOfVehiclePrices
for key in listOfKeys[:-1]:
    result = result[key]
result[listOfKeys[-1]] = "cheap"
print(DictOfVehiclePrices)
Output
{'car': {'toyota': {'small': 'cheap', 'big': 'moderate'},
'ford': {'mustang': 'cheap', 'other': 'cheap'}},
'truck': {'small': 'moderate', 'big': 'expensive'}}
You have a great solution here by @Joel Cornett.
Based on Joel's method, you can use it like this:
def set_value(dict_nested, address_list):
    cur = dict_nested
    for path_item in address_list[:-2]:
        try:
            cur = cur[path_item]
        except KeyError:
            cur = cur[path_item] = {}
    cur[address_list[-2]] = address_list[-1]
DictOfVehiclePrices = {'car':
{'ford':
{'mustang': 'expensive',
'other': 'cheap'},
'toyota':
{'big': 'moderate',
'small': 'cheap'}
},
'truck':
{'big': 'expensive',
'small': 'moderate'}
}
set_value(DictOfVehiclePrices,['car', 'ford', 'mustang', 'a'])
print(DictOfVehiclePrices)
STDOUT:
{'car': {'toyota': {'small': 'cheap', 'big': 'moderate'}, 'ford':
{'mustang': 'a', 'other': 'cheap'}}, 'truck': {'small': 'moderate',
'big': 'expensive'}}
def update_dict(parent, data, value):
    '''
    To update the value in the data if the data
    is a nested dictionary
    :param parent: list of parents
    :param data: data dict in which value to be updated
    :param value: Value to be updated in data dict
    :return:
    '''
    if parent:
        if isinstance(data[parent[0]], dict):
            update_dict(parent[1:], data[parent[0]], value)
        else:
            data[parent[0]] = value
parent = ["test", "address", "area", "street", "locality", "country"]
data = {
"first_name": "ttcLoReSaa",
"test": {
"address": {
"area": {
"street": {
"locality": {
"country": "india"
}
}
}
}
}
}
update_dict(parent, data, "IN")
Here is a recursive function to update a nested dict based on a list of keys:
1. Trigger the update_dict function with the required params.
2. The function takes the first key in the list and retrieves its value from the dict.
3. If the retrieved value is a dict, it drops that key from the list and descends into that dict.
4. It sends the nested dict and the remaining list of keys to the same function recursively.
5. When the retrieved value is no longer a dict, we have reached the desired key, where we need to apply our replacement, so the function replaces dict[key] with the value.
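Because update_dict decides where to stop by checking whether the next value is a dict, it cannot replace a value that is itself a dict. A variant that stops when only one key remains avoids that. This is a sketch with a hypothetical name, update_by_keys, and a shortened copy of the sample data:

```python
def update_by_keys(data, keys, value):
    """Recursively follow keys into data and replace the final entry."""
    if not keys:
        raise ValueError("keys must not be empty")
    head, rest = keys[0], keys[1:]
    if not rest:
        # last key: this is the position to overwrite
        data[head] = value
    else:
        # more keys left: descend one level and recurse
        update_by_keys(data[head], rest, value)

sample = {
    "first_name": "ttcLoReSaa",
    "test": {"address": {"area": {"country": "india"}}},
}

update_by_keys(sample, ["test", "address", "area", "country"], "IN")
# replacing a whole sub-dict also works, because the stop rule is the key count
update_by_keys(sample, ["test", "address"], {"area": "none"})
print(sample)
```

A missing intermediate key raises KeyError here rather than being created silently.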
