Turn a simple dictionary into a dictionary with nested lists

Given the following data received from a web form:
for key in request.form.keys():
    print(key, request.form.getlist(key))
group_name [u'myGroup']
category [u'social group']
creation_date [u'03/07/2013']
notes [u'Here are some notes about the group']
members[0][name] [u'Adam']
members[0][location] [u'London']
members[0][dob] [u'01/01/1981']
members[1][name] [u'Bruce']
members[1][location] [u'Cardiff']
members[1][dob] [u'02/02/1982']
How can I turn it into a dictionary like the one below? It will eventually be used as JSON, but since JSON and dictionaries are easily interchanged, my goal is just to get to the following structure:
event = {
    'group_name': 'myGroup',
    'notes': 'Here are some notes about the group',
    'category': 'social group',
    'creation_date': '03/07/2013',
    'members': [
        {
            'name': 'Adam',
            'location': 'London',
            'dob': '01/01/1981'
        },
        {
            'name': 'Bruce',
            'location': 'Cardiff',
            'dob': '02/02/1982'
        }
    ]
}
Here's what I have managed so far. Using the following list comprehension I can easily make sense of the ordinary fields:
event = [(key, request.form.getlist(key)[0]) for key in request.form.keys() if key[0:7] != "members"]
But I'm struggling with the members list. There can be any number of members. I think I need to create a separate list for them and add it to a dictionary holding the non-repeating fields. I can get the member data like this:
tmp_members = [(key, request.form.getlist(key)) for key in request.form.keys() if key[0:7]=="members"]
Then I can pull out the list index and field name:
member_arr = []
members_orig = [(key, request.form.getlist(key)[0]) for key in request.form.keys() if key[0:7] == "members"]
for i in members_orig:
    p1 = i[0].index('[')
    p2 = i[0].index(']')
    members_index = i[0][p1+1:p2]
    p1 = i[0].rfind('[')
    members_field = i[0][p1+1:-1]
But how do I add this to my data structure? The following won't work, because I could be processing members[1][name] before members[0][name]:
member_arr[int(members_index)] = {members_field: i[1]}
This seems very convoluted. Is there a simpler way of doing this, and if not, how can I get this working?

You could store the data in a dictionary and then use the json library:
import json
json_data = json.dumps(event)
print(json_data)
This will print a JSON string. See the documentation of the json module in the Python standard library for the available options.

Yes, convert it to a dictionary, then use json.dumps(), with some optional parameters, to print out the JSON in the format you need:
eventdict = {
    'group_name': 'myGroup',
    'notes': 'Here are some notes about the group',
    'category': 'social group',
    'creation_date': '03/07/2013',
    'members': [
        {'name': 'Adam',
         'location': 'London',
         'dob': '01/01/1981'},
        {'name': 'Bruce',
         'location': 'Cardiff',
         'dob': '02/02/1982'}
    ]
}
import json
print(json.dumps(eventdict, indent=4))
Before Python 3.7 the order of the key-value pairs was not guaranteed (dictionaries now keep insertion order), but if you're just looking for pretty-printed JSON that a script can parse while remaining human-readable, this should work. You can also sort the keys alphabetically:
print(json.dumps(eventdict, indent=4, sort_keys=True))
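Since the question relies on JSON and dictionaries converting easily in both directions, here is a minimal round trip with json.dumps() and json.loads(). The cut-down dictionary is only sample data based on eventdict.

```python
import json

# Sample data: a cut-down version of eventdict
event = {
    "group_name": "myGroup",
    "category": "social group",
    "members": [{"name": "Adam", "location": "London"}],
}

# dict -> JSON text, pretty-printed with keys sorted alphabetically
text = json.dumps(event, indent=4, sort_keys=True)
print(text)

# JSON text -> dict; strings, lists and dicts come back unchanged
restored = json.loads(text)
print(restored == event)  # True
```

The round trip is exact only for JSON-compatible types; a tuple, for example, would come back as a list.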

The following Python functions build a nested dictionary from the flat one. Pass the HTML form data to decode(); it expects every value to be a list of strings (with Flask/Werkzeug, request.form.to_dict(flat=False) produces that shape).
def get_key_name(s):
    first_pos = s.find('[')
    return s[:first_pos]
def get_subkey_name(s):
    '''Used with lists of dictionaries only'''
    first_pos = s.rfind('[')
    last_pos = s.rfind(']')
    return s[first_pos:last_pos+1]
def get_key_index(s):
    first_pos = s.find('[')
    last_pos = s.find(']')
    return s[first_pos:last_pos+1]
def decode(idic):
    odic = {}  # Initialise an empty dictionary
    # Scan all the top level keys
    for key in idic:
        # Nested entries have [] in their key
        if '[' in key and ']' in key:
            if key.rfind('[') == key.find('[') and key.rfind(']') == key.find(']'):
                print(key, 'is a nested list')
                key_name = get_key_name(key)
                key_index = int(get_key_index(key).replace('[', '', 1).replace(']', '', 1))
                # Append can't be used because we may not get the list in the correct order.
                try:
                    odic[key_name][key_index] = idic[key][0]
                except KeyError:  # List doesn't yet exist
                    odic[key_name] = [None] * (key_index + 1)
                    odic[key_name][key_index] = idic[key][0]
                except IndexError:  # List is too short
                    odic[key_name] = odic[key_name] + ([None] * (key_index - len(odic[key_name]) + 1))
                    # TO DO: This could be a function
                    odic[key_name][key_index] = idic[key][0]
            else:
                key_name = get_key_name(key)
                key_index = int(get_key_index(key).replace('[', '', 1).replace(']', '', 1))
                subkey_name = get_subkey_name(key).replace('[', '', 1).replace(']', '', 1)
                try:
                    odic[key_name][key_index][subkey_name] = idic[key][0]
                except KeyError:  # Dictionary doesn't yet exist
                    print("KeyError")
                    # The dictionaries must not be bound to the same object
                    odic[key_name] = [{} for _ in range(key_index + 1)]
                    odic[key_name][key_index][subkey_name] = idic[key][0]
                except IndexError:  # List is too short
                    # The dictionaries must not be bound to the same object
                    odic[key_name] = odic[key_name] + [{} for _ in range(key_index - len(odic[key_name]) + 1)]
                    odic[key_name][key_index][subkey_name] = idic[key][0]
        else:
            # This can be added to the output dictionary directly
            print(key, 'is a simple key value pair')
            odic[key] = idic[key][0]
    return odic
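A shorter alternative, sketched under the assumption that nested keys always have the form name[index][field] and that each value is a list (as with request.form.to_dict(flat=False)), is to match the keys with a regular expression, collect each member's fields by index, and only build the list at the end. That way it does not matter whether members[1] arrives before members[0]. nest_form and NESTED_KEY are hypothetical names, not part of any library.

```python
import re

# Matches keys like "members[0][name]": list name, integer index, field name.
NESTED_KEY = re.compile(r"^(\w+)\[(\d+)\]\[(\w+)\]$")


def nest_form(flat):
    """Turn {'members[0][name]': ['Adam'], 'notes': ['...']} into nested data.

    Assumes every value is a list and only its first element matters.
    """
    result = {}
    groups = {}  # list name -> {index: {field: value}}
    for key, values in flat.items():
        match = NESTED_KEY.match(key)
        if match:
            name, index, field = match.group(1), int(match.group(2)), match.group(3)
            # setdefault creates the inner dicts the first time an index is seen
            groups.setdefault(name, {}).setdefault(index, {})[field] = values[0]
        else:
            result[key] = values[0]
    # Sorting the indices puts members in order, whatever order the keys came in
    for name, by_index in groups.items():
        result[name] = [by_index[i] for i in sorted(by_index)]
    return result


# Sample input in the shape of the question's form data, deliberately out of order
form = {
    "group_name": ["myGroup"],
    "members[1][name]": ["Bruce"],
    "members[0][name]": ["Adam"],
    "members[1][location]": ["Cardiff"],
    "members[0][location]": ["London"],
}
print(nest_form(form))
```

Unlike decode(), this does not pad gaps: if only members[0] and members[2] are present, the result is a two-element list rather than one with a placeholder in the middle.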

Related

How to parse file with different structures in python

I am working with a file that contains data in many different structures, but I cannot figure out an efficient way to handle all of them. My idea is to read it line by line and find matching pairs of braces. Is there an efficient way to match the braces so that I can then handle each type with its own logic?
Here is the file I am facing:
.....
# some header info that can be discarded
object node {
name R2-12-47-3_node_453;
phases ABCN;
voltage_A 7200+0.0j;
voltage_B -3600-6235j;
voltage_C -3600+6235j;
nominal_voltage 7200;
bustype SWING;
}
...
# a lot of objects node
object triplex_meter {
name R2-12-47-3_tm_403;
phases AS;
voltage_1 120;
voltage_2 120;
voltage_N 0;
nominal_voltage 120;
}
....
# a lot of object triplex_meter
object triplex_line {
groupid Triplex_Line;
name R2-12-47-3_tl_409;
phases AS;
from R2-12-47-3_tn_409;
to R2-12-47-3_tm_409;
length 30;
configuration triplex_line_configuration_1;
}
...
# a lot of object triplex_meter
#some nested objects...awh...
So my question is: is there a way to quickly match "{" and "}" so that I can focus on the type inside?
I am expecting some logic like this after parsing the file:
if obj_type == "node":
    pass  # to do 1
elif obj_type == "triplex_meter":
    pass  # to do 2
It seems easy to deal with this structure, but I am not sure exactly where to get started.

Code with comments
file = """
object node {
    name R2-12-47-3_node_453
    phases ABCN
    voltage_A 7200+0.0j
    voltage_B -3600-6235j
    voltage_C -3600+6235j
    nominal_voltage 7200
    bustype SWING
}
object triplex_meter {
    name R2-12-47-3_tm_403
    phases AS
    voltage_1 120
    voltage_2 120
    voltage_N 0
    nominal_voltage 120
}
object triplex_line {
    groupid Triplex_Line
    name R2-12-47-3_tl_409
    phases AS
    from R2-12-47-3_tn_409
    to R2-12-47-3_tm_409
    length 30
    configuration triplex_line_configuration_1
}"""
# New python dict
data = {}
# Generate a list with all objects taken from the file
# (the field lines must be indented: once the newlines are removed,
# that indentation is what keeps neighbouring fields apart)
x = file.replace('\n', '').strip().split('object ')
for i in x:
    # Exclude empty items in the list to avoid errors
    if i != '':
        # Split the object type from its body
        a, b = i.split('{')
        c = b.split(' ')
        # Generate a new list with non-empty elements
        d = [e.replace('}', '') for e in c if e != '' and e != ' ']
        # A sub dict for the paired values
        sub_d = {}
        # Iterate over the list to get paired values
        for index in range(len(d)):
            # Values come in pairs, so only even indexes start a pair
            if index % 2 == 0:
                # Insert the paired values in sub_d
                sub_d[d[index]] = d[index+1]
        # Insert sub_d in the main dict "data" using the object type
        data[a.strip()] = sub_d
print(data)
Output
{'node': {'name': 'R2-12-47-3_node_453', 'phases': 'ABCN', 'voltage_A': '7200+0.0j', 'voltage_B': '-3600-6235j', 'voltage_C': '-3600+6235j', 'nominal_voltage': '7200', 'bustype': 'SWING'}, 'triplex_meter': {'name': 'R2-12-47-3_tm_403', 'phases': 'AS', 'voltage_1': '120', 'voltage_2': '120', 'voltage_N': '0', 'nominal_voltage': '120'}, 'triplex_line': {'groupid': 'Triplex_Line', 'name': 'R2-12-47-3_tl_409', 'phases': 'AS', 'from': 'R2-12-47-3_tn_409', 'to': 'R2-12-47-3_tm_409', 'length': '30', 'configuration': 'triplex_line_configuration_1'}}
You can now use the Python dict however you want, for example:
print(data['triplex_meter']['name'])
EDIT
If your file contains many "triplex_meter" objects, group them in a Python list before inserting them into the main dict; otherwise each object overwrites the previous one of the same type.
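Following up on the EDIT, here is a sketch that reads the original format directly (indented "key value;" lines, semicolons included) with a regular expression and groups every object into a list by type, so repeated triplex_meter objects are all kept. It assumes blocks are not nested; the nested objects mentioned in the question would need a real brace-matching parser. parse_objects and BLOCK are hypothetical names, and the second triplex_meter in the sample is made up for illustration.

```python
import re
from collections import defaultdict

# One "object <type> { ... }" block; [^{}]* means nested blocks are NOT handled.
BLOCK = re.compile(r"object\s+(\w+)\s*\{([^{}]*)\}")


def parse_objects(text):
    """Return {object_type: [{property: value}, ...]}, one dict per object."""
    objects = defaultdict(list)
    for match in BLOCK.finditer(text):
        obj_type, body = match.group(1), match.group(2)
        props = {}
        for line in body.splitlines():
            line = line.strip().rstrip(";")  # drop indentation and the trailing ';'
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # property name, then everything after it
            props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects[obj_type].append(props)
    return objects


# Sample in the question's format; the second triplex_meter is hypothetical.
sample = """
# some header info that can be discarded
object node {
    name R2-12-47-3_node_453;
    voltage_B -3600-6235j;
    bustype SWING;
}
object triplex_meter {
    name R2-12-47-3_tm_403;
    nominal_voltage 120;
}
object triplex_meter {
    name R2-12-47-3_tm_409;
    nominal_voltage 120;
}
"""

# Dispatch on the object type, as the question sketches
for obj_type, items in parse_objects(sample).items():
    if obj_type == "node":
        print("nodes:", [item["name"] for item in items])
    elif obj_type == "triplex_meter":
        print("meters:", [item["name"] for item in items])
```

Splitting each line on the first run of whitespace keeps signed values such as -3600-6235j intact, so no preprocessing of the file is needed.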

How can I refactor my code to return a collection of dictionaries?

def read_data(service_client):
    data = list_data(domain, realm)  # This returns a data frame
    building_data = []
    building_names = {}
    all_buildings = {}
    for elem in data.iterrows():
        building = elem[1]['building_name']
        region_id = elem[1]['region_id']
        bandwith = elem[1]['bandwith']
        building_id = elem[1]['building_id']
        return {
            'Building': building,
            'Region Id': region_id,
            'Bandwith': bandwith,
            'Building Id': building_id,
        }
As written, this returns only a single dictionary, from one iteration of the loop. I have tried printing it as well, among other things.
I am trying to find a way to store a dictionary on each iteration and return all of them, instead of just one. Does anyone know how to achieve this?

You may replace your for-loop with the following to get all dictionaries in a list.
naming = {
'building_name': 'Building',
'region_id': 'Region Id',
'bandwith': 'Bandwith',
'building_id': 'Building Id',
}
return [
row[list(naming.values())].to_dict()
for idx, row in data.rename(naming, axis=1).iterrows()
]
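The underlying fix, with or without pandas, is to create the list before the loop, append one dictionary per row inside it, and return only after the loop ends. A minimal sketch, where collect_buildings is a hypothetical name and rows stands in for the data frame as any iterable of mappings with the question's column names:

```python
def collect_buildings(rows):
    """Build one dictionary per row and return all of them."""
    buildings = []  # created once, before the loop
    for row in rows:
        buildings.append({
            'Building': row['building_name'],
            'Region Id': row['region_id'],
            'Bandwith': row['bandwith'],
            'Building Id': row['building_id'],
        })
    return buildings  # returned after the loop, so every row is included


# Hypothetical sample rows
rows = [
    {'building_name': 'A', 'region_id': 1, 'bandwith': 10, 'building_id': 100},
    {'building_name': 'B', 'region_id': 2, 'bandwith': 20, 'building_id': 200},
]
print(collect_buildings(rows))
```

With a pandas data frame, data.to_dict(orient="records") returns one dictionary per row, so collect_buildings(data.to_dict(orient="records")) should give the same result.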

How to check each key separately from a list in a loop without creating multiple loops. Which may have a KeyError etc

I wrote code that takes 9 keys from an API.
The authors, isbn_one, isbn_two, thumbnail and page_count fields may not always be present, and if any of them is missing, I would like its value to be None. Unfortunately, if statements, even nested ones, don't work, because that leads to a lot of branches. I also tried try/except KeyError, but each key raises its own error and it is not clear which one to assign None to. Here is an example of the logic when a photo is missing:
th = result['volumeInfo'].get('imageLinks')
if th is not None:
    book_exists_thumbinail = {
        'thumbinail': result['volumeInfo']['imageLinks']['thumbnail']
    }
    dnew = {**book_data, **book_exists_thumbinail}
    book_import.append(dnew)
else:
    book_exists_thumbinail_n = {
        'thumbinail': None
    }
    dnew_none = {**book_data, **book_exists_thumbinail_n}
    book_import.append(dnew_none)
With this if/else logic, once one condition is met (e.g. for the thumbnail), the rest are not even checked.
With try/except it's similar. The ISBNs are another problem: industryIdentifiers is a list of dictionaries, so I need to use something like this:
import collections
isbn_zer = result['volumeInfo']['industryIdentifiers']
dic = collections.defaultdict(list)
for d in isbn_zer:
    for k, v in d.items():
        dic[k].append(v)
Output data: [{'type': 'ISBN_10', 'identifier': '8320717507'}, {'type': 'ISBN_13', 'identifier': '9788320717501'}]
I no longer know what to use to check each key separately and assign None when it is absent, or when one of the ISBNs (identifiers) is missing. I have already tried many ideas.
The rest of the code:
book_import = []
if request.method == 'POST':
    filter_ch = BookFilterForm(request.POST)
    if filter_ch.is_valid():
        cd = filter_ch.cleaned_data
        filter_choice = cd['choose_v']
        filter_search = cd['search']
        search_url = "https://www.googleapis.com/books/v1/volumes?"
        params = {
            'q': '{}{}'.format(filter_choice, filter_search),
            'key': settings.BOOK_DATA_API_KEY,
            'maxResults': 2,
            'printType': 'books'
        }
        r = requests.get(search_url, params=params)
        results = r.json()['items']
        for result in results:
            book_data = {
                'title': result['volumeInfo']['title'],
                'authors': result['volumeInfo']['authors'][0],
                'publish_date': result['volumeInfo']['publishedDate'],
                'isbn_one': result['volumeInfo']['industryIdentifiers'][0]['identifier'],
                'isbn_two': result['volumeInfo']['industryIdentifiers'][1]['identifier'],
                'page_count': result['volumeInfo']['pageCount'],
                'thumbnail': result['volumeInfo']['imageLinks']['thumbnail'],
                'country': result['saleInfo']['country']
            }
            book_import.append(book_data)
else:
    filter_ch = BookFilterForm()
return render(request, "BookApp/book_import.html", {'book_import': book_import,
                                                    'filter_ch': filter_ch})
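One way to avoid the nested if/try blocks is to read every field with dict.get(), which returns None instead of raising KeyError, and to look the ISBNs up by their type rather than by position. This is a sketch: extract_book is a hypothetical helper, and mapping isbn_one/isbn_two to ISBN_10/ISBN_13 is an assumption based on the sample data above (the API may return other identifier types as well).

```python
def extract_book(result):
    """Build the book_data dict, using None for any missing field."""
    info = result.get("volumeInfo") or {}
    # e.g. {'ISBN_10': '8320717507', 'ISBN_13': '9788320717501'}; missing types are absent
    isbns = {
        ident.get("type"): ident.get("identifier")
        for ident in info.get("industryIdentifiers") or []
    }
    authors = info.get("authors") or [None]  # first author, or None if there are none
    return {
        "title": info.get("title"),
        "authors": authors[0],
        "publish_date": info.get("publishedDate"),
        "isbn_one": isbns.get("ISBN_10"),
        "isbn_two": isbns.get("ISBN_13"),
        "page_count": info.get("pageCount"),
        "thumbnail": (info.get("imageLinks") or {}).get("thumbnail"),
        "country": (result.get("saleInfo") or {}).get("country"),
    }


# Hypothetical result with most fields missing: every absent field becomes None
print(extract_book({"volumeInfo": {"title": "Example"}}))
```

Inside the loop, book_import.append(extract_book(result)) would then replace the book_data dictionary; each field is checked independently, so one missing key no longer affects the others.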

Dictionary key name from variable

I am trying to create a nested dictionary, where the key of each nested dictionary comes from the value of a variable. My end result should look something like this:
data_dict = {
    'jane': {'name': 'jane', 'email': 'jane@example.com'},
    'jim': {'name': 'jim', 'email': 'jim@example.com'}
}
Here is what I am trying:
data_dict = {}
s = "jane"
data_dict[s]['name'] = 'jane'
To my surprise, this does not work. Is this possible?

You want something like:
data_dict = {}
s = "jane"
data_dict[s] = {}
data_dict[s]['name'] = s
That should work, though instead of a nested dictionary I would recommend a dictionary that maps names to either namedtuples or instances of a class.
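A minimal sketch of the namedtuple variant suggested above; Person is a hypothetical type whose fields mirror the inner dictionaries from the question.

```python
from collections import namedtuple

# A lightweight, immutable record type with named fields
Person = namedtuple("Person", ["name", "email"])

people = {}
for name in ["jane", "jim"]:
    # The key still comes from the variable; the value is a record, not a dict
    people[name] = Person(name=name, email=f"{name}@example.com")

print(people["jane"].email)  # jane@example.com
```

Fields are then read as attributes (people["jim"].name), and a typo in a field name fails loudly instead of silently creating a new key.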

Try this:
data_dict = {}
s = ["jane", "jim"]
for name in s:
    data_dict[name] = {}
    data_dict[name]['name'] = name
    data_dict[name]['email'] = name + '@example.com'

As @Milad mentioned in the comments, you first need to initialize the inner dictionary (data_dict[s] in the question) as an empty dictionary:
data={}
data['Tom']={}
data['Tom']['name'] = 'Tom Marvolo Riddle'
data['Tom']['email'] = 'iamlordvoldermort.com'

For an existing dictionary you can do d[key] = value, but if the inner dictionary does not exist yet, data_dict[s]['name'] raises a KeyError. I think this is the code you want:
data_dict = {}
s = "jane"
data_dict[s] = {"name": s, "email": f"{s}@example.com"}
print(data_dict)
Update: when I got a notification about this question, I realized that a defaultdict would be a better answer:
from collections import defaultdict
data_dict = defaultdict(dict)
data_dict["jane"]["name"] = "jane"

create a dictionary from file python

I am new to python and am trying to read a file and create a dictionary from it.
The format is as follows:
.1.3.6.1.4.1.14823.1.1.27 {
TYPE = Switch
VENDOR = Aruba
MODEL = ArubaS3500-48T
CERTIFICATION = CERTIFIED
CONT = Aruba-Switch
HEALTH = ARUBA-Controller
VLAN = Dot1q INSTRUMENTATION:
Card-Fault = ArubaController:DeviceID
CPU/Memory = ArubaController:DeviceID
Environment = ArubaSysExt:DeviceID
Interface-Fault = MIB2
Interface-Performance = MIB2
Port-Fault = MIB2
Port-Performance = MIB2
}
I want the OID on the first line (.1.3.6.1.4.1.14823.1.1.27) to be the key, and the remaining lines, up to the closing }, to be the values.
I have tried a few combinations but cannot get the right regex to match these.
Any help please?
I have tried something like
lines = cache.readlines()
for line in lines:
    searchObj = re.search(r'(^.\d.*{)(.*)$', line)
    if searchObj:
        (oid, cert) = searchObj.groups()
        results[searchObj(oid)] = ", ".join(line[1:])
        print("searchObj.group() : ", searchObj.group(1))
        print("searchObj.group(1) : ", searchObj.group(2))

You can try this:
import re
with open('filename.txt') as f:
    data = f.read()
the_key = re.findall(r"^\n*[.\d]+", data)
values = [re.split(r"\s+=\s+", i) for i in re.findall(r"[a-zA-Z0-9]+\s*=\s*[a-zA-Z0-9]+", data)]
final_data = {the_key[0]: dict(values)}
Output:
{'\n.1.3.6.1.4.1.14823.1.1.27': {'VENDOR': 'Aruba', 'CERTIFICATION': 'CERTIFIED', 'Fault': 'MIB2', 'VLAN': 'Dot1q', 'Environment': 'ArubaSysExt', 'HEALTH': 'ARUBA', 'Memory': 'ArubaController', 'Performance': 'MIB2', 'CONT': 'Aruba', 'MODEL': 'ArubaS3500', 'TYPE': 'Switch'}}

You could use a nested dict comprehension along with an outer and inner regex.
Your blocks look like this:
.numbers...numbers.. {
// values here
}
In terms of regular expression this can be formulated as
^\s* # start of line + whitespaces, eventually
(?P<key>\.[\d.]+)\s* # the key
{(?P<values>[^{}]+)} # everything between { and }
As you see, we split the parts into key/value pairs.
Your "inner" structure can be formulated like
(?P<key>\b[A-Z][-/\w]+\b) # the "inner" key
\s*=\s* # whitespaces, =, whitespaces
(?P<value>.+) # the value
Now let's put the "outer" and "inner" expressions together:
import re
with open('filename.txt') as f:  # hypothetical filename for the question's file
    text = f.read()
rx_outer = re.compile(r'^\s*(?P<key>\.[\d.]+)\s*{(?P<values>[^{}]+)}', re.MULTILINE)
rx_inner = re.compile(r'(?P<key>\b[A-Z][-/\w]+\b)\s*=\s*(?P<value>.+)')
result = {item.group('key'):
              {match.group('key'): match.group('value')
               for match in rx_inner.finditer(item.group('values'))}
          for item in rx_outer.finditer(text)}
print(result)
A demo can be found on ideone.com.
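Since the question already reads the file line by line, a regex is not strictly necessary. A minimal sketch under the assumption that each block opens with a line ending in "{", closes with a line containing only "}", and otherwise holds "KEY = VALUE" lines; parse_oid_blocks is a hypothetical name, and the sample is an excerpt of the question's data.

```python
def parse_oid_blocks(lines):
    """Parse 'OID {' ... '}' blocks into {oid: {key: value}}."""
    result = {}
    current = None  # dict for the block being read, or None outside a block
    for raw in lines:
        line = raw.strip()
        if line.endswith("{"):
            # Opening line: everything before the brace is the OID
            current = result.setdefault(line[:-1].strip(), {})
        elif line == "}":
            current = None
        elif current is not None and "=" in line:
            # Split on the first '=' only, so values may contain ':' or spaces
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return result


sample = """.1.3.6.1.4.1.14823.1.1.27 {
TYPE = Switch
VENDOR = Aruba
VLAN = Dot1q INSTRUMENTATION:
Card-Fault = ArubaController:DeviceID
CPU/Memory = ArubaController:DeviceID
}"""
print(parse_oid_blocks(sample.splitlines()))
```

Splitting on the first "=" keeps keys such as "CPU/Memory" and "Card-Fault" intact, whereas the first answer's output above shows them truncated to "Memory" and "Fault". Passing an open file object works as well, because iterating over a file yields its lines.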
