Convert a flat dict with dot-separated keys to a nested complex dict - python

As a Java developer I need some tips on how to solve this problem in Python 2. My Python skills are still at a beginner level. But now the question:
We provide a service for devices which report technical statistics in a format we cannot change. The server runs Python.
The reports come in as dictionaries and we need to save them as JSON. Converting from dict to JSON is not the problem, but the flat keys with dot-separated parts need to be converted into a nested structure.
Perhaps an example shows best what I mean. This is the format from the devices; call it the source:
{
    'Device.DeviceInfo.SoftwareVersion': 'ote-2.2.1',
    'Device.GatewayInfo.ProductClass': 'OEM-TX23',
    'Device.GatewayInfo.SerialNumber': 'A223142D1CC7',
    'Device.Ethernet.Interface.1.MaxBitRate': 1000,
    'Device.HomePlug.Interface.1.AssociatedDevice.1.RxPhyRate': 522,
    'Device.HomePlug.Interface.1.AssociatedDevice.1.TxPhyRate': 706,
    'Device.HomePlug.Interface.1.AssociatedDevice.1.Active': 1,
    'Device.HomePlug.Interface.1.AssociatedDevice.1.MACAddress': 'af:49:79:e4:64:fc',
    'Device.HomePlug.Interface.1.AssociatedDevice.2.RxPhyRate': 544,
    'Device.HomePlug.Interface.1.AssociatedDevice.2.TxPhyRate': 0,
    'Device.HomePlug.Interface.1.AssociatedDevice.2.Active': 1,
    'Device.HomePlug.Interface.1.AssociatedDevice.2.MACAddress': 'af:49:79:e4:64:dd',
    'Device.Ethernet.Interface.2.MaxBitRate': 1000,
    'Device.HomePlug.Interface.2.AssociatedDevice.1.RxPhyRate': 671,
    'Device.HomePlug.Interface.2.AssociatedDevice.1.TxPhyRate': 607,
    'Device.HomePlug.Interface.2.AssociatedDevice.1.Active': 1,
    'Device.HomePlug.Interface.2.AssociatedDevice.1.MACAddress': 'bf:49:79:e4:64:fc',
    'Device.HomePlug.Interface.2.AssociatedDevice.2.RxPhyRate': 340,
    'Device.HomePlug.Interface.2.AssociatedDevice.2.TxPhyRate': 0,
    'Device.HomePlug.Interface.2.AssociatedDevice.2.Active': 1,
    'Device.HomePlug.Interface.2.AssociatedDevice.2.MACAddress': 'bf:49:79:e4:64:dd'
}
The integer parts within the source keys represent the index of the interfaces and of the AssociatedDevices of those interfaces. So everything after an integer should end up in a list of dictionaries, and the integer itself should not be included in the result.
We need the following nested structure before we can persist it to the database, specifically the MySQL document store. And again, the conversion from nested dict to JSON is not the problem.
Here is the format we need:
{
    'Device': {
        'GatewayInfo': {
            'SerialNumber': 'A223142D1CC7',
            'ProductClass': 'OEM-TX23'
        },
        'DeviceInfo': {
            'SoftwareVersion': 'ote-2.2.1'
        },
        'Ethernet': {
            'Interface': [{
                'MaxBitRate': 1000
            }, {
                'MaxBitRate': 1000
            }]
        },
        'HomePlug': {
            'Interface': [{
                'AssociatedDevice': [{
                    'RxPhyRate': 522,
                    'TxPhyRate': 706,
                    'Active': 1,
                    'MACAddress': 'af:49:79:e4:64:fc'
                }, {
                    'RxPhyRate': 544,
                    'TxPhyRate': 0,
                    'Active': 1,
                    'MACAddress': 'af:49:79:e4:64:dd'
                }]
            }, {
                'AssociatedDevice': [{
                    'RxPhyRate': 671,
                    'TxPhyRate': 607,
                    'Active': 1,
                    'MACAddress': 'bf:49:79:e4:64:fc'
                }, {
                    'RxPhyRate': 340,
                    'TxPhyRate': 0,
                    'Active': 1,
                    'MACAddress': 'bf:49:79:e4:64:dd'
                }]
            }]
        }
    }
}
UPDATE:
The first answer is partially correct, except that the parts after the integers should be converted to a list containing the rest as dictionaries.

You may iterate over your original dict and recursively create the nested keys, assigning the value at the final level:
new_dict = {}
for key, value in my_dict.items():
    k_list = key.split('.')
    temp_dict = new_dict
    # walk/create the intermediate levels
    for k in k_list[:-1]:
        if k not in temp_dict:
            temp_dict[k] = {}
        temp_dict = temp_dict[k]
    # assign the value at the leaf
    temp_dict[k_list[-1]] = value
where my_dict is your original dict object as mentioned in the question.
The final value held by new_dict will be:
{
    "Device": {
        "GatewayInfo": {
            "SerialNumber": "A223142D1CC7",
            "ProductClass": "OEM-TX23"
        },
        "DeviceInfo": {
            "SoftwareVersion": "ote-2.2.1"
        },
        "HomePlug": {
            "Interface": {
                "1": {
                    "AssociatedDevice": {
                        "1": {
                            "RxPhyRate": 522,
                            "Active": 1,
                            "TxPhyRate": 706,
                            "MACAddress": "af:49:79:e4:64:fc"
                        },
                        "2": {
                            "Active": 1,
                            "MACAddress": "af:49:79:e4:64:dd",
                            "RxPhyRate": 544,
                            "TxPhyRate": 0
                        }
                    }
                },
                "2": {
                    "AssociatedDevice": {
                        "1": {
                            "RxPhyRate": 671,
                            "Active": 1,
                            "TxPhyRate": 607,
                            "MACAddress": "bf:49:79:e4:64:fc"
                        },
                        "2": {
                            "RxPhyRate": 340,
                            "MACAddress": "bf:49:79:e4:64:dd",
                            "TxPhyRate": 0,
                            "Active": 1
                        }
                    }
                }
            }
        },
        "Ethernet": {
            "Interface": {
                "1": {
                    "MaxBitRate": 1000
                },
                "2": {
                    "MaxBitRate": 1000
                }
            }
        }
    }
}
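As the UPDATE above points out, the numeric-key levels in this intermediate result still have to be turned into lists. One possible post-processing pass is sketched below; the name listify is mine, not part of the answer, and it simply converts every dict whose keys are all digit strings into a numerically sorted list:

```python
def listify(node):
    """Recursively turn dicts whose keys are all digit strings into
    sorted lists, leaving everything else untouched."""
    if isinstance(node, dict):
        if node and all(k.isdigit() for k in node):
            # sort numerically so '10' comes after '2'
            return [listify(node[k]) for k in sorted(node, key=int)]
        return {k: listify(v) for k, v in node.items()}
    return node

intermediate = {'Device': {'Ethernet': {'Interface': {
    '1': {'MaxBitRate': 1000},
    '2': {'MaxBitRate': 1000}}}}}

print(listify(intermediate))
# {'Device': {'Ethernet': {'Interface': [{'MaxBitRate': 1000}, {'MaxBitRate': 1000}]}}}
```

Running listify over the output above would then give exactly the list-based structure the question asks for.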

This should work. Just pass the unconverted dict to convert and it will return the converted dict:
def convert(data):
    to_convert = set()
    new_dict = {}
    for key, value in data.items():
        path_stack = []
        k_list = key.split('.')
        temp_dict = new_dict
        for k in k_list[:-1]:
            path_stack.append(k)
            if k.isdigit():  # isdigit() works for str in Python 2; str has no isnumeric()
                to_convert.add(tuple(path_stack))
            if k not in temp_dict:
                temp_dict[k] = {}
            temp_dict = temp_dict[k]
        temp_dict[k_list[-1]] = value
    # convert the deepest paths first so the outer paths stay valid
    for path in sorted(to_convert, key=len, reverse=True):
        current_level = new_dict
        for k in path[:-2]:
            current_level = current_level[k]
        if isinstance(current_level[path[-2]], dict):
            # sort numerically, not lexicographically, so '10' comes after '2'
            new_level = [current_level[path[-2]][i]
                         for i in sorted(current_level[path[-2]].keys(), key=int)]
        else:
            new_level = current_level[path[-2]]
        current_level[path[-2]] = new_level
    return new_dict

If you want to dive deeper into Python, you may be interested in the module dotteddict.
It's a little tricky but very interesting "pythonic" code. At the moment it doesn't convert numeric keys into lists, but some of its concepts are definitely worth the time.

Related

Changing value of a value in a dictionary within a list within a dictionary

I have a json like:
pd = {
    "RP": [
        {
            "Name": "PD",
            "Value": "qwe"
        },
        {
            "Name": "qwe",
            "Value": "change"
        }
    ],
    "RFN": [
        "All"
    ],
    "RIT": [
        {
            "ID": "All",
            "IDT": "All"
        }
    ]
}
I am trying to change the value change to changed. This is a dictionary within a list which is within another dictionary. Is there a better/more efficient/pythonic way to do this than what I did below:
for key, value in pd.items():
    ls = pd[key]
    for d in ls:
        if type(d) == dict:
            for k, v in d.items():
                if v == 'change':
                    pd[key][ls.index(d)][k] = "changed"
This seems pretty inefficient due to the number of times I am parsing through the data.
String replacement could work if you don't want to write depth/breadth-first search.
>>> import json
>>> json.loads(json.dumps(pd).replace('"Value": "change"', '"Value": "changed"'))
{'RP': [{'Name': 'PD', 'Value': 'qwe'}, {'Name': 'qwe', 'Value': 'changed'}],
'RFN': ['All'],
'RIT': [{'ID': 'All', 'IDT': 'All'}]}

Having an issue parsing through this json in python

I have created a var that is equal to t.json. The JSON file is as follows:
{
    "groups": {
        "customerduy": {
            "nonprod": {
                "name": "customerduynonprod",
                "id": "529646781943",
                "owner": "cloudops#coerce.com",
                "manager_email": ""
            },
            "prod": {
                "name": "phishing_duyaccountprod",
                "id": "241683454720",
                "owner": "cloudops#coerce.com",
                "manager_email": ""
            }
        },
        "customerduyprod": {
            "nonprod": {
                "name": "phishing_duyaccountnonprod",
                "id": "638968214142",
                "owner": "cloudops#coerce.com",
                "manager_email": ""
            }
        },
        "ciasuppliergenius": {
            "prod": {
                "name": "ciasuppliergeniusprod",
                "id": "220753788760",
                "owner": "cia_developers#coerce.com",
                "manager_email": "jarks#coerce.com"
            }
        }
    }
}
my goal was to parse this JSON file and get the value for "owner" and output it to a new var. Example below:
t.json = group_map
group_id_aws = group(
    group.upper(),
    "accounts",
    template,
    owner = group_map['groups']['prod'],
    manager_description = "Groups for teams to access their product accounts.",
The error I keep getting is: KeyError: 'prod'
Owner occurs 4 times, so here is how to get all of them.
import json

# read the json
with open("C:\\test\\test.json") as f:
    data = json.load(f)

# get all 4 occurrences
owner_1 = data['groups']['customerduy']['nonprod']['owner']
owner_2 = data['groups']['customerduy']['prod']['owner']
owner_3 = data['groups']['customerduyprod']['nonprod']['owner']
owner_4 = data['groups']['ciasuppliergenius']['prod']['owner']

# print results
print(owner_1)
print(owner_2)
print(owner_3)
print(owner_4)
the result:
cloudops#coerce.com
cloudops#coerce.com
cloudops#coerce.com
cia_developers#coerce.com
You get a KeyError since the key 'prod' is not directly under 'groups'.
What you have is
group_map['groups']['customerduy']['prod']
group_map['groups']['ciasuppliergenius']['prod']
So you will have to extract the 'owner' from each element in the tree:
def s(d, t):
    for k, v in d.items():
        if t == k:
            yield v
        try:
            for i in s(v, t):
                yield i
        except:
            pass

print(','.join(s(j, 'owner')))
If your JSON is loaded in the variable data, you can use a recursive function
that deals with the two container types (dict and list) that can occur
in a JSON file:
def find_all_values_for_key(d, key, result):
    if isinstance(d, dict):
        if key in d:
            result.append(d[key])
            return
        for k, v in d.items():
            find_all_values_for_key(v, key, result)
    elif isinstance(d, list):
        for elem in d:
            find_all_values_for_key(elem, key, result)

owners = []
find_all_values_for_key(data, 'owner', owners)
print(f'{owners=}')
which gives:
owners=['cloudops#coerce.com', 'cloudops#coerce.com', 'cloudops#coerce.com', 'cia_developers#coerce.com']
This way you don't have to bother with the names of intermediate keys, or in general the structure of your JSON file.
You don't have any lists in your example, but it is trivial to recurse through
them to reach any dict with an owner key that might "lurk" somewhere nested
under a list element, so it is better to deal with potential future changes
to the JSON.

unable to update JSON using python

I am trying to update transaction ID from the following json:
{
    "locationId": "5115",
    "transactions": [
        {
            "transactionId": "1603804404-5650",
            "source": "WEB"
        }
    ]
}
I have written the following code for this, but it does not update the transaction id; instead it inserts a transaction id at the end of the block:
try:
    session = requests.Session()
    with open("sales.json", "r") as read_file:
        payload = json.load(read_file)
    payload["transactionId"] = random.randint(0, 5)
    with open("sales.json", "w") as read_file:
        json.dump(payload, read_file)
Output:-
{
"locationId": "5115",
"transactions": [
{
"transactionId": "1603804404-5650",
"source": "WEB"
} ]
}
'transactionId': 1
}
Expected Output:-
{
    "locationId": "5115",
    "transactions": [
        {
            "transactionId": "1",
            "source": "WEB"
        }
    ]
}
This would do it, but only in your specific case:
payload["transactions"][0]["transactionId"] = xxx
There should be error handling for cases where the "transactions" key is not in the dict, or there are no records, or there is more than one.
Also, you will need to assign str(your_random_number), not the int, if you want the record to be of type string as the desired output suggests.
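Those caveats could be sketched like this; the snippet is illustrative, not the original answer's code, and the payload literal simply mirrors the question's JSON:

```python
import random

payload = {"locationId": "5115",
           "transactions": [{"transactionId": "1603804404-5650",
                             "source": "WEB"}]}

# .get with a default avoids a KeyError if "transactions" is absent,
# and the loop handles zero, one, or many records
for txn in payload.get("transactions", []):
    if "transactionId" in txn:
        # str(...) keeps the value a string, matching the expected output
        txn["transactionId"] = str(random.randint(0, 5))

print(payload["transactions"][0]["transactionId"])
```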
If you just want to find the transactionId key and you don't know exactly where it may exist, you can do-
from collections.abc import Mapping

def update_key(key, new_value, jsondict):
    new_dict = {}
    for k, v in jsondict.items():
        if isinstance(v, Mapping):
            # Recursively traverse if value is a dict
            new_dict[k] = update_key(key, new_value, v)
        elif isinstance(v, list):
            # Traverse through all values of list
            # Recursively traverse if an element is a dict
            new_dict[k] = [update_key(key, new_value, innerv) if isinstance(innerv, Mapping) else innerv
                           for innerv in v]
        elif k == key:
            # This is the key to replace with new value
            new_dict[k] = new_value
        else:
            # Just a regular value, assign to new dict
            new_dict[k] = v
    return new_dict
Given a dict-
{
    "locationId": "5115",
    "transactions": [
        {
            "transactionId": "1603804404-5650",
            "source": "WEB"
        }
    ]
}
You can do-
>>> update_key('transactionId', 5, d)
{'locationId': '5115', 'transactions': [{'transactionId': 5, 'source': 'WEB'}]}
Yes, because transactionId is inside the transactions node. So your code should be:
payload["transactions"][0]["transactionId"] = random.randint(0, 5)
(Attribute access like payload["transactions"][0].transactionId does not work on dicts.)

Using .values() with list of dictionaries?

I'm comparing json files between two different API endpoints to see which json records need an update, which need a create and what needs a delete. So, by comparing the two json files, I want to end up with three json files, one for each operation.
The json at both endpoints is structured like this (but they use different keys for same sets of values; different problem):
{
    "records": [{
        "id": "id-value-here",
        "c": {
            "d": "eee"
        },
        "f": {
            "l": "last",
            "f": "first"
        },
        "g": ["100", "89", "9831", "09112", "800"]
    }, {
        …
    }]
}
So the json is represented as a list of dictionaries (with further nested lists and dictionaries).
If a given json endpoint (j1) id value ("id":) exists in the other endpoint json (j2), then that record should be added to j_update.
So far I have something like this, but I can see that .values() doesn't work because it's trying to operate on the list instead of on all the listed dictionaries(?):
j_update = {r for r in j1['records'] if r['id'] in j2.values()}
This doesn't return an error, but it creates an empty set using test json files.
It seems like this should be simple, but I'm tripping over the nesting of dictionaries in a list representing the json. Do I need to flatten j2, or is there a simpler dictionary method Python has to achieve this?
====edit j1 and j2====
have same structure, use different keys; toy data
j1
{
    "records": [{
        "field_5": 2329309841,
        "field_12": {
            "email": "cmix#etest.com"
        },
        "field_20": {
            "last": "Mixalona",
            "first": "Clara"
        },
        "field_28": ["9002329309999", "9002329309112"],
        "field_44": ["1002329309832"]
    }, {
        "field_5": 2329309831,
        "field_12": {
            "email": "mherbitz345#test.com"
        },
        "field_20": {
            "last": "Herbitz",
            "first": "Michael"
        },
        "field_28": ["9002329309831", "9002329309112", "8002329309999"],
        "field_44": ["1002329309832"]
    }, {
        "field_5": 2329309855,
        "field_12": {
            "email": "nkatamaran#test.com"
        },
        "field_20": {
            "first": "Noriss",
            "last": "Katamaran"
        },
        "field_28": ["9002329309111", "8002329309112"],
        "field_44": ["1002329309877"]
    }]
}
j2
{
    "records": [{
        "id": 2329309831,
        "email": {
            "email": "mherbitz345#test.com"
        },
        "name_primary": {
            "last": "Herbitz",
            "first": "Michael"
        },
        "assign": ["8003329309831", "8007329309789"],
        "hr_id": ["1002329309877"]
    }, {
        "id": 2329309884,
        "email": {
            "email": "yinleeshu#test.com"
        },
        "name_primary": {
            "last": "Lee Shu",
            "first": "Yin"
        },
        "assign": ["8002329309111", "9003329309831", "9002329309111", "8002329309999", "8002329309112"],
        "hr_id": ["1002329309832"]
    }, {
        "id": 23293098338,
        "email": {
            "email": "amlouis#test.com"
        },
        "name_primary": {
            "last": "Maxwell Louis",
            "first": "Albert"
        },
        "assign": ["8002329309111", "8007329309789", "9003329309831", "8002329309999", "8002329309112"],
        "hr_id": ["1002329309877"]
    }]
}
If you read the JSON it will give you a dict, and you are looking for a particular key somewhere in its values. For the top level you could do:
if 'records' in j2:
    r = j2['records'][0].get('id', [])  # defaults if id does not exist
It is prettier to do a recursive search, but I don't know how your data is organized well enough to quickly come up with a solution.
To give an idea of a recursive search, consider this example:
def recursiveSearch(dictionary, target):
    if target in dictionary:
        return dictionary[target]
    for value in dictionary.values():
        if isinstance(value, dict):
            found = recursiveSearch(value, target)
            if found is not None:
                return found

a = {'test': 'b', 'test1': dict(x=dict(z=3), y=2)}
print(recursiveSearch(a, 'z'))
You tried:
j_update = {r for r in j1['records'] if r['id'] in j2.values()}
Aside from the r['id'] vs r['field_5'] problem, you have:
>>> list(j2.values())
[[{'id': 2329309831, ...}, ...]]
The ids are buried inside a list and a dict, thus the test r['id'] in j2.values() always returns False.
The basic solution is fairly simple.
First, create a set of j2 ids:
>>> present_in_j2 = set(record["id"] for record in j2["records"])
Then, rebuild the json structure of j1 but without the j1 field_5 that are not present in j2:
>>> {"records":[record for record in j1["records"] if record["field_5"] in present_in_j2]}
{'records': [{'field_5': 2329309831, 'field_12': {'email': 'mherbitz345#test.com'}, 'field_20': {'last': 'Herbitz', 'first': 'Michael'}, 'field_28': ['9002329309831', '9002329309112', '8002329309999'], 'field_44': ['1002329309832']}]}
It works, but it's not totally satisfying because of the weird keys of j1. Let's try to convert j1 to a more friendly format:
def map_keys(json_value, conversion_table):
    """Map the keys of a json value

    This is a recursive DFS"""
    def map_keys_aux(json_value):
        """Capture the conversion table"""
        if isinstance(json_value, list):
            return [map_keys_aux(v) for v in json_value]
        elif isinstance(json_value, dict):
            return {conversion_table.get(k, k): map_keys_aux(v)
                    for k, v in json_value.items()}
        else:
            return json_value
    return map_keys_aux(json_value)
The function focuses on dictionary keys: conversion_table.get(k, k) is conversion_table[k] if the key is present in the conversion table, or the key itself otherwise.
>>> j1toj2 = {"field_5":"id", "field_12":"email", "field_20":"name_primary", "field_28":"assign", "field_44":"hr_id"}
>>> mapped_j1 = map_keys(j1, j1toj2)
Now, the code is cleaner and the output may be more useful for a PUT:
>>> d1 = {record["id"]:record for record in mapped_j1["records"]}
>>> present_in_j2 = set(record["id"] for record in j2["records"])
>>> {"records":[record for record in mapped_j1["records"] if record["id"] in present_in_j2]}
{'records': [{'id': 2329309831, 'email': {'email': 'mherbitz345#test.com'}, 'name_primary': {'last': 'Herbitz', 'first': 'Michael'}, 'assign': ['9002329309831', '9002329309112', '8002329309999'], 'hr_id': ['1002329309832']}]}

How to Pythonically map content from one dict to another in a fail safe manner?

I've got one dict from an api:
initial_dict = {
    "content": {
        "text":
    },
    "meta": {
        "title": "something",
        "created": "2016-03-04 15:30",
        "author": "Pete",
        "extra": {
            "a": 123,
            "b": 456
        }
    }
}
and I need to map this to another dict:
new_dict = {
    "content_text": initial_dict['content']['text'],
    "meta_title": initial_dict['meta']['title'],
    "meta_extras": {
        "time_related": {
            initial_dict['meta']['created']
        },
        "by": initial_dict['meta']['author']
    }
}
The problem is that not all fields are always in the initial_dict. I can of course wrap the whole creation of new_dict into a try/except, but then it would fail if one of the initial fields doesn't exist.
Is there no other way than creating a try/except for each and every field I add to the new_dict? In reality the dict is way bigger than this (about 400 key/value pairs), so this will become a mess quite fast.
Isn't there a better and more pythonic way of doing this?
How about using dict.get? Instead of throwing an error, this returns None if the key isn't in the dictionary.
new_dict = {
    "content_text": initial_dict['content'].get('text'),
    "meta_title": initial_dict['meta'].get('title'),
    "meta_extras": {
        "time_related": {
            initial_dict['meta'].get('created')
        },
        "by": initial_dict['meta'].get('author')
    }
}
If this goes deeper than one level, you can do some_dict.get('key1', {}).get('key2') as was suggested in the comments.
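When the nesting gets deep, chaining those get calls becomes noisy; a tiny helper can encapsulate the pattern. The name deep_get below is hypothetical, not a stdlib function, and this is only one possible sketch:

```python
def deep_get(d, *keys, default=None):
    """Follow keys through nested dicts, returning default as soon as
    one of them is missing."""
    for k in keys:
        if not isinstance(d, dict) or k not in d:
            return default
        d = d[k]
    return d

cfg = {"meta": {"extra": {"a": 123}}}
print(deep_get(cfg, "meta", "extra", "a"))     # 123
print(deep_get(cfg, "meta", "missing", "a"))   # None
```

This keeps the call sites flat no matter how deep the source structure is.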
Converting the original dict to a defaultdict is also an option, which allows you to keep using the [] notation (more practical than having to chain get methods):
from collections import defaultdict

def to_defaultdict(d):
    return defaultdict(lambda: None, ((k, to_defaultdict(v) if isinstance(v, dict) else v)
                                      for k, v in d.items()))

initial_dict = to_defaultdict(initial_dict)
You can then filter out the None values:
def filter_dict(d):
    return dict((k, filter_dict(v) if isinstance(v, dict) else v)
                for k, v in d.items() if v is not None)

new_dict = filter_dict(new_dict)
