Related
I have a list of jsons as mentioned below
[
{
"files": 0,
"data": [
{"name": "RFC", "value": "XXXXXXX", "attId": 01},
{"name": "NOMBRE", "value": "JOSE", "attId": 02},
{"name": "APELLIDO PATERNO", "value": "MONTIEL", "attId": 03},
{"name": "APELLIDO MATERNO", "value": "MENDOZA", "attId": 04},
{"name": "FECHA NACIMIENTO", "value": "1989-02-04", "attId": 05}
],
"dirId": 1,
"docId": 4,
"structure": {
"name": "personales",
"folioId": 22
}
},
{
"files": 0,
"data": [
{"name": "CALLE", "value": "AMOR", "attId": 06},
{"name": "No. EXTERIOR", "value": "4", "attId": 07},
{"name": "No. INTERIOR", "value": "2", "attId": 08},
{"name": "C.P.", "value": "55060", "attId": 09},
{"name": "ENTIDAD", "value": "ESTADO DE MEXICO", "attId": 10},
{"name": "MUNICIPIO", "value": "ECATEPEC", "attId": 11},
{"name": "COLONIA", "value": "INDUSTRIAL", "attId": 12}
],
"dirId": 1,
"docId": 4,
"structure": {
"name": "direccion",
"folioId": 22
}
}
]
I need to convert this list of jsons into separate individual jsons and execute them separately.
how to achieve this using pyspark or python?
Something like this, just one loop should work;
PN: There are few leading zeros in your input raw data...
# Assuming given list in your question is stored in lst
for item in lst:
print(item,"\n")
# item here will behave as individual dict/json
# Do your coding here as required
# Output for items;
{'files': 0, 'data': [{'name': 'RFC', 'value': 'XXXXXXX', 'attId': '01'}, {'name': 'NOMBRE', 'value': 'JOSE', 'attId': '02'}, {'name': 'APELLIDO PATERNO', 'value': 'MONTIEL', 'attId': '03'}, {'name': 'APELLIDO MATERNO', 'value': 'MENDOZA', 'attId': '04'}, {'name': 'FECHA NACIMIENTO', 'value': '1989-02-04', 'attId': '05'}], 'dirId': 1, 'docId': 4, 'structure': {'name': 'personales', 'folioId': 22}}
{'files': 0, 'data': [{'name': 'CALLE', 'value': 'AMOR', 'attId': '06'}, {'name': 'No. EXTERIOR', 'value': '4', 'attId': '07'}, {'name': 'No. INTERIOR', 'value': '2', 'attId': '08'}, {'name': 'C.P.', 'value': '55060', 'attId': '09'}, {'name': 'ENTIDAD', 'value': 'ESTADO DE MEXICO', 'attId': '10'}, {'name': 'MUNICIPIO', 'value': 'ECATEPEC', 'attId': '11'}, {'name': 'COLONIA', 'value': 'INDUSTRIAL', 'attId': '12'}], 'dirId': 1, 'docId': 4, 'structure': {'name': 'direccion', 'folioId': 22}}
I have a dictionary as below.
Key id is present multiple times inside dictionary.I need to fill id value at all places in dicts in single line of code.
Currently I am writing multiple line of code to fill empty values.
dicts = {
"abc": {
"a":{"id": "", "id1":""},
"b":{"id": "","hey":"1223"},
"c":{"id": "","hello":"4564"}
},
"xyz": {
"d":{"id": "","id1":"", "ijk":"water"}
},
"f":{"id": ""},
"g":{"id1": ""}
}
id = 123
dicts['abc']['a']['id'] = id
dicts['abc']['b']['id'] = id
dicts['abc']['c']['id'] = id
dicts['xyz']['d']['id'] = id
dicts['f']['id'] = id
dicts
Output:
{'abc': {'a': {'id': 123,"id1":""},
'b': {'id': 123, 'hey': '1223'},
'c': {'id': 123, 'hello': '4564'}},
'xyz': {'d': {'id': 123,id1:"", 'ijk': 'water'}},
'f': {'id': 123}, "g":{"id1": ""}}
You can solve it in place via simple recursive function, for example:
id = 123
dicts = {
"abc": {
"a": {"id": "", "id1": ""},
"b": {"id": "", "hey": "1223"},
"c": {"id": "", "hello": "4564"}
},
"xyz": {
"d": {"id": "", "id1": "", "ijk": "water"}
},
"f": {"id": ""},
"g": {"id1": ""}
}
def process(dicts):
for k, v in dicts.items():
if k == 'id' and not dicts[k]:
dicts[k] = id
if isinstance(v, dict):
process(v)
process(dicts)
print(dicts)
Output:
{
'abc': {'a': {'id': 123, 'id1': ''},
'b': {'id': 123, 'hey': '1223'},
'c': {'id': 123, 'hello': '4564'}},
'xyz': {'d': {'id': 123, 'id1': '', 'ijk': 'water'}},
'f': {'id': 123}, 'g': {'id1': ''}
}
I am trying to merge the parent elements with each value item.
The JSON code has the following format:
[
{"id": "1",
"name": "a",
"values": [
{"ts": 111,
"speed": 12
},
{"ts": 112,
"speed": 8
},
]},
{"id": "2",
"name": "b",
"values": [
{"ts": 113,
"speed": 10
},
{"ts": 114,
"speed": 7
},
]}
In the end, the results should look as follows:
[{"id": "1", "name": "a", "ts": 111, "speed": 12},
{"id": "1", "name": "a", "ts": 112, "speed": 8},
{"id": "2", "name": "b", "ts": 113, "speed": 10},
{"id": "2", "name": "b", "ts": 114, "speed": 7}]
My idea was to use two loops. One that loops through all entries and one that loops through "values".
for entry in data:
for value in entry["values"]:
# a = entry without "values"
# a.update(value)
# print(a)
However, here I have the following problem. How can I get all the values of my entries except "values". I tried to delete "values" from a, however, this resulted in KeyError: 'values'
Furthermore, I am not sure if this is actually a good solution to my problem.
I am using python version 3.6.3.
Thanks a lot in advance for any suggestions!
You can build a new list with a nested comprehension to pull out any values you need:
newList = [{'id': d['id'],'name': d['name'], **v} for d in l for v in d['values']]
newList will be:
[{'id': '1', 'name': 'a', 'ts': 111, 'speed': 12},
{'id': '1', 'name': 'a', 'ts': 112, 'speed': 8},
{'id': '2', 'name': 'b', 'ts': 113, 'speed': 10},
{'id': '2', 'name': 'b', 'ts': 114, 'speed': 7}]
This question already has answers here:
Python: create a nested dictionary from a list of parent child values
(3 answers)
Closed 3 years ago.
I have a list of dictionaries that I got from the database in parent-child relationship:
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
I want to convert it to a nested dictionary such as
[ { "id": 1, "parent_id": 0, "name": "Wood", "price": 0, "children": [ { "id": 2, "parent_id": 1, "name": "Mango", "price": 18, "children": [ { "id": 3, "parent_id": 2, "name": "Table", "price": 342 }, { "id": 4, "parent_id": 2, "name": "Box", "price": 340, "children": [ { "id": 5, "parent_id": 4, "name": "Pencil", "price": 240 } ] } ] } ] }, { "id": 6, "parent_id": 0, "name": "Electronic", "price": 20, "children": [ { "id": 7, "parent_id": 6, "name": "TV", "price": 350 }, { "id": 8, "parent_id": 6, "name": "Mobile", "price": 300, "children": [ { "id": 9, "parent_id": 8, "name": "Iphone", "price": 0, "children": [ { "id": 10, "parent_id": 9, "name": "Iphone 10", "price": 400 } ] } ] } ] } ]
You can do this recursively, starting from the root nodes (where parent_id = 0) going downwards. But before your recursive calls, you can group nodes by their parent_id so that accessing them in each recursive call can be done in constant time:
levels = {}
for n in data:
levels.setdefault(n['parent_id'], []).append(n)
def build_tree(parent_id=0):
nodes = [dict(n) for n in levels.get(parent_id, [])]
for n in nodes:
children = build_tree(n['id'])
if children: n['children'] = children
return nodes
tree = build_tree()
print(tree)
Output
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0,'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400}]}]}]}]
Code is documented inline. Ignoring the corner cases like circular relations etc.
# Actual Data
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
# Create Parent -> child links using dictonary
data_dict = { r['id'] : r for r in data}
for r in data:
if r['parent_id'] in data_dict:
parent = data_dict[r['parent_id']]
if 'children' not in parent:
parent['children'] = []
parent['children'].append(r)
# Helper function to get all the id's associated with a parent
def get_all_ids(r):
l = list()
l.append(r['id'])
if 'children' in r:
for c in r['children']:
l.extend(get_all_ids(c))
return l
# Trimp the results to have a id only once
ids = set(data_dict.keys())
result = []
for r in data_dict.values():
the_ids = set(get_all_ids(r))
if ids.intersection(the_ids):
ids = ids.difference(the_ids)
result.append(r)
print (result)
Output:
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400}]}]}]}]
I worked out a VERY SHORT solution, I believe it isn't the most efficient algorithm, but it does the job, will need a hell of optimization to work on very large data sets.
for i in range(len(data)-1, -1, -1):
data[i]["children"] = [child for child in data if child["parent_id"] == data[i]["id"]]
for child in data[i]["children"]:
data.remove(child)
Here is the complete explanation:
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
# Looping backwards,placing the lowest child
# into the next parent in the heirarchy
for i in range(len(data)-1, -1, -1):
# Create a dict key for the current parent in the loop called "children"
# and assign to it a list comprehension that loops over all items in the data
# to get the elements which have a parent_id equivalent to our current element's id
data[i]["children"] = [child for child in data if child["parent_id"] == data[i]["id"]]
# since the child is placed inside our its parent already, we will
# remove it from its actual position in the data
for child in data[i]["children"]:
data.remove(child)
# print the new data structure
print(data)
And here is the output:
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342, 'children': []}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240, 'children': []}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350, 'children': []}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400, 'children': []}]}]}]}]
To be honest, it's too easy for me to make in JS or Perl, but i've completely stuck with that in Python because of coplexed tools for dealing with dicts/lists. So, what i need:
i have an array of dicts:
[
{"id": 1, "name": "Res1", "type": "resource", "k_name": "Ind1_1", "k_id": 4},
{"id": 1, "name": "Res1", "type": "resource", "k_name": "Ind1_2", "k_id": 5},
{"id": 1, "name": "Res1", "type": "resource", "k_name": "Ind1_3", "k_id": 6},
{"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_1", "k_id": 7},
{"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_2", "k_id": 8},
{"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_3", "k_id": 9},
{"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_4", "k_id": 10},
{"id": 3, "name": "Res3", "type": "service", "k_name": "Ind3_1", "k_id": 11},
{"id": 3, "name": "Res3", "type": "service", "k_name": "Ind3_2", "k_id": 12},
{"id": 3, "name": "Res3", "type": "service", "k_name": "Ind3_3", "k_id": 13},
{"id": 3, "name": "Res3", "type": "service", "k_name": "Ind3_4", "k_id": 14}
]
and i need to make that:
[
{
"id": 1,
"name": "Res1",
"type": "resource",
"indicators": [
{"name": "Ind1_1","id": 4},
{"name": "Ind1_2","id": 5},
{"name": "Ind1_3","id": 6}
]
},
{
"id": 2,
"name": "Res2",
"type": "service",
"indicators": [
{"name": "Ind2_1","id": 7},
{"name": "Ind2_2","id": 8},
{"name": "Ind2_3","id": 9},
{"name": "Ind2_4","id": 10}
]
},
{
"id": 3,
"name": "Res3",
"type": "service",
"indicators": [
{"name": "Ind3_1","id": 11},
{"name": "Ind3_2","id": 12},
{"name": "Ind3_3","id": 13},
{"name": "Ind3_4","id": 14}
]
}
]
Can you help me with that?
itertools to the rescue:
import itertools
# Assuming your original list is `l`
# if it does not come in order, you need to do this line first, and will probably be less efficient.
l = sorted(l, key=lambda x:(x["id"], x["name"], x["type"]))
d = []
for k, g in itertools.groupby(l, lambda x: (x["id"], x["name"], x["type"])):
d.append({i:v for i, v in zip(["id", "name", "type"], k)})
d[-1]["indicator"] = [{y.split('_')[1]:e[y] for y in ["k_id", "k_name"]} for e in list(g)]
d becomes:
[{'id': 1,
'indicator': [{'id': 4, 'name': 'Ind1_1'},
{'id': 5, 'name': 'Ind1_2'},
{'id': 6, 'name': 'Ind1_3'}],
'name': 'Res1',
'type': 'resource'},
{'id': 2,
'indicator': [{'id': 7, 'name': 'Ind2_1'},
{'id': 8, 'name': 'Ind2_2'},
{'id': 9, 'name': 'Ind2_3'},
{'id': 10, 'name': 'Ind2_4'}],
'name': 'Res2',
'type': 'service'},
{'id': 3,
'indicator': [{'id': 11, 'name': 'Ind3_1'},
{'id': 12, 'name': 'Ind3_2'},
{'id': 13, 'name': 'Ind3_3'},
{'id': 14, 'name': 'Ind3_4'}],
'name': 'Res3',
'type': 'service'}]
You can use a mapping dict to map ids to corresponding sub-lists, so that as you iterate through the list (named l in this example), you can append a new entry to the output list if the id is not found in the mapping, or append the entry to the existing sub-list if id is found in the mapping:
mapping = {}
output = []
for d in l:
i = {'name': d.pop('k_name'), 'id': d.pop('k_id')}
if d['id'] in mapping:
mapping[d['id']].append(i)
else:
output.append({**d, 'indicators': [i]})
mapping[d['id']] = output[-1]['indicators']
output becomes:
[{'id': 1, 'name': 'Res1', 'type': 'resource', 'indicators': [{'name': 'Ind1_1', 'id': 4}, {'name': 'Ind1_2', 'id': 5}, {'name': 'Ind1_3', 'id': 6}]}, {'id': 2, 'name': 'Res2', 'type': 'service', 'indicators': [{'name': 'Ind2_1', 'id': 7}, {'name': 'Ind2_2', 'id': 8}, {'name': 'Ind2_3', 'id': 9}, {'name': 'Ind2_4', 'id': 10}]}, {'id': 3, 'name': 'Res3', 'type': 'service', 'indicators': [{'name': 'Ind3_1', 'id': 11}, {'name': 'Ind3_2', 'id': 12}, {'name': 'Ind3_3', 'id': 13}, {'name': 'Ind3_4', 'id': 14}]}]