How to remove duplicates by nested attributes in Python?

I've got a list of records whose details contain some duplicates. In the list of dicts below, the first 3 records (with ids 1, 2 and 3) have the same "count" for all details with a dir of "s", even though their detail ids differ. I would like to remove every record from the root list whose "s"-detail counts are the same as the "s"-detail counts of an earlier record. So from the list below, I would want the records with ids 2 and 3 removed.
I've been writing nested loops for a while, but I can't find a way to do this, and my code quickly turns into a mess.
What would be a logical and Pythonic way of doing this?
records = [
    {
        'id': 1,
        'details': [
            {"id": 10, "dir": "s", "count": "1"},
            {"id": 20, "dir": "u", "count": "6"},
            {"id": 30, "dir": "s", "count": "1"}
        ]
    },
    {
        'id': 2,
        'details': [
            {"id": 40, "dir": "s", "count": "1"},
            {"id": 50, "dir": "u", "count": "7"},
            {"id": 60, "dir": "s", "count": "1"}
        ]
    },
    {
        'id': 3,
        'details': [
            {"id": 70, "dir": "s", "count": "1"},
            {"id": 80, "dir": "u", "count": "8"},
            {"id": 90, "dir": "s", "count": "1"}
        ]
    },
    {
        'id': 4,
        'details': [
            {"id": 100, "dir": "s", "count": "999"},
            {"id": 110, "dir": "up", "count": "6"},
            {"id": 120, "dir": "s", "count": "999"}
        ]
    },
]

Use a set, with a key built from the two fields of the dict that you consider to define a 'duplicate'.
Simple example to uniquify:
seen = set()
for di in records:
    for sdi in di['details']:
        key = (sdi['dir'], sdi['count'])
        if key not in seen:
            seen.add(key)
            print(sdi)
        else:
            # deal with the duplicate?
            pass
Prints:
{'id': 10, 'dir': 's', 'count': '1'}
{'id': 20, 'dir': 'u', 'count': '6'}
{'id': 50, 'dir': 'u', 'count': '7'}
{'id': 80, 'dir': 'u', 'count': '8'}
{'id': 100, 'dir': 's', 'count': '999'}
{'id': 110, 'dir': 'up', 'count': '6'}
Here is a first pass at what I think you mean:
seen = set()
new_rec = []
for di in records:
    new_di = {}
    new_di['id'] = di['id']
    new_li = []
    for sdi in di['details']:
        key = (sdi['dir'], sdi['count'])
        if key not in seen:
            seen.add(key)
            new_li.append(sdi)
        else:
            # deal with the duplicate?
            pass
    new_di['details'] = new_li
    new_rec.append(new_di)
Which results in:
[ { 'id': 1,
'details': [ { 'id': 10,
'dir': 's',
'count': '1'},
{ 'id': 20,
'dir': 'u',
'count': '6'}]},
{ 'id': 2,
'details': [ { 'id': 50,
'dir': 'u',
'count': '7'}]},
{ 'id': 3,
'details': [ { 'id': 80,
'dir': 'u',
'count': '8'}]},
{ 'id': 4,
'details': [ { 'id': 100,
'dir': 's',
'count': '999'},
{ 'id': 110,
'dir': 'up',
'count': '6'}]}]
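The code above removes individual details rather than whole records. For the behaviour described in the question (drop a record when the counts of its "s" details match those of an earlier record), a minimal sketch is to build one key per record from those counts and keep only the first record per key. It assumes the comparison should ignore the order of the details, hence the sorted() call; drop it if order matters. The helper name dedupe_by_s_counts is hypothetical.

```python
def dedupe_by_s_counts(records, direction="s"):
    """Keep the first record for each distinct collection of counts on
    details whose 'dir' equals `direction`; drop later repeats.
    Assumption: detail order is irrelevant, so the counts are sorted."""
    seen = set()
    kept = []
    for rec in records:
        # Key: the sorted counts of this record's matching details
        key = tuple(sorted(d["count"] for d in rec["details"] if d["dir"] == direction))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


records = [
    {"id": 1, "details": [{"id": 10, "dir": "s", "count": "1"},
                          {"id": 20, "dir": "u", "count": "6"},
                          {"id": 30, "dir": "s", "count": "1"}]},
    {"id": 2, "details": [{"id": 40, "dir": "s", "count": "1"},
                          {"id": 50, "dir": "u", "count": "7"},
                          {"id": 60, "dir": "s", "count": "1"}]},
    {"id": 3, "details": [{"id": 70, "dir": "s", "count": "1"},
                          {"id": 80, "dir": "u", "count": "8"},
                          {"id": 90, "dir": "s", "count": "1"}]},
    {"id": 4, "details": [{"id": 100, "dir": "s", "count": "999"},
                          {"id": 110, "dir": "up", "count": "6"},
                          {"id": 120, "dir": "s", "count": "999"}]},
]

result = dedupe_by_s_counts(records)
print([r["id"] for r in result])  # [1, 4]
```

Records with no "s" details at all share the empty key, so only the first of them would be kept; special-case that if it matters for your data.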

Related

How to convert list of multiple jsons into individual jsons using pyspark

I have a list of JSON objects, shown below:
[
{
"files": 0,
"data": [
{"name": "RFC", "value": "XXXXXXX", "attId": 01},
{"name": "NOMBRE", "value": "JOSE", "attId": 02},
{"name": "APELLIDO PATERNO", "value": "MONTIEL", "attId": 03},
{"name": "APELLIDO MATERNO", "value": "MENDOZA", "attId": 04},
{"name": "FECHA NACIMIENTO", "value": "1989-02-04", "attId": 05}
],
"dirId": 1,
"docId": 4,
"structure": {
"name": "personales",
"folioId": 22
}
},
{
"files": 0,
"data": [
{"name": "CALLE", "value": "AMOR", "attId": 06},
{"name": "No. EXTERIOR", "value": "4", "attId": 07},
{"name": "No. INTERIOR", "value": "2", "attId": 08},
{"name": "C.P.", "value": "55060", "attId": 09},
{"name": "ENTIDAD", "value": "ESTADO DE MEXICO", "attId": 10},
{"name": "MUNICIPIO", "value": "ECATEPEC", "attId": 11},
{"name": "COLONIA", "value": "INDUSTRIAL", "attId": 12}
],
"dirId": 1,
"docId": 4,
"structure": {
"name": "direccion",
"folioId": 22
}
}
]
I need to split this list into separate, individual JSON objects and process each one separately.
How can I achieve this using PySpark or plain Python?
Something like this; a single loop should work.
Note: a few values in your raw input have leading zeros (e.g. "attId": 01), which is not valid JSON or Python 3 syntax; in the output below they are treated as strings.
# Assuming the list from your question is stored in lst
for item in lst:
    print(item, "\n")
    # item is an individual dict (one JSON object)
    # Do your processing here as required
# Output:
{'files': 0, 'data': [{'name': 'RFC', 'value': 'XXXXXXX', 'attId': '01'}, {'name': 'NOMBRE', 'value': 'JOSE', 'attId': '02'}, {'name': 'APELLIDO PATERNO', 'value': 'MONTIEL', 'attId': '03'}, {'name': 'APELLIDO MATERNO', 'value': 'MENDOZA', 'attId': '04'}, {'name': 'FECHA NACIMIENTO', 'value': '1989-02-04', 'attId': '05'}], 'dirId': 1, 'docId': 4, 'structure': {'name': 'personales', 'folioId': 22}}
{'files': 0, 'data': [{'name': 'CALLE', 'value': 'AMOR', 'attId': '06'}, {'name': 'No. EXTERIOR', 'value': '4', 'attId': '07'}, {'name': 'No. INTERIOR', 'value': '2', 'attId': '08'}, {'name': 'C.P.', 'value': '55060', 'attId': '09'}, {'name': 'ENTIDAD', 'value': 'ESTADO DE MEXICO', 'attId': '10'}, {'name': 'MUNICIPIO', 'value': 'ECATEPEC', 'attId': '11'}, {'name': 'COLONIA', 'value': 'INDUSTRIAL', 'attId': '12'}], 'dirId': 1, 'docId': 4, 'structure': {'name': 'direccion', 'folioId': 22}}
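If "individual jsons" means one standalone JSON string per element (for example, to send or store each one separately), a minimal sketch with the standard json module follows. The data is trimmed for brevity, and the zero-padded attId values are written as strings, since 01 is not a valid number in JSON or Python 3. Writing each document to its own file or handing it to Spark is left out, because that depends on your setup.

```python
import json

# Trimmed version of the question's list (attId values kept as strings)
lst = [
    {"files": 0,
     "data": [{"name": "RFC", "value": "XXXXXXX", "attId": "01"}],
     "dirId": 1, "docId": 4,
     "structure": {"name": "personales", "folioId": 22}},
    {"files": 0,
     "data": [{"name": "CALLE", "value": "AMOR", "attId": "06"}],
     "dirId": 1, "docId": 4,
     "structure": {"name": "direccion", "folioId": 22}},
]

# One standalone JSON document (a str) per list element
documents = [json.dumps(item) for item in lst]

for doc in documents:
    record = json.loads(doc)  # each string parses back on its own
    print(record["structure"]["name"])
```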

Fill Multiple empty values in python dictionary for particular key all over dictionary

I have a dictionary like the one below.
The key 'id' appears multiple times inside the nested dictionary. I need to fill in the 'id' value everywhere in dicts with a single line of code.
Currently I am writing multiple lines of code to fill in the empty values.
dicts = {
"abc": {
"a":{"id": "", "id1":""},
"b":{"id": "","hey":"1223"},
"c":{"id": "","hello":"4564"}
},
"xyz": {
"d":{"id": "","id1":"", "ijk":"water"}
},
"f":{"id": ""},
"g":{"id1": ""}
}
id = 123
dicts['abc']['a']['id'] = id
dicts['abc']['b']['id'] = id
dicts['abc']['c']['id'] = id
dicts['xyz']['d']['id'] = id
dicts['f']['id'] = id
dicts
Output:
{'abc': {'a': {'id': 123, 'id1': ''},
         'b': {'id': 123, 'hey': '1223'},
         'c': {'id': 123, 'hello': '4564'}},
 'xyz': {'d': {'id': 123, 'id1': '', 'ijk': 'water'}},
 'f': {'id': 123}, 'g': {'id1': ''}}
You can solve it in place with a simple recursive function, for example:
id = 123
dicts = {
    "abc": {
        "a": {"id": "", "id1": ""},
        "b": {"id": "", "hey": "1223"},
        "c": {"id": "", "hello": "4564"}
    },
    "xyz": {
        "d": {"id": "", "id1": "", "ijk": "water"}
    },
    "f": {"id": ""},
    "g": {"id1": ""}
}
def process(dicts):
    for k, v in dicts.items():
        # Fill an empty 'id' with the global id value
        if k == 'id' and not dicts[k]:
            dicts[k] = id
        # Recurse into nested dicts
        if isinstance(v, dict):
            process(v)
process(dicts)
print(dicts)
Output:
{
'abc': {'a': {'id': 123, 'id1': ''},
'b': {'id': 123, 'hey': '1223'},
'c': {'id': 123, 'hello': '4564'}},
'xyz': {'d': {'id': 123, 'id1': '', 'ijk': 'water'}},
'f': {'id': 123}, 'g': {'id1': ''}
}
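Since the question asks for a single line, one option is to put the recursion in a helper that returns a new dictionary, so the call site is a single assignment and the original stays untouched. A minimal sketch: the name fill_key is hypothetical, it assumes "empty" means exactly the empty string, and it only descends into nested dicts, not lists.

```python
def fill_key(obj, key, value):
    """Return a copy of obj in which every `key` whose value is ""
    (at any depth of nested dicts) is replaced by `value`."""
    if not isinstance(obj, dict):
        return obj  # leaves (strings, numbers, ...) are returned unchanged
    return {
        k: value if (k == key and v == "") else fill_key(v, key, value)
        for k, v in obj.items()
    }


dicts = {
    "abc": {
        "a": {"id": "", "id1": ""},
        "b": {"id": "", "hey": "1223"},
        "c": {"id": "", "hello": "4564"},
    },
    "xyz": {"d": {"id": "", "id1": "", "ijk": "water"}},
    "f": {"id": ""},
    "g": {"id1": ""},
}

filled = fill_key(dicts, "id", 123)  # the single line at the call site
print(filled)
```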

Merge value item and relevant infos from its "parent"

I am trying to merge the parent elements with each value item.
The JSON data has the following format:
[
  {"id": "1",
   "name": "a",
   "values": [
     {"ts": 111, "speed": 12},
     {"ts": 112, "speed": 8}
   ]},
  {"id": "2",
   "name": "b",
   "values": [
     {"ts": 113, "speed": 10},
     {"ts": 114, "speed": 7}
   ]}
]
In the end, the results should look as follows:
[{"id": "1", "name": "a", "ts": 111, "speed": 12},
{"id": "1", "name": "a", "ts": 112, "speed": 8},
{"id": "2", "name": "b", "ts": 113, "speed": 10},
{"id": "2", "name": "b", "ts": 114, "speed": 7}]
My idea was to use two nested loops: one over all entries and one over "values".
for entry in data:
    for value in entry["values"]:
        # a = entry without "values"
        # a.update(value)
        # print(a)
However, I ran into the following problem: how can I get all the fields of an entry except "values"? I tried deleting "values" from a, but that raised KeyError: 'values'.
Furthermore, I am not sure whether this is actually a good solution to my problem.
I am using Python 3.6.3.
Thanks a lot in advance for any suggestions!
You can build a new list with a nested comprehension to pull out any values you need (assuming your list is named l):
newList = [{'id': d['id'], 'name': d['name'], **v} for d in l for v in d['values']]
newList will be:
[{'id': '1', 'name': 'a', 'ts': 111, 'speed': 12},
{'id': '1', 'name': 'a', 'ts': 112, 'speed': 8},
{'id': '2', 'name': 'b', 'ts': 113, 'speed': 10},
{'id': '2', 'name': 'b', 'ts': 114, 'speed': 7}]
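The comprehension above names the parent keys explicitly. If you would rather follow your original two-loop idea and keep every parent field except "values", you can copy the entry without that key instead of deleting it. The KeyError you saw is likely because a = entry does not copy the dict: deleting "values" succeeds on the first inner iteration and fails on the second. A minimal sketch that works on Python 3.5+ (so on 3.6.3):

```python
data = [
    {"id": "1", "name": "a", "values": [{"ts": 111, "speed": 12},
                                       {"ts": 112, "speed": 8}]},
    {"id": "2", "name": "b", "values": [{"ts": 113, "speed": 10},
                                       {"ts": 114, "speed": 7}]},
]

result = []
for entry in data:
    # New dict holding every parent field except "values";
    # `entry` itself is not modified
    parent = {k: v for k, v in entry.items() if k != "values"}
    for value in entry["values"]:
        # Merge parent fields with the child fields (child wins on key clashes)
        result.append({**parent, **value})

print(result)
```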

Convert list of dictionaries into a nested dictionary [duplicate]

This question already has answers here:
Python: create a nested dictionary from a list of parent child values
(3 answers)
Closed 3 years ago.
I have a list of dictionaries that I got from the database, with a parent-child relationship:
data = [
{"id":1, "parent_id": 0, "name": "Wood", "price": 0},
{"id":2, "parent_id": 1, "name": "Mango", "price": 18},
{"id":3, "parent_id": 2, "name": "Table", "price": 342},
{"id":4, "parent_id": 2, "name": "Box", "price": 340},
{"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
{"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
{"id":7, "parent_id": 6, "name": "TV", "price": 350},
{"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
{"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
{"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
I want to convert it into a nested structure like this:
[ { "id": 1, "parent_id": 0, "name": "Wood", "price": 0, "children": [ { "id": 2, "parent_id": 1, "name": "Mango", "price": 18, "children": [ { "id": 3, "parent_id": 2, "name": "Table", "price": 342 }, { "id": 4, "parent_id": 2, "name": "Box", "price": 340, "children": [ { "id": 5, "parent_id": 4, "name": "Pencil", "price": 240 } ] } ] } ] }, { "id": 6, "parent_id": 0, "name": "Electronic", "price": 20, "children": [ { "id": 7, "parent_id": 6, "name": "TV", "price": 350 }, { "id": 8, "parent_id": 6, "name": "Mobile", "price": 300, "children": [ { "id": 9, "parent_id": 8, "name": "Iphone", "price": 0, "children": [ { "id": 10, "parent_id": 9, "name": "Iphone 10", "price": 400 } ] } ] } ] } ]
You can do this recursively, starting from the root nodes (where parent_id = 0) going downwards. But before your recursive calls, you can group nodes by their parent_id so that accessing them in each recursive call can be done in constant time:
# Group nodes by parent_id once, so each lookup below is O(1)
levels = {}
for n in data:
    levels.setdefault(n['parent_id'], []).append(n)
def build_tree(parent_id=0):
    # Copy the nodes so the original dicts in data are not modified
    nodes = [dict(n) for n in levels.get(parent_id, [])]
    for n in nodes:
        children = build_tree(n['id'])
        if children:
            n['children'] = children
    return nodes
tree = build_tree()
print(tree)
Output
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0,'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400}]}]}]}]
The code is documented inline. It ignores corner cases such as circular relations.
# Actual data
data = [
    {"id":1, "parent_id": 0, "name": "Wood", "price": 0},
    {"id":2, "parent_id": 1, "name": "Mango", "price": 18},
    {"id":3, "parent_id": 2, "name": "Table", "price": 342},
    {"id":4, "parent_id": 2, "name": "Box", "price": 340},
    {"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
    {"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
    {"id":7, "parent_id": 6, "name": "TV", "price": 350},
    {"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
    {"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
    {"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
# Create parent -> child links using a dictionary keyed by id
# (this adds 'children' to the original dicts in data)
data_dict = {r['id']: r for r in data}
for r in data:
    if r['parent_id'] in data_dict:
        parent = data_dict[r['parent_id']]
        if 'children' not in parent:
            parent['children'] = []
        parent['children'].append(r)
# Helper function to collect all ids in the subtree rooted at r
def get_all_ids(r):
    l = list()
    l.append(r['id'])
    if 'children' in r:
        for c in r['children']:
            l.extend(get_all_ids(c))
    return l
# Trim the results so that each id appears only once
ids = set(data_dict.keys())
result = []
for r in data_dict.values():
    the_ids = set(get_all_ids(r))
    if ids.intersection(the_ids):
        ids = ids.difference(the_ids)
        result.append(r)
print(result)
Output:
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400}]}]}]}]
I worked out a very short solution. It isn't the most efficient algorithm (every pass rescans the whole list, so it is quadratic), and it assumes each child appears after its parent in data, but it does the job. It would need significant optimization to work on very large data sets.
for i in range(len(data)-1, -1, -1):
    data[i]["children"] = [child for child in data if child["parent_id"] == data[i]["id"]]
    for child in data[i]["children"]:
        data.remove(child)
Here is the same code with explanatory comments:
data = [
    {"id":1, "parent_id": 0, "name": "Wood", "price": 0},
    {"id":2, "parent_id": 1, "name": "Mango", "price": 18},
    {"id":3, "parent_id": 2, "name": "Table", "price": 342},
    {"id":4, "parent_id": 2, "name": "Box", "price": 340},
    {"id":5, "parent_id": 4, "name": "Pencil", "price": 240},
    {"id":6, "parent_id": 0, "name": "Electronic", "price": 20},
    {"id":7, "parent_id": 6, "name": "TV", "price": 350},
    {"id":8, "parent_id": 6, "name": "Mobile", "price": 300},
    {"id":9, "parent_id": 8, "name": "Iphone", "price": 0},
    {"id":10, "parent_id": 9, "name": "Iphone 10", "price": 400}
]
# Loop backwards, placing each child
# into its parent one level up in the hierarchy
for i in range(len(data)-1, -1, -1):
    # Create a "children" key on the current element and assign to it
    # a list comprehension that scans all items in data for the elements
    # whose parent_id equals the current element's id
    data[i]["children"] = [child for child in data if child["parent_id"] == data[i]["id"]]
    # Since each child is now stored inside its parent, remove it
    # from its original position in data
    for child in data[i]["children"]:
        data.remove(child)
# Print the new data structure
print(data)
And here is the output:
[{'id': 1, 'parent_id': 0, 'name': 'Wood', 'price': 0, 'children': [{'id': 2, 'parent_id': 1, 'name': 'Mango', 'price': 18, 'children': [{'id': 3, 'parent_id': 2, 'name': 'Table', 'price': 342, 'children': []}, {'id': 4, 'parent_id': 2, 'name': 'Box', 'price': 340, 'children': [{'id': 5, 'parent_id': 4, 'name': 'Pencil', 'price': 240, 'children': []}]}]}]}, {'id': 6, 'parent_id': 0, 'name': 'Electronic', 'price': 20, 'children': [{'id': 7, 'parent_id': 6, 'name': 'TV', 'price': 350, 'children': []}, {'id': 8, 'parent_id': 6, 'name': 'Mobile', 'price': 300, 'children': [{'id': 9, 'parent_id': 8, 'name': 'Iphone', 'price': 0, 'children': [{'id': 10, 'parent_id': 9, 'name': 'Iphone 10', 'price': 400, 'children': []}]}]}]}]
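The recursive answer can hit Python's default recursion limit (1000) on very deep hierarchies, and the other two modify the dicts in data. A single-pass alternative indexes copies of the rows by id and then attaches each node to its parent. It runs in linear time, does not depend on the order of data, and leaves the input unmodified. This is a minimal sketch assuming unique ids and parent_id 0 for roots; the name build_tree_iterative is hypothetical.

```python
def build_tree_iterative(rows, root_parent=0):
    """Nest rows under their parents via a 'children' list.
    Assumes unique ids; rows whose parent_id is neither root_parent
    nor an existing id are dropped."""
    # Shallow-copy every row so the caller's dicts are not modified
    nodes = {row["id"]: dict(row) for row in rows}
    roots = []
    for node in nodes.values():
        parent_id = node["parent_id"]
        if parent_id == root_parent:
            roots.append(node)
        elif parent_id in nodes:
            # Create the list on first use, so leaves get no 'children' key
            nodes[parent_id].setdefault("children", []).append(node)
    return roots


data = [
    {"id": 1, "parent_id": 0, "name": "Wood", "price": 0},
    {"id": 2, "parent_id": 1, "name": "Mango", "price": 18},
    {"id": 3, "parent_id": 2, "name": "Table", "price": 342},
    {"id": 4, "parent_id": 2, "name": "Box", "price": 340},
    {"id": 5, "parent_id": 4, "name": "Pencil", "price": 240},
    {"id": 6, "parent_id": 0, "name": "Electronic", "price": 20},
    {"id": 7, "parent_id": 6, "name": "TV", "price": 350},
    {"id": 8, "parent_id": 6, "name": "Mobile", "price": 300},
    {"id": 9, "parent_id": 8, "name": "Iphone", "price": 0},
    {"id": 10, "parent_id": 9, "name": "Iphone 10", "price": 400},
]

tree = build_tree_iterative(data)
print(tree)
```

Because children are attached by id lookup rather than by position, shuffling data only changes the order of siblings, not the shape of the tree.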

Python, reorganize array of dicts

To be honest, this would be easy for me in JS or Perl, but I'm completely stuck in Python because of its more complex tools for dealing with dicts/lists. Here is what I need:
I have a list of dicts:
[
{"id": 1, "name": "Res1", "type": "resource", "k_name": "Ind1_1", "k_id": 4},
{"id": 1, "name": "Res1", "type": "resource", "k_name": "Ind1_2", "k_id": 5},
{"id": 1, "name": "Res1", "type": "resource", "k_name": "Ind1_3", "k_id": 6},
{"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_1", "k_id": 7},
{"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_2", "k_id": 8},
{"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_3", "k_id": 9},
{"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_4", "k_id": 10},
{"id": 3, "name": "Res3", "type": "service", "k_name": "Ind3_1", "k_id": 11},
{"id": 3, "name": "Res3", "type": "service", "k_name": "Ind3_2", "k_id": 12},
{"id": 3, "name": "Res3", "type": "service", "k_name": "Ind3_3", "k_id": 13},
{"id": 3, "name": "Res3", "type": "service", "k_name": "Ind3_4", "k_id": 14}
]
and I need to turn it into this:
[
{
"id": 1,
"name": "Res1",
"type": "resource",
"indicators": [
{"name": "Ind1_1","id": 4},
{"name": "Ind1_2","id": 5},
{"name": "Ind1_3","id": 6}
]
},
{
"id": 2,
"name": "Res2",
"type": "service",
"indicators": [
{"name": "Ind2_1","id": 7},
{"name": "Ind2_2","id": 8},
{"name": "Ind2_3","id": 9},
{"name": "Ind2_4","id": 10}
]
},
{
"id": 3,
"name": "Res3",
"type": "service",
"indicators": [
{"name": "Ind3_1","id": 11},
{"name": "Ind3_2","id": 12},
{"name": "Ind3_3","id": 13},
{"name": "Ind3_4","id": 14}
]
}
]
Can you help me with that?
itertools to the rescue:
import itertools
# Assuming your original list is `l`.
# groupby() only groups consecutive items, so if the list is not already ordered
# by these keys, sort it first (this adds an O(n log n) step).
l = sorted(l, key=lambda x: (x["id"], x["name"], x["type"]))
d = []
for k, g in itertools.groupby(l, lambda x: (x["id"], x["name"], x["type"])):
    # Rebuild the parent fields from the group key
    d.append({i: v for i, v in zip(["id", "name", "type"], k)})
    # Map "k_id" -> "id" and "k_name" -> "name" for every row in the group
    d[-1]["indicators"] = [{y.split('_')[1]: e[y] for y in ["k_id", "k_name"]} for e in g]
d becomes:
[{'id': 1,
  'indicators': [{'id': 4, 'name': 'Ind1_1'},
                 {'id': 5, 'name': 'Ind1_2'},
                 {'id': 6, 'name': 'Ind1_3'}],
  'name': 'Res1',
  'type': 'resource'},
 {'id': 2,
  'indicators': [{'id': 7, 'name': 'Ind2_1'},
                 {'id': 8, 'name': 'Ind2_2'},
                 {'id': 9, 'name': 'Ind2_3'},
                 {'id': 10, 'name': 'Ind2_4'}],
  'name': 'Res2',
  'type': 'service'},
 {'id': 3,
  'indicators': [{'id': 11, 'name': 'Ind3_1'},
                 {'id': 12, 'name': 'Ind3_2'},
                 {'id': 13, 'name': 'Ind3_3'},
                 {'id': 14, 'name': 'Ind3_4'}],
  'name': 'Res3',
  'type': 'service'}]
You can use a mapping dict to map ids to corresponding sub-lists, so that as you iterate through the list (named l in this example), you can append a new entry to the output list if the id is not found in the mapping, or append the entry to the existing sub-list if id is found in the mapping:
mapping = {}
output = []
for d in l:
    # Note: pop() removes 'k_name' and 'k_id' from the original dicts in l
    i = {'name': d.pop('k_name'), 'id': d.pop('k_id')}
    if d['id'] in mapping:
        mapping[d['id']].append(i)
    else:
        output.append({**d, 'indicators': [i]})
        mapping[d['id']] = output[-1]['indicators']
output becomes:
[{'id': 1, 'name': 'Res1', 'type': 'resource', 'indicators': [{'name': 'Ind1_1', 'id': 4}, {'name': 'Ind1_2', 'id': 5}, {'name': 'Ind1_3', 'id': 6}]}, {'id': 2, 'name': 'Res2', 'type': 'service', 'indicators': [{'name': 'Ind2_1', 'id': 7}, {'name': 'Ind2_2', 'id': 8}, {'name': 'Ind2_3', 'id': 9}, {'name': 'Ind2_4', 'id': 10}]}, {'id': 3, 'name': 'Res3', 'type': 'service', 'indicators': [{'name': 'Ind3_1', 'id': 11}, {'name': 'Ind3_2', 'id': 12}, {'name': 'Ind3_3', 'id': 13}, {'name': 'Ind3_4', 'id': 14}]}]
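The mapping answer calls pop(), which removes k_name and k_id from the original dicts, while the groupby answer needs the input sorted. If the input must stay intact and unsorted, a variant keyed on id can build the same output without mutating anything. A minimal sketch, assuming name and type are identical for every row that shares an id; the data is trimmed for brevity.

```python
rows = [
    {"id": 1, "name": "Res1", "type": "resource", "k_name": "Ind1_1", "k_id": 4},
    {"id": 1, "name": "Res1", "type": "resource", "k_name": "Ind1_2", "k_id": 5},
    {"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_1", "k_id": 7},
    {"id": 2, "name": "Res2", "type": "service", "k_name": "Ind2_2", "k_id": 8},
]

# id -> output entry; dicts keep first-seen insertion order (guaranteed in Python 3.7+)
grouped = {}
for row in rows:
    entry = grouped.get(row["id"])
    if entry is None:
        # First row for this id: copy the parent fields into a new entry
        entry = {"id": row["id"], "name": row["name"],
                 "type": row["type"], "indicators": []}
        grouped[row["id"]] = entry
    entry["indicators"].append({"name": row["k_name"], "id": row["k_id"]})

output = list(grouped.values())
print(output)
```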
