Python dictionary to JSON

The output of the file comes as a dictionary with 5 columns. Due to the 5th column, the first 4 are duplicated. My goal is to output it as JSON, without duplicates, in the following format.
Sample input:
test_dict = [
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'ID_C': "A3", 'INVOICE': "123"},
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'ID_C': "A3", 'INVOICE': "345"}
]
Previously there were no duplicates, so it was easy to transform to JSON, as below:
from collections import defaultdict

result = defaultdict(set)
for i in test_dict:
    id = i.get('ID')
    if id:
        result[id].add(i.get('ID_A'))
        result[id].add(i.get('ID_B'))
        result[id].add(i.get('ID_C'))

output = []
for id, details in result.items():
    output.append(
        {
            "ID": id,
            "otherDetails": {
                "IDs": [{"id": ref} for ref in details]
            },
        }
    )
How could I add INVOICE to this without duplicating the rows? The output would look like this:
[{'ID': 'A',
  'OtherDetails': {'IDs': [{'id': 'A1'},
                           {'id': 'A2'},
                           {'id': 'A3'}],
                   'INVOICE': [{'id': '123'},
                               {'id': '345'}]}}]
Thanks! (python 3.9)

Basically, you can just do the same as for the IDs, using a second defaultdict (or similar) for the invoice IDs. Afterwards, use a nested list/dict comprehension to build the final result.
from collections import defaultdict

id_to_ids = defaultdict(set)
id_to_inv = defaultdict(set)
for d in test_dict:
    id_to_ids[d["ID"]] |= {d[k] for k in ["ID_A", "ID_B", "ID_C"]}
    id_to_inv[d["ID"]] |= {d["INVOICE"]}

result = [{
    'ID': k,
    'OtherDetails': {
        'IDs': [{'id': v} for v in id_to_ids[k]],
        'INVOICE': [{'id': v} for v in id_to_inv[k]]
    }} for k in id_to_ids]
Note, though, that using this format you lose the information about which of the "other" IDs was which, and which invoice IDs they were associated with.
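If you do need to keep track of which column each value came from, one possible variation (a sketch, not part of the answer above) keys the inner sets by the original column name:

```python
from collections import defaultdict

test_dict = [
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'ID_C': "A3", 'INVOICE': "123"},
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'ID_C': "A3", 'INVOICE': "345"},
]

# One dict per ID whose values are sets keyed by the source column,
# so the column association survives deduplication.
merged = defaultdict(lambda: defaultdict(set))
for row in test_dict:
    for col in ("ID_A", "ID_B", "ID_C", "INVOICE"):
        merged[row["ID"]][col].add(row[col])

result = [{"ID": k, "OtherDetails": {col: sorted(vals) for col, vals in v.items()}}
          for k, v in merged.items()]
print(result)
```

The `sorted()` call is only there to make the output deterministic, since sets have no defined order.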

You were pretty close. I would make the intermediate dictionary a little more straightforward and have it just be a dict keyed by ID, holding two lists.
When walking the original data, you just need to append INVOICE if there is already an entry for the ID. Then, when you create the "json" format (a list with a dictionary for each ID), all you have to do is use the lists you already generated. Here is the structure I propose:
test_dict = [
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'ID_C': "A3", 'INVOICE': "123"},
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'ID_C': "A3", 'INVOICE': "345"}
]

result = {}
for i in test_dict:
    id = i.get('ID')
    if not id:
        continue
    if id in result:
        # ID already present: just add this row's INVOICE
        result[id]['INVOICE'].append(i.get('INVOICE'))
    else:
        # ID not in result dictionary, so populate it
        result[id] = {'IDs': [i.get('ID_A'), i.get('ID_B'), i.get('ID_C')],
                      'INVOICE': [i.get('INVOICE')]}

output = []
for id, details in result.items():
    output.append(
        {
            "ID": id,
            "otherDetails": {
                "IDs": details['IDs'],
                'INVOICE': details['INVOICE']
            }
        }
    )
The trick for duplicate IDs is handled by the `if id in result` check, which only appends the invoice to the list of invoices. I will also add that since we are using a lot of dict.get() calls rather than simple dict[...] lookups, we are potentially adding a bunch of Nones to these lists.
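To avoid those Nones, one option (a sketch under the assumption that missing columns should simply be skipped, not part of the answer above) is to filter absent values while building the lists:

```python
test_dict = [
    {'ID': "A", 'ID_A': "A1", 'INVOICE': "123"},                 # ID_B / ID_C missing
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'INVOICE': "345"},
]

result = {}
for row in test_dict:
    rec_id = row.get('ID')
    if not rec_id:
        continue
    # Keep only values that are actually present, so no Nones sneak in
    ids = [v for v in (row.get(k) for k in ('ID_A', 'ID_B', 'ID_C')) if v is not None]
    entry = result.setdefault(rec_id, {'IDs': [], 'INVOICE': []})
    entry['IDs'].extend(v for v in ids if v not in entry['IDs'])
    if row.get('INVOICE') is not None:
        entry['INVOICE'].append(row['INVOICE'])
print(result)
```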

I like the answer from @tobias_k, but it does not handle duplicate values in any of the ID_* or invoice columns. His answer is the simplest if order and repetition are not important.
Check out this one if they are important:
import pandas as pd

def create_item(df: pd.DataFrame):
    output = list()
    groups = df.groupby(["ID", "ID_A", "ID_B", "ID_C"])[["INVOICE"]]
    for group, gdf in groups:
        row = dict()
        row["ID"] = group[0]
        row["OtherDetails"] = dict()
        row["OtherDetails"]["IDS"] = [{"id": x} for x in group[1:]]
        row["OtherDetails"]["INVOICE"] = [{"id": x} for x in gdf["INVOICE"]]
        output.append(row)
    return output
test_dict = [
    {"ID": "A", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "123"},
    {"ID": "A", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "345"},
    {"ID": "B", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "123"},
    {"ID": "B", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "345"},
    {"ID": "B", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "123"},
]
test_df = pd.DataFrame(test_dict)
create_item(test_df)
Which will return
[{'ID': 'A',
'OtherDetails': {'IDS': [{'id': 'A1'}, {'id': 'A2'}, {'id': 'A3'}],
'INVOICE': [{'id': '123'}, {'id': '345'}]}},
{'ID': 'B',
'OtherDetails': {'IDS': [{'id': 'A1'}, {'id': 'A2'}, {'id': 'A3'}],
'INVOICE': [{'id': '123'}, {'id': '345'}, {'id': '123'}]}}]

Related

Changing value of a value in a dictionary within a list within a dictionary

I have a json like:
pd = {
    "RP": [
        {"Name": "PD", "Value": "qwe"},
        {"Name": "qwe", "Value": "change"}
    ],
    "RFN": ["All"],
    "RIT": [
        {"ID": "All", "IDT": "All"}
    ]
}
I am trying to change the value change to changed. This is a dictionary within a list, which is within another dictionary. Is there a better/more efficient/more Pythonic way to do this than what I did below?
for key, value in pd.items():
    ls = pd[key]
    for d in ls:
        if type(d) == dict:
            for k, v in d.items():
                if v == 'change':
                    pd[key][ls.index(d)][k] = "changed"
This seems pretty inefficient due to the amount of times I am parsing through the data.
String replacement could work if you don't want to write a depth-first/breadth-first search.
>>> import json
>>> json.loads(json.dumps(pd).replace('"Value": "change"', '"Value": "changed"'))
{'RP': [{'Name': 'PD', 'Value': 'qwe'}, {'Name': 'qwe', 'Value': 'changed'}],
'RFN': ['All'],
'RIT': [{'ID': 'All', 'IDT': 'All'}]}
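If you do want the explicit traversal instead of the string-replacement trick, a small recursive sketch (a hypothetical helper, not from the answer above) could look like this. Note that, unlike the original loop, it replaces a matching value under any key, not just `Value`:

```python
def replace_value(obj, old, new):
    """Recursively walk nested dicts/lists and replace matching leaf values in place."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if v == old:
                obj[k] = new
            else:
                replace_value(v, old, new)
    elif isinstance(obj, list):
        for item in obj:
            replace_value(item, old, new)

pd = {
    "RP": [{"Name": "PD", "Value": "qwe"}, {"Name": "qwe", "Value": "change"}],
    "RFN": ["All"],
    "RIT": [{"ID": "All", "IDT": "All"}],
}
replace_value(pd, "change", "changed")
print(pd["RP"][1]["Value"])
```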

Working with a list of dictionaries in Python

I have a list of dictionaries; see below:
raw_list = [
    {"item_name": "orange", "id": 12, "total": 2},
    {"item_name": "apple", "id": 12},
    {"item_name": "apple", "id": 34, "total": 22},
]
Expected output should be:
[
    {"item_name": ["orange", "apple"], "id": 12, "total": 2},
    {"item_name": "apple", "id": 34, "total": 22},
]
but this is what I got:
[
    {"item_name": "orangeapple", "id": 12, "total": 2},
    {"item_name": "apple", "id": 34, "total": 22},
]
Here is my code below:
comp_key = "id"
conc_key = "item_name"
res = []
for ele in raw_list:
    temp = False
    for ele1 in res:
        if ele1[comp_key] == ele[comp_key]:
            ele1[conc_key] = ele1[conc_key] + ele[conc_key]
            temp = True
            break
    if not temp:
        res.append(ele)
How do I resolve this?
Something like this - the special sauce is the isinstance check, to make sure the concatenated value becomes a list.
Do note that this assumes the raw list is ordered by the comp_key (id), and will misbehave if that's not the case.
raw_list = [
    {"item_name": "orange", "id": 12, "total": 2},
    {"item_name": "apple", "id": 12},
    {"item_name": "apple", "id": 34, "total": 22},
]
comp_key = "id"
conc_key = "item_name"
grouped_items = []
for item in raw_list:
    last_group = grouped_items[-1] if grouped_items else None
    if not last_group or last_group[comp_key] != item[comp_key]:  # Starting a new group?
        grouped_items.append(item.copy())  # Shallow-copy the item into the result array
    else:
        if not isinstance(last_group[conc_key], list):
            # The concatenation key is not a list yet, make it so
            last_group[conc_key] = [last_group[conc_key]]
        last_group[conc_key].append(item[conc_key])
print(grouped_items)
import pandas as pd

df = pd.DataFrame(raw_list)
dd = pd.concat([df.groupby('id')['item_name'].apply(list),
                df.groupby('id')['total'].apply(sum)], axis=1).reset_index()
dd.to_dict('records')
You could use pandas to group by id and apply a function to the two columns, then convert to a dict:
[{'id': 12, 'item_name': ['orange', 'apple'], 'total': 2.0},
{'id': 34, 'item_name': ['apple'], 'total': 22.0}]
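If you'd rather skip the pandas dependency, a plain-stdlib sketch of the same grouping (names collected into lists, totals summed, a missing total treated as 0 rather than NaN) might look like this:

```python
from collections import defaultdict

raw_list = [
    {"item_name": "orange", "id": 12, "total": 2},
    {"item_name": "apple", "id": 12},
    {"item_name": "apple", "id": 34, "total": 22},
]

names = defaultdict(list)
totals = defaultdict(int)
for row in raw_list:
    names[row["id"]].append(row["item_name"])
    totals[row["id"]] += row.get("total", 0)  # treat a missing total as 0

result = [{"id": i, "item_name": names[i], "total": totals[i]} for i in names]
print(result)
```

Unlike the pandas version, the totals here stay integers, since there is no NaN forcing a float dtype.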
You could use itertools.groupby for grouping by id and collections.defaultdict for combining values with the same keys into lists.
from itertools import groupby
from collections import defaultdict

id_getter = lambda x: x['id']
gp = groupby(sorted(raw_list, key=id_getter), key=id_getter)
out = []
for _, i in gp:
    subdict = defaultdict(list)
    for j in i:
        for k, v in j.items():
            subdict[k].append(v)
    out.append(dict(subdict))
out
When working with complex datatypes such as nested lists and dictionaries, I would advise really utilizing the APIs provided by collections and itertools.

Converting CSV to Hierarchical JSON output

I am trying to convert a CSV file into a hierarchical JSON file. The CSV input is as follows; it contains two columns, gene and disease:
gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms
The expected output should be in the following format:
{
    "name": "A1BG",
    "children": [
        {"name": "Adenocarcinoma"},
        {"name": "apnea"},
        {"name": "Athritis"}
    ]
},
{
    "name": "A2M",
    "children": [
        {"name": "Asthma"},
        {"name": "Astrocytoma"},
        {"name": "Diabetes"}
    ]
},
{
    "name": "NAT1",
    "children": [
        {"name": "polyps"},
        {"name": "lymphoma"},
        {"name": "neoplasms"}
    ]
}
The Python code I have written is below. Let me know what I need to change to get the desired output.
import json

finalList = []
finalDict = {}
grouped = df.groupby(['gene'])
for key, value in grouped:
    dictionary = {}
    dictList = []
    anotherDict = {}
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['name'] = j.at[0, 'gene']
    for i in j.index:
        anotherDict['disease'] = j.at[i, 'disease']
        dictList.append(anotherDict)
    dictionary['children'] = dictList
    finalList.append(dictionary)

with open('outputresult3.json', "w") as out:
    json.dump(finalList, out)
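One likely reason the code above misbehaves: anotherDict is created once per group and then mutated on every row, so dictList ends up holding several references to the same dict (all showing the last disease), and it uses the key 'disease' instead of the expected 'name'. A minimal stdlib sketch of that aliasing problem and its fix:

```python
# Buggy pattern: one dict created outside the loop and mutated each pass
shared = {}
buggy_children = []
for disease in ["Adenocarcinoma", "apnea"]:
    shared["disease"] = disease     # mutates the SAME object every time
    buggy_children.append(shared)   # appends another reference to it
print(buggy_children)               # both entries now show the last disease

# Fixed pattern: build a fresh dict per row, keyed "name" to match the target format
fixed_children = [{"name": d} for d in ["Adenocarcinoma", "apnea"]]
print(fixed_children)
```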
import json

json_data = []
# group the data by each unique gene
for gene, data in df.groupby(["gene"]):
    # obtain a list of diseases for the current gene
    diseases = data["disease"].tolist()
    # create a new list of dictionaries to satisfy json requirements
    children = [{"name": disease} for disease in diseases]
    entry = {"name": gene, "children": children}
    json_data.append(entry)

with open('outputresult3.json', "w") as out:
    json.dump(json_data, out)
Use DataFrame.groupby with a custom lambda function to convert the values to dictionaries via DataFrame.to_dict:
L = (df.rename(columns={'disease': 'name'})
       .groupby('gene')
       .apply(lambda x: x[['name']].to_dict('records'))
       .reset_index(name='children')
       .rename(columns={'gene': 'name'})
       .to_dict('records'))
print(L)
[{'name': 'A1BG', 'children': [{'name': 'Adenocarcinoma'},
{'name': 'apnea'},
{'name': 'Athritis'}]},
{'name': 'A2M', 'children': [{'name': 'Asthma'},
{'name': 'Astrocytoma'},
{'name': 'Diabetes'}]},
{'name': 'NAT1', 'children': [{'name': 'polyps'},
{'name': 'lymphoma'},
{'name': 'neoplasms'}]}]
with open('outputresult3.json', "w") as out:
json.dump(L,out)

Get unique dictionary but concatenate value of specific key

I am trying to devise a logic in Python for the given scenario:
I have a list of multiple dictionaries; my main goal is to get a unique list of dictionaries based on the id key.
non_unique = [
    {"id": 1, "name": "A", "items": ["blah1"]},
    {"id": 1, "name": "A", "items": ["blah2"]}
]
I can get the unique list of dictionaries by this dictionary comprehension:
list({v["id"]: v for v in non_unique}.values())
But I am unable to fit a logic in the dictionary comprehension to concatenate the value of items key. My expected output is:
[{"id": 1, "name": "A", "items": ["blah1", "blah2"]}]
Sometimes a simple for loop is much clearer than a dict or list comprehension... in your case I would simply use:
from operator import itemgetter

non_unique = [{'id': 1, "name": "A", "items": ["blah1"]},
              {'id': 1, "name": "A", "items": ["blah2"]},
              {'id': 2, "name": "A", "items": ["blah2"]},
              {'id': 2, "name": "B", "items": ["blah1"]}]

result = {}
for uniq in non_unique:
    id, items, name = itemgetter('id', 'items', 'name')(uniq)
    if id in result:
        result[id]["items"] += items
        if name not in result[id]["name"].split(","):
            result[id]["name"] += ",{}".format(name)
    else:
        result[id] = uniq

unique_list = [val for item, val in result.items()]
print(unique_list)
Output :
[{'id': 1, 'name': 'A', 'items': ['blah1', 'blah2']}, {'id': 2, 'name': 'A,B', 'items': ['blah2', 'blah1']}]
EDIT
As suggested in the comments: I added a simple check for the name, appending it to the names only if it does not already exist.
I also added itemgetter to make the code a bit clearer.
You can use this method.
non_unique = [
    {'id': 1, 'name': "A", 'items': ["blah1"]},
    {'id': 1, 'name': "A", 'items': ["blah2"]}
]

dic = []
for v in non_unique:
    for x in dic:
        if x['id'] == v['id']:
            if v['name'] not in x['name']:
                x['name'] += v['name']
            if v['items'] not in x['items']:
                x['items'] += v['items']
            break
    else:
        dic.append(v)
print(dic)
Output - [{'id': 1, 'name': 'A', 'items': ['blah1', 'blah2']}]
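If all you need is the items concatenation from the original question (no name merging), a compact dict-based sketch (a variation on the above, not either answer verbatim) stays close to the comprehension idea:

```python
non_unique = [
    {"id": 1, "name": "A", "items": ["blah1"]},
    {"id": 1, "name": "A", "items": ["blah2"]},
]

# First occurrence of each id wins; later rows only contribute their "items".
merged = {}
for d in non_unique:
    if d["id"] in merged:
        merged[d["id"]]["items"] = merged[d["id"]]["items"] + d["items"]
    else:
        merged[d["id"]] = dict(d)   # shallow copy, so input rows stay untouched
result = list(merged.values())
print(result)
```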

Create partial dict from recursively nested field list

After parsing a URL parameter for partial responses, e.g. ?fields=name,id,another(name,id),date, I'm getting back an arbitrarily nested list of strings, representing the individual keys of a nested JSON object:
['name', 'id', ['another', ['name', 'id']], 'date']
The goal is to map that parsed 'graph' of keys onto an original, larger dict and just retrieve a partial copy of it, e.g.:
input_dict = {
    "name": "foobar",
    "id": "1",
    "another": {
        "name": "spam",
        "id": "42",
        "but_wait": "there is more!"
    },
    "even_more": {
        "nesting": {
            "why": "not?"
        }
    },
    "date": 1584567297
}
should simplify to:
output_dict = {
    "name": "foobar",
    "id": "1",
    "another": {
        "name": "spam",
        "id": "42"
    },
    "date": 1584567297,
}
So far, I've glanced over nested defaultdicts, addict and glom, but the mappings they take as inputs are not compatible with my list (I might have missed something, of course), and I end up with garbage.
How can I do this programmatically, accounting for any nesting that might occur?
You can use:
def rec(d, f):
    result = {}
    for i in f:
        if isinstance(i, list):
            result[i[0]] = rec(d[i[0]], i[1])
        else:
            result[i] = d[i]
    return result

f = ['name', 'id', ['another', ['name', 'id']], 'date']
rec(input_dict, f)
output:
{'name': 'foobar',
'id': '1',
'another': {'name': 'spam', 'id': '42'},
'date': 1584567297}
The assumption here is that, in a nested list, the first element is a valid key at the upper level and the second element contains valid keys of the nested dict that is the value for that first element.
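If some requested fields might be absent from the source dict, a defensive variant of rec (a sketch, not part of the answer above) can skip them instead of raising KeyError:

```python
def rec_safe(d, fields):
    """Like rec, but silently skips fields that are missing from the source dict."""
    result = {}
    for f in fields:
        if isinstance(f, list):
            key, subfields = f[0], f[1]
            if key in d:
                result[key] = rec_safe(d[key], subfields)
        elif f in d:
            result[f] = d[f]
    return result

input_dict = {"name": "foobar", "id": "1",
              "another": {"name": "spam", "id": "42", "but_wait": "there is more!"},
              "date": 1584567297}
# 'missing' is a hypothetical field not present in input_dict
fields = ['name', 'id', ['another', ['name', 'id']], 'missing', 'date']
print(rec_safe(input_dict, fields))
```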
