I am trying to devise some logic in Python for the following scenario.
I have a list of dictionaries, and my main goal is to get a list of unique dictionaries based on the id key.
non_unique = [
{"id": 1, "name": "A", "items": ["blah1"]},
{"id": 1, "name": "A", "items": ["blah2"]}
]
I can get the unique list of dictionaries by this dictionary comprehension:
list({v["id"]: v for v in non_unique}.values())
But I am unable to fit the logic to concatenate the values of the items key into the dictionary comprehension. My expected output is:
[{"id": 1, "name": "A", "items": ["blah1", "blah2"]}]
Sometimes a simple for loop is much clearer than a dict or list comprehension. In your case I would simply use:
from operator import itemgetter

non_unique = [{'id': 1, "name": "A", "items": ["blah1"]},
              {'id': 1, "name": "A", "items": ["blah2"]},
              {'id': 2, "name": "A", "items": ["blah2"]},
              {'id': 2, "name": "B", "items": ["blah1"]},
              ]

result = {}
for uniq in non_unique:
    id, items, name = itemgetter('id', 'items', 'name')(uniq)
    if id in result:
        # already seen this id: extend its items and, if needed, its name
        result[id]["items"] += items
        if name not in result[id]["name"].split(","):
            result[id]["name"] += ",{}".format(name)
    else:
        # first occurrence: keep the dict (note this stores a reference,
        # so the merged entry is the same object as the one in non_unique)
        result[id] = uniq

unique_list = [val for item, val in result.items()]
print(unique_list)
Output :
[{'id': 1, 'name': 'A', 'items': ['blah1', 'blah2']}, {'id': 2, 'name': 'A,B', 'items': ['blah2', 'blah1']}]
EDIT
As suggested in the comments, I added a simple check for the name and append it to the existing names only if it is not already there.
I also added itemgetter to make the code a bit clearer.
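For reference, itemgetter builds a callable that pulls several keys out of a mapping at once and returns them as a tuple in the order the keys were given; a minimal illustration:

from operator import itemgetter

d = {'id': 1, "name": "A", "items": ["blah1"]}
# returns a tuple in the order the keys were passed
print(itemgetter('id', 'items', 'name')(d))  # (1, ['blah1'], 'A')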
You can use this method.
non_unique = [
    {'id': 1, 'name': "A", 'items': ["blah1"]},
    {'id': 1, 'name': "A", 'items': ["blah2"]}
]

dic = []
for v in non_unique:
    for x in dic:
        if x['id'] == v['id']:
            if v['name'] not in x['name']:
                x['name'] += v['name']
            # add only the items that are not already present
            for it in v['items']:
                if it not in x['items']:
                    x['items'].append(it)
            break
    else:
        # no dict with this id yet, so start a new entry
        dic.append(v)
print(dic)
Output - [{'id': 1, 'name': 'A', 'items': ['blah1', 'blah2']}]
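One thing to be aware of: dic.append(v) stores a reference to the original dictionary, so the in-place updates above also modify the corresponding entry in non_unique. If the input must stay untouched, a deep copy avoids that; a minimal tweak using the standard copy module:

import copy

# inside the for/else above, instead of dic.append(v):
dic.append(copy.deepcopy(v))  # keep non_unique unchanged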
I have a json like:
pd = {
"RP": [
{
"Name": "PD",
"Value": "qwe"
},
{
"Name": "qwe",
"Value": "change"
}
],
"RFN": [
"All"
],
"RIT": [
{
"ID": "All",
"IDT": "All"
}
]
}
I am trying to change the value "change" to "changed". This is a dictionary within a list, which is itself within another dictionary. Is there a better / more efficient / more Pythonic way to do this than what I did below:
for key, value in pd.items():
    ls = pd[key]
    for d in ls:
        if type(d) == dict:
            for k, v in d.items():
                if v == 'change':
                    pd[key][ls.index(d)][k] = "changed"
This seems pretty inefficient due to the number of times I am iterating through the data.
String replacement could work if you don't want to write a depth-first/breadth-first search (a recursive sketch follows below the output).
>>> import json
>>> json.loads(json.dumps(pd).replace('"Value": "change"', '"Value": "changed"'))
{'RP': [{'Name': 'PD', 'Value': 'qwe'}, {'Name': 'qwe', 'Value': 'changed'}],
'RFN': ['All'],
'RIT': [{'ID': 'All', 'IDT': 'All'}]}
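If you do want to handle arbitrarily nested data generically, a small recursive walk works too. This is a sketch, not part of the original answer, and it assumes you want to replace every value equal to "change" regardless of which key it sits under:

def replace_value(obj, old, new):
    # recursively walk dicts and lists, replacing matching leaf values in place
    if isinstance(obj, dict):
        for k, v in obj.items():
            if v == old:
                obj[k] = new
            else:
                replace_value(v, old, new)
    elif isinstance(obj, list):
        for item in obj:
            replace_value(item, old, new)

replace_value(pd, "change", "changed")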
The output of the file comes as a list of dictionaries with 5 columns. Due to the 5th column, the first 4 are duplicated. My goal is to output it as JSON, without duplicates, in the following format.
Sample input:
test_dict = [
{'ID':"A", 'ID_A':"A1",'ID_B':"A2",'ID_C':"A3",'INVOICE':"123"},
{'ID':"A", 'ID_A':"A1",'ID_B':"A2",'ID_C':"A3",'INVOICE':"345"}
]
Previously there were no duplicates so it was easy to transform to json as below:
from collections import defaultdict

result = defaultdict(set)
for i in test_dict:
    id = i.get('ID')
    if id:
        result[id].add(i.get('ID_A'))
        result[id].add(i.get('ID_B'))
        result[id].add(i.get('ID_C'))

output = []
for id, details in result.items():
    output.append(
        {
            "ID": id,
            "otherDetails": {
                "IDs": [
                    {"id": ref} for ref in details
                ]
            },
        }
    )
How could I add INVOICE to this without duplicating the rows? The output would look like this:
[{'ID': 'A',
  'OtherDetails': {'IDs': [{'id': 'A1'},
                           {'id': 'A2'},
                           {'id': 'A3'}],
                   'INVOICE': [{'id': '123'},
                               {'id': '345'}]}}]
Thanks! (python 3.9)
Basically, you can just do the same as for the IDs, using a second defaultdict (or similar) for the invoice IDs. Afterwards, use a nested list/dict comprehension to build the final result.
from collections import defaultdict

id_to_ids = defaultdict(set)
id_to_inv = defaultdict(set)
for d in test_dict:
    id_to_ids[d["ID"]] |= {d[k] for k in ["ID_A", "ID_B", "ID_C"]}
    id_to_inv[d["ID"]] |= {d["INVOICE"]}

result = [{
    'ID': k,
    'OtherDetails': {
        'IDs': [{'id': v} for v in id_to_ids[k]],
        'INVOICE': [{'id': v} for v in id_to_inv[k]]
    }} for k in id_to_ids]
Note, though, that with this format you lose the information about which of the "other" IDs was which, and which invoice IDs they were associated with.
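If that association matters, one option (a sketch, not from the original answer) is to keep a mapping from column name to value instead of a flat set:

from collections import defaultdict

# assumption: one value per column per ID is enough for this data
id_to_cols = defaultdict(dict)
for d in test_dict:
    id_to_cols[d["ID"]].update({k: d[k] for k in ["ID_A", "ID_B", "ID_C"]})
# e.g. id_to_cols["A"] -> {'ID_A': 'A1', 'ID_B': 'A2', 'ID_C': 'A3'}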
You were pretty close. I would make the intermediate dictionary a little more straightforward and have it just be a dictionary keyed by id, holding two lists.
When walking the original data, you just need to append the INVOICE value if there is already an entry for the ID. Then, when you create the "json" format (a list of dictionaries, one per ID), all you have to do is use the lists you already generated. Here is the structure I propose.
from collections import defaultdict

test_dict = [
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'ID_C': "A3", 'INVOICE': "123"},
    {'ID': "A", 'ID_A': "A1", 'ID_B': "A2", 'ID_C': "A3", 'INVOICE': "345"}
]

result = {}
for i in test_dict:
    id = i.get('ID')
    if not id:
        continue
    if id in result:
        # just add INVOICE
        result[id]['INVOICE'].append(i.get('INVOICE'))
    else:
        # ID not in result dictionary, so populate it
        result[id] = {'IDs': [i.get('ID_A'), i.get('ID_B'), i.get('ID_C')],
                      'INVOICE': [i.get('INVOICE')]
                      }

output = []
for id, details in result.items():
    output.append(
        {
            "ID": id,
            "otherDetails": {
                "IDs": details['IDs'],
                'INVOICE': details['INVOICE']
            }
        }
    )
The trick for duplicate IDs is handled by the if id in result branch, which only appends the invoice to the list of invoices. I will also add that since we are using a lot of dict.get() calls rather than plain dict[...] indexing, we are potentially adding a bunch of None values into these lists.
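If those None values are unwanted, a small guard when building the lists filters them out (a sketch of the idea, not part of the original answer):

# inside the loop above: keep only the ID columns that are actually present
ids = [v for v in (i.get('ID_A'), i.get('ID_B'), i.get('ID_C')) if v is not None]
result[id] = {'IDs': ids, 'INVOICE': [i['INVOICE']] if 'INVOICE' in i else []}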
I like the answer from #tobias_k, but it does not handle duplicate values for any of the ID_* or invoice columns. His answer is the simplest if order and repetition are not important.
Check out this one if they are important.
import pandas as pd

def create_item(df: pd.DataFrame):
    output = list()
    groups = df.groupby(["ID", "ID_A", "ID_B", "ID_C"])[["INVOICE"]]
    for group, gdf in groups:
        row = dict()
        row["ID"] = group[0]
        row["OtherDetails"] = dict()
        row["OtherDetails"]["IDS"] = [{"id": x} for x in group[1:]]
        row["OtherDetails"]["INVOICE"] = [{"id": x} for x in gdf["INVOICE"]]
        output.append(row)
    return output

test_dict = [
    {"ID": "A", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "123"},
    {"ID": "A", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "345"},
    {"ID": "B", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "123"},
    {"ID": "B", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "345"},
    {"ID": "B", "ID_A": "A1", "ID_B": "A2", "ID_C": "A3", "INVOICE": "123"},
]

test_df = pd.DataFrame(test_dict)
create_item(test_df)
Which will return
[{'ID': 'A',
'OtherDetails': {'IDS': [{'id': 'A1'}, {'id': 'A2'}, {'id': 'A3'}],
'INVOICE': [{'id': '123'}, {'id': '345'}]}},
{'ID': 'B',
'OtherDetails': {'IDS': [{'id': 'A1'}, {'id': 'A2'}, {'id': 'A3'}],
'INVOICE': [{'id': '123'}, {'id': '345'}, {'id': '123'}]}}]
I'm struggling with updating a value in a json object.
import json
from flask import request, session  # assumption: this is a Flask app

userBoard = ''  # see example below; it is loaded in a separate function

@app.get("/setItem")
def setItem():
    id = request.args.get('itemId')
    id = int(id[2:])  # is for instance 2
    for item in json.loads(session['userBoard']):
        if item['id'] == id:
            item['solved'] = 'true'
        else:
            print('Nothing found!')
    return('OK')
Example of the json:
[{"id": 1, "name": "t1", "solved": "false"}, {"id": 2, "name": "t2", "solved": "false"}, {"id": 3, "name": "t3"}]
However, when I check the printout of userBoard, the value is still 'false'. Does anyone have an idea? Does this need to be serialized somehow? I tried many things but it didn't work out...
Many thanks!
One could say the question is somewhat specific and lacking some of the information needed for a simple answer, so I am going to make some assumptions and propose a solution.
First, id and input are Python built-ins and should not be used as variable names. I will use these names with a _ prefix on purpose, so that you can still use them in a safer way.
import json
from typing import List

json_ex = '[{"id": 1, "name": "t1", "solved": "false"}, {"id": 2, "name": "t2", "solved": "false"}, {"id": 3, "name": "t3"}]'

_id = 2  # for now a constant for demonstration purposes

def setItem(_input: List[dict]):
    for item in _input:
        if (this_id := item['id']) == _id:  # requires python 3.8+, otherwise you can simplify this
            item['solved'] = 'true'
            print(f'Updated item id {this_id}')
        else:
            print('Nothing found!')

json_ex_parsed = json.loads(json_ex)  # this is now a list of dictionaries
setItem(json_ex_parsed)
Output:
Nothing found!
Updated item id 2
Nothing found!
The contents of json_ex_parsed before applying setItem:
[{'id': 1, 'name': 't1', 'solved': 'false'},
{'id': 2, 'name': 't2', 'solved': 'false'},
{'id': 3, 'name': 't3'}]
and after:
[{'id': 1, 'name': 't1', 'solved': 'false'},
{'id': 2, 'name': 't2', 'solved': 'true'}, # note here has been updated
{'id': 3, 'name': 't3'}]
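One more detail for the original Flask route: json.loads creates a brand-new list, so mutating it does not change what is stored in session['userBoard']. The updated list has to be serialized back into the session; a minimal sketch under that assumption:

board = json.loads(session['userBoard'])
for item in board:
    if item['id'] == 2:  # the id parsed from the request in the original route
        item['solved'] = 'true'
# write the modified board back, otherwise the session keeps the old JSON string
session['userBoard'] = json.dumps(board)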
I have a list of dictionaries that looks like below
raw_list = [
{"item_name": "orange", "id": 12, "total": 2},
{"item_name": "apple", "id": 12},
{"item_name": "apple", "id": 34, "total": 22},
]
Expected output should be
[
{"item_name": ["orange", "apple"], "id": 12, "total": 2},
{"item_name": "apple", "id": 34, "total": 22},
]
but this is what I got
[
{"item_name": "orangeapple", "id": 12, "total": 2},
{"item_name": "apple", "id": 34, "total": 22},
]
Here is my code below
comp_key = "id"
conc_key = "item_name"

res = []
for ele in raw_list:
    temp = False
    for ele1 in res:
        if ele1[comp_key] == ele[comp_key]:
            ele1[conc_key] = ele1[conc_key] + ele[conc_key]
            temp = True
            break
    if not temp:
        res.append(ele)
How can I resolve this?
Something like this - the special sauce is the isinstance check, which makes sure the concatenated value becomes a list.
Do note that this assumes the raw list is ordered by the comp_key (id) and will misbehave if that's not the case (see the note after the code).
raw_list = [
    {"item_name": "orange", "id": 12, "total": 2},
    {"item_name": "apple", "id": 12},
    {"item_name": "apple", "id": 34, "total": 22},
]

comp_key = "id"
conc_key = "item_name"

grouped_items = []
for item in raw_list:
    last_group = grouped_items[-1] if grouped_items else None
    if not last_group or last_group[comp_key] != item[comp_key]:  # Starting a new group?
        grouped_items.append(item.copy())  # Shallow-copy the item into the result array
    else:
        if not isinstance(last_group[conc_key], list):
            # The concatenation key is not a list yet, make it so
            last_group[conc_key] = [last_group[conc_key]]
        last_group[conc_key].append(item[conc_key])

print(grouped_items)
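If the input is not already ordered by comp_key, sorting it first restores the assumption the loop above relies on; a one-line sketch:

# group equal ids next to each other before running the loop above
raw_list = sorted(raw_list, key=lambda d: d[comp_key])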
import pandas as pd

df = pd.DataFrame(raw_list)
dd = pd.concat([df.groupby('id')['item_name'].apply(list),
                df.groupby('id')['total'].sum()], axis=1).reset_index()
dd.to_dict('records')
You could use pandas to group by id, apply a function to the two columns, and then convert the result to dicts:
[{'id': 12, 'item_name': ['orange', 'apple'], 'total': 2.0},
{'id': 34, 'item_name': ['apple'], 'total': 22.0}]
You could use itertools.groupby for grouping by id and collections.defaultdict to combine values with the same keys into lists.
from itertools import groupby
from collections import defaultdict

id_getter = lambda x: x['id']
gp = groupby(sorted(raw_list, key=id_getter), key=id_getter)

out = []
for _, i in gp:
    subdict = defaultdict(list)
    for j in i:
        for k, v in j.items():
            subdict[k].append(v)
    out.append(dict(subdict))
out
When working with complex data types such as nested lists and dictionaries, I would advise really utilizing the APIs provided by collections and itertools.
I am trying to convert a CSV file into a hierarchical JSON file. The CSV input is as follows; it contains two columns, gene and disease.
gene,disease
A1BG,Adenocarcinoma
A1BG,apnea
A1BG,Athritis
A2M,Asthma
A2M,Astrocytoma
A2M,Diabetes
NAT1,polyps
NAT1,lymphoma
NAT1,neoplasms
The expected output should be in the following format
{
"name": "A1BG",
"children": [
{"name": "Adenocarcinoma"},
{"name": "apnea"},
{"name": "Athritis"}
]
},
{
"name": "A2M",
"children": [
{"name": "Asthma"},
{"name": "Astrocytoma"},
{"name": "Diabetes"}
]
},
{
"name": "NAT1",
"children": [
{"name": "polyps"},
{"name": "lymphoma"},
{"name": "neoplasms"}
]
}
The Python code I have written is below. Let me know what I need to change to get the desired output.
import json

finalList = []
finalDict = {}
grouped = df.groupby(['gene'])
for key, value in grouped:
    dictionary = {}
    dictList = []
    anotherDict = {}
    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['name'] = j.at[0, 'gene']
    for i in j.index:
        anotherDict['disease'] = j.at[i, 'disease']
        dictList.append(anotherDict)
    dictionary['children'] = dictList
    finalList.append(dictionary)

with open('outputresult3.json', "w") as out:
    json.dump(finalList, out)
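For reference, df in the snippet above (and in the answers below) is assumed to be the CSV loaded with pandas; a minimal sketch, with a hypothetical filename:

import pandas as pd

# assumption: the gene/disease CSV shown above is saved as e.g. genes.csv
df = pd.read_csv("genes.csv")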
import json

json_data = []
# group the data by each unique gene
for gene, data in df.groupby("gene"):
    # obtain a list of diseases for the current gene
    diseases = data["disease"].tolist()
    # create a new list of dictionaries to satisfy json requirements
    children = [{"name": disease} for disease in diseases]
    entry = {"name": gene, "children": children}
    json_data.append(entry)

with open('outputresult3.json', "w") as out:
    json.dump(json_data, out)
Use DataFrame.groupby with a custom lambda function to convert the values to dictionaries with DataFrame.to_dict:
L = (df.rename(columns={'disease': 'name'})
       .groupby('gene')
       .apply(lambda x: x[['name']].to_dict('records'))
       .reset_index(name='children')
       .rename(columns={'gene': 'name'})
       .to_dict('records')
     )
print(L)
[{'name': 'A1BG', 'children': [{'name': 'Adenocarcinoma'},
{'name': 'apnea'},
{'name': 'Athritis'}]},
{'name': 'A2M', 'children': [{'name': 'Asthma'},
{'name': 'Astrocytoma'},
{'name': 'Diabetes'}]},
{'name': 'NAT1', 'children': [{'name': 'polyps'},
{'name': 'lymphoma'},
{'name': 'neoplasms'}]}]
with open('outputresult3.json', "w") as out:
json.dump(L,out)