I have a list of 10,000 dictionaries loaded from JSON that looks like:
my_list =
[
{"id": 1, "val": "A"},
{"id": 4, "val": "A"},
{"id": 1, "val": "C"},
{"id": 3, "val": "C"},
{"id": 1, "val": "B"},
{"id": 2, "val": "B"},
{"id": 4, "val": "C"},
{"id": 4, "val": "B"},
.
.
.
{"id": 10000, "val": "A"}
]
and I want my output to be:
mapped_list =
[
{"id": 1, "val": ["A", "B", "C"]},
{"id": 2, "val": ["B"]},
{"id": 3, "val": ["C"]},
{"id": 4, "val": ["A", "B", "C"]},
.
.
.
{"id": 10000, "val": ["A","C"]}
]
My goal is to map each "id" in the first list to its "val" entries and build the second list as efficiently as possible. So far my running time has not been great:
output = []
cache = {}
for unit in my_list:
    uid = unit['id']
    value = unit['val']
    if (uid in cache):
        output[uid][value].append(value)
    else:
        cache[uid] = 1
        output.append({'id' : uid, 'values': value})
My approach is to keep a record of which 'id' values I have already seen, to avoid iterating through two different lists. I believe my mistake is in how I handle nested dicts and lists of dicts. I have a feeling I can get this in O(n), if not better; O(n^2) is out of the question, since the input can easily grow by orders of magnitude.
Please enlighten me, I could use the help.
Or any other way of approaching this problem.
Perhaps map(), zip(), or tuple() would be a better approach for this. Let me know!
EDIT: I'm trying to accomplish this with only built-in functions. Also, the last dictionary is there to show that the data is not limited to what I have displayed: there are more ids than I can share, with "val" being some combination of A, B, C for whichever id it is associated with.
UPDATE:
This is my final solution. If there can be any improvements, let me know!
mapped_list = []
cache = {}  # maps each id to its index in mapped_list
for item in my_list:
    id = item['id']
    val = item['val']
    if id in cache:
        mapped_list[cache[id]]['val'].append(val)
    else:
        cache[id] = len(mapped_list)
        mapped_list.append({'id': id, 'val': [val]})
mapped_list.sort(key=lambda k: k['id'])
print(mapped_list)
my_list=[
{"id": 1, "val": 'A'},
{"id": 4, "val": "A"},
{"id": 1, "val": "C"},
{"id": 3, "val": "C"},
{"id": 1, "val": "B"},
{"id": 2, "val": "B"},
{"id": 4, "val": "C"},
{"id": 4, "val": "B"},
{"id": 10000, "val": "A"}
]
temp_dict = {}
for item in my_list:
    # look the keys up by name rather than relying on key order
    n, q = item['id'], item['val']
    if n not in temp_dict:
        temp_dict[n] = []
    temp_dict[n].append(q)
mapped_list = [{'id': n, 'val': q} for n, q in temp_dict.items()]
mapped_list = sorted(mapped_list, key=lambda x: x['id'])
print(mapped_list)
If the same val can appear more than once for an id and you want each value only once, you can use a set like this:
my_list = [
{"id": 1, "val": "A"},
{"id": 4, "val": "A"},
{"id": 1, "val": "C"},
{"id": 3, "val": "C"},
{"id": 1, "val": "B"},
{"id": 2, "val": "B"},
{"id": 4, "val": "C"},
{"id": 4, "val": "B"},
{"id": 10000, "val": "A"}
]
from collections import defaultdict
ddict = defaultdict(set)
for lst in my_list:
    ddict[lst['id']].add(lst['val'])
result = [{"id": k, "val": list(v)} for k, v in ddict.items()]
sorted(result, key=lambda x: x['id'])
[{'id': 1, 'val': ['C', 'A', 'B']},
{'id': 2, 'val': ['B']},
{'id': 3, 'val': ['C']},
{'id': 4, 'val': ['C', 'A', 'B']},
{'id': 10000, 'val': ['A']}]
Insertion and lookup in a dict (or defaultdict) and in a set are O(1) on average, and sorting is O(N log N), so the overall complexity is O(N + N log N) = O(N log N).
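Because sets are unordered, the order of the values inside each 'val' list in the output above is arbitrary. If first-seen order matters, a minimal sketch using only built-ins is to use a dict as an insertion-ordered set (this assumes Python 3.7+; the name group_unique_ordered and the sample data are illustrative, not from the answer):

```python
def group_unique_ordered(records):
    """Group vals by id, dropping duplicates but keeping first-seen order."""
    grouped = {}  # id -> dict used as an insertion-ordered set of vals
    for record in records:
        grouped.setdefault(record["id"], {})[record["val"]] = None
    # Only the distinct ids are sorted here, not the full input.
    return [{"id": key, "val": list(grouped[key])} for key in sorted(grouped)]


sample = [
    {"id": 1, "val": "C"},
    {"id": 4, "val": "A"},
    {"id": 1, "val": "A"},
    {"id": 1, "val": "C"},  # duplicate, ignored
    {"id": 2, "val": "B"},
]
print(group_unique_ordered(sample))
```

The grouping pass is still a single O(N) loop, so the overall cost stays dominated by the final sort of the ids.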
You could just use collections.defaultdict, like this:
>>> my_list
[{'id': 1, 'val': 'A'}, {'id': 4, 'val': 'A'}, {'id': 1, 'val': 'C'}, {'id': 3, 'val': 'C'}, {'id': 1, 'val': 'B'}, {'id': 2, 'val': 'B'}, {'id': 4, 'val': 'C'}, {'id': 4, 'val': 'B'}, {'id': 10000, 'val': 'A'}]
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for item in my_list:
...     d[item['id']].append(item['val'])
...
>>> mapped_list = [{'id': key, 'val': val} for key,val in d.items()]
>>> mapped_list = sorted(mapped_list, key=lambda x: x['id']) # just to make it always sorted by `id`
>>> import pprint
>>> pprint.pprint(mapped_list)
[{'id': 1, 'val': ['A', 'C', 'B']},
{'id': 2, 'val': ['B']},
{'id': 3, 'val': ['C']},
{'id': 4, 'val': ['A', 'C', 'B']},
{'id': 10000, 'val': ['A']}]
I think you won't be able to do it better than O(n*log(n)):
from collections import defaultdict
vals = defaultdict(list)
# sort by val first so each id's list comes out in sorted order
my_list.sort(key=lambda x: x['val'])
for i in my_list:
    vals[i['id']].append(i['val'])
output = [{'id': k, 'val': v} for k, v in vals.items()]
output.sort(key=lambda x: x['id'])
Output:
[{'id': 1, 'val': ['A', 'B', 'C']},
{'id': 2, 'val': ['B']},
{'id': 3, 'val': ['C']},
{'id': 4, 'val': ['A', 'B', 'C']},
{'id': 10000, 'val': ['A']}]
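Note that the answer above reorders my_list in place. If the caller's list should stay untouched, one possible variation (a sketch, not the answer's method; group_sorted is a hypothetical name) is to group first and then sort each id's values:

```python
from collections import defaultdict


def group_sorted(records):
    """Group vals by id without mutating the input, sorting each group afterwards."""
    vals = defaultdict(list)
    for record in records:
        vals[record["id"]].append(record["val"])
    # Sort the ids and each per-id list instead of sorting all records up front.
    return [{"id": key, "val": sorted(vals[key])} for key in sorted(vals)]


records = [
    {"id": 1, "val": "C"},
    {"id": 4, "val": "A"},
    {"id": 1, "val": "A"},
    {"id": 1, "val": "B"},
]
result = group_sorted(records)
print(result)
```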
I created mapped_list using setdefault:
d = {}
for i in my_list:
    d.setdefault(i['id'], []).append(i['val'])
mapped_list = [{'id': key, 'val': val} for key, val in sorted(d.items())]
print(mapped_list)
defaultdict generally performs better than setdefault, since setdefault builds a new empty list on every call.
I am posting this answer just to show another way of creating mapped_list.
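The performance difference is easy to measure on your own machine. A rough sketch with timeit follows; the synthetic data, the function names, and the number of runs are all assumptions, and the timings will vary by machine and Python version:

```python
import timeit
from collections import defaultdict

# Synthetic data (an assumption): 10,000 records spread over 2,500 ids.
records = [{"id": i % 2500, "val": "ABC"[i % 3]} for i in range(10_000)]


def with_setdefault(items):
    d = {}
    for item in items:
        d.setdefault(item["id"], []).append(item["val"])
    return d


def with_defaultdict(items):
    d = defaultdict(list)
    for item in items:
        d[item["id"]].append(item["val"])
    return d


# Both approaches build the same mapping; only the timing differs.
same_result = with_setdefault(records) == dict(with_defaultdict(records))
for func in (with_setdefault, with_defaultdict):
    seconds = timeit.timeit(lambda: func(records), number=20)
    print(f"{func.__name__}: {seconds:.4f}s for 20 runs")
```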
Related
I have a dictionary as below.
The key id appears multiple times inside the dictionary. I need to fill in the id value everywhere it appears, in a single line of code.
Currently I am writing multiple lines of code to fill the empty values.
dicts = {
"abc": {
"a":{"id": "", "id1":""},
"b":{"id": "","hey":"1223"},
"c":{"id": "","hello":"4564"}
},
"xyz": {
"d":{"id": "","id1":"", "ijk":"water"}
},
"f":{"id": ""},
"g":{"id1": ""}
}
id = 123
dicts['abc']['a']['id'] = id
dicts['abc']['b']['id'] = id
dicts['abc']['c']['id'] = id
dicts['xyz']['d']['id'] = id
dicts['f']['id'] = id
dicts
Output:
{'abc': {'a': {'id': 123, 'id1': ''},
         'b': {'id': 123, 'hey': '1223'},
         'c': {'id': 123, 'hello': '4564'}},
 'xyz': {'d': {'id': 123, 'id1': '', 'ijk': 'water'}},
 'f': {'id': 123},
 'g': {'id1': ''}}
You can solve it in place with a simple recursive function, for example:
id = 123
dicts = {
"abc": {
"a": {"id": "", "id1": ""},
"b": {"id": "", "hey": "1223"},
"c": {"id": "", "hello": "4564"}
},
"xyz": {
"d": {"id": "", "id1": "", "ijk": "water"}
},
"f": {"id": ""},
"g": {"id1": ""}
}
def process(dicts):
    for k, v in dicts.items():
        # fill only the empty 'id' values
        if k == 'id' and not dicts[k]:
            dicts[k] = id
        # recurse into nested dictionaries
        if isinstance(v, dict):
            process(v)
process(dicts)
print(dicts)
Output:
{
'abc': {'a': {'id': 123, 'id1': ''},
'b': {'id': 123, 'hey': '1223'},
'c': {'id': 123, 'hello': '4564'}},
'xyz': {'d': {'id': 123, 'id1': '', 'ijk': 'water'}},
'f': {'id': 123}, 'g': {'id1': ''}
}
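The function above reads the value from a global named id, which also shadows the built-in id(). A possible variation, sketched here with hypothetical names (fill_key, data), passes the key and value in as parameters instead:

```python
def fill_key(node, key, value):
    """Recursively set every empty `key` in a nested dict to `value`, in place."""
    for k, v in node.items():
        if isinstance(v, dict):
            fill_key(v, key, value)
        elif k == key and not v:
            node[k] = value
    return node


data = {
    "abc": {"a": {"id": "", "id1": ""}, "b": {"id": "", "hey": "1223"}},
    "f": {"id": ""},
    "g": {"id1": ""},
}
fill_key(data, "id", 123)
print(data)
```

Since the function returns the dict it was given, the call itself is the single line asked for in the question: fill_key(dicts, "id", 123).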
I have the following list of dictionaries:
[
{"id": 1, "roll_id": ["101", "201"]},
{"id": 2, "roll_id": ["301", "201"]},
{"id": 3, "roll_id": ["424"]}
]
Now I need to convert this into the following format:
[
{'roll_id': '101', 'id':["1"]},
{'roll_id': '201', 'id':["1","2"]},
{'roll_id': '301', 'id':["2"]},
{'roll_id': '424', 'id':["3"]}
]
Can anyone help me, please?
You can use a dictionary+setdefault to collect the values, then convert to list:
out = {}
for d in l:
    for RID in d['roll_id']:
        out.setdefault(RID, {'roll_id': RID, 'id': []})['id'].append(d['id'])
out = list(out.values())
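For reference, a self-contained sketch of the same setdefault idea. It assumes the ids should be converted to strings, as in the desired output of the question, and sorts the result by roll_id; the name invert_roll_ids is illustrative:

```python
def invert_roll_ids(rows):
    """Map each roll_id to the list of ids (as strings) whose rows contain it."""
    out = {}
    for row in rows:
        for roll_id in row["roll_id"]:
            entry = out.setdefault(roll_id, {"roll_id": roll_id, "id": []})
            entry["id"].append(str(row["id"]))
    return sorted(out.values(), key=lambda entry: entry["roll_id"])


rows = [
    {"id": 1, "roll_id": ["101", "201"]},
    {"id": 2, "roll_id": ["301", "201"]},
    {"id": 3, "roll_id": ["424"]},
]
inverted = invert_roll_ids(rows)
print(inverted)
```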
Another solution using pandas:
l = [
{"id": 1, "roll_id": ["101", "201"]},
{"id": 2, "roll_id": ["301", "201"]},
{"id": 3, "roll_id": ["424"]}
]
import pandas as pd
out = (pd
.json_normalize(l)
.explode('roll_id')
.groupby('roll_id', as_index=False)
['id'].agg(list)
.to_dict('records')
)
Output:
[{'roll_id': '101', 'id': [1]},
{'roll_id': '201', 'id': [1, 2]},
{'roll_id': '301', 'id': [2]},
{'roll_id': '424', 'id': [3]}]
Try this:
data = [{"id": 1, "roll_id": ["101", "201"]}, {"id": 2, "roll_id": ["301", "201"]}, {"id": 3, "roll_id": ["424"]}]
res = []
for el in data:
    for r in el["roll_id"]:
        # indices of existing entries for this roll_id (at most one)
        find = [i for i, v in enumerate(res) if v['roll_id'] == r]
        if not find:
            res.append({'roll_id': r, "id": [el['id']]})
        else:
            res[find[0]]['id'].append(el['id'])
print(res)
Result:
[{'roll_id': '101', 'id': [1]}, {'roll_id': '201', 'id': [1, 2]}, {'roll_id': '301', 'id': [2]}, {'roll_id': '424', 'id': [3]}]
Not my best, but it works.
Regards,
I think the most efficient way of doing it could look something like this:
def convert(input_dict):
    result = {}
    for dic in input_dict:
        for roll_id in dic["roll_id"]:
            # the desired output stores ids as strings
            str_id = str(dic["id"])
            if roll_id in result:
                result[roll_id]["id"].append(str_id)
            else:
                result[roll_id] = {"roll_id": roll_id, "id": [str_id]}
    return list(result.values())
# test is the input list of dictionaries from the question
print(convert(test))
I have a nested dict like this:
my_dict= {'name1': {'code1': {'brand1': 2}},'name2': {'code2.1': {'brand2.1': 2,'brand2.2': 8,'brand2.3': 5, 'brand2.4': 4},'code2.2': {'brand2.1': 2, 'brand1': 1, 'brand2.5': 25}},'name3': {'code1': {'brand2.1': 2},'code3': {'brand4': 1,'brand3.1':2}}}
I need to sort at the "code" level, in descending order of the sum of the "brand" values. For example:
target_dict= {'name1': {'code1': {'brand1': 2}}, 'name2': {'code2.2': {'brand2.1':2,'brand1': 1,'brand2.5': 25},'code2.1': {'brand2.1': 2,'brand2.2': 8,'brand2.3': 5,'brand2.4': 4}}, 'name3': {'code3': {'brand4': 1, 'brand3.1':2},'code1': {'brand2.1': 2}}}
# 'code2.2' comes first because 2+1+25=28 > 2+8+5+4=19
# 'code3' comes first because 1+2=3 > 2
I can sum the "brand" values per "code" with
sum_values = [[[i, sum(v[i].values())] for i in v.keys()] for x,y in v.items() for k,v in my_dict.items()]
and I tried to combine it with the sort function as
target_dict = sorted(my_dict.items(), key=lambda i: [[[i, sum(v[i].values())] for i in v.keys()] for x,y in v.items() for k,v in my_dict.items()], reverse=True)
Thanks for your attention and help!
Try this (assuming Python 3.7+, where dicts preserve insertion order):
my_dict = {
"name1": {"code1": {"brand1": 2}},
"name2": {
"code2.1": {"brand2.1": 2, "brand2.2": 8, "brand2.3": 5, "brand2.4": 4},
"code2.2": {"brand2.1": 2, "brand1": 1, "brand2.5": 25},
},
"name3": {"code1": {"brand2.1": 2}, "code3": {"brand4": 1, "brand3.1": 2}},
}
out = {
    k: dict(sorted(v.items(), key=lambda d: sum(d[1].values()), reverse=True))
    for k, v in my_dict.items()
}
print(out)
Prints:
{
"name1": {"code1": {"brand1": 2}},
"name2": {
"code2.2": {"brand2.1": 2, "brand1": 1, "brand2.5": 25},
"code2.1": {"brand2.1": 2, "brand2.2": 8, "brand2.3": 5, "brand2.4": 4},
},
"name3": {"code3": {"brand4": 1, "brand3.1": 2}, "code1": {"brand2.1": 2}},
}
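To check the ordering against the sums mentioned in the question, the per-code totals can be computed alongside the sort. A small sketch, where sort_codes_by_total and the trimmed-down sample are illustrative names and data:

```python
def sort_codes_by_total(nested):
    """Sort each name's codes by the descending sum of their brand values."""
    return {
        name: dict(sorted(codes.items(), key=lambda kv: sum(kv[1].values()), reverse=True))
        for name, codes in nested.items()
    }


sample = {
    "name2": {
        "code2.1": {"brand2.1": 2, "brand2.2": 8, "brand2.3": 5, "brand2.4": 4},
        "code2.2": {"brand2.1": 2, "brand1": 1, "brand2.5": 25},
    },
}
# Per-code totals, the quantity the question tried to compute with sum_values.
totals = {
    name: {code: sum(brands.values()) for code, brands in codes.items()}
    for name, codes in sample.items()
}
ordered_codes = list(sort_codes_by_total(sample)["name2"])
print(totals)
print(ordered_codes)
```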
I would like to define a Python function that takes a list of dictionaries in which some values may be lists, and returns a list of lists of dictionaries in which every value is a single item. The result should cover all combinations of options, where an option means picking a single value from each list.
Consider the following input:
input = [
    {
        "name": "A",
        "option1": [1, 2],
        "option2": ["a1", "a2"]
    },
    {
        "name": "B",
        "option1": [3, 4],
        "option2": "b1"
    }
]
Given this input, the desired output would be:
output = [[{"name": "A", "option1": 1, "option2": "a1"}, {"name": "B", "option1": 3, "option2": "b1"}],
          [{"name": "A", "option1": 1, "option2": "a1"}, {"name": "B", "option1": 4, "option2": "b1"}],
          [{"name": "A", "option1": 1, "option2": "a2"}, {"name": "B", "option1": 3, "option2": "b1"}],
          [{"name": "A", "option1": 1, "option2": "a2"}, {"name": "B", "option1": 4, "option2": "b1"}],
          [{"name": "A", "option1": 2, "option2": "a1"}, {"name": "B", "option1": 3, "option2": "b1"}],
          [{"name": "A", "option1": 2, "option2": "a1"}, {"name": "B", "option1": 4, "option2": "b1"}],
          [{"name": "A", "option1": 2, "option2": "a2"}, {"name": "B", "option1": 3, "option2": "b1"}],
          [{"name": "A", "option1": 2, "option2": "a2"}, {"name": "B", "option1": 4, "option2": "b1"}]]
Let's try something like this.
import itertools
input = [
{
"name": "A",
"option1": [1, 2],
"option2": ["a1", "a2"]
},
{
"name": "B",
"option1": [3, 4],
"option2": "b1"
}
]
list_opts = [y
for x in input
for y in [x['option1']
if type(x['option1']) == list
else [x['option1']],
x['option2']
if type(x['option2']) == list
else [x['option2']]]]
list_combinations = list(itertools.product(*list_opts))
output = [[{"name": "A",
"option1": x[0],
"option2": x[1]},
{"name": "B",
"option1": x[2],
"option2": x[3]}]
for x in list_combinations]
Here's what I got.
[[{'name': 'A', 'option1': 1, 'option2': 'a1'},
{'name': 'B', 'option1': 3, 'option2': 'b1'}],
[{'name': 'A', 'option1': 1, 'option2': 'a1'},
{'name': 'B', 'option1': 4, 'option2': 'b1'}],
[{'name': 'A', 'option1': 1, 'option2': 'a2'},
{'name': 'B', 'option1': 3, 'option2': 'b1'}],
[{'name': 'A', 'option1': 1, 'option2': 'a2'},
{'name': 'B', 'option1': 4, 'option2': 'b1'}],
[{'name': 'A', 'option1': 2, 'option2': 'a1'},
{'name': 'B', 'option1': 3, 'option2': 'b1'}],
[{'name': 'A', 'option1': 2, 'option2': 'a1'},
{'name': 'B', 'option1': 4, 'option2': 'b1'}],
[{'name': 'A', 'option1': 2, 'option2': 'a2'},
{'name': 'B', 'option1': 3, 'option2': 'b1'}],
[{'name': 'A', 'option1': 2, 'option2': 'a2'},
{'name': 'B', 'option1': 4, 'option2': 'b1'}]]
While George's answer might produce the correct output, it could become problematic when extending the solution to more options.
Here's what I came up with:
import copy
import itertools
input = [
{
"name": "A",
"option1": [1, 2],
"option2": ["a1", "a2"]
},
{
"name": "B",
"option1": [3, 4],
"option2": "b1"
}
]
def dictContainsListVals(dic):
    return any(isinstance(val, list) for val in dic.values())
def splitDict(dic):
    flattenedDicts = [dic]
    # keep expanding until no dict in the queue has a list value
    while any(dictContainsListVals(d) for d in flattenedDicts):
        current = flattenedDicts.pop(0)
        for key, value in current.items():
            if isinstance(value, list):
                # queue one copy per list element, then stop scanning this dict
                for el in value:
                    dictCopy = copy.deepcopy(current)
                    dictCopy[key] = el
                    flattenedDicts.append(dictCopy)
                break
        else:
            # no list values in this dict: put it back unchanged
            flattenedDicts.append(current)
    return flattenedDicts
flattenedDicts = []
for dic in input:
    flattenedDicts.append(splitDict(dic))
output = itertools.product(*flattenedDicts)
print(list(output))
Output:
[({'name': 'A', 'option1': 1, 'option2': 'a1'}, {'name': 'B', 'option1': 3, 'option2': 'b1'}),
({'name': 'A', 'option1': 1, 'option2': 'a1'}, {'name': 'B', 'option1': 4, 'option2': 'b1'}),
({'name': 'A', 'option1': 1, 'option2': 'a2'}, {'name': 'B', 'option1': 3, 'option2': 'b1'}),
({'name': 'A', 'option1': 1, 'option2': 'a2'}, {'name': 'B', 'option1': 4, 'option2': 'b1'}),
({'name': 'A', 'option1': 2, 'option2': 'a1'}, {'name': 'B', 'option1': 3, 'option2': 'b1'}),
({'name': 'A', 'option1': 2, 'option2': 'a1'}, {'name': 'B', 'option1': 4, 'option2': 'b1'}),
({'name': 'A', 'option1': 2, 'option2': 'a2'}, {'name': 'B', 'option1': 3, 'option2': 'b1'}),
({'name': 'A', 'option1': 2, 'option2': 'a2'}, {'name': 'B', 'option1': 4, 'option2': 'b1'})]
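The same result can also be sketched with itertools.product applied twice: once inside each dictionary to expand its list values, and once across the dictionaries. This is an alternative to the queue-based splitting above, not a rewrite of it; expand_options and all_combinations are hypothetical names, and it assumes list values should always be treated as options:

```python
import itertools


def expand_options(record):
    """Return every single-valued variant of a dict whose values may be lists."""
    keys = list(record)
    # Wrap scalars in one-element lists so every key contributes a choice list.
    choices = [v if isinstance(v, list) else [v] for v in record.values()]
    return [dict(zip(keys, combo)) for combo in itertools.product(*choices)]


def all_combinations(records):
    """Combine the variants of each dict into lists, one dict per input record."""
    per_record = [expand_options(record) for record in records]
    return [list(combo) for combo in itertools.product(*per_record)]


records = [
    {"name": "A", "option1": [1, 2], "option2": ["a1", "a2"]},
    {"name": "B", "option1": [3, 4], "option2": "b1"},
]
combos = all_combinations(records)
print(len(combos))
print(combos[0])
```

This works for any number of option keys and returns lists rather than tuples, matching the shape requested in the question.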
I have a list of dictionaries that I wish to manipulate using Pandas. Say:
m = [{"topic": "A", "type": "InvalidA", "count": 1}, {"topic": "A", "type": "InvalidB", "count": 1}, {"topic": "A", "type": "InvalidA", "count": 1}, {"topic": "B", "type": "InvalidA", "count": 1}, {"topic": "B", "type": "InvalidA", "count": 1}, {"topic": "B", "type": "InvalidB", "count": 1}]
1) First, create a dataframe using the constructor:
import pandas as pd
df = pd.DataFrame(m)
2) Group by the columns 'topic' and 'type' and count:
df_group = df.groupby(['topic', 'type']).count()
I end up with:
                count
topic type
A     InvalidA      2
      InvalidB      1
B     InvalidA      2
      InvalidB      1
I want to now convert this to a nested dict:
{ "A" : {"InvalidA" : 2,
"InvalidB" : 1},
"B" : {"InvalidA" : 2,
"InvalidB": 1}
}
Any suggestions on how to get from df_group to a nested dict?
Using unstack + to_dict:
df_group['count'].unstack(0).to_dict()
Out[446]: {'A': {'InvalidA': 2, 'InvalidB': 1}, 'B': {'InvalidA': 2, 'InvalidB': 1}}
You can also replace the groupby with crosstab:
pd.crosstab(df.type,df.topic).to_dict()
Out[449]: {'A': {'InvalidA': 2, 'InvalidB': 1}, 'B': {'InvalidA': 2, 'InvalidB': 1}}
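As a cross-check that does not depend on pandas, the same nested counts can be sketched with the standard library. This counts rows per (topic, type) pair, mirroring what groupby(...).count() reports for this data; the name nested_counts is illustrative:

```python
from collections import Counter, defaultdict


def nested_counts(rows):
    """Count rows per topic and type, returning plain nested dicts."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["topic"]][row["type"]] += 1
    # Convert to ordinary dicts so the result prints like the target structure.
    return {topic: dict(types) for topic, types in counts.items()}


m = [
    {"topic": "A", "type": "InvalidA", "count": 1},
    {"topic": "A", "type": "InvalidB", "count": 1},
    {"topic": "A", "type": "InvalidA", "count": 1},
    {"topic": "B", "type": "InvalidA", "count": 1},
    {"topic": "B", "type": "InvalidA", "count": 1},
    {"topic": "B", "type": "InvalidB", "count": 1},
]
print(nested_counts(m))
```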