How to merge dicts that share the same value in a list of dicts - python

Here is my list of dictionaries:
a = [{'ID': 1, 'FID': 938119, 'LEFT': 'A', 'LTIME': '1:10', 'RIGHT': '', 'RTIME': ''},
{'ID': 2, 'FID': 938119, 'LEFT': 'B', 'LTIME': '1:55', 'RIGHT': '', 'RTIME': ''},
{'ID': 3, 'FID': 938119, 'LEFT': '', 'LTIME': '', 'RIGHT': 'A', 'RTIME': '1:20'},
{'ID': 4, 'FID': 938119, 'LEFT': 'A', 'LTIME': '1:56', 'RIGHT': '', 'RTIME': ''},
{'ID': 5, 'FID': 938120, 'LEFT': 'A', 'LTIME': '1:36', 'RIGHT': '', 'RTIME': ''},
]
How can I get output like the one below? (Group by FID and LEFT, and sum LTIME and RTIME.)
b = [
{'ID': 1, 'FID': 938119, 'LEFT': 'A', 'LTIME': '3:06', 'RIGHT': 'A', 'RTIME': '1:20'},
{'ID': 2, 'FID': 938119, 'LEFT': 'B', 'LTIME': '1:55', 'RIGHT': '', 'RTIME': ''},
{'ID': 3, 'FID': 938120, 'LEFT': 'A', 'LTIME': '1:36', 'RIGHT': '', 'RTIME': ''},
]
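One possible approach, as a minimal sketch rather than a definitive answer. It assumes that a row belongs to the group named by whichever of LEFT or RIGHT is filled in (the expected output suggests this, since the RIGHT 'A' row is folded into the LEFT 'A' group), that the times are 'minutes:seconds' strings, and that IDs are simply renumbered. The helper names to_seconds, to_text and merge_rows are made up for this example.

```python
a = [
    {'ID': 1, 'FID': 938119, 'LEFT': 'A', 'LTIME': '1:10', 'RIGHT': '', 'RTIME': ''},
    {'ID': 2, 'FID': 938119, 'LEFT': 'B', 'LTIME': '1:55', 'RIGHT': '', 'RTIME': ''},
    {'ID': 3, 'FID': 938119, 'LEFT': '', 'LTIME': '', 'RIGHT': 'A', 'RTIME': '1:20'},
    {'ID': 4, 'FID': 938119, 'LEFT': 'A', 'LTIME': '1:56', 'RIGHT': '', 'RTIME': ''},
    {'ID': 5, 'FID': 938120, 'LEFT': 'A', 'LTIME': '1:36', 'RIGHT': '', 'RTIME': ''},
]


def to_seconds(text):
    """Convert 'M:SS' to seconds; an empty string counts as 0."""
    if not text:
        return 0
    minutes, seconds = text.split(':')
    return int(minutes) * 60 + int(seconds)


def to_text(total):
    """Convert seconds back to 'M:SS'; 0 becomes an empty string."""
    return f"{total // 60}:{total % 60:02d}" if total else ''


def merge_rows(rows):
    groups = {}  # (FID, name) -> running totals, kept in first-seen order
    for row in rows:
        # Assumption: the group name is whichever side is filled in.
        name = row['LEFT'] or row['RIGHT']
        group = groups.setdefault(
            (row['FID'], name),
            {'LEFT': '', 'RIGHT': '', 'ltime': 0, 'rtime': 0},
        )
        if row['LEFT']:
            group['LEFT'] = row['LEFT']
            group['ltime'] += to_seconds(row['LTIME'])
        if row['RIGHT']:
            group['RIGHT'] = row['RIGHT']
            group['rtime'] += to_seconds(row['RTIME'])
    # Renumber the merged rows starting from 1.
    return [
        {'ID': new_id, 'FID': fid,
         'LEFT': g['LEFT'], 'LTIME': to_text(g['ltime']),
         'RIGHT': g['RIGHT'], 'RTIME': to_text(g['rtime'])}
        for new_id, ((fid, _), g) in enumerate(groups.items(), start=1)
    ]


b = merge_rows(a)
```

Summing in seconds avoids string arithmetic: 1:10 plus 1:56 is 70 + 116 = 186 seconds, which formats back to 3:06.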

Related

Extract specific region from image using segmentation in python

I have a JSON file where the annotation is stored as follows:
{'licenses': [{'name': '', 'id': 0, 'url': ''}], 'info': {'contributor': '', 'date_created': '', 'description': '', 'url': '', 'version': '', 'year': ''}, 'categories': [{'id': 1, 'name': 'book', 'supercategory': ''}, {'id': 2, 'name': 'ceiling', 'supercategory': ''}, {'id': 3, 'name': 'chair', 'supercategory': ''}, {'id': 4, 'name': 'floor', 'supercategory': ''}, {'id': 5, 'name': 'object', 'supercategory': ''}, {'id': 6, 'name': 'person', 'supercategory': ''}, {'id': 7, 'name': 'screen', 'supercategory': ''}, {'id': 8, 'name': 'table', 'supercategory': ''}, {'id': 9, 'name': 'wall', 'supercategory': ''}, {'id': 10, 'name': 'window', 'supercategory': ''}, {'id': 11, 'name': '__background__', 'supercategory': ''}], 'images': [{'id': 1, 'width': 848, 'height': 480, 'file_name': '153058384000.png', 'license': 0, 'flickr_url': '', 'coco_url': '', 'date_captured': 0}], 'annotations': [{'id': 1, 'image_id': 1, 'category_id': 7, 'segmentation': [[591.81, 146.75, 848.0, 119.83, 848.0, 289.18, 606.39, 288.06]], 'area': 38747.0, 'bbox': [591.81, 119.83, 256.19, 169.35], 'iscrowd': 0, 'attributes': {'occluded': False}}]}
I want to select a specific region from the image using the 'segmentation': [[591.81, 146.75, 848.0, 119.83, 848.0, 289.18, 606.39, 288.06]] field within 'annotations' in the JSON file above.
I tried OpenCV and PIL, but I didn't get a usable result.
Note: segmentation may contain more than 8 coordinates.
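A minimal sketch of one way to do this with Pillow, assuming the segmentation list is a flat sequence of x, y pairs describing one polygon (which is how the bbox in the annotation lines up with it). The function names to_points, bounding_box and cut_region are invented for this example, and the list may hold any even number of coordinates.

```python
import math


def to_points(flat):
    """Turn [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


def bounding_box(points):
    """Return (left, top, right, bottom) around the polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def cut_region(image_path, flat):
    """Return the polygon region as an RGBA image, transparent outside it."""
    # Pillow is imported here so the helpers above work without it.
    from PIL import Image, ImageDraw

    points = to_points(flat)
    image = Image.open(image_path).convert("RGBA")
    mask = Image.new("L", image.size, 0)            # 0 = hidden everywhere
    ImageDraw.Draw(mask).polygon(points, fill=255)  # 255 = keep inside polygon
    image.putalpha(mask)
    left, top, right, bottom = bounding_box(points)
    # Crop to the polygon's bounding box, rounded outwards to whole pixels.
    return image.crop((math.floor(left), math.floor(top),
                       math.ceil(right), math.ceil(bottom)))


segmentation = [[591.81, 146.75, 848.0, 119.83, 848.0, 289.18, 606.39, 288.06]]
points = to_points(segmentation[0])
box = bounding_box(points)
```

With the image from the annotation on disk, something like cut_region('153058384000.png', segmentation[0]).save('region.png') should write the cut-out; that call is not run here.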

python data structure: list of dict to one dict

I have a data structure. It looks as follows:
data = [[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2A', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-A', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2A', 'name': 'child', 'steps': 1},
{'id': '3B', 'name': 'grandChild-B', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-C', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3B', 'name': 'grandChild-D', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3C', 'name': 'grandChild-E', 'steps': 2},
{'id': '4A', 'name': 'final', 'steps': 3}
],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2', 'name': 'child', 'steps': 1},
]
]
My expected output is:
output = {
"1" : {
"2A": {
"3A": "grandChild-A",
"3B": "grandChild-B"
},
"2B": {
"3A": "grandChild-C",
"3B": "grandChild-D",
"3C": {
"4A": "final"
}
},
"2":"child"
}
}
How can I do that? I wanted to use enumerate, but I always end up with everything directly inside "1".
Thanks in advance.
Update:
I have tried the following code:
parent = data[0][0]["id"]
dict_new = {}
dict_new[parent] = {}
for e in data:
    for idx, item in enumerate(e):
        display(item)
        if idx > 0:
            dict_new[parent][e[idx]["id"]] = e[idx]["name"]

You can try:
d = {}
root = d
for L in data:
    d = root
    for M in L[:-1]:
        d = d.setdefault(M["id"], {})
    d[L[-1]["id"]] = L[-1]['name']
The idea is to follow each list to build a tree (hence d.setdefault(M["id"], {})). The leaf is handled differently, because its value has to be the 'name'.
from pprint import pprint
pprint(root)
Output:
{'1': {'2': 'child',
'2A': {'3A': 'grandChild-A', '3B': 'grandChild-B'},
'2B': {'3A': 'grandChild-C',
'3B': 'grandChild-D',
'3C': {'4A': 'final'}}}}
The solution above won't work for the following input:
data = [[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-C', 'steps': 2}]]
Iterating over the second list tries to add a new element 3A -> grandChild-C to the d['1']['2B'] dict. But because of the first list, d['1']['2B'] is not a dict here but the string 'child'.
When we iterate over the elements, we check whether the key is already mapped and otherwise create a new dict (that is setdefault's job). We can also check whether the key is mapped to a str and, if so, replace the string with a fresh dict:
...
    for M in L[:-1]:
        if M["id"] not in d or isinstance(d[M["id"]], str):
            d[M["id"]] = {}
        d = d[M["id"]]
...
Output:
{'1': {'2B': {'3A': 'grandChild-C'}}}
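For reference, here is the patched loop written out as a complete function, as a sketch only. It adds one assumption that is not in the answer above: when a shorter list arrives after a longer one, the existing subtree is kept instead of being overwritten by the leaf name. The name build_tree is made up for this example.

```python
def build_tree(paths):
    root = {}
    for path in paths:
        node = root
        for item in path[:-1]:
            child = node.get(item["id"])
            # Missing key, or a leaf string left by an earlier, shorter path.
            if not isinstance(child, dict):
                child = node[item["id"]] = {}
            node = child
        leaf = path[-1]
        # Assumption: never overwrite an existing subtree with a leaf name.
        if not isinstance(node.get(leaf["id"]), dict):
            node[leaf["id"]] = leaf["name"]
    return root


data = [
    [{'id': '1', 'name': 'parent', 'steps': 0},
     {'id': '2B', 'name': 'child', 'steps': 1}],
    [{'id': '1', 'name': 'parent', 'steps': 0},
     {'id': '2B', 'name': 'child', 'steps': 1},
     {'id': '3A', 'name': 'grandChild-C', 'steps': 2}],
]

tree = build_tree(data)
tree_reversed = build_tree(data[::-1])
```

Both input orders give the same tree here, because a string is only ever replaced by a dict and never the other way round.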

I fixed your data (a comma was missing):
data = [[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2A', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-A', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2A', 'name': 'child', 'steps': 1},
{'id': '3B', 'name': 'grandChild-B', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3A', 'name': 'grandChild-C', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3B', 'name': 'grandChild-D', 'steps': 2}],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2B', 'name': 'child', 'steps': 1},
{'id': '3C', 'name': 'grandChild-E', 'steps': 2},
{'id': '4A', 'name': 'final', 'steps': 3}
],
[{'id': '1', 'name': 'parent', 'steps': 0},
{'id': '2', 'name': 'child', 'steps': 1},
]
]
And I came up with this code:
output = {}
#print(data)
for lis in data:
    o = output
    ln = len(lis) - 1
    for idx, d in enumerate(lis):
        id = d['id']
        if idx == ln:
            o[id] = d['name']
        else:
            if id not in o:
                o[id] = {}
            o = o[id]
print('Result:')
print(output)

Update JSON format from other JSON file

I have two data sets, a and b. I want to copy certain fields from b into a, matching the records by their unique id.
Data:
a= [{'id':'abc23','name':'aa','age':'22',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}},
{'id':'abc25','name':'bb','age':'32',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}},
{'id':'abc60','name':'cc','age':'24',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}}]
b=[{'id':'abc23','read':'2','speak':'abc','write':'2'},
{'id':'abc25','read':'3','speak':'def','write':'3'},
{'id':'abc60','read':'5','speak':'dgf','write':'1'}]
Code that I used to import from b to a:
from pprint import pprint
for dest in a:
    for source in b:
        if source['id'] == dest['id']:
            dest['data'].update(source)
pprint(a)
Output from the code that I used:
[{ 'age': '22',
'data': {'id': 'abc23', 'read': '2', 'speak': 'abc', 'write': '2'},
'id': 'abc23',
'name': 'aa',
'responses': {'a': 1, 'b': 2}},
{ 'age': '32',
'data': {'id': 'abc25', 'read': '3', 'speak': 'def', 'write': '3'},
'id': 'abc25',
'name': 'bb',
'responses': {'a': 1, 'b': 2}},
{ 'age': '24',
'data': {'id': 'abc60', 'read': '5', 'speak': 'dgf', 'write': '1'},
'id': 'abc60',
'name': 'cc',
'responses': {'a': 1, 'b': 2}}]
But this is the output that I want:
[{'age': '22',
'data': {'read': '2', 'speak': 'abc'},
'id': 'abc23',
'name': 'aa',
'responses': {'a': 1, 'b': 2}},
{'age': '32',
'data': {'read': '3', 'speak': 'def'},
'id': 'abc25',
'name': 'bb',
'responses': {'a': 1, 'b': 2}},
{'age': '24',
'data': {'read': '5', 'speak': 'dgf'},
'id': 'abc60',
'name': 'cc',
'responses': {'a': 1, 'b': 2}}]

It can't work the way you want with your code.
You do
dest['data'].update(source)
where source is
{'id':'abc23','read':'2','speak':'abc','write':'2'}
and dest['data'] is {'read':'','speak':''}.
update() copies every key-value pair of source into dest['data'], including 'id' and 'write', and keeps any existing keys that are not overwritten. To copy only the keys that dest['data'] already has, filter source first:
from pprint import pprint
for dest in a:
    for source in b:
        if source['id'] == dest['id']:
            dest['data'] = {k: v for k, v in source.items() if k in dest.get('data', {})}
pprint(a)
This version treats every field already present in dest['data'] as updatable. Depending on your use case, you might want to hardcode the list of fields instead.

One approach is to convert b into a dict keyed by id for easy lookup. Note that this also copies the 'write' field into data.
Example:
a= [{'id':'abc23','name':'aa','age':'22',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}},
{'id':'abc25','name':'bb','age':'32',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}},
{'id':'abc60','name':'cc','age':'24',
'data':{'read':'','speak':''},
'responses':{'a':1,'b':2}}]
b=[{'id':'abc23','read':'2','speak':'abc','write':'2'},
{'id':'abc25','read':'3','speak':'def','write':'3'},
{'id':'abc60','read':'5','speak':'dgf','write':'1'}]
b = {i.pop('id'): i for i in b}  # Convert to a dict: key = id, value = the remaining fields (read, speak, write)
for i in a:
    i['data'].update(b[i['id']])  # Update data in place
print(a)
Output:
[{'age': '22',
'data': {'read': '2', 'speak': 'abc', 'write': '2'},
'id': 'abc23',
'name': 'aa',
'responses': {'a': 1, 'b': 2}},
{'age': '32',
'data': {'read': '3', 'speak': 'def', 'write': '3'},
'id': 'abc25',
'name': 'bb',
'responses': {'a': 1, 'b': 2}},
{'age': '24',
'data': {'read': '5', 'speak': 'dgf', 'write': '1'},
'id': 'abc60',
'name': 'cc',
'responses': {'a': 1, 'b': 2}}]
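The lookup idea can be combined with key filtering so that only the fields already present in data are filled in. This is a sketch under two assumptions that are not in the answers above: b is left unmodified (no pop), and ids missing from b are silently skipped. The name lookup is made up for this example.

```python
a = [{'id': 'abc23', 'name': 'aa', 'age': '22',
      'data': {'read': '', 'speak': ''},
      'responses': {'a': 1, 'b': 2}},
     {'id': 'abc25', 'name': 'bb', 'age': '32',
      'data': {'read': '', 'speak': ''},
      'responses': {'a': 1, 'b': 2}},
     {'id': 'abc60', 'name': 'cc', 'age': '24',
      'data': {'read': '', 'speak': ''},
      'responses': {'a': 1, 'b': 2}}]
b = [{'id': 'abc23', 'read': '2', 'speak': 'abc', 'write': '2'},
     {'id': 'abc25', 'read': '3', 'speak': 'def', 'write': '3'},
     {'id': 'abc60', 'read': '5', 'speak': 'dgf', 'write': '1'}]

# id -> row of b; built once, so each record of a needs a single lookup.
lookup = {row['id']: row for row in b}

for dest in a:
    source = lookup.get(dest['id'], {})  # assumption: skip ids missing from b
    for key in dest['data']:
        # Only overwrite keys that data already has ('read', 'speak').
        if key in source:
            dest['data'][key] = source[key]
```

Compared with the nested loops, this does one pass over b and one over a, which matters once both lists grow.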

list of tuples to rdd with count using map reduce pyspark

I have an RDD similar to the following:
[('C3', [{'Item': 'Shirt', 'Color ': 'Black', 'Size': '32','Price':'2500'}, {'Item': 'Sweater', 'Color ': 'Red', 'Size': '35', 'Price': '1000'}, {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}]), ('C1', [{'Item': 'Shirt', 'Color ': 'Green', 'Size': '25', 'Price': '2000'}, {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'}, {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'}, {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}])]
The RDD above can be created with:
sc.parallelize([('C3', [{'Item': 'Shirt', 'Color ': 'Black', 'Size': '32','Price':'2500'}, {'Item': 'Sweater', 'Color ': 'Red', 'Size': '35', 'Price': '1000'}, {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}]), ('C1', [{'Item': 'Shirt', 'Color ': 'Green', 'Size': '25', 'Price': '2000'}, {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'}, {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'}, {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}])])
I need to create a DataFrame/RDD similar to the following (I am adding a count for every attribute value):
{'C1': {'Color ': {'Green': 3, 'Yellow': 1},
'Item': {'Jeans': 1, 'Saree': 2, 'Shirt': 1},
'Price': {'1500': 3, '2000': 1},
'Size': {'25': 3, '30': 1}},
'C3': {'Color ': {'Black': 1, 'Red': 1, 'Yellow': 1},
'Item': {'Jeans': 1, 'Shirt': 1, 'Sweater': 1},
'Price': {'1000': 1, '1500': 1, '2500': 1},
'Size': {'30': 1, '32': 1, '35': 1}}}
The corresponding DataFrame/RDD will be:
+-------+----------------------------------------------------------------------------------------------------+
|custo  |attr                                                                                                |
+-------+----------------------------------------------------------------------------------------------------+
|C1     |Map(Color -> Map(Green -> 3, yellow -> 1), Item -> Map(Jeans -> 1, Saree -> 2, Shirt ->1), Price -> |
+-------+----------------------------------------------------------------------------------------------------+

Use a udf to collect the counts.
from pyspark.sql import functions as f
from pyspark.sql import types as t
# df is assumed to hold the customer in col1 and the list of attribute dicts in col2
def count(c_dict):
    res = {}
    for item in c_dict:
        for key in item:
            if key in res:
                if item[key] in res[key]:
                    res[key][item[key]] += 1
                else:
                    res[key][item[key]] = 1
            else:
                res[key] = {}
                res[key][item[key]] = 1
    return res
schema = t.MapType(t.StringType(), t.MapType(t.StringType(), t.IntegerType()))
count_udf = f.udf(count, schema)
df2 = df.withColumn('col2', count_udf(df.col2))
df2.collect()
Result:
[Row(col1='C3', col2={'Size': {'35': 1, '30': 1, '32': 1}, 'Price': {'2500': 1, '1500': 1, '1000': 1}, 'Item': {'Sweater': 1, 'Jeans': 1, 'Shirt': 1}, 'Color ': {'Red': 1, 'Black': 1, 'Yellow': 1}}),
Row(col1='C1', col2={'Size': {'25': 3, '30': 1}, 'Price': {'1500': 3, '2000': 1}, 'Item': {'Jeans': 1, 'Saree': 2, 'Shirt': 1}, 'Color ': {'Green': 3, 'Yellow': 1}})]
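The counting function itself can be written more compactly with collections.Counter. The sketch below runs on a plain Python list that mirrors the RDD contents, so it needs no Spark session; the name count_attributes is made up for this example.

```python
from collections import Counter, defaultdict


def count_attributes(items):
    """Count how often each value occurs for every attribute key."""
    counts = defaultdict(Counter)
    for item in items:
        for key, value in item.items():
            counts[key][value] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {key: dict(counter) for key, counter in counts.items()}


# Same records as in the question, as a plain list instead of an RDD.
rows = [
    ('C3', [{'Item': 'Shirt', 'Color ': 'Black', 'Size': '32', 'Price': '2500'},
            {'Item': 'Sweater', 'Color ': 'Red', 'Size': '35', 'Price': '1000'},
            {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}]),
    ('C1', [{'Item': 'Shirt', 'Color ': 'Green', 'Size': '25', 'Price': '2000'},
            {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'},
            {'Item': 'Saree', 'Color ': 'Green', 'Size': '25', 'Price': '1500'},
            {'Item': 'Jeans', 'Color ': 'Yellow', 'Size': '30', 'Price': '1500'}]),
]

result = {customer: count_attributes(items) for customer, items in rows}
```

On the actual RDD, rdd.mapValues(count_attributes).collectAsMap() should produce the same mapping without going through a DataFrame; that call is not executed here.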

I want to make a dictionary in good order

I want to parse an Excel file and put the data into the model (User). First I want to make a dictionary that holds the Excel data.
I wrote this in views.py:
#coding:utf-8
from django.shortcuts import render
import xlrd
def try_to_int(arg):
    try:
        return int(arg)
    except (TypeError, ValueError):
        return arg
def main():
    book3 = xlrd.open_workbook('./data/excel1.xlsx')
    sheet3 = book3.sheet_by_index(0)
    data_dict = {}
    tag_list = sheet3.row_values(0)[1:]
    for row_index in range(1, sheet3.nrows):
        row = sheet3.row_values(row_index)[1:]
        row = list(map(try_to_int, row))
        data_dict[row_index] = dict(zip(tag_list, row))
    print(data_dict)
main()
and it printed out {1: {'A': '', 'area': 'New York', 'C': 0, 'name': 'Blear', 'B': ''}, 2: {'A': '', 'area': 'Chicago', 'C': '', 'name': '', 'B': 0}, 3: {'A': 0, 'area': 'London', 'C': '', 'name': '', 'B': ''}, 4: {'A': '', 'area': 'Singapore', 'C': '', 'name': 'Tom', 'B': ''}, 5: {'A': 0, 'area': 'Delhi', 'C': '', 'name': '', 'B': ''}, 6: {'A': '', 'area': 'Beijing', 'C': 1, 'name': '', 'B': ''}}
But I cannot understand why the keys of the output dictionary are out of order. I want to get a dictionary like this:
{1: {'name': 'Blear', 'area': 'New York', 'A': '', 'B': '', 'C': 0},
1: {'name': 'Blear', 'area': 'Chicago', 'A': '', 'B': 0, 'C': ''},
1: {'name': 'Blear', 'area': 'London', 'A': 50, 'B': 0, 'C': ''},
2: {'name': 'Tom', 'area': 'Singapore', 'A': '', 'B': '', 'C': ''}} ...
What is wrong in my code? How should I fix this?
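Plain dicts only guarantee insertion order from Python 3.7 on, so on older interpreters dict(zip(tag_list, row)) can come out shuffled as shown above. A minimal sketch using collections.OrderedDict follows. Two things in it are assumptions read off the desired output rather than stated in the question: an empty name cell means "same person as the row above", and the rows are returned as a list, because a dict cannot hold the key 1 more than once. The sample rows stand in for the values read with xlrd, and rows_to_records is a made-up name.

```python
from collections import OrderedDict


def rows_to_records(tag_list, rows):
    records = []
    last_name = ''
    for row in rows:
        # OrderedDict keeps the column order of tag_list on any Python version.
        record = OrderedDict(zip(tag_list, row))
        # Assumption: an empty 'name' cell repeats the name from the row above.
        if record['name'] == '':
            record['name'] = last_name
        last_name = record['name']
        records.append(record)
    return records


# Hypothetical stand-in for sheet3.row_values(...)[1:] after try_to_int.
tag_list = ['name', 'area', 'A', 'B', 'C']
rows = [
    ['Blear', 'New York', '', '', 0],
    ['', 'Chicago', '', 0, ''],
    ['', 'London', 0, '', ''],
    ['Tom', 'Singapore', '', '', ''],
]

records = rows_to_records(tag_list, rows)
```

If a per-user grouping is needed afterwards, the records can be collected into a dict of lists keyed by name.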
