I want to remove nan values from dictionaries and lists. If nan is the only value in a list, as for the 'work' Phone_Type below, then the whole dictionary should be removed.
Input Data
dic = {'Customer_Number': 12345, 'Email': [{'Email_Type': 'Primary', 'Email': ['sa#ru.edu', nan]}]
,'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [1217]}, {'Phone_Type': 'work', 'Phone': [nan]}]}
Expected Output
{'Customer_Number': 12345, 'Email': [{'Email_Type': 'Primary', 'Email': ['sa#ru.edu']}]
,'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [1217]}]}
Code tried:
for i in range(0, len(dic)):
    for j in dic[i][key]:
        print("j key:",j)
        print("j",j[value[1]])
        if (pd.isna(j[value[1]])):
            print("nan condition")
            dic[i][value[1]].remove(j)
        else:
            null_val_dict_removal.append(j)
    dic[i][key] = null_val_dict_removal
    print("dict key", dic[i][key])
    null_val_dict_removal = []
Getting error :
if (pd.isna(j[value[1]])):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
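The error occurs because pd.isna returns an element-wise boolean array when it is given a list, and an array with more than one element cannot be used directly as an if condition. Instead of patching the loop, you can create a recursive function that removes nan values and empty lists/dictionaries: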
import json
from numpy import nan  # only nan is used below; the NaN alias was removed in NumPy 2.0
dic = {'Customer_Number': 92154246, 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu', nan]}]
,'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280.0]}, {'Phone_Type': 'work', 'Phone': [nan]}]}
# define which elements you want to remove:
to_be_deleted = [[], {}, "", None, nan]
def remove_empty_elements(jsonData):
    if isinstance(jsonData, list):
        jsonData = [new_elem for elem in jsonData
                    if (new_elem := remove_empty_elements(elem)) not in to_be_deleted]
    elif isinstance(jsonData, dict):
        jsonData = {key: new_value for key, value in jsonData.items()
                    if (new_value := remove_empty_elements(value)) not in to_be_deleted}
        # a dict left with a single key (e.g. only 'Phone_Type') is treated as empty
        if len(jsonData) == 1:
            return None
    return jsonData
new_dic = remove_empty_elements(dic)
print(json.dumps(new_dic, indent=4))
Output:
{
"Customer_Number": 92154246,
"Email": [
{
"Email_Type": "Primary",
"Email": [
"saman.zonouz#rutgers.edu"
]
}
],
"Phone_Number": [
{
"Phone_Type": "Mobile",
"Phone": [
12177218280.0
]
}
]
}
Edit: for Python < 3.8, which has no assignment expressions (the := operator), call the function twice in each comprehension instead:
def remove_empty_elements(jsonData):
    if isinstance(jsonData, list):
        jsonData = [remove_empty_elements(elem) for elem in jsonData
                    if remove_empty_elements(elem) not in to_be_deleted]
    elif isinstance(jsonData, dict):
        jsonData = {key: remove_empty_elements(value) for key, value in jsonData.items()
                    if remove_empty_elements(value) not in to_be_deleted}
        if len(jsonData) == 1:
            return None
    return jsonData
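One caveat with the "not in to_be_deleted" test: list membership checks identity before equality, and NaN never compares equal to itself, so only the very same nan object that is stored in to_be_deleted is matched. A NaN created elsewhere (for example float('nan')) would slip through. Below is a minimal sketch of a variant that tests with math.isnan instead, assuming the data consists of plain Python scalars, lists and dicts; the names is_missing and remove_missing are made up for this illustration.

```python
import math


def is_missing(value):
    """Return True for None, empty containers/strings, and any float NaN."""
    if isinstance(value, float) and math.isnan(value):
        return True
    return value is None or value in ([], {}, "")


def remove_missing(data):
    """Recursively drop missing values from nested lists and dicts."""
    if isinstance(data, list):
        cleaned = [remove_missing(elem) for elem in data]
        return [elem for elem in cleaned if not is_missing(elem)]
    if isinstance(data, dict):
        cleaned = {key: remove_missing(value) for key, value in data.items()}
        cleaned = {key: value for key, value in cleaned.items()
                   if not is_missing(value)}
        # Same heuristic as the answer above: a dict left with a single key
        # (e.g. only 'Phone_Type') is treated as empty and removed.
        return None if len(cleaned) == 1 else cleaned
    return data
```

The single-key rule is kept from the answer above, so any dict reduced to one key is dropped (including a top-level dict), which may be too aggressive for other data.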
Related
I want to remove a dictionary if the value for a key is nan/None/NaN/null. If nan is the only value, as for the 'work' Phone_Type below, then the whole dictionary should be removed.
Input Data
dic = {'Customer_Number': 12345,'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [1217]}, {'Phone_Type': 'work', 'Phone': [nan]}]}
Expected Output Data
dic = {'Customer_Number': 12345,'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [1217]}]}
code tried
#define which elements you want to remove:
to_be_deleted = [[], {}, "", None, "nan",nan, "NaN", NaN]
def remove_empty_elements(jsonData):
    if isinstance(jsonData, list):
        print("jsonDAta:", jsonData)
        jsonData = [remove_empty_elements(elem) for elem in jsonData
                    if remove_empty_elements(elem) not in to_be_deleted]
    elif isinstance(jsonData, dict):
        jsonData = {key: remove_empty_elements(value) for key, value in jsonData.items()
                    if remove_empty_elements(value) not in to_be_deleted}
        if len(jsonData) == 1:
            return None
    return jsonData
res = remove_empty_elements(dic)
Example solution:
to_delete = [ None, [None], "", None, "nan", "NaN"]
dic = {'Customer_Number': 12345,'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [1217]}, {'Phone_Type': 'work', 'Phone': [None]}]}
for key in dic.keys():
    if isinstance(dic[key], list):
        for item in dic[key]:
            if isinstance(item, dict):
                # copy the keys first so entries can be deleted while looping
                item_keys = list(item.keys())
                for key_2 in item_keys:
                    if item[key_2] in to_delete:
                        del item[key_2]
output
{'Customer_Number': 12345, 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [1217]}, {'Phone_Type': 'work'}]}
Adapt this to catch more cases. It only works when the top-level values are lists of dictionaries, and, as the output shows, it leaves {'Phone_Type': 'work'} in place rather than removing the whole dictionary.
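To get the output the question asks for, the inner dictionary itself has to go when its list holds nothing but missing values. Here is a rough sketch along the same lines as the loop above, still assuming that top-level values are lists of dictionaries. The names prune and MISSING are illustrative, and a real float NaN would need an extra math.isnan check.

```python
MISSING = (None, "", "nan", "NaN")  # illustrative; extend as needed


def prune(dic):
    """Return a copy of dic without missing values or emptied inner dicts."""
    result = {}
    for key, value in dic.items():
        if not isinstance(value, list):
            result[key] = value
            continue
        kept = []
        for item in value:
            if not isinstance(item, dict):
                kept.append(item)
                continue
            # Strip missing values from every list inside the inner dict.
            cleaned = {
                k: [x for x in v if x not in MISSING] if isinstance(v, list) else v
                for k, v in item.items()
            }
            # Drop the inner dict when any of its lists is empty after cleaning.
            if all(v != [] for v in cleaned.values()):
                kept.append(cleaned)
        result[key] = kept
    return result
```

Unlike the loop above, this builds a new dictionary and leaves the input untouched.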
I want to remove duplicate values from lists that are nested inside a dictionary. I am trying to write configurable code that works on any field rather than field-specific code.
Input Data :
{'Customer_Number': 90617174, 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu', 'saman.zonouz#rutgers.edu']}], 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280, 12177218280]}]}
Expected Output Data :
{'Customer_Number': 90617174, 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu']}], 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280]}]}
code tried:
dic = {'Customer_Number': 90617174, 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu', 'saman.zonouz#rutgers.edu']}], 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280, 12177218280]}]}
res = []
for i in dic:
    if i not in res:
        res.append(i)
You can use set()
import json
dic = {
'Customer_Number': 90617174,
'Email': [
{
'Email_Type': 'Primary',
'Email': list(set([
'saman.zonouz#rutgers.edu',
'saman.zonouz#rutgers.edu',
]))
}
],
'Phone_Number': [
{
'Phone_Type': 'Mobile',
'Phone': list(set([
12177218280,
12177218280,
]))
}
]
}
print(json.dumps(dic,indent=2))
If you want to do it on a list of dicts, you can do it like this:
for dic in dics:
    for email in dic['Email']:
        email['Email'] = list(set(email['Email']))
    for phone in dic['Phone_Number']:
        phone['Phone'] = list(set(phone['Phone']))
The approach that you started with, you need to go a few levels deeper with that to find every such "repeating" list and dedupe it.
To dedupe, you can use a set - which is also a "container" data structure like a list but with some (many?) differences. You can get a good introduction to all of this in the official python docs -
for key in dic:
    if isinstance(dic[key], list):
        for inner_dict in dic[key]:
            for inner_key in inner_dict:
                if isinstance(inner_dict[inner_key], list):
                    inner_dict[inner_key] = list(set(inner_dict[inner_key]))
print(dic)
#{'Customer_Number': 90617174,
# 'Email': [{'Email_Type': 'Primary', 'Email': ['saman.zonouz#rutgers.edu']}],
# 'Phone_Number': [{'Phone_Type': 'Mobile', 'Phone': [12177218280]}]}
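Note that set() does not preserve the original order of the list. If order matters, dict.fromkeys removes duplicates while keeping first occurrences, and a recursive walk avoids hard-coding field names or nesting depth. A minimal sketch, assuming the lists to dedupe contain only hashable scalars (dedupe is an illustrative name):

```python
def dedupe(data):
    """Recursively remove duplicate scalars from every list, keeping order."""
    if isinstance(data, dict):
        return {key: dedupe(value) for key, value in data.items()}
    if isinstance(data, list):
        items = [dedupe(item) for item in data]
        # Only dedupe lists made purely of hashable scalars; lists holding
        # dicts or nested lists keep all of their (already processed) items.
        if all(isinstance(item, (str, int, float)) for item in items):
            return list(dict.fromkeys(items))
        return items
    return data
```

Because it returns a new structure, the original dic is left unchanged; assign the result back if in-place behaviour is wanted.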
I have the following array of dicts (there's only one dict):
[{
'RuntimeInMinutes': '21',
'EpisodeNumber': '21',
'Genres': ['Animation'],
'ReleaseDate': '2005-02-05',
'LanguageOfMetadata': 'EN',
'Languages': [{
'_Key': 'CC',
'Value': ['en']
}, {
'_Key': 'Primary',
'Value': ['EN']
}],
'Products': [{
'URL': 'http://www.hulu.com/watch/217566',
'Rating': 'TV-Y',
'Currency': 'USD',
'SUBSCRIPTION': '0.00',
'_Key': 'US'
}, {
'URL': 'http://www.hulu.com/d/217566',
'Rating': 'TV-Y',
'Currency': 'USD',
'SUBSCRIPTION': '0.00',
'_Key': 'DE'
}],
'ReleaseYear': '2005',
'TVSeriesID': '5638#TVSeries',
'Type': 'TVEpisode',
'Studio': '4K Media'
}]
I would like to flatten the dict as follows:
[{
'RuntimeInMinutes': '21',
'EpisodeNumber': '21',
'Genres': ['Animation'],
'ReleaseDate': '2005-02-05',
'LanguageOfMetadata': 'EN',
'Languages._Key': ['CC', 'Primary'],
'Languages.Value': ['en', 'EN'],
'Products.URL': ['http://www.hulu.com/watch/217566', 'http://www.hulu.com/d/217566'],
'Products.Rating': ['TV-Y', 'TV-Y'],
'Products.Currency': ['USD', 'USD'],
'Products.SUBSCRIPTION': ['0.00', '0.00'],
'Products._Key': ['US', 'DE'],
'ReleaseYear': '2005',
'TVSeriesID': '5638#TVSeries',
'Type': 'TVEpisode',
'Studio': '4K Media'
}]
In other words, any time a dict is encountered, it needs to be converted to either a string, a number, or a list.
What I currently have is something along the lines of the following, which uses a while loop to iterate through all the subpaths of the json.
while True:
    for key in copy(keys):
        val = get_sub_object_from_path(obj, key)
        if isinstance(val, dict):
            FLAT_OBJ[key.replace('/', '.')] = val
        else:
            keys.extend(os.path.join(key, _nextkey) for _nextkey in val.keys())
        keys.remove(key)
    if (not keys) or (n > 5):
        break
    else:
        n += 1
        continue
You can use recursion with a generator:
from collections import defaultdict
_d = [{'RuntimeInMinutes': '21', 'EpisodeNumber': '21', 'Genres': ['Animation'], 'ReleaseDate': '2005-02-05', 'LanguageOfMetadata': 'EN', 'Languages': [{'_Key': 'CC', 'Value': ['en']}, {'_Key': 'Primary', 'Value': ['EN']}], 'Products': [{'URL': 'http://www.hulu.com/watch/217566', 'Rating': 'TV-Y', 'Currency': 'USD', 'SUBSCRIPTION': '0.00', '_Key': 'US'}, {'URL': 'http://www.hulu.com/d/217566', 'Rating': 'TV-Y', 'Currency': 'USD', 'SUBSCRIPTION': '0.00', '_Key': 'DE'}], 'ReleaseYear': '2005', 'TVSeriesID': '5638#TVSeries', 'Type': 'TVEpisode', 'Studio': '4K Media'}]
def get_vals(d, _path = []):
    for a, b in getattr(d, 'items', lambda :{})():
        if isinstance(b, list) and all(isinstance(i, dict) or isinstance(i, list) for i in b):
            for c in b:
                yield from get_vals(c, _path+[a])
        elif isinstance(b, dict):
            yield from get_vals(b, _path+[a])
        else:
            yield ['.'.join(_path+[a]), b]
results = [i for b in _d for i in get_vals(b)]
_c = defaultdict(list)
for a, b in results:
_c[a].append(b)
result = [{a:list(b) if len(b) > 1 else b[0] for a, b in _c.items()}]
import json
print(json.dumps(result, indent=4))
Output:
[
{
"RuntimeInMinutes": "21",
"EpisodeNumber": "21",
"Genres": [
"Animation"
],
"ReleaseDate": "2005-02-05",
"LanguageOfMetadata": "EN",
"Languages._Key": [
"CC",
"Primary"
],
"Languages.Value": [
[
"en"
],
[
"EN"
]
],
"Products.URL": [
"http://www.hulu.com/watch/217566",
"http://www.hulu.com/d/217566"
],
"Products.Rating": [
"TV-Y",
"TV-Y"
],
"Products.Currency": [
"USD",
"USD"
],
"Products.SUBSCRIPTION": [
"0.00",
"0.00"
],
"Products._Key": [
"US",
"DE"
],
"ReleaseYear": "2005",
"TVSeriesID": "5638#TVSeries",
"Type": "TVEpisode",
"Studio": "4K Media"
}
]
Edit: wrapping solution in outer function:
def flatten_obj(data):
    def get_vals(d, _path = []):
        for a, b in getattr(d, 'items', lambda :{})():
            if isinstance(b, list) and all(isinstance(i, dict) or isinstance(i, list) for i in b):
                for c in b:
                    yield from get_vals(c, _path+[a])
            elif isinstance(b, dict):
                yield from get_vals(b, _path+[a])
            else:
                yield ['.'.join(_path+[a]), b]
    results = [i for b in data for i in get_vals(b)]
    _c = defaultdict(list)
    for a, b in results:
        _c[a].append(b)
    return [{a:list(b) if len(b) > 1 else b[0] for a, b in _c.items()}]
EDIT
This now appears to be fixed:
As #panda-34 correctly points out (+1), the currently accepted
solution loses data, specifically Genres and Languages.Value when
you run the posted code.
Unfortunately, #panda-34's code modifies Genres:
'Genres': 'Animation',
rather than leaving it alone as in the OP's example:
'Genres': ['Animation'],
Below is my solution, which attacks the problem a different way. None of the keys in the original data has a dictionary as its value, only non-containers or lists (e.g. lists of dictionaries). So first, a list of dictionaries becomes a dictionary of lists (or just a plain dictionary if there's only one dictionary in the list). Once that's done, any value that's now a dictionary is expanded into the enclosing dictionary:
def flatten(container):
    # A list of dictionaries becomes a dictionary of lists (unless only one dictionary in list)
    if isinstance(container, list) and all(isinstance(element, dict) for element in container):
        new_dictionary = {}
        first, *rest = container
        for key, value in first.items():
            new_dictionary[key] = [flatten(value)] if rest else flatten(value)
        for dictionary in rest:
            for key, value in dictionary.items():
                new_dictionary[key].append(value)
        container = new_dictionary
    # Any dictionary value that's a dictionary is expanded into original dictionary
    if isinstance(container, dict):
        new_dictionary = {}
        for key, value in container.items():
            if isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    new_dictionary[key + "." + sub_key] = sub_value
            else:
                new_dictionary[key] = value
        container = new_dictionary
    return container
OUTPUT
{
"RuntimeInMinutes": "21",
"EpisodeNumber": "21",
"Genres": [
"Animation"
],
"ReleaseDate": "2005-02-05",
"LanguageOfMetadata": "EN",
"Languages._Key": [
"CC",
"Primary"
],
"Languages.Value": [
[
"en"
],
[
"EN"
]
],
"Products.URL": [
"http://www.hulu.com/watch/217566",
"http://www.hulu.com/d/217566"
],
"Products.Rating": [
"TV-Y",
"TV-Y"
],
"Products.Currency": [
"USD",
"USD"
],
"Products.SUBSCRIPTION": [
"0.00",
"0.00"
],
"Products._Key": [
"US",
"DE"
],
"ReleaseYear": "2005",
"TVSeriesID": "5638#TVSeries",
"Type": "TVEpisode",
"Studio": "4K Media"
}
But this solution introduces a new apparent inconsistency:
'Languages.Value': ['en', 'EN'],
vs.
"Languages.Value": [["en"], ["EN"]],
However, I believe this is tied up with the Genres inconsistency mentioned earlier and the OP needs to define a consistent resolution.
Ajax1234's answer loses values of 'Genres' and 'Languages.Value'
Here's a bit more generic version:
from collections import defaultdict

def flatten_obj(data):
    def flatten_item(item, keys):
        # walk lists and dicts recursively, yielding (dotted key, leaf value) pairs
        if isinstance(item, list):
            for v in item:
                yield from flatten_item(v, keys)
        elif isinstance(item, dict):
            for k, v in item.items():
                yield from flatten_item(v, keys+[k])
        else:
            yield '.'.join(keys), item
    res = []
    for item in data:
        res_item = defaultdict(list)
        for k, v in flatten_item(item, []):
            res_item[k].append(v)
        res.append({k: (v if len(v) > 1 else v[0]) for k, v in res_item.items()})
    return res
P.S. "Genres" value is also flattened. It is either an inconsistency in the OP requirements or a separate problem which is not addressed in this answer.
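One possible way to resolve the Genres inconsistency discussed in the answers above is to keep a value as a list whenever a list was crossed on the way down to it, and as a scalar otherwise. This is only a sketch of that rule, not something the OP confirmed. The name flatten_record is illustrative, and empty lists simply vanish from the result.

```python
from collections import defaultdict


def flatten_record(record):
    """Flatten one nested dict into dotted keys.

    A key keeps a list value if any list was crossed on the path to its
    leaves; otherwise it keeps the single scalar.
    """
    values = defaultdict(list)
    through_list = set()  # dotted keys whose leaves sat inside some list

    def walk(item, path, in_list):
        if isinstance(item, dict):
            for key, value in item.items():
                walk(value, path + [key], in_list)
        elif isinstance(item, list):
            for value in item:
                walk(value, path, True)
        else:
            name = ".".join(path)
            values[name].append(item)
            if in_list:
                through_list.add(name)

    walk(record, [], False)
    return {name: (vals if name in through_list else vals[0])
            for name, vals in values.items()}
```

Applied per item, e.g. [flatten_record(item) for item in data], this keeps 'Genres' as ['Animation'] and gives 'Languages.Value' as ['en', 'EN'] for the sample data.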
I want to build a dictionary that uses each name as the key and that row's data as the value. In views.py I wrote
data_dict ={}
def try_to_int(arg):
    try:
        return int(arg)
    except:
        return arg
def main():
    book4 = xlrd.open_workbook('./data/excel1.xlsx')
    sheet4 = book4.sheet_by_index(0)
    data_dict_origin = OrderedDict()
    tag_list = sheet4.row_values(0)[1:]
    for row_index in range(1, sheet4.nrows):
        row = sheet4.row_values(row_index)[1:]
        row = list(map(try_to_int, row))
        data_dict_origin[row_index] = dict(zip(tag_list, row))
    if data_dict_origin['name'] in data_dict:
        data_dict[data_dict_origin['name']].update(data_dict_origin)
    else:
        data_dict[data_dict_origin['name']] = data_dict_origin
main()
When I printed out data_dict,it is
OrderedDict([(1, {'user_id': '100', 'group': 'A', 'name': 'Tom', 'dormitory': 'C'}), (2, {'user_id': '50', 'group': 'B', 'name': 'Blear', 'dormitory': 'E'})])
My ideal dictionary is
dicts = {
Tom: {
'user_id': '100',
'group': 'A',
'name': 'Tom',
'dormitory': 'C'
},
Blear: {
},
}
How should I fix this? What should I write?
The code is using the wrong key in the dictionary. The keys of data_dict_origin are 1, 2, ..., so it has no 'name' key; 'name' lives in each row dict. You can loop over the rows instead:
for value in data_dict_origin.values():
    if value['name'] in data_dict:
        data_dict[value['name']].update(value)
    else:
        data_dict[value['name']] = value
Your data_dict_origin has numbers as keys and dicts as values (which technically makes it a sparse array of dicts). The "name" key exists in those dicts, not in your data_dict.
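Put together, here is a self-contained sketch of the re-keying step. It uses the printed OrderedDict from the question as stand-in input instead of reading the Excel file; rows and by_name are illustrative names.

```python
from collections import OrderedDict

# Stand-in for data_dict_origin: row number -> row data.
rows = OrderedDict([
    (1, {'user_id': '100', 'group': 'A', 'name': 'Tom', 'dormitory': 'C'}),
    (2, {'user_id': '50', 'group': 'B', 'name': 'Blear', 'dormitory': 'E'}),
])

by_name = {}
for row in rows.values():
    # setdefault creates the entry the first time a name is seen; update
    # merges later rows that share the same name, with later values winning.
    by_name.setdefault(row['name'], {}).update(row)

print(by_name['Tom'])
```

Using setdefault folds the if/else from the answer above into a single line, at the cost of copying each row into a fresh dict.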
Sorry for the length but tried to be complete.
I'm trying to get the following data -
(only small sampling from a much larger json file, same structure)
{
"count": 394,
"status": "ok",
"data": [
{
"md5": "cd042ba78d0810d86755136609793d6d",
"threatscore": 90,
"threatlevel": 0,
"avdetect": 0,
"vxfamily": "",
"domains": [
"dynamicflakesdemo.com",
"www.bountifulbreast.co.uk"
],
"hosts": [
"66.33.214.180",
"64.130.23.5",
],
"environmentId": "1",
},
{
"md5": "4f3a560c8deba19c5efd48e9b6826adb",
"threatscore": 65,
"threatlevel": 0,
"avdetect": 0,
"vxfamily": "",
"domains": [
"px.adhigh.net"
],
"hosts": [
"130.211.155.133",
"65.52.108.163",
"172.225.246.16"
],
"environmentId": "1",
}
]
}
if "threatscore" is over 70 I want to add it to this json structure -
Ex.
"data": [
{
"md5": "cd042ba78d0810d86755136609793d6d",
"threatscore": 90,
{
"Event":
{"date":"2015-11-25",
"threat_level_id":"1",
"info":"HybridAnalysis",
"analysis":"0",
"distribution":"0",
"orgc":"SOC",
"Attribute": [
{"type":"ip-dst",
"category":"Network activity",
"to_ids":True,
"distribution":"3",
"value":"66.33.214.180"},
{"type":"ip-dst",
"category":"Network activity",
"to_ids":True,
"distribution":"3",
"value":"64.130.23.5"}
{"type":"domain",
"category":"Network activity",
"to_ids":True,
"distribution":"3",
"value":"dynamicflakesdemo.com"},
{"type":"domain",
"category":"Network activity",
"to_ids":True,
"distribution":"3",
"value":"www.bountifulbreast.co.uk"}
{"type":"md5",
"category":"Payload delivery",
"to_ids":True,
"distribution":"3",
"value":"cd042ba78d0810d86755136609793d6d"}]
}
}
This is my code -
from datetime import datetime
import os
import json
from pprint import pprint
now = datetime.now()
testFile = open("feed.json")
feed = json.load(testFile)
for x in feed['data']:
    if x['threatscore'] > 90:
        data = {}
        data['Event']={}
        data['Event']["date"] = now.strftime("%Y-%m-%d")
        data['Event']["threat_level_id"] = "1"
        data['Event']["info"] = "HybridAnalysis"
        data['Event']["analysis"] = 0
        data['Event']["distribution"] = 3
        data['Event']["orgc"] = "Malware"
        data['Event']["Attribute"] = []
        if 'hosts' in x:
            data['Event']["Attribute"].append({'type': "ip-dst"})
            data['Event']["Attribute"][0]["category"] = "Network activity"
            data['Event']["Attribute"][0]["to-ids"] = True
            data['Event']["Attribute"][0]["distribution"] = "3"
            data["Event"]["Attribute"][0]["value"] =x['hosts']
        if 'md5' in x:
            data['Event']["Attribute"].append({'type': "md5"})
            data['Event']["Attribute"][1]["category"] = "Payload delivery"
            data['Event']["Attribute"][1]["to-ids"] = True
            data['Event']["Attribute"][1]["distribution"] = "3"
            data['Event']["Attribute"][1]['value'] = x['md5']
        if 'domains' in x:
            data['Event']["Attribute"].append({'type': "domain"})
            data['Event']["Attribute"][2]["category"] = "Network activity"
            data['Event']["Attribute"][2]["to-ids"] = True
            data['Event']["Attribute"][2]["distribution"] = "3"
            data['Event']["Attribute"][2]["value"] = x['domains']
        attributes = data["Event"]["Attribute"]
        data["Event"]["Attribute"] = []
        for attribute in attributes:
            for value in attribute["value"]:
                if value == " ":
                    pass
                else:
                    new_attr = attribute.copy()
                    new_attr["value"] = value
                    data["Event"]["Attribute"].append(new_attr)
        pprint(data)
        with open('output.txt', 'w') as outfile:
            json.dump(data, outfile)
Now the output is a little cleaner, but the md5 value is being split into individual characters. I think, as L3viathan said earlier, I keep overwriting the first element in the dictionary, but I'm not sure how to make it keep appending instead.
{'Event': {'Attribute': [{'category': 'Network activity',
'distribution': '3',
'to-ids': True,
'type': 'ip-dst',
'value': u'216.115.96.174'},
{'category': 'Network activity',
'distribution': '3',
'to-ids': True,
'type': 'ip-dst',
'value': u'64.4.54.167'},
{'category': 'Network activity',
'distribution': '3',
'to-ids': True,
'type': 'ip-dst',
'value': u'63.250.200.37'},
{'category': 'Payload delivery',
'distribution': '3',
'to-ids': True,
'type': 'md5',
'value': u'7'},
{'category': 'Payload delivery',
'distribution': '3',
'to-ids': True,
'type': 'md5',
'value': u'1'},
And still getting the following error in the end:
Traceback (most recent call last):
File "hybridanalysis.py", line 34, in <module>
data['Event']["Attribute"][1]["category"] = "Payload delivery"
IndexError: list index out of range
The final goal is to get it set so that I can post the events into MISP but they have to go one at a time.
I think this should fix your problems. I added the attribute dictionary all in one go, and moved the data in a list (which is more appropriate), but you might want to remove the superfluous list which wraps the Events.
from datetime import datetime
import os
import json
from pprint import pprint
now = datetime.now()
testFile = open("feed.json")
feed = json.load(testFile)
data_list = []
for x in feed['data']:
    if x['threatscore'] > 90:
        data = {}
        data['Event']={}
        data['Event']["date"] = now.strftime("%Y-%m-%d")
        data['Event']["threat_level_id"] = "1"
        data['Event']["info"] = "HybridAnalysis"
        data['Event']["analysis"] = 0
        data['Event']["distribution"] = 3
        data['Event']["orgc"] = "Malware"
        data['Event']["Attribute"] = []
        if 'hosts' in x:
            data['Event']["Attribute"].append({
                'type': 'ip-dst',
                'category': 'Network activity',
                'to-ids': True,
                'distribution': '3',
                'value': x['hosts']})
        if 'md5' in x:
            data['Event']["Attribute"].append({
                'type': 'md5',
                'category': 'Payload delivery',
                'to-ids': True,
                'distribution': '3',
                'value': x['md5']})
        if 'domains' in x:
            data['Event']["Attribute"].append({
                'type': 'domain',
                'category': 'Network activity',
                'to-ids': True,
                'distribution': '3',
                'value': x['domains']})
        attributes = data["Event"]["Attribute"]
        data["Event"]["Attribute"] = []
        for attribute in attributes:
            for value in attribute["value"]:
                if value == " ":
                    pass
                else:
                    new_attr = attribute.copy()
                    new_attr["value"] = value
                    data["Event"]["Attribute"].append(new_attr)
        data_list.append(data)
with open('output.txt', 'w') as outfile:
    json.dump(data_list, outfile)
In the JSON, "Attribute" holds a list with a single item, a dict, as shown here.
{'Event': {'Attribute': [{'category': 'Network activity',
'distribution': '3',
'to-ids': True,
'type': 'ip-dst',
'value': [u'54.94.221.70']}]
...
When you access data['Event']["Attribute"][1]["category"], you are asking for the second item (index 1) in the list of attributes, but the list only has one item, which is why you get the IndexError.
Thanks L3viathan! Below is how I tweaked it to not iterate over MD5's.
attributes = data["Event"]["Attribute"]
data["Event"]["Attribute"] = []
for attribute in attributes:
    if attribute['type'] == 'md5':
        new_attr = attribute.copy()
        new_attr["value"] = str(x['md5'])
        data["Event"]["Attribute"].append(new_attr)
    else:
        for value in attribute["value"]:
            new_attr = attribute.copy()
            new_attr["value"] = value
            data["Event"]["Attribute"].append(new_attr)
data_list.append(data)
Manipulating json seems to be the way to go to learn lists and dictionaries.
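For reference, the two problems in this thread (hard-coded list indices, and the md5 string being iterated character by character) can both be avoided by building each attribute dict in one go and wrapping a bare string in a list before looping. The following is only a sketch under a few assumptions: build_event, build_events and FIELDS are made-up names, the key is spelled to_ids as in the target structure of the question (the code above writes to-ids), and the threshold defaults to 70 as in the question text rather than the 90 used in the code.

```python
from datetime import datetime

# (feed field, attribute type, category), taken from the code above
FIELDS = (
    ("hosts", "ip-dst", "Network activity"),
    ("domains", "domain", "Network activity"),
    ("md5", "md5", "Payload delivery"),
)


def build_event(sample, date=None):
    """Build one Event dict from a single feed entry."""
    attributes = []
    for field, attr_type, category in FIELDS:
        values = sample.get(field, [])
        if isinstance(values, str):
            # Keep a bare string whole instead of iterating its characters.
            values = [values]
        for value in values:
            if not str(value).strip():
                continue  # skip blank entries
            attributes.append({
                "type": attr_type,
                "category": category,
                "to_ids": True,
                "distribution": "3",
                "value": value,
            })
    return {"Event": {
        "date": date or datetime.now().strftime("%Y-%m-%d"),
        "threat_level_id": "1",
        "info": "HybridAnalysis",
        "analysis": 0,
        "distribution": 3,
        "orgc": "Malware",
        "Attribute": attributes,
    }}


def build_events(feed, threshold=70):
    """Return one Event per feed entry whose threatscore exceeds threshold."""
    return [build_event(s) for s in feed["data"] if s["threatscore"] > threshold]
```

Each element of the returned list is a separate event, which fits the stated goal of posting events one at a time.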