Inputs
I have two lists of lists.
rule_seq =
[['#1', '#2', '#3'],
['#1', '#2', '#3']]
KG_seq =
[['nationality', 'placeOfBirth', 'locatedIn'],
['nationality', 'hasFather', 'nationality']]
For each position, I need to use the value in rule_seq as a dictionary key and collect the value at the same position in KG_seq under that key.
My desired output is
Output
unify_dict =
{'#1': ['nationality'],
'#2': ['placeOfBirth', 'hasFather'],
'#3': ['locatedIn', 'nationality']}
I built the dictionary by flattening both lists of lists, zipping them, and appending a value only when it is not already stored under its key.
My code is as follows.
import collections
import itertools

def create_unify_dict(rule_seq, KG_seq):
    unify_dict = collections.defaultdict(list)
    flat_aug_rule_list = list(itertools.chain.from_iterable(rule_seq))
    flat_aug_KG_list = list(itertools.chain.from_iterable(KG_seq))
    [unify_dict[key].append(val) for key, val in zip(flat_aug_rule_list, flat_aug_KG_list)
     if key not in unify_dict.keys() or val not in unify_dict[key]]
    return unify_dict

unify_dict = create_unify_dict(rule_seq, KG_seq)
Is there a simpler way to get the result I want?
You can just call append on the same defaultdict inside a second level of nesting.
from collections import defaultdict
result = defaultdict(list)
for keyList, valueList in zip(rule_seq, KG_seq):
    for key, item in zip(keyList, valueList):
        if item not in result[key]:
            result[key].append(item)
OUTPUT:
defaultdict(<class 'list'>,
{'#1': ['nationality'],
'#2': ['placeOfBirth', 'hasFather'],
'#3': ['locatedIn', 'nationality']})
You can use collections:
import collections
# Create a defaultdict with list as value type
result = collections.defaultdict(list)
for s0, s1 in zip(rule_seq, KG_seq):
    for v0, v1 in zip(s0, s1):
        if v1 not in result[v0]:
            result[v0].append(v1)
print({k: v for k, v in result.items()})
# {
# '#1': ['nationality'],
# '#2': ['placeOfBirth', 'hasFather'],
# '#3': ['locatedIn', 'nationality'],
# }
This would be my homemade approach using no modules. Vanilla Python.
combine = [list(set(l)) for l in [[lst[i] for lst in KG_seq] for i in range(len(KG_seq[0]))]]
dct = {place:st for place,st in zip(rule_seq[0],combine)}
output
{'#1': ['nationality'], '#2': ['hasFather', 'placeOfBirth'], '#3': ['nationality', 'locatedIn']}
oversimplified version
combine = []
for i in range(len(KG_seq[0])):
    group = []
    for lst in KG_seq:
        group.append(lst[i])
    combine.append(group)
newComb = []
for simp in combine:
    newComb.append(list(set(simp)))
dct = {}
for place, st in zip(rule_seq[0], newComb):  # zip the de-duplicated groups, not combine
    dct[place] = st
print(dct)
undersimplified
dct = {place:st for place,st in zip(rule_seq[0],[list(set(l)) for l in [[lst[i] for lst in KG_seq] for i in range(len(KG_seq[0]))]])}
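One caveat with the set-based versions above: a set does not preserve insertion order, so the order inside each list can differ from the desired output. Here is a minimal sketch of an order-preserving variant, assuming (as this answer does) that every row of rule_seq holds the same keys. The name unify_by_position is made up for illustration.

```python
rule_seq = [['#1', '#2', '#3'],
            ['#1', '#2', '#3']]
KG_seq = [['nationality', 'placeOfBirth', 'locatedIn'],
          ['nationality', 'hasFather', 'nationality']]

def unify_by_position(rule_seq, KG_seq):
    # zip(*KG_seq) transposes the rows into per-position columns.
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return {key: list(dict.fromkeys(column))
            for key, column in zip(rule_seq[0], zip(*KG_seq))}

print(unify_by_position(rule_seq, KG_seq))
```

Only rule_seq[0] is consulted for the keys, so this does not cover inputs whose rows use different keys.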
Assuming the following, the function can take several forms:
rule_seq and kg_seq are equal in length
rule_seq and kg_seq items are also equal in length
One liner
def one_liner(rule_seq, kg_seq):
    ret = {}
    [ret.update({idx: ret.get(idx, set()) | {val}}) for arr_idx, arr_val in zip(rule_seq, kg_seq) for idx, val in zip(arr_idx, arr_val)]
    return ret
Single loop + one liner
def one_loop(rule_seq, kg_seq):
    ret = {}
    for arr_idx, arr_val in zip(rule_seq, kg_seq):
        [ret.update({idx: ret.get(idx, set()) | {val}}) for idx, val in zip(arr_idx, arr_val)]
    return ret
Nested loops
def nested_loop(rule_seq, kg_seq):
    ret = {}
    for arr_idx, arr_val in zip(rule_seq, kg_seq):
        for idx, val in zip(arr_idx, arr_val):
            ret[idx] = ret.get(idx, set()) | {val}
    return ret
Testing these out
one_liner(rule_seq, KG_seq)
{'#1': {'nationality'},
'#2': {'hasFather', 'placeOfBirth'},
'#3': {'locatedIn', 'nationality'}}
one_loop(rule_seq, KG_seq)
{'#1': {'nationality'},
'#2': {'hasFather', 'placeOfBirth'},
'#3': {'locatedIn', 'nationality'}}
nested_loop(rule_seq, KG_seq)
{'#1': {'nationality'},
'#2': {'hasFather', 'placeOfBirth'},
'#3': {'locatedIn', 'nationality'}}
d = {}
for i in range(len(rule_seq)):
    for j in range(len(rule_seq[i])):
        rule, kg = rule_seq[i][j], KG_seq[i][j]
        if rule not in d:
            d[rule] = [kg]
        elif kg not in d[rule]:
            d[rule].append(kg)
result:
{'#1': ['nationality'], '#2': ['placeOfBirth', 'hasFather'], '#3': ['locatedIn', 'nationality']}
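The test `kg not in d[rule]` scans a list each time, which is fine for short lists like these. As a hedged alternative, the sketch below uses an inner dict as an insertion-ordered set (dicts keep insertion order in Python 3.7+), so duplicates are dropped without a scan and the order of the desired output is kept. The name build_unify_dict is hypothetical, and the rows are not required to share the same keys.

```python
rule_seq = [['#1', '#2', '#3'],
            ['#1', '#2', '#3']]
KG_seq = [['nationality', 'placeOfBirth', 'locatedIn'],
          ['nationality', 'hasFather', 'nationality']]

def build_unify_dict(rule_seq, KG_seq):
    seen = {}
    for rules, kgs in zip(rule_seq, KG_seq):
        for rule, kg in zip(rules, kgs):
            # Inner dict keys act as an ordered set of values for this rule.
            seen.setdefault(rule, {})[kg] = None
    # Convert each inner dict back to a plain list of its keys.
    return {rule: list(values) for rule, values in seen.items()}

print(build_unify_dict(rule_seq, KG_seq))
```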
I'm looking for suggestions to resolve an issue I'm facing. It might seem like a simple problem, but after a few days of trying to find an answer I no longer think it is.
I'm receiving data (StringType) in the following JSON-like format, and the requirement is to turn it into a flat dictionary of key-value pairs. Here is a payload sample:
s = """{"status": "active", "name": "{\"first\": \"John\", \"last\": \"Smith\"}", "street_address": "100 \"Y\" Street"}"""
and the desired output should look like this:
{'status': 'active', 'name_first': 'John', 'name_last': 'Smith', 'street_address': '100 "Y" Street'}
The issue is that I can't find a way to turn the original string (s) into a dictionary. Once I have a dictionary, the flattening part works perfectly fine.
import json
import collections.abc
import ast

#############################################################
# Flatten complex structure into a flat dictionary
#############################################################
def flatten_dictionary(dictionary, parent_key=False, separator='_', value_to_str=True):
    """
    Turn a nested complex json into a flattened dictionary
    :param dictionary: The dictionary to flatten
    :param parent_key: The string to prepend to dictionary's keys
    :param separator: The string used to separate flattened keys
    :param value_to_str: Force all returned values to string type
    :return: A flattened dictionary
    """
    items = []
    for key, value in dictionary.items():
        new_key = str(parent_key) + separator + key if parent_key else key
        try:
            # A value may itself be a JSON document stored as a string
            value = json.loads(value)
        except (TypeError, ValueError):
            pass  # not a JSON string; keep the value unchanged
        # collections.MutableMapping was removed in Python 3.10; use collections.abc
        if isinstance(value, collections.abc.MutableMapping):
            if not value.items():
                items.append((new_key, None))
            else:
                items.extend(flatten_dictionary(value, new_key, separator).items())
        elif isinstance(value, list):
            if len(value):
                for k, v in enumerate(value):
                    items.extend(flatten_dictionary({str(k): (str(v) if value_to_str else v)}, new_key).items())
            else:
                items.append((new_key, None))
        else:
            items.append((new_key, (str(value) if value_to_str else value)))
    return dict(items)
# Data sample; string and dictionary
s = """{"status": "active", "name": "{\"first\": \"John\", \"last\": \"Smith\"}", "street_address": "100 \"Y\" Street"}"""
d = {"status": "active", "name": "{\"first\": \"John\", \"last\": \"Smith\"}", "street_address": "100 \"Y\" Street"}
# Works for dictionary type
print(flatten_dictionary(d))
# Doesn't work for string type, for any of the below methods
e = eval(s)
# a = ast.literal_eval(s)
# j = json.loads(s)
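One likely reason the string version fails: in a non-raw Python literal, `\"` is turned into a plain `"` before json ever sees it, so the inner quotes arrive unescaped and the text is no longer valid JSON. The sketch below is hedged on one assumption, namely that the real incoming payload still contains the backslashes (a raw literal mimics that). If the payload really arrives without them, some repair step such as a regex is needed instead. The names raw, outer and flat are illustrative only.

```python
import json

# Non-raw literal: Python turns \" into ", so the inner quotes are unescaped.
s = """{"status": "active", "name": "{\"first\": \"John\", \"last\": \"Smith\"}", "street_address": "100 \"Y\" Street"}"""
# Raw literal: the backslashes survive, which is what valid JSON requires.
raw = r'{"status": "active", "name": "{\"first\": \"John\", \"last\": \"Smith\"}", "street_address": "100 \"Y\" Street"}'

try:
    json.loads(s)
    parsed_plain = True
except json.JSONDecodeError:
    parsed_plain = False

outer = json.loads(raw)
name = json.loads(outer["name"])  # the nested object is itself a JSON string

flat = {"status": outer["status"],
        "name_first": name["first"],
        "name_last": name["last"],
        "street_address": outer["street_address"]}
print(parsed_plain)
print(flat)
```

With the backslashes intact, the dictionary from json.loads(raw) can also be passed straight to flatten_dictionary.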
Try:
import json
import re
def jsonify(s):
    s = s.replace('"{','{').replace('}"','}')
    s = re.sub(r'street_address":\s+"(.+)"(.+)"(.+)"', r'street_address": "\1\2\3"',s)
    return json.loads(s)
If you must keep the quotes around Y, try:
def jsonify(s):
    s = s.replace('"{','{').replace('}"','}')
    search = re.search(r'street_address":\s+"(.+)"(.+)"(.+)"',s)
    if search:
        s = re.sub(r'street_address":\s+"(.+)"(.+)"(.+)"', r'street_address": "\1\2\3"',s)
    dict_version = json.loads(s)
    if search:  # only restore the quotes when the pattern actually matched
        dict_version['street_address'] = dict_version['street_address'].replace(search.group(2),'"'+search.group(2)+'"')
    return dict_version
A more generalized attempt:
def jsonify(s):
    pattern = r'(?<=[,}])\s*"(.[^\{\}:,]+?)":\s+"([^\{\}:,]+?)"([^\{\}:,]+?)"([^\{\}:,]+?)"([,\}])'
    s = s.replace('"{','{').replace('}"','}')
    search = re.search(pattern,s)
    matches = []
    if search:
        matches = re.findall(pattern,s)
        s = re.sub(pattern, r'"\1": "\2\3\4"\5',s)
    dict_version = json.loads(s)
    for match in matches:
        dict_version[match[0]] = dict_version[match[0]].replace(match[2],'"'+match[2]+'"')
    return dict_version
I have JSON in the following format:
{"MainName":[{"col1":"12345","col2":"False","col3":"190809","SubName1":{"col4":30.00,"SubName2":{"col5":"19703","col6":"USD"}},"col7":"7372267","SubName3":[{"col8":"345337","col9":"PC"}],"col10":"10265","col11":"29889004","col12":"calculated","col13":"9218","SubName4":{"col14":1,"SubName5":{"col15":"1970324","col16":"integer"}},"col17":"434628","col18":"2020-02-06T13:47:40.000-0800","col19":"754878037","SubName6":{"col20":30.00,"SubName7":{"col21":"19703248","col22":"USD"}}},{"col1":"12345","col2":"False","col3":"190809","SubName1":{"col4":30.00,"SubName2":{"col5":"19703","col6":"USD"}},"col7":"7372267","SubName3":[{"col8":"345337","col9":"PC"}],"col10":"10265","col11":"29889004","col12":"calculated","col13":"9218","SubName4":{"col14":1,"SubName5":{"col15":"1970324","col16":"integer"}},"col17":"434628","col18":"2020-02-06T13:47:40.000-0800","col19":"754878037","SubName6":{"col20":30.00,"SubName7":{"col21":"19703248","col22":"USD"}}}],"skip":0,"top":2,"next":"/v1/APIName?skip=2&top=2"}
I want to convert it into CSV in the following format:
MainName_col1,MainName_col2,MainName_col3,MainName_SubName1_col4,MainName_SubName1_SubName2_col5,MainName_SubName1_SubName2_col6,MainName_col7,MainName_SubName3_col8,MainName_SubName3_col9,MainName_col10,MainName_col11,MainName_col12,MainName_col13,MainName_SubName4_col14,MainName_SubName4_SubName5_col15,MainName_SubName4_SubName5_col16,MainName_col17,MainName_col18,MainName_col19,MainName_SubName6_col20,MainName_SubName6_SubName7_col21,MainName_SubName6_SubName7_col22
12345,False,190809,30.0,19703,USD,7372267,345337,PC,10265,29889004,calculated,9218,1,1970324,integer,434628,2020-02-06T13:47:40.000-0800,754878037,30.0,19703248,USD
12345,False,190809,30.0,19703,USD,7372267,345337,PC,10265,29889004,calculated,9218,2,123453,integer,434628,2020-02-06T13:47:40.000-0800,754878037,30.0,19703248,USD
Kindly help me out in this.
Use the function below to flatten your JSON data.
dc = {"MainName":[{"col1":"12345","col2":False,"col3":"190809","SubName1":{"col4":30.00,"SubName2":{"col5":"19703","col6":"USD"}},"col7":"7372267","SubName3":[{"col8":"345337","col9":"PC"}],"col10":"10265","col11":"29889004","col12":"calculated","col13":"9218","SubName4":{"col14":1,"SubName5":{"col15":"1970324","col16":"integer"}},"col17":"434628","col18":"2020-02-06T13:47:40.000-0800","col19":"754878037","SubName6":{"col20":30.00,"SubName7":{"col21":"19703248","col22":"USD"}}}],"skip":0,"top":1,"next":"/v1/APIName?skip=1&top=1"}
def flatten(root: str, dict_obj: dict):
    flat = {}
    for i in dict_obj.keys():
        val = dict_obj[i]
        if not isinstance(val, dict) and not isinstance(val, list):
            flat[f'{root}_{i}'] = val
        else:
            if isinstance(val, list):
                val = val[-1]
            flat.update(flatten(f'{root}_{i}', val))
    return flat
flatten('MainName', dc['MainName'][0])
It gives the expected output for one record. You can then use the result however you need.
{'MainName_col1': '12345',
'MainName_col2': False,
'MainName_col3': '190809',
'MainName_SubName1_col4': 30.0,
'MainName_SubName1_SubName2_col5': '19703',
'MainName_SubName1_SubName2_col6': 'USD',
'MainName_col7': '7372267',
'MainName_SubName3_col8': '345337',
'MainName_SubName3_col9': 'PC',
'MainName_col10': '10265',
'MainName_col11': '29889004',
'MainName_col12': 'calculated',
'MainName_col13': '9218',
'MainName_SubName4_col14': 1,
'MainName_SubName4_SubName5_col15': '1970324',
'MainName_SubName4_SubName5_col16': 'integer',
'MainName_col17': '434628',
'MainName_col18': '2020-02-06T13:47:40.000-0800',
'MainName_col19': '754878037',
'MainName_SubName6_col20': 30.0,
'MainName_SubName6_SubName7_col21': '19703248',
'MainName_SubName6_SubName7_col22': 'USD'}
As I understand it, your dc looks like this:
dc = {"MainName":[{"col1":"12345","col2":"False","col3":"190809","SubName1":{"col4":30.00,"SubName2":{"col5":"19703","col6":"USD"}},"col7":"7372267","SubName3":[{"col8":"345337","col9":"PC"}],"col10":"10265","col11":"29889004","col12":"calculated","col13":"9218","SubName4":{"col14":1,"SubName5":{"col15":"1970324","col16":"integer"}},"col17":"434628","col18":"2020-02-06T13:47:40.000-0800","col19":"754878037","SubName6":{"col20":30.00,"SubName7":{"col21":"19703248","col22":"USD"}}},{"col1_a":"12345XX","col2_b":"False","col3_c":"190809","SubName1":{"col4_d":30.00,"SubName2":{"col5_e":"19703","col6_f":"USD"}},"col7_g":"7372267","SubName3":[{"col8_h":"345337","col9":"PC"}],"col10_i":"10265","col11_j":"29889004","col12_k":"calculated","col13_l":"9218","SubName4":{"col14_m":1,"SubName5":{"col15_n":"1970324","col16_o":"integer"}},"col17_p":"434628","col18_q":"2020-02-06T13:47:40.000-0800","col19_r":"754878037","SubName6":{"col20_s":30.00,"SubName7":{"col21_t":"19703248","col22_u":"USDZZ"}}}],"skip":0,"top":2,"next":"/v1/APIName?skip=2&top=2"}
I used the answer above to flatten each record into a single object:
def flatten(root: str, dict_obj: dict):
    flat = {}
    for i in dict_obj.keys():
        val = dict_obj[i]
        if not isinstance(val, dict) and not isinstance(val, list):
            flat[f'{root}_{i}'] = val
        else:
            if isinstance(val, list):
                val = val[-1]
            flat.update(flatten(f'{root}_{i}', val))
    return flat
keys_list = []
values_list = []
for i in range(len(dc['MainName'])):
    result = flatten('MainName', dc['MainName'][i])
    keys_list.append(list(result.keys()))
    values_list.append(list(result.values()))
# Open the file once; write each record's keys, then each record's values, one row per line
with open("sample.csv", "a") as guestFile:
    for k in keys_list:
        guestFile.write(",".join(k) + "\n")
    for v in values_list:
        guestFile.write(",".join(str(res) for res in v) + "\n")
Check out my code at https://repl.it/#TamilselvanLaks/jsontocsvmul
Note: Use the 'run' button to run the program. sample.csv appears on the left side, and it contains all the keys you asked for.
Please let me know whether my answer meets your expectations.
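For the CSV step itself, here is a hedged sketch using the standard-library csv module, writing one header row and then one row per element of MainName. The sample below is a shortened, made-up payload in the same shape as the question's data. flatten_record and to_csv are hypothetical names, and a list is handled by flattening only its first element, an assumption that fits the single-element SubName3 list in the question.

```python
import csv
import io

def flatten_record(prefix, obj):
    flat = {}
    for key, val in obj.items():
        name = f"{prefix}_{key}"
        if isinstance(val, list):
            # Assumption: lists hold a single dict; only the first item is used.
            val = val[0] if val else {}
        if isinstance(val, dict):
            flat.update(flatten_record(name, val))
        else:
            flat[name] = val
    return flat

def to_csv(data, root="MainName"):
    rows = [flatten_record(root, rec) for rec in data[root]]
    # Union of column names in first-seen order, in case records differ.
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

sample = {"MainName": [
    {"col1": "12345", "SubName1": {"col4": 30.00, "SubName2": {"col5": "19703"}},
     "SubName3": [{"col8": "345337"}]},
    {"col1": "67890", "SubName1": {"col4": 15.5, "SubName2": {"col5": "19704"}},
     "SubName3": [{"col8": "345338"}]},
], "skip": 0, "top": 2}

print(to_csv(sample))
```

To write a file instead of a string, the same DictWriter can be pointed at a file opened with open("sample.csv", "w", newline=""). Records that lack a column get an empty cell, which is the DictWriter default.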
I'm trying to run the following code:
def by_primary_key(table, key, fields) -> object:
    key_columns = get_key_columns(table, key )
    print("key columns in get by primary key " , key_columns)
    print("key, " , key )
    zip_it = list(zip(key_columns, key))
    print("zip_it", zip_it )
    dictt = dict(zip_it)
    print("dict", dictt)
The output I want for zip_it is: [('playerID', 'willite01')]
but the output the program produces is:
key columns in get by primary key ['playerID']
key, willite01
zip_it [('playerID', 'w')]
dict {'playerID': 'w'}
Where am I going wrong?
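The following worked. Wrapping key in a list makes zip pair the column with the whole string instead of its first character: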
key_columns = get_key_columns(table, key )
lst = []
lst.append(key)
tmp = dict(zip(key_columns, lst))
result = find_by_template1(table, tmp, fields)
return result
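The reason for the original output: zip iterates over each of its arguments, and iterating over a string yields its characters, so 'willite01' contributes 'w' as its first item. Here is a small sketch with the values from the question. The as_key_list helper is hypothetical, one possible way to accept either a single key or a list of keys.

```python
key_columns = ['playerID']
key = 'willite01'

print(list(zip(key_columns, key)))    # pairs the column with the first character only
print(list(zip(key_columns, [key])))  # pairs the column with the whole string

def as_key_list(key):
    # Wrap a lone string so zip treats it as one value, not as characters.
    return [key] if isinstance(key, str) else list(key)

print(dict(zip(key_columns, as_key_list(key))))
```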
I have a DeepDiff result which is obtained by comparing two JSON files. I have to construct a python dictionary from the deepdiff result as follows.
json1 = {"spark": {"ttl":3, "poll":34}}
json2 = {"spark": {"ttl":3, "poll":34, "toll":23}, "cion": 34}
deepdiffresult = {'dictionary_item_added': {"root['spark']['toll']", "root['cion']"}}
expecteddict = {"spark" : {"toll":23}, "cion":34}
How can this be achieved?
There is probably a better way to do this. But you can parse the returned strings and chain together a new dictionary with the result you want.
json1 = {"spark": {"ttl":3, "poll":34}}
json2 = {"spark": {"ttl":3, "poll":34, "toll":23}, "cion": 34}
deepdiffresult = {'dictionary_item_added': {"root['spark']['toll']", "root['cion']"}}
added = deepdiffresult['dictionary_item_added']
def convert(s, j):
    s = s.replace('root','')
    s = s.replace('[','')
    s = s.replace("'",'')
    keys = s.split(']')[:-1]
    d = {}
    for k in reversed(keys):
        if not d:
            d[k] = None
        else:
            d = {k: d}
    v = None
    v_ref = d
    for i, k in enumerate(keys, 1):
        if not v:
            v = j.get(k)
        else:
            v = v.get(k)
        if i<len(keys):
            v_ref = v_ref.get(k)
    v_ref[k] = v
    return d
added_dict = {}
for added_str in added:
added_dict.update(convert(added_str, json2))
added_dict
#returns:
{'cion': 34, 'spark': {'toll': 23}}
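One thing to watch in the approach above: dict.update replaces a whole top-level key, so two added paths under the same parent (say root['spark']['toll'] and a hypothetical root['spark']['fee']) would overwrite each other. Below is a hedged sketch that merges instead. It assumes every path segment is a single-quoted string key as in the example, so integer list indices such as [0] are not handled. path_keys and build_added are hypothetical names, and the 'fee' entry is invented to show the merge.

```python
import re

def path_keys(path):
    # "root['spark']['toll']" -> ['spark', 'toll']
    return re.findall(r"\['([^']*)'\]", path)

def build_added(paths, source):
    result = {}
    for path in paths:
        keys = path_keys(path)
        value = source
        target = result
        # Walk down both the source and the result, creating levels as needed.
        for key in keys[:-1]:
            value = value[key]
            target = target.setdefault(key, {})
        target[keys[-1]] = value[keys[-1]]
    return result

json2 = {"spark": {"ttl": 3, "poll": 34, "toll": 23, "fee": 5}, "cion": 34}
added = {"root['spark']['toll']", "root['spark']['fee']", "root['cion']"}
print(build_added(added, json2))
```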
Simple answer:
There is a third-party Python package called dictdiffer (it is not built in). You can try this.
$ pip install dictdiffer
Examples:
from dictdiffer import diff
result = diff(json1, json2)
print(list(result))  # diff() yields change tuples rather than a nested dict
References:
DictDiffer