Turn dict with duplicate keys into list containing these keys

Turn dict with duplicate keys into list containing these keys - python

I receive a response I have no control over from an API. Using requests response.json() will filter out duplicate keys. So I would need to turn this response into a list where each key is an element in that list: What I get now:
{
"user": {
//...
},
"user": {
//...
},
//...
}
What I need:
{
"users": [
{
"user": {
//...
}
},
{
"user": {
//...
}
},
//...
]
}
This way JSON won't filter out any of the results, and I can loop through users.

Okay, let me have a try by method used in Python json parser allow duplicate keys
All we should do is handle the pairs_list by ourself.
from json import JSONDecoder
def parse_object_pairs(pairs):
return pairs
data = """
{"foo": {"key": 2, "key": 3}, "foo": 4, "foo": 23}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
pairs_list = decoder.decode(data)
# the pairs_list is the real thing which we can use
aggre_key = 's'
def recusive_handle(pairs_list):
dct = {}
for k, v in pairs_list:
if v and isinstance(v, list) and isinstance(v[0], tuple):
v = recusive_handle(v)
if k + aggre_key in dct:
dct[k + aggre_key].append({k: v})
elif k in dct:
first_dict = {k: dct.pop(k)}
dct[k + aggre_key] = [first_dict, {k: v}]
else:
dct[k] = v
return dct
print(recusive_handle(pairs_list))
output:
{'foos': [{'foo': {'keys': [{'key': 2}, {'key': 3}]}}, {'foo': {'bar': 4}}, {'foo': 23}]}

Related

Count number of objects in list of dictionary where a key's value is more than 1

Given a list of dictionaries:
data = {
"data": [
{
"categoryOptionCombo": {
"id": "A"
},
"dataElement": {
"id": "123"
}
},
{
"categoryOptionCombo": {
"id": "B"
},
"dataElement": {
"id": "123"
}
},
{
"categoryOptionCombo": {
"id": "C"
},
"dataElement": {
"id": "456"
}
}
]
}
I would like to display the dataElement where the count of distinct categoryOptionCombo is larger than 1.
e.g. the result of the function would be an iterable of IDs:
[123]
because the dataElement with id 123 has two different categoryOptionCombos.
tracker = {}
for d in data['data']:
data_element = d['dataElement']['id']
coc = d['categoryOptionCombo']['id']
if data_element not in tracker:
tracker[data_element] = set()
tracker[data_element].add(coc)
too_many = [key for key,value in tracker.items() if len(value) > 1]
How can I iterate the list of dictionaries preferably with a comprehension? This solution above is not pythonic.

One approach:
import collections
counts = collections.defaultdict(set)
for d in data["data"]:
counts[d["dataElement"]["id"]].add(d["categoryOptionCombo"]["id"])
res = [k for k, v in counts.items() if len(v) > 1]
print(res)
Output
['123']
This approach creates a dictionary mapping dataElements to the different types of categoryOptionCombo:
defaultdict(<class 'set'>, {'123': {'B', 'A'}, '456': {'C'}})

Almost a one-liner:
counts = collections.Counter( d['dataElement']['id'] for d in data['data'] )
print( counts )
Output:
Counter({'123': 2, '456': 1})

No need for sets, you can just remember each data element's first coc or mark it as having 'multiple'.
tracker = {}
for d in data['data']:
data_element = d['dataElement']['id']
coc = d['categoryOptionCombo']['id']
if tracker.setdefault(data_element, coc) != coc:
tracker[data_element] = 'multiple'
too_many = [key for key,value in tracker.items() if value == 'multiple']
(If the string 'multiple' can be a coc id, then use multiple = object() and compare with is).

unable to update JSON using python

I am trying to update transaction ID from the following json:
{
"locationId": "5115",
"transactions": [
{
"transactionId": "1603804404-5650",
"source": "WEB"
} ]
I have done following code for the same, but it does not update the transaction id, but it inserts the transaction id to the end of block:-
try:
session = requests.Session()
with open(
"sales.json",
"r") as read_file:
payload = json.load(read_file)
payload["transactionId"] = random.randint(0, 5)
with open(
"sales.json",
"w") as read_file:
json.dump(payload, read_file)
Output:-
{
"locationId": "5115",
"transactions": [
{
"transactionId": "1603804404-5650",
"source": "WEB"
} ]
}
'transactionId': 1
}
Expected Outut:-
{
"locationId": "5115",
"transactions": [
{
"transactionId": "1",
"source": "WEB"
} ]

This would do it, but only in your specific case:
payload["transactions"][0]["transactionId"] = xxx
There should be error handling for cases like "transactions" key is not int the dict, or there are no records or there are more than one
also, you will need to assign =str(your_random_number) not the int if you wish to have the record of type string as the desired output suggests

If you just want to find the transactionId key and you don't know exactly where it may exist. You can do-
from collections.abc import Mapping
def update_key(key, new_value, jsondict):
new_dict = {}
for k, v in jsondict.items():
if isinstance(v, Mapping):
# Recursive traverse if value is a dict
new_dict[k] = update_key(key, new_value, v)
elif isinstance(v, list):
# Traverse through all values of list
# Recursively traverse if an element is a dict
new_dict[k] = [update_key(key, new_value, innerv) if isinstance(innerv, Mapping) else innerv for innerv in v]
elif k == key:
# This is the key to replace with new value
new_dict[k] = new_value
else:
# Just a regular value, assign to new dict
new_dict[k] = v
return new_dict
Given a dict-
{
"locationId": "5115",
"transactions": [
{
"transactionId": "1603804404-5650",
"source": "WEB"
} ]
}
You can do-
>>> update_key('transactionId', 5, d)
{'locationId': '5115', 'transactions': [{'transactionId': 5, 'source': 'WEB'}]}

Yes because transactionId is inside transactions node. So your code should be like:
payload["transactions"][0].transactionId = random.randint(0, 5)
or
payload["transactions"][0]["transactionId"] = random.randint(0, 5)

Remove duplicate values in different Json Lists python

I know that there are a lot of questions about duplicates but I can't find a solution suitable for me.
I have a json structure like this:
{
"test": [
{
"name2": [
"Tik",
"eev",
"asdv",
"asdfa",
"sadf",
"Nick"
]
},
{
"name2": [
"Tik",
"eev",
"123",
"r45",
"676",
"121"
]
}
]
}
I want to keep the first value and remove all the other duplicates.
Expected Result
{
"test": [
{
"name2": [
"Tik",
"eev",
"asdv",
"asdfa",
"sadf",
"Nick"
]
},
{
"name2": [
"123",
"r45",
"676",
"121"
]
}
]
}
I tried using a tmp to check for duplicates but it didn't seem to work. Also I can't find a way to make it json again.
import json
with open('myjson') as access_json:
read_data = json.load(access_json)
tmp = []
tmp2 = []
def get_synonyms():
ingredients_access = read_data['test']
for x in ingredients_access:
for j in x['name2']:
tmp.append(j)
if j in tmp:
tmp2.append(j)
get_synonyms()
print(len(tmp))
print(len(tmp2))

You can use recursion:
def filter_d(d):
seen = set()
def inner(_d):
if isinstance(_d, dict):
return {a:inner(b) if isinstance(b, (dict, list)) else b for a, b in _d.items()}
_r = []
for i in _d:
if isinstance(i, (dict, list)):
_r.append(inner(i))
elif i not in seen:
_r.append(i)
seen.add(i)
return _r
return inner(d)
import json
print(json.dumps(filter_d(data), indent=4))
Output:
{
"test": [
{
"name2": [
"Tik",
"eev",
"asdv",
"asdfa",
"sadf",
"Nick"
]
},
{
"name2": [
"123",
"r45",
"676",
"121"
]
}
]
}

You are first adding everything to tmp and then to tmp2 because every value was added to tmp before.
I changed the function a little bit to work for your specific test example:
def get_synonyms():
test_list = []
ingredients_access = read_data['test']
used_values =[]
for x in ingredients_access:
inner_tmp = []
for j in x['name2']:
if j not in used_values:
inner_tmp.append(j)
used_values.append(j)
test_list.append({'name2':inner_tmp})
return {'test': test_list}
result = get_synonyms()
print(result)
Output:
{'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']}, {'name2': ['123', 'r45', '676', '121']}]}

Here's a little hackish answer:
d = {'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
{'name2': ['Tik', 'eev', '123', 'r45', '676', '121']}]}
s = set()
for l in d['test']:
l['name2'] = [(v, s.add(v))[0] for v in l['name2'] if v not in s]
Output:
{'test': [{'name2': ['Tik', 'eev', 'asdv', 'asdfa', 'sadf', 'Nick']},
{'name2': ['123', 'r45', '676', '121']}]}
This uses a set to track the unique values, and add unique values to set while returning the value back to the list.

How to sort all lists in a deeply nested dictionary in python?

I want to sort all lists within a deeply nested dictionary. It is basically a JSON object which deep nesting of dictionaries within lists and then lists within dictionaries. All I want to do is, parse through all dictionary keys to all leaf nodes and sort all the lists that i encounter on the way. Basically, any list directly available or deep down within that given dictionary object should get sorted and the same dictionary with all sorted lists should be returned.
I tried doing recursion on the dict object to pass any dict object encountered to the recursion method and sorting the lists when encountered. But they fail to produce results when there is a dict inside a list and then another list inside that dict object.
Sample JSON below:
my_json = {
a: {
b: {
c: [
{
d: [
{ f: 'some_string' }
]
},
{
e: {
g: [
h: 'another string'
]
}
}
]
}
}
z: [
b: {
c: [
{
d: [
{ f: 'some_string1' }
]
},
{
e: {
g: [
h: 'another string1'
]
}
}
]
},
x: {
c: [
{
d: [
{ f: 'some_string2' }
]
},
{
e: {
g: [
h: 'another string2'
]
}
}
]
}
]
}
def gen_dict_extract(input_dict):
result_obj = input_dict;
if hasattr(var, 'iteritems'):
for k, v in var.iteritems():
if isinstance(v, dict):
for result in gen_dict_extract(v):
yield result
elif isinstance(v, list):
v.sort();
for d in v:
for result in gen_dict_extract(d):
yield result
The output expectation is just to have all lists sorted irrespective of where they lie. I am even okay with sorting every item in the dictionary but list sorting is what I require.
Taking a smaller example here to explain the output:
old_json = {
'x': [
{
'z': {
'y': ['agsd', 'xef', 'sdsd', 'erer']
}
},
{
's': {
'f': 'ererer',
'd': [5, 6, 2, 3, 1]
}
}
]
}
new_json = {
'x': [
{
's': {
'f': 'ererer',
'd': [1, 2, 3, 5, 6]
}
},
{
'z': {
'y': ['agsd', 'erer', 'sdsd','xef']
}
}
]
}
Something like above.

If you want the output to be a different dictionary (i.e. not sorting the original), the function should be written like this:
def sortedDeep(d):
if isinstance(d,list):
return sorted( sortedDeep(v) for v in d )
if isinstance(d,dict):
return { k: sortedDeep(d[k]) for k in sorted(d)}
return d
This way you can use sortedDeep() the same way you would use the buil-in sorted() function:
new_json = sortedDeep(old_json)
[EDIT] Improved version that will also sort lists of dictionaries (or list of lists) based on the smallest key/value of the embedded object:
def sortedDeep(d):
def makeTuple(v): return (*v,) if isinstance(v,(list,dict)) else (v,)
if isinstance(d,list):
return sorted( map(sortedDeep,d) ,key=makeTuple )
if isinstance(d,dict):
return { k: sortedDeep(d[k]) for k in sorted(d)}
return d

I believe the code snippet here will do the job for sorting nested dictionaries.
def nested_sort(d:dict):
for v in d.values():
if isinstance(v,dict):
nested_sort(v)
elif isinstance(v,list):
v.sort()
However, I cannot test the code because the example you gave is not in legal JSON format or a legal python dictionary.

Bidirectional data structure conversion in Python

Note: this is not a simple two-way map; the conversion is the important part.
I'm writing an application that will send and receive messages with a certain structure, which I must convert from and to an internal structure.
For example, the message:
{
"Person": {
"name": {
"first": "John",
"last": "Smith"
}
},
"birth_date": "1997.01.12",
"points": "330"
}
This must be converted to :
{
"Person": {
"firstname": "John",
"lastname": "Smith",
"birth": datetime.date(1997, 1, 12),
"points": 330
}
}
And vice-versa.
These messages have a lot of information, so I want to avoid having to manually write converters for both directions. Is there any way in Python to specify the mapping once, and use it for both cases?
In my research, I found an interesting Haskell library called JsonGrammar which allows for this (it's for JSON, but that's irrelevant for the case). But my knowledge of Haskell isn't good enough to attempt a port.

That's actually quite an interesting problem. You could define a list of transformation, for example in the form (key1, func_1to2, key2, func_2to1), or a similar format, where key could contain separators to indicate different levels of the dict, like "Person.name.first".
noop = lambda x: x
relations = [("Person.name.first", noop, "Person.firstname", noop),
("Person.name.last", noop, "Person.lastname", noop),
("birth_date", lambda s: datetime.date(*map(int, s.split("."))),
"Person.birth", lambda d: d.strftime("%Y.%m.%d")),
("points", int, "Person.points", str)]
Then, iterate the elements in that list and transform the entries in the dictionary according to whether you want to go from form A to B or vice versa. You will also need some helper function for accessing keys in nested dictionaries using those dot-separated keys.
def deep_get(d, key):
for k in key.split("."):
d = d[k]
return d
def deep_set(d, key, val):
*first, last = key.split(".")
for k in first:
d = d.setdefault(k, {})
d[last] = val
def convert(d, mapping, atob):
res = {}
for a, x, b, y in mapping:
a, b, f = (a, b, x) if atob else (b, a, y)
deep_set(res, b, f(deep_get(d, a)))
return res
Example:
>>> d1 = {"Person": { "name": { "first": "John", "last": "Smith" } },
... "birth_date": "1997.01.12",
... "points": "330" }
...
>>> print(convert(d1, relations, True))
{'Person': {'birth': datetime.date(1997, 1, 12),
'firstname': 'John',
'lastname': 'Smith',
'points': 330}}

Tobias has answered it quite well. If you are looking for a library that ensures the Model Transformation dynamically then you can explore the Python's Model transformation library PyEcore.
PyEcore allows you to handle models and metamodels (structured data model), and gives the key you need for building ModelDrivenEngineering-based tools and other applications based on a structured data model. It supports out-of-the-box:
Data inheritance,
Two-ways relationship management (opposite references),
XMI (de)serialization,
JSON (de)serialization etc
Edit
I have found something more interesting for you with example similar to yours, check out JsonBender.
import json
from jsonbender import bend, K, S
MAPPING = {
'Person': {
'firstname': S('Person', 'name', 'first'),
'lastname': S('Person', 'name', 'last'),
'birth': S('birth_date'),
'points': S('points')
}
}
source = {
"Person": {
"name": {
"first": "John",
"last": "Smith"
}
},
"birth_date": "1997.01.12",
"points": "330"
}
result = bend(MAPPING, source)
print(json.dumps(result))
Output:
{"Person": {"lastname": "Smith", "points": "330", "firstname": "John", "birth": "1997.01.12"}}

Here is my take on this (converter lambdas and dot-based notation idea taken from tobias_k):
import datetime
converters = {
(str, datetime.date): lambda s: datetime.date(*map(int, s.split("."))),
(datetime.date, str): lambda d: d.strftime("%Y.%m.%d"),
}
mapping = [
('Person.name.first', str, 'Person.firstname', str),
('Person.name.last', str, 'Person.lastname', str),
('birth_date', str, 'Person.birth', datetime.date),
('points', str, 'Person.points', int),
]
def covert_doc(doc, mapping, converters, inverse=False):
converted = {}
for keys1, type1, keys2, type2 in mapping:
if inverse:
keys1, type1, keys2, type2 = keys2, type2, keys1, type1
converter = converters.get((type1, type2), type2)
keys1 = keys1.split('.')
keys2 = keys2.split('.')
obj1 = doc
while keys1:
k, *keys1 = keys1
obj1 = obj1[k]
dict2 = converted
while len(keys2) > 1:
k, *keys2 = keys2
dict2 = dict2.setdefault(k, {})
dict2[keys2[0]] = converter(obj1)
return converted
# Test
doc1 = {
"Person": {
"name": {
"first": "John",
"last": "Smith"
}
},
"birth_date": "1997.01.12",
"points": "330"
}
doc2 = {
"Person": {
"firstname": "John",
"lastname": "Smith",
"birth": datetime.date(1997, 1, 12),
"points": 330
}
}
assert doc2 == covert_doc(doc1, mapping, converters)
assert doc1 == covert_doc(doc2, mapping, converters, inverse=True)
This nice things are that you can reuse converters (even to convert different document structures) and that you only need to define non-trivial conversions. The drawback is that, as it is, every pair of types must always use the same conversion (maybe it could be extended to add optional alternative conversions).

You can use lists to describe paths to values in objects with type converting functions, for example:
from_paths = [
(['Person', 'name', 'first'], None),
(['Person', 'name', 'last'], None),
(['birth_date'], lambda s: datetime.date(*map(int, s.split(".")))),
(['points'], lambda s: int(s))
]
to_paths = [
(['Person', 'firstname'], None),
(['Person', 'lastname'], None),
(['Person', 'birth'], lambda d: d.strftime("%Y.%m.%d")),
(['Person', 'points'], str)
]
and a little function to covert from and to (much like tobias suggests but without string separation and using reduce to get values from dict):
def convert(from_paths, to_paths, obj):
to_obj = {}
for (from_keys, convfn), (to_keys, _) in zip(from_paths, to_paths):
value = reduce(operator.getitem, from_keys, obj)
if convfn:
value = convfn(value)
curr_lvl_dict = to_obj
for key in to_keys[:-1]:
curr_lvl_dict = curr_lvl_dict.setdefault(key, {})
curr_lvl_dict[to_keys[-1]] = value
return to_obj
test:
from_json = '''{
"Person": {
"name": {
"first": "John",
"last": "Smith"
}
},
"birth_date": "1997.01.12",
"points": "330"
}'''
>>> obj = json.loads(from_json)
>>> new_obj = convert(from_paths, to_paths, obj)
>>> new_obj
{'Person': {'lastname': u'Smith',
'points': 330,
'birth': datetime.date(1997, 1, 12), 'firstname': u'John'}}
>>> convert(to_paths, from_paths, new_obj)
{'birth_date': '1997.01.12',
'Person': {'name': {'last': u'Smith', 'first': u'John'}},
'points': '330'}
>>>

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Turn dict with duplicate keys into list containing these keys - python

Related

Count number of objects in list of dictionary where a key's value is more than 1

unable to update JSON using python

Remove duplicate values in different Json Lists python

How to sort all lists in a deeply nested dictionary in python?

Bidirectional data structure conversion in Python

Categories

Resources