Collect sub-elements in a JSON object in Python

So essentially I have a JSON object, obtained through an API, that looks similar to the one below, and I am wondering how I would collect sub-elements such as name and quantity and place them into a list.
{
    "item_one": {
        "name": "Item One",
        "weight": 0,
        "quantity": 1
    },
    "item_two": {
        "name": "Item Two",
        "weight": 0,
        "quantity": 23
    },
    "item_three": {
        "name": "Item Three",
        "weight": 0,
        "quantity": 53
    }
}
An example of the desired output would be the following:
nameLst = ['Item One', 'Item Two', 'Item Three']
quantityLst = ['1', '23', '53']
So far the only way I know how to do this would be to individually collect the quantity and name data by searching through all the specific items; however, this would be impossible due to the sheer number of potential items.

You don't need to know the item names; you can simply loop over the keys of the dictionary and use those keys to look up each sub-dictionary.
namelst = []
quantitylst = []
for key in d.keys():
    subdict = d[key]
    namelst.append(subdict["name"])
    quantitylst.append(subdict["quantity"])
If you don't need the keys at any point, you can simply loop over the values instead, as Kelly Bundy mentions.
for v in d.values():
    namelst.append(v["name"])
    quantitylst.append(v["quantity"])
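If you prefer comprehensions, the same two lists can be built in a single expression each. A minimal sketch, assuming d is the parsed JSON dictionary from above:
namelst = [v["name"] for v in d.values()]
quantitylst = [v["quantity"] for v in d.values()]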

So far the only way I know how to do this would be to individually collect the quantity and name data by searching through all the specific items; however, this would be impossible due to the sheer number of potential items.
I imagine you're just saying that this would be hard to do by hand, in which case you could do something like this:
distinct_keys = {k for d in json_obj.values() for k in d}
# you seem to want to convert ints to strings?
# if so, consider (some_transform(d[k]) if k in d else None)
result = {k: [d.get(k, None) for d in json_obj.values()] for k in distinct_keys}
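For the sample JSON at the top of the question (assuming json_obj is that parsed dictionary), result would come out roughly as:
{'name': ['Item One', 'Item Two', 'Item Three'],
 'weight': [0, 0, 0],
 'quantity': [1, 23, 53]}
with the key order depending on set iteration order.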
If you actually need to iterate through this thing one object at a time though, consider something like the following:
from collections import defaultdict

result = defaultdict(list)
for d in json_obj.values():
    # if you KNOW you don't have missing data
    # for k, v in d.items(): result[k].append(v)
    # you probably do have missing data though, so a cost proportional
    # to your key sizes is unavoidable starting from completely unprocessed
    # json data. you could save a little work, but here's the basic idea:
    # the work we do is different based on which sets/maps have the
    # keys we're operating on
    s = set(d.keys())
    new_keys = s.difference(result)
    missing_keys = [k for k in result if k not in s]
    same_keys = s.intersection(result)
    # this doesn't necessarily have to be special cased, but it
    # allows us to guarantee result is non-empty everywhere else
    # and avoid some more special casing.
    if new_keys and not result:
        for k, v in d.items():
            result[k].append(v)
    else:
        # backfill new keys we found with None
        L = result[next(iter(result))]
        for key in new_keys:
            result[key] = [None] * len(L)
            result[key].append(d[key])
        # fill in None for the current object for any keys in result
        # that we don't have available
        for key in missing_keys:
            result[key].append(None)
        # for everything in both objects, just append the new data
        for key in same_keys:
            result[key].append(d[key])
Then, if you really need variables rather than a dictionary, you can explicitly store them that way:
for k, L in result.items():
    globals()[f'{k}Lst'] = L
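Writing into globals() works, but simply indexing the result dictionary is usually easier to read and debug; a hedged equivalent for the names from the question:
nameLst = result['name']
quantityLst = result['quantity']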

Related

Group similar list elements into a dict without grouping its multiple occurrences

I have a huge list which I convert to a dict (keyed on the 1st list element) for quick lookup by key.
List:
[0100,A,1.00,1]
.
.
[0450,A,1.00,1]
[0470,B,1.00,1]
[0480,A,1.00,1]
[0490,A,1.00,1]
[0500,A,1.00,1] #list-1 for below ref
[0510,B,1.00,1]
[0520,A,1.00,1]
[0530,A,1.00,1]
[0500,A,1.00,1] #list-2 for below ref
[0510,B,1.00,1]
[0520,X,1.00,1] #........
Converting into a dict:
for key, *vals in bytes_data:  # probably need a different approach here itself instead of appending
    data_dict.setdefault(key, []).append(vals)
The dict looks like:
{
    '0450': [[A,1.00,1]],
    '0470': [[B,1.00,1]],
    '0480': [[A,1.00,1]], #......
}
Now, my current scenario needs to chunk the data into series like 4xx/5xx/... depending on the situation.
For which I use:
key_series = ["0" + str(x) for x in range(500, 600, 10)]
article_data = {
    key: data_dict[key] for key in set(key_series) & set(data_dict)
}
The issue is, for some series like 5xx there are multiple occurrences. In that case my dict is grouped like:
{
    0500: [list-1, list-2, ..],
    0510: [list-1, list-2, ..]
}
But I need something like:
{
    0500-1: {0500: [list-1], 0510: [list-1], ....},
    0500-2: {0500: [list-2], 0510: [list-2], ....},
}
Any trick to achieve this? Thanks.
Not sure if this is what you want; let me know if this solves your problem.
from collections import defaultdict

data_dict = {
    "0500": [["A", 1.00, 1]],
    "0510": [["A", 1.00, 1], ["B", 1.00, 1], ["B", 1.00, 1]],
    "0520": [["A", 1.00, 1], ["D", 1.00, 1]]
}
key_series = ["0" + str(x) for x in range(500, 600, 10)]
article_data = {
    key: data_dict[key] for key in set(key_series) & set(data_dict)
}
res = defaultdict(dict)
for k, v in data_dict.items():
    for i, d in enumerate(v):
        # for now 0500 is hardcoded, which can be made dynamic as per requirement
        res[f"0500-{i+1}"][k] = d
print(res)
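As the comment says, the "0500" prefix is hardcoded. A minimal sketch of one way to make it dynamic, assuming the group label should simply be the lowest key of the requested series that is actually present in the data (this reuses key_series, article_data and defaultdict from above):
# derive the group prefix from the requested series instead of hardcoding it
series_keys = set(key_series) & set(data_dict)
prefix = min(series_keys)  # e.g. "0500" for the 5xx series

res = defaultdict(dict)
for k, v in article_data.items():
    for i, d in enumerate(v):
        res[f"{prefix}-{i+1}"][k] = d
print(res)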

Find item in list of dicts

I have a list of dictionaries, something like this:
list = [
    {"ENT_AUT": ["2018-11-27"]},
    {"ENT_NAT_REF_COD": "C87193"},
    {"ENT_NAM": "MONEYBASE LIMITED"},
    {"ENT_NAM_COM": "MONEYBASE LIMITED"},
    {"ENT_ADD": "Ewropa Business Centre, Triq Dun Karm"},
    {"ENT_TOW_CIT_RES": "Birkirkara"},
    {"ENT_POS_COD": "BKR 9034"},
    {"ENT_COU_RES": "MT"}
]
Here every dictionary will always contain only one key-value pair. Now I need the values of ENT_NAM, ENT_AUT, and all the other fields.
I tried something like this:
ENT_NAM = (list[2].values())[0]
print('ENT_NAM = ', ENT_NAM)
It works perfectly for this list, but my problem is that the dictionary containing 'ENT_NAM' will not always be at index 2 of the list. How can I generalize this solution so that I always get the right value even if the order of the dictionaries in the list changes?
What you are describing is a search problem. Here the naive solution is probably fine:
def get_prop(dicts, k):
    return next(x[k] for x in dicts if k in x)

get_prop(l, "ENT_NAM")
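If the key might be missing from every dictionary, the next() call raises StopIteration; a hedged variant that falls back to a default instead:
def get_prop(dicts, k, default=None):
    # return the first matching value, or default if no dict contains the key
    return next((x[k] for x in dicts if k in x), default)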
Incidentally, don't call your variable list: it shadows a builtin.
If you need to use this data more than about 3 times I would just reduce it to a dict:
def flatten(dicts):
    iterdicts = iter(dicts)
    start = next(iterdicts)
    for d in iterdicts:
        start.update(d)
    return start

one_dict = flatten(list_of_dicts)
one_dict["ENT_NAM"]
(There are plenty of other ways to flatten a list of dicts; I just currently think the use of iter() to get a consumable iterator is neat.)
As jasonharper said in the comments, if it is possible, the data should be formulated as a single dictionary.
If this cannot happen, you can retrieve the value of ENT_NAM using:
print(list(filter(lambda elem: "ENT_NAM" in elem.keys(), my_list))[0]["ENT_NAM"])
Returns:
MONEYBASE LIMITED
Note: list has been renamed to my_list, since list is a built-in Python name and shouldn't be shadowed.
If all of the keys are unique, you can flatten the list of dictionaries into a dictionary with a straightforward dictionary comprehension.
Note: you don't want to use list as a name for a variable, as it is an important built-in type. Use something like lst instead.
{ k: v for d in lst for k, v in d.items() }
Result:
{'ENT_AUT': ['2018-11-27'], 'ENT_NAT_REF_COD': 'C87193',
'ENT_NAM': 'MONEYBASE LIMITED', 'ENT_NAM_COM': 'MONEYBASE LIMITED',
'ENT_ADD': 'Ewropa Business Centre, Triq Dun Karm',
'ENT_TOW_CIT_RES': 'Birkirkara', 'ENT_POS_COD': 'BKR 9034',
'ENT_COU_RES': 'MT'}
Getting the value for key 'ENT_NAM' is now just:
{ k: v for d in lst for k, v in d.items() }['ENT_NAM']

Merge lists of dictionaries based on matching values in one key

I have a list of dictionaries:
dicte = [{'value': 3, 'content': 'some_string'},
         {'value': 4, 'content': 'some_string1'},
         {'value': 4, 'content': 'some_string2'},
         {'value': 4, 'content': 'some_string3'},
         {'value': 4, 'content': 'some_string4'},
         {'value': 5, 'content': 'some_string5'}]
I want to regroup all the content entries that share the same "value" key and generate another list of dictionaries from that.
The result here should be
new_dicte = [{'value': 3, 'content': ['some_string']},
             {'value': 4, 'content': ['some_string1', 'some_string2', 'some_string3', 'some_string4']},
             {'value': 5, 'content': ['some_string5']}]
What is the most Pythonic way to do this?
"Pythonic" is debatable, but you could build an intermediate dict to group by value:
grouped_by_value = {}
for item in dicte:
    if item["value"] not in grouped_by_value:
        grouped_by_value[item["value"]] = []
    grouped_by_value[item["value"]].append(item["content"])

new_dicte = [{"value": key, "content": val} for key, val in grouped_by_value.items()]
The order of the elements in the new_dicte list will be the order in which the values are encountered in the dicte list, assuming you are using a Python version where dicts preserve insertion order (3.7+).
If you want to guarantee a specific ordering for new_dicte, you can sort the items of the grouped_by_value.items(), for example, by increasing "value" key.
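For example, a minimal sketch that sorts the groups by increasing "value" before building the output list:
new_dicte = [{"value": key, "content": val}
             for key, val in sorted(grouped_by_value.items())]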
Here's one way, using the dict.setdefault method to collect value-content pairs and unpacking them later:
out = {}
for d in dicte:
    out.setdefault(d['value'], []).append(d['content'])

new_dicte = list(map(lambda x: dict(zip(('value', 'content'), x)), out.items()))
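The map/zip line is compact but a little dense; an equivalent list comprehension (same result, arguably easier to read) would be:
new_dicte = [{'value': value, 'content': content} for value, content in out.items()]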
Here's another method that builds the output dictionaries directly, so no separate conversion step is needed at the end:
out = {}
for d in dicte:
    if d['value'] not in out:
        out[d['value']] = {'value': d['value'], 'content': [d['content']]}
    else:
        out[d['value']]['content'].append(d['content'])

new_dicte = list(out.values())
Output:
[{'value': 3, 'content': ['some_string']},
{'value': 4,
'content': ['some_string1', 'some_string2', 'some_string3', 'some_string4']},
{'value': 5, 'content': ['some_string5']}]
"Best Pythonic" is debatable, but I would personally go with something like this:
import itertools
from collections import defaultdict

resp = defaultdict(list)
for key, group in itertools.groupby(dicte, key=lambda x: x["value"]):
    for thing in group:
        resp[key].append(thing.get("content"))
print(resp)
Output (can be adjusted to the way we want...)
{3: ['some_string'],
4: ['some_string1', 'some_string2', 'some_string3', 'some_string4'],
5: ['some_string5']}
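One caveat: itertools.groupby only groups consecutive elements, so this relies on dicte already being ordered by "value" (which the sample data happens to be). If that ordering isn't guaranteed, sort first; a minimal sketch reusing the imports above:
resp = defaultdict(list)
for key, group in itertools.groupby(sorted(dicte, key=lambda x: x["value"]),
                                    key=lambda x: x["value"]):
    for thing in group:
        resp[key].append(thing.get("content"))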

Get parent(key) from value (an element stored in a list)

I am stuck on a problem where I have to find the common 'parent' of items in a list.
Here's the problem:
I have a reference file, hierarchy.json, which is loaded as a dictionary in my superclass.
{
    "DATE": ["ISO", "SYSTEM", "HTTP"],
    "IP": ["IPv4", "IPv6"],
    "PATH": ["UNIX", "WINDOWS"]
}
As input, I get a list of values, and as output I expect sets of elements that belong to the same parent (referring to the hierarchy.json file). Even better, if I could also get the parent name, that would be great.
input_list = ["ISO", "UNIX", "HTTP"]
result = do_something(input_list)
print('RESULT:\t', result)
>>> RESULT: set("ISO","HTTP")
Essentially, I want to group the input elements into separate sets according to which "parent" they belong to.
I know this can be done in O(n^3) by looping through each element of the list. This is obviously not the best way to achieve the result.
Here's what I have tried:
def do_something(input_list: list, reference_dir: dict) -> list:
    result_list = []
    for lists in reference_dir.values():
        results = []
        for i in input_list:
            for j in input_list:
                if i != j:
                    if set([i, j]).issubset(set(lists)):
                        results.extend(set([i, j]))
        result_list.append(set(results))
    return result_list
input_list = ["ISO", "UNIX", "HTTP", "SYSTEM","WINDOWS"]
reference_dir = {"DATE": ["ISO", "SYSTEM", "HTTP"],"IP": ["IPv4", "IPv6"],"PATH":["UNIX", "WINDOWS"]}
result = do_something(input_list, reference_dir)
print('RESULT:\t', (result))
>>> RESULT: [{'SYSTEM', 'HTTP', 'ISO'}, set(), {'UNIX', 'WINDOWS'}]
Is there a way to optimize this/implement this in a better way?
Edit (added):
Also, if there's a way I could get the name of the 'parent' as part of the output, that would be awesome.
>>> RESULT: [DATE, PATH]
Thanks.
def do_something(input_list: list, reference_dir: dict) -> list:
    sets_1 = {k: set() for k in reference_dir.keys()}
    sets_2 = {item: sets_1[k] for k, list in reference_dir.items() for item in list}
    for input in input_list:
        sets_2[input].add(input)
    return sets_1
reference_dir = {
    "DATE": ["ISO", "SYSTEM", "HTTP"],
    "IP": ["IPv4", "IPv6"],
    "PATH": ["UNIX", "WINDOWS"]
}
input_list = ["ISO", "UNIX", "HTTP"]
print(do_something(input_list, reference_dir))
Prints:
{'DATE': {'HTTP', 'ISO'}, 'IP': set(), 'PATH': {'UNIX'}}
What do_something does:
sets_1 = {k: set() for k in reference_dir.keys()} This line creates a dictionary where each key is a key from reference_dir and the value is an empty set.
sets_2 = {item: sets_1[k] for k, list in reference_dir.items() for item in list} This line creates another dictionary. Its keys are the elements of the lists that are the values of reference_dir, and the value for each key is the corresponding empty set created in step 1 for that list's owning key.
for input in input_list: sets_2[input].add(input) Each element of input_list is added to the appropriate set, using the dictionary created in step 2 to select the correct set.
return sets_1 This returns a dictionary whose keys are all the keys in reference_dir and whose values are now the filled-in sets (some of which might be empty). The keys are the "names" of the sets you were looking for.
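If you only want the names of the parents that actually matched something (the RESULT: [DATE, PATH] form asked for in the edit), you can filter the returned dictionary; a small hedged sketch:
result = do_something(input_list, reference_dir)
matched_parents = [name for name, members in result.items() if members]
print(matched_parents)  # ['DATE', 'PATH'] for the sample input above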

How to build path from keys of nested dictionary?

I'm writing a script that broadcasts a number of data streams over an MQTT network. I'm trying to convert the keys of the nested dicts to a string that I can then use as the MQTT broadcast channel. The data is coming in every second already formatted into a nested dict like so:
my_dict = { 'stream1': { 'dataset1': { 'value1': 123.4}},
            'dataset2': { 'value1': 123.4,
                          'value2': 567.8},
            'stream2': { 'dataset3': { 'value4': 910.2}},
            'stream3': { 'value5': 'abcd'}}
I've indented it for readability; the extra spaces aren't in the actual data. As you can see it has multiple levels, not all levels have the same number of values, and some value keys are repeated. Also, one level is shallower than the rest, but I can easily make it the same depth as the rest if that makes the problem easier to solve.
The dict above should provide an output like this:
("stream1/dataset1/value1", "stream1/dataset2/value1", ..., "stream3/value5")
and so on.
I feel like recursion might be a good solution to this, but I'm not sure how to maintain an ordered list of keys as I traverse the structure, while also making sure I hit every item and generate a new path only for each base-level item (note the absence of "stream1/dataset1" in the desired output).
Here's the code I have so far:
my_dict = { as defined above }

def get_keys(input_dict, path_list, current_path):
    for key, value in input_dict.items():
        if isinstance(value, dict):
            current_path += value
            get_keys(value, path_list, current_path)
        else:
            path = '/'.join(current_path)
            path_list.append(path)

my_paths = []
cur_path = []
get_keys(my_dict, my_paths, cur_path)
[print(p) for p in my_paths]
This is a great opportunity to use yield to turn your function into a generator. A generator can yield a whole bunch of items and behave much like a list or other iterable. The caller loops over its return value and gets one yielded item each iteration until the function returns.
def get_keys(input_dict):
    for key, value in input_dict.items():
        if isinstance(value, dict):
            for subkey in get_keys(value):
                yield key + '/' + subkey
        else:
            yield key

for key in get_keys(my_dict):
    print(key)
Inside the outer for loop each value is either a dict or a plain value. If it's a plain value, just yield the key. If it's a dict, iterate over it and prepend key + '/' to each sub-key.
The nice thing is that you don't have to maintain any state. path_list and current_path are gone. get_keys() simply yields the strings one by one and the yield statements and recursive loop make the flattening of keys naturally shake out.
stream1/dataset1/value1
dataset2/value1
dataset2/value2
stream2/dataset3/value4
stream3/value5
You can use a generator for that purpose:
def convert(d):
    for k, v in d.items():
        if isinstance(v, dict):
            yield from (f'{k}/{x}' for x in convert(v))
        else:
            yield k
Considering your expected output, you seem to have a misplaced curly brace } in your example data; using this test data instead:
my_dict = { 'stream1': { 'dataset1': { 'value1': 123.4},
                         'dataset2': { 'value1': 123.4,
                                       'value2': 567.8}},
            'stream2': { 'dataset3': { 'value4': 910.2}},
            'stream3': { 'value5': 'abcd'}}
This is the output:
print(list(convert(my_dict)))
# ['stream1/dataset1/value1', 'stream1/dataset2/value1', 'stream1/dataset2/value2', 'stream2/dataset3/value4', 'stream3/value5']
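Since the original goal was a tuple of MQTT channel names, either generator can be materialised directly; a small usage sketch with the convert() version:
channels = tuple(convert(my_dict))
# ('stream1/dataset1/value1', 'stream1/dataset2/value1', 'stream1/dataset2/value2',
#  'stream2/dataset3/value4', 'stream3/value5')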
