I'm writing a script that broadcasts a number of data streams over an MQTT network. I'm trying to convert the keys of the nested dicts to a string that I can then use as the MQTT broadcast channel. The data is coming in every second already formatted into a nested dict like so:
my_dict = { 'stream1': { 'dataset1': { 'value1': 123.4}},
            'dataset2': { 'value1': 123.4,
                          'value2': 567.8},
            'stream2': { 'dataset3': { 'value4': 910.2}},
            'stream3': { 'value5': 'abcd'}}
I've indented it to add readability, the extra spaces aren't in the actual data. As you can see it has multiple levels, not all levels have the same number of values, and some value keys are repeated. Also, one level is shallower than the rest but I can easily make it the same depth as the rest if that makes the problem easier to solve.
The dict above should provide an output like this:
("stream1/dataset1/value1", "stream1/dataset2/value1", ..., "stream3/value5")
and so on.
I feel like recursion might be a good solution to this but I'm not sure how to maintain an ordered list of keys as I pass through the structure, as well as make sure I hit each item in the structure, generating a new path for each base-level item (note the absence of "stream1/dataset1").
Here's the code I have so far:
my_dict = { as defined above }
def get_keys(input_dict, path_list, current_path):
    for key, value in input_dict.items():
        if isinstance(value, dict):
            current_path += value
            get_keys(value, path_list, current_path)
        else:
            path = '/'.join(current_path)
            path_list.append(path)

my_paths = []
cur_path = []
get_keys(my_dict, my_paths, cur_path)
[print(p) for p in my_paths]
This is a great opportunity to use yield to turn your function into a generator. A generator can yield a whole bunch of items and behave much like a list or other iterable. The caller loops over its return value and gets one yielded item each iteration until the function returns.
def get_keys(input_dict):
    for key, value in input_dict.items():
        if isinstance(value, dict):
            for subkey in get_keys(value):
                yield key + '/' + subkey
        else:
            yield key

for key in get_keys(my_dict):
    print(key)
Inside the outer for loop each value is either a dict or a plain value. If it's a plain value, just yield the key. If it's a dict, iterate over it and prepend key + '/' to each sub-key.
The nice thing is that you don't have to maintain any state. path_list and current_path are gone. get_keys() simply yields the strings one by one and the yield statements and recursive loop make the flattening of keys naturally shake out.
stream1/dataset1/value1
dataset2/value1
dataset2/value2
stream2/dataset3/value4
stream3/value5
You can use a generator for that purpose:
def convert(d):
    for k, v in d.items():
        if isinstance(v, dict):
            yield from (f'{k}/{x}' for x in convert(v))
        else:
            yield k
Considering your expected output, you seem to have a misplaced curly brace } in your example data, but using this test data:
my_dict = { 'stream1': { 'dataset1': { 'value1': 123.4},
                         'dataset2': { 'value1': 123.4,
                                       'value2': 567.8}},
            'stream2': { 'dataset3': { 'value4': 910.2}},
            'stream3': { 'value5': 'abcd'}}
This is the output:
print(list(convert(my_dict)))
# ['stream1/dataset1/value1', 'stream1/dataset2/value1', 'stream1/dataset2/value2', 'stream2/dataset3/value4', 'stream3/value5']
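Since the question ultimately wants a tuple of channel strings to broadcast on, wrapping the generator in tuple() gets there directly. A quick sketch using the corrected test data and the convert() generator from above:

```python
# Build the tuple of MQTT channel names the question asks for.
def convert(d):
    for k, v in d.items():
        if isinstance(v, dict):
            # prepend this level's key to every path yielded below it
            yield from (f'{k}/{x}' for x in convert(v))
        else:
            yield k

my_dict = {'stream1': {'dataset1': {'value1': 123.4},
                       'dataset2': {'value1': 123.4,
                                    'value2': 567.8}},
           'stream2': {'dataset3': {'value4': 910.2}},
           'stream3': {'value5': 'abcd'}}

channels = tuple(convert(my_dict))
print(channels)
# ('stream1/dataset1/value1', 'stream1/dataset2/value1',
#  'stream1/dataset2/value2', 'stream2/dataset3/value4', 'stream3/value5')
```

Dicts preserve insertion order in Python 3.7+, so the channels come out in the same order the streams appear in the data.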
So essentially I have a JSON object obtained through an API that looks similar to the one below and I am wondering how I would collect the sub-elements such as name and quantity and place it into an array/list.
{
    "item_one": {
        "name": "Item One",
        "weight": 0,
        "quantity": 1
    },
    "item_two": {
        "name": "Item Two",
        "weight": 0,
        "quantity": 23
    },
    "item_three": {
        "name": "Item Three",
        "weight": 0,
        "quantity": 53
    }
}
An example for what the desired output is would be the following:
nameLst = ['Item One', 'Item Two', 'Item Three']
quantityLst = ['1', '23', '53']
So far the only way I know how to do this would be to individually collect the quantity and name data by searching through all the specific items, this however would be impossible due to the sheer number of potential items.
You don't need to know the item names, you can simply loop over the keys of the dictionary and use those keys to query the JSON blob for each subdict.
namelst = []
quantitylst = []

for key in d.keys():
    subdict = d[key]
    namelst.append(subdict["name"])
    quantitylst.append(subdict["quantity"])
If you don't need the keys at any point, then you can loop over the values solely as Kelly Bundy mentions.
for v in d.values():
    namelst.append(v["name"])
    quantitylst.append(v["quantity"])
So far the only way I know how to do this would be to individually collect the quantity and name data by searching through all the specific items, this however would be impossible due to the sheer number of potential items.
I imagine you're just saying that this would be hard to do by hand, and you could do something like this.
distinct_keys = {k for d in json_obj.values() for k in d}
# you seem to want to convert ints to strings?
# if so, consider (some_transform(d[k]) if k in d else None)
result = {k:[d.get(k, None) for d in json_obj.values()] for k in distinct_keys}
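To make the comprehension concrete, here is the same idea run on the question's sample data (quantities stay as ints here; the transform comment above covers converting them to strings):

```python
# The distinct-keys comprehension applied to the question's sample data.
json_obj = {
    "item_one": {"name": "Item One", "weight": 0, "quantity": 1},
    "item_two": {"name": "Item Two", "weight": 0, "quantity": 23},
    "item_three": {"name": "Item Three", "weight": 0, "quantity": 53},
}

# every key that appears in any sub-dict
distinct_keys = {k for d in json_obj.values() for k in d}
# one list per key, None where a sub-dict lacks that key
result = {k: [d.get(k, None) for d in json_obj.values()] for k in distinct_keys}

print(result["name"])      # ['Item One', 'Item Two', 'Item Three']
print(result["quantity"])  # [1, 23, 53]
```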
If you actually need to iterate through this thing one object at a time though, consider something like the following:
from collections import defaultdict

result = defaultdict(list)
for d in json_obj.values():
    # if you KNOW you don't have missing data:
    # for k, v in d.items(): result[k].append(v)

    # you probably do have missing data though, so a cost proportional
    # to your key sizes is unavoidable starting from completely unprocessed
    # json data. you could save a little work, but here's the basic idea:
    # the work we do is different based on which sets/maps have the
    # keys we're operating on
    s = set(d.keys())
    new_keys = s.difference(result)
    missing_keys = [k for k in result if k not in s]
    same_keys = s.intersection(result)

    # this doesn't necessarily have to be special-cased, but it
    # allows us to guarantee result is non-empty everywhere else
    # and avoid some more special casing
    if new_keys and not result:
        for k, v in d.items():
            result[k].append(v)
    else:
        # backfill new keys we found with None
        L = result[next(iter(result))]
        for key in new_keys:
            result[key] = [None] * len(L)
            result[key].append(d[key])
        # fill in None for the current object for any keys in result
        # that we don't have available
        for key in missing_keys:
            result[key].append(None)
        # for everything in both objects, just append the new data
        for key in same_keys:
            result[key].append(d[key])
Then if you really needed variables and not a dictionary you can explicitly store them that way.
for k, L in result.items():
    globals()[f'{k}Lst'] = L
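To see the missing-key handling in action, the incremental loop above can be wrapped in a helper (merge_records is a name made up here) and run on data with uneven keys:

```python
from collections import defaultdict

# Sketch: the incremental loop wrapped in a function so the None
# backfill for uneven keys is easy to check.
def merge_records(json_obj):
    result = defaultdict(list)
    for d in json_obj.values():
        s = set(d.keys())
        new_keys = s.difference(result)
        missing_keys = [k for k in result if k not in s]
        same_keys = s.intersection(result)
        if new_keys and not result:
            # first object: just copy its items in
            for k, v in d.items():
                result[k].append(v)
        else:
            # backfill brand-new keys with None for all earlier objects
            L = result[next(iter(result))]
            for key in new_keys:
                result[key] = [None] * len(L)
                result[key].append(d[key])
            # None for keys this object lacks
            for key in missing_keys:
                result[key].append(None)
            # plain append for keys present in both
            for key in same_keys:
                result[key].append(d[key])
    return dict(result)

data = {'a': {'x': 1}, 'b': {'y': 2}, 'c': {'x': 3, 'y': 4}}
print(merge_records(data))
# {'x': [1, None, 3], 'y': [None, 2, 4]}
```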
I am looking for different ways to check values within JSON.
For instance, to check that the value is:
int - isinstance(value, int)
String - isinstance(value, str)
Array - isinstance(value, list)
But what is the cleanest way to check if values are list, dict or a list of dictionaries? How to correctly differentiate between them?
Example
{
    "test": ["a", "b"]
}

vs

{
    "test": {"a": 0, "b": 1}
}

vs

{
    "test": [
        {"a": 0},
        {"b": 1}
    ]
}
If JSON data schema validation is what you are after, the examples you gave can be easily handled by GoodJSON
Is list (of specific type of values)
from goodjson.validators import gj_all, foreach, is_list, is_string

validate_fn = gj_all(
    is_list(),
    foreach(is_string)
)

validate_fn(['a', 'b', 'c'])
Is dict
from goodjson.validators import foreach_key, is_dict

validate_fn = foreach_key(test=[is_dict])

validate_fn({
    'test': {
        'is_dict': 'yes'
    }
})
Is list of dict
from goodjson.validators import foreach_key, foreach, is_dict, is_list

validate_fn = foreach_key(
    test=[
        is_list(),
        foreach(is_dict)
    ]
)

validate_fn({
    'test': [{'foo': 'bar'}]
})
To recursively search through a JSON data structure, and handle the case where the items are collections such as lists or dictionaries, you could do something like the following.
Example: Recursively search for JSON key
def find_key(json_input, lookup_key):
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if k == lookup_key:
                yield v
            else:
                yield from find_key(v, lookup_key)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from find_key(item, lookup_key)
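For example, running find_key against the item blob from the earlier question pulls out every "quantity" value, however deeply nested:

```python
# find_key walks dicts and lists recursively, yielding every value
# stored under the requested key.
def find_key(json_input, lookup_key):
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if k == lookup_key:
                yield v
            else:
                yield from find_key(v, lookup_key)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from find_key(item, lookup_key)

data = {
    "item_one": {"name": "Item One", "weight": 0, "quantity": 1},
    "item_two": {"name": "Item Two", "weight": 0, "quantity": 23},
}
print(list(find_key(data, "quantity")))  # [1, 23]
```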
Also, be sure to take advantage of the standard library json package. The main functions for JSON encoding and decoding are:
json.dumps()
json.loads()
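A minimal round trip through those two functions:

```python
import json

# dumps: Python object -> JSON string; loads: JSON string -> Python object
blob = json.dumps({"test": [{"a": 0}, {"b": 1}]})
data = json.loads(blob)
print(data["test"][0]["a"])  # 0
```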
See also
recursive iteration through nested json for specific key in python
Iterating through a JSON object
Validate JSON data using python
This is sort of a followup question to one of my previous questions. I have some dictionaries where I need to look at every value they contain and if that value is a datetime I need to format it a specific way. I also need to be able to recurse into nested dictionaries and lists. This is what I have so far:
def fix_time(in_time):
    out_time = '{}-{:02d}-{:02d} {:02d}:{:02d}:{:02d}'.format(
        in_time.year, in_time.month, in_time.day,
        in_time.hour, in_time.minute, in_time.second)
    return out_time

def fix_recursive(dct):
    for key, value in dct.items():
        if isinstance(value, datetime.datetime):
            dct[key] = fix_time(value)
        elif isinstance(value, dict):
            fix_recursive(value)
mydict = {
    'Field1': 'Value1',
    'SomeDateField1': 1516312413.729,
    'Field2': 'Value2',
    'Field3': [
        {
            'Subfield3_1': 'SubValue1',
            'SubDateField3_1': 1516312413.729
        },
        {
            'Subfield3_2': 'SubValue2',
            'SubDateField3_2': 1516312413.729
        },
        {
            'Subfield3_3': 'SubValue3',
            'SubDateField3_3': 1516312413.729
        }
    ],
    'Field4': {
        'Subfield4_1': 'SubValue1',
        'SubDateField4_1': 1516312413.729
    }
}
fix_recursive(mydict)
This works great for dictionaries and nested dictionaries, but not so much for lists. So in the above example fix_recursive would correct SomeDateField1 and SubDateField4_1, but not SubDateField3_1, SubDateField3_2, or SubDateField3_3. Also, as I don't know what the input will look like before I get it, I am trying to create a function that could get at values in lists nested 3 or 4 levels deep.
And suggestions would be appreciated.
Thanks!
You need to differentiate between looping over a list and a dictionary
def fix_recursive(obj):
    if isinstance(obj, list):  # could replace with collections.abc.MutableSequence
        itr = enumerate(obj)
    elif isinstance(obj, dict):  # could replace with collections.abc.MutableMapping
        itr = obj.items()
    else:
        return  # don't iterate -- pass back up
    for key, value in itr:
        if isinstance(value, datetime.datetime):
            obj[key] = fix_time(value)
        else:
            fix_recursive(value)
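A quick sanity check of the list-aware version, using actual datetime objects in place of the question's raw float timestamps (the isinstance test only matches real datetime instances):

```python
import datetime

def fix_time(in_time):
    # format a datetime as 'YYYY-MM-DD HH:MM:SS'
    return '{}-{:02d}-{:02d} {:02d}:{:02d}:{:02d}'.format(
        in_time.year, in_time.month, in_time.day,
        in_time.hour, in_time.minute, in_time.second)

def fix_recursive(obj):
    # pick an iterator of (key, value) pairs for lists and dicts alike
    if isinstance(obj, list):
        itr = enumerate(obj)
    elif isinstance(obj, dict):
        itr = obj.items()
    else:
        return  # scalar: nothing to do
    for key, value in itr:
        if isinstance(value, datetime.datetime):
            obj[key] = fix_time(value)
        else:
            fix_recursive(value)

sample = {'Field1': 'Value1',
          'SomeDateField1': datetime.datetime(2018, 1, 18, 22, 33, 33),
          'Field3': [{'SubDateField3_1': datetime.datetime(2018, 1, 18, 22, 33, 33)}]}
fix_recursive(sample)
print(sample['SomeDateField1'])                # 2018-01-18 22:33:33
print(sample['Field3'][0]['SubDateField3_1'])  # 2018-01-18 22:33:33
```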
Following your current route, this adds list support to the recursive function.
Why use fix_time at all? If this is for serializing and deserializing, use JSON or pickle instead; then there's no need to convert the datetime by hand.
I have a dictionary where each key has several lists of data as its values like this
myDict = {'data1' : ['data_d','dataD']['data_e','dataE']['data_f','dataF']}
I want to be able to input one of the values in the list and then be given the key. This is so I can get the other value in the list now that I have the key.
I've tried
dataKey = (list(myDict.keys())[list(myDict.values()).index(dataD)])
but that didn't work
I've also tried
for k, v in myDict.items():
    if 'dataD' in v:
        print(k)
but that didn't work as well.
Side question, in the questions that I've looked through, I see people using the variable k and v a lot even without the OP mentioning them, so I am wondering if k and v are already set variable in dictionaries?
Your second attempt was almost right, but a nested for loop is needed to traverse the list-of-lists:
myDict = {'data1' : [['data_d','dataD'], ['data_e','dataE'], ['data_f','dataF']]}
for key, value in myDict.items():
    for sublist in value:
        if 'dataD' in sublist:
            print(key)  # -> data1
Using variables named k and v with dictionaries is purely optional; they aren't special, other than being very short abbreviations for "key" and "value".
Note that if only one match is ever expected to occur, the code could be made more efficient by stopping the search after one is found. Here's one way of doing that:
target = 'dataD'
try:
for key, value in myDict.items():
for sublist in value:
if target in sublist:
print(key) # -> data1
raise StopIteration # found, so quit searching
except StopIteration:
pass # found
else:
print('{} not found'.format(target))
If they are all going to be lists, then you can do something like this (if I am understanding correctly):
myDict = {
    'data1': [['data_d', 'dataD'], ['data_e', 'dataE'], ['data_f', 'dataF']],
}

def find(dic, item):
    for k, v in dic.items():
        for ls in v:
            if item in ls:
                return k
    return False

print(find(myDict, "data_d"))
# OUT: data1
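The question also mentions wanting the *other* value in the matching sublist once the key is known; a small extension (find_pair is a name made up here) can return both in one pass:

```python
myDict = {'data1': [['data_d', 'dataD'], ['data_e', 'dataE'], ['data_f', 'dataF']]}

def find_pair(dic, item):
    # return (key, remaining items of the matching sublist), or None
    for k, v in dic.items():
        for ls in v:
            if item in ls:
                others = [x for x in ls if x != item]
                return k, others
    return None

print(find_pair(myDict, 'dataD'))  # ('data1', ['data_d'])
```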
I would like to create a "translator" type of dict that would assign values that are keys in different dicts, which are nested, to keys in a dict that I created. The problem I run into is that I can't create a value that represents a nested dict key without having to convert that to a string or some other data type, and when I try to use a string as an index to the nested dict, I get an index error. Ideally, my dict would look something like this:
new_dict = {
    "new_key_1" : ['subdict1']['subdict2']['old_key_1'],
    "new_key_2" : ['subdict1']['subdict2']['old_key_2'],
    "new_key_3" : ['subdict1']['subdict3']['old_key_3']
}
Then, for each nested dict, I could generate a new dict object with a simple for loop:
for key, value in new_dict.items():
    user_dict_1[key] = OldDict[value]
The nested dicts are very large and I only need a few fields from each, otherwise I could just use the .copy() function to work with the old dicts.
PS- Any help in rewriting this question to be more readable also appreciated.
You're going to need reduce() for this one...
attrmap = {
    "new_key_1": ('subdict1', 'subdict2', 'old_key_1'),
    ...
}

from functools import reduce
print(reduce(lambda x, y: x[y], attrmap[somekey], old_object))
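Here is that idea run end to end on a made-up nested dict (old_object and the key names are invented for illustration), building the whole translated dict in one comprehension:

```python
from functools import reduce

# Sample nested structure standing in for the question's large dicts.
old_object = {'subdict1': {'subdict2': {'old_key_1': 'v1', 'old_key_2': 'v2'},
                           'subdict3': {'old_key_3': 'v3'}}}

# Each new key maps to the tuple of keys forming a path into old_object.
attrmap = {
    'new_key_1': ('subdict1', 'subdict2', 'old_key_1'),
    'new_key_2': ('subdict1', 'subdict2', 'old_key_2'),
    'new_key_3': ('subdict1', 'subdict3', 'old_key_3'),
}

# reduce walks the path one key at a time: old_object[k1][k2][k3]
new_dict = {k: reduce(lambda acc, part: acc[part], path, old_object)
            for k, path in attrmap.items()}
print(new_dict)  # {'new_key_1': 'v1', 'new_key_2': 'v2', 'new_key_3': 'v3'}
```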
Are you talking something like this?
from pprint import pprint as pp

subdict1 = {'subdict1_item1': 1, 'subdict1_item2': 2}
subdict2 = {'subdict2_item1': 3, 'subdict2_item2': 4}
subdict3 = {'subdict3_item1': 5, 'subdict3_item1': 6}

olddict = {
    'old_key_1': [subdict1, subdict2],
    'old_key_2': [subdict1, subdict2],
    'old_key_3': [subdict1, subdict3],
}

newdict = {
    'new_key_1': olddict['old_key_1'] + ['old_key_1'],
    'new_key_2': olddict['old_key_2'] + ['old_key_2'],
    'new_key_3': olddict['old_key_3'] + ['old_key_3'],
}
or this
newdict = {
'new_key_1': 'old_key_1',
'new_key_2': 'old_key_2',
'new_key_3': 'old_key_3',
}
def getnew(newkey, newdict, olddict):
    if newkey in newdict:
        oldkey = newdict[newkey]
        if oldkey in olddict:
            preitem = olddict[oldkey]  # returns a list with two items
            item = []
            item.append([preitem[0]])  # makes subdict1 wrapped in a list
            item.append([preitem[1]])  # makes subdict2/3 wrapped in a list
            item.append([oldkey])
            return item
    else:
        raise KeyError('newdict has no matching olddict key')
This results in:
pp(getnew('new_key_1', newdict, olddict))
print()
pp(getnew('new_key_2', newdict, olddict))
print()
pp(getnew('new_key_3', newdict, olddict))
[[{'subdict1_item1': 1, 'subdict1_item2': 2}],
[{'subdict2_item1': 3, 'subdict2_item2': 4}],
['old_key_1']]
[[{'subdict1_item1': 1, 'subdict1_item2': 2}],
[{'subdict2_item1': 3, 'subdict2_item2': 4}],
['old_key_2']]
[[{'subdict1_item1': 1, 'subdict1_item2': 2}],
[{'subdict3_item1': 6}],
['old_key_3']]