Create a nested tree from a list - Python

From a list of lists, I would like to create a nested dictionary in which each key points to the next value in the sublist. In addition, I would like to count the number of times each sequence of sublist values occurred.
Example:
From a list of lists as such:
[['a', 'b', 'c'],
['a', 'c'],
['b']]
I would like to create a nested dictionary as such:
{
    'a': {
        {'b':
            {
                'c':{}
                'count_a_b_c': 1
            }
            'count_a_b*': 1
        },
        {'c': {},
            'count_a_c': 1
        }
        'count_a*': 2
    },
    {
        'b':{},
        'count_b':1
    }
}
Please note that the names of the keys for counts do not matter, they were named as such for illustration.

I was curious how I would do this and came up with this:
lst = [['a', 'b', 'c'],
       ['a', 'c'],
       ['b']]

tree = {}
for branch in lst:
    count_str = 'count_*'
    last_node = branch[-1]
    cur_tree = tree
    for node in branch:
        # build the count key: drop the trailing '_*' and append this node
        if node == last_node:
            count_str = count_str[:-2] + f'_{node}'
        else:
            count_str = count_str[:-2] + f'_{node}_*'
        cur_tree[count_str] = cur_tree.get(count_str, 0) + 1
        cur_tree = cur_tree.setdefault(node, {})  # descend, creating the level if needed
Nothing special happening here...
For your example:
import json
print(json.dumps(tree, sort_keys=True, indent=4))
produces:
{
    "a": {
        "b": {
            "c": {},
            "count_a_b_c": 1
        },
        "c": {},
        "count_a_b_*": 1,
        "count_a_c": 1
    },
    "b": {},
    "count_a_*": 2,
    "count_b": 1
}
It does not exactly reproduce what you imagined, but that is in part because your desired result is not a valid Python dictionary. Still, it may be a starting point for you to solve your problem.
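For example, the prefix counts can then be read straight off the resulting tree (these lookups assume the key names produced by the code above):
print(tree['count_a_*'])       # 2 -> two branches start with 'a' and continue past it
print(tree['a']['count_a_c'])  # 1 -> exactly one branch is ['a', 'c']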

Related

Best approach for converting list of nested dictionaries to a single dictionary with aggregate functions

I've looked through a lot of solutions on this topic, but I have been unable to adapt any of them into a performant solution for my case. Suppose I have a list of dictionaries stored as:
db_data = [
    {
        "start_time": "2020-04-20T17:55:54.000-00:00",
        "results": {
            "key_1": ["a","b","c","d"],
            "key_2": ["a","b","c","d"],
            "key_3": ["a","b","c","d"]
        }
    },
    {
        "start_time": "2020-04-20T18:32:27.000-00:00",
        "results": {
            "key_1": ["a","b","c","d"],
            "key_2": ["a","b","e","f"],
            "key_3": ["a","e","f","g"]
        }
    },
    {
        "start_time": "2020-04-21T17:55:54.000-00:00",
        "results": {
            "key_1": ["a","b","c"],
            "key_2": ["a"],
            "key_3": ["a","b","c","d"]
        }
    },
    {
        "start_time": "2020-04-21T18:32:27.000-00:00",
        "results": {
            "key_1": ["a","b","c"],
            "key_2": ["b"],
            "key_3": ["a"]
        }
    }
]
I am trying to aggregate this list into a dictionary keyed by the keys of the results objects, where each key maps each date to the number of unique values seen for that key on that day.
Expected output is something like:
{
    "key_1": {
        "2020-04-20": 4,
        "2020-04-21": 3
    },
    "key_2": {
        "2020-04-20": 6,
        "2020-04-21": 2
    },
    "key_3": {
        "2020-04-20": 7,
        "2020-04-21": 4
    }
}
What I have tried so far is using defaultdict and loops to aggregate the data. This takes a very long time unfortunately:
from collections import defaultdict
from datetime import datetime

grouped_data = defaultdict(dict)

for item in db_data:
    group = item['start_time'].strftime('%-b %-d, %Y')
    for k, v in item['results'].items():
        if group not in grouped_data[k].keys():
            grouped_data[k][group] = []
        grouped_data[k][group] = list(set(v + grouped_data[k][group]))

for k, v in grouped_data.items():
    grouped_data[k] = {x: len(y) for x, y in v.items()}

print(grouped_data)
Any help or guidance is appreciated. I have read that pandas might help here, but I am not quite sure how to adapt this use case.
Edit
I am not sure why this was closed so fast. I am just looking for some advice on how to increase performance. I would appreciate it if this could get re-opened.
The code below assigns a generator to flat_list that flattens the original structure into (key, date, results) tuples. The defaultdict is then set up as a dictionary with two levels of key (key and date), whose value is a set. The set is updated for each key/date pair, so it contains only unique items. This is vaguely similar to the example code, but it should be more efficient.
>>> from collections import defaultdict
>>> from functools import partial
>>>
>>> flat_list = ((key, db_item['start_time'][:10], results)
...              for db_item in db_data
...              for key, results in db_item['results'].items())
>>>
>>> d = defaultdict(partial(defaultdict, set))
>>>
>>> for key, date, li in flat_list:
...     d[key][date].update(li)
...
Testing it out, we get the same number of set items per key/date as the counts in the expected output:
defaultdict(..., {'key_1': defaultdict(<class 'set'>, {
'2020-04-20': {'a', 'd', 'b', 'c'},
'2020-04-21': {'a', 'b', 'c'}}),
'key_2': defaultdict(<class 'set'>, {
'2020-04-20': {'a', 'f', 'd', 'c', 'b', 'e'},
'2020-04-21': {'a', 'b'}}),
'key_3': defaultdict(<class 'set'>, {
'2020-04-20': {'a', 'f', 'd', 'c', 'b', 'g', 'e'},
'2020-04-21': {'a', 'd', 'b', 'c'}})})
If you prefer the value to be a count of list items, you can just do len(d[key][date]).
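If counts are what you ultimately need, a final pass (a minimal sketch reusing the names above) turns the sets into the dictionary from the expected output:
counts = {key: {date: len(values) for date, values in dates.items()}
          for key, dates in d.items()}
print(counts)
# {'key_1': {'2020-04-20': 4, '2020-04-21': 3},
#  'key_2': {'2020-04-20': 6, '2020-04-21': 2},
#  'key_3': {'2020-04-20': 7, '2020-04-21': 4}}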
Since flat_list is a generator, it doesn't do all its looping separately, but does it in conjunction with the loop that builds the dictionary. So in that way it's efficient.
[Update] On my system with CPython 3.8, I'm not seeing the performance gain indicated in the comments. The algorithm here is only marginally faster than the example in the question after fixing its item['start_time'].strftime('%-b %-d, %Y') line to item['start_time'][:10].
With that said, efficiency is addressed by taking advantage of the set. Operations on a set are very fast, and we're just updating its elements. We don't need to test a list for group membership first. Checking lists for membership is a very slow operation that can really add up in loops. The time-complexity for checking a list for membership is O(n) per item added, whereas set operations are O(1) per item added.
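As a quick illustration of that gap (a throwaway sketch; the absolute numbers will vary by machine):
import timeit

data_list = list(range(10_000))
data_set = set(data_list)

# Worst-case membership test: the list scans every element, the set hashes once.
print(timeit.timeit('9_999 in data_list', globals=globals(), number=1_000))
print(timeit.timeit('9_999 in data_set', globals=globals(), number=1_000))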
Reference on time complexity of python data types and operations: https://wiki.python.org/moin/TimeComplexity
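The question also mentions pandas. As a rough sketch only (assuming pandas is installed; the intermediate column names are mine), the same aggregation can be expressed as a group-by with a unique count:
import pandas as pd

# One row per (key, date, value) occurrence.
rows = [
    (key, item['start_time'][:10], value)
    for item in db_data
    for key, values in item['results'].items()
    for value in values
]
df = pd.DataFrame(rows, columns=['key', 'date', 'value'])

# Count distinct values per key/date, then reshape into {key: {date: count}}.
counts = df.groupby(['key', 'date'])['value'].nunique()
result = counts.unstack('date').to_dict(orient='index')
print(result)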

How to get all values for the same key from key-value pairs in python dictionary

How do I iterate over all key-value pairs of a dictionary and build, for each key, a list of all its different values, using Python?
Sample data:
"list": [
{
"1": 1
},
{
"1": 8
},
{
"1": 9
},
{
"1": 1
},
{
"2": 8
},
{
"2": 10
}
],
For the above list, I need to build a list like:
[{"1":[1,8,9]}, {"2":[8,10]}]
This will give you what you want
import collections

input_ = {'list': [{'1': 1}, {'1': 8}, {'1': 9}, {'1': 1}, {'2': 8}, {'2': 10}]}
result = collections.defaultdict(list)

for elem in input_['list']:
    key, value = next(iter(elem.items()))  # extract the single key/value pair from each one-entry dict
    if value not in result[key]:
        result[key].append(value)

result = [dict(result)]  # wrap in a list to roughly match the requested format
outputs
[{'1': [1, 8, 9], '2': [8, 10]}]
However I strongly recommend reconsidering how you structure your data. For example, your input should be a list of tuples, not a list of dictionaries.
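As a rough sketch of that suggestion (variable names are illustrative), the same data stored as a list of tuples groups cleanly in one pass:
from collections import defaultdict

pairs = [('1', 1), ('1', 8), ('1', 9), ('1', 1), ('2', 8), ('2', 10)]

grouped = defaultdict(list)
for key, value in pairs:
    if value not in grouped[key]:  # keep only distinct values, as in the expected output
        grouped[key].append(value)

print(dict(grouped))  # {'1': [1, 8, 9], '2': [8, 10]}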
This should do the trick, but please note that a dictionary with only one key and value is not a good way to store data.
new_dict = {}
for pair in your_list:
    for key in pair:
        if key in new_dict:
            new_dict[key].append(pair[key])  # append in place; it returns None
        else:
            new_dict[key] = [pair[key]]
print(new_dict)
new_dict is a dictionary mapping each key to a list of values (note that duplicates are kept, unlike in the first answer).

How to write Python to replace the following Perl code?

I have just encountered Perl code similar to the following:
my @keys = qw/ 1 2 3 /;
my @vals = qw/ a b c /;
my %hash;
@hash{@keys} = @vals;
This code populates an associative array given a list of keys and a list of values. For example, the above code creates the following data structure (expressed as JSON):
{
    "1": "a",
    "2": "b",
    "3": "c"
}
How would one go about doing this in Python?
Like this:
import json
keys = [1, 2, 3]
vals = ['a', 'b', 'c']
hash = dict(zip(keys, vals))
json.dumps(hash)
=> '{"1": "a", "2": "b", "3": "c"}'
That JSON is pretty much a polyglot with Python: the literal is valid syntax in both. Once you assign it to a name, though, it stops being a polyglot.
hf = {
    "1": "a",
    "2": "b",
    "3": "c"
}
You can also iteratively align items into a dictionary.
letters = ('a', 'b', 'c', )
numbers = ('1', '2', '3', )
hf = { n : l for n, l in zip(numbers, letters) }
You can do:
>>> keys='123'
>>> vals='abc'
>>> dict(zip(keys,vals))
{'1': 'a', '3': 'c', '2': 'b'}
(Python note: strings are iterable, so list('abc') is the rough equivalent of my @vals = qw/ a b c /; in Perl)
Then if you want JSON:
>>> import json
>>> json.dumps(dict(zip(keys,vals)))
'{"1": "a", "3": "c", "2": "b"}'

Comparing Python dictionaries and finding the differences between the two

So I'm trying to write a Python program that will take two .json files, compare their contents, and display the differences between them. So far my program takes user input to select two files and compares them just fine. I have hit a wall trying to figure out how to print what the actual differences between the two files are.
My program:
#!/usr/bin/env python2
import json

# get_json() requests user input to select a .json file
# and creates a python dict with the information
def get_json():
    file_name = raw_input("Enter name of JSON File: ")
    with open(file_name) as json_file:
        json_data = json.load(json_file)
    return json_data

# compare_json(x,y) takes 2 dicts and compares the contents;
# prints 'Match' if equal, or 'Not a match' if there are differences
def compare_json(x,y):
    for x_values, y_values in zip(x.iteritems(), y.iteritems()):
        if x_values == y_values:
            print 'Match'
        else:
            print 'Not a match'

def main():
    json1 = get_json()
    json2 = get_json()
    compare_json(json1, json2)

if __name__ == "__main__":
    main()
Example of my .json:
{
    "menu": {
        "popup": {
            "menuitem": [
                {
                    "onclick": "CreateNewDoc()",
                    "value": "New"
                },
                {
                    "onclick": "OpenDoc()",
                    "value": "Open"
                },
                {
                    "onclick": "CloseDoc()",
                    "value": "Close"
                }
            ]
        },
        "id": "file",
        "value": "File"
    }
}
Your problem stems from the fact that zip(x.iteritems(), y.iteritems()) pairs up key-value pairs by iteration position, not by key. Because the two dictionaries may contain different keys and therefore iterate in different orders, identical keys are not guaranteed to appear at the same position in the two sequences, so comparing position by position is unreliable. You are much better off checking whether each key exists in the other dictionary and comparing the associated values in both.
def compare_json(x,y):
    for x_key in x:
        if x_key in y and x[x_key] == y[x_key]:
            print 'Match'
        else:
            print 'Not a match'
    if any(k not in x for k in y):
        print 'Not a match'
If you want to print out the actual differences:
def printDiffs(x,y):
    diff = False
    for x_key in x:
        if x_key not in y:
            diff = True
            print "key %s in x, but not in y" % x_key
        elif x[x_key] != y[x_key]:
            diff = True
            print "key %s in x and in y, but values differ (%s in x and %s in y)" % (x_key, x[x_key], y[x_key])
    if not diff:
        print "both files are identical"
You might want to try out the jsondiff library in Python.
https://pypi.python.org/pypi/jsondiff/0.1.0
The examples referenced from the site are below.
>>> from jsondiff import diff
>>> diff({'a': 1}, {'a': 1, 'b': 2})
{<insert>: {'b': 2}}
>>> diff({'a': 1, 'b': 3}, {'a': 1, 'b': 2})
{<update>: {'b': 2}}
>>> diff({'a': 1, 'b': 3}, {'a': 1})
{<delete>: ['b']}
>>> diff(['a', 'b', 'c'], ['a', 'b', 'c', 'd'])
{<insert>: [(3, 'd')]}
>>> diff(['a', 'b', 'c'], ['a', 'c'])
{<delete>: [1]}
# Similar items get patched
>>> diff(['a', {'x': 3}, 'c'], ['a', {'x': 3, 'y': 4}, 'c'])
{<update>: [(1, {<insert>: {'y': 4}})]}
# Special handling of sets
>>> diff({'a', 'b', 'c'}, {'a', 'c', 'd'})
{<add>: set(['d']), <discard>: set(['b'])}
# Parse and dump JSON
>>> print diff('["a", "b", "c"]', '["a", "c", "d"]', parse=True, dump=True, indent=2)
{
  "$delete": [
    1
  ],
  "$insert": [
    [
      2,
      "d"
    ]
  ]
}

Method for Creating a Nested Dictionary from a List of Keys

I would like to create an empty nested dictionary from an arbitrary tuple/list that holds the keys. I am trying to find a simple way to do this in Python. It looks like something that collections.defaultdict should handle, but I can't seem to figure it out.
keys = ('a', 'b', 'c')
And a dictionary that will end up looking like this:
d = {
    'a': {
        'b': {
            'c': {}
        }
    }
}
I suppose you could do it with reduce:
def subdict(sub, key):
    return { key: sub }

d = reduce(subdict, reversed(keys), {})
(In Python 3, it’s functools.reduce.)
def nested_dict(keys):
    if len(keys) == 1:
        return {keys[0]: {}}
    return {keys[0]: nested_dict(keys[1:])}
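Since the question mentions collections.defaultdict, here is a minimal sketch of that route: a recursive defaultdict (sometimes called autovivification) creates each level lazily on first access rather than up front:
from collections import defaultdict
import json

def tree():
    return defaultdict(tree)

keys = ('a', 'b', 'c')
d = tree()
node = d
for key in keys:
    node = node[key]  # each lookup creates the next empty level

print(json.dumps(d))  # {"a": {"b": {"c": {}}}}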
