Recursively creating a tree hierarchy without using class/object - python

I am having trouble creating a tree hierarchy in Python 3. I'd like to be able to do this without using classes.
The data I need to start with is not in order and in the format ['ID','Parent']:
data=[['E1', 'C1'],['C1', 'P1'],['P1', 'R1'],['E2', 'C2'],['C2', 'P2'],['P2', 'R1'],['C3', 'P2'],['E3', 'C4'],['C4', 'P3'],
['P3', 'R2'],['C5', 'P3'],['E4', 'C6'],['C6', 'P4'], ['P4', 'R2'],['E5', 'C7'],['C7', 'P5'],['P5', 'R3'],['E6', 'C9'],['C9', 'P6'],['P6', 'R3'],
['C8', 'P6'],['E7', 'C10'],['C10', 'P7'],['P7', 'R4'],['C11', 'P7'],['E8', 'C12'],['C12', 'P8'],['P8', 'R4']]
I want to create the (Tree) dictionary variable without the use of classes and end up with something like:
Tree={'R1':{'P1':{},'P2':{}},'R2':{}} etc
OR
Tree={'R1':[{'P1':[],'P2':[]}],'R2':[]} etc
Obviously R1 and R2 have more children than that but perhaps that's what the Tree structure would look like?

You can iterate over every (child, parent) pair and build a dictionary that maps each id to a dictionary of its children, creating the entries for the child and the parent as needed. An id that first appears as a parent is added to the set of roots, and it is removed again if it later turns up as a child.
roots = set()
mapping = {}
for child,parent in data:
    childitem = mapping.get(child,None)
    if childitem is None:
        childitem = {}
        mapping[child] = childitem
    else:
        roots.discard(child)
    parentitem = mapping.get(parent,None)
    if parentitem is None:
        mapping[parent] = {child:childitem}
        roots.add(parent)
    else:
        parentitem[child] = childitem
Once the loop has finished, roots is the set of ids of the tree roots, that is, the ids that never appear as a child. For each id in roots, mapping[id] is a dictionary of the form {'childid': child}, where childid is an id (here a string) and child is again a dictionary of that form.
So you can print them like:
for root in roots:
    print(mapping[root])
So in your case, the tree is:
tree = { id : mapping[id] for id in roots }
For your sample data, it generates:
>>> tree
{'R1': {'P1': {'C1': {'E1': {}}}, 'P2': {'C2': {'E2': {}}, 'C3': {}}}, 'R2': {'P4': {'C6': {'E4': {}}}, 'P3': {'C5': {}, 'C4': {'E3': {}}}}, 'R3': {'P6': {'C8': {}, 'C9': {'E6': {}}}, 'P5': {'C7': {'E5': {}}}}, 'R4': {'P8': {'C12': {'E8': {}}}, 'P7': {'C11': {}, 'C10': {'E7': {}}}}}
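The same idea can be written more compactly with dict.setdefault. This is only a sketch of a variant, not the code above: the function name build_tree and the shortened sample list are made up for illustration, and roots are found at the end as the ids that never appear as a child.

```python
def build_tree(pairs):
    """Build nested dicts from (child, parent) pairs given in any order."""
    nodes = {}        # id -> dict of that node's children
    children = set()  # every id that appears as a child
    for child, parent in pairs:
        # setdefault returns the existing entry or creates an empty one.
        child_node = nodes.setdefault(child, {})
        nodes.setdefault(parent, {})[child] = child_node
        children.add(child)
    # Roots are the ids that are never anybody's child.
    return {node_id: node for node_id, node in nodes.items()
            if node_id not in children}


# Shortened, hypothetical sample in the same ['ID', 'Parent'] format.
sample = [['E1', 'C1'], ['C1', 'P1'], ['P1', 'R1'], ['C3', 'P2'], ['P2', 'R1']]
tree = build_tree(sample)
print(tree)
```

Because every node dictionary is shared between nodes and its parent's entry, the order of the input pairs does not matter.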

Targets don't match node IDs in networkx json file

I have a network I want to output to a json file. However, when I output it, node targets are converted to numbers and no longer match the node ids, which are strings.
For example:
G = nx.DiGraph(data)
G.edges()
results in:
[(22, 'str1'),
(22, 'str2'),
(22, 'str3')]
in python. This is correct.
But in the output, when I write out the data like so...
json.dump(json_graph.node_link_data(G), f,
          indent = 4, sort_keys = True, separators=(',',':'))
...the ids of the three target nodes 'str1', 'str2', and 'str3' are kept as strings:
{
    "id":"str1"
},
{
    "id":"str2"
},
{
    "id":"str3"
}
The targets of node 22, however, have been turned into numbers:
{
    "source":22,
    "target":972
},
{
    "source":22,
    "target":1261
},
{
    "source":22,
    "target":1259
}
This happens for all nodes that have string ids.
Why is this, and how can I prevent it?
The desired result is that either the "target" fields keep the string ids, or the string ids become numeric in a way that matches the targets.
Why is this
It's a feature. Not all graph libraries accept strings as identifiers, but all that I know of accept integers.
how can I prevent it?
Replace the ids by node names using the nodes map:
>>> import networkx as nx
>>> import pprint
>>> g = nx.DiGraph()
>>> g.add_edge(1, 'foo')
>>> g.add_edge(2, 'bar')
>>> g.add_edge('foo', 'bar')
>>> res = nx.node_link_data(g)
>>> pprint.pprint(res)
{'directed': True,
 'graph': {},
 'links': [{'source': 0, 'target': 3},
           {'source': 1, 'target': 2},
           {'source': 3, 'target': 2}],
 'multigraph': False,
 'nodes': [{'name': 1}, {'name': 2}, {'name': 'bar'}, {'name': 'foo'}]}
>>> res['links'] = [
    {
        'source': res['nodes'][link['source']]['name'],
        'target': res['nodes'][link['target']]['name']
    }
    for link in res['links']]
>>> pprint.pprint(res)
{'directed': True,
 'graph': {},
 'links': [{'source': 1, 'target': 'foo'},
           {'source': 2, 'target': 'bar'},
           {'source': 'foo', 'target': 'bar'}],
 'multigraph': False,
 'nodes': [{'name': 1}, {'name': 2}, {'name': 'bar'}, {'name': 'foo'}]}
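The remapping step can also be wrapped in a small helper that keeps any extra link attributes and leaves the input untouched. This is a sketch that works on a plain dict shaped like the output shown above, so it runs without networkx; the name links_with_names and the name_key parameter are invented for this example.

```python
def links_with_names(node_link, name_key="name"):
    """Return the links with index-based endpoints replaced by node names."""
    nodes = node_link["nodes"]
    return [
        # dict(link, ...) copies the link, so other attributes survive.
        dict(link,
             source=nodes[link["source"]][name_key],
             target=nodes[link["target"]][name_key])
        for link in node_link["links"]
    ]


# A dict with the same shape as the node_link_data output shown above.
res = {
    "directed": True,
    "graph": {},
    "links": [{"source": 0, "target": 3},
              {"source": 1, "target": 2},
              {"source": 3, "target": 2}],
    "multigraph": False,
    "nodes": [{"name": 1}, {"name": 2}, {"name": "bar"}, {"name": "foo"}],
}
named = links_with_names(res)
print(named)
```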
To make the output conform to the d3 template that is linked in the node_link_data documentation, you can make a couple of simple changes to the node_link_data function. Just define the function below and use it instead. All I changed was to trim some of the outputs that the template does not need, and to store the node label instead of an index. The index the original function used for source and target was created inside the function, so it isn't something you can extract from the graph itself. If you want to be certain that your node labels correspond to your links, it's safest to modify node_link_data.
The D3 Template this creates data for is here
Note that if you use the below data without adding a node or link attribute, you will need to delete the following lines from the d3 template:
.attr("stroke-width", function(d) { return Math.sqrt(d.value); })
and
.attr("fill", function(d) { return color(d.group); })
Modified function:
from itertools import chain
import json
import networkx as nx
__author__ = """Aric Hagberg <hagberg@lanl.gov>"""
_attrs = dict(id='id', source='source', target='target', key='key')
def node_link_data(G, attrs=_attrs):
    """Return data in node-link format that is suitable for JSON serialization
    and use in Javascript documents.
    """
    multigraph = G.is_multigraph()
    id_ = attrs['id']
    source = attrs['source']
    target = attrs['target']
    # Allow 'key' to be omitted from attrs if the graph is not a multigraph.
    key = None if not multigraph else attrs['key']
    if len(set([source, target, key])) < 3:
        raise nx.NetworkXError('Attribute names are not unique.')
    data = {}
    # G.nodes[n] and G.edges(...) are the networkx >= 2.0 spellings of
    # G.node[n] and G.edges_iter(...), which this function used originally.
    data['nodes'] = [dict(chain(G.nodes[n].items(), [(id_, n)])) for n in G]
    # Links store the node labels u and v directly instead of an index.
    if multigraph:
        data['links'] = [
            dict(chain(d.items(),
                       [(source, u), (target, v), (key, k)]))
            for u, v, k, d in G.edges(keys=True, data=True)]
    else:
        data['links'] = [
            dict(chain(d.items(),
                       [(source, u), (target, v)]))
            for u, v, d in G.edges(data=True)]
    return data

Flatten a nested dict structure into a dataset

For some post-processing, I need to flatten a structure like this
{'foo': {
    'cat': {'name': 'Hodor', 'age': 7},
    'dog': {'name': 'Mordor', 'age': 5}},
 'bar': { 'rat': {'name': 'Izidor', 'age': 3}}
}
into this dataset:
[{'foobar': 'foo', 'animal': 'dog', 'name': 'Mordor', 'age': 5},
 {'foobar': 'foo', 'animal': 'cat', 'name': 'Hodor', 'age': 7},
 {'foobar': 'bar', 'animal': 'rat', 'name': 'Izidor', 'age': 3}]
So I wrote this function:
import copy
def flatten(data, primary_keys):
    out = []
    keys = copy.copy(primary_keys)
    keys.reverse()
    def visit(node, primary_values, prim):
        if len(prim):
            p = prim.pop()
            for key, child in node.items():
                primary_values[p] = key
                visit(child, primary_values, copy.copy(prim))
        else:
            new = copy.copy(node)
            new.update(primary_values)
            out.append(new)
    visit(data, { }, keys)
    return out
# a is the nested structure shown above
out = flatten(a, ['foobar', 'animal'])
I was not really satisfied because I have to use copy.copy to protect my inputs. Obviously, when using flatten one does not want the inputs to be altered.
Then I thought about one alternative that uses more global variables (at least global to flatten) and uses an index instead of directly passing primary_keys to visit. However, this does not really help me to get rid of the ugly initial copy:
keys = copy.copy(primary_keys)
keys.reverse()
So here is my final version:
def flatten(data, keys):
    data = copy.copy(data)
    keys = copy.copy(keys)
    keys.reverse()
    out = []
    values = {}
    def visit(node, id):
        if id:
            id -= 1
            for key, child in node.iteritems():
                values[keys[id]] = key
                visit(child, id)
        else:
            node.update(values)
            out.append(node)
    visit(data, len(keys))
    return out
Is there a better implementation (that can avoid the use of copy.copy)?
Edit: modified to account for variable dictionary depth.
By using the merge function from my previous answer (below), you can avoid calling update, which modifies the caller's dictionaries. There is then no need to copy the dictionary first.
def flatten(data, keys):
    out = []
    values = {}
    def visit(node, id):
        if id:
            id -= 1
            for key, child in node.items():
                # keys is not reversed here, so index it from the front:
                # the outermost level gets keys[0], the next one keys[1], ...
                values[keys[-1 - id]] = key
                visit(child, id)
        else:
            out.append(merge(node, values)) # use merge instead of update
    visit(data, len(keys))
    return out
One thing I don't understand is why you need to protect the keys input. I don't see them being modified anywhere.
Previous answer
How about list comprehension?
def merge(d1, d2):
    return dict(list(d1.items()) + list(d2.items()))
[[merge({'foobar': key, 'animal': sub_key}, sub_sub_dict)
  for sub_key, sub_sub_dict in sub_dict.items()]
 for key, sub_dict in a.items()]
The tricky part was merging the dictionaries without using update (which returns None).
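Another way to avoid copy.copy altogether is a recursive generator that builds a fresh dictionary for every row. This is a sketch, assuming Python 3.5+ (for the {**a, **b} syntax) and that every branch is exactly len(keys) levels deep; the names flatten_rows, prefix and animals are made up for this example.

```python
def flatten_rows(data, keys, prefix=None):
    """Yield one flat dict per leaf, labelling each nesting level with keys."""
    prefix = prefix or {}
    if not keys:
        # Leaf level: build a new dict so the caller's data is never mutated.
        yield {**data, **prefix}
        return
    for key, child in data.items():
        # keys[1:] and the merged prefix are new objects, so nothing is shared.
        yield from flatten_rows(child, keys[1:], {**prefix, keys[0]: key})


animals = {
    'foo': {'cat': {'name': 'Hodor', 'age': 7},
            'dog': {'name': 'Mordor', 'age': 5}},
    'bar': {'rat': {'name': 'Izidor', 'age': 3}},
}
rows = list(flatten_rows(animals, ['foobar', 'animal']))
for row in rows:
    print(row)
```

Neither the nested input nor the list of keys is modified, because each level only creates new objects.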

Using recursion to reverse a dictionary around a value in python

I have a data set which follows the structure of the following example:
exampleset = {
    'body' : {
        'abdomen' : [{
            'arms' : {
                'value' : 2,
            }
        },{
            'legs': {
                'value' : 2,
            }
        }],
        'hands' : {
            'fingers' : {
                'value' : 5,
            }
        },
    }
}
I am trying to reverse this so I get something like:
{'value': {'value1': {5: {'fingers': {'hands': {'body': {}}}}},
           'value2': {2: {'legs': {'abdomen': {'body': {}}}}},
           'value3': {2: {'arms': {'abdomen': {'body': {}}}}}},
}
(I hope I got the bracket matching right, but you get the idea.)
I am using a couple of recursion functions to do this, like so:
def recurse_find(data, values, count):
    global conf
    for key in data:
        for v in conf['value_names']:
            if key == v:
                values[v+str(count)] = {}
                values[v+str(count)][data[key]] = {}
                count += 1
                # originally just using this line:
                # values[data[key]] = {}
        if type(data[key]) is list:
            for i in data[key]:
                if type(i) is dict:
                    values = recurse_find(i, values, count)
            values = add_new_level(values, key)
        elif type(data[key]) is dict:
            values = recurse_find(data[key], values, count)
            values = add_new_level(values, key)
    return values
def add_new_level(data, new_key):
    for key in data:
        if data[key] == {}:
            data[key][new_key] = {}
        else:
            data[key] = add_new_level(data[key], new_key)
    return data
conf = { "value_names": ["value"] }
values = {}
for value in conf['value_names']:
    values[value] = recurse_find(exampleset, {}, 1)
print(values)
At the moment I only get one value returned correctly, obviously I would like them all. Originally I didn't label the values (value1, value2 etc), but when doing this example set I realised that of course if the values are the same I'll only get one! If I remove the value name keys it finds all the values (unless duplicate) but still doesn't return the correct levels as it includes some of the others while it loops round. I don't care about the order of the values, just that they are labelled differently so I don't miss out any.
Current result:
{'value': {'value1': {5: {'fingers': {'hands': {'body': {}}}}}}}
I think that the solution is the inclusion of a pretty simple step, but I can't see it at the moment and I've already spent too long looking at this.
Any help appreciated.
EDIT:
I've gotten a little further by making count a global variable, with count=1 set outside the function. That solved the problem of getting all the values.
I have narrowed down the addition of extra keys to the add_new_level function, but haven't yet figured out how to change it.
Output:
{'value': {'value1': {2: {'arms': {'abdomen': {'legs': {'abdomen': {'fingers': {'hands': {'body': {}}}}}}}}},
           'value2': {2: {'legs': {'abdomen': {'fingers': {'hands': {'body': {}}}}}}},
           'value3': {5: {'fingers': {'hands': {'body': {}}}}}}}
I have adjusted your output type slightly: the dictionary containing 'value1', 'value2', etc. becomes a list. I believe this is better because the order of the entries will be lost anyway unless an OrderedDict (from the collections module) is used, and in any case list indexes 0, 1, 2, ... translate quite easily to value1, value2, value3, etc.
res = {'value': []}
def revnest(inp, keys=[]):
    res2 = res['value']
    if type(inp) == list:
        inp = {i:j[i] for j in inp for i in j}
    for x in inp:
        if x == 'value':
            res2.append({inp[x]:{}})
            res2 = res2[-1][inp[x]]
            for y in keys[::-1]:
                res2[y] = {}
                res2 = res2[y]
        else:
            revnest(inp[x], keys+[x])
revnest(exampleset)
print(res)
which given your exampleset, prints:
{'value': [{2: {'legs': {'abdomen': {'body': {}}}}}, {2: {'arms': {'abdomen': {'body': {}}}}}, {5: {'fingers': {'hands': {'body': {}}}}}]}
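The same result can be produced without a module-level res by yielding one reversed chain per value. This is a sketch of an alternative rather than the code above, assuming Python 3.3+ for yield from; the names reversed_paths, value_key and path are invented for illustration.

```python
def reversed_paths(node, value_key='value', path=()):
    """Yield {value: reversed key chain} for every value_key found in node."""
    if isinstance(node, list):
        # A list adds no key of its own; just descend into each item.
        for item in node:
            yield from reversed_paths(item, value_key, path)
    elif isinstance(node, dict):
        for key, child in node.items():
            if key == value_key:
                nested = {}
                # path runs root-first, so wrapping in that order puts the
                # innermost key on the outside and the root at the centre.
                for name in path:
                    nested = {name: nested}
                yield {child: nested}
            else:
                yield from reversed_paths(child, value_key, path + (key,))


exampleset = {
    'body': {
        'abdomen': [{'arms': {'value': 2}}, {'legs': {'value': 2}}],
        'hands': {'fingers': {'value': 5}},
    }
}
result = {'value': list(reversed_paths(exampleset))}
print(result)
```

Because each value gets its own list entry, duplicate values such as the two 2s are both kept.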

How can I change the value of a node in a python dictionary by following a list of keys?

I have a bit of a complex question that I can't seem to get to the bottom of. I have a list of keys corresponding to a position in a Python dictionary. I would like to be able to dynamically change the value at the position (found by the keys in the list).
For example:
listOfKeys = ['car', 'ford', 'mustang']
I also have a dictionary:
DictOfVehiclePrices = {'car':
                          {'ford':
                              {'mustang': 'expensive',
                               'other': 'cheap'},
                           'toyota':
                              {'big': 'moderate',
                               'small': 'cheap'}
                          },
                       'truck':
                          {'big': 'expensive',
                           'small': 'moderate'}
                      }
Via my list, how could I dynamically change the value of DictOfVehiclePrices['car']['ford']['mustang']?
In my actual problem, I need to follow the list of keys through the dictionary and change the value at the end position. How can this be done dynamically (with loops, etc.)?
Thank you for your help! :)
Use reduce and operator.getitem:
>>> from operator import getitem
>>> lis = ['car', 'ford', 'mustang']
Update value:
>>> reduce(getitem, lis[:-1], DictOfVehiclePrices)[lis[-1]] = 'cheap'
Fetch value:
>>> reduce(getitem, lis, DictOfVehiclePrices)
'cheap'
Note that in Python 3 reduce has been moved to the functools module.
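For Python 3, the two reduce calls can be wrapped in small helpers. This is a sketch: the helper names get_by_path and set_by_path and the trimmed-down prices dictionary are made up for this example, and every key except the last is assumed to exist already.

```python
from functools import reduce
from operator import getitem


def get_by_path(root, keys):
    """Follow keys through nested dicts and return the value found."""
    return reduce(getitem, keys, root)


def set_by_path(root, keys, value):
    """Walk to the parent of the last key, then assign there."""
    get_by_path(root, keys[:-1])[keys[-1]] = value


prices = {'car': {'ford': {'mustang': 'expensive', 'other': 'cheap'}}}
keys = ['car', 'ford', 'mustang']
before = get_by_path(prices, keys)
set_by_path(prices, keys, 'cheap')
after = get_by_path(prices, keys)
print(before, after)
```

A missing intermediate key raises KeyError, which is usually what you want when the path is supposed to exist.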
A very simple approach would be:
DictOfVehiclePrices[listOfKeys[0]][listOfKeys[1]][listOfKeys[2]] = 'new value'
print(reduce(lambda x, y: x[y], listOfKeys, DictOfVehiclePrices))
Output
expensive
In order to change the values,
result = DictOfVehiclePrices
for key in listOfKeys[:-1]:
    result = result[key]
result[listOfKeys[-1]] = "cheap"
print(DictOfVehiclePrices)
Output
{'car': {'toyota': {'small': 'cheap', 'big': 'moderate'},
'ford': {'mustang': 'cheap', 'other': 'cheap'}},
'truck': {'small': 'moderate', 'big': 'expensive'}}
You have a great solution here by @Joel Cornett.
Based on Joel's method, you can use it like this:
def set_value(dict_nested, address_list):
    cur = dict_nested
    # Every item except the last two is a key to descend into;
    # missing levels are created as empty dicts.
    for path_item in address_list[:-2]:
        try:
            cur = cur[path_item]
        except KeyError:
            cur = cur[path_item] = {}
    # The second-to-last item is the final key, the last item is the new value.
    cur[address_list[-2]] = address_list[-1]
DictOfVehiclePrices = {'car':
                          {'ford':
                              {'mustang': 'expensive',
                               'other': 'cheap'},
                           'toyota':
                              {'big': 'moderate',
                               'small': 'cheap'}
                          },
                       'truck':
                          {'big': 'expensive',
                           'small': 'moderate'}
                      }
set_value(DictOfVehiclePrices,['car', 'ford', 'mustang', 'a'])
print(DictOfVehiclePrices)
STDOUT:
{'car': {'toyota': {'small': 'cheap', 'big': 'moderate'}, 'ford':
{'mustang': 'a', 'other': 'cheap'}}, 'truck': {'small': 'moderate',
'big': 'expensive'}}
def update_dict(parent, data, value):
    '''
    To update the value in the data if the data
    is a nested dictionary
    :param parent: list of parents
    :param data: data dict in which value to be updated
    :param value: Value to be updated in data dict
    :return:
    '''
    if parent:
        if isinstance(data[parent[0]], dict):
            update_dict(parent[1:], data[parent[0]], value)
        else:
            data[parent[0]] = value
parent = ["test", "address", "area", "street", "locality", "country"]
data = {
    "first_name": "ttcLoReSaa",
    "test": {
        "address": {
            "area": {
                "street": {
                    "locality": {
                        "country": "india"
                    }
                }
            }
        }
    }
}
update_dict(parent, data, "IN")
This is a recursive function that updates a nested dict based on a list of keys:
1. Call the update_dict function with the required params.
2. The function takes the first key in the list and retrieves its value from the dict.
3. If the retrieved value is a dict, the function drops that key from the list and descends into the retrieved dict.
4. It passes that dict and the remaining keys to the same function recursively.
5. When the retrieved value is not a dict, we have reached the desired key, so the function replaces data[key] with the new value. An empty list of keys ends the recursion without changing anything.
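One consequence of the isinstance check is that update_dict leaves the data unchanged when the value at the end of the key list is itself a dict, because it recurses once more with an empty list. If that case matters, a variant can decide based on the remaining keys instead. This is a sketch, assuming a non-empty key list whose intermediate keys all exist; the names update_nested and config are hypothetical.

```python
def update_nested(keys, data, value):
    """Set data[k0][k1]...[kn] = value, whatever currently sits at the end."""
    head, rest = keys[0], keys[1:]
    if rest:
        # More keys to follow: descend one level.
        update_nested(rest, data[head], value)
    else:
        # Last key: replace the value even if it is currently a dict.
        data[head] = value


config = {"test": {"address": {"country": "india"}}}
update_nested(["test", "address", "country"], config, "IN")
update_nested(["test", "address"], config, "unknown")
print(config)
```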

Methods to indent when using multiple list comprehension

Trying not to use too many variables, I came up with the code below. It looks horrible. Any ideas on how to format it nicely? Do I need to use more variables?
I write code like this a lot, and it would help to see what methods people usually resort to in order to keep code readable while creating fewer variables.
exceptions = []
# find all the distinct parent exceptions (sorted) and add to the list
# with their children list
for parent in collection.find(
        {'tags': 'exception'}).sort('viewPriority').distinct('parentException'):
    group_info = {'groupName': parent,
                  'children': [{'value': ex['value'],
                                'label': ex['label'],}
                               for ex in collection.find({'tags': 'exception',
                                                          'parentException': parent}
                                                         ).sort('viewPriority')],
                  }
    exceptions.append(group_info)
I would break your logic up into functions:
def get_children(parent):
    result = collection.find({'tags': 'exception', 'parentException': parent})
    result = result.sort('viewPriority')
    return [{'value': ex['value'], 'label': ex['label']} for ex in result]
def get_group_info(parent):
    return {'groupName': parent, 'children': get_children(parent)}
result = collection.find({'tags': 'exception'})
result = result.sort('viewPriority').distinct('parentException')
exceptions = [get_group_info(parent) for parent in result]
As a bonus, you can easily unittest get_children and get_group_info
It is definitely difficult to get this to look any good. Here is my best attempt at keeping the line lengths short and maintaining readability:
exceptions = []
# find all the distinct parent exceptions (sorted) and add to the list
# with their children list
for parent in (collection.find({'tags': 'exception'})
               .sort('viewPriority').distinct('parentException')):
    group_info = {
        'groupName': parent,
        'children': [{'value': ex['value'], 'label': ex['label'],}
                     for ex in (collection.find({'tags': 'exception',
                                                 'parentException': parent})
                                .sort('viewPriority'))],
    }
    exceptions.append(group_info)
