Say I'm given a tuple of strings, representing relationships between objects, for example:
connections = ("dr101-mr99", "mr99-out00", "dr101-out00", "scout1-scout2","scout3-scout1", "scout1-scout4", "scout4-sscout", "sscout-super")
each dash "-" shows a relationship between the two items in the string. Then I'm given two items:
first = "scout2"
second = "scout3"
How might I go about finding whether first and second are interrelated, meaning I could find a path that connects them, rather than just checking whether they appear together in a single string?
You can try concatenating the strings and using the in operator to check whether the result is an element of the tuple connections:
if first + "-" + second in connections:
# ...
Edit:
You can also use the join() function:
if "-".join((first, second)) in connections:
# ...
If you plan on doing this any number of times, I'd consider frozensets...
connections_set = set(frozenset(c.split('-')) for c in connections)
Now you can do something like:
if frozenset((first, second)) in connections_set:
...
and you have an O(1) solution (plus the O(N) upfront investment). Note that I'm assuming the order of the pairs is irrelevant. If it's relevant, just use a tuple instead of frozenset and you're good to go.
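For example, the order-sensitive variant might look like this (a small sketch; the directed_connections name is mine, and connections, first and second come from the question):
# A sketch of the order-sensitive variant: tuples preserve direction,
# so ("scout1", "scout2") and ("scout2", "scout1") are distinct entries.
directed_connections = set(tuple(c.split('-')) for c in connections)
if (first, second) in directed_connections:
    ...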
If you actually need to walk through a graph, an adjacency list implementation might be a little better.
from collections import defaultdict
adjacency_dict = defaultdict(list)
for c in connections:
left, right = c.split('-')
adjacency_dict[left].append(right)
# if undirected: adjacency_dict[right].append(left)
class DFS(object):
def __init__(self, graph):
self.graph = graph
def is_connected(self, node1, node2):
self._seen = set()
self._walk_connections(node1)
output = node2 in self._seen
del self._seen
return output
def _walk_connections(self, node):
if node in self._seen:
return
self._seen.add(node)
for subnode in self.graph[node]:
self._walk_connections(subnode)
print(DFS(adjacency_dict).is_connected(first, second))
Note that this implementation is definitely suboptimal (for example, I don't stop when I find the node I'm looking for), and I don't check for an optimal path from node1 to node2. For that, you'd want something like Dijkstra's algorithm.
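If you just want the traversal to stop as soon as it reaches the target, an iterative breadth-first search is one option. Here is a minimal sketch (the is_connected_bfs name is mine), assuming the adjacency_dict above was built with both directions appended (the undirected case):
from collections import deque

def is_connected_bfs(graph, start, target):
    # Breadth-first search that returns as soon as the target is reached.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False

print(is_connected_bfs(adjacency_dict, first, second))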
You could use a set of pairs (tuples):
connections = {("dr101", "mr99"), ("mr99", "out00"), ("dr101", "out00")} # ...
if ("scout2", "scout3") in connections:
print "scout2-scout3 in connections"
This only works if the two elements are already in the right order, though, because ("scout3", "scout2") != ("scout2", "scout3"), but maybe this is what you want.
If the order of the items in a connection is not significant, you can use a set of frozensets instead (see mgilson's answer). Then you can look up pairs of items regardless of which order they appear in, but the order of the original pairs in connections is lost.
I have a dictionary which has IP address ranges as keys (used to de-duplicate in a previous step) and certain objects as values. Here's an example.
Part of the dictionary sresult:
10.102.152.64-10.102.152.95 object1:object3
10.102.158.0-10.102.158.255 object2:object5:object4
10.102.158.0-10.102.158.31 object3:object4
10.102.159.0-10.102.255.255 object6
There are tens of thousands of lines, and I want to sort the dictionary (correctly) by the IP addresses in its keys.
I tried splitting the key based on the range separator - to get a single IP address that can be sorted as follows:
ips={}
for key in sresult:
if '-' in key:
l = key.split('-')[0]
ips[l] = key
else:
ips[1] = key
And then using code found on another post, sorting by IP address and then looking up the values in the original dictionary:
sips = sorted(ipaddress.ip_address(line.strip()) for line in ips)
for x in sips:
print("SRC: "+ips[str(x)], "OBJECT: "+" :".join(list(set(sresult[ips[str(x)]]))), sep=",")
The problem I have encountered is that when I split the original range and add the first IP of each range as a new key in another dictionary, I de-duplicate again, losing lines of data (lines 2 & 3 in the example below):
line 1 10.102.152.64-10.102.152.95
line 2 10.102.158.0-10.102.158.255
line 3 10.102.158.0-10.102.158.31
line 4 10.102.159.0-10.102.255.255
becomes
line 1 10.102.152.64-10.102.152.95
line 3 10.102.158.0-10.102.158.31
line 4 10.102.159.0-10.102.255.255
So upon rebuilding the original dictionary using the IP address sorted keys, I have lost data
Can anyone help please?
EDIT: This post now consists of three parts:
1) A bit of information about dictionaries that you will need in order to understand the rest.
2) An analysis of your code, and how you could fix it without using any other Python features.
3) What I would consider the best solution to the problem, in detail.
1) Dictionaries
Python dictionaries are not ordered. If I have a dictionary like this:
dictionary = {"one": 1, "two": 2}
And I loop through dictionary.items(), I could get "one": 1 first, or I could get "two": 2 first. I don't know.
Every Python dictionary implicitly has two lists associated with it: a list of its keys and a list of its values. You can get them like this:
print(list(dictionary.keys()))
print(list(dictionary.values()))
These lists do have an ordering, so they can be sorted. Doing so won't change the original dictionary, of course.
2) Your Code
What you realised is that in your case you only want to sort according to the first IP address in your dictionary's keys. Therefore, the strategy that you adopted is roughly as follows:
1) Build a new dictionary, where the keys are only this first part.
2) Get that list of keys from the dictionary.
3) Sort that list of keys.
4) Query the original dictionary for the values.
This approach will, as you noticed, fail at step 1, because as soon as you make the new dictionary with truncated keys, you lose the ability to differentiate between keys that differed only at the end. Every dictionary key must be unique.
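For example, using two of the keys from your sample data (a small sketch to illustrate the collision):
# Both keys truncate to the same first address, so the second overwrites the first.
keys = ["10.102.158.0-10.102.158.255", "10.102.158.0-10.102.158.31"]
truncated = {key.split('-')[0]: key for key in keys}
print(truncated)  # {'10.102.158.0': '10.102.158.0-10.102.158.31'}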
A better strategy would be:
1) Build a function which can represent your "full" IP addresses as an ipaddress.ip_address object.
2) Sort the list of dictionary keys (original dictionary, don't make a new one).
3) Query the dictionary in order.
Let's look at how we could change your code to implement step 1.
def represent(full_ip):
    if '-' in full_ip:
        # Stylistic note: never use o or l as variable names.
        # They look just like 0 and 1.
        first_part = full_ip.split('-')[0]
        return ipaddress.ip_address(first_part.strip())
    # Keys without a range separator are already single addresses.
    return ipaddress.ip_address(full_ip.strip())
Now that we have a way to represent the full IP addresses, we can sort them according to this shortened version, without having to actually change the keys at all. All we have to do is tell Python's sorted function how we want each key to be represented, using the key parameter (NB, this key parameter has nothing to do with a dictionary key; they just both happen to be called key):
# Another stylistic note, always use .keys() when looping over dictionary keys. Explicit is better than implicit.
sips = sorted(sresult.keys(), key=represent)
And if this ipaddress library works, there should be no problems up to here. You can use the remainder of your code as is.
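For example, the final lookup might look something like this (a sketch that assumes the values in sresult are the colon-joined strings shown in the question):
# The sorted keys are the original range strings, so they index sresult directly.
for key in sips:
    print("SRC: " + key, "OBJECT: " + sresult[key], sep=",")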
3) The best solution
Whenever you are dealing with sorting something, it's always easiest to think about a much simpler problem: given two items, how would I compare them? Python gives us a way to do this. What we have to do is implement two data model methods, __lt__ ("less than", which sorted relies on) and __eq__.
Let's try doing that:
class IPAddress:
def __init__(self, ip_address):
self.ip_address = ip_address # This will be the full IP address
    def __lt__(self, other):
        """Is this object less than the other one?"""
        # First, let's find the first parts of the IP addresses
        this_first_ip = self.ip_address.split("-")[0]
        other_first_ip = other.ip_address.split("-")[0]
        # Now let's hand them to the ipaddress library
        this_object = ipaddress.ip_address(this_first_ip)
        other_object = ipaddress.ip_address(other_first_ip)
        return this_object < other_object
def __eq__(self, other):
"""Are the two objects equal?"""
        return self.ip_address == other.ip_address
Cool, we have a class. Now, the data model methods will automatically be invoked any time I use "<" or "==". Let's check that it is working:
test_ip_1 = IPAddress("10.102.152.64-10.102.152.95")
test_ip_2 = IPAddress("10.102.158.0-10.102.158.255")
print(test_ip_1 < test_ip_2)
Now, the beauty of these data model methods is that Python's sort and sorted will use them as well:
dictionary_keys = sresult.keys()
dictionary_key_objects = [IPAddress(key) for key in dictionary_keys]
sorted_dictionary_key_objects = sorted(dictionary_key_objects)
# According to your latest comment, the line below is what you are missing
sorted_dictionary_keys = [obj.ip_address for obj in sorted_dictionary_key_objects]
And now you can do:
for key in sorted_dictionary_keys:
print(key)
    print(sresult[key])
The Python data model is almost the defining feature of Python. I'd recommend reading about it.
I have this tree structure:
# A Node is an object
# - value : Number
# - children : List of Nodes
class Node:
def __init__(self, value, children):
self.value = value
self.children = children
I can easily sum the Nodes, recursively:
def sumNodesRec(root):
sumOfNodes = 0
for child in root.children:
sumOfNodes += sumNodesRec(child)
return root.value + sumOfNodes
Example Tree:
exampleTree = Node(1,[Node(2,[]),Node(3,[Node(4,[Node(5,[]),Node(6,[Node(7,[])])])])])
sumNodesRec(exampleTree)
> 28
However, I'm having difficulty figuring out how to sum all the nodes iteratively. Normally, with a binary tree that has 'left' and 'right' in the definition, I can find the sum. But this definition is tripping me up a bit when thinking about it iteratively.
Any help or explanation would be great. I'm trying to make sure I'm not always doing things recursively, so I'm trying to practice rewriting normally recursive functions as iterative ones instead.
If we're talking iteration, this is a good use case for a queue.
total = 0
queue = [exampleTree]
while queue:
v = queue.pop(0)
queue.extend(v.children)
total += v.value
print(total)
28
This is a common idiom. Iterative graph traversal algorithms also work in this manner.
You can simulate stacks/queues using Python's vanilla lists. A better alternative is the collections.deque structure in the standard library; its enqueue/dequeue operations are more efficient than what you'd get from a vanilla list.
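For example, a small sketch of the deque-based variant of the loop above (reusing exampleTree from the question):
from collections import deque

total = 0
queue = deque([exampleTree])
while queue:
    v = queue.popleft()  # O(1), unlike list.pop(0), which shifts every remaining element
    queue.extend(v.children)
    total += v.value
print(total)  # 28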
Iteratively, you can create a list, stack, queue, or other structure that can hold the items you run through. Put the root into it. Start going through the structure: take an element, add its children to the structure, and add its value to the sum. Take the next element and repeat. This way there's no recursion, but performance and memory usage may be worse.
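A short sketch of that approach (the sum_nodes_iterative name is mine), using a plain list as a stack and the Node class and exampleTree from the question:
def sum_nodes_iterative(root):
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()           # take an element
        total += node.value          # add its value to the sum
        stack.extend(node.children)  # add its children to the structure
    return total

print(sum_nodes_iterative(exampleTree))  # 28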
In response to the first answer:
def sumNodes(root):
current = [root]
nodeList = []
while current:
next_level = []
for n in current:
nodeList.append(n.value)
next_level.extend(n.children)
current = next_level
return sum(nodeList)
Thank you! That explanation helped me think through it more clearly.
I have overcome the problem of creating duplicate nodes in my DB by using the merge_one function, which works like this:
t=graph.merge_one("User","ID","someID")
which creates the node with unique ID. My problem is that I can't find a way to add multiple attributes/properties to my node along with the ID which is added automatically (date for example).
I have managed to achieve this the old "duplicate" way but it doesn't work now since merge_one can't accept more arguments! Any ideas???
Graph.merge_one only allows you to specify one key-value pair because it's meant to be used with a uniqueness constraint on a node label and property. Is there anything wrong with finding the node by its unique id with merge_one and then setting the properties?
t = graph.merge_one("User", "ID", "someID")
t['name'] = 'Nicole'
t['age'] = 23
t.push()
I know I am a bit late... but still useful I think
Using py2neo==2.0.7 and the docs (about Node.properties):
... and the latter is an instance of PropertySet which extends dict.
So the following worked for me:
m = graph.merge_one("Model", "mid", MID_SR)
m.properties.update({
'vendor':"XX",
'model':"XYZ",
'software':"OS",
'modelVersion':"",
'hardware':"",
    'softwareVersion': "12.06"
})
graph.push(m)
This hacky function will iterate through the properties, values, and labels, gradually eliminating all nodes that don't match each criterion submitted. The final result will be a list of all (if any) nodes that match all the properties and labels supplied.
def find_multiProp(graph, *labels, **properties):
results = None
for l in labels:
        for k, v in properties.items():
            if results is None:
genNodes = lambda l,k,v: graph.find(l, property_key=k, property_value=v)
results = [r for r in genNodes(l,k,v)]
continue
prevResults = results
results = [n for n in genNodes(l,k,v) if n in prevResults]
return results
The final result can be used to assess uniqueness and (if empty) create a new node, by combining the two functions together...
def merge_one_multiProp(graph, *labels, **properties):
r = find_multiProp(graph, *labels, **properties)
if not r:
# remove tuple association
node,= graph.create(Node(*labels, **properties))
else:
node = r[0]
return node
example...
from py2neo import Node, Graph
graph = Graph()
properties = {'p1':'v1', 'p2':'v2'}
labels = ('label1', 'label2')
graph.create(Node(*labels, **properties))
for l in labels:
graph.create(Node(l, **properties))
graph.create(Node(*labels, p1='v1'))
node = merge_one_multiProp(graph, *labels, **properties)
I've found related methods:
find - doesn't work because this version of neo4j doesn't support labels.
match - doesn't work because I cannot specify a relation, because the node has no relations yet.
match_one - same as match.
node - doesn't work because I don't know the id of the node.
I need an equivalent of:
start n = node(*) where n.name? = "wvxvw" return n;
Cypher query. Seems like it should be basic, but it really isn't...
PS. I'm opposed to using Cypher for too many reasons to mention. So that's not an option either.
Well, you should create indexes so that the set of start nodes is reduced. This will be automatically taken care of with the use of labels, but in the meantime there is a workaround.
Create an index, say "label", which will have keys pointing to the different types of nodes you will have (in your case, say 'Person').
Now while searching you can write the following query :
START n = node:label(key_name='Person') WHERE n.name = 'wvxvw' RETURN n; //key_name is the key's name you will assign while creating the node.
user797257 seems to be out of the game, but I think this could still be useful:
If you want to get nodes, you need to create an index. An index in Neo4j is the same as in MySQL or any other database (if I understand correctly). Labels are basically auto-indexes, but an explicit index offers additional speed (I use both).
Somewhere near the top of your code, or in Neo4j itself, create an index:
index = graph_db.get_or_create_index(neo4j.Node, "index_name")
Then, create your node as usual, but do add it to the index:
new_node = batch.create(node({"key":"value"}))
batch.add_indexed_node(index, "key", "value", new_node)
Now, if you need to find your new_node, execute this:
new_node_ref = index.get("key", "value")
This returns a list. new_node_ref[0] has the top item, in case you want/expect a single node.
Use a NodeSelector to obtain the node from the graph. The following code fetches the first node from the list of nodes matching the search:
from py2neo import NodeSelector

selector = NodeSelector(graph)
node = selector.select("Label", key='value')
nodelist = list(node)
m_node = node.first()
Using py2neo, the same hacky find_multiProp() function from my answer above will iterate through the labels and properties, gradually eliminating all nodes that don't match each criterion; the final result is a list of all (if any) nodes that match everything supplied.
see my other answer for creating a merge_one() that will accept multiple properties...
pydot has a huge number of bound methods for getting and setting every little thing in a dot graph, reading and writing, you-name-it, but I can't seem to find a simple membership test.
>>> d = pydot.Dot()
>>> n = pydot.Node('foobar')
>>> d.add_node(n)
>>> n in d.get_nodes()
False
is just one of many things that didn't work. It appears that nodes, once added to a graph, acquire a new identity:
>>> d.get_nodes()[0]
<pydot.Node object at 0x171d6b0>
>>> n
<pydot.Node object at 0x1534650>
Can anyone suggest a way to create a node and test to see if it's in a graph before adding it so you could do something like this:
d = pydot.Dot()
n = pydot.Node('foobar')
if n not in d:
d.add_node(n)
Looking through the source code (http://code.google.com/p/pydot/source/browse/trunk/pydot.py), it seems that node names are unique values, used as the keys to locate nodes within a graph's node dictionary (though, interestingly, rather than returning an error for an existing node, pydot simply adds the attributes of the new node to those of the existing one).
So unless you want to add an implementation of __contains__() to one of the classes in pydot.py yourself (a sketch of that option follows below), you can just do the following in your code:
if n.get_name() not in d.obj_dict['nodes'].keys():
d.add_node(n)
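If you do want the n in d spelling from the question, a minimal sketch of that option could subclass Dot (the DotWithContains name is just for illustration, and it assumes the obj_dict['nodes'] layout used above):
import pydot

class DotWithContains(pydot.Dot):
    def __contains__(self, node):
        # Accept either a pydot.Node or a plain name string and test it
        # against the graph's internal name-keyed node mapping.
        name = node.get_name() if isinstance(node, pydot.Node) else node
        return name in self.obj_dict.get('nodes', {})

d = DotWithContains()
n = pydot.Node('foobar')
if n not in d:
    d.add_node(n)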