Adding Recursively to Set - python

I'm trying to wrap my head around recursive functions with this one function, def friends(self, name, degree):. Its purpose is to return the set of all friends up to a specified degree (for an address book). It's the last part of a larger class, class SocialAddressBook:. The 'degree' in this class lets the user query friends of friends: degree 1 is a direct friend, degree 2 is a friend of a friend, and so on. The code I have is:
def friends(self, name, degree):
    fs = set()
    if degree == 0:
        return set()
    if degree == 1:
which is as far as my knowledge on this goes...
Some more context:
Transitive friendship:
Fred → Barb → Jane → Emma → Lisa
Fred → Sue
Jane → Mary
and so my tests are:
a.friends('Fred', 1) == {'Barb', 'Sue'}
a.friends('Fred', 2) == {'Barb', 'Jane', 'Sue'}
a.friends('Fred', 3) == {'Mary', 'Barb', 'Jane', 'Sue', 'Emma'}
a.friends('Fred', 4) == {'Barb', 'Emma', 'Mary', 'Lisa', 'Sue', 'Jane'}
It only goes up to degree 4, so should I even do this recursively, or just handle each degree manually since I know how far it goes?
If anyone could point me in the right direction on how to complete this recursively, that'd be awesome, thanks!

I would do this iteratively: widen the current set of friends n times, where n is an input parameter.
fs = {self}  # start from the person themselves
for i in range(n):
    wider = set()
    for chum in fs:
        for new_chum in chum.friend_list:
            wider.add(new_chum)
    fs |= wider  # only grow fs once the pass over it is finished
# fs still contains self here; discard it if the starting person should not be listed
At each level, make a wider set from the friends of the current set. Once you've been through all of those, add them to the friend set. Repeat n times.

It is best to do this iteratively.
def get_friends_iteratively(self, name, degree):
    if degree < 0:
        raise ValueError('degree should be an int >= 0')
    if degree == 0:
        return set()  # no friends of degree 0!
    friends = set(self.friends)  # copy the direct friends (degree 1)
    for _ in range(degree - 1):  # each pass reaches one degree further
        new_friends = set()
        # it is important not to change a set while we iterate through it.
        # thus, we change new_friends, then update friends when we are done.
        for other_person in friends:
            new_friends |= other_person.friends
        friends |= new_friends
    return friends
# In python, the |= ('in-place or') operator updates a set with
# the union of itself and another set.
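The question asked for a recursive version, which neither answer spells out. Below is a minimal sketch of one, under stated assumptions: the original SocialAddressBook class is not shown, so the _friends dict (name to set of names) and the add_friend method are hypothetical, and friendships are treated as one-directional, as the arrows in the question suggest.

```python
class SocialAddressBook:
    """Hypothetical stand-in for the asker's class (internals are assumed)."""

    def __init__(self):
        self._friends = {}  # assumed storage: name -> set of direct friends

    def add_friend(self, name, friend):
        # Hypothetical helper: records a one-way friendship name -> friend.
        self._friends.setdefault(name, set()).add(friend)
        self._friends.setdefault(friend, set())

    def friends(self, name, degree):
        # Base case: nobody is reachable within zero steps.
        if degree <= 0:
            return set()
        direct = self._friends.get(name, set())
        result = set(direct)
        # Recursive case: each direct friend contributes their own
        # friends up to one degree less.
        for friend in direct:
            result |= self.friends(friend, degree - 1)
        return result


a = SocialAddressBook()
for person, friend in [("Fred", "Barb"), ("Barb", "Jane"), ("Jane", "Emma"),
                       ("Emma", "Lisa"), ("Fred", "Sue"), ("Jane", "Mary")]:
    a.add_friend(person, friend)

print(sorted(a.friends("Fred", 2)))
```

The recursion always terminates because degree shrinks on every call, even if the friendships contain cycles. With cycles the starting name can appear in its own result, so a real implementation may want to discard it before returning.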

Related

Am I missing a check for the actions in the given state?

The problem:
Three traditional, but jealous, couples need to cross a river. Each couple consists of a husband and a wife. They find a small boat that can hold no more than two people. Find the simplest schedule of crossings that will permit all six people to cross the river so that none of the women is left in the company of any of the men unless her husband is present. It is assumed that all passengers on the boat disembark before the next trip, and that at least one person has to be in the boat for each crossing.
Had to edit out code, was requested by professor.
I've been working on this problem for 6 hours and I am stumped. My professor is busy and cannot help.
I took a careful look at your code. It is indeed a very interesting and quite complex problem. After a while I realized that what may be causing your problem is that you are checking the conditions before the crossing is made and not afterwards. I saw the template you provided, and I guess we can stick with the logic it proposes:
1- make the actions method return all possible crossings (without checking the states yet)
2- given each action, get the corresponding new state and check whether that state is valid
3- make the value() method check whether we are making progress on the optimization
class Problem:
    def __init__(self, initial_state, goal):
        self.goal = goal
        self.record = [[0, initial_state, "LEFT", []]]
        # list of results [score][state][boat_side][listActions]
    def actions(self, state, boat_side):
        side = 0 if boat_side == 'LEFT' else 1
        boat_dir = 'RIGHT' if boat_side == 'LEFT' else 'LEFT'
        group = [i for i, v in enumerate(state) if v == side]
        onboard_2 = [[boat_dir, a, b] for a in group for b in group if
                     a < b and (  # not the same person and unique group
                         (a%2==0 and b - a == 1) or (  # wife and husband
                         a%2==0 and b%2==0) or (  # two wives
                         a%2==1 and b%2==1)  # two husbands
                     )]
        onboard_1 = [[boat_dir, a] for a in group]
        return onboard_1 + onboard_2
    def result(self, state, action):
        new_boat_side = action[0]
        new_state = []
        for i, v in enumerate(state):
            if i in action[1:]:
                new_state.append(1 if v == 0 else 0)
            else:
                new_state.append(v)
        # check if invalid
        for p, side in enumerate(new_state):
            if p%2 == 0:  # is woman
                if side != new_state[p+1]:  # not with husband
                    if any(men == side for men in new_state[1::2]):
                        new_state = False
                        break
        return new_state, new_boat_side
    def goal_test(self, state):
        return state == self.goal
    def value(self, state):
        # how many people already crossed
        return state.count(1)
# optimization process
initial_state = [0]*6
goal = [1]*6
task = Problem(initial_state, goal)
while True:
    batch_result = []
    for score, state, side, l_a in task.record:
        possible_actions = task.actions(state, side)
        for a in possible_actions:
            new_state, new_boat_side = task.result(state, a)
            if new_state:  # is a valid state
                batch_result.append([
                    task.value(new_state),
                    new_state,
                    new_boat_side,
                    l_a + a,
                ])
    batch_result.sort(key=lambda x: x[0], reverse=True)
    # sort the results with the most people crossed
    task.record = batch_result[:5]
    # I am only sticking with the best 5 results but
    # any number should be fine on this problem
    if task.goal_test(task.record[0][1]):
        break
    # for i in task.record[:5]:  # uncomment these lines to see full progress
    #     print(i)
    # x = input()  # press any key to continue
print(task.record[0][3])
I hope it helped. Please feel free to say if anything is still not clear.
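The loop above keeps only the five highest-scoring states per round, so it is a beam search and is not guaranteed to return the shortest schedule. As a point of comparison, here is a sketch of a plain breadth-first search over the same state encoding (even indices are wives, the next odd index is the husband; 0 is the left bank, 1 the right). This is a swapped-in technique rather than the answer's method, and the function names is_valid, boat_ok and shortest_schedule are made up for this illustration.

```python
from collections import deque
from itertools import combinations


def is_valid(state):
    """A wife may share a bank with men only if her own husband is there."""
    men = range(1, len(state), 2)
    for wife in range(0, len(state), 2):
        husband_here = state[wife] == state[wife + 1]
        men_here = any(state[m] == state[wife] for m in men)
        if men_here and not husband_here:
            return False
    return True


def boat_ok(crew):
    """Allowed crews: one person, two of the same sex, or a married couple."""
    if len(crew) == 1:
        return True
    a, b = crew  # combinations() guarantees a < b
    return a % 2 == b % 2 or (a % 2 == 0 and b == a + 1)


def shortest_schedule(n_couples=3):
    start = (tuple([0] * (2 * n_couples)), 0)  # (bank per person, boat bank)
    goal = tuple([1] * (2 * n_couples))
    parents = {start: None}  # node -> (previous node, crew that crossed)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        state, boat = node
        if state == goal:
            # Walk the parent links back to the start to recover the crews.
            path = []
            while parents[node] is not None:
                node, crew = parents[node]
                path.append(crew)
            return path[::-1]
        here = [p for p, bank in enumerate(state) if bank == boat]
        crews = [(p,) for p in here] + list(combinations(here, 2))
        for crew in crews:
            if not boat_ok(crew):
                continue
            new_state = tuple(1 - bank if p in crew else bank
                              for p, bank in enumerate(state))
            nxt = (new_state, 1 - boat)
            # The state is checked after the crossing, as the answer suggests.
            if nxt not in parents and is_valid(new_state):
                parents[nxt] = (node, crew)
                queue.append(nxt)
    return None  # no schedule exists


schedule = shortest_schedule()
print(len(schedule), schedule)
```

Because breadth-first search explores schedules in order of length, the first time it reaches the goal it has found a shortest one; for three couples that is 11 crossings.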

How to find closely matching unique elements in two lists? (Using a distance function here)

I am trying to create a name matcher to compare say, 'JOHN LEWIS' to 'JOHN SMITH LEWIS'. They are clearly the same person and I want to create a function where when you enter those names, it turns it into a list then gives you the matching names.
The problem is that my loop is returning that 'LEWIS' matches with 'LEWIS' and 'SMITH' matches with 'LEWIS' because of the order that it is in.
from pyjarowinkler import distance
entered_name = 'JOHN LEWIS'.split(' ')  # equals ['JOHN','LEWIS']
system_name = 'JOHN SMITH LEWIS'.split(' ')  # equals ['JOHN','SMITH','LEWIS']
ratio = []
for i in entered_name:
    maximum = 0
    for j in system_name:
        score = distance.get_jaro_distance(i, j, winkler=True,
                                           scaling=0.1)
        while score > maximum:
            maximum = score
            new = (i, j, maximum)
            system_name.remove(i)
            # removes that name from the original list
    ratio.append(new)
would return something like: [('JOHN', 'JOHN', 1.0), ('LEWIS', 'SMITH', 0.47)]
and not: [('JOHN', 'JOHN', 1.0), ('LEWIS', 'LEWIS', 1.0)] <- this is what I want.
Also, if you try something like 'ALLY A ARM' with 'ALLY ARIANA ARMANI', it matches 'ALLY' twice if you don't do that remove(i) line. This is why I only want unique matches!
I just keep getting errors or the answers that I am not looking for.
The issue is with your system_name.remove(i) line. First of all, it's usually a bad idea to modify a list while you're iterating through that list. This can lead to unexpected behavior. In your case, here's what your code is doing:
First time through, matches 'JOHN', and 'JOHN'. No problem.
Removes 'JOHN' from system_name. Now system_name = ['SMITH', 'LEWIS'].
Second time through, i = 'LEWIS', j = 'SMITH', score = .47 which is greater than 0, so your check score > maximum passes
We set maximum = score
We set new = ('LEWIS', 'SMITH', 0.47)
We remove 'LEWIS' from system_name. Now system_name = ['SMITH']. Uh oh...
Simple rewrite below, using an if instead of a while loop because the while loop is totally unnecessary:
for i in entered_name:
    maximum = 0
    for j in system_name:
        score = distance.get_jaro_distance(i, j, winkler=True,
                                           scaling=0.1)
        if score > maximum:
            maximum = score
            new = (i, j, maximum)
    system_name.remove(new[1])  # want to remove 'SMITH' in the example, not 'LEWIS'
    ratio.append(new)
All I did was move the system_name.remove() call outside of the loop over system_name, and replace i with j (using new[1] since I'm outside of the j loop).
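The rewrite above still matches names in the order they were entered, so an early word can claim a system word that a later one fits better. One possible variation, sketched here under assumptions, scores every pair first and then hands out matches from best to worst. To keep the sketch runnable without pyjarowinkler, it uses difflib.SequenceMatcher as a stand-in similarity; the names similarity and match_unique are invented for this example, and any 0..1 scoring function (such as get_jaro_distance) could be passed instead.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for distance.get_jaro_distance (an assumption of this sketch).
    return SequenceMatcher(None, a, b).ratio()


def match_unique(entered, system, score=similarity):
    entered_parts = entered.split()
    system_parts = system.split()
    # Score every pair; indices (not words) identify each part so that
    # repeated words are still treated as separate entries.
    pairs = sorted(
        ((score(e, s), i, j)
         for i, e in enumerate(entered_parts)
         for j, s in enumerate(system_parts)),
        key=lambda t: -t[0],
    )
    used_entered, used_system, matches = set(), set(), {}
    for value, i, j in pairs:
        # Accept the best remaining pair whose two parts are both unused.
        if i not in used_entered and j not in used_system:
            used_entered.add(i)
            used_system.add(j)
            matches[i] = (entered_parts[i], system_parts[j], value)
    # Report matches in the order the names were entered.
    return [matches[i] for i in sorted(matches)]


print(match_unique('JOHN LEWIS', 'JOHN SMITH LEWIS'))
print(match_unique('ALLY A ARM', 'ALLY ARIANA ARMANI'))
```

Nothing is removed from the input lists, so the problem of modifying a list while iterating over it does not arise. This greedy assignment is simple rather than optimal; it is meant only to show the "each system word is used at most once" rule.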
Jaro-Winkler distance is for comparison of sequences, there is no need to compare individual elements as if you were trying to find an edit distance between individual characters rather than whole words.
With that in mind, one should probably treat parts of a name as individual letters, and the whole name as a word, comparing, say, "JL" vs. "JSL" instead of "JOHN LEWIS" and "JOHN SMITH LEWIS":
import string
from pyjarowinkler import distance
WORDS_CACHE = {}
def next_letter():
    base = ""
    while True:
        for ch in string.ascii_lowercase:
            yield base + ch
        base += ch
GENERATOR = next_letter()
def encode(word):
    if word not in WORDS_CACHE:
        # next(gen) works on Python 2.6+ and 3; gen.next() is Python 2 only
        WORDS_CACHE[word] = next(GENERATOR)
    return WORDS_CACHE[word]
def score(first_name, second_name):
    return distance.get_jaro_distance(
        "".join(map(encode, first_name.split())),
        "".join(map(encode, second_name.split())),
    )

Sorting from smallest to biggest

I would like to sort several players by points, from smallest to biggest.
I want to get this result:
Drogba 2 pts
Owen 4 pts
Henry 6 pts
However, my ranking seems to be reversed for now :-(
Henry 6 pts
Owen 4 pts
Drogba 2 pts
I think my problem is with my function Bubblesort ?
def Bubblesort(name, goal1, point):
    swap = True
    while swap:
        swap = False
        for i in range(len(name)-1):
            if goal1[i+1] > goal1[i]:
                goal1[i], goal1[i+1] = goal1[i+1], goal1[i]
                name[i], name[i+1] = name[i+1], name[i]
                point[i], point[i + 1] = point[i + 1], point[i]
                swap = True
    return name, goal1, point
def ranking(name, point):
    for i in range(len(name)):
        print(name[i], "\t" , point[i], " \t ")
name = ["Henry", "Owen", "Drogba"]
point = [0]*3
goal1 = [68, 52, 46]
gain = [6,4,2]
name, goal1, point = Bubblesort( name, goal1, point )
for i in range(len(name)):
    point[i] += gain[i]
ranking (name, point)
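In your code:
if goal1[i+1] > goal1[i]:
swaps whenever the next value is greater, which sorts from biggest to smallest. To sort from smallest to biggest, swap when the next value is smaller. Change that to:
if goal1[i+1] < goal1[i]:
Note that gain is not passed to Bubblesort, so it keeps its original order. Add the gains to point before sorting (or swap gain alongside the other lists); otherwise, after this change, the 6 points end up next to Drogba instead of Henry.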
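If writing the bubble sort by hand is not a requirement, the parallel lists can also be kept in step by zipping them together and letting the built-in sorted do the work. A small sketch using the question's data (the variable name players is introduced here for illustration):

```python
name = ["Henry", "Owen", "Drogba"]
goal1 = [68, 52, 46]
gain = [6, 4, 2]

# zip() glues each player's values into one tuple, so they move together;
# the key function picks the goals (index 1) as the value to sort on.
players = sorted(zip(name, goal1, gain), key=lambda player: player[1])

for player_name, goals, points in players:
    print(player_name, "\t", points, "pts")
```

Because every tuple carries its own gain, the points cannot drift away from the player they belong to, whatever the sort order.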
A bunch of issues:
def Bubblesort - PEP8 says function names should be lowercase, ie def bubblesort
You are storing your data as a bunch of parallel lists; this makes it harder to work on and think about (and sort!). You should transpose your data so that instead of having a list of names, a list of points, a list of goals you have a list of players, each of whom has a name, points, goals.
def bubblesort(name, goal1, point): - should look like def bubblesort(items) because bubblesort does not need to know that it is getting names and goals and points and sorting on goals (specializing it that way keeps you from reusing the function later to sort other things). All it needs to know is that it is getting a list of items and that it can compare pairs of items using >, ie Item.__gt__ is defined.
Instead of using the default "native" sort order, Python sort functions usually let you pass an optional key function which allows you to tell it what to sort on - that is, sort on key(items[i]) > key(items[j]) instead of items[i] > items[j]. This is often more efficient and/or convenient than reshuffling your data to get the sort order you want.
for i in range(len(name)-1): - you are iterating more than needed. After each pass, the highest value in the remaining list gets pushed to the top (hence "bubble" sort, values rise to the top of the list like bubbles). You don't need to look at those top values again because you already know they are higher than any of the remaining values; after the nth pass, you can ignore the last n values.
Actually, the situation is a bit better than that; you will often find runs of values which are already in sorted order. If you keep track of the highest index that actually got swapped, you don't need to go beyond that on your next pass.
So your sort function becomes
def bubblesort(items, *, key=None):
    """
    Return items in sorted order
    """
    # work on a copy of the list (don't destroy the original)
    items = list(items)
    # process key values - cache the result of key(item)
    # so it doesn't have to be called repeatedly
    keys = items if key is None else [key(item) for item in items]
    # initialize the "last item to sort on the next pass" index
    last_swap = len(items) - 1
    # sort!
    while last_swap:
        ls = 0
        for i in range(last_swap):
            j = i + 1
            if keys[i] > keys[j]:
                # have to swap keys and items at the same time,
                # because keys may be an alias for items
                items[i], items[j], keys[i], keys[j] = items[j], items[i], keys[j], keys[i]
                # made a swap - update the last_swap index
                ls = i
        last_swap = ls
    return items
You may not be sure that this is actually correct, so let's test it:
from random import sample
def test_bubblesort(tries = 1000):
    # example key function
    key_fn = lambda item: (item[2], item[0], item[1])
    for i in range(tries):
        # create some sample data to sort
        data = [sample("abcdefghijk", 3) for j in range(10)]
        # no-key sort
        assert bubblesort(data) == sorted(data), "Error: bubblesort({}) gives {}".format(data, bubblesort(data))
        # keyed sort (key is keyword-only, so it must be passed as key=...)
        assert bubblesort(data, key=key_fn) == sorted(data, key=key_fn), "Error: bubblesort({}, key) gives {}".format(data, bubblesort(data, key=key_fn))
test_bubblesort()
Now the rest of your code becomes
class Player:
    def __init__(self, name, points, goals, gains):
        self.name = name
        self.points = points
        self.goals = goals
        self.gains = gains
players = [
    Player("Henry", 0, 68, 6),
    Player("Owen", 0, 52, 4),
    Player("Drogba", 0, 46, 2)
]
# sort by goals
players = bubblesort(players, key = lambda player: player.goals)
# update points
for player in players:
    player.points += player.gains
# show the result
for player in players:
    print("{player.name:<10s} {player.points:>2d} pts".format(player=player))
which produces
Drogba      2 pts
Owen        4 pts
Henry       6 pts

Sorting a List with Relative Positional Data

This is more of a conceptual programming question, so bear with me:
Say you have a list of scenes in a movie, and each scene may or may not make reference to past/future scenes in the same movie. I'm trying to find the most efficient algorithm of sorting these scenes. There may not be enough information for the scenes to be completely sorted, of course.
Here's some sample code in Python (pretty much pseudocode) to clarify:
class Reference:
    def __init__(self, scene_id, relation):
        self.scene_id = scene_id
        self.relation = relation
class Scene:
    def __init__(self, scene_id, references):
        self.id = scene_id
        self.references = references
    def __repr__(self):
        return self.id
def relative_sort(scenes):
    return scenes  # Algorithm in question
def main():
    s1 = Scene('s1', [
        Reference('s3', 'after')
    ])
    s2 = Scene('s2', [
        Reference('s1', 'before'),
        Reference('s4', 'after')
    ])
    s3 = Scene('s3', [
        Reference('s4', 'after')
    ])
    s4 = Scene('s4', [
        Reference('s2', 'before')
    ])
    print(relative_sort([s1, s2, s3, s4]))
if __name__ == '__main__':
    main()
The goal is to have relative_sort return [s4, s3, s2, s1] in this case.
If it's helpful, I can share my initial attempt at the algorithm; I'm a little embarrassed at how brute-force it is. Also, if you're wondering, I'm trying to decode the plot of the film "Mulholland Drive".
FYI: The Python tag is only here because my pseudocode was written in Python.
The algorithm you are looking for is a topological sort:
In the field of computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks.
You can compute this pretty easily using a graph library, for instance networkx, which implements topological_sort. First we import the library and list all of the relationships between scenes, that is, all of the directed edges in the graph:
>>> import networkx as nx
>>> relations = [
...     (3, 1),  # 1 after 3
...     (2, 1),  # 2 before 1
...     (4, 2),  # 2 after 4
...     (4, 3),  # 3 after 4
...     (4, 2),  # 4 before 2
... ]
We then create a directed graph:
>>> g = nx.DiGraph(relations)
Then we run a topological sort (in networkx 2.0 and later it returns an iterator, so wrap it in list()):
>>> list(nx.topological_sort(g))
[4, 3, 2, 1]
Scenes 2 and 3 are not ordered relative to each other, so [4, 2, 3, 1] is an equally valid result.
I have included your modified code in my answer, which solves the current (small) problem, but without a larger sample problem, I'm not sure how well it would scale. If you provide the actual problem you're trying to solve, I'd love to test and refine this code until it works on that problem, but without test data I won't optimize this solution any further.
For starters, we track references as sets, not lists.
Duplicates don't really help us (if "s1" before "s2", and "s1" before "s2", we've gained no information)
This also lets us add inverse references with abandon (if "s1" comes before "s2", then "s2" comes after "s1").
We compute a min and max position:
Min position based on how many scenes we come after
This could be extended easily: If we come after two scenes with a min_pos of 2, our min_pos is 4 (If one is 2, other must be 3)
Max position based on how many things we come before
This could be extended similarly: If we come before two scenes with a max_pos of 4, our max_pos is 2 (If one is 4, other must be 3)
If you decide to do this, just replace pass in tighten_bounds(self) with code to try to tighten the bounds for a single scene (and set anything_updated to true if it works).
The magic is in get_possible_orders
Generates all valid orderings if you iterate over it
If you only want one valid ordering, it doesn't take the time to create them all
Code:
class Reference:
    def __init__(self, scene_id, relation):
        self.scene_id = scene_id
        self.relation = relation
    def __repr__(self):
        return '"%s %s"' % (self.relation, self.scene_id)
    def __hash__(self):
        return hash(self.scene_id)
    def __eq__(self, other):
        return self.scene_id == other.scene_id and self.relation == other.relation
class Scene:
    def __init__(self, title, references):
        self.title = title
        self.references = references
        self.min_pos = 0
        self.max_pos = None
    def __repr__(self):
        return '%s (%s,%s)' % (self.title, self.min_pos, self.max_pos)
inverse_relation = {'before': 'after', 'after': 'before'}
def inverted_reference(scene, reference):
    return Reference(scene.title, inverse_relation[reference.relation])
def is_valid_addition(scenes_so_far, new_scene, scenes_to_go):
    previous_ids = {s.title for s in scenes_so_far}
    future_ids = {s.title for s in scenes_to_go}
    for ref in new_scene.references:
        if ref.relation == 'before' and ref.scene_id in previous_ids:
            return False
        elif ref.relation == 'after' and ref.scene_id in future_ids:
            return False
    return True
class Movie:
    def __init__(self, scene_list):
        self.num_scenes = len(scene_list)
        self.scene_dict = {scene.title: scene for scene in scene_list}
        self.set_max_positions()
        self.add_inverse_relations()
        self.bound_min_max_pos()
        self.can_tighten = True
        while self.can_tighten:
            self.tighten_bounds()
    def set_max_positions(self):
        for scene in self.scene_dict.values():
            scene.max_pos = self.num_scenes - 1
    def add_inverse_relations(self):
        for scene in self.scene_dict.values():
            for ref in scene.references:
                self.scene_dict[ref.scene_id].references.add(inverted_reference(scene, ref))
    def bound_min_max_pos(self):
        for scene in self.scene_dict.values():
            for ref in scene.references:
                if ref.relation == 'before':
                    scene.max_pos -= 1
                elif ref.relation == 'after':
                    scene.min_pos += 1
    def tighten_bounds(self):
        anything_updated = False
        for scene in self.scene_dict.values():
            pass
            # If bounds for any scene are tightened, set anything_updated back to true
        self.can_tighten = anything_updated
    def get_possible_orders(self, scenes_so_far):
        if len(scenes_so_far) == self.num_scenes:
            yield scenes_so_far
            # a bare return ends the generator; raising StopIteration
            # inside a generator is an error since Python 3.7
            return
        n = len(scenes_so_far)
        scenes_left = set(self.scene_dict.values()) - set(scenes_so_far)
        valid_next_scenes = set(s
                                for s in scenes_left
                                if s.min_pos <= n <= s.max_pos)
        # valid_next_scenes = sorted(valid_next_scenes, key=lambda s: s.min_pos * self.num_scenes + s.max_pos)
        for s in valid_next_scenes:
            if is_valid_addition(scenes_so_far, s, scenes_left - {s}):
                for valid_complete_sequence in self.get_possible_orders(scenes_so_far + (s,)):
                    yield valid_complete_sequence
    def get_possible_order(self):
        # next() raises StopIteration if there is no valid ordering
        return next(self.get_possible_orders(tuple()))
def relative_sort(lst):
    try:
        return [s.title for s in Movie(lst).get_possible_order()]
    except StopIteration:
        return None
def main():
    s1 = Scene('s1', {Reference('s3', 'after')})
    s2 = Scene('s2', {
        Reference('s1', 'before'),
        Reference('s4', 'after')
    })
    s3 = Scene('s3', {
        Reference('s4', 'after')
    })
    s4 = Scene('s4', {
        Reference('s2', 'before')
    })
    print(relative_sort([s1, s2, s3, s4]))
if __name__ == '__main__':
    main()
As others have pointed out, you need a topological sort. A depth-first traversal of the directed graph whose edges are formed by the order relation is all you need. Visit in post-order. This is the reverse of a topo sort, so to get the topo sort, just reverse the result.
I've encoded your data as a list of pairs showing what's known to go before what. This is just to keep my code short. You can just as easily traverse your list of classes to create the graph.
Note that for topo sort to be meaningful, the set being sorted must satisfy the definition of a partial order. Yours is fine. Order constraints on temporal events naturally satisfy the definition.
Note it's perfectly possible to create a graph with cycles. There's no topo sort of such a graph. This implementation doesn't detect cycles, but it would be easy to modify it to do so.
Of course you can use a library to get the topo sort, but where's the fun in that?
from collections import defaultdict
# Before -> After pairs dictating order. Repeats are okay. Cycles aren't.
# This is OP's data in a friendlier form.
OrderRelation = [('s3','s1'), ('s2','s1'), ('s4','s2'), ('s4','s3'), ('s4','s2')]
class OrderGraph:
    # nodes is an optional list of items for use when some aren't related at all
    def __init__(self, relation, nodes=()):
        self.succ = defaultdict(set)  # Successor map
        heads = set()
        for tail, head in relation:
            self.succ[tail].add(head)
            heads.add(head)
        # Sources are nodes that have no in-edges (tails - heads)
        self.sources = set(self.succ.keys()) - heads | set(nodes)
    # Recursive helper to traverse the graph and visit in post order
    def __traverse(self, start):
        if start in self.visited:
            return
        self.visited.add(start)
        for succ in self.succ[start]:
            self.__traverse(succ)
        self.sorted.append(start)  # Append in post-order
    # Return a reverse post-order visit, which is a topo sort. Not thread safe.
    def topoSort(self):
        self.visited = set()
        self.sorted = []
        for source in self.sources:
            self.__traverse(source)
        self.sorted.reverse()
        return self.sorted
Then...
>>> print(OrderGraph(OrderRelation).topoSort())
['s4', 's2', 's3', 's1']
>>> print(OrderGraph(OrderRelation, ['s1', 'unordered']).topoSort())
['s4', 's2', 's3', 'unordered', 's1']
The second call shows that you can optionally pass values to be sorted in a separate list. You may, but don't have to, mention values already in the relation pairs. Of course, those not mentioned in order pairs are free to appear anywhere in the output, and because the graph is stored in sets, the relative order of unrelated items can vary between runs.
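The answer mentions that cycle detection would be easy to add. One common way to do it, sketched here as an assumption rather than as the author's intended change, is to give every node one of three states during the depth-first traversal: reaching a node that is still "in progress" means the traversal has looped back on itself. The function name topo_sort and the choice of ValueError are made up for this sketch, and nodes are visited in sorted order only to make the output repeatable.

```python
from collections import defaultdict

UNSEEN, IN_PROGRESS, DONE = 0, 1, 2


def topo_sort(pairs, nodes=()):
    """Return the nodes ordered so that 'before' precedes 'after' in every pair."""
    succ = defaultdict(set)
    all_nodes = set(nodes)
    for before, after in pairs:
        succ[before].add(after)
        all_nodes.update((before, after))

    state = dict.fromkeys(all_nodes, UNSEEN)
    post_order = []

    def visit(node):
        if state[node] == DONE:
            return
        if state[node] == IN_PROGRESS:
            # We came back to a node whose successors are still being explored.
            raise ValueError("cycle detected at %r" % (node,))
        state[node] = IN_PROGRESS
        for nxt in sorted(succ.get(node, ())):
            visit(nxt)
        state[node] = DONE
        post_order.append(node)

    for node in sorted(all_nodes):
        visit(node)
    return post_order[::-1]  # reverse post-order is a topological order


order_relation = [('s3', 's1'), ('s2', 's1'), ('s4', 's2'), ('s4', 's3'), ('s4', 's2')]
print(topo_sort(order_relation))

try:
    topo_sort([('a', 'b'), ('b', 'a')])
except ValueError as error:
    print(error)
```

Like the class above, this is recursive, so very long chains of scenes could hit Python's recursion limit; an explicit stack would avoid that.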

I can't find the corner cases

The problem specification is in https://www.dropbox.com/s/lmwxcsp3lie0x3n/437.pdf?dl=0
My solution is in http://ideone.com/3JsFCq
name = raw_input()
D = int(raw_input())  # degree of separation
N = int(raw_input())  # number of links
M = int(raw_input())  # book users
users = {}
books = {}
def build_edges(user1, user2):
    if user1 not in users:
        users[user1] = set([user2, ])
    else:
        users[user1].add(user2)
for i in xrange(N):
    nw = raw_input()
    us = nw.split('|')
    build_edges(us[0], us[1])
    build_edges(us[1], us[0])
def build_booklist(user1, book):
    if user1 not in books:
        users[user1] = []
    else:
        users[user1].append(user2)
for i in xrange(M):
    bk = raw_input().split('|')
    books[bk[0]] = []
    for book in bk[1:]:
        books[bk[0]].append(book)
rec = []
depth = [0,]
def bfs(graph, start):
    visited, queue = set(), [start]
    while queue:
        vertex = queue.pop(0)
        if vertex not in visited:
            visited.add(vertex)
            for book in books[vertex]:
                if book not in books[start]:
                    rec.append(book)
            queue.extend(graph[vertex] - visited)
            depth[0] += 1
            if depth[0] > D:
                return
    return visited
bfs(users, name)
print len(rec)
I couldn't find the corner cases.
It passes the example case, but it doesn't pass some others.
What is going wrong?
Your problem is that you are increasing the depth every time you process a vertex. Instead, you need to store a depth for every vertex, and stop when you encounter a vertex whose depth is larger than the given limit.
For example, if Alice has two friends, Bob and Carl, then as you process Alice, you will set depth to 1. Then as you process Bob, you will set it to two, and stop, before you process Carl, who is within distance one from Alice. Instead, you should set Alice's depth to 0, then as you add Bob and Carl to the queue, set their depths to 1, and process them. As you process them, you add their friends, whom you have not seen yet, with depth 2, and as soon as you encounter any of them in the main loop (pop from the queue), you stop.
UPDATE: also, add the first vertex to the visited set, when you initialize it. Otherwise you will process it as a vertex with depth two (you will add Alice's friend Bob with depth 1, and then Alice as Bob's friend with distance two). It doesn't hurt in this particular problem, but might be a problem if you make a similar mistake in a solution for some other BFS problem.
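The per-vertex depth described above can be sketched as follows. This is a minimal illustration, not a fix of the full submission: the function name within_degree and the sample graph are invented here, and the book handling is left out so that only the depth bookkeeping is shown.

```python
from collections import deque


def within_degree(graph, start, max_depth):
    """Return {vertex: depth} for every vertex at most max_depth links from start."""
    depth = {start: 0}  # doubles as the visited set; start is marked up front
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depth[vertex] == max_depth:
            continue  # neighbours of this vertex would be too far away
        for neighbour in graph.get(vertex, ()):
            if neighbour not in depth:
                # The depth is recorded when the vertex is first discovered.
                depth[neighbour] = depth[vertex] + 1
                queue.append(neighbour)
    return depth


graph = {
    "Alice": {"Bob", "Carl"},
    "Bob": {"Alice", "Dave"},
    "Carl": {"Alice"},
    "Dave": {"Bob", "Eve"},
    "Eve": {"Dave"},
}
print(within_degree(graph, "Alice", 1))
```

With a limit of 1, both Bob and Carl are reached, which is exactly the case the single shared counter got wrong.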
