Minimum cost to set up internet in all rooms? - Python

There are n rooms in a hostel, and we want to supply internet to all of them by laying a network.
For each room i, we can either install a router directly at cost router[i], or connect the room with an ethernet cable pulled from another room's router. The cost of laying cable is given by the ethernet array, whose entries (i, j, c) mean it costs c to lay an ethernet cable between rooms i and j (in either direction).
Connections are undirected, and there can be multiple ways to set the network up.
At least one room has to take an initial router connection if every other room intends to pull an ethernet cable from it.
If one room has pulled an ethernet cable from a neighbour, another neighbour of that room can in turn pull a cable from it; there is no limit on the length of this chain.
Example:
minCostConnecting(n, router, ethernet)
n = 5
router = [1,2,1,5,3]
ethernet = [[2,4,1], [0,2,3], [1,3,3], [0,4,1]]
Output = 8
Set up routers in rooms 0 and 2: cost = router[0] + router[2] = 2.
Connect rooms 0 and 4 with a cable at cost 1.
Set up a router in room 1 at cost 2.
Lay an ethernet cable from room 1 to room 3 at cost 3.
I am unsure of the best approach to this problem. I believe we need internet in at least one room, and thought greedily choosing the cheapest room for the first connection might work, but then I am unsure how to connect the remaining rooms, given that connections can also be chained together.
Any sense of the algorithms/approaches here would be appreciated.

This sounds exactly like the MST problem (Minimum Spanning Tree; for more info check the wiki -> https://en.wikipedia.org/wiki/Minimum_spanning_tree).
First, I would build an undirected weighted graph in which the nodes are the rooms and the edge weights are the costs of laying ethernet between pairs of rooms.
Besides the room nodes, I would add one more node for the initial router, with edge weights corresponding to the cost of a router in each room. This node guarantees that at least one room takes a router, while still allowing more than one room to buy its own router (if that is cost-effective).
All that is left is to find the MST (which takes O(m + n log n) using Prim's algorithm with a Fibonacci heap), and that's it.
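A minimal sketch of that idea using Kruskal's algorithm and a small union-find (function and variable names here are mine, for illustration; the virtual router node is numbered n):

def min_cost_with_virtual_node(n, router, ethernet):
    # Node n is the virtual "initial router" node; its edge to room i
    # costs router[i]. An MST of this augmented graph is the answer.
    edges = [(c, i, j) for i, j, c in ethernet]
    edges += [(cost, i, n) for i, cost in enumerate(router)]
    edges.sort()

    parent = list(range(n + 1))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    total = 0
    for c, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            total += c
    return total

print(min_cost_with_virtual_node(5, [1, 2, 1, 5, 3],
                                 [[2, 4, 1], [0, 2, 3], [1, 3, 3], [0, 4, 1]]))  # 8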

You can do something like this; it is effectively Prim's algorithm on the augmented graph, with the heap seeded by every room's router cost:

import heapq

def minCostConnecting(n, router, ethernet):
    # Build an adjacency list of the cable costs
    graph = [[] for _ in range(n)]
    for i, j, c in ethernet:
        graph[i].append((j, c))
        graph[j].append((i, c))
    # Seed the heap with the router costs (the virtual node's edges)
    heap = [(cost, i) for i, cost in enumerate(router)]
    heapq.heapify(heap)
    visited = set()
    total_cost = 0
    # Grow the tree until every room is connected
    while len(visited) < n:
        # Pop the cheapest way to reach any room
        cost, u = heapq.heappop(heap)
        if u not in visited:
            visited.add(u)
            total_cost += cost
            # Offer cables from u to all unvisited neighbours
            for v, c in graph[u]:
                if v not in visited:
                    heapq.heappush(heap, (c, v))
    return total_cost

n = 5
router = [1,2,1,5,3]
ethernet = [[2,4,1], [0,2,3], [1,3,3], [0,4,1]]
print(minCostConnecting(n, router, ethernet))  # Output: 8

Related

How can I find the maximum number of cities I can visit given a travel budget (in minutes), using a travel time matrix?

I have a list of 12 cities, each connected to all the others without exception. The only thing of concern is travel time. The name of each city is here. The distance matrix (representing travel time in minutes) between city pairs is here.
How can I find out how many cities I can visit given a certain travel budget (say 800 minutes) from a city of origin (it can be any of the 12)?
You can't visit the same city twice during the trip, and you don't need to worry about returning to your origin. I can't go above my travel budget.
With only 12 cities, an exhaustive depth-first search over simple paths is fast enough (the budget prunes most branches) and is guaranteed to find the longest possible trip:

def find_cities(dist, budget, start):
    """Return the longest list of city indices reachable from `start`
    without exceeding `budget` minutes of travel."""
    n = len(dist)
    best = [start]

    def dfs(city, remaining, path):
        nonlocal best
        if len(path) > len(best):
            best = path[:]
        for nxt in range(n):
            if nxt not in path and dist[city][nxt] <= remaining:
                path.append(nxt)
                dfs(nxt, remaining - dist[city][nxt], path)
                path.pop()  # backtrack and try the next city

    dfs(start, budget, [start])
    return best

def main():
    # Read the travel time matrix and the city names from file
    with open('uk12_dist.txt') as f:
        dist = [[int(num) for num in line.split()] for line in f]
    with open('uk12_name.txt') as f:
        name = [line.strip() for line in f]

    budget = int(input("Enter your travel budget (in minutes): "))
    # Try every possible origin and keep the longest tour found
    best = max((find_cities(dist, budget, s) for s in range(len(dist))), key=len)
    total = sum(dist[i][j] for i, j in zip(best, best[1:]))
    print(" -> ".join(name[i] for i in best))
    print("Cities visited:", len(best), "- total travel time:", total, "minutes")

if __name__ == '__main__':
    main()

How to set minimum locations per route in Google OR-Tools?

I am trying to limit the minimum number of locations visited per vehicle. I have implemented the maximum-locations constraint successfully but am having trouble figuring out the minimum. My code for the maximum:
def counter_callback(from_index):
    """Returns 1 for any location except the depot."""
    # Convert from routing variable Index to user NodeIndex.
    from_node = manager.IndexToNode(from_index)
    return 1 if from_node != 0 else 0

counter_callback_index = routing.RegisterUnaryTransitCallback(counter_callback)
routing.AddDimensionWithVehicleCapacity(
    counter_callback_index,
    0,  # null slack
    [16, 16, 16],  # maximum locations per vehicle
    True,  # start cumul to zero
    'Counter')
You should not put a hard lower limit on the number of nodes, as that easily makes the model infeasible.
The recommended way is to create a new dimension which just counts the number of visits (the evaluator always returns 1), then put a soft lower bound on the CumulVar of this dimension at the end of each vehicle's route, as sketched below.
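A minimal sketch of that, reusing the 'Counter' dimension from the question (min_visits, penalty, and num_vehicles are illustrative values, not from the original post):

counter_dimension = routing.GetDimensionOrDie('Counter')
min_visits = 5    # hypothetical lower bound on visits per vehicle
penalty = 10000   # cost paid per missing visit below the bound
for vehicle_id in range(num_vehicles):
    end_index = routing.End(vehicle_id)
    # Soft bound: the solver is charged penalty * (min_visits - cumul)
    # whenever a vehicle ends its route with fewer than min_visits visits.
    counter_dimension.SetCumulVarSoftLowerBound(end_index, min_visits, penalty)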

Pair items of a list depending on value

I have an xml file like the following:
<edge from="0/0" to="0/1" speed="10"/>
<edge from="0/0" to="1/0" speed="10"/>
<edge from="0/1" to="0/0" speed="10"/>
<edge from="0/1" to="0/2" speed="10"/>
...
Note, that there exist pairs of from-to and vice versa. (In the example above only the pair ("0/0","0/1") and ("0/1","0/0") is visible, however there is a partner for every entry.) Also, note that those pairs are not ordered.
The file describes edges within a SUMO network simulation. I want to assign new speeds randomly to the different streets. However, every <edge> entry only describes one direction (lane) of a street. Hence, I need to find its "partner".
The following code distributes the speed values lane-wise only:
import xml.dom.minidom as dom
import random

edgexml = dom.parse("plain.edg.xml")
MAX_SPEED_OPTIONS = ["8","9","10"]

for edge in edgexml.getElementsByTagName("edge"):
    x = random.randint(0,2)
    edge.setAttribute("speed", MAX_SPEED_OPTIONS[x])
Is there a simple (pythonic) way to maybe gather those pairs in tuples and then assign the same value to both?
If you know a better way to solve my problem using SUMO tools, I'd be happy too. However I'm still interested in how I can solve the given abstract list problem in python as it is not just a simple zip like in related questions.
Well, you can walk the list of edges and, for each one, run a nested iteration over the edges to search for possible partners. This is of quadratic complexity, but we can at least reduce the calculation time by only walking over not-yet-visited edges in the nested run.
Solution
(for a detailed description, scroll down)
import xml.dom.minidom as dom
import random

edgexml = dom.parse('sampledata/tmp.xml')
MSO = [8, 9, 10]

edge_groups = []
passed = []
for idx, edge in enumerate(edgexml.getElementsByTagName('edge')):
    if edge in passed:
        continue
    partners = []
    for partner in edgexml.getElementsByTagName('edge')[idx:]:
        if partner.getAttribute('from') == edge.getAttribute('to') \
                and partner.getAttribute('to') == edge.getAttribute('from'):
            partners.append(partner)
    edge_groups.append([edge] + partners)
    passed.extend([edge] + partners)

for e in edge_groups:
    print('NEW EDGE GROUP')
    x = random.choice(MSO)
    for p in e:
        p.setAttribute('speed', str(x))  # minidom attribute values must be strings
        print('  E from "%s" to "%s" at "%s"' % (p.getAttribute('from'), p.getAttribute('to'), x))
Yields the output:
NEW EDGE GROUP
  E from "0/0" to "0/1" at "8"
  E from "0/1" to "0/0" at "8"
NEW EDGE GROUP
  E from "0/0" to "1/0" at "10"
NEW EDGE GROUP
  E from "0/1" to "0/2" at "9"
Detailed description
edge_groups = []
passed = []
Initialize the result structure edge_groups, which will be a list of lists holding partnered edges in groups. The additional list passed will help us to avoid redundant edges in our result.
for idx, edge in enumerate(edgexml.getElementsByTagName('edge')):
Start iterating over the list of all edges. I use enumerate here to obtain the index at the same time, because our nested iteration will only iterate over a sub-list starting at the current index to reduce complexity.
if edge in passed:
    continue

Skip the current edge if we have visited it at any point before. This only happens if the edge was already recognized as the partner of an earlier edge (due to the index-based sub-listing). In that case it is already grouped, and we can safely omit it.
partners = []
for partner in edgexml.getElementsByTagName('edge')[idx:]:
    if partner.getAttribute('from') == edge.getAttribute('to') \
            and partner.getAttribute('to') == edge.getAttribute('from'):
        partners.append(partner)

Initialize a helper list to store identified partner edges. Then walk through all edges in the remaining list, starting from the current index, i.e. do not iterate over edges that have already been passed in the outer iteration. If a potential partner is an actual partner (from/to match in reverse), append it to our partners list.
edge_groups.append([edge] + partners)
passed.extend([edge] + partners)
The nested iteration has finished, and partners holds all identified partners of the current edge. Push them into one list and append it to the result variable edge_groups. Since it would be unnecessarily complex to check against the 2-level list edge_groups to see whether we have already traversed an edge in the next run, we additionally keep a flat list of already-used edges and call it passed.
for e in edge_groups:
    print('NEW EDGE GROUP')
    x = random.choice(MSO)
    for p in e:
        p.setAttribute('speed', str(x))
        print('  E from "%s" to "%s" at "%s"' % (p.getAttribute('from'), p.getAttribute('to'), x))
Finally, we walk over all groups of edges in our result edge_groups, randomly draw a speed from MSO (hint: use random.choice() to randomly choose from a list), and assign it to all edges in this group.
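For completeness: since partnership depends only on the unordered pair of endpoints, a dictionary keyed by that pair yields the same grouping in a single linear pass. A sketch, reusing edgexml and MSO from above:

from collections import defaultdict

groups = defaultdict(list)
for edge in edgexml.getElementsByTagName('edge'):
    # frozenset((a, b)) == frozenset((b, a)), so both directions share one key
    key = frozenset((edge.getAttribute('from'), edge.getAttribute('to')))
    groups[key].append(edge)

for group in groups.values():
    x = random.choice(MSO)
    for p in group:
        p.setAttribute('speed', str(x))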

How to extract certain path types in igraph?

TLDR: I'd like to extract the edge types of every path between two vertices in igraph. Is there a relatively sane way to do this?
The clinic I work for recently undertook a rather large (1400-person) tuberculosis contact investigation in a high school. I have class schedules for all of the students and teachers (!) and have put them into a network (using igraph in R), with each student and each room-period combination as a vertex (e.g., the class in Room 123 in Period 1 is a vertex with a directed edge to the class that's in Room 123 for Period 2). I also know which rooms share ventilation systems - a plausible but unlikely mechanism for infection. The graph is directed out from the sole source case, so every path on the network has only two people in it - the source and a contact, separated by a variable number of room-period vertices. Conceptually, there are four kinds of paths:
personal-contact exposures (source -> contact only)
shared-class exposures (source -> room-period -> contact)
next-period exposures (source -> Room 123 Period 1 -> Room 123 Period 2 -> contact)
ventilation exposures (source -> Room 123 Period 1 -> Room 125 Period 1 -> contact)
Every edge has an attribute indicating whether it's a person-to-person exposure, same-room-different-period, or ventilation edge.
As an intermediate step toward modeling infection on this network, I'd like to just get a simple count of how many exposures of each type a student has had. For example, a student might have shared a class with the source, then later have been in a room the source had been in but a period later, and perhaps the next day been in a ventilation-adjacent room. That student's indicators would then be:
personal.contact: 0
shared.class: 1
next.period: 1
vent: 1
I'm not sure how best to get this kind of info, though - I see functions for getting shortest paths, which makes identifying personal-contact links easy, but I think I need to evaluate all paths (which seems like a crazy thing to ask for on a typical social network, but isn't so mad when only the source and the room-periods have out-edges). If I could get to the point where each source-to-contact path were represented by an ordered vector of edge types, I think I could subset them to my criteria easily. I just don't know how to get there. If igraph isn't the right framework for this and I just need to write some big horrible loops over the students' schedules, so be it! But I'd appreciate some guidance before I dive down that hole.
Here's a sample graph of a contact with each of the three indirect paths:
# Strings ain't factors
options(stringsAsFactors = FALSE)
library(igraph)
# Create a sample case
edgelist <- data.frame(
  out.id = c("source", "source", "source", "Rm 123 Period 1",
             "Rm 125 Period 2", "Rm 125 Period 3",
             "Rm 127 Period 4", "Rm 129 Period 4"),
  in.id = c("Rm 123 Period 1", "Rm 125 Period 2", "Rm 127 Period 4",
            "contact", "Rm 125 Period 3", "contact",
            "Rm 129 Period 4", "contact"),
  edge.type = c("Source in class", "Source in class", "Source in class",
                "Student in class", "Class-to-class", "Student in class",
                "Vent link", "Student in class")
)
samp.graph <- graph.data.frame(edgelist, directed = TRUE)
# Label the vertices with meaningful names
V(samp.graph)$label <- V(samp.graph)$name
plot(samp.graph, layout = layout.fruchterman.reingold)
I'm not entirely sure that I understand your graph model, but if the question is:
I have two vertices and I wish to extract every path between them,
then extract the edge attributes of those edges.
then perhaps this might work.
Go with a depth-first search. Igraph contains one, but it's easy enough to roll your own, and this will give you more flexibility as to what information you want to collect. I assume you have no cycles in your graph - otherwise you'll get an infinite number of paths. I don't know much Python (though I do use igraph in R), so here's some pseudocode.
list <- empty

allSimplePaths(u, v, thisPath)
    if (u == v)
        list <- list + thisPath              # record the completed path
        return
    for (n in neighborhood(u))
        if (n in thisPath)
            next                             # skip vertices already on the path
        thisPath <- thisPath + n
        allSimplePaths(n, v, thisPath)
        thisPath <- thisPath - thisPath.end  # backtrack
Basically it says "from each vertex, try all possible paths of expansion to get to the end". It's a simple matter to add another thisPathEdges list and record the traversed edges as well as the vertices, passing it through the function. Of course this would run better were it not recursive; be careful, as this algorithm might blow your stack with enough nodes.
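If rolling your own isn't appealing, igraph can also enumerate the paths directly (all_simple_paths in R). A sketch with the Python bindings, assuming a directed python-igraph Graph g equivalent to samp.graph above, with the same "edge.type" edge attribute:

# Sketch only: `g` is assumed to mirror the R sample graph from the question.
source = g.vs.find(name="source")
contact = g.vs.find(name="contact")
for path in g.get_all_simple_paths(source, to=contact):
    # Map consecutive vertex pairs back to edge ids, then to edge types,
    # giving the ordered vector of edge types for each path
    edge_ids = [g.get_eid(a, b) for a, b in zip(path, path[1:])]
    print([g.es[eid]["edge.type"] for eid in edge_ids])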
You still might want to go with @PaulG's model, and just have multiple edges between student nodes. You could do cool things like run a breadth-first search to see how the disease spread, find a minimum spanning tree to get a time estimate, or find a min-cut to quarantine an ongoing infection.

How do I make my topological sort run in linear time? (Code is well annotated)

Question 1:
What should the correct running time be for a well-implemented topological sort? I am seeing different opinions:
Wikipedia says: O(log^2(n))
Geeksforgeeks says: O(V+E)
Question 2:
My implementation runs at O(V*E), because in the worst case I will need to loop through the graph E times, and each time I will need to check V items. How do I make my implementation linear-time?
The algorithm works in steps:
Produce the graph in the form of an adjacency list
e.g. this graph
0 --- 2
|     |
1 --- 3

produces this adjacency list
{0: [], 1: [0], 2: [0], 3: [1, 2]}
0 depends on nothing, 1 and 2 depend on 0, and 3 depends on 1 and 2.
Iterate through the graph and find nodes that do not have any dependencies.
def produce_graph(prerequisites):
    adj = {}
    for course in prerequisites:
        if course[0] in adj:
            # append prerequisites
            adj[course[0]].append(course[1])
        else:
            adj[course[0]] = [course[1]]
        # ensure that prerequisites are also in the graph
        if course[1] not in adj:
            adj[course[1]] = []
    return adj

def toposort(graph):
    sorted_courses = []
    while graph:
        # In an acyclic graph, we should be able to resolve at least
        # one node in each pass
        acyclic = False
        # iterate over a snapshot, since we delete entries from graph below
        for node, predecessors in list(graph.items()):
            # a node is resolved when none of its predecessors is still
            # in the graph (trivially true if it has no predecessors)
            resolved = True
            for predecessor in predecessors:
                if predecessor in graph:
                    # this node has a predecessor that is not yet resolved
                    resolved = False
                    break
            if resolved:
                # we resolved a node in this pass, so the graph is not
                # known to be cyclic
                acyclic = True
                del graph[node]
                sorted_courses.append(node)
        # if we went through the whole graph without resolving a single
        # node, the graph must be cyclic
        if not acyclic:
            # a cyclic graph has no topological order; return an empty list
            return []
    return sorted_courses

graph = produce_graph([[1, 0], [2, 0], [3, 1], [3, 2]])
print(toposort(graph))
Ok, this is a good question. The O(V+E) figure is the correct one for a sequential algorithm; Wikipedia's O(log^2(n)) refers to a parallel algorithm. As long as the graph is directed and acyclic, depth-first search can be used, and depth-first search runs in O(n + m), as explained here: http://www.csd.uoc.gr/~hy583/reviewed_notes/dfs_dags.pdf
If you are curious, networkx has an implementation using depth-first search, called topological_sort, with Python source code available for viewing.
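To make your own implementation linear, avoid rescanning the whole graph on every pass. A sketch of Kahn's algorithm (using the same graph format that produce_graph builds), which tracks each node's count of unresolved prerequisites so that every edge is examined only once:

from collections import deque

def toposort_linear(graph):
    # graph maps node -> list of prerequisites, as built by produce_graph
    indegree = {node: len(deps) for node, deps in graph.items()}
    dependents = {node: [] for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            dependents[dep].append(node)
    # start with every node that has no prerequisites
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in dependents[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:  # last prerequisite resolved
                queue.append(nxt)
    # nodes on a cycle never reach indegree 0, so detect that here
    return order if len(order) == len(graph) else []

print(toposort_linear({0: [], 1: [0], 2: [0], 3: [1, 2]}))  # e.g. [0, 1, 2, 3]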
