Rat in a maze with a randomized path in a 2-D array - python

The problem is similar to the rat-in-a-maze problem. I am given a 2-D array of size MxN. Each cell of the array is either 1 or 0, where 1 means blocked. I am also given two points (a starting point and an ending point), and I have to go from the start index to the end index. The catch is: 1) The path should be random. 2) There should be some parameter that lets me decide how random it can be (i.e. how wildly it should wander before reaching its destination). 3) The path should not intersect itself (like in a snake game).
This algorithm is needed to randomly create a population that will be used as input to a genetic model for further optimization.
For now I have used BFS and created one solution. The problem is that I cannot create an arbitrary number of random paths this way (which I will later use as the population), and I am unable to formalize the idea of how random a path should be.
This is my code, which only produces the minimum path using BFS:
def isSafe(x,y,length):
    if ((x<length) and (x>-1) and (y<length) and (y>-1)):
        return True
    return False

def path(room,x1,y1,x2,y2,distance):
    roomSize=len(room)
    if ((x1==x2) and (y1==y2)):
        room[x1][y1]=distance+1
        return
    queue=[[x1,y1]]
    room[x1][y1]=0
    start=0
    end=0
    while start<=end:
        x,y=queue[start]
        start+=1
        distance=room[x][y]
        for i in [-1,1]:
            if isSafe(x+i,y,roomSize):
                if room[x+i][y]=="O":
                    queue.append([x+i,y])
                    room[x+i][y]=distance+1
                    end+=1
        for i in [-1,1]:
            if isSafe(x,y+i,roomSize):
                if room[x][y+i]=="O":
                    queue.append([x,y+i])
                    room[x][y+i]=distance+1
                    end+=1
def retrace(array,x1,y1,x2,y2):
    roomSize=len(array)
    if not (isSafe(x2,y2,roomSize)):
        print("Wrong Traversing Point")
    if type(array[x2][y2])==str:
        print("##################No Pipe been installed due to path constrained################")
        return []
    distance=array[x2][y2]
    path=[[x2,y2]]
    x=0
    while not (array[x2][y2]==0):
        if ((isSafe(x2+1,y2,roomSize)) and type(array[x2+1][y2])==int and array[x2+1][y2]==array[x2][y2]-1):
            x2+=1
            path.append([x2,y2])
        elif ((isSafe(x2-1,y2,roomSize)) and type(array[x2-1][y2])==int and array[x2-1][y2]==array[x2][y2]-1):
            x2-=1
            path.append([x2,y2])
        elif ((isSafe(x2,y2+1,roomSize)) and type(array[x2][y2+1])==int and array[x2][y2+1]==array[x2][y2]-1):
            y2+=1
            path.append([x2,y2])
        elif ((isSafe(x2,y2-1,roomSize)) and type(array[x2][y2-1])==int and array[x2][y2-1]==array[x2][y2]-1):
            y2-=1
            path.append([x2,y2])
    return path
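One way the three requirements could be approached, as a sketch rather than a definitive answer: a depth-first walk with backtracking that never revisits a cell, so the path held on the stack cannot cross itself. The hypothetical randomness parameter (0.0 to 1.0) is the probability of stepping to a uniformly random free neighbour instead of the neighbour closest to the goal by Manhattan distance. The names random_path, randomness and rng are illustrative, and the grid is assumed to hold 0 for free and 1 for blocked, as in the description (the BFS code above marks free cells with "O" instead).

```python
import random

def random_path(grid, start, goal, randomness=0.5, rng=None):
    """Return a self-avoiding path from start to goal as a list of (row, col), or None.

    Assumption: grid[r][c] == 0 means free and 1 means blocked.
    randomness=0.0 always steps greedily toward the goal; 1.0 always steps at random.
    """
    rng = rng or random.Random()
    rows, cols = len(grid), len(grid[0])
    visited = {start}   # cells are never revisited, even after backtracking
    stack = [start]     # the current path; consecutive entries are adjacent cells
    while stack:
        r, c = stack[-1]
        if (r, c) == goal:
            return list(stack)
        options = [
            (r + dr, c + dc)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < rows and 0 <= c + dc < cols
            and grid[r + dr][c + dc] == 0
            and (r + dr, c + dc) not in visited
        ]
        if not options:
            stack.pop()  # dead end: backtrack one step
            continue
        if rng.random() < randomness:
            nxt = rng.choice(options)
        else:
            nxt = min(options, key=lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1]))
        visited.add(nxt)
        stack.append(nxt)
    return None  # goal not reachable from start
```

Calling it repeatedly, e.g. [random_path(grid, (0, 0), (3, 3), randomness=0.7) for _ in range(20)], would give a population of candidate paths; higher values of randomness tend to produce longer, more wandering paths.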

Related

Optimizing python function for Numba

The following is the Python function which I am trying to rewrite for Numba:
def freeoh_count_nojit(coord=np.array([[]]),\
                       molInterfaceIndex=np.array([]),\
                       hNeighbourList=np.array([]),\
                       topol=np.array([[]]),\
                       cos_HAngle=0.0,\
                       cellsize=np.array([]),\
                       is_orig_def=False,is_new_def=True):
    labelArray=[]
    freeOHcosDA=[]; freeOHcosDAA=[]
    mol1Coord=np.zeros((3,3),dtype=float)
    labelArray=np.empty(molInterfaceIndex.shape[0], dtype="U10")
    for i in range(molInterfaceIndex.shape[0]): # loop over selected molecules
        mol2CoordList=[]; timesave=[]
        mol1Coord=np.array([coord[k] for k in topol[molInterfaceIndex[i]]]) # extract center molecule
        gen = np.array([index for index in hNeighbourList[i] if index!=-1]) # remove padding
        for j in range(gen.shape[0]):
            mol2CoordList.append([coord[k] for k in topol[gen[j]]]) # extract neighbors
        mol2Coord=np.array(mol2CoordList).reshape(-1,3)
        if is_orig_def:
            acceptor,donor,cosAngle=interface_hbonding_orig(mol1Coord,mol2Coord,cos_HAngle,cellsize)
            labelArray[i]="D"*np.abs(2-np.sum(donor))+"A"*np.clip(np.array([np.sum(acceptor)]),1,2)[0]
        elif is_new_def:
            acceptor,donor,cosAngle=interface_hbonding_new(mol1Coord,mol2Coord,cos_HAngle,cellsize)
            labelArray[i]="D"*np.abs(2-np.sum(donor))+"A"*np.sum(acceptor)
        if labelArray[i] in "DA":
            freeOHcosDA.append(cosAngle)
        elif labelArray[i] in "DAA":
            freeOHcosDAA.append(cosAngle)
    freeOHcos=freeOHcosDA+freeOHcosDAA
    return labelArray, freeOHcos
The function takes in a frame of coordinates of the simulated molecules. The code selects a central molecule from an index list molInterfaceIndex and extracts the coordinates of its neighbouring molecules from a pre-generated neighbour list (generated with scipy.spatial.KDTree, hence it cannot be called from a jitted function). The central molecule and its neighbours are sent to a jitted function, which returns two arrays (such as [1,1]) and a scalar; these are then used to label the central molecule.
My attempt at rewriting the above Python function is below:
#njit(cache=True,parallel=True)
def freeoh_count_jit(coord=np.array([[]]),\
                     molInterfaceIndex=np.array([]),\
                     hNeighbourList=np.array([]),\
                     topol=np.array([[]]),\
                     cos_HAngle=0.0,\
                     cellsize=np.array([]),\
                     is_orig_def=False,is_new_def=True):
    NAtomsMol=3 #No. of atoms in a molecule
    _M=molInterfaceIndex.shape[0]
    _N=hNeighbourList.shape[1]
    mol1Coord=np.zeros((NAtomsMol,3),dtype=np.float64)
    mol2Coord=np.zeros((_N*NAtomsMol,3),dtype=np.float64)
    acceptor=np.zeros((_M,2),dtype=int)
    donor=np.zeros((_M,2),dtype=int)
    cosAngle=np.zeros(_M,dtype=np.float64)
    gen=np.zeros(_M,dtype=int)
    freeOHMask = np.zeros(_M, dtype=int) == 0
    labelArray=np.empty(_M, dtype="U10")
    for i in range(_M): # loop over selected molecules
        for index,j in enumerate(topol[molInterfaceIndex[i]]):
            mol1Coord[index]=coord[j] # extract center molecule
        for indexJ,j in enumerate(hNeighbourList[i]):
            for indexK,k in enumerate(topol[j]):
                mol2Coord[indexK+topol[j].shape[0]*indexJ]=coord[k] # extract neighbors
        gen[i] = len(np.array([index for index in hNeighbourList[i] if index!=-1]))*NAtomsMol # get actual number of neighbor atoms
        if is_orig_def:
            acceptor[i],donor[i],cosAngle[i]=interface_hbonding_orig(mol1Coord,mol2Coord[:gen[i]],cos_HAngle,cellsize)
            labelArray[i]="D"*np.abs(2-np.sum(donor[i]))+"A"*np.clip(np.array([np.sum(acceptor[i])]),1,2)[0]
        elif is_new_def:
            acceptor[i],donor[i],cosAngle[i]=interface_hbonding_new(mol1Coord,mol2Coord[:gen[i]],cos_HAngle,cellsize)
            labelArray[i]="D"*np.abs(2-np.sum(donor[i]))+"A"*np.sum(acceptor[i])
    freeOHMask[np.where(cosAngle > 1.0)] = False
    return acceptor, donor, labelArray, freeOHMask
The main issue is that the jitted function seems to give incorrect results when numba.prange is used for the outer loop. Also, the execution time of the function increases with each call, which is a bit confusing. The functions interface_hbonding_orig() and interface_hbonding_new() are already jitted, so I think they are out of the scope of the discussion here. One of the bigger questions is whether I even need to jit this function at all, as the most time-consuming part is supposed to be the array selection in the first few lines of the outer loop. If anyone has any suggestions for rewriting this function, or even for the algorithm design, it would be really helpful.
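One plausible cause of the wrong results under prange, offered as a guess rather than a diagnosis: mol1Coord and mol2Coord are allocated once outside the loop and overwritten by every iteration, so parallel iterations may write into the same buffers at the same time. A minimal sketch of the usual remedy is to allocate scratch buffers inside the loop body so that each iteration owns its own copy. The function gather_per_molecule and its sum() reduction are hypothetical stand-ins for the real per-molecule work, and the try/except only lets the sketch run where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs without Numba
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def gather_per_molecule(coord, topol, mol_index):
    n_mol = mol_index.shape[0]
    n_atoms = topol.shape[1]
    out = np.zeros(n_mol, dtype=np.float64)
    for i in prange(n_mol):
        # Scratch buffer allocated per iteration: no sharing between iterations.
        mol = np.empty((n_atoms, 3), dtype=np.float64)
        for a in range(n_atoms):
            mol[a] = coord[topol[mol_index[i], a]]
        # Hypothetical stand-in for interface_hbonding_orig / interface_hbonding_new.
        out[i] = mol.sum()
    return out
```

Each iteration then writes only to its own mol buffer and to out[i], which is the access pattern prange expects.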

Find shortest path on graph with 1 character difference

I have a somewhat complicated question. I am given a list of words (each word has the same length). My function receives two words (startNode and endNode), and my task is to find the shortest path between the two (a follow-up would be how to collect all paths from startNode to endNode). Two words can only be connected if they differ by at most one character. For example, TRIE and TREE can be connected since they differ by only one letter (I vs. E), but TRIE and TWEP cannot be connected since they differ by two characters.
My solution was to first build an adjacency list, which I implemented successfully, and then run a BFS to determine whether a path exists between startNode and endNode. I am able to determine whether a path exists, but I'm unsure how to keep track of the path.
My attempt is as follows:
def shortestPath(startNode, endNode, words):
    adjList=createGraph(words)
    print(adjList)
    #Returns shortest path from startNode to EndNode
    visited=set()
    q=collections.deque()
    total=-1
    q.append(startNode)
    while q:
        node=q.popleft()
        visited.add(node)
        if node==endNode:
            if node!=startNode:
                return total+1
        total=total+1
        for i in adjList[node]:
            if i not in visited:
                print(i)
                q.append(i)
    return -1
My BFS doesn't record the path, and the total length is quite obviously wrong too. Is there any way I can improve my algorithm?
Sample Input:
{'POON': ['POIN', 'LOON'], 'PLEE': ['PLEA', 'PLIE'], 'SAME': [], 'POIE': ['PLIE', 'POIN'], 'PLEA': ['PLEE', 'PLIE'], 'PLIE': ['PLEE', 'POIE', 'PLEA'], 'POIN': ['POON', 'POIE'], 'LOON': ['POON']}
startWord: POON
endWord: PLEA
Expected Output:
POON -> POIN -> POIE -> PLIE -> PLEA
Current Output:
POIN
LOON
POIE
PLIE
PLEE
PLEA
PLEA
6
Any tips on where I am going wrong?
For anyone who stumbles upon this question, I did figure out a solution. A normal BFS only figures out whether a path to the node exists, and implicitly follows the shortest traversal, but if you want to show that traversal (the path or its length), it becomes necessary to keep two more pieces of bookkeeping.
In this case, I kept a predecessor for each node and its distance from the source. My function therefore becomes:
import collections

def shortestPath(startNode, endNode, words):
    adjList=createGraph(words)
    print(adjList)
    #Returns the length of the shortest path from startNode to endNode (-1 if none)
    pred={i:-1 for i in adjList} #Keep the predecessor to each node as -1 initially
    dist = {i:10000000 for i in adjList} #Initially set distance for each node from src to max
    #Distance and Predecessor:
    dist[startNode]=0 #distance from startNode to startNode is 0
    #Mark nodes as visited when they are enqueued, so that pred and dist are
    #set only once per node (on the first, i.e. shortest, discovery)
    visited={startNode}
    q=collections.deque()
    q.append(startNode)
    while q:
        node=q.popleft()
        if node==endNode:
            findShortestPath(startNode, endNode, pred) #Pass it into another helper function since pred is already constructed
            return dist[node]
        for i in adjList[node]:
            if i not in visited:
                visited.add(i)
                dist[i]=dist[node]+1
                pred[i]=node
                q.append(i)
    #If there is no available path between the two Nodes
    return -1
When the BFS is complete, the pred and dist dictionaries are also set up. pred contains each node and its predecessor on the path from start to end (and -1 if no connection exists). To print out the path from start to end, we can use a helper function.
Additionally, I kept the distance dictionary, which gives the distance from the source to each node.
Helper Function:
def findShortestPath(startNode, endNode, pred):
    path=[]
    crawl=endNode
    path.append(crawl)
    while (pred[crawl]!=-1):
        path.append(pred[crawl])
        crawl=pred[crawl]
    path.reverse()
    print(path)
This is kind of a Dijkstra's algorithm approach, but I'm unsure how else I could achieve this.
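For reference, here is a compact, self-contained sketch of the same predecessor idea that returns the path as a list instead of printing it, together with one possible take on the follow-up about collecting all paths (a depth-first enumeration of simple paths). The function names are illustrative, and the adjacency dictionary is copied from the sample input above rather than built by createGraph, which is not shown. A node is recorded in pred at the moment it is first discovered, which is what guarantees the recovered path is a shortest one.

```python
from collections import deque

def bfs_shortest_path(adj, start, end):
    """Return one shortest path from start to end as a list, or None."""
    pred = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            path = []
            while node is not None:  # walk predecessors back to start
                path.append(node)
                node = pred[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in pred:  # first discovery is the shortest one
                pred[nxt] = node
                queue.append(nxt)
    return None

def all_simple_paths(adj, start, end, path=None):
    """Yield every path from start to end that visits no node twice."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in adj.get(start, []):
        if nxt not in path:
            yield from all_simple_paths(adj, nxt, end, path)

adj = {'POON': ['POIN', 'LOON'], 'PLEE': ['PLEA', 'PLIE'], 'SAME': [],
       'POIE': ['PLIE', 'POIN'], 'PLEA': ['PLEE', 'PLIE'],
       'PLIE': ['PLEE', 'POIE', 'PLEA'], 'POIN': ['POON', 'POIE'],
       'LOON': ['POON']}

print(" -> ".join(bfs_shortest_path(adj, 'POON', 'PLEA')))
# prints: POON -> POIN -> POIE -> PLIE -> PLEA
```

Note that enumerating all simple paths can take exponential time on dense graphs, so it is only practical for small word lists.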

Compute reachability of elements in a list of tuples

I have a list of tuples like this.
a = [(1,2),(1,3),(1,4),(2,5),(6,5),(7,8)]
In this list 1 relates to 2 and then 2 relates to 5 and 5 relates to 6 therefore 1 relates to 6. Similarly I need to find the relations between other elements in tuples. I need a function that takes the input values and outputs as follows:
input = (1,6) #output = True
input = (5,3) #output = True
input = (2,8) #output = False
I do not have knowledge of itertools or map functions. Can they be used to solve these types of problems?
And for the sake of curiosity and interest where can I find these types of questions to solve and where are these types of problems encountered in real life situations?
This can easily be done by treating the tuples as edges in a graph. The question then reduces to checking whether there is a path between the two nodes.
There exist many nice libraries for this; see e.g. networkx:
import networkx as nx
a = [(1,2),(1,3),(1,4),(2,5),(6,5),(7,8)]
G = nx.Graph(a)
nx.has_path(G, 1, 6) # True
nx.has_path(G, 5, 3) # True
nx.has_path(G, 2, 8) # False
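If installing a library is not an option, the same path check can be sketched with the standard library alone: build the neighbour sets from the tuples and run a breadth-first search. The function name is_related is illustrative and not part of any library.

```python
from collections import defaultdict, deque

def is_related(pairs, source, target):
    """Return True if target is reachable from source, treating pairs as undirected edges."""
    neighbours = defaultdict(set)
    for u, v in pairs:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

a = [(1,2),(1,3),(1,4),(2,5),(6,5),(7,8)]
print(is_related(a, 1, 6))  # True
print(is_related(a, 5, 3))  # True
print(is_related(a, 2, 8))  # False
```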
This answer here nicely states your problem as a graph problem, where every time you need to run your algorithm you need to check for the existence of a path between your input vertices. The time complexity for every query then depends on the size, order, diameter, degree of the underlying graph.
However, if you intend to run this algorithm many times with the same array a, it may be worth doing some preprocessing on the input graph to find the connected components (Wikipedia : connected components) first. In that case you can get constant time for every query. Here is the code I suggest :
# NOTE : tested using python 3.6.1
# WARNING : no input sanitization
a = [(1,2),(1,3),(1,4),(2,5),(6,5),(7,8)]
n = 8 # order of the underlying graph

# prepare graph as lists of neighbors for every vertex, i.e. adjacency lists (extra unused vertex '0', just to match the value range of the problem)
graph = [[] for i in range(n+1)]
for edge in a:
    graph[edge[0]].append(edge[1])
    graph[edge[1]].append(edge[0])
print( "graph : " + str(graph) )

# list of unprocessed vertices : contains all of them at the beginning
unprocessed_vertices = {i for i in range(1,n+1)}

# subroutine to discover the connected component of a vertex
def build_component():
    component = [] # current connected component
    curr_vertices = {unprocessed_vertices.pop()} # locally unprocessed vertices, initialize with one of the globally unprocessed vertices
    while len(curr_vertices) > 0:
        curr_vertex = curr_vertices.pop() # vertex to be processed
        # add unprocessed neighbours of current vertex to the set of vertices to process
        for neighbour in graph[curr_vertex]:
            if neighbour in unprocessed_vertices:
                curr_vertices.add(neighbour)
                unprocessed_vertices.remove(neighbour)
        component.append(curr_vertex)
    return component

# main algorithm : graph traversal on multiple connected components
components = []
while len(unprocessed_vertices) > 0:
    components.append( build_component() )
print( "components : " + str(components) )

# assign a number to each component
component_numbers = [None] * (n+1)
curr_number = 1
for comp in components:
    for vertex in comp:
        component_numbers[vertex] = curr_number
    curr_number += 1
print( "component_numbers : " + str(component_numbers) )

# main functionality
def is_connected( pair ):
    return component_numbers[pair[0]] == component_numbers[pair[1]]

# run main functionality on inputs : every call is executed in constant time now, regardless of the size of the graph
print( is_connected( (1,6) ) )
print( is_connected( (5,3) ) )
print( is_connected( (2,8) ) )
I don't really know the most likely situations where this problem is encountered, but I suppose it has applications in some clustering tasks, or when you want to know whether it is possible to go from one place to another. If the edges of the graph represent dependencies between modules, this problem would tell you whether two parts depend on each other, so there may be applications in compiling or in the management of large projects. The underlying problem is a "connected components" problem, which is among the problems for which we know polynomial-time algorithms.
It is generally very useful to model these kinds of problems with graphs, as these objects have a very simple structure, and most of the time we can reduce the original problem to a well-known problem on graphs.
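As an alternative to the traversal-based preprocessing, and a different technique from the one in the answer above, the connected components can also be maintained with a disjoint-set (union-find) structure. This is a minimal sketch with illustrative names; it does not need to know the number of vertices in advance, and each query runs in near-constant amortized time.

```python
def build_find(pairs):
    """Merge the endpoints of every pair and return a find(x) function
    that maps each vertex to the representative of its component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)  # unseen vertices start as their own component
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v in pairs:
        parent[find(u)] = find(v)  # union the two components
    return find

a = [(1,2),(1,3),(1,4),(2,5),(6,5),(7,8)]
find = build_find(a)

def is_connected(pair):
    return find(pair[0]) == find(pair[1])

print(is_connected((1, 6)))  # True
print(is_connected((5, 3)))  # True
print(is_connected((2, 8)))  # False
```

A further advantage of this structure is that new tuples can be merged in later without redoing the whole preprocessing step.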

What data structure is good for maintaining file paths?

I'm working on the "Longest Absolute filepath" problem on LeetCode. This is a simple problem that asks "What is the length of the longest absolute file path in a given directory". And my working solution is as follows. The file directory is given as a string.
def lengthLongestPath(self, input):
    """
    :type input: str, the file directory
    :rtype: int
    """
    current_folder_path = [""] * 40
    longest_file_path_size = 0
    for item in input.split("\n"):
        num_tabs = item.count("\t")
        print num_tabs
        if "." not in item:
            current_folder_path[num_tabs] = item.lstrip("\t")
        else:
            absolute_file_path = "/".join(current_folder_path[:num_tabs] + [item.lstrip("\t")])
            print item
            print num_tabs, absolute_file_path, current_folder_path
            longest_file_path_size = max(len(absolute_file_path), longest_file_path_size)
    return longest_file_path_size
This works. However, note that the line current_folder_path = [""] * 40 is very inelegant. It is there to remember the current file path. I wonder if there is a way to remove it.
The problem statement does not address some fine points. It is very unclear what path may correspond to the string
a\n\t\tb
Is it a//b or plain illegal? If the former, do we need to normalize it?
I guess it is safe to assume that such paths are illegal. In other words, the path depth only grows by 1, and the current_folder_path in fact functions like a stack. You don't need to preinitialize it, but just push the name when num_tabs exceeds its size, and pop as necessary.
As a side note, since join is linear in the current accumulated length, the entire algorithm seems quadratic, which violates the time complexity requirement.
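The stack idea can be sketched as follows, under the same assumption that the depth never grows by more than one level at a time. Instead of folder names, the stack holds the accumulated length of the path prefix at each depth, which removes both the fixed-size list and the repeated join. The function name and variable names are illustrative.

```python
def length_longest_path(listing):
    """Length of the longest absolute file path in a tab-indented listing."""
    prefix_lengths = []  # prefix_lengths[d] = length of "dir0/.../dir_d/" including the trailing slash
    longest = 0
    for line in listing.split("\n"):
        name = line.lstrip("\t")
        depth = len(line) - len(name)   # number of leading tabs
        del prefix_lengths[depth:]      # pop folders that are deeper than this entry
        prefix = prefix_lengths[-1] if prefix_lengths else 0
        if "." in name:                 # a file: candidate for the longest path
            longest = max(longest, prefix + len(name))
        else:                           # a folder: push its prefix length
            prefix_lengths.append(prefix + len(name) + 1)
    return longest

print(length_longest_path("dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext"))  # 20
```

Each line is handled with a constant number of stack operations, so the running time is linear in the length of the input string.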

recursion to iteration in python

We are trying to run a cluster analysis on a large amount of data. We are fairly new to Python and found out that an iterative function is much more efficient than a recursive one. Now we are trying to make that change, but it is much harder than we thought.
The code below is the heart of our clustering function. It takes over 90 percent of the time. Can you help us change it into an iterative one?
Some extra information: the taunach function gets the neighbours of our point, which will later form the clusters. The problem is that we have a very large number of points.
def taunach(tau,delta, i,s,nach,anz):
    dis=tabelle[s].dist
    #delta=tau
    x=data[i]
    y=Skalarprodukt(data[tabelle[s].index]-x)
    a=tau-abs(dis)
    #LA.norm(data[tabelle[s].index]-x)
    if y<a*abs(a):
        nach.update({item.index for item in tabelle[tabelle[s].inner:tabelle[s].outer-1]})
        anz = anzahl(delta, i, tabelle[s].inner, anz)
        if dis>-1:
            b=dis-tau
            if y>=b*abs(b):#*(1-0.001):
                nach,anz=taunach(tau,delta, i,tabelle[s].outer,nach,anz)
    else:
        if y<tau**2:
            nach.add(tabelle[s].index)
            if y < delta:
                anz += 1
        if tabelle[s].dist>-4:
            b = dis - tau
            if y>=b*abs(b):#*(1-0.001)):
                nach,anz=taunach(tau,delta, i,tabelle[s].outer,nach,anz)
        if tabelle[s].dist > -1:
            if y<=(dis+tau)**2:
                nach,anz=taunach(tau,delta, i,tabelle[s].inner,nach,anz)
    return nach,anz
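The general pattern for this kind of conversion can be sketched on a simplified, hypothetical tree, not on the taunach function itself. Because the recursive calls only accumulate into a result (like nach and anz above) and nothing is done with their return values afterwards, each recursive call can be replaced by pushing the child index onto an explicit stack. The node layout (value, inner, outer) with -1 for "no child" is an assumption made for this example.

```python
def count_in_range_recursive(nodes, s, lo, hi):
    """Count the nodes below s whose value lies in [lo, hi], recursively."""
    if s == -1:
        return 0
    value, inner, outer = nodes[s]
    total = 1 if lo <= value <= hi else 0
    total += count_in_range_recursive(nodes, inner, lo, hi)
    total += count_in_range_recursive(nodes, outer, lo, hi)
    return total

def count_in_range_iterative(nodes, root, lo, hi):
    """Same result, with the call stack replaced by an explicit list."""
    total = 0
    stack = [root]            # indices of nodes still to be processed
    while stack:
        s = stack.pop()
        if s == -1:
            continue
        value, inner, outer = nodes[s]
        if lo <= value <= hi:
            total += 1
        stack.append(outer)   # pushed first, so inner is processed first
        stack.append(inner)
    return total

# Hypothetical tree: each node is (value, inner_index, outer_index).
nodes = [(5, 1, 2), (3, -1, -1), (8, 3, -1), (7, -1, -1)]
print(count_in_range_recursive(nodes, 0, 4, 7))  # 2
print(count_in_range_iterative(nodes, 0, 4, 7))  # 2
```

The iterative form avoids Python's recursion limit; whether it is also noticeably faster for a given workload would need to be measured.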
