I have developed an algorithm that is a variation of BFS on a tree, but it includes a probabilistic factor. To check whether a node is the one I am looking for, a statistical test is performed (I won't go into too much detail about this). If the test result is positive, the node is added to another queue (called tested). But when a node fails the test, the nodes in tested need to be tested again, so that queue is appended to the one holding the nodes yet to be tested.
In Python, considering that the queue q starts with the root node:
...
tested = []
while q:
    curr = q.pop(0)
    p = statistical_test(curr)
    if p:
        tested.append(curr)
    else:
        q.extend(curr.children())
        q.extend(tested)
        tested = []
return tested
As the algorithm is probabilistic, more than one node might be in tested after the search, but that is expected. The problem I am facing is estimating this algorithm's complexity, because I can't simply reuse BFS's complexity: q and tested have variable lengths.
I don't need a closed-form, definitive answer for this. What I need are some insights on how to deal with this situation.
The worst-case scenario is the following process:
All elements 1 to n-1 pass the test and are appended to the tested queue.
Element n fails the test, is removed from q, and the n-1 elements from tested are pushed back into q.
Go back to step 1 with n = n-1.
This is a classic O(n²) process: each round performs n tests and permanently removes a single node, so the total work is n + (n-1) + ... + 1 = n(n+1)/2.
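To see the quadratic growth concretely, here is a toy counting sketch of that worst case. The pass/fail rule below simply encodes "only the last remaining node fails" and stands in for the real statistical_test; the failing node is assumed to have no children.

from collections import deque

def worst_case_tests(n):
    # Toy model of the worst case: every node except the last one in the
    # queue passes; the last one fails (and has no children), so the whole
    # tested queue is flushed back into q.
    q = deque(range(n))
    tested = []
    tests = 0
    while q:
        curr = q.popleft()
        tests += 1
        if q:                    # stand-in for statistical_test: pass
            tested.append(curr)
        else:                    # the last remaining node fails
            q.extend(tested)     # re-queue everything tested so far
            tested = []
    return tests

for n in (10, 100, 1000):
    print(n, worst_case_tests(n))  # 55, 5050, 500500

The counts come out to exactly n(n+1)/2.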
What the problem is about:
I am building an agent-based model with mesa & networkx in Python. In one line: the model tries to capture how changes in an agent's attitude influence whether or not they decide to adopt a new technology. I'm currently attempting to parallelize a part of it to speed up the run time. There are currently 4000 agents. I keep hitting the following error message:
'if i < 256
Recursion depth reached in comparison'
The pseudo-code below outlines the process (after which I explain what I've tried and what failed).
Initializes a model of 4000 agents.
Gives each agent a set of agents to interact with at every time step, at two levels: a) geographic neighbours, b) 3 social circles.
For each interaction pair in the list, the agents' attitudes are compared and some modifications to the attitudes are made.
This process repeats for several time steps, with the results of one step carrying over to the next.
import pandas as pd
import multiprocessing as mp
import dill
from pathos.multiprocessing import ProcessingPool

def model_initialization():
    df = pd.read_csv(path + '4000_household_agents.csv')
    for agent in df:
        model.schedule.add(agent)
        # assign three circles of influence
        agent.social_circle1 = social_circle1
        agent.social_circle2 = social_circle2
        agent.social_circle3 = social_circle3

def assign_interactions():
    for agent in schedule.agents:
        # geographic neighbours
        neighbours = agent.get_neighbours()
        for neighbour in neighbours:
            interaction_list.append((agent, neighbour))
        # interactions in the circles of influence
        interaction_list.append((agent, agent.social_circle1))
        interaction_list.append((agent, agent.social_circle2))
        interaction_list.append((agent, agent.social_circle3))
    return interaction_list

def attitude_change(agent1, agent2):
    # compare attitudes
    if agent1.attitude > agent2.attitude:
        # make some change to the attitudes
        agent1.attitude -= 0.2
        agent2.attitude += 0.2
    return agent1.attitude, agent2.attitude

def interactions(interaction):
    agent1 = interaction[0]
    agent2 = interaction[1]
    agent1.attitude, agent2.attitude = attitude_change(agent1, agent2)

def main():
    model_initialization()
    interaction_list = assign_interactions()
    # pool = mp.Pool(10)
    pool = ProcessingPool(10)
    # the interaction list can have well over 89,000 interactions
    results = pool.map(interactions, [interaction for interaction in interaction_list])

# run this process several times
for i in range(12):
    main()
What I've tried
Because the model step is sequential, the only part I can parallelize is the interactions() function. Because the interaction loop is called more than 90,000 times, I thought that was the cause and raised sys.setrecursionlimit() to about 100,000. It still fails.
I have broken interaction_list into chunks of 500 each and pooled the processes for each chunk (a rough sketch of this is shown after this list). Same error.
To see if something was fundamentally wrong, I took only the first 35 elements (a small number) of the interaction list and ran just that. It still hits the recursion depth.
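Roughly, the chunking attempt looked like the sketch below; the chunked helper and the exact sizes here are illustrative, not my exact code:

def chunked(lst, size=500):
    # yield consecutive slices of `lst` of length `size`
    for i in range(0, len(lst), size):
        yield lst[i:i + size]

pool = ProcessingPool(10)
for chunk in chunked(interaction_list, size=500):
    # map the interactions over one small chunk at a time
    pool.map(interactions, chunk)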
Can anyone help me see which part of the code hits the recursion depth? I tried both dill + multiprocessing and multiprocessing alone; the latter gives a 'pickling error'.
I want to do some analysis on graphs to find all possible simple paths between all pairs of nodes in a graph. With the help of the NetworkX library I can use DFS to find all possible paths between 2 nodes with this function:
nx.all_simple_paths(G,source,target)
The code below runs without any problem since my toy example contains only 6 nodes. However, in my real task my graph contains 5,213 nodes and 11,377,786 edges, and finding all possible simple paths in that graph is impossible with the solution below:
import networkx as nx

graph = nx.DiGraph()
graph.add_weighted_edges_from(final_edges_list)
list_of_nodes = list(graph.nodes())

paths = {}
for n1 in list_of_nodes:
    for n2 in list_of_nodes:
        if n1 != n2:
            all_simple_paths = list(nx.all_simple_paths(graph, n1, n2))
            paths[n1 + "-" + n2] = all_simple_paths
The "paths" dictionary holds the "n1-n2" (source node and target node respectively) as keys, and list of all simple paths as values.
The question is whether I can use of multi processing in this scenario in order to run this code on my original problem or not. My knowledge about the processors, threads, shared memory and CPU cores are very naive and I am not sure if I can really use the concurrency (running my nested loops in parallel) in my task.
I use a windows server with 128 GB RAM and 32 core CPU.
PS: Thorough searching the net (mostly StackOverFlow), I've found solutions which recommended to use threading and others recommended multiprocessing. I am not sure if I understand the distinction between these two :|
If you want to use threading, use a ThreadPoolExecutor to submit your function call to a thread. It will return a Future object. Future.result() will return the value returned by the call; if the call hasn't completed yet, this method will wait up to timeout seconds, and if the call still hasn't completed by then it raises a TimeoutError.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor() as executor:
    for n1 in list_of_nodes:
        for n2 in list_of_nodes:
            if n1 != n2:
                all_simple_paths_futures = executor.submit(nx.all_simple_paths, graph, n1, n2)
                paths[n1 + "-" + n2] = all_simple_paths_futures
    try:
        for key in paths.keys():
            # get back results from the thread
            future_obj = paths[key]
            paths[key] = list(future_obj.result())
    except Exception as e:
        print(e)
        raise e
For the difference between multiprocessing and threads, check this link: Multiprocessing vs Threading Python.
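Since finding simple paths is CPU-bound and Python threads share the GIL, a process-based variant may be worth trying too. Below is a minimal sketch, assuming graph and list_of_nodes are built exactly as in the question and that the graph can be pickled to the workers; list_simple_paths is a small helper introduced here, not a NetworkX function:

from concurrent.futures import ProcessPoolExecutor
import networkx as nx

def list_simple_paths(graph, n1, n2):
    # materialize the generator so the result can be pickled back to the parent
    return list(nx.all_simple_paths(graph, n1, n2))

if __name__ == '__main__':
    # graph and list_of_nodes are assumed to be defined as in the question
    futures = {}
    paths = {}
    with ProcessPoolExecutor() as executor:
        for n1 in list_of_nodes:
            for n2 in list_of_nodes:
                if n1 != n2:
                    futures[n1 + "-" + n2] = executor.submit(list_simple_paths, graph, n1, n2)
        for key, future_obj in futures.items():
            paths[key] = future_obj.result()

Keep in mind that every submitted task ships a copy of the graph to a worker, so with 11 million edges the serialization cost can easily outweigh any parallel speedup.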
My requirement is to generate a list of permissible combinations. The code below is a simplified version that meets my need.
def getChild(tupF):
    if len(tupF) <= 60:
        # in the actual requirement this is not a fixed range, but some
        # complex processing that determines the list of values to append
        for val in range(1, 10):
            # convert the tuple to a list, append, then convert back to a
            # tuple; handling it purely as a list somehow didn't work for me
            t = list(tupF)
            t.append(val)
            getChild(tuple(t))
            t = []
    else:
        print(tupF)

tup = ()
getChild(tup)
But, as the number of levels is high (60) and each of my combinations is completely independent of the others, I would like to turn this into multiprocessing code.
I tried adding
t.append(val)
tmpLst.append(tuple(t))
t = []

from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(processes=3)
    pool.map(getChild, tmpLst)
But this didn't work, because my worker process tries to sub-divide further. In my case, I don't think the sub-processes would explode: once the parent process has handed work to a set of child processes, I am OK with terminating the parent process, as all the desired information is in the tuple I am passing to each child process.
Please let me know whether this problem is a good candidate for multiprocessing and, if so, provide some guidance on how to make it multiprocess so that I can reduce the computation time. I have no prior experience in writing multiprocessing code, so if you can point me to a relevant example, that would be great. Thanks.
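In other words, the structure I am aiming for is roughly the sketch below, where the parent only expands the first level and each worker recurses independently from its seed tuple (the seeds list and the simplified range are illustrative, not my real requirement):

from multiprocessing import Pool

def getChild(tupF):
    # unchanged recursive expansion, now running entirely inside a worker
    if len(tupF) <= 60:
        for val in range(1, 10):
            getChild(tupF + (val,))
    else:
        print(tupF)

if __name__ == '__main__':
    # the parent expands only the first level and hands each seed to a worker
    seeds = [(val,) for val in range(1, 10)]
    pool = Pool(processes=3)
    pool.map(getChild, seeds)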
I'm working on implementing a randomized algorithm in python. Since this involves doing the same thing many (say N) times, it rather naturally parallelizes and I would like to make use of that. More specifically, I want to distribute the N iterations on all of the cores of my CPU. The problem in question involves computing the maximum of something and is thus something where every worker could compute their own maximum and then only report that one back to the parent process, which then only needs to figure out the global maximum out of those few local maxima.
Somewhat surprisingly, this does not seem to be an intended use-case of the multiprocessing module, but I'm not entirely sure how else to go about it. After some research I came up with the following solution (toy problem to find the maximum in a list that is structurally the same as my actual one):
import random
import multiprocessing

l = []
N = 100
numCores = multiprocessing.cpu_count()

# globals for every worker
mySendPipe = None
myRecPipe = None

def doWork():
    pipes = zip(*[multiprocessing.Pipe() for i in range(numCores)])
    pool = multiprocessing.Pool(numCores, initializeWorker, (pipes,))
    pool.map(findMax, range(N))
    results = []
    # collate results
    for p in pipes[0]:
        if p.poll():
            results.append(p.recv())
    print(results)
    return max(results)

def initializeWorker(pipes):
    global mySendPipe, myRecPipe
    # ID of a worker process; they are consistently named PoolWorker-i
    myID = int(multiprocessing.current_process().name.split("-")[1]) - 1
    # Modulo: when starting a second pool for the second iteration of doWork(), they are named with IDs 5-8.
    mySendPipe = pipes[1][myID % numCores]
    myRecPipe = pipes[0][myID % numCores]

def findMax(count):
    myMax = 0
    if myRecPipe.poll():
        myMax = myRecPipe.recv()
    value = random.choice(l)
    if myMax < value:
        myMax = value
    mySendPipe.send(myMax)

l = range(1, 1001)
random.shuffle(l)
max1 = doWork()

l = range(1001, 2001)
random.shuffle(l)
max2 = doWork()

print(max1, max2)
This works, sort of, but I've got a problem with it. Namely, using pipes to store the intermediate results feels rather silly (and is probably slow). But it also has the real problem that I can't send arbitrarily large things through the pipe, and my application unfortunately sometimes exceeds this size (and deadlocks).
So, what I'd really like is a function analogous to the initializer that I can call once for every worker in the pool, to return their local results to the parent process. I could not find such functionality, but maybe someone here has an idea?
A few final notes:
I use a global variable for the input because in my application the input is very large and I don't want to copy it to every process. Since the processes never write to it, I believe it should never be copied (or am I wrong there?). I'm open to suggestions for doing this differently, but note that I need to run this on changing inputs (sequentially, though, just like in the example above).
I'd like to avoid using the Manager class, since (by my understanding) it introduces synchronisation and locks, which should be completely unnecessary in this problem.
The only other similar question I could find is Python's multiprocessing and memory, but they wish to actually process the individual results of the workers, whereas I do not want the workers to return N things, but to instead only run a total of N times and return only their local best results.
I'm using Python 2.7.15.
tl;dr: Is there a way to use local memory for every worker process in a multiprocessing pool, so that every worker can compute a local optimum and the parent process only needs to worry about figuring out which one of those is best?
You might be overthinking this a little.
By making your worker functions (in this case findMax) actually return a value instead of communicating it, you can store the results of calling pool.map(); it is just a parallel variant of map, after all! It maps a function over a list of inputs and returns the list of results of those calls.
The simplest example illustrating my point follows your "distributed max" example:
import multiprocessing

# [0, 1, 2, 3, 4, 5, 6, 7, 8]
x = range(9)

# split the list into 3 chunks:
# [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
input = zip(*[iter(x)] * 3)

pool = multiprocessing.Pool(2)

# compute the max of each chunk:
# max((0, 1, 2)) == 2
# max((3, 4, 5)) == 5
# ...
res = pool.map(max, input)
print(res)
This returns [2, 5, 8].
Be aware that there is some light magic going on: I use the built-in max() function, which expects iterables as input. Now, if I were to pool.map over a plain list of integers, say range(9), that would result in calls to max(0), max(1), etc., which is not very useful. Instead, I partition my list into chunks, so effectively, when mapping, we now map over a list of tuples, thus feeding a tuple to max on each call.
So perhaps you have to:
return a value from your worker func
think about how you structure your input domain so that you feed meaningful chunks to each worker
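Putting those two points together, here is a minimal sketch of your distributed-max setup; the names local_max and distributed_max, the worker count and the chunk size are all illustrative choices of mine:

import multiprocessing
import random

def local_max(chunk):
    # each worker computes its own local optimum and only returns that
    return max(chunk)

def distributed_max(data, num_workers=4, chunk_size=250):
    # split the input into chunks so every map call gets meaningful work
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    pool = multiprocessing.Pool(num_workers)
    local_results = pool.map(local_max, chunks)  # one result per chunk
    pool.close()
    pool.join()
    # the parent only reduces the handful of local results
    return max(local_results)

if __name__ == '__main__':
    data = list(range(1, 1001))
    random.shuffle(data)
    print(distributed_max(data))  # -> 1000

Because each worker returns just one number, the parent never has to move large intermediate results through pipes, and no Manager is needed.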
PS: You wrote a great first question! Thank you, it was a pleasure reading it :) Welcome to StackOverflow!
I have a node tree where every node has an id (node number), a list of children and a depth indicator. I am then given a list of nodes whose depth I have to find. To do this I use a recursive function.
This is all fine and dandy, but I want to speed the process up. I've been looking into multiprocessing, but every time I try it the calculation time goes up (the higher the process count, the longer the runtime) compared to using no extra processes at all.
My code looks like junk from trying to understand a lot of different examples, so I'll post this pseudocode instead.
class Node:
    id = int
    children = Node[]
    depth = int

function makeNodeTree() ...

function find(x, node):
    for c in node.children:
        if c.id == x:
            return c
        else:
            if find(x, c) != None:
                return result
    return None

function main():
    search = [nodeid, nodeid, nodeid...]

    timerstart
    for x in search:
        find(x, rootNode)
    timerstop

    timerstart
    <split list over number of processes>
    <do some multiprocess magic>
    <get results>
    timerstop

    compare the two
I've tried all kinds of tree sizes to see if there is any gain at all, but I have yet to find such a case, which leads me to think I'm doing something wrong. I guess what I'm asking for is an example/way of doing this traversal with a performance gain, using multiprocessing.
I know there are plenty of ways to organize nodes to make this task easy, but I want to check the possible(?) performance boost, if it is possible at all.
Multiprocessing has overhead because every time you add a process it takes time to set it up. Also, if you are using standard Python threads, you are unlikely to get any speedup because all the threads will still run on one processor. So, three thoughts: (1) is your tree really so big that you need to speed this up? (2) spawn subprocesses rather than threads; (3) don't use parallelism at each node, just at the top few levels, to minimize overhead.
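To illustrate point (3), here is a minimal sketch that parallelizes only at the root: one task per top-level subtree, with plain sequential recursion inside each worker. It assumes the Node class, makeNodeTree() and the search list from the question, and that Node objects can be pickled.

import multiprocessing

def find_seq(x, node):
    # ordinary sequential recursive search below `node`
    for c in node.children:
        if c.id == x:
            return c
        found = find_seq(x, c)
        if found is not None:
            return found
    return None

def find_in_subtree(args):
    # worker task: search one top-level subtree sequentially
    x, subtree_root = args
    return find_seq(x, subtree_root)

def find_parallel(pool, x, root):
    # parallelism only at the top level: one task per child of the root
    if root.id == x:
        return root
    results = pool.map(find_in_subtree, [(x, c) for c in root.children])
    return next((r for r in results if r is not None), None)

if __name__ == '__main__':
    root = makeNodeTree()    # from the question's pseudocode
    search = [...]           # the node ids to look up (placeholder)
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    hits = [find_parallel(pool, x, root) for x in search]

Even then, each task has to pickle its subtree over to a worker, so a speedup only materializes when the subtrees are large enough that the search work dominates that copying cost.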