Running functions parallel using Python - python

I have a question about running function parallel in Python.
I have tried using multi processing to reduce time sending and receiving data from API but when I execute code below, it tend to crash my IDE.
def network_request_function(value)
#this function sends requests using value.
for i in list:
p1 = Process(target=network_request_function, args=(i,))
p1.start()
Can you provide a way to fix my code?
Or are there better alternatives?

You should specify what platform this is running on what your IDE is. Also, if all network_request_function is doing is making a network request and awaiting a reply which gets no further processing requiring intensive CPU, then this seems like it should be using multithreading instead of multiprocessing and a multithreading pool where the number of concurrent threads can be limited in case the length of your input list is very large and where it is simpler to get a return value from network_request_function that you might be interested in. And you should not use a name, such as list, that happens to be the name of a built-in function or class for naming a variable.
For example:
def network_request_function(value):
#this function sends requests using value and returns the reply
return reply
if __name__ == '__main__': # Required if we switch to multiprocessing
# To use multiprocessing:
#from multiprocessing.pool import Pool as Executor
# To use multithreading:
from multiprocessing.pool import ThreadPool as Executor
# inputs is our list of value arguments used with network_request_function:
inputs = []; # This variable is set somewhere
# May need to be a smaller number if we are using multiprocessing and
# depending on the platform:
MAX_POOL_SIZE = 200
pool_size = min(len(inputs), MAX_POOL_SIZE)
with Executor(pool_size) as pool:
# Get list of all replies:
replies = pool.map(network_request_function, inputs)

Related

How do I run two looping functions parallel to each other? [duplicate]

Suppose I have the following in Python
# A loop
for i in range(10000):
Do Task A
# B loop
for i in range(10000):
Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process
def loop_a():
while 1:
print("a")
def loop_b():
while 1:
print("b")
if __name__ == '__main__':
Process(target=loop_a).start()
Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate but they offer true concurrency.
There are many possible options for what you wanted:
use loop
As many people have pointed out, this is the simplest way.
for i in xrange(10000):
# use xrange instead of range
taskA()
taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB must be done after taskA, or otherwise. They can't be running simultaneously.
multiprocess
Another thought would be: run two processes at the same time, python provides multiprocess library, the following is a simple example:
from multiprocessing import Process
p1 = Process(target=taskA, args=(*args, **kwargs))
p2 = Process(target=taskB, args=(*args, **kwargs))
p1.start()
p2.start()
merits: task can be run simultaneously in the background, you can control tasks(end, stop them etc), tasks can exchange data, can be synchronized if they compete the same resources etc.
drawbacks: too heavy!OS will frequently switch between them, they have their own data space even if data is redundant. If you have a lot tasks (say 100 or more), it's not what you want.
threading
threading is like process, just lightweight. check out this post. Their usage is quite similar:
import threading
p1 = threading.Thread(target=taskA, args=(*args, **kwargs))
p2 = threading.Thread(target=taskB, args=(*args, **kwargs))
p1.start()
p2.start()
coroutines
libraries like greenlet and gevent provides something called coroutines, which is supposed to be faster than threading. No examples provided, please google how to use them if you're interested.
merits: more flexible and lightweight
drawbacks: extra library needed, learning curve.
Why do you want to run the two processes at the same time? Is it because you think they will go faster (there is a good chance that they wont). Why not run the tasks in the same loop, e.g.
for i in range(10000):
doTaskA()
doTaskB()
The obvious answer to your question is to use threads - see the python threading module. However threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate proccesses, using the python multiprocessing module. If both tasks are CPU intensive this will make better use of multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP etc, but Without knowing more about Task A and Task B and why they need to be run at the same time it is impossible to give a more specific answer.
from threading import Thread
def loopA():
for i in range(10000):
#Do task A
def loopB():
for i in range(10000):
#Do task B
threadA = Thread(target = loopA)
threadB = Thread(target = loobB)
threadA.run()
threadB.run()
# Do work indepedent of loopA and loopB
threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about: A loop for i in range(10000): Do Task A, Do Task B ? Without more information i dont have a better answer.
I find that using the "pool" submodule within "multiprocessing" works amazingly for executing multiple processes at once within a Python Script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool
def desired_function(option, processes, data, etc...):
# your code will go here. option allows you to make choices within your script
# to execute desired sections of code for each pool or subprocess.
return result_array # "for example"
result_array = np.zeros("some shape") # This is normally populated by 1 loop, lets try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...) # Arguments to be passed into desired function.
multiple_results = []
for i in range(processes): # Executes each pool w/ option (1-4 in this case).
multiple_results.append(pool.apply_async(param_process, (i+1,)+args)) # Syncs each.
results = np.array(res.get() for res in multiple_results) # Retrieves results after
# every pool is finished!
for i in range(processes):
result_array = result_array + results[i] # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make sure your function can distinguish between each process (hence why I added the variable "option".) Additionally, it doesn't have to be an array that is being populated in the end, but for my example, that's how I used it. Hope this simplifies or helps you better understand the power of multiprocessing in Python!

Separate computation from socket work in Python

I'm serializing column data and then sending it over a socket connection.
Something like:
import array, struct, socket
## Socket setup
s = socket.create_connection((ip, addr))
## Data container setup
ordered_col_list = ('col1', 'col2')
columns = dict.fromkeys(ordered_col_list)
for i in range(num_of_chunks):
## Binarize data
columns['col1'] = array.array('i', range(10000))
columns['col2'] = array.array('f', [float(num) for num in range(10000)])
.
.
.
## Send away
chunk = b''.join(columns[col_name] for col_name in ordered_col_list]
s.sendall(chunk)
s.recv(1000) #get confirmation
I wish to separate the computation from the sending, put them on separate threads or processes, so I can keep doing computations while data is sent away.
I've put the binarizing part as a generator function, then sent the generator to a separate thread, which then yielded binary chunks via a queue.
I collected the data from the main thread and sent it away. Something like:
import array, struct, socket
from time import sleep
try:
import thread
from Queue import Queue
except:
import _thread as thread
from queue import Queue
## Socket and queue setup
s = socket.create_connection((ip, addr))
chunk_queue = Queue()
def binarize(num_of_chunks):
''' Generator function that yields chunks of binary data. In reality it wouldn't be the same data'''
ordered_col_list = ('col1', 'col2')
columns = dict.fromkeys(ordered_col_list)
for i in range(num_of_chunks):
columns['col1'] = array.array('i', range(10000)).tostring()
columns['col2'] = array.array('f', [float(num) for num in range(10000)]).tostring()
.
.
yield b''.join((columns[col_name] for col_name in ordered_col_list))
def chunk_yielder(queue):
''' Generate binary chunks and put them on a queue. To be used from a thread '''
while True:
try:
data_gen = queue.get_nowait()
except:
sleep(0.1)
continue
else:
for chunk in data_gen:
queue.put(chunk)
## Setup thread and data generator
thread.start_new_thread(chunk_yielder, (chunk_queue,))
num_of_chunks = 100
data_gen = binarize(num_of_chunks)
queue.put(data_gen)
## Get data back and send away
while True:
try:
binary_chunk = queue.get_nowait()
except:
sleep(0.1)
continue
else:
socket.sendall(binary_chunk)
socket.recv(1000) #Get confirmation
However, I did not see and performance imporovement - it did not work faster.
I don't understand threads/processes too well, and my question is whether it is possible (at all and in Python) to gain from this type of separation, and what would be a good way to go about it, either with threads or processess (or any other way - async etc).
EDIT:
As far as I've come to understand -
Multirpocessing requires serializing any sent data, so I'm double-sending every computed data.
Sending via socket.send() should release the GIL
Therefore I think (please correct me if I am mistaken) that a threading solution is the right way. However I'm not sure how to do it correctly.
I know cython can release the GIL off of threads, but since one of them is just socket.send/recv, my understanding is that it shouldn't be necessary.
You have two options for running things in parallel in Python, either use the multiprocessing (docs) library , or write the parallel code in cython and release the GIL. The latter is significantly more work and less applicable generally speaking.
Python threads are limited by the Global Interpreter Lock (GIL), I won't go into detail here as you will find more than enough information online on it. In short, the GIL, as the name suggests, is a global lock within the CPython interpreter that ensures multiple threads do not modify objects, that are within the confines of said interpreter, simultaneously. This is why, for instance, cython programs can run code in parallel because they can exist outside the GIL.
As to your code, one problem is that you're running both the number crunching (binarize) and the socket.send inside the GIL, this will run them strictly serially. The queue is also connected very strangely, and there is a NameError but let's leave those aside.
With the caveats already pointed out by Jeremy Friesner in mind, I suggest you re-structure the code in the following manner: you have two processes (not threads) one for binarising the data and the other for sending data. In addition to those, there is also the parent process that started both children, and a queue connecting child 1 to child 2.
Subprocess-1 does number crunching and produces crunched data into a queue
Subprocess-2 consumes data from a queue and does socket.send
in code the setup would look something like
from multiprocessing import Process, Queue
work_queue = Queue()
p1 = Process(target=binarize, args=(100, work_queue))
p2 = Process(target=send_data, args=(ip, port, work_queue))
p1.start()
p2.start()
p1.join()
p2.join()
binarize can remain as it is in your code, with the exception that instead of a yield at the end, you add elements into the queue
def binarize(num_of_chunks, q):
''' Generator function that yields chunks of binary data. In reality it wouldn't be the same data'''
ordered_col_list = ('col1', 'col2')
columns = dict.fromkeys(ordered_col_list)
for i in range(num_of_chunks):
columns['col1'] = array.array('i', range(10000)).tostring()
columns['col2'] = array.array('f', [float(num) for num in range(10000)]).tostring()
data = b''.join((columns[col_name] for col_name in ordered_col_list))
q.put(data)
send_data should just be the while loop from the bottom of your code, with the connection open/close functionality
def send_data(ip, addr, q):
s = socket.create_connection((ip, addr))
while True:
try:
binary_chunk = q.get(False)
except:
sleep(0.1)
continue
else:
socket.sendall(binary_chunk)
socket.recv(1000) # Get confirmation
# maybe remember to close the socket before killing the process
Now you have two (three actually if you count the parent) processes that are processing data independently. You can force the two processes to synchronise their operations by setting the max_size of the queue to a single element. The operation of these two separate processes is also easy to monitor from the process manager on your computer top (Linux), Activity Monitor (OsX), don't remember what it's called under Windows.
Finally, Python 3 comes with the option of using co-routines which are neither processes nor threads, but something else entirely. Co-routines are pretty cool from a CS point of view, but a bit of a head scratcher at first. There is plenty of resources to learn from though, like this post on Medium and this talk by David Beazley.
Even more generally, you might want to look into the producer/consumer pattern, if you are not already familiar with it.
If you are trying to use concurrency to improve performance in CPython I would strongly recommend using multiprocessing library instead of multithreading. It is because of GIL (Global Interpreter Lock), which can have a huge impact on execution speed (in some cases, it may cause your code to run slower than single threaded version). Also, if you would like to learn more about this topic, I recommend reading this presentation by David Beazley. Multiprocessing bypasses this problem by spawning a new Python interpreter instance for each process, thus allowing you to take full advantage of multi core architecture.

Creating a Queue delay in a Python pool without blocking

I have a large program (specifically, a function) that I'm attempting to parallelize using a JoinableQueue and the multiprocessing map_async method. The function that I'm working with does several operations on multidimensional arrays, so I break up each array into sections, and each section evaluates independently; however I need to stitch together one of the arrays early on, but the "stitch" happens before the "evaluate" and I need to introduce some kind of delay in the JoinableQueue. I've searched all over for a workable solution but I'm very new to multiprocessing and most of it goes over my head.
This phrasing may be confusing- apologies. Here's an outline of my code (I can't put all of it because it's very long, but I can provide additional detail if needed)
import numpy as np
import multiprocessing as mp
from multiprocessing import Pool, Pipe, JoinableQueue
def main_function(section_number):
#define section sizes
array_this_section = array[:,start:end+1,:]
histogram_this_section = np.zeros((3, dataset_size, dataset_size))
#start and end are defined according to the size of the array
#dataset_size is to show that the histogram is a different size than the array
for m in range(1,num_iterations+1):
#do several operations- each section of the array
#corresponds to a section on the histogram
hist_queue.put(histogram_this_section)
#each process sends their own part of the histogram outside of the pool
#to be combined with every other part- later operations
#in this function must use the full histogram
hist_queue.join()
full_histogram = full_hist_queue.get()
full_hist_queue.task_done()
#do many more operations
hist_queue = JoinableQueue()
full_hist_queue = JoinableQueue()
if __name__ == '__main__':
pool = mp.Pool(num_sections)
args = np.arange(num_sections)
pool.map_async(main_function, args, chunksize=1)
#I need the map_async because the program is designed to display an output at the
#end of each iteration, and each output must be a compilation of all processes
#a few variable definitions go here
for m in range(1,num_iterations+1):
for i in range(num_sections):
temp_hist = hist_queue.get() #the code hangs here because the queue
#is attempting to get before anything
#has been put
hist_full += temp_hist
for i in range(num_sections):
hist_queue.task_done()
for i in range(num_sections):
full_hist_queue.put(hist_full) #the full histogram is sent back into
#the pool
full_hist_queue.join()
#etc etc
pool.close()
pool.join()
I'm pretty sure that your issue is how you're creating the Queues and trying to share them with the child processes. If you just have them as global variables, they may be recreated in the child processes instead of inherited (the exact details depend on what OS and/or context you're using for multiprocessing).
A better way to go about solving this issue is to avoid using multiprocessing.Pool to spawn your processes and instead explicitly create Process instances for your workers yourself. This way you can pass the Queue instances to the processes that need them without any difficulty (it's technically possible to pass the queues to the Pool workers, but it's awkward).
I'd try something like this:
def worker_function(section_number, hist_queue, full_hist_queue): # take queues as arguments
# ... the rest of the function can work as before
# note, I renamed this from "main_function" since it's not running in the main process
if __name__ == '__main__':
hist_queue = JoinableQueue() # create the queues only in the main process
full_hist_queue = JoinableQueue() # the workers don't need to access them as globals
processes = [Process(target=worker_function, args=(i, hist_queue, full_hist_queue)
for i in range(num_sections)]
for p in processes:
p.start()
# ...
If the different stages of your worker function are more or less independent of one another (that is, the "do many more operations" step doesn't depend directly on the "do several operations" step above it, just on full_histogram), you might be able to keep the Pool and instead split up the different steps into separate functions, which the main process could call via several calls to map on the pool. You don't need to use your own Queues in this approach, just the ones built in to the Pool. This might be best especially if the number of "sections" you're splitting the work up into doesn't correspond closely with the number of processor cores on your computer. You can let the Pool match the number of cores, and have each one work on several sections of the data in turn.
A rough sketch of this would be something like:
def worker_make_hist(section_number):
# do several operations, get a partial histogram
return histogram_this_section
def worker_do_more_ops(section_number, full_histogram):
# whatever...
return some_result
if __name__ == "__main__":
pool = multiprocessing.Pool() # by default the size will be equal to the number of cores
for temp_hist in pool.imap_unordered(worker_make_hist, range(number_of_sections)):
hist_full += temp_hist
some_results = pool.starmap(worker_do_more_ops, zip(range(number_of_sections),
itertools.repeat(hist_full)))

Access python program data while running

I have a python program that's been running for a while, and because of an unanticipated event, I'm now unsure that it will complete within a reasonable amount of time. The data it's collected so far, however, is valuable, and I would like to recover it if possible.
Here is the relevant code
from multiprocessing.dummy import Pool as ThreadPool
def pull_details(url):
#accesses a given URL
#returns some data which gets appended to the results list
pool = ThreadPool(25)
results = pool.map(pull_details, urls)
pool.close()
pool.join()
So I either need to access the data that is currently in results or somehow change the source of the code (or somehow manually change the program's control) to kill the loop so it continues to the later part of the program in which the data is exported (not sure if the second way is possible).
It seems as though the first option is also quite tricky, but luckily the IDE (Spyder) I'm using indicates the value of what I assume is the location of the list in the machine's memory (0xB73EDECCL).
Is it possible to create a C program (or another python program) to access this location in memory and read what's there?
Can't you use some sort of mechanism to exchange data between the two processes, like queues or pipes.
something like below:
from multiprocessing import Queue
from multiprocessing.dummy import Pool as ThreadPool
def pull_details(args=None):
q.put([my useful data])
q = Queue()
pool = ThreadPool(25)
results = pool.map(pull_details(args=q), urls)
while not done:
results = q.get()
pool.close()
pool.join()

Multiprocessing with python3 only runs once

I have a problem running multiple processes in python3 .
My program does the following:
1. Takes entries from an sqllite database and passes them to an input_queue
2. Create multiple processes that take items off the input_queue, run it through a function and output the result to the output queue.
3. Create a thread that takes items off the output_queue and prints them (This thread is obviously started before the first 2 steps)
My problem is that currently the 'function' in step 2 is only run as many times as the number of processes set, so for example if you set the number of processes to 8, it only runs 8 times then stops. I assumed it would keep running until it took all items off the input_queue.
Do I need to rewrite the function that takes the entries out of the database (step 1) into another process and then pass its output queue as an input queue for step 2?
Edit:
Here is an example of the code, I used a list of numbers as a substitute for the database entries as it still performs the same way. I have 300 items on the list and I would like it to process all 300 items, but at the moment it just processes 10 (the number of processes I have assigned)
#!/usr/bin/python3
from multiprocessing import Process,Queue
import multiprocessing
from threading import Thread
## This is the class that would be passed to the multi_processing function
class Processor:
def __init__(self,out_queue):
self.out_queue = out_queue
def __call__(self,in_queue):
data_entry = in_queue.get()
result = data_entry*2
self.out_queue.put(result)
#Performs the multiprocessing
def perform_distributed_processing(dbList,threads,processor_factory,output_queue):
input_queue = Queue()
# Create the Data processors.
for i in range(threads):
processor = processor_factory(output_queue)
data_proc = Process(target = processor,
args = (input_queue,))
data_proc.start()
# Push entries to the queue.
for entry in dbList:
input_queue.put(entry)
# Push stop markers to the queue, one for each thread.
for i in range(threads):
input_queue.put(None)
data_proc.join()
output_queue.put(None)
if __name__ == '__main__':
output_results = Queue()
def output_results_reader(queue):
while True:
item = queue.get()
if item is None:
break
print(item)
# Establish results collecting thread.
results_process = Thread(target = output_results_reader,args = (output_results,))
results_process.start()
# Use this as a substitute for the database in the example
dbList = [i for i in range(300)]
# Perform multi processing
perform_distributed_processing(dbList,10,Processor,output_results)
# Wait for it all to finish.
results_process.join()
A collection of processes that service an input queue and write to an output queue is pretty much the definition of a process pool.
If you want to know how to build one from scratch, the best way to learn is to look at the source code for multiprocessing.Pool, which is pretty simply Python, and very nicely written. But, as you might expect, you can just use multiprocessing.Pool instead of re-implementing it. The examples in the docs are very nice.
But really, you could make this even simpler by using an executor instead of a pool. It's hard to explain the difference (again, read the docs for both modules), but basically, a future is a "smart" result object, which means instead of a pool with a variety of different ways to run jobs and get results, you just need a dumb thing that doesn't know how to do anything but return futures. (Of course in the most trivial cases, the code looks almost identical either way…)
from concurrent.futures import ProcessPoolExecutor
def Processor(data_entry):
return data_entry*2
def perform_distributed_processing(dbList, threads, processor_factory):
with ProcessPoolExecutor(processes=threads) as executor:
yield from executor.map(processor_factory, dbList)
if __name__ == '__main__':
# Use this as a substitute for the database in the example
dbList = [i for i in range(300)]
for result in perform_distributed_processing(dbList, 8, Processor):
print(result)
Or, if you want to handle them as they come instead of in order:
def perform_distributed_processing(dbList, threads, processor_factory):
with ProcessPoolExecutor(processes=threads) as executor:
fs = (executor.submit(processor_factory, db) for db in dbList)
yield from map(Future.result, as_completed(fs))
Notice that I also replaced your in-process queue and thread, because it wasn't doing anything but providing a way to interleave "wait for the next result" and "process the most recent result", and yield (or yield from, in this case) does that without all the complexity, overhead, and potential for getting things wrong.
Don't try to rewrite the whole multiprocessing library again. I think you can use any of multiprocessing.Pool methods depending on your needs - if this is a batch job you can even use the synchronous multiprocessing.Pool.map() - only instead of pushing to input queue, you need to write a generator that yields input to the threads.

Categories

Resources