I'm a multiprocessing newbie,
I know something about threading but I need to increase the speed of this calculation, hopefully with multiprocessing:
Example description: send a string to a thread, alter the string + run a benchmark test,
then send the result back for printing.
from threading import Thread

class Alter(Thread):
    def __init__(self, word):
        Thread.__init__(self)
        self.word = word
        self.word2 = ''

    def run(self):
        # Alter string + test processing speed
        for i in range(80000):
            self.word2 = self.word2 + self.word

# Send a string to be altered
thread1 = Alter('foo')
thread2 = Alter('bar')
thread1.start()
thread2.start()

# Wait for both to finish
while thread1.is_alive() == True: pass
while thread2.is_alive() == True: pass

print(thread1.word2)
print(thread2.word2)
This currently takes about 6 seconds and I need it to go faster.
I have been looking into multiprocessing and cannot find something equivalent to the above code. I think what I am after is pooling, but the examples I have found have been hard to understand. I would like to take advantage of all cores (8 cores, multiprocessing.cpu_count()), but I really just have scraps of useful information on multiprocessing and not enough to duplicate the above code. If anyone can point me in the right direction or, better yet, provide an example, that would be greatly appreciated. Python 3, please.
Just replace threading with multiprocessing and Thread with Process. Threads in Python are (almost) never used to gain performance because of the big bad GIL! I explained it in another SO post with some links to documentation and a great talk about threading in Python.
But the multiprocessing module is intentionally very similar to the threading module. You can almost use it as a drop-in replacement!
The multiprocessing module doesn't, AFAIK, offer functionality to enforce the use of a specific number of cores. It relies on the OS implementation. You could use the Pool object and limit the number of worker objects to the core count. Or you could look at another MPI library like pypar. Under Linux you could use a pipe in the shell to start multiple instances on different cores.
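For illustration, here is a minimal sketch of the question's workload moved onto a Pool; the alter() worker and the length-only printing are my own choices, not part of the original code:
# Minimal sketch, assuming the work from the question's Alter.run();
# alter() and the names below are illustrative.
from multiprocessing import Pool, cpu_count

def alter(word):
    result = ''
    for i in range(80000):
        result = result + word
    return result

if __name__ == '__main__':
    with Pool(processes=cpu_count()) as pool:   # one worker per core
        results = pool.map(alter, ['foo', 'bar'])
    for word2 in results:
        print(len(word2))   # print the lengths rather than the huge strings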
Related
Suppose I have the following in Python
# A loop
for i in range(10000):
    Do Task A

# B loop
for i in range(10000):
    Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process

def loop_a():
    while 1:
        print("a")

def loop_b():
    while 1:
        print("b")

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is the easiest to use); a minimal sketch follows below.
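For instance (loop_a and the message string are just illustrative names):
# Minimal sketch of sending a result back with a multiprocessing.Queue;
# loop_a() and the message are illustrative.
from multiprocessing import Process, Queue

def loop_a(q):
    q.put("result from a")      # hand the result back to the parent process

if __name__ == '__main__':
    q = Queue()
    p = Process(target=loop_a, args=(q,))
    p.start()
    print(q.get())              # blocks until loop_a puts something on the queue
    p.join()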
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate but they offer true concurrency.
There are many possible options for what you wanted:
use loop
As many people have pointed out, this is the simplest way.
for i in range(10000):  # use xrange instead of range on Python 2
    taskA()
    taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB can only start after taskA finishes (or the other way around); they can't run simultaneously.
multiprocessing
Another thought would be: run two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process

p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
Merits: tasks can run simultaneously in the background; you can control them (terminate, join, etc.); tasks can exchange data and can be synchronized if they compete for the same resources.
Drawbacks: too heavy! The OS will frequently switch between them, and each process has its own data space even if the data is redundant. If you have a lot of tasks (say 100 or more), it's not what you want.
threading
threading is like multiprocessing, just lightweight. Check out this post. Their usage is quite similar:
import threading

p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide something called coroutines, which are supposed to be faster than threading. A minimal sketch follows after the merits/drawbacks below; please google how to use them if you're interested in more.
merits: more flexible and lightweight
drawbacks: extra library needed, learning curve.
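Here is the minimal gevent sketch referred to above (it assumes gevent is installed; taskA and taskB are placeholders):
# Minimal gevent sketch; assumes gevent is installed, taskA/taskB are placeholders.
import gevent

def taskA():
    for i in range(3):
        print("A")
        gevent.sleep(0)     # yield control so the other greenlet can run

def taskB():
    for i in range(3):
        print("B")
        gevent.sleep(0)

gevent.joinall([gevent.spawn(taskA), gevent.spawn(taskB)])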
Why do you want to run the two processes at the same time? Is it because you think they will go faster (there is a good chance that they won't)? Why not run the tasks in the same loop, e.g.
for i in range(10000):
    doTaskA()
    doTaskB()
The obvious answer to your question is to use threads - see the Python threading module. However, threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate processes, using the Python multiprocessing module. If both tasks are CPU intensive this will make better use of multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP etc., but without knowing more about Task A and Task B and why they need to be run at the same time it is impossible to give a more specific answer.
from threading import Thread

def loopA():
    for i in range(10000):
        pass  # Do task A

def loopB():
    for i in range(10000):
        pass  # Do task B

threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()
threadB.start()

# Do work independent of loopA and loopB

threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about: a loop for i in range(10000): Do Task A, Do Task B? Without more information I don't have a better answer.
I find that using the "pool" submodule within "multiprocessing" works amazingly well for executing multiple processes at once within a Python script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool

def desired_function(option, processes, data, etc...):
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array  # "for example"

result_array = np.zeros("some shape")  # This is normally populated by 1 loop, let's try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...)  # Arguments to be passed into the desired function.
multiple_results = []
for i in range(processes):  # Launches each worker with its option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i + 1,) + args))
results = np.array([res.get() for res in multiple_results])  # Retrieves results after
                                                             # every worker is finished!
for i in range(processes):
    result_array = result_array + results[i]  # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make sure your function can distinguish between each process (hence why I added the variable "option".) Additionally, it doesn't have to be an array that is being populated in the end, but for my example, that's how I used it. Hope this simplifies or helps you better understand the power of multiprocessing in Python!
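To make the pattern runnable end to end, here is a small self-contained sketch along the same lines; square() and its offset argument are hypothetical stand-ins for the real worker and data:
# Self-contained sketch of the same apply_async pattern; square() and the
# offset argument are hypothetical stand-ins for the real worker and data.
import numpy as np
from multiprocessing import Pool

def square(option, offset):
    # Each worker receives a distinct 'option' so it can tell itself apart.
    return np.arange(5) ** 2 + option + offset

if __name__ == '__main__':
    processes = 4
    with Pool(processes=processes) as pool:
        async_results = [pool.apply_async(square, (i + 1, 10)) for i in range(processes)]
        results = np.array([r.get() for r in async_results])  # blocks until all finish
    print(results.sum(axis=0))                                 # combine the per-worker arrays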
I want to speed up my program as much as possible. Can someone help me decide which approach will be better in terms of speed? As per my requirements I can go with either approach.
Approach 1 (spawned 2 threads from main process):
def a(something):
    # Does something at a fixed interval
    while 1:
        print("a")
        time.sleep(60)

def b(something):
    # Keeps running indefinitely without any delay.
    while 1:
        print("b")

def main():
    something = {}
    t1 = threading.Thread(target=b, args=(something,))
    t1.start()
    t2 = threading.Thread(target=a, args=(something,))
    t2.start()
Approach 2 (spawned a nested thread):
def a(something):
    # Does something at a fixed interval
    while 1:
        print("a")
        time.sleep(60)

def b(something):
    t2 = threading.Thread(target=a, args=(something,))
    t2.start()
    # Keeps running indefinitely without any delay.
    while 1:
        print("b")

def main():
    something = {}
    t1 = threading.Thread(target=b, args=(something,))
    t1.start()
P.S. a and b are just dummy functions, but the real ones do things in a similar way.
The coexistence of threads is flat, not hierarchical. A thread does not operate within another thread. (I am pretty sure that this is the case for CPython; it would be nice if someone could check it.)
In other words, there is no difference between a thread spawned from the main thread and a thread spawned from any other thread (what you refer to as a nested thread).
Regarding the other small differences between your two approaches (such as global vs local variables), they would hardly affect speed.
And finally, in this particular case multithreading will work as expected: Python's infamous GIL won't have any effect, because time.sleep() releases the lock while it waits, so the other thread can be scheduled.
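If you want to see the flat structure for yourself, here is a small sketch (names are illustrative) that lists the live threads while a "nested" thread is running:
# Minimal sketch showing the flat thread structure; names are illustrative.
import threading, time

def inner():
    time.sleep(0.2)

def outer():
    t = threading.Thread(target=inner, name="inner")
    t.start()
    t.join()

t1 = threading.Thread(target=outer, name="outer")
t1.start()
time.sleep(0.05)
# Timing permitting, "outer" and "inner" both appear as siblings next to
# MainThread; there is no parent/child hierarchy between them.
print([t.name for t in threading.enumerate()])
t1.join()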
Instead of multiple threads I suggest you look into multiprocessing: while heavier, Python's threads will effectively run on only one CPU core no matter how many of them you use. This is due to the GIL, and that's why multiprocessing is used to get around it.
An alternative would be to use PyPy, which is a different Python implementation from the official one, with speedups in some cases; note, though, that standard PyPy still has a GIL, so on its own it won't run threads on multiple cores either.
I'm having the hardest time trying to figure out the difference in usage between multiprocessing.Pool and multiprocessing.Queue.
To help, this is bit of code is a barebones example of what I'm trying to do.
from bs4 import BeautifulSoup

def update():
    def _hold(url):
        soup = BeautifulSoup(url)
        return soup

    def _queue(url):
        soup = BeautifulSoup(url)
        li = [l for l in soup.find('li')]
        return True if li else False

    url = 'www.ur_url_here.org'
    _hold(url)
    _queue(url)
I'm trying to run _hold() and _queue() at the same time. I'm not trying to have them communicate with each other so there is no need for a Pipe. update() is called every 5 seconds.
I can't really wrap my head around the difference between creating a pool of workers and creating a queue of functions. Can anyone assist me?
The real _hold() and _queue() functions are much more elaborate than the example so concurrent execution actually is necessary, I just thought this example would suffice for asking the question.
The Pool and the Queue belong to two different levels of abstraction.
The Pool of Workers is a concurrent design paradigm which aims to abstract a lot of logic you would otherwise need to implement yourself when using processes and queues.
The multiprocessing.Pool actually uses a Queue internally for operating.
If your problem is simple enough, you can easily rely on a Pool. In more complex cases, you might need to deal with processes and queues yourself.
For your specific example, the following code should do the trick.
import multiprocessing

def hold(url):
    ...
    return soup

def queue(url):
    ...
    return bool(li)

def update(url):
    with multiprocessing.Pool(2) as pool:
        hold_job = pool.apply_async(hold, args=[url])
        queue_job = pool.apply_async(queue, args=[url])

        # block until hold_job is done
        soup = hold_job.get()

        # block until queue_job is done
        li = queue_job.get()
I'd also recommend you take a look at the concurrent.futures module. As the name suggests, that is the future-proof implementation of the Pool of Workers paradigm in Python.
You can easily re-write the example above with that library, as what really changes is just the API names.
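For instance, a sketch of the same update() written with concurrent.futures, reusing the hold() and queue() functions from above:
# Sketch of the same idea with concurrent.futures, reusing hold() and queue()
# from the example above.
from concurrent.futures import ProcessPoolExecutor

def update(url):
    with ProcessPoolExecutor(max_workers=2) as executor:
        hold_job = executor.submit(hold, url)
        queue_job = executor.submit(queue, url)

        soup = hold_job.result()    # block until hold_job is done
        li = queue_job.result()     # block until queue_job is done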
I'm serializing column data and then sending it over a socket connection.
Something like:
import array, struct, socket

## Socket setup
s = socket.create_connection((ip, addr))

## Data container setup
ordered_col_list = ('col1', 'col2')
columns = dict.fromkeys(ordered_col_list)

for i in range(num_of_chunks):
    ## Binarize data
    columns['col1'] = array.array('i', range(10000))
    columns['col2'] = array.array('f', [float(num) for num in range(10000)])
    .
    .
    .
    ## Send away
    chunk = b''.join(columns[col_name] for col_name in ordered_col_list)
    s.sendall(chunk)
    s.recv(1000)  # get confirmation
I wish to separate the computation from the sending, put them on separate threads or processes, so I can keep doing computations while data is sent away.
I've put the binarizing part as a generator function, then sent the generator to a separate thread, which then yielded binary chunks via a queue.
I collected the data from the main thread and sent it away. Something like:
import array, struct, socket
from time import sleep

try:
    import thread
    from Queue import Queue
except:
    import _thread as thread
    from queue import Queue

## Socket and queue setup
s = socket.create_connection((ip, addr))
chunk_queue = Queue()

def binarize(num_of_chunks):
    ''' Generator function that yields chunks of binary data. In reality it wouldn't be the same data'''
    ordered_col_list = ('col1', 'col2')
    columns = dict.fromkeys(ordered_col_list)
    for i in range(num_of_chunks):
        columns['col1'] = array.array('i', range(10000)).tostring()
        columns['col2'] = array.array('f', [float(num) for num in range(10000)]).tostring()
        .
        .
        yield b''.join((columns[col_name] for col_name in ordered_col_list))

def chunk_yielder(queue):
    ''' Generate binary chunks and put them on a queue. To be used from a thread '''
    while True:
        try:
            data_gen = queue.get_nowait()
        except:
            sleep(0.1)
            continue
        else:
            for chunk in data_gen:
                queue.put(chunk)

## Setup thread and data generator
thread.start_new_thread(chunk_yielder, (chunk_queue,))
num_of_chunks = 100
data_gen = binarize(num_of_chunks)
queue.put(data_gen)

## Get data back and send away
while True:
    try:
        binary_chunk = queue.get_nowait()
    except:
        sleep(0.1)
        continue
    else:
        socket.sendall(binary_chunk)
        socket.recv(1000)  # Get confirmation
However, I did not see any performance improvement - it did not run faster.
I don't understand threads/processes too well, and my question is whether it is possible (at all, and in Python) to gain from this type of separation, and what would be a good way to go about it, either with threads or processes (or any other way - async etc.).
EDIT:
As far as I've come to understand -
Multiprocessing requires serializing any data it sends, so I'm effectively sending every computed chunk twice.
Sending via socket.send() should release the GIL
Therefore I think (please correct me if I am mistaken) that a threading solution is the right way. However I'm not sure how to do it correctly.
I know Cython can release the GIL in threads, but since one of them just does socket.send/recv, my understanding is that it shouldn't be necessary.
You have two options for running things in parallel in Python: either use the multiprocessing (docs) library, or write the parallel code in Cython and release the GIL. The latter is significantly more work and less applicable generally speaking.
Python threads are limited by the Global Interpreter Lock (GIL); I won't go into detail here as you will find more than enough information online about it. In short, the GIL, as the name suggests, is a global lock within the CPython interpreter that ensures multiple threads do not simultaneously modify objects that live within that interpreter. This is why, for instance, Cython programs can run code in parallel: they can exist outside the GIL.
As to your code, one problem is that you're running both the number crunching (binarize) and the socket.send inside the GIL; this will run them strictly serially. The queue is also connected very strangely, and there is a NameError, but let's leave those aside.
With the caveats already pointed out by Jeremy Friesner in mind, I suggest you re-structure the code in the following manner: you have two processes (not threads) one for binarising the data and the other for sending data. In addition to those, there is also the parent process that started both children, and a queue connecting child 1 to child 2.
Subprocess-1 does number crunching and produces crunched data into a queue
Subprocess-2 consumes data from a queue and does socket.send
in code the setup would look something like
from multiprocessing import Process, Queue

work_queue = Queue()
p1 = Process(target=binarize, args=(100, work_queue))
p2 = Process(target=send_data, args=(ip, port, work_queue))
p1.start()
p2.start()
p1.join()
p2.join()
binarize can remain as it is in your code, with the exception that instead of a yield at the end, you add elements into the queue
def binarize(num_of_chunks, q):
    ''' Produce chunks of binary data and put them on a queue. In reality it wouldn't be the same data. '''
    ordered_col_list = ('col1', 'col2')
    columns = dict.fromkeys(ordered_col_list)
    for i in range(num_of_chunks):
        columns['col1'] = array.array('i', range(10000)).tostring()
        columns['col2'] = array.array('f', [float(num) for num in range(10000)]).tostring()
        data = b''.join((columns[col_name] for col_name in ordered_col_list))
        q.put(data)
send_data should just be the while loop from the bottom of your code, with the connection open/close functionality
def send_data(ip, addr, q):
    s = socket.create_connection((ip, addr))
    while True:
        try:
            binary_chunk = q.get(False)
        except:
            sleep(0.1)
            continue
        else:
            s.sendall(binary_chunk)
            s.recv(1000)  # Get confirmation
    # maybe remember to close the socket before killing the process
Now you have two (three actually, if you count the parent) processes that are processing data independently. You can force the two processes to synchronise their operations by setting the maxsize of the queue to a single element, as sketched below. The operation of these two separate processes is also easy to monitor from the process manager on your computer: top (Linux), Activity Monitor (OS X), or Task Manager (Windows).
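For example, building on the setup code above, the bounded queue would be just:
# A bounded queue keeps the two processes in lockstep: binarize() blocks on
# q.put() until send_data() has taken the previous chunk off the queue.
work_queue = Queue(maxsize=1)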
Finally, Python 3 comes with the option of using co-routines, which are neither processes nor threads but something else entirely. Co-routines are pretty cool from a CS point of view, but a bit of a head scratcher at first. There are plenty of resources to learn from though, like this post on Medium and this talk by David Beazley.
Even more generally, you might want to look into the producer/consumer pattern, if you are not already familiar with it.
If you are trying to use concurrency to improve performance in CPython, I would strongly recommend using the multiprocessing library instead of multithreading. This is because of the GIL (Global Interpreter Lock), which can have a huge impact on execution speed (in some cases, it may cause your code to run slower than the single-threaded version). Also, if you would like to learn more about this topic, I recommend reading this presentation by David Beazley. Multiprocessing bypasses this problem by spawning a new Python interpreter instance for each process, thus allowing you to take full advantage of a multi-core architecture.
I'm starting to write a program that uses threads, but after searching how to start threads in Python I have found two methods that accomplish the same thing. There must be a difference or an advantage of one over the other, and I'm confused about which road I should go down.
My thread is going to run in the background continuously and never stop until the user tells the program to. Also, one or more arguments will be passed to the thread when it is started.
one way using classes:
from threading import Thread

class myClassA(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            print 'A'

myClassA()
while True:
    pass
Second way using methods:
from threading import Thread

def runA():
    while True:
        print 'A\n'

if __name__ == "__main__":
    t1 = Thread(target = runA)
    t1.setDaemon(True)
    t1.start()
    while True:
        pass
My rule of thumb for using classes is that you shouldn't use them until you find a good use case for them. One use case would be if you wanted to define multiple methods to interact with the thread (see the sketch below). But usually developers can't see the future when designing classes, so it's better to just code using functions and, when you see a use case for classes, refactor your code. What I mean is, you might spend a lot of time designing a class and not even end up using or needing a lot of the functionality you implemented; so you wasted your time and made your code complex for no reason.
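For completeness, here is a minimal sketch (names are illustrative) of the kind of case where a Thread subclass pays off, namely when you want extra methods such as stop() to interact with the running thread:
# Sketch of a Thread subclass that earns its keep by exposing a stop() method;
# Worker and the printed 'A' are illustrative.
from threading import Thread, Event

class Worker(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self._stop_event = Event()

    def stop(self):
        self._stop_event.set()      # ask run() to exit its loop

    def run(self):
        while not self._stop_event.is_set():
            print('A')

if __name__ == "__main__":
    w = Worker()
    w.start()
    # ... later, when the user tells the program to quit:
    w.stop()
    w.join()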