I want to speed up my program as much as possible. Can someone tell me which of these will be better in terms of speed? As per my requirements I can go with either approach.
Approach 1 (spawned 2 threads from main process):
import threading
import time

def a(something):
    # Does something at a fixed interval.
    while 1:
        print("a")
        time.sleep(60)

def b(something):
    # Keeps running indefinitely without any delay.
    while 1:
        print("b")

def main():
    something = {}
    t1 = threading.Thread(target=b, args=(something,))
    t1.start()
    t2 = threading.Thread(target=a, args=(something,))
    t2.start()
Approach 2 (spawned a nested thread):
def a(something):
    # Does something at a fixed interval.
    while 1:
        print("a")
        time.sleep(60)

def b(something):
    t2 = threading.Thread(target=a, args=(something,))
    t2.start()
    # Keeps running indefinitely without any delay.
    while 1:
        print("b")

def main():
    something = {}
    t1 = threading.Thread(target=b, args=(something,))
    t1.start()
P.S. a and b are just dummy functions, but the real ones do their work in a similar way.
The coexistence of threads is flat, not hierarchical. A thread does not operate within another thread. (I am pretty sure that this is the case for CPython; it would be nice if someone could check it.)
In other words, there is no difference between a thread spawned from the main thread and a thread spawned from any other thread (what you refer to as a nested thread).
Regarding the other small differences between your two approaches (such as global vs local variables), they would hardly affect speed.
And finally, in this particular case multithreading would work as expected; Python's infamous GIL won't have much effect here, because time.sleep() releases the lock, so the other thread gets scheduled while one thread sleeps.
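As a rough illustration of that flat model (a sketch of my own, with made-up thread names): every thread shows up as a sibling in threading.enumerate(), no matter which thread started it.

import threading
import time

def child():
    time.sleep(0.2)

def parent():
    # a thread started from inside another thread...
    threading.Thread(target=child, name="spawned-by-parent").start()
    time.sleep(0.2)

threading.Thread(target=parent, name="parent").start()
threading.Thread(target=child, name="spawned-by-main").start()

time.sleep(0.1)
# ...is listed alongside all the others; there is no parent/child hierarchy
print([t.name for t in threading.enumerate()])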
Instead of multiple threads I suggest you look into multiprocessing. While heavier, processes avoid the problem that Python's threads effectively run on only one CPU core no matter how many of them you use. This is due to the GIL, and that's why multiprocessing is used to get around it.
An alternative would be to use PyPy, a Python implementation different from the official one, with speedups in some cases; an experimental variant of it also drops the GIL, which can allow for efficient multithreading.
Related
Suppose I have the following in Python
# A loop
for i in range(10000):
    Do Task A

# B loop
for i in range(10000):
    Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process

def loop_a():
    while 1:
        print("a")

def loop_b():
    while 1:
        print("b")

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
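For instance, here is a minimal sketch of handing a result back through a multiprocessing.Queue (the worker function and message are made up for illustration):

from multiprocessing import Process, Queue

def worker(q):
    # do some work in the child process and hand the result back
    q.put("result from the child")

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())   # blocks until the child puts something on the queue
    p.join()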
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate but they offer true concurrency.
There are many possible options for what you wanted:
use loop
As many people have pointed out, this is the simplest way.
for i in xrange(10000):  # use xrange instead of range
    taskA()
    taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB must be done after taskA (or the other way around); they can't run simultaneously.
multiprocess
Another thought would be: run two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process

# args/kwargs below stand for whatever arguments your tasks need
p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
merits: tasks can run simultaneously in the background, you can control tasks (end or stop them, etc.), tasks can exchange data, and they can be synchronized if they compete for the same resources, etc.
drawbacks: too heavy! The OS will frequently switch between them, and each process has its own data space even if the data is redundant. If you have a lot of tasks (say 100 or more), it's not what you want.
threading
threading is like multiprocessing, just lighter weight; check out this post. Their usage is quite similar:
import threading

p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide something called coroutines, which are supposed to be faster than threading. A rough sketch is shown below, after the merits and drawbacks; please look up their documentation if you're interested in more.
merits: more flexible and lightweight
drawbacks: extra library needed, learning curve.
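As a very rough sketch of what gevent usage looks like (assuming gevent is installed; task_a and task_b are placeholders):

import gevent

def task_a():
    for i in range(3):
        print("a %d" % i)
        gevent.sleep(0)   # yield control so other greenlets can run

def task_b():
    for i in range(3):
        print("b %d" % i)
        gevent.sleep(0)

# spawn both greenlets and wait for them to finish
gevent.joinall([gevent.spawn(task_a), gevent.spawn(task_b)])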
Why do you want to run the two processes at the same time? Is it because you think they will go faster? (There is a good chance that they won't.) Why not run the tasks in the same loop, e.g.
for i in range(10000):
    doTaskA()
    doTaskB()
The obvious answer to your question is to use threads - see the python threading module. However threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate processes, using the Python multiprocessing module. If both tasks are CPU intensive this will make better use of multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP, etc., but without knowing more about Task A and Task B and why they need to run at the same time, it is impossible to give a more specific answer.
from threading import Thread

def loopA():
    for i in range(10000):
        pass  # Do task A

def loopB():
    for i in range(10000):
        pass  # Do task B

threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()  # start(), not run(): run() would execute in the calling thread
threadB.start()

# Do work independent of loopA and loopB

threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about a single loop: for i in range(10000): Do Task A; Do Task B? Without more information I don't have a better answer.
I find that using the "pool" submodule within "multiprocessing" works amazingly well for executing multiple processes at once within a Python script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool

def desired_function(option, processes, data):
    # Your code will go here. option lets you make choices within your script
    # to execute the desired sections of code for each worker process.
    return result_array  # "for example"

result_array = np.zeros(some_shape)  # This is normally populated by 1 loop, let's try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data)  # Arguments to be passed into the desired function.

multiple_results = []
for i in range(processes):  # Executes each worker w/ option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i+1,) + args))

results = [res.get() for res in multiple_results]  # Retrieves results after every worker is finished!

for i in range(processes):
    result_array = result_array + results[i]  # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to make sure your function can distinguish between the processes (hence the "option" variable). Additionally, it doesn't have to be an array that is being populated at the end; for my example, that's just how I used it. Hope this simplifies things or helps you better understand the power of multiprocessing in Python!
Why does this Python multi-thread approach take more time than a single thread to solve the same problem?
Note that my computer has a multi-core CPU.
I wrote the same code both ways and made a comparison. Surprisingly, the single-thread way is faster! Does anyone have any thoughts?
#!/usr/bin/python
L = [1,2,3,4,5,7,8,9,10]

def gen(index, value):
    if index == len(L):
        return 1
    count = 0
    for i in range(len(value)+1):
        count += gen(index+1, value[:i]+[L[index]]+value[i:])
    return count

# Single-thread approach
print gen(1, [1])  # this takes 480ms to run!

# Multi-thread approach
from threading import Thread

def t1_start():
    global pointer1
    pointer1 = gen(2, [2,1])

def t2_start():
    global pointer2
    pointer2 = gen(2, [1,2])

pointer1 = 0
pointer2 = 0
t1 = Thread(target=t1_start, args=())
t2 = Thread(target=t2_start, args=())
t1.start()
t2.start()
t1.join()
t2.join()
#print pointer1+pointer2  # this takes 650ms to run!
I've had this problem in Java.
The users there suggested that the overhead of creating the threads was so large that it made the whole run slower.
I recommend trying your threaded code with really complex calculations and many more numbers in your gen(index, value) function.
The threaded code might beat the simple solution if the gen function took longer to run.
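If you want to time the comparison yourself, a rough harness could look like this (it reuses gen, t1_start and t2_start from the question above; the timing helper is my own):

import time
from threading import Thread

def timed(label, fn):
    # crude wall-clock timing for comparing the two variants
    start = time.time()
    fn()
    print("%s took %.0f ms" % (label, (time.time() - start) * 1000))

def run_threaded():
    t1 = Thread(target=t1_start)
    t2 = Thread(target=t2_start)
    t1.start(); t2.start()
    t1.join(); t2.join()

timed("single thread", lambda: gen(1, [1]))
timed("two threads", run_threaded)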
I am new to multithreading in Python and trying to learn it using the threading module. I have made a very simple multithreading program and I am having trouble understanding the threading.Thread.join method.
Here is the source code of the program I have made
import threading

val = 0

def increment():
    global val
    print "Inside increment"
    for x in range(100):
        val += 1
    print "val is now {} ".format(val)

thread1 = threading.Thread(target=increment, args=())
thread2 = threading.Thread(target=increment, args=())
thread1.start()
#thread1.join()
thread2.start()
#thread2.join()
What difference does it make if I use
thread1.join()
thread2.join()
which I have commented out in the above code? I ran both versions (with the joins commented out and with them enabled), but the output is the same.
A call to thread1.join() blocks the thread in which you're making the call, until thread1 is finished. It's like wait_until_finished(thread1).
For example:
import time
from threading import Thread

def printer():
    for _ in range(3):
        time.sleep(1.0)
        print "hello"

thread = Thread(target=printer)
thread.start()
thread.join()
print "goodbye"
prints
hello
hello
hello
goodbye
Without the .join() call, goodbye would come first, followed by the three hello lines.
Also, note that threads in Python do not provide any additional performance (in terms of CPU processing power) because of a thing called the Global Interpreter Lock. They are useful for spawning off potentially blocking (e.g. IO, network) or time-consuming tasks to keep the main thread free for other work, but they do not allow you to leverage multiple cores or CPUs; for that, look at multiprocessing, which uses subprocesses but exposes an API equivalent to that of threading.
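For example, here is a minimal sketch of the printer example above rewritten with multiprocessing (my own illustration; note the near-identical API):

from multiprocessing import Process
import time

def printer():
    for _ in range(3):
        time.sleep(1.0)
        print("hello")

if __name__ == '__main__':
    p = Process(target=printer)   # the interface mirrors threading.Thread
    p.start()
    p.join()                      # waits for the child process, just like Thread.join()
    print("goodbye")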
PLUG: ...and it is also for the above reason that, if you're interested in concurrency, you might also want to look into a fine library called Gevent, which essentially just makes threading much easier to use, much faster (when you have many concurrent activities) and less prone to concurrency related bugs, while allowing you to keep coding the same way as with "real" threads. Also Twisted, Eventlet, Tornado and many others, are either equivalent or comparable. Furthermore, in any case, I'd strongly suggest reading these classics:
Generator Tricks for Systems Programmers
A Curious Course on Coroutines and Concurrency
I modified the code so that you can see exactly how join works.
Run this code with the join calls commented out, then with them enabled, and observe the output in both cases.
import threading
import time

val = 0

def increment(msg, sleep_time):
    global val
    print "Inside increment"
    for x in range(10):
        val += 1
        print "%s : %d\n" % (msg, val)
        time.sleep(sleep_time)

thread1 = threading.Thread(target=increment, args=("thread_01", 0.5))
thread2 = threading.Thread(target=increment, args=("thread_02", 1))
thread1.start()
#thread1.join()
thread2.start()
#thread2.join()
As the relevant documentation states, join makes the caller wait until the thread terminates.
In your case, the output is the same because join doesn't change the program behaviour - it's probably being used to exit the program cleanly, only when all the threads have terminated.
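For instance, here is a small sketch (adapted from the question's code) where the joins guarantee that the final print only runs after both threads have finished incrementing:

import threading

val = 0

def increment():
    global val
    for x in range(100):
        val += 1

thread1 = threading.Thread(target=increment)
thread2 = threading.Thread(target=increment)
thread1.start()
thread2.start()
thread1.join()   # without these two joins, the final print could execute
thread2.join()   # before the threads have finished incrementing
print("val is now %d" % val)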
I'm a multiprocessing newbie. I know something about threading, but I need to increase the speed of this calculation, hopefully with multiprocessing.
Example description: send a string to a thread, alter the string and benchmark the processing speed, then send the result back for printing.
from threading import Thread

class Alter(Thread):
    def __init__(self, word):
        Thread.__init__(self)
        self.word = word
        self.word2 = ''

    def run(self):
        # Alter string + test processing speed
        for i in range(80000):
            self.word2 = self.word2 + self.word

# Send a string to be altered
thread1 = Alter('foo')
thread2 = Alter('bar')
thread1.start()
thread2.start()

# wait for both to finish
while thread1.is_alive() == True: pass
while thread2.is_alive() == True: pass

print(thread1.word2)
print(thread2.word2)
This currently takes about 6 seconds and I need it to go faster.
I have been looking into multiprocessing and cannot find something equivalent to the above code. I think what I am after is pooling, but the examples I have found have been hard to understand. I would like to take advantage of all cores (8 cores, per multiprocessing.cpu_count()), but I really just have scraps of useful information on multiprocessing and not enough to duplicate the above code. If anyone can point me in the right direction or, better yet, provide an example, that would be greatly appreciated. Python 3 please.
Just replace threading with multiprocessing and Thread with Process. Threads in Python are (almost) never used to gain performance because of the big bad GIL! I explained it in another SO post, with some links to documentation and a great talk about threading in Python.
But the multiprocessing module is intentionally very similar to the threading module. You can almost use it as a drop-in replacement!
AFAIK the multiprocessing module doesn't offer functionality to enforce the use of a specific number of cores; it relies on the OS implementation. You could use the Pool object and limit the number of worker processes to the core count. Or you could look at an MPI library like pypar. Under Linux you could use a pipe in the shell to start multiple instances on different cores.
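For example, a rough Pool-based sketch of the question's workload (the function name alter and the 'foo'/'bar' inputs simply mirror the code above; this is a sketch, not a drop-in for the class-based version):

from multiprocessing import Pool

def alter(word):
    # build the long string in a worker process and return it to the parent
    word2 = ''
    for i in range(80000):
        word2 = word2 + word
    return word2

if __name__ == '__main__':
    with Pool() as pool:                       # defaults to cpu_count() workers
        results = pool.map(alter, ['foo', 'bar'])
    for word2 in results:
        print(word2)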
import time
import threading

class test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.doSkip = False
        self.count = 0

    def run(self):
        while self.count < 9:
            self.work()

    def skip(self):
        self.doSkip = True

    def work(self):
        self.count += 1
        time.sleep(1)
        if(self.doSkip):
            print "skipped"
            self.doSkip = False
            return
        print self.count

t = test()
t.start()
while t.count < 9:
    time.sleep(2)
    t.skip()
Thread-safe in which way? I don't see any part you might want to protect here.
skip may reset the doSkip at any time, so there's not much point in locking it. You don't have any resources that are accessed at the same time - so IMHO nothing can be corrupted / unsafe in this code.
The only part that might run differently depending on locking / counting is how many "skip"s do you expect on every call to .skip(). If you want to ensure that every skip results in a skipped call to .work(), you should change doSkip into a counter that is protected by a lock on both increment and compare/decrement. Currently one thread might turn doSkip on after the check, but before the doSkip reset. It doesn't matter in this example, but in some real situation (with more code) it might make a difference.
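A rough sketch of that idea (my own variation on the question's class, not a drop-in replacement):

import threading
import time

class Test(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.pending_skips = 0            # counter instead of the doSkip boolean
        self.lock = threading.Lock()
        self.count = 0

    def run(self):
        while self.count < 9:
            self.work()

    def skip(self):
        with self.lock:
            self.pending_skips += 1       # every skip() call is recorded

    def work(self):
        self.count += 1
        time.sleep(1)
        with self.lock:
            if self.pending_skips:
                self.pending_skips -= 1   # consume exactly one pending skip
                print("skipped")
                return
        print(self.count)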
Whenever the test of a mutex boolean (e.g. if(self.doSkip)) is separate from the set of that boolean, you will probably have threading problems.
The rule is that your thread will get swapped out at the most inconvenient time. That is, after the test and before the set. Moving them closer together reduces the window for screw-ups but does not eliminate them. You almost always need a specially created mechanism from the language or kernel to fully close that window.
The threading library has Semaphores that can be used to synchronize threads and/or create critical sections of code.
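As a tiny sketch of that (a Semaphore initialised with 1 behaves like a lock; the function below is just an illustration):

import threading

sem = threading.Semaphore(1)

def critical_section(shared):
    with sem:                 # only one thread at a time executes this block
        shared["hits"] = shared.get("hits", 0) + 1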
Apparently there isn't any critical resource, so I'd say it's thread-safe.
But as usual you can't predict in which order the two threads will be blocked/run by the scheduler.
This is and will remain thread-safe as long as you don't share data between threads.
If another thread needs to read/write data in your thread class, then this won't be thread-safe unless you protect the data with some synchronization mechanism (like locks).
To elaborate on DanM's answer, conceivably this could happen:
Thread 1: t.skip()
Thread 2: if self.doSkip: print 'skipped'
Thread 1: t.skip()
Thread 2: self.doSkip = False
etc.
In other words, while you might expect to see one "skipped" for every call to t.skip(), this sequence of events would violate that.
However, because of your sleep() calls, I think this sequence of events is actually impossible.
(unless your computer is running really slowly)