I want to upload a file to two ftp sites. After both finishing, I need to delete the file.
In case not to block the main function running, both ftp and deletion functions will be implemented by threading, which means there will be 3 threads running in background simultaneously. An easy problem becomes complicated because of threading.
Following are possible solutions:
Use a queue and put all three threads in order
Use mutex
Both work, but I don't think they are the best way to do that. Can anyone share his/her idea?
Use a semaphore.
In your parent handler create a semaphore of 0 and hand it off to the threads.
import threading
sem = threading.Semaphore(0)
hostThread = threading.Thread(target=uploadToHost, args=(sem,))
backupThread = threading.Thread(target=uploadToBackup, args=(sem,))
sem.acquire() # Wait for one of them to finish
sem.acquire() # Wait for the other to finish
In your children, you'll just have to call sem.release once the upload is finished.
Related
I am a beginner in python and unable to get an idea about threading.By using simple example could someone please explain threading and multithreading in python?
-Thanks
Here is Alex Martelli's answer about multithreading, as linked above.
He uses a simple program that tries some URLs then returns the contents of first one to respond.
import Queue
import threading
import urllib2
# called by each thread
def get_url(q, url):
q.put(urllib2.urlopen(url).read())
theurls = ["http://google.com", "http://yahoo.com"]
q = Queue.Queue()
for u in theurls:
t = threading.Thread(target=get_url, args = (q,u))
t.daemon = True
t.start()
s = q.get()
print s
This is a case where threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, in order to put its contents on the queue; each thread is a daemon (won't keep the process up if main thread ends -- that's more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the results and terminates (which takes down any subthreads that might still be running, since they're daemon threads).
Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn't use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while there's a wait for some I/O). Queues are almost invariably the best way to farm out work to threads and/or collect the work's results, by the way, and they're intrinsically threadsafe so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
Kind all, I'm really new to python and I'm facing a task which I can't completely grasp.
I've created an interface with Tkinter which should accomplish a couple of apparently easy feats.
By clicking a "Start" button two threads/processes will be started (each calling multiple subfunctions) which mainly read data from a serial port (one port per process, of course) and write them to file.
The I/O actions are looped within a while loop with a very high counter to allow them to go onward almost indefinitely.
The "Stop" button should stop the acquisition and essentially it should:
Kill the read/write Thread
Close the file
Close the serial port
Unfortunately I still do not understand how to accomplish point 1, i.e.: how to create killable threads without killing the whole GUI. Is there any way of doing this?
Thank you all!
First, you have to choose whether you are going to use threads or processes.
I will not go too much into differences, google it ;) Anyway, here are some things to consider: it is much easier to establish communication between threads than betweeween processes; in Python, all threads will run on the same CPU core (see Python GIL), but subprocesses may use multiple cores.
Processes
If you are using subprocesses, there are two ways: subprocess.Popen and multiprocessing.Process. With Popen you can run anything, whereas Process gives a simpler thread-like interface to running python code which is part of your project in a subprocess.
Both can be killed using terminate method.
See documentation for multiprocessing and subprocess
Of course, if you want a more graceful exit, you will want to send an "exit" message to the subprocess, rather than just terminate it, so that it gets a chance to do the clean-up. You could do that e.g. by writing to its stdin. The process should read from stdin and when it gets message "exit", it should do whatever you need before exiting.
Threads
For threads, you have to implement your own mechanism for stopping, rather than using something as violent as process.terminate().
Usually, a thread runs in a loop and in that loop you check for a flag which says stop. Then you break from the loop.
I usually have something like this:
class MyThread(Thread):
def __init__(self):
super(Thread, self).__init__()
self._stop_event = threading.Event()
def run(self):
while not self._stop_event.is_set():
# do something
self._stop_event.wait(SLEEP_TIME)
# clean-up before exit
def stop(self, timeout):
self._stop_event.set()
self.join(timeout)
Of course, you need some exception handling etc, but this is the basic idea.
EDIT: Answers to questions in comment
thread.start_new_thread(your_function) starts a new thread, that is correct. On the other hand, module threading gives you a higher-level API which is much nicer.
With threading module, you can do the same with:
t = threading.Thread(target=your_function)
t.start()
or you can make your own class which inherits from Thread and put your functionality in the run method, as in the example above. Then, when user clicks the start button, you do:
t = MyThread()
t.start()
You should store the t variable somewhere. Exactly where depends on how you designed the rest of your application. I would probably have some object which hold all active threads in a list.
When user clicks stop, you should:
t.stop(some_reasonable_time_in_which_the_thread_should_stop)
After that, you can remove the t from your list, it is not usable any more.
First you can use subprocess.Popen() to spawn child processes, then later you can use Popen.terminate() to terminate them.
Note that you could also do everything in a single Python thread, without subprocesses, if you want to. It's perfectly possible to "multiplex" reading from multiple ports in a single event loop.
So I have the following program:
https://github.com/eWizardII/homobabel/blob/master/Experimental/demo_async_falcon.py
However, when it's run I only get two active threads are running, how can I make it so that there are more threads running. I have tried doing stuff like urlv2 = birdofprey(ip2) where ip2 = str(host+1) however that just ends up sending the same thing to two threads. Any help would be appreciated.
Thanks,
Active count=2 means that you have one of your designed thread (birdofprey), and the main thread. This is because you use lock, so the second birdofprey thread waits for the first and so on. I didn't get deeper into the algorithm, but it seems that you don't need to lock birdofprey threads, since they don't share any data (I could get wrong). If they share, you should make exclusive access to the shared data, and not to lock the whole body of run.
Update upon comment
remove locks (if there is no shared data. storage_i is not a shared data.);
in the for loop` create threads, start them, append to a list;
make the second loop over the list of threads, call join collect the information you need.
line 75, urlv.join() blocks until the thread finishes. So you actually create one thread, wait until it's done and then start the next. The other thread is the main thread.
I think the problem is that you need to pull urlv.join() out of the for loop. Right now, because of the join, you're waiting for your new thread to complete before starting the next one.
But for general readability, maintainability, etc., you might want to consider using Python's Queue class to set up a work queue and have a pool of worker threads pulling from it.
I wrote a python script that:
1. submits search queries
2. waits for the results
3. parses the returned results(XML)
I used the threading and Queue modules to perform this in parallel (5 workers).
It works great for the querying portion because i can submit multiple search jobs and deal with the results as they come in.
However, it appears that all my threads get bound to the same core. This is apparent when it gets to the part where it processes the XML(cpu intensive).
Has anyone else encountered this problem? Am i missing something conceptually?
Also, i was pondering the idea of having two separate work queues, one for making the queries and one for parsing the XML. As it is now, one worker will do both in serial. I'm not sure what that will buy me, if anything. Any help is greatly appreciated.
Here is the code: (proprietary data removed)
def addWork(source_list):
for item in source_list:
#print "adding: '%s'"%(item)
work_queue.put(item)
def doWork(thread_id):
while 1:
try:
gw = work_queue.get(block=False)
except Queue.Empty:
#print "thread '%d' is terminating..."%(thread_id)
sys.exit() # no more work in the queue for this thread, die quietly
##Here is where i make the call to the REST API
##Here is were i wait for the results
##Here is where i parse the XML results and dump the data into a "global" dict
#MAIN
producer_thread = Thread(target=addWork, args=(sources,))
producer_thread.start() # start the thread (ie call the target/function)
producer_thread.join() # wait for thread/target function to terminate(block)
#start the consumers
for i in range(5):
consumer_thread = Thread(target=doWork, args=(i,))
consumer_thread.start()
thread_list.append(consumer_thread)
for thread in thread_list:
thread.join()
This is a byproduct of how CPython handles threads. There are endless discussions around the internet (search for GIL) but the solution is to use the multiprocessing module instead of threading. Multiprocessing is built with pretty much the same interface (and synchronization structures, so you can still use queues) as threading. It just gives every thread its own entire process, thus avoiding the GIL and forced serialization of parallel workloads.
Using CPython, your threads will never actually run in parallel in two different cores. Look up information on the Global Interpreter Lock (GIL).
Basically, there's a mutual exclusion lock protecting the actual execution part of the interpreter, so no two threads can compute in parallel. Threading for I/O tasks will work just fine, because of blocking.
edit: If you want to fully take advantage of multiple cores, you need to use multiple processes. There's a lot of articles about this topic, I'm trying to look one up for you I remember was great, but can't find it =/.
As Nathon suggested, you can use the multiprocessing module. There are tools to help you share objects between processes (take a look at POSH, Python Object Sharing).
I need it to open 10 processes, and each time one of them finishes I want to wait few seconds and start another one.
It seems pretty simple, but somehow I can't get it to work.
I'm not 100% clear on what you're trying to accomplish, but have you looked at the multiprocessing module, specifically using a pool of workers?
I've done this same thing to process web statistics using a semaphore. Essentially, as processes are created, the semaphore is incremented. When they exit, it's decremented. The creation process is blocked when the semaphore blocks.
This actually fires off threads, which run external processes down execution path a bit.
Here's an example.
thread_sem = threading.Semaphore(int(cfg.maxthreads))
for k,v in log_data.items():
thread_list.append(ProcessorThread(int(k), v, thread_sem))
thread_list[-1].start()
And then in the constructor for ProcessorThread, I do this:
def __init__(self, siteid, data, lock_object):
threading.Thread.__init__(self)
self.setDaemon(False)
self.lock_object = lock_object
self.data = data
self.siteid = siteid
self.lock_object.acquire()
When the thread finishes it's task (whether successfully or not), the lock_object is released which allows for another process to begin.
HTH