Python: is it proper for one thread to spawn another?

I am writing an update application in Python 2.x. I have one thread (ticket_server) sitting on a database (CouchDB) url in longpoll mode. Update requests are dumped into this database from an outside application. When a change comes, ticket_server triggers a worker thread (update_manager). The heavy lifting is done in this update_manager thread. There will be telnet connections and ftp uploads performed. So it is of highest importance that this process not be interrupted.
My question is, is it safe to spawn update_manager threads from the ticket_server threads?
The other option might be to put requests into a queue and have another function wait for a ticket to enter the queue, then pass the request off to an update_manager thread. But I'd rather keep things simple (I'm assuming that having ticket_server spawn update_manager is the simple option) until I have a reason to expand.
import threading
import urllib2

# Here is the heavy lifter
class Update_Manager(threading.Thread):
    def __init__(self, ticket, telnet_ip=None, ftp_ip=None):
        # Thread.__init__ takes no positional arguments of ours; store them on self instead.
        threading.Thread.__init__(self)
        self.ticket = ticket
        self.telnet_ip = telnet_ip
        self.ftp_ip = ftp_ip

    def run(self):
        # This will be a very lengthy process.
        self.do_some_telnet()
        self.do_some_ftp()

    def do_some_telnet(self):
        ...

    def do_some_ftp(self):
        ...

# This guy just passes work orders off to Update_Manager
class Ticket_Server(threading.Thread):
    def __init__(self, database_ip):
        threading.Thread.__init__(self)
        self.database_ip = database_ip

    def run(self):
        # This function call will block this thread only.
        ticket = self.get_ticket(self.database_ip)
        # Here is where I question what to do.
        # Should I 1) call the Update thread right from here...
        up_man = Update_Manager(ticket)
        up_man.start()
        # Or should I 2) put the ticket into a queue and let some other function
        # not in this thread fire the Update_Manager.

    def get_ticket(self, database_ip):
        # This function will 'wait' for a ticket to get posted.
        # For those familiar with CouchDB (port and update_seq are assumed to be defined elsewhere):
        url = 'http://' + database_ip + ':' + port + '/_changes?feed=longpoll&since=' + update_seq
        response = urllib2.urlopen(url)
This is just a lot of code to ask which approach is safer, more efficient, and more Pythonic.
I'm only a few months old with Python, so these questions get my brain stuck in a while loop.

The main thread of a program is a thread; the only way to spawn a thread is from another thread.
Of course, you need to make sure your blocking thread is releasing the GIL while it waits, or other Python threads won't run. All mature Python database bindings will do this, but I've never heard of couchdb.
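For what it's worth, here is a minimal sketch of a non-main thread starting worker threads, fed through a queue as in the asker's option 2. It is written for Python 3 (the module is named Queue in Python 2). The names handle_ticket, dispatcher and run_demo are made up for illustration, and the "work" is a stand-in for the telnet/FTP steps rather than anything from the original code.

```python
import queue
import threading

def handle_ticket(ticket, results):
    # Stand-in for the lengthy telnet/FTP work (hypothetical).
    results.append("updated:" + ticket)

def dispatcher(tickets, results):
    # Runs in its own (non-main) thread and starts one worker per ticket.
    workers = []
    while True:
        ticket = tickets.get()      # blocks until a ticket arrives
        if ticket is None:          # sentinel: no more tickets
            break
        worker = threading.Thread(target=handle_ticket, args=(ticket, results))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()               # let every update finish before exiting

def run_demo(incoming):
    tickets = queue.Queue()
    results = []
    server = threading.Thread(target=dispatcher, args=(tickets, results))
    server.start()
    for ticket in incoming:
        tickets.put(ticket)
    tickets.put(None)
    server.join()
    return results
```

Calling run_demo(["a", "b"]) returns the two processed tickets, in whatever order the workers happened to finish.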

Related

Tornado 4.x solution of running game on ThreadPoolExecutor not working anymore. Need help refactoring it

My ThreadPoolExecutor/gen.coroutine(tornado v4.x) solution to circumvent blocking the webserver is not working anymore with tornado version 6.x.
A while back I started to develop an online browser game using a Tornado webserver (v4.x) and websockets. Whenever user input is expected, the game sends the question to the client and waits for the response. Back then I used gen.coroutine and a ThreadPoolExecutor to make this task non-blocking. Now that I have started refactoring the game, it is not working with Tornado v6.x and the task is blocking the server again. I searched for possible solutions, but so far I have been unable to get it working again. It is not clear to me how to change my existing code to be non-blocking again.
server.py:
class PlayerWebSocket(tornado.websocket.WebSocketHandler):
    executor = ThreadPoolExecutor(max_workers=15)

    @run_on_executor
    def on_message(self, message):
        params = message.split(':')
        self.player.callbacks[int(params[0])] = params[1]

if __name__ == '__main__':
    application = Application()
    application.listen(9999)
    tornado.ioloop.IOLoop.instance().start()
player.py:
@gen.coroutine
def send(self, message):
    self.socket.write_message(message)

def create_choice(self, id, choices):
    d = {}
    d['id'] = id
    d['choices'] = choices
    self.choice[d['id']] = d
    self.send('update', self)
    while not d['id'] in self.callbacks:
        pass
    del self.choice[d['id']]
    return self.callbacks[d['id']]
Whenever a choice is to be made, the create_choice function creates a dict with a list (choices) and an id and stores it in the player's self.callbacks. After that it just stays in the while loop until the websocket.on_message function puts the received answer (which looks like id:Choice_id, for example 1:12838732) into the callbacks dict.
The WebSocketHandler.write_message method is not thread-safe, so it can only be called from the IOLoop's thread, and not from a ThreadPoolExecutor (This has always been true, but sometimes it might have seemed to work anyway).
The simplest way to fix this code is to save IOLoop.current() in a global variable from the main thread (the current() function accesses a thread-local variable, so you can't call it from the thread pool) and use ioloop.add_callback(self.socket.write_message, message). Also remove @gen.coroutine from send: it doesn't do any good to make functions coroutines if they contain no yield expressions.
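To illustrate the hand-off pattern without needing Tornado installed, here is a rough sketch using only the standard library. Since Tornado 5, the IOLoop on Python 3 wraps the asyncio event loop, and asyncio's loop.call_soon_threadsafe plays the role that IOLoop.add_callback plays in the answer above. This is an analogue, not Tornado code: write_message below is a hypothetical stand-in for the websocket method, and run_demo is just a harness.

```python
import asyncio
import threading

def run_demo():
    received = []       # messages "written" by the loop thread
    writer_threads = [] # which thread actually ran write_message

    async def main():
        # Capture the loop on its own thread, like saving IOLoop.current().
        loop = asyncio.get_running_loop()
        done = asyncio.Event()

        def write_message(message):
            # Stand-in for socket.write_message; must run on the loop thread.
            received.append(message)
            writer_threads.append(threading.get_ident())
            done.set()

        def worker():
            # Runs in another thread: never call write_message directly here,
            # schedule it on the loop thread instead.
            loop.call_soon_threadsafe(write_message, "update")

        thread = threading.Thread(target=worker)
        thread.start()
        await done.wait()
        thread.join()
        return threading.get_ident()

    loop_thread = asyncio.run(main())
    return received, writer_threads, loop_thread
```

The point of the sketch is only that the worker thread schedules the call and the loop's own thread executes it.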

How do I know if a thread is a dummy thread in python?

My basic question is: how do I detect whether the current thread is a dummy thread? I am new to threading and I recently was debugging some code in my Apache2/Flask app and thought it might be useful. I was getting a flip flopping error where a request was processed successfully on the main thread, unsuccessfully on a dummy thread and then successfully on the main thread again, etc.
Like I said, I am using Apache2 and Flask, and the combination of the two seems to create these dummy threads. I would also be interested in knowing more about that if anyone can teach me.
My code is meant to print information about the threads running on the service and looks something like this:
def allthr_info(self):
    """Returns info in JSON form of all threads."""
    all_thread_infos = Queue()
    for thread_x in threading.enumerate():
        if thread_x is threading.current_thread() or thread_x is threading.main_thread():
            continue
        info = self._thr_info(thread_x)
        all_thread_infos.put(info)
    return list(all_thread_infos.queue)

def _thr_info(self, thr):
    """Consolidation of the thread info that can be obtained from threading module."""
    thread_info = {}
    try:
        thread_info = {
            'name': thr.getName(),
            'ident': thr.ident,
            'daemon': thr.daemon,
            'is_alive': thr.is_alive(),
        }
    except Exception as e:
        LOGGER.error(e)
    return thread_info
You can check if the current thread is an instance of threading._DummyThread.
isinstance(threading.current_thread(), threading._DummyThread)
threading.py itself can teach you what dummy-threads are about:
Dummy thread class to represent threads not started here.
These aren't garbage collected when they die, nor can they be waited for.
If they invoke anything in threading.py that calls current_thread(), they
leave an entry in the _active dict forever after.
Their purpose is to return something from current_thread().
They are marked as daemon threads so we won't wait for them
when we exit (conform previous semantics).
def current_thread():
    """Return the current Thread object, corresponding to the caller's thread of control.
    If the caller's thread of control was not created through the threading
    module, a dummy thread object with limited functionality is returned.
    """
    try:
        return _active[get_ident()]
    except KeyError:
        return _DummyThread()
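A small way to see this for yourself in Python 3, as a sketch: start one thread through the low-level _thread module (which bypasses threading's bookkeeping) and one through threading.Thread, and run the isinstance check in each. Note that threading._DummyThread is a private CPython name, so this relies on an implementation detail; the helper names below are made up for the example.

```python
import _thread
import threading

def is_dummy_thread():
    # Private class: works on CPython, but not a documented API.
    return isinstance(threading.current_thread(), threading._DummyThread)

def check_in_raw_thread():
    # Thread created outside the threading module.
    outcome = []
    finished = threading.Event()

    def target():
        outcome.append(is_dummy_thread())
        finished.set()

    _thread.start_new_thread(target, ())
    finished.wait(5)
    return outcome[0]

def check_in_threading_thread():
    # Thread created through threading.Thread, for comparison.
    outcome = []
    worker = threading.Thread(target=lambda: outcome.append(is_dummy_thread()))
    worker.start()
    worker.join()
    return outcome[0]
```

Under mod_wsgi the request threads are created by Apache's C code rather than by threading.Thread, which is the same situation as the raw thread here.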

Stop a long-running action in web2py with multiprocessing

I have a web2py application that basically serves as a browser interface for a Python script. This script usually returns pretty quickly, but can occasionally take a long time. I want to provide a way for the user to stop the script's execution if it takes too long.
I am currently calling the function like this:
def myView():  # this function is called from ajax
    session.model = myFunc()  # myFunc is from a module which I have complete control over
    return dict(model=session.model)
myFunc, when called with certain options, uses multiprocessing but still ends up taking a long time. I need some way to terminate the function, or at the very least the thread's children.
The first thing I tried was to run myFunc in a new process and roll my own simple event system to kill it:
# in the controller
def myView():
    p_conn, c_conn = multiprocessing.Pipe()
    events = multiprocessing.Manager().dict()
    proc = multiprocessing.Process(target=_fit, args=(options, events, c_conn))
    proc.start()
    sleep(0.01)
    session.events = events
    proc.join()
    session.model = p_conn.recv()
    return dict(model=session.model)

def _fit(options, events, pipe):
    pipe.send(fitting.logistic_fit(options=options, events=events))
    pipe.close()

def stop():
    try:
        session.events['kill']()
    except SystemExit:
        pass  # because it raises that error intentionally
    return dict()

# in the module
def kill():
    print multiprocessing.active_children()
    for p in multiprocessing.active_children():
        p.terminate()
    raise SystemExit

def myFunc(options, events):
    events['kill'] = kill
I ran into a few major problems with this.
The session in stop() wasn't always the same as the session in myView(), so session.events was None.
Even when the session was the same, kill() wasn't properly killing the children.
The long-running function would hang the web2py thread, so stop() wasn't even processed until the function finished.
I considered not calling join() and using AJAX to pick up the result of the function at a later time, but I wasn't able to save the process object in session for later use. The pipe seemed to be able to be pickled, but then I had the problem with not being able to access the same session from another view.
How can I implement this feature?
For long running tasks, you are better off queuing them via the built-in scheduler. If you want to allow the user to manually stop a task that is taking too long, you can use the scheduler.stop_task(ref) method (where ref is the task id or uuid). Alternatively, when you queue a task, you can specify a timeout, so it will automatically stop if not completed within the timeout period.
You can do simple Ajax polling to notify the client when the task has completed (or implement something more sophisticated with websockets or SSE).

Multi-threaded web scraping in Python/PySide/PyQt

I'm building a web scraper of a kind. Basically, what the software would do is:
User (me) inputs some data (IDs) - IDs are complex, so not just numbers
Based on those IDs, the script visits http://localhost/ID
What is the best way to accomplish this? I'm looking at upwards of 20-30 concurrent connections to do it.
I was thinking, would a simple loop be the solution? This loop would start QThreads (it's a Qt app), so they would run concurrently.
The problem I am seeing with the loop however is how to instruct it to use only those IDs not used before i.e. in the iteration/thread that had been executed just before it was? Would I need some sort of a "delegator" function which will keep track of what IDs had been used and delegate the unused ones to the QThreads?
Now I've written some code but I am not sure if it is correct:
class GUI(QObject):
    def __init__(self):
        print "GUI CLASS INITIALIZED!!!"
        self.worker = Worker()
        for i in xrange(300):
            QThreadPool().globalInstance().start(self.worker)

class Worker(QRunnable):
    def run(self):
        print "Hello world from thread", QThread.currentThread()
Now I'm not sure if these achieve really what I want. Is this actually running in separate threads? I'm asking because currentThread() is the same every time this is executed, so it doesn't look that way.
Basically, my question comes down to how do I execute several same QThreads concurrently?
Thanks in advance for the answer!
As Dikei says, Qt is a red herring here. Focus on just using Python threads, as it will keep your code much simpler.
In the code below we have a set, job_queue, containing the jobs to be executed. We also have a function, worker_thread, which takes a job from the passed-in queue and executes it. Here it just sleeps for a random period of time. The key thing here is that set.pop is thread safe in CPython (it runs as a single operation under the GIL), so no two workers can receive the same job.
We create a list of thread objects, workers, and call start on each as we create it. According to the Python documentation, threading.Thread.start runs the given callable in a separate thread of control. Lastly, we go through each worker thread and block until it has exited.
import threading
import random
import time

pool_size = 5
job_queue = set(range(100))

def worker_thread(queue):
    while True:
        try:
            job = queue.pop()
        except KeyError:
            break
        print "Processing %i..." % (job, )
        time.sleep(random.random())
    print "Thread exiting."

workers = []
for thread in range(pool_size):
    workers.append(threading.Thread(target=worker_thread, args=(job_queue, )))
    workers[-1].start()

for worker in workers:
    worker.join()

print "All threads exited"
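If you would rather not lean on the CPython-specific behaviour of set.pop, a variant of the same pool can use queue.Queue, which is documented as thread safe. This is only a sketch in Python 3 syntax; run_pool is a made-up name, and the "work" is a placeholder where the real code would fetch http://localhost/<ID>.

```python
import queue
import threading

def run_pool(job_ids, pool_size=5):
    jobs = queue.Queue()
    for job_id in job_ids:
        jobs.put(job_id)

    processed = []
    processed_lock = threading.Lock()

    def worker():
        while True:
            try:
                job_id = jobs.get_nowait()  # each ID is handed out exactly once
            except queue.Empty:
                return                      # nothing left: let the thread exit
            # Placeholder for the real work on this ID.
            with processed_lock:
                processed.append(job_id)

    workers = [threading.Thread(target=worker) for _ in range(pool_size)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return processed
```

The queue does the "delegator" job the question asks about: no worker ever sees an ID that another worker has already taken.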

Waiting on event with Twisted and PB

I have a Python app that uses multiple threads, and I am curious about the best way to wait for something in Python without burning CPU or locking the GIL.
My app uses Twisted, and I spawn a thread to run a long operation so I do not stomp on the reactor thread. This long operation also spawns some threads using twisted's deferToThread to do something else, and the original thread wants to wait for the results from the deferreds.
What I have been doing is this:
while self._waiting:
    time.sleep(0.01)
This seemed to disrupt Twisted PB's objects from receiving messages, so I thought sleep was locking the GIL. Further investigation by the posters below revealed, however, that it does not.
The answers below describe better ways to wait on threads without blocking the reactor thread or Python.
If you're already using Twisted, you should never need to "wait" like this.
As you've described it:
I spawn a thread to run a long operation ... This long operation also spawns some threads using twisted's deferToThread ...
That implies that you're calling deferToThread from your "long operation" thread, not from your main thread (the one where reactor.run() is running). As Jean-Paul Calderone already noted in a comment, you can only call Twisted APIs (such as deferToThread) from the main reactor thread.
The lock-up that you're seeing is a common symptom of not following this rule. It has nothing to do with the GIL, and everything to do with the fact that you have put Twisted's reactor into a broken state.
Based on your loose description of your program, I've tried to write a sample program that does what you're talking about based entirely on Twisted APIs, spawning all threads via Twisted and controlling them all from the main reactor thread.
import time

from twisted.internet import reactor
from twisted.internet.defer import gatherResults
from twisted.internet.threads import deferToThread, blockingCallFromThread

def workReallyHard():
    "'Work' function, invoked in a thread."
    time.sleep(0.2)

def longOperation():
    for x in range(10):
        workReallyHard()
        blockingCallFromThread(reactor, startShortOperation, x)
    result = blockingCallFromThread(reactor, gatherResults, shortOperations)
    return 'hooray', result

def shortOperation(value):
    workReallyHard()
    return value * 100

shortOperations = []

def startShortOperation(value):
    def done(result):
        print 'Short operation complete!', result
        return result
    shortOperations.append(
        deferToThread(shortOperation, value).addCallback(done))

d = deferToThread(longOperation)

def allDone(result):
    print 'Long operation complete!', result
    reactor.stop()

d.addCallback(allDone)
reactor.run()
Note that at the point in allDone where the reactor is stopped, you could fire off another "long operation" and have it start the process all over again.
Have you tried condition variables? They are used like this:
from threading import Condition

condition = Condition()

def consumer_in_thread_A():
    condition.acquire()
    try:
        while resource_not_yet_available:
            condition.wait()
        # Here, the resource is available and may be
        # consumed
    finally:
        condition.release()

def produce_in_thread_B():
    # ... create resource, whatsoever
    condition.acquire()
    try:
        condition.notify_all()
    finally:
        condition.release()
Condition variables act as locks (acquire and release), but their main purpose is to provide a control mechanism that lets a thread wait until another thread calls notify or notify_all on them.
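As a runnable illustration of the same idea in Python 3, here is a small sketch where the "resource" is simply a list of items. The Mailbox class and its method names are invented for this example; Condition.wait_for and the with-statement form are standard library features.

```python
import threading

class Mailbox:
    """Hypothetical one-way hand-off between a producer and a consumer thread."""

    def __init__(self):
        self._condition = threading.Condition()
        self._items = []

    def put(self, item):
        # Producer side: publish the resource, then wake any waiters.
        with self._condition:
            self._items.append(item)
            self._condition.notify_all()

    def take(self, timeout=None):
        # Consumer side: sleep (without spinning) until an item exists.
        with self._condition:
            ready = self._condition.wait_for(lambda: self._items, timeout=timeout)
            if not ready:
                raise TimeoutError("no item arrived in time")
            return self._items.pop(0)
```

The waiting thread releases the underlying lock while it is blocked in wait_for, so the producer can acquire it to call put.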
I recently found out that calling time.sleep( X ) will lock the GIL for the entire time X and therefore freeze ALL python threads for that time period.
You found wrongly -- this is definitely not how it works. What's the source where you found this mis-information?
Anyway, then you clarify (in comments -- better edit your Q!) that you're using deferToThread and your problem with this is that...:
Well yes I defer the action to a thread and give twisted a callback. But the parent thread needs to wait for the whole series of sub threads to complete before it can move onto a new set of sub threads to spawn
So use as the callback a method of an object with a counter -- start it at 0, increment it by one every time you're deferring-to-thread and decrement it by one in the callback method.
When the callback method sees that the decremented counter has gone back to 0, it knows that we're done waiting "for the whole series of sub threads to complete" and then the time has come to "move on to a new set of sub threads to spawn", and thus, in that case only, calls the "spawn a new set of sub threads" function or method -- it's that easy!
E.g. (net of typos &c as this is untested code, just to give you the idea)...:
from twisted.internet import threads

class Waiter(object):
    def __init__(self, what_next, *a, **k):
        self.counter = 0
        self.what_next = what_next
        self.a = a
        self.k = k

    def one_more(self):
        self.counter += 1

    def do_wait(self, *dont_care):
        self.counter -= 1
        if self.counter == 0:
            # Pass this waiter along so spawn_all can reuse it for the next batch.
            self.what_next(self, *self.a, **self.k)

def spawn_one_thread(waiter, long_calculation, *a, **k):
    waiter.one_more()
    d = threads.deferToThread(long_calculation, *a, **k)
    d.addCallback(waiter.do_wait)

def spawn_all(waiter, list_of_lists_of_functions_args_and_kwds):
    if not list_of_lists_of_functions_args_and_kwds:
        return
    if waiter is None:
        waiter = Waiter(spawn_all, list_of_lists_of_functions_args_and_kwds)
    this_time = list_of_lists_of_functions_args_and_kwds.pop(0)
    for f, a, k in this_time:
        spawn_one_thread(waiter, f, *a, **k)

def start_it_all(list_of_lists_of_functions_args_and_kwds):
    spawn_all(None, list_of_lists_of_functions_args_and_kwds)
According to the Python source, time.sleep() does not hold the GIL.
http://code.python.org/hg/trunk/file/98e56689c59c/Modules/timemodule.c#l920
Note the use of Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS, as documented here:
http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock
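You can also check this empirically. The sketch below (Python 3, function name made up) starts several threads that each call time.sleep and measures the total wall-clock time. If sleep held the GIL, the sleeps would run one after another and the total would be roughly thread_count times seconds; since it does not, the total stays close to a single sleep.

```python
import threading
import time

def timed_parallel_sleep(thread_count=4, seconds=0.3):
    # Each thread does nothing but sleep for the same duration.
    sleepers = [
        threading.Thread(target=time.sleep, args=(seconds,))
        for _ in range(thread_count)
    ]
    started = time.monotonic()
    for sleeper in sleepers:
        sleeper.start()
    for sleeper in sleepers:
        sleeper.join()
    # Wall-clock time for all sleepers to finish.
    return time.monotonic() - started
```

With four threads sleeping 0.3 seconds each, a serialized run would need about 1.2 seconds.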
The threading module allows you to spawn a thread, which is then represented by a Thread object. That object has a join method that you can use to wait for the subthread to complete.
See http://docs.python.org/library/threading.html#module-threading
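A minimal sketch of waiting with join, in Python 3. The function name and the squaring "work" are placeholders for illustration only.

```python
import threading

def run_and_wait(count):
    results = [None] * count

    def work(index):
        # Placeholder computation; each thread writes only its own slot.
        results[index] = index * index

    workers = [threading.Thread(target=work, args=(i,)) for i in range(count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()   # blocks without busy-waiting until that thread ends
    return results
```

Once every join has returned, all results are guaranteed to be in place, so the parent thread can move on to the next batch.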
