Basic question about using threads here.
I'm modifying a program with 2 thread classes and I'd like to use a function defined in one class in both classes now.
As a thread newbie (only been playing with them for a few months) is it OK to move the function out of the thread class into the main program and just call it from both classes or do I need to duplicate the function in the other class that doesn't have it?
regards
Simon
You can call the same function from both threads. The issue to be aware of is modifying shared data from two threads at once. If the function attempts to modify the same data from both threads, you will end up with an unpredictable program.
So the answer to your question is, "it depends what the function does."
It certainly won't help to copy the function into both thread classes. What matters is what the function does, not how many copies of the code there are.
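As a rough illustration, here is a minimal sketch of one module-level function called from two different thread classes. All names (normalise, ReaderThread, WriterThread) are made up for the example. Because the function only touches its argument and its own local variables, the two threads cannot interfere with each other and no copy of the function is needed.

```python
import threading


def normalise(text):
    # Uses only its argument and locals, so concurrent calls cannot interfere.
    words = text.split()
    return " ".join(word.lower() for word in words)


class ReaderThread(threading.Thread):
    def __init__(self, raw):
        super().__init__()
        self.raw = raw
        self.result = None

    def run(self):
        self.result = normalise(self.raw)


class WriterThread(threading.Thread):
    def __init__(self, raw):
        super().__init__()
        self.raw = raw
        self.result = None

    def run(self):
        # Same function, different thread class: no duplication required.
        self.result = normalise(self.raw).upper()


reader = ReaderThread("Hello   World")
writer = WriterThread("Shared  Function")
for t in (reader, writer):
    t.start()
for t in (reader, writer):
    t.join()

print(reader.result)
print(writer.result)
```

If normalise wrote to a module-level variable instead of returning a value, the picture would change and some form of synchronisation would be needed.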
You might want to check out thread locking. In many languages, threads that call the same function or method can take a lock around it so that other threads can't run it at the same time. http://en.wikipedia.org/wiki/Lock_(computer_science)
I am facing a very weird problem (at least I think it is weird).
Basically I have a very simple function which is called from multiple threads concurrently. While calling the function from a single sequential thread seems to work correctly, when I call the function concurrently it seems to mess up data from different callers.
The actual code is hard to replicate standalone, I just wondered what the life/scope of python data structure is. If I call the same function multiple times concurrently their internal variables/arguments are independent from one another right?
I seem to remember running into a similar problem with recursive functions, where a function's data would persist from call to call. Could this be the case here?
Thanks
I have a function inside a threaded object. This function accepts several parameters, and I don't know whether, when many threads call it, one thread can change the parameter values of another thread.
I can use a lock, but only after the parameters have been assigned.
If the parameters are stored on the stack, I guess they will live inside each thread's stack. But if they live on the heap, how can I stop threads from changing another thread's function parameters?
Function parameters are put on the stack, and each thread has its own stack. You don't have to worry about their thread-safety.
However, all Python objects are stored on the heap; the stack merely holds references to such objects. If multiple threads are accessing one such mutable object they can still interfere with one another if the access is not synchronised somehow. This has nothing to do with how functions are called however.
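The distinction can be seen in a small sketch. The function and variable names are invented for illustration. Each call gets its own parameter names and locals, but the list passed to both threads is a single object that both calls can reach.

```python
import threading


def record(shared, label):
    # 'shared', 'label' and 'private' are per-call names, but 'shared'
    # refers to an object that the other thread can reach as well.
    private = [label]      # a new list per call: invisible to other calls
    shared.append(label)   # mutates the one list both threads were given
    return private


seen = []  # a single object, passed to both threads
threads = [
    threading.Thread(target=record, args=(seen, name)) for name in ("a", "b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(seen))  # both labels ended up in the same list
```

A single list.append happens to be safe in CPython, which is why this sketch needs no lock. Compound operations such as "read, compute, write back" on a shared object are a different matter and do need synchronisation.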
I have a question posted here, and I got it resolved.
My new question has to do with the code at the end that iterates through the modules in the directory and loads them dynamically:
modules = pkgutil.iter_modules(path=[os.path.join(path,'scrapers')])
for loader, mod_name, ispkg in modules:
    # Ensure that module isn't already loaded, and that it isn't the parent class
    if (mod_name not in sys.modules) and (mod_name != "Scrape_BASE"):
        # Import module
        loaded_mod = __import__('scrapers.'+mod_name, fromlist=[mod_name])
        # Load class from imported module. Make sure the module and the class are named the same
        class_name = mod_name
        loaded_class = getattr(loaded_mod, class_name)
        # only instantiate subclasses of Scrape_BASE
        if(issubclass(loaded_class,Scrape_BASE.Scrape_BASE)):
            # Create an instance of the class and run it
            instance = loaded_class()
            instance.start()
            instance.join()
            text = instance.GetText()
In most of the classes I am reading a PDF from a website, scraping the content and setting the text that is subsequently returned by GetText().
In some cases, the PDF is too big and I end up with a Segmentation Fault. Is there a way to monitor the threads to make them time-out after 3 minutes or so? Does anyone have a suggestion as to how I implement this?
The right way to do this is to change the code in those classes that you haven't shown us, so that they don't run forever. If that's possible, you should definitely do that. And if what you're trying to time out is "reading the PDF from a website", it's almost certainly possible.
But sometimes, it isn't possible; sometimes you're just, e.g., calling some C function that has no timeout. So, what do you do about that?
Well, threads can't be interrupted. So you need to use processes instead. multiprocessing.Process is very similar to threading.Thread, except that it runs the code in a child process instead of a thread in the same process.
This does mean that you can't share any global data with your workers without making it explicit, but that's generally a good thing. However, it does mean that the input data (which in this case doesn't seem to be anything) and the output (which seems to be a big string) have to be picklable, and explicitly passed over queues. This is pretty easy to do; read the Exchanging objects between processes section for details.
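One way this could look is sketched below. It is not the asker's code: run_with_timeout, _call_and_report and fake_scrape are hypothetical names, and fake_scrape merely stands in for the real scraping work. The sketch runs a function in a child process, passes the result back over a multiprocessing.Queue, and terminates the child if no result arrives in time.

```python
import multiprocessing
import queue
import time


def _call_and_report(func, args, results):
    # Runs in the child process. The return value must be picklable.
    results.put(func(*args))


def run_with_timeout(func, args=(), timeout=180.0):
    """Run func(*args) in a child process; raise TimeoutError if it overruns."""
    results = multiprocessing.Queue()
    worker = multiprocessing.Process(
        target=_call_and_report, args=(func, args, results)
    )
    worker.start()
    try:
        # Fetch the result before joining, so a large result cannot deadlock.
        value = results.get(timeout=timeout)
    except queue.Empty:
        worker.terminate()  # unlike a thread, a process can be killed
        worker.join()
        raise TimeoutError(f"no result after {timeout} seconds") from None
    worker.join()
    return value


def fake_scrape(delay):
    # Hypothetical stand-in for the real PDF-scraping code.
    time.sleep(delay)
    return "scraped text"


if __name__ == "__main__":
    print(run_with_timeout(fake_scrape, args=(0.1,), timeout=5.0))
    try:
        run_with_timeout(fake_scrape, args=(30.0,), timeout=0.5)
    except TimeoutError as exc:
        print("gave up:", exc)
```

In this sketch a child that crashes outright (for example with a segmentation fault) never puts anything on the queue, so it is also reported as a timeout once the wait expires. The parent process survives either way, which is the point of using a process rather than a thread.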
While we're at it, you may want to consider rethinking your design to think in terms of tasks instead of threads. If you have, say, 200 PDFs to download, you don't really want 200 threads; you want maybe 8 or 12 threads, all servicing a queue of 200 jobs. The multiprocessing module has support for process pools, but you may find concurrent.futures a better fit for this. Both multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor let you just pass a function and some arguments, and then wait for the results, without having to worry about scheduling or queues or anything else.
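A task-oriented version might look roughly like the sketch below, which uses concurrent.futures.ThreadPoolExecutor so that it runs anywhere without extra setup. fetch_length is a hypothetical placeholder for "download and scrape one PDF", and the numbers 200 and 8 are simply the ones from the paragraph above.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_length(job_id):
    # Hypothetical stand-in for "download and scrape one PDF".
    return job_id, len(f"contents of document {job_id}")


jobs = range(200)

# Eight workers service all 200 jobs; the executor owns the queue.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(fetch_length, jobs))

print(len(results))
```

ProcessPoolExecutor exposes the same map and submit interface, so swapping it in is mostly a one-line change, provided the function and its arguments are picklable and the pool is created under an if __name__ == "__main__": guard.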
I am working on a class which operates in a multithreaded environment, and looks something like this (with excess noise removed):
import multiprocessing

class B:
    @classmethod
    def apply(cls, item):
        cls.do_thing(item)

    @classmethod
    def do_thing(cls, item):
        'do something to item'

    def run(self):
        pool = multiprocessing.Pool()
        for list_of_items in self.data_groups:
            # Pool.map takes the function first, then the iterable
            pool.map(self.apply, list_of_items)
My concern is that two threads might call apply or do_thing at the same time, or that a subclass might try to do something stupid with cls in one of these functions. I could use staticmethod instead of classmethod, but calling do_thing would become a lot more complicated, especially if a subclass reimplements one of these but not the other. So my question is this: Is the above class thread-safe, or is there a potential problem with using classmethods like that?
Whether a method is thread safe or not depends on what the method does.
Working only with local variables is thread safe. But when you change the same non-local variable from different threads, it becomes unsafe.
‘do something to item’ seems to modify only the given object, which is independent from any other object in the list, so it should be thread safe.
If the same object is in the list several times, you may have to think about making the object thread safe. That can be done by using with self.object_scope_lock: in every method which modifies the object.
Anyway, what you are doing here is using processes instead of threads. In this case the objects are pickled and sent through a pipe to the other process, where they are modified and sent back. In contrast to threads, processes do not share memory. So I don’t think using a lock in the class method would have an effect.
http://docs.python.org/3/library/threading.html?highlight=threading#module-threading
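The effect of pickling can be imitated in a single process. The sketch below is only an approximation of what a process pool does to its arguments, and the item dictionary is invented for the example. It shows that the worker modifies a copy, so the caller's object stays unchanged unless the result is sent back.

```python
import pickle


def do_thing(item):
    item["value"] *= 2
    return item


original = {"value": 21}

# Roughly what happens to an argument on its way to a worker process.
worker_copy = pickle.loads(pickle.dumps(original))
returned = do_thing(worker_copy)

print(original["value"])  # still 21: only the copy was modified
print(returned["value"])  # 42: the change is visible in the returned object
```

This is why the return value of pool.map matters when processes are used: changes made to an item inside the worker are lost unless the function returns them.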
There's no difference between classmethods and regular functions (and instance methods) in this regard. Neither is automagically thread-safe.
If one or more classmethods/methods/functions can manipulate data structures simultaneously from different threads, you'd need to add synchronization protection, typically using threading.Locks.
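For the threaded case, a minimal sketch of such protection could look like this. The Counter class and its names are hypothetical. Every method that modifies the object takes the object's own lock first.

```python
import threading


class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()  # one lock per object

    def increment(self):
        # The read-modify-write below must not interleave between threads.
        with self._lock:
            self.value += 1


counter = Counter()


def work():
    for _ in range(10_000):
        counter.increment()


threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)
```

Whether increment is an instance method, a classmethod or a plain function makes no difference here. What matters is that every access to the shared value goes through the same lock.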
Both other answers are technically correct in that the safety of do_thing() depends on what happens inside the function.
But the more precise answer is that the call itself is safe. In other words, if apply() and do_thing() are pure functions, then your code is safe. Any unsafe-ness would be due to them not being pure functions (e.g. relying on or affecting a shared variable during execution).
As shx2 mentioned, classmethods are only "in" a class visually, for grouping. They have no inherent attachment to any instance of the class. Therefore this code is roughly equivalent in functioning:
import multiprocessing

def apply(item):
    do_thing(item)

def do_thing(item):
    'do something to item'

class B:
    def run(self):
        pool = multiprocessing.Pool()
        for list_of_items in self.data_groups:
            # Pool.map takes the function first, then the iterable
            pool.map(apply, list_of_items)
A further note on concurrency given the other answers:
threading.Lock is easy to understand, but should be your last resort. In naive implementations it is often slower than completely linear processing. Your code will usually be faster if you can use things like threading.Event, queue.Queue, or multiprocessing.Pipe to transfer information instead.
asyncio is the new hotness in python3. It's a bit more difficult to get right but is generally the fastest method.
If you want a great walkthrough of modern concurrency techniques in Python, check out core developer Raymond Hettinger's Keynote on Concurrency. The whole thing is great, but the downside of locks is highlighted starting at t=57:59.
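As one example of transferring information instead of locking it, here is a small sketch built on queue.Queue. The worker, the STOP sentinel and the squaring job are all invented for illustration. The two threads never touch a shared variable directly and only exchange items through the queues.

```python
import queue
import threading

jobs = queue.Queue()
results = queue.Queue()
STOP = object()  # sentinel telling the worker to finish


def worker():
    while True:
        item = jobs.get()
        if item is STOP:
            break
        results.put(item * item)


t = threading.Thread(target=worker)
t.start()
for n in range(5):
    jobs.put(n)
jobs.put(STOP)
t.join()

squares = [results.get() for _ in range(5)]
print(squares)
```

The queues do their own internal locking, so the calling code contains no explicit Lock at all.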
I have the following code for a click handler in my PyQt4 program:
def click_btn_get_info(self):
    task = self.window.le_task.text()
    self.statusBar().showMessage('Getting task info...')
    def thread_routine(task_id):
        order = self.ae.get_task_info(task_id)
        if order:
            info_str = "Customer: {email}\nTitle: {title}".format(**order)
            self.window.lbl_order_info.setText(info_str)
            self.statusBar().showMessage('Done')
        else:
            self.statusBar().showMessage('Authentication: failed!')
    thread = threading.Thread(target=thread_routine, args=(task,))
    thread.start()
Is it good practice to declare a function inside a function for use with threads?
In general, yes, this is perfectly reasonable. However, the alternative of creating a separate method (or, for top-level code, a separate function) is also perfectly reasonable. And so is creating a Thread subclass. So, there's no rule saying to always do one of the three; there are different cases where each one seems more reasonable than the others, but there's overlap between those cases, so it's usually a judgment call.
As Maxime pointed out, you probably want to use Qt's threading, not native Python threading, especially since you want to call methods on your GUI objects. The article Threads, Events and QObjects in the Qt documentation gives you an overview (although from a C++, not Python, viewpoint). And if you're using a QThread rather than a threading.Thread, it is much more common to use the OO approach (define a subclass of QThread and override its run method) than to define a function, which makes your question moot.
But if you do stick with Python threading, here's how I'd decide.
Pro separate method:
You're doing this in a method of a class, rather than in a function, and the only state you want to share with the new thread is self.
Non-trivial code, longer than the function it's embedded in.
Pro local function:
Pretty specific to the info button callback; no one else will ever want to call it.
I'd probably make it a method, but I wouldn't complain about someone else's code that made it a local function.
In a different case—e.g., if the thread needed access to a local variable that had no business being part of the object, or if it were a trivial function I could write as an inline lambda, or if this were a top-level function sharing globals rather than a method sharing self, I'd go the other direction.
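For comparison, the "separate method" variant might look roughly like the sketch below. It is deliberately GUI-free: InfoPanel, lookup and status are hypothetical stand-ins for the window class, self.ae.get_task_info and the status bar in the question.

```python
import threading


class InfoPanel:
    """Hypothetical, GUI-free stand-in for the window class in the question."""

    def __init__(self, lookup):
        self.lookup = lookup  # plays the role of self.ae.get_task_info
        self.status = ""

    def _load_info(self, task_id):
        # A separate method: the only state shared with the thread is self.
        order = self.lookup(task_id)
        self.status = "Done" if order else "Authentication: failed!"

    def click_btn_get_info(self, task_id):
        self.status = "Getting task info..."
        thread = threading.Thread(target=self._load_info, args=(task_id,))
        thread.start()
        return thread


def fake_lookup(task_id):
    # Pretend that only task "42" exists.
    return {"title": "demo"} if task_id == "42" else None


panel = InfoPanel(lookup=fake_lookup)
panel.click_btn_get_info("42").join()
print(panel.status)
```

The thread target is now a bound method instead of a closure, so it can be tested or reused on its own. In a real Qt program the widget updates would still need to happen on the GUI thread, which is one more reason to prefer Qt's own threading there.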