Let's assume I have a class:
class Helper(object):
    def __init__(self):
        self.__model = None

    @property
    def lazy_prop(self):
        if not self.__model:
            self.__model = init()
        return self.__model
    ...
and I have a function:
def action(data):
    # handle some actions that use Helper
    ...
and I have some data that needs to be handled with the action function, like this:
data = ['bla-bla', 'foo-foo']
pool = Pool()
pool.imap(action, data)
The problem is that the lazy property gets initialized many times instead of just once. Why does that happen, and how can I fix it?
With multiprocessing, if you spawn multiple jobs, then Helper.lazy_prop will be initialized in each Process, since every worker operates on its own copy of the Helper.
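One common way to keep the cost down (a minimal sketch, not from the original answer; init_worker and the module-level _helper are illustrative names) is to build the Helper once per worker with a Pool initializer, so the expensive initialization happens at most once per process rather than once per task:

import multiprocessing as mp

_helper = None  # one Helper per worker process

def init_worker():
    """Pool initializer: runs exactly once in each worker process."""
    global _helper
    _helper = Helper()

def action(data):
    # _helper.lazy_prop is initialized at most once per worker here
    model = _helper.lazy_prop
    # ... handle data with model ...

if __name__ == '__main__':
    data = ['bla-bla', 'foo-foo']
    with mp.Pool(initializer=init_worker) as pool:
        list(pool.imap(action, data))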
I'm trying to write a class to help with buffering some data that takes a while to read in and needs to be periodically updated. The Python version is 3.7.
There are 3 criteria I would like the class to satisfy:
Manual update: An instance of the class should have an 'update' function, which reads in new data.
Automatic update: An instance's update method should be periodically run, so the buffered data never gets too old. As reading takes a while, I'd like to do this without blocking the main process.
Self contained: Users should be able to inherit from the class and override the method for refreshing data, i.e. the automatic updating should work out of the box.
I've tried having instances create their own subprocess for running the updates. This causes problems because simply passing the instance to another process seems to create a copy, so the desired instance is not updated automatically.
Below is an example of the approach I'm trying. Can anyone help getting the automatic update to work?
import multiprocessing as mp
import random
import time

def refresh_helper(buffer, lock):
    """Periodically calls the refresh method on a buffer instance."""
    while True:
        with lock:
            buffer._refresh_data()
        time.sleep(10)
class Buffer:
    def __init__(self):
        # Set up a helper process to periodically update data
        self.lock = mp.Lock()
        self.proc = mp.Process(target=refresh_helper, args=(self, self.lock), daemon=True)
        self.proc.start()

        # Do an initial update
        self.data = None
        self.update()

    def _refresh_data(self):
        """Pretends to read in some data. This would take a while for real data."""
        numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]
        data = [random.choice(numbers) for _ in range(3)]
        self.data = data

    def update(self):
        with self.lock:
            self._refresh_data()

    def get_data(self):
        return self.data
if __name__ == '__main__':
    buffer = Buffer()
    data_first = buffer.get_data()
    time.sleep(11)
    data_second = buffer.get_data()  # should be different from the first
Here is an approach that makes use of a multiprocessing queue. It's similar to what you had implemented, but your implementation was trying to assign to self within Buffer._refresh_data in both processes. Because self refers to a different Buffer object in each process, the two copies did not affect each other.
To send data from one process to another you need to use shared memory, pipes, or some other such mechanism. Python's multiprocessing library provides multiprocessing.Queue, which simplifies this for us.
To send data from the refresh helper to the main process we need only use queue.put in the helper process, and queue.get in the main process. The data being sent must be serializable with Python's pickle module to travel between processes through a multiprocessing.Queue.
Using a multiprocessing.Queue also saves us from having to use locks ourselves, since the queue handles that internally.
To handle the helper process starting and stopping cleanly for the example, I have added __enter__ and __exit__ methods to make Buffer into a context manager. They can be removed if you would rather manually stop the helper process.
I have also changed your _refresh_data method into _get_new_data, which returns new data half the time, and has no new data to give the other half of the time (i.e. it returns None). This was done to make it more similar to what I imagine a real application for this class would be.
It is important that only static/class methods or external functions are called from the other process, as otherwise they may operate on a self attribute that refers to a completely different instance. The exception is if the attribute is meant to be sent across the process barrier, like with self.queue. That is why the update method can use self.queue to send data to the main process despite self being a different Buffer instance in the other process.
The method get_next_data will return the oldest item found in the queue. If there is nothing in the queue, it will wait until something is added to the queue. You can change this behaviour by giving the call to self.queue.get a timeout (which causes queue.Empty to be raised if it times out), or by using self.queue.get_nowait (which raises queue.Empty immediately if the queue is empty).
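For reference, a small sketch of those two non-blocking variants, assuming buffer is an instance of the Buffer class below (the Empty exception lives in the standard queue module, even for multiprocessing.Queue):

import queue

try:
    item = buffer.queue.get(timeout=1.0)  # wait up to one second for data
except queue.Empty:
    item = None

try:
    item = buffer.queue.get_nowait()  # don't wait at all
except queue.Empty:
    item = None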
from __future__ import annotations

import multiprocessing as mp
import random
import time


class Buffer:
    def __init__(self):
        self.queue = mp.Queue()
        self.proc = mp.Process(target=self._refresh_helper, args=(self,))
        self.update()

    def __enter__(self):
        self.proc.start()
        return self

    def __exit__(self, ex_type, ex_val, ex_tb):
        self.proc.kill()
        self.proc.join()

    @staticmethod
    def _refresh_helper(buffer: "Buffer", period: float = 1.0) -> None:
        """Periodically calls the update method on a buffer instance."""
        while True:
            buffer.update()
            time.sleep(period)

    @staticmethod
    def _get_new_data() -> list[int] | None:
        """Pretends to read in some data. This would take a while for real data."""
        if random.randint(0, 1):
            return random.choices(range(10), k=3)
        return None

    def update(self) -> None:
        new_data = self._get_new_data()
        if new_data is not None:
            self.queue.put(new_data)

    def get_next_data(self):
        return self.queue.get()


if __name__ == '__main__':
    with Buffer() as buffer:
        for _ in range(5):
            print(buffer.get_next_data())
Running this code will start the helper process, then print out the first 5 pieces of data it gets from the buffer. The first one will be from the update that is performed when the buffer is initialized. The others will all be provided by the helper process running update.
Let's review your criteria:
Manual update: An instance of the class should have an 'update' function, which reads in new data.
The Buffer.update method can be used for this.
Automatic update: An instance's update method should be periodically run, so the buffered data never gets too old. As reading takes a while, I'd like to do this without blocking the main process.
This is done by a helper process which adds data to a queue for later processing. If you would rather throw away old data and only process the newest data, then the queue can be swapped out for a multiprocessing.Array, or whatever other multiprocessing-compatible shared memory wrapper you prefer.
Self contained: Users should be able to inherit from the class and override the method for refreshing data, i.e. the automatic updating should work out of the box.
This works by overriding the _get_new_data method. As long as it's a static or class method that returns the data, automatic updating will work with it without any further changes.
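As a quick illustration, a hypothetical subclass of the Buffer class above might plug in its own data source like this (the file path and parsing are made up for the example):

class FileBuffer(Buffer):
    """Hypothetical subclass that reads its data from a file."""

    @staticmethod
    def _get_new_data() -> list[int] | None:
        try:
            with open('/tmp/data.txt') as f:  # illustrative data source
                return [int(line) for line in f]
        except (OSError, ValueError):
            return None  # no new data this cycle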
All processes exist in different areas of memory from one another, each of which is meant to be fully separate from all others. As you pointed out, the additional process creates a copy of the instance on which it operates, meaning the updated version exists in a separate memory space from the instance you're running get_data() on. Because of this there is no easy way to perform this operation on this specific instance from a different process.
Given that you want the updating of the data not to block the checking of the data, you cannot rely on threading, as the global interpreter lock allows only one thread to execute Python code at a time in any given process. Instead, you need to use an object which exists in a memory space shared between all processes. To do this, you can use a multiprocessing.Value object or a multiprocessing.Array, both of which store ctypes objects. Both of these objects exist in Python 3.7.
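A minimal sketch of that idea (the names and the three-integer payload are illustrative, not from the question):

import multiprocessing as mp
import random
import time

def refresher(shared):
    """Runs in a separate process and writes into shared memory."""
    while True:
        with shared.get_lock():  # a synchronized mp.Array carries its own lock
            for i in range(len(shared)):
                shared[i] = random.randint(1, 9)
        time.sleep(10)

if __name__ == '__main__':
    shared = mp.Array('i', 3)  # three C ints in shared memory
    proc = mp.Process(target=refresher, args=(shared,), daemon=True)
    proc.start()
    time.sleep(1)
    with shared.get_lock():
        print(list(shared))  # the main process sees the updated values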
If this approach does not work, consider examining these similar threads:
Sharing a complex object between processes?
multiprocessing: sharing a large read-only object between processes?
Good luck with your project!
This is the code:
import threading

class foo:
    def levelOne(self):
        def worker(self, i):
            print('doing hard work')

        def writer(self):
            print('writing workers work')

session = foo()
i = 0
threads = list()
for i in range(0, 5):
    thread = threading.Thread(target=session.levelOne.worker, args=(i,))
    thread.start()
    threads.append(thread)

writerThread = threading.Thread(target=session.levelOne.writer)
writerThread.start()

for thread in threads:
    thread.join()
writerThread.join()
5 workers should do the job and the writer should collect their results.
The error I get is: session object has no attribute worker
The workers are actually testers that do certain work in different "areas", while the writer keeps track of them without making my workers return any result.
It's important for this algorithm to be divided into layers like "levelOne", "levelTwo" etc. because they will all work together. This is the main reason why I keep the threading outside the class instead of inside the levelOne method.
Please help me understand where I'm wrong.
You certainly don't have "session object has no attribute worker" as the error message with the code you posted; the error should be "'function' object has no attribute 'worker'". And actually I don't know why you'd expect anything else: names defined within a function are local variables (hint: Python functions are objects just like any other), and they do not become attributes of the function.
It's important for this algorithm to be divided on layers like "levelOne", "levelTwo"
Well, possibly, but that's not the proper design. If you want foo to be nothing but a namespace, and levelOne, levelTwo etc. to be instances of some type having both worker and writer methods, then you need to 1/ define your LevelXXX as classes, 2/ build instances of those classes as attributes of your foo class, i.e.:
class LevelOne():
    def worker(self, i):
        ...  # do the work

    def writer(self):
        ...  # write the results

class foo():
    levelOne = LevelOne()
Now whether this is the correct design for your use case is not guaranteed in any way, but it's impossible to design a proper solution without knowing anything about the problem...
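With that structure, the threading code from the question works as intended; a small sketch reusing the names above:

import threading

session = foo()
threads = [threading.Thread(target=session.levelOne.worker, args=(i,)) for i in range(5)]
for thread in threads:
    thread.start()

writer_thread = threading.Thread(target=session.levelOne.writer)
writer_thread.start()

for thread in threads:
    thread.join()
writer_thread.join()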
If it's possible could you explain why trying to access workers and writer as shown in question's code is bad design?
Well, for the mere reason that it doesn't work, to start with, obviously xD.
Note that you could return the "worker" and "writer" functions from the levelOne method, i.e.:
class foo:
    def levelOne(self):
        def worker(self, i):
            print('doing hard work')

        def writer(self):
            print('writing workers work')

        return worker, writer

session = foo()
worker, writer = session.levelOne()
# etc
but this is both convoluted (assuming the point is to let worker and writer share self, which is much more simply done using a proper LevelOne class and making worker and writer methods of this class) and inefficient (def is an executable statement, so with your solution the worker and writer functions are created anew, which is not free, on each call).
I have a function foo that takes a parameter stuff.
Stuff can be something in a database, and I'd like to create a function that takes a stuff_id, gets the stuff from the db, and executes foo.
Here's my attempt to solve it:
1/ Create a second function with the suffix _from_stuff_id
def foo(stuff):
    ...  # do something

def foo_from_stuff_id(stuff_id):
    stuff = get_stuff(stuff_id)
    foo(stuff)
2/ Modify the first function
def foo(stuff=None, stuff_id=None):
    if stuff_id:
        stuff = get_stuff(stuff_id)
    ...  # do something
I don't like either way.
What's the most Pythonic way to do it?
Assuming foo is the main component of your application, go with your first way. Each function should have a single purpose. The moment you combine multiple purposes into one function, you can easily get lost in long streams of code.
If, however, some other function can also provide stuff, then go with the second.
The only thing I would add is to make sure you add docstrings (PEP-257) to each function to explain the role of the function in words. If necessary, you can also add comments to your code.
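For instance, a one-line PEP-257 docstring on the wrapper (a sketch using the names from the question):

def foo_from_stuff_id(stuff_id):
    """Fetch the stuff for stuff_id and run foo on it."""
    foo(get_stuff(stuff_id))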
I'm not a big fan of type overloading in Python, but this is one of the cases where I might go for it if there's really a need:
def foo(stuff):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
With type annotations it would look like this:
from typing import Union

def foo(stuff: Union[int, Stuff]):
    if isinstance(stuff, int):
        stuff = get_stuff(stuff)
    ...
It basically depends on how you've defined all these functions. If you're importing get_stuff from another module, the second approach is more Pythonic, because from an OOP perspective you create functions for one particular purpose, and since get_stuff is already defined you don't need to wrap it in another function.
If get_stuff is not defined in another module, then it depends on whether you are using classes or not. If you're using a class and you want to use all these pieces together, you can use a method for either accessing or connecting to the database and use that method within other methods like foo.
Example:
from some_module import get_stuff

class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        self.stuff_id = kwargs['stuff_id']

    def foo(self):
        stuff = get_stuff(self.stuff_id)
        # do stuff
Or, if the functionality of foo depends on the existence of stuff, you can store stuff on the instance and simply check its validity:
class MyClass:
    def __init__(self, *args, **kwargs):
        # ...
        _stuff_id = kwargs['stuff_id']
        self.stuff = get_stuff(_stuff_id)  # can return None

    def foo(self):
        if self.stuff:
            ...  # do stuff
        else:
            ...  # do other stuff
Or, another neat design pattern for such situations is a dispatcher function (or method in a class) that delegates the execution to different functions based on the state of stuff:
def delegator(stuff, stuff_id):
    if stuff:  # or some other condition
        foo(stuff)
    else:
        foo(get_stuff(stuff_id))
What I want to do is get results from function calls in real time.
For example, I want to get the value of i from the model class in real time. However, if I use return, I can only get the result of i once.
import threading

class model(object):
    """docstring for model"""
    def __init__(self):
        pass

    def func(self):
        for i in range(1000):
            print('i', i)
        return i

class WorkThread(threading.Thread):
    # trigger = pyqtSignal()
    def __init__(self):
        super(WorkThread, self).__init__()

    def run(self):
        model1 = model()
        # I want to get `i` from the model class in real time;
        # however, return can only give it once.
        result = model1.func()
        print('result', result)

if __name__ == '__main__':
    work_thread = WorkThread()
    work_thread.start()
    for j in range(1000, 2000):
        print('j', j)
Does anyone have a good idea? Hoping for some help.
You have several options; you could:
Use a generator function to produce the results as you iterate. This requires that the caller loop over the generator returned by the model1.func() call. Use this if you don't need access to the data from another thread.
Use a queue; push i results into the queue as you produce them, and another thread can receive them from the queue.
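A minimal sketch of the queue option, adapted to the code above (the sentinel value and the names are my own choices, not the only way to wire this up):

import queue
import threading

class Model:
    def func(self, results):
        for i in range(1000):
            results.put(i)  # publish each value as it is produced
        results.put(None)  # sentinel: no more values

def worker(results):
    Model().func(results)

if __name__ == '__main__':
    results = queue.Queue()
    threading.Thread(target=worker, args=(results,)).start()
    while True:
        value = results.get()  # received in real time
        if value is None:
            break
        print('i', value)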
I spent the last hour(s?) looking/googling for a way to have a class start one of its methods in a new thread as soon as it is instantiated.
I could run something like this:
from threading import Thread
from time import sleep

x = myClass()

def updater():
    while True:
        x.update()
        sleep(0.01)

update_thread = Thread(target=updater)
update_thread.daemon = True
update_thread.start()
A more elegant way would be to have the class do it in __init__ when it is instantiated.
Imagine having 10 instances of that class...
So far I haven't found a (working) solution for this problem...
The actual class is a timer, and the method is an update method that updates all the counter's variables. As this class also has to run functions at a given time, it is important that the time updates aren't blocked by the main thread.
Any help is much appreciated. Thx in advance...
You can subclass directly from Thread in this specific case:
from threading import Thread
from time import sleep

class MyClass(Thread):
    def __init__(self, *args, **kwargs):
        super(MyClass, self).__init__()
        self.daemon = True
        self.cancelled = False
        # do other initialization here

    def run(self):
        """Overloaded Thread.run, runs the update
        method once every 10 milliseconds."""
        while not self.cancelled:
            self.update()
            sleep(0.01)

    def cancel(self):
        """End this timer thread."""
        self.cancelled = True

    def update(self):
        """Update the counters."""
        pass

my_class_instance = MyClass()
# explicit start is better than implicit start in constructor
my_class_instance.start()
# you can kill the thread with
my_class_instance.cancel()
In order to run a function (or member function) in a thread, use this:
from threading import Thread

th = Thread(target=some_func)
th.daemon = True
th.start()
Compared with deriving from Thread, this has the advantage that you don't export all of Thread's public functions as your own public functions. Actually, you don't even need to write a class to use this code; self.function or global_function are both equally usable as target here.
I'd also consider using a context manager to start/stop the thread, otherwise the thread might stay alive longer than necessary, leading to resource leaks and errors on shutdown. Since you're putting this into a class, start the thread in __enter__ and join with it in __exit__.
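A minimal sketch of that suggestion (the Event-based stop flag is my addition so the thread can be joined cleanly; update() mirrors the method from the question):

from threading import Event, Thread
from time import sleep

class Updater:
    def __init__(self):
        self._stop = Event()
        self._thread = Thread(target=self._run)

    def _run(self):
        while not self._stop.is_set():
            self.update()
            sleep(0.01)

    def update(self):
        """Update the counters here."""
        pass

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._stop.set()
        self._thread.join()

with Updater() as updater:
    sleep(0.1)  # the update thread runs only within this block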