I am trying to pass data in threading.local() to functions in a different module.
Code is something like this:
other_module.py:
import threading

# 2.1
ll = threading.local()

def other_fn():
    # 2.2
    ll = threading.local()

    v = getattr(ll, "v", None)
    print(v)
main_module.py:
import threading
import other_module

# 1.1
ll = threading.local()

def main_fn(v):
    # 1.2
    ll = threading.local()

    ll.v = v
    other_module.other_fn()

for i in [1, 2]:
    t = threading.Thread(target=main_fn, args=(i,))
    t.start()
But none of the combinations 1.x + 2.x works for me.
I have found a similar question - Access thread local object in different module - Python - but the reply marked as the answer does not work for me either when the print_message function is located in a different module.
Is it possible to pass thread-local data between modules without passing it as a function argument?
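Note that every call to threading.local() creates a brand-new, unrelated object: combinations 1.2 and 2.2 make a fresh local on every call, while 1.1 and 2.1 give each module its own separate local. A minimal sketch of the arrangement that does work - both modules importing one shared instance (the shared_local module name is illustrative):

shared_local.py:

import threading

ll = threading.local()

other_module.py:

from shared_local import ll

def other_fn():
    # same local object, same thread: sees the value set by main_fn
    print(getattr(ll, "v", None))

main_module.py:

import threading
from shared_local import ll
from other_module import other_fn

def main_fn(v):
    ll.v = v   # stored per-thread on the one shared local
    other_fn()

for i in [1, 2]:
    threading.Thread(target=main_fn, args=(i,)).start()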
In a similar situation I ended up doing the following in a separate module:
import threading
from collections import defaultdict

tls = defaultdict(dict)

def get_thread_ctx():
    """Get the thread-local, global context"""
    return tls[threading.get_ident()]
This essentially creates a global variable called tls. Each thread (based on its identity) then gets a key in that global dict, and I treat each per-thread entry as a dict as well. Example:
from threading import Thread

class Test(Thread):
    def __init__(self):
        super().__init__()
        # note: we cannot initialize the thread-local context here, since
        # the thread is not running yet

    def run(self):
        # Get the thread context
        tmp = get_thread_ctx()
        # Create an app-specific entry
        tmp["probe"] = {}
        self.ctx = tmp["probe"]
        while True:
            ...
Now, in a different module:
def get_thread_settings():
    ctx = get_thread_ctx()
    probe_ctx = ctx.get("probe", None)
    if probe_ctx is None:
        return {}
    # Get what you need from the app-specific region of this thread
    return probe_ctx.get("settings", {})
Hope it helps the next one looking for something similar.
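One caveat with the tls dict above: entries keyed by thread ident are never removed when a thread exits, and thread idents can be reused. If that matters, a small companion helper (the name is illustrative) can be called at the end of run():

def drop_thread_ctx():
    # remove this thread's entry so dead threads don't leave stale data
    tls.pop(threading.get_ident(), None)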
I've been struggling with this for a couple of months now and have tried a lot of different things to alleviate it, but I'm not sure what to do anymore. All the examples I see are different from what I need, and in my case they just wouldn't work.
To preface the problem: I have processor applications that get spawned by a manager as docker containers. Each processor is a single class that runs in a forever while loop, processing the same list of items over and over again and running a function on them. The code I'm working with is quite large, so I created a smaller version of the problem below.
This is how I create my engine:
db.py
from os import getpid
from pymongo import MongoClient

_mongo_client = None
_mongo_client_pid = None

def get_mongodb_uri(MONGO_DB_HOST='localhost', MONGO_DB_PORT=27017) -> str:
    return 'mongodb://{}:{}/{}'.format(MONGO_DB_HOST, MONGO_DB_PORT, 'taskprocessor')

def get_db_engine():
    global _mongo_client, _mongo_client_pid
    curr_pid = getpid()
    if curr_pid != _mongo_client_pid:
        # recreate the client whenever we find ourselves in a new process
        _mongo_client = MongoClient(get_mongodb_uri(), connect=False)
        _mongo_client_pid = curr_pid
    return _mongo_client

def get_db(name):
    return get_db_engine()['taskprocessor'][name]
These are my DB models
processor.py
from uuid import uuid4
from taskprocessor.db import get_db

class ProcessorModel():
    db = get_db("processors")

    def __init__(self, **kwargs):
        self.uid = kwargs.get('uid', str(uuid4()))
        self.exceptions = kwargs.get('exceptions', [])
        self.to_process = kwargs.get('to_process', [])
        self.functions = kwargs.get('functions', ["int", "round"])

    def save(self):
        return self.db.insert_one(self.__dict__).inserted_id is not None

    @classmethod
    def get(cls, uid):
        res = cls.db.find_one(dict(uid=uid))
        return ProcessorModel(**res)
result.py
from uuid import uuid4
from taskprocessor.db import get_db

class ResultModel():
    db = get_db("results")

    def __init__(self, **kwargs):
        self.uid = kwargs.get('uid', str(uuid4()))
        self.res = kwargs.get('res', dict())

    def save(self):
        return self.db.insert_one(self.__dict__).inserted_id is not None
And my main.py, which gets started as a docker container and runs a forever loop:
import os
from time import sleep
from multiprocessing import Pool

from taskprocessor.db.processor import ProcessorModel
from taskprocessor.db.result import ResultModel

class Processor:
    def __init__(self):
        self.id = os.getenv("PROCESSOR_ID")
        self.db_model = ProcessorModel.get(self.id)
        self.to_process = self.db_model.to_process  # list of floats [1.23, 1.535, 1.33499, 242.2352, 352.232]
        self.functions = self.db_model.functions  # list, i.e. ["round", "int"]

    def run(self):
        while True:
            try:
                pool = Pool(2)
                res = list(pool.map(self.analyse, self.to_process))
                print(res)
                sleep(100)
            except Exception as e:
                self.db_model = ProcessorModel.get(os.getenv("PROCESSOR_ID"))
                self.db_model.exceptions.append(f"exception {e}")
                self.db_model.save()
                print("Exception")

    def analyse(self, item):
        res = {}
        for func in self.functions:
            if func == "round":
                res['round'] = round(item)
            if func == "int":
                res['int'] = int(item)
        ResultModel(res=res).save()
        return res

if __name__ == "__main__":
    p = Processor()
    p.run()
I've tried setting connect=False, and even tried closing the connection after the configuration, but then I end up with connection-closed errors. I also tried a scheme of recognizing the PID and handing out a different client per process, but that still did not help.
Almost all the examples I see are ones where DB access is not needed before the multiprocessing fork. In my case the initial configuration is heavy, and it would not be efficient to redo it on every iteration of the processing loop. Furthermore, the items to process themselves depend on data from the DB.
I can live with not being able to save the exceptions to the db object from the main PID.
I'm seeing error logs about fork safety, as well as hitting "connection pool paused" errors, as symptoms of this issue.
In case anybody sees this: I was using pymongo 4.0.2, upgraded to 4.3.3, and am no longer seeing the errors I was previously seeing.
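Independent of the version bump, the commonly recommended pattern with pymongo and multiprocessing is to ensure each worker process creates its own MongoClient after the fork, for example via a Pool initializer. A minimal sketch of that pattern, assuming a local MongoDB; the names here are illustrative rather than the poster's actual code:

from multiprocessing import Pool

from pymongo import MongoClient

_client = None  # one client per worker process

def _init_worker():
    # runs once inside each freshly started worker, after the fork
    global _client
    _client = MongoClient('mongodb://localhost:27017/')

def analyse(item):
    # the worker-local client is safe to use here
    db = _client['taskprocessor']
    db['results'].insert_one({'res': {'round': round(item), 'int': int(item)}})
    return item

if __name__ == '__main__':
    with Pool(2, initializer=_init_worker) as pool:
        print(pool.map(analyse, [1.23, 4.56]))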
I know this is possible using thread-local storage in Python, but for some reason I am unable to find the exact syntax to achieve it. I have the following sample code to test this, but it is not working:
module1.py
import threading

def print_value():
    # What should I put here? This actually creates a new thread local
    # instead of returning the thread local created in the main block of module2.
    local = threading.local()
    print(local.name)
module2.py
import threading
import module1

if __name__ == '__main__':
    local = threading.local()
    local.name = 'Shailendra'
    module1.print_value()
Edit1 - The shared data should be available only to the thread that invokes these functions, not to all the threads in the system. One example is a request id in a web application.
In module1, define a global variable that is a threading.local:
module1
import threading

shared = threading.local()

def print_value():
    print(shared.name)
module2
import module1

if __name__ == '__main__':
    module1.shared.name = 'Shailendra'
    module1.print_value()
If it's within the same process, why not use a singleton?
import functools

def singleton(cls):
    ''' Use class as singleton. '''
    cls.__new_original__ = cls.__new__

    @functools.wraps(cls.__new__)
    def singleton_new(cls, *args, **kw):
        it = cls.__dict__.get('__it__')
        if it is not None:
            return it
        cls.__it__ = it = cls.__new_original__(cls, *args, **kw)
        it.__init_original__(*args, **kw)
        return it

    cls.__new__ = singleton_new
    cls.__init_original__ = cls.__init__
    cls.__init__ = object.__init__
    return cls

@singleton
class Bucket(object):
    pass
Now just import Bucket and bind some data to it:
from mymodule import Bucket
b = Bucket()
b.name = 'bob'
b.loves_cats = True
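One caveat in a threaded program: singleton_new checks and sets cls.__it__ without a lock, so two threads calling Bucket() at the same time could race and build two instances. A hedged variant of the inner function using double-checked locking (a drop-in replacement inside the decorator above):

import threading

_singleton_lock = threading.Lock()

def singleton_new(cls, *args, **kw):
    it = cls.__dict__.get('__it__')
    if it is not None:
        return it
    with _singleton_lock:
        # re-check under the lock so only one thread creates the instance
        it = cls.__dict__.get('__it__')
        if it is None:
            cls.__it__ = it = cls.__new_original__(cls, *args, **kw)
            it.__init_original__(*args, **kw)
        return it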
I am trying to add references to a function to a set (in the exposed_setCallback method).
The output is given at the end. Somehow, it is not adding the reference on the second attempt. The links to the source files are:
http://pastebin.com/BNde5Cgr
http://pastebin.com/aCi6yMT9
Below is the code:
import rpyc

test = ['hi']
myReferences = set()

class MyService(rpyc.Service):
    def on_connect(self):
        """Think of this as a constructor of the class, but with
        a new name so as not to 'overload' the parent's init"""
        self.fn = None

    def exposed_setCallback(self, fn):
        self.fn = fn  # Saves the remote function for calling later
        print(self.fn)
        myReferences.add(self.fn)
        print(myReferences)
        for x in myReferences:
            print(x)

if __name__ == "__main__":
    # lists are pass-by-reference, so the same 'test'
    # will be available to all threads
    # While not required, think about locking!
    from rpyc.utils.server import ThreadedServer
    t = ThreadedServer(MyService, port=18888)
    t.start()
Output:
<function myprint at 0x01FFD370>
set([<function myprint at 0x01FFD370>])
<function myprint at 0x01FFD370>
<function myprint at 0x022DD370>
set([<function myprint at 0x022DD370>,
Please help
I think the issue is that you have a ThreadedServer, which is of course going to be multithreaded.
However, Python sets are not thread-safe (they must not be accessed by multiple threads at the same time), so you need to take a lock whenever you access the set. You use the lock with a Python context manager (the with statement), which handles acquiring and releasing the lock for you; the Lock itself can only be held by one context manager at a time, thus preventing simultaneous access to your set. See the modified code below:
import rpyc
import threading

test = ['hi']
myReferences = set()
myReferencesLock = threading.Lock()

class MyService(rpyc.Service):
    def on_connect(self):
        """Think of this as a constructor of the class, but with
        a new name so as not to 'overload' the parent's init"""
        self.fn = None

    def exposed_setCallback(self, fn):
        self.fn = fn  # Saves the remote function for calling later
        print(self.fn)
        with myReferencesLock:
            myReferences.add(self.fn)
        with myReferencesLock:
            print(myReferences)
            for x in myReferences:
                print(x)

if __name__ == "__main__":
    # lists are pass-by-reference, so the same 'test'
    # will be available to all threads
    # While not required, think about locking!
    from rpyc.utils.server import ThreadedServer
    t = ThreadedServer(MyService, port=18888)
    t.start()
Welcome to the world of threaded programming. Make sure you protect data shared between threads with locks!
If you want to rebind a global variable, you should use a global statement at the top of your function (strictly speaking, a call like myReferences.add(...) mutates the set in place and works without it, but the declaration makes the intent explicit):
def exposed_setCallback(self, fn):
    global myReferences
    self.fn = fn  # Saves the remote function for calling later
    print(self.fn)
    myReferences.add(self.fn)
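For what it's worth, global only matters when the name is rebound; a minimal sketch of the distinction:

counter = 0
items = set()

def rebind():
    global counter    # required: the name is assigned a new object
    counter += 1

def mutate():
    items.add(1)      # no global needed: the existing set is mutated in place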
I've been searching for an answer to this problem for a few hours but couldn't solve it, so I have to post this question here; I'm sure it's trivial.
The project I work with has many classes and threads, and I'm adding small classes to it. Those classes are executed in different threads by the project's engine, but I need them to synchronize with each other - that is, class A should be able to send a message to class B. They are also in different modules.
EDIT2: there is a new explanation of this question: look at the bottom.
I am a real beginner in Python, and I tried to solve this by sharing a queue object (queue.Queue()) and examining its contents in endless loops. I made a very simple module with this object and get and put methods:
messenger module:
import queue

MessageQueue = queue.Queue()

def GetMessage():
    return MessageQueue.get()

def PutMessage(message):
    MessageQueue.put(message)
and used it in two different classes (import messenger). But since it's not a global variable, I assumed that the 'MessageQueue' object has a different instance in each class, because those classes seemed to be working on different queues.
How do I synchronize two classes with such an object between them (and maybe there is a prettier way than just making this queue global)?
EDIT1 - here are the modules:
Module A:

from utils import messenger as m

class Foo():
    [...]
    def foo(self):
        [...]
        m.PutMessage(message)

Module B:

from utils import messenger

class Bar():
    [...]
    def bar(self):
        [...]
        while True:
            print(str(messenger.GetMessage()))
EDIT2: Since I understand my problem a bit better now, here is an update:
Both classes are run as distinct programs in different processes (which may explain why they are not sharing global variables :)).
So the problem remains: how do I synchronize between two different programs? The only solution I can think of is to write a file to disk and have the processes read it, but that seems very unreliable (locking, etc.) and slow.
Can you suggest a different approach?
Ok, I solved the problem using the ZeroMQ library.
Node A, the publisher:
import time
import zmq
from datetime import datetime

context = zmq.Context()

# create this node as a publisher
socket = context.socket(zmq.PUB)
socket.bind("tcp://*:25647")

for i in range(300):
    message = "%d, %d" % (1, i)
    print(str(datetime.now().time()) + "> sending: " + message)
    socket.send_string(message)
    time.sleep(1)
Node B, the receiver:
import zmq
from datetime import datetime

context = zmq.Context()
socket = context.socket(zmq.SUB)
socket.connect("tcp://localhost:25647")

# filter messages for this particular subscriber ('1')
socket.setsockopt_string(zmq.SUBSCRIBE, '1')

while True:
    message = socket.recv_string()
    print(str(datetime.now().time()) + "> received: " + message)
This setup does what I wanted: it conveys a signal from one program to another, and it does so in quite good time (this very simple message, a tuple of two integers, is sent in around 0.5 ms).
Two important things:
- the subscriber has to be "authorized" to receive the message - this is done by filtering on the first value of the message
- the publisher "binds" to the socket, while the subscriber "connects" to it
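One thing to be aware of with this pattern: ZeroMQ PUB/SUB has a well-known "slow joiner" effect - the publisher starts sending as soon as it has bound, so a subscriber that connects a moment later silently misses the first messages. The one-second sleep per iteration masks this here, but a startup handshake (or retrying) is the usual remedy if the first messages matter.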
The way to share any object among multiple instances (of different classes, of the same class, whatever) without making it global is the same in every case: pass the object into each instance's constructor. For example:
import queue

class Foo(object):
    def __init__(self, m):
        self.m = m
        # ...

    # ...
    def foo(self):
        # ...
        self.m.put(message)
        # ...
    # ...

class Bar(object):
    def __init__(self, m):
        self.m = m
        self.foo = Foo(m)
        # ...

    # ...
    def foo(self):
        # ...
        self.m.put(message)
        # ...
    # ...

m = queue.Queue()
bar1 = Bar(m)
bar2 = Bar(m)
Now bar1, bar2, bar1.foo, and bar2.foo all have the same m object.
(I found a decent solution here for this, but unfortunately I'm using IronPython, which does not implement the multiprocessing module...)
The driving script Threader.py will call Worker.py's single function twice, using the threading module.
Its single function just fetches a dictionary of data.
Roughly speaking:
Worker.py
def GetDict():
    :
    :
    :
    return theDict
Threader.py
import threading
from Worker import GetDict
:
:
:
def ThreadStart():
    t = threading.Thread(target=GetDict)
    t.start()
:
:
In the driver script Threader.py, I want to be able to operate on the two dictionaries output by the two runs of Worker.py's function.
The accepted answer here, involving the Queue module, seems to be what I need in terms of accessing return values, but it is written from the point of view of everything being done in a single script. How do I go about making the return values of the function called in Worker.py available to Threader.py (or any other script, for that matter)?
Many thanks
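For reference, a minimal sketch of the Queue-based approach spread across two modules; the results_queue parameter is illustrative, and the module is named queue in Python 3 (Queue on IronPython/Python 2):

Worker.py

def GetDict(results_queue):
    theDict = {'foo': 'bar'}  # stand-in for the real fetch
    results_queue.put(theDict)

Threader.py

import queue
import threading
from Worker import GetDict

def ThreadStart():
    results = queue.Queue()
    threads = [threading.Thread(target=GetDict, args=(results,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # one dict per worker call, in completion order
    while not results.empty():
        print(results.get())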
Another way to do what you want (without using a Queue) would be to use the concurrent.futures module (available since Python 3.2; for earlier versions there is a backport).
Using this, your example would work like this:
from concurrent import futures

def GetDict():
    return {'foo': 'bar'}

# imports ...
# from Worker import GetDict

def ThreadStart():
    executor = futures.ThreadPoolExecutor(max_workers=4)
    future = executor.submit(GetDict)
    print(future.result())  # blocks until GetDict has finished

    # or doing more than one:
    jobs = [executor.submit(GetDict) for i in range(10)]
    for j in jobs:
        print(j.result())

if __name__ == '__main__':
    ThreadStart()
edit:
Something similar would be to use your own thread to execute the target function and save its return value, something like this:
from threading import Thread

def GetDict():
    return {'foo': 'bar'}

# imports ...
# from Worker import GetDict

class WorkerThread(Thread):
    def __init__(self, fnc, *args, **kwargs):
        super(WorkerThread, self).__init__()
        self.fnc = fnc
        self.args = args
        self.kwargs = kwargs

    def run(self):
        self.result = self.fnc(*self.args, **self.kwargs)

def ThreadStart():
    jobs = [WorkerThread(GetDict) for i in range(10)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
        print(j.result)

if __name__ == '__main__':
    ThreadStart()