How are global objects handled in threading? - python

I would like to create a Pyramid app with an ORM I am writing (currently in deep alpha status). I want to plug the ORM into the app sanely, so I want to know how global objects are handled in multithreading.
In the file:
https://www.megiforge.pl/p/elephantoplasty/source/tree/0.0.1/src/eplasty/ctx.py
you can see there is a global object called ctx which contains a default session. What if I run set_context() and start_session() in middleware at ingress? Can I then expect a separate session in ctx in every thread, or is there a risk that two threads will use the same session?

Global variables are shared between all threads, so if you run those functions the threads will conflict with each other in unpredictable ways.
To do what you want you can use thread-local data via threading.local. Remove the global definition of ctx, create a module-level threading.local() object, and add the following function:
import threading

thread_data = threading.local()  # module-level: each thread gets its own attribute namespace

def get_ctx():
    # Create this thread's context lazily on first use, then reuse it.
    if not hasattr(thread_data, "ctx"):
        thread_data.ctx = Ctx()
    return thread_data.ctx
Then, everywhere you reference ctx call get_ctx() instead. This will ensure that your context is not shared between threads.
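As a quick sanity check, here is a minimal sketch (assuming the get_ctx above, with a trivial stand-in for the real Ctx class from eplasty.ctx) showing that two threads end up with different context objects:

import threading

class Ctx:
    pass  # stand-in for the real context class

seen = []

def worker():
    # Each thread creates (and caches) its own Ctx via get_ctx().
    seen.append(id(get_ctx()))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(seen)))  # 2: the contexts were not shared between threads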

Related

Is it safe to define a context manager around a nested function?

For a third party library* I have to provide a function which consumes some data. My implementation of the consumption requires the data to be posted to an API. So I came up with this structure below:
def consumer_provider():
    with HttpClient() as http_client:
        def consumer(data):
            http_client.post(data)
        return consumer
So I can present the function to the third party lib like so:
third_party_lib.add(consumer=consumer_provider())
In my tests it works quite well, but is this legit? When do I have to expect that the context manager releases the resource (in this case the connection pool)?
* loguru in this case, but it should not really matter for the question
It depends on the context manager. In the code you wrote, the HttpClient you created stays alive because the function it returns maintains a reference to it, even though the variable http_client defined in consumer_provider goes out of scope.
However, HttpClient.__exit__ is still called before consumer_provider returns, so the consumer function may not work as intended.
You may want to do something like
def consumer_provider():
    http_client = HttpClient()
    def consumer(data):
        with http_client:
            http_client.post(data)
    return consumer
which ensures that the HttpClient object stays "hidden" inside the closure, but its __enter__ and __exit__ methods aren't called until the function gets called. (Whether the client can be used by multiple function calls also depends on the definition of HttpClient.)
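To see the timing concretely, here is a minimal sketch with a dummy context manager (a stand-in for HttpClient, used only to trace when __enter__ and __exit__ run) applied to the original structure:

class DummyClient:
    """Stand-in for HttpClient; prints when it is entered and exited."""
    def __enter__(self):
        print("enter")
        return self
    def __exit__(self, exc_type, exc, tb):
        print("exit")
    def post(self, data):
        print("post", data)

def consumer_provider():
    with DummyClient() as client:
        def consumer(data):
            client.post(data)
        return consumer          # leaving the with block here triggers __exit__

consumer = consumer_provider()   # prints "enter" then "exit"
consumer("payload")              # prints "post payload", but the client was already "closed"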

How exactly do modules in Python work?

I am trying to better understand Python's modules, coming mostly from a C background.
I have main.py with the following:
def g():
    print obj  # Need access to the object below

if __name__ == "__main__":
    obj = {}
    import child
    child.f()
And child.py:
def f():
    import main
    main.g()
This particular structure of code may seem strange at first, but rest assured this is stripped from a larger project I am working on, where delegation of responsibility and decoupling forces the kind of inter-module function call sequence you see.
I need to be able to access the actual object I create when first executing main via python main.py. Is this possible without explicitly passing obj around as a parameter? I will have other variables, and I don't want to pass those around either. If desperate, I could create a "state" object for the entire main module and pass it around, but even that is a last resort for me. In C this would be global variables at their simplest, but in Python it seems to be a different beast (module-level global variables only?)
One solution, short of passing parameters around, relies on how Python registers the entry-point module. When main.py is executed with python main.py (so the if clause succeeds and obj gets bound), the running module and its state are registered in sys.modules under the name __main__. So when the child module needs the actual instance of the main module, it must import __main__ rather than main; importing main would create a second, distinct copy of the module with its own state.
'Fixed' child.py:
def f():
    import __main__
    __main__.g()
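To make the "two distinct copies" claim visible, here is a hedged sketch of an instrumented child.py (using the original import main) that inspects sys.modules when run via python main.py:

# child.py (unfixed) -- shows that "main" and "__main__" are different module objects
import sys

def f():
    import main                             # executes main.py a second time as module "main"
    script_copy = sys.modules["__main__"]    # the copy started by `python main.py`
    imported_copy = sys.modules["main"]      # the copy created by `import main`
    print(script_copy is imported_copy)      # False: two separate module objects
    print(hasattr(imported_copy, "obj"))     # False: in this copy __name__ != "__main__", so obj was never bound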

Python tornado global variables

I have a Python Tornado application. I want to have variables that are shared across multiple files. Previously I declared and initialized them in a Python file named global.py and imported it into the other files. This was a good idea until some of my variables needed to be queried from a database, so every time I imported global.py just to get one value, all of the queries ran and slowed my application down.
The next step was to define my variables in the Tornado start.py like this:
class RepublishanApplication(tornado.web.Application):
    def __init__(self):
        ##################################################
        # conn = pymongo.Connection("localhost", 27017)
        self.Countries = GlobalDefined.Countries
        self.Countries_rev = GlobalDefined.Countries_rev
        self.Languages = GlobalDefined.Languages
        self.Categories = GlobalDefined.Categories
        self.Categories_rev = GlobalDefined.Categories_rev
        self.NewsAgencies = GlobalDefined.NewsAgencies
        self.NewsAgencies_rev = GlobalDefined.NewsAgencies_rev
        self.SharedConnections = SharedConnections
I can access these variables in handlers like this:
self.application.Countries
It's working well, but the problem is that I can access these variables only in handler classes; elsewhere I have to pass them into functions, which I don't think is a good idea. Do you have any suggestions for accessing these variables everywhere without passing the application instance to all of my functions, or another approach that would help?
Putting your global variables in a globals.py file is a fine way to accomplish this. If you use PyMongo to query values from MongoDB when globals.py is imported, that work is only done the first time globals.py is imported in a process. Other imports of globals.py get the module from the sys.modules cache.
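For illustration, a minimal sketch of such a globals.py, assuming the values come from MongoDB via PyMongo (the database and collection names here are made up):

# globals.py -- the queries below run only the first time this module is imported;
# later imports in the same process get the cached module from sys.modules
import pymongo

_client = pymongo.MongoClient("localhost", 27017)
_db = _client["republishan"]          # hypothetical database name

Countries = list(_db.countries.find())
Languages = list(_db.languages.find())

# In any other file:
#   import globals
#   print(globals.Countries)   # cheap after the first import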

Using Python's multiprocessing.Process class

This is a newbie question:
A class is an object, so I can create a class called pippo() and add functions and parameters inside it. I don't understand whether the functions inside pippo are executed from top to bottom when I assign x = pippo(), or whether I must call them as x.dosomething() outside of pippo.
Working with Python's multiprocessing package, is it better to define one big function and create the object using the target argument in the call to Process(), or to create your own process class by inheriting from the Process class?
I have often wondered why Python's documentation page on multiprocessing only shows the "functional" approach (using the target parameter). Probably because terse, succinct code snippets are best for illustration purposes. For small tasks that fit in a single function, I can see why that is the preferred way:
from multiprocessing import Process

def f():
    print('hello')

p = Process(target=f)
p.start()
p.join()
But when you need greater code organization (for complex tasks), making your own class is the way to go:
from multiprocessing import Process

class P(Process):
    def __init__(self):
        super(P, self).__init__()
    def run(self):
        print('hello')

p = P()
p.start()
p.join()
Bear in mind that each spawned process is initialized with a copy of the memory footprint of the master process, and that the constructor code (i.e. the code inside __init__()) is executed in the master process -- only the code inside run() executes in separate processes.
Therefore, if a process (master or spawned) changes one of its member variables, the change will not be reflected in the other processes. This, of course, is only true for built-in types like bool, str, and list. You can, however, use "special" data structures from the multiprocessing module which are transparently shared between processes (see Sharing state between processes), or create your own IPC (inter-process communication) channels such as multiprocessing.Pipe and multiprocessing.Queue.
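As a hedged sketch of that difference (the Counter class and attribute names are made up for illustration), a multiprocessing.Value is visible to both processes, while an ordinary instance attribute changed in the child never propagates back to the parent:

from multiprocessing import Process, Value

class Counter(Process):
    def __init__(self, shared_total):
        super(Counter, self).__init__()
        self.shared_total = shared_total   # multiprocessing.Value: shared with the parent
        self.plain_total = 0               # ordinary attribute: changes stay in the child

    def run(self):
        for _ in range(1000):
            with self.shared_total.get_lock():
                self.shared_total.value += 1
            self.plain_total += 1

if __name__ == '__main__':
    total = Value('i', 0)
    p = Counter(total)
    p.start()
    p.join()
    print(total.value)      # 1000: the shared Value reflects the child's work
    print(p.plain_total)    # 0: the parent's copy of the attribute never changed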

Initializing global object in Python

In summary, my problem is: how do you easily make a connection resource a global variable? To be specific, I'd like to open a Redis queue connection and use it in multiple functions without the hassle of passing it as a parameter, i.e.
#===============================================================================
# Global variables
#===============================================================================
REDIS_QUEUE <- how to initialize
Then, in my main function, I have:
# Open redis queue connection to server
REDIS_QUEUE = redis.StrictRedis(host=SERVER_IP, port=6379, db=0)
And then use REDIS_QUEUE in multiple functions, e.g.
def sendStatusMsgToServer(statusMsg):
    print "\nSending status message to server:"
    print simplejson.dumps(statusMsg)
    REDIS_QUEUE.rpush(TLA_DATA_CHANNEL, simplejson.dumps(statusMsg))
I thought REDIS_QUEUE = None would work, but it gives me
AttributeError: 'NoneType' object has no attribute 'rpush'
I'm new to Python, what's the best way to solve this?
If you want to set the value of a global variable from inside a function, you need to use the global statement. So in your main function:
def main():
    global REDIS_QUEUE
    REDIS_QUEUE = redis.StrictRedis(host=SERVER_IP, port=6379, db=0)
    # whatever else
Note that there's no need to "initialize" the variable outside main before doing this, although you may want to anyway, just to document that the variable exists.
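Putting it together, a minimal end-to-end sketch of the pattern (the host, channel name, and use of the standard json module are stand-ins for the question's values):

import json
import redis

SERVER_IP = "localhost"          # placeholder for illustration
TLA_DATA_CHANNEL = "tla_data"    # placeholder for illustration

REDIS_QUEUE = None  # optional: documents that the module-level name exists

def main():
    global REDIS_QUEUE  # without this, the assignment below would only bind a local name
    REDIS_QUEUE = redis.StrictRedis(host=SERVER_IP, port=6379, db=0)
    sendStatusMsgToServer({"status": "up"})

def sendStatusMsgToServer(statusMsg):
    # Reading a module-level name needs no global statement.
    REDIS_QUEUE.rpush(TLA_DATA_CHANNEL, json.dumps(statusMsg))

if __name__ == "__main__":
    main()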
