How can I provide shared state to my Flask app with multiple workers without depending on additional software?

I want to provide shared state for a Flask app which runs with multiple workers, i.e. multiple processes.
To quote this answer from a similar question on this topic:
You can't use global variables to hold this sort of data. [...] Use a data source outside of Flask to hold global data. A database, memcached, or redis are all appropriate separate storage areas, depending on your needs.
(Source: Are global variables thread safe in flask? How do I share data between requests?)
My question is on that last part regarding suggestions on how to provide the data "outside" of Flask. Currently, my web app is really small and I'd like to avoid requirements or dependencies on other programs. What options do I have if I don't want to run Redis or anything else in the background but provide everything with the Python code of the web app?

If your webserver's worker type is compatible with the multiprocessing module, you can use multiprocessing.managers.BaseManager to provide a shared state for Python objects. A simple wrapper could look like this:
from multiprocessing import Lock
from multiprocessing.managers import AcquirerProxy, BaseManager, DictProxy

def get_shared_state(host, port, key):
    shared_dict = {}
    shared_lock = Lock()
    manager = BaseManager((host, port), key)
    manager.register("get_dict", lambda: shared_dict, DictProxy)
    manager.register("get_lock", lambda: shared_lock, AcquirerProxy)
    try:
        manager.get_server()
        manager.start()
    except OSError:  # Address already in use
        manager.connect()
    return manager.get_dict(), manager.get_lock()
You can assign your data to the shared_dict to make it accessible across processes:
import numpy

HOST = "127.0.0.1"
PORT = 35791
KEY = b"secret"

shared_dict, shared_lock = get_shared_state(HOST, PORT, KEY)
shared_dict["number"] = 0
shared_dict["text"] = "Hello World"
shared_dict["array"] = numpy.array([1, 2, 3])
However, you should be aware of the following circumstances:
Use shared_lock to protect against race conditions when overwriting values in shared_dict. (See Flask example below.)
There is no data persistence. If you restart the app, or if the main (the first) BaseManager process dies, the shared state is gone.
With this simple implementation of BaseManager, you cannot directly edit nested values in shared_dict. For example, shared_dict["array"][1] = 0 has no effect. You will have to edit a copy and then reassign it to the dictionary key.
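For illustration, a minimal sketch of that copy-and-reassign workaround for the array stored above (assuming the shared_dict and shared_lock returned by get_shared_state):
with shared_lock:
    arr = shared_dict["array"]    # the proxy hands back a copy of the stored array
    arr[1] = 0                    # mutate the local copy
    shared_dict["array"] = arr    # reassign so the manager process sees the change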
Flask example:
The following Flask app uses a global variable to store a counter number:
from flask import Flask

app = Flask(__name__)
number = 0

@app.route("/")
def counter():
    global number
    number += 1
    return str(number)
This works when using only one worker (gunicorn -w 1 server:app). When using multiple workers (gunicorn -w 4 server:app), it becomes apparent that number is not a shared state but individual to each worker process.
Instead, with shared_dict, the app looks like this:
from flask import Flask

app = Flask(__name__)

HOST = "127.0.0.1"
PORT = 35791
KEY = b"secret"

shared_dict, shared_lock = get_shared_state(HOST, PORT, KEY)
shared_dict["number"] = 0

@app.route("/")
def counter():
    with shared_lock:
        shared_dict["number"] += 1
        return str(shared_dict["number"])
This works with any number of workers, like gunicorn -w 4 server:app.

Your example is a bit magic for me! I'd suggest reusing the magic already in the multiprocessing codebase in the form of a Namespace. I've attempted to make the following code compatible with spawn servers (i.e. MS Windows), but I only have access to Linux machines, so I can't test there.
Start by pulling in dependencies, defining our custom Manager, and registering a method to get out a Namespace singleton:
from multiprocessing.managers import BaseManager, Namespace, NamespaceProxy

class SharedState(BaseManager):
    _shared_state = Namespace(number=0)

    @classmethod
    def _get_shared_state(cls):
        return cls._shared_state

SharedState.register('state', SharedState._get_shared_state, NamespaceProxy)
This might need to be more complicated if creating the initial state is expensive and hence should only be done when it's needed. Note that the OP's version of initialising state during process startup will cause everything to reset if gunicorn starts a new worker process later, e.g. after killing one due to a timeout.
Next, I define a function to get access to this shared state, similar to how the OP does it:
def shared_state(address, authkey):
    manager = SharedState(address, authkey)
    try:
        manager.get_server()  # raises if another server already started
        manager.start()
    except OSError:
        manager.connect()
    return manager.state()
Though I'm not sure if I'd recommend doing things like this. When gunicorn starts, it spawns lots of processes that all race to run this code, and it wouldn't surprise me if this could go wrong sometimes. Also, if gunicorn happens to kill off the server process (e.g. because of a timeout), every other process will start to fail.
That said, if we wanted to use this, we would do something like:
ss = shared_state('server.sock', b'noauth')
ss.number += 1
This uses Unix domain sockets (passing a string rather than a tuple as an address) to lock this down a bit more.
Also note this has the same race conditions as the OP's code: incrementing a number will cause the value to be transferred to the worker's process, incremented there, and sent back to the server. I'm not sure what the _lock is supposed to be protecting, but I don't think it'll do much.
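If you did want to serialize that read-modify-write, a hedged sketch would be to register a multiprocessing Lock next to the Namespace, mirroring the shared_lock idea from the first answer (the 'lock' accessor name here is my own assumption, not part of the answer above):
from multiprocessing import Lock
from multiprocessing.managers import AcquirerProxy, BaseManager, Namespace, NamespaceProxy

class SharedState(BaseManager):
    _shared_state = Namespace(number=0)
    _shared_lock = Lock()

    @classmethod
    def _get_shared_state(cls):
        return cls._shared_state

    @classmethod
    def _get_shared_lock(cls):
        return cls._shared_lock

SharedState.register('state', SharedState._get_shared_state, NamespaceProxy)
SharedState.register('lock', SharedState._get_shared_lock, AcquirerProxy)

def shared_state(address, authkey):
    manager = SharedState(address, authkey)
    try:
        manager.get_server()  # raises if another server already started
        manager.start()
    except OSError:
        manager.connect()
    return manager.state(), manager.lock()

ss, lock = shared_state('server.sock', b'noauth')
with lock:          # serialize the read-modify-write across workers
    ss.number += 1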

Related

Prevent change of workers of gunicorn between requests flask [duplicate]

I'm using flask to develop my application with multiple routes. Running my application on my local machine with a single "thread", it works very well. When I deploy it the problem occurs. The following snippet represents the situation:
object_a = None

@app.route("/route_a")
def route_a():
    global object_a
    object_a = do_something()
    return render_template("route_a.html")

@app.route("/route_b")
def route_b():
    global object_a
    object_a.get_something
    return render_template("route_b.html")
In this example, I used a global variable that is accessible through the route functions. With a single worker/thread this approach worked, but when I use gunicorn with 3 workers, for example, the application crashes because the accessed object is empty. My main hypothesis is that the work shifts between worker processes. Is there a proper way to handle this behavior?
EDIT 1:
The following command is used to run gunicorn:
gunicorn --workers 3 --timeout 180 --bind
When you run multiple workers, they are separate processes, and separate processes can't share data directly. You need to use some kind of IPC to share your object, or store it in external storage, e.g. Redis.
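For example, a rough sketch of the Redis variant (the key name and payload are made up; it assumes a running Redis server and the redis package):
# Sketch: keep object_a in Redis instead of a module-level global.
import json
from redis import Redis

r = Redis('localhost')

# in route_a: compute and store
r.set('object_a', json.dumps({"result": 42}))   # hypothetical serializable payload

# in route_b: read it back in whichever worker handles the request
object_a = json.loads(r.get('object_a'))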
Like @aramcpp said, you'll need to use IPC (inter-process communication), because gunicorn spawns multiple processes. You could use the threading module together with the socket module to do the communication. It looks like one process does a task, and the other grabs the result... Is that correct?
I haven't used sockets in a while, so that part is pseudocode, but this is a solution:
import threading
import socket
import queue

class Sender(threading.Thread):
    def __init__(self, term, q):
        self.term = term
        self.q = q
        # < start socket connection here, e.g. self.sock = ... >
        super().__init__()

    def run(self):
        while not self.term.is_set():
            item = self.q.get(block=True)
            # < convert item to byte_data here >
            self.sock.send(byte_data)

class Receiver(threading.Thread):
    def __init__(self, term, q):
        self.term = term
        self.q = q
        # < start socket connection here, e.g. self.sock = ... >
        super().__init__()

    def run(self):
        while not self.term.is_set():
            byte_data = self.sock.recv(4096)
            # < convert bytes to whatever item you need >
            self.q.put(item)

@app.route("/route_a")
def route_a():
    q = queue.Queue()
    term = threading.Event()
    s = Sender(term, q)
    s.start()
    data = do_something()
    q.put(data, block=True)
    term.set()
    return render_template("route_a.html")

@app.route("/route_b")
def route_b():
    q = queue.Queue()
    term = threading.Event()
    r = Receiver(term, q)
    r.start()
    item = q.get(block=True)  # <--- your item here
    return render_template("route_b.html")
A lot of this is pseudocode, so you'll have to do some research on sockets, but this is my approach. I'm sure there are other good ways to do this, or even simpler approaches.
The while loops aren't needed if it's a single-shot communication, but I added them to show that a thread can run continuously to do ongoing communication.
Also, if you go this route, some resources will suggest using asyncio for IPC. I would discourage this here; async IPC requires more coordination than the threading approach.
The Gunicorn documentation states "Gunicorn does not implement any IPC solution for coordinating between worker processes." So there you go. You'll need to do it yourself. Which isn't so bad, because IPC is a very important skill to learn.

Using the Jaeger Python client together with Luigi

I'm just starting to use Jaeger for tracing and want to get the Python client to work with Luigi. The root of the problem is, that Luigi uses multiprocessing to fork worker processes. The docs mention that this can cause problems and recommend - in case of web apps - to defer the tracer initialization until the request handling process has been forked. This does not work for me, because I want to create traces in the main process and in the worker processes.
In the main process I initialize the tracer like described here:
from jaeger_client import Config
config = Config(...)
tracer = config.initialize_tracer()
This internally creates a Tornado IO loop that cannot be reused in the forked processes. So I try to re-initialize the Jaeger client in each Luigi worker process. This is possible by setting the (undocumented?) task_process_context option of the worker section to a class implementing the context manager protocol. It looks like this:
import threading
import time
from jaeger_client import Config

class WorkerContext:
    def __init__(self, process):
        pass

    def __enter__(self):
        Config._initialized = False
        Config._initialized_lock = threading.Lock()
        config = Config(...)
        self.tracer = config.initialize_tracer()

    def __exit__(self, type, value, traceback):
        self.tracer.close()
        time.sleep(2)
Those two lines are of course very "hackish":
Config._initialized = False
Config._initialized_lock = threading.Lock()
The class variables are copied into the forked process, and initialize_tracer would complain about already being initialized if I did not reset the variables.
The details of how multiprocessing forks the new process and what this means for the Tornado loop are somewhat "mystic" to me. So my question is:
Is the above code safe or am I asking for trouble?
Of course I would get rid of accessing Config's internals. But if the solution can be considered safe, I would ask the maintainers for a reset method that can be called in a forked process.

Why ZeroMQ fails to communicate when I use multiprocessing.Process to run?

Please see the code below:
server.py
import zmq
import time
from multiprocessing import Process

class A:
    def __init__(self):
        ctx = zmq.Context(1)
        sock = zmq.Socket(ctx, zmq.PUB)
        sock.bind('ipc://test')
        p = Process(target=A.run, args=(sock,))
        p.start()   # Process calls run, but the client can't receive messages
        p.join()
        # A.run(sock)  # this one is OK, messages are received

    @staticmethod
    def run(sock):
        while True:
            sock.send('demo'.encode('utf-8'))
            print('sent')
            time.sleep(1)

if __name__ == '__main__':
    a = A()
client.py
import zmq

ctx = zmq.Context(1)
sock = zmq.Socket(ctx, zmq.SUB)
sock.connect('ipc://test')
sock.setsockopt_string(zmq.SUBSCRIBE, '')
while True:
    print(sock.recv())
In the constructor of server.py, if I call the .run() method directly, the client can receive the message, but when I start it via multiprocessing.Process, it fails. Can anyone explain this and provide some advice?
Q : "Why ZeroMQ fails to communicate when I use multiprocessing.Process to run?"
Well, ZeroMQ does not fail to communicate; the problem is how the Python multiprocessing module "operates".
The module is designed so that some processing may escape from the central GIL lock (the re-[SERIAL]-iser that acts as an ever-present avoider of [CONCURRENT] situations).
This means that the call to multiprocessing.Process makes one exact "mirror copy" of the Python interpreter state, "exported" into a new O/S-spawned process (the details depend on the localhost O/S).
Given that, there is zero chance a "mirrored" replica could get access to resources already owned by __main__: here the .bind() method has already acquired the ipc://test address, so the "remote" process will never get "permission" to touch this ZeroMQ AccessPoint, unless the code gets repaired and fully re-factored.
Q : "Can anyone explain this and provide some advice?"
Sure. The best step to start is to fully understand the Pythonic culture of the monopolistic GIL-lock re-[SERIAL]-isation, where no two things ever happen at the same time, so even adding more threads does not speed up the flow of the processing, as it all gets re-aligned by the central "monopolist", the GIL-lock.
Next, the promise of a fully reflected copy of the Python interpreter state, while it sounds appealing, also has some obvious drawbacks: the new processes, being "mirror" copies, cannot introduce colliding claims on already-owned resources. If they try to, a not-working-as-expected case is the milder of the problems in such principally ill-designed situations.
In your code, the first row in __main__ instantiates a = A(), where A's .__init__ method straight away occupies the IPC resource via .bind('ipc://test'). The later step, p = Process(target=A.run, args=(sock,)), "mirror"-replicates the state of the Python interpreter (an as-is copy), and p.start() cannot but crash into the inability to "own" the same resource that __main__ already owns (yes, the ipc://test that the "mirrored" process was instructed to grab, a non-free resource, in .bind('ipc://test')). This will never fly.
Last but not least, enjoy the Zen-of-Zero, the masterpiece of Martin SUSTRIK for distributed computing, so well crafted for an ultimately scalable, almost zero-latency, very comfortable, widely ported signalling & messaging framework.
Short answer: start your subprocesses, and create your zmq.Context and zmq.Socket instances inside the Producer.run() method within each subprocess. Use the .bind() method on the side whose cardinality is 1, and the .connect() method on the side whose cardinality is greater than 1 (in this case, the "server" side with multiple producers).
My approach would be structured something like...
# server.py :
import zmq
from multiprocessing import Process

class Producer(Process):
    def __init__(self):
        ...

    def run(self):
        ctx = zmq.Context(1)
        sock = zmq.Socket(ctx, zmq.PUB)
        # Multiple producers, so connect instead of bind (consumer must bind)
        sock.connect('ipc://test')
        while True:
            ...

if __name__ == "__main__":
    producer = Producer()
    p = Process(target=producer.run)
    p.start()
    p.join()

# client.py :
import zmq

ctx = zmq.Context(1)
sock = zmq.Socket(ctx, zmq.SUB)
# Capture from multiple producers, so bind (producers must connect)
sock.bind('ipc://test')
sock.setsockopt_string(zmq.SUBSCRIBE, '')
while True:
    print(sock.recv())

How to share variables in multiprocessing [duplicate]

The following does not work
one.py
import shared
shared.value = 'Hello'
raw_input('A cheap way to keep process alive..')
two.py
import shared
print shared.value
run on two command lines as:
>>python one.py
>>python two.py
(the second one gets an attribute error, rightly so).
Is there a way to accomplish this, that is, share a variable between two scripts?
Hope it's OK to jot down my notes about this issue here.
First of all, I appreciate the example in the OP a lot, because that is where I started as well - although it made me think shared is some built-in Python module, until I found a complete example at [Tutor] Global Variables between Modules ??.
However, when I looked for "sharing variables between scripts" (or processes) - besides the case when a Python script needs to use variables defined in other Python source files (but not necessarily running processes) - I mostly stumbled upon two other use cases:
A script forks itself into multiple child processes, which then run in parallel (possibly on multiple processors) on the same PC
A script spawns multiple other child processes, which then run in parallel (possibly on multiple processors) on the same PC
As such, most hits regarding "shared variables" and "interprocess communication" (IPC) discuss cases like these two; however, in both of these cases one can observe a "parent", to which the "children" usually have a reference.
What I am interested in, however, is running multiple invocations of the same script, run independently, and sharing data between those (as in Python: how to share an object instance across multiple invocations of a script), in a singleton/single-instance mode. That kind of problem is not really addressed by the above two cases - instead, it essentially reduces to the example in the OP (sharing variables across two scripts).
Now, when dealing with this problem in Perl, there is IPC::Shareable; which "allows you to tie a variable to shared memory", using "an integer number or 4 character string[1] that serves as a common identifier for data across process space". Thus, there are no temporary files, nor networking setups - which I find great for my use case; so I was looking for the same in Python.
However, as the accepted answer by @Drewfer notes: "You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter"; or in other words: either you have to use a networking/socket setup - or you have to use temporary files (ergo, no shared RAM for "totally separate python sessions").
Now, even with these considerations, it is kinda difficult to find working examples (except for pickle) - also in the docs for mmap and multiprocessing. I have managed to find some other examples - which also describe some pitfalls that the docs do not mention:
Usage of mmap: working code in two different scripts at Sharing Python data between processes using mmap | schmichael's blog
Demonstrates how both scripts change the shared value
Note that here a temporary file is created as storage for saved data - mmap is just a special interface for accessing this temporary file
Usage of multiprocessing: working code at:
Python multiprocessing RemoteManager under a multiprocessing.Process - working example of SyncManager (via manager.start()) with shared Queue; server(s) writes, clients read (shared data)
Comparison of the multiprocessing module and pyro? - working example of BaseManager (via server.serve_forever()) with shared custom class; server writes, client reads and writes
How to synchronize a python dict with multiprocessing - this answer has a great explanation of multiprocessing pitfalls, and is a working example of SyncManager (via manager.start()) with shared dict; server does nothing, client reads and writes
Thanks to these examples, I came up with an example, which essentially does the same as the mmap example, with approaches from the "synchronize a python dict" example - using BaseManager (via manager.start() through file path address) with shared list; both server and client read and write (pasted below). Note that:
multiprocessing managers can be started either via manager.start() or server.serve_forever()
serve_forever() blocks - start() doesn't
There is an auto-logging facility in multiprocessing: it seems to work fine with start()ed processes - but seems to ignore the ones that use serve_forever()
The address specification in multiprocessing can be IP (socket) or temporary file (possibly a pipe?) path; in multiprocessing docs:
Most examples use multiprocessing.Manager() - this is just a function (not class instantiation) which returns a SyncManager, which is a special subclass of BaseManager; and uses start() - but not for IPC between independently ran scripts; here a file path is used
Few other examples serve_forever() approach for IPC between independently ran scripts; here IP/socket address is used
If an address is not specified, then a temp file path is used automatically (see 16.6.2.12. Logging for an example of how to see this)
In addition to all the pitfalls in the "synchronize a python dict" post, there are additional ones in case of a list. That post notes:
All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)
The workaround to dict['key'] getting and setting, is the use of the dict public methods get and update. The problem is that there are no such public methods as alternative for list[index]; thus, for a shared list, in addition we have to register __getitem__ and __setitem__ methods (which are private for list) as exposed, which means we also have to re-register all the public methods for list as well :/
Well, I think those were the most critical things; these are the two scripts - they can just be ran in separate terminals (server first); note developed on Linux with Python 2.7:
a.py (server):
import multiprocessing
import multiprocessing.managers
import logging

logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)

class MyListManager(multiprocessing.managers.BaseManager):
    pass

syncarr = []
def get_arr():
    return syncarr

def main():
    # print dir([]) # cannot do `exposed = dir([])`!! manually:
    MyListManager.register("syncarr", get_arr, exposed=['__getitem__', '__setitem__', '__str__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'])

    manager = MyListManager(address=('/tmp/mypipe'), authkey='')
    manager.start()

    # we don't use the same name as `syncarr` here (although we could);
    # just to see that `syncarr_tmp` is actually an <AutoProxy[syncarr] object>,
    # so we also have to expose the `__str__` method in order to print its list values!
    syncarr_tmp = manager.syncarr()
    print("syncarr (master):", syncarr, "syncarr_tmp:", syncarr_tmp)
    print("syncarr initial:", syncarr_tmp.__str__())

    syncarr_tmp.append(140)
    syncarr_tmp.append("hello")
    print("syncarr set:", str(syncarr_tmp))

    raw_input('Now run b.py and press ENTER')

    print
    print 'Changing [0]'
    syncarr_tmp.__setitem__(0, 250)
    print 'Changing [1]'
    syncarr_tmp.__setitem__(1, "foo")

    new_i = raw_input('Enter a new int value for [0]: ')
    syncarr_tmp.__setitem__(0, int(new_i))

    raw_input("Press any key (NOT Ctrl-C!) to kill server (but kill client first)".center(50, "-"))
    manager.shutdown()

if __name__ == '__main__':
    main()
b.py (client):
import time
import multiprocessing
import multiprocessing.managers
import logging

logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)

class MyListManager(multiprocessing.managers.BaseManager):
    pass

MyListManager.register("syncarr")

def main():
    manager = MyListManager(address=('/tmp/mypipe'), authkey='')
    manager.connect()
    syncarr = manager.syncarr()

    print "arr = %s" % (dir(syncarr))

    # note here we need not bother with __str__;
    # syncarr can be printed as a list without a problem:
    print "List at start:", syncarr
    print "Changing from client"
    syncarr.append(30)
    print "List now:", syncarr

    o0 = None
    o1 = None
    while 1:
        new_0 = syncarr.__getitem__(0)  # syncarr[0]
        new_1 = syncarr.__getitem__(1)  # syncarr[1]
        if o0 != new_0 or o1 != new_1:
            print 'o0: %s => %s' % (str(o0), str(new_0))
            print 'o1: %s => %s' % (str(o1), str(new_1))
            print "List is:", syncarr
            print 'Press Ctrl-C to exit'
        o0 = new_0
        o1 = new_1
        time.sleep(1)

if __name__ == '__main__':
    main()
As a final remark, on Linux /tmp/mypipe is created - but is 0 bytes, and has attributes srwxr-xr-x (for a socket); I guess this makes me happy, as I neither have to worry about network ports, nor about temporary files as such :)
Other related questions:
Python: Possible to share in-memory data between 2 separate processes (very good explanation)
Efficient Python to Python IPC
Python: Sending a variable to another script
You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter.
If it's just simple variables you want, you can easily dump a python dict to a file with the pickle module in script one and then re-load it in script two.
Example:
one.py
import pickle
shared = {"Foo":"Bar", "Parrot":"Dead"}
fp = open("shared.pkl","w")
pickle.dump(shared, fp)
two.py
import pickle
fp = open("shared.pkl")
shared = pickle.load(fp)
print shared["Foo"]
sudo apt-get install memcached python-memcache
one.py
import memcache
shared = memcache.Client(['127.0.0.1:11211'], debug=0)
shared.set('Value', 'Hello')
two.py
import memcache
shared = memcache.Client(['127.0.0.1:11211'], debug=0)
print shared.get('Value')
What you're trying to do here (store a shared state in a Python module over separate python interpreters) won't work.
A value in a module can be updated by one module and then read by another module, but this must be within the same Python interpreter. What you seem to be doing here is actually a sort of interprocess communication; this could be accomplished via socket communication between the two processes, but it is significantly less trivial than what you are expecting to have work here.
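For illustration only, a minimal sketch of that socket-based interprocess communication (the port number and value are made up; one.py keeps the value and serves it, two.py connects and reads it):
one.py (sketch):
# Holds the value and serves it to any local client that connects.
import socket

value = b"Hello"
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 50007))   # arbitrary local port
srv.listen(1)
while True:
    conn, _ = srv.accept()
    conn.sendall(value)
    conn.close()
two.py (sketch):
# Connects to one.py and reads the shared value.
import socket

cli = socket.create_connection(("127.0.0.1", 50007))
print(cli.recv(1024).decode())   # prints "Hello"
cli.close()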
You can use a relatively simple mmap file.
You can use shared.py to store the common constants. The following code will work across different Python interpreters / scripts / processes.
shared.py:
MMAP_SIZE = 16*1024
MMAP_NAME = 'Global\\SHARED_MMAP_NAME'
* The "Global" prefix is Windows syntax for global names
one.py:
import sys
import mmap
from shared import MMAP_SIZE, MMAP_NAME

def write_to_mmap():
    map_file = mmap.mmap(-1, MMAP_SIZE, tagname=MMAP_NAME, access=mmap.ACCESS_WRITE)
    map_file.seek(0)
    map_file.write('hello\n')
    ret = map_file.flush() != 0
    if sys.platform.startswith('win'):
        assert(ret != 0)
    else:
        assert(ret == 0)
two.py:
import mmap
from shared import MMAP_SIZE, MMAP_NAME

def read_from_mmap():
    map_file = mmap.mmap(-1, MMAP_SIZE, tagname=MMAP_NAME, access=mmap.ACCESS_READ)
    map_file.seek(0)
    data = map_file.readline().rstrip('\n')
    map_file.close()
    print data
* This code was written for Windows; Linux might need small adjustments.
More info at https://docs.python.org/2/library/mmap.html
Share a dynamic variable via Redis:
script_one.py
from redis import Redis
from time import sleep

cli = Redis('localhost')
shared_var = 1

while True:
    cli.set('share_place', shared_var)
    shared_var += 1
    sleep(1)
Run script_one in a terminal (a process):
$ python script_one.py
script_two.py
from redis import Redis
from time import sleep

cli = Redis('localhost')

while True:
    print(int(cli.get('share_place')))
    sleep(1)
Run script_two in another terminal (another process):
$ python script_two.py
Out:
1
2
3
4
5
...
Dependencies:
$ pip install redis
$ apt-get install redis-server
I'd advise that you use the multiprocessing module. You can't run two scripts from the commandline, but you can have two separate processes easily speak to each other.
From the doc's examples:
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()
You need to store the variable in some sort of persistent file. There are several modules to do this, depending on your exact need.
The pickle and cPickle module can save and load most python objects to file.
The shelve module can store python objects in a dictionary-like structure (using pickle behind the scenes).
The dbm/bsddb/dbhash/gdbm modules can store string variables in a dictionary-like structure.
The sqlite3 module can store data in a lightweight SQL database.
The biggest problem with most of these are that they are not synchronised across different processes - if one process reads a value while another is writing to the datastore then you may get incorrect data or data corruption. To get round this you will need to write your own file locking mechanism or use a full-blown database.
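For instance, a minimal sketch of the shelve option (the file name and key are my own assumptions), with one script writing and the other reading the same shelf file:
writer.py (sketch):
import shelve

with shelve.open("shared_state.db") as db:
    db["value"] = "Hello"    # pickled and persisted to disk behind the scenes
reader.py (sketch):
import shelve

with shelve.open("shared_state.db") as db:
    print(db.get("value"))   # prints "Hello" once writer.py has run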
If you want to read and modify shared data between 2 scripts which run separately, a good solution would be to take advantage of the Python multiprocessing module and use a Pipe() or a Queue() (see differences here). This way you get to sync the scripts and avoid problems regarding concurrency and global variables (like what happens if both scripts want to modify a variable at the same time).
The best part about using pipes/queues is that you can pass Python objects through them.
Also, there are methods to avoid waiting for data if none has been passed yet (queue.empty() and pipeConn.poll()).
See an example using Queue() below:
# main.py
from multiprocessing import Process, Queue
from stage1 import Stage1
from stage2 import Stage2

s1 = Stage1()
s2 = Stage2()

# S1 to S2 communication
queueS1 = Queue()  # s1.stage1() writes to queueS1
# S2 to S1 communication
queueS2 = Queue()  # s2.stage2() writes to queueS2

# start s2 as another process
s2 = Process(target=s2.stage2, args=(queueS1, queueS2))
s2.daemon = True
s2.start()     # Launch the stage2 process

s1.stage1(queueS1, queueS2)  # start sending stuff from s1 to s2
s2.join()                    # wait till s2 daemon finishes

# stage1.py
import time
import random

class Stage1:
    def stage1(self, queueS1, queueS2):
        print("stage1")
        lala = []
        lis = [1, 2, 3, 4, 5]
        for i in range(len(lis)):
            # to avoid unnecessary waiting
            if not queueS2.empty():
                msg = queueS2.get()  # get msg from s2
                print("! ! ! stage1 RECEIVED from s2:", msg)
                lala = [6, 7, 8]     # now that a msg was received, further msgs will be different
            time.sleep(1)            # work
            random.shuffle(lis)
            queueS1.put(lis + lala)
        queueS1.put('s1 is DONE')

# stage2.py
import time

class Stage2:
    def stage2(self, queueS1, queueS2):
        print("stage2")
        while True:
            msg = queueS1.get()  # wait till there is a msg from s1
            print("- - - stage2 RECEIVED from s1:", msg)
            if msg == 's1 is DONE':
                break            # ends loop
            time.sleep(1)        # work
            queueS2.put("update lists")
EDIT: just found that you can use queue.get(False) to avoid blocking when receiving data. This way there's no need to check first if the queue is empty. This is not possible if you use pipes.
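For example, a small sketch of that non-blocking get (assuming the queueS2 from above):
import queue  # only needed for the Empty exception (it's Queue.Empty on Python 2)

try:
    msg = queueS2.get(False)   # same as queueS2.get(block=False)
except queue.Empty:
    msg = None                 # nothing has arrived yet, carry on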
Use text files or environment variables. Since the two run separately, you can't really do what you are trying to do.
In your example, the first script runs to completion, and then the second script runs. That means you need some sort of persistent state. Other answers have suggested using text files or Python's pickle module. Personally I am lazy, and I wouldn't use a text file when I could use pickle; why should I write a parser to parse my own text file format?
Instead of pickle you could also use the json module to store it as JSON. This might be preferable if you want to share the data to non-Python programs, as JSON is a simple and common standard. If your Python doesn't have json, get simplejson.
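A quick sketch of that json variant (the file name is an assumption), mirroring the pickle example above:
one.py:
import json

shared = {"Foo": "Bar", "Parrot": "Dead"}
with open("shared.json", "w") as fp:
    json.dump(shared, fp)
two.py:
import json

with open("shared.json") as fp:
    shared = json.load(fp)
print(shared["Foo"])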
If your needs go beyond pickle or json -- say you actually want to have two Python programs executing at the same time and updating the persistent state variables in real time -- I suggest you use the SQLite database. Use an ORM to abstract the database away, and it's super easy. For SQLite and Python, I recommend Autumn ORM.
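And a minimal sketch of the SQLite approach using only the standard sqlite3 module (the file name and table are made up; no ORM, just to show the shared, persistent state):
# Sketch: both scripts open the same SQLite file; SQLite takes care of the locking.
import sqlite3

conn = sqlite3.connect("shared_state.sqlite3")
conn.execute("CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)")

# writer side: insert or update a value
conn.execute("INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)", ("greeting", "Hello"))
conn.commit()

# reader side (can be a separate script opening the same file):
row = conn.execute("SELECT value FROM state WHERE key = ?", ("greeting",)).fetchone()
print(row[0] if row else None)   # prints "Hello"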
This method seems straightforward to me:
class SharedClass:
    def __init__(self):
        self.data = {}

    def set_data(self, name, value):
        self.data[name] = value

    def get_data(self, name):
        try:
            return self.data[name]
        except KeyError:
            return "none"

    def reset_data(self):
        self.data = {}

sharedClass = SharedClass()
PS: you can set the data with a parameter name and a value for it, and to access the value you can use the get_data method; below is an example:
to set the data
example 1:
sharedClass.set_data("name","Jon Snow")
example 2:
sharedClass.set_data("email", "jon@got.com")
to get the data
sharedClass.get_data("email")
to reset the entire state simply use
sharedClass.reset_data()
It's kind of like accessing data from a JSON object (a dict in this case).
Hope this helps....
You could use Python's basic from ... import syntax to import the variable into two.py. For example:
from filename import variable
That should import the variable from the file.
(Of course you should replace filename with one.py, and replace variable with the variable you want to share to two.py.)
You can also solve this problem by making the variable global:
python first.py
class Temp:
    def __init__(self):
        self.first = None

global var1
var1 = Temp()
var1.first = 1
print(var1.first)
print(var1.first)
python second.py
import first as One
print(One.var1.first)

Maintaining log file from multiple threads in Python

I have my Python BaseHTTPServer server, which handles POST requests.
I used ThreadingMixIn, and it now opens a thread for each connection.
I wish to do several multithreaded actions, such as:
1. Monitoring successful/failed connection activities, by adding 1 to a counter for each. I need a lock for that. My counter is in the global scope of the same file. How can I do that?
2. I wish to handle some sort of queue and write it to a file, where the content of the queue is a set of strings, written from my different threads, that simply send some information for logging purposes. How can it be done? I fail to accomplish that since my threading is done "behind the scenes"; each time I'm in the do_POST(..) method, I'm already in a different thread.
Successful_Logins = 0
Failed_Logins = 0
LogsFile = open(logfile)

class httpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        ..

class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
    pass

server = ThreadingHTTPServer(('localhost', PORT_NUMBER), httpHandler)
server.serve_forever()
This is a small fragment of my server.
Another thing that bothers me is the fact that I want to first send the POST response back to the client, and only then possibly get delayed due to the locking mechanism or whatever.
From your code, it looks like a new httpHandler is constructed in each thread? If that's the case you can use a class variable for the count and a mutex to protect the count like:
import threading

class httpHandler(...):
    # Note that these are class variables and are therefore accessible
    # to all instances
    numSuccess = 0
    numSuccessLock = threading.Lock()

    def do_POST(self):
        self.numSuccessLock.acquire()
        # increment on the class, not the instance, so all handlers share the count
        httpHandler.numSuccess += 1
        self.numSuccessLock.release()
As for writing to a file from different threads, there are a few options:
Use the logging module, "The logging module is intended to be thread-safe without any special work needing to be done by its clients." from http://docs.python.org/2/library/logging.html#thread-safety
Use a Lock object like above to serialize writes to the file
Use a thread safe queue to queue up writes and then read from the queue and write to the file from a separate thread. See http://docs.python.org/2/library/queue.html#module-Queue for examples.
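For instance, a minimal sketch of the logging option (the file name and format are assumptions), which each handler thread can call safely:
# Sketch: thread-safe file logging via the standard logging module.
import logging

logging.basicConfig(
    filename="server.log",                       # assumed log file name
    level=logging.INFO,
    format="%(asctime)s %(threadName)s %(message)s",
)

# inside do_POST(), any thread can simply do:
logging.info("successful login from %s", "1.2.3.4")  # placeholder client address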
