redis block until key exists - python

I'm new to Redis and was wondering if there is a way to be able to await geting a value by it's key until the key exists. Minimal code:
async def handler():
data = await self._fetch(key)
async def _fetch(key):
return self.redis_connection.get(key)
As you know, if such key doesnt exist, it return's None. But since in my project, seting key value pair to redis takes place in another application, I want the redis_connection get method to block untill key exists.
Is such expectation even valid?

It is not possible to do what you are trying to do without implementing some sort of polling redis GET on your client. On that case your client would have to do something like:
async def _fetch(key):
val = self.redis_connection.get(key)
while val is None:
# Sleep and retry here
asyncio.sleep(1)
val = self.redis_connection.get(key)
return val
However I would ask you to completelly reconsider the pattern you are using for this problem.
It seems to me that what you need its to do something like Pub/Sub https://redis.io/topics/pubsub.
So the app that performs the SET becomes a publisher, and the app that does the GET and waits until the key is available becomes the subscriber.
I did a bit of research on this and it looks like you can do it with asyncio_redis:
Subscriber https://github.com/jonathanslenders/asyncio-redis/blob/b20d4050ca96338a129b30370cfaa22cc7ce3886/examples/pubsub/receiver.py.
Sender(Publisher): https://github.com/jonathanslenders/asyncio-redis/blob/b20d4050ca96338a129b30370cfaa22cc7ce3886/examples/pubsub/sender.py
Hope this helps.

Except the keyspace notification method mentioned by #Itamar Haber, another solution is the blocking operations on LIST.
handler method calls BRPOP on an empty LIST: BRPOP notify-list timeout, and blocks until notify-list is NOT empty.
The other application pushes the value to the LIST when it finishes setting the key-value pair as usual: SET key value; LPUSH notify-list value.
handler awake from the blocking operation with the value you want, and the notify-list is destroyed by Redis automatically.
The advantage of this solution is that you don't need to modify your handler method too much (with the keyspace notification solution, you need to register a callback function). While the disadvantage is that you have to rely on the notification of another application (with keyspace notification solution, Redis does the notification automatically).

The closest you can get to this behavior is by enabling keyspace notifications and subscribing to the relevant channels (possibly by pattern).
Note, however, that notifications rely on PubSub that is not guaranteed to deliver messages (at-most-once semantics).

After Redis 5.0 there is built-in stream which supports blocking read. The following are sample codes with redis-py.
#add value to my_stream
redis.xadd('my_stream',{'key':'str_value'})
#read from beginning of stream
last_id='0'
#blocking read until there is value
last_stream_item = redis.xread({"my_stream":last_id},block=0)
#update last_id
last_id = last_stream_item[0][1][0][0]
#wait for next value to arrive on stream
last_stream_item = redis.xread({"my_stream":last_id},block=0)

Related

Best approach to multiple websocket client connections in Python?

I appreciate that the question I am about to ask is rather broad but, as a newcomer to Python, I am struggling to find the [best] way of doing something which would be trivial in, say, Node.js, and pretty trivial in other environments such as C#.
Let's say that there is a warehouse full of stuff. And let's say that there is a websocket interface onto that warehouse with two characteristics: on client connection it pumps out a full list of the warehouse's current inventory, and it then follows that up with further streaming updates when the inventory changes.
The web is full of examples of how, in Python, you connect to the warehouse and respond to changes in its state. But...
What if I want to connect to two warehouses and do something based on the combined information retrieved separately from each one? And what if I want to do things based on factors such as time, rather than solely being driven by inventory changes and incoming websocket messages?
In all the examples I've seen - and it's beginning to feel like hundreds - there is, somewhere, in some form, a run() or a run_forever() or a run_until_complete() etc. In other words, the I/O may be asynchronous, but there is always a massive blocking operation in the code, and always two fundamental assumptions which don't fit my case: that there will only be one websocket connection, and that all processing will be driven by events sent out by the [single] websocket server.
It's very unclear to me whether the answer to my question is some sort of use of multiple event loops, or of multiple threads, or something else.
To date, experimenting with Python has felt rather like being on the penthouse floor, admiring the quirky but undeniably elegant decor. But then you get in the elevator, press the button marked "parallelism" or "concurrency", and the evelator goes into freefall, eventually depositing you in a basement filled with some pretty ugly and steaming pipes.
... Returning from flowery metaphors back to the technical, the key thing I'm struggling with is the Python equivalent of, say, Node.js code which could be as trivially simple as the following example [left inelegant for simplicity]:
var aggregateState = { ... some sort of representation of combined state ... };
var socket1 = new WebSocket("wss://warehouse1");
socket1.on("message", OnUpdateFromWarehouse);
var socket2 = new WebSocket("wss://warehouse2");
socket2.on("message", OnUpdateFromWarehouse);
function OnUpdateFromWarehouse(message)
{
... Take the information and use it to update aggregate state from both warehouses ...
}
Answering my own question, in the hope that it may help other Python newcomers... asyncio seems to be the way to go (though there are gotchas such as the alarming ease with which you can deadlock the event loop).
Assuming the use of an asyncio-friendly websocket module such as websockets, what seems to work is a framework along the following lines - shorn, for simplicity, of logic such as reconnects. (The premise remains a warehouse which sends an initial list of its full inventory, and then sends updates to that initial state.)
class Warehouse:
def __init__(self, warehouse_url):
self.warehouse_url = warehouse_url
self.inventory = {} # Some description of the warehouse's inventory
async def destroy():
if (self.websocket.open):
self.websocket.close() # Terminates any recv() in wait_for_incoming()
await self.incoming_message_task # keep asyncio happy by awaiting the "background" task
async def start(self):
try:
# Connect to the warehouse
self.websocket = await connect(self.warehouse_url)
# Get its initial message which describes its full state
initial_inventory = await self.websocket.recv()
# Store the initial inventory
process_initial_inventory(initial_inventory)
# Set up a "background" task for further streaming reads of the web socket
self.incoming_message_task = asyncio.create_task(self.wait_for_incoming())
# Done
return True
except:
# Connection failed (or some unexpected error)
return False
async def wait_for_incoming(self):
while self.websocket.open:
try:
update_message = await self.websocket.recv()
asyncio.create_task(self.process_update_message(update_message))
except:
# Presumably, socket closure
pass
def process_initial_inventory(self, initial_inventory_message):
... Process initial_inventory_message into self.inventory ...
async def process_update_message(self, update_message):
... Merge update_message into self.inventory ...
... And fire some sort of event so that the object's
... creator can detect the change. There seems to be no ...
... consensus about what is a pythonic way of implementing events, ...
... so I'll declare that - potentially trivial - element as out-of-scope ...
After completing the initial connection logic, one key thing is setting up a "background" task which repeatedly reads further update messages coming in over the websocket. The code above doesn't include any firing of events, but there are all sorts of ways in which process_update_message() can/could do this (many of them trivially simple), allowing the object's creator to deal with notifications whenever and however it sees fit. The streaming messages will continue to be received, and any events will be continued to be fired, for as long as the object's creator continues to play nicely with asyncio and to participate in co-operative multitasking.
With that in place, a connection can be established along the following lines:
async def main():
warehouse1 = Warehouse("wss://warehouse1")
if await warehouse1.start():
... Connection succeeded. Update messages will now be processed
in the "background" provided that other users of the event loop
yield in some way ...
else:
... Connection failed ...
asyncio.run(main())
Multiple warehouses can be initiated in several ways, including doing a create_task(warehouse.start()) on each one and then doing a gather on the tasks to ensure/check that they're all okay.
When it's time to quit, to keep asyncio happy, and to stop it complaining about orphaned tasks, and to allow everything to shut down nicely, it's necessary to call destroy() on each warehouse.
But there's one common element which this doesn't cover. Extending the original premise above, let's say that the warehouse also accepts requests from our websocket client, such as "ship X to Y". The success/failure responses to these requests will come in alongside the general update messages; it generally won't be possible to guarantee that the first recv() after the send() of a request will be the response to that request. This complicates process_update_message().
The best answer I've found may or may not be considered "pythonic" because it uses a Future in a way which is strongly analogous to a TaskCompletionSource in .NET.
Let's invent a couple of implementation details; any real-world scenario is likely to look something like this:
We can supply a request_id when submitting an instruction to the warehouse
The success/failure response from the warehouse repeats the request_id back to us (and thus also distinguishing between command-response messages versus inventory-update messages)
The first step is to have a dictionary which maps the ID of pending, in-progress requests to Future objects:
def __init__(self, warehouse_url):
...
self.pending_requests = {}
The definition of a coroutine which sends a request then looks something like this:
async def send_request(self, some_request_definition)
# Allocate a unique ID for the request
request_id = <some unique request id>
# Create a Future for the pending request
request_future = asyncio.Future()
# Store the map of the ID -> Future in the dictionary of pending requests
self.pending_requests[request_id] = request_future
# Build a request message to send to the server, somehow including the request_id
request_msg = <some request definition, including the request_id>
# Send the message
await self.websocket.send(request_msg)
# Wait for the future to complete - we're now asynchronously awaiting
# activity in a separate function
await asyncio.wait_for(command_future, timeout = None)
# Return the result of the Future as the return value of send_request()
return request_future.result()
A caller can create a request and wait for its asynchronous response using something like the following:
some_result = await warehouse.send_request(<some request def>)
The key to making this all work is then to modify and extend process_update_message() to do the following:
Distinguish between request responses versus inventory updates
For the former, extract the request ID (which our invented scenario says gets repeated back to us)
Look up the pending Future for the request
Do a set_result() on it (whose value can be anything depending on what the server's response says). This releases send_request() and causes the await from it to be resolved.
For example:
async def process_update_message(self, update_message):
if <some test that update_message is a request response>:
request_id = <extract the request ID repeated back in update_message>
# Get the Future for this request ID
request_future = self.pending_requests[request_id]
# Create some sort of return value for send_request() based on the response
return_value = <some result of the request>
# Complete the Future, causing send_request() to return
request_future.set_result(return_value)
else:
... handle inventory updates as before ...
I've not used sockets with asyncio, but you're likely just looking for asyncio's open_connection
async def socket_activity(address, callback):
reader, _ = await asyncio.open_connection(address)
while True:
message = await reader.read()
if not message: # empty bytes on EOF
break # connection was closed
await callback(message)
Then add these to the event loop
tasks = [] # keeping a reference prevents these from being garbage collected
for address in ["wss://warehouse1", "wss://warehouse2"]:
tasks.append(asyncio.create_task(
socket_activity(address, callback)
))
# return tasks # or work with them
If you want to wait in a coroutine until N operations are complete, you can use .gather()
Alternatively, you may find Tornado does everything you want and more (I based my Answer off this one)
Tornado websocket client: how to async on_message? (coroutine was never awaited)

Background task with progress tracking without Celery or Redis

My project do some export from one to another service. For do it, need a long background task and show progress as some text like current step of progress. One problem is how to get this text from long task.
I know about Celery and Redis. But it is needed additional resource like server. The my project is too small and does not count on attendance of more than a couple of people once a month. So I don't want to buy a shared hosting or a machine.
I tried to save this current step text into session. But response during long task is always null. I think because flask is busy with the task and does not return a valid value for the session. Tried to run the task in a new thread. Then I get an error about accessing the session from the wrong thread.
My solution
Create global dict
step = dict()
Create random key and give this key with global step into module a background task. Start a new thread and return key.
#app.route('/api/transfer', methods=['post'])
def api_transfer():
global step
key = token_urlsafe(10)
service = Service(step, key, session.get('token'))
t = threading.Thread(target=long_task, args=(service,))
t.start()
return {'key': key}
And send request with this key to get response current step for it
#app.route('/api/transfer/current_step', methods=['post'])
def api_current_step():
global step
return {'step': step[request.json['key']]}
Now for every request to export will create new key and text for only it
{'jlIG7VKtbMHtXg': 'some step for jlIG7VKtbMHtXg'}
{'jlIG7VKtbMHtXg': 'some step for jlIG7VKtbMHtXg', 'wxr4jKyP7c72sg': 'some step for wxr4jKyP7c72sg'}
And at the end of the task, remove this key from the dictionary
del self.step[self.key]

Why does Firebase event return empty object on second and subsequent events?

I have a Python Firebase SDK on the server, which writes to Firebase real-time DB.
I have a Javascript Firebase client on the browser, which registers itself as a listener for "child_added" events.
Authentication is handled by the Python server.
With Firebase rules allowing reads, the client listener gets data on the first event (all data at that FB location), but only a key with empty data on subsequent child_added events.
Here's the listener registration:
firebaseRef.on
(
"child_added",
function(snapshot, prevChildKey)
{
console.log("FIREBASE REF: ", firebaseRef);
console.log("FIREBASE KEY: ", snapshot.key);
console.log("FIREBASE VALUE: ", snapshot.val());
}
);
"REF" is always good.
"KEY" is always good.
But "VALUE" is empty after the first full retrieval of that db location.
I tried instantiating the firebase reference each time anew inside the listen function. Same result.
I tried a "value" event instead of "child_added". No improvement.
The data on the Firebase side looks perfect in the FB console.
Here's how the data is being written by the Python admin to firebase:
def push_value(rootAddr, childAddr, data):
try:
ref = db.reference(rootAddr)
posts_ref = ref.child(childAddr)
new_post_ref = posts_ref.push()
new_post_ref.set(data)
except Exception:
raise
And as I said, this works perfectly to put the data at the correct place in FB.
Why the empty event objects after the first download of the database, on subsequent events?
I found the answer. Like most things, it turned out to be simple, but took a couple of days to find. Maybe this will save someone else.
On the docs page:
http://firebase.google.com/docs/database/admin/save-data#section-push
"In JavaScript and Python, the pattern of calling push() and then
immediately calling set() is so common that the Firebase SDK lets you
combine them by passing the data to be set directly to push() as
follows..."
I suggest the wording should emphasize that you must do it that way.
The earlier Python example on the same page doesn't work:
new_post_ref = posts_ref.push()
new_post_ref.set({
'author': 'gracehop',
'title': 'Announcing COBOL, a New Programming Language'
})
A separate empty push() followed by set(data) as in this example, won't work for Python and Javascript because in those cases the push() implicitly also does a set() and so an empty push triggers unwanted event listeners with empty data, and the set(data) didn't trigger an event with data, either.
In other words, the code in the question:
new_post_ref = posts_ref.push()
new_post_ref.set(data)
must be:
new_post_ref = posts_ref.push(data)
with set() not explicitly called.
Since this push() code happens only when new objects are written to FB, the initial download to the client wasn't affected.
Though the documentation may be trying to convey the evolution of the design, it fails to point out that only the last Python and Javascript example given will work and the others shouldn't be used.

Alternative to a while loop in twisted which doesn't block the reactor thread

I'm making a chat application in twisted. Suppose my server is designed in such a way that whenever it detects a client online, it sends the client all the pending-messages (those messages of that client which were cached in a python-list on the server because it was offline) one-by-one in a while loop until the list is exhausted. Something like this:
class MyChat(LineReceiver):
def connectionMade(self):
self.factory.clients.append(self)
while True:
#retrieve first message from a list of pending-messages(queue) of "self"
msg = self.retrieveFromQueue(self)
if msg != "empty":
self.transport.write(msg)
else:
break
def lineReceived(self, line):
...
def connectionLost(self, reason):
...
def retrieveFromQueue(self, who):
msglist = []
if who in self.factory.userMessages:
msglist = self.factory.userMessages[who]
if msglist != []:
msg = msglist.pop(0) #msglist is a list of strings
self.factory.userMessages[self] = msglist
return msg
else:
return "empty"
factory.userMessages = {} #dict of list of incoming messages of users who aren't online
So according to my understanding of Twisted, the while loop will block the main reactor thread and any interaction from any other client with the server will not be registered by the server. If that's the case, I want an alternate code/method to this approach which will not block the twisted thread.
Update: There may be 2000-3000 pending messages per user because of the nature of the app.
I think that https://glyph.twistedmatrix.com/2011/11/blocking-vs-running.html addresses this point.
The answer here depends on what exactly self.retrieveFromQueue(self) does. You implied it's something like:
if self.list_of_messages:
return self.list_of_messages.pop(0)
return b"empty"
If this is the case, then the answer is one thing. On the other hand, if the implementation is something more like:
return self.remote_mq_client.retrieve_queue_item(self.queue_identifier)
then the answer might be something else entirely. However, note that it's the implementation of retrieveFromQueue upon which the answer appears to hinge.
That there is a while loop isn't quite as important. The while loop reflects the fact that (to use Glyph's words), this code is getting work done.
You may decide that the amount of work this loop represents is too great to all get done at one time. If there are hundreds of millions of queued messages then copying them one by one into the connection's send buffer will probably use both a noticable amount of time and memory. In this case, you may wish to consider the producer/consumer pattern and its support in Twisted. This won't make the code any less (or more) "blocking" but it will make it run for shorter periods of time at a time.
So the questions to answer here are really:
whether or not retrieveFromQueue blocks
if it does not block, whether or not there will be so many queued messages that processing them all will cause connectionMade to run for so long that other clients notice a disruption in service

Using DeferredQueue for inter-task communication in Twisted

I have a Client that currently does the following:
connects
collects some data locally
sends that data to a server
repeats
if disconnected, reconnects and continues the above (not shown)
Like this:
def do_send(self):
def get_data():
# do something
return data
def send_data(data)
self.sendMessage(data)
return deferToThread(get_data).addCallback(send_data)
def connectionMade(self):
WebSocketClientProtocol.connectionMade(self)
self.sender = task.LoopingCall(self.do_send)
self.sender.start(60)
However, when disconnected, I would like the data collection to continue, probably queuing and writing to file at a certain limit. I have reviewed the DeferredQueue object which seems like what I need, but I can't seem to crack it.
In pseudo-code, it would go something like this:
queue = DeferredQueue
# in a separate class from the client protocol
def start_data_collection():
self.collecter = task.LoopingCall(self.get_data)
self.sender.start(60)
def get_data()
# do something
queue.put(data)
Then have the client protocol check the queue, which is where I get lost. Is DeferredQueue what I need, or is there a better way?
A list would work just as well. You'll presumably get lost in the same place - how do you have the client protocol check the list?
Either way, here's one answer:
queued = []
...
connecting = endpoint.connect(factory)
def connected(protocol):
if queued:
sending = protocol.sendMessage(queued.pop(0))
sending.addCallback(sendNextMessage, protocol)
sending.addErrback(reconnect)
connecting.addCallback(connected)
The idea here is that at some point an event happens: your connection is established. This example represents that event as the connecting Deferred. When the event happens, connected is called. This example pops the first item from the queue (a list) and sends it. It waits for the send to be acknowledged and then sends the next message. It also implies some logic about handling errors by reconnecting.
Your code could look different. You could use the Protocol.connectionMade callback to represent the connection event instead. The core idea is the same - define callbacks to handle certain events when they happen. Whether you use an endpoint's connect Deferred or a protocol's connectionMade doesn't really matter.

Categories

Resources