Redis: only allow operations on existing keys - Python

I am using the Python package redis-py to operate on a Redis database. I have a bunch of clients that set keys and values of a hash in Redis. I want them to set keys and values only when the hash exists; if the hash doesn't exist, setting keys and values will create the hash, which is not what I want.
On the redis-py page (https://github.com/andymccurdy/redis-py), the author suggests a way to do atomic operations on the client side, so I wrote a similar function:
with r.pipeline() as pipe:
    while True:
        try:
            pipe.watch("a_hash")
            if pipe.exists("a_hash"):
                pipe.hset("a_hash", "key", "value")
            break
        except redis.WatchError:
            continue
        finally:
            pipe.reset()
However, this does not seem to work. After I delete the hash from another client, the hash still gets created by this piece of code, so I guess this is not an atomic operation. Could someone help me identify the problem with this code? Or is there a better way to achieve this?
Appreciate your help!

I would suggest reading the definition of a WATCH/MULTI/EXEC block as explained in the Redis documentation.
In such block, only the commands between MULTI and EXEC are actually processed atomically (and conditionally, with an all-or-nothing semantic depending on the watch).
In your example, the EXISTS and HSET commands are not executed atomically. Actually, you don't need this atomicity: what you want is the conditional execution.
This should work better:
with r.pipeline() as pipe:
    while True:
        try:
            pipe.watch("a_hash")
            if pipe.exists("a_hash"):
                pipe.multi()
                pipe.hset("a_hash", "key", "value")
                pipe.execute()
            break
        except redis.WatchError:
            continue
        finally:
            pipe.reset()
If the key is deleted after the EXISTS but before the MULTI, the HSET will not be executed, thanks to the watch.
With Redis 2.6, a Lua server-side script is probably easier to write, and more efficient.
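For example, here is a minimal sketch using redis-py's register_script (the hash name and field are the ones from the question; the script itself is just an illustration): the existence check and the HSET run as a single atomic script on the server, so no WATCH loop is needed.

import redis

r = redis.Redis()

# Write the field only if the hash already exists; runs atomically on the server.
HSET_IF_EXISTS = """
if redis.call('EXISTS', KEYS[1]) == 1 then
    return redis.call('HSET', KEYS[1], ARGV[1], ARGV[2])
end
return nil
"""

hset_if_exists = r.register_script(HSET_IF_EXISTS)
result = hset_if_exists(keys=["a_hash"], args=["key", "value"])
# result is None when the hash did not exist, so nothing was written.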

Related

Prevent AWS Lambda code from executing multiple times

I have a very important webhook which calls my Lambda function. The issue is that this webhook is hitting my Lambda function three times with the same data, and I don't want to process it three times. I want to exit if it has already been handled. I tried to store the data (paid) in DynamoDB and check whether it's already present, but that isn't working; it's as if the DB is not atomic.
I call the method below before executing the code.
def check_duplicate_webhook(user_id, order_id):
    try:
        status = dynamodb_table_payment.get_item(
            Key={'user_id': user_id},
            ProjectionExpression='payments.#order_id.#pay_status',
            ExpressionAttributeNames={
                "#order_id": order_id,
                '#pay_status': "status"
            })
        if "Item" in status and "payments" in status['Item']:
            check = status['Item']['payments'][order_id]
            if check == 'paid':
                return True
        return False
    except Exception as e:
        log(e)
        return False
Updating the database
dynamodb_table_payment.update_item(
    Key={'user_id': user_id},
    UpdateExpression="SET payments.#order_id.#pay_status = :pay_status, "
                     "payments.#order_id.#update_date = :update_date, "
                     "payments.#order_id.reward = :reward_amount",
    ExpressionAttributeNames={
        "#order_id": attr['order_id'],
        '#pay_status': "status",
        '#update_date': 'updated_at'
    },
    ExpressionAttributeValues={
        ":pay_status": 'paid',
        ':update_date': int(time.time()),
        ':reward_amount': reward_amount
    })
A separate DynamoDB read followed by a write isn't atomic, and if the three requests come very close together, the value you read back may not be consistent. For financial transactions it is recommended to use DynamoDB transactions.
May I also suggest that you use Step Functions and decouple the triggering from the actual execution. The webhook will trigger a function that registers the payment for execution, and a different function will execute it. You will need some orchestration in the future, if for nothing else than to implement retry logic.
You're separating the retrieve and update, which can cause a race condition. You should be able to switch to a put_item() with condition, which will only insert once (or optionally update if the criteria are met).
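For example, a rough sketch of that conditional write with boto3 (assuming the per-order table suggested at the end of this answer; the table name and the claim_webhook helper are made up for illustration):

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
payments = dynamodb.Table('payments_by_order')  # hypothetical table keyed on (user_id, order_id)

def claim_webhook(user_id, order_id):
    """Return True only for the first caller; duplicate webhooks get False."""
    try:
        payments.put_item(
            Item={'user_id': user_id, 'order_id': order_id, 'status': 'processing'},
            # The write fails if an item with this primary key already exists.
            ConditionExpression='attribute_not_exists(order_id)'
        )
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False
        raise

The first invocation inserts the item and proceeds; the duplicates hit the condition failure and can simply exit.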
You could also use a FIFO SQS queue as an intermediary between the webhook and Lambda, and let it do the deduplication. But that's a more complex solution.
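A sketch of that variant too, in case it helps (the queue URL is a placeholder, and this assumes a FIFO queue where the explicit deduplication id drives the dedup):

import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/payments.fifo'  # placeholder

def enqueue_webhook(user_id, order_id, body):
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=body,
        MessageGroupId=user_id,
        # Duplicate webhooks share the same deduplication id, so SQS drops them
        # within its 5-minute deduplication window.
        MessageDeduplicationId=f'{user_id}-{order_id}',
    )

The Lambda then consumes from the queue instead of being invoked by the webhook directly.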
It also appears that you're storing all orders for a given customer in a single record in DynamoDB. This seems like a bad idea to me: first because you need more RCUs/WCUs to be able to retrieve larger records, second because you will eventually bump up against the size limit of a DynamoDB record, and third because it makes the update logic more complex. I think you would be better off managing orders separately, using a key of (user_id, order_id).
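If you go that route, a possible sketch of such a per-order table (the table and attribute names are placeholders, and on-demand billing is just one option):

import boto3

dynamodb = boto3.resource('dynamodb')

orders = dynamodb.create_table(
    TableName='payments_by_order',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},
        {'AttributeName': 'order_id', 'KeyType': 'RANGE'},
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'order_id', 'AttributeType': 'S'},
    ],
    BillingMode='PAY_PER_REQUEST',
)

Each order then lives in its own small item, which keeps the conditional write above simple and avoids the record-size limit.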

Python : Handling an error by provoking it. Is this good practice?

I have a Python application for which I have a logger.
At different steps of the execution, the application must read various input files, which can contain different information but are all read through the same function.
One particular piece of information, called computation_id, MUST be present in one of the files but can be absent from all the others.
I am interested to know what a correct way of handling this situation is. Currently I am handling it like this:
def input_reading(filename):
    results = {}
    [...]
    try:
        results['computation_id'] = read_computation_id()
    except KeyError:
        pass
    [...]
    return results
So if the computation_id is absent from the file being read, the code keeps running. However, at some point I will need this computation id, and therefore I need to check whether it was correctly read from the file where I expect to find it.
I actually need this value far down the code, but running the code up to that point (which takes some time) only to then fail is wasted computation time. So my idea is to check for this value as soon as I can and handle the error the following way:
def specifc_file_read(filename):
    [...]
    results = input_reading('my_file')
    try:
        results['computation_id']
    except KeyError:
        logger.exception('no computation id provided, aborting')
        raise SystemExit('no computation id provided, aborting')
    [...]
Is this good practice ?
I have a feeling it's not since I need to write special code lines to check for the error "as soon as I can" in the code to avoid wasting computation time.
Since I don't have much experience with error handling, I want to know if this is good practice or not in order not to keep bad habits.
It comes down to what you prefer, imo.
I think this would be more readable:
if 'computation_id' not in results: raise ...
although if you want to check that it's available in any file before you begin some heavy data processing, you could do:
for f in get_files():
    if 'computation_id' in f:
        break
else:
    raise SystemExit
The else clause runs only if the loop didn't break, so SystemExit is raised if the id isn't available in at least one of the files.

Check if a database connection is busy using Python

I want to create a Database class which can create cursors on demand.
It must be possible to use the cursors in parallel (two or more cursors can coexist) and, since we can only have one cursor per connection, the Database class must handle multiple connections.
For performance reasons we want to reuse connections as much as possible and avoid creating a new connection every time a cursor is created:
whenever a request is made the class will try to find, among the opened connections, the first non-busy connection and use it.
A connection is still busy as long as the cursor has not been consumed.
Here is an example of such class:
class Database:
    ...
    def get_cursos(self, query):
        selected_connection = None
        # Find usable connection
        for con in self.connections:
            if con.is_busy() == False:  # <--- This is not PEP 249
                selected_connection = con
                break
        # If all connections are busy, create a new one
        if selected_connection is None:
            selected_connection = self._new_connection()
            self.connections.append(selected_connection)
        # Return cursor on query
        cur = selected_connection.cursor()
        cur.execute(query)
        return cur
However looking at the PEP 249 standard I cannot find any way to check whether a connection is actually being used or not.
Some implementations such as MySQL Connector offer ways to check whether a connection has still unread content (see here), however as far as I know those are not part of PEP 249.
Is there a way I can achieve what I described above with any PEP 249 compliant Python database API?
Perhaps you could use the status of the cursor to tell you if a cursor is being used. Let's say you had the following cursor:
new_cursor = new_connection.cursor()
new_cursor.execute(new_query)
and you wanted to see if that connection was available for another cursor to use. You might be able to do something like:
if new_cursor.rowcount == -1:
    another_new_cursor = new_connection.cursor()
    ...
Of course, all this really tells you is that the cursor hasn't executed anything yet since the last time it was closed. It could point to a cursor that is done (and therefore a connection that has been closed) or it could point to a cursor that has just been created or attached to a connection. Another option is to use a try/except block, something along the lines of:
try:
    another_new_cursor = new_connection.cursor()
except ConnectionError:  # not actually sure which error would go here, but you get the idea
    print("this connection is busy.")
Of course, you probably don't want to be spammed with printed messages but you can do whatever you want in that except block, sleep for 5 seconds, wait for some other variable to be passed, wait for user input, etc. If you are restricted to PEP 249, you are going to have to do a lot of things from scratch. Is there a reason you can't use external libraries?
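If you do stay within PEP 249, one from-scratch option (just a sketch, not part of any driver's API) is to track the busy state yourself by wrapping connections and cursors; the class names here are made up:

class TrackedConnection:
    def __init__(self, connection):
        self._connection = connection
        self._busy = False

    def is_busy(self):
        return self._busy

    def cursor(self):
        self._busy = True
        return TrackedCursor(self._connection.cursor(), self)

    def _release(self):
        self._busy = False


class TrackedCursor:
    def __init__(self, cursor, owner):
        self._cursor = cursor
        self._owner = owner

    def __getattr__(self, name):
        # Delegate execute(), fetchall(), etc. to the real cursor.
        return getattr(self._cursor, name)

    def close(self):
        self._cursor.close()
        self._owner._release()

The Database class from the question could then call is_busy() on these wrappers instead of relying on driver-specific checks, at the cost of trusting callers to close their cursors.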
EDIT: If you are willing to move outside of PEP 249, here is something that might work, but it may not be suitable for your purposes. If you make use of the mysql python library, you can take advantage of the is_connected method.
new_connection = mysql.connector.connect(host='myhost',
                                         database='myDB',
                                         user='me',
                                         password='myPassword')
...stuff happens...
if new_connection.is_connected():
    pass
else:
    another_new_cursor = new_connection.cursor()
    ...

Why does Firebase event return empty object on second and subsequent events?

I have a Python Firebase SDK on the server, which writes to Firebase real-time DB.
I have a Javascript Firebase client on the browser, which registers itself as a listener for "child_added" events.
Authentication is handled by the Python server.
With Firebase rules allowing reads, the client listener gets data on the first event (all data at that FB location), but only a key with empty data on subsequent child_added events.
Here's the listener registration:
firebaseRef.on(
    "child_added",
    function(snapshot, prevChildKey)
    {
        console.log("FIREBASE REF: ", firebaseRef);
        console.log("FIREBASE KEY: ", snapshot.key);
        console.log("FIREBASE VALUE: ", snapshot.val());
    }
);
"REF" is always good.
"KEY" is always good.
But "VALUE" is empty after the first full retrieval of that db location.
I tried instantiating the firebase reference each time anew inside the listen function. Same result.
I tried a "value" event instead of "child_added". No improvement.
The data on the Firebase side looks perfect in the FB console.
Here's how the data is being written by the Python admin to firebase:
def push_value(rootAddr, childAddr, data):
    try:
        ref = db.reference(rootAddr)
        posts_ref = ref.child(childAddr)
        new_post_ref = posts_ref.push()
        new_post_ref.set(data)
    except Exception:
        raise
And as I said, this works perfectly to put the data at the correct place in FB.
Why the empty event objects after the first download of the database, on subsequent events?
I found the answer. Like most things, it turned out to be simple, but took a couple of days to find. Maybe this will save someone else.
On the docs page:
http://firebase.google.com/docs/database/admin/save-data#section-push
"In JavaScript and Python, the pattern of calling push() and then
immediately calling set() is so common that the Firebase SDK lets you
combine them by passing the data to be set directly to push() as
follows..."
I suggest the wording should emphasize that you must do it that way.
The earlier Python example on the same page doesn't work:
new_post_ref = posts_ref.push()
new_post_ref.set({
    'author': 'gracehop',
    'title': 'Announcing COBOL, a New Programming Language'
})
A separate empty push() followed by set(data), as in this example, won't work for Python and JavaScript: in those SDKs push() implicitly also does a set(), so the empty push triggers unwanted event listeners with empty data, and the subsequent set(data) doesn't trigger an event carrying the data either.
In other words, the code in the question:
new_post_ref = posts_ref.push()
new_post_ref.set(data)
must be:
new_post_ref = posts_ref.push(data)
with set() not explicitly called.
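Applied to the push_value() helper from the question, the fix looks like this (same function, just passing the data straight to push()):

def push_value(rootAddr, childAddr, data):
    try:
        ref = db.reference(rootAddr)
        posts_ref = ref.child(childAddr)
        # push(data) creates the child and writes the value in one step,
        # so listeners see a single child_added event carrying the data.
        new_post_ref = posts_ref.push(data)
    except Exception:
        raise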
Since this push() code happens only when new objects are written to FB, the initial download to the client wasn't affected.
Though the documentation may be trying to convey the evolution of the design, it fails to point out that only the last Python and JavaScript example given will work, and that the others shouldn't be used.

redis block until key exists

I'm new to Redis and was wondering if there is a way to await getting a value by its key until the key exists. Minimal code:
async def handler():
    data = await self._fetch(key)

async def _fetch(key):
    return self.redis_connection.get(key)
As you know, if such a key doesn't exist, it returns None. But since in my project setting the key-value pair happens in another application, I want the redis_connection get method to block until the key exists.
Is such an expectation even valid?
It is not possible to do what you are trying to do without implementing some sort of polling of the Redis GET on your client. In that case your client would have to do something like:
import asyncio

async def _fetch(key):
    val = self.redis_connection.get(key)
    while val is None:
        # Sleep and retry here
        await asyncio.sleep(1)
        val = self.redis_connection.get(key)
    return val
However, I would ask you to completely reconsider the pattern you are using for this problem.
It seems to me that what you need is something like Pub/Sub: https://redis.io/topics/pubsub.
So the app that performs the SET becomes a publisher, and the app that does the GET and waits until the key is available becomes the subscriber.
I did a bit of research on this and it looks like you can do it with asyncio_redis:
Subscriber https://github.com/jonathanslenders/asyncio-redis/blob/b20d4050ca96338a129b30370cfaa22cc7ce3886/examples/pubsub/receiver.py.
Sender(Publisher): https://github.com/jonathanslenders/asyncio-redis/blob/b20d4050ca96338a129b30370cfaa22cc7ce3886/examples/pubsub/sender.py
Hope this helps.
Besides the keyspace notification method mentioned by @Itamar Haber, another solution is to use blocking operations on a LIST.
The handler method calls BRPOP on an empty LIST: BRPOP notify-list timeout, and blocks until notify-list is NOT empty.
The other application pushes the value to the LIST when it finishes setting the key-value pair as usual: SET key value; LPUSH notify-list value.
The handler wakes from the blocking operation with the value you want, and notify-list is destroyed by Redis automatically.
The advantage of this solution is that you don't need to modify your handler method too much (with the keyspace notification solution you need to register a callback function). The disadvantage is that you have to rely on the notification from the other application (with the keyspace notification solution, Redis does the notification automatically).
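A minimal sketch of this pattern with redis-py (the key and list names are just illustrative):

import redis

r = redis.Redis()

# Writer side: set the key, then push to the notify list.
def write_value(key, value):
    r.set(key, value)
    r.lpush('notify-list', value)

# Reader side: block (up to timeout seconds, 0 = forever) until a value arrives.
def wait_for_value(timeout=0):
    item = r.brpop('notify-list', timeout=timeout)
    if item is None:
        return None  # timed out
    _list_name, value = item
    return value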
The closest you can get to this behavior is by enabling keyspace notifications and subscribing to the relevant channels (possibly by pattern).
Note, however, that notifications rely on PubSub that is not guaranteed to deliver messages (at-most-once semantics).
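For completeness, a rough sketch of the keyspace notification approach with redis-py (this assumes you are allowed to change the server configuration and are using database 0; the key name my_key is illustrative):

import redis

r = redis.Redis()

# "K" enables keyspace events, "$" limits them to string commands such as SET.
# This can also be set in redis.conf instead of at runtime.
r.config_set('notify-keyspace-events', 'K$')

p = r.pubsub()
p.subscribe('__keyspace@0__:my_key')

for message in p.listen():
    # Skip the subscribe confirmation; react only to the SET event.
    if message['type'] == 'message' and message['data'] == b'set':
        value = r.get('my_key')
        break

And as noted above, because this rides on Pub/Sub, a notification published while the subscriber is disconnected is simply lost.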
Since Redis 5.0 there are built-in streams, which support blocking reads. The following is sample code with redis-py.
# add a value to my_stream
redis.xadd('my_stream', {'key': 'str_value'})

# read from the beginning of the stream
last_id = '0'

# blocking read until there is a value
last_stream_item = redis.xread({"my_stream": last_id}, block=0)

# update last_id
last_id = last_stream_item[0][1][0][0]

# wait for the next value to arrive on the stream
last_stream_item = redis.xread({"my_stream": last_id}, block=0)
