I have a system that uses MySQL as the database and RabbitMQ for organizing asynchronous data processing.
There are two processes (in two different containers) that work with the same record. The first one updates the record's status in a DB transaction and sends a message to a RabbitMQ queue. The second process fetches the record from the DB and does some work on it. The problem is that the second process can read the message from the queue before the first process has finished updating the record.
Currently, to avoid this problem, the second process checks the status of the record; if it does not match the target value, the process waits for the update by re-publishing the message to the same queue.
This happens because publishing to the queue is performed within the transaction context. If I move the publish outside the transaction, an error may occur or the process may be interrupted after the DB transaction has committed: the status in the database will change, but the message will never be sent to the queue, and the second process will never process this record.
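A simplified sketch of the first process, just to illustrate the flow (the libraries, table, queue, and column names here are only placeholders, not my actual code):

```python
import json
import mysql.connector
import pika

db = mysql.connector.connect(host="localhost", user="app",
                             password="secret", database="app")
mq = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = mq.channel()
channel.queue_declare(queue="record_ready", durable=True)

def update_and_notify(record_id):
    cursor = db.cursor()
    try:
        cursor.execute("UPDATE records SET status = 'READY' WHERE id = %s",
                       (record_id,))
        # The publish happens here, before the commit, so the consumer can
        # receive the message while the row still shows the old status.
        channel.basic_publish(exchange="", routing_key="record_ready",
                              body=json.dumps({"id": record_id}))
        db.commit()
    except Exception:
        db.rollback()
        raise
```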
What can you suggest to solve this architectural problem?
Related
I have a Python application that has autocommit=True and uses poll() to get messages with an interval of 1 second. The documentation mentions that polling reads messages in a background thread and queues them so that the main thread can take them afterwards. I was a bit confused about what happens if I have multiple messages queued and my consumer crashes. Would the messages queued by the background thread have been committed already and hence get lost?
As mentioned in the docs, every auto.commit.interval.ms, any polled offsets will get committed.
If you are concerned about missing data, you should always disable auto-commits, in any Kafka client, and handle commits on your own after you know you've actually processed those records.
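A minimal sketch of manual commits, assuming the confluent-kafka client (the broker address, group id, and topic are placeholders):

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # placeholder
    "group.id": "my-consumer-group",         # placeholder
    "enable.auto.commit": False,             # no automatic offset commits
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["my-topic"])             # placeholder

def process(value):
    print(value)                             # stands in for the real processing

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        process(msg.value())
        # Commit only after the record has actually been handled, so a crash
        # before this line means the record will be re-delivered.
        consumer.commit(message=msg, asynchronous=False)
finally:
    consumer.close()
```

A crash between processing and committing then causes that record to be re-delivered, so processing should be idempotent.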
I am quite new to Python.
My problem:
I am starting a bunch of threads in parallel, each one trying to create a session with one of a number of foreign hosts (to perform a few activities in those sessions later). Some of these hosts may be in an awkward state, in which case the session creation eventually fails, but that takes about 60 seconds. If successful, the session is created immediately. Hence I want to terminate the respective session-creation threads after a reasonable time (a few seconds). However, I learned that the only way to stop a thread is to communicate an event status for the thread to observe, which is of no use if it is stuck in a blocking action (here: establishing the session). I use join() with a timeout; that speeds up execution for all sessions that are created successfully, but main() of course won't exit until all the temporarily stuck threads have returned. Is there really no way to cut this short?
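A minimal sketch of what I am doing now (the host names and the session call are placeholders):

```python
import threading
import time

def create_session(host):
    # Placeholder: stands in for the real session setup, which can block ~60 s.
    time.sleep(60)

hosts = ["host-a", "host-b", "host-c"]        # placeholders
threads = [threading.Thread(target=create_session, args=(h,)) for h in hosts]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)     # stops waiting after a few seconds...
# ...but the interpreter still waits for the non-daemon threads before exiting.
```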
Thanks for any useful advice.
Uwe
I have 1 big task which consists of 200 sub-tasks (messages) that will be published onto a queue. If I want to cancel this one task, the 200 messages (or the ones that are left and not yet processed) should be deleted. Is there any way to delete these published messages from a queue?
One solution I could think of is to create a queue (Q) to which I publish the name of a new, dynamically created queue (X). Each consumer then connects to this queue (X) and processes the 200 published messages. If I want to abort the entire task, I simply delete that queue (X) from the publisher side. Is that a common approach?
I see a few issues with your suggested approach.
The first problem is due to RMQ consumer prefetch, which is intended to improve performance by reducing the number of requests to the broker. If your consumers have already retrieved a batch of tasks, they will process them all before asking for new ones; only then will they realize the queue was deleted. Therefore, your cancellation request would not be handled promptly most of the time. You could reduce the prefetch count to 1 to avoid this side effect, but that would increase the pressure on the network and reduce overall speed.
The second issue is that the AMQP protocol does not provide a mechanism for gracefully dealing with queue deletion. Your consumers would therefore need to handle disappearing queues carefully, as they would otherwise crash. By doing so, you would lose visibility over bugs and issues: how would you distinguish a queue that was explicitly deleted from one that crashed for real?
What I would recommend instead is marking all your tasks with an identifier of their parent job. Each time a consumer starts consuming a new task, it checks whether the parent job is still valid or has been cancelled. In the latter case, it simply ignores the task and moves on to the next one. You need a supporting service for that; a Redis instance, for example, should be more than enough.
This mechanism would be far simpler and more robust. You can spin up as many consumers as you want without needing to orchestrate their connections to the right queue, and out-of-order or interleaved tasks would not be a problem.
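A rough sketch of that check with pika and redis-py (the queue name, Redis key, and message fields are assumptions):

```python
import json
import pika
import redis

r = redis.Redis(host="localhost", port=6379)
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)

def do_work(task):
    print("processing", task)        # stands in for the real sub-task processing

def on_message(ch, method, properties, body):
    task = json.loads(body)
    # If the parent job has been cancelled, drop the task and move on.
    if r.sismember("cancelled_jobs", task["job_id"]):
        ch.basic_ack(delivery_tag=method.delivery_tag)
        return
    do_work(task)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="tasks", on_message_callback=on_message)
channel.start_consuming()
```

Cancelling the whole job from the publisher side is then just a matter of adding its id to the "cancelled_jobs" set.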
I want to write a consumer with a SelectConnection.
We have several devices in our network infrastructure that close connections after a certain time, therefore I want to use the heartbeat functionality.
As far as I know, the IOLoop runs on the main thread, so heartbeat frames cannot be processed while that thread is busy processing a message.
My idea is to create several worker threads that process the messages so that the main thread can handle the IOLoop. Processing a message takes a lot of resources, so only a certain number of messages should be processed at once. Instead of storing the remaining messages on the client side, I would like to leave them in the queue.
Is there a way to interrupt the consumption of messages, without interrupting the heartbeat?
I am not an expert on SelectConnection for pika, but you could implement this by setting the consumer prefetch (QoS) to the desired number of workers.
This basically means that once a message comes in, you offload it to a process or thread, and once the message has been processed, you acknowledge it.
For example, if you set the QoS to 10, the client will pull at most 10 messages and won't pull any new ones until at least one of those has been acknowledged.
The important part here is that you would need to acknowledge messages only once you are finished processing them.
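A sketch of that idea with pika's SelectConnection (the queue name, prefetch value, and the process() body are assumptions, and the queue is assumed to already exist):

```python
import functools
import threading
import pika

def process(body):
    print("processing", body)        # stands in for the expensive work

def handle_delivery(channel, method, properties, body, connection):
    # Run the heavy work in a worker thread so the IOLoop stays free
    # to send and receive heartbeat frames.
    def work():
        process(body)
        # Acknowledgements must be issued from the IOLoop thread.
        cb = functools.partial(channel.basic_ack, method.delivery_tag)
        connection.add_callback_threadsafe(cb)
    threading.Thread(target=work, daemon=True).start()

def on_channel_open(channel, connection):
    # At most 10 unacknowledged messages are handed to this consumer at a time;
    # the rest stay in the queue on the broker.
    channel.basic_qos(prefetch_count=10)
    on_msg = functools.partial(handle_delivery, connection=connection)
    channel.basic_consume(queue="work", on_message_callback=on_msg)

def on_connection_open(connection):
    connection.channel(
        on_open_callback=functools.partial(on_channel_open, connection=connection))

params = pika.ConnectionParameters("localhost", heartbeat=30)
conn = pika.SelectConnection(params, on_open_callback=on_connection_open)
conn.ioloop.start()
```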
Let's say I have 100 servers, each running a daemon (let's call it server) that is responsible for spawning a thread for each user of this particular service (let's say 1000 threads per server). Every N seconds each thread does something and fetches information for that particular user (this request/response model cannot be changed). The problem I have is that sometimes a thread hangs and stops doing its work. I need some way to know that the user's data is stale and needs to be refreshed.
The only idea I have is to make each thread update a MySQL record associated with its user every 5N seconds (a last_scanned column in the users table), and to have another process check that table every 15N seconds; if the last_scanned column is not current, it restarts the thread.
The general way to handle this is to have the threads report their status back to the server daemon. If you haven't seen a status update within the last 5N seconds, then you kill the thread and start another.
You can keep track of the current active threads that you've spun up in a list, then just loop through them occasionally to determine state.
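A minimal sketch of that status reporting and the periodic check (the intervals and the per-user work are placeholders):

```python
import threading
import time

N = 10                 # placeholder for the real per-user polling interval (seconds)
last_seen = {}         # user_id -> time of the last completed iteration
lock = threading.Lock()

def do_user_request(user_id):
    pass               # stands in for the real request/response work

def worker(user_id):
    while True:
        do_user_request(user_id)
        with lock:
            last_seen[user_id] = time.time()   # the thread reports its status
        time.sleep(N)

def spawn(user_id):
    threading.Thread(target=worker, args=(user_id,), daemon=True).start()
    with lock:
        last_seen[user_id] = time.time()

def watchdog():
    while True:
        time.sleep(5 * N)
        now = time.time()
        with lock:
            stale = [u for u, t in last_seen.items() if now - t > 5 * N]
        for user_id in stale:
            # Python cannot forcibly kill a thread, so the hung daemon thread
            # is simply abandoned and a fresh worker is started in its place.
            spawn(user_id)
```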
You of course should also fix the errors in your program that are causing threads to exit prematurely.
Premature exits and killing a thread could also leave your program in an unexpected, non-atomic state. You should probably also have the server daemon run a cleanup process that makes sure any items in your queue, or whatever you're using to determine the workload, get reset after a certain period of inactivity.