Scanning a MySQL table for updates in Python

I am creating a GUI that depends on information from a MySQL table. What I want is to display a message every time the table is updated with new data. I am not sure how to do this, or even if it is possible. I have code that retrieves the newest MySQL update, but I don't know how to show a message every time new data comes into the table. Thanks!

A simple and straightforward solution is to poll the latest auto-increment id from your table and compare it with what you saw at the previous poll. If it is greater, you have new data. This is called "active polling"; it's simple to implement and will suffice if you don't do it too often. You have to store the last id value somewhere in your GUI, and note that this stored value will reset when you restart your GUI application, so be sure to think about what to do at startup. Probably you will only need to track insertions that occur while the GUI is running: in that case, at GUI startup just poll and store the current id value, then poll periodically and react to its changes.
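A minimal sketch of that polling loop, assuming PyMySQL and a table events with an AUTO_INCREMENT id column (all names, credentials, and the interval are illustrative):

import time
import pymysql

# autocommit=True so each SELECT sees freshly committed rows instead of
# a stale REPEATABLE READ snapshot from one long-lived transaction.
conn = pymysql.connect(host="localhost", user="app", password="secret",
                       database="mydb", autocommit=True)

def latest_id():
    with conn.cursor() as cur:
        cur.execute("SELECT MAX(id) FROM events")
        (max_id,) = cur.fetchone()
        return max_id or 0

last_seen = latest_id()  # baseline at startup: only react to rows inserted from now on
while True:
    time.sleep(5)  # poll interval
    current = latest_id()
    if current > last_seen:
        print("New data arrived!")  # show your GUI message here instead
        last_seen = current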

@spacediver gives some good advice about the active-polling approach. I wanted to post some other options as well.
You could use some type of message passing to communicate notifications between clients; ZeroMQ, Twisted, etc. offer these features. One way to do it is to have the updating client publish a message along with its successful database insert. Clients can all listen on a channel for notifications instead of always polling the db.
If you can't control adding an update message to the client doing the insertions, you could also look at this link for using a database trigger to call a script, which would simply issue an update message to your messaging framework. It explains installing a UDF extension that allows you to run a sys_exec command in a trigger and call a simple script.
This way clients simply respond to a notification instead of all checking regularly.
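To make the message-passing idea concrete, here is a minimal pyzmq PUB/SUB sketch (the channel name and addresses are illustrative):

import zmq

ctx = zmq.Context()

# Publisher side: the client doing the INSERT announces it after a
# successful commit.
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:5556")
pub.send_string("table_updated")

# Subscriber side: every GUI client listens instead of polling.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://dbhost:5556")
sub.setsockopt_string(zmq.SUBSCRIBE, "table_updated")
msg = sub.recv_string()  # blocks until a notification arrives
print("Table changed:", msg)

Note that PUB/SUB subscribers only see messages published after they connect, so a client should still do one initial query at startup.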

Related

Django - Two Users Accessing The Same Data

Let's say that I have a Django web application with two users. My web application has a global variable that exists on the server (a Pandas DataFrame created from data from an external SQL database).
Let's say that a user makes an update request to that DataFrame and the DataFrame is now being updated. While it is being updated, the other user makes a get request for it. Is there a way to 'lock' that DataFrame until user 1 is finished with it and then finish the request made by user 2?
EDIT:
So the order of events should be:
User 1 makes an update request; the DataFrame is locked; User 2 makes a get request; the DataFrame finishes updating; the DataFrame is unlocked; User 2 gets his/her response.
Lines of code would be appreciated!
Ehm... Django is not a server. It ships with a single-threaded development server, but that should not be used for anything beyond development, and maybe not even for that. Django applications are deployed using WSGI. The WSGI server running your app is likely to start several separate worker processes or threads and will kill and restart them according to the rules in its configuration.
This means that you cannot rely on multiple requests hitting the same process. The Django app lifecycle runs from getting a request to returning a response; anything that is not explicitly made persistent between those two events should be considered gone.
So, when one of your users updates a global variable, this variable only exists in the one process this user happened to hit. The second user might or might not hit the same process, and therefore might or might not see the same copy of the variable. More than that, the process will sooner or later be killed by the WSGI server and all the updates will be gone.
What I am getting at is that you might want to rethink your architecture before you bother with the atomic-update problems.
Don't share in-memory objects if you're going to mutate them. Concurrency is super hard to do right, and premature optimization is evil. Give each user their own view of the data and only share data via the database (using transactions to make your updates atomic). Keep a counter in your database and increment it every time you make an update; make a transaction fail if that number has changed since the data was read (because somebody else has mutated it).
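A minimal sketch of that counter idea with the Django ORM (the Dataset model and its payload and version fields are assumptions for illustration, not from the question):

from django.db import transaction
from myapp.models import Dataset  # assumed model with `payload` and integer `version` fields

def update_dataset(pk, new_payload, version_read):
    # Optimistic lock: the UPDATE matches a row only if nobody bumped
    # `version` since we read it.
    with transaction.atomic():
        updated = (Dataset.objects
                   .filter(pk=pk, version=version_read)
                   .update(payload=new_payload, version=version_read + 1))
    if not updated:
        raise RuntimeError("Row changed underneath us; re-read and retry")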
Also, don't make important architectural decisions when tired! :)

Django: Detecting when a background script finishes while other functionality keeps working

I need to execute a command on a simple button-press event in my Django project (for which I'm using subprocess.Popen() in my views.py).
After I start this script it may take anywhere from 2 to 5 minutes to complete. While the script executes I need to disable the HTML button, but I want users to be able to keep using other web pages while the script finishes in the background. The real problem is that I want to re-enable the HTML button when the process finishes!
I've been stuck on this for days. Any help or suggestion is really appreciated.
I think you have to use a "realtime" library for Django. I personally know django-realtime (a simple one) and swampdragon (less simple, but more functional). With both of these libraries you can create a web-socket connection and send messages to clients from the server; the message could be a command to enable the HTML button, a JavaScript alert, or whatever you want.
In your case I advise the first option, because you can send a message to the client directly from any view, whereas swampdragon needs a model to track changes, as far as I know.
Like valentjedi suggested, you should be using swampdragon for real time with Django.
You should take the first tutorial here: http://swampdragon.net/tutorial/part-1-here-be-dragons-and-thats-a-good-thing/
Then read this as it holds knowledge required to accomplish what you want:
http://swampdragon.net/tutorial/building-a-real-time-server-monitor-app-with-swampdragon-and-django/
However, there is a difference between your situation and the example given above. In your situation:
Use Celery or any other task queue, since the action you wait for takes a long time to finish and needs to be pushed to the background (you can also make these tasks run one after another if you don't want to freeze your system with enormous memory usage); a minimal task sketch follows this list.
Move the part of the code that runs the script into your Celery task; in this case, Popen should be called in your Celery task and not in your view (router in swampdragon).
Then create a channel with the user's unique identifier, and add the relevant swampdragon JavaScript code to your HTML file so the button subscribes to that user's channel (also consider disabling the feature in your view (router), since front-end code can be tampered with).
The channel's role is to pull the Celery task state; you then disable or enable the button according to the state of the task.
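A minimal sketch of the Celery side (the task name and script path are illustrative, not from the original answer):

import subprocess
from celery import shared_task

@shared_task
def run_script():
    # Popen lives in the task, not the view; wait() blocks only the
    # Celery worker while the web process stays responsive.
    proc = subprocess.Popen(["/path/to/long_script.sh"])  # path is an assumption
    return proc.wait()  # the exit code becomes the task result

The view would call run_script.delay() and keep the returned task id; the channel can then read AsyncResult(task_id).state (e.g. PENDING, SUCCESS, FAILURE) and push it to the subscribed button.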
Overview:
Create a Celery task for your script.
Create a user-unique channel that pulls the task state.
Disable or enable the button on the front end according to the state of the task, and consider displaying a failure message in case the script fails so that the user can restart.
Hope this helps!

ZeroMQ is too fast for database transactions

Inside a web application (Pyramid) I create certain objects on POST which need some work done on them (mainly fetching something from the web). These objects are persisted to a PostgreSQL database with the help of SQLAlchemy. Since these tasks can take a while, the work is not done inside the request handler but rather offloaded to a daemon process on a different host. When the object is created I take its ID (a client-side generated UUID) and send it via ZeroMQ to the daemon process. The daemon receives the ID, fetches the object from the database, does its work and writes the result to the database.
Problem: the daemon can receive the ID before the transaction that created the object is committed. Since we are using pyramid_tm, all database transactions are committed when the request handler returns without an error, and I would rather leave it this way. On my dev system everything runs on the same box, so ZeroMQ is lightning fast. On the production system this is most likely not an issue since the web application and daemon run on different hosts, but I don't want to count on this.
This problem only recently manifested itself, since we previously used MongoDB with a write_concern of 2. Having only two database servers, the write on the entity always blocked the web request until the entity was persisted (which obviously is not the greatest idea).
Has anyone run into a similar problem?
How did you solve it?
I see multiple possible solutions, but most of them don't satisfy me:
Flushing the transaction manually before triggering the ZMQ message. However, I currently use SQLAlchemy's after_created event to trigger it, and this is really nice since it decouples the process completely and thus eliminates the risk of "forgetting" to tell the daemon to work. I also think I would still need a READ UNCOMMITTED isolation level on the daemon side; is this correct?
Adding a timestamp to the ZMQ message, causing the worker thread that received the message to wait before processing the object. This obviously limits throughput.
Ditching ZMQ completely and simply polling the database. Noooo!
I would just use PostgreSQL's LISTEN and NOTIFY functionality. The worker can connect to the SQL server (which it already has to do) and issue the appropriate LISTEN. PostgreSQL will then let it know when relevant transactions have finished. Your trigger for generating the notifications on the SQL server could probably even send the entire row in the payload, so the worker doesn't have to request anything:
CREATE OR REPLACE FUNCTION magic_notifier() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('stuffdone', row_to_json(new)::text);
    RETURN new;
END;
$$ LANGUAGE plpgsql;

-- Attach it to the relevant table (the table name is illustrative):
CREATE TRIGGER stuff_notifier AFTER INSERT ON stuff
    FOR EACH ROW EXECUTE PROCEDURE magic_notifier();
With that, as soon as the worker knows there is work to do it also has the necessary information, so it can begin working without another round-trip.
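On the worker side, a hedged psycopg2 sketch of the corresponding LISTEN loop (connection parameters are assumptions; the channel name matches the trigger above):

import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb user=worker")  # parameters are illustrative
# Notifications are only delivered outside a transaction block.
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN stuffdone;")

while True:
    # Block until the connection's socket becomes readable, then drain notifications.
    if select.select([conn], [], [], 60) == ([], [], []):
        continue  # timeout; loop and wait again
    conn.poll()
    while conn.notifies:
        note = conn.notifies.pop(0)
        print("Work to do:", note.payload)  # payload is the JSON row from the trigger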
This comes close to your second solution:
Create a buffer, drop the ids from your ZeroMQ messages into it, and let your worker poll this id pool regularly. If it fails to retrieve an object for an id from the database, let the id sit in the pool until the next poll; otherwise remove the id from the pool.
You have to deal somehow with the asynchronous behaviour of your system. If the ids regularly arrive before the object is persisted in the database, it doesn't matter whether pooling the ids (and re-polling the same id) reduces throughput, because the bottleneck is earlier.
An upside is that you could run multiple frontends in front of this.
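A rough sketch of that id pool (the ZeroMQ socket, the SQLAlchemy session, the mapped model class, and the work function are all assumed from the question's setup):

import zmq

pending_ids = set()

def drain_socket(socket):
    # Pull every queued UUID off the wire without blocking.
    while True:
        try:
            pending_ids.add(socket.recv_string(flags=zmq.NOBLOCK))
        except zmq.Again:
            return

def process_pool(session, model, do_work):
    for obj_id in list(pending_ids):
        obj = session.query(model).get(obj_id)
        if obj is None:
            continue  # creating transaction not committed yet; keep the id for the next poll
        do_work(obj)
        pending_ids.discard(obj_id)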

Inter-database communications in PostgreSQL

I am using PostgreSQL 8.4. I really like the new unnest() and array_agg() features; it is about time they realized the dynamic processing potential of their arrays!
Anyway, I am working on web server back ends that use long arrays a lot. There will be two successive processes, each of which will run on a different physical machine. Each such process is a light Python application which "manages" SQL queries to the database on its own machine as well as requests from the front ends.
The first process will generate an array which will be buffered into an SQL table. Each such generated array is accessible via a primary key. When it's done, the first Python app sends the key to the second Python app. The second Python app, which is running on a different machine, then uses the key to fetch the referenced array from the first machine. It then sends it to its own db for generating a final result.
The reason I send a key is that I am hoping this will make the two processes go faster. But really what I would like is a way to have the second database send a query to the first database, in the hope of minimizing serialization delay and such.
Any help/advice would be appreciated.
Thanks
Sounds like you want dblink from contrib. This allows some inter-db postgres communication. The pg docs are great and should provide the needed examples.
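For illustration, a hedged sketch of what such a cross-database query could look like from the second machine, issued through psycopg2 (connection strings, table and column names are assumptions; the dblink contrib module must be installed first):

import psycopg2

conn = psycopg2.connect("dbname=db2")  # the second machine's own database
cur = conn.cursor()
# dblink connects to machine 1 and runs the inner query there; the
# column definition list after AS is required for record-returning functions.
cur.execute("""
    SELECT arr
    FROM dblink('host=machine1 dbname=db1',
                'SELECT arr FROM buffered_arrays WHERE id = 42')
         AS t(arr integer[])
""")
print(cur.fetchone()[0])  # the array arrives in one round-trip between the databases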
Not sure I totally understand, but have you looked at NOTIFY/LISTEN? http://www.postgresql.org/docs/8.1/static/sql-listen.html
I am thinking either LISTEN/NOTIFY or something with a cache such as memcached. You would send the key to memcached and have the second Python app retrieve it from there. You could even do it with LISTEN/NOTIFY: e.g. send the key and notify your second app that the key is in memcached waiting to be retrieved.

Dynamic data in PostgreSQL

I intend to have a Python script do many UPDATEs per second on 2,433,000 rows. I currently keep the dynamic column in Python as a value in a Python dict, but keeping that dict synchronized with changes in the other columns is becoming more and more difficult, or nonviable.
I know I could put autovacuum on overdrive, but I wonder if this would be enough to keep up with the sheer number of UPDATEs. If only I could associate a Python variable with each row...
I fear that the VACUUM and disk-write overhead will kill my server.
Any suggestions on how to associate extremely dynamic variables with rows/keys?
Thx
PostgreSQL supports asynchronous notifications using the LISTEN and NOTIFY commands. An application (client) LISTENs for a notification using a notification name (e.g. "table_updated"). The database can be made to issue notifications either manually, i.e. in the code that performs the insertions or modifications (useful when a large number of updates are made, allowing for batched notifications), or automatically inside a row-update TRIGGER.
You could use such notifications to keep your data structures up to date.
Alternatively (or in combination with the above), you can customize your Python dictionary by overriding the __getitem__() and __contains__() methods (and has_key() on Python 2) and have them perform lookups as needed, allowing you to cache the results with timeouts etc.
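A hedged sketch of such a lazy dictionary (fetch_row is an assumed callable that loads the current value for a key from PostgreSQL and raises KeyError if the row is missing):

import time

class RowCache(dict):
    """Dict that re-fetches stale entries from the database on access."""

    def __init__(self, fetch_row, ttl=5.0):
        super().__init__()
        self._fetch = fetch_row   # assumed: key -> value, raises KeyError if absent
        self._ttl = ttl           # seconds before a cached value expires
        self._stamps = {}

    def __getitem__(self, key):
        now = time.time()
        stale = now - self._stamps.get(key, 0.0) > self._ttl
        if not super().__contains__(key) or stale:
            value = self._fetch(key)          # hit the database on miss or expiry
            super().__setitem__(key, value)
            self._stamps[key] = now
        return super().__getitem__(key)

    def __contains__(self, key):
        try:
            self[key]
            return True
        except KeyError:
            return False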
