Python & PostgreSQL: reliably check for updates in a specific table

Situation: I have a live trading script which computes all sorts of stuff every x minutes in my main thread (Python). The order sending is performed from that thread. The reception and execution of those orders, though, is a different matter: I cannot allow x minutes to pass, I need them as soon as they come in. So I initialized another thread to check for such data (executions), which lands in a database table (PostgreSQL).
Problem(s): I cannot continuously run a query every xx ms, pull the data from the DB, compare table lengths, and then take the difference, for a variety of reasons (I am not the only one using this DB, performance issues, etc.). So I looked up some solutions and came across this thread (https://dba.stackexchange.com/questions/58214/getting-last-modification-date-of-a-postgresql-database-table), where the gist of it was that
"There is no reliable, authoritative record of the last modified time of a table".
Question: what can I do about it? That is: how do I get near-instantaneous notification of changes to a PostgreSQL table from Python without overloading the whole thing?

You can use notifications in PostgreSQL:
import select

import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT

def dblisten(dsn):
    """Open a connection and block forever, dispatching every NOTIFY payload."""
    connection = psycopg2.connect(dsn)
    connection.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cur = connection.cursor()
    cur.execute("LISTEN new_id;")
    while True:
        # block until the connection's socket becomes readable, then collect notifications
        select.select([connection], [], [])
        connection.poll()
        while connection.notifies:
            notify = connection.notifies.pop().payload
            do_something(notify)
and install a trigger that fires on each insert or update:
CREATE OR REPLACE FUNCTION notify_id_trigger() RETURNS trigger AS $$
BEGIN
    -- pg_notify takes a text payload, so cast the id
    PERFORM pg_notify('new_id', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_modified AFTER INSERT OR UPDATE ON data_table
    FOR EACH ROW EXECUTE PROCEDURE notify_id_trigger();
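Since the question runs the check in a separate thread, here is a minimal sketch of how the listener above might be started from the main script; the DSN string and the body of do_something() are placeholders, not part of the original answer:
import threading

def do_something(payload):
    # placeholder: react to the execution id delivered by NOTIFY
    print("new execution id:", payload)

listener = threading.Thread(
    target=dblisten,
    args=("dbname=trading user=trader host=localhost",),  # assumed DSN
    daemon=True,
)
listener.start()
# the main thread keeps doing its every-x-minutes work here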

Related

Django Calculated Running Balance

I am working on a Personal Finance App using Django (Creating an API Based Backend). I have a transactions table where I need to maintain the running balance based on the UserID, AccountID, TransactionDate. After Creating, Updating, or Deleting a Transaction I need to update the running balance. The current method I have identified to do this is by running a custom SQL statement which I call once one of the above operations has been done to update this field (see code below):
def update_transactions_running_balance(**kwargs):
    from django.db import connection
    querystring = '''UPDATE transactions_transactions
        SET transaction_running_balance = running_balance_calc.calc_running_balance
        FROM (
            SELECT
                transaction_id,
                transaction_user,
                transaction_account,
                SUM(transaction_amount_account_currency) OVER (
                    ORDER BY transaction_user, transaction_account, transaction_date ASC, transaction_id ASC
                ) AS calc_running_balance
            FROM transactions_transactions
            WHERE transaction_user = {}
              AND transaction_account = {}
        ) AS running_balance_calc
        WHERE transactions_transactions.transaction_id = running_balance_calc.transaction_id
          AND transactions_transactions.transaction_user = running_balance_calc.transaction_user
          AND transactions_transactions.transaction_account = running_balance_calc.transaction_account
        '''.format(int(kwargs['transaction_user']), int(kwargs['transaction_account']))
    with connection.cursor() as cursor:
        cursor.execute(querystring)
However, the issue I have is that once the table gets a little larger, the response time starts to increase (the SELECT statement is where the time is spent). The other issue is that when I load the server with multiple concurrent create-transaction requests, every once in a while (currently about 0.25% of the time) the running balance update fails with the following error:
ERROR:  deadlock detected.
Process 7038 waits for ShareLock on transaction 5549; blocked by process 7036.
Process 7036 waits for ShareLock on transaction 5551; blocked by process 7038.
I was wondering if there was a better way to do this? I originally wanted to define the running balance as a calculated field in the Django model, but I couldn't figure out how to define it so that it achieves the same result as the code above. Any help would be appreciated.
Thanks.
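For reference, the window calculation in the SQL above can also be expressed as a query-time annotation with the Django ORM (Django 2.0+). This is only a sketch: the model and field names are guessed from the SQL, and it annotates the value when reading rather than storing it on the row:
from django.db.models import F, Sum, Window
from transactions.models import Transactions  # assumed app/model for transactions_transactions

def running_balance_queryset(user_id, account_id):
    # Annotates each row with a running balance for one user/account at query time
    return (
        Transactions.objects
        .filter(transaction_user=user_id, transaction_account=account_id)
        .annotate(
            calc_running_balance=Window(
                expression=Sum("transaction_amount_account_currency"),
                order_by=[F("transaction_date").asc(), F("transaction_id").asc()],
            )
        )
    )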

PostgreSQL DROP TABLE query freezes

I am writing code to create a GUI in Python in the Spyder environment of Anaconda. Within this code I work with a PostgreSQL database, and I therefore use the psycopg2 database adapter so that I can interact with it directly from the GUI.
The code is too long to post here, as it is over 3000 lines, but to summarize: I have no problem interacting with my database except when I try to drop a table.
When I do so, the GUI frames become unresponsive, the DROP TABLE query doesn't drop the intended table, and no errors or anything else of that kind are thrown.
Within my code, all operations which result in a table being dropped are processed via a function (DeleteTable). When I call this function, everything up to the query runs fine; I have inserted several print statements which confirmed that everything is in order. The problem occurs when the statement is executed with the cur.execute(sql) line of code.
Can anybody figure out why my tables won't drop?
def DeleteTable(table_name):
    conn = psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'")
    cur = conn.cursor()
    sql = """DROP TABLE """ + table_name + """;"""
    cur.execute(sql)
    conn.commit()
That must be because a concurrent transaction is holding a lock that blocks the DROP TABLE statement.
Examine the pg_stat_activity view and watch out for sessions with state equal to idle in transaction or active that have an xact_start of more than a few seconds ago.
This is essentially an application bug: you must make sure that all transactions are closed immediately, otherwise Bad Things can happen.
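For example, a quick diagnostic along those lines could be run from Python like this (not part of the original answer; the DSN is reused from the question and the 5-second threshold is arbitrary):
import psycopg2

# List sessions that have been holding a transaction open (or running) for more than a few seconds
conn = psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'")
cur = conn.cursor()
cur.execute("""
    SELECT pid, state, xact_start, query
    FROM pg_stat_activity
    WHERE state IN ('idle in transaction', 'active')
      AND xact_start < now() - interval '5 seconds'
""")
for pid, state, xact_start, query in cur.fetchall():
    print(pid, state, xact_start, query)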
I was having the same issue when using psycopg2 within Airflow's Postgres hook, and I resolved it with a with statement. This probably resolves the issue because the connection becomes local to the with block.
def drop_table():
    with PostgresHook(postgres_conn_id="your_connection").get_conn() as conn:
        cur = conn.cursor()
        cur.execute("DROP TABLE IF EXISTS your_table")

task_drop_table = PythonOperator(
    task_id="drop_table",
    python_callable=drop_table,
)
And a solution is possible for the original code above like this (I didn't test this one):
def DeleteTable(table_name):
    with psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'") as conn:
        cur = conn.cursor()
        sql = """DROP TABLE """ + table_name + """;"""
        cur.execute(sql)
        conn.commit()
Please comment if anyone tries this.

simple select query very slow

I have a SQLite database that I use as a cache, and I read from it with this function:
def read_cache(table):
    if not table.isidentifier():  # <- my version of SQLi protection, used to allow the user to provide table and column names
        raise ValueError(f"Invalid table name '{table}'")
    for record in cur.execute(f"SELECT * FROM {table};"):
        yield {**record}
A call to list(read_cache('some_values')) takes forever (nearly a minute) to return, yet when I run the query against the db itself (as in cur.execute('select * from some_values;').fetchall() or [r for r in cur.execute('select * from some_values;')]) it returns immediately.
The table in question has 40k rows of 2 columns each.
What can I do to find the issue or speed it up?
EDIT: I think I found the issue: the slowdown only occurs in an interpreter that has some threading.Thread running... I wonder if the issue is GIL related... will have to see what to do about it...
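One rough way to test that hypothesis is to time the same query with and without a CPU-bound thread running. This is only a sketch: the database file name is assumed, the table name comes from the question, and the busy loop merely stands in for whatever the other thread is doing:
import sqlite3
import threading
import time

stop = threading.Event()

def busy():
    # CPU-bound loop competing with the main thread for the GIL
    while not stop.is_set():
        pass

conn = sqlite3.connect("cache.db")  # assumed file name for the cache database
conn.row_factory = sqlite3.Row
cur = conn.cursor()

def timed(label):
    start = time.perf_counter()
    rows = [{**r} for r in cur.execute("SELECT * FROM some_values;")]
    print(label, len(rows), "rows in", time.perf_counter() - start, "s")

timed("without busy thread:")

worker = threading.Thread(target=busy, daemon=True)
worker.start()
timed("with busy thread:")
stop.set()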

Global query timeout in MySQL 5.6

I need to apply a query timeout at a global level in my application. The statement SET SESSION max_execution_time=1 does this in MySQL 5.7. I am using MySQL 5.6 and cannot upgrade at the moment. Any solution with SQLAlchemy would also help.
It seems there is no equivalent to max_execution_time in MySQL prior to versions 5.7.4 and 5.7.8 (the setting changed its name between those releases). What you can do is create your own periodic job that checks whether queries have exceeded the timeout and kills them manually. Unfortunately that is not quite the same as what the newer MySQL versions do: without inspecting the command info you'll end up killing all queries, not just read-only SELECTs, and it is nigh impossible to control at session level.
One way to do that would be to create a stored procedure that queries the process list and kills queries as required. Such a stored procedure could look like this:
DELIMITER //

CREATE PROCEDURE stmt_timeout_killer (timeout INT)
BEGIN
    DECLARE query_id INT;
    DECLARE done INT DEFAULT FALSE;
    DECLARE curs CURSOR FOR
        SELECT id
        FROM information_schema.processlist
        WHERE command = 'Query' AND time >= timeout;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    -- Ignore ER_NO_SUCH_THREAD, in case the query finished between
    -- checking the process list and actually killing threads
    DECLARE CONTINUE HANDLER FOR 1094 BEGIN END;

    OPEN curs;

    read_loop: LOOP
        FETCH curs INTO query_id;

        IF done THEN
            LEAVE read_loop;
        END IF;

        -- Prevent suicide
        IF query_id != CONNECTION_ID() THEN
            KILL QUERY query_id;
        END IF;
    END LOOP;

    CLOSE curs;
END//

DELIMITER ;
Alternatively you could implement all that in your application logic, but it would require separate round trips to the database for each query to be killed. What's left then is to call this periodically:
# Somewhere suitable in the application code
from sqlalchemy import text

engine.execute(text("CALL stmt_timeout_killer(:timeout)"), timeout=30)
How and where exactly depends heavily on your actual application.
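If nothing in the stack already provides a scheduler, one rough option is a daemon thread that calls the procedure on a fixed interval. This is only a sketch, and the engine object, the 30-second values and the thread-based approach are assumptions rather than part of the original answer:
import threading
import time

from sqlalchemy import text

def kill_long_queries_forever(engine, timeout=30, interval=30):
    # Call the stored procedure every `interval` seconds, forever.
    while True:
        with engine.connect() as conn:
            conn.execute(text("CALL stmt_timeout_killer(:timeout)"), {"timeout": timeout})
        time.sleep(interval)

# `engine` is the application's SQLAlchemy engine (assumed to exist already)
killer = threading.Thread(target=kill_long_queries_forever, args=(engine,), daemon=True)
killer.start()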

Results of previous update returned on mysql select in Python3

I have one script running on a server that updates a list of items in a MySQL database to be processed by another script running on my desktop. The script runs in a loop, processing the list every 5 minutes (the server-side script also runs on a 5-minute cycle). On the first loop, the script retrieves the current list (a basic SELECT operation); on the second cycle it gets the same (not updated) list; on the third it gets the list it should have gotten on the second pass. On every pass after the first, the SELECT operation returns the data from the previous UPDATE operation, i.e. one cycle behind.
def mainFlow():
    activeList = []
    d = ()
    a = ()
    b = ()
    #cycleStart=datetime.datetime.now()
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id=1")
    d = cur.fetchone()
    DBSV.commit()
    a = d[0]
    b = a[0]
    activeList = ast.literal_eval(a)
    print(activeList)
    buyList = []
    clearOrders()
    sellDecide()
    if activeList:
        for i in activeList:
            a = buyCalculate(i)
            if a:
                buyList.append(i)
    print('buy list: ', buyList)
    if buyList:
        buyDecide(buyList)
    cur.close()
    d = ()
    a = ()
    b = ()
    activeList = []
    print('+++++++++++++END OF BLOCK+++++++++++++++')

state = True
while state == True:
    cycleStart = datetime.datetime.now()
    mainFlow()
    cycleEnd = datetime.datetime.now()
    wait = 300 - (cycleEnd - cycleStart).total_seconds()
    print('wait=: ' + str(wait))
    if wait > 0:
        time.sleep(wait)
As you can see, I am re-initializing all my variables, I am closing my cursor, and I am doing a commit() operation that is supposed to solve this sort of problem. I have tried plain cursors and cursors with buffered set to True and False, always with the same result.
When I run the exact same SELECT query from MySQL Workbench, the results returned are fine.
I've been baffled and stuck on this for 2 days.
You're performing your COMMIT before your UPDATE/INSERT/DELETE transactions
Though a SELECT statement is, theoretically, DML, it differs from INSERT, UPDATE and DELETE in that it doesn't modify data in the database. If you want to see data that has been changed in another session, then you must COMMIT your own transaction only after that change has happened, so that the next SELECT starts from a fresh snapshot. This is partially exacerbated by you closing the cursor after each loop.
You've gone far too far in trying to solve this problem; there's no need to reset everything within the mainFlow() method (and I can't see a need for most of the variables)
def mainFlow():
    buyList = []
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id = 1")
    activeList = cur.fetchone()[0]
    activeList = ast.literal_eval(activeList)
    clearOrders()
    sellDecide()
    for i in activeList:
        a = buyCalculate(i)
        if a:
            buyList.append(i)
    if buyList:
        buyDecide(buyList)
    DBSV.commit()
    cur.close()

while True:
    cycleStart = datetime.datetime.now()
    mainFlow()
    cycleEnd = datetime.datetime.now()
    wait = 300 - (cycleEnd - cycleStart).total_seconds()
    if wait > 0:
        time.sleep(wait)
I've removed a fair amount of unnecessary code (and added spaces), removed the reuse of variable names for different things, and dropped the declaration of variables that are overwritten immediately. This still isn't very OO though...
As we don't have detailed knowledge of exactly what clearOrders(), sellDecide() and buyCalculate() do, you might want to double-check this yourself.
