I am working on a personal finance app using Django (building an API-based backend). I have a transactions table where I need to maintain a running balance based on UserID, AccountID, and TransactionDate. After creating, updating, or deleting a transaction I need to update the running balance. The current method I use is a custom SQL statement, called once one of the above operations has completed, to update this field (see code below):
def update_transactions_running_balance(**kwargs):
    from django.db import connection
    querystring = '''UPDATE transactions_transactions
        SET transaction_running_balance = running_balance_calc.calc_running_balance
        FROM
            (SELECT
                transaction_id,
                transaction_user,
                transaction_account,
                SUM(transaction_amount_account_currency) OVER (ORDER BY transaction_user, transaction_account, transaction_date ASC, transaction_id ASC) AS calc_running_balance
            FROM
                transactions_transactions
            WHERE
                transaction_user = {}
                AND transaction_account = {}) AS running_balance_calc
        WHERE
            transactions_transactions.transaction_id = running_balance_calc.transaction_id
            AND transactions_transactions.transaction_user = running_balance_calc.transaction_user
            AND transactions_transactions.transaction_account = running_balance_calc.transaction_account'''.format(
        int(kwargs['transaction_user']), int(kwargs['transaction_account']))
    with connection.cursor() as cursor:
        cursor.execute(querystring)
However, once the table gets a little larger the response time starts to increase (the SELECT statement is where the time is spent). The other issue is that when I load the server with multiple concurrent create-transaction requests, every once in a while (current rate is 0.25%) the running balance update fails with the following error:
ERROR: deadlock detected.
Process 7038 waits for ShareLock on transaction 5549; blocked by process 7036.
Process 7036 waits for ShareLock on transaction 5551; blocked by process 7038.
Is there a better way to do this? I originally wanted to define the running balance as a calculated field on the Django model, but I couldn't figure out how to define it so that it achieves the same result as the code above (a rough sketch of that idea follows below). Any help would be appreciated.
Thanks.
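For reference, here is a rough sketch of that calculated-field idea using Django's Window expressions. The model name Transactions and its import path are assumptions on my part; the field names are taken from the SQL above. Note that this computes the running balance at query time, as an annotation, rather than storing it on the row:

from django.db.models import F, Sum, Window

# from myapp.models import Transactions  # hypothetical model backing transactions_transactions

def transactions_with_running_balance(user_id, account_id):
    # Annotate each row with the cumulative sum of amounts, ordered the same
    # way as the raw SQL (date, then id), for one user/account at a time.
    return (
        Transactions.objects
        .filter(transaction_user=user_id, transaction_account=account_id)
        .annotate(
            running_balance=Window(
                expression=Sum('transaction_amount_account_currency'),
                order_by=[F('transaction_date').asc(), F('transaction_id').asc()],
            )
        )
    )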
Related
I have a MongoDB database that I am sending data to. At the same time I'm running a Python script, and I would like this script to print to my console the last entry added to the database as soon as it's added.
I've been looking for a solution to this for days without finding anything.
I did some research and found two options:
a) tailable cursors, but my collection is not capped, and since I will be inserting data every 5 seconds I'm afraid a capped collection would not be enough for my needs, because once its max size is reached it starts overwriting the oldest records;
b) change streams, but my database is not a replica set; I'm fairly new to this, so I still have to learn about more advanced topics like replica sets.
Any advice?
This is what I've got so far:
from pymongo import MongoClient

# Step 1: Connect to MongoDB - Note: change connection string as needed
client = MongoClient(port=27017)
db = client.one
mycol = db["coll"]

highest_previous_primary_key = 1
while True:
    cursor = mycol.find()
    for document in cursor:
        # get the current primary key, and if it's greater than the previous one
        # print the result and move the variable up to that value
        current_primary_key = document['num']
        if current_primary_key > highest_previous_primary_key:
            print(document['num'])
            highest_previous_primary_key = current_primary_key
But the problem with this is that it stops printing after the 4th record, and I don't know whether it's the best solution once my database holds a lot of data.
Any advice?
b) change streams, but my db is not a replica set; I'm fairly new to this, so I still have to learn about more advanced topics like replica sets
Replica sets provide redundancy and high availability, and are the basis for all production MongoDB deployments. That said, for testing and/or development you can deploy a replica set with only a single member. For example, for local testing:
mongod --dbpath /path/data/test --replSet test
Once the local test server has started, connect with the mongo shell and run rs.initiate():
mongo
> rs.initiate()
See related Deploy a replica set for testing and deployment
import logging

import pymongo
from pymongo import MongoClient

client = MongoClient(port=27017)

try:
    # Only catch insert operations
    with client.watch([{'$match': {'operationType': 'insert'}}]) as stream:
        for insert_change in stream:
            print(insert_change)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('...')
See also pymongo.mongo_client.MongoClient.watch()
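As a further sketch, once the single-member replica set above is running you can also watch just the collection that receives the inserts rather than the whole deployment (the database and collection names here are taken from the question's snippet and may differ in your setup):

from pymongo import MongoClient

client = MongoClient(port=27017)
db = client.one

with db.coll.watch([{'$match': {'operationType': 'insert'}}]) as stream:
    for change in stream:
        # for insert events the newly added document is available under 'fullDocument'
        print(change['fullDocument'])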
It is a bit late, but you can also do this without a replica set by using the following:
highest_previous_count = 0

while True:
    # compare the current number of documents to the count from the last pass
    current_count = collection.count_documents({})
    if current_count > highest_previous_count:
        # skip the documents that were already printed and print only the new ones
        for document in collection.find().skip(highest_previous_count):
            print(document)
        highest_previous_count = current_count
Situation: I have a live trading script which computes all sorts of stuff every x minutes in my main thread (Python). Order sending is performed from that thread. The reception and execution of those orders, though, is a different matter: I cannot allow x minutes to pass, I need them as soon as they come in. I initialized another thread to check for that data (executions), which lives in a database table (PostgreSQL).
Problem(s): I cannot continuously run a query every xx ms, fetch the data from the DB, compare the table length, and then take the difference, for a variety of reasons (I'm not the only one using this DB, performance issues, etc.). So I looked up some solutions and came across this thread (https://dba.stackexchange.com/questions/58214/getting-last-modification-date-of-a-postgresql-database-table), the gist of which was that
"There is no reliable, authorative record of the last modified time of a table".
Question: what can I do about it, i.e. how can I get near-instantaneous responses from a PostgreSQL table from Python without overloading the whole thing?
You can use notifications in PostgreSQL:
import psycopg2
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
import select

def dblisten(dsn):
    connection = psycopg2.connect(dsn)
    connection.set_isolation_level(ISOLATION_LEVEL_AUTOCOMMIT)
    cur = connection.cursor()
    cur.execute("LISTEN new_id;")
    while True:
        # block until the connection has something to read, then hand every
        # pending notification's payload to the callback
        select.select([connection], [], [])
        connection.poll()
        while connection.notifies:
            notify = connection.notifies.pop()
            do_something(notify.payload)
and install a trigger that fires on each insert or update:
CREATE OR REPLACE FUNCTION notify_id_trigger() RETURNS trigger AS $$
BEGIN
    -- NEW.id is cast to text because pg_notify expects a text payload
    PERFORM pg_notify('new_id', NEW.id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_modified
    AFTER INSERT OR UPDATE ON data_table
    FOR EACH ROW EXECUTE PROCEDURE notify_id_trigger();
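A sketch of how the Python side might be wired up from the trading script, assuming it lives in the same module as the dblisten() function above (the callback body and the DSN are placeholders; dblisten() blocks, so it would run in the dedicated listener thread mentioned in the question):

def do_something(payload):
    # placeholder: this is where the newly inserted/updated execution row
    # would be fetched and handled as soon as the notification arrives
    print('data_table changed, notified id =', payload)

dblisten('dbname=trading user=trader host=localhost')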
I am creating a testing (quiz) web application. Each time a user submits an answer, either the correct or the incorrect function is called for that user, which should add 1 to totalattempted in that user's row.
def incorrect(user):
    u = user
    u.totalattempted += 1
    session.commit()
    u.score = round(u.totalcorrect * u.totalcorrect / u.totalattempted)
    session.commit()

def correct(user):
    u = user
    u.totalattempted += 1
    session.commit()
    u.totalcorrect += 1
    session.commit()
    u.score = round(u.totalcorrect * u.totalcorrect / u.totalattempted)
    session.commit()
The issue I have is that when the correct function is called several times per second, the server correctly adds +1 to the user's totalcorrect but does not add +1 to their totalattempted. I almost feel like this issue may be server-wide, as when a user's account is created there is about a 30-second delay before the server detects that account. Thanks!
Your code is prone to race conditions. Incrementing the value on a model instance reads the current value in Python and writes the computed result back to the database; it does not perform an atomic update. To do that, assign a SQL expression instead:
u.totalattempted = User.totalattempted + 1
The same should be done for totalcorrect.
Also, you probably shouldn't commit between updating each field. That causes SQLAlchemy to flush the session and reload the record when you update totalcorrect.
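A sketch of what the two functions might look like with expression-based, database-side increments (the same User model and session are assumed as in the question):

def correct(user):
    # let the database perform the increments so concurrent requests
    # cannot overwrite each other's values
    user.totalattempted = User.totalattempted + 1
    user.totalcorrect = User.totalcorrect + 1
    session.commit()  # expires the instance; the next attribute access reloads fresh values

    user.score = round(user.totalcorrect * user.totalcorrect / user.totalattempted)
    session.commit()

def incorrect(user):
    user.totalattempted = User.totalattempted + 1
    session.commit()

    user.score = round(user.totalcorrect * user.totalcorrect / user.totalattempted)
    session.commit()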
(Sorry in advance for the long question. I tried to break it up into sections to make it clearer what I'm asking. Please let me know if I should add anything else or reorganize it at all.)
Background:
I'm writing a web crawler that uses a producer/consumer model with jobs (pages to crawl or re-crawl) stored in a postgresql database table called crawler_table. I'm using SQLAlchemy to access and make changes to the database table. The exact schema is not important for this question. The important thing is that I (will) have multiple consumers, each of which repeatedly selects a record from the table, loads the page with phantomjs, and then writes information about the page back to the record.
It can happen on occasion that two consumers select the same job. This is not itself a problem; however, it is important that if they update the record with their results simultaneously, that they make consistent changes. It's good enough for me to just find out if an update would cause the record to become inconsistent. If so, I can deal with it.
Investigation:
I initially assumed that if two transactions in separate sessions read then updated the same record simultaneously, the second one to commit would fail. To test that assumption, I ran the following code (simplified slightly):
from datetime import datetime

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# CrawlerPage (the SQLAlchemy model) and my_postgresql_uri are defined elsewhere
SQLAlchemySession = sessionmaker(bind=create_engine(my_postgresql_uri))

class Session(object):
    # A simple wrapper for use with the `with` statement
    def __enter__(self):
        self.session = SQLAlchemySession()
        return self.session

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type:
            self.session.rollback()
        else:
            self.session.commit()
        self.session.close()

with Session() as session:  # Create a record to play with
    if session.query(CrawlerPage) \
              .filter(CrawlerPage.url == 'url').count() == 0:
        session.add(CrawlerPage(website='website', url='url',
                                first_seen=datetime.utcnow()))
    page = session.query(CrawlerPage) \
                  .filter(CrawlerPage.url == 'url') \
                  .one()
    page.failed_count = 0
    # commit

# Actual experiment:
with Session() as session:
    page = session.query(CrawlerPage) \
                  .filter(CrawlerPage.url == 'url') \
                  .one()
    print('initial (session)', page.failed_count)
    # 0 (expected)
    page.failed_count += 5

    with Session() as other_session:
        same_page = other_session.query(CrawlerPage) \
                                 .filter(CrawlerPage.url == 'url') \
                                 .one()
        print('initial (other_session)', same_page.failed_count)
        # 0 (expected)
        same_page.failed_count += 10
        print('final (other_session)', same_page.failed_count)
        # 10 (expected)
        # commit other_session, no errors (expected)

    print('final (session)', page.failed_count)
    # 5 (expected)
    # commit session, no errors (why?)

with Session() as session:
    page = session.query(CrawlerPage) \
                  .filter(CrawlerPage.url == 'url') \
                  .one()
    print('final value', page.failed_count)
    # 5 (expected, given that there were no errors)
(Apparently Incorrect) Expectations:
I would have expected that reading a value from a record then updating that value within the same transaction would:
Be an atomic operation. That is, either succeed completely or fail completely. This much appears to be true, since the final value is 5, the value set in the last transaction to be committed.
Fail if the record being updated is updated by a concurrent session (other_session) upon attempting to commit the transaction. My rationale is that all transactions should behave as though they are performed independently in order of commit whenever possible, or should fail to commit. In these circumstances, the two transactions read then update the same value of the same record. In a version-control system, this would be the equivalent of a merge conflict. Obviously databases are not the same as version-control systems, but they have enough similarities to inform some of my assumptions about them, for better or worse.
Questions:
Why doesn't the second commit raise an exception?
Am I misunderstanding something about how SQLAlchemy handles transactions?
Am I misunderstanding something about how postgresql handles transactions? (This one seems most likely to me.)
Something else?
Is there a way to get the second commit to raise an exception?
PostgreSQL has SELECT ... FOR UPDATE, which SQLAlchemy seems to support.
My rationale is that all transactions should behave as though they are
performed independently in order of commit whenever possible, or
should fail to commit.
Well, in general there's a lot more to transactions than that. PostgreSQL's default transaction isolation level is "read committed". Loosely speaking, that means multiple transactions can simultaneously read committed values from the same rows in a table. If you want to prevent that, you can set the transaction isolation level to serializable (though that might not work as you expect), use SELECT ... FOR UPDATE, lock the table, or use a column-by-column WHERE clause, among other options.
You can test and demonstrate transaction behavior by opening two psql connections. The steps below show the two sessions interleaved in time.

-- Session 1
begin transaction;

-- Session 2
begin transaction;

-- Session 1: lock the row
select *
from test
where pid = 1
and date = '2014-10-01'
for update;
-- (1 row)

-- Session 2: try to lock the same row
select *
from test
where pid = 1
and date = '2014-10-01'
for update;
-- (waiting)

-- Session 1: change the row so it no longer matches, then commit
update test
set date = '2014-10-31'
where pid = 1
and date = '2014-10-01';

commit;

-- Session 2: locks released; the SELECT ... FOR UPDATE now fails to find the row
-- (0 rows)
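In SQLAlchemy, the same row-level lock can be taken from the consumer code with Query.with_for_update(). A sketch, reusing the Session wrapper from the question:

with Session() as session:
    # SELECT ... FOR UPDATE: the row stays locked until this transaction ends,
    # so a concurrent consumer blocks on its own SELECT instead of silently
    # overwriting this consumer's changes.
    page = session.query(CrawlerPage) \
                  .filter(CrawlerPage.url == 'url') \
                  .with_for_update() \
                  .one()
    page.failed_count += 5
    # the commit performed by Session.__exit__ releases the lock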
I have this running on a live website. When a user logs in I query their profile to see how many "credits" they have available. Credits are purchased via PayPal. If a person buys credits and the payment comes through, the query still shows 0 credits, even though running the same query in phpMyAdmin returns the right result. If I restart the Apache web server and reload the page, the right number of credits is shown. Here's the mapper code that exposes the number of credits each user has:
mapper(User, users_table, order_by='user.date_added DESC, user.id DESC', properties={
    'userCreditsCount': column_property(
        select(
            [func.ifnull(func.sum(orders_table.c.quantity), 0)],
            orders_table.c.user_id == users_table.c.id
        ).where(and_(
            orders_table.c.date_added > get_order_expire_limit(),  # order must not be older than a month
            orders_table.c.status == STATUS_COMPLETED
        )).label('userCreditsCount'),
        deferred=True
    )
    # other properties....
})
I'm using SQLAlchemy with the Flask framework, but not the Flask-SQLAlchemy package (just plain SQLAlchemy).
Here's how I initialize my database:
engine = create_engine( config.DATABASE_URI, pool_recycle = True )
metadata = MetaData()
db_session = scoped_session( sessionmaker( bind = engine, autoflush = True, autocommit = False ) )
I learned both Python and SQLAlchemy on this project, so I may be missing things, but this one is driving me nuts. Any ideas?
When you work with a Session, as soon as it starts working with a connection it holds onto that connection until commit(), rollback() or close() is called. At the DBAPI level, the connection to the database also remains in a transaction until that transaction is committed or rolled back.
In this case, when you've loaded data into your session, SQLAlchemy doesn't refresh the data until the transaction is ended (or if you explicitly expire some part of the data with expire()). This is the natural behavior to have, since due to transaction isolation, it's very likely that the current transaction cannot see changes that have occurred since that transaction started in any case.
So while using expire() or refresh() may or may not be part of how to get the latest data into your Session, really you need to end your transaction and start a new one to truly see what's been changed elsewhere since that transaction started. You should organize your application so that a particular Session() is ready to go when a new request comes in, but when that request completes, the Session() should be closed out and a new one (or at least a new transaction) started up on the next request.
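With a plain scoped_session like the db_session in the question, that per-request lifecycle is usually wired up with a Flask teardown hook, for example (a sketch assuming a Flask application object named app):

@app.teardown_appcontext
def shutdown_session(exception=None):
    # end the session (and its transaction) after every request so the next
    # request starts a new transaction and can see freshly committed rows
    db_session.remove()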
Please try to call refresh or expire on your object before accessing the field userCreditsCount:
user1 = session.query(User).get(1)
# ...
session.refresh(user1, ('userCreditsCount',))
This will make the query execute again (when refresh is called).
However, depending on the isolation level your transaction uses, it might not resolve the problem, in which case you might need to commit or roll back the transaction (session) in order for the query to give you a new result.
Lifespan of a Contextual Session
I'd make sure you're closing the session when you're done with it.
session = db_session()
try:
    return session.query(User).get(5)
finally:
    session.close()
Set sessionmaker's autocommit to True and see if that helps. According to the documentation, the Session uses
the identity map pattern, and stores objects keyed to their primary key. However, it doesn’t do any kind of query caching.
so in your code it would become:
sessionmaker(bind = engine, autoflush = True, autocommit = True)