I'm using psycopg2, and I have a simple table:
CREATE TABLE IF NOT EXISTS feed_cache (
    feed_id int REFERENCES feeds(id) UNIQUE,
    feed_cache text NOT NULL,
    expire_date timestamp -- without time zone
);
I'm calling the following method and query:
@staticmethod
def get_feed_cache(conn, feed_id):
    c = conn.cursor()
    try:
        sql = 'SELECT feed_cache FROM feed_cache WHERE feed_id=%s AND localtimestamp <= expire_date;'
        c.execute(sql, (feed_id,))
        result = c.fetchone()
        if result:
            conn.commit()
            return result[0]
        else:
            # debug output: show the exact SQL and the (empty) result
            print('DBSELECT.get_feed_cache: %s' % result)
            print('sql: %s' % (c.mogrify(sql, (feed_id,))))
    except:
        conn.rollback()
        raise
    finally:
        c.close()
    return None
I added the else branch to print the exact SQL that is executed and the result that is returned.
The get_feed_cache() method is called from a database connection thread pool. When it is called slowly (about once per second or less), the result is returned as expected; however, when it is called concurrently, it occasionally returns None. I have tried multiple ways of writing this query and method.
Some observations:
If I remove 'AND localtimestamp <= expire_date' from the query, the query ALWAYS returns a result.
Executing the query rapidly in serial in psql always returns a result.
After reading about the fetch*() methods of psycopg's cursor class, which note that results are cached for the cursor, I'm assuming that the cache is not shared between different cursors. http://initd.org/psycopg/docs/faq.html#best-practices
I have tried using PostgreSQL's now() and current_timestamp functions with the same results. (I am aware of the time zone aspect of now() and current_timestamp.)
Conditions to note:
There will NEVER be a case where there is not a feed_cache value for a provided feed_id.
There will NEVER be a case where any value in the feed_cache table is NULL
While testing I have completely disabled any & all writes to this table
I have set the expire_date to be sufficiently far in the future for all values such that the expression 'AND localtimestamp <= expire_date' will always be true.
Here is copied-and-pasted output from a call that returned None:
DBSELECT.get_feed_cache: None
sql: SELECT feed_cache FROM feed_cache WHERE feed_id=5 AND localtimestamp < expire_date;
Well, that's pretty much it. I'm not sure what's going on; maybe I'm making some really dumb mistake and just don't notice it! My current guess is that it has something to do with psycopg2 and the way it caches results between cursors. If cursors DO share a cache and the queries run near-simultaneously, then the first cursor could fetch the result, the second cursor could see a cached copy of the same query and skip executing it, and the first cursor could then close and delete the cache before the second cursor fetches from it, leaving it with None.*
That said, psycopg2 states that it's thread-safe for read-only queries, so unless I'm misinterpreting their notion of thread safety, this shouldn't be the case.
Thank you for your time!
*Even after adding a thread lock around get_feed_cache() (acquired before creating the cursor and released before returning), I still occasionally get a None result.
I think this might have to do with the fact that the timestamps returned by localtimestamp or current_timestamp are fixed when the transaction starts, not when you run the statement. And psycopg manages transactions behind your back to some degree: by default it issues a BEGIN before the first statement on a connection, and that transaction stays open until you call commit() or rollback(). So you might be getting a slightly older timestamp.
You could debug this by setting log_statement = all in your server and then observing when the BEGIN statements are executed relative to your queries.
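You might want to look into using a function such as clock_timestamp(), which, unlike localtimestamp and current_timestamp, returns the actual current time each time it is called, even within a single transaction. See http://www.postgresql.org/docs/current/static/functions-datetime.html.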
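A minimal sketch of the method rewritten along those lines, assuming psycopg2's default (non-autocommit) mode: it uses clock_timestamp() in place of localtimestamp and ends the transaction on every path, including the None path that the original never commits. FEED_CACHE_SQL is an illustrative name, and this is not a confirmed fix for the intermittent None results.

```python
# clock_timestamp() is evaluated at call time; casting to ::timestamp gives
# local time without time zone, like localtimestamp and the expire_date column.
FEED_CACHE_SQL = (
    "SELECT feed_cache FROM feed_cache "
    "WHERE feed_id = %s AND clock_timestamp()::timestamp <= expire_date"
)


def get_feed_cache(conn, feed_id):
    """Return the cached feed text, or None if it is missing or expired.

    Every path ends the transaction (commit or rollback), so the connection
    goes back to the pool without an open transaction.
    """
    cur = conn.cursor()
    try:
        cur.execute(FEED_CACHE_SQL, (feed_id,))
        row = cur.fetchone()
        conn.commit()  # end the read-only transaction whether or not a row came back
        return row[0] if row else None
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Only standard DB-API calls are used (cursor, execute, fetchone, commit, rollback, close), so the function itself needs no psycopg2 import.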
The problem
For a while now I've encountered a bug where a data-retrieval query hangs during execution. That alone would be easy enough to debug, but the bug is hard to reproduce:
It only occurs on my Linux laptop (Manjaro Xfce), never on my Windows PC
It primarily occurs for a few specific timestamps (mostly 4:05)
Even then, it doesn't appear consistently
I know how to fix it (by prepending SELECT 1; to the query), but I don't understand why the problem occurs or why my fix works, which is where I'm stuck. I haven't seen any other questions that describe this specific issue.
Code
The query in question is below. It selects a range of measurements, then averages those measurements per time step (interpolating where necessary) to get a range of averages.
SELECT datetime, AVG(wc) AS wc
FROM (
    SELECT public.time_bucket_gapfill('5 minutes', m.datetime) AS datetime,
           public.interpolate(AVG(m.wc)) AS wc
    FROM growficient.measurement AS m
    INNER JOIN growficient.placement AS p ON m.placement_id = p.id
    WHERE m.datetime >= '2022-09-30T22:00:00+00:00'
      AND m.datetime < '2022-10-01T04:05:00+00:00'
      AND p.section_id = 'bd5114b8-4aab-11eb-af66-32bd66d4e25c'
    GROUP BY public.time_bucket_gapfill('5 minutes', m.datetime), p.id
) AS placement_averages
GROUP BY datetime
ORDER BY datetime;
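To make the computation explicit, here is a rough pure-Python model of what the query computes, for reasoning and testing only (it is not how TimescaleDB executes it). It assumes, as TimescaleDB's interpolate() does, linear interpolation between the nearest known buckets within each placement, with leading and trailing gaps left empty. Buckets are counted from start, which should match time_bucket_gapfill here because the range starts on a 5-minute boundary. The function names and the (placement_id, timestamp, wc) tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import timedelta
from statistics import mean

BUCKET = timedelta(minutes=5)


def interpolate(series):
    """Linearly fill None gaps that have a known value on both sides."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = out[left] + frac * (out[right] - out[left])
    return out


def section_averages(readings, start, end):
    """readings: iterable of (placement_id, timestamp, wc) tuples."""
    # Gapfill: every 5-minute bucket in [start, end), even empty ones.
    buckets = []
    t = start
    while t < end:
        buckets.append(t)
        t += BUCKET

    # Inner query: collect wc values per placement and bucket.
    per_placement = defaultdict(lambda: defaultdict(list))
    for placement_id, ts, wc in readings:
        if start <= ts < end:
            bucket = start + ((ts - start) // BUCKET) * BUCKET
            per_placement[placement_id][bucket].append(wc)

    # AVG per bucket, then interpolate() across empty buckets, per placement.
    series = {
        pid: interpolate([mean(by_bucket[b]) if b in by_bucket else None for b in buckets])
        for pid, by_bucket in per_placement.items()
    }

    # Outer query: AVG across placements per bucket; like SQL, NULLs are ignored.
    result = []
    for i, b in enumerate(buckets):
        values = [s[i] for s in series.values() if s[i] is not None]
        result.append((b, mean(values) if values else None))
    return result
```

A placement with readings in only one bucket contributes nothing to the other buckets, because there is no second known point to interpolate towards.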
This is then executed through a SQLAlchemy session. When the bug appears, execution never reaches fetchall().
execute_result = session.execute(query)
readings = execute_result.fetchall()
Our session management is very similar to what's shown in the SQLAlchemy documentation. This is meant to be a debug session, however, so no commit statements are included.
import contextlib
from typing import Iterator

from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

sessionMaker = sessionmaker(
    autocommit=False,
    autoflush=False,
    bind=create_engine(
        config.get_settings().main_db,
        echo=False,
        connect_args=connect_options,
        pool_pre_ping=True,
    ),
)

@contextlib.contextmanager
def managed_session() -> Iterator[Session]:
    session = sessionMaker()
    try:
        yield session
    except Exception as e:
        session.rollback()
        logger.error("Session error: %s", e)
        raise
    finally:
        session.close()
Observations
I can see the transaction hanging if I run select * from pg_catalog.pg_stat_activity psa
Printing the identical query and executing it directly in the database (e.g. in DBeaver) correctly returns the results
None of the timeouts mentioned in the Postgres documentation do anything to break out of the hang
Adding a SELECT 1; statement works, but setting pool_pre_ping=True on the engine doesn't, which confuses me, since to my understanding they do the same thing.
I have a 'throwaway' SQL statement that I would like to run. I don't care about the error status, and I don't need to know whether it completed successfully. It creates an index on a table that is very infrequently used. I currently have the connection and cursor objects, and here is how I would normally do it:
self.cursor.execute('ALTER TABLE mytable ADD INDEX (_id)')
Easy enough. However, this statement takes about five minutes, and, as I mentioned, it's not important enough to block other, unrelated work. Is it possible to execute a cursor statement in the background? Again, I don't need any status from it, and I don't care about closing the cursor or connection: it really is a throwaway statement on a table that is probably accessed one to five times in its lifetime before being dropped.
import threading
threading.Thread(target=lambda tn, cursor: cursor.execute('ALTER TABLE %s ADD INDEX (_id)' % tn),
                 args=('mytable', self.cursor)).start()
What would be the best approach to execute a statement in the background so that it doesn't block future SQL statements?
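One common approach, sketched here under assumptions: run the statement on a separate, dedicated connection in a worker thread, so the main connection and cursor stay free. A database connection can generally run only one statement at a time, and most drivers do not support sharing a connection or cursor across threads, so the worker opens its own. connect is a hypothetical zero-argument factory for whatever driver you use, and errors are deliberately swallowed, matching the requirement above.

```python
import threading


def run_in_background(connect, sql):
    """Fire-and-forget: run `sql` on its own connection in a worker thread.

    `connect` is a zero-argument callable returning a new DB-API connection
    (hypothetical; e.g. a functools.partial around your driver's connect()).
    Errors are ignored on purpose. The thread is non-daemon, so the
    interpreter waits for the statement to finish before exiting.
    """
    def worker():
        try:
            conn = connect()  # dedicated connection, never the caller's
        except Exception:
            return
        try:
            cur = conn.cursor()
            cur.execute(sql)
            conn.commit()
        except Exception:
            pass  # throwaway statement: failures don't matter
        finally:
            conn.close()

    thread = threading.Thread(target=worker, name="background-sql")
    thread.start()
    return thread
```

For example, run_in_background(make_connection, 'ALTER TABLE mytable ADD INDEX (_id)') returns immediately, where make_connection is your own (hypothetical) factory. Keep the returned thread if you ever want to join() it.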
(Sorry in advance for the long question. I tried to break it up into sections to make it clearer what I'm asking. Please let me know if I should add anything else or reorganize it at all.)
Background:
I'm writing a web crawler that uses a producer/consumer model, with jobs (pages to crawl or re-crawl) stored in a PostgreSQL table called crawler_table. I'm using SQLAlchemy to access and modify the table. The exact schema is not important for this question. The important thing is that I (will) have multiple consumers, each of which repeatedly selects a record from the table, loads the page with PhantomJS, and then writes information about the page back to the record.
Occasionally, two consumers select the same job. This is not itself a problem; however, if they update the record with their results simultaneously, it is important that they make consistent changes. It's good enough for me to just find out whether an update would leave the record inconsistent. If so, I can deal with it.
Investigation:
I initially assumed that if two transactions in separate sessions read then updated the same record simultaneously, the second one to commit would fail. To test that assumption, I ran the following code (simplified slightly):
from datetime import datetime

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

SQLAlchemySession = sessionmaker(bind=create_engine(my_postgresql_uri))


class Session(object):
    # A simple wrapper for use with the `with` statement
    def __enter__(self):
        self.session = SQLAlchemySession()
        return self.session

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type:
            self.session.rollback()
        else:
            self.session.commit()
        self.session.close()


with Session() as session:  # Create a record to play with
    if session.query(CrawlerPage) \
            .filter(CrawlerPage.url == 'url').count() == 0:
        session.add(CrawlerPage(website='website', url='url',
                                first_seen=datetime.utcnow()))
    page = session.query(CrawlerPage) \
        .filter(CrawlerPage.url == 'url') \
        .one()
    page.failed_count = 0
    # commit

# Actual experiment:
with Session() as session:
    page = session.query(CrawlerPage) \
        .filter(CrawlerPage.url == 'url') \
        .one()
    print('initial (session)', page.failed_count)
    # 0 (expected)
    page.failed_count += 5

    with Session() as other_session:
        same_page = other_session.query(CrawlerPage) \
            .filter(CrawlerPage.url == 'url') \
            .one()
        print('initial (other_session)', same_page.failed_count)
        # 0 (expected)
        same_page.failed_count += 10
        print('final (other_session)', same_page.failed_count)
        # 10 (expected)
    # commit other_session, no errors (expected)

    print('final (session)', page.failed_count)
    # 5 (expected)
# commit session, no errors (why?)

with Session() as session:
    page = session.query(CrawlerPage) \
        .filter(CrawlerPage.url == 'url') \
        .one()
    print('final value', page.failed_count)
    # 5 (expected, given that there were no errors)
(Apparently Incorrect) Expectations:
I would have expected that reading a value from a record then updating that value within the same transaction would:
Be an atomic operation. That is, either succeed completely or fail completely. This much appears to be true, since the final value is 5, the value set in the last transaction to be committed.
Fail if the record being updated is updated by a concurrent session (other_session) upon attempting to commit the transaction. My rationale is that all transactions should behave as though they are performed independently in order of commit whenever possible, or should fail to commit. In these circumstances, the two transactions read then update the same value of the same record. In a version-control system, this would be the equivalent of a merge conflict. Obviously databases are not the same as version-control systems, but they have enough similarities to inform some of my assumptions about them, for better or worse.
Questions:
Why doesn't the second commit raise an exception?
Am I misunderstanding something about how SQLAlchemy handles transactions?
Am I misunderstanding something about how postgresql handles transactions? (This one seems most likely to me.)
Something else?
Is there a way to get the second commit to raise an exception?
PostgreSQL has SELECT ... FOR UPDATE, which SQLAlchemy supports through Query.with_for_update().
My rationale is that all transactions should behave as though they are performed independently in order of commit whenever possible, or should fail to commit.
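Well, in general there's a lot more to transactions than that. PostgreSQL's default transaction isolation level is READ COMMITTED. Loosely speaking, that means multiple transactions can read the same committed row at the same time, and an UPDATE simply overwrites the latest committed value; nothing checks whether the value your transaction read earlier is still current, so the second commit succeeds and the last writer wins. If you want to prevent that, set the transaction isolation level to SERIALIZABLE (the later transaction then fails with a serialization error that your code must catch and retry), use SELECT ... FOR UPDATE, lock the table, use a column-by-column WHERE clause, or something similar.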
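A minimal sketch of the column-by-column WHERE clause idea, usually called optimistic locking: each row carries a version counter, the UPDATE only matches if the version is still the one you read, and a zero rowcount means another consumer got there first. It uses only DB-API calls and is exercised with the standard-library sqlite3 module (qmark placeholders; with psycopg2 they would be %s). The version column, read_job(), update_job() and StaleUpdate are hypothetical additions to crawler_table. The same statement should also work under PostgreSQL's READ COMMITTED, because an UPDATE that waits on a concurrent writer re-checks its WHERE clause against the newly committed row. For mapped classes, SQLAlchemy's version_id_col mapper option does this automatically and raises StaleDataError on a mismatch.

```python
class StaleUpdate(Exception):
    """The row changed after we read it (hypothetical exception name)."""


def read_job(conn, job_id):
    """Return (failed_count, version) for one job."""
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT failed_count, version FROM crawler_table WHERE id = ?", (job_id,)
        )
        return cur.fetchone()
    finally:
        cur.close()


def update_job(conn, job_id, new_failed_count, seen_version):
    """Write new_failed_count only if nobody has changed the row since seen_version."""
    cur = conn.cursor()
    try:
        cur.execute(
            "UPDATE crawler_table SET failed_count = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_failed_count, job_id, seen_version),
        )
        updated = cur.rowcount
    finally:
        cur.close()
    if updated == 0:  # version moved on: a concurrent update won
        conn.rollback()
        raise StaleUpdate(f"job {job_id} was modified by another consumer")
    conn.commit()
```

Replaying the question's experiment with this, the consumer that commits second gets StaleUpdate instead of silently overwriting the other's result.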
You can test and demonstrate transaction behavior by opening two psql connections.
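-- Session 1:
begin transaction;
-- Session 2:
begin transaction;
-- Session 1:
select *
from test
where pid = 1
  and date = '2014-10-01'
for update;
(1 row)
-- Session 2:
select *
from test
where pid = 1
  and date = '2014-10-01'
for update;
(waiting)
-- Session 1:
update test
set date = '2014-10-31'
where pid = 1
  and date = '2014-10-01';
commit;
-- Session 2: the lock is released; the waiting SELECT ... FOR UPDATE
-- re-checks its WHERE clause against the updated row and returns nothing.
(0 rows)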
I have one script running on a server that updates a list of items in a MySQL database, to be processed by another script running on my desktop. The desktop script runs in a loop, processing the list every 5 minutes (the server-side script also runs on a 5-minute cycle). On the first pass, the script retrieves the current list (a basic SELECT); on the second pass, it gets the same, not-yet-updated list; on the third, it gets the list it should have gotten on the second pass. On every pass after the first, the SELECT returns the data from the previous UPDATE rather than the latest one.
import ast
import datetime
import time

def mainFlow():
    activeList=[]
    d=()
    a=()
    b=()
    #cycleStart=datetime.datetime.now()
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id=1")
    d=cur.fetchone()
    DBSV.commit()
    a=d[0]
    b=a[0]
    activeList=ast.literal_eval(a)
    print(activeList)
    buyList=[]
    clearOrders()
    sellDecide()
    if activeList:
        for i in activeList:
            a=buyCalculate(i)
            if a:
                buyList.append(i)
    print ('buy list: ',buyList)
    if buyList:
        buyDecide(buyList)
    cur.close()
    d=()
    a=()
    b=()
    activeList=[]
    print ('+++++++++++++END OF BLOCK+++++++++++++++')

state=True
while state==True:
    cycleStart=datetime.datetime.now()
    mainFlow()
    cycleEnd=datetime.datetime.now()
    wait=300-(cycleEnd-cycleStart).total_seconds()
    print ('wait=: ' +str(wait))
    if wait>0:
        time.sleep(wait)
As you can see, I'm reinitializing all my variables, closing my cursor, and calling commit(), which is supposed to solve this sort of problem. I have tried plain cursors and cursors with buffered set to True and False, always with the same result.
When I run the exact same Select query from MySQL Workbench, the results returned are fine.
Baffled, and stuck on this for 2 days.
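You're performing your COMMIT before your UPDATE/INSERT/DELETE statements instead of after them.
Although a SELECT statement is, technically, DML, it differs from INSERT, UPDATE and DELETE in that it doesn't modify the data in the database. With autocommit off (the default in MySQL Connector/Python), InnoDB's default REPEATABLE READ isolation level gives each transaction a snapshot taken at its first read, and every later SELECT in that transaction sees that snapshot until you COMMIT or roll back. In your loop the COMMIT comes straight after the SELECT, so any queries run later in the pass (presumably in clearOrders(), sellDecide() or buyDecide()) start a new transaction that is still open when the next pass runs its SELECT, which therefore reads the snapshot from the previous pass. Committing at the end of each pass ends that transaction, so the next SELECT starts from a fresh snapshot. Closing the cursor does not end the transaction; only commit() or rollback() does.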
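The snapshot effect itself can be reproduced without MySQL. This sketch swaps in SQLite in WAL mode as a stand-in (an assumption: its rules are not identical to InnoDB's, but a read transaction likewise keeps seeing the data as of its first read until the transaction ends). The active_list table mirrors the question's; read_list() and snapshot_demo() are hypothetical names.

```python
import os
import sqlite3
import tempfile


def read_list(conn):
    """Same SELECT as the question's mainFlow()."""
    return conn.execute("SELECT list FROM active_list WHERE id = 1").fetchone()[0]


def snapshot_demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    # isolation_level=None: the sqlite3 module adds no implicit BEGIN/COMMIT.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL")  # readers keep snapshots; writers aren't blocked
    writer.execute("CREATE TABLE active_list (id INTEGER PRIMARY KEY, list TEXT)")
    writer.execute("INSERT INTO active_list VALUES (1, '[1]')")

    reader = sqlite3.connect(path, isolation_level=None)
    reader.execute("BEGIN")
    first = read_list(reader)   # the read transaction's snapshot starts here
    writer.execute("UPDATE active_list SET list = '[1, 2]' WHERE id = 1")
    inside = read_list(reader)  # same transaction: still the old snapshot
    reader.execute("COMMIT")    # ending the transaction drops the snapshot
    after = read_list(reader)   # new (autocommit) read: sees the update
    writer.close()
    reader.close()
    return first, inside, after
```

snapshot_demo() returns ('[1]', '[1]', '[1, 2]'): the update only becomes visible once the reader's open transaction is committed, which is the same reason moving DBSV.commit() to the end of the pass helps.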
You've gone far too far in trying to solve this problem; there's no need to reset everything within the mainFlow() method (and I can't see a need for most of the variables).
import ast
import datetime
import time

def mainFlow():
    buyList = []
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id = 1")
    activeList = cur.fetchone()[0]
    activeList = ast.literal_eval(activeList)
    clearOrders()
    sellDecide()
    for i in activeList:
        a = buyCalculate(i)
        if a:
            buyList.append(i)
    if buyList:
        buyDecide(buyList)
    DBSV.commit()
    cur.close()

while True:
    cycleStart = datetime.datetime.now()
    mainFlow()
    cycleEnd = datetime.datetime.now()
    wait = 300 - (cycleEnd - cycleStart).total_seconds()
    if wait > 0:
        time.sleep(wait)
I've removed a fair amount of unnecessary code (and added spaces), along with the reuse of variable names for different things and the declaration of variables that are immediately overwritten. This still isn't very OO, though...
As we don't know exactly what clearOrders(), sellDecide() and buyCalculate() do, you might want to double-check this yourself.
I'm using Psycopg2 in Python to access a PostgreSQL database. I'm curious if it's safe to use the with closing() pattern to create and use a cursor, or if I should use an explicit try/except wrapped around the query. My question is concerning inserting or updating, and transactions.
As I understand it, all Psycopg2 queries occur within a transaction, and it's up to calling code to commit or rollback the transaction. If within a with closing(... block an error occurs, is a rollback issued? In older versions of Psycopg2, a rollback was explicitly issued on close() but this is not the case anymore (see http://initd.org/psycopg/docs/connection.html#connection.close).
My question might make more sense with an example. Here's an example using with closing(...
from contextlib import closing

with closing(db.cursor()) as cursor:
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    db.commit()  # commit() is a connection method; psycopg2 cursors have none
What happens when module.raise_unexpected_error() raises its error? Is the transaction rolled back? As I understand transactions, I either need to commit them or roll them back. So in this case, what happens?
Alternately I could write my query like this:
cursor = None
try:
    cursor = db.cursor()
    cursor.execute("""UPDATE users
                      SET password = %s, salt = %s
                      WHERE user_id = %s""",
                   (pw_tuple[0], pw_tuple[1], user_id))
    module.raise_unexpected_error()
    db.commit()
except BaseException:
    db.rollback()  # the transaction belongs to the connection, not the cursor
    raise          # re-raise so the error isn't silently swallowed
finally:
    if cursor is not None:
        cursor.close()
I should also mention that I have no idea whether psycopg2's connection.cursor() method can raise an error (the documentation doesn't say), so better safe than sorry, no?
Which method of issuing a query and managing a transaction should I use?
Your link to the Psycopg2 docs kind of explains it itself, no?
... Note that closing a connection without committing the changes first will cause any pending change to be discarded as if a ROLLBACK was performed (unless a different isolation level has been selected: see set_isolation_level()).
Changed in version 2.2: previously an explicit ROLLBACK was issued by Psycopg on close(). The command could have been sent to the backend at an inappropriate time, so Psycopg currently relies on the backend to implicitly discard uncommitted changes. Some middleware are known to behave incorrectly though when the connection is closed during a transaction (when status is STATUS_IN_TRANSACTION), e.g. PgBouncer reports an unclean server and discards the connection. To avoid this problem you can ensure to terminate the transaction with a commit()/rollback() before closing.
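So, unless you're using a different isolation level, or using PgBouncer, your first example won't persist the failed update once the connection is closed. Note, though, that closing(db.cursor()) only closes the cursor, not the connection: until the connection is closed or rolled back, it sits in an open transaction, and if your code later calls db.commit() on the same connection, the half-finished UPDATE is committed along with everything else. If you want finer-grained control over exactly what happens during a transaction, the try/except method might be best, since it parallels the database transaction state itself.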
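If you want the with-statement style together with an explicit commit or rollback, a small context manager can give you both. A minimal sketch using only DB-API 2.0 calls, so it should apply to psycopg2 connections as well as sqlite3; transaction() is a hypothetical helper name. psycopg2 2.5 and later also provide something similar built in: with conn: commits on success and rolls back on an exception (without closing the connection), and with conn.cursor() as cur: closes the cursor.

```python
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Yield a cursor; commit if the block succeeds, roll back if it raises.

    Uses only DB-API 2.0 methods (cursor, commit, rollback, close).
    The cursor is always closed; the connection is left open for reuse.
    """
    cur = conn.cursor()
    try:
        yield cur
        conn.commit()
    except BaseException:
        conn.rollback()
        raise
    finally:
        cur.close()
```

For example, running the UPDATE from the question inside with transaction(db) as cursor: rolls it back if module.raise_unexpected_error() raises, and commits it otherwise, so no transaction is left open on the connection either way.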