I have a 'throwaway' SQL statement that I would like to run. I don't care about the error status, and I don't need to know whether it completed successfully. It creates an index on a table that is very infrequently used. I currently have the connection and cursor objects, and here is how I would normally do it:
self.cursor.execute('ALTER TABLE mytable ADD INDEX (_id)')
Easy enough. However, this statement takes about five minutes, and as I mentioned, it's not important enough to block other, unrelated work. Is it possible to execute a cursor statement in the background? Again, I don't need any status back from it, and I don't care about closing the cursor or connection -- it really is a throwaway statement on a table that will probably be accessed one to five times in its lifetime before being dropped.
threading.Thread(target=lambda tn, cursor: cursor.execute('ALTER TABLE %s ADD INDEX (_id)' % tn), args=('mytable', self.cursor)).start()
What would be the best approach to execute a statement in the background so it doesn't block future SQL statements?
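A minimal sketch of one way to do this (an illustration, not from the original post): run the statement in its own thread with its own connection, since a MySQLdb connection should not be shared between threads. The connection parameters and table name here are placeholders.

import threading
import MySQLdb

def build_index_in_background(table_name):
    def worker():
        # A separate connection keeps the main connection free for other queries;
        # MySQLdb connections and cursors shouldn't be shared across threads.
        conn = MySQLdb.connect(host='localhost', user='me', passwd='secret', db='mydb')
        try:
            conn.cursor().execute('ALTER TABLE %s ADD INDEX (_id)' % table_name)
        except Exception:
            pass  # throwaway statement: ignore any failure
        finally:
            conn.close()
    threading.Thread(target=worker).start()

build_index_in_background('mytable')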
I am writing code to create a GUI in Python in the Spyder environment of Anaconda. Within this code I work with a PostgreSQL database, so I use the psycopg2 database adapter to interact with it directly from the GUI.
The code is too long to post here, as it is over 3000 lines, but to summarize, I have no problem interacting with my database except when I try to drop a table.
When I do so, the GUI frames become unresponsive, the DROP TABLE query doesn't drop the intended table, and no errors of any kind are thrown.
Within my code, all operations that drop a table go through one function (DeleteTable). Calling the function itself is not the problem; several print statements I added beforehand confirm that everything is in order up to that point. The problem occurs when the statement is executed by the cur.execute(sql) line.
Can anybody figure out why my tables won't drop?
def DeleteTable(table_name):
    conn = psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'")
    cur = conn.cursor()
    sql = """DROP TABLE """ + table_name + """;"""
    cur.execute(sql)
    conn.commit()
That must be because a concurrent transaction is holding a lock that blocks the DROP TABLE statement.
Examine the pg_stat_activity view and watch out for sessions whose state is 'idle in transaction' or 'active' and whose xact_start is more than a few seconds in the past.
This is essentially an application bug: you must make sure that all transactions are closed immediately, otherwise Bad Things can happen.
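For illustration (not part of the original answer), a check along these lines will surface such sessions; the connection string is copied from the question, and pid, state, xact_start and query are standard pg_stat_activity columns:

import psycopg2

conn = psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'")
cur = conn.cursor()
# Sessions that have had a transaction open for more than a few seconds.
cur.execute("""
    SELECT pid, state, xact_start, query
    FROM pg_stat_activity
    WHERE state IN ('idle in transaction', 'active')
      AND xact_start < now() - interval '5 seconds'
""")
for row in cur.fetchall():
    print(row)
conn.close()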
I was having the same issue when using psycopg2 within Airflow's Postgres hook, and I resolved it with a with statement. This probably resolves the issue because the connection becomes local to the with block.
def drop_table():
    with PostgresHook(postgres_conn_id="your_connection").get_conn() as conn:
        cur = conn.cursor()
        cur.execute("DROP TABLE IF EXISTS your_table")

task_drop_table = PythonOperator(
    task_id="drop_table",
    python_callable=drop_table
)
A similar fix should work for the original code above (I haven't tested this one):
def DeleteTable(table_name):
    with psycopg2.connect("host='localhost' dbname='trial2' user='postgres' password='postgres'") as conn:
        cur = conn.cursor()
        sql = """DROP TABLE """ + table_name + """;"""
        cur.execute(sql)
        conn.commit()
Please comment if anyone tries this.
I have the following code, which uses MySQLdb for database inserts:
self.cursor.execute('START TRANSACTION;')
for item in data:
    self.cursor.execute('INSERT INTO...')
self.cursor.execute('COMMIT;')
self.conn.commit()
Is the self.conn.commit() at the end redundant, or does that need to be there?
If you start a transaction, you're responsible for calling COMMIT; otherwise it will be rolled back when you close the connection.
As a note, it's bad form to include ; in your queries unless you're using an interactive shell. They're not necessary and immediately raise questions about how they came to be there.
The ; delimiter is used by the shell to determine where one statement stops and the next starts, something that isn't needed in code, where each statement is supplied as a separate string.
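For illustration (table and column names are placeholders, not from the original post), the idiomatic MySQLdb pattern is to let the driver open the transaction implicitly and call commit() on the connection once:

for item in data:
    self.cursor.execute('INSERT INTO mytable (col) VALUES (%s)', (item,))
self.conn.commit()  # one commit on the connection; no explicit START TRANSACTION / COMMIT strings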
I have one script running on a server that updates a list of items in a MySQL database, to be processed by another script running on my desktop. Both scripts run in a loop, processing the list every 5 minutes. On the first pass, the desktop script retrieves the current list (a basic SELECT); on the second pass it gets the same, not-yet-updated list; on the third it gets the list it should have gotten on the second pass. On every pass after the first, the SELECT returns the data from the previous UPDATE, one cycle behind.
def mainFlow():
    activeList = []
    d = ()
    a = ()
    b = ()
    #cycleStart = datetime.datetime.now()
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id=1")
    d = cur.fetchone()
    DBSV.commit()
    a = d[0]
    b = a[0]
    activeList = ast.literal_eval(a)
    print(activeList)
    buyList = []
    clearOrders()
    sellDecide()
    if activeList:
        for i in activeList:
            a = buyCalculate(i)
            if a:
                buyList.append(i)
    print('buy list: ', buyList)
    if buyList:
        buyDecide(buyList)
    cur.close()
    d = ()
    a = ()
    b = ()
    activeList = []
    print('+++++++++++++END OF BLOCK+++++++++++++++')

state = True
while state == True:
    cycleStart = datetime.datetime.now()
    mainFlow()
    cycleEnd = datetime.datetime.now()
    wait = 300 - (cycleEnd - cycleStart).total_seconds()
    print('wait=: ' + str(wait))
    if wait > 0:
        time.sleep(wait)
As you can see, I am re-initializing all my variables, I am closing my cursor, and I am calling commit(), which is supposed to solve this sort of problem. I have tried plain cursors and cursors with buffered set to True and False, always with the same result.
When I run the exact same Select query from MySQL Workbench, the results returned are fine.
Baffled, and stuck on this for 2 days.
You're performing your COMMIT before your UPDATE/INSERT/DELETE statements rather than after them.
Though a SELECT statement is, technically, DML, it differs from INSERT, UPDATE and DELETE in that it doesn't modify data in the database. If you want to see data that has been changed in another session, you must COMMIT your own transaction only after those changes have been made. This is partly exacerbated by the fact that you close the cursor after each loop.
You've gone much too far in trying to solve this problem; there's no need to reset everything within the mainFlow() method (and I can't see a need for most of the variables).
def mainFlow():
    buyList = []
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id = 1")
    activeList = cur.fetchone()[0]
    activeList = ast.literal_eval(activeList)
    clearOrders()
    sellDecide()
    for i in activeList:
        a = buyCalculate(i)
        if a:
            buyList.append(i)
    if buyList:
        buyDecide(buyList)
    DBSV.commit()
    cur.close()

while True:
    cycleStart = datetime.datetime.now()
    mainFlow()
    cycleEnd = datetime.datetime.now()
    wait = 300 - (cycleEnd - cycleStart).total_seconds()
    if wait > 0:
        time.sleep(wait)
I've removed a fair amount of unnecessary code (and added spaces), removed the reuse of variable names for different purposes, and dropped the declaration of variables that are immediately overwritten. This still isn't very OO, though...
As we don't have detailed knowledge of exactly what clearOrders(), sellDecide() and buyCalculate() do, you might want to double-check this yourself.
So I'm using psycopg2, and I have a simple table:
CREATE TABLE IF NOT EXISTS feed_cache (
    feed_id int REFERENCES feeds(id) UNIQUE,
    feed_cache text NOT NULL,
    expire_date timestamp  -- without time zone
);
I'm calling the following method and query:
@staticmethod
def get_feed_cache(conn, feed_id):
    c = conn.cursor()
    try:
        sql = 'SELECT feed_cache FROM feed_cache WHERE feed_id=%s AND localtimestamp <= expire_date;'
        c.execute(sql, (feed_id,))
        result = c.fetchone()
        if result:
            conn.commit()
            return result[0]
        else:
            print 'DBSELECT.get_feed_cache: %s' % result
            print 'sql: %s' % (c.mogrify(sql, (feed_id,)))
    except:
        conn.rollback()
        raise
    finally:
        c.close()
    return None
I've added the else branch to output the exact SQL and result that are being executed and returned.
The get_feed_cache() method is called from a database connection thread pool. When the get_feed_cache() method is called "slowishly" (~1/sec or less), the result is returned as expected; however, when it is called concurrently it will occasionally return None. I have tried multiple ways of writing this query and method.
Some observations:
If I remove 'AND localtimestamp <= expire_date' from the query, the query ALWAYS returns a result.
Executing the query rapidly in serial in psql always returns a result.
Reading about the fetch*() methods of psycopg's cursor class, the docs note that results are cached per cursor; I'm assuming that cache is not shared between different cursors. http://initd.org/psycopg/docs/faq.html#best-practices
I have tried using postgresql's now() and current_timestamp functions with the same results. (I am aware of the timezone aspect of now() & current_timestamp)
Conditions to note:
There will NEVER be a case where there is not a feed_cache value for a provided feed_id.
There will NEVER be a case where any value in the feed_cache table is NULL
While testing I have completely disabled any & all writes to this table
I have set the expire_date to be sufficiently far in the future for all values such that the expression 'AND localtimestamp <= expire_date' will always be true.
Here is a copy & pasted output of it returning None:
DBSELECT.get_feed_cache: None
sql: SELECT feed_cache FROM feed_cache WHERE feed_id=5 AND localtimestamp < expire_date;
Well that's pretty much it, I'm not sure what's going on. Maybe I'm making some really dumb mistake and I just don't notice it! My current guess is that it has something to do with psycopg2 and perhaps the way it's caching results between cursors. If the cursors DO share the cache and the queries happen near-simultaneously then it could be possible that the first cursor fetches the result, the second cursor sees there is a cache of the same query, so it does not execute, then the first cursor closes and deletes the cache and the second cursor tries to fetch a now null/None cache.*
That said, psycopg2 states that it's thread-safe for read-only queries, so unless I'm misinterpreting their notion of thread safety, this shouldn't be the case.
Thank you for your time!
*After adding a thread lock around get_feed_cache(), acquired before creating the cursor and released before returning, I still occasionally get a None result.
I think this might have to do with the fact that the timestamps returned by localtimestamp or current_timestamp are fixed when the transaction starts, not when you run the statement, and psycopg manages transactions behind your back to some degree. So you might be getting a slightly older timestamp.
You could debug this by setting log_statement = all on your server and then observing when the BEGIN statements are executed relative to your queries.
You might want to look into using a function such as clock_timestamp(), which returns the actual current time and keeps advancing even within a single transaction. See http://www.postgresql.org/docs/current/static/functions-datetime.html.
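A small sketch that demonstrates the difference (the connection string is a placeholder): psycopg2 starts a transaction at the first execute(), so localtimestamp stays frozen while clock_timestamp() keeps advancing:

import time
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string
cur = conn.cursor()
cur.execute("SELECT localtimestamp, clock_timestamp()")
print(cur.fetchone())
time.sleep(2)  # still inside the same implicit transaction
cur.execute("SELECT localtimestamp, clock_timestamp()")
print(cur.fetchone())  # localtimestamp is unchanged; clock_timestamp() has moved on
conn.rollback()
conn.close()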
What is the best way to deal with the 1205 "deadlock victim" error when calling SQL Server from Python?
The issue arises when I have multiple Python scripts running, and all are attempting to update a table with a MERGE statement which adds a row if it doesn't yet exist (this query will be called millions of times in each script).
MERGE table_name AS t  -- including UPDLOCK or ROWLOCK eventually results in deadlock
USING ( VALUES ( ... ) )
    AS row( ... )
ON t.feature = row.feature
WHEN NOT MATCHED THEN
    INSERT ( ... )
    VALUES ( ... );
The scripts require immediate access to the table to access the unique id assigned to the row.
Eventually, one of the scripts raises an OperationalError:
Transaction (Process ID 52) was deadlocked on lock resources with
another process and has been chosen as the deadlock victim. Rerun the
transaction.
1) I have tried using a try-except block around the call in Python:
while True:
    try:
        cur.execute(stmt)
        break
    except OperationalError:
        continue
This approach slows the process down considerably. Also, I think I might be doing this incorrectly (I think I might need to reset the connection...).
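One way to make option 1 less punishing (a sketch, untested against the original setup): roll back the failed transaction so the next attempt starts clean, and add a short, jittered backoff so the competing scripts don't immediately collide again. The function and its parameters are illustrative, not from the original post.

import random
import time
import pymssql

def execute_with_retry(conn, cur, stmt, max_attempts=10):
    for attempt in range(max_attempts):
        try:
            cur.execute(stmt)
            conn.commit()
            return
        except pymssql.OperationalError:
            conn.rollback()  # clear the deadlocked transaction before retrying
            time.sleep(random.uniform(0.05, 0.5) * (attempt + 1))  # jittered backoff
    raise RuntimeError('gave up after %d deadlock retries' % max_attempts)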
2) Use a try-catch in SQL Server (something like below...):
WHILE 1 = 1
BEGIN
    BEGIN TRY
        -- MERGE statement (see above)
        BREAK
    END TRY
    BEGIN CATCH
        SELECT ERROR_NUMBER() AS ErrorNumber
        ROLLBACK
        CONTINUE
    END CATCH;
END
3) Something else?
Thanks for your help. And let me know if you need additional details, etc.
I am using Python 2.7, SQL Server 2008, and pymssql to make the connection.