Running sqlite queries in a loop slows down after 20 calls - python

I have a SQLite database that I am running a query on, and one SELECT statement is working much slower than I expected.
I have one DB method for getting a single table:
def get_tables_by_id(self, id):
    with self.conn as conn:
        c = conn.cursor()
        c.execute(
            "SELECT * FROM tbl WHERE foreign_id = ? AND date(date) >= ? AND date(date) <= ?",
            params,  # the id plus the two date bounds
        )
        return c.fetchall()
I then have another method that does this many times with different parameters. Instead of preparing a single statement to fetch them all at once, I thought I would just loop and call get_tables_by_id, since I am not hitting the database over the network, and I would have to do some work to organize everything if I queried them all at once. I thought that since SQLite runs on the filesystem it would be fast... but it slows down after 20 or so calls to the above method (I am trying to call it 1000 times).
If, however, I build the following SQL and run it as one big query, it returns all of the results instantly...
q = f"SELECT * FROM tbl WHERE foreign_id IN ({','.join('?' * len(foreign_ids))}) AND date(date) >= ? AND date(date) <= ?"
I would understand if the looping query took longer, on the order of seconds, but at the rate it was going it would have taken about 5 minutes to complete. What would be the reason for a slowdown like that?
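For what it's worth, here is a rough sketch of how the results of that single IN (...) query could be grouped back by foreign_id in Python, so the per-id organization the loop was providing doesn't require one query per id. The function name get_tables_by_ids, the start_date/end_date parameters, and the position of foreign_id in the row are illustrative assumptions; table and column names are taken from the question:
from collections import defaultdict

def get_tables_by_ids(self, foreign_ids, start_date, end_date):
    # one round trip: fetch every matching row for all ids at once,
    # then bucket the rows by foreign_id in Python
    placeholders = ','.join('?' * len(foreign_ids))
    q = (f"SELECT * FROM tbl WHERE foreign_id IN ({placeholders}) "
         "AND date(date) >= ? AND date(date) <= ?")
    with self.conn as conn:
        c = conn.cursor()
        c.execute(q, (*foreign_ids, start_date, end_date))
        grouped = defaultdict(list)
        for row in c.fetchall():
            grouped[row[1]].append(row)  # assumes foreign_id is the second column
    return grouped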

Related

simple select query very slow

I have a SQLite database that I use as a cache, and I read it using this function:
def read_cache(table):
    if not table.isidentifier():  # <- my version of SQLi protection, used to allow the user to provide table and column names
        raise ValueError(f"Invalid table name '{table}'")
    for record in cur.execute(f"SELECT * FROM {table};"):
        yield {**record}
A call to list(read_cache('some_values')) takes forever (nearly a minute) to return, yet when I run the query against the DB directly (as in cur.execute('select * from some_values;').fetchall() or [r for r in cur.execute('select * from some_values;')]) it returns immediately.
The table in question has 40k rows with 2 columns each.
What can I do to find the issue or speed it up?
EDIT: I think I found the issue; the slowdown only occurs in an interpreter that has some threading.Thread running... I wonder if the issue is GIL-related... I will have to see what to do about it...
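One thing worth trying, given that fetchall() on the raw cursor returns immediately: materialize the whole result set inside read_cache() in a single call, so the generator isn't stepping through the cursor row by row while other threads run. A minimal sketch, assuming the same module-level cur and a sqlite3.Row row factory as in the question:
def read_cache(table):
    if not table.isidentifier():  # same identifier check as above
        raise ValueError(f"Invalid table name '{table}'")
    # one C-level fetch instead of yielding row by row from the cursor,
    # which gives other threads far fewer chances to interleave
    rows = cur.execute(f"SELECT * FROM {table};").fetchall()
    for record in rows:
        yield {**record}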

Drop Temporary Table after execution of function

I am executing a self-written PostgreSQL function several times in a loop from Python. I am using the psycopg2 framework to do this.
The function I wrote has the following structure:
CREATE OR REPLACE FUNCTION my_func()
RETURNS void AS
$$
BEGIN
    -- create a temporary table that should be deleted after
    -- the function finishes
    -- normally a CREATE TABLE ... would be here
    CREATE TEMPORARY TABLE temp_t
    (
        seq integer,
        ...
    ) ON COMMIT DROP;

    -- now the insert
    INSERT INTO temp_t
    SELECT
        ...
END
$$
LANGUAGE 'plpgsql';
That's basically the Python part:
import time
import psycopg2

conn = psycopg2.connect(host="localhost", user="user", password="...", dbname="some_db")
cur = conn.cursor()
for i in range(1, 11):
    print i
    print time.clock()
    cur.callproc("my_func")
    print time.clock()
cur.close()
conn.close()
The error I get when I run the Python script is:
---> relation "temp_t" already exists
Basically, I want to measure how long it takes to execute the function, so the loop should run several times. Storing the result of the SELECT in a temporary table is supposed to replace the CREATE TABLE ... part that would normally create the output table.
Why doesn't Postgres drop the temporary table after I execute the function from Python?
All the function calls in the loop are performed in a single transaction, so the temporary table is not dropped each time. Setting autocommit should change this behavior:
...
conn = psycopg2.connect(host="localhost", user="user", password="...", dbname="some_db")
conn.autocommit = True
cur = conn.cursor()
for i in range(1, 11):
    ...
Temporary tables are dropped when the session ends. Since your session does not end with the function call, the second function call will try to create the table again. You need to alter your stored function to check whether the temporary table already exists and create it only if it doesn't. This post can help you do so.
Another quick-and-dirty option is to connect and disconnect after each function call:
import time
import psycopg2

for i in range(1, 11):
    conn = psycopg2.connect(host="localhost", user="user", password="...", dbname="some_db")
    cur = conn.cursor()
    print i
    print time.clock()
    cur.callproc("my_func")
    print time.clock()
    cur.close()
    conn.close()
Not nice, but does the trick.
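A related sketch, not from the original answer: committing explicitly after each call also ends the transaction, so ON COMMIT DROP removes temp_t before the next iteration, without enabling autocommit or reconnecting. This mirrors the Python 2 style of the snippets above:
import time
import psycopg2

conn = psycopg2.connect(host="localhost", user="user", password="...", dbname="some_db")
cur = conn.cursor()
for i in range(1, 11):
    print i
    print time.clock()
    cur.callproc("my_func")
    conn.commit()  # ends the transaction; ON COMMIT DROP removes temp_t
    print time.clock()
cur.close()
conn.close()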

Results of previous update returned on mysql select in Python3

I have one script running on a server that updates a list of items in a MySQL database to be processed by another script running on my desktop. The desktop script runs in a loop, processing the list every 5 minutes (the server-side script also runs on a 5-minute cycle). On the first loop, the script retrieves the current list (a basic SELECT operation); on the second cycle, it gets the same (not updated) version of the list; on the third, it gets the list it should have gotten on the second pass. On every pass after the first, the SELECT operation returns the data from the previous UPDATE operation.
def mainFlow():
    activeList=[]
    d=()
    a=()
    b=()
    #cycleStart=datetime.datetime.now()
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id=1")
    d=cur.fetchone()
    DBSV.commit()
    a=d[0]
    b=a[0]
    activeList=ast.literal_eval(a)
    print(activeList)
    buyList=[]
    clearOrders()
    sellDecide()
    if activeList:
        for i in activeList:
            a=buyCalculate(i)
            if a:
                buyList.append(i)
    print ('buy list: ',buyList)
    if buyList:
        buyDecide(buyList)
    cur.close()
    d=()
    a=()
    b=()
    activeList=[]
    print ('+++++++++++++END OF BLOCK+++++++++++++++')

state=True
while state==True:
    cycleStart=datetime.datetime.now()
    mainFlow()
    cycleEnd=datetime.datetime.now()
    wait=300-(cycleEnd-cycleStart).total_seconds()
    print ('wait=: ' +str(wait))
    if wait>0:
        time.sleep(wait)
As you can see, I am re-initializing all my variables, I am closing my cursor, and I am doing a commit() operation that is supposed to solve this sort of problem. I have tried plain cursors and cursors with buffered set to True and False, always with the same result.
When I run the exact same SELECT query from MySQL Workbench, the results returned are fine.
Baffled, and stuck on this for 2 days.
You're performing your COMMIT before your UPDATE/INSERT/DELETE transactions.
Though a SELECT statement is, theoretically, DML, it differs from INSERT, UPDATE and DELETE in that it doesn't modify the data within the database. If you want to see data that has been changed within another session, you must COMMIT only after it has been changed. This is partially exacerbated by you closing the cursor after each loop.
You've gone too far in trying to solve this problem; there's no need to reset everything within the mainFlow() method (and I can't see a need for most of the variables):
def mainFlow():
    buyList = []
    cur = DBSV.cursor(buffered=True)
    cur.execute("SELECT list FROM active_list WHERE id = 1")
    activeList = cur.fetchone()[0]
    activeList = ast.literal_eval(activeList)
    clearOrders()
    sellDecide()
    for i in activeList:
        a = buyCalculate(i)
        if a:
            buyList.append(i)
    if buyList:
        buyDecide(buyList)
    DBSV.commit()
    cur.close()

while True:
    cycleStart = datetime.datetime.now()
    mainFlow()
    cycleEnd = datetime.datetime.now()
    wait = 300 - (cycleEnd - cycleStart).total_seconds()
    if wait > 0:
        time.sleep(wait)
I've removed a fair amount of unnecessary code (and added spaces), removed the reuse of variable names for different things, and removed declarations of variables that are overwritten immediately. This still isn't very OO, though...
As we don't have detailed knowledge of exactly what clearOrders(), sellDecide() and buyCalculate() do, you might want to double-check this yourself.

psycopg2 occasionally returns null

So I'm using psycopg2, I have a simple table:
CREATE TABLE IF NOT EXISTS feed_cache (
    feed_id int REFERENCES feeds(id) UNIQUE,
    feed_cache text NOT NULL,
    expire_date timestamp --without time zone
);
I'm calling the following method and query:
@staticmethod
def get_feed_cache(conn, feed_id):
    c = conn.cursor()
    try:
        sql = 'SELECT feed_cache FROM feed_cache WHERE feed_id=%s AND localtimestamp <= expire_date;'
        c.execute(sql, (feed_id,))
        result = c.fetchone()
        if result:
            conn.commit()
            return result[0]
        else:
            print 'DBSELECT.get_feed_cache: %s' % result
            print 'sql: %s' % (c.mogrify(sql, (feed_id,)))
    except:
        conn.rollback()
        raise
    finally:
        c.close()
    return None
I've added the else statement to output the exact sql and result that is being executed and returned.
The get_feed_cache() method is called from a database connection thread pool. When it is called "slowishly" (~1/sec or less), the result is returned as expected; however, when called concurrently, it will occasionally return None. I have tried multiple ways of writing this query and method.
Some observations:
If I remove 'AND localtimestamp <= expire_date' from the query, the query ALWAYS returns a result.
Executing the query rapidly in serial in psql always returns a result.
After reading about the fetch*() methods of psycopg's cursor class, which note that results are cached for the cursor, I'm assuming that the cache is not shared between different cursors. http://initd.org/psycopg/docs/faq.html#best-practices
I have tried using postgresql's now() and current_timestamp functions with the same results. (I am aware of the timezone aspect of now() & current_timestamp)
Conditions to note:
There will NEVER be a case where there is not a feed_cache value for a provided feed_id.
There will NEVER be a case where any value in the feed_cache table is NULL
While testing I have completely disabled any & all writes to this table
I have set the expire_date to be sufficiently far in the future for all values such that the expression 'AND localtimestamp <= expire_date' will always be true.
Here is a copy & pasted output of it returning None:
DBSELECT.get_feed_cache: None
sql: SELECT feed_cache FROM feed_cache WHERE feed_id=5 AND localtimestamp < expire_date;
Well, that's pretty much it; I'm not sure what's going on. Maybe I'm making some really dumb mistake and I just don't notice it! My current guess is that it has something to do with psycopg2 and perhaps the way it caches results between cursors. If the cursors DO share the cache and the queries happen near-simultaneously, then it could be that the first cursor fetches the result, the second cursor sees there is a cache of the same query so it does not execute, then the first cursor closes and deletes the cache, and the second cursor tries to fetch a now null/None cache.*
That said, psycopg2 states that it is thread-safe for read-only queries, so unless I'm misinterpreting their implementation of thread safety, this shouldn't be the case.
Thank you for your time!
*After adding a thread lock around get_feed_cache(), acquired before creating the cursor and released before returning, I still occasionally get a None result.
I think this might have to do with the fact that the time stamps returned by localtimestamp or current_timestamp are fixed when the transaction starts, not when you run the statement. And psycopg manages the transactions behind your back to some degree. So you might be getting a slightly older time stamp.
You could debug this by setting log_statement = all in your server and then observing when the BEGIN statements are executed relative to your queries.
You might want to look into using a function such as clock_timestamp(), which advances during a transaction rather than staying fixed at its start. See http://www.postgresql.org/docs/current/static/functions-datetime.html.
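For illustration, the only change to the query in get_feed_cache() would be swapping localtimestamp for clock_timestamp(); a sketch, reusing the same cursor c and feed_id parameter as in the question:
# inside get_feed_cache(), replacing the original sql string
sql = ('SELECT feed_cache FROM feed_cache '
       'WHERE feed_id=%s AND clock_timestamp() <= expire_date;')
# clock_timestamp() is evaluated when the statement runs, not at transaction
# start; it returns timestamptz, so the comparison against the timestamp
# column uses the session time zone
c.execute(sql, (feed_id,))
result = c.fetchone()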

New rows not showing up after SQL INSERT & "commit" with Python and SQL

I made a loop in Python that calls itself to repeatedly check for new entries in a database. On first execution, all affected rows are shown fine. Meanwhile, I add more rows into the database. On the next query in my loop, the new rows are not shown.
This is my query-loop:
def loop():
    global mysqlconfig # username, passwd...
    tbd=[] # this is where I save the result
    conn = MySQLdb.connect(**mysqlconfig)
    conn.autocommit(True)
    c = conn.cursor()
    c.execute("SELECT id, message FROM tasks WHERE date <= '%s' AND done = 0;" % now.isoformat(' '))
    conn.commit()
    tbd = c.fetchall()
    print tbd
    c.close()
    conn.close()
    time.sleep(5)
    loop()

loop()
This is the SQL part of my Python insertion-script:
conn = MySQLdb.connect(**mysqlconfig)
conn.autocommit(1)
c = conn.cursor()
c.execute("INSERT INTO tasks (date, message) VALUES ('{0}', '{1}');".format("2012-10-28 23:50", "test"))
conn.commit()
id = c.lastrowid
c.close()
conn.close()
I tried SQLite, I tried Oracle's MySQL connector, and I tried MySQLdb on Windows and Linux systems, and all had the same problem. I looked through many, many threads on Stack Overflow that recommended turning on autocommit or calling commit() after an SQL statement (ex. one, two, three), which I tried without success.
When I added data with HeidiSQL to my database it showed up in the loop query, but I don't really know why this is. Rows inserted with mysql-client on Linux and my Python insertion script never show up until I restart my loop script.
I don't know if it's the fact that I open 2 connections, each in their own script, but I close every connection and every cursor when I'm done with them.
The problem could be with your variable now. I don't see anywhere in the loop that it is being reset.
I'd probably use the mysql NOW() function:
c.execute("SELECT id, message FROM tasks WHERE date <= NOW() AND done = 0;")
It looks like the time you are inserting into the database is a time in the future. I don't think your issue is with your database connection; I think it's something to do with the queries you are running.
