sqlite3 OperationalError when doing many commits rapidly - python

I get
sqlite3.OperationalError: SQL logic error or missing database
when I run an application I've been working on. What follows is a narrowed-down but complete sample that exhibits the problem for me. This sample uses two tables: one to store users and one to record whether user information is up-to-date in an external directory system. (As you can imagine, the tables are a fair bit longer in my real application.) The sample creates a bunch of random users, and then goes through a list of (random) users and adds them to the second table.
#!/usr/bin/env python
import sqlite3
import random

def random_username():
    # Returns one of 10 000 four-letter placeholders for a username
    seq = 'abcdefghij'
    return random.choice(seq) + random.choice(seq) + \
           random.choice(seq) + random.choice(seq)

connection = sqlite3.connect("test.sqlite")
connection.execute('''CREATE TABLE IF NOT EXISTS "users" (
    "entry_id" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL ,
    "user_id" INTEGER NOT NULL ,
    "obfuscated_name" TEXT NOT NULL)''')
connection.execute('''CREATE TABLE IF NOT EXISTS "dir_x_user" (
    "user_id" INTEGER PRIMARY KEY NOT NULL)''')

# Create a bunch of random users
random.seed(0)  # get the same results every time
for i in xrange(1500):
    connection.execute('''INSERT INTO users
        (user_id, obfuscated_name) VALUES (?, ?)''',
        (i, random_username()))
connection.commit()

#random.seed()
for i in xrange(4000):
    username = random_username()
    result = connection.execute(
        'SELECT user_id FROM users WHERE obfuscated_name = ?',
        (username, ))
    row = result.fetchone()
    if row is not None:
        user_id = row[0]
        print " %4d %s" % (user_id, username)
        connection.execute(
            'INSERT OR IGNORE INTO dir_x_user (user_id) VALUES(?)',
            (user_id, ))
    else:
        print " ? %s" % username
    if i % 10 == 0:
        print "i = %s; committing" % i
        connection.commit()

connection.commit()
Of particular note is the line near the end that says,
if i % 10 == 0:
In the real application, I'm querying the data from a network resource, and want to commit the users every now and then. Changing that line changes when the error occurs; it seems that when I commit, there is a non-zero chance of the OperationalError. It seems to be somewhat related to the data I'm putting in the database, but I can't determine what the problem is.
Most of the time if I read all the data and then commit only once, an error does not occur. [Yes, there is an obvious work-around there, but a latent problem remains.]
Here is the end of a sample run on my computer:
? cgha
i = 530; committing
? gegh
? aabd
? efhe
? jhji
? hejd
? biei
? eiaa
? eiib
? bgbf
759 bedd
i = 540; committing
Traceback (most recent call last):
File "sqlitetest.py", line 46, in <module>
connection.commit()
sqlite3.OperationalError: SQL logic error or missing database
I'm using Mac OS X 10.5.8 with the built-in Python 2.5.1 and Sqlite3 3.4.0.

As the "lite" part of the name implies, SQLite is meant for lightweight database use, not massively scalable concurrency like some of the Big Boys. It seems to me that what's happening here is that SQLite hasn't finished writing the last change you requested by the time you make another request.
So, some options I see for you are:
You could spend a lot of time learning about file locking, concurrency, and transactions in sqlite3
You could add some more error-proofing simply by having your app retry the action after the first failure, as suggested by some on this Reddit post, which includes tips such as "If the code has an effective mechanism for simply trying again, most of sqlite's concurrency problems go away" and "Passing isolation_level=None to connect seems to fix it" (see the sketch after this list)
You could switch to using a more scalable database like PostgreSQL
(For my money, #2 or #3 are the way to go.)
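To illustrate option #2, here is a minimal sketch of a retry wrapper around commit. The retry count, delay, and the isolation_level=None variant come from the suggestions above rather than from anything verified against the asker's application, so treat the numbers as placeholders.
import sqlite3
import time

def commit_with_retry(connection, retries=5, delay=0.1):
    # Retry the commit a few times before giving up.
    for attempt in range(retries):
        try:
            connection.commit()
            return
        except sqlite3.OperationalError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # give SQLite a moment to finish its last write

# Alternatively, as suggested in the linked Reddit thread, open the
# connection in autocommit mode so there is no pending transaction to commit:
# connection = sqlite3.connect("test.sqlite", isolation_level=None)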

Related

How can I create a list inside of a "dictionary" with sqlite3?

Instead of using a JSON file to store data, I've decided I wanted to use a database instead. Here is how the data currently looks inside of the JSON file:
{"userID": ["reason 1", "reason 2", "reason 3"]}
I made it so that after a certain amount of time a reason is removed. For example, "reason 2" will be removed after 12 hours of it being added. However, I realised that if I terminate the process and then run it again the reason would just stay there until I manually remove it.
I've decided to use sqlite3 to make a database and have a discord.py task loop to remove it for me. How can I replicate the dictionary inside the database? Here is what I'm thinking at the moment:
import sqlite3

c = sqlite3.connect('file_name.db')
cursor = c.cursor()
cursor.execute("""CREATE TABLE table_name (
    userID text,
    reason blob
)""")
Try the following table to store the reasons:
CREATE TABLE reasons (
    reason_id PRIMARY KEY,
    user_id,
    reason,
    is_visible,
    created_at
)
Then the reasons could be soft deleted for every user by running:
UPDATE reasons
SET is_visible = 0
WHERE created_at + 3600 < CAST( strftime('%s', 'now') AS INT )
The example shows hiding reasons after 1 hour (3600 seconds).
The reasons can be hard deleted later by running the following query:
DELETE FROM reasons
WHERE is_visible = 0
The soft delete comes in handy for verification and getting data back in case of a future bug in the software.
Simply use nested loops to insert each reason into a row of the table.
insert_query = """INSERT INTO table_name (userID, reason) VALUES (?, ?)"""
for user, reasons in json_data.items():
    for reason in reasons:
        cursor.execute(insert_query, (user, reason))
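Putting it together, here is a sketch of the whole flow against the suggested reasons table: migrating the existing JSON data with a created_at timestamp, then soft deleting and hard deleting. The file name file_name.json and the use of Unix timestamps are assumptions made for illustration; reason_id is typed INTEGER here so SQLite assigns it automatically.
import json
import sqlite3
import time

conn = sqlite3.connect('file_name.db')
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS reasons (
    reason_id INTEGER PRIMARY KEY,
    user_id,
    reason,
    is_visible,
    created_at
)""")

# Migrate the old JSON data, stamping each reason with the current time.
with open('file_name.json') as f:  # hypothetical path to the old data
    json_data = json.load(f)

now = int(time.time())
for user, reasons in json_data.items():
    for reason in reasons:
        cur.execute(
            "INSERT INTO reasons (user_id, reason, is_visible, created_at) VALUES (?, ?, 1, ?)",
            (user, reason, now))
conn.commit()

# Soft delete reasons older than an hour, then hard delete the hidden ones.
cur.execute("""UPDATE reasons
               SET is_visible = 0
               WHERE created_at + 3600 < CAST(strftime('%s', 'now') AS INT)""")
cur.execute("DELETE FROM reasons WHERE is_visible = 0")
conn.commit()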

Use of '.format()' vs. '%s' in cursor.execute() for a MySQL JSON field, with Python mysql.connector

My objective is to store a JSON object into a MySQL database field of type json, using the mysql.connector library.
import mysql.connector
import json
jsonData = json.dumps(origin_of_jsonData)
cnx = mysql.connector.connect(**config_defined_elsewhere)
cursor = cnx.cursor()
cursor.execute('CREATE DATABASE dataBase')
cnx.database = 'dataBase'
cursor = cnx.cursor()
cursor.execute('CREATE TABLE table (id_field INT NOT NULL, json_data_field JSON NOT NULL, PRIMARY KEY (id_field))')
Now, the code below WORKS just fine, the focus of my question is the use of '%s':
insert_statement = "INSERT INTO table (id_field, json_data_field) VALUES (%s, %s)"
values_to_insert = (1, jsonData)
cursor.execute(insert_statement, values_to_insert)
My problem with that: I am very strictly adhering to the use of '...{}'.format(aValue) (or f'...{aValue}') when combining variable aValue(s) into a string, thus avoiding the use of %s (whatever my reasons for that, let's not debate them here - but it is how I would like to keep it wherever possible, hence my question).
In any case, I am simply unable, whichever way I try, to create something that stores the jsonData into the mySql dataBase using something that resembles the above structure and uses '...{}'.format() (in whatever shape or form) instead of %s. For example, I have (among many iterations) tried
insert_statement = "INSERT INTO table (id_field, json_data_field) VALUES ({}, {})".format(1, jsonData)
cursor.execute(insert_statement)
but no matter how I turn and twist it, I keep getting the following error:
ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[some_content_from_jsonData})]' at line 1
Now my question(s):
1) Is there a way to avoid the use of %s here that I am missing?
2) If not, why? What is it that makes this impossible? Is it the cursor.execute() function, or is it the fact that it is a JSON object, or is it something completely different? Shouldn't {}.format() be able to do everything that %s could do, and more?
First of all: NEVER DIRECTLY INSERT YOUR DATA INTO YOUR QUERY STRING!
Using %s in a MySQL query string is not the same as using it in a Python string.
In Python, you just format the string, and 'hello %s!' % 'world' becomes 'hello world!'. In SQL, the %s signals parameter insertion, which sends your query and data to the server separately. You are also not bound to this syntax; the Python DB-API specification allows other parameter styles: DB-API parameter styles (PEP 249). Parameter insertion has several advantages over inserting your data directly into the query string:
Prevents SQL injection
Say you have a query to authenticate users by password. You would do that with the following query (of course you would normally salt and hash the password, but that is not the topic of this question):
SELECT 1 FROM users WHERE username='foo' AND password='bar'
The naive way to construct this query would be:
"SELECT 1 FROM users WHERE username='{}' AND password='{}'".format(username, password)
However, what would happen if someone inputs ' OR 1=1 as the password? The formatted query would then become
SELECT 1 FROM users WHERE username='foo' AND password='' OR 1=1
which will always return 1. When using parameter insertion:
execute('SELECT 1 FROM users WHERE username=%s AND password=%s', (username, password))
this can never happen, as the data is sent to the server separately from the query.
Performance
If you run the same query many times with different data, the performance difference between using a formatted query and parameter insertion can be significant. With parameter insertion, the server only has to compile the query once (as it is the same every time) and execute it with different data, but with string formatting, it will have to compile it over and over again.
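As a hedged illustration of the parameter-insertion style with mysql.connector: the sketch below reuses one parameterized INSERT for many rows of made-up data, so only the parameters change per row. The connection details are placeholders, and the table name from the question is backtick-quoted because table is a reserved word in MySQL.
import json
import mysql.connector

cnx = mysql.connector.connect(user='myuser', password='mypassword',
                              host='127.0.0.1', database='dataBase')
cursor = cnx.cursor()

insert_statement = "INSERT INTO `table` (id_field, json_data_field) VALUES (%s, %s)"
rows = [(i, json.dumps({"value": i})) for i in range(100)]  # made-up sample data

# The statement text stays the same; only the data varies per row.
cursor.executemany(insert_statement, rows)
cnx.commit()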
In addition to what was said above, I would like to add some details that I did not immediately understand, and that other (newbies like me ;)) may also find helpful:
1) "parameter insertion" is meant only for values; it will not work for table names, column names, etc. - for those, Python string substitution works fine in the SQL syntax definition
2) the cursor.execute function requires a tuple to work (as specified here, albeit not immediately clear, at least to me: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html)
EXAMPLE for both in one function:
def checkIfRecordExists(column, table, condition_name, condition_value):
    ...
    sqlSyntax = 'SELECT {} FROM {} WHERE {} = %s'.format(column, table, condition_name)
    cursor.execute(sqlSyntax, (condition_value,))
Note both the use of .format in the initial sql syntax definition and the use of (condition_value,) in the execute function.
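A self-contained variant of the same pattern, as a sketch (the cursor is passed in explicitly here, and the table and column names are only examples borrowed from the login query above):
def record_exists(cursor, column, table, condition_name, condition_value):
    # Identifiers are formatted into the SQL text; the value travels as a parameter.
    sql = 'SELECT {} FROM {} WHERE {} = %s'.format(column, table, condition_name)
    cursor.execute(sql, (condition_value,))
    return cursor.fetchone() is not None

# e.g. record_exists(cursor, 'username', 'users', 'username', 'foo')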

Error (1046, 'No database selected') happened in Python program, but not in mysql workbench

I have encountered a weird problem when using Python to update a record in a MySQL database: error 1046 was thrown by the Python program, but the same MySQL statement worked perfectly fine in MySQL Workbench.
Here is the MySQL statement:
UPDATE resultant_data d
INNER JOIN
(SELECT
uid,
SUBSTRING_INDEX(GROUP_CONCAT(login_type
ORDER BY device_ct DESC), ',', 1) devices
FROM
(SELECT
uid, login_type, COUNT(*) AS device_ct
FROM
login_record l
WHERE
l.ctime > 1451577600
AND l.ctime < 1454256000
GROUP BY uid , login_type
ORDER BY device_ct DESC) a
GROUP BY uid) ct ON d.uid = ct.uid AND d.month_id = 1
SET
d.device = ct.devices
;
My task is to update each user's most used login device during one month into the table resultant_data, based on the login_record table. So step one (innermost query): build a result set that shows uid, login_type, and login count (i.e. device_ct). Step two (the second innermost query): based on device_ct, find the login_type associated with the highest device_ct for each uid. Step three (the update layer): match the uid and update the record in resultant_data.
So does the problem come from the Python, or from the MySQL statement? I suspect the problem is due to the "inner join" (although it works fine in MySQL Workbench). I had a similar problem before, which I solved by rewriting "inner join" as "where uid in (select....)". But for this task, is there a way to rewrite or restructure the statement?
Many thanks.
I don't think your problem is your SQL. Make sure that when you make your database connection in Python you include the database (aka schema) name that the table resides in:
db_connection = mysql.connector.connect(
    host="192.168.1.101",
    user="myusername",
    passwd="mypassword",
    database="database_table_lives_in"
)
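As a side note (a sketch, not something the answer above spells out), mysql.connector also lets you select the schema after connecting, the same way the question's own snippet does with cnx.database:
import mysql.connector

db_connection = mysql.connector.connect(
    host="192.168.1.101",
    user="myusername",
    passwd="mypassword",
)
# Selecting the schema after the fact; equivalent to passing database=... above.
db_connection.database = "database_table_lives_in"
Either way, once a schema is selected the UPDATE from the question should run through a cursor as usual; since mysql.connector does not autocommit by default, remember to call db_connection.commit() afterwards.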

Python, MySQLdb, unable to update row

I know SO is not a "debug my code service" but after checking and retrying for hours either I'm missing something very stupid or maybe there's a bug (or bad compilation) in my MySQLdb module...
Also I've some related question I've put along with the posted code...
def NextDocumentIdToCache():
    if not OpenConnection():
        return 0
    # ...setting string values... #
    cur = connection.cursor(mysql.cursors.DictCursor)
    cur.execute('SELECT * FROM documents WHERE resolutions <> %s AND pdf > 0 AND status = %s AND expire > %s AND locked = %s LIMIT 0,1', (resolutions, status, expireLimit, ''))
    rowsCount = cur.rowcount
    if rowsCount == 0:
        return 0
    row = cur.fetchone()
    id = row['document_id']
Everything ok since now. Connection opens, I get one row and retrieve the correct id that is returned at the end of the function.
Then I need to perform an update operation on the row fetched...
cur.close()
cur = connection.cursor(mysql.cursors.Cursor)
I've closed the cursor and opened a new one. Do I actually need to do this? Or can I reuse the same cursor?
cur.execute("""UPDATE documents SET locked = %s WHERE document_id = %s""", ("worker: rendering pages", id))
This doesn't actually update the row. No exception is raised; it just doesn't work.
Finally the function ends...
cur.close()
return id
Also a couple of questions.
What's the difference between
cur.execute("""UPDATE documents....
and
cur.execute("UPDATE documents....
I've seen both versions around. What's the functional difference between triple double-quotes, double-quotes, and single-quotes?
And finally
If I write the query
cur.execute("""UPDATE documents SET locked = %s WHERE document_id = %d""", ("worker: rendering pages", id))
(note the %d instead of %s) I get an error. But id is a long integer, I've checked. So what's wrong?
Thank you
The change doesn't take effect:
Are you doing a transaction? You'll need to commit() if that's the case. PEP 249, which defines the Python Database API, states that a connection will use transactions even if you have not explicitly enabled them: "Note that if the database supports an auto-commit feature, this must be initially off."
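As a minimal sketch of that, reusing the names from the question (whether enabling autocommit instead is appropriate for your workload is a separate decision):
cur = connection.cursor(mysql.cursors.Cursor)
cur.execute("""UPDATE documents SET locked = %s WHERE document_id = %s""",
            ("worker: rendering pages", id))
connection.commit()  # without this, the UPDATE stays inside an open, uncommitted transaction
cur.close()

# Alternatively, turn autocommit on for the whole connection:
# connection.autocommit(True)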
Quoting
Triple quoted strings are multiline, single quoted strings aren't.
Interpolation
Because this is not actually string interpolation: those placeholders are interpreted by MySQLdb, which handles quoting and type conversion for you. That is also why %d fails; with MySQLdb you use %s for every parameter, whatever its Python type.

How to update records in SQL Alchemy in a Loop

I am trying to use SQLSoup - the SQLAlchemy extension - to update records in a SQL Server 2008 database. I am using pyodbc for the connections. There are a number of issues which make it hard to find a relevant example.
I am reprojecting a geometry field in a very large table (2 million + records), so many of the standard ways of updating fields cannot be used. I need to extract coordinates from the geometry field to text, convert them, and pass them back in. All this is fine, and all the individual pieces are working.
However, I want to execute a SQL UPDATE statement on each row while looping through the records one by one. I assume this places locks on the recordset, or that the connection is in use, because when I use the code below it hangs after successfully updating the first record.
Any advice on how to create a new connection, reuse the existing one, or accomplish this another way is appreciated.
s = select([text("%s as fid" % id_field),
            text("%s.STAsText() as wkt" % geom_field)],
           from_obj=[feature_table])

rs = s.execute()
for row in rs:
    new_wkt = ReprojectFeature(row.wkt)
    update_value = "geometry :: STGeomFromText('%s',%s)" % (new_wkt, "3785")
    update_sql = ("update %s set GEOM3785 = %s where %s = %i" %
                  (full_name, update_value, id_field, row.fid))
    conn = db.connection()
    conn.execute(update_sql)
    conn.close()  #or not - no effect..
Updated working code now looks like this. It works fine on a few records, but hangs on the whole table, so I guess it is reading in too much data.
db = SqlSoup(conn_string)
#create outer query
Session = sessionmaker(autoflush=False, bind=db.engine)
session = Session()
rs = session.execute(s)
for row in rs:
    #create update sql...
    session.execute(update_sql)
session.commit()
I now get connection busy errors.
DBAPIError: (Error) ('HY000', '[HY000] [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt (0) (SQLExecDirectW)')
It looks like this could be a problem with the ODBC driver - http://sourceitsoftware.blogspot.com/2008/06/connection-is-busy-with-results-for.html
Further Update:
On the server using profiler, it shows the select statement then the first update statement are "starting" but neither complete.
If I set the Select statement to return the top 10 rows, then it does complete and the updates run.
SQL: Batch Starting Select...
SQL: Batch Starting Update...
I believe this is an issue with pyodbc and SQL Server drivers. If I remove SQL Alchemy and execute the same SQL with pyodbc it also hangs. Even if I create a new connection object for the updates.
I also tried the SQL Server Native Client 10.0 driver, which is meant to allow MARS - Multiple Active Result Sets - but it made no difference. In the end I have resorted to "paging the results" and updating these batches using pyodbc and SQL (see below), however I thought SQLAlchemy would have been able to do this for me automatically.
Try using a Session.
rs = s.execute() then becomes session.execute(s), and you can replace the last three lines with session.execute(update_sql). I'd also suggest configuring your Session with autocommit off and calling session.commit() at the end.
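A minimal sketch of that suggestion, reusing db, s, ReprojectFeature, full_name and id_field from the question; the rows are fetched up front here so the SELECT's result set is closed before any UPDATE is issued, which avoids holding two active statements on the same connection:
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(autoflush=False, bind=db.engine)  # autocommit is off by default
session = Session()

rows = session.execute(s).fetchall()
for row in rows:
    new_wkt = ReprojectFeature(row.wkt)
    update_value = "geometry :: STGeomFromText('%s',%s)" % (new_wkt, "3785")
    update_sql = ("update %s set GEOM3785 = %s where %s = %i" %
                  (full_name, update_value, id_field, row.fid))
    session.execute(update_sql)

session.commit()  # one commit for the whole batch, as suggested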
Can I suggest that when your process hangs you run sp_who2 on the SQL box and see what is happening. Check for blocked spids and see if you can find anything in the SQL code that suggests what is happening. If you do find a spid that is blocking others you can do a dbcc inputbuffer(*spidid*) and see if that tells you what query it executed. Otherwise you can also attach the SQL Profiler and trace your calls.
In some cases it could also be parallelism on the SQL Server that causes blocking. Unless this is a data warehouse, I suggest turning your Max DOP off (set it to 1). Let me know; when I check this again in the morning, if you still need help I'll be glad to assist.
Until I find another solution I am using a single connection and custom SQL to return sets of records, and updating these in batches. I don't think what I am doing is a particularly unusual case, so I am not sure why I cannot handle multiple result sets simultaneously.
The code below works but is very, very slow..
import pyodbc

cnxn = pyodbc.connect(conn_string, autocommit=True)
cursor = cnxn.cursor()

#get total recs in the database
s = "select count(fid) as count from table"
count = cursor.execute(s).fetchone().count

#choose number of records to update in each iteration
batch_size = 100
counter = 0
for i in range(1, count, batch_size):
    #sql to bring back relevant records in each batch
    s = """SELECT fid, wkt from (select ROW_NUMBER() OVER(ORDER BY FID ASC) AS 'RowNumber'
           ,FID
           ,GEOM29902.STAsText() as wkt
           FROM %s) features
           where RowNumber >= %i and RowNumber <= %i""" % (full_name, i, i + batch_size)
    rs = cursor.execute(s).fetchall()
    for row in rs:
        new_wkt = ReprojectFeature(row.wkt)
        #...create update sql statement for the record
        cursor.execute(update_sql)
        counter += 1
cursor.close()
cnxn.close()
