Python MySQLdb / MySQL INSERT IGNORE & Checking if Ignored

I understand that the fastest way to check if a row exists isn't even to check, but to use an INSERT IGNORE when inserting new data. This is most excellent for my application. However, I need to be able to check if the insert was ignored (or conversely, if the row was actually inserted).
I could use a try/except, but that's not very elegant. I was hoping someone might have a cleaner, more elegant solution.

Naturally, a final search after I post the question yields the result.
mysql - after insert ignore get primary key
However, this still requires a second trip to the database. I would love to see if there's a clean pythonic way to do this with a single query.
query = "INSERT IGNORE ..."
cursor.execute(query)
if cursor.lastrowid == 0:
    # Last row was ignored (duplicate key)
    pass  # handle the ignored row here
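This runs an INSERT IGNORE query; if the insert is ignored (duplicate), lastrowid will be 0. Note that lastrowid is also 0 when the table has no AUTO_INCREMENT column, so in that case check cursor.rowcount instead, which is 0 when the row was ignored and 1 when it was inserted.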
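A slightly more general sketch relies only on the DB-API `cursor.rowcount` attribute, which reports how many rows the last statement affected. To keep it runnable without a MySQL server, it swaps in the standard-library `sqlite3` module and SQLite's `INSERT OR IGNORE` syntax; the `users` table, `email` column and `insert_if_new` helper are hypothetical. With MySQLdb you would pass `INSERT IGNORE INTO ...` with `%s` placeholders instead.

```python
import sqlite3


def insert_if_new(cursor, sql, params):
    """Run an INSERT-or-ignore statement and report whether a row was added.

    Hypothetical helper: uses only the DB-API ``rowcount`` attribute, which is
    1 when the row was inserted and 0 when the insert was ignored.
    """
    cursor.execute(sql, params)
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Hypothetical table; the UNIQUE constraint is what makes duplicates get ignored.
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# MySQLdb equivalent: "INSERT IGNORE INTO users (email) VALUES (%s)"
sql = "INSERT OR IGNORE INTO users (email) VALUES (?)"
first = insert_if_new(cur, sql, ("a@example.com",))
second = insert_if_new(cur, sql, ("a@example.com",))
conn.commit()
print(first, second)  # True False
```

Unlike lastrowid, this check does not depend on the table having an auto-increment key.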

Related

Is it possible to NOT add data into a database when a certain condition is met? Sqlite3

I'm trying to NOT add data to the database if a certain condition is met.
Is it possible, and if so, what's the syntax?
import sqlite3
text = input('> ')
def DO_NOT_ADD_DATA():
    con = sqlite3.connect('my db file.db')
    cursor = con.cursor()
    if "a" in text():
        print("Uh Oh")
        # I want to NOT add data in this part, it still gets added but it does print out Uh Oh
    else:
        cursor.execute("INSERT INTO table VALUES ('value1', 'value2')")
        con.commit()
    con.close()
Yes, it is quite possible, and there are multiple ways of doing it.
You could do it as in your example (once the syntax errors are fixed), where the Python script can perform some arbitrarily complex evaluation of whether to perform the operation.
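For reference, a corrected version of the question's example might look like the sketch below. It assumes `text` is the string returned by `input()` (so it is tested with `in`, not called), passes it in as an argument, and uses a hypothetical table named `entries` because `table` is a reserved word in SQL. The function name `add_unless_contains_a` is also hypothetical, and an in-memory database keeps it self-contained.

```python
import sqlite3


def add_unless_contains_a(con, text):
    """Insert the example row only when ``text`` does not contain "a"."""
    cursor = con.cursor()
    if "a" in text:  # text is a str, so test membership rather than calling it
        print("Uh Oh")
        return False  # this branch never reaches the INSERT
    # "entries" is a hypothetical table name; "table" is a reserved word in SQL
    cursor.execute("INSERT INTO entries VALUES (?, ?)", ("value1", "value2"))
    con.commit()
    return True


con = sqlite3.connect(":memory:")  # the question uses 'my db file.db'
con.execute("CREATE TABLE entries (col1 TEXT, col2 TEXT)")
added_bad = add_unless_contains_a(con, "banana")  # prints "Uh Oh", inserts nothing
added_ok = add_unless_contains_a(con, "hello")    # inserts one row
row_count = con.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
print(row_count)  # 1
con.close()
```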
If you instead want to avoid inserting duplicates, you probably should not check for them in Python, as you can run into race conditions: e.g. you query the database to see whether entry 'a' already exists, it doesn't, but another process sneaks the entry in between your check and your insert.
In these cases you can build your database so that it always upholds certain constraints. Here you could put a UNIQUE constraint on the column; if you then attempt to insert a duplicate, the database raises an error (sqlite3.IntegrityError in Python), and you can react accordingly.
See e.g. https://sqlite.org/lang_createtable.html, https://sqlite.org/syntax/column-constraint.html, https://sqlite.org/syntax/table-constraint.html.
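The constraint-based approach might look like this with Python's sqlite3 module. The `entries` table and the `try_add` helper are hypothetical; the point is that the duplicate check happens inside the database, which raises `sqlite3.IntegrityError` on a violation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: the UNIQUE constraint makes the database reject duplicates,
# so there is no window between a check and an insert for another process to use.
con.execute("CREATE TABLE entries (name TEXT UNIQUE)")


def try_add(con, name):
    """Return True if the row was inserted, False if it violated the constraint."""
    try:
        with con:  # commits on success, rolls back and re-raises on error
            con.execute("INSERT INTO entries (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        # Raised for the UNIQUE violation; react however the application needs.
        return False


results = [try_add(con, n) for n in ["a", "b", "a"]]
print(results)  # [True, True, False]
```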
Whether you want to do one or the other really, really depends on what you actually want to achieve.
(Note: The race conditions could be prevented by using transactions, and sometimes transactions and locking rows/tables/databases is preferred over using constraints in the database schema. It all really depends.)
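If you go the transaction route instead, a minimal sqlite3 sketch could use `BEGIN IMMEDIATE`, which takes SQLite's write lock at the start of the transaction so no other connection can insert between the SELECT and the INSERT. The `entries` table and `add_if_absent` helper are hypothetical, and the in-memory database has no competing connections, so this only shows the shape of the code.

```python
import sqlite3

# isolation_level=None stops the sqlite3 module from opening transactions
# implicitly, so the code below controls BEGIN/COMMIT itself.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE entries (name TEXT)")  # deliberately no UNIQUE constraint


def add_if_absent(con, name):
    """Check-then-insert inside one write transaction; True if a row was added."""
    # BEGIN IMMEDIATE acquires the write lock now, so no other connection can
    # insert the same name between the SELECT and the INSERT below.
    con.execute("BEGIN IMMEDIATE")
    try:
        found = con.execute(
            "SELECT 1 FROM entries WHERE name = ?", (name,)
        ).fetchone()
        if found is None:
            con.execute("INSERT INTO entries (name) VALUES (?)", (name,))
        con.execute("COMMIT")
    except Exception:
        con.execute("ROLLBACK")
        raise
    return found is None


first = add_if_absent(con, "a")
again = add_if_absent(con, "a")
print(first, again)  # True False
```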

SQLite3 if exists add else increment

I'm trying to write a query for sqlite3 that checks whether a discordID exists in my database: if it exists, it should increment the associated count; if not, it should add a new row with that discordID and a count of 1.
crs.execute("INSERT OR IGNORE INTO {0} (discordID,count) VALUES ({1},1) UPDATE {0} SET count = count + 1 WHERE discordID = {1};".format(tableName,user))
I tried this query (where user is an input discordID), but I keep getting the error:
sqlite3.OperationalError: near "UPDATE": syntax error
and I would like to know why this happening and how it can be fixed or if there's a better way to be doing this.
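What you are looking for is an UPSERT (supported since SQLite 3.24.0). This lets you specify an ON CONFLICT clause that is executed if the INSERT would violate a constraint; the conflict target, here discordID, must have a PRIMARY KEY or UNIQUE constraint.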
Applied to your query, it should look something like this:
crs.execute("INSERT INTO {0} (discordID,count) VALUES ({1},1) ON CONFLICT(discordID) DO UPDATE SET count = count + 1;".format(tableName,user))
On a side note: you should avoid inserting user-generated content directly into your query strings. If you need to store user-generated input, use prepared statements (? placeholders in sqlite3); otherwise you'll be vulnerable to SQL injection.
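Following that advice, a parameterized version of the UPSERT could look like this sketch. The `counts` table and `bump` helper are hypothetical; the discordID value is bound with a `?` placeholder, while the table name, which cannot be bound as a parameter, is assumed to come from trusted code. It needs SQLite 3.24.0 or newer (`sqlite3.sqlite_version` shows the bundled version).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# ON CONFLICT(discordID) requires a PRIMARY KEY or UNIQUE constraint on discordID.
con.execute(
    "CREATE TABLE counts (discordID INTEGER PRIMARY KEY, count INTEGER NOT NULL)"
)


def bump(con, table_name, user):
    """Insert the user with count 1, or add 1 to the existing count."""
    # table_name is formatted in (identifiers can't be bound), so it must be
    # trusted; the user value is bound safely with a ? placeholder.
    con.execute(
        f"INSERT INTO {table_name} (discordID, count) VALUES (?, 1) "
        "ON CONFLICT(discordID) DO UPDATE SET count = count + 1",
        (user,),
    )
    con.commit()


for uid in (42, 42, 7):
    bump(con, "counts", uid)
rows = con.execute("SELECT discordID, count FROM counts ORDER BY discordID").fetchall()
print(rows)  # [(7, 1), (42, 2)]
```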

SQLAlchemy - bulk insert ignore Duplicate / Unique

Using SQLAlchemy with a large dataset, I would like to insert all rows using something efficient like session.add_all() followed by session.commit(). I am looking for a way to ignore inserting any rows which raise duplicate / unique key errors. The problem is that these errors only come up on the session.commit() call, so there is no way to fail that specific row and move on to the next.
The closest question I have seen is here: SQLAlchemy - bulk insert ignore: "Duplicate entry" ; however, the accepted answer proposes not using the bulk method and committing after every single row insert, which is extremely slow and causes huge amounts of I/O, so I am looking for a better solution.
Indeed.
Same issue here. Performance seems to have been forgotten, which is especially a problem when you have a remote DB.
What I always do is work around it in Python by deduplicating the rows first with a dictionary (or list) of values already seen. With a dictionary, the trick is to use the unique value as both key and value.
i.e.
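from sqlalchemy.orm import sessionmaker

myEmailAddressesDict = {}
myEmailList = []
for emailAddress in allEmailAddresses:
    # the dict key must be the value that has to be unique (a set works too)
    if emailAddress not in myEmailAddressesDict:
        # not seen yet in this batch, so it can be added
        myEmailList.append(emailAddress)
        myEmailAddressesDict[emailAddress] = emailAddress
Session = sessionmaker(bind=self.engine)
mySession = Session()  # sessionmaker() returns a factory; call it to get a session
try:
    mySession.add_all(myEmailList)
    mySession.commit()
except Exception as e:
    mySession.rollback()  # keep the session usable after a failed commit
    print("Add exception: ", str(e))
mySession.close()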
It's not a fix for the actual problem but a workaround for the moment. It only removes duplicates within the batch itself, so it relies on the DB having been cleared beforehand or starting out empty. If the table already contains some of these rows, the commit will still fail.
For this we need something like a parameter in SQLAlchemy to ignore duplicates in add_all(), or SQLAlchemy should provide a merge_all().
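One way to lift the empty-table limitation, sketched here with the standard-library sqlite3 module standing in for the SQLAlchemy session, is to seed the set of seen values with the ones already stored before filtering the batch. The `emails` table and `insert_new_emails` helper are hypothetical. Loading every existing key only makes sense for tables that fit comfortably in memory, and a concurrent writer could still slip in a duplicate, so the UNIQUE constraint is still needed. With SQLAlchemy, the filtered list is what you would pass to add_all().

```python
import sqlite3


def insert_new_emails(con, emails):
    """Insert addresses that are neither repeated in ``emails`` nor already stored."""
    # Start from the values already in the table, not from an empty dict.
    existing = {row[0] for row in con.execute("SELECT address FROM emails")}
    to_insert = []
    for address in emails:
        if address not in existing:
            existing.add(address)  # also catches duplicates inside the batch
            to_insert.append((address,))
    with con:  # one transaction for the whole batch
        con.executemany("INSERT INTO emails (address) VALUES (?)", to_insert)
    return len(to_insert)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emails (address TEXT UNIQUE)")
con.execute("INSERT INTO emails VALUES ('old@example.com')")
con.commit()
added = insert_new_emails(
    con, ["old@example.com", "new@example.com", "new@example.com"]
)
count = con.execute("SELECT COUNT(*) FROM emails").fetchone()[0]
print(added, count)  # 1 2
```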

MySQLdb Python prevent duplicate and optimize multiple inserts

I wrote this python script to import a specific xls file into mysql. It works fine but if it's run twice on the same data it will create duplicate entries. I'm pretty sure I need to use MySQL JOIN but I'm not clear on how to do that. Also is executemany() going to have the same overhead as doing inserts in a loop? I'm obviously trying to avoid that.
Here's the code in question...
import MySQLdb

mailing_list = {}
rows = []
for row in range(sheet.nrows):
    # name is in the 0th col. email is the 4th col.
    name = sheet.cell(row, 0).value
    email = sheet.cell(row, 4).value
    if name and email:
        mailing_list[name.lstrip()] = email.strip()
for n, e in sorted(mailing_list.items()):
    rows.append((n, e))
db = MySQLdb.connect(host=host, user=user, db=dbname, passwd=pwd)
cursor = db.cursor()
cursor.executemany("""
    INSERT IGNORE INTO mailing_list (name, email) VALUES (%s,%s)""", rows)
db.commit()  # MySQLdb does not autocommit by default
CLARIFICATION...
I read here that...
To be sure, executemany() is effectively the same as simple iteration. However, it is typically faster. It provides an optimized means of affecting INSERT and REPLACE across multiple rows.
Also, I took Unode's suggestion and used a UNIQUE constraint. However, the IGNORE keyword suits me better than ON DUPLICATE KEY UPDATE because I want duplicates to fail silently.
TL;DR
1. What's the best way to prevent duplicate inserts?
ANSWER 1: A UNIQUE constraint on the column, combined with INSERT IGNORE to fail silently, or with ON DUPLICATE KEY UPDATE to update the existing row (e.g. increment a counter) instead.
2. Is executemany() as expensive as INSERT in a loop?
Unode says it's not, but my research tells me otherwise. I would like a definitive answer.
3. Is this the best way, or is it going to be really slow with bigger tables, and how would I test to be sure?
1 - What's the best way to prevent duplicate inserts?
Depending on what "preventing" means in your case, you have two strategies and one requirement.
The requirement is that you add a UNIQUE constraint on the column/columns that you want to be unique. This alone will cause an error if insertion of a duplicate entry is attempted. However, since you are using executemany, a plain INSERT will abort on the first duplicate, and depending on the storage engine either none or only part of the batch will have been inserted.
Then, as a strategy, you can use either:
An initial filter step that runs a SELECT statement first. This means running one SELECT statement per item in your rows to check whether it already exists. This strategy works but is inefficient (and another client could still insert the same row between your SELECT and your INSERT).
Using ON DUPLICATE KEY UPDATE. This automatically triggers an update if the data already exists. For more information refer to the official documentation.
2 - Is executemany() as expensive as INSERT in a loop?
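No. For INSERT and REPLACE statements, MySQLdb's executemany() rewrites the batch into a single multi-row INSERT ... VALUES (...), (...) statement (split into several if it gets very long), while a for loop sends as many queries as there are elements in your rows.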
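For question 3, one way to test this yourself is a small timing harness like the sketch below. It uses the standard-library sqlite3 module and an in-memory database so it runs anywhere; the table mirrors the question's mailing_list, but the generated rows and the `time_inserts` helper are made up. Note that sqlite3's executemany() reuses one prepared statement rather than building a multi-row INSERT the way MySQLdb does, and against a MySQL server the loop also pays a network round trip per row, so point the same harness at your real server (INSERT IGNORE with %s placeholders) before drawing conclusions.

```python
import sqlite3
import time


def time_inserts(rows):
    """Compare executemany() with a per-row execute() loop on identical data."""
    results = {}
    sql = "INSERT OR IGNORE INTO mailing_list (name, email) VALUES (?, ?)"
    for label in ("executemany", "loop"):
        con = sqlite3.connect(":memory:")  # fresh table for each run
        con.execute("CREATE TABLE mailing_list (name TEXT UNIQUE, email TEXT)")
        start = time.perf_counter()
        if label == "executemany":
            con.executemany(sql, rows)
        else:
            for row in rows:
                con.execute(sql, row)
        con.commit()
        elapsed = time.perf_counter() - start
        count = con.execute("SELECT COUNT(*) FROM mailing_list").fetchone()[0]
        results[label] = (elapsed, count)
        con.close()
    return results


rows = [(f"name{i}", f"user{i}@example.com") for i in range(10_000)]
results = time_inserts(rows)
for label, (elapsed, count) in results.items():
    print(f"{label}: {count} rows in {elapsed:.3f}s")
```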

Continue loading after IntegrityError

In Python, I am populating a SQLite database using executemany(), so I can import tens of thousands of rows of data at once. My data is a list of tuples. I had my database set up with the primary keys where I wanted them.
The problem I ran into was that primary key violations raise an IntegrityError. If I handle the exception, my script stops importing at the primary key conflict.
from sqlite3 import IntegrityError

try:
    self.curs.executemany("INSERT into towers values (NULL,?,?,?,?)", self.insertList)
except IntegrityError:
    print("Primary key error")
conn.commit()
So my question is: in Python, using executemany(), can I:
1. Capture the values that violate the primary key?
2. Continue loading data after I get my primary key errors.
I get why it doesn't continue to load, because after the exception I commit the data to the database. However, I don't know how to continue where I left off.
Unfortunately I cannot copy and paste all the code on this network; any help would be greatly appreciated. Right now I have no PKs set as a workaround...
To answer (2) first, if you want to continue loading after you get an error, it's a simple fix on the SQL side:
INSERT OR IGNORE INTO towers VALUES (NULL,?,?,?,?)
This will successfully insert any rows that don't have any violations, and gracefully ignore the conflicts. Please do note however that the IGNORE clause will still fail on Foreign Key violations.
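A minimal sketch of that fix with executemany(), assuming a hypothetical towers schema with a UNIQUE name column (the real columns aren't shown in the question). Because sqlite3 sums the per-row changes into cursor.rowcount for executemany(), you can at least tell how many rows were inserted and how many were ignored, even if not which ones.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: auto-assigned id plus a UNIQUE tower name to conflict on.
con.execute(
    "CREATE TABLE towers (id INTEGER PRIMARY KEY, name TEXT UNIQUE, "
    "lat REAL, lon REAL, height REAL)"
)
insert_list = [("T1", 1.0, 2.0, 30.0), ("T2", 3.0, 4.0, 45.0), ("T1", 5.0, 6.0, 99.0)]

cur = con.cursor()
cur.executemany("INSERT OR IGNORE INTO towers VALUES (NULL,?,?,?,?)", insert_list)
con.commit()
# For executemany(), rowcount is the sum of per-row changes; ignored rows add 0.
inserted = cur.rowcount
print(inserted, len(insert_list) - inserted)  # 2 1
```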
Another conflict resolution clause that may fit your case is INSERT OR REPLACE INTO ..., which deletes the existing conflicting row before inserting the new one. I strongly recommend reading the SQLite docs for more information on conflicts and conflict resolution.
As far as I know you cannot do both (1) and (2) simultaneously in an efficient way. You could possibly create a trigger to fire before insertions that can capture conflicting rows but this will impose a lot of unnecessary overhead on all of your insertions. (Someone please let me know if you can do this in a smarter way.) Therefore I would recommend you consider whether you truly need to capture the values of the conflicting rows or whether a redesign of your schema is required, if possible/applicable.
You could use lastrowid to get the point where you stopped:
http://docs.python.org/library/sqlite3.html#sqlite3.Cursor.lastrowid
If you use it, however, you can't use executemany.
Use a for loop to iterate through the list and use execute instead of executemany. Put the try inside the for loop so that execution continues after an exception. Something like this:
from sqlite3 import IntegrityError

for it in self.insertList:
    try:
        self.curs.execute("INSERT into towers values (NULL,?,?,?,?)", it)
    except IntegrityError:
        # here you could insert the items that were rejected into a temporary table
        # without constraints, for later use (question 1)
        pass
conn.commit()
You can even count how many items of the list were really inserted.
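For example, a sketch that counts the inserted rows and keeps the rejected ones in a Python list (instead of a temporary table) could look like this. The towers schema and the sample rows are hypothetical. It works because, in SQLite, a constraint violation only aborts the failing statement, so the rows inserted before it stay in the open transaction until commit().

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical schema: auto-assigned id plus a UNIQUE tower name to conflict on.
con.execute(
    "CREATE TABLE towers (id INTEGER PRIMARY KEY, name TEXT UNIQUE, "
    "lat REAL, lon REAL, height REAL)"
)
insert_list = [("T1", 1.0, 2.0, 30.0), ("T2", 3.0, 4.0, 45.0), ("T1", 5.0, 6.0, 99.0)]

inserted = 0
rejected = []  # rows that violated a constraint (question 1)
for item in insert_list:
    try:
        con.execute("INSERT INTO towers VALUES (NULL,?,?,?,?)", item)
        inserted += 1
    except sqlite3.IntegrityError:
        rejected.append(item)  # keep going with the next row (question 2)
con.commit()
print(inserted, rejected)  # 2 [('T1', 5.0, 6.0, 99.0)]
```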
