SQLite: avoid duplicates using INSERT, executemany() and a list of tuples - Python

I have an existing Sqlite table containing qualification data for students, the code periodically checks for new qualifications obtained and inserts them into the table. This causes duplicates.
def insertQualificationData(self, data):
    # Run executemany to insert the data
    self.cursor.executemany(
        """
        INSERT INTO qualification (
            qualificationperson,
            type,
            secondaryreference,
            reference,
            name,
            pass,
            start,
            qualificationband,
            grade,
            time_stamp
        ) VALUES (?,?,?,?,?,?,?,?,?,?)
        """, data
    )
The 'data' variable is a list of tuples. Eg:
('209000010111327', 'WLC', 'G0W915', 'Certificate', 'Child Care and Education', 'P', '12/07/2001', 'PASS', 'Pass', 1648018935)
I want to prevent 'duplicate' values being inserted into the qualifications table; by 'duplicate' I mean that if a row already matches the qualificationperson, reference, name and pass columns, it should not be inserted.
I have seen other answers doing a similar thing, but with named columns from a second table; I am struggling to replicate this using a list of tuples and executemany().

You could add a unique index on those columns:
CREATE UNIQUE INDEX IF NOT EXISTS QData ON qualification (qualificationperson, reference, name, pass)
and then use an INSERT OR IGNORE statement so that a failure of one value to insert does not cause the entire executemany to fail:
INSERT OR IGNORE INTO qualification (...)
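Put together, the two pieces look like this. A minimal runnable sketch, using an in-memory database and a single sample row from the question; the schema here guesses TEXT/INTEGER types for the columns, so adjust to match your real table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE qualification (
        qualificationperson TEXT, type TEXT, secondaryreference TEXT,
        reference TEXT, name TEXT, pass TEXT, start TEXT,
        qualificationband TEXT, grade TEXT, time_stamp INTEGER
    )
""")
# The unique index defines what counts as a 'duplicate'.
cur.execute("""
    CREATE UNIQUE INDEX IF NOT EXISTS QData
    ON qualification (qualificationperson, reference, name, pass)
""")

row = ('209000010111327', 'WLC', 'G0W915', 'Certificate',
       'Child Care and Education', 'P', '12/07/2001', 'PASS', 'Pass',
       1648018935)

# Insert the same row twice; INSERT OR IGNORE silently skips the second
# copy instead of aborting the whole executemany.
cur.executemany(
    "INSERT OR IGNORE INTO qualification VALUES (?,?,?,?,?,?,?,?,?,?)",
    [row, row])
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM qualification").fetchone()[0]
print(count)  # 1
```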

Related

SQLite incorrect number of bindings: insert None if no value available

I have created a python list with 41 columns and 50 rows.
Now I want to insert this into an SQLite database.
When I execute the insert into the database, I get the error message:
sqlite3.ProgrammingError: Incorrect number of bindings supplied. The current statement uses 41, and there are 40 supplied.
Most of the list fields should have data. Perhaps one or two don't have any.
Can I write into the sqlite database with a prompt like:
insert if data available, otherwise write none
Or something like this?
My code is like:
c.execute("""CREATE TABLE IF NOT EXISTS statstable (
    spielid integer PRIMARY KEY,
    41x field descr. (real, integer and text)
    UNIQUE (spielid)
)
""")
c.executemany("INSERT OR REPLACE INTO statstable VALUES (41x ?)", all_data)
Append the appropriate number of None values to the nested lists to make them all 41 elements:
c.executemany("INSERT OR REPLACE INTO statstable VALUES (41x ?)",
              [l + [None] * (41 - len(l)) for l in all_data])
This assumes the missing elements are always at the end of the list of columns. If they can be different columns in each row, I don't see how you can implement any automatic solution.
If the elements of all_data were dictionaries whose keys correspond to column names, you could determine which keys are missing. Then turn it into a list with the None placeholders in the appropriate places for those columns.
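That dictionary approach can be sketched like this. The column names and row values below are placeholders for illustration, not the asker's real 41-column schema; missing keys become None in the correct positions:

```python
# Fixed column order for the table; dict rows may omit some of these keys.
columns = ["spielid", "col_a", "col_b", "col_c"]

rows_as_dicts = [
    {"spielid": 1, "col_a": 3.5, "col_c": "x"},             # col_b missing
    {"spielid": 2, "col_a": 1.0, "col_b": 7, "col_c": "y"},  # complete
]

# dict.get() returns None for absent keys, so every row comes out with
# exactly len(columns) elements, ready for executemany.
all_data = [[d.get(col) for col in columns] for d in rows_as_dicts]
print(all_data)  # [[1, 3.5, None, 'x'], [2, 1.0, 7, 'y']]
```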

Combine two SQLite databases with Python

I have the following code in python to update db where the first column is "id" INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE:
con = lite.connect('test_score.db')
with con:
    cur = con.cursor()
    cur.execute("INSERT INTO scores VALUES (NULL,?,?,?)", (first, last, score))
    item = cur.fetchone()
    con.commit()
    cur.close()
con.close()
I get table "scores" with following data:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
Two different users (userA and userB) copy test_score.db and code to their computer and use it separately.
I get back two db test_score.db but now with different content:
user A test_score.db:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
user B test_score.db:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Chris,Prat,99
5,Tom,Hanks,09
6,Tom,Hanks,15
I was trying to use
insert into AuditRecords select * from toMerge.AuditRecords;
to combine the two databases into one, but it failed because the first column is a unique id. The two databases now share ids that point at different (or the same) data, so the merge fails.
I would like to find unique rows in both db (all values different ignoring id) and merge results to one full db.
Result should be something like this:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
6,Chris,Prat,99
7,Tom,Hanks,09
I can extract each value one by one and compare but want to avoid it as I might have longer rows in the future with more columns.
Sorry if it is obvious and easy questions, I'm still learning. I tried to find the answer but failed, please point me to answer if it already exists somewhere else. Thank you very much for your help.
You need to decide how duplicated rows should be resolved: keep the max score? The min? The first one?
Assuming the table AuditRecords contains all the rows of both user A and user B, you can use GROUP BY to deduplicate rows and an aggregate function to resolve the score:
insert into AuditRecords
select
    id,
    first_name,
    last_name,
    max(score) as score
from
    toMerge.AuditRecords
group by
    id,
    first_name,
    last_name;
For this requirement you should have defined a UNIQUE constraint for the combination of the columns first, last and score:
CREATE TABLE AuditRecords(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    first TEXT,
    last TEXT,
    score INTEGER,
    UNIQUE(first, last, score)
);
Now you can use INSERT OR IGNORE to merge the tables:
INSERT OR IGNORE INTO AuditRecords(first, last, score)
SELECT first, last, score
FROM toMerge.AuditRecords;
Note that you must explicitly list the columns that will receive the values; id is omitted from that list because its value is autoincremented on each insertion.
Another way to do it without defining the UNIQUE constraint is to use EXCEPT:
INSERT INTO AuditRecords(first, last, score)
SELECT first, last, score FROM toMerge.AuditRecords
EXCEPT
SELECT first, last, score FROM AuditRecords
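The EXCEPT approach run end-to-end from Python looks like the sketch below. It recreates the two users' databases from the question's data in temporary files, then ATTACHes one to the other and copies across only the rows not already present; the column names first, last, score are assumptions based on the question's INSERT statement:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
db_a, db_b = os.path.join(tmp, "a.db"), os.path.join(tmp, "b.db")

# Recreate user A's and user B's copies of test_score.db.
for path, rows in [
    (db_a, [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88),
            ("Jim", "Green", 91), ("Tom", "Hanks", 15)]),
    (db_b, [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88),
            ("Chris", "Prat", 99), ("Tom", "Hanks", 9), ("Tom", "Hanks", 15)]),
]:
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE scores(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        first TEXT, last TEXT, score INTEGER)""")
    con.executemany("INSERT INTO scores(first, last, score) VALUES (?,?,?)",
                    rows)
    con.commit()
    con.close()

# Merge B into A, ignoring the autoincrement ids: EXCEPT keeps only the
# (first, last, score) rows of B that A does not already contain.
con = sqlite3.connect(db_a)
con.execute("ATTACH DATABASE ? AS toMerge", (db_b,))
con.execute("""
    INSERT INTO scores(first, last, score)
    SELECT first, last, score FROM toMerge.scores
    EXCEPT
    SELECT first, last, score FROM scores
""")
con.commit()
total = con.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
con.close()
print(total)  # 7 (5 original rows plus Chris Prat 99 and Tom Hanks 9)
```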

Creating a table in MariaDB using a list of column names in Python

I am trying to create a table in mariadb using python. I have all the column names stored in a list as shown below.
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
This is just the sample list. Actual list has 200 items in the list. I am trying to create a table using the above collist elements as columns and the datatype for the columns is VARCHAR.
This is the code I am using to create a table
for p in collist:
    cur.execute('CREATE TABLE IF NOT EXISTS table1 ({} VARCHAR(45))'.format(p))
The above code executes, but only the first element of the list is added as a column to the table; I cannot see the remaining elements. I'd really appreciate some help with this.
You can build the string in 3 parts and then .join() them together. The middle portion joins the column definitions from the original list. This schema doesn't seem particularly healthy, both in the number of columns and in the fact that everything is VARCHAR(45), but that's your decision:
collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
query = ''.join(["CREATE TABLE IF NOT EXISTS table1 (",
                 ' VARCHAR(45), '.join(collist),
                 ' VARCHAR(45))'])
Because we used join, you need to specify the last column type separately (the third item in the list) to correctly close the query.
NOTE: If the input data comes from user input then this would be susceptible to SQL injection since you are just formatting unknown strings in, to be executed. I am assuming the list of column names is internal to your program.
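An equivalent sketch that avoids the special-cased last column is to format each column definition individually and join them with commas. sqlite3 stands in for MariaDB below so the snippet is self-contained; the generated SQL is the same either way:

```python
import sqlite3

collist = ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']

# One "name VARCHAR(45)" fragment per column, joined with commas,
# so no column needs special treatment.
cols = ', '.join('{} VARCHAR(45)'.format(c) for c in collist)
query = 'CREATE TABLE IF NOT EXISTS table1 ({})'.format(cols)
print(query)

con = sqlite3.connect(':memory:')
con.execute(query)

# PRAGMA table_info shows all six columns exist, not just the first.
names = [row[1] for row in con.execute('PRAGMA table_info(table1)')]
print(names)  # ['RR', 'ABPm', 'ABPs', 'ABPd', 'HR', 'SPO']
```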

Python - Sqlite insert tuple without the autoincrement primary key value

I create a table with primary key and autoincrement.
with open('RAND.xml', "rb") as f, sqlite3.connect("race.db") as connection:
    c = connection.cursor()
    c.execute(
        """CREATE TABLE IF NOT EXISTS race(
               RaceID INTEGER PRIMARY KEY AUTOINCREMENT,
               R_Number INT, R_KEY INT, R_NAME TEXT, R_AGE INT,
               R_DIST TEXT, R_CLASS, M_ID INT)""")
I then want to insert a tuple which, of course, has one value fewer than the number of columns, because the first column is the autoincrement.
sql_data = tuple(b)
c.executemany('insert into race values(?,?,?,?,?,?,?)', b)
How do I stop this error?
sqlite3.OperationalError: table race has 8 columns but 7 values were supplied
It's extremely bad practice to assume a specific ordering on the columns. Some DBA might come along and modify the table, breaking your SQL statements. Secondly, an autoincrement value will only be used if you don't specify a value for the field in your INSERT statement - if you give a value, that value will be stored in the new row.
If you amend the code to read
c.executemany('''insert into
                 race(R_Number, R_KEY, R_NAME, R_AGE, R_DIST, R_CLASS, M_ID)
                 values(?,?,?,?,?,?,?)''',
              sql_data)
you should find that everything works as expected.
From the SQLite documentation:
If the column-name list after table-name is omitted then the number of values inserted into each row must be the same as the number of columns in the table.
RaceID is a column in the table, so it is expected to be present when you're doing an INSERT without explicitly naming the columns. You can get the desired behavior (assign RaceID the next autoincrement value) by passing an SQLite NULL value in that column, which in Python is None:
sql_data = tuple((None,) + a for a in b)
c.executemany('insert into race values(?,?,?,?,?,?,?,?)', sql_data)
The above assumes b is a sequence of sequences of parameters for your executemany statement and attempts to prepend None to each sub-sequence. Modify as necessary for your code.
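Both answers can be seen side by side in the sketch below, run against a throwaway in-memory database with a tiny two-row payload standing in for the asker's real data:

```python
import sqlite3

con = sqlite3.connect(':memory:')
c = con.cursor()
c.execute("""CREATE TABLE race(
    RaceID INTEGER PRIMARY KEY AUTOINCREMENT,
    R_Number INT, R_KEY INT, R_NAME TEXT, R_AGE INT,
    R_DIST TEXT, R_CLASS, M_ID INT)""")

b = [(1, 10, 'First', 4, '5f', 'G1', 7),
     (2, 11, 'Second', 5, '6f', 'G2', 7)]

# Approach 1: name the seven non-autoincrement columns explicitly.
c.executemany("""INSERT INTO
    race(R_Number, R_KEY, R_NAME, R_AGE, R_DIST, R_CLASS, M_ID)
    VALUES (?,?,?,?,?,?,?)""", b)

# Approach 2: supply all eight values, passing None so SQLite assigns
# the next RaceID itself.
sql_data = tuple((None,) + a for a in b)
c.executemany('INSERT INTO race VALUES (?,?,?,?,?,?,?,?)', sql_data)

result = c.execute('SELECT COUNT(*), MAX(RaceID) FROM race').fetchone()
print(result)  # (4, 4) - four rows, ids assigned 1 through 4
```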

Insert Table Field Values from Dictionary Values with corresponding Keys

Would it be possible to insert dictionary values, one at a time, into a specific field/column of a sqlite3 database table?
The fields/columns were created with the database in a previous step.
I would prefer to use a for loop, but I haven't found the right sqlite3 "command" that selects the column, which is the same as the dictionary key, and inserts the corresponding values (dictionary value).
import sqlite3

with sqlite3.connect(db_full_path) as connection:
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM {}".format(db_table))
    db_fields = ','.join('?' for tab in cursor.description)  # i.e. ?,?,?,?
    cursor.executemany("INSERT INTO {} VALUES ({})"
                       .format(db_table, db_fields), (ORDERED_DATA_VALUES,))
    connection.commit()
'cursor.execute' and 'cursor.executemany' both require a predefined number of columns and a sorted list with the same number of items in the right order, which I find less than ideal for my purpose.
I'd much rather iterate over a dictionary and insert the values one at a time, but into the same row:
for key, value in NOT_ORDERED_DATA_VALUES.items():
    # insert value into corresponding field/column (key)
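One sketch of how this can work without caring about key order: sqlite3's named-parameter style (:key) accepts a dict directly, so the column list and the placeholders can both be derived from the dict's keys. The table and column names below are illustrative, not from the question. Note the whole row is still inserted in one statement rather than one column per loop iteration, which would otherwise require an initial INSERT followed by an UPDATE per key:

```python
import sqlite3

data = {'c_field': 3, 'a_field': 1, 'b_field': 2}  # deliberately unordered

with sqlite3.connect(':memory:') as connection:
    connection.execute(
        "CREATE TABLE example (a_field INT, b_field INT, c_field INT)")

    cols = ', '.join(data)                     # "c_field, a_field, b_field"
    params = ', '.join(':' + k for k in data)  # ":c_field, :a_field, :b_field"

    # The dict itself supplies the values; each :name placeholder is
    # matched to the key of the same name, so ordering is irrelevant.
    connection.execute(
        "INSERT INTO example ({}) VALUES ({})".format(cols, params), data)

    row = connection.execute(
        "SELECT a_field, b_field, c_field FROM example").fetchone()
    print(row)  # (1, 2, 3)
```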
