Combine two SQLite databases with Python - python

I have the following Python code to update a database where the first column is "id" INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE:
import sqlite3 as lite

con = lite.connect('test_score.db')
with con:
    cur = con.cursor()
    cur.execute("INSERT INTO scores VALUES (NULL,?,?,?)", (first, last, score))
    item = cur.fetchone()
    con.commit()
    cur.close()
con.close()
I get table "scores" with following data:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
Two different users (userA and userB) copy test_score.db and code to their computer and use it separately.
I get back two copies of test_score.db, now with different content:
user A test_score.db :
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
user B test_score.db :
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Chris,Prat,99
5,Tom,Hanks,09
6,Tom,Hanks,15
I was trying to use
insert into AuditRecords select * from toMerge.AuditRecords;
to combine the two databases into one, but it failed because the first column is a unique id. The two databases now contain the same ids with either different or identical data, so the merge fails.
I would like to find the unique rows across both databases (all values different, ignoring id) and merge the results into one full database.
Result should be something like this:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
6,Chris,Prat,99
7,Tom,Hanks,09
I can extract each value one by one and compare, but I want to avoid that as I might have longer rows with more columns in the future.
Sorry if this is an obvious and easy question, I'm still learning. I tried to find the answer but failed; please point me to the answer if it already exists somewhere else. Thank you very much for your help.

You need to decide how to resolve duplicated rows. Will you keep the max score? The min? The first one?
Assuming the table AuditRecords holds all the rows of both user A and user B, you can use GROUP BY to deduplicate rows and an aggregate function to resolve the score:
insert into
    AuditRecords
select
    id,
    first_name,
    last_name,
    max(score) as score
from
    toMerge.AuditRecords
group by
    id,
    first_name,
    last_name;

For this requirement you should have defined a UNIQUE constraint for the combination of the columns first, last and score:
CREATE TABLE AuditRecords(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    first TEXT,
    last TEXT,
    score INTEGER,
    UNIQUE(first, last, score)
);
Now you can use INSERT OR IGNORE to merge the tables:
INSERT OR IGNORE INTO AuditRecords(first, last, score)
SELECT first, last, score
FROM toMerge.AuditRecords;
Note that you must explicitly define the list of the columns that will receive the values and in this list the id is missing because its value will be autoincremented by each insertion.
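As a rough illustration of the INSERT OR IGNORE merge driven from Python, here is a minimal sketch. It assumes both databases share the AuditRecords schema above; the two in-memory databases and the sample rows are stand-ins for the real files, not part of the original question.

```python
import sqlite3

# Two in-memory databases stand in for the two users' files (assumption).
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS toMerge")

ddl = (
    "CREATE TABLE {}.AuditRecords ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, first TEXT, last TEXT, "
    "score INTEGER, UNIQUE(first, last, score))"
)
for schema in ("main", "toMerge"):
    con.execute(ddl.format(schema))

insert = "INSERT INTO {}.AuditRecords (first, last, score) VALUES (?,?,?)"
con.executemany(insert.format("main"), [("Jim", "Green", 88), ("Tom", "Hanks", 15)])
con.executemany(insert.format("toMerge"), [("Jim", "Green", 88), ("Chris", "Prat", 99)])

# Rows that violate UNIQUE(first, last, score) are skipped silently.
con.execute(
    "INSERT OR IGNORE INTO main.AuditRecords (first, last, score) "
    "SELECT first, last, score FROM toMerge.AuditRecords"
)
con.commit()

rows = con.execute(
    "SELECT first, last, score FROM main.AuditRecords ORDER BY id"
).fetchall()
```

Only the (Chris, Prat, 99) row is added here, because (Jim, Green, 88) already exists in the target table.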
Another way to do it without defining the UNIQUE constraint is to use EXCEPT:
INSERT INTO AuditRecords(first, last, score)
SELECT first, last, score FROM toMerge.AuditRecords
EXCEPT
SELECT first, last, score FROM AuditRecords;
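The EXCEPT variant can be sketched end to end with the sample data from the question. This is only an illustration: the table is called scores as in the question, the column names first, last and score are assumed, and two in-memory databases stand in for the two copies of test_score.db.

```python
import sqlite3

SCHEMA = "(id INTEGER PRIMARY KEY AUTOINCREMENT, first TEXT, last TEXT, score INTEGER)"
base = [("Adam", "Smith", 68), ("John", "Snow", 76), ("Jim", "Green", 88)]
rows_a = base + [("Jim", "Green", 91), ("Tom", "Hanks", 15)]
rows_b = base + [("Chris", "Prat", 99), ("Tom", "Hanks", 9), ("Tom", "Hanks", 15)]

con = sqlite3.connect(":memory:")                      # stands in for user A's file
con.execute("ATTACH DATABASE ':memory:' AS toMerge")   # stands in for user B's file
con.execute("CREATE TABLE main.scores " + SCHEMA)
con.execute("CREATE TABLE toMerge.scores " + SCHEMA)
con.executemany("INSERT INTO main.scores (first, last, score) VALUES (?,?,?)", rows_a)
con.executemany("INSERT INTO toMerge.scores (first, last, score) VALUES (?,?,?)", rows_b)

# Insert only the (first, last, score) combinations missing from main;
# the id column is left out so that new ids are assigned automatically.
con.execute("""
    INSERT INTO main.scores (first, last, score)
    SELECT first, last, score FROM toMerge.scores
    EXCEPT
    SELECT first, last, score FROM main.scores
""")
con.commit()

merged = con.execute(
    "SELECT first, last, score FROM main.scores ORDER BY id"
).fetchall()
```

With real files, the ATTACH statement would take the path of the second database instead of ':memory:'. The order in which the two new rows receive their ids depends on how SQLite evaluates the EXCEPT, so it should not be relied upon.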

Is it possible to assign cursor.fetchall() to a variable?

rows_order = "SELECT COUNT (*) FROM 'Order'"
cursor.execute(rows_order)
ordernum = cursor.fetchall()
connection.commit()
cursor.execute("INSERT INTO 'Order' (OrderNo, CustomerID, Date, TotalCost) VALUES (?,?,?,?)",
               ([ordernum], custid_Sorder, now, total_item_price))
This is what I am trying, but this error popped up:
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
How do I fix this? I want OrderNo to equal the number of orders before it, which is why I want to assign the count to it. (I am using sqlite3.)
As you have only one value, you only need fetchone:
import sqlite3
con = sqlite3.connect("tutorial.db")
cursor = con.cursor()
rows_order = "SELECT COUNT (*) FROM 'Order'"
cursor.execute(rows_order)
ordernum = cursor.fetchone()[0]
cursor.execute("INSERT INTO 'Order' (OrderNo, CustomerID, Date, TotalCost) VALUES (?,?,?,?)",
               (ordernum, custid_Sorder, now, total_item_price))
tl;dr Don't do this. Use an auto-incremented primary key.
fetchall returns all rows as a list, even if there is only one row.
Instead, use fetchone. This returns a single tuple, from which you can then select the first item: ordernum = cursor.fetchone()[0]
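The difference between the two calls can be seen in a small sketch. The table contents below are made up for illustration; only the shape of the return values matters.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "Order" (OrderNo INTEGER, CustomerID INTEGER)')
con.executemany('INSERT INTO "Order" VALUES (?, ?)', [(1, 10), (2, 11)])

cur = con.execute('SELECT COUNT(*) FROM "Order"')
as_list = cur.fetchall()     # a list of row tuples, even for a single row

cur = con.execute('SELECT COUNT(*) FROM "Order"')
as_tuple = cur.fetchone()    # a single row tuple
count = as_tuple[0]          # the plain integer that can be bound as a parameter
```

Passing as_list (or a list wrapping it) as a query parameter is what triggers the "unsupported type" binding error, whereas count is an ordinary int.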
However, you appear to be writing a query to get the next ID. Using count(*) is wrong. If there are any gaps in OrderNo, for example if something gets deleted, it can return a duplicate. Consider [1, 3, 4]; count(*) will return 3, which is already taken. Use max(OrderNo) + 1 instead.
Furthermore, if you try to insert two orders at the same time you might get a race condition, and one insert will duplicate the other's OrderNo.
process 1                 process 2
select max(orderNo)
fetchone # 4
                          select max(orderNo)
                          fetchone # 4
insert into orders...
                          insert into orders... # duplicate OrderNo
To avoid this, you have to do both the select and insert in a transaction.
process 1                 process 2
begin
select max(orderNo)...
fetchone # 4              begin
                          select max(orderNo)
                          fetchone
insert into orders...     # wait
commit                    # wait
                          # 5
                          insert into orders...
                          commit
Better yet, do them as a single query.
insert into "Order" (OrderNo, CustomerID, Date, TotalCost)
select coalesce(max(OrderNo), 0) + 1, ?, ?, ?
from "Order"
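A minimal sketch of the single-statement idea from Python follows. It computes max(OrderNo) + 1 and wraps the max in COALESCE so that an empty table starts at 1; that starting value, the add_order helper and the sample data are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE "Order" (OrderNo INTEGER, CustomerID INTEGER, Date TEXT, TotalCost INTEGER)'
)

def add_order(con, customer_id, date, total_cost):
    """Hypothetical helper: select the next number and insert in one statement."""
    con.execute(
        'INSERT INTO "Order" (OrderNo, CustomerID, Date, TotalCost) '
        'SELECT COALESCE(MAX(OrderNo), 0) + 1, ?, ?, ? FROM "Order"',
        (customer_id, date, total_cost),
    )
    con.commit()

for customer in (10, 11, 12):
    add_order(con, customer, "2024-01-01", 100)

# Deleting a row leaves a gap; COUNT(*) would now hand out a duplicate number.
con.execute('DELETE FROM "Order" WHERE OrderNo = 2')
add_order(con, 13, "2024-01-02", 250)

numbers = [r[0] for r in con.execute('SELECT OrderNo FROM "Order" ORDER BY OrderNo')]
```

After the delete, the table holds order numbers 1 and 3, and the next insert receives 4 rather than reusing 3.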
Even better, don't do it at all. There is a built-in mechanism for this: use an auto-incremented primary key.
-- order is a keyword; pluralizing table names helps to avoid them
create table orders (
    -- It is a special feature of SQLite that this will automatically be unique.
    orderNo integer primary key,
    customerID int,
    -- date is also a keyword, and vague. Use xAt.
    orderedAt timestamp,
    totalCost int
);
-- orderNo will automatically be set to a unique number
insert into orders (customerID, orderedAt, totalCost) values (...)
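From Python, the automatically assigned key can be read back through the cursor's lastrowid attribute. The sketch below assumes the orders table above; the add_order helper and the sample values are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE orders (
        orderNo    INTEGER PRIMARY KEY,
        customerID INTEGER,
        orderedAt  TEXT,
        totalCost  INTEGER
    )
""")

def add_order(con, customer_id, ordered_at, total_cost):
    """Hypothetical helper: insert an order and return the orderNo SQLite assigned."""
    cur = con.execute(
        "INSERT INTO orders (customerID, orderedAt, totalCost) VALUES (?,?,?)",
        (customer_id, ordered_at, total_cost),
    )
    con.commit()
    return cur.lastrowid

first = add_order(con, 7, "2024-01-01 10:00:00", 1250)
second = add_order(con, 8, "2024-01-01 10:05:00", 400)
```

No select is needed before the insert, so the race described above does not arise for the key itself.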

Sqlite Avoid Duplicates Using Insert, Executemany and list of Tuples

I have an existing Sqlite table containing qualification data for students, the code periodically checks for new qualifications obtained and inserts them into the table. This causes duplicates.
def insertQualificationData(self, data):
    # Run executemany command to insert data
    self.cursor.executemany(
        """
        INSERT INTO qualification (
            qualificationperson,
            type,
            secondaryreference,
            reference,
            name,
            pass,
            start,
            qualificationband,
            grade,
            time_stamp
        ) VALUES (?,?,?,?,?,?,?,?,?,?)
        """, data
    )
The 'data' variable is a list of tuples. Eg:
('209000010111327', 'WLC', 'G0W915', 'Certificate', 'Child Care and Education', 'P', '12/07/2001', 'PASS', 'Pass', 1648018935)
I want to prevent 'duplicate' values being inserted into the qualification table. By 'duplicate' I mean that if a row matches on the qualificationperson, reference, name and pass columns, it should not be inserted.
I have seen other answers doing a similar thing but with named columns from a second table; I am struggling to replicate this using a list of tuples and executemany().
You could add a unique index on those columns:
CREATE UNIQUE INDEX IF NOT EXISTS QData ON qualification (qualificationperson, reference, name, pass)
and then use an INSERT OR IGNORE statement so that a failure of one value to insert does not cause the entire executemany to fail:
INSERT OR IGNORE INTO qualification (...)
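Put together with executemany and a list of tuples, this could look like the sketch below. The column types and the second and third sample rows are assumptions; the first tuple is the one from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE qualification (
        qualificationperson TEXT, type TEXT, secondaryreference TEXT,
        reference TEXT, name TEXT, pass TEXT, start TEXT,
        qualificationband TEXT, grade TEXT, time_stamp INTEGER
    )
""")
con.execute("""
    CREATE UNIQUE INDEX IF NOT EXISTS QData
    ON qualification (qualificationperson, reference, name, pass)
""")

row = ('209000010111327', 'WLC', 'G0W915', 'Certificate',
       'Child Care and Education', 'P', '12/07/2001', 'PASS', 'Pass', 1648018935)
data = [
    row,
    row[:-1] + (1648099999,),          # same person/reference/name/pass, newer timestamp
    ('209000010111328',) + row[1:],    # a different person, so not a duplicate
]

# Rows that collide with the unique index are skipped; the rest are inserted.
con.executemany(
    """
    INSERT OR IGNORE INTO qualification (
        qualificationperson, type, secondaryreference, reference, name,
        pass, start, qualificationband, grade, time_stamp
    ) VALUES (?,?,?,?,?,?,?,?,?,?)
    """,
    data,
)
con.commit()

count = con.execute("SELECT COUNT(*) FROM qualification").fetchone()[0]
```

Running the same executemany again with the same data leaves the table unchanged, which is the behaviour wanted for the periodic check.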

Python - Sqlite insert tuple without the autoincrement primary key value

I create a table with primary key and autoincrement.
with open('RAND.xml', "rb") as f, sqlite3.connect("race.db") as connection:
    c = connection.cursor()
    c.execute(
        """CREATE TABLE IF NOT EXISTS race(RaceID INTEGER PRIMARY KEY AUTOINCREMENT,R_Number INT, R_KEY INT,\
        R_NAME TEXT, R_AGE INT, R_DIST TEXT, R_CLASS, M_ID INT)""")
I then want to insert a tuple which of course has one fewer value than the total number of columns, because the first column is autoincremented.
    sql_data = tuple(b)
    c.executemany('insert into race values(?,?,?,?,?,?,?)', b)
How do I stop this error?
sqlite3.OperationalError: table race has 8 columns but 7 values were supplied
It's extremely bad practice to assume a specific ordering on the columns. Some DBA might come along and modify the table, breaking your SQL statements. Secondly, an autoincrement value will only be used if you don't specify a value for the field in your INSERT statement - if you give a value, that value will be stored in the new row.
If you amend the code to read
c.executemany('''insert into
race(R_number, R_KEY, R_NAME, R_AGE, R_DIST, R_CLASS, M_ID)
values(?,?,?,?,?,?,?)''',
sql_data)
you should find that everything works as expected.
From the SQLite documentation:
If the column-name list after table-name is omitted then the number of values inserted into each row must be the same as the number of columns in the table.
RaceID is a column in the table, so it is expected to be present when you're doing an INSERT without explicitly naming the columns. You can get the desired behavior (assign RaceID the next autoincrement value) by passing an SQLite NULL value in that column, which in Python is None:
sql_data = tuple((None,) + a for a in b)
c.executemany('insert into race values(?,?,?,?,?,?,?,?)', sql_data)
The above assumes b is a sequence of sequences of parameters for your executemany statement and attempts to prepend None to each sub-sequence. Modify as necessary for your code.
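Both approaches can be tried side by side in a short sketch. The contents of b are invented for illustration; the table definition follows the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE race (
        RaceID INTEGER PRIMARY KEY AUTOINCREMENT,
        R_Number INT, R_KEY INT, R_NAME TEXT, R_AGE INT,
        R_DIST TEXT, R_CLASS, M_ID INT
    )
""")

# Hypothetical data: seven values per row, without a RaceID.
b = [
    (1, 101, "Alpha", 3, "1200m", "C1", 9),
    (2, 102, "Bravo", 4, "1400m", "C2", 9),
]

# Option 1: name the seven target columns and leave RaceID out.
con.executemany(
    "INSERT INTO race (R_Number, R_KEY, R_NAME, R_AGE, R_DIST, R_CLASS, M_ID) "
    "VALUES (?,?,?,?,?,?,?)",
    b,
)

# Option 2: keep the unnamed form but pass None (SQL NULL) for RaceID.
con.executemany(
    "INSERT INTO race VALUES (?,?,?,?,?,?,?,?)",
    [(None,) + row for row in b],
)
con.commit()

ids = [r[0] for r in con.execute("SELECT RaceID FROM race ORDER BY RaceID")]
```

Either way SQLite assigns RaceID itself; naming the columns is the more robust choice if the table layout might change later.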

Why does SQLite3 not yield an error

I am quite new to SQL, but trying to bugfix the output of an SQL-Query. However this question does not concern the bug, but rather why SQLite3 does not yield an error when it should.
I have a query string that looks like:
QueryString = ("SELECT e.event_id, "
               "count(e.event_id), "
               "e.state, "
               "MIN(e.boot_time) AS boot_time, "
               "e.time_occurred, "
               "COALESCE(e.info, 0) AS info "
               "FROM events AS e "
               "JOIN leg ON leg.id = e.leg_id "
               "GROUP BY e.event_id "
               "ORDER BY leg.num_leg DESC, "
               "e.event_id ASC;\n"
               )
This yields an output with no errors.
What I don't understand is why there is no error when I GROUP BY e.event_id while e.state and e.time_occurred are neither wrapped in aggregate functions nor part of the GROUP BY clause.
e.state is a string column. e.time_occurred is an integer column.
I am using the QueryString in Python.
In a misguided attempt to be compatible with MySQL, this is allowed. (The non-aggregated column values come from an arbitrary row in the group.)
Since SQLite 3.7.11, using min() or max() guarantees that the values in the non-aggregated columns come from the row that has the minimum/maximum value in the group.
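The min() guarantee can be observed with a small sketch. The table below is a simplified, made-up version of the events table with only three columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event_id INT, state TEXT, boot_time INT)")
con.executemany(
    "INSERT INTO events VALUES (?,?,?)",
    [(1, "late", 30), (1, "early", 10), (1, "mid", 20), (2, "only", 5)],
)

# state is a bare column: it is neither aggregated nor in the GROUP BY.
# With a single MIN() in the query, SQLite takes it from the row holding the minimum.
rows = con.execute(
    "SELECT event_id, state, MIN(boot_time) "
    "FROM events GROUP BY event_id ORDER BY event_id"
).fetchall()
```

For event 1 the state comes from the row with boot_time 10. Without the MIN() (or MAX()) in the select list, which row supplies state would be unspecified.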
SQLite and MySQL allow bare columns in an aggregation query. This is explained in the documentation:
In the query above, the "a" column is part of the GROUP BY clause and
so each row of the output contains one of the distinct values for "a".
The "c" column is contained within the sum() aggregate function and so
that output column is the sum of all "c" values in rows that have the
same value for "a". But what is the result of the bare column "b"? The
answer is that the "b" result will be the value for "b" in one of the
input rows that form the aggregate. The problem is that you usually do
not know which input row is used to compute "b", and so in many cases
the value for "b" is undefined.
Your particular query is:
SELECT e.event_id, count(e.event_id), e.state, MIN(e.boot_time) AS boot_time,
       e.time_occurred, COALESCE(e.info, 0) AS info
FROM events AS e JOIN
     leg
     ON leg.id = e.leg_id
GROUP BY e.event_id
ORDER BY leg.num_leg DESC, e.event_id ASC;
If e.event_id is the primary key in events, then this syntax is even supported by the ANSI standard, because event_id is sufficient to uniquely define the other columns in a row in events.
If e.event_id is a PRIMARY or UNIQUE key of the table then e.time_occurred is called "functionally dependent" and would not even throw an error in other SQL compliant DBMSs.
However, SQLite has not implemented functional dependency. In the case of SQLite (and MySQL) no error is thrown even for columns that are not functionally dependent on the GROUP BY columns.
SQLite (and MySQL) simply pick an arbitrary row from the group to fill the (in SQLite lingo) "bare column".

Add list to sqlite database

How would I add something in SQLite to an already existing table? This is what I have so far:
>>> rid
'26539249'
>>> for t in [(rid,("billy","jim"))]:
...     c.execute("insert into whois values (?,?)",t)
How would I add onto jim and create a list? Or is there some way to add onto it so it can have multiple values?
I'll take a guess here, but I suspect I'm wrong.
You can't insert ("billy", "jim") as a column in the database. This is intentional. The whole point of RDBMSs like sqlite is that each field holds exactly one value, not a list of values. You can't search for 'jim' in the middle of a column shared with other people, you can't join tables based on 'jim', etc.
If you really, really want to do this, you have to pick some way to convert the multiple values into a single string, and to convert them back on reading. You can use json.dumps/json.loads, repr/ast.literal_eval, or anything else that seems appropriate. But you have to write the extra code yourself. And you won't be getting any real benefit out of the database if you do so; you'd be better off just using shelve.
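For completeness, the serialize-it-yourself route could look like this sketch using json.dumps/json.loads. The two-column whois table is an assumption about the asker's schema.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE whois (Rid TEXT, Names TEXT)")

rid = "26539249"
# Convert the list to a single JSON string before binding it.
con.execute("INSERT INTO whois VALUES (?, ?)", (rid, json.dumps(["billy", "jim"])))
con.commit()

stored = con.execute("SELECT Names FROM whois WHERE Rid = ?", (rid,)).fetchone()[0]
names = json.loads(stored)   # back to a Python list on reading
```

Appending a name then means reading the string, decoding it, modifying the list and writing the whole value back, which is exactly the extra code mentioned above.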
So, I'm guessing you don't want to do this, and you want to know what you want to do instead.
Assuming your schema looks something like this:
CREATE TABLE whois (Rid, Names);
What you want is:
CREATE TABLE whois (Rid);
CREATE TABLE whois_names (Rid, Name, FOREIGN KEY(Rid) REFERENCES whois(Rid));
And then, to do the insert:
tt = [(rid,("billy","jim"))]
for rid, names in tt:
    c.execute('INSERT INTO whois VALUES (?)', (rid,))
    for name in names:
        c.execute('INSERT INTO whois_names VALUES (?, ?)', (rid, name))
Or (probably faster, but not as interleaved):
# executemany expects one parameter sequence per row, hence the (rid,) tuples
c.executemany('INSERT INTO whois VALUES (?)', ((rid,) for rid, names in tt))
c.executemany('INSERT INTO whois_names VALUES (?, ?)',
              ((rid, name) for rid, names in tt for name in names))
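A self-contained sketch of the two-table layout shows how a name is added later and how the names are queried. The column types, the extra name "bob" and the PRIMARY KEY on whois are assumptions added for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE whois (Rid TEXT PRIMARY KEY)")
con.execute(
    "CREATE TABLE whois_names ("
    "Rid TEXT, Name TEXT, FOREIGN KEY(Rid) REFERENCES whois(Rid))"
)

rid = "26539249"
tt = [(rid, ("billy", "jim"))]
con.executemany("INSERT INTO whois VALUES (?)", [(r,) for r, _ in tt])
con.executemany(
    "INSERT INTO whois_names VALUES (?, ?)",
    [(r, name) for r, names in tt for name in names],
)

# Adding another name later is just one more row.
con.execute("INSERT INTO whois_names VALUES (?, ?)", (rid, "bob"))
con.commit()

names = [row[0] for row in con.execute(
    "SELECT Name FROM whois_names WHERE Rid = ? ORDER BY rowid", (rid,))]
owners = [row[0] for row in con.execute(
    "SELECT Rid FROM whois_names WHERE Name = ?", ("jim",))]
```

Because each name is its own row, searching for 'jim' or joining on a name works with ordinary SQL.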
Not tested, but should do the trick:
import sqlite3

conn = sqlite3.connect(db)
cur = conn.cursor()
cur.execute('''CREATE TABLE if not exists Data
               (id integer primary key autoincrement, List)''')
# str() stores the list's text representation in a single column
cur.execute("INSERT INTO Data (id,List) values (?,?)",
            (lid, str(list(My_list))))
