Is there a way to check and load missing data to SQL? - python

I am trying to figure out a way to check data I am loading into a SQL table from a dataframe so I can load missing data and avoid loading duplicate data.
Here is a really rough idea.
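sql_data = []
# Date and day values must be quoted strings, otherwise 2020-01-01 is evaluated as arithmetic
data = [('2020-01-01', 'Monday', 20, 0.1), ('2020-01-02', 'Tuesday', 12, 0.4), ('2020-01-01', 'Wednesday', 26, 0.3)]
cursor.execute('''SELECT * FROM Table''')
for row in cursor.fetchall():
    sql_data.append(tuple(row))
# Check each record individually rather than the whole list at once
for record in data:
    if record not in sql_data:
        query = '''INSERT INTO Table (Time, Day, Number, Decimal)
                   VALUES (?, ?, ?, ?)'''
        cursor.execute(query, record)
conn.commit()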

Consider the rarely used EXCEPT clause (part of the UNION and INTERSECT set-operator family), since SQL Server supports scalar values in SELECT without a FROM data source:
query = '''INSERT INTO Table (Time, Day, Number, Decimal)
SELECT ?, ?, ?, ?
EXCEPT
SELECT Time, Day, Number, Decimal
FROM Table
'''
cursor.executemany(query, data)
conn.commit()
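To try the EXCEPT approach without a SQL Server instance, the sketch below swaps in Python's built-in sqlite3 module, which also accepts SELECT without FROM and compound EXCEPT queries. The table name readings is hypothetical (Table itself is a reserved word), and the sample rows are the question's data with the dates quoted as strings. Because executemany runs the statement once per tuple, a row is skipped whenever an identical row is already stored, including one inserted earlier in the same batch.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE readings (Time TEXT, Day TEXT, Number INTEGER, Decimal REAL)"
)

# Pretend this row was loaded on a previous run.
cursor.execute(
    "INSERT INTO readings VALUES (?, ?, ?, ?)", ("2020-01-01", "Monday", 20, 0.1)
)

# The first SELECT yields the candidate row; EXCEPT removes it if an identical
# row already exists, so the INSERT receives either one row or none.
query = """INSERT INTO readings (Time, Day, Number, Decimal)
SELECT ?, ?, ?, ?
EXCEPT
SELECT Time, Day, Number, Decimal
FROM readings"""

data = [
    ("2020-01-01", "Monday", 20, 0.1),     # already stored: skipped
    ("2020-01-02", "Tuesday", 12, 0.4),    # new: inserted
    ("2020-01-01", "Wednesday", 26, 0.3),  # new: inserted
]
cursor.executemany(query, data)
conn.commit()

rows = cursor.execute("SELECT * FROM readings ORDER BY Time, Day").fetchall()
print(rows)
# [('2020-01-01', 'Monday', 20, 0.1), ('2020-01-01', 'Wednesday', 26, 0.3), ('2020-01-02', 'Tuesday', 12, 0.4)]
```

Running the same executemany call a second time leaves the table unchanged, which is the property the question is after.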

Related

Shorten SQLite3 insert statement for efficiency and readability

From this answer:
cursor.execute("INSERT INTO booking_meeting (room_name,from_date,to_date,no_seat,projector,video,created_date,location_name) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", (rname, from_date, to_date, seat, projector, video, now, location_name ))
I'd like to shorten it to something like:
simple_insert(booking_meeting, rname, from_date, to_date, seat, projector, video, now, location_name)
The first parameter is the table name, which can be read to get the list of column names needed to format the first part of the SQLite3 statement:
cursor.execute("INSERT INTO booking_meeting (room_name,from_date,to_date,no_seat,projector,video,created_date,location_name)
Then the values clause (second part of the insert statement):
VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
can be formatted by counting the number of column names in the table.
I hope I explained the question properly and that you can appreciate the time savings of such a function. My question is: how do I write this function in Python?
There may already be a simple_insert() function in SQLite3, but I just haven't stumbled across it yet.
If you're inserting into all the columns, then you don't need to specify the column names in the INSERT query. For that scenario, you could write a function like this:
def simple_insert(cursor, table, *args):
    query = f'INSERT INTO {table} VALUES (' + '?, ' * (len(args)-1) + '?)'
    cursor.execute(query, args)
For your example, you would call it as:
simple_insert(cursor, 'booking_meeting', rname, from_date, to_date, seat, projector, video, now, location_name)
Note that I've chosen to pass cursor to the function; you could instead rely on it as a global variable.
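If you do want explicit column names, as the question proposes, SQLite can report them with PRAGMA table_info. A minimal sketch, assuming a SQLite database and trusted table names (they are interpolated into the SQL, since identifiers cannot be bound as parameters); the helper name simple_insert_named and the four-column booking_meeting table are hypothetical:

```python
import sqlite3


def simple_insert_named(cursor, table, *args):
    """Hypothetical helper: insert one row, listing every column of `table` by name.

    `table` is interpolated into the SQL, so only pass trusted table names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = [info[1] for info in cursor.execute(f"PRAGMA table_info({table})")]
    if len(columns) != len(args):
        raise ValueError(
            f"{table} has {len(columns)} columns but {len(args)} values were given"
        )
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" * len(columns))
    query = f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})"
    cursor.execute(query, args)
    return query  # returned so the generated SQL can be logged or inspected


# Usage with a cut-down, hypothetical booking_meeting table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE booking_meeting "
    "(room_name TEXT, from_date TEXT, to_date TEXT, no_seat INTEGER)"
)
print(simple_insert_named(cur, "booking_meeting", "Room A", "2020-01-01", "2020-01-02", 8))
# INSERT INTO booking_meeting (room_name, from_date, to_date, no_seat) VALUES (?, ?, ?, ?)
conn.commit()
```

Raising on a count mismatch mirrors the error SQLite itself gives, but earlier and with the table name in the message.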

Insert or update rows in MS Access database in Python

I've got an MS Access table (SearchAdsAccountLevel) which needs to be updated frequently from a python script. I've set up the pyodbc connection and now I would like to UPDATE/INSERT rows from my pandas df to the MS Access table based on whether the Date_ AND CampaignId fields match with the df data.
Based on previous examples, I've built the UPDATE statement below, which uses iterrows to iterate through all rows of df and execute the SQL for each one:
import pyodbc
connection_string = (
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=c:\AccessDatabases\Database2.accdb;"
)
cnxn = pyodbc.connect(connection_string, autocommit=True)
crsr = cnxn.cursor()
for index, row in df.iterrows():
    crsr.execute("UPDATE SearchAdsAccountLevel SET [OrgId]=?, [CampaignName]=?, [CampaignStatus]=?, [Storefront]=?, [AppName]=?, [AppId]=?, [TotalBudgetAmount]=?, [TotalBudgetCurrency]=?, [DailyBudgetAmount]=?, [DailyBudgetCurrency]=?, [Impressions]=?, [Taps]=?, [Conversions]=?, [ConversionsNewDownloads]=?, [ConversionsRedownloads]=?, [Ttr]=?, [LocalSpendAmount]=?, [LocalSpendCurrency]=?, [ConversionRate]=?, [Week_]=?, [Month_]=?, [Year_]=?, [Quarter]=?, [FinancialYear]=?, [RowUpdatedTime]=? WHERE [Date_]=? AND [CampaignId]=?",
                 row['OrgId'],
                 row['CampaignName'],
                 row['CampaignStatus'],
                 row['Storefront'],
                 row['AppName'],
                 row['AppId'],
                 row['TotalBudgetAmount'],
                 row['TotalBudgetCurrency'],
                 row['DailyBudgetAmount'],
                 row['DailyBudgetCurrency'],
                 row['Impressions'],
                 row['Taps'],
                 row['Conversions'],
                 row['ConversionsNewDownloads'],
                 row['ConversionsRedownloads'],
                 row['Ttr'],
                 row['LocalSpendAmount'],
                 row['LocalSpendCurrency'],
                 row['ConversionRate'],
                 row['Week_'],
                 row['Month_'],
                 row['Year_'],
                 row['Quarter'],
                 row['FinancialYear'],
                 row['RowUpdatedTime'],
                 row['Date_'],
                 row['CampaignId'])
crsr.commit()
I would like to iterate through each row of my df (around 3000 rows) and, if ['Date_'] AND ['CampaignId'] match, UPDATE all other fields. Otherwise, I want to INSERT the whole df row into my Access table (create a new row). What's the most efficient and effective way to achieve this?
Consider DataFrame.values: pass the resulting list into an executemany call, making sure the columns are ordered to match the placeholders in the UPDATE query:
cols = ['OrgId', 'CampaignName', 'CampaignStatus', 'Storefront',
'AppName', 'AppId', 'TotalBudgetAmount', 'TotalBudgetCurrency',
'DailyBudgetAmount', 'DailyBudgetCurrency', 'Impressions',
'Taps', 'Conversions', 'ConversionsNewDownloads', 'ConversionsRedownloads',
'Ttr', 'LocalSpendAmount', 'LocalSpendCurrency', 'ConversionRate',
'Week_', 'Month_', 'Year_', 'Quarter', 'FinancialYear',
'RowUpdatedTime', 'Date_', 'CampaignId']
sql = '''UPDATE SearchAdsAccountLevel
SET [OrgId]=?, [CampaignName]=?, [CampaignStatus]=?, [Storefront]=?,
[AppName]=?, [AppId]=?, [TotalBudgetAmount]=?,
[TotalBudgetCurrency]=?, [DailyBudgetAmount]=?,
[DailyBudgetCurrency]=?, [Impressions]=?, [Taps]=?, [Conversions]=?,
[ConversionsNewDownloads]=?, [ConversionsRedownloads]=?, [Ttr]=?,
[LocalSpendAmount]=?, [LocalSpendCurrency]=?, [ConversionRate]=?,
[Week_]=?, [Month_]=?, [Year_]=?, [Quarter]=?, [FinancialYear]=?,
[RowUpdatedTime]=?
WHERE [Date_]=? AND [CampaignId]=?'''
crsr.executemany(sql, df[cols].values.tolist())
cnxn.commit()
For the insert, use a temporary staging table with exactly the same structure as the final table, which you can create with a make-table query: SELECT TOP 1 * INTO temp FROM final. This temp table is emptied and refilled with all data frame rows on each run. The final query then migrates only new rows from temp into final, using NOT EXISTS, NOT IN, or LEFT JOIN/IS NULL. You can run this query at any time without worrying about duplicates on the Date_ and CampaignId columns.
# CLEAN OUT TEMP
sql = '''DELETE FROM SearchAdsAccountLevel_Temp'''
crsr.execute(sql)
cnxn.commit()
# APPEND TO TEMP
sql = '''INSERT INTO SearchAdsAccountLevel_Temp (OrgId, CampaignName, CampaignStatus, Storefront,
AppName, AppId, TotalBudgetAmount, TotalBudgetCurrency,
DailyBudgetAmount, DailyBudgetCurrency, Impressions,
Taps, Conversions, ConversionsNewDownloads, ConversionsRedownloads,
Ttr, LocalSpendAmount, LocalSpendCurrency, ConversionRate,
Week_, Month_, Year_, Quarter, FinancialYear,
RowUpdatedTime, Date_, CampaignId)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?,
?, ?, ?, ?, ?, ?, ?, ?, ?);'''
crsr.executemany(sql, df[cols].values.tolist())
cnxn.commit()
# MIGRATE TO FINAL
sql = '''INSERT INTO SearchAdsAccountLevel
SELECT t.*
FROM SearchAdsAccountLevel_Temp t
LEFT JOIN SearchAdsAccountLevel f
ON t.Date_ = f.Date_ AND t.CampaignId = f.CampaignId
WHERE f.Date_ IS NULL'''
crsr.execute(sql)
cnxn.commit()
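The whole UPDATE, stage, then migrate flow can be rehearsed locally. The sketch below substitutes an in-memory SQLite database for Access (so CREATE TABLE ... AS SELECT * ... WHERE 0 replaces the make-table query) and uses a hypothetical three-column table named final; the rows list stands in for df[cols].values.tolist(), with the key columns last to match the UPDATE placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
crsr = conn.cursor()

# Hypothetical, reduced stand-in for SearchAdsAccountLevel: two key columns + one value.
crsr.execute("CREATE TABLE final (Date_ TEXT, CampaignId INTEGER, Impressions INTEGER)")
crsr.execute("INSERT INTO final VALUES ('2020-01-01', 1, 100)")

# Empty staging table with the same columns (SQLite analogue of SELECT TOP 1 * INTO).
crsr.execute("CREATE TABLE final_temp AS SELECT * FROM final WHERE 0")

# Shaped like df[cols].values.tolist(): value columns first, key columns last.
rows = [
    [150, "2020-01-01", 1],  # key exists: updated
    [200, "2020-01-02", 1],  # key is new: inserted by the migration step
]

# 1. UPDATE matching keys; rows with new keys simply match nothing.
crsr.executemany(
    "UPDATE final SET Impressions = ? WHERE Date_ = ? AND CampaignId = ?", rows
)

# 2. Clean out and refill the staging table.
crsr.execute("DELETE FROM final_temp")
crsr.executemany(
    "INSERT INTO final_temp (Impressions, Date_, CampaignId) VALUES (?, ?, ?)", rows
)

# 3. Migrate only staged rows whose key is absent from final (LEFT JOIN / IS NULL).
crsr.execute(
    """INSERT INTO final (Date_, CampaignId, Impressions)
    SELECT t.Date_, t.CampaignId, t.Impressions
    FROM final_temp t
    LEFT JOIN final f
      ON t.Date_ = f.Date_ AND t.CampaignId = f.CampaignId
    WHERE f.Date_ IS NULL"""
)
conn.commit()

print(crsr.execute("SELECT * FROM final ORDER BY Date_").fetchall())
# [('2020-01-01', 1, 150), ('2020-01-02', 1, 200)]
```

Testing for NULL on a join column (here f.Date_) is a deliberate choice: it can only be NULL when no match was found, whereas a non-key column may legitimately hold NULL in existing rows.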

Insert python list into SQLite3 column

I have three python lists with about 1.5 million entries each and would like to insert these into a new SQLite table. When doing this I get the error:
OperationalError: no such column: days
This is the code I have:
con = sqlite3.connect('database.db')
cur = con.cursor()
...
cur.execute("DROP TABLE IF EXISTS days")
cur.execute("CREATE TABLE IF NOT EXISTS days(DAYS_NEEDED integer, RAISED_TIME text, POSTED_TIME text)")
cur.execute("INSERT INTO days (DAYS_NEEDED, RAISED_TIME, POSTED_TIME) VALUES (days, rt_list, pt_list)")
con.commit()
"days" is a list of integers, rt_list and pt_list are both lists of strings. Does anyone know what I'm doing wrong here?
Any help is much appreciated!
That's not how you insert a list of values in SQL. First, you must give a valid SQL statement that uses ? as placeholders. Then, to insert more than one row at a time, you need the executemany method. It is a real improvement because the SQL is only parsed and prepared once.
So you should have written:
cur.execute("DROP TABLE IF EXISTS days")
cur.execute("CREATE TABLE IF NOT EXISTS days(DAYS_NEEDED integer, RAISED_TIME text, POSTED_TIME text)")
cur.executemany("INSERT INTO days (DAYS_NEEDED, RAISED_TIME, POSTED_TIME) VALUES (?,?,?)",
zip(days, rt_list, pt_list))
con.commit()
BTW, passing zip directly relies on an extension of the sqlite3 module: the DB-API 2.0 interface normally requires a sequence, whereas zip returns an iterator. The more portable way (for any DB engine) would therefore be:
cur.executemany("INSERT INTO days (DAYS_NEEDED, RAISED_TIME, POSTED_TIME) VALUES (?,?,?)",
tuple(zip(days, rt_list, pt_list)))
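A self-contained version of the executemany approach, assuming short hypothetical lists in place of the 1.5 million real entries and an in-memory database:

```python
import sqlite3

# Hypothetical sample data standing in for the question's much longer lists.
days = [3, 5, 1]
rt_list = ["2020-01-01 10:00", "2020-01-02 11:30", "2020-01-03 09:15"]
pt_list = ["2020-01-04 10:00", "2020-01-07 11:30", "2020-01-04 09:15"]

# zip() silently stops at the shortest list, so make sure the lengths agree first.
assert len(days) == len(rt_list) == len(pt_list)

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE days(DAYS_NEEDED integer, RAISED_TIME text, POSTED_TIME text)")

# zip() pairs the i-th element of each list into one row tuple;
# executemany prepares the INSERT once and binds it for every row.
cur.executemany(
    "INSERT INTO days (DAYS_NEEDED, RAISED_TIME, POSTED_TIME) VALUES (?, ?, ?)",
    zip(days, rt_list, pt_list),
)
con.commit()

print(cur.execute("SELECT COUNT(*) FROM days").fetchone()[0])  # 3
```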
You have to use ? placeholders inside your VALUES() and then pass the actual values to the execute method.
Something along these lines should do the job:
con = sqlite3.connect('database.db')
cur = con.cursor()
...
cur.execute("DROP TABLE IF EXISTS days")
cur.execute("CREATE TABLE IF NOT EXISTS days(DAYS_NEEDED integer, RAISED_TIME text, POSTED_TIME text)")
def insert(days_needed, rt, pt):
    cur.execute("INSERT INTO days (DAYS_NEEDED, RAISED_TIME, POSTED_TIME) VALUES (?, ?, ?)", (days_needed, rt, pt))
for d, rt, pt in zip(days, rt_list, pt_list):
    insert(d, rt, pt)
con.commit()

Sqlite not using default values

When using:
import datetime
import sqlite3
db = sqlite3.connect('mydb.sqlite', detect_types=sqlite3.PARSE_DECLTYPES)
c = db.cursor()
db.text_factory = str
c.execute('create table if not exists mytable (date timestamp, title str, \
custom str, x float, y float, z char default null, \
postdate timestamp default null, id integer primary key autoincrement, \
url text default null)')
c.execute('insert into mytable values(?, ?, ?, ?, ?)', \
(datetime.datetime(2018,4,23,23,00), 'Test', 'Test2', 2.1, 11.1))
I have:
sqlite3.OperationalError: table mytable has 9 columns but 5 values were supplied
Why doesn't SQLite take the default values (specified when the table was created) into consideration when populating a new row?
(Also, as I'm reopening a project I wrote a few years ago, I can no longer find the datatypes str and char in the sqlite3 docs; are they still relevant?)
Because by not listing specific columns, you are saying that you want to insert into all of them.
Change 'insert into mytable values(?, ?, ?, ?, ?)'
to 'insert into mytable (date, title, custom, x, y) values(?, ?, ?, ?, ?)'
Virtually any name can be used as a column type; following a set of rules, SQLite maps it to one of the type affinities TEXT, INTEGER, REAL, NUMERIC or BLOB. However, you can store any type of value in any column.
STR will resolve to NUMERIC,
TIMESTAMP will resolve to NUMERIC,
FLOAT will resolve to REAL,
CHAR to TEXT.
Have a read of Datatypes In SQLite or perhaps have a look at How flexible/restricive are SQLite column types?
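The mapping above follows the column-affinity rules described in Datatypes In SQLite, which are checked in order against the declared type name. A small sketch of those rules (the function name sqlite_affinity is hypothetical):

```python
def sqlite_affinity(declared_type):
    """Return the affinity SQLite assigns to a declared column type (rules applied in order)."""
    t = (declared_type or "").upper()
    if "INT" in t:
        return "INTEGER"
    if "CHAR" in t or "CLOB" in t or "TEXT" in t:
        return "TEXT"
    if "BLOB" in t or not t:  # no declared type also means BLOB affinity
        return "BLOB"
    if "REAL" in t or "FLOA" in t or "DOUB" in t:
        return "REAL"
    return "NUMERIC"


for name in ["str", "timestamp", "float", "char"]:
    print(name, "->", sqlite_affinity(name))
# str -> NUMERIC, timestamp -> NUMERIC, float -> REAL, char -> TEXT
```

Because the checks are substring matches applied in order, a type such as POINT ends up with INTEGER affinity, since it contains INT.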
If you're only going to supply values for some columns, you need to specify which ones; otherwise the engine won't know where to put them. This line needs to be changed:
c.execute('insert into mytable values(?, ?, ?, ?, ?)', \
(datetime.datetime(2018,4,23,23,00), 'Test', 'Test2', 2.1, 11.1))
To this:
c.execute('insert into mytable (date, title, custom, x, y) values(?, ?, ?, ?, ?)', \
(datetime.datetime(2018,4,23,23,00), 'Test', 'Test2', 2.1, 11.1))
Example Solution
cursor.execute('CREATE TABLE vehicles_record (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)')
Insert query for the table above:
cursor.execute("INSERT INTO vehicles_record(name) VALUES(?)", (name,))
Result
For the first insert, id will be 1, name will hold the value of the name variable, and timestamp will hold the current timestamp.
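A minimal runnable check of the column-list fix, assuming an in-memory database and a reduced, hypothetical version of the question's table: only title is supplied, so postdate, id and created take their defaults. Note the trailing comma in ("Test",): without it the parameter is just a string rather than a 1-tuple.

```python
import sqlite3

db = sqlite3.connect(":memory:")
c = db.cursor()
# Hypothetical cut-down table mixing DEFAULT NULL, AUTOINCREMENT and CURRENT_TIMESTAMP.
c.execute(
    "CREATE TABLE mytable (title TEXT, postdate TIMESTAMP DEFAULT NULL, "
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "created TEXT DEFAULT CURRENT_TIMESTAMP)"
)

# Name only the column being supplied; every other column falls back to its default.
c.execute("INSERT INTO mytable (title) VALUES (?)", ("Test",))
db.commit()

row = c.execute("SELECT title, postdate, id, created FROM mytable").fetchone()
print(row)  # ('Test', None, 1, <current UTC time as 'YYYY-MM-DD HH:MM:SS'>)
```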

How to format and build query strings in Python Sqlite?

What is the most used way to create a Sqlite query in Python?
query = '''insert into events (date, title, col3, col4, int5, int6)
values("%s", "%s", "%s", "%s", %s, %s)''' % (date, title, col3, col4, int5, int6)
print(query)
c.execute(query)
Problem: it won't work for example if title contains a quote ".
query = '''insert into events (date, title, col3, col4, int5, int6)
values(?, ?, ?, ?, ?, ?)'''
c.execute(query, (date, title, col3, col4, int5, int6))
Problem: with the first solution we could display/print the query (to log it); with the second we can no longer log the full query string, because each ? is replaced by its value during execute.
Is there a cleaner way to do this? Can we avoid repeating ?, ?, ?, ..., ? and write a single values(?) that still gets replaced by all the parameters in the tuple?
You should always use the DB-API's parameter substitution to avoid SQL injection. Query logging is then relatively easy to add by subclassing sqlite3.Cursor:
import sqlite3
class MyConnection(sqlite3.Connection):
    def cursor(self):
        return super().cursor(MyCursor)
class MyCursor(sqlite3.Cursor):
    def execute(self, sql, parameters=''):
        print(f'statement: {sql!r}, parameters: {parameters!r}')
        return super().execute(sql, parameters)
conn = sqlite3.connect(':memory:', timeout=60, factory=MyConnection)
# Go through conn.cursor() explicitly so the overridden cursor() and execute() are used
cur = conn.cursor()
cur.execute('create table if not exists "test" (id integer, value integer)')
cur.execute('insert into test values (?, ?)', (1, 0))
conn.commit()
yields:
statement: 'create table if not exists "test" (id integer, value integer)', parameters: ''
statement: 'insert into test values (?, ?)', parameters: (1, 0)
To avoid formatting problems and SQL injection attacks, you should always use parameters.
When you want to log the query, you can simply log the parameter list together with the query string.
(SQLite has a function to get the expanded query, but Python does not expose it.)
Each parameter marker corresponds to exactly one value. If writing many markers is too tedious for you, let the computer do it:
parms = (1, 2, 3)
markers = ",".join("?" * len(parms))
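Putting the generated markers into a complete statement; the table t and its columns are hypothetical, and an in-memory SQLite database is assumed:

```python
import sqlite3

parms = (1, 2, 3)
# "?" * 3 is "???"; joining its characters with "," gives "?,?,?"
markers = ",".join("?" * len(parms))
query = f"INSERT INTO t (a, b, c) VALUES ({markers})"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a, b, c)")
conn.execute(query, parms)  # the values themselves are still bound, never formatted in
conn.commit()

print(query)  # INSERT INTO t (a, b, c) VALUES (?,?,?)
print(conn.execute("SELECT * FROM t").fetchall())  # [(1, 2, 3)]
```

Only the number of markers is built with string formatting; the values still travel as parameters, so the quoting and injection problems from the question do not come back.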
