I have a function in my main Python file which gets called by main() and executes a SQL Merge (Upsert) statement using pyodbc from a different file & function. Concretely, the SQL statement traverses a source table with transaction details by distinct transaction datetimes and merges customers into a separate target table. The function that executes the statement and the function that returns the completed SQL statement are attached below.
When I run my Python script, it doesn't work as expected and inserts only around 70 rows (sometimes 69, 71, or 72) into the target customer table. However, when I use an identical SQL statement and execute it in the Microsoft SQL Server Management Studio console (attached below), it works fine and inserts 4302 rows (as expected).
I'm not sure what's wrong. I would really appreciate any help!
SQL Statement Executor in Python main file:
def stage_to_dim(connection, cursor, now):
    log(f"Filling {cfg.dim_customer} and {cfg.dim_product}")
    try:
        cursor.execute(sql_statements.stage_to_dim_statement(now))
        connection.commit()
    except Exception as e:
        log(f"Error in stage_to_dim: {e}")
        sys.exit(1)
    log("Stage2Dimensions complete.")
SQL Statement formulator in Python:
def stage_to_dim_statement(now):
    return f"""
        DECLARE @dates table(id INT IDENTITY(1,1), date DATETIME)
        INSERT INTO @dates (date)
        SELECT DISTINCT TransactionDateTime FROM {cfg.stage_table} ORDER BY TransactionDateTime;
        DECLARE @i INT;
        DECLARE @cnt INT;
        DECLARE @date DATETIME;
        SELECT @i = MIN(id) - 1, @cnt = MAX(id) FROM @dates;
        WHILE @i < @cnt
        BEGIN
            SET @i = @i + 1
            SET @date = (SELECT date FROM @dates WHERE id = @i)
            MERGE {cfg.dim_customer} AS Target
            USING (SELECT * FROM {cfg.stage_table} WHERE TransactionDateTime = @date) AS Source
            ON Target.CustomerCodeNK = Source.CustomerID
            WHEN MATCHED THEN
                UPDATE SET Target.AquiredDate = Source.AcquisitionDate, Target.AquiredSource = Source.AcquisitionSource,
                Target.ZipCode = Source.Zipcode, Target.LoadDate = CONVERT(DATETIME, '{now}'), Target.LoadSource = '{cfg.ingest_file_path}'
            WHEN NOT MATCHED THEN
                INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource) VALUES (Source.CustomerID,
                Source.AcquisitionDate, Source.AcquisitionSource, Source.Zipcode, CONVERT(DATETIME,'{now}'), '{cfg.ingest_file_path}');
        END
    """
SQL Statement from MS SQL Server Console:
DECLARE @dates table(id INT IDENTITY(1,1), date DATETIME)
INSERT INTO @dates (date)
SELECT DISTINCT TransactionDateTime FROM dbo.STG_CustomerTransactions ORDER BY TransactionDateTime;
DECLARE @i INT;
DECLARE @cnt INT;
DECLARE @date DATETIME;
SELECT @i = MIN(id) - 1, @cnt = MAX(id) FROM @dates;
WHILE @i < @cnt
BEGIN
    SET @i = @i + 1
    SET @date = (SELECT date FROM @dates WHERE id = @i)
    MERGE dbo.DIM_CustomerDup AS Target
    USING (SELECT * FROM dbo.STG_CustomerTransactions WHERE TransactionDateTime = @date) AS Source
    ON Target.CustomerCodeNK = Source.CustomerID
    WHEN MATCHED THEN
        UPDATE SET Target.AquiredDate = Source.AcquisitionDate, Target.AquiredSource = Source.AcquisitionSource,
        Target.ZipCode = Source.Zipcode, Target.LoadDate = CONVERT(DATETIME,'6/30/2022 11:53:05'), Target.LoadSource = '../csv/cleaned_original_data.csv'
    WHEN NOT MATCHED THEN
        INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource) VALUES (Source.CustomerID, Source.AcquisitionDate,
        Source.AcquisitionSource, Source.Zipcode, CONVERT(DATETIME,'6/30/2022 11:53:05'), '../csv/cleaned_original_data.csv');
END
If you think carefully about what your final result ends up as, you are actually just taking the latest row (by date) for each customer. So you can drop the loop and filter the source using a standard row-number approach.
Exactly why the Python code didn't work properly is unclear, but the query below might work better. You are also interpolating values directly into the SQL string, which opens the statement to SQL injection; that is dangerous and can also cause correctness problems.
Also, you should always use a non-ambiguous date format.
MERGE dbo.DIM_CustomerDup AS t
USING (
    SELECT *
    FROM (
        SELECT *,
            rn = ROW_NUMBER() OVER (PARTITION BY s.CustomerID ORDER BY s.TransactionDateTime DESC)
        FROM dbo.STG_CustomerTransactions s
    ) AS s
    WHERE s.rn = 1
) AS s
ON t.CustomerCodeNK = s.CustomerID
WHEN MATCHED THEN
    UPDATE SET
        AquiredDate = s.AcquisitionDate,
        AquiredSource = s.AcquisitionSource,
        ZipCode = s.Zipcode,
        LoadDate = SYSDATETIME(),
        LoadSource = '../csv/cleaned_original_data.csv'
WHEN NOT MATCHED THEN
    INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource)
    VALUES (s.CustomerID, s.AcquisitionDate, s.AcquisitionSource, s.Zipcode, SYSDATETIME(), '../csv/cleaned_original_data.csv')
;
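On the Python side, the injection point can be avoided by passing the load source as a query parameter. The following is only a minimal sketch, assuming a pyodbc-style cursor that uses `?` parameter markers; the function name `merge_latest_customers` and the constant `MERGE_LATEST_CUSTOMERS` are hypothetical and not part of the original script. Table names cannot be bound as parameters, so they stay in the statement text.

```python
# Sketch only: assumes a pyodbc-style cursor (qmark "?" parameter markers).
# The names MERGE_LATEST_CUSTOMERS and merge_latest_customers are hypothetical.
MERGE_LATEST_CUSTOMERS = """
MERGE dbo.DIM_CustomerDup AS t
USING (
    SELECT *
    FROM (
        SELECT *,
            rn = ROW_NUMBER() OVER (PARTITION BY s.CustomerID ORDER BY s.TransactionDateTime DESC)
        FROM dbo.STG_CustomerTransactions s
    ) AS s
    WHERE s.rn = 1
) AS s
ON t.CustomerCodeNK = s.CustomerID
WHEN MATCHED THEN
    UPDATE SET
        AquiredDate = s.AcquisitionDate,
        AquiredSource = s.AcquisitionSource,
        ZipCode = s.Zipcode,
        LoadDate = SYSDATETIME(),
        LoadSource = ?
WHEN NOT MATCHED THEN
    INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource)
    VALUES (s.CustomerID, s.AcquisitionDate, s.AcquisitionSource, s.Zipcode, SYSDATETIME(), ?)
;
"""


def merge_latest_customers(connection, cursor, load_source):
    """Run the MERGE once, binding load_source to both markers, then commit."""
    # The driver sends the value separately from the SQL text,
    # so no quoting or escaping is done by hand.
    cursor.execute(MERGE_LATEST_CUSTOMERS, (load_source, load_source))
    connection.commit()
```

With the script from the question, this would be called as `merge_latest_customers(connection, cursor, cfg.ingest_file_path)`.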
I am working in Python/Django with a MySQL database. This SQL query works fine in MySQL Workbench:
SELECT * FROM frontend_appliance_sensor_reading
WHERE id = (SELECT max(id) FROM frontend_appliance_sensor_reading WHERE sensor_id = hex(x'28E86479972003D2') AND
appliance_id = 185)
I am returning the latest record for a sensor. The hex string is the sensor ID that I need to pass in as a variable in Python.
Here is my Python function that should return this record:
def get_last_sensor_reading(appliance_id, sensor_id):
    dw_conn = connection
    dw_cur = dw_conn.cursor()
    appliance_id_lookup = appliance_id
    dw_cur.execute('''
        SELECT * FROM frontend_appliance_sensor_reading as sr
        WHERE id = (SELECT max(id) FROM frontend_appliance_sensor_reading WHERE sensor_id = hex(x{sensor_id_lookup}) AND
        appliance_id = {appliance_id_lookup})
        '''.format(appliance_id_lookup=appliance_id_lookup, sensor_id_lookup = str(sensor_id)))
    values = dw_cur.fetchall()
    dw_cur.close()
    dw_conn.close()
    print values
However, it seems to concatenate the x with the variable like this:
(1054, "Unknown column 'x9720032F0100DE86' in 'where clause'")
I have tried various string combinations to get it to execute correctly with no luck. What am I missing? Does the x actually get interpreted as a str? Or should I be converting it to something else prior?
Also, I cannot use Django's ORM for this as the sensor id is stored in a BinaryField as BLOB data. You cannot filter by BLOB data in Django. This is the reason I am using a SQL command instead of just doing SensorReading.objects.filter(sensor_id = sensor).latest('id')
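One likely reading of the error is that the formatted string contains `x9720032F0100DE86` without quotes, which MySQL parses as a column name; a hexadecimal literal needs the form `x'...'`. The sketch below first shows the difference in plain Python, then builds a parameterized variant. It assumes `sensor_id` arrives as a hex string and that `HEX(UNHEX(%s))` is an acceptable stand-in for `hex(x'...')`; the helper name `build_last_reading_query` is hypothetical.

```python
# Sketch only: build_last_reading_query is a hypothetical helper,
# and sensor_id is assumed to be a hex string such as "9720032F0100DE86".
sensor_id = "9720032F0100DE86"

# What the original format call produces: no quotes, so MySQL sees an identifier.
unquoted = "hex(x{})".format(sensor_id)

# A MySQL hexadecimal literal needs quotes around the digits.
quoted = "hex(x'{}')".format(sensor_id)


def build_last_reading_query(appliance_id, sensor_id_hex):
    """Return (sql, params) with both values left to the database driver."""
    sql = """
        SELECT * FROM frontend_appliance_sensor_reading
        WHERE id = (
            SELECT MAX(id) FROM frontend_appliance_sensor_reading
            WHERE sensor_id = HEX(UNHEX(%s)) AND appliance_id = %s
        )
    """
    return sql, [str(sensor_id_hex), int(appliance_id)]
```

With a Django cursor this could be run as `dw_cur.execute(*build_last_reading_query(185, "28E86479972003D2"))`, which keeps the values out of the SQL text entirely.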
I have a query that uses EXCEPT. I want to pass the schema and table names in with format when running the select query.
query_2 = """select *
from {}.{}
where date(etl_date) = current_date
except select *
from {}_test.{}
where date(etl_date)=current_date""".format(liste[0], liste[1])
But naturally I get an error like this.
IndexError: tuple index out of range
How else can I use the format function here? Thanks...
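The error occurs because the template contains four {} placeholders while format receives only two arguments. Beyond that, do not use plain string formatting for SQL queries; use sql.Identifier for tables and fields, and use the second argument of the execute method to pass values (if needed).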
import psycopg2
from psycopg2.sql import Identifier, SQL
connection = psycopg2.connect("...")
cursor = connection.cursor()
suffix = "_test"
identifiers = [Identifier("some_schema"), Identifier("some_table"), Identifier("other_schema%s" % suffix), Identifier("other_table")]
query_2 = SQL("""select * from {}.{} where date(etl_date) = current_date
except select * from {}.{} where date(etl_date)=current_date""").format(*identifiers)
print(query_2.as_string(cursor)) # if you want to see the final query
cursor.execute(query_2)
Output
select * from "some_schema"."some_table" where date(etl_date) = current_date
except select * from "other_schema_test"."other_table" where date(etl_date)=current_date
This assumes you have multiple schemas in the same database as you can't easily do cross database queries in PostgreSQL.
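The placeholder mechanics behind the IndexError can be reproduced in plain Python. The sketch below is for illustration only and uses hypothetical values for `liste`; for real queries the SQL and Identifier approach above still applies, and psycopg2's `SQL.format` is documented to accept numbered placeholders in the same way.

```python
# Illustration of str.format placeholder counting; the values in liste are hypothetical.
liste = ("some_schema", "some_table")

# Four auto-numbered placeholders but only two arguments: format raises IndexError.
template_auto = "select * from {}.{} except select * from {}_test.{}"
error = None
try:
    template_auto.format(liste[0], liste[1])
except IndexError as exc:
    error = exc

# Numbered placeholders let each argument be reused as often as needed.
template_indexed = "select * from {0}.{1} except select * from {0}_test.{1}"
result = template_indexed.format(liste[0], liste[1])
```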
I am retrieving a value from a SQLITE table and then using this value to retrieve data from another table using a SELECT FROM WHERE statement. I cannot use the retrieved value to query the other table even though the values appear to match when I retrieve this value independently from the table I am querying. I get Error binding parameter 0 - probably unsupported type. I am passing the value I believe correctly from what I've read in the docs, but there is obviously something wrong.
Edit: The ultimate goal is to select the Name and EndTime, then insert the EndTime value into another table in the same db wherever a column value equals Name in that table. I added the UPDATE code below to give an idea of what I'm attempting to accomplish.
When I print nameItem it appears as (u'Something'), so it looks like a unicode string. I don't know if this is an encoding issue, but I have used similar queries before and not run into this issue.
I have tried to use the text I am searching for directly in the SELECT query and this works, but when passing it as a variable I still get unsupported type.
c.execute('SELECT Name FROM Expected WHERE Timing = 1')
timeList = c.fetchall()
for i in range(len(timeList)):
    nameItem = timeList[i]
    c.execute('SELECT "EndTime" FROM Expected WHERE "Name" = ?', (nameItem,))
    end = c.fetchone()
conn = sqlite3.connect(otherDb)
c = conn.cursor()
c.execute('UPDATE Individuals SET Ending = end WHERE NAME = NameItem')
I expect this to retrieve a time associated with the current value of nameItem.
The first fix to try is to pass the string instead of the whole tuple (note that timeList is a list of tuples, one per row):
nameItem = timeList[i][0]
But it is better not to index through len and to iterate over the rows directly, something like:
c.execute('SELECT Name FROM Expected WHERE Timing = 1')
timeList = c.fetchall()
for elem in timeList:
    nameItem = elem[0]
    c.execute('SELECT "EndTime" FROM Expected WHERE "Name" = ?', (nameItem,))
    end = c.fetchone()
Even better is to fetch EndTime in the first query (which looks acceptable in the given context):
c.execute('SELECT Name, EndTime FROM Expected WHERE Timing = 1')
timeList = c.fetchall()
for elem in timeList:
    name = elem[0]
    end_time = elem[1]
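Putting the pieces together, here is a minimal runnable sketch of the select-then-update flow against an in-memory database. The table layouts and sample rows are assumptions made for illustration, and both tables are placed in the same connection as described in the question's edit.

```python
import sqlite3

# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE Expected (Name TEXT, EndTime TEXT, Timing INTEGER)")
c.execute("CREATE TABLE Individuals (Name TEXT, Ending TEXT)")
c.executemany(
    "INSERT INTO Expected VALUES (?, ?, ?)",
    [("Alice", "10:30", 1), ("Bob", "11:45", 0)],
)
c.executemany("INSERT INTO Individuals (Name) VALUES (?)", [("Alice",), ("Bob",)])

# fetchall returns a list of tuples, one tuple per row.
c.execute("SELECT Name, EndTime FROM Expected WHERE Timing = 1")
rows = c.fetchall()

# Unpack each row and bind plain values, never the row tuple itself.
for name, end_time in rows:
    c.execute("UPDATE Individuals SET Ending = ? WHERE Name = ?", (end_time, name))
conn.commit()

updated = c.execute("SELECT Name, Ending FROM Individuals ORDER BY Name").fetchall()
```

Only rows with Timing = 1 are selected, so in this sample data only the Alice row in Individuals receives an Ending value.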
I'm pretty new to SQL and the Sqlite3 module and I want to edit the timestamps of all the records in my DB randomly.
import sqlite3
from time import time
import random
conn = sqlite3.connect('database.db')
c = sqlite3.Cursor(conn)
ts_new = round(time())
ts_old = 1537828957
difference = ts_new - ts_old
for i in range(1,309):
    #getting a new, random timestamp
    new_ts = ts_old + random.randint(0, difference)
    t = (new_ts, i)
    c.executemany("UPDATE questions SET timestamp = (?) WHERE rowid = (?)", t)
#conn.commit()
When run, I get a ValueError: parameters are of unsupported type.
To add the timestamp value originally, I set t to a tuple with the current UNIX timestamp as its first value, e.g. (1537828957, ). Is this error displayed because I've used two (?) placeholders, unlike the single one I used in the statement that added the timestamps to begin with?
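You're using executemany instead of execute. executemany takes an iterable of parameter tuples and executes the query once for each tuple; here it receives a single tuple of integers, so each integer is treated as a whole parameter set, hence the error.
You want to use execute instead; it executes the query once using your tuple.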
c.execute('UPDATE questions SET timestamp = (?) where rowid = (?)', t)
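The difference between the two calls can be seen side by side in a small sketch. It runs against an in-memory database with a hypothetical five-row questions table, and uses a fixed upper bound in place of round(time()) so the range is known.

```python
import random
import sqlite3

# In-memory stand-in for database.db with five rows (rowid 1..5); illustration only.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE questions (timestamp INTEGER)")
c.executemany("INSERT INTO questions (timestamp) VALUES (?)", [(0,)] * 5)

ts_old = 1537828957
ts_new = ts_old + 1000  # fixed stand-in for round(time())

# Option 1: execute runs the statement once with a single parameter tuple.
for rowid in range(1, 6):
    t = (random.randint(ts_old, ts_new), rowid)
    c.execute("UPDATE questions SET timestamp = ? WHERE rowid = ?", t)

# Option 2: executemany runs the statement once per tuple in the list.
params = [(random.randint(ts_old, ts_new), rowid) for rowid in range(1, 6)]
c.executemany("UPDATE questions SET timestamp = ? WHERE rowid = ?", params)
conn.commit()

timestamps = [row[0] for row in c.execute("SELECT timestamp FROM questions")]
```

Either option alone is enough; the second simply moves the loop into the database call.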