Extracting max date from a database and use output in another query

I want to query the max date in a table and use it as a parameter in the WHERE clause of another query. I am doing this:
query = (""" select
cast(max(order_date) as date)
from
tablename
""")
cursor.execute(query)
d = cursor.fetchone()
Output: [(datetime.date(2021, 9, 8),)]
Then I want to use this output as a parameter in another query:
query3=("""select * from anothertable
where order_date = d::date limit 10""")
cursor.execute(query3)
Output: column "d" does not exist
I tried cast(d as date) and d::date, but neither works. I also tried datetime.date(d), with no success.
What am I doing wrong here?

There is no reason to select the date then use it in another query. That requires 2 round trips to the server. Do it in a single query. This has the advantage of removing all client side processing of that date.
select *
from anothertable
where order_date =
( select max(cast(order_date as date ))
from tablename
);
I am not sure exactly how this translates into your obfuscation layer, but from what I see, I believe it would be something like this:
query = (""" select *
from anothertable
where order_date =
( select max(cast(order_date as date ))
from tablename
) """)
cursor.execute(query)
Heed the warning by @OneCricketeer: you may need to cast order_date in anothertable as well, i.e. where cast(order_date as date) = ( select ... )
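The original error happens because d is a Python variable: the string sent to the server literally contains the text d::date, so PostgreSQL looks for a column named d. If you really do need the value on the client, pass it as a bound parameter. A minimal sketch using Python's built-in sqlite3 module so it runs anywhere; the tables and rows are hypothetical, dates are stored as ISO strings because SQLite has no native date type, and with psycopg2 the placeholder would be %s instead of ?.

```python
import sqlite3

# Hypothetical tables standing in for "tablename" and "anothertable".
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tablename (order_date TEXT)")
cur.execute("CREATE TABLE anothertable (id INTEGER, order_date TEXT)")
cur.executemany("INSERT INTO tablename VALUES (?)",
                [("2021-09-07",), ("2021-09-08",)])
cur.executemany("INSERT INTO anothertable VALUES (?, ?)",
                [(1, "2021-09-07"), (2, "2021-09-08"), (3, "2021-09-08")])

# Two round trips: fetchone() returns a one-element tuple, so unpack it.
cur.execute("SELECT max(order_date) FROM tablename")
(max_date,) = cur.fetchone()

# The value travels as a bound parameter; the SQL text never names
# the Python variable.
cur.execute("SELECT id FROM anothertable WHERE order_date = ? "
            "ORDER BY id LIMIT 10", (max_date,))
two_step = [row[0] for row in cur.fetchall()]

# One round trip: let the database compute the maximum in a subquery.
cur.execute("""
    SELECT id FROM anothertable
    WHERE order_date = (SELECT max(order_date) FROM tablename)
    ORDER BY id LIMIT 10
""")
one_step = [row[0] for row in cur.fetchall()]

print(max_date, two_step, one_step)  # 2021-09-08 [2, 3] [2, 3]
```

With psycopg2, the parameterized form of the second query would be cursor.execute("select * from anothertable where order_date = %s limit 10", (d[0],)), where d is the tuple returned by fetchone().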

Related

Extracting parameters from strings - SQL Server

I have a table with strings in one column that actually store other SQL queries, written earlier and saved to be run later. They contain parameters such as '#organisationId' or '#enterDateHere'. I want to be able to extract these.
Example:
ID | Query
1 | SELECT * FROM table WHERE id = #organisationId
2 | SELECT * FROM topic WHERE creation_time <=#startDate AND creation_time >= #endDate AND id = #enterOrgHere
3 | SELECT name + '#' + domain FROM user
I want the following:
ID | Parameters
1 | #organisationId
2 | #startDate, #endDate, #enterOrgHere
3 | NULL
I don't mind how they are separated or listed, as long as they are clearly visible and the query lists all of them (I don't know how many each query contains). Note that some queries contain a bare # that is not a parameter, for example when an email address is being built. I only want strings that start with #, have at least one letter after it, and end at a non-letter character (space, newline, comma, semicolon). If this causes problems, return all strings starting with # and I will identify the parameters manually.
It can include usage of Excel/Python/C# if needed, but SQL is preferable.

The official way to interrogate the parameters is with sp_describe_undeclared_parameters, e.g.:
exec sp_describe_undeclared_parameters @tsql = N'SELECT * FROM topic WHERE creation_time <=#startDate AND creation_time >= #endDate AND id = #enterOrgHere'

It is very simple to implement by using tokenization via XML and XQuery.
Notable points:
The 1st CROSS APPLY tokenizes the Query column as XML.
The 2nd CROSS APPLY filters out tokens that don't contain the "#" symbol.
SQL #1
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Query VARCHAR(2048));
INSERT INTO @tbl (Query) VALUES
('SELECT * FROM table WHERE id = #organisationId'),
('SELECT * FROM topic WHERE creation_time <=#startDate AND creation_time >= #endDate AND id = #enterOrgHere'),
('SELECT name + ''#'' + domain FROM user');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.ID
, Parameters = IIF(t2.Par LIKE '#[a-z]%', t2.Par, NULL)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(Query, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT TRIM('><=' FROM c.query('data(/root/r[contains(text()[1],"#")])').value('text()[1]','VARCHAR(1024)'))) AS t2(Par)
SQL #2
A cleansing step was added first to normalize whitespace other than regular spaces (tabs, line breaks).
SELECT t.*
, Parameters = IIF(t2.Par LIKE '#[a-z]%', t2.Par, NULL)
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA[' + Query + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t0(pure)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(Pure, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c)
CROSS APPLY (SELECT TRIM('><=' FROM c.query('data(/root/r[contains(text()[1],"#")])')
.value('text()[1]','VARCHAR(1024)'))) AS t2(Par);
Output
ID | Parameters
1 | #organisationId
2 | #startDate #endDate #enterOrgHere
3 | NULL

You can use STRING_SPLIT and then remove the undesired characters. Here's a query:
DROP TABLE IF EXISTS #TEMP
SELECT 1 AS ID ,'SELECT * FROM table WHERE id = #organisationId' AS Query
INTO #TEMP
UNION ALL SELECT 2, 'SELECT * FROM topic WHERE creation_time <=#startDate AND creation_time >= #endDate AND id = #enterOrgHere'
UNION ALL SELECT 3, 'SELECT name + ''#'' + domain FROM user'
;WITH cte as
(
SELECT ID,
Query,
STRING_AGG(REPLACE(REPLACE(REPLACE(value,'<',''),'>',''),'=',''),', ') AS Parameters
FROM #TEMP
CROSS APPLY string_split(Query,' ')
WHERE value LIKE '%#[a-z]%'
GROUP BY ID,
Query
)
SELECT #TEMP.*,cte.Parameters
FROM #TEMP
LEFT JOIN cte on #TEMP.ID = cte.ID

Using SQL Server for parsing is a very bad idea because of its low performance and lack of tools. I highly recommend using a .NET assembly or an external language (since your project is in Python anyway) with a regular expression or another conversion method.
However, as a last resort, you can use something like this extremely slow and generally horrible code (by the way, it only works on SQL Server 2017+; on earlier versions the code would be even more terrible):
DECLARE @sql TABLE
(
id INT PRIMARY KEY IDENTITY
, sql_query NVARCHAR(MAX)
);
INSERT INTO @sql (sql_query)
VALUES (N'SELECT * FROM table WHERE id = #organisationId')
, (N'SELECT * FROM topic WHERE creation_time <=#startDate AND creation_time >= #endDate AND id = #enterOrgHere')
, (N' SELECT name + ''#'' + domain FROM user')
;
WITH prepared AS
(
SELECT id
, IIF(sql_query LIKE '%#%'
, SUBSTRING(sql_query, CHARINDEX('#', sql_query) + 1, LEN(sql_query))
, CHAR(32)
) prep_string
FROM @sql
),
parsed AS
(
SELECT id
, IIF(CHARINDEX(CHAR(32), value) = 0
, SUBSTRING(value, 1, LEN(VALUE))
, SUBSTRING(value, 1, CHARINDEX(CHAR(32), value) -1)
) parsed_value
FROM prepared p
CROSS APPLY STRING_SPLIT(p.prep_string, '#')
)
SELECT id, '#' + STRING_AGG(IIF(parsed_value LIKE '[a-zA-Z]%', parsed_value, NULL) , ', #')
FROM parsed
GROUP BY id
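Following the suggestion above to do the parsing with a regular expression outside SQL Server, here is a minimal Python sketch using the standard re module. It assumes a parameter name is a letter followed by letters, digits or underscores (adjust the pattern if yours differ); extract_parameters is a hypothetical helper name, and the sample queries are the ones from the question.

```python
import re

# '#' followed by a letter, then any run of letters, digits or underscores.
# The match stops at the first other character (space, comma, ';', newline...).
PARAM_RE = re.compile(r"#[A-Za-z]\w*")


def extract_parameters(query):
    """Return the parameters in `query` joined by ', ', or None if there are none."""
    found = PARAM_RE.findall(query)
    return ", ".join(found) if found else None


queries = {
    1: "SELECT * FROM table WHERE id = #organisationId",
    2: "SELECT * FROM topic WHERE creation_time <=#startDate "
       "AND creation_time >= #endDate AND id = #enterOrgHere",
    3: "SELECT name + '#' + domain FROM user",
}

for query_id, text in queries.items():
    print(query_id, extract_parameters(text))
# 1 #organisationId
# 2 #startDate, #endDate, #enterOrgHere
# 3 None
```

The bare '#' in query 3 is skipped because the character after it is a quote, not a letter.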

Why does identical code for SQL Merge (Upsert) work in Microsoft SQL Server console but doesn't work in Python?

I have a function in my main Python file which gets called by main() and executes a SQL Merge (Upsert) statement using pyodbc from a different file & function. Concretely, the SQL statement traverses a source table with transaction details by distinct transaction datetimes and merges customers into a separate target table. The function that executes the statement and the function that returns the completed SQL statement are attached below.
When I run my Python script, it doesn't work as expected and inserts only around 70 rows (sometimes 69, 71, or 72) into the target customer table. However, when I use an identical SQL statement and execute it in the Microsoft SQL Server Management Studio console (attached below), it works fine and inserts 4302 rows (as expected).
I'm not sure what's wrong. I would really appreciate any help!
SQL Statement Executor in Python main file:
def stage_to_dim(connection, cursor, now):
    log(f"Filling {cfg.dim_customer} and {cfg.dim_product}")
    try:
        cursor.execute(sql_statements.stage_to_dim_statement(now))
        connection.commit()
    except Exception as e:
        log(f"Error in stage_to_dim: {e}")
        sys.exit(1)
    log("Stage2Dimensions complete.")
SQL Statement formulator in Python:
def stage_to_dim_statement(now):
    return f"""
DECLARE @dates table(id INT IDENTITY(1,1), date DATETIME)
INSERT INTO @dates (date)
SELECT DISTINCT TransactionDateTime FROM {cfg.stage_table} ORDER BY TransactionDateTime;
DECLARE @i INT;
DECLARE @cnt INT;
DECLARE @date DATETIME;
SELECT @i = MIN(id) - 1, @cnt = MAX(id) FROM @dates;
WHILE @i < @cnt
BEGIN
    SET @i = @i + 1
    SET @date = (SELECT date FROM @dates WHERE id = @i)
    MERGE {cfg.dim_customer} AS Target
    USING (SELECT * FROM {cfg.stage_table} WHERE TransactionDateTime = @date) AS Source
    ON Target.CustomerCodeNK = Source.CustomerID
    WHEN MATCHED THEN
        UPDATE SET Target.AquiredDate = Source.AcquisitionDate, Target.AquiredSource = Source.AcquisitionSource,
        Target.ZipCode = Source.Zipcode, Target.LoadDate = CONVERT(DATETIME, '{now}'), Target.LoadSource = '{cfg.ingest_file_path}'
    WHEN NOT MATCHED THEN
        INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource) VALUES (Source.CustomerID,
        Source.AcquisitionDate, Source.AcquisitionSource, Source.Zipcode, CONVERT(DATETIME,'{now}'), '{cfg.ingest_file_path}');
END
"""
SQL Statement from MS SQL Server Console:
DECLARE @dates table(id INT IDENTITY(1,1), date DATETIME)
INSERT INTO @dates (date)
SELECT DISTINCT TransactionDateTime FROM dbo.STG_CustomerTransactions ORDER BY TransactionDateTime;
DECLARE @i INT;
DECLARE @cnt INT;
DECLARE @date DATETIME;
SELECT @i = MIN(id) - 1, @cnt = MAX(id) FROM @dates;
WHILE @i < @cnt
BEGIN
    SET @i = @i + 1
    SET @date = (SELECT date FROM @dates WHERE id = @i)
    MERGE dbo.DIM_CustomerDup AS Target
    USING (SELECT * FROM dbo.STG_CustomerTransactions WHERE TransactionDateTime = @date) AS Source
    ON Target.CustomerCodeNK = Source.CustomerID
    WHEN MATCHED THEN
        UPDATE SET Target.AquiredDate = Source.AcquisitionDate, Target.AquiredSource = Source.AcquisitionSource,
        Target.ZipCode = Source.Zipcode, Target.LoadDate = CONVERT(DATETIME,'6/30/2022 11:53:05'), Target.LoadSource = '../csv/cleaned_original_data.csv'
    WHEN NOT MATCHED THEN
        INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource) VALUES (Source.CustomerID, Source.AcquisitionDate,
        Source.AcquisitionSource, Source.Zipcode, CONVERT(DATETIME,'6/30/2022 11:53:05'), '../csv/cleaned_original_data.csv');
END

If you think carefully about what your final result ends up being, you are actually just taking the latest row (by date) for each customer. So you can just filter the source using a standard row-number approach.
Exactly why the Python code didn't work properly is unclear, but the query below might work better. You are also injecting values directly into the SQL text (SQL injection), which is dangerous and can also cause correctness problems.
Also, you should always use an unambiguous date format, such as ISO 8601 ('2022-06-30T11:53:05').
MERGE dbo.DIM_CustomerDup AS t
USING (
SELECT *
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY s.CustomerID ORDER BY s.TransactionDateTime DESC)
FROM dbo.STG_CustomerTransactions s
) AS s
WHERE s.rn = 1
) AS s
ON t.CustomerCodeNK = s.CustomerID
WHEN MATCHED THEN
UPDATE SET
AquiredDate = s.AcquisitionDate,
AquiredSource = s.AcquisitionSource,
ZipCode = s.Zipcode,
LoadDate = SYSDATETIME(),
LoadSource = '../csv/cleaned_original_data.csv'
WHEN NOT MATCHED THEN
INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource)
VALUES (s.CustomerID, s.AcquisitionDate, s.AcquisitionSource, s.Zipcode, SYSDATETIME(), '../csv/cleaned_original_data.csv')
;
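To see what the row-number filter does, here is a small sketch with Python's built-in sqlite3 module (assumption: SQLite 3.25 or later, which supports window functions). The stg and dim tables are simplified, hypothetical stand-ins for the staging and dimension tables, and INSERT OR REPLACE is only a SQLite substitute for MERGE. With pyodbc against SQL Server you would run the answer's MERGE and pass values through the same ? placeholder style instead of formatting them into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical staging rows: customer 1 appears twice, customer 2 once.
cur.execute("CREATE TABLE stg (CustomerID INTEGER, "
            "TransactionDateTime TEXT, Zipcode TEXT)")
cur.executemany("INSERT INTO stg VALUES (?, ?, ?)", [
    (1, "2022-06-01 10:00:00", "10001"),
    (1, "2022-06-15 09:30:00", "10002"),
    (2, "2022-06-10 12:00:00", "20001"),
])

# Same filter as the answer's USING clause: number each customer's rows
# from newest to oldest and keep only row 1.
latest = cur.execute("""
    SELECT CustomerID, Zipcode
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY CustomerID
                                  ORDER BY TransactionDateTime DESC) AS rn
        FROM stg
    ) AS s
    WHERE s.rn = 1
    ORDER BY CustomerID
""").fetchall()
print(latest)  # [(1, '10002'), (2, '20001')]

# The load source is passed as a bound parameter rather than being
# formatted into the SQL text.
cur.execute("CREATE TABLE dim (CustomerCodeNK INTEGER PRIMARY KEY, "
            "ZipCode TEXT, LoadSource TEXT)")
load_source = "../csv/cleaned_original_data.csv"
cur.executemany(
    "INSERT OR REPLACE INTO dim (CustomerCodeNK, ZipCode, LoadSource) "
    "VALUES (?, ?, ?)",
    [(cid, zipcode, load_source) for cid, zipcode in latest],
)
print(cur.execute("SELECT COUNT(*) FROM dim").fetchone()[0])  # 2
```

Because each customer contributes exactly one source row, a single set-based statement replaces the loop over distinct dates.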

Python: cx_Oracle does not like how I am entering date

I am trying to do a simple select-all query in Python using the cx_Oracle module. When I select the first ten rows of a table, I am able to print out the output. However, when I select the first ten rows for a specific date, all that gets printed is an empty list: [].
Here is the select-all query that prints out the results:
sql_query = "select * from table_name fetch first 10 rows only"
cur = db_eng.OpenCursor()
db_eng.ExecuteQuery(cur, sql_query)
result = db_eng.FetchResults(cur)
print(result)
The above query works and is able to print out the results.
Here is the query I am having trouble with; it works in SQL Developer:
sql_query = "select * from table_name where requested_time = '01-jul-2021' fetch first 10 rows only"
cur = db_eng.OpenCursor()
db_eng.ExecuteQuery(cur, sql_query)
result = db_eng.FetchResults(cur)
print(result)
I also tried this way where I define the date outside of the query.
specific_date = '01-jul-2021'
sql_query = "select * from table_name where requested_time = '{0}' fetch first 10 rows only".format(specific_date)
cur = db_eng.OpenCursor()
db_eng.ExecuteQuery(cur, sql_query)
result = db_eng.FetchResults(cur)
print(result)

Oracle dates have a time portion. The query
select * from table_name where requested_time = '01-jul-2021' fetch first 10 rows only
will only give you the rows for which requested_time is exactly 01-jul-2021 00:00. Chances are that other rows have a time portion as well.
To cut off the time portion there are several options. Note that I explicitly added a TO_DATE function around the date: you're assuming that the database expects the dd-mon-yyyy format and will successfully do the implicit conversion, but it's safer to tell the database explicitly.
TRUNC the column; this removes the time portion:
SELECT *
FROM table_name
WHERE TRUNC(requested_time) = TO_DATE('01-jul-2021','DD-mon-YYYY')
FETCH FIRST 10 ROWS ONLY
Format the column date to the same format as the date you supplied and compare the resulting string:
SELECT *
FROM table_name
WHERE TO_CHAR(requested_time,'DD-mon-YYYY') = '01-jul-2021'
FETCH FIRST 10 ROWS ONLY
Example:
pdb1--KOEN>create table test_tab(requested_time DATE);
Table TEST_TAB created.
pdb1--KOEN>BEGIN
2 INSERT INTO test_tab(requested_time) VALUES (TO_DATE('08-AUG-2021 00:00','DD-MON-YYYY HH24:MI'));
3 INSERT INTO test_tab(requested_time) VALUES (TO_DATE('08-AUG-2021 01:00','DD-MON-YYYY HH24:MI'));
4 INSERT INTO test_tab(requested_time) VALUES (TO_DATE('08-AUG-2021 02:10','DD-MON-YYYY HH24:MI'));
5 END;
6 /
PL/SQL procedure successfully completed.
pdb1--KOEN>SELECT COUNT(*) FROM test_tab WHERE requested_time = TO_DATE('08-AUG-2021','DD-MON-YYYY');
COUNT(*)
----------
1
-- only 1 row: the one with time 00:00. The other rows are ignored
pdb1--KOEN>SELECT COUNT(*) FROM test_tab WHERE TRUNC(requested_time) = TO_DATE('08-AUG-2021','DD-MON-YYYY');
-- all rows
COUNT(*)
----------
3
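The same behaviour can be reproduced from Python without an Oracle instance. This sketch uses the built-in sqlite3 module as a stand-in (assumption: timestamps stored as ISO-8601 strings, with SQLite's date() playing the role of TRUNC). With cx_Oracle you would keep the TRUNC(...) = TO_DATE(...) query and pass the date through a bind variable (for example :d) instead of .format().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test_tab (requested_time TEXT)")
# Same three rows as the example above: one at midnight, two later that day.
cur.executemany("INSERT INTO test_tab VALUES (?)", [
    ("2021-08-08 00:00:00",),
    ("2021-08-08 01:00:00",),
    ("2021-08-08 02:10:00",),
])

day = "2021-08-08"

# Comparing with the bare day means comparing with midnight only.
exact = cur.execute(
    "SELECT COUNT(*) FROM test_tab WHERE requested_time = ?",
    (day + " 00:00:00",),
).fetchone()[0]

# date() drops the time portion, like TRUNC in Oracle.
# The day is passed as a bind parameter rather than formatted into the SQL.
truncated = cur.execute(
    "SELECT COUNT(*) FROM test_tab WHERE date(requested_time) = ?",
    (day,),
).fetchone()[0]

print(exact, truncated)  # 1 3
```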

Python SQLite best way to use variables in query

Why do I get this error?
sqlite3.OperationalError: near "?": syntax error
when I run this:
c.execute('UPDATE ? SET Quantity = Quantity + ? WHERE Date = ?', (table, amount, date))
But not when I run this?
c.execute('UPDATE table1 SET Quantity = Quantity + ? WHERE Date = ?', (amount, date))
The variable values are:
table = 'table1'
amount = 20
date = '12/5/2014'
I'm trying to dynamically create tables, but it just doesn't work out.

You can't use placeholders for table names. You have to use normal Python string formatting or concatenation.
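Because only values can be placeholders, a common pattern is to check the table name against a known list before formatting it into the statement, while keeping ? for the values. A minimal sketch with sqlite3; the ALLOWED_TABLES set and the add_quantity helper are hypothetical names.

```python
import sqlite3

# Tables the program is allowed to touch (hypothetical whitelist).
ALLOWED_TABLES = {"table1", "table2"}


def add_quantity(conn, table, amount, date):
    # Identifiers cannot be bound, so validate the name before formatting it in.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    # The values are still passed as ? placeholders.
    conn.execute(
        f'UPDATE "{table}" SET Quantity = Quantity + ? WHERE Date = ?',
        (amount, date),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Date TEXT, Quantity INTEGER)")
conn.execute("INSERT INTO table1 VALUES ('12/5/2014', 5)")
add_quantity(conn, "table1", 20, "12/5/2014")
print(conn.execute("SELECT Quantity FROM table1").fetchone()[0])  # 25
```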

Return all values when a condition value is NULL

Suppose I have the following very simple query:
query = 'SELECT * FROM table1 WHERE id = %s'
And I'm calling it from a python sql wrapper, in this case psycopg:
cur.execute(query, (row_id,))
The thing is that if row_id is None, I would like to get all the rows, but that query would return an empty table instead.
The easy way to approach this would be:
if row_id is not None:
    cur.execute(query, (row_id,))
else:
    cur.execute("SELECT * FROM table1")
Of course this is non-idiomatic and gets unnecessarily complex with non-trivial queries. I guess there is a way to handle this in the SQL itself, but I couldn't find anything. What is the right way?

Try using the COALESCE function, as below:
query = 'SELECT * FROM table1 WHERE id = COALESCE(%s,id)'

SELECT * FROM table1 WHERE id = %s OR %s IS NULL
But depending on how the variable is passed to the query, it might be better to map None to 0 (assuming 0 is never a real id):
SELECT * FROM table1 WHERE id = %s OR %s = 0
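With positional placeholders the value has to be supplied twice, e.g. cur.execute(query, (row_id, row_id)). A named placeholder avoids the repetition. A sketch with the built-in sqlite3 module (table and rows are hypothetical; fetch_ids is an illustrative helper); psycopg2 accepts the equivalent %(row_id)s style with a dict.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# A named parameter can appear twice in the SQL while being supplied once.
QUERY = ("SELECT id FROM table1 "
         "WHERE id = :row_id OR :row_id IS NULL ORDER BY id")


def fetch_ids(row_id):
    # When row_id is None, ":row_id IS NULL" is true for every row.
    return [r[0] for r in conn.execute(QUERY, {"row_id": row_id})]


print(fetch_ids(2))     # [2]
print(fetch_ids(None))  # [1, 2, 3]
```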
