How to use a SQL View with Pony ORM - python

I'm trying to fetch the data returned by a View in MySQL using Pony ORM, but the documentation does not explain how to do this (at least, I haven't found a solution so far). Can Pony ORM do this? If so, what should I do to get it working?
Here is my MySQL View:
CREATE
ALGORITHM = UNDEFINED
DEFINER = `admin`@`%`
SQL SECURITY DEFINER
VIEW `ResidueCountByDate` AS
SELECT
CAST(`ph`.`DATE` AS DATE) AS `Date`,
COUNT(`art`.`RESIDUE_TYPE_ID`) AS `Aluminum Count`,
COUNT(`prt`.`RESIDUE_TYPE_ID`) AS `PET Count`
FROM
((((`TBL_PROCESS_HISTORY` `ph`
JOIN `TBL_RESIDUE` `pr` ON ((`ph`.`RESIDUE_ID` = `pr`.`RESIDUE_ID`)))
LEFT JOIN `TBL_RESIDUE_TYPE` `prt` ON (((`pr`.`RESIDUE_TYPE_ID` = `prt`.`RESIDUE_TYPE_ID`)
AND (`prt`.`DESCRIPTION` = 'PET'))))
JOIN `TBL_RESIDUE` `ar` ON ((`ph`.`RESIDUE_ID` = `ar`.`RESIDUE_ID`)))
LEFT JOIN `TBL_RESIDUE_TYPE` `art` ON (((`ar`.`RESIDUE_TYPE_ID` = `art`.`RESIDUE_TYPE_ID`)
AND (`art`.`DESCRIPTION` = 'ALUMINUM'))))
GROUP BY CAST(`ph`.`DATE` AS DATE)
ORDER BY CAST(`ph`.`DATE` AS DATE)

You can try one of the following:
1) Define a new entity and specify the view name as a table name for that entity:
from datetime import date
from pony.orm import PrimaryKey, Required, db_session, select

class ResidueCountByDate(db.Entity):
    dt = PrimaryKey(date, column='Date')
    aluminum_count = Required(int, column='Aluminum Count')
    pet_count = Required(int, column='PET Count')
After that you can use that entity to select data from the view:
with db_session:
    start_date = date(2017, 1, 1)
    query = select(rc for rc in ResidueCountByDate if rc.dt >= start_date)
    for rc in query:
        print(rc.dt, rc.aluminum_count, rc.pet_count)
By default, a column name is equal to the attribute name. I specified the column explicitly for each attribute because Python attribute names cannot contain spaces and are usually written in lowercase.
It is possible to specify the table name explicitly if it differs from the entity name:
class ResidueCount(db.Entity):
    _table_ = 'ResidueCountByDate'
    ...
2) You can write a raw SQL query without defining any entity:
with db_session:
    start_date = date(2017, 1, 1)
    rows = db.select('''
        SELECT `Date` AS dt, `Aluminum Count` AS ac, `PET Count` AS pc
        FROM `ResidueCountByDate`
        WHERE `Date` >= $start_date
    ''')
    for row in rows:
        print(row[0], row[1], row[2])
        print(row.dt, row.ac, row.pc)  # the same as the previous line
If a column name can be used as a Python identifier (i.e. it does not contain spaces or special characters), you can access the column value using dot notation, as in the last line.
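Pony aside, the underlying point is that a view is selected from exactly like a table, and column names containing spaces only need quoting. A minimal sketch using Python's built-in sqlite3 module in place of MySQL and Pony, to show that point in isolation; the process_history table, its rows and the simplified view are hypothetical stand-ins for the tables in the question, and SQLite quotes identifiers with double quotes where MySQL uses backticks.

```python
import sqlite3
from datetime import date

# In-memory SQLite database standing in for MySQL (hypothetical table and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE process_history (id INTEGER PRIMARY KEY, day TEXT, residue TEXT);
INSERT INTO process_history (day, residue) VALUES
    ('2016-12-31', 'PET'),
    ('2017-01-01', 'PET'),
    ('2017-01-01', 'ALUMINUM'),
    ('2017-01-02', 'ALUMINUM');

-- Simplified view whose column names contain spaces, like the one in the question.
CREATE VIEW ResidueCountByDate AS
SELECT day AS "Date",
       SUM(residue = 'ALUMINUM') AS "Aluminum Count",
       SUM(residue = 'PET') AS "PET Count"
FROM process_history
GROUP BY day;
""")


def residue_counts_since(conn, start_date):
    """Select from the view exactly as from a table, binding the date as a parameter."""
    cursor = conn.execute(
        'SELECT "Date", "Aluminum Count", "PET Count" '
        'FROM ResidueCountByDate WHERE "Date" >= ? ORDER BY "Date"',
        (start_date.isoformat(),),
    )
    return cursor.fetchall()


print(residue_counts_since(conn, date(2017, 1, 1)))
# [('2017-01-01', 1, 1), ('2017-01-02', 1, 0)]
```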

Related

Why does identical code for SQL Merge (Upsert) work in Microsoft SQL Server console but doesn't work in Python?

I have a function in my main Python file which gets called by main() and executes a SQL Merge (Upsert) statement using pyodbc from a different file & function. Concretely, the SQL statement traverses a source table with transaction details by distinct transaction datetimes and merges customers into a separate target table. The function that executes the statement and the function that returns the completed SQL statement are attached below.
When I run my Python script, it doesn't work as expected and inserts only around 70 rows (sometimes 69, 71, or 72) into the target customer table. However, when I use an identical SQL statement and execute it in the Microsoft SQL Server Management Studio console (attached below), it works fine and inserts 4302 rows (as expected).
I'm not sure what's wrong. I would really appreciate any help!
SQL Statement Executor in Python main file:
def stage_to_dim(connection, cursor, now):
    log(f"Filling {cfg.dim_customer} and {cfg.dim_product}")
    try:
        cursor.execute(sql_statements.stage_to_dim_statement(now))
        connection.commit()
    except Exception as e:
        log(f"Error in stage_to_dim: {e}")
        sys.exit(1)
    log("Stage2Dimensions complete.")
SQL Statement formulator in Python:
def stage_to_dim_statement(now):
    return f"""
DECLARE @dates table(id INT IDENTITY(1,1), date DATETIME)
INSERT INTO @dates (date)
SELECT DISTINCT TransactionDateTime FROM {cfg.stage_table} ORDER BY TransactionDateTime;
DECLARE @i INT;
DECLARE @cnt INT;
DECLARE @date DATETIME;
SELECT @i = MIN(id) - 1, @cnt = MAX(id) FROM @dates;
WHILE @i < @cnt
BEGIN
    SET @i = @i + 1
    SET @date = (SELECT date FROM @dates WHERE id = @i)
    MERGE {cfg.dim_customer} AS Target
    USING (SELECT * FROM {cfg.stage_table} WHERE TransactionDateTime = @date) AS Source
    ON Target.CustomerCodeNK = Source.CustomerID
    WHEN MATCHED THEN
        UPDATE SET Target.AquiredDate = Source.AcquisitionDate, Target.AquiredSource = Source.AcquisitionSource,
        Target.ZipCode = Source.Zipcode, Target.LoadDate = CONVERT(DATETIME, '{now}'), Target.LoadSource = '{cfg.ingest_file_path}'
    WHEN NOT MATCHED THEN
        INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource) VALUES (Source.CustomerID,
        Source.AcquisitionDate, Source.AcquisitionSource, Source.Zipcode, CONVERT(DATETIME,'{now}'), '{cfg.ingest_file_path}');
END
"""
SQL Statement from MS SQL Server Console:
DECLARE @dates table(id INT IDENTITY(1,1), date DATETIME)
INSERT INTO @dates (date)
SELECT DISTINCT TransactionDateTime FROM dbo.STG_CustomerTransactions ORDER BY TransactionDateTime;
DECLARE @i INT;
DECLARE @cnt INT;
DECLARE @date DATETIME;
SELECT @i = MIN(id) - 1, @cnt = MAX(id) FROM @dates;
WHILE @i < @cnt
BEGIN
    SET @i = @i + 1
    SET @date = (SELECT date FROM @dates WHERE id = @i)
    MERGE dbo.DIM_CustomerDup AS Target
    USING (SELECT * FROM dbo.STG_CustomerTransactions WHERE TransactionDateTime = @date) AS Source
    ON Target.CustomerCodeNK = Source.CustomerID
    WHEN MATCHED THEN
        UPDATE SET Target.AquiredDate = Source.AcquisitionDate, Target.AquiredSource = Source.AcquisitionSource,
        Target.ZipCode = Source.Zipcode, Target.LoadDate = CONVERT(DATETIME,'6/30/2022 11:53:05'), Target.LoadSource = '../csv/cleaned_original_data.csv'
    WHEN NOT MATCHED THEN
        INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource) VALUES (Source.CustomerID, Source.AcquisitionDate,
        Source.AcquisitionSource, Source.Zipcode, CONVERT(DATETIME,'6/30/2022 11:53:05'), '../csv/cleaned_original_data.csv');
END
If you think carefully about what your final result is, you are actually just taking the latest row (by date) for each customer. So you can simply filter the source using a standard row-number approach.
Exactly why the Python code didn't work properly is unclear, but the query below might work better. Your code is also open to SQL injection, because values are formatted directly into the SQL string; this is dangerous and can also cause correctness problems.
You should also always use an unambiguous date format, such as ISO 8601 ('2022-06-30T11:53:05').
MERGE dbo.DIM_CustomerDup AS t
USING (
SELECT *
FROM (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY s.CustomerID ORDER BY s.TransactionDateTime DESC)
FROM dbo.STG_CustomerTransactions s
) AS s
WHERE s.rn = 1
) AS s
ON t.CustomerCodeNK = s.CustomerID
WHEN MATCHED THEN
UPDATE SET
AquiredDate = s.AcquisitionDate,
AquiredSource = s.AcquisitionSource,
ZipCode = s.Zipcode,
LoadDate = SYSDATETIME(),
LoadSource = '../csv/cleaned_original_data.csv'
WHEN NOT MATCHED THEN
INSERT (CustomerCodeNK, AquiredDate, AquiredSource, ZipCode, LoadDate, LoadSource)
VALUES (s.CustomerID, s.AcquisitionDate, s.AcquisitionSource, s.Zipcode, SYSDATETIME(), '../csv/cleaned_original_data.csv')
;
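The latest-row-per-customer filter can be tried locally. The sketch below uses Python's sqlite3 module instead of SQL Server and pyodbc, so it is only an approximation: SQLite has no MERGE, so INSERT ... ON CONFLICT DO UPDATE stands in for it, and the stg/dim tables with their reduced columns are hypothetical. It also passes the load date and source as ? parameters (the same qmark style pyodbc uses) and sends the date as an ISO 8601 string, instead of formatting either into the SQL text. It assumes SQLite 3.25 or newer for ROW_NUMBER().

```python
import sqlite3
from datetime import datetime

# Hypothetical, trimmed-down staging and dimension tables in an in-memory SQLite DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg (CustomerID INTEGER, TransactionDateTime TEXT, Zipcode TEXT);
CREATE TABLE dim (CustomerCodeNK INTEGER NOT NULL UNIQUE, ZipCode TEXT,
                  LoadDate TEXT, LoadSource TEXT);
INSERT INTO stg VALUES
    (1, '2022-06-01T10:00:00', '11111'),
    (1, '2022-06-03T09:00:00', '22222'),
    (2, '2022-06-02T08:00:00', '33333');
INSERT INTO dim VALUES (1, '00000', '2022-01-01T00:00:00', 'old.csv');
""")

# Keep only the latest staging row per customer, then insert-or-update it.
UPSERT_LATEST = """
INSERT INTO dim (CustomerCodeNK, ZipCode, LoadDate, LoadSource)
SELECT CustomerID, Zipcode, ?, ?
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY s.CustomerID
                              ORDER BY s.TransactionDateTime DESC) AS rn
    FROM stg AS s
) AS latest
WHERE latest.rn = 1
ON CONFLICT (CustomerCodeNK) DO UPDATE SET
    ZipCode = excluded.ZipCode,
    LoadDate = excluded.LoadDate,
    LoadSource = excluded.LoadSource
"""


def load_dim(conn, load_date, load_source):
    # Both values travel as ? parameters; the date is sent in ISO 8601 form.
    conn.execute(UPSERT_LATEST, (load_date.isoformat(timespec="seconds"), load_source))
    conn.commit()


load_dim(conn, datetime(2022, 6, 30, 11, 53, 5), "../csv/cleaned_original_data.csv")
for row in conn.execute("SELECT CustomerCodeNK, ZipCode, LoadDate FROM dim ORDER BY CustomerCodeNK"):
    print(row)
# (1, '22222', '2022-06-30T11:53:05')
# (2, '33333', '2022-06-30T11:53:05')
```

Running load_dim a second time updates the two existing rows instead of inserting duplicates, which is the behaviour the MERGE is meant to provide.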

Illegal Variable Name/Number when Passing in Python List

I'm trying to run SQL statements through Python on a list.
I pass in a list, in this case of dates, because I want to run multiple SELECT queries and return their results.
I've tested this by passing in integers, but when I pass in a date I get the error ORA-01036: illegal variable name/number. I'm using an Oracle DB.
cursor = connection.cursor()
date = ["'01-DEC-21'", "'02-DEC-21'"]
sql = "select * from table1 where datestamp = :date"
for item in date:
    cursor.execute(sql, id=item)
    res = cursor.fetchall()
    print(res)
Any suggestions to make this run?
You can't name a bind variable date: DATE is a reserved word in Oracle, so it is an illegal bind variable name. Also, the keyword argument name in cursor.execute must match the bind variable name. Try something like:
sql = "select * from table1 where datestamp = :date_input"
for item in date:
    cursor.execute(sql, date_input=item)
    res = cursor.fetchall()
    print(res)
Some recommendations and warnings about your approach:
You should not depend on your default NLS date setting when binding a string (e.g. "'01-DEC-21'") to a DATE column. (You probably also need to remove the inner single quotes.)
Avoid fetching data in a loop if you can fetch it in one query (using an IN list).
Use a prepared statement.
Example
date = ['01-DEC-21', '02-DEC-21']
This generates the query that uses bind variables for your input list
in_list = ','.join([f" TO_DATE(:d{ind},'DD-MON-RR','NLS_DATE_LANGUAGE = American')" for ind, d in enumerate(date)])
sql_query = "select * from table1 where datestamp in ( " + in_list + " )"
The generated sql_query is
select * from table1 where datestamp in
( TO_DATE(:d0,'DD-MON-RR','NLS_DATE_LANGUAGE = American'), TO_DATE(:d1,'DD-MON-RR','NLS_DATE_LANGUAGE = American') )
Note that the IN list contains one bind variable for each member of your input list.
Note also the use of TO_DATE with an explicit format mask and a fixed language, to avoid problems with the interpretation of the month abbreviation (e.g. ORA-01843: not a valid month).
Now you can use the query to fetch the data in one pass
cursor.prepare(sql_query)
cursor.execute(None, date)
res = cursor.fetchall()
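The same idea can be wrapped in a small helper. A sketch, assuming Python 3.7+: it passes named binds as a dict instead of a positional list (cx_Oracle's cursor.execute accepts either), refuses an empty list (IN () is invalid SQL) and refuses more than 1000 values, Oracle's limit for an expression list. The function name build_date_in_query is made up for illustration, and only the string building is shown, with no database connection.

```python
from typing import Dict, Sequence, Tuple

# Oracle rejects IN lists with more than 1000 expressions (ORA-01795).
ORACLE_MAX_IN_LIST = 1000

# One explicit-format TO_DATE per value, as in the answer above.
DATE_EXPR = "TO_DATE(:{name}, 'DD-MON-RR', 'NLS_DATE_LANGUAGE = American')"


def build_date_in_query(dates: Sequence[str]) -> Tuple[str, Dict[str, str]]:
    """Return (sql, binds) with one named bind variable per input date."""
    if not dates:
        raise ValueError("an IN list needs at least one value")
    if len(dates) > ORACLE_MAX_IN_LIST:
        raise ValueError("split the input into chunks of at most 1000 values")
    names = [f"d{i}" for i in range(len(dates))]
    in_list = ", ".join(DATE_EXPR.format(name=n) for n in names)
    sql = f"select * from table1 where datestamp in ({in_list})"
    return sql, dict(zip(names, dates))


sql, binds = build_date_in_query(["01-DEC-21", "02-DEC-21"])
print(sql)
print(binds)  # {'d0': '01-DEC-21', 'd1': '02-DEC-21'}
# With an open cx_Oracle cursor this would then be: cursor.execute(sql, binds)
```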

Python postgreSQL sqlalchemy query a DATERANGE column

I have a booking system and I save the booked daterange in a DATERANGE column:
booked_date = Column(DATERANGE(), nullable=False)
I already know that I can access the actual dates with booked_date.lower or booked_date.upper
For example I do this here:
for bdate in room.RoomObject_addresses_UserBooksRoom:
    unaviable_ranges['ranges'].append([str(bdate.booked_date.lower),
                                       str(bdate.booked_date.upper)])
Now I need to filter my bookings by a given daterange. For example I want to see all bookings between 01.01.2018 and 10.01.2018.
Usually it's simple, because dates can be compared like this: date <= other_date
But if I do it with the DATERANGE:
the_daterange_lower = datetime.strptime(the_daterange[0], '%d.%m.%Y')
the_daterange_upper = datetime.strptime(the_daterange[1], '%d.%m.%Y')
bookings = UserBooks.query.filter(UserBooks.booked_date.lower >= the_daterange_lower,
                                  UserBooks.booked_date.upper <= the_daterange_upper).all()
I get an error:
AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with UserBooks.booked_date has an attribute 'lower'
EDIT
I found a sheet of useful range operators, and it looks like there are better options for what I want to do, but for those I somehow need to create a range variable, and I don't know how to do that in Python. So I am still confused.
In my database my daterange column entries look like this:
[2018-11-26,2018-11-28)
EDIT
I am trying to use native SQL instead of SQLAlchemy, but I don't understand how to create a daterange object.
bookings = db_session.execute('SELECT * FROM usersbookrooms WHERE booked_date && [' + str(the_daterange_lower) + ',' + str(the_daterange_upper) + ')')
The query
the_daterange_lower = datetime.strptime(the_daterange[0], '%d.%m.%Y')
the_daterange_upper = datetime.strptime(the_daterange[1], '%d.%m.%Y')
bookings = UserBooks.query.\
filter(UserBooks.booked_date.lower >= the_daterange_lower,
UserBooks.booked_date.upper <= the_daterange_upper).\
all()
could be implemented using the "range is contained by" operator <@. In order to pass the right operand, you have to create an instance of psycopg2.extras.DateRange, which represents a PostgreSQL daterange value in Python:
from datetime import datetime
from psycopg2.extras import DateRange

the_daterange_lower = datetime.strptime(the_daterange[0], '%d.%m.%Y').date()
the_daterange_upper = datetime.strptime(the_daterange[1], '%d.%m.%Y').date()
the_daterange = DateRange(the_daterange_lower, the_daterange_upper)
bookings = UserBooks.query.\
    filter(UserBooks.booked_date.contained_by(the_daterange)).\
    all()
Note that the attributes lower and upper are part of the psycopg2.extras.Range types. The SQLAlchemy range column types do not provide them, as your error states.
If you want to use raw SQL and pass date ranges, you can use the same DateRange objects to pass values as well:
bookings = db_session.execute(
'SELECT * FROM usersbookrooms WHERE booked_date && %s',
(DateRange(the_daterange_lower, the_daterange_upper),))
You can also build literals manually, if you want to:
bookings = db_session.execute(
'SELECT * FROM usersbookrooms WHERE booked_date && %s::daterange',
(f'[{the_daterange_lower}, {the_daterange_upper})',))
The trick is to build the literal in Python and pass it as a single value, using placeholders as always. This should rule out SQL injection; the only thing that can go wrong is that the literal has invalid daterange syntax. Alternatively, you can pass the bounds to a range constructor:
bookings = db_session.execute(
'SELECT * FROM usersbookrooms WHERE booked_date && daterange(%s, %s)',
(the_daterange_lower, the_daterange_upper))
All in all it is easier to just use the Psycopg2 Range types and let them handle the details.
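Note that the ORM version above uses <@ (contained by), while the raw SQL versions use && (overlaps); they answer different questions. The pure-Python sketch below mirrors both operators for non-empty ranges in PostgreSQL's canonical half-open [lower, upper) form, and builds the same kind of literal as the manual example. It does not touch a database or psycopg2, and the helper names are made up for illustration.

```python
from datetime import date, datetime
from typing import Tuple

HalfOpen = Tuple[date, date]  # [lower, upper), the canonical form of a daterange


def parse_range(lower: str, upper: str) -> HalfOpen:
    """Parse 'dd.mm.yyyy' strings (the format used in the question) into dates."""
    fmt = "%d.%m.%Y"
    return datetime.strptime(lower, fmt).date(), datetime.strptime(upper, fmt).date()


def to_literal(r: HalfOpen) -> str:
    """Build a daterange literal such as '[2018-01-01,2018-01-10)'."""
    return f"[{r[0].isoformat()},{r[1].isoformat()})"


def contained_by(a: HalfOpen, b: HalfOpen) -> bool:
    """Mirror of PostgreSQL's a <@ b for non-empty half-open ranges."""
    return b[0] <= a[0] and a[1] <= b[1]


def overlaps(a: HalfOpen, b: HalfOpen) -> bool:
    """Mirror of PostgreSQL's a && b for non-empty half-open ranges."""
    return a[0] < b[1] and b[0] < a[1]


query = parse_range("01.01.2018", "10.01.2018")
booking = (date(2018, 1, 8), date(2018, 1, 12))  # a stay that runs past the window
print(to_literal(query))             # [2018-01-01,2018-01-10)
print(contained_by(booking, query))  # False
print(overlaps(booking, query))      # True
```

So a booking that starts inside the requested window but ends after it is returned by the && queries and not by the contained_by query; which one is right depends on what "bookings between two dates" should mean.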

How to filter on a list of names from Python when querying through SQLAlchemy

My original query looks like this:
select table1.id, table1.value
from some_database.something table1
join some_set table2 on table2.code=table1.code
where table1.date_ >= :_startdate and table1.date_ <= :_enddate
which is saved in a string in Python. If I do
x = session.execute(script_str, {'_startdate': start_date, '_enddate': end_date})
then
x.fetchall()
will give me the table I want.
Now the situation is that table2 is no longer available to me in the Oracle database; instead, it is available in my Python environment as a DataFrame. What is the best way to fetch the same table from the database in this case?
You can use the IN clause instead.
First remove the join from the script_str:
script_str = """
select table1.id, table1.value
from something table1
where table1.date_ >= :_startdate and table1.date_ <= :_enddate
"""
Then get the codes from the DataFrame, converted to plain Python values:
codes = myDataFrame.code_column.tolist()
Now, we need to dynamically extend the script_str and the parameters to the query:
param_names = ['_code{}'.format(i) for i in range(len(codes))]
script_str += "AND table1.code IN ({})".format(
", ".join([":{}".format(p) for p in param_names])
)
Create a dict with all parameters:
params = {
'_startdate': start_date,
'_enddate': end_date,
}
params.update(zip(param_names, codes))
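And execute the query (wrapping the string in text() works in all SQLAlchemy versions and is required in newer ones):
from sqlalchemy import text

x = session.execute(text(script_str), params)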
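The steps above can be wrapped in one helper. A sketch, not tested against Oracle: add_in_filter is a hypothetical name, a plain list stands in for the DataFrame column, and an in-memory SQLite table with made-up rows is used only because sqlite3 accepts the same :name placeholders. It also guards against an empty code list, since IN () is a syntax error.

```python
import sqlite3
from typing import Any, Dict, Sequence, Tuple

BASE_SQL = """
select table1.id, table1.value
from something table1
where table1.date_ >= :_startdate and table1.date_ <= :_enddate
"""


def add_in_filter(sql: str, params: Dict[str, Any],
                  codes: Sequence[Any]) -> Tuple[str, Dict[str, Any]]:
    """Append 'AND table1.code IN (:_code0, ...)' with one named parameter per code."""
    if not codes:
        # An empty IN () is invalid SQL; match nothing instead.
        return sql + " AND 1 = 0", dict(params)
    names = [f"_code{i}" for i in range(len(codes))]
    sql += " AND table1.code IN ({})".format(", ".join(f":{n}" for n in names))
    merged = dict(params)
    merged.update(zip(names, codes))
    return sql, merged


# Hypothetical table contents, only to show that the generated SQL runs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE something (id INTEGER, value TEXT, code TEXT, date_ TEXT);
INSERT INTO something VALUES
    (1, 'a', 'X', '2021-01-05'),
    (2, 'b', 'Y', '2021-01-06'),
    (3, 'c', 'Z', '2021-01-07'),
    (4, 'd', 'X', '2022-01-01');
""")

codes = ["X", "Z"]  # in the question these come from myDataFrame.code_column
sql, params = add_in_filter(
    BASE_SQL, {"_startdate": "2021-01-01", "_enddate": "2021-12-31"}, codes)
print(sorted(conn.execute(sql, params).fetchall()))  # [(1, 'a'), (3, 'c')]
```

With SQLAlchemy the resulting string would be wrapped in text() before session.execute. SQLAlchemy 1.2 and later can also expand a list into an IN clause itself via bindparam(..., expanding=True). On Oracle, keep each list to at most 1000 values.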

sqlalchemy join and order by on multiple tables

I'm working with a database that has a relationship that looks like:
class Source(Model):
    id = Identifier()
class SourceA(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
class SourceB(Source):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
class SourceC(Source, ServerOptions):
    source_id = ForeignKey('source.id', nullable=False, primary_key=True)
    name = Text(nullable=False)
What I want to do is join all tables Source, SourceA, SourceB, SourceC and then order_by name.
Sounds easy to me, but I've been banging my head on this for a while now and my head's starting to hurt. Also, I'm not very familiar with SQL or SQLAlchemy, so there's been a lot of browsing the docs, but to no avail. Maybe I'm just not seeing it. This seems to be close, albeit for a newer version than the one I have available (see versions below).
I feel close, not that that means anything. Here's my latest attempt, which seems fine up until the order_by call.
Sources = [SourceA, SourceB, SourceC]
# list of join on Source
joins = [session.query(Source).join(source) for source in Sources]
# union the list of joins
query = joins.pop(0).union_all(*joins)
query seems right at this point as far as I can tell, i.e. query.all() works. So now I try to apply order_by, which doesn't throw an error until .all() is called.
Attempt 1: I just use the attribute I want
query.order_by('name').all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "name" does not exist
Attempt 2: I just use the defined column attribute I want
query.order_by(SourceA.name).all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) missing FROM-clause entry for table "SourceA"
Is it obvious? What am I missing? Thanks!
versions:
sqlalchemy.version = '0.8.1'
(PostgreSQL) 9.1.3
EDIT
I'm dealing with a framework that wants a handle to a query object. I have a bare query that appears to accomplish what I want but I would still need to wrap it in a query object. Not sure if that's possible. Googling ...
select = """
select s.*, a.name from Source s inner join SourceA a on s.id = a.Source_id
union
select s.*, b.name from Source s inner join SourceB b on s.id = b.Source_id
union
select s.*, c.name from Source s inner join SourceC c on s.id = c.Source_id
ORDER BY "name";
"""
selectText = text(select)
result = session.execute(selectText)
# how to put result into a query. maybe Query(selectText)? googling...
result.fetchall()
Assuming the coalesce function is good enough, the examples below should point you in the right direction. One option builds the list of child classes automatically, while the other lists them explicitly.
This is not the query you specified in your edit, but you are able to sort (your original request):
from sqlalchemy import func
from sqlalchemy.orm import with_polymorphic

def test_explicit():
    # specify all children tables to be queried
    Sources = [SourceA, SourceB, SourceC]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)

def test_implicit():
    # get all children tables in the query
    from sqlalchemy.orm import class_mapper
    _map = class_mapper(Source)
    Sources = [_smap.class_
               for _smap in _map.self_and_descendants
               if _smap != _map  # @note: exclude base class, it has no `name`
               ]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
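At the SQL level, the coalesce approach amounts roughly to LEFT JOINing the base table to every subtype table and sorting on the first non-NULL name. The sketch below shows that shape with Python's sqlite3 module; the lower-case table names and rows are hypothetical, and the SQL that with_polymorphic actually emits will differ in detail (aliases, selected columns).

```python
import sqlite3

# Hypothetical joined-table layout mirroring Source / SourceA / SourceB / SourceC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source  (id INTEGER PRIMARY KEY);
CREATE TABLE sourcea (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
CREATE TABLE sourceb (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
CREATE TABLE sourcec (source_id INTEGER PRIMARY KEY REFERENCES source(id), name TEXT NOT NULL);
INSERT INTO source VALUES (1), (2), (3), (4);
INSERT INTO sourcea VALUES (1, 'delta'), (4, 'alpha');
INSERT INTO sourceb VALUES (2, 'charlie');
INSERT INTO sourcec VALUES (3, 'bravo');
""")

# Each source row has exactly one subtype row, so COALESCE picks its only non-NULL name.
ORDERED = """
SELECT s.id, COALESCE(a.name, b.name, c.name) AS name
FROM source AS s
LEFT JOIN sourcea AS a ON a.source_id = s.id
LEFT JOIN sourceb AS b ON b.source_id = s.id
LEFT JOIN sourcec AS c ON c.source_id = s.id
ORDER BY name
"""

for row in conn.execute(ORDERED):
    print(row)
# (4, 'alpha')
# (3, 'bravo')
# (2, 'charlie')
# (1, 'delta')
```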
Your first attempt sounds like it isn't working because there is no name in Source, which is the root table of the query. In addition, there will be multiple name columns after your joins, so you will need to be more specific. Try
query.order_by('SourceA.name').all()
As for your second attempt, what is ServerA?
query.order_by(ServerA.name).all()
Probably a typo, but not sure if it's for SO or your code. Try:
query.order_by(SourceA.name).all()
