How to get a Base from an existing SQL DDL file? - Python

I'm using SQLAlchemy for MySQL.
The common SQLAlchemy workflow is:
1. Define model classes mirroring the table structure (class User(Base)).
2. Migrate them to the database with db.create_all (or Alembic, etc.).
3. Import the model classes and use them (db.session.query(User)).
But what if I want to use a raw SQL file instead of defining model classes?
I have read that automap does something similar, but I want to get the mapper object from a raw SQL file, not from an already created database.
Is there any best practice to do this?
This is an example of the DDL:
-- ddl.sql
-- This is just an example, so please ignore minor grammar issues
CREATE TABLE `card` (
`card_id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'card',
`card_company_id` bigint(20) DEFAULT NULL COMMENT 'card_company_id',
PRIMARY KEY (`card_id`),
KEY `card_ix01` (`card_company_id`),
KEY `card_ix02` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='card table'
And I want to do something like:
Base = raw_sql_base('ddl.sql')  # some kind of automap_base, but built from an SQL file
# an engine pointing at the target MySQL database
engine = create_engine("mysql://user@localhost/program")
# reflect the tables
Base.prepare(engine)
# mapped classes are now created, named after the tables in the SQL file
Card = Base.classes.card
session = Session(engine)
session.add(Card(card_id=1, card_company_id=1))
session.commit() # Insert

SQLAlchemy is not an SQL parser but rather the exact opposite: its reflection works against existing databases only. In other words, you must execute your DDL against a database first and then use reflection/automap to create the necessary Python models:
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

# an engine pointing at the target MySQL database
engine = create_engine("mysql://user@localhost/program")
# execute the DDL in order to populate the DB
with open('ddl.sql') as ddl:
    engine.execute(ddl.read())
Base = automap_base()
# reflect the tables
Base.prepare(engine, reflect=True)
# mapped classes are now created, named after the tables in the SQL file
Card = Base.classes.card
session = Session(engine)
session.add(Card(card_id=1, card_company_id=1))
session.commit()  # insert
This may of course fail if you have already executed the same DDL against your database, so you would have to handle that case as well. Another possible caveat is that some DB-API drivers do not like executing multiple statements at a time, which matters if your ddl.sql contains more than one CREATE TABLE statement; a rough sketch of handling both follows.
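The sketch below is only illustrative and rests on assumptions: the connection URL is a placeholder, and the naive split on ";" assumes there are no semicolons inside string literals or comments.
from sqlalchemy import create_engine, text

engine = create_engine("mysql://user@localhost/program")

with open("ddl.sql") as f:
    # split the file into individual statements and drop empty fragments
    statements = [s.strip() for s in f.read().split(";") if s.strip()]

with engine.connect() as conn:
    for statement in statements:
        try:
            conn.execute(text(statement))
        except Exception as exc:  # e.g. "Table 'card' already exists"
            print(f"skipping statement: {exc}")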
...but I want to get the mapper object from a raw SQL file.
Ok, in that case what you need is the aforementioned parser. A cursory search produced two candidates:
sqlparse: generic, but the issue tracker is a testament to how nontrivial parsing SQL is. It is often confused; for example, it parses ... COMMENT 'card', `card_company_id` ... as a keyword and an identifier list, not as a keyword, a literal, punctuation, and an identifier (or even better, the column definitions as their own nodes).
mysqlparse: a MySQL-specific solution, but with limited support for just about anything, and it seems abandoned.
Parsing would only be the first step, though; you would then have to convert the resulting parse trees to models, roughly as sketched below.
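For illustration only, here is a minimal sketch of that second step, assuming you have somehow extracted table and column information from the parsed DDL; the (name, type, is_primary_key) tuple format and the make_model helper are hypothetical, and the types are simplified.
from sqlalchemy import BigInteger, Column, Table
from sqlalchemy.orm import declarative_base  # on older SQLAlchemy: sqlalchemy.ext.declarative

Base = declarative_base()

def make_model(table_name, columns):
    """columns: list of (name, sqlalchemy_type, is_primary_key) tuples,
    presumably produced by walking the parse tree."""
    table = Table(
        table_name,
        Base.metadata,
        *[Column(name, type_, primary_key=pk) for name, type_, pk in columns],
    )
    # Build a declarative class on top of the Table, similar to what
    # automap would hand you.
    return type(table_name.capitalize(), (Base,), {"__table__": table})

Card = make_model(
    "card",
    [("card_id", BigInteger, True), ("card_company_id", BigInteger, False)],
)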

Related

Python SQLAlchemy INSERT after DELETE violates constraint

I have this pattern for deletion of all rows in a Postgresql table and subsequent insertion with SQLAlchemy:
db = create_engine("postgresql://...", echo=False).connect()
metadata = MetaData(db)
my_table = Table('my_table', metadata, autoload_with=db)
...
db.execute(my_table.delete())
db.execute(my_table.insert(), values)
where values is a list.
I can't understand why I get a psycopg2.errors.UniqueViolation when trying to insert.
The data which is inserted is not duplicated, so I guess the problem is that the delete is not committed?
I don't use a Session: what do I need to do to get this simple pattern working correctly?
I found the solution by completely disabling automatic SQLAlchemy transactions (which are not needed in my case of bulk deletions/insertions) with the supported DBAPI isolation_level="AUTOCOMMIT":
db = create_engine("postgresql://...", echo=False).connect().execution_options(isolation_level="AUTOCOMMIT")
See https://docs.sqlalchemy.org/en/14/core/connections.html#setting-transaction-isolation-levels-including-dbapi-autocommit
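Putting it together, the full pattern then looks roughly like this (a sketch; the connection string, table name, and values are placeholders):
from sqlalchemy import MetaData, Table, create_engine

engine = create_engine("postgresql://user:password@localhost/mydb", echo=False)
# DBAPI-level autocommit: every execute() takes effect immediately
db = engine.connect().execution_options(isolation_level="AUTOCOMMIT")

metadata = MetaData()
my_table = Table("my_table", metadata, autoload_with=db)

values = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]  # example rows

db.execute(my_table.delete())          # committed right away
db.execute(my_table.insert(), values)  # so the insert sees the emptied table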

Forcing autoincrement on Numeric primary key with Sqlalchemy and SQL Server

I have an existing SQL Server (2012) DB with many tables having a primary key of Numeric(9, 0) type. For all intents and purposes they are integers.
The ORM mapping (generated using sqlacodegen) looks like:
class SomeTable(Base):
    __tablename__ = 'SOME_TABLE'
    __table_args__ = {'schema': 'dbo'}

    SOME_TABLE_ID = Column(Numeric(9, 0), primary_key=True)
    some_more_fields_here = XXX
Sample code to insert data:
some_table = SomeTable(_not_specifying_SOME_TABLE_ID_explicitly_)
session.add(some_table)
session.flush()  # <--- BOOM, FlushError here
When I try to insert data into such tables, my app crashes on session.flush() with the following error:
sqlalchemy.orm.exc.FlushError: Instance SomeTable ... has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
If I replace Numeric with BigInteger then everything works fine. I did some digging and the query generated with Numeric is like this:
INSERT INTO dbo.[SOME_TABLE] (_columns_without_SOME_TABLE_ID)
VALUES (...)
This seems like a valid query from an SQL point of view, but SQLAlchemy raises the above exception for it.
The query generated using BigInteger is as follows:
INSERT INTO dbo.[SOME_TABLE] (_columns_without_SOME_TABLE_ID)
OUTPUT inserted.[SOME_TABLE_ID]
VALUES (...)
I also found this piece of documentation about the autoincrement property. And sure enough, it explains the behavior I observe, i.e. autoincrement only works with integer types.
So my question is whether there is some kind of workaround to make autoincrement work with Numeric columns without converting them to BigInteger?
My system configuration: CentOS 7 64-bit, Python 3.5.2, SQLAlchemy 1.1.4, pymssql 2.2.0, SQL Server 2012.

Specifying the schema in Pandas to_sql

From the source of to_sql, I can see that it gets mapped to a MetaData object: meta = MetaData(con, schema=schema). However, I can't find SQLAlchemy docs that tell me how to define the schema for MySQL.
How do I specify the schema string?
The schema parameter in to_sql is confusing because the word "schema" here means something different from its general meaning of "table definitions". In some SQL flavors, notably PostgreSQL, a schema is effectively a namespace for a set of tables.
For example, you might have two schemas, one called test and one called prod. Each might contain a table called user_rankings generated in pandas and written using the to_sql command. You would specify the test schema when working on improvements to user rankings. When you are ready to deploy the new rankings, you would write to the prod schema.
As others have mentioned, when you call to_sql the table definition is generated from the type information for each column in the dataframe. If the table already exists in the database with exactly the same structure, you can use the append option to add new data to the table.
DataFrame.to_sql(self, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Just use the schema parameter. Note that the schema here is the database namespace described above, not the ODBC driver.
Starting from the Dialects page of the SQLAlchemy documentation, select the documentation page for your dialect and search for create_engine to find examples of how to create an engine.
An even more concise overview is available on the Engine Configuration page for all supported dialects.
A verbatim extract for MySQL:
# default
engine = create_engine('mysql://scott:tiger@localhost/foo')
# mysql-python
engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
# MySQL-connector-python
engine = create_engine('mysql+mysqlconnector://scott:tiger@localhost/foo')
# OurSQL
engine = create_engine('mysql+oursql://scott:tiger@localhost/foo')
Then pass this engine to pandas' DataFrame.to_sql(...). For example:
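A minimal sketch combining the two; the connection string, the test schema, and the user_rankings table name are placeholders taken from the discussion above.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+mysqldb://scott:tiger@localhost/foo")

df = pd.DataFrame({"user_id": [1, 2], "rank": [0.9, 0.7]})
# write into the `test` schema; if_exists="append" adds rows to an
# existing table with the same structure
df.to_sql("user_rankings", con=engine, schema="test",
          if_exists="append", index=False)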

How to get SqlAlchemy Table to read "implicit" schema

Using table creation as normal:
t = Table(name, meta, [columns ...])
This is the first run where I create the table. In future executions I would like to use the table without having to indicate the [columns]. This seems redundant as it should already be specified in the table schema. In other words, for future accesses, I'd like to simply do:
t = Table(name, meta) # columns already read from schema
Is there a way to do this in SqlAlchemy?
See Reflecting Database Objects in the SQLAlchemy documentation:
t = Table(name, meta, autoload=True)  # or: Table(name, meta, autoload_with=engine)
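A small self-contained sketch (the URL and table name are placeholders; on SQLAlchemy 1.4+ autoload_with=engine is the preferred spelling):
from sqlalchemy import MetaData, Table, create_engine

engine = create_engine("sqlite:///example.db")
meta = MetaData()

# columns are read from the live database instead of being redeclared
t = Table("my_table", meta, autoload_with=engine)
print([c.name for c in t.columns])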

Is there a standard way to store a database schema outside a Python app

I am working on a small database application in Python (currently targeting 2.5 and 2.6) using sqlite3.
It would be helpful to be able to provide a series of functions that could setup the database and validate that it matches the current schema. Before I reinvent the wheel, I thought I'd look around for libraries that would provide something similar. I'd love to have something akin to RoR's migrations. xml2ddl doesn't appear to be meant as a library (although it could be used that way), and more importantly doesn't support sqlite3. I'm also worried about the need to move to Python 3 one day given the lack of recent attention to xml2ddl.
Are there other tools around that people are using to handle this?
You can find the schema of a sqlite3 table this way:
import sqlite3
db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute('create table foo (bar integer, baz timestamp)')
c.execute("select sql from sqlite_master where type = 'table' and name = 'foo'")
r = c.fetchone()
print(r)
# (u'CREATE TABLE foo (bar integer, baz timestamp)',)
Take a look at SQLAlchemy Migrate. I see no problem using it as a migration tool only, but comparing the declared configuration to the current database state is still experimental.
I use this to keep schemas in sync.
Keep in mind that it adds a metadata table to keep track of the versions.
South is the closest I know of to RoR migrations. But just as you need Rails for those migrations, you need Django to use South.
Not sure if it is standard, but I just save all my schema queries in a text file like so (tables_creation.txt):
CREATE TABLE "Jobs" (
"Salary" TEXT,
"NumEmployees" TEXT,
"Location" TEXT,
"Description" TEXT,
"AppSubmitted" INTEGER,
"JobID" INTEGER NOT NULL UNIQUE,
PRIMARY KEY("JobID")
);
CREATE TABLE "Questions" (
"Question" TEXT NOT NULL,
"QuestionID" INTEGER NOT NULL UNIQUE,
PRIMARY KEY("QuestionID" AUTOINCREMENT)
);
CREATE TABLE "FreeResponseQuestions" (
"Answer" TEXT,
"FreeResponseQuestionID" INTEGER NOT NULL UNIQUE,
PRIMARY KEY("FreeResponseQuestionID"),
FOREIGN KEY("FreeResponseQuestionID") REFERENCES "Questions"("QuestionID")
);
...
Then I use this function, taking advantage of the fact that each query is delimited by two newline characters:
def create_db_schema(self):
    with open("./tables_creation.txt", "r") as db_schema:
        sql_qs = db_schema.read().split('\n\n')
    c = self.conn.cursor()
    for sql_q in sql_qs:
        c.execute(sql_q)
    self.conn.commit()  # make sure the schema changes are persisted
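If you are on sqlite3 anyway, an alternative sketch is to let executescript() run the whole file, so no manual splitting on blank lines is needed (the database path is a placeholder):
import sqlite3

conn = sqlite3.connect("app.db")
with open("tables_creation.txt") as f:
    # executescript runs every statement in the file in one call
    conn.executescript(f.read())
conn.commit()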
