I'm designing a database that has an API layer over it to get data from the tables. The database is postgres. Every night, we do a batch ETL process to update the data in the database. Due to some complications that aren't worth mentioning, the ETL process involves wiping out all of the data and rebuilding things from scratch.
Obviously, this is problematic for the API because if the API queries the database during the rebuilding phase, data will be missing.
I've decided to solve this by using two schemas. The "finished" schema (let's call this schema A) and the "rebuilding" schema (let's call this schema B). My ETL process looks like this:
1. Create schema B as an exact replica of schema A
2. Completely rebuild the data in schema B
3. In a transaction, drop schema A and rename schema B to schema A
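For reference, the swap in step 3 can be done with two DDL statements in a single transaction; a minimal sketch, assuming the schemas are literally named a and b and a placeholder connection URL:

from sqlalchemy import create_engine, text

engine = create_engine('postgresql+psycopg2://user:pass@localhost/mydb')
with engine.begin() as conn:  # one transaction: commits on exit
    conn.execute(text('DROP SCHEMA a CASCADE'))
    conn.execute(text('ALTER SCHEMA b RENAME TO a'))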
The problem I'm currently running into is that I'm using sqlalchemy Session and Table objects, and the tables are bound to schema A by virtue of their metadata.
I would like to be able to do session.add(obj) and have it add that data to schema B. After all, schema A and schema B are exactly the same so the table definitions should be valid for both.
I'm wondering if anyone has any recommendations on how I can use sqlalchemy's session object and/or table objects to dynamically select which schema I should be using.
I still want sessions/tables to point to schema A because the same code is reused in the API layer. I only want to use schema B during this one step.
I ended up solving this by wrapping my table definitions in functions that accept a sqlalchemy metadata object and return the table definition bound to that metadata object.
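For instance, a minimal sketch of that pattern; the table and column names are made up for illustration:

from sqlalchemy import Table, Column, Integer, String, MetaData

def make_products_table(metadata):
    return Table('products', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('name', String(100)))

# API layer: bound to the "finished" schema
api_metadata = MetaData(schema='a')
products = make_products_table(api_metadata)

# ETL step 2: the same definition bound to the "rebuilding" schema
etl_metadata = MetaData(schema='b')
etl_products = make_products_table(etl_metadata)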
I'm pretty new to database and server-related tasks. I currently have two tables stored in an MSSQL database on a server, and I'm trying to use the Python package SQLAlchemy to pull some of the data to my local machine. The first table has the default schema dbo, and I was able to use the connection string
'mssql+pyodbc://<username>:<password>@<dsnname>'
to inspect the table, but the other table has a customized schema, and I don't see any information about it when I use the same commands. I assume this is because the second table has a different schema, so the package can't find it anymore.
I was looking at automap, hoping the package offers a way to deal with a customized schema, but I don't quite understand many of the concepts described there. I'm not trying to alter the database, just pull data, so I'm not sure whether it's the right approach. Any suggestions?
Thanks
In the case of automap, you should pass the schema argument when preparing reflectively:
AutomapBase.prepare(reflect=True, schema='myschema')
If you wish to reflect both the default schema and your "customized schema" using the same automapper, then first reflect both schemas using the MetaData instance and after that prepare the automapper:
AutomapBase.metadata.reflect()
AutomapBase.metadata.reflect(schema='myschema')
AutomapBase.prepare()
If you call AutomapBase.prepare(reflect=True, ...) consecutively for both schemas, the automapper will recreate and replace the classes from the first prepare, because the tables already exist in the metadata. This will then raise warnings.
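Putting it together, a minimal sketch of the two-step reflection, with the DSN and schema name as placeholders:

from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

engine = create_engine('mssql+pyodbc://<username>:<password>@<dsnname>')
Base = automap_base()

Base.metadata.reflect(bind=engine)                     # default (dbo) schema
Base.metadata.reflect(bind=engine, schema='myschema')  # customized schema
Base.prepare()

# Mapped classes are then available by table name, e.g.:
# MyTable = Base.classes.my_table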
I am trying to understand what the object created by MetaData() is, in essence. It is used when reflecting and creating databases in Python (using the SQLAlchemy package).
Consider the following working code:
(With a preloaded Engine(sqlite:///chapter5.sqlite) and metadata = MetaData(); when I call metadata in the console, it returns MetaData(bind=None).)
# Import Table, Column, String, and Integer
from sqlalchemy import Table, Column, String, Integer
# Build a census table: census
census = Table('census', metadata,
               Column('state', String(30)),
               Column('sex', String(1)),
               Column('age', Integer()),
               Column('pop2000', Integer()),
               Column('pop2008', Integer()))
# Create the table in the database
metadata.create_all(engine)
Of course, by typing type(metadata) I can see exactly what type of object metadata is: sqlalchemy.sql.schema.MetaData. In the SQLAlchemy documentation it is written:
MetaData is a container object that keeps together many different features of a database (or multiple databases) being described.
However, I am confused, because in the code we only create a table that "points" to metadata. After that, we call the create_all method on metadata (which seems empty so far), passing it the database via engine.
Probably my question is silly, but:
How exactly does Python connect these instances? Presumably the declaration of the census table links the metadata to the column names in a two-sided way.
Note: The code is from an exercise from datacamp course.
I think you're asking how Python (SQLAlchemy, presumably) connects the table to the metadata, and the metadata to the database and engine.
Database tables in SQLAlchemy belong to (are linked to) a metadata object. The table adds itself to the metadata; there is a tables property on the metadata object, a dictionary-like collection keyed by table name:
>>> len(models.Base.metadata.tables)
22
The reason you need the metadata object is:
To have a single unit of work for creating and dropping related tables
To have a place to collect all the results of a reflection operation
To sort related tables based on their dependencies so that foreign key constraints can be created in the right order.
So, the metadata object contains SQLAlchemy's picture of what the database looks like. It's typically populated either from reflection or from you creating table objects (possibly through the declarative base extension).
In older versions of SQLAlchemy (before 2.0) you could directly associate a metadata object with a real database engine by setting the bind parameter in the metadata constructor. Alternatively, and in 2.0 necessarily, you make the link when you use the metadata, either in create calls or in reflection calls.
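A sketch of both directions, reusing the engine from the exercise:

from sqlalchemy import create_engine, MetaData

engine = create_engine('sqlite:///chapter5.sqlite')
metadata = MetaData()

# Either populate the metadata by reflecting an existing database...
metadata.reflect(bind=engine)

# ...or emit CREATE TABLE for every table registered in the metadata.
metadata.create_all(engine)

# Tables are kept sorted by foreign key dependency:
for table in metadata.sorted_tables:
    print(table.name)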
I'm writing a SQLAlchemy app that needs to connect to a PostgreSQL database and a MySQL database. Basically I'm loading the data from an existing MySQL database, doing some transforms on it, and then saving it in PostgreSQL.
I am managing the PostgreSQL schema using SQLAlchemy's declarative base. The MySQL database already exists, and I am accessing the schema via SQLAlchemy's reflection. Both have very different schemas.
I know I need dedicated engines for each database, but I'm unclear on whether I need dedicated objects of any of the following:
Base - I think this corresponds to the database schema. Since both databases have very different schemas, I will need a dedicated Base for each schema.
Metadata - Is this intended to be a single global metadata object that holds all schemas from all engines?
Sessions - I'm not sure, but I think I need separate sessions for each database? Or can a single session share multiple engine/Base combos? I'm using scoped_session.
Part of my confusion comes from not understanding the difference between Base and Metadata. The SQLAlchemy docs say:
MetaData is a container object that keeps together many different features of a database (or multiple databases) being described.
This seems to imply that a single metadata can hold multiple Bases, but I'm still a bit fuzzy on how that works. For example, I want to be able to call metadata.create_all() and create tables in PostgreSQL, but not MySQL.
The short answer is that it's easiest to have separate instances of them all for both databases. It is possible to create a single routing session, but it has its caveats.
The sessionmaker and Session also support passing multiple binds as an argument, as well as two-phase commits, which can also allow using a single session with multiple databases. As luck would have it, PostgreSQL and MySQL are both among the databases whose SQLAlchemy dialects support two-phase commit.
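A sketch of that single-session route, with placeholder connection URLs and two hypothetical declarative bases:

from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

pg_engine = create_engine('postgresql+psycopg2://user:pass@localhost/target')
mysql_engine = create_engine('mysql+pymysql://user:pass@localhost/source')

PGBase = declarative_base()     # models for the PostgreSQL schema
MySQLBase = declarative_base()  # models for the MySQL schema

# Route each Base's models to its own engine; twophase makes commits
# atomic across both databases.
Session = sessionmaker(
    binds={PGBase: pg_engine, MySQLBase: mysql_engine},
    twophase=True,
)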
About the relation between Base and metadata:
Base is a base class that has a metaclass used to declaratively create Table objects from information provided in the class itself and its subclasses. All Table objects implicitly declared by subclasses of Base will share the same MetaData.
You can provide metadata as an argument when creating a new declarative base and thus share it between multiple Bases, but in your case it is not useful.
MetaData
is a collection of Table objects and their associated schema constructs. It also can hold a binding to an Engine or Session.
In short, you can have Tables and MetaData without a Base, but a Base requires MetaData to function.
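And a sketch of the simpler separate-instances setup, again with placeholder URLs. Note that create_all only touches PostgreSQL, because only PGBase's MetaData is handed to it:

from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import declarative_base, sessionmaker

pg_engine = create_engine('postgresql+psycopg2://user:pass@localhost/target')
mysql_engine = create_engine('mysql+pymysql://user:pass@localhost/source')

PGBase = declarative_base()                # declares the PostgreSQL schema
mysql_metadata = MetaData()
mysql_metadata.reflect(bind=mysql_engine)  # reflects the existing MySQL schema

PGSession = sessionmaker(bind=pg_engine)
MySQLSession = sessionmaker(bind=mysql_engine)

# Creates tables in PostgreSQL only:
PGBase.metadata.create_all(pg_engine)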
I'm using SqlAlchemy in my Pylons application to access data and SqlAlchemy-migrate to maintain the database schema.
It works fine for managing the schema itself. However, I also want to manage seed data in a migrate-like way. E.g., when the ProductCategory table is created, it would make sense to seed it with category data.
Looks like SqlAlchemy-migrate does not support this directly. What would be a good approach to do this with Pylons+SqlAlchemy+SqlAlchemy-migrate?
Well, what format is your seed data starting out in? The migrate calls are just Python methods, so you're free to open a CSV, create SA object instances, loop, etc. I usually have my seed data as a series of SQL insert statements and just loop over them, executing a migrate_engine.execute(query) for each one.
So in the upgrade method I'll first create the table, then loop and run the seed data; in the downgrade method I empty or drop the table.
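A minimal sketch of such a script; the table name, column, and seed values are made up, and the API is the older (pre-2.0) SQLAlchemy that sqlalchemy-migrate targets:

from sqlalchemy import MetaData, Table

SEED_CATEGORIES = ['books', 'music', 'games']  # hypothetical seed data

def upgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    categories = Table('product_category', meta, autoload=True)
    for name in SEED_CATEGORIES:
        migrate_engine.execute(categories.insert().values(name=name))

def downgrade(migrate_engine):
    meta = MetaData(bind=migrate_engine)
    categories = Table('product_category', meta, autoload=True)
    migrate_engine.execute(categories.delete())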
I added a url column to my table, and now SQLAlchemy is saying 'unknown column url'.
Why isn't it updating the table?
There must be a setting when I create the session?
I am doing:
Session = sessionmaker(bind=engine)
Is there something I am missing?
I want it to update any table that is missing a column I added to my Table structure in my Python code.
I'm not sure SQLAlchemy supports schema migration that well (at least the last time I touched it, it wasn't there).
A couple of options:
1. Don't manually specify your tables. Use the autoload feature to have SQLAlchemy automatically read out the columns from your database. This will require tests to make sure it works, but you get the general idea. DRY.
2. Try SQLAlchemy-migrate.
3. Manually update the table after you change the model specification.
Note that metadata.create_all(bind=engine) only creates tables that don't exist yet; it will NOT add new columns to an existing table, modify existing columns, or drop columns you have removed from the SQLAlchemy definition. To pick up a new column on an existing table, you need a migration tool or a manual ALTER TABLE.
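For a one-off change, the ALTER TABLE can be issued by hand; a sketch, with a placeholder URL, table name, and column type:

from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://user:pass@localhost/mydb')
with engine.begin() as conn:
    conn.execute(text('ALTER TABLE posts ADD COLUMN url VARCHAR(255)'))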