I am trying to understand what, in essence, the object created by MetaData() is. It is used when reflecting and creating databases in Python (using the SQLAlchemy package).
Consider the following working code:
(With a preloaded Engine(sqlite:///chapter5.sqlite) and metadata = MetaData(); calling metadata in the console returns MetaData(bind=None).)
# Import Table, Column, String, and Integer
from sqlalchemy import Table, Column, String, Integer
# Build a census table: census
census = Table('census', metadata,
               Column('state', String(30)),
               Column('sex', String(1)),
               Column('age', Integer()),
               Column('pop2000', Integer()),
               Column('pop2008', Integer()))
# Create the table in the database
metadata.create_all(engine)
Of course, by typing type(metadata) I can see exactly what type of object metadata is: sqlalchemy.sql.schema.MetaData. The SQLAlchemy documentation says:
MetaData is a container object that keeps together many different features of a database (or multiple databases) being described.
However, I am confused, because in the code we only create a table that "points" to metadata. After that, we call the create_all method on metadata (which still looks empty), passing the engine that points to the database.
Probably my question is silly, but:
How exactly does Python connect these instances? Presumably the declaration of the census table links the metadata to the table and its columns in a two-way fashion.
Note: The code is from an exercise in a DataCamp course.
I think you are asking how Python (or rather SQLAlchemy) connects the table to the metadata, and the metadata to the database and engine.
Database tables in SQLAlchemy belong to (are linked to) a metadata object. The table adds itself to the metadata; there is a tables property on the metadata object that acts a lot like a dictionary keyed by table name:
>>> len(models.Base.metadata.tables)
22
The reason you need the metadata object is:
To have a single unit of work for creating and dropping related tables
To have a place to collect all the results of a reflection operation
To sort related tables based on their dependencies so that foreign key constraints can be created in the right order.
So the metadata object contains SQLAlchemy's picture of what the database looks like. It is typically populated either by reflection or by you creating Table objects (possibly through the declarative base extension).
You can directly associate a metadata object with a real database engine by setting the bind parameter in the metadata constructor. Alternatively, you can make the link when you use the metadata either in create calls or in reflection calls.
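To make that link concrete, here is a minimal sketch (reusing the census example from the question above) showing the Table constructor registering itself with the MetaData, which create_all then walks when it is handed the engine:
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer

engine = create_engine('sqlite:///chapter5.sqlite')
metadata = MetaData()

census = Table('census', metadata,
               Column('state', String(30)),
               Column('sex', String(1)),
               Column('age', Integer()),
               Column('pop2000', Integer()),
               Column('pop2008', Integer()))

# The Table constructor added itself to the metadata:
print('census' in metadata.tables)    # True
print(list(metadata.tables.keys()))   # ['census']

# create_all walks metadata.sorted_tables and emits a CREATE TABLE
# statement for each table over a connection obtained from the engine.
metadata.create_all(engine)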
Related
I have a Flask application with a PostgreSQL database, Alembic migrations and SQLAlchemy.
Recently I started writing an integration test against the database. I import a model, say Item, that is mapped to the table "item", and doing "from models import Item" triggers construction of the SQLAlchemy metadata for my tables.
In my test setup, I have
import os

from alembic import command
from alembic.config import Config

@classmethod
def setUpClass(cls):
    # Remove any leftover test database from a previous run
    try:
        os.remove('testdb.db')
    except OSError:
        pass

    # Run db migrations
    global db_manager
    db_manager = DatabaseManager()
    alembic_cfg = Config("./alembic.ini")
    alembic_cfg.attributes['db_manager'] = db_manager
    command.upgrade(alembic_cfg, "head")
This results in
sqlalchemy.exc.InvalidRequestError: Table 'item' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.
I have debugged this far enough to see that the metadata object is the same one between the calls, so it ends up trying to register the "item" table a second time in its tables collection.
I have another, pretty much identical application where this setup works, so I know it should work in theory. In that application the metadata objects in the import phase and in the upgrade phase differ, so the tables collection is empty when alembic runs the upgrade, and hence there is no error.
Sorry I can't provide the actual code, as it's a work project. I might be able to construct a minimal toy example if I find the time.
If I understood where the metadata actually gets created inside SQLAlchemy, I might be able to track down why alembic gets a clean metadata instance in the working app, and not in the problem app.
In the working application, "extend_existing" is not set and I'd rather not invoke some hack to mask an underlying issue.
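For reference, the MetaData instance is created when the declarative base is created, and each mapped class adds its Table to that one instance at import time. A minimal sketch (model and column names hypothetical, using the classic 1.x-style declarative base) that reproduces the error:
from sqlalchemy import Column, Integer, String, Table
from sqlalchemy.ext.declarative import declarative_base

# declarative_base() creates the MetaData; every model that subclasses
# Base registers its Table in Base.metadata when the class body runs.
Base = declarative_base()

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

# Defining a second table named 'item' against the same MetaData -- for
# example because the models module ends up imported twice under
# different module paths, or because the migration environment builds
# its own Table into an already-populated MetaData -- raises:
#   InvalidRequestError: Table 'item' is already defined for this
#   MetaData instance. ...
Table('item', Base.metadata, Column('id', Integer, primary_key=True))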
I'm pretty new to database- and server-related tasks. I currently have two tables stored in an MS SQL database on a server, and I'm trying to use the Python package SQLAlchemy to pull some of the data to my local machine. The first table has the default schema dbo, and I was able to use the connect string
'mssql+pyodbc://<username>:<password>@<dsnname>'
to inspect the table. The other table, however, has a customized schema, and I don't see any information about it when I use the same commands. I assume this is because the second table has a different schema and the package can't find it anymore.
I was looking at automap, hoping the package offers a way to deal with a customized schema, but there are many concepts described there that I don't quite understand. I'm not trying to alter the database, just pulling data, so I'm not sure if it's the right approach. Any suggestions?
Thanks
In the case of automap you should pass the schema argument when preparing reflectively:
AutomapBase.prepare(reflect=True, schema='myschema')
If you wish to reflect both the default schema and your "customized schema" using the same automapper, then first reflect both schemas using the MetaData instance and after that prepare the automapper:
AutomapBase.metadata.reflect()
AutomapBase.metadata.reflect(schema='myschema')
AutomapBase.prepare()
If you call AutomapBase.prepare(reflect=True, ...) consecutively for both schemas, the automapper will recreate and replace the classes from the first prepare call, because the tables already exist in the metadata. This will then raise warnings.
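Put together, a rough, complete sketch might look like this (the connection URL and schema name are placeholders):
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base

# Placeholder DSN; substitute your own credentials and data source name.
engine = create_engine('mssql+pyodbc://username:password@dsnname')

Base = automap_base()

# Reflect the default schema (dbo) and the custom schema into the same
# MetaData, then generate mapped classes for both in a single prepare().
Base.metadata.reflect(engine)
Base.metadata.reflect(engine, schema='myschema')
Base.prepare()

# Reflected classes are then available by table name, e.g.:
# MyTable = Base.classes.my_table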
I'm designing a database that has an API layer over it to get data from the tables. The database is postgres. Every night, we do a batch ETL process to update the data in the database. Due to some complications that aren't worth mentioning, the ETL process involves wiping out all of the data and rebuilding things from scratch.
Obviously, this is problematic for the API because if the API queries the database during the rebuilding phase, data will be missing.
I've decided to solve this by using two schemas. The "finished" schema (let's call this schema A) and the "rebuilding" schema (let's call this schema B). My ETL process looks like this:
1. Create schema B as an exact replica of schema A
2. Completely rebuild the data in schema B
3. In a transaction, drop schema A and rename schema B to schema A
The problem I'm currently running into is that I'm using sqlalchemy Session and Table objects, and the tables are bound to schema A by virtue of their metadata.
I would like to be able to do session.add(obj) and have it add that data to schema B. After all, schema A and schema B are exactly the same so the table definitions should be valid for both.
I'm wondering if anyone has any recommendations on how I can use sqlalchemy's session object and/or table objects to dynamically select which schema I should be using.
I still want sessions/tables to point to schema A because the same code is reused in the API layer. I only want to use schema B during this one step.
I ended up solving this by wrapping my table definitions in functions that accept a sqlalchemy metadata object and return the table definition bound to that metadata object.
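As a rough sketch of that approach (table, column and schema names here are just placeholders), each definition becomes a small factory that takes the MetaData, and optionally the schema, to bind to:
from sqlalchemy import Column, Integer, MetaData, String, Table

def build_item_table(metadata, schema):
    # Same table definition every time, just bound to whichever
    # MetaData/schema the caller passes in.
    return Table(
        'item', metadata,
        Column('id', Integer, primary_key=True),
        Column('name', String(100)),
        schema=schema,
    )

# API layer: definitions bound to the "finished" schema A.
metadata_a = MetaData()
item_a = build_item_table(metadata_a, schema='schema_a')

# ETL rebuild step: the same definitions bound to schema B.
metadata_b = MetaData()
item_b = build_item_table(metadata_b, schema='schema_b')
# metadata_b.create_all(engine)  # rebuild here, then swap B in for A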
I'm writing a SQLAlchemy app that needs to connect to a PostgreSQL database and a MySQL database. Basically I'm loading the data from an existing MySQL database, doing some transforms on it, and then saving it in PostgreSQL.
I am managing the PostgreSQL schema using SQLAlchemy's declarative base. The MySQL database already exists, and I am accessing the schema via SQLAlchemy's reflection. Both have very different schemas.
I know I need dedicated engines for each database, but I'm unclear on whether I need dedicated objects of any of the following:
Base - I think this corresponds to the database schema. Since both databases have very different schemas, I will need a dedicated Base for each schema.
Metadata - Is this intended to be a single global metadata object that holds all schemas from all engines?
Sessions - I'm not sure, but I think I need separate sessions for each database? Or can a single session share multiple engine/Base combos? I'm using scoped_sessions.
Part of my confusion comes from not understanding the difference between Base and Metadata. The SQLAlchemy docs say:
MetaData is a container object that keeps together many different features of a database (or multiple databases) being described.
This seems to imply that a single metadata can hold multiple Bases, but I'm still a bit fuzzy on how that works. For example, I want to be able to call metadata.create_all() and create tables in PostgreSQL, but not MySQL.
The short answer is that it's easiest to have separate instances of them all for both databases. It is possible to create a single routing session, but it has its caveats.
The sessionmaker and Session also support passing multiple binds as an argument, as well as 2-phase commits, which can allow using a single session with multiple databases. As luck would have it, both of the databases in question, PostgreSQL and MySQL, support 2-phase commits.
About the relation between Base and metadata:
Base is a base class that has a metaclass used to declaratively create Table objects from information provided in the class itself and its subclasses. All Table objects implicitly declared by subclasses of Base will share the same MetaData.
You can provide metadata as an argument when creating a new declarative base and thus share it between multiple Bases, but in your case it is not useful.
MetaData
is a collection of Table objects and their associated schema constructs. It also can hold a binding to an Engine or Session.
In short, you can have Tables and MetaData without a Base, but a Base requires MetaData to function.
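A rough sketch of the "separate everything" setup under those assumptions (connection URLs and the model are placeholders): the PostgreSQL side uses a classic declarative Base with its own MetaData, the MySQL side needs only a reflected MetaData, and each engine gets its own scoped session.
from sqlalchemy import Column, Integer, MetaData, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

# Placeholder connection URLs.
mysql_engine = create_engine('mysql+pymysql://user:pass@localhost/source')
pg_engine = create_engine('postgresql+psycopg2://user:pass@localhost/target')

# PostgreSQL side: a declarative Base, which creates its own MetaData.
Base = declarative_base()

class Item(Base):  # hypothetical target model
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

# MySQL side: no Base needed, just a MetaData populated by reflection.
mysql_metadata = MetaData()
mysql_metadata.reflect(mysql_engine)

# create_all only touches the PostgreSQL Base's metadata, so nothing is
# created on the MySQL server.
Base.metadata.create_all(pg_engine)

# Separate scoped sessions, one per engine.
MySQLSession = scoped_session(sessionmaker(bind=mysql_engine))
PGSession = scoped_session(sessionmaker(bind=pg_engine))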
I have little to no experience with databases and I'm wondering how I would go about storing certain parts of an object.
Let's say I have an object like the following and steps can be an arbitrary length. How would I store these steps or list of steps into an sql database?
class Error:
    name = ""   # name of error
    steps = []  # steps to take to attempt to solve error
For your example you would create a table called Errors with metadata about the error, such as an error_ID as the primary key, a name, a date created, etc. Then you'd create another table called Steps with its own id, let's say Step_ID, plus any fields related to the step. The important part is that you'd add a field on the Steps table that relates back to the error the steps belong to; we'll again call that field error_ID, and you'd make it a foreign key so the database enforces that constraint.
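In SQLAlchemy terms (table and column names are only illustrative), that might look something like this:
from sqlalchemy import (Column, ForeignKey, Integer, MetaData, String,
                        Table, create_engine)

metadata = MetaData()

# One row per error.
errors = Table('errors', metadata,
               Column('error_id', Integer, primary_key=True),
               Column('name', String(100)))

# One row per step, pointing back to its error via a foreign key.
steps = Table('steps', metadata,
              Column('step_id', Integer, primary_key=True),
              Column('error_id', Integer,
                     ForeignKey('errors.error_id'), nullable=False),
              Column('description', String(500)))

# create_all emits the CREATE TABLE statements in dependency order,
# so 'errors' is created before 'steps'.
engine = create_engine('sqlite:///errors.db')
metadata.create_all(engine)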
If you want to store your Python objects in a database (or objects from any other language), the place to start is a good ORM (Object-Relational Mapper). For example, Django has a built-in ORM. This link has a comparison of some Python Object-Relational Mappers.