Why does sqlalchemy use DeclarativeMeta class inheritance to map objects to tables - python

I'm learning SQLAlchemy's ORM and I'm finding it very confusing / unintuitive. Say I want to create a User class and a corresponding table. I think I should do something like:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
engine = create_engine("sqlite:///todooo.db")
Base = declarative_base()
class User(Base):
    __tablename__ = 'some_table'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session1 = Session()
user1 = User(name='fred')
session1.add(user1)
session1.commit()
Here I
create an Engine. My understanding is that the engine is like the front-line communicator between my code and the database.
create a DeclarativeMeta, a metaclass whose job I think is to keep track of a mapping between my Python classes and my SQL tables
create a User class
initialize my database and tables with Base.metadata.create_all(engine)
create a Session factory via sessionmaker
create a Session instance, session1
create a User instance, user1
The thing I find quite confusing here is the Base superclass. What is the benefit to using it as opposed to doing something like engine.create_table(User)?
Additionally, if we don't need a Session to create the database and its tables, why do we need a Session to insert records?

SQLAlchemy needs a mechanism to hook the classes being mapped to the database rows. Basically, you have to tell it:
Use the class User as the mapped class for the table some_table. One way to do that is to use a common base class - the declarative base. Another way is to call a function that registers your class with the mapper (sketched below). The declarative base used to be a SQLAlchemy extension IIRC, but later it became standard.
Now, having a common base makes perfect sense to me, because I do not have to take the extra step of calling a function to register the mapping. Instead, whatever inherits from the declarative base is automatically mapped. Both approaches work in general.
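For contrast, here is a minimal sketch of the register-with-a-function alternative (classical mapping), reusing the question's table and class names. Note that the legacy mapper() call shown here was removed in SQLAlchemy 2.0, where the equivalent is registry().map_imperatively():
from sqlalchemy import Table, MetaData, Column, Integer, String
from sqlalchemy.orm import mapper  # legacy API; removed in SQLAlchemy 2.0

metadata = MetaData()
user_table = Table('some_table', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(50)),
)

class User(object):       # plain class, no declarative Base
    pass

mapper(User, user_table)  # the explicit registration step replaces inheritance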
Engine is able to give you connections and can take care of connection pooling. A Connection is able to run things on the database. No ORM yet: with a connection you can build and run queries in SQL, but you have no mapping of database data to Python objects.
Session uses a connection and takes care of the ORM. Better read the docs here, but the simplest case is:
user1.name = "Different Fred"
That's it. It will generate and execute the SQL at the right moment. Really, read the docs.
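To make that concrete, a minimal sketch continuing the question's setup (engine, User and the Session factory as above):
session = Session()
user1 = session.query(User).filter_by(name='fred').first()
user1.name = "Different Fred"  # only the in-memory object changes here
session.commit()               # the session flushes and emits the UPDATE now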
Now, you can create a table with just a connection; it does not make much sense to involve the session, because the session takes care of the current mapping, and there is nothing to map if the table does not physically exist yet. So you create the tables with the connection (or engine), and then you can make a session and use the mapping. Also, table creation is usually a one-off action done separately from the normal program run (at least create_all).

Related

Idiomatic Way to Insert/Upsert Protobuf Into A Relational Database

I have a Python object which is a Protobuf message that I want to insert into a database.
Ideally I'd like to be able to do something like
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import mapper, sessionmaker
from event_pb2 import Event
engine = create_engine(...)
metadata = MetaData(engine)
table = Table("events", metadata, autoload=True)
mapping = mapper(Event, table)
Session = sessionmaker(engine)
session = Session()
byte_string = b'.....'
event = Event()
event.ParseFromString(byte_string)
session.add(event)
When I try the above I get the error AttributeError: 'Event' object has no attribute '_sa_instance_state' when I try to create the Event object, which isn't shocking because the Event class has been generated by Protobuf.
Is there a better, i.e. safer or more succinct, way to do this than manually generating the insert statement by looping over all the field names and values? I'm not married to using SQLAlchemy if there's a better way to solve the problem.
I think it's generally advised that you should limit protobuf-generated classes to the client- and server-side gRPC methods and, for any uses beyond that, map Protobuf objects to and from application-specific classes.
In this case, define a set of SQLAlchemy classes and transform the gRPC objects into SQLAlchemy-specific classes for your app.
This avoids breakage if e.g. the gRPC maintainers change the implementation in a way that would break SQLAlchemy; it gives you a place to translate between e.g. proto Timestamps and your preferred database time format; and it provides a level of abstraction between gRPC and SQLAlchemy that affords you more flexibility in making changes to one or the other.
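As a rough sketch of that translation layer (the EventRecord model and the event_id/name/created_at fields are hypothetical placeholders for whatever your .proto actually defines):
from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class EventRecord(Base):
    __tablename__ = 'events'
    event_id = Column(Integer, primary_key=True)
    name = Column(String(255))
    created_at = Column(DateTime)

    @classmethod
    def from_proto(cls, event):
        # translate proto-specific types (e.g. Timestamp) to database types here
        return cls(
            event_id=event.event_id,
            name=event.name,
            created_at=event.created_at.ToDatetime(),
        )

session.add(EventRecord.from_proto(event))  # event and session as in the question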
There do appear to be some tools to help with the translation, but they highlight the issues with their approach, e.g. Mercator.

When to use `session_maker` and when to use `Session` in sqlalchemy

SQLAlchemy's documentation says that one can create a session in two ways:
from sqlalchemy.orm import Session
session = Session(engine)
or with a sessionmaker
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(engine)
session = Session()
Now in either case one needs a global object (either the engine, or the sessionmaker object). So I do not really see what the point of the sessionmaker is. Maybe I am misunderstanding something.
I could not find any advice on when one should use one or the other. So the question is: in which situation would you want to use Session(engine), and in which situation would you prefer sessionmaker?
The docs describe the difference very well:
Session is a regular Python class which can be directly instantiated. However, to standardize how sessions are configured and acquired, the sessionmaker class is normally used to create a top level Session configuration which can then be used throughout an application without the need to repeat the configurational arguments.
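In other words, sessionmaker is just a preconfigured factory for Session objects. A minimal sketch of the two styles (the database URL and options are placeholders):
from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

engine = create_engine("sqlite:///example.db")  # placeholder URL

# Direct construction: fine for a one-off script, but every call site
# has to repeat the configuration arguments.
session = Session(engine)
session.close()

# Factory: configure once, then call it anywhere in the application.
SessionFactory = sessionmaker(bind=engine, expire_on_commit=False)
session = SessionFactory()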

When connecting to multiple databases, do I need multiple SQLAlchemy Metadata, Base, or Session objects?

I'm writing a SQLAlchemy app that needs to connect to a PostgreSQL database and a MySQL database. Basically I'm loading the data from an existing MySQL database, doing some transforms on it, and then saving it in PostgreSQL.
I am managing the PostgreSQL schema using SQLAlchemy's declarative base. The MySQL database already exists, and I am accessing the schema via SQLAlchemy's reflection. Both have very different schemas.
I know I need dedicated engines for each database, but I'm unclear on whether I need dedicated objects of any of the following:
Base - I think this corresponds to the database schema. Since both databases have very different schemas, I will need a dedicated Base for each schema.
Metadata - Is this intended to be a single global metadata object that holds all schemas from all engines?
Sessions - I'm not sure, but I think I need separate sessions for each database? Or can a single session share multiple engine/Base combos? I'm using scoped_sessions.
Part of my confusion comes from not understanding the difference between Base and Metadata. The SQLAlchemy docs say:
MetaData is a container object that keeps together many different features of a database (or multiple databases) being described.
This seems to imply that a single MetaData can hold multiple Bases, but I'm still a bit fuzzy on how that works. For example, I want to be able to call metadata.create_all() and create tables in PostgreSQL, but not MySQL.
The short answer is that it's easiest to have separate instances of them all for both databases. It is possible to create a single routing session, but it has its caveats.
The sessionmaker and Session also support passing multiple binds as an argument, as well as two-phase commit, which can also allow using a single session with multiple databases. As luck would have it, the two databases in question, PostgreSQL and MySQL, both support two-phase commit.
About the relation between Base and metadata:
Base is a base class that has a metaclass used to declaratively create Table objects from information provided in the class itself and its subclasses. All Table objects implicitly declared by subclasses of Base will share the same MetaData.
You can provide metadata as an argument when creating a new declarative base and thus share it between multiple Bases, but in your case it is not useful.
MetaData is a collection of Table objects and their associated schema constructs. It also can hold a binding to an Engine or Session.
In short, you can have Tables and MetaData without a Base, but a Base requires MetaData to function.
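A minimal sketch of the separate-everything setup described above (connection URLs and names are placeholders):
from sqlalchemy import MetaData, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

pg_engine = create_engine("postgresql://user:pass@localhost/target")  # placeholder
my_engine = create_engine("mysql://user:pass@localhost/source")       # placeholder

PgBase = declarative_base()   # owns its own MetaData for the PostgreSQL schema
mysql_metadata = MetaData()   # reflected MySQL tables live here

PgSession = sessionmaker(bind=pg_engine)
MySession = sessionmaker(bind=my_engine)

PgBase.metadata.create_all(pg_engine)   # creates tables only in PostgreSQL
mysql_metadata.reflect(bind=my_engine)  # loads the existing MySQL schema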

Add database support at runtime

I have a Python module that I've been using over the years to process a series of text files for work. I now have a need to store some of the info in a db (using SQLAlchemy), but I would still like the flexibility of using the module without db support, i.e. not have to actually have sqlalchemy imported (or installed). As of right now, I have the following, and I've been creating Product or DBProduct, etc. depending on whether I intend to use a db or not.
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Product(object):
    pass

class WebSession(Product):
    pass

class Malware(WebSession):
    pass

class DBProduct(Product, Base):
    pass

class DBWebSession(WebSession, DBProduct):
    pass

class DBMalware(Malware, DBWebSession):
    pass
However, I feel that there has got to be an easier/cleaner way to do this. I feel that I'm creating an inheritance mess and potential problems down the road. Ideally, I'd like to create a single class for Product, WebSession, etc. (maybe using decorators) that contains the information necessary for using a db, but which is only enabled/functional after calling something like enable_db_support(). Once that function is called, then regardless of which object I create, it (and all the objects it inherits from) enables all the column bindings, etc. I should also note that if I somehow figure out how to combine Product and DBProduct into one class, I sometimes need two versions of the same function: one that is called if db support is enabled and one if it's not. I've also considered "recreating" the object hierarchy when enable_db_support() is called, but that turned out to be a nightmare as well.
Any help is appreciated.
Well, you can probably get away with creating a pure non-DB aware model by using Classical Mapping without using a declarative extension. In this case, however, you will not be able to use relationships as they are used in SA, but for simple data import/export types of models this should suffice:
# models.py
class User(object):
    pass

----

# mappings.py
from sqlalchemy import Table, MetaData, Column, ForeignKey, Integer, String
from sqlalchemy.orm import mapper
from models import User

metadata = MetaData()

user = Table('user', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('fullname', String(50)),
    Column('password', String(12))
)

mapper(User, user)
Another option would be to define a base class for your models in some other module and configure, at start-up, whether that base class is DB-aware or not; in the DB-aware case you can add extra features like relationships and engine configuration...
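A minimal sketch of that start-up switch, with made-up module and function names; it only works if the model modules are imported after the switch is flipped, because the base class is chosen at class-definition time:
# base.py (hypothetical)
ModelBase = object  # default: plain Python classes, no db support

def enable_db_support():
    global ModelBase
    from sqlalchemy.ext.declarative import declarative_base
    ModelBase = declarative_base()

# models.py (hypothetical) -- import this only after deciding on db support
# from base import ModelBase
# class Product(ModelBase):
#     ...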
It seems to me that the DRYest thing to do would be to abstract away the details of your data storage format, be that a plain text file or a database.
That is, write some kind of abstraction layer that your other code uses to store the data, and make it so that the output of your abstraction layer is switchable between SQL or text.
Or, put yet another way, don't write a Product and a DBProduct class. Instead, write a store_data() function that can be told to use either format='text' or format='db'. Then use that everywhere.
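A rough sketch of what that switchable entry point could look like (the file name and the Session factory are assumptions, not part of the question):
def store_data(product, format='text'):
    if format == 'text':
        with open('products.txt', 'a') as fh:  # placeholder file name
            fh.write(repr(vars(product)) + '\n')
    elif format == 'db':
        session = Session()  # assumes a sessionmaker configured elsewhere
        session.add(product)
        session.commit()
    else:
        raise ValueError('unknown format: %r' % format)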
This is actually the same thing SQLAlchemy does behind the scenes - you don't have to write separate code for SQLAlchemy depending on whether it's driving MySQL, PostgreSQL, etc. That is all handled inside SQLAlchemy, and you use the abstracted (database-neutral) interface.
Alternatively, if your objection to SQLAlchemy is that it's not a Python builtin, there's always sqlite3. This gives you all the goodness of an SQL relational database with none of the fat.
Alternatively again, use sqlite3 as an intermediate format. So rewrite all your code to use sqlite3, and then translate from sqlite3 to plain text (or another database) as required. In the limit case, conversion to plain text is only a sqlite3 db .dump away.
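For instance, a tiny sqlite3 round trip using the standard library only (the table and file names are placeholders):
import sqlite3

conn = sqlite3.connect('products.db')
conn.execute('CREATE TABLE IF NOT EXISTS product (name TEXT)')
conn.execute('INSERT INTO product (name) VALUES (?)', ('widget',))
conn.commit()

# plain-text export, the programmatic equivalent of .dump in the sqlite3 shell
with open('products.sql', 'w') as fh:
    fh.write('\n'.join(conn.iterdump()))
conn.close()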

ORM for Python that does not require us to define all properties for classes for database tables

I have access to a large database system, and I would like to talk to it in an efficient manner.
Are there ORM frameworks like SQLAlchemy (I know SQLAlchemy) that do not require us to define all the properties of a class for each database table?
Because the database is already there, my aim is to avoid writing out properties for the classes.
Using SQLAlchemy's introspection features you can easily enough have a factory that gives you a new mapped class given a table name. The class still has to be created, but you don't have to define its columns by hand.
import sqlalchemy
import sqlalchemy.orm

def introspect(tablename, *mapper_args, **mapper_kwargs):
    'given a table name and optional mapper arguments, return an ORM class'
    global metadata  # or pass it in, or use OO, whatever
    global engine    # or pass it in, or use OO, whatever
    table = sqlalchemy.Table(tablename, metadata,
                             autoload=True, autoload_with=engine)

    class DynamicClass(object):
        pass  # you can provide nice __init__, __str__ methods

    sqlalchemy.orm.mapper(DynamicClass, table,
                          *mapper_args, **mapper_kwargs)
    return DynamicClass
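Usage would then look something like this ('orders' is a placeholder table name, and Session is assumed to be a sessionmaker bound to the same engine):
Orders = introspect('orders')
session = Session()
first_row = session.query(Orders).first()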
Hibernate has a similar introspection feature, but it generates Java source files, and therefore is a build-time, not run-time, operation.
