Idiomatic Way to Insert/Upsert Protobuf Into A Relational Database - python

I have a Python object that is a Protobuf message, and I want to insert it into a database.
Ideally I'd like to be able to do something like
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import mapper, sessionmaker
from event_pb2 import Event

engine = create_engine(...)
metadata = MetaData(engine)
table = Table("events", metadata, autoload=True)
mapping = mapper(Event, table)

Session = sessionmaker(engine)
session = Session()

byte_string = b'.....'
event = Event()
event.ParseFromString(byte_string)
session.add(event)
When I try the above I get the error AttributeError: 'Event' object has no attribute '_sa_instance_state' when I try to create the Event object. That isn't shocking, because the Event class has been generated by Protobuf.
Is there a better i.e. safer or more succinct way to do that than manually generating the insert statement by looping over all the field names and values? I'm not married to using SqlAlchemy if there's a better way to solve the problem.

I think it's generally advised that you limit Protobuf-generated classes to the client- and server-side gRPC methods and, for any use beyond that, map the Protobuf objects to and from application-specific classes.
In this case, define a set of SQLAlchemy classes and transform the gRPC objects into SQLAlchemy specific classes for your app.
This avoids breakage if e.g. gRPC maintainers change the implementation in a way that would break SQLAlchemy, it provides you with a means to translate between e.g. proto Timestamps and your preferred database time format, and it provides a level of abstraction between gRPC and SQLAlchemy that affords you more flexibility in making changes to one or the other.
There do appear to be some tools to help with the translation, e.g. Mercator, but these highlight the issues with that approach.
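A minimal sketch of that separation, assuming a hypothetical Event message with string fields name and payload (adjust the columns and type conversions to your actual schema):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class EventRecord(Base):
    """Application-side model, deliberately separate from the generated Event class."""
    __tablename__ = "events"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    payload = Column(String(1000))

    @classmethod
    def from_proto(cls, msg):
        # Copy only the fields the table needs; do any type translation
        # (e.g. proto Timestamp -> datetime) here.
        return cls(name=msg.name, payload=msg.payload)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

# event = Event(); event.ParseFromString(byte_string)
# session.add(EventRecord.from_proto(event))
```

The from_proto classmethod is the single place where proto-to-database translation happens, so a change on either side stays localized.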

Related

Why does sqlalchemy use DeclarativeMeta class inheritance to map objects to tables

I'm learning sqlalchemy's ORM and I'm finding it very confusing/unintuitive. Say I want to create a User class and a corresponding table. I think I should do something like
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///todooo.db")
Base = declarative_base()

class User(Base):
    __tablename__ = 'some_table'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session1 = Session()
user1 = User(name='fred')
session1.add(user1)
session1.commit()
Here I
create an Engine. My understanding is that the engine is like the front-line communicator between my code and the database.
create a DeclarativeMeta, a metaclass whose job I think is to keep track of a mapping between my Python classes and my SQL tables
create a User class
initialize my database and tables with Base.metadata.create_all(engine)
create a Session factory class with sessionmaker
create a Session instance, session1
create a User instance, user1
The thing I find quite confusing here is the Base superclass. What is the benefit to using it as opposed to doing something like engine.create_table(User)?
Additionally, if we don't need a Session to create the database and insert tables, why do we need a Session to insert records?
SQLAlchemy needs a mechanism to hook the classes being mapped to the database rows. Basically, you have to tell it: use the class User as a mapped class for the table some_table. One way to do that is to use a common base class, the declarative base. Another way is to call a function to register your class with the mapper. The declarative base used to be an extension of sqlalchemy, IIRC, but later it became standard.
Now, having a common base makes perfect sense to me, because I do not have to make an extra call to register the mapping. Instead, whatever inherits from the declarative base is mapped automatically. Both approaches work in general.
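The automatic registration is easy to see for yourself. In this sketch, subclassing the declarative base is the only step taken, yet SQLAlchemy's inspect() already reports a Mapper for the class:

```python
from sqlalchemy import Column, Integer, String, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

# No explicit registration call was made, yet a Mapper now exists:
mapper = inspect(User)
print(mapper.local_table.name)  # -> users
```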
Engine is able to give you connections and can take care of connection pooling. Connection is able to run things on the database. No ORM as yet. With a connection, you can create and run queries using QL (query language), but you have no mapping of database data to python objects.
Session uses a connection and takes care of the ORM. Better to read the docs here, but the simplest case is:
user1.name = "Different Fred"
That's it. The session will generate and execute the SQL at the right moment. Really, read the docs.
Now, you can create a table with just a connection; it does not make much sense to involve the session in that process, because the session takes care of the current mapping session, and there is nothing to map if the table does not physically exist yet. So you create the tables with the connection, and then you can make a session and use the mapping. Also, table creation is usually a one-off action done separately from the normal program run (at least create_all).
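The division of labor described above can be sketched like this (an illustrative in-memory SQLite example, not the poster's setup): DDL goes through the engine once, then the session handles mapped objects and flushes plain attribute assignments as SQL at commit time.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

engine = create_engine("sqlite://")
# DDL: done once, via the engine -- no session involved yet.
Base.metadata.create_all(engine)

# DML: the session tracks mapped objects and emits SQL at commit time.
Session = sessionmaker(bind=engine)
session = Session()
user1 = User(name="fred")
session.add(user1)
session.commit()

user1.name = "Different Fred"   # plain attribute assignment...
session.commit()                # ...flushed as an UPDATE here
```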

SQLAlchemy doesn't map reflected class

I have this code:
def advertiser_table(engine):
    return Table('advertiser', metadata, autoload=True, autoload_with=engine)
And later I try this:
advertisers = advertiser_table(engine)
...
session.bulk_insert_mappings(
    advertisers.name,
    missing_advertisers.to_dict('records'),
)
where missing_advertisers is a Pandas DataFrame (but that's not important for this question).
The error this gives me is:
sqlalchemy.orm.exc.UnmappedClassError: Class ''advertiser'' is not mapped
From reading the documentation I could scramble together enough to ask the question, but not much more than that... What is a Mapper, and why is it so central to the functioning of this library? Why isn't "the class" mapped? And what am I to do to "map" it to whatever this library wants me to map?
A Mapper is the M in ORM. It is the thing that maps your table (advertisers in this case) to instances of a class (which you are missing in this case) in order for you to operate on it.
The reason it's confusing for you is because SQLAlchemy is actually two libraries in one -- one is called SQLAlchemy Core, and the other is the SQLAlchemy ORM. Core provides the ability to work with tables and to construct queries that return rows, while the ORM builds on top of Core to provide the ability to work with instances of classes and their relationships as an abstraction. Core roughly corresponds to things you can do on Connection and Engine, while ORM roughly corresponds to things you can do on Session.
So, all of that is to say, session.bulk_insert_mappings is an ORM functionality, and you cannot use it without having a mapped class.
What can you do instead? Use the equivalent Core functionality:
query = advertisers.insert().values(missing_advertisers.to_dict('records'))
engine.execute(query) # or session.execute(query)
Or even use the pandas-provided to_sql function:
missing_advertisers.to_sql("advertiser", engine, if_exists="append")
If you insist on using the ORM, you need to declare a mapped class for your table. The easiest way when using reflection is to use automap. The linked documentation has many examples, so I won't go into detail here.
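A sketch of the automap route, using an illustrative in-memory SQLite table named advertiser (substitute your real connection string):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE advertiser (id INTEGER PRIMARY KEY, name VARCHAR(50))"))

Base = automap_base()
Base.prepare(autoload_with=engine)    # reflect and map every table
Advertiser = Base.classes.advertiser  # a real mapped class now

session = Session(engine)
session.bulk_insert_mappings(Advertiser, [{"name": "acme"}, {"name": "globex"}])
session.commit()
```

After prepare(), Base.classes.advertiser is an ordinary mapped class, so ORM methods like bulk_insert_mappings accept it.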

Can SQLAlchemy Use MySQL's SSCursor For Only Some Queries?

I have a query that fetches a lot of data from my MySQL db, where loading all of the data into memory isn't an option. Luckily, SQLAlchemy lets me create an engine using MySQL's SSCursor, so the data is streamed and not fully loaded into memory. I can do this like so:
create_engine(connect_str, connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
That's great, but I don't want to use SSCursor for all my queries including very small ones. I'd rather only use it where it's really necessary. I thought I'd be able to do this with the stream_results setting like so:
conn.execution_options(stream_results=True).execute(MyTable.__table__.select())
Unfortunately, when monitoring memory usage when using that, it seems to use the exact same amount of memory as if I don't do that, whereas using SSCursor, my memory usage goes down to nil as expected. What am I missing? Is there some other way to accomplish this?
From the docs:
stream_results – Available on: Connection, statement. Indicate to the dialect that results should be “streamed” and not pre-buffered, if possible. This is a limitation of many DBAPIs. The flag is currently understood only by the psycopg2 dialect.
I think you just want to create multiple sessions, one for streaming and one for normal queries, like:
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine

def create_session(engine):
    # configure a Session factory bound to the given engine
    Session = sessionmaker()
    Session.configure(bind=engine)
    # create a working session from the factory
    session = Session()
    return session
# streaming
stream_engine = create_engine(connect_str, connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
stream_session = create_session(stream_engine)
stream_session.execute(MyTable.__table__.select())

# normal
normal_engine = create_engine(connect_str)
normal_session = create_session(normal_engine)
normal_session.execute(MyTable.__table__.select())

Pattern for a Flask App using (only) SQLAlchemy Core

I have a Flask application with which I'd like to use SQLAlchemy Core (i.e. I explicitly do not want to use an ORM), similarly to this "fourth way" described in the Flask doc:
http://flask.pocoo.org/docs/patterns/sqlalchemy/#sql-abstraction-layer
I'd like to know what would be the recommended pattern in terms of:
How to connect to my database (can I simply store a connection instance in the g.db variable, in before_request?)
How to perform reflection to retrieve the structure of my existing database (if possible, I'd like to avoid having to explicitly create any "model/table classes")
Correct: You would create a connection once per thread and access it using a threadlocal variable. As usual, SQLAlchemy has thought of this use-case and provided you with a pattern: Using the Threadlocal Execution Strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')
db.execute('SELECT * FROM some_table')
Note: if I am not mistaken, the example in the docs mixes up the names db and engine (the engine created there should be named db as well, I think).
I think you can safely disregard the Note posted in the documentation as this is explicitly what you want. As long as each transaction scope is linked to a thread (as is with the usual flask setup), you are safe to use this. Just don't start messing with threadless stuff (but flask chokes on that anyway).
Reflection is pretty easy as described in Reflecting Database Objects. Since you don't want to create all the tables manually, SQLAlchemy offers a nice way, too: Reflecting All Tables at Once
meta = MetaData()
meta.reflect(bind=someengine)
users_table = meta.tables['users']
addresses_table = meta.tables['addresses']
I suggest you check that complete chapter concerning reflection.
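Putting the two pieces together, a Core-only round trip over a reflected table might look like the sketch below (illustrative in-memory SQLite; in a Flask app you would open the connection in before_request and stash it on g):

```python
from sqlalchemy import create_engine, MetaData, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(50))"))

# Reflect every existing table -- no model/table classes declared anywhere.
meta = MetaData()
meta.reflect(bind=engine)
users = meta.tables["users"]

# Core-only usage: plain connections, insert()/select() constructs.
with engine.begin() as conn:
    conn.execute(users.insert(), [{"name": "alice"}, {"name": "bob"}])
    rows = conn.execute(users.select()).fetchall()

print(len(rows))  # -> 2
```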

Add database support at runtime

I have a Python module that I've been using over the years to process a series of text files for work. I now need to store some of the info in a db (using SQLAlchemy), but I would still like the flexibility of using the module without db support, i.e. without actually having sqlalchemy imported (or installed). As of right now I have the following, and I've been creating Product or DBProduct, etc. depending on whether I intend to use a db or not.
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class Product(object):
    pass

class WebSession(Product):
    pass

class Malware(WebSession):
    pass

class DBProduct(Product, Base):
    pass

class DBWebSession(WebSession, DBProduct):
    pass

class DBMalware(Malware, DBWebSession):
    pass
However, I feel there has got to be an easier/cleaner way to do this. I feel I'm creating an inheritance mess and potential problems down the road. Ideally, I'd like to create a single class for Product, WebSession, etc. (maybe using decorators) that contains the information necessary for using a db, but is only enabled/functional after calling something like enable_db_support(). Once that function is called, then regardless of which object I create, it (and all the classes it inherits from) enables all the column bindings, etc. I should also note that if I somehow figure out how to combine Product and DBProduct into one class, I sometimes need two versions of the same function: one called if db support is enabled and one if it's not. I've also considered "recreating" the object hierarchy when enable_db_support() is called, but that turned out to be a nightmare as well.
Any help is appreciated.
Well, you can probably get away with creating a pure non-DB aware model by using Classical Mapping without using a declarative extension. In this case, however, you will not be able to use relationships as they are used in SA, but for simple data import/export types of models this should suffice:
# models.py
class User(object):
    pass

----

# mappings.py
from sqlalchemy import Table, MetaData, Column, ForeignKey, Integer, String
from sqlalchemy.orm import mapper
from models import User

metadata = MetaData()

user = Table('user', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String(50)),
    Column('fullname', String(50)),
    Column('password', String(12))
)

mapper(User, user)
Another option would be to have a base class for your models defined in some other module and configure on start-up this base class to be either DB-aware or not, and in case of DB-aware version add additional features like relationships and engine configurations...
It seems to me that the DRYest thing to do would be to abstract away the details of your data storage format, be that a plain text file or a database.
That is, write some kind of abstraction layer that your other code uses to store the data, and make it so that the output of your abstraction layer is switchable between SQL or text.
Or put yet another way, don't write a Product and DB_Product class. Instead, write a store_data() function that can be told to use either format='text' or format='db'. Then use that everywhere.
This is actually the same thing SQLAlchemy does behind the scenes - you don't have to write separate code for SQLAlchemy depending on whether it's driving MySQL, PostgreSQL, etc. That is all handled inside SQLAlchemy, and you use the abstracted (database-neutral) interface.
Alternately, if your objection to SQLAlchemy is that it's not a Python builtin, there's always sqlite3. This gives you all the goodness of an SQL relational database with none of the fat.
Alternately alternately, use sqlite3 as an intermediate format. So rewrite all your code to use sqlite3, and then translate from sqlite3 to plain text (or another database) as required. In the limit case, conversion to plain text is only a sqlite3 db .dump away.
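The abstraction-layer suggestion above can be sketched with the stdlib's sqlite3 (store_data and the tab-separated text format are illustrative names, not a real API):

```python
import sqlite3

def store_data(records, fmt, target):
    """Persist records (a list of (name, value) tuples) to the chosen backend.

    fmt="text" appends tab-separated lines to the file at target;
    fmt="db" writes rows to an sqlite3 database at target.
    Callers never see the storage details.
    """
    if fmt == "text":
        with open(target, "a") as fh:
            for name, value in records:
                fh.write(f"{name}\t{value}\n")
    elif fmt == "db":
        conn = sqlite3.connect(target)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS product (name TEXT, value TEXT)")
        conn.executemany("INSERT INTO product VALUES (?, ?)", records)
        conn.commit()
        conn.close()
    else:
        raise ValueError(f"unknown format: {fmt}")
```

The rest of the module only ever calls store_data(), so toggling db support becomes a matter of passing a different fmt rather than swapping class hierarchies.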
