I have a query that fetches a lot of data from my MySQL db, where loading all of the data into memory isn't an option. Luckily, SQLAlchemy lets me create an engine using MySQL's SSCursor, so the data is streamed and not fully loaded into memory. I can do this like so:
create_engine(connect_str, connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
That's great, but I don't want to use SSCursor for all my queries including very small ones. I'd rather only use it where it's really necessary. I thought I'd be able to do this with the stream_results setting like so:
conn.execution_options(stream_results=True).execute(MyTable.__table__.select())
Unfortunately, when I monitor memory usage while using that option, it uses exactly the same amount of memory as when I don't, whereas with SSCursor my memory usage drops to almost nothing, as expected. What am I missing? Is there some other way to accomplish this?
From the docs:
stream_results – Available on: Connection, statement. Indicate to the
dialect that results should be “streamed” and not pre-buffered, if
possible. This is a limitation of many DBAPIs. The flag is currently
understood only by the psycopg2 dialect.
I think you just want to create multiple sessions, one for streaming and one for normal queries, like:
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
def create_session(engine):
    # create a Session factory
    Session = sessionmaker()
    # bind the factory to the given engine
    Session.configure(bind=engine)
    # create and return a session
    session = Session()
    return session
#streaming
stream_engine = create_engine(connect_str, connect_args={'cursorclass': MySQLdb.cursors.SSCursor})
stream_session = create_session(stream_engine)
stream_session.execute(MyTable.__table__.select())
#normal
normal_engine = create_engine(connect_str)
normal_session = create_session(normal_engine)
normal_session.execute(MyTable.__table__.select())
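Note that to actually see the memory benefit you have to iterate the streamed result row by row (or in chunks) rather than fetching everything at once. A minimal sketch, where process() stands in for your own row handling:
result = stream_session.execute(MyTable.__table__.select())
for row in result:
    # rows are pulled from the server-side cursor incrementally
    process(row)  # process() is a hypothetical per-row handler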
I have a Python object, which is a ProtoBuf message, that I want to insert into a database.
Ideally I'd like to be able to do something like
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import mapper, sessionmaker
from event_pb2 import Event
engine = create_engine(...)
metadata = MetaData(engine)
table = Table("events", metadata, autoload=True)
mapping = mapper(Event, table)
Session = sessionmaker(engine)
session = Session()
byte_string = b'.....'
event = Event()
event.ParseFromString(byte_string)
session.add(event)
When I try the above I get the error AttributeError: 'Event' object has no attribute '_sa_instance_state' when I try to create the Event object, which isn't shocking, because the Event class has been generated by ProtoBuf.
Is there a better, i.e. safer or more succinct, way to do that than manually generating the insert statement by looping over all the field names and values? I'm not married to using SQLAlchemy if there's a better way to solve the problem.
I think it's generally advised that you limit protobuf-generated classes to the client- and server-side gRPC methods and, for any uses beyond that, map Protobuf objects to and from application-specific classes.
In this case, define a set of SQLAlchemy classes and transform the gRPC objects into SQLAlchemy specific classes for your app.
This avoids breakage if, for example, the gRPC maintainers change the implementation in a way that would break SQLAlchemy; it gives you a place to translate between, say, proto Timestamps and your preferred database time format; and it provides a level of abstraction between gRPC and SQLAlchemy that affords you more flexibility in changing one or the other.
There do appear to be some tools to help with the translation, e.g. Mercator, but these tools themselves highlight the issues with that approach.
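For illustration, a minimal sketch of that approach, assuming the Event message has id and name fields (both hypothetical here) and that the events table has matching columns:
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from event_pb2 import Event

Base = declarative_base()

class EventRecord(Base):
    # application-specific model that mirrors only the fields we care about
    __tablename__ = "events"
    id = Column(Integer, primary_key=True)
    name = Column(String)

def event_to_record(event):
    # translate the protobuf message into our own mapped class
    return EventRecord(id=event.id, name=event.name)

engine = create_engine(...)  # same connection string as in the question
Session = sessionmaker(bind=engine)
session = Session()

event = Event()
event.ParseFromString(byte_string)  # byte_string as in the question
session.add(event_to_record(event))
session.commit()
The translation function is also the natural place to convert things like proto Timestamps into database-friendly values.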
SQLAlchemy's documentation says that one can create a session in two ways:
from sqlalchemy.orm import Session
session = Session(engine)
or with a sessionmaker
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(engine)
session = Session()
Now in either case one needs a global object (either the engine, or the sessionmaker object). So I do not really see what the point of sessionmaker is. Maybe I am misunderstanding something.
I could not find any advice when one should use one or the other. So the question is: In which situation would you want to use Session(engine) and in which situation would you prefer session_maker?
The docs describe the difference very well:
Session is a regular Python class which can be directly instantiated. However, to standardize how sessions are configured and acquired, the sessionmaker class is normally used to create a top level Session configuration which can then be used throughout an application without the need to repeat the configurational arguments.
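In other words, sessionmaker is just a preconfigured factory. A small sketch of the difference (the engine URL and options are placeholders):
from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

engine = create_engine("sqlite://")  # placeholder

# direct instantiation: the configuration is repeated at every call site
session_a = Session(bind=engine, expire_on_commit=False)
session_b = Session(bind=engine, expire_on_commit=False)

# factory: configure once, then create sessions anywhere without repeating it
SessionFactory = sessionmaker(bind=engine, expire_on_commit=False)
session_c = SessionFactory()
session_d = SessionFactory()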
I have a multithreaded data analysis pipeline, which queries a database (via SQLAlchemy). Additionally, the database is synchronized across multiple systems by syncthing - long story short, this means that write permission cannot always be guaranteed.
Even when I am able to guarantee write access, I still occasionally and rather randomly get operational errors:
OperationalError: (sqlite3.OperationalError) database is locked
The code I use to load the session for the query is the following:
from os import path

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def loadSession(db_path):
    db_path = "sqlite:///" + path.expanduser(db_path)
    engine = create_engine(db_path, echo=False)
    Session = sessionmaker(bind=engine)
    session = Session()
    # Base is the declarative base defined alongside the models (see full context)
    Base.metadata.create_all(engine)
    return session, engine
And can be seen in its full context here.
My query (and the way I turn it into a value) look like this:
session, engine = loadSession(db_path)
sql_query = session.query(LaserStimulationProtocol).filter(
    LaserStimulationProtocol.code == stim_protocol_dictionary[scan_type])
mystring = sql_query.statement
mydf = pd.read_sql_query(mystring, engine)
delay = int(mydf["stimulation_onset"][0])
And again, the full context can be found here.
How could I change my code so the database can be queried without having to rely on the file being writeable/unlocked? I have checked the file's checksum, and it does not change upon query, so clearly I'm not writing anything to it. As such, I guess there should be some way to extract the info I am looking for without write access?
I've written a blog post on the subject which provides some more explanation of the issue and some ways to work around it: http://charlesleifer.com/blog/multi-threaded-sqlite-without-the-operationalerrors/
Peewee ORM has an extension that is designed to support multiple threads writing to a SQLite database. http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#sqliteq
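If you only ever read from that file, one workaround worth trying (a sketch, not tested against your setup; the path is a placeholder) is to open the SQLite file in read-only mode via an SQLite URI and to raise the busy timeout so readers wait out short-lived locks:
import sqlite3
from sqlalchemy import create_engine

def connect_readonly():
    # open the file read-only; the timeout lets readers wait out transient locks
    return sqlite3.connect("file:/path/to/db.sqlite?mode=ro", uri=True, timeout=30)

engine = create_engine("sqlite://", creator=connect_readonly)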
I have a Flask application with which I'd like to use SQLAlchemy Core (i.e. I explicitly do not want to use an ORM), similarly to this "fourth way" described in the Flask doc:
http://flask.pocoo.org/docs/patterns/sqlalchemy/#sql-abstraction-layer
I'd like to know what would be the recommended pattern in terms of:
How to connect to my database (can I simply store a connection instance in the g.db variable, in before_request?)
How to perform reflection to retrieve the structure of my existing database (if possible, I'd like to avoid having to explicitly create any "model/table classes")
Correct: You would create a connection once per thread and access it using a threadlocal variable. As usual, SQLAlchemy has thought of this use-case and provided you with a pattern: Using the Threadlocal Execution Strategy
db = create_engine('mysql://localhost/test', strategy='threadlocal')
db.execute('SELECT * FROM some_table')
Note: If I am not mistaken, the example seems to mix up the names db and engine (which should be db as well, I think).
I think you can safely disregard the Note posted in the documentation as this is explicitly what you want. As long as each transaction scope is linked to a thread (as is with the usual flask setup), you are safe to use this. Just don't start messing with threadless stuff (but flask chokes on that anyway).
Reflection is pretty easy as described in Reflecting Database Objects. Since you don't want to create all the tables manually, SQLAlchemy offers a nice way, too: Reflecting All Tables at Once
meta = MetaData()
meta.reflect(bind=someengine)
users_table = meta.tables['users']
addresses_table = meta.tables['addresses']
I suggest you check that complete chapter concerning reflection.
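As for the first part of your question (the connection in g.db), a rough sketch of how the per-request connection could be combined with the reflected tables in Flask (the engine URL, table name, and route are placeholders):
from flask import Flask, g
from sqlalchemy import create_engine, MetaData

app = Flask(__name__)
engine = create_engine('mysql://localhost/test')  # placeholder URL

# reflect the existing schema once at startup
meta = MetaData()
meta.reflect(bind=engine)
users_table = meta.tables['users']

@app.before_request
def before_request():
    # check a connection out of the pool for this request
    g.db = engine.connect()

@app.teardown_request
def teardown_request(exc):
    # return the connection to the pool when the request ends
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()

@app.route('/users')
def list_users():
    rows = g.db.execute(users_table.select()).fetchall()
    return str(rows)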
Which is the best driver in Python to connect to PostgreSQL?
There are a few possibilities (http://wiki.postgresql.org/wiki/Python), but I don't know which is the best choice.
Any ideas?
psycopg2 is the one everyone uses with CPython. For PyPy though, you'd want to look at the pure Python ones.
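For reference, a minimal psycopg2 sketch (the connection parameters are placeholders for your own database):
import psycopg2

# connection parameters are placeholders
conn = psycopg2.connect(dbname='test', user='scott', password='tiger', host='localhost')
cur = conn.cursor()
cur.execute('SELECT version()')
print(cur.fetchone())
cur.close()
conn.close()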
I would recommend SQLAlchemy - it offers great flexibility and has a sophisticated interface.
Furthermore, it's not bound to PostgreSQL alone.
Shameless c&p from the tutorial:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
# an Engine, which the Session will use for connection
# resources
some_engine = create_engine('postgresql://scott:tiger@localhost/')
# create a configured "Session" class
Session = sessionmaker(bind=some_engine)
# create a Session
session = Session()
# work with the session
myobject = MyObject('foo', 'bar')
session.add(myobject)
session.commit()
Clarifications due to comments (update):
SQLAlchemy itself is not a driver but a so-called Object Relational Mapper. It relies on a DBAPI driver underneath, which in the PostgreSQL case is typically psycopg2, itself a wrapper around libpq.
Because the OP emphasized that he wanted the "best driver" to "connect to postgresql", I pointed out SQLAlchemy, even if that might be the wrong answer terminology-wise; intention-wise I felt it to be the more useful one.
And even though I do not like the hair-splitting dance, I ended up doing it nonetheless, due to the pressure I felt coming from the comments on my answer.
I apologize for any irritation caused by my slander.