We have a file db.py where a peewee database is defined:
db = PostgresqlExtDatabase('mom',
                           user=DB_CONFIG['username'],
                           password=DB_CONFIG['password'],
                           host=DB_CONFIG['host'],
                           port=DB_CONFIG['port'],
                           threadlocals=True,
                           register_hstore=False,
                           autocommit=True,
                           autorollback=True,
                           cursor_factory=DictCursor)
Calling db.execute("SOME RAW SQL UPDATE QUERY") works as expected.
But calling a begin before that does not stop the DB from being modified.
db.begin()
db.execute("SOME RAW SQL UPDATE QUERY") # <- Does not wait, db is updated immediately here
db.commit()
Am I doing this right?
I basically need to nest the raw SQL in a transaction if one is already ongoing,
and otherwise just execute it right away if no transaction has been begun.
This works as expected if I do db.set_autocommit(False), then execute_sql(), then commit().
It also works inside the atomic() context manager.
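For clarity, here is a sketch of the two variants that do behave as expected for me (same db object as above; the rollback handling is just illustrative):

# Variant 1: works -- explicit autocommit toggle around the raw SQL.
db.set_autocommit(False)
try:
    db.execute_sql("SOME RAW SQL UPDATE QUERY")
    db.commit()
except Exception:
    db.rollback()
    raise
finally:
    db.set_autocommit(True)

# Variant 2: also works -- the atomic() context manager.
with db.atomic():
    db.execute_sql("SOME RAW SQL UPDATE QUERY")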
To give some context: I am working on a logistics web application,
and our codebase uses Flask and an SQLAlchemy scoped_session with autocommit set to True.
It does not use the SQLAlchemy ORM (due to historical reasons)
and instead just uses the Session object and its
execute(), begin(), begin_nested(), rollback() and remove() methods.
The way it does this is by defining Session = scoped_session(sessionmaker(autocommit=True)) in one file,
then calling session = Session() everywhere in the codebase and executing the queries using session.execute("SQL").
Sometimes, a session.begin() is called, so the query does not execute until the commit (or rollback).
We'd now really like to use peewee.
But the codebase is built on this session, so it has to be spoofed.
Going in and changing every file is impossible, and there aren't enough test cases to make that safe (again, for historical reasons).
Also I had some questions but I don't know where to ask them, so I hope you don't mind if I put them here:
Is this db object (and its connection) bound to the thread it is executing in?
Basically will there be some bug if db is imported from two different files, and db.begin() is called from each?
I can see in the IPython shell that the id of the db object above is the same per thread,
so am I correct in assuming that, unless the psycopg2 connection is recreated, this should be isolated?
To spoof the SQLAlchemy Session, I have created a wrapper class that returns the required kind of session:
either the SQLAlchemy Session object, or the wrapper I've written around peewee to spoof it.
class SessionMocker(object):
    # DO NOT make this a singleton. Sessions will break

    def __init__(self, orm_type=ORM_TYPES.SQLA):
        assert orm_type in ORM_TYPES, "Invalid session constructor type"
        super(SessionMocker, self).__init__()
        self.orm_type = orm_type

    def __call__(self, *args, **kwargs):
        if self.orm_type == ORM_TYPES.SQLA:
            return SQLASession(*args, **kwargs)
        if self.orm_type == ORM_TYPES.PEEWEE:
            # For now lets assume no slave
            return SessionWrapper(*args, **kwargs)
        raise NotImplementedError

    def __getattr__(self, item):
        """
        Assuming this will never be called without calling Session() first.
        Else there is no way to tell what type of Session class (ORM) is required, since that can't be passed.
        """
        if self.orm_type == ORM_TYPES.SQLA:
            kls = SQLASession
        elif self.orm_type == ORM_TYPES.PEEWEE:
            kls = SessionWrapper
        else:
            raise NotImplementedError
        return getattr(kls, item)
Session = SessionMocker(ORM_TYPES.SQLA)
I figured this will allow the codebase to make a transparent and seamless switch over to using peewee without having to change it everywhere.
How can I do this in a better way?
The docs explain how to do this: http://docs.peewee-orm.com/en/latest/peewee/transactions.html#autocommit-mode
But, tl;dr, you need to disable autocommit before begin/commit/rollback will work like you expect:
db.set_autocommit(False)
db.begin()
try:
    user.delete_instance(recursive=True)
except:
    db.rollback()
    raise
else:
    try:
        db.commit()
    except:
        db.rollback()
        raise
finally:
    db.set_autocommit(True)
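If you'd rather not manage the flag by hand, peewee also ships transaction context managers that do the same bookkeeping; a rough sketch (same db and user objects as above):

# transaction() begins, then commits on success / rolls back on an exception.
with db.transaction():
    user.delete_instance(recursive=True)

# atomic() behaves like transaction() at the top level, but nests via savepoints
# if a transaction is already in progress -- close to the "nest if a transaction
# is ongoing" requirement from the question.
with db.atomic():
    db.execute_sql("SOME RAW SQL UPDATE QUERY")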
The default value for autocommit is True, but the default value of autorollback is False. Setting autorollback to True automatically rolls back the transaction when an exception occurs while executing a query. I can't be sure, but maybe this is confusing the situation, so if you want, try it with autorollback set to False.
I am running tests on some functions. I have a function that uses database queries, so I have gone through the blogs and docs that say we have to make an in-memory or test database to use such functions. Below is my function:
def already_exists(story_data, c):
    # TODO(salmanhaseeb): Implement de-dupe functionality by checking if it already
    # exists in the DB.
    c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = c.fetchone()
    if number_of_rows > 0:
        return True
    return False
This function hits the production database. My question is: when testing, I create an in-memory database and populate my values there, so I want to be querying that database (the test DB). But when I call my already_exists() function from the test, the production DB is hit instead. How do I make the function hit my test DB while testing it?
There are two routes you can take to address this problem:
Make an integration test instead of a unit test and just use a copy of the real database.
Provide a fake to the method instead of actual connection object.
Which one you should do depends on what you're trying to achieve.
If you want to test that the query itself works, then you should use an integration test. Full stop. The only way to make sure the query works as intended is to run it with test data already in a copy of the database. Running it against a different database technology (e.g., running against SQLite when your production database is PostgreSQL) will not ensure that it works in production. Needing a copy of the database means you will need some automated deployment process for it that can be easily invoked against a separate database. You should have such an automated process anyway, as it helps ensure that your deployments across environments are consistent, allows you to test them prior to release, and "documents" the process of upgrading the database. Standard solutions for this are migration tools written in your programming language, like Alembic, or tools that execute raw SQL, like yoyo or Flyway. You would need to invoke the deployment and fill the database with test data prior to running the test, then run the test and assert the output you expect to be returned.
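As a rough illustration of that first route, a pytest sketch (the fixture, schema and seed values are made up, and it assumes the production database is itself SQLite, which the ? placeholders in the question suggest; for PostgreSQL you would connect with psycopg2 instead but keep the same structure):

import sqlite3
import pytest

@pytest.fixture
def test_cursor(tmp_path):
    # A throwaway copy of the schema; in a real setup this would be created by
    # the same deployment/migration scripts that build production.
    conn = sqlite3.connect(str(tmp_path / "test.db"))
    conn.execute("CREATE TABLE posts (post_id INTEGER)")                # stand-in for the real schema
    conn.execute("INSERT INTO posts (post_id) VALUES (?)", (10,))       # seed test data
    conn.commit()
    yield conn.cursor()
    conn.close()

def test_already_exists_finds_seeded_post(test_cursor):
    story = Story(post_id=10)  # hypothetical domain object, as in the mock example below
    assert already_exists(story, test_cursor)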
If you want to test the code around the query and not the query itself, then you can use a fake for the connection object. The most common solution is a mock. Mocks provide stand-ins that can be configured to accept function calls and inputs and return some output in place of the real object. This would allow you to test that the logic of the method works correctly, assuming that the query returns the results you expect. For your method, such a test might look something like this:
from unittest.mock import Mock

...

def test_already_exists_returns_true_for_positive_count():
    mockConn = Mock(
        execute=Mock(),
        fetchone=Mock(return_value=(5,)),
    )
    story = Story(post_id=10)  # Making some assumptions about what your object might look like.

    result = already_exists(story, mockConn)

    assert result
    # Possibly assert calls on the mock. Value of these asserts is debatable.
    mockConn.execute.assert_called_with("""SELECT COUNT(*) from posts where post_id = ?""", (story.post_id,))
    mockConn.fetchone.assert_called()
The issue is ensuring that your code consistently uses the same database connection. Then you can set it once to whatever is appropriate for the current environment.
Rather than passing the database connection around from method to method, it might make more sense to make it a singleton.
def already_exists(story_data):
    # Here `connection` is a singleton which returns the database connection.
    connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
Or make connection a method on each class and turn already_exists into a method. It should probably be a method regardless.
def already_exists(self):
    # Here the connection is associated with the object.
    self.connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (self.post_id,))
    (number_of_rows,) = self.connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
But really you shouldn't be rolling this code yourself. Instead you should use an ORM such as SQLAlchemy which takes care of basic queries and connection management like this for you. It has a single connection, the "session".
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_declarative import Address, Base, Person
engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.bind = engine
DBSession = sessionmaker(bind=engine)
session = DBSession()
Then you use that to make queries. For example, it has an exists method.
q = session.query(Post.id).filter(Post.id == story_data.post_id)
session.query(q.exists()).scalar()
Using an ORM will greatly simplify your code. Here's a short tutorial for the basics, and a longer and more complete tutorial.
Currently I am developing a class which abstracts SQLAlchemy. This class will act as a helper tool to verify values from the database, and it will be used in regression/load tests. Test cases will make hundreds of thousands of database queries. The layout of my class is as follows:
class MyDBClass:
    def __init__(self, dbName):
        self.dbName = dbName
        self.dbEngines = {}
        self.dbMetaData = {}
        self.dbSession = {}
        self.dbEngines[dbName] = create_engine()
        self.dbMetaData[dbName] = MetaData()
        self.dbMetaData[dbName].reflect(bind=self.dbEngines[dbName])
        self.dbSession[dbName] = sessionmaker(bind=self.dbEngines[dbName])

    def QueryFunction(self, dbName, tableName, *some_arguments):
        session = self.dbSession[dbName]()
        query = session.query(requiredTable)
        result = query.filter().all()
        session.close()

    def updateFunction(self, dbName, tableName, *some_arguments):
        session = self.dbSession[dbName]()
        session.query(requiredTable).filter().update()
        session.commit()
        session.close()

    def insertFunction(self, dbName, tableName, *some_arguments):
        connection = self.dbEngines[dbName].connect()
        requiredTable = self.dbMetaData[dbName].tables[tableName]
        connection.execute(requiredTable.insert(values=columnValuePair))
        connection.close()

    def cleanClose(self):
        # Code which will remove the connection/session/object from memory
        # and do some graceful work to close cleanly.
        pass
I want to write the cleanClose() method so that it removes whatever objects this class may have created, providing a clean close and helping avoid memory leaks.
I am not able to figure out which objects should be removed from memory. Can someone suggest what method calls I need to make here?
Edit1:
Is there any way by which I can measure the performance of the different methods and their variants?
I was going through the documentation and realized that I should not create a session in every method; rather, I should create a single session instance and use it throughout. Please provide your feedback on this, and let me know what would be the best way of doing things here.
Any kind of help will be greatly appreciated here.
To remove objects from memory in Python, you just need to stop referencing them. There is not usually any need to explicitly write or call any methods to destroy or clean up the objects. So, an instance of MyDBClass will be automatically cleaned up when it goes out of scope.
If you are talking about closing down an SQLAlchemy session, then you just need to call the close() method on it.
An SQLAlchemy session is designed for multiple transactions. You don't generally need to create and destroy it multiple times. Create one session in the __init__ function and then use that in QueryFunction, updateFunction, etc.
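A rough sketch of that arrangement, including what cleanClose() would then contain (the url parameter and the trimmed method bodies are placeholders, not your actual code):

from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker

class MyDBClass:
    def __init__(self, dbName, url):
        # `url` stands in for whatever connection string you use.
        self.dbName = dbName
        self.engine = create_engine(url)
        self.metadata = MetaData()
        self.metadata.reflect(bind=self.engine)
        self.session = sessionmaker(bind=self.engine)()  # one session, reused by every method

    def QueryFunction(self, tableName):
        table = self.metadata.tables[tableName]
        return self.session.execute(table.select()).fetchall()

    def cleanClose(self):
        # Closing the session and disposing of the engine's connection pool is
        # normally all the explicit cleanup that is needed.
        self.session.close()
        self.engine.dispose()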
I am writing a class for database queries with SQLite3 in my application. Most of the methods of the class are very similar to this:
def getPrice(self, symbol, date):
    date = dt.datetime.strptime(date, '%Y-%m-%d')
    conn = sqlite3.connect('stocks.db')
    curs = conn.cursor()
    curs.execute('''SELECT close FROM prices WHERE symbol = ? AND date = ?;''', (symbol, date))
    close = curs.fetchall()
    curs.close()
    return close
The only difference is the database query and the number of arguments. Is there a possibility to abstract the opening and closing of the database connection away?
I know that it would be probably easier to use a ORM like SQLAlchemy. But I want to understand how I solve this kind of problem in general, not only in relation to databases.
Thanks for your suggestions!
EDIT: This post basically answers my question.
First. You'll be much, much happier with one -- and only one -- global connection. Configuration changes are much easier if you do this in exactly one place.
Second, use the with statement and the context manager library.
from contextlib import closing
from my_database_module import the_global_connection

def getPrice(symbol, date):
    with closing(the_global_connection.cursor()) as curs:
        curs.execute('''SELECT close FROM prices WHERE symbol = ? AND date = ?;''', (symbol, date))
        close = curs.fetchall()
    return close
Your database module looks like this:
import sqlite3
the_global_connection = sqlite3.connect( "stocks.db" )
This gives you the ability to change databases, or database server technology in exactly one place.
Note that as of Python 2.6, the connection returned by sqlite3.connect() can be used as a context manager:
Connection objects can be used as context managers that automatically
commit or rollback transactions. In the event of an exception, the
transaction is rolled back; otherwise, the transaction is committed:
Therefore, do not decorate the connection with contextlib.closing -- otherwise, you will lose the commit/rollback behavior and instead only get the connection.close() called upon exiting the with-statement.
Per PEP249:
... closing a connection without committing the changes first will cause
an implicit rollback to be performed.
So the commit/rollback behavior is much more useful than simply calling close.
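In other words, something along these lines, reusing the module-level connection from above (a sketch):

from my_database_module import the_global_connection

def getPrice(symbol, date):
    # Entering the connection's context commits on success and rolls back on an
    # exception; the connection itself stays open for the next call.
    with the_global_connection:
        curs = the_global_connection.cursor()
        curs.execute('SELECT close FROM prices WHERE symbol = ? AND date = ?', (symbol, date))
        return curs.fetchall()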
You could use a context manager:
import contextlib

def query(sql, args):
    with contextlib.closing(sqlite3.connect('stocks.db')) as conn:
        curs = conn.cursor()
        curs.execute(sql, args)
        close = curs.fetchall()
    return close

def getPrice(self, symbol, date):
    date = dt.datetime.strptime(date, '%Y-%m-%d')
    sql = '''SELECT close FROM prices WHERE symbol = ? AND date = ?'''
    args = (symbol, date)
    return query(sql, args)
Since you have many functions like getPrice which differ only by the SQL and arguments, you could reduce the repetitious boiler-plate code by defining the query function.
You could also define a context manager to roll back the connection on errors, and to commit as well as close upon exiting the with block. An example of this (for MySQL) can be found here; adapting it to sqlite3 should not be difficult.
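A sketch of what such a context manager might look like for sqlite3 (the names and the commit-on-success policy are my assumptions):

import contextlib
import sqlite3

@contextlib.contextmanager
def stocks_db(path='stocks.db'):
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()       # commit if the block ran without errors
    except Exception:
        conn.rollback()     # undo partial changes on any error
        raise
    finally:
        conn.close()

# usage
symbol, date = 'AAPL', '2020-01-02'  # example values
with stocks_db() as conn:
    curs = conn.cursor()
    curs.execute('SELECT close FROM prices WHERE symbol = ? AND date = ?', (symbol, date))
    rows = curs.fetchall()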
Reference:
The contextlib.closing() helper
Encapsulate that logic into an object, pass that object to the data access object and ask it to call the methods.
Aspects or decorators might be a good way to do things.
You don't mention pooling or transactions. Think about those as well.
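As a rough illustration of the decorator idea (names and the connection string are placeholders, and there is no pooling here):

import functools
import sqlite3

def with_cursor(func):
    """Open a connection, hand a cursor to the wrapped function, then clean up."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        conn = sqlite3.connect('stocks.db')
        try:
            result = func(*args, cursor=conn.cursor(), **kwargs)
            conn.commit()
            return result
        finally:
            conn.close()
    return wrapper

@with_cursor
def get_price(symbol, date, cursor=None):
    cursor.execute('SELECT close FROM prices WHERE symbol = ? AND date = ?', (symbol, date))
    return cursor.fetchall()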
This example illustrates a mystery I encountered in an application I am building. The application needs to support an option allowing the user to exercise the code without actually committing changes to the DB. However, when I added this option, I discovered that changes were persisted to the DB even when I did not call the commit() method.
My specific question can be found in the code comments. The underlying goal is to have a clearer understanding of when and why SQLAlchemy will commit to the DB.
My broader question is whether my application should (a) use a global Session instance, or (b) use a global Session class, from which particular instances would be instantiated. Based on this example, I'm starting to think that the correct answer is (b). Is that right? Edit: this SQLAlchemy documentation suggests that (b) is recommended.
import sys

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, primary_key = True)
    name = Column(String)
    age = Column(Integer)

    def __init__(self, name, age = 0):
        self.name = name
        self.age = 0

    def __repr__(self):
        return "<User(name='{0}', age={1})>".format(self.name, self.age)

engine = create_engine('sqlite://', echo = False)
Base.metadata.create_all(engine)

Session = sessionmaker()
Session.configure(bind=engine)

global_session = Session()  # A global Session instance.

commit_ages = False  # Whether to commit in modify_ages().
use_global = True    # If True, modify_ages() will commit, regardless
                     # of the value of commit_ages. Why?

def get_session():
    return global_session if use_global else Session()

def add_users(names):
    s = get_session()
    s.add_all(User(nm) for nm in names)
    s.commit()

def list_users():
    s = get_session()
    for u in s.query(User): print ' ', u

def modify_ages():
    s = get_session()
    n = 0
    for u in s.query(User):
        n += 10
        u.age = n
    if commit_ages: s.commit()

add_users(('A', 'B', 'C'))
print '\nBefore:'
list_users()

modify_ages()
print '\nAfter:'
list_users()
tl;dr - The updates are not actually committed to the database-- they are part of an uncommitted transaction in progress.
I made 2 separate changes to your call to create_engine(). (Other than this one line, I'm using your code exactly as posted.)
The first was
engine = create_engine('sqlite://', echo = True)
This provides some useful information. I'm not going to post the entire output here, but notice that no SQL update commands are issued until after the second call to list_users() is made:
...
After:
xxxx-xx-xx xx:xx:xx,xxx INFO sqlalchemy.engine.base.Engine.0x...d3d0 UPDATE users SET age=? WHERE users.id = ?
xxxx-xx-xx xx:xx:xx,xxx INFO sqlalchemy.engine.base.Engine.0x...d3d0 (10, 1)
...
This is a clue that the data is not persisted, but kept around in the session object.
The second change I made was to persist the database to a file with
engine = create_engine('sqlite:///db.sqlite', echo = True)
Running the script again provides the same output as before for the second call to list_users():
<User(name='A', age=10)>
<User(name='B', age=20)>
<User(name='C', age=30)>
However, if you now open the db we just created and query its contents, you can see that the added users were persisted to the database, but the age modifications were not:
$ sqlite3 db.sqlite "select * from users"
1|A|0
2|B|0
3|C|0
So, the second call to list_users() is getting its values from the session object, not from the database, because there is a transaction in progress that hasn't been committed yet. To prove this, add the following lines to the end of your script:
s = get_session()
s.rollback()
print '\nAfter rollback:'
list_users()
Since you state you are actually using MySQL on the system where you are seeing the problem, check the storage engine the table was created with. The default is MyISAM, which does not support ACID transactions. Make sure you are using the InnoDB engine, which does.
You can see which engine a table is using with
show create table users;
You can change the db engine for a table with alter table:
alter table users engine="InnoDB";
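If the tables are created through SQLAlchemy, you can also request InnoDB at definition time; a sketch reusing the User model from the question (mysql_engine is only honoured by the MySQL backend and ignored elsewhere):

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    __table_args__ = {'mysql_engine': 'InnoDB'}  # ask MySQL for a transactional engine

    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)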
1. the example: Just to make sure that (or check if) the session does not commit the changes, it is enough to call expunge_all on the session object. This will most probably prove that the changes are not actually committed:
....
print '\nAfter:'
get_session().expunge_all()
list_users()
2. mysql: As you already mentioned, the sqlite example might not reflect what you actually see when using mysql. As documented in sqlalchemy - MySQL - Storage Engines, the most likely reason for your problem is the usage of non-transactional storage engines (like MyISAM), which results in an autocommit mode of execution.
3. session scope: Although having one global session sounds like asking for trouble, using a new session for every tiny little request is not a great idea either. You should think of a session as a transaction/unit of work. I find the usage of contextual sessions the best of both worlds: you do not have to pass the session object down the hierarchy of method calls, and at the same time you are given pretty good safety in a multi-threaded environment. I do use a local session once in a while, where I know I do not want to interact with the currently running transaction (session).
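For illustration, a minimal sketch of the contextual (scoped) session setup referred to above, reusing the User model from the question (the engine URL is just an example):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite:///db.sqlite')
Session = scoped_session(sessionmaker(bind=engine))

def add_users(names):
    s = Session()                        # each thread transparently gets its own session
    s.add_all(User(nm) for nm in names)
    s.commit()

# at the end of the request / unit of work:
Session.remove()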
Note that the defaults of create_session() are the opposite of that of sessionmaker(): autoflush and expire_on_commit are False, autocommit is True.
global_session is already instantiated when you call modify_ages() and you've already committed to the database. If you re-instantiate global_session after you commit, it should start a new transaction.
My guess is since you've already committed and are re-using the same object, each additional modification is automatically committed.
We're using SQLAlchemy's declarative base, and I have a method that I want to isolate the transaction level for. To explain: there are two processes concurrently writing to the database, and I must have them execute their logic in a transaction. The default transaction isolation level is READ COMMITTED, but I need to be able to execute a piece of code using the SERIALIZABLE isolation level.
How is this done using SQLAlchemy? Right now, I basically have a method in our model, which inherits from SQLAlchemy's declarative base, that essentially needs to be transactionally invoked.
from psycopg2.extensions import ISOLATION_LEVEL_AUTOCOMMIT
from psycopg2.extensions import ISOLATION_LEVEL_READ_COMMITTED
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE
class OurClass(SQLAlchemyBaseModel):
    @classmethod
    def set_isolation_level(cls, level=ISOLATION_LEVEL_SERIALIZABLE):
        cls.get_engine().connect().connection.set_isolation_level(level)

    @classmethod
    def find_or_create(cls, **kwargs):
        try:
            return cls.query().filter_by(**kwargs).one()
        except NoResultFound:
            x = cls(**kwargs)
            x.save()
            return x
I am doing this to invoke the method at a given transaction isolation level, but it's not doing what I expect: the isolation level is still READ COMMITTED from what I see in the Postgres logs. Can someone help identify what I'm doing wrong?
I'm using SQLAlchemy 0.5.5
class Foo(OurClass):
    def insert_this(self, kwarg1=value1):
        # I am trying to set the isolation level to SERIALIZABLE
        try:
            self.set_isolation_level()
            with Session.begin():
                self.find_or_create(kwarg1=value1)
        except Exception:  # if any exception is thrown...
            print "I caught an exception."
            print sys.exc_info()
        finally:
            # Set the isolation level back to READ COMMITTED
            self.set_isolation_level(ISOLATION_LEVEL_READ_COMMITTED)
From Michael Bayer, the maintainer of SQLAlchemy:

Please use the "isolation_level" argument to create_engine() and use the latest tip of SQLAlchemy until 0.6.4 is released, as there was a psycopg2-specific bug fixed recently regarding isolation level.

The approach you have below does not affect the same connection which is later used for querying - you'd instead use a PoolListener that sets up set_isolation_level on all connections as they are created.
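In later SQLAlchemy versions that advice looks roughly like this (the connection URL is a placeholder, and the exact level strings depend on the backend):

from sqlalchemy import create_engine, text

# Engine-wide default isolation level.
engine = create_engine(
    'postgresql+psycopg2://user:password@localhost/mydb',
    isolation_level='SERIALIZABLE',
)

# Or per-connection, leaving the engine's default untouched.
with engine.connect().execution_options(isolation_level='SERIALIZABLE') as conn:
    conn.execute(text('SELECT 1'))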
The isolation level is set within a transaction, e.g.
try:
    Session.begin()
    Session.execute('set transaction isolation level serializable')
    self.find_or_create(kwarg1=value1)
except:
    ...
From PostgreSQL doc:
If SET TRANSACTION is executed without a prior START TRANSACTION or BEGIN, it will appear to have no effect, since the transaction will immediately end.