I am writing a class for database queries with SQLite3 in my application. Most of the methods of the class are very similar to this:
def getPrice(self, symbol, date):
    date = dt.datetime.strptime(date, '%Y-%m-%d')
    conn = sqlite3.connect('stocks.db')
    curs = conn.cursor()
    curs.execute('''SELECT close FROM prices WHERE symbol = ? AND date = ?;''', (symbol, date))
    close = curs.fetchall()
    curs.close()
    return close
The only difference is the database query and the number of arguments. Is there a possibility to abstract the opening and closing of the database connection away?
I know that it would probably be easier to use an ORM like SQLAlchemy. But I want to understand how to solve this kind of problem in general, not only in relation to databases.
Thanks for your suggestions!
EDIT: This post basically answers my question.
First. You'll be much, much happier with one -- and only one -- global connection. Configuration changes are much easier if you do this in exactly one place.
Second, use the with statement and the contextlib library.
from contextlib import closing
from my_database_module import the_global_connection
def getPrice(self, symbol, date):
    with closing(the_global_connection.cursor()) as curs:
        curs.execute('''SELECT close FROM prices WHERE symbol = ? AND date = ?;''', (symbol, date))
        close = curs.fetchall()
    return close
Your database module looks like this:
import sqlite3
the_global_connection = sqlite3.connect( "stocks.db" )
This gives you the ability to change databases, or database server technology in exactly one place.
Note that as of Python 2.6, sqlite3.connect returns a connection that can be used as a context manager:
Connection objects can be used as context managers that automatically
commit or rollback transactions. In the event of an exception, the
transaction is rolled back; otherwise, the transaction is committed:
Therefore, do not decorate the connection with contextlib.closing -- otherwise, you will lose the commit/rollback behavior and instead only get the connection.close() called upon exiting the with-statement.
Per PEP249:
... closing a connection without committing the changes first will cause
an implicit rollback to be performed.
So the commit/rollback behavior is much more useful than simply calling close.
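Combining the two for a write, for example, might look like this (a sketch only; setPrice and the UPDATE statement are illustrative, the table comes from the question):
from contextlib import closing
from my_database_module import the_global_connection

def setPrice(symbol, date, close):
    # The connection's own context manager commits on success and rolls back
    # on an exception; closing() only takes care of closing the cursor.
    with the_global_connection:
        with closing(the_global_connection.cursor()) as curs:
            curs.execute('UPDATE prices SET close = ? WHERE symbol = ? AND date = ?',
                         (close, symbol, date))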
You could use a context manager:
import contextlib
def query(sql, args):
    with contextlib.closing(sqlite3.connect('stocks.db')) as conn:
        curs = conn.cursor()
        curs.execute(sql, args)
        close = curs.fetchall()
        return close

def getPrice(self, symbol, date):
    date = dt.datetime.strptime(date, '%Y-%m-%d')
    sql = '''SELECT close FROM prices WHERE symbol = ? AND date = ?'''
    args = (symbol, date)
    return query(sql, args)
Since you have many functions like getPrice which differ only in the SQL and arguments, you can reduce the repetitious boilerplate by defining the query function.
You could also define a context manager to roll back the connection on errors, and to commit and close upon exiting the with block. An example of this (for MySQL) can be found here; adapting it to sqlite3 should not be difficult.
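A rough sqlite3 adaptation might look like this (an untested sketch; stock_db is an illustrative name):
import sqlite3
from contextlib import contextmanager

@contextmanager
def stock_db(path='stocks.db'):
    conn = sqlite3.connect(path)
    try:
        yield conn
        conn.commit()      # commit if the block finished normally
    except Exception:
        conn.rollback()    # roll back on any error, then re-raise
        raise
    finally:
        conn.close()
Inside query() you would then write with stock_db() as conn: in place of the contextlib.closing(...) line.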
Reference:
The contextlib.closing decorator
Encapsulate that logic into an object, pass that object to the data access object and ask it to call the methods.
Aspects or decorators might be a good way to do this.
You don't mention pooling or transactions. Think about those as well.
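To illustrate the decorator idea (a sketch only; with_cursor and StockDAO are illustrative names, not from the question):
import sqlite3
from functools import wraps

def with_cursor(func):
    """Open a connection, hand the wrapped method a cursor, commit and close."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        conn = sqlite3.connect(self.db_path)
        try:
            result = func(self, conn.cursor(), *args, **kwargs)
            conn.commit()
            return result
        finally:
            conn.close()
    return wrapper

class StockDAO(object):
    db_path = 'stocks.db'

    @with_cursor
    def get_price(self, curs, symbol, date):
        curs.execute('SELECT close FROM prices WHERE symbol = ? AND date = ?',
                     (symbol, date))
        return curs.fetchall()
Each decorated method then only contains the query and its arguments, which is exactly the repetition you were trying to remove.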
Related
I was writing a few simple CRUD operations to try out sqlite3 with Python, and then I saw a nice function that executes queries and closes the connection in this answer:
from contextlib import closing
import sqlite3

def query(self, db_name, sql):
    with closing(sqlite3.connect(db_name)) as con, con, \
            closing(con.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()
I thought it would be nice to have something like this and call this function with whatever SQL statement I need whenever I want to query the database.
However, when I'm running an insert I'd need to return cur.lastrowid instead of cur.fetchall(), and when deleting I'd like to know cursor.rowcount instead. Also, sometimes I need to add parameters to the query; for instance, sometimes I want to run select * from [some_table] and other times I need select * from [some_table] where [some_column] = ?. So the function needs some tweaks depending on what kind of operation is being executed.
I could write one function for each kind of operation, with the same basic structure and the tweaks each query needs. But that sounds a bit repetitive since there would be duplicate chunks of code and these functions would look pretty similar to each other. So I'm not sure it's the right approach.
Is there another alternative to make this function a bit more "generic" to fit all cases?
One option is to have callouts in the with block that let you customize program actions. There are many ways to do this. One is to write a class that calls methods to allow specialization. In this example, a class has pre- and post-processors. It does its work in __init__ and leaves its result in an instance variable, which allows for terse usage.
from contextlib import closing
import sqlite3

class SqlExec:
    def __init__(self, db_name, sql, parameters=()):
        self.sql = sql
        self.parameters = parameters
        with closing(sqlite3.connect(db_name)) as self.con, \
                closing(self.con.cursor()) as self.cur:
            self.pre_process()
            self.cur.execute(self.sql, self.parameters)
            self.retval = self.post_process()

    def pre_process(self):
        return

    def post_process_fetchall(self):
        return self.cur.fetchall()

    post_process = post_process_fetchall


class SqlExecLastRowId(SqlExec):
    def post_process(self):
        return self.cur.lastrowid


last_row = SqlExecLastRowId("mydb.db", "DELETE FROM FOO WHERE BAR = ?",
                            parameters=("baz",)).retval
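For the default fetchall case, usage looks much the same (a sketch; the table and symbol are only illustrative):
rows = SqlExec("mydb.db", "SELECT close FROM prices WHERE symbol = ?",
               parameters=("IBM",)).retval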
I am running tests on some functions. I have a function that uses database queries, and I have gone through the blogs and docs that say we have to make an in-memory or test database to test such functions. Below is my function:
def already_exists(story_data, c):
    # TODO(salmanhaseeb): Implement de-dupe functionality by checking if it
    # already exists in the DB.
    c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = c.fetchone()
    if number_of_rows > 0:
        return True
    return False
This function hits the production database. My question is: for testing, I create an in-memory database and populate my values there, so the tests should query that database (the test DB). But when I call my already_exists() function from a test, the production DB is hit. How do I make this function hit my test DB while testing?
There are two routes you can take to address this problem:
Make an integration test instead of a unit test and just use a copy of the real database.
Provide a fake to the method instead of an actual connection object.
Which one you should do depends on what you're trying to achieve.
If you want to test that the query itself works, then you should use an integration test. Full stop. The only way to make sure the query works as intended is to run it with test data already in a copy of the database. Running it against a different database technology (e.g., running against SQLite when your production database is PostgreSQL) will not ensure that it works in production. Needing a copy of the database means you will need some automated deployment process for it that can be easily invoked against a separate database. You should have such an automated process anyway, as it helps ensure that your deployments across environments are consistent, allows you to test them prior to release, and "documents" the process of upgrading the database. Standard solutions to this are migration tools written in your programming language like Alembic, or tools that execute raw SQL like yoyo or Flyway. You would need to invoke the deployment and fill it with test data prior to running the test, then run the test and assert the output you expect to be returned.
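If your production database actually is SQLite (the ? placeholders suggest it might be), an integration-style test can be quite small, because the cursor is already a parameter of already_exists(). A hedged sketch; the Story object is an assumption about your code:
import sqlite3

def test_already_exists_finds_inserted_row():
    conn = sqlite3.connect(':memory:')          # throwaway test database
    c = conn.cursor()
    c.execute("CREATE TABLE posts (post_id INTEGER)")
    c.execute("INSERT INTO posts (post_id) VALUES (?)", (10,))
    story = Story(post_id=10)                   # assumption about your object
    assert already_exists(story, c)
    conn.close()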
If you want to test the code around the query and not the query itself, then you can use a fake for the connection object. The most common solution to this is a mock. Mocks provide stand-ins that can be configured to accept function calls and inputs and return some output in place of the real object. This would allow you to test that the logic of the method works correctly, assuming that the query returns the results you expect. For your method, such a test might look something like this:
from unittest.mock import Mock

...

def test_already_exists_returns_true_for_positive_count():
    mockConn = Mock(
        execute=Mock(),
        fetchone=Mock(return_value=(5,)),
    )
    story = Story(post_id=10)  # Making some assumptions about what your object might look like.

    result = already_exists(story, mockConn)

    assert result
    # Possibly assert calls on the mock. Value of these asserts is debatable.
    mockConn.execute.assert_called_with("""SELECT COUNT(*) from posts where post_id = ?""", (story.post_id,))
    mockConn.fetchone.assert_called()
The issue is ensuring that your code consistently uses the same database connection. Then you can set it once to whatever is appropriate for the current environment.
Rather than passing the database connection around from method to method, it might make more sense to make it a singleton.
def already_exists(story_data):
    # Here `connection` is a singleton which returns the database connection.
    connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
Or make the connection an attribute of each class and turn already_exists into a method. It should probably be a method regardless.
def already_exists(self):
    # Here the connection is associated with the object.
    self.connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (self.post_id,))
    (number_of_rows,) = self.connection.fetchone()
    if number_of_rows > 0:
        return True
    return False
But really you shouldn't be rolling this code yourself. Instead you should use an ORM such as SQLAlchemy which takes care of basic queries and connection management like this for you. It has a single connection, the "session".
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_declarative import Address, Base, Person
engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.bind = engine
DBSession = sessionmaker(bind=engine)
session = DBSession()
Then you use that to make queries. For example, it has an exists() construct:
q = session.query(Post).filter(Post.post_id == story_data.post_id)
session.query(q.exists()).scalar()
Using an ORM will greatly simplify your code. Here's a short tutorial for the basics, and a longer and more complete tutorial.
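Tying it back to the original function, a hedged sketch of already_exists() on top of such a session (assuming a mapped Post model with a post_id column):
def already_exists(story_data):
    # Build the filtered query, then ask the database whether any row matches.
    q = session.query(Post).filter(Post.post_id == story_data.post_id)
    return session.query(q.exists()).scalar()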
As the title states I need some help with Python and MySQL. I am currently studying Python further and I am focusing hard on using Python and MySQL for database design, development, administration and applications.
I am familiar with MySQL and somewhat familiar with Python. Currently I am working on object-oriented programming, and I am trying my hand at setting up a database connection inside of a database class and then using the class to Create, Update, Delete and Read data.
I have created a new Python object:
import pymysql as MySQL

class Database(object):
    Host = "127.0.0.1"
    Database = "****"
    user = "****"
    password = "****"

    @staticmethod
    def initialize():
        currentdb = MySQL.connect(Database.Host, Database.user, Database.password, Database.Database)
        cursor = currentdb.cursor()

    @staticmethod
    def insert(Table, DataDict):
        placeholders = ", ".join(["%s"] * len(DataDict))
        columns = ", ".join(DataDict.keys())
        sql = "INSERT INTO %s (%s) VALUES (%s)" % (Table, columns, placeholders)
        cursor.execute(sql, DataDict.values())
I want to know: how do I work with the cursor inside of an object? I don't know if my current approach is even close to how it should be handled; I am really not sure.
Can the cursor be initialized in this way, and then used further in the object as I intend on doing in the above extract?
Any help would be highly appreciated.
The right way to work with cursors is like this:
import contextlib

def doSomething():
    with contextlib.closing(database.cursor()) as cursor:
        cursor.execute(...)
    # At the end of the `with` statement, the cursor is closed
Do not keep a cursor open for too long. Keeping a connection open for a long time, as you do, is fine. Also, read up on transaction control.
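For example, transaction control might look roughly like this (a sketch; the accounts table and transfer function are made up for illustration):
import contextlib

def transfer(database, from_id, to_id, amount):
    try:
        with contextlib.closing(database.cursor()) as cursor:
            cursor.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                           (amount, from_id))
            cursor.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                           (amount, to_id))
        database.commit()    # both statements succeed or neither persists
    except Exception:
        database.rollback()
        raise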
If you're doing more than a handful of DB operations, consider using a library like SQLAlchemy or Pony ORM.
Have you considered using SQLAlchemy? This gives you a mapping between Python classes and MySQL (or any other RDBMS) tables. I've recently been using it on a fairly hefty real-world project and it seems to do the job fairly well and is easy enough to learn.
Check out the following code. I moved the content of your initialize() into the standard Python __init__ method and made the database initialize from parameters:
import pymysql as MySQL

class Database(object):
    def __init__(self, host, db, user, pw):
        self.currentdb = MySQL.connect(host=host, user=user, password=pw, database=db)

    def insert(self, Table, DataDict):
        placeholders = ", ".join(["%s"] * len(DataDict))
        columns = ", ".join(DataDict.keys())
        sql = "INSERT INTO %s (%s) VALUES (%s)" % (Table, columns, placeholders)
        with self.currentdb.cursor() as db_cursor:
            db_cursor.execute(sql, list(DataDict.values()))
Once you have this, you can initialize a Database object as below and insert data:
my_db = Database(host="127.0.0.1", user="****", pw="****", db="****")
my_db.insert('table_name', data_dict)
Please note, I haven't changed your code, only presenting an organization based on your initial post that could work.
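One caveat: pymysql leaves autocommit off by default, so with the class as written the insert will not persist until the connection commits. A minimal way to handle that at the call site:
my_db = Database(host="127.0.0.1", user="****", pw="****", db="****")
my_db.insert('table_name', data_dict)
my_db.currentdb.commit()   # pymysql does not autocommit by default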
We have a file db.py where a peewee database is defined:
db = PostgresqlExtDatabase('mom',
                           user=DB_CONFIG['username'],
                           password=DB_CONFIG['password'],
                           host=DB_CONFIG['host'],
                           port=DB_CONFIG['port'],
                           threadlocals=True,
                           register_hstore=False,
                           autocommit=True,
                           autorollback=True,
                           cursor_factory=DictCursor)
Calling db.execute("SOME RAW SQL UPDATE QUERY") works as expected.
But calling a begin before that does not stop the DB from being modified.
db.begin()
db.execute("SOME RAW SQL UPDATE QUERY") # <- Does not wait, db is updated immediately here
db.commit()
Am I doing this right?
I basically need to nest the raw SQL in a transaction if one is already ongoing, or else just execute it right away if no transaction has been begun.
This works as expected if I do db.set_autocommit(False), then execute_sql, then commit().
It also works inside the atomic() context manager.
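For reference, the atomic() form mentioned above looks roughly like this (using execute_sql for the raw query, as elsewhere in this question):
with db.atomic():
    db.execute_sql("SOME RAW SQL UPDATE QUERY")
# atomic() blocks can be nested: the outermost starts a transaction,
# inner ones become savepoints.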
To give some context, I am working on a web application for logistics, and our codebase uses Flask and an SQLAlchemy scoped_session with autocommit set to True.
It does not use the SQLAlchemy ORM (due to... historical reasons) and instead just uses the Session object and its execute(), begin(), begin_nested(), rollback() and remove() methods.
The way it does this is by defining Session = scoped_session(sessionmaker(autocommit=True)) in one file, then calling session = Session() everywhere in the codebase and executing queries with session.execute("SQL").
Sometimes a session.begin() is called, so the query does not execute until the commit (or rollback).
We'd now really like to use peewee. But the codebase is built on this session, so it has to be spoofed. Going and changing every file is impossible, and there aren't enough test cases to rely on either (for... historical reasons).
Also I had some questions but I don't know where to ask them, so I hope you don't mind if I put them here:
Is this db object (and its connection) bound to the thread it is executing in?
Basically will there be some bug if db is imported from two different files, and db.begin() is called from each?
I can see in the ipython shell that the id for the db object above is the same per thread,
so am I correct in assuming that unless the psycopg2 connection is recreated, this should be isolated?
To spoof the SQLAlchemy Session, I have created a wrapper class that returns the kind of session required: either the SQLA Session object or the wrapper I've written for peewee.
class SessionMocker(object):
    # DO NOT make this a singleton. Sessions will break.
    def __init__(self, orm_type=ORM_TYPES.SQLA):
        assert orm_type in ORM_TYPES, "Invalid session constructor type"
        super(SessionMocker, self).__init__()
        self.orm_type = orm_type

    def __call__(self, *args, **kwargs):
        if self.orm_type == ORM_TYPES.SQLA:
            return SQLASession(*args, **kwargs)
        if self.orm_type == ORM_TYPES.PEEWEE:
            # For now let's assume no slave
            return SessionWrapper(*args, **kwargs)
        raise NotImplementedError

    def __getattr__(self, item):
        """
        Assuming this will never be called without calling Session() first.
        Else there is no way to tell what type of Session class (ORM) is required,
        since that can't be passed.
        """
        if self.orm_type == ORM_TYPES.SQLA:
            kls = SQLASession
        elif self.orm_type == ORM_TYPES.PEEWEE:
            kls = SessionWrapper
        else:
            raise NotImplementedError
        return getattr(kls, item)


Session = SessionMocker(ORM_TYPES.SQLA)
I figured this would allow the codebase to make a transparent and seamless switch over to using peewee without having to change it everywhere.
How can I do this in a better way?
The docs explain how to do this: http://docs.peewee-orm.com/en/latest/peewee/transactions.html#autocommit-mode
But, tl;dr, you need to disable autocommit before begin/commit/rollback will work like you expect:
db.set_autocommit(False)
db.begin()
try:
    user.delete_instance(recursive=True)
except:
    db.rollback()
    raise
else:
    try:
        db.commit()
    except:
        db.rollback()
        raise
finally:
    db.set_autocommit(True)
The default value for autocommit is True, but the default value of autorollback is False. Setting autorollback to True automatically rolls back the transaction when an exception occurs while executing a query. I can't be sure, but maybe this is confusing the situation, so you could also try it with autorollback set to False.
I am designing a fairly complex database, and know that some of my queries will be far outside the scope of Django's ORM. Has anyone integrated SPs with Django's ORM successfully? If so, what RDBMS and how did you do it?
We (musicpictures.com / eviscape.com) wrote that Django snippet, but it's not the whole story (actually that code was only tested on Oracle at the time).
Stored procedures make sense when you want to reuse tried and tested SP code, when one SP call will be faster than multiple calls to the database, when security requires moderated access to the database, or when the queries are very complicated / multistep. We're using a hybrid model/SP approach against both Oracle and Postgres databases.
The trick is to make it easy to use and keep it "Django-like". We use a make_instance function which takes the cursor result and creates instances of a model populated from the cursor. This is nice because the cursor might return additional fields. Then you can use those instances in your code / templates much like normal Django model objects.
def make_instance(instance, values):
    '''
    Copied from eviscape.com

    Generates an instance for dict data coming from an SP.

    Expects:
        instance - empty instance of the model to generate
        values   - dictionary from a stored procedure with keys that are named
                   like the model's attributes

    Use like:
        >>> make_instance(Evis(), {'evi_id': '007', 'evi_subject': 'J. Bond, Architect'})
        <Evis: J. Bond, Architect>
    '''
    attributes = filter(lambda x: not x.startswith('_'), instance.__dict__.keys())

    for a in attributes:
        try:
            # Field names from the Oracle SP are UPPER CASE;
            # we want to put PIC_ID in pic_id etc.
            setattr(instance, a, values[a.upper()])
            del values[a.upper()]
        except KeyError:
            pass

    # Add any values that are not in the model as well
    for v in values.keys():
        setattr(instance, v, values[v])
        # print 'setting %s to %s' % (v, values[v])

    return instance


# Use it like this:
pictures = [make_instance(Pictures(), item) for item in picture_dict]
# And here are some helper functions:

def call_an_sp(self, var):
    cursor = connection.cursor()
    cursor.callproc("fn_sp_name", (var,))
    return self.fn_generic(cursor)


def fn_generic(self, cursor):
    msg = cursor.fetchone()[0]
    cursor.execute('FETCH ALL IN "%s"' % msg)
    thing = create_dict_from_cursor(cursor)
    cursor.close()
    return thing


def create_dict_from_cursor(cursor):
    rows = cursor.fetchall()
    # DEBUG settings (used to) affect what gets returned.
    if DEBUG:
        desc = [item[0] for item in cursor.cursor.description]
    else:
        desc = [item[0] for item in cursor.description]
    return [dict(zip(desc, item)) for item in rows]
cheers, Simon.
You have to use the connection utility in Django:
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("SQL STATEMENT CAN BE ANYTHING")
    data = cursor.fetchone()
If you are expecting more than one row, use cursor.fetchall() to fetch a list of them.
More info here: http://docs.djangoproject.com/en/dev/topics/db/sql/
Don't.
Seriously.
Move the stored procedure logic into your model where it belongs.
Putting some code in Django and some code in the database is a maintenance nightmare. I've spent too many of my 30+ years in IT trying to clean up this kind of mess.
There is a good example:
https://djangosnippets.org/snippets/118/
from django.db import connection

cursor = connection.cursor()
# Calls the PROCEDURE named LOG_MESSAGE, which resides in the MY_UTIL package
ret = cursor.callproc("MY_UTIL.LOG_MESSAGE", (control_in, message_in))
cursor.close()
If you want to look at an actual running project that uses SP, check out minibooks. A good deal of custom SQL and uses Postgres pl/pgsql for SP. I think they're going to remove the SP eventually though (justification in trac ticket 92).
I guess the improved raw SQL queryset support in Django 1.2 makes this easier, as you wouldn't have to roll your own make_instance type code.
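For illustration, the raw() support referred to above looks roughly like this (the model and column names are borrowed from the earlier example and are assumptions):
pic_id = 42   # illustrative value
pictures = Pictures.objects.raw(
    'SELECT * FROM myapp_pictures WHERE pic_id = %s', [pic_id]
)
for pic in pictures:          # each row comes back as a Pictures instance
    print(pic.pic_id)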
cx_Oracle can be used. It is also fairly helpful when we do not have access to the production deployed code and need to make major changes in the database.
import cx_Oracle

try:
    # dev_plng_con is the Oracle connect string / DSN
    con = cx_Oracle.connect(dev_plng_con)
    cur = con.cursor()
    P_ERROR = str(error_message)   # placeholder: the value passed to the procedure
    cur.callproc('NAME_OF_PACKAGE.PROCEDURENAME', [P_ERROR])
except Exception as error:
    error_logger.error(str(error))