How to create a function that connects, executes and disconnects - python

Today I have spent a lot of time slowly learning Postgres, and I have been writing code that does various things with the database, such as insert, select, etc.
I have realized that most of my code is copy-paste when it comes to connect & disconnect. I know some people do not like that, and it also depends on what I am doing, so before people get mad at me: do not take this as bad code, I would simply like to improve it, of course <3
What I have done so far is:
import psycopg2
import psycopg2.extras
from loguru import logger
DATABASE_CONNECTION = {
    "host": "TEST",
    "database": "TEST",
    "user": "TEST",
    "password": "TEST"
}
def register_datas(store, data):
    """
    Register a data to database
    :param store:
    :param data:
    :return:
    """
    ps_connection = psycopg2.connect(**DATABASE_CONNECTION)
    ps_cursor = ps_connection.cursor()
    ps_connection.autocommit = True
    sql_update_query = "INSERT INTO public.store_items (store, name) VALUES (%s, %s);"
    try:
        data_tuple = (store, data["name"])
        ps_cursor.execute(sql_update_query, data_tuple)
        has_registered = ps_cursor.rowcount
        ps_cursor.close()
        ps_connection.close()
        return bool(has_registered)
    except (Exception, psycopg2.DatabaseError) as error:
        logger.exception("Error: %s" % error)
        ps_connection.rollback()
        ps_cursor.close()
        ps_connection.close()
        return False
def get_all_keywords(keywords):
    """
    Get all keywords
    :param positive_or_negative:
    :return:
    """
    ps_connection = psycopg2.connect(**DATABASE_CONNECTION)
    ps_cursor = ps_connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
    sql_update_query = "SELECT keyword FROM public.keywords WHERE filter_type = %s;"
    try:
        data_tuple = (keywords,)
        ps_cursor.execute(sql_update_query, data_tuple)
        all_keywords = [keyword["keyword"] for keyword in ps_cursor]
        ps_cursor.close()
        ps_connection.close()
        return all_keywords
    except (Exception, psycopg2.DatabaseError) as error:
        logger.exception("Error: %s" % error)
        ps_connection.rollback()
        ps_cursor.close()
        ps_connection.close()
        return []
def check_if_store_exists(store):
    """
    Check if the store exists in database
    :param store:
    :return:
    """
    ps_connection = psycopg2.connect(**DATABASE_CONNECTION)
    ps_cursor = ps_connection.cursor()
    sql_update_query = "SELECT store FROM public.store_config WHERE store = %s;"
    try:
        data_tuple = (store,)
        ps_cursor.execute(sql_update_query, data_tuple)
        exists = bool(ps_cursor.fetchone())
        ps_cursor.close()
        ps_connection.close()
        return exists
    except (Exception, psycopg2.DatabaseError) as error:
        logger.exception("Error: %s" % error)
        ps_connection.rollback()
        ps_cursor.close()
        ps_connection.close()
        return []
and I do see that I have the same code everywhere:
ps_connection = psycopg2.connect(**DATABASE_CONNECTION)
ps_cursor = ps_connection.cursor()
...
ps_cursor.close()
ps_connection.close()
return data
and to shorten this code, I wonder if it's possible to write a function that connects -> lets me do the query/execution -> closes the connection, and then returns the data I want to return?

Context manager
There is a pattern built into Python for exactly this. It is called a context manager, and it has two purposes:
Do operations before and after some logic in a with block.
Catch errors inside the with block and allow you to handle them in a custom way.
To create a context manager, you can go either of two ways. One (I like it more) is to define a class satisfying the following protocol:
__enter__(self)
__exit__(self, exc_type, exc_value, traceback)
Any class that satisfies the protocol can be used in a with statement and works as a context manager. In line with Python's duck-typing principles, the interpreter "knows" how to use the class in a with statement.
Example:
class QuickConnection:
    def __init__(self):
        self.ps_connection = psycopg2.connect(**DATABASE_CONNECTION)
        self.ps_cursor = self.ps_connection.cursor(cursor_factory=psycopg2.extras.DictCursor)

    def __enter__(self):
        return self.ps_cursor

    def __exit__(self, err_type, err_value, traceback):
        if err_type and err_value:
            self.ps_connection.rollback()
        self.ps_cursor.close()
        self.ps_connection.close()
        return False
The return value of the __exit__ method matters. If True is returned, any errors that happened inside the with block are suppressed. If False is returned, the errors are re-raised once __exit__ finishes. It is better to keep the return value of __exit__ explicit, since the feature itself is not that obvious.
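For illustration, here is a hypothetical variant (the class name is made up) that suppresses only database errors by returning a truthy value, while everything else still propagates:

class QuietConnection(QuickConnection):
    """Hypothetical variant: swallow psycopg2 database errors, re-raise the rest."""
    def __exit__(self, err_type, err_value, traceback):
        if err_type and err_value:
            self.ps_connection.rollback()
        self.ps_cursor.close()
        self.ps_connection.close()
        # A truthy return suppresses the error raised in the with block;
        # a falsy return (None/False) lets it propagate to the caller.
        return err_type is not None and issubclass(err_type, psycopg2.DatabaseError)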
Alternatively, use the contextlib.contextmanager decorator:
from contextlib import contextmanager

@contextmanager
def quick_connection():
    ps_connection = psycopg2.connect(**DATABASE_CONNECTION)
    ps_cursor = ps_connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
    try:
        yield ps_cursor
    except Exception:  # don't do this, catch specific errors instead.
        ps_connection.rollback()
        raise
    finally:
        ps_cursor.close()
        ps_connection.close()
Usage
with QuickConnection() as ps_cursor:
    data_tuple = (store, data["name"])
    ps_cursor.execute(sql_update_query, data_tuple)
    has_registered = ps_cursor.rowcount
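The generator-based version is used the same way; you just call the function instead of instantiating a class:

with quick_connection() as ps_cursor:
    ps_cursor.execute(sql_update_query, data_tuple)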
With the context manager, you can reuse the connection logic and be sure that the connection is closed. You can also catch and handle errors related to your DB operations in one place. Usage of context managers is compact and readable, and in the end, I believe, that is the goal.
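For instance, register_datas from the question could then shrink to something like the following sketch (untested). Note that QuickConnection as written never commits, so a write needs an explicit commit; psycopg2 cursors expose their connection via the .connection attribute:

def register_datas(store, data):
    """Sketch: same behaviour as the original, minus the connection boilerplate."""
    sql_update_query = "INSERT INTO public.store_items (store, name) VALUES (%s, %s);"
    try:
        with QuickConnection() as ps_cursor:
            ps_cursor.execute(sql_update_query, (store, data["name"]))
            has_registered = ps_cursor.rowcount
            ps_cursor.connection.commit()  # QuickConnection does not commit for you
        return bool(has_registered)
    except psycopg2.DatabaseError as error:
        logger.exception("Error: %s" % error)
        return False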

Related

Return data from Python's Redshift procedure call

redshift_connector is defined to be aligned with https://peps.python.org/pep-0249/#id24, but after calling the procedure I can't retrieve the data into a dataframe.
Instead I'm getting the 'mycursor' value. How can I overcome this?
The fetch*() methods don't accept an argument that would let me get at the data behind mycursor.
I also tried the RECORDS type, but no luck.
Procedure's body:
--CREATE TABLE reporting.tbl(a int, b int);
--INSERT INTO reporting.tbl VALUES(1, 4);
CREATE OR REPLACE PROCEDURE reporting.procedure(param IN integer, rs_out INOUT refcursor)
LANGUAGE plpgsql
AS $$
BEGIN
    OPEN rs_out FOR SELECT a FROM reporting.tbl;
END;
$$;
Python code:
import redshift_connector

conn = redshift_connector.connect(
    host='xyz.xyz.region.redshift.amazonaws.com',
    database='db',
    port=5439,
    user="user",
    password='p##s'
)
cursor = conn.cursor()
cursor.execute("BEGIN;")
res = cursor.callproc("reporting.procedure", parameters=[1, 'mycursor'])
res = cursor.fetchall()
cursor.execute("COMMIT;")
# returns (['mycursor'],)
print(res)
I think you are trying to define two cursors, and only one is allowed. conn.cursor() creates a cursor with a name defined by redshift_connector. "OPEN rs_out FOR SELECT a FROM reporting.tbl;" in your procedure opens a second cursor with the name mycursor. cursor.fetchall() is trying to fetch from the first cursor (and possibly erroring). No command is fetching from mycursor.
I don't believe there is a way to get "cursor.fetchall()" to point to a different cursor name so I think you need to run the SQL commands (CALL, FETCH, etc) directly.
Something like this:
import redshift_connector

conn = redshift_connector.connect(
    host='xyz.xyz.region.redshift.amazonaws.com',
    database='db',
    port=5439,
    user="user",
    password='p##s'
)
conn.run("BEGIN;")
res = conn.run("CALL reporting.procedure(1, 'mycursor')")
res = conn.run("FETCH ALL FROM mycursor;")
conn.run("COMMIT;")
print(res)
Be aware that if you are on a single-node Redshift cluster, FETCH ALL isn't allowed and you will need to use FETCH FORWARD instead.
Above untested and off the cuff.
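In the single-node case, a hypothetical batched loop (equally untested) could keep fetching until the cursor is exhausted:

rows = []
while True:
    # FETCH FORWARD works where FETCH ALL is rejected; 1000 is an arbitrary batch size
    batch = conn.run("FETCH FORWARD 1000 FROM mycursor;")
    if not batch:
        break
    rows.extend(batch)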
I modified Bill's solution by adding a context manager class:
import redshift_connector

class ConnectionError(Exception):
    pass

class CredentialsError(Exception):
    pass

class SQLError(Exception):
    pass

class RedshiftClientData():
    def __init__(self, config: dict) -> None:
        self.configuration = config

    def __enter__(self) -> 'cursor':
        try:
            self.conn = redshift_connector.connect(**self.configuration)
            self.cursor = self.conn.cursor()
            return self.cursor
        except redshift_connector.InterfaceError as err:
            raise ConnectionError(err)
        except redshift_connector.ProgrammingError as err:
            raise CredentialsError(err)
        except Exception as ex:
            template = "An exception of type {0} occurred in class RedshiftClient. Arguments:\n{1!r}"
            message = template.format(type(ex).__name__, ex.args)
            print(message)

    def __exit__(self, exc_type, exc_value, exc_trace) -> None:
        self.conn.commit()
        self.cursor.close()
        self.conn.close()
        if exc_type is redshift_connector.ProgrammingError:
            raise SQLError(exc_value)
        elif exc_type:
            raise exc_type(exc_value)
The code below is tested, and I confirm that data is fetched without any temp tables:
import logging
import os

import pandas as pd

dbconfig = {
    "host": os.environ["redshift_host"],
    "database": os.environ["redshift_database"],
    "port": int(os.environ["redshift_port"]),
    "user": os.environ["redshift_username"],
    "password": os.environ["redshift_password"],
}

logging.info("Connecting to Redshift...")
with RedshiftClientData(dbconfig) as cursor:
    logging.info("Excel file prep...")
    # on-the-fly process
    cursor.execute("CALL reporting.procedure('mycursor');")
    cursor.execute("FETCH ALL FROM mycursor;")
    result: pd.DataFrame = cursor.fetch_dataframe()
    print(result)

UnitTesting: Mock pyodbc cursor messages

I've been trying to test the method below, especially the if block, and have tried multiple things like patching and mocking pyodbc in various combinations, but I've not been able to mock the if condition.
def execute_read(self, query):
    dbconn = pyodbc.connect(self.connection_string, convert_unicode=True)
    with dbconn.cursor() as cur:
        cursor = cur.execute(query)
        if not cursor.messages:
            res = cursor.fetchall()
        else:
            raise Exception(cursor.messages[0][1])
    return res
# unit test method
@patch.object(pyodbc, 'connect')
def test_execute_read(self, pyodbc_mock):
    pyodbc_mock.return_value = MagicMock()
    self.assertIsNotNone(execute_read('query'))
I've read the unittest.mock docs, but I haven't found a way to get the above if condition covered. Thank you.
You would want to patch the Connection class (given the Cursor object is immutable) and supply a return value for covering the if block. Something that may look like:
with patch("pyodbc.Connection") as conn:
    conn.cursor().messages = []
    ...
Tried this with sqlite3 and that worked for me.
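Building on that idea, a fuller (hypothetical, untested) test for the execute_read method from the question could patch pyodbc.connect and drive both branches through the messages attribute. Reader is a stand-in name for whatever class holds execute_read:

from unittest import TestCase
from unittest.mock import MagicMock, patch

class ExecuteReadTest(TestCase):
    @patch("pyodbc.connect")
    def test_execute_read_returns_rows(self, connect_mock):
        cursor = MagicMock()
        cursor.messages = []                   # empty -> the fetchall() branch runs
        cursor.fetchall.return_value = [("row",)]
        cursor.execute.return_value = cursor   # cur.execute(query) returns the cursor
        # dbconn.cursor() is used as a context manager, so stub __enter__
        connect_mock.return_value.cursor.return_value.__enter__.return_value = cursor
        reader = Reader("fake-connection-string")  # hypothetical class under test
        self.assertEqual(reader.execute_read("SELECT 1"), [("row",)])

    @patch("pyodbc.connect")
    def test_execute_read_raises_on_messages(self, connect_mock):
        cursor = MagicMock()
        cursor.messages = [("class", "boom")]  # non-empty -> the raise branch runs
        cursor.execute.return_value = cursor
        connect_mock.return_value.cursor.return_value.__enter__.return_value = cursor
        reader = Reader("fake-connection-string")
        with self.assertRaises(Exception):
            reader.execute_read("SELECT 1")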
Here's an example of using the patch object, something I wrote for frappe/frappe:
def test_db_update(self):
    with patch.object(Database, "sql") as sql_called:
        frappe.db.set_value(
            self.todo1.doctype,
            self.todo1.name,
            "description",
            f"{self.todo1.description}-edit by `test_for_update`",
        )
        first_query = sql_called.call_args_list[0].args[0]
        second_query = sql_called.call_args_list[1].args[0]
        self.assertTrue(sql_called.call_count == 2)
        self.assertTrue("FOR UPDATE" in first_query)

Python sqlalchemy and mySQL stored procedure always returns 0 (out param only)

I am trying to get the ROW_COUNT() from a MySQL stored procedure into Python.
Here is what I've got, but I don't know what I am missing:
DELIMITER //
CREATE OR REPLACE PROCEDURE sp_refresh_mytable(
    OUT row_count INT
)
BEGIN
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;
    END;
    DECLARE EXIT HANDLER FOR SQLWARNING
    BEGIN
        ROLLBACK;
    END;
    DECLARE EXIT HANDLER FOR NOT FOUND
    BEGIN
        ROLLBACK;
    END;
    START TRANSACTION;
    DELETE FROM mytable;
    INSERT INTO mytable
    (
        col1
        , col2
    )
    SELECT
        col1
        , col2
    FROM othertable
    ;
    SET row_count = ROW_COUNT();
    COMMIT;
END //
DELIMITER ;
If I call this via normal SQL as follows, I get the correct row_count of the insert operation (e.g. 26 rows inserted):
CALL sp_refresh_mytable(@rowcount);
select @rowcount as t;
-- output: 26
Then in Python/SQLAlchemy:
def call_procedure(engine, function_name, params=None):
    connection = engine.raw_connection()
    try:
        cursor = connection.cursor()
        result = cursor.callproc('sp_refresh_mytable', [0])
        ## try result outputs
        resultfetch = cursor.fetchone()
        logger.info(result)
        logger.info(result[0])
        logger.info(resultfetch)
        cursor.close()
        connection.commit()
        connection.close()
        logger.info(f"Running procedure {function_name} success!")
        return result
    except Exception as e:
        logger.error(f"Running procedure {function_name} failed!")
        logger.exception(e)
        return None
    finally:
        connection.close()
So I tried logging different variations of getting the out value, but it is always 0 or None:
[INFO] db_update [0]
[INFO] db_update 0
[INFO] db_update None
What am I missing?
Thanks!
With the help of this answer I found the following solution that worked for me.
a) Working solution using engine.raw_connection() and cursor.callproc:
def call_procedure(engine, function_name):
    connection = engine.raw_connection()
    try:
        cursor = connection.cursor()
        cursor.callproc(function_name, [0])
        cursor.execute(f"""SELECT @_{function_name}_0""")
        results = cursor.fetchone()  ## returns a tuple e.g. (285,)
        rows_affected = results[0]
        cursor.close()
        connection.commit()
        logger.info(f"Running procedure {function_name} success!")
        return rows_affected
    except Exception as e:
        logger.error(f"Running procedure {function_name} failed!")
        logger.exception(e)
        return None
    finally:
        connection.close()
And with this answer I found this solution also:
b) Instead of using a raw connection, this worked as well:
def call_procedure(engine, function_name, params=None):
    try:
        # engine.begin() commits on success, rolls back on error,
        # and closes the connection when the block exits, so no
        # separate finally/close is needed here.
        with engine.begin() as db_conn:
            db_conn.execute(f"""CALL {function_name}(@out)""")
            results = db_conn.execute('SELECT @out').fetchone()  ## returns a tuple e.g. (285,)
            rows_affected = results[0]
        logger.debug(f"Running procedure {function_name} success!")
        return rows_affected
    except Exception as e:
        logger.error(f"Running procedure {function_name} failed!")
        logger.exception(e)
        return None
If there are any advantages or drawbacks of using one of these methods over the other, please let me know in a comment.
I just wanted to add another piece of code, since I was trying to get callproc to work (using SQLAlchemy) with multiple in and out params.
For this case I went with the callproc method on a raw connection [solution a) in my previous answer], since that function accepts the params as a list.
It could probably be done more elegantly or more Pythonically in some parts, but it was mainly about getting it to work, and I will probably turn this into a function so I can use it generically for calling an SP with multiple in and out params.
I included comments in the code below to make it easier to understand what is going on.
In my case I decided to put the out params in a dict so I can pass them along to the calling app in case I need to react to the results. Of course you could also include the in params, which could make sense for error logging, maybe.
from copy import copy
from pprint import pformat

## some in params
function_name = 'sp_upsert'
in_param1 = 'my param 1'
in_param2 = 'abcdefg'
in_param3 = 'some-name'
in_param4 = 'some display name'
in_params = [in_param1, in_param2, in_param3, in_param4]

## out params
out_params = [
    'out1_row_count'
    , 'out2_row_count'
    , 'out3_row_count'
    , 'out4_row_count_ins'
    , 'out5_row_count_upd'
]

params = copy(in_params)
## adding the out params as integers from out_params indices
params.extend([i for i, x in enumerate(out_params)])
## the params list will look like
## ['my param 1', 'abcdefg', 'some-name', 'some display name', 0, 1, 2, 3, 4]
logger.info(params)

## build query to get results from callproc (including in and out params)
res_qry_params = []
for i in range(len(params)):
    res_qry_params.append(f"@_{function_name}_{i}")
res_qry = f"SELECT {', '.join(res_qry_params)}"
## the query to fetch the results (in and out params) will look like
## SELECT @_sp_upsert_0, @_sp_upsert_1, @_sp_upsert_2, @_sp_upsert_3, @_sp_upsert_4, @_sp_upsert_5, @_sp_upsert_6, @_sp_upsert_7, @_sp_upsert_8
logger.info(res_qry)

try:
    connection = engine.raw_connection()
    ## calling the sp
    cursor = connection.cursor()
    cursor.callproc(function_name, params)
    ## get the results (includes in and out params), the 0/1 in the end are the row_counts from the sp
    ## fetchone is enough since all results come as one result record like
    ## ('my param 1', 'abcdefg', 'some-name', 'some display name', 1, 0, 1, 1, 0)
    cursor.execute(res_qry)
    results = cursor.fetchone()
    logger.info(results)
    ## adding just the out params to a dict
    res_dict = {}
    for i, element in enumerate(out_params):
        res_dict.update({
            element: results[i + len(in_params)]
        })
    ## the result dict in this case only contains the out param results and will look like
    ## { 'out1_row_count': 1,
    ##   'out2_row_count': 0,
    ##   'out3_row_count': 1,
    ##   'out4_row_count_ins': 1,
    ##   'out5_row_count_upd': 0}
    logger.info(pformat(res_dict, indent=2, sort_dicts=False))
    cursor.close()
    connection.commit()
    logger.debug(f"Running procedure {function_name} success!")
except Exception as e:
    logger.error(f"Running procedure {function_name} failed!")
    logger.exception(e)
Just to complete the picture, here is a shortened version of my stored procedure. After BEGIN I declare some error handlers, and I set the out params to a default of 0; otherwise they could come back as NULL/None if they are never set by the procedure (e.g. because no insert was made):
DELIMITER //
CREATE OR REPLACE PROCEDURE sp_upsert(
    IN in_param1 VARCHAR(32),
    IN in_param2 VARCHAR(250),
    IN in_param3 VARCHAR(250),
    IN in_param4 VARCHAR(250),
    OUT out1_row_count INTEGER,
    OUT out2_row_count INTEGER,
    OUT out3_row_count INTEGER,
    OUT out4_row_count_ins INTEGER,
    OUT out5_row_count_upd INTEGER
)
BEGIN
    -- declare variables, do NOT declare the out params here!
    DECLARE dummy INTEGER DEFAULT 0;
    -- declare error handlers (e.g. continue handler for not found)
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET dummy = 1;
    -- set out params defaulting to 0
    SET out1_row_count = 0;
    SET out2_row_count = 0;
    SET out3_row_count = 0;
    SET out4_row_count_ins = 0;
    SET out5_row_count_upd = 0;
    -- do inserts and updates and set the out param variables accordingly
    INSERT INTO some_table ...;
    SET out1_row_count = ROW_COUNT();
    -- commit if no errors
    COMMIT;
END //
DELIMITER ;

Why is Twisted's adbapi failing to recover data from within unittests?

Overview
Context
I am writing unit tests for some higher-order logic that depends on writing to an SQLite3 database. For this I am using twisted.trial.unittest and twisted.enterprise.adbapi.ConnectionPool.
Problem statement
I am able to create a persistent sqlite3 database and store data therein. Using sqlitebrowser, I am able to verify that the data has been persisted as expected.
The issue is that calls to t.e.a.ConnectionPool.run* (e.g.: runQuery) return an empty set of results, but only when called from within a TestCase.
Notes and significant details
The problem I am experiencing occurs only within Twisted's trial framework. My first attempt at debugging was to pull the database code out of the unit test and place it into an independent test/debug script. Said script works as expected while the unit test code does not (see examples below).
Case 1: misbehaving unit test
init.sql
This is the script used to initialize the database. There are no (apparent) errors stemming from this file.
CREATE TABLE ajxp_changes ( seq INTEGER PRIMARY KEY AUTOINCREMENT, node_id NUMERIC, type TEXT, source TEXT, target TEXT, deleted_md5 TEXT );
CREATE TABLE ajxp_index ( node_id INTEGER PRIMARY KEY AUTOINCREMENT, node_path TEXT, bytesize NUMERIC, md5 TEXT, mtime NUMERIC, stat_result BLOB);
CREATE TABLE ajxp_last_buffer ( id INTEGER PRIMARY KEY AUTOINCREMENT, type TEXT, location TEXT, source TEXT, target TEXT );
CREATE TABLE ajxp_node_status ("node_id" INTEGER PRIMARY KEY NOT NULL , "status" TEXT NOT NULL DEFAULT 'NEW', "detail" TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, type text, message text, source text, target text, action text, status text, date text);
CREATE TRIGGER LOG_DELETE AFTER DELETE ON ajxp_index BEGIN INSERT INTO ajxp_changes (node_id,source,target,type,deleted_md5) VALUES (old.node_id, old.node_path, "NULL", "delete", old.md5); END;
CREATE TRIGGER LOG_INSERT AFTER INSERT ON ajxp_index BEGIN INSERT INTO ajxp_changes (node_id,source,target,type) VALUES (new.node_id, "NULL", new.node_path, "create"); END;
CREATE TRIGGER LOG_UPDATE_CONTENT AFTER UPDATE ON "ajxp_index" FOR EACH ROW BEGIN INSERT INTO "ajxp_changes" (node_id,source,target,type) VALUES (new.node_id, old.node_path, new.node_path, CASE WHEN old.node_path = new.node_path THEN "content" ELSE "path" END);END;
CREATE TRIGGER STATUS_DELETE AFTER DELETE ON "ajxp_index" BEGIN DELETE FROM ajxp_node_status WHERE node_id=old.node_id; END;
CREATE TRIGGER STATUS_INSERT AFTER INSERT ON "ajxp_index" BEGIN INSERT INTO ajxp_node_status (node_id) VALUES (new.node_id); END;
CREATE INDEX changes_node_id ON ajxp_changes( node_id );
CREATE INDEX changes_type ON ajxp_changes( type );
CREATE INDEX changes_node_source ON ajxp_changes( source );
CREATE INDEX index_node_id ON ajxp_index( node_id );
CREATE INDEX index_node_path ON ajxp_index( node_path );
CREATE INDEX index_bytesize ON ajxp_index( bytesize );
CREATE INDEX index_md5 ON ajxp_index( md5 );
CREATE INDEX node_status_status ON ajxp_node_status( status );
test_sqlite.py
This is the unit test class that fails unexpectedly. TestStateManagement.test_db_clean passes, indicating that the tables were properly created. TestStateManagement.test_inode_create_file fails, reporting that zero results were retrieved.
import os.path as osp
from shutil import rmtree
from tempfile import mkdtemp

from twisted.enterprise import adbapi
from twisted.internet import defer
from twisted.trial.unittest import TestCase

import sqlengine  # see below


class TestStateManagement(TestCase):
    def setUp(self):
        self.meta = mkdtemp()
        self.db = adbapi.ConnectionPool(
            "sqlite3", osp.join(self.meta, "db.sqlite"), check_same_thread=False,
        )
        self.stateman = sqlengine.StateManager(self.db)
        with open("init.sql") as f:
            script = f.read()
        self.d = self.db.runInteraction(lambda c, s: c.executescript(s), script)

    def tearDown(self):
        self.db.close()
        del self.db
        del self.stateman
        del self.d
        rmtree(self.meta)

    @defer.inlineCallbacks
    def test_db_clean(self):
        """Canary test to ensure that the db is initialized in a blank state"""
        yield self.d  # wait for db to be initialized
        q = "SELECT name FROM sqlite_master WHERE type='table' AND name=?;"
        for table in ("ajxp_index", "ajxp_changes"):
            res = yield self.db.runQuery(q, (table,))
            self.assertTrue(
                len(res) == 1,
                "table {0} does not exist".format(table)
            )

    @defer.inlineCallbacks
    def test_inode_create_file(self):
        yield self.d
        path = osp.join(self.ws, "test.txt")
        with open(path, "wt") as f:
            pass
        inode = mk_dummy_inode(path)
        yield self.stateman.create(inode, directory=False)
        entry = yield self.db.runQuery("SELECT * FROM ajxp_index")
        emsg = "got {0} results, expected 1. Are canary tests failing?"
        lentry = len(entry)
        self.assertTrue(lentry == 1, emsg.format(lentry))
sqlengine.py
These are the artefacts being tested by the above unit tests.
from twisted.logger import Logger  # assumption: Logger comes from twisted.logger


def values_as_tuple(d, *param):
    """Return the values for each key in `param` as a tuple"""
    return tuple(map(d.get, param))


class StateManager:
    """Manages the SQLite database's state, ensuring that it reflects the state
    of the filesystem.
    """
    log = Logger()

    def __init__(self, db):
        self._db = db

    def create(self, inode, directory=False):
        params = values_as_tuple(
            inode, "node_path", "bytesize", "md5", "mtime", "stat_result"
        )
        directive = (
            "INSERT INTO ajxp_index (node_path,bytesize,md5,mtime,stat_result) "
            "VALUES (?,?,?,?,?);"
        )
        return self._db.runOperation(directive, params)
Case 2: bug disappears outside of twisted.trial
#! /usr/bin/env python
import os.path as osp
from tempfile import mkdtemp

from twisted.enterprise import adbapi
from twisted.internet.task import react
from twisted.internet.defer import inlineCallbacks

INIT_FILE = "example.sql"


def values_as_tuple(d, *param):
    """Return the values for each key in `param` as a tuple"""
    return tuple(map(d.get, param))


def create(db, inode):
    params = values_as_tuple(
        inode, "node_path", "bytesize", "md5", "mtime", "stat_result"
    )
    directive = (
        "INSERT INTO ajxp_index (node_path,bytesize,md5,mtime,stat_result) "
        "VALUES (?,?,?,?,?);"
    )
    return db.runOperation(directive, params)


def init_database(db):
    with open(INIT_FILE) as f:
        script = f.read()
    return db.runInteraction(lambda c, s: c.executescript(s), script)


@react
@inlineCallbacks
def main(reactor):
    meta = mkdtemp()
    db = adbapi.ConnectionPool(
        "sqlite3", osp.join(meta, "db.sqlite"), check_same_thread=False,
    )
    yield init_database(db)
    # Let's make sure the tables were created as expected and that we're
    # starting from a blank slate
    res = yield db.runQuery("SELECT * FROM ajxp_index LIMIT 1")
    assert not res, "database is not empty [ajxp_index]"
    res = yield db.runQuery("SELECT * FROM ajxp_changes LIMIT 1")
    assert not res, "database is not empty [ajxp_changes]"
    # The details of this are not important. Suffice to say they (should)
    # conform to the DB schema for ajxp_index.
    test_data = {
        "node_path": "/this/is/some/arbitrary/path.ext",
        "bytesize": 0,
        "mtime": 179273.0,
        "stat_result": b"this simulates a blob of raw binary data",
        "md5": "d41d8cd98f00b204e9800998ecf8427e",  # arbitrary
    }
    # store the test data in the ajxp_index table
    yield create(db, test_data)
    # test if the entry exists in the db
    entry = yield db.runQuery("SELECT * FROM ajxp_index")
    assert len(entry) == 1, "got {0} results, expected 1".format(len(entry))
    print("OK")
Closing remarks
Again, upon checking with sqlitebrowser, it seems as though the data is being written to db.sqlite, so this looks like a retrieval problem. From here, I'm sort of stumped... any ideas?
EDIT
This code will produce an inode that can be used for testing.
from os import stat
from pickle import dumps  # assumption: stat results are pickled into the BLOB column

def mk_dummy_inode(path, isdir=False):
    return {
        "node_path": path,
        "bytesize": osp.getsize(path),
        "mtime": osp.getmtime(path),
        "stat_result": dumps(stat(path), protocol=4),
        "md5": "directory" if isdir else "d41d8cd98f00b204e9800998ecf8427e",
    }
Okay, it turns out that this is a bit of a tricky one. When the tests are run in isolation (as posted in this question), the bug only rarely occurs. However, when running in the context of an entire test suite, it fails almost 100% of the time.
I added yield task.deferLater(reactor, .00001, lambda: None) after writing to the db and before reading from the db, and this solves the issue.
From there, I suspected this might be a race condition stemming from the connection pool and sqlite's limited concurrency-tolerance. I tried setting the cp_min and cp_max parameters of ConnectionPool to 1, and this also solved the issue.
In short: it seems as though sqlite doesn't play very nicely with multiple connections, and that the appropriate fix is to avoid concurrency to the extent possible.
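In code, the second fix is just a matter of pinning the pool to a single connection; a sketch of the setUp from the question:

self.db = adbapi.ConnectionPool(
    "sqlite3", osp.join(self.meta, "db.sqlite"),
    check_same_thread=False,
    cp_min=1, cp_max=1,  # one connection sidesteps sqlite's concurrency limits
)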
If you take a look at your setUp function, you're firing off self.db.runInteraction(...), which returns a deferred. As you've noted, you assume that it waits for the deferred to finish. However, this is not the case, and it's a trap that most fall victim to (myself included). I'll be honest with you: for situations like this, especially for unit tests, I just execute the synchronous code outside the TestCase class to initialize the database. For example:
def init_db():
    import sqlite3
    conn = sqlite3.connect('db.sqlite')
    c = conn.cursor()
    with open("init.sql") as f:
        c.executescript(f.read())


init_db()  # call outside test case


class TestStateManagement(TestCase):
    """
    My test cases
    """
Alternatively, you could decorate the setup and yield runOperation(...) but something tells me that it wouldn't work... In any case, it's surprising that no errors were raised.
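For what it's worth, trial does wait on a Deferred returned from setUp, so that alternative might look like this sketch (untested):

from twisted.internet import defer

class TestStateManagement(TestCase):
    @defer.inlineCallbacks
    def setUp(self):
        self.meta = mkdtemp()
        self.db = adbapi.ConnectionPool(
            "sqlite3", osp.join(self.meta, "db.sqlite"), check_same_thread=False,
        )
        self.stateman = sqlengine.StateManager(self.db)
        with open("init.sql") as f:
            script = f.read()
        # trial waits on the Deferred that inlineCallbacks returns, so the
        # schema exists before any test method runs.
        yield self.db.runInteraction(lambda c, s: c.executescript(s), script)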
PS
I've been eyeballing this question for a while and it's been in the back of my head for days now. A potential reason for this finally dawned on me at nearly 1am. However, I'm too tired/lazy to actually test this out :D but it's a pretty damn good hunch. I'd like to commend you on your level of detail in this question.

Python Mocking db connection/unknown type in unit test

Newbie to Python here.
My class uses a database connection to wrap some functions. I have figured out some basic examples successfully. For the more complex library that I am working with, I cannot find close examples of mocking the database connection. In mine, the class looks like this:
class DBSAccess():
    def __init__(self, db_con):
        self.db_con = db_con

    def get_db_perm(self, target_user):
        ## this is where I start having trouble
        with self.db_con.cursor() as cursor:
            cursor.execute("SELECT CAST(sum(maxperm) AS bigint) \
                            FROM dbc.diskspace \
                            WHERE databasename = '%s' \
                            GROUP BY databasename" % (target_user))
            res = cursor.fetchone()
            if res is not None:
                return res[0]
            else:
                msg = target_user + " does not exist"
                return msg
where db_con is a connection returned by teradata.UdaExec:
udaExec = teradata.UdaExec(appName="whatever", version="1.0", logConsole=True)
db_con = udaExec.connect(method="odbc", system='my_sys', username='my_name', password='my_pswd')
dbc_instance = tdtestpy.DBSAccess(db_con)
So for my test to not use any real connection, I have to mock some things out. I tried this combination:
class DBAccessTest(unittest.TestCase):
    def test_get_db_free_perm_expects_500(self):
        uda_exec = mock.Mock(spec=teradata.UdaExec)
        db_con = MagicMock(return_value=None)
        db_con.cursor.fetchone.return_value = [500]
        uda_exec.connect.return_value = db_con
        self.dbc_instance = DBSAccess(db_con)
        self.assertEqual(self.dbc_instance.get_db_free_perm("dbc"), 500)
but my result is messed up because fetchone returns a mock, not the [500] one-item list I was expecting:
AssertionError: <MagicMock name='mock.connect().cursor().[54 chars]312'> != 500
I've found some examples where there is a 'with' block for testing an OS operation, but nothing with a database. Plus, I don't know what data type db_con.cursor is, so I can't spec it precisely - I think the cursor is UdaExecConnection.cursor(), found in Teradata/PyTd.
I need to know how to mock the response in a way that lets me test the logic within my method.
The source of your problem is in the following line:
with self.db_con.cursor() as cursor:
The with statement calls the __enter__ method, which in your case generates a new mock.
The solution is to mock the __enter__ method:
db_con.cursor.return_value.__enter__.return_value = cursor
Your tests:
class DBAccessTest(unittest.TestCase):
    def test_get_db_free_perm_expects_500(self):
        db_con = MagicMock(UdaExecConnection)
        cursor = MagicMock(UdaExecCursor)
        cursor.fetchone.return_value = [500]
        db_con.cursor.return_value.__enter__.return_value = cursor
        self.dbc_instance = DBSAccess(db_con)
        self.assertEqual(self.dbc_instance.get_db_perm("dbc"), 500)

    def test_get_db_free_perm_expects_None(self):
        db_con = MagicMock(UdaExecConnection)
        cursor = MagicMock(UdaExecCursor)
        cursor.fetchone.return_value = None
        db_con.cursor.return_value.__enter__.return_value = cursor
        self.dbc_instance = DBSAccess(db_con)
        self.assertEqual(self.dbc_instance.get_db_perm("dbc"), "dbc does not exist")
