Testing a Postgres db from Python

I don't understand how to test my repositories.
I want to be sure that I really saved an object with all of its parameters into the database, and that when I execute my SQL statement I really receive what I'm supposed to.
But I cannot put "CREATE TABLE test_table" in the setUp method of a unittest case, because the table would be created multiple times (tests of the same test case are run in parallel). So as soon as I create two methods in the same class that need to work on the same table, it won't work (the table names clash).
Likewise, I cannot put "CREATE TABLE test_table" in setUpModule, because now the table is created once, but since tests are run in parallel, nothing prevents the same object from being inserted multiple times into my table, which breaks the uniqueness constraint on some fields.
Likewise, I cannot "CREATE SCHEMA some_random_schema_name" in every method, because I would need to globally "SET search_path TO ..." for a given database, so every method run in parallel would be affected.
The only way I see is to "CREATE DATABASE" for each test, each with a unique name, and establish an individual connection to each database. This looks extremely wasteful. Is there a better way?
Also, I cannot use SQLite in memory, because I need to test against PostgreSQL.

The best solution for this is to use the testing.postgresql module. This fires up a database in user space, then deletes it again at the end of the run. You can put the following in a unittest suite, either in setUp, setUpClass or setUpModule, depending on what persistence you want:
import testing.postgresql

def setUp(self):
    self.postgresql = testing.postgresql.Postgresql(port=7654)
    # Get the url to connect to with psycopg2 or equivalent
    print(self.postgresql.url())

def tearDown(self):
    self.postgresql.stop()
If you want the database to persist between/after tests, you can run it with the base_dir option to set a directory, which will prevent its removal after shutdown:
name = "testdb"
port = 5678
path = "/tmp/my_test_db"
testing.postgresql.Postgresql(name=name, port=port, base_dir=path)
Outside of testing it can also be used as a context manager, where it will automatically clean up and shut down when the with block is exited:
with testing.postgresql.Postgresql(port=7654) as psql:
    pass  # do something here
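As an aside on the question's schema concern: a plain SET search_path is session-local in PostgreSQL, so connections running in parallel do not affect each other. That makes per-test schemas with unique names a workable middle ground between one shared table and one database per test. Below is a minimal sketch of the naming part only; the cursor and repository calls are placeholders and are left as comments since they need a live connection:

```python
import uuid

def unique_schema_name(prefix="test"):
    """Generate a schema name unlikely to clash across parallel tests."""
    return "%s_%s" % (prefix, uuid.uuid4().hex)

# Per test (placeholder cursor calls; they need a live connection):
schema = unique_schema_name()
# cursor.execute('CREATE SCHEMA "%s"' % schema)
# cursor.execute('SET search_path TO "%s"' % schema)  # session-local, not global
# ... exercise the repository under test ...
# cursor.execute('DROP SCHEMA "%s" CASCADE' % schema)
print(schema)
```

Each test then sees only its own schema, and teardown is a single DROP SCHEMA ... CASCADE.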


psycopg2 WITHOUT transaction

Sometimes I have a need to execute a query from psycopg2 that is not in a transaction block.
For example:
cursor.execute('create index concurrently on my_table (some_column)')
Doesn't work:
InternalError: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
I don't see any easy way to do this with psycopg2. What am I missing?
I can probably call os.system('psql -c "create index concurrently"') or something similar to get it to run from my Python code, but it would be much nicer to do it inside Python and not rely on psql actually being in the container.
Yes, I have to use the concurrently option for this particular use case.
Another time I've explored this and not found an obvious answer is when I have a set of SQL commands that I'd like to call with a single execute(), where the first one briefly locks a resource. When I do this, that resource remains locked for the entire duration of the execute() rather than just while the first statement in the SQL string is running, because they all run together in one big happy transaction.
In that case I could break the query up into a series of execute() calls; each became its own transaction, which was OK.
It seems like there should be a way, but I seem to be missing it. Hopefully this is an easy answer for someone.
EDIT: Add code sample:
#!/usr/bin/env python3.10
import psycopg2 as pg2

# Set the standard psql environment variables to specify which database this should connect to.
# We have to set these to None explicitly to get psycopg2 to use the env variables.
connDetails = {'database': None, 'host': None, 'port': None, 'user': None, 'password': None}

with (pg2.connect(**connDetails) as conn, conn.cursor() as curs):
    conn.set_session(autocommit=True)
    curs.execute("""
        create index concurrently if not exists my_new_index on my_table (my_column);
    """)
Throws:
psycopg2.errors.ActiveSqlTransaction: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
Per psycopg2 documentation:
It is possible to set the connection in autocommit mode: this way all the commands executed will be immediately committed and no rollback is possible. A few commands (e.g. CREATE DATABASE, VACUUM, CALL on stored procedures using transaction control…) require to be run outside any transaction: in order to be able to run these commands from Psycopg, the connection must be in autocommit mode: you can use the autocommit property.
Hence on the connection:
conn.set_session(autocommit=True)
Further resources from psycopg2 documentation:
transactions-control
connection.autocommit
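One way to keep that flag change from leaking into surrounding code is a small wrapper that enables autocommit, runs the statement, and restores the previous setting. This is a sketch, not psycopg2's own API: it assumes a psycopg2-style connection with an autocommit attribute and context-manager cursors, the helper name and index DDL are illustrative, and note that psycopg2 refuses to flip autocommit while a transaction is already open, so call it on a connection with no pending work:

```python
def execute_outside_transaction(conn, sql):
    """Run sql with no surrounding transaction block.

    Assumes a psycopg2-style connection: an `autocommit` attribute
    and `cursor()` objects usable as context managers.
    """
    previous = conn.autocommit
    conn.autocommit = True  # must be set before the statement runs
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
    finally:
        conn.autocommit = previous

# Usage (hypothetical index):
# execute_outside_transaction(
#     conn, "create index concurrently if not exists my_new_index on my_table (my_column)")
```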

Import Excel file into MSSQL via Python, using an SQL Agent job

Task specs:
Import Excel file(s) into MSSQL database(s) using Python, but in a parametrized manner, and using SQL Server Agent job(s).
With the added requirement to set parameter values and/or run the job steps from SQL (query or SP).
And without using Access Database Engine(s) and/or any code that makes use of such drivers (in any wrapping).
First. Let's get some preparatory stuff out of the way.
We will need to set some PowerShell settings.
Run windows PowerShell as Administrator and do:
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
Second. Some assumptions for reasons of clarity.
And those are:
1a. You have at least one instance of SQL Server 2017 or later (Developer / Enterprise / Standard edition) installed and running on your machine.
1b. You have not bootstrapped the installation of this SQL instance so as to exclude Integration Services (SSIS).
1c. SQL Server Agent is running, bound to this SQL instance.
1d. You have some version of SSMS installed.
2a. There is at least one database attached to this instance (if not, create one; please refrain from using in-memory filegroups for this exercise, as I have not tested on those).
2b. There are no database-level DML triggers that log all data changes in a designated table.
3. There is no active Server Audit Specification for this database logging everything we do.
4. Replication is not enabled (I mean the proper MSSQL replication feature, not scripts by 3rd-party apps).
For 2b and 3 it's just because I have not tested this with those on, but for number 4 it definitely won't work with that on.
5. You are Windows-authenticated into the chosen SQL instance, and your instance login and db mappings and privileges are sufficient for at least table creation and basic operations.
Third.
We are going to need some kind of Python script to do this right?
Ok let's make one.
import pandas as pd
import sqlalchemy as sa
import urllib
import sys
import warnings
import os
import re
import time

# COMMAND LINE PARAMETERS
server = sys.argv[1]
database = sys.argv[2]
ExcelFileHolder = sys.argv[3]
SQLTableName = sys.argv[4]
# END OF COMMAND LINE PARAMETERS

excel_sheet_number_left_to_right = 0
warnings.filterwarnings('ignore')

driver = "SQL Server Native Client 11.0"
params = "DRIVER={%s};SERVER=%s;DATABASE=%s;Trusted_Connection=yes;QuotedID=Yes;" % (driver, server, database)  # added the explicit "QuotedID=Yes;" to ensure no issues with column names
params = urllib.parse.quote_plus(params)  # urllib.parse.quote_plus for Python 3
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s?charset=utf8" % params)  # charset is cool to have here
conn = engine.connect()

def execute_sql_trans(sql_string, log_entry):
    with conn.begin() as trans:
        result = conn.execute(sql_string)
        if len(log_entry) >= 1:
            log.write(log_entry + "\n")
    return result

excelfilesCursor = {}

def process_excel_file(excelfile, excel_sheet_name, tableName, withPyIndexOrSQLIndex, orderByCandidateFields):
    withPyIndexOrSQLIndex = 0
    excelfilesCursor.update({tableName: withPyIndexOrSQLIndex})
    df = pd.read_excel(open(excelfile, 'rb'), sheet_name=excel_sheet_name)
    now = time.time()
    mlsec = repr(now).split('.')[1][:3]
    log_string = "Reading file \"" + excelfile + "\" to memory: " + str(time.strftime("%Y-%m-%d %H:%M:%S.{} %Z".format(mlsec), time.localtime(now))) + "\n"
    print(log_string)
    df.to_sql(tableName, engine, if_exists='replace', index_label='index.py')
    now = time.time()
    mlsec = repr(now).split('.')[1][:3]
    log_string = "Writing file \"" + excelfile + "\", sheet " + str(excel_sheet_name) + " to SQL instance " + server + ", into [" + database + "].[dbo].[" + tableName + "]: " + str(time.strftime("%Y-%m-%d %H:%M:%S.{} %Z".format(mlsec), time.localtime(now))) + "\n"
    print(log_string)

def convert_datetimes_to_dates(tableNameParam):
    sql_string = "exec [convert_datetimes_to_dates] '" + tableNameParam + "';"
    execute_sql_trans(sql_string, "")

process_excel_file(ExcelFileHolder, excel_sheet_number_left_to_right, SQLTableName, 0, None)
sys.exit(0)
OK, you may or may not notice that my script contains some extra defs; I sometimes use them for convenience, so you may as well ignore them.
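One step worth calling out is the odbc_connect quoting: the raw ODBC string has to be percent-encoded before it can ride inside the SQLAlchemy URL's query string. A self-contained illustration of just that step, with placeholder server and database names:

```python
import urllib.parse

driver = "SQL Server Native Client 11.0"
server = r"SomeMachineName\SomeNamedSQLInstance"  # placeholder instance
database = "SomeDatabase"                         # placeholder database

raw = "DRIVER={%s};SERVER=%s;DATABASE=%s;Trusted_Connection=yes;QuotedID=Yes;" % (
    driver, server, database)
params = urllib.parse.quote_plus(raw)  # encode spaces, braces, semicolons, backslashes
url = "mssql+pyodbc:///?odbc_connect=%s" % params
print(url)
```

Without quote_plus, the spaces and semicolons in the ODBC string would break the URL parsing.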
Save the python script somewhere nice say C:\PythonWorkspace\ExcelPythonToSQL.py
Also, needless to say, you will need some Python modules in your venv; any you don't already have, you'll need to pip install.
Fourth.
Connect to your db, SSMS, etc. and create a new Agent job.
Let's call it "ExcelPythonToSQL".
New step, let's call it "PowerShell parametrized script".
Set the Type to PowerShell.
And place this code inside it:
$pyFile="C:\PythonWorkspace\ExcelPythonToSQL.py"
$SQLInstance="SomeMachineName\SomeNamedSQLInstance"
#or . or just the computername or localhost if your SQL instance is a default instance i.e. not a named one.
$dbName="SomeDatabase"
$ExcelFileFullPath="C:\Temp\ExampleExcelFile.xlsx"
$targetTableName="somenewtable"
C:\ProgramData\Miniconda3\envs\YOURVENVNAMEHERE\python $pyFile $SQLInstance $dbName $ExcelFileFullPath $targetTableName
Save the step and the job.
Now let's wrap it in something easier to handle. Remember, this job and step are not like an SSIS step, where you could alter the parameter values in a configuration tab. You don't want to open the properties of the job and the step each time to specify a different Excel file or target table.
So.
Ah, also, do me a solid and try this little trick: make a small alteration in the step code, anything at all, and then instead of clicking OK, choose Script to New Query Window. That way we capture the GUID of the job without having to query for it.
So now.
Create a SP like so:
use [YourDatabase];
GO
create proc [ExcelPythonToSQL_amend_job_step_params](
    @pyFile nvarchar(max),
    @SQLInstance nvarchar(max),
    @dbName nvarchar(max),
    @ExcelFileFullPath nvarchar(max),
    @targetTableName nvarchar(max)='somenewtable'
)
as
begin
    declare @sql nvarchar(max);
    set @sql = '
exec msdb.dbo.sp_update_jobstep @job_id=N''7f6ff378-56cd-4a8d-ba40-e9057439a5bc'', @step_id=1,
@command=N''
$pyFile="'+@pyFile+'"
$SQLInstance="'+@SQLInstance+'"
$dbName="'+@dbName+'"
$ExcelFileFullPath="'+@ExcelFileFullPath+'"
$targetTableName="'+@targetTableName+'"
C:\ProgramData\Miniconda3\envs\YOURVENVGOESHERE\python $pyFile $SQLInstance $dbName $ExcelFileFullPath $targetTableName''
';
    --print @sql;
    exec sp_executesql @sql;
end
But inside it you must replace two things. One, the uniqueidentifier of the Agent job, which you found by the trick I described earlier, the one with Script to New Query Window. Two, you must fill in the name of your Python venv, replacing the word YOURVENVGOESHERE in the code.
Cool.
Now, with a simple script we can play-test it.
Let's have in a new query window something like this:
use [YourDatabase];
GO
--to set parameters
exec [ExcelPythonToSQL_amend_job_step_params] @pyFile='C:\PythonWorkspace\ExcelPythonToSQL.py',
    @SQLInstance='.',
    @dbName='YourDatabase',
    @ExcelFileFullPath='C:\Temp\ExampleExcelFile.xlsx',
    @targetTableName='somenewtable';
--to execute the job
exec msdb.dbo.sp_start_job N'ExcelPythonToSQL', @step_name = N'PowerShell parametrized script';
--let's test that the table is there and drop it.
if object_id('YourDatabase..somenewtable') is not null
begin
    select 'Table was here!' [test: table exists?];
    drop table [somenewtable];
end
else select 'NADA!' [test: table exists?];
You can run the set-parameters part, then the execution; be careful to wait a few seconds afterwards, since calling sp_start_job like this is asynchronous. Then run the test script to clean up and confirm the data went in.
That's it.
Obviously lots of variations are possible.
For instance, in the job step we could instead call a batch file, or a PowerShell .ps1 file with the parameters kept in there; there are lots of other ways of doing it. I merely described one in this post.

Pytest-django: cannot delete db after tests

I have a Django application, and I'm trying to test it using pytest and pytest-django. However, quite often, when the tests finish running, I get the error that the database failed to be deleted: DETAIL: There is 1 other session using the database.
Basically, the minimum test code that I could narrow it down to is:
@pytest.fixture
def make_bundle():
    a = MyUser.objects.create(key_id=uuid.uuid4())
    return a

class TestThings:
    def test_it(self, make_bundle):
        all_users = list(MyUser.objects.all())
        assert_that(all_users, has_length(1))
Every now and again the tests will fail with the above error. Is there something I am doing wrong? Or how can I fix this?
The database that I am using is PostgreSQL 9.6.
I am posting this as an answer because I need to post a chunk of code and because this worked. However, this looks like a dirty hack to me, and I'll be more than happy to accept anybody else's answer if it is better.
Here's my solution: basically, add raw SQL that kicks all other users out of the given db to the method that destroys the db, and do that by monkeypatching. To ensure that the monkeypatching happens before the tests, add it to the root conftest.py file as an autouse fixture:
def _destroy_test_db(self, test_database_name, verbosity):
    """
    Internal implementation - remove the test db tables.
    """
    # Remove the test database to clean up after
    # ourselves. Connect to the previous database (not the test database)
    # to do so, because it's not allowed to delete a database while being
    # connected to it.
    with self.connection._nodb_connection.cursor() as cursor:
        cursor.execute(
            "SELECT pg_terminate_backend(pg_stat_activity.pid) "
            "FROM pg_stat_activity "
            "WHERE pg_stat_activity.datname = '{}' "
            "AND pid <> pg_backend_pid();".format(test_database_name)
        )
        cursor.execute("DROP DATABASE %s"
                       % self.connection.ops.quote_name(test_database_name))

@pytest.fixture(autouse=True)
def patch_db_cleanup():
    creation.BaseDatabaseCreation._destroy_test_db = _destroy_test_db
Note that the kicking-out code may depend on your database engine, and the method that needs monkeypatching may be different in different Django versions.

How to executescript in sqlite3 from Python transactionally? [duplicate]

Context
So I am trying to figure out how to properly override the auto-transaction when using SQLite in Python. When I try and run
cursor.execute("BEGIN;")
.....an assortment of insert statements...
cursor.execute("END;")
I get the following error:
OperationalError: cannot commit - no transaction is active
Which I understand is because SQLite in Python automatically opens a transaction on each modifying statement, which in this case is an INSERT.
Question:
I am trying to speed my insertion by doing one transaction per several thousand records.
How can I overcome the automatic opening of transactions?
As @CL. said, you have to set the isolation level to None. Code example:
import sqlite3

s = sqlite3.connect("./data.db")
s.isolation_level = None
try:
    c = s.cursor()
    c.execute("begin")
    ...
    c.execute("commit")
except sqlite3.Error:
    c.execute("rollback")
The documentation says:
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
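Putting it together, here is a self-contained sketch of the bulk-insert pattern from the question, using an in-memory database so it runs anywhere; with isolation_level=None, the explicit BEGIN/COMMIT pair wraps all the inserts in a single transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit mode: no implicit BEGINs

cur = conn.cursor()
cur.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value TEXT)")

rows = [(i, "value-%d" % i) for i in range(5000)]
try:
    cur.execute("BEGIN")  # one explicit transaction for all the inserts
    cur.executemany("INSERT INTO records (id, value) VALUES (?, ?)", rows)
    cur.execute("COMMIT")
except sqlite3.Error:
    cur.execute("ROLLBACK")
    raise

cur.execute("SELECT COUNT(*) FROM records")
print(cur.fetchone()[0])  # → 5000
```

The single transaction is what gives the speedup: without it, each INSERT would be committed (and fsynced) individually.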

Tornado. Django-like testrunner and test database

I like Django unit tests, because they create and drop a test database on each run.
What ways exist to create a test database for Tornado?
UPD: I'm interested in PostgreSQL test database creation on test run.
I found the easiest way is just to use an SQL dump for the test database. Create a database, populate it with fixtures, and write it out to a file. Simply call load_fixtures before you run your tests (or whenever you want to reset the DB). This method can certainly be improved, but it's been good enough for my needs.
import os
import unittest2
import tornado.database

settings = dict(
    db_host="127.0.0.1:3306",
    db_name="testdb",
    db_user="testdb",
    db_password="secret",
    db_fixtures_file=os.path.join(os.path.dirname(os.path.abspath(__file__)), 'fixtures.sql'),
)

def load_fixtures():
    """Fixtures are stored in an SQL dump."""
    os.system("psql %s --user=%s --password=%s < %s" % (settings['db_name'],
        settings['db_user'], settings['db_password'], settings['db_fixtures_file']))
    return tornado.database.Connection(
        host=settings['db_host'], database=settings['db_name'],
        user=settings['db_user'], password=settings['db_password'])
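As a variant, shelling out with the password inline is fragile (psql's --password flag forces a prompt rather than accepting a value); a sketch using subprocess and the PGPASSWORD environment variable instead, with placeholder database names and file paths:

```python
import os
import subprocess

def build_restore_command(db_name, db_user, fixtures_file):
    """Build the psql invocation that loads an SQL dump into db_name."""
    return ["psql", "--dbname", db_name, "--username", db_user,
            "--file", fixtures_file]

def load_fixtures(db_name, db_user, db_password, fixtures_file):
    # Password goes through the environment so it never appears in `ps` output.
    env = dict(os.environ, PGPASSWORD=db_password)
    subprocess.run(build_restore_command(db_name, db_user, fixtures_file),
                   env=env, check=True)

print(build_restore_command("testdb", "testdb", "fixtures.sql"))
```

Passing the command as a list also avoids shell quoting issues that os.system is prone to.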
