Inserting more than one row with decimal values into a temp table using SQLAlchemy (via pyodbc against SQL Server) with fast_executemany=True throws:
ProgrammingError("(pyodbc.ProgrammingError) ('Converting decimal loses precision', 'HY000')")
This happens only when inserting multiple rows at once into a temp table with fast_executemany=True and at least one decimal column. Inserting one row at a time, turning fast_executemany off, or inserting into a regular table works perfectly.
I've built a simple example:
CONNSTR = "mssql+pyodbc://user:PASSWORD#SERVER?driver=ODBC Driver 17 for SQL Server&trusted_connection=yes"
import sqlalchemy
from decimal import Decimal

def test():
    data = [(1, Decimal('41763.9907359278'), Decimal('227367.1749095026')),
            (1027, Decimal('3117.1592020142'), Decimal('16970.1139430488'))]
    engine = sqlalchemy.create_engine(CONNSTR, fast_executemany=True, connect_args={'connect_timeout': 10})
    # this will fail
    insert(engine, data, "#temp_table_test")
    # this will work
    insert(engine, data, "regular_table_test")
def insert(engine, data, table_name):
    try:
        with engine.begin() as con:
            con.execute(f"""DROP TABLE IF EXISTS {table_name};""")
            con.execute(f"""
                CREATE TABLE {table_name} (
                    [id_column] INT NULL UNIQUE,
                    [usd_price] DECIMAL(38,20) NULL,
                    [brl_price] DECIMAL(38,20) NULL,
                )
            """)
            sql_insert_prices = f"INSERT INTO {table_name} VALUES (?,?,?)"
            con.execute(sql_insert_prices, data)
        print(f"Insert into {table_name} worked!")
    except Exception as e:
        print(f"{e!r}")
        print(f"Insert into {table_name} failed!")
While this is obviously related to the reduced type conversion that fast_executemany performs, I can't figure out why it behaves differently depending on the type of table. Every other question here citing this particular exception seems to be caused by factors not present in my case, so I'm really at a loss.
EDIT: the original test with just one decimal column ran fine (I assumed reducing the number of columns wouldn't change the outcome), but adding another decimal column brings me back to square one with the same error message.
fast_executemany=True asks the ODBC driver what the column types are, and the default mechanism used by Microsoft's ODBC drivers for SQL Server is to call a system stored procedure named sp_describe_undeclared_parameters. That stored procedure has some difficulties with #local_temp tables that do not occur with regular tables or ##global_temp tables. Details in this GitHub issue.
As mentioned in the related wiki entry, workarounds include
using Cursor.setinputsizes() to explicitly declare the column types (see the sketch just after this list),
using a ##global_temp table instead of a #local_temp table, or
adding UseFMTONLY=Yes to the ODBC connection string.
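For the first workaround, a minimal sketch with plain pyodbc might look like the following (the connection-string placeholders and the #temp table layout are taken from the question; adjust the declared SQL types to match your actual columns):

import pyodbc
from decimal import Decimal

cnxn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=...;DATABASE=...;Trusted_Connection=yes;",
    autocommit=True,
)
crsr = cnxn.cursor()
crsr.fast_executemany = True

crsr.execute(
    "CREATE TABLE #temp_table_test ("
    "[id_column] INT NULL UNIQUE, "
    "[usd_price] DECIMAL(38,20) NULL, "
    "[brl_price] DECIMAL(38,20) NULL)"
)

# declare the parameter types up front so the driver does not have to call
# sp_describe_undeclared_parameters against the #temp table
crsr.setinputsizes([
    (pyodbc.SQL_INTEGER, 0, 0),
    (pyodbc.SQL_DECIMAL, 38, 20),
    (pyodbc.SQL_DECIMAL, 38, 20),
])

data = [
    (1, Decimal('41763.9907359278'), Decimal('227367.1749095026')),
    (1027, Decimal('3117.1592020142'), Decimal('16970.1139430488')),
]
crsr.executemany("INSERT INTO #temp_table_test VALUES (?, ?, ?)", data)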
The easiest way to enable UseFMTONLY with SQLAlchemy is to use a pass-through pyodbc connection string, for example
from sqlalchemy import create_engine
from sqlalchemy.engine import URL
connection_string = (
"DRIVER=ODBC Driver 17 for SQL Server;"
"SERVER=192.168.0.199;"
"DATABASE=test;"
"UID=scott;PWD=tiger^5HHH;"
"UseFMTONLY=Yes;"
)
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url, fast_executemany=True)
I use pyodbc in my Python Flask project for the SQLite DB connection.
I know and understand SQL injection, but this is my first time dealing with it.
I tried to run some injections against my own project to see whether it is vulnerable.
I have a function which concatenates the SQL String in my database.py file:
def open_issue(self, data_object):
    cursor = self.conn.cursor()
    # data_object is the issue i get from the user
    name = data_object["name"]
    text = data_object["text"]
    rating_sum = 0
    # if the user provides an issue
    if name:
        # check if issue is already in db
        test = cursor.execute(f'''SELECT name FROM issue WHERE name = "{name}"''')
        data = test.fetchall()
        # if not in db insert
        if len(data) == 0:
            # insert the issue
            cursor.executescript(f'''INSERT INTO issue (name, text, rating_sum)
                VALUES ("{name}", "{text}", {rating_sum})''')
        else:
            print("nothing inserted!")
In the api.py file the open_issue() function gets called:
@self.app.route('/open_issue')
def insertdata():
    # data sent from client
    # data_object = flask.request.json
    # unit test dictionary
    data_object = {"name": "injection-test-table",
                   "text": "'; CREATE TABLE 'injected_table-1337';--"}
    DB().open_issue(data_object)
The "'; CREATE TABLE 'injected_table-1337';--" sql injection has not created the injected_table-1337, instead it got inserted normally like a string into the text column of the injection-test-table.
So i don't really know if i am safe for the standard ways of SQL injection (this project will only be hosted locally but good security is always welcome)
And secondary: are there ways with pyodbc to check if a string contains sql syntax or symbols, so that nothing will get inserted in my example or do i need to check the strings manually?
Thanks a lot
As it turns out, with SQLite you are at much less risk of SQL injection issues because by default neither Python's built-in sqlite3 module nor the SQLite ODBC driver allows multiple statements to be executed in a single .execute call (commonly known as an "anonymous code block"). This code:
thing = "'; CREATE TABLE bobby (id int primary key); --"
sql = f"SELECT * FROM table1 WHERE txt='{thing}'"
crsr.execute(sql)
throws this for sqlite3
sqlite3.Warning: You can only execute one statement at a time.
and this for SQLite ODBC
pyodbc.Error: ('HY000', '[HY000] only one SQL statement allowed (-1) (SQLExecDirectW)')
Still, you should follow best practices and use a proper parameterized query
thing = "'; CREATE TABLE bobby (id int primary key); --"
sql = "SELECT * FROM table1 WHERE txt=?"
crsr.execute(sql, (thing, ))
because this will also correctly handle parameter values that would cause errors if injected directly, e.g.,
thing = "it's good to avoid SQL injection"
I have a query in a python script that creates a materialized view after some tables get created.
Script is something like this:
from sqlalchemy import create_engine, text
sql = '''CREATE MATERIALIZED VIEW schema1.view1 AS
SELECT t1.a,
t1.b,
t1.c,
t2.x AS d
FROM schema1.t1 t1
LEFT JOIN schema1.t2 t2 ON t1.f = t2.f
UNION ALL
SELECT t3.a,
t3.b,
t3.c,
t3.d
FROM schema1.t3 t3;'''
con = create_engine(db_conn)
con.execute(sql)
The query successfully executes when I run on the database directly.
But when running the script in python, I get an error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "CREATE MATERIALIZED VIEW schema"
I can't for the life of me figure out what it has an issue with - any ideas?
This was the weirdest thing. I had copied my query text out of another tool that I use to navigate around my pg DB into VS Code. The last part of the answer by @EOhm gave me the idea to just type the whole thing out in VS Code instead of copy/pasting.
And everything worked.
Even though the pasted text and what I typed appeared identical in every way, there was apparently some invisible formatting causing the issue.
I don't know whether SQLAlchemy supports materialized view creation, but if it does, it would presumably be something similar or done with the specific metadata functions (https://docs.sqlalchemy.org/en/13/core/schema.html).
The text function is designed for database-independent DML, not DDL. Maybe it works for DDL (I don't know SQLAlchemy well enough), but by design the syntax can differ from what you would execute directly on the database, since SQLAlchemy is meant to abstract the database details away from the user.
If SQLAlchemy does not offer a convenient way to do this and you nevertheless have valid reasons to use SQLAlchemy at all, you can just execute the plain SQL statement in the dialect the database backend understands, bypassing SQLAlchemy's text function, like:
from sqlalchemy import create_engine, text
sql = '''CREATE MATERIALIZED VIEW schema.view1 AS
SELECT t1.a,
t1.b,
t1.c,
t2.x AS d
FROM schema.t1 t1
LEFT JOIN schema.t2 t2 ON t1.f = t2.f
UNION ALL
SELECT t3.a,
t3.b,
t3.c,
t3.d
FROM schema.t3 t3;'''
con = create_engine(db_conn)
con.raw_connection().cursor().execute(sql)
(But of course you then have to take care of the backend-specific syntax yourself, as opposed to using SQLAlchemy-wrapped statements.)
I tested on my pg server without any issues using psycopg2 directly.
postgres=# create schema schema;
CREATE SCHEMA
postgres=# create table schema.t1 (a varchar, b varchar, c varchar, f integer);
CREATE TABLE
postgres=# create table schema.t2 (x varchar, f integer);
CREATE TABLE
postgres=# create table schema.t3 (a varchar, b varchar, c varchar, d varchar);
CREATE TABLE
postgres=# commit;
With the following script:
#!/usr/bin/python3
import psycopg2
conn = psycopg2.connect("dbname=postgres")
cur = conn.cursor()
cur.execute("""
CREATE MATERIALIZED VIEW schema.view1 AS
SELECT t1.a,
t1.b,
t1.c,
t2.x AS d
FROM schema.t1 t1
LEFT JOIN schema.t2 t2 ON t1.f = t2.f
UNION ALL
SELECT t3.a,
t3.b,
t3.c,
t3.d
FROM schema.t3 t3;
""")
conn.commit()
cur.close()
conn.close()
I tested with fairly current versions of Python 3.7/2.7, a current psycopg2 module, and current libraries (11.5 pg client and 2.8.3 psycopg2) from pgdg, installed on a fairly recent Linux. Can you try executing directly through psycopg2 like I did?
Also, did you make sure your dots are plain ASCII dots, as all the other characters in the statement are in this case? (Keep in mind there can be invisible Unicode code points that cause this sort of problem.) In Python you can convert your string to ASCII bytes and back to a Unicode string; if it does not raise an error on .encode('ascii'), it should be clean.
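For example, a quick check along those lines (a sketch; sql here is the statement string from the script above):

try:
    sql.encode("ascii")
except UnicodeEncodeError as exc:
    # report the first non-ASCII character and where it sits in the statement
    print(f"non-ASCII character {sql[exc.start]!r} at position {exc.start}")
else:
    print("statement contains only plain ASCII characters")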
I want to drop a table (if it exists) before writing some data in a Pandas dataframe:
def store_sqlite(in_data, dbpath='my.db', table='mytab'):
    database = sqlalchemy.create_engine('sqlite:///' + dbpath)
    ## DROP TABLE HERE
    in_data.to_sql(name=table, con=database, if_exists='append')
    database.close()
The SQLAlchemy documentation all points to a Table.drop() object - how would I create that object, or equivalently is there an alternative way to drop this table?
Note: I can't just use if_exists = 'replace' as the input data is actually a dict of DataFrames which I loop over - I've suppressed that code for clarity (I hope).
From the pandas docs:
"You can also run a plain query without creating a dataframe with execute(). This is useful for queries that don’t return values, such as INSERT. This is functionally equivalent to calling execute on the SQLAlchemy engine or db connection object."
http://pandas.pydata.org/pandas-docs/version/0.18.0/io.html#id3
So I do this:
from pandas.io import sql
sql.execute('DROP TABLE IF EXISTS %s' % table, engine)
sql.execute('VACUUM', engine)
Where "engine" is the SQLAlchemy database object (the OP's "database" above). Vacuum is optional, just reduces the size of the sqlite file (I use the table drop part infrequently in my code).
You should be able to create a cursor from your SQLAlchemy engine
import sqlalchemy
engine = sqlalchemy.create_engine('sqlite:///' + dbpath)
connection = engine.raw_connection()
cursor = connection.cursor()
command = "DROP TABLE IF EXISTS {};".format(table)
cursor.execute(command)
connection.commit()
cursor.close()
# Now you can chunk upload your data as you wish
in_data.to_sql(name=table, con=engine, if_exists='append')
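As an aside, the Table.drop() route the question asks about also works without raw SQL; a minimal sketch (names follow the question; this Table.drop() API exists in both SQLAlchemy 1.x and 2.x):

import sqlalchemy

engine = sqlalchemy.create_engine('sqlite:///' + dbpath)
# a bare Table object carrying just the name is enough to emit DROP TABLE;
# checkfirst=True makes it a no-op if the table does not exist yet
sqlalchemy.Table(table, sqlalchemy.MetaData()).drop(engine, checkfirst=True)
in_data.to_sql(name=table, con=engine, if_exists='append')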
If you're loading a lot of data into your db, you may find it faster to use pandas' to_csv() together with the database driver's copy_from function (psycopg2, for PostgreSQL). You can also use StringIO() to hold the data in memory and avoid having to write a file.
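That approach applies to PostgreSQL rather than SQLite, but for the record a sketch with psycopg2's copy_from (table and engine as above, pointed at a PostgreSQL database) could look like this:

import io

buf = io.StringIO()
in_data.to_csv(buf, index=False, header=False)
buf.seek(0)

raw = engine.raw_connection()
try:
    with raw.cursor() as cur:
        # stream the CSV buffer into the table via COPY
        cur.copy_from(buf, table, sep=',')
    raw.commit()
finally:
    raw.close()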
I have to call a MS SQLServer stored procedure with a table variable parameter.
/* Declare a variable that references the type. */
DECLARE @TableVariable AS [AList];
/* Add data to the table variable. */
INSERT INTO @TableVariable (val) VALUES ('value-1');
INSERT INTO @TableVariable (val) VALUES ('value-2');
EXEC [dbo].[sp_MyProc]
    @param = @TableVariable
This works well in SQL Server Management Studio. I tried the following in Python using pyodbc:
cursor.execute("declare #TableVariable AS [AList]")
for a in mylist:
cursor.execute("INSERT INTO #TableVariable (val) VALUES (?)", a)
cursor.execute("{call dbo.sp_MyProc(#TableVariable)}")
This fails with error 42000: the table variable must be declared. The variable does not survive across the separate execute steps.
I also tried:
sql = "DECLARE #TableVariable AS [AList]; "
for a in mylist:
sql = sql + "INSERT INTO #TableVariable (val) VALUES ('{}'); ".format(a)
sql = sql + "EXEC [dbo].[sp_MyProc] #param = #TableVariable"
cursor.execute(sql)
With the following error: No results. Previous SQL was not a query.
I had no more luck with:
sql = sql + "{call dbo.sp_MyProc(@TableVariable)}"
Does somebody know how to handle this using pyodbc?
The root of your problem is that a SQL Server variable has the scope of the batch it was defined in. Each call to cursor.execute is a separate batch, even if they are in the same transaction.
There are a couple of ways you can work around this. The most direct is to rewrite your Python code so that it sends everything as a single batch. (I tested this on my test server and it should work as long as you either add set nocount on or else step over the intermediate results with nextset.)
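A rough sketch of that single-batch approach, using the [AList] type and procedure from the question (mylist and cursor are assumed to exist as in the question's code):

# one batch: declare, fill, and use the table variable in a single execute()
placeholders = ", ".join("(?)" for _ in mylist)
sql = (
    "SET NOCOUNT ON; "
    "DECLARE @TableVariable AS [AList]; "
    "INSERT INTO @TableVariable (val) VALUES " + placeholders + "; "
    "EXEC [dbo].[sp_MyProc] @param = @TableVariable;"
)
cursor.execute(sql, *mylist)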
A more indirect way is to rewrite the procedure to look for a temp table instead of a table variable and then just create and populate the temp table instead of a table variable. A temp table that is not created inside a stored procedure has a scope of the session it was created in.
I believe this error has nothing to do with SQL Server forgetting the table variable. I've experienced this recently, and the problem was that pyodbc doesn't know how to get a result set back from the stored procedure if the SP also returns counts for the rows affected.
In my case the fix for this was to simply put "SET NOCOUNT ON" at the start of the SP.
I hope this helps.
I am not sure if this works and I can't test it because I don't have MS SQL Server, but have you tried executing everything in a single statement:
cursor.execute("""
DECLARE #TableVariable AS [AList];
INSERT INTO #TableVariable (val) VALUES ('value-1');
INSERT INTO #TableVariable (val) VALUES ('value-2');
EXEC [dbo].[sp_MyProc] #param = #TableVariable;
""");
I had this same problem, but none of the answers here fixed it. I was unable to get "SET NOCOUNT ON" to work, and I was also unable to make a single-batch operation work with a table variable. What did work was to use a temporary table in two batches, but it took all day to find the right syntax. The code which follows creates and populates a temporary table in the first batch, then in the second it executes a stored proc using the database name followed by two dots before the stored proc name. This syntax is important for avoiding the error "Could not find stored procedure 'x'. (2812) (SQLExecDirectW)".
def create_incidents(db_config, create_table, columns, tuples_list, upg_date):
    """Executes trackerdb-dev mssql stored proc.
    Args:
        db_config (dict): config .ini file with mssqldb conn.
        create_table (string): temporary table definition to be inserted into 'CREATE TABLE #TempTable ()'
        columns (tuple): columns of the table into which values will be inserted.
        tuples_list (list): list of tuples where each describes a row of data to insert into the table.
        upg_date (string): date on which the items in the list will be upgraded.
    Returns:
        None
    """
    sql_create = """IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
        DROP TABLE #TempTable;
    CREATE TABLE #TempTable ({});
    INSERT INTO #TempTable ({}) VALUES {};
    """
    columns = '"{}"'.format('", "'.join(item for item in columns))
    # this "params" variable is an egregious offense against security professionals everywhere. Replace it with parameterized queries asap.
    params = ', '.join([str(tupl) for tupl in tuples_list])
    sql_create = sql_create.format(
        create_table
        , columns
        , params)
    # msconn is assumed to be a module-level pyodbc connection created elsewhere
    msconn.autocommit = True
    cur = msconn.cursor()
    try:
        cur.execute(sql_create)
        cur.execute("DatabaseName..TempTable_StoredProcedure ?", upg_date)
    except pyodbc.DatabaseError as err:
        print(err)
    else:
        cur.close()
    return

create_table = """
    int_column int
    , name varchar(255)
    , datacenter varchar(25)
"""
create_incidents(
    db_config = db_config
    , create_table = create_table
    , columns = ('int_column', 'name', 'datacenter')
    , tuples_list = tuples_list
    , upg_date = '2017-09-08')
The stored proc uses IF OBJECT_ID('tempdb..#TempTable') IS NULL syntax to validate the temporary table has been created. If it has, the procedure selects data from it and continues. If the temporary table has not been created, the proc aborts. This forces the stored proc to use a copy of the #TempTable created outside the stored procedure itself but in the same session. The pyodbc session lasts until the cursor or connection is closed and the temporary table created by pyodbc has the scope of the entire session.
IF OBJECT_ID('tempdb..#TempTable') IS NULL
BEGIN
-- #TempTable gets created here only because SQL Server Management Studio throws errors if it isn't.
CREATE TABLE #TempTable (
int_column int
, name varchar(255)
, datacenter varchar(25)
);
-- This error is thrown so that the stored procedure requires a temporary table created *outside* the stored proc
THROW 50000, '#TempTable table not found in tempdb', 1;
END
ELSE
BEGIN
-- the stored procedure has now validated that the temporary table being used is coming from outside the stored procedure
SELECT * FROM #TempTable;
END;
Finally, note that "tempdb" is not a placeholder, like I thought when I first saw it. "tempdb" is an actual MS SQL Server database system object.
Set connection.autocommit = True and use cursor.execute() only once instead of multiple times. The SQL string that you pass to cursor.execute() must contain all 3 steps:
Declaring the table variable
Filling the table variable with data
Executing the stored procedure that uses that table variable as an input
You don't need semicolons between the 3 steps.
Here's a fully functional demo. I didn't bother with parameter passing since it's irrelevant here, but for the record it works fine with parameters too.
SQL Setup (execute ahead of time)
CREATE TYPE dbo.type_MyTableType AS TABLE(
    a INT,
    b INT,
    c INT
)
GO
CREATE PROCEDURE dbo.CopyTable
    @MyTable type_MyTableType READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT * INTO MyResultTable FROM @MyTable
END
Python
import pyodbc

CONN_STRING = (
    'Driver={SQL Server Native Client 11.0};'
    'Server=...;Database=...;UID=...;PWD=...'
)

class DatabaseConnection(object):
    def __init__(self, connection_string):
        self.conn = pyodbc.connect(connection_string)
        self.conn.autocommit = True
        self.cursor = self.conn.cursor()

    def __enter__(self):
        return self.cursor

    def __exit__(self, *args):
        self.cursor.close()
        self.conn.close()
sql = (
    'DECLARE @MyTable type_MyTableType'
    '\nINSERT INTO @MyTable VALUES'
    '\n(11, 12, 13),'
    '\n(21, 22, 23)'
    '\nEXEC CopyTable @MyTable'
)

with DatabaseConnection(CONN_STRING) as cursor:
    cursor.execute(sql)
If you want to spread the SQL across multiple calls to cursor.execute(), then you need to use a temporary table instead. Note that in that case, you still need connection.autocommit = True.
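For completeness, a rough sketch of that temp-table variant (the stored procedure name reading from the #temp table is hypothetical; the point is only that the table survives across execute() calls on the same connection while autocommit is on):

conn = pyodbc.connect(CONN_STRING)
conn.autocommit = True
cursor = conn.cursor()
# batch 1: create and fill a #temp table instead of a table variable
cursor.execute("CREATE TABLE #MyTable (a INT, b INT, c INT)")
cursor.executemany("INSERT INTO #MyTable VALUES (?, ?, ?)", [(11, 12, 13), (21, 22, 23)])
# batch 2: a stored procedure (hypothetical) that reads from #MyTable
cursor.execute("EXEC CopyTableFromTemp")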
As Timothy pointed out, the catch is to use nextset().
What I have found out is that when you execute() a multiple statement query, pyodbc checks (for any syntax errors) and executes only the first statement in the batch but not the entire batch unless you explicitly specify nextset().
Say your query is:
cursor.execute('select 1 '
'select 1/0')
print(cursor.fetchall())
your result is:
[(1, )]
but as soon as you instruct it to move further into the batch, to the statement that raises the error, via the command:
cursor.nextset()
there you have it:
pyodbc.DataError: ('22012', '[22012] [Microsoft][ODBC SQL Server Driver][SQL Server]Divide by zero error encountered. (8134) (SQLMoreResults)')
This solves the issue I encountered when working with table variables in a multiple-statement query.
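In practice that means stepping through every result set in the batch; a sketch (sql is a multi-statement batch such as the table-variable one above):

cursor.execute(sql)
while True:
    # only some statements in the batch produce rows
    if cursor.description is not None:
        rows = cursor.fetchall()
    # nextset() returns False when the batch is exhausted and surfaces
    # any error raised by the statement it advances to
    if not cursor.nextset():
        break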
I'm looking for a way to take a result set and use it to find records in a table that resides in SQL Server 2008, without spinning through the records one at a time. The result sets that will be used to find the records could number in the hundreds of thousands.

So far I am pursuing creating a table in memory using sqlite3 and then trying to feed that table to a stored procedure that takes a table-valued parameter. The work on the SQL Server side is done: the user-defined type is created, the test procedure accepting a table-valued parameter exists, and I've tested it through T-SQL and it appears to work just fine. In Python, a simple in-memory table was created through sqlite3.

Now the catch: the only documentation I have found for accessing a stored procedure with a table-valued parameter uses ADO.NET and VB, nothing in Python. Unfortunately, I'm not enough of a programmer to translate. Has anyone used a SQL Server stored procedure with a table-valued parameter? Is there another approach I should look into?
Here are some links:
Decent explanation of table valued parameters and how to set them up in SQL and using in .Net
http://www.sqlteam.com/article/sql-server-2008-table-valued-parameters
http://msdn.microsoft.com/en-us/library/bb675163.aspx#Y2142
Explanation of using ADO in Python – almost what I need, just need the structured parameter type.
http://www.mayukhbose.com/python/ado/ado-command-3.php
My simple code
--TSQL to create type on SQL database
create Type PropIDList as Table
(Prop_Id BigInt primary key)
--TSQL to create stored procedure on SQL database. Note the reference to the PropIDList type created above.
create procedure PropIDListTest @PIDList PropIDList READONLY
as
SET NOCOUNT ON
select * from
    @PIDList p
SET NOCOUNT OFF
--TSQL to test objects.
--Declare variable as user defined type (table that has prop_id)
declare @pidlist as propidlist
--Populate variable
insert into @pidlist(prop_id)
values(1000)
insert into @pidlist(prop_id)
values(2000)
--Pass table variable to stored procedure
exec PropIDListTest @pidlist
Now the tough part – Python.
Here is the code creating the in memory table
import getopt, sys, string, os, tempfile, shutil
import _winreg,win32api, win32con
from win32com.client import Dispatch
from adoconstants import *
import sqlite3
conn1 = sqlite3.connect(':memory:')
c = conn1.cursor()
# Create table
c.execute('''create table PropList
(PropID bigint)''')
# Insert a row of data
c.execute("""insert into PropList
values (37921019)""")
# Save (commit) the changes
conn1.commit()
c.execute('select * from PropList order by propID')
# lets print out what we have to make sure it works
for row in c:
    print row
Ok, my attempt at connecting through Python
conn = Dispatch('ADODB.Connection')
conn.ConnectionString = "Provider=sqloledb.1; Data Source=nt38; Integrated Security = SSPI;database=pubs"
conn.Open()
cmd = Dispatch('ADODB.Command')
cmd.ActiveConnection = conn
cmd.CommandType = adCmdStoredProc
cmd.CommandText = "PropIDListTest #pidlist = ?"
param1 = cmd.CreateParameter('#PIDList', adUserDefined) # I “think” the parameter type is the key and yes it is most likely wrong here.
cmd.Parameters.Append(param1)
cmd.Parameters.Value = conn1 # Yeah, this is probably wrong as well
(rs, status) = cmd.Execute()
while not rs.EOF:
OutputName = rs.Fields("Prop_ID").Value.strip().upper()
print OutputName
rs.MoveNext()
rs.Close()
rs = None
conn.Close()
conn = None
# We can also close the cursor if we are done with it
c.close()
conn1.close()
I have coded TVPs from ADO.NET before.
Here is a question on TVPs in classic ADO that I am interested in: "Classic ADO and Table-Valued Parameters in Stored Procedure" on Stack Overflow. It does not give a direct answer, but it offers alternatives.
The XML option is easier; you have probably already considered it. It would require more server-side processing.
Here is the MSDN link for low-level ODBC programming of TVPs: Table-Valued Parameters (ODBC). This one is the closest answer if you can switch to ODBC.
You could pass a CSV string to an nvarchar(max) parameter and then feed it to a CLR SplitString function; that one is fast but has default behaviour I disagree with.
Please post back what works or does not here.