I am using Python 2.7 for a specific job. I am connecting to MSSQL Server (2008) using FreeTDS. I can run some simple SELECT queries, but when I try to run a parametrised query I get an error:
('HY004', '[HY004] [FreeTDS][SQL Server]Invalid data type (0) (SQLBindParameter)')
Here is my query:
query = u"UPDATE table SET column1=? WHERE column2=?"
cursor.execute(query,[param1, param2])
However, the same code works fine on the live server.
I have skimmed so many threads in various forums, but they all seem misleading and I am really confused.
What is my actual problem and what do you suggest?
Edit: I've added the query.
I know this is a super old thread, but I came across this same problem and the solution for me was to type cast the variables. For instance:
query = u"UPDATE table SET column1=? WHERE column2=?"
cursor.execute(query,[str(param1), str(param2)])
In this case it doesn't really matter what type the parameters are, as they will be converted to strings.
I'm trying to set up a connection to AWS Redshift from the Great Expectations framework (GE) according to the tutorial, using Python, and I am facing two issues:
When I'm using postgresql+psycopg2 as the driver in the connection string in step 5, adding the datasource (context.add_datasource(**datasource_config)) takes extremely long (up to 20 minutes!). Validating expectations afterwards works as expected and even runs quite fast. I'm assuming the huge amount of time needed is due to the size of the Redshift cluster I'm connecting to (more than 1000 schemas) and the postgresql driver not being optimized for Redshift.
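For reference, the two connection strings in question differ only in the dialect prefix; the host and credentials here are placeholders:
# issue 1: standard Postgres dialect; add_datasource is very slow, but validation works
slow_but_working = "postgresql+psycopg2://user:password@redshift-host:5439/dev"
# issue 2 (described next): sqlalchemy-redshift dialect; add_datasource is instant, but some expectations fail
fast_but_failing = "redshift+psycopg2://user:password@redshift-host:5439/dev"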
In search of alternatives to the postgresql driver I came across the sqlalchemy-redshift driver. Changing it in the connection string (redshift+psycopg2) adds the datasource instantly; however, validating some expectations (e.g. expect_column_values_to_not_be_null) fails. After some digging through the code I realized it might be due to GE creating a temporary table in the SQL query. So when I specify the query:
select * from my_redshift_schema.my_table;
GE actually seems to run something like:
CREATE TEMPORARY TABLE "ge_temp_bf3cbfa2" AS select * from my_redshift_schema.my_table;
For certain expectations, sqlalchemy-redshift tries to find information about the columns of the table; however, it searches for the name of the temporary table and not the actual one I specified in the SQL query. It consequently fails, as it obviously can't find a table with that name in the Redshift cluster. More specifically, it results in a KeyError in the dialect.py file within sqlalchemy-redshift:
.venv/lib/python3.8/site-packages/sqlalchemy_redshift/dialect.py", line 819, in _get_redshift_columns
return all_schema_columns[key]
KeyError: RelationKey(name='ge_temp_bf3cbfa2', schema='public')
Has anyone succeeded running GE on redshift? How could I mitigate the issues I'm facing (make option 1 faster or fix the error in option 2)?
I am attempting to execute a raw SQL insert statement with SQLAlchemy. SQLAlchemy throws no errors when the constructed insert statement is executed, but the rows do not appear in the database.
As far as I can tell, it isn't a syntax error (see no. 2), it isn't an engine error as the ORM can execute an equivalent write properly (see no. 1), and it is finding the table it's supposed to write to (see no. 3). I think it's a problem with a transaction not being committed and have attempted to address this (see no. 4), but this hasn't solved the issue. Is it possible to create a nested transaction, and what would start the 'first' one, so to speak?
Thank you for any answers.
Some background:
1. I know that the ORM facilitates this and have used this feature, and it works, but it is too slow for our application. We decided to try using raw SQL for this particular write function because of how often it's called, and to use the ORM for everything else. An equivalent method using the ORM works perfectly, and the same engine is used for both, so it can't be an engine problem, right?
2. I've issued an example of the SQL that the raw-SQL method constructs directly against the database and it runs fine, so I don't think it's a syntax error.
3. It's communicating with the database properly and can find the table, as any syntax errors in table and column names throw a programming error, so it's not just throwing stuff into the 'void', so to speak.
4. My first thought after reading around was that it was a transaction error, that a transaction was being created and not closed, so I constructed the execute statement as follows to ensure a transaction was properly created and committed:
with self.Engine.connect() as connection:
    connection.execute(Insert_Statement)
    connection.commit
The so-called 'Insert_Statement' has been converted to text using the SQLAlchemy text() function. I don't quite understand why it won't execute if I pass the constructed string directly to the execute call, but I mention it in case it's relevant.
Other things that may be relevant:
Python 3 is running on one EC2 instance and the Postgres database on another. The table in particular is a TimescaleDB hypertable taking real-time data, hence the need for very fast writes, but that's probably not relevant.
I'm currently using pg8000 as the dialect, for no particular reason other than that psycopg2 was throwing errors when trying to execute an equivalent method using the ORM.
Just so this question is answered in case anyone else ends up here:
The issue was a failure to call commit as a method, as @snakecharmerb pointed out. Gord Thompson also provided an alternative using 'begin', which commits automatically, rather than connect, which gives a 'commit as you go' style transaction.
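A minimal sketch of both approaches, assuming SQLAlchemy 1.4+ in 2.0 style; the connection URL and table/column names are placeholders:
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+pg8000://user:password@db-host/dbname")  # placeholder URL
insert_statement = text("INSERT INTO my_table (col_a, col_b) VALUES (:a, :b)")

# 'commit as you go': commit must be called as a method, i.e. commit(), not commit
with engine.connect() as connection:
    connection.execute(insert_statement, {"a": 1, "b": "x"})
    connection.commit()

# 'begin once': the block opens a transaction and commits it automatically on success
with engine.begin() as connection:
    connection.execute(insert_statement, {"a": 2, "b": "y"})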
I'm running a Python db migration script (Flask-Migrate) and have added the alembic.ddl.impl import of DefaultImpl to get around the first set of errors, but now I'm getting the following. I'm trying to use this script to set up my tables and database in Snowflake. What am I missing? Everything else seems to be working, and I can't find any help on this particular error in the Snowflake documentation. I would have assumed that the Snowflake SQLAlchemy connector would handle the creation of a unique index.
The script so far does create several of the tables, but when it gets to this part it throws the error.
> sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 001003 (42000):
> SQL compilation error: syntax error line 1 at position 7 unexpected 'UNIQUE'.
> [SQL: CREATE UNIQUE INDEX ix_flicket_users_token ON flicket_users (token)]
> (Background on this error at: http://sqlalche.me/e/f405)
Snowflake does not have INDEX objects, so any CREATE ... INDEX statement will fail.
With Snowflake, you have to trust the database to organize your data with micro partitions and build a good access plan for your queries.
You will feel uneasy at first, but eventually stop worrying.
Bleeding edge solutions will require monitoring/tuning performance using the query log, however.
Nothing new here.
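If you just need the migration to run, one way past the failing statement is to hand-edit the generated revision and drop (or comment out) the index creation. A sketch, assuming a typical Alembic revision file; the column definitions are guesses based on the error message:
# versions/xxxx_create_tables.py (abbreviated; revision identifiers omitted)
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.create_table(
        "flicket_users",
        sa.Column("id", sa.Integer(), primary_key=True),
        sa.Column("token", sa.String(length=100), nullable=True),
        # ... remaining columns ...
    )
    # Snowflake has no index objects, so skip the generated CREATE UNIQUE INDEX:
    # op.create_index("ix_flicket_users_token", "flicket_users", ["token"], unique=True)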
I have a pandas DataFrame in python and want this DataFrame directly to be written into a Netezza Database.
I would like to use the pandas.to_sql() method that is described here, but it seems that this method needs SQLAlchemy to connect to the database.
The Problem: SQLAlchemy does not support Netezza.
What I am using at the moment to connect to the database is pyodbc. But this, on the other hand, is not understood by pandas.to_sql(), or am I wrong about this?
My workaround to this is to write the DataFrame into a csv file via pandas.to_csv() and send this to the Netezza Database via pyodbc.
Since I have big data, writing the csv first is a performance issue. I actually do not care if I have to use SQLAlchemy or pyodbc or something different but I cannot change the fact that I have a Netezza Database.
I am aware of the deontologician project, but as the author himself states, it "is far from complete, has a lot of bugs".
I got the package to work (see my solution below). But if someone knows a better solution, please let me know!
I figured it out. For my solution, see the accepted answer.
Solution
I found a solution that I want to share with everyone who has the same problem.
I tried the netezza dialect from deontologician, but it does not work with Python 3, so I made a fork and corrected some encoding issues. I uploaded it to GitHub and it is available here. Be aware that I just made some small changes, that it is mostly the work of deontologician, and that nobody is maintaining it.
With the netezza dialect in place, I got pandas.to_sql() to work directly with the Netezza database:
import netezza_dialect
from sqlalchemy import create_engine
engine = create_engine("netezza://ODBCDataSourceName")
df.to_sql("YourDatabase",
engine,
if_exists='append',
index=False,
dtype=your_dtypes,
chunksize=1600,
method='multi')
A little explanation of the to_sql() parameters:
It is essential that you use the method='multi' parameter if you do not want pandas to take forever to write to the database, because without it it sends one INSERT query per row. You can use 'multi' or you can define your own insertion method (a short sketch of such a callable follows below). Be aware that you need at least pandas v0.24.0 to use it. See the docs for more info.
When using method='multi' it can happen (it happened at least to me) that you exceed the parameter limit. In my case it was 1600, so I had to add chunksize=1600 to avoid this.
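For reference, a minimal sketch of what such a custom insertion method can look like; the callable signature is the one documented by pandas, while the qmark placeholders and table name are illustrative assumptions for a pyodbc-backed connection:
def insert_with_executemany(table, conn, keys, data_iter):
    # custom to_sql insertion method: one executemany() call per chunk of rows
    dbapi_conn = conn.connection  # the raw DBAPI connection underneath SQLAlchemy
    columns = ", ".join(keys)
    placeholders = ", ".join(["?"] * len(keys))  # qmark style, as used by pyodbc
    sql = "INSERT INTO {} ({}) VALUES ({})".format(table.name, columns, placeholders)
    cursor = dbapi_conn.cursor()
    cursor.executemany(sql, list(data_iter))
    cursor.close()

df.to_sql("YourTable", engine, index=False, method=insert_with_executemany)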
Note
If you get a warning or error like the following:
C:\Users\USER\anaconda3\envs\myenv\lib\site-packages\sqlalchemy\connectors\pyodbc.py:79: SAWarning: No driver name specified; this is expected by PyODBC when using DSN-less connections
"No driver name specified; "
pyodbc.InterfaceError: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Then you probably tried to connect to the database via
engine = create_engine("netezza://usr:pass@address:port/database_name")
You have to set up the database in the ODBC Data Source Administrator tool in Windows and then use the name you defined there:
engine = create_engine("netezza://ODBCDataSourceName")
Then it should have no problem finding the driver.
I know you already answered the question yourself (thanks for sharing the solution).
One general comment about large data writes to Netezza:
I'd always choose to write the data to a file and then use the external table/ODBC interface to insert it. Instead of inserting 1600 rows at a time, you can probably insert millions of rows in the same timeframe.
We use UTF-8 data in the flat/CSV file, unless you want to load binary data, which will probably require fixed-width files.
I'm not Python savvy, but I hope you can follow me ...
If you need a documentation link, you can start here: https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_syntax.html
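Following that suggestion, here is a rough Python sketch of the flat-file plus transient external table route; the table name, file path and the exact external-table options are assumptions and should be checked against the documentation linked above:
import pyodbc

# 1. dump the DataFrame (the df from the question) to a UTF-8 flat file
df.to_csv("C:/temp/my_table_load.csv", index=False, encoding="utf-8")

# 2. let Netezza bulk-load the file through a transient external table
conn = pyodbc.connect("DSN=ODBCDataSourceName")
cursor = conn.cursor()
cursor.execute("""
    INSERT INTO my_table
    SELECT * FROM EXTERNAL 'C:/temp/my_table_load.csv'
    USING (
        DELIMITER ','
        REMOTESOURCE 'ODBC'  -- the file sits on the client machine, not on the Netezza host
        SKIPROWS 1           -- skip the CSV header row
        MAXERRORS 0
    )
""")
conn.commit()
cursor.close()
conn.close()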
I am trying to fetch CLOB data from an Oracle server, and the connection is made through an SSH tunnel.
When I tried to run the following code:
(id,clob) = cursor.fetchone()
print('one fetched')
clob_data = clob.read()
print(clob_data)
the execution freezes.
Can someone help me figure out what's wrong here? I have referred to the cx_Oracle docs and the example code is just the same.
It is possible that there is a round trip taking place that is not being handled properly by the cx_Oracle driver. Please create an issue here (https://github.com/oracle/python-cx_Oracle/issues) with a few more details such as platform, Python version, Oracle database/client version, etc.
You can probably work around the issue, however, by simply returning the CLOBs as strings as can be seen in this sample: https://github.com/oracle/python-cx_Oracle/blob/master/samples/ReturnLobsAsStrings.py.
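For reference, a minimal sketch of that approach (an output type handler that fetches CLOBs as strings, following the linked sample); the connection details, table and column names are placeholders:
import cx_Oracle

def output_type_handler(cursor, name, default_type, size, precision, scale):
    # fetch CLOBs/BLOBs as plain strings/bytes instead of LOB locators
    if default_type == cx_Oracle.CLOB:
        return cursor.var(cx_Oracle.LONG_STRING, arraysize=cursor.arraysize)
    if default_type == cx_Oracle.BLOB:
        return cursor.var(cx_Oracle.LONG_BINARY, arraysize=cursor.arraysize)

# placeholder DSN: point it at the locally forwarded port of the SSH tunnel
connection = cx_Oracle.connect("user", "password", "localhost:1521/service_name")
connection.outputtypehandler = output_type_handler

cursor = connection.cursor()
cursor.execute("SELECT id, clob_col FROM my_table WHERE id = :id", id=1)
row_id, clob_data = cursor.fetchone()  # clob_data is already a str, no .read() needed
print(clob_data)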