I'm trying to insert some data into a DB2 table on an IBM iSeries (AS400) server, using the sqlalchemy, pyodbc, and iaccess packages.
The server allows me to run SELECT and CREATE queries, but when I try to insert rows I get the following error:
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('HY000', "[REDACTED]
SQL7008 - TABLE in DATABASE not valid for operations. (-7008)
(SQLExecDirectW)")
I'm executing the following query:
INSERT INTO database.table VALUES ('A', 'B', 'C')
I know the query works because I am able to run it using the same credentials from Aqua Data Studio, a DB management IDE.
I'm using the following python code to connect to the db:
from sqlalchemy import create_engine
import pandas as pd
engine_statement = f"iaccess+pyodbc://{user}:{pwd}#{server}/{schema_name}?DRIVER={driver}"
connection = create_engine(engine_statement)
I tried using ibmi instead of iaccess+pyodbc but nothing changes.
The closest question I found asks the same thing, but using Java.
I tried implementing the answer there in Python by setting the isolation_level option to all possible values, but still nothing changes (a sketch of one of those attempts is shown below).
I'm not 100% sure how journaling works, and therefore how to use it, so I was not able to implement point 2 of the answer.
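For reference, this is roughly what one of those attempts looked like. It is only a sketch: the isolation_level value is one of the settings I cycled through, and the CommitMode=0 ODBC keyword (commit immediate, i.e. no commitment control) is an assumption on my part, not a confirmed fix for SQL7008.
```python
from sqlalchemy import create_engine, text

# Sketch of one attempt: run the engine in autocommit mode so that the
# INSERT does not require commitment control (and hence journaling).
# CommitMode=0 is an IBM i Access ODBC keyword; whether it helps here
# is an assumption, not something I have verified.
engine_statement = (
    f"iaccess+pyodbc://{user}:{pwd}@{server}/{schema_name}"
    f"?DRIVER={driver}&CommitMode=0"
)
engine = create_engine(engine_statement, isolation_level="AUTOCOMMIT")

with engine.connect() as conn:
    conn.execute(text("INSERT INTO database.table VALUES ('A', 'B', 'C')"))
```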
In case it helps: I am able to create new tables, but not write to them, which seems surprising, but I'm no SQL expert, so I guess I'm missing something.
Related
I have a problem setting which involves data manipulation on an IBM i (AS/400) database. I'm trying to solve the problem with the help of Python and pandas.
For the last few days I have been trying to set up a proper connection to the AS/400 DB with every combination of package, driver, and dialect I could find on SO or Google.
None of the solutions is fully working for me. Some work better than others, while some are not working at all.
Here's the current situation:
I'm able to read and write data through pyodbc. The connection string I'm using is the following:
cstring = urllib.parse.quote("DRIVER={IBM i Access ODBC Driver};SYSTEM=IP;UID=XXX;PWD=YYY;PORT=21;CommitMode=0;SIGNON=4;CCSID=1208;TRANSLATE=1;")
Then I establish the connection like so:
connection = pypyodbc.connect(cstring)
With connection I can read and write data from/to the as400 db through raw SQL statements:
connection.execute("""CREATE TABLE WWNMOD5.temp(
    store_id INT GENERATED BY DEFAULT AS IDENTITY NOT NULL,
    store_name VARCHAR(150),
    PRIMARY KEY (store_id)
)""")
This is, of course, a meaningless example. My goal would be to write a pandas DataFrame to the as400 by using
df.to_sql()
But when trying to do something like this:
df.to_sql('temp', connection, schema='WWNMOD5', chunksize=1000, if_exists='append', index=False)
I get this error:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42000', '[42000] [IBM][System i Access ODBC Driver][DB2 for i5/OS]SQL0104 - Token ; not valid. Valid tokens: <END OF STATEMENT>.')
Meaning that an invalid token was used, which in this case I believe is the ';' at the end of the SQL statement.
I believe that pandas isn't compatible with a raw pyodbc connection (to_sql() officially supports only SQLAlchemy connectables or a raw sqlite3 connection, which would explain the sqlite_master query above). Therefore, I was also trying to work with the DB over sqlalchemy.
With sqlalchemy, I establish the connection like so:
engine = sa.create_engine("iaccess+pyodbc:///?odbc_connect=%s" % cstring)
I also tried to use ibm_db_sa instead of iaccess but the result is always the same.
If I do the same as above with sqlalchemy, that is:
df.to_sql('temp', engine, schema='WWNMOD5', chunksize=1000, if_exists='append', index=False)
I don't get any error message, but the table is not created either, and I don't know why.
Is there a way to get this working? All the SO threads only suggest solutions for establishing a connection and reading data from AS/400 databases, but they don't cover writing data back to the AS/400 DB via Python.
It looks like you are using the wrong driver. Pandas claims support for any DB supported by SQLAlchemy, but in order for SQLAlchemy to use DB2, it needs a third-party extension.
This is the one recommended by SQLAlchemy: https://pypi.org/project/ibm-db-sa/
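As a rough sketch of how that could be wired up (assuming the ibm-db-sa package is installed; the host, port, database name, and sample DataFrame below are placeholders, and the exact URL for an IBM i system may differ):
```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder credentials and connection details; ibm_db_sa registers the
# "db2+ibm_db" dialect name with SQLAlchemy.
engine = create_engine("db2+ibm_db://XXX:YYY@IP:446/DBNAME")

# Small example DataFrame standing in for the real data.
df = pd.DataFrame({"store_id": [1], "store_name": ["example"]})

df.to_sql('temp', engine, schema='WWNMOD5',
          chunksize=1000, if_exists='append', index=False)
```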
Now I am doing something freaky here... I want to ingest data from a pandas DataFrame into an in-memory OLTP (memory-optimized) database on Microsoft SQL Server 2019. The table does not exist yet, and I want to create it on the fly based on the pandas DataFrame.
For that, I modify the SQL CREATE statement that pandas generates before it inserts data, so that I actually create the table as a memory-optimized table. The CREATE statement works fine when used directly in Microsoft SQL Server Management Studio.
However, when I use SQLAlchemy to run the create statement from within my Python script, I receive the following error message:
DDL statements ALTER, DROP and CREATE inside user transactions are not supported with memory optimized tables.
What does this mean? What is a user transaction? What could I try to make this work?
Thanks
I found out that the cause was the autocommit flag being set to False by default. After setting it to True, everything works as expected.
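For anyone hitting the same message, a minimal sketch of how the autocommit setting can be applied with SQLAlchemy and pyodbc. The connection string, table, and DDL below are placeholders standing in for the modified CREATE statement described in the question:
```python
from sqlalchemy import create_engine, text

# Placeholder DSN-based connection string; adjust to your server.
# isolation_level="AUTOCOMMIT" makes each statement commit on its own,
# so the CREATE TABLE does not run inside a user transaction.
engine = create_engine("mssql+pyodbc://user:pass@MyDsn",
                       isolation_level="AUTOCOMMIT")

ddl = """
CREATE TABLE dbo.my_table (
    id INT NOT NULL PRIMARY KEY NONCLUSTERED
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
"""

with engine.connect() as conn:
    conn.execute(text(ddl))
```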
I am using Sybase as the main transactional DB and SQLite as the in-memory DB for integration tests.
The issue I am facing is the conflicting behavior of the two implementations.
I need to execute a query similar to
select dbo.get_name(id), id from some_table
This runs perfectly fine in Sybase (I understand the importance of the schema prefix for user-defined functions). However, SQLite throws an OperationalError: near "(".
I tried adding dbo as a schema while creating the SQLite connections, but no luck.
I am using Python for the entire implementation.
You could make dbo a group and make all your users members of that group. Then you could avoid using the schema prefix at all.
Or, you could have a SQLite database named dbo where you put the function get_name.
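If dropping the schema prefix in the test queries is acceptable, another option is to register get_name directly on the SQLite connection, since SQLite user-defined functions live on the connection rather than in a schema. A rough sketch (the function body is just a placeholder for whatever dbo.get_name does in Sybase):
```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Placeholder implementation of get_name for the integration tests.
def get_name(some_id):
    return f"name_{some_id}"

# UDFs in SQLite are registered per connection, not stored in the database.
conn.create_function("get_name", 1, get_name)

conn.execute("CREATE TABLE some_table (id INTEGER)")
conn.execute("INSERT INTO some_table (id) VALUES (1)")
print(conn.execute("SELECT get_name(id), id FROM some_table").fetchall())
```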
I have a pandas DataFrame in Python and want this DataFrame to be written directly into a Netezza database.
I would like to use the pandas.to_sql() method that is described here, but it seems that this method requires SQLAlchemy to connect to the database.
The Problem: SQLAlchemy does not support Netezza.
What I am using at the moment to connect to the database is pyodbc. But this, on the other hand, is not understood by pandas.to_sql(), or am I wrong about this?
My workaround is to write the DataFrame into a CSV file via pandas.to_csv() and send it to the Netezza database via pyodbc.
Since I have big data, writing the CSV first is a performance issue. I actually do not care whether I have to use SQLAlchemy or pyodbc or something different, but I cannot change the fact that I have a Netezza database.
I am aware of the deontologician project, but as the author himself states, it "is far from complete, has a lot of bugs".
I got the package to work (see my solution below). But if someone knows a better solution, please let me know!
I figured it out. For my solution, see the accepted answer.
Solution
I found a solution that I want to share with everyone who has the same problem.
I tried the netezza dialect from deontologician, but it does not work with Python 3, so I made a fork and corrected some encoding issues. I uploaded it to GitHub and it is available here. Be aware that I just made some small changes, that it is mostly the work of deontologician, and that nobody is maintaining it.
With the netezza dialect, I got pandas.to_sql() to work directly with the Netezza database:
import netezza_dialect
from sqlalchemy import create_engine
engine = create_engine("netezza://ODBCDataSourceName")
df.to_sql("YourDatabase",
engine,
if_exists='append',
index=False,
dtype=your_dtypes,
chunksize=1600,
method='multi')
A little explanation of the to_sql() parameters:
It is essential that you use the method='multi' parameter if you do not want pandas to take forever to write to the database, because without it it would send one INSERT query per row. You can use 'multi' or define your own insertion method. Be aware that you need at least pandas v0.24.0 to use it. See the docs for more info.
When using method='multi' it can happen (it happened to me, at least) that you exceed the parameter limit. In my case it was 1600, so I had to add chunksize=1600 to avoid this.
Note
If you get a warning or error like the following:
C:\Users\USER\anaconda3\envs\myenv\lib\site-packages\sqlalchemy\connectors\pyodbc.py:79: SAWarning: No driver name specified; this is expected by PyODBC when using DSN-less connections
"No driver name specified; "
pyodbc.InterfaceError: ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Then you probably tried to connect to the database via
engine = create_engine("netezza://usr:pass@address:port/database_name")
You have to set up the database in the ODBC Data Source Administrator tool in Windows and then use the name you defined there:
engine = create_engine("netezza://ODBCDataSourceName")
Then it should have no problem finding the driver.
I know you already answered the question yourself (thanks for sharing the solution).
One general comment about large data-writes to Netezza:
I’d always choose to write data to a file and then use the external table/ODBC interface to insert the data. Instead of inserting 1600 rows at a time, you can probably insert millions of rows in the same timeframe.
We use UTF-8 data in the flat file, in CSV format, unless you want to load binary data, which will probably require fixed-width files.
I'm not Python savvy, but I hope you can follow me...
If you need a documentation link, you can start here: https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.load.doc/c_load_create_external_tbl_syntax.html
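A rough sketch of what that approach could look like from Python, assuming a Windows client with an NZSQL ODBC DSN. The paths, schema, table name, and external-table options are illustrative; check the linked documentation for the exact options your Netezza version supports:
```python
import pandas as pd
import pyodbc

# Illustrative DSN and credentials.
conn = pyodbc.connect("DSN=NZSQL;UID=usr;PWD=pass")

# Small example DataFrame standing in for the real data;
# write it out as a UTF-8 flat file first.
df = pd.DataFrame({"store_id": [1], "store_name": ["example"]})
df.to_csv(r"C:\temp\temp.csv", index=False, header=False, encoding="utf-8")

# Load the flat file through a transient external table.
# REMOTESOURCE 'ODBC' tells Netezza the file lives on the client machine.
conn.execute(r"""
    INSERT INTO your_schema.your_table
    SELECT * FROM EXTERNAL 'C:\temp\temp.csv'
    USING (DELIMITER ',' REMOTESOURCE 'ODBC')
""")
conn.commit()
```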
I want to come up with a minimal set of queries/LOC that extracts the table metadata within a database, on as many versions of the database as possible. I'm using PostgreSQL. I'm trying to do this using Python, but I have no clue how, as I'm a Python newbie.
I appreciate your ideas/suggestions on this issue.
You can ask your database driver, in this case psycopg2, to return some metadata about a database connection you've established. You can also ask the database directly about some of its capabilities, or schemas, but this is highly dependent on the version of the database you're connecting to, as well as the type of database.
Here's an example taken from http://bytes.com/topic/python/answers/438133-find-out-schema-psycopg for PostgreSQL:
>>> import psycopg2 as db
>>> conn = db.connect('dbname=billings user=steve password=xxxxx port=5432')
>>> curs = conn.cursor()
>>> curs.execute("""select table_name from information_schema.tables WHERE table_schema='public' AND table_type='BASE TABLE'""")
>>> curs.fetchall()
[('contacts',), ('invoicing',), ('lines',), ('task',), ('products',), ('project',)]
However, you probably would be better served using an ORM like SQLAlchemy. This will create an engine which you can query about the database you're connected to, as well as normalize how you connect to varying database types.
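For example, a minimal sketch using SQLAlchemy's inspection API (the connection URL and credentials are placeholders, reusing psycopg2 as the driver):
```python
from sqlalchemy import create_engine, inspect

# Placeholder credentials; adjust to your database.
engine = create_engine("postgresql+psycopg2://steve:xxxxx@localhost:5432/billings")

inspector = inspect(engine)
print(inspector.get_table_names(schema="public"))          # list tables
print(inspector.get_columns("contacts", schema="public"))  # column metadata
```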
If you need help with SQLAlchemy, post another question here! There's TONS of information already available by searching the site.