The approach I am trying is to write a dynamic script that generates mirror tables in SQL Server with data types similar to the Oracle originals, and then a second dynamic script to insert the records into SQL Server. The challenge I see is incompatible data types. Has anyone come across a similar situation? I am a SQL developer, but I can learn Python if someone can share similar work.
Have you tried the "SQL Server Import and Export Wizard" in SSMS?
i.e. if you create an empty SQL Server database and right-click on it in SSMS, one of the "Tasks" menu options is "Import Data...", which starts the "SQL Server Import and Export Wizard". This builds a one-off SSIS package, which can be saved if you want to re-use it.
There is a data source option for "Microsoft OLE DB Provider for Oracle".
You might have a better Oracle OLE DB Provider available also to try.
This will require Oracle client software to be available.
I haven't actually tried this (Oracle to SQL Server), so I'm not sure whether it is a reasonable approach or not.
How many tables, columns?
The Oracle DB may also have views, triggers, constraints, indexes, functions, packages, sequence generators, and synonyms.
I used a linked server and got all the table metadata from dba_tab_columns in Oracle. I wrote a script to create the tables based on that metadata, and used an SSIS script task to save the CREATE TABLE script for source control. Then I wrote a SQL script to insert the data from Oracle, handling the type differences in the script.
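For illustration, here is a minimal Python sketch of the same idea: read the column metadata from dba_tab_columns and emit SQL Server CREATE TABLE statements. The connection details, the schema name, and the small type map are assumptions for the sketch; the poster above did this in T-SQL/SSIS rather than Python.

# Hypothetical sketch: generate SQL Server DDL from Oracle metadata.
# The credentials, schema ('MYSCHEMA'), and TYPE_MAP are illustrative only.
import cx_Oracle

TYPE_MAP = {
    'VARCHAR2': 'varchar',       # length appended below
    'CHAR': 'char',
    'NUMBER': 'numeric(38, 10)',
    'DATE': 'datetime2',
    'CLOB': 'varchar(max)',
}

ora = cx_Oracle.connect('user/password@orahost:1521/orcl')
cur = ora.cursor()
cur.execute("""
    select table_name, column_name, data_type, data_length
      from dba_tab_columns
     where owner = 'MYSCHEMA'
     order by table_name, column_id
""")

tables = {}
for table_name, column_name, data_type, data_length in cur:
    sql_type = TYPE_MAP.get(data_type, 'varchar(255)')
    if sql_type in ('varchar', 'char'):
        sql_type = '%s(%d)' % (sql_type, data_length)
    tables.setdefault(table_name, []).append('    [%s] %s' % (column_name, sql_type))

for table_name, columns in sorted(tables.items()):
    print 'CREATE TABLE [%s] (\n%s\n);' % (table_name, ',\n'.join(columns))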
I'm trying to talk to an AS400 in Python. The goal is to use SQLAlchemy, but when I couldn't get that to work I stepped back to a more basic script using just ibm_db instead of ibm_db_sa.
import ibm_db
dbConnection = ibm_db.pconnect("DATABASE=myLibrary;HOSTNAME=1.2.3.4;PORT=8471;PROTOCOL=TCPIP;UID=username;PWD=password", "", "") #this line is where it hangs
print ibm_db.conn_errormsg()
The problem seems to be the port. If I use port 50000, which I see in all the examples, I get an error. If I use 446, I get an error. The baffling part is this: if I use 8471, which IBM says to use, I get no error, no timeout, no response whatsoever. I've left the script running for over twenty minutes, and it just sits there, doing nothing. It's active, because I can't use the command prompt at all, but it never gives me any feedback of any kind.
This same 400 is used by the company I work for every day, for logging, emailing, and (a great deal of) database usage, so I know it works. The software we use, which talks to the database behind the scenes, runs just fine on my machine. That tells me my driver is good, the network settings are right, and so on. I can even telnet into the 400 from here.
I'm on the SQLAlchemy and ibm_db email lists, and have been communicating with them for days about this problem. I've also googled it so much I'm starting to run out of un-visited links in my search results. No one seems to have the problem of the connection hanging indefinitely. If there's anything I can try in Python, I'll try it. I don't deal with the 400 directly, but I can ask the guy who does to check/configure whatever I need to. As I said though, several workstations can talk to the 400's database with no problems, and queries run against the library I want to access work fine, if run from the 400 itself. If anyone has any suggestions, I'd greatly appreciate hearing them. Thanks!
The README for ibm_db_sa only lists DB2 for Linux/Unix/Windows in the "Supported Database" section. So it most likely doesn't work for DB2 for i, at least not right out of the box.
Since you've stated you have IBM System i Access for Windows, I strongly recommend just using one of the drivers that comes with it (ODBC, OLEDB, or ADO.NET, as @Charles mentioned).
Personally, I always use ODBC, with either pyodbc or pypyodbc. Either one works fine. A simple example:
import pyodbc
connection = pyodbc.connect(
    driver='{iSeries Access ODBC Driver}',
    system='11.22.33.44',
    uid='username',
    pwd='password')
c1 = connection.cursor()
c1.execute('select * from qsys2.sysschemas')
for row in c1:
    print row
Now, one of SQLAlchemy's connection methods is pyodbc, so I would think that if you can establish a connection using pyodbc directly, you can somehow configure SQLAlchemy to do the same. But I'm not an SQLAlchemy user myself, so I don't have example code for that.
UPDATE
I managed to get SQLAlchemy to connect to our IBM i and execute straight SQL queries; in other words, to roughly the same level of functionality as using PyODBC directly. I haven't tested any other SQLAlchemy features. Here is what I did to set up the connection on my Windows 7 machine:
Install ibm_db_sa as an SQLAlchemy dialect
You may be able to use pip for this, but I did it the low-tech way:
Download ibm_db_sa from PyPI.
As of this writing, the latest version is 0.3.2, uploaded on 2014-10-20. It's conceivable that later versions will either be fixed or broken in different ways (so in the future, the modifications I'm about to describe might be unnecessary, or they might not work).
Unpack the archive (ibm_db_sa-0.3.2.tar.gz) and copy the enclosed ibm_db_sa directory into the sqlalchemy\dialects directory.
Modify sqlalchemy\dialects\ibm_db_sa\pyodbc.py
Add the initialize() method to the AS400Dialect_pyodbc class
The point of this is to override the method of the same name in DB2Dialect, which AS400Dialect_pyodbc inherits from. The problem is that DB2Dialect.initialize() tries to set attributes dbms_ver and dbms_name, neither of which is available or relevant when connecting to IBM i using PyODBC (as far as I can tell).
Add the module-level name dialect and set it to the AS400Dialect_pyodbc class
Code for the above modifications should go at the end of the file, and look like this:
    def initialize(self, connection):
        super(DB2Dialect, self).initialize(connection)

dialect = AS400Dialect_pyodbc
Note the indentation! Remember, the initialize() method needs to belong to the AS400Dialect_pyodbc class, and dialect needs to be global to the module.
Finally, you need to give the engine creator the right URL:
'ibm_db_sa+pyodbc://username:password@host/*local'
(Obviously, substitute valid values for username, password, and host.)
That's it. At this point, you should be able to create the engine, connect to the i, and execute plain SQL through SQLAlchemy. I would think a lot of the ORM stuff should also work at this point, but I have not verified this.
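A minimal sketch of that final step (with placeholder credentials and host, not taken from the answer above) might look like this:

from sqlalchemy import create_engine

# Placeholder credentials and host; substitute your own values.
engine = create_engine('ibm_db_sa+pyodbc://username:password@host/*local')

connection = engine.connect()
result = connection.execute('select * from qsys2.sysschemas')
for row in result:
    print row
connection.close()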
The way to find out what port is needed is to look at the service table entries on the IBM i.
Your IBM i guy can use the iNav GUI or the green screen Work with Service Table Entry (WRKSRVTBLE) command
Should get a screen like so:
Service            Port    Protocol
as-admin-http      2001    tcp
as-admin-http      2001    udp
as-admin-https     2010    tcp
as-admin-https     2010    udp
as-central         8470    tcp
as-central-s       9470    tcp
as-database        8471    tcp
as-database-s      9471    tcp
drda                446    tcp
drda                446    udp
The default port for the DB is indeed 8471, though drda (port 446) is used for "distributed db" operations.
Based upon this thread, to use ibm_db to connect to DB2 on an IBM i, you need the IBM DB2 Connect product, which is a commercial package that has to be paid for.
This thread suggests using ODBC via the pyodbc module. It also suggests that JDBC via the JT400 toolkit may work as well.
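For the JDBC route, a hedged sketch using the jaydebeapi package with the JT400 (JTOpen) driver jar might look like the following; the host, credentials, and jar path are placeholders, and I have not verified this against an IBM i:

import jaydebeapi

# Placeholder host, credentials, and path to the JTOpen jar.
conn = jaydebeapi.connect(
    'com.ibm.as400.access.AS400JDBCDriver',
    'jdbc:as400://1.2.3.4',
    ['username', 'password'],
    'C:/jt400/jt400.jar')

cursor = conn.cursor()
cursor.execute('select * from qsys2.sysschemas')
for row in cursor.fetchall():
    print row
conn.close()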
Here is an example that works with AS400, SQLAlchemy, and pandas.
This example takes a bunch of CSV files and inserts them with pandas/SQLAlchemy.
It only works on Windows; on Linux the iSeries ODBC driver segfaults (CentOS 7 and Debian 9, x86_64).
Client is Windows 10.
My AS400 version is 7.3.
Python is 2.7.14.
Installed with pip: pandas, pyodbc, ibm_db_sa, sqlalchemy.
You need to install IBM i Access for Windows from ftp://public.dhe.ibm.com/as400/products/clientaccess/win32/v7r1m0/servicepack/si66062/
Additionally, apply the modifications by @John Y to pyodbc.py:
C:\Python27\Lib\site-packages\sqlalchemy\dialects\ibm_db_sa\pyodbc.py
Change line 99 to
pyodbc_driver_name = "IBM i Access ODBC Driver"
The ODBC driver changed its name.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import glob

# Collect the CSV files to load.
csvfiles = glob.glob("c:/Users/nahum/Documents/OUT/*.csv")
df_csvfiles = pd.DataFrame(csvfiles)

# Create the engine once, outside the loop; note the '@' between the
# credentials and the server.
engine = create_engine('ibm_db_sa+pyodbc://DB2_USER:PASSWORD@IP_SERVER/*local')

for index, row in df_csvfiles.iterrows():
    datastore2 = pd.read_csv(str(row[0]), delimiter=',', header=[0], skipfooter=3)
    datastore2.to_sql('table', engine, schema='SCHEMA', chunksize=1000, if_exists='append', index=False)
Hope it helps.
If you don't need Pandas/SQLAlchemy, just use pyodbc as suggested in John Y's answer. Otherwise, you can try doing what worked for me, below. It's taken from my answer to my own, similar question, which you can check out for more detail on what doesn't work (I tried and failed in so many ways before getting it working).
I created a blank file in my project to appease this message that I was receiving:
Unable to open 'hashtable_class_helper.pxi': File not found
(file:///c:/git/dashboards/pandas/_libs/hashtable_class_helper.pxi).
(My project folder is C:/Git/dashboards, so I created the rest of the path.)
With that file present, the code below now works for me. For the record, it seems to work regardless of whether the ibm_db_sa module is modified as suggested in John Y's answer, so I would recommend leaving that module alone. Note that although they aren't imported directly, you need these modules installed: pyodbc, ibm_db_sa, and possibly future (if using Python 2; I forget whether it's necessary). If you are using Python 3, you'll need urllib.parse instead of urllib. I also have i Access 7.1 drivers installed on my computer, which probably came into play.
import urllib
import pandas as pd
from sqlalchemy import create_engine
CONNECTION_STRING = (
    "driver={iSeries Access ODBC Driver};"
    "system=ip_address;"
    "database=database_name;"
    "uid=username;"
    "pwd=password;"
)
SQL = "SELECT..."
quoted = urllib.quote_plus(CONNECTION_STRING)
engine = create_engine('ibm_db_sa+pyodbc:///?odbc_connect={}'.format(quoted))
df = pd.read_sql_query(
    SQL,
    engine,
    index_col='some column'
)
print df
Our infrastructure group has asked us to "add MultiSubnetFailover=True to all application connection strings" so that we can take advantage of a new SQL Server HA setup involving Availability Groups.
I am stuck, though, since we have some Python programs that connect (read+write) to the database via SQLAlchemy. I have been searching and I don't see anything about this MultiSubnetFailover feature being available as an option in SQLAlchemy or any other Python driver. Is it possible to connect to an HA setup using SQLAlchemy, or even Python, and if so, how?
FYI - The link that my infrastructure guy pointed me to is here (http://msdn.microsoft.com/en-us/library/hh205662%28v=vs.110%29.aspx), and as you can see it is specifically about how .NET applications can utilize the "MultiSubnetFailover=True" setting in the connection string among other things.
http://docs.sqlalchemy.org/en/latest/dialects/mssql.html#dialect-mssql-pyodbc-connect
You could use the example towards the end of that section of the documentation, like this:
import urllib
from sqlalchemy import create_engine
connection_string = '127.0.0.1;Database=MyDatabase;MultiSubnetFailover=True'
engine_string = 'mssql+pyodbc:///?odbc_connect={}'.format(urllib.quote_plus(connection_string))
engine = create_engine(engine_string)
Update from comments
For newer versions of Microsoft ODBC Driver for SQL Server, you may need to use MultiSubnetFailover=Yes instead of True
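As a rough sketch of that variant (the driver name below is an assumption; use whichever Microsoft ODBC Driver for SQL Server is installed on your machine):

import urllib
from sqlalchemy import create_engine

# Assumed driver name; adjust to match the ODBC driver installed locally.
connection_string = (
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=127.0.0.1;'
    'DATABASE=MyDatabase;'
    'MultiSubnetFailover=Yes'
)
engine = create_engine(
    'mssql+pyodbc:///?odbc_connect={}'.format(urllib.quote_plus(connection_string)))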
I've looked over Google Cloud SQL's documentation and various searches, but I can't find out whether it is possible to use SQLAlchemy with Google Cloud SQL, and if so, what the connection URI should be.
I'm looking to use the Flask-SQLAlchemy extension and need the connection string like so:
mysql://username:password@server/db
I saw the Django example, but it appears the configuration uses a different style than the connection string. https://developers.google.com/cloud-sql/docs/django
Google Cloud SQL documentation:
https://developers.google.com/cloud-sql/docs/developers_guide_python
Update
Google Cloud SQL now supports direct access, so the MySQLdb dialect can now be used. The recommended connection via the mysql dialect uses the URL format:
mysql+mysqldb://root@/<dbname>?unix_socket=/cloudsql/<projectid>:<instancename>
mysql+gaerdbms has been deprecated in SQLAlchemy since version 1.0
I'm leaving the original answer below in case others still find it helpful.
For those who visit this question later (and don't want to read through all the comments), SQLAlchemy now supports Google Cloud SQL as of version 0.7.8 using the connection string / dialect (see: docs):
mysql+gaerdbms:///<dbname>
E.g.:
create_engine('mysql+gaerdbms:///mydb', connect_args={"instance":"myinstance"})
I have proposed an update to the mysql+gaerdbms:// dialect to support both of Google's Cloud SQL APIs (rdbms_apiproxy and rdbms_googleapi) for connecting to Cloud SQL from a non-Google App Engine production instance (e.g. your development workstation). The change also modifies the connection string slightly by including the project and instance as part of the string, rather than requiring them to be passed separately via connect_args.
E.g.
mysql+gaerdbms:///<dbname>?instance=<project:instance>
This will also make it easier to use Cloud SQL with Flask-SQLAlchemy or other extensions where you don't explicitly make the create_engine() call.
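As a rough illustration (not part of the proposed change itself), a Flask-SQLAlchemy setup with that URL form might look like this; the database name and project:instance values are placeholders:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# Placeholder database name and project:instance pair.
app.config['SQLALCHEMY_DATABASE_URI'] = (
    'mysql+gaerdbms:///mydb?instance=myproject:myinstance')
db = SQLAlchemy(app)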
If you are having trouble connecting to Google Cloud SQL from your development workstation, you might want to take a look at my answer here - https://stackoverflow.com/a/14287158/191902.
Yes. If you find any bugs in SA+Cloud SQL, please let me know. I wrote the dialect code that was integrated into SQLAlchemy. There's a bit of silly business about how Cloud SQL bubbles up exceptions, so there might be some loose ends there.
For those who prefer PyMySQL over MySQLdb (which is suggested in the accepted answer), the SQLAlchemy connection strings are:
For Production
mysql+pymysql://<USER>:<PASSWORD>@/<DATABASE_NAME>?unix_socket=/cloudsql/<PUT-SQL-INSTANCE-CONNECTION-NAME-HERE>
Please make sure to:
Add the SQL instance to your app.yaml:
beta_settings:
  cloud_sql_instances: <PUT-SQL-INSTANCE-CONNECTION-NAME-HERE>
Enable the SQL Admin API, as it seems to be necessary:
https://console.developers.google.com/apis/api/sqladmin.googleapis.com/overview
For Local Development
mysql+pymysql://<USER>:<PASSWORD>@localhost:3306/<DATABASE_NAME>
given that you started the Cloud SQL Proxy with:
cloud_sql_proxy -instances=<PUT-SQL-INSTANCE-CONNECTION-NAME-HERE>=tcp:3306
It is doable, though I haven't used Flask at all, so I'm not sure about establishing the connection through that. I got it working through Pyramid and submitted a patch to SQLAlchemy (possibly to the wrong repo) here:
https://bitbucket.org/sqlalchemy/sqlalchemy/pull-request/2/added-a-dialect-for-google-app-engines
That has since been replaced and accepted into SQLAlchemy as
http://www.sqlalchemy.org/trac/ticket/2484
I don't think it has made its way into a release yet, though.
There are some issues with Google SQL throwing different exceptions, so we had problems with things like deploying a database automatically. You also need to disable connection pooling using NullPool, as mentioned in the second patch.
We've since moved to using the datastore through NDB, so I haven't followed the progress of these fixes for a while.
PostgreSQL, pg8000 and flask_sqlalchemy
Adding information in case someone is looking for how to use flask_sqlalchemy with PostgreSQL: using pg8000 as the driver, the working connection string is
postgres+pg8000://<db_user>:<db_pass>@/<db_name>
My job would be easier, or at least less tedious, if I could come up with an automated way (preferably a Python script) to extract useful information from a FileMaker Pro database. I am working on a Linux machine, and the FileMaker database is on the same LAN, running on an OS X machine. I can log into the webby interface from my machine.
I'm quite handy with SQL, and if somebody could point me to some FileMaker plug-in that could give me SQL access to the data within FileMaker, I would be pleased as punch. Everything I've found only goes the other way: Having FileMaker get data from SQL sources. Not useful.
It's not my first choice, but I'd use Perl instead of Python if there was a Perl-y solution at hand.
Note: XML/XSLT services (as suggested by some folks) are only available on FM Server, not FM Pro. Otherwise, that would probably be the best solution. ODBC is turning out to be extremely difficult to even get working. There is absolutely zero feedback from FM when you set it up so you have to dig through /var/log/system.log and parse obscure error messages.
Conclusion: I got it working by running a Python script locally, on the machine hosting the FM database, that queries it through the ODBC connection. The script is actually a TCPServer that accepts socket connections from other systems on the LAN, runs the queries, and returns the data through the socket connection. I had to do this to get around the fact that FM Pro only accepts ODBC connections locally (FM Server is required for external connections).
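To illustrate that conclusion, here is a minimal sketch of such a bridge (not the poster's actual script). It assumes Python 2 on the machine running FileMaker Pro, a local ODBC DSN named filemaker_dsn, and an arbitrary port 9000; one query is read per connection and rows are returned tab-separated:

import SocketServer   # 'socketserver' in Python 3
import pyodbc

class QueryHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        # Read one SQL statement per connection from the LAN client.
        sql = self.rfile.readline().strip()
        # Hypothetical DSN and credentials for the local FileMaker ODBC source.
        conn = pyodbc.connect('DSN=filemaker_dsn;UID=username;PWD=password')
        cursor = conn.cursor()
        cursor.execute(sql)
        for row in cursor:
            self.wfile.write('\t'.join(str(col) for col in row) + '\n')
        conn.close()

if __name__ == '__main__':
    # Listen on all interfaces so other machines on the LAN can connect.
    server = SocketServer.TCPServer(('0.0.0.0', 9000), QueryHandler)
    server.serve_forever()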
It has been a really long time since I did anything with FileMaker Pro, but I know that it does have capabilities for an ODBC (and JDBC) connection to be made to it (though I don't know how, or if, that translates to the Linux/Perl/Python world).
This article shows how to share/expose your FileMaker data via ODBC & JDBC:
Sharing FileMaker Pro data via ODBC or JDBC
From there, if you're able to create an ODBC/JDBC connection you could query out data as needed.
You'll need the FileMaker Pro installation CD to get the drivers. This document details the process for FMP 9 - it is similar for versions 7.x and 8.x as well. Versions 6.x and earlier are completely different and I wouldn't bother trying (xDBC support in those previous versions is "minimal" at best).
FMP 9 supports SQL-92 standard syntax (mostly). Note that rather than querying tables directly you query using the "table occurrence" name which serves as a table alias of sorts. If the data tables are stored in multiple files it is possible to create a single FMP file with table occurrences/aliases pointing to those data tables. There's an "undocumented feature" where such a file must have a table defined in it as well and that table "related" to any other table on the relationships graph (doesn't matter which one) for ODBC access to work. Otherwise your queries will always return no results.
The PDF document details all of the limitations of using the xDBC interface FMP provides. Performance of simple queries is reasonably fast (YMMV). I have found the performance of queries specifying the "LIKE" operator to be less than stellar.
FMP also has an XML/XSLT interface that you can use to query FMP data over an HTTP connection. It also provides a PHP class for accessing and using FMP data in web applications.
If your leaning is toward Python, you may be interested in checking out the Python Wrapper for FileMaker. It provides two-way access to the FileMaker data via FileMaker's built-in XML services. You can find some quite thorough information on this at:
http://code.google.com/p/pyfilemaker/