Using JPype - How can I access JDBC Meta Data Functions - python

I'm using JayDeBeAPI which uses JPype to load FileMaker's JDBC driver and pull data.
But I also want to be able to get a listing of all tables in the database.
In the JDBC documentation (page 55) it lists the following functions:
The JDBC client driver supports the following Meta Data functions:
getColumns
getColumnPrivileges
getMetaData
getTypeInfo
getTables
getTableTypes
Any ideas how I might call them from JPype or JayDeBeAPI?
If it helps, here's my current code:
import jaydebeapi
import jpype
jar = r'/opt/drivers/fmjdbc.jar'
args='-Djava.class.path=%s' % jar
jvm_path = jpype.getDefaultJVMPath()
jpype.startJVM(jvm_path, args)
conn = jaydebeapi.connect('com.filemaker.jdbc.Driver',
SETTINGS['SOURCE_URL'], SETTINGS['SOURCE_UID'], SETTINGS['SOURCE_PW'])
curs = conn.cursor()
#Sample Query:
curs.execute("select * from table")
result_rows = curs.fetchall()
Update:
Here's some progress and it seems like it should work, but I'm getting the error below. Any ideas?
> conn.jconn.metadata.getTables()
*** RuntimeError: No matching overloads found. at src/native/common/jp_method.cpp:121

Ok, thanks to eltabo and Juan Mellado I figured it out!
I just had to pass in the correct parameters to match the method signature.
Here's the working code:
import jaydebeapi
import jpype
jar = r'/opt/drivers/fmjdbc.jar'
args='-Djava.class.path=%s' % jar
jvm_path = jpype.getDefaultJVMPath()
jpype.startJVM(jvm_path, args)
conn = jaydebeapi.connect('com.filemaker.jdbc.Driver',
SETTINGS['SOURCE_URL'], SETTINGS['SOURCE_UID'], SETTINGS['SOURCE_PW'])
results = source_conn.jconn.getMetaData().getTables(None, None, "%", None)
#I'm not sure if this is how to read the result set, but jaydebeapi's cursor object
# has a lot of logic for getting information out of a result set, so let's harness
# that.
table_reader_cursor = source_conn.cursor()
table_reader_cursor._rs = results
read_results = table_reader_cursor.fetchall()
#get just the table names
[row[2] for row in read_results if row[3]=='TABLE']

From ResultSet Javadoc:
public ResultSet getTables(String catalog,
String schemaPattern,
String tableNamePattern,
String[] types)
throws SQLException
You need pass the four parameter to the method. I'm not a python developer, but in Java I use :
ResultSet rs = metadata.getTables(null, "public", "%" ,new String[] {"TABLE"} );
to get all the tables (and only the tables) in a schema.
Regards.

Related

Compile query from raw string (without using .text(...)) using Sqlalchemy connection and Postgres

I am using Sqlalchemy 1.3 to connect to a PostgreSQL 9.6 database (through Psycopg).
I have a very, very raw Sql string formatted using Psycopg2 syntax which I can not modify because of some legacy issues:
statement_str = SELECT * FROM users WHERE user_id=%(user_id)s
Notice the %(user_id)s
I can happily execute that using a sqlalchemy connection just by doing:
connection = sqlalch_engine.connect()
rows = conn.execute(statement_str, user_id=self.user_id)
And it works fine. I get my user and all is nice and good.
Now, for debugging purposes I'd like to get the actual query with the %(user_id)s argument expanded to the actual value. For instance: If user_id = "foo", then get SELECT * FROM users WHERE user_id = 'foo'
I've seen tons of examples using sqlalchemy.text(...) to produce a statement and then get a compiled version. I have that thanks to other answers like this one or this one been able to produce a decent str when I have an SqlAlchemy query.
However, in this particular case, since I'm using a more cursor-specific syntax %(user_id) I can't do that. If I try:
text(statement_str).bindparams(user_id="foo")
I get:
This text() construct doesn't define a bound parameter named 'user_id'
So I guess what I'm looking for would be something like
conn.compile(statement_str, user_id=self.user_id)
But I haven't been able to get that.
Not sure if this what you want but here goes.
Assuming statement_str is actually a string:
import sqlalchemy as sa
statement_str = "SELECT * FROM users WHERE user_id=%(user_id)s"
params = {'user_id': 'foo'}
query_text = sa.text(statement_str % params)
# str(query_text) should print "select * from users where user_id=foo"
Ok I think I got it.
The combination of SqlAlchemy's raw_connection + Psycopg's mogrify seems to be the answer.
conn = sqlalch_engine.raw_connection()
try:
cursor = conn.cursor()
s_str = cursor.mogrify(statement_str, {'user_id': self.user_id})
s_str = s_str.decode("utf-8") # mogrify returns bytes
# Some cleanup for niceness:
s_str = s_str.replace('\n', ' ')
s_str = re.sub(r'\s{2,}', ' ', s_str)
finally:
conn.close()
I hope someone else finds this helpful

Is it possible to use dask dataframe with teradata python module?

I have this code:
import teradata
import dask.dataframe as dd
login = login
pwd = password
udaExec = teradata.UdaExec (appName="CAF", version="1.0",
logConsole=False)
session = udaExec.connect(method="odbc", DSN="Teradata",
USEREGIONALSETTINGS='N', username=login,
password=pwd, authentication="LDAP");
And the connection is working.
I want to get a dask dataframe. I have tried this:
sqlStmt = "SOME SQL STATEMENT"
df = dd.read_sql_table(sqlStmt, session, index_col='id')
And I'm getting this error message:
AttributeError: 'UdaExecConnection' object has no attribute '_instantiate_plugins'
Does anyone have a suggestion?
Thanks in advance.
read_sql_table expects a SQLalchemy connection string, not a "session" as you are passing. I have not heard of teradata being used via sqlalchemy, but apparently there is at least one connector you could install, and possibly other solutions using the generic ODBC driver.
However, you may wish to use a more direct approach using delayed, something like
from dask import delayed
# make a set of statements for each partition
statements = [sqlStmt + " where id > {} and id <= {}".format(bounds)
for bounds in boundslist] # I don't know syntax for tera
def get_part(statement):
# however you make a concrete dataframe from a SQL statement
udaExec = ..
session = ..
df = ..
return dataframe
# ideally you should provide the meta and divisions info here
df = dd.from_delayed([delayed(get_part)(stm) for stm in statements],
meta= , divisions=)
We will be interested to hear of your success.

Connecting and testing a JDBC driver from Python

I'm trying to do some testing on our JDBC driver using Python.
Initially figuring out JPype, I eventually managed to connect the driver and execute select queries like so (reproducing a generalized snippet):
from __future__ import print_function
from jpype import *
#Start JVM, attach the driver jar
jvmpath = 'path/to/libjvm.so'
classpath = 'path/to/JDBC_Driver.jar'
startJVM(jvmpath, '-ea', '-Djava.class.path=' + classpath)
# Magic line 1
driver = JPackage('sql').Our_Driver
# Initiating a connection via DriverManager()
jdbc_uri = 'jdbc:our_database://localhost:port/database','user', 'passwd')
conn = java.sql.DriverManager.getConnection(jdbc_uri)
# Executing a statement
stmt = conn.createStatement()
rs = stmt.executeQuery ('select top 10 * from some_table')
# Extracting results
while rs.next():
''' Magic #2 - rs.getStuff() only works inside a while loop '''
print (rs.getString('col_name'))
However, I've failed to to batch inserts, which is what I wanted to test. Even when executeBatch() returned a jpype int[], which should indicate a successful insert, the table was not updated.
I then decided to try out py4j.
My plight - I'm having a hard time figuring out how to do the same thing as above. It is said py4j does not start a JVM on its own, and that the Java code needs to be prearranged with a GatewayServer(), so I'm not sure it's even feasible.
On the other hand, there's a library named py4jdbc that does just that.
I tinkered through the dbapi.py code but didn't quite understand the flow, and am pretty much jammed.
If anyone understands how to load a JDBC driver from a .jar file with py4j and can point me in the right direction, I'd be much grateful.
add a commit after adding the records and before retrieving.
conn.commit()
I have met a similar problem in airflow, I used teradata jdbc jars and jaydebeapi to connect teradata database and execute sql:
[root#myhost transfer]# cat test_conn.py
import jaydebeapi
from contextlib import closing
jclassname='com.teradata.jdbc.TeraDriver'
jdbc_driver_loc = '/opt/spark-2.3.1/jars/terajdbc4-16.20.00.06.jar,/opt/spark-2.3.1/jars/tdgssconfig-16.20.00.06.jar'
jdbc_driver_name = 'com.teradata.jdbc.TeraDriver'
host='my_teradata.address'
url='jdbc:teradata://' + host + '/TMODE=TERA'
login="teradata_user_name"
psw="teradata_passwd"
sql = "SELECT COUNT(*) FROM A_TERADATA_TABLE_NAME where month_key='202009'"
conn = jaydebeapi.connect(jclassname=jdbc_driver_name,
url=url,
driver_args=[login, psw],
jars=jdbc_driver_loc.split(","))
with closing(conn) as conn:
with closing(conn.cursor()) as cur:
cur.execute(sql)
print(cur.fetchall())
[root#myhost transfer]# python test_conn.py
[(7734133,)]
[root#myhost transfer]#
In py4j, with your respective JDBC uri:
from py4j.java_gateway import JavaGateway
# Open JVM interface with the JDBC Jar
jdbc_jar_path = '/path/to/jdbc_driver.jar'
gateway = JavaGateway.launch_gateway(classpath=jdbc_jar_path)
# Load the JDBC Jar
jdbc_class = "com.vendor.VendorJDBC"
gateway.jvm.class.forName(jdbc_class)
# Initiate connection
jdbc_uri = "jdbc://vendor:192.168.x.y:zzzz;..."
con = gateway.jvm.DriverManager.getConnection(jdbc_uri)
# Run a query
sql = "select this from that"
stmt = con.createStatement(sql)
rs = stmt.executeQuery()
while rs.next():
rs.getInt(1)
rs.getFloat(2)
.
.
rs.close()
stmt.close()

DB2 sql query to a Pandas DataFrame from my Mac

I have a need to access data that resides in a remote db2 database via a sql statement and convert it to a Pandas DataFrame. All from my Mac. I looked at using Pandas' read_sql with the ibm_db_sa adapter, but it looks like the prerequisite client side software is not supported on the Mac
I came up with an jdbc option, which I'm posting, but I'm curious to know if anyone else has any ideas
Here's an option using jdbc, the pip installable JayDeBeApi and the appropriate db jar file
Note: this could be used for other jdbc/jaydebeapi compliant databases like Oracle, MS Sql Server, etc
import jaydebeapi
import pandas as pd
def read_jdbc(sql, jclassname, driver_args, jars=None, libs=None):
'''
Reads jdbc compliant data sources and returns a Pandas DataFrame
uses jaydebeapi.connect and doc strings :-)
https://pypi.python.org/pypi/JayDeBeApi/
:param sql: select statement
:param jclassname: Full qualified Java class name of the JDBC driver.
e.g. org.postgresql.Driver or com.ibm.db2.jcc.DB2Driver
:param driver_args: Argument or sequence of arguments to be passed to the
Java DriverManager.getConnection method. Usually the
database URL. See
http://docs.oracle.com/javase/6/docs/api/java/sql/DriverManager.html
for more details
:param jars: Jar filename or sequence of filenames for the JDBC driver
:param libs: Dll/so filenames or sequence of dlls/sos used as
shared library by the JDBC driver
:return: Pandas DataFrame
'''
try:
conn = jaydebeapi.connect(jclassname, driver_args, jars, libs)
except jaydebeapi.DatabaseError as de:
raise
try:
curs = conn.cursor()
curs.execute(sql)
columns = [desc[0] for desc in curs.description] #getting column headers
#convert the list of tuples from fetchall() to a df
return pd.DataFrame(curs.fetchall(), columns=columns)
except jaydebeapi.DatabaseError as de:
raise
finally:
curs.close()
conn.close()
Some examples
#DB2
conn = 'jdbc:db2://<host>:5032/<db>:currentSchema=<schema>;'
class_name = 'com.ibm.db2.jcc.DB2Driver'
sql = 'SELECT name FROM table_name FETCH FIRST 5 ROWS ONLY'
df = read_jdbc(sql, class_name, [conn, 'myname', 'mypwd'])
#PostgreSQL
conn = 'jdbc:postgresql://<host>:5432/<db>?currentSchema=<schema>'
class_name = 'org.postgresql.Driver'
jar = '/path/to/jar/postgresql-9.4.1212.jar'
sql = 'SELECT name FROM table_name LIMIT 5'
df = read_jdbc(sql, class_name, [conn, 'myname', 'mypwd'], jars=jar)
I got a simpler answer from https://stackoverflow.com/a/33805547/914967 where it uses pip module ibm_db only:
import ibm_db
import ibm_db_dbi
import pandas as pd
conn_handle = ibm_db.connect('DATABASE={};HOSTNAME={};PORT={};PROTOCOL=TCPIP;UID={};PWD={};'.format(db_name, hostname, port_number, user, password), '', '')
conn = ibm_db_dbi.Connection(conn_handle)
df = pd.read_sql(sql, conn)
Bob, you should check out ibmdbpy (https://pypi.python.org/pypi/ibmdbpy). It is a pandas data frame style API to DB2 and dashDB tables. It supports both underlying DB2 client drivers, ODBC and JDBC.
So as prerequisites you need to set up the DB2 client driver package for Mac that you can find here: http://www-01.ibm.com/support/docview.wss?uid=swg21385217
After #IanBjorhovde commented on my question I investigated another solution that allows me to use sqlalchemy and pandas' read_sql()
Here are the steps I took. Note: I got this working on OSX Yosemite (10.10.4) for python 3.4 and 3.5
1) Download IBM DB2 Express-C (no-cost community edition of DB2)
https://www-01.ibm.com/marketing/iwm/iwm/web/pick.do?source=swg-db2expressc&S_TACT=000000VR&lang=en_US&S_OFF_CD=10000761
2) After navigating to the unzipped dir
sudo ./db2_install
I accepted the default location of /opt/IBM/db2/V10.1
3) Install ibm_db and ibm_db_sa
pip install ibm_db
I built ibm_db_sa from source because the pip installed failed
python setup.py install
That should do it. You might get an error like 'Reason: image not found' when you try to connect to your db so read this for the fix. Note: might require a reboot
Example usage:
import ibm_db_sa
import pandas as pd
from sqlalchemy import select, create_engine
eng = create_engine('ibm_db_sa://<user_name>:<pwd>#<host>:5032/<db name>')
sql = 'SELECT name FROM table_name FETCH FIRST 5 ROWS ONLY'
df = pd.read_sql(sql, eng)

Python to SQL Server Stored Procedure

I am trying to call a SQL Server stored procedure from my Python code, using sqlalchemy. What I'm finding is that no error is raised by the python code and the stored procedure is not executing.
Sample code:
def SaveData(self, aScrapeResult):
sql = "EXECUTE mc.SaveFundamentalDataCSV #pSource='%s',#pCountry='%s',#pOperator='%s',#pFromCountry='%s',#pFromOperator='%s',#pToCountry='%s',#pToOperator='%s',#pSiteName='%s',#pFactor='%s',#pGranularity='%s',#pDescription='%s',#pDataType='%s',#pTechnology = '%s',#pcsvData='%s'"
# Need to convert the data into CSV
util = ListToCsvUtil()
csvValues = util.ListToCsv(aScrapeResult.DataPoints)
formattedSQL = sql % (aScrapeResult.Source ,aScrapeResult.Country,aScrapeResult.Operator ,aScrapeResult.FromCountry ,aScrapeResult.FromOperator ,aScrapeResult.ToCountry ,aScrapeResult.ToOperator ,aScrapeResult.SiteName ,aScrapeResult.Factor ,aScrapeResult.Granularity ,aScrapeResult.Description ,aScrapeResult.DataType ,aScrapeResult.Technology ,csvValues)
DB = create_engine(self.ConnectionString)
DB.connect()
result_proxy = DB.execute(formattedSQL)
results = result_proxy.fetchall()
Examination of formatted SQL yields the following command
EXECUTE mc.SaveFundamentalDataCSV #pSource='PythonTest', #pCountry='UK',
#pOperator='Operator', #pFromCountry='None', #pFromOperator='None',
#pToCountry='None', #pToOperator='None', #pSiteName='None', #pFactor='Factor',
#pGranularity='Hourly', #pDescription='Testing from python',
#pDataType='Forecast',#pTechnology = 'Electricity',
#pcsvData='01-Jan-2012 00:00:00,01-Feb-2012 00:15:00,1,01-Jan-2012 00:00:00,01-Feb-2012 00:30:00,2';
The various versions and software in use is as follows:
SQL Server 2008 R2
Python 2.6.6
SQLAlchemy 0.6.7
I have tested my stored procedure by calling it directly in SQL Server Management Studio with the same parameters with no problem.
It's worth stating that this point that the Python version and the SQL server version are non-changeable. I have no strong allegiance to sqlalchemy and am open to other suggestions.
Any advice would be greatly appreciated, more information can be provided if needed.
Fixed now but open to opinion if I'm using best practice here. I've used the 'text' object exposed by sqlalchemy, working code below:
def SaveData(self, aScrapeResult):
sql = "EXECUTE mc.SaveFundamentalDataCSV #pSource='%s',#pCountry='%s',#pOperator='%s',#pFromCountry='%s',#pFromOperator='%s',#pToCountry='%s',#pToOperator='%s',#pSiteName='%s',#pFactor='%s',#pGranularity='%s',#pDescription='%s',#pDataType='%s',#pTechnology = '%s',#pcsvData='%s'"
# Need to convert the data into CSV
util = ListToCsvUtil()
csvValues = util.ListToCsv(aScrapeResult.DataPoints)
formattedSQL = sql % (aScrapeResult.Source ,aScrapeResult.Country,aScrapeResult.Operator ,aScrapeResult.FromCountry ,aScrapeResult.FromOperator ,aScrapeResult.ToCountry ,aScrapeResult.ToOperator ,aScrapeResult.SiteName ,aScrapeResult.Factor ,aScrapeResult.Granularity ,aScrapeResult.Description ,aScrapeResult.DataType ,aScrapeResult.Technology ,csvValues)
DB = create_engine(self.ConnectionString)
conn = DB.connect()
t = text(formattedSQL).execution_options(autocommit=True)
DB.execute(t)
conn.close()
Hope this proves helpful to someone else!

Categories

Resources