Connecting and testing a JDBC driver from Python - python

I'm trying to do some testing on our JDBC driver using Python.
After initially figuring out JPype, I eventually managed to connect the driver and execute SELECT queries, like so (a generalized reproduction of the snippet):
from __future__ import print_function
from jpype import *

# Start the JVM and attach the driver jar
jvmpath = 'path/to/libjvm.so'
classpath = 'path/to/JDBC_Driver.jar'
startJVM(jvmpath, '-ea', '-Djava.class.path=' + classpath)

# Magic line 1
driver = JPackage('sql').Our_Driver

# Initiating a connection via DriverManager()
jdbc_uri = 'jdbc:our_database://localhost:port/database'
conn = java.sql.DriverManager.getConnection(jdbc_uri, 'user', 'passwd')

# Executing a statement
stmt = conn.createStatement()
rs = stmt.executeQuery('select top 10 * from some_table')

# Extracting results
while rs.next():
    # Magic #2 - rs.getStuff() only works inside a while loop
    print(rs.getString('col_name'))
However, I've failed to do batch inserts, which is what I wanted to test. Even though executeBatch() returned a jpype int[], which should indicate successful inserts, the table was not updated.
I then decided to try out py4j.
My plight: I'm having a hard time figuring out how to do the same thing as above. It is said that py4j does not start a JVM on its own, and that the Java side needs to be prearranged with a GatewayServer(), so I'm not sure it's even feasible.
On the other hand, there's a library named py4jdbc that does just that.
I tinkered through the dbapi.py code but didn't quite understand the flow, and I'm pretty much stuck.
If anyone understands how to load a JDBC driver from a .jar file with py4j and can point me in the right direction, I'd be most grateful.

Add a commit after adding the records and before retrieving:
conn.commit()
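As a minimal sketch of that flow (reusing the some_table / col_name names from the snippet above; the insert statements themselves are made up for illustration):
# Batch the inserts, run them, then commit so the rows actually become visible
conn.setAutoCommit(False)
stmt = conn.createStatement()
stmt.addBatch("insert into some_table (col_name) values ('a')")
stmt.addBatch("insert into some_table (col_name) values ('b')")
update_counts = stmt.executeBatch()  # returns a jpype int[], one count per statement
conn.commit()                        # without the commit, the batched inserts are never persisted
stmt.close()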

I ran into a similar problem in Airflow. I used the Teradata JDBC jars and jaydebeapi to connect to a Teradata database and execute SQL:
[root@myhost transfer]# cat test_conn.py
import jaydebeapi
from contextlib import closing

jclassname = 'com.teradata.jdbc.TeraDriver'
jdbc_driver_loc = '/opt/spark-2.3.1/jars/terajdbc4-16.20.00.06.jar,/opt/spark-2.3.1/jars/tdgssconfig-16.20.00.06.jar'
jdbc_driver_name = 'com.teradata.jdbc.TeraDriver'
host = 'my_teradata.address'
url = 'jdbc:teradata://' + host + '/TMODE=TERA'
login = "teradata_user_name"
psw = "teradata_passwd"
sql = "SELECT COUNT(*) FROM A_TERADATA_TABLE_NAME where month_key='202009'"

conn = jaydebeapi.connect(jclassname=jdbc_driver_name,
                          url=url,
                          driver_args=[login, psw],
                          jars=jdbc_driver_loc.split(","))
with closing(conn) as conn:
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        print(cur.fetchall())
[root@myhost transfer]# python test_conn.py
[(7734133,)]
[root@myhost transfer]#

With py4j, using your respective JDBC URI (note that JavaGateway.launch_gateway() starts its own JVM, so you don't need to pre-arrange a GatewayServer on the Java side):
from py4j.java_gateway import JavaGateway

# Start a JVM with the JDBC jar on the classpath
jdbc_jar_path = '/path/to/jdbc_driver.jar'
gateway = JavaGateway.launch_gateway(classpath=jdbc_jar_path)

# Load the JDBC driver class
jdbc_class = "com.vendor.VendorJDBC"
gateway.jvm.java.lang.Class.forName(jdbc_class)

# Initiate the connection
jdbc_uri = "jdbc://vendor:192.168.x.y:zzzz;..."
con = gateway.jvm.java.sql.DriverManager.getConnection(jdbc_uri)

# Run a query
sql = "select this from that"
stmt = con.createStatement()
rs = stmt.executeQuery(sql)
while rs.next():
    rs.getInt(1)
    rs.getFloat(2)
    # ...
rs.close()
stmt.close()

Related

How do I load a .sql file in a python environment?

I have a .sql file which I'm trying to load in an online Python environment (JupyterHub), but the code I've found online has just left me confused. I've gotten as far as:
import sqlite3
from sqlite3 import connect
sqlite_uri = "sqlite:///basketball.db"
sqlite_engine = sqlalchemy.create_engine(sqlite_uri)
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
sql_file = open("travel-times.sql")
travel = sql_file.read()
travel
sql_expr = """
SELECT *
FROM travel;
"""
pd.read_sql(sql_expr, sqlite_engine)
and calling the 'travel' object does at least print the data in raw form, but from there I'm at a loss as to how to actually load the table. What commands would accomplish this?
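One possible approach (a sketch, assuming travel-times.sql contains plain CREATE TABLE / INSERT statements and defines a table named travel, as the query above suggests): run the whole script against a sqlite3 connection with executescript(), then read the table back with pandas.
import sqlite3
import pandas as pd

connection = sqlite3.connect(":memory:")

# Run every statement in the .sql file against the in-memory database
with open("travel-times.sql") as sql_file:
    connection.executescript(sql_file.read())

# Read the loaded table back into a DataFrame
df = pd.read_sql("SELECT * FROM travel;", connection)
df.head()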

Problem when I try to import big database into SQL Azure with Python

I have a pretty weird problem: I am trying to extract a SQL database in Azure with Python.
Within this database there are several tables (I mention this because you are going to see a "for" loop in the code).
I can import some tables without a problem; others (the ones that take the longest, I suppose because of their size) fail.
Not only does it throw an error ([1] 25847 killed /usr/bin/python3), it also kicks me out of the console.
Does anyone know why? Is there an easier way to calculate the size of the database without importing the entire database with pd.read_sql()?
code:
cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
cursor = cnxn.cursor()
query = "SELECT * FROM INFORMATION_SCHEMA.TABLES"
df = pd.read_sql(query, cnxn)
df

DataConContenido = pd.DataFrame({'Nombre': [], 'TieneCon?': [], 'Size': []})
for tablas in df['TABLE_NAME']:
    cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)
    cursor = cnxn.cursor()
    query = "SELECT * FROM " + tablas
    print("vamos con " + str(tablas))
    try:
        df = pd.read_sql(query, cnxn)
        size = df.shape
        if size[0] > 0:
            DataConContenido = DataConContenido.append(dict(zip(['Nombre', 'TieneCon?', 'Size'], [tablas, True, size])), ignore_index=True)
        else:
            DataConContenido = DataConContenido.append(dict(zip(['Nombre', 'TieneCon?', 'Size'], [tablas, False, size])), ignore_index=True)
    except:
        pass
Could it be that the connection drops because the query takes so long, and that is why I get the error above?
I think the process is getting killed on the line below:
DataConContenido = DataConContenido.append(dict(zip(['Nombre', 'TieneCon?', 'Size'], [tablas, True, size])), ignore_index=True)
You could double-confirm this by adding a print statement just above it:
print("Querying Completed...")
You are getting Killed most likely because your process crossed some limit on the amount of system resources you are allowed to use, and this specific operation looks like exactly that kind of case.
If possible, query and append in batches rather than doing it all in one shot, for example along the lines of the sketch below.
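A rough sketch of that idea (assuming the same server/database/username/password variables as in the question): count rows per table on the server instead of pulling every table into a DataFrame.
import pandas as pd
import pyodbc

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+password)

tables = pd.read_sql("SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES", cnxn)

rows = []
for tabla in tables['TABLE_NAME']:
    # COUNT(*) is computed on the server and returns a single number,
    # so memory use stays flat no matter how big the table is.
    count = pd.read_sql("SELECT COUNT(*) AS n FROM [" + tabla + "]", cnxn)['n'].iloc[0]
    rows.append({'Nombre': tabla, 'TieneCon?': count > 0, 'Size': count})

DataConContenido = pd.DataFrame(rows)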

Fast MySQL Import

Writing a script to convert raw data for MySQL import, I have so far worked with a temporary text file, which I later imported manually using the LOAD DATA INFILE... command.
Now I have included the import command in the Python script:
db = mysql.connector.connect(user='root', password='root',
host='localhost',
database='myDB')
cursor = db.cursor()
query = """
LOAD DATA INFILE 'temp.txt' INTO TABLE myDB.values
FIELDS TERMINATED BY ',' LINES TERMINATED BY ';';
"""
cursor.execute(query)
cursor.close()
db.commit()
db.close()
This works, but temp.txt has to be in the database directory, which isn't suitable for my needs.
The next approach is dropping the file and committing directly:
db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
sql = "INSERT INTO values(`timestamp`,`id`,`value`,`status`) VALUES(%s,%s,%s,%s)"
cursor = db.cursor()
for line in lines:
    mode, year, julian, time, *values = line.split(",")
    del values[5]
    date = datetime.strptime(year+julian, "%Y%j").strftime("%Y-%m-%d")
    time = datetime.strptime(time.rjust(4, "0"), "%H%M").strftime("%H:%M:%S")
    timestamp = "%s %s" % (date, time)
    for i, value in enumerate(values[:20], 1):
        args = (timestamp, str(i+28), value, mode)
        cursor.execute(sql, args)
db.commit()
This works as well, but takes around four times as long, which is too much. (The same for construct was used in the first version to generate temp.txt.)
My conclusion is that, to be fast enough, I need a file and the LOAD DATA INFILE command. To be free to choose where the text file is placed, the LOCAL option seems useful. But with MySQL Connector (1.1.7) there is the known error:
mysql.connector.errors.ProgrammingError: 1148 (42000): The used command is not allowed with this MySQL version
So far I've seen that using MySQLdb instead of MySQL Connector can be a workaround. However, activity on MySQLdb seems low, and Python 3.3 support will probably never come.
Is LOAD DATA LOCAL INFILE the way to go and if so is there a working connector for python 3.3 available?
EDIT: After development the database will run on a server, script on a client.
I may have missed something important, but can't you just specify the full filename in the first chunk of code?
LOAD DATA INFILE '/full/path/to/temp.txt'
Note the path must be a path on the server.
To use LOAD DATA INFILE with any accessible file, you have to set the LOCAL_FILES client flag when creating the connection:
import mysql.connector
from mysql.connector.constants import ClientFlag
db = mysql.connector.connect(client_flags=[ClientFlag.LOCAL_FILES], <other arguments>)
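As a rough sketch of the full flow (assuming the same myDB.values table as above and a client-side temp.txt path; note that newer MySQL servers must also have local_infile enabled on the server side):
import mysql.connector
from mysql.connector.constants import ClientFlag

db = mysql.connector.connect(user='root', password='root',
                             host='localhost', database='myDB',
                             client_flags=[ClientFlag.LOCAL_FILES])
cursor = db.cursor()
# LOCAL means the file is read on the client and streamed to the server
cursor.execute("""
    LOAD DATA LOCAL INFILE '/client/path/to/temp.txt' INTO TABLE myDB.values
    FIELDS TERMINATED BY ',' LINES TERMINATED BY ';'
""")
db.commit()
cursor.close()
db.close()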

Using JPype - How can I access JDBC Meta Data Functions

I'm using JayDeBeAPI which uses JPype to load FileMaker's JDBC driver and pull data.
But I also want to be able to get a listing of all tables in the database.
The JDBC documentation (page 55) lists the following functions:
The JDBC client driver supports the following Meta Data functions:
getColumns
getColumnPrivileges
getMetaData
getTypeInfo
getTables
getTableTypes
Any ideas how I might call them from JPype or JayDeBeAPI?
If it helps, here's my current code:
import jaydebeapi
import jpype
jar = r'/opt/drivers/fmjdbc.jar'
args='-Djava.class.path=%s' % jar
jvm_path = jpype.getDefaultJVMPath()
jpype.startJVM(jvm_path, args)
conn = jaydebeapi.connect('com.filemaker.jdbc.Driver',
SETTINGS['SOURCE_URL'], SETTINGS['SOURCE_UID'], SETTINGS['SOURCE_PW'])
curs = conn.cursor()
#Sample Query:
curs.execute("select * from table")
result_rows = curs.fetchall()
Update:
Here's some progress and it seems like it should work, but I'm getting the error below. Any ideas?
> conn.jconn.metadata.getTables()
*** RuntimeError: No matching overloads found. at src/native/common/jp_method.cpp:121
Ok, thanks to eltabo and Juan Mellado I figured it out!
I just had to pass in the correct parameters to match the method signature.
Here's the working code:
import jaydebeapi
import jpype
jar = r'/opt/drivers/fmjdbc.jar'
args='-Djava.class.path=%s' % jar
jvm_path = jpype.getDefaultJVMPath()
jpype.startJVM(jvm_path, args)
conn = jaydebeapi.connect('com.filemaker.jdbc.Driver',
SETTINGS['SOURCE_URL'], SETTINGS['SOURCE_UID'], SETTINGS['SOURCE_PW'])
results = conn.jconn.getMetaData().getTables(None, None, "%", None)

# I'm not sure if this is how to read the result set, but jaydebeapi's cursor object
# has a lot of logic for getting information out of a result set, so let's harness that.
table_reader_cursor = conn.cursor()
table_reader_cursor._rs = results
read_results = table_reader_cursor.fetchall()

# Get just the table names
[row[2] for row in read_results if row[3] == 'TABLE']
From the DatabaseMetaData Javadoc:
public ResultSet getTables(String catalog,
                           String schemaPattern,
                           String tableNamePattern,
                           String[] types)
                    throws SQLException
You need to pass all four parameters to the method. I'm not a Python developer, but in Java I use:
ResultSet rs = metadata.getTables(null, "public", "%" ,new String[] {"TABLE"} );
to get all the tables (and only the tables) in a schema.
Regards.
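For reference, an equivalent sketch in Python with JPype (hypothetical: it reuses the conn object from the working code above and borrows the "public" schema pattern from the Java example; JPype's JArray/JString are used to build the Java String[]):
import jpype

meta = conn.jconn.getMetaData()
# Build a Java String[] so the getTables() overload resolves unambiguously
table_types = jpype.JArray(jpype.JString)(["TABLE"])
rs = meta.getTables(None, "public", "%", table_types)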

Python to SQL Server Stored Procedure

I am trying to call a SQL Server stored procedure from my Python code using SQLAlchemy. What I'm finding is that no error is raised by the Python code, yet the stored procedure does not execute.
Sample code:
def SaveData(self, aScrapeResult):
    sql = "EXECUTE mc.SaveFundamentalDataCSV @pSource='%s',@pCountry='%s',@pOperator='%s',@pFromCountry='%s',@pFromOperator='%s',@pToCountry='%s',@pToOperator='%s',@pSiteName='%s',@pFactor='%s',@pGranularity='%s',@pDescription='%s',@pDataType='%s',@pTechnology='%s',@pcsvData='%s'"
    # Need to convert the data into CSV
    util = ListToCsvUtil()
    csvValues = util.ListToCsv(aScrapeResult.DataPoints)
    formattedSQL = sql % (aScrapeResult.Source, aScrapeResult.Country, aScrapeResult.Operator, aScrapeResult.FromCountry, aScrapeResult.FromOperator, aScrapeResult.ToCountry, aScrapeResult.ToOperator, aScrapeResult.SiteName, aScrapeResult.Factor, aScrapeResult.Granularity, aScrapeResult.Description, aScrapeResult.DataType, aScrapeResult.Technology, csvValues)
    DB = create_engine(self.ConnectionString)
    DB.connect()
    result_proxy = DB.execute(formattedSQL)
    results = result_proxy.fetchall()
Examination of the formatted SQL yields the following command:
EXECUTE mc.SaveFundamentalDataCSV @pSource='PythonTest', @pCountry='UK',
    @pOperator='Operator', @pFromCountry='None', @pFromOperator='None',
    @pToCountry='None', @pToOperator='None', @pSiteName='None', @pFactor='Factor',
    @pGranularity='Hourly', @pDescription='Testing from python',
    @pDataType='Forecast', @pTechnology='Electricity',
    @pcsvData='01-Jan-2012 00:00:00,01-Feb-2012 00:15:00,1,01-Jan-2012 00:00:00,01-Feb-2012 00:30:00,2';
The various versions and software in use are as follows:
SQL Server 2008 R2
Python 2.6.6
SQLAlchemy 0.6.7
I have tested my stored procedure by calling it directly in SQL Server Management Studio with the same parameters with no problem.
It's worth stating at this point that the Python version and the SQL Server version cannot be changed. I have no strong allegiance to SQLAlchemy and am open to other suggestions.
Any advice would be greatly appreciated, more information can be provided if needed.
Fixed now, though I'm open to opinions on whether I'm using best practice here. I've used the 'text' object exposed by SQLAlchemy; working code below:
def SaveData(self, aScrapeResult):
    sql = "EXECUTE mc.SaveFundamentalDataCSV @pSource='%s',@pCountry='%s',@pOperator='%s',@pFromCountry='%s',@pFromOperator='%s',@pToCountry='%s',@pToOperator='%s',@pSiteName='%s',@pFactor='%s',@pGranularity='%s',@pDescription='%s',@pDataType='%s',@pTechnology='%s',@pcsvData='%s'"
    # Need to convert the data into CSV
    util = ListToCsvUtil()
    csvValues = util.ListToCsv(aScrapeResult.DataPoints)
    formattedSQL = sql % (aScrapeResult.Source, aScrapeResult.Country, aScrapeResult.Operator, aScrapeResult.FromCountry, aScrapeResult.FromOperator, aScrapeResult.ToCountry, aScrapeResult.ToOperator, aScrapeResult.SiteName, aScrapeResult.Factor, aScrapeResult.Granularity, aScrapeResult.Description, aScrapeResult.DataType, aScrapeResult.Technology, csvValues)
    DB = create_engine(self.ConnectionString)
    conn = DB.connect()
    t = text(formattedSQL).execution_options(autocommit=True)
    DB.execute(t)
    conn.close()
Hope this proves helpful to someone else!
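As a possible refinement (a sketch, not the poster's code): binding the values as parameters with text() instead of interpolating them into the SQL string avoids quoting problems with the CSV payload. Here connection_string and csv_values are placeholders, and most parameters are omitted for brevity.
from sqlalchemy import create_engine, text

engine = create_engine(connection_string)  # placeholder connection string
stmt = text(
    "EXECUTE mc.SaveFundamentalDataCSV @pSource=:source, @pCountry=:country, "
    "@pcsvData=:csv"  # remaining parameters omitted for brevity
).execution_options(autocommit=True)

conn = engine.connect()
conn.execute(stmt, source='PythonTest', country='UK', csv=csv_values)
conn.close()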
