How do I load a .sql file in a python environment?

I have a .sql file which I'm trying to load in an online Python environment (JupyterHub) but other code I've found online has just left me confused. I've gotten as far as:
import sqlite3
import pandas as pd
import sqlalchemy
sqlite_uri = "sqlite:///basketball.db"
sqlite_engine = sqlalchemy.create_engine(sqlite_uri)
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
sql_file = open("travel-times.sql")
travel = sql_file.read()
travel
sql_expr = """
SELECT *
FROM travel;
"""
pd.read_sql(sql_expr, sqlite_engine)
Evaluating the 'travel' object at least prints the raw contents of the file, but from there I'm at a loss as to how to actually load the table. What commands would accomplish this?
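One way to get from the raw text to a queryable table is to run the whole script against a sqlite3 connection with executescript(). This is only a sketch: it assumes travel-times.sql is a plain SQLite-compatible dump (CREATE TABLE and INSERT statements). The helper names load_sql_file and list_tables are made up for this illustration.

```python
import sqlite3
from pathlib import Path


def load_sql_file(path, connection):
    """Run every statement in the .sql file at `path` on `connection`."""
    script = Path(path).read_text()
    # executescript() runs multiple semicolon-separated statements in one call.
    connection.executescript(script)
    connection.commit()


def list_tables(connection):
    """Return the names of the tables now present in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

With the question's file this would be load_sql_file("travel-times.sql", connection), after which pd.read_sql("SELECT * FROM travel", connection) should return the table, assuming the script creates one named travel. The query has to go through the same connection that ran the script; the basketball.db engine in the question points at a different database.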

Related

Using SQL Server file streaming in Python

I am attempting to use SQL Server 2017 FILESTREAM in Python. Everything else I do goes through SQLAlchemy, so I am looking for a way to use it here as well. I haven't found any implementation within SQLAlchemy or other libraries (I may have missed something; if so, please point me to a working and tested implementation).
I have decided to approach this using the DLL, based on https://github.com/VisionMark/django-mssql-filestream/blob/master/sql_filestream/win32_streaming_api.py . However, my call to OpenSqlFilestream fails and returns -1 instead of a file handle. I have no idea what the issue is or how to fix it.
from ctypes import c_char, sizeof, windll
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import msvcrt
import os
msodbcsql = windll.LoadLibrary(r"C:\Windows\System32\msodbcsql17.dll")
engine = create_engine("mssql+pyodbc://user:pass@test/test?driver=ODBC+Driver+17+for+SQL+Server&Trusted_Connection=yes")
maker = sessionmaker(bind=engine)
session = maker()
## first query should begin transaction
path = session.execute("SELECT file_stream.PathName() FROM test_filetable").fetchall()[0][0]
## this returns str like "\\\\test\\*"
context = session.execute("SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()").fetchall()[0][0]
## returns bytes
_context = (c_char*len(context)).from_buffer_copy(context)
## This call fails
handle = msodbcsql.OpenSqlFilestream(
path, # FilestreamPath
0, # DesiredAccess
0, # OpenOptions
_context, # FilestreamTransactionContext
sizeof(_context), # FilestreamTransactionContextLength
0 # AllocationSize
)
## this returns -1 instead of handle
## Never reached, but this should create usable file
desc = msvcrt.open_osfhandle(handle, os.O_RDONLY)
_file = os.fdopen(desc, 'r')
All of the queries work and output (as far as I understand) correct data.
How do I obtain filestream access to a file on SQL Server 2017 from Python (3.7)?
Edit: The objects I read can be gigabytes in size, and the process only needs stream access.

My guess is that your issue is related to (1) the fact that a SQLAlchemy Session is much more than just a raw DB API Connection, and/or (2) the transaction context not being appropriate for your invocation of OpenSqlFilestream.
For what it's worth, the following works for me with CPython 3.7.2 and pythonnet 2.4.0:
import clr
clr.AddReference("System.Data")
from System.Data import IsolationLevel
from System.Data.SqlClient import SqlCommand, SqlConnection
from System.Data.SqlTypes import SqlFileStream
from System.IO import File, FileAccess, FileOptions
# adapted from c# code at
# https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/filestream-data
connection_string = r"Data Source=(local)\SQLEXPRESS;Initial Catalog=myDB;Integrated Security=True"
con = SqlConnection(connection_string)
con.Open()
sql = """\
SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
FROM employees WHERE EmployeeID = 1"""
cmd = SqlCommand(sql, con)
tran = con.BeginTransaction(IsolationLevel.ReadCommitted)
cmd.Transaction = tran
rdr = cmd.ExecuteReader()
rdr.Read()
path = rdr.GetString(0)
transaction_context = rdr.GetSqlBytes(1).Buffer
rdr.Close()
allocation_size = 0
input_stream = SqlFileStream(path, transaction_context,
FileAccess.Read, FileOptions.SequentialScan, allocation_size)
output_stream = File.Create(r"C:\Users\Gord\Desktop\photo.bmp")
input_stream.CopyTo(output_stream)
output_stream.Close()
input_stream.Close()
tran.Commit()
con.Close()

SQLITE3 not creating database

import sqlite3
conn = sqlite3.connect("test.db")
cursor = conn.cursor()
It should create the database, but it does not. Any help?

This code will create an sqlite db file called "test.db" in the same directory you are running your script from.
For example, if you have your python file in:
/home/user/python_code/mycode.py
And you run it from:
/home/user/
With:
python python_code/mycode.py # or python3
It will create an "empty" sqlite db file at
/home/user/test.db
If you can't find the test.db file, pass the full path of where you want it to be located, for example:
conn = sqlite3.connect("/full/path/to/location/you/want/test.db")

I had the same problem: my .db file wasn't appearing because I forgot to add test.db at the end of the path. See line 2 below:
import sqlite3
databaseFile = "/home/user/test.db" #don't forget the test.db
conn = sqlite3.connect(databaseFile)
cursor = conn.cursor()

I suspect the DB will not be created on disk until you create at least one table in it. Just calling conn.cursor() is not sufficient.
The sqlite3 command-line utility behaves this way, too.
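To check where the file actually ends up, a small sketch along these lines resolves the path to an absolute one and writes a table, so the file is more than an empty placeholder. The function name create_database and the demo table are hypothetical.

```python
import os
import sqlite3


def create_database(db_path):
    """Create (or open) a SQLite file and return its absolute path."""
    db_path = os.path.abspath(db_path)
    conn = sqlite3.connect(db_path)
    try:
        # Creating a table and committing forces real content onto disk.
        conn.execute("CREATE TABLE IF NOT EXISTS demo (id INTEGER PRIMARY KEY)")
        conn.commit()
    finally:
        conn.close()
    return db_path
```

Printing the returned path shows exactly which directory the relative name test.db was resolved against, which is usually the working directory of the process, not the directory of the script.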

Connecting and testing a JDBC driver from Python

I'm trying to do some testing on our JDBC driver using Python.
Initially figuring out JPype, I eventually managed to connect the driver and execute select queries like so (reproducing a generalized snippet):
from __future__ import print_function
from jpype import *
#Start JVM, attach the driver jar
jvmpath = 'path/to/libjvm.so'
classpath = 'path/to/JDBC_Driver.jar'
startJVM(jvmpath, '-ea', '-Djava.class.path=' + classpath)
# Magic line 1
driver = JPackage('sql').Our_Driver
# Initiating a connection via DriverManager()
jdbc_uri = 'jdbc:our_database://localhost:port/database'
conn = java.sql.DriverManager.getConnection(jdbc_uri, 'user', 'passwd')
# Executing a statement
stmt = conn.createStatement()
rs = stmt.executeQuery('select top 10 * from some_table')
# Extracting results
while rs.next():
    # Magic #2 - rs.getStuff() only works inside a while loop
    print(rs.getString('col_name'))
However, I've failed to do batch inserts, which is what I wanted to test. Even when executeBatch() returned a jpype int[], which should indicate a successful insert, the table was not updated.
I then decided to try out py4j.
My plight - I'm having a hard time figuring out how to do the same thing as above. It is said py4j does not start a JVM on its own, and that the Java code needs to be prearranged with a GatewayServer(), so I'm not sure it's even feasible.
On the other hand, there's a library named py4jdbc that does just that.
I tinkered through the dbapi.py code but didn't quite understand the flow, and am pretty much jammed.
If anyone understands how to load a JDBC driver from a .jar file with py4j and can point me in the right direction, I'd be very grateful.

Add a commit after adding the records and before retrieving them:
conn.commit()

I ran into a similar problem in Airflow. I used the Teradata JDBC jars and jaydebeapi to connect to a Teradata database and execute SQL:
[root@myhost transfer]# cat test_conn.py
import jaydebeapi
from contextlib import closing
jclassname='com.teradata.jdbc.TeraDriver'
jdbc_driver_loc = '/opt/spark-2.3.1/jars/terajdbc4-16.20.00.06.jar,/opt/spark-2.3.1/jars/tdgssconfig-16.20.00.06.jar'
jdbc_driver_name = 'com.teradata.jdbc.TeraDriver'
host='my_teradata.address'
url='jdbc:teradata://' + host + '/TMODE=TERA'
login="teradata_user_name"
psw="teradata_passwd"
sql = "SELECT COUNT(*) FROM A_TERADATA_TABLE_NAME where month_key='202009'"
conn = jaydebeapi.connect(jclassname=jdbc_driver_name,
                          url=url,
                          driver_args=[login, psw],
                          jars=jdbc_driver_loc.split(","))
with closing(conn) as conn:
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        print(cur.fetchall())
[root@myhost transfer]# python test_conn.py
[(7734133,)]
[root@myhost transfer]#

In py4j, with your respective JDBC uri:
from py4j.java_gateway import JavaGateway
# Open JVM interface with the JDBC Jar
jdbc_jar_path = '/path/to/jdbc_driver.jar'
gateway = JavaGateway.launch_gateway(classpath=jdbc_jar_path)
# Load the JDBC Jar
jdbc_class = "com.vendor.VendorJDBC"
gateway.jvm.java.lang.Class.forName(jdbc_class)
# Initiate connection
jdbc_uri = "jdbc://vendor:192.168.x.y:zzzz;..."
con = gateway.jvm.java.sql.DriverManager.getConnection(jdbc_uri)
# Run a query
sql = "select this from that"
stmt = con.createStatement()
rs = stmt.executeQuery(sql)
while rs.next():
    rs.getInt(1)
    rs.getFloat(2)
    # ...
rs.close()
stmt.close()

Fast MySQL Import

While writing a script to convert raw data for MySQL import, I have so far worked with a temporary text file, which I then imported manually using the LOAD DATA INFILE... command.
Now I have included the import command in the Python script:
import mysql.connector
db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
cursor = db.cursor()
query = """
LOAD DATA INFILE 'temp.txt' INTO TABLE myDB.values
FIELDS TERMINATED BY ',' LINES TERMINATED BY ';';
"""
cursor.execute(query)
cursor.close()
db.commit()
db.close()
This works, but temp.txt has to be in the database directory, which isn't suitable for my needs.
The next approach drops the temporary file and inserts the rows directly:
import mysql.connector
from datetime import datetime
db = mysql.connector.connect(user='root', password='root',
                             host='localhost',
                             database='myDB')
# `values` is backtick-quoted because VALUES is a reserved word in MySQL
sql = "INSERT INTO `values`(`timestamp`,`id`,`value`,`status`) VALUES(%s,%s,%s,%s)"
cursor = db.cursor()
for line in lines:
    mode, year, julian, time, *values = line.split(",")
    del values[5]
    date = datetime.strptime(year+julian, "%Y%j").strftime("%Y-%m-%d")
    time = datetime.strptime(time.rjust(4, "0"), "%H%M").strftime("%H:%M:%S")
    timestamp = "%s %s" % (date, time)
    for i, value in enumerate(values[:20], 1):
        args = (timestamp, str(i+28), value, mode)
        cursor.execute(sql, args)
db.commit()
This works as well but takes around four times as long, which is too slow. (The same for loop was used in the first version to generate temp.txt.)
My conclusion is that I need a file and the LOAD DATA INFILE command for this to be fast enough. To be free in where the text file is placed, the LOCAL option seems useful. But with MySQL Connector (1.1.7) there is this known error:
mysql.connector.errors.ProgrammingError: 1148 (42000): The used command is not allowed with this MySQL version
So far I've seen that using MySQLdb instead of MySQL Connector can be a workaround. Activity on MySQLdb however seems low and Python 3.3 support will probably never come.
Is LOAD DATA LOCAL INFILE the way to go and if so is there a working connector for python 3.3 available?
EDIT: After development the database will run on a server, script on a client.

I may have missed something important, but can't you just specify the full filename in the first chunk of code?
LOAD DATA INFILE '/full/path/to/temp.txt'
Note the path must be a path on the server.

To use LOAD DATA LOCAL INFILE with any file accessible to the client, you have to set the LOCAL_FILES client flag when creating the connection:
import mysql.connector
from mysql.connector.constants import ClientFlag
db = mysql.connector.connect(client_flags=[ClientFlag.LOCAL_FILES], <other arguments>)
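If LOAD DATA LOCAL INFILE stays unavailable, one thing to try before giving up on direct inserts is to collect all rows first and send them with a single executemany() call. MySQL Connector/Python documents executemany() as batching INSERT statements into multi-row inserts. The sketch below reuses the parsing logic and table layout from the question. rows_from_line and insert_all are names invented here, and whether this gets close to LOAD DATA speed would need to be measured.

```python
from datetime import datetime

# Table and columns are copied from the question; `values` is backtick-quoted
# because VALUES is a reserved word in MySQL.
INSERT_SQL = (
    "INSERT INTO `values` (`timestamp`, `id`, `value`, `status`) "
    "VALUES (%s, %s, %s, %s)"
)


def rows_from_line(line):
    """Turn one raw input line into a list of parameter tuples."""
    mode, year, julian, time, *values = line.split(",")
    del values[5]  # same column drop as in the question
    date = datetime.strptime(year + julian, "%Y%j").strftime("%Y-%m-%d")
    clock = datetime.strptime(time.rjust(4, "0"), "%H%M").strftime("%H:%M:%S")
    timestamp = "%s %s" % (date, clock)
    return [
        (timestamp, str(i + 28), value, mode)
        for i, value in enumerate(values[:20], 1)
    ]


def insert_all(db, lines):
    """Insert every row with one executemany() call and a single commit.

    `db` is assumed to be an open mysql.connector connection.
    """
    rows = [row for line in lines for row in rows_from_line(line)]
    cursor = db.cursor()
    cursor.executemany(INSERT_SQL, rows)
    db.commit()
    cursor.close()
    return len(rows)
```

Building the full row list in memory is a trade-off: it keeps the database round trips low, but for very large inputs the list could be sent in chunks instead.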

Python to SQL Server Stored Procedure

I am trying to call a SQL Server stored procedure from my Python code, using sqlalchemy. What I'm finding is that no error is raised by the python code and the stored procedure is not executing.
Sample code:
def SaveData(self, aScrapeResult):
    sql = "EXECUTE mc.SaveFundamentalDataCSV @pSource='%s',@pCountry='%s',@pOperator='%s',@pFromCountry='%s',@pFromOperator='%s',@pToCountry='%s',@pToOperator='%s',@pSiteName='%s',@pFactor='%s',@pGranularity='%s',@pDescription='%s',@pDataType='%s',@pTechnology = '%s',@pcsvData='%s'"
    # Need to convert the data into CSV
    util = ListToCsvUtil()
    csvValues = util.ListToCsv(aScrapeResult.DataPoints)
    formattedSQL = sql % (aScrapeResult.Source ,aScrapeResult.Country,aScrapeResult.Operator ,aScrapeResult.FromCountry ,aScrapeResult.FromOperator ,aScrapeResult.ToCountry ,aScrapeResult.ToOperator ,aScrapeResult.SiteName ,aScrapeResult.Factor ,aScrapeResult.Granularity ,aScrapeResult.Description ,aScrapeResult.DataType ,aScrapeResult.Technology ,csvValues)
    DB = create_engine(self.ConnectionString)
    DB.connect()
    result_proxy = DB.execute(formattedSQL)
    results = result_proxy.fetchall()
Examination of the formatted SQL yields the following command:
EXECUTE mc.SaveFundamentalDataCSV @pSource='PythonTest', @pCountry='UK',
@pOperator='Operator', @pFromCountry='None', @pFromOperator='None',
@pToCountry='None', @pToOperator='None', @pSiteName='None', @pFactor='Factor',
@pGranularity='Hourly', @pDescription='Testing from python',
@pDataType='Forecast',@pTechnology = 'Electricity',
@pcsvData='01-Jan-2012 00:00:00,01-Feb-2012 00:15:00,1,01-Jan-2012 00:00:00,01-Feb-2012 00:30:00,2';
The software versions in use are as follows:
SQL Server 2008 R2
Python 2.6.6
SQLAlchemy 0.6.7
I have tested my stored procedure by calling it directly in SQL Server Management Studio with the same parameters with no problem.
It's worth stating at this point that the Python version and the SQL Server version cannot be changed. I have no strong allegiance to SQLAlchemy and am open to other suggestions.
Any advice would be greatly appreciated, more information can be provided if needed.

Fixed now but open to opinion if I'm using best practice here. I've used the 'text' object exposed by sqlalchemy, working code below:
def SaveData(self, aScrapeResult):
    sql = "EXECUTE mc.SaveFundamentalDataCSV @pSource='%s',@pCountry='%s',@pOperator='%s',@pFromCountry='%s',@pFromOperator='%s',@pToCountry='%s',@pToOperator='%s',@pSiteName='%s',@pFactor='%s',@pGranularity='%s',@pDescription='%s',@pDataType='%s',@pTechnology = '%s',@pcsvData='%s'"
    # Need to convert the data into CSV
    util = ListToCsvUtil()
    csvValues = util.ListToCsv(aScrapeResult.DataPoints)
    formattedSQL = sql % (aScrapeResult.Source ,aScrapeResult.Country,aScrapeResult.Operator ,aScrapeResult.FromCountry ,aScrapeResult.FromOperator ,aScrapeResult.ToCountry ,aScrapeResult.ToOperator ,aScrapeResult.SiteName ,aScrapeResult.Factor ,aScrapeResult.Granularity ,aScrapeResult.Description ,aScrapeResult.DataType ,aScrapeResult.Technology ,csvValues)
    DB = create_engine(self.ConnectionString)
    conn = DB.connect()
    t = text(formattedSQL).execution_options(autocommit=True)
    DB.execute(t)
    conn.close()
Hope this proves helpful to someone else!
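On the best-practice question: interpolating values into the SQL string with % breaks as soon as a value contains a single quote, and it leaves the call open to SQL injection. A possible alternative is to let text() bind the values through named parameters. The sketch below has not been tested against SQL Server 2008 R2 or SQLAlchemy 0.6.7. PARAM_NAMES, build_execute_statement and save_data are hypothetical names. The autocommit execution option is carried over from the answer above and is specific to older SQLAlchemy releases.

```python
PARAM_NAMES = [
    "pSource", "pCountry", "pOperator", "pFromCountry", "pFromOperator",
    "pToCountry", "pToOperator", "pSiteName", "pFactor", "pGranularity",
    "pDescription", "pDataType", "pTechnology", "pcsvData",
]


def build_execute_statement(procedure, param_names):
    """Build 'EXECUTE proc @a = :a, @b = :b' using named bind placeholders."""
    assignments = ", ".join(
        "@{0} = :{0}".format(name) for name in param_names
    )
    return "EXECUTE {0} {1}".format(procedure, assignments)


def save_data(engine, params):
    """Call the stored procedure; `params` maps each PARAM_NAMES entry to a value."""
    # Imported here so the statement builder can be used without SQLAlchemy.
    from sqlalchemy import text

    sql = build_execute_statement("mc.SaveFundamentalDataCSV", PARAM_NAMES)
    # autocommit=True mirrors the answer above (older SQLAlchemy only).
    statement = text(sql).execution_options(autocommit=True)
    conn = engine.connect()
    try:
        conn.execute(statement, params)
    finally:
        conn.close()
```

With bound parameters the driver handles quoting, so the CSV payload can contain quotes or commas without changing the statement text.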
