I am attempting to use SQL Server 2017 FILESTREAM in Python. All of the functionality I use goes through SQLAlchemy, so I am trying to find a way of doing this through it, since I haven't found any implementation within SQLAlchemy or other libraries (I may have missed something; if so, please point me to a working and tested implementation).
I have decided to approach this using the DLL, based on https://github.com/VisionMark/django-mssql-filestream/blob/master/sql_filestream/win32_streaming_api.py . However, my call to OpenSqlFilestream fails and returns -1 instead of a file handle. I have no idea what the issue is or how to fix it.
from ctypes import c_char, sizeof, windll
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import msvcrt
import os

msodbcsql = windll.LoadLibrary(r"C:\Windows\System32\msodbcsql17.dll")
engine = create_engine("mssql+pyodbc://user:pass@test/test?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes")
maker = sessionmaker(bind=engine)
session = maker()
## first query should begin the transaction
path = session.execute("SELECT file_stream.PathName() FROM test_filetable").fetchall()[0][0]
## this returns a str like "\\\\test\\*"
context = session.execute("SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()").fetchall()[0][0]
## returns bytes
_context = (c_char*len(context)).from_buffer_copy(context)
## This call fails
handle = msodbcsql.OpenSqlFilestream(
    path,              # FilestreamPath
    0,                 # DesiredAccess
    0,                 # OpenOptions
    _context,          # FilestreamTransactionContext
    sizeof(_context),  # FilestreamTransactionContextLength
    0                  # AllocationSize
)
## this returns -1 instead of a handle
## Never reached, but this should create a usable file object
desc = msvcrt.open_osfhandle(handle, os.O_RDONLY)
_file = os.fdopen(desc, 'r')
All of the queries work and return (as far as I understand) correct data.
How do I obtain FILESTREAM access to a file on SQL Server 2017 from Python (3.7)?
Edit: The objects I read can be gigabytes in size, and the process only needs streaming access.
My guess is that your issue is related to
the fact that a SQLAlchemy Session is much more than just a raw DB API connection, and/or
the transaction context not being appropriate for your invocation of OpenSqlFilestream (see the sketch below).
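As a rough illustration of the first point (an untested sketch, not code from your post; the connection URL is a placeholder): you can drop down to the raw pyodbc connection that SQLAlchemy wraps, so that PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() and OpenSqlFilestream all see the same connection and the same open transaction:
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:pass@test/test?driver=ODBC+Driver+17+for+SQL+Server")

# raw_connection() returns the underlying pyodbc connection;
# pyodbc runs with autocommit=False, so the first execute opens a transaction.
raw_conn = engine.raw_connection()
try:
    cursor = raw_conn.cursor()
    path = cursor.execute("SELECT file_stream.PathName() FROM test_filetable").fetchval()
    context = cursor.execute("SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()").fetchval()
    # ... call OpenSqlFilestream with `path` and `context` here,
    # while this transaction is still open ...
finally:
    raw_conn.commit()
    raw_conn.close()
Whether that alone is enough to make OpenSqlFilestream succeed I can't say; it only addresses the first point above.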
For what it's worth, the following works for me with CPython 3.7.2 and pythonnet 2.4.0:
import clr
clr.AddReference("System.Data")
from System.Data import IsolationLevel
from System.Data.SqlClient import SqlCommand, SqlConnection
from System.Data.SqlTypes import SqlFileStream
from System.IO import File, FileAccess, FileOptions
# adapted from c# code at
# https://learn.microsoft.com/en-us/dotnet/framework/data/adonet/sql/filestream-data
connection_string = r"Data Source=(local)\SQLEXPRESS;Initial Catalog=myDB;Integrated Security=True"
con = SqlConnection(connection_string)
con.Open()
sql = """\
SELECT Photo.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
FROM employees WHERE EmployeeID = 1"""
cmd = SqlCommand(sql, con)
tran = con.BeginTransaction(IsolationLevel.ReadCommitted)
cmd.Transaction = tran
rdr = cmd.ExecuteReader()
rdr.Read()
path = rdr.GetString(0)
transaction_context = rdr.GetSqlBytes(1).Buffer
rdr.Close()
allocation_size = 0
input_stream = SqlFileStream(path, transaction_context,
                             FileAccess.Read, FileOptions.SequentialScan, allocation_size)
output_stream = File.Create(r"C:\Users\Gord\Desktop\photo.bmp")
input_stream.CopyTo(output_stream)
output_stream.Close()
input_stream.Close()
tran.Commit()
con.Close()
Related
I have a .sql file which I'm trying to load in an online Python environment (JupyterHub) but other code I've found online has just left me confused. I've gotten as far as:
import sqlite3
from sqlite3 import connect
import sqlalchemy
import pandas as pd
sqlite_uri = "sqlite:///basketball.db"
sqlite_engine = sqlalchemy.create_engine(sqlite_uri)
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
sql_file = open("travel-times.sql")
travel = sql_file.read()
travel
sql_expr = """
SELECT *
FROM travel;
"""
pd.read_sql(sql_expr, sqlite_engine)
and calling the 'travel' object does at least print the data in raw form, but from there I'm at a loss as to how to actually load the table. What commands would accomplish this?
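One possible approach (a minimal sketch, assuming travel-times.sql contains plain CREATE TABLE / INSERT statements that SQLite can execute and that it creates the travel table queried above) is to run the whole script with executescript() against the on-disk database and then read the table back with pandas:
import sqlite3

import pandas as pd
from sqlalchemy import create_engine

sqlite_engine = create_engine("sqlite:///basketball.db")

# Execute every statement in the .sql file against the on-disk database.
with open("travel-times.sql") as f, sqlite3.connect("basketball.db") as conn:
    conn.executescript(f.read())

# The table(s) created by the script can now be queried normally.
df = pd.read_sql("SELECT * FROM travel;", sqlite_engine)
print(df.head())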
I am having problems loading data into an Access database. For testing purposes I built a little convert function which takes all data sets from an HDF file and writes them into the accdb. Without the @event.listens_for(engine, "before_cursor_execute") functionality it works, but veeery slowly. With it, the behavior is odd: it creates only one empty table (from the first df) in the db and execution finishes; the for-loop is never completed and no error is raised.
Maybe it's because the sqlalchemy-access package doesn't support fast_executemany, but I couldn't find any related information about that. Does any of you have some input for me on how I can solve this or write the data into the db in a faster way?
big thanks!
import urllib.parse
from pathlib import Path

import pandas as pd
from sqlalchemy import create_engine, event

# PATHS
HOME = Path(__file__).parent
DATA_DIR = HOME / 'output'
FILE_ACCESS = DATA_DIR / 'db.accdb'
FILE_HDF5 = DATA_DIR / 'Data.hdf'

# FUNCTIONS
def convert_from_hdf_to_accb():
    # https://github.com/gordthompson/sqlalchemy-access/wiki/Getting-Connected
    driver = '{Microsoft Access Driver (*.mdb, *.accdb)}'
    conn_str = 'DRIVER={};DBQ={};'.format(driver, FILE_ACCESS)
    conn_url = "access+pyodbc:///?odbc_connect={}".format(urllib.parse.quote_plus(conn_str))

    # https://medium.com/analytics-vidhya/speed-up-bulk-inserts-to-sql-db-using-pandas-and-python-61707ae41990
    # https://github.com/pandas-dev/pandas/issues/15276
    # https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc
    engine = create_engine(conn_url)

    @event.listens_for(engine, "before_cursor_execute")
    def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
        if executemany:
            cursor.fast_executemany = True

    with pd.HDFStore(path=FILE_HDF5, mode="r") as store:
        for key in store.keys():
            df = store.get(key)
            df.to_sql(name=key, con=engine, index=False, if_exists='replace')

    print(' IT NEVER REACHES AND DOESNT RAISE AN ERROR :( ')

# EXECUTE
if __name__ == "__main__":
    convert_from_hdf_to_accb()
Maybe it’s because the sqlalchemy-access package doesn’t support fast_executemany
That is true. pyodbc's fast_executemany feature requires that the driver support an internal ODBC mechanism called "parameter arrays", and the Microsoft Access ODBC driver does not support them.
See also
https://github.com/mkleehammer/pyodbc/wiki/Driver-support-for-fast_executemany
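For contrast, here is a sketch (not part of the answer above; connection details are placeholders) of how fast_executemany can be enabled with a driver that does support parameter arrays, e.g. Microsoft's "ODBC Driver 17 for SQL Server", using SQLAlchemy 1.3+ -- no event listener required:
import pandas as pd
from sqlalchemy import create_engine

# fast_executemany is passed straight through to pyodbc by the mssql+pyodbc dialect.
engine = create_engine(
    "mssql+pyodbc://user:pass@myserver/mydb?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True,
)

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df.to_sql("test_table", con=engine, index=False, if_exists="replace")
With the Access ODBC driver that option simply is not available, so the inserts are sent one row at a time.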
I have a Pyramid / SQLAlchemy, MySQL python app.
When I execute a raw SQL INSERT query, nothing gets written to the DB.
When using ORM, however, I can write to the DB. I read the docs, I read up about the ZopeTransactionExtension, read a good deal of SO questions, all to no avail.
What hasn't worked so far:
transaction.commit() - nothing is written to the DB. I do realize this statement is necessary with ZopeTransactionExtension but it just doesn't do the magic here.
dbsession().commit() - doesn't work since I'm using ZopeTransactionExtension
dbsession().close() - nothing written
dbsession().flush() - nothing written
mark_changed(session) -
File "/home/dev/.virtualenvs/sc/local/lib/python2.7/site-packages/zope/sqlalchemy/datamanager.py", line 198, in join_transaction
    if session.twophase:
AttributeError: 'scoped_session' object has no attribute 'twophase'
What has worked but is not acceptable because it doesn't use scoped_session:
engine.execute(...)
I'm looking for how to execute raw SQL with a scoped_session (dbsession() in my code)
Here is my SQLAlchemy setup (models/__init__.py):
from sqlalchemy import engine_from_config
from sqlalchemy.orm import scoped_session, sessionmaker

_dbsession = None

def dbsession():
    assert (_dbsession is not None)
    return _dbsession

def init_engines(settings, _testing_workarounds=False):
    import zope.sqlalchemy
    extension = zope.sqlalchemy.ZopeTransactionExtension()
    global _dbsession
    _dbsession = scoped_session(
        sessionmaker(
            autoflush=True,
            expire_on_commit=False,
            extension=extension,
        )
    )
    engine = engine_from_config(settings, 'sqlalchemy.')
    _dbsession.configure(bind=engine)
Here is a Python script I wrote to isolate the problem. It resembles the real-world environment in which the problem occurs. All I want is to make the script below insert the data into the DB:
# -*- coding: utf-8 -*-
import sys
import transaction
from pyramid.paster import setup_logging, get_appsettings
from sc.models import init_engines, dbsession
from sqlalchemy.sql.expression import text

def __main__():
    if len(sys.argv) < 2:
        raise RuntimeError()
    config_uri = sys.argv[1]
    setup_logging(config_uri)
    aa = init_engines(get_appsettings(config_uri))
    session = dbsession()
    session.execute(text("""INSERT INTO
        operations (description, generated_description)
        VALUES ('hello2', 'world');"""))
    print list(session.execute("""SELECT * from operations""").fetchall())  # prints inserted data
    transaction.commit()
    print list(session.execute("""SELECT * from operations""").fetchall())  # doesn't print inserted data

if __name__ == '__main__':
    __main__()
What is interesting is that if I do:
session = dbsession()
session.execute(text("""INSERT INTO
operations (description, generated_description)
VALUES ('hello2', 'world');"""))
op = Operation(generated_description='aa', description='oo')
session.add(op)
then the first print outputs the raw SQL inserted row ('hello2' 'world'), and the second print prints both rows, and in fact both rows are inserted into the DB.
I cannot comprehend why using an ORM insert alongside raw SQL "fixes" it.
I really need to be able to call execute() on a scoped_session to insert data into the DB using raw SQL. Any advice?
It has been a while since I mixed raw sql with sqlalchemy, but whenever you mix them, you need to be aware of what happens behind the scenes with the ORM. First, check the autocommit flag. If the zope transaction is not configured correctly, the ORM insert might be triggering a commit.
Actually, after looking at the zope docs, it seems manual execute statements need an extra step. From their readme:
By default, zope.sqlalchemy puts sessions in an 'active' state when they are
first used. ORM write operations automatically move the session into a
'changed' state. This avoids unnecessary database commits. Sometimes it
is necessary to interact with the database directly through SQL. It is not
possible to guess whether such an operation is a read or a write. Therefore we
must manually mark the session as changed when manual SQL statements write
to the DB.
>>> session = Session()
>>> conn = session.connection()
>>> users = Base.metadata.tables['test_users']
>>> conn.execute(users.update(users.c.name=='bob'), name='ben')
<sqlalchemy.engine...ResultProxy object at ...>
>>> from zope.sqlalchemy import mark_changed
>>> mark_changed(session)
>>> transaction.commit()
>>> session = Session()
>>> str(session.query(User).all()[0].name)
'ben'
>>> transaction.abort()
It seems you aren't doing that, and so the transaction.commit does nothing.
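Applied to your script, that would mean getting the actual Session out of the scoped_session registry and marking it as changed before transaction.commit(). A rough, untested sketch (it assumes dbsession() returns the scoped_session registry exactly as in the setup shown above, so calling it yields the underlying Session that mark_changed expects, which should also avoid the 'twophase' AttributeError you saw):
import transaction
from zope.sqlalchemy import mark_changed
from sqlalchemy.sql.expression import text

from sc.models import dbsession

scoped = dbsession()   # the scoped_session registry
session = scoped()     # the actual Session for the current scope
session.execute(text("""INSERT INTO
    operations (description, generated_description)
    VALUES ('hello2', 'world');"""))

# Tell zope.sqlalchemy this session has pending writes; otherwise
# transaction.commit() treats it as unchanged and writes nothing.
mark_changed(session)
transaction.commit()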
Without installing additional modules, how can I use the SQLite backup API to backup an in-memory database to an on-disk database? I have managed to successfully perform a disk-to-disk backup, but passing the already-extant in-memory connection to the sqlite3_backup_init function appears to be the problem.
My toy example, adapted from https://gist.github.com/achimnol/3021995 and cut down to the minimum, is as follows:
import sqlite3
import ctypes
# Create a junk in-memory database
sourceconn = sqlite3.connect(':memory:')
cursor = sourceconn.cursor()
cursor.execute('''CREATE TABLE stocks
(date text, trans text, symbol text, qty real, price real)''')
cursor.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
sourceconn.commit()
target = r'C:\data\sqlite\target.db'
dllpath = u'C:\\Python27\DLLs\\sqlite3.dll'
# Constants from the SQLite 3 API defining various return codes of state.
SQLITE_OK = 0
SQLITE_ERROR = 1
SQLITE_BUSY = 5
SQLITE_LOCKED = 6
SQLITE_OPEN_READONLY = 1
SQLITE_OPEN_READWRITE = 2
SQLITE_OPEN_CREATE = 4
# Tweakable variables
pagestocopy = 20
millisecondstosleep = 100
# dllpath = ctypes.util.find_library('sqlite3') # I had trouble with this on Windows
sqlitedll = ctypes.CDLL(dllpath)
sqlitedll.sqlite3_backup_init.restype = ctypes.c_void_p
# Setup some ctypes
p_src_db = ctypes.c_void_p(None)
p_dst_db = ctypes.c_void_p(None)
null_ptr = ctypes.c_void_p(None)
# Check to see if the first argument (source database) can be opened for reading.
# ret = sqlitedll.sqlite3_open_v2(sourceconn, ctypes.byref(p_src_db), SQLITE_OPEN_READONLY, null_ptr)
#assert ret == SQLITE_OK
#assert p_src_db.value is not None
# Check to see if the second argument (target database) can be opened for writing.
ret = sqlitedll.sqlite3_open_v2(target, ctypes.byref(p_dst_db), SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, null_ptr)
assert ret == SQLITE_OK
assert p_dst_db.value is not None
# Start a backup.
print 'Starting backup of SQLite database "%s" to SQLite database "%s" ...' % (sourceconn, target)
p_backup = sqlitedll.sqlite3_backup_init(p_dst_db, 'main', sourceconn, 'main')
print ' Backup handler: {0:#08x}'.format(p_backup)
assert p_backup is not None

# Step through the backup.
while True:
    ret = sqlitedll.sqlite3_backup_step(p_backup, pagestocopy)
    remaining = sqlitedll.sqlite3_backup_remaining(p_backup)
    pagecount = sqlitedll.sqlite3_backup_pagecount(p_backup)
    print ' Backup in progress: {0:.2f}%'.format((pagecount - remaining) / float(pagecount) * 100)
    if remaining == 0:
        break
    if ret in (SQLITE_OK, SQLITE_BUSY, SQLITE_LOCKED):
        sqlitedll.sqlite3_sleep(millisecondstosleep)

# Finish the backup
sqlitedll.sqlite3_backup_finish(p_backup)
# Close database connections
sqlitedll.sqlite3_close(p_dst_db)
sqlitedll.sqlite3_close(p_src_db)
I receive an error ctypes.ArgumentError: argument 3: <type 'exceptions.TypeError'>: Don't know how to convert parameter 3 on line 49 (p_backup = sqlitedll.sqlite3_backup_init(p_dst_db, 'main', sourceconn, 'main')). Somehow, I need to pass a reference to the in-memory database to that sqlite3_backup_init function.
I do not know enough C to grasp the specifics of the API itself.
Setup: Windows 7, ActiveState Python 2.7
It looks like as of Python 3.7, this functionality is available within the standard library. Here are some examples copied directly out of the official docs:
Example 1, copy an existing database into another:
import sqlite3
def progress(status, remaining, total):
print(f'Copied {total-remaining} of {total} pages...')
con = sqlite3.connect('existing_db.db')
bck = sqlite3.connect('backup.db')
with bck:
    con.backup(bck, pages=1, progress=progress)
bck.close()
con.close()
Example 2, copy an existing database into a transient copy:
import sqlite3
source = sqlite3.connect('existing_db.db')
dest = sqlite3.connect(':memory:')
source.backup(dest)
To answer your specific question of backing up an in-memory database to disk, it looks like this works. Here's a quick script using the standard library backup method:
import sqlite3
source = sqlite3.connect(':memory:')
dest = sqlite3.connect('backup.db')
c = source.cursor()
c.execute("CREATE TABLE test(id INTEGER PRIMARY KEY, msg TEXT);")
c.execute("INSERT INTO test VALUES (?, ?);", (1, "Hello World!"))
source.commit()
source.backup(dest)
dest.close()
source.close()
And the backup.db database can be loaded into sqlite3 and inspected:
$ sqlite3 backup.db
SQLite version 3.24.0 2018-06-04 14:10:15
Enter ".help" for usage hints.
sqlite> .schema
CREATE TABLE test(id INTEGER PRIMARY KEY, msg TEXT);
sqlite> SELECT * FROM test;
1|Hello World!
An in-memory database can be accessed only through the SQLite library that created it (in this case, Python's built-in SQLite).
Python's sqlite3 module does not give access to the backup API, so it is not possible to copy an in-memory database.
You would need to install an additional module, or use an on-disk database in the first place.
While this isn't strictly a solution to your question (as it's not using the backup API), it is a minimal-effort approach and works well for small in-memory databases.
import os
import sqlite3
database = sqlite3.connect(':memory:')
# fill the in memory db with your data here
dbfile = 'dbcopy.db'
if os.path.exists(dbfile):
    os.remove(dbfile)  # remove last db dump
new_db = sqlite3.connect(dbfile)
c = new_db.cursor()
c.executescript("\r\n".join(database.iterdump()))
new_db.close()
@cl is wrong. I solved this problem using ctypes to extract the underlying C pointer of a Python connection object and created a small package: https://pypi.org/project/sqlite3-backup/
Source code is at https://sissource.ethz.ch/schmittu/sqlite3_backup
I have multiple Python scripts writing to MongoDB using PyMongo. How can another Python script observe changes to a Mongo query and perform some function when a change occurs? MongoDB is set up with the oplog enabled.
I wrote an incremental backup tool for MongoDB some time ago, in Python. The tool monitors data changes by tailing the oplog. Here is the relevant part of the code.
Updated answer, MongoDB 3.6+
As datdinhquoc cleverly points out in the comments below, for MongoDB 3.6 and up there are Change Streams.
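With change streams there is no need to tail the oplog yourself; PyMongo 3.6+ exposes them through watch(). A minimal sketch (database and collection names are placeholders; change streams require a replica set or sharded cluster):
from pymongo import MongoClient

client = MongoClient()
collection = client.mydb.mycollection

# Blocks and yields one notification document per change
# (insert, update, replace, delete, ...) on the collection.
with collection.watch() as stream:
    for change in stream:
        print(change)  # do something with the change event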
Updated answer, pymongo 3
from time import sleep

from pymongo import MongoClient, ASCENDING
from pymongo.cursor import CursorType
from pymongo.errors import AutoReconnect

# Time to wait for data or connection.
_SLEEP = 1.0

if __name__ == '__main__':
    oplog = MongoClient().local.oplog.rs
    stamp = oplog.find().sort('$natural', ASCENDING).limit(-1).next()['ts']

    while True:
        kw = {}
        kw['filter'] = {'ts': {'$gt': stamp}}
        kw['cursor_type'] = CursorType.TAILABLE_AWAIT
        kw['oplog_replay'] = True

        cursor = oplog.find(**kw)
        try:
            while cursor.alive:
                for doc in cursor:
                    stamp = doc['ts']
                    print(doc)  # Do something with doc.
                sleep(_SLEEP)
        except AutoReconnect:
            sleep(_SLEEP)
Also see http://api.mongodb.com/python/current/examples/tailable.html.
Original answer, pymongo 2
from time import sleep

from pymongo import MongoClient
from pymongo.cursor import _QUERY_OPTIONS
from pymongo.errors import AutoReconnect
from bson.timestamp import Timestamp

# Tailable cursor options.
_TAIL_OPTS = {'tailable': True, 'await_data': True}

# Time to wait for data or connection.
_SLEEP = 10

if __name__ == '__main__':
    db = MongoClient().local

    while True:
        query = {'ts': {'$gt': Timestamp(some_timestamp, 0)}}  # Replace with your query.
        cursor = db.oplog.rs.find(query, **_TAIL_OPTS)
        cursor.add_option(_QUERY_OPTIONS['oplog_replay'])
        try:
            while cursor.alive:
                try:
                    doc = next(cursor)
                    # Do something with doc.
                except (AutoReconnect, StopIteration):
                    sleep(_SLEEP)
        finally:
            cursor.close()
I ran into this issue today and haven't found an updated answer anywhere.
The Cursor class has changed as of v3.0 and no longer accepts the tailable and await_data arguments. This example will tail the oplog and print the oplog record when it finds a record newer than the last one it found.
# Adapted from the example here: https://jira.mongodb.org/browse/PYTHON-735
# to work with pymongo 3.0
import pymongo
from pymongo.cursor import CursorType

c = pymongo.MongoClient()

# Uncomment this for master/slave.
oplog = c.local.oplog['$main']
# Uncomment this for replica sets.
#oplog = c.local.oplog.rs

first = next(oplog.find().sort('$natural', pymongo.DESCENDING).limit(-1))
ts = first['ts']

while True:
    cursor = oplog.find({'ts': {'$gt': ts}}, cursor_type=CursorType.TAILABLE_AWAIT, oplog_replay=True)
    while cursor.alive:
        for doc in cursor:
            ts = doc['ts']
            print(doc)
            # Work with doc here
Query the oplog with a tailable cursor.
It is actually funny, because oplog-monitoring is exactly what the tailable-cursor feature was added for originally. I find it extremely useful for other things as well (e.g. implementing a mongodb-based pubsub, see this post for example), but that was the original purpose.
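For the pub/sub idea, a rough sketch (collection name and size are placeholders): publish into a capped collection and have subscribers read it with a tailable cursor, exactly as with the oplog examples above.
from pymongo import MongoClient
from pymongo.cursor import CursorType

db = MongoClient().test

# A capped collection acts like a ring buffer and supports tailable cursors.
if 'events' not in db.list_collection_names():
    db.create_collection('events', capped=True, size=2 ** 20)

# Publisher side:
db.events.insert_one({'msg': 'hello'})

# Subscriber side: the cursor stays open and yields new documents as they arrive.
cursor = db.events.find(cursor_type=CursorType.TAILABLE_AWAIT)
for doc in cursor:
    print(doc)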
I had the same issue. I put together this rescommunes/oplog.py. Check comments and see __main__ for an example of how you could use it with your script.