Apache Airflow - Connection issue to MS SQL Server using pymssql + SQLAlchemy - python

I am facing a problem to connect to an Azure MS SQL Server 2014 database in Apache Airflow 1.10.1 using pymssql.
I want to use the MsSqlHook class provided by Airflow, for the convenience to create my connection in the Airflow UI, and then create a context manager for my connection using SqlAlchemy:
#contextmanager
def mssql_session(dt_conn_id):
sqla_engine = MsSqlHook(mssql_conn_id=dt_conn_id).get_sqlalchemy_engine()
session = sessionmaker(bind=sqla_engine)()
try:
yield session
except:
session.rollback()
raise
else:
session.commit()
finally:
session.close()
But when I do that, I have this error when I run a request :
sqlalchemy.exc.InterfaceError: (pyodbc.InterfaceError) ('IM002',
'[IM002] [unixODBC][Driver Manager]Data source name not found, and no
default driver specified (0) (SQLDriverConnect)') (Background on this
error at: http://sqlalche.me/e/rvf5)
It seems come from pyodbc whereas I want to use pymssql (and in MsSqlHook, the method get_conn uses pymssql !)
I searched in the source code of Airflow the cause.
I noticed that the method get_uri from the class DbApiHook (from which is inherited MsSqlHook) builds the connection string passed to SqlAlchemy like this:
'{conn.conn_type}://{login}{host}/{conn.schema}'
But conn.conn_type is simply equal to 'mssql' whereas we need to specify the DBAPI as described here:
https://docs.sqlalchemy.org/en/latest/core/engines.html#microsoft-sql-server
(for example : 'mssql+pymssql://scott:tiger#hostname:port/dbname')
So, by default, I think it uses pyodbc.
But how can I set properly the conn_type of the connection to 'mssql+pymssql' instead of 'mssql' ?
In the Airflow IU, you can simply select SQL server in a dropdown list, but not set as you want :
To work around the issue, I overload the get_uri method from DbApiHook in a new class I created inherited from MsSqlHook, and in which I build my own connection string, but it's not clean at all...
Thanks for any help

You're right. There's no easy, straightforward way to get Airflow to do what you want. Personally I would build the sqlalchemy engine inside of your context manager, something like create_engine(hook.get_uri().replace("://", "+pymssql://")) -- then I would toss the code somewhere reusable.

You can create a connection by passing it as an environment variable to Airflow. See the docs. The value of the variable is the database URL in the format SqlAlchemy accepts.
The name of the env var follows the pattern AIRFLOW_CONN_ to which you append the connection ID. For example AIRFLOW_CONN_MY_MSSQL, in this case, the conn_id would be 'my_mssql'.

Related

pyscopg2 WITHOUT transaction

Sometimes I have a need to execute a query from psycopg2 that is not in a transaction block.
For example:
cursor.execute('create index concurrently on my_table (some_column)')
Doesn't work:
InternalError: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
I don't see any easy way to do this with psycopg2. What am I missing?
I can probably call os.system('psql -c "create index concurrently"') or something similar to get it to run from my python code, however it would be much nicer to be able to do it inside python and not rely on psql to actually be in the container.
Yes, I have to use the concurrently option for this particular use case.
Another time I've explored this and not found an obvious answer is when I have a set of sql commands that I'd like to call with a single execute(), where the first one briefly locks a resource. When I do this, that resource will remain locked for the entire duration of the execute() rather than for just when the first statement in the sql string was running because they all run together in one big happy transaction.
In that case I could break the query up into a series of execute() statements - each became its own transaction, which was ok.
It seems like there should be a way, but I seem to be missing it. Hopefully this is an easy answer for someone.
EDIT: Add code sample:
#!/usr/bin/env python3.10
import psycopg2 as pg2
# -- set the standard psql environment variables to specify which database this should connect to.
# We have to set these to 'None' explicitly to get psycopg2 to use the env variables
connDetails = {'database': None, 'host': None, 'port': None, 'user': None, 'password': None}
with (pg2.connect(**connDetails) as conn, conn.cursor() as curs):
conn.set_session(autocommit=True)
curs.execute("""
create index concurrently if not exists my_new_index on my_table (my_column);
""")
Throws:
psycopg2.errors.ActiveSqlTransaction: CREATE INDEX CONCURRENTLY cannot run inside a transaction block
Per psycopg2 documentation:
It is possible to set the connection in autocommit mode: this way all the commands executed will be immediately committed and no rollback is possible. A few commands (e.g. CREATE DATABASE, VACUUM, CALL on stored procedures using transaction control…) require to be run outside any transaction: in order to be able to run these commands from Psycopg, the connection must be in autocommit mode: you can use the autocommit property.
Hence on the connection:
conn.set_session(autocommit=True)
Further resources from psycopg2 documentation:
transactions-control
connection.autocommit

Application name in cx_Oracle

I am connecting to an Oracle Database Server using a SessionPool from cx_Oracle. When I look at the description of the opened session in the Oracle developer, I see that its name is "python.exe". How can I set the application/module name in cx_oracle?
You may be able to physically rename python.exe but there is no programmatic way to change what is shown in Oracle Database as the executable.
You can set the driver name by calling cx_Oracle.init_oracle_client. This changes the CLIENT_DRIVER column of V$SESSION_CONNECT_INFO.
Other settable attributes including 'module' (which in Oracle terminology is not the program name) are shown in the documentation Oracle Database End-to-End Tracing.
# Set the tracing metadata
connection.client_identifier = "pythonuser"
connection.action = "Query Session tracing parameters"
connection.module = "End-to-end Demo"
for row in cursor.execute("""
SELECT username, client_identifier, module, action
FROM V$SESSION
WHERE username = 'SYSTEM'"""):
print(row)

SQLAlchemy: Get the database name from connection

I wonder if it is possible to get the database name after a connection. I known that it is possible with the engine created by the function 'create_engine' (here) and I would like to have the same possibility after a connection.
from sqlalchemy import create_engine, inspect
engine = create_engine('mysql+mysqldb://login:pass#localhost/MyDatabase')
print (engine.url.database) # print the dabase name with an engine
con = engine.connect()
I looked at the inspector tool, but there is no way to retrieve the database name like:
db_name = inspect(con.get_database_name() )
May be it is not possible. Any idea ?
Thanks a lot!
For MySQL, executing select DATABASE() as name_of_current_database should be sufficient. For SQL Server, it would be select DB_NAME() as name_of_current_database. I do not know of any inherently portable way of doing this that will work irrespective of the backend.

How to executescript in sqlite3 from Python transactionally? [duplicate]

Context
So I am trying to figure out how to properly override the auto-transaction when using SQLite in Python. When I try and run
cursor.execute("BEGIN;")
.....an assortment of insert statements...
cursor.execute("END;")
I get the following error:
OperationalError: cannot commit - no transaction is active
Which I understand is because SQLite in Python automatically opens a transaction on each modifying statement, which in this case is an INSERT.
Question:
I am trying to speed my insertion by doing one transaction per several thousand records.
How can I overcome the automatic opening of transactions?
As #CL. said you have to set isolation level to None. Code example:
s = sqlite3.connect("./data.db")
s.isolation_level = None
try:
c = s.cursor()
c.execute("begin")
...
c.execute("commit")
except:
c.execute("rollback")
The documentaton says:
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.

Calling python script in web2py framework over webserver

I have NGINX UWSGI and WEB2PY installed on the server. Web2py application performing only one function by accessing the database and printing rows in the table.
def fetch():
import psycopg2
conn = psycopg2.connect(database="postgres",
user="postgres",
password="qwerty",
host="127.0.0.1")
cur = conn.cursor()
cur.execute("SELECT id, name from TEST")
rows = cur.fetchall()
conn.close()
return rows
When the function is called locally the table contents is returned.
But when I'm trying to call the function from remote machine I get an internal error 500.
One more interesting thing, is when function looks like this:
def hello():
return 'hello'
String 'hello' is returned. Starting adding it an import directive immediately causes error page to be generated.
Can any one please suggest the proper application syntax/logic?
My guess is that your MySQL service doesn't allow remote access. Could you check your MySQL configuration?
vim /etc/mysql/my.cnf
Comment out the following lines.
#bind-address = 127.0.0.1
#skip-networking
If there is no skip-networking line in your configuration file, just add it and comment out it.
And then restart the mysql service.
service mysql restart
Forgive the stupid question but have you checked if the module is available on your server?
When you say that the error appears in your hello function as soon as you try to import, it's the same directive import psycopg2?
Try this:
Assuming that fetch() it's defined on controllers/default.py
open folder views/default and create a new file called fetch.html
paste this inside
{{extend 'layout.html'}}
{{=rows}}
fetch.html is a view or a template if you prefer
Modify fetch() to return a dictionary with rows for the view to print
return dict(rows=rows)
this is very basic tough, you can find more information about basic steps in the book -> http://www.web2py.com/books/default/chapter/29/03/overview#Postbacks

Categories

Resources