impala connection via sqlalchemy - python

I'm new to hadoop and impala. I managed to connect to impala by installing impyla and executing the following code. This is a connection using LDAP:
from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host="server.lrd.com",port=21050, database='tcad',auth_mechanism='PLAIN', user="alexcj", use_ssl=True,timeout=20, password="secret1pass")
I'm then able to grab a cursor and execute queries as:
cursor = conn.cursor()
cursor.execute('SELECT * FROM tab_2014_m LIMIT 10')
df = as_pandas(cursor)
I'd like to use sqlalchemy to connect to impala so that I can use some of the nice sqlalchemy functions. I found a test file in the impyla source code that illustrates how to create an sqlalchemy engine with the impala driver, like this:
engine = create_engine('impala://localhost')
I'd like to be able to do that but I'm not able to because my call to the connect function above has a lot more parameters; and I do not know how to pass those to sqlalchemy's create_engine to get a successful connection. Has anyone done this? Thanks.

As explained at https://github.com/cloudera/impyla/issues/214
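import sqlalchemy
from impala.dbapi import connect
def conn():
    # user and pwd must be defined elsewhere
    return connect(host='some_host',
                   port=21050,
                   database='default',
                   timeout=20,
                   use_ssl=True,
                   ca_cert='some_pem',
                   user=user, password=pwd,
                   auth_mechanism='PLAIN')
# creator must be a callable that returns a new DBAPI connection
engine = sqlalchemy.create_engine('impala://', creator=conn)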
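When the connection needs many parameters, one option is to bind them once and hand SQLAlchemy a zero-argument callable. The following is a minimal sketch, not impyla's documented API: `make_creator` and `make_impala_engine` are hypothetical helper names, and `make_impala_engine` assumes that both SQLAlchemy and impyla are installed at the time it is called.

```python
from functools import partial


def make_creator(connect_fn, **connect_kwargs):
    """Bind keyword arguments to a DBAPI connect function.

    The result takes no arguments, which is the shape that
    create_engine(..., creator=...) expects.
    """
    return partial(connect_fn, **connect_kwargs)


def make_impala_engine(**connect_kwargs):
    """Hypothetical helper: build an engine whose connections come from impyla."""
    # Imported lazily so make_creator can be used without these packages.
    import sqlalchemy
    from impala.dbapi import connect

    return sqlalchemy.create_engine(
        "impala://", creator=make_creator(connect, **connect_kwargs)
    )
```

Calling `make_impala_engine(host=..., port=21050, database=..., auth_mechanism='PLAIN', user=..., password=..., use_ssl=True, timeout=20)` would then mirror the `connect` call from the question.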

If your Impala is secured by Kerberos, the script below works (for some reason I need to use hive:// instead of impala://):
import sqlalchemy
from sqlalchemy.engine import create_engine
connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'impala'}
engine = create_engine('hive://impalad-host:21050', connect_args=connect_args)
conn = engine.connect()
ResultProxy = conn.execute("SELECT * FROM db1.table1 LIMIT 5")
print(ResultProxy.fetchall())

import time
from sqlalchemy import create_engine, MetaData, Table, select, and_
ENGINE = create_engine(
    'impala://{host}:{port}/{database}'.format(
        host=host,  # your host
        port=port,
        database=database,
    )
)
METADATA = MetaData(ENGINE)
TABLES = {
    'table': Table('table_name', METADATA, autoload=True),
}

Related

How to connect to Oracle-DB ODBC connection string using SQLAlchemy?

I am trying to connect to oracle-db using an ODBC connection string. I am able to make the connection using pyodbc
import pyodbc
import pandas as pd
connection_string = 'DRIVER={Oracle};DBQ=X.X.X.X/YY/dbname;UID=someuser;PWD=XXXXXX'
cnxn = pyodbc.connect(connection_string)
but I am not able to connect using SQLAlchemy.
import urllib.parse
from sqlalchemy.engine import create_engine
params = urllib.parse.quote_plus(connection_string)
db_engine = create_engine(f"cx-Oracle+pyodbc:///?odbc_connect={params}")

How to set module name in SQLAlchemy and cx_Oracle with create_engine()?

In SQLAlchemy I create an engine with :
engine = create_engine(url="oracle+cx_oracle://user:xxxx@tns")
In cx_Oracle, I would create a connection with :
conn = cx_Oracle.connect(user="user", password="xxxx", dsn="tns")
I can then set the module with Connection.module attribute which tags appropriately when looking at v$session table.
conn.module = "MyModule"
Is there a way to set the Oracle session module name to an sqlalchemy.engine.Engine once it is created with create_engine?
I ended up using the DialectEvents.do_connect() hook, which worked nicely for me.
import cx_Oracle
from sqlalchemy import create_engine, event
engine = create_engine(url="oracle+cx_oracle://user:xxxx@tns")
@event.listens_for(engine, "do_connect")
def receive_do_connect(dialect, conn_rec, cargs, cparams):
    """listen for the 'do_connect' event"""
    connection = cx_Oracle.connect(*cargs, **cparams)
    connection.module = "MyModule"
    return connection
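A variation on the hook above, as a sketch only: the tagging logic is pulled into a plain function so it can be tried without a database. `connect_and_tag` and `register_module_tag` are hypothetical names; only `event.listens_for(engine, "do_connect")`, `cx_Oracle.connect` and the `module` attribute come from the answer itself.

```python
def connect_and_tag(connect_fn, module_name, *cargs, **cparams):
    """Open a connection with connect_fn and set its module attribute."""
    connection = connect_fn(*cargs, **cparams)
    connection.module = module_name
    return connection


def register_module_tag(engine, module_name):
    """Hypothetical helper: attach a do_connect listener to an existing engine."""
    # Imported lazily so connect_and_tag can be exercised on its own.
    import cx_Oracle
    from sqlalchemy import event

    @event.listens_for(engine, "do_connect")
    def _receive_do_connect(dialect, conn_rec, cargs, cparams):
        # Returning a connection here replaces the dialect's default connect call.
        return connect_and_tag(cx_Oracle.connect, module_name, *cargs, **cparams)
```

Because the listener is registered on the engine, every new connection the pool opens goes through it, not just the first one.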

Connect to Oracle database from SQLAlchemy using pyodbc

I have set up a data source name (DSN) in the ODBC driver and I am supplying it in the connection string.
The code below works like a charm.
import pyodbc as db
cnxn = db.connect('DSN=Oracle Prod DW;PWD=******')
I want to create a SQLAlchemy connection for the same DSN, but it fails. I tried different approaches, but none of them worked. I just want to supply a password and a DSN.
The combination of the Oracle dialect and an ODBC driver does not seem to be supported by SQLAlchemy:
https://docs.sqlalchemy.org/en/13/core/engines.html#oracle
Apparently you can only do that in a Java runtime:
https://docs.sqlalchemy.org/en/13/dialects/oracle.html#module-sqlalchemy.dialects.oracle.zxjdbc
https://www.jython.org/jython-old-sites/archive/21/docs/zxjdbc.html
That being said, if you have an Oracle client installation with a proper tnsnames setup, you can do something like the following.
Install cx_Oracle
Set up tnsnames, i.e.:
DEVDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.10.10.11)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = SVCDEV)
    )
  )
Code
import sqlalchemy as alc
from sqlalchemy.orm import sessionmaker
import cx_Oracle
import pandas as pd
conn_str = 'oracle://DEVDB'
engine = alc.create_engine(conn_str, echo=False)
Session = sessionmaker(bind=engine)
# YOU MIGHT NEED THIS sometimes
# cx_Oracle.init_oracle_client(lib_dir=r"C:\oracle\x64\product\19.0.0\client_1\bin")
sess = Session()
result = sess.execute("select 'foo' from dual")
df = pd.DataFrame(result.fetchall(), columns=result.keys())
print(df.to_string())
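The connection string above carries no credentials. If they have to go into the URL, the password should be percent-encoded so that characters such as @ or / do not break URL parsing. A minimal sketch using only the standard library follows; `oracle_url` is a hypothetical helper, and the `oracle+cx_oracle://user:password@tns_alias` layout is assumed from the usual SQLAlchemy URL convention.

```python
from urllib.parse import quote


def oracle_url(user, password, tns_alias):
    """Build an oracle+cx_oracle URL with percent-encoded credentials."""
    return "oracle+cx_oracle://{}:{}@{}".format(
        quote(user, safe=""), quote(password, safe=""), tns_alias
    )


# Example values are made up for illustration.
print(oracle_url("scott", "p@ss/word", "DEVDB"))
# prints: oracle+cx_oracle://scott:p%40ss%2Fword@DEVDB
```

The resulting string can be passed to `create_engine` in place of `conn_str`.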

Launch SQL stored procedures from python with sqlalchemy?

I can successfully connect to SQL Server Management Studio from my jupyter notebook with this script:
from sqlalchemy import create_engine
import pyodbc
import csv
import time
import urllib
params = urllib.parse.quote_plus('''DRIVER={SQL Server Native Client 11.0};
SERVER=SV;
DATABASE=DB;
TRUSTED_CONNECTION=YES;''')
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
I managed to execute some SQL scripts like this:
engine.execute("delete from table_name_X")
However, I can't execute stored procedures. I tried the following scripts, based on what I've seen about stored procedures with SQLAlchemy. They return something like "sqlalchemy.engine.result.ResultProxy at 0x173ed18e470", but the procedure is not actually executed (nothing happens):
# test 1
engine.execute('stored_procedure_name')
# test 2
from sqlalchemy import func
from sqlalchemy.orm import sessionmaker
session = sessionmaker(bind=engine)()
session.execute(func.upper('stored_procedure_name'))
Could you please give me the correct way to execute stored procedures?
The way to call a stored procedure using pyodbc is:
cursor.execute("{CALL usp_StoreProcedure}")
I found a solution with reference to this link: https://github.com/mkleehammer/pyodbc/wiki/Calling-Stored-Procedures
Here is an example:
import pyodbc
import urllib.parse
import sqlalchemy as sa
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
                                 "SERVER=xxx.xxx.xxx.xxx;"
                                 "DATABASE=DB;"
                                 "UID=user;"
                                 "PWD=pass")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
# raw_connection() returns the underlying DBAPI (pyodbc) connection
connection = engine.raw_connection()
try:
    cursor = connection.cursor()
    cursor.execute("{CALL stored_procedure_name}")
    result = cursor.fetchall()
    print(result)
    connection.commit()
finally:
    connection.close()
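The `odbc_connect` value has to be URL-encoded because a raw ODBC string contains characters such as `;`, `=`, `{` and spaces that would otherwise be misread as part of the URL. A small sketch of what that step produces, using only the standard library; `mssql_odbc_url` is a hypothetical helper name and the connection string values are the placeholders from the question.

```python
from urllib.parse import quote_plus, unquote_plus


def mssql_odbc_url(odbc_connection_string):
    """Wrap a raw ODBC connection string in an mssql+pyodbc URL."""
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_connection_string)


raw = (
    "DRIVER={SQL Server Native Client 11.0};"
    "SERVER=SV;"
    "DATABASE=DB;"
    "TRUSTED_CONNECTION=YES;"
)
url = mssql_odbc_url(raw)

# Decoding the query value gives back the original ODBC string.
decoded = unquote_plus(url.split("=", 1)[1])
print(decoded == raw)
```

Building the string on one line also avoids the newlines and indentation that a triple-quoted string would otherwise carry into the ODBC connection string.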
I finally solved my problem with the following function:
def execute_stored_procedure(engine, procedure_name):
    res = {}
    connection = engine.raw_connection()
    try:
        cursor = connection.cursor()
        cursor.execute("EXEC "+procedure_name)
        cursor.close()
        connection.commit()
        res['status'] = 'OK'
    except Exception as e:
        res['status'] = 'ERROR'
        res['error'] = e
    finally:
        connection.close()
    return res
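The function above builds the statement by string concatenation, which leaves no room for arguments. A sketch of a parameterised variant follows, assuming a pyodbc-backed engine; `build_call` and `call_procedure` are hypothetical names, and the `{CALL name (?, ?)}` escape syntax is the one described on the pyodbc wiki page linked earlier. The procedure name itself cannot be bound as a parameter, so it should only ever come from trusted code.

```python
def build_call(procedure_name, param_count):
    """Return an ODBC call escape with one ? placeholder per parameter."""
    if param_count == 0:
        return "{CALL " + procedure_name + "}"
    placeholders = ", ".join("?" * param_count)
    return "{CALL " + procedure_name + " (" + placeholders + ")}"


def call_procedure(engine, procedure_name, params=()):
    """Run a stored procedure through the raw DBAPI connection and commit."""
    connection = engine.raw_connection()
    try:
        cursor = connection.cursor()
        # Values are bound by the driver rather than pasted into the SQL text.
        cursor.execute(build_call(procedure_name, len(params)), tuple(params))
        cursor.close()
        connection.commit()
    finally:
        connection.close()
```

Unlike the function above, this sketch lets exceptions propagate to the caller instead of returning a status dictionary; either choice works, as long as the connection is closed in `finally`.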

How to load data into blaze from hive2

All,
I am attempting to load data into blaze from a hive2 thrift server. I would like to do some analysis similar to what is posted here. Here is my current process.
import blaze as bz
import sqlalchemy
import impala
conn = connect(host='myhost.url.com', port=10000, database='mydb', user='hive', auth_mechanism='PLAIN')
engine = sqlalchemy.create_engine('hive://', creator=conn)
data = bz.data(engine)
I am able to make the connection and generate the engine, but when I run bz.data it fails with the error
TypeError: 'HiveServer2Connection' object is not callable
Any help is appreciated.
Answer
from pyhive import hive
import sqlalchemy
from impala.dbapi import connect
def conn():
    return connect(host='myhost.com', port=10000, database='database', user='username', auth_mechanism='PLAIN')
engine = sqlalchemy.create_engine('hive://', creator=conn)
#Workaround
import blaze as bz
data = bz.data(engine)
I was having this same issue when using impyla to connect to Impala with SQLAlchemy. Making conn a function instead of assigning it to a variable worked.
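The error can be reproduced without any database. In the sketch below, `FakeConnection` and `open_connection` are hypothetical stand-ins: `open_connection` mimics the one thing a connection pool does with `creator`, which is to call it.

```python
class FakeConnection:
    """Stand-in for a DBAPI connection object (hypothetical)."""


def open_connection(creator):
    """Mimic a connection pool: call creator() to obtain a new connection."""
    return creator()


def conn():
    return FakeConnection()


# Passing a function works: each call yields a fresh connection.
first = open_connection(conn)
second = open_connection(conn)

# Passing an already-open connection fails, as in the question.
try:
    open_connection(FakeConnection())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
# prints: 'FakeConnection' object is not callable
```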
