All,
I am attempting to load data into blaze from a hive2 thrift server. I would like to do some analysis similar to what is posted here. Here is my current process.
import blaze as bz
import sqlalchemy
from impala.dbapi import connect
conn = connect(host='myhost.url.com', port=10000, database='mydb', user='hive', auth_mechanism='PLAIN')
engine = sqlalchemy.create_engine('hive://', creator=conn)
data = bz.data(engine)
I am able to make the connection and generate the engine, but when I run bz.data it fails with the error
TypeError: 'HiveServer2Connection' object is not callable
Any help is appreciated.
Answer
from pyhive import hive
import sqlalchemy
from impala.dbapi import connect

def conn():
    return connect(host='myhost.com', port=10000, database='database', user='username', auth_mechanism='PLAIN')

engine = sqlalchemy.create_engine('hive://', creator=conn)

# Workaround
import blaze as bz
data = bz.data(engine)
I was having this same issue when using impyla to connect to Impala with SQLAlchemy. Making conn a function that returns a new connection, instead of assigning a connection object to it, worked: create_engine's creator argument expects a zero-argument callable, not an already-open connection.
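Equivalently, you can pass a lambda as the creator. A minimal sketch, reusing the hypothetical host and credentials from above:

import sqlalchemy
from impala.dbapi import connect

# creator must be a callable that returns a fresh DBAPI connection each time
engine = sqlalchemy.create_engine(
    'hive://',
    creator=lambda: connect(host='myhost.com', port=10000, database='database',
                            user='username', auth_mechanism='PLAIN'))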
Related
I want to run this simple script:
from pyhive import hive
import sqlalchemy
from impala.dbapi import connect
import pandas as pd
def conn():
    return connect(host='mid.impala.mycompany.com', port=21050, auth_mechanism='GSSAPI', use_ssl=True, kerberos_service_name='impala', ca_cert='/opt/cloudera/security/pki/SSrootCA.pem')
engine = sqlalchemy.create_engine('impala://', creator=conn)
pd.read_sql("SELECT * FROM giadb.a002_fnp_100 LIMIT 100", engine)
But I got this error:
"TTransportException: TTransportException(type=1, message="Could not connect to ('mid.impala.mycompany.com', 21050)")
The Impala service is load balanced, so I think I have to set up the connection string properly, but I need some help.
Thank you
Gianluca
I am trying to connect to an Oracle database using an ODBC connection string. I am able to make the connection using pyodbc:
import pyodbc
import pandas as pd
connection_string = 'DRIVER={Oracle};DBQ=X.X.X.X/YY/dbname;UID=someuser;PWD=XXXXXX'
cnxn = pyodbc.connect(connection_string)
but I am not able to connect using SQLAlchemy.
import urllib
from sqlalchemy.engine import create_engine

params = urllib.parse.quote_plus(connection_string)
db_engine = create_engine(f"cx-Oracle+pyodbc:///?odbc_connect={params}")
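A note on why this fails: SQLAlchemy has no built-in oracle+pyodbc dialect, and a dialect name cannot contain a hyphen, so the cx-Oracle+pyodbc URL above can never resolve. A minimal sketch of an alternative using the built-in cx_Oracle dialect (assumes the cx_Oracle driver is installed; the host, port 1521, and service name are hypothetical placeholders):

from sqlalchemy import create_engine, text

# all connection details below are placeholders
engine = create_engine(
    "oracle+cx_oracle://someuser:XXXXXX@X.X.X.X:1521/?service_name=dbname")
with engine.connect() as con:
    print(con.execute(text("SELECT 1 FROM dual")).scalar())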
I can successfully connect to my SQL Server instance from my Jupyter notebook with this script:
from sqlalchemy import create_engine
import pyodbc
import csv
import time
import urllib
params = urllib.parse.quote_plus('''DRIVER={SQL Server Native Client 11.0};
SERVER=SV;
DATABASE=DB;
TRUSTED_CONNECTION=YES;''')
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
I managed to execute some SQL scripts like this:
engine.execute("delete from table_name_X")
However, I can't execute stored procedures. I tried the following scripts based on what I've seen about stored procedures with SQLAlchemy. They return something like "sqlalchemy.engine.result.ResultProxy at 0x173ed18e470", but the procedure isn't actually executed (nothing happens):
# test 1
engine.execute('stored_procedure_name')
# test 2
from sqlalchemy import func
from sqlalchemy.orm import sessionmaker
session = sessionmaker(bind=engine)()
session.execute(func.upper('stored_procedure_name'))
Could you please give me the correct way to execute stored procedures?
The way to call a stored procedure using pyodbc is:
cursor.execute("{CALL usp_StoreProcedure}")
I found a solution by referring to this link: https://github.com/mkleehammer/pyodbc/wiki/Calling-Stored-Procedures
Here is an example:
import pyodbc
import urllib
import sqlalchemy as sa

params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
                                 "SERVER=xxx.xxx.xxx.xxx;"
                                 "DATABASE=DB;"
                                 "UID=user;"
                                 "PWD=pass")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
connection = engine.raw_connection()
try:
    cursor = connection.cursor()
    cursor.execute("{CALL stored_procedure_name}")
    result = cursor.fetchall()
    print(result)
    connection.commit()
finally:
    connection.close()
I finally solved my problem with the following function:
def execute_stored_procedure(engine, procedure_name):
    res = {}
    connection = engine.raw_connection()
    try:
        cursor = connection.cursor()
        cursor.execute("EXEC " + procedure_name)
        cursor.close()
        connection.commit()
        res['status'] = 'OK'
    except Exception as e:
        res['status'] = 'ERROR'
        res['error'] = e
    finally:
        connection.close()
    return res
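For example, with the engine defined above (the procedure name is a placeholder):

result = execute_stored_procedure(engine, "stored_procedure_name")
print(result['status'])  # 'OK' on success, 'ERROR' otherwise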
I'm downloading the tables from my MySQL server on PythonAnywhere.com using an SSH tunnel, following their documentation. With the following code everything works fine in terms of downloading the tables, but the code then hangs at tunnel.close(). Any suggestions on how to stop it from hanging?
from __future__ import print_function
from mysql.connector import connect as sql_connect
import sshtunnel
from sshtunnel import SSHTunnelForwarder
from copy import deepcopy
import cPickle as pickle
import os
import datetime
sshtunnel.SSH_TIMEOUT = 5.0
sshtunnel.TUNNEL_TIMEOUT = 5.0
remote_bind_address = ('{}.mysql.pythonanywhere-services.com'.format(SSH_USERNAME), 3306)
tunnel = SSHTunnelForwarder(('ssh.pythonanywhere.com'),
ssh_username=SSH_USERNAME, ssh_password=SSH_PASSWORD,
remote_bind_address=remote_bind_address)
tunnel.start()
connection = sql_connect(user=SSH_USERNAME, password=DATABASE_PASSWORD,
host='127.0.0.1', port=tunnel.local_bind_port,
database=DATABASE_NAME)
print("Connection successful!")
cursor = connection.cursor() # get the cursor
cursor.execute("USE {}".format(DATABASE_NAME)) # select the database
# fetch all tables
cursor.execute("SHOW TABLES")
tables = deepcopy(cursor.fetchall()) # return data from last query
for (table_name,) in tables:
    if 'contribute' in table_name:
        print(table_name)
# may hang
connection.close()
tunnel.close()
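One workaround sometimes suggested for sshtunnel hanging on close is to mark the tunnel's internal threads as daemon threads, or to run the tunnel as a context manager so cleanup happens on exit. A minimal sketch under those assumptions (untested against PythonAnywhere):

tunnel = SSHTunnelForwarder(
    'ssh.pythonanywhere.com',
    ssh_username=SSH_USERNAME, ssh_password=SSH_PASSWORD,
    remote_bind_address=remote_bind_address)
# daemon threads cannot keep the process alive if close() stalls
tunnel.daemon_forward_servers = True
tunnel.daemon_transport = True
tunnel.start()
try:
    pass  # open the MySQL connection and run queries as above
finally:
    tunnel.close()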
I'm new to Hadoop and Impala. I managed to connect to Impala by installing impyla and executing the following code. This is a connection via LDAP:
from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host="server.lrd.com", port=21050, database='tcad', auth_mechanism='PLAIN', user="alexcj", use_ssl=True, timeout=20, password="secret1pass")
I'm then able to grab a cursor and execute queries as:
cursor = conn.cursor()
cursor.execute('SELECT * FROM tab_2014_m LIMIT 10')
df = as_pandas(cursor)
I'd like to be able to use SQLAlchemy to connect to Impala and take advantage of some of its nice functions. I found a test file in the impyla source code that illustrates how to create a SQLAlchemy engine with the Impala driver, like:
engine = create_engine('impala://localhost')
I'd like to do that, but my call to the connect function above takes many more parameters, and I don't know how to pass those to SQLAlchemy's create_engine to get a successful connection. Has anyone done this? Thanks.
As explained at https://github.com/cloudera/impyla/issues/214:

import sqlalchemy
from impala.dbapi import connect

def conn():
    return connect(host='some_host',
                   port=21050,
                   database='default',
                   timeout=20,
                   use_ssl=True,
                   ca_cert='some_pem',
                   user=user, password=pwd,
                   auth_mechanism='PLAIN')

engine = sqlalchemy.create_engine('impala://', creator=conn)
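With the engine created this way, normal SQLAlchemy and pandas calls work against Impala. A small usage example (the table name is a placeholder):

import pandas as pd

df = pd.read_sql("SELECT * FROM some_table LIMIT 10", engine)
print(df.head())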
If your Impala is secured by Kerberos, the script below works (for some reason I need to use hive:// instead of impala://):
import sqlalchemy
from sqlalchemy.engine import create_engine
connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'impala'}
engine = create_engine('hive://impalad-host:21050', connect_args=connect_args)
conn = engine.connect()
ResultProxy = conn.execute("SELECT * FROM db1.table1 LIMIT 5")
print(ResultProxy.fetchall())
import time
from sqlalchemy import create_engine, MetaData, Table, select, and_

ENGINE = create_engine(
    'impala://{host}:{port}/{database}'.format(
        host=host,  # your host
        port=port,
        database=database,
    )
)
METADATA = MetaData(ENGINE)  # bind the metadata to the engine (SQLAlchemy 1.x style)
TABLES = {
    # reflect the table definition from Impala at load time
    'table': Table('table_name', METADATA, autoload=True),
}
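Since select and and_ are imported, here is a short usage sketch on top of the reflected table (the column names are hypothetical):

t = TABLES['table']
query = select([t]).where(and_(t.c.year == 2014, t.c.month == 1)).limit(10)
rows = ENGINE.execute(query).fetchall()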