How to access Hive on a remote server using a Python client

Case: I have Hive on a Cloudera platform. There is a database on Hive that I want to access using a Python client from my computer. I read a similar SO question, but it uses pyhs2, which I am unable to install on the remote server. Another SO question uses Thrift, but I can't seem to install that either.
Code: After following the documentation, when I execute the following program it gives me an error.
import pyodbc, sys, os
pyodbc.autocommit=True
con = pyodbc.connect("DSN=default",driver='SQLDriverConnect',autocommit=True)
cursor = con.cursor()
cursor.execute("select * from fb_mpsp")
Error: ssh://ashish@ServerIPAddress/home/ashish/anaconda/bin/python2.7 -u /home/ashish/PyCharm_proj/hdfsConnect/home/ashish/PyCharm_proj/hdfsConnect/Hive_connect/hive_connect.py
Traceback (most recent call last):
File "/home/ashish/PyCharm_proj/hdfsConnect/home/ashish/PyCharm_proj/hdfsConnect/Hive_connect/hive_connect.py", line 5, in <module>
con = pyodbc.connect("DSN=default", driver='SQLDriverConnect',autocommit=True)
pyodbc.Error: ('IM002', '[IM002] [unixODBC][Driver Manager]Data source name not found, and no default driver specified (0) (SQLDriverConnect)')
Process finished with exit code 1
Please suggest how I can solve this problem. Also, I am not sure why I have to specify the driver as SQLDriverConnect when the code will be executed using Hadoop Hive.
Thanks

This worked for me
oODBC = pyodbc.connect("DSN=Cloudera Hive DSN 64;", autocommit = True, ansi = True )
And now everything works fine.
Make sure your DSN works by testing it with:
isql -v "Cloudera Hive DSN 64"
Replace "Cloudera Hive DSN 64" with the name you used in your odbc.ini.
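If isql is not at hand, a similar check can be approximated from Python by looking for the DSN name in odbc.ini. This is only a minimal sketch: it assumes the file is plain INI text that configparser can read, and the helper name `find_dsn` and the sample file contents (driver path, host, port) are hypothetical.

```python
import configparser

def find_dsn(odbc_ini_text, dsn_name):
    """Return the settings of dsn_name from odbc.ini text, or None if absent."""
    # interpolation=None: values such as paths may contain '%'.
    # strict=False: tolerate duplicate keys, which hand-edited files may have.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case as written in the file
    parser.read_string(odbc_ini_text)
    if not parser.has_section(dsn_name):
        return None
    return dict(parser[dsn_name])

# Hypothetical odbc.ini contents, for illustration only.
SAMPLE = """
[Cloudera Hive DSN 64]
Driver = /path/to/hive/odbc/driver.so
Host = hive.example.com
Port = 10000
"""

settings = find_dsn(SAMPLE, "Cloudera Hive DSN 64")
print(settings["Host"] if settings else "DSN not defined")
print(find_dsn(SAMPLE, "default"))  # a name the file does not define
```

A `None` result for the name used in the connection string matches the situation the IM002 "Data source name not found" error describes. It does not prove the driver itself loads, which is what isql additionally tests.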
Also, I currently can't use Kerberos authentication unless I create a ticket by hand. Impala works smoothly with Kerberos keytab files.
Any help getting Hive ODBC to work with keytab files is appreciated.

If you do decide to revisit pyhs2, note that it doesn't need to be installed on the remote server; it's installed on your local client.
If you continue with pyodbc, you need to install the ODBC driver for Hive, which you can get from Cloudera's site.
You don't need to specify the driver in your connection; it should be part of your DSN. The specifics of creating the DSN depend on your OS, but essentially you will create it using Administrative Tools -> Data Sources (Windows), install ODBC and edit /Library/ODBC/odbc.ini (Mac), or edit /etc/odbc.ini (Linux).
Conceptually, think of the DSN as a specification that represents all the information about the connection - it will contain the host, port, and driver information. That way in your code you don't have to specify these things and you can switch details about the database without changing your code.
# Note only the DSN name specifies the connection
import pyodbc
conn = pyodbc.connect("DSN=Hive1")
cursor = conn.cursor()
cursor.execute("select * from YYY")
Alternatively, I've updated the other question you referenced with information about how to install the thrift libraries. I think that's the way to go, if you have that option.
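To make the DSN idea above concrete, here is a rough sketch that renders one odbc.ini section holding the driver, host, and port. The key names `Host` and `Port`, the driver path, and the host name are assumptions for illustration; the real keys depend on the ODBC driver you install, so check its documentation.

```python
import configparser
import io

def render_dsn(name, driver, host, port):
    """Render one odbc.ini section describing a connection."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as given
    parser[name] = {"Driver": driver, "Host": host, "Port": str(port)}
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()

# Hypothetical values; only the DSN name "Hive1" comes from the example above.
print(render_dsn("Hive1", "/path/to/hive/odbc/driver.so", "hive.example.com", 10000))
```

With a section like this in place, the Python side only needs "DSN=Hive1", and the host or port can change without touching the code.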

You can also try this method to connect and fetch data remotely from the Hive server:
Connect to the remote server with ssh and pass the CLI command that reads the data:
ssh -o UserKnownHostsFile=/dev/null -o ConnectTimeout=90 -o StrictHostKeyChecking=no shashanks@remote_host 'hive -e "select * from DB.testtable limit 5;" >/home/shashanks/testfile'
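The same idea can be driven from Python with subprocess. This is a sketch under a few assumptions: ssh key authentication is already set up, `hive` is on the remote PATH, and the function names are made up for illustration. Unlike the command above, it captures the output locally instead of redirecting it to a remote file, and it leaves out the options that disable host key checking.

```python
import shlex
import subprocess

def build_hive_ssh_command(user, host, query, timeout=90):
    """Build the argv list for running a Hive query on a remote host over ssh."""
    # shlex.quote protects the query from the remote shell.
    remote_command = "hive -e " + shlex.quote(query)
    return [
        "ssh",
        "-o", f"ConnectTimeout={timeout}",
        f"{user}@{host}",
        remote_command,
    ]

def run_hive_query(user, host, query):
    """Run the query remotely and return its standard output as text."""
    argv = build_hive_ssh_command(user, host, query)
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return completed.stdout

# Only builds and prints the command; call run_hive_query(...) to execute it.
print(build_hive_ssh_command("shashanks", "remote_host",
                             "select * from DB.testtable limit 5;"))
```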

Related

Sym linking issue connecting to Azure Database from macOS

I am trying to get the data from my SQL Azure Database. It seems like my python code throws the error because of linking confusion with ODBC Drivers.
Here is my Python code.
from urllib import parse
from sqlalchemy import create_engine
connecting_string = 'Driver={ODBC Driver 13 for SQL Server};Server=tcp:mftaccountinghost.database.windows.net,1433;Database=mft_accounting;Uid=localhost;Pwd=######;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30'
params = parse.quote_plus(connecting_string)
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
connection = engine.connect()
result = connection.execute("select 1+1 as res")
for row in result:
    print("res:", row['res'])
connection.close()
Here is the error I get:
sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib '/usr/local/lib/libmsodbcsql.13.dylib' : file not found (0) (SQLDriverConnect)")
When I check my terminal I get the following results. Cannot resolve this issue...
Here is my connection string for the server:
Here is a complete guide to installing the ODBC driver on a UNIX machine:
https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver15
If you installed the v17 msodbcsql package that was briefly available, you should remove it before installing the msodbcsql17 package. This will avoid conflicts.
Try uninstalling the driver and installing it again; that should work.
Also check below thread for additional reference:
Can't open lib 'ODBC Driver 13 for SQL Server'? Sym linking issue?
Hope it helps.
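Before reinstalling, it may help to confirm that the library path registered for the driver really is missing. The sketch below reads odbcinst.ini text and reports drivers whose `Driver` file does not exist. The sample contents are a hypothetical reconstruction based on the driver name and path in the error message, and the function name is made up.

```python
import configparser
import os

def missing_driver_libraries(odbcinst_text, exists=os.path.exists):
    """Return {driver name: library path} for drivers whose library file is absent."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(odbcinst_text)
    missing = {}
    for name in parser.sections():
        # configparser lower-cases option names by default, so "Driver" -> "driver".
        path = parser[name].get("driver")
        if path and not exists(path):
            missing[name] = path
    return missing

# Hypothetical odbcinst.ini contents, for illustration only.
SAMPLE = """
[ODBC Driver 13 for SQL Server]
Description = Microsoft ODBC Driver 13 for SQL Server
Driver = /usr/local/lib/libmsodbcsql.13.dylib
"""

# exists is stubbed here to simulate the missing file; omit it to check the real disk.
print(missing_driver_libraries(SAMPLE, exists=lambda path: False))
```

If the path is reported as missing, either the driver package is not installed or the symlink it points to is broken, which is consistent with the "file not found" part of the error.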

How to Connect Python, Oracle 11g with cx_Oracle in linux server?

My Oracle install folder on the Linux server is "/lib/oracle/11.2/client64/lib".
The variables set in ~/.bash_profile are:
ORACLE_HOME=/usr/lib/oracle/11.2/client64
LD_LIBRARY_PATH=$ORACLE_HOME/lib
export ORACLE_HOME
export LD_LIBRARY_PATH
Also, these are the links in folder "/usr/lib/oracle/11.2/client64/lib":
ls -al|grep libclntsh.so
## Results are:
libclntsh.so -> libclntsh.so.11.1
libclntsh.so.10.1 -> /oracle/app/pracle/product/11.2.0/lib/libclntsh.so
libclntsh.so.11.1
And in Python:
os.environ['ORACLE_HOME']
os.environ['LD_LIBRARY_PATH']
## Results are:
'/usr/lib/oracle/11.2/client64'
'/usr/lib/oracle/11.2/client64/lib'
import cx_Oracle ## This Part is ok
But this code raises an error:
cx_Oracle.clientversion()
## or
dsn = cx_Oracle.makedsn('ip',port,'SID')
conn = cx_Oracle.connect(user='uid',password='pwd',dsn=dsn)
## Above Code Results:
DatabaseError:Error while trying to retrieve text for error ORA-01804
I suspect the cause is that my folder "/etc/ld.conf.d" has no "oracle-instantclient.conf" file; it only contains "mariadb-x86x64.conf".
The IP, port, SID, user ID, and password are all correct.
What is wrong with my Oracle connection settings on the Linux server?
Sorry, it was my mistake.
I just changed ORACLE_HOME and LD_LIBRARY_PATH, and now it runs:
os.environ["ORACLE_HOME"] = 'oracle/app/oracle/product/11.2.0'
os.environ["LD_LIBRARY_PATH"] = 'oracle/app/oracle/product/11.2.0/lib'
# oci is presumably cx_Oracle imported under an alias (import cx_Oracle as oci)
conn = oci.connect('ID/pw@localhost:port/SID')
Solved.
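One plausible reading of this fix is that ORACLE_HOME pointed at a directory other than the one holding the client library actually in use. A quick sanity check along those lines might look like the sketch below; the function name is made up, and the assumption that the client library sits at lib/libclntsh.so under ORACLE_HOME is taken from the directory listing in the question.

```python
import os

def check_oracle_home(oracle_home, exists=os.path.exists):
    """Return a list of problems found with an ORACLE_HOME candidate."""
    if not oracle_home:
        return ["ORACLE_HOME is not set"]
    problems = []
    if not exists(oracle_home):
        problems.append(f"{oracle_home} does not exist")
    # Assumed layout: the client library lives in $ORACLE_HOME/lib.
    client_lib = os.path.join(oracle_home, "lib", "libclntsh.so")
    if not exists(client_lib):
        problems.append(f"{client_lib} not found")
    return problems

# An empty list means the basic layout looks sane on this machine.
print(check_oracle_home(os.environ.get("ORACLE_HOME")))
```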

Python 3.4 - Sybase ASE connection

I would like to connect to a Sybase ASE 15 database with Python. Unfortunately, I couldn't find any working solution for Windows with Python 3.4. Could anyone recommend something? I tried a few without luck. Could I perhaps use the OLE DB driver (DLL) somehow?
Something free and recently updated would be great. I found solutions from 2012, but they didn't work either.
Thank you.
You could use pyodbc:
import pyodbc
DbConnection = pyodbc.connect(
    'DRIVER=freetds;SERVER=%s;PORT=%s;UID=%s;PWD=%s;DATABASE=%s;TDS_Version=5.0;'
    % (self.ServerAddress, self.ServerPort, 'aselogin', 'loginpwd', DefaultDb),
    unicode_results=True, autocommit=True)
Prerequisite: install the FreeTDS driver for your OS.
First of all, I want to point out the following: we can't connect to a Sybase ASE database using Python alone; we have to use an external binary (called a driver) that manages the connections. In my case, I connect from a Windows machine using the "jconn4.jar" driver.
To get the driver (jconn4.jar), I installed the DBeaver application and set up a connection to the Sybase database that I want to access from Python.
The next step is to test the connection and get the connection parameters. Press Test Connection --> Details.
The window that pops up shows all the details we need to configure the Sybase connection from Python.
import jaydebeapi
server = "<server IP>"
username = "<username>"
password = "<password>"
database = "<database/schema>"
port = <port>
jdbc_driver = r'..\DBeaverData\drivers\drivers\sybase\jconnect\jconn4.jar'  # driver path, taken from the details window
conn = jaydebeapi.connect('com.sybase.jdbc4.jdbc.SybDriver', f'jdbc:sybase:Tds:{server}:{port}/{database}', {'user': username, 'password': password},
jdbc_driver)
cursor = conn.cursor()
cursor.execute("select * from my_table")
result = cursor.fetchall()
print(result)
If you get an error saying that "JAVA_HOME" is missing:
Check that a JDK is installed on your machine (I have JDK 1.8.0_202 installed). If not, install one.
Add the Java path to the Path variable: Environment Variables --> System Variables --> Path --> Edit --> New, paste the path to the Java bin folder (C:\Program Files\Java\jdk1.8.0_202\bin), and press OK.
Set the JAVA_HOME environment variable: Environment Variables --> System Variables --> New --> Variable Name: JAVA_HOME, Variable Value: C:\Program Files\Java\jdk1.8.0_202
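To verify the JAVA_HOME setup from the same Python process that will call jaydebeapi, something like the following sketch can be used. The function name is made up, and it only checks that a java launcher exists under JAVA_HOME\bin; it does not check the JDK version.

```python
import os

def java_home_problems(environ, exists=os.path.exists):
    """Return a list of problems with the JAVA_HOME setting in environ."""
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    # The launcher is java.exe on Windows and java elsewhere.
    candidates = [os.path.join(java_home, "bin", name) for name in ("java", "java.exe")]
    if not any(exists(path) for path in candidates):
        problems.append(f"no java launcher under {os.path.join(java_home, 'bin')}")
    return problems

# An empty list means JAVA_HOME is set and points at a folder with a launcher.
print(java_home_problems(os.environ))
```

Note that a terminal or IDE opened before the environment variables were changed may still see the old values, so restart it before rerunning the check.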

pyodbc connection with Advantage ODBC Driver (Linux)

I am trying to connect to an existing Sybase Advantage Database Server via the ODBC driver on a LOCAL instance. I currently have unixodbc, unixodbc-dev, and unixodbc-bin installed.
When I attempt the following:
import pyodbc
conn_str = 'DRIVER={Advantage ODBC Driver};DataDirectory=/var/lib/advantage/.../dbfile.add;User ID=...;Password=...;ServerTypes=1;'
connection = pyodbc.connect(conn_str)
I get the following error:
pyodbc.Error: ('IM002', '[IM002] [unixODBC][Driver Manager]Data source name not found, and no default driver specified (0) (SQLDriverConnect)')
Here's my /etc/odbc.ini (and /etc/odbcinst.ini) file:
;
; odbc.ini
;
[ODBC Data Sources]
Odie = Advantage ODBC Driver
[Odie]
Driver=/opt/ads/odbc/redistribute/libadsodbc.so.11.10.0.24
DataDirectory=/var/lib/advantage/.../dbfile.add
Description=Advantage ODBC driver
Rows=False
MemoBlockSize=64
DefaultType=Advantage
MaxTableCloseCache=0
LOCKING=Record
CharSet=OEM
ADVANTAGELOCKING=OFF
ServerTypes=1
TableExtension=
I see three potential issues here: either my connection string is wrong, my odbc.ini file is set up incorrectly, or unixODBC hasn't reloaded odbc.ini since I modified it (if there is such a thing). I have attempted the solution proposed here, to no avail.
Thanks for your help!
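One way to narrow down the first two possibilities is to work out which name the connection string asks unixODBC to look up, and in which file. As far as I know, `DSN=name` is resolved against a section of odbc.ini, while `DRIVER={name}` is resolved against a section of odbcinst.ini. The sketch below is a simplified parser (it does not handle ';' inside braces), and the function names are hypothetical.

```python
def parse_odbc_connection_string(conn_str):
    """Split 'KEY=value;KEY={braced value};' into a dict with upper-case keys."""
    attributes = {}
    for part in conn_str.split(";"):
        if "=" not in part:
            continue  # skip empty segments such as the one after a trailing ';'
        key, value = part.split("=", 1)
        value = value.strip()
        if value.startswith("{") and value.endswith("}"):
            value = value[1:-1]
        attributes[key.strip().upper()] = value
    return attributes

def lookup_target(conn_str):
    """Say which unixODBC file must define the name used by conn_str."""
    attributes = parse_odbc_connection_string(conn_str)
    if "DSN" in attributes:
        return ("odbc.ini", attributes["DSN"])
    if "DRIVER" in attributes:
        return ("odbcinst.ini", attributes["DRIVER"])
    return (None, None)

print(lookup_target("DRIVER={Advantage ODBC Driver};ServerTypes=1;"))
print(lookup_target("DSN=Odie"))
```

Under that assumption, the connection string in the question needs a section named [Advantage ODBC Driver] in odbcinst.ini, whereas the file shown only defines the DSN [Odie], which would be reached with "DSN=Odie" instead.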

pypyodbc python module to connect postgresql on windows and linux

I am unable to connect to PostgreSQL using the Python pypyodbc module and get this error:
[Microsoft] [ODBC Driver Manager] Data Source name not found and no default driver specified.
Connection string used:
dbconn = pypyodbc.connect('Driver={PostgreSQL ODBC Driver(UNICODE)};'+ 'Server=127.0.0.1'
+ ';Port=5432;' + ';database=#####;'+ 'uid=######;' + ';pwd=######;' )
I have searched the net over and over but haven't found anything.
Do I need to install a supporting driver? If so, please explain how.
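Independently of the driver question, the string concatenation above produces doubled semicolons (for example ";Port=5432;" followed by ";database="). A small helper that joins key/value pairs keeps the string tidy. This is only a sketch: the helper name is made up, the database, user, and password values are placeholders, and the driver name is copied from the question without checking that it matches an installed driver.

```python
def build_connection_string(settings):
    """Join key/value pairs into 'Key=value;Key=value;' with one ';' per pair."""
    parts = []
    for key, value in settings.items():
        text = str(value)
        # Brace values containing spaces or ';' so they are read as one token.
        if any(ch in text for ch in " ;") and not text.startswith("{"):
            text = "{" + text + "}"
        parts.append(f"{key}={text}")
    return ";".join(parts) + ";"

conn_str = build_connection_string({
    "Driver": "PostgreSQL ODBC Driver(UNICODE)",  # name as written in the question
    "Server": "127.0.0.1",
    "Port": 5432,
    "Database": "mydb",   # placeholder
    "Uid": "myuser",      # placeholder
    "Pwd": "secret",      # placeholder
})
print(conn_str)
```

The resulting string can be passed to pypyodbc.connect. The "Data Source name not found" error will persist, though, unless the Driver value exactly matches a driver name listed in the ODBC Data Source Administrator (Windows) or odbcinst.ini (Linux).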
