PyHive with Kerberos throws an authentication error after a few calls - python

I am trying to connect to Hive using Python (the PyHive library) to read some data, which I then pass to a Flask app to show in a dashboard.
It all works fine for a few calls to Hive; however, soon after that I get the following error.
Traceback (most recent call last):
File "libs/hive.py", line 63, in <module>
cur = h.connect().cursor()
File "libs/hive.py", line 45, in connect
kerberos_service_name='hive')
File "/home1/igns/git/emsr/.venv/lib/python2.7/site-packages/pyhive/hive.py", line 94, in connect
return Connection(*args, **kwargs)
File "/home1/igns/git/emsr/.venv/lib/python2.7/site-packages/pyhive/hive.py", line 192, in __init__
self._transport.open()
File "/home1/igns/git/emsr/.venv/lib/python2.7/site-packages/thrift_sasl/__init__.py", line 79, in open
message=("Could not start SASL: %s" % self.sasl.getError()))
thrift.transport.TTransport.TTransportException: Could not start SASL: Error in sasl_client_start (-1) SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (No Kerberos credentials available (default cache: FILE:/tmp/krb5cc_cdc995595290_51CD7j))
Following is my code
from pyhive import hive

class Hive(object):
    def connect(self):
        return hive.connect(host='hive.hadoop-prod.abc.com',
                            port=10000,
                            database='temp',
                            username='gaurang.shah',
                            auth='KERBEROS',
                            kerberos_service_name='hive')

if __name__ == '__main__':
    h = Hive()
    cur = h.connect().cursor()
    cur.execute("select * from temp.migration limit 1")
    res = cur.fetchall()
    print res
Calling Script
source .venv/bin/activate
for i in {1..50}
do
python get_hive_data.py
sleep 300
done
Observation
When it's working, I can see the hive service principal when I do klist; however, I don't when I see the above error message.
Klist when it's working
Ticket cache: FILE:/tmp/krb5cc_cdc995595290_XyMnhu
Default principal: gaurang.shah@ABC.COM
Valid starting       Expires              Service principal
12/04/2018 14:37:28  12/05/2018 00:37:28  krbtgt/ABC.COM@ABC.COM
        renew until 12/05/2018 14:37:24
12/04/2018 14:39:06  12/05/2018 00:37:28  hive/hive_server.ABC.COM@ABC.COM
        renew until 12/05/2018 14:37:24
Klist when it's not working
Ticket cache: FILE:/tmp/krb5cc_cdc995595290_XyMnhu
Default principal: gaurang.shah@ABC.COM
Valid starting       Expires              Service principal
12/04/2018 14:37:28  12/05/2018 00:37:28  krbtgt/ABC.COM@ABC.COM
        renew until 12/05/2018 14:37:24
Update:
So I don't think it happens after a certain number of calls; I think it happens after a certain amount of time (about one hour). I changed the sleep time to 3600 seconds, and right after the first call I started getting the error.
This is weird, as the ticket for hive/hive_server.ABC.COM@ABC.COM was valid for 7 days.

I know this is an old post, but if you make a new connection every time you make a call, that should resolve the issue.
from pyhive import hive

class Hive(object):
    def connect(self):
        return hive.connect(host='hive.hadoop-prod.abc.com',
                            port=10000,
                            database='temp',
                            username='gaurang.shah',
                            auth='KERBEROS',
                            kerberos_service_name='hive')

if __name__ == '__main__':
    def newConnect(query):
        h = Hive()
        cur = h.connect().cursor()
        cur.execute(query)
        res = cur.fetchall()
        return res

    myConnectionAndResults = newConnect("select * from temp.migration limit 1")
    print myConnectionAndResults
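Building on that, here is a minimal sketch (using the same connection parameters as above, and assuming pyhive's Connection and Cursor close() methods, which current releases provide) that opens and closes a connection around every single query, so each call re-runs the SASL/GSSAPI handshake against whatever Kerberos ticket cache is current at that moment:

from contextlib import closing
from pyhive import hive

def run_query(query):
    # Open a fresh connection per call and close it when done, so every
    # query authenticates with the current ticket cache.
    with closing(hive.connect(host='hive.hadoop-prod.abc.com',
                              port=10000,
                              database='temp',
                              username='gaurang.shah',
                              auth='KERBEROS',
                              kerberos_service_name='hive')) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(query)
            return cur.fetchall()

print run_query("select * from temp.migration limit 1")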

Related

How to set a query timeout in sqlalchemy using Oracle database?

I want to create a query timeout in SQLAlchemy. I have an Oracle database.
I have tried the following code:
import sqlalchemy
engine = sqlalchemy.create_engine('oracle://db', connect_args={'querytimeout': 10})
I got the following error:
TypeError: 'querytimeout' is an invalid keyword argument for this function
I would like a solution that looks like this:
connection.execute('query').set_timeout(10)
Maybe it is possible to set the timeout in the SQL query itself? I found how to do it in PL/SQL, but I need plain SQL.
How can I set a query timeout?
The only way you can set a connection timeout for the Oracle engine from SQLAlchemy is to create and configure sqlnet.ora.
Linux
Create the sqlnet.ora file in this folder:
/opt/oracle/instantclient_19_9/network/admin
Windows
For Windows, create a \network\admin folder, for example:
C:\oracle\instantclient_19_9\network\admin
Example sqlnet.ora file:
SQLNET.INBOUND_CONNECT_TIMEOUT = 120
SQLNET.SEND_TIMEOUT = 120
SQLNET.RECV_TIMEOUT = 120
You can find more parameters here: https://docs.oracle.com/cd/E11882_01/network.112/e10835/sqlnet.htm
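If the Oracle client does not pick that file up automatically, one common approach (an assumption on my part, not something this answer states) is to point the TNS_ADMIN environment variable at the directory containing sqlnet.ora before creating the engine:

import os
import sqlalchemy

# Path from the Linux example above; adjust to wherever your sqlnet.ora lives.
os.environ['TNS_ADMIN'] = '/opt/oracle/instantclient_19_9/network/admin'
engine = sqlalchemy.create_engine('oracle://db')  # same connection string as in the question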
The way to do it in Oracle is via resource manager. Have a look here
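A different client-side option, not mentioned in these answers and offered only as a suggestion, is cx_Oracle's per-connection call timeout (in milliseconds, available in cx_Oracle 7+ with a recent Oracle client). SQLAlchemy can apply it to each new DBAPI connection through a connect event:

import sqlalchemy
from sqlalchemy import event

engine = sqlalchemy.create_engine('oracle://db')  # connection string as in the question

@event.listens_for(engine, 'connect')
def set_call_timeout(dbapi_connection, connection_record):
    # Abort any single database round trip that takes longer than 10 seconds.
    dbapi_connection.callTimeout = 10000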
timeout decorator
Get your session handle as you normally would. (Notice that the session has not actually connected yet.) Then, test the session in a function that is decorated with wrapt_timeout_decorator.timeout.
#!/usr/bin/env python3
from time import time
from cx_Oracle import makedsn
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import text
from wrapt_timeout_decorator import timeout

class ConnectionTimedOut(Exception):
    pass

class Blog:
    def __init__(self):
        self.port = None

    def connect(self, connection_timeout):
        @timeout(connection_timeout, timeout_exception=ConnectionTimedOut)
        def test_session(session):
            session.execute(text('select dummy from dual'))

        session = sessionmaker(bind=self.engine())()
        test_session(session)
        return session

    def engine(self):
        return create_engine(
            self.connection_string(),
            max_identifier_length=128
        )

    def connection_string(self):
        driver = 'oracle'
        username = 'USR'
        password = 'solarwinds123'
        return '%s://%s:%s@%s' % (
            driver,
            username,
            password,
            self.dsn()
        )

    def dsn(self):
        host = 'hn.com'
        dbname = 'ORCL'
        print('port: %s expected: %s' % (
            self.port,
            'success' if self.port == 1530 else 'timeout'
        ))
        return makedsn(host, self.port, dbname)

    def run(self):
        self.port = 1530
        session = self.connect(connection_timeout=4)
        for r in session.execute(text('select status from v$instance')):
            print(r.status)
        self.port = 1520
        session = self.connect(connection_timeout=4)
        for r in session.execute(text('select status from v$instance')):
            print(r.status)

if __name__ == '__main__':
    Blog().run()
In this example, the network is firewalled with port 1530 open. Port 1520 is blocked and leads to a TCP connection timeout. Output:
port: 1530 expected: success
OPEN
port: 1520 expected: timeout
Traceback (most recent call last):
File "./blog.py", line 68, in <module>
Blog().run()
File "./blog.py", line 62, in run
session = self.connect(connection_timeout=4)
File "./blog.py", line 27, in connect
test_session(session)
File "/home/exagriddba/lib/python3.8/site-packages/wrapt_timeout_decorator/wrapt_timeout_decorator.py", line 123, in wrapper
return wrapped_with_timeout(wrap_helper)
File "/home/exagriddba/lib/python3.8/site-packages/wrapt_timeout_decorator/wrapt_timeout_decorator.py", line 131, in wrapped_with_timeout
return wrapped_with_timeout_process(wrap_helper)
File "/home/exagriddba/lib/python3.8/site-packages/wrapt_timeout_decorator/wrapt_timeout_decorator.py", line 145, in wrapped_with_timeout_process
return timeout_wrapper()
File "/home/exagriddba/lib/python3.8/site-packages/wrapt_timeout_decorator/wrap_function_multiprocess.py", line 43, in __call__
self.cancel()
File "/home/exagriddba/lib/python3.8/site-packages/wrapt_timeout_decorator/wrap_function_multiprocess.py", line 51, in cancel
raise_exception(self.wrap_helper.timeout_exception, self.wrap_helper.exception_message)
File "/home/exagriddba/lib/python3.8/site-packages/wrapt_timeout_decorator/wrap_helper.py", line 178, in raise_exception
raise exception(exception_message)
__main__.ConnectionTimedOut: Function test_session timed out after 4.0 seconds
Caution
Do not decorate the function that calls sessionmaker, or you will get:
_pickle.PicklingError: Can't pickle <class 'sqlalchemy.orm.session.Session'>: it's not the same object as sqlalchemy.orm.session.Session
SCAN
This implementation is a "connection timeout" without regard to underlying cause. The client could time out before trying all available SCAN listeners.

Connecting to cloudSQL (MySQL) with python script using SSL certificates

I need to connect a Python script to a Google Cloud SQL database using SSL. I know how to connect to the database by IP address only, but I do not understand how to use, from Python, the SSL certificates (three .pem certificates), token ID, etc. that I received in the .json file. I have tried some code, but it does not work and I do not understand what is going on.
import psycopg2
import psycopg2.extensions
import os
import stat
from google.cloud import storage

def main():
    con = connect()
    connection = con.connect_to_db()
    result = connection.execute('SELECT * FROM personaformation').fetchall()
    for row in result:
        votes.append({
            'fname': row[0],
            'lname': row[1],
            'email': row[2]
        })
    print(votes[0])

def connect_to_db(self):
    # Get keys from GCS
    client = storage.Client()
    bucket = client.get_bucket(weightssl)
    bucket.get_blob('C:/Users/tasne/Downloads/serverca.pem').download_to_filename('server-ca.pem')
    bucket.get_blob('C:/Users/tasne/Downloads/clientkey.pem').download_to_filename('client-key.pem')
    os.chmod("client-key.pem", stat.S_IRWXU)
    bucket.get_blob('C:/Users/tasne/Downloads/clientcert.pem').download_to_filename('client-cert.pem')
    sslrootcert = 'server-ca.pem'
    sslkey = 'client-key.pem'
    sslcert = 'client-cert.pem'
    print("reached here")
    con = psycopg2.connect(
        host='37.246.65.223',
        dbname='personalformation',
        user='hello',
        password='world',
        sslmode='verify-full',
        sslrootcert=sslrootcert,
        sslcert=sslcert,
        sslkey=sslkey)
    return con
When I try the code, this is the error that I get. However, I do not know where to specify the credentials, nor am I using Compute Engine, because this is not an application I am creating but only a Python script.
C:\Users\tasne\PycharmProjects\project1\venv\Scripts\python.exe C:/Users/tasne/Downloads/database2.py
Traceback (most recent call last):
File "C:/Users/tasne/Downloads/database2.py", line 21, in <module>
credentials = GoogleCredentials.get_application_default()
File "C:\Users\tasne\PycharmProjects\project1\venv\lib\site-packages\oauth2client\client.py", line 1288, in get_application_default
return GoogleCredentials._get_implicit_credentials()
File "C:\Users\tasne\PycharmProjects\project1\venv\lib\site-packages\oauth2client\client.py", line 1278, in _get_implicit_credentials
raise ApplicationDefaultCredentialsError(ADC_HELP_MSG)
oauth2client.client.ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
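As the error itself says, the storage.Client() call needs Application Default Credentials. A minimal sketch of the credentials part (the key file path below is an assumption; use your own downloaded service-account key) is to set GOOGLE_APPLICATION_CREDENTIALS before creating the client:

import os
from google.cloud import storage

# Assumed path to a service-account JSON key with access to the bucket.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'C:\Users\tasne\Downloads\service-account.json'
client = storage.Client()  # now picks the credentials up from the environment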

Using Python to connect to Impala database (thriftpy error)

What I'm trying to do is very basic: connect to an Impala db using Python:
from impala.dbapi import connect
conn = connect(host='impala', port=21050, auth_mechanism='PLAIN')
I'm using the Impyla package to do so. I got this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/thriftpy/transport/socket.py", line 96, in open
self.sock.connect(addr)
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alaaeddine/PycharmProjects/test/data_test.py", line 3, in <module>
conn = connect(host='impala', port=21050, auth_mechanism='PLAIN')
File "/usr/local/lib/python3.6/dist-packages/impala/dbapi.py", line 147, in connect
auth_mechanism=auth_mechanism)
File "/usr/local/lib/python3.6/dist-packages/impala/hiveserver2.py", line 758, in connect
transport.open()
File "/usr/local/lib/python3.6/dist-packages/thrift_sasl/__init__.py", line 61, in open
self._trans.open()
File "/usr/local/lib/python3.6/dist-packages/thriftpy/transport/socket.py", line 104, in open
message="Could not connect to %s" % str(addr))
thriftpy.transport.TTransportException: TTransportException(type=1, message="Could not connect to ('impala', 21050)")
I also tried the Ibis package, but it failed with the same thriftpy-related error.
On Windows, using DBeaver, I could connect to the database using the official Cloudera JDBC connector. My questions are:
Should I pass my JDBC connector as a parameter in my connect code? I have searched but could not find anything pointing in this direction.
Should I try something other than the Ibis and Impyla packages? I experienced a lot of version-related issues and dependency problems when using them. If yes, what would you recommend as alternatives?
Thanks!
Solved:
I used pyhive package instead of Ibis/Impyla. Here's an example:
#import hive from pyhive
from pyhive import hive
#establish the connection to the db
conn = hive.Connection(host='host_IP_addr', port='conn_port', auth='auth_type', database='my_db')
#prepare the cursor for the queries
cursor = conn.cursor()
#execute a query
cursor.execute("SHOW TABLES")
#navigate and display the results
for table in cursor.fetchall():
    print(table)
Your impala hostname must not be resolving. Are you able to do nslookup impala in a command prompt? If you're using Docker, the service needs to be named "impala" in docker-compose, or you need the "extra_hosts" option. Or you can always add it to /etc/hosts (on Windows, C:\Windows\System32\drivers\etc\hosts) as 127.0.0.1 impala.
Also try 'NOSASL' instead of 'PLAIN'; sometimes that works better with security turned off.
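A quick way to confirm the name-resolution diagnosis (just a sanity check, not part of the original answer) is to resolve the hostname from Python before connecting:

import socket

try:
    print(socket.gethostbyname('impala'))  # should print the Impala host's IP
except socket.gaierror as exc:
    # Same "Temporary failure in name resolution" class of error as in the traceback.
    print('name resolution failed: %s' % exc)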
This is the simple method: connecting to Impala through impala-shell using Python.
import commands
import re

query1 = "select * from table_name limit 10"
impalad = str('hostname')
port = str('21000')
database = str('database_name')
result_string = 'impala-shell -i "'+ impalad+':'+port +'" -k -B --delimited -q "'+query1+'"'
status, output = commands.getstatusoutput(result_string)
if status == 0:
    print output
else:
    print "Error encountered while executing HiveQL queries."

Unable to connect to server with python and pymssql

I'm developing a script that is supposed to read data from a Microsoft SQL Server database and display it in a nice format. It's supposed to write to the database as well. The issue is that I'm not able to connect to the server.
I'm using this code:
import pymssql
server = "serverIpAddress"
user = "username"
password = "pass"
db = "databaseName"
port = 1433
db = pymssql.connect(server,user,password,port= port)
# prepare a cursor object using cursor() method
cursor = db.cursor()
# execute SQL query using execute() method.
cursor.execute("SELECT VERSION()")
# Fetch a single row using fetchone() method.
data = cursor.fetchone()
print "Database version : %s " % data
# disconnect from server
db.close()
And I'm getting this traceback:
Traceback (most recent call last):
File ".\dbtest.py", line 9, in <module>
db = pymssql.connect(server,user,password,port= port)
File "pymssql.pyx", line 641, in pymssql.connect (pymssql.c:10824)
pymssql.OperationalError: (18452, 'Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.DB-Lib error message 20018, severity 14:\nGeneral SQL Server e
rror: Check messages from the SQL Server\nDB-Lib error message 20002, severity 9:\nAdaptive Server connection failed (serverip:1433)\n')
I've changed some data to preserve privacy.
This gives me some clues about what's going on:
The login is from an untrusted domain and cannot be used with Windows authentication
But I don't know how to fix it. I've seen that some people use
integratedSecurity=true
but I don't know if there is something like this in pymssql, or even if it's a good idea.
Also, I don't need to use pymssql at all. If you know of any other library that can do what I need, I don't mind switching.
Thanks and greetings.
--EDIT--
I've also tested this code:
import pyodbc
server = "serverIpAddress"
user = "username"
password = "pass"
db = "databaseName"
connectString = "Driver={SQL Server};server="+server+";database="+db+";uid="+user+";pwd="+password
con = pyodbc.connect(connectString)
cur = con.cursor()
cur.close()
con.close()
and I'm getting this traceback:
Traceback (most recent call last):
  File ".\pyodbc_test.py", line 9, in <module>
    con = pyodbc.connect(connectString)
pyodbc.Error: ('28000', "[28000] [Microsoft][ODBC SQL Server Driver][SQL Server]Login failed for user '.\\sa'. (18456) (SQLDriverConnect); [28000] [Microsoft][ODBC SQL Server Driver][SQL Server]Lo
gin failed for user '.\\sa'. (18456)")
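On the integratedSecurity=true question: with the pyodbc variant from the edit, the ODBC equivalent is Trusted_Connection=yes. This is only a suggestion (not from the original thread), and it only helps when the script runs under a Windows account that the SQL Server trusts:

import pyodbc

server = "serverIpAddress"
db = "databaseName"

# Windows authentication: no uid/pwd, the current Windows login is used instead.
connectString = "Driver={SQL Server};Server=" + server + ";Database=" + db + ";Trusted_Connection=yes"
con = pyodbc.connect(connectString)
cur = con.cursor()
cur.execute("SELECT @@VERSION")
print(cur.fetchone()[0])
cur.close()
con.close()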

Can't connect to Meteor with Pymongo

I am trying to connect to a Meteor Mongo database through pymongo. Here's the code:
def get_mongo_url(site):
    # return "mongodb://client-xxxxx:yyyyy@production-db-c1.meteor.io:27017/site"
    import subprocess
    p = subprocess.Popen(['meteor', 'mongo', '--url', site], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    print out
    return out

from pymongo import MongoClient
client = MongoClient(get_mongo_url("mysite.com"))
And the error (the print statement yields a correct url)
>> mongodb://client-xxxxx:yyyyy@production-db-c1.meteor.io:27017/site
Traceback (most recent call last):
File "private/test.py", line 46, in <module>
client = pymongo.MongoClient(get_mongo_url(METEOR_SITE))
File "/Library/Python/2.7/site-packages/pymongo/mongo_client.py", line 369, in __init__
raise ConfigurationError(str(exc))
pymongo.errors.ConfigurationError: command SON([('authenticate', 1), ('user', u'client-xxxxx'), ('nonce', u'zzzzz'), ('key', u'ttttt')]) failed: auth fails
If I run meteor mongo --url mysite.com, copy the result into the return ... at the top of the function and uncomment it, the connection works. Why can't I connect programmatically?
The subprocess code appends a line feed character \n to the end of the URL.
You need to strip that with .rstrip().
The right way to do that is to replace the return in your function with
return out.rstrip()
For confirmation, here is what happens with the function as-is and with rstrip() applied or not applied to the return value.
murl = get_mongo_url('').rstrip()
mongodb://client-faf1d0db:746d8f43-367b-dde2-b69a-039ff8b9f7fa@production-db-a1.meteor.io:27017/_meteor_com
client = pymongo.MongoClient(murl)
Worked OK.
murl = get_mongo_url('')
mongodb://client-3578a20b:d4ddeec9-6d24-713e-8ddb-c357b664948a@production-db-a1.meteor.io:27017/_meteor_com
client = pymongo.MongoClient(murl)
Traceback (most recent call last):
File "", line 1, in
File "/home/action/.local/lib/python2.7/site-packages/pymongo/mongo_client.py", line 383, in __init__
raise ConfigurationError(str(exc))
pymongo.errors.ConfigurationError: command SON([('authenticate', 1), ('user', u'client-3578a20b'), ('nonce', u'e14e2bdb3d8484b9'), ('key', u'9c101b78ff1a617a9c5f0def36c7e3d9')]) failed: auth fails
Failed without the rstrip.
murl = get_mongo_url('')
mongodb://client-1a193a61:4c9c572e-22e3-4b7e-44a1-dc76bfb65e86@production-db-a1.meteor.io:27017/_meteor_com
client = pymongo.MongoClient(murl)
Traceback (most recent call last):
File "", line 1, in
File "/home/action/.local/lib/python2.7/site-packages/pymongo/mongo_client.py", line 383, in __init__
raise ConfigurationError(str(exc))
pymongo.errors.ConfigurationError: command SON([('authenticate', 1), ('user', u'client-1a193a61'), ('nonce', u'a2576142b1a33d8b'), ('key', u'4419c490bcdcc65b20f2950c3b106d59')]) failed: auth fails
Failed again (no rstrip).
murl = get_mongo_url('').rstrip()
mongodb://client-ce463608:d7dc6be0-499f-1808-43e1-fdfb8b6e8ebc@production-db-a1.meteor.io:27017/_meteor_com
client = pymongo.MongoClient(murl)
Worked (rstrip used).
The following is general info on mongodb URLs. You may know this already.
The URL that pymongo wants is not a web URL but a URL-like specifier for a mongo database connection.
For a development environment, MongoDB is usually set up on port 3001, which is not the default MongoDB port used on a production server.
Meteor applications can be configured to use a mongodb hosted anywhere. It does not have to be on the same server that serves the meteor content. The specification of this is done through the mongodb:// URL which is what pymongo wants. pymongo doesn't depend on the meteor website url, which can be very different from the mongodb url.
Here is some code I am using
import pymongo

MONGO_URL = r'mongodb://localhost:3001/meteor'

###

def connect():
    client = pymongo.MongoClient(MONGO_URL)
    return client

def findUser(c, email):
    users = c.meteor.users
    return users.find_one({"emails.address": email})
According to the MongoDB site on GitHub, the MONGO_URL format is
mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
so the MongoDB URL mongodb://localhost:3001/meteor can be interpreted like this:
* mongodb:// means this describes a mongodb connection
* localhost means connect locally
* :3001 means use non-standard port number 3001; this is how "meteor run" sets up mongo
* /meteor means connect to the database called "meteor"
