PyHive is Hanging on Connection -- Thrift_sasl appears to be the issue - python

from pyhive import hive
import thrift_sasl
connection = hive.Connection(host='myhost', port=10000, database='local')
#hangs here
from sqlalchemy import create_engine
engine = create_engine('hive://myhost:10000/local')
logs = Table('mytable', MetaData(bind=engine), autoload=True)
#also hangs here
Both of these snippets will hang for me.
Hitting ctrl+c stops the execution here:
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/apps/Python/lib/python2.7/site-packages/pyhive/hive.py", line 86, in __init__
self._transport.open()
File "thrift_sasl.py", line 74, in open
status, payload = self._recv_sasl_message()
File "thrift_sasl.py", line 92, in _recv_sasl_message
header = self._trans.readAll(5)
File "/apps/Python/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
chunk = self.read(sz - have)
File "/apps/Python/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 105, in read
buff = self.handle.recv(sz)
KeyboardInterrupt
I am using Hive 0.12 and HiveServer2. I can connect to it using the python Hive library provided with Hadoop (.../hive/lib/py) but cannot do so with pyhive, which uses thrift_sasl.
Some people not using the thrift_sasl module suggest turning off SASL support in hive-site.xml via:
<property><name>hive.server2.authentication</name><value>NOSASL</value></property>
However after trying this the code still hanged with the same stack trace when I issued a KeyboardInterrupt.

Related

Problem with hive thrift or with HiveMetastore by Apache Airflow

We have a problem when possible trying to connect with Hive Metastore by HiveMetastoreHook in Apache Airflow
thrift.transport.TTransport.TTransportException: b'Error in sasl_decode (-1) SASL(-1): generic failure: Unable to find a callback: 32775'
We googled this issue but still have not find any answer
But for temporary fix - We do REcreate our hive external table on which the issue happens and restart this task. Then after a few days this problem happens again and again and again. And we have no any idea where we need to fix.
NOTE:
This happens only on one big table ,
we have hive 3.1.0 and Airflow 1.10.5 ,
this issue reproduces from python3 cli by import airflow,
select this big table from hive is fine (data is fine too)
Full stack trace
[2021-06-30 18:05:41,143] {{base_hook.py:84}} INFO - Using connection to: id: our_hive_metastore_connection. Host: server_name, Port: someport, Schema: our_schema, Login: some_login, Password: None, extra: {'authMechanism': 'GSSAPI', 'kerberos_service_name': 'some_name'}
>>> hm.get_table(table_name='some_big_table', db='some_schema')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.6/site-packages/airflow/hooks/hive_hooks.py", line 608, in get_table
return client.get_table(dbname=db, tbl_name=table_name)
File "/usr/local/lib/python3.6/site-packages/hmsclient/genthrift/hive_metastore/ThriftHiveMetastore.py", line 2253, in get_table
return self.recv_get_table()
File "/usr/local/lib/python3.6/site-packages/hmsclient/genthrift/hive_metastore/ThriftHiveMetastore.py", line 2266, in recv_get_table
(fname, mtype, rseqid) = iprot.readMessageBegin()
File "/usr/local/lib64/python3.6/site-packages/thrift/protocol/TBinaryProtocol.py", line 134, in readMessageBegin
sz = self.readI32()
File "/usr/local/lib64/python3.6/site-packages/thrift/protocol/TBinaryProtocol.py", line 217, in readI32
buff = self.trans.readAll(4)
File "/usr/local/lib64/python3.6/site-packages/thrift/transport/TTransport.py", line 62, in readAll
chunk = self.read(sz - have)
File "/usr/local/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 166, in read
self._read_frame()
File "/usr/local/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 180, in _read_frame
message=self.sasl.getError())
thrift.transport.TTransport.TTransportException: b'Error in sasl_decode (-1) SASL(-1): generic failure: Unable to find a callback: 32775'```
Is anybody have some idea what we need to fix?
Appreciate for help!

Cannot create simple table using Happybase in Python

I am trying to create a table using Happybase. To start I enter the following commands get Hbase and Thrift running:
start-hbase.sh
hbase thrift start &
Once this is running I open Python's command prompt and type the following:
import happybase as hb
connection = hb.Connection()
connection.open()
However when I try to create a table:
connection.create_table(
'mytable',
{'cf1': dict(max_versions=10),
'cf2': dict(max_versions=1, block_cache_enabled=False),
'cf3': dict(), # use defaults
}
I get the following error that I just don't understand.
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
File "/usr/local/lib/python2.7/dist-packages/happybase/connection.py", line 309, in create_table
self.client.createTable(name, column_descriptors)
File "/usr/local/lib/python2.7/dist-packages/thriftpy/thrift.py", line 198, in _req
return self._recv(_api)
File "/usr/local/lib/python2.7/dist-packages/thriftpy/thrift.py", line 210, in _recv
fname, mtype, rseqid = self._iprot.read_message_begin()
File "thriftpy/protocol/cybin/cybin.pyx", line 429, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6325)
File "thriftpy/protocol/cybin/cybin.pyx", line 60, in cybin.read_i32 (thriftpy/protocol/cybin/cybin.c:1546)
File "thriftpy/transport/buffered/cybuffered.pyx", line 65, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.c_read (thriftpy/transport/buffered/cybuffered.c:1881)
File "thriftpy/transport/buffered/cybuffered.pyx", line 69, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.read_trans (thriftpy/transport/buffered/cybuffered.c:1948)
File "thriftpy/transport/cybase.pyx", line 61, in thriftpy.transport.cybase.TCyBuffer.read_trans (thriftpy/transport/cybase.c:1472)
File "/usr/local/lib/python2.7/dist-packages/thriftpy/transport/socket.py", line 125, in read
message='TSocket read 0 bytes')
thriftpy.transport.TTransportException: TTransportException(message='TSocket read 0 bytes', type=4)
)
You need to specify the server address, and possibly port:
connection = hb.Connection(SERVER, PORT)
You can probably omit the port value, as most likely the default value will match, but just in case check on what port your thrift server is listening and specify that as a numeric value

Unable to connect to Hive2 using Python

While connecting to Hive2 using Python with below code:
import pyhs2
with pyhs2.connect(host='localhost',
port=10000,
authMechanism="PLAIN",
user='root',
password='test',
database='default') as conn:
with conn.cursor() as cur:
#Show databases
print cur.getDatabases()
#Execute query
cur.execute("select * from table")
#Return column info from query
print cur.getSchema()
#Fetch table results
for i in cur.fetch():
print i
I am getting below error:
File
"C:\Users\vinbhask\AppData\Roaming\Python\Python36\site-packages\pyhs2-0.6.0-py3.6.egg\pyhs2\connections.py",
line 7, in <module>
from cloudera.thrift_sasl import TSaslClientTransport ModuleNotFoundError: No module named 'cloudera'
Have tried here and here but issue wasn't resolved.
Here is the packages installed till now:
bitarray0.8.1,certifi2017.7.27.1,chardet3.0.4,cm-api16.0.0,cx-Oracle6.0.1,future0.16.0,idna2.6,impyla0.14.0,JayDeBeApi1.1.1,JPype10.6.2,ply3.10,pure-sasl0.4.0,PyHive0.4.0,pyhs20.6.0,pyodbc4.0.17,requests2.18.4,sasl0.2.1,six1.10.0,teradata15.10.0.21,thrift0.10.0,thrift-sasl0.2.1,thriftpy0.3.9,urllib31.22
Error while using Impyla:
Traceback (most recent call last):
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\Scripts\HiveConnTester4.py", line 1, in <module>
from impala.dbapi import connect
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\dbapi.py", line 28, in <module>
import impala.hiveserver2 as hs2
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\hiveserver2.py", line 33, in <module>
from impala._thrift_api import (
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\_thrift_api.py", line 74, in <module>
include_dirs=[thrift_dir])
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\__init__.py", line 30, in load
include_dir=include_dir)
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\parser.py", line 496, in parse
url_scheme))
thriftpy.parser.exc.ThriftParserError: ThriftPy does not support generating module with path in protocol 'c'
thrift_sasl.py is trying cStringIO which is no longer available in Python 3.0. Try with python 2 ?
You may need to install an unreleased version of thrift_sasl. Try:
pip install git+https://github.com/cloudera/thrift_sasl
If you're comfortable learning PySpark, then you just need to setup the hive.metastore.uris property to point at the Hive Metastore address, and you're ready to go.
The easiest way to do that would be to export the hive-site.xml from the your cluster, then pass --files hive-site.xml during spark-submit.
(I haven't tried running standalone Pyspark, so YMMV)

Hive Server 2 error on python connect with hiveserver2

I am using
Centos , Python2.7 , hive 2.1 ,Hadoop 2.7.2 ,pyHive
here is code
from pyhive import hive
from TCLIService.ttypes import TOperationState
cursor = hive.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10', async=True)
#status = cursor.poll().operationState
#while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNI$
# logs = cursor.fetch_logs()
# for message in logs:
# print message
# If needed, an asynchronous query can be cancelled at any time with:
# cursor.cancel()
# status = cursor.poll().operationState
#print cursor.fetchall()
when I run python /usr/local/py/test5.py in terminal its showing ....
Traceback (most recent call last):
File "/usr/local/py/test5.py", line 3, in <module>
cursor = hive.connect('localhost').cursor()
File "/usr/local/lib/python2.7/site-packages/pyhive/hive.py", line 63, in connect
return Connection(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/pyhive/hive.py", line 104, in __init__
self._transport.open()
File "/usr/local/lib/python2.7/site-packages/thrift_sasl/__init__.py", line 80, in open
status, payload = self._recv_sasl_message()
File "/usr/local/lib/python2.7/site-packages/thrift_sasl/__init__.py", line 98, in _recv_sasl_message
header = read_all_compat(self._trans, 5)
File "/usr/local/lib/python2.7/site-packages/thrift_sasl/six.py", line 31, in <lambda>
read_all_compat = lambda trans, sz: trans.readAll(sz)
File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
chunk = self.read(sz - have)
File "/usr/local/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 120, in read
message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
Hive server error log showing after this...
ERROR [HiveServer2-Handler-Pool: Thread-41]: server.TThreadPoolServer (:()) - Thrift error occur$
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:228)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Also I had tried pyhs2 getting same error
what was going wrong ?
Thanks
I hava solved this error with these versions :
Centos 7 , Python2.7 , hive 2.1 ,Hadoop 2.7.3 and java java 1.7.0_91
using impyla
its working for me.
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
I had the same problem and solved it by
Start hiveserver2 (how to start hiveserver2)
Modify your port. The default port is 10000 or you can change it to follow your hive_site.xml

Connecting to a firebird superserver on windows using sqlalchemy

I am trying to connect to a firebird super server.I have the fdb package installed.
I am trying
from sqlalchemy import create_engine
engine = create_engine ('localhost:c:\fdbb\school.fdb')
I get this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build\bdist.win32\egg\sqlalchemy\engine\__init__.py", line 332, in creat
e_engine
File "build\bdist.win32\egg\sqlalchemy\engine\strategies.py", line 48, in crea
te
File "build\bdist.win32\egg\sqlalchemy\engine\url.py", line 154, in make_url
File "build\bdist.win32\egg\sqlalchemy\engine\url.py", line 196, in _parse_rfc
1738_args
sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'localhost
:c:♀dbb\school.fdb'
Am i doing this right?.
Solved it this way
import sqlalchemy
import fdb
engine =
create_engine('firebird+fdb://sysdba:masterkey#localhost:3050/c:/fdbb/school.fdb')

Categories

Resources