This is the code we use to connect to our Hive database. It ran fine a week ago, but now it fails to even open a session and get a cursor to execute queries. The issue was temporarily fixed when I explicitly added a cursor.close() call, but now it's back again. I am unable to access the Hive database using Python.
I have tried both pyhs2 and pyhive; both libraries fail to connect to the Hive database. Nothing on the cluster has changed so far. What could the reason for this be?
I know Hive is not a relational DB, so the concept of cursors doesn't quite apply, but is there any way the Hive database remembers the cursors created through the pyhive library? If so, how can I delete the currently unused cursors?
Here is the code and the exception it raises upon execution:
from pyhive import hive
import contextlib

class Hive():
    def __init__(self, host="[hostnamehere]", db="default", port=10000,
                 auth="KERBEROS", kerberos_service_name="hive"):
        self.host = host
        self.db = db
        self.port = port
        self.auth = auth
        self.kerberos_service_name = kerberos_service_name

    def connect(self):
        return hive.connect(host=self.host, port=self.port, database=self.db,
                            auth=self.auth,
                            kerberos_service_name=self.kerberos_service_name)

    def query_one(self, sql):
        # contextlib.closing() already closes both objects on exit;
        # the explicit cursor.close() below was the temporary fix mentioned above
        with contextlib.closing(self.connect()) as connection:
            with contextlib.closing(connection.cursor()) as cursor:
                cursor.execute(sql)
                result = cursor.fetchone()  # PyHive's method is fetchone(), not fetch_one()
                cursor.close()
                return result

if __name__ == "__main__":
    connector = Hive()
    print("running query")
    print(connector.query_one("SELECT * FROM [tablenamehere]"))
raise OperationalError(response)
pyhive.exc.OperationalError: TOpenSessionResp(status=TStatus(statusCode=3, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient:13:12', 'org.apache.hive.service.cli.session.SessionManager:openSession:SessionManager.java:289', 'org.apache.hive.service.cli.CLIService:openSession:CLIService.java:199', 'org.apache.hive.service.cli.thrift.ThriftCLIService:getSessionHandle:ThriftCLIService.java:427', 'org.apache.hive.service.cli.thrift.ThriftCLIService:OpenSession:ThriftCLIService.java:319', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession:getResult:TCLIService.java:1257', 'org.apache.hive.service.cli.thrift.TCLIService$Processor$OpenSession:getResult:TCLIService.java:1242', 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', 'org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor:process:HadoopThriftAuthBridge.java:562', 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286', 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1149', 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:624', 'java.lang.Thread:run:Thread.java:748', '*java.lang.RuntimeException:java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient:15:2', 'org.apache.hadoop.hive.ql.session.SessionState:start:SessionState.java:547', 'org.apache.hive.service.cli.session.HiveSessionImpl:open:HiveSessionImpl.java:144', 'org.apache.hive.service.cli.session.SessionManager:openSession:SessionManager.java:281', '*java.lang.RuntimeException:Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient:21:6', 'org.apache.hadoop.hive.metastore.MetaStoreUtils:newInstance:MetaStoreUtils.java:1566', 'org.apache.hadoop.hive.metastore.RetryingMetaStoreClient::RetryingMetaStoreClient.java:92', 'org.apache.hadoop.hive.metastore.RetryingMetaStoreClient:getProxy:RetryingMetaStoreClient.java:138', 'org.apache.hadoop.hive.metastore.RetryingMetaStoreClient:getProxy:RetryingMetaStoreClient.java:110', 'org.apache.hadoop.hive.ql.metadata.Hive:createMetaStoreClient:Hive.java:3510', 'org.apache.hadoop.hive.ql.metadata.Hive:getMSC:Hive.java:3542', 'org.apache.hadoop.hive.ql.session.SessionState:start:SessionState.java:528', '*java.lang.reflect.InvocationTargetException:null:25:4', 'sun.reflect.NativeConstructorAccessorImpl:newInstance0:NativeConstructorAccessorImpl.java:-2', 'sun.reflect.NativeConstructorAccessorImpl:newInstance:NativeConstructorAccessorImpl.java:62', 'sun.reflect.DelegatingConstructorAccessorImpl:newInstance:DelegatingConstructorAccessorImpl.java:45', 'java.lang.reflect.Constructor:newInstance:Constructor.java:423', 'org.apache.hadoop.hive.metastore.MetaStoreUtils:newInstance:MetaStoreUtils.java:1564', '*org.apache.hadoop.hive.metastore.api.MetaException:GC overhead limit exceeded:30:4', 'org.apache.hadoop.hive.metastore.RetryingHMSHandler::RetryingHMSHandler.java:82', 'org.apache.hadoop.hive.metastore.RetryingHMSHandler:getProxy:RetryingHMSHandler.java:91', 'org.apache.hadoop.hive.metastore.HiveMetaStore:newRetryingHMSHandler:HiveMetaStore.java:6463', 'org.apache.hadoop.hive.metastore.HiveMetaStoreClient::HiveMetaStoreClient.java:206', 
'org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient::SessionHiveMetaStoreClient.java:76', '*java.lang.OutOfMemoryError:GC overhead limit exceeded:0:-1'], sqlState=None, errorCode=0, errorMessage='Failed to open new session: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient'), serverProtocolVersion=7, sessionHandle=None, configuration=None)
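On the cursor question: as far as I know, a PyHive cursor corresponds to a HiveServer2 operation and a connection to a session, and closing each explicitly asks the server to release that state. A minimal sketch of fully explicit cleanup, under the same Kerberos assumptions as the class above:

from pyhive import hive

conn = hive.connect(host="[hostnamehere]", port=10000, database="default",
                    auth="KERBEROS", kerberos_service_name="hive")
try:
    cursor = conn.cursor()
    try:
        cursor.execute("SELECT 1")
        print(cursor.fetchone())
    finally:
        cursor.close()  # closes the server-side operation
finally:
    conn.close()        # closes the HiveServer2 session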
Related
Using Flask, I'm attempting to build a web application with user authentication, but the database is not being created and I keep receiving an OperationalError from SQLAlchemy.
When I use SQLite it works perfectly, but when I use MySQL I get the above-mentioned error.
I am providing SQLALCHEMY_DATABASE_URI='mysql+pymysql://username:password@localhost:3306/db_name' in the environment file.
Is it acceptable to use the Python MySQL (pymysql) connector to create the database?
import pymysql as pm

def creating_user_db():
    mydb = None
    cursor = None
    try:
        mydb = pm.connect(host='localhost',
                          user='username',
                          passwd='password')
        mydb.autocommit(False)  # autocommit is a method in PyMySQL, not an attribute
        cursor = mydb.cursor()
        # Use the cursor created above instead of opening a second one
        cursor.execute('CREATE DATABASE IF NOT EXISTS db_name;')
    except Exception as e:
        print(e)
    finally:
        # Guard the cleanup in case connect() itself failed
        if cursor is not None:
            cursor.close()
        if mydb is not None:
            mydb.close()
Please correct me where I made a mistake and assist me with this.
Thanks in advance!
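For context, a minimal sketch of wiring that URI into Flask-SQLAlchemy once the database exists (this assumes Flask-SQLAlchemy is in use; username, password, and db_name are placeholders):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
# The '@' separates the credentials from the host in the URI
app.config['SQLALCHEMY_DATABASE_URI'] = (
    'mysql+pymysql://username:password@localhost:3306/db_name'
)
db = SQLAlchemy(app)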
I'm trying to create a Prefect task that receives an instance of a PyMySQL connection as input, such as:
import pymysql
import pandas as pd
from typing import Any
from prefect import task, Flow

@task
def connect_db():
    connection = pymysql.connect(user=user,
                                 password=password,
                                 host=host,
                                 port=port,
                                 db=db,
                                 connect_timeout=5,
                                 cursorclass=pymysql.cursors.DictCursor,
                                 local_infile=True)
    return connection

@task
def query_db(connection) -> Any:
    query = 'SELECT * FROM myschema.mytable;'
    with connection.cursor() as cur:
        cur.execute(query)
        rows = cur.fetchall()
    return rows

@task
def get_df(rows) -> Any:
    return pd.DataFrame(rows, dtype=str)

@task
def save_csv(df):
    path = 'mypath'
    df.to_csv(path, sep=';', index=False)

with Flow(FLOW_NAME) as f:
    con = connect_db()
    rows = query_db(con)
    df = get_df(rows)
    save_csv(df)
However, when I try to register the resulting flow, it raises "TypeError: cannot pickle 'socket' object". Going through Prefect's docs, I found the built-in MySQL tasks (https://docs.prefect.io/api/latest/tasks/mysql.html#mysqlexecute), but they open and close a connection each time they're called. Is there any way to pass a previously opened connection to a Prefect task (or to implement something like a connection manager)?
I tried to replicate your example, but it registers fine. The most common way an error like this pops up is if you have a client in the global namespace that the flow uses; Prefect will try to serialize that upon registration. For example, the following code snippet will error if you try to register it:
import pymysql
from typing import Any
from prefect import task, Flow

connection = pymysql.connect(user=user,
                             password=password,
                             host=host,
                             port=port,
                             db=db,
                             connect_timeout=5,
                             cursorclass=pymysql.cursors.DictCursor,
                             local_infile=True)

@task
def query_db(connection) -> Any:
    query = 'SELECT * FROM myschema.mytable;'
    with connection.cursor() as cur:
        cur.execute(query)
        rows = cur.fetchall()
    return rows

with Flow(FLOW_NAME) as f:
    rows = query_db(connection)
This errors because the connection variable is serialized along with the flow object. You can work around this by storing your flow as a script. See this link for more information:
https://docs.prefect.io/core/idioms/script-based.html#using-script-based-flow-storage
This avoids serializing the Flow object and creates the connection at runtime.
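A minimal sketch of script-based storage, assuming Prefect 1.x; the file path and project name are placeholders:

from prefect.storage import Local

# stored_as_script=True keeps the flow as source code, so objects in the
# global namespace (like an open connection) are not pickled at registration
f.storage = Local(path="/path/to/this_flow.py", stored_as_script=True)
f.register(project_name="my-project")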
If this happens during runtime
If you encounter this error during runtime, there are two possible causes: Dask serializing the connection, or Prefect checkpointing.
Dask uses cloudpickle to send data to the workers across a network. So if you use Prefect with a DaskExecutor, it will use cloudpickle to send the tasks for execution, and task inputs and outputs therefore need to be serializable. In this scenario, you should instantiate the client and perform the query inside a single task (like you saw in the current MySQL Task implementation).
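Something like this sketch, which keeps the connection inside one task so the socket never crosses a task boundary (the connection parameters are placeholders):

import pymysql
from typing import Any
from prefect import task

@task
def query_db() -> Any:
    # Open, use, and close the connection within a single task so the
    # socket never has to be pickled between tasks or workers
    connection = pymysql.connect(user=user, password=password,
                                 host=host, port=port, db=db,
                                 cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cur:
            cur.execute('SELECT * FROM myschema.mytable;')
            return cur.fetchall()
    finally:
        connection.close()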
If you use a LocalExecutor, task outputs are serialized by default because checkpointing is on by default. You can turn this off by passing checkpoint=False when you define the task.
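For example:

@task(checkpoint=False)  # skip serializing this task's output
def connect_db():
    ...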
If you need further help, feel free to join the Prefect Slack channel at prefect.io/slack.
I have a MySQL server installed locally, and I have Python code that accesses the MySQL database and executes a simple query:
from mysql.connector import connect
from mysql.connector import ProgrammingError

DB = {
    'user': 'andrei',
    'password': 'qwertttyy',
    'host': 'localhost',
    'port': '3306',
    'db': 'my_database'
}

class Connection:
    # Singleton: reuse one connection for the whole process
    instance = None

    def __new__(cls):
        if not cls.instance:
            try:
                cls.instance = connect(**DB)
            except:
                raise
        return cls.instance

def executeDQL(query):
    cnx = Connection()
    cursor = cnx.cursor()
    try:
        cursor.execute(query)
        return cursor.fetchall()
    except ProgrammingError as err:
        print('You have an error in your MySQL syntax. Please check and retry')
        return []

if __name__ == '__main__':
    while True:
        query = input('Enter a SQL query: ')
        for row in executeDQL(query):
            print(row)
If I go out there and find a cloud MySQL hosting service and pay for it, would access be as easy as changing the DB mapping to the new connection info?
I think it should be, because the connection would still be over standard TCP/IP; in the local case it just happens to come back to the same machine that emitted it. I guess that, under the hood, the data is packed following TCP/IP rules up to the IP layer, and then transferred as IP packets from the Python process through the OS networking API to the MySQL server listening on the port, without further processing down into the network access layer, since the packets never leave the machine. As I understand it, the purpose of the network access layer of the TCP/IP stack is to abstract the physical road the data takes.
Did I say something coherent in my guessing?
If I'm wrong, how can I put a MySQL server in the cloud?
Yes, how you connect to the database would not change. It will be as simple as changing the host name and providing whatever credentials you need (access token, user info, etc.). The way you insert data doesn't change once you make a connection to the DB.
Here is a good script which should provide some info: https://gist.github.com/kirang89/7161185
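Concretely, only the DB mapping from the question needs to change; the hostname below is a made-up example of a hosted endpoint:

from mysql.connector import connect

# Same code as before; only the connection parameters differ.
# The host value is a hypothetical cloud endpoint (e.g. an RDS instance).
DB = {
    'user': 'andrei',
    'password': 'qwertttyy',
    'host': 'mydb-instance.abc123.us-east-1.rds.amazonaws.com',
    'port': '3306',
    'db': 'my_database'
}
cnx = connect(**DB)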
When running this code to connect to a DB through cmd, it works fine both locally and on the actual server. But I have set it up on Jenkins and receive the error:
DatabaseError: file is encrypted or is not a database
It seems to be happening on this line:
self.cursor.execute(*args)
The database class is:
import sqlite3

class DatabaseManager(object):
    def __init__(self, db):
        self.conn = sqlite3.connect(db)
        self.cursor = self.conn.cursor()

    def query(self, *args):
        self.cursor.execute(*args)
        self.conn.commit()
        return self.cursor

    def __del__(self):
        self.conn.close()
The versions of Python's sqlite3 module and the command-line sqlite3 can differ. Create your database from the script, i.e. code the DB initialization in the script rather than from cmd, and it might solve the problem.
What is the value of *args? Does it have the same values when you run it through cmd and through Jenkins? Did you check the path and location of the DB relative to the Jenkins location?
It's most probably a version mismatch between the SQLite CLI you're using and the one bundled with Python. You can have such a mismatch on the same server, too. To be sure, you can use Python instead of the SQLite CLI to create your DB on the server. Provided you have all your SQLite init statements in path/to/your_sql.sql, you can initialize the path/to/your_database.db database with a script like:
import sqlite3

connection = sqlite3.connect("path/to/your_database.db")
cursor = connection.cursor()

with open("path/to/your_sql.sql", "r") as f:
    # executescript() runs multiple statements in one call;
    # plain execute() would fail on a multi-statement init file
    cursor.executescript(f.read())

connection.commit()
Then try to load that DB from your Jenkins job.
I want to connect to Hive using Python, but only over a JDBC connection. I have tried pyhive and it works fine, but I need to connect Python to Hive using JDBC.
I am trying the code below to connect Python to Hive using a JDBC connection:
import jaydebeapi

def get_hive_jdbc_con():
    driver = "org.apache.hive.jdbc.HiveDriver"
    # Use real placeholders; the original string had none, so .format() was a no-op
    conn_url = "jdbc:hive2://{host}:{port}/default".format(host="system101.xxx.com",
                                                           port=10000)
    auth_lst = ["hive", "hive"]
    conn = jaydebeapi.connect(driver, conn_url, auth_lst)
    return conn

if __name__ == '__main__':
    get_hive_jdbc_con()
I am getting this error:
java.lang.RuntimeExceptionPyRaisable: java.lang.RuntimeException: Class org.apache.hive.jdbc.HiveDriver not found
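That error usually means the JVM started by jaydebeapi cannot find the Hive JDBC driver on its classpath. jaydebeapi.connect() accepts a jars argument pointing at the driver JAR; a minimal sketch, where the JAR path is a placeholder for your actual hive-jdbc standalone JAR:

import jaydebeapi

# The jars path below is hypothetical; point it at your real
# hive-jdbc-<version>-standalone.jar so the JVM can load HiveDriver
conn = jaydebeapi.connect(
    "org.apache.hive.jdbc.HiveDriver",
    "jdbc:hive2://system101.xxx.com:10000/default",
    ["hive", "hive"],
    jars="/path/to/hive-jdbc-standalone.jar",
)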