I am trying to connect to HiveServer2 from Python with the code below:
import pyhs2

with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Show databases
        print(cur.getDatabases())
        # Execute query
        cur.execute("select * from table")
        # Return column info from query
        print(cur.getSchema())
        # Fetch table results
        for i in cur.fetch():
            print(i)
I get the following error:
  File "C:\Users\vinbhask\AppData\Roaming\Python\Python36\site-packages\pyhs2-0.6.0-py3.6.egg\pyhs2\connections.py", line 7, in <module>
    from cloudera.thrift_sasl import TSaslClientTransport
ModuleNotFoundError: No module named 'cloudera'
I have tried the fixes suggested in two related answers, but the issue wasn't resolved.
These are the packages installed so far:
bitarray 0.8.1, certifi 2017.7.27.1, chardet 3.0.4, cm-api 16.0.0, cx-Oracle 6.0.1, future 0.16.0, idna 2.6, impyla 0.14.0, JayDeBeApi 1.1.1, JPype1 0.6.2, ply 3.10, pure-sasl 0.4.0, PyHive 0.4.0, pyhs2 0.6.0, pyodbc 4.0.17, requests 2.18.4, sasl 0.2.1, six 1.10.0, teradata 15.10.0.21, thrift 0.10.0, thrift-sasl 0.2.1, thriftpy 0.3.9, urllib3 1.22
Error while using Impyla:
Traceback (most recent call last):
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\Scripts\HiveConnTester4.py", line 1, in <module>
    from impala.dbapi import connect
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\dbapi.py", line 28, in <module>
    import impala.hiveserver2 as hs2
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\hiveserver2.py", line 33, in <module>
    from impala._thrift_api import (
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\_thrift_api.py", line 74, in <module>
    include_dirs=[thrift_dir])
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\__init__.py", line 30, in load
    include_dir=include_dir)
  File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\parser.py", line 496, in parse
    url_scheme))
thriftpy.parser.exc.ThriftParserError: ThriftPy does not support generating module with path in protocol 'c'
thrift_sasl.py tries to import cStringIO, which is no longer available in Python 3. Can you try with Python 2?
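For context, code that has to run on both Python 2 and Python 3 usually guards this import. The snippet below is only a sketch of that general pattern, not how thrift_sasl itself handles it; BufferIO is a name made up for the example.

```python
# Minimal sketch of a Python 2/3 compatible import for an in-memory buffer.
# BufferIO is a hypothetical alias used only in this example.
try:
    # Python 2: C-accelerated in-memory buffer
    from cStringIO import StringIO as BufferIO
except ImportError:
    # Python 3: cStringIO no longer exists; io.BytesIO holds bytes
    from io import BytesIO as BufferIO

buf = BufferIO()
buf.write(b"hello")
print(buf.getvalue())
```

A library that imports cStringIO unconditionally, without a fallback like this, fails at import time on Python 3.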
You may need to install an unreleased version of thrift_sasl. Try:
pip install git+https://github.com/cloudera/thrift_sasl
If you're comfortable learning PySpark, you just need to set the hive.metastore.uris property to point at the Hive Metastore address, and you're ready to go.
The easiest way to do that is to export hive-site.xml from your cluster, then pass --files hive-site.xml to spark-submit.
(I haven't tried running standalone PySpark, so YMMV.)
I'm trying to build an exe file that uses the teradataml Python package. The script creates a table in Teradata and loads data into it from a pandas DataFrame.
Here is my code:
import pandas as pd
from sqlalchemy import create_engine
from teradataml.context.context import *
from sqlalchemy import *
from teradataml.dataframe.copy_to import copy_to_sql
from sqlalchemy.dialects import registry
from teradatasqlalchemy import dialect
registry.register('teradata', 'teradatasqlalchemy', 'dialect')
user = 'dbc'
pasw=user
host = '192.168.1.7'
td_engine = create_engine('teradata://' + user + ':' + pasw + '@' + host)  # credentials and host are separated by '@'
create_context(tdsqlengine =td_engine)
df = pd.read_csv(r"C:/krishna/data/FL_insurance_sample1.csv", delimiter=',')
copy_to_sql(df = df, table_name = "Insurece_sample", primary_index="InsurenceID", if_exists="replace")
remove_context()
Initially I was getting the error below, but I have fixed that one:
sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:teradata
The PyInstaller command I tried:
pyinstaller --add-binary "C:\Users\krishna\AppData\Local\Programs\Python\Python38\Lib\site-packages\teradatasql\teradatasql.dll;teradatasql" -F pyinstalletest.py
The error I'm getting now:
Traceback (most recent call last):
File "pyinstalletest.py", line 18, in <module>
File "teradataml\context\context.py", line 459, in create_context
File "teradataml\context\context.py", line 751, in _load_function_aliases
File "teradataml\common\utils.py", line 1591, in _check_alias_config_file_exists
teradataml.common.exceptions.TeradataMlException: [Teradata][teradataml](TDML_2069) Alias config file 'C:\Users\krishna\AppData\Local\Temp\_MEI63962\teradataml\config\mlengine_alias_definitions_v1.0' is not defined for the current Vantage version 'vantage1.0'. Please add the config file.
[1660] Failed to execute script pyinstalletest
Please help me resolve this error.
I am on Windows 10, using Python 3.9.
I installed the package:
>python3 -m pip install mysql-connector-python
Now I am trying to run a simple program:
import mysql.connector
mydb = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="pythonDB"
)
mycursor = mydb.cursor()
mycursor.execute("SELECT * FORM customers")
myresult = mycursor.fetchall()
for x in myresult:
    print(x)
I am getting the following error when running
>python3 select.py
Traceback (most recent call last):
  File "G:\Python_w3school\mysql\select.py", line 1, in <module>
    import mysql.connector
  File "C:\Users\pawar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\mysql\connector\__init__.py", line 42, in <module>
    import dns.resolver
  File "C:\Users\pawar\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\dns\resolver.py", line 20, in <module>
    import socket
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\socket.py", line 54, in <module>
    import os, sys, io, selectors
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\selectors.py", line 12, in <module>
    import select
  File "G:\Python_w3school\mysql\select.py", line 3, in <module>
    mydb = mysql.connector.connect(
AttributeError: module 'mysql' has no attribute 'connector'
Try the answers to these related questions:
AttributeError: module 'mysql' has no attribute 'connector'
Attribute error: module 'mysql' has no attribute 'connector'
Basically, change the import to:
from mysql import connector
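Passing the SQL as a literal or through a variable makes no difference to execute(). However, the query in the question misspells FROM as FORM, which would be rejected as a syntax error once the import problem is out of the way.
With the keyword corrected, this worked for me: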
sql = "SELECT * FROM customers"
mycursor.execute(sql)
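A quick way to see what execute() actually cares about is to run both forms against a throwaway database. The sketch below uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption, as are the table contents): a literal and a variable behave the same, while the FORM spelling from the question is rejected.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server
# (assumption: DB-API drivers treat the SQL argument the same way).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (name TEXT)")
cur.execute("INSERT INTO customers VALUES ('Alice')")

# 1) SQL passed as a string literal
cur.execute("SELECT * FROM customers")
rows_literal = cur.fetchall()

# 2) The same SQL passed through a variable
sql = "SELECT * FROM customers"
cur.execute(sql)
rows_variable = cur.fetchall()

# 3) The misspelled keyword from the question
try:
    cur.execute("SELECT * FORM customers")
    typo_error = None
except sqlite3.OperationalError as exc:
    typo_error = exc

print(rows_literal == rows_variable)
print(type(typo_error).__name__)
conn.close()
```

The exception class differs between drivers, so treat the sqlite3 error type here as illustrative only.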
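Also try renaming the file, for example from select.py to select_customers.py. The traceback shows the standard library's selectors.py running import select and picking up your script G:\Python_w3school\mysql\select.py instead of the built-in select module, so your script runs again while mysql.connector is still only partly imported.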
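A rough way to catch this kind of name clash before running a script is to compare its file name with the standard-library module names. This is only a sketch: shadows_stdlib is a made-up helper, and sys.stdlib_module_names exists only on Python 3.10 and later. On older versions the fallback knows just the compiled-in modules, so it can miss names such as select.

```python
import sys
from pathlib import Path

def shadows_stdlib(script_path):
    """Return True if the script's file name matches a standard-library module name.

    Hypothetical helper for illustration; not part of any library.
    """
    stem = Path(script_path).stem
    # sys.stdlib_module_names is available on Python 3.10+; older versions
    # fall back to the much smaller tuple of compiled-in module names.
    known = getattr(sys, "stdlib_module_names", sys.builtin_module_names)
    return stem in known

print(shadows_stdlib("select.py"))
print(shadows_stdlib("select_customers.py"))
```

A script whose name passes this check can still clash with a third-party package, so this does not replace reading the traceback.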
I use pymongo to connect to my databases on a MongoDB server. I set everything up and followed a simple tutorial to start with the basics in pymongo. I ended up writing this into a Python file:
from pymongo import MongoClient
import pymongo.errors  # needed for DuplicateKeyError below
from random import randint

client = MongoClient("localhost", 27017)  # Class from PyMongo module
db = client["rothe_plana"]
# Initialize database settings for employers and events collections:
employersCollect = db["employers"]
eventsCollect = db["events"]

#-----------------------------------------------------
# Employer database management:
#-----------------------------------------------------
# Inserts passed dictionary objects of employer profiles:
def insertNewEmployer(new_employer_profile):
    while True:
        try:
            readyProfile = new_employer_profile.copy()
            readyProfile['employer_id'] = randint(100, 999)
            employersCollect.insert_one(readyProfile)  # pass the document to insert
        except pymongo.errors.DuplicateKeyError:
            continue  # retry with a new random ID (assumes a unique index on employer_id)
        break

def getListOfEmployerIDs():
    pass  # get employer ids to identify and render template elements.

# -----------------------------------------------------
# Events database management:
# -----------------------------------------------------
# Inserts passed dictionary objects of event data:
def insertNewEvent(new_event_data):
    while True:
        try:
            readyEventData = new_event_data.copy()  # copy the event data, not the employer profile
            readyEventData['event_id'] = randint(10000000, 99999999)
            eventsCollect.insert_one(readyEventData)  # events go into the events collection
        except pymongo.errors.DuplicateKeyError:
            continue
        break
But if I run this I get an exception:
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2018.1.4\helpers\pydev\pydevd.py", line 1664, in <module>
main()
File "C:\Program Files\JetBrains\PyCharm 2018.1.4\helpers\pydev\pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm 2018.1.4\helpers\pydev\pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm 2018.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/thoma/OneDrive/Projects_For_The_Web/Fliesen Rothe/PlanA/Pyramid_PlanA/pyramid_plana/datadbhandler.py", line 1, in <module>
from pymongo import MongoClient
File "C:\Users\thoma\OneDrive\Projects_For_The_Web\Fliesen Rothe\PlanA\Pyramid_PlanA\venv\lib\site-packages\pymongo\__init__.py", line 77, in <module>
from pymongo.collection import ReturnDocument
File "C:\Users\thoma\OneDrive\Projects_For_The_Web\Fliesen Rothe\PlanA\Pyramid_PlanA\venv\lib\site-packages\pymongo\collection.py", line 29, in <module>
from pymongo import (common,
File "C:\Users\thoma\OneDrive\Projects_For_The_Web\Fliesen Rothe\PlanA\Pyramid_PlanA\venv\lib\site-packages\pymongo\message.py", line 654, in <module>
_op_msg_uncompressed = _cmessage._op_msg
AttributeError: module 'pymongo._cmessage' has no attribute '_op_msg'
Since I certainly did not touch the PyMongo module code, I must be doing something wrong in my code above. A web search didn't turn up anything either, so is there a clear explanation for this?
EDIT: I had a closer look at the files named in the traceback, and the attribute in the specified module does exist. So that is quite strange. Even if I comment out the dependent line in pymongo, another AttributeError is raised for the same module.
I finally resolved the problem. It turned out that the permissions in my filesystem were not set up correctly.
I originally installed PyMongo through PyCharm (pip install pymongo), but that did not work (no idea why). I uninstalled pymongo from the virtual environment and reinstalled it manually from PowerShell inside the virtual environment:
python -m pip install pymongo
After restarting PyCharm, the project ran without errors. I hope this helps others with the same problem.
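When an install behaves differently in the IDE and in the shell, it can help to check which interpreter is running and which file a module would be loaded from. The following is a minimal sketch using only the standard library; describe_module is a made-up helper name, and json is used only because it ships with Python.

```python
import importlib.util
import sys

def describe_module(name):
    """Report the running interpreter and the file `name` would be imported from.

    Hypothetical helper for illustration; not part of any library.
    """
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "module": name,
        # origin is None when the module cannot be found at all
        "origin": spec.origin if spec is not None else None,
    }

# "json" stands in for the package under investigation; in the situation
# described above you would pass "pymongo" instead.
info = describe_module("json")
print(info["interpreter"])
print(info["origin"])
```

Running this once from PyCharm and once from PowerShell shows whether both use the same virtual environment and the same copy of the package.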
from pyhive import hive
import thrift_sasl
connection = hive.Connection(host='myhost', port=10000, database='local')
# hangs here

from sqlalchemy import create_engine, MetaData, Table
engine = create_engine('hive://myhost:10000/local')
logs = Table('mytable', MetaData(bind=engine), autoload=True)
# also hangs here
Both of these snippets will hang for me.
Hitting ctrl+c stops the execution here:
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/apps/Python/lib/python2.7/site-packages/pyhive/hive.py", line 86, in __init__
self._transport.open()
File "thrift_sasl.py", line 74, in open
status, payload = self._recv_sasl_message()
File "thrift_sasl.py", line 92, in _recv_sasl_message
header = self._trans.readAll(5)
File "/apps/Python/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
chunk = self.read(sz - have)
File "/apps/Python/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 105, in read
buff = self.handle.recv(sz)
KeyboardInterrupt
I am using Hive 0.12 and HiveServer2. I can connect to it using the python Hive library provided with Hadoop (.../hive/lib/py) but cannot do so with pyhive, which uses thrift_sasl.
Some people not using the thrift_sasl module suggest turning off SASL support in hive-site.xml via:
<property><name>hive.server2.authentication</name><value>NOSASL</value></property>
However, after trying this, the code still hung, and I got the same stack trace when I issued a KeyboardInterrupt.
I am trying to connect to Cassandra from Python. I installed the client with pip install pycassa. When I try to connect to Cassandra, I get the following exception:
from pycassa.pool import ConnectionPool
pool = ConnectionPool('Keyspace1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/pycassa/pool.py", line 382, in __init__
self.fill()
File "/usr/lib/python2.7/site-packages/pycassa/pool.py", line 442, in fill
conn = self._create_connection()
File "/usr/lib/python2.7/site-packages/pycassa/pool.py", line 431, in _create_connection
(exc.__class__.__name__, exc))
pycassa.pool.AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was TTransportException: Could not connect to localhost:9160
I am using Python 2.7.
What is the problem? Any help would be appreciated.
Perhaps try specifying the host:
pool = ConnectionPool('Keyspace1', ['server_node_here:9160'])
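Before changing client settings, it may be worth checking that anything is listening on that host and port at all. The snippet below is a small sketch using only the standard library; can_connect is a made-up helper, and 9160 is simply the port from the error message.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only tests reachability,
    not whether the service speaks the expected protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures
        return False

if __name__ == "__main__":
    # 9160 is the Thrift port named in the error message above.
    print(can_connect("localhost", 9160))
```

If this prints False, the problem is on the server or network side (service not running, different port, firewall) rather than in the pycassa call.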
A general way to connect to Cassandra from Python:
from cassandra.cluster import Cluster
cluster = Cluster()  # connects to localhost
# For a multi-node cluster, pass the node addresses instead:
# cluster = Cluster(['192.168.0.1', '192.168.0.2'])
session = cluster.connect('testing')
You can also connect using a model class:
from cassandra.cqlengine import columns
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.management import sync_table
from cassandra.cqlengine import connection
import uuid
from datetime import datetime
connection.setup(['127.0.0.1'], "testing") #testing is the keyspace
For a detailed model class implementation, take a look at: https://github.com/vishal-kr-yadav/NoSQL_Databases