I am trying to connect to an Exasol database through SQLAlchemy.
I installed the SQLAlchemy Exasol dialect with:
pip install sqlalchemy-exasol
Code is below:
from sqlalchemy import create_engine
e = create_engine("exa+pyodbc://<user>:<password>#<host>:<port>/<schema>?CONNECTIONLCALL=en_US.UTF-8&driver=com.exasol.jdbc.EXADriver")
e.execute("Select count(*) from TableA").fetchall()
I have also tried this:
e = create_engine("exa+pyodbc://<user>:<password>#<host>:<port>/<schema>")
Either way I get the following error:
Traceback (most recent call last):
File "C:\Users\xxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\sqlalchemy\pool.py", line 1122, in _do_get
return self._pool.get(wait, self._timeout)
File "C:\Users\xx\AppData\Local\Continuum\Anaconda3\lib\site-packages\sqlalchemy\util\queue.py", line 145, in get
raise Empty
sqlalchemy.util.queue.Empty
During handling of the above exception, another exception occurred:
sqlalchemy.exc.InterfaceError: (pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Any ideas? Something must be wrong with the format of my connection details, but I am not sure what...
The error message suggests the ODBC DSN/driver is not configured properly. Install the Exasol ODBC driver and make sure the driver name (or DSN) in your connection string matches what the ODBC Driver Manager knows about. Also note that com.exasol.jdbc.EXADriver is a JDBC class name, which pyodbc cannot use, and the separator between the credentials and the host should be "@", not "#".
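A minimal sketch of a working connection, once the ODBC driver is installed (the host, port, credentials and the driver name EXAODBC are placeholders; use whatever name your ODBC installation registers):

from sqlalchemy import create_engine

# All connection values below are placeholders; "EXAODBC" is an assumed ODBC driver name.
e = create_engine(
    "exa+pyodbc://user:password@192.168.1.2:8563/my_schema"
    "?CONNECTIONLCALL=en_US.UTF-8&driver=EXAODBC"
)
with e.connect() as conn:
    print(conn.execute("SELECT COUNT(*) FROM TableA").fetchall())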
spark.sql("select * from type_match")
2022-04-19 10:31:33 WARN FileStreamSink:66 - Error while looking for metadata directory.
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Users\z00635559\PycharmProjects\pythonProject2\venv\lib\site-packages\pyspark\sql\session.py", line 710, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "C:\Users\z00635559\PycharmProjects\pythonProject2\venv\lib\site-packages\py4j\java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "C:\Users\z00635559\PycharmProjects\pythonProject2\venv\lib\site-packages\pyspark\sql\utils.py", line 63, in deco
return f(*a, **kw)
File "C:\Users\z00635559\PycharmProjects\pythonProject2\venv\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o22.sql.
: java.util.concurrent.ExecutionException: org.apache.hadoop.net.ConnectTimeoutException: Call From A191136324/10.58.0.0 to 10.58.0.1:9000 failed on socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=10.58.245.43/10.58.245.43:9000]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.spark_project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at org.spark_project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
at org.spark_project.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:137)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:227)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:264)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:255)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
Since I moved to another workplace, my IP address has changed, but Hive still references the old one: the error shows a call from my new IP to the old IP. Hive itself runs fine and I can query tables in Hive, but when I query a table from PySpark it hangs for a while and then fails, saying it is calling the wrong IP. Are there any settings I should modify?
PS: I have already updated the DBS and SDS tables in the MySQL metastore; I can access data from Hive, but I still cannot query data from Spark.
Thanks,
Have you copied hive-site.xml into the Spark conf folder?
It is needed because Spark SQL otherwise uses its default metastore (Derby), so it has no information about the Hive metastore.
So copy hive-site.xml into the Spark conf folder ($SPARK_HOME/conf).
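For example (the source path below is only typical; it depends on where your Hive configuration lives):
cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/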
You also need to register the table before executing the SQL. The code below worked for me when accessing a Hive table from PySpark.
from pyspark.sql import HiveContext

hiveContext = HiveContext(sc)
# Load the Hive table and register it as a temporary table for SQL queries
table = hiveContext.table("schema.table")
table.registerTempTable("table_name")
hiveContext.sql("select * from table_name").show()
I'm upgrading a database application from Python 2.7 to 3.4. I originally used MySQLdb, but I'm trying to switch to mysql-connector. My database connection fails with an internal TypeError. Here's the code and the error:
import mysql.connector

try:
    dbh = mysql.connector.connect("localhost", "pyuser", "pypwd", "jukebox")
except mysql.connector.Error as err:
    print("Failed opening database: {}".format(err))
    exit(1)
and here's the error:
# python loadcd.py
Traceback (most recent call last):
File "loadcd.py", line 12, in <module>
dbh = mysql.connector.connect("127.0.0.1","pyuser","pypwd","jukebox")
File "/usr/local/lib/python3.4/dist-packages/mysql/connector/__init__.py", line 179, in connect
return MySQLConnection(*args, **kwargs)
File "/usr/local/lib/python3.4/dist-packages/mysql/connector/connection.py", line 57, in __init__
super(MySQLConnection, self).__init__(*args, **kwargs)
TypeError: __init__() takes 1 positional argument but 5 were given
I'm at a loss. The exact same connection works with mysqldb, and I can connect with the same credentials using PHP or at the command line.
You should provide keyword arguments; unlike MySQLdb, mysql.connector does not accept the connection parameters positionally:
dbh = mysql.connector.connect(host="localhost", user="pyuser", password="pypwd", database="jukebox")
Use this and replace the values as per your configuration:
import mysql.connector

mydb = mysql.connector.connect(host="localhost",
                               user="yourusername",
                               password="yourpassword")
print(mydb)
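For completeness, a minimal end-to-end sketch that also selects a database and runs a query (all connection values are placeholders for your own configuration):

import mysql.connector

mydb = mysql.connector.connect(host="localhost",
                               user="yourusername",
                               password="yourpassword",
                               database="yourdatabase")
cur = mydb.cursor()
cur.execute("SELECT VERSION()")   # simple query to verify the connection works
print(cur.fetchone())
cur.close()
mydb.close()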
import adodbapi

connstr = """Provider=Microsoft.SQLSERVER.CE.OLEDB.3.5;DataSource=first.sdf;"""
conn = adodbapi.connect(connstr)
cur = conn.cursor()
getresult = "select * from ft"
cur.execute(getresult)
result = cur.fetchall()
How can I solve the following error?
Traceback (most recent call last):
File "e:\python1\sqlcompactdb\compact.py", line 7, in <module>
connection = adodbapi.connect(connection_string)
File "C:\Users\khan\AppData\Local\Programs\Python\Python36-32\lib\site-packages\adodbapi\adodbapi.py", line 116, in connect
raise api.OperationalError(e, message)
adodbapi.apibase.OperationalError: (InterfaceError("Windows COM Error: Dispatch('ADODB.Connection') failed.",), 'Error opening connection to "Provider=Microsoft.SQLSERVER.CE.OLEDB.4.0; Data Source=E:\\python1\\sqlcompact\\first.sdf;"')
As the error implies, the failure happens when the module tries to create the ADO database connection.
Specifically, it occurs when the following code executes:
pythoncom.CoInitialize()
c = win32com.client.Dispatch('ADODB.Connection')
This is most likely an environment issue, such as the required OLE DB provider not being installed, or being installed for a different bitness (32-bit vs 64-bit) than your Python interpreter.
Solutions to a similar problem can be found at Connecting to SQLServer 2005 with adodbapi
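As a quick sanity check you can try the Dispatch call directly (a minimal sketch using pywin32, assuming it is installed); if it fails, the provider is not registered for your Python's bitness:

import pythoncom
import win32com.client

pythoncom.CoInitialize()
try:
    conn = win32com.client.Dispatch('ADODB.Connection')   # the same call adodbapi makes internally
    print("ADODB.Connection dispatch succeeded")
except Exception as exc:
    print("Dispatch failed:", exc)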
While connecting to HiveServer2 using Python with the code below:
import pyhs2

with pyhs2.connect(host='localhost',
                   port=10000,
                   authMechanism="PLAIN",
                   user='root',
                   password='test',
                   database='default') as conn:
    with conn.cursor() as cur:
        # Show databases
        print cur.getDatabases()

        # Execute query
        cur.execute("select * from table")

        # Return column info from query
        print cur.getSchema()

        # Fetch table results
        for i in cur.fetch():
            print i
I am getting below error:
File "C:\Users\vinbhask\AppData\Roaming\Python\Python36\site-packages\pyhs2-0.6.0-py3.6.egg\pyhs2\connections.py", line 7, in <module>
    from cloudera.thrift_sasl import TSaslClientTransport
ModuleNotFoundError: No module named 'cloudera'
I have tried the solutions here and here, but the issue wasn't resolved.
Here are the packages installed so far:
bitarray==0.8.1, certifi==2017.7.27.1, chardet==3.0.4, cm-api==16.0.0, cx-Oracle==6.0.1, future==0.16.0, idna==2.6, impyla==0.14.0, JayDeBeApi==1.1.1, JPype1==0.6.2, ply==3.10, pure-sasl==0.4.0, PyHive==0.4.0, pyhs2==0.6.0, pyodbc==4.0.17, requests==2.18.4, sasl==0.2.1, six==1.10.0, teradata==15.10.0.21, thrift==0.10.0, thrift-sasl==0.2.1, thriftpy==0.3.9, urllib3==1.22
Error while using Impyla:
Traceback (most recent call last):
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\Scripts\HiveConnTester4.py", line 1, in <module>
from impala.dbapi import connect
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\dbapi.py", line 28, in <module>
import impala.hiveserver2 as hs2
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\hiveserver2.py", line 33, in <module>
from impala._thrift_api import (
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\impala\_thrift_api.py", line 74, in <module>
include_dirs=[thrift_dir])
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\__init__.py", line 30, in load
include_dir=include_dir)
File "C:\Users\xxxxx\AppData\Local\Programs\Python\Python36-32\lib\site-packages\thriftpy\parser\parser.py", line 496, in parse
url_scheme))
thriftpy.parser.exc.ThriftParserError: ThriftPy does not support generating module with path in protocol 'c'
thrift_sasl.py tries to import cStringIO, which is no longer available in Python 3. Try with Python 2?
You may need to install an unreleased version of thrift_sasl. Try:
pip install git+https://github.com/cloudera/thrift_sasl
If you're comfortable learning PySpark, then you just need to set the hive.metastore.uris property to point at the Hive metastore address, and you're ready to go.
The easiest way to do that would be to export the hive-site.xml from your cluster and then pass --files hive-site.xml during spark-submit.
(I haven't tried running standalone PySpark, so YMMV.)
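A minimal PySpark sketch of that approach (the metastore host and port below are placeholders for your cluster):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-metastore-example")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")  # placeholder address
         .enableHiveSupport()
         .getOrCreate())

spark.sql("show databases").show()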
I know that I need to "import MySQLdb" to connect to a MySQL database, but what is the name of the library I need to import when using ClearDB MySQL?
I am hard-coding the connection as below, but I suspect I am getting errors because of the wrong library. Here are the points describing my situation:
1) I have installed MySQLdb and imported it with the import keyword.
2) When I use the port number in the connection call, I get "TypeError: an integer is required".
db = MySQLdb.connect("server_IP",3306,"uid","pwd","db_name")
so I removed the port number
import MySQLdb
db = MySQLdb.connect("server_IP","uid","pwd","db_name")
cur = db.cursor()
and the error vanishes. Is that the right method?
3) Everything goes fine until I call cursor.execute("SELECT VERSION()") to run SQL queries:
cursor.execute("SELECT VERSION()")
I get the error below:
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
cursor.execute("use d_7fc249f763d6fc2")
File "path_to\cursors.py", line 205, in execute
self.errorhandler(self, exc, value)
File "path_to\connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
OperationalError: (2006, 'MySQL server has gone away')
So, is this happening because of the library I have imported? Or, if the library is correct, what could the issue be?
The port number is the fifth positional argument, not the second.
db = MySQLdb.connect("server_IP", "uid", "pwd", "db_name", 3306)
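Alternatively, keyword arguments avoid the positional-order problem entirely (the credentials below are placeholders for your ClearDB details):

import MySQLdb

db = MySQLdb.connect(host="server_IP", user="uid", passwd="pwd",
                     db="db_name", port=3306)
cur = db.cursor()
cur.execute("SELECT VERSION()")   # quick check that the connection works
print(cur.fetchone())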