How to connect to a cluster in Amazon Redshift using SQLAlchemy? - python

In Amazon Redshift's Getting Started Guide, it's mentioned that you can utilize SQL client tools that are compatible with PostgreSQL to connect to your Amazon Redshift Cluster.
In the tutorial, they utilize SQL Workbench/J client, but I'd like to utilize python (in particular SQLAlchemy). I've found a related question, but the issue is that it does not go into the detail or the python script that connects to the Redshift Cluster.
I've been able to connect to the cluster via SQL Workbench/J, since I have the JDBC URL, as well as my username and password, but I'm not sure how to connect with SQLAlchemy.
Based on this documentation, I've tried the following:
from sqlalchemy import create_engine
engine = create_engine('jdbc:redshift://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy')
ERROR:
Could not parse rfc1738 URL from string 'jdbc:redshift://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy'

I don't think SQLAlchemy "natively" knows about Redshift. You need to drop the jdbc: prefix and change the URL to use the postgresql scheme:
postgresql://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy
Alternatively, you may want to try using sqlalchemy-redshift using the instructions they provide.
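As a rough sketch of the conversion described above, the JDBC URL from SQL Workbench/J can be rewritten into the form SQLAlchemy expects using only the standard library. The helper name jdbc_to_sqlalchemy_url, the placeholder credentials, and the fallback to port 5439 are assumptions for illustration, not part of any library:

```python
from urllib.parse import quote_plus, urlsplit


def jdbc_to_sqlalchemy_url(jdbc_url, user, password, scheme="postgresql"):
    """Rewrite a JDBC-style URL as a SQLAlchemy-style URL (hypothetical helper)."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with 'jdbc:'")
    # Drop the 'jdbc:' prefix so the standard URL parser can split the rest.
    parts = urlsplit(jdbc_url[len(prefix):])
    # Assumption: fall back to 5439 when the JDBC URL carries no port.
    port = parts.port or 5439
    # Percent-encode credentials so characters such as '@' or '/' stay unambiguous.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"{scheme}://{credentials}@{parts.hostname}:{port}{parts.path}"


url = jdbc_to_sqlalchemy_url(
    "jdbc:redshift://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy",
    user="my_user",        # placeholder credentials
    password="p@ss/word",  # placeholder credentials
)
print(url)
```

The resulting string can then be passed to create_engine; with sqlalchemy-redshift installed, scheme="redshift+psycopg2" would be the corresponding choice.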

I was running into the exact same issue, and then I remembered to include my Redshift credentials:
eng = create_engine('postgresql://[LOGIN]:[PASSWORD]@shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy')

sqlalchemy-redshift works for me, but only after a few days of research.
Packages (Python 3.4):
SQLAlchemy==1.0.14 sqlalchemy-redshift==0.5.0 psycopg2==2.6.2
First I checked that my query works in SQL Workbench/J (http://www.sql-workbench.net), then I got it working in SQLAlchemy (https://stackoverflow.com/a/33438115/2837890 helped me understand that autocommit or session.commit() is required):
from sqlalchemy import create_engine, text
# config is assumed to be an already-parsed configuration holding the Redshift parameters
db_credentials = (
    'redshift+psycopg2://{p[redshift_user]}:{p[redshift_password]}@{p[redshift_host]}:{p[redshift_port]}/{p[redshift_database]}'
    .format(p=config['Amazon_Redshift_parameters']))
engine = create_engine(db_credentials, connect_args={'sslmode': 'prefer'})
connection = engine.connect()
result = connection.execute(text(
"COPY assets FROM 's3://xx/xx/hello.csv' WITH CREDENTIALS "
"'aws_access_key_id=xxx_id;aws_secret_access_key=xxx'"
" FORMAT csv DELIMITER ',' IGNOREHEADER 1 ENCODING UTF8;").execution_options(autocommit=True))
result = connection.execute("select * from assets;")
print(result, type(result))
print(result.rowcount)
connection.close()
After that, I got the sqlalchemy_redshift CopyCommand working, perhaps in a bad way; it looks a little tricky:
import sqlalchemy as sa
tbl2 = sa.Table(TableAssets, sa.MetaData())
copy = dialect_rs.CopyCommand(
assets,
data_location='s3://xx/xx/hello.csv',
access_key_id=access_key_id,
secret_access_key=secret_access_key,
truncate_columns=True,
delimiter=',',
format='CSV',
ignore_header=1,
# empty_as_null=True,
# blanks_as_null=True,
)
print(str(copy.compile(dialect=RedshiftDialect(), compile_kwargs={'literal_binds': True})))
print(dir(copy))
connection = engine.connect()
connection.execute(copy.execution_options(autocommit=True))
connection.close()
This does the same thing as the plain SQLAlchemy version above (it executes a query), except that the query is built by CopyCommand. I have not seen any benefit from it :(.

The following works for me with Databricks for all kinds of SQL statements:
import sqlalchemy as SA
import psycopg2
host = 'your_host_url'
username = 'your_user'
password = 'your_passw'
db = 'your_database'
port = 5439
url = "{d}+{driver}://{u}:{p}@{h}:{port}/{db}".format(
    d="redshift",
    driver='psycopg2',
    u=username,
    p=password,
    h=host,
    port=port,
    db=db)
engine = SA.create_engine(url)
cnn = engine.connect()
strSQL = "your_SQL ..."
try:
    cnn.execute(strSQL)
except:
    raise

import sqlalchemy as db
engine = db.create_engine('postgresql://username:password@url:5439/db_name')
This worked for me

Related

How to specify a search path with SQL Alchemy and pg8000?

I'm trying to connect to a postgres db using SQL Alchemy and the pg8000 driver. I'd like to specify a search path for this connection. With the Psycopg driver, I could do this by doing something like
engine = create_engine(
    'postgresql+psycopg2://dbuser@dbhost:5432/dbname',
    connect_args={'options': '-csearch_path={}'.format(dbschema)})
However, this does not work for the pg8000 driver. Is there a good way to do this?
You can use pg8000 in pretty much the same way as psycopg2; you just need to swap the scheme from postgresql+psycopg2 to postgresql+pg8000.
The full connection string definition is in the SQLAlchemy pg8000 docs:
postgresql+pg8000://user:password@host:port/dbname[?key=value&key=value...]
But while psycopg2.connect will pass kwargs to the server (like options and its content), pg8000.connect will not, so there is no setting search_path with pg8000.
The SQLAlchemy docs describe how to do this. For example:
from sqlalchemy import create_engine, event, text
engine = create_engine("postgresql+pg8000://postgres:postgres@localhost/postgres")
@event.listens_for(engine, "connect", insert=True)
def set_search_path(dbapi_connection, connection_record):
    existing_autocommit = dbapi_connection.autocommit
    dbapi_connection.autocommit = True
    cursor = dbapi_connection.cursor()
    cursor.execute("SET SESSION search_path='myschema'")
    cursor.close()
    dbapi_connection.autocommit = existing_autocommit
with engine.connect() as connection:
    result = connection.execute(text("SHOW search_path"))
    for row in result:
        print(row)
However, as it says in the docs:
SQLAlchemy is generally organized around the concept of keeping this
variable at its default value of public

Connecting to multiple hosts in Hive with SqlAlchemy

I've already had a working connection through ODBC using Cloudera ODBC Driver for Apache Hive, where I had my DSN set and all I needed was to call pyodbc.connect(f"DSN={mydsn}", autocommit=True).
Since I'm planning to use pandas on the query result, I've read that SQLAlchemy is the preferred choice and I'd like to avoid warnings resulting from other ways of connecting. My DSN for Hive was using Zookeeper and the "Hosts" field was filled in the form host1:2181,host2:2181,host3:2181. I'm trying to connect to these 3 hosts and I've tried changing the connection URL in a way analogous to the one provided here, but I got the error invalid literal for int() with base 10: '2181,host2:2181,host3:2181'.
import pandas as pd
from sqlalchemy import create_engine
query = """SELECT TOP 10 * from eb.mobile_sa"""
conn_url = f'hive://{UID}@host1:2181,host2:2181,host3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'
engine = create_engine(conn_url)
with engine.connect() as conn:
    df = pd.read_sql(query, conn)
I found kazoo module that is said to be Zookeeper implementation in Python, but when I tried the very first lines from Basic Usage and just 1 host:
from kazoo.client import KazooClient
zk = KazooClient(hosts = "host1:2181", read_only=True)
zk.start()
I got a lot of lines of Connection dropped: socket connection error
How can I correctly connect to multiple hosts in hive?
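The int() error itself can be reproduced without Hive. In a URL, everything after the first ':' in the host part is read as the port, so a comma-separated host list cannot be turned into an integer. The sketch below shows this with the standard library's URL parser (not SQLAlchemy's own parser) and splits the ZooKeeper host list into individual (host, port) pairs. It does not perform ZooKeeper service discovery, and split_zookeeper_hosts is a hypothetical helper name:

```python
from urllib.parse import urlsplit


def split_zookeeper_hosts(hosts, default_port=2181):
    """Split 'host1:2181,host2:2181' into [('host1', 2181), ('host2', 2181)]."""
    pairs = []
    for entry in hosts.split(","):
        host, _, port = entry.strip().partition(":")
        pairs.append((host, int(port) if port else default_port))
    return pairs


# The parser treats '2181,host2:2181,host3:2181' as the port of host1,
# which cannot be converted to an integer.
try:
    urlsplit("hive://user@host1:2181,host2:2181,host3:2181/").port
    port_error = None
except ValueError as exc:
    port_error = exc

host_pairs = split_zookeeper_hosts("host1:2181,host2:2181,host3:2181")
print(port_error)
print(host_pairs)
```

Assuming the driver behind the hive:// URL accepts only a single host and port, the pairs are ZooKeeper endpoints rather than HiveServer2 addresses, so the actual server address would still have to be looked up from ZooKeeper before building a single-host URL.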

Connect to MSSQL Database using Flask-SQLAlchemy

I'm trying to connect to a local MSSQL DB through Flask-SQLAlchemy.
Here's a code excerpt from my __init__.py file:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'mssql+pyodbc://HARRISONS-THINK/LendApp'
db = SQLAlchemy(app)
SQLALCHEMY_TRACK_MODIFICATIONS = False
As you can see in SQL Server Management Studio, this information seems to match:
Here is the creation of a simple table in my models.py file:
from LendApp import db
class Transaction(db.Model):
    transactionID = db.Column(db.Integer, primary_key=True)
    amount = db.Column(db.Integer)
    sender = db.Column(db.String(80))
    receiver = db.Column(db.String(80))
    def __repr__(self):
        return 'Transaction ID: {}'.format(self.transactionID)
I am then connecting to the database using a Python Console within Pycharm via the execution of these two lines:
>>> from LendApp import db
>>> db.create_all()
This is resulting in the following error:
DBAPIError: (pyodbc.Error) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
The only thing that I can think of is that my database connection string is incorrect. I have tried altering it to more of a standard pyodbc connection string and including driver={SQL SERVER}, but to no avail.
If anyone could help me out with this it would be highly appreciated.
Thanks
I just had a very similar problem and was able to solve it by doing the following.
Following the SQLAlchemy documentation, I found I could use my pyodbc connection string like this:
# Python 2.x
import urllib
params = urllib.quote_plus("DRIVER={SQL Server Native Client 10.0};SERVER=dagger;DATABASE=test;UID=user;PWD=password")
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# Python 3.x
import urllib.parse
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 10.0};SERVER=dagger;DATABASE=test;UID=user;PWD=password")
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# using the above logic I just did the following
params = urllib.parse.quote_plus('DRIVER={SQL Server};SERVER=HARRISONS-THINK;DATABASE=LendApp;Trusted_Connection=yes;')
app.config['SQLALCHEMY_DATABASE_URI'] = "mssql+pyodbc:///?odbc_connect=%s" % params
This then caused an additional error because I was also using Flask-Migrate and apparently it doesn't like % in the connection URI. So I did some more digging and found this post. I then changed the following line in my ./migrations/env.py file
From:
from flask import current_app
config.set_main_option('sqlalchemy.url',
current_app.config.get('SQLALCHEMY_DATABASE_URI'))
To:
from flask import current_app
db_url_escaped = current_app.config.get('SQLALCHEMY_DATABASE_URI').replace('%', '%%')
config.set_main_option('sqlalchemy.url', db_url_escaped)
After doing all this I was able to do my migrations and everything seems as if it is working correctly now.
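The reason for doubling the percent signs can be shown with the standard library alone, assuming (as the error suggests) that Alembic keeps sqlalchemy.url in a regular configparser object, where "%" starts an interpolation. The section name "alembic" is used here purely for illustration:

```python
import configparser
from urllib.parse import quote_plus

params = quote_plus(
    "DRIVER={SQL Server};SERVER=HARRISONS-THINK;DATABASE=LendApp;Trusted_Connection=yes;"
)
db_url = "mssql+pyodbc:///?odbc_connect=%s" % params

config = configparser.ConfigParser()
config.add_section("alembic")

# The percent-encoded URL contains sequences such as '%7B', which configparser
# rejects as invalid interpolation syntax.
try:
    config.set("alembic", "sqlalchemy.url", db_url)
    raw_accepted = True
except ValueError:
    raw_accepted = False

# Doubling each '%' escapes it; reading the option back yields the original URL.
config.set("alembic", "sqlalchemy.url", db_url.replace("%", "%%"))
round_tripped = config.get("alembic", "sqlalchemy.url")

print(raw_accepted)
print(round_tripped == db_url)
```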
If you still run into this issue and are looking for another solution, try pymssql instead of pyodbc:
pip install pymssql
Connection URI would be:
conn_uri = "mssql+pymssql://<username>:<password>@<servername>/<dbname>"
I just changed my connection string to something like this and it worked perfectly.
NOTE: you need to install pyodbc for this to work.
app.config["SQLALCHEMY_DATABASE_URI"] = "mssql+pyodbc://user:pwd@server/database?driver=SQL+Server"
Note:
Try to avoid the '@' character in the password. You will get an error because the connection string also uses '@' as the separator after the password, so an '@' in the password can cause a connection error.
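If the password does contain reserved characters, percent-encoding it before building the URI is the usual workaround. A minimal sketch with the standard library follows; the user, password, server, and database values are placeholders:

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit

user = "user"
password = "p@ss:w/rd"  # placeholder password containing reserved characters

# Encode the password so '@', ':' and '/' cannot be mistaken for URL separators.
encoded = quote_plus(password)
uri = f"mssql+pyodbc://{user}:{encoded}@server/database?driver=SQL+Server"

# Parsing the URI back shows that the host is still recognised correctly
# and that the password can be recovered by decoding.
parts = urlsplit(uri)
print(uri)
print(parts.hostname)
print(unquote_plus(parts.password))
```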
I had the same problem, it was resolved by specifying:
app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "mssql+pyodbc://MySQLServerName/MyTestDb?driver=SQL+Server&trusted_connection=yes"
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
db.init_app(app)
Using the solution below, I resolved my connection issue with MSSQL Server:
params = urllib.parse.quote_plus('DRIVER={SQL Server};SERVER=HARRISONS-THINK;DATABASE=LendApp;Trusted_Connection=yes;')
app.config['SQLALCHEMY_DATABASE_URI'] = "mssql+pyodbc:///?odbc_connect=%s" % params
If you get a "Login failed for user" error, see this guide:
http://itproguru.com/expert/2014/09/how-to-fix-login-failed-for-user-microsoft-sql-server-error-18456-step-by-step-add-sql-administrator-to-sql-management-studio/.
I believe your connection string is missing the authentication details. From Flask-SQLAlchemy documentation, the connection string should have the following format
dialect+driver://username:password@host:port/database
From your example, I believe it will look something like this
app.config['SQLALCHEMY_DATABASE_URI'] = 'mssql+pyodbc://<username>:<password>@<Host>:<Port>/LendApp'

How do I connect to SQL Server via sqlalchemy using Windows Authentication?

sqlalchemy, a db connection module for Python, uses SQL Authentication (database-defined user accounts) by default. If you want to use your Windows (domain or local) credentials to authenticate to the SQL Server, the connection string must be changed.
By default, as defined by sqlalchemy, the connection string to connect to the SQL Server is as follows:
sqlalchemy.create_engine('mssql://*username*:*password*@*server_name*/*database_name*')
This, if used using your Windows credentials, would throw an error similar to this:
sqlalchemy.exc.DBAPIError: (Error) ('28000', "[28000] [Microsoft][ODBC SQL Server Driver][SQL Server]Login failed for user '***S\\username'. (18456) (SQLDriverConnect); [28000] [Microsoft][ODBC SQL Server Driver][SQL Server]Login failed for user '***S\\username'. (18456)") None None
In this error message, the code 18456 identifies the error message thrown by the SQL Server itself. This error signifies that the credentials are incorrect.
In order to use Windows Authentication with sqlalchemy and mssql, the following connection string is required:
ODBC Driver:
engine = sqlalchemy.create_engine('mssql://*server_name*/*database_name*?trusted_connection=yes')
SQL Express Instance:
engine = sqlalchemy.create_engine('mssql://*server_name*\\SQLEXPRESS/*database_name*?trusted_connection=yes')
If you're using a trusted connection/AD and not using username/password, or otherwise see the following:
SAWarning: No driver name specified; this is expected by PyODBC when using DSN-less connections
"No driver name specified; "
Then this method should work:
from sqlalchemy import create_engine
server = '<your_server_name>'
database = '<your_database_name>'
engine = create_engine('mssql+pyodbc://' + server + '/' + database + '?trusted_connection=yes&driver=ODBC+Driver+13+for+SQL+Server')
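Rather than typing the "+" signs of the driver name by hand, the query part of such a URL can be generated. This is only a sketch: mssql_trusted_url is a hypothetical helper, and the server, database, and driver names are placeholders that must match what is installed on your machine:

```python
from urllib.parse import quote_plus, urlencode


def mssql_trusted_url(server, database, driver="ODBC Driver 13 for SQL Server"):
    """Build a mssql+pyodbc URL for Windows Authentication (hypothetical helper)."""
    # urlencode with quote_plus turns spaces in the driver name into '+'.
    query = urlencode(
        {"trusted_connection": "yes", "driver": driver},
        quote_via=quote_plus,
    )
    return f"mssql+pyodbc://{server}/{database}?{query}"


trusted_url = mssql_trusted_url("my_server", "my_database")
print(trusted_url)
```

The returned string is meant to be passed to create_engine in the same way as the hand-written URL above.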
A more recent response if you want to connect to the MSSQL DB from a different user than the one you're logged with on Windows. It works as well if you are connecting from a Linux machine with FreeTDS installed.
The following worked for me from both Windows 10 and Ubuntu 18.04 using Python 3.6 & 3.7:
import getpass
from sqlalchemy import create_engine
password = getpass.getpass()
eng_str = fr'mssql+pymssql://{domain}\{username}:{password}@{hostip}/{db}'
engine = create_engine(eng_str)
What changed was to add the Windows domain before \username.
You'll need to install the pymssql package.
Create Your SqlAlchemy Connection URL From Your pyodbc Connection String OR Your Known Connection Parameters
I found all the other answers to be educational, and I found the SqlAlchemy Docs on connection strings helpful too, but I kept failing to connect to MS SQL Server Express 19 where I was using no username or password and trusted_connection='yes' (just doing development at this point).
Then I found THIS method in the SqlAlchemy Docs on Connection URLs built from a pyodbc connection string (or just a connection string), which is also built from known connection parameters (i.e. this can simply be thought of as a connection string that is not necessarily used in pyodbc). Since I knew my pyodbc connection string was working, this seemed like it would work for me, and it did!
This method takes the guesswork out of creating the correct format for what you feed to the SqlAlchemy create_engine method. If you know your connection parameters, you put those into a simple string per the documentation exemplified by the code below, and the create method in the URL class of the sqlalchemy.engine module does the correct formatting for you.
The example code below runs as is and assumes a database named master and an existing table named table_one with the schema shown below. Also, I am using pandas to import my table data. Otherwise, we'd want to use a context manager to manage connecting to the database and then closing the connection like HERE in the SqlAlchemy docs.
import pandas as pd
import sqlalchemy
from sqlalchemy.engine import URL
# table_one dictionary:
table_one = {'name': 'table_one',
'columns': ['ident int IDENTITY(1,1) PRIMARY KEY',
'value_1 int NOT NULL',
'value_2 int NOT NULL']}
# pyodbc stuff for MS SQL Server Express
driver='{SQL Server}'
server=r'localhost\SQLEXPRESS'
database='master'
trusted_connection='yes'
# pyodbc connection string
connection_string = f'DRIVER={driver};SERVER={server};'
connection_string += f'DATABASE={database};'
connection_string += f'TRUSTED_CONNECTION={trusted_connection}'
# create sqlalchemy engine connection URL
connection_url = URL.create(
"mssql+pyodbc", query={"odbc_connect": connection_string})
""" more code not shown that uses pyodbc without sqlalchemy """
engine = sqlalchemy.create_engine(connection_url)
d = {'value_1': [1, 2], 'value_2': [3, 4]}
df = pd.DataFrame(data=d)
df.to_sql('table_one', engine, if_exists="append", index=False)
Update
Let's say you've installed SQL Server Express on your linux machine. You can use the following commands to make sure you're using the correct strings for the following:
For the driver: odbcinst -q -d
For the server: sqlcmd -S localhost -U <username> -P <password> -Q 'select @@SERVERNAME'
pyodbc
I think that you need to put:
"+pyodbc" after mssql
try this:
from sqlalchemy import create_engine
engine = create_engine("mssql+pyodbc://user:password@host:port/databasename?driver=ODBC+Driver+17+for+SQL+Server")
cnxn = engine.connect()
It works for me
Luck!
If you are attempting to connect:
DSN-less
Windows Authentication for a server not locally hosted.
Without using ODBC connections.
Try the following:
import sqlalchemy
engine = sqlalchemy.create_engine('mssql+pyodbc://' + server + '/' + database + '?trusted_connection=yes&driver=SQL+Server')
This avoids using ODBC connections and thus avoids pyodbc interface errors from DBAPI2 vs DBAPI3 conflicts.
I would recommend using the URL creation tool instead of creating the url from scratch.
connection_url = sqlalchemy.engine.URL.create("mssql+pyodbc",database=databasename, host=servername, query = {'driver':'SQL Server'})
engine = sqlalchemy.create_engine(connection_url)
See this link for creating a connection string with SQL Server Authentication (non-domain, uses username and password)

Connect to an URI in postgres

I'm guessing this is a pretty basic question, but I can't figure out why:
import psycopg2
psycopg2.connect("postgresql://postgres:postgres@localhost/postgres")
Is giving the following error:
psycopg2.OperationalError: missing "=" after
"postgresql://postgres:postgres@localhost/postgres" in connection info string
Any idea? According to the docs about connection strings I believe it should work, however it only does like this:
psycopg2.connect("host=localhost user=postgres password=postgres dbname=postgres")
I'm using the latest psycopg2 version on Python2.7.3 on Ubuntu12.04
I would use the urlparse module to parse the URL and then pass the result to the connect method. This works around the psycopg2 problem.
from urlparse import urlparse # for python 3+ use: from urllib.parse import urlparse
result = urlparse("postgresql://postgres:postgres@localhost/postgres")
username = result.username
password = result.password
database = result.path[1:]
hostname = result.hostname
port = result.port
connection = psycopg2.connect(
database = database,
user = username,
password = password,
host = hostname,
port = port
)
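A slightly more defensive variant of the same idea, as a sketch: it decodes percent-encoded credentials and leaves out parts that are missing from the URI, so the result can be expanded straight into keyword arguments. connect_kwargs_from_uri is a hypothetical helper and the password in the example is a placeholder:

```python
from urllib.parse import unquote, urlsplit


def connect_kwargs_from_uri(uri):
    """Turn a postgresql:// URI into keyword arguments (hypothetical helper)."""
    parts = urlsplit(uri)
    kwargs = {
        "dbname": parts.path.lstrip("/") or None,
        # Credentials may be percent-encoded in a URI, so decode them.
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
    }
    # Drop anything the URI did not specify so driver defaults apply.
    return {key: value for key, value in kwargs.items() if value is not None}


kwargs = connect_kwargs_from_uri("postgresql://postgres:p%40ss@localhost/postgres")
print(kwargs)
# psycopg2.connect(**kwargs) would then receive only keyword arguments.
```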
The connection string passed to psycopg2.connect is not parsed by psycopg2: it is passed verbatim to libpq. Support for connection URIs was added in PostgreSQL 9.2.
To update on this, Psycopg3 does actually include a way to parse a database connection URI.
Example:
import psycopg # must be psycopg 3
pg_uri = "postgres://jeff:hunter2@example.com/db"
conn_dict = psycopg.conninfo.conninfo_to_dict(pg_uri)
with psycopg.connect(**conn_dict) as conn:
    ...
Another option is to use SQLAlchemy for this. It is not just an ORM: it consists of two distinct components, Core and ORM, and it can be used without the ORM layer at all.
SQLAlchemy provides this functionality out of the box through the create_engine function. Moreover, the URI lets you specify the DBAPI driver and various PostgreSQL settings.
Some examples:
# default
engine = create_engine("postgresql://user:pass@localhost/mydatabase")
# psycopg2
engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydatabase")
# pg8000
engine = create_engine("postgresql+pg8000://user:pass@localhost/mydatabase")
# psycopg3 (available only in SQLAlchemy 2.0, which is currently in beta)
engine = create_engine("postgresql+psycopg://user:pass@localhost/test")
And here is a fully working example:
import sqlalchemy as sa
# set connection URI here ↓
engine = sa.create_engine("postgresql://user:password@db_host/db_name")
ddl_script = sa.DDL("""
    CREATE TABLE IF NOT EXISTS demo_table (
        id serial PRIMARY KEY,
        data TEXT NOT NULL
    );
""")
with engine.begin() as conn:
    # do DDL and insert data in a transaction
    conn.execute(ddl_script)
    conn.exec_driver_sql("INSERT INTO demo_table (data) VALUES (%s)",
                         [("test1",), ("test2",)])
    conn.execute(sa.text("INSERT INTO demo_table (data) VALUES (:data)"),
                 [{"data": "test3"}, {"data": "test4"}])
with engine.connect() as conn:
    cur = conn.exec_driver_sql("SELECT * FROM demo_table LIMIT 2")
    for name in cur.fetchall():
        print(name)
# you also can obtain raw DBAPI connection
rconn = engine.raw_connection()
SQLAlchemy provides many other benefits:
You can easily switch DBAPI implementations just by changing the URI (psycopg2, psycopg2cffi, etc.), or maybe even databases.
It implements connection pooling out of the box (both psycopg2 and psycopg3 have connection pooling, but their APIs differ).
It supports asyncio via create_async_engine (psycopg3 also supports asyncio).
