How to connect Amazon Redshift to Python

This is my Python code. I want to connect to my Amazon Redshift database from Python, but it is showing an error about the host.
Can anyone tell me the correct syntax? Am I passing all the parameters correctly?
con=psycopg2.connect("dbname = pg_table_def, host=redshifttest-icp.cooqucvshoum.us-west-2.redshift.amazonaws.com, port= 5439, user=me, password= secret")
This is the error:
OperationalError: could not translate host name "redshift://redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com," to address: Unknown host

It appears that you wish to run Amazon Redshift queries from Python code.
The parameters you would want to use are:
dbname: This is the name of the database you entered in the Database name field when the cluster was created.
user: This is the value you entered in the Master user name field when the cluster was created.
password: This is the value you entered in the Master user password field when the cluster was created.
host: This is the Endpoint provided in the Redshift management console (without the port at the end): redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com
port: 5439
For example:
con=psycopg2.connect("dbname=sales host=redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com port=5439 user=master password=secret")

Old question but I just arrived here from Google.
The accepted answer doesn't work with SQLAlchemy, although it's powered by psycopg2:
sqlalchemy.exc.ArgumentError: Could not parse rfc1738 URL from string 'dbname=... host=... port=... user=... password=...'
What worked:
create_engine(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}#{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")
Which works with psycopg2 directly too:
psycopg2.connect(f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}#{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}")
Using the postgresql dialect works because Amazon Redshift is based on PostgreSQL.
Hope it can help other people!
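To round it out, a minimal sketch of using that engine for a query; every REDSHIFT_* value below is a placeholder:
import sqlalchemy as sa

# Minimal sketch: all REDSHIFT_* values below are placeholders.
REDSHIFT_USER = "master"
REDSHIFT_PASSWORD = "secret"
REDSHIFT_HOST = "redshifttest-xyz.cooqucvshoum.us-west-2.redshift.amazonaws.com"
REDSHIFT_PORT = 5439
REDSHIFT_DATABASE = "sales"

engine = sa.create_engine(
    f"postgresql://{REDSHIFT_USER}:{REDSHIFT_PASSWORD}@{REDSHIFT_HOST}:{REDSHIFT_PORT}/{REDSHIFT_DATABASE}"
)
with engine.connect() as conn:
    print(conn.execute(sa.text("SELECT 1;")).scalar())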

To connect to Redshift, you need the
postgresql+psycopg2
dialect and driver. Install the driver as follows.
For Python 3.x:
pip3 install psycopg2-binary
And then use:
# inside a helper function that returns a SQLAlchemy engine;
# urlquote is urllib.parse.quote_plus, used to URL-encode the password
return create_engine(
    "postgresql+psycopg2://%s:%s@%s:%s/%s"
    % (REDSHIFT_USERNAME, urlquote(REDSHIFT_PASSWORD), REDSHIFT_HOST, REDSHIFT_PORT, REDSHIFT_DB)
)

Well, for Redshift the idea is to COPY from S3, which is faster than any other way, but here is an example of how to do it.
First you must install some dependencies.
For Linux users:
sudo apt-get install libpq-dev
For Mac users:
brew install libpq
Install these dependencies with pip:
pip3 install psycopg2-binary
pip3 install sqlalchemy
pip3 install sqlalchemy-redshift
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "dwtest"
USER = "youruser"
PASSWORD = "yourpassword"
HOST = "dwtest.awsexample.com"
PORT = "5439"
SCHEMA = "public"
S3_FULL_PATH = 's3://yourbucket/category_pipe.txt'
ARN_CREDENTIALS = 'arn:aws:iam::YOURARN:YOURROLE'
REGION = 'us-east-1'
############ CONNECTING AND CREATING SESSIONS ############
connection_string = "redshift+psycopg2://%s:%s#%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
###########################################################
############ RUNNING COPY ############
copy_command = '''
copy category from '%s'
credentials 'aws_iam_role=%s'
delimiter '|' region '%s';
''' % (S3_FULL_PATH, ARN_CREDENTIALS, REGION)
s.execute(copy_command)
s.commit()
######################################
############ GETTING DATA ############
query = "SELECT * FROM category;"
rr = s.execute(query)
all_results = rr.fetchall()
def pretty(all_results):
    for row in all_results:
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row:
            print(" ---- %s" % r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")
pretty(all_results)
s.close()
######################################

The easiest way to query AWS Redshift from Python is through this Jupyter extension: Jupyter Redshift.
Not only can you query and save your results but also write them back to the database from within the notebook environment.

Related

How to INSERT INTO Azure SQL database from Azure Databricks in Python

Since pyodbc cannot be installed to Azure Databricks, I am trying to use JDBC to insert data into Azure SQL database with Python, but I cannot find sample code for that.
jdbcHostname = "xxxxxxx.database.windows.net"
jdbcDatabase = "yyyyyy"
jdbcPort = 1433
#jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2};user={3};password={4}".format(jdbcHostname, jdbcPort, jdbcDatabase, username, password)
jdbcUrl = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
connectionProperties = {
    "user": jdbcUsername,
    "password": jdbcPassword,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
pushdown_query = "(INSERT INTO test (a, b) VALUES ('val_a', 'val_b')) insert_test"
Please advise how to write insertion code in Python.
Thanks.
If I may add, you should also be able to use a Spark data frame to insert to Azure SQL. Just use the connection string you get from Azure SQL.
connectionString = "<Azure SQL Connection string>"
data = spark.createDataFrame([(val_a, val_b)], ["a", "b"])
data.write.jdbc(connectionString, "<TableName>", mode="append")
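A slightly fuller sketch of the same idea, passing the credentials as JDBC properties; the URL, table name and credentials are placeholders, and an existing SparkSession named spark is assumed:
# Hedged sketch: assumes an existing SparkSession named `spark`;
# URL, table name and credentials are placeholders.
jdbc_url = "jdbc:sqlserver://xxxxxxx.database.windows.net:1433;database=yyyyyy"
props = {
    "user": "your_user",
    "password": "your_password",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}
data = spark.createDataFrame([("val_a", "val_b")], ["a", "b"])
data.write.jdbc(jdbc_url, "test", mode="append", properties=props)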
Since pyodbc cannot be installed to Azure databricks
Actually, it seems you could install pyodbc in databricks.
%sh
apt-get -y install unixodbc-dev
/databricks/python/bin/pip install pyodbc
For more details, you could refer to this answer and this blog.
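Once pyodbc and the Microsoft ODBC driver are installed, a minimal sketch of the insert itself could look like this; server, database, user and password are placeholders:
import pyodbc

# Hedged sketch: assumes the Microsoft ODBC driver is installed in the cluster;
# server, database, user and password are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=xxxxxxx.database.windows.net,1433;"
    "DATABASE=yyyyyy;"
    "UID=your_user;"
    "PWD=your_password"
)
cursor = conn.cursor()
cursor.execute("INSERT INTO test (a, b) VALUES (?, ?)", ("val_a", "val_b"))
conn.commit()
conn.close()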
Piggybacking on Jon ... This is what I used to write data from an Azure Databricks dataframe to an Azure SQL Database:
Hostname = "YOUR_SERVER.database.windows.net"
Database = "YOUR_DB"
port = 1433
UN = 'YOUR_USERNAME'
PW = 'YOUR_PASSWORD'
Url = "jdbc:sqlserver://{0}:{1};database={2};user={3};password= {4}".format(Hostname, Port, Database, UN, PW)
df.write.jdbc(Url, "schema.table", mode="append")

Authenticate from Linux to Windows SQL Server with pyodbc

I am trying to connect from a linux machine to a windows SQL Server with pyodbc.
I do have a couple of constraints:
Need to log on with a windows domain account
Need to use python3
Need to do it from Linux to Windows
Need to connect to a specific instance
I set up the environment as described by Microsoft and have it working (I can import pyodbc and use the configured MSSQL driver).
I am not familiar with Windows domain authentication and whatnot, so that is where my problem is.
My connection string:
DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver.mydomain.com;PORT=1433;DATABASE=MyDatabase;Domain=MyCompanyDomain;Instance=MyInstance;UID=myDomainUser;PWD=XXXXXXXX;Trusted_Connection=yes;Integrated_Security=SSPI
Supposedly one should use "Trusted_Connection" to use the Windows domain authentication instead of directly authenticating with the SQL server.
The error I get when running pyodbc.connect(connString):
pyodbc.Error: ('HY000', '[HY000] [unixODBC][Microsoft][ODBC Driver 17 for SQL Server]SSPI Provider: No Kerberos credentials available (851968) (SQLDriverConnect)')
From other sources I read this should work on Windows as this code would use the credentials of the currently logged in user.
My question is how can I connect to a Windows SQL Server instance from Linux using Windows Domain credentials.
You must obtain a Kerberos ticket for this to work. Your example doesn't specify whether your Linux system is set up to authenticate via Kerberos or whether you have previously obtained a Kerberos ticket before your code hits your connection string.
If your Linux system is set up to authenticate via Kerberos, then as a proof of concept you can obtain a Kerberos ticket using kinit from the command line. Here's what works for me in python3 running in Ubuntu on Windows via the WSL. The python code:
#!/usr/bin/env python
# minimal example using Kerberos auth
import sys
import re
import pyodbc
driver='{ODBC Driver 17 for SQL Server}'
server = sys.argv[1]
database = sys.argv[2]
# trusted_connection uses kerberos ticket and ignores UID and PASSWORD in connection string
# https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/using-integrated-authentication?view=sql-server-ver15
try:
    cnxn = pyodbc.connect(driver=driver, server=server, database=database, trusted_connection='yes')
    cursor = cnxn.cursor()
except pyodbc.Error as ex:
    msg = ex.args[1]
    if re.search('No Kerberos', msg):
        print('You must login using kinit before using this script.')
        exit(1)
    else:
        raise
# Sample select query
cursor.execute("SELECT @@version;")
row = cursor.fetchone()
while row:
    print(row[0])
    row = cursor.fetchone()
print('success')
This tells you if you don't have a ticket. Since it uses a ticket you don't have to specify a user or password in the script. It will ignore both.
Now we run it:
user@localhost:~# kdestroy # make sure there are no active tickets
kdestroy: No credentials cache found while destroying cache
user@localhost:~# python pyodbc_sql_server_test.py tcp:dbserver.example.com mydatabase
You must login using kinit before using this script.
user@localhost:~# kinit
Password for user@DOMAIN.LOCAL:
user@localhost:~# python pyodbc_sql_server_test.py tcp:dbserver.example.com mydatabase
Microsoft SQL Server 2016 (SP2-GDR) (KB4505220) - 13.0.5101.9 (X64)
Jun 15 2019 23:15:58
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows Server 2016 Datacenter 10.0 <X64> (Build 14393: )
success
user@localhost:~#
You may also have success obtaining a Kerberos ticket from python code that runs before you make this connection but that is beyond the scope of this answer. A search for python Kerberos modules might point you toward a solution.
It also appears possible to set up the Linux system so that as soon as a user logs in it automatically obtains a Kerberos ticket that can be passed to other processes. That is also outside of the scope of this answer but a search for automatic Kerberos ticket upon Linux login may yield some clues.
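As a hedged sketch of the first idea (obtaining a ticket from Python before connecting), you could shell out to kinit; the keytab path and principal below are placeholders:
import subprocess

# Hedged sketch: obtain a Kerberos ticket by shelling out to kinit before
# connecting with pyodbc. The keytab path and principal are placeholders.
subprocess.run(
    ["kinit", "-kt", "/path/to/user.keytab", "user@DOMAIN.LOCAL"],
    check=True,
)
# After this, pyodbc.connect(..., trusted_connection='yes') can use the ticket.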
I ended up using the pymssql library, which is essentially a DB-API layer on top of the FreeTDS driver. It worked out of the box.
Weird how I had such a hard time discovering this library.
I found two ways to do the same task. I have an MSSQL server with AD auth.
You can use the JVM.
Download and install Java (https://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html). Also install JPype1 version 0.6.3: pip install JPype1==0.6.3. Versions above 0.6.3 won't work correctly.
import jaydebeapi
import pandas as pd
driver_name = "net.sourceforge.jtds.jdbc.Driver"
connection_url="jdbc:jtds:sqlserver://<server>:<port>/<database name>"
connection_properties = {
    "domain": "<domain name>",
    "user": "<username>",
    "password": "<pwd>"}
jar_path = "<path to jtds>/jtds-1.3.1.jar"
CONN = jaydebeapi.connect(driver_name, connection_url, connection_properties, jar_path)
sql = "SELECT * FROM INFORMATION_SCHEMA.COLUMNS"
df = pd.read_sql(sql, CONN)
This version was too slow for me.
You can also use pyodbc via FreeTDS. To create a FreeTDS connection, install FreeTDS on your Linux machine (apt-get install tdsodbc freetds-bin) and configure FreeTDS in /etc/odbcinst.ini like this:
[FreeTDS]
Description=FreeTDS
Driver=/usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup=/usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
and register it with odbcinst -i -d -f /etc/odbcinst.ini
After that, you can use pyodbc
import pandas as pd
import pyodbc
CONN = pyodbc.connect('DRIVER={FreeTDS};'
                      'Server=<server>;'
                      'Database=<database>;'
                      'UID=<domain name>\\<username>;'
                      'PWD=<password>;'
                      'TDS_Version=8.0;'
                      'Port=1433;')
sql = "SELECT * FROM INFORMATION_SCHEMA.COLUMNS"
df = pd.read_sql(sql, CONN)
It works much faster.
I was trying to do the same thing, and after reading the OP's answer I tested out pymssql and noticed that it worked with just the below:
pymssql.connect(server='myserver', user='domain\\username', password='password', database='mydb')
After realizing that that was all pymssql needed I went back to pyodbc and was able to get it working with:
pyodbc.connect("DRIVER={FreeTDS};SERVER=myserver;PORT=1433;DATABASE=mydb;UID=domain\username;PWD=password;TDS_Version=8.0")
I just wanted to thank you for posting this as it helped me so greatly!!!! :)
Getting Windows authentication working from Linux is complex. EasySoftDB (commercial) used to be able to handle this, and FreeTDS has some convoluted support.
https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/using-integrated-authentication
My suggestion is to move away from Windows Authentication and use SQL authentication. Really there is no security difference, except that you are providing a username and password in the connection string. But this would make your life a lot easier.
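A minimal sketch of what that looks like with pyodbc and a SQL login; server, database, login and password are placeholders, and a SQL-authenticated login must already exist on the server:
import pyodbc

# Hedged sketch: SQL authentication instead of Windows authentication.
# Server, database, login and password below are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.mydomain.com,1433;"
    "DATABASE=MyDatabase;"
    "UID=my_sql_login;"
    "PWD=my_password"
)
print(conn.cursor().execute("SELECT @@version;").fetchone()[0])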
I had the same issue and got the Docker container for Airflow working with Windows authentication by adding a few things to my Airflow build. The apt install needs to be run as root.
USER root
RUN apt install -y krb5-config
RUN apt-get install -y krb5-user
COPY krb5.conf /etc/krb5.conf
In the krb5.conf file
[appdefaults]
    default_lifetime = 52hrs
    krb4_convert = false
    krb4_convert_524 = false
    ksu = {
        forwardable = false
    }
    pam = {
        minimum_uid = 100
        forwardable = true
    }
    pam-afs-session = {
        minimum_uid = 100
    }

[libdefaults]
    default_realm = DEFAULT_DOMAIN
    ticket_lifetime = 52h
    renew_lifetime = 90d
    forwardable = true
    noaddresses = true
    allow_weak_crypto = true
    rdns = false

[realms]
    MY.COMPANY.LOCAL = {
        kdc = SERVER.DEFAULT_DOMAIN
        default_domain = DEFAULT_DOMAIN
    }

[domain_realm]
    my.company.local = DEFAULT_DOMAIN

[logging]
    kdc = SYSLOG:NOTICE
    admin_server = SYSLOG:NOTICE
    default = SYSLOG:NOTICE
DEFAULT_DOMAIN for me is DOMAIN.COMPANY.COM. Others have .LOCAL at the end. Make sure it is all caps in the file. I had an error the first time I tried to authenticate.
Rebuild and then launch the shell for the airflow worker.
Run kinit USER
It will prompt for a password. Run klist afterwards to confirm you have a ticket. Once you get this working you should be able to authenticate to the server from Python.

Unload to S3 with Python using IAM Role credentials

In Redshift, I run the following to unload data from a table into a file in S3:
unload('select * from table')
to 's3://bucket/unload/file_'
iam_role 'arn:aws:iam:<aws-account-id>:role/<role_name>'
I would like to do the same in Python- any suggestion how to replicate this? I saw examples using access key and secret, but that is not an option for me- need to use role based credentials on a non-public bucket.
You will need two sets of credentials. IAM credentials via an IAM Role to access the S3 bucket and Redshift ODBC credentials to execute SQL commands.
Create a Python program that connects to Redshift, in a manner similar to other databases such as SQL Server, and execute your query. This program will need Redshift login credentials and not IAM credentials (Redshift username, password).
The IAM credentials for S3 are assigned as a role to Redshift so that Redshift can store the results on S3. This is the iam_role 'arn:aws:iam:<aws-account-id>:role/<role_name>' part of the Redshift query in your question.
You do not need boto3 (or boto) to access Redshift, unless you plan to actually interface with the Redshift API (which does not access the database stored inside Redshift).
Here is an example Python program to access Redshift; credit due to Varun Verma.
There are other examples on the Internet to help you get started.
############ REQUIREMENTS ####################
# sudo apt-get install python-pip
# sudo apt-get install libpq-dev
# sudo pip install psycopg2
# sudo pip install sqlalchemy
# sudo pip install sqlalchemy-redshift
##############################################
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
#>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<<
DATABASE = "dbname"
USER = "username"
PASSWORD = "password"
HOST = "host"
PORT = ""
SCHEMA = "public" #default is "public"
####### connection and session creation ##############
connection_string = "redshift+psycopg2://%s:%s#%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
###### All Set Session created using provided schema #######
################ write queries from here ######################
query = "unload('select * from table') to 's3://bucket/unload/file_' iam_role 'arn:aws:iam:<aws-account-id>:role/<role_name>';"
rr = s.execute(query)
all_results = rr.fetchall()
def pretty(all_results):
    for row in all_results:
        print("row start >>>>>>>>>>>>>>>>>>>>")
        for r in row:
            print(" ---- %s" % r)
        print("row end >>>>>>>>>>>>>>>>>>>>>>")
pretty(all_results)
########## close session in the end ###############
s.close()

How to connect to a cluster in Amazon Redshift using SQLAlchemy?

In Amazon Redshift's Getting Started Guide, it's mentioned that you can utilize SQL client tools that are compatible with PostgreSQL to connect to your Amazon Redshift Cluster.
In the tutorial, they utilize SQL Workbench/J client, but I'd like to utilize python (in particular SQLAlchemy). I've found a related question, but the issue is that it does not go into the detail or the python script that connects to the Redshift Cluster.
I've been able to connect to the cluster via SQL Workbench/J, since I have the JDBC URL, as well as my username and password, but I'm not sure how to connect with SQLAlchemy.
Based on this documentation, I've tried the following:
from sqlalchemy import create_engine
engine = create_engine('jdbc:redshift://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy')
ERROR:
Could not parse rfc1738 URL from string 'jdbc:redshift://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy'
I don't think SQL Alchemy "natively" knows about Redshift. You need to change the JDBC "URL" string to use postgres.
jdbc:postgres://shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy
Alternatively, you may want to try using sqlalchemy-redshift using the instructions they provide.
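For reference, a minimal sketch of what that looks like once sqlalchemy-redshift is installed; the user and password are placeholders:
from sqlalchemy import create_engine

# Hedged sketch: requires the sqlalchemy-redshift package;
# user and password below are placeholders.
engine = create_engine(
    "redshift+psycopg2://user:password@shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy"
)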
I was running into the exact same issue, and then I remembered to include my Redshift credentials:
eng = create_engine('postgresql://[LOGIN]:[PASSWORD]@shippy.cx6x1vnxlk55.us-west-2.redshift.amazonaws.com:5439/shippy')
sqlalchemy-redshift works for me, but only after a few days of research.
packages (python3.4):
SQLAlchemy==1.0.14 sqlalchemy-redshift==0.5.0 psycopg2==2.6.2
First of all, I checked that my query works in SQL Workbench (http://www.sql-workbench.net), then I forced it to work in SQLAlchemy (this answer, https://stackoverflow.com/a/33438115/2837890, helped me see that autocommit or session.commit() is required):
db_credentials = (
    'redshift+psycopg2://{p[redshift_user]}:{p[redshift_password]}@{p[redshift_host]}:{p[redshift_port]}/{p[redshift_database]}'
    .format(p=config['Amazon_Redshift_parameters']))
engine = create_engine(db_credentials, connect_args={'sslmode': 'prefer'})
connection = engine.connect()
result = connection.execute(text(
"COPY assets FROM 's3://xx/xx/hello.csv' WITH CREDENTIALS "
"'aws_access_key_id=xxx_id;aws_secret_access_key=xxx'"
" FORMAT csv DELIMITER ',' IGNOREHEADER 1 ENCODING UTF8;").execution_options(autocommit=True))
result = connection.execute("select * from assets;")
print(result, type(result))
print(result.rowcount)
connection.close()
And after that, I forced sqlalchemy_redshift's CopyCommand to work, perhaps in a bad way; it looks a little tricky:
import sqlalchemy as sa
tbl2 = sa.Table(TableAssets, sa.MetaData())
copy = dialect_rs.CopyCommand(
assets,
data_location='s3://xx/xx/hello.csv',
access_key_id=access_key_id,
secret_access_key=secret_access_key,
truncate_columns=True,
delimiter=',',
format='CSV',
ignore_header=1,
# empty_as_null=True,
# blanks_as_null=True,
)
print(str(copy.compile(dialect=RedshiftDialect(), compile_kwargs={'literal_binds': True})))
print(dir(copy))
connection = engine.connect()
connection.execute(copy.execution_options(autocommit=True))
connection.close()
This does the same thing I did with plain SQLAlchemy (execute the query), except that the query is composed by CopyCommand. I have not seen much benefit :(.
The following works for me with Databricks for all kinds of SQL statements:
import sqlalchemy as SA
import psycopg2
host = 'your_host_url'
username = 'your_user'
password = 'your_passw'
db = 'your_db'
port = 5439
url = "{d}+{driver}://{u}:{p}@{h}:{port}/{db}".format(
    d="redshift",
    driver='psycopg2',
    u=username,
    p=password,
    h=host,
    port=port,
    db=db)
engine = SA.create_engine(url)
cnn = engine.connect()
strSQL = "your_SQL ..."
try:
    cnn.execute(strSQL)
except:
    raise
import sqlalchemy as db
engine = db.create_engine('postgres://username:password@url:5439/db_name')
This worked for me

How to create db in MySQL with SQLAlchemy?

I need to create a db in MySQL using SQLAlchemy, I am able to connect to a db if it already exists, but I want to be able to create it if it does not exist. These are my tables:
#def __init__(self):
Base = declarative_base()

class utente(Base):
    __tablename__="utente"
    utente_id=Column(Integer,primary_key=True)
    nome_utente=Column(Unicode(20))
    ruolo=Column(String(10))
    MetaData.create_all()
    def __repr(self):
        return "utente: {0}, {1}, id: {2}".format(self.ruolo,self.nome_utente,self.utente_id)

class dbmmas(Base):
    __tablename__="dbmmas"
    db_id=Column(Integer,primary_key=True,autoincrement=True)
    nome_db=Column(String(10))
    censimento=Column(Integer)
    versione=Column(Integer)
    ins_data=Column(DateTime)
    mod_data=Column(DateTime)
    ins_utente=Column(Integer)
    mod_utente=Column(Integer)
    MetaData.create_all()
    def __repr(self):
        return "dbmmas: {0}, censimento {1}, versione {2}".format(self.nome_db,self.censimento,self.versione)

class funzione(Base):
    __tablename__="funzione"
    funzione_id=Column(Integer,primary_key=True,autoincrement=True)
    categoria=Column(String(10))
    nome=Column(String(20))
    def __repr__(self):
        return "funzione:{0},categoria:{1},id:{2} ".format(self.nome,self.categoria,self.funzione_id)

class profilo(Base):
    __tablename__="rel_utente_funzione"
    utente_id=Column(Integer,primary_key=True)
    funzione_id=Column(Integer,primary_key=True)
    amministratore=Column(Integer)
    MetaData.create_all()
    def __repr(self):
        l=lambda x: "amministratore" if x==1 else "generico"
        return "profilo per utente_id:{0}, tipo: {1}, funzione_id: {2}".format(self.utente_id,l(self.amministratore),self.funzione_id)

class aree(Base):
    __tablename__="rel_utente_zona"
    UTB_id=Column(String(10), primary_key=True) # actually this is the featureSignature of the feature in the shapefile
    utente_id=Column(Integer, primary_key=True)
    amministratore=Column(Integer)
    MetaData.create_all()
    def __repr(self):
        l=lambda x: "amministratore" if x==1 else "generico"
        return "zona: {0}, pe utente_id:{1}, {2}".format(self.UTB_id,self.utente_id,l(self.amministratore))

class rel_utente_dbmmas(Base):
    __tablename__="rel_utente_dbmmas"
    utente_id=Column(Integer,primary_key=True)
    db_id=Column(Integer,primary_key=True)
    amministratore=Column(Integer)
    MetaData.create_all()
    def __repr(self):
        l=lambda x: "amministratore" if x==1 else "generico"
        return "dbregistrato: {0} per l'utente{1} {2}".format(self.db_id,self.utente_id,l(self.amministratore))
To create a MySQL database you just connect to the server and create the database:
import sqlalchemy
engine = sqlalchemy.create_engine('mysql://user:password#server') # connect to server
engine.execute("CREATE DATABASE dbname") #create db
engine.execute("USE dbname") # select new db
# use the new db
# continue with your work...
Of course, your user has to have the permission to create databases.
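If that permission is missing, here is a hedged sketch of granting it from an administrative connection; the admin credentials and the grantee are placeholders, and the grantee user is assumed to already exist:
import sqlalchemy

# Hedged sketch: run as an administrative user; the admin credentials are
# placeholders and the grantee user is assumed to already exist.
admin_engine = sqlalchemy.create_engine('mysql://root:rootpassword@server')
admin_engine.execute("GRANT CREATE ON *.* TO 'user'@'localhost'")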
You can use SQLAlchemy-Utils for that.
pip install sqlalchemy-utils
Then you can do things like
from sqlalchemy_utils import create_database, database_exists
url = 'mysql://{0}:{1}@{2}:{3}'.format(user, password, host, port)
if not database_exists(url):
    create_database(url)
I found the answer here, it helped me a lot.
I don't know what the canonical way is, but here's a way to check to see if a database exists by checking against the list of databases, and to create it if it doesn't exist.
from sqlalchemy import create_engine
# This engine just used to query for list of databases
mysql_engine = create_engine('mysql://{0}:{1}@{2}:{3}'.format(user, password, host, port))
# Query for existing databases
existing_databases = mysql_engine.execute("SHOW DATABASES;")
# Results are a list of single item tuples, so unpack each tuple
existing_databases = [d[0] for d in existing_databases]
# Create database if not exists
if database not in existing_databases:
    mysql_engine.execute("CREATE DATABASE {0}".format(database))
    print("Created database {0}".format(database))
# Go ahead and use this engine
db_engine = create_engine('mysql://{0}:{1}@{2}:{3}/{4}'.format(user, password, host, port, database))
Here's an alternative method if you don't need to know if the database was created or not.
from sqlalchemy import create_engine
# This engine just used to query for list of databases
mysql_engine = create_engine('mysql://{0}:{1}@{2}:{3}'.format(user, password, host, port))
# Query for existing databases
mysql_engine.execute("CREATE DATABASE IF NOT EXISTS {0} ".format(database))
# Go ahead and use this engine
db_engine = create_engine('mysql://{0}:{1}@{2}:{3}/{4}'.format(user, password, host, port, database))
CREATE DATABASE IF NOT EXISTS dbName;
I would recommend running it with a with block:
from sqlalchemy import create_engine
username = ''
password = ''
host = 'localhost'
port = 3306
DB_NAME = 'db_name'
engine = create_engine(f"mysql://{username}:{password}#{host}:{port}")
with engine.connect() as conn:
    # Do not substitute user-supplied database names here.
    conn.execute(f"CREATE DATABASE IF NOT EXISTS {DB_NAME}")
The mysqlclient seems to be up to 10 times faster in benchmark tests than PyMySQL, see: What's the difference between MySQLdb, mysqlclient and MySQL connector/Python?.
Yet, why not use a pure-Python package, at least if it is not about every last second of query time? PyMySQL is suggested by the following links, for example:
Using SQLAlchemy to access MySQL without frustrating library installation issues
How to connect MySQL database using Python+SQLAlchemy remotely?.
Python packages:
Install with pip, ideally putting them in "requirements.txt":
PyMySQL
SQLAlchemy
Again, if it is about the best query speed, use the mysqlclient package. Then you need to install an additional Linux package with sudo apt-get install libmysqlclient-dev.
import statements
Only one needed:
import sqlalchemy
Connection string (= db_url)
The connection string starts with {dialect/DBAPI}+{driver}:
db_url = mysql+pymysql://
where pymysql stands for the Python package "PyMySQL" used as the driver.
Again, if it is about the best query speed, use the mysqlclient package. Then you need mysql+mysqldb:// at this point.
For a remote connection, you need to add to the connection string:
host
user
password
database
port (the port only if it is not the standard 3306)
You can create your db_url with several methods. Do not write the user, password, or ideally any other variable value directly into the string, to avoid possible attacks:
sqlalchemy.engine.URL.create() (see the sketch after this list), or sqlalchemy.engine.url.URL; see an example at Connecting from Cloud Functions to Cloud SQL, or an example which automatically adds ? suffixes (for example ?driver=SQL+Server) at the end of the string at Building a connection URL for mssql+pyodbc with sqlalchemy.engine.url.URL
f"""...{my_var}..."""
"""...{my_var}...""".format(my_var=xyz_var)
...
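As a hedged sketch of the first option, sqlalchemy.engine.URL.create() (available from SQLAlchemy 1.4) could be used like this; all values below are placeholders:
import sqlalchemy

# Hedged sketch of the URL helper (SQLAlchemy 1.4+); all values are placeholders.
db_url = sqlalchemy.engine.URL.create(
    drivername="mysql+pymysql",
    username="db_user",
    password="db_pass",
    host="db_host",
    port=3306,
    database="db_name",
)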
Example without the url helper of SQLAlchemy:
db_url = "{dialect}+{driver}://{user}:{password}#{host}:{port}/{database}".format(
or:
db_url = "{dialect}+{driver}://{user}:{password}#{host}/{database}?host={host}?port={port}".format(
dialect = 'mysql',
driver = 'pymysql',
username=db_user,
password=db_pass,
database=db_name,
host=db_host,
port=db_port
)
Other engine configurations
For other connection drivers, dialects and methods, see the SQLAlchemy 1.4 Documentation - Engine Configuration
Create the db if not exists
See How to create a new database using SQLAlchemy?.
from sqlalchemy_utils import database_exists, create_database  # from the SQLAlchemy-Utils package

engine = sqlalchemy.create_engine(db_url)
if not database_exists(engine.url):
    create_database(engine.url)
# or, with plain SQL:
with engine.connect() as conn:
    conn.execute("commit")
    conn.execute("create database test")
