I want to get the column names in Redshift using Python boto3. So far I have:
Created a Redshift cluster
Inserted data into it
Configured Secrets Manager
Configured a SageMaker notebook
Opened the Jupyter notebook and wrote the code below:
import boto3
import time
client = boto3.client('redshift-data')
response = client.execute_statement(
    ClusterIdentifier="test",
    Database="dev",
    SecretArn="{SECRET-ARN}",
    Sql="SELECT `COLUMN_NAME` FROM `INFORMATION_SCHEMA`.`COLUMNS` WHERE `TABLE_SCHEMA`='dev' AND `TABLE_NAME`='dojoredshift'")
I got a response, but there is no table schema inside it.
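For context, the Redshift Data API is asynchronous: execute_statement only submits the query and returns an Id, so the rows are never inside that first response. A minimal sketch of retrieving them, reusing the client, response, and time import from above (describe_statement and get_statement_result are the relevant boto3 calls):

statement_id = response['Id']
# Poll until the statement finishes; describe_statement reports the status
while client.describe_statement(Id=statement_id)['Status'] not in ('FINISHED', 'FAILED', 'ABORTED'):
    time.sleep(1)
# Each record is a list of typed value dicts, one per selected column
result = client.get_statement_result(Id=statement_id)
for record in result['Records']:
    print(record[0]['stringValue'])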
Below is the code I used to connect; it is timing out:
import psycopg2
HOST = 'xx.xx.xx.xx'
PORT = 5439
USER = 'aswuser'
PASSWORD = 'Password1!'
DATABASE = 'dev'
def db_connection():
    conn = psycopg2.connect(host=HOST, port=PORT, user=USER, password=PASSWORD, database=DATABASE)
    return conn
To get the IP address, go to https://ipinfo.info/html/ip_checker.php and pass in the hostname of your Redshift cluster (xx.xx.us-east-1.redshift.amazonaws.com), or you can see it on the cluster page itself.
I got this error while running the above code:
OperationalError: could not connect to server: Connection timed out
Is the server running on host "x.xx.xx..xx" and accepting
TCP/IP connections on port 5439?
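A connection timeout here almost always means the cluster's security group has no inbound rule allowing your IP on port 5439. A sketch of adding such a rule with boto3; the group ID and CIDR below are placeholders for your own values:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')
# Allow inbound Redshift traffic (TCP 5439) from a single IP address;
# substitute your security group ID and your public IP below.
ec2.authorize_security_group_ingress(
    GroupId='sg-0123456789abcdef0',
    IpPermissions=[{
        'IpProtocol': 'tcp',
        'FromPort': 5439,
        'ToPort': 5439,
        'IpRanges': [{'CidrIp': '203.0.113.25/32'}],
    }],
)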
I fixed it with the code below, after adding inbound rules like those to the cluster's security group:
import boto3
import logging
import psycopg2

logger = logging.getLogger(__name__)

# Credentials can be set using different methodologies. For this test,
# I ran from my local machine, where I used the CLI command "aws configure"
# to set my access key and secret access key.
client = boto3.client(service_name='redshift', region_name='us-east-1')

# Use boto3 to get the database password instead of hardcoding it in the code
cluster_creds = client.get_cluster_credentials(
    DbUser='awsuser',
    DbName='dev',
    ClusterIdentifier='redshift-cluster-1',
    AutoCreate=False)

try:
    # Database connection below uses the DbPassword that boto3 returned
    conn = psycopg2.connect(
        host='redshift-cluster-1.cvlywrhztirh.us-east-1.redshift.amazonaws.com',
        port='5439',
        user=cluster_creds['DbUser'],
        password=cluster_creds['DbPassword'],
        database='dev'
    )
    # Verify that the connection worked
    cursor = conn.cursor()
    cursor.execute("SELECT VERSION()")
    results = cursor.fetchone()
    ver = results[0]
    if ver is None:
        print("Could not find version")
    else:
        print("The version is " + ver)
except psycopg2.Error:
    logger.exception('Failed to open database connection.')
    print("Failed")
I have a CloudSQL instance running in another VPC, with an nginx proxy to allow cross-VPC access.
I can access the DB using a built-in user, but how can I access it using a Google service account?
import google.auth
import google.auth.transport.requests
import mysql.connector
from mysql.connector import Error
import os
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)

connection = mysql.connector.connect(host=HOST,
                                     database=DB,
                                     user=SA_USER,
                                     password=creds.token)

if connection.is_connected():
    db_Info = connection.get_server_info()
    print("Connected to MySQL Server version ", db_Info)
    cur = connection.cursor()
    cur.execute("""SELECT now()""")
    query_results = cur.fetchall()
    print(query_results)
When using mysql.connector, I get this error:
DatabaseError: 2059 (HY000): Authentication plugin 'mysql_clear_password' cannot be loaded: plugin not enabled
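For what it's worth, that error is client-side: mysql-connector-python will not send a cleartext password unless the plugin is enabled and the transport is encrypted. A sketch of the same connect call with those options added (the CA file path is an assumption, matching the certs used later in this post):

connection = mysql.connector.connect(host=HOST,
                                     database=DB,
                                     user=SA_USER,
                                     password=creds.token,
                                     auth_plugin='mysql_clear_password',  # enable the cleartext plugin
                                     ssl_ca='./server-ca.pem')            # cleartext auth requires TLS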
Then I tried using pymysql
import pymysql
import google.auth
import google.auth.transport.requests
import os
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
try:
    conn = pymysql.connect(host=ENDPOINT, user=SA_USER, passwd=creds.token, port=PORT, database=DBNAME)
    cur = conn.cursor()
    cur.execute("""SELECT now()""")
    query_results = cur.fetchall()
    print(query_results)
except Exception as e:
    print("Database connection failed due to {}".format(e))
Database connection failed due to (1045, "Access denied for user 'xx'@'xxx.xxx.xx.xx' (using password: YES)")
I guess these errors are all related to the token.
Can anyone suggest a proper way to get a service account token to access the CloudSQL DB?
PS: using the Cloud SQL Auth Proxy is not a good option for our architecture.
The error you have mentioned in the description indicates an issue with authentication. To understand exactly what caused it, try these things:
Verify the username and the corresponding password.
Check the origin of the connection to see if it matches the URL where the user has access privileges.
Check the user's grant privileges in the database.
As you are trying to access the DB using a Google service account, you should use the default service account credentials, which include this authorization token for you. Check out the Client libraries and sample code page for more info. Alternatively, if you prefer to create the requests manually, you can use an OAuth 2.0 token; the Authorizing requests page has more information on how to create these.

These access tokens are only valid for 60 minutes, after which they expire. An expired token does not disconnect clients, but if a client connection is broken and must reconnect to the instance after more than an hour, a new access token will need to be pulled and provided on that new connection attempt.
For your use case, since you are not interested in the Cloud SQL proxy, a service account IAM user is the better way to go.
Note that to get an appropriate access token, the scope must be set to the Cloud SQL Admin API.
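As a minimal sketch of that last point, the scope can be requested directly when loading the default credentials (assuming Application Default Credentials are configured for the service account):

import google.auth
import google.auth.transport.requests

# Request a token scoped for the Cloud SQL Admin API
creds, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/sqlservice.admin"])
creds.refresh(google.auth.transport.requests.Request())
# creds.token is then passed as the database password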
It finally works.
I had to enforce an SSL connection.
import pymysql
from google.oauth2 import service_account
import google.auth.transport.requests
scopes = ["https://www.googleapis.com/auth/cloud-platform", "https://www.googleapis.com/auth/sqlservice.admin"]
credentials = service_account.Credentials.from_service_account_file('key.json', scopes=scopes)
auth_req = google.auth.transport.requests.Request()
credentials.refresh(auth_req)
config = {'user': SA_USER,
          'host': ENDPOINT,
          'database': DBNAME,
          'password': credentials.token,
          'ssl_ca': './server-ca.pem',
          'ssl_cert': './client-cert.pem',
          'ssl_key': './client-key.pem'}

try:
    conn = pymysql.connect(**config)
    with conn:
        print("Connected")
        cur = conn.cursor()
        cur.execute("""SELECT now()""")
        query_results = cur.fetchall()
        print(query_results)
except Exception as e:
    print("Database connection failed due to {}".format(e))
I'd recommend using the Cloud SQL Python Connector it should make your life way easier!
It manages the SSL connection for you (no need for cert files!), takes care of the credentials (it uses Application Default Credentials, which you can easily set to a service account), and allows you to log in with automatic IAM AuthN so that you don't have to pass the credentials token as a password.
Connecting looks like this:
from google.cloud.sql.connector import Connector, IPTypes
import sqlalchemy
import pymysql

# initialize Connector object
connector = Connector(ip_type=IPTypes.PRIVATE, enable_iam_auth=True)

# function to return the database connection
def getconn() -> pymysql.connections.Connection:
    conn: pymysql.connections.Connection = connector.connect(
        "project:region:instance",  # your Cloud SQL instance connection name
        "pymysql",
        user="my-user",
        db="my-db-name"
    )
    return conn

# create connection pool
pool = sqlalchemy.create_engine(
    "mysql+pymysql://",
    creator=getconn,
)

# insert statement
insert_stmt = sqlalchemy.text(
    "INSERT INTO my_table (id, title) VALUES (:id, :title)",
)

# interact with Cloud SQL database using connection pool
with pool.connect() as db_conn:
    # insert into database
    db_conn.execute(insert_stmt, id="book1", title="Book One")

    # query database
    result = db_conn.execute("SELECT * from my_table").fetchall()

    # Do something with the results
    for row in result:
        print(row)
Let me know if you run into any issues! There is also an interactive Cloud SQL Notebook you can check out that will walk you through things in more detail.
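One small follow-up: when you are done with the pool, both it and the connector should be shut down cleanly. A short sketch reusing the names from the snippet above:

# Release pooled connections, then stop the connector's background resources
pool.dispose()
connector.close()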
I'm using Python JupyterLab inside a Docker container, which runs on an AWS EC2 instance. The container has Oracle Instant Client installed, so everything is set. The problem is that I'm still having trouble connecting from this container to my AWS RDS Oracle database, but only when using SQLAlchemy.
When I try the connection using a cx-Oracle==8.2.1 engine:
host = '***********************'
user = '*********'
password = '**********'
port = '****'
service = '****'
dsn_tns = cx_Oracle.makedsn(host,
port,
service)
engine_oracle = cx_Oracle.connect(user=user, password=password, dsn=dsn_tns)
Everything works fine. I can read tables using pandas read_sql(), I can create tables using cx_Oracle execute(), etc.
But when I try to take a DataFrame and send it to my RDS using pandas to_sql(), my cx_Oracle connection returns the error:
DatabaseError: ORA-01036: illegal variable name/number
I then tried to use a SQLAlchemy==1.4.22 engine from the string:
tns = """
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = %s)(PORT = %s))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = %s)
)
)
""" % (host, port, service)
from sqlalchemy import create_engine

engine_alchemy = create_engine('oracle+cx_oracle://%s:%s@%s' % (user, password, tns))
But I get this error:
DatabaseError: ORA-12154: TNS:could not resolve the connect identifier specified
And I keep getting this error even when I try to use pandas read_sql with the SQLAlchemy engine. Thus, I ran out of options. Can somebody help me please?
EDIT:
I tried again with SQLAlchemy==1.3.9 and it worked. Does anybody know why?
The code I'm using for reading and sending a test table from and to Oracle is:
sql = """
SELECT
*
FROM
DADOS_MIS.DR_ACIO_ATIVOS_HASH
WHERE
ROWNUM <= 5"""
df = pd.read_sql(sql, engine_oracle)
dtyp1 = {c: 'VARCHAR2(' + str(df[c].str.len().max()) + ')'
         for c in df.columns[df.dtypes == 'object'].tolist()}
dtyp2 = {c: 'NUMBER'
         for c in df.columns[df.dtypes == 'float64'].tolist()}
dtyp3 = {c: 'DATE'
         for c in df.columns[df.dtypes == 'datetime'].tolist()}
dtyp4 = {c: 'NUMBER'
         for c in df.columns[df.dtypes == 'int64'].tolist()}

dtyp_total = dtyp1
dtyp_total.update(dtyp2)
dtyp_total.update(dtyp3)
dtyp_total.update(dtyp4)
df.to_sql(name='teste', con=engine_oracle, if_exists='replace', dtype=dtyp_total, index=False)
The dtyp_total is:
{'IDENTIFICADOR': 'VARCHAR2(32)',
'IDENTIFICADOR_PRODUTO': 'VARCHAR2(32)',
'DATA_CHAMADA': 'VARCHAR2(19)',
'TABULACAO': 'VARCHAR2(25)'}
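Two observations that may help. First, pandas' to_sql officially supports only SQLAlchemy connectables (or a sqlite3 connection); given a raw cx_Oracle connection it falls back to generating '?'-style placeholders, which Oracle rejects, and that is the likely source of the ORA-01036. Second, the ORA-12154 can be sidestepped by not embedding the TNS descriptor in the URL at all; a sketch that lets cx_Oracle build the DSN and hands it over via connect_args, assuming the same host/port/service variables as above:

import cx_Oracle
from sqlalchemy import create_engine

# Build the DSN with cx_Oracle instead of splicing it into the URL string
dsn = cx_Oracle.makedsn(host, port, service_name=service)
engine_alchemy = create_engine(
    'oracle+cx_oracle://',
    connect_args={'user': user, 'password': password, 'dsn': dsn},
)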
I'm using jaydebeapi to connect to an Oracle DB. The code is as follows:
host = [address]
port = "1521"
sid = "ctginst1"
database = "oracle"
drivertype = "thin"
uid = [user]
pwd = [pass]
driver_class = "oracle.jdbc.OracleDriver"
driver_file = "ojdbc10.jar"
connection_string="jdbc:{}:{}#{}:{}:{}".format(database, drivertype, host, port, sid)
conn=jaydebeapi.connect(driver_class, connection_string, [uid, pwd], driver_file, )
However this fails and gives me an error:
java.lang.RuntimeException: Class oracle.jdbc.OracleDriver not found
Edit:
By passing the CLASSPATH with the .jar's location when starting the JVM, and only then attempting the connection, I managed to proceed further with:
import jpype
jpype.startJVM(jpype.getDefaultJVMPath(), '-Djava.class.path=%s' % driver_file)
And now I am getting a java.sql.SQLException: Invalid Oracle URL specified error.
Okay, so from there, there was apparently a colon missing before the '@'.
The full successful connection code looks like:
import jaydebeapi
import jpype
host = host
port = "1521"
sid = "ctginst1"
database = "oracle"
drivertype = "thin"
uid = user
pwd = password
driver_class = "oracle.jdbc.OracleDriver"
driver_file = "C:\ojdbc8.jar"
connection_string="jdbc:{}:{}:#{}:{}:{}".format(database, drivertype, host, port, sid)
jpype.startJVM(jpype.getDefaultJVMPath(), '-Djava.class.path=%s' % driver_file)
conn=jaydebeapi.connect(driver_class, connection_string, [uid, pwd])
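A quick sanity-check sketch against that connection (plain DB-API usage, nothing jaydebeapi-specific):

curs = conn.cursor()
curs.execute("SELECT 1 FROM dual")  # trivial query to confirm the session works
print(curs.fetchall())
curs.close()
conn.close()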
I am trying to connect to a remote Oracle database using Python.
I am able to access the database directly using DBeaver, and I have copied the parameters in the Python code below from the "Connection Configuration --> Connection settings --> General" tab (which can be opened by right-clicking on the database and selecting "Edit connection"):
import cx_Oracle
host_name = # content of "Host"
port_number = # content of "Port"
user_name = # content of "User name"
pwd = # content of "Password"
service_name = # content of "Database" (the "Service Name" option is selected)
dsn_tns = cx_Oracle.makedsn(host_name, port_number, service_name = service_name)
conn = cx_Oracle.connect(user = user_name, password = pwd, dsn = dsn_tns)
However, I get the following error:
DatabaseError: ORA-12541: TNS:no listener
Other answers I found related to this question suggested modifying some values inside the listener.ora file, but I have no such file on my computer nor do I know where it can be retrieved. Does anyone have any suggestions?
There are two possible reasons for that error:
The database was briefly unavailable at the time you tried to access it.
The Oracle client application on your machine is not configured correctly.
I also think this config is not correct.
See the link: https://oracle.github.io/python-cx_Oracle/
ip = '192.168.1.1'
port = 1521
SID = 'YOURSIDHERE'
dsn_tns = cx_Oracle.makedsn(ip, port, SID)
db = cx_Oracle.connect('username', 'password', dsn_tns)
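One caveat worth adding: ORA-12541 means nothing answered at that host and port at all (wrong host/port or a firewall in the way), whereas a wrong identifier typically raises ORA-12514 or ORA-12505. Also note that makedsn's third positional argument is a SID, while the question copied a service name from DBeaver; those are different identifiers in Oracle. A sketch of the service-name form, with placeholder values:

import cx_Oracle

# service_name must be passed by keyword; positionally it would be taken as a SID
dsn_tns = cx_Oracle.makedsn('192.168.1.1', 1521, service_name='YOURSERVICE')
db = cx_Oracle.connect('username', 'password', dsn_tns)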
In Amazon Redshift's Getting Started Guide, data is pulled from Amazon S3 and loaded into an Amazon Redshift Cluster utilizing SQLWorkbench/J. I'd like to mimic the same process of connecting to the cluster and loading sample data into the cluster utilizing Boto3.
However, in Boto3's documentation for Redshift, I'm unable to find a method that would allow me to upload data into an Amazon Redshift cluster.
I've been able to connect with Redshift utilizing Boto3 with the following code:
client = boto3.client('redshift')
But I'm not sure what method would allow me to either create tables or upload data to Amazon Redshift the way it's done in the tutorial with SQLWorkbench/J.
Right, you need the psycopg2 Python module to execute the COPY command.
My code looks like this:
import psycopg2

# Amazon Redshift connect string
conn_string = "dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"

# connect to Redshift (database should be open to the world)
con = psycopg2.connect(conn_string)

sql = """COPY %s FROM '%s' credentials
'aws_access_key_id=%s; aws_secret_access_key=%s'
delimiter '%s' FORMAT CSV %s %s; commit;""" % (
    to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, delim, quote, gzip)

# Here:
#   fn - s3://path_to__input_file.gz
#   gzip = 'gzip'

cur = con.cursor()
cur.execute(sql)
con.close()
I used boto3/psycopg2 to write CSV_Loader_For_Redshift
Go back to step 4 in that tutorial you linked. See where it shows you how to get the URL of the cluster? You have to connect to that URL with a PostgreSQL driver. The AWS SDKs such as Boto3 provide access to the AWS API. You need to connect to Redshift over a PostgreSQL API, just like you would connect to a PostgreSQL database on RDS.
Using psycopg2 & get_cluster_credentials
Prerequisites:
IAM role attached to the respective user
IAM role with the get_cluster_credentials policy (LINK)
On cloud (EC2) with an appropriate IAM role attached
The code below will work only if you are deploying it on a PC/VM where a user's AWS credentials are already configured (CLI: aws configure), OR you are on an instance in the same account/VPC.
Have a config.ini file:
[Redshift]
port = 3389
username = please_enter_username
database_name = please_enter_database_name
cluster_id = please_enter_cluster_id_name
url = please_enter_cluster_endpoint_url
region = us-west-2
My Redshift_connection.py
import logging
import configparser

import boto3
import psycopg2

def db_connection():
    logger = logging.getLogger(__name__)

    parser = configparser.ConfigParser()
    parser.read('config.ini')

    RS_PORT = parser.get('Redshift', 'port')
    RS_USER = parser.get('Redshift', 'username')
    DATABASE = parser.get('Redshift', 'database_name')
    CLUSTER_ID = parser.get('Redshift', 'cluster_id')
    RS_HOST = parser.get('Redshift', 'url')
    REGION_NAME = parser.get('Redshift', 'region')

    client = boto3.client('redshift', region_name=REGION_NAME)
    cluster_creds = client.get_cluster_credentials(DbUser=RS_USER,
                                                   DbName=DATABASE,
                                                   ClusterIdentifier=CLUSTER_ID,
                                                   AutoCreate=False)
    try:
        conn = psycopg2.connect(
            host=RS_HOST,
            port=RS_PORT,
            user=cluster_creds['DbUser'],
            password=cluster_creds['DbPassword'],
            database=DATABASE
        )
        return conn
    except psycopg2.Error:
        logger.exception('Failed to open database connection.')
        print("Failed")
Query execution script:
from Redshift_connection import db_connection

def executescript(redshift_cursor):
    query = "SELECT * FROM <SCHEMA_NAME>.<TABLENAME>"
    cur = redshift_cursor
    cur.execute(query)

conn = db_connection()
conn.set_session(autocommit=False)
cursor = conn.cursor()
executescript(cursor)
conn.close()