I have tried a lot, but I am unable to copy data available as a JSON file in an S3 bucket (I have read-only access to the bucket) into a Redshift table using Python (psycopg2). Below is the Python code I am using to copy the data. Using the same code I was able to create the tables into which I am trying to copy.
import configparser
import psycopg2
from sql_queries import create_table_queries, drop_table_queries


def drop_tables(cur, conn):
    for query in drop_table_queries:
        cur.execute(query)
        conn.commit()


def create_tables(cur, conn):
    for query in create_table_queries:
        cur.execute(query)
        conn.commit()


def main():
    try:
        config = configparser.ConfigParser()
        config.read('dwh.cfg')

        # conn = psycopg2.connect("host={} dbname={} user={} password={} port={}".format(*config['CLUSTER'].values()))
        conn = psycopg2.connect(
            host=config.get('CLUSTER', 'HOST'),
            database=config.get('CLUSTER', 'DB_NAME'),
            user=config.get('CLUSTER', 'DB_USER'),
            password=config.get('CLUSTER', 'DB_PASSWORD'),
            port=config.get('CLUSTER', 'DB_PORT')
        )
        cur = conn.cursor()

        # drop_tables(cur, conn)
        # create_tables(cur, conn)

        qry = """copy DWH_STAGE_SONGS_TBL
                 from 's3://udacity-dend/song-data/A/A/A/TRAAACN128F9355673.json'
                 iam_role 'arn:aws:iam::xxxxxxx:role/MyRedShiftRole'
                 format as json 'auto';"""
        print(qry)
        cur.execute(qry)

        # execute a statement
        # print('PostgreSQL database version:')
        # cur.execute('SELECT version()')
        #
        # # display the PostgreSQL database server version
        # db_version = cur.fetchone()
        # print(db_version)

        print("Executed successfully")

        cur.close()
        conn.close()
        # close the communication with the PostgreSQL
    except Exception as error:
        print("Error while processing")
        print(error)


if __name__ == "__main__":
    main()
I don't see any error in the PyCharm console, but I see an Aborted status in the Redshift query console. I don't see any reason why it was aborted (or I don't know where to look for that).
Another thing I have noticed is that when I run the COPY statement in the Redshift query editor, it runs fine and the data gets moved into the table. I tried deleting and recreating the cluster, but no luck. I am not able to figure out what I am doing wrong. Thank you.
Quick read: it looks like you haven't committed the transaction, so the COPY is rolled back when the connection closes. You need to either change the connection configuration to autocommit or add an explicit commit().
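For example, a minimal sketch of the two options against the code above (variable names are taken from the question; only the commit handling is new):

# Option 1: commit explicitly after the COPY
cur.execute(qry)
conn.commit()  # persist the COPY before closing the connection

# Option 2: turn on autocommit right after connecting
conn = psycopg2.connect(
    host=config.get('CLUSTER', 'HOST'),
    database=config.get('CLUSTER', 'DB_NAME'),
    user=config.get('CLUSTER', 'DB_USER'),
    password=config.get('CLUSTER', 'DB_PASSWORD'),
    port=config.get('CLUSTER', 'DB_PORT')
)
conn.autocommit = True  # every execute() is committed immediately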
I am writing a Python script that does the following as part of a transaction:
1. Creates a new database.
2. Creates new tables using a schema.sql file.
3. Copies the data from the master DB into this new DB using insert into select * from master.table_name...-style SQL statements.
4. Commits the transaction in the else block if everything goes right, and rolls it back in the except block if something goes wrong.
5. Closes the connection in the finally block.
However, while testing, I found that rollback isn't working. If an exception is raised after the database has been created, the database still exists even after the rollback. If an exception is raised after inserting data into a few tables (with some tables remaining), calling rollback in the except block does not revert the inserted data. The script looks like this:
import mysql.connector

try:
    conn = mysql.connector.connect(host='localhost', port=3306,
                                   user=USERNAME, password=PASSWORD,
                                   autocommit=False)
    cursor = conn.cursor()  # cursor creation was missing from the original snippet
    cursor.execute("START TRANSACTION;")
    cursor.execute(f"DROP DATABASE IF EXISTS {target_db_name};")
    cursor.execute(f"CREATE DATABASE {target_db_name};")
    cursor.execute(f"USE {target_db_name};")

    with open(SCHEMA_LOCATION) as f:
        schema_query = f.read()
    commands = schema_query.split(";")
    for command in commands:
        cursor.execute(command)

    for query in QUERIES:
        cursor.execute(f"{query}{org_Id};")
except Exception as error:
    conn.rollback()
else:
    conn.commit()  # cursor.execute("COMMIT")
finally:
    if conn.is_connected():
        cursor.close()
        conn.close()
Below are the details of the setup:
Python3
mysql-connector-python==8.0.32
MySQL 5.7
Storage Engine: InnoDB
Let me start off by saying I am extremely new to Python and PostgreSQL, so I feel like I'm in way over my head. My end goal is to connect to the dvdrental database in PostgreSQL and be able to access/manipulate the data. So far I have:
created a .config folder with a database.ini inside it that holds my login credentials.
in my src I have a config.py file that uses ConfigParser, see below:
from configparser import ConfigParser


def config(filename='.config/database.ini', section='postgresql'):
    # create a parser
    parser = ConfigParser()
    # read config file
    parser.read(filename)

    # get section, default to postgresql
    db = {}
    if parser.has_section(section):
        params = parser.items(section)
        for param in params:
            db[param[0]] = param[1]
    else:
        raise Exception('Section {0} not found in the {1} file'.format(section, filename))

    return db
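For reference, a minimal sketch of what this function expects and returns (the [postgresql] section name comes from the default argument above; the keys and values below are placeholders, not real credentials):

# .config/database.ini might look like:
#
# [postgresql]
# host=localhost
# port=5432
# dbname=dvdrental
# user=postgres
# password=your_password

params = config()
print(params)  # {'host': 'localhost', 'port': '5432', 'dbname': 'dvdrental', ...}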
Then, also in my src, I have a tasks.py file that has a basic connect function, see below:
import pandas as pd
from clients.config import config
import psycopg


def connect():
    """ Connect to the PostgreSQL database server """
    conn = None
    try:
        # read connection parameters
        params = config()

        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
        conn = psycopg.connect(**params)

        # create a cursor
        cur = conn.cursor()

        # execute a statement
        print('PostgreSQL database version:')
        cur.execute('SELECT version()')

        # display the PostgreSQL database server version
        db_version = cur.fetchone()
        print(db_version)

        # close the communication with the PostgreSQL
        cur.close()
    except (Exception, psycopg.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
            print('Database connection closed.')


if __name__ == '__main__':
    connect()
Now this runs and prints out the PostgreSQL database version, which is all well and great, but I'm struggling to figure out how to change the code so that it's more generalized and maybe just creates a cursor.
I need the connect function to basically just connect to the dvdrental database and create a cursor, so that I can then use my connection to select from the database in other needed "tasks" -- for example, I'd like to be able to create another function like the one below:
def select_from_table(cursor, table_name, schema):
    cursor.execute(f"SET search_path TO {schema}, public;")
    results = cursor.execute(f"SELECT * FROM {table_name};").fetchall()
    return results
but I'm struggling with how to just create a connection to the dvdrental database and a cursor so that I'm able to actually fetch data and create pandas tables with it and whatnot.
So it would be like:
task 1 is connecting to the database
task 2 is interacting with the database (selecting tables and whatnot)
task 3 is converting the result from task 2 into a pandas df
Thanks so much for any help! This is for a project in a class I am taking; I am extremely overwhelmed and have been googling and researching non-stop, and I seem to end up nowhere fast.
The fact that you established the connection is honestly the hardest step. I know it can be overwhelming, but you're on the right track.
Just copy these three lines from connect into the select_from_table function:
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
It will look like this (also added conn.close() at the end; the cursor parameter is no longer needed, since the cursor is now created inside the function):
def select_from_table(table_name, schema):
    params = config()
    conn = psycopg.connect(**params)
    cursor = conn.cursor()
    cursor.execute(f"SET search_path TO {schema}, public;")
    results = cursor.execute(f"SELECT * FROM {table_name};").fetchall()
    conn.close()
    return results
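And for task 3, a minimal sketch of turning that result into a pandas DataFrame ('film' is just an example dvdrental table; the rows come back as plain tuples, so column names have to be supplied separately if you want them):

import pandas as pd

rows = select_from_table('film', 'public')
df = pd.DataFrame(rows)  # columns will be numbered 0..n-1 unless you pass names
print(df.head())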
I have an SQL file generated during a database backup process, and I want to load all the database content from that SQL file into a different MySQL database (a secondary database).
I have created a Python function to load the whole database from that SQL file, but when I execute the function, I get an error:
'str' object is not callable
Below is the Python script:
def load_database_dump_to_secondary_mysql(file_path='db_backup_file.sql'):
    query = f'source {file_path}'
    try:
        connection = mysql_hook.get_conn()  # connection to secondary db
        cursor = connection.cursor(query)
        print('LOAD TO MYSQL COMPLETE')
    except Exception as xerror:
        print("LOAD ERROR: ", xerror)
NB: mysql_hook is an Airflow hook that contains the MySQL DB connection info, such as host, user/password, and database name. Also, I don't have a connection to the primary database; I'm only receiving the SQL dump file.
What am I missing?
source is a client builtin command: https://dev.mysql.com/doc/refman/8.0/en/mysql-commands.html
It's not an SQL query that MySQL's SQL parser understands.
So you can't execute source using cursor.execute(), because that goes directly to the dynamic SQL interface.
You must run it using the MySQL command-line client as a subprocess:
subprocess.run(['mysql', '-e', f'source {file_path}'])
You might need other options to the mysql client, such as user, password, host, etc.
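For example, a sketch of what that subprocess call could look like with connection options passed through (the host, user, and database names are placeholders; the password is left to a ~/.my.cnf file or the MYSQL_PWD environment variable to keep it off the command line):

import subprocess

file_path = 'db_backup_file.sql'

subprocess.run(
    [
        'mysql',
        '--host=secondary-db-host',  # placeholder secondary DB host
        '--user=load_user',          # placeholder user
        'secondary_db_name',         # placeholder target database
        '-e', f'source {file_path}',
    ],
    check=True,  # raise CalledProcessError if the mysql client exits non-zero
)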
Try this:
import mysql.connector as m

# database which you want to backup
db = 'geeksforgeeks'

connection = m.connect(host='localhost', user='root',
                       password='123', database=db)
cursor = connection.cursor()

# Getting all the table names
cursor.execute('SHOW TABLES;')
table_names = []
for record in cursor.fetchall():
    table_names.append(record[0])

backup_dbname = db + '_backup'
try:
    cursor.execute(f'CREATE DATABASE {backup_dbname}')
except:
    pass

cursor.execute(f'USE {backup_dbname}')

for table_name in table_names:
    cursor.execute(
        f'CREATE TABLE {table_name} SELECT * FROM {db}.{table_name}')
I am trying to execute a stored procedure using pyodbc in Databricks. After executing the SP I try to commit the connection, but the commit is not happening. Here is my code; please help me out with this issue.
import pyodbc

#### Connecting Azure SQL
def db_connection():
    try:
        username = "starsusername"
        password = "password-db"
        server = "server-name"
        database_name = "db-name2"
        port = "db-port"
        conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};SERVER=tcp:' + server + ',' + port + ';DATABASE=' + database_name + ';UID=' + username + ';PWD=' + password)
        cursor = conn.cursor()
        return cursor, conn
    except Exception as e:
        print("Failed to connect to Azure SQL: \n" + str(e))

cursor, conn = db_connection()
# conn1.autocommit=True
cursor.execute("delete from db.table_name")
cursor.execute("insert into db.table_name(BUSINESS_DATE) values('2021-10-02')")
cursor.execute("exec db.SP_NAME '20211023'")
conn.commit()
conn.close()
Here I am committing the connection after the SP execution. The deletion and insertion are not happening at all. I also tried cursor.execute("SET NOCOUNT ON; exec db.SP_NAME '20211023'"), but that is not working either.
Thanks in advance.
If you check this document on pyodbc, you will find that:
To call a stored procedure right now, pass the call to the execute method using either a format your database recognizes or using the ODBC call escape format. The ODBC driver will then reformat the call for you to match the given database.
Note that after the connection is set up, try setting conn.autocommit = True before calling your SP; it will help. By default it is False.
Executing the Stored Procedure.
You will be able to execute your stored procedure if you follow the below code snippet.
cursor = conn.cursor()
conn.autocommit = True
executesp = """EXEC yourstoredprocedure """
cursor.execute(executesp)
conn.commit()
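As a variation, a sketch of the ODBC call escape format mentioned in the quoted documentation, with the parameter passed separately instead of embedded in the string (the procedure name and value are placeholders taken from the question):

sql = "{CALL db.SP_NAME (?)}"
cursor.execute(sql, '20211023')
conn.commit()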
Delete the Records in SQL Server
You can delete a record as shown in the example below.
...#just an example
cursor.execute('''
DELETE FROM product
WHERE product_id in (5,6)
''')
conn.commit()
Don't forget to add conn.commit() at the end of the code, to ensure that the command gets committed.
Insert record in SQL Server
The snippet below shows how we can do the same.
...#just an example
cursor.execute("INSERT INTO EMP (EMPNO, ENAME, JOB, MGR) VALUES (535, 'Scott', 'Manager', 545)")
conn.commit()
I suggest you read the following documents for more information:
Delete Record Documentation
Insert Record Documentation
I am trying to copy a file from S3 to a Redshift table, but I am unable to do so. However, I can read from the table, so I know that my connection is okay.
Please help me figure out the problem.
def upload_redshift():
    conn_string = passd.redshift_login['login']  # the connection string containing dbname, username etc.
    con = psycopg2.connect(conn_string)
    sql = """FROM 's3://datawarehouse/my_S3_file' credentials 'aws_access_key_id=***;aws_secret_access_key=***' csv ; ;"""
    try:
        con = psycopg2.connect(conn_string)
        logging.info("Connection Successful!")
    except:
        raise ValueError("Unable to connect to Redshift")
    cur = con.cursor()
    try:
        cur.execute(sql)
        logging.info(" Copy to redshift executed successfully")
    except:
        raise ValueError("Failed to execute copy command")
    con.close()
I am getting the "Copy to redshift executed successfully" message, but nothing is happening in my table.
Try the following:
sql = "copy table_name FROM 's3://datawarehouse/my_S3_file' credentials 'aws_access_key_id=***;aws_secret_access_key=***' csv ;"
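Put together, a sketch of the corrected function could look like the following (table_name is a placeholder; note the commit, since psycopg2 does not autocommit by default):

def upload_redshift():
    conn_string = passd.redshift_login['login']  # same connection string as in the question
    sql = """copy table_name
             FROM 's3://datawarehouse/my_S3_file'
             credentials 'aws_access_key_id=***;aws_secret_access_key=***'
             csv;"""
    try:
        con = psycopg2.connect(conn_string)
        logging.info("Connection Successful!")
    except:
        raise ValueError("Unable to connect to Redshift")
    cur = con.cursor()
    try:
        cur.execute(sql)
        con.commit()  # persist the COPY; without this the load is rolled back when the connection closes
        logging.info("Copy to redshift executed successfully")
    except:
        raise ValueError("Failed to execute copy command")
    finally:
        con.close()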
Also, try creating the connection under the "Connections" tab and using PostgresHook, with aws_access_key_id and the secret key as variables, something like the snippet below, which lets you store the details encrypted within Airflow:
pg_db = PostgresHook(postgres_conn_id='<<connection_id>>')
src_conn = pg_db.get_conn()
src_cursor = src_conn.cursor()
src_cursor.execute(sql)
src_conn.commit()  # commit on the connection; cursors do not have a commit() method
src_cursor.close()
Also, you can use the s3_to_redshift_operator operator and execute it as a task:
from airflow.operators.s3_to_redshift_operator import S3ToRedshiftTransfer

T1 = S3ToRedshiftTransfer(
    schema='',
    table='',
    s3_bucket='',
    s3_key='',
    redshift_conn_id='',  # reference to a specific Redshift database
    aws_conn_id='',       # reference to a specific S3 connection
)
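Note that the operator still needs a task_id and a DAG to run in. A minimal sketch, assuming an Airflow 1.10-style setup to match the import above (the DAG id, task_id, schema/table, bucket/key, and connection ids are all placeholders):

from datetime import datetime

from airflow import DAG
from airflow.operators.s3_to_redshift_operator import S3ToRedshiftTransfer

with DAG(dag_id='s3_to_redshift_example',
         start_date=datetime(2021, 1, 1),
         schedule_interval=None) as dag:

    copy_to_redshift = S3ToRedshiftTransfer(
        task_id='copy_to_redshift',           # required for every Airflow task
        schema='public',                      # placeholder target schema
        table='my_table',                     # placeholder target table
        s3_bucket='datawarehouse',            # placeholder bucket (from the question's S3 path)
        s3_key='my_S3_file',                  # placeholder key
        redshift_conn_id='redshift_default',  # placeholder connection ids
        aws_conn_id='aws_default',
        copy_options=['csv'],                 # options appended to the COPY statement
    )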