Let me start off by saying I am extremely new to Python and Postgresql so I feel like I'm in way over my head. My end goal is to get connected to the dvdrental database in postgresql and be able to access/manipulate the data. So far I have:
created a .config folder and a database.ini is within there with my login credentials.
in my src i have a config.py folder and use config parser, see below:
def config(filename='.config/database.ini', section='postgresql'):
# create a parser
parser = ConfigParser()
# read config file
parser.read(filename)
# get section, default to postgresql
db = {}
if parser.has_section(section):
params = parser.items(section)
for param in params:
db[param[0]] = param[1]
else:
raise Exception('Section {0} not found in the {1} file'.format(section, filename))
return db
then also in my src I have a tasks.py file that has a basic connect function, see below:
import pandas as pd
from clients.config import config
import psycopg
def connect():
""" Connect to the PostgreSQL database server """
conn = None
try:
# read connection parameters
params = config()
# connect to the PostgreSQL server
print('Connecting to the PostgreSQL database...')
conn = psycopg.connect(**params)
# create a cursor
cur = conn.cursor()
# execute a statement
print('PostgreSQL database version:')
cur.execute('SELECT version()')
# display the PostgreSQL database server version
db_version = cur.fetchone()
print(db_version)
# close the communication with the PostgreSQL
cur.close()
except (Exception, psycopg.DatabaseError) as error:
print(error)
finally:
if conn is not None:
conn.close()
print('Database connection closed.')
if __name__ == '__main__':
connect()
Now this runs and prints out the Postgresql database version which is all well & great but I'm struggling to figure out how to change the code so that it's more generalized and maybe just creates a cursor?
I need the connect function to basically just connect to the dvdrental database and create a cursor so that I can then use my connection to select from the database in other needed "tasks" -- for example I'd like to be able to create another function like the below:
def select_from_table(cursor, table_name, schema):
cursor.execute(f"SET search_path TO {schema}, public;")
results= cursor.execute(f"SELECT * FROM {table_name};").fetchall()
return results
but I'm struggling with how to just create a connection to the dvdrental database & a cursor so that I'm able to actually fetch data and create pandas tables with it and whatnot.
so it would be like
task 1 is connecting to the database
task 2 is interacting with the database (selecting tables and whatnot)
task 3 is converting the result from 2 into a pandas df
thanks so much for any help!! This is for a project in a class I am taking and I am extremely overwhelmed and have been googling-researching non-stop and seemingly end up nowhere fast.
The fact that you established the connection is honestly the hardest step. I know it can be overwhelming but you're on the right track.
Just copy these three lines from connect into the select_from_table method
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
It will look like this (also added conn.close() at the end):
def select_from_table(cursor, table_name, schema):
params = config()
conn = psycopg.connect(**params)
cursor = conn.cursor()
cursor.execute(f"SET search_path TO {schema}, public;")
results= cursor.execute(f"SELECT * FROM {table_name};").fetchall()
conn.close()
return results
Related
I tried a lot however I am unable to copy data available as json file in S3 bucket(I have read only access to the bucket) to Redshift table using python boto3. Below is the python code which I am using to copy the data. Using the same code I was able to create the tables in which I am trying to copy.
import configparser
import psycopg2
from sql_queries import create_table_queries, drop_table_queries
def drop_tables(cur, conn):
for query in drop_table_queries:
cur.execute(query)
conn.commit()
def create_tables(cur, conn):
for query in create_table_queries:
cur.execute(query)
conn.commit()
def main():
try:
config = configparser.ConfigParser()
config.read('dwh.cfg')
# conn = psycopg2.connect("host={} dbname={} user={} password={} port={}".format(*config['CLUSTER'].values()))
conn = psycopg2.connect(
host=config.get('CLUSTER', 'HOST'),
database=config.get('CLUSTER', 'DB_NAME'),
user=config.get('CLUSTER', 'DB_USER'),
password=config.get('CLUSTER', 'DB_PASSWORD'),
port=config.get('CLUSTER', 'DB_PORT')
)
cur = conn.cursor()
#drop_tables(cur, conn)
#create_tables(cur, conn)
qry = """copy DWH_STAGE_SONGS_TBL
from 's3://udacity-dend/song-data/A/A/A/TRAAACN128F9355673.json'
iam_role 'arn:aws:iam::xxxxxxx:role/MyRedShiftRole'
format as json 'auto';"""
print(qry)
cur.execute(qry)
# execute a statement
# print('PostgreSQL database version:')
# cur.execute('SELECT version()')
#
# # display the PostgreSQL database server version
# db_version = cur.fetchone()
# print(db_version)
print("Executed successfully")
cur.close()
conn.close()
# close the communication with the PostgreSQL
except Exception as error:
print("Error while processing")
print(error)
if __name__ == "__main__":
main()
I don't see any error in the Pycharm console but I see Aborted status in the redshift query console. I don't see any reason why it has been aborted(or I don't know where to look for that)
Other thing that I have noticed is when I run the copy statement in Redshift query editor , it runs fine and data gets moved into the table. I tried to delete and recreate the cluster but no luck. I am not able to figure what I am doing wrong. Thank you
Quick read - it looks like you haven't committed the transaction and the COPY is rolled back when the connection closes. You need to either change the connection configuration to be in "autocommit" or add an explicit "commit()".
I have written a Python Tool with an wxPython GUI which has mainly the task to get a lot of user input regarding Customer Data, Product Data and so on and save it to a SQL Database, at the moment locally with a SQLite3 Database for testing an now switching to MS Azure to have anybody work in the same Database.
As i now plan to use a MS Azure SQL DB i have a few questions an i am hoping this is the right place to ask:
What is the best library to connect to Azure via Python? I found
pyodbc and pymssql but i think both need to have an extra driver
installed? Is this true and is this a problem in real usecases?
I have many modules, like Manage_Customer.py and Manage_Factory.py and so on. In all of them I connect to my Database. I have no module which is like a SQL Master which handels some overhead.
So my code looks like this most of the time:
import wx
import sqlite3
SQL_PATH = "Database_Test.db"
class ManageCustomerToDB(wx.Dialog):
def __init__(self, *args, **kw):
super(ManageCustomerToDB, self).__init__(*args, **kw)
def InitUI(self):
#[GUI an so on...]
# I do this on time inside a module:
conn = sqlite3.connect(SQL_PATH)
self.c = conn.cursor()
# Use functions like the ones below...
def GetCustomerData(self):
self.c.execute("SELECT * FROM Customer WHERE CustomerID = ?", (self.tc_customer_id.GetValue(),))
customer_data = self.c.fetchall()
# Do something with Customer Data
def GetPersonData(self):
self.c.execute("SELECT * FROM Person WHERE PersonID = ?", (self.tc_person_id.GetValue(),))
person_data = self.c.fetchall()
# Do something with Person Data
I hope this example shows what i do. Are there any bigger mistakes i do?
After a read in SQL I dont have to close the DB in any way?
Thanks for your help and let me know if i can improve my question or give more details.
It is not a good idea to create a new connection to Azure SQL every time you CRUD. This is a waste of resources, and when the number of accesses reaches a certain number, it will have a large impact on the performance of mssql.
I suggest you use database connection pool. The pool manager will initial several connections to SQL Server instance, and then reuse these connections when requested.
There is an existing package which you can take advantage of. It is DBUtils. You can use the PoolDB from it with pyodbc together.
A sample for showing how database connection pool works:
import pyodbc
from DBUtils.PooledDB import PooledDB
class Database:
def __init__(self, server, driver, port, database, username, password):
self.server = server
self.driver = driver
self.port = port
self.database = database
self.username = username
self.password = password
self._CreatePool()
def _CreatePool(self):
self.Pool = PooledDB(creator=pyodbc, mincached=2, maxcached=5, maxshared=3, maxconnections=6, blocking=True, DRIVER=self.driver, SERVER=self.server, PORT=self.port, DATABASE=self.database, UID=self.username, PWD=self.password)
def _Getconnect(self):
self.conn = self.Pool.connection()
cur = self.conn.cursor()
if not cur:
raise "connection error"
else:
return cur
# query sql
def ExecQuery(self, sql):
cur = self._Getconnect()
cur.execute(sql)
relist = cur.fetchall()
cur.close()
self.conn.close()
return relist
# non-query sql
def ExecNoQuery(self, sql):
cur = self._Getconnect()
cur.execute(sql)
self.conn.commit()
cur.close()
self.conn.close()
def main():
server = 'jackdemo.database.windows.net'
database = 'jackdemo'
username = 'jack'
port=1433
password = '*********'
driver= '{ODBC Driver 17 for SQL Server}'
ms = Database(server=server, driver=driver, port=port, database=database, username=username, password=password)
resList = ms.ExecQuery("select * from Users")
print(resList)
if __name__ == '__main__':
main()
Answers to your questions:
Q1: What is the best library to connect to Azure via Python? I found pyodbc and pymssql but i think both need to have an extra driver installed? Is this true and is this a problem in real usecases?
Answer: Both of then would be OK. ODBC stands for Open Database Connectivity, so it could be used to connect many databases. I see the Microsoft tutorial uses pyodbc, so maybe it is a better choice.
Q2: I have many modules, like Manage_Customer.py and Manage_Factory.py and so on. In all of them I connect to my Database. I have no module which is like a SQL Master which handels some overhead.
Answer: Use database connection pool.
Q3: After a read in SQL I dont have to close the DB in any way?
Answer: If you use database connection pool, the connection will be put back too pool after you call close() method.
I am trying to copy a file from S3 to redshift table but I am unable to do so. However, I can read from the table so I know that my connection is okay.
Please help me to figure out the problem.
def upload_redshift():
conn_string = passd.redshift_login['login'] //the connection string containing dbname, username etc.
con = psycopg2.connect(conn_string);
sql = """FROM 's3://datawarehouse/my_S3_file' credentials 'aws_access_key_id=***;aws_secret_access_key=***' csv ; ;"""
try:
con = psycopg2.connect(conn_string)
logging.info("Connection Successful!")
except:
raise ValueError("Unable to connect to Redshift")
cur = con.cursor()
try:
cur.execute(sql)
logging.info(" Copy to redshift executed successfully")
except:
raise ValueError("Failed to execute copy command")
con.close()
I am getting Copy to redshift executed successfully message but nothing is happening in my table.
Try the following,
sql = "copy table_name FROM 's3://datawarehouse/my_S3_file' credentials 'aws_access_key_id=***;aws_secret_access_key=***' csv ;"
Also, try creating the connection under "connections tab" and use PostgresHook with aws_access_key_id and key as variables, something like below which enables to store the details encrypted within airflow,
pg_db = PostgresHook(postgres_conn_id='<<connection_id>>')
src_conn = pg_db.get_conn()
src_cursor = src_conn.cursor()
src_cursor.execute(sql)
src_cursor.commit()
src_cursor.close()
Also, you can use s3_to_redshift_operator operator and execute it as a task,
from airflow.operators.s3_to_redshift_operator import S3ToRedshiftTransfer
T1 = S3ToRedshiftTransfer(
schema = ‘’,
table = ‘’,
s3_bucket=‘’,
s3_key=‘’,
redshift_conn_id=‘’, #reference to a specific redshift database
aws_conn_id=‘’, #reference to a specific S3 connection
)
It's more of a theoratical question but i have been trying to find a correct answer of it for hours and yet i have't arrived at a solution. I have a big flask app and it contains multiple routes.
#app.route('/try'):
#app.route('/new'):
and many others. I am using MySQLdb for database purpose. Before i was having this in the starting of the application.
import MySQLdb as mysql
db = mysql.connect('localhost', 'root', 'password', 'db')
cursor = db.cursor()
It works fine but after a time, it generates a error "Local Variable 'cursor' referenced before assignment.". This may be due to the reason that after a time mysql closes a connection. So, i entered
cursor=db.cursor() in every route function and close it afer i have done the processing like this:
db = mysql.connect('localhost', 'root', 'password', 'db')
#app.route('/')
def home():
cursor=db.cursor()
...some processing...
cursor.close()
return render_template('home.html')
#app.route('/new')
def home_new():
cursor=db.cursor()
...some processing...
cursor.close()
return render_template('homenew.html')
Now i want to ask is this approach right? Should i define a cursor for each request and close it?
This is how I have my MySQLdb setup
def requestConnection():
"Create new connection. Return connection."
convt = cv.conversions.copy()
convt[3] = int
convt
conn = db.connect(host=c.SQL_HOST, port=c.SQL_PORT, user=c.SQL_USER, passwd=c.SQL_PASSWD, db=c.SQL_DB, conv=convt, use_unicode=True, charset="utf8")
return conn
def requestCursor(conn):
return conn.cursor(db.cursors.DictCursor)
Then, at the start of every SQL function I do this:
def executeQuery(query):
"Execute a given query. Used for debug purpouses."
conn = requestConnection()
cur = requestCursor(conn)
cur.execute(query)
r = cur.fetchall()
cur.close()
conn.close()
return r
I change conversions because I had to change int values in DB from Float to int due to my work, but you can skip this step.
If not, you need to import this:
import MySQLdb as db # https://github.com/farcepest/MySQLdb1
import MySQLdb.converters as cv
Hope it helps!
I have been trying to insert data into the database using the following code in python:
import sqlite3 as db
conn = db.connect('insertlinks.db')
cursor = conn.cursor()
db.autocommit(True)
a="asd"
b="adasd"
cursor.execute("Insert into links (link,id) values (?,?)",(a,b))
conn.close()
The code runs without any errors. But no updation to the database takes place. I tried adding the conn.commit() but it gives an error saying module not found. Please help?
You do have to commit after inserting:
cursor.execute("Insert into links (link,id) values (?,?)",(a,b))
conn.commit()
or use the connection as a context manager:
with conn:
cursor.execute("Insert into links (link,id) values (?,?)", (a, b))
or set autocommit correctly by setting the isolation_level keyword parameter to the connect() method to None:
conn = db.connect('insertlinks.db', isolation_level=None)
See Controlling Transactions.
It can be a bit late but set the autocommit = true save my time! especially if you have a script to run some bulk action as update/insert/delete...
Reference: https://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.isolation_level
it is the way I usually have in my scripts:
def get_connection():
conn = sqlite3.connect('../db.sqlite3', isolation_level=None)
cursor = conn.cursor()
return conn, cursor
def get_jobs():
conn, cursor = get_connection()
if conn is None:
raise DatabaseError("Could not get connection")
I hope it helps you!