As per Unload to S3 with Python using IAM Role credentials, the unload statement worked perfectly, as did the other commands I tried, such as copy and select statements.
However, I also tried to run a query which creates a table. The create table query runs without error, but when it gets to the select statement, it throws an error that relation "public.test" does not exist.
Any idea why the table is not created properly? Query below:
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker
import config
import pandas as pd
#>>>>>>>> MAKE CHANGES HERE >>>>>>>>
DATABASE = "db"
USER = "user"
PASSWORD = getattr(config, 'password') #see answer by David Bern https://stackoverflow.com/questions/43136925/create-a-config-file-to-hold-values-like-username-password-url-in-python-behave/43137301
HOST = "host"
PORT = "5439"
SCHEMA = "public" #default is "public"
########## connection and session creation ##########
connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = "SET search_path TO %s" % SCHEMA
s.execute(SetPath)
# create table example
query2 = '''\
create table public.test (
id integer encode lzo,
user_id integer encode lzo,
created_at timestamp encode delta32k,
updated_at timestamp encode delta32k
)
distkey(id)
sortkey(id)
'''
r2 = s.execute(query2)
# select example
query4 = '''\
select * from public.test
'''
r4 = s.execute(query4)
########## create DataFrame from SQL query output ##########
df = pd.read_sql_query(query4, connection_string)
print(df.head(50))
########## close session in the end ##########
s.close()
If I run the same directly in Redshift, it works just fine.
--Edit--
Some of the things tried:
Removing "\" from query string
adding ";" at the end of query string
changing "public.test" to "test"
removing SetPath = "SET search_path TO %s" % SCHEMA and s.execute(SetPath)
breaking the create statement- generates expected error
adding copy from S3 command after create- runs without error, but again no table created
adding a column to create statement that doesnt exist in the file that is generated from the copy command- generates expected error
adding r4 = s.execute(query4)- runs without error, but again created table not in Redshift
Apparently you need to add s.commit() in order to create the table. If you are populating it via a copy command or insert into, add the commit after the copy command (a commit after the create table is then optional). Basically, the session does not autocommit create/alter commands!
http://docs.sqlalchemy.org/en/latest/orm/session_basics.html#session-faq-whentocreate
http://docs.sqlalchemy.org/en/latest/core/connections.html#understanding-autocommit
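A minimal sketch of the fix, reusing the variable names from the script above (copy_query is a hypothetical placeholder for whatever populates the table):
r2 = s.execute(query2)  # create table
s.commit()              # required: the session does not autocommit DDL

# if you then populate the table, commit again after the copy/insert:
r3 = s.execute(copy_query)  # copy_query is hypothetical here
s.commit()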
Related
I have 1,000,000 records that I am trying to enter into the database; unfortunately, some of the records do not conform to the db schema. At the moment, when a record fails I:
rollback the transaction
observe the exception
fix the issue
run again.
I wish to build a script which would set aside all the "bad" records but commit all the correct ones.
Of course I can commit one by one and, when a commit fails, rollback and commit the next, but I would pay a "run time price" as the code would run for a long time.
In the example below I have two models, File and Client, with a one-to-many relation: one client has many files.
In commit.py I wish to commit 1M File objects at once, or in batches (1k). At the moment I only find out that something failed when I commit at the end. Is there a way to know which objects are "bad" (integrity errors with the foreign key, for example) beforehand, i.e., park them aside (in another list) while committing all the "good" ones?
Thanks a lot for the help.
#model.py
from sqlalchemy import Column, DateTime, String, func, Integer, ForeignKey
from . import base
class Client(base):
    __tablename__ = 'clients'
    id = Column(String, primary_key=True)

class File(base):
    __tablename__ = 'files'
    id = Column(Integer, primary_key=True, autoincrement=True)
    client_id = Column(String, ForeignKey('clients.id'))
    path = Column(String)
#init.py
import os
from dotenv import load_dotenv
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
load_dotenv()
db_name = os.getenv("DB_NAME")
db_host = os.getenv("DB_HOST")
db_port = os.getenv("DB_PORT")
db_user = os.getenv("DB_USER")
db_password = os.getenv("DB_PASSWORD")
db_uri = 'postgresql://' + db_user + ':' + db_password + '@' + db_host + ':' + db_port + '/' + db_name
print(f"product_domain: {db_uri}")
base = declarative_base()
engine = create_engine(db_uri)
base.metadata.bind = engine
Session = sessionmaker(bind=engine)
session = Session()
conn = engine.connect()
#commit.py
from . import session
def commit(list_of_1m_File_objects_records):
    # I wish to loop over the rows and, if a specific row raises an
    # exception, add it to a list and handle it afterwards
    for file in list_of_1m_File_objects_records:
        session.add(file)
    session.commit()
# client:
# id
# "a"
# "b"
# "c"
# file:
# id|client_id|path
# --|---------|-------------
# 1 "a" "path1.txt"
# 2 "aa" "path2.txt"
# 3 "c" "path143.txt"
# 4 "a" "pat.txt"
# 5 "b" "patb.txt"
# wish the file data would enter the database even though it has one record ("aa") which will raise an integrity error
Since I can't comment: I would suggest using psycopg2 and SQLAlchemy to create the connection to the db, and then using a query with "ON CONFLICT" at the end of the query to add and commit your data.
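A minimal sketch of that suggestion with psycopg2 (table and column names taken from the models above; the DSN is hypothetical). Note that ON CONFLICT only skips unique-constraint conflicts, so a foreign-key violation like the "aa" row would still raise:
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
rows = [(1, "a", "path1.txt"), (2, "aa", "path2.txt")]

with conn, conn.cursor() as cur:
    # insert in one round trip; duplicate ids are silently skipped
    execute_values(
        cur,
        "INSERT INTO files (id, client_id, path) VALUES %s"
        " ON CONFLICT (id) DO NOTHING",
        rows,
    )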
Of course I can commit one by one and, when a commit fails, rollback and commit the next, but I would pay a "run time price" as the code would run for a long time.
What is the source of that price? If it is fsync speed, you can get rid of most of that cost by setting synchronous_commit to off for the local connection. If you have a crash partway through, then you need to figure out which records were actually saved once the database comes back up, so you know where to start again, but I wouldn't think that would be hard to do. This method should get you the most benefit for the least work.
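A sketch of that approach (psycopg2 flavour; the DSN and the rows iterable are hypothetical). synchronous_commit is a per-session setting, so no server configuration change is needed:
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
cur = conn.cursor()
# commits now return without waiting for the WAL fsync
cur.execute("SET synchronous_commit TO off")

bad = []
for client_id, path in rows:  # rows: hypothetical iterable of tuples
    try:
        cur.execute("INSERT INTO files (client_id, path) VALUES (%s, %s)",
                    (client_id, path))
        conn.commit()
    except psycopg2.IntegrityError:
        conn.rollback()
        bad.append((client_id, path))  # set aside for later inspection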
At the moment I only find out that something failed when I commit at the end
It sounds like you are using deferred constraints. Is there a reason for that?
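For reference, a deferred foreign key is only checked at COMMIT, which would explain seeing the failures only at the end. A hypothetical sketch of such a constraint (not taken from the question's models):
session.execute("""
    ALTER TABLE files
        ADD CONSTRAINT files_client_fk
        FOREIGN KEY (client_id) REFERENCES clients (id)
        DEFERRABLE INITIALLY DEFERRED
""")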
is there a way to know which objects are "bad" (integrity errors with the foreign key, for example)
For the case of that example, read all the Client ids into a dictionary before you start (assuming they fit in RAM), then test the Files on the Python side so you can reject the orphans before trying to insert them.
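A minimal sketch using the models and session from the question (a set suffices for the membership test):
valid_ids = {cid for (cid,) in session.query(Client.id)}  # assumes ids fit in RAM

good, orphans = [], []
for f in list_of_1m_File_objects_records:
    (good if f.client_id in valid_ids else orphans).append(f)

session.bulk_save_objects(good)  # or session.add_all(good)
session.commit()
# 'orphans' holds the rejected records for later handling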
One of our queries that was working in Python 2 + mxODBC is not working in Python 3 + pyodbc; when connecting to SQL Server it raises an error like this: "Maximum number of parameters in the sql query is 2100." Since both of the printed queries have 3000 params, I thought it should fail in both environments, but clearly that doesn't seem to be the case here. In the Python 2 environment, both MSODBC 11 and MSODBC 17 work, so I immediately ruled out a driver-related issue.
So my question is:
Is it correct to send a list as multiple params in SQLAlchemy, given that the param count will be proportional to the length of the list? It looks a bit strange to me; I would have preferred concatenating the list into a single string, because the DB doesn't understand the list datatype.
Are there any hints on why it would work in mxODBC but not pyodbc? Does mxODBC optimize something that pyodbc does not? Please let me know if there are any pointers; I can try and paste more info here. (I am still new to debugging SQLAlchemy.)
Footnote: I have seen a lot of answers that suggest chunking the data, but because of 1 and 2, I wonder if I am doing the correct thing in the first place.
(Since it seems to be related to pyodbc, I have raised an internal issue in the official repository.)
import sqlalchemy
import sqlalchemy.orm
from sqlalchemy import MetaData, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.session import Session
Base = declarative_base()
create_tables = """
CREATE TABLE products(
idn NUMERIC(8) PRIMARY KEY
);
"""
check_tables = """
SELECT * FROM products;
"""
insert_values = """
INSERT INTO products
(idn)
values
(1),
(2);
"""
delete_tables = """
DROP TABLE products;
"""
engine = sqlalchemy.create_engine('mssql+pyodbc://user:password@dsn')
connection = engine.connect()
cursor = engine.raw_connection().cursor()
Session = sqlalchemy.orm.sessionmaker(bind=connection)
session = Session()
session.execute(create_tables)
metadata = MetaData(connection)
class Products(Base):
    __table__ = Table('products', metadata, autoload=True)

try:
    session.execute(check_tables)
    session.execute(insert_values)
    session.commit()
    query = session.query(Products).filter(
        Products.idn.in_(list(range(0, 3000)))
    )
    query.all()
    f = open("query.sql", "w")
    f.write(str(query))
    f.close()
finally:
    session.execute(delete_tables)
    session.commit()
When you do a straightforward .in_(list_of_values) SQLAlchemy renders the following SQL ...
SELECT team.prov AS team_prov, team.city AS team_city
FROM team
WHERE team.prov IN (?, ?)
... where each value in the IN clause is specified as a separate parameter value. pyodbc sends this to SQL Server as ...
exec sp_prepexec @p1 output,N'@P1 nvarchar(4),@P2 nvarchar(4)',N'SELECT team.prov AS team_prov, team.city AS team_city, team.team_name AS team_team_name
FROM team
WHERE team.prov IN (@P1, @P2)',N'AB',N'ON'
... so you hit the limit of 2100 parameters if your list is very long. Presumably, mxODBC inserted the parameter values inline before sending it to SQL Server, e.g.,
SELECT team.prov AS team_prov, team.city AS team_city
FROM team
WHERE team.prov IN ('AB', 'ON')
You can get SQLAlchemy to do that for you with
provinces = ["AB", "ON"]
stmt = (
session.query(Team)
.filter(
Team.prov.in_(sa.bindparam("p1", expanding=True, literal_execute=True))
)
.statement
)
result = list(session.query(Team).params(p1=provinces).from_statement(stmt))
I have the following PostgreSQL table:
I want to insert a string value of 3 into the first row under the diagnosis column. I'm trying to follow some general INSERT code from the PostgreSQL documentation. Below is my trial code, which is not working.
diagnosis = 3
pip install config
# insert
import psycopg2
from config import config
def insert_diagnosis(diagnosis):
    """ insert a new diagnosis prediction into the eyeballtables table """
    sql = """INSERT INTO eyeballtables(diagnosis)
             VALUES(%s) RETURNING vendor_id;"""
    conn = None
    vendor_id = None
    try:
        # read database configuration
        params = config()
        # connect to the PostgreSQL database
        conn = psycopg2.connect(**params)
        # create a new cursor
        cur = conn.cursor()
        # execute the INSERT statement
        cur.execute(sql, (diagnosis,))
        # get the generated id back
        vendor_id = cur.fetchone()[0]
        # commit the changes to the database
        conn.commit()
        # close communication with the database
        cur.close()
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        if conn is not None:
            conn.close()
    return vendor_id
The code above is not exactly right because it doesn't isolate the insert to the "diagnosis" field in the table. But beyond that, I also get the following error even though I did a successful pip install of config:
ImportError: cannot import name 'config' from 'config' (/home/pinzhi/anaconda3/lib/python3.7/site-packages/config.py)
Thoughts on what I'm doing wrong, or is there a more straightforward way to insert a datapoint into a PostgreSQL table?
EDIT ---------------------------------------------------------------------
The config error is no longer popping up after I followed the answer provided. But my code above is unable to insert the diagnosis value of 3 into the table.
What am I doing wrong?
It's Config, not config:
from config import Config
params = Config()
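That applies to the pip-installed config package. If you instead meant to follow the common psycopg2 tutorial pattern (which the params = config() / connect(**params) lines suggest), write your own local config.py so it is found before the site-packages one. A sketch, assuming a database.ini file next to the script:
# config.py (local file, shadowing the pip-installed package)
from configparser import ConfigParser

def config(filename="database.ini", section="postgresql"):
    parser = ConfigParser()
    parser.read(filename)
    if not parser.has_section(section):
        raise Exception(f"Section {section} not found in {filename}")
    return dict(parser.items(section))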
I'm looking to run the following test.sql located in a folder on my C: drive. I've been playing with cx_Oracle and just can't get it to work.
test.sql contains the following.
CREATE TABLE MURRAYLR.test
( customer_id number(10) NOT NULL,
customer_name varchar2(50) NOT NULL,
city varchar2(50)
);
CREATE TABLE MURRAYLR.test2
( customer_id number(10) NOT NULL,
customer_name varchar2(50) NOT NULL,
city varchar2(50)
);
This is my code:
import sys
import cx_Oracle
connection = cx_Oracle.connect('user', 'password', 'test.ora')
cursor = connection.cursor()
f = open(r"C:\Users\desktop\Test_table.sql")
full_sql = f.read()
sql_commands = full_sql.split(';')
for sql_command in sql_commands:
    cursor.execute(sql_command)
cursor.close()
connection.close()
This answer is relevant only if your test.sql file contains newline ('\n') characters (like mine, which I got from copy-pasting your SQL code). You will need to remove them in your code if they are present. To check, do
print full_sql
To fix the '\n's:
sql_commands = full_sql.replace('\n', '').split(';')[:-1]
The above should help.
It removes the '\n's and drops the empty-string token at the end when splitting the SQL string.
MURRAYLR.test is not an acceptable table name in any DBMS I've used. The connection object that cx_Oracle.connect returns should already have a schema selected. To switch to a different schema, set the current_schema field on the connection object, or add ALTER SESSION SET CURRENT_SCHEMA = <schemaname>; in your SQL file.
Obviously make sure that the schema exists.
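For the first option, a minimal sketch against the question's code (cx_Oracle exposes current_schema as a writable attribute on the connection):
connection = cx_Oracle.connect('user', 'password', 'test.ora')
connection.current_schema = 'MURRAYLR'  # subsequent statements resolve names in this schema
cursor = connection.cursor()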
I'm having troubles with creating a database and tables. The database needs to be created within a Python script.
#connect method has 4 parameters:
#localhost (where mysql db is located),
#database user name,
#account password,
#database name
db1 = MS.connect(host="localhost",user="root",passwd="****",db="test")
returns
_mysql_exceptions.OperationalError: (1049, "Unknown database 'test'")
So clearly the database needs to be created first, but how? I've tried CREATE before the connect() statement, but I get errors.
Once the database is created, how do I create tables?
Thanks,
Tom
Here is the syntax; this works, at least the first time around. The second time it naturally reports that the db already exists. Now to figure out how to use the drop command properly.
db = MS.connect(host="localhost",user="root",passwd="****")
db1 = db.cursor()
db1.execute('CREATE DATABASE test1')
So this works great the first time through. The second time through it produces a warning: "db already exists". How to deal with this? The following is how I think it should work, but it doesn't. Or should it be an if statement: if the database already exists, do not populate it?
import warnings
warnings.filterwarnings("ignore", "test1")
Use CREATE DATABASE to create the database:
db1 = MS.connect(host="localhost",user="root",passwd="****")
cursor = db1.cursor()
sql = 'CREATE DATABASE mydata'
cursor.execute(sql)
Use CREATE TABLE to create the table:
sql = '''CREATE TABLE foo (
bar VARCHAR(50) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
'''
cursor.execute(sql)
There are a lot of options when creating a table. If you are not sure what the right SQL should be, it may help to use a graphical tool like phpmyadmin to create a table, and then use SHOW CREATE TABLE to discover what SQL is needed to create it:
mysql> show create table foo \G
*************************** 1. row ***************************
Table: foo
Create Table: CREATE TABLE `foo` (
`bar` varchar(50) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
phpmyadmin can also show you what SQL it used to perform all sorts of operations. This can be a convenient way to learn some basic SQL.
Once you've experimented with this, then you can write the SQL in Python.
I think the solution is a lot easier: use IF NOT EXISTS:
sql = "CREATE DATABASE IF NOT EXISTS test1"
db1.execute(sql)
import MySQLdb
# Open database connection (if the database is not created yet, don't pass dbname)
db = MySQLdb.connect("localhost","yourusername","yourpassword","yourdbname" )
# prepare a cursor object using cursor() method
cursor = db.cursor()
# Create the database
# The line below hides the "already exists" warning
cursor.execute("SET sql_notes = 0; ")
cursor.execute("create database IF NOT EXISTS yourdbname")
# create table
cursor.execute("SET sql_notes = 0; ")
cursor.execute("create table IF NOT EXISTS test (email varchar(70),pwd varchar(20));")
cursor.execute("SET sql_notes = 1; ")
#insert data
cursor.execute("insert into test (email,pwd) values('test#gmail.com','test')")
# Commit your changes in the database
db.commit()
# disconnect from server
db.close()
#OUTPUT
mysql> select * from test;
+-----------------+--------+
| email | pwd |
+-----------------+--------+
| test@gmail.com | test |
+-----------------+--------+