Pandas DataFrame to PostgreSQL (pandas.io.sql.DatabaseError) - python

I am new to Postgres. Can anyone tell me how to get this to work?
What I want to do is write a Pandas dataframe to a PostgreSQL database. I have already created a database 'customer' and a table 'users'.
I am creating a simple Pandas dataframe as follows:
import pandas as pd

data = {'Col1':[1,2,3,4,5], 'Col2':[1,2,3,4,5]}
df = pd.DataFrame(data)
After that I am creating a Postgres connection to my 'customer' database as follows:
import psycopg2

conn = psycopg2.connect(
    database="customer", user='postgres', password='password', host='127.0.0.1', port='5432')
Then, I am using the following command to insert records from dataframe into table 'users':
df.to_sql('users', conn, if_exists='replace')
conn.commit()
conn.close()
Error that I am getting is:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': syntax error at or near ";"
LINE 1: ...ELECT name FROM sqlite_master WHERE type='table' AND name=?;
^

df.to_sql() does not work with a raw psycopg2 connection ("conn"); it expects a SQLAlchemy engine (or connection). When it gets a plain DBAPI connection it falls back to SQLite-style queries against sqlite_master, which is exactly the error shown above. With plain psycopg2, insert the rows yourself instead:
Step 1: Creation of an empty table
First you need to create a cursor and then create a table:
cursor = conn.cursor()
cursor.execute("CREATE TABLE users_table (col1 integer, col2 integer)")
conn.commit()
Step 2: Insert pandas df to the users_table
tuples = [tuple(x) for x in df.to_numpy()]
cols = ','.join(list(df.columns))
query = "INSERT INTO %s(%s) VALUES(%%s,%%s)" % (users_table, cols) #two columns
cursor.executemany(query, tuples)
conn.commit()
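For larger dataframes, psycopg2's execute_values helper from psycopg2.extras is usually quicker than executemany. A minimal sketch, assuming the same conn, cursor, df and users_table as above:
from psycopg2.extras import execute_values

tuples = [tuple(x) for x in df.to_numpy()]
cols = ','.join(list(df.columns))
# execute_values expands the single %s placeholder into all row tuples
execute_values(cursor, "INSERT INTO users_table (" + cols + ") VALUES %s", tuples)
conn.commit()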
If you want to use df.to_sql():
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://user:password@hostname/database_name')
df.to_sql('users', engine)
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
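As a quick sanity check (a sketch assuming the engine above), you can read the table back with pandas:
import pandas as pd

# read the freshly written table back through the same engine
check = pd.read_sql('SELECT * FROM users', engine)
print(check.head())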

Related

Query String Composition in Psycopg2

I am trying to run a SQL "SELECT" query in Postgres from Python using Psycopg2. I am trying to compose the query string as below, but I am getting an error message (psycopg2 version 2.9).
from psycopg2 import sql
tablename = "mytab"
schema = "public"
query = sql.SQL("SELECT table_name from information_schema.tables where table_name = {tablename} and table_schema = {schema};")
query = query.format(tablename=sql.Identifier(tablename), schema=sql.Identifier(schema))
cursor.execute(query)
result = cursor.fetchone()[0]
Error:
psycopg2.errors.InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block
Can someone please help? Thanks.
In the (admittedly somewhat strange) query
select table_name
from information_schema.tables
where table_name = 'mytab'
and table_schema = 'public';
'mytab' and 'public' are literals, not identifiers. For comparison, mytab is an identifier here:
select *
from mytab;
Thus your format statement should look like this:
query = query.format(tablename=sql.Literal(tablename), schema=sql.Literal(schema))
Note that the quoted error message is somewhat misleading as it is about executing a query other than what is shown in the question.
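For completeness, InFailedSqlTransaction means an earlier statement in the same transaction already failed and nothing was rolled back since. A minimal sketch of clearing the aborted state, assuming con is the psycopg2 connection behind the cursor, as in the snippet below:
import psycopg2

try:
    cursor.execute(query)
    result = cursor.fetchone()[0]
except psycopg2.Error as e:
    print(e)
    # the transaction is aborted; roll back before issuing further commands
    con.rollback()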
Since this query is only dealing with dynamic values it can be simplified to:
import psycopg2
con = psycopg2.connect(<params>)
cursor = con.cursor()
tablename = "mytab"
schema = "public"
# Regular placeholders
query = """SELECT
table_name
from
information_schema.tables
where
table_name = %s and table_schema = %s"""
cursor.execute(query, [tablename, schema])
result = cursor.fetchone()[0]
# Named placeholders
query = """SELECT
table_name
from
information_schema.tables
where
table_name = %(table)s and table_schema = %(schema)s"""
cursor.execute(query, {"table": tablename, "schema": schema})
result = cursor.fetchone()[0]
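If you do need a dynamic identifier, for example a table name in the FROM clause, that is the case sql.Identifier is meant for. A hedged sketch reusing the connection above:
from psycopg2 import sql

tablename = "mytab"
# Identifier quotes object names; values still go through regular placeholders
query = sql.SQL("SELECT count(*) FROM {table}").format(table=sql.Identifier(tablename))
cursor.execute(query)
row_count = cursor.fetchone()[0]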

Populate data in a postgres table using a pandas dataframe

I am working with Python, trying to connect to Postgres. I created a table in my Postgres database, in the staging schema.
create table staging.data( Name varchar, Age bigint);
then I try to connect and insert my dataframe data into this table:
import psycopg2
import pandas as pd
from sqlalchemy import create_engine
conn_string = 'postgresql://myuser:password@host/database_name'
db = create_engine(conn_string)
conn = db.connect()
# our dataframe
data = {'Name': ['Tom', 'dick', 'harry'],
'Age': [22, 21, 24]}
# Create DataFrame
df = pd.DataFrame(data)
df.to_sql('staging.data', con=conn, if_exists='replace',
index=False)
conn = psycopg2.connect(conn_string
)
conn.autocommit = True
cursor = conn.cursor()
sql1 = '''select * from staging.data;'''
cursor.execute(sql1)
for i in cursor.fetchall():
print(i)
conn.commit()
conn.close()
But the Python script finishes with no error message, and there is no data in my table in Postgres.
Any idea about this?
Regards
I think the issue is that you are trying to use a schema other than public. Try passing in the schema name via the schema argument of to_sql() like this:
df.to_sql('data', con=conn, if_exists='replace', schema='staging', index=False)
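To confirm the rows actually landed in staging.data, you can read them back through the same engine; a small sketch assuming the db engine created above:
check = pd.read_sql('SELECT * FROM staging.data', con=db)
print(check)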

Inserting Data to SQL Server from a Python Dataframe Quickly

I have been trying to insert data from a dataframe in Python into a table already created in SQL Server. The data frame has 90K rows and I want the best possible way to quickly insert the data into the table. I only have read, write and delete permissions on the server and I cannot create any tables.
Below is the code which is inserting the data but it is very slow. Please advise.
import pandas as pd
import xlsxwriter
import pyodbc
df = pd.read_excel(r"Url path\abc.xlsx")
conn = pyodbc.connect('Driver={ODBC Driver 11 for SQL Server};'
'SERVER=Server Name;'
'Database=Database Name;'
'UID=User ID;'
'PWD=Password;'
'Trusted_Connection=no;')
cursor= conn.cursor()
#Deleting existing data in SQL Table:-
cursor.execute("DELETE FROM datbase.schema.TableName")
conn.commit()
#Inserting data in SQL Table:-
for index, row in df.iterrows():
    cursor.execute("INSERT INTO TableName([A],[B],[C]) values (?,?,?)", row['A'], row['B'], row['C'])
conn.commit()
cursor.close()
conn.close()
To insert data much faster, try using SQLAlchemy and df.to_sql(). This requires you to create an engine with SQLAlchemy; to make things faster, pass the option fast_executemany=True.
import urllib.parse
import sqlalchemy

connect_string = urllib.parse.quote_plus(f'DRIVER={{ODBC Driver 11 for SQL Server}};Server=<Server Name>,<port>;Database=<Database name>')
engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}', fast_executemany=True)
with engine.connect() as connection:
    df.to_sql(<table name>, connection, index=False)
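Since the table already exists and you cannot create tables, if_exists='append' keeps the existing table definition instead of letting pandas recreate it, and chunksize splits the 90K rows into batches. A sketch under those assumptions, keeping the placeholders from the snippet above:
with engine.connect() as connection:
    # the table already exists, so append instead of letting pandas rebuild it
    df.to_sql(<table name>, connection, if_exists='append', index=False, chunksize=10000)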
Here is the script and hope this works for you.
import pandas as pd
import pyodbc as pc
connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur=cnxn.cursor()
df= pd.read_csv("your_filepath_and_filename_here.csv").fillna('')
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))
cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()
This should do what you want...very generic example...
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc
# create timer
start_time = time.time()
from sqlalchemy import create_engine
df = pd.read_csv("C:\\your_path\\CSV1.csv")
conn_str = (
r'DRIVER={SQL Server Native Client 11.0};'
r'SERVER=Excel-PC\SQLEXPRESS;'
r'DATABASE=NORTHWND;'
r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()
for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()
# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))
Try that and post back if you have additional questions/issues/concerns.
Replace df.iterrows() with df.apply(), for one thing; removing the loop altogether is much more efficient.
Try populating a temp table with one index or none, then insert it into your target table all at once.
That might speed things up, since the indexes do not have to be updated after every single insert.
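A hedged sketch of that temp-table idea with pyodbc; the target table name dbo.TableName and the varchar column types are assumptions to adjust to your schema. The rows are bulk-loaded into a session-local #temp table first, then copied into the real table in one set-based insert:
cursor = conn.cursor()
# fast_executemany may need a recent ODBC driver; drop this line if inserts into #temp tables fail
cursor.fast_executemany = True
# stage the rows in a temp table that has no indexes to maintain
cursor.execute("CREATE TABLE #staging ([A] varchar(100), [B] varchar(100), [C] varchar(100))")
cursor.executemany("INSERT INTO #staging ([A],[B],[C]) VALUES (?,?,?)",
                   df[['A', 'B', 'C']].astype(str).values.tolist())
# one insert into the target table, so its indexes are updated only once
cursor.execute("INSERT INTO dbo.TableName ([A],[B],[C]) SELECT [A],[B],[C] FROM #staging")
conn.commit()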

Create MySQL Table from pandas dataframe - error 1054 (42S22)

I have a dataframe df created as follow :
df = pd.DataFrame(list(zip(product_urlList, nameList, priceList, picList)),
columns =['URL','NomProduit', 'Prix', "LienPic"])
df['IdUnique'] = df['NomProduit'] + df['Prix']
My target is to import it into a MySQL database.
I created a MySQL database (called "Sezane") and its table "Robes" via Python with mysql.connector.
import mysql.connector as mysql
db = mysql.connect(
host = "localhost",
user = "root",
passwd = "password",
database = "sezane"
)
cursor = db.cursor()
cursor.execute('CREATE TABLE Robes (id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY, Nom_Robes VARCHAR(255), Prix_Robes VARCHAR(255), liens_Robes VARCHAR(300), Images_robes VARCHAR(300), Id_Robes VARCHAR(255))')
Then, I try to insert this dataframe in the table :
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://root:password@localhost:3306/sezane', echo=True)
df.to_sql(name='Robes', con=engine, if_exists = 'append')
I have the following error :
ProgrammingError: (mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column 'index' in 'field list'
I did some research on this error and found that it could be a problem of quote/bracket mix-ups ("/').
However, after many hours on it, I still don't understand where it comes from. Why is the error message about 'index'?
My goal is to get my df into the table.
By default to_sql tries to export the dataframe index as a column. You should be able to change this:
df.to_sql(name='Robes', con=engine, if_exists = 'append')
To this:
df.to_sql(name='Robes', con=engine, if_exists = 'append', index = False)
and you will no longer get the same error.
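An alternative, as a hedged sketch: skip the manual CREATE TABLE and let to_sql build the table itself, pinning the column types through the dtype argument. Note that if_exists='replace' drops the hand-made Robes table, so the auto-increment id column is lost.
from sqlalchemy.types import VARCHAR

# pandas creates the table from the dataframe, with explicit VARCHAR lengths
df.to_sql(name='Robes', con=engine, if_exists='replace', index=False,
          dtype={'URL': VARCHAR(300), 'NomProduit': VARCHAR(255), 'Prix': VARCHAR(255),
                 'LienPic': VARCHAR(300), 'IdUnique': VARCHAR(255)})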

Not all parameters were used in the SQL statement when using python and mysql

Hi, I am using Python with MySQL for this project. I initialize the database and try to create the table record, but it seems I cannot load data into the table. Can anyone here help me out with this?
import mysql.connector
mydb = mysql.connector.connect( host="localhost",user="root",password="asd619248636",database="mydatabase")
mycursor = mydb.cursor()
mycursor.execute("CREATE TABLE record (temperature FLOAT(20), humidity FLOAT(20))")
sql = "INSERT INTO record (temperature,humidity) VALUES (%d, %d)"
val = (2.3,4.5)
mycursor.execute(sql,val)
mydb.commit()
print(mycursor.rowcount, "record inserted.")
and the error shows "Not all parameters were used in the SQL statement":
mysql.connector.errors.ProgrammingError: Not all parameters were used in the SQL statement
Changing the following should fix your problem:
sql = "INSERT INTO record (temperature,humidity) VALUES (%s, %s)"
val = ("2.3","4.5") # You can also use (2.3, 4.5)
mycursor.execute(sql,val)
The connector only recognizes %s as a placeholder, whatever the underlying data type, and converts the supplied Python values to the appropriate database types itself. Your code throws an error because %d and %f (int and float format codes) are not treated as placeholders, so the parameters you pass never get used.
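The same %s placeholders also work for loading several rows at once with executemany; a short sketch assuming the connection and table above (the extra values are made-up samples):
sql = "INSERT INTO record (temperature, humidity) VALUES (%s, %s)"
rows = [(2.3, 4.5), (3.1, 5.0), (2.8, 4.9)]
mycursor.executemany(sql, rows)
mydb.commit()
print(mycursor.rowcount, "records inserted.")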
simply change insert method to
sql = "INSERT INTO record (temperature,humidity) VALUES (%s, %s)"
then it works fine
This works for me.
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc
# create timer
start_time = time.time()
from sqlalchemy import create_engine
df = pd.read_csv("C:\\your_path_here\\CSV1.csv")
conn_str = (
r'DRIVER={SQL Server Native Client 11.0};'
r'SERVER=Excel-Your_Server_Name;'
r'DATABASE=NORTHWND;'
r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()
for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()
