I have a dataframe df created as follows:
df = pd.DataFrame(list(zip(product_urlList, nameList, priceList, picList)),
                  columns=['URL', 'NomProduit', 'Prix', 'LienPic'])
df['IdUnique'] = df['NomProduit'] + df['Prix']
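(For reference, a minimal reproducible version of the frame, with made-up values standing in for my scraped lists:)
import pandas as pd

# Dummy stand-ins for the scraped lists (illustrative values only)
product_urlList = ['https://example.com/robe-1', 'https://example.com/robe-2']
nameList = ['Robe A', 'Robe B']
priceList = ['95', '120']   # kept as strings so NomProduit + Prix concatenates
picList = ['https://example.com/a.jpg', 'https://example.com/b.jpg']

df = pd.DataFrame(list(zip(product_urlList, nameList, priceList, picList)),
                  columns=['URL', 'NomProduit', 'Prix', 'LienPic'])
df['IdUnique'] = df['NomProduit'] + df['Prix']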
My target is to import it into a MySQL database.
I created a MySQL database (called "Sezane") and its table "Robes" via Python with mysql.connector:
import mysql.connector as mysql
db = mysql.connect(
host = "localhost",
user = "root",
passwd = "password",
database = "sezane"
)
cursor = db.cursor()
cursor.execute('CREATE TABLE Robes (id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY, Nom_Robes VARCHAR(255), Prix_Robes VARCHAR(255), liens_Robes VARCHAR(300), Images_robes VARCHAR (300), Id_Robes VARCHAR (255))')
Then I try to insert this dataframe into the table:
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://root:password@localhost:3306/sezane', echo=True)
df.to_sql(name='Robes', con=engine, if_exists = 'append')
I have the following error :
ProgrammingError: (mysql.connector.errors.ProgrammingError) 1054 (42S22): Unknown column 'index' in 'field list'
I did some research on this error and found that it could be caused by mixing up quote characters (" vs ').
However, after many hours on it, I still don't understand where it comes from. Why is the error message about 'index'?
My goal is to be able to load my df into the table.
By default to_sql tries to export the dataframe index as a column. You should be able to change this:
df.to_sql(name='Robes', con=engine, if_exists = 'append')
To this:
df.to_sql(name='Robes', con=engine, if_exists='append', index=False)
and you will no longer get the same error.
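For completeness, a sketch of the full call with that flag (assuming the same engine as in the question). If you do want the index stored, the index_label parameter lets you give the target column an explicit name instead of the default 'index' - the 'RowId' name below is purely illustrative and would need to exist in the table:
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://root:password@localhost:3306/sezane', echo=True)

# Leave the DataFrame index out of the INSERT entirely
df.to_sql(name='Robes', con=engine, if_exists='append', index=False)

# ...or keep it, under an explicit column name that the table actually has
# df.to_sql(name='Robes', con=engine, if_exists='append', index=True, index_label='RowId')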
I'm getting classes for the tables in the DB as follows:
import sqlalchemy as sa
import sqlalchemy.ext.automap
eng = sa.create_engine(CONNECTION_URL)
Base = sa.ext.automap.automap_base()
Base.prepare(eng, reflect=True)
Session = sa.orm.sessionmaker(bind=eng)
Table1 = Base.classes.Table1
In my case Table1 is system versioned which I understand sqlalchemy doesn't explicitly support.
When running the following code:
t = Table1(field1=1, field2=3)
with Session() as session:
    session.add(t)
    session.commit()
I get the following error:
[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Cannot insert explicit value into a GENERATED ALWAYS column in table 'DBName.dbo.Table1'. Use INSERT with a column list to exclude the GENERATED ALWAYS column, or insert a DEFAULT into GENERATED ALWAYS column. (13536) (SQLExecDirectW);
I understand this probably has to do with the ValidTo and ValidFrom columns
list(Table1.__table__.columns)
# [..., Column('ValidFrom', DATETIME2(), table=<Table1>, nullable=False),
#  Column('ValidTo', DATETIME2(), table=<Table1>, nullable=False)]
How do I tell sqlalchemy to ignore those columns during the insert statement?
EDIT
I'm guessing the below is the relevant part of the create statement?
CREATE TABLE [dbo].[Table1] (
[TableID] [int] NOT NULL IDENTITY,
...
[ValidFrom] [datetime2](7) GENERATED ALWAYS AS ROW START NOT NULL,
[ValidTo] [datetime2](7) GENERATED ALWAYS AS ROW END NOT NULL
I've got the code below working using SQLAlchemy.
CREATE TABLE dbo.Customer2
(
Id INT NOT NULL PRIMARY KEY CLUSTERED,
Name NVARCHAR(100) NOT NULL,
StartTime DATETIME2 GENERATED ALWAYS AS ROW START
NOT NULL,
EndTime DATETIME2 GENERATED ALWAYS AS ROW END
NOT NULL ,
PERIOD FOR SYSTEM_TIME (StartTime, EndTime)
)
WITH(SYSTEM_VERSIONING=ON (HISTORY_TABLE=dbo.CustomerHistory2))
If the StartTime / EndTime columns are HIDDEN (which these aren't), then no value is needed for them in the insert statement and you can supply just the required columns. However, the date columns in my table are not hidden, so a positional VALUES list still needs something for them - hence DEFAULT:
sql = "INSERT INTO dbo.Customer2 VALUES (2,'Someone else', default,default)"
print(sql)
with engine.connect() as con:
rs = con.execute(sql)
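Alternatively, following the hint in the error message itself, you can name only the non-generated columns in the INSERT so SQL Server fills the period columns on its own; a small sketch (the id/name values are just examples):
from sqlalchemy import text

# StartTime / EndTime are omitted from the column list, so the
# GENERATED ALWAYS period columns are populated by SQL Server.
insert_sql = text("INSERT INTO dbo.Customer2 (Id, Name) VALUES (:id, :name)")

with engine.begin() as con:   # begin() commits on exit
    con.execute(insert_sql, {"id": 3, "name": "Someone else again"})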
I'm new to Postgres. Can anyone tell me how to get this working?
What I want to do is write a Pandas dataframe to a PostgreSQL database. I have already created a database 'customer' and a table 'users'.
I am creating a simple Pandas dataframe as follows:
data = {'Col1':[1,2,3,4,5], 'Col2':[1,2,3,4,5]}
df = pd.DataFrame(data)
After that, I create a Postgres connection to my 'customer' database as follows:
conn = psycopg2.connect(
database="customer", user='postgres', password='password', host='127.0.0.1', port= '5432')
Then, I am using the following command to insert records from dataframe into table 'users':
df.to_sql('users', conn, if_exists='replace')
conn.commit()
conn.close()
Error that I am getting is:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': syntax error at or near ";"
LINE 1: ...ELECT name FROM sqlite_master WHERE type='table' AND name=?;
^
df.to_sql() does not work with a raw psycopg2 connection ("conn"); it expects a SQLAlchemy "engine". With plain psycopg2, use an INSERT instead:
Step 1: Creation of an empty table
First you need to create a cursor and then create a table:
cursor = conn.cursor()
cursor.execute("CREATE TABLE users_table (col1 integer, col2 integer)")
conn.commit()
Step 2: Insert pandas df to the users_table
tuples = [tuple(x) for x in df.to_numpy()]
cols = ','.join(list(df.columns))
query = "INSERT INTO %s(%s) VALUES(%%s,%%s)" % (users_table, cols) #two columns
cursor.executemany(query, tuples)
conn.commit()
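For larger frames, psycopg2 also provides execute_values (in psycopg2.extras), which sends all rows in a single multi-row INSERT; a sketch reusing the cols and tuples built above:
from psycopg2.extras import execute_values

# execute_values expands the single %s into one VALUES list for all rows
insert_query = "INSERT INTO users_table (%s) VALUES %%s" % cols
execute_values(cursor, insert_query, tuples)
conn.commit()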
If you want to use df.to_sql():
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://user:password@hostname/database_name')
df.to_sql('users', engine)
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_sql.html
I'm trying to copy a couple of tables from one database ("db1", PostgreSQL) to another ("db2", SQL Server).
Unfortunately, I face an issue due to the BOOLEAN type for some fields in the PostgreSQL database which is not recognized as a valid type for SQL Server.
Here is my code sample:
db2_engine = "postgresql+psycopg2://" + str(db2_user) + ":" + str(db2_password) + "#" + str(db2_host) + ":" + str(db2_port) + "/" + str(db2_database)
db2 = sqlalchemy.create_engine(db2_engine)
lst_tablename_totr = ["contract",
"subscription",
"contractdelivery",
"businesspartner"
]
for table_name in lst_tablename_totr:
    table = Table(table_name, metadata, autoload=True, autoload_with=db2)
    table.create(bind=db1)

    query = """
    SELECT
        *
    FROM """ + str(table_name) + """
    """
    df_hg = pd.read_sql(query, db2_engine)
    df_hg.to_sql(table_name, db1, schema='dbo', index=False, if_exists='append')
For now, the issue comes from the table = Table(table_name, metadata, autoload=True, autoload_with=db2) / table.create(bind=db1) part of the code.
Here is the error message:
ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC Driver 13 for SQL Server][SQL Server]Column, parameter or variable #8\xa0: data type BOOLEAN not found. (2715) (SQLExecDirectW)')
I couldn't find any way to force the conversion between the PostgreSQL BOOLEAN type and the SQL Server BIT type.
You are seeing a difference between SQLAlchemy's dialect-specific BOOLEAN type and its generic Boolean type. For an existing PostgreSQL table
CREATE TABLE IF NOT EXISTS public.so68683260
(
id character varying(5) COLLATE pg_catalog."default" NOT NULL,
bool_col boolean NOT NULL,
CONSTRAINT so68683260_pkey PRIMARY KEY (id)
)
if we reflect the table then the boolean columns are defined as BOOLEAN
tbl = sa.Table(table_name, sa.MetaData(), autoload_with=pg_engine)
print(type(tbl.columns["bool_col"].type))
# <class 'sqlalchemy.sql.sqltypes.BOOLEAN'>
and then if we try to create the table in SQL Server we end up doing the equivalent of
tbl = sa.Table(
table_name,
sa.MetaData(),
sa.Column("id", sa.VARCHAR(5), primary_key=True),
sa.Column("bool_col", sa.BOOLEAN, nullable=False),
)
tbl.drop(ms_engine, checkfirst=True)
tbl.create(ms_engine)
and that fails with the error you cite because the DDL rendered is
CREATE TABLE so68683260 (
id VARCHAR(5) NOT NULL,
bool_col BOOLEAN NOT NULL,
PRIMARY KEY (id)
)
However, if we use the generic Boolean type
tbl = sa.Table(
table_name,
sa.MetaData(),
sa.Column("id", sa.VARCHAR(5), primary_key=True),
sa.Column("bool_col", sa.Boolean, nullable=False),
)
tbl.drop(ms_engine, checkfirst=True)
tbl.create(ms_engine)
we are successful because the DDL rendered is
CREATE TABLE so68683260 (
id VARCHAR(5) NOT NULL,
bool_col BIT NOT NULL,
PRIMARY KEY (id)
)
and BIT is the valid corresponding column type in T-SQL.
Feel free to open a SQLAlchemy issue if you believe that this behaviour should be changed.
[Note also that the string column is VARCHAR(5) because the table uses the default encoding for my PostgreSQL test database (UTF8), but creating the table in SQL Server will produce a VARCHAR (non-Unicode) column instead of an NVARCHAR (Unicode) column.]
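Applied to the copy loop in the question, one possible workaround (a sketch only, untested against your schema) is to swap the reflected dialect-specific BOOLEAN for the generic Boolean before creating the table on the SQL Server side:
import sqlalchemy as sa

tbl = sa.Table(table_name, sa.MetaData(), autoload_with=pg_engine)
for col in tbl.columns:
    if isinstance(col.type, sa.BOOLEAN):
        col.type = sa.Boolean()   # generic type renders as BIT on SQL Server
tbl.create(ms_engine, checkfirst=True)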
All I want is a simple upsert from the DataFrame to SQLite. However, since pd.to_sql() does not support upsert, I had to implement it with SQLAlchemy instead.
SQLite:
CREATE TABLE test (col1 INTEGER, col2 text, col3 REAL, PRIMARY KEY(col1, col2));
python:
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy import Table
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.ext.automap import automap_base
def test_upsert():
    df = pd.DataFrame({'col1': 1, 'col2': 'a', 'col3': 1.5}, index=[0])
    sql_url = 'sqlite:///testDB.db'
    table = 'test'
    engine = create_engine(sql_url)

    with engine.connect() as conn:
        base = automap_base()
        base.prepare(engine, reflect=True)
        target_table = Table(table, base.metadata, autoload=True, autoload_with=engine)

        stmt = insert(target_table).values(df.to_dict(orient='records'))
        update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}
        conn.execute(stmt.on_conflict_do_update(constraint=f'{table}_pkey', set_=update_dict))
The script above previously worked with Postgres, but it keeps giving me the error below when used with SQLite:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near "ON": syntax error
[SQL: INSERT INTO test (col1, col2, col3) VALUES (?, ?, ?) ON CONFLICT (test_pkey) DO UPDATE SET col3 = excluded.col3]
[parameters: (1, 'a', 1.5)]
(Background on this error at: http://sqlalche.me/e/14/e3q8)
I'm not sure what I did wrong, or if there's any better solution since it seems like a very common operation.
Any help is appreciated.
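Edit: one direction I've been looking at (untested, so I may well be off): SQLAlchemy also ships a SQLite-specific insert() whose on_conflict_do_update() takes index_elements (the primary-key columns) rather than a named constraint:
from sqlalchemy.dialects.sqlite import insert as sqlite_insert

stmt = sqlite_insert(target_table).values(df.to_dict(orient='records'))
update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}
# SQLite has no named PK constraint to reference, so list the PK columns
stmt = stmt.on_conflict_do_update(index_elements=['col1', 'col2'], set_=update_dict)
conn.execute(stmt)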
I have a Pandas dataframe which I'm trying to insert into a MySQL table, using MySQLdb and to_sql. The table has 'allocationid' as an auto-incrementing primary key. I will want to do this daily, deleting the previous day's data from the MySQL table and reinserting updated data from the Pandas dataframe. Hence I would like the primary key to auto-increment automatically (I won't be using it down the line, but may want to refer to it).
The code is:
columns = ('date','tradeid','accountid','amount')
splitInput = pd.DataFrame(columns = columns)
splitInput['accountid'] = newHFfile['acctID']
splitInput['tradeid'] = newHFfile['Ref']
splitInput['amount'] = newHFfile['AMOUNT1']
splitInput['date'] = newHFfile['Trade Date']
db = MySQLdb.connect(host="(hostIP)", port=3306, user="user", passwd="(passwd)", db="(database)")
cursor = db.cursor()
query = """delete from splittrades where date = """ + runymdformat + """ """
cursor.execute(query)
db.commit()
splitInput.to_sql(con = db, name = 'splittrades',if_exists = 'append',flavor = 'mysql',index = False)
db.commit()
db.close()
The problem is that without adding a column for the primary key, I get 'OperationalError: (1364, "Field 'allocationid' doesn't have a default value")'
If I add a primary key column and leave it blank/null, I get OperationalError: (1366, "Incorrect integer value: '' for column 'allocationid' at row 1")
If I use 1 or 0 in the allocationid column, I get a duplicate-value error message.
MySQL usually auto-increments the primary key if you don't specify it - is there a way I can make this work from Python?
PS: I'm not a Python expert, so please treat me gently - thanks :-)
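Edit: one thing I'm considering trying (not sure it's the right approach) is pointing to_sql at a SQLAlchemy engine instead of the raw MySQLdb connection, and sending only the four data columns so MySQL can fill allocationid itself - roughly:
from sqlalchemy import create_engine

# same placeholders as above - user / (passwd) / (hostIP) / (database)
engine = create_engine("mysql+mysqldb://user:(passwd)@(hostIP):3306/(database)")

# index=False keeps the DataFrame index out of the INSERT; allocationid is
# never mentioned, so MySQL should auto-increment it.
splitInput.to_sql('splittrades', con=engine, if_exists='append', index=False)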