Database schema changing when using pandas to_sql in django


I am trying to insert a DataFrame into the table of an existing Django model using the following code:
database_name = settings.DATABASES['default']['NAME']
database_url = 'sqlite:///{database_name}'.format(database_name=database_name)
engine = create_engine(database_url)
dataframe.to_sql(name='table_name', con=engine, if_exists='replace', index = False)
After running this command, the database schema changes: the primary key is removed, which leads to the following error: django.db.utils.OperationalError: foreign key mismatch
Note: The pandas column names and the database columns are matching.

It seems that the problem comes from the if_exists='replace' parameter in the to_sql method. The pandas documentation says the following:
if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’
How to behave if the table already exists.
fail: Raise a ValueError.
replace: Drop the table before inserting new values.
append: Insert new values to the existing table.
With if_exists='replace', pandas drops the existing table and recreates it with a schema inferred from the DataFrame's columns and dtypes. In your case it replaces the table created by the Django migration with a bare table, so the primary key, foreign keys, and all other constraints are lost. Try 'append' instead of 'replace'. If the old rows have to go, delete them first rather than dropping the table.
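The difference can be seen in a minimal sketch. It assumes a throwaway in-memory SQLite database and a hand-written two-column table standing in for the one Django created; the column names and the helper primary_key_columns are illustrative only and do not come from the question.

```python
import sqlite3

import pandas as pd

# In-memory database standing in for the Django-managed one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

dataframe = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})


def primary_key_columns(connection, table):
    """Return the names of the columns that belong to the primary key."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows if row[5]]


# 'append' inserts into the existing table and leaves its schema alone.
dataframe.to_sql("table_name", conn, if_exists="append", index=False)
pk_after_append = primary_key_columns(conn, "table_name")

# 'replace' drops the table and recreates it from the DataFrame dtypes.
dataframe.to_sql("table_name", conn, if_exists="replace", index=False)
pk_after_replace = primary_key_columns(conn, "table_name")

print(pk_after_append, pk_after_replace)
```

After the 'append' call the table still reports id as its primary key. After the 'replace' call it reports none, which is the state that triggers the foreign key mismatch in Django.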

Related

Unable to insert a row in SQL Server table using Python SQLAlchemy (PK not set as IDENTITY) [duplicate]

This question already has answers here:
Prevent SQLAlchemy from automatically setting IDENTITY_INSERT
(4 answers)
Closed last year.
I have a Python Flask SQLAlchemy app that fetches data from a third-party SQL Server database.
It has a table with two columns into which I need to insert rows:
TABLE [dbo].[TableName](
[Id] [bigint] NOT NULL,
[Desc] [varchar](150) NOT NULL,
CONSTRAINT [PK_Id] PRIMARY KEY CLUSTERED ...
The primary key is not set as IDENTITY
Using SQLAlchemy ORM, if I try to add a new row without an explicit value for Id field, I have this error:
sqlalchemy.exc.IntegrityError: (pyodbc.IntegrityError) ('23000', "[23000] ...
The column does not allow NULL values (translated text)
If I supply an explicit Id value, another error occurs:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] ...
It is not possible to find the object "dbo.TableName", because it does not exist or you do not have permissions (translated text)
This error is followed by the sentence:
[SQL: SET IDENTITY_INSERT dbo.[TableName] ON]
I suppose SQLAlchemy is trying to execute this command, but since Id is not an IDENTITY column, there is no need for it.
Using SQL Server Management Studio, with the same user of pyodbc connection, I'm able to insert new records, choosing whatever value for Id.
I would appreciate any hint.

Your INSERT will fail because a value must be defined for the primary key column of a table, either explicitly in your INSERT or implicitly by way of an IDENTITY property.
This requirement is due to the nature of primary keys and cannot be subverted. Further, you are unable to insert a NULL because the table definition explicitly disallows NULLs in that column.
You must provide a value in your INSERT statement explicitly due to the combination of design factors present.
Based on the documentation (https://docs-sqlalchemy.readthedocs.io/ko/latest/dialects/mssql.html#:~:text=The%20SQLAlchemy%20dialect%20will%20detect%20when%20an%20INSERT,OFF%20subsequent%20to%20the%20execution.%20Given%20this%20example%3A), it appears that SQLAlchemy assumes the column is an IDENTITY column and tries to turn IDENTITY_INSERT on. Because it is not an identity column, the statement raises an exception.
In your table metadata, check that autoincrement=False is set for the Id column.
Edit to add: According to comments on an answer to a related question (Prevent SQLAlchemy from automatically setting IDENTITY_INSERT), it appears that SQLAlchemy treats an integer primary key column as an auto-incrementing identity column by default, so you need to override that assumption explicitly with autoincrement=False.
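A minimal sketch of such metadata, assuming a SQLAlchemy Core Table definition. The question uses the ORM, where the same Column arguments apply, and the variable names here are illustrative; the column types are taken from the table definition in the question.

```python
from sqlalchemy import BigInteger, Column, MetaData, String, Table

metadata = MetaData()

# Mirrors dbo.TableName from the question. autoincrement=False tells
# SQLAlchemy that Id is a plain column whose value the application supplies.
table_name = Table(
    "TableName",
    metadata,
    Column("Id", BigInteger, primary_key=True, autoincrement=False),
    Column("Desc", String(150), nullable=False),
    schema="dbo",
)

print(table_name.c.Id.autoincrement)
```

With this flag set, an explicit Id still has to be supplied on every insert, as the answer above explains.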

SQLAlchemy 1.4 overriding system value

I am currently using SQLAlchemy Core, specifically the SQL Expression Language.
I have a table that is currently using the GENERATED ALWAYS AS IDENTITY parameter.
CREATE TABLE mytable(id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
col1 VARCHAR(100),col2 VARCHAR(100));
Every time I try to insert into the table, I get this error:
DETAIL: Column "id" is an identity column defined as GENERATED ALWAYS.
HINT: Use OVERRIDING SYSTEM VALUE to override.
I know that in plain PostgreSQL I could write:
INSERT INTO mytable (id,col1,col2) OVERRIDING SYSTEM VALUE
VALUES (%s,%s,%s) ON CONFLICT (id) DO NOTHING;
But how would I do this using the SQL Expression Language that SQLAlchemy provides?
I am currently upserting like this:
insert_stmt = postgresql.insert(target).values(vals)
primary_keys = [key.name for key in inspect(target).primary_key]
stmt = insert_stmt.on_conflict_do_nothing(index_elements=primary_keys)
conn.execute(stmt)

I wanted OVERRIDING SYSTEM VALUE to use fixed IDs in my tests.
As far as I can see, SQLAlchemy doesn't support this at the moment.
I hacked it in this way:
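from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Insert
@compiles(Insert)
def set_inserts_overriding_system_value(the_insert, compiler, **kw):
    # Compile the INSERT as usual, then splice the clause in before VALUES.
    text = compiler.visit_insert(the_insert, **kw)
    text = text.replace(") VALUES (", ") OVERRIDING SYSTEM VALUE VALUES (")
    return text
You could probably construct unusual tables or insert queries on purpose that this text replacement would mangle, but it is unlikely to happen by accident. Note that the hook applies to every INSERT compiled in the process.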

Creating a table in database defining a column as primary key

I am creating a database from different CSV files. After doing this I have tried to define the primary key table by table but I got an error.
c.execute("ALTER TABLE patient_data ADD PRIMARY KEY (ID);").fetchall()
OperationalError: near "PRIMARY": syntax error
Maybe the best way to avoid this error is to define the primary key when the table is created, but I don't know how to do that. I have been working with Python for a few years, but today is my first encounter with SQL.
This is the code I use to import a CSV to a table
c.execute('''DROP TABLE IF EXISTS patient_data''')
c.execute(''' CREATE TABLE patient_data (ID, NHS_Number,Full_Name,Gender, Birthdate, Ethnicity, Postcode)''')
patients_admitted.to_sql('patient_data', conn, if_exists='append', index = False)
c.execute('''SELECT * FROM patient_data''').fetchall()

This is too long for a comment.
If your table does not have data, just re-create it with the primary key definition.
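If your table does have data, you cannot add a primary key column in one statement. Why not? Every existing row would receive the new column's default value, which is either NULL or a constant, and neither is allowed in a primary key, whose values must be unique and non-NULL.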
And finally, SQLite does not allow you to add a primary key to an existing table. The solution is to copy the data to another table, recreate the table with the structure you want, and then copy the data back in.
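The copy-and-recreate approach can be sketched as follows. This is only an illustration: it uses an in-memory database, three of the columns from the question, made-up sample rows, and assumed column types (INTEGER for ID, TEXT for the rest).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# The table as originally created: no column types and no primary key.
c.execute("CREATE TABLE patient_data (ID, NHS_Number, Full_Name)")
c.executemany(
    "INSERT INTO patient_data VALUES (?, ?, ?)",
    [(1, "111", "Ann"), (2, "222", "Bob")],  # made-up sample rows
)

# 1. Move the existing table out of the way.
c.execute("ALTER TABLE patient_data RENAME TO patient_data_old")
# 2. Recreate it with the structure you want, primary key included.
c.execute(
    "CREATE TABLE patient_data ("
    "ID INTEGER PRIMARY KEY, NHS_Number TEXT, Full_Name TEXT)"
)
# 3. Copy the rows back, then drop the old copy.
c.execute(
    "INSERT INTO patient_data "
    "SELECT ID, NHS_Number, Full_Name FROM patient_data_old"
)
c.execute("DROP TABLE patient_data_old")
conn.commit()

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
pk_columns = [
    row[1] for row in c.execute("PRAGMA table_info(patient_data)") if row[5]
]
rows = c.execute("SELECT * FROM patient_data ORDER BY ID").fetchall()
print(pk_columns, rows)
```

The same CREATE TABLE statement with ID INTEGER PRIMARY KEY also answers the original question: if it runs before to_sql(..., if_exists='append'), the primary key exists from the start and no copying is needed.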

database migration using alembic or flask (Python)

I am creating a database using SQLAlchemy and need to migrate my data. I use the df_sql function to convert my CSV files into dataframes and then into SQLAlchemy tables. Afterwards I need a migration that adds a new column with values and assigns primary and foreign keys. I saw something related to Alembic and Flask, but I am not sure how to run the upgrade, since I am also working in Jupyter. Any ideas on how I can update, delete, and assign keys to my tables would be very helpful. So far I have got as far as creating the tables.
metadata.tables.keys()
dict_keys(['table1', 'table2'])
I also tried creating a temp table directly, copying its values, and assigning a primary key, but I get errors from my column names because they contain special characters, so I cannot create the duplicate either. Renaming the columns does not work either.
Column: date
Column: time_stamp
Column: timeslices[5].profilerDataProcess[8]_C0[us]
Column: timeslices[4].profilerDataProcess[54]_C0[us]
Column: timeslices[4]profilerDataProcess[50]_C0[us]
Column: timeslices[4].profilerDataProcess[49]_C0[us]
Column: timeslices[0].profilerDataProcess[14]_C0[us]

Insert into postgreSQL table from pandas with "on conflict" update

I have a pandas DataFrame that I need to store into the database. Here's my current line of code for inserting:
df.to_sql(table,con=engine,if_exists='append',index_label=index_col)
This works fine if none of the rows in df exist in my table. If a row already exists, I get this error:
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key
value violates unique constraint "mypk"
DETAIL: Key (id)=(42) already exists.
[SQL: 'INSERT INTO mytable (id, owner,...) VALUES (%(id)s, %(owner)s,...']
[parameters:...] (Background on this error at: http://sqlalche.me/e/gkpj)
and nothing is inserted.
PostgreSQL has an optional ON CONFLICT clause, which could be used to UPDATE the existing table rows. I read the entire pandas.DataFrame.to_sql manual page and couldn't find any way to use ON CONFLICT within the DataFrame.to_sql() function.
I have considered splitting my DataFrame in two based on what's already in the db table. So now I have two DataFrames, insert_rows and update_rows, and I can safely execute
insert_rows.to_sql(table, con=engine, if_exists='append', index_label=index_col)
But then, there seems to be no UPDATE equivalent to DataFrame.to_sql(). So how do I update the table using DataFrame update_rows?

I know it's an old thread, but I ran into the same issue and this thread showed up in Google. None of the answers is really satisfying yet, so here's what I came up with:
My solution is pretty similar to zdgriffith's answer, but much more performant, as it executes a single multi-row statement instead of one statement per row of data_iter:
def postgres_upsert(table, conn, keys, data_iter):
    from sqlalchemy.dialects.postgresql import insert
    data = [dict(zip(keys, row)) for row in data_iter]
    insert_statement = insert(table.table).values(data)
    upsert_statement = insert_statement.on_conflict_do_update(
        constraint=f"{table.table.name}_pkey",
        set_={c.key: c for c in insert_statement.excluded},
    )
    conn.execute(upsert_statement)
Now you can use this custom upsert method in pandas' to_sql method like zdgriffith showed.
Please note that my upsert function uses the primary key constraint of the table. You can target another constraint by changing the constraint argument of .on_conflict_do_update.
This SO answer on a related thread explains the use of .excluded a bit more: https://stackoverflow.com/a/51935542/7066758

@SaturnFromTitan, thanks for the reply to this old thread. That worked like magic. I would upvote, but I don't have the rep.
For those who are as new to all this as I am:
You can cut and paste SaturnFromTitan's answer and call it with something like:
df.to_sql(
    'my_table_name',
    dbConnection,
    schema='my_schema',
    if_exists='append',
    index=False,
    method=postgres_upsert,
)
And that's it. The upsert works.

To follow up on Brendan's answer with an example, this is what worked for me:
import os
import sqlalchemy as sa
import pandas as pd
from sqlalchemy.dialects.postgresql import insert
engine = sa.create_engine(os.getenv("DBURL"))
meta = sa.MetaData()
# MetaData.bind was removed in SQLAlchemy 2.0; there, call
# meta.reflect(bind=engine, views=True) instead of the next two lines.
meta.bind = engine
meta.reflect(views=True)
def upsert(table, conn, keys, data_iter):
    # Name of the UNIQUE constraint that identifies a conflicting row.
    upsert_args = {"constraint": "test_table_col_a_col_b_key"}
    for data in data_iter:
        # Map column names to this row's values.
        data = {k: data[i] for i, k in enumerate(keys)}
        upsert_args["set_"] = data
        insert_stmt = insert(meta.tables[table.name]).values(**data)
        upsert_stmt = insert_stmt.on_conflict_do_update(**upsert_args)
        conn.execute(upsert_stmt)
if __name__ == "__main__":
    df = pd.read_csv("test_data.txt")
    # engine.begin() commits when the block exits without an error.
    with engine.begin() as conn:
        df.to_sql(
            "test_table",
            con=conn,
            if_exists="append",
            method=upsert,
            index=False,
        )
where in this example the schema would be something like:
CREATE TABLE test_table(
col_a text NOT NULL,
col_b text NOT NULL,
col_c text,
UNIQUE (col_a, col_b)
)

If you notice in the to_sql docs there's mention of a method argument that takes a callable. Creating this callable should allow you to use the Postgres clauses you need. Here's an example of a callable they mentioned in the docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method
It's pretty different from what you need, but follow the arguments passed to this callable. They will allow you to construct a regular SQL statement.

If anybody wanted to build on top of the answer from zdgriffith and dynamically generate the table constraint name you can use the following query for postgreSQL:
select distinct tco.constraint_name
from information_schema.table_constraints tco
join information_schema.key_column_usage kcu
on kcu.constraint_name = tco.constraint_name
and kcu.constraint_schema = tco.constraint_schema
where kcu.table_name = '{table.name}'
and constraint_type = 'PRIMARY KEY';
You can then format this string with table.name inside the upsert() method.
I also didn't need the meta.bind and meta.reflect() lines. MetaData.bind was deprecated in SQLAlchemy 1.4 and removed in 2.0 anyway.
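As an alternative to querying information_schema by hand, SQLAlchemy's inspector can report the primary key constraint name. The sketch below assumes SQLAlchemy 1.4 or later; the helper name primary_key_constraint_name is hypothetical, and the in-memory SQLite table with an explicitly named constraint is there only so the example runs without a PostgreSQL server.

```python
import sqlalchemy as sa


def primary_key_constraint_name(conn, table_name, schema=None):
    """Return the name of the table's primary key constraint, or None."""
    inspector = sa.inspect(conn)
    return inspector.get_pk_constraint(table_name, schema=schema)["name"]


# Stand-in database (assumption); in the thread this would be the
# PostgreSQL connection that pandas passes to the upsert callable.
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(
        sa.text(
            "CREATE TABLE test_table ("
            "id INTEGER, CONSTRAINT test_table_pkey PRIMARY KEY (id))"
        )
    )
    constraint_name = primary_key_constraint_name(conn, "test_table")

print(constraint_name)
```

Inside an upsert callable, the returned name could be passed as the constraint argument of on_conflict_do_update instead of a hard-coded string.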
