sqlalchemy to_sql if_exists = "replace" creates a table with different prefix - python

When I try to execute the following code, it drops the existing table in the database but doesn't recreate the same one. Instead of creating dbo.TableName, it creates username.TableName.
import sqlalchemy

engine = sqlalchemy.create_engine(
    "mssql+pyodbc://server/dbname?driver=ODBC+Driver+13+for+SQL+Server")
df.to_sql("OTD_1_DELIVERY_TRACKING_F_IMPORT", con=engine, if_exists="replace", index=False)
Does anyone know how to fix this so it recreates the dbo.TableName table?

This is happening because the user has a default schema other than dbo. In very old versions of SQL Server each user had their own schema, and this is a vestige of that behavior.
So when that user runs
DROP TABLE OTD_1_DELIVERY_TRACKING_F_IMPORT
It looks in the user's default schema first, and not finding anything, then looks in the dbo schema, and drops that table. But when the user runs
CREATE TABLE OTD_1_DELIVERY_TRACKING_F_IMPORT ...
It's created in the user's default schema.
The easy fix is to change the user's default schema to dbo, e.g.
ALTER USER [SomeUser] WITH DEFAULT_SCHEMA = dbo;
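If changing the default schema isn't an option, a workaround (a minimal sketch, reusing the engine from the question) is to pass the schema explicitly to to_sql, so that both the DROP and the CREATE target dbo:

# Qualify the schema so the table is dropped and recreated as dbo.OTD_1_DELIVERY_TRACKING_F_IMPORT
df.to_sql("OTD_1_DELIVERY_TRACKING_F_IMPORT", con=engine,
          schema="dbo", if_exists="replace", index=False)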

Related

Can't append to an existing table. Fails silently

I'm trying to dump a pandas DataFrame into an existing Snowflake table (via a jupyter notebook).
When I run the code below no errors are raised, but no data is written to the destination SF table (df has ~800 rows).
import os

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

sf_engine = create_engine(
    URL(
        user=os.environ['SF_PROD_EID'],
        password=os.environ['SF_PROD_PWD'],
        account=account,
        warehouse=warehouse,
        database=database,
    )
)

df.to_sql(
    "test_table",
    con=sf_engine,
    schema=schema,
    if_exists="append",
    index=False,
    chunksize=16000,
)
If I check the SF History, I can see that the queries apparently ran without issue.
If I pull the query from the SF History UI and run it manually in the Snowflake UI, the data shows up in the destination table.
If I try to use locopy I run into the same issue.
If the table does not exist beforehand, the same code above creates the table and loads the rows without a problem.
Here's where it gets weird. When I run the pd.to_sql command to append and then drop the destination table, if I then issue a select count(*) from destination_table, a table still exists with that name and contains (only) the data that I've been trying to append. Thinking it may be a case-sensitive table naming situation?
Any insight is appreciated :)
Try adding role="<role>" and schema="<schema>" to the URL.
engine = create_engine(URL(
    account=os.getenv("SNOWFLAKE_ACCOUNT"),
    user=os.getenv("SNOWFLAKE_USER"),
    password=os.getenv("SNOWFLAKE_PASSWORD"),
    role="<role>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>"
))
The issue was due to how I set up the database connection and the case-sensitivity of the table name. It turns out I was writing to a table called DB.SCHEMA."db.schema.test_table" (note that the db.schema slug becomes part of the table name). Don't be like me, kids. Use upper-case table names in Snowflake!
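If you hit something similar, a quick way to see what actually got created (a sketch; the name pattern is a placeholder and the command lists tables in the current schema) is to ask Snowflake for every table whose name contains the target, since quoted, qualified names show up verbatim:

from sqlalchemy import text

# A mis-quoted name such as "db.schema.test_table" appears here as a single table name.
with sf_engine.connect() as conn:
    for row in conn.execute(text("SHOW TABLES LIKE '%test_table%'")):
        print(row)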

Database schema changing when using pandas to_sql in django

I am trying to insert a dataframe into an existing Django database table using the following code:
from django.conf import settings
from sqlalchemy import create_engine

database_name = settings.DATABASES['default']['NAME']
database_url = 'sqlite:///{database_name}'.format(database_name=database_name)
engine = create_engine(database_url)
dataframe.to_sql(name='table_name', con=engine, if_exists='replace', index=False)
After running this command, the database schema changes, eliminating the primary key and leading to the following error: django.db.utils.OperationalError: foreign key mismatch
Note: The pandas column names and the database columns are matching.
It seems that the problem comes from the if_exists='replace' parameter in the to_sql method. The pandas documentation says the following:
if_exists{‘fail’, ‘replace’, ‘append’}, default ‘fail’
How to behave if the table already exists.
fail: Raise a ValueError.
replace: Drop the table before inserting new values.
append: Insert new values to the existing table.
The 'replace' option doesn't just clear the rows: if the table already exists, it drops it and recreates it with a schema inferred from the DataFrame. In your case it replaces the table created by the Django migration with a bare table, thus losing the primary key, foreign keys and all. Try replacing 'replace' with 'append', as sketched below.
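A sketch of the suggested change, reusing the engine and DataFrame from the question (the DataFrame's columns must match the existing table's columns):

# 'append' inserts into the table created by the Django migration,
# leaving its primary key and foreign keys intact.
dataframe.to_sql(name='table_name', con=engine, if_exists='append', index=False)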

SQLAlchemy doesn't recognize new entries in query

I'm querying the latest entry from a table like this:
data = dbsession.query(db.mytable).order_by(db.mytable.timestamp.desc()).with_entities(db.mytable.timestamp).first()
On startup this is fine, but if new entries are added by the same dbsession during runtime, the query above doesn't see them.
But the following code without SQLAlchemy works as expected:
sql_query="SELECT timestamp FROM mytable ORDER BY timestamp DESC LIMIT 1"
data = cursor.execute(sql_query)
How do I get SQLAlchemy to work in this case?
I had a similar issue once. Without recalling exactly why SQLAlchemy behaves this way: you need to commit the session before the select, so the next query runs in a fresh transaction and sees the new data:
session.commit()
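A minimal sketch of that order of operations, using the names from the question:

# End the session's current transaction so the next query starts a fresh one
# and sees rows committed since the last read.
dbsession.commit()

latest = (
    dbsession.query(db.mytable.timestamp)
    .order_by(db.mytable.timestamp.desc())
    .first()
)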

Python - pandas to_sql

I'm trying to use https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.to_sql.html
When I change the name argument, e.g. say I set
df.to_sql(name="testTable", con=constring)
the actual table name comes up as [UserName].[testTable] rather than just [testTable].
Is there a way I can get rid of the [UserName] part, which is linked to the user who runs the script?
The [UserName] portion of the table name is the schema that the table is in. I don't know which database engine you're using, but the schema you're looking for might be "dbo".
According to the documentation, you can provide a schema argument:
df.to_sql(name="testTable", con=constring, schema="dbo")
Note that if the schema argument is left blank, to_sql uses the DB user's default schema (as defined when the user was added to the database), which in your case appears to be the user's own schema.

Specifying the schema in Pandas to_sql

From the source of to_sql, I can see that it gets mapped to a MetaData object, meta = MetaData(con, schema=schema). However, I can't find SQLAlchemy docs that tell me how to define the schema for MySQL.
How do I specify the schema string?
The schema parameter in to_sql is confusing because here "schema" does not mean "table definition". In some SQL flavors, notably PostgreSQL, a schema is effectively a namespace for a set of tables.
For example, you might have two schemas, one called test and one called prod. Each might contain a table called user_rankings generated in pandas and written using the to_sql command. You would specify the test schema when working on improvements to user rankings. When you are ready to deploy the new rankings, you would write to the prod schema.
As others have mentioned, when you call to_sql the table definition is generated from the type information for each column in the dataframe. If the table already exists in the database with exactly the same structure, you can use the append option to add new data to the table.
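A sketch of that test/prod workflow (the engine URL, columns, and data are placeholders):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/mydb")  # placeholder URL

rankings = pd.DataFrame({"user_id": [1, 2, 3], "rank": [0.9, 0.4, 0.7]})

# Iterate on the ranking logic in the test schema...
rankings.to_sql("user_rankings", con=engine, schema="test", if_exists="replace", index=False)

# ...then write to the prod schema once the new rankings are ready to deploy.
rankings.to_sql("user_rankings", con=engine, schema="prod", if_exists="append", index=False)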
DataFrame.to_sql(self, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Just use the schema parameter. But note that the schema is not the ODBC driver name.
Starting from the Dialects page of the SQLAlchemy documentation, select the documentation page of your dialect and search for create_engine to find an example of how to create it.
An even more concise overview is on the Engine Configuration page, which covers all supported dialects.
Verbatim extract for MySQL:
# default
engine = create_engine('mysql://scott:tiger@localhost/foo')
# mysql-python
engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
# MySQL-connector-python
engine = create_engine('mysql+mysqlconnector://scott:tiger@localhost/foo')
# OurSQL
engine = create_engine('mysql+oursql://scott:tiger@localhost/foo')
Then pass this engine to pandas' DataFrame.to_sql(...), as sketched below.
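A minimal sketch putting the two together for MySQL (credentials, database, and table names are placeholders):

import pandas as pd
from sqlalchemy import create_engine

# In MySQL a "schema" is synonymous with a database, so the schema argument
# selects which database the table lands in.
engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')  # placeholder credentials

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
df.to_sql("my_table", con=engine, schema="foo", if_exists="replace", index=False)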
