Specifying the schema in Pandas to_sql - python

From the source of to_sql, I can see that it gets mapped to an Meta Data object meta = MetaData(con, schema=schema). However, I can't find SQLAlchemy docs that tell me how to define the Schema for MySQL
How do I specify the schema string ?

The schema parameter in to_sql is confusing as the word "schema" means something different from the general meaning of "table definitions". In some SQL flavors, notably postgresql, a schema is effectively a namespace for a set of tables.
For example, you might have two schemas, one called test and one called prod. Each might contain a table called user_rankings generated in pandas and written using the to_sql command. You would specify the test schema when working on improvements to user rankings. When you are ready to deploy the new rankings, you would write to the prod schema.
As others have mentioned, when you call to_sql the table definition is generated from the type information for each column in the dataframe. If the table already exists in the database with exactly the same structure, you can use the append option to add new data to the table.

DataFrame.to_sql(self, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Just use schema parameter. But note that schema is not odbc driver.

Starting from the Dialects page of the SQLAlchemy documentation, select documentation page of your dialect and search for create_engine to find example on how to create it.
Even more concise overview you can get on Engine Configuration page for all supported dialects.
Verbatim extract for mysql:
# default
engine = create_engine('mysql://scott:tiger#localhost/foo')
# mysql-python
engine = create_engine('mysql+mysqldb://scott:tiger#localhost/foo')
# MySQL-connector-python
engine = create_engine('mysql+mysqlconnector://scott:tiger#localhost/foo')
# OurSQL
engine = create_engine('mysql+oursql://scott:tiger#localhost/foo')
Then pass this engine to the to_sql(...) of pandas' DataFrame.

Related

Selecting a column with a slash in SQL Alchemy

I need to select some columns from a table with SQL Alchemy. Everything works fine except selecting the one column with '/' in the name. My query looks like:
query = select([func.sum(Table.c.ColumnName),
func.sum(Table.c.Column/Name),
])
Obviously the issue comes from the second line with the column 'Column/Name'. Is there a way in SQL Alchemy to overcome special characters in a column name?
edit:
I've it all inside some class but simplified version of a process looks like this. I create an engine (all necessary db data is inside create_new_engine() function) and map all tables in db into metadata.
def map(self):
from sqlalchemy.engine.base import Engine
# check if engine exist
if not isinstance(self.engine, Engine):
self.create_new_engine()
self.metadata = MetaData({'schema': 'dbo'})
self.metadata.reflect(bind=self.engine)
Then I map a single table with:
def map_table(self, table_name):
table = "{schema}.{table_name}".format(schema=self.metadata.schema, table_name=table_name)
table = self.metadata.tables[table]
return table
In the end I use pandas read_sql_query to run above query with connection and engine established earlier.
I'm connecting to SQL Server.
Since Table.c points to a plain python obect. Try in pure Python
query = select([func.sum(Table.c.ColumnName),
func.sum(getattr(Table.c, 'Column/Name')),
])
So in your case (from comments above) :
func.sum(getattr(Table.c, 'cur/fees'))

How to query bigquery tables in google data studio with python-like string formatting in table names based on custom parameters?

So I have several tables with each product for each year and tables go like:
2020product5, 2019product5, 2018product6 and so on. I have added two custom parameters in google data studio as well named year and product_id but could not use them in table names themselves. I have used parameterized queries before but in conditions like where product_id = #product_id but this setup only works if all of the data is in same table which is not the current case with me. In python I use string formatters like f"{year}product{product_id}" but that obviously does not work in this case...
Using Bigquery Default CONCAT & FORMAT functions does not help as both throw following validation error: Table-valued function not found: CONCAT at [1:15]
So how do I get around with querying bigquery tables in google data studio with python-like string formatting in table names based on custom parameters?
After much research I (kinda) sorted it out. Turns out it is a database level feature to query schema-level entities e.g. table names dynamically. BigQuery does not support formatting within table name like tables as per in question (e.g. 2020product5, 2019product5, 2018product6) cannot be queried directly. However, it does have a TABLE_SUFFIX function which allow you to access tables dynamically given that changes in table names are located at the end of the table. (This feature also allowed for dateweise partitioning and many tools which use BQ as data sink, utilize this. So If you are using BQ as data sink, there is good chance that your original data source is already doing so). Thus, table names like (product52020, product52019, product62018) as well can be accessed dynamically and of course from data studio too using following:
SELECT * FROM `project_salsa_101.dashboards.product*` WHERE _table_Suffix = CONCAT(#product_id,#year)
P.S.: Used python to create a dirty script which looped through products and tables and copied and created new ones which goes as follows: (Adding script with formatted string so it might be useful for anyone with such case wtih nominal effort)
import itertools
credentials = service_account.Credentials.from_service_account_file(
'project_salsa_101-bq-admin.json')
project_id = 'project_salsa_101'
schema = 'dashboards'
client = bigquery.Client(credentials= credentials,project=project_id)
for product_id, year in in itertools.product(product_ids, years):
df = client.query(f"""
SELECT * FROM `{project_id}.{schema}.{year}product{product_id}`
""").result().to_dataframe()
df.to_gbq(project_id = project_id,
destination_table = f'{schema}.product{product_id}{year}',
credentials = service_account.Credentials.from_service_account_file(
'credentials.json'),
if_exists = 'replace')
client.query(f"""
DROP TABLE `{project_id}.{schema}.{year}product{product_id}`""").result()

How to get base from existing sql DDL file?

I'm using SQLAlchemy for MySQL.
The common example of SQLAlchemy is
Defining model classes by the table structure. (class User(Base))
Migrate to the database by db.create_all (or alembic, etc)
Import the model class, and use it. (db.session.query(User))
But what if I want to use raw SQL file instead of defined model classes?
I did read automap do similar like this, but I want to get mapper object from raw SQL file, not created database.
Is there any best practice to do this?
This is an example of DDL
-- ddl.sql
-- This is just an example, so please ignore some issues related to a grammar
CREATE TABLE `card` (
`card_id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'card',
`card_company_id` bigint(20) DEFAULT NULL COMMENT 'card_company_id',
PRIMARY KEY (`card_id`),
KEY `card_ix01` (`card_company_id`),
KEY `card_ix02` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='card table'
And I want to do like
Base = raw_sql_base('ddl.sql') # Some kinda automap_base but from SQL file
# engine, suppose it has two tables 'user' and 'address' set up
engine = create_engine("mysql://user#localhost/program")
# reflect the tables
Base.prepare(engine)
# mapped classes are now created with names by sql file
Card = Base.classes.card
session = Session(engine)
session.add(Card(card_id=1, card_company_id=1))
session.commit() # Insert
SQLAlchemy is not an SQL parser, but the exact opposite; its reflection works against existing databases only. In other words you must execute your DDL and then use reflection / automap to create the necessary Python models:
from sqlalchemy.ext.automap import automap_base
# engine, suppose it has two tables 'user' and 'address' set up
engine = create_engine("mysql://user#localhost/program")
# execute the DDL in order to populate the DB
with open('ddl.sql') as ddl:
engine.execute(ddl)
Base = automap_base()
# reflect the tables
Base.prepare(engine, reflect=True)
# mapped classes are now created with names by sql file
Card = Base.classes.card
session = Session(engine)
session.add(Card(card_id=1, card_company_id=1))
session.commit() # Insert
This of course may fail, if you have already executed the same DDL against your database, so you would have to handle that case as well. Another possible caveat is that some DB-API drivers may not like executing multiple statements at a time, if your ddl.sql happens to contain more than one CREATE TABLE statement etc.
...but I want to get mapper object from raw SQL file.
Ok, in that case what you need is the aforementioned parser. A cursory search produced two candidates:
sqlparse: Generic, but the issue tracker is a testament to how nontrivial parsing SQL is. Is often confused, for example parses ... COMMENT 'card', `card_company_id` ... as a keyword and an identifier list, not as a keyword, a literal, punctuation, and an identifier (or even better, the column definitions as their own nodes).
mysqlparse: A MySQL specific solution, but with limited support for just about anything, and it seems abandoned.
Parsing would be just the first step, though. You'd then have to convert the resulting trees to models.

Python - pandas to_sql

I'm trying to use https://pandas.pydata.org/pandas-docs/version/0.21/generated/pandas.DataFrame.to_sql.html
When I change the Name argument, e.g. say I set
pd.to_sql(name="testTable",constring)
the actual table name comes up as [UserName].[testTable] rather than just [testTable]
Is there a way I can get rid of the [userName]? which is linked to the user who runs the script?
The [UserName] portion of the table name is the schema that the table is in. I don't know which database engine you're using, but the schema you're looking for might be "dbo".
According to the documentation, you can provide a schema argument:
pd.to_sql(name="testTable",constring, schema="dbo")
Note that if the schema is left blank, it uses the DB user's default schema (as defined when the user was added to the database), which in your case, appears to be the schema of the user.

How to get SqlAlchemy Table to read "implicit" schema

Using table creation as normal:
t = Table(name, meta, [columns ...])
This is the first run where I create the table. In future executions I would like to use the table without having to indicate the [columns]. This seems redundant as it should already be specified in the table schema. In other words, for future accesses, I'd like to simply do:
t = Table(name, meta) # columns already read from schema
Is there a way to do this in SqlAlchemy?
See Reflecting Database Objects of SA documentation:
t = Table(name, meta, autoload=True)#, autoload_with=engine)

Categories

Resources