I have a function in my code that generates a bunch of tables on an API call. It looks somewhat like this:
def create_tables():
    rows = connection.execute(sqlcmd)
    for i, row in enumerate(rows):
        # Do some work here
        t = Table(f"data_{i}", metadata, *columns)
    metadata.create_all()
I need another function where I iterate over the tables created in the above function and dump records into each table from another API. Since I'm not using declarative mapping or models in SQLAlchemy, how do I identify these tables in my database and write data to a specific table?
You can use the reflection system:
meta.reflect(bind=someengine)
# now all located tables are present within the MetaData object’s
# dictionary of tables
table1 = meta.tables['data_1']
table1.insert().values(...)
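For the second half of the question (looping over those tables and writing records from the other API into each), a minimal sketch, assuming an engine and that the other API hands you rows as a list of dicts keyed by column name; the connection URL and fetch_records_from_other_api are hypothetical stand-ins:
from sqlalchemy import MetaData, create_engine

engine = create_engine("postgresql://user:pass@host/dbname")  # hypothetical connection URL
meta = MetaData()
meta.reflect(bind=engine)

with engine.begin() as conn:
    for name, table in meta.tables.items():
        if not name.startswith("data_"):
            continue  # only the tables generated by create_tables()
        records = fetch_records_from_other_api(name)  # hypothetical: list of dicts keyed by column name
        conn.execute(table.insert(), records)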
I have a table called products
which has the following columns:
id, product_id, data, activity_id
What I am essentially trying to do is copy a bulk of existing products, update their activity_id, and create new entries in the products table.
Example:
I already have 70 existing entries in products with activity_id 2.
Now I want to create another 70 entries with the same data except for an updated activity_id.
I could have thousands of existing entries that I'd like to make a copy of, updating the copied entries' activity_id to a new id.
products = self.session.query(model.Products).filter(filter1, filter2).all()
This returns all the existing products for a filter.
Then I iterate through products, clone each existing product, and just update the activity_id field.
for product in products:
    product.activity_id = new_id

self.uow.skus.bulk_save_objects(simulation_skus)
self.uow.flush()
self.uow.commit()
What is the best/fastest way to do these bulk inserts so that they don't take so much time? Performance is OK as of now, but is there a better solution?
You don't need to load these objects locally; all you really want to do is have the database create these rows.
You essentially want to run a query that creates the rows from the existing rows:
INSERT INTO product (product_id, data, activity_id)
SELECT product_id, data, 2 -- the new activity_id value
FROM product
WHERE activity_id = old_id
The above query would run entirely on the database server; this is far preferable to loading your query results into Python objects and then sending all that data back to the server to populate INSERT statements for each new row.
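If that raw statement is all you need, a minimal sketch of running it from the session used in the question (old_id and the bind parameter names are placeholders):
from sqlalchemy import text

self.session.execute(
    text(
        "INSERT INTO product (product_id, data, activity_id) "
        "SELECT product_id, data, :new_activity_id "
        "FROM product WHERE activity_id = :old_activity_id"
    ),
    {"new_activity_id": 2, "old_activity_id": old_id},
)
self.session.commit()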
Queries like that are something you could do with SQLAlchemy core, the half of the API that deals with generating SQL statements. However, you can use a query built from a declarative ORM model as a starting point. You'd need to:
1. Access the Table instance for the model, as that then lets you create an INSERT statement via the Table.insert() method. (You could also get the same object from the models.Product query; more on that later.)
2. Access the statement that would normally fetch the data for your Python instances for your filtered models.Product query; you can do so via the Query.statement property.
3. Update the statement to replace the included activity_id column with your new value, and remove the primary key (I'm assuming that you have an auto-incrementing primary key column).
4. Apply that updated statement to the Insert object for the table via Insert.from_select().
5. Execute the generated INSERT INTO ... FROM ... query.
Step 1 can be achieved by using the SQLAlchemy introspection API; the inspect() function, applied to a model class, gives you a Mapper instance, which in turn has a Mapper.local_table attribute.
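For example, a short sketch of step 1, using the models.Product class referenced above:
from sqlalchemy import inspect

table = inspect(models.Product).local_table  # the Core Table behind the ORM model
insert_stmt = table.insert()                 # base INSERT statement, used again in step 4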
Steps 2 and 3 require a little juggling with the Select.with_only_columns() method to produce a new SELECT statement where we swap out the column. You can't easily remove a column from a select statement, but you can loop over the existing columns in the query to 'copy' them across to the new SELECT, making the replacement at the same time.
Step 4 is then straightforward: Insert.from_select() needs the columns that are inserted and the SELECT query, and we have both, since the SELECT object we built gives us its columns too.
Here is the code for generating your INSERT; the **replace keyword arguments are the columns you want to replace when inserting:
from sqlalchemy import inspect, literal
from sqlalchemy.sql import ClauseElement
def insert_from_query(model, query, **replace):
    # The SQLAlchemy core definition of the table
    table = inspect(model).local_table
    # and the underlying core select statement to source new rows from
    select = query.statement
    # validate assumptions: make sure the query produces rows from the above table
    assert table in select.froms, f"{query!r} must produce rows from {model!r}"
    assert all(c.name in select.columns for c in table.columns), f"{query!r} must include all {model!r} columns"
    # updated select, replacing the indicated columns
    as_clause = lambda v: literal(v) if not isinstance(v, ClauseElement) else v
    replacements = {name: as_clause(value).label(name) for name, value in replace.items()}
    from_select = select.with_only_columns([
        replacements.get(c.name, c)
        for c in table.columns
        if not c.primary_key
    ])
    return table.insert().from_select(from_select.columns, from_select)
I included a few assertions about the model and query relationship, and the code accepts arbitrary column clauses as replacements, not just literal values. You could use func.max(models.Product.activity_id) + 1 as a replacement value (wrapped as a subselect), for example.
The above function executes steps 1-4, producing the desired INSERT SQL statement when printed (I created a products model and query that I thought might be representative):
>>> print(insert_from_query(models.Product, products, activity_id=2))
INSERT INTO products (product_id, data, activity_id) SELECT products.product_id, products.data, :param_1 AS activity_id
FROM products
WHERE products.activity_id != :activity_id_1
All you have to do is execute it:
insert_stmt = insert_from_query(models.Product, products, activity_id=2)
self.session.execute(insert_stmt)
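Since the whole copy now happens server-side in a single statement, the only thing left afterwards is to commit the transaction (assuming your session is not configured to autocommit):
self.session.commit()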
I need to drop some columns and uppercase the data in Snowflake tables.
For that I need to loop through all the catalogs/databases, their respective schemas, and then the tables.
I need this to be in Python: list the catalogs, the schemas, and then the tables, after which I will execute the SQL queries that do the manipulation.
How do I proceed with this?
1. List all the catalog names
2. List all the schema names
3. List all the table names
I have established a connection using the Python Snowflake connector.
Your best source for this information is the SNOWFLAKE.ACCOUNT_USAGE share that Snowflake provides. You'll need to grant privileges to whatever role you are using to connect with Python. From there, there are the following views: DATABASES, SCHEMATA, TABLES, and more.
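A rough sketch of querying those views from the Python connector (assuming conn is an open snowflake.connector connection and your role has access to the share; the DELETED filter skips dropped objects, and the column names are the documented ones, so verify them against your account):
cur = conn.cursor()

cur.execute("SELECT DATABASE_NAME FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASES WHERE DELETED IS NULL")
databases = [row[0] for row in cur.fetchall()]

cur.execute("SELECT CATALOG_NAME, SCHEMA_NAME FROM SNOWFLAKE.ACCOUNT_USAGE.SCHEMATA WHERE DELETED IS NULL")
schemas = cur.fetchall()

cur.execute("SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME FROM SNOWFLAKE.ACCOUNT_USAGE.TABLES WHERE DELETED IS NULL")
tables = cur.fetchall()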
The easiest way would be to follow the below process
show databases;
select "name" from table(result_scan(last_query_id()));
This will give you the list of Databases. Put them in a list. Traverse through this list and on each item do the following:
use <DBNAME>;
show schemas;
select "name" from table(result_scan(last_query_id()));
Get the list of schemas
use schema <SchemaName>;
show tables;
select "name" from table(result_scan(last_query_id()));
Get the list of tables and then run your queries.
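Put together with the Python connector, the loop could look roughly like this (assuming conn is an open snowflake.connector connection; consecutive cur.execute calls run in the same session, so RESULT_SCAN(LAST_QUERY_ID()) picks up the preceding SHOW output):
cur = conn.cursor()

cur.execute("SHOW DATABASES")
cur.execute('SELECT "name" FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))')
databases = [row[0] for row in cur.fetchall()]

for db in databases:
    cur.execute(f'USE DATABASE "{db}"')
    cur.execute("SHOW SCHEMAS")
    cur.execute('SELECT "name" FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))')
    schemas = [row[0] for row in cur.fetchall()]

    for schema in schemas:
        cur.execute(f'USE SCHEMA "{db}"."{schema}"')
        cur.execute("SHOW TABLES")
        cur.execute('SELECT "name" FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))')
        tables = [row[0] for row in cur.fetchall()]

        for table in tables:
            # run the column-drop / uppercase SQL against db.schema.table here
            pass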
You probably will not need the result_scan. Recently, I created a Python program to list all columns for all tables within Snowflake. My requirement was to validate each column and calculate some numerical statistics on the columns. I was able to do it using 'SHOW COLUMNS' only. I have open-sourced some of the common Snowflake operations, which are available here:
https://github.com/Infosys/Snowflake-Python-Development-Framework
You can clone this code and then use the framework to create your Python program that lists the columns as below; then you can do whatever you like with the column details:
from utilities.sf_operations import Snowflakeconnection

connection = Snowflakeconnection(profilename='snowflake_host')
sfconnectionresults = connection.get_snowflake_connection()
sfconnection = sfconnectionresults.get('connection')
statuscode = sfconnectionresults.get('statuscode')
statusmessage = sfconnectionresults.get('statusmessage')
print(sfconnection, statuscode, statusmessage)

snow_sql = 'SHOW COLUMNS;'
queryresult = connection.execute_snowquery(sfconnection, snow_sql)
print(queryresult['result'])

print('column_name|table_name|column_attribute')
print('---------------------------------------------')
for rows in queryresult['result']:
    table_name = rows[0]
    schema_name = rows[1]
    column_name = rows[2]
    column_attribute = rows[3]
    is_Null = rows[4]
    default_Value = rows[5]
    kind = rows[6]
    expression = rows[7]
    comment = rows[8]
    database_name = rows[9]
    autoincrement = rows[10]
    print(column_name + '|' + table_name + '|' + column_attribute)
Background
I would like to update the schema of a table in BigQuery to match the schema of another table that contains a superset of the original columns. I would like to do it through the BigQuery Python client.
Problem
In practice I want to add some columns containing NULL to an already existing BigQuery table at an arbitrary position that is not necessarily the beginning or the end.
I know how to append new columns at the end of a table, following this snippet, but I would like to add columns in an arbitrary position. Moreover I would like to do it through a schema update, without having to query the entire table.
Being that the schema is actually a list of SchemaField objects, I thought that substituting the append method with the insert method would have sufficed. But this snippet does not do what I'd like:
from google.cloud import bigquery
client = bigquery.Client()
dataset_id = 'my_dataset'
table_id = 'my_table'
table_ref = client.dataset(dataset_id).table(table_id)
table = client.get_table(table_ref) # API request
original_schema = table.schema
new_schema = original_schema[:] # creates a copy of the schema
# insert new_col at position 2, instead of appending
new_schema.insert(2, bigquery.SchemaField('new_col', 'STRING'))
table.schema = new_schema
table = client.update_table(table, ['schema']) # API request
This code results in the schema being updated exactly as if the method called was append, i.e. new_col gets placed at the end of the schema.
Question
Do you know if it's possible to modify the schema of a BigQuery table so that the new (NULL) columns are inserted at an arbitrary position?
As per the answer in this question, one possibility would be to copy part of the table, add the required column, and then join it back with the rest of the old table. That is far more expensive than just adding a new column at the end, but it is still a possibility.
As explained in this post, such functionality doesn't exist in any SQL engine, as column order is considered irrelevant. What could be done is to append the new column and then recreate the table with the columns selected in whatever order you want. What is your business need for this?
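If you do go the recreate route, a rough sketch with the BigQuery Python client: select the existing columns in the order you want, with the new column as NULL, and overwrite the table with the query result. The column names col_a/col_b/col_c and the table id are placeholders, and this rescans and rewrites the whole table:
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.my_dataset.my_table"  # placeholder fully-qualified name

# Overwrite the table with a SELECT that puts new_col at the desired position.
job_config = bigquery.QueryJobConfig(
    destination=table_id,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
sql = f"""
    SELECT col_a, col_b, CAST(NULL AS STRING) AS new_col, col_c
    FROM `{table_id}`
"""
client.query(sql, job_config=job_config).result()  # wait for the rewrite to finish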
I am writing a basic gui for a program which uses Peewee. In the gui, I would like to show all the tables which exist in my database.
Is there any way to get the names of all existing tables, let's say in a list?
Peewee has the ability to introspect Postgres, MySQL and SQLite for the following types of schema information:
Table names
Columns (name, data type, null?, primary key?, table)
Primary keys (column(s))
Foreign keys (column, dest table, dest column, table)
Indexes (name, sql*, columns, unique?, table)
You can get this metadata using the following methods on the Database class:
Database.get_tables()
Database.get_columns()
Database.get_indexes()
Database.get_primary_keys()
Database.get_foreign_keys()
So, instead of using a cursor and writing some SQL yourself, just do:
db = PostgresqlDatabase('my_db')
tables = db.get_tables()
For even more craziness, check out the reflection module, which can actually generate Peewee model classes from an existing database schema.
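A small sketch of that reflection module: generate_models ships with Peewee's playhouse extras and returns a dict mapping table names to generated model classes ('some_table' is a placeholder):
from peewee import PostgresqlDatabase
from playhouse.reflection import generate_models

db = PostgresqlDatabase('my_db')
models = generate_models(db)

print(list(models))               # table names
SomeTable = models['some_table']  # a Model class generated from the existing schema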
To get a list of the tables in your schema, make sure that you have established your connection and cursor and try the following:
cursor.execute("SELECT table_name FROM information_schema.tables WHERE table_schema='public'")
mytables = cursor.fetchall()
mytables = [x[0] for x in mytables]
I hope this helps.
Using table creation as normal:
t = Table(name, meta, [columns ...])
This is the first run where I create the table. In future executions I would like to use the table without having to indicate the [columns]. This seems redundant as it should already be specified in the table schema. In other words, for future accesses, I'd like to simply do:
t = Table(name, meta) # columns already read from schema
Is there a way to do this in SqlAlchemy?
See Reflecting Database Objects in the SA documentation:
t = Table(name, meta, autoload=True)  # optionally: autoload_with=engine
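On SQLAlchemy 1.4 and later, the autoload=True flag is deprecated in favour of passing the engine (or connection) explicitly, so the reflected table would be spelled:
t = Table(name, meta, autoload_with=engine)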