Teradata MERGE yielding no results when executed through SQLAlchemy - python

I'm attempting to use Python with SQLAlchemy to download some data, create a temporary staging table on a Teradata server, then MERGE that table into another table which I've created to permanently store this data. I'm using sql = sqlalchemy.text(merge) and td_engine.execute(sql), where merge is a string similar to the below:
MERGE INTO perm_table as p
USING temp_table as t
ON p.Id = t.Id
WHEN MATCHED THEN
UPDATE
SET col1 = t.col1,
col2 = t.col2,
...
col50 = t.col50
WHEN NOT MATCHED THEN
INSERT (col1,
col2,
...
col50)
VALUES (t.col1,
t.col2,
...
t.col50)
The script runs all the way to the end without error, and the SQL executes properly through Teradata Studio, but for some reason the table won't update when I execute it through SQLAlchemy. However, I've also run different SQL expressions from the same Python script, like the INSERT that populated perm_table, and they worked fine. Maybe there's something specific to the MERGE and SQLAlchemy combination?

Since you're using the engine directly, without a transaction, you're probably (barring unseen configuration on your part) relying on SQLAlchemy's version of autocommit, which works by detecting data-changing operations such as INSERTs. Possibly MERGE is not one of the detected operations. Try:
sql = sqlalchemy.text(merge).execution_options(autocommit=True)
td_engine.execute(sql)
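If you'd rather not rely on autocommit detection at all, an alternative is to run the statement inside an explicit transaction; a minimal sketch, assuming the same td_engine and merge string as above:
with td_engine.begin() as conn:  # commits on success, rolls back on error
    conn.execute(sqlalchemy.text(merge))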

Related

Selecting a column with a slash in SQLAlchemy

I need to select some columns from a table with SQLAlchemy. Everything works fine except selecting the one column with '/' in its name. My query looks like:
query = select([func.sum(Table.c.ColumnName),
func.sum(Table.c.Column/Name),
])
Obviously the issue comes from the second line, with the column 'Column/Name'. Is there a way in SQLAlchemy to handle special characters in a column name?
edit:
I have it all inside a class, but a simplified version of the process looks like this. I create an engine (all the necessary db details are inside the create_new_engine() function) and map all tables in the db into metadata.
def map(self):
    from sqlalchemy.engine.base import Engine
    # check if engine exists
    if not isinstance(self.engine, Engine):
        self.create_new_engine()
    self.metadata = MetaData({'schema': 'dbo'})
    self.metadata.reflect(bind=self.engine)
Then I map a single table with:
def map_table(self, table_name):
    table = "{schema}.{table_name}".format(schema=self.metadata.schema, table_name=table_name)
    table = self.metadata.tables[table]
    return table
In the end I use pandas read_sql_query to run the above query with the connection and engine established earlier.
I'm connecting to SQL Server.
Table.c points to a plain Python object, so you can reach the column with plain Python's getattr. Try:
query = select([func.sum(Table.c.ColumnName),
func.sum(getattr(Table.c, 'Column/Name')),
])
So in your case (from the comments above):
func.sum(getattr(Table.c, 'cur/fees'))
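The column collection also supports dictionary-style lookup, which some find more readable than getattr; a small sketch under the same reflected-table assumptions:
query = select([func.sum(Table.c.ColumnName),
                func.sum(Table.c['cur/fees']),  # string key handles names that aren't valid identifiers
                ])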

Describe Snowflake table from Azure Databricks

I want to issue a DESC TABLE SQL command for a Snowflake table from Azure Databricks, but I can't quite figure it out! I'm not getting any errors, but I'm not getting any results either. Here's the Python code I'm using:
options_vcp = {
"sfUrl": snowflake_url,
"sfUser": user,
"sfPassword": password,
"sfDatabase": db,
"sfWarehouse": wh,
"sfSchema": sch
}
sfUtils = sc._jvm.net.snowflake.spark.snowflake.Utils
sfUtils.runQuery(options_vcp, "DESC TABLE myTable")
I can download the Snowflake table using the "sfDatabase", "sfWarehouse", etc. values, so they seem to be correct. I can run the DESC TABLE command in Snowflake and get correct results. But the only output I'm getting from Databricks is this:
Out[1]: JavaObject id=o315
Does anyone know how to display this JavaObject or know of a different method to run DESC TABLE from Databricks?
From the documentation on Executing DDL/DML SQL Statements:
The runQuery method returns only TRUE or FALSE. It is intended for statements that do not return a result set, for example DDL statements like CREATE TABLE and DML statements like INSERT, UPDATE, and DELETE. It is not useful for statements that return a result set, such as SELECT or SHOW.
Alternative approach is to use INFORMATION_SCHEMA.COLUMNS view:
df = (spark.read.format(SNOWFLAKE_SOURCE_NAME)
      .options(**sfOptions)
      .option("query", "SELECT * FROM information_schema.columns WHERE table_name ILIKE 'myTable'")
      .load())
Related: Moving Data from Snowflake to Spark:
When using DataFrames, the Snowflake connector supports SELECT queries only.
Usage Notes
Currently, the connector does not support other types of queries (e.g. SHOW or DESC, or DML statements) when using DataFrames.
I suggest using get_ddl() in your select statement to get the object definition:
https://docs.snowflake.com/en/sql-reference/functions/get_ddl.html
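For example, a sketch of that suggestion reusing the options_vcp dict from the question (the source name string below is the connector's usual identifier, assumed here):
df = (spark.read.format("net.snowflake.spark.snowflake")
      .options(**options_vcp)
      .option("query", "SELECT GET_DDL('TABLE', 'myTable')")
      .load())
df.show(truncate=False)  # the DDL comes back as a one-row result set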

How to run multiple inserts on multiple tables in parallel using PySpark

I am inserting data from staging tables into main tables using SQL queries in PySpark. The problem is that I have inserts into multiple tables. What should be done to achieve parallelism, other than using threading?
spark.sql("INSERT INTO Cls.tbl1 (Contract, Name)
SELECT s.Contract, s.Name
FROM tbl1 AS s LEFT JOIN Cls.tbl1 AS c
ON s.Contract = c.Contract AND s.Adj = c.Adj
WHERE c.Contract IS NULL")
spark.sql("INSERT INTO Cls.tbl2 (Contract, Name)
SELECT s.Contract, s.Name
FROM tbl2 AS s LEFT JOIN Cls.tbl2 AS c
ON s.Contract = c.Contract AND s.Adj = c.Adj
WHERE c.Contract IS NULL")
We have to execute multiple INSERT statements like the above, and we also want to achieve parallelism when running them through Spark.
In short, you cannot run them in parallel. But you can run two different jobs, each inserting into one table; with this approach you can sort of achieve parallelism.
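A rough sketch of that "one job per table" idea: parameterize the script so each run inserts into a single table, then launch it once per table (the argument handling below is illustrative, not from the question):
import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("staging-insert").enableHiveSupport().getOrCreate()

target = sys.argv[1]  # e.g. "tbl1" or "tbl2"; hypothetical command-line argument
spark.sql("""INSERT INTO Cls.{t} (Contract, Name)
             SELECT s.Contract, s.Name
             FROM {t} AS s LEFT JOIN Cls.{t} AS c
             ON s.Contract = c.Contract AND s.Adj = c.Adj
             WHERE c.Contract IS NULL""".format(t=target))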

Organizing SQL Queries in Python project

I'm creating a Python script that will replace a set of SQL Server stored procedures to make the process more efficient. However, I have 20-30 queries I need to execute at different points. To keep the main script simpler, I organized them into a dictionary in a separate file and created a function to pull the query to be executed.
My question here: is there a better way to organize them? One idea I had was to put them into a table on the SQL Server. Is my current method best, or is there another, better approach? Below is an example of what I'm doing now:
queryDict = {}
queryDict.update({"dbQuery1": "TRUNCATE TABLE MyTable;\
INSERT MyTable (Column1, Column2)\
SELECT Col1, Col2 FROM myTable2;"})
queryDict.update({"dbQuery1": 'SELECT MAX(val) FROM MyTable3;'})
def queryRequest(query):
return queryDict[query]
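For reference, a minimal usage sketch of the lookup function (the engine URL here is hypothetical):
import sqlalchemy

engine = sqlalchemy.create_engine("mssql+pyodbc://user:password@my_dsn")  # hypothetical connection
with engine.begin() as conn:
    conn.execute(sqlalchemy.text(queryRequest("dbQuery1")))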

Copy tables from one database to another in SQL Server, using Python

Does anybody know of good Python code that can copy a large number of tables (around 100) from one database to another in SQL Server?
I'm asking whether there is a way to do it in Python because, due to restrictions at my place of employment, I cannot copy tables across databases inside SQL Server alone.
Here is simple Python code that copies one table from one database to another. I am wondering if there is a better way to write it if I want to copy 100 tables.
print('Initializing...')
import pandas as pd
import sqlalchemy
import pyodbc
db1 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_one")
db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")
print('Writing...')
query = '''SELECT * FROM [dbo].[test_table]'''
df = pd.read_sql(query, db1)
df.to_sql('test_table', db2, schema='dbo', index=False, if_exists='replace')
print('(1) [test_table] copied.')
SQLAlchemy is actually a good tool to use to create identical tables in the second db:
table = Table('test_table', metadata, autoload=True, autoload_with=db1)
table.create(bind=db2)
This method will also reproduce the correct keys, indexes, and foreign keys. Once the needed tables are created, you can move the data either with select/insert, if the tables are relatively small, or with the bcp utility to dump each table to disk and then load it into the second database (much faster, but more work to get right).
If using select/insert then it is better to insert in batches of 500 records or so.
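Putting those two pieces together, a minimal sketch of reflect, create, and copy in batches (assuming SQLAlchemy 1.4+ and the db1/db2 engines from the question):
from sqlalchemy import MetaData, select

metadata = MetaData()
metadata.reflect(bind=db1)                       # load every table definition from the source

for table in metadata.sorted_tables:             # sorted so FK parents are created first
    table.create(bind=db2, checkfirst=True)      # create the identical table on the target
    with db1.connect() as src, db2.begin() as dst:
        batch = []
        for row in src.execute(select(table)).mappings():
            batch.append(dict(row))
            if len(batch) >= 500:                # insert in batches of ~500 records
                dst.execute(table.insert(), batch)
                batch = []
        if batch:
            dst.execute(table.insert(), batch)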
You can do something like this:
tabs = pd.read_sql("SELECT table_name FROM INFORMATION_SCHEMA.TABLES", db1)
for tab in tabs['table_name']:
    pd.read_sql("select * from {}".format(tab), db1).to_sql(tab, db2, index=False)
But it might be awfully slow. Use SQL Server tools to do this job.
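If you do stay with the pandas route, a chunked variant (a sketch using pandas' chunksize and if_exists options) at least avoids loading each table fully into memory:
for tab in tabs['table_name']:
    for chunk in pd.read_sql("select * from {}".format(tab), db1, chunksize=10000):
        chunk.to_sql(tab, db2, index=False, if_exists='append')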
Consider using the sp_addlinkedserver procedure to link one SQL Server to another. After that you can execute:
SELECT * INTO server_name...table_name FROM table_name
for all tables from the db1 database.
PS this might be done in Python + SQLAlchemy as well...
