How to copy and paste a SQL query into pandas read_sql - Python

I'm new to Python and am trying to run SQL code from Python and get the results into a pandas DataFrame. The code below runs when I use a simple SQL query, but when I try to run a very long and complex query, formatted across multiple lines as I would write it in SQL, it fails. Is there a module or option that lets Python accept the indentation and line breaks inside a SQL query?
cnxn=...#here it's the connection to my sql server database
sql_2=
r'( Select distinct NPI,
practice_code=RIGHT('000'+CAST(newcode AS VARCHAR(3)),3),
SRcode,
StandardZip,
Zipclass,
CountySSA,
PrimaryCountySSA,
PrimaryCounty,
PrimaryCountyClass,
Lat_Clean,
Long_Clean
FROM Docusinporactice a
LEFT JOIN locationInfo b
on a.zip=b.zip
)
sql_data_test=pd.read_sql_query(sql_2, cnxn)

r = """ Select distinct NPI,
practice_code=RIGHT('000'+CAST(newcode AS VARCHAR(3)),3),
SRcode,
StandardZip,
Zipclass,
CountySSA,
PrimaryCountySSA,
PrimaryCounty,
PrimaryCountyClass,
Lat_Clean,
Long_Clean
FROM Docusinporactice a
LEFT JOIN locationInfo b
on a.zip=b.zip
"""
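Using a triple-quoted string like this lets the SQL statement span multiple lines, and the single quotes inside it (such as '000') no longer end the string early, which is what broke the r'( ... )' version.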
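A minimal runnable sketch of the triple-quoted approach follows. It assumes an in-memory SQLite database as a stand-in for the SQL Server connection, with made-up rows in the two tables from the question; because SQLite has no RIGHT or CAST-to-VARCHAR in T-SQL form, the practice_code expression is rewritten with SQLite's substr and || for this demo only.

```python
import sqlite3

import pandas as pd

# Stand-in for the SQL Server connection (assumption: in-memory SQLite
# with invented data, only so the query can actually run).
cnxn = sqlite3.connect(":memory:")
cnxn.executescript("""
CREATE TABLE Docusinporactice (NPI INTEGER, newcode INTEGER, zip TEXT);
CREATE TABLE locationInfo (zip TEXT, StandardZip TEXT);
INSERT INTO Docusinporactice VALUES (1, 7, '10001'), (2, 42, '94105');
INSERT INTO locationInfo VALUES ('10001', '10001');
""")

# A triple-quoted string can span many lines and contain single quotes
# (such as '000') without ending the string, so the query can be pasted
# with its original line breaks and indentation.
sql_2 = """
    SELECT DISTINCT
        a.NPI,
        substr('000' || a.newcode, -3) AS practice_code,  -- SQLite version of RIGHT('000'+..., 3)
        b.StandardZip
    FROM Docusinporactice a
    LEFT JOIN locationInfo b
        ON a.zip = b.zip
    ORDER BY a.NPI
"""

sql_data_test = pd.read_sql_query(sql_2, cnxn)
print(sql_data_test)
```

The same string works if you instead keep the query in its own .sql file and read it with open(...).read() before passing it to pd.read_sql_query.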

Related

Is it possible to write a SQL query in a Python UDF in Snowflake Snowpark?

I'm trying to create a new UDF inside Snowflake. In this UDF, I need to run a SQL query that returns a list of tables, and then run some Python code around it, like this example:
create or replace function SnowparkPrivateSchema()
returns string
language python
runtime_version=3.8
handler='SnowparkPrivateSchema'
as $$
def SnowparkPrivateSchema(self, symbol, quantity, price):
get_tables = '''select table_name from INFORMATION_SCHEMA.TABLES'''
for table in get_tables:
'''create or replace table clone_user.{table} clone {table}'''
$$;
You cannot execute SQL inside a Snowflake UDF (this is true for Python/Snowpark UDFs as well as for SQL, JavaScript, and other UDFs). A UDF takes in one row of data and outputs a scalar value computed from that input.
You can, however, execute SQL from a Snowpark Python stored procedure (or from a stored procedure written in any other supported language). A Python stored procedure can execute SQL just like a SQL stored procedure, including invoking your UDF to perform some operation on the data returned by your SQL queries.
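from snowflake.snowpark import Session

# connection_parameters: a dict of connection settings (account, user, password, ...)
session = Session.builder.configs(connection_parameters).create()
# session.sql() is lazy; collect() runs the query and returns a list of Row objects
get_tables = session.sql("select table_name from INFORMATION_SCHEMA.TABLES").collect()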

Describe Snowflake table from Azure Databricks

I want to issue a DESC TABLE SQL command for a Snowflake table and using Azure Databricks but I can't quite figure it out! I'm not getting any errors but I'm not getting any results either. Here's the Python code I'm using:
options_vcp = {
"sfUrl": snowflake_url,
"sfUser": user,
"sfPassword": password,
"sfDatabase": db,
"sfWarehouse": wh,
"sfSchema": sch
}
sfUtils = sc._jvm.net.snowflake.spark.snowflake.Utils
sfUtils.runQuery(options_vcp, "DESC TABLE myTable")
I can download the Snowflake table using the "sfDatabase", "sfWarehouse", etc. values so they seem to be correct. I can run the DESC TABLE command in Snowflake and get correct results. But the only output I'm getting from databricks is this:
Out[1]: JavaObject id=o315
Does anyone know how to display this JavaObject or know of a different method to run DESC TABLE from Databricks?
From the documentation, Executing DDL/DML SQL Statements:
The runQuery method returns only TRUE or FALSE. It is intended for statements that do not return a result set, for example DDL statements like CREATE TABLE and DML statements like INSERT, UPDATE, and DELETE. It is not useful for statements that return a result set, such as SELECT or SHOW.
An alternative approach is to query the INFORMATION_SCHEMA.COLUMNS view:
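SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
df = (spark.read.format(SNOWFLAKE_SOURCE_NAME)
      .options(**sfOptions)  # sfOptions: a dict like options_vcp above
      .option("query", "SELECT * FROM information_schema.columns WHERE table_name ILIKE 'myTable'")
      .load())
df.show()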
Related: Moving Data from Snowflake to Spark:
When using DataFrames, the Snowflake connector supports SELECT queries only.
Usage Notes
Currently, the connector does not support other types of queries (e.g. SHOW or DESC, or DML statements) when using DataFrames.
I suggest using get_ddl() in your select statement to get the object definition:
https://docs.snowflake.com/en/sql-reference/functions/get_ddl.html
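A minimal sketch of the GET_DDL suggestion through the same connector path as the INFORMATION_SCHEMA example, assuming the "net.snowflake.spark.snowflake" source name and the connector's "query" option; read_snowflake_ddl is a hypothetical helper name, and the options dict would be something like options_vcp from the question.

```python
def read_snowflake_ddl(spark, sf_options: dict, table_name: str):
    """Return a one-row DataFrame holding the CREATE statement for table_name.

    GET_DDL is wrapped in a SELECT because the connector only accepts
    SELECT queries when reading into a DataFrame.
    """
    # Double any single quotes so the name is a valid SQL string literal.
    safe_name = table_name.replace("'", "''")
    query = f"SELECT GET_DDL('TABLE', '{safe_name}') AS ddl"
    return (spark.read.format("net.snowflake.spark.snowflake")
            .options(**sf_options)
            .option("query", query)
            .load())
```

In a Databricks notebook, print(read_snowflake_ddl(spark, options_vcp, "myTable").first()[0]) would then print the table definition, which contains the column names and types that DESC TABLE shows.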

Teradata MERGE yielding no results when executed through SQLAlchemy

I'm attempting to use Python with SQLAlchemy to download some data, create a temporary staging table on a Teradata server, and then MERGE that table into another table I've created to store this data permanently. I'm using sql = sqlalchemy.text(merge) and td_engine.execute(sql), where merge is a string similar to the one below:
MERGE INTO perm_table as p
USING temp_table as t
ON p.Id = t.Id
WHEN MATCHED THEN
UPDATE
SET col1 = t.col1,
col2 = t.col2,
...
col50 = t.col50
WHEN NOT MATCHED THEN
INSERT (col1,
col2,
...
col50)
VALUES (t.col1,
t.col2,
...
t.col50)
The script runs all the way to the end without error and the SQL executes properly through Teradata Studio, but for some reason the table won't update when I execute it through SQLAlchemy. However, I've also run different SQL expressions, like the insert that populated perm_table from the same python script and it worked fine. Maybe there's something specific to the MERGE and SQLAlchemy combo?
Since you're calling execute() on the engine directly, without an explicit transaction, you're probably (barring configuration not shown here) relying on SQLAlchemy's library-level autocommit, which works by detecting data-changing statements such as INSERT, UPDATE and DELETE. Possibly MERGE is not one of the detected statements, so the change is never committed. Try:
sql = sqlalchemy.text(merge).execution_options(autocommit=True)
td_engine.execute(sql)
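The autocommit execution option and Engine.execute() were both removed in SQLAlchemy 2.0, so there the usual way to make sure a MERGE is committed is an explicit transaction via engine.begin(). The sketch below shows that pattern; run_in_transaction is a hypothetical helper, and because SQLite has no MERGE, an in-memory SQLite database with a plain UPDATE stands in for the Teradata statement.

```python
from sqlalchemy import create_engine, text


def run_in_transaction(engine, statement: str, **params) -> None:
    """Execute one data-changing statement and commit it explicitly.

    engine.begin() opens a transaction that is committed when the block
    exits normally and rolled back if an exception escapes, so nothing
    depends on SQLAlchemy guessing whether the statement modifies data.
    """
    with engine.begin() as conn:
        conn.execute(text(statement), params)


# Stand-in database: SQLite has no MERGE, so an UPDATE plays its role here.
engine = create_engine("sqlite://")
run_in_transaction(engine, "CREATE TABLE perm_table (Id INTEGER PRIMARY KEY, col1 TEXT)")
run_in_transaction(engine, "INSERT INTO perm_table VALUES (1, 'old')")
run_in_transaction(engine, "UPDATE perm_table SET col1 = :v WHERE Id = 1", v="new")

# Read back on a separate connection to confirm the change was committed.
with engine.connect() as conn:
    value = conn.execute(text("SELECT col1 FROM perm_table WHERE Id = 1")).scalar_one()
print(value)
```

With the question's objects this would be run_in_transaction(td_engine, merge).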

Pandas read_sql query with multiple selects

Can read_sql_query handle a SQL script with multiple SELECT statements?
I have an MSSQL query that performs several different tasks, but I don't want to write an individual query for each case. I would like to write just one query and pull in multiple tables.
I want the multiple queries in the same script because they are related, and it makes updating the script easier.
For example:
SELECT ColumnX_1, ColumnX_2, ColumnX_3
FROM Table_X
INNER JOIN (Etc etc...)
----------------------
SELECT ColumnY_1, ColumnY_2, ColumnY_3
FROM Table_Y
INNER JOIN (Etc etc...)
This produces two separate result sets.
The Python code that runs it is:
scriptFile = open('.../SQL Queries/SQLScript.sql','r')
script = scriptFile.read()
engine = sqlalchemy.create_engine("mssql+pyodbc://UserName:PW!#Table")
connection = engine.connect()
df = pd.read_sql_query(script,connection)
connection.close()
Only the first result set from the script is brought in.
Is there any way I can pull in both query results (maybe as a dictionary) so that I don't have to split the query into multiple scripts?
You could do the following:
queries = """
SELECT ColumnX_1, ColumnX_2, ColumnX_3
FROM Table_X
INNER JOIN (Etc etc...)
---
SELECT ColumnY_1, ColumnY_2, ColumnY_3
FROM Table_Y
INNER JOIN (Etc etc...)
""".split("---")
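Now you can run each query and concatenate the results (pd.concat stacks the rows, so this only makes sense when the queries return the same columns; otherwise keep the individual DataFrames):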
df = pd.concat([pd.read_sql_query(q, connection) for q in queries])
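Another option is to combine the two SELECTs with UNION (or UNION ALL to keep duplicate rows), i.e. do the concatenation in SQL; this requires both queries to return the same number of compatible columns.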
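To get the dictionary the question asks for, a minimal sketch is to split the script on separator lines and read each piece into its own DataFrame. This assumes any line made only of three or more dashes (like the "----------------------" line in the question's script) separates queries; read_sql_script and SEPARATOR are hypothetical names, and an in-memory SQLite database with invented tables stands in for the MSSQL connection.

```python
import re
import sqlite3

import pandas as pd

# A separator is any line consisting only of 3+ dashes (optionally padded).
SEPARATOR = re.compile(r"^[ \t]*-{3,}[ \t]*$", flags=re.MULTILINE)


def read_sql_script(script: str, connection) -> dict:
    """Run each query in a multi-query script; return {"query_1": df, ...}."""
    queries = [q.strip() for q in SEPARATOR.split(script) if q.strip()]
    return {f"query_{i}": pd.read_sql_query(q, connection)
            for i, q in enumerate(queries, start=1)}


# Stand-in database with invented data so the sketch can run.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_X (ColumnX_1 INTEGER, ColumnX_2 TEXT);
CREATE TABLE Table_Y (ColumnY_1 INTEGER);
INSERT INTO Table_X VALUES (1, 'a'), (2, 'b');
INSERT INTO Table_Y VALUES (10);
""")

script = """
SELECT ColumnX_1, ColumnX_2 FROM Table_X
----------------------
SELECT ColumnY_1 FROM Table_Y
"""
results = read_sql_script(script, conn)
for name, df in results.items():
    print(name, df.shape)
```

Splitting on whole separator lines, rather than on the substring "---", avoids producing stray fragments when the separator is longer than three dashes.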

Working with Cursors in Python

Searched the web and this forum without satisfaction. Using Python 2.7 and pyODBC on Windows XP. I can get the code below to run and generate two cursors from two different databases without problems. Ideally, I'd then like to join these result cursors thusly:
SELECT a.state, sum(b.Sales)
FROM cust_curs a
INNER JOIN fin_curs b
ON a.Cust_id = b.Cust_id
GROUP BY a.state
Is there a way to join cursors using SQL statements in python or pyODBC? Would I need to store these cursors in a common DB (SQLite3?) to accomplish this? Is there a pure python data handling approach that would generate this summary from these two cursors?
Thanks for your consideration.
Working code:
import pyodbc
#
# DB2 Financial Data Cursor
#
cnxn = pyodbc.connect('DSN=DB2_Fin;UID=;PWD=')
fin_curs = cnxn.cursor()
fin_curs.execute("""SELECT Cust_id, sum(Sales) as Sales
FROM Finance.Sales_Tbl
GROUP BY Cust_id""")
#
# Oracle Customer Data Cursor
#
cnxn = pyodbc.connect('DSN=Ora_Cust;UID=;PWD=')
cust_curs = cnxn.cursor()
cust_curs.execute("""SELECT Distinct Cust_id, gender, address, state
FROM Customers.Cust_Data""")
Cursors are simply objects used for executing SQL commands and retrieving the results. The data are not copied into a shared database, so SQL joins across the two cursors aren't possible. If you want to join the data, you'll need to have the two tables in the same database. Whether that means bringing both tables and their data into a SQLite database or doing it some other way depends on the specifics of your use case, but that would work in principle.
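A minimal sketch of the SQLite route, using only the standard library: fetch both result sets (for example with fin_curs.fetchall() and cust_curs.fetchall()), copy them into an in-memory SQLite database, and run the join from the question there. join_in_sqlite is a hypothetical helper name, and the sample rows below are invented; note that fetchall() holds every row in memory, which is fine for moderate result sizes.

```python
import sqlite3


def join_in_sqlite(fin_rows, cust_rows):
    """Copy two fetched result sets into in-memory SQLite and join them.

    fin_rows:  (Cust_id, Sales) rows, e.g. fin_curs.fetchall()
    cust_rows: (Cust_id, gender, address, state) rows, e.g. cust_curs.fetchall()
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE fin (Cust_id, Sales)")
    db.execute("CREATE TABLE cust (Cust_id, gender, address, state)")
    # tuple() converts driver-specific row objects into plain tuples.
    db.executemany("INSERT INTO fin VALUES (?, ?)", (tuple(r) for r in fin_rows))
    db.executemany("INSERT INTO cust VALUES (?, ?, ?, ?)", (tuple(r) for r in cust_rows))
    rows = db.execute("""
        SELECT a.state, SUM(b.Sales)
        FROM cust a
        INNER JOIN fin b ON a.Cust_id = b.Cust_id
        GROUP BY a.state
        ORDER BY a.state
    """).fetchall()
    db.close()
    return rows


summary = join_in_sqlite(
    [(1, 100.0), (2, 50.0), (3, 25.0)],
    [(1, "F", "x", "CA"), (2, "M", "y", "CA"), (3, "F", "z", "NY")],
)
print(summary)
```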
