I want to issue a DESC TABLE SQL command for a Snowflake table from Azure Databricks, but I can't quite figure it out. I'm not getting any errors, but I'm not getting any results either. Here's the Python code I'm using:
options_vcp = {
"sfUrl": snowflake_url,
"sfUser": user,
"sfPassword": password,
"sfDatabase": db,
"sfWarehouse": wh,
"sfSchema": sch
}
sfUtils = sc._jvm.net.snowflake.spark.snowflake.Utils
sfUtils.runQuery(options_vcp, "DESC TABLE myTable")
I can download the Snowflake table using the "sfDatabase", "sfWarehouse", etc. values, so they seem to be correct. I can run the DESC TABLE command in Snowflake and get correct results. But the only output I'm getting from Databricks is this:
Out[1]: JavaObject id=o315
Does anyone know how to display this JavaObject or know of a different method to run DESC TABLE from Databricks?
From doc: Executing DDL/DML SQL Statements:
The runQuery method returns only TRUE or FALSE. It is intended for statements that do not return a result set, for example DDL statements like CREATE TABLE and DML statements like INSERT, UPDATE, and DELETE. It is not useful for statements that return a result set, such as SELECT or SHOW.
An alternative approach is to query the INFORMATION_SCHEMA.COLUMNS view:
df = (spark.read.format(SNOWFLAKE_SOURCE_NAME)
    .options(**sfOptions)
    .option("query", "SELECT * FROM information_schema.columns WHERE table_name ILIKE 'myTable'")
    .load())
Related: Moving Data from Snowflake to Spark:
When using DataFrames, the Snowflake connector supports SELECT queries only.
Usage Notes
Currently, the connector does not support other types of queries (e.g. SHOW or DESC, or DML statements) when using DataFrames.
I suggest using get_ddl() in your select statement to get the object definition:
https://docs.snowflake.com/en/sql-reference/functions/get_ddl.html
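As a rough sketch of the get_ddl() suggestion, the helper below wraps GET_DDL in a SELECT so that the connector's DataFrame path accepts it. The function names build_ddl_query and read_table_ddl and the column alias ddl are made up for illustration; spark is assumed to be the active SparkSession and sf_options the same dictionary of sf* options used above.

```python
# Data source name of the Snowflake Spark connector.
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"


def build_ddl_query(table_name):
    """Wrap GET_DDL in a SELECT, the only query type the DataFrame reader accepts."""
    # Hypothetical helper: table_name is interpolated as-is, so pass trusted names only.
    return f"SELECT GET_DDL('TABLE', '{table_name}') AS ddl"


def read_table_ddl(spark, sf_options, table_name):
    """Return a DataFrame whose ddl column holds the table definition text."""
    return (
        spark.read.format(SNOWFLAKE_SOURCE_NAME)
        .options(**sf_options)
        .option("query", build_ddl_query(table_name))
        .load()
    )
```

Something like read_table_ddl(spark, options_vcp, "myTable").show(truncate=False) should then print the definition, column names and types included.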
I need to select some columns from a table with SQLAlchemy. Everything works fine except selecting the one column with '/' in its name. My query looks like:
query = select([func.sum(Table.c.ColumnName),
func.sum(Table.c.Column/Name),
])
Obviously the issue comes from the second line, with the column 'Column/Name'. Is there a way in SQLAlchemy to handle special characters in a column name?
edit:
It's all inside a class, but a simplified version of the process looks like this. I create an engine (all the necessary database details are inside the create_new_engine() function) and reflect all tables in the database into metadata.
def map(self):
    from sqlalchemy.engine.base import Engine
    # check if engine exist
    if not isinstance(self.engine, Engine):
        self.create_new_engine()
    self.metadata = MetaData({'schema': 'dbo'})
    self.metadata.reflect(bind=self.engine)
Then I map a single table with:
def map_table(self, table_name):
    table = "{schema}.{table_name}".format(schema=self.metadata.schema, table_name=table_name)
    table = self.metadata.tables[table]
    return table
In the end I use pandas read_sql_query to run above query with connection and engine established earlier.
I'm connecting to SQL Server.
Table.c is a plain Python object, so you can use getattr to look up the column by its name:
query = select([func.sum(Table.c.ColumnName),
func.sum(getattr(Table.c, 'Column/Name')),
])
So in your case (from the comments above):
func.sum(getattr(Table.c, 'cur/fees'))
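To see why getattr helps, here is a small pure-Python sketch. It uses a SimpleNamespace as a stand-in for Table.c (it is not SQLAlchemy's column collection), and the stored values are placeholder strings.

```python
from types import SimpleNamespace

# Stand-in for Table.c; a real column collection would hold Column objects.
columns = SimpleNamespace(ColumnName="plain column")

# Attribute names are ordinary strings, so they may contain characters
# such as '/' that the dotted syntax cannot express.
setattr(columns, "cur/fees", "column with a slash")

# columns.cur/fees would parse as (columns.cur) / fees, a division.
# getattr takes the name as a string, so nothing needs to be parsed.
slash_column = getattr(columns, "cur/fees")
plain_column = getattr(columns, "ColumnName")
```

SQLAlchemy's column collection also accepts dictionary-style lookup, so Table.c['cur/fees'] should be equivalent.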
I'm trying to create a new UDF inside Snowflake. In this UDF, I need to run a SQL query that returns a list of tables, and then I need to run some Python code around it, as in this example:
create or replace function SnowparkPrivateSchema()
returns string
language python
runtime_version=3.8
handler='SnowparkPrivateSchema'
as $$
def SnowparkPrivateSchema(self, symbol, quantity, price):
    get_tables = '''select table_name from INFORMATION_SCHEMA.TABLES'''
    for table in get_tables:
        '''create or replace table clone_user.{table} clone {table}'''
$$;
You cannot execute SQL inside a Snowflake UDF (this holds for Python/Snowpark as well as for SQL, JavaScript, etc.). A UDF only takes in one row of data and outputs a scalar value computed from that input.
You can execute SQL from a Snowpark Python stored procedure (or from a stored procedure written in any other supported language). A Python stored procedure can execute SQL just like a SQL stored procedure, and it can also invoke your UDF to perform some operation on the data returned by your SQL queries.
from snowflake.snowpark import Session
session = Session.builder.configs(connection_parameters).create()
get_tables = session.sql("select table_name from INFORMATION_SCHEMA.TABLES")
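A minimal sketch of what the handler of such a stored procedure might look like, assuming the goal from the question (cloning every table into a clone_user schema). The function name clone_tables, the target_schema parameter, and the current_schema() / BASE TABLE filter are illustrative assumptions; session is the Snowpark Session that Snowflake passes to a Python stored procedure handler.

```python
def clone_tables(session, target_schema="clone_user"):
    """Clone every base table of the current schema into target_schema."""
    # Assumption: only base tables of the current schema should be cloned.
    rows = session.sql(
        "select table_name from INFORMATION_SCHEMA.TABLES "
        "where table_schema = current_schema() and table_type = 'BASE TABLE'"
    ).collect()

    cloned = []
    for row in rows:
        table = row["TABLE_NAME"]
        # Quote identifiers so mixed-case or unusual names survive intact.
        session.sql(
            f'create or replace table {target_schema}."{table}" clone "{table}"'
        ).collect()
        cloned.append(table)

    # A stored procedure handler returns a value; here, a short summary string.
    return f"cloned {len(cloned)} table(s): {', '.join(cloned)}"
```

session.sql() only builds a lazy DataFrame, so each statement is followed by collect() to make it actually run.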
I am using Azure Databricks, and I have the following SQL query that I would like to convert into Spark Python code:
SELECT DISTINCT
personID,
SUM(quantity) as total_shipped
FROM(
SELECT p.personID,
p.systemID,
s.quantity
FROM shipped s
LEFT JOIN ordered p
on (s.OrderId = p.OrderNumber OR
substr(s.OrderId,1,6) = p.OrderNumber )
and p.ndcnum = s.ndc
where s.Dateshipped <= "2022-04-07"
AND personID is not null
group by personID
I intend to join the Spark DataFrames first and then perform the aggregated sum. However, I think I am making it more complicated than it is. This is what I have so far, but I am getting an InvalidSyntax error:
ordered.join(shipped, ((ordered("OrderId").or(ordered.select(substring(ordered.OrderId, 1, 6)))) === ordered("ORDERNUMBER")) &&
(ordered("ndcnumber") === ordered("ndc")),"left")
.show()
The part that confuses me is the OR condition in the SQL query: how do I convert it into a Spark Python statement?
That is the beauty of Databricks: you can run the same query directly by calling spark.sql(""" {your sql query here} """) and you will get the same results. Assign the return value to a variable and you have a DataFrame.
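A rough sketch of that suggestion, assuming shipped and ordered are registered as tables or temporary views and spark is the active SparkSession. The function name total_shipped_per_person and the subquery alias joined are made up here; the query is the one from the question with the subquery's closing parenthesis added, the redundant DISTINCT dropped, and the date literal in single quotes.

```python
TOTAL_SHIPPED_SQL = """
SELECT personID,
       SUM(quantity) AS total_shipped
FROM (
    SELECT p.personID,
           p.systemID,
           s.quantity
    FROM shipped s
    LEFT JOIN ordered p
           ON (s.OrderId = p.OrderNumber
               OR substr(s.OrderId, 1, 6) = p.OrderNumber)
          AND p.ndcnum = s.ndc
    WHERE s.Dateshipped <= '2022-04-07'
      AND p.personID IS NOT NULL
) joined
GROUP BY personID
"""


def total_shipped_per_person(spark):
    """Run the query unchanged through spark.sql and return the DataFrame."""
    return spark.sql(TOTAL_SHIPPED_SQL)
```

If the inputs only exist as DataFrames, calling shipped.createOrReplaceTempView("shipped") (and likewise for ordered) first makes them visible to spark.sql. Should you still want the DataFrame API, note that PySpark combines column conditions with | and & (each side in parentheses), not with .or(...), === or &&, which are Scala syntax.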
I'm attempting to use Python with SQLAlchemy to download some data, create a temporary staging table on a Teradata server, and then MERGE that table into another table that I've created to store this data permanently. I'm using sql = sqlalchemy.text(merge) and td_engine.execute(sql), where merge is a string similar to the one below:
MERGE INTO perm_table as p
USING temp_table as t
ON p.Id = t.Id
WHEN MATCHED THEN
UPDATE
SET col1 = t.col1,
col2 = t.col2,
...
col50 = t.col50
WHEN NOT MATCHED THEN
INSERT (col1,
col2,
...
col50)
VALUES (t.col1,
t.col2,
...
t.col50)
The script runs all the way to the end without error and the SQL executes properly through Teradata Studio, but for some reason the table won't update when I execute it through SQLAlchemy. However, I've also run different SQL expressions, like the insert that populated perm_table from the same python script and it worked fine. Maybe there's something specific to the MERGE and SQLAlchemy combo?
Since you're using the engine directly, without using a transaction, you're probably (barring unseen configuration on your part) relying on SQLAlchemy's version of autocommit, which works by detecting data changing operations such as INSERTs etc. Possibly MERGE is not one of the detected operations. Try
sql = sqlalchemy.text(merge).execution_options(autocommit=True)
td_engine.execute(sql)
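If relying on autocommit detection feels fragile, an explicit transaction is another option. The sketch below assumes td_engine is the SQLAlchemy engine from the question; the helper name run_in_transaction is made up. engine.begin() opens a connection and a transaction, commits when the block exits normally, and rolls back if an exception is raised.

```python
def run_in_transaction(engine, statement):
    """Execute one statement inside an explicit transaction."""
    # engine.begin() commits on normal exit and rolls back on an exception,
    # so the outcome does not depend on autocommit detecting MERGE.
    with engine.begin() as connection:
        connection.execute(statement)


# Intended use with the objects from the question (not run here):
# run_in_transaction(td_engine, sqlalchemy.text(merge))
```

This form also carries over to SQLAlchemy 2.0, where the autocommit execution option and engine.execute() were removed.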
I'm new to Python and am trying to run SQL code in Python and get the results in a pandas DataFrame. The code below runs when I have a simple SQL query, but when I try to run a very long and complex query with proper SQL formatting, it fails. Is there a module or option I can use so that Python accepts the indentation and line breaks inside the SQL query?
cnxn=...#here it's the connection to my sql server database
sql_2=
r'( Select distinct NPI,
practice_code=RIGHT('000'+CAST(newcode AS VARCHAR(3)),3),
SRcode,
StandardZip,
Zipclass,
CountySSA,
PrimaryCountySSA,
PrimaryCounty,
PrimaryCountyClass,
Lat_Clean,
Long_Clean
FROM Docusinporactice a
LEFT JOIN locationInfo b
on a.zip=b.zip
)
sql_data_test=pd.read_sql_query(sql_2, cnxn)
r = """ Select distinct NPI,
practice_code=RIGHT('000'+CAST(newcode AS VARCHAR(3)),3),
SRcode,
StandardZip,
Zipclass,
CountySSA,
PrimaryCountySSA,
PrimaryCounty,
PrimaryCountyClass,
Lat_Clean,
Long_Clean
FROM Docusinporactice a
LEFT JOIN locationInfo b
on a.zip=b.zip
"""
Written as a triple-quoted string like this, the SQL statement should work.
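To illustrate, here is a small self-contained sketch in which the standard-library sqlite3 module stands in for the SQL Server connection, and the table practices and its rows are invented. The line breaks and indentation inside the triple-quoted string are plain whitespace to the database, so the query can be laid out freely.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real cnxn.
cnxn = sqlite3.connect(":memory:")
cnxn.execute("CREATE TABLE practices (NPI TEXT, zip TEXT)")
cnxn.executemany(
    "INSERT INTO practices VALUES (?, ?)",
    [("100", "02139"), ("100", "02139"), ("200", "94105")],
)

# Triple quotes let the string span several lines.
sql_2 = """
    SELECT DISTINCT
           NPI,
           zip
    FROM practices
    ORDER BY NPI
"""

rows = cnxn.execute(sql_2).fetchall()
print(rows)
```

With pandas, the same string can be passed as pd.read_sql_query(sql_2, cnxn).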