Databricks not updating in SQL query - python

I am trying to replace special characters in a table column using a SQL query. However, I get the following error. Can anyone tell me what I did wrong or how I should approach this?
SQL QUERY
UPDATE wine SET description = REPLACE(description, '%', '')
ERROR
error in sql statement: analysisexception: update destination only supports delta sources.

Databricks only supports UPDATE for Delta (Delta Lake) tables. The error message indicates that you are trying to run the update on a non-Delta table, so you would have to convert your data source to Delta first. For Parquet it is very simple:
CONVERT TO DELTA parquet.`path/to/table` [NO STATISTICS]
[PARTITIONED BY (col_name1 col_type1, col_name2 col_type2, ...)]
See the Documentation for more details.

-- Try this for a partitioned table (note the backticks around the path):
CONVERT TO DELTA parquet.`s3://path/to/table`
PARTITIONED BY (column_name INT);
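If the data already lives behind a metastore table, a rough end-to-end sketch from a Python cell could look like the following. This assumes a recent Databricks runtime where CONVERT TO DELTA accepts a table name, and that the wine table from the question is stored as Parquet; it is a sketch, not the original answer's code.
# Assumptions: `wine` is a metastore table backed by Parquet files
spark.sql("CONVERT TO DELTA wine")  # one-time, in-place conversion to Delta
spark.sql("UPDATE wine SET description = REPLACE(description, '%', '')")  # now the UPDATE succeeds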

Related

Describe Snowflake table from Azure Databricks

I want to issue a DESC TABLE SQL command for a Snowflake table using Azure Databricks, but I can't quite figure it out! I'm not getting any errors, but I'm not getting any results either. Here's the Python code I'm using:
options_vcp = {
    "sfUrl": snowflake_url,
    "sfUser": user,
    "sfPassword": password,
    "sfDatabase": db,
    "sfWarehouse": wh,
    "sfSchema": sch
}
sfUtils = sc._jvm.net.snowflake.spark.snowflake.Utils
sfUtils.runQuery(options_vcp, "DESC TABLE myTable")
I can download the Snowflake table using the "sfDatabase", "sfWarehouse", etc. values, so they seem to be correct. I can run the DESC TABLE command in Snowflake and get correct results. But the only output I'm getting from Databricks is this:
Out[1]: JavaObject id=o315
Does anyone know how to display this JavaObject or know of a different method to run DESC TABLE from Databricks?
From doc: Executing DDL/DML SQL Statements:
The runQuery method returns only TRUE or FALSE. It is intended for statements that do not return a result set, for example DDL statements like CREATE TABLE and DML statements like INSERT, UPDATE, and DELETE. It is not useful for statements that return a result set, such as SELECT or SHOW.
An alternative approach is to use the INFORMATION_SCHEMA.COLUMNS view:
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df = (spark.read.format(SNOWFLAKE_SOURCE_NAME)
      .options(**sfOptions)  # sfOptions: the connection options dict from the question
      .option("query", "SELECT * FROM information_schema.columns WHERE table_name ILIKE 'myTable'")
      .load())
Related: Moving Data from Snowflake to Spark:
When using DataFrames, the Snowflake connector supports SELECT queries only.
Usage Notes
Currently, the connector does not support other types of queries (e.g. SHOW or DESC, or DML statements) when using DataFrames.
I suggest using get_ddl() in your select statement to get the object definition:
https://docs.snowflake.com/en/sql-reference/functions/get_ddl.html
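For example, a rough sketch of running get_ddl() through the connector's query option, reusing SNOWFLAKE_SOURCE_NAME and sfOptions from above (the table name is the placeholder from the question; this is an assumption, not tested against the asker's setup):
ddl_df = (spark.read.format(SNOWFLAKE_SOURCE_NAME)
          .options(**sfOptions)
          .option("query", "SELECT GET_DDL('TABLE', 'myTable') AS ddl")
          .load())
ddl_df.show(truncate=False)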

postgres SQL Python Pandas

I have a SQL query that works through a Grafana dashboard. Is it possible to recreate this in Pandas?
In Grafana:
SELECT
"time" AS "time",
metric AS metric,
value
FROM volttron
WHERE
$__timeFilter("time") AND
kv_tags->'equip_name' = '["35201"]' AND
'power' = any(m_tags)
ORDER BY 1,2
Trying to recreate it in Pandas over a PostgreSQL connection with psycopg2:
eGauge35201 = pd.read_sql('SELECT "time" AS "time", metric AS metric, value FROM volttron WHERE $__timeFilter("time") AND kv_tags->equip_name = ["35201"] AND power = any(m_tags) ORDER BY 1,2', dbconn)
This throws a lot of errors:
DatabaseError: Execution failed on sql 'SELECT "time" AS "time", metric AS metric, value FROM volttron WHERE $__timeFilter("time") AND kv_tags->equip_name = ["35201"] AND power = any(m_tags) ORDER BY 1,2': syntax error at or near "$"
LINE 1: ...c AS metric, value FROM slipstream_volttron WHERE $__timeFil...
I'm trying to build a dataframe directly... Sorry, still learning databases; any tips greatly appreciated...
I'm not a Grafana user, but I'm pretty sure that $__timeFilter and m_tags are things that come from Grafana's end and are replaced by proper PostgreSQL expressions when the query is actually sent to the database, after you have defined it.
The query also uses several reserved SQL words: "time", which is correctly escaped, and value, which isn't. This can lead to some unwanted behaviour.
I would rewrite this query for Pandas so that it reads properly. But without knowing what e.g. $__timeFilter is replaced with, we cannot know what it should be. You could e.g. monitor Grafana's or the database's logs to find out.
From a quick Google search, this looks promising.
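As a rough sketch of what the hand-expanded query might look like in pandas: the connection details and time range below are placeholders, and the assumption that $__timeFilter("time") simply expands to a BETWEEN on "time" is a guess, not verified against this database.
import pandas as pd
import psycopg2

# Connection details are placeholders
dbconn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="...")

# $__timeFilter("time") is assumed to expand to a plain BETWEEN on "time";
# the rest of the WHERE clause is copied from the working Grafana query.
sql = """
    SELECT "time", metric, value
    FROM volttron
    WHERE "time" BETWEEN %(start)s AND %(end)s
      AND kv_tags->'equip_name' = '["35201"]'
      AND 'power' = ANY(m_tags)
    ORDER BY 1, 2
"""
eGauge35201 = pd.read_sql(sql, dbconn, params={"start": "2021-01-01", "end": "2021-01-02"})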

problem with transformation to_timestamp python sql on databricks

I am trying to implement a transformation in SQL on Databricks (from a Python notebook). I have tried several ways without success; could someone please validate this:
%sql
SELECT aa.AccountID__c as AccountID__c_2,
aa.LastModifiedDate,
to_timestamp(aa.LastModifiedDate, "yyyy-MM-dd HH:mm:ss.SSS") as test
FROM EVENTS aa
The output (screenshot omitted) shows that the conversion is not correct; the query still executes on the engine, but the converted column comes back null.
I have also tried performing a substring on the LastModifiedDate field from 1 to 19, but without success ...
The date format you provided does not match the actual format of that column, which is why you got null. That said, for standard date formats like the one you have, there is no need to provide a format at all; simply using to_timestamp will give the correct result.
%sql
SELECT aa.AccountID__c as AccountID__c_2,
aa.LastModifiedDate,
to_timestamp(aa.LastModifiedDate) as test
FROM EVENTS aa
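If you prefer doing the same from Python instead of a %sql cell, here is a small PySpark sketch (assuming the EVENTS table is registered in the metastore and LastModifiedDate is in a standard timestamp format):
from pyspark.sql import functions as F

events = spark.table("EVENTS")
result = events.select(
    F.col("AccountID__c").alias("AccountID__c_2"),
    F.col("LastModifiedDate"),
    # no format string needed for standard timestamp strings
    F.to_timestamp("LastModifiedDate").alias("test"),
)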

Sql syntax error when insert into table with python on mysql database

I'm trying to apply the code from this video to my MySQL database. When I enter the values by hand in the Python code, it works. However, when I use Python dictionaries to pass the entries from the GUI, as in the video, it says: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ':b_name, :b_category, :b_status, :b_author)' at line 1
In the video he uses SQLite; this could be the problem, but I'm not sure.
My code is here:
# Insert Into Table
c.execute("INSERT INTO `books` (`bookName`, `bookCategory`, `BookCurrentStatus`, `bookAuthor`) VALUES (:b_name, :b_category, :b_status, :b_author)",
          {
              'b_name': b_name.get(),
              'b_category': b_category.get(),
              'b_status': b_status.get(),
              'b_author': b_author.get()
          })
MySQLdb only supports the format and pyformat parameter styles. The named parameter style used in the video isn't supported.
I believe the named placeholders are not actually being substituted with your current setup; with MySQLdb you need the pyformat style, i.e. %(name)s placeholders. Try this:
# Insert Into Table
c.execute("INSERT INTO `books` (`bookName`, `bookCategory`, `BookCurrentStatus`, `bookAuthor`) VALUES (%(b_name)s, %(b_category)s, %(b_status)s, %(b_author)s)",
          {
              'b_name': b_name.get(),
              'b_category': b_category.get(),
              'b_status': b_status.get(),
              'b_author': b_author.get()
          })
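If you prefer positional placeholders, MySQLdb's format style works too; a small sketch using the same GUI entry widgets from the question:
c.execute(
    "INSERT INTO `books` (`bookName`, `bookCategory`, `BookCurrentStatus`, `bookAuthor`) "
    "VALUES (%s, %s, %s, %s)",
    (b_name.get(), b_category.get(), b_status.get(), b_author.get()),
)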

Converting JSON into Python Dict with Postgresql data imported with SQLAlchemy

I've got a little bit of a tricky question here regarding converting JSON strings into Python data dictionaries for analysis in Pandas. I've read a bunch of other questions on this but none seem to work for my case.
Previously, I was simply using CSVs (and Pandas' read_csv function) to perform my analysis, but now I've moved to pulling data directly from PostgreSQL.
I have no problem using SQLAlchemy to connect to my engine and run my queries. My whole script runs the same as it did when I was pulling the data from CSVs. That is, until it gets to the part where I'm trying to convert one of the columns (namely, the 'config' column in the sample text below) from JSON into a Python dictionary. The ultimate goal of converting it into a dict is to be able to count the number of responses under the "options" field within the "config" column.
df = pd.read_sql_query('SELECT questions.id, config from questions ', engine)
df = df['config'].apply(json.loads)
df = pd.DataFrame(df.tolist())
df['num_options'] = np.array([len(row) for row in df.options])
When I run this, I get the error "TypeError: expected string or buffer". I tried converting the data in the 'config' column to string from object, but that didn't do the trick (I get another error, something like "ValueError: Expecting property name...").
If it helps, here's a snippet of data from one cell in the 'config' column (the code should return the result '6' for this snippet, since there are 6 options):
{"graph_by":"series","options":["Strongbow Case Card/Price Card","Strongbow Case Stacker","Strongbow Pole Topper","Strongbow Base wrap","Other Strongbow POS","None"]}
My guess is that SQLAlchemy does something weird to JSON strings when it pulls them from the database? Something that doesn't happen when I'm just pulling CSVs from the database?
In recent Psycopg versions the PostgreSQL json(b) adaptation to Python is transparent, and Psycopg is the default SQLAlchemy driver for PostgreSQL, so the config column already arrives as Python dicts:
# no json.loads needed: each cell of df['config'] is already a dict
options = df['config'].apply(lambda cfg: cfg['options'])
From the Psycopg manual:
Psycopg can adapt Python objects to and from the PostgreSQL json and jsonb types. With PostgreSQL 9.2 and following versions adaptation is available out-of-the-box. To use JSON data with previous database versions (either with the 9.1 json extension, but even if you want to convert text fields to JSON) you can use the register_json() function.
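Putting that together for the original goal of counting options per row, a minimal sketch reusing the engine and questions table from the question:
import pandas as pd

# each value in the config column is already a Python dict, so no json.loads step
df = pd.read_sql_query("SELECT questions.id, config FROM questions", engine)
df["num_options"] = df["config"].apply(lambda cfg: len(cfg["options"]))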
Just a SQLAlchemy query:
q = session.query(
    Question.id,
    func.jsonb_array_length(Question.config["options"]).label("len")
)
Pure SQL and pandas' read_sql_query:
sql = """\
SELECT questions.id,
jsonb_array_length(questions.config -> 'options') as len
FROM questions
"""
df = pd.read_sql_query(sql, engine)
Combine both (my favourite):
# take `q` from the above
df = pd.read_sql(q.statement, q.session.bind)
