Looked around but haven't been able to find this question yet... I'm working in a Jupyter notebook writing Python code, and a lot of the datasets we use are in Teradata, so my code usually looks like this:
cs = '''
(
select *
from SST01.data
where snap_dt = '2020-08-31'
) foo'''
dfclnt_status = spark.read.format('jdbc') \
.option('url', 'jdbc:teradata://teradataservernamehere') \
.option('driver', 'com.teradata.jdbc.TeraDriver') \
.option('user', 'redacted') \
.option('password', PASS) \
.option('dbtable', cs) \
.load()
I know that in Spark, when running queries against our Hive tables, I can pass date variables using '{VAR}', but when I try to apply the same thing in queries against Teradata I get this error:
Py4JJavaError: An error occurred while calling o233.load.
: java.sql.SQLException: [Teradata Database] [TeraJDBC 16.30.00.00] [Error 3535] [SQLState 22003] A character string failed conversion to a numeric value.
How is it possible to pass date variables into Teradata?
EDIT: My variables look like this:
END_DT='2020-08-31'
The easiest way is probably to explicitly convert your field to a date, like so:
to_date('2020-08-31')
If you're still getting an error, take a look at the table DDL. The error says the field is numeric.
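Since the dbtable subquery is pushed down to Teradata as-is, one option (a sketch, using the END_DT variable from the question's edit and Teradata's ANSI date-literal syntax) is to interpolate the Python variable into the subquery with an explicit DATE literal, so no implicit string-to-numeric conversion happens on the server:

```python
# Sketch: interpolate a Python date variable into the pushdown subquery.
# END_DT and the table/column names come from the question; the
# DATE '...' literal is ANSI date syntax, which sidesteps the implicit
# conversion that can raise Teradata error 3535.
END_DT = '2020-08-31'

cs = f'''
(
select *
from SST01.data
where snap_dt = DATE '{END_DT}'
) foo'''

# The resulting string is then passed unchanged as the 'dbtable' option:
# spark.read.format('jdbc')....option('dbtable', cs).load()
print(cs)
```

Note this is plain string interpolation, so it is only safe when END_DT comes from your own code, not from untrusted input.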
I'm trying to write a dataframe from a PySpark job that runs on AWS Glue to a Databricks cluster, and I'm facing an issue I can't solve.
Here's the code that does the writing:
spark_df.write.format("jdbc") \
.option("url", jdbc_url) \
.option("dbtable", table_name) \
.option("password", pwd) \
.mode("overwrite") \
.save()
Here's the error I'm getting:
[Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: \nno viable alternative at input '(\"TEST_COLUMN\"'(line 1, pos 82)\n\n== SQL ==\nCREATE TABLE MYDATABASE.MYTABLE (\"TEST_COLUMN\" TEXT
It seems the issue comes from the fact that the generated SQL statement wraps the column name in double quotes instead of backticks, and it fails because of that.
I thought simple things like that would be managed automatically by Spark, but it seems that's not the case.
Do you know how I can solve this issue?
Thanks in advance!
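One possible workaround (a sketch, not a tested fix: it assumes the failure happens only in the CREATE TABLE statement Spark generates during overwrite) is to create the target table yourself with backtick-quoted identifiers, then let Spark append into the existing table instead of recreating it:

```python
# Sketch: build the CREATE TABLE DDL ourselves with backtick-quoted
# identifiers (Spark's generated DDL uses double quotes, which the
# Databricks SQL parser rejects), then write with mode("append").
# Table and column names here are illustrative placeholders.

def backtick_ddl(table_name, columns):
    """Render a CREATE TABLE statement with backtick-quoted column names.

    columns: list of (name, sql_type) pairs.
    """
    cols = ", ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({cols})"

ddl = backtick_ddl("MYDATABASE.MYTABLE", [("TEST_COLUMN", "STRING")])
# Run the DDL on the cluster first (e.g. in a Databricks notebook or via
# a JDBC statement), then append rather than overwrite:
# spark_df.write.format("jdbc") \
#     .option("url", jdbc_url) \
#     .option("dbtable", "MYDATABASE.MYTABLE") \
#     .option("password", pwd) \
#     .mode("append") \
#     .save()
print(ddl)
```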
I am trying to make the select statement dependent on an args variable I am passing via a Python script (in this case args.type='hello').
It looks like this:
case when '{{type}}' = 'hello'
then
SELECT
name from table1
else
SELECT
city from table2 where code='usa'
end
The error I am getting:
syntax error unexpected 'case'.
syntax error unexpected 'else'.
syntax error unexpected ')'.
I also tried the IFF function but ran into the same issues.
If you were sending this SQL to Snowflake and it was failing due to a syntax error, I would expect you to get an error like "SQL compilation error: ...". Therefore I wonder if the issue isn't in your Python program instead.
Could you share more?
Were you trying to set some parameters?
The snowflake python connector supports several syntaxes:
format: .execute("... WHERE my_column = %s", (value,))
pyformat: .execute("... WHERE my_column = %(name)s", {"name": value})
qmark: .execute("... WHERE my_column = ?", (value,))
numeric: .execute("... WHERE my_column = :1", (value,))
If you are using Python 3 you can also use f-strings, e.g. f"{pythonvar}".
Could you give us more context about what you are doing?
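As a sketch of the usual approach (assuming the table and column names from the question): rather than choosing between two whole SELECTs inside SQL with CASE, which is an expression and cannot do that, pick the statement in Python and bind any values as parameters:

```python
# Sketch: choose the SQL statement in client code instead of with a SQL
# CASE. CASE is an expression, so it cannot select between two complete
# SELECT statements; a Python branch does that job.
def build_query(type_):
    if type_ == 'hello':
        # no bind parameters needed for this branch
        return "SELECT name FROM table1", ()
    # qmark-style placeholder; the connector substitutes the value safely
    return "SELECT city FROM table2 WHERE code = ?", ('usa',)

sql, params = build_query('hello')
# cursor.execute(sql, params)   # with a snowflake.connector cursor
print(sql, params)
```

Note the connector's default paramstyle is pyformat; qmark can be enabled with snowflake.connector.paramstyle = 'qmark' before connecting.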
I am working in a Jupyter Notebook on some standard DB2 table functions. I'd like to be able to refer to values returned from one SQL statement in other SQL statements, but the syntax of the variable reference is tripping me up. This is the code I use to get the values I want to use in later statements:
mgd_baseline = %sql select float(rows_read) rows_read \
, float(rows_returned) rows_returned \
from table(mon_get_database(-2)) as mgd
Then I would like to use it like this:
if mgd_baseline[0].rows_read > 0 or mgd_baseline[0].rows_returned > 0:
%sql select decimal((float(rows_read)-:mgd_baseline[0].rows_read/(float(rows_returned)-:mgd_baseline[0].rows_returned),10,0) read_eff \
from table(mon_get_database(-2)) as mgd
But that fails with this error message:
(ibm_db_dbi.ProgrammingError) ibm_db_dbi::ProgrammingError: SQLNumResultCols failed: [IBM][CLI Driver][DB2/NT64] SQL0104N An unexpected token "," was found following "-?[0].rows_returned)". Expected tokens may include: ")". SQLSTATE=42601\r SQLCODE=-104 [SQL: 'select decimal((float(rows_read)-?[0].rows_read/(float(rows_returned)-?[0].rows_returned),10,0) read_eff from table(mon_get_database(-2)) as mgd'] [parameters: ([(61959.0, 3219.0)], [(61959.0, 3219.0)])]
It looks to me like the sql magic is not passing the value on the way I would expect. It seems to treat the opening square bracket as the end of the host variable name. I am not familiar enough with Python to know what notation would make this work.
I know I can do this as a workaround:
if mgd_baseline[0].rows_read > 0 or mgd_baseline[0].rows_returned > 0:
bl_rows_read=mgd_baseline[0].rows_read
bl_rows_returned=mgd_baseline[0].rows_returned
read_eff=%sql select decimal((float(rows_read)-:bl_rows_read)/(float(rows_returned)-:bl_rows_returned),16,2) read_eff \
from table(mon_get_database(-2)) as mgd
Due to some future plans, I would prefer to not have to do the additional assignment.
Is there any way to use those values (mgd_baseline[0].rows_read, mgd_baseline[0].rows_returned) directly in my sql magic sql statement without reassigning them?
I figured it out. This approach does not use host variables (which allow generic compilation with different values for better package cache use), but for this particular application I don't particularly care that they aren't treated as host variables. Here is what works:
if mgd_baseline[0].rows_read > 0 or mgd_baseline[0].rows_written > 0:
read_eff=%sql select decimal((float(rows_read)-{mgd_baseline[0].rows_read})/(float(rows_returned)-{mgd_baseline[0].rows_returned}),16,2) read_eff \
from table(mon_get_database(-2)) as mgd
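The curly-brace references appear to be expanded with Python's str.format over the notebook namespace before the statement reaches DB2 (an assumption based on the behavior above). A minimal sketch of that expansion, using a namedtuple to stand in for a result row:

```python
# Sketch: how {mgd_baseline[0].rows_read}-style references get expanded.
# str.format supports both index and attribute access inside the braces,
# which is why this notation works where :host_variable syntax did not.
from collections import namedtuple

Row = namedtuple('Row', ['rows_read', 'rows_returned'])
mgd_baseline = [Row(rows_read=61959.0, rows_returned=3219.0)]

template = ("select decimal((float(rows_read)-{mgd_baseline[0].rows_read})"
            "/(float(rows_returned)-{mgd_baseline[0].rows_returned}),16,2) read_eff "
            "from table(mon_get_database(-2)) as mgd")

expanded = template.format(mgd_baseline=mgd_baseline)
print(expanded)
```

The values are pasted into the SQL text as literals, which is exactly why they are no longer host variables.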
I'm trying to execute a raw query that is built dynamically.
To ensure the parameters are inserted in the right positions, I'm using named parameters.
This works for SQLite without any problems (all my tests succeed), but when I run the same code against MariaDB it fails...
A simple example query:
SELECT u.*
FROM users_gigyauser AS u
WHERE u.email like :u_email
GROUP BY u.id
ORDER BY u.last_login DESC
LIMIT 60 OFFSET 0
Parameters are:
{'u_email': '%test%'}
The error I get is a default syntax error as the parameter is not replaced.
I tried using '%' as an indicator, but this resulted in SQL trying to parse
%u[_email]
and that returned a type error.
I'm executing the query like this:
raw_queryset = GigyaUser.objects.raw(
self.sql_fetch, self._query_object['params']
)
Or when counting:
cursor.execute(self.sql_count, self._query_object['params'])
Both give the same error on MariaDB but work on SQLite (using the ':' indicator).
Now, what am I missing?
edit:
The placeholder needs an s suffix, as follows:
%(u_email)s
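A minimal sketch of why the s matters: MySQL-family drivers implement pyformat binding with Python's %-style string interpolation, so each placeholder must be a complete %(name)s conversion. Roughly (simplified; real drivers also escape and quote the values):

```python
# Sketch: pyformat placeholders are ordinary %-style conversions.
# The driver escapes each value, then applies the mapping to the
# template, much like this.
sql = "SELECT * FROM users_gigyauser WHERE email LIKE %(u_email)s"
params = {'u_email': "'%test%'"}   # value shown already escaped/quoted

bound = sql % params
print(bound)
```

Percent signs inside the bound value are safe because substitution runs over the template, not over the values.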
If you are using SQLite3, for some reason the %(name)s syntax will not work.
You have to use the :name syntax instead if you want to pass your params as a {"name": "value"} dictionary.
This is contrary to the documentation, which states the first syntax should work with all DB engines.
Here's the source of the issue:
https://code.djangoproject.com/ticket/10070#comment:18
I am using the Python ingresdbi module for connectivity with a Vectorwise database.
To describe a table I am using the code below:
import ingresdbi
local_db = ingresdbi.connect(database ='x',uid ='y',driver ='z',pwd ='p')
local_db_cursor = local_db.cursor()
local_db_cursor.execute('help tran_applog ; ' )
I am getting this error:
Syntax error. Last symbol read was: 'help'.
Solutions will be appreciated. Thanks
The problem you've got is that 'help' isn't a real SQL statement that's understood by the DBMS server. It's really a terminal monitor command that gets converted into some queries against the system catalogs under the covers.
The alternative depends a little on what you're trying to get from the "describe table". The system catalogs relating to table and column information are iitables and iicolumns and you can do a select against them. Check the documentation or experiment.
Alternatively there appears to be a row descriptor you can get from ingresdbi, see the example here http://community.actian.com/wiki/Python_Row_Description
HTH
I believe you should do it like in any other shell script: echo "help tran_applog;" | sql mydatabase
Reason: "HELP" is not a standard SQL statement.
As suggested by PaulM, your best option to get metadata about tables is to query the system catalogs (iitables, iicolumns, iirelation, etc).
Start with something like:
SELECT C.column_name, C.column_datatype
FROM iitables T, iicolumns C
WHERE T.table_name = C.table_name
AND T.table_name = 'tran_applog';\g
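Putting the two suggestions together, here is a sketch of running that catalog query from ingresdbi instead of the terminal monitor (the ? marker assumes the driver's DB-API qmark paramstyle, and the connection values are the placeholders from the question; note the \g terminator belongs only to the terminal monitor and is dropped here):

```python
# Sketch: describe a table by querying the Ingres system catalogs
# (iitables/iicolumns) through ingresdbi, since 'help' is a terminal
# monitor command, not a SQL statement the server understands.

def describe_table_sql():
    """SQL listing column names and types for a named table."""
    return ("SELECT c.column_name, c.column_datatype "
            "FROM iitables t, iicolumns c "
            "WHERE t.table_name = c.table_name "
            "AND t.table_name = ?")

# Usage against a live database (requires the ingresdbi driver):
# import ingresdbi
# local_db = ingresdbi.connect(database='x', uid='y', driver='z', pwd='p')
# cur = local_db.cursor()
# cur.execute(describe_table_sql(), ('tran_applog',))
# for name, dtype in cur.fetchall():
#     print(name, dtype)
print(describe_table_sql())
```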