Python code from Databricks to connect to SQL Server - python

I am trying to execute Python code from Databricks that establishes a connection from Python to SQL Server using JDBC.
I used the 'jaydebeapi' Python library, and when I run the code it throws an error: "JayDeBeApi throws AttributeError: '_jpype.PyJPField' object has no attribute 'getStaticAttribute'".
I searched the internet and found that the JPype library used by jaydebeapi is the problem, so I downgraded it to version 0.6.3.
But I am still getting the same error. Can anyone explain how to make this change work in Databricks?
Or is there an alternative library I can use?

Why not directly follow the official Databricks documents below to install the Microsoft JDBC Driver for SQL Server for the Spark connector and refer to the sample Python code for connecting to SQL Server over JDBC (a minimal sketch follows the links)?
SQL Databases using the Apache Spark Connector
SQL Databases using JDBC and its Python example with the JDBC URL of MS SQL Server
If you are using Azure, there are equivalent documents for Azure Databricks, as below.
SQL Databases using the Apache Spark Connector for Azure Databricks
SQL Databases using JDBC for Azure Databricks
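For reference, here is a minimal sketch of reading a SQL Server table from a Databricks notebook with Spark's built-in JDBC data source; spark and display are the notebook globals, and the server, database, table, and credentials are placeholders rather than values from the question:

# Minimal sketch: read a SQL Server table via Spark's JDBC data source in a Databricks notebook.
# Server, database, table, and credentials are placeholders.
jdbc_url = "jdbc:sqlserver://<your-server>.database.windows.net:1433;database=<your-database>"

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.my_table")  # table name or a parenthesized subquery
      .option("user", "<username>")
      .option("password", "<password>")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .load())

display(df)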

This is a known issue with JayDeBeApi; you can check out the issue on GitHub.
Due to a bug in 0.6.3, private variables were exposed as part of the interface. 0.6.3 also had a default class customizer which automatically created a property with a getter and setter if the methods matched the Java bean pattern. This property customizer was loaded late, after many common java.lang classes were already loaded, and was not retroactive, so only user-loaded classes that happened to load after the initializer got the customization. The private-variable bug would mask the property customizer, as the property customizer was not supposed to override fields. Some libraries were unknowingly accessing private variables while assuming they were using the property customizer.
The customizer was both unnecessary and resulted in frequent errors for new programmers. The buggy behavior has been removed, and the problematic property customizer has been disabled by default in 0.7.
You can add lines to your module to re-enable the old property behavior, but this will not re-enable the previous buggy access to private variables. Code that exploited the previous behavior to bypass Java getters/setters will need to use the reflection API instead.
To enable the property customizer, use:

try:
    import jpype.beans
except ImportError:
    pass
Hope this helps.

Related

Cloud Run: how do I set check_same_thread=False?

My project, which runs in Cloud Run on Google Cloud Platform (GCP), generated errors for hours before it went back to normal by itself: "SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 68387105408768 and this is thread id 68386614675200."
Our code is written in Python with Flask and no SQLite is involved. I saw suggestions to set check_same_thread to False. May I know where I can set this in Cloud Run or GCP? Thanks.
That setting has nothing to do with your runtime environment; it is passed when initializing a connection with sqlite3 (https://docs.python.org/3/library/sqlite3.html#module-functions), so if you really aren't creating a SQLite connection it won't help you much.
That being said, I find it hard to believe that you are getting that error without using SQLite. More likely, you are using SQLite via some dependency.
Since sqlite3 is part of the Python standard library, it might not be trivial to figure out which dependency uses it.
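For context, a minimal sketch of where check_same_thread actually lives: it is an argument to sqlite3.connect in the standard library (or to whatever library wraps it), not a Cloud Run or GCP setting. The file name below is a placeholder:

import sqlite3
import threading

# check_same_thread is set on the connection itself, not in the runtime environment.
# Passing False lets the connection be used from other threads, but you then have to
# serialize access yourself (e.g. with a lock).
conn = sqlite3.connect("example.db", check_same_thread=False)
lock = threading.Lock()

def query(sql):
    with lock:
        return conn.execute(sql).fetchall()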

Connecting Caché database in Azure function

I'm looking for a way to "simply" access a Caché database using Python (I need to run SQL queries on this database).
I've heard about a Python package (Intersys) but I can't find it anymore (having this package would be the simplest way).
I've tried using a pyodbc connection with the appropriate Caché driver: it works on my machine, but when I try to deploy the function to production (Linux OS), the driver's file is not found.
Thank you
The only way I know to make this work with Python is to use pyodbc with the InterSystems ODBC driver.
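A rough sketch of what that looks like with pyodbc; the driver name, host, port, namespace, and credentials are placeholders, and on Linux the driver also has to be registered in odbcinst.ini (a missing registration or driver file is a common reason it works locally but not in production):

import pyodbc

# DSN-less connection string; the driver name must match what is registered in odbcinst.ini.
# Host, port, namespace, and credentials are placeholders.
conn = pyodbc.connect(
    "DRIVER={InterSystems ODBC};"       # assumed driver name; check your ODBC registration
    "SERVER=my-cache-host;PORT=1972;"
    "DATABASE=MYNAMESPACE;"
    "UID=myuser;PWD=mypassword"
)

cursor = conn.cursor()
cursor.execute("SELECT TOP 5 * FROM MySchema.MyTable")
for row in cursor.fetchall():
    print(row)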

Dataflow jobs are failed due to Beam-nuggets referencing to sqlalchemy

We have created an ETL pipeline in GCP which reads data from MySQL and migrates it to BigQuery. To read data from MySQL, we use the beam-nuggets library, which is passed as an extra package ('--extra_package=beam-nuggets-0.17.1.tar.gz') to the Dataflow job. Cloud Functions were used to create the Dataflow job. The code was working fine: the Dataflow job got created and the data migration was successful.
After the latest version of sqlalchemy (1.4) was released, we were unable to deploy the cloud function. The cloud function deployment failed with the exception mentioned below.
To fix this issue, we pinned the previous version of sqlalchemy (1.3.23) in the requirements.txt file of the cloud function. This resolved the issue and the cloud function deployed successfully. But when we triggered the Dataflow job from the cloud function, we got the same error as mentioned above.
This issue occurs because the beam-nuggets library internally references sqlalchemy at run time, and the job fails with the same error. Is it possible to manually force beam-nuggets to pick a specific version of sqlalchemy?
Try passing a specific version of sqlalchemy via the extra_package flag as well.
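A sketch of what that could look like when the Cloud Function builds the Dataflow pipeline options; the tarball names are assumptions for sdists you would download ahead of time (for example with pip download) and bundle alongside the function:

from apache_beam.options.pipeline_options import PipelineOptions

# Stage both beam-nuggets and a pinned SQLAlchemy sdist so the Dataflow workers
# install exactly these versions. The tarball file names are placeholders for
# packages downloaded ahead of time and shipped with the Cloud Function.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=<your-project>",
    "--region=<your-region>",
    "--temp_location=gs://<your-bucket>/temp",
    "--extra_package=beam-nuggets-0.17.1.tar.gz",
    "--extra_package=SQLAlchemy-1.3.23.tar.gz",
])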

Python: How to Connect to Snowflake Using Apache Beam?

I see that there's a built-in I/O connector for BigQuery, but a lot of our data is stored in Snowflake. Is there a workaround for connecting to Snowflake? The only thing I can think of doing is to use sqlalchemy to run the query and then dump the output to Cloud Storage Buckets, and then Apache-Beam can get the input data from the files stored in the Bucket.
Snowflake Python and Java connectors were added to Beam recently.
Right now (version 2.24) only the ReadFromSnowflake operation is supported, in apache_beam.io.external.snowflake.
In the 2.25 release WriteToSnowflake will also be available, in the apache_beam.io.snowflake module. You can still use the old path, however it will be considered deprecated in this version.
Right now it runs only on the Flink Runner, but there is an effort to make it available for other runners as well.
Also, it's a cross-language transform, so some additional setup can be needed - it's quite well documented in the pydoc here (I'm pasting it below):
https://github.com/apache/beam/blob/release-2.24.0/sdks/python/apache_beam/io/external/snowflake.py
Snowflake transforms tested against Flink portable runner.
**Setup**
Transforms provided in this module are cross-language transforms
implemented in the Beam Java SDK. During the pipeline construction, Python SDK
will connect to a Java expansion service to expand these transforms.
To facilitate this, a small amount of setup is needed before using these
transforms in a Beam Python pipeline.
There are several ways to setup cross-language Snowflake transforms.
* Option 1: use the default expansion service
* Option 2: specify a custom expansion service
See below for details regarding each of these options.
*Option 1: Use the default expansion service*
This is the recommended and easiest setup option for using Python Snowflake
transforms. This option requires the following prerequisites
before running the Beam pipeline.
* Install Java runtime in the computer from where the pipeline is constructed
and make sure that 'java' command is available.
In this option, the Python SDK will either download (for released Beam versions) or
build (when running from a Beam Git clone) an expansion service jar and use
that to expand transforms. Currently Snowflake transforms use the
'beam-sdks-java-io-expansion-service' jar for this purpose.
*Option 2: specify a custom expansion service*
In this option, you startup your own expansion service and provide that as
a parameter when using the transforms provided in this module.
This option requires the following prerequisites before running the Beam
pipeline.
* Startup your own expansion service.
* Update your pipeline to provide the expansion service address when
initiating Snowflake transforms provided in this module.
Flink Users can use the built-in Expansion Service of the Flink Runner's
Job Server. If you start Flink's Job Server, the expansion service will be
started on port 8097. For a different address, please set the
expansion_service parameter.
**More information**
For more information regarding cross-language transforms see:
- https://beam.apache.org/roadmap/portability/
For more information specific to Flink runner see:
- https://beam.apache.org/documentation/runners/flink/
Snowflake (like most of the portable IOs) has its own Java expansion service, which should be downloaded automatically when you don't specify your own custom one. I don't think it should be needed, but I'm mentioning it just to be on the safe side. You can download the jar, start it with java -jar <PATH_TO_JAR> <PORT>, and then pass it to snowflake.ReadFromSnowflake as expansion_service='localhost:<PORT>'. Link to the 2.24 version: https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-io-snowflake-expansion-service/2.24.0
Notice that it's still experimental, though, and feel free to report issues on Beam Jira.
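To give a feel for the API, here is a sketch of a read pipeline using the 2.24 module path; the connection values are placeholders and the exact parameter list should be checked against the pydoc linked above for your Beam version:

import apache_beam as beam
from apache_beam.io.external import snowflake
from apache_beam.options.pipeline_options import PipelineOptions

# Maps one staged CSV record (a list of strings) to a Python value.
def csv_mapper(fields):
    return {"id": fields[0], "name": fields[1]}

# Connection values are placeholders; parameter names should be verified
# against the snowflake.py pydoc for your Beam version.
options = PipelineOptions(["--runner=FlinkRunner", "--flink_master=localhost:8081"])

with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromSnowflake" >> snowflake.ReadFromSnowflake(
         server_name="<account>.snowflakecomputing.com",
         username="<username>",
         password="<password>",
         schema="<schema>",
         database="<database>",
         staging_bucket_name="gs://<staging-bucket>/",
         storage_integration_name="<storage-integration>",
         csv_mapper=csv_mapper,
         table="<table>")
     | "Print" >> beam.Map(print))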
Google Cloud Support here!
There's no direct connector from Snowflake to Cloud Dataflow, but one workaround would be what you've mentioned. First dump the output to Cloud Storage, and then connect Cloud Storage to Cloud Dataflow.
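If you go that route, the Cloud Storage side is straightforward with Beam's built-in text source; the bucket path below is a placeholder:

import apache_beam as beam

# Read the files previously dumped to a Cloud Storage bucket (placeholder path),
# then continue the pipeline from there.
with beam.Pipeline() as p:
    (p
     | "ReadDump" >> beam.io.ReadFromText("gs://<your-bucket>/snowflake_dump/*.csv")
     | "Print" >> beam.Map(print))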
I hope that helps.
For future folks looking for a tutorial on how to get started with Snowflake and Apache Beam, I can recommend the tutorial below, which was made by the creators of the connector.
https://www.polidea.com/blog/snowflake-and-apache-beam-on-google-dataflow/

Is it possible to use django-pyodbc with iSeries Access ODBC Driver?

I am trying to get Django-pyodbc to work with DB2 on IBM i using the standard IBM i Access ODBC driver.
I know there is a Django DB implementation supported by IBM, but that requires the DB2 Connect product, which is (for us) prohibitively expensive, whereas the included Access ODBC driver comes at no charge with the OS.
I have seen a question asked regarding django-pyodbc with iSeries ODBC, suggesting that it is possible, but I have found no way to get it to work:
https://stackoverflow.com/questions/25066866/django-inspectdb-on-db2-database
My first question therefore is; has anybody succeeded in getting this setup to work?
And if yes, can you share information on how you did it?
Thanks,
Richard
