Unable to execute Scala code on Azure Databricks cluster - python

I am trying to set up a development environment for Databricks so that my developers can write code in VS Code (or some other IDE) and execute it against the Databricks cluster.
I went through the Databricks Connect documentation and did the setup as suggested there:
https://docs.databricks.com/dev-tools/databricks-connect.html#overview
After the setup I am able to execute Python code on the Azure Databricks cluster, but not Scala code.
While running the setup test I saw the message "Skipping scala command test on windows". I am not sure whether I am missing some configuration here.
Please suggest how to resolve this issue.

This is not an error. It is just a message saying that databricks-connect test skips the Scala test on Windows. You can still execute code from your local machine on the cluster using databricks-connect. To do so, add the JARs from the directory printed by databricks-connect get-jar-dir to your project structure in the IDE, as described in this documentation: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect#intellij-scala-or-java
Also note that when using Azure Databricks you enter a generic Databricks host along with your workspace ID (org-id) when you execute databricks-connect configure,
e.g. https://westeurope.azuredatabricks.net/?o=xxxx instead of https://adb-xxxx.yz.azuredatabricks.net
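As a rough illustration of the host and org-id split mentioned above, the following sketch separates a generic workspace URL into the two values that databricks-connect configure asks for. It assumes the workspace ID is carried in the `o` query parameter; the function name `split_workspace_url` and the ID `1234567890` are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs


def split_workspace_url(url):
    """Split a generic workspace URL into (host, org_id).

    Assumes the workspace ID is passed as the `o` query parameter;
    org_id is None when that parameter is absent.
    """
    parts = urlsplit(url)
    host = f"{parts.scheme}://{parts.netloc}"
    # parse_qs maps each parameter name to a list of values.
    org_id = parse_qs(parts.query).get("o", [None])[0]
    return host, org_id


# Hypothetical workspace ID, used only for illustration.
host, org_id = split_workspace_url("https://westeurope.azuredatabricks.net/?o=1234567890")
print("Databricks Host:", host)
print("Org ID:", org_id)
```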

Related

Running python scripts on Databricks cluster

Is it possible to run an arbitrary Python script written in PyCharm on my Azure Databricks cluster?
Databricks suggests using databricks-connect, but it turned out to be useful only for Spark jobs.
More specifically, I'd like to use networkx to analyse some graphs so huge that my local machine is unable to work with them.
I'm not sure if it's possible at all...
Thanks in advance!

How do I create a standalone Jupyter Lab server using Pyinstaller or similar?

I would like to create a self-contained, .exe file that launches a JupyterLab server as an IDE on a physical server which doesn't have Python installed itself.
The idea is to deploy it as part of an ETL workflow tool, so that it can be used to view notebooks that will contain the ETL steps in a relatively easily digestible format (the notebooks will be used as pipelines via papermill and scrapbook - not really relevant here).
While I can use Pyinstaller to bundle JupyterLab as a package, there isn't a way to launch it on the Pythonless server (that I can see), and I can't figure out a way to do it using Python code alone.
Is it possible to package JupyterLab this way so that I can run the .exe on the server and then connect to 127.0.0.1:8888 on the server to view a notebook?
I have tried using the link below as a starting point, but I think I'm missing something as no server seems to start using this code alone, and I'm not sure how I would execute this via a tornado server etc.:
https://gist.github.com/bollwyvl/bd56b58ba0a078534272043327c52bd1
I would really appreciate any ideas, help, or somebody to tell me why this idea is impossible madness!
Thanks!
Phil.
P.S. I should add that Docker isn't an option here :( I've done this before using Docker and it's extremely easy.

How do I import Apache Airflow into Intellij?

I'm trying to get IntelliJ to see the Apache Airflow that I downloaded. The steps I've taken so far:
I downloaded the most recent Apache Airflow release and saved Apache Airflow 2.2.3 onto my desktop. To get it to work with IntelliJ, I tried adding the Apache Airflow folder under both Libraries and Modules; both came back with errors stating that it is not being utilized. I looked through the Airflow documentation, but I can't find anything on how to set it up in your own IDE to write Python scripts for DAGs and other items.
How would I go about this? I'm at a complete loss as to how to get IntelliJ to register Apache Airflow as a library for Python code, so that I can write DAG files correctly within the IDE itself.
Any help would be much appreciated, as I've been stuck on this for the past couple of days searching for any kind of documentation to make this work.

Airflow is both an application and a library. In your case you are not trying to run the application, only to write DAGs, so you need it just as a library.
You should create a virtual environment (preferably) and run:
pip install apache-airflow
Then you can write DAGs using the library, and IntelliJ will let you know if you are using wrong imports or deprecated objects.
When your DAG file is ready, deploy it to the DAG folder on the machine where Airflow is running.
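One way to confirm that the interpreter configured in the IDE is the one where the package was installed is a small check like the sketch below. It uses only the standard library; the helper name `is_importable` is made up for the example, and it assumes it is run with the same interpreter that the project uses.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Shows which interpreter is in use and whether it can find the airflow package.
print("Interpreter:", sys.executable)
print("airflow importable:", is_importable("airflow"))
```

If this reports that airflow is not importable, the project is probably pointing at a different interpreter than the virtual environment where pip install was run.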

How to run a robot framework script from azure databricks notebook?

Is it possible to run a Robot Framework test suite from an Azure Databricks notebook?
I have a Robot Framework test suite that uses the Database library, the OperatingSystem library, etc.
On my local machine I install Python, pip install all the necessary libraries, and then run my Robot code like this:
"python -m robot filename.robot"
I want to do the same using Azure notebooks. Is it possible?

Databricks notebooks support four default languages: Python, Scala, SQL, and R.
I was unable to find any documentation that shows the use of Robot Framework on Databricks.
However, you can try running the same commands on Azure Databricks that you ran on your local machine.
Databricks is essentially a cloud platform for running your Spark workloads, with some add-on capabilities.
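A minimal sketch of that suggestion, run from a Python notebook cell, might look like the following. It is untested on Databricks and assumes that Robot Framework is already installed on the cluster (for example with %pip install robotframework) and that the suite file is reachable on the driver's file system. The helper names and the /tmp/robot-results output directory are made up for the example.

```python
import subprocess
import sys


def robot_command(suite_path, output_dir="/tmp/robot-results"):
    """Build the same 'python -m robot' command that is used locally."""
    # --outputdir tells Robot Framework where to write its log and report files.
    return [sys.executable, "-m", "robot", "--outputdir", output_dir, suite_path]


def run_suite(suite_path, output_dir="/tmp/robot-results"):
    """Run the suite in a subprocess and return (exit_code, console_output)."""
    completed = subprocess.run(
        robot_command(suite_path, output_dir),
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout


# Only prints the command; call run_suite("filename.robot") to execute it.
print(robot_command("filename.robot"))
```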

Azure DevOps CI Pipeline for a pySpark project

I am trying to implement Azure DevOps on a few of my PySpark projects.
Some of the projects are developed in PyCharm and some in IntelliJ with the Python API.
Below is the code structure committed to the Git repository.
setup.py is the build file used to create the .egg file.
I have tried a few steps, as shown below, to create a build pipeline in DevOps.
But the Python installation/execution part is failing with the error below.
##[error]The process 'C:\hostedtoolcache\windows\Python\3.7.9\x64\python.exe' failed with exit code 1
I would prefer to use the UI for building and creating the .egg files; if that is not possible, YAML files are fine.
Any leads appreciated!
I used the Use Python step and then everything worked like a charm.
Can you show the details of your Install Python step?
I have used the following steps and it succeeds!
