I am trying to set up Azure DevOps for a few of my PySpark projects.
Some of the projects are developed in PyCharm and some in IntelliJ with the Python plugin.
Below is the code structure committed to the Git repository.
setup.py is the build file used to create the .egg file.
I have tried a few steps, as shown below, to create a build pipeline in Azure DevOps,
but the Python installation/execution part is failing with the error below.
##[error]The process 'C:\hostedtoolcache\windows\Python\3.7.9\x64\python.exe' failed with exit code 1
I would prefer the classic UI for building and creating .egg files; if that is not possible, YAML files.
Any leads appreciated!
I used the Use Python version step and then everything works like a charm.
Can you show the details of the Install Python step?
I have used the following steps and it succeeds!
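In the build step the selected Python simply runs setup.py to produce the egg. For reference, a minimal sketch of such a setup.py (the package name and version here are placeholders, not the asker's actual project):

# setup.py - minimal sketch for producing an .egg; names are placeholders
from setuptools import setup, find_packages

setup(
    name="my_pyspark_project",   # placeholder project name
    version="0.1.0",             # placeholder version
    packages=find_packages(exclude=["tests"]),
)

With the Use Python version task selecting the interpreter, a script step running python setup.py bdist_egg writes the .egg into dist/, which can then be published as a build artifact.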
So I'm trying to get my IntelliJ to see the Apache Airflow that I downloaded. The steps I've taken so far:
I've downloaded the most recent Apache Airflow release and saved Apache Airflow 2.2.3 onto my desktop. I'm trying to get it to work with IntelliJ; I've tried adding the Apache Airflow folder under both Libraries and Modules, and both attempts come back with errors stating it's not being utilized. I've also looked for documentation within Airflow, but I can't find anything on how to set it up in your own IDE so you can write Python scripts for DAGs and other items.
How would I go about doing this? I'm at a complete loss as to how to get IntelliJ to register Apache Airflow as a library for Python code so that I can write DAG files correctly within the IDE itself.
Any help would be much appreciated as I've been stuck on this aspect for the past couple of days searching for any kind of documentation to make this work.
Airflow is both an application and a library. In your case you are not trying to run the application but only looking to write DAGs, so you need it just as a library.
You should just create a virtual environment (preferably) and run:
pip install apache-airflow
Then you can write DAGs using the library, and IntelliJ will let you know if you are using wrong imports or deprecated objects.
When your DAG file is ready, deploy it to the DAG folder on the machine where Airflow is running.
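As a quick check that IntelliJ now resolves the library, a minimal DAG along these lines should import cleanly (the dag_id, schedule and bash command are arbitrary placeholders):

# example_dag.py - minimal sketch; ids and schedule are placeholders
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_dag",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # a single task that just echoes a message on the worker
    hello = BashOperator(task_id="say_hello", bash_command="echo hello")

If the imports show up as unresolved, the virtual environment is probably not the one selected as the project interpreter/SDK in IntelliJ.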
I am trying to create an Azure CI/CD pipeline for my Python application. I have tried many approaches but have not had success. I can create CI successfully, and in some cases CD as well, but I am not able to see the output on the Azure App Service.
I use a Linux App Service running Python 3.7.
I can create the CI/CD pipeline successfully using a YAML file, but I want to create it using the classic editor without YAML, as I have some restrictions on using YAML.
I will post the steps I used to deploy a simple hello world project (sketched below) with a DevOps CI/CD pipeline.
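The hello world app itself can be a single file, for example a minimal Flask sketch like the one below (Flask is an assumption for illustration; any Python web app that App Service can start works the same way):

# app.py - minimal hello-world sketch (Flask is an assumed choice)
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # App Service on Linux typically serves a Flask "app" object via gunicorn
    return "Hello, World!"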
1. Create pipeline:
2. Create Release pipeline:
3. Save and queue your pipeline; the release pipeline will be triggered. Here is the file structure on Azure Kudu:
I am trying to set up a development environment for Databricks, so my developers can write code using the VS Code IDE (or some other IDE) and execute it against the Databricks cluster.
So I went through the Databricks Connect documentation and did the setup as suggested in the document.
https://docs.databricks.com/dev-tools/databricks-connect.html#overview
After the setup I am able to execute Python code on the Azure Databricks cluster, but not Scala code.
While running the setup I found that it says "Skipping scala command test on windows"; I am not sure whether I am missing some configuration here.
Please suggest how to resolve this issue.
This is not an error but just a statement saying that databricks-connect test skips the Scala command test on Windows. You can still execute code from the local machine on the cluster using databricks-connect; you need to add the jars from the databricks-connect get-jar-dir directory to your project structure in the IDE, as described in the documentation steps here: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/databricks-connect#intellij-scala-or-java
Also note that when using Azure Databricks you enter a generic Databricks host along with your workspace ID (org-id) when you execute databricks-connect configure,
e.g. https://westeurope.azuredatabricks.net/?o=xxxx instead of https://adb-xxxx.yz.azuredatabricks.net
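As a quick sanity check from the local machine, a small PySpark snippet like the one below (assuming databricks-connect configure has already been run and the test passes) executes on the remote cluster rather than locally:

# local_check.py - minimal databricks-connect sanity check
from pyspark.sql import SparkSession

# with databricks-connect installed, getOrCreate() returns a session
# backed by the remote Databricks cluster configured earlier
spark = SparkSession.builder.getOrCreate()

# this count is computed on the cluster, not on the local machine
print(spark.range(100).count())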
In development I use Anaconda to manage environments. I have not yet developed a Python project for production. In this context I have two related questions.
First, which solution incurs lower technical debt: A. install Anaconda on production servers; or B. deploy Python as deb packages?
Second, what is the simplest structure of Python project folders and files for testing the functionality of make-deb and dh-virtualenv, as described in the last section of the Nylas blog article?
Nylas blog (How We Deploy Python Code: Building, packaging & deploying Python using versioned artifacts in Debian packages)
https://www.nylas.com/blog/packaging-deploying-python/
Make-deb:
https://github.com/nylas/make-deb
dh-virtualenv:
https://github.com/spotify/dh-virtualenv
Package Python Application for Linux (Link Added November 2022):
https://opensource.com/article/20/4/package-python-applications-linux
For a test I would only add the Requests package to a standard Python 2.7 environment and write one module to download and save a small CSV file. Then I would like to test make-deb and dh-virtualenv by deploying to a cloud server or a Raspberry Pi server, and run code to verify the download app works as expected on the server. Then I want to further develop the application and test the deployment tools with make-deb and dh-virtualenv to see if I can manage development for production more effectively.
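The test module I have in mind would be roughly the following (the URL and output filename are placeholders):

# download_csv.py - sketch of the test module; URL and filename are placeholders
import requests

CSV_URL = "https://example.com/data.csv"


def download_csv(url=CSV_URL, dest="data.csv"):
    """Download a small CSV file and save it to disk."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(dest, "wb") as f:
        f.write(response.content)


if __name__ == "__main__":
    download_csv()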
Edit: Based on some further research, it appears Anaconda cannot be made to export a requirements.txt file. The options seem to be either to use virtualenv, make-deb, and dh-virtualenv, or to use Anaconda and Miniconda roughly as described in the following blog articles:
https://tdhopper.com/blog/2015/Nov/24/my-python-environment-workflow-with-conda/
https://www.thoughtvector.io/blog/deployment-with-anaconda/
I have created a GitHub repo in which I keep all the code for my project.
The structure is:
myproject/
- package api
- package database
- package feature
The api package is responsible for communicating with external APIs like the iTunes API.
The database package is responsible for communicating with my database.
Finally, the feature package is the actual project I am building.
Each package has its own setup.py.
I have three problems with this structure:
How can I add the dependencies on api and database in the feature setup.py? (See the sketch after this list.)
How would you recommend I deploy this Python code on Amazon? Using Docker? Platter? Something else?
If we assume that more features will be added to feature as separate packages, how can I deploy only a subset of the code to the server? Let's say the api package along with another feature that uses it.
Let me know if my questions are not clear and I will refine them.
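For question 1, the kind of thing I am after in the feature setup.py would look roughly like this (a sketch; the names and versions are placeholders based on the layout above):

# feature/setup.py - sketch of declaring the sibling packages as dependencies
from setuptools import setup, find_packages

setup(
    name="feature",
    version="0.1.0",              # placeholder version
    packages=find_packages(),
    install_requires=[
        "api>=0.1.0",             # the api package from this repo
        "database>=0.1.0",        # the database package from this repo
    ],
)

with the understanding that pip still needs a way to resolve api and database at install time, e.g. by installing them first from their directories or publishing them to a private index.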