Kubeflow example: understanding the context

Kubeflow example: understanding the context - python

In this KF example https://github.com/kubeflow/examples/blob/master/financial_time_series/tensorflow_model/ml_pipeline.py an ML pipeline gets constructed that triggers Python functions via command line.
This means that all .py files that are being called (e.g. "python3 train.py --param value") should be in the directory where the process runs. What I don't understand is where exactly should I put the .py files in the context of GCP.
Should I just copy them using Cloud shell?
Or should I add git clone <repository with .py files> into my Dockerfile?

To kickstart KFP development using python, try the following tutorial: Data passing in python components
Should I just copy them using Cloud shell? Or should I add git clone <repository with .py files> into my Dockerfile?
Ideally, the files should be inside the container image (the Dockerfile method). This ensures maximum reproducibility.
For not very complex python scripts, the Lightweight python component feature allows you to create component from a python function. In this case the script code is store in the component command-line, so you do not need to upload the code anywhere.
Putting scripts somewhere remote (e.g. cloud storage or website) is possible, but can reduce reliability and reproducibility.

Related

run tests on a lambda container image?

I'm using Lambda container images to package complicated libraries like opencv and pdf2image in Python.
Is there a way to run unit tests against it so I can get a code coverage for tools like Sonar?
With normal code, I could do the following:
python -m unittest -v
But not sure how to do that if the code is inside a container image.
I'm using bitbuckett pipelines as well.

In order to run Unit tests inside a container there are different solutions, and they mostly depends on where you are going to run the image.
Assuming you are going to run the image locally and assuming you are proficient with docker, you could build your image including development dependencies and then you could modify the entrypoint of the image accordingly in order to run your unit test suite. I suggest to bind a path to the local host in order to be able to retrieve possible junit.xml (or any test report) files. This way, you just do:
docker run --entrypoint="/path/to/my/ut/runner" -v /tmp/testout:/container/path myimage
Assuming you want to run the lambda remotely I suggest either to bind it to any APIGateway and to perform a remote API call (maybe create an endpoint just for development purposes that returns the test report), or to use additional tools like Terratest to invoke the lambda remotely without any additional infrastructure (note that this require using Terraform). Finally note that AWS Serverless Application Model documentation gives additional examples on how you could run your tests inside a lambda.

Move python code from Azure Devops repo to windows Azure VM

We have a python application on a window azure VM that reads data from an API and loads the data into an onprem DB. The application works just fine and the code is source controlled in an Azure devops repo. The current deployment process is for someone to pull the main branch and copy the application from their local machine to c:\someapplication\ on the STAGE/PROD server. We would like to automate this process. There are a bunch of tutorials on how to do a build for API and Web applications which require your azure subscription and app name (which we dont have). My two questions are:
is there a way to do a simple copy to the c:\someapplication folder from azure devops for this application
if there is, would the copy solution be the best solution for this or should I consider something else?
Should i simply clone the main repo to each folder location above and then automate the git pull via the azure pipeline? Any advice or links would be greatly appreciated.

According to your description, you could try to use the CopyFiles#2 task and set the local folder as shared folder, so that use it as TargetFolder. The target folder or UNC path that will contain the copied files.
YAML like:
- task: CopyFiles#2
inputs:
SourceFolder:$(Build.SourcesDirectory) # string. Source Folder.
Contents: '**' # string. Required. Contents. Default: **.
TargetFolder: 'c:\\someapplication' # string. Required. Target Folder.
Please check if it meets your requirements.

How do I upload my own binary (Python module) as a resource for my Kubernetes application?

I have my own Python module (an '.so' file that I'm able to import locally) that I want to make available to my application running in Kubernetes. I am wholly unfamiliar with Kubernetes and Helm, and the documentation and attempts I've made so far haven't gotten me anywhere.
I looked into ConfigMaps, trying kubectl.exe create configmap mymodule --from-file=MyModule.so, but kubectl says "Request entity too large: limit is 3145728". (My binary file is ~6mb.) I don't know if this is even the appropriate way to get my file there. I've also looked at Helm Charts, but I see nothing about how to package up a file to upload. Helm Charts look more like a way to configure deployment of existing services.
What's the appropriate way to package up my file, upload it, and use it within my application (ensuring that Python will be able to import MyModule successfully when running in my AKS cluster)?

The python module should be added to the container image that runs in Kubernetes, not to Kubernetes itself.
The current container image running in Kubernetes has a build process, usually controlled by a Dockerfile. That container image is then published to an image repository where the container runtime in Kubernetes can pull the image in from and run the container.
If you don't currently build this container, you may need to create your own build process to add the python module to the existing container. In a Dockerfile you use the FROM old/image:1.7.1 and then add your content.
FROM old/image:1.7.1
COPY MyModule.so /app/
Publish the new container image to an ECR (Elastic Container Registry) so it is available for use in your AKS cluster.
The only change you might need to make in Kubernetes then is to set the image for the deployment to the newly published image.
Here is a simple end to end guide for a python application.

Kubernetes is a container orchestration engine. A binary / package from that perspective is part of the application running within a container. As such, normally you would add your package to the container during its build process and then deploy new version of the container.
Details will depend on your application and build pipeline, but generally in some way (using package repository or copying the package manually) you would make the package available on the build agent and then copy it into the container and install within the dockerfile.

Simple Google Cloud deployment: Copy Python files from Google Cloud repository to app engine

I'm implementing continuous integration and continuous delivery for a large enterprise data warehouse project.
All the code reside in Google Cloud Repository and I'm able to set up Google Cloud Build trigger, so that every time code of specific file type (Python scripts) are pushed to the master branch, a Google Cloud build starts.
The Python scripts doesn't make up an app. They contain an ODBC connection string and script to extract data from a source and store it as a CSV-file. The Python scripts are to be executed on a Google Compute Engine VM Instance with AirFlow installed.
So the deployment of the Python scripts is as simple as can be: The .py files are only to be copied from the Google Cloud repository folder to a specific folder on the Google VM instance. There is not really a traditionally build to run, as all the Python files are separate for each other and not part of an application.
I thought this would be really easy, but now I have used several days trying to figure this out with no luck.
Google Cloud Platform provides several Cloud Builders, but as far as I can see none of them can do this simple task. Using GCLOUD also does not work. It can copy files but only from local pc to VM not from source repository to VM.
What I'm looking for is a YAML or JSON build config file to copy those Python files from source repository to Google Compute Engine VM Instance.
Hoping for some help here.

The files/folders in the Google Cloud repository aren't directly accessible (it's like a bare git repository), you need to first clone the repo then copy the desired files/folders from the cloned repo to their destinations.
It might be possible to use a standard Fetching dependencies build step to clone the repo, but I'm not 100% certain of it in your case, since you're not actually doing a build:
steps:
- name: gcr.io/cloud-builders/git
args: ['clone', 'https://github.com/GoogleCloudPlatform/cloud-builders']
If not you may need one (or more) custom build steps. From Creating Custom Build Steps:
A custom build step is a container image that the Cloud Build worker
VM pulls and runs with your source volume-mounted to /workspace.
Your custom build step can execute any script or binary inside the
container; as such, it can do anything a container can do.
Custom build steps are useful for:
Downloading source code or packages from external locations
...

What source bundle file types are supported?

I'm deploying an app to AWS Elastic Beanstalk using the API:
https://elasticbeanstalk.us-east-1.amazon.com/?ApplicationName=SampleApp
&SourceBundle.S3Bucket=amazonaws.com
&SourceBundle.S3Key=sample.war
...
My impression from reading around a bit is that Java deployments use .war, .zips are supported (docs) and that one can use .git (but only with PHP or using eb? doc).
Can I use the API to create an application version from a .git for a Python app? Or are zips the only type supported?
(Alternatively, can I git push to AWS without using the commandline tools?)

There are two ways to deploy to AWS:
The API backend, where it is basically a .zip file referenced from S3. When deploying, the Instance will unpack and run some custom scripts (which you can override from your AMI, or via Custom Configuration Files, which are the recommended way). Note that in order to create and deploy a new version in an AWS Elastic Beanstalk Environment, you need three calls: upload to s3, Create Application Version, and UpdateEnvironment.
The git endpoint, which works like this:
You install the AWS Elastic Beanstalk DevTools, and run a setup script on your git repo
When ran, the setup script patches your .git/config in order to support git aws.push and in particular, git aws.remote (which is not documented)
git aws.push simply takes your keys, builds a custom URL (git aws.remote), and does a git push -f master
Once AWS receives this (url is basically <api>/<app>/<commitid>(/<envname>), it creates the s3 .zip file (from the commit contents), then the application version on <app> for <commitid> and if <envname> is present, it also issues a UpdateEnvironment call. Your AWS ids are hashed and embedded into the URL just like all AWS calls, but sent as username / password auth tokens.
(full reference docs)
I've ported that as a Maven Plugin a few months ago, and this file show how it is done in plain Java. It actually covers a lot of code (since it actually builds a custom git repo - using jgit, calculates the hashes and pushes into it)
I'm strongly considering backporting as a ant task, or perhaps simply make it work without a pom.xml file present, so users only use maven to do the deployment.
Historically, only the first method was supported, while the second grew up in importance. Since the second is actually far easier (in beanstalk-maven-plugin, you have to call three different methods while a simply git push does all the three), we're supporting git-based deployments, and even published an archetype for it (you see a sample project here, especially the README.md in particular).
(btw, if you're using .war files, my elastic beanstalk plugin support both ways, and we're actually in favor of git, since it allows us to some incremental deployments)
So, you wanna still implement it?
There are three files I suggest you read:
FastDeployMojo.java is the main façade
RequestSigner does the real magic
This is a testcase for RequestSigner
Wanna do in
Python? I'd look for Dulwich
C#? The powershell version is based in it
Ruby? The Linux is based on it
Java? Use mine, it uses jgit

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.