Automate Python Application in Visual Studio 2017 inside Azure - python

I am working on extracting features from CSV files, and I use Python to perform the task. I am working inside Azure and created a Python application using Visual Studio 2017. It works perfectly fine, and I am looking for ways to automate the process so that it runs in batches on a schedule.
I don't want to post it as a web job because the script references files on the local disk of my VM. Could someone tell me the options available to run this solution in batch?

According to your description, here are several ways to run your solution in batch.
1. Web Job
Actually, you can package the Python script's dependent modules and referenced files together and upload them with the WebJob. You can then find their absolute paths in KUDU and refer to them in your script, so local file references do not stop you from using a WebJob. For this process you can refer to a case I answered previously: Python libraries on Web Job.
Please note that a WebJob can be scheduled to run as often as once per second.
2. Azure Scheduler
Azure Scheduler allows you to declaratively describe actions to run in the cloud. It then schedules and runs those actions automatically. You can use it to call your app's script URL periodically. For more details, please refer to the official tutorial.
Please note that Azure Scheduler can run jobs as often as once per minute.
3. Azure Functions
Like the previous method, you can use an Azure Functions timer trigger to call your app's script URL periodically (a minimal timer-trigger sketch follows after this list). For more details, please refer to the official tutorial.
4. Azure Batch
Azure Batch schedules compute-intensive work to run on a managed collection of virtual machines and can automatically scale compute resources to meet the needs of your jobs. Since Azure Batch is aimed at large-scale data operations, it is comparatively costly for a scenario like yours, so I don't suggest using it. For more details, please refer to the official tutorial.
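For option 3, here is a minimal sketch of a Python timer-trigger function (v1 programming model); the schedule itself lives in the accompanying function.json, and the URL below is a placeholder for illustration, not a value from your app:

import logging
import urllib.request

import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    # The schedule is defined in function.json, e.g. "schedule": "0 0 9 * * *" for 09:00 daily.
    if mytimer.past_due:
        logging.info("The timer is past due!")

    # Placeholder URL -- replace with the endpoint that starts your extraction script.
    with urllib.request.urlopen("https://your-app.azurewebsites.net/run-extraction") as resp:
        logging.info("Triggered extraction, HTTP status %s", resp.status)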
Hope it helps you.

Related

How to limit a Python script so that it can't access local resources?

I am working on a project that allows users to upload a Python script to an API and run it on a schedule. Currently, I'm trying to figure out a way to limit the functionality of the script so that it cannot access local files, interfere with the Flask server running the API, etc. Do you have any ideas on how I can achieve this? Is there any way to make it so that only specific libraries are available for importing?
Running other people's scripts on your server is a serious security issue. If you are trying to expose a Python interpreter from your web application, you can try something like judge0 - GitHub. It is free if you deploy it yourself, and it runs scripts safely inside containers.
The simplest way is to ensure the user running the script is not root but a user created specifically for this task (e.g. part of a group that can only read, not write or execute). This means at a minimum you should ensure all files have the appropriate mode. Then you can use a pipe or a subprocess to run the script, as sketched below.
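A minimal sketch of that idea, assuming Python 3.9+ (for the user/group arguments of subprocess.run) and that an unprivileged account called "sandbox" already exists; the account name and limits are illustrative:

import subprocess

def run_untrusted(script_path: str) -> str:
    # Run the uploaded script as the unprivileged "sandbox" user (hypothetical account).
    # Note: switching users requires the parent process itself to have the privilege to do so.
    result = subprocess.run(
        ["python3", "-I", script_path],  # -I: isolated mode, ignores env vars and user site-packages
        user="sandbox",
        group="sandbox",
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )
    return result.stdout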
Alternatively, you could use a runtime that's not "local", like a VM or a compute service (AWS Lambda, etc.). The latter would be simplest, and there are lots of vendors that offer compute services with a programmatic API.

How can I schedule a Python script in the cloud?

I am developing a Python script that downloads two Excel files from a web service. These two files are combined with another one stored locally on my computer to produce the final file. This final file is loaded into a database and a Power BI dashboard to finally visualize the data.
My question is: how can I schedule this to run daily if my computer is turned off? As I said, two files are web scraped (so no problem scheduling those), but one file is stored locally.
One solution that comes to mind: store the local file in Google Drive/OneDrive and download it with the API so my script does not depend on my computer. But if this was the case, how can I schedule that? What service would you use? Heroku, ...?
I am not entirely sure about your context, but I think you could look into using AWS Lambda for this. It is reasonably easy to set up and also to create a schedule for running code.
It is even easier to achieve this using the Serverless Framework. This link shows an example built with Python that will run on a schedule.
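As a rough sketch of how the Lambda itself could look, assuming the formerly local file is moved to S3 and that pandas (plus an Excel engine such as openpyxl) is packaged with the function; the bucket, keys, URL and join column are placeholders, and the daily schedule would be attached separately (e.g. an EventBridge rule or a schedule event in serverless.yml):

import boto3
import pandas as pd

# Hypothetical bucket, keys and URL -- replace with your own.
BUCKET = "my-data-bucket"
REFERENCE_KEY = "inputs/reference.xlsx"
WEB_FILE_URL = "https://example.com/report.xlsx"

def handler(event, context):
    s3 = boto3.client("s3")

    # The file that used to live on the local PC is kept in S3 instead.
    s3.download_file(BUCKET, REFERENCE_KEY, "/tmp/reference.xlsx")

    # pandas can read the web-scraped file straight from its URL.
    web_df = pd.read_excel(WEB_FILE_URL)
    ref_df = pd.read_excel("/tmp/reference.xlsx")

    # Combine however your logic requires; a join on "id" is just an example.
    final_df = web_df.merge(ref_df, on="id")
    final_df.to_csv("/tmp/final.csv", index=False)

    # Upload the result so the database / Power BI load can pick it up.
    s3.upload_file("/tmp/final.csv", BUCKET, "outputs/final.csv")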
I am using the schedule package for exactly something like that.
It's easy to set up and works very well.
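Note that the schedule package only fires while the Python process is running on an always-on machine, so it does not by itself solve the "computer turned off" problem. A minimal sketch (the job body is a placeholder):

import time
import schedule

def job():
    # Placeholder: download the Excel files, combine them, load the result.
    print("Running the daily export...")

schedule.every().day.at("09:00").do(job)

while True:
    schedule.run_pending()
    time.sleep(60)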

AWS CLOUD9: How can I automate a task to be run on a daily basis?

I've created a Python script that grabs information from an API and sends it in an email. I'd like to automate this process so that it runs on a daily basis, let's say at 9 AM.
The servers must be asleep when they are not running this automation.
What is the easiest way to achieve this?
Note: free tier of AWS.
Cloud9 is the IDE that lets you write, run, and debug your code with just a browser.
"It preconfigures the development environment with all the SDKs, libraries, and plug-ins needed for serverless development. Cloud9 also provides an environment for locally testing and debugging AWS Lambda functions. This allows you to iterate on your code directly, saving you time and improving the quality of your code."
Okay, for the requirement you have posted, there are two ways of achieving this:
On a local system, use the cron job scheduler daemon to run the script (see this tutorial for cron).
The same thing can also be achieved with a Lambda function. Lambda only runs when it is triggered, using compute resources only for the time it is invoked, so your servers are asleep the rest of the time (technically you are not provisioning any server for Lambda at all).
Convert your script into a function for Lambda, and then use the EventBridge service, where you can specify a cron expression to run your script every day at 9 AM; a minimal handler sketch follows below. I wrote an article on the same topic; maybe it can help.
Note: for the email service you can use SES https://aws.amazon.com/ses/. My article uses SES.
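A minimal sketch of what such a Lambda handler could look like, assuming SES is used for the email; the API URL and the addresses are placeholders, and the daily 9 AM trigger would be an EventBridge rule with a cron expression such as cron(0 9 * * ? *):

import json
import urllib.request

import boto3

def lambda_handler(event, context):
    # Placeholder API endpoint -- replace with the one your script already calls.
    with urllib.request.urlopen("https://api.example.com/data") as resp:
        data = json.load(resp)

    # The sender address must be a verified identity in SES.
    ses = boto3.client("ses")
    ses.send_email(
        Source="sender@example.com",
        Destination={"ToAddresses": ["recipient@example.com"]},
        Message={
            "Subject": {"Data": "Daily API report"},
            "Body": {"Text": {"Data": json.dumps(data, indent=2)}},
        },
    )
    return {"statusCode": 200}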
To schedule events you'd need a Lambda function with a CloudWatch Events rule, such as the following: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
Cloud9 is an IDE.

Run Python script on Azure and save to SQL database

We have just signed up with Azure and are wondering how to schedule and run Python scripts that extract data from various sources such as APIs, web-scraping scripts, etc. What is the best tool on Azure that can run and schedule those scripts and save the output to a target destination?
The output of the scripts will be saved to a data lake and/or an Azure SQL database.
Thank you.
There are several services in Azure that can do this task.
I suggest you make use of Azure WebJobs (they support Python as well as running on a schedule).
The rough guidelines are as below:
1. Develop your Python script locally and make sure it works locally (e.g. extracts data from the other sources and saves to the Azure database); a minimal sketch of such a script follows after these steps.
2. In the Azure portal, create a scheduled WebJob. During creation, you need to upload the .py file (zip all the files into a .zip file); for "Type", select "Triggered"; in the "Triggers" dropdown, select "Scheduled"; then specify when to run the .py file by using a CRON expression.
3. It's done.
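For step 1, here is a minimal sketch of such a script, assuming pyodbc with the ODBC Driver 17 for SQL Server and a table dbo.Records; the API endpoint, connection string and column names are placeholders:

import pyodbc
import requests

# Placeholder endpoint, connection string and table -- replace with your own.
API_URL = "https://api.example.com/records"
CONN_STR = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:yourserver.database.windows.net,1433;"
    "Database=yourdb;Uid=youruser;Pwd=yourpassword;"
    "Encrypt=yes;"
)

def main():
    # Extract from the source API, then load into the Azure SQL database.
    records = requests.get(API_URL, timeout=30).json()
    conn = pyodbc.connect(CONN_STR)
    cursor = conn.cursor()
    for rec in records:
        cursor.execute(
            "INSERT INTO dbo.Records (id, value) VALUES (?, ?)",
            rec["id"], rec["value"],
        )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    main()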
You can also consider other Azure services such as an Azure Function with a timer trigger, but the WebJob is much easier.
Hope it helps, and please let me know if you still have issues with it.

How to use the DockerOperator from Apache Airflow

This question is related to understanding a concept regarding the DockerOperator and Apache Airflow, so I am not sure if this site is the correct place. If not, please let me know where I can post it.
The situation is the following: I am working on a Windows laptop, and I have developed a very basic ETL pipeline that extracts data from some server and writes the unprocessed data into a MongoDB on a scheduled basis with Apache Airflow. I have a docker-compose.yml file with four services: a mongo service for the MongoDB, a mongo-express service as an admin tool for the MongoDB, a webserver service for Apache Airflow, and a postgres service as the database backend for Apache Airflow.
So far, I have developed some Python code in functions, and these functions are called by the Airflow instance using the PythonOperator. Since debugging is very difficult with the PythonOperator, I now want to try the DockerOperator instead. I have been following this tutorial, which claims that with the DockerOperator you can develop your source code independently of the operating system it will later be executed on, thanks to Docker's 'build once, run everywhere' concept.
My problem is that I didn't fully understand all the steps needed to run code using the DockerOperator. Regarding the task development and deployment described in the tutorial, I have the following questions:
Package the artifacts together with all dependencies into a Docker image. ==> Does this mean that I have to create a Dockerfile for every task and then build an image using this Dockerfile?
Expose an Entrypoint from your container to invoke and parameterize a task using the DockerOperator. ==> How do you do this?
Thanks for your time, I highly appreciate it!
Typically you're going to have a Docker image that handles one type of task. So for any one pipeline you'd probably be using a variety of different Docker images, one for each step.
There are a couple of considerations here regarding your question, which is specifically about deployment.
You'll need to create a Docker image. You likely want to add a tag to it, as you will want to version the image. The DockerOperator defaults to the latest tag on an image.
The image needs to be available to your deployed instance of Airflow. Images can be built on the machine you're running Airflow on if you want to run everything locally. If you've deployed Airflow somewhere online, the more common practice is to push them to a container registry; there are a number of providers you can use (Docker Hub, Amazon ECR, etc.).
Expose an Entrypoint from your container to invoke and parameterize a task using the DockerOperator. ==> How do you do this?
If you have your image built and it is available to Airflow, you simply need to create a task using the DockerOperator like so:
dag = DAG(**kwargs)

task_1 = DockerOperator(
    dag=dag,
    task_id='docker_task',
    image='dummyorg/dummy_api_tools:v1',      # versioned image, pulled from your registry
    auto_remove=True,                         # clean up the container once the task finishes
    docker_url='unix://var/run/docker.sock',  # talk to the local Docker daemon
    command='python extract_from_api_or_something.py'
)
I'd recommend investing some time into understanding Docker. It's a little bit difficult to wrap your head around at first but it's a highly valuable tool, especially for systems like Airflow.
