I am developing a python script that downloads some excel files from a web service. These two files are combined with another one stored in my computer locally to produce the final file. This final file is loaded to some database and PowerBI dashboard to finally visualize data.
My question is: How can I schedule this to run it daily if my computer is turned off? As I said, two files are web scraped (so no problem to schedule) but one file is stored locally.
One solution that comes to my mind: Store the local file in Google Drive/OneDrive and download it with the API so my script is not dependent of my computer. But if this was the case, how can I schedule that? What service would you use? Heroku,...?
I am not entirely sure about your context, but I think you could look into using AWS Lambda for this. It is reasonably easy to set it up and also create a schedule for running code.
It is even easier to achieve this using the serverless framework. This link shows an example built with Python that will run on a schedule.
I am running the schedule package for exactly something like that.
It’s easy to setup and works very well.
Related
I am working on a project that allows users to upload a python script to an API and run it on a schedule. Currently, I'm trying to figure out a way to limit the functionality of the script so that it cannot access local files, mess with the flask server running the API, etc. Do you have any ideas on how I can achieve this? Is there anyway to make it so only specific libraries are available for importing?
Running other scripts on your server is serious security issue. If you are trying to deploy Python interpreter on your web application, you can try with something like judge0 - GitHub. It is free if you deploy it yourself and it will run scripts safely inside containers.
The simplest way is to ensure the user running the script is not root, but a user specifically designed for this task (e.g. part of a group that can only read and not write or execute). This means at minimum you should ensure all files have the appropriate mode. Then you can just use a pipe or something to run the script.
Alternatively, you could use a runtime that’s not “local”, like a VM or compute service (AWS lambda, etc). The latter would be simplest, and there’s lots of vendors who offer compute service with programmatic api.
We have just signed up with Azure and were wondering how to schedule and run Python scripts that extract data from various sources like APIs, web scrape scripts, etc. What is the best tool on Azure that can run and schedule those scripts as well as save to target destination.
The output of the scripts will be saved to either data lakes and/or azure sql database.
Thank you.
There're several services in azure can do this task.
I suggest you can take use of azure webjobs(it supports python as well as support running as per schedule).
The rough guidelines are as below:
1.Develop your python scripts locally, make sure it can work locally(like extract data from other sources, save to azure database).
2.In azure portal, Create a scheduled WebJob. During creation, you need to upload the .py file(zip all the files into a .zip file); For "Type", please select "Triggered"; in the Triggers dropdown, select "Scheduled"; then specify at which time to run the .py file by using CRON Expression.
3.It's done.
You can also consider other azure services like azure function with time trigger. But the webjob is much more easier.
Hope it helps, and also please let me know if you still have more issues about that.
I am running a python script on Heroku which runs every 10 minutes using the Heroku-Scheduler add-on. The script needs to be able to access the last time it was run. On my local machine I simply used a .txt file which I had update whenever the program was run with the "last run time". The issue is that Heroku doesn't save any file changes when a program is run so the file doesn't update on Heroku. I have looked into alternatives like Amazon S3 and Postgresql, but these seem like major overkill for storing one line of text. Are there any simpler alternatives out there?
If anyone has a similar problem, I ended up finding out about Heroku-redis which allows you to make key-value pairs and then access them. (Like a Cloud Based Python Dictionary)
I've made a PowerApps app which uploads an image to SharePoint. When Flow detects that this image is uploaded, I want to run a custom script that can interact with the Excel file. PowerShell should accomplish that, but I'm completely lost when it comes to running the PowerShell code from Flow.
My goal is to use an Excel macro to combine the image and an Excel file that is stored in the same location in SharePoint. PowerShell will execute the macro and delete the picture after.
I've found this guide "https://flow.microsoft.com/en-us/blog/flow-of-the-week-local-code-execution/", but I don't think it will work for me as the app will be running on more devices than just my local computer.
What technology can I use to run code using Flow as a trigger? The code must have access to a specific SharePoint site as well.
I believe you can create an Azure Function that will execute PowerShell. This will execute from the cloud rather than on your local machine.
I'd also like to add a solution that worked great for me: Using Flow to send HTTP requests to a REST API!
I have a web crawling python script that takes hours to complete, and is infeasible to run in its entirety on my local machine. Is there a convenient way to deploy this to a simple web server? The script basically downloads webpages into text files. How would this be best accomplished?
Thanks!
Since you said that performance is a problem and you are doing web-scraping, first thing to try is a Scrapy framework - it is a very fast and easy to use web-scraping framework. scrapyd tool would allow you to distribute the crawling - you can have multiple scrapyd services running on different servers and split the load between each. See:
Distributed crawls
Running Scrapy on Amazon EC2
There is also a Scrapy Cloud service out there:
Scrapy Cloud bridges the highly efficient Scrapy development
environment with a robust, fully-featured production environment to
deploy and run your crawls. It's like a Heroku for Scrapy, although
other technologies will be supported in the near future. It runs on
top of the Scrapinghub platform, which means your project can scale on
demand, as needed.
As an alternative to the solutions already given, I would suggest Heroku. You can not only deploy easily a website, but also scripts for bots to run.
Basic account is free and is pretty flexible.
This blog entry, this one and this video contain practical examples of how to make it work.
There are multiple places where you can do that. Just google for "python in the cloud", you will come up with a few, for example https://www.pythonanywhere.com/.
In addition, there are also several cloud IDEs that essentially give you a small VM for free where you can develop your code in a web-based IDE and also run it in the VM, one example is http://www.c9.io.
In 2021, Replit.com makes it very easy to write and run Python in the cloud.
If you have a google e-mail account you have an access to google drive and utilities. Choose for colaboratory (or find it in more... options first). This "CoLab" is essentially your python notebook on google drive with full access to your files on your drive, also with access to your GitHub. So, in addition to your local stuff you can edit your GitHub scripts as well.