How to periodically run Python code that connects to the internet - python

I'm working on a Python script that connects to the Twitter API to pull some tweets into an array and then pushes them to a MySQL database. It's a pretty basic script, but I'd like to set it up to run weekly.
I'd like to know the best way to deploy it so that it runs automatically every week, without me having to run it manually.

This depends on the platform where you intend to run your Python code. As martin says, this is not really a Python question but more of a scheduling question.

You can create a batch file that activates Python and runs your script, and then use Task Scheduler to schedule the batch file to run weekly.
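For reference, the Task Scheduler side can also be set up from the command line with schtasks; here is a rough sketch that registers a weekly task from Python via subprocess. The task name and batch file path are placeholders, not taken from the question.

    import subprocess

    # Placeholder names -- point these at your actual batch file.
    task_name = "WeeklyTwitterPull"
    batch_file = r"C:\scripts\run_twitter_pull.bat"

    # schtasks is the command-line front end to Windows Task Scheduler.
    # /SC WEEKLY /D MON /ST 09:00 -> run every Monday at 09:00.
    subprocess.run(
        [
            "schtasks", "/Create",
            "/SC", "WEEKLY",
            "/D", "MON",
            "/ST", "09:00",
            "/TN", task_name,
            "/TR", batch_file,
        ],
        check=True,
    )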

Related

How to get Windows scheduler and Batch File to run only the Master Git Branch

I have a Python program doing a query and some data manipulation that is run every 5 minutes through a batch file and Windows Task Scheduler. I also want to start using Git and GitHub for better version control as the code base gets more advanced.
My question is, how do I set up the batch file to only run the code on the master branch? If I am in a separate branch doing some development, I assume that it will run the code in the branch that I am currently in.
Thanks! Also, I would be open to suggestions on how to get away from Windows Task Scheduler. The program takes the data and puts it into a CSV that is then pulled into Tableau, so if I could somehow move that process off my Windows scheduler that would be nice, but I suppose that's a separate question.

How can I schedule python script in the cloud?

I am developing a Python script that downloads some Excel files from a web service. These two files are combined with another one stored locally on my computer to produce the final file. This final file is loaded into a database and a Power BI dashboard to visualize the data.
My question is: how can I schedule this to run daily if my computer is turned off? As I said, two of the files are web scraped (so no problem scheduling those), but one file is stored locally.
One solution that comes to mind: store the local file in Google Drive/OneDrive and download it with the API, so my script does not depend on my computer. But in that case, how could I schedule it? What service would you use? Heroku,...?
I am not entirely sure about your context, but I think you could look into using AWS Lambda for this. It is reasonably easy to set it up and also create a schedule for running code.
It is even easier to achieve this using the serverless framework. This link shows an example built with Python that will run on a schedule.
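As a rough, hedged sketch of what the Lambda side can look like: the handler below just logs a timestamp where your existing download-and-combine logic would go, and the schedule itself lives in the Lambda/EventBridge configuration or in the serverless framework's schedule event, not in the code.

    import datetime

    def handler(event, context):
        # Lambda calls this on every scheduled trigger.
        # Replace the body with your existing download-and-combine logic.
        now = datetime.datetime.utcnow().isoformat()
        print(f"Scheduled run started at {now} UTC")
        return {"status": "ok", "ran_at": now}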
I am using the schedule package for exactly this kind of thing.
It's easy to set up and works very well.
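For what it's worth, a minimal sketch with the schedule package looks roughly like this; note that it only fires while the process itself stays alive, so something still has to keep the script running.

    import time
    import schedule  # pip install schedule

    def job():
        # Put your existing download/processing logic here.
        print("Running the daily job")

    # Run every day at 09:00, local time of the machine running this.
    schedule.every().day.at("09:00").do(job)

    while True:
        schedule.run_pending()
        time.sleep(60)  # check once a minute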

Execute very long-running tasks using Google Cloud

I have been using Google Cloud for a few weeks now and I am facing a big problem given my limited GCP knowledge.
I have a Python project whose goal is to "scrape" data from a website using its API. My project runs a few tens of thousands of requests per execution, and it can take a very long time (a few hours, maybe more).
I have four Python scripts in my project, and it is all orchestrated by a bash script.
The execution is as follows:
The first script checks a CSV file with all the instructions for the requests, executes the requests, and saves all the results in CSV files.
The second script checks the previously created CSV files and builds a new CSV instruction file.
The first script runs again, but with the new instructions, and again saves the results in CSV files.
The second script checks again and does the same...
... and so on a few times.
The third script cleans the data, deletes duplicates, and creates a single CSV file.
The fourth script uploads the final CSV file to bucket storage.
Now I want to get rid of that bash script, and I would like to automate the execution of these scripts approximately once a week.
The problem here is the execution time. Here is what I have already tested:
Google App Engine: the timeout of a request on GAE is limited to 10 minutes, and my functions can run for a few hours, so GAE is not usable here.
Google Compute Engine: my scripts would run at most 10-15 hours a week, and keeping a Compute Engine instance up during all that time would be too pricey.
What could I do to automate the execution of my scripts in a cloud environment? What solutions might I not have thought about, ideally without changing my code?
Thank you
A simple way to accomplish this without the need to get rid of the existing bash script that orchestrates everything would be:
Include the bash script in the startup script for the instance.
At the end of the bash script, include a shutdown command.
Schedule the starting of the instance using Cloud Scheduler. You'll have to make an authenticated call to the GCE API to start the existing instance.
With that, your instance will start on a schedule, it will run the startup script (that will be your existing orchestrating script), and it will shut down once it's finished.
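For the third step, here is a rough sketch of that authenticated call using the google-api-python-client library with Application Default Credentials; the project, zone and instance names are placeholders.

    from googleapiclient import discovery

    # Placeholder identifiers -- substitute your own project/zone/instance.
    project = "my-project"
    zone = "europe-west1-b"
    instance = "scraper-instance"

    # discovery.build picks up Application Default Credentials for auth.
    compute = discovery.build("compute", "v1")
    request = compute.instances().start(project=project, zone=zone, instance=instance)
    print(request.execute())

Alternatively, Cloud Scheduler can call the instances.start REST endpoint directly with an OAuth token, in which case no extra code is needed.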

Parallel Processing in Django

I get a file from the user. Once the file has been uploaded and saved, it has to be analysed.
Since it is a huge file and the analysis takes at least an hour (say), I have a field in the model that records the status of the analysis as Analysing or Analysis Done.
The script for the analysis is a separate Python file, and the analysis has to be done there.
How do I go about doing this? I want this script to run in the background, and I have to deploy on an Apache server.
How should I proceed?
Should I use threads? How do I go about using external Python scripts in threads?
I came to know about cron tabs, but I don't know how I can implement them in this situation.
I can't use Celery, since Celery support has been dropped for Windows.
I came to know about Django management commands, but since I deploy using an Apache server, I don't know whether I can do that.
I can think of a few ways to solve this problem.
If you can batch the processing of the files, then you can run a cron job which will run a Django command or a script at certain intervals to process the files (a sketch of such a command follows below).
If you can't batch the processing, you should look at other queuing systems like django-rq or you can build a simple queuing system using an event dispatch library.
If you really want to use Celery, you can run your whole project inside a Docker container so that you can use Celery 4, since that is your requirement.
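For the first option, here is a minimal sketch of a Django management command that cron (or Task Scheduler) could invoke; the app, model and helper names are made up for illustration.

    # yourapp/management/commands/analyse_pending.py  (hypothetical path)
    from django.core.management.base import BaseCommand

    from yourapp.models import Upload           # hypothetical model
    from yourapp.analysis import analyse_file   # your existing analysis code

    class Command(BaseCommand):
        help = "Analyse uploaded files that are still pending"

        def handle(self, *args, **options):
            for upload in Upload.objects.filter(status="Analysing"):
                analyse_file(upload.file.path)
                upload.status = "Analysis Done"
                upload.save()
                self.stdout.write(f"Analysed upload {upload.pk}")

The scheduler would then run python manage.py analyse_pending at whatever interval you batch on.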

Deploying a Python Script on a Server (CentOS): Where to start?

I'm new to Python (and relatively new to programming in general) and I have created a small Python script that scrapes some data off a site once a week and stores it in a local database (I'm trying to do some statistical analysis on downloaded music). I've tested it on my Mac and would like to put it on my server (a VPS with WiredTree running CentOS 5), but I have no idea where to start.
I tried Googling for it, but apparently I'm using the wrong terms, as "deploying" seems to mean creating an executable file. The only thing that seems to make sense is to set it up inside Django, but I think that might be overkill. I don't know...
EDIT: More clarity
You should look into cron for this, which will allow you to schedule the execution of your Python script.
If you aren't sure how to make your Python script executable, add a shebang to the top of the script, and then add execute permissions to the script using chmod.
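Concretely, the shebang goes on the very first line of the script; the file name below is just an example.

    #!/usr/bin/env python
    # example_scraper.py -- placeholder name for your weekly scraping script.
    # After saving, make it executable with:  chmod +x example_scraper.py

    def main():
        # Your existing scraping and database code goes here.
        print("Weekly scrape starting")

    if __name__ == "__main__":
        main()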
Copy the script to the server.
Test the script manually on the server.
Set cron (crontab -e) to a time that will trigger it soon, to test it.
Once you've debugged any issues, set cron to the appropriate schedule.
Sounds like a job for Cron?
Cron is a scheduler that provides a way to run certain scripts (apps, etc.) at certain times.
Here is a short tutorial that explains how to set up cron.
See this for more general cron information.
Edit:
Also, since you are using CentOS: if you end up having issues with your script later on, it could partly be caused by SELinux. There are ways to disable SELinux on your server (if you have enough access permissions), but there are arguments against disabling SELinux as well.
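As an aside, if you would rather manage the crontab entry from Python itself instead of editing it by hand, the python-crontab package can do that; a rough sketch, with a placeholder path:

    from crontab import CronTab  # pip install python-crontab

    cron = CronTab(user=True)  # the current user's crontab
    job = cron.new(command="/home/me/example_scraper.py", comment="weekly scrape")
    job.setall("0 9 * * 1")    # 09:00 every Monday
    cron.write()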
