I am trying to figure out the best way to run a Python process that typically takes 10-30 minutes (an hour at most) on my local machine. The process will be manually triggered, and may not be triggered for hours or days.
I am a bit confused, because I read official MS docs stating that one should avoid long-running processes in Function Apps (https://learn.microsoft.com/en-us/azure/azure-functions/performance-reliability#avoid-long-running-functions), but at the same time, the functionTimeout for the Premium and Dedicated plans can be unlimited.
I am hesitant to use a standard web app with an API since it seems overkill to have it running 24/7.
Are there any ideal resources for this?
You can use consumption-based Azure Durable Functions; they can run for hours or even days.
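For example, a minimal Durable Functions orchestrator in Python might look like the sketch below (the activity name run_long_process and its input are illustrative, and this assumes the azure-functions-durable package plus the usual Functions project scaffolding):

import azure.durable_functions as df

def orchestrator_function(context: df.DurableOrchestrationContext):
    # Hand the long-running work to an activity function; the orchestrator
    # just awaits the result, so the overall run can span hours or days.
    result = yield context.call_activity("run_long_process", "some-input")
    return result

main = df.Orchestrator.create(orchestrator_function)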
I am developing a reporting service (i.e. Database reports via email) for a project on Google App Engine, naturally using the Google Cloud Platform.
I am using Python and Django, but I feel that may be unimportant to my question specifically. I want to allow users of my application to schedule specific cron reports to be sent at specified times of the day.
I know this is completely possible by running a cron job on GAE every minute (using cron.yaml, since I'm using Python) and putting the logic that determines which reports to run in whatever view I point the cron at, but this seems terribly inefficient to me. Seeing as the best answer I have found suggests doing exactly that (Adding dynamic cron jobs to GAE), I wanted an updated suggestion.
Is there at this point in time a better option than running a cron every minute and checking a DB full of client entries to determine which report to fire off?
You may want to have a look at the new Google Cloud Scheduler service (in beta at the moment), which is a fully managed cron job service. It allows you to create cron jobs programmatically via its REST API, so you could create a specific cron job per customer with the appropriate schedule to fit your needs.
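As a rough sketch with the google-cloud-scheduler Python client (the project, region, job name, schedule, and target URL are all illustrative):

from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = "projects/my-project/locations/us-central1"

job = scheduler_v1.Job(
    name=parent + "/jobs/report-customer-42",
    schedule="0 8 * * *",  # this customer's report, 08:00 daily
    time_zone="UTC",
    http_target=scheduler_v1.HttpTarget(
        uri="https://my-app.example.com/run-report?customer=42",
        http_method=scheduler_v1.HttpMethod.POST,
    ),
)
client.create_job(parent=parent, job=job)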
Given this limit, my guess would be NO
Free applications can have up to 20 scheduled tasks. Paid applications can have up to 250 scheduled tasks.
https://cloud.google.com/appengine/docs/standard/python/config/cronref#limits
Another version of your minute-by-minute workaround would be a daily cron task that finds everyone who wants a report that day, and then uses the _eta argument to pinpoint the precise moment in the day for each task to launch.
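A sketch of that daily task, assuming each report record carries a send_at datetime and there is a worker endpoint to do the sending (all names are illustrative):

from google.appengine.api import taskqueue

def daily_cron_handler(reports_due_today):
    for report in reports_due_today:
        # Pin each task to the exact moment it should run later today.
        taskqueue.add(url='/tasks/send_report',
                      params={'report_id': report.id},
                      eta=report.send_at)
        # Equivalently with the deferred library, which is where the
        # _eta spelling comes from:
        # deferred.defer(send_report, report.id, _eta=report.send_at)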
I am creating a web app in Python Flask. What I want is that when I create an invoice, an e-mail is automatically sent on its renewal date to notify that the invoice has been renewed, and so on. Please note that renewal occurs every 3 months. The code is something like:
def some_method():  # an API endpoint that adds an invoice from a request
    # syntax that inserts the details into the database
    # let's assume that the start of the renewal_date is December 15, 2016
How will the automatic email execution be achieved without putting too much stress on the backend? I'm guessing that if there are 300 invoices, the server might be stressed out.
"Write it yourself." No, really.
Keep a database table of invoices to be sent. Every invoice has a status (values such as pending, sent, paid, ...) and an invoice date (which may be in the future). Use cron or similar to periodically run (e.g. 1/hour, 1/day) a program that queries the table for pending invoices for which the invoice date/time has arrived/passed, yet no invoice has yet been sent. This invoicing process sends the invoice, updates the status, and finishes. This invoicer utility will not be integral to your Flask web app, but live beside it as a support program.
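A minimal sketch of that invoicer utility, assuming a SQLAlchemy-style Invoice model with status and invoice_date columns and a send_invoice() mailer (all names are illustrative):

import datetime

def run_invoicer(session):
    now = datetime.datetime.utcnow()
    due = (session.query(Invoice)
                  .filter(Invoice.status == 'pending',
                          Invoice.invoice_date <= now)
                  .all())
    for invoice in due:
        send_invoice(invoice)       # email the invoice
        invoice.status = 'sent'     # move it along the workflow
    session.commit()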
Why? It's a simple and direct approach. It keeps the invoicing code in Python, close to your chosen app language and database. It doesn't require much excursion into external systems or middleware. It's straightforward to debug and monitor, using the same database, queries, and skills as writing the app itself. Simple, direct, reliable, done. What's not to love?
Now, I fully understand that a "write it yourself" recommendation runs contrary to typical "buy not build" doctrine. But I have tried all the main alternatives such as cron and Celery; my experience with a revenue-producing web app says they're not the way to go for a few hundred long-term invoicing events.
The TL;DR--Why Not Cron and Celery?
cron and its latter-day equivalents (e.g. launchd or Heroku Scheduler) run recurring events. Every 10 minutes, every hour, once a day, every other Tuesday at 3:10am UTC. They generally don't solve the "run this once, at time and date X in the future" problem, but they are great for periodic work.
Now that last statement isn't strictly true. It describes cron and some of its replacements. But even traditional Unix provides at as a side-car to cron, and some cron follow-ons (e.g. launchd, systemd) bundle recurring and future event scheduling together (along with other kitchen appliances and the proverbial sink). Even so, there are some issues:
You're relying on external scheduling systems. That means another interface to learn, monitor, and debug if something goes wrong. There's significant impedance mismatch between those system-level schedulers and your Python app. Even if you farm out "run event at time X," you still need to write the Python code to send the invoice and properly move it along your business workflow.
Those systems, beautiful for a handful of events, generally lack interfaces that make reviewing, modifying, monitoring, or debugging hundreds of outstanding events straightforward. Debugging production app errors amidst an ocean of scheduled events is harrowing. You're talking about committing 300+ pending events to this external system. You must also consider how you'll monitor and debug that use.
Those schedulers are designed for "regular" not "high value" or "highly available" operations. As just one gotcha, what if an event is scheduled, but then you take downtime (planned or unplanned)? If the event time passes before the system is back up, what then? Most of the cron-like schedulers lack provisions for robustly handling "missed" events or "making them up at the earliest opportunity." That can be, in technical terms, "a bummer, man." Say the event triggered money collection--or in your case, invoice issuance. You have hundreds of invoices, and issuing those invoices is presumably business-critical. The capability gaps between system-level schedulers and your operational needs can be genuinely painful, especially as you scale.
Okay, what about driving those events into an external event scheduler like Celery? This is a much better idea. Celery is designed to run large numbers of app events. It supports various backend engines (e.g. RabbitMQ) proven in practice to handle thousands upon untold thousands of events, and it has user interface options to help deal with event multitudes. So far so good! But:
You will find yourself dealing with the complexities of installing, configuring, and operating external middleware (e.g. RabbitMQ). The effort yields very high achievable scale, but the startup and operational costs are real. This is true even if you farm much of it out to a cloud service like Heroku.
More important, while great as a job dispatcher for near-term events, Celery is not ideal as a long-wait-time scheduler. In production I've seen serious issues with "long throw" events (those posted a month, or in your case three months, in the future). While the problems aren't identical, Celery's long-throw events, much like cron's, intersect ungracefully with normal app update and restart cycles. This is environment-dependent, but it happens on popular cloud services like Heroku.
The Celery issues are not entirely unsolvable or fatal, but long-delay events don't enjoy the same "Wow! Celery made everything work so much better!" magic that you get for swarms of near-term events. And you must become a bit of a Celery, RabbitMQ, etc. engineer and caretaker. That's a high price and a lot of work for just scheduling a few hundred invoices.
In summary: While future invoice scheduling may seem like something to farm out, in practice it will be easier, faster, and more immediately robust to keep that function in your primary app code (not directly in your Flask web app, but as an associated utility), and just farm out the "remind me to run this N times a day" low-level tickler to a system-level job scheduler.
You can use crontab on Linux. The syntax looks like this:
crontab -e

# fields: minute hour day-of-month month day-of-week, then the command
1 2 3 4 5 /path/to/command arg1 arg2
Or maybe you can have a look at Celery, which I think is a good tool for handling task queues; you may find something useful here: celery.schedules.
EDIT
Schedule Tasks on Linux Using Crontab
HowTo: Add Jobs To cron Under Linux or UNIX?
How to Schedule Tasks on Linux: An Introduction to Crontab Files
If I understand correctly, you'll want to get the date when you generate the invoice, then add 3 months (90 days). You can use datetime.timedelta(days=90) in Python for this. Take a look at: Adding 5 days to a date in Python.
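For instance (bearing in mind that 3 calendar months is only roughly 90 days; dateutil's relativedelta is more precise if calendar months matter):

import datetime
from dateutil.relativedelta import relativedelta

created = datetime.date.today()
renewal_date = created + datetime.timedelta(days=90)   # roughly 3 months
# renewal_date = created + relativedelta(months=3)     # exact calendar months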
From there, you could theoretically spawn a thread with threading.Timer (as seen here: Python - Start a Function at Given Time), but I would recommend against using Python for this part because, as you mention, it would put undue stress on the server (not to mention that if the server goes down, you lose all your scheduling).
Option A (Schedule a task for each invoice):
What would be better is using the OS to schedule a task in the future. If your backend is Linux-based, Cron should work nicely. Take a look at this for ideas: How to setup cron to run a file just once at a specific time in future?. Personally, I like this answer, which suggests creating a file in /etc/cron.d for each task and having the script delete its own file when it has finished executing.
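A rough sketch of that idea, assuming the app has permission to write to /etc/cron.d (often it won't, in which case a per-user crontab or a spool directory is more realistic; the paths and user name are illustrative):

def schedule_invoice_email(invoice_id, dt):
    cron_file = '/etc/cron.d/invoice_%s' % invoice_id
    # Fields: minute hour day-of-month month day-of-week user command.
    # The job deletes its own cron file once it has run.
    line = ('%d %d %d %d * appuser '
            '/usr/bin/python /opt/app/send_email.py %s && rm %s\n'
            % (dt.minute, dt.hour, dt.day, dt.month, invoice_id, cron_file))
    with open(cron_file, 'w') as f:
        f.write(line)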
Option B (Check daily if reminders should be sent):
I know it's not what you asked, but I'll also suggest it might be cleaner to handle this as a daily task. You can schedule a daily cron job like this:
0 22 * * * /home/emailbot/bin/send_reminder_emails.py
So, in this example, at min 0, hour 22 (10pm) every day, every month, every day-of-the-week, check to see if we should send reminder emails.
In your send_reminder_emails.py script, you check a record (could be a JSON or YML file, or your database, or a custom format) for any reminders that need to be sent "today". If there's none, the script just exits, and if there is, you send out a reminder to each person on the list. Optionally, you can clean up the entries in the file as the reminders expire, or periodically.
Then all you have to do is add an entry to the reminder file every time an invoice is generated.
with open("reminder_list.txt", "a") as my_file:
    my_file.write("Invoice# person@email.com 2016-12-22\n")  # one reminder per line
An added benefit of this method is that if your server is down for maintenance, you can keep entries and send them tomorrow by checking whether the email date has passed: datetime.datetime.now() >= datetime.datetime(2016, 12, 22). If you do that, you'll also want to keep a true/false flag that indicates whether the email has already been sent (so that you don't spam customers).
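Putting Option B together, send_reminder_emails.py might look roughly like this, with the sent flag stored as a fourth field (the line format and the send_reminder() mailer are illustrative):

#!/usr/bin/env python
import datetime

def main():
    today = datetime.date.today()
    with open('reminder_list.txt') as f:
        records = [line.split() for line in f if line.strip()]
    kept = []
    for invoice_no, email, date_str, sent in records:
        due = datetime.datetime.strptime(date_str, '%Y-%m-%d').date()
        if sent == 'no' and due <= today:     # <= also catches missed days
            send_reminder(invoice_no, email)  # your mailer function
            sent = 'yes'                      # flag it so customers aren't spammed
        kept.append(' '.join([invoice_no, email, date_str, sent]))
    with open('reminder_list.txt', 'w') as f:
        f.write('\n'.join(kept) + '\n')

if __name__ == '__main__':
    main()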
I have a web application running on django, wherein an end user can enter a URL to process. All the processing tasks are offloaded to a celery queue which sends notification to the user when the task is completed.
I need to stress test this app with the following goals:
to determine breaking points or safe usage limits
to confirm intended specifications are being met
to determine modes of failure (how exactly a system fails)
to test stable operation of a part or system outside standard usage
How do I go about writing my script in Python, given that I also need to take the offloaded Celery tasks into account?
I think there is no need to write your own stress testing script.
I have used www.blitz.io for stress testing. It is set up in minutes, easy to use and it makes beautiful graphs.
It has a 14-day trial, so you can just test the heck out of your system for 14 days for free. This should be enough to find all your bottlenecks.
I have a Python AppEngine app which creates multiple tasks and adds them to a custom task queue. dev_appserver.py seems to ignore the rate/scheduling parameters I specify in queue.yaml and executes all the tasks immediately. This is a problem (at least for dev/testing purposes) as my tasks call a rate-throttled URL; immediate execution of all tasks breaches the throttling limits and returns me a bunch of errors.
Does anyone know if task scheduling in dev_appserver.py is disabled? I can't find anything that suggests this in the AppEngine docs. Can anyone suggest a workaround?
Thank you.
When your app is running in the development server, tasks are automatically executed at the appropriate time just as in production.
You can examine and manipulate tasks from the developer console:
http://localhost:8080/_ah/admin/taskqueue
Documentation here
The documentation lies: the development server doesn't appear to support rate limiting. (This is documented for the Java dev server, but not for Python). You can demonstrate this by pausing a queue by giving it a 0/s rate, but you'll find it executes tasks anyway. When such an app is uploaded to production, it behaves as expected.
I opened a defect.
The rate parameter is not used to set an absolute upper bound on TaskQueue processing. In fact, if you use, for example:
rate: 10/s
bucket_size: 20
the processing can burst up to 20/s. Something more useful would be:
max_concurrent_requests: 1
which sets the maximum number of executions to 1 at a time.
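Put together in queue.yaml, that looks something like this (the queue name is illustrative):

queue:
- name: throttled-queue
  rate: 10/s
  bucket_size: 20
  max_concurrent_requests: 1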
However, this will not stop tasks from executing. If you are adding multiple tasks at a time but know that they need to be executed at a later time, you should probably use countdown, as sketched below:
_countdown using deferred library
countdown using Task class
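For example (the worker URL, queue name, and deferred function are illustrative):

from google.appengine.api import taskqueue
from google.appengine.ext import deferred

def process_item(item_id):
    pass  # illustrative work function

# Delay execution by at least 60 seconds via the Task class / taskqueue.add:
taskqueue.add(url='/worker', queue_name='throttled-queue', countdown=60)

# The same idea via the deferred library:
deferred.defer(process_item, 42, _countdown=60)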