I have a form with a text field where the user enters a whole number of minutes. That amount is added to the current time, and the resulting time is inserted into a table named Alarm.
When the resulting time comes, my web app must perform an insert into another table.
For example, if the user enters 20 minutes and the current time is 22:10, the resulting time will be 22:30, which is inserted into the Alarm table. Then, when 22:30 arrives, a new insert must be made into the other table.
How can I do this on AppEngine using Python?
Depending on your requirements, you may also want to consider using Tasks with an eta or countdown.
If you plan to allow users to cancel the action, you'd need to use some type of no-op marker the task checks for before adding to the "other" table. Or, make the task check the Alarm table before performing the add.
Also, note that countdown / eta are not precise; they are more like polite requests. So if your queues are backing up with tasks, your adds will happen later than they are supposed to (though cron, particularly 1-minute jobs, also periodically suffers timing issues).
The advantage of this method is that you don't have to figure out how to avoid missing work: each task represents one add (or a related set of adds). Also, if a write fails, the task will retry, which is nice.
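For illustration, here is a minimal sketch of enqueueing such a task with the Python App Engine taskqueue API; the worker URL and parameter name are assumptions, not something from your app:

from google.appengine.api import taskqueue

def schedule_alarm(minutes):
    # countdown defers execution by N seconds; an absolute datetime
    # can be passed via eta= instead
    taskqueue.add(url='/tasks/alarm-fired',        # assumed worker endpoint
                  countdown=minutes * 60,
                  params={'minutes': minutes})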
Cron may be a better solution for your particular problem though.
You've said that you're storing the target time in the Alarm table. So your cron job just has to run every minute (or every 5 or 10, depending on the resolution of your alarms), check whether there's an alarm matching the current time, and if so perform the action.
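As a rough sketch of the cron approach (the model and field names here are assumptions, since the question doesn't give the Alarm schema):

import datetime
from google.appengine.ext import db

class Alarm(db.Model):
    fire_at = db.DateTimeProperty()              # the computed target time
    fired = db.BooleanProperty(default=False)

class LogEntry(db.Model):                        # stand-in for the "other" table
    created = db.DateTimeProperty(auto_now_add=True)

def check_alarms():                              # mapped to the cron URL, e.g. "every 1 minutes"
    now = datetime.datetime.utcnow()
    for alarm in Alarm.all().filter('fired =', False).filter('fire_at <=', now):
        LogEntry().put()                         # the insert the question asks for
        alarm.fired = True
        alarm.put()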
I am using cx_Oracle and the schedule module in Python. The following is pseudocode:
import time

import schedule
import cx_Oracle

def db_operation(query):
    '''
    Some DB operations like:
    1. Get a connection
    2. Execute the query
    3. Commit the result (in case of DML operations)
    '''

schedule.every().hour.at(":10").do(db_operation, query='some_query_1')  # runs at the 10th minute of every hour
schedule.every().day.at("13:10").do(db_operation, query='some_query_2')  # runs at 1:10 p.m. every day

while True:  # dispatch loop: run whichever jobs are due
    schedule.run_pending()
    time.sleep(1)
Both of the above scheduled jobs call the same function (which does some DB operations) and will coincide at 13:10.
Questions:
So how does the scheduler handle this scenario, i.e. running two jobs at the same time? Does it put them in some sort of queue and run them one by one even though the time is the same, or do they run in parallel?
Which one gets picked first? And if I want the first job to take priority over the second, how do I do that?
Also, importantly, only one of these should be accessing the database at a time; otherwise it may lead to inconsistent data. How do I take care of this scenario? Is it possible to put some sort of lock around the function, or should the table be locked somehow?
I took a look at the code of schedule and came to the following conclusions:
The schedule library does not run jobs in parallel or concurrently; jobs that are due are processed one after the other. They are sorted by their due date, and the job that is furthest overdue is executed first.
If jobs are due at the same time, schedule executes them in FIFO order with respect to when they were created. So in your example, some_query_1 would be executed before some_query_2.
Question three answers itself: since only one function can execute at a time, the functions cannot get in each other's way.
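That said, if you ever drive jobs from multiple threads (which schedule itself does not do when driven by a single run_pending() loop), a plain lock around the function would serialize database access. A minimal sketch, not something the schedule library provides:

import threading

db_lock = threading.Lock()

def db_operation(query):
    with db_lock:  # only one thread touches the DB at a time
        # get connection, execute query, commit
        pass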
I am using MySQL database via python for storing logs.
I was wondering if there is an efficient way to remove the oldest rows once the number of rows exceeds a limit.
I was able to do this by executing a query to count the total rows and then deleting the oldest ones by ordering them ascending, but this method takes too much time. Is there a way to make this efficient with a rule defined when creating the table, so that MySQL itself takes care of it when the limit is exceeded?
Thanks in advance.
Well, there's no simple and built-in way to do this in MySQL.
Solutions that use triggers to delete old rows when you insert a new row are risky, because the trigger might fail. Or the transaction that spawned the trigger might be rolled back. In either of these cases, your intended deletion will not happen.
Also, putting the burden of deleting on the thread that inserts new data adds extra work to the insert request, and usually we'd prefer not to make things slower for our current users.
It's more common to run an asynchronous job periodically to delete older data. This can be scheduled to run at off-hours, and run in batches. It also gives more flexibility to archive old data, or execute retries if the deletion or archiving fails or is interrupted.
MySQL does support an EVENT system, so you can run a stored routine on a schedule. But you can only do what a stored routine can do, and it's not easy to make it retry, archive to an external system (e.g. cloud storage), or notify you when it's done.
Sorry, there is no simple solution. There are just too many variations on how people would like it to work, and too many edge cases of potential failure.
The way I'd implement this is to use cron (or a timer thread in my web service) to check the database, say once per hour. If it finds that the number of rows is greater than the limit, it deletes the oldest rows in modestly sized batches (e.g. 1000 rows at a time) until the count is under the threshold.
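A minimal sketch of that job, assuming a logs table with an auto-increment id column and the mysql.connector driver (the question only says "MySQL via Python", so the connection details and limits are placeholders):

import mysql.connector

ROW_LIMIT = 1000000   # assumed cap
BATCH = 1000

def prune_logs():
    conn = mysql.connector.connect(user='app', password='secret',
                                   host='localhost', database='logdb')
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM logs")
    (count,) = cur.fetchone()
    while count > ROW_LIMIT:
        # delete the oldest rows, a modest batch at a time
        cur.execute("DELETE FROM logs ORDER BY id ASC LIMIT %s",
                    (min(BATCH, count - ROW_LIMIT),))
        conn.commit()
        count -= cur.rowcount
    cur.close()
    conn.close()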
I like to write scheduled jobs in a way that can be easily controlled and monitored. So I can make it run immediately if I want, and I can disable or resume the schedule if I want, and I can view a progress report about how much it deleted the last time it ran, and how long until the next time it runs, etc.
Suppose I have a model Event. I want to send a notification (email, push, whatever) to all invited users once the event has elapsed. Something along the lines of:
from django.db import models
from django.contrib.auth.models import User

class Event(models.Model):
    start = models.DateTimeField(...)
    end = models.DateTimeField(...)
    invited = models.ManyToManyField(User)

    def onEventElapsed(self):
        for user in self.invited.all():
            my_notification_backend.sendMessage(target=user, message="Event has elapsed")
Now, of course, the crucial part is to invoke onEventElapsed whenever timezone.now() >= event.end.
Keep in mind, end could be months away from the current date.
I have thought about two basic ways of doing this:
Use a periodic cron job (say, every five minutes or so) which checks if any events have elapsed within the last five minutes and executes my method.
Use celery and schedule onEventElapsed using the eta parameter to be run in the future (within the model's save method).
Considering option 1, a potential solution could be django-celery-beat. However, it seems a bit odd to run a task at a fixed interval just for sending notifications. In addition, I came up with a (potential) issue that would (probably) result in a not-so-elegant solution:
Check every five minutes for events that have elapsed in the previous five minutes? That seems shaky; maybe some events are missed (or others get their notifications sent twice?). Potential workaround: add a boolean field to the model that is set to True once notifications have been sent.
Then again, option 2 also has its problems:
Manually take care of the situation when an event's start/end datetime is moved. When using celery, one would have to store the task ID (easy, of course) and revoke the task once the dates have changed, then issue a new task. But I have read that celery has (design-specific) problems when dealing with tasks that run far in the future: Open Issue on github. I realize how this happens and why it is anything but trivial to solve.
Now, I have come across some libraries which could potentially solve my problem:
celery_longterm_scheduler (but does this mean I cannot use celery as I would have before, because of the different Scheduler class? This also ties into the possible usage of django-celery-beat... Using either of the two frameworks, is it still possible to queue jobs that are just a bit longer-running, but not months away?)
django-apscheduler, uses apscheduler. However, I was unable to find any information on how it would handle tasks that are run in the far future.
Is there a fundamental flaw in the way I am approaching this? I'm glad for any input you might have.
Notice: I know this is likely to be somewhat opinion-based; however, maybe there is a very basic thing that I have missed, regardless of what some might consider ugly or elegant.
We're doing something like this in the company I work for, and the solution is quite simple.
Have a cron job / celery beat task that runs every hour to check whether any notification needs to be sent.
Then send those notifications and mark them as done. This way, even if your notification time is years ahead, it will still be sent. Using ETA is NOT the way to go for very long wait times; your cache / AMQP broker might lose the data.
You can reduce your interval depending on your needs, but make sure the runs don't overlap.
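A minimal sketch of that hourly check, assuming a Notification model with user, text, due_at and sent fields (those names are assumptions, not from the question):

from celery import shared_task
from django.utils import timezone

from myapp.models import Notification           # hypothetical app/model
import my_notification_backend                   # backend from the question

@shared_task
def send_due_notifications():                    # scheduled hourly via celery beat
    due = Notification.objects.filter(sent=False, due_at__lte=timezone.now())
    for note in due:
        my_notification_backend.sendMessage(target=note.user,
                                            message=note.text)
        note.sent = True                         # mark as done so it's never re-sent
        note.save(update_fields=['sent'])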
If one hour is too large a time difference, then what you can do is run a scheduler every hour. The logic would be something like:
- Run a task (let's call this the scheduler task) hourly, via celery beat, that gets all notifications that need to be sent in the next hour.
- Schedule those notifications via apply_async(eta) - this will be the actual sending.
Using that methodology gets you the best of both worlds (eta and beat).
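A sketch of that hybrid, with the same assumed Notification model plus a scheduled flag to keep the two steps from double-booking:

from datetime import timedelta
from celery import shared_task
from django.utils import timezone

from myapp.models import Notification           # hypothetical model
import my_notification_backend

@shared_task
def scheduler_task():                            # run hourly via celery beat
    window_end = timezone.now() + timedelta(hours=1)
    pending = Notification.objects.filter(sent=False, scheduled=False,
                                          due_at__lte=window_end)
    for note in pending:
        deliver.apply_async(args=[note.pk], eta=note.due_at)
        note.scheduled = True
        note.save(update_fields=['scheduled'])

@shared_task
def deliver(note_pk):                            # the actual sending, at the exact eta
    note = Notification.objects.get(pk=note_pk)
    if not note.sent:                            # guard against duplicates
        my_notification_backend.sendMessage(target=note.user, message=note.text)
        note.sent = True
        note.save(update_fields=['sent'])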
An interesting conundrum. Here's what I want to do:
I have a Pyramid (python 2.7.2) website running on Heroku which pushes notifications to my iPhone app users. Each day, every user needs a push notification sent to them at a randomly generated time between 10:00am and 10:00pm (it obviously needs to know the user's timezone as well).
My current plan is the following: use a persistent worker process to trigger a function every minute, on the minute. Each minute, it will call a function (on a different thread, so as not to interrupt the timer) which will do two things:
Check whether it's 11:00pm in each timezone (which will happen 24 times a day, once for each timezone). If so, it will call a function which loops through every user in that timezone, generates their random time for the next day, and stores it in the Mongo database.
Each minute, the worker will also loop through the users and check whether their notification is due at that minute. If it's due, it sends the notification.
My question is: Is there a better way of doing this that doesn't require generating a huge list of random datetimes every day beforehand?
There are certainly other ways. Whether they're better is a different matter. For instance: suppose there are n minutes left before the end of a given user's day and they haven't had their notification yet. Then send them a notification now with probability 1/n. This way, you don't need the huge list of random datetimes, but every minute you still need to iterate over all your users, see whether they've been notified yet, and compute random numbers for them all. It's a little more computation in total (though I doubt the difference is significant), and it means all your database updates are small.
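Here's a self-contained sketch of that trick; call it once per minute per user:

import random

def should_notify_now(minutes_left, already_notified):
    """Return True with probability 1/n, where n is the number of
    minutes left in the user's day. Evaluated once per minute, this
    picks a uniformly random minute from the remaining window."""
    if already_notified or minutes_left <= 0:
        return False
    return random.randrange(minutes_left) == 0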
Or: each time you notify a user, generate their next update time. That way, the next-update times get computed incrementally but are still known in advance.
(If your number of users is relatively small, so that on most minutes there isn't a notification, you can make the scheduling smarter -- but I won't say more about that, because if you have that few users then the amount of work your software needs to do is going to be negligible anyway and there's no point optimizing for that case.)
Here's some pseudo-code:
Once per PERIOD (e.g. 1 minute) in the RANGE:
    Let NOTECOUNT be the number of users still needing notification
    Let FRACTION be the length of the RANGE divided by the PERIOD
    Notify NOTECOUNT / FRACTION users (either the first N or randomly chosen)
    Update the notified users' records with the notification time
At the end of each RANGE:
    Notify all users whose last notification time is at least 24 hours ago
There's nothing explicitly there about how to handle multiple time zones; you could simply consider each supported time zone to require one "instance" of the above process, and the list of users in each time zone would be the candidate list in each instance. Two potential issues are that users may change time zones, and that time zones suck, so some times may occur twice (e.g. when DST changes), so you should think about that.
I'm developing software using the Google App Engine.
I have some considerations about the optimal design regarding the following issue: I need to create and save snapshots of some entities at regular intervals.
In the conventional relational db world, I would create db jobs which would insert new summary records.
For example, a job would insert a record for every active user that would contain his current score to the "userrank" table, say, every hour.
I'd like to know the best method to achieve this on Google App Engine. I know that there is the Cron service, but does it allow us to execute jobs that insert/update thousands of records?
I think you'll find that snapshotting every user's state every hour isn't something that will scale well no matter what your framework. A more ordinary environment will disguise this by letting you have longer running tasks, but you'll still reach the point where it's not practical to take a snapshot of every user's data, every hour.
My suggestion would be this: add a 'last snapshot' field, and override the put() method of your model (assuming you're using Python; the same is possible in Java, but I don't know the syntax), so that whenever you update a record, it checks whether it's been more than an hour since the last snapshot, and if so, creates and writes a snapshot record.
In order to prevent concurrent updates creating two identical snapshots, you'll want to give the snapshots a key name derived from the time at which the snapshot was taken. That way, if two concurrent updates try to write a snapshot, one will harmlessly overwrite the other.
To get the snapshot for a given hour, simply query for the oldest snapshot newer than the requested period. As an added bonus, since inactive records aren't snapshotted, you're saving a lot of space, too.
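A rough sketch of that override with the old db API (the model and property names are made up for illustration):

from datetime import datetime, timedelta
from google.appengine.ext import db

SNAPSHOT_INTERVAL = timedelta(hours=1)

class UserScoreSnapshot(db.Model):
    score = db.IntegerProperty()
    taken = db.DateTimeProperty()

class UserScore(db.Model):
    score = db.IntegerProperty()
    last_snapshot = db.DateTimeProperty()

    def put(self, **kwargs):
        now = datetime.utcnow()
        due = (self.last_snapshot is None or
               now - self.last_snapshot >= SNAPSHOT_INTERVAL)
        if due:
            self.last_snapshot = now
        result = super(UserScore, self).put(**kwargs)
        if due:
            # key name derived from the hour: concurrent updates
            # harmlessly overwrite each other's snapshot
            UserScoreSnapshot(
                key_name='%s-%s' % (self.key().id_or_name(), now.strftime('%Y%m%d%H')),
                score=self.score, taken=now).put()
        return result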
Have you considered using the remote api instead? That way you can get a shell to your datastore and avoid the timeouts. The Mapper class they demonstrate in that link is quite useful, and I've used it successfully to do batch operations on ~1500 objects.
That said, cron should work fine too. You do have a limit on the time of each individual request, so you can't just chew through them all at once, but you can use redirection to loop over as many users as you want, processing one user at a time. There should be an example of this in the docs somewhere if you need help with this approach.
I would use a combination of Cron jobs and a looping url fetch method detailed here: http://stage.vambenepe.com/archives/549. In this way you can catch your timeouts and begin another request.
To summarize the article: the cron job calls your initial process; you catch the timeout error and call the process again, masked as a second URL. You have to ping-pong between the two URLs to keep App Engine from thinking you are in an accidental loop. You also need to be careful that you do not loop infinitely: make sure there is an end state for your updating loop, since it would put you over your quotas pretty quickly if it never ended.