I would like to recreate some data in my project every 30 minutes (prices that change). I also have another job that needs to refresh every minute.
I've heard I should use a daemon, but I'm not sure how that works.
Can someone point me in the right direction?
Also, should I make an extra model to save that temporary data, or is that part of the daemon?
PS: I'm not sure whether Stack Overflow is the right place for this kind of question, but I don't know where else to look for this sort of information.
You don't want a daemon. You just want cron jobs.
The best thing to do is to write your scripts as custom Django management commands and use cron to trigger them to run at the specified intervals.
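A minimal sketch of the cron side, assuming a plain Python script rather than a Django management command (in a Django project the body of main() would live in the handle() method of a BaseCommand subclass, and cron would call manage.py instead). The job names, the functions refresh_prices and refresh_ticker, and the paths in the crontab comment are all hypothetical:

```python
import argparse

# Hypothetical crontab entries (edit with `crontab -e`):
#   # every 30 minutes
#   */30 * * * * /usr/bin/python3 /path/to/jobs.py prices
#   # every minute
#   * * * * * /usr/bin/python3 /path/to/jobs.py ticker


def refresh_prices():
    """Hypothetical 30-minute job: recompute the price data."""
    return "prices refreshed"


def refresh_ticker():
    """Hypothetical 1-minute job."""
    return "ticker refreshed"


JOBS = {"prices": refresh_prices, "ticker": refresh_ticker}


def main(argv=None):
    """Run exactly one job and return; cron starts a fresh process each time."""
    parser = argparse.ArgumentParser(description="Run one scheduled job.")
    parser.add_argument("job", choices=sorted(JOBS))
    args = parser.parse_args(argv)
    print(JOBS[args.job]())
    return 0
```

Put `sys.exit(main())` under an `if __name__ == "__main__":` guard so cron can invoke the file directly. Because every run is a short-lived process, nothing has to stay resident, which is the practical difference from a daemon.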
Basically, I am creating a website. The data it displays is delivered twice daily, and each time I need to read the delivered data from a file and store it in the database in place of the old data.
I have written the Python functions to do this. However, what would be the best way to run this script while my Flask application is running? This may have a very simple answer, but I have seen some answers saying to incorporate the script into the website itself (without explaining how) and others saying to run it separately. The script needs to run automatically throughout the day, with no monitoring or input from me.
TIA
Generally it's a bad idea to have the web server, in your case the Flask application, handle such tasks. There are many reasons; to name a few:
Python's Achilles' heel, the GIL: CPU-bound work running in a thread of the web process competes with request handling.
Shared resources: the job and your users compete for the same process's CPU, memory and database connections.
Crashes: they may be unlikely, but they do happen, and if you are not careful the web application goes down along with the job.
With that in mind, I'd advise you to drop this idea and use cron: write a standalone script that performs whatever transformations or operations it needs, and create a cron job that runs it at the desired times.
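A sketch of such a script for the twice-daily delivery, assuming the delivered file is a CSV with name and value columns and the site reads from a SQLite table called items (the file format, table name and paths are all hypothetical; swap in your real database driver). Replacing the old rows inside a single transaction means the Flask app never reads a half-loaded table:

```python
import csv
import sqlite3

# Hypothetical crontab entry, twice daily at 06:00 and 18:00:
#   0 6,18 * * * /usr/bin/python3 /path/to/load_delivery.py


def load_delivery(lines, conn):
    """Replace the contents of the hypothetical `items` table with the rows
    read from `lines` (an open file or any iterable of CSV lines with a
    `name,value` header). Returns the number of rows loaded."""
    # Parse everything first, so a malformed file fails before the DB is touched.
    rows = [(r["name"], float(r["value"])) for r in csv.DictReader(lines)]
    # `with conn` commits the delete + inserts together, or rolls back on error.
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT PRIMARY KEY, value REAL)")
        conn.execute("DELETE FROM items")
        conn.executemany("INSERT INTO items (name, value) VALUES (?, ?)", rows)
    return len(rows)


def run(csv_path, db_path):
    """Entry point for cron: open the delivered file and the database."""
    with open(csv_path, newline="") as f:
        conn = sqlite3.connect(db_path)
        try:
            return load_delivery(f, conn)
        finally:
            conn.close()
```

Cron runs the script in its own process on its own schedule, so a crash in the loader cannot take the web application down with it.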
Currently, I'm using Google's two-step method to back up the Datastore and then import it into BigQuery.
I also reviewed the pipeline-based code.
Neither method is efficient, and both are expensive, since all the data is imported every time.
I only need to add the records created since the last import.
What is the right way of doing it?
Is there a working example of how to do it in Python?
You could look at streaming inserts. I'm actually looking at doing the same thing in Java at the moment.
If you want to do it every hour, you could add your inserts to a pull queue (either as serialised entities or as keys/IDs) each time you put a new entity into Datastore, and then process the queue hourly with a cron job.
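A local sketch of that idea, using Python's queue.Queue as a stand-in for the App Engine pull queue (an assumption: the real pull-queue API leases tasks rather than popping them, and the batch size here is arbitrary). Each put enqueues the new key; the hourly cron handler drains one batch to stream into BigQuery:

```python
import queue

# Stand-in for an App Engine pull queue (assumption, local only).
pending = queue.Queue()


def on_entity_put(key):
    """Call this alongside every Datastore put of a new entity."""
    pending.put(key)


def drain_batch(q, max_items=500):
    """Hourly cron handler body: take up to max_items queued keys.
    The returned batch is what you would stream into BigQuery."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch
```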
There is no full working example (as far as I know), but I believe the following process could help you:
1- Add a "last time changed" property to your entities and update it on every write.
2- Every hour, run a MapReduce job whose mapper filters on that property and only picks up the entities updated in the last hour.
3- Manually add what needs to be added to your backup.
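Steps 1 and 2 boil down to a watermark filter. A sketch assuming each entity carries a hypothetical updated_at datetime and that you persist the watermark between runs (where you store it is up to you); plain lists of dicts stand in for Datastore and MapReduce:

```python
from datetime import datetime


def changed_since(entities, last_import):
    """Step 2's filter: keep entities whose hypothetical `updated_at`
    timestamp (set on every write, step 1) is newer than the last import."""
    return [e for e in entities if e["updated_at"] > last_import]


def incremental_export(entities, last_import=datetime.min):
    """Return (rows_to_append, new_watermark).

    rows_to_append is what step 3 adds to BigQuery; store new_watermark
    and pass it back in on the next hourly run."""
    rows = changed_since(entities, last_import)
    # Advance the watermark only as far as the newest row actually exported.
    new_mark = max((e["updated_at"] for e in rows), default=last_import)
    return rows, new_mark
```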
As I said, this is pretty high level; the actual answer would require a fair amount of code, and honestly I don't think it suits Stack Overflow's format.
I would like to write a tiny calendar-like application for someone as a birthday present (to be run on Ubuntu). All it should do is display a different picture each day: whenever it's invoked, it should check the date and select the appropriate picture from the collection I provide, and if it just keeps running, it should switch to the next picture when the next day begins.
The date-checking on invocation isn't the problem; my question pertains to the second case: how can I have the program notice the beginning of the next day? My clumsy approach would be to make it check the current date at regular intervals and let it change the displayed picture once there was a change in date, but that strikes me as very roundabout and not particularly elegant.
In case any of you have got some idea of how I could accomplish this, please don't hesitate to reply. I would aim to write the application in either Perl or Python, so suggestions concerning those two languages would be most welcome, but any other suggestions would be appreciated as well.
Thanks a lot for your time!
The answer to this could be very system dependent, because controlling when your program runs is itself system dependent. On all *nix-type systems I would use cron. Assuming for a moment that you are using a *nix system, the answer then depends on what the program actually does.
If it only needs to select an image, I would suggest that it not run continuously: have it exit after selecting the image and be run again the next day (there are plenty of tutorials on setting up cron).
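For the run-once variant, the selection can be a pure function of the date. This sketch assumes the pictures sit in one folder (the path ~/pictures is hypothetical) and simply cycles through them in sorted order; if the collection has one picture per specific date, a dict keyed by date would replace the modulo:

```python
import datetime
import os


def picture_for(day, pictures):
    """Pick one picture per calendar day, cycling through the sorted names.
    Cycling is an assumption; map dates to files directly if you prefer."""
    ordered = sorted(pictures)
    return ordered[day.toordinal() % len(ordered)]


def pictures_in(folder):
    """List image files in folder (the folder itself is hypothetical)."""
    return [n for n in os.listdir(folder)
            if n.lower().endswith((".jpg", ".jpeg", ".png"))]


# Typical run-once use:
#   folder = os.path.expanduser("~/pictures")
#   name = picture_for(datetime.date.today(), pictures_in(folder))
#   path = os.path.join(folder, name)
```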
If, however, it has some form of UI and it is likely (read possible) to keep running for several days, then you can follow two approaches:
Write your program so that it polls the current time periodically and compares dates. Python's timedelta objects can help here. This is essentially your "inelegant" approach.
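The polling approach can be made less roundabout with timedelta: instead of checking every few seconds, compute how long remains until midnight, wait that long, then confirm the date really changed. A sketch with naive local times (an assumption; around DST changes the computed delay can be off by an hour, which re-checking the date and re-arming the timer handles):

```python
import datetime


def seconds_until_next_day(now):
    """Seconds from `now` (a naive local datetime) to the next midnight."""
    next_midnight = datetime.datetime.combine(
        now.date() + datetime.timedelta(days=1), datetime.time.min
    )
    return (next_midnight - now).total_seconds()


def day_changed(shown_date, now):
    """The polling check: is it a different calendar day than when the
    current picture was shown?"""
    return now.date() != shown_date
```

In a Tkinter UI, for example, you could arm `root.after(int(seconds_until_next_day(datetime.datetime.now()) * 1000), callback)` and re-arm it from the callback.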
The other solution would be to send it a signal from cron when you want it to update. This means making the program signal-aware so that it responds to something like SIGUSR1. The Python docs for the signal module cover this, and you can find many tutorials on the web. This approach also works quite nicely for daemonised apps.
I'm sure there are many other approaches too, but those are the ones that come to mind for a quickish and nastyish app.
Have you considered scheduling the invocation of your script?
For me, the best approach is this:
1. Give the script two modes:
run_script
run_script --update
2. Schedule the update run daily with a task scheduler (for example cron).
3. When you want to show the image for the current day, simply run the script without the update option.
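The two modes map naturally onto an argparse flag. A sketch; what the two branches actually do (store and display today's picture) is left as hypothetical placeholders, as is the crontab line:

```python
import argparse

# Hypothetical crontab entry: run the update a few minutes after midnight.
#   5 0 * * * /path/to/run_script --update


def build_parser():
    parser = argparse.ArgumentParser(prog="run_script")
    parser.add_argument(
        "--update",
        action="store_true",
        help="switch to today's picture (intended for the daily scheduled run)",
    )
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.update:
        return "update"  # placeholder: choose and save today's picture
    return "show"  # placeholder: display the picture saved by the last update
```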
If you would like me to expand on any of these, just ask.
Question for Python 2.6
I would like to create a simple web application that runs a script modifying data in the database at a specified time interval. My problem is the code for an infinite loop, or some other way to achieve this. The user should start the script only once; subsequent iterations should run automatically, even after the user leaves the application. If anyone has an idea for detecting when the app breaks, it would be great to see that too. I think threads may be the best way to achieve this. Unfortunately, I've only just started my adventure with Python and don't yet know how to use them.
The application will also have views that show the database and control the looping script.
Any ideas?
You mentioned that you're using Google App Engine. You can schedule recurring tasks by placing a cron.yaml file in your application folder. The details are here.
Update: It sounds like you're not looking for GAE-specific solutions, so the more general advice I'd give is to use the native scheduling abilities of whatever platform you're using. Cron jobs on a *nix host, scheduled tasks on Windows, cron.yaml on GAE, etc.
In your other comments you've suggested wanting something in Python that doesn't leave your script executing, and I don't think there's any way to do this. Some process has to be responsible for kicking off whatever it is you need done, so either you do it in Python and keep a process executing (even if it's just sleeping), or you use the platform's scheduling tools. The OS is almost guaranteed to do a better job of this than your code.
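If you do keep the process alive in Python, as the answer describes, a background thread with a stop Event is one way to do it, and start()/stop()/is_running() could back the control views the question mentions. A sketch written for Python 3 (not 2.6: the daemon keyword and Event.wait's return value differ there); the job and interval are whatever you pass in:

```python
import logging
import threading


class PeriodicWorker:
    """Run `job` every `interval` seconds in a background thread until stopped."""

    def __init__(self, job, interval):
        self.job = job
        self.interval = interval
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() is an interruptible sleep: it returns False on timeout
        # (time for the next run) and True as soon as stop() sets the event.
        while not self._stop_event.wait(self.interval):
            try:
                self.job()
            except Exception:
                # Log and keep looping, so one failed run doesn't kill the thread.
                logging.exception("periodic job failed")

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()
```

The thread lives only as long as the web server process, which is exactly the limitation the answer points out; the logging in _run is a minimal way to notice failures.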
I think you'd want to use cron: write your script, and have cron run it every X minutes or hours.
If you really want to do this in Python, you can do something like this:
import time
TIME_INTERVAL = 60  # seconds between runs
while True:
    # <your app logic here>
    time.sleep(TIME_INTERVAL)
Can you use cron to schedule the job to run at certain intervals? It's usually considered better than infinite loops, and was designed to help solve this sort of problem.
There's a very primitive cron in the Python standard library: the sched module (import sched). There's also threading.Timer.
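Minimal sketches of both standard-library options. Neither persists across restarts or survives the process exiting, which is why the answers still recommend real cron:

```python
import sched
import threading
import time


def run_every(interval, job, times):
    """Use `sched` to run `job` `times` times (times >= 1), `interval`
    seconds apart. Each run re-schedules the next one. Blocks until done."""
    s = sched.scheduler(time.monotonic, time.sleep)

    def tick(remaining):
        job()
        if remaining > 1:
            s.enter(interval, 1, tick, (remaining - 1,))

    s.enter(interval, 1, tick, (times,))
    s.run()


def timer_every(interval, job):
    """threading.Timer fires only once, so restart it from the callback.
    Returns a function that stops the chain."""
    state = {"timer": None, "stopped": False}

    def fire():
        if state["stopped"]:
            return
        job()
        start()

    def start():
        state["timer"] = threading.Timer(interval, fire)
        state["timer"].daemon = True
        state["timer"].start()

    def stop():
        state["stopped"] = True
        state["timer"].cancel()

    start()
    return stop
```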
But as others say, you probably should just use the real cron.
There is a website I want to keep up with, and I scrape some content from it every day. I know the site is updated manually at a certain time, and I've set cron schedules to match, but since it is updated manually, the update could come 10 or even 20 minutes late.
Right now I have a hack-ish cron job that checks every 5 minutes, but I'd like to use the deferred library to do this more precisely. I'm trying to chain deferred tasks: check whether there was an update, and if there was none, defer that same check for a couple of minutes, deferring again as needed until there finally is an update.
I have some code I thought would work, but it only ever defers once, when instead I need to continue deferring until there is an update:
(I am using Python)
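from google.appengine.ext import deferred

class Ripper(object):
    def rip(self):
        # siteHasNotBeenUpdated and updateMySite() come from the full code (not shown)
        if siteHasNotBeenUpdated:
            # Re-queue this same method to run again in 120 seconds
            deferred.defer(self.rip, _countdown=120)
        else:
            updateMySite()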
This is obviously just a simplified excerpt.
I thought this was simple enough to work, but maybe I've just got it all wrong?
The example you give should work just fine. You need to add logging to determine if deferred.defer is being called when you think it is. More information would help, too: How is siteHasNotBeenUpdated set?
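Following the advice to add logging, here is a sketch of the same retry chain that can be exercised locally, with the scheduler passed in so a fake can stand in for deferred.defer. The names check_updated, update_site and schedule are hypothetical; with the real library, the function and its arguments must be picklable (module-level functions rather than lambdas), because deferred pickles them into the task payload:

```python
import logging

logging.basicConfig(level=logging.INFO)


def rip(check_updated, update_site, schedule, delay=120):
    """One attempt: update if the site changed, otherwise queue another attempt.

    check_updated is called on every attempt, so each retry sees fresh data.
    schedule stands in for deferred.defer and receives _countdown like it does.
    """
    if check_updated():
        logging.info("site updated, running update_site()")
        update_site()
        return True
    logging.info("no update yet, re-deferring in %s seconds", delay)
    schedule(rip, check_updated, update_site, schedule, delay, _countdown=delay)
    return False
```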