Cron job timeout when using more than one module (ndb, Python, GAE)

Is there something special I need to do when working with cron jobs for separate modules? I can make a request to the cron job at localhost:8083/tasks/crontask (localhost:8083 runs the workers module), which is supposed to just print a simple line, but nothing is printed to the console. If I go to http://localhost:8000/cron and hit the Run button, it says the request was successful, but even that doesn't make it print to the console.
If I refresh the page localhost:8083/tasks/crontask as a way of triggering the cron job, it times out.
Again, if I go to localhost:8001 and hit the Run button, it says the request to /tasks/crontask was successful, but it doesn't print to the console like it's supposed to.
In send_notifications_handler.py in the workers/handlers directory:
class CronTaskHandler(BaseApiHandler):
    def get(self):
        print "hello, this is a cron job"
In cron.yaml, outside the workers module:
cron:
- description: something
  url: /tasks/crontask
  schedule: every 1 minutes
  target: workers
In __init__.py in the workers/handlers directory:
from send_notifications_handler import CronTaskHandler
#--- Packaging
__all__ = [
    CounterWorker,
    DeleteGamesCronHandler,
    CelebrityCountsCronTaskHandler,
    QuestionTypeCountsCronHandler,
    CronTaskHandler
]
In workers/routes.py:
    Route('/tasks/crontask', handlers.CronTaskHandler, methods=['GET']),
Updates / resolution:
The print statement is fine and does print to the console.
Yes, the cron job will fire once when using the dev server, although it doesn't repeat, which is normal behavior for the dev server.
The problem was that /_ah/start in that module was routed to a pull queue that never stops. Removing the pull queue fixed the issue.

That is actually the expected behavior when executing cron jobs locally.
If you take a look at the docs, they say the following:
"The development server doesn't automatically run your cron jobs. You can use your local desktop's cron or scheduled tasks interface to trigger the URLs of your jobs with curl or a similar tool."
You will need to trigger cron jobs manually on the local server by visiting http://localhost:8000/cron, as you mentioned in your post.
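For local testing, the docs' suggestion of triggering the job URL with curl or a similar tool could also be scripted in Python. The following is only a minimal sketch and not part of the original answer: the helper names trigger_cron and run_every are hypothetical, and the URL in the note below assumes the workers module from the question is listening on port 8083.

```python
import time
import urllib.request


def trigger_cron(url, timeout=10):
    """Send one GET request to a cron handler URL and return the HTTP status."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_every(interval_seconds, job, repetitions, sleep=time.sleep):
    """Call job() `repetitions` times, sleeping between calls, and collect the results."""
    results = []
    for i in range(repetitions):
        results.append(job())
        if i < repetitions - 1:
            sleep(interval_seconds)  # no sleep after the final call
    return results
```

For example, run_every(60, lambda: trigger_cron("http://localhost:8083/tasks/crontask"), repetitions=3) would request the handler three times, one minute apart. The sleep function is a parameter only so the loop can be exercised without real waiting.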


Related

Is it possible to run a FastAPI application from the command line?

We can run any Python script with:
python main.py
Is it possible to do the same if the script is a FastAPI application?
Something like:
python main.py GET /login.html
to call a GET method that returns a login.html page.
If not, how could I start a FastAPI application without using Uvicorn or another web server?
I would like to run the script only when necessary.
Thanks
FastAPI is designed to let you build APIs that are queried with an HTTP client, not to query those APIs directly yourself; technically, however, I believe you could.
When you start the script, you could start the FastAPI app in another process running in the background, then send a request to it.
import subprocess
import threading
import time
import requests

url = "http://localhost:8000/some_path"  # uvicorn serves on port 8000 by default
# launch uvicorn in a background thread, capturing stdout and discarding stderr
thread = threading.Thread(target=lambda: subprocess.check_output(["uvicorn", "main:app"], stderr=subprocess.DEVNULL))
thread.start()
time.sleep(2)  # crude wait so the server is listening before the request
response = requests.get(url)
# do something with the response...
thread.join()
Obviously this snippet has much room for improvement; for example, the thread will never actually end unless something bad happens. This is just a minimal example.
This method has the clear drawback of starting the API each time you want to run the command. A better approach would be to emulate applications such as Docker: start a local server daemon, then query it from the command-line app.
This would mean the API runs much longer in the background, but such APIs are typically fairly light and you shouldn't notice any hit to your computer's performance. It also has the benefit that multiple users can run the command at the same time.
With the first method, you may run into a situation where user A sends a GET request, starting the server and taking hold of the configured host/port combination. When user B tries to run the same command just after, they will be unable to start the server and perform the request.
The daemon approach will also let you move the API to an external server with minimal effort down the line. All you would need to do is change the base URL of the requests.
TL;DR: Run the FastAPI application as a daemon, and query the local server from the command-line program instead.
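To make the daemon-plus-client split concrete, here is a rough sketch using only the standard library. It swaps in http.server as a stand-in for the FastAPI app so that it runs without extra packages; LoginHandler, start_server and cli_get are hypothetical names, and in a real setup the server would be a separately started uvicorn process rather than a thread.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class LoginHandler(BaseHTTPRequestHandler):
    """Stand-in for the API: serves a tiny page at /login.html."""

    def do_GET(self):
        if self.path == "/login.html":
            body = b"<html><body>login page</body></html>"
            self.send_response(200)
        else:
            body = b"not found"
            self.send_response(404)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server():
    """Start the stand-in server on a free local port and return it."""
    server = HTTPServer(("127.0.0.1", 0), LoginHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def cli_get(base_url, path):
    """What the command-line client does: one GET against the running server."""
    with urllib.request.urlopen(base_url + path) as response:
        return response.status, response.read().decode()


server = start_server()
base_url = "http://127.0.0.1:%d" % server.server_port
status, page = cli_get(base_url, "/login.html")
server.shutdown()
server.server_close()
```

The point of the split is that cli_get only needs a base URL, so pointing it at an external server later means changing that one value.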

How to run a function periodically in django/celery without having a worker (or using Schedule)?

I would like to run my function do_periodic_server_error_checking every ten minutes, but I do not wish to:
- have it run on a worker. I'd like it to run on the server itself (as far as I can see, there is no way to get apply_async, delay, or @periodic_task to run on the server)
- use Schedule (I'm not sure why people recommend a busy-waiting approach for a Django server; maybe I'm missing something about how Schedule works)
My best idea so far is to schedule a cron job that makes an HTTP request to the server, but I'm wondering if there's an easier way from within Django itself?

Restart python script if not running/stopped/error with simple cron job

Summary: I have a Python script that collects tweets using the Twitter API, with a PostgreSQL database in the backend that stores all the streamed tweets. I have custom code that works around the rate-limit issue, and I have made it run 24/7 for months.
Issue: Sometimes streaming breaks and the script sleeps for the given number of seconds, but that doesn't help. I do not want to check it manually.
def on_error(self, status):  # tweepy method
    self.mailMeIfError(['me <me@localhost>'], 'listen.py <root@localhost>', 'Error Occurred on_error method', str(status))
    time.sleep(300)
    return True
Assume mailMeIfError is a method that takes care of sending me an email.
I want a simple cron script that regularly checks the process and restarts the Python script if it is not running, has hit an error, or has broken. I have gone through some answers on Stack Overflow that use the process ID. In my case the process ID still exists, because the script sleeps on error.
Thanks in advance.
Using Process ID is much easier and safer. Try using watchdog.
This can all be done in your one script. Cron would need to be configured to start your script periodically, say every minute. At startup, your script just needs to determine whether it is the only copy of itself running on the machine. If it spots that another copy is running, it silently terminates; otherwise it continues to run.
This behaviour is called a Singleton pattern. There are a number of ways to achieve it; see, for example, "Python: single instance of program".
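One possible way to do the "am I the only copy?" check is an exclusive lock on a lock file. The sketch below assumes a Unix-like system (it relies on fcntl.flock); the function name and lock file name are hypothetical. It takes the lock twice in the same process purely to show that a second holder is refused.

```python
import fcntl
import os
import tempfile


def acquire_single_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if someone else holds it."""
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    return lock_file


# Hypothetical lock location; a fresh temp directory keeps the demo self-contained.
lock_path = os.path.join(tempfile.mkdtemp(), "listen_py.lock")
first = acquire_single_instance_lock(lock_path)   # this "copy" gets the lock
second = acquire_single_instance_lock(lock_path)  # a second "copy" is refused
```

In a script started by cron every minute, a None result would be the signal to exit quietly. The lock is released when the holding process exits, so a crashed script does not block the next start, but a script that is merely sleeping after an error still holds it.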

how to write endless loop crawler in python?

EDITED:
I have a crawler.py that crawls certain sites every 10 minutes and sends me some emails regarding these sites. The crawler is ready and working locally.
How can I adjust it so that the following two things happen:
- It runs in an endless loop on the hosting that I'll upload it to.
- I am able to stop it sometimes (e.g. for debugging).
At first, I thought of using an endless loop, e.g.
crawler.py:
from time import sleep
while True:
    doCrawling()
    sleep(10 * 60)  # 10 minutes
However, according to the answers I got below, this is not possible, since hosting providers kill processes after a while (for the sake of the question, let's assume processes are killed every 30 minutes). My endless-loop process would therefore be killed at some point.
So I have thought of a different solution:
Let's assume that my crawler is located at "www.example.com/crawler.py" and that each time it is accessed, it executes the function run():
run():
    doCrawling()
    sleep(10 minutes)
    call URL "www.example.com/crawler.py"
Thus, there will be no endless loop. Every time my crawler runs, it also accesses the URL that executes the same crawler again. There would be no endless loop and no long-running process, and my crawler would continue operating forever.
Will my idea work?
Are there any hidden drawbacks I haven't thought of?
Thanks!
As you stated in the comments, you are running on a public shared server such as GoDaddy. Cron is therefore not available, and long-running scripts are usually forbidden: your process would be killed even if you were using sleep.
Therefore, the only solution I see is to use an external server that you control to connect to your public server and run the script every 10 minutes. One option is to use cron on your local machine to request a specific page on your host with wget or curl. **
Maybe you can find online services that run a script periodically and use those, but I know of none.
** Bonus: you can get the results directly in the response without having to send yourself an email.
Update
So, in your updated question you propose to have your script call itself with an HTTP request. I thought of that before, but I didn't include it in my previous answer because I believed it wouldn't work (in general).
My concern is: will the server kill a script if the HTTP connection requesting it is closed before the script terminates?
In other words: if you open yoursite.com/script.py and it takes 60 seconds to run, and you close the connection with the server after 10 seconds, will the script run to its regular end?
I thought the answer was obviously "no, the script will be killed", which would make the method useless, because you would have to guarantee that a script calling itself via an HTTP request stays alive longer than the called script. I did a little experiment using Flask, and it proved me wrong:
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    import time
    print('Script started...')
    time.sleep(5)
    print('5 seconds passed...')
    time.sleep(5)
    print('Script finished')
    return 'Script finished'

if __name__ == '__main__':
    app.run()
If I run this script, make an HTTP request to localhost:5000, and close the connection after 2 seconds, the script continues to run until the end and the messages are still printed.
Therefore, with Flask, if you can make an asynchronous request to yourself, you should be able to have an "infinite loop" script.
I don't know how other servers behave, though. You should test it.
Control
Assuming your server allows you to make a GET request and keeps the script running even if the connection is closed, you have a few things to take care of. For example, your script still has to run fast enough to complete within the server's maximum time allowance, and to make it crawl every 10 minutes with a maximum allowance of 1 minute, you have to count 10 calls between crawls.
In addition, this mechanism has to be controlled, because you cannot interrupt it for debugging as you requested. At least, not directly.
Therefore, I suggest you use files:
- Use a file to split your crawling into smaller steps, each able to finish in less than one minute, and then continue when the script is called again.
- Use a file to count how many times the script has been called before actually doing the crawling. This is necessary if, for example, the script is allowed to live 90 seconds but you want to crawl every 10 hours.
- Use a file to control the script: store a boolean flag that you use to stop the recursion mechanism if you need to.
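The call counter and the stop flag could be sketched roughly as follows. This is an illustration under assumptions rather than a tested recipe for any particular host: the file names calls.txt and stop.flag and the function names are hypothetical, and splitting the crawl itself into steps is left out.

```python
import os
import tempfile


def read_counter(path):
    """Read the stored call count, treating a missing or empty file as zero."""
    try:
        with open(path) as f:
            return int(f.read().strip() or 0)
    except FileNotFoundError:
        return 0


def next_action(state_dir, calls_per_crawl):
    """Decide what one invocation should do: 'stop', 'wait', or 'crawl'."""
    if os.path.exists(os.path.join(state_dir, "stop.flag")):
        return "stop"  # do not crawl and do not call yourself again
    counter_path = os.path.join(state_dir, "calls.txt")
    count = read_counter(counter_path) + 1
    action = "crawl" if count >= calls_per_crawl else "wait"
    with open(counter_path, "w") as f:
        f.write(str(0 if action == "crawl" else count))  # reset after a crawl
    return action


state_dir = tempfile.mkdtemp()
actions = [next_action(state_dir, 3) for _ in range(4)]
open(os.path.join(state_dir, "stop.flag"), "w").close()  # "switch off" for debugging
stopped = next_action(state_dir, 3)
```

Creating stop.flag (for instance over FTP) breaks the chain of self-calls, and deleting it plus one manual request would start it again.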
If you're using Linux, you should just set up a cron job for your script. Info: http://code.tutsplus.com/tutorials/scheduling-tasks-with-cron-jobs--net-8800
If you are running Linux, I would set up an Upstart script (http://upstart.ubuntu.com/getting-started.html) to turn it into a service.
It offers a lot of advantages, like:
- Starting at system boot
- Auto restart on crashes
- Manageable: service mycrawler restart
...
Or, if you would prefer to have it run every 10 minutes, forget about the endless loop and use a cron job: http://en.wikipedia.org/wiki/Cron

How to make my program add a cron job automatically with GAE?

I need to make a bot that can automatically add a cron job for itself, but I don't think I can access the cron.yaml file on the GAE server. What can I do about this?
You could tell the bot to add the new schedule to your datastore instead.
Then create a single "master" cron job with a 1-minute schedule that checks the schedules you have stored in the datastore. The cron job would then determine whether, at the current time, the handler for an associated schedule needs to be invoked.
If it does, the master cron job would then invoke the stored job using the TaskQueue API.
It's true that a lot of dev environments don't give you access to the cron.yaml file; however, you can run a Python script locally that communicates with your deployed program, edits your local copy of cron.yaml, and pushes up the changes.
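The decision logic of such a master job might look like the sketch below. It deliberately leaves out the real datastore and TaskQueue calls: the schedules are plain dicts with assumed fields (url, interval_minutes, last_run), and enqueue is any callable standing in for adding a task to a queue.

```python
from datetime import datetime, timedelta


def due_jobs(schedules, now):
    """Return the stored schedules whose interval has elapsed since their last run."""
    due = []
    for job in schedules:
        last_run = job["last_run"]
        interval = timedelta(minutes=job["interval_minutes"])
        if last_run is None or now - last_run >= interval:
            due.append(job)
    return due


def master_cron_tick(schedules, now, enqueue):
    """One run of the master job: enqueue every due job and record the run time."""
    for job in due_jobs(schedules, now):
        enqueue(job["url"])
        job["last_run"] = now


# Hypothetical stored schedules, standing in for datastore entities.
now = datetime(2024, 1, 1, 12, 0)
schedules = [
    {"url": "/tasks/a", "interval_minutes": 5, "last_run": now - timedelta(minutes=5)},
    {"url": "/tasks/b", "interval_minutes": 10, "last_run": now - timedelta(minutes=3)},
    {"url": "/tasks/c", "interval_minutes": 1, "last_run": None},
]
enqueued = []
master_cron_tick(schedules, now, enqueued.append)
```

Here /tasks/a and /tasks/c are due and get enqueued, while /tasks/b waits for a later tick. In the real handler, the updated last_run values would have to be written back to the datastore.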
