Checking if a flask server is busy or idle - python

I want my Flask server to perform a certain task when it has been idle for some time, let's say 5 minutes, i.e. after 5 minutes without requests it should run another job. For this I need some way to check whether the server is busy or idle. How can I achieve this?

You can save the current datetime in a variable every time a request is made, via the @app.after_request decorator.
Then you need a routine that periodically checks whether the time difference between now and the saved timestamp is greater than 5 minutes. If so, run your task.
For executing your check every 5 minutes you may need some kind of scheduler which is started together with your application.
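A minimal sketch of that idea, assuming APScheduler is available for the periodic check (the names last_request_time and run_idle_task are illustrative, not part of any existing API):
import datetime
from flask import Flask
from apscheduler.schedulers.background import BackgroundScheduler

app = Flask(__name__)
last_request_time = datetime.datetime.utcnow()

@app.after_request
def record_request_time(response):
    # Remember when the most recent request was handled.
    global last_request_time
    last_request_time = datetime.datetime.utcnow()
    return response

def run_idle_task():
    # Called by the scheduler every minute; only does work after 5 idle minutes.
    if datetime.datetime.utcnow() - last_request_time > datetime.timedelta(minutes=5):
        print('Server idle for 5+ minutes, running the other job...')

scheduler = BackgroundScheduler()
scheduler.add_job(run_idle_task, 'interval', minutes=1)
scheduler.start()

if __name__ == '__main__':
    app.run()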

Related

Free Heroku Server: Does Sleep Count as Active Time?

I am planning on having a Python app run on a free Heroku server, but I have read that there is a maximum 18-hour execution time before the process is put to sleep. However, what if my app runs like this -
process something (which should take less than a second).
sleep for 5 minutes.
I plan on having this script run continuously (all day long).
Does the 5 minute sleep count towards the 18 hour time limit?
I think the sleep will be counted as processing time, because you are using a single thread to process a request; that is different from Heroku putting the dyno to "sleep".
From the Heroku docs: "The timeout value is not configurable. If your server requires longer than 30 seconds to complete a given request, we recommend moving that work to a background task or worker to periodically ping your server to see if the processing request has been finished. This pattern frees your web processes up to do more work, and decreases overall application response times."
You can read more here: https://devcenter.heroku.com/articles/request-timeout
If you are willing to wait up to 10 minutes between runs, you can try the Heroku Scheduler add-on (https://elements.heroku.com/addons/scheduler) or use some kind of monitoring service like http://godrb.com/.
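If you go the Scheduler route, the loop-and-sleep design collapses into a one-shot job that the add-on launches every 10 minutes; here is a rough sketch, where process_something is just a placeholder for your actual work:
# one_shot_job.py - sketch of a task for Heroku Scheduler to run every 10 minutes.
def process_something():
    # Placeholder for the work that used to run between the sleeps.
    pass

if __name__ == '__main__':
    process_something()
    # No sleep loop: the process exits and the scheduler starts it again later.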

Restart python script if not running/stopped/error with simple cron job

Summary: I have a Python script which collects tweets using the Twitter API, and I have a PostgreSQL database in the backend which stores all the streamed tweets. I have custom code which works around the rate-limit issue, and I made it run 24/7 for months.
Issue: Sometimes the streaming breaks and the script sleeps for the given number of seconds, but that is not helpful. I do not want to check it manually.
def on_error(self, status):  # tweepy method
    self.mailMeIfError(['me <me@localhost>'], 'listen.py <root@localhost>', 'Error occurred in on_error method', str(status))
    time.sleep(300)
    return True
Assume mailMeIfError is a method which takes care of sending me a mail.
I want a simple cron script which always checks the process and restarts the Python script if it is not running, has stopped, or has hit an error. I have gone through some answers on Stack Overflow where they used the process ID. In my case the process ID still exists, because the script just sleeps when there is an error.
Thanks in advance.
Using Process ID is much easier and safer. Try using watchdog.
This can all be done in your one script. Cron would need to be configured to start your script periodically, say every minute. At startup your script then just needs to determine whether it is the only copy of itself running on the machine. If it spots that another copy is running, it silently terminates; otherwise it continues to run.
This behaviour is called a singleton pattern. There are a number of ways to achieve this, for example Python: single instance of program.
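One common way to get that single-instance behaviour is an exclusive lock on a well-known file: if the lock is already held, another copy is running and the new one exits silently. A rough sketch (Unix-only, and the lock file path is an arbitrary example):
import fcntl
import sys

def acquire_single_instance_lock(path='/tmp/tweet_listener.lock'):
    lock_file = open(path, 'w')
    try:
        # Non-blocking exclusive lock: fails immediately if another copy holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:
        sys.exit(0)  # another instance is already running; terminate silently
    return lock_file  # keep a reference so the lock stays held

if __name__ == '__main__':
    lock = acquire_single_instance_lock()
    # ... start the streaming script here ...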

how to write endless loop crawler in python?

EDITED:
I have a crawler.py that crawls certain sites every 10 minutes and sends me some emails regarding these sites. The crawler is ready and working locally.
How can I adjust it so that the following two things will happen:
It will run in an endless loop on the hosting that I'll upload it to?
Sometimes I will be able to stop it (e.g. for debugging).
At first, I thought of doing an endless loop, e.g.
crawler.py:
while True:
    doCrawling()
    time.sleep(10 * 60)  # 10 minutes
However, according to the answers I got below, this would be impossible, since hosting providers kill processes after a while (just for the question's sake, let's assume processes are killed every 30 minutes). Therefore, my endless-loop process would be killed at some point.
Therefore, I have thought of a different solution:
Let's assume that my crawler is located at "www.example.com/crawler.py" and that each time it is accessed, it executes the function run():
def run():
    doCrawling()
    sleep(10 minutes)
    call URL "www.example.com/crawler.py"
Thus, there will be no endless loop. In fact, every time my crawler runs it will also access the URL, which will execute the same crawler again. So there is no endless loop and no single long-running process, yet my crawler will continue operating forever.
Will my idea work?
Are there any hidden drawbacks I haven't thought of?
Thanks!
As you stated in the comments, you are running on a public shared host like GoDaddy, so cron is not available there and long-running scripts are usually forbidden - your process would be killed even if you were using sleep.
Therefore, the only solution I see is to use an external server which you control to connect to your public server and run the script every 10 minutes. One solution could be using cron on your local machine to hit a specific page on your host with wget or curl. **
Maybe you can find online services that allow running a script periodically and use those, but I know of none.
** Bonus: you can get the results directly in the response, without having to send yourself an email.
Update
So, in your updated question you propose to use your script to call itself with an HTTP request. I thought of it before, but I didn't consider it in my previous answer because I believed it wouldn't work (in general).
My concern is: will the server kill a script if the HTTP connection requesting it is closed before the script terminates?
In other words: if you open yoursite.com/script.py and it takes 60 seconds to run, and you close the connection with the server after 10 seconds, will the script run till its regular end?
I thought that the answer was obviously "no, the script will be killed", and therefore that the method would be useless, because you would have to guarantee that a script calling itself via an HTTP request stays alive longer than the script it calls. I did a little experiment using Flask, and it proved me wrong:
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    import time
    print('Script started...')
    time.sleep(5)
    print('5 seconds passed...')
    time.sleep(5)
    print('Script finished')
    return 'Script finished'

if __name__ == '__main__':
    app.run()
If I run this script, make an HTTP request to localhost:5000, and close the connection after 2 seconds, the script continues to run until the end and the messages are still printed.
Therefore, with Flask, if you can make an asynchronous request to yourself, you should be able to have an "infinite loop" script.
I don't know the behavior on other servers, though. You should run a test.
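One way to sketch the "asynchronous request to yourself" part is to fire the request with a very short timeout and swallow the resulting exception, so the current run never waits for the next one to finish. This assumes the requests library and reuses the URL from the question; it is only an illustration:
import requests

def trigger_next_run(url='http://www.example.com/crawler.py'):
    try:
        # Give up on reading the response almost immediately; if the server
        # keeps running the script after the connection closes (as in the
        # Flask experiment above), the next crawl still happens.
        requests.get(url, timeout=1)
    except requests.exceptions.RequestException:
        pass  # a timeout or dropped connection is expected here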
Control
Assuming your server allows you to make a GET request and keeps the script running even after the connection is closed, you still have a few things to take care of. For example, your script has to run fast enough to complete within the server's maximum execution time, and to make it run every 10 minutes with, say, a maximum allowance of 1 minute per call, you have to count the calls and only crawl on every tenth one.
In addition, this mechanism has to be controlled, because you cannot interrupt it for debugging as you requested. At least, not directly.
Therefore, I suggest you use files: use a file to split your crawling into smaller steps, each able to finish in less than one minute, and then continue from where you left off when the script is called again.
Use a file to count how many times the script has been called before actually doing the crawling. This is necessary if, for example, the script is allowed to live for 90 seconds but you want to crawl every 10 hours.
Use a file to control the script: store a boolean flag that you use to stop the recursion mechanism if you need to.
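A rough sketch of the counter and stop-flag files described above (the file names and the calls-per-crawl ratio are arbitrary examples):
import os

COUNTER_FILE = 'call_count.txt'
STOP_FILE = 'stop.flag'
CALLS_PER_CRAWL = 10  # e.g. called every minute, crawl on every 10th call

def should_crawl():
    # Stop flag: create this file to break the self-calling chain for debugging.
    if os.path.exists(STOP_FILE):
        return False
    # Counter file: only do real work on every CALLS_PER_CRAWL-th call.
    count = 0
    if os.path.exists(COUNTER_FILE):
        with open(COUNTER_FILE) as f:
            count = int(f.read() or 0)
    count += 1
    with open(COUNTER_FILE, 'w') as f:
        f.write(str(count))
    return count % CALLS_PER_CRAWL == 0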
If you're using Linux you should just do a cron job for your script. Info: http://code.tutsplus.com/tutorials/scheduling-tasks-with-cron-jobs--net-8800
If you are running Linux I would set up an Upstart script (http://upstart.ubuntu.com/getting-started.html) to turn it into a service.
It offers a lot of advantages like:
- Starting at system boot
- Automatic restart on crashes
- Manageable: service mycrawler restart
...
Or, if you would prefer to have it run every 10 minutes, forget about the endless loop and set up a cron job: http://en.wikipedia.org/wiki/Cron

Django, sleep() pauses all processes, but only if no GET parameter?

Using Django (hosted by Webfaction), I have the following code
import time
from django.http import HttpResponse

def my_function(request):
    time.sleep(10)
    return HttpResponse("Done")
This is executed via Django when I go to my url, www.mysite.com
I enter the url twice, immediately after each other. The way I see it, both of these should finish after 10 seconds. However, the second call waits for the first one and finishes after 20 seconds.
If, however, I enter some dummy GET parameter, www.mysite.com?dummy=1 and www.mysite.com?dummy=2 then they both finish after 10 seconds. So it is possible for both of them to run simultaneously.
It's as though the scope of sleep() is somehow global?? Maybe entering a parameter makes them run as different processes instead of the same???
It is hosted by Webfaction. httpd.conf has:
KeepAlive Off
Listen 30961
MaxSpareThreads 3
MinSpareThreads 1
ServerLimit 1
SetEnvIf X-Forwarded-SSL on HTTPS=1
ThreadsPerChild 5
I do need to be able to use sleep() and trust that it isn't stopping everything. So, what's up and how to fix it?
Edit: Webfaction runs this using Apache.
As Gjordis pointed out, sleep will pause the current thread. I have looked at Webfaction and it looks like they are using WSGI for running the serving instance of Django. This means that every time a request comes in, Apache will look at how many worker processes (processes that each run an instance of Django) are currently running. If there are none, or too few, it will spawn additional workers and hand the requests to them.
Here is what I think is happening in your situation:
first GET request for resource A comes in. Apache uses a running worker (or starts a new one)
the worker sleeps 10 seconds
during this, a new request for resource A comes in. Apache sees it is requesting the same resource and sends it to the same worker that handled the first request. I guess the assumption here is that a worker that recently processed a request for a specific resource is more likely to have some information cached/preprocessed/whatever, so it can handle the follow-up request faster
this results in a 20 second block since there is only one worker that waits 2 times 10 seconds
This behavior makes complete sense 99% of the time so it's logical to do this by default.
However, if you change the requested resource for the second request (by adding a GET parameter), Apache will assume that this is a different resource and will start another worker (since the first one is already "busy"; Apache cannot know that you are not doing any hard work). Since there are now two workers, each waiting 10 seconds, the total time goes down to 10 seconds.
Additionally, I assume that something is wrong with your design. There are almost no cases I can think of where it would be sensible not to respond to an HTTP request as fast as you can. After all, you want to serve as many requests as possible in the shortest amount of time, so sleeping for 10 seconds is the most counterproductive thing you can do. I would recommend that you create a new question and state what the actual goal is that you are trying to achieve. I'm pretty sure there is a more sensible solution to this!
Assuming you run your Django server just with run(), by default this gives you a single-threaded server. If you use sleep in a single-threaded process, the whole application freezes for that sleep time.
It may simply be that your browser is queuing the second request to be performed only after the first one completes. If you are opening your URLs in the same browser, try using the two different ones (e.g. Firefox and Chrome), or try performing requests from the command line using wget or curl instead.

Timed email reminder in python

I have written a Python script that allows a user to input a message, their email address, and the time and date at which they would like the email to be sent. This is all stored in a MySQL database.
However, how do I get the script to execute at the given time and date? Will it require a cron job? I mean, say at 2:15 on April 20th, the script will search the database for all entries scheduled for 2:15 and send out those emails. But what about emails scheduled for 2:16?
I am using a shared hosting provider, so I can't have a continuously running script.
Thanks
If you cannot have a continuously running script, something must trigger it, so that would have to rely on your OS internals. In a Unix environment a cron job, as you state yourself, would do the trick.
Set cron to run the script, make the script wait for a given time, then keep running and sending until the next email is more than this given time away. Then make your script add a new cron job for the next wakeup time.
Looks like this django application was made just for people in your situation...
http://code.google.com/p/django-cron/
Also, your design seems a little flawed. If it's running at 2:15 you wouldn't want to send out just the emails scheduled for 2:15, but all the ones that should have been sent in the past and have not been sent yet.
Your database should either:
A. Delete the entries once they are sent
or
B. Have a column defined on your database table to store whether it was sent or not. Then your logic should make use of that column.
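As a sketch of option B, the cron-run sender could look roughly like this; the table and column names, connection details, and the send_email stub are all made up for illustration:
import MySQLdb

def send_email(address, message):
    # Placeholder: plug in smtplib or your existing sending code here.
    print('Would send to %s: %s' % (address, message))

def send_due_emails():
    db = MySQLdb.connect(host='localhost', user='user', passwd='secret', db='reminders')
    cur = db.cursor()
    # Everything that is due now or in the past and not yet sent, so reminders
    # missed by a late cron run are still delivered.
    cur.execute("SELECT id, address, message FROM reminders "
                "WHERE send_at <= NOW() AND sent = 0")
    for row_id, address, message in cur.fetchall():
        send_email(address, message)
        cur.execute("UPDATE reminders SET sent = 1 WHERE id = %s", (row_id,))
    db.commit()
    db.close()

if __name__ == '__main__':
    send_due_emails()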
A cron job every minute or so would do it. If you're considering this, you might like to mind two things:
1 - How many e-mails are expected to be sent per minute? If it takes you 1 second to send an e-mail and you have 100 e-mails per minute, you won't finish your queue.
2 - What will happen if one job starts before the last one finishes? Be careful not to send e-mails twice. You either need to make sure the first process has ended (risk: you may eventually drop an e-mail), prevent the next process from starting (risk: the first process hangs the whole queue), or make them work in parallel (risk: synchronization problems).
If you take daramarak's suggestion - making your script add a new cron job at the end - you run the risk of the whole system collapsing if one error occurs.
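For the "job starts before the last one finishes" case, one option is to have the cron job create a lock file atomically and exit if it already exists. A rough sketch (the lock path is an arbitrary example, and note the stale-lock risk already mentioned if a run crashes without cleaning up):
import os
import sys

LOCK_PATH = '/tmp/email_sender.lock'

def send_due_emails():
    pass  # placeholder: query the database and send whatever is due

def main():
    try:
        # O_CREAT | O_EXCL fails if the file already exists, i.e. another run is active.
        fd = os.open(LOCK_PATH, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        sys.exit(0)  # previous run still going (or crashed and left the lock behind)
    try:
        send_due_emails()
    finally:
        os.close(fd)
        os.remove(LOCK_PATH)

if __name__ == '__main__':
    main()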
