I have a Telegram bot written in Python, but the worker dyno restarts it (with the same command) roughly every 24 hours. I am using free dyno hours. How can I disable this restart?
You cannot. You have to design your app so it works correctly across those restarts.
See docs:
Dynos are also restarted (cycled) at least once per day to help maintain the health of applications running on Heroku. Any changes to the local filesystem will be deleted. The cycling happens once every 24 hours (plus up to 216 random minutes, to prevent every dyno for an application from restarting at the same time). Manual restarts (heroku ps:restart) and releases (deploys or changing config vars) will reset this 24 hour period. Cycling happens for all dynos, including one-off dynos, so dynos will run for a maximum of 24 hours + 216 minutes.
Ideally you'd run at least two dynos for the same app at the same time for increased stability. Heroku will ensure they don't restart at the exact same time, so there will always be at least one dyno online and responding to requests.
After receiving SIGTERM, you have 30 seconds to finish the work you are doing on existing requests before your process is killed, as explained here.
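For a worker such as a polling Telegram bot, a minimal sketch of a graceful shutdown might look like this (the polling step is a placeholder; assuming a simple loop):

import signal
import sys
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # Heroku sends SIGTERM when cycling the dyno; SIGKILL follows ~30 s later.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def poll_updates():
    # Placeholder for the bot's actual polling/work step.
    time.sleep(1)

while not shutting_down:
    poll_updates()

# SIGTERM received: persist any state externally (e.g. a database), since
# the local filesystem is wiped on restart, then exit cleanly.
sys.exit(0)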
Related
We've been running a backend application on Cloud Run for about a year and a half, and a month ago it suddenly stopped handling requests properly at seemingly random times (roughly every couple of days), only working again once we redeploy from the latest image from Cloud Build. The application actually receives each request, but it just doesn't do anything, and eventually the request times out (504) after 59m59s (the maximum timeout); even a test endpoint that just returns 'Hello World' times out without sending a response.
The application is written in Python and uses Flask to handle requests. We have a Cloud SQL instance that serves as its database, but we're confident this is not the source of the issue, as even requests that don't involve the DB in any form fail, and the Cloud SQL instance remains accessible even when the application stops working. Cloud Run is deployed with the following configuration:
CPU: 2
Memory: 8Gi
Timeout: 59m59s
VPC connector
VPC egress: private-ranges-only
Concurrency: 100
The vast majority of endpoints should produce some form of log when they first start, so we're confident that the application isn't executing any of the code after being triggered. We're not seeing any useful error messages in Logs Explorer either, just 504 errors from the requests timing out. It's deployed with a 59m59s timeout, so it's not the case that the timeout has been entered incorrectly, and even then, that wouldn't explain why it works again after a redeploy.
We have a Cloud Scheduler job that triggers the application every 15 minutes, hitting an endpoint that checks whether any tasks are due to run and creates Cloud Tasks tasks (which send HTTP requests to an endpoint on the same application) for any that need performing at that point in time. Every time the application stops working, it does seem to happen during one of these runs, but we're not certain it's the cause, since the Cloud Scheduler job is the most frequent trigger anyway. There doesn't seem to be a specific time of day when the crashes take place either.
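(For concreteness, the enqueue step described above looks roughly like the sketch below, using the google-cloud-tasks client; the project, location, queue, and endpoint names are all hypothetical:)

from google.cloud import tasks_v2

client = tasks_v2.CloudTasksClient()
# Hypothetical project/location/queue names:
parent = client.queue_path("my-project", "europe-west1", "task-queue")

task = {
    "http_request": {
        "http_method": tasks_v2.HttpMethod.POST,
        # Hypothetical endpoint on the same Cloud Run service:
        "url": "https://my-service-abc123.a.run.app/run-task",
        "headers": {"Content-Type": "application/json"},
        "body": b'{"task_id": 123}',
    }
}
client.create_task(parent=parent, task=task)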
This is a (heavily redacted) screenshot of the logs. The Cloud Scheduler job hits the endpoint at 21:00 and creates a number of tasks, but then hits Cloud Scheduler's default 3m timeout at 21:03. The tasks it created then hit Cloud Tasks' default 10m timeout at 21:10 without their endpoint having done anything. After that point, all requests to the service time out without doing anything.
The closest post I could find on SO was this one; their problem is also temporarily fixed by redeployment, but ours isn't sending 200 responses when it stops working and is instead just timing out without doing anything. We've tried adding retries to Cloud Scheduler and increasing its timeout limit, and we've also tried increasing the CPU and RAM allocation.
Any help is appreciated!
I am using Ubuntu 16 and running the Odoo ERP system, version 12.0.
In my application log file I see a message saying "virtual real time limit (178/120s) reached".
What exactly does it mean, and what damage can it cause to my application?
Also, how can I increase the virtual real time limit?
It's a parameter that adds resilience to the Odoo server by killing zombie threads and spawning new ones. It won't harm your application, but it limits your time for debugging if you don't change it.
According to Odoo's own documentation (see https://www.odoo.com/documentation/12.0/reference/cmdline.html)
--limit-time-real <limit>
Prevents the worker from taking longer than <limit> seconds to process a request. If the limit is exceeded, the worker is killed.
Differs from --limit-time-cpu in that this is a "wall time" limit, including e.g. SQL queries.
Defaults to 120.
So, to be able to debug in peace, I run Odoo with --limit-time-real 99999
Open your config file and add the parameter below:
--limit-time-real=100000
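If you set it in the config file rather than on the command line, the key loses the leading dashes and uses underscores; a sketch of the relevant odoo.conf section (the value is illustrative):

[options]
; wall-clock limit per request, in seconds (CLI: --limit-time-real)
limit_time_real = 100000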
I am planning on having a Python app run on a free Heroku dyno, but I have read that there is a maximum 18-hour execution time before the process is put to sleep. However, what if my app runs like this:
process something (which should take less than a second).
sleep for 5 minutes.
I plan on having this script run continuously (all day long).
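(For reference, the pattern described above is just a loop like the following sketch, with do_work a placeholder:)

import time

def do_work():
    # Placeholder for the sub-second processing step.
    pass

while True:
    do_work()
    time.sleep(300)  # 5 minutes; the process stays alive the whole time,
                     # so the dyno keeps consuming dyno hours while sleeping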
Does the 5 minute sleep count towards the 18 hour time limit?
I think the sleep will be counted as processing time, because your single thread keeps running (it is merely blocked), which is different from Heroku putting the dyno to sleep.
The timeout value is not configurable. If your server requires longer than 30 seconds to complete a given request, we recommend moving that work to a background task or worker to periodically ping your server to see if the processing request has been finished. This pattern frees your web processes up to do more work, and decreases overall application response times.
You can read more here: https://devcenter.heroku.com/articles/request-timeout
If you are willing to wait 10 minutes between runs, you can try https://elements.heroku.com/addons/scheduler, or use some kind of monitoring service like http://godrb.com/
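With the Heroku Scheduler add-on, the same job becomes a one-shot script that the add-on runs at a fixed interval (every 10 minutes is the smallest it offers), e.g. as "python task.py"; a sketch, with do_work again a placeholder:

# task.py -- run by Heroku Scheduler instead of a long-lived loop

def do_work():
    # Placeholder for the sub-second processing step.
    pass

if __name__ == "__main__":
    do_work()  # do the work once and exit; nothing is left running to be slept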
I am doing my bachelor's thesis, for which I wrote a program that is distributed over many servers, exchanging messages via IPv6 multicast and unicast. The network usage is relatively high, but I think it is not too high: in my test there are 15 servers and 2 requests every second, which go like this:
Server 1 requests information from servers 3-15 via multicast. Each of servers 3-15 must respond. If a response is missing after 0.5 sec, the multicast is resent, but only the missing servers must respond (so in most cases this is only one server)
Server 2 does exactly the same. If results are still missing after 5 retries, the missing servers are marked as dead and the change is synced with the other management server (1/2)
So there are 2 multicasts and 26 unicasts every second. I think this should not be too much?
Servers 1 and 2 are running Python web servers, which I use to trigger the request every second on each server (via a web client)
The whole scenario is running in a Mininet environment inside a VirtualBox Ubuntu VM that has 2 cores (max 2.8 GHz) and 1 GB RAM. While running the test, I see via htop that the CPUs are at 100% while the RAM is at 50%, so the CPU is the bottleneck here.
I noticed that after 2-5 minutes (1 minute = 60 * (2+26) messages = 1680 messages) there are so many missing results, causing so many resends while new requests are already coming in, that the "management server" thinks the client servers (3-15) are down and deregisters them. After syncing this with the other management server, all client servers are marked as dead on both management servers, which is not true...
I am wondering if the problem could be my debug output. I print 3-5 messages for every message that is sent or received, so that is roughly (assuming 5 messages per sent/received message) (26 + 2) * 5 = 140 lines printed to the console per second.
I use Python 2.6 for the servers.
So the question here is: can the console output slow down the whole system so much that simple requests take more than 0.5 seconds to complete, 5 times in a row? The request processing in my test is simple, with no complex calculations or anything like that; basically it is something like return request_param in ["bla", "blaaaa", ...] (a small list of 5 items)
If yes, how can I disable the output completely without having to comment out every print statement? Or is there a way to output only lines that contain "Error" or "Warning"? (Not via grep, because by the time grep becomes active all the prints have already happened... I mean directly in Python.)
What else could cause my application to be this slow? I know this is a very generic question, but maybe someone already has some experience with Mininet and network applications...
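(On the output question specifically: the standard logging module supports this. Replace print with logger calls once, and the level then filters output without anything being commented out; a minimal sketch, with the messages illustrative:)

import logging

logging.basicConfig(level=logging.WARNING)  # drop DEBUG/INFO output entirely
log = logging.getLogger(__name__)

log.debug("sent multicast to %s", "ff02::1")       # suppressed at WARNING level
log.warning("missing response from server %d", 7)  # still printed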
I finally found the real problem. It was not the prints (removing them improved performance a bit, but not significantly) but a thread that was using a shared lock. The lock was contended across multiple CPU cores, which made the whole thing very slow.
It even got slower the more cores I added to the executing VM, which seemed very strange at first, but is apparently a known effect of lock (and GIL) contention across cores...
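(A tiny Python 3 demonstration of that lock effect, for anyone hitting the same thing; the iteration counts are illustrative:)

import threading
import time

counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:  # every iteration contends for the same shared lock
            counter += 1

def run(num_threads, total=1000000):
    start = time.time()
    threads = [threading.Thread(target=worker, args=(total // num_threads,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

print("1 thread :", run(1))
print("4 threads:", run(4))  # often slower, not faster: lock + GIL contention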
Now the new bottleneck seems to be APScheduler... I keep getting messages like "event missed" because there is too much load on the scheduler. So that's the next thing to speed up... :)
Using Django (hosted by Webfaction), I have the following code
import time
from django.http import HttpResponse

def my_function(request):
    time.sleep(10)
    return HttpResponse("Done")
This is executed via Django when I go to my URL, www.mysite.com
I enter the URL twice, immediately after each other. The way I see it, both of these should finish after 10 seconds. However, the second call waits for the first one and finishes after 20 seconds.
If, however, I enter some dummy GET parameter, www.mysite.com?dummy=1 and www.mysite.com?dummy=2, then they both finish after 10 seconds. So it is possible for both of them to run simultaneously.
It's as though the scope of sleep() were somehow global?? Maybe entering a parameter makes them run as different processes instead of the same???
It is hosted by Webfaction. httpd.conf has:
KeepAlive Off
Listen 30961
MaxSpareThreads 3
MinSpareThreads 1
ServerLimit 1
SetEnvIf X-Forwarded-SSL on HTTPS=1
ThreadsPerChild 5
I do need to be able to use sleep() and trust that it isn't stopping everything. So, what's going on, and how do I fix it?
Edit: Webfaction runs this using Apache.
As Gjordis pointed out, sleep will pause the current thread. I have looked at Webfaction and it looks like they are using WSGI to run the serving instance of Django. This means that every time a request comes in, Apache will look at how many worker processes (processes that each run an instance of Django) are currently running. If there are none, or too few, it will spawn additional workers and hand the requests to them.
Here is what I think is happening in your situation:
the first GET request for resource A comes in. Apache uses a running worker (or starts a new one)
the worker sleeps 10 seconds
during this, a new request for resource A comes in. Apache sees it is requesting the same resource and sends it to the same worker as the first request. I guess the assumption here is that a worker that recently processed a request for a specific resource is more likely to have some information cached/preprocessed/whatever, so it can handle the request faster
this results in a 20-second block, since there is only one worker that waits 2 times 10 seconds
This behavior makes complete sense 99% of the time, so it's logical to do this by default.
However, if you change the requested resource for the second request (by adding a GET parameter), Apache will assume that it is a different resource and will start another worker (since the first one is already "busy"; Apache cannot know that you are not doing any hard work). Since there are now two workers, both waiting 10 seconds, the total time goes down to 10 seconds.
Additionally, I assume that something is wrong with your design. There are almost no cases I can think of where it would be sensible not to respond to an HTTP request as fast as you can. After all, you want to serve as many requests as possible in the shortest amount of time, so sleeping for 10 seconds is the most counterproductive thing you can do. I would recommend that you create a new question and state what actual goal you are trying to achieve. I'm pretty sure there is a more sensible solution to this!
Assuming you run your Django server just with run(), by default this gives you a single-threaded server. If you use sleep in a single-threaded process, the whole application freezes for that sleep time.
It may simply be that your browser is queuing the second request and only performs it after the first one completes. If you are opening your URLs in the same browser, try using two different ones (e.g. Firefox and Chrome), or try performing the requests from the command line using wget or curl instead.
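(A quick way to rule the browser out from Python itself; a sketch using only the standard library, with the URL a placeholder:)

import threading
import time
import urllib.request

URL = "http://www.mysite.com/"  # placeholder

def fetch(label):
    start = time.time()
    urllib.request.urlopen(URL).read()
    print(label, "finished after", round(time.time() - start, 1), "s")

threads = [threading.Thread(target=fetch, args=("request %d" % i,)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# If both requests finish after ~10 s, the server handles them concurrently
# and the 20 s you saw was the browser queueing identical URLs.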