Is it possible to get container OS logs from Google Cloud Run - python

I'm using Google Cloud Run. I run a container with a simple Flask + gunicorn app that starts a heavy computation.
Sometimes it fails with:
Application exec likely failed
terminated: Application failed to start: not available
I'm 100% confident it's not related to Google Cloud Run timeouts or Flask + gunicorn timeouts.
I've added gunicorn hooks: worker_exit, worker_abort, worker_int, on_exit. None of these hooks are invoked (a minimal sketch of that config is shown below).
Exactly the same operation works fine locally; I can only reproduce the failure on Cloud Run.
It seems like something on Cloud Run crashes and kills my Python process outright.
Is there any way to debug this?
Maybe I could somehow stream tail -f /var/log/{messages,kernel,dmesg,syslog} in parallel with my application logs?
The idea is to understand what is killing the app.
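For reference, a minimal sketch of a gunicorn.conf.py wiring up those hooks (illustrative only, not the exact config):

    # gunicorn.conf.py (sketch): log when gunicorn's exit/abort hooks fire
    def worker_int(worker):
        worker.log.warning("worker_int: worker %s received SIGINT/SIGQUIT", worker.pid)

    def worker_abort(worker):
        worker.log.warning("worker_abort: worker %s received SIGABRT", worker.pid)

    def worker_exit(server, worker):
        server.log.warning("worker_exit: worker %s exited", worker.pid)

    def on_exit(server):
        server.log.warning("on_exit: master process is shutting down")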
UPD:
I've managed to get a bit more log output:
Default
[INFO] Handling signal: term
Caught SIGTERM signal.Caught SIGTERM signal.
What is the right way to find out what (and why) sends SIGTERM to my Python process?

I would suggest setting up Cloud Logging for your Cloud Run service. You can easily do so by following this documentation, which shows how to attach the Cloud Logging handler to the Python root logger. This will give you more control over the logs that appear for your Cloud Run application.
Setting Up Cloud Logging for Python
Setting up Cloud Logging should also allow Cloud Run to automatically pick up any logs written under the /var/log directory as well as syslog output (/dev/log).
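For example, a minimal sketch of attaching Cloud Logging to the root logger with the google-cloud-logging client library (assuming it is installed in the container):

    import logging

    import google.cloud.logging

    # Attach the Cloud Logging handler to Python's root logger so that
    # standard logging calls are forwarded to Cloud Logging.
    client = google.cloud.logging.Client()
    client.setup_logging()

    logging.info("This record now shows up in Cloud Logging for the Cloud Run service.")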
Hope this helps! Let me know if you need further assistance.

Related

logging system kill signals for debugging

I have a script run by Python 3.7.9 in an Ubuntu 18.04 Docker container.
At some point the Python interpreter is killed, most likely because a resource limit was exceeded.
Using docker logs ${container_id} I only get the stderr from inside the container, but I am also interested in which resource was exceeded, so that I can give useful feedback to development.
Is this automatically logged at the system level (Linux, Docker)?
If not, how can I log this?

Logging uWSGI application stop in Python

I have a Flask app that I run with uWSGI. I have configured logging to a file in the Python/Flask application, so on service start it logs that the application has been started.
I want to do the same when the service stops, but I don't know how to implement it.
For example, if I run the uWSGI app in the console and then interrupt it with Ctrl-C, I get only uWSGI's own logs ("Goodbye to uwsgi" etc.) in the console, but no log entry from the stopped Python application. I'm not sure how to achieve this.
I would be glad if someone advised on possible solutions.
Edit:
I've tried Python's atexit module, but the function I registered to run on exit is executed not once but 4 times (which matches the number of uWSGI workers).
There is no "stop" event in WSGI, so there is no way to detect when the application stops, only when the server / worker stops.
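For illustration, this is the kind of atexit registration described above; because uWSGI forks one process per worker, the handler runs once in every worker process (sketch, assuming a plain file-based logger):

    import atexit
    import logging
    import os

    logging.basicConfig(filename="app.log", level=logging.INFO)

    def log_shutdown():
        # Runs once per uWSGI worker process at interpreter exit,
        # which is why the message appears as many times as there are workers.
        logging.info("Worker process %s is shutting down", os.getpid())

    atexit.register(log_shutdown)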

Yarn resource manager fails the application even though Spark application executes successfully

I am running a simple hello-world Python script with AWS EMR + Spark + YARN.
Looking at the logs, even though the Spark application succeeds, the overall job is marked as failed by the YARN resource manager.
Logs for the Spark application show success, and "Hello world" is printed to stdout. (See pastebin for application logs)
Logs for the node manager show no issue or error. (See pastebin for node manager logs)
Logs for the resource manager on the master host show that the resource manager marks the application as FAILED even though the application appears to complete successfully. There is no apparent reason for the failure in the log! (See pastebin for resource manager logs)
I checked all the logs and cannot really figure out the root cause. What could be the issue? How can I debug further?
Your logs contain the following statement:
ERROR ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application.
This typically arises when you set .master() in the SparkSession builder.
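For example, a builder like the commented-out line below can trigger this on EMR/YARN, because the hard-coded master conflicts with the one spark-submit supplies (illustrative snippet):

    from pyspark.sql import SparkSession

    # Problematic on YARN: a hard-coded master overrides what spark-submit sets.
    # spark = SparkSession.builder.master("local[*]").appName("hello").getOrCreate()

    # Prefer letting spark-submit (--master yarn) provide the master instead:
    spark = SparkSession.builder.appName("hello").getOrCreate()
    print("Hello world")
    spark.stop()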

What does Tornado do with active requests when it is stopped?

Question pretty much says it all. If I am running Tornado on a server with Supervisor, what happens to active requests when I deploy code and need to restart the Tornado server? Are they dropped mid-request? Are they allowed to finish?
Supervisord sends a signal such as HUP or TERM to the Tornado process; the important point is how Tornado deals with it.
Unfortunately, Tornado simply exits when it receives a signal like HUP, TERM, or INT, so active requests are dropped mid-request.
Tornado has a submodule named autoreload that lets the application detect changes to its source files and reload itself, but it only works in debug mode with a single process, and not for WSGI applications; it is a development tool.
However, you can define a handler that calls the tornado.autoreload._reload function manually and register it for the HUP signal; tornado.autoreload.add_reload_hook can register functions that should be called when the reload happens.
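A rough sketch of that idea (note that _reload is a private Tornado function, so treat this as illustrative rather than a supported API):

    import signal

    import tornado.autoreload
    import tornado.ioloop

    def handle_sighup(signum, frame):
        # Re-exec the process the same way Tornado's debug autoreload does.
        tornado.autoreload._reload()

    # Run cleanup callbacks just before the reload happens.
    tornado.autoreload.add_reload_hook(lambda: print("reloading..."))

    signal.signal(signal.SIGHUP, handle_sighup)
    tornado.ioloop.IOLoop.current().start()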
Because Tornado does not manage child processes well in forked (multi-process) mode, it is usually recommended to run several independent single-process instances on different ports. In that setup, _reload behaves much like running with the debug flag set.
In any case, test and benchmark this to make sure it works well for your application.

Python Windows service autostarts too early

I am running a Python script as a Windows service, but it seems to be failing whenever I set it to auto-start. I believe this may be because the service uses network resources that are not yet mounted when the service starts. Is there a way I can get it to wait until startup is complete before running?
Configure your Windows Service so that it has the Workstation Service as a dependency.
This means Windows won't attempt to start your service until the appropriate resources are available.
Alternatively, make the script wait until the resources it needs are available, or rewrite it with a more robust design: don't exit when there is no connection; wait one second and try again if the connection fails (see the sketch below).
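A minimal sketch of that retry approach (connect_to_share is a hypothetical placeholder for whatever network resource the script actually needs):

    import time

    def connect_to_share():
        # Placeholder: replace with the real network-resource access,
        # e.g. opening a file on a mapped drive or UNC path.
        raise OSError("network resource not available yet")

    def wait_for_resource(retry_delay=1.0):
        # Instead of exiting when the resource is missing, keep retrying.
        while True:
            try:
                return connect_to_share()
            except OSError:
                time.sleep(retry_delay)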
