Event and exception logging for Django (Python)

Ok, here is my confusion/problem:
I develop on localhost, where raised exceptions and logs are easy to see on the command line.
Then I deploy the code to the test, stage, and production servers, and that is where the problem begins: it is not easy to see logs or debug errors and exceptions. For normal errors I guess django-debug-toolbar could be enabled, but I also get silent exceptions that don't crash anything yet still cause the process to fail. For example, I have a payment integration, and a few days ago payments were failing on return (callback) to our site. Nothing was crashing; we just got a "payment process failed" message, while the payment gateway vendor was working fine. I had to hunt through possible failure points and eventually figured out that one DB save operation was silently not saving because a variable was missing.
Now my question: is Sentry (https://github.com/getsentry/sentry) an answer for this? Or is there another option?
Please ask if any further clarification of my requirements is needed.

Sentry is an option, but honestly it's too limited (I tried it a month or so ago): it's intended to track exceptions, but in the real world we should track important information and events too.
If you haven't set up application logging yet, I suggest you do so, following this example.
In my app I defined several loggers for different purposes. The Python logging configuration via dictionary (the one used by Django) is very powerful and gives you full control over how things get logged: for example, you can write logs to a file or a database, send an email, call a third-party API, or whatever else. If your app runs in a load-balanced environment (several machines running your app), you can use a service like Loggly to aggregate the logs from all your instances in a single place (and since it uses rsyslog, it aggregates not only your Django app logs but also all the logs of the underlying OS).
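To make this concrete, here is a minimal sketch of such a dictionary-based configuration in settings.py (the handler names, file path, and the dedicated "payments" logger are illustrative placeholders, not something your project must use):

# settings.py -- minimal dictConfig sketch; adjust paths, levels, and names to your setup
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        # everything at INFO and above goes to a rotating file on the server
        "app_file": {
            "class": "logging.handlers.RotatingFileHandler",
            "filename": "/var/log/myapp/app.log",  # placeholder path
            "maxBytes": 5 * 1024 * 1024,
            "backupCount": 3,
            "formatter": "verbose",
            "level": "INFO",
        },
        # ERROR and above also go out by email, using Django's ADMINS setting
        "mail_admins": {
            "class": "django.utils.log.AdminEmailHandler",
            "level": "ERROR",
        },
    },
    "loggers": {
        "django": {"handlers": ["app_file"], "level": "INFO"},
        # a dedicated logger for the payment flow, so silent failures leave a trace
        "payments": {
            "handlers": ["app_file", "mail_admins"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

In the payment callback you would then log explicitly, e.g. logging.getLogger("payments").error("callback missing transaction id: %r", payload), instead of relying on an exception being raised.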
I also suggest using New Relic, which automatically keeps track of a lot of things: queries executed and their timing, template loading time, errors, and a lot of other useful statistics.

Related

How to properly create a logging configuration for structured AWS CloudWatch logging?

I have a single docker image in AWS ECR which is used to create three Fargate-based services (FargateService, ScheduledFargateTask, and ApplicationLoadBalancedFargateService), let's call them A, B, and C. Each service has a log group created in the CDK code (e.g. A-log-group), which is just for the container output.
I would like to introduce more structured logging to my application. So far I've tried setting up logging from within the image, by providing a logging configuration like so:
import boto3  # needed for the boto3_client entry

# one CloudWatch handler per logger name, all pointing at the same log group
_aws_handlers = {
    name: {
        "level": "INFO",
        "class": "watchtower.CloudWatchLogHandler",
        "boto3_client": boto3.client("logs"),
        "log_group_name": LOGGER_AWS_LOG_GROUP_NAME,
        "formatter": "aws_formatter",
        "log_stream_name": name,
    }
    for name in ["django", "gunicorn", ...]
}
where LOGGER_AWS_LOG_GROUP_NAME is some value passed in during the service setup (e.g. A-boto-log-group), so that the resulting logging structure would be:
A-log-group
B-log-group
C-log-group
A-boto-log-group
B-boto-log-group
C-boto-log-group
Each boto-log-group would then have log streams called django, gunicorn, ..., depending on which logger "names" I've registered in _aws_handlers. Later I was considering splitting it further by level, e.g. A-boto-log-group-ERROR, A-boto-log-group-INFO, etc., but for now I just want the above to be functional.
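For completeness, a minimal sketch of how these handlers are meant to plug into Django's LOGGING setting (the formatter and the logger levels here are illustrative):

# sketch: the "loggers" side of the configuration, routing each named logger
# to its matching watchtower handler from _aws_handlers above
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "aws_formatter": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": _aws_handlers,  # the dict comprehension shown earlier
    "loggers": {
        name: {"handlers": [name], "level": "INFO", "propagate": False}
        for name in _aws_handlers
    },
}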
However, this solution results in some strange AWS CDK behaviour: the deployment can never be finalised and gets stuck in the UPDATE_IN_PROGRESS state. It doesn't error out, the log groups are created, and the logs can actually be recorded. But, for some reason, I only get the initial log message recorded for one of the "names", for one of the services, and then never anything else. Furthermore, if I manually exec into a container and trigger a log message, the message is properly logged in the respective boto-log-group. So I'm not really sure why the deployment hangs and the service messages aren't logged automatically.
I don't think this has anything to do with permissions as I've given full control to the service for debugging purposes:
self.service.task_definition.task_role.add_to_principal_policy(
    PolicyStatement(
        actions=[
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
            "logs:DescribeLogStreams",
        ],
        effect=Effect.ALLOW,
        resources=["arn:aws:logs:*:*:*"],
    )
)
Maybe somehow the logs get "consumed" by the standard container output logging and therefore only get recorded in the X-log-group type log groups? Or maybe for some reason boto can't cope with multiple handlers like that? I'm not really sure what's going on, I'd at least expect the deployment to fail if there were issues, instead of getting stuck. The code works for me locally with the same exact approach but using file logging.
Alternatively, I would be happy to consider a different logging approach. Maybe by recording events that some AWS logging tool is subscribed to? The event would then get logged into multiple log groups or some other log tool. I would even be happy with some sample functional code, as my only goal is a structured way to view service logs based on the level, the logger name, and the service which triggered the message - using the approach I've described above is not essential for me.
Edit: Turns out the code is partially fine, as only the gunicorn-related service is failing (more precisely, its health check is failing). The code boots up, but the workers aren't starting. I don't know yet why that happens or how to debug it.

How to avoid exceptions with open telemetry + zipkin + python if service is not available?

I have successfully set up tracing using OpenTelemetry with the Zipkin exporter in my Python app. When I stopped the docker container running Zipkin, the app started (quite rightly) to throw exceptions. Since my preference is for app functionality/performance over trace availability, I'd like to understand whether there is a setup or configuration to ignore the fact that traces cannot be exported.
I briefly used Jaeger, which I believe uses UDP and so wouldn't care if I stopped or started the docker instance. It would suit me to have similar functionality.
I have considered selecting the exporter at runtime (one of console or Zipkin), but then I would need to restart the app to change over.
Since my preference is for app functionality/performance over trace availability, I'd like to understand whether there is a setup or configuration to ignore the fact that traces cannot be exported.
This is indeed one of the basic error-handling principles of OTEL: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/error-handling.md#basic-error-handling-principles.
The span processor implementations make sure that any exception that might occur during export is caught and only logged for visibility, never thrown into the main application. You can verify it here for SimpleSpanProcessor and here for BatchSpanProcessor. You might want to share the case where you saw exceptions thrown into the application flow.
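For reference, a minimal sketch of a Zipkin setup behind a BatchSpanProcessor (exact package and module names may differ slightly between opentelemetry versions); with this wiring, a stopped Zipkin container only produces logged export errors, not exceptions in the application flow:

# minimal sketch -- BatchSpanProcessor catches and logs export failures,
# so the application keeps working if Zipkin is down
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.zipkin.json import ZipkinExporter  # opentelemetry-exporter-zipkin-json

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(ZipkinExporter(endpoint="http://localhost:9411/api/v2/spans"))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("demo-span"):
    pass  # spans are exported in the background; export failures are only logged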

Run code on first Django start

I have a Django application that displays a webpage with data from a model based on the primary key passed in the URL. This all works fine, and the Django component is working well for the most part.
My question (and I have tried multiple methods, such as using an AppConfig) is how I can make it so that when the Django server boots up, code is called that creates a separate thread, which then monitors an external source and logs valid data from that source into the database as model instances.
I have the threading code written, along with the section that creates the model instance and saves it to the database. My issue is that if I try to use an AppConfig to create the thread that would handle this code, I get a django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet. error and the server does not boot up.
Where would be the appropriate place for this code? Is my approach to the matter incorrect?
Trying to use threading to get around blocking processes like web servers is an exercise in pain. I've done it before and it's fragile and often yields unpredictable results.
A much easier idea is to create a separate worker that runs in a totally different process that you start separately. It would have the same database access and could even use your Django models. This is how hosts like Heroku approach this problem. It comes with the added benefit of being able to be tested separately and doesn't need to run at all while you're working on your main Django application.
These days, with a multitude of virtualization options like Vagrant and containerization options like Docker, running parallel processes and workers is trivial. In the wild they may literally be running on separate servers with your database on yet another server. As was mentioned in the comments, starting a worker process could easily be delegated to a separate Django management command. This, in turn, can be fairly easily turned into separate worker processes by gunicorn on your web server.
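As a rough sketch, such a worker written as a management command could look like this (the app name, model, and polling function are placeholders for your existing code):

# myapp/management/commands/monitor_source.py -- sketch of a standalone worker
# start it separately with: python manage.py monitor_source
import time

from django.core.management.base import BaseCommand

from myapp.models import Reading                          # placeholder model
from myapp.monitoring import fetch_from_external_source   # your existing polling code


class Command(BaseCommand):
    help = "Poll an external source and store valid data as model instances"

    def handle(self, *args, **options):
        while True:
            data = fetch_from_external_source()
            if data is not None:
                Reading.objects.create(value=data)
            time.sleep(30)  # polling interval; tune to your source

Because manage.py sets up Django before handle() runs, the app registry is fully loaded and the AppRegistryNotReady error goes away.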

Google App Engine - run task on publish

I have been looking for a solution for my app that does not seem to be directly discussed anywhere. My goal is to publish an app and have it automatically reach out to a server I am working with. This just needs to be a simple POST. I have everything working fine and am currently solving the problem with a cron job, but that is not quite sufficient - I would like the job to execute automatically once the app has been published, not after a minute (or whatever interval it is set to).
Conceptually, I am trying to have my app register itself with my server, and to do this I'd like it to run once on publish and never run again.
Is there a solution to this problem? I have looked at Task Queues and am unsure if it is what I am looking for.
Any help will be greatly appreciated.
Thank you.
Personally, this makes more sense to me as a responsibility of your deploy process rather than of the app itself. If you have your own deploy script, add the POST request there (after a successful deploy). If you use Google's command-line tools, you could wrap them in a script. If you use a third-party tool for something like continuous integration, it probably has deploy hooks you could use for this purpose.
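For instance, a post-deploy step can be as small as this (the endpoint and payload are placeholders):

# deploy_hook.py -- sketch; run by your deploy script after a successful deploy
import requests

resp = requests.post(
    "https://example.com/register",              # placeholder endpoint on your server
    json={"app": "my-app", "version": "1-2-3"},  # placeholder payload
    timeout=10,
)
resp.raise_for_status()  # fail the deploy script loudly if registration fails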
The main question will be how to ensure it only runs once for a particular version.
Here is an outline on how you might approach it.
You create a HasRun model, keyed by the version of the deployed app, whose existence indicates whether the one-time code has been run.
Then make sure you increment your version whenever you deploy new code.
In your warmup handler or appengine_config.py, grab the deployed version,
then, in a transaction, try to fetch the HasRun entity by key (the version number).
If you get the entity, don't run the one-time code.
If you can't find it, create it and run the one-time code, either in a task (make sure the process is idempotent, as tasks can be retried) or in the warmup/front-facing request.
You will probably want to wrap all of that in a memcache CAS operation to provide a lock of some sort, to prevent another instance from trying to do the same thing.
Alternatively, if you want to use the task queue, consider naming the task after the version number; you can only submit a task with a particular name once.
It still needs to be idempotent (again, it could be retried), but there will only ever be one task scheduled for that version - at least for a few weeks.
Or a combination/variation of all of the above.
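A rough sketch of the HasRun idea on the legacy Python runtime (the model, version lookup, and the one-time function here are assumptions about your setup):

# sketch of the HasRun approach using the legacy App Engine ndb API
import os

from google.appengine.ext import ndb


class HasRun(ndb.Model):
    """Keyed by deployed version; existence means the one-time code already ran."""


@ndb.transactional
def _claim_version(version):
    key = ndb.Key(HasRun, version)
    if key.get() is not None:
        return False            # another instance already ran it for this version
    HasRun(key=key).put()
    return True


def maybe_run_once():
    # called from the warmup handler or appengine_config.py
    version = os.environ.get("CURRENT_VERSION_ID", "unknown")
    if _claim_version(version):
        register_with_server()  # placeholder: your idempotent one-time POST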

Shell command from a Python script

I need you guys :D
I have a web page; on this page I check some items and pass their values as variables to a Python script.
The problem is:
I need to write a Python script, and in that script I need to put these variables into my predefined shell commands and run them.
There is one gnuplot command and one other shell command.
I have never done anything in Python; can you give me some advice?
Thanks
I can't fully address your question due to the lack of information on the web framework you are using, but here is some advice and guidance that you should find useful. I had a similar problem that required me to run a shell program with arguments derived from user requests (I was using the Django framework (Python)).
Now there are several factors that you have to consider:
How long each job will take
What load you are expecting (are there going to be lots of jobs?)
Whether there will be any side effects from your shell commands
Here is some explanation of why these are important.
How long each job will take.
Depending on your framework and browser, there is a limit on how long a connection to the server is kept alive. In other words, you have to make sure that the time the server takes to respond to a user request does not exceed the connection timeout set by the server or the browser. If it takes too long, you will get a connection timeout, i.e. an error response, because nothing comes back from the server side.
What load you are expecting.
You have probably figured out that if the work you are requesting is huge, it will consume a lot of resources. Also, if you have multiple requests at the same time, it will take a huge toll on your server. For instance, if you do proceed with using subprocess for your jobs, it will be important to note whether your job is blocking or non-blocking.
Side effects.
It is important to understand the side effects of your shell process. For instance, if it involves writing and generating lots of temp files, you will have to consider the permissions your script has. It can be a complex task.
So how can this be resolved?
subprocess, which ships with base Python, will allow you to run shell commands from Python. If you want more sophisticated tools, check out the Fabric library. For passing arguments, check out optparse and sys.argv.
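For example, a sketch of calling gnuplot via subprocess with values coming from the web page (the gnuplot script and argument names are placeholders; validate anything that originates from the user):

# sketch: run gnuplot from Python without shell=True
import subprocess


def run_gnuplot(data_file, output_file):
    # arguments are passed as a list, so nothing goes through the shell
    cmd = [
        "gnuplot",
        "-e", "datafile='%s'; outfile='%s'" % (data_file, output_file),
        "plot_template.gp",   # placeholder gnuplot script that uses the variables
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    if result.returncode != 0:
        raise RuntimeError("gnuplot failed: %s" % result.stderr)
    return output_file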
If you expect a huge workload or long processing times, consider setting up a queue system for your jobs; a popular framework like Celery is a good example. You may look at gevent and asyncio (Python 3) as well. Generally, instead of returning a response on the fly, you can return a job ID or a URL that the user can come back to later to check on the result.
Point to note!
Permissions and security are vital! The last thing you want is for people to execute shell commands that are detrimental to your system.
You can also increase the connection timeout, depending on the framework you are using.
I hope you find this useful.
Cheers,
Biobirdman
