Custom access log in aiohttp - python

I want to have additional attributes of request to be logged in access log of aiohttp server.
For example I have middleware that adds a user attribute to each request, and I want to store this value inside the extra attribute of access log records. The documentation suggests overriding aiohttp.helpers.AccessLogger, which indeed seems to be a good start, but what do I do next, where do I put the instance of my custom logger? I looked through the code and it looks like that is not possible at application creation stage, but rather at application run. But I'm running the application using different approaches, so it's not that convenient to modify startup in several places (for example, locally I'm using aiohttp-devtools runserver and gunicorn for deployment).
So what should be correct approach here?
(Also I'd like to do the same for the error log, but that seems even more complex, so for now I'm just using another middleware that catches errors and creates the log records I need.)

Taking into account that development logging config is usually very different from the production one, keeping two different approaches for gunicorn and aiohttp-devtools is totally fine.
For the dev server you perhaps need to log everything to a console, while staging and production write logs differently.
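As for getting the user attribute into records: if I recall the API correctly, recent aiohttp lets you subclass aiohttp.abc.AbstractAccessLogger and pass it as access_log_class to web.run_app(). However each runner wires that up, the "extra" mechanism itself is plain stdlib logging, sketched below (the attribute name user and the format string are my own choices, not anything aiohttp defines):

```python
import logging

# A Filter that guarantees every record has a .user attribute, so the
# format string below never fails for records logged without one.
class DefaultUser(logging.Filter):
    def filter(self, record):
        if not hasattr(record, "user"):
            record.user = "-"
        return True

logger = logging.getLogger("access")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(message)s user=%(user)s"))
logger.addHandler(handler)
logger.addFilter(DefaultUser())
logger.setLevel(logging.INFO)

# Inside a custom access logger's log() method you would read the value
# your middleware stored on the request and pass it along:
logger.info("GET /items 200", extra={"user": "alice"})
logger.info("GET /health 200")  # no user -> the filter fills in "-"
```

The same filter-plus-extra pattern works for the error log too, which may be simpler than a dedicated error-catching middleware.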

Related

Can I use python3 classes inside a Flask app?

I'm designing a Flask app that graphs some weather data for several cities. It makes sense to me to use a "City" class that handles the fetching and parsing of the data every time the page is loaded. However, what I'm not sure about is how Flask would handle these instances. Is Flask "smart" enough to know to release the memory for these instances after the page is served? Or will it just gradually consume more and more memory?
Alternatively, would I just be able to create a single global class instance for each city OUTSIDE of the "@app.route" functions that I could use whenever a page is requested?
The deployment server will be Windows IIS using FastCGI, in case that matters at all.
Flask is "just" a framework. It is still executed and managed by the "normal" Python interpreter, so the question of "how Flask would handle these instances" doesn't really arise.
Define classes and use their instances as you would in any other Python project/snippet; however, it might be beneficial to think about where to define them.
Defining a class inside a route will not make sense, since the class will be redefined every time a request is received, but the how is exactly the same.
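To make the trade-off concrete, here's a plain-Python sketch of the two placements (no Flask required; City and its fetch logic are made up for illustration):

```python
# Hypothetical City class standing in for the real fetch-and-parse logic.
class City:
    def __init__(self, name):
        self.name = name

    def fetch_weather(self):
        return {"city": self.name, "temp": 21}  # stand-in for real fetching

# Option A: module-level instances, created once at import time and
# reused for every request. Memory stays constant; state persists.
CITIES = {name: City(name) for name in ("london", "oslo")}

def weather_view(name):
    # Per request, only the method call happens here; the instance
    # itself lives for the life of the worker process.
    return CITIES[name].fetch_weather()

# Option B: a per-request instance. It becomes unreachable as soon as
# the view returns, so CPython's reference counting frees it and memory
# does not grow between requests.
def weather_view_per_request(name):
    return City(name).fetch_weather()
```

Either way the interpreter, not Flask, does the memory management; Option A just amortises any expensive setup across requests.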

LiveServerTestCase server sees different database to tests

I have some code (a celery task) which makes a call via urllib to a Django view. The code for the task and the view are both part of the same Django project.
I'm testing the task, and need it to be able to contact the view and get data back from it during the test, so I'm using a LiveServerTestCase. In theory I set up the database in the setUp function of my test case (I add a list of product instances) and then call the task, it does some stuff, and then calls the Django view through urllib (hitting the dev server set up by the LiveServerTestCase), getting a JSON list of product instances back.
In practice, though, it looks like the products I add in setUp aren't visible to the view when it's called. It looks like the test case code is using one database (test_<my_database_name>) and the view running on the dev server is accessing another (the urllib call successfully contacts the view but can't find the product I've asked for).
Any ideas why this may be the case?
Might be relevant - we're testing on a MySQL db instead of sqlite.
Heading off two questions (but interested in comments if you think we're doing this wrong):
I know it seems weird that the task accesses the view using urllib. We do this because the task usually calls one of a series of third-party APIs to get info about a product, and if it cannot access these, it accesses our own Django database of products. The code that makes the urllib call is generic code that is agnostic of which case we're dealing with.
These are integration tests, so we'd prefer to actually make the urllib call rather than mock it out.
The celery workers are still feeding off of the dev database even if the test server brings up other databases because they were told to in the settings file.
One fix would be to make a separate settings_test.py file that specifies the test database name, and bring up celery workers from the setup command using subprocess.check_output that consume from a special queue for testing. Then these celery workers would feed from the test database rather than the dev database.
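A sketch of that settings_test.py (module path, database name, and queue name are all hypothetical; adjust to your project layout). It's a config fragment, so it only runs inside Django:

```python
# settings_test.py -- used by both the test run and the worker you
# launch from setUp (e.g. via subprocess.check_output with
# DJANGO_SETTINGS_MODULE pointed at this module).
from myproject.settings import *  # noqa: F401,F403

# Match the name Django's test runner will create for the test run.
DATABASES["default"]["NAME"] = "test_" + DATABASES["default"]["NAME"]

# Workers consume from a dedicated queue so dev workers never pick up
# test tasks by accident.
CELERY_DEFAULT_QUEUE = "testing"
```

The worker process started in setUp then reads and writes the same test database the LiveServerTestCase view sees.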

Flask App Dispatching: Multiple uWSGI instances or dispatch to single instance

I'm working on a Flask application, in which we will have multiple clients (10-20) each with their own configuration (for the DB, client-specific settings etc.) Each client will have a subdomain, like www.client1.myapp.com, www.client2.myapp.com. I'm using uWSGI as the middleware and nginx as the proxy server.
There are a couple of ways to deploy this I suppose, one would be to use application dispatching, and a single instance of uwsgi. Another way would be to just run a separate uwsgi instance for each client, and forward the traffic with nginx based on subdomain to the right app. Does anyone know of the pros and cons of each scenario? Just curious, how do applications like Jira handle this?
I would recommend having multiple instances, forwarded to by nginx. I'm doing something similar with a PHP application, and it works very well.
The reason, and benefit, of doing it this way is that you can keep everything completely separate, and if one client's setup goes screwy, you can re-instance it and there's no problem for anyone else. Also, no user, even if they manage to break the application-level security, can access any other user's data.
I keep all clients on their own databases (one mysql instance, multiple dbs), so I can do a complete sqldump (if using mysql, etc) or for another application which uses sqlite rather than mysql: copy the .sqlite database completely for backup.
Going this way means you can also easily set up a 'test' version of a client's site, as well as a live one. Then you can swap which one is actually live just by changing your nginx settings. Say for doing upgrades, you can upgrade the testing one, check it's all OK, then swap. (Also, for some applications, the client may like having their own 'testing' version, which they can break to their hearts' content, and know they (or you) can reinstance it in moments, without harming their 'real' data.)
Going with application dispatching, you cannot easily get nginx to serve separate client upload directories, without having a separate nginx config per client (and if you're doing that, then why not go for individual uWSGI instances anyway). Likewise for individual SSL certificates (if you want that...).
Each subdomain (or separate domain entirely for some) has its own logging, so if a certain client is being DoS'd, or otherwise hacked, it's easy to see which one.
You can set up file-system level size quotas per user, so that if one client starts uploading GBs of video, your server doesn't get filled up as well.
The way I'm working is using ansible to provision and set up the server how I want it, with the client specific details kept in a separate host_var file. So my inventory is:
[servers]
myapp.com #or whatever...
[application_clients]
apples
pears
elephants
[application_clients:vars]
ansible_address=myapp.com
then host_vars/apples:
url=apples.myapp.com
db_user=apples
db_pass=secret
then in the provisioning, I set up two new users & one group for each client. For instance: apples and web.apples as the two users, and the group simply as apples (which both are in).
This way, all the application code is owned by the apples user, but the PHP-FPM instance (or uWSGI instance in your case) is run by web.apples. The permissions of all the code is rwXr-X---, and the permissions of uploads & static directories is rwXrwXr-X. Nginx runs as its own user, so it can access ONLY the upload/static directories, which it can serve as straight files. Any private files which you want to be served by the uWSGI app can be set that way easily. The web user can read the code, and execute it, but cannot edit it. The actual user itself can read and write to the code, but isn't normally used, except for updates, installing plugins, etc.
I can give out a SFTP user to a client which is chroot'd to their uploads directory if they want to upload outside of the application interface.
Using ansible, or another provisioning system, means there's very little work needed to create a new client setup, and if a client (for whatever reason) wants to move over to their own server, it's just a couple of lines to change in the provisioning details, and re-run the scripts. It also means I can keep a development server installed with the exact same provisioning as the main server, and also I can keep a backup amazon instance on standby which is ready to take over if ever I need it to.
I realise this doesn't exactly answer your question about pros and cons each way, but it may be helpful anyway. Multiple instances of uWSGI or any other WSGI server (mainly I use waitress, but there are plenty of good ones) are very simple to set up and if done logically, with a good provisioning system, easy to administrate.
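For the record, the per-subdomain forwarding itself is only a few lines of nginx per client (socket paths, domain, and directory names here are illustrative):

```nginx
server {
    listen 80;
    server_name apples.myapp.com;

    location / {
        include uwsgi_params;
        uwsgi_pass unix:/run/uwsgi/apples.sock;  # this client's own uWSGI instance
    }

    location /uploads/ {
        alias /srv/apples/uploads/;  # served directly by nginx as static files
    }
}
```

Duplicating this block per client (which a provisioning tool does for you) is what makes the separate upload directories and per-client SSL certificates straightforward.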

How to make process wide editable variable (storage) in django?

I want to create project-wide accessible storage for project/application settings.
What I want to achieve:
- Each app has its own app-specific settings stored in the db
- When you spawn a django wsgi process, all settings are loaded into an in-memory storage and are available project-wide
- Whenever you change any setting value in the db, there is a call to regenerate the storage from the db
So it works very close to a cache, but I can't use the cache mechanism because it serializes data. I could also use memcache for that purpose, but I want to develop a generic solution (you don't always have access to memcache).
If anyone has any ideas to solve my problem, I would be really grateful.
Before giving any specific advice, you need to be aware of the limitations of these systems.
ISSUES
Architectural differences between Django and PHP/other popular languages.
PHP re-reads and re-evaluates the entire code tree every time a page is accessed. This allows PHP to re-read settings from DB/cache/whatever on every request.
Initialisation
Many django apps initialise their own internal cache of settings and parameters and perform validation on these parameters at module import time (server start) rather than on every request. This means you would need a server restart anyway when modifying any settings for non db-settings enabled apps, so why not just change settings.py and restart server?
Multi-site/Multi-instance and in-memory settings
In-memory changes are strictly discouraged by the Django docs because they will only affect a single server instance. In the case of multiple sites (contrib.sites), only a single site will receive updates to shared settings. In the case of instanced servers (ep.io/gondor) any changes will only be made to a local instance, not every instance running for your site. Tread carefully.
PERFORMANCE
In Memory Changes
Changing settings values while the server is running is strictly discouraged by django docs for the reasons outlined above. However there is no performance hit with this option. USE ONLY WITHIN THE CONFINES OF SPECIFIC APPS MADE BY YOU AND SINGLE SERVER/INSTANCE.
Cache (redis-cache/memcached)
This is the intermediate speed option. Reasonably fast lookup of settings, which can be deserialised into complex python structures easily - great for dict-configs. Importantly, values are shared among Sites/Instances safely and are updated atomically.
DB (SLOOOOW)
Grabbing one setting at a time from the DB will be very, very slow unless you hack in connection pooling. Grabbing all settings at once is faster, but increases db transfer on every single request. Settings are synced between Sites/Instances safely. Only reasonable to use for 1 or 2 settings.
CONCLUSION
Storing configuration values in database/cache/mem can be done, but you have to be aware of the caveats, and the potential performance hit. Creating a generic replacement for settings.py will not work, however creating a support app that stores settings for other apps could be a viable solution, as long as the other apps accept that settings must be reloaded every request.
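As a sketch of that "support app" idea (all names hypothetical; load_from_db stands in for a real ORM query), the core is a per-process store that is rebuilt on demand, with the single-instance caveat from above still applying:

```python
class SettingsStore:
    """In-memory, per-process settings store rebuilt from a loader."""

    def __init__(self, loader):
        self._loader = loader  # callable returning {key: value}, e.g. an ORM query
        self._data = {}
        self.reload()

    def reload(self):
        # Called at process start, and again whenever a value changes in
        # the db (e.g. triggered from a post_save signal).
        self._data = dict(self._loader())

    def get(self, key, default=None):
        return self._data.get(key, default)

# Stand-in for a real database query over a settings table.
def load_from_db():
    return {"items_per_page": 25}

store = SettingsStore(load_from_db)
```

Note that reload() only updates the process it runs in; other workers or instances need their own trigger, which is exactly why the cache-backed option above is safer for multi-instance deployments.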

Best practice: How to persist simple data without a database in django?

I'm building a website that doesn't require a database because a REST API "is the database". (Except you don't want to be putting site-specific things in there, since the API is used by mostly mobile clients)
However there's a few things that normally would be put in a database, for example the "jobs" page. You have master list view, and the detail views for each job, and it should be easy to add new job entries. (not necessarily via a CMS, but that would be awesome)
e.g. example.com/careers/ and example.com/careers/77/
I could just hardcode this stuff in templates, but that's not DRY: you have to update the master template and the detail template every time.
What do you guys think? Maybe a YAML file? Or any better ideas?
Thx
Why not still keep it in a database? Your remote REST store is all well and funky, but if you've got local data, there's nothing (unless there's a spec saying so) to stop you storing some stuff in a local db. Doesn't have to be anything very glamorous - could be sqlite, or you could have some fun with redis, etc.
You could use Memcachedb via the Django cache interface.
For example:
Set the cache backend as memcached in your django settings, but install/use memcachedb instead.
Django can't tell the difference between the two because they provide the same interface (at least last time I checked).
Memcachedb is persistent, safe for multithreaded django servers, and won't lose data during server restarts, but it's just a key-value store, not a complete database.
Some alternatives built into the Python standard library are listed in the Data Persistence chapter of the documentation. Still another option is JSON.
