Specifying Readonly access for Django.db connection object


I have a series of integration-level tests that are being run as a management command in my Django project. These tests are verifying the integrity of a large amount of weather data ingested from external sources into my database. Because I have such a large amount of data, I really have to test against my production database for the tests to be meaningful. What I'm trying to figure out is how I can define a read-only database connection that is specific to that command or connection object. I should also add that these tests can't go through the ORM, so I need to execute raw SQL.
The structure of my test looks like this:
import sys
import unittest
from django.core.management.base import BaseCommand
class Command(BaseCommand):
    help = 'Runs Integration Tests and Query Tests against Prod Database'
    def handle(self, *args, **options):
        suite = unittest.TestLoader().loadTestsFromTestCase(TestWeatherModel)
        ret = unittest.TextTestRunner().run(suite)
        # Exit non-zero if any test failed or raised an error
        sys.exit(0 if ret.wasSuccessful() else 1)
class TestWeatherModel(unittest.TestCase):
    def testCollectWeatherDataHist(self):
        wm = WeatherManager()
        wm.CollectWeatherData()
        self.assertTrue(wm.weatherData is not None)
And the WeatherManager.CollectWeatherData() method would look like this:
from django.db import connection
def CollectWeatherData(self):
    cur = connection.cursor()
    cur.execute(<Raw SQL Query>)
    # Store the rows on the manager instance itself
    self.weatherData = cur.fetchall()
    cur.close()
I want to somehow idiot-proof this, so that someone else (or I) can't come along later and accidentally write a test that modifies the production database.

You can achieve this by hooking into Django's connection_created signal and then making the transaction read-only.
The following works for PostgreSQL:
from django.apps import AppConfig
from django.db.backends.signals import connection_created
class MyappConfig(AppConfig):
    name = 'myapp'  # placeholder: your app's dotted path
    def ready(self):
        def connection_created_handler(connection, **kwargs):
            # Make every transaction on this new connection read-only by default
            with connection.cursor() as cursor:
                cursor.execute('SET default_transaction_read_only = true;')
        connection_created.connect(connection_created_handler, weak=False)
This can be useful for some specific Django settings (e.g. to run development code with runserver against the production DB), where you do not want to create a real read-only DB user.

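For PostgreSQL, the same setting can also be passed through the connection OPTIONS, so no signal handler is needed:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'USER': 'myusername',
        'PASSWORD': 'mypassword',
        'HOST': 'myhost',
        'OPTIONS': {
            'options': '-c default_transaction_read_only=on'
        }
    }
}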
Source: https://nejc.saje.info/django-postgresql-readonly.html
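If you want to see the effect before touching PostgreSQL, the same idea (a connection that the database itself refuses to write through) can be tried with the standard-library sqlite3 module. This is a swapped-in technique: SQLite's mode=ro URI parameter stands in for PostgreSQL's default_transaction_read_only, and the weather.db file and reading table are made up for illustration.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path):
    # mode=ro makes SQLite open the file read-only; writes raise OperationalError.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Create a throwaway database through an ordinary read-write connection.
db_path = os.path.join(tempfile.mkdtemp(), "weather.db")
rw = sqlite3.connect(db_path)
rw.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, temp REAL)")
rw.execute("INSERT INTO reading (temp) VALUES (21.5)")
rw.commit()
rw.close()

# Reads succeed through the read-only connection...
ro = open_read_only(db_path)
rows = ro.execute("SELECT temp FROM reading").fetchall()

# ...but any write is rejected by the database itself.
try:
    ro.execute("DELETE FROM reading")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
ro.close()

print(rows, write_blocked)  # [(21.5,)] True
```

The point of enforcing this at the database level (rather than by reviewing test code) is that an accidental UPDATE or DELETE fails loudly instead of silently changing data.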

Man, once again, I should read the docs more carefully before I post questions here. I can define a read-only connection to my production database in the settings file, and then, straight from the docs:
If you are using more than one database, you can use django.db.connections to obtain the connection (and cursor) for a specific database. django.db.connections is a dictionary-like object that allows you to retrieve a specific connection using its alias:
from django.db import connections
cursor = connections['my_db_alias'].cursor()
# Your code here...

If you add a serializer for your model, you can specify in the serializer which fields are read-only:
from rest_framework import serializers
class AccountSerializer(serializers.ModelSerializer):
    class Meta:
        model = Account
        fields = ('id', 'account_name', 'users', 'created')
        read_only_fields = ('account_name',)
From http://www.django-rest-framework.org/api-guide/serializers/#specifying-read-only-fields

Related

Django ORM using external database to unmanaged model in view says relation does not exist

I have two Postgres database connections, and when using the one other than default, ORM calls fail in views, but raw queries do not. I am using Docker, the containers are connected, and I am using Django's own runserver command. Using the ORM in a Django command or the Django shell works fine, but not in a view.
As a side note, both databases actually belong to Django projects, but the main project reads some data directly from the other project's database using its own unmanaged model.
Python: 3.9.7
Django: 3.2.10
Postgres: 13.3 (main project), 12.2 (side project)
# settings
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'USER': 'pguser',
        'PASSWORD': 'pguser',
        'NAME': 'mainproject',
        'HOST': 'project-db', # docker container
        'PORT': '5432',
    },
    'external': {
        'ENGINE': 'django.db.backends.postgresql',
        'USER': 'pguser',
        'PASSWORD': 'pguser',
        'NAME': 'sideproject',
        'HOST': 'side-db', # docker container, attached to same network
        'PORT': '5432',
    },
}
# My unmanaged model
class MyTestModel(models.Model):
    class Meta:
        # table does not exist in 'default', but it does exist in 'external'
        db_table = 'my_data_table'
        managed = False
# my_data_table is very simple it has id field as integer
# and primary key (works out of the box with django)
# it has normal fields only like IntegerField or CharField
# But even if this model is empty then it returns normally the PK field
# My custom command in Main project
# python manage.py mycommand
# ...
def handle(self, *args, **options):
    # This works fine and data from external is populated in MyTestModel
    data = MyTestModel.objects.using('external').all()
    for x in data:
        print(x, vars(x))
# My simple view in Main project
class MyView(TemplateView):
    # But using example in views.py, random view class:
    def get(self, request, *args, **kwargs):
        # with raw works
        data = MyTestModel.objects.using('external').raw('select * from my_data_table')
        for x in data:
            print(x, vars(x))
        # This does not work
        # throws ProgrammingError: relation "my_data_table" does not exist
        data = MyTestModel.objects.using('external').all()
        for x in data:
            print(x, vars(x))
        return super().get(request, *args, **kwargs)
So somehow runserver and views do not generate the query correctly when using the ORM. It cannot be a connection error, because running the command, or the view with .raw(), works.
Now the funny thing is that if I change db_table to something that is common to both databases, let's say django_content_type, the ORM filter() and all() work in the view too. And yes, it then actually returns data from the correct database: if the main project has 50 content types and the side project (external) has 100, it returns the 100 content types from external.
I have tried everything: rebuilding Docker, creating new tables directly in the database, force-reloading the Postgres user, and making sure all owner permissions and all other permissions are set (should be OK anyway, because the command side works). I even tried using another database.

I know I didn't post the full settings or my local settings, which could have helped more to solve this case.
But I noticed that I had Django Silk installed locally, which captures each request and tries to analyze the database queries it makes. It looks like it may have been loaded too early, or it doesn't handle external databases well. Either way, removing Django Silk from the installed apps and removing its middleware fixed the problem.

Django testing: Got an error creating the test database: database "database_name" already exists

I have a problem with testing; it's my first time writing tests.
I just created a test folder inside my app users, and test_urls.py for testing the urls.
When I type:
python manage.py test users
It says:
Creating test database for alias 'default'... Got an error creating
the test database: database "database_name" already exists
Type 'yes' if you would like to try deleting the test database
'database_name', or 'no' to cancel:
What does this mean? What happens if I type yes? Do I lose all the data in my database?

When testing, Django creates a test database to work on so that your development database is not polluted. The error message says that Django is trying to create a test database named "database_name" and that a database with this name already exists. You should check which databases exist on your database server and what is in database_name; it has probably been created by mistake.
If you type yes, the database database_name will be deleted, and it is unlikely that you will be able to recover the data. So try to understand what is going on first.
You should set the name of the test database in settings.py. There is a specific TEST dictionary in the DATABASES setting for this:
settings.py
...
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'USER': 'mydatabaseuser',
        'NAME': 'mydatabase',
        'TEST': {
            'NAME': 'mytestdatabase',
        },
    },
}
...
By default, the prefix test_ is added to the name of your development database. You should check your settings.py to check what is going on.
From the docs:
The default test database names are created by prepending test_ to the value of each NAME in DATABASES. When using SQLite, the tests will use an in-memory database by default (i.e., the database will be created in memory, bypassing the filesystem entirely!). The TEST dictionary in DATABASES offers a number of settings to configure your test database. For example, if you want to use a different database name, specify NAME in the TEST dictionary for any given database in DATABASES.
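The naming rule quoted above can be written out as a small function, which makes it easy to see what the prompt is about: the computed test name is a database that already exists on the server. This is a hypothetical helper that mirrors the documented rule; it is not Django's own code, and it ignores the SQLite in-memory special case.

```python
TEST_DATABASE_PREFIX = "test_"


def resolve_test_db_name(db_settings):
    """Return the test database name for one DATABASES entry.

    An explicit TEST["NAME"] wins; otherwise "test_" is prepended to NAME.
    """
    explicit = db_settings.get("TEST", {}).get("NAME")
    if explicit:
        return explicit
    return TEST_DATABASE_PREFIX + db_settings["NAME"]


print(resolve_test_db_name({"NAME": "mydatabase"}))  # test_mydatabase
print(resolve_test_db_name({"NAME": "mydatabase", "TEST": {"NAME": "mytestdatabase"}}))  # mytestdatabase
```

So if the prompt names a database that holds real data, either NAME or TEST["NAME"] resolves to it, and answering yes would drop it.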

FWIW, if you get such a warning when using the --keepdb argument, as in
python manage.py test --keepdb [appname]
then this would typically mean that multiple instances of the Client were instantiated, perhaps one per test. The solution is to create one client for the test class and refer to it in all corresponding methods, like so:
from django.test import TestCase, Client
class MyTest(TestCase):
    def setUp(self):
        self.client = Client()
    def test_one(self):
        response = self.client.get('/path/one/')
        # assertions
    def test_two(self):
        response = self.client.post('/path/two/', {'some': 'data'})
        # assertions
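You could also (unverified) create a single shared client in the setUpClass class method. Note that setUp runs before every test method, so the client in the example above is still created once per test.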

Django: using same test database in a separate thread

I am running pytests using a test database with the following DB settings.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'postgres',
        'USER': 'something',
        'PASSWORD': 'password',
    },
}
Using the @pytest.mark.django_db marker, my test functions access a database called 'test_postgres' created for the tests.
import pytest
@pytest.mark.django_db
def test_example():
    from django.db import connection
    cur_ = connection.cursor()
    print(cur_.db.settings_dict)
outputs:
{'ENGINE': 'django.db.backends.postgresql_psycopg2', 'AUTOCOMMIT': True, 'ATOMIC_REQUESTS': False, 'NAME': 'test_postgres', 'TEST_MIRROR': None,...
but if I start a separate process (multiprocessing.Process) inside test_example:
import logging
import multiprocessing
import pytest
logger = logging.getLogger(__name__)
def function_to_run():
    from django.db import connection
    cur_ = connection.cursor()
    logger.error(cur_.db.settings_dict)
@pytest.mark.django_db
def test_example():
    p = multiprocessing.Process(target=function_to_run)
    p.start()
I can see that in that thread the cursor is using database named 'postgres' which is the non-testing database. Output:
{'ENGINE': 'django.db.backends.postgresql_psycopg2', 'AUTOCOMMIT': True, 'ATOMIC_REQUESTS': False, 'NAME': 'postgres', 'TEST_MIRROR': None,...
Is there a way to pass a database connection argument to my thread from the original test function and tell my thread routine to use the same database name ('test_postgres') as my test function?

I found a workaround to my problem.
First you prepare a separate Django settings file for testing (settings_pytest.py), with the following DATABASES setting:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'test_database',
        'TEST_NAME': 'test_database',
        'USER': 'something',
        'PASSWORD': 'password',
    },
}
Notice that we define TEST_NAME and make it the same as NAME, so that whether or not we run through the test runner, we access the same database.
Now you need to create this database, and run 'syncdb' and 'migrate' on it first:
sql> CREATE DATABASE test_database;
manage.py syncdb --settings=settings_pytest
manage.py migrate --settings=settings_pytest
Finally you can run your tests with:
py.test --reuse-db
You need to specify --reuse-db: database re-creation will never work, since the default database is the same as the test database. If your database schema changes, you will need to recreate the database manually with the commands above.
For the test itself, if you are adding records to the database that you need to be accessed by the spawned child process, remember to add transaction=True to the pytest decorator.
def function_to_run():
    # Without assert, this comparison would be evaluated and silently discarded
    assert Model.objects.count() == 1
@pytest.mark.django_db(transaction=True)
def test_example():
    obj_ = Model()
    obj_.save()
    p = multiprocessing.Process(target=function_to_run)
    p.start()

In your function_to_run() declaration you're doing from django.db import connection. Are you sure that will be using the correct test DB settings? I suspect the decorator you're using modifies the connection to use test_postgres rather than postgres, but because you're importing outside of the decorator's scope, it's not using the right one. What happens if you put it inside the decorator-wrapped function, like so...
@pytest.mark.django_db
def test_example():
    def function_to_run():
        from django.db import connection
        cur_ = connection.cursor()
        logger.error(cur_.db.settings_dict)
    p = multiprocessing.Process(target=function_to_run)
    p.start()
Edit:
I'm not familiar with pytest_django so I'm shooting in the dark at this point, I imagine that the marker function allows you to decorate a class as well, so have you tried putting all the tests that want to use this shared function and the db in one TestCase class? Like so:
import logging
import multiprocessing
import pytest
from django.test import TestCase
logger = logging.getLogger(__name__)
@pytest.mark.django_db
class ThreadDBTests(TestCase):
    # The function we want to share among tests
    @staticmethod
    def function_to_run():
        from django.db import connection
        cur_ = connection.cursor()
        logger.error(cur_.db.settings_dict)
    # One of our tests that needs the DB
    def test_example1(self):
        p = multiprocessing.Process(target=self.function_to_run)
        p.start()
    # Another test that needs the DB
    def test_example2(self):
        p = multiprocessing.Process(target=self.function_to_run)
        p.start()

How to get the database where a model instance was saved to in Django?

I have a django application using multiple databases. Given an instance of a model, how can I obtain the database where it is stored (if any)? I need this to save another object to the same database as the first.
def add_ducks_to_hunt(hunter):
    db = # the hunter's db
    duck = Duck()
    duck.save(using=db)

Use _state.db, as in:
my_obj = MyModel.objects.get(pk=1)
my_obj._state.db
This is shown in the routing example.

I believe you are looking for settings.py in the project's (top-level) Django directory. If not, could you post more information?
Here is a bit of my settings.py file in the Django project directory /home/amr/django/amr.
import os
DEBUG = True
TEMPLATE_DEBUG = DEBUG
ADMINS = (
    # ('Someone R. Somebody', 'so-forth-and-so-and-so@somewhere.com'),
)
MANAGERS = ADMINS
DATABASES = {
    'default': {
        #'ENGINE': 'django.db.backends.', # Add 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
        'ENGINE': 'mysql',
        'NAME': 'some-server-name', # Or path to database file if using sqlite3.
        'USER': '<user>', # Not used with sqlite3.
        'PASSWORD': 'xxxxxxxx', # Not used with sqlite3.
        'HOST': '', # Set to empty string for localhost. Not used with sqlite3.
        'PORT': '', # Set to empty string for default. Not used with sqlite3.
    }
}
SESSION_COOKIE_AGE = 15 * 60 * 60 # Age of cookie, in seconds

If you always save a model in the same DB, you could store that info in the model class and then use it when needed. Probably not very pythonic, but it may do the trick.
class Hunter(models.Model):
    # ... other stuff
    db = 'hunter_db'
Then access it through __class__:
def add_ducks_to_hunt(hunter):
    db = hunter.__class__.db
    duck = Duck()
    duck.save(using=db)
On the other hand, if you save objects of one class in different databases, then I see no other way than to write each object's id and DB name, at the object's save(), to a table somewhere you can always access, and use that info to load the object.

There is no public API for getting this directly from a model instance, apart from the private _state attribute. Otherwise you should explicitly pass in the using parameter, just like the built-in methods do, or design a router that can deduce the DB to use by checking some facts about the model instance (here, the hunter).

Assuming you have multiple databases, and you've defined a router, then you can just query your router like:
from myrouters import MyRouter
from myapp.models import Duck
using = MyRouter().db_for_write(Duck)
duck = Duck()
duck.save(using=using)
That said, I'm not sure why you'd need to do this. If the model exists on one database and you've properly defined a router, then this should be unnecessary: Django will automatically look up the correct default value for using. If the model exists on multiple databases simultaneously and you want to control the specific database it's written to, then you should already know which database you want to use and shouldn't need to query it.
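The class-attribute idea and the router idea can be combined: a router that reads a per-model db attribute, so callers never have to pass using= by hand. A minimal sketch; ModelAttributeRouter and the db class attribute are hypothetical names, and the router only takes effect once its dotted path is listed in the DATABASE_ROUTERS setting. Django routers are plain classes, and returning None means "no opinion", so Django moves on to the next router and, failing that, to its default choice.

```python
class ModelAttributeRouter:
    """Hypothetical router: route a model to the alias in its `db` attribute."""

    def _alias_for(self, model):
        # Models without the attribute get None, i.e. "no opinion".
        return getattr(model, "db", None)

    def db_for_read(self, model, **hints):
        return self._alias_for(model)

    def db_for_write(self, model, **hints):
        return self._alias_for(model)


# Plain stand-ins for model classes, so the sketch runs without Django.
class Hunter:
    db = "hunter_db"


class Duck:
    pass


router = ModelAttributeRouter()
print(router.db_for_write(Hunter))  # hunter_db
print(router.db_for_write(Duck))    # None
```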

How to use schemas in Django?

I would like to use PostgreSQL schemas with Django. How can I do this?

Maybe this will help.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'OPTIONS': {
            'options': '-c search_path=your_schema'
        },
        'NAME': 'your_name',
        'USER': 'your_user',
        'PASSWORD': 'your_password',
        'HOST': '127.0.0.1',
        'PORT': '5432',
    }
}
I got the answer from the following link:
http://blog.amvtek.com/posts/2014/Jun/13/accessing-multiple-postgres-schemas-from-django/
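The OPTIONS['options'] string is handed through psycopg to libpq, so several -c settings can share one connection entry, for example a schema search path plus the read-only default discussed earlier on this page. A small hypothetical helper that builds that dict (the function name and schema names are illustrative):

```python
def postgres_options(schemas, read_only=False):
    """Build a DATABASES[...]['OPTIONS'] dict for a PostgreSQL connection.

    schemas: search_path entries, in lookup order.
    read_only: also make transactions read-only by default.
    """
    flags = ["-c search_path=%s" % ",".join(schemas)]
    if read_only:
        flags.append("-c default_transaction_read_only=on")
    return {"options": " ".join(flags)}


print(postgres_options(["your_schema"]))
# {'options': '-c search_path=your_schema'}
print(postgres_options(["your_schema", "public"], read_only=True))
# {'options': '-c search_path=your_schema,public -c default_transaction_read_only=on'}
```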

I've been using:
db_table = '"schema"."tablename"'
in the past without realising that it only works for read-only operations. When you try to add a new record, it fails because the sequence name ends up as something like "schema.tablename"_column_id_seq.
db_table = 'schema\".\"tablename'
does work so far. Thanks.
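Why the escaped form behaves better: Django's PostgreSQL backend wraps an identifier in double quotes unless it already starts and ends with one, and in Python \" inside a single-quoted string is just a plain double quote. The sketch below re-implements that quoting rule (it is not Django's code) and, assuming the sequence name is built by plain string formatting as this answer describes, shows how the two db_table spellings fare:

```python
def quote_name(name):
    # Same rule as the PostgreSQL backend: leave already-quoted names alone,
    # otherwise wrap the whole string in double quotes.
    if name.startswith('"') and name.endswith('"'):
        return name
    return '"%s"' % name


def sequence_reference(db_table, column="id"):
    # Assumed naming pattern "<table>_<column>_seq", built by string formatting.
    return quote_name("%s_%s_seq" % (db_table, column))


escaped = 'schema\".\"tablename'     # the same string as 'schema"."tablename'
pre_quoted = '"schema"."tablename"'

print(quote_name(escaped))             # "schema"."tablename"
print(quote_name(pre_quoted))          # "schema"."tablename"
print(sequence_reference(escaped))     # "schema"."tablename_id_seq"
print(sequence_reference(pre_quoted))  # ""schema"."tablename"_id_seq"
```

Both spellings quote to the same table identifier, but only the escaped one still produces a valid name once a suffix is appended.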

As mentioned in ticket https://code.djangoproject.com/ticket/6148, we could set the search_path for the Django database user.
One way to achieve this is to set search_path via the psql client, like this:
ALTER USER my_user SET SEARCH_PATH TO path;
The other way is to modify the Django app, so that if we rebuild the database, Django won't put all the tables in the public schema.
To achieve this, you could override the DatabaseWrapper defined in django.db.backends.postgresql_psycopg2.base.
Create the following directory:
app/pg/
├── __init__.py
└── base.py
Here's the content of base.py
from django.db.backends.postgresql_psycopg2.base import DatabaseWrapper as PostgresDatabaseWrapper
# Django imports ENGINE + '.base' and expects a class named DatabaseWrapper there
class DatabaseWrapper(PostgresDatabaseWrapper):
    def _cursor(self, *args, **kwargs):
        cursor = super()._cursor(*args, **kwargs)
        # Runs for every new cursor; replace "path" with your schema name(s)
        cursor.execute('SET search_path = path')
        return cursor
In settings.py, add the following database configuration:
DATABASES = {
    'default': {
        'ENGINE': 'app.pg',
        'NAME': 'db',
        'USER': 'user',
        'PASSWORD': '',
        'HOST': '',
        'PORT': '',
    }
}

It's a bit more complicated than tricky escaping. Have a look at Django ticket #6148 for a possible solution, or at least a patch. It makes some minor changes deep in the django.db core, but it will hopefully be officially included in Django.
After that it's just a matter of saying
db_schema = 'whatever_schema'
in the Meta class or for a global change set
DATABASE_SCHEMA = 'default_schema_name'
in settings.py
UPDATE: 2015-01-08
The corresponding issue in Django has been open for 7 years, and the patch there no longer works.
The correct answer to this should be...
At the moment you can't use PostgreSQL schemas in Django out of the box.

I just developed a package for this problem: https://github.com/ryannjohnson/django-schemas.
After some configuration, you can simply call set_db() on your models:
model_cls = UserModel.set_db(db='db_alias', schema='schema_name')
user_on_schema = model_cls.objects.get(pk=1)
The package uses techniques described in https://stackoverflow.com/a/1628855/5307109 and https://stackoverflow.com/a/18391525/5307109, then wraps them for use with Django models.

I've had some success just saying
db_table = 'schema\".\"tablename'
in the Meta class, but that's really ugly. And I've only used it in limited scenarios - it may well break if you try something complicated. And as said earlier, it's not really supported...

There is no explicit Django support for PostgreSQL schemas.
When using Django (0.95), we had to add a search_path to the Django database connector for PostgreSQL, because Django didn't support specifying the schema that the tables managed by the ORM used.
Taken from:
http://nxsy.org/using-postgresql-schemas-with-sqlalchemy-and-elixir
The general response is to use SQLAlchemy to construct the SQL properly.
Oh, and here's another link with some suggestions about what you can do with the Django base, extending it to try to support your scheme:
http://news.ycombinator.com/item?id=662901

I know that this is a rather old question, but a different solution is to alter the SEARCH_PATH.
Example
Let's say you have three tables:
schema1.table_name_a
schema2.table_name_b
table_name_c
You could run the command:
SET SEARCH_PATH to public,schema1,schema2;
And refer to the tables by their table names only in django.
See section 5.7.3, "The Schema Search Path", in the PostgreSQL documentation.
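How the lookup then works: for an unqualified table name, PostgreSQL uses the first schema in search_path that contains a table of that name. A toy model of that rule in Python, with a made-up catalog matching the three tables above (real PostgreSQL also consults pg_catalog and temporary schemas, which this ignores):

```python
def resolve_table(name, search_path, catalog):
    """Return "schema.name" for the first schema in search_path holding `name`.

    catalog maps schema name -> set of table names (a stand-in for the real
    system catalogs). Returns None when no schema on the path has the table.
    """
    for schema in search_path:
        if name in catalog.get(schema, set()):
            return "%s.%s" % (schema, name)
    return None


catalog = {
    "public": {"table_name_c"},
    "schema1": {"table_name_a"},
    "schema2": {"table_name_b"},
}
path = ["public", "schema1", "schema2"]

print(resolve_table("table_name_a", path, catalog))        # schema1.table_name_a
print(resolve_table("table_name_c", path, catalog))        # public.table_name_c
print(resolve_table("table_name_b", ["public"], catalog))  # None
```

The last call shows the flip side: if a schema is missing from the path, queries against its tables fail with a "relation does not exist" error.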

For SQL server database:
db_table = "[your_schema].[your_table]"

https://docs.djangoproject.com/en/dev/topics/db/multi-db/#using-routers
urls.py
from django.urls import path, include
from rest_framework.routers import DefaultRouter
from my_app.my_views import ClientViewSet
router = DefaultRouter(trailing_slash=False)
router.register(r'', ClientViewSet, basename='clients')
urlpatterns = [
    path('', include(router.urls)),
]
