I have a model with a ManyToManyField to "self". I want to copy data from my local environment to production. To do that, I'd like to dump all entries from my database to a fixture and load that on the production server. Because that server already contains data, I export using natural keys. The export runs fine, but loaddata fails whenever an instance references another instance of the same model that appears later in the fixture.
What confuses me: everything works if I don't use natural keys, but then the primary keys clash with data already present on production.
Please consider the following example:
from django.db import models


class PersonManager(models.Manager):
    def get_by_natural_key(self, name):
        return self.get(name=name)


class Person(models.Model):
    objects = PersonManager()

    name = models.CharField(max_length=150, unique=True)
    friends = models.ManyToManyField("self", blank=True)

    def natural_key(self):
        return (self.name,)
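As background, a natural key just means "identify an object by its unique fields instead of by its primary key". A minimal plain-Python sketch of the mechanism (not Django code; the dict stands in for the database table):

```python
# Plain-Python sketch of natural-key resolution: instead of resolving a
# reference by primary key, the loader calls get_by_natural_key() with the
# unique fields that were written into the fixture (here: the name).
class Person:
    _registry = {}  # stands in for the database table

    def __init__(self, name):
        self.name = name
        Person._registry[name] = self

    def natural_key(self):
        # what dumpdata writes into the fixture for references to this object
        return (self.name,)

    @classmethod
    def get_by_natural_key(cls, name):
        # what loaddata calls to resolve such a reference back to a row
        return cls._registry[name]

p = Person('foo')
assert Person.get_by_natural_key(*p.natural_key()) is p
```

The lookup only succeeds if the referenced row already exists, which is exactly where the fixture ordering matters.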
To reproduce:
$ python manage.py shell
>>> from relations.models import *
>>> person1 = Person.objects.create(name='foo')
>>> person2 = Person.objects.create(name='bar')
>>> person3 = Person.objects.create(name='baz')
>>> person2.friends.add(person2)
>>> person2.save()
>>> exit()

$ python manage.py dumpdata relations.Person --natural-primary --natural-foreign > relations/fixtures/people.json
# change the database url to an empty database and run migrations
$ python manage.py loaddata people
This raises relations.models.DoesNotExist: Person matching query does not exist.
If I omit --natural-primary and --natural-foreign but follow the same steps, loading into a fresh database works.
Can somebody explain why the behaviour differs?
I'm trying to create an object from the console but not sure how to set that up.
This is my modelManager:
class MajorManager(models.Manager):
    def __str__(self):
        return self.name

    def createMajor(self, name):
        try:
            name = name.lower()
            major = self.create(name=name)
        except IntegrityError:
            print("This major has already been created")
And here is the model:
class Majors(models.Model):
    name = models.CharField(max_length=30, unique=True)

    objects = MajorManager()
Any help would be much appreciated.
You can go this route using Django's API; check out the docs.
First create a shell:
python manage.py shell
Then you can import your models and do basic CRUD on them.
>>> from polls.models import Choice, Question # Import the model classes we just wrote.
# No questions are in the system yet.
>>> Question.objects.all()
<QuerySet []>
# Create a new Question.
# Support for time zones is enabled in the default settings file, so
# Django expects a datetime with tzinfo for pub_date. Use timezone.now()
# instead of datetime.datetime.now() and it will do the right thing.
>>> from django.utils import timezone
>>> q = Question(question_text="What's new?", pub_date=timezone.now())
# Save the object into the database. You have to call save() explicitly.
>>> q.save()
Alternatively, you can try the dbshell route; here's the documentation.
This command assumes the programs are on your PATH so that a simple
call to the program name (psql, mysql, sqlite3, sqlplus) will find the
program in the right place. There’s no way to specify the location of
the program manually.
You can't use Django's ORM there, though; it's pure SQL, so you'd write statements like:
CREATE TABLE user (
    id INT,
    name VARCHAR
);
In developing a website for indexing system documentation I've come across a tough nut to crack regarding data "matching"/relations across databases in Django.
A simplified model for my local database:
from django.db import models


class Document(models.Model):
    name = models.CharField(max_length=200)
    system_id = models.IntegerField()
    ...
An imagined model; the system details are stored in a remote database:
from django.db import models


class System(models.Model):
    name = models.CharField(max_length=200)
    system_id = models.IntegerField()
    ...
The idea is that when creating a new Document entry at my website the ID of the related system is to be stored in the local database. When presenting the data I would have to use the stored ID to retrieve the system name among other details from the remote database.
I've looked into foreign keys across databases, but that seems quite involved, and I'm not sure I want actual relations. Instead, I picture a function inside the Document model/class that can retrieve the matching data, for example via a custom router/function.
How would I go about solving this?
Note that I won't be able to alter anything on the remote database; it's read-only. I'm not sure if I should create a model for System as well. Both databases use PostgreSQL, but my impression is that the choice of database isn't really relevant to this scenario.
From the Django documentation on multiple databases (manually-selecting-a-database):
# This will run on the 'default' database.
Author.objects.all()
# So will this.
Author.objects.using('default').all()
# This will run on the 'other' database.
Author.objects.using('other').all()
'default' and 'other' are aliases for your databases; in your case they could be 'default' and 'remote'.
Of course, you can replace .all() with any query you need.
Example: System.objects.using('remote').get(id=123456)
You are correct that foreign keys across databases are a problem in Django ORM, and to some extent at the db level too.
You already have the answer basically: "I visualize a function inside the Document model/class which is able to retrieve the matching data"
I'd do it like this:
class RemoteObject(object):
    def __init__(self, remote_model, remote_db, field_name):
        # assumes the remote db is defined in Django settings and has an
        # associated Django model definition:
        self.remote_model = remote_model
        self.remote_db = remote_db
        # name of the id field on the model (real db field):
        self.field_name = field_name
        # we will cache the retrieved remote object on the instance,
        # the same way that Django does with foreign key fields:
        self.cache_name = '_{}_cache'.format(field_name)

    def __get__(self, instance, cls):
        try:
            rel_obj = getattr(instance, self.cache_name)
        except AttributeError:
            system_id = getattr(instance, self.field_name)
            remote_qs = self.remote_model.objects.using(self.remote_db)
            try:
                rel_obj = remote_qs.get(id=system_id)
            except self.remote_model.DoesNotExist:
                rel_obj = None
            setattr(instance, self.cache_name, rel_obj)
        if rel_obj is None:
            raise self.remote_model.DoesNotExist
        else:
            return rel_obj

    def __set__(self, instance, value):
        setattr(instance, self.field_name, value.id)
        setattr(instance, self.cache_name, value)


class Document(models.Model):
    name = models.CharField(max_length=200)
    system_id = models.IntegerField()

    system = RemoteObject(System, 'system_db_name', 'system_id')
You may recognise that the RemoteObject class above implements Python's descriptor protocol; see here for more info:
https://docs.python.org/3/howto/descriptor.html
Example usage:
>>> doc = Document.objects.get(pk=1)
>>> doc.system_id
3
>>> doc.system.id
3
>>> doc.system.name
'my system'
>>> other_system = System.objects.using('system_db_name').get(pk=5)
>>> doc.system = other_system
>>> doc.system_id
5
Going further you could write a custom db router:
https://docs.djangoproject.com/en/dev/topics/db/multi-db/#using-routers
This would let you eliminate the using('system_db_name') calls in the code by routing all reads for System model to the appropriate db.
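A minimal sketch of such a router, assuming the read-only models live in a hypothetical app labelled 'remote_systems' and the database alias is 'remote' (both names are placeholders):

```python
class RemoteSystemRouter:
    """Route all reads for the remote app's models to the 'remote' database.

    Assumes settings.DATABASES defines a 'remote' alias and that the
    read-only models live in an app labelled 'remote_systems'.
    """

    remote_app_label = 'remote_systems'

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.remote_app_label:
            return 'remote'
        return None  # fall through to the default routing

    def db_for_write(self, model, **hints):
        # The remote database is read-only, so never route writes there.
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Don't create the remote app's tables locally, and don't
        # migrate anything onto the read-only remote database.
        if db == 'remote' or app_label == self.remote_app_label:
            return False
        return None
```

Registered via DATABASE_ROUTERS = ['myapp.routers.RemoteSystemRouter'] in settings (path is illustrative), this lets plain System.objects.get(...) calls hit the remote database.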
I'd go for a method get_system(). So:
class Document(models.Model):
    def get_system(self):
        return System.objects.using('remote').get(system_id=self.system_id)
This is the simplest solution. Another possibility is PostgreSQL's foreign data wrapper (FDW) feature: with FDW you can abstract the multi-db handling away from Django and do it inside the database, so you can run queries that need the document -> system relation.
Finally, if your use case allows it, just copying the system data periodically to the local db can be a good solution.
For small searches in PostgreSQL, http://django-orm.readthedocs.org/en/latest/orm-pg-fulltext.html can be used easily, as shown in the docs.
These are the steps I used to implement it:
# call the libraries
from djorm_pgfulltext.models import SearchManager
from djorm_pgfulltext.fields import VectorField


class Notes(models.Model):
    title = models.CharField()
    description = models.TextField()

    # create a vector field
    search_index = VectorField()

    objects = models.Manager()
    search_manager = SearchManager(
        fields=('title', 'description'),
        config='pg_catalog.english',
        search_field='search_index',
        auto_update_search_field=True,
    )
I ran the migration, and all the changes are reflected in the database.
Last step: in my PostgreSQL database, I did the following:
sudo -u postgres psql postgres  # log in as the postgres superuser
CREATE EXTENSION unaccent;
ALTER FUNCTION unaccent(text) IMMUTABLE;
With all this done, I open my shell:
In [1]: from myapp.models import Notes
In [2]: Notes.search_manager.search("p")
Out[2]: []
Any idea why I am getting no results? What is missing?
I'm having trouble creating a model in django. I wrote a model like this:
from django.db import models


class FooModel(models.Model):
    name = models.CharField(max_length=255)
I run
manage.py syncdb
But when I'm in the shell, I can't save an instance. Every time I call save(), it tells me a column is missing:
$ python manage.py shell
>>> from app.models import FooModel
>>> foo = FooModel()
>>> foo.name = 'foo'
>>> foo.save()
DatabaseError: column "name" of relation "ecommerce_foomodel" does not exist
LINE 1: INSERT INTO "ecommerce_foomodel" ("name") VALUES (E'123123as...
We're using postgres.
The database table was created before you added the corresponding fields.
So, you can recreate all of the tables of that app (in case you don't have any useful data) using:
python manage.py reset ecommerce
Or, you should migrate the database to the latest version using South.
I've added a UUID field to some of my models and then migrated with South. Any new objects I create have the UUID field populated correctly. However the UUID fields on all my older data is null.
Is there any way to populate UUID data for existing data?
For the following sample class:
from django_extensions.db.fields import UUIDField


class MyClass(models.Model):
    uuid = UUIDField(editable=False, blank=True)
    name = models.CharField()
If you're using South, create a data migration:
python ./manage.py datamigration <appname> --auto
And then use the following code to update the migration with the specific logic to add a UUID:
from django_extensions.utils import uuid


def forwards(self, orm):
    for item in orm['myapp.myclass'].objects.all():
        if not item.uuid:
            item.uuid = uuid.uuid4()  # creates a random UUID
            item.save()


def backwards(self, orm):
    for item in orm['myapp.myclass'].objects.all():
        if item.uuid:
            item.uuid = None
            item.save()
You can create different types of UUIDs, each generated differently. The uuid.py module in django-extensions has the complete list of the types of UUIDs you can create.
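The version differences come straight from Python's standard uuid module: uuid1() is derived from the host's MAC address and a timestamp, while uuid4() is purely random. A quick standalone illustration:

```python
import uuid

# uuid1() is time- and MAC-based; each value records when/where it was made.
time_based = uuid.uuid1()

# uuid4() is purely random, with no relation to host or time.
random_based = uuid.uuid4()

# The version number is encoded in the UUID value itself.
assert time_based.version == 1
assert random_based.version == 4

# Both render to the 36-character string form that a UUIDField stores.
assert len(str(time_based)) == 36
```

Which version you pick mostly matters for uniqueness guarantees and whether values should be guessable; uuid4() is the usual default.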
It's important to note that if you run this migration in an environment with a lot of objects, it can time out (for instance, when deploying with Fabric). An alternative method of filling in existing fields will be required for production environments.
It's also possible to run out of memory while doing this for a large number of objects (we had deployments fail with 17,000+ objects).
To get around this, you need to create a custom iterator in your migration (or stick it where it's really useful, and refer to it in your migration). It would look something like this:
def queryset_iterator(queryset, chunksize=1000):
    import gc

    if queryset.count() < 1:
        return
    pk = 0
    last_pk = queryset.order_by('-pk')[0].pk
    queryset = queryset.order_by('pk')
    while pk < last_pk:
        for row in queryset.filter(pk__gt=pk)[:chunksize]:
            pk = row.pk
            yield row
        gc.collect()
And then your migrations would change to look like this:
class Migration(DataMigration):
    def forwards(self, orm):
        for item in queryset_iterator(orm['myapp.myclass'].objects.all()):
            if not item.uuid:
                item.uuid = uuid.uuid1()
                item.save()

    def backwards(self, orm):
        for item in queryset_iterator(orm['myapp.myclass'].objects.all()):
            if item.uuid:
                item.uuid = None
                item.save()
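The pk-chunking idea behind queryset_iterator can be illustrated without Django; here a plain list of namedtuples stands in for the queryset (all names are illustrative):

```python
from collections import namedtuple

Row = namedtuple('Row', 'pk')

def chunked_by_pk(rows, chunksize=3):
    # Same idea as queryset_iterator: walk the rows in pk order, one slice
    # at a time, remembering the last pk seen so that each pass resumes
    # where the previous one stopped and only `chunksize` rows are
    # materialised at once.
    rows = sorted(rows, key=lambda r: r.pk)
    if not rows:
        return
    pk = 0
    last_pk = rows[-1].pk
    while pk < last_pk:
        batch = [r for r in rows if r.pk > pk][:chunksize]
        for row in batch:
            pk = row.pk
            yield row

rows = [Row(pk=i) for i in range(1, 8)]
assert [r.pk for r in chunked_by_pk(rows)] == [1, 2, 3, 4, 5, 6, 7]
```

Filtering on pk rather than slicing by offset is what keeps each query cheap on large tables.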
To add UUID values to all existing records, first make sure your model has the UUID field with blank=True, null=True.
Then run the schemamigration command with South and open up the resulting migration file.
Then edit your migration file as shown in this post:
Quote:
You'll need the following import: import uuid
At the end of the forwards() function, add the following:
def forwards(self, orm):
    ...
    for a in MyModel.objects.all():
        a.uuid = u'' + str(uuid.uuid1().hex)
        a.save()
As stated, that will loop through the existing instances and add a UUID to each as part of the migration.
There is now an excellent, updated answer to this exact question in the Django docs for Django 1.9.
It saved me a lot of time!