Django JOIN eliminates desired columns

I'm trying to join two tables in Django that are related to each other by a foreign key field.
class Question(models.Model):
    description = models.TextField('Description', blank=True, null=True)

class Vote(models.Model):
    question = models.ForeignKey(Question)
    profile = models.ForeignKey(UserProfile)
    value = models.IntegerField('Value')
    creator = models.ForeignKey(User)
I tried to create a queryset by using
questions = Question.objects.filter(vote__creator=3).values()
which results in a set like this
+----+-------------+
| id | description |
+----+-------------+
....
If I run a similar query by hand in MySQL with
select * from questions as t1 join votes as t2 on t1.id=question_id where creator_id=3;
it results in a set like this
+----+-------------+------+-------------+------------+-------+------------+
| id | description | id | question_id | profile_id | value | creator_id |
How can I prevent Django from cutting columns from my resulting queryset? I'd really like to retrieve a fully joined table.

use objects.select_related():
questions = Question.objects.select_related().filter(vote__creator=3).values()

A Question.objects.filter(...) will generate the following SQL:
select * from question where question.id in (...)
So, as you can see, it's different from what you wanted (question joined with votes).
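A minimal sketch, assuming the models above: .values() also accepts related lookups, so the vote columns can be requested explicitly (lookups such as vote__creator return the foreign-key ids):

questions = (
    Question.objects
    .filter(vote__creator=3)
    .values('id', 'description',
            'vote__id', 'vote__profile', 'vote__value', 'vote__creator')
)
# Each dict now corresponds to one (question, vote) pair, much like the hand-written JOIN.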

PostgreSQL JOIN on JSON Object column

I need to join 3 different tables in Postgres:
lote_item (where I have some book ids)
lote_item_log (where I have a column "attributes" holding a JSON object such as {"aluno_id": "2823", "aluno_email": "someemail#outlook.com", "aluno_unidade": 174, "livro_codigo": "XOZK-0NOYP0Z1EMJ"}; note that some aluno_unidade values are null)
and finally
company (where I have the school name for every aluno_unidade.
Ex: aluno_unidade = 174 ==> nome_fantasia = mySchoolName).
Joining the first two tables was easy, since lote_item_log has a foreign key which I could match like this:
SELECT * FROM lote_item JOIN lote_item_log ON lote_item.id = lote_item_log.lote_item_id
Now, I need to get the school name, stored in the company table, using the aluno_unidade ID from table lote_item_log.
My current query is:
SELECT
    *
FROM
    lote_item
JOIN
    lote_item_log
ON
    lote_item.id = lote_item_log.lote_item_id
JOIN
    company
ON
    (
        SELECT
            JSON_EXTRACT_PATH_TEXT(attributes, 'aluno_unidade')::int
        FROM
            lote_item_log
        WHERE
            operation_id = 6
    ) = company.senior_id
WHERE
    item_id = {book_id};
operation_id determines which school is active.
ERROR I'M GETTING:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.CardinalityViolation) more than one row returned by a subquery used as an expression
I tried LIMIT 1, but then I got just an empty array.
What I need is:
lote_item.created_at | lote_item.updated_at | lote_item.item_id | uuid | aluno_email | c014_id | nome_fantasia | cnpj | is_franchise | is_active
somedate             | somedate             | some_item_id      | XJW4 | someemail#a | some_id | SCHOOL NAME   | cnpj | t            | t
I got it.
Not sure it's the best way, but it worked...
SELECT
    *
FROM
    lote_item
JOIN
    lote_item_log
ON
    lote_item.id = lote_item_log.lote_item_id
JOIN
    company
ON
    JSON_EXTRACT_PATH_TEXT(attributes, 'aluno_unidade')::int = company.senior_id
WHERE
    item_id = {book_id};
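Since the error surfaced through SQLAlchemy, the same query can also be run with a bound parameter instead of formatting {book_id} into the string by hand; the connection URL and the example book id below are placeholders:

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:password@localhost/mydb")  # placeholder URL

query = text("""
    SELECT *
    FROM lote_item
    JOIN lote_item_log
        ON lote_item.id = lote_item_log.lote_item_id
    JOIN company
        ON CAST(JSON_EXTRACT_PATH_TEXT(attributes, 'aluno_unidade') AS integer) = company.senior_id
    WHERE item_id = :book_id
""")

with engine.connect() as conn:
    rows = conn.execute(query, {"book_id": 42}).fetchall()  # 42 is a placeholder id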

Django pivot table without id and primary key

In the database I have 3 tables:
languages
cities
city_language
city_language Table:
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| city_id | bigint(20) unsigned | NO | PRI | NULL | |
| language_id | bigint(20) unsigned | NO | PRI | NULL | |
| name | varchar(255) | NO | | NULL | |
+-------------+---------------------+------+-----+---------+-------+
Model
class CityLanguage(models.Model):
    city = models.ForeignKey('Cities', models.DO_NOTHING)
    language = models.ForeignKey('Languages', models.DO_NOTHING)
    name = models.CharField(max_length=255)

    class Meta:
        managed = False
        db_table = 'city_language'
        unique_together = (('city', 'language'),)
The model doesn't have an id field or a primary key, and my table doesn't have an id column either. If I run this code I get the error:
(1054, "Unknown column 'city_language.id' in 'field list'")
If I define a primary key on one of the columns, that column's values have to be unique. If I use primary_key, then when I try to store the same city with different languages I get:
This city (name or language, depending on which column was chosen as the primary key) already exists.
I don't want to create an id column for a pivot table; there is no reason for a pivot table to have one. Can you tell me the correct way to use a pivot table? Thank you.
Django does not work without a primary key. There are two ways to deal with it:
Create an id column (then in the Django model you don't need to add a primary key).
Create another unique column and set it as the primary key.
On my side I chose the second way: I created a column named unique_key and put this in the model:
unique_key = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
You need to import uuid.
Good luck.
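Putting it together, the adapted model might look like this sketch (assuming the unique_key column has been added to the existing city_language table):

import uuid
from django.db import models

class CityLanguage(models.Model):
    # Surrogate primary key so Django stops expecting an implicit `id` column.
    unique_key = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    city = models.ForeignKey('Cities', models.DO_NOTHING)
    language = models.ForeignKey('Languages', models.DO_NOTHING)
    name = models.CharField(max_length=255)

    class Meta:
        managed = False
        db_table = 'city_language'
        unique_together = (('city', 'language'),)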

Django retrieve rows for the distinct column values

I want to query Model rows in Django,
class Language(models.Model):
    language_id = models.CharField(max_length=100, default="")
    code = models.CharField(max_length=100, default="")
    name = models.CharField(max_length=500, default="")
In this table, language_id is not unique. For example, here is some sample data:
+-------------+------+---------+
| language_id | code | name |
+-------------+------+---------+
| 12345 | en | english |
| 12345 | te | telugu |
| 54321 | en | english |
| 54321 | te | telugu |
+-------------+------+---------+
I want to fetch the rows (all columns) that have distinct language_ids.
What I am currently doing:
language_list = Language.objects.all()
list = []
idlist = []
for language in language_list:
    if language.language_id not in idlist:
        il = language
        list.append(il)
        idlist.append(language.language_id)
Then list will have all the distinct rows (model objects).
Is there a better way to do this? I don't want to iterate over all the Language objects.
It's unclear what you are trying to do.
What your script does is take the first occurrence of a given ID, arbitrarily.
If that's what you want, the best approach depends on which database backs your model.
PostgreSQL allows the use of distinct on a field:
https://docs.djangoproject.com/en/2.1/ref/models/querysets/#distinct
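On PostgreSQL that could look like the sketch below; distinct('language_id') requires the queryset to be ordered by that same field first:

languages = (
    Language.objects
    .order_by('language_id')
    .distinct('language_id')
)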
On MySQL, what you could do is get all the unique values of your id and fetch one matching instance of your model per ID:
language_ids = Language.objects.values_list('language_id', flat=True).distinct()
result = []
for language_id in language_ids:
    result.append(Language.objects.filter(language_id=language_id).first())
It's not necessarily much better than your solution simply because arbitrary picking isn't an expected use case for the ORM.
If, on the other hand, you meant to get only the language_ids that appear exactly once (with Count imported from django.db.models):
Language.objects.values('language_id').annotate(cnt=Count('id')).filter(cnt=1)

Cassandra does not set a default value for a new column added later in a Python model

I have code like below.
from uuid import uuid4
from uuid import uuid1
from cassandra.cqlengine import columns, connection
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.management import sync_table

class BaseModel(Model):
    __abstract__ = True
    id = columns.UUID(primary_key=True, default=uuid4)
    created_timestamp = columns.TimeUUID(primary_key=True,
                                         clustering_order='DESC',
                                         default=uuid1)
    deleted = columns.Boolean(required=True, default=False)

class OtherModel(BaseModel):
    __table_name__ = 'other_table'

if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)
    OtherModel.create()
After the first execution, I can see the record in the db when I run this query:
cqlsh> select * from test.other_table;
id | created_timestamp | deleted
--------------------------------------+--------------------------------------+---------
febc7789-5806-44d8-bbd5-45321676def9 | 840e1b66-cc73-11e6-a66c-38c986054a88 | False
(1 rows)
After this, I added a new column name to OtherModel and ran the same program.
class OtherModel(BaseModel):
    __table_name__ = 'other_table'
    name = columns.Text(required=True, default='')

if __name__ == '__main__':
    connection.setup(hosts=['localhost'],
                     default_keyspace='test')
    sync_table(OtherModel)
    OtherModel.create(name='test')
When I check the db entries:
cqlsh> select * from test.other_table;
id | created_timestamp | deleted | name
--------------------------------------+--------------------------------------+---------+------
936cfd6c-44a4-43d3-a3c0-fdd493144f4b | 4d7fd78c-cc74-11e6-bb49-38c986054a88 | False | test
febc7789-5806-44d8-bbd5-45321676def9 | 840e1b66-cc73-11e6-a66c-38c986054a88 | False | null
(2 rows)
There is one row with name as null.
But I can't query on the null value:
cqlsh> select * from test.other_table where name=null;
InvalidRequest: code=2200 [Invalid query] message="Unsupported null value for indexed column name"
I found the reference How Can I Search for Records That Have A Null/Empty Field Using CQL?.
When I set default='' in the model, why is it not applied to the existing null values in the table?
Is there any way to set the null values in name to the default value '' with a query?
The null cell is really just the column not being set. The absence of data isn't something you can query on, since it's a filtering operation: it's not scalable or possible to do efficiently, so it's not something Cassandra will encourage (or, in this case, even allow).
Going back and retroactively setting values on all the previously created rows would be very expensive (it has to read everything, then do as many writes). It's pretty easy on the application side to just treat a null name as '', though.
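If a one-off backfill is still wanted despite that cost, a minimal cqlengine sketch (assuming the OtherModel defined above and a table small enough to scan) could be:

from cassandra.cqlengine import connection

connection.setup(hosts=['localhost'], default_keyspace='test')
# Full-table scan plus one write per affected row -- exactly the cost described above.
for row in OtherModel.objects.all():
    if row.name is None:
        row.update(name='')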

Django: IntegrityError during Many To Many add()

We ran into a known issue in Django:
IntegrityError during Many To Many add()
There is a race condition if several processes/requests try to add the same row to a ManyToManyRelation.
How to work around this?
Environment:
Django 1.9
Linux Server
Postgres 9.3 (An update could be made, if necessary)
Details
How to reproduce it:
my_user.groups.add(foo_group)
The above fails if two requests try to execute this code at once. Here is the database table and the failing constraint:
myapp_egs_d=> \d auth_user_groups
 id       | integer | not null default ...
 user_id  | integer | not null
 group_id | integer | not null
Indexes:
    "auth_user_groups_pkey" PRIMARY KEY, btree (id)
fails ==> "auth_user_groups_user_id_group_id_key" UNIQUE CONSTRAINT,
          btree (user_id, group_id)
Environment
Since this only happens on production machines, and all production machines in my context run Postgres, a Postgres-only solution would work.
Can the error be reproduced?
Yes, let us use the famed Publication and Article models from Django docs. Then, let's create a few threads.
import threading
import random

def populate():
    for i in range(100):
        Article.objects.create(headline='headline{0}'.format(i))
        Publication.objects.create(title='title{0}'.format(i))
    print 'created objects'

class MyThread(threading.Thread):
    def run(self):
        for q in range(1, 100):
            for i in range(1, 5):
                pub = Publication.objects.all()[random.randint(1, 2)]
                for j in range(1, 5):
                    article = Article.objects.all()[random.randint(1, 15)]
                    pub.article_set.add(article)
        print self.name

Article.objects.all().delete()
Publication.objects.all().delete()
populate()
thrd1 = MyThread()
thrd2 = MyThread()
thrd3 = MyThread()
thrd1.start()
thrd2.start()
thrd3.start()
You are sure to see unique key constraint violations of the type reported in the bug report. If you don't see them, try increasing the number of threads or iterations.
Is there a work around?
Yes. Use through models and get_or_create. Here is the models.py adapted from the example in the django docs.
class Publication(models.Model):
    title = models.CharField(max_length=30)

    def __str__(self):              # __unicode__ on Python 2
        return self.title

    class Meta:
        ordering = ('title',)

class Article(models.Model):
    headline = models.CharField(max_length=100)
    publications = models.ManyToManyField(Publication, through='ArticlePublication')

    def __str__(self):              # __unicode__ on Python 2
        return self.headline

    class Meta:
        ordering = ('headline',)

class ArticlePublication(models.Model):
    article = models.ForeignKey('Article', on_delete=models.CASCADE)
    publication = models.ForeignKey('Publication', on_delete=models.CASCADE)

    class Meta:
        unique_together = ('article', 'publication')
Here is the new threading class which is a modification of the one above.
class MyThread2(threading.Thread):
    def run(self):
        for q in range(1, 100):
            for i in range(1, 5):
                pub = Publication.objects.all()[random.randint(1, 2)]
                for j in range(1, 5):
                    article = Article.objects.all()[random.randint(1, 15)]
                    ap, c = ArticlePublication.objects.get_or_create(article=article, publication=pub)
        print 'Get or create', self.name
You will find that the exception no longer shows up. Feel free to increase the number of iterations. I only went up to 1000 with get_or_create and it didn't throw the exception, whereas add() usually threw one within 20 iterations.
Why does this work?
Because get_or_create is atomic.
This method is atomic assuming correct usage, correct database
configuration, and correct behavior of the underlying database.
However, if uniqueness is not enforced at the database level for the
kwargs used in a get_or_create call (see unique or unique_together),
this method is prone to a race-condition which can result in multiple
rows with the same parameters being inserted simultaneously.
Update:
Thanks #louis for pointing out that the through model can in fact be eliminated. Thus the get_or_create in MyThread2 can be changed to:
ap, c = article.publications.through.objects.get_or_create(
    article=article, publication=pub)
If you are ready to solve it in PostgreSQL you may do the following in psql:
-- Create a RULE to intercept all INSERT attempts to the table and check whether the row already exists:
CREATE RULE auth_user_group_ins AS
    ON INSERT TO auth_user_groups
    WHERE (EXISTS (SELECT 1
                   FROM auth_user_groups
                   WHERE user_id = NEW.user_id
                     AND group_id = NEW.group_id))
    DO INSTEAD NOTHING;
Then it will silently ignore duplicates and only perform new inserts into the table:
db=# TRUNCATE auth_user_groups;
TRUNCATE TABLE
db=# INSERT INTO auth_user_groups (user_id, group_id) VALUES (1,1);
INSERT 0 1 -- added
db=# INSERT INTO auth_user_groups (user_id, group_id) VALUES (1,1);
INSERT 0 0 -- no insert no error
db=# INSERT INTO auth_user_groups (user_id, group_id) VALUES (1,2);
INSERT 0 1 -- added
db=# SELECT * FROM auth_user_groups; -- check
id | user_id | group_id
----+---------+----------
14 | 1 | 1
16 | 1 | 2
(2 rows)
db=#
From what I'm seeing in the code provided, I believe you have a uniqueness constraint on the (user_id, group_id) pair in groups. That's why running the same query twice fails: you are trying to add two rows with the same user_id and group_id; the first one to execute will pass, but the second will raise an exception.
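If you'd rather keep the plain add() call, a minimal application-side sketch (using standard Django transaction handling; my_user and foo_group are the objects from the question) is to treat the duplicate as expected and swallow the constraint violation; the atomic block ensures the failed INSERT is rolled back cleanly on PostgreSQL:

from django.db import IntegrityError, transaction

try:
    with transaction.atomic():
        my_user.groups.add(foo_group)
except IntegrityError:
    # Another request won the race and already inserted this (user_id, group_id) row.
    pass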
