I've got a model that looks like this,
class PL(models.Model):
    locid = models.AutoField(primary_key=True)
    mentionedby = models.ManyToManyField(PRT)

class PRT(models.Model):
    tid = ..
The resulting many-to-many table in MySQL is:
+------------------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| PL_id | int(11) | NO | MUL | NULL | |
| PRT_id | bigint(64) | NO | MUL | NULL | |
+------------------+------------+------+-----+---------+----------------+
Now, if pl is an object of PL and prt that of PRT, then doing
pl.mentionedby.add(prt)
gives me the error
Incorrect integer value: 'PRT object' for column 'prt_id' at row 1
whereas
pl.mentionedby.add(prt.tid)
works fine - with one caveat.
I can see all the elements in pl.mentionedby.all(), but I can't go to a mentioned PRT object and see its prt.mentionedby_set.all().
Does anyone know why this happens? What's the best way to fix it?
Thanks!
Adding prt directly should work on the first try. How are you retrieving pl and prt? Assuming you have some data in your database, try these commands from the Django shell and see if they work. There seems to be some information missing from the question. After running python manage.py shell:
from yourapp.models import PL, PRT
pl = PL.objects.get(pk=1)   # pk, since PL's primary key is locid rather than id
prt = PRT.objects.get(pk=1)
pl.mentionedby.add(prt)
Are these the complete models? I can only assume that something has been overridden somewhere that probably shouldn't have been.
Can you post the full code?
I have a process, controlled by Airflow, that generates a number of tasks performing concurrent inserts into a Postgres database.
Each task takes a pandas dataframe, inserts its rows into a temporary table, then upserts from the temporary table into the target table. This is leading to deadlocks, and I am having a tough time understanding how to mitigate the issue. I have pulled out the salient components here; please let me know if I have failed to include enough information.
I am on Python 3.8.2, Postgres 11.7, and Airflow 1.10.10, using psycopg2 as the database adapter.
# imports assumed: from io import StringIO; conn/cur are an open psycopg2 connection/cursor
# create temp table like target table
temp_table_sql = 'CREATE TEMP TABLE mur_global_raw_tmp_61400102 (LIKE mur_global_raw INCLUDING IDENTITY);'
cur.execute(temp_table_sql)
# serialize dataframe and copy to temp table
pd_df_serial = StringIO()
pd_df.to_csv(pd_df_serial, sep='\t', header=False, index=False)
pd_df_serial.seek(0)
cur.copy_from(pd_df_serial, temp_table_name, null="", columns=pd_df.columns.to_list())
conn.commit()
# upsert from temp table to target table
pd_df_insert_sql = '''INSERT INTO mur_global_raw(lat,lon,time,analysed_sst)
(SELECT lat,lon,time,analysed_sst FROM mur_global_raw_tmp_61400102
 AS tmp_vals ORDER BY lat,lon,time,analysed_sst)
ON CONFLICT DO NOTHING;'''
cur.execute(pd_df_insert_sql)
conn.commit()
Here is the schema of the temporary table.
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+--------------------------+-----------+----------+----------------------------------+---------+--------------+-------------
ind | bigint | | not null | generated by default as identity | plain | |
lat | double precision | | | | plain | |
lon | double precision | | | | plain | |
time | timestamp with time zone | | | | plain | |
analysed_sst | double precision | | | | plain | |
And here is the schema of the target table.
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+--------------------------+-----------+----------+----------------------------------+---------+--------------+-------------
ind | bigint | | not null | generated by default as identity | plain | |
lat | double precision | | | | plain | |
lon | double precision | | | | plain | |
time | timestamp with time zone | | | | plain | |
analysed_sst | double precision | | | | plain | |
Indexes:
"mur_global_raw_pkey" PRIMARY KEY, btree (ind)
And finally, here is a sample from the server log:
2020-06-22 23:03:36 UTC::#:[3570]:LOG: checkpoint starting: xlog
2020-06-22 23:03:42 UTC:xxxxx(38068):postgres#public_data_raw:[13975]:WARNING: there is no transaction in progress
2020-06-22 23:03:43 UTC:xxxxx(38090):postgres#public_data_raw:[13993]:ERROR: deadlock detected
2020-06-22 23:03:43 UTC:xxxxx(38090):postgres#public_data_raw:[13993]:DETAIL: Process 13993 waits for ShareLock on transaction 42977; blocked by process 14014.
Process 14014 waits for ShareLock on transaction 42981; blocked by process 14021.
Process 14021 waits for ShareLock on transaction 42980; blocked by process 13993.
Process 13993: INSERT INTO mur_global_raw(lat,lon,time,analysed_sst) (SELECT lat,lon,time,analysed_sst FROM mur_global_raw_tmp_75410038 as tmp_vals ORDER BY lat,lon,time,analysed_sst) ON CONFLICT DO NOTHING;
Process 14014: INSERT INTO mur_global_raw(lat,lon,time,analysed_sst) (SELECT lat,lon,time,analysed_sst FROM mur_global_raw_tmp_41473761 as tmp_vals ORDER BY lat,lon,time,analysed_sst) ON CONFLICT DO NOTHING;
Process 14021: INSERT INTO mur_global_raw(lat,lon,time,analysed_sst) (SELECT lat,lon,time,analysed_sst FROM mur_global_raw_tmp_28913605 as tmp_vals ORDER BY lat,lon,time,analysed_sst) ON CONFLICT DO NOTHING;
2020-06-22 23:03:43 UTC:xxxxx(38090):postgres#public_data_raw:[13993]:HINT: See server log for query details.
2020-06-22 23:03:43 UTC:xxxxx(38090):postgres#public_data_raw:[13993]:CONTEXT: while inserting index tuple (1969403,34) in relation "mur_global_raw"
2020-06-22 23:03:43 UTC:xxxxx(38090):postgres#public_data_raw:[13993]:STATEMENT: INSERT INTO mur_global_raw(lat,lon,time,analysed_sst) (SELECT lat,lon,time,analysed_sst FROM mur_global_raw_tmp_75410038 as tmp_vals ORDER BY lat,lon,time,analysed_sst) ON CONFLICT DO NOTHING;
These deadlocks are happening persistently and regularly, so hopefully there is a component of the design that I can address to avoid them. My understanding of the locks going on is clearly not good enough to address the problem at this stage.
If anyone can help me understand the locks and transactions that are leading to this three-way deadlock, I would most appreciate it. Of course, if you have an idea for how to avoid it, I welcome that as well.
My humble thanks to the SO community.
The best workaround I've got is to add an exclusive lock before starting the upsert, like so:
LOCK TABLE mur_global_raw IN EXCLUSIVE MODE;
Any comments welcome.
If you cannot figure out a better way, catch the deadlock errors and repeat the transaction. If the deadlocks happen a lot, that is annoying and will harm performance, but it is better than a table lock, because it won't prevent autovacuum from doing its important work.
Perhaps you can reduce the size or duration of the batches to make a deadlock less likely.
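Building on that suggestion, here is a minimal retry sketch in plain Python. The helper name, attempt count, and backoff values are my own; with psycopg2 you would additionally call conn.rollback() before retrying and catch psycopg2.errors.DeadlockDetected (available since psycopg2 2.8):

```python
import time

def run_with_retry(txn, retryable_exc, attempts=5, base_delay=0.1):
    """Run txn(); on a retryable error (e.g. a detected deadlock), back off and retry."""
    for attempt in range(attempts):
        try:
            return txn()
        except retryable_exc:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the error
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

# With psycopg2, wrap the whole transaction (execute + commit) in txn and pass
# psycopg2.errors.DeadlockDetected as retryable_exc.
```

Randomizing the backoff (jitter) further reduces the chance that the retrying transactions collide again in lockstep.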
I have a simple MySQL table with the following metadata and content:
+--------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| Name | varchar(100) | YES | | NULL | |
| School | varchar(45) | YES | | NULL | |
+--------+--------------+------+-----+---------+----------------+
+----+---------+--------+
| id | Name | School |
+----+---------+--------+
| 1 | Artem | AU |
| 2 | Simon | AAU |
| 3 | Steffen | AU |
+----+---------+--------+
from which I'm trying to fetch all data using a simple Python app. However, when I run the app, the output shows the whole result set repeated several times, whereas what I expect it to return is (1, 'Artem', 'AU'), (2, 'Simon', 'AAU'), (3, 'Steffen', 'AU') only.
The Python code is
import MySQLdb
from django.shortcuts import render
from django.http import HttpResponse

def getFromDB():
    data = []
    db = MySQLdb.connect(host="ip",
                         user="user",
                         passwd="pw",
                         db="db")
    cur = db.cursor()
    cur.execute("SELECT * FROM students")
    students = cur.fetchall()
    for student in students:
        data.append(students)
    return data

def index(request):
    return HttpResponse(getFromDB())
What am I missing?
See this part:
for student in students:
data.append(students)
Do you see the problem? students has three items, so this loop runs three times, but each time you append the whole students tuple to data. That means you end up with a nested list:
data = [students, students, students]
Expanded, that is:
data = [(student1, student2, student3), (student1, student2, student3), (student1, student2, student3)]
Your code is easy to fix: just remove a single character, the trailing s.
for student in students:
data.append(student)
In fact, you can do the whole conversion in one line:
data = list(cur.fetchall())
But, as the comment said, if you use Django you should learn how to use Django's built-in ORM.
from django.db import models

class Student(models.Model):
    Name = models.CharField(max_length=100)
    School = models.CharField(max_length=45)

Student.objects.all()
students (before the for loop) already contains the result you are looking for: .fetchall() returns a list of the rows found, which you can access as students[x].
N.B. I have tagged this with SQLAlchemy and Python because the whole point of the question was to develop a query to translate into SQLAlchemy. This is clear in the answer I have posted. It is also applicable to MySQL.
I have three interlinked tables I use to describe a book. (In the below table descriptions I have eliminated extraneous rows to the question at hand.)
MariaDB [icc]> describe edition;
+-----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
+-----------+------------+------+-----+---------+----------------+
7 rows in set (0.001 sec)
MariaDB [icc]> describe line;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| edition_id | int(11) | YES | MUL | NULL | |
| line | varchar(200) | YES | | NULL | |
+------------+--------------+------+-----+---------+----------------+
5 rows in set (0.001 sec)
MariaDB [icc]> describe line_attribute;
+------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------+------+-----+---------+-------+
| line_id | int(11) | NO | PRI | NULL | |
| num | int(11) | YES | | NULL | |
| precedence | int(11) | YES | MUL | NULL | |
| primary | tinyint(1) | NO | MUL | NULL | |
+------------+------------+------+-----+---------+-------+
5 rows in set (0.001 sec)
line_attribute.precedence is the hierarchical level of the given heading. So if War and Peace has Books > Chapters, all of the lines have an attribute that corresponds to the Book they're in (e.g., Book 1 has precedence=1 and num=1) and an attribute for the Chapter they're in (e.g., Chapter 2 has precedence=2 and num=2). This allows me to translate the hierarchical structure of books with volumes, books, parts, chapters, sections, or even acts and scenes. The primary column is a boolean, so that each and every line has one attribute that is primary. If it is a book heading, it is the Book attribute, if it is a chapter heading, it is the Chapter attribute. If it is a regular line in text, it is a line attribute, and the precedence is 0 since it is not a part of the hierarchical structure.
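To make the attribute scheme concrete, here is a plain-Python sketch of the rows such a hierarchy produces (the line ids, nums, and tuple layout are made up for illustration), including the Book/Chapter intersection the question builds toward:

```python
# (line_id, num, precedence, primary) rows for a tiny two-level book:
# line 1 = Book 1 heading, line 2 = Chapter 1 heading, line 3 = a body line
line_attribute = [
    (1, 1, 1, True),   # line 1's primary attribute: the Book 1 heading
    (2, 1, 1, False),  # line 2 is inside Book 1 ...
    (2, 1, 2, True),   # ... and is the Chapter 1 heading (its primary attribute)
    (3, 1, 1, False),  # line 3 is inside Book 1 ...
    (3, 1, 2, False),  # ... inside Chapter 1 ...
    (3, 0, 0, True),   # ... and is a regular line, so precedence 0 is primary
]

# lines in Book 1 AND in Chapter 1: intersect the two attribute filters
book1 = {lid for lid, num, prec, _ in line_attribute if prec == 1 and num == 1}
ch1 = {lid for lid, num, prec, _ in line_attribute if prec == 2 and num == 1}
print(sorted(book1 & ch1))  # → [2, 3]
```

The SQL below performs exactly this set intersection, one SELECT per attribute filter.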
I need to be able to query for all lines with a particular edition_id and that also have the intersection of two line_attributes.
(This would allow me to get all lines from a particular edition that are in, say, Book 1 Chapter 2 of War and Peace).
I can get all lines that have Book 1 with
SELECT
line.*
FROM
line
INNER JOIN
line_attribute
ON
line_attribute.line_id=line.id
WHERE
line.edition_id=2 AND line_attribute.precedence=1 AND line_attribute.num=1;
and I can get all lines that have Chapter 2:
SELECT
line.*
FROM
line
INNER JOIN
line_attribute
ON
line_attribute.line_id=line.id
WHERE
line.edition_id=2 AND line_attribute.precedence=2 AND line_attribute.num=2;
Except the second query returns each chapter 2 from every book in War and Peace.
How do I get from these two queries to just the lines from book 1 chapter 2?
Warning from Raymond Nijland in the comments:
Note for future readers: because this question is tagged MySQL, be aware that MySQL does not support the INTERSECT keyword. MariaDB is a fork of the MySQL source code, but it supports extra features that MySQL does not. In MySQL you can simulate the INTERSECT keyword with an INNER JOIN or IN().
Trying to write up a question on SO helps me get my thoughts clear and eventually solve the problem before I have to ask the question. The queries above are much clearer than my initial queries and the question pretty much answers itself, but I never found a clear answer that talks about the intersect utility, so I'm posting this answer anyway.
The solution was the INTERSECT operator.
The solution is simply the intersection of those two queries:
SELECT
line.*
FROM
line
INNER JOIN
line_attribute
ON
line_attribute.line_id=line.id
WHERE
line.edition_id=2 AND line_attribute.precedence=1 AND line_attribute.num=1
INTERSECT /* it is literally this simple */
SELECT
line.*
FROM
line
INNER JOIN
line_attribute
ON
line_attribute.line_id=line.id
WHERE
line.edition_id=2 AND line_attribute.precedence=2 AND line_attribute.num=2;
This also means I could get all of the book and chapter headings for a particular book by simply adding an additional constraint (line_attribute.primary=1).
This solution seems broadly applicable to me. Assuming, for instance, you have questions in a StackOverflow clone, which are tagged, you can get the intersection of questions with two tags (e.g., all posts that have both the SQLAlchemy and Python tags). I am certainly going to use this method for that sort of query.
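Since MySQL proper lacks INTERSECT (per the warning above), the same two-tag intersection can be simulated with a self-join, one table alias per required tag. A sketch run against SQLite purely for illustration; the question_tag table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE question_tag (question_id INTEGER, tag TEXT);
INSERT INTO question_tag VALUES
  (1, 'python'), (1, 'sqlalchemy'),
  (2, 'python'),
  (3, 'sqlalchemy'), (3, 'python');
""")

# Simulate INTERSECT with a self-join: alias a filters on one tag, alias b on the other.
rows = conn.execute("""
SELECT a.question_id
FROM question_tag AS a
JOIN question_tag AS b ON b.question_id = a.question_id
WHERE a.tag = 'python' AND b.tag = 'sqlalchemy'
ORDER BY a.question_id;
""").fetchall()
print(rows)  # → [(1,), (3,)]  -- only the questions carrying BOTH tags
```

The same join pattern works verbatim on MySQL, which is the point of the warning.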
I coded this up in MySQL because it helps me get the query straight to translate it into SQLAlchemy.
The SQLAlchemy query is this simple:
q1 = Line.query.join(LineAttribute).filter(LineAttribute.precedence==1, LineAttribute.num==1)
q2 = Line.query.join(LineAttribute).filter(LineAttribute.precedence==2, LineAttribute.num==2)
q1.intersect(q2).all()
Hopefully the database structure in this question helps someone down the road. I didn't want to delete the question after I solved my own problem.
It might be a redundant question, but I have tried previous answers from other related topics and still can't figure it out.
I have a table Board_status that looks like this (multiple statuses and timestamps for each board):
time | board_id | status
-------------------------------
2012-4-5 | 1 | good
2013-6-6 | 1 | not good
2013-6-7 | 1 | alright
2012-6-8 | 2 | good
2012-6-4 | 3 | good
2012-6-10 | 2 | good
Now I want to select all records from the Board_status table, group them by board_id, and pick the latest status for each board, basically ending up with a table like this (only the latest status and timestamp per board):
time | board_id | status
------------------------------
2013-6-7 | 1 | alright
2012-6-4 | 3 | good
2012-6-10 | 2 | good
I have tried:
b = Board_status.objects.values('board_id').annotate(max=Max('time')).values_list('board_id','max','status')
but it doesn't seem to be working: it still gives me more than one record per board_id.
Which command should I use in Django to do this?
An update: this is the solution I use. Not the best, but it works for now:
b = []
a = Board_status.objects.values('board_id').distinct()
for i in range(a.count()):
    b.append(Board_status.objects.filter(board_id=a[i]['board_id']).latest('time'))
So I get all board_ids into a, then for each board_id run another query to get the row with the latest time. Any better answer is still welcome.
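For reference, the "latest row per board" logic that the loop implements can be sketched in plain Python, using the sample rows from the question (dates zero-padded to ISO form so plain string comparison orders them correctly):

```python
# sample rows from the question: (time, board_id, status)
rows = [
    ("2012-04-05", 1, "good"),
    ("2013-06-06", 1, "not good"),
    ("2013-06-07", 1, "alright"),
    ("2012-06-08", 2, "good"),
    ("2012-06-04", 3, "good"),
    ("2012-06-10", 2, "good"),
]

# keep, per board_id, the row with the greatest time
latest = {}
for time, board_id, status in rows:
    if board_id not in latest or time > latest[board_id][0]:
        latest[board_id] = (time, board_id, status)

print(sorted(latest.values(), key=lambda r: r[1]))
```

On Django 1.11+ the same result is achievable in a single query with Subquery/OuterRef, avoiding the one-query-per-board pattern of the loop above.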
How would that work? You have neither a filter nor a distinct that would remove the duplicates. I am not sure this can easily be done in a single Django query. You should read more on:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
https://docs.djangoproject.com/en/1.4/topics/db/aggregation/
If you can't do it in one raw SQL query, you can't do it with an ORM either, as it's built on top of MySQL (in your case). Can you tell me how you would do this via raw SQL?
I've written the following Django ORM query (SQLite backend) to retrieve a particular set of records:
from django.db.models.aggregates import Count
JobStatus.objects.filter(
status='PRF'
).values_list(
'job', flat=True
).order_by(
'job'
).aggregate(
Count(status)__gt=3
).distinct()
But it gives me an error, whereas the SQL equivalent of this query works fine for me.
This is my SQL equivalent:
SELECT *
FROM tracker_jobstatus
WHERE status = 'PRF'
GROUP BY job_id
HAVING COUNT(status) > 3;
and I'm getting the result as follows
+----+--------+--------+---------+---------------------+---------+
| id | job_id | status | comment | date_and_time | user_id |
+----+--------+--------+---------+---------------------+---------+
| 13 | 3 | PRF | | 2012-11-12 13:16:00 | 1 |
| 31 | 4 | PRF | | 2012-11-12 13:48:00 | 1 |
+----+--------+--------+---------+---------------------+---------+
I'm unable to find the Django equivalent for this.
I would be very grateful if anyone could help.
Finally I've managed to figure it out. The ORM syntax is something like this.
from django.db.models.aggregates import Count
JobStatus.objects.filter(
status='PRF'
).values_list(
'job', flat=True
).order_by(
'job'
).annotate(
count_status=Count('status')
).filter(
count_status__gt=1
).distinct()
A more general rule: create a new column with annotate, then filter on that new column. This queryset is translated into SQL using a HAVING clause.
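To see the HAVING translation concretely, here is a self-contained sqlite3 sketch of the equivalent SQL; the table rows are made up (job 3 gets four 'PRF' rows, job 4 only two):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracker_jobstatus (id INTEGER PRIMARY KEY, job_id INTEGER, status TEXT);
INSERT INTO tracker_jobstatus (job_id, status) VALUES
  (3, 'PRF'), (3, 'PRF'), (3, 'PRF'), (3, 'PRF'),
  (4, 'PRF'), (4, 'PRF');
""")

# WHERE filters rows first, GROUP BY buckets them, HAVING filters the buckets.
rows = conn.execute("""
SELECT job_id, COUNT(status) AS n
FROM tracker_jobstatus
WHERE status = 'PRF'
GROUP BY job_id
HAVING COUNT(status) > 3;
""").fetchall()
print(rows)  # → [(3, 4)]  -- only job 3 exceeds three 'PRF' rows
```

The annotate/filter pair in the ORM query maps onto exactly this GROUP BY/HAVING pair.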