PostgreSQL JOIN on JSON Object column - python

I need to join 3 different tables on PostgreSQL:
lote_item (where I have some book IDs),
lote_item_log (which has a column "attributes" holding a JSON object such as {"aluno_id": "2823", "aluno_email": "someemail@outlook.com", "aluno_unidade": 174, "livro_codigo": "XOZK-0NOYP0Z1EMJ"}; note that some aluno_unidade values are null),
and finally
company (which has the school name for every aluno_unidade.
Ex: aluno_unidade = 174 ==> nome_fantasia = mySchoolName).
Joining the first two tables was easy, since lote_item_log has a foreign key which I could match like this:
SELECT * FROM lote_item JOIN lote_item_log ON lote_item.id = lote_item_log.lote_item_id
Now, I need to get the School Name, contained on table company, with the aluno_unidade ID from table lote_item_log.
My current query is:
SELECT
    *
FROM
    lote_item
JOIN
    lote_item_log
ON
    lote_item.id = lote_item_log.lote_item_id
JOIN
    company
ON
    (
        SELECT
            JSON_EXTRACT_PATH_TEXT(attributes, 'aluno_unidade')::int
        FROM
            lote_item_log
        WHERE
            operation_id = 6
    ) = company.senior_id
WHERE
    item_id = {book_id};
operation_id determines which school is active.
ERROR I'M GETTING:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.CardinalityViolation) more than one row returned by a subquery used as an expression
I tried LIMIT 1, but then I got just an empty array.
What I need is:
lote_item.created_at | lote_item.updated_at | lote_item.item_id | uuid | aluno_email | c014_id | nome_fantasia | cnpj | is_franchise | is_active
somedate | somedate | some_item_id | XJW4 | someemail#a | some_id | SCHOOL NAME | cnpj | t | t

I got it.
Not sure it's the best way, but it worked:
SELECT
    *
FROM
    lote_item
JOIN
    lote_item_log
ON
    lote_item.id = lote_item_log.lote_item_id
JOIN
    company
ON
    JSON_EXTRACT_PATH_TEXT(attributes, 'aluno_unidade')::int = company.senior_id
WHERE
    item_id = {book_id};
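For completeness, the same extract-and-cast join pattern can be sketched end to end with the stdlib sqlite3 module (SQLite's json_extract stands in for PostgreSQL's JSON_EXTRACT_PATH_TEXT; the table contents are minimal made-up fixtures, not the real schema):

```python
import json
import sqlite3

# Minimal fixtures mirroring the three tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lote_item (id INTEGER PRIMARY KEY, item_id TEXT);
    CREATE TABLE lote_item_log (id INTEGER PRIMARY KEY, lote_item_id INTEGER, attributes TEXT);
    CREATE TABLE company (senior_id INTEGER, nome_fantasia TEXT);
""")
conn.execute("INSERT INTO lote_item VALUES (1, 'book-1')")
conn.execute(
    "INSERT INTO lote_item_log VALUES (1, 1, ?)",
    (json.dumps({"aluno_unidade": 174, "livro_codigo": "XOZK-0NOYP0Z1EMJ"}),),
)
conn.execute("INSERT INTO company VALUES (174, 'mySchoolName')")

# Extract the JSON attribute, cast it to int, and join on it directly.
rows = conn.execute(
    """
    SELECT lote_item.item_id, company.nome_fantasia
    FROM lote_item
    JOIN lote_item_log ON lote_item.id = lote_item_log.lote_item_id
    JOIN company
      ON CAST(json_extract(lote_item_log.attributes, '$.aluno_unidade') AS INTEGER)
         = company.senior_id
    WHERE lote_item.item_id = ?
    """,
    ("book-1",),
).fetchall()
print(rows)  # [('book-1', 'mySchoolName')]
```

Binding item_id as a query parameter (rather than interpolating {book_id} into the string) also sidesteps SQL injection; with psycopg2 the placeholder would be %s instead of ?.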

Related

Specify FLOAT column precision in Peewee with MariaDB/MySQL

I am trying to specify the float precision for a column definition in Peewee and cannot find how to do this in the official docs or in the GitHub issues.
My example model is below:
DB = peewee.MySQLDatabase(
    "example",
    host="localhost",
    port=3306,
    user="root",
    password="whatever"
)

class TestModel(peewee.Model):
    class Meta:
        database = DB

    value = peewee.FloatField()
The above creates the following table spec in the database:
SHOW COLUMNS FROM testmodel;
/*
+-------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+----------------+
| value | float | NO | | NULL | |
+-------+---------+------+-----+---------+----------------+
*/
What I would like is to specify the M and D parameters that the FLOAT field accepts so that the column is created with the precision parameters I need. I can accomplish this in SQL after the table is created using the below:
ALTER TABLE testmodel MODIFY COLUMN value FLOAT(20, 6); -- 20 and 6 are example parameters
Which gives this table spec:
SHOW COLUMNS FROM testmodel;
/*
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| value | float(20,6) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
*/
But I'd like it to be done at table creation time within the peewee structure itself, rather than needing to run a separate ALTER TABLE query after the peewee.Database.create_tables() method is run. If there is no way to do this with peewee.FloatField itself, then I'd also accept any other solution, so long as it ensures the create_tables() call creates the columns with the specified precision.
As @booshong already mentions, the simplest solution is to subclass the default FloatField like this:
class CustomFloatField(FloatField):
    def __init__(self, *args, **kwargs):
        self.max_digits = kwargs.pop("max_digits", 7)
        self.decimal_places = kwargs.pop("decimal_places", 4)
        super().__init__(*args, **kwargs)

    def get_modifiers(self):
        return [self.max_digits, self.decimal_places]
and then use it like this:
my_float_field = CustomFloatField(max_digits=2, decimal_places=2)

How to order nested SQL SELECT on SqlAlchemy

I have written an SQL query that should get the X entries preceding a given user id in a table ordered by registration_date descending.
To be more concrete, let's say these are some entries in the ordered table:
id | Name | Email | registration_date
3939 | Barbara Hayes | barbara.hayes@example.com | 2019-09-15T23:39:26.910Z
689 | Noémie Harris | noemie.harris@example.com | 2019-09-14T21:39:15.641Z
2529 | Andrea Iglesias | andrea.iglesias@example.com | 2019-09-13T02:59:08.821Z
3890 | Villads Andersen | villads.andersen@example.com | 2019-09-12T06:29:48.708Z
3685 | Houssine Van Sabben | houssine.vansabben@example.com | 2019-09-12T02:27:08.396Z
I would like to get the users above id 3890, so the query should return:
689 | Noémie Harris | noemie.harris@example.com | 2019-09-14T21:39:15.641Z
2529 | Andrea Iglesias | andrea.iglesias@example.com | 2019-09-13T02:59:08.821Z
The raw SQL that I wrote is this:
SELECT * FROM (
    SELECT id, name, email, registration_date FROM public.users
    WHERE users.registration_date > (SELECT registration_date FROM users WHERE id = 3890)
    ORDER BY registration_date
    LIMIT 2
) AS a
ORDER BY registration_date DESC
See this dbfiddle.
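The raw SQL can be checked with a small stdlib sqlite3 reproduction (the public. schema prefix is dropped since SQLite has no schemas; the data is the sample from the question, and ISO-8601 strings compare correctly as text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, registration_date TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (3939, "Barbara Hayes", "barbara.hayes@example.com", "2019-09-15T23:39:26.910Z"),
    (689, "Noémie Harris", "noemie.harris@example.com", "2019-09-14T21:39:15.641Z"),
    (2529, "Andrea Iglesias", "andrea.iglesias@example.com", "2019-09-13T02:59:08.821Z"),
    (3890, "Villads Andersen", "villads.andersen@example.com", "2019-09-12T06:29:48.708Z"),
    (3685, "Houssine Van Sabben", "houssine.vansabben@example.com", "2019-09-12T02:27:08.396Z"),
])

# Inner query: the 2 oldest registrations newer than user 3890's,
# then the outer query flips them to descending order.
rows = conn.execute("""
    SELECT * FROM (
        SELECT id, name, email, registration_date FROM users
        WHERE users.registration_date > (SELECT registration_date FROM users WHERE id = ?)
        ORDER BY registration_date
        LIMIT 2
    ) AS a
    ORDER BY registration_date DESC
""", (3890,)).fetchall()
print([r[0] for r in rows])  # [689, 2529]
```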
I tried to implement the SQLAlchemy code with no luck. I believe that I am making a mistake in the subquery. This is what I have done so far:
registration_date_min = db.query(User.registration_date) \
    .order_by(User.registration_date) \
    .filter(User.id == ending_before).first()

users_list = db.query(User) \
    .filter(User.registration_date > registration_date_min) \
    .order_by('registration_date').limit(limit).subquery('users_list')

return users_list.order_by(desc('registration_date'))
P.S. ending_before represents a user id, like 3890 in the example.
Any ideas on the SqlAlchemy part would be very helpful!
First of all, your registration_date_min query has already been executed; you have a row with one column there. Remove the first() call; it executes the SELECT and returns the first row.
As you are selecting by the primary key, there is only ever going to be a single row and you don't need to order it. Just use:
registration_date_min = db.query(User.registration_date).filter(
    User.id == ending_before
)
That's now a query object and can be used directly in a comparison:
users_list = (
    db.query(User)
    .filter(User.registration_date > registration_date_min)
    .order_by(User.registration_date)
    .limit(limit)
)
You can then self-select with Query.from_self() from that query to apply the final ordering:
return users_list.from_self().order_by(User.registration_date.desc())
This produces the following SQL (on SQLite, other dialects can differ):
SELECT anon_1.users_id AS anon_1_users_id, anon_1.users_name AS anon_1_users_name, anon_1.users_email AS anon_1_users_email, anon_1.users_registration_date AS anon_1_users_registration_date
FROM (SELECT users.id AS users_id, users.name AS users_name, users.email AS users_email, users.registration_date AS users_registration_date
FROM users
WHERE users.registration_date > (SELECT users.registration_date AS users_registration_date
FROM users
WHERE users.id = ?) ORDER BY users.registration_date
LIMIT ? OFFSET ?) AS anon_1 ORDER BY anon_1.users_registration_date DESC
If I use the following model with __repr__:
class User(db.Model):
    __tablename__ = "users"

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
    email = db.Column(db.String)
    registration_date = db.Column(db.DateTime)

    def __repr__(self):
        return f"<User({self.id}, {self.name!r}, {self.email!r}, {self.registration_date!r}>"
and print the query result instances I get:
<User(689, 'Noémie Harris', 'noemie.harris@example.com', datetime.datetime(2019, 9, 14, 21, 39, 15, 641000)>
<User(2529, 'Andrea Iglesias', 'andrea.iglesias@example.com', datetime.datetime(2019, 9, 13, 2, 59, 8, 821000)>
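Note that Query.from_self() is deprecated as of SQLAlchemy 1.4. A sketch of the same query in 2.0-style select() form (model fields trimmed to what the example needs; an in-memory SQLite database stands in for the real one):

```python
from datetime import datetime
from sqlalchemy import Column, DateTime, Integer, String, create_engine, select
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    email = Column(String)
    registration_date = Column(DateTime)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([
        User(id=3890, name="Villads Andersen", registration_date=datetime(2019, 9, 12, 6, 29, 48)),
        User(id=2529, name="Andrea Iglesias", registration_date=datetime(2019, 9, 13, 2, 59, 8)),
        User(id=689, name="Noémie Harris", registration_date=datetime(2019, 9, 14, 21, 39, 15)),
        User(id=3939, name="Barbara Hayes", registration_date=datetime(2019, 9, 15, 23, 39, 26)),
    ])
    db.commit()

    ending_before, limit = 3890, 2

    # Scalar subquery: the reference user's registration date.
    registration_date_min = (
        select(User.registration_date).where(User.id == ending_before).scalar_subquery()
    )
    # Inner select: the `limit` oldest rows newer than that date.
    inner = (
        select(User)
        .where(User.registration_date > registration_date_min)
        .order_by(User.registration_date)
        .limit(limit)
        .subquery()
    )
    # Self-select from the subquery to apply the final descending order.
    ua = aliased(User, inner)
    users = db.execute(select(ua).order_by(ua.registration_date.desc())).scalars().all()
    print([u.id for u in users])  # [689, 2529]
```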

Django pivot table without id and primary key

On the database i have 3 tables:
languages
cities
city_language
city_language Table:
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| city_id | bigint(20) unsigned | NO | PRI | NULL | |
| language_id | bigint(20) unsigned | NO | PRI | NULL | |
| name | varchar(255) | NO | | NULL | |
+-------------+---------------------+------+-----+---------+-------+
Model
class CityLanguage(models.Model):
    city = models.ForeignKey('Cities', models.DO_NOTHING)
    language = models.ForeignKey('Languages', models.DO_NOTHING)
    name = models.CharField(max_length=255)

    class Meta:
        managed = False
        db_table = 'city_language'
        unique_together = (('city', 'language'),)
The model has no id field or primary key, and my table has no id column either. If I run this code I get the error:
(1054, "Unknown column 'city_language.id' in 'field list'")
If I define a primary key on a column, that column's values must be unique. So if I use primary_key, when I try to insert the same city with a different language I get:
This city (name or language, depending on which column is the primary key) already exists.
I don't want to create an id column for a pivot table; there is no reason to. Can you tell me the correct way to use a pivot table? Thank you.
Django does not work without a primary key. There are two ways to deal with it:
Create an id column (then in the Django model you don't need to add a primary key).
Create another unique column and set it as the primary key.
On my side I chose the second way: I created a column named unique_key and put this in the model:
unique_key = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
You need to import uuid.
Good luck.

Django retrieve rows for the distinct column values

I want to query Model rows in Django,
class Language(models.Model):
    language_id = models.CharField(max_length=100, default="")
    code = models.CharField(max_length=100, default="")
    name = models.CharField(max_length=500, default="")
In this table language_id is not unique; for example, below is some sample data:
+-------------+------+---------+
| language_id | code | name |
+-------------+------+---------+
| 12345 | en | english |
| 12345 | te | telugu |
| 54321 | en | english |
| 54321 | te | telugu |
+-------------+------+---------+
I want to fetch the rows (all columns) with distinct language_ids.
What I am currently doing:
language_list = Language.objects.all()
list = []
idlist = []
for language in language_list:
    if language.language_id not in idlist:
        il = language
        list.append(il)
        idlist.append(language.language_id)
Then list holds all the distinct rows (model objects).
Is there a better way to do this? I don't want to iterate over all the Language objects.
It's unclear what you are trying to do.
What your script does is take the first occurrence of a given ID arbitrarily.
If that's what you want, it will depend on which database backs your model.
PostgreSQL allows the use of distinct on a field:
https://docs.djangoproject.com/en/2.1/ref/models/querysets/#distinct
On MySQL what you could do is get all the unique instances of your id and get an instance of your model matching once per ID:
language_ids = Language.objects.values_list('language_id', flat=True).distinct()
result = []
for language_id in language_ids:
    result.append(Language.objects.filter(language_id=language_id).first())
It's not necessarily much better than your solution simply because arbitrary picking isn't an expected use case for the ORM.
If on the other hand you meant to get only the language_ids that appear once and only once (with Count imported from django.db.models):
Language.objects.values('language_id').annotate(cnt=Count('id')).filter(cnt=1)
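In SQL terms, the two variants roughly correspond to the queries below; a stdlib sqlite3 sketch with the sample data from the question (the exact SQL Django emits will differ by backend):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE language (language_id TEXT, code TEXT, name TEXT)")
conn.executemany("INSERT INTO language VALUES (?, ?, ?)", [
    ("12345", "en", "english"),
    ("12345", "te", "telugu"),
    ("54321", "en", "english"),
    ("54321", "te", "telugu"),
])

# One (here: first-inserted) row per language_id -- what the Python loop
# in the question computes.
first_per_id = conn.execute("""
    SELECT language_id, code, name FROM language
    WHERE rowid IN (SELECT MIN(rowid) FROM language GROUP BY language_id)
""").fetchall()
print(first_per_id)  # [('12345', 'en', 'english'), ('54321', 'en', 'english')]

# language_ids that appear exactly once -- what the Count annotation filters.
unique_only = conn.execute("""
    SELECT language_id FROM language GROUP BY language_id HAVING COUNT(*) = 1
""").fetchall()
print(unique_only)  # [] (every id in the sample appears twice)
```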

how to group by a column and pick one object ordered by created time

I have a model like below,
class MusicData(BaseModel):
    name = models.CharField(max_length=100)
    url = models.URLField()
    description = models.TextField()
    age = models.CharField(max_length=25)
    language = models.ForeignKey(Language, on_delete=models.CASCADE,
                                 related_name="music_data",
                                 related_query_name="music_data")
    count = models.IntegerField()
    last_updated = models.CharField(max_length=255)
    playlist = models.ForeignKey(PlayList, on_delete=models.CASCADE,
                                 related_name="music_data",
                                 related_query_name="music_data")
I want to group MusicData by name and, in each group, get the row with the latest created_on (created_on is a DateTimeField on BaseModel).
Suppose I have the following data:
| Name | Created On |
| ----------- | ----------- |
| ABC | 2019-02-22 1:06:45 AM |
| ABC | 2019-02-22 1:07:45 AM |
| BAC | 2019-02-22 1:08:45 AM |
| BAC | 2019-02-22 1:09:45 AM |
| BAC | 2019-02-22 1:10:45 AM |
| BBC | 2019-02-22 1:11:45 AM |
The expected output is:
| Name | Created On |
| ----------- | ----------- |
| ABC | 2019-02-22 1:07:45 AM |
| BAC | 2019-02-22 1:10:45 AM |
| BBC | 2019-02-22 1:11:45 AM |
I have written this query, which works fine for the above case:
models.MusicData.objects.filter(playlist__isnull=True).values(
    "name").annotate(maxdate=Max("created_on"))
But the problem is that along with name and created_on I also need the other values, like url, age, count, playlist__name, etc., so I followed this guide: https://docs.djangoproject.com/en/2.1/topics/db/aggregation/#combining-multiple-aggregations
and came up with this query:
models.MusicData.objects.filter(playlist__isnull=True).values(
    "name").annotate(maxdate=Max("created_on")).values(
        "age",
        "name",
        "description",
        "url",
        "count",
        "last_updated",
        "playlist",
        language=F("language__name"),
    )
But in this case I got duplicate objects, so I inspected the generated SQL and figured out the following:
In the first case only GROUP BY name is present, along with the joins, which is fine.
But in the second case GROUP BY contains every column I specified in values; I understand that any column we SELECT must also appear in the GROUP BY clause.
I even tried to generate a list of ids and then filter on it, but that has the same problem; it aggregates over the whole queryset:
result = models.MusicData.objects.filter(playlist__isnull=True).values(
    "name").annotate(maxdate=Max("created_on")).values_list("id", flat=True)
# Then filter on this list of id's
Can anyone help me?
Note: I am using a PostgreSQL database.
Finally figured it out; I could do this in PostgreSQL:
MusicData.objects.order_by('name', '-created_on').distinct('name').values(
    "age",
    "name",
    "description",
    "url",
    "since_last_pushed",
    "last_updated",
    language=F("language__name"),
)
which results in the following query:
SELECT DISTINCT ON (name)
    id,
    name,
    url,
    description,
    ... etc
FROM
    musicdata
ORDER BY name ASC,
    created_on DESC;
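DISTINCT ON is PostgreSQL-only; for reference, the classic portable greatest-n-per-group shape gives the same rows. A stdlib sqlite3 sketch using the sample data from the question (timestamps stored as lexicographically sortable strings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE musicdata (name TEXT, created_on TEXT)")
conn.executemany("INSERT INTO musicdata VALUES (?, ?)", [
    ("ABC", "2019-02-22 01:06:45"),
    ("ABC", "2019-02-22 01:07:45"),
    ("BAC", "2019-02-22 01:08:45"),
    ("BAC", "2019-02-22 01:09:45"),
    ("BAC", "2019-02-22 01:10:45"),
    ("BBC", "2019-02-22 01:11:45"),
])

# For each name, keep only the row whose created_on equals that name's maximum.
rows = conn.execute("""
    SELECT m.name, m.created_on
    FROM musicdata m
    WHERE m.created_on = (SELECT MAX(created_on) FROM musicdata WHERE name = m.name)
    ORDER BY m.name
""").fetchall()
print(rows)
# [('ABC', '2019-02-22 01:07:45'), ('BAC', '2019-02-22 01:10:45'), ('BBC', '2019-02-22 01:11:45')]
```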
I wondered if this was a tough question; it's been more than 2 days without any response here, which surprised me. I expected an answer within hours. Did I mistag the topics?
