Get the newest rows for each foreign key ID - python

I don't want to aggregate any columns. I just want the newest row for each foreign key in a table.
I've tried grouping.
Model.query.order_by(Model.created_at.desc()).group_by(Model.foreign_key_id).all()
# column "model.id" must appear in the GROUP BY clause
And I've tried distinct.
Model.query.order_by(Model.created_at.desc()).distinct(Model.foreign_key_id).all()
# SELECT DISTINCT ON expressions must match initial ORDER BY expressions

This is known as greatest-n-per-group, and for PostgreSQL you can use DISTINCT ON, as in your second example:
SELECT DISTINCT ON (foreign_key_id) * FROM model ORDER BY foreign_key_id, created_at DESC;
In your attempt, you were missing the DISTINCT ON column in your ORDER BY list, so all you had to do was:
Model.query.order_by(Model.foreign_key_id, Model.created_at.desc()).distinct(Model.foreign_key_id)

The solution is to left join an aliased model to itself (with a special join condition). Then filter out the rows that do not have an id.
model = Model
aliased = aliased(Model)
query = model.query.outerjoin(aliased, and_(
aliased.primary_id == model.primary_id,
aliased.created_at > model.created_at))
query = query.filter(aliased.id.is_(None))

Related

Aggregating joined tables in SQLAlchemy

I got this aggregate function working in Django ORM, it counts some values and percents from the big queryset and returns the resulting dictionary.
queryset = Game.objects.prefetch_related(
"timestamp",
"fighters",
"score",
"coefs",
"rounds",
"rounds_view",
"rounds_view_f",
"finishes",
"rounds_time",
"round_time",
"time_coef",
"totals",
).all()
values = queryset.aggregate(
first_win_cnt=Count("score", filter=Q(score__first_score=5)),
min_time_avg=Avg("round_time__min_time"),
# and so on
) # -> dict
I'm trying to achieve the same using SQLAlchemy and this is my tries so far:
q = (
db.query(
models.Game,
func.count(models.Score.first_score)
.filter(models.Score.first_score == 5)
.label("first_win_cnt"),
)
.join(models.Game.fighters)
.filter_by(**fighter_options)
.join(models.Game.timestamp)
.join(
models.Game.coefs,
models.Game.rounds,
models.Game.rounds_view,
models.Game.rounds_view_f,
models.Game.finishes,
models.Game.score,
models.Game.rounds_time,
models.Game.round_time,
models.Game.time_coef,
models.Game.totals,
)
.options(
contains_eager(models.Game.fighters),
contains_eager(models.Game.timestamp),
contains_eager(models.Game.coefs),
contains_eager(models.Game.rounds),
contains_eager(models.Game.rounds_view),
contains_eager(models.Game.rounds_view_f),
contains_eager(models.Game.finishes),
contains_eager(models.Game.score),
contains_eager(models.Game.rounds_time),
contains_eager(models.Game.round_time),
contains_eager(models.Game.time_coef),
contains_eager(models.Game.totals),
)
.all()
)
And it gives me an error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.GroupingError)
column "stats_fighters.id" must appear in the GROUP BY clause or be
used in an aggregate function LINE 1: SELECT stats_fighters.id AS
stats_fighters_id, stats_fighter...
I don't really understand why there should be stats_fighters.id in the group by, and why do I need to use group by. Please help.
This is the SQL which generates Django ORM:
SELECT
AVG("stats_roundtime"."min_time") AS "min_time_avg",
COUNT("stats_score"."id") FILTER (
WHERE "stats_score"."first_score" = 5) AS "first_win_cnt"
FROM "stats_game" LEFT OUTER JOIN "stats_roundtime" ON ("stats_game"."id" = "stats_roundtime"."game_id")
LEFT OUTER JOIN "stats_score" ON ("stats_game"."id" = "stats_score"."game_id")
Group by is used in connection with rows that have the same values and you want to calculate a summary. It is often used with sum, max, min or average.
Since SQLAlchemy generates the final SQL command you need to know your table structure and need to find out how to make SQLAlchemy to generate the right SQL command.
Doku says there is a group_by method in SQLAlchemy.
May be this code might help.
q = (
db.query(
models.Game,
func.count(models.Score.first_score)
.filter(models.Score.first_score == 5)
.label("first_win_cnt"),
)
.join(models.Game.fighters)
.filter_by(**fighter_options)
.join(models.Game.timestamp)
.group_by(models.Game.fighters)
.join(
models.Game.coefs,
models.Game.rounds,
models.Game.rounds_view,
models.Game.rounds_view_f,
models.Game.finishes,
models.Game.score,
models.Game.rounds_time,
models.Game.round_time,
models.Game.time_coef,
models.Game.totals,
)
func.count is an aggregation function. If any expression in your SELECT clause uses an aggregation, then all expressions in the SELECT must be constant, aggregation, or appear in the GROUP BY.
if you try SELECT a,max(b) the SQL parser would complain that a is not an aggregation or in group by. In your case, you may consider adding models.Game to GROUP BY.

Combine two SQL lite databases with Python

I have the following code in python to update db where the first column is "id" INTEGER PRIMARY KEY AUTOINCREMENT UNIQUE:
con = lite.connect('test_score.db')
with con:
cur = con.cursor()
cur.execute("INSERT INTO scores VALUES (NULL,?,?,?)", (first,last,score))
item = cur.fetchone()
on.commit()
cur.close()
con.close()
I get table "scores" with following data:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
Two different users (userA and userB) copy test_score.db and code to their computer and use it separately.
I get back two db test_score.db but now with different content:
user A test_score.db :
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
user A test_score.db :
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Chris,Prat,99
5,Tom,Hanks,09
6,Tom,Hanks,15
I was trying to use
insert into AuditRecords select * from toMerge.AuditRecords;
to combine two db into one but failed as the first column is a unique id. Two db have now the same ids but with different or the same data and merging is failing.
I would like to find unique rows in both db (all values different ignoring id) and merge results to one full db.
Result should be something like this:
1,Adam,Smith,68
2,John,Snow,76
3,Jim,Green,88
4,Jim,Green,91
5,Tom,Hanks,15
6,Chris,Prat,99
7,Tom,Hanks,09
I can extract each value one by one and compare but want to avoid it as I might have longer rows in the future with more columns.
Sorry if it is obvious and easy questions, I'm still learning. I tried to find the answer but failed, please point me to answer if it already exists somewhere else. Thank you very much for your help.
You need to define the approach to resolve duplicated rows. Will consider the max score? The min? The first one?
Considering the table AuditRecords has all the lines of both User A and B, you can use GROUP BY to deduplicate rows and use an aggregation function to resolve the score:
insert into
AuditRecords
select
id,
first_name,
last_name,
max(score) as score
from
toMerge.AuditRecords
group by
id,
first_name,
last_name;
For this requirement you should have defined a UNIQUE constraint for the combination of the columns first, last and score:
CREATE TABLE AuditRecords(
id INTEGER PRIMARY KEY AUTOINCREMENT,
first TEXT,
last TEXT,
score INTEGER,
UNIQUE(first, last, score)
);
Now you can use INSERT OR IGNORE to merge the tables:
INSERT OR IGNORE INTO AuditRecords(first, last, score)
SELECT first, last, score
FROM toMerge.AuditRecords;
Note that you must explicitly define the list of the columns that will receive the values and in this list the id is missing because its value will be autoincremented by each insertion.
Another way to do it without defining the UNIQUE constraint is to use EXCEPT:
INSERT INTO AuditRecords(first, last, score)
SELECT first, last, score FROM toMerge.AuditRecords
EXCEPT
SELECT first, last, score FROM AuditRecords

How to query database to obtain a list where each value is the sum of the number of records with a the same date value?

I am performing a query on an SQLite database using SQLAlchemy. I'm doing a four table join. One of the fields within one of the tables is a datetime field. I am able to obtain the list of tuples with all records that fit the criteria I'm looking for. What I want to obtain is a list that contains the sums of records with the same date. So if I had 1 record with a certain time, 3 records all with the same time but different from the first, I would obtain the list [1, 3].
Here is my current query:
rework_records = DB.session.query(ReworkRecord, NonConformanceRecord, Part, Product).join(
NonConformanceRecord, NonConformanceRecord.id == ReworkRecord.nonconformance_id
).join(
Part
).join(
Product
).filter(
Product.id == form.product_family.data).order_by(
desc(NonConformanceRecord.entry_date)
).all()
This returns the list of tuples with each entry having records from each of the four tables being joined.
You are only missing a GROUP BY statement. The expression COUNT(*) will provide the number of records with the given entry_date.
rework_records = DB.session.from_statement('''SELECT NonConformanceRecord.entry_date, COUNT(*)
FROM ReworkRecord
JOIN NonConformanceRecord ON NonConformanceRecord.id == ReworkRecord.nonconformance_id
JOIN Part
JOIN Product
WHERE Product.id == form.product_family.data
GROUP BY NonConformanceRecord.entry_date
ORDER BY NonConformanceRecord.entry_date DESC''')
I'm not too familiar with the way you wrote it, but I believe it would just be:
rework_records = DB.session.query(NonConformanceRecord.entry_date, COUNT(*)).join(
NonConformanceRecord, NonConformanceRecord.id == ReworkRecord.nonconformance_id
).join(
Part
).join(
Product
).filter(
Product.id == form.product_family.data).group_by(
NonConformanceRecord.entry_date).order_by(
desc(NonConformanceRecord.entry_date)
).all()
The top one should do the exact same thing in the case that this second part didn't work. However, I don't know where you get form.product_family.data from as you dont join a table called form

How to find duplicates in MySQL

Suppose I have many columns. If 2 columns match and are exactly the same, then they are duplicates.
ID | title | link | size | author
Suppose if link and size are similar for 2 rows or more, then those rows are duplicates.
How do I get those duplicates into a list and process them?
Will return all records that have dups:
SELECT theTable.*
FROM theTable
INNER JOIN (
SELECT link, size
FROM theTable
GROUP BY link, size
HAVING count(ID) > 1
) dups ON theTable.link = dups.link AND theTable.size = dups.size
I like the subquery b/c I can do things like select all but the first or last. (very easy to turn into a delete query then).
Example: select all duplicate records EXCEPT the one with the max ID:
SELECT theTable.*
FROM theTable
INNER JOIN (
SELECT link, size, max(ID) as maxID
FROM theTable
GROUP BY link, size
HAVING count(ID) > 1
) dups ON theTable.link = dups.link
AND theTable.size = dups.size
AND theTable.ID <> dups.maxID
Assuming that none of id, link or size can be NULL, and id field is the primary key. This gives you the id's of duplicate rows. Beware that same id can be in the results several times, if there are three or more rows with identical link and size values.
select a.id, b.id
from tbl a, tbl b
where a.id < b.id
and a.link = b.link
and a.size = b.size
After you remove the duplicates from the MySQL table, you can add a unique index
to the table so no more duplicates can be inserted:
create unique index theTable_index on theTable (link,size);
If you want to do it exclusively in SQL, some kind of self-join of the table (on equality of link and size) is required, and can be accompanied by different kinds of elaboration. Since you mention Python as well, I assume you want to do the processing in Python; in that case, simplest is to build an iterator on a 'SELECT * FROM thetable ORDER BY link, size, and process withitertools.groupbyusing, as key, theoperator.itemgetter` for those two fields; this will present natural groupings of each bunch of 1+ rows with identical values for the fields in question.
I can elaborate on either option if you clarify where you want to do your processing and ideally provide an example of the kind of processing you DO want to perform!

Querying a view in SQLAlchemy

I want to know if SQLAlchemy has problems querying a view. If I query the view with normal SQL on the server like:
SELECT * FROM ViewMyTable WHERE index1 = '608_56_56';
I get a whole bunch of records. But with SQLAlchemy I get only the first one. But in the count is the correct number. I have no idea why.
This is my SQLAlchemy code.
myQuery = Session.query(ViewMyTable)
erg = myQuery.filter(ViewMyTable.index1 == index1.strip())
# Contains the correct number of all entries I found with that query.
totalCount = erg.count()
# Contains only the first entry I found with my query.
ergListe = erg.all()
if you've mapped ViewMyTable, the query will only return rows that have a fully non-NULL primary key. This behavior is specific to versions 0.5 and lower - on 0.6, if any of the columns have a non-NULL in the primary key, the row is turned into an instance. Specify the flag allow_null_pks=True to your mappers to ensure that partial primary keys still count :
mapper(ViewMyTable, myview, allow_null_pks=True)
If OTOH the rows returned have all nulls for the primary key, then SQLAlchemy cannot create an entity since it can't place it into the identity map. You can instead get at the individual columns by querying for them specifically:
for id, index in session.query(ViewMyTable.id, ViewMyTable.index):
print id, index
I was facing similar problem - how to filter view with SQLAlchemy. For table:
t_v_full_proposals = Table(
'v_full_proposals', metadata,
Column('proposal_id', Integer),
Column('version', String),
Column('content', String),
Column('creator_id', String)
)
I'm filtering:
proposals = session.query(t_v_full_proposals).filter(t_v_full_proposals.c.creator_id != 'greatest_admin')
Hopefully it will help:)

Categories

Resources