How to get the latest timestamp from two columns in Django - Python

How can I get the latest time from two columns? I have two columns named start_time_a and start_time_b, both storing values like 2014-05-13 12:34:34, and I need to get the latest time across the two columns using Python/Django. I am new to Django queries; please help me solve this issue.
Example table:
+----+---------------------+---------------------+
| id | start_time_a        | start_time_b        |
+----+---------------------+---------------------+
|  1 | 2014-05-13 12:34:34 | 2014-05-13 12:41:34 |
|  2 | 2014-05-13 12:40:34 | 2014-05-13 12:40:40 |
|  3 | 2014-05-13 12:20:34 | 2014-05-13 12:46:34 |
+----+---------------------+---------------------+
and I want this output:
|  3 | 2014-05-13 12:20:34 | 2014-05-13 12:46:34 |
because row 3 has the latest start_time_b of all the timestamps.

Referring to the SQL you posted, you can place its GREATEST subquery into a Django extra() queryset modifier. Note that each select entry must be a scalar expression, so use the subquery that computes the greatest timestamp rather than the full row-selecting query:
qs = YourModel.objects.extra(select={
    'max_time': '(select greatest(max(start_time_a), max(start_time_b)) from t)'
})
# each YourModel object in the queryset will have an extra attribute, max_time
for obj in qs:
    print(obj.max_time)
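On Django 1.9+ you can avoid raw SQL entirely; a minimal sketch using the ORM's Greatest function, assuming the model is named YourModel with the two timestamp fields from the question:
from django.db.models.functions import Greatest

# Annotate each row with the later of its two timestamps, then take the
# row whose per-row maximum is the most recent overall.
latest = (
    YourModel.objects
    .annotate(max_time=Greatest('start_time_a', 'start_time_b'))
    .order_by('-max_time')
    .first()
)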

For getting the row holding the greatest value across two columns, I found this answer, and it is quite useful:
select * from t where (
    start_time_a in (select greatest(max(start_time_a), max(start_time_b)) from t) or
    start_time_b in (select greatest(max(start_time_a), max(start_time_b)) from t)
);
See the MySQL GREATEST() function.

MySQL solution:
If you want to identify the latest dates across all records, irrespective of other column values, you can use the MAX function on the date columns.
Example:
select max( start_time_a ) as mx_start_time_a
     , max( start_time_b ) as mx_start_time_b
  from table_name;
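A hedged Django equivalent of that aggregate, assuming the model is called YourModel as in the answer above:
from django.db.models import Max

# Returns a dict like {'mx_start_time_a': ..., 'mx_start_time_b': ...}
YourModel.objects.aggregate(
    mx_start_time_a=Max('start_time_a'),
    mx_start_time_b=Max('start_time_b'),
)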

Related

django mysql connector - is allowing >1 entry per specific field in django

I've written some code to parse a website and insert the results into a MySQL DB.
The problem is I am getting a lot of duplicates per FKToTech_id, like:
+----+------------------+-------------+
| id | ref              | FKToTech_id |
+----+------------------+-------------+
|  1 | website.com/path |           1 |
|  2 | website.com/path |           1 |
|  3 | website.com/path |           1 |
+----+------------------+-------------+
What I'm looking for is instead to have one row in this table, based on whether that ref has already been entered for that FKToTech_id, rather than multiple copies of the same row, like:
+----+------------------+-------------+
| id | ref              | FKToTech_id |
+----+------------------+-------------+
|  1 | website.com/path |           1 |
+----+------------------+-------------+
How can I modify my code below to simply pass (skip the insert) when a row with the same ref and FKToTech_id already exists?
for i in elms:
    allcves = {cursor.execute("INSERT INTO TechBooks (ref, FKToTech_id) VALUES (%s, %s) ", (i.attrs["href"], row[1])) for row in cves}
    mydb.commit()
Thanks
Make ref a unique column, then use INSERT IGNORE to skip the insert if it would cause a duplicate key error.
ALTER TABLE TechBooks ADD UNIQUE INDEX (ref);
for i in elms:
    cursor.executemany("INSERT IGNORE INTO TechBooks (ref, FKToTech_id) VALUES (%s, %s) ", [(i.attrs["href"], row[1]) for row in cves])
mydb.commit()
I'm not sure what your intent was by assigning the results of cursor.execute() to allcves. cursor.execute() doesn't return a value unless you use multi=True. I've replaced the useless set comprehension with use of cursor.executemany() to insert many rows at once.
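If you would rather not rely on INSERT IGNORE, which suppresses every error on a row rather than only duplicate-key conflicts, a hedged alternative is a no-op ON DUPLICATE KEY UPDATE; the table, columns, and variables follow the question:
sql = (
    "INSERT INTO TechBooks (ref, FKToTech_id) VALUES (%s, %s) "
    "ON DUPLICATE KEY UPDATE ref = ref"  # no-op update when the key exists
)
for i in elms:
    cursor.executemany(sql, [(i.attrs["href"], row[1]) for row in cves])
mydb.commit()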

Splitting a CSV table into SQL Tables with Foreign Keys

Say I have the following CSV file
Purchases.csv
+--------+----------+
| Client | Item |
+--------+----------+
| Mark | Computer |
| Mark | Lamp |
| John | Computer |
+--------+----------+
What is the best practice, in Python, to split this table into two separate tables and join them in a bridge table using foreign keys, i.e.
Client table
+----------+--------+
| ClientID | Client |
+----------+--------+
| 1 | Mark |
| 2 | John |
+----------+--------+
Item table
+--------+----------+
| ItemID | Item |
+--------+----------+
| 1 | Computer |
| 2 | Lamp |
+--------+----------+
Item Client Bridge Table
+----------+--------+
| ClientID | ItemID |
+----------+--------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
+----------+--------+
I should mention here that it is possible for records to already exist in the tables, i.e., if a Client name in the CSV already has an assigned ID in the Client table, that ID should be used in the Bridge table. This is because I have to do a one-time batch upload of a million lines of data, and then insert a few thousand lines of data daily.
I have also already created the tables; they are in the database, just empty at the moment.
You would do this in the database (or via database commands in Python). The data never needs to be loaded into Python.
Load the purchases.csv table into a staging table in the database. Then be sure you have your tables defined:
create table clients (
    clientId int generated always as identity primary key,
    client varchar(255)
);
create table items (
    itemId int generated always as identity primary key,
    item varchar(255)
);
create table clientItems (
    clientItemId int generated always as identity primary key,
    clientId int references clients(clientId),
    itemId int references items(itemId)
);
Note that the exact syntax for these depends on the database. Then load the tables:
insert into clients (client)
    select distinct s.client
    from staging s
    where not exists (select 1 from clients c where c.client = s.client);

insert into items (item)
    select distinct s.item
    from staging s
    where not exists (select 1 from items i where i.item = s.item);
I'm not sure if you need to take duplicates into account for ClientItems:
insert into ClientItems (clientId, itemId)
    select c.clientId, i.itemId
    from staging s
    join clients c on s.client = c.client
    join items i on s.item = i.item;
If you need to prevent duplicates here, then add:
where not exists (select 1
                  from clientItems ci
                  join clients c on c.clientId = ci.clientId
                  join items i on i.itemId = ci.itemId
                  where c.client = s.client and i.item = s.item
                 );
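A minimal sketch of the load step in Python, assuming PostgreSQL with psycopg2; the staging table and CSV layout follow the answer above, and the connection string is hypothetical:
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection details
with conn, conn.cursor() as cur:
    # Bulk-load the CSV into the staging table.
    with open("Purchases.csv") as f:
        cur.copy_expert(
            "COPY staging (client, item) FROM STDIN WITH CSV HEADER", f
        )
    # Then run the de-duplicating inserts shown above, for example:
    cur.execute("""
        insert into clients (client)
        select distinct s.client
        from staging s
        where not exists (select 1 from clients c where c.client = s.client)
    """)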

Web2py DAL find the record with the latest date

Hi, I have a table with the following structure.
Table Name: DOCUMENTS
Sample Table Structure:
+----+----------+------------+----------------+---------------------+
| ID | UIN      | COMPANY_ID | DOCUMENT_NAME  | MODIFIED_ON         |
+----+----------+------------+----------------+---------------------+
|  1 | UIN_TX_1 |          1 | txn_summary    | 2016-09-02 16:02:42 |
|  2 | UIN_TX_2 |          1 | txn_summary    | 2016-09-02 16:16:56 |
|  3 | UIN_AD_3 |          2 | some other doc | 2016-09-02 17:15:43 |
+----+----------+------------+----------------+---------------------+
I want to fetch the latest modified record UIN for the company whose id is 1 and document_name is "txn_summary".
This is the PostgreSQL query that works:
select distinct on (company_id)
    uin
from documents
where company_id = 1
  and document_name = 'txn_summary'
order by company_id, modified_on desc;
This query fetches me UIN_TX_2 which is correct.
I am using the web2py DAL to get this value. After some research I managed to do this:
fmax = db.documents.modified_on.max()
query = (db.documents.company_id == 1) & (db.documents.document_name == 'txn_summary')
rows = db(query).select(fmax)
Now "rows" contains only the maximum modified_on date value. I want to fetch the record that has that maximum date. Please suggest a way; help is much appreciated.
My requirement also extends to finding such a record for each company_id and each document_name.
Your approach will not return the complete row; it will only return the last modified_on value.
To fetch the last modified record for the company whose id is 1 and document_name "txn_summary", the query would be:
query = (db.documents.company_id == 1) & (db.documents.document_name == 'txn_summary')
row = db(query).select(db.documents.ALL, orderby=~db.documents.modified_on, limitby=(0, 1)).first()
orderby=~db.documents.modified_on returns records in descending order of modified_on (the last modified record comes first), and first() selects the first record. So the complete query returns the last modified record having company_id 1 and document_name "txn_summary".
There may be other/better ways to achieve this. Hope this helps!
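For the extended requirement (the latest record per company_id and document_name), a naive but straightforward sketch is to loop over the distinct pairs and reuse the same ordered select; one query per pair, field names as in the question:
pairs = db().select(
    db.documents.company_id,
    db.documents.document_name,
    distinct=True,
)
for p in pairs:
    q = ((db.documents.company_id == p.company_id) &
         (db.documents.document_name == p.document_name))
    latest = db(q).select(
        db.documents.ALL,
        orderby=~db.documents.modified_on,
        limitby=(0, 1),
    ).first()
    print(p.company_id, p.document_name, latest.uin)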

Django - Get sum of field A where field B is equal to either field B or field C, OVER MULTIPLE ROWS?

+-------------------+-------------------+----------+
| mac_src | mac_dst | bytes_in |
+-------------------+-------------------+----------+
| aa:aa:aa:aa:aa:aa | bb:bb:bb:bb:bb:bb | 10 |
| bb:bb:bb:bb:bb:bb | aa:aa:aa:aa:aa:aa | 20 |
| cc:cc:cc:cc:cc:cc | aa:aa:aa:aa:aa:aa | 30 |
+-------------------+-------------------+----------+
I have a table with fields mac_src, mac_dst and bytes_in.
For each mac_src value that exists in the table, I need all rows where that value is present in EITHER mac_src OR mac_dst, and then the sum of the bytes_in fields of all those rows.
In other words, I want the sum of bytes_in per MAC address over every row mentioning that address in either column, sorted from highest to lowest sum.
The queryset returned should have just one entry per mac_src.
Thanks.
I don't think there's a simple way to do it with just the Django ORM. Just write an SQL query (warning: untested and probably slow SQL below):
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute('''
        SELECT mac, SUM(total) FROM (
            (SELECT mac_src AS mac, SUM(bytes_in) AS total
             FROM your_table GROUP BY mac_src)
            UNION ALL
            (SELECT mac_dst AS mac, SUM(bytes_in) AS total
             FROM your_table WHERE mac_src != mac_dst GROUP BY mac_dst)
        ) AS combined_rows GROUP BY mac
    ''')
    counts = dict(cursor.fetchall())  # {mac1: total_bytes1, ...}
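If you'd prefer to stay in the ORM, a hedged sketch that aggregates each direction separately and merges the results in Python; the model name Flow is hypothetical, the field names follow the question:
from collections import Counter
from django.db.models import F, Sum

totals = Counter()
# Sum bytes_in per source MAC.
for row in Flow.objects.values('mac_src').annotate(total=Sum('bytes_in')):
    totals[row['mac_src']] += row['total']
# Sum bytes_in per destination MAC, skipping self-traffic counted above.
for row in (Flow.objects.exclude(mac_dst=F('mac_src'))
            .values('mac_dst').annotate(total=Sum('bytes_in'))):
    totals[row['mac_dst']] += row['total']
ranked = totals.most_common()  # [(mac, total_bytes), ...] highest first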

How to implement the having clause in sqlite django ORM

I've written Django ORM syntax (on SQLite) to retrieve a particular set of records:
from django.db.models.aggregates import Count

JobStatus.objects.filter(
    status='PRF'
).values_list(
    'job', flat=True
).order_by(
    'job'
).aggregate(
    Count(status)__gt=3
).distinct()
But it gives me an error, while the SQL equivalent of this syntax works fine for me.
This is my SQL equivalent:
SELECT *
FROM tracker_jobstatus
WHERE status = 'PRF'
GROUP BY job_id
HAVING COUNT(status) > 3;
and I'm getting the result as follows
+----+--------+--------+---------+---------------------+---------+
| id | job_id | status | comment | date_and_time | user_id |
+----+--------+--------+---------+---------------------+---------+
| 13 | 3 | PRF | | 2012-11-12 13:16:00 | 1 |
| 31 | 4 | PRF | | 2012-11-12 13:48:00 | 1 |
+----+--------+--------+---------+---------------------+---------+
I'm unable to find the Django/SQLite equivalent for this.
I would be very grateful if anyone can help.
Finally I've managed to figure it out. The ORM syntax is something like this.
from django.db.models.aggregates import Count

JobStatus.objects.filter(
    status='PRF'
).values_list(
    'job', flat=True
).order_by(
    'job'
).annotate(
    count_status=Count('status')
).filter(
    count_status__gt=1
).distinct()
The more general rule here: you need to create a new column (via annotate) and then filter on that new column. Django translates this annotate-then-filter pattern into a HAVING clause.
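To confirm that the generated SQL really contains a HAVING clause, you can print the compiled query; a debugging sketch (output varies by backend):
from django.db.models.aggregates import Count

qs = (JobStatus.objects.filter(status='PRF')
      .values_list('job', flat=True)
      .annotate(count_status=Count('status'))
      .filter(count_status__gt=1))
print(qs.query)  # the rendered SQL should end with HAVING COUNT(status) > 1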
