I want to make a Python (SQLAlchemy) program that takes the ID from a table according to a value.
For example, I have a table like this :
-------------------------------
| ID | Name |
-------------------------------
| 1 | Paul |
-------------------------------
| 2 | Paul |
-------------------------------
| 3 | John |
-------------------------------
And I want to retrieve the IDs where the name is Paul.
The code I was doing was something like this :
rows = session.query(Table).filter_by(Name='Paul')
list_id = []
for row in rows:
    list_id.append(row.ID)
for id_ in list_id:
    print(id_)
Is there any much easier solution?
Thanks!
You don't need the intermediate list to hold the IDs. If you only use them once, just iterate directly over the query:
for row in session.query(Table).filter_by(Name='Paul'):
    print(row.ID)
If you only need the ID, you can arrange for your query to return only that column. Each result is then a one-element row, so unpack it in the loop:
for (ID,) in session.query(Table.ID).filter_by(Name='Paul'):
    print(ID)
Say I have the following CSV file
Purchases.csv
+--------+----------+
| Client | Item |
+--------+----------+
| Mark | Computer |
| Mark | Lamp |
| John | Computer |
+--------+----------+
What is the best practice, in Python, to split this table into two separate tables and join them in a bridge table using foreign keys, i.e.
Client table
+----------+--------+
| ClientID | Client |
+----------+--------+
| 1 | Mark |
| 2 | John |
+----------+--------+
Item table
+--------+----------+
| ItemID | Item |
+--------+----------+
| 1 | Computer |
| 2 | Lamp |
+--------+----------+
Item Client Bridge Table
+----------+--------+
| ClientID | ItemID |
+----------+--------+
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
+----------+--------+
I should mention that records may already exist in the tables, i.e., if the client name in the CSV already has an assigned ID in the Client table, that ID should be used in the bridge table. This is because I have to do a one-time batch upload of a million lines of data, and then insert a few thousand lines of data daily.
I have also already created the tables. They are in the database, just empty at the moment.
You would do this in the database (or via database commands in Python). The data never needs to be loaded into Python.
Load the purchases.csv table into a staging table in the database. Then be sure you have your tables defined:
create table clients (
    clientId int generated always as identity primary key,
    client varchar(255)
);
create table items (
    itemId int generated always as identity primary key,
    item varchar(255)
);
create table clientItems (
    clientItemId int generated always as identity primary key,
    clientId int references clients(clientId),
    itemId int references items(itemId)
);
Note that the exact syntax for these depends on the database. Then load the tables:
insert into clients (client)
select distinct s.client
from staging s
where not exists (select 1 from clients c where c.client = s.client);
insert into items (item)
select distinct s.item
from staging s
where not exists (select 1 from items i where i.item = s.item);
I'm not sure if you need to take duplicates into account for ClientItems:
insert into clientItems (clientId, itemId)
    select c.clientId, i.itemId
    from staging s join
         clients c
         on s.client = c.client join
         items i
         on s.item = i.item;
If you need to prevent duplicates here, then:
where not exists (select 1
                  from clientItems ci join
                       clients c
                       on c.clientId = ci.clientId join
                       items i
                       on i.itemId = ci.itemId
                  where c.client = s.client and i.item = s.item
                 );
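Since the question asks about Python, here is a minimal sketch of the same staging-table flow driven from Python. It assumes SQLite through the standard sqlite3 module purely as a stand-in database, so `integer primary key` replaces `generated always as identity`. `load_purchases` and `CSV_TEXT` are hypothetical names for this illustration. A real database would normally fill the staging table with its own bulk loader instead of reading the CSV in Python; the CSV is read here only to keep the example self-contained.

```python
import csv
import io
import sqlite3

# Stand-in for Purchases.csv; in practice this would be an open file.
CSV_TEXT = "Client,Item\nMark,Computer\nMark,Lamp\nJohn,Computer\n"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table staging (client text, item text);
    create table clients (clientId integer primary key, client text);
    create table items (itemId integer primary key, item text);
    create table clientItems (
        clientItemId integer primary key,
        clientId integer references clients(clientId),
        itemId integer references items(itemId)
    );
""")

def load_purchases(conn, csv_file):
    """Load one CSV batch through the staging table into the three tables."""
    rows = [(r["Client"], r["Item"]) for r in csv.DictReader(csv_file)]
    with conn:  # one transaction per batch
        conn.execute("delete from staging")
        conn.executemany("insert into staging (client, item) values (?, ?)", rows)
        # Only names that do not have an id yet are inserted.
        conn.execute("""
            insert into clients (client)
            select distinct s.client from staging s
            where not exists (select 1 from clients c where c.client = s.client)
        """)
        conn.execute("""
            insert into items (item)
            select distinct s.item from staging s
            where not exists (select 1 from items i where i.item = s.item)
        """)
        # Bridge rows reuse existing ids; pairs already present are skipped.
        conn.execute("""
            insert into clientItems (clientId, itemId)
            select distinct c.clientId, i.itemId
            from staging s
            join clients c on s.client = c.client
            join items i on s.item = i.item
            where not exists (select 1 from clientItems ci
                              where ci.clientId = c.clientId
                                and ci.itemId = i.itemId)
        """)

load_purchases(conn, io.StringIO(CSV_TEXT))
load_purchases(conn, io.StringIO(CSV_TEXT))  # a repeated batch adds nothing new

bridge = conn.execute("""
    select c.client, i.item
    from clientItems ci
    join clients c on c.clientId = ci.clientId
    join items i on i.itemId = ci.itemId
    order by c.client, i.item
""").fetchall()
```

The same function would serve both the one-time bulk upload and the daily batches, because every insert checks for existing rows first.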
I have two tables with relatively different data.
The photos table holds all the relevant metadata for photos, such as user_id, photo_id, datetime, name, etc.
I have another table, ratings, that holds liked/disliked data for each respective photo. Its columns are rater_id (the person rating the picture), photo_id, and rating (like/dislike).
The user would be presented a picture (at random) and then pick whether they liked it or not. Every time the image is loaded/presented it would have to be something that they have not yet rated.
What I'm trying to do is return a photo_id where the user has not yet rated it.
I've thought of using join or union, but I'm having difficulty understanding how to best use those (or any other solution) for this application. Where my confusion lies is how I can compare the ratings table against the photos table, to only return the photos that have not been rated by rater_id.
Sample data
photos table
id | photo_id
-------------------------
1 | photo_123
2 | photo_456
3 | photo_432
4 | photo_642
-------------------------
ratings table
id | photo_id | rater_id | rating
---------------------------------
1 | photo_123 | user2 | 1
2 | photo_456 | user2 | 1
3 | photo_123 | user1 | 1
4 | photo_642 | user2 | 1
--------------------------------
Sample Result: return photo_432 for user2 because it has not yet had a rating in ratings table
The canonical way would be not exists:
select p.*
from photos p
where not exists (select 1
                  from ratings r
                  where r.photo_id = p.photo_id and
                        r.rater_id = @rater  -- the current rater's id
                 )
order by rand()
limit 1;
There are more efficient ways to get a random row back if the table is big.
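As a rough illustration, the same not exists query can be run against the sample data from the question. This is a sketch that assumes SQLite through Python's sqlite3 module, where the random function is spelled `random()` rather than `rand()`; `random_unrated_photo` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table photos (id integer primary key, photo_id text);
    create table ratings (id integer primary key, photo_id text,
                          rater_id text, rating integer);
    insert into photos (photo_id) values
        ('photo_123'), ('photo_456'), ('photo_432'), ('photo_642');
    insert into ratings (photo_id, rater_id, rating) values
        ('photo_123', 'user2', 1), ('photo_456', 'user2', 1),
        ('photo_123', 'user1', 1), ('photo_642', 'user2', 1);
""")

def random_unrated_photo(conn, rater_id):
    """Return one photo_id the rater has not rated yet, or None if none is left."""
    row = conn.execute("""
        select p.photo_id
        from photos p
        where not exists (select 1
                          from ratings r
                          where r.photo_id = p.photo_id
                            and r.rater_id = ?)
        order by random()
        limit 1
    """, (rater_id,)).fetchone()
    return row[0] if row else None

unrated_for_user2 = random_unrated_photo(conn, "user2")
```

With the sample data, user2 has rated every photo except photo_432, so that is the only possible result for user2.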
How do I reset the increment count in flask-sqlalchemy after deleting a row, so that the next insert gets the id of the deleted row?
i.e.:
table users:
user_id | name |
________________
3 | mbuvi
4 | meshack
5 | You
When I delete the user with id=5, the next insertion into users gets id=6, but I want it to have id=5:
user_id | name |
________________
3 | mbuvi
4 | meshack
6 | me
How do I solve this?
Your database keeps track of the auto-increment id itself, so you can't do this directly. (By the way, this is not really a flask-sqlalchemy question.) If you really want to do this, you have to work out which ids are free to reuse and insert the new row with that id explicitly. For example:
+----+--------+
| id | number |
+----+--------+
| 1 | 819 |
| 2 | 829 |
| 4 | 829 |
| 5 | 829 |
+----+--------+
Here you have to find the missing id (3) and then insert with that id, which means scanning the whole table until you reach that position. I don't know why you need this, but there is still a solution:
step01, use a cache for the freed ids; here I recommend Redis.
step02, before deleting any row, save its id to the cache. A Redis list works, although a sorted set is the best option.
step03, before inserting a new row, check whether any id is available in Redis. If so, pop it and insert the row with that id.
step04, the code should look like this:
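import redis

# `cursor` is assumed to be an open DB-API cursor created elsewhere.

def insert_one(data):
    r = redis.Redis()
    # lpop returns None when no freed id is cached
    _id = r.lpop('ID_DB')
    if _id is not None:
        # reuse the freed id
        cursor.execute("insert into xxx(id, data) values (%s, %s)", (int(_id), data))
    else:
        # use the default auto-increment id
        cursor.execute("insert into xxx(data) values (%s)", (data,))

def delete(data, id):
    # query the target row you are about to delete;
    # deleting by `id` is the simplest case
    r = redis.Redis()
    # remember the freed id before deleting the row
    r.rpush('ID_DB', id)
    # do the rest of what you want ...
    # delete ....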
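If you would rather not run Redis, the table scan mentioned above (finding the missing id 3) can be done in a single query. This is a minimal sketch, assuming SQLite through Python's sqlite3 module and the placeholder table name xxx from the code above; `lowest_free_id` and `insert_reusing_gap` are hypothetical helper names. It is not safe with concurrent writers unless the lookup and the insert are protected by a lock or a suitable transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table xxx (id integer primary key, number integer)")
conn.executemany("insert into xxx (id, number) values (?, ?)",
                 [(1, 819), (2, 829), (4, 829), (5, 829)])

def lowest_free_id(conn):
    """Return the smallest positive id that is not used in xxx."""
    # Candidates are 1 and the successor of every existing id;
    # the smallest candidate that is not taken is the first gap.
    row = conn.execute("""
        select min(candidate) from (
            select 1 as candidate
            union
            select id + 1 from xxx
        )
        where candidate not in (select id from xxx)
    """).fetchone()
    return row[0]

def insert_reusing_gap(conn, number):
    """Insert a row with an explicit id taken from the first gap."""
    new_id = lowest_free_id(conn)
    conn.execute("insert into xxx (id, number) values (?, ?)", (new_id, number))
    return new_id

first = insert_reusing_gap(conn, 900)   # fills the gap at id 3
second = insert_reusing_gap(conn, 901)  # no gap left, so the next id after 5
```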
Hi I have a table with the following structure.
Table Name: DOCUMENTS
Sample Table Structure:
ID | UIN | COMPANY_ID | DOCUMENT_NAME | MODIFIED_ON |
---|----------|------------|---------------|---------------------|
1 | UIN_TX_1 | 1 | txn_summary | 2016-09-02 16:02:42 |
2 | UIN_TX_2 | 1 | txn_summary | 2016-09-02 16:16:56 |
3 | UIN_AD_3 | 2 | some other doc| 2016-09-02 17:15:43 |
I want to fetch the latest modified record UIN for the company whose id is 1 and document_name is "txn_summary".
This is the postgresql query that works:
select distinct on (company_id)
    uin
from documents
where company_id = 1
    and document_name = 'txn_summary'
order by company_id, "modified_on" DESC;
This query fetches me UIN_TX_2 which is correct.
I am using web2py DAL to get this value. After some research I have managed to do this:
fmax = db.documents.modified_on.max()
query = (db.documents.company_id==1) & (db.documents.document_name=='txn_summary')
rows = db(query).select(fmax)
Now "rows" contains only the maximum modified_on value. I want to fetch the whole record that has that maximum date. Please suggest a way. Help is much appreciated.
My requirement also extends to finding each such record for every company_id and every document_name.
Your approach will not return the complete row; it only returns the latest modified_on value.
To fetch the last modified record for the company whose id is 1 and whose document_name is "txn_summary", the query will be:
query = (db.documents.company_id==1) & (db.documents.document_name=='txn_summary')
row = db(query).select(db.documents.ALL, orderby=~db.documents.modified_on, limitby=(0, 1)).first()
orderby=~db.documents.modified_on returns the records in descending order of modified_on (the last modified record comes first), limitby=(0, 1) keeps only one record, and first() returns it. So the complete query returns the last modified record with company_id 1 and document_name "txn_summary".
There may be other or better ways to achieve this. Hope this helps!
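In SQL terms, the call above has the shape ORDER BY modified_on DESC LIMIT 1. For the extended requirement (the latest record for every company_id and document_name), one common SQL pattern is to join each row against the maximum modified_on of its group. The following is only a sketch of that pattern, using Python's sqlite3 module as a stand-in database and the sample rows from the question; it is not web2py DAL code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table documents (id integer primary key, uin text, company_id integer,
                            document_name text, modified_on text);
    insert into documents (uin, company_id, document_name, modified_on) values
        ('UIN_TX_1', 1, 'txn_summary',    '2016-09-02 16:02:42'),
        ('UIN_TX_2', 1, 'txn_summary',    '2016-09-02 16:16:56'),
        ('UIN_AD_3', 2, 'some other doc', '2016-09-02 17:15:43');
""")

# Single lookup: newest row for one company and one document name.
latest_one = conn.execute("""
    select uin from documents
    where company_id = ? and document_name = ?
    order by modified_on desc
    limit 1
""", (1, "txn_summary")).fetchone()

# Every group: keep the rows whose modified_on equals their group's maximum.
latest_per_group = conn.execute("""
    select d.company_id, d.document_name, d.uin
    from documents d
    join (select company_id, document_name, max(modified_on) as last_modified
          from documents
          group by company_id, document_name) m
      on m.company_id = d.company_id
     and m.document_name = d.document_name
     and m.last_modified = d.modified_on
    order by d.company_id, d.document_name
""").fetchall()
```

If two rows in a group share the same maximum modified_on, this pattern returns both of them.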
It might be a redundant question, but I have tried previous answers from other related topics and still can't figure it out.
I have a table Board_status that looks like this (multiple statuses and timestamps for each board):
time | board_id | status
-------------------------------
2012-4-5 | 1 | good
2013-6-6 | 1 | not good
2013-6-7 | 1 | alright
2012-6-8 | 2 | good
2012-6-4 | 3 | good
2012-6-10 | 2 | good
Now I want to select all records from Board_status table, group all of them by board_id for distinct board_id, then select the latest status on each board. Basically end up with table like this (only latest status and timestamp for each board):
time | board_id | status
------------------------------
2013-6-7 | 1 | alright
2012-6-4 | 3 | good
2012-6-10 | 2 | good
I have tried:
b = Board_status.objects.values('board_id').annotate(max=Max('time')).values_list('board_id','max','status')
but it doesn't seem to work. It still gives me more than one record per board_id.
Which command should I use in Django to do this?
An update, this is the solution I use. Not the best, but it works for now:
b = []
a = Board_status.objects.values('board_id').distinct()
for i in range(a.count()):
    b.append(Board_status.objects.filter(board_id=a[i]['board_id']).latest('time'))
So I get all the board_id values and store them in a. Then, for each board_id, I run another query to get the latest record. Any better answer is still welcome.
How would that work? You have neither a filter nor a distinct to remove the duplicates. I am not sure this can easily be done in a single Django query. You should read more on:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.distinct
https://docs.djangoproject.com/en/1.4/topics/db/aggregation/
If you can't do it in one raw SQL query, you can't do it with an object-relational mapper either, as it is built on top of the database (MySQL in your case). Can you tell me how you would do this via raw SQL?
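On the raw SQL side, one way to express "latest status per board" is to keep only the rows for which no later row exists for the same board. Below is a minimal sketch of that idea, assuming SQLite through Python's sqlite3 module rather than MySQL, and assuming the timestamps are stored as zero-padded ISO dates so that they compare correctly as text (the sample dates from the question are rewritten in that form).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table board_status (time text, board_id integer, status text);
    insert into board_status (time, board_id, status) values
        ('2012-04-05', 1, 'good'),
        ('2013-06-06', 1, 'not good'),
        ('2013-06-07', 1, 'alright'),
        ('2012-06-08', 2, 'good'),
        ('2012-06-04', 3, 'good'),
        ('2012-06-10', 2, 'good');
""")

# A row is the latest for its board when no row with a later time exists.
latest = conn.execute("""
    select b.time, b.board_id, b.status
    from board_status b
    where not exists (select 1
                      from board_status newer
                      where newer.board_id = b.board_id
                        and newer.time > b.time)
    order by b.board_id
""").fetchall()
```

This returns one row per board as long as no board has two rows with the same latest time.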