how can I transform this table:
ID ITEM_CODE
--------------------
1 1AB
1 22S
1 1AB
2 67R
2 225
3 YYF
3 1AB
3 UUS
3 F67
3 F67
3 225
......
..to a list of lists, each list being a distinct ID containing its allocated item_codes?
in the form: [[1AB,22S,1AB],[67R,225],[YYF,1AB,UUS,F67,F67,225]]
Using this query:
SELECT ID, ITEM_CODE
FROM table1
ORDER BY ID;
and calling cursor.fetchall() in Python does not return it as a list of lists, nor grouped by ID
Thank you
You will probably have less post-processing to do in Python using this query:
SELECT GROUP_CONCAT(ITEM_CODE)
FROM table1
GROUP BY ID
ORDER BY ID;
That will directly produce this result:
1AB,22S,1AB
67R,225
YYF,1AB,UUS,F67,F67,225
After that, cursor.fetchall() will return more or less what you expected, I think.
EDIT:
result = [row[0].split(',') for row in cursor.fetchall()]
(each fetched row is a 1-tuple holding the concatenated string, so you split its first element)
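For completeness, a minimal end-to-end sketch, assuming SQLite (MySQL's GROUP_CONCAT behaves the same; the database file name is a placeholder):

import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file
cur = conn.cursor()

# Each fetched row is a 1-tuple holding the comma-joined item codes for one ID.
cur.execute("SELECT GROUP_CONCAT(ITEM_CODE) FROM table1 GROUP BY ID ORDER BY ID")
result = [row[0].split(',') for row in cur.fetchall()]
print(result)  # e.g. [['1AB', '22S', '1AB'], ['67R', '225'], ['YYF', '1AB', 'UUS', 'F67', 'F67', '225']]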
I have 2 tables in PostgreSQL:-
"student" table
student_id  name   score
1           Adam   10
2           Brian  9
"student_log" table:-
log_id  student_id  score
1       1           10
2       2           9
I have a Python script which fetches a DataFrame with columns "name" and "score" and then populates the student table with it.
I want to update the student and student_log tables whenever the "score" changes for a student. Also, if there is a new student name in the DataFrame, I want to add a row for it in the student table as well as maintain its record in the "student_log" table. Can anyone suggest how this can be done?
Let us consider that the newly fetched DataFrame looks like this:-
name  score
Adam  7
Lee   5
Then the expected result is:-
"student" table
student_id  name   score
1           Adam   7
2           Brian  9
3           Lee    5
"student_log" table:-
log_id  student_id  score
1       1           10
2       2           9
3       1           7
4       3           5
I finally found a good answer, using a trigger, a function and a CTE.
I created a function to log changes, along with a trigger to fire it on updates. Following is the code.
CREATE OR REPLACE FUNCTION log_last_changes()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
DECLARE
    serial_num integer;
BEGIN
    IF NEW.name <> OLD.name OR NEW.score <> OLD.score
    THEN
        SELECT SETVAL('log_id_seq', (SELECT MAX(id) FROM log)) INTO serial_num;
        INSERT INTO log(student_id, score)
        VALUES(NEW.id, NEW.score)
        ON CONFLICT DO NOTHING;
    END IF;
    RETURN NEW;
END;
$$;
CREATE TRIGGER log_student
AFTER UPDATE
ON student
FOR EACH ROW
EXECUTE PROCEDURE log_last_changes();
The CTE expression is as follows:
WITH new_values(id, name, score) AS (
    VALUES
        (1, 'Adam', 7),
        (2, 'Brian', 9),
        (3, 'Lee', 5)
),
upsert AS (
    UPDATE student s
    SET name = nv.name,
        score = nv.score
    FROM new_values nv, student s2
    WHERE s.id = nv.id AND s.id = s2.id
    RETURNING s.*
)
INSERT INTO student
SELECT id, name, score
FROM new_values
WHERE NOT EXISTS (
    SELECT 1 FROM upsert up WHERE up.id = new_values.id
);
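If you drive the CTE from Python, here is a minimal sketch using psycopg2's execute_values (the connection string and the row tuples are assumptions, and the redundant self-join on student from the CTE above is dropped):

import psycopg2
from psycopg2.extras import execute_values

UPSERT_SQL = """
WITH new_values(id, name, score) AS (
    VALUES %s
),
upsert AS (
    UPDATE student s
    SET name = nv.name,
        score = nv.score
    FROM new_values nv
    WHERE s.id = nv.id
    RETURNING s.*
)
INSERT INTO student
SELECT id, name, score
FROM new_values
WHERE NOT EXISTS (
    SELECT 1 FROM upsert up WHERE up.id = new_values.id
);
"""

rows = [(1, 'Adam', 7), (2, 'Brian', 9), (3, 'Lee', 5)]

conn = psycopg2.connect("dbname=school")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # page_size must cover all rows, otherwise execute_values splits the
    # statement into batches and the CTE would run more than once.
    execute_values(cur, UPSERT_SQL, rows, page_size=len(rows))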
I guess you are trying to diff two DataFrames.
Here is an example:
import pandas as pd

# old student dataframe
old_pd: pd.DataFrame
# new student dataframe
new_pd: pd.DataFrame

# merge on the student name; the suffixes tell the two score columns apart
# (DataFrame.join with on= matches against the other frame's index, so merge is what we want here)
joined_pd = new_pd.merge(old_pd, on='name', suffixes=('_new', '_old'))
diff_pd = joined_pd[joined_pd['score_new'] != joined_pd['score_old']]
# then insert all diff_pd rows into the student_log table, and update the student table
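To finish the job, a hedged write-back sketch (assuming a psycopg2 connection named conn and the table layout from the question; brand-new students that exist only in new_pd would need a separate insert):

# Assumes diff_pd comes from the merge above (columns: name, score_new, score_old).
cur = conn.cursor()
for row in diff_pd.itertuples(index=False):
    # update the current score...
    cur.execute(
        "UPDATE student SET score = %s WHERE name = %s",
        (row.score_new, row.name),
    )
    # ...and append to the log, resolving the student_id by name
    cur.execute(
        "INSERT INTO student_log (student_id, score) "
        "SELECT student_id, %s FROM student WHERE name = %s",
        (row.score_new, row.name),
    )
conn.commit()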
I have a question about SQL.
I have the following sample table called Records:
record_id  subject          start_timestamp      end_timestamp        interval
------------------------------------------------------------------------------
2          Start Product 2  2021-04-21T16:22:39  2021-04-21T16:23:40  0.97
3          error 1          2021-04-21T16:25:44  2021-04-21T16:25:54  10.0
4          End Product 2    2021-04-21T16:30:13  2021-04-21T16:30:14  0.97
5          Start Product 1  2021-04-21T16:35:13  2021-04-21T16:35:13  0.6
6          End Product 1    2021-04-21T16:36:13  2021-04-21T16:36:13  0.45
First I select all the items that have 'start' in their subject and are not in the table BackupTO (for now the table BackupTO is not important):
SELECT Records.record_id, Records.start_timestamp, Records.interval FROM Records
LEFT JOIN BackupTO ON BackupTO.record_id = Records.record_id
WHERE BackupTO.record_id IS NULL AND Records.log_type = 1 AND Records.subject LIKE '%start%'
When I run this I get:
record_id  start_timestamp      interval
----------------------------------------
2          2021-04-21T16:22:39  0.97
5          2021-04-21T16:35:13  0.6
Okay, all good. Now comes my question: I fetch this in Python and loop through the data; first I calculate the product number based on the interval with:
product = round(result[2] / 0.5)
So an interval of 0.97 is product 2, and an interval of 0.6 or 0.45 is product 1, all great!
So I know record_id 2 is product 2, and I want to execute a SQL query that returns all items starting from record_id 2 until it finds an item that has %end%2 in its subject (the 2 is for product 2; it could also be product 1).
For example, when it finds Start Product 2, I get a list with record_id 3 and 4.
I want to get all items from the start until the end.
So it gets me a list like this; these are all the items found under Start Product 2 until %end%2 was found. For product 1 it would return just record_id 6, because there is nothing between the start and the stop.
record_id  start_timestamp      interval
----------------------------------------
3          2021-04-21T16:25:44  10.0
4          2021-04-21T16:30:13  0.97
I tried OFFSET and FETCH, but I couldn't get it to work. Could somebody help me out here?
Use your query as a CTE and join it to the table Records.
Then, with the MIN() window function, find the record_id up to which you want the rows returned:
WITH cte AS (
SELECT r.*
FROM Records r LEFT JOIN BackupTO b
ON b.record_id = r.record_id
WHERE b.record_id IS NULL AND r.log_type = 1 AND r.subject LIKE '%start%'
)
SELECT *
FROM (
SELECT r.*,
MIN(CASE WHEN r.subject LIKE '%end%' THEN r.record_id END) OVER () id_end
FROM Records r INNER JOIN cte c
ON r.record_id > c.record_id
WHERE c.record_id = ?
)
WHERE COALESCE(record_id <= id_end, 1)
Change ? to 2 or 5 for each case.
If you have the record_ids returned by your query, it is simpler:
SELECT *
FROM (
SELECT r.*,
MIN(CASE WHEN r.subject LIKE '%end%' THEN r.record_id END) OVER () id_end
FROM Records r
WHERE r.record_id > ?
)
WHERE COALESCE(record_id <= id_end, 1)
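From Python you can pass the starting record_id as the parameter; a minimal sketch assuming the sqlite3 module (the database path is a placeholder, and window functions need SQLite 3.25+):

import sqlite3

SQL = """
SELECT *
FROM (
    SELECT r.*,
           MIN(CASE WHEN r.subject LIKE '%end%' THEN r.record_id END) OVER () id_end
    FROM Records r
    WHERE r.record_id > ?
)
WHERE COALESCE(record_id <= id_end, 1)
"""

conn = sqlite3.connect("records.db")  # hypothetical path
cur = conn.cursor()
for start_id in (2, 5):
    # one product block per starting record_id
    rows = cur.execute(SQL, (start_id,)).fetchall()
    print(start_id, rows)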
I have two tables below:
Stock:
------------
Items | QTY
------------
sugar | 14
mango | 10
apple | 50
berry | 1

Second_table:
------------
Items | QTY
------------
sugar | 10
mango | 5
apple | 48
berry | 1
I use the following query in Python to check the difference between the QTY of table one and table two.
cur = conn.cursor()
cur.execute("select s.Items, s.qty - t.qty as quantity from Stock s join Second_table t on s.Items = t.Items;")
remaining_quantity = cur.fetchall()
I'm a bit stuck on how to go about what I need to accomplish. I need to check the difference between the quantities of table one and table two; if the difference is under 5, then for those Items I want to store the value 1 in another table's column, otherwise 0. How can I go about this?
Edit:
I have attempted this by looping through the rows; if the column value is less than 5 then I insert into the new table, as below:
for row in remaining_quantity:
    print(row[1])
    if row[1] < 5:
        cur.execute('INSERT OR IGNORE INTO check_quantity_tb VALUES (select distinct s.Items, s.qty, s.qty - t.qty as quantity, 1 from Stock s join Second_table t on s.Items = t.Items'), row)
        print(row)
But I get a SQL syntax error and I'm not sure where the error could be :/
First modify your query so you retrieve all the relevant info and don't have to issue subqueries later:
readcursor = conn.cursor()
readcursor.execute(
"select s.Items, s.qty, s.qty - t.qty as remain "
"from Stock s join Second_table t on s.Items = t.Items;"
)
Then use it to update your third table:
writecursor = conn.cursor()
for items, qty, remain in readcursor:
print(remain)
if remain < 5:
writecursor.execute(
'INSERT OR IGNORE INTO check_quantity_tb VALUES (?, ?, ?, ?)',
(items, qty, remain, 1)
)
conn.commit()
Note the following points:
1/ We use two distinct cursors so we can iterate over the first one while writing with the second one. This avoids fetching all results into memory, which can be a real lifesaver on huge datasets.
2/ When iterating on the first cursor, we unpack the rows into their individual components. This is called "tuple unpacking" (but it actually works for most sequence types):
>>> row = ("1", "2", "3")
>>> a, b, c = row
>>> a
'1'
>>> b
'2'
>>> c
'3'
3/ We let the db-api module do the proper sanitisation and escaping of the values we want to insert. This avoids headaches with escaping / quoting etc. and protects your code from SQL injection attacks (not that you necessarily risk one here, but that's the correct way to write parameterized queries in Python).
NB: since you didn't post your full table definitions nor clear specs - not even the full error message and traceback - I only translated your code snippet into something more sensible (avoiding the costly and useless subquery, which might or might not be the cause of your error). I can't guarantee it will work out of the box, but at least it should put you back on track.
NB2: you mentioned you had to set the last column to either 1 or 0 depending on the remain value. If that's the case, you want your loop to be:
writecursor = conn.cursor()
for items, qty, remain in readcursor:
print(remain)
flag = 1 if remain < 5 else 0
writecursor.execute(
'INSERT OR IGNORE INTO check_quantity_tb VALUES (?, ?, ?, ?)',
(items, qty, remain, flag)
)
conn.commit()
If you instead only want to process rows where remain < 5, you can specify it directly in your first query with a where clause.
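For instance, a sketch of that variant (same assumed table names as above):

readcursor = conn.cursor()
readcursor.execute(
    "select s.Items, s.qty, s.qty - t.qty as remain "
    "from Stock s join Second_table t on s.Items = t.Items "
    "where s.qty - t.qty < 5;"
)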
I have a table with two million rows. Each row has a body column which stores JSON-format data. For example:
table_a:
id  user_id  body
1   1        {'tel': '13678031283', 'email': 'test@gmail.com', 'name': 'test'....}
2   2        {'tel': '1567827126', 'age': '16'....}
......
I have another table, named table_b:
table_b:
id  user_id  tel       email         name
1   1        13678019  test@qq.com   test1
2   2        15627378  test1@qq.com  test2
.....
table_a has 2 million rows; I want to import all of table_a's data into table_b, and each row of table_a should be processed.
I want to deal with it like this:
for row in table_a_rows:
result = process(row)
insert result to table_b
.....
But I think it is not a good idea. Is there a better way to do it?
You can select the data you need from table_a directly with JSON_EXTRACT. For example, getting the email would be something like this:
mysql> SELECT JSON_EXTRACT(body, '$.email') from table_a;
So you could replace directly into table_b all the data you have in table_a:
mysql> REPLACE INTO table_b (user_id, tel, email, name)
    -> SELECT user_id,
    ->        JSON_EXTRACT(body, '$.tel'),
    ->        JSON_EXTRACT(body, '$.email'),
    ->        JSON_EXTRACT(body, '$.name')
    -> FROM table_a;
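One caveat worth hedging: JSON_EXTRACT returns JSON values, so strings come back wrapped in double quotes; JSON_UNQUOTE strips them. Driving the whole migration from Python then reduces to a single set-based statement instead of a row-by-row loop (a sketch assuming a MySQL DB-API connection named conn):

# JSON_UNQUOTE strips the double quotes JSON_EXTRACT leaves around strings.
MIGRATE_SQL = """
REPLACE INTO table_b (user_id, tel, email, name)
SELECT user_id,
       JSON_UNQUOTE(JSON_EXTRACT(body, '$.tel')),
       JSON_UNQUOTE(JSON_EXTRACT(body, '$.email')),
       JSON_UNQUOTE(JSON_EXTRACT(body, '$.name'))
FROM table_a
"""

cur = conn.cursor()
cur.execute(MIGRATE_SQL)
conn.commit()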
I'm writing a utility to help analyze SQLite database consistency using Python. Manually I've found some inconsistencies, so I thought it would be helpful if I could do these checks in bulk to save time. I set out to try this in Python and I'm having trouble.
Let's say I connect, create a cursor, and run a query or two and end up with a result set that I want to iterate through. I want to be able to do something like this (list of tables each with an ID as the primary key):
# python/pseudocode of what I'm looking for
for table in tables:
    for pid in pids:
        query = 'SELECT %s FROM %s' % (pid, table)
        result = connection.execute(query)
        for r in result:
            print(r)
And that would yield a list of IDs from each table in the table list. I'm not sure if I'm even close.
The problem here is that some tables have a primary key called ID while others have TABLE_ID, etc. If they were all called ID, I could select the IDs from each, but they're not. This is why I was hoping to find a query that would let me select only the first column, or the key, for each table.
To get the columns of a table, execute PRAGMA table_info as a query.
The result's pk column shows which column(s) are part of the primary key:
> CREATE TABLE t(
> other1 INTEGER,
> pk1 INTEGER,
> other2 BLOB,
> pk2 TEXT,
> other3 FLUFFY BUNNIES,
> PRIMARY KEY (pk1, pk2)
> );
> PRAGMA table_info(t);
cid name type notnull dflt_value pk
--- ------ -------------- ------- ---------- --
0 other1 INTEGER 0 0
1 pk1 INTEGER 0 1
2 other2 BLOB 0 0
3 pk2 TEXT 0 2
4 other3 FLUFFY BUNNIES 0 0
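Putting it together in Python, a minimal sketch of the original loop (the database path is a placeholder; note that PRAGMA arguments cannot be bound as query parameters, hence the string formatting):

import sqlite3

conn = sqlite3.connect("analyze_me.db")  # hypothetical database file
cur = conn.cursor()

# every user table in the database
tables = [row[0] for row in
          cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

for table in tables:
    # table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # pk > 0 marks a primary-key column (the value is its position in the key)
    pk_cols = [row[1] for row in cur.execute("PRAGMA table_info(%s)" % table)
               if row[5] > 0]
    for pk in pk_cols:
        for (value,) in cur.execute("SELECT %s FROM %s" % (pk, table)):
            print(table, pk, value)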