how can I transform this table:
ID ITEM_CODE
--------------------
1 1AB
1 22S
1 1AB
2 67R
2 225
3 YYF
3 1AB
3 UUS
3 F67
3 F67
3 225
......
..to a list of lists, each list being a distinct ID containing its allocated item_codes?
in the form: [[1AB,22S,1AB],[67R,225],[YYF,1AB,UUS,F67,F67,225]]
Using this query:
SELECT ID, ITEM_CODE
FROM table1
ORDER BY ID;
and calling cursor.fetchall() in Python does not return it as a list of lists, nor grouped by ID
Thank you
You will probably have less post-processing to do in Python using this query:
SELECT GROUP_CONCAT(ITEM_CODE)
FROM table1
GROUP BY ID
ORDER BY ID;
That will directly produce this result:
1AB,22S,1AB
67R,225
YYF,1AB,UUS,F67,F67,225
After that, cursor.fetchall() will return more or less what you expected, I think.
EDIT:
result = [row[0].split(',') for row in cursor.fetchall()]
(each fetched row is a 1-tuple holding the concatenated string, so you split its first element)
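For completeness, a minimal end-to-end sketch, assuming SQLite (MySQL's GROUP_CONCAT behaves the same; the database file name is a placeholder):

import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file
cur = conn.cursor()

# Each fetched row is a 1-tuple holding the comma-joined item codes for one ID.
cur.execute("SELECT GROUP_CONCAT(ITEM_CODE) FROM table1 GROUP BY ID ORDER BY ID")
result = [row[0].split(',') for row in cur.fetchall()]
print(result)  # e.g. [['1AB', '22S', '1AB'], ['67R', '225'], ['YYF', '1AB', 'UUS', 'F67', 'F67', '225']]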
I have 2 tables in PostgreSQL:-
"student" table
student_id  name   score
1           Adam   10
2           Brian  9
"student_log" table:-
log_id  student_id  score
1       1           10
2       2           9
I have a Python script which fetches a DataFrame with columns "name" and "score" and then populates the student table with it.
I want to update the student and student_log tables whenever the "score" changes for a student. Also, if there is a new student name in the DataFrame, I want to add a row for it in the student table as well as maintain its record in the "student_log" table. Can anyone suggest how this can be done?
Let us consider that the newly fetched DataFrame looks like this:-
name  score
Adam  7
Lee   5
Then the expected result is:-
"student" table
student_id  name   score
1           Adam   7
2           Brian  9
3           Lee    5
"student_log" table:-
log_id  student_id  score
1       1           10
2       2           9
3       1           7
4       3           5
I finally found a good answer, using a trigger, a function and a CTE.
I created a function to log changes, along with a trigger to fire it on updates. Following is the code.
CREATE OR REPLACE FUNCTION log_last_changes()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
DECLARE
    serial_num integer;
BEGIN
    IF NEW.name <> OLD.name OR NEW.score <> OLD.score
    THEN
        SELECT SETVAL('log_id_seq', (SELECT MAX(id) FROM log)) INTO serial_num;
        INSERT INTO log(student_id, score)
        VALUES(NEW.id, NEW.score)
        ON CONFLICT DO NOTHING;
    END IF;
    RETURN NEW;
END;
$$;
CREATE TRIGGER log_student
AFTER UPDATE
ON student
FOR EACH ROW
EXECUTE PROCEDURE log_last_changes();
The CTE expression is as follows:
WITH new_values(id, name, score) AS (
    VALUES
        (1, 'Adam', 7),
        (2, 'Brian', 9),
        (3, 'Lee', 5)
),
upsert AS (
    UPDATE student s
    SET name = nv.name,
        score = nv.score
    FROM new_values nv, student s2
    WHERE s.id = nv.id AND s.id = s2.id
    RETURNING s.*
)
INSERT INTO student
SELECT id, name, score
FROM new_values
WHERE NOT EXISTS (
    SELECT 1 FROM upsert up WHERE up.id = new_values.id
);
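If you drive the CTE from Python, here is a minimal sketch using psycopg2's execute_values (the connection string and the row tuples are assumptions, and the redundant self-join on student from the CTE above is dropped):

import psycopg2
from psycopg2.extras import execute_values

UPSERT_SQL = """
WITH new_values(id, name, score) AS (
    VALUES %s
),
upsert AS (
    UPDATE student s
    SET name = nv.name,
        score = nv.score
    FROM new_values nv
    WHERE s.id = nv.id
    RETURNING s.*
)
INSERT INTO student
SELECT id, name, score
FROM new_values
WHERE NOT EXISTS (
    SELECT 1 FROM upsert up WHERE up.id = new_values.id
);
"""

rows = [(1, 'Adam', 7), (2, 'Brian', 9), (3, 'Lee', 5)]

conn = psycopg2.connect("dbname=school")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # page_size must cover all rows, otherwise execute_values splits the
    # statement into batches and the CTE would run more than once.
    execute_values(cur, UPSERT_SQL, rows, page_size=len(rows))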
I guess you are trying to diff two DataFrames.
Here is an example:
import pandas as pd

# old student dataframe
old_pd: pd.DataFrame
# new student dataframe
new_pd: pd.DataFrame

# merge on the student name; the suffixes tell the two score columns apart
# (DataFrame.join with on= matches against the other frame's index, so merge is what we want here)
joined_pd = new_pd.merge(old_pd, on='name', suffixes=('_new', '_old'))
diff_pd = joined_pd[joined_pd['score_new'] != joined_pd['score_old']]
# then insert all diff_pd rows into the student_log table, and update the student table
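To finish the job, a hedged write-back sketch (assuming a psycopg2 connection named conn and the table layout from the question; brand-new students that exist only in new_pd would need a separate insert):

# Assumes diff_pd comes from the merge above (columns: name, score_new, score_old).
cur = conn.cursor()
for row in diff_pd.itertuples(index=False):
    # update the current score...
    cur.execute(
        "UPDATE student SET score = %s WHERE name = %s",
        (row.score_new, row.name),
    )
    # ...and append to the log, resolving the student_id by name
    cur.execute(
        "INSERT INTO student_log (student_id, score) "
        "SELECT student_id, %s FROM student WHERE name = %s",
        (row.score_new, row.name),
    )
conn.commit()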
I have a question about SQL.
I have the following sample table called Records:
record_id  subject          start_timestamp      end_timestamp        interval
------------------------------------------------------------------------------
2          Start Product 2  2021-04-21T16:22:39  2021-04-21T16:23:40  0.97
3          error 1          2021-04-21T16:25:44  2021-04-21T16:25:54  10.0
4          End Product 2    2021-04-21T16:30:13  2021-04-21T16:30:14  0.97
5          Start Product 1  2021-04-21T16:35:13  2021-04-21T16:35:13  0.6
6          End Product 1    2021-04-21T16:36:13  2021-04-21T16:36:13  0.45
First I select all the items that have 'start' in their subject and are not in the table BackupTO (for now the table BackupTO is not important):
SELECT Records.record_id, Records.start_timestamp, Records.interval FROM Records
LEFT JOIN BackupTO ON BackupTO.record_id = Records.record_id
WHERE BackupTO.record_id IS NULL AND Records.log_type = 1 AND Records.subject LIKE '%start%'
When I run this I get:
record_id  start_timestamp      interval
----------------------------------------
2          2021-04-21T16:22:39  0.97
5          2021-04-21T16:35:13  0.6
Okay, all good. Now comes my question: I fetch this in Python and loop through the data; first I calculate the product number based on the interval with:
product = round(result[2] / 0.5)
So an interval of 0.97 is product 2, and an interval of 0.6 or 0.45 is product 1, all great!
So I know record_id 2 is product 2, and I want to execute a SQL query that returns all items starting from record_id 2 until it finds an item that has %end%2 in its subject (the 2 is for product 2; it could also be product 1).
For example, when it finds Start Product 2, I get a list with record_id 3 and 4.
I want to get all items from the start until the end.
So it gets me a list like this; these are all the items found under Start Product 2 until %end%2 was found. For product 1 it would return just record_id 6, because there is nothing between the start and the stop.
record_id  start_timestamp      interval
----------------------------------------
3          2021-04-21T16:25:44  10.0
4          2021-04-21T16:30:13  0.97
I tried OFFSET and FETCH, but I couldn't get it to work. Could somebody help me out here?
Use your query as a CTE and join it to the table Records.
Then, with the MIN() window function, find the record_id up to which you want the rows returned:
WITH cte AS (
SELECT r.*
FROM Records r LEFT JOIN BackupTO b
ON b.record_id = r.record_id
WHERE b.record_id IS NULL AND r.log_type = 1 AND r.subject LIKE '%start%'
)
SELECT *
FROM (
SELECT r.*,
MIN(CASE WHEN r.subject LIKE '%end%' THEN r.record_id END) OVER () id_end
FROM Records r INNER JOIN cte c
ON r.record_id > c.record_id
WHERE c.record_id = ?
)
WHERE COALESCE(record_id <= id_end, 1)
Change ? to 2 or 5 for each case.
If you have the record_ids returned by your query, it is simpler:
SELECT *
FROM (
SELECT r.*,
MIN(CASE WHEN r.subject LIKE '%end%' THEN r.record_id END) OVER () id_end
FROM Records r
WHERE r.record_id > ?
)
WHERE COALESCE(record_id <= id_end, 1)
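From Python you can pass the starting record_id as the parameter; a minimal sketch assuming the sqlite3 module (the database path is a placeholder, and window functions need SQLite 3.25+):

import sqlite3

SQL = """
SELECT *
FROM (
    SELECT r.*,
           MIN(CASE WHEN r.subject LIKE '%end%' THEN r.record_id END) OVER () id_end
    FROM Records r
    WHERE r.record_id > ?
)
WHERE COALESCE(record_id <= id_end, 1)
"""

conn = sqlite3.connect("records.db")  # hypothetical path
cur = conn.cursor()
for start_id in (2, 5):
    # one product block per starting record_id
    rows = cur.execute(SQL, (start_id,)).fetchall()
    print(start_id, rows)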
I have two tables below:
Stock:
------------
Items | QTY
------------
sugar | 14
mango | 10
apple | 50
berry | 1

Second_table:
------------
Items | QTY
------------
sugar | 10
mango | 5
apple | 48
berry | 1
I use the following query in Python to check the difference between the QTY of table one and table two.
cur = conn.cursor()
cur.execute("select s.Items, s.qty - t.qty as quantity from Stock s join Second_table t on s.Items = t.Items;")
remaining_quantity = cur.fetchall()
I'm a bit stuck on how to go about what I need to accomplish. I need to check the difference between the quantities of table one and table two; if the difference is under 5, then for those Items I want to store the value 1 in another table's column, otherwise 0. How can I go about this?
Edit:
I have attempted this by looping through the rows; if the column value is less than 5 then I insert into the new table, as below:
for row in remaining_quantity:
    print(row[1])
    if row[1] < 5:
        cur.execute('INSERT OR IGNORE INTO check_quantity_tb VALUES (select distinct s.Items, s.qty, s.qty - t.qty as quantity, 1 from Stock s join Second_table t on s.Items = t.Items'), row)
        print(row)
But I get a SQL syntax error and I'm not sure where the error could be :/
First modify your query so you retrieve all the relevant info and don't have to issue subqueries later:
readcursor = conn.cursor()
readcursor.execute(
"select s.Items, s.qty, s.qty - t.qty as remain "
"from Stock s join Second_table t on s.Items = t.Items;"
)
Then use it to update your third table:
writecursor = conn.cursor()
for items, qty, remain in readcursor:
print(remain)
if remain < 5:
writecursor.execute(
'INSERT OR IGNORE INTO check_quantity_tb VALUES (?, ?, ?, ?)',
(items, qty, remain, 1)
)
conn.commit()
Note the following points:
1/ We use two distinct cursors so we can iterate over the first one while writing with the second one. This avoids fetching all results into memory, which can be a real lifesaver on huge datasets.
2/ When iterating on the first cursor, we unpack the rows into their individual components. This is called "tuple unpacking" (but it actually works for most sequence types):
>>> row = ("1", "2", "3")
>>> a, b, c = row
>>> a
'1'
>>> b
'2'
>>> c
'3'
3/ We let the db-api module do the proper sanitisation and escaping of the values we want to insert. This avoids headaches with escaping / quoting etc. and protects your code from SQL injection attacks (not that you necessarily risk one here, but that's the correct way to write parameterized queries in Python).
NB: since you didn't post your full table definitions nor clear specs - not even the full error message and traceback - I only translated your code snippet into something more sensible (avoiding the costly and useless subquery, which might or might not be the cause of your error). I can't guarantee it will work out of the box, but at least it should put you back on track.
NB2: you mentioned you had to set the last column to either 1 or 0 depending on the remain value. If that's the case, you want your loop to be:
writecursor = conn.cursor()
for items, qty, remain in readcursor:
print(remain)
flag = 1 if remain < 5 else 0
writecursor.execute(
'INSERT OR IGNORE INTO check_quantity_tb VALUES (?, ?, ?, ?)',
(items, qty, remain, flag)
)
conn.commit()
If you instead only want to process rows where remain < 5, you can specify it directly in your first query with a where clause.
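For instance, a sketch of that variant (same assumed table names as above):

readcursor = conn.cursor()
readcursor.execute(
    "select s.Items, s.qty, s.qty - t.qty as remain "
    "from Stock s join Second_table t on s.Items = t.Items "
    "where s.qty - t.qty < 5;"
)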
I have a table with two million rows. Each row has a body column which stores JSON-format data. For example:
table_a:
id  user_id  body
1   1        {'tel': '13678031283', 'email': 'test@gmail.com', 'name': 'test'....}
2   2        {'tel': '1567827126', 'age': '16'....}
......
I have another table, named table_b:
table_b:
id  user_id  tel       email         name
1   1        13678019  test@qq.com   test1
2   2        15627378  test1@qq.com  test2
.....
table_a has 2 million rows; I want to import all of table_a's data into table_b, and each row of table_a should be processed.
I want to deal with it like this:
for row in table_a_rows:
result = process(row)
insert result to table_b
.....
But I think it is not a good idea. Is there a better way to do it?
You can select the data you need from table_a directly with JSON_EXTRACT. For example, getting the email would be something like this:
mysql> SELECT JSON_EXTRACT(body, '$.email') from table_a;
So you could replace directly into table_b all the data you have in table_a:
mysql> REPLACE INTO table_b (user_id, tel, email, name)
    -> SELECT user_id,
    ->        JSON_EXTRACT(body, '$.tel'),
    ->        JSON_EXTRACT(body, '$.email'),
    ->        JSON_EXTRACT(body, '$.name')
    -> FROM table_a;
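One caveat worth hedging: JSON_EXTRACT returns JSON values, so strings come back wrapped in double quotes; JSON_UNQUOTE strips them. Driving the whole migration from Python then reduces to a single set-based statement instead of a row-by-row loop (a sketch assuming a MySQL DB-API connection named conn):

# JSON_UNQUOTE strips the double quotes JSON_EXTRACT leaves around strings.
MIGRATE_SQL = """
REPLACE INTO table_b (user_id, tel, email, name)
SELECT user_id,
       JSON_UNQUOTE(JSON_EXTRACT(body, '$.tel')),
       JSON_UNQUOTE(JSON_EXTRACT(body, '$.email')),
       JSON_UNQUOTE(JSON_EXTRACT(body, '$.name'))
FROM table_a
"""

cur = conn.cursor()
cur.execute(MIGRATE_SQL)
conn.commit()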
I'm writing a utility to help analyze SQLite database consistency using Python. Manually I've found some inconsistencies, so I thought it would be helpful if I could do these checks in bulk to save time. I set out to try this in Python and I'm having trouble.
Let's say I connect, create a cursor, and run a query or two and end up with a result set that I want to iterate through. I want to be able to do something like this (list of tables each with an ID as the primary key):
# python/pseudocode of what I'm looking for
for table in tables:
    for pid in pids:
        query = 'SELECT %s FROM %s' % (pid, table)
        result = connection.execute(query)
        for r in result:
            print(r)
And that would yield a list of IDs from each table in the table list. I'm not sure if I'm even close.
The problem here is that some tables have a primary key called ID while others have TABLE_ID, etc. If they were all called ID, I could select the IDs from each, but they're not. This is why I was hoping to find a query that would let me select only the first column, or the key, for each table.
To get the columns of a table, execute PRAGMA table_info as a query.
The result's pk column shows which column(s) are part of the primary key:
> CREATE TABLE t(
> other1 INTEGER,
> pk1 INTEGER,
> other2 BLOB,
> pk2 TEXT,
> other3 FLUFFY BUNNIES,
> PRIMARY KEY (pk1, pk2)
> );
> PRAGMA table_info(t);
cid name type notnull dflt_value pk
--- ------ -------------- ------- ---------- --
0 other1 INTEGER 0 0
1 pk1 INTEGER 0 1
2 other2 BLOB 0 0
3 pk2 TEXT 0 2
4 other3 FLUFFY BUNNIES 0 0
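Putting it together in Python, a minimal sketch of the original loop (the database path is a placeholder; note that PRAGMA arguments cannot be bound as query parameters, hence the string formatting):

import sqlite3

conn = sqlite3.connect("analyze_me.db")  # hypothetical database file
cur = conn.cursor()

# every user table in the database
tables = [row[0] for row in
          cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

for table in tables:
    # table_info rows are (cid, name, type, notnull, dflt_value, pk);
    # pk > 0 marks a primary-key column (the value is its position in the key)
    pk_cols = [row[1] for row in cur.execute("PRAGMA table_info(%s)" % table)
               if row[5] > 0]
    for pk in pk_cols:
        for (value,) in cur.execute("SELECT %s FROM %s" % (pk, table)):
            print(table, pk, value)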