I have 2 tables in PostgreSQL:-
"student" table
student_id name score
1 Adam 10
2 Brian 9
"student_log" table:-
log_id student_id score
1 1 10
2 2 9
I have a Python script which fetches a DataFrame with the columns "name" and "score" and then writes it to the student table.
I want to update the student and student_log tables whenever the "score" changes for a student. Also, if there is a new student name in the DataFrame, I want to add a row for it to the student table and record it in the "student_log" table as well. Can anyone suggest how this can be done?
Suppose the newly fetched DataFrame looks like this:-
name score
Adam 7
Lee 5
Then the Expected Result is:-
"student" table
student_id name score
1 Adam 7
2 Brian 9
3 Lee 5
"student_log" table:-
log_id student_id score
1 1 10
2 2 9
3 1 7
4 3 5
I finally found a good answer. I used a trigger, a function and a CTE.
I created a function to log changes, along with a trigger that calls it on updates. The code follows.
CREATE OR REPLACE FUNCTION log_last_changes()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
DECLARE
    serial_num integer;
BEGIN
    IF NEW.name <> OLD.name OR NEW.score <> OLD.score
    THEN
        SELECT SETVAL('log_id_seq', (select max(id) from log)) into serial_num;
        INSERT INTO log(student_id,score)
        VALUES(NEW.id,NEW.score)
        ON CONFLICT DO NOTHING;
    END IF;
    RETURN NEW;
END;
$$;
CREATE TRIGGER log_student
    AFTER UPDATE
    ON student
    FOR EACH ROW
    EXECUTE PROCEDURE log_last_changes();
The CTE expression is as follows:-
WITH new_values(id, name, score) AS (
    values
        (1,'Adam',7),
        (2,'Brian',9),
        (3,'Lee',5)
),
upsert AS
(
    UPDATE student s
    SET NAME = nv.name,
        SCORE = nv.score
    FROM new_values nv, student s2
    WHERE
        s.id = nv.id and s.id = s2.id
    Returning s.*
)
INSERT INTO student select id, name, score
FROM
    new_values
WHERE NOT EXISTS (
    SELECT 1 from upsert up where up.id=new_values.id
);
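I guess you are trying to diff two DataFrames.
Here is an example:
import pandas as pd
# old_pd: DataFrame with the current contents of the student table (name, score)
# new_pd: the newly fetched DataFrame (name, score)
# A left merge on name keeps every new row; new students get NaN in score_old.
joined_pd = new_pd.merge(old_pd, on='name', how='left', suffixes=('_new', '_old'))
# NaN never compares equal, so new students end up in the diff too.
diff_pd = joined_pd[joined_pd['score_new'] != joined_pd['score_old']]
# Then insert all rows of diff_pd into the student_log table and apply them to the student table.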
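The write-back step could be sketched roughly as follows. This is only a minimal sketch under some assumptions: it uses an in-memory SQLite database from the standard library as a stand-in for PostgreSQL so that it runs on its own, the helper name sync_students is hypothetical, and with a PostgreSQL driver such as psycopg2 the placeholders would be %s instead of ?.

```python
import sqlite3

def sync_students(con, rows):
    """Apply (name, score) pairs to student and log every change.

    Hypothetical helper: new names are inserted, changed scores are
    updated, and both cases append one row to student_log.
    """
    cur = con.cursor()
    for name, score in rows:
        cur.execute("SELECT student_id, score FROM student WHERE name = ?", (name,))
        found = cur.fetchone()
        if found is None:
            # New student: insert and remember the generated id.
            cur.execute("INSERT INTO student (name, score) VALUES (?, ?)", (name, score))
            student_id = cur.lastrowid
        elif found[1] != score:
            # Existing student whose score changed.
            student_id = found[0]
            cur.execute("UPDATE student SET score = ? WHERE student_id = ?",
                        (score, student_id))
        else:
            continue  # unchanged, nothing to log
        cur.execute("INSERT INTO student_log (student_id, score) VALUES (?, ?)",
                    (student_id, score))
    con.commit()

# Stand-in database holding the data from the question.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT UNIQUE, score INTEGER);
    CREATE TABLE student_log (log_id INTEGER PRIMARY KEY, student_id INTEGER, score INTEGER);
    INSERT INTO student (name, score) VALUES ('Adam', 10), ('Brian', 9);
    INSERT INTO student_log (student_id, score) VALUES (1, 10), (2, 9);
""")

# With pandas the rows could come from zip(df["name"], df["score"].tolist()).
sync_students(con, [("Adam", 7), ("Lee", 5)])
students = con.execute("SELECT * FROM student ORDER BY student_id").fetchall()
log = con.execute("SELECT * FROM student_log ORDER BY log_id").fetchall()
```

Doing the comparison in Python keeps the logging of new students and changed scores in one place, at the cost of one SELECT per row; for large DataFrames a set-based statement in the database would likely be faster.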
I have the following table in my database, which represents the shifts of a working day.
When a new product is added to another table 'Products' I want to assign a shift to it based on the start_timestamp.
So when I insert into Products, it should take the start_timestamp, look in the table ProductionPlan, and find the row (ProductionPlan.name) whose start and end timestamps enclose it.
That way I can assign a shift to the product.
I hope somebody can help me out with this!
Table ProductionPlan
name     start_timestamp      end_timestamp
shift 1  2021-05-10T07:00:00  2021-05-10T11:00:00
shift 2  2021-05-10T11:00:00  2021-05-10T15:00:00
shift 3  2021-05-10T15:00:00  2021-05-10T19:00:00
shift 1  2021-05-11T07:00:00  2021-05-11T11:00:00
shift 2  2021-05-11T11:00:00  2021-05-11T15:00:00
shift 3  2021-05-11T15:00:00  2021-05-11T19:00:00
Table Products
id  name     start_timestamp      end_timestamp        shift
1   Schroef  2021-05-10T08:09:05  2021-05-10T08:19:05
2   Bout     2021-05-10T08:20:08  2021-04-28T08:30:11
3   Schroef  2021-05-10T12:09:12  2021-04-28T12:30:15
I have the following code to insert into Products:
def insertNewProduct(self, log):
    """
    This function is used to insert a new product into the database.
    #param log : a object to log
    #return None.
    """
    debug("Class: SQLite, function: insertNewProduct")
    self.__openDB()
    timestampStart = datetime.fromtimestamp(int(log.startTime)).isoformat()
    queryToExecute = "INSERT INTO Products (name, start_timestamp) VALUES('{0}','{1}')".format(
        log.summary, timestampStart)
    self.cur.execute(queryToExecute)
    self.__closeDB()
    return self.cur.lastrowid
It's just a simple INSERT INTO but I want to add a query or even extend this query to fill in the column shift.
You can use a SELECT inside an INSERT.
queryToExecute = """INSERT INTO Products (name, start_timestamp, shift)
                    SELECT :1, :2, name FROM ProductionPlan pp
                    WHERE :2 BETWEEN pp.start_timestamp and pp.end_timestamp"""
self.cur.execute(queryToExecute, (log.summary, timestampStart))
In the above code I have used a parameterized query because I hate inserting parameters as strings inside a query. That has been the cause of too many SQL injection attacks...
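One detail worth checking: BETWEEN includes both endpoints, and in the sample plan one shift ends at the same instant the next begins, so a product starting at exactly 11:00:00 would match two shifts and be inserted twice. A minimal sketch of a half-open comparison, assuming an in-memory SQLite database with the table layout from the question and named placeholders (:name and :start are names chosen here for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ProductionPlan (name TEXT, start_timestamp TEXT, end_timestamp TEXT);
    CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT,
                           start_timestamp TEXT, end_timestamp TEXT, shift TEXT);
    INSERT INTO ProductionPlan VALUES
        ('shift 1', '2021-05-10T07:00:00', '2021-05-10T11:00:00'),
        ('shift 2', '2021-05-10T11:00:00', '2021-05-10T15:00:00');
""")

# Half-open interval: start <= t < end, so a boundary value matches one shift only.
# ISO 8601 strings in the same format compare correctly as plain text.
insert_sql = """
    INSERT INTO Products (name, start_timestamp, shift)
    SELECT :name, :start, pp.name
    FROM ProductionPlan pp
    WHERE :start >= pp.start_timestamp AND :start < pp.end_timestamp
"""

for name, start in [("Schroef", "2021-05-10T08:09:05"), ("Bout", "2021-05-10T11:00:00")]:
    con.execute(insert_sql, {"name": name, "start": start})
con.commit()

rows = con.execute(
    "SELECT name, start_timestamp, shift FROM Products ORDER BY id").fetchall()
```

Note that with INSERT ... SELECT nothing is inserted at all when no shift covers the timestamp, so a product created outside the planned hours would need separate handling.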
This question is duplicated I think, but I couldn't understand other answers...
Original table looks like this.
NAME AGE SMOKE
John 25 None
Alice 23 None
Ken 26 None
I want to update the SMOKE column in these rows, but the value has to come from a Python function, check_smoke(). If I pass a name to check_smoke(), it returns "Smoke" or "Not Smoke".
So final table would look like below:
NAME AGE SMOKE
John 25 Smoke
Alice 23 Not Smoke
Ken 26 Not Smoke
I'm using sqlite3 and python3.
How can I do it? Thank you for help!
You could use 1 cursor to select rows and another one to update them.
Assuming that the name of the table is smk (replace it by your actual name) and that con is an established connection to the database, you could do:
curs = con.cursor()
curs2 = con.cursor()
batch = 64  # size of a batch of records
curs.execute("SELECT DISTINCT name FROM smk")
while True:
    names = curs.fetchmany(batch)  # extract a bunch of rows
    if len(names) == 0: break
    curs2.executemany('UPDATE smk SET smoke=? WHERE name=?',  # and update them
                      [(check_smoke(name[0]), name[0]) for name in names])
con.commit()
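Another option, shown here only as a sketch, is to register the Python function with SQLite through Connection.create_function and let a single UPDATE call it for each row. The check_smoke body below is a placeholder for the real function, and the in-memory table is an assumption that mirrors the question's data.

```python
import sqlite3

def check_smoke(name):
    # Placeholder for the real function from the question.
    return "Smoke" if name == "John" else "Not Smoke"

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE smk (name TEXT, age INTEGER, smoke TEXT)")
con.executemany("INSERT INTO smk (name, age) VALUES (?, ?)",
                [("John", 25), ("Alice", 23), ("Ken", 26)])

# Expose the Python function to SQL under the name check_smoke (1 argument).
con.create_function("check_smoke", 1, check_smoke)
con.execute("UPDATE smk SET smoke = check_smoke(name)")
con.commit()

rows = con.execute("SELECT name, age, smoke FROM smk ORDER BY rowid").fetchall()
```

This avoids the second cursor and the batching loop, but the function is called once per row rather than once per distinct name.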
I am very new to MySQL and am loading a file into the database using Python. I have 2 tables, student and sports.
The file format is:
id name sports
1 john baseball
2 mary Football
Student table
id name
1 John
2 Mary
Here id is the primary key.
Sports table
stu_id sports_title
1 Baseball
2 Football
Here stu_id is a foreign key referencing the student table.
My problem is with this code:
query="insert into sports (stu_id,name)VALUES (%d,%s)"
("select id from student where id=%d,%s")
#words[0]=1 ,words[2]=Baseball
args=(words[0],words[2])
cursor.execute(query,args)
Upon executing this code, I get:
ProgrammingError: Not all parameters were used in the SQL statement
You can't use both VALUES and SELECT as the source of the data in INSERT. You use VALUES if the data is literals, you use SELECT if you're getting it from another table. If it's a mix of both, you use SELECT and put the literals into the SELECT list.
query = """
    INSERT INTO sports (stu_id, sports_title)
    SELECT id, %s
    FROM student
    WHERE name = %s
"""
args = (words[2], words[1])
cursor.execute(query, args)
Or, since the file contains the student ID, you don't need SELECT at all.
# The MySQL drivers use %s as the placeholder for every type; %d is not recognized.
query = "INSERT INTO sports(stu_id, sports_title) VALUES (%s, %s)"
args = (words[0], words[2])
cursor.execute(query, args)
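For loading the whole file, the parameter tuples could be built first and then handed to executemany. The following is only a sketch: parse_rows is a hypothetical helper, the whitespace-separated format is taken from the sample in the question, and capitalizing the names is an assumption made to match the tables shown there.

```python
def parse_rows(lines):
    """Split 'id name sports' lines into student and sports parameter lists."""
    students, sports = [], []
    for line in lines:
        words = line.split()
        if len(words) != 3 or not words[0].isdigit():
            continue  # skip the header line and anything malformed
        stu_id = int(words[0])
        students.append((stu_id, words[1].capitalize()))
        sports.append((stu_id, words[2].capitalize()))
    return students, sports

lines = ["id name sports", "1 john baseball", "2 mary Football"]
students, sports = parse_rows(lines)

# With an open connection these lists could be passed to executemany, e.g.
# cursor.executemany("INSERT INTO student (id, name) VALUES (%s, %s)", students)
# cursor.executemany("INSERT INTO sports (stu_id, sports_title) VALUES (%s, %s)", sports)
```

Inserting the student rows before the sports rows keeps the foreign key on stu_id satisfied.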
I'm writing a utility to help analyze SQLite database consistencies using Python. Manually I've found some inconsistencies so I thought it would be helpful if I could do these in bulk to save time. I set out to try this in Python and I'm having trouble.
Let's say I connect, create a cursor, and run a query or two and end up with a result set that I want to iterate through. I want to be able to do something like this (list of tables each with an ID as the primary key):
# python/pseudocode of what I'm looking for
for table in tables:
    for pid in pids:
        query = 'SELECT %s FROM %s' % (pid, table)
        result = connection.execute(query)
        for r in result:
            print(r)
And that would yield a list of IDs from each table in table list. I'm not sure if I'm even close.
The problem here is that some tables have a primary key called ID while others TABLE_ID, etc. If they were all ID, I could select the IDs from each but they're not. This is why I was hoping to find a query that would allow me to select only the first column or the key for each table.
To get the columns of a table, execute PRAGMA table_info as a query.
The result's pk column shows which column(s) are part of the primary key:
> CREATE TABLE t(
>     other1 INTEGER,
>     pk1 INTEGER,
>     other2 BLOB,
>     pk2 TEXT,
>     other3 FLUFFY BUNNIES,
>     PRIMARY KEY (pk1, pk2)
> );
> PRAGMA table_info(t);
cid  name    type            notnull  dflt_value  pk
---  ------  --------------  -------  ----------  --
0    other1  INTEGER         0                    0
1    pk1     INTEGER         0                    1
2    other2  BLOB            0                    0
3    pk2     TEXT            0                    2
4    other3  FLUFFY BUNNIES  0                    0
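From Python, the pk column could be used roughly like this. It is a minimal sketch: primary_key_columns is a hypothetical helper, and the two sample tables a and b are made up to show differently named keys.

```python
import sqlite3

def primary_key_columns(con, table):
    """Return the primary-key column names of a table, in key order."""
    # PRAGMA arguments cannot be bound as parameters, so the table name is
    # quoted as an identifier; only pass table names you trust.
    quoted = '"' + table.replace('"', '""') + '"'
    info = con.execute(f"PRAGMA table_info({quoted})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk); pk is 0 for non-key columns.
    keyed = sorted((row[5], row[1]) for row in info if row[5] > 0)
    return [name for _, name in keyed]

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (ID INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE b (val TEXT, B_ID INTEGER PRIMARY KEY);
    INSERT INTO a VALUES (1, 'x'), (2, 'y');
    INSERT INTO b VALUES ('z', 10);
""")

tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
ids = {}
for table in tables:
    cols = ", ".join('"' + c + '"' for c in primary_key_columns(con, table))
    ids[table] = con.execute(f'SELECT {cols} FROM "{table}"').fetchall()
```

A table without a declared primary key yields an empty column list, so real code would need to skip such tables or fall back to rowid.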
How can I transform this table:
ID ITEM_CODE
--------------------
1 1AB
1 22S
1 1AB
2 67R
2 225
3 YYF
3 1AB
3 UUS
3 F67
3 F67
3 225
......
..to a list of lists, each list being a distinct ID containing its allocated item_codes?
in the form: [[1AB,22S,1AB],[67R,225],[YYF,1AB,UUS,F67,F67,225]]
Using this query:
SELECT ID, ITEM_CODE
FROM table1
ORDER BY ID;
and calling cursor.fetchall() in Python does not return a list of lists, nor is the result ordered by ID.
Thank you
You will probably have less post-processing to do in Python with this query:
SELECT GROUP_CONCAT(ITEM_CODE)
FROM table1
GROUP BY ID
ORDER BY ID;
That will directly produce this result:
1AB,22S,1AB
67R,225
YYF,1AB,UUS,F67,F67,225
After that, cursor.fetchall() will return more or less what you expected, I think.
EDIT:
# Each row is a 1-tuple holding the comma-separated string.
result = [row[0].split(',') for row in cursor.fetchall()]
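If the database has no GROUP_CONCAT, or the item codes might themselves contain commas, the grouping can also be done in Python with itertools.groupby on the ordered rows. A minimal sketch, assuming an in-memory SQLite table with a few of the sample rows:

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (ID INTEGER, ITEM_CODE TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [(1, "1AB"), (1, "22S"), (1, "1AB"), (2, "67R"), (2, "225"), (3, "YYF")])

# ORDER BY ID makes equal IDs adjacent, which groupby requires; rowid keeps
# insertion order within each ID.
rows = con.execute("SELECT ID, ITEM_CODE FROM table1 ORDER BY ID, rowid").fetchall()
result = [[code for _, code in group] for _, group in groupby(rows, key=itemgetter(0))]
```

The ORDER BY matters here: groupby only merges consecutive rows with the same key, so unsorted input would split one ID across several lists.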