How to update a column based on counts from another table? - python

I need to update the table actor, column numCharacters, depending on how many times each actor's actorID shows up in the characters table.
I have the following code:
cursor = connection.cursor()
statement = 'UPDATE actor SET numCharacters = (SELECT count(*) FROM characters GROUP BY actorID)'
cursor.execute(statement)
connection.commit()
Does anyone know how I could complete it?

I think your problem is that your subquery returns multiple rows (one per actorID), so the update statement doesn't know which value to use. Correlate the subquery with the row being updated instead:
UPDATE actor a
SET a.numCharacters = (
    SELECT count(*)
    FROM characters c
    WHERE c.actorID = a.id
);
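Back in Python, a minimal sketch of running it (assuming connection is an open DB-API connection like the one in the question, and that actor's primary key is named id as above). Note that the GROUP BY from the original attempt is dropped: a correlated count(*) with no GROUP BY returns 0 when an actor has no characters, rather than NULL.
# Minimal sketch; assumes an open DB-API connection (e.g. sqlite3 or MySQLdb)
# and that actor.id is the primary key referenced by characters.actorID.
statement = """
    UPDATE actor
    SET numCharacters = (
        SELECT count(*) FROM characters c WHERE c.actorID = actor.id
    )
"""
cursor = connection.cursor()
cursor.execute(statement)
connection.commit()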

Related

Python Regex applying split to a capturing group

I'm trying to derive a CREATE statement from a CTAS statement by adding limit 0 to all tables, since I don't want to load any data. I'm using Python for this.
Input :
create table new_table as select
mytable.column1, mytable2.column2
from schema.mytable left join mytable2 ;
Expected Output:
create table new_table IF NOT EXISTS as select
mytable.column1, mytable2.column2
from (select * from mytable limit 0) mytable join (select * from mytable2 limit 0) mytable2
I have to replace all the tables in from and join clause to (select * from tablename limit 0) and alias.
However, I'm only able to generate the output below: I can't get the table name and add it as an alias, and I also can't change the last table name in the join clause. If the input has an alias explicitly mentioned, I'm able to generate it. I'm very new to using regex and feel very overwhelmed. I'd appreciate support from the experts here.
Output obtained:
create table new_table as select
mytable.column1, mytable2.column2
from (select * from schema.mytable limit 0) join mytable2 ;
Code I tried (first I capture any explicit alias into group 4; I would like to generate an alias when a table has none explicitly mentioned. Group 2 captures schema_name.table_name, so ideally I could apply Python's split to that capturing group. Also, I'm not able to translate the last table in the SQL):
import re
sql = """
create table new_table as select
mytable.column1, mytable2.column2
from schema.mytable left join mytable2 ;"""
rgxsubtable = re.compile(r"\b((?:from|join)\s+)([\w.\"]+)([\)\s]+)(\bleft|on|cross|join|inner\b)",re.MULTILINE|re.IGNORECASE) # look for table names in from and join clauses
rgxalias = re.compile(r"\b((?:from|join)\s+)([\w.\"]+)(\s+)(\b(?!left|on|cross|join|inner\b)\w+)",re.MULTILINE|re.IGNORECASE) # look for table names in from and join clauses but with aliases
sql = rgxalias.sub(r"\1 (select * from \2 limit 0) \4 ", sql)
sql = rgxsubtable.sub(r"\1 (select * from \2 limit 0) ", sql)
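One way to apply Python's split to the captured group, and to also handle the final table that has no following keyword, is to pass a function to re.sub instead of a replacement string. A rough sketch under those assumptions (the excluded-keyword list is not exhaustive, and it assumes a table reference is only ever followed by an optional alias, a join keyword, or the end of the statement):
import re

sql = """
create table new_table as select
mytable.column1, mytable2.column2
from schema.mytable left join mytable2 ;"""

# Capture the table after FROM/JOIN, plus an optional explicit alias.
rgx = re.compile(
    r"\b(from|join)\s+([\w.\"]+)"
    r"(?:\s+(?!left\b|right\b|inner\b|cross\b|join\b|on\b|where\b)(\w+))?",
    re.IGNORECASE,
)

def wrap_table(m):
    keyword, table, alias = m.group(1), m.group(2), m.group(3)
    if alias is None:
        # No explicit alias: derive one from the last part of schema.table
        alias = table.split(".")[-1].strip('"')
    return f"{keyword} (select * from {table} limit 0) {alias}"

print(rgx.sub(wrap_table, sql))
Because re.sub does a single left-to-right pass and never rescans its own replacements, the inner "select * from ..." text it emits is not matched again.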

Is it possible to assign cursor.fetchall() to a variable?

rows_order = "SELECT COUNT (*) FROM 'Order'"
cursor.execute(rows_order)
ordernum = cursor.fetchall()
connection.commit()
cursor.execute("INSERT INTO 'Order' (OrderNo, CustomerID, Date, TotalCost) VALUES (?,?,?,?)", (
[ordernum], custid_Sorder, now, total_item_price))
This is what I am trying, but this error popped up:
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
How do I fix this? I want OrderNo to equal the number of orders before it, which is why I want to assign ordernum to it. (I am using sqlite3.)
As you have only one value, you only need fetchone:
import sqlite3
con = sqlite3.connect("tutorial.db")
cursor = con.cursor()
rows_order = "SELECT COUNT (*) FROM 'Order'"
cursor.execute(rows_order)
ordernum = cursor.fetchone()[0]
cursor.execute("INSERT INTO 'Order' (OrderNo, CustomerID, Date, TotalCost) VALUES (?,?,?,?)", (
ordernum, custid_Sorder, now, total_item_price))
tl;dr Don't do this. Use an auto-incremented primary key.
fetchall returns all rows as a list, even if there is only one row.
Instead, use fetchone. This returns a single tuple from which you can select the first item: ordernum = cursor.fetchone()[0]
However, you appear to be writing a query to get the next ID, and using count(*) for that is wrong. If there are any gaps in OrderNo, for example if something gets deleted, it can return a duplicate: given [1, 3, 4], count(*) returns 3. Use max(OrderNo) instead.
Furthermore, if you try to insert two orders at the same time you might get a race condition and one will try to duplicate the other.
process 1                   process 2
select max(orderNo)
fetchone  # 4
                            select max(orderNo)
                            fetchone  # 4
insert into orders...
                            insert into orders...  # duplicate OrderNo
To avoid this, you have to do both the select and insert in a transaction.
process 1                   process 2
begin
select max(orderNo)
fetchone  # 4
                            begin
                            select max(orderNo)
                            fetchone
insert into orders...       # wait
commit                      # wait
                            # 5
                            insert into orders...
                            commit
Better yet, do them as a single query (note the + 1, and the coalesce so the first order in an empty table gets 1):
insert into "Order" (OrderNo, CustomerID, Date, TotalCost)
select coalesce(max(OrderNo), 0) + 1, ?, ?, ?
from "Order"
Even better, don't do it at all. There is a built-in mechanism for this: an auto-incremented primary key.
-- order is a keyword; pluralizing table names helps to avoid them
create table orders (
    -- It is a special feature of SQLite that an integer primary key
    -- will automatically be unique.
    orderNo integer primary key,
    customerID int,
    -- date is also a keyword, and vague. Use xAt.
    orderedAt timestamp,
    totalCost int
)
-- orderNo will automatically be set to a unique number
insert into orders (customerID, orderedAt, totalCost) values (...)
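From Python this looks like the sketch below (assuming the sqlite3 module and the orders table above; custid, now, and total are placeholder values standing in for your real ones, and cursor.lastrowid gives back the generated orderNo):
import sqlite3

con = sqlite3.connect("tutorial.db")
cur = con.cursor()

custid, now, total = 1, "2023-01-01 12:00:00", 100  # placeholder values for illustration

# orderNo is omitted from the column list: SQLite assigns the next
# unique integer primary key automatically.
cur.execute(
    "insert into orders (customerID, orderedAt, totalCost) values (?, ?, ?)",
    (custid, now, total),
)
con.commit()
print(cur.lastrowid)  # the orderNo SQLite just generated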

Update one postgres table from another postgres table

I am loading a batch CSV file into Postgres using Python (say, into Table A).
I am using pandas to upload the data in chunks, which is quite fast.
for chunk in pd.read_csv(csv_file, sep='|',chunksize=chunk_size,low_memory=False):
Now I want to update another table (say Table B) from A based on the following rules:
if there are any new records in Table A that are not in Table B, insert them as new records in Table B (based on the Id field)
if the values change in Table A for an ID that exists in Table B, update those records in Table B from Table A
(There are several tables I need to update based on Table A.)
I am able to do that using the query below and then looping through each row, but Table A always has around 1,825,172 records and this becomes extremely slow. Can any forum member help speed this up or suggest an alternate approach to achieve the same?
cursor.execute(sql)
records = cursor.fetchall()
for row in records:
    id = 0 if row[0] is None else row[0]  # used to match with Table B and decide insert or update
    id2 = 0 if row[1] is None else row[1]
    id3 = 0 if row[2] is None else row[2]
You could leverage Postgres upsert syntax (INSERT ... ON CONFLICT DO UPDATE); note that the update branch must refer to the incoming row as excluded:
insert into tableB (id, col1, col2)
select ta.id, ta.col1, ta.col2 from tableA ta
on conflict (id) do update
set col1 = excluded.col1, col2 = excluded.col2
You should do this completely inside the DBMS rather than looping through the records in your Python script; that lets the DBMS optimize the work.
UPDATE TableB
SET x = TableA.y
FROM TableA
WHERE TableA.id = TableB.id;

INSERT INTO TableB (id, x)
SELECT id, y
FROM TableA
WHERE TableA.id NOT IN (SELECT id FROM TableB);
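Putting the pieces together from Python, a rough sketch (assuming SQLAlchemy with psycopg2; the connection string, file name, and table names are placeholders, and table_a is assumed to be an empty staging table): each CSV chunk is appended to the staging table, and the upsert then runs once inside the database.
import pandas as pd
import sqlalchemy as sa

engine = sa.create_engine("postgresql+psycopg2://user:pass@localhost/mydb")  # placeholder DSN

upsert = sa.text("""
    insert into table_b (id, col1, col2)
    select id, col1, col2 from table_a
    on conflict (id) do update
    set col1 = excluded.col1, col2 = excluded.col2
""")

with engine.begin() as conn:
    # Load the CSV into the staging table (Table A) chunk by chunk...
    for chunk in pd.read_csv("batch.csv", sep="|", chunksize=100_000, low_memory=False):
        chunk.to_sql("table_a", conn, if_exists="append", index=False)
    # ...then let Postgres do the insert-or-update in a single statement.
    conn.execute(upsert)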

In a Python script I have an insert query, but when I want to insert multiple columns in the same query it gives an error

In a Python script I have an insert query, but when I want to insert multiple columns in the same query it gives an error, while for a single column it works perfectly.
Below is my code.
My database is AWS S3.
A = []
for score_row in score:
    A.append(score_row[2])
print("A=", A)

B = []
for day_row in score:
    B.append(day_row[1])
print("B=", B)

for x, y in zip(A, B):
    sql = """INSERT INTO calculated_corr_coeff(date,Day) VALUES (?,?)"""
    cursor.executemany(sql, (x,), (y,))
When I replace the above query with the following SQL insert statement, it works perfectly:
sql = """INSERT INTO calculated_corr_coeff(date,Day) VALUES (?)"""
cursor.executemany(sql, (x,))
Fix your code like this:
sql = """INSERT INTO calculated_corr_coeff(date,Day) VALUES (?,?)"""
cursor.execute(sql, (x,y,)) #<-- here
because this is just one insert (not several inserts).
Explanation
I guess you are mistaken about the number of inserts (rows) versus the number of parameters (fields to insert in each row). When you want to insert several rows, use executemany; for just one row, use execute. The second parameter of execute is the "list" (or sequence) of values to be inserted for that row.
Alternative
You can try to change the syntax and insert all the data in one shot by passing the zipped pairs straight to executemany:
values = list(zip(A, B))  # instead of the "for" loop
sql = """INSERT INTO calculated_corr_coeff(date,Day) VALUES (?,?)"""
cursor.executemany(sql, values)
Notice this approach doesn't use a for statement, so all the data is sent to the database in one call, which is more efficient.

How can I do upsert (update and insert) query in MySQL Python?

I'm looking for a simple upsert (Update/Insert).
I have a table into which I am inserting rows for books, but the next time I want to insert a row I don't want to insert the data for that book again; I just want to update the required columns if the row exists, and create a new row if it doesn't.
How can I do this in MySQL-python?
cursor.execute("""INSERT INTO books (book_code,book_name,created_at,updated_at) VALUES (%s,%s,%s,%s)""", (book_code,book_name,curr_time,curr_time,))
MySQL has REPLACE statement:
REPLACE works exactly like INSERT, except that if an old row in the
table has the same value as a new row for a PRIMARY KEY or a UNIQUE
index, the old row is deleted before the new row is inserted.
cursor.execute("""
REPLACE INTO books (book_code,book_name,created_at,updated_at)
VALUES (%s,%s,%s,%s)""",
(book_code,book_name,curr_time,curr_time,)
)
UPDATE: According to the comment from @Yo-han, REPLACE works like DELETE plus INSERT, not like an upsert. Here's an alternative using INSERT ... ON DUPLICATE KEY UPDATE:
cursor.execute("""
INSERT INTO books (book_code,book_name,created_at,updated_at)
VALUES (%s,%s,%s,%s)
ON DUPLICATE KEY UPDATE book_name=%s, created_at=%s, updated_at=%s
""", (book_code, book_name, curr_time, curr_time, book_name, curr_time, curr_time))
