Update multiple postgresql records using unnest - python

I have a database table with nearly 1 million records.
I have added a new column, called concentration.
I then have a function which calculates 'concentration' for each record.
Now, I want to update the records in batch, so I have been looking at the following questions/answers: https://stackoverflow.com/a/33258295/596841, https://stackoverflow.com/a/23324727/596841 and https://stackoverflow.com/a/39626473/596841, but I am not sure how to do this using unnest...
This is my Python 3 function to do the updates:
def BatchUpdateWithConcentration(tablename, concentrations):
    connection = psycopg2.connect(dbname=database_name, host=host, port=port, user=username, password=password)
    cursor = connection.cursor()
    sql = """
        update #tablename# as t
        set
            t.concentration = s.con
        FROM unnest(%s) s(con, id)
        WHERE t.id = s.id;
    """
    cursor.execute(sql.replace('#tablename#', tablename.lower()), (concentrations,))
    connection.commit()
    cursor.close()
    connection.close()
concentrations is an array of tuples:
[(3.718244705238561e-16, 108264), (...)]
The first value is a double precision and the second is an integer, representing the concentration and rowid, respectively.
The error I'm getting is:
psycopg2.ProgrammingError: a column definition list is required for functions returning "record"
LINE 5: FROM unnest(ARRAY[(3.718244705238561e-16, 108264), (...
^

Since a Python tuple is adapted by Psycopg to a PostgreSQL anonymous record, it is necessary to specify the data types:
from unnest(%s) s(con numeric, id integer)
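Putting that together with the function from the question, a corrected sketch (not tested here; variable names are the ones used in the question) could look like the following. Note that PostgreSQL also rejects the alias prefix on the left-hand side of SET, so t.concentration becomes plain concentration:

def BatchUpdateWithConcentration(tablename, concentrations):
    # concentrations is a list of (concentration, id) tuples, e.g. [(3.718e-16, 108264), ...]
    connection = psycopg2.connect(dbname=database_name, host=host, port=port,
                                  user=username, password=password)
    cursor = connection.cursor()
    sql = """
        UPDATE #tablename# AS t
        SET concentration = s.con
        FROM unnest(%s) AS s(con numeric, id integer)
        WHERE t.id = s.id;
    """
    cursor.execute(sql.replace('#tablename#', tablename.lower()), (concentrations,))
    connection.commit()
    cursor.close()
    connection.close()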

Related

Use a list as cursor.execute parameter with dotenv

I am using the dotenv framework to connect my Python script to a Postgres database.
I have a list of ids and want to delete all the rows containing those ids.
ids_to_delete = df["food_id"].tolist()
conn = create_conn()
with conn.cursor() as cursor:
    sql = """DELETE FROM food_recommandations.food_categorisation
             WHERE food_categorisation.food_id = %(ids)s"""
    cursor.execute(sql, {"ids": ids_to_delete})
cursor.close()
conn.close()
This should delete all the rows containing those ids.
You cannot use = with a list because your column doesn't store lists; a single cell contains one integer id. What you are looking for is the SQL IN operator:
sql = """DELETE FROM food_recommandations.food_categorisation
         WHERE food_categorisation.food_id IN %(ids)s"""
cursor.execute(sql, {"ids": tuple(ids_to_delete)})
Apparently your obscurification manager (the dotenv framework) translates the structure {"ids": tuple(ids_to_delete)} to an array before transmitting it to Postgres. That then requires a slight alteration in your query. IN expects a delimited list, which looks much the same to you and me but is very different to Postgres. With an array, use the predicate = ANY instead. So the query from @WasiHaider becomes:
sql = """DELETE FROM food_recommandations.food_categorisation
         WHERE food_categorisation.food_id = ANY(%(ids)s)"""
cursor.execute(sql, {"ids": tuple(ids_to_delete)})
Note: Not tested - no data.
If successful, credit to @WasiHaider.
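For reference, psycopg2 adapts a Python list (rather than a tuple) to a PostgreSQL array, so a sketch of the = ANY variant that skips the tuple() conversion entirely might look like this (not tested; table and column names are taken from the question):

ids_to_delete = df["food_id"].tolist()
conn = create_conn()
with conn.cursor() as cursor:
    # A Python list is adapted to a PostgreSQL ARRAY, which = ANY(...) accepts directly.
    sql = """DELETE FROM food_recommandations.food_categorisation
             WHERE food_categorisation.food_id = ANY(%(ids)s)"""
    cursor.execute(sql, {"ids": ids_to_delete})
conn.commit()
conn.close()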

pyodbc not enough values\n (0)

cursor.execute('''CREATE TABLE PEDIDO(
    CPEDIDO INT GENERATED AS IDENTITY PRIMARY KEY,
    CCLIENTE INT NOT NULL,
    FECHA DATE NOT NULL
)''')

valores = []
for i in range(10):
    print(i)
    x = datetime.date(year=2022, month=11, day=i+1)
    valores.append((i, x))

cursor.executemany("INSERT INTO PEDIDO VALUES(?,?);", valores)  # doesn't work writing [valores] instead of valores
That results in:
pyodbc.Error: ('HY000', '[HY000] [Devart][ODBC][Oracle]ORA-00947: not enough values\n (0) (SQLExecDirectW)') #when inserting the instances
I have tried to save data in two different tuples: cclients = (...) and dates=(...) and then write:
cursor.executemany("INSERT INTO PEDIDO VALUES(?,?);", [cclients, dates]).
But that doesn't work either.
Name the columns you are inserting into (and you may not need to include the statement terminator ; in the query):
INSERT INTO PEDIDO (CCLIENTE, FECHA) VALUES (?,?)
If you do not then Oracle will expect you to provide a value for every column in the table (including CPEDIDO).
You created a table with three columns, but you only provide two values in your SQL: INSERT INTO PEDIDO VALUES(?,?). The column CPEDIDO is defined as GENERATED AS IDENTITY. This means that you can provide a value for this column, but you don't have to. But if you leave out this column your SQL statement has to be adjusted.
INSERT INTO PEDIDO (CCLIENTE, FECHA) VALUES(?,?)
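Putting that together with the question's loop, a corrected sketch might look like this (the explicit column list is the only substantive change; the trailing ; is dropped as the answers suggest):

import datetime

valores = []
for i in range(10):
    x = datetime.date(year=2022, month=11, day=i + 1)
    valores.append((i, x))

# CPEDIDO is GENERATED AS IDENTITY, so only the two remaining columns are named.
cursor.executemany("INSERT INTO PEDIDO (CCLIENTE, FECHA) VALUES (?, ?)", valores)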

Python teradatasql get column_name and data_type

I am using teradatasql to connect to the DB and get the table definition. Below is my code, which returns the definition for a table. Here I am trying to find a built-in function which returns the column_name and data_type of a table as a separate call.
with teradatasql.connect ('{"host":"whomooz","user":"guest","password":"please"}') as con:
    with con.cursor () as cur:
        try:
            sRequest = "show table MYTABLE"
            print(sRequest)
            cur.execute(sRequest)
            [print(row) for row in sorted(cur.fetchall())]
        except Exception as ex:
            print("Ignoring", str(ex).split("\n")[0])
Here I am looking for any built-in function which can return column_name and data_type.
The output should look like:
customer_name VARCHAR
address VARCHAR
type SMALLINT
I looked at the teradatasql docs but did not find any reference.
We offer a sample program that demonstrates how to prepare a SQL request and use the fake_result_sets feature to obtain result set column metadata and question-mark parameter marker metadata.
See below for example code that prepares a select * for a table in conjunction with fake_result_sets and prints the result set column metadata.
import json
import teradatasql

with teradatasql.connect ('{"host":"whomooz","user":"guest","password":"please"}') as con:
    with con.cursor () as cur:
        cur.execute ("{fn teradata_rpo(S)}{fn teradata_fake_result_sets}select * from dbc.dbcinfo")
        [print ("DatabaseName={} TableName={} ColumnName={} TypeName={}".format (m ["DatabaseName"], m ["ObjectName"], m ["Name"], m ["TypeName"])) for m in json.loads (cur.fetchone () [7])]
Prints the following:
DatabaseName=DBC TableName=dbcinfo ColumnName=InfoKey TypeName=VARCHAR
DatabaseName=DBC TableName=dbcinfo ColumnName=InfoData TypeName=VARCHAR
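Adapted to the table from the question and printing only the two fields asked for, the same pattern would look roughly like this (a sketch; MYTABLE is the placeholder table name from the question):

import json
import teradatasql

with teradatasql.connect ('{"host":"whomooz","user":"guest","password":"please"}') as con:
    with con.cursor () as cur:
        # Prepare-only request; the fake result set carries the column metadata as JSON at index 7.
        cur.execute ("{fn teradata_rpo(S)}{fn teradata_fake_result_sets}select * from MYTABLE")
        for m in json.loads (cur.fetchone () [7]):
            print (m ["Name"], m ["TypeName"])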

Psycopg2/PostgreSQL 11.9: Syntax error at or near "::" when performing string->date type cast

I am using psycopg2 to create a table partition and insert some rows into this newly created partition. The table is RANGE partitioned on a date type column.
Psycopg2 code:
conn = connect_db()
cursor = conn.cursor()
sysdate = datetime.now().date()
sysdate_str = sysdate.strftime('%Y%m%d')
schema_name = "schema_name"
table_name = "transaction_log"

# Add partition if not exists for current day
sql_add_partition = sql.SQL("""
    CREATE TABLE IF NOT EXISTS {table_partition}
    PARTITION of {table}
    FOR VALUES FROM (%(sysdate)s) TO (maxvalue);
""").format(table = sql.Identifier(schema_name, table_name),
            table_partition = sql.Identifier(schema_name, f'{table_name}_{sysdate_str}'))

print(cursor.mogrify(sql_add_partition, {'sysdate': dt.date(2015,6,30)}))
cursor.execute(sql_add_partition, {'sysdate': sysdate})
Formatted output of cursor.mogrify():
CREATE TABLE IF NOT EXISTS "schema_name"."transaction_log_20211001"
PARTITION of "schema_name"."transaction_log"
FOR VALUES FROM ('2021-10-01'::date) TO (maxvalue);
Error received:
ERROR: syntax error at or near "::"
LINE 3: for values FROM ('2021-10-01'::date) TO (maxvalue);
Interestingly enough, psycopg2 appears to be casting the string '2021-10-01' to a date object with the "::date" syntax, and according to the PostgreSQL documentation this appears to be valid (although there are no explicit examples given in the docs). Yet executing the statement, both via psycopg2 and in a PostgreSQL query editor, yields this syntax error. However, executing the following statement in a PostgreSQL SQL editor is successful:
CREATE TABLE IF NOT EXISTS "schema_name"."transaction_log_20211001"
PARTITION of "schema_name"."transaction_log"
FOR VALUES FROM ('2021-10-01') TO (maxvalue);
Any ideas on how to get psycopg2 to format the query correctly?
To follow up on @LaurenzAlbe's comment:
sql_add_partition = sql.SQL("""
    CREATE TABLE IF NOT EXISTS {table_partition}
    PARTITION of {table}
    FOR VALUES FROM (%(sysdate)s) TO (maxvalue);
""").format(table = sql.Identifier(schema_name, table_name),
            table_partition = sql.Identifier(schema_name, f'{table_name}_{sysdate_str}'))

print(cursor.mogrify(sql_add_partition, {'sysdate': '2021-10-01'}))

# OR

sql_add_partition = sql.SQL("""
    CREATE TABLE IF NOT EXISTS {table_partition}
    PARTITION of {table}
    FOR VALUES FROM ({sysdate}) TO (maxvalue);
""").format(table = sql.Identifier(schema_name, table_name),
            table_partition = sql.Identifier(schema_name, f'{table_name}_{sysdate_str}'),
            sysdate = sql.Literal('2021-10-01'))

print(cursor.mogrify(sql_add_partition))
#Formatted as
CREATE TABLE IF NOT EXISTS "schema_name"."transaction_log_20211001"
PARTITION of "schema_name"."transaction_log"
FOR VALUES FROM ('2021-10-01') TO (maxvalue);
Pass the date in as a literal value instead of a date object. psycopg2 does automatic adaptation of date(time) objects to Postgres date/timestamp types (Datetime adaptation), which is what is biting you.
UPDATE
Per my comment, the reason why it needs to be a literal is explained in the CREATE TABLE documentation:
Each of the values specified in the partition_bound_spec is a literal, NULL, MINVALUE, or MAXVALUE. Each literal value must be either a numeric constant that is coercible to the corresponding partition key column's type, or a string literal that is valid input for that type.
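Applied back to the original code, a sketch of the fix (not tested; it reuses the variables from the question) is to hand the partition bound over as a string rather than a date object:

from datetime import datetime
from psycopg2 import sql

sysdate = datetime.now().date()
sysdate_str = sysdate.strftime('%Y%m%d')

sql_add_partition = sql.SQL("""
    CREATE TABLE IF NOT EXISTS {table_partition}
    PARTITION of {table}
    FOR VALUES FROM ({sysdate}) TO (maxvalue);
""").format(table = sql.Identifier(schema_name, table_name),
            table_partition = sql.Identifier(schema_name, f'{table_name}_{sysdate_str}'),
            # A string literal renders as '2021-10-01' with no ::date cast appended.
            sysdate = sql.Literal(sysdate.isoformat()))

cursor.execute(sql_add_partition)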

How can I insert a list returned from pyodbc mssql query into mysql through stored procedure using pymysql

I am pulling data from a MSSQL db using pyodbc which returns my data set in a list. This data then needs to be transferred into a MySQL db. I have written the following stored procedure in MySQL.
CREATE DEFINER=`root`@`localhost` PROCEDURE `sp_int_pmt`(
    IN pmtamt DECIMAL(16,10),
    IN pmtdt DATETIME,
    IN propmtref VARCHAR(128),
    IN rtdinv_id INT(11)
)
BEGIN
    INSERT INTO ay_financials.payment
    (
        pmtamt,
        pmtdt,
        propmtref,
        rtdinv_id
    )
    VALUES
    (
        pmtamt,
        pmtdt,
        propmtref,
        rtdinv_id
    );
END
The procedure works fine if I am inserting one record at a time. So, for now, I am iterating over the list from my MSSQL query and calling the procedure for each record. I am using this code:
cursor = cnxn.cursor()
cursor.execute("""SELECT *
                  FROM [%s].[dbo].[pmt]
                  WHERE pmtdt BETWEEN '2018-01-01' AND '2018-12-31'""" % (database))
a = cursor.fetchmany(25)
cnxn.close()

import pymysql

# MySQL configurations
un = 'ssssssss'
pw = '****************'
db = 'ay_fnls'
h = '100.100.100.100'

conn = pymysql.connect(host=h, user=un, password=pw, db=db, cursorclass=pymysql.cursors.DictCursor)
cur = conn.cursor()
for ay in a:
    cur.callproc('sp_int_pmt', (ay.pmtamt, ay.pmtdt, ay.propmtref, ay.rtdinv_id))
conn.commit()
The problem I will have in production is that this list will contain 10,000-100,000 records every day. Iterating over that data doesn't seem like an optimized way to handle this.
How can I use the full list from the MSSQL query, call the MySQL procedure one time and insert all the relevant data?
How can I use the full list from the MSSQL query, call the MySQL procedure one time and insert all the relevant data?
You can't do that with your stored procedure as written. It will only insert one row at a time, so to insert n rows you would have to call it n times.
Also, as far as I know you can't modify the stored procedure to insert n rows without using a temporary table or some other workaround because MySQL does not support table-valued parameters to stored procedures.
You can, however, insert multiple rows at once if you use a regular INSERT statement and .executemany. pymysql will bundle the inserts into one or more multi-row inserts:
mssql_crsr = mssql_cnxn.cursor()
mssql_stmt = """\
SELECT 1 AS id, N'Alfa' AS txt
UNION ALL
SELECT 2 AS id, N'Bravo' AS txt
UNION ALL
SELECT 3 AS id, N'Charlie' AS txt
"""
mssql_crsr.execute(mssql_stmt)
mssql_rows = []
while True:
    row = mssql_crsr.fetchone()
    if row:
        mssql_rows.append(tuple(row))
    else:
        break

mysql_cnxn = pymysql.connect(host='localhost', port=3307,
                             user='root', password='_whatever_',
                             db='mydb', autocommit=True)
mysql_crsr = mysql_cnxn.cursor()
mysql_stmt = "INSERT INTO stuff (id, txt) VALUES (%s, %s)"
mysql_crsr.executemany(mysql_stmt, mssql_rows)
The above code produces the following in the MySQL general_log
190430 10:00:53 4 Connect root@localhost on mydb
4 Query INSERT INTO stuff (id, txt) VALUES (1, 'Alfa'),(2, 'Bravo'),(3, 'Charlie')
4 Quit
Note that pymysql cannot bundle calls to a stored procedure in the same way, so if you were to use
mysql_stmt = "CALL stuff_one(%s, %s)"
instead of a regular INSERT then the general_log would contain
190430 9:47:10 3 Connect root@localhost on mydb
3 Query CALL stuff_one(1, 'Alfa')
3 Query CALL stuff_one(2, 'Bravo')
3 Query CALL stuff_one(3, 'Charlie')
3 Quit
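Applied to the question's data, the same multi-row approach would look roughly like this (a sketch; it bypasses sp_int_pmt and inserts into ay_financials.payment directly, with the column names taken from the stored procedure):

import pymysql

# 'a' holds the pyodbc rows fetched from MSSQL; turn each Row into a plain tuple.
rows = [(ay.pmtamt, ay.pmtdt, ay.propmtref, ay.rtdinv_id) for ay in a]

conn = pymysql.connect(host=h, user=un, password=pw, db=db)
cur = conn.cursor()
sql = """INSERT INTO ay_financials.payment (pmtamt, pmtdt, propmtref, rtdinv_id)
         VALUES (%s, %s, %s, %s)"""
cur.executemany(sql, rows)  # pymysql batches these into multi-row INSERTs
conn.commit()
conn.close()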
