Normally, if I want to insert values into a table, I do something like this (assuming I know which columns the values belong to):
conn = sqlite3.connect('mydatabase.db')
conn.execute("INSERT INTO MYTABLE (ID,COLUMN1,COLUMN2)\
VALUES(?,?,?)",[myid,value1,value2])
But now I have a list of columns (the length of the list may vary) and a list of values, one for each column in the list.
For example, say I have a table with 10 columns (column1, column2, ..., column10) and a list of columns that I want to fill, such as [column3, column4], together with a list of values for those columns: [value for column3, value for column4].
How do I insert the values in the list into the columns they belong to?
As far as I know the parameter list in conn.execute works only for values, so we have to use string formatting like this:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a integer, b integer, c integer)')
col_names = ['a', 'b', 'c']
values = [0, 1, 2]
conn.execute('INSERT INTO t (%s, %s, %s) values(?,?,?)'%tuple(col_names), values)
Please note that this is a risky approach, since any string interpolated into SQL text should always be checked to guard against injection attacks. However, you could pass the list of column names through some validation function before insertion.
EDITED:
For lists of varying length you could try something like
exec_text = 'INSERT INTO t (' + ','.join(col_names) + ') values(' + ','.join(['?'] * len(values)) + ')'
conn.execute(exec_text, values)
# as long as len(col_names) == len(values)
Of course string formatting will work, you just need to be a bit cleverer about it.
col_names = ','.join(col_list)
col_spaces = ','.join(['?'] * len(col_list))
sql = 'INSERT INTO t (%s) values(%s)' % (col_names, col_spaces)
conn.execute(sql, values)
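To tie the two answers above together, here is a minimal sketch of one way to validate the column names before interpolating them. It assumes SQLite, and the helper name insert_row is made up for illustration; the table name is assumed to come from trusted code rather than user input.

```python
import sqlite3

def insert_row(conn, table, col_names, values):
    """Insert one row, checking column names against the table's real columns."""
    if len(col_names) != len(values):
        raise ValueError("col_names and values must have the same length")
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # `table` is assumed to be trusted (hypothetical helper, not a library API).
    known = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    unknown = [c for c in col_names if c not in known]
    if unknown:
        raise ValueError(f"unknown column(s): {unknown}")
    cols = ", ".join(f'"{c}"' for c in col_names)
    slots = ", ".join("?" * len(values))
    # Only validated identifiers go into the SQL text; values stay parameters.
    conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES ({slots})', values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a integer, b integer, c integer)")
insert_row(conn, "t", ["b", "c"], [1, 2])
rows = conn.execute("SELECT a, b, c FROM t").fetchall()
```

Columns that are not listed are simply left NULL, and a name that is not a real column of the table is rejected before any SQL is built.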
I was looking for a solution to create columns based on a list of unknown / variable length and found this question. However, I managed to find a nicer solution (for me anyway), that's also a bit more modern, so thought I'd include it in case it helps someone:
import sqlite3
def create_sql_db(my_list):
    file = 'my_sql.db'
    table_name = 'table_1'
    init_col = 'id'
    col_type = 'TEXT'
    conn = sqlite3.connect(file)
    c = conn.cursor()
    # CREATE TABLE (IF IT DOESN'T ALREADY EXIST)
    c.execute('CREATE TABLE IF NOT EXISTS {tn} ({nf} {ft})'.format(
        tn=table_name, nf=init_col, ft=col_type))
    # CREATE A COLUMN FOR EACH ITEM IN THE LIST
    for new_column in my_list:
        c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))
    # SAVE THE CHANGES BEFORE CLOSING
    conn.commit()
    conn.close()
my_list = ["Col1", "Col2", "Col3"]
create_sql_db(my_list)
All my data is of the type text, so I just have a single variable "col_type" - but you could for example feed in a list of tuples (or a tuple of tuples, if that's what you're into):
my_other_list = [("ColA", "TEXT"), ("ColB", "INTEGER"), ("ColC", "BLOB")]
and change the CREATE A COLUMN step to:
    for tupl in my_other_list:
        new_column = tupl[0]  # "ColA", "ColB", "ColC"
        col_type = tupl[1]  # "TEXT", "INTEGER", "BLOB"
        c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))
As a noob, I can't comment on the very succinct, updated solution #ron_g offered. While testing, though, I had to frequently delete the sample database itself, so for any other noobs using this to test, I would advise adding:
c.execute('DROP TABLE IF EXISTS {tn}'.format(
    tn=table_name))
prior to the 'CREATE TABLE ...' portion.
It appears there are multiple instances of
.format(
tn=table_name ....)
in both 'CREATE TABLE ...' and 'ALTER TABLE ...', so I am trying to figure out whether it's possible to create a single instance (similar to, or included in, the def section).
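One possible way to avoid repeating the .format(tn=table_name, ...) call is to move the statement building into small helpers. This is only a sketch: quote_ident and add_columns are hypothetical names, and the column types are assumed to come from trusted code.

```python
import sqlite3

def quote_ident(name):
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def add_columns(conn, table_name, columns):
    """Add each (name, type) pair as a column; types are assumed trusted."""
    for col_name, col_type in columns:
        conn.execute("ALTER TABLE {} ADD COLUMN {} {}".format(
            quote_ident(table_name), quote_ident(col_name), col_type))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS {} (id TEXT)".format(
    quote_ident("table_1")))
add_columns(conn, "table_1",
            [("ColA", "TEXT"), ("ColB", "INTEGER"), ("ColC", "BLOB")])
# Read the column names back to confirm what was created.
col_names = [row[1] for row in conn.execute('PRAGMA table_info("table_1")')]
```

The table name is now passed in once per call, and the quoting lives in a single place.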
I am currently executing the simple query below with Python, using pyodbc to insert data into a SQL Server table:
import pyodbc
table_name = 'my_table'
insert_values = [(1,2,3),(2,2,4),(3,4,5)]
cnxn = pyodbc.connect(...)
cursor = cnxn.cursor()
cursor.execute(
    ' '.join([
        'insert into',
        table_name,
        'values',
        ','.join(
            [str(i) for i in insert_values]
        )
    ])
)
cursor.commit()
This should work as long as there are no duplicate keys (let's assume the first column contains the key). However for data with duplicate keys (data already existing in the table) it will raise an error.
How can I, in one go, insert multiple rows into a SQL Server table using pyodbc such that data with duplicate keys simply gets updated?
Note: There are solutions proposed for single rows of data, however, I would like to insert multiple rows at once (avoid loops)!
This can be done using MERGE. Let's say you have a key column ID, and two columns col_a and col_b (you need to specify column names in update statements), then the statement would look like this:
MERGE INTO MyTable as Target
USING (SELECT * FROM
(VALUES (1, 2, 3), (2, 2, 4), (3, 4, 5))
AS s (ID, col_a, col_b)
) AS Source
ON Target.ID=Source.ID
WHEN NOT MATCHED THEN
INSERT (ID, col_a, col_b) VALUES (Source.ID, Source.col_a, Source.col_b)
WHEN MATCHED THEN
UPDATE SET col_a=Source.col_a, col_b=Source.col_b;
You can give it a try on rextester.com/IONFW62765.
Basically, I'm creating a Source table "on-the-fly" using the list of values, which you want to upsert. When you then merge the Source table with the Target, you can test the MATCHED condition (Target.ID=Source.ID) on each row (whereas you would be limited to a single row when just using a simple IF <exists> INSERT (...) ELSE UPDATE (...) condition).
In python with pyodbc, it should probably look like this:
import pyodbc
insert_values = [(1, 2, 3), (2, 2, 4), (3, 4, 5)]
table_name = 'my_table'
key_col = 'ID'
col_a = 'col_a'
col_b = 'col_b'
cnxn = pyodbc.connect(...)
cursor = cnxn.cursor()
cursor.execute(('MERGE INTO {table_name} as Target '
                'USING (SELECT * FROM '
                '(VALUES {vals}) '
                'AS s ({k}, {a}, {b}) '
                ') AS Source '
                'ON Target.{k}=Source.{k} '
                'WHEN NOT MATCHED THEN '
                'INSERT ({k}, {a}, {b}) VALUES (Source.{k}, Source.{a}, Source.{b}) '
                'WHEN MATCHED THEN '
                'UPDATE SET {a}=Source.{a}, {b}=Source.{b};'
                .format(table_name=table_name,
                        vals=','.join([str(i) for i in insert_values]),
                        k=key_col,
                        a=col_a,
                        b=col_b)))
cursor.commit()
You can read up more on MERGE in the SQL Server docs.
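As a hedged variation on the statement above, the values can be sent as ? parameters instead of being formatted into the SQL text. The sketch below only builds the statement and the flattened parameter list; build_merge is a made-up helper name, and the table and column names are assumed to come from trusted code. Keep in mind that SQL Server accepts at most 2100 parameters per statement, so large inputs would need to be sent in chunks.

```python
def build_merge(table_name, key_col, other_cols, rows):
    """Return (sql, params) for a multi-row MERGE using '?' placeholders."""
    cols = [key_col] + list(other_cols)
    row_slot = "(" + ", ".join("?" for _ in cols) + ")"
    values_sql = ", ".join(row_slot for _ in rows)
    col_list = ", ".join(cols)
    src_list = ", ".join(f"Source.{c}" for c in cols)
    set_list = ", ".join(f"{c}=Source.{c}" for c in other_cols)
    sql = (
        f"MERGE INTO {table_name} AS Target "
        f"USING (VALUES {values_sql}) AS Source ({col_list}) "
        f"ON Target.{key_col}=Source.{key_col} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({src_list}) "
        f"WHEN MATCHED THEN UPDATE SET {set_list};"
    )
    # Flatten the rows so the values line up with the placeholders in order.
    params = [value for row in rows for value in row]
    return sql, params

sql, params = build_merge("my_table", "ID", ["col_a", "col_b"],
                          [(1, 2, 3), (2, 2, 4), (3, 4, 5)])
# With a pyodbc cursor: cursor.execute(sql, params), then cursor.commit()
```

The key column is left out of the UPDATE SET list on purpose, since it is the column the rows are matched on.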
Following up on the existing answers here because they are potentially prone to injection attacks and it's better to use parameterized queries (for mssql/pyodbc, these are the "?" placeholders). I tweaked Alexander Novas's code slightly to use dataframe rows in a parameterized version of the query with sqlalchemy:
import numpy as np
import pandas as pd
# assuming you already have a dataframe "df" and sqlalchemy engine called "engine"
# also assumes your dataframe columns have all the same names as the existing table
table_name_to_update = 'update_table'
table_name_to_transfer = 'placeholder_table'
# the dataframe and existing table should both have a column to use as the primary key
primary_key_col = 'id'
# replace the placeholder table with the dataframe
df.to_sql(table_name_to_transfer, engine, if_exists='replace', index=False)
# building the command terms
cols_list = df.columns.tolist()
cols_list_query = f'({(", ".join(cols_list))})'
sr_cols_list = [f'Source.{i}' for i in cols_list]
sr_cols_list_query = f'({(", ".join(sr_cols_list))})'
up_cols_list = [f'{i}=Source.{i}' for i in cols_list]
up_cols_list_query = f'{", ".join(up_cols_list)}'
# fill values that should be interpreted as "NULL" with None
def fill_null(vals: list) -> tuple:
    def bad(val):
        if isinstance(val, type(pd.NA)):
            return True
        # the list of values you want to interpret as 'NULL' should be
        # tweaked to your needs
        return val in ['NULL', np.nan, 'nan', '', '', '-', '?']
    return tuple(i if not bad(i) else None for i in vals)
# create the list of parameter indicators (?, ?, ?, etc...)
# and the parameters, which are the values to be inserted
params = [fill_null(row.tolist()) for _, row in df.iterrows()]
param_slots = '('+', '.join(['?']*len(df.columns))+')'
cmd = f'''
MERGE INTO {table_name_to_update} as Target
USING (SELECT * FROM
(VALUES {param_slots})
AS s {cols_list_query}
) AS Source
ON Target.{primary_key_col}=Source.{primary_key_col}
WHEN NOT MATCHED THEN
INSERT {cols_list_query} VALUES {sr_cols_list_query}
WHEN MATCHED THEN
UPDATE SET {up_cols_list_query};
'''
# execute the command to merge tables
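with engine.begin() as conn:
    # on SQLAlchemy 1.4+ you may need conn.exec_driver_sql(cmd, params) instead,
    # since newer versions no longer accept a plain string in execute()
    conn.execute(cmd, params)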
This method is also better if you are inserting strings with characters that aren't compatible with SQL insert text (such as apostrophes which mess up the insert statement) since it lets the connection engine handle the parameterized values (which also makes it safer against SQL injection attacks).
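As a small illustration of the apostrophe point, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. That substitution is an assumption made only so the example runs without a server, and the table and column names are made up. pyodbc uses the same ? placeholder style.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")

# The apostrophe would end a hand-built '...' literal early.
name = "O'Brien"
# Passed as a parameter, the value never becomes part of the SQL text.
conn.execute("INSERT INTO people (id, name) VALUES (?, ?)", (1, name))

stored = conn.execute(
    "SELECT name FROM people WHERE id = ?", (1,)).fetchone()[0]
```

The value comes back unchanged, without any manual escaping.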
For reference, I'm creating the engine connection using this code - you'll obviously need to adapt it to your server/database/environment and whether or not you want fast_executemany:
import urllib.parse
import pyodbc
pyodbc.pooling = False
import sqlalchemy
terms = urllib.parse.quote_plus(
    'DRIVER={SQL Server Native Client 11.0};'
    'SERVER=<your server>;'
    'DATABASE=<your database>;'
    'Trusted_Connection=yes;'  # to logon using Windows credentials
)
url = f'mssql+pyodbc:///?odbc_connect={terms}'
engine = sqlalchemy.create_engine(url, fast_executemany=True)
EDIT: I realized that this code does not actually make use of the "placeholder" table at all, and is just copying values directly from the dataframe rows by way of the parameterized command.
Given a dataframe (df), I used the code from ksbg to upsert into a table. Note that I matched on two columns (date and stationcode); you can use just one. The code generates the query for any df.
def append(df, c):
    table_name = 'ddf.ddf_actuals'
    columns_list = df.columns.tolist()
    columns_list_query = f'({(",".join(columns_list))})'
    sr_columns_list = [f'Source.{i}' for i in columns_list]
    sr_columns_list_query = f'({(",".join(sr_columns_list))})'
    up_columns_list = [f'{i}=Source.{i}' for i in columns_list]
    up_columns_list_query = f'{",".join(up_columns_list)}'
    # turn each dataframe row into a "(v1, v2, ...)" group for the VALUES clause
    rows_to_insert = [row.tolist() for idx, row in df.iterrows()]
    rows_to_insert = str(rows_to_insert).replace('[', '(').replace(']', ')')[1:][:-1]
    query = f"MERGE INTO {table_name} as Target \
    USING (SELECT * FROM \
    (VALUES {rows_to_insert}) \
    AS s {columns_list_query}\
    ) AS Source \
    ON Target.stationcode=Source.stationcode AND Target.date=Source.date \
    WHEN NOT MATCHED THEN \
    INSERT {columns_list_query} VALUES {sr_columns_list_query} \
    WHEN MATCHED THEN \
    UPDATE SET {up_columns_list_query};"
    c.execute(query)
    c.commit()
I want to insert/update values from a pandas dataframe into a postgres table.
I have a unique tuple (a,b) in the postgres table. If the tuple already exists I only want to update the third value c, if the tuple doesn't exist I want to create a triple (a,b,c).
What is the most efficient way to do so? I guess some sort of bulk insert, but I am not quite sure how exactly.
You can convert your dataframe to a CTE https://www.postgresql.org/docs/current/queries-with.html and insert the data from the CTE into the table afterwards. Something like this:
import pandas as pd
def convert_df_to_cte(df):
    # wrap every value in $str$...$str$ dollar quotes, then strip the Python quotes around them
    vals = ', \n'.join([f"{tuple([f'$str${e}$str$' for e in row])}" for row in df.values])
    vals = vals.replace("'$str$", "$str$")
    vals = vals.replace("$str$'", "$str$")
    vals = vals.replace('"$str$', "$str$")
    vals = vals.replace('$str$"', "$str$")
    vals = vals.replace('$str$nan$str$', 'NULL')
    columns = ', \n'.join(df.columns)
    sql = f"""
    WITH vals AS (
    SELECT
    {columns}
    FROM
    (VALUES {vals}) AS t ({columns})
    )
    """
    return sql
df = pd.DataFrame([[1, 2, 3]], columns=['col_1', 'col_2', 'col_3'])
cte_sql = convert_df_to_cte(df)
sql_to_insert = f"""
{cte_sql}
INSERT INTO schema.table (col_1, col_2, col_3)
SELECT
col_1::integer, -- don't forget to cast to right type to avoid errors
col_2::integer, -- don't forget to cast to right type to avoid errors
col_3::character varying
FROM
vals
ON CONFLICT (col_1, col_2) DO UPDATE SET
col_3 = excluded.col_3;
"""
run_sql(sql_to_insert)  # run_sql stands for whatever helper you use to execute SQL
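The same ON CONFLICT ... DO UPDATE idea can also be written with bound parameters instead of values embedded in the SQL text. The sketch below is an illustration only: it uses Python's built-in sqlite3 module (SQLite 3.24 or newer accepts this upsert syntax) as a stand-in for Postgres so that it runs without a server, and the table t with columns a, b, c mirrors the question. With a Postgres driver such as psycopg2 the placeholders would be %s rather than ?.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The upsert needs a unique constraint on the conflict target (a, b).
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT, UNIQUE (a, b))")
conn.execute("INSERT INTO t VALUES (1, 2, 'old')")

# With pandas, rows could come from df.itertuples(index=False, name=None).
rows = [(1, 2, "new"), (3, 4, "fresh")]
upsert_sql = (
    "INSERT INTO t (a, b, c) VALUES (?, ?, ?) "
    "ON CONFLICT (a, b) DO UPDATE SET c = excluded.c"
)
# One statement, executed once per row by the driver.
conn.executemany(upsert_sql, rows)
result = conn.execute("SELECT a, b, c FROM t ORDER BY a").fetchall()
```

The existing (1, 2) row has its c value updated, while (3, 4) is inserted as a new row.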
cHandler = myDB.cursor()
cHandler.execute('select UserId,C1,LogDate from DeviceLogs_12_2019')  # data from remote sql server database
curs = connection.cursor()
curs.execute("""select * from biometric""")  # data from my database table
lst = []
result = cHandler.fetchall()
for row in result:
    lst.append(row)
lst2 = []
result2 = curs.fetchall()
for row in result2:
    lst2.append(row)
t = []
r = [elem for elem in lst if not elem in lst2]
for i in r:
    print(i)
    t.append(i)
for i in t:
    frappe.db.sql("""Insert into biometric(UserId,C1,LogDate) select '%s','%s','%s' where not exists(select * from biometric where UserID='%s' and LogDate='%s')""",(i[0],i[1],i[2],i[0],i[2]),as_dict=1)
I am trying the above code to insert data into my table if the record does not already exist, but I am getting this error:
pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '1111'',''in'',''2019-12-03 06:37:15'' where not exists(select * from biometric ' at line 1")
Is there anything I am doing wrong or any other way to achieve this?
It appears you have potentially four problems:
1. There is a from clause missing between select and where not exists.
2. When using a prepared statement you do not enclose your placeholder arguments, %s, within quotes. Your SQL should be:
Insert into biometric(UserId,C1,LogDate) select %s,%s,%s from DUAL where not exists(select * from biometric where UserID=%s and LogDate=%s)
3. Your loop:
t = []
r = [elem for elem in lst if not elem in lst2]
for i in r:
    print(i)
    t.append(i)
If you are trying to only include rows from the remote site that will not be duplicates, then you should explicitly check the two fields that matter, i.e. UserId and LogDate. But what is the point, since your SQL is taking care of making sure that you are excluding these duplicate rows? Also, what is the point of copying everything from r to t?
4. But here is the problem even with the above SQL: if the not exists clause is false, then the select %s,%s,%s from DUAL ... returns no columns and the column count will not match the number of columns you are trying to insert, namely three.
If your concern is getting an error due to duplicate keys because (UserId, LogDate) is either a UNIQUE or PRIMARY KEY, then add the IGNORE keyword on the INSERT statement and then if a row with the key already exists, the insertion will be ignored. But there is no way of knowing since you have not provided this information:
for i in t:
    frappe.db.sql("Insert IGNORE into biometric(UserId,C1,LogDate) values(%s,%s,%s)",(i[0],i[1],i[2]))
If you do not want multiple rows with the same (UserId, LogDate) combination, then you should define a UNIQUE KEY on these two columns and then the above SQL should be sufficient. There is also an ON DUPLICATE KEY UPDATE ... variation of the INSERT statement where, if the key exists, you can do an update instead (look this up).
If you don't have a UNIQUE KEY defined on these two columns or you need to print out those rows which are being updated, then you do need to test for the presence of the existing keys. But this would be the way to do it:
cHandler = myDB.cursor()
cHandler.execute('select UserId,C1,LogDate from DeviceLogs_12_2019')  # data from remote sql server database
rows = cHandler.fetchall()
curs = connection.cursor()
for row in rows:
    curs.execute("select UserId from biometric where UserId=%s and LogDate=%s", (row[0], row[2]))  # row already in biometric table?
    biometric_row = curs.fetchone()
    if biometric_row is None:  # no, it is not
        print(row)
        frappe.db.sql("Insert into biometric(UserId,C1,LogDate) values(%s, %s, %s)", (row[0],row[1],row[2]))
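If you would rather filter in Python than run one SELECT per row, the key-based check on UserId and LogDate described earlier in this answer could look roughly like the sketch below. The function name rows_missing_locally and the sample rows are made up for illustration, and both inputs are assumed to be sequences of (UserId, C1, LogDate) tuples.

```python
def rows_missing_locally(remote_rows, local_rows):
    """Return remote rows whose (UserId, LogDate) key is absent locally.

    Both inputs are assumed to be sequences of (UserId, C1, LogDate) tuples.
    """
    existing = {(row[0], row[2]) for row in local_rows}
    missing = []
    for row in remote_rows:
        key = (row[0], row[2])
        if key not in existing:
            missing.append(row)
            existing.add(key)  # also skip duplicates within the remote batch
    return missing

# Made-up sample data in the shape (UserId, C1, LogDate).
remote = [(1111, "in", "2019-12-03 06:37:15"),
          (1111, "out", "2019-12-03 17:02:40"),
          (2222, "in", "2019-12-03 06:40:01")]
local = [(1111, "in", "2019-12-03 06:37:15")]
to_insert = rows_missing_locally(remote, local)
```

A set lookup keeps the comparison to the two fields that matter and avoids scanning the whole local list for every remote row.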
I have a very big dictionary that I want to insert into a MySQL table. The dictionary keys are the column names in the table. I'm constructing my query like this as of now:
bigd = {'k1':'v1', 'k2':10}
cols = str(bigd.keys()).strip('[]')
vals = str(bigd.values()).strip('[]')
query = "INSERT INTO table ({}) values ({})".format(cols,vals)
print query
Output:
"INSERT INTO table ('k2', 'k1') values (10, 'v1')"
And this works in Python2.7
But in Python 3.6, if I use an f-string like this:
query = f"INSERT INTO table ({cols}) values ({vals})"
print(query)
It prints this:
"INSERT INTO table (dict_keys(['k1', 'k2'])) values (dict_values(['v1', 10]))"
Any tips?
For the curious: in Python 3, keys() and values() return view objects, so casting them to str gives you the representation of dict_keys/dict_values, and that is what gets inserted into the f-string.
You could just cast to tuples and then insert:
cols = tuple(bigd.keys())
vals = tuple(bigd.values())
q = f"INSERT INTO table {cols} values {vals}"
but, as the comment notes, this isn't a safe approach.
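A safer variant, sketched here under some assumptions, keeps the values out of the SQL text and passes them as parameters. The helper name build_insert is made up, the column names are assumed to come from trusted code, and %s is the placeholder style used by common MySQL drivers such as pymysql.

```python
def build_insert(table, data, placeholder="%s"):
    """Return (sql, params) for inserting one dict as a row.

    Column names are assumed to come from trusted code; only the values
    are passed as parameters.
    """
    cols = list(data)
    # Backticks are MySQL identifier quotes (needed here: `table` is reserved).
    col_sql = ", ".join(f"`{c}`" for c in cols)
    slot_sql = ", ".join(placeholder for _ in cols)
    sql = f"INSERT INTO `{table}` ({col_sql}) VALUES ({slot_sql})"
    return sql, [data[c] for c in cols]

bigd = {'k1': 'v1', 'k2': 10}
query, params = build_insert("table", bigd)
# With a MySQL DB-API cursor: cursor.execute(query, params)
```

Because the values are taken in the same order as the keys, each placeholder lines up with its column.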