I am currently executing the simple query below in Python, using pyodbc, to insert data into a SQL Server table:
import pyodbc
table_name = 'my_table'
insert_values = [(1,2,3),(2,2,4),(3,4,5)]
cnxn = pyodbc.connect(...)
cursor = cnxn.cursor()
cursor.execute(
' '.join([
'insert into',
table_name,
'values',
','.join(
[str(i) for i in insert_values]
)
])
)
cursor.commit()
This works as long as there are no duplicate keys (let's assume the first column contains the key). However, for rows whose key already exists in the table, it raises an error.
How can I insert multiple rows into a SQL Server table in one go using pyodbc, such that rows with duplicate keys are simply updated?
Note: There are solutions proposed for single rows of data; however, I would like to insert multiple rows at once (avoiding loops)!
This can be done using MERGE. Let's say you have a key column ID, and two columns col_a and col_b (you need to specify column names in update statements), then the statement would look like this:
MERGE INTO MyTable as Target
USING (SELECT * FROM
(VALUES (1, 2, 3), (2, 2, 4), (3, 4, 5))
AS s (ID, col_a, col_b)
) AS Source
ON Target.ID=Source.ID
WHEN NOT MATCHED THEN
INSERT (ID, col_a, col_b) VALUES (Source.ID, Source.col_a, Source.col_b)
WHEN MATCHED THEN
UPDATE SET col_a=Source.col_a, col_b=Source.col_b;
You can give it a try on rextester.com/IONFW62765.
Basically, I'm creating a Source table "on-the-fly" using the list of values, which you want to upsert. When you then merge the Source table with the Target, you can test the MATCHED condition (Target.ID=Source.ID) on each row (whereas you would be limited to a single row when just using a simple IF <exists> INSERT (...) ELSE UPDATE (...) condition).
In python with pyodbc, it should probably look like this:
import pyodbc
insert_values = [(1, 2, 3), (2, 2, 4), (3, 4, 5)]
table_name = 'my_table'
key_col = 'ID'
col_a = 'col_a'
col_b = 'col_b'
cnxn = pyodbc.connect(...)
cursor = cnxn.cursor()
cursor.execute(('MERGE INTO {table_name} as Target '
                'USING (SELECT * FROM '
                '(VALUES {vals}) '
                'AS s ({k}, {a}, {b}) '
                ') AS Source '
                # match on the key column
                'ON Target.{k}=Source.{k} '
                'WHEN NOT MATCHED THEN '
                'INSERT ({k}, {a}, {b}) VALUES (Source.{k}, Source.{a}, Source.{b}) '
                'WHEN MATCHED THEN '
                # update only the non-key columns
                'UPDATE SET {a}=Source.{a}, {b}=Source.{b};'
                .format(table_name=table_name,
                        vals=','.join([str(i) for i in insert_values]),
                        k=key_col,
                        a=col_a,
                        b=col_b)))
cursor.commit()
You can read up more on MERGE in the SQL Server docs.
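The statement above interpolates the values straight into the SQL text. As a minimal sketch of the same multi-row MERGE with "?" placeholders instead, the helper below only builds the SQL string and a flat parameter list. The name build_merge and its arguments are hypothetical, and the table and column names are assumed to come from trusted code, since identifiers cannot be passed as parameters. It uses the shorter USING (VALUES ...) AS Source (...) form rather than the nested SELECT.

```python
def build_merge(table_name, key_col, other_cols, rows):
    """Return (sql, params) for a multi-row MERGE using '?' placeholders.

    Hypothetical helper: identifiers are interpolated and must be trusted.
    """
    cols = [key_col] + list(other_cols)
    # one "(?, ?, ?)" group per row
    row_slot = "(" + ", ".join("?" for _ in cols) + ")"
    values = ", ".join(row_slot for _ in rows)
    col_list = ", ".join(cols)
    source_list = ", ".join(f"Source.{c}" for c in cols)
    # only non-key columns are updated
    update_list = ", ".join(f"{c}=Source.{c}" for c in other_cols)
    sql = (
        f"MERGE INTO {table_name} AS Target "
        f"USING (VALUES {values}) AS Source ({col_list}) "
        f"ON Target.{key_col}=Source.{key_col} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({source_list}) "
        f"WHEN MATCHED THEN UPDATE SET {update_list};"
    )
    # flatten the rows so the values line up with the placeholders
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_merge(
    "my_table", "ID", ["col_a", "col_b"], [(1, 2, 3), (2, 2, 4), (3, 4, 5)]
)
# With an open pyodbc cursor (not created here):
#   cursor.execute(sql, params)
#   cursor.commit()
```

Keep in mind that SQL Server accepts at most 2100 parameters per statement, so large batches would have to be split into chunks.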
Following up on the existing answers here because they are potentially prone to injection attacks and it's better to use parameterized queries (for mssql/pyodbc, these are the "?" placeholders). I tweaked Alexander Novas's code slightly to use dataframe rows in a parameterized version of the query with sqlalchemy:
# assuming you already have a dataframe "df" and sqlalchemy engine called "engine"
# also assumes your dataframe columns have all the same names as the existing table
table_name_to_update = 'update_table'
table_name_to_transfer = 'placeholder_table'
# the dataframe and existing table should both have a column to use as the primary key
primary_key_col = 'id'
# replace the placeholder table with the dataframe
df.to_sql(table_name_to_transfer, engine, if_exists='replace', index=False)
# building the command terms
cols_list = df.columns.tolist()
cols_list_query = f'({(", ".join(cols_list))})'
sr_cols_list = [f'Source.{i}' for i in cols_list]
sr_cols_list_query = f'({(", ".join(sr_cols_list))})'
up_cols_list = [f'{i}=Source.{i}' for i in cols_list]
up_cols_list_query = f'{", ".join(up_cols_list)}'
# fill values that should be interpreted as "NULL" with None
def fill_null(vals: list) -> tuple:
    def bad(val):
        # pd.isna covers None, NaN and pd.NA
        if pd.isna(val):
            return True
        # the list of values you want to interpret as 'NULL' should be
        # tweaked to your needs
        return val in ['NULL', 'nan', '', '-', '?']
    return tuple(i if not bad(i) else None for i in vals)
# create the list of parameter indicators (?, ?, ?, etc...)
# and the parameters, which are the values to be inserted
params = [fill_null(row.tolist()) for _, row in df.iterrows()]
param_slots = '('+', '.join(['?']*len(df.columns))+')'
cmd = f'''
MERGE INTO {table_name_to_update} as Target
USING (SELECT * FROM
(VALUES {param_slots})
AS s {cols_list_query}
) AS Source
ON Target.{primary_key_col}=Source.{primary_key_col}
WHEN NOT MATCHED THEN
INSERT {cols_list_query} VALUES {sr_cols_list_query}
WHEN MATCHED THEN
UPDATE SET {up_cols_list_query};
'''
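# execute the command to merge tables
# (SQLAlchemy 2.0 no longer accepts a plain string here;
# use conn.exec_driver_sql(cmd, params) instead)
with engine.begin() as conn:
    conn.execute(cmd, params)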
This method is also better if you are inserting strings with characters that aren't compatible with SQL insert text (such as apostrophes which mess up the insert statement) since it lets the connection engine handle the parameterized values (which also makes it safer against SQL injection attacks).
For reference, I'm creating the engine connection using this code - you'll obviously need to adapt it to your server/database/environment and whether or not you want fast_executemany:
import urllib.parse
import pyodbc
import sqlalchemy
pyodbc.pooling = False
terms = urllib.parse.quote_plus(
    'DRIVER={SQL Server Native Client 11.0};'
    'SERVER=<your server>;'
    'DATABASE=<your database>;'
    'Trusted_Connection=yes;'  # to log on using Windows credentials
)
url = f'mssql+pyodbc:///?odbc_connect={terms}'
engine = sqlalchemy.create_engine(url, fast_executemany=True)
EDIT: I realized that this code does not actually make use of the "placeholder" table at all, and is just copying values directly from the dataframe rows by way of the parameterized command.
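If you do want the placeholder table to do the work, one option is to let the MERGE read from it directly, so no row values travel through the command at all. The sketch below only builds that statement. The helper name build_staging_merge is hypothetical, and it assumes the staging table has the same column names as the target and at least one non-key column.

```python
def build_staging_merge(target, staging, key_col, columns):
    """Build a MERGE that upserts from a staging table into the target.

    Hypothetical helper: all identifiers are interpolated and must be trusted.
    """
    non_key = [c for c in columns if c != key_col]
    col_list = ", ".join(columns)
    source_list = ", ".join(f"Source.{c}" for c in columns)
    # the key column is left out of the UPDATE
    update_list = ", ".join(f"{c}=Source.{c}" for c in non_key)
    return (
        f"MERGE INTO {target} AS Target "
        f"USING {staging} AS Source "
        f"ON Target.{key_col}=Source.{key_col} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({source_list}) "
        f"WHEN MATCHED THEN UPDATE SET {update_list};"
    )


merge_sql = build_staging_merge(
    "update_table", "placeholder_table", "id", ["id", "col_a", "col_b"]
)
# After df.to_sql(...) has filled placeholder_table, something like
#   with engine.begin() as conn:
#       conn.exec_driver_sql(merge_sql)
# would run the merge (engine is assumed to exist already).
```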
Given a dataframe (df), I used the code from ksbg to upsert into a table. Note that I matched on two columns (date and stationcode); you can use just one. The code generates the query for any df.
def append(df, c):
    # c is an open pyodbc cursor
    table_name = 'ddf.ddf_actuals'
    columns_list = df.columns.tolist()
    columns_list_query = f'({(",".join(columns_list))})'
    sr_columns_list = [f'Source.{i}' for i in columns_list]
    sr_columns_list_query = f'({(",".join(sr_columns_list))})'
    up_columns_list = [f'{i}=Source.{i}' for i in columns_list]
    up_columns_list_query = f'{",".join(up_columns_list)}'
    rows_to_insert = [row.tolist() for idx, row in df.iterrows()]
    # turn "[[1, 2], [3, 4]]" into "(1, 2), (3, 4)"
    rows_to_insert = str(rows_to_insert).replace('[', '(').replace(']', ')')[1:][:-1]
    query = f"MERGE INTO {table_name} as Target \
        USING (SELECT * FROM \
        (VALUES {rows_to_insert}) \
        AS s {columns_list_query}\
        ) AS Source \
        ON Target.stationcode=Source.stationcode AND Target.date=Source.date \
        WHEN NOT MATCHED THEN \
        INSERT {columns_list_query} VALUES {sr_columns_list_query} \
        WHEN MATCHED THEN \
        UPDATE SET {up_columns_list_query};"
    c.execute(query)
    c.commit()
I'm stuck on how to insert a column that contains lists into a PostgreSQL database. I know it is theoretically possible, because PostgreSQL has array datatypes such as BIGINT[], which other SQL variants lack.
Here is my code:
import datetime
import json
import pandas as pd
import pymysql.cursors
from sqlalchemy import create_engine
mock_data = {}
mock_data['a'] = 'HELLO'
mock_data['b'] = {
'c' : {
'd' : True,
# NOTE: If you replace this with 'e' : '', instead of the list, it works fine.
'e' : []
},
'f' : 'TESTING'
}
df = pd.json_normalize(mock_data)
engine = create_engine("postgresql://postgres:ABCDEFG@localhost:5432/testing")
con = engine.connect()
table_name = 'testing-db'
try:
    frame = df.to_sql(con=con, name=table_name, index=False, if_exists='replace')
    display(frame)
except ValueError as vx:
    print(vx)
except Exception as ex:
    print(ex)
else:
    print("Table %s created successfully." % table_name)
finally:
    con.close()
The code above fails, due to 'e' : []. Python/Pandas doesn't report a failure, but I can't see the table being updated in Postgres. However, if you changed the list to an empty string, like this: 'e' : ''
The postgres database is updated. I can't figure out how to insert a list into a Postgres database with Pandas. Any help would be much appreciated.
I figured out the solution. I had to pass the dtype argument of to_sql to make it work. Here are the changes I had to make:
from sqlalchemy import types
dtypes = {
    'a' : types.TEXT(),
    'b.c.d' : types.BOOLEAN(),
    'b.c.e' : types.ARRAY(types.BIGINT()),
    'b.f' : types.TEXT()
}
...
frame = df.to_sql(con=con,
                  name=table_name,
                  index=False,
                  dtype=dtypes,
                  if_exists='replace')
Storing a list in a single cell of a SQL table breaks normalization, because first normal form expects each cell to hold one atomic value.
What you can do instead is one of the following:
Convert your list to a JSON string (or XML).
Create a new table that holds the elements in individual cells and refers to the original table.
For example, if you have a list field of fixed length:
list_field = [Element_A, Element_B, Element_C]
You can create another table, list_data, with 4 columns (1 primary key and 3 element columns). Store your list data there and reference it from your original table. This gives you a far more efficient handle on the data values and is the traditional way of doing it.
However, if your lists vary in length, it is far more efficient to just serialize each list to JSON and store it in a JSON or string column. Remember that doing so means you will have to parse the value on each fetch to get the data you want.
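A minimal sketch of the JSON approach is shown here with the standard-library sqlite3 module so that it runs without a server. The table name items and its columns are made up for illustration, and the same round trip should apply to a TEXT column in other databases.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, list_field TEXT)")

list_field = ["Element_A", "Element_B", "Element_C"]

# serialize the list to a JSON string before storing it
conn.execute(
    "INSERT INTO items (id, list_field) VALUES (?, ?)",
    (1, json.dumps(list_field)),
)

# every fetch has to parse the string back into a list
(stored,) = conn.execute(
    "SELECT list_field FROM items WHERE id = ?", (1,)
).fetchone()
restored = json.loads(stored)
```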
I'm trying to write a function that inserts a row into a SQLite3 database from a dictionary.
I found a way to do that here on SO, but unfortunately it does not work. There is some problem I can't figure out.
def insert_into_table(self, data):
    for key in data.keys():  # ADDING COLUMNS IF NECESSARY
        columns = self.get_column_names()
        column = key.replace(' ', '_')
        if column not in columns:
            self.cur.execute("""ALTER TABLE vsetkyfirmy ADD COLUMN {} TEXT""".format(column.encode('utf-8')))
            self.conn.commit()
    new_data = {}
    for v, k in data.iteritems():  # new dictionary with renamed keys (column = key.replace(' ', '_'))
        new_data[self.remake_name(v)] = k
    columns = ', '.join(new_data.keys())
    placeholders = ':' + ', :'.join(new_data.keys())
    query = 'INSERT INTO vsetkyfirmy (%s) VALUES (%s)' % (columns, placeholders)
    self.cur.execute(query, new_data)
    self.conn.commit()
EXCEPTION:
self.cur.execute(query, new_data)
sqlite3.ProgrammingError: You did not supply a value for binding 1.
When I print query and new_data everything seems correct:
INSERT INTO vsetkyfirmy (Obchodné_meno, IČ_DPH, Sídlo, PSČ, Spoločník, IČO, Základné_imanie, Konateľ, Ročný_obrat, Dátum_vzniku, Právna_forma) VALUES (:Obchodné_meno, :IČ_DPH, :Sídlo, :PSČ, :Spoločník, :IČO, :Základné_imanie, :Konateľ, :Ročný_obrat, :Dátum_vzniku, :Právna_forma)
{u'Obchodn\xe9_meno': 'PRspol. s r.o.', u'I\u010c_DPH': 'S9540', u'S\xeddlo': u'Bansk\xe1 Bystrica, Orembursk\xe1 2', u'PS\u010c': '97401', u'Spolo\u010dn\xedk': u'Dana Dzurianikov\xe1', u'I\u010cO': '3067', u'Z\xe1kladn\xe9_imanie': u'142899 \u20ac', u'Konate\u013e': 'Miroslav Dz', u'Ro\u010dn\xfd_obrat': '2014: 482 EUR', u'D\xe1tum_vzniku': '01.12.1991 ', u'Pr\xe1vna_forma': u'Spolo\u010dnos\u0165 s ru\u010den\xedm obmedzen\xfdm'}
EDIT: So I've tried to remove ':' from query so it looks like:
INSERT INTO vsetkyfirmy (Obchodné_meno, IČ_DPH, Sídlo, PSČ, Spoločník, IČO, Základné_imanie, Konateľ, Ročný_obrat, Dátum_vzniku, Právna_forma) VALUES (Obchodné_meno, IČ_DPH, Sídlo, PSČ, Spoločník, IČO, Základné_imanie, Konateľ, Ročný_obrat, Dátum_vzniku, Právna_forma)
And it returns that sqlite3.OperationalError: no such column: Obchodné_meno
I don't know where the problem is. Could it be the encoding?
You are calling encode('utf-8') when adding the columns, but not when inserting.
SQLite indeed uses UTF-8, but the sqlite3 module automatically handles conversion from/to Python's internal Unicode string encoding. Don't try to reencode manually.
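As a rough sketch of what a dictionary insert could look like on Python 3 without any manual encoding, the helper below quotes the column names and binds the values with positional "?" placeholders, which sidesteps named-parameter matching for non-ASCII keys. The function name insert_row is hypothetical, the two-column table is a cut-down stand-in for the one in the question, and the dictionary keys are assumed to be existing column names.

```python
import sqlite3


def insert_row(conn, table, row):
    """Insert one dict as a row; keys are column names, values are bound with '?'."""
    columns = list(row)
    # double any embedded quote and wrap each identifier in double quotes
    quoted = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    slots = ", ".join("?" for _ in columns)
    sql = 'INSERT INTO "{}" ({}) VALUES ({})'.format(table, quoted, slots)
    conn.execute(sql, [row[c] for c in columns])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE vsetkyfirmy ("Obchodné_meno" TEXT, "IČO" TEXT)')
insert_row(conn, "vsetkyfirmy", {"Obchodné_meno": "PRspol. s r.o.", "IČO": "3067"})
rows = conn.execute('SELECT "Obchodné_meno", "IČO" FROM vsetkyfirmy').fetchall()
```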
Normally, if I want to insert values into a table, I do something like this (assuming I know which columns the values belong to):
conn = sqlite3.connect('mydatabase.db')
conn.execute("INSERT INTO MYTABLE (ID,COLUMN1,COLUMN2) \
    VALUES(?,?,?)", [myid, value1, value2])
But now I have a list of columns (its length may vary) and a list of values, one for each column in the list.
For example, suppose I have a table with 10 columns (column1, column2, ..., column10), a list of columns that I want to update, say [column3, column4], and a list of values for those columns: [value for column3, value for column4].
How do I insert the values in the list into the columns they belong to?
As far as I know the parameter list in conn.execute works only for values, so we have to use string formatting like this:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a integer, b integer, c integer)')
col_names = ['a', 'b', 'c']
values = [0, 1, 2]
conn.execute('INSERT INTO t (%s, %s, %s) values(?,?,?)'%tuple(col_names), values)
Please note that this is a risky approach, since any string interpolated into SQL must be checked for injection attacks. You could, however, pass the list of column names through a validation function before building the statement.
EDITED:
For lists of varying length you could try something like
exec_text = 'INSERT INTO t (' + ','.join(col_names) + ') values(' + ','.join(['?'] * len(values)) + ')'
conn.execute(exec_text, values)
# as long as len(col_names) == len(values)
Of course string formatting will work, you just need to be a bit cleverer about it.
col_names = ','.join(col_list)
col_spaces = ','.join(['?'] * len(col_list))
sql = 'INSERT INTO t (%s) values(%s)' % (col_names, col_spaces)
conn.execute(sql, values)
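Both snippets still interpolate the column names into the SQL text. One way to make that safer, sketched below, is to compare the names against the table's actual columns (read with PRAGMA table_info) before building the statement. The function name insert_checked is hypothetical, and the table name itself is assumed to come from trusted code.

```python
import sqlite3


def insert_checked(conn, table, col_names, values):
    """Insert values into the given columns after validating the column names."""
    # PRAGMA table_info yields one row per column; index 1 is the column name
    known = {row[1] for row in conn.execute('PRAGMA table_info("{}")'.format(table))}
    unknown = [c for c in col_names if c not in known]
    if unknown:
        raise ValueError("unknown columns: {}".format(unknown))
    if len(col_names) != len(values):
        raise ValueError("col_names and values must have the same length")
    cols = ", ".join('"{}"'.format(c) for c in col_names)
    slots = ", ".join(["?"] * len(values))
    conn.execute('INSERT INTO "{}" ({}) VALUES ({})'.format(table, cols, slots), values)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a integer, b integer, c integer)")
insert_checked(conn, "t", ["b", "c"], [1, 2])
rows = conn.execute("SELECT a, b, c FROM t").fetchall()
```

Columns that are not listed keep their default, so column a ends up NULL in this example.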
I was looking for a solution to create columns based on a list of unknown / variable length and found this question. However, I managed to find a nicer solution (for me anyway), that's also a bit more modern, so thought I'd include it in case it helps someone:
import sqlite3
def create_sql_db(my_list):
    file = 'my_sql.db'
    table_name = 'table_1'
    init_col = 'id'
    col_type = 'TEXT'
    conn = sqlite3.connect(file)
    c = conn.cursor()
    # CREATE TABLE (IF IT DOESN'T ALREADY EXIST)
    c.execute('CREATE TABLE IF NOT EXISTS {tn} ({nf} {ft})'.format(
        tn=table_name, nf=init_col, ft=col_type))
    # CREATE A COLUMN FOR EACH ITEM IN THE LIST
    for new_column in my_list:
        c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))
    conn.close()
my_list = ["Col1", "Col2", "Col3"]
create_sql_db(my_list)
All my data is of the type text, so I just have a single variable "col_type" - but you could for example feed in a list of tuples (or a tuple of tuples, if that's what you're into):
my_other_list = [("ColA", "TEXT"), ("ColB", "INTEGER"), ("ColC", "BLOB")]
and change the CREATE A COLUMN step to:
for tupl in my_other_list:
    new_column = tupl[0]  # "ColA", "ColB", "ColC"
    col_type = tupl[1]  # "TEXT", "INTEGER", "BLOB"
    c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
        tn=table_name, cn=new_column, ct=col_type))
As a noob, I can't comment on the very succinct, updated solution @ron_g offered. While testing, though, I had to frequently delete the sample database itself, so for any other noobs using this to test, I would advise adding:
c.execute('DROP TABLE IF EXISTS {tn}'.format(
    tn=table_name))
prior to the 'CREATE TABLE ...' portion.
There appear to be multiple instances of
.format(
    tn=table_name ....)
in both 'CREATE TABLE ...' and 'ALTER TABLE ...', so I am trying to figure out whether it's possible to use a single instance (similar to, or included in, the def section).
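One possible way to avoid repeating the .format(tn=table_name, ...) call is a small inner helper that fills in the shared fields once. This is only a sketch: the helper name run is made up, the function takes the connection as an argument instead of opening a file, and it uses an in-memory database so it can be rerun freely.

```python
import sqlite3


def create_sql_db(conn, table_name, columns, init_col="id", col_type="TEXT"):
    c = conn.cursor()

    def run(template, **extra):
        # the table name and column type are supplied here, in one place;
        # str.format ignores keyword arguments a template does not use
        c.execute(template.format(tn=table_name, ct=col_type, **extra))

    run('DROP TABLE IF EXISTS {tn}')
    run('CREATE TABLE IF NOT EXISTS {tn} ({nf} {ct})', nf=init_col)
    for new_column in columns:
        run('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}', cn=new_column)
    conn.commit()


conn = sqlite3.connect(":memory:")
create_sql_db(conn, "table_1", ["Col1", "Col2", "Col3"])
# index 1 of each PRAGMA table_info row is the column name
column_names = [row[1] for row in conn.execute("PRAGMA table_info(table_1)")]
```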