Python - creating string from dict items (for writing to PostgreSQL db)

I'm writing some code using psycopg2 to connect to a PostgreSQL database.
I have a lot of different data types that I want to write to different tables in my PostgreSQL database. I am trying to write a function that can write to each of the tables based on a single variable passed to the function, and I want to write more than one row at a time to optimize my query. Luckily PostgreSQL allows me to do that:
INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');
I have run into a problem that I was hoping someone could help me with.
I need to create a string:
string1 = (value11, value21, value31), (value12, value22, value32)
The string1 variable will be created by using a dictionary with values. So far I have been able to create a tuple that is close to the structure I want. I have a list of dictionaries. The list is called rows:
string1 = tuple([tuple([value for value in row.values()]) for row in rows])
To test it I have created the following small rows variable:
rows = [{'id': 1, 'test1': 'something', 'test2': 123},
        {'id': 2, 'test1': 'somethingelse', 'test2': 321}]
When rows is passed through the above piece of code string1 becomes as follows:
((1, 'something', 123), (2, 'somethingelse', 321))
As seen with string1, I just need to remove the outermost parentheses and turn it into a string for it to be as I need it. So far I don't know how this is done. So my question to you is: "How do I format string1 to have my required format?"

execute_values makes it much easier. Pass the dict sequence in instead of a values sequence:
import psycopg2, psycopg2.extras

rows = [
    {'id': 1, 'test1': 'something', 'test2': 123},
    {'id': 2, 'test1': 'somethingelse', 'test2': 321}
]

conn = psycopg2.connect(database='cpn')
cursor = conn.cursor()

insert_query = 'insert into t (id, test1, test2) values %s'
psycopg2.extras.execute_values(
    cursor, insert_query, rows,
    template='(%(id)s, %(test1)s, %(test2)s)',
    page_size=100
)
And the values are inserted:
table t;
id | test1 | test2
----+---------------+-------
1 | something | 123
2 | somethingelse | 321
To have the number of affected rows use a CTE:
insert_query = '''
    with i as (
        insert into t (id, test1, test2) values %s
        returning *
    )
    select count(*) from i
'''
psycopg2.extras.execute_values(
    cursor, insert_query, rows,
    template='(%(id)s, %(test1)s, %(test2)s)',
    page_size=100
)
row_count = cursor.fetchone()[0]
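As with any psycopg2 statement, remember that nothing is persisted until the transaction is committed (autocommit is off by default):
conn.commit()   # persist the inserted rows
cursor.close()
conn.close()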

With a little modification you can achieve this.
Change your piece of code as follows:
','.join([tuple([value for value in row.values()]).__repr__() for row in rows])
The current output is a tuple of tuples:
(('something', 123, 1), ('somethingelse', 321, 2))
After the change the output will be a string in the format you want:
"('something', 123, 1),('somethingelse', 321, 2)"

The solution that you described is not so good because it may potentially harm your database: it does not take care of escaping strings, etc., so SQL injection is possible.
Fortunately, psycopg (and psycopg2) cursors have execute and mogrify methods that will properly do all this work for you:
import contextlib

with contextlib.closing(db_connection.cursor()) as cursor:
    values = [cursor.mogrify('(%(id)s, %(test1)s, %(test2)s)', row) for row in rows]
    query = 'INSERT INTO films (id, test1, test2) VALUES {0};'.format(', '.join(values))
    cursor.execute(query)
For Python 3 (where mogrify returns bytes):
import contextlib

with contextlib.closing(db_connection.cursor()) as cursor:
    values = [cursor.mogrify('(%(id)s, %(test1)s, %(test2)s)', row) for row in rows]
    query_bytes = b'INSERT INTO films (id, test1, test2) VALUES ' + b', '.join(values) + b';'
    cursor.execute(query_bytes)

Related

syntax error when trying to insert list of tuples using psycopg2 [duplicate]

I need to insert multiple rows with one query (number of rows is not constant), so I need to execute query like this one:
INSERT INTO t (a, b) VALUES (1, 2), (3, 4), (5, 6);
The only way I know is
args = [(1,2), (3,4), (5,6)]
args_str = ','.join(cursor.mogrify("%s", (x, )) for x in args)
cursor.execute("INSERT INTO t (a, b) VALUES "+args_str)
but I want some simpler way.
I built a program that inserts multiple rows to a server located in another city.
I found out that using this method was about 10 times faster than executemany. In my case tup is a tuple containing about 2000 rows. It took about 10 seconds when using this method:
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
and 2 minutes when using this method:
cur.executemany("INSERT INTO table VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s)", tup)
New execute_values method in Psycopg 2.7:
data = [(1,'x'), (2,'y')]
insert_query = 'insert into t (a, b) values %s'
psycopg2.extras.execute_values(
    cursor, insert_query, data, template=None, page_size=100
)
The pythonic way of doing it in Psycopg 2.6:
data = [(1,'x'), (2,'y')]
records_list_template = ','.join(['%s'] * len(data))
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
cursor.execute(insert_query, data)
Explanation: If the data to be inserted is given as a list of tuples like in
data = [(1,'x'), (2,'y')]
then it is already in the exact required format as
the values syntax of the insert clause expects a list of records as in
insert into t (a, b) values (1, 'x'),(2, 'y')
Psycopg adapts a Python tuple to a Postgresql record.
The only necessary work is to provide a records list template to be filled by psycopg
# We use the data list to be sure of the template length
records_list_template = ','.join(['%s'] * len(data))
and place it in the insert query
insert_query = 'insert into t (a, b) values {}'.format(records_list_template)
Printing the insert_query outputs
insert into t (a, b) values %s,%s
Now to the usual Psycopg arguments substitution
cursor.execute(insert_query, data)
Or just testing what will be sent to the server
print (cursor.mogrify(insert_query, data).decode('utf8'))
Output:
insert into t (a, b) values (1, 'x'),(2, 'y')
Update with psycopg2 2.7:
The classic executemany() is about 60 times slower than @ant32's implementation (called "folded") as explained in this thread: https://www.postgresql.org/message-id/20170130215151.GA7081%40deb76.aryehleib.com
This implementation was added to psycopg2 in version 2.7 and is called execute_values():
from psycopg2.extras import execute_values

execute_values(cur,
    "INSERT INTO test (id, v1, v2) VALUES %s",
    [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
Previous Answer:
To insert multiple rows, using the multirow VALUES syntax with execute() is about 10x faster than using psycopg2 executemany(). Indeed, executemany() just runs many individual INSERT statements.
@ant32's code works perfectly in Python 2. But in Python 3, cursor.mogrify() returns bytes, cursor.execute() takes either bytes or strings, and ','.join() expects str instances.
So in Python 3 you may need to modify @ant32's code by adding .decode('utf-8'):
args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x).decode('utf-8') for x in tup)
cur.execute("INSERT INTO table VALUES " + args_str)
Or by using bytes (with b'' or b"") only:
args_bytes = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_bytes)
cursor.copy_from is the fastest solution I've found for bulk inserts by far. Here's a gist I made containing a class named IteratorFile which allows an iterator yielding strings to be read like a file. We can convert each input record to a string using a generator expression. So the solution would be
args = [(1,2), (3,4), (5,6)]
f = IteratorFile(("{}\t{}".format(x[0], x[1]) for x in args))
cursor.copy_from(f, 'table_name', columns=('a', 'b'))
For this trivial size of args it won't make much of a speed difference, but I see big speedups when dealing with thousands of rows or more. It will also be more memory efficient than building a giant query string: an iterator will only ever hold one input record in memory at a time, whereas building the query string can at some point exhaust memory in your Python process or in Postgres.
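The gist itself is not reproduced in this answer; a minimal sketch of what such a class could look like (an assumption for illustration, not the linked gist) is:
import io

class IteratorFile(io.TextIOBase):
    # Minimal sketch: expose an iterator of tab-separated text lines as a
    # readable file object for cursor.copy_from().
    def __init__(self, it):
        self._it = iter(it)
        self._buf = ''

    def readable(self):
        return True

    def read(self, size=-1):
        # Pull rows from the iterator until `size` characters are buffered
        # (or everything, when size is negative).
        while size < 0 or len(self._buf) < size:
            try:
                self._buf += next(self._it) + '\n'
            except StopIteration:
                break
        if size < 0:
            data, self._buf = self._buf, ''
        else:
            data, self._buf = self._buf[:size], self._buf[size:]
        return data

    def readline(self):
        # copy_from also requires a readline() method.
        if '\n' not in self._buf:
            try:
                self._buf += next(self._it) + '\n'
            except StopIteration:
                pass
        if not self._buf:
            return ''
        line, sep, rest = self._buf.partition('\n')
        self._buf = rest
        return line + sep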
A snippet from Psycopg2's tutorial page at Postgresql.org (see bottom):
A last item I would like to show you is how to insert multiple rows using a dictionary. If you had the following:
namedict = ({"first_name":"Joshua", "last_name":"Drake"},
{"first_name":"Steven", "last_name":"Foo"},
{"first_name":"David", "last_name":"Bar"})
You could easily insert all three rows within the dictionary by using:
cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)
It doesn't save much code, but it definitely looks better.
All of these techniques are called "Extended Inserts" in Postgres terminology, and as of the 24th of November 2016, it's still a ton faster than psycopg2's executemany() and all the other methods listed in this thread (which I tried before coming to this answer).
Here's some code which doesn't use cur.mogrify and is nice and simple to get your head around:
valueSQL = [ '%s', '%s', '%s', ... ] # as many as you have columns.
sqlrows = []
rowsPerInsert = 3 # more means faster, but with diminishing returns..
for row in getSomeData:
    # row == [1, 'a', 'yolo', ... ]
    sqlrows += row
    if (len(sqlrows) // len(valueSQL)) % rowsPerInsert == 0:
        # sqlrows == [ 1, 'a', 'yolo', 2, 'b', 'swag', 3, 'c', 'selfie' ]
        insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')'] * rowsPerInsert)
        cur.execute(insertSQL, sqlrows)
        con.commit()
        sqlrows = []

# insert whatever is left over after the loop
if sqlrows:
    insertSQL = 'INSERT INTO "twitter" VALUES ' + ','.join(['(' + ','.join(valueSQL) + ')'] * (len(sqlrows) // len(valueSQL)))
    cur.execute(insertSQL, sqlrows)
    con.commit()
But it should be noted that if you can use copy_from(), you should use copy_from ;)
Security vulnerabilities
As of 2022-11-16, the answers by @Clodoaldo Neto (for Psycopg 2.6), @Joseph Sheedy, @J.J, @Bart Jonk, @kevo Njoki, @TKoutny and @Nihal Sharma contain SQL injection vulnerabilities and should not be used.
The fastest proposal so far (copy_from) should not be used either because it is difficult to escape the data correctly. This is easily apparent when trying to insert characters like ', ", \, \t or \n.
The author of psycopg2 also recommends against copy_from:
copy_from() and copy_to() are really just ancient and incomplete methods
The fastest method
The fastest method is cursor.copy_expert, which can insert data straight from CSV files.
with open("mydata.csv") as f:
cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", f)
copy_expert is also the fastest method when generating the CSV file on-the-fly. For reference, see the following CSVFile class, which takes care to limit memory usage.
import io, csv

class CSVFile(io.TextIOBase):
    # Create a CSV file from rows. Can only be read once.
    def __init__(self, rows, size=8192):
        self.row_iter = iter(rows)
        self.buf = io.StringIO()
        self.available = 0
        self.size = size

    def read(self, n):
        # Buffer new CSV rows until enough data is available
        buf = self.buf
        writer = csv.writer(buf)
        while self.available < n:
            try:
                row_length = writer.writerow(next(self.row_iter))
                self.available += row_length
                self.size = max(self.size, row_length)
            except StopIteration:
                break

        # Read requested amount of data from buffer
        write_pos = buf.tell()
        read_pos = write_pos - self.available
        buf.seek(read_pos)
        data = buf.read(n)
        self.available -= len(data)

        # Shrink buffer if it grew very large
        if read_pos > 2 * self.size:
            remaining = buf.read()
            buf.seek(0)
            buf.write(remaining)
            buf.truncate()
        else:
            buf.seek(write_pos)

        return data
This class can then be used like:
rows = [(1, "a", "b"), (2, "c", "d")]
cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", CSVFile(rows))
If all your data fits into memory, you can also generate the entire CSV data directly without the CSVFile class, but if you do not know how much data you are going to insert in the future, you probably should not do that.
f = io.StringIO()
writer = csv.writer(f)
for row in rows:
    writer.writerow(row)
f.seek(0)
cursor.copy_expert("COPY mytable (my_id, a, b) FROM STDIN WITH csv", f)
Benchmark results
914 milliseconds - many calls to cursor.execute
846 milliseconds - cursor.executemany
362 milliseconds - psycopg2.extras.execute_batch
346 milliseconds - execute_batch with page_size=1000
265 milliseconds - execute_batch with prepared statement
161 milliseconds - psycopg2.extras.execute_values
127 milliseconds - cursor.execute with string-concatenated values
39 milliseconds - copy_expert generating the entire CSV file at once
32 milliseconds - copy_expert with CSVFile
I've been using ant32's answer above for several years. However, I've found that it throws an error in Python 3 because mogrify returns a byte string.
Converting explicitly to byte strings is a simple solution for making the code Python 3 compatible.
args_str = b','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s,%s,%s,%s)", x) for x in tup)
cur.execute(b"INSERT INTO table VALUES " + args_str)
executemany accepts a list of tuples:
https://www.postgresqltutorial.com/postgresql-python/insert/
""" array of tuples """
vendor_list = [(value1,)]
""" insert multiple vendors into the vendors table """
sql = "INSERT INTO vendors(vendor_name) VALUES(%s)"
conn = None
try:
# read database configuration
params = config()
# connect to the PostgreSQL database
conn = psycopg2.connect(**params)
# create a new cursor
cur = conn.cursor()
# execute the INSERT statement
cur.executemany(sql,vendor_list)
# commit the changes to the database
conn.commit()
# close communication with the database
cur.close()
except (Exception, psycopg2.DatabaseError) as error:
print(error)
finally:
if conn is not None:
conn.close()
The cursor.copy_from solution as provided by @joseph.sheedy (https://stackoverflow.com/users/958118/joseph-sheedy) above (https://stackoverflow.com/a/30721460/11100064) is indeed lightning fast.
However, the example he gives is not generically usable for a record with any number of fields, and it took me a while to figure out how to use it correctly.
The IteratorFile needs to be instantiated with tab-separated fields like this (r is a list of dicts where each dict is a record):
f = IteratorFile("{0}\t{1}\t{2}\t{3}\t{4}".format(r["id"],
r["type"],
r["item"],
r["month"],
r["revenue"]) for r in records)
To generalise for an arbitrary number of fields, we first create a line template with the correct number of tab-separated placeholders, "{}\t{}\t{}....\t{}", and then use .format() to fill in the field values with *list(r.values()) for each record in records:
line = "\t".join(["{}"] * len(records[0]))
f = IteratorFile(line.format(*list(r.values())) for r in records)
complete function in gist here.
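A hedged sketch of such a generic helper, assuming the IteratorFile class from the gist and values that never contain tabs or newlines:
def copy_dicts(cursor, table, records):
    # records: list of dicts sharing the same keys; values must not contain
    # tabs or newlines (copy_from's default text format).
    columns = list(records[0].keys())
    line = "\t".join(["{}"] * len(columns))
    f = IteratorFile(line.format(*r.values()) for r in records)
    cursor.copy_from(f, table, columns=columns)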
execute_batch has been added to psycopg2 since this question was posted.
It is much faster than executemany(), though the benchmark above shows execute_values as faster still.
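A minimal sketch of its use (table and column names are illustrative):
from psycopg2.extras import execute_batch

# execute_batch groups the parameter sets into fewer statements to cut
# round trips; page_size controls how many rows go into each batch.
execute_batch(
    cursor,
    "INSERT INTO t (a, b) VALUES (%s, %s)",
    [(1, 'x'), (2, 'y')],
    page_size=100,
)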
Another nice and efficient approach is to pass the rows for insertion as a single argument, which is an array of JSON objects.
E.g. you pass the argument:
[ {id: 18, score: 1}, { id: 19, score: 5} ]
It is an array, which may contain any number of objects inside.
Then your SQL looks like:
Then your SQL looks like:
INSERT INTO links (parent_id, child_id, score)
SELECT 123, (r->>'id')::int, (r->>'score')::int
FROM unnest($1::json[]) as r
Note: your Postgres must be new enough to support json.
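The $1 placeholder above is the style of drivers such as asyncpg or node-postgres. A hedged psycopg2 equivalent, which passes the batch as a single json parameter and unnests it with json_array_elements (table and column names follow the example above), might look like:
import json

rows = [{"id": 18, "score": 1}, {"id": 19, "score": 5}]
sql = """
    INSERT INTO links (parent_id, child_id, score)
    SELECT 123, (r->>'id')::int, (r->>'score')::int
    FROM json_array_elements(%s::json) AS r
"""
# The whole batch travels as one properly escaped parameter.
cursor.execute(sql, (json.dumps(rows),))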
If you're using SQLAlchemy, you don't need to mess with hand-crafting the string because SQLAlchemy supports generating a multi-row VALUES clause for a single INSERT statement:
rows = []
for i, name in enumerate(rawdata):
    row = {
        'id': i,
        'name': name,
        'valid': True,
    }
    rows.append(row)

if len(rows) > 0:  # INSERT fails if no rows
    insert_query = SQLAlchemyModelName.__table__.insert().values(rows)
    session.execute(insert_query)
From @ant32:
def myInsertManyTuples(connection, table, tuple_of_tuples):
    cursor = connection.cursor()
    try:
        insert_len = len(tuple_of_tuples[0])
        insert_template = "("
        for i in range(insert_len):
            insert_template += "%s,"
        insert_template = insert_template[:-1] + ")"

        args_str = ",".join(
            cursor.mogrify(insert_template, x).decode("utf-8")
            for x in tuple_of_tuples
        )
        cursor.execute("INSERT INTO " + table + " VALUES " + args_str)
        connection.commit()

    except psycopg2.Error as e:
        print(f"psycopg2.Error in myInsertMany = {e}")
        connection.rollback()
If you want to insert multiple rows within one insert statement (assuming you are not using an ORM), the easiest way so far for me would be to use a list of dictionaries. Here is an example:
t = [{'id':1, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 6},
     {'id':2, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 7},
     {'id':3, 'start_date': '2015-07-19 00:00:00', 'end_date': '2015-07-20 00:00:00', 'campaignid': 8}]

conn.execute("""insert into campaign_dates
                (id, start_date, end_date, campaignid)
                values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);""",
             t)
As you can see only one query will be executed:
INFO sqlalchemy.engine.base.Engine insert into campaign_dates (id, start_date, end_date, campaignid) values (%(id)s, %(start_date)s, %(end_date)s, %(campaignid)s);
INFO sqlalchemy.engine.base.Engine [{'campaignid': 6, 'id': 1, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 7, 'id': 2, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}, {'campaignid': 8, 'id': 3, 'end_date': '2015-07-20 00:00:00', 'start_date': '2015-07-19 00:00:00'}]
INFO sqlalchemy.engine.base.Engine COMMIT
psycopg2 2.9.3
data = "(1, 2), (3, 4), (5, 6)"
query = "INSERT INTO t (a, b) VALUES {0}".format(data)
cursor.execute(query)
or
data = [(1, 2), (3, 4), (5, 6)]
data = ",".join(map(str, data))
query = "INSERT INTO t (a, b) VALUES {0}".format(data)
cursor.execute(query)
The solution I am using can insert something like 8,000 records in about a millisecond:
curtime = datetime.datetime.now()
postData = dict()
postData["title"] = "This is Title Text"
postData["body"] = "This a Body Text it Can be Long Text"
postData['created_at'] = curtime.isoformat()
postData['updated_at'] = curtime.isoformat()

data = []
for x in range(8000):
    data.append(postData)

vals = []
for d in data:
    vals.append(tuple(d.values()))  # Here we extract the values from each dict

flds = ",".join(map(str, data[0]))
tableFlds = ",".join(map(str, vals))

sqlStr = f"INSERT INTO posts ({flds}) VALUES {tableFlds}"

db.execute(sqlStr)
connection.commit()

rowsAffected = db.rowcount
print(f'{rowsAffected} Rows Affected')
Finally, in SQLAlchemy 1.2, a new implementation was added that uses psycopg2.extras.execute_batch() instead of executemany when you initialize your engine with use_batch_mode=True, like:
engine = create_engine(
    "postgresql+psycopg2://scott:tiger@host/dbname",
    use_batch_mode=True)
http://docs.sqlalchemy.org/en/latest/changelog/migration_12.html#change-4109
Then anyone who has to use SQLAlchemy won't bother trying different combinations of sqla, psycopg2 and direct SQL together.
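Note that in later SQLAlchemy releases (1.3+) the use_batch_mode flag was superseded by the executemany_mode parameter; a minimal sketch:
engine = create_engine(
    "postgresql+psycopg2://scott:tiger@host/dbname",
    executemany_mode='batch')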
Using aiopg - The snippet below works perfectly fine
# items = [10, 11, 12, 13]
# gid = 1
tup = [(gid, pid) for pid in items]
args_str = ",".join([str(s) for s in tup])
# insert into group values (1, 10), (1, 11), (1, 12), (1, 13)
yield from cur.execute("INSERT INTO group VALUES " + args_str)

Is there a way to automate UNION ALL insertion?

I am using Oracle 19c, and I am trying to insert using the UNION ALL method. I tried to automate it and I am getting ORA-00907.
Here is my code:
def insert(items):
    # items is a list of dicts ->
    # [{"test": "Test", "Test": "test", "r": "a"}, {"test": "Test", "Test": "test", "s": "a"}...]
    cursor = connection.cursor()
    insertions = []
    for item in items:
        insertions.extend([item["test"], item["Test"]])
    query = """INSERT INTO C##USER.RANDOM
               SELECT (:1, :2) FROM dual
            """ + "\n".join([f"UNION ALL SELECT (:{i}, :{i+1}) FROM dual" for i in range(3, len(insertions), 2)])
    cursor.execute(query, insertions)
I believe executemany is the better option for your use case.
Example from the page:
dataToInsert = [
    (10, 'Parent 10'),
    (20, 'Parent 20'),
    (30, 'Parent 30'),
    (40, 'Parent 40'),
    (50, 'Parent 50')
]
cursor.executemany("insert into ParentTable values (:1, :2)", dataToInsert)
If you want to efficiently insert a large amount of generated test data, you should not use
INSERT /*+APPEND*/ INTO ... VALUES (...)
as in the related question. Note that you are inserting row by row, so the APPEND hint is meaningless here and is ignored.
Nor should you use a large UNION ALL select and bind thousands of bind variables; as pointed out by others, this will take a long time to parse.
You should approach this with one INSERT statement that produces all the rows to be inserted:
Example
insert /*+ APPEND */ into tab (col1,col2)
select rownum, 'Test'||rownum from dual
connect by level <= 10000;
Note this will populate your table with 10000 rows such as
COL1 COL2
---------- --------------------------------------------
1 Test1
2 Test2
3 Test3
4 Test4
5 Test5
....
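A hedged sketch of running such a generator INSERT from Python with cx_Oracle (the cursor and connection names are assumed):
# One round trip: the database generates and inserts all the rows itself.
cursor.execute("""
    insert /*+ APPEND */ into tab (col1, col2)
    select rownum, 'Test' || rownum from dual
    connect by level <= :n""", n=10000)
connection.commit()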

peewee orm: bulk insert using a subquery but is based on python-side-data

peewee allows bulk inserts via insert_many() and insert_from(). However, insert_many() accepts a list of data to be inserted but does not allow data computed from other parts of the database, while insert_from() does allow data computed from other parts of the database but does not allow any data to be sent from Python.
Example:
Assuming a model structure like so:
class BaseModel(Model):
    class Meta:
        database = db

class Person(BaseModel):
    name = CharField(max_length=100, unique=True)

class StatusUpdate(BaseModel):
    person = ForeignKeyField(Person, related_name='statuses')
    status = TextField()
    timestamp = DateTimeField(constraints=[SQL('DEFAULT CURRENT_TIMESTAMP')], index=True)
And some initial data:
Person.insert_many(rows=[{'name': 'Frank'}, {'name': 'Joe'}, {'name': 'Arnold'}]).execute()
print ('Person.select().count():',Person.select().count())
Output:
Person.select().count(): 3
Say we want to add a bunch new status updates, like the ones in this list:
new_status_updates = [ ('Frank', 'wat')
, ('Frank', 'nooo')
, ('Joe', 'noooo')
, ('Arnold', 'nooooo')]
We might try to use insert_many() like so:
StatusUpdate.insert_many( rows=[{'person': 'Frank', 'status': 'wat'}
, {'person': 'Frank', 'status': 'nooo'}
, {'person': 'Joe', 'status': 'noooo'}
, {'person': 'Arnold', 'status': 'nooooo'}]).execute()
But this would fail: the person field expects a Person model or a Person.id, and we would have to make an extra query to retrieve those from the names.
We might be able to avoid this with insert_from(), which allows us to make subqueries, but insert_from() has no way of processing our lists or dictionaries. What to do?
One idea is to use the SQL VALUES clause as part of a SELECT statement.
If you are familiar with SQL, you may have seen the VALUES clause before, it is commonly used as part of an INSERT statement like so:
INSERT INTO statusupdate (person_id,status)
VALUES (1, 'my status'), (1, 'another status'), (2, 'his status');
This tells the database to insert three rows - AKA tuples - into the table statusupdate.
Another way of inserting things though is to do something like:
INSERT INTO statusupdate (person_id,status)
SELECT ..., ... FROM <elsewhere or subquery>;
This is equivalent to the insert_from() functionality that peewee provides.
But there is another less common thing you can do: you can use the VALUES clause in any select to provide literal values. Example:
SELECT *
FROM (VALUES (1,2,3), (4,5,6)) as my_literal_values;
This will return a result-set of two rows/tuples, each with 3 values.
So, if you can convert the "bulk" insert into a SELECT/FROM/VALUES statement, you can then do whatever transformations you need to do (namely, convert Person.name values to corresponding Person.id values) and then combine it with peewee's insert_from() functionality.
So let us see how this would look.
First let us begin constructing the VALUES clause itself. We want properly escaped values, so we will use question marks instead of the values for now, and put the actual values in later.
#this is gonna look like '(?,?), (?,?), (?,?)'
# or '(%s,%s), (%s,%s), (%s,%s)' depending on the database type
values_question_marks = ','.join(['(%s, %s)' % (db.interpolation,db.interpolation)]*len(new_status_updates))
The next step is to construct the values clause. Here is our first attempt:
--the %s here will be replaced by the question marks of the clause
--in postgres, you must have a name for every item in `FROM`
SELECT * FROM (VALUES %s) someanonymousname
OK, so now we have a result-set that looks like:
name | status
-----|-------
... | ...
Except! There are no column names. This will cause us a bit of heartache in a minute, so we have to figure out a way to give the result-set proper column names.
The postgres way would be to just alter the AS clause:
SELECT * FROM (VALUES %s) someanonymousname(name,status)
sqlite3 does not support that (grr).
So we are reduced to a kludge. Luckily stackoverflow provides: Is it possible to select sql server data using column ordinal position, and we can construct something like this:
SELECT NULL as name, NULL as status WHERE 1=0
UNION ALL
SELECT * FROM (VALUES %s) someanonymousname
This works by first creating an empty result-set with the proper column-names, and then concatenating the result-set from the VALUES clause to it. This will produce a result-set that has the proper column-names, will work in sqlite3, and in postgres.
Now to bring this back to peewee:
values_query = """
(
--a trick to make an empty query result with two named columns, to more portably name the resulting
--VALUES clause columns (grr sqlite)
SELECT NULL as name, NULL as status WHERE 1=0
UNION ALL
SELECT * FROM (VALUES %s) someanonymousname
)
"""
values_query %= (values_question_marks,)
#unroll the parameters into one large list
#this is gonna look like ['Frank', 'wat', 'Frank', 'nooo', 'Joe', 'noooo' ...]
values_query_params = [value for values in new_status_updates for value in values]
#turn it into peewee SQL
values_query = SQL(values_query,*values_query_params)
data_query = (Person
.select(Person.id, SQL('values_list.status').alias('status'))
.from_(Person,values_query.alias('values_list'))
.where(SQL('values_list.name') == Person.name))
insert_query = StatusUpdate.insert_from([StatusUpdate.person, StatusUpdate.status], data_query)
print (insert_query)
insert_query.execute()
print ('StatusUpdate.select().count():',StatusUpdate.select().count())
Output:
StatusUpdate.select().count(): 4

Python Sqlite3 insert operation with a list of column names

Normally, if I want to insert values into a table, I will do something like this (assuming that I know which columns the values I want to insert belong to):
conn = sqlite3.connect('mydatabase.db')
conn.execute("INSERT INTO MYTABLE (ID,COLUMN1,COLUMN2)\
VALUES(?,?,?)",[myid,value1,value2])
But now I have a list of columns (the length of the list may vary) and a list of values for each of the columns in the list.
For example, if I have a table with 10 columns (namely column1, column2, ..., column10), I have a list of columns that I want to update, let's say [column3, column4], and a list of values for those columns, [value for column3, value for column4].
How do I insert the values in the list into the individual columns they each belong to?
As far as I know the parameter list in conn.execute works only for values, so we have to use string formatting like this:
import sqlite3
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (a integer, b integer, c integer)')
col_names = ['a', 'b', 'c']
values = [0, 1, 2]
conn.execute('INSERT INTO t (%s, %s, %s) values(?,?,?)'%tuple(col_names), values)
Please note this is a very bad approach, since strings passed to the database should always be checked for injection attacks. However, you could pass the list of column names through some sanitizing function before insertion.
EDITED:
For a variable number of columns you could try something like
exec_text = 'INSERT INTO t (' + ','.join(col_names) + ') values(' + ','.join(['?'] * len(values)) + ')'
conn.execute(exec_text, values)
# as long as len(col_names) == len(values)
Of course string formatting will work, you just need to be a bit cleverer about it.
col_names = ','.join(col_list)
col_spaces = ','.join(['?'] * len(col_list))
sql = 'INSERT INTO t (%s) values(%s)' % (col_names, col_spaces)
conn.execute(sql, values)
I was looking for a solution to create columns based on a list of unknown / variable length and found this question. However, I managed to find a nicer solution (for me anyway), that's also a bit more modern, so thought I'd include it in case it helps someone:
import sqlite3

def create_sql_db(my_list):
    file = 'my_sql.db'
    table_name = 'table_1'
    init_col = 'id'
    col_type = 'TEXT'

    conn = sqlite3.connect(file)
    c = conn.cursor()

    # CREATE TABLE (IF IT DOESN'T ALREADY EXIST)
    c.execute('CREATE TABLE IF NOT EXISTS {tn} ({nf} {ft})'.format(
        tn=table_name, nf=init_col, ft=col_type))

    # CREATE A COLUMN FOR EACH ITEM IN THE LIST
    for new_column in my_list:
        c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
            tn=table_name, cn=new_column, ct=col_type))

    conn.close()

my_list = ["Col1", "Col2", "Col3"]
create_sql_db(my_list)
All my data is of the type text, so I just have a single variable "col_type" - but you could for example feed in a list of tuples (or a tuple of tuples, if that's what you're into):
my_other_list = [("ColA", "TEXT"), ("ColB", "INTEGER"), ("ColC", "BLOB")]
and change the CREATE A COLUMN step to:
for tupl in my_other_list:
    new_column = tupl[0]  # "ColA", "ColB", "ColC"
    col_type = tupl[1]    # "TEXT", "INTEGER", "BLOB"
    c.execute('ALTER TABLE {tn} ADD COLUMN "{cn}" {ct}'.format(
        tn=table_name, cn=new_column, ct=col_type))
As a noob, I can't comment on the very succinct, updated solution @ron_g offered. While testing, though, I had to frequently delete the sample database itself, so for any other noobs using this to test, I would advise adding:
c.execute('DROP TABLE IF EXISTS {tn}'.format(
tn=table_name))
prior to the 'CREATE TABLE ...' portion.
It appears there are multiple instances of
.format(
tn=table_name ....)
in both 'CREATE TABLE ...' and 'ALTER TABLE ...', so I'm trying to figure out if it's possible to create a single instance (similar to, or included in, the def section).

python sqlite insert named parameters or null

I'm trying to insert data from a dictionary into a database using named parameters. I have this working with a simple SQL statement e.g.
SQL = "INSERT INTO status (location, arrival, departure) VALUES (:location, :arrival,:departure)"
dict = {'location': 'somewhere', 'arrival': '1000', 'departure': '1001'}
c.execute(SQL,dict)
This inserts somewhere into the location column, 1000 into the arrival column, and 1001 into the departure column.
The data that I will actually have will contain location, but may contain either arrival or departure and might not have both (in which case either nothing or NULL can go into the table). In this case, I get sqlite3.ProgrammingError: You did not supply a value for binding 2.
I can fix this by using defaultdict:
c.execute(SQL,defaultdict(str,dict))
To make things slightly more complicated, I will actually have a list of dictionaries containing multiple locations with either an arrival or departure.
({'location': 'place1', 'departure': '1000'},
{'location': 'palce2', 'arrival': '1010'},
{'location': 'place2', 'departure': '1001'})
and I want to be able to run this with c.executemany; however, I now can't use defaultdict.
I could loop through each dictionary in the list and run many c.execute statements, but executemany seems a tidier way to do it.
I've simplified this example for convenience, the actual data has many more entries in the dictionary, and I build it from a JSON text file.
Anyone have any suggestions for how I could do this?
Use None to insert a NULL:
dict = {'location': 'somewhere', 'arrival': '1000', 'departure': None}
You can use a default dictionary and a generator to use this with executemany():
defaults = {'location': '', 'arrival': None, 'departure': None}
c.executemany(SQL, ({k: d.get(k, defaults[k]) for k in defaults} for d in your_list_of_dictionaries))
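A minimal, self-contained sketch of that approach (the table and dictionaries follow the examples in the question):
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute("CREATE TABLE status (location TEXT, arrival TEXT, departure TEXT)")

SQL = "INSERT INTO status (location, arrival, departure) VALUES (:location, :arrival, :departure)"
defaults = {'location': '', 'arrival': None, 'departure': None}
data = [{'location': 'place1', 'departure': '1000'},
        {'location': 'place2', 'arrival': '1010'}]

# Fill in missing keys from the defaults so every row binds all parameters.
c.executemany(SQL, ({k: d.get(k, defaults[k]) for k in defaults} for d in data))
conn.commit()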
There is a simpler solution to this problem that should be feasible in most cases; just pass to executemany a list of defaultdict instead of a list of dict.
In other words, if you build your rows from scratch as defaultdict, you can pass the list of defaultdict rows directly to executemany, instead of building them as dictionaries and later patching things up before using executemany.
The following working example (Python 3.4.3) shows the point:
import sqlite3
from collections import defaultdict
# initialization
db = sqlite3.connect(':memory:')
c = db.cursor()
c.execute("CREATE TABLE status(location TEXT, arrival TEXT, departure TEXT)")
SQL = "INSERT INTO status VALUES (:location, :arrival, :departure)"
# build each row as a defaultdict
f = lambda:None # use str if you prefer
row1 = defaultdict(f,{'location':'place1', 'departure':'1000'})
row2 = defaultdict(f,{'location':'place2', 'arrival':'1010'})
rows = (row1, row2)
# insert rows, executemany can be safely used without additional code
c.executemany(SQL, rows)
db.commit()
# print result
c.execute("SELECT * FROM status")
print(list(zip(*c.description))[0])
for r in c.fetchall():
    print(r)
db.close()
If you run it, it prints:
('location', 'arrival', 'departure')
('place1', None, '1000') # None in Python maps to NULL in sqlite3
('place2', '1010', None)
