Any gotchas I should be aware of? Can I store it in a text field, or do I need to use a blob?
(I'm not overly familiar with either pickle or sqlite, so I wanted to make sure I'm barking up the right tree with some of my high-level design ideas.)
I needed to achieve the same thing too.
It turns out it caused me quite a headache before I finally figured out, thanks to this post, how to actually make it work in a binary format.
To insert/update:
pdata = cPickle.dumps(data, cPickle.HIGHEST_PROTOCOL)
curr.execute("insert into table (data) values (:data)", {"data": sqlite3.Binary(pdata)})
You must specify the second argument to dumps to force a binary pickling.
Also note the sqlite3.Binary to make it fit in the BLOB field.
To retrieve data:
curr.execute("select data from table limit 1")
for row in curr:
    data = cPickle.loads(str(row['data']))
When retrieving a BLOB field, sqlite3 returns a Python 'buffer' type, which needs to be converted to a string with str before being passed to the loads method.
If you want to store a pickled object, you'll need to use a blob, since it is binary data. However, you can, say, base64 encode the pickled object to get a string that can be stored in a text field.
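For instance, a minimal sketch of the base64 route (the database file, table and column names here are just placeholders):

import base64
import pickle
import sqlite3

conn = sqlite3.connect("example.db")        # hypothetical database file
conn.execute("create table if not exists kv (key text, value text)")

obj = {"a": 1, "b": [1, 2, 3]}

# encode: pickle to bytes, then base64 to an ASCII string that fits a TEXT column
encoded = base64.b64encode(pickle.dumps(obj)).decode("ascii")
conn.execute("insert into kv values (?, ?)", ("myobj", encoded))
conn.commit()

# decode: read the string back, base64-decode, then unpickle
row = conn.execute("select value from kv where key = ?", ("myobj",)).fetchone()
restored = pickle.loads(base64.b64decode(row[0]))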
Generally, though, doing this sort of thing is indicative of bad design: because you're storing opaque data, you lose the ability to use SQL to do any useful manipulation on it. Although without knowing what you're actually doing, I can't really make a moral call on it.
I wrote a blog about this idea, except instead of a pickle, I used json, since I wanted it to be interoperable with perl and other programs.
http://writeonly.wordpress.com/2008/12/05/simple-object-db-using-json-and-python-sqlite/
Architecturally, this is a quick and dirty way to get persistence, transactions, and the like for arbitrary data structures. I have found this combination to be really useful when I want persistence, and don't need to do much in the sql layer with the data (or it's very complex to deal with in sql, and simple with generators).
The code itself is pretty simple:
import sqlite3
import cPickle

# open the connection with type detection so the "pickle" converter is applied on reads
# ('snapshots.db' is just an example filename)
conn = sqlite3.connect('snapshots.db', detect_types=sqlite3.PARSE_DECLTYPES)

# register the "loader" to get the data back out.
sqlite3.register_converter("pickle", cPickle.loads)
Then, when you want to dump it into the db,
p_string = cPickle.dumps(dict(a=1, b=[1, 2, 3]))
conn.execute('''
    create table snapshot(
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        mydata pickle);
''')
conn.execute('''
    insert into snapshot values
    (null, ?)''', (p_string,))
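For completeness, a minimal sketch of reading the rows back; because the connection above was opened with detect_types=sqlite3.PARSE_DECLTYPES, the registered "pickle" converter un-pickles the mydata column automatically:

for row in conn.execute('select id, mydata from snapshot'):
    # row[1] is already a Python dict here, thanks to the registered converter
    print(row[1])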
Pickle has both text and binary output formats. If you use the text-based format you can store it in a TEXT field, but it'll have to be a BLOB if you use the (more efficient) binary format.
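A rough sketch of the difference, assuming Python 3 (where pickle.dumps always returns bytes, but protocol 0 output is ASCII-only and can be decoded for a TEXT column):

import pickle

obj = {"a": 1, "b": [1, 2, 3]}

# protocol 0 is the original ASCII format: safe to decode and store as TEXT
text_pickle = pickle.dumps(obj, protocol=0).decode("ascii")

# the highest protocol is binary and can contain arbitrary bytes: store it in a BLOB
binary_pickle = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)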
I have to agree with some of the comments here. Be careful and make sure you really want to save pickle data in a db; there's probably a better way.
In any case I had trouble in the past trying to save binary data in the sqlite db.
Apparently you have to use the sqlite3.Binary() to prep the data for sqlite.
Here's some sample code:
query = u'''insert into testtable VALUES(?)'''
b = sqlite3.Binary(binarydata)
cur.execute(query,(b,))
con.commit()
Since Pickle can dump your object graph to a string it should be possible.
Be aware though that TEXT fields in SQLite use the database encoding, so you might need to convert it to a simple string before you un-pickle.
If a dictionary can be pickled, it can be stored in text/blob field as well.
Just be aware of dictionaries that can't be pickled (i.e. that contain unpicklable objects).
Yes, you can store a pickled object in a TEXT or BLOB field in an SQLite3 database, as others have explained.
Just be aware that some objects cannot be pickled. The built-in container types can (dict, set, list, tuple, etc.). But some objects, such as file handles, refer to state that is external to their own data structures, and some extension types have similar problems.
Since a dictionary can contain arbitrary nested data structures, it might not be pickle-able.
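A small illustration of the point, using a hypothetical dict that holds an open file handle:

import pickle

ok = {"numbers": [1, 2, 3], "nested": {"a": (4, 5)}}
pickle.dumps(ok)                         # built-in containers pickle without trouble

bad = {"log": open("example.log", "w")}  # hypothetical open file handle
try:
    pickle.dumps(bad)
except TypeError as err:
    print(err)                           # file objects cannot be pickled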
SpoonMeiser is correct, you need to have a strong reason to pickle into a database.
It's not difficult to write Python objects that implement persistence with SQLite. Then you can use the SQLite CLI to fiddle with the data as well. Which in my experience is worth the extra bit of work, since many debug and admin functions can be simply performed from the CLI rather than writing specific Python code.
In the early stages of a project, I did what you propose and ended up re-writing with a Python class for each business object (note: I didn't say for each table!) This way the body of the application can focus on "what" needs to be done rather than "how" it is done.
The other option, considering that your requirement is to save a dict and then spit it back out for the user's "viewing pleasure", is to use the shelve module which will let you persist any pickleable data to file. The python docs are here.
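A minimal sketch of the shelve route (the filename is just an example):

import shelve

# shelve pickles values for you and persists them to a file on disk
with shelve.open("user_prefs.db") as db:     # hypothetical filename
    db["settings"] = {"theme": "dark", "recent": [1, 2, 3]}

with shelve.open("user_prefs.db") as db:
    print(db["settings"])                    # {'theme': 'dark', 'recent': [1, 2, 3]}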
Depending on what you're working on, you might want to look into the shove module. It does something similar, where it auto-stores Python objects inside a sqlite database (and all sorts of other options) and pretends to be a dictionary (just like the shelve module).
It is possible to store object data as a pickle dump, JSON, etc., but it is also possible to index them, restrict them, and run select queries that use those indices. Here is an example with tuples that can easily be applied to any other Python class. Everything needed is explained in the Python sqlite3 documentation (somebody already posted the link). Anyway, here it is all put together in the following example:
import sqlite3
import pickle

# (Python 2 example: it relies on the built-in cmp and the print statement)

def adapt_tuple(tup):
    return pickle.dumps(tup)

sqlite3.register_adapter(tuple, adapt_tuple)  # cannot use pickle.dumps directly because of inadequate argument signature
sqlite3.register_converter("tuple", pickle.loads)

def collate_tuple(string1, string2):
    return cmp(pickle.loads(string1), pickle.loads(string2))

#########################
# 1) Using declared types
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.create_collation("cmptuple", collate_tuple)

cur = con.cursor()
cur.execute("create table test(p tuple unique collate cmptuple)")
cur.execute("create index tuple_collated_index on test(p collate cmptuple)")

cur.execute("select name, type from sqlite_master")  # where type = 'table'
print(cur.fetchall())

p = (1, 2, 3)
p1 = (1, 2)

cur.execute("insert into test(p) values (?)", (p,))
cur.execute("insert into test(p) values (?)", (p1,))
cur.execute("insert into test(p) values (?)", ((10, 1),))
cur.execute("insert into test(p) values (?)", (tuple((9, 33)),))
cur.execute("insert into test(p) values (?)", (((9, 5), 33),))

try:
    cur.execute("insert into test(p) values (?)", (tuple((9, 33)),))
except Exception as e:
    print e

cur.execute("select p from test order by p")
print "\nwith declared types and default collate on column:"
for raw in cur:
    print raw

cur.execute("select p from test order by p collate cmptuple")
print "\nwith declared types collate:"
for raw in cur:
    print raw

con.create_function('pycmp', 2, cmp)

print "\nselect greater than using cmp function:"
cur.execute("select p from test where pycmp(p,?) >= 0", ((10,),))
for raw in cur:
    print raw

cur.execute("explain query plan select p from test where p > ?", ((3,)))
for raw in cur:
    print raw

print "\nselect greater than using collate:"
cur.execute("select p from test where p > ?", ((10,),))
for raw in cur:
    print raw

cur.execute("explain query plan select p from test where p > ?", ((3,)))
for raw in cur:
    print raw

cur.close()
con.close()
Many applications use sqlite3 as a backend for SQLAlchemy so, naturally, this question can be asked in the SQLAlchemy framework as well (which is how I came across this question).
To do this, define the column that will hold the pickled data as a "PickleType" column. The implementation is pretty straightforward:
from sqlalchemy import Column, PickleType, Integer
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
import numpy as np

Base = declarative_base()

class User(Base):
    __tablename__ = 'Users'
    id = Column(Integer, primary_key=True)
    user_login_data_array = Column(PickleType)

login_information = {'User1': {'Times': np.arange(0, 20),
                               'IP': ['123.901.12.189', '123.441.49.391']}}

engine = create_engine('sqlite:///:memory:', echo=False)
Base.metadata.create_all(engine)
Session_maker = sessionmaker(bind=engine)
Session = Session_maker()

# The pickling here is very intuitive! Because the column
# "user_login_data_array" is defined as PickleType, SQLAlchemy pickles
# and unpickles the object for you; just assign the dict directly.
user_object_to_add = User(user_login_data_array=login_information)
Session.add(user_object_to_add)
Session.commit()
(I'm not claiming that pickle is the best fit for this example, given the issues others have noted.)
See this solution at SourceForge:
y_serial.py module :: warehouse Python objects with SQLite
"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."
http://yserial.sourceforge.net
Related
Pretty new to sqlite3, so bear with me here..
I'd like to have a function to which I can pass the table name, and the values to update.
I initially started with something like this:
def add_to_table(table_name, string):
    cursor.execute('INSERT INTO {table} VALUES ({var})'
                   .format(
                       table=table_name,
                       var=string)
                   )
Which works A-OK, but further reading about sqlite3 suggested that this was a terribly insecure way to go about things. However, using their ? syntax, I'm unable to pass in a name to specify the variable.
I tried adding in a ? in place of the table, but that throws a syntax error.
cursor.execute('INSERT INTO ? VALUES (?)', ('mytable','"Jello, world!"'))
>>> sqlite3.OperationalError: near "?": syntax error
Can the table in an sql statement be passed in safely and dynamically?
It's not the dynamic string substitution per se that's the problem. It's dynamic string substitution with a user-supplied string that's the big problem, because that opens you up to SQL injection attacks. If you are absolutely 100% sure that the table name is a safe string that you control, then splicing it into the SQL query will be safe.
if some_condition():
    table_name = 'TABLE_A'
else:
    table_name = 'TABLE_B'

cursor.execute('INSERT INTO ' + table_name + ' VALUES (?)', values)
That said, using dynamic SQL like that is certainly a code smell so you should double check to see if you can find a simpler alternative without the dynamically generated SQL strings. Additionally, if you really want dynamic SQL then something like SQLAlchemy might be useful to guarantee that the SQL you generate is well formed.
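One common mitigation, sketched here under the assumption that the set of valid table names is known ahead of time, is to whitelist the table name and keep the values parameterized:

ALLOWED_TABLES = {'mytable', 'other_table'}   # hypothetical whitelist

def add_to_table(cursor, table_name, value):
    if table_name not in ALLOWED_TABLES:
        raise ValueError('unknown table: %r' % table_name)
    # the table name is now known-safe to splice in; the value stays parameterized
    cursor.execute('INSERT INTO {} VALUES (?)'.format(table_name), (value,))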
Composing SQL statements using string manipulation is odd not only because of security implications, but also because strings are "dumb" objects. Using sqlalchemy core (you don't even need the ORM part) is almost like using strings, but each fragment will be a lot smarter and allow for easier composition. Take a look at the sqlalchemy wiki to get a notion of what I'm talking about.
For example, using sqlsoup your code would look like this:
db = SQLSoup('sqlite:///yourdatabase')
table = getattr(db, tablename)
table.insert(fieldname='value', otherfield=123)
db.commit()
Another advantage: the code is database-independent. Want to move to Oracle? Change the connection string and you are done.
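For reference, a minimal sketch of the same insert with plain SQLAlchemy Core (the database file, table, and column definitions here are made up):

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, insert

engine = create_engine('sqlite:///yourdatabase')           # hypothetical database file
metadata = MetaData()

# a made-up table definition, standing in for whatever table you target dynamically
mytable = Table('mytable', metadata,
                Column('fieldname', String),
                Column('otherfield', Integer))
metadata.create_all(engine)

with engine.begin() as conn:                               # commits automatically
    conn.execute(insert(mytable).values(fieldname='value', otherfield=123))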
My objective is to store a JSON object into a MySQL database field of type json, using the mysql.connector library.
import mysql.connector
import json
jsonData = json.dumps(origin_of_jsonData)
cnx = mysql.connector.connect(**config_defined_elsewhere)
cursor = cnx.cursor()
cursor.execute('CREATE DATABASE dataBase')
cnx.database = 'dataBase'
cursor = cnx.cursor()
cursor.execute('CREATE TABLE table (id_field INT NOT NULL, json_data_field JSON NOT NULL, PRIMARY KEY (id_field))')
Now, the code below WORKS just fine, the focus of my question is the use of '%s':
insert_statement = "INSERT INTO table (id_field, json_data_field) VALUES (%s, %s)"
values_to_insert = (1, jsonData)
cursor.execute(insert_statement, values_to_insert)
My problem with that: I am very strictly adhering to the use of '...{}'.format(aValue) (or f'...{aValue}') when combining variable aValue(s) into a string, thus avoiding the use of %s (whatever my reasons for that, let's not debate them here - but it is how I would like to keep it wherever possible, hence my question).
In any case, I am simply unable, whichever way I try, to create something that stores the jsonData into the mySql dataBase using something that resembles the above structure and uses '...{}'.format() (in whatever shape or form) instead of %s. For example, I have (among many iterations) tried
insert_statement = "INSERT INTO table (id_field, json_data_field) VALUES ({}, {})".format(1, jsonData)
cursor.execute(insert_statement)
but no matter how I turn and twist it, I keep getting the following error:
ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[some_content_from_jsonData})]' at line 1
Now my question(s):
1) Is there a way to avoid the use of %s here that I am missing?
2) If not, why? What is it that makes this impossible? Is it the cursor.execute() function, or is it the fact that it is a JSON object, or is it something completely different? Shouldn't {}.format() be able to do everything that %s could do, and more?
First of all: NEVER DIRECTLY INSERT YOUR DATA INTO YOUR QUERY STRING!
Using %s in a MySQL query string is not the same as using it in a python string.
In python, you just format the string and 'hello %s!' % 'world' becomes 'hello world!'. In SQL, the %s signals parameter insertion. This sends your query and data to the server separately. You are also not bound to this syntax. The python DB-API specification specifies more styles for this: DB-API parameter styles (PEP 249). This has several advantages over inserting your data directly into the query string:
Prevents SQL injection
Say you have a query to authenticate users by password. You would do that with the following query (of course you would normally salt and hash the password, but that is not the topic of this question):
SELECT 1 FROM users WHERE username='foo' AND password='bar'
The naive way to construct this query would be:
"SELECT 1 FROM users WHERE username='{}' AND password='{}'".format(username, password)
However, what would happen if someone inputs ' OR 1=1 as the password? The formatted query would then become
SELECT 1 FROM users WHERE username='foo' AND password='' OR 1=1
which will always return 1. When using parameter insertion:
execute('SELECT 1 FROM users WHERE username=%s AND password=%s', (username, password))
this will never happen, as the query will be interpreted by the server separately.
Performance
If you run the same query many times with different data, the performance difference between using a formatted query and parameter insertion can be significant. With parameter insertion, the server only has to compile the query once (as it is the same every time) and execute it with different data, but with string formatting, it will have to compile it over and over again.
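A rough sketch of what that looks like in practice with mysql.connector's executemany() (the table and data are made up; 'cursor' and 'cnx' are the objects from the question):

rows = [(1, '{"a": 1}'),
        (2, '{"b": 2}'),
        (3, '{"c": 3}')]                 # hypothetical data

insert_statement = "INSERT INTO some_table (id_field, json_data_field) VALUES (%s, %s)"
cursor.executemany(insert_statement, rows)   # same parameterized statement, many rows of data
cnx.commit()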
In addition to what was said above, I would like to add some details that I did not immediately understand, and that others (newbies like me ;)) may also find helpful:
1) "parameter insertion" is meant for only for values, it will not work for table names, column names, etc. - for those, the Python string substitution works fine in the sql syntax defintion
2) the cursor.execute function requires a tuple to work (as specified here, albeit not immediately clear, at least to me: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-execute.html)
EXAMPLE for both in one function:
def checkIfRecordExists(column, table, condition_name, condition_value):
    ...
    sqlSyntax = 'SELECT {} FROM {} WHERE {} = %s'.format(column, table, condition_name)
    cursor.execute(sqlSyntax, (condition_value,))
Note both the use of .format in the initial sql syntax definition and the use of (condition_value,) in the execute function.
I've got a PostgreSQL table with a column of type bytea. Porting that table from SQLite, I ran into an issue - I couldn't figure out how to pass raw binary data to an SQL query. The framework I use is PyGreSQL. I want to stick to the DB-API 2.0 interface to avoid a lot of conversion.
That interface, unlike the classic one (dollar-sign parameters) and SQLite's (question-mark parameters), requires specifying the type (%-formatting, like old-style Python string formatting).
The data I want to pass is a PNG file, read in binary mode using the 'rb' flag of open().
The query code looks like this:
db = pgdb.connect(args)
c = db.cursor()
c.execute('INSERT INTO tbl VALUES (%b)', (b'test_bytes',))
This gives an error unsupported format character 'b' (0x62) at index 54 and doesn't allow the formatting to happen. What could be done to solve this issue?
Not really a solution, but a good-enough workaround. I decided to use psycopg2 instead of PyGreSQL as it is more supported and common, so it's easier to find info about any common problems.
There, the solution would be to use %s for every type, which I find much more Pythonic (no need to think about types in a dynamically typed language).
So, my query would look like this:
c.execute('INSERT INTO tbl VALUES (%s)', (b'test_bytes',))
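For the PNG case from the question, a minimal sketch with psycopg2 (the connection parameters and file name are placeholders); plain bytes also work, since psycopg2 adapts them to bytea:

import psycopg2

conn = psycopg2.connect(dbname='mydb', user='me')   # hypothetical connection parameters
cur = conn.cursor()

with open('image.png', 'rb') as f:                   # binary-read the PNG, as in the question
    png_bytes = f.read()

# %s works for bytea as well; psycopg2.Binary() just makes the intent explicit
cur.execute('INSERT INTO tbl VALUES (%s)', (psycopg2.Binary(png_bytes),))
conn.commit()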
I have what is likely an easy question. I'm trying to pull a JSON from an online source, and store it in a SQLite table. In addition to storing the data in a rich table, corresponding to the many fields in the JSON, I would like to also just dump the entire JSON into a table every time it is pulled.
The table looks like:
CREATE TABLE Raw_JSONs (ID INTEGER PRIMARY KEY ASC, T DATE DEFAULT (datetime('now','localtime')), JSON text);
I've pulled a JSON from some URL using the following python code:
from pyquery import PyQuery
from lxml import etree
import urllib
x = PyQuery(url='json')
y = x('p').text()
Now, I'd like to execute the following INSERT command:
import sqlite3
db = sqlite3.connect('a.db')
c = db.cursor()
c.execute("insert into Raw_JSONs values(NULL,DATETIME('now'),?)", y)
But I'm told that I've supplied the incorrect number of bindings (i.e. thousands, instead of just 1). I gather it's reading the y variable as all the different elements of the JSON.
Can someone help me store just the JSON, in its entirety?
Also, as I'm obviously new to this JSON game, any online resources to recommend would be amazing.
Thanks!
.execute() expects a sequence of parameters; better give it a one-element tuple:
c.execute("insert into Raw_JSONs values(NULL,DATETIME('now'),?)", (y,))
A Python string is a sequence too, one of individual characters. So the .execute() call tried to treat each separate character as a parameter for your query, and unless your string is exactly one character long, it won't provide the right number of parameters.
Don't forget to commit your inserts:
db.commit()
or use the database connection as a context manager:
with db:
    # inserts executed here will automatically commit if no exceptions are raised.
You may also be interested in the built-in sqlite3 module's adapters and converters. These can convert any Python object to an SQLite column and back. See the standard documentation and its adapters section.
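For instance, a minimal sketch of registering a JSON adapter and converter so a parsed dict can be passed to .execute() directly (the pared-down table and the declared column type 'json' are just example choices):

import json
import sqlite3

# store dicts as JSON text on the way in, parse them again on the way out
sqlite3.register_adapter(dict, json.dumps)
sqlite3.register_converter("json", lambda b: json.loads(b.decode('utf-8')))

db = sqlite3.connect('a.db', detect_types=sqlite3.PARSE_DECLTYPES)
db.execute("CREATE TABLE IF NOT EXISTS Raw_JSONs "
           "(ID INTEGER PRIMARY KEY ASC, JSON json)")

with db:
    db.execute("INSERT INTO Raw_JSONs (JSON) VALUES (?)", ({'key': 'value'},))

for row in db.execute("SELECT JSON FROM Raw_JSONs"):
    print(row[0])   # already a dict again, thanks to the converter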