I have a python script which creates some objects.
I would like to be able to save these objects into my postgres database for use later.
My thinking was I could pickle an object, then store that in a field in the db.
But I'm going round in circles about how to store and retrieve and use the data.
I've tried storing the pickled binary string as text, but I can't work out how to encode/escape it, nor how to load the stored string back as a binary string for unpickling.
I've tried storing the data as bytea, both with psycopg2.Binary(data) and without.
Then I read it back into a buffer and encoded it with base64.b64encode(result), but it doesn't come out the same and cannot be unpickled.
Is there a simple way to store and retrieve python objects in a SQL (postgres) database?
Following the comment from @SergioPulgarin I tried the following, which worked!
N.B. Edit 2 follows a comment by @Tomalak.
Storing:
Pickle the object to a binary string
pickle_string = pickle.dumps(obj)  # obj is the object to store (avoids shadowing the built-in name "object")
Store the pickled string in a bytea (binary) field in Postgres, using a simple INSERT query in Psycopg2.
Retrieval:
Select the field in Psycopg2. (simple SELECT query)
Unpickle the result (psycopg2 returns the bytea field as a bytes-like buffer, which pickle.loads accepts directly):
retrieved_object = pickle.loads(retrieved_pickle_string)
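For reference, here's a minimal end-to-end sketch of the above (assuming a table created with CREATE TABLE objects (id serial PRIMARY KEY, data bytea); the connection string and names are illustrative):

import pickle
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()

# Store: pickle to bytes and wrap in psycopg2.Binary for the bytea column
obj = {"answer": 42}
cur.execute("INSERT INTO objects (data) VALUES (%s)",
            (psycopg2.Binary(pickle.dumps(obj)),))
conn.commit()

# Retrieve: psycopg2 returns bytea as a bytes-like buffer that pickle accepts
cur.execute("SELECT data FROM objects ORDER BY id DESC LIMIT 1")
restored = pickle.loads(cur.fetchone()[0])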
Hope that helps anybody trying to do something similar!
Related
I have a JSON file. How can I store the whole file in MS SQL, and read its data back after storing it in the database?
I'm using a python script to interact with SQL Server.
Note: I don't want to store the key-value pairs as individual records in the DB; I want to store the whole file in the DB using Python.
There is no specific data type for JSON in SQL Server, unlike, say, XML, which has the xml data type.
If you are, however, storing JSON data in SQL Server then you will want to use an nvarchar(MAX). If you are on SQL Server 2016+ I also recommend adding a CHECK CONSTRAINT to the column to ensure that the JSON is valid, as otherwise parsing it (in SQL) will be impossible. You can check if a value is valid JSON using ISJSON. For example, if you were adding the column to an existing table:
ALTER TABLE dbo.YourTable ADD YourJSON nvarchar(MAX) NULL;
GO
ALTER TABLE dbo.YourTable ADD CONSTRAINT chk_YourTable_ValidJSON CHECK (ISJSON(YourJSON) = 1 OR YourJSON IS NULL);
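From Python you can then insert the file contents as a single string. A minimal sketch using pyodbc (the connection string, file name, and table are hypothetical, and it assumes the table's other columns accept defaults or NULL):

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=YourDb;Trusted_Connection=yes;"
)  # hypothetical connection string

with open("data.json", "r", encoding="utf-8") as f:  # hypothetical file name
    json_text = f.read()

cursor = conn.cursor()
cursor.execute("INSERT INTO dbo.YourTable (YourJSON) VALUES (?)", json_text)
conn.commit()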
Some answers claim SQL Server has a JSON data type; this is wrong. Since there is no native JSON type, you can just store the JSON as a string with VARCHAR or TEXT.
This article reckons NVARCHAR(MAX) is the answer for documents larger than 8 KB; for documents under that size you can use NVARCHAR(4000), which apparently has better performance.
I am using Django to access a stored function in my Postgres DB. When I execute the function inside Postgres, it returns valid JSON with double quotes. However, when I call the function from Django (which uses psycopg2), the double quotes are removed and replaced with single quotes.
It seems psycopg2 is doing some kind of conversion to lists/dictionaries in the background. However, I need to keep the JSON as a string. Any ideas how to resolve this?
You can override psycopg2's automatic conversion of JSON objects/arrays by registering a no-op function with register_default_json():
psycopg2.extras.register_default_json(loads=lambda x: x)
Quote from the docs:

Psycopg automatically converts PostgreSQL json data into Python objects. How can I receive strings instead? The easiest way to avoid JSON parsing is to register a no-op function with register_default_json():

psycopg2.extras.register_default_json(loads=lambda x: x)

See JSON adaptation for further details.
Source: http://initd.org/psycopg/docs/faq.html?highlight=json#problems-with-type-conversions
Additional Reading
http://initd.org/psycopg/docs/extras.html#adapt-json
https://docs.djangoproject.com/en/2.2/ref/contrib/postgres/fields/#jsonfield (Not sure what you're attempting to do with the stored function but this may help alleviate the need for one)
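Putting the registration to use, a minimal sketch (the connection string and function name are hypothetical):

import psycopg2
import psycopg2.extras

# Register no-op loaders so json/jsonb values come back as raw strings
psycopg2.extras.register_default_json(loads=lambda x: x)
psycopg2.extras.register_default_jsonb(loads=lambda x: x)

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()
cur.execute("SELECT my_stored_function()")  # hypothetical stored function
raw_json = cur.fetchone()[0]  # now a str containing the original JSON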
I've got a little bit of a tricky question here regarding converting JSON strings into Python data dictionaries for analysis in Pandas. I've read a bunch of other questions on this but none seem to work for my case.
Previously, I was simply using CSVs (and Pandas' read_csv function) to perform my analysis, but now I've moved to pulling data directly from PostgreSQL.
I have no problem using SQLAlchemy to connect to my engine and run my queries. My whole script runs the same as it did when I was pulling the data from CSVs. That is, until it gets to the part where I'm trying to convert one of the columns (namely, the 'config' column in the sample text below) from JSON into a Python dictionary. The ultimate goal of converting it into a dict is to be able to count the number of responses under the "options" field within the "config" column.
df = pd.read_sql_query('SELECT questions.id, config from questions ', engine)
df = df['config'].apply(json.loads)
df = pd.DataFrame(df.tolist())
df['num_options'] = np.array([len(row) for row in df.options])
When I run this, I get the error "TypeError: expected string or buffer". I tried converting the data in the 'config' column to string from object, but that didn't do the trick (I get another error, something like "ValueError: Expecting property name...").
If it helps, here's a snippet of data from one cell in the 'config' column (the code should return the result '6' for this snippet, since there are 6 options):
{"graph_by":"series","options":["Strongbow Case Card/Price Card","Strongbow Case Stacker","Strongbow Pole Topper","Strongbow Base wrap","Other Strongbow POS","None"]}
My guess is that SQLAlchemy does something weird to JSON strings when it pulls them from the database, something that didn't happen when I was just reading CSVs?
In recent Psycopg versions the PostgreSQL json(b) adaptation to Python is transparent, and Psycopg is the default SQLAlchemy driver for PostgreSQL. That means the values in your 'config' column already arrive as Python dicts, which is why json.loads fails on them. You can work with the dicts directly, e.g.:
df['num_options'] = df['config'].apply(lambda cfg: len(cfg['options']))
From the Psycopg manual:
Psycopg can adapt Python objects to and from the PostgreSQL json and jsonb types. With PostgreSQL 9.2 and following versions adaptation is available out-of-the-box. To use JSON data with previous database versions (either with the 9.1 json extension, but even if you want to convert text fields to JSON) you can use the register_json() function.
Just an SQLAlchemy query:
q = session.query(
    Question.id,
    func.jsonb_array_length(Question.config["options"]).label("len")
)
Pure SQL and pandas' read_sql_query:
sql = """\
SELECT questions.id,
       jsonb_array_length(questions.config -> 'options') AS len
FROM questions
"""
df = pd.read_sql_query(sql, engine)
Combine both (my favourite):
# take `q` from the above
df = pd.read_sql(q.statement, q.session.bind)
In my application I am using a PostgreSQL database table with a "text" column to store pickled Python objects.
As the database driver I'm using psycopg2, and until now I only passed Python strings (not unicode objects) to the DB and retrieved strings from it. This basically worked fine until I recently decided to handle strings the better/correct way and added the following construct to my DB layer:
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
This basically works fine everywhere in my application, and I'm using unicode objects where possible now.
But for this special case, with the text column containing the pickled objects, it causes trouble. I got it working in my test system this way:
retrieving the data:
SELECT data::bytea, params FROM mytable
writing the data:
execute("UPDATE mytable SET data=%s", (psycopg2.Binary(cPickle.dumps(x)),) )
... but unfortunately I'm getting errors with the SELECT for some columns in the production system:
psycopg2.DataError: invalid input syntax for type bytea
This error also happens when I try to run the query in the psql shell.
Basically I'm planning to convert the column from "text" to "bytea", but the error above also prevents me from doing this conversion.
As far as I can see (when retrieving the column as a plain Python string), the string contains only characters with ord(c) <= 127.
The problem is that casting text to bytea doesn't mean "take the bytes in the string and assemble them as a bytea value"; it means "take the string and interpret it as an escaped input value for the bytea type". So that won't work, mainly because pickle data contains lots of backslashes, which bytea interprets specially.
Try this instead:
SELECT convert_to(data, 'LATIN1') ...
This converts the string into a byte sequence (bytea value) in the LATIN1 encoding. For you, the exact encoding doesn't matter, because it's all ASCII (but there is no ASCII encoding).
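A quick sketch of the retrieval side under this approach (Python 2, to match the cPickle usage above; the connection string is hypothetical, table and column names as in the question):

import cPickle
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
cur = conn.cursor()
cur.execute("SELECT convert_to(data, 'LATIN1'), params FROM mytable")
raw, params = cur.fetchone()
# psycopg2 returns bytea as a buffer object in Python 2; str() yields raw bytes
obj = cPickle.loads(str(raw))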
I have what is likely an easy question. I'm trying to pull a JSON from an online source, and store it in a SQLite table. In addition to storing the data in a rich table, corresponding to the many fields in the JSON, I would like to also just dump the entire JSON into a table every time it is pulled.
The table looks like:
CREATE TABLE Raw_JSONs (ID INTEGER PRIMARY KEY ASC, T DATE DEFAULT (datetime('now','localtime')), JSON text);
I've pulled a JSON from some URL using the following python code:
from pyquery import PyQuery
from lxml import etree
import urllib
x = PyQuery(url='json')
y = x('p').text()
Now, I'd like to execute the following INSERT command:
import sqlite3
db = sqlite3.connect('a.db')
c = db.cursor()
c.execute("insert into Raw_JSONs values(NULL,DATETIME('now'),?)", y)
But I'm told that I've supplied the incorrect number of bindings (i.e. thousands, instead of just 1). I gather it's reading the y variable as all the different elements of the JSON.
Can someone help me store just the JSON, in its entirety?
Also, as I'm obviously new to this JSON game, any online resources to recommend would be amazing.
Thanks!
.execute() expects a sequence of parameters; better to give it a one-element tuple:
c.execute("insert into Raw_JSONs values(NULL,DATETIME('now'),?)", (y,))
A Python string is a sequence too, one of individual characters. So the .execute() call tried to treat each separate character as a parameter for your query, and unless your string is exactly one character long it won't provide the right number of parameters.
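You can see this directly, since iterating over a string yields its characters:

y = '{"a": 1}'
len(y)       # 8, so .execute() would look for 8 placeholders
list(y)[:3]  # ['{', '"', 'a']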
Don't forget to commit your inserts:
db.commit()
or use the database connection as a context manager:
with db:
    # statements here run in a transaction and commit automatically
    # if no exceptions are raised
    db.execute("insert into Raw_JSONs values (NULL, DATETIME('now'), ?)", (y,))
You may also be interested in the sqlite3 module's built-in adapters and converters, which can convert Python objects to and from SQLite column values in both directions. See the standard documentation and its adapters section.
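For instance, a small sketch of a JSON adapter/converter pair (the table and declared column type "json" are illustrative):

import json
import sqlite3

# dict -> TEXT when writing; TEXT -> dict when reading columns declared "json"
sqlite3.register_adapter(dict, json.dumps)
sqlite3.register_converter("json", json.loads)

db = sqlite3.connect('a.db', detect_types=sqlite3.PARSE_DECLTYPES)
db.execute("CREATE TABLE IF NOT EXISTS docs (doc json)")
with db:
    db.execute("INSERT INTO docs VALUES (?)", ({"graph_by": "series"},))
print(db.execute("SELECT doc FROM docs").fetchone()[0])  # {'graph_by': 'series'}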