Unable to convert PostgreSQL text column to bytea - python

In my application I am using a postgresql database table with a "text" column to store
pickled python objects.
As the database driver I'm using psycopg2, and until now I only passed Python strings (not unicode objects) to the DB and retrieved strings from it. This worked fine until I recently decided to handle strings the correct way and added the following to my DB layer:
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
This works fine everywhere else in my application, and I'm now using unicode objects wherever possible. But this special case, the text column containing the pickled objects, causes trouble. I got it working in my test system this way:
retrieving the data:
SELECT data::bytea, params FROM mytable
writing the data:
execute("UPDATE mytable SET data=%s", (psycopg2.Binary(cPickle.dumps(x)),))
... but unfortunately I'm getting errors with the SELECT for some columns in the production system:
psycopg2.DataError: invalid input syntax for type bytea
This error also happens when I try to run the query in the psql shell.
I'm planning to convert the column from "text" to "bytea", but the error above prevents me from doing that conversion as well.
As far as I can see (when retrieving the column as a plain Python string), the string contains only characters with ord(c) <= 127.

The problem is that casting text to bytea doesn't mean "take the bytes in the string and assemble them as a bytea value"; instead, the string is interpreted as escaped input for the bytea type. So that won't work, mainly because pickle data contains lots of backslashes, which bytea interprets specially.
Try this instead:
SELECT convert_to(data, 'LATIN1') ...
This converts the string into a byte sequence (a bytea value) in the LATIN1 encoding. In your case the exact encoding doesn't matter, because it's all ASCII (but there is no ASCII encoding, so LATIN1 is a safe stand-in).
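Putting that together with the write path from the question, here is a minimal sketch (Python 2 with psycopg2 and cPickle, matching the question; the connection string, the id column, and the object being pickled are placeholders of mine):

import cPickle
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
cur = conn.cursor()
obj = {'some': 'state'}  # whatever gets pickled
row_id = 1               # placeholder key

# Write path, as in the question: psycopg2.Binary escapes the pickle bytes
cur.execute("UPDATE mytable SET data = %s WHERE id = %s",
            (psycopg2.Binary(cPickle.dumps(obj)), row_id))

# Read path: convert_to() builds the bytea server-side from the text value,
# so psycopg2 hands back raw bytes (a buffer) instead of a unicode string
cur.execute("SELECT convert_to(data, 'LATIN1') FROM mytable WHERE id = %s",
            (row_id,))
raw = cur.fetchone()[0]
restored = cPickle.loads(str(raw))  # buffer -> str -> original object
conn.commit()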

Related

inserting brokenly formatted text into PostgreSQL table despite encoding errors with \copy

I have a text field (utf8) that misbehaves a little. As originally fetched, it includes the character sequence \u00e2\u0080\u0099, which is basically an apostrophe in another encoding. I have decided to keep this corrupted state rather than fix it, even though I have found a few solutions online for reinterpreting these kinds of errors in my text field.
So I just want to insert the raw data as is.
I have tried two ways to insert this row.
Using Python with peewee (an ORM library):
With everything configured correctly, this method actually works; the data is inserted and there is a row in the database.
Selecting the column yields: donâ\u0080\u0099t
which I am okay with keeping.
So far so good.
Writing a Python script that prints tab-delimited text and using \copy:
Annoyingly, this method does not work and returns the following error:
ERROR: invalid byte sequence for encoding "UTF8": 0x80
CONTEXT: COPY comment, line 1: "'donâ\x80\x99t'"
(When printing the data from the Python script to the console, it shows up as donâ\x80\x99t.)
So clearly there is a difference between what peewee does and my naive printing of the string from Python (peewee and print receive the same string as input).
How do I encode this string correctly so I can use \copy to populate the row?
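One plausible explanation is that peewee encodes unicode parameters to the connection's client encoding (UTF-8) before they reach PostgreSQL, while a plain print writes whatever the default output encoding produces. A hedged sketch of writing the \copy input with an explicit UTF-8 encode (the file name, the sample row, and the assumption that the value is held as a unicode string are all mine):

import io

rows = [(1, u'don\u00e2\u0080\u0099t')]  # sample row standing in for the real fetch

# Encode explicitly to UTF-8 so U+0080 and U+0099 become the valid
# two-byte sequences C2 80 and C2 99 rather than bare 0x80/0x99 bytes
with io.open('comments.tsv', 'w', encoding='utf-8') as f:
    for comment_id, text in rows:
        f.write(u'%d\t%s\n' % (comment_id, text))

The file can then be loaded with \copy comment FROM 'comments.tsv' in psql.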

saving python object in postgres table with pickle

I have a python script which creates some objects.
I would like to be able to save these objects into my postgres database for use later.
My thinking was I could pickle an object, then store that in a field in the db.
But I'm going round in circles about how to store and retrieve and use the data.
I've tried storing the pickle binary string as text, but I can't work out how to encode/escape it, and then how to load the string back as a binary string to unpickle.
I've tried storing the data as bytea, both with psycopg2.Binary(data) and without.
Then reading it into a buffer and encoding with base64.b64encode(result), but it doesn't come out the same and cannot be unpickled.
Is there a simple way to store and retrieve python objects in a SQL (postgres) database?
Following the comment from @SergioPulgarin I tried the following, which worked!
N.B. Edited following the comment by @Tomalak.
Storing:
Pickle the object to a binary string
pickle_string = pickle.dumps(object)
Store the pickle string in a bytea (binary) field in Postgres, using a simple INSERT query in psycopg2.
Retrieval:
Select the field in psycopg2 (a simple SELECT query).
Unpickle the decoded result
retrieved_pickle_string = pickle.loads(decoded_result)
Hope that helps anybody trying to do something similar!
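Putting those steps together, a minimal sketch (Python 3 with psycopg2; the table and column names are placeholders of mine):

import pickle
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
cur = conn.cursor()
obj = {'answer': 42}  # any picklable object

# Storing: pickle to bytes, wrap in psycopg2.Binary for the bytea column
cur.execute("INSERT INTO objects (data) VALUES (%s)",
            (psycopg2.Binary(pickle.dumps(obj)),))
conn.commit()

# Retrieval: bytea comes back as a memoryview; convert to bytes and unpickle
cur.execute("SELECT data FROM objects ORDER BY id DESC LIMIT 1")
restored = pickle.loads(bytes(cur.fetchone()[0]))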

Trying to save special characters in MySQL DB

I have a string that looks like this 🔴Use O Mozilla Que Não Trava! Testei! $vip ou $apoio
When I try to save it to my database with ...SET description = %s... and cursor.execute(sql, description), it gives me an error:
Warning: (1366, "Incorrect string value: '\xF0\x9F\x94\xB4Us...' for column 'description' ...
Assuming this is an ASCII symbol, I tried description.decode('ascii') but this leads to
'str' object has no attribute 'decode'
How can I determine what encoding it is and how could I store anything like that to the database? The database is utf-8 encoded if that is important.
I am using Python3 and PyMySQL.
Any hints appreciated!
First, you need to make sure the table column has the correct character set. If it is "latin1", you will not be able to store content that contains Unicode characters.
You can use following query to determine the column character set:
SELECT CHARACTER_SET_NAME FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA='your_database_name' AND TABLE_NAME='your_table_name'
AND COLUMN_NAME='description';
Follow the MySQL documentation if you want to change a column's character set.
Also, you need to make sure the character set is properly configured for the MySQL connection. Quoting from the MySQL docs:
Character set issues affect not only data storage, but also communication between client programs and the MySQL server. If you want the client program to communicate with the server using a character set different from the default, you'll need to indicate which one. For example, to use the utf8 Unicode character set, issue this statement after connecting to the server:
SET NAMES 'utf8';
Once the character set settings are correct, you will be able to execute your SQL statement. There is no need to encode/decode on the Python side; that serves a different purpose.
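One detail worth adding as a hedged note: the 🔴 character (U+1F534) lies outside the Basic Multilingual Plane and takes four bytes in UTF-8, which MySQL's plain utf8 charset (three bytes max) cannot store, so utf8mb4 is required. A sketch with PyMySQL (connection parameters and table/column names are placeholders of mine):

import pymysql

# charset='utf8mb4' makes the connection 4-byte capable (plain utf8 is not)
conn = pymysql.connect(host='localhost', user='me', password='secret',
                       database='mydb', charset='utf8mb4')

with conn.cursor() as cursor:
    # One-time migration if the column is still 3-byte utf8 or latin1:
    # cursor.execute("ALTER TABLE mytable MODIFY description TEXT "
    #                "CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
    description = '🔴Use O Mozilla Que Não Trava! Testei! $vip ou $apoio'
    cursor.execute("UPDATE mytable SET description = %s WHERE id = %s",
                   (description, 1))
conn.commit()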

How to handle and escape varbinary database entries correctly in python

I have a script where I first read a table, saving an int and a varbinary(512) via some_set = set(); some_set.add((db_row['int_val'], db_row['varbin_val'])). A set of tuples.
There may be multiple rows having the same int_val/varbin_val combination, but those are duplicates.
Saving them seems to work fine. But when I try to INSERT the saved rows from the source table into the destination table, I get b"Incorrect syntax near '\\'".
I assume this occurs because the varbinary(512) is not escaped correctly (I currently just have str() wrapped around it).
How can I escape a varbinary(512) from a MSSQL database, saved in a string tuple, to use it in a SELECT/INSERT WHERE query?
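One approach that avoids the escaping problem entirely: keep the value as bytes and let the driver bind it as a parameter, instead of str()-formatting it into the SQL text. A sketch assuming pyodbc (the DSN and table names are placeholders of mine):

import pyodbc

conn = pyodbc.connect('DSN=mydsn')  # placeholder connection
cur = conn.cursor()

# Read the source rows; varbinary columns arrive as Python bytes
some_set = set()
cur.execute("SELECT int_val, varbin_val FROM source_table")
for row in cur.fetchall():
    some_set.add((row.int_val, bytes(row.varbin_val)))

# Insert with ? placeholders: pyodbc sends the bytes as varbinary,
# so no manual escaping (and no str() wrapping) is needed
for int_val, varbin_val in some_set:
    cur.execute("INSERT INTO dest_table (int_val, varbin_val) VALUES (?, ?)",
                (int_val, varbin_val))
conn.commit()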

UTF-8 Error when trying to insert a string into a PostgresQL BYTEA column with SQLAlchemy

I have a column in a PostgresQL table of type BYTEA. The model class defines the column as a LargeBinary field, which the documentation says "The Binary type generates BLOB or BYTEA when tables are created, and also converts incoming values using the Binary callable provided by each DB-API."
I have a Python string which I would like to insert into this table.
The Python string is:
'\x83\x8a\x13,\x96G\xfd9ae\xc2\xaa\xc3syn\xd1\x94b\x1cq\xfa\xeby$\xf8\xfe\xfe\xc5\xb1\xf5\xb5Q\xaf\xc3i\xe3\xe4\x02+\x00ke\xf5\x9c\xcbA8\x8c\x89\x13\x00\x07T\xeb3\xbcp\x1b\xff\xd0\x00I\xb9'
The relevant snippet of my SQLAlchemy code is:
migrate_engine.execute(
    """
    UPDATE table
    SET x=%(x)s
    WHERE id=%(id)s
    """,
    x=the_string_above,
    id='1')
I am getting the error:
sqlalchemy.exc.DataError: (DataError) invalid byte sequence for encoding "UTF8": 0x83
'\n UPDATE table\n SET x=%(x)s\n WHERE id=%(id)s\n ' {'x': '\x83\x8a\x13,\x96G\xfd9ae\xc2\xaa\xc3syn\xd1\x94b\x1cq\xfa\xeby$\xf8\xfe\xfe\xc5\xb1\xf5\xb5Q\xaf\xc3i\xe3\xe4\x02+\x00ke\xf5\x9c\xcbA8\x8c\x89\x13\x00\x07T\xeb3\xbcp\x1b\xff\xd0\x00I\xb9', 'id': '1',}
If I go into the pgadmin3 console and enter the UPDATE command directly, the update works fine. The error is clearly from SQLAlchemy. The string is a valid Python2 string. The column has type BYTEA. The query works without SQLAlchemy. Can anyone see why Python thinks this byte string is in UTF-8?
Try wrapping the data in a buffer:
migrate_engine.execute(
    """
    UPDATE table
    SET x=%(x)s
    WHERE id=%(id)s
    """,
    x=buffer(the_string_above),
    id='1')
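Note that buffer() exists only in Python 2. On Python 3 the same idea would use bytes or memoryview, which psycopg2 adapts to bytea (a hedged variant; bytes_value is a placeholder name of mine):

migrate_engine.execute(
    """
    UPDATE table
    SET x=%(x)s
    WHERE id=%(id)s
    """,
    x=memoryview(bytes_value),  # bytes_value: the raw bytes, e.g. b'\x83\x8a...'
    id='1')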
