Encode with JSON or pickle to a Variable Using Python

I know it is possible to encode a Python object to a file using
import pickle
pickle.dump(obj, file)
You can do nearly the same with JSON. The problem is that these all encode to or decode from a file. Is it possible to encode an object into a string or bytes variable instead of a file?
I am running Python 3.2 on Windows.

Sure, just use pickle.dumps(obj), which returns a bytes object, or json.dumps(obj), which returns a str.
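As a minimal sketch of this answer (the sample object `obj` is made up for illustration), both modules provide a `dumps`/`loads` pair that works on in-memory values instead of files:

```python
import json
import pickle

# Hypothetical sample object, chosen only for illustration.
obj = {"name": "example", "values": [1, 2, 3]}

# pickle.dumps returns a bytes object; pickle.loads turns it back into an object.
pickled = pickle.dumps(obj)
restored_from_pickle = pickle.loads(pickled)

# json.dumps returns a str; json.loads parses it back.
json_text = json.dumps(obj)
restored_from_json = json.loads(json_text)
```

Keep in mind that JSON only covers basic types (dicts, lists, strings, numbers, booleans and None), while pickle handles most Python objects but should only be loaded from trusted sources.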

Related

Error loading pickle file created with Python 2.7 in Python 3.8

I have a pickle file which contains floating-point values. This file was created with Python 2.7. In Python 2.7 I used to load it like:
matrix_file = pickle.load(open('matrix.pickle', 'r'))
Now in Python 3.8 this code gives the error
TypeError: a bytes-like object is required, not 'str'
When I tried with 'rb' I got this error
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 8: ordinal not in range(128)
So I tried another method
matrix_file = pickle.load(open('matrix.pickle', 'r', encoding='utf-8'))
Now I get a different error
TypeError: a bytes-like object is required, not 'str'
Update: When I try loading with joblib, I get this error
ValueError: You may be trying to read with python 3 a joblib pickle generated with python 2. This feature is not supported by joblib.
The file must be opened in binary mode and you need to provide an encoding for the pickle.load call. Typically, the encoding should either be "latin-1" (for pickles with numpy arrays, datetime, date and time objects, or when the strings were logically Latin-1), or "bytes" (to decode Python 2 str as bytes objects). So the code should be something like:
with open('matrix.pickle', 'rb') as f:
    matrix_file = pickle.load(f, encoding='latin-1')
This assumes the pickle originally contained numpy arrays; if not, "bytes" might be the more appropriate encoding. I also used a with statement just for good form (and to ensure deterministic file closing on non-CPython interpreters).
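To make the difference between the two encodings concrete, here is a small sketch. The byte string `py2_pickle` is a hand-written protocol-2 pickle of the Python 2 str '\xe6'; it stands in for the contents of a real file and is an illustrative assumption, not data from the question.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 str '\xe6'
# (illustrative stand-in for the bytes read from a real file).
py2_pickle = b"\x80\x02U\x01\xe6q\x00."

# The default encoding is ASCII, so the non-ASCII byte cannot be decoded.
try:
    pickle.loads(py2_pickle)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# 'latin-1' maps every byte to a character, so decoding always succeeds.
as_text = pickle.loads(py2_pickle, encoding="latin-1")

# 'bytes' skips decoding and returns the raw bytes object.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")
```

The same keyword argument works with pickle.load on a file opened in 'rb' mode.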

How to retrieve correct value of a UTF-8 encoded (from unicode) string from a file from Python3 which was encoded using Python2?

I am moving my application from Python2 to Python 3. The application saves configuration to a file, and one of the attributes is encoded into utf-8 before saving.
Eg: username='ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ' is saved as '\xe1\x9a\xa0\xe1\x9b....x9a\xb1' (converted data type being str)
Since this config file will be retained across the migration, I cannot decode the saved value back to unicode when I retrieve the user name, because in Python 3 a str object has no attribute decode. Ideally the saved value in the file should be treated as bytes, but Python 2 does not do that, which creates the problem.
I tried converting it into a bytes object, but that changes the whole string.
I cannot change the current application code, as it is already in production.
I tried prepending b' manually to the string, which did the trick, but that's a hack. I also tried ast.literal_eval, but that did not work either.
Pseudocode that currently works fine on Python 2 (before migrating to Python 3):
1. To save value in text file:
fp=open(filename,'w')
encoded_name=name.encode('utf-8')
fp.write(encoded_name)
fp.close()
2. To retrieve:
fp=open(filename, 'r') #or rb
encoded_name=fp.read()
fp.close()
return encoded_name.decode('utf-8')
Expected results:
Retrieved username from the config file should be treated as bytes instead of str.
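If you open the file in text mode with an explicit encoding
fp = open(filename, 'r', encoding='utf-8')
then you don't need to decode anything; fp.read() already returns a unicode string. Without the encoding argument, Python 3 falls back to the platform's default encoding, which is not always UTF-8.
But if you open the file in binary mode
fp = open(filename, 'rb')
fp.read() returns bytes, which should be decoded with encoded_name.decode('utf-8')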
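The two ways of reading can be compared side by side. This is only a sketch: the temporary file stands in for the real config file, and the short `username` value is a shortened version of the one in the question.

```python
import os
import tempfile

username = "ᚠᛇᚻ"

# Write the raw UTF-8 bytes, which is what the Python 2 code put on disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(username.encode("utf-8"))
    path = tmp.name

try:
    # Option 1: text mode with an explicit encoding returns str directly.
    with open(path, "r", encoding="utf-8") as fp:
        from_text_mode = fp.read()

    # Option 2: binary mode returns bytes, which are decoded by hand.
    with open(path, "rb") as fp:
        from_binary_mode = fp.read().decode("utf-8")
finally:
    os.remove(path)
```

Both variables end up holding the original username as a Python 3 str.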

UnicodeDecodeError when using python 2.7 code on python 3.7 with cPickle

I am trying to use cPickle on a .pkl file constructed from a "parsed" .csv file. The parsing is undertaken using a pre-constructed python toolbox, which has recently been ported to python 3 from python 2 (https://github.com/GEMScienceTools/gmpe-smtk)
The code I'm using is as follows:
from smtk.parsers.esm_flatfile_parser import ESMFlatfileParser
parser=ESMFlatfileParser.autobuild("Database10","Metadata10","C:/Python37/TestX10","C:/Python37/NorthSea_Inc_SA.csv")
import cPickle
sm_database = cPickle.load(open("C:/Python37/TestX10/metadatafile.pkl","r"))
It returns the following error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 44: character maps to <undefined>
From what I can gather, I need to specify the encoding of my .pkl file to enable cPickle to work but I do not know what the encoding is on the file produced from the parsing of the .csv file, so I can't use cPickle to currently do so.
I used Sublime Text to find that it is "hexadecimal", but this is not an accepted encoding format in Python 3.7, is it?
If anyone knows how to determine the encoding format required, or how to make hexadecimal encoding usable in Python 3.7 their help would be much appreciated.
P.s. the modules used such as "ESMFlatfileparser" are part of a pre-constructed toolbox. Considering this, is there a chance I may need to alter the encoding in some way within this module also?
The code is opening the file in text mode ('r'), but it should be binary mode ('rb').
From the documentation for pickle.load:
[The] file can be an on-disk file opened for binary reading, an io.BytesIO object, or any other custom object that meets this interface.
Since the file is being opened in binary mode there is no need to provide an encoding argument to open. It may be necessary to provide an encoding argument to pickle.load. From the same documentation:
Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects. Using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date and time pickled by Python 2.
This ought to prevent the UnicodeDecodeError:
sm_database = cPickle.load(open("C:/Python37/TestX10/metadatafile.pkl","rb"))
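The effect of the file mode can be reproduced with a throwaway file. The sketch below uses the standard pickle module (Python 3's standard library has no cPickle), and the temporary file and its contents are hypothetical stand-ins for metadatafile.pkl.

```python
import os
import pickle
import tempfile

# Create a small pickle file to experiment with (stand-in for the real .pkl file).
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump({"records": 3}, tmp)
    path = tmp.name

try:
    # Text mode: either the text decoder rejects the bytes, or pickle
    # receives str instead of bytes. Which one happens depends on the
    # platform's default encoding.
    try:
        with open(path, "r") as f:
            pickle.load(f)
        text_mode_error = None
    except (TypeError, UnicodeDecodeError) as exc:
        text_mode_error = type(exc).__name__

    # Binary mode: pickle gets the raw bytes and loads normally.
    with open(path, "rb") as f:
        sm_database = pickle.load(f)
finally:
    os.remove(path)
```

If the real file was written by Python 2, an encoding argument to pickle.load may still be needed on top of the 'rb' mode.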

How to read data serialized by python2 cPickle with python3 pickle?

I'm trying to work with CIFAR-10 dataset which contains a special version for python.
It is a set of binary files, each representing a dictionary of 10k numpy matrices. The files were obviously created by python2 cPickle.
I tried to load it from python2 as follows:
import cPickle
with open("data/data_batch_1", "rb") as f:
    data = cPickle.load(f)
This works really well. However, if I try to load the data from python3 (which has pickle instead of cPickle), it fails:
import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f)
It fails with the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)
Can I somehow transform the original dataset into a new one that will be readable from python3? Or can I somehow read it from python3 directly?
I've tried loading it by cPickle, dumping it into json and reading it back by pickle, but numpy matrices obviously can't be written as a json file.
You'll need to tell pickle what codec to use for those bytestrings, or tell it to load the data as bytes instead. From the pickle.load() documentation:
The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
To load the strings as bytes objects that'd be:
import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f, encoding='bytes')
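One consequence worth knowing: with encoding='bytes', dictionary keys that were Python 2 str objects come back as bytes. The sketch below shows this with a hand-written protocol-2 pickle of the Python 2 dict {'labels': [1, 2]}; the byte string is an illustrative assumption and not the actual dataset.

```python
import pickle

# Hand-written protocol-2 pickle of the Python 2 dict {'labels': [1, 2]}
# (illustrative stand-in for a file written by Python 2 cPickle).
py2_batch = b"\x80\x02}q\x00U\x06labelsq\x01]q\x02(K\x01K\x02es."

data = pickle.loads(py2_batch, encoding="bytes")

# Keys that were Python 2 str are now bytes, so index with a bytes literal.
labels = data[b"labels"]
```

Looking up data["labels"] with a str key would raise a KeyError here.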

Unpickling a python 2 object with python 3

I'm wondering if there is a way to load an object that was pickled in Python 2.4, with Python 3.4.
I've been running 2to3 on a large amount of company legacy code to get it up to date.
Having done this, when running the file I get the following error:
File "H:\fixers - 3.4\addressfixer - 3.4\trunk\lib\address\address_generic.py"
, line 382, in read_ref_files
d = pickle.load(open(mshelffile, 'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal
not in range(128)
Looking at the pickled object in contention, it's a dict in a dict, containing keys and values of type str.
So my question is: Is there a way to load an object, originally pickled in python 2.4, with python 3.4?
You'll have to tell pickle.load() how to convert Python 2 bytestring data to Python 3 strings, or you can tell pickle to leave them as bytes.
The default is to try and decode all string data as ASCII, and that decoding fails. See the pickle.load() documentation:
Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
Setting the encoding to latin1 allows you to import the data directly:
with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1')
but you'll need to verify that none of your strings are decoded using the wrong codec; Latin-1 works for any input as it maps the byte values 0-255 to the first 256 Unicode codepoints directly.
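A short sketch of what a wrong codec looks like, assuming (purely for illustration) that a Python 2 string held UTF-8 data: Latin-1 decodes it without any error but yields different characters. Because Latin-1 is lossless, the result can be re-encoded and decoded again with the right codec.

```python
# UTF-8 bytes for a right single quotation mark, as a Python 2 str might hold them.
raw = b"\xe2\x80\x99"

# What encoding='latin1' produces: three unrelated characters, and no error.
wrong = raw.decode("latin-1")

# Latin-1 maps bytes to code points one-to-one, so the original bytes
# can be recovered and decoded with the codec that was actually used.
repaired = wrong.encode("latin-1").decode("utf-8")
```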
The alternative would be to load the data with encoding='bytes', and decode all bytes keys and values afterwards.
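A possible way to do that decoding is sketched below. The helper `decode_bytes` is hypothetical (it is not part of pickle), and it assumes every bytes object in the structure is UTF-8 text, which would be wrong for genuinely binary values.

```python
def decode_bytes(value, encoding="utf-8"):
    """Recursively decode bytes inside dicts, lists and tuples (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {
            decode_bytes(key, encoding): decode_bytes(item, encoding)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return type(value)(decode_bytes(item, encoding) for item in value)
    return value


# Stand-in for what pickle.load(f, encoding='bytes') might return
# for a dict in a dict with str keys and values.
loaded = {b"outer": {b"caf\xc3\xa9": b"value"}}
decoded = decode_bytes(loaded)
```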
Note that in Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling of Python 2 datetime object data is broken unless you use encoding='bytes'.
Using encoding='latin1' causes some issues when your object contains numpy arrays in it.
Using encoding='bytes' will be better.
Please see this answer for complete explanation of using encoding='bytes'
