Error when retrieving saved object using pickle - python

I'm working with the MESA agent-based modelling package and using pickle to save the state of my intermediate model. But when I retrieve the saved model, execution fails with this error:
File "/home/demonwolf/PycharmProjects/pythonProject1/main.py", line 281, in <module>
empty_model = pickle.load(f)
File "/home/demonwolf/anaconda3/envs/ABM/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
Any help would be appreciated.
Thanks in advance.

The file (the f parameter in pickle.load(f)) should be opened in binary read mode ("rb"), not the default text mode ("r"):
import pickle

with open("path/to/your/pickle.bin", "rb") as f:
    empty_model = pickle.load(f)
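A minimal round-trip sketch of the fix, assuming a plain dict as a stand-in for the MESA model (the `state` contents and the temporary path are hypothetical). pickle writes bytes, so the file is written with "wb" and read back with "rb". The last part opens the same file in text mode with an explicit UTF-8 encoding to show that this reproduces the error from the question.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the model state; any picklable object works
state = {"step": 42, "agent_ids": [1, 2, 3]}
path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# pickle produces bytes, so write in binary mode...
with open(path, "wb") as f:
    pickle.dump(state, f)

# ...and read back in binary mode
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored == state)  # True

# Text mode decodes the pickle bytes as UTF-8 before pickle sees them;
# a protocol 2+ pickle starts with byte 0x80, which is not valid UTF-8
text_mode_error = None
try:
    with open(path, "r", encoding="utf-8") as f:
        pickle.load(f)
except UnicodeDecodeError as exc:
    text_mode_error = exc
print(text_mode_error)  # same UnicodeDecodeError as in the question
```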

Related

Huggingface Electra - Load model trained with google implementation error: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte

I have trained an ELECTRA model from scratch using Google's implementation code:
python run_pretraining.py --data-dir gc://bucket-electra/dataset/ --model-name greek_electra --hparams hparams.json
with these JSON hyperparameters:
{
"embedding_size": 768,
"max_seq_length": 512,
"train_batch_size": 128,
"vocab_size": 100000,
"model_size": "base",
"num_train_steps": 1500000
}
After training the model, I used the convert_electra_original_tf_checkpoint_to_pytorch.py script from the transformers library to convert the checkpoint:
python convert_electra_original_tf_checkpoint_to_pytorch.py --tf_checkpoint_path output/models/transformer/greek_electra --config_file resources/hparams.json --pytorch_dump_path output/models/transformer/discriminator --discriminator_or_generator "discriminator"
Now I am trying to load the model:
from transformers import ElectraForPreTraining
model = ElectraForPreTraining.from_pretrained('discriminator')
but I get the following error:
Traceback (most recent call last):
File "~/.local/lib/python3.9/site-packages/transformers/configuration_utils.py", line 427, in get_config_dict
config_dict = cls._dict_from_json_file(resolved_config_file)
File "~/.local/lib/python3.9/site-packages/transformers/configuration_utils.py", line 510, in _dict_from_json_file
text = reader.read()
File "/usr/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
Any ideas what's causing this and how to solve it?
It seems that @npit is right. The output of convert_electra_original_tf_checkpoint_to_pytorch.py does not contain the configuration I provided (hparams.json), so I created an ElectraConfig object with the same parameters and passed it to from_pretrained. That solved the issue.
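A sketch of that workaround. It assumes the `model_size` presets below mirror the BERT-style sizes used by Google's ELECTRA code (hidden size 768 with 12 layers for "base", attention heads of width 64, feed-forward size four times the hidden size) and that `max_seq_length` maps to `max_position_embeddings`; check both against your checkpoint. `SIZE_PRESETS`, `electra_config_kwargs` and `load_discriminator` are hypothetical helper names. `ElectraConfig` and the `config=` argument of `from_pretrained` are transformers APIs; passing a config object means `from_pretrained` does not try to parse a `config.json` (or, judging by the traceback, the binary file at the given path) as JSON text.

```python
# Assumed per-size presets, mirroring the BERT-style sizes in ELECTRA's code;
# verify these against the checkpoint you trained.
SIZE_PRESETS = {
    "small": {"hidden_size": 256, "num_hidden_layers": 12},
    "base": {"hidden_size": 768, "num_hidden_layers": 12},
    "large": {"hidden_size": 1024, "num_hidden_layers": 24},
}


def electra_config_kwargs(hparams):
    """Translate the pretraining hparams.json dict into ElectraConfig kwargs."""
    preset = SIZE_PRESETS[hparams.get("model_size", "small")]
    hidden = preset["hidden_size"]
    return {
        "vocab_size": hparams["vocab_size"],
        "embedding_size": hparams["embedding_size"],
        "hidden_size": hidden,
        "num_hidden_layers": preset["num_hidden_layers"],
        "num_attention_heads": max(1, hidden // 64),
        "intermediate_size": 4 * hidden,
        # Assumption: the sequence length used in pretraining is the max position
        "max_position_embeddings": hparams.get("max_seq_length", 512),
    }


def load_discriminator(weights_path, hparams):
    # Imported lazily so the mapping above works without transformers installed
    from transformers import ElectraConfig, ElectraForPreTraining

    config = ElectraConfig(**electra_config_kwargs(hparams))
    # With config= given, from_pretrained skips loading a JSON config from disk
    return ElectraForPreTraining.from_pretrained(weights_path, config=config)


# The hparams.json from the question, as a dict
hparams = {
    "embedding_size": 768,
    "max_seq_length": 512,
    "train_batch_size": 128,
    "vocab_size": 100000,
    "model_size": "base",
    "num_train_steps": 1500000,
}
print(electra_config_kwargs(hparams))
```

With transformers installed, `load_discriminator("output/models/transformer/discriminator", hparams)` would then load the converted weights with an explicit configuration.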

Want to upload a sqlite.db file to a swift container using python swiftclient and always get a utf-8 error

I am trying to upload a sqlite.db file (a binary file) to a Swift container using swiftclient in my Python code, but I always get this error:
swift_conn.put_object
File "/usr/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 43: invalid start byte
The code I am using is:
import swiftclient

# swift_conn and container_name are created elsewhere (not shown)
bmdatabase = "./logs/test.db"
with open(bmdatabase, 'r') as bmdatabase_file:
    # remote
    correctbmdatabasename = bmdatabase.replace("./logs/", "")
    swift_conn.put_object(container_name, correctbmdatabasename,
                          contents=bmdatabase_file.read())
I finally figured it out myself: to read a binary file, I have to open it with 'rb', like this:
import swiftclient

# swift_conn and container_name are created elsewhere (not shown)
bmdatabase = "./logs/test.db"
with open(bmdatabase, 'rb') as bmdatabase_file:
    # remote
    correctbmdatabasename = bmdatabase.replace("./logs/", "")
    swift_conn.put_object(container_name, correctbmdatabasename,
                          contents=bmdatabase_file.read())
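A self-contained sketch of the same idea that runs without a Swift cluster. `upload_database` and `FakeConnection` are hypothetical names; the fake only records what `put_object(container, obj, contents=...)` receives, which is the argument order of `swiftclient.client.Connection.put_object`. It uses `os.path.basename` instead of the string replace to derive the object name, a design choice rather than part of the original answer.

```python
import os
import sqlite3
import tempfile


def upload_database(conn, container_name, db_path):
    """Upload a SQLite file to a Swift container as raw bytes.

    `conn` is assumed to be a swiftclient.client.Connection, or anything
    with a compatible put_object method.
    """
    object_name = os.path.basename(db_path)  # "./logs/test.db" -> "test.db"
    # "rb" returns bytes, so no UTF-8 decoding is attempted
    with open(db_path, "rb") as db_file:
        data = db_file.read()
    conn.put_object(container_name, object_name, contents=data)
    return object_name, data


class FakeConnection:
    """Hypothetical stand-in that records uploads instead of sending them."""

    def __init__(self):
        self.uploaded = {}

    def put_object(self, container, obj, contents=None):
        self.uploaded[(container, obj)] = contents


# Build a small SQLite file to upload
db_path = os.path.join(tempfile.mkdtemp(), "test.db")
db = sqlite3.connect(db_path)
db.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")
db.execute("INSERT INTO logs (message) VALUES ('hello')")
db.commit()
db.close()

conn = FakeConnection()
name, data = upload_database(conn, "my-container", db_path)
print(name, data[:16])  # test.db b'SQLite format 3\x00'
```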

how to set proper encoding for json.load

I have been trying to load json this way:
data = json.load(f)
For some reason, that JSON file is encoded in Windows-1251, so opening it causes an error:
File "./labelme2voc.py", line 252, in main
data = json.load(f)
File "/home/dex/anaconda3/lib/python3.6/json/__init__.py", line 296, in load
return loads(fp.read(),
File "/home/dex/anaconda3/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 81: invalid continuation byte
How can I fix that? json.load doesn't have an option for specifying the encoding.
Try this (Windows-1251 is the cp1251 codec in Python):
import json

filename = ...  # specify filename here
with open(filename, encoding='cp1251') as f:
    data = json.load(f)
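A small demonstration, assuming a hypothetical `labels.json` holding Cyrillic text: the encoding belongs to `open()`, not to `json.load`, which only parses the already-decoded text. Decoding the same bytes as UTF-8 reproduces the error from the question.

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "labels.json")
# Write a Windows-1251 encoded JSON file (hypothetical Cyrillic content)
with open(path, "w", encoding="cp1251") as f:
    json.dump({"label": "собака"}, f, ensure_ascii=False)

# Decoding with the wrong codec fails just like in the question
utf8_failed = False
try:
    with open(path, encoding="utf-8") as f:
        json.load(f)
except UnicodeDecodeError:
    utf8_failed = True

# Passing the right encoding to open() fixes it; json.load needs no option
with open(path, encoding="cp1251") as f:
    data = json.load(f)
print(data["label"])  # собака
```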

UnicodeDecodeError, utf-8 invalid continuation byte

I'm trying to extract lines from a log file using this code:
with open('fichier.01') as f:
    content = f.readlines()
print(content)
but it always raises this error:
Traceback (most recent call last):
File "./parsepy", line 4, in <module>
content = f.readlines()
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2213: invalid continuation byte
How can I fix it?
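Try one of the following:
open('fichier.01', 'rb')
open('fichier.01', encoding='utf-8')
open('fichier.01', encoding='ISO-8859-1')
Or you can use the io module (in Python 3, io.open is the same function as the built-in open, so this only makes a difference on Python 2):
import io
io.open('fichier.01')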
This is a common error when opening files in Python (or really any language), and one you will soon learn to catch.
If the file is not text, you will have to open it in binary mode, e.g.:
with open('fichier.01', 'rb') as f:
    content = f.readlines()
If it's encoded as something other than UTF-8 and can be opened in text mode, open takes an encoding argument: https://docs.python.org/3.5/library/functions.html#open
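A sketch of both approaches, assuming (since byte 0xE9 is "é" in ISO-8859-1) that the log was written in Latin-1; the sample content is hypothetical. Text mode with the right encoding yields `str` lines, while binary mode yields `bytes` lines that you decode yourself.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fichier.01")
# Hypothetical log line where "é" is stored as the single Latin-1 byte 0xE9
with open(path, "wb") as f:
    f.write("Opération terminée\n".encode("latin-1"))

# 1. Text mode with the encoding the file was actually written in
with open(path, encoding="latin-1") as f:
    lines = f.readlines()
print(lines)  # ['Opération terminée\n']

# 2. Binary mode: lines come back as bytes and are decoded explicitly
with open(path, "rb") as f:
    raw_lines = f.readlines()
print(raw_lines[0].decode("latin-1") == lines[0])  # True
```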
You can also tell open to skip the bytes it cannot decode:
with open('fichier.01', errors='ignore') as f:
    content = f.readlines()
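`errors="ignore"` silences the exception by dropping the undecodable bytes, so characters are lost without warning; `errors="replace"` at least leaves a U+FFFD marker where each bad byte was. A quick comparison on a hypothetical Latin-1 line:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "fichier.01")
with open(path, "wb") as f:
    f.write("Opération\n".encode("latin-1"))  # hypothetical Latin-1 content

# errors="ignore" silently drops the byte it cannot decode
with open(path, encoding="utf-8", errors="ignore") as f:
    ignored = f.readlines()

# errors="replace" keeps a visible U+FFFD marker in its place
with open(path, encoding="utf-8", errors="replace") as f:
    replaced = f.readlines()

print(ignored)  # ['Opration\n']
print(replaced)  # the "é" position now holds U+FFFD
```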

UnicodeDecodeError: invalid start byte

I have a quick question about UnicodeDecodeError: invalid start byte.
I think my text contains a non-UTF-8 character somewhere, but the error points to the line where I start reading the file, so I have no idea how to fix it.
If you have any suggestions, just let me know.
Here is the error message Python returns:
for line in fi:
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 313, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
Here is my code:
import os

for filename in os.listdir(readDir):
    filename = os.path.join(readDir, filename)
    for keyword in keywords:
        outFileName = os.path.join(sortDir, keyword)
        outFileName = outFileName + '.csv'
        with open(filename, 'r') as fi, open(outFileName, "a") as fo:
            for line in fi:
                ...  # loop body not shown in the question
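I had the same issue, and after searching for a while this is what I did (Python 2 only: sys.setdefaultencoding does not exist in Python 3, and in Python 2 it is only available after reload(sys)):
import sys
reload(sys)  # Python 2 only: re-exposes sys.setdefaultencoding
# Set default encoder
sys.setdefaultencoding("ISO-8859-1")
# Then convert string to UTF-8
yourString.encode('utf-8').strip()
I hope it will be useful to someone.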
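Since the traceback only points at the `for line in fi:` loop, it can help to find which file and line actually contain the offending byte before deciding how to open it. A minimal sketch, assuming UTF-8 is the expected encoding (`find_undecodable` and the sample files are hypothetical): it reads each file as bytes and uses the `start`/`end` attributes of `UnicodeDecodeError` to report the line number and the bad bytes.

```python
import os
import tempfile


def find_undecodable(path, encoding="utf-8"):
    """Return (line_number, byte_offset, bad_bytes) for the first decoding error, or None."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Count newlines before the bad byte to get a 1-based line number
        line_number = data.count(b"\n", 0, exc.start) + 1
        return line_number, exc.start, data[exc.start:exc.end]
    return None


# Hypothetical directory with one clean file and one containing a stray 0x80 byte
read_dir = tempfile.mkdtemp()
with open(os.path.join(read_dir, "good.txt"), "wb") as f:
    f.write(b"plain ascii\n")
with open(os.path.join(read_dir, "bad.txt"), "wb") as f:
    f.write(b"line one\nprice: \x80 5\n")

for name in sorted(os.listdir(read_dir)):
    print(name, find_undecodable(os.path.join(read_dir, name)))
# bad.txt (2, 16, b'\x80')
# good.txt None
```

Once the culprit is known, it can be opened with its real encoding (for example `encoding="cp1252"`, where 0x80 is "€") or skipped if it is not a text file at all.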
