NonMatchingSplitsSizesError loading huggingface BookCorpus - python

I want to load BookCorpus like this:
train_ds, test_ds = load_dataset('bookcorpus', split=['train', 'test'])
However, I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/marcelbraasch/.local/lib/python3.8/site-packages/datasets/load.py", line 1627, in load_dataset
builder_instance.download_and_prepare(
File "/home/marcelbraasch/.local/lib/python3.8/site-packages/datasets/builder.py", line 607, in download_and_prepare
self._download_and_prepare(
File "/home/marcelbraasch/.local/lib/python3.8/site-packages/datasets/builder.py", line 709, in _download_and_prepare
verify_splits(self.info.splits, split_dict)
File "/home/marcelbraasch/.local/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 74, in verify_splits
raise NonMatchingSplitsSizesError(str(bad_splits))
datasets.utils.info_utils.NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=4853859824, num_examples=74004228, dataset_name='bookcorpus'), 'recorded': SplitInfo(name='train', num_bytes=2982081448, num_examples=45726619, dataset_name='bookcorpus')}]
I want to save this to disk afterwards, as I don't want to download it every time I use it. What causes this error?

BookCorpus is no longer publicly available.
Here is a workaround:
https://github.com/soskek/bookcorpus
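The error itself comes from the split-size verification step: the sizes recorded in the dataset's metadata no longer match what can actually be downloaded. If you just want to work with whatever copy you can still obtain and cache it locally, here is a minimal sketch (assuming the ignore_verifications flag of the datasets version shown in the traceback; newer releases use verification_mode instead):
from datasets import load_dataset, load_from_disk

# Skip the split-size/checksum verification that raises
# NonMatchingSplitsSizesError (bookcorpus only ships a 'train' split).
ds = load_dataset('bookcorpus', split='train', ignore_verifications=True)

# Cache to disk so later runs don't re-download anything.
ds.save_to_disk('bookcorpus_train')
ds = load_from_disk('bookcorpus_train')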

Related

raise ImageFileError(f'Cannot work out file type of "{filename}"') when working with nnunet

I'm working with the nnU-Net network on a custom dataset.
Everything works fine: I've created the new Task 88 and launched the
!nnUNet_plan_and_preprocess -t 88 --verify_dataset_integrity
command. All the training cases and labels are checked, but after that it shows this error:
Verifying test set
Traceback (most recent call last):
File "/home/viberti/miniconda3/bin/nnUNet_plan_and_preprocess", line 8, in <module>
sys.exit(main())
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/experiment_planning/nnUNet_plan_and_preprocess.py", line 105, in main
verify_dataset_integrity(join(nnUNet_raw_data, task_name))
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/preprocessing/sanity_checks.py", line 223, in verify_dataset_integrity
all_same, unique_orientations = verify_all_same_orientation(join(folder, "imagesTr"))
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nnunet/preprocessing/sanity_checks.py", line 34, in verify_all_same_orientation
img = nib.load(n)
File "/home/viberti/miniconda3/lib/python3.9/site-packages/nibabel/loadsave.py", line 55, in load
raise ImageFileError(f'Cannot work out file type of "{filename}"')
nibabel.filebasedimages.ImageFileError: Cannot work out file type of "/home/viberti/nnUNet_raw_data_base/nnUNet_raw_data/Task088_BraTS2020/imagesTr/BRATS_001_0000.nii.gz"
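nibabel raises this error when it cannot match a file's extension and header to a format it knows, which usually means the .nii.gz file is misnamed, truncated, or not actually gzipped. A quick diagnostic sketch (not part of nnU-Net) for the failing file:
import gzip
import nibabel as nib

path = ("/home/viberti/nnUNet_raw_data_base/nnUNet_raw_data/"
        "Task088_BraTS2020/imagesTr/BRATS_001_0000.nii.gz")

# A real .nii.gz starts with the gzip magic bytes 0x1f 0x8b.
with open(path, "rb") as f:
    print("gzip magic ok:", f.read(2) == b"\x1f\x8b")

# If the magic bytes look right, try decompressing and loading directly.
with gzip.open(path, "rb") as f:
    print("first decompressed bytes:", f.read(4))
print(nib.load(path).shape)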

Getting an error while training Resnet50 on Imagenet at 14th Epoch

I am training ResNet50 on ImageNet using the script provided by PyTorch (with a slight, trivial tweak for my purposes). However, I am getting the following error after 14 epochs of training. I have allocated 4 GPUs on the server I'm using to run this. Any pointers as to what this error is about would be appreciated. Thanks a lot!
Epoch: [14][5000/5005] Time 1.910 (2.018) Data 0.000 (0.191) Loss 2.6954 (2.7783) Total 2.6954 (2.7783) Reg 0.0000 Prec#1 42.969 (40.556) Prec#5 64.844 (65.368)
Test: [0/196] Time 86.722 (86.722) Loss 1.9551 (1.9551) Prec#1 51.562 (51.562) Prec#5 81.641 (81.641)
Traceback (most recent call last):
File "main_group.py", line 549, in <module>
File "main_group.py", line 256, in main
File "main_group.py", line 466, in validate
if args.gpu is not None:
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
return self._process_data(data)
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
OSError: Caught OSError in DataLoader worker process 11.
Original Traceback (most recent call last):
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 138, in __getitem__
sample = self.loader(path)
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 174, in default_loader
return pil_loader(path)
File "/home/users/oiler/anaconda3/envs/ml/lib/python3.7/site-packages/torchvision/datasets/folder.py", line 155, in pil_loader
with open(path, 'rb') as f:
OSError: [Errno 5] Input/output error: '/data/users2/oiler/github/imagenet-data/val/n02102973/ILSVRC2012_val_00009130.JPEG'
It is difficult to tell what the problem is just by looking at the error you have posted.
All we know is that there was an issue reading the file at '/data/users2/oiler/github/imagenet-data/val/n02102973/ILSVRC2012_val_00009130.JPEG'.
Try the following (see the sketch after this list):
Confirm the file actually exists.
Confirm that it is in fact a valid JPEG and not corrupted (by viewing it).
Confirm that you can open it with Python and also load it manually with PIL.
If none of that works, try deleting the file. Do you get the same error on another file in the folder?
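A minimal sketch of those checks, using the path from the traceback:
import os
from PIL import Image

path = '/data/users2/oiler/github/imagenet-data/val/n02102973/ILSVRC2012_val_00009130.JPEG'

print("exists:", os.path.exists(path))

# Read the raw bytes; an I/O error here points at the disk or filesystem,
# not at the image format.
with open(path, 'rb') as f:
    data = f.read()
print("size:", len(data), "bytes")

# Let PIL parse it; verify() raises if the JPEG is truncated or corrupted.
img = Image.open(path)
img.verify()
print("format looks ok:", img.format)
Note that Errno 5 is an OS-level I/O error, so if the raw read above fails the same way, the problem is the storage (e.g. a flaky network mount) rather than the JPEG itself.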

tensorflow_hub throwing this error: 'SentencepieceOp' when loading the link

I am trying to run the following line of code in PyCharm, and I have tensorflow_hub installed and imported.
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")
Any suggestions for the error below? I need this for my project.
Traceback (most recent call last):
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3820, in _get_op_def
return self._op_def_cache[type]
KeyError: 'SentencepieceOp'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Jon10/OneDrive/Documents/Computer Science/Dissertation/PythonPractice/TFTest/test.py", line 28, in <module>
use = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_hub\module_v2.py", line 102, in load
obj = tf_v1.saved_model.load_v2(module_path, tags=tags)
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\saved_model\load.py", line 517, in load
return load_internal(export_dir, tags)
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\saved_model\load.py", line 541, in load_internal
export_dir)
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\saved_model\load.py", line 114, in __init__
meta_graph.graph_def.library))
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\saved_model\function_deserialization.py", line 312, in load_function_def_library
copy, copy_functions=False)
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\framework\function_def_to_graph.py", line 61, in function_def_to_graph
fdef, input_shapes, copy_functions)
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\framework\function_def_to_graph.py", line 214, in function_def_to_graph_def
op_def = ops.get_default_graph()._get_op_def(node_def.op) # pylint: disable=protected-access
File "C:\Users\Jon10\miniconda3\envs\tensorflow\lib\site-packages\tensorflow_core\python\framework\ops.py", line 3824, in _get_op_def
c_api.TF_GraphGetOpDef(self._c_graph, compat.as_bytes(type), buf)
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'SentencepieceOp' in binary running on DESKTOP-..... Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
You need to install tensorflow_text and import it before calling hub.load.
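Importing tensorflow_text is what registers the SentencePiece ops that this model's graph references. A minimal sketch (assuming a tensorflow_text version that matches your installed TensorFlow):
# pip install tensorflow_text
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- the import registers the SentencePiece ops

use = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3")
embeddings = use(["Hello, world!"])
print(embeddings.shape)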

Error exporting inference graph (ValueError)

So I'm following sentdex's object detection tutorial and I have gotten to the step where you are supposed to export the inference graph. I'm using the "export_inference_graph.py" script from Tensorflow's object_detection folder.
The problem is that I'm getting this ValueError:
Traceback (most recent call last):
File "C:\Users\Zelcore-Dator\AppData\Local\Programs\Python\Python35\lib\site-packages\google\proto
buf\internal\python_message.py", line 545, in _GetFieldByName
return message_descriptor.fields_by_name[field_name]
KeyError: 'layout_optimizer'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "export_inference_graph.py", line 119, in <module>
tf.app.run()
File "C:\Users\Zelcore-Dator\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\p
ython\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "export_inference_graph.py", line 115, in main
FLAGS.output_directory, input_shape)
File "C:\Users\Zelcore-Dator\AppData\Local\Programs\Python\Python35\lib\site-packages\object_detec
tion-0.1-py3.5.egg\object_detection\exporter.py", line 427, in export_inference_graph
input_shape, optimize_graph, output_collection_name)
File "C:\Users\Zelcore-Dator\AppData\Local\Programs\Python\Python35\lib\site-packages\object_detec
tion-0.1-py3.5.egg\object_detection\exporter.py", line 391, in _export_inference_graph
initializer_nodes='')
File "C:\Users\Zelcore-Dator\AppData\Local\Programs\Python\Python35\lib\site-packages\object_detec
tion-0.1-py3.5.egg\object_detection\exporter.py", line 72, in freeze_graph_with_def_protos
layout_optimizer=rewriter_config_pb2.RewriterConfig.ON)
File "C:\Users\Zelcore-Dator\AppData\Local\Programs\Python\Python35\lib\site-packages\google\proto
buf\internal\python_message.py", line 484, in init
field = _GetFieldByName(message_descriptor, field_name)
File "C:\Users\Zelcore-Dator\AppData\Local\Programs\Python\Python35\lib\site-packages\google\proto
buf\internal\python_message.py", line 548, in _GetFieldByName
(message_descriptor.name, field_name))
ValueError: Protocol message RewriterConfig has no "layout_optimizer" field.
I'm guessing that it has something to do with protobuf, but I've reinstalled it several times already with no success.
All help appreciated.
Happened to me too; it didn't happen a few weeks ago.
Until the bug is fixed, you could use an earlier version that still works.
Replace line 72 in 'object_detection/exporter.py':
layout_optimizer=rewriter_config_pb2.RewriterConfig.ON)
with the old and working line:
optimize_tensor_layout=True)
I used:
rewrite_options = rewriter_config_pb2.RewriterConfig(optimize_tensor_layout=True)
but kept running into the same issue until I went and reran
python setup.py install
from my "research" folder. Then I was able to get everything to work.
Remove optimize_tensor_layout=rewriter_config_pb2.RewriterConfig.ON. Change line 71 in exporter.py from:
rewrite_options = rewriter_config_pb2.RewriterConfig(optimize_tensor_layout=rewriter_config_pb2.RewriterConfig.ON)
to:
rewrite_options = rewriter_config_pb2.RewriterConfig()
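The underlying cause is a version mismatch: the exporter script sets a RewriterConfig field that the installed TensorFlow's protobuf definitions don't have yet. A quick diagnostic sketch to list the fields your installation actually provides:
from tensorflow.core.protobuf import rewriter_config_pb2

# If 'layout_optimizer' is missing from this list, the installed TensorFlow
# predates the field that exporter.py is trying to set.
print([f.name for f in rewriter_config_pb2.RewriterConfig.DESCRIPTOR.fields])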

Type error on unpickle

I have a dictionary that's pickled into a file "offsets.pickle". When I try to unpickle it with
import pickle
a = open('offsets.pickle')
b = pickle.load(a)
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/global/python-2.7.3-intel/lib/python2.7/pickle.py", line 1378, in load
return Unpickler(file).load()
File "/usr/global/python-2.7.3-intel/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/global/python-2.7.3-intel/lib/python2.7/pickle.py", line 1198, in load_setitem
dict[key] = value
TypeError: 'str' object does not support item assignment
My code runs nightly and had been working fine for a while when suddenly this started to happen. Is it possible to recover what is stored in the file?
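One thing worth ruling out: pickle streams are binary, and the code above opens the file in text mode. A minimal sketch of the standard idiom:
import pickle

# Always open pickle files in binary mode ('rb').
with open('offsets.pickle', 'rb') as f:
    offsets = pickle.load(f)
This won't repair a file whose contents were actually overwritten or truncated, but it rules out a text-mode read as the cause of the TypeError.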
