'Magic number mismatch' error when loading mnist dataset - python

I'm trying to load the MNIST digit dataset and routinely get this error. I can't find any solutions online.
This code:
from mnist import MNIST
m = MNIST(path)
x_train, y_train = m.load_training()
Yields this error:
File "<stdin>", line 1, in <module>
File "C:\Python38\lib\site-packages\mnist\loader.py", line 125, in load_training
ims, labels = self.load(os.path.join(self.path, self.train_img_fname),
File "C:\Python38\lib\site-packages\mnist\loader.py", line 250, in load
raise ValueError('Magic number mismatch, expected 2049,'
ValueError: Magic number mismatch, expected 2049,got 529205256
I'm running python-mnist 0.7.

I've just had the exact same error and fixed it by renaming the files:
First you download the data as gzip files. They should look like this: "train-labels-idx1-ubyte.gz".
Then you extract these files.
After extraction you will probably get files named like "train-labels.idx1-ubyte". The problem is that python-mnist expects the file to be named "train-labels-idx1-ubyte" (a hyphen instead of the dot).
If you rename the files like that it should work; at least that's what worked for me.
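The renaming step above can be scripted. A minimal sketch (the function name and the ".idx" → "-idx" substitution are mine; it assumes the extracted files sit directly in the dataset directory):

```python
import os

def fix_mnist_filenames(path):
    """Rename extracted MNIST files from e.g. 'train-labels.idx1-ubyte'
    to 'train-labels-idx1-ubyte' (dot -> hyphen), the naming that
    python-mnist expects. Returns the list of new file names."""
    renamed = []
    for name in os.listdir(path):
        if ".idx" in name:
            new_name = name.replace(".idx", "-idx")
            os.rename(os.path.join(path, name),
                      os.path.join(path, new_name))
            renamed.append(new_name)
    return renamed
```

Run it once on the dataset directory before calling `MNIST(path)`.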

Related

IndexError: index 89 is out of bounds for axis 0 with size 89

I am getting this error but I don't understand how to solve it. Can anyone please help? I am trying to run a pre-implemented project on my own text dataset. I am using the CUB 200-2011 bird dataset, which originally has 11788 bird images; the author's model was trained and tested with all 11788. But for some reason many images were missing when I downloaded the dataset, so in total I have only 2200 images, 1629 of them for training. (You can see this in the error as well: the model reads 11788 image filenames from somewhere but finds only 1629.)
After running the training file:
python3 bird_01_pretrain.py
Namespace(audio_model='Davenet', batch_size=128, cfg_file='cfg/Pretrain/bird_train.yml', data_path='data/birds', exp_dir='', gpu_id=0, image_model='VGG16', img_size=256, lr=0.001, lr_decay=50, manualSeed=200, margin=1.0, momentum=0.9, n_epochs=120, n_print_steps=2, optim='adam', pretrained_image_model=False, resume=True, rnn_type='GRU', save_root='outputs/pre_train/birds', simtype='MISA', tasks='extraction', weight_decay=0.001)
Total filenames: 11788 001.Black_footed_Albatross/Black_Footed_Albatross_0046_18.jpg
Load filenames from: data/birds/train/filenames.pickle (1629)
Traceback (most recent call last):
File "bird_01_pretrain.py", line 145, in <module>
dataset = SpeechDataset(cfg.DATA_DIR, 'train',
File "/opt/app/dataset/datasets_pre.py", line 358, in __init__
seq_labels[unique_id[i]-1]=i
IndexError: index 89 is out of bounds for axis 0 with size 89
Code:
# calculate the sequence label for the whole dataset
if cfg.DATASET_NAME == 'birds':
    if self.split == 'train':
        unique_id = np.unique(self.class_id)
        seq_labels = np.zeros(cfg.DATASET_ALL_CLSS_NUM)
        for i in range(cfg.DATASET_TRAIN_CLSS_NUM):
            seq_labels[unique_id[i]-1] = i
        self.labels = seq_labels[np.array(self.class_id)-1]
I am getting an error at this line: seq_labels[unique_id[i]-1]=i
The datasets_pre.py file has around 400 lines of code and calls many other modules as well. I'd be happy to share the whole code and other files if anyone wants to see them, but for now I am trying to give the exact piece of code that causes the error.
Please help me :)
I can also provide the author open github repo link if anyone wants to look into deeper.
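A minimal sketch of why this index goes out of bounds, with assumed numbers: the loop runs up to the configured number of training classes, but `unique_id` only contains the classes actually present in the partial download, so indexing `unique_id[i]` fails once `i` passes its length. The config values below are hypothetical stand-ins mirroring the report:

```python
import numpy as np

DATASET_TRAIN_CLSS_NUM = 150                 # assumed config value (CUB split)
class_id = np.repeat(np.arange(1, 90), 18)   # only 89 distinct classes survived
                                             # the partial download

unique_id = np.unique(class_id)              # shape (89,)
seq_labels = np.zeros(200)                   # DATASET_ALL_CLSS_NUM for CUB-200

msg = ""
try:
    for i in range(DATASET_TRAIN_CLSS_NUM):
        seq_labels[unique_id[i] - 1] = i     # unique_id[89] raises IndexError
except IndexError as e:
    msg = str(e)
print(msg)  # index 89 is out of bounds for axis 0 with size 89
```

This suggests the fix is getting the complete dataset (or adjusting the class counts in the config to match what is actually on disk), not changing this loop.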

How to implement a custom dataset to pytorch project

I’d like to train a NN on a given dataset (all images including some kind of object, for example a dog); after training, the NN should help me classify my images (downloaded from Instagram) as “image includes a dog (with probability 0.XX)” or “image doesn’t include a dog (with probability 0.XX)”.
Obviously the Instagram images do not all have the same size (though they all have the same format (.jpg) due to filtering), and the images from my dataset do not all have the same size either.
While testing, I'm getting this Error:
Traceback (most recent call last):
File "/venv/nn.py", line 129, in <module>
train(model=globalModel, hardware=hw, train_loader=loader, optimizer=optimizer, epoch=1)
File "/venv/nn.py", line 74, in train
for batch_idx, (data, target) in enumerate(train_loader):
File "\venv\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "\venv\lib\site-packages\torch\utils\data\dataloader.py", line 384, in _next_data
index = self._next_index() # may raise StopIteration
File "\venv\lib\site-packages\torch\utils\data\dataloader.py", line 339, in _next_index
return next(self._sampler_iter) # may raise StopIteration
File "\venv\lib\site-packages\torch\utils\data\sampler.py", line 200, in __iter__
for idx in self.sampler:
File "\venv\lib\site-packages\torch\utils\data\sampler.py", line 62, in __iter__
return iter(range(len(self.data_source)))
TypeError: object of type 'type' has no len()
with this code: https://pastebin.com/DcvbeMcq
Does anyone know how to implement a custom dataset right?
It looks like the problem is that you're passing the class type itself, not an instance of customDataset.
Try changing your loader creation code to
loader = torch.utils.data.DataLoader(customDataset(), batch_size=4)
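For reference, a minimal map-style dataset only needs `__len__` and `__getitem__`. A sketch with dummy tensors standing in for real images (the class and field names are illustrative, not from the asker's code):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    """Minimal map-style dataset: __len__ and __getitem__ are the
    only two methods DataLoader requires."""
    def __init__(self):
        self.data = torch.randn(8, 3, 32, 32)          # 8 dummy RGB images
        self.targets = torch.zeros(8, dtype=torch.long)  # 8 dummy labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]

# Note the parentheses: pass an *instance*, not the class itself.
loader = DataLoader(CustomDataset(), batch_size=4)
for data, target in loader:
    print(data.shape)  # torch.Size([4, 3, 32, 32])
```

Passing `CustomDataset` without the parentheses reproduces the `object of type 'type' has no len()` error, because the sampler calls `len()` on the class object rather than on a dataset instance.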
I fixed all previous bugs and errors. At the moment I'm trying to Label my Input data manually via PyTorch:
train_data = torchvision.datasets.ImageFolder(root=TRAIN_DATA_PATH, transform=TRANSFORM)
In the Folder TRAIN_DATA_PATH are many pictures of, for example, dogs.
How can I manually label them all as "dogs" ?
I tried wrapping the train data in a DataLoader to label them, but it hasn't worked so far.
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
Since I just want to get my evaluation data predicted as "dog (1)" or "not dog (0)", I have to label all of my dog images as "1" or "dog".
But how can I do that?
Thanks to every reader!
Updated code (testing level): https://hastebin.com/axuvupihed.py

Error when Converting Node Type Slice with onnx_coreml

I have converted my darknet YOLOv3-SPP model into a PyTorch .pt model. I then converted the .pt to a .onnx. My end goal is to get to a CoreML model. I tried to use this GitHub repository. However when converting my model I am getting an error like this...
...
145/229: Converting Node Type LeakyRelu
146/229: Converting Node Type Conv
147/229: Converting Node Type Reshape
148/229: Converting Node Type Transpose
149/229: Converting Node Type Reshape
150/229: Converting Node Type Slice
Traceback (most recent call last):
File "convert2.py", line 11, in <module>
coreml_model = convert(model_proto, image_input_names=['inputImage'], image_output_names=['outputImage'], minimum_ios_deployment_target='13')
File "/usr/local/lib/python3.6/dist-packages/onnx_coreml/converter.py", line 626, in convert
_convert_node_nd(builder, node, graph, err)
File "/usr/local/lib/python3.6/dist-packages/onnx_coreml/_operators_nd.py", line 2387, in _convert_node_nd
return converter_fn(builder, node, graph, err)
File "/usr/local/lib/python3.6/dist-packages/onnx_coreml/_operators_nd.py", line 2011, in _convert_slice
end_masks=end_masks
File "/usr/local/lib/python3.6/dist-packages/coremltools/models/neural_network/builder.py", line 4220, in add_slice_static
assert len(strides) == rank
AssertionError
The script I am using is this...
import sys
from onnx import onnx_pb
from onnx_coreml import convert
model_in = sys.argv[1]
model_out = sys.argv[2]
model_file = open(model_in, 'rb')
model_proto = onnx_pb.ModelProto()
model_proto.ParseFromString(model_file.read())
coreml_model = convert(model_proto, image_input_names=['inputImage'], image_output_names=['outputImage'], minimum_ios_deployment_target='13')
coreml_model.save(model_out)
This simple python script should work, but I don't know why I am getting this error. I am very new to Machine Learning, so I do not understand how I can even begin to try to solve this issue. What should I do in order to convert my .onnx model to CoreML successfully?
It looks like a rank mismatch between the input tensor's rank and the Slice parameters.
Could you please file a bug at onnx-coreml?
As #matthijs-hollemans commented, try installing the latest onnx-coreml:
pip install onnx-coreml==1.2
A few other concerns:
What opset version is your ONNX model using? With opset 9 the Slice operator's behavior changed, and that could be a potential failure point in the converter.
Could you please attach the ONNX model as well?

Facing issue while loading the pre-trained model

I've trained my model using Google Colab and saved it as model.pkl. When I try to load the model on my laptop, it throws the error below:
Traceback (most recent call last):
File "app.py", line 8, in <module>
model = pickle.load(open('model.pkl', 'rb'))
File "sklearn\tree\_tree.pyx", line 606, in sklearn.tree._tree.Tree.__cinit__
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'
I've done some research on the above error and learned that the random forest code uses different types for indices on 32-bit and 64-bit machines. I've seen a similar question on this platform but I'm NOT satisfied with the accepted answer, because it suggests training the model again, which is not suitable in my case: there is a lot to re-do and I don't want to put load on the server again.
Any suggestions or solutions ?
I'm not sure about the '.pkl' format, but can you try saving it as
model.save('modelweight.h5') and then loading it with model.load('modelweight.h5')?
This should work.
Thanks.
Try using cPickle instead of pickle, and open the file in binary mode (pickle files are binary, so 'wb'/'rb' rather than 'w+'/'r'):
try:
    import cPickle as pickle  # Python 2 only
except ImportError:
    import pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)  # save the model to file
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)  # load it back
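For scikit-learn models, joblib is a commonly suggested alternative to raw pickle. Note that neither tool fixes a 32-bit vs 64-bit Python mismatch by itself; the loading interpreter needs the same bitness as the training one. A minimal sketch with a tiny stand-in estimator (the question's model.pkl is assumed to be a scikit-learn model):

```python
import os, tempfile
import joblib
from sklearn.tree import DecisionTreeClassifier

# Train a tiny stand-in model for the round-trip demonstration.
model = DecisionTreeClassifier().fit([[0], [1]], [0, 1])

path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)        # save the fitted estimator
restored = joblib.load(path)    # load it back
print(restored.predict([[0], [1]]))  # [0 1]
```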

How to fix 'Error(s) in loading state_dict for AWD_LSTM' when using fast-ai

I am using the fast-ai library to train on a sample of the IMDB reviews dataset. My goal is sentiment analysis, and I just wanted to start with a small dataset (this one contains 1000 IMDB reviews). I have trained the model in a VM by following this tutorial.
I saved the data_lm and data_clas models, then the encoder ft_enc, and after that the classifier learner sentiment_model. I then copied those 4 files from the VM to my machine and wanted to use the pretrained models to classify sentiment.
This is what I did:
# Use the IMDB_SAMPLE file
path = untar_data(URLs.IMDB_SAMPLE)
# Language model data
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')
# Sentiment classifier model data
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv',
vocab=data_lm.train_ds.vocab, bs=32)
# Build a classifier using the tuned encoder (tuned in the VM)
learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('ft_enc')
# Load the trained model
learn.load('sentiment_model')
After that, I wanted to use that model in order to predict the sentiment of a sentence. When executing this code, I ran into the following error:
RuntimeError: Error(s) in loading state_dict for AWD_LSTM:
size mismatch for encoder.weight: copying a param with shape torch.Size([8731, 400]) from checkpoint, the shape in current model is torch.Size([8888, 400]).
size mismatch for encoder_dp.emb.weight: copying a param with shape torch.Size([8731, 400]) from checkpoint, the shape in current model is torch.Size([8888, 400]).
And the Traceback is:
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/SentAn/mainApp.py", line 51, in <module>
learn = load_models()
File "C:/Users/user/PycharmProjects/SentAn/mainApp.py", line 32, in load_models
learn.load_encoder('ft_enc')
File "C:\Users\user\Desktop\py_code\env\lib\site-packages\fastai\text\learner.py", line 68, in load_encoder
encoder.load_state_dict(torch.load(self.path/self.model_dir/f'{name}.pth'))
File "C:\Users\user\Desktop\py_code\env\lib\site-packages\torch\nn\modules\module.py", line 769, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
So the error occurs when loading the encoder. But I also tried removing the load_encoder line, and the same error occurred at the next line, learn.load('sentiment_model').
I searched through the fast-ai forum and noticed that others also had this issue but found no solution. In this post the user says that this might have to do with different preprocessing, though I couldn't understand why this would happen.
Does anyone have an idea about what I am doing wrong?
It seems the vocabulary sizes of data_clas and data_lm differ. I guess the problem is caused by different preprocessing used for data_clas and data_lm. To check my guess I simply used
data_clas.vocab.itos = data_lm.vocab.itos
before the following line:
learn_c = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.3)
This fixed the error.
