next(iter(...)) generates an error - Python

I use these lines of code in my model:
train_data_loader = create_data_loader(df_train, tokenizer, MAX_LEN, BATCH_SIZE)
data = next(iter(train_data_loader))
but I get this error:
TypeError Traceback (most recent call last)
<ipython-input-39-8edd470666f3> in <module>()
----> 1 data =next(iter(train_data_loader))
3 frames
/usr/local/lib/python3.6/dist-packages/torch/_utils.py in reraise(self)
393 # (https://bugs.python.org/issue2651), so we work around it.
394 msg = KeyErrorMessage(msg)
--> 395 raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "<ipython-input-21-cb3ac03ca3d1>", line 30, in __getitem__
'targets': torch.tensor(target, dtype=torch.long)
TypeError: new(): invalid data type 'str'
My dataset contains three columns with dtypes int64, object, and object.
How can I solve this problem?

Please check your y labels: they should be label-encoded and of type int.
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
df["Label"] = label_encoder.fit_transform(df["Label"])

Cannot comment yet, hence posting this as an answer.
Be more specific - where is the code for your custom dataset class?
Apparently there is a problem with your 'target' variable. Show more code; nobody will be able to help otherwise.

Related

TypeError: _open() got an unexpected keyword argument 'pilmode'

I am training a CNN model on the COCO dataset, and I get this error after some number of iterations. The error is not consistent: it appeared once at iteration 1100, once at 4500, and once at 8900 (all within one epoch).
I thought this error might be a bug in the new version of imageio, so I downgraded to version 2.3.0, but I still get the error after 8900 iterations within one epoch.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-46-4b33bec4a89e> in <module>()
52
53 # train for one epoch
---> 54 train_loss = train(train_loader, model, [criterion1, criterion2], optimizer)
55 print('train_loss: ',train_loss)
56
4 frames
/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
432 # instantiate since we don't know how to
433 raise RuntimeError(msg) from None
--> 434 raise exception
435
436
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "<ipython-input-34-4c8722b5b16b>", line 143, in __getitem__
image = imageio.imread(img_path, pilmode='RGB')
File "/usr/local/lib/python3.7/dist-packages/imageio/core/functions.py", line 206, in imread
reader = read(uri, format, 'i', **kwargs)
File "/usr/local/lib/python3.7/dist-packages/imageio/core/functions.py", line 129, in get_reader
return format.get_reader(request)
File "/usr/local/lib/python3.7/dist-packages/imageio/core/format.py", line 168, in get_reader
return self.Reader(self, request)
File "/usr/local/lib/python3.7/dist-packages/imageio/core/format.py", line 217, in __init__
self._open(**self.request.kwargs.copy())
TypeError: _open() got an unexpected keyword argument 'pilmode'
I've had this error before. The TL;DR is that you cannot assume all of your data is clean and parseable. As far as I can tell you are not loading the data in order, and you may even have shuffling enabled, so you should not expect the failure to occur deterministically at any particular iteration.
The issue comes down to one (or more) of the files in the COCO dataset being either corrupted or of a different format. You can process the images in order with a batch size of 1 and print out each file name to see which one it is.
To "fix" this issue you can do one of several things:
Wrap the call that loads the image in a try-except block and skip the sample (see the sketch below).
Convert the offending image yourself to an appropriate format.
Try a different way of loading images with PyTorch.
See here for an example failure scenario when loading images with imageio.
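For the first two options, a minimal sketch of a defensive loader that __getitem__ could call (img_path is assumed to come from your dataset's file list; this is illustrative, not the actual training code):
import imageio
import numpy as np
from PIL import Image

def load_rgb(img_path):
    try:
        return imageio.imread(img_path, pilmode='RGB')
    except TypeError:
        # A corrupt or odd-format file makes imageio pick a reader whose
        # _open() does not accept 'pilmode'. Report the offender and fall
        # back to PIL, which can still decode and convert most files.
        print("imageio rejected 'pilmode' for", img_path)
        return np.asarray(Image.open(img_path).convert('RGB'))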

What is meant by the error (AttributeError: 'NoneType' object has no attribute '__array_interface__')?

I am trying to build an ML model that detects landmarks on cartoon faces, using PyTorch. When I split the image dataset into training and validation sets, I get the following error. What does this error mean?
This is how I split the dataset.
# split the dataset into training and validation sets
len_valid_set = int(0.2 * len(dataset))
len_train_set = len(dataset) - len_valid_set

print("The length of Train set is {}".format(len_train_set))
print("The length of Valid set is {}".format(len_valid_set))

train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])

# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=8, shuffle=True, num_workers=4)

images, landmarks = next(iter(train_loader))
This is the error I got.
The length of Train set is 105
The length of Valid set is 26
AttributeError Traceback (most recent call last)
<ipython-input-61-ffb86a628e37> in <module>()
----> 1 images, landmarks = next(iter(train_loader))
2
3 print(images.shape)
4 print(landmarks.shape)
3 frames
/usr/local/lib/python3.6/dist-packages/torch/_utils.py in reraise(self)
426 # have message field
427 raise self.exc_type(message=msg)
--> 428 raise self.exc_type(msg)
429
430
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataset.py", line 272, in __getitem__
return self.dataset[self.indices[idx]]
File "<ipython-input-12-5595ac89d75d>", line 38, in __getitem__
image, landmarks = self.transform(image, landmarks, self.crops[index])
File "<ipython-input-9-e38df55ee0d4>", line 46, in __call__
image = Image.fromarray(image)
File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2670, in fromarray
arr = obj.__array_interface__
AttributeError: 'NoneType' object has no attribute '__array_interface__'
Basically it says that when executing the line image = Image.fromarray(image), the Image.fromarray function expects image to be an array-like object exposing __array_interface__, which it uses to turn the array into an image. However, during execution image is actually None (the Python object for "nothing"), and you obviously cannot turn None into an image.
There is probably something wrong with your data. I'd suggest skipping the random split for a moment and checking that each item in the dataset is not None.
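For example, a quick sanity check (a sketch assuming dataset is the full, un-split Dataset from the question):
# Index every sample before the random split; the try/except reveals
# which index produces a None image.
for idx in range(len(dataset)):
    try:
        image, landmarks = dataset[idx]
    except AttributeError:
        print("Sample {} returned a None image - inspect its source file".format(idx))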

Re-fitting a saved scikit-learn model without some unused features - "ValueError: A given column is not a column of the dataframe"

I need to re-fit a scikit-learn pipeline on a smaller dataset, without some features that the model does not actually use.
(The actual situation is that I save the pipeline through joblib and load it in another file, where I need to re-fit it since it contains some custom transformers I made; adding all the features back would be a pain since it is a different kind of model. However, this is not important here, since the same error occurs if I re-fit the model before saving it, in the same file where I first trained it.)
This is my custom transformer:
from sklearn.base import BaseEstimator, TransformerMixin

class TransformAdoptionFeatures(BaseEstimator, TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        adoption_features = X.columns
        feats_munic = [feat for feat in adoption_features if '_munic' in feat]
        feats_adj_neigh = [feat for feat in adoption_features if '_adj' in feat]
        feats_port = [feat for feat in adoption_features if '_port' in feat]
        feats_to_keep_all = feats_munic + feats_adj_neigh + feats_port
        feats_to_keep = [feat for feat in feats_to_keep_all if 'tot_cumul' not in feat]
        return X[feats_to_keep]
And this is my pipeline:
full_pipeline = Pipeline([
    ('transformer', TransformAdoptionFeatures()),
    ('scaler', StandardScaler()),
])

model = Pipeline([
    ('preparation', full_pipeline),
    ('regressor', ml_model),
])
Here ml_model is any scikit-learn machine-learning model. Both full_pipeline and ml_model are already fitted when the model is saved. (In the actual model there is an intermediate ColumnTransformer step that represents the real full_pipeline, since I need different transformers for different columns, but I copied only the important one for brevity.)
Issue: I reduced the number of features of the dataset I had already used to fit everything, removing some features that TransformAdoptionFeatures() does not consider (they do not end up among the features to keep). Then I tried to re-fit the model on the new dataset with reduced features and got this error:
Traceback (most recent call last):
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\pandas\core\indexes\base.py", line 2889, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 97, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'tot_cumul_adoption_pr_y_munic'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\sklearn\utils\__init__.py", line 447, in _get_column_indices
col_idx = all_columns.get_loc(col)
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\pandas\core\indexes\base.py", line 2891, in get_loc
raise KeyError(key) from err
KeyError: 'tot_cumul_adoption_pr_y_munic'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\giaco\sbp-abm\municipalities_abm\test.py", line 15, in <module>
modelSBP = model.SBPAdoption(initial_year=start_year)
File "C:\Users\giaco\sbp-abm\municipalities_abm\municipalities_abm\model.py", line 103, in __init__
self._upload_ml_models(ml_clsf_folder, ml_regr_folder)
File "C:\Users\giaco\sbp-abm\municipalities_abm\municipalities_abm\model.py", line 183, in _upload_ml_models
self._ml_clsf.fit(clsf_dataset.drop('adoption_in_year', axis=1),
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\sklearn\pipeline.py", line 330, in fit
Xt = self._fit(X, y, **fit_params_steps)
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\sklearn\pipeline.py", line 292, in _fit
X, fitted_transformer = fit_transform_one_cached(
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\joblib\memory.py", line 352, in __call__
return self.func(*args, **kwargs)
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\sklearn\pipeline.py", line 740, in _fit_transform_one
res = transformer.fit_transform(X, y, **fit_params)
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\sklearn\compose\_column_transformer.py", line 529, in fit_transform
self._validate_remainder(X)
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\sklearn\compose\_column_transformer.py", line 327, in _validate_remainder
cols.extend(_get_column_indices(X, columns))
File "C:\Users\giaco\anaconda3\envs\mesa_geo_ml\lib\site-packages\sklearn\utils\__init__.py", line 454, in _get_column_indices
raise ValueError(
ValueError: A given column is not a column of the dataframe
I do not understand what this error is due to; I thought scikit-learn did not store the names of the columns I pass.
I found my error: it was actually in the use of the ColumnTransformer, which is also the only place where column names enter.
My mistake was really simple: I had not updated the lists of columns each transformation applies to, i.e. I forgot to remove the names of the excluded features from them.
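For illustration, a sketch of that fix with hypothetical names (all_adoption_cols standing for the original column list and df_reduced for the smaller dataframe; neither name appears in the question):
from sklearn.compose import ColumnTransformer

# ColumnTransformer stores its column lists by name at construction, so
# they must be rebuilt to contain only columns of the reduced dataframe:
adoption_cols = [c for c in all_adoption_cols if c in df_reduced.columns]

preparation = ColumnTransformer([('adoption', full_pipeline, adoption_cols)],
                                remainder='drop')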

TypeError despite identical shapes: if not (target.size() == input.size()): 'int' object is not callable

This is the error message I get. In the first line, I print the shapes of predicted and target. From my understanding, the error should arise from those shapes not being the same, but here they clearly match.
torch.Size([6890, 3]) torch.Size([6890, 3])
Traceback (most recent call last):
File "train.py", line 251, in <module>
main()
File "train.py", line 230, in main
train(net, training_dataset, targets, device, criterion, optimizer, epoch, args.epochs)
File "train.py", line 101, in train
loss = criterion(predicted, target.detach().cpu().numpy())
File "/home/hb119056/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/hb119056/.local/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 443, in forward
return F.mse_loss(input, target, reduction=self.reduction)
File "/home/hb119056/.local/lib/python3.6/site-packages/torch/nn/functional.py", line 2244, in mse_loss
if not (target.size() == input.size()):
TypeError: 'int' object is not callable
I hope all the relevant context information is provided; if not, please let me know. Thanks for any suggestions!
EDIT: This is the part of the code where this error occurs:
target = torch.from_numpy(np.load(file_dir + '/points/points{:03}.npy'.format(i))).to(device)
rv = torch.zeros(12 * outputs.shape[0])

for j in [x for x in range(10) if x != i]:
    source = torch.from_numpy(np.load(file_dir + '/points/points{:03}.npy'.format(j))).to(device)
    rv = factor.ransac(source, target, prob, n_iter, tol, device)  # some self-written RANSAC-like method
    predicted = factor.predict(source, rv, outputs)

print(target.shape, predicted.shape)
loss = criterion(predicted, target.detach().cpu().numpy())  ## error occurs here
criterion is nn.MSELoss().
A little bit late, but maybe it will help someone else. I just solved the same problem myself.
As Alpha said in his answer, we cannot call .size() on a numpy array, but we can call .size() on a tensor. Therefore we need to make the target a tensor. You can do it like this:
target = torch.from_numpy(target)
I'm using a GPU, so I also needed to send my target to the GPU. You can do it like this:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
target = target.to(device)
The loss function should then work as expected.
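Putting it together, a small self-contained demonstration (the shapes mimic the question; the variable names are stand-ins):
import numpy as np
import torch
import torch.nn as nn

criterion = nn.MSELoss()
predicted = torch.zeros(6890, 3)
target_np = np.zeros((6890, 3), dtype=np.float32)  # stand-in numpy target

# criterion(predicted, target_np) would raise "TypeError: 'int' object is
# not callable": numpy's .size is a plain int attribute, while F.mse_loss
# calls target.size(). Converting to a tensor fixes it:
target = torch.from_numpy(target_np)
loss = criterion(predicted, target)
print(loss.item())  # 0.0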
It probably means that you are trying to call a method where a property of the same name is available. If that is indeed the problem, the solution is easy: simply change the method call into a property access.
If you are comparing in the following way:
compare = (X.method() == Y.method())
change it to:
compare = (X.method == Y.method)
If this does not answer your question, kindly share the code you used to compare the shapes.
That's because your target is a numpy array: in train.py, line 101, you convert it yourself with
target.detach().cpu().numpy()
Keep the target a tensor instead of converting it to numpy.
TL;DR: change
loss = criterion(predicted, target.detach().cpu().numpy()) ## error occurs here
to
loss = criterion(predicted, target)
for example:
In [6]: b = np.ones(3)
In [7]: b.size
Out[7]: 3
In [8]: b.size()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-365705555409> in <module>
----> 1 b.size()
TypeError: 'int' object is not callable

CNTK python API: How to get predictions from the trained model?

I have a trained model which I load using the CNTK.load_model() function. I used the MNIST tutorial in the CNTK Git repo as a reference for the model-evaluation code. I created a data reader (a MinibatchSource object) and tried to run model.eval(mb), where mb = minibatch_source.next_minibatch(...) (similar to this answer).
But I get the following error message:
Traceback (most recent call last):
File "LID_test.py", line 162, in <module>
test_and_evaluate()
File "LID_test.py", line 159, in test_and_evaluate
predictions = model.eval(mb)
File "/home/t-asbahe/anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/ops/functions.py", line 228, in eval
_, output_map = self.forward(arguments, self.outputs, device=device, as_numpy=as_numpy)
File "/home/t-asbahe/anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/utils/swig_helper.py", line 62, in wrapper
result = f(*args, **kwds)
File "/home/t-asbahe/anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/ops/functions.py", line 354, in forward
None, device)
File "/home/t-asbahe/anaconda3/envs/cntk-py35/lib/python3.5/site-packages/cntk/utils/__init__.py", line 393, in sanitize_var_map
if len(arguments) < len(op_arguments):
TypeError: object of type 'Variable' has no len()
I have no input_variable named 'Variable' in my model, and I see no reason to get this error.
P.S.: My inputs are sparse one-hot vectors.
You have a few options:
Pass a batch of data as a numpy array (as in the CNTK 202 tutorial, where one-hot data is passed in as a numpy array):
pred = model.eval({model.arguments[0]: [onehot]})
Read the minibatch data and pass it to the eval function:
eval_input_map = {input: reader_eval.streams.features}
eval_data = reader_eval.next_minibatch(eval_minibatch_size,
                                       input_map=eval_input_map)
mydata = eval_data[input].value
predicted = model.eval(mydata)
