I wrote code to train an SVM on a particular data set using OpenCV in Python (this is only part of the full code):
svm_params = dict( kernel_type = cv2.ml.SVM_LINEAR ,svm_type = cv2.ml.SVM_C_SVC,C=2.67, gamma=3 )
svm = cv2.ml.SVM_create()
trainingData,labels = getTrainingData()
svm.train(trainingData, labels, params=svm_params)
svm.save('svm_data.dat')
trainingData and labels are lists of features and responses, respectively.
When I tried to run the code, I got an error which says:
Traceback (most recent call last):
File "C:\Python27\Python\fc.py", line 156, in <module>
svm.train(trainingData, labels, params=svm_params)
TypeError: an integer is required
Then I tried converting the trainingData and labels lists into NumPy arrays.
This is the code after the conversion:
svm_params = dict( kernel_type = cv2.ml.SVM_LINEAR ,svm_type = cv2.ml.SVM_C_SVC,C=2.67, gamma=3 )
svm = cv2.ml.SVM_create()
trainingData,labels = getTrainingData()
trainData = np.asarray(trainingData)
responses = np.asarray(labels)
svm.train(trainData, responses, params=svm_params)
svm.save('svm_data.dat')
Then it gave a different error:
Traceback (most recent call last):
File "C:\Python27\Python\fc.py", line 155, in <module>
svm.train(trainData, responses, params=svm_params)
TypeError: only length-1 arrays can be converted to Python scalars
trainingData holds the features and labels holds their corresponding responses. I printed both. The first few rows of trainingData are:
[0, 300, 866, 214, 120, 38] [[0, 140, 1620, 276, 162, 18], [0, 111, 1085, 207, 132, 12], [0, 102, 570, 174, 102, 10],..
Labels:
[[1], [1], [1], [1], [1], [1],...
What changes should I make so that my code trains on the data successfully?
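Both tracebacks are consistent with the second positional argument of train() being read as an integer: in the OpenCV 3+ cv2.ml API that argument is the sample layout, not the labels. Here is a minimal sketch under that assumption. The helper names prepare_svm_inputs and train_linear_svm are hypothetical, and the float32 / int32 dtypes are what cv2.ml commonly expects for a C-SVC classifier, so treat the details as something to check against your OpenCV version.

```python
import numpy as np


def prepare_svm_inputs(training_data, labels):
    """Convert Python lists into an (N, D) float32 matrix and (N, 1) int32 labels."""
    # One sample per row; this raises if the rows do not all have the same length.
    samples = np.asarray(training_data, dtype=np.float32)
    # One integer class label per sample.
    responses = np.asarray(labels, dtype=np.int32).reshape(-1, 1)
    if samples.ndim != 2 or samples.shape[0] != responses.shape[0]:
        raise ValueError("expected an (N, D) feature matrix and N labels")
    return samples, responses


def train_linear_svm(samples, responses, path="svm_data.dat"):
    """Train and save a linear C-SVC using setter methods instead of a params dict."""
    import cv2  # imported lazily so prepare_svm_inputs works without OpenCV

    svm = cv2.ml.SVM_create()
    svm.setKernel(cv2.ml.SVM_LINEAR)
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setC(2.67)
    svm.setGamma(3)  # mirrors the question; a linear kernel does not use gamma
    # The second argument is the layout constant; the labels come third.
    svm.train(samples, cv2.ml.ROW_SAMPLE, responses)
    svm.save(path)
    return svm
```

Usage would be along the lines of samples, responses = prepare_svm_inputs(*getTrainingData()) followed by train_linear_svm(samples, responses). If the conversion raises, it is worth checking whether every row of trainingData really has the same number of features.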
I use these lines of code in my model:
train_data_loader = create_data_loader(df_train, tokenizer, MAX_LEN, BATCH_SIZE)
data =next(iter(train_data_loader))
but I got this error:
TypeError Traceback (most recent call last)
<ipython-input-39-8edd470666f3> in <module>()
----> 1 data =next(iter(train_data_loader))
3 frames
/usr/local/lib/python3.6/dist-packages/torch/_utils.py in reraise(self)
393 # (https://bugs.python.org/issue2651), so we work around it.
394 msg = KeyErrorMessage(msg)
--> 395 raise self.exc_type(msg)
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "<ipython-input-21-cb3ac03ca3d1>", line 30, in __getitem__
'targets': torch.tensor(target, dtype=torch.long)
TypeError: new(): invalid data type 'str'
My dataset contains 3 columns with dtypes int64, object, and object.
How can I solve this problem?
Please check your y labels: they should be label-encoded and of type int.
from sklearn.preprocessing import LabelEncoder
label_encoder = LabelEncoder()
df["Label"] = label_encoder.fit_transform(df["Label"])
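To show what label encoding does without depending on scikit-learn, here is a small plain-Python sketch. The function name encode_labels and the sample labels are made up for illustration; the sorted class order is chosen to match how LabelEncoder orders its classes.

```python
def encode_labels(labels):
    """Map each distinct label to an integer id, assigning ids in sorted order."""
    classes = sorted(set(labels))
    to_id = {label: i for i, label in enumerate(classes)}
    return [to_id[label] for label in labels], classes


# Hypothetical string targets, like an 'object' column holding class names.
targets, classes = encode_labels(["neg", "pos", "neg", "neutral"])
print(targets)  # integer ids that torch.tensor(..., dtype=torch.long) can accept
print(classes)  # lookup table for turning ids back into names
```

Once the targets are plain integers, the torch.tensor(target, dtype=torch.long) call in __getitem__ no longer receives a str.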
Cannot comment yet hence posting this as an answer.
Be more specific: where is the code for your custom dataset class?
Apparently there's a problem with your 'target' variable. Show more code; nobody will be able to help otherwise.
I have tokenized data in the form of a list of unequally shaped arrays:
array([array([1179, 6, 208, 2, 1625, 92, 9, 3870, 3, 2136, 435,
5, 2453, 2180, 44, 1, 226, 166, 3, 4409, 49, 6728,
...
10, 17, 1396, 106, 8002, 7968, 111, 33, 1130, 60, 181,
7988, 7974, 7970])], dtype=object)
With their respective targets:
Out[74]: array([0, 0, 0, ..., 0, 0, 1], dtype=object)
I'm trying to transform them into a padded tf.data.Dataset, but it won't let me convert unequal shapes to a tensor. I get this error:
ValueError: Can't convert non-rectangular Python sequence to Tensor.
The full code is here. Assume that my starting point is after y = ...:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
(train_data, test_data) = tfds.load('imdb_reviews/subwords8k',
split=(tfds.Split.TRAIN, tfds.Split.TEST),
as_supervised=True)
x = np.array(list(train_data.as_numpy_iterator()))[:, 0]
y = np.array(list(train_data.as_numpy_iterator()))[:, 1]
train_tensor = tf.data.Dataset.from_tensor_slices((x.tolist(), y))\
.padded_batch(batch_size=8, padded_shapes=([None], ()))
What are my options to turn this into a padded batch tensor?
If your data is stored in Numpy arrays or Python lists, then you can use tf.data.Dataset.from_generator method to create the dataset and then pad the batches:
train_batches = tf.data.Dataset.from_generator(
lambda: iter(zip(x, y)),
output_types=(tf.int64, tf.int64)
).padded_batch(
batch_size=32,
padded_shapes=([None], ())
)
However, if you are using the tensorflow_datasets.load function, there is no need to use as_numpy_iterator to separate the data and the labels and then put them back together in a dataset; that is redundant and inefficient. The objects returned by tensorflow_datasets.load are already instances of tf.data.Dataset, so you just need to call padded_batch on them:
train_batches = train_data.padded_batch(batch_size=32, padded_shapes=([None], []))
test_batches = test_data.padded_batch(batch_size=32, padded_shapes=([None], []))
Note that in TensorFlow 2.2 and above, you no longer need to provide the padded_shapes argument if you just want all the axes to be padded to the longest of the batch (i.e. default behavior).
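To make "padded to the longest of the batch" concrete, here is a plain-Python sketch of the idea. It does not use TensorFlow; the function name padded_batches and the toy sequences are hypothetical, and zero is assumed as the padding value.

```python
def padded_batches(sequences, batch_size, pad_value=0):
    """Yield batches in which every row is padded to that batch's longest row."""
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        width = max(len(seq) for seq in batch)
        yield [list(seq) + [pad_value] * (width - len(seq)) for seq in batch]


# Toy token-id sequences of unequal length.
batches = list(padded_batches([[1, 2, 3], [4], [5, 6], [7, 8, 9, 10]], batch_size=2))
for batch in batches:
    print(batch)
```

Note that the width is chosen per batch, so different batches can end up with different widths. That is why the sequence axis is declared as None in padded_shapes.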
I am trying to follow this machine learning hello world video: https://www.youtube.com/watch?v=cKxRvEZd3Mw
but I am getting errors that I do not understand. The errors seem to come from inside the library functions rather than from what I wrote directly. I know the video is from 2016, so something must have changed since then. I am just beginning with machine learning and need help!
I checked that the syntax is the same and don't know what else to do.
from sklearn import tree
features = [[140, 1], [130, 1], [150, 0], [[170, 0]]]
labels = [0, 0, 1, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[150, 0]]))
The output in the video is "[1]", but I am getting these errors instead:
C:\Users\offic\PycharmProjects\helloWorld\venv\Scripts\python.exe "C:/Users/offic/PycharmProjects/helloWorld/Hello World.py"
Traceback (most recent call last):
File "C:/Users/offic/PycharmProjects/helloWorld/Hello World.py", line 5, in <module>
clf = clf.fit(features, labels)
File "C:\Users\offic\PycharmProjects\helloWorld\venv\lib\site-packages\sklearn\tree\tree.py", line 801, in fit
X_idx_sorted=X_idx_sorted)
File "C:\Users\offic\PycharmProjects\helloWorld\venv\lib\site-packages\sklearn\tree\tree.py", line 116, in fit
X = check_array(X, dtype=DTYPE, accept_sparse="csc")
File "C:\Users\offic\PycharmProjects\helloWorld\venv\lib\site-packages\sklearn\utils\validation.py", line 527, in check_array
array = np.asarray(array, dtype=dtype, order=order)
File "C:\Users\offic\PycharmProjects\helloWorld\venv\lib\site-packages\numpy\core\numeric.py", line 538, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
Process finished with exit code 1
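One thing visible in the code above is that the last entry of features is written as [[170, 0]], one list level deeper than the other rows. A nested list like that cannot be packed into a rectangular array, which is consistent with "setting an array element with a sequence". Here is a small plain-Python sketch for spotting such rows; the helper names are made up for illustration.

```python
def nesting_depth(value):
    """Return how many list levels wrap the innermost first element."""
    depth = 0
    while isinstance(value, list) and value:
        depth += 1
        value = value[0]
    return depth


def find_irregular_rows(rows):
    """Return indices of rows whose depth or length differs from the first row."""
    reference = (nesting_depth(rows[0]), len(rows[0]))
    return [
        i for i, row in enumerate(rows)
        if (nesting_depth(row), len(row)) != reference
    ]


features = [[140, 1], [130, 1], [150, 0], [[170, 0]]]
bad = find_irregular_rows(features)
print(bad)  # indices of rows to inspect

# Flattening the offending row gives a regular 4 x 2 list of lists.
fixed = [row[0] if nesting_depth(row) == 2 else row for row in features]
```

With the row written as [170, 0], every sample has exactly two features, which is the layout fit() needs.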
I'm trying to get a multilabel model going in tensorflow. I saw a related question here: Multiple labels with tensorflow, but couldn't get the solution working.
The code is from a tensorflow tutorial. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/input_fn/boston.py
FEATURES = ["crim", "zn", "indus", "nox", "rm",
"dis", "tax", "ptratio"]
LABELS = ["medv", "age"]
def get_input_fn(data_set, num_epochs=None, shuffle=True):
return tf.estimator.inputs.pandas_input_fn(
x=pd.DataFrame({k: data_set[k].values for k in FEATURES}),
# y=pd.Series(data_set[LABEL].values),
y=list(map(lambda label: data_set[label].values, LABELS)),
num_epochs=num_epochs,
shuffle=shuffle)
In my regression I set the label dimension to 2.
regressor = tf.estimator.DNNRegressor(feature_columns=feature_cols,
label_dimension=2,
hidden_units=[10, 10],
model_dir="/tmp/boston_model")
With this attempt I get:
Traceback (most recent call last):
File "./boston.py", line 85, in <module>
tf.app.run()
File "/home/jillian/.eb/software/machine-learning/1.00/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "./boston.py", line 67, in main
regressor.train(input_fn=get_input_fn(training_set), steps=5000)
File "./boston.py", line 43, in get_input_fn
shuffle=shuffle)
File "/home/jillian/.eb/software/machine-learning/1.00/lib/python3.6/site-packages/tensorflow/python/estimator/inputs/pandas_io.py", line 87, in pandas_input_fn
'Index for y: %s\n' % (x.index, y.index))
ValueError: Index for x and y are mismatched.
Index for x: RangeIndex(start=0, stop=400, step=1)
Index for y: <built-in method index of list object at 0x7f6f64a5bb48>
I also tried setting y to a numpy array instead of a list.
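The traceback prints the index of y as a built-in list method, which suggests the check compares x.index with a pandas-style index on y, so a plain list cannot pass it. Separately, the list built with map() holds one array per label (2 x N), whereas label_dimension=2 implies one row of two values per example (N x 2). Here is a plain-Python sketch of that reshaping. The helper name and the values are made up, and whether pandas_input_fn accepts a two-column pandas object for y depends on the TensorFlow version, so that part is an assumption to verify.

```python
def columns_to_rows(columns):
    """Turn per-label columns (L x N) into per-example rows (N x L)."""
    lengths = {len(column) for column in columns}
    if len(lengths) != 1:
        raise ValueError("all label columns must have the same length")
    return [list(row) for row in zip(*columns)]


# Made-up values standing in for data_set["medv"] and data_set["age"].
medv = [24.0, 21.6, 34.7]
age = [65.2, 78.9, 61.1]

y_columns = [medv, age]
y_rows = columns_to_rows(y_columns)
print(y_rows)  # one [medv, age] pair per example
```

A plain list only exposes index as a method (y_columns.index is callable), which matches the "built-in method index of list object" text in the error.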
I'm attempting to load 3D images and their labels from a numpy array to TensorFlow records, then read them from a queue while training my network. The code for conversion is based on the conversion for TensorFlow's Inception model.
Each image has a different height, width, and depth value, so when reshaping the array I need to know these values. However, I'm getting an error when I try to use set_shape, as somewhere down the line int() is being used, and it doesn't accept Tensor values.
reader = tf.TFRecordReader()
_, value = reader.read(filename_queue)
# Features in Example proto
feature_map = {
'height': tf.VarLenFeature(dtype=tf.int64),
'width': tf.VarLenFeature(dtype=tf.int64),
'depth': tf.VarLenFeature(dtype=tf.int64),
'label': tf.VarLenFeature(dtype=tf.int64),
'image_raw': tf.VarLenFeature(dtype=tf.string)
}
features = tf.parse_single_example(value, feature_map)
result.label = tf.cast(features['label'].values[0], dtype=tf.int32)
result.height = tf.cast(features['height'].values[0], dtype=tf.int32)
result.width = tf.cast(features['width'].values[0], dtype=tf.int32)
result.depth = tf.cast(features['depth'].values[0], dtype=tf.int32)
image = tf.decode_raw(features['image_raw'].values[0], tf.int16)
image = tf.reshape(image, [result.depth, result.height, result.width])
image = tf.cast(tf.transpose(image, [1, 2, 0]), tf.float32)
result.image = tf.expand_dims(image, 3)
result.image.set_shape([result.height, result.width, result.depth, 1])
result.label = tf.expand_dims(result.label, 0)
result.label.set_shape([1])
Error trace:
Traceback (most recent call last):
File "dsb17_multi_gpu_train.py", line 227, in <module>
tf.app.run()
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "dsb17_multi_gpu_train.py", line 223, in main
train()
File "dsb17_multi_gpu_train.py", line 129, in train
loss = tower_loss(scope)
File "dsb17_multi_gpu_train.py", line 34, in tower_loss
images, labels = dsb17.inputs(False)
File "/home/ubuntu/dsb17/model/dsb17.py", line 104, in inputs
batch_size=FLAGS.batch_size)
File "/home/ubuntu/dsb17/model/dsb17_input.py", line 161, in inputs
read_input = read_data(filename_queue)
File "/home/ubuntu/dsb17/model/dsb17_input.py", line 62, in read_data
result.image.set_shape([result.height, result.width, result.depth, 1])
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 425, in set_shape
self._shape = self._shape.merge_with(shape)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 573, in merge_with
other = as_shape(other)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 821, in as_shape
return TensorShape(shape)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 457, in __init__
self._dims = [as_dimension(d) for d in dims_iter]
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 457, in <listcomp>
self._dims = [as_dimension(d) for d in dims_iter]
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 378, in as_dimension
return Dimension(value)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/tensor_shape.py", line 33, in __init__
self._value = int(value)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'Tensor'
I originally thought this was because the Tensor did not have a value until it was evaluated in a session, but loss is being evaluated in a sess.run(), which is what requires the call to tower_loss(). My training code is identical in structure to cifar10_multi_gpu_train.py, and the overall file structure is also very similar.
The question then is: Is it actually being evaluated in a session, or is the graph not built yet? Do I need to somehow extract a value from the zero-dimensional Tensor? More generally, what am I misunderstanding about Tensors and sessions that is making my code not work as I expect it to?
According to the TensorFlow docs, tf.cast returns a Tensor.
Your error says that set_shape() cannot take a Tensor as an argument; it needs an int.
You could try forcing TensorFlow to evaluate the cast (note that eval() needs an active default session). This simple example works for me:
a = tf.constant(2.0)
b = tf.constant([1.0,2.0])
b.set_shape(a.eval())
Without the call to eval(), I get the same error as you.
In general you cannot do this using tf.Tensor.set_shape(), because that method expects a static shape. The tensors result.height, result.width, result.depth represent values read from a file, and at runtime they could evaluate to many different integers (depending on what is in your file), so there is no single int that you can pass for them. In that case, the best you can currently do is represent those dimensions as being statically unknown, using None for the unknown dimensions:
result.image.set_shape([None, None, None, 1])
Note that this statement shouldn't change anything, because TensorFlow should already be able to infer that the shape is 4-D with size 1 in the last dimension.
For more details about static and dynamic shapes, see this answer.
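To illustrate why setting the shape to [None, None, None, 1] changes nothing, here is a plain-Python sketch of how two static shapes combine when None stands for a dimension that is only known at runtime. This is not TensorFlow code; the function name merge_static_shapes is hypothetical and only mimics the merging rule described above.

```python
def merge_static_shapes(known, declared):
    """Combine two static shapes; None marks a dimension unknown until runtime."""
    if len(known) != len(declared):
        raise ValueError("ranks differ: %d vs %d" % (len(known), len(declared)))
    merged = []
    for a, b in zip(known, declared):
        if a is not None and b is not None and a != b:
            raise ValueError("incompatible dimensions: %r vs %r" % (a, b))
        # Keep whichever side carries concrete information.
        merged.append(a if a is not None else b)
    return merged


# What could already be inferred for the image after expand_dims.
inferred = [None, None, None, 1]

same = merge_static_shapes(inferred, [None, None, None, 1])
refined = merge_static_shapes(inferred, [None, None, 64, 1])
print(same)
print(refined)
```

Only Python ints (or None) can take part in this kind of merge, which is why passing a Tensor such as result.height fails: its value does not exist yet when the graph is being built.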
Actually, you can pass the image shape to the reshape function, but you need one more step. Just change the line:
image = tf.reshape(image, [result.depth, result.height, result.width])
to:
image_shape = tf.stack([result.depth, result.height, result.width])
image = tf.reshape(image, image_shape)