How to resume from a saved snapshot in chainer using python - python

How to resume training from the saved snapshot in chainer.I was trying to implement DCGAN using chainer using the following github link:
https://github.com/chainer/chainer/blob/master/examples/dcgan/train_dcgan.py
When I try to give the --resume parameter it is showing shape mismatch error in the network.
In the python code there is option to give snaptshot from which we need to resume the training.These snapshots are automatically getting saved to result folder.That is also given as argument in the code.So I tried to resume training from the saved snapshot by giving the below command.
$ python train.py --resume 'snapshot.npz'
where train.py is the modified code with label for dcgan ciphar10 dataset.
The error I got by giving the above command is :
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: LinearFunction (Forward)
Expect: x.shape[1] == W.shape[1]
Actual: 110 != 100
When I run the python file with the below command there is no error:
$ python train.py
Complete error trace:
Exception in main training loop:
Invalid operation is performed in: LinearFunction (Forward)
Expect: x.shape[1] == W.shape[1]
Actual: 110 != 100
Traceback (most recent call last):
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/trainer.py, line 315, in run
update()
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py, line 165, in update
self.update_core()
File /home/964769/Lakshmi/DCGAN/updater_with_label.py, line 50, in update_core
x_fake = gen(z,labels)
File /home/964769/Lakshmi/DCGAN/net_with_label.py, line 61, in call
h = F.reshape(F.relu(self.bn0(self.l0(F.concat((z,t),axis=1)))),
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/link.py, line 242, in call
out = forward(args, *kwargs)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/links/connection/linear.py, line 138, in forward
return linear.linear(x, self.W, self.b, n_batch_axes=n_batch_axes)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/linear.py, line 288, in linear
y, = LinearFunction().apply(args)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/function_node.py, line 245, in apply
self.check_data_type_forward(in_data)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/function_node.py, line 330, in check_data_type_forward
self.check_type_forward(in_type)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/linear.py, line 27, in check_type_forward
x_type.shape[1] == w_type.shape[1],
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/utils/typecheck.py, line 546, in expect
expr.expect()
File/home/964769/anaconda3/lib/python3.6/site-packages/chainer/utils/typecheck.py, line 483, in expect
'{0} {1} {2}'.format(left, self.inv, right))
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
Filetrain.py, line 140, in
main()
Filetrain.py, line 135, in main
trainer.run()
File/home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/trainer.py, line 329, in run
six.reraise(sys.exc_info())
File /home/964769/anaconda3/lib/python3.6/site-packages/six.py, line 686, in reraise
raise value
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/trainer.py, line 315, in run
update()
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py, line 165, in update
self.update_core()
File /home/964769/Lakshmi/DCGAN/updater_with_label.py, line 50, in update_core
x_fake = gen(z,labels)
File /home/964769/Lakshmi/DCGAN/net_with_label.py, line 61, in call
h = F.reshape(F.relu(self.bn0(self.l0(F.concat((z,t),axis=1)))),
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/link.py, line 242, in call
out = forward(args, **kwargs)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/links/connection/linear.py, line 138, in forward
return linear.linear(x, self.W, self.b, n_batch_axes=n_batch_axes)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/linear.py, line 288, in linear
y, = LinearFunction().apply(args)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/function_node.py, line 245, in apply
self.check_data_type_forward(in_data)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/function_node.py, line 330, in check_data_type_forward
self.check_type_forward(in_type)
File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/linear.py, line 27, in checktype_forward
x_type.shape[1] == w_type.shape[1],
File "/home/964769/anaconda3/lib/python3.6/site-packages/chainer/utils/typecheck.py, line 546, in expect
expr.expect()
File/home/964769/anaconda3/lib/python3.6/site-packages/chainer/utils/type_check.py", line 483, in expect
'{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: LinearFunction (Forward)
Expect: x.shape[1] == W.shape[1]
Actual: 110 != 100

Related

error on search image in python image_match library

I'm using python image_match library. I need to use search_image method of this library. but when I se this method I got the below error:
Traceback (most recent call last):
File "/var/www/html/Panel/test2.py", line 16, in <module>
ses.search_image('https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
File "/usr/local/lib/python3.10/site-packages/image_match/signature_database_base.py", line 268, in search_image
transformed_record = make_record(img, self.gis, self.k, self.N)
File "/usr/local/lib/python3.10/site-packages/image_match/signature_database_base.py", line 356, in make_record
signature = gis.generate_signature(path)
File "/usr/local/lib/python3.10/site-packages/image_match/goldberg.py", line 161, in generate_signature
im_array = self.preprocess_image(path_or_image, handle_mpo=self.handle_mpo, bytestream=bytestream)
File "/usr/local/lib/python3.10/site-packages/image_match/goldberg.py", line 257, in preprocess_image
return rgb2gray(image_or_path)
File "/usr/local/lib/python3.10/site-packages/skimage/_shared/utils.py", line 394, in fixed_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/skimage/color/colorconv.py", line 875, in rgb2gray
rgb = _prepare_colorarray(rgb)
File "/usr/local/lib/python3.10/site-packages/skimage/color/colorconv.py", line 140, in _prepare_colorarray
raise ValueError(msg)
ValueError: the input array must have size 3 along `channel_axis`, got (1024, 687)
Can you please help me?

TypeError: _get_dataset_for_single_task() got an unexpected keyword argument 'sequence_length' #790

I got the following error in the evaluation of a t5 model:
model.batch_size = train_batch_size * 4
model.eval(
mixture_or_task_name="trivia_all",
checkpoint_steps=-1 #"all"
)
Traceback (most recent call last):
File "train.py", line 140, in <module>
checkpoint_steps=-1 #"all"
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/t5/models/mtf_model.py", line 267, in eval
self._model_dir, dataset_fn, summary_dir, checkpoint_steps)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 2025, in eval_model
for d in decode(estimator, input_fn, vocabulary, checkpoint_path)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 2024, in <listcomp>
d.decode("utf-8") if isinstance(d, bytes) else d
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 1114, in decode
for i, result in enumerate(result_iter):
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3132, in predict
rendezvous.raise_errors()
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 150, in raise_errors
six.reraise(typ, value, traceback)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
raise value
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3126, in predict
yield_single_examples=yield_single_examples):
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 611, in predict
input_fn, ModeKeys.PREDICT)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1007, in _get_features_from_input_fn
result = self._call_input_fn(input_fn, mode)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3041, in _call_input_fn
return input_fn(**kwargs)
File "/home/pouramini/miniconda3/lib/python3.7/site-packages/mesh_tensorflow/transformer/utils.py", line 1182, in input_fn
ds = dataset.dataset_fn(sequence_length=sequence_length)
TypeError: _get_dataset_for_single_task() got an unexpected keyword argument 'sequence_length'
There is a similar issue but I didn't get the solution which is one line.
https://github.com/google-research/text-to-text-transfer-transformer/issues/631
I had installed t5 v0.6.0, which wasn't the newest version. When I installed v0.9.0, the problem was resolved.
pip install t5==0.9.0

Error using set MLFLOW_TRACKING_URI='http://0.0.0.0:5000' for serve models

Hi i need to run a command like this
mlflow server --backend-store-uri postgresql://mlflow_user:mlflow#localhost:5433/mlflow --default-artifact-root file:D:/artifact_root --host 0.0.0.0 --port 5000
for start my serve and i have not problem with this but when i try to run a example
in the route of project from github python
mlflow/examples/sklearn_elasticnet_diabetes/linux/train_diabetes.py 0.1 0.9
i get this error
_model_registry_store_registry.register_entrypoints()
Elasticnet model (alpha=0.100000, l1_ratio=0.900000):
RMSE: 71.98302888908191
MAE: 60.5647520017933
R2: 0.21655161434654602
<function get_tracking_uri at 0x0000017F3AE885E8>
url 'http://0.0.0.0:8001'
url2 'http|//0.0.0.0|8001'
Traceback (most recent call last):
File "train_diabetes.py", line 90, in <module>
mlflow.log_param("alpha", alpha)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 210, in log_param
run_id = _get_or_start_run().info.run_id
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 508, in _get_or_start_run
return start_run()
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 148, in start_run
active_run_obj = MlflowClient().create_run(
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\client.py", line 44, in __init__
self._tracking_client = TrackingServiceClient(final_tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 32, in __init__
self.store = utils._get_store(self.tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 126, in _get_store
return _tracking_store_registry.get_store(store_uri, artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\registry.py", line 37, in get_store
return builder(store_uri=store_uri, artifact_uri=artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 81, in _get_file_store
return FileStore(store_uri, store_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\store\tracking\file_store.py", line 100, in __init__
self.root_directory = local_file_uri_to_path(root_directory or _default_root_dir())
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\utils\file_utils.py", line 387, in local_file_uri_to_path
return urllib.request.url2pathname(path)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\nturl2path.py", line 35, in url2pathname
raise OSError(error)
OSError: Bad URL: 'http|//0.0.0.0|8001'
before running the python code i run this command to set the env tracking uri for the execution set MLFLOW_TRACKING_URI='http://0.0.0.0:5000'
i donĀ“t know why mlflow replace the : for | i need help. Before this option worked but now it is failing
This looks strange because if you set MLFLOW_TRACKING_URI to http://0.0.0.0:5000, RestStore should be used, but your stacktrace says FileStore is used. Can you run the code below and see what it prints out?
from urllib.parse import urlparse
urlparse('http://0.0.0.0:5000')
this error
$ python train_diabetes.py 0.1 0.9
C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_model_registry\utils.py:106: UserWarning: Failure attempting to register store for scheme "file-plugin": No module named 'mlflow_test_plugin.sqlalchemy_store'
_model_registry_store_registry.register_entrypoints()
Elasticnet model (alpha=0.100000, l1_ratio=0.900000):
RMSE: 71.98302888908191
MAE: 60.5647520017933
R2: 0.21655161434654602
Traceback (most recent call last):
File "train_diabetes.py", line 88, in <module>
mlflow.log_param("alpha", alpha)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 218, in log_param
run_id = _get_or_start_run().info.run_id
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 573, in _get_or_start_run
return start_run()
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\fluent.py", line 159, in start_run
active_run_obj = MlflowClient().create_run(experiment_id=exp_id_for_run, tags=tags)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\client.py", line 54, in __init__
self._tracking_client = TrackingServiceClient(final_tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 39, in __init__
self.store = utils._get_store(self.tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 127, in _get_store
return _tracking_store_registry.get_store(store_uri, artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\registry.py", line 38, in get_store
return builder(store_uri=store_uri, artifact_uri=artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 81, in _get_file_store
return FileStore(store_uri, store_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\store\tracking\file_store.py", line 132, in __init__
self.root_directory = local_file_uri_to_path(root_directory or _default_root_dir())
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\utils\file_utils.py", line 390, in local_file_uri_to_path
return urllib.request.url2pathname(path)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\nturl2path.py", line 33, in url2pathname
raise OSError(error)
OSError: Bad URL: 'http|//0.0.0.0|5000'
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tensorflow.py", line 577, in _flush_queue
client = mlflow.tracking.MlflowClient()
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\client.py", line 54, in __init__
self._tracking_client = TrackingServiceClient(final_tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\client.py", line 39, in __init__
self.store = utils._get_store(self.tracking_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 127, in _get_store
return _tracking_store_registry.get_store(store_uri, artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\registry.py", line 38, in get_store
return builder(store_uri=store_uri, artifact_uri=artifact_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\tracking\_tracking_service\utils.py", line 81, in _get_file_store
return FileStore(store_uri, store_uri)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\store\tracking\file_store.py", line 132, in __init__
self.root_directory = local_file_uri_to_path(root_directory or _default_root_dir())
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\site-packages\mlflow\utils\file_utils.py", line 390, in local_file_uri_to_path
return urllib.request.url2pathname(path)
File "C:\Users\kevin.sanchez\Miniconda3\envs\env_mlflow\lib\nturl2path.py", line 33, in url2pathname
raise OSError(error)
OSError: Bad URL: 'http|//0.0.0.0|5000'

cv2.error in cv::copyMakeBorder,when I run python train.py --data data/hat.data --cfg cfg/yolov3-tiny.cfg to use yolo v3

When I use yolo v3 on pytorch to train some data, it throw a error like:
Epoch gpu_mem GIoU obj cls total targets img_size
Traceback (most recent call last):
File "train.py", line 433, in <module>
train() # train normally
File "train.py", line 310, in train
results, maps = test.test(cfg,
File "D:\code\pytorch\yolo\yolov3\test.py", line 74, in test
for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)):
File "D:\code\anaconda\envs\test\lib\site-packages\tqdm\std.py", line 1108, in __iter__
for obj in iterable:
File "D:\code\anaconda\envs\test\lib\site-packages\torch\utils\data\dataloader.py", line 345, in __next__
data = self._next_data()
File "D:\code\anaconda\envs\test\lib\site-packages\torch\utils\data\dataloader.py", line 856, in _next_data
return self._process_data(data)
File "D:\code\anaconda\envs\test\lib\site-packages\torch\utils\data\dataloader.py", line 881, in _process_data
data.reraise()
File "D:\code\anaconda\envs\test\lib\site-packages\torch\_utils.py", line 394, in reraise
raise self.exc_type(msg)
cv2.error: Caught error in DataLoader worker process 0.
Original Traceback (most recent call last):
File "D:\code\anaconda\envs\test\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "D:\code\anaconda\envs\test\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "D:\code\anaconda\envs\test\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "D:\code\pytorch\yolo\yolov3\utils\datasets.py", line 432, in __getitem__
img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)
File "D:\code\pytorch\yolo\yolov3\utils\datasets.py", line 630, in letterbox
img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border
cv2.error: OpenCV(4.2.0) C:\projects\opencv-python\opencv\modules\core\src\copy.cpp:1421: error: (-215:Assertion failed) top >= 0 && bottom >= 0 && left >= 0 && right >= 0 && _src.dims() <= 2 in function 'cv::copyMakeBorder'
Class Images Targets P R mAP#0.5 F1: 0%| | 0/5 [00:08<?, ?it/s](test)
I try to find some solution on Google.https://github.com/facebook/Surround360/issues/201 ,but it doesn't help me. So please help me solve this problem.
Here is some of my train images.

How do I fix RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase?

I'm in the midst of learning fastai and I'm constantly running into the '[Errno 32] Broken pipe' error, I have the necessary packages in pip and conda,
and also running an Intel i5-2520M 2.5GHz, Windows 10 64-bit,
Python
3.7.4
, and the CPU version of Pytorch since I don't have a dedicated GPU, and the latest Pytorch
torch 1.3.0+cpu
torchvision 0.4.1+cpu
And fastai
fastai 1.0.58
After launching the script in conda in my environment I get the following in my terminal
------->> biwi_head_pose\09\frame_00667_rgb.jpg
---- Done Gathering Data
epoch train_loss valid_loss time
------->> biwi_head_pose\09\frame_00667_rgb.jpg--------------------------| 0.00% [0/237 00:00<00:00]
---- Done Gathering Data
epoch train_loss valid_loss time
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main__")
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Eric\Desktop\fastai Practice\L3\HeadPose.py", line 39, in <module>
learn.lr_find()
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastai\train.py", line 41, in lr_find
learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_train.py", line 200, in fit
fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_train.py", line 99, in fit
for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastprogress\fastprogress.py", line 72, in __iter__
for i,o in enumerate(self._gen):
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_data.py", line 75, in __iter__
for b in self.dl: yield self.proc_batch(b)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\torch\utils\data\dataloader.py", line 193, in __iter__
return _DataLoaderIter(self)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\torch\utils\data\dataloader.py", line 469, in __init__
w.start()
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\context.py", line 223, in _Popen
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
Traceback (most recent call last):
File "HeadPose.py", line 39, in <module>
learn.lr_find()
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastai\train.py", line 41, in lr_find
learn.fit(epochs, start_lr, callbacks=[cb], wd=wd)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_train.py", line 200, in fit
fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_train.py", line 99, in fit
for xb,yb in progress_bar(learn.data.train_dl, parent=pbar):
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastprogress\fastprogress.py", line 72, in __iter__
for i,o in enumerate(self._gen):
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_data.py", line 75, in __iter__
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\context.py", line 322, in _Popen
for b in self.dl: yield self.proc_batch(b)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\torch\utils\data\dataloader.py", line 193, in __iter__
return Popen(process_obj)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
return _DataLoaderIter(self)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\site-packages\torch\utils\data\dataloader.py", line 469, in __init__
w.start()
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
reduction.dump(process_obj, to_child)prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\reduction.py", line 60, in dump
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
_check_not_importing_main()
File "C:\Users\Eric\Anaconda3\envs\fastai\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
is not going to be frozen to produce an executable.''')
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
As you can see Errno 32 opens up, I have looked around and people say put if __name__ == '__main__': before the training code, however when I do this the error goes away but it does not train.
So this is what I have done(HeadPose.py), first we get the data from the path and convert the images to coords to train
from fastai.vision import *
import matplotlib.pyplot as plt
# Get Data & Convert to Matrices
path = 'C:\\Users\\Eric\\Desktop\\fastai Practice\\L3\\'
cal = np.genfromtxt(path+'biwi_head_pose\\01\\rgb.cal', skip_footer=6); cal
fname = 'biwi_head_pose\\09\\frame_00667_rgb.jpg'
def img2txt_name(f): return path+f'{str(f)[:-7]}pose.txt'
img=open_image(path+fname)
#img.show()
ctr = np.genfromtxt(img2txt_name(fname), skip_header=3); ctr
def convert_biwi(coords):
c1 = coords[0] * cal[0][0]/coords[2] + cal[0][2]
c2 = coords[1] * cal[1][1]/coords[2] + cal[1][2]
return tensor([c2,c1])
def get_ctr(f):
ctr = np.genfromtxt(img2txt_name(f), skip_header=3)
return convert_biwi(ctr)
def get_ip(img,pts): return ImagePoints(FlowField(img.size, pts), scale=True)
get_ctr(fname)
ctr = get_ctr(fname)
#img.show(y=get_ip(img,ctr), figsize=(6,6))
then this Code creates a dataset from the files in the biwi_head_pose folder
# Creating a dataset
data = (PointsItemList.from_folder('biwi_head_pose')
.split_by_valid_func(lambda o: o.parent.name=='13')
.label_from_func(get_ctr)
.transform(get_transforms(), tfm_y=True, size=(120,160))
.databunch().normalize(imagenet_stats))
data.show_batch(3, figsize=(9,6))
print('---- Done Gathering Data')
and this is the code that trains the data with a Convolutional Neural Network.
#Train model
learn = cnn_learner(data, models.resnet34)
learn.lr_find()
learn.recorder.plot()
lr = 2e-2
learn.fit_one_cycle(5, slice(lr))
print('---- Saving as 3-stage-1')
learn.save(p+'3-stage-1')
print('---- Saved!')
print('---- Loading 3-stage-1')
learn.load(p+'3-stage-1');
learn.show_results()
print('---- Done!')
plt.show()
I'm following the tutorial from this fastai github page:
https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-head-pose.ipynb

Categories

Resources