I'm trying to train FastSpeech2 from the TensorFlowTTS repo.
Single-GPU training works fine, but multi-GPU training fails with AttributeError: 'PerReplica' object has no attribute 'numpy'.
The script I'm running is the official FastSpeech2 training script, found here.
My command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/fastspeech2/train_fastspeech2.py \
--train-dir ./dump/train/ \
--dev-dir ./dump/valid/ \
--outdir ./examples/fastspeech2/exp/train.fastspeech2.v1/ \
--config ./examples/fastspeech2/conf/fastspeech2.v1.yaml \
--use-norm 1 \
--f0-stat ./dump/stats_f0.npy \
--energy-stat ./dump/stats_energy.npy \
--mixed_precision 1 \
--resume ""
This is the error output I get:
Traceback (most recent call last):
File "examples/fastspeech2/train_fastspeech2.py", line 421, in <module>
main()
File "examples/fastspeech2/train_fastspeech2.py", line 413, in main
resume=args.resume,
File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 852, in fit
self.run()
File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 101, in run
self._train_epoch()
File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 127, in _train_epoch
self._check_eval_interval()
File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 164, in _check_eval_interval
self._eval_epoch()
File "/home/mydir/.local/lib/python3.6/site-packages/tensorflow_tts/trainers/base_trainer.py", line 747, in _eval_epoch
self.generate_and_save_intermediate_result(batch)
File "examples/fastspeech2/train_fastspeech2.py", line 150, in generate_and_save_intermediate_result
utt_ids = batch["utt_ids"].numpy()
AttributeError: 'PerReplica' object has no attribute 'numpy'
Please help; I can't work out why this error only appears with multi-GPU training.
I'm working with the same repo and ran into this error too. Unfortunately I don't have a fix for it yet, but in the meantime I'm using a workaround. The error is thrown when training stops to evaluate the network, which it does every N steps, where N is the eval_interval_steps value in ./examples/fastspeech2/conf/fastspeech2.v1.yaml. If you set this to something greater than train_max_steps, the function that throws the error is never called.
The function throwing the error is generate_and_save_intermediate_result(batch), and as far as I understand you can train without it.
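Some background on why this only happens with several GPUs: in multi-GPU mode the trainer runs under a tf.distribute strategy (such as MirroredStrategy), so each batch drawn from the distributed dataset holds PerReplica values, one tensor per GPU, rather than a single eager tensor with a .numpy() method. Instead of skipping evaluation, a minimal sketch of a fix is to unpack the value with the strategy's public experimental_local_results() method and keep only the first replica's slice, which is usually enough for the plots this function saves. The helper name first_replica_numpy is hypothetical, and the assumption that the trainer keeps its strategy in self._strategy should be checked against your copy of base_trainer.py.

```python
import numpy as np


def first_replica_numpy(value, strategy=None):
    """Return the first replica's slice of `value` as a NumPy array.

    Hypothetical helper. Under a tf.distribute strategy, values drawn from a
    distributed dataset are PerReplica objects; the strategy's public
    experimental_local_results() unpacks them into a tuple of ordinary
    per-device tensors (for a non-distributed value it returns a 1-tuple).
    """
    if strategy is not None:
        local_values = strategy.experimental_local_results(value)
    else:
        local_values = (value,)
    first = local_values[0]
    # Eager tensors expose .numpy(); anything else goes through np.asarray.
    return first.numpy() if hasattr(first, "numpy") else np.asarray(first)
```

In generate_and_save_intermediate_result, the failing line 150 would then become utt_ids = first_replica_numpy(batch["utt_ids"], self._strategy). The other batch entries and model outputs converted with .numpy() in that function are probably PerReplica as well and would need the same treatment.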
2023-01-25 08:21:21,659 - ERROR - Traceback (most recent call last):
File "/home/xyzUser/project/queue_handler/document_queue_listner.py", line 148, in __process_and_acknowledge
pipeline_result = self.__process_document_type(message, pipeline_input)
File "/home/xyzUser/project/queue_handler/document_queue_listner.py", line 194, in __process_document_type
pipeline_result = bill_parser_pipeline.process(pipeline_input)
File "/home/xyzUser/project/main/billparser/__init__.py", line 18, in process
bill_extractor_model = MachineGeneratedBillExtractorModel()
File "/home/xyzUser/project/main/billparser/models/qa_model.py", line 25, in __new__
cls.__model = TransformersReader(model_name_or_path=cls.__model_path, use_gpu=False)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/haystack/nodes/base.py", line 48, in wrapper_exportable_to_yaml
init_func(self, *args, **kwargs)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/haystack/nodes/reader/transformers.py", line 93, in __init__
self.model = pipeline(
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 542, in pipeline
return task_class(model=model, framework=framework, task=task, **kwargs)
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 125, in __init__
super().__init__(
File "/home/xyzUser/project/.env/lib/python3.8/site-packages/transformers/pipelines/base.py", line 691, in __init__
self.device = device if framework == "tf" else torch.device("cpu" if device < 0 else f"cuda:{device}")
TypeError: '<' not supported between instances of 'torch.device' and 'int'
I got this error message after installing my project's requirements file. I think it is related to torch, but I don't know how to fix it. I'm new to Hugging Face Transformers and don't know whether it's a version issue.
This was a bug in the transformers package in versions prior to v4.22.0: that line of code does not check whether the device argument is a torch.device before comparing it with an int. Tracing through git blame shows that changeset 9d4a45509ab added the much-needed if isinstance(device, torch.device): check (line 764 in the resulting file), which prevents this error. Checking the tags on that commit shows that v4.22.0 and later releases include the fix. To update a specific package, activate the environment and run:
pip install -U transformers
Alternatively with a specific version, e.g.:
pip install -U transformers==4.22.0
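After upgrading, it's worth confirming that the interpreter running the project actually sees a fixed release, especially since reinstalling the project's requirements file can pull the old version back in if it pins one. A small stdlib-only check, as a sketch: the 4.22.0 threshold comes from the answer above, the comparison looks only at the numeric major.minor.patch part (so a suffix like .dev0 is ignored), and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (4, 22, 0)  # first release said to contain the isinstance check


def release_tuple(ver):
    """Turn '4.21.3' or '4.22.0.dev0' into (4, 21, 3) / (4, 22, 0)."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_device_fix(ver):
    # Compare as integer tuples so that 4.9 < 4.22 (string order would not).
    return release_tuple(ver) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        status = "includes" if has_device_fix(installed) else "predates"
        print(f"transformers {installed} {status} the torch.device fix")
```

If the check reports an older version after pip install -U, the requirements file is the first place to look for a pin such as transformers==4.x that needs raising.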
I am trying to run the following GitHub project:
https://github.com/jeffgreenca/laughr
I can run the command: pipenv run python laughr.py --help
but when I try to mute the laughs in an audio file, I get the following error:
Traceback (most recent call last):
File "laughr.py", line 292, in <module>
model=localModel)
File "laughr.py", line 202, in do_mute_laughs
laughr.remove_laughs(sourceFile, outFile)
File "laughr.py", line 147, in remove_laughs
rc.laughs = self.model.predict(rc.build_features())
File "laughr.py", line 73, in build_features
self.raw = self.y.T[0][i * chunkLen:(i + 1) * chunkLen]
IndexError: invalid index to scalar variable.
I think I may be using the wrong version of TensorFlow or Keras, but I have no idea which one is right. Thanks for listening.
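One possible cause, offered as a guess rather than a confirmed diagnosis: build_features() indexes self.y.T[0], which assumes y is a 2-D (n_samples, n_channels) array. If the source file is mono and is loaded as a 1-D array, .T does nothing, y.T[0] is a single sample (a NumPy scalar), and slicing that scalar raises exactly this IndexError, independent of the TensorFlow or Keras version. The sketch below reproduces that with NumPy alone; first_channel and reproduces_scalar_index_error are hypothetical helpers, not part of laughr.

```python
import numpy as np


def first_channel(y):
    """Return the first audio channel as a 1-D array.

    Hypothetical helper. Assumes multichannel audio is laid out as
    (n_samples, n_channels), which is what `y.T[0]` in laughr implies.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        # Mono: already a single channel. .T is a no-op on 1-D arrays,
        # so y.T[0] would be one sample, not a channel.
        return y
    return y.T[0]


def reproduces_scalar_index_error(y, chunk_len=4):
    """Return True if slicing y.T[0] fails the way the traceback does."""
    try:
        np.asarray(y).T[0][0:chunk_len]
    except IndexError:
        return True
    return False


if __name__ == "__main__":
    mono = np.zeros(10, dtype=np.float32)
    stereo = np.zeros((10, 2), dtype=np.float32)
    print("mono fails:", reproduces_scalar_index_error(mono))
    print("stereo fails:", reproduces_scalar_index_error(stereo))
```

If the mono case matches your input file, converting the file to stereo, or changing the line in build_features() to slice first_channel(self.y) instead of self.y.T[0], would be quick ways to test the theory.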
So, I trained an object detection model and now I want to export .ckpt files.
When I try to export the .ckpt files:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/faster_rcnn_inception_v2_pets.config --trained_checkpoint_prefix training3/model.ckpt-47816 --output_directory inference_graph
I get this:
Traceback (most recent call last):
File "export_inference_graph.py", line 147, in <module>
tf.app.run()
File "/home/ubuntu/anaconda3/envs/tensorflow1/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "export_inference_graph.py", line 143, in main
FLAGS.output_directory, input_shape)
File "/home/ubuntu/tensorflow1/models/research/object_detection/exporter.py", line 454, in export_inference_graph
is_training=False)
File "/home/ubuntu/anaconda3/envs/tensorflow1/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/builders/model_builder.py", line 101, in build
add_summaries)
File "/home/ubuntu/anaconda3/envs/tensorflow1/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/builders/model_builder.py", line 274, in _build_faster_rcnn_model
image_resizer_fn = image_resizer_builder.build(frcnn_config.image_resizer)
File "/home/ubuntu/anaconda3/envs/tensorflow1/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/builders/image_resizer_builder.py", line 83, in build
if keep_aspect_ratio_config.per_channel_pad_value:
AttributeError: 'KeepAspectRatioResizer' object has no attribute 'per_channel_pad_value'
Everybody else seems to have this working without any problems.
Could anyone please tell me what is going on here?
I know this is a few months late, but I just ran into this issue too!
It seems your image_resizer.proto is missing the per_channel_pad_value field.
Update the proto file to include the field, using the version here:
https://github.com/tensorflow/models/blob/master/research/object_detection/protos/image_resizer.proto
then recompile it and try again.
It should work this time.
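The Object Detection API's *_pb2.py modules are generated from these .proto files, so "recompile" means rerunning protoc over object_detection/protos from the models/research directory (the installation docs use protoc object_detection/protos/*.proto --python_out=.). A rough Python sketch of that step, assuming protoc is installed and on PATH; the function names are hypothetical. Because the traceback loads model_builder.py and image_resizer_builder.py from an installed object_detection-0.1-py3.6.egg, the package probably also needs reinstalling from research after recompiling, otherwise the stale generated code stays in use.

```python
import subprocess
import sys
from pathlib import Path


def protoc_command(research_dir):
    """Build the protoc invocation for every .proto under object_detection/protos.

    Paths are made relative to research_dir because protoc is run from there,
    mirroring `protoc object_detection/protos/*.proto --python_out=.`.
    """
    research = Path(research_dir).resolve()
    protos = sorted(
        p.relative_to(research).as_posix()
        for p in (research / "object_detection" / "protos").glob("*.proto")
    )
    if not protos:
        raise FileNotFoundError(f"no .proto files under {research}/object_detection/protos")
    return ["protoc", *protos, "--python_out=."]


def recompile_protos(research_dir):
    """Run protoc from research_dir; raises CalledProcessError on failure."""
    cmd = protoc_command(research_dir)
    subprocess.run(cmd, cwd=research_dir, check=True)
    return cmd


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python recompile_protos.py /home/ubuntu/tensorflow1/models/research
    print(" ".join(recompile_protos(sys.argv[1])))
```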
I am trying to export a model by executing export_inference_graph.py script.
I tried both my own trained model.ckpt and the official example files for ssd_mobilenet_v1_pets.
In cmd I run:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_pets.config --trained_checkpoint_prefix training/model.ckpt-2453 --output_directory heart_graph
I am using TensorFlow 1.4 and I always get the following error:
Traceback (most recent call last):
File "export_inference_graph.py", line 119, in <module>
tf.app.run()
File "C:\Users\<Name>\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "export_inference_graph.py", line 115, in main
FLAGS.output_directory, input_shape)
File "C:\Users\<Name>\AppData\Local\Programs\Python\Python35\Lib\site-packages\tensorflow\models\research\object_detection\exporter.py", line 427, in export_inference_graph
input_shape, optimize_graph, output_collection_name)
File "C:\Users\<Name>\AppData\Local\Programs\Python\Python35\Lib\site-packages\tensorflow\models\research\object_detection\exporter.py", line 353, in _export_inference_graph
postprocessed_tensors = detection_model.postprocess(output_tensors)
File "C:\Users\<Name>\AppData\Local\Programs\Python\Python35\Lib\site-packages\tensorflow\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 405, in postprocess
class_predictions_without_background)
File "C:\Users\<Name>\AppData\Local\Programs\Python\Python35\Lib\site-packages\tensorflow\models\research\object_detection\builders\post_processing_builder.py", line 94, in score_converter_fn
scaled_logits = tf.divide(logits, logit_scale, name='scale_logits')
File "C:\Users\<Name>\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\math_ops.py", line 309, in divide
return DivideDelegateWithName(x, name) / y
File "C:\Users\<Name>\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\math_ops.py", line 294, in __truediv__
return _truediv_python3(self.x, y, self.name)
File "C:\Users\<Name>\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\math_ops.py", line 981, in _truediv_python3
(x_dtype, y_dtype))
TypeError: x and y must have the same dtype, got tf.float32 != tf.int32
Where is the problem, and how can I solve it?
I recently used the Object Detection API for a project and ran into this error too. I looked through the issues on GitHub and found a solution.
In https://github.com/tensorflow/models/issues/2774 there is a solution that patches the source code. I tried it, and it works!
Open post_processing_builder.py and replace the function with:
def _score_converter_fn_with_logit_scale(tf_score_converter_fn, logit_scale):
  """Create a function to scale logits then apply a Tensorflow function."""
  def score_converter_fn(logits):
    # tf is already imported at the top of post_processing_builder.py.
    # Wrap logit_scale in a float32 constant so tf.divide gets matching
    # dtypes (passing it through unchanged caused float32 != int32).
    cr = tf.constant([[logit_scale]], tf.float32)
    scaled_logits = tf.divide(logits, cr, name='scale_logits')
    return tf_score_converter_fn(scaled_logits, name='convert_scores')
  score_converter_fn.__name__ = '%s_with_logit_scale' % (
      tf_score_converter_fn.__name__)
  return score_converter_fn
Then go to the research folder and run:
python setup.py install
After that it should work.
By the way, I'm not sure whether you also need to reinstall slim (also in the research folder), but it's safer to reinstall it too.
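Because python setup.py install copies the package into site-packages, the edited post_processing_builder.py only takes effect if the reinstalled copy is the one Python actually imports. A small stdlib-only sketch for checking where a module will be loaded from; module_origin is a hypothetical helper name, and note that importlib.util.find_spec imports the parent packages of a dotted name.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would import `name` from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # A parent package of a dotted name is missing.
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    target = "object_detection.builders.post_processing_builder"
    print(target, "->", module_origin(target))
```

If the printed path is not the file you edited, the patch is not being picked up, and the reinstall step needs repeating in the environment that runs export_inference_graph.py.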
I'm working through the tutorial on convolutional neural nets using CIFAR-10. The CNN builds (cifar10.py), but when I run cifar10_train.py I get the following error:
Traceback (most recent call last):
File "cifar10_train.py", line 115, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "cifar10_train.py", line 111, in main
train()
File "cifar10_train.py", line 58, in train
images, labels = cifar10.distorted_inputs()
File "/home/brennus/workspace/python/cifar/cifar10.py", line 141, in distorted_inputs
batch_size=FLAGS.batch_size)
File "/home/brennus/workspace/python/cifar/cifar10_input.py", line 177, in distorted_inputs
float_image = tf.image.per_image_standardization(distorted_image)
AttributeError: 'module' object has no attribute 'per_image_standardization'
According to https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/image.md, there is indeed a per_image_standardization function, but my TensorFlow doesn't seem to have it. I'm not sure which version I have or how to check, but I built it from source from the repository, so I assume it's the current one.
I can't find anyone else who is having this problem so I'm stymied. Maybe I have to write my own?
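On the "which version do I have" part: the quickest check is to print tf.__version__ and tf.__file__ from the same interpreter that runs cifar10_train.py, because if more than one TensorFlow is installed, that is the one actually in use. The sketch below does that and also looks the op up under its pre-0.12 name, per_image_whitening, as a fallback; both function names are hypothetical helpers, and the fallback is a stopgap compared with upgrading.

```python
import importlib


def find_standardization_fn(image_module):
    """Return the per-image standardization op from a tf.image-like module.

    Releases before TF 0.12 exposed the same op as per_image_whitening;
    it was renamed to per_image_standardization in 0.12.
    """
    for name in ("per_image_standardization", "per_image_whitening"):
        fn = getattr(image_module, name, None)
        if fn is not None:
            return fn
    raise AttributeError(
        "neither per_image_standardization nor per_image_whitening found")


def describe_tensorflow():
    """Report which TensorFlow this interpreter imports, and from where."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return "tensorflow is not importable from this interpreter"
    has_op = hasattr(tf.image, "per_image_standardization")
    return (f"tensorflow {tf.__version__} from {tf.__file__}; "
            f"per_image_standardization present: {has_op}")


if __name__ == "__main__":
    print(describe_tensorflow())
```

In cifar10_input.py the failing call could then read find_standardization_fn(tf.image)(distorted_image), though a version printout pointing at an old install is a sign that reinstalling TensorFlow is the real fix.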
I reinstalled tensorflow and solved the problem. Thanks, all!