Problem with executing python machine learning code I found on GitHub - python

I need some clear instructions on how to execute some code.
Context:
This is a Python machine learning script for peptide binding prediction, but you don't need to know any biology to help me.
I am trying to reproduce the results of this scientific paper to test its validity and to see whether I can use it. I work in the biotech industry and am only somewhat familiar with C# and Python.
The paper links to a GitHub page, which has some instructions on how to run the code. But every time I run the code as instructed, I get an error. I have already installed the requirements (the latest versions of pytorch, numpy and scikit-learn), and I tried both GPU and CPU, but neither worked. I don't know what to do at this point.
Paper Title:
"Prediction of Specific TCR-Peptide Binding From Large Dictionaries of TCR-Peptide Pairs" by Ido Springer, Hanan Besser, et al.
Paper's GitHub (found in the paper's abstract):
https://github.com/louzounlab/ERGO
These are the example commands I ran in the terminal. They come from a comment at the end of ERGO.py.
GPU ver:
python ERGO.py train lstm mcpas specific cuda:0 --model_file=model.pt --train_data_file=train_data --test_data_file=test_data
GPU code results:
Traceback (most recent call last):
File "D:\D Download\ERGO-master\ERGO.py", line 437, in <module>
main(args)
File "D:\D Download\ERGO-master\ERGO.py", line 141, in main
model, best_auc, best_roc = lstm.train_model(train_batches, test_batches, args.device, arg, params)
File "D:\D Download\ERGO-master\lstm_utils.py", line 163, in train_model
model.to(device)
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 927, in to
return self._apply(convert)
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
module._apply(fn)
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
param_applied = fn(param)
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py", line 211, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
CPU code ver (only replaced specific cuda:0 with specific cpu):
python ERGO.py train lstm mcpas specific cpu --model_file=model.pt --train_data_file=train_data --test_data_file=test_data
CPU code results:
epoch: 1
C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\functional.py:1960: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Traceback (most recent call last):
File "D:\D Download\ERGO-master\ERGO.py", line 437, in <module>
main(args)
File "D:\D Download\ERGO-master\ERGO.py", line 141, in main
model, best_auc, best_roc = lstm.train_model(train_batches, test_batches, args.device, arg, params)
File "D:\D Download\ERGO-master\lstm_utils.py", line 173, in train_model
loss = train_epoch(batches, model, loss_function, optimizer, device)
File "D:\D Download\ERGO-master\lstm_utils.py", line 137, in train_epoch
loss = loss_function(probs, batch_signs)
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\loss.py", line 613, in forward
return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\functional.py", line 3074, in binary_cross_entropy
raise ValueError(
ValueError: Using a target size (torch.Size([50])) that is different to the input size (torch.Size([50, 1])) is deprecated. Please ensure they have the same size.

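Looking at the ValueError, it seems that what the code does is deprecated in PyTorch, so you have a more recent version of the package than the one the code was developed with. I suggest you try
pip install torch==1.4.0
on the command line (the package on PyPI is called torch, not pytorch). Note that this release predates Python 3.10, which the paths in your traceback show you are using, so you would probably need an environment with an older Python version for it to install.
I'm not familiar with PyTorch, but managing tensor shapes in TensorFlow is the biggest pain for me. The actual problem appears to be that the input has one more dimension than the target ([50, 1] versus [50]), so you would have to reshape one of them manually so that the sizes match.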
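If you would rather keep a recent PyTorch, the reshape could look roughly like the sketch below. The tensors are random stand-ins with the shapes from the error message, not the real ERGO data, and I am assuming the loss is nn.BCELoss (the traceback goes through F.binary_cross_entropy). The names probs and batch_signs are taken from the traceback line in lstm_utils.py.

```python
import torch
import torch.nn as nn

# Stand-ins with the shapes reported in the ValueError.
probs = torch.rand(50, 1)                          # model output, shape [50, 1]
batch_signs = torch.randint(0, 2, (50,)).float()   # targets, shape [50]

loss_function = nn.BCELoss()

# Option 1: drop the trailing dimension of the model output.
loss = loss_function(probs.squeeze(1), batch_signs)

# Option 2: add a trailing dimension to the targets instead.
loss_alt = loss_function(probs, batch_signs.unsqueeze(1))

print(loss.item(), loss_alt.item())
```

Both options compare the same 50 values, so they should give the same loss. In the repository, the equivalent change would go at the loss_function(probs, batch_signs) call, but check what else uses probs before editing it.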

Related

YOLOv7 segmentation: error with using pre-trained weights

I would like to use YOLOv7 for segmentation on my custom dataset and custom classes.
I am already able to run the 'normal' YOLO version with my data and using the yolov7.pt weights.
But when I am using the yolov7-mask.pt weights, I end up having an error:
Traceback (most recent call last):
File "train.py", line 616, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 71, in train
run_id = torch.load(weights, map_location=device).get('wandb_id') if weights.endswith('.pt') and os.path.isfile(weights) else None
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 789, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1131, in _load
result = unpickler.load()
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1124, in find_class
return super().find_class(mod_name, name)
AttributeError: Can't get attribute 'Merge' on <module 'models.common' from '/content/yolov7/models/common.py'>
I also saw that other people have run into this error, but I could not find a solution.
Also, this tutorial does not use pre-trained weights and does not explain why.
When I do not use pre-trained weights the code runs, but I have not yet checked how good the result is (I assume it will take much longer to train).
Any advice will be appreciated.

How to avoid "RuntimeError: error in LoadLibraryA" for torch.cat?

I am running a pytorch solution for wireframe detection. I am receiving a "RuntimeError: error in LoadLibraryA" when the solution executes "forward return torch.cat(outputs, 1)"
I am not able to provide a minimal reproducible example. Therefore the question: is it possible to cause this type of error in a Microsoft library through Python programming errors, or is this most likely a version problem (Python, PyTorch, CUDA, ...) or a bug in my installation?
I am using windows 10, python 3.8.1 and pytorch 1.4.0.
File "main.py", line 144, in <module>
main()
File "main.py", line 137, in main
trainer.train(train_loader, val_loader=None)
File "D:\Dev\Python\Projects\wireframe\wireframe\junc\trainer\balance_junction_trainer.py", line 75, in train
self.step(epoch, train_loader)
File "D:\Dev\Python\Projects\wireframe\wireframe\junc\trainer\balance_junction_trainer.py", line 176, in step
) = self.model(input_var, junc_conf, junc_res, bin_conf, bin_res)
File "D:\Dev\Python\Environment\Environments\pytorch\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "D:\Dev\Python\Projects\wireframe\wireframe\junc\model\inception.py", line 41, in forward
base_feat = self.base_net(im_data)
File "D:\Dev\Python\Environment\Environments\pytorch\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "D:\Dev\Python\Projects\wireframe\wireframe\junc\model\networks\inception_v2.py", line 63, in forward
x = self.Mixed_3b(x)
File "D:\Dev\Python\Environment\Environments\pytorch\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "D:\Dev\Python\Projects\wireframe\wireframe\junc\model\networks\inception_v2.py", line 97, in forward
return torch.cat(outputs, 1)
RuntimeError: error in LoadLibraryA
Try this workaround: run the following code after import torch (this should be fixed in 1.5):
import ctypes
ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
It was possible to avoid this error by downgrading to Python 3.7.6.
Remark: Unfortunately, the first step of the overall processing (run time 3 days on my GPU) creates intermediate results in pickle protocol 5, which is new in Python 3.8. The files with the intermediate results cannot be read with Python 3.7.6, so I either have to re-run the first step for 3 days or find another solution.
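One possible way around the 3-day re-run, as a sketch only: load each intermediate file once under Python 3.8 and write it back with protocol 4, which Python 3.7 can read. This assumes the objects can still be unpickled in the 3.8 environment. The function name, file names and sample data below are made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def resave_with_protocol(src, dst, protocol=4):
    """Load a pickle file and write it back using an older protocol."""
    with open(src, "rb") as f:
        obj = pickle.load(f)
    with open(dst, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


# Demonstration with a throwaway file (run this under Python 3.8 or newer).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "intermediate_p5.pkl"
    dst = Path(tmp) / "intermediate_p4.pkl"
    with open(src, "wb") as f:
        pickle.dump({"junctions": [1, 2, 3]}, f, protocol=5)

    resave_with_protocol(src, dst)

    # The second byte of the stream is the protocol number.
    dst_protocol = dst.read_bytes()[1]
    with open(dst, "rb") as f:
        restored = pickle.load(f)

print(dst_protocol, restored)
```

The re-saved files can then be opened from the Python 3.7.6 environment with a plain pickle.load.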

How to fully reset Keras?

I am using Keras in conjunction with scikit-optimize, which means that for every iteration of the latter, a new Keras model is created, trained and tested. It worked fine until I started using recurrent models with GRU (or LSTM) cells. Now, after the first iteration (which completes without an issue), I get the following error:
File "C:\...\my_script.py", line 157, in <module>
n_calls=bayesian_iterations, x0=default_parameters)
File "C:\ProgramData\Anaconda3\lib\site-packages\skopt\optimizer\gp.py", line 228, in gp_minimize
callback=callback, n_jobs=n_jobs)
File "C:\ProgramData\Anaconda3\lib\site-packages\skopt\optimizer\base.py", line 253, in base_minimize
next_y = func(next_x)
File "C:\...\my_script.py", line 121, in compute_cost
model = train_model(args,X_train,y_train,n_epochs=n_epochs)
File "C:\...\my_script.py", line 96, in train_model
activation='elu',alpha=alpha,l2_alpha=l2_alpha)
File "C:\...\my_script.py", line 63, in create_model
x = GRU(units=n_neurons,activation=activation,return_sequences=True)(x)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\recurrent.py", line 499, in __call__
return super(RNN, self).__call__(inputs, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\topology.py", line 619, in __call__
output = self.call(inputs, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\recurrent.py", line 1628, in call
initial_state=initial_state)
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\recurrent.py", line 564, in call
if len(initial_state) != len(self.states):
File "C:\ProgramData\Anaconda3\lib\site-packages\keras\layers\recurrent.py", line 401, in states
num_states = len(self.cell.state_size)
TypeError: object of type 'numpy.int32' has no len()
At the moment I use:
K.reset_uids()
K.clear_session()
I did not call reset_uids() before I started using recurrent networks, and everything worked fine; I only added it to see whether it would help, and it did not. I don't think this is normal behaviour, but I need a quick solution, and I suspect that completely resetting Keras would work, since the first iteration runs without any issue. I tried from importlib import reload followed by reload(keras), but nothing changed (I import Keras modules separately using from keras import something, so I don't know whether that matters). I couldn't find a way to fully reset Keras, hence my question.
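One thing that may be worth checking before looking for a deeper reset: the traceback ends with len() being called on a numpy.int32, which suggests that the layer size reached GRU as a NumPy scalar rather than a plain Python int. That would fit the first iteration working if it uses the Python values from default_parameters while later iterations receive values generated by the optimizer. This is a guess from the traceback, not a confirmed diagnosis. A minimal sketch of the cast, using a hypothetical helper name:

```python
import operator


def as_builtin_int(value):
    """Return value as a plain Python int.

    operator.index accepts any integer-like object (NumPy integer scalars
    included) and raises TypeError for floats, so a mistyped hyperparameter
    fails loudly instead of being silently truncated.
    """
    return int(operator.index(value))


# Hypothetical use inside create_model:
#     x = GRU(units=as_builtin_int(n_neurons), activation=activation,
#             return_sequences=True)(x)
print(type(as_builtin_int(7)))
```

If the error disappears with the cast, the session reset was probably never the problem.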

Module object has no attribute leaky_relu

I am trying to run the code from here, which is an implementation of Generative Adversarial Networks using Keras in Python. I followed the instructions and installed all the requirements. Then I tried to run the code for DCGAN. However, there seems to be a compatibility issue between the libraries. I receive the following message when I run the code:
AttributeError: 'module' object has no attribute 'leaky_relu'
File "main.py", line 176, in <module>
dcgan = DCGAN()
File "main.py", line 25, in __init__
self.discriminator = self.build_discriminator()
File "main.py", line 84, in build_discriminator
model.add(LeakyReLU(alpha=0.2))
File "/opt/libraries/anaconda2/lib/python2.7/site-packages/keras/models.py", line 492, in add
output_tensor = layer(self.outputs[0])
File "/opt/libraries/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 617, in __call__
output = self.call(inputs, **kwargs)
File "/opt/libraries/anaconda2/lib/python2.7/site-packages/keras/layers/advanced_activations.py", line 46, in call
return K.relu(inputs, alpha=self.alpha)
File "/opt/libraries/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2918, in relu
x = tf.nn.leaky_relu(x, alpha)
I am using Keras version 2.1.3 with TensorFlow version 1.2.1 and Theano version 1.0.1+40.g757b4d5.
Any idea why I am getting this error?
EDIT:
The error occurs at line 84, in the build_discriminator function: `model.add(LeakyReLU(alpha=0.2))`
According to this answer, leaky_relu was added to TensorFlow in version 1.4, so you should check that your TensorFlow installation is at least version 1.4. You report version 1.2.1, which would explain the missing attribute.
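A quick way to check this without touching the GAN code is to compare the installed version string against 1.4. The sketch below is plain Python with hypothetical helper names; in practice you would pass in tf.__version__.

```python
def version_tuple(version):
    """Turn '1.2.1' into (1, 2, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break
        parts.append(int(digits))
    return tuple(parts)


def has_leaky_relu(tf_version):
    """True if the version is at least 1.4, where tf.nn.leaky_relu was added."""
    return version_tuple(tf_version) >= (1, 4)


# The version reported in the question versus the first one that should work.
print(has_leaky_relu("1.2.1"))
print(has_leaky_relu("1.4.0"))
```

An alternative with the same intent is to test hasattr(tf.nn, "leaky_relu") directly after importing tensorflow as tf.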

tensorflow object detection with my own data, can you help me?

I cloned the TensorFlow object detection model from GitHub.
I want to train this model with my own data (331 images of Samoyed dogs), following a blog tutorial.
My steps:
Created a dataset in PASCAL VOC format;
Downloaded the pretrained model (ssd_mobilenet_v1_coco_11_06_2017.tar.gz);
Changed the config file (ssd_mobilenet_v1_pets.config);
Started the training process with this command:
python object_detection/train.py \
--logtostderr \
--pipeline_config_path=./samoyed_test_and_train/training/ssd_mobilenet_v1_pets.config \
--train_dir=./samoyed_test_and_train/data/train.record
But I receive errors. My OS is macOS, and I also tried on AWS, where the same problem occurs. Can you figure out my mistake? The errors:
INFO:tensorflow:Summary name Learning Rate is illegal; using Learning_Rate instead.
WARNING:tensorflow:From /Users/zhaoenpei/Desktop/dabai-robot-arm/experiments/models/object_detection/meta_architectures/ssd_meta_arch.py:579: all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Please use tf.global_variables instead.
INFO:tensorflow:Summary name /clone_loss is illegal; using clone_loss instead.
2017-08-01 10:34:42.992224: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 10:34:42.992254: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-01 10:35:00.359032: I tensorflow/core/common_runtime/simple_placer.cc:675] Ignoring device specification /device:GPU:0 for node 'prefetch_queue_Dequeue' because the input edge from 'prefetch_queue' is a reference connection and already has a device field set to /device:CPU:0
INFO:tensorflow:Restoring parameters from /Users/zhaoenpei/Desktop/dabai-robot-arm/experiments/models/samoyed_test_and_train/training/model.ckpt
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.FailedPreconditionError'>, ./samoyed_test_and_train/data/train.record/graph.pbtxt.tmpf4587d1958df43cbaa9a0d7a04199f6f
2017-08-01 10:35:29.556458: E tensorflow/core/util/events_writer.cc:62] Could not open events file: ./samoyed_test_and_train/data/train.record/events.out.tfevents.1501554929.MacBook-Pro.local: Failed precondition: ./samoyed_test_and_train/data/train.record/events.out.tfevents.1501554929.MacBook-Pro.local
2017-08-01 10:35:29.556480: E tensorflow/core/util/events_writer.cc:95] Write failed because file could not be opened.
Traceback (most recent call last):
File "object_detection/train.py", line 198, in <module>
tf.app.run()
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 194, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/Users/zhaoenpei/Desktop/dabai-robot-arm/experiments/models/object_detection/trainer.py", line 290, in train
saver=saver)
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 732, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
start_standard_services=start_standard_services)
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 709, in prepare_or_wait_for_session
self._write_graph()
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 612, in _write_graph
self._logdir, "graph.pbtxt")
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/framework/graph_io.py", line 67, in write_graph
file_io.atomic_write_string_to_file(path, str(graph_def))
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 418, in atomic_write_string_to_file
write_string_to_file(temp_pathname, contents)
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 305, in write_string_to_file
f.write(file_content)
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 101, in write
self._prewrite_check()
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 87, in _prewrite_check
compat.as_bytes(self.__name), compat.as_bytes(self.__mode), status)
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/Users/zhaoenpei/.virtualenvs/python_virtual_1/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: ./samoyed_test_and_train/data/train.record/graph.pbtxt.tmpf4587d1958df43cbaa9a0d7a04199f6f
The train_dir flag is meant to point at some (typically empty) directory where your training logs and checkpoints will be written during training. For example, it could be something like train_dir=/tmp/training_directory. It looks like you are pointing it at your dataset file (train.record), which the config file should already be pointing at.
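To catch this kind of mix-up before launching a long training job, you could check the path first. The sketch below is a hypothetical helper, not part of the object detection code: it refuses a path that already exists as a file and creates the directory otherwise.

```python
import tempfile
from pathlib import Path


def prepare_train_dir(path):
    """Create the output directory for logs and checkpoints if needed.

    Raises ValueError when the path already exists as a regular file,
    which is the situation when --train_dir points at train.record.
    """
    train_dir = Path(path)
    if train_dir.exists() and not train_dir.is_dir():
        raise ValueError(f"{train_dir} is a file; --train_dir must be a directory")
    train_dir.mkdir(parents=True, exist_ok=True)
    return train_dir


# Demonstration in a throwaway location (all names below are made up).
with tempfile.TemporaryDirectory() as tmp:
    record_file = Path(tmp) / "train.record"
    record_file.write_bytes(b"")  # stands in for the dataset file
    try:
        prepare_train_dir(record_file)
        rejected = False
    except ValueError:
        rejected = True
    created = prepare_train_dir(Path(tmp) / "training_logs").is_dir()

print(rejected, created)
```

The directory that passes this check is the one to hand to --train_dir, while the path to train.record stays in the pipeline config.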
