SageMaker Neo PyTorch 1.0.0

I've updated the torch version in my SageMaker pytorch_36 kernel to torch version 1.0.0. I then tried running the example notebook pytorch_torchvision_neo.ipynb, also changing the framework_version to 1.0.0. Neo compilation then fails.
Any idea why it isn't working with 1.0.0? The console error message actually tells me to make sure I'm using 1.0.0, but the example notebook seems to only work with 0.4.0.

The SageMaker notebook comes with pytorch-1.1.0 pre-installed, but the model compilation service expects a model saved by pytorch-0.4.0 or pytorch-1.0.1.
Solution to the issue:
# 1. Do not install `pytorch-cpu` and `torchvision-cpu`.
# 2. Downgrade PyTorch to version 1.0.1.
!conda install -y pytorch=1.0.1 -c pytorch
# 3. Import torch and check that the version is 1.0.1 (not 1.1.0).
import torch
torch.__version__
Then continue with the remaining notebook steps: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_neo_compilation_jobs/pytorch_torchvision/pytorch_torchvision_neo.ipynb
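The version check in step 3 can also be made explicit before starting a compilation job. The following is a minimal sketch, assuming the two supported versions quoted in this answer (0.4.0 and 1.0.1); the names SUPPORTED_BY_NEO, base_version and is_supported are hypothetical helpers, not part of SageMaker or PyTorch.

```python
import re

# Versions this answer says the compilation service accepted (taken from the thread,
# not from an official list).
SUPPORTED_BY_NEO = ("0.4.0", "1.0.1")


def base_version(version: str) -> str:
    """Keep only the leading 'X.Y.Z' part, dropping suffixes such as '+cpu' or '.post2'."""
    match = re.match(r"\d+\.\d+\.\d+", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return match.group(0)


def is_supported(version: str, supported=SUPPORTED_BY_NEO) -> bool:
    """Return True if the installed version matches one of the supported releases."""
    return base_version(version) in supported
```

In the notebook this would be called as is_supported(torch.__version__) right after the import, so that a stale 1.1.0 install is caught before the model is saved.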

Related

module 'keras.api._v2.keras.experimental' has no attribute 'PeepholeLSTMCell'

I tried to install tensorflow_federated in google colab. I used
pip install --quiet tensorflow-federated-nightly
import tensorflow_federated as tff
and it worked, but now when I try to import it I get this error:
AttributeError: module 'keras.api._v2.keras.experimental' has no attribute 'PeepholeLSTMCell'
I don't know why I get this error, because I didn't have any problem before.
I also used the following code to install tensorflow-federated:
pip install --upgrade tensorflow-federated-nightly
but I get the same error.
How do I fix it?
My versions are:
tensorflow 2.8.0,
keras 2.8.0,
tensorflow-federated-nightly 0.19.0.dev20220218

To use TensorFlow Federated with TensorFlow 2.8.0, please try the newly released TFF 0.20.0, which is available on PyPI and GitHub.
The tensorflow-federated-nightly package depends on the nightly versions of TensorFlow (tf-nightly), Keras (keras-nightly) and so on.
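To see whether stable and nightly builds are mixed in one environment, the installed distributions can be listed side by side. This is a small sketch using only the standard library; the helper name installed_versions and the PACKAGES tuple are illustrative, and the distribution names are the ones mentioned in this thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if it is not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


# Stable and nightly names discussed above.
PACKAGES = (
    "tensorflow", "tf-nightly",
    "keras", "keras-nightly",
    "tensorflow-federated", "tensorflow-federated-nightly",
)

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:32s} {version or 'not installed'}")
```

If a nightly TFF package shows up next to a stable tensorflow or keras, that matches the mismatch described in this answer.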

using a tensorflow model trained on google colab on my PC

I am using colab to train a tensorflow model. I see that google colab installs the following version by default:
import tensorflow
tensorflow.__version__
2.6.0
...
[train model]
...
model.save('mymodel.h5')
However, when I download the model to my windows pc and try to load it with tensorflow/keras, I get an error
import keras
import tensorflow
model = keras.models.load_model(r"mymodel.h5")
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
After searching on the net, it appears this is due to the different tensorflow versions (colab vs. my PC).
tensorflow.__version__
Out[4]: '2.1.0'
The problem is that when I install tensorflow with conda install tensorflow-gpu this is the version I get. Even trying to force conda install tensorflow-gpu==2.6 does not install anything.
What should I do?
Thanks!

A hacky solution for now:
Install TensorFlow 2.1 together with CUDA and cuDNN using conda install tensorflow-gpu
Then upgrade using pip install tensorflow-gpu==2.6 --upgrade --force-reinstall
The GPU does not work (likely because the CUDA versions are not the right ones), but at least I can run a TF 2.6 script on the CPU.

Tensorflow Federated tutorial in Google Colab giving errors in the initialization code snippet

Here is the cell that needs to be run before starting the tutorial.
##test {"skip": true}
# tensorflow_federated_nightly also brings in tf_nightly, which
# can cause a duplicate tensorboard install, leading to errors.
!pip uninstall --yes tensorboard tb-nightly
!pip install --quiet --upgrade tensorflow_federated_nightly
!pip install --quiet --upgrade nest_asyncio
!pip install --quiet tb-nightly # or tensorboard, but not both
import nest_asyncio
nest_asyncio.apply()
It gives the following errors:
ERROR: tensorflow 2.4.1 requires tensorboard~=2.4, which is not installed.
ERROR: tensorflow 2.4.1 has requirement gast==0.3.3, but you'll have gast 0.4.0 which is incompatible.
ERROR: tensorflow 2.4.1 has requirement grpcio~=1.32.0, but you'll have grpcio 1.34.1 which is incompatible.
ERROR: tensorflow 2.4.1 has requirement h5py~=2.10.0, but you'll have h5py 3.1.0 which is incompatible.
ERROR: datascience 0.10.6 has requirement folium==0.2.1, but you'll have folium 0.8.3 which is incompatible.
I need help resolving this. I am not very familiar with the libraries and classes in TensorFlow.

Even though the console says there was an error, the pip packages should have been installed correctly.
This happens because the notebooks use tensorflow-federated-nightly, which depends on and installs tf-nightly, overwriting the base tensorflow install. However, pip still thinks the TFF dependencies will conflict with the now overwritten TensorFlow core package.
Adding tensorflow to the !pip uninstall list may make this error go away, but the functionality of the notebook won't change.

You can import TensorFlow Federated as shown below, which solved my error. I was following the Federated Learning for Image Classification tutorial, and importing tensorflow_federated always gave me an error.
from tensorflow_federated import python as tff

Unable to use GPU in Anaconda environment

I want to use the GPU in an Anaconda environment on Linux.
I believe I have matched the versions of each module, but it doesn't work.
CUDA and cuDNN were installed using conda.
The versions of each module and driver are listed below:
・GPU：RTX 2070 SUPER
・OS：Linux Mint 19.3 Tricia ( Ubuntu 18.04 )
・Nvidia-driver：435.21
# conda list tensorflow
tensorflow 2.1.0 gpu_py37h7a4bb67_0
tensorflow-base 2.1.0 gpu_py37h6c5654b_0
tensorflow-estimator 2.1.0 pyhd54b08b_0
tensorflow-gpu 2.1.0 h0d30ee6_0
# conda list cudnn
cudnn 7.6.5 cuda10.1_0
# conda list cudatoolkit
cudatoolkit 10.1.243 h6bb024c_0
I can see the GPU by entering the following command
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
When I run the training script, I get the following error
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1d_3/convolution ......
How do I get it to work correctly?

Root cause: insufficient hardware resources.
Workaround:
I freshly installed TF 2.0 and ran a simple MNIST tutorial without problems. When I opened another notebook and tried to run it, I encountered this issue.
I exited all notebooks, restarted Jupyter, opened only one notebook, and ran it successfully. The issue seems to be either GPU memory or running more than one notebook on the GPU at the same time.
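If memory contention between processes is indeed the cause, a commonly suggested mitigation is to let TensorFlow allocate GPU memory on demand instead of reserving nearly all of it at start-up. This is not part of the original answer; it is a minimal sketch assuming TensorFlow 2.x, and enable_gpu_memory_growth is a hypothetical helper name.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to grow GPU memory on demand; return the number of GPUs configured."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment, so there is nothing to configure.
        return 0
    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        # This must run before any operation touches the GPU;
        # otherwise TensorFlow raises a RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


print("GPUs configured:", enable_gpu_memory_growth())
```

The call belongs at the very top of the notebook or training script, before any model is built. Whether it resolves the cuDNN initialization failure depends on how much memory the other processes are holding.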

how to see where exactly torch is installed pip vs conda torch installation

On my machine I can't "pip install torch": I get the infamous "single source externally managed error". I could not fix it, so I used "conda install torch" from Anaconda instead.
Checking the version is easy with torch.__version__
But how can I see where it is installed, i.e. the home directory of torch?
And if I had torch installed via both pip and conda, how would I know which one is used in a project?
import torch
print(torch.__version__)

You can get the location of the torch module that is actually imported in your script:
import torch
print(torch.__file__)

Running pip show torch in a terminal will give you all the required information.
Name: torch
Version: 1.3.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: c:\programdata\anaconda3\lib\site-packages
Requires: numpy
Required-by: torchvision, torchtext, efficientunet-pytorch
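To tell a pip install from a conda install programmatically, one option is to combine the module location with the INSTALLER record that packaging tools usually write into a distribution's metadata. This is a sketch under that assumption: the record may be missing for some installs, and describe_install is a hypothetical helper name.

```python
import importlib.util
from importlib import metadata


def describe_install(module_name, dist_name=None):
    """Locate a module without importing it and report which installer recorded it."""
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    try:
        # INSTALLER typically contains 'pip' or 'conda'; read_text returns None if absent.
        installer = metadata.distribution(dist_name or module_name).read_text("INSTALLER")
    except metadata.PackageNotFoundError:
        installer = None
    return {
        "location": location,
        "installer": installer.strip() if installer else None,
    }


print(describe_install("torch"))
```

The location shows which copy Python would import in the current environment, which answers the "which one is used" part even when two installs exist.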

If you have already installed the PyTorch library, open Google Colab, paste the following code, and hit the run button:
import torch
print(torch.__file__)
You will then see the path where PyTorch is installed.
If this does not work, go to https://pytorch.org/get-started/locally/ and follow the installation instructions, because Python and PyTorch sometimes have dependency issues.
