Tensorflow load_weights() works in Google Colab but fails locally - python

I am working on a Tensorflow 2.0 project and am running into an interesting error. My project works with no errors when I run it in Google Colab. However, when I run the code locally, I get the following error:
ERROR:tensorflow:Couldn't match files for checkpoint, followed by the file path of the checkpoint it expected to find.
This error occurs when I call model.load_weights(tf.train.latest_checkpoint(CKPT_DIR)). Here is the traceback (on my Windows machine):
Traceback (most recent call last):
File ".\shakespeare_lstm.py", line 159, in <module>
run_model(SEED)
File ".\shakespeare_lstm.py", line 147, in run_model
model.load_weights(tf.train.latest_checkpoint(CKPT_DIR))
File "E:\Github Repos\ShakespeareLSTM\venv\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 181, in load_weights
return super(Model, self).load_weights(filepath, by_name)
File "E:\Github Repos\ShakespeareLSTM\venv\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1139, in load_weights
if _is_hdf5_filepath(filepath):
File "E:\Github Repos\ShakespeareLSTM\venv\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1449, in _is_hdf5_filepath
return (filepath.endswith('.h5') or filepath.endswith('.keras') or
AttributeError: 'NoneType' object has no attribute 'endswith'
This happens on my Windows machine running Python 3.6.8 and on my macOS machine running Python 3.7.2. Google Colab runs Python 3.6.9 and executes the code as expected with no errors. All three environments have TensorFlow 2.0.0.
My GitHub repo has the project code.
The only difference in the code that I have on Google Colab is the ROOT variable for the file paths (ROOT = os.path.join("/", "content", "drive", "My Drive", "Colab Notebooks", "Shakespeare LSTM")) and the following two additional lines (not directly after one another in the actual code):
from google.colab import drive
drive.mount('/content/drive')
Another difference is that the Google Colab notebook is set up to use a GPU, while my local machines are not. The model was trained on Google Colab, and the checkpoint files were downloaded.
Does anyone have any ideas as to why this code works on Google Colab and not locally?
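For reference, the traceback shows that the filepath passed to load_weights is None, i.e. tf.train.latest_checkpoint(CKPT_DIR) returned None; the "Couldn't match files for checkpoint" log typically means the paths recorded in the checkpoint state file (written on Colab) do not exist on the local machine. A minimal diagnostic sketch, reusing CKPT_DIR and model from the code above (the guard and the state-file inspection are additions, not part of my original code):
import tensorflow as tf

# latest_checkpoint() returns None when it cannot match the paths recorded in
# the `checkpoint` state file to real files, which also produces the
# "Couldn't match files for checkpoint" log message.
ckpt_path = tf.train.latest_checkpoint(CKPT_DIR)
if ckpt_path is None:
    # Inspect what the state file actually points at; paths written on Colab
    # (e.g. /content/drive/...) will not exist on a local Windows/macOS machine.
    state = tf.train.get_checkpoint_state(CKPT_DIR)
    print("Recorded checkpoint path:", state.model_checkpoint_path if state else None)
    raise FileNotFoundError("No usable checkpoint found in " + CKPT_DIR)
model.load_weights(ckpt_path)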

Related

How to run ssd_resnet_50_fpn_coco network from Tensorflow's Object Detection API in cv2?

I have been using tensorflow's 1.x Object Detection API to train custom object detection models. I like to run these models using cv2, since I already have cv2 available in the inference environment. The particular model that I'm struggling with is the ssd_resnet_50_fpn_coco, which can be found in the model zoo.
To run inference on a model from tensorflow's object detection API using cv2 I need two files, a frozen_inference_graph.pb and a graph.pbtxt as described here, on openCV's wiki page.
The frozen_inference_graph.pb can be created using the API's exportation script, which takes in three checkpoint files (.ckpt) and a configuration file (.config). The graph.pbtxt can be created using the tf_text_graph_ssd.py script provided here.
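Once both files exist, the inference call through cv2 itself is short; a minimal sketch of how I load and run the network (file names and the 0.5 threshold are placeholders, and the 640x640 input size is the one reported by tf_text_graph_ssd.py below):
import cv2

# Placeholder paths; substitute the files produced by the export scripts.
net = cv2.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'graph.pbtxt')

image = cv2.imread('test.jpg')
h, w = image.shape[:2]
blob = cv2.dnn.blobFromImage(image, size=(640, 640), swapRB=True)
net.setInput(blob)
detections = net.forward()

# SSD-style output: shape [1, 1, N, 7] with
# (image_id, class_id, score, x1, y1, x2, y2) in relative coordinates.
for det in detections[0, 0]:
    if float(det[2]) > 0.5:
        x1, y1, x2, y2 = det[3] * w, det[4] * h, det[5] * w, det[6] * h
        cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)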
When I run the tf_text_graph_ssd.py script pointed at the frozen_inference_graph.pb from the model zoo, I get an error:
Traceback (most recent call last):
File "tf_text_graph_ssd.py", line 15, in <module>
from tf_text_graph_common import *
ModuleNotFoundError: No module named 'tf_text_graph_common'
I fix this by copying the tf_text_graph_common.py script to the execution folder, as suggested here. This makes the script run and produce a working graph.pbtxt.
Now, this works just fine if I use one of the pre-trained frozen_inference_graph.pb files from tensorflow's model zoo to generate the graph.pbtxt file. However, if I use tensorflow's export script on the corresponding .ckpt files provided in the model zoo to create a frozen_inference_graph.pb, the tf_text_graph_ssd.py script fails with the following error:
Levels: [3-7]
Anchor scale: 4.000000
Scales per octave: 2
Aspect ratios: [1.0, 2.0, 0.5]
Number of classes: 90
Number of layers: 5
box predictor: weight_shared_convolutional
Input image size: 640x640
Traceback (most recent call last):
File "tf_text_graph_ssd.py", line 413, in <module>
createSSDGraph(args.input, args.config, args.output)
File "tf_text_graph_ssd.py", line 235, in createSSDGraph
assert(graph_def.node[0].op == 'Placeholder')
AssertionError
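The assertion that fails checks that the very first node of the frozen graph is a Placeholder (the image input). A small sketch for inspecting what the exported graph actually starts with (TF 1.x API; the path is a placeholder):
import tensorflow as tf

# Point this at the frozen graph produced by the export script.
with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# tf_text_graph_ssd.py asserts graph_def.node[0].op == 'Placeholder';
# printing the first few nodes shows what the graph begins with instead.
for node in graph_def.node[:5]:
    print(node.op, node.name)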
When I search for the error, I find this closed GitHub issue.
I would really like to get this to work, so any help is greatly appreciated.
My installation:
kubuntu 20.04
python 3.6.10
tensorflow 1.15.5
libprotoc 3.6.1
opencv-python 4.5.3.56
tensorflow object detection API commit:
b6bb00b4e0e59dfcd2b4b6d307275d3fca14a933 (latest master Mon Sep 6 10:41:53 2021)

access mounted data in jupyter notebook on amazon ec2

I am trying to run a jupyter notebook using data and notebooks that are mounted on an EBS volume on my ec2 instance. My ec2 instance uses ubuntu. My directory structure looks like the following:
/
---|mountedData
---|localData
I used the instructions provided here to set up the notebook. When I invoke the jupyter notebook command from / or from /localData it is successful. However, I can't navigate to the /mountedData directory (it doesn't even show up in the browser's file navigation screen). If I launch the jupyter notebook from within /mountedData I get an error in the browser:
Server error:
Traceback (most recent call last):
File "/snap/jupyter/6/lib/python3.7/site-packages/tornado/web.py", line 1699, in _execute
result = await result
File "/snap/jupyter/6/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
yielded = next(result)
File "/snap/jupyter/6/lib/python3.7/site-packages/notebook/services/contents/handlers.py", line 112, in get
path=path, type=type, format=format, content=content,
File "/snap/jupyter/6/lib/python3.7/site-packages/notebook/services/contents/filemanager.py", line 431, in get
model = self._dir_model(path, content=content)
File "/snap/jupyter/6/lib/python3.7/site-packages/notebook/services/contents/filemanager.py", line 313, in _dir_model
for name in os.listdir(os_dir):
PermissionError: [Errno 13] Permission denied: '/var/lib/snapd/void'
All directories are owned by root, with root as the group. I even tried chmod 777'ing /mountedData, but that didn't help. I also tried symlinking the mounted data within /localData, as I saw suggested online, but that produces a 404 Not Found error when I click on the symlink. Unfortunately, ditching the mounted data is not an option, as I am working with TBs of data that I need to mount and attach to EC2 instances. Thanks for the help!
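Worth noting: the PermissionError points at /var/lib/snapd/void rather than at /mountedData itself, which suggests the snap-packaged Jupyter is being confined rather than the volume's permissions being wrong. A quick check that the directory is readable outside of Jupyter (a minimal sketch using a plain system Python):
import os

# If this succeeds from a regular shell, the EBS mount and its permissions are
# fine and the denial comes from the confinement of the snap Jupyter process.
print(os.listdir('/mountedData')[:10])
print(os.access('/mountedData', os.R_OK | os.X_OK))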
I managed to solve this problem by uninstalling the version of jupyter I had installed via snap, then reinstalling jupyter via anaconda. So the moral of the story is to use anaconda to install python packages!

Tensorflow detection API: 'SsdFeatureExtractor' object has no attribute 'override_base_feature_extractor_hyperparams'

When I use the ssd_mobilenet_v1_coco_11_06_2017 model to train on my own data set with Google's TensorFlow detection API, the following problem arises.
my os: ubuntu 16.04
./train.sh
Traceback (most recent call last): File "../../train.py", line 167, in
tf.app.run() File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py",
line 126, in run
_sys.exit(main(argv)) File "../../train.py", line 163, in main
worker_job_name, is_chief, FLAGS.train_dir) File "/home/feng/project/models/research/object_detection/trainer.py", line
240, in train
detection_model = create_model_fn() File "/home/feng/project/models/research/object_detection/builders/model_builder.py",
line 98, in build
add_background_class) File "/home/feng/project/models/research/object_detection/builders/model_builder.py",
line 166, in _build_ssd_model
is_training=is_training) File "/home/feng/project/models/research/object_detection/builders/model_builder.py",
line 129, in _build_ssd_feature_extractor
feature_extractor_config.override_base_feature_extractor_hyperparams)
AttributeError: 'SsdFeatureExtractor' object has no attribute 'override_base_feature_extractor_hyperparams'
What kind of problem is this?
I recommend you check out the issue below, reported on the TensorFlow Object Detection API GitHub:
https://github.com/tensorflow/models/issues/4121
I ran into this recently after upgrading my TensorFlow Object Detection API and fixed it by refreshing the protobuf bindings.
This kind of error message can happen when you have the wrong protobuf bindings installed, since the code may reference fields that exist only in newer bindings while stale ones are still being picked up. You can fix this by downloading the newest version of protobuf and compiling new bindings.
To do this, follow the instructions for "Manual protobuf-compiler installation and usage" found here: Section Link. To guard against that link breaking in the future, I will note that they currently instruct you to:
Make tensorflow/models/research the current directory.
Download and install the latest version of protoc (aka protobuf):
wget -O protobuf.zip https://github.com/google/protobuf/releases/download/v3.0.0/protoc-3.0.0-linux-x86_64.zip
unzip protobuf.zip
Run the compilation process with the downloaded version (the old version may still be in your path and you might need it elsewhere):
./bin/protoc object_detection/protos/*.proto --python_out=.
Add the libraries to PYTHONPATH:
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
Test the installation:
python object_detection/builders/model_builder_test.py
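After recompiling, a quick way to confirm the refreshed bindings actually expose the missing field (a hedged check; the ssd_pb2 module and message names assume the standard generated bindings under object_detection/protos):
from object_detection.protos import ssd_pb2

# The AttributeError means the previously generated message lacked this field;
# freshly compiled bindings should list it among the message's fields.
field_names = [f.name for f in ssd_pb2.SsdFeatureExtractor.DESCRIPTOR.fields]
print('override_base_feature_extractor_hyperparams' in field_names)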

GAE "500 Server Error" message when trying to load my application

I'm not sure which log to look at, but in general, all the errors are the same as this paste. I recently upgraded my client machine from Python 2.7.7 to 2.7.8. The app runs locally.
E 22:20:58.694
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 240, in Handle
handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 302, in _LoadHandler
raise err
ImportError: <module 'blog' from '/base/data/home/apps/s~eminent-augury-789/1.380687153152922933/blog.pyc'> has no attribute application
It didn't work under 2.7.7 either; I was getting the same error. I created another project with a different project ID, and it works. I'm guessing that the app.yaml file was corrupt and I just couldn't find the problem. I upgraded App Engine before I started this. I use Vim for editing. I had used the same project ID for different projects, only changing the version number; I'm not sure if that was the problem either. I don't know how to paste in code samples without the formatting getting mangled; let me work on that and I'll paste in the app.yaml file.
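For what it's worth, the ImportError means that the handler named in app.yaml (something like script: blog.application) could not find a module-level attribute called application in blog.py. A minimal sketch of what App Engine expects there, assuming the standard webapp2 setup for the Python 2.7 runtime (the handler itself is hypothetical):
# blog.py -- the name "application" must exist at module level and match the
# "script: blog.application" entry in app.yaml.
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        self.response.write('Hello from blog')

application = webapp2.WSGIApplication([('/', MainPage)], debug=True)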

JPype won't compile properly

So I am having trouble running a very simple Python script that uses JPype.
My code looks like:
from jpype import *
startJVM(getDefaultJVMPath(), "-ea")
java.lang.System.out.println("hello world")
shutdownJVM()
and when I run it I receive an error saying:
Traceback (most recent call last): File "test.py", line 2, in
<module>
startJVM(getDefaultJVMPath(), "-ea") File "/usr/lib/pymodules/python2.7/jpype/_core.py", line 44, in startJVM
_jpype.startup(jvm, tuple(args), True) RuntimeError: Unable to load DLL [/usr/java/jre1.5.0_05/lib/i386/client/libjvm.so], error =
/usr/java/jre1.5.0_05/lib/i386/client/libjvm.so: cannot open shared
object file: No such file or directory at
src/native/common/include/jp_platform_linux.h:45
I'm stuck and I really need help. Thanks!
I had the same problem:
RuntimeError: Unable to load DLL [/usr/java/jre1.5.0_05/lib/i386/client/libjvm.so], error = /usr/java/jre1.5.0_05/lib/i386/client/libjvm.so: cannot open shared object file: No such file or directory at src/native/common/include/jp_platform_linux.h:45
In my case, the wrong JAVA_HOME path was set in /etc/profile:
export JAVA_HOME
JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64
PATH="$JAVA_HOME/bin:$PATH"
export PATH
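After fixing JAVA_HOME, a quick sanity check is to print which JVM library JPype will try to load; the path it reports should be a file that actually exists (a minimal sketch):
import os
import jpype

# getDefaultJVMPath() resolves the libjvm/libjli location from JAVA_HOME and
# standard install locations; False here means startJVM will fail to load it.
jvm_path = jpype.getDefaultJVMPath()
print(jvm_path, os.path.exists(jvm_path))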
The workaround is to pass the full path to the JVM library directly in the call to startJVM:
from jpype import *
startJVM('/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/MacOS/libjli.dylib', "-ea", "-Djava.class.path=/tmp/Jpype/sample")
java.lang.System.out.println("Hello World!!")
shutdownJVM()
Original text:
I had similar issues when trying to run JPype on macOS El Capitan. I could not figure out how to coax the _darwin.py code into finding the correct JVM location, despite the JAVA_HOME environment variable being set properly.
One caveat: running the above code in the Spyder IPython console did not produce any output, but the normal console did.
