AWS SageMaker PyTorch: no module named 'sagemaker'

I have deployed a PyTorch model on AWS with SageMaker and am trying to send a request to test the service. However, I get a very vague error message saying "No module named 'sagemaker'". I have searched online but cannot find any posts about a similar message.
My client code:
import numpy as np
from sagemaker.pytorch.model import PyTorchPredictor
ENDPOINT = '<endpoint name>'
predictor = PyTorchPredictor(ENDPOINT)
predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())
Detailed error message:
Traceback (most recent call last):
File "client.py", line 7, in <module>
predictor.predict(np.random.random_sample([1, 3, 224, 224]).tobytes())
File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/sagemaker/predictor.py", line 110, in predict
response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 276, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/Users/jiashenc/Env/py3/lib/python3.7/site-packages/botocore/client.py", line 586, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "No module named 'sagemaker'". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/<endpoint name> in account xxxxxxxxxxxxxx for more information.
This bug occurs because I merged the serving script and my deploy script into a single file, see below:
import os
import torch
import numpy as np
from sagemaker.pytorch.model import PyTorchModel
from torch import cuda
from torchvision.models import resnet50

def model_fn(model_dir):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model = resnet50()
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)

def predict_fn(input_data, model):
    device = torch.device('cuda' if cuda.is_available() else 'cpu')
    model.eval()
    with torch.no_grad():
        return model(input_data.to(device))

if __name__ == '__main__':
    pytorch_model = PyTorchModel(model_data='s3://<bucket name>/resnet50/model.tar.gz',
                                 entry_point='serve.py', role='jiashenC-sagemaker',
                                 py_version='py3', framework_version='1.3.1')
    predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)
    print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))
The root cause is the 4th line of that script: it imports sagemaker, which is not available inside the serving container.

(edit 2/9/2020 with extra code snippets)
Your serving code tries to use the sagemaker module internally. The sagemaker module (also called the SageMaker Python SDK, one of several orchestration SDKs for SageMaker) is not designed to be used inside model containers; it is meant to be used outside them, to orchestrate their activity (training, deployment, Bayesian tuning, etc.). In your specific example, you shouldn't include the deployment and model-call code in the server code, because those actions are conducted from outside the server to orchestrate its lifecycle and interact with it. For model deployment with the SageMaker PyTorch container, your entry point script just needs to contain the required model_fn function for model deserialization and, optionally, input_fn, predict_fn and output_fn functions for pre-processing, inference and post-processing respectively (detailed in the documentation). This logic is beautiful :) : you don't need anything else to deploy a production-ready deep learning server! (MMS in the case of PyTorch and MXNet, Flask+Gunicorn in the case of sklearn.)
In summary, this is how your code should be split:
An entry_point script serve.py that contains model serving code and looks like this:
import os
import numpy as np
import torch
from torch import cuda
from torchvision.models import resnet50

def model_fn(model_dir):
    # TODO instantiate a model from its artifact stored in model_dir
    return model

def predict_fn(input_data, model):
    # TODO apply model to the input_data, return result of interest
    return result
And some orchestration code to instantiate a SageMaker Model object, deploy it to a server and query it. This runs from the orchestration runtime of your choice, which could be a SageMaker Notebook, your laptop, an AWS Lambda function, an Apache Airflow operator, etc., and with the SDK of your choice; you don't need to use Python for this.
import numpy as np
from sagemaker.pytorch.model import PyTorchModel
pytorch_model = PyTorchModel(
    model_data='s3://<bucket name>/resnet50/model.tar.gz',
    entry_point='serve.py',
    role='jiashenC-sagemaker',
    py_version='py3',
    framework_version='1.3.1')
predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)
print(predictor.predict(np.random.random_sample([1, 3, 224, 224]).astype(np.float32)))
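A quick way to catch this mistake before deploying is to scan the entry point for module-level imports of orchestration-only packages. The sketch below is only an illustration and not part of the SageMaker SDK: the helper name find_orchestration_imports and the ORCHESTRATION_ONLY set are made up for this example, and it looks at top-level imports only (it parses the source with ast, so nothing in the script is actually imported or executed).

```python
import ast

# Assumption: packages that should live in orchestration code only.
ORCHESTRATION_ONLY = {"sagemaker"}


def find_orchestration_imports(source, banned=ORCHESTRATION_ONLY):
    """Return (line, module) pairs, in source order, for top-level banned imports."""
    hits = []
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        for name in names:
            if name.split(".")[0] in banned:
                hits.append((node.lineno, name))
    return hits


# The import section of the merged script from the question.
merged_script = """import os
import torch
import numpy as np
from sagemaker.pytorch.model import PyTorchModel
"""

print(find_orchestration_imports(merged_script))
# prints [(4, 'sagemaker.pytorch.model')]
```

Run against the split serve.py above, the same check returns an empty list, because the sagemaker import now lives only in the orchestration script.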

Related

FastAPI loading model.pb - SavedModel file does not exist error

I'm trying to load a trained model in FastAPI and ping it from a notebook (to mimic a frontend call), but I keep getting an error saying the model file doesn't exist. I'm very new to this, so any advice is welcome...
Training notebook:
model.save('/data/model')
Downloaded the model and put the whole folder in the FastAPI folder.
File structure in FastAPI:
>> API
    >> _pycache_
    >> model
        >> assets
        >> variables
        keras_metadata.pb
        saved_model.pb
    >> pyapi-env
    api.py
api.py
from fastapi import FastAPI
from tensorflow.keras.models import load_model
...
@app.get("/predict")
def predict(test):
    ...
    model = load_model("./model/saved_model.pb")
    ...
Testing notebook:
import requests
url = "http://localhost:8000/predict"
params = {
"test": "testing",
}
res = requests.get(url, params=params)
res.json()
Error: OSError: SavedModel file does not exist at: ./model/saved_model.pb\{saved_model.pbtxt|saved_model.pb}

I had the same issue and this worked for me:
model = load_model("./model/")
load_model expects the path of the SavedModel directory, so your code was treating "./model/saved_model.pb" as a directory and looking for the model file (saved_model.pb or saved_model.pbtxt) inside it.
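If the path comes from configuration and you are not sure whether it points at the folder or at the .pb file, a small helper can normalize it before calling load_model. This is just a sketch under that assumption; saved_model_dir is a hypothetical name, not a Keras function, and it only uses pathlib.

```python
from pathlib import Path

# File names the SavedModel loader looks for inside the export directory
# (taken from the error message above).
SAVED_MODEL_FILES = {"saved_model.pb", "saved_model.pbtxt"}


def saved_model_dir(path):
    """Return the directory to hand to load_model for a SavedModel export."""
    p = Path(path)
    # If the caller pointed at the protobuf file itself, step up to its folder.
    if p.name in SAVED_MODEL_FILES:
        p = p.parent
    return p


print(saved_model_dir("./model/saved_model.pb").as_posix())  # prints model
print(saved_model_dir("./model/").as_posix())                # prints model
```

In api.py you would then call load_model(str(saved_model_dir("./model/saved_model.pb"))), which resolves to the same folder as load_model("./model/").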

OSError: SavedModel file does not exist at: ../dnn/mpg_model.h5/{saved_model.pbtxt|saved_model.pb}

Code editor: VS Code
Command line: Anaconda Prompt
I followed the tutorial, so why do I get this error?
The first error was ModuleNotFoundError: No module named 'tensorflow', so I created an environment and installed it.
The second error was ModuleNotFoundError: No module named 'flask', so I created an environment and installed it.
I fixed both of those and the imports now work in Python.
How can I solve this?
# T81-558: Applications of Deep Neural Networks
# Module 13: Advanced/Other Topics
# Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
# For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).
# Deploy simple Keras tabular model with Flask only.
from flask import Flask, request, jsonify
import uuid
import os
from tensorflow.keras.models import load_model
import numpy as np
app = Flask(__name__)
# Used for validation
EXPECTED = {
    "cylinders":{"min":3,"max":8},
    "displacement":{"min":68.0,"max":455.0},
    "horsepower":{"min":46.0,"max":230.0},
    "weight":{"min":1613,"max":5140},
    "acceleration":{"min":8.0,"max":24.8},
    "year":{"min":70,"max":82},
    "origin":{"min":1,"max":3}
}
# Load neural network when Flask boots up
model = load_model(os.path.join("../dnn/","mpg_model.h5"))
@app.route('/api/mpg', methods=['POST'])
def calc_mpg():
    content = request.json
    errors = []
    # Check for valid input fields
    for name in content:
        if name in EXPECTED:
            expected_min = EXPECTED[name]['min']
            expected_max = EXPECTED[name]['max']
            value = content[name]
            if value < expected_min or value > expected_max:
                errors.append(f"Out of bounds: {name}, has value of: {value}, but should be between {expected_min} and {expected_max}.")
        else:
            errors.append(f"Unexpected field: {name}.")
    # Check for missing input fields
    for name in EXPECTED:
        if name not in content:
            errors.append(f"Missing value: {name}.")
    if len(errors) <1:
        # Predict
        x = np.zeros( (1,7) )
        x[0,0] = content['cylinders']
        x[0,1] = content['displacement']
        x[0,2] = content['horsepower']
        x[0,3] = content['weight']
        x[0,4] = content['acceleration']
        x[0,5] = content['year']
        x[0,6] = content['origin']
        pred = model.predict(x)
        mpg = float(pred[0])
        response = {"id":str(uuid.uuid4()),"mpg":mpg,"errors":errors}
    else:
        # Return errors
        response = {"id":str(uuid.uuid4()),"errors":errors}
    print(content['displacement'])
    return jsonify(response)
if __name__ == '__main__':
    app.run(host= '0.0.0.0',debug=True)
#conda
(tf-gpu) (HelloWold) C:\Users\ASUS\t81_558_deep_learning\py>python mpg_server_1.py
2020-05-09 17:25:38.498181: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Traceback (most recent call last):
File "mpg_server_1.py", line 26, in <module>
model = load_model(os.path.join("../dnn/","mpg_model.h5"))
File "C:\Users\ASUS\Envs\HelloWold\lib\site-packages\tensorflow\python\keras\saving\save.py", line 189, in load_model
loader_impl.parse_saved_model(filepath)
File "C:\Users\ASUS\Envs\HelloWold\lib\site-packages\tensorflow\python\saved_model\loader_impl.py", line 113, in parse_saved_model
constants.SAVED_MODEL_FILENAME_PB))
OSError: SavedModel file does not exist at: ../dnn/mpg_model.h5/{saved_model.pbtxt|saved_model.pb}
from
https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_13_01_flask.ipynb
https://www.youtube.com/watch?v=H73m9XvKHug&t=1056s

The error occurs because your code is trying to load a model that does not exist. From the Notebook file you linked, you will most likely have to run the following:
from werkzeug.wrappers import Request, Response
from flask import Flask
app = Flask(__name__)
@app.route("/")
def hello():
    return "Hello World!"
if __name__ == '__main__':
    from werkzeug.serving import run_simple
    run_simple('localhost', 9000, app)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics
df = pd.read_csv(
"https://data.heatonresearch.com/data/t81-558/auto-mpg.csv",
na_values=['NA', '?'])
cars = df['name']
# Handle missing value
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())
# Pandas to Numpy
x = df[['cylinders', 'displacement', 'horsepower', 'weight',
'acceleration', 'year', 'origin']].values
y = df['mpg'].values # regression
# Split into validation and training sets
x_train, x_test, y_train, y_test = train_test_split(
x, y, test_size=0.25, random_state=42)
# Build the neural network
model = Sequential()
model.add(Dense(25, input_dim=x.shape[1], activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
monitor = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, mode='auto',
restore_best_weights=True)
model.fit(x_train,y_train,validation_data=(x_test,y_test),callbacks=[monitor],verbose=2,epochs=1000)
pred = model.predict(x_test)
# Measure RMSE error. RMSE is common for regression.
score = np.sqrt(metrics.mean_squared_error(pred,y_test))
print(f"After load score (RMSE): {score}")
model.save(os.path.join("./dnn/","mpg_model.h5"))
This will train and save the model that your code is loading.
It also looks like you have a small typo on the line: model = load_model(os.path.join("../dnn/","mpg_model.h5")) which should be changed to model = load_model(os.path.join("./dnn/","mpg_model.h5"))

I was getting the same error when trying to load a .h5 model on a Raspberry Pi:
OSError: SavedModel file does not exist at: ... {saved_model.pbtxt|saved_model.pb}
sudo apt install python3-h5py
Installing h5py as shown above seems to have solved the issue.
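Since this error message is the same whether the file is missing or h5py is missing, it can help to check both before calling load_model. The snippet below is a rough pre-flight check, not part of Keras; h5_load_problems is a made-up helper name, and it assumes (as the reports in this thread suggest) that those two conditions are the usual culprits for a .h5 path.

```python
import importlib.util
from pathlib import Path


def h5_load_problems(model_path):
    """List likely reasons why loading a .h5 Keras model could fail."""
    problems = []
    path = Path(model_path)
    # A wrong relative path (e.g. ../dnn vs ./dnn) shows up here.
    if not path.is_file():
        problems.append(f"no file at {path.resolve()}")
    # find_spec returns None when the package cannot be imported.
    if importlib.util.find_spec("h5py") is None:
        problems.append("h5py is not installed in this environment")
    return problems


for problem in h5_load_problems("../dnn/mpg_model.h5"):
    print("check:", problem)
```

An empty list does not guarantee the load will succeed, but a non-empty one points at what to fix first.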

If on windows, the path to the model can cause the error.
For a sanity check, try placing the model in the same folder as the file that you are calling. Then fix your path to call the model from the same folder. This fixed my error.
If this works, then you can figure out how to fix the path issue (perhaps try providing an absolute path).
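One way to take the working directory out of the equation is to build the model path from the location of the script itself. A minimal sketch, assuming the model sits next to the server script; path_next_to is a hypothetical helper and the file names are just the ones from the question.

```python
import os
from pathlib import Path


def path_next_to(script_file, *parts):
    """Build an absolute path relative to the folder that contains script_file."""
    return Path(os.path.abspath(script_file)).parent.joinpath(*parts)


# In a real script you would pass __file__; a literal is used here so the
# snippet also runs in a notebook or REPL where __file__ is undefined.
model_path = path_next_to("py/mpg_server_1.py", "mpg_model.h5")
print(model_path)
```

Passing str(model_path) to load_model then works no matter which folder the Anaconda prompt was in when the server was started.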

I got the same error and solved it by running the model again and saving it with a .pb extension instead of the .h5 or .hdf5 extension.
Then load it with tf.keras.models.load_model('D:\\model_name.pb'), using double backslashes in the path.
I had this error on Windows and this solved it.

Why am I facing the Unpickling Error while trying to load my model in Flask?

I have tried pickling and unpickling in JupyterLab and it seems to work as it's supposed to, but when I run my app.py it gives me the following error.
C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Python 3.7\fakenews\venv\lib\site-packages\sklearn\utils\deprecation.py:144: FutureWarning: The sklearn.linear_model.passive_aggressive module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.
warnings.warn(message, FutureWarning)
Traceback (most recent call last):
File "app.py", line 9, in <module>
model = pickle.load(open('model.pkl', 'rb'))
_pickle.UnpicklingError: invalid load key, '\x17'.
Here are my files.
Model.py
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
#Read the data
df=pd.read_csv('C:\\Users\\Hp\\Desktop\\mini project\\news\\news.csv')
#Get shape and head
df.shape
df.head()
#DataFlair - Get the labels
labels=df.label
labels.head()
#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=train_test_split(df['text'], labels, test_size=0.2, random_state=7)
#DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)
#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train)
tfidf_test=tfidf_vectorizer.transform(x_test)
#DataFlair - Initialize a PassiveAggressiveClassifier
pac=PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)
#DataFlair - Predict on the test set and calculate accuracy
y_pred=pac.predict(tfidf_test)
score=accuracy_score(y_test,y_pred)
print(f'Accuracy: {round(score*100,2)}%')
#DataFlair - Build confusion matrix
confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])
App.py
import numpy as np
from flask import Flask, request, jsonify, render_template
import pickle
import pandas as pd
app = Flask(__name__)
model = pickle.load(open('model.pkl', 'rb'))
session.clear()
@app.route('/')
def home():
    return render_template('index.html')
@app.route('/predict',methods=['POST'])
def predict():
    news = request.form["newsT"]
    test1 = pd.Series(news, index=[11000])
    prediction = model.predict(test1)
    return render_template('index.html', prediction_text='Sales should be $ {}'.format(prediction))
if __name__ == "__main__":
    app.run(debug=True)
I used this code to pickle the model:
# Save the model as a pickle in a file
joblib.dump(pac, 'model.pkl')
# Load the model from the file
pac_from_joblib = joblib.load('model.pkl')
# Use the loaded model to make predictions
pac_from_joblib.predict(tfidf_test)
This works fine in the lab but gives an unpickling error when app.py loads the model.
I am fairly new to this field and am unable to figure out what's wrong, even after an extensive search online.

It seems like an encoding issue. It may be because you are using joblib to save the model and then trying to load that same file via the pickle library. Try loading the model with joblib instead:
import joblib
model = joblib.load('model.pkl')
I hope it helps.

Couldn't create model.tar.gz file while training scikit learn model in AWS Sagemaker

I want to create an endpoint for scikit logistic regression in AWS Sagemaker. I have a train.py file which contains training code for scikit sagemaker.
import subprocess as sb
import pandas as pd
import numpy as np
import pickle,json
import sys
def install(package):
    sb.call([sys.executable, "-m", "pip", "install", package])
install('s3fs')
import argparse
import os
if __name__ =='__main__':
    parser = argparse.ArgumentParser()
    # hyperparameters sent by the client are passed as command-line arguments to the script.
    parser.add_argument('--solver', type=str, default='liblinear')
    # Data, model, and output directories
    parser.add_argument('--output_data_dir', type=str, default=os.environ.get('SM_OUTPUT_DIR'))
    parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    args, _ = parser.parse_known_args()
    # ... load from args.train and args.test, train a model, write model to args.model_dir.
    input_files = [ os.path.join(args.train, file) for file in os.listdir(args.train) ]
    if len(input_files) == 0:
        raise ValueError(('There are no files in {}.\n' +
                          'This usually indicates that the channel ({}) was incorrectly specified,\n' +
                          'the data specification in S3 was incorrectly specified or the role specified\n' +
                          'does not have permission to access the data.').format(args.train, "train"))
    raw_data = [ pd.read_csv(file, header=None, engine="python") for file in input_files ]
    df = pd.concat(raw_data)
    y = df.iloc[:,0]
    X = df.iloc[:,1:]
    solver = args.solver
    from sklearn.linear_model import LogisticRegression
    lr = LogisticRegression(solver=solver).fit(X, y)
from sklearn.externals import joblib
def model_fn(model_dir):
    lr = joblib.dump(lr, "model.joblib")
    return lr
In my sagemaker notebook I ran the following code
import os
import boto3
import re
import copy
import time
from time import gmtime, strftime
from sagemaker import get_execution_role
import sagemaker
role = get_execution_role()
region = boto3.Session().region_name
bucket=<bucket> # Replace with your s3 bucket name
prefix = <prefix>
output_path = 's3://{}/{}/{}'.format(bucket, prefix,'output_data_dir')
train_data = 's3://{}/{}/{}'.format(bucket, prefix, 'train')
train_channel = sagemaker.session.s3_input(train_data, content_type='text/csv')
from sagemaker.sklearn.estimator import SKLearn
sklearn = SKLearn(
entry_point='train.py',
train_instance_type="ml.m4.xlarge",
role=role,output_path = output_path,
sagemaker_session=sagemaker.Session(),
hyperparameters={'solver':'liblinear'})
I'm fitting my model here
sklearn.fit({'train': train_channel})
Now, for creating endpoint,
from sagemaker.predictor import csv_serializer
predictor = sklearn.deploy(1, 'ml.m4.xlarge')
While trying to create endpoint, it is throwing
ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Could not find model data at s3://<bucket>/<prefix>/output_data_dir/sagemaker-scikit-learn-x-y-z-000/output/model.tar.gz.
I checked my S3 bucket. Inside my output_data_dir there is a sagemaker-scikit-learn-x-y-z-000 directory which contains a debug-output\training_job_end.ts file. An additional directory named sagemaker-scikit-learn-x-y-z-000 was created outside my <prefix> folder, and it contains a source\sourcedir.tar.gz file. Usually, when I train models with SageMaker built-in algorithms, files like output_data_dir\sagemaker-scikit-learn-x-y-z-000\output\model.tar.gz get created. Can someone please tell me where my scikit model was stored, how to push source\sourcedir.tar.gz inside my prefix without doing it manually, and how to see the contents of sourcedir.tar.gz?
Edit: I elaborated the question regarding prefix. Whenever I run sklearn.fit(), two files with same name sagemaker-scikit-learn-x-y-z-000 are getting created in my S3 bucket. One created inside my <bucket>/<prefix>/output_data_dir/sagemaker-scikit-learn-x-y-z-000/debug-output/training_job_end.ts and other file is created in <bucket>/sagemaker-scikit-learn-x-y-z-000/source/sourcedir.tar.gz. Why is the second file not created inside my <prefix> like the first one? What is contained in sourcedir.tar.gz file?

I am not sure your model is actually being stored if you can't find it in S3. You define a function that calls joblib.dump in your entry point script, whereas I make that call at the end of the main block. For example:
# persist model
path = os.path.join(args.model_dir, "model.joblib")
joblib.dump(myestimator, path)
print('model persisted at ' + path)
Then the file can be found in ..\output\model.tar.gz just as in your other cases. To double-check that it is created, you may want to add a print statement whose output will show up in the training job log.

You must dump the model as the last step of your training code. Currently you are doing it in the wrong place: the purpose of model_fn is to load the model for inference, not to save it during training.
Add the dump after training, writing to a file inside the model directory:
lr = LogisticRegression(solver=solver).fit(X, y)
joblib.dump(lr, os.path.join(args.model_dir, "model.joblib"))
Change model_fn() to load the model instead of dumping it.
See more here.

This post here explains it well:
https://towardsdatascience.com/deploying-a-pre-trained-sklearn-model-on-amazon-sagemaker-826a2b5ac0b6
In short, the tar.gz is created by archiving the model.joblib binary, which was first created by joblib.dump. To quote the article:
#Build tar file with model data + inference code
bashCommand = "tar -cvpzf model.tar.gz model.joblib inference.py"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
The inference.py is probably optional.
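If you would rather not shell out to tar, the same archive can be built with Python's standard tarfile module. This is a sketch of the packaging step only, not the article's code: build_model_archive is a hypothetical helper, and the model file here is a placeholder written to a temporary directory so the snippet runs on its own.

```python
import tarfile
import tempfile
from pathlib import Path


def build_model_archive(archive_path, files):
    """Write a gzip-compressed tar with each file stored at the archive root."""
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for f in files:
            # arcname drops the local folder so the file sits at the top level.
            tar.add(str(f), arcname=Path(f).name)
    # Re-open the archive to report what it actually contains.
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.joblib"
    model_file.write_bytes(b"placeholder bytes, not a real model")
    names = build_model_archive(Path(tmp) / "model.tar.gz", [model_file])
    print(names)  # prints ['model.joblib']
```

The read-back step at the end of the helper is also a simple way to inspect an existing archive such as sourcedir.tar.gz after downloading it.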

Using Rasa NLU model with python API instead of HTTP server

Is there a way to use https://nlu.rasa.com model without the HTTP server ? I want to use it as a python library/module.

Yes, and this is documented in their docs at nlu.rasa.com.
As of version 0.12.3:
Training
from rasa_nlu.training_data import load_data
from rasa_nlu.config import RasaNLUModelConfig
from rasa_nlu.model import Trainer
from rasa_nlu import config
training_data = load_data('data/examples/rasa/demo-rasa.json')
trainer = Trainer(config.load("sample_configs/config_spacy.yml"))
trainer.train(training_data)
model_directory = trainer.persist('./projects/default/') # Returns the directory the model is stored in
Parsing
from rasa_nlu.model import Metadata, Interpreter
# where `model_directory` points to the folder the model is persisted in
interpreter = Interpreter.load(model_directory)
interpreter.parse(u"The text I want to understand")
