Passing a pandas dataframe to FastAPI for NLP ML

I am trying to deploy an NLP ML model for the first time. It was suggested that I use FastAPI and uvicorn. I have had some success in getting FastAPI to respond; however, I have not been able to pass the dataframe and have the endpoint process it. I've tried using dictionaries and have also tried converting the posted JSON to a dataframe.
With data_dict = data.dict() I get:
ValueError: Iterable over raw text documents expected, string object received.
With data_dict = pd.DataFrame(data.dict()) I get:
ValueError: If using all scalar values, you must pass an index
I believe I understand the problem: my Data class is expecting a string, which this is not. However, I have not been able to determine how to set and/or pass the expected data so that fit_transform() will work. Ultimately, a prediction will be returned based on the submitted messages value. Bonus if I can pass a dataframe of one or more rows and have a prediction made and returned for each row. The response will include the id, project, and prediction so that we can later use it to post the prediction back to the original (requesting) system.
test_connection.py
#%%
import requests
import pandas as pd
import json
import os
from pprint import pprint
url = 'http://127.0.0.1:8000/predict'
print(os.getcwd())
#%%
df = pd.DataFrame(
{
'id': ['ab410483801c38', 'cd34148639180'],
'project': ['project1', 'project2'],
'messages': ['This is message 1', 'This is message 2']
}
)
to_predict_dict = df.iloc[0].to_dict()
#%%
r = requests.post(url, json=to_predict_dict)
main.py
#!/usr/bin/env python
# coding: utf-8
import pickle
import pandas as pd
import numpy as np
from pydantic import BaseModel
from sklearn.feature_extraction.text import TfidfVectorizer
# Server
import uvicorn
from fastapi import FastAPI
# Model
import xgboost as xgb
app = FastAPI()
clf = pickle.load(open('data/xgbmodel.pickle', 'rb'))
class Data(BaseModel):
    # id: str
    project: str
    messages: str
@app.get("/ping")
async def test():
    return {"ping": "pong"}
@app.post("/predict")
async def predict(data: Data):
    # data_dict = data.dict()
    data_dict = pd.DataFrame(data.dict())
    tfidf_vect = TfidfVectorizer(stop_words="english", analyzer='word', token_pattern=r'\w{1,}')
    tfidf_vect.fit_transform(data_dict['messages'])
    # to_predict = tfidf_vect.transform(data_dict['messages'])
    # prediction = clf.predict(to_predict)
    return {"response": "Success"}

Probably not the most elegant solution but I've made progress using the following:
def predict(data: Data):
    data_dict = pd.DataFrame(
        {
            'id': [data.id],
            'project': [data.project],
            'messages': [data.messages]
        }
    )
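The list wrapping above is what avoids the "If using all scalar values, you must pass an index" error: pandas cannot infer a row count from a dict of bare scalars. A minimal sketch of the difference, using a plain dict as a stand-in for data.dict() (the payload values are made up for illustration):

```python
import pandas as pd

# Stand-in for data.dict(): every value is a scalar.
payload = {"id": "ab410483801c38", "project": "project1", "messages": "This is message 1"}

try:
    pd.DataFrame(payload)  # no row count can be inferred from scalars alone
    scalar_error = None
except ValueError as exc:
    scalar_error = str(exc)

# Wrapping the dict in a list treats it as one record, i.e. one row.
one_row = pd.DataFrame([payload])

# A column of the frame is iterable, which is what a text vectorizer expects.
documents = one_row["messages"].tolist()
```

Either form (a dict of one-element lists, or a list holding one dict) gives a one-row frame.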

First, encode your DataFrame df as record-oriented JSON:
r = requests.post(url, json=df.to_json(orient='records'))
Then, decode your data inside the /predict/ endpoint with:
df = pd.DataFrame(jsonable_encoder(data))
Remember to import the encoder: from fastapi.encoders import jsonable_encoder.
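One caveat worth checking before relying on this: the json= argument of requests.post serializes whatever it is given, so handing it the string returned by df.to_json() would, as far as I can tell, encode the payload twice, and the server would receive a JSON string rather than a list of records. The sketch below shows the difference with the standard json module and no network call. Passing df.to_dict(orient='records') is an assumed alternative, not something taken from the answer above.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "id": ["ab410483801c38", "cd34148639180"],
    "project": ["project1", "project2"],
    "messages": ["This is message 1", "This is message 2"],
})

# What json=df.to_json(...) would put on the wire: a JSON-encoded *string*.
double_encoded = json.dumps(df.to_json(orient="records"))
decoded_once = json.loads(double_encoded)   # still a str, not a list

# What json=df.to_dict(...) would put on the wire: a JSON array of objects.
records_body = json.dumps(df.to_dict(orient="records"))
records = json.loads(records_body)          # list of dicts, one per row

# A list of records converts straight back into a DataFrame.
restored = pd.DataFrame(records)
```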

A new library called pandera now supports passing DataFrames directly through FastAPI without manual conversion. The docs are a bit basic as of posting this, but may be worth reading: https://pandera.readthedocs.io/en/latest/fastapi.html#fastapi-integration.

I was able to address the issue by simply converting data.messages into a list. I also had to make an unrelated change: I had failed to pickle my vectorizer (string tokenizer).
import pickle
import pandas as pd
import numpy as np
import json
import time
from pydantic import BaseModel
from sklearn.feature_extraction.text import TfidfVectorizer
# Server / endpoint
import uvicorn
from fastapi import FastAPI
# Model
import xgboost as xgb
app = FastAPI(debug=True)
clf = pickle.load(open('data/xgbmodel.pickle', 'rb'))
vect = pickle.load(open('data/tfidfvect.pickle', 'rb'))
class Data(BaseModel):
    id: str = None
    project: str
    messages: str
@app.get("/ping")
async def ping():
    return {"ping": "pong"}
@app.post("/predict/")
def predict(data: Data):
    start = time.time()
    data_l = [data.messages] # make messages iterable.
    to_predict = vect.transform(data_l)
    prediction = clf.predict(to_predict)
    exec_time = round((time.time() - start), 3)
    return {
        "id": data.id,
        "project": data.project,
        "prediction": prediction[0],
        "execution_time": exec_time
    }
if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
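For the bonus case in the question (one or more rows per request), the same idea extends to a list of records. The sketch below keeps the web framework out of the picture and only shows the batching logic. predict_batch, KeywordVectorizer and ThresholdClassifier are hypothetical stand-ins for the endpoint body, the pickled TfidfVectorizer and the pickled XGBoost model; they share only the transform/predict method names with the real objects.

```python
from typing import Dict, List


class KeywordVectorizer:
    """Stand-in for the fitted vectorizer: counts the word 'urgent'."""

    def transform(self, documents: List[str]) -> List[List[int]]:
        return [[doc.lower().split().count("urgent")] for doc in documents]


class ThresholdClassifier:
    """Stand-in for the trained model: predicts 1 if the count is positive."""

    def predict(self, features: List[List[int]]) -> List[int]:
        return [1 if row[0] > 0 else 0 for row in features]


def predict_batch(records: List[Dict[str, str]], vect, clf) -> List[Dict[str, object]]:
    """Return one result per record, echoing id and project back."""
    documents = [rec["messages"] for rec in records]   # iterable of raw texts
    predictions = clf.predict(vect.transform(documents))
    return [
        {"id": rec.get("id"), "project": rec["project"], "prediction": int(pred)}
        for rec, pred in zip(records, predictions)
    ]


records = [
    {"id": "ab410483801c38", "project": "project1", "messages": "This is urgent"},
    {"id": "cd34148639180", "project": "project2", "messages": "This is message 2"},
]
results = predict_batch(records, KeywordVectorizer(), ThresholdClassifier())
```

In FastAPI the request body could then be declared as a list of Data models, and int(pred) turns a NumPy scalar into a plain Python value before it is serialized.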

Related

How to send form data to Flask App API Pandas Dataframe

I am learning APIs, but I have been using Pandas for data analysis for some time. Can I send data to an API from a Pandas dataframe?
For example, I make up some time series data in a Pandas df and attempt to send it with df.to_json(). The ultimate goal here is to make a Flask API that returns the median of the Value column in the Pandas df.
import requests
import pandas as pd
import numpy as np
from numpy.random import randint
np.random.seed(11)
rows,cols = 50000,1
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='T')
df = pd.DataFrame(data, columns=['Value'], index=tidx)
median_val = df.Value.median()
print('[INFO]')
print(median_val)
print('[INFO]')
print(df.head())
json_data = df.to_json()
print('[Sending to API!]')
url = "http://127.0.0.1:5000/api/v1.0/median_val"
print(requests.post(url, json_data).text)
Is it possible (or bad practice) to send a year's worth of time series data to an API to be processed? And how much data can be sent as form data in an HTTP POST request?
Below is something simple in Flask on a local route, which errors out. This is just something I made up on the fly while trying to figure it out.
import numpy as np
import pandas as pd
import time, datetime
from datetime import datetime
import json
from flask import Flask, request, jsonify
#start flask app
app = Flask(__name__)
#Simple flask route to return Value average
@app.route("/api/v1.0/median_val", methods=['POST'])
def med_val():
    r = request.form.to_dict()
    print(r.keys())
    df = pd.json_normalize(r)
    print(df)
    if r.keys() == {'Date','Value'}:
        try:
            df = pd.json_normalize(r)
            df['Date'] = datetime.fromtimestamp(df['Date'].astype(float))
            df = pd.DataFrame(df,index=[0])
            df = df.set_index('Date')
            df['Value'] = df['Value'].astype(float)
            median_val = df.Value.median()
        except Exception as error:
            print("Internal Sever Error {}".format(error))
            error_str = str(error)
            return error_str, 500
        return json.dumps(median_val)
    else:
        print("Error on api route, rejected unable to process keys")
        print("rejected unable to process keys")
        return 'Bad Request', 400
if __name__ == '__main__':
    print("Starting main loop")
    app.run(debug=True,port=5000,host="127.0.0.1")
I don't get why the prints on the Flask side are empty. Any tips are greatly appreciated; I don't have much experience with web server processes or design.
r = request.form.to_dict()
print(r.keys())
df = pd.json_normalize(r)
print(df)
Full trace back on the Flask side.
dict_keys([])
Empty DataFrame
Columns: []
Index: [0]
Error on api route, rejected unable to process keys
rejected unable to process keys
127.0.0.1 - - [10/Feb/2021 07:50:44] "POST /api/v1.0/median_val HTTP/1.1" 400 -

I got the code to work :) Instead of using df.to_json(), I populate an empty Python dictionary, baggage_handler = {}, with the data and send that to the Flask app's API route for processing.
I'm also not sure about best practices for how much data can be sent in an HTTP POST body, but this appears to work on localhost :)
Flask APP:
import numpy as np
import pandas as pd
import time, datetime
from datetime import datetime
import json
from flask import Flask, request, jsonify
#start flask app
app = Flask(__name__)
#Simple flask route to return Value average
@app.route("/api/v1.0/median_val", methods=['POST'])
def med_val():
    r = request.form.to_dict()
    df = pd.json_normalize(r)
    print('incoming keys')
    print(r.keys())
    if r.keys() == {'Value'}:
        print('keys are good')
        try:
            df = pd.json_normalize(r)
            df['Value'] = df['Value'].astype(float)
            median_val = df.Value.median()
            print('median value == ',median_val)
        except Exception as error:
            print("Internal Server Error {}".format(error))
            error_str = str(error)
            return error_str, 500
        return json.dumps(median_val)
    else:
        print("Error on api route, rejected unable to process keys")
        print("rejected unable to process keys")
        return 'Bad Request', 400
if __name__ == '__main__':
    print("Starting main loop")
    app.run(debug=True,port=5000,host="127.0.0.1")
HTTP Request script:
import requests
import pandas as pd
import numpy as np
from numpy.random import randint
np.random.seed(11)
rows,cols = 50000,1
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='T')
df = pd.DataFrame(data, columns=['Value'], index=tidx)
median_val = df.Value.median()
print('[INFO]')
print(median_val)
print('[INFO]')
print(df.head())
#create an empty dictionary
baggage_handler = {}
print('[packaging some data!!]')
values_to_send = df.Value.tolist()
baggage_handler['Value'] = values_to_send
print('[Sending to API!]')
response = requests.post('http://127.0.0.1:5000/api/v1.0/median_val', data=baggage_handler)
print("RESPONSE TXT", response.json())
data = response.json()
print(data)
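One thing may be worth double-checking in the working version. As far as I know, request.form.to_dict() keeps only the first value for each key, so the median computed on the server might be that of a single number rather than of all 50000; comparing it with the locally printed median would confirm this either way. Sending the values as a JSON body and reading them with request.get_json() sidesteps the question. The sketch below leaves Flask out and shows only the payload handling; median_from_body is a hypothetical helper, and the body string is what json= on the client side would produce.

```python
import json
import numpy as np
import pandas as pd

# Client side: same kind of data as in the request script, but smaller.
np.random.seed(11)
df = pd.DataFrame(np.random.rand(1000, 1), columns=["Value"])
local_median = float(df["Value"].median())
body = json.dumps({"Value": df["Value"].tolist()})   # what json=... would send


def median_from_body(raw_body: str) -> float:
    """Parse a JSON body of the form {"Value": [...]} and return the median."""
    payload = json.loads(raw_body)          # in Flask: payload = request.get_json()
    if set(payload) != {"Value"}:
        raise ValueError("expected exactly one key: 'Value'")
    values = pd.Series(payload["Value"], dtype=float)
    return float(values.median())


server_median = median_from_body(body)
```

If the two medians agree, the whole series made it across.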

Sentinel API error on Python: HTTP status 200 OK: API response not valid. JSON decoding failed

I have been trying to follow an easy tutorial on how to get sentinel 2 images for a series of polygons I have. For some reason, no matter what I do I keep running into the same error (detailed above).
from sentinelsat import SentinelAPI, read_geojson, geojson_to_wkt
import geopandas as gpd
import folium
import rasterio as rio
from rasterio.plot import show
from rasterio.mask import mask
import matplotlib.pyplot as plt
from pyproj import Proj, transform
import pandas as pd
import os
from datetime import date
import sentinelhub
user = 'xxxxx'
password = 'xxxxx'
url = 'https://scihub.copernicus.eu/dhus'
api = SentinelAPI(user, password, url)
validation = gpd.read_file('EarthData/tutakoke_permafrost_validation/Tutakoke_permafrost_validation.shp')
plateau_transects = gpd.read_file('EarthData/tutakoke_permafrost_plateau_transects/Tutakoke_Permafrost_Plateau_Transects.shp')
validation = validation.set_crs(epsg=32604, inplace=True, allow_override=True)
validation['imdate']='01-01-2019'
validation['imdate'] = pd.to_datetime(validation['imdate'])
validation['geometry2'] = validation.geometry.buffer(2, cap_style=3)
footprint=validation['geometry2'][1]
products = api.query(footprint,
date = ('20200109', '20200510'),
platformname = 'Sentinel-2',
processinglevel = 'Level-2A',
cloudcoverpercentage = (0, 20))
The error I keep getting is:
SentinelAPIError: HTTP status 200 OK: API response not valid. JSON decoding failed.

Ah - it was that my footprint was not in the correct lat lon format!
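For context: the shapefile in the question is assigned EPSG:32604, a projected UTM zone whose coordinates are in metres, whereas the fix above says the footprint has to be in latitude/longitude. With geopandas, reprojecting with to_crs(epsg=4326) before building the footprint is the usual route. The helper below is only a quick, hypothetical sanity check for catching this kind of mismatch early; looks_like_lonlat, to_wkt_polygon and the sample coordinates are made up for illustration.

```python
from typing import Iterable, List, Tuple


def looks_like_lonlat(coords: Iterable[Tuple[float, float]]) -> bool:
    """True if every (x, y) pair fits inside valid longitude/latitude ranges."""
    return all(-180.0 <= x <= 180.0 and -90.0 <= y <= 90.0 for x, y in coords)


def to_wkt_polygon(coords: List[Tuple[float, float]]) -> str:
    """Build a WKT POLYGON string from (x, y) pairs, closing the ring if needed."""
    ring = list(coords)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


# Projected coordinates (metres) versus geographic coordinates (degrees).
utm_ring = [(600000.0, 6800000.0), (600002.0, 6800000.0), (600002.0, 6800002.0)]
lonlat_ring = [(-165.60, 61.25), (-165.59, 61.25), (-165.59, 61.26)]

utm_ok = looks_like_lonlat(utm_ring)
lonlat_ok = looks_like_lonlat(lonlat_ring)
footprint_wkt = to_wkt_polygon(lonlat_ring)
```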

How to deploy Keras-yolo model to the web with Flask?

I've successfully trained a model on my own dataset using the Keras yolov3 GitHub project,
and I've got good predictions.
I would like to deploy this model on the web using flask to make it work with a stream or with IP cameras.
I have seen many tutorials that explain how to do that, but I have not found what I am looking for.
How can I get started?

You can use flask-restful to design a simple rest API.
You can use opencv VideoCapture to grab the video stream and get frames.
import numpy as np
import cv2
# Open a sample video available in sample-videos
vcap = cv2.VideoCapture('URL')
The client will take an image/ frame, encode it using base64, add other details like height, width, and make a request.
import numpy as np
import base64
import zlib
import requests
import time
t1 = time.time()
for _ in range(1000): # 1000 consecutive requests
    frame = np.random.randint(0,256, (416,416,3), dtype=np.uint8) # dummy rgb image
    # replace frame with your image
    # compress
    data = frame # zlib.compress(frame)
    data = base64.b64encode(data)
    data_send = data
    #data2 = base64.b64decode(data)
    #data2 = zlib.decompress(data2)
    #fdata = np.frombuffer(data2, dtype=np.uint8)
    r = requests.post("http://127.0.0.1:5000/predict", json={'imgb64' : data_send.decode(), 'w': 416, 'h': 416})
    # make a post request
    # print the response here
t2 = time.time()
print(t2-t1)
Your server will load the darknet model, and when it receives a post request it will simply return the model output.
from flask import Flask, request
from flask_restful import Resource, Api, reqparse
import json
import numpy as np
import base64
# compression
import zlib
# load keras model
# load_model('model.h5')
app = Flask(__name__)
api = Api(app)
parser = reqparse.RequestParser()
parser.add_argument('imgb64', location='json', help = 'type error')
parser.add_argument('w', type = int, location='json', help = 'type error')
parser.add_argument('h', type = int, location='json', help = 'type error')
class Predict(Resource):
    def post(self):
        request.get_json(force=True)
        data = parser.parse_args()
        if data['imgb64'] == "":
            return {
                'data':'',
                'message':'No file found',
                'status':'error'
            }
        img = data['imgb64']
        w = data['w']
        h = data['h']
        data2 = img.encode()
        data2 = base64.b64decode(data2)
        #data2 = zlib.decompress(data2)
        fdata = np.frombuffer(data2, dtype=np.uint8).reshape(w, h, -1)
        # do model inference here
        if img:
            return json.dumps({
                'mean': np.mean(fdata),
                'channel': fdata.shape[-1],
                'message':'darknet processed',
                'status':'success'
            })
        return {
            'data':'',
            'message':'Something went wrong',
            'status':'error'
        }
api.add_resource(Predict,'/predict')
if __name__ == '__main__':
    app.run(debug=True, host = '0.0.0.0', port = 5000, threaded=True)
In the # do model inference here part, just use your detect/predict function.
If you want to use native darknet, https://github.com/zabir-nabil/tf-model-server4-yolov3
If you want to use gRPC instead of REST, https://github.com/zabir-nabil/simple-gRPC
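The encode/decode steps above can be checked without a server. The sketch below round-trips a non-square dummy frame through base64 and JSON. It reshapes as (h, w, channels), which is the row-major layout NumPy image arrays normally use, so the reshape(w, h, -1) in the server code would only be equivalent for square frames. The frame size here is an arbitrary choice.

```python
import base64
import json
import numpy as np

# Client side: a dummy RGB frame, 300 rows (height) by 416 columns (width).
frame = np.random.randint(0, 256, (300, 416, 3), dtype=np.uint8)
payload = json.dumps({
    "imgb64": base64.b64encode(frame.tobytes()).decode("ascii"),
    "h": frame.shape[0],
    "w": frame.shape[1],
})

# Server side: rebuild the array from the decoded bytes and the sent shape.
received = json.loads(payload)
raw = base64.b64decode(received["imgb64"])
decoded = np.frombuffer(raw, dtype=np.uint8).reshape(received["h"], received["w"], -1)
```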

Returning matplotlib plots using telegram bot

This code is from here
I have the following code for a Telegram bot which I am building:
import pandas as pd
from pandas import datetime
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime
import requests
from bottle import (
    run, post, response, request as bottle_request
)
BOT_URL = 'https://api.telegram.org/bot128secretns/'
def get_chat_id(data):
    """
    Method to extract chat id from telegram request.
    """
    chat_id = data['message']['chat']['id']
    return chat_id
def get_message(data):
    """
    Method to extract message id from telegram request.
    """
    message_text = data['message']['text']
    return message_text
def send_message(prepared_data):
    """
    Prepared data should be json which includes at least `chat_id` and `text`
    """
    message_url = BOT_URL + 'sendMessage'
    requests.post(message_url, json=prepared_data)
def get_ticker(text):
    stock = f'^GSPC'
    start = datetime.date(2000,1,1)
    end = datetime.date.today()
    data = web.DataReader(stock, 'yahoo',start, end)
    plot = data.plot(y='Open')
    return plot
def prepare_data_for_answer(data):
    answer = get_ticker(get_message(data))
    json_data = {
        "chat_id": get_chat_id(data),
        "text": answer,
    }
    return json_data
@post('/')
def main():
    data = bottle_request.json
    answer_data = prepare_data_for_answer(data)
    send_message(answer_data) # <--- function for sending answer
    return response # status 200 OK by default
if __name__ == '__main__':
    run(host='localhost', port=8080, debug=True)
When I run this code I get the following error:
TypeError: Object of type AxesSubplot is not JSON serializable
What this code is supposed to do is take a ticker symbol from the Telegram app and send its chart back.
I know this happens because JSON cannot represent images.
What can I do to resolve it?

Sorry, I'm a bit late to the party. Here is a possible solution below, though I didn't test it. Hope it works or at least gives you a way to go about solving the issue :)
import datetime
from io import BytesIO
import requests
from pandas_datareader import data as web
from bottle import (
    run, post, response, request as bottle_request
)
BOT_URL = 'https://api.telegram.org/bot128secretns/'
def get_chat_id(data):
    """
    Method to extract chat id from telegram request.
    """
    chat_id = data['message']['chat']['id']
    return chat_id
def get_message(data):
    """
    Method to extract message id from telegram request.
    """
    message_text = data['message']['text']
    return message_text
def send_photo(prepared_data):
    """
    Prepared data should be a dict which includes at least `chat_id` and `plot_file`
    """
    data = {'chat_id': prepared_data['chat_id']}
    files = {'photo': prepared_data['plot_file']}
    # Use data= (form fields) rather than json=: requests ignores json= when files= is given.
    requests.post(BOT_URL + 'sendPhoto', data=data, files=files)
def get_ticker(text):
    stock = f'^GSPC'
    start = datetime.date(2000,1,1)
    end = datetime.date.today()
    data = web.DataReader(stock, 'yahoo',start, end)
    plot = data.plot(y='Open')
    return plot
def prepare_data_for_answer(data):
    plot = get_ticker(get_message(data))
    # Write the plot Figure to a file-like bytes object:
    plot_file = BytesIO()
    fig = plot.get_figure()
    fig.savefig(plot_file, format='png')
    plot_file.seek(0)
    prepared_data = {
        "chat_id": get_chat_id(data),
        "plot_file": plot_file,
    }
    return prepared_data
@post('/')
def main():
    data = bottle_request.json
    answer_data = prepare_data_for_answer(data)
    send_photo(answer_data) # <--- function for sending answer
    return response # status 200 OK by default
if __name__ == '__main__':
    run(host='localhost', port=8080, debug=True)
The idea is not to send a message using the sendMessage Telegram API endpoint, but to send a photo file by using the sendPhoto endpoint. Here, we use savefig call in the prepare_data_for_answer function body to convert AxesSubplot instance, that we get as a return value from the get_ticker function, to a file-like BytesIO object, which we then send as a photo to Telegram using send_photo function (previously named as send_message).

You may use bob-telegram-tools
from bob_telegram_tools.bot import TelegramBot
import matplotlib.pyplot as plt
token = '<your_token>'
user_id = int('<your_chat_id>')
bot = TelegramBot(token, user_id)
plt.plot([1, 2, 3, 4])
plt.ylabel('some numbers')
bot.send_plot(plt)
# This method deletes the generated image
bot.clean_tmp_dir()

You cannot send a matplotlib figure directly. You will need to convert it to bytes and then send it as a multipart message.
data.plot will return a matplotlib.axes.Axes object. You can convert the figure to bytes like this:
from io import BytesIO
img = BytesIO()
plot.get_figure().savefig(img, format='png')
img.seek(0)
yukuku/telebot has some good code on how to send the image as a message.
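The conversion step that these answers rely on can be tried in isolation, without the data reader or the Telegram API. A minimal sketch, assuming matplotlib is installed; axes_to_png_buffer is a hypothetical helper name and the plotted numbers are arbitrary:

```python
from io import BytesIO

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display is needed on a server
import matplotlib.pyplot as plt


def axes_to_png_buffer(ax) -> BytesIO:
    """Render the figure that owns `ax` into an in-memory PNG file object."""
    buffer = BytesIO()
    ax.get_figure().savefig(buffer, format="png")
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer


fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4])
png_buffer = axes_to_png_buffer(ax)
png_header = png_buffer.read(8)   # first bytes of the encoded image
png_buffer.seek(0)                # rewind again before handing it to an uploader
plt.close(fig)
```

The returned buffer is a file-like object, so it can be passed wherever an open binary file would be accepted for upload.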

How to apply the Python function with predictive model into Flask application

I have code that consists of two logical parts: a Flask route that receives a POST JSON request, and a function containing a prediction model. Here is how it looks in code:
from flask import Flask
from flask import request
import io
import json
import pandas as pd
import numpy as np
from fbprophet import Prophet
app = Flask(__name__)
@app.route('/postjson', methods = ['POST'])
def postJsonHandler():
    print (request.is_json)
    content = request.get_json()
    df = pd.io.json.json_normalize(content, ['Days', 'Orders'])
    print (df)
    return 'JSON posted'
app.run(host='0.0.0.0', port= 8090)
And here is the part with the function with model:
def TimeSeries ():
    df['Days'] = pd.to_datetime(df['Days'])
    df = df.rename(columns={'Days': 'ds',
                            'Orders': 'y'})
    my_model = Prophet(interval_width=0.95, yearly_seasonality=False, daily_seasonality=False, weekly_seasonality=True)
    df['y'] = np.log(df['y'])
    my_model.fit(df)
    future_dates = my_model.make_future_dataframe(periods=30)
    forecast = my_model.predict(future_dates)
    yhat=forecast.yhat
    ser=np.exp(yhat)
    df_upd=pd.DataFrame(ser[-30:])
    df_upd.reset_index(drop=True, inplace=True)
    js=df_upd.to_dict(orient='split')
    del js['index']
    res=json.dumps(js)
    return res
My questions are the following:
How can I pass the resulting dataframe df from the first function, postJsonHandler(), as input to the second function, TimeSeries()?
How can I integrate my predictive function TimeSeries() into the Flask app so that everything runs at once: receive a JSON request, transform it into a pandas dataframe, calculate the prediction, and return the result as JSON?
Big thanks!

You combine your functions:
from flask import Flask
from flask import request
import io
import json
import pandas as pd
import numpy as np
from fbprophet import Prophet
app = Flask(__name__)
@app.route('/postjson', methods = ['POST'])
def postJsonHandler():
    print (request.is_json)
    content = request.get_json()
    df = pd.io.json.json_normalize(content, ['Days', 'Orders'])
    df['Days'] = pd.to_datetime(df['Days'])
    df = df.rename(columns={'Days': 'ds',
                            'Orders': 'y'})
    df['y'] = np.log(df['y'])
    # A Prophet object can only be fitted once, so create a new one per request.
    my_model = Prophet(interval_width=0.95, yearly_seasonality=False, daily_seasonality=False, weekly_seasonality=True)
    my_model.fit(df)
    future_dates = my_model.make_future_dataframe(periods=30)
    forecast = my_model.predict(future_dates)
    yhat=forecast.yhat
    ser=np.exp(yhat)
    df_upd=pd.DataFrame(ser[-30:])
    df_upd.reset_index(drop=True, inplace=True)
    js=df_upd.to_dict(orient='split')
    del js['index']
    res=json.dumps(js)
    # Return the JSON string so the client receives the forecast.
    return res
app.run(host='0.0.0.0', port= 8090)
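The first point in the question (getting df from the handler into the model function) can also be solved by passing the data as function arguments instead of merging everything into the route. The sketch below shows that shape with the Prophet call left out. prepare_frame and forecast_to_json are hypothetical helpers, the flat {"Days": [...], "Orders": [...]} payload is an assumed layout that may differ from the real request, and the output format is simplified compared with the to_dict(orient='split') version above.

```python
import json
import numpy as np
import pandas as pd


def prepare_frame(content: dict) -> pd.DataFrame:
    """Turn {"Days": [...], "Orders": [...]} into a frame with ds and y columns."""
    df = pd.DataFrame({"ds": pd.to_datetime(content["Days"]), "y": content["Orders"]})
    df["y"] = np.log(df["y"].astype(float))   # the model is fitted on the log scale
    return df


def forecast_to_json(yhat: pd.Series, periods: int = 30) -> str:
    """Undo the log transform and serialize the last `periods` predictions."""
    values = np.exp(yhat.iloc[-periods:]).tolist()
    return json.dumps({"yhat": values})


content = {"Days": ["2021-01-01", "2021-01-02", "2021-01-03"], "Orders": [10, 20, 40]}
frame = prepare_frame(content)
# Stand-in for forecast.yhat: reuse the log-scale inputs as the "predictions".
result = forecast_to_json(frame["y"], periods=2)
```

In the route, the handler would call prepare_frame(request.get_json()), fit the model on the returned frame, and return forecast_to_json(forecast.yhat).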
