Sending bulk data to Azure ML Endpoint - python

I have an Azure ML endpoint that returns scores when I supply data as JSON:
import requests
import json
# URL for the web service
scoring_uri = 'http://107a119d-9c23-4792-b5db-065e9d3af1e6.eastus.azurecontainer.io/score'
# If the service is authenticated, set the key or token
key = '##########################'
data = {"data":
[{'Associate_Gender': 'Male', 'Associate_Age': 20, 'Education': 'Under Graduate', 'Source_Hiring': 'Internal Movement', 'Count_of_Incoming_Calls_6_month': None, 'Count_of_Incoming_Calls_6_month_bucket': 'Greater than equal to 0 and less than 4', 'Internal_Quality_IQ_Score_Last_6_Months': '93%', 'Internal_Quality_IQ_Score_Last_6_Months_Bucket': 'Greater than 75%', 'Associate_Tenure_Floor_Bucket': 'Greater than 0 and less than 90', 'Current_Call_Repeat_Yes_No': False, 'Historical_CSAT_score': 'Greater than equal to 7 and less than 9', 'Customer_Age': 54, 'Customer_Age_Bucket': 'Greater than equal to 46', 'Network_Region_Originating_Call': 'East London', 'Length_of_Relationship_with_Customer': 266, 'Length_of_Relationship_with_Customer_bucket': 'Greater than 90', 'Call_Reason_Type_L1': 'Voice', 'Call_Reason_Type_L2': 'Prepaid', 'Call_Reason_Type_L3': 'Request for Reversal Provisioning', 'Number_of_VAS_services_active': 6, 'Customer_Category': 'Mercury', 'Customer_monthly_ARPU_GBP_Bucket': 'More than 30', 'Customer_Location': 'Houslow'}]
}
# Convert to JSON string
input_data = json.dumps(data)
# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'
# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.text)
How can I send input data from files in bulk and get the output? Or is it not feasible to send a huge amount of data to an endpoint for scoring?
Any alternative suggestion for scoring on Azure is also welcome.

Let's assume you have a folder called json_data where all your JSON files are stored. You can then open these files and post them to your endpoint as follows:
import requests
import json
import os
import glob
your_uri = 'https://jsonplaceholder.typicode.com/posts'
folder_path = './json_data'
for filename in glob.glob(os.path.join(folder_path, '*.json')):
    with open(filename, 'r') as f:
        json_input_data = json.load(f)
    # json= serializes the dict and sets the Content-Type header
    resp = requests.post(your_uri, json=json_input_data)
    print(resp)
To see the successful HTTP response 201 from jsonplaceholder.typicode.com, create a folder named json_data in the same directory as your Python file, then create a few JSON files inside it and paste some data into them, e.g.:
file1.json:
{
"title": "some title name 1",
"body": "some body content 1"
}
file2.json:
{
"title": "some title name 2",
"body": "some body content 2"
}
etc.
You could easily rewrite it and use your own uri, key, headers, etc.
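If a single file holds many records, splitting them into fixed-size batches keeps each request small. The following is only a minimal sketch, assuming the endpoint accepts the same {"data": [...]} shape shown in the question; the helper name make_payloads and the batch size are illustrative and not part of any Azure API.

```python
import json


def make_payloads(records, batch_size=100):
    """Yield JSON strings, each wrapping at most batch_size records.

    Assumes the scoring script expects {"data": [...]}, as in the question.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield json.dumps({"data": records[start:start + batch_size]})


# Made-up records; real ones would be loaded from your JSON files.
records = [{"Associate_Age": 20 + i} for i in range(5)]
payloads = list(make_payloads(records, batch_size=2))
print(len(payloads))
```

Each payload could then be sent with requests.post(scoring_uri, payload, headers=headers), as in the question.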

To send bulk data for inferencing, I recommend creating a Batch Endpoint in Azure ML. The best way to do that is with the Azure CLI:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-endpoint#create-a-batch-endpoint
You can then start a batch scoring using:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-endpoint#start-a-batch-scoring-job-using-the-azure-cli
Or using REST:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-endpoint#start-a-batch-scoring-job-using-rest

Related

JSON link from google developer tools not working in Python (or in browser)

I am trying to extract the data in the table at https://www.ecoregistry.io/emit-certifications/ra/10
Using the Network tab of the Google developer tools, I was able to find the JSON link where the data for this table is stored: https://api-front.ecoregistry.io/api/project/10/emitcertifications
I am able to manually copy this json data and extract the information using this code I've written:
import json
import pandas as pd
data = '''PASTE JSON DATA HERE'''
info = json.loads(data)
columns = ['# Certificate', 'Carbon offsets destination', 'Final user', 'Taxpayer subject','Date','Tons delivered']
dat = list()
for x in info['emitcertifications']:
    dat.append([x['consecutive'],x['reasonUsingCarbonOffsets'],x['userEnd'],x['passiveSubject'],x['date'],x['quantity']])
df = pd.DataFrame(dat,columns=columns)
df.to_csv('Data.csv')
I want to automate it such that I can extract the data from the json link: https://api-front.ecoregistry.io/api/project/10/emitcertifications directly instead of manually pasting json data in:
data = '''PASTE JSON DATA HERE'''
The link does not work in Python, or even directly in the browser:
import requests
import json
url = ('https://api-front.ecoregistry.io/api/project/10/emitcertifications')
response = requests.get(url)
info = response.json()
print(json.dumps(info, indent=4))
The error output I get is:
{'status': 0, 'codeMessages': [{'codeMessage': 'ERROR_401', 'param': 'invalid', 'message': 'No autorizado'}]}
When I download the data from the developer tools, the dictionary has 'status': 1, followed by all the data.
Edit: I tried adding request headers to the url but it still did not work:
import requests
import json
url = ('https://api-front.ecoregistry.io/api/project/10/emitcertifications')
hdrs = {"accept": "application/json","accept-language": "en-IN,en;q=0.9,hi-IN;q=0.8,hi;q=0.7,en-GB;q=0.6,en-US;q=0.5","authorization": "Bearer null", "content-type": "application/json","if-none-match": "W/\"1326f-t9xxnBEIbEANJdito3ai64aPjqA\"", "lng": "en", "platform": "ecoregistry","sec-ch-ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"", "sec-ch-ua-mobile": "?0", "sec-ch-ua-platform": "\"Windows\"", "sec-fetch-dest": "empty","sec-fetch-mode": "cors", "sec-fetch-site": "same-site" }
response = requests.get(url, headers = hdrs)
print(response)
info = response.json()
print(json.dumps(info, indent=4))
print(response) gives the output '<Response [304]>', while info = response.json() raises the traceback error 'Expecting value: line 1 column 1 (char 0)'.
Can someone please point me in the right direction?
Thanks in advance!
Posting comment as an answer:
The header that API requires in order to return data is platform: ecoregistry.
import requests
resp = requests.get('https://api-front.ecoregistry.io/api/project/10/emitcertifications', headers={'platform': 'ecoregistry'})
data = resp.json()
print(data.keys())
# dict_keys(['status', 'projectSerialYear', 'yearValidation', 'project', 'emitcertifications'])
print(data['emitcertifications'][0].keys())
# dict_keys(['id', 'auth', 'operation', 'typeRemoval', 'consecutive', 'serialInit', 'serialEnd', 'serial', 'passiveSubject', 'passiveSubjectNit', 'isPublicEndUser', 'isAccept', 'isCanceled', 'isCancelProccess', 'isUpdated', 'isKg', 'reasonUsingCarbonOffsetsId', 'reasonUsingCarbonOffsets', 'quantity', 'date', 'nitEnd', 'userEnd'])
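Once the response decodes, the table from the question can be written out without any manual pasting. This is a rough sketch that uses the standard csv module instead of pandas; the column mapping is taken from the question's code, while the function name certifications_to_csv and the sample payload are made up for illustration.

```python
import csv
import io

# Output column name -> key in each 'emitcertifications' entry (from the question).
FIELDS = {
    '# Certificate': 'consecutive',
    'Carbon offsets destination': 'reasonUsingCarbonOffsets',
    'Final user': 'userEnd',
    'Taxpayer subject': 'passiveSubject',
    'Date': 'date',
    'Tons delivered': 'quantity',
}


def certifications_to_csv(payload):
    """Return CSV text built from the decoded API response."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(FIELDS.keys())
    for entry in payload['emitcertifications']:
        writer.writerow([entry[key] for key in FIELDS.values()])
    return buffer.getvalue()


# Made-up sample in the shape shown above; real data would come from the API.
sample = {
    'status': 1,
    'emitcertifications': [{
        'consecutive': 1,
        'reasonUsingCarbonOffsets': 'Voluntary',
        'userEnd': 'ACME',
        'passiveSubject': 'ACME',
        'date': '2022-01-01',
        'quantity': 10,
    }],
}
csv_text = certifications_to_csv(sample)
print(csv_text)
```

With a live response, certifications_to_csv(resp.json()) would produce the text to write to Data.csv.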

Upload csv via API gateway to S3

I am trying to set up an AWS API Gateway that can receive a POST request and upload a CSV file to S3. Ideally, I would like to make some transformations to the file before uploading it to S3 (renaming and formatting some columns to normalize their names across different uploads).
So far, I have set up my API Gateway to receive the request and send it to an AWS Lambda. I use Lambda proxy integration. The triggered lambda is as follows:
import logging
import pandas as pd
import boto3
logger = logging.getLogger()
logger.setLevel(logging.INFO)
s3 = boto3.client("s3")
def handler(event, context):
    logger.info(f"Event: {event}")
    df = pd.read_csv(event['body']['file'])
    logger.info(f"df1: {df}")
    # Provided parameters
    try:
        code = event['body']['code']
    except KeyError:
        logger.info('Code not provided')
        code = 'Code'
    try:
        date = event['body']['date']
    except KeyError:
        logger.info('Date not provided')
        date = 'Date'
    try:
        debit = event['body']['debit']
    except KeyError:
        logger.info('Debit not provided')
        debit = 'Debit'
    try:
        credit = event['body']['credit']
    except KeyError:
        logger.info('Credit not provided')
        credit = 'Credit'
    try:
        id = event['body']['id']
    except KeyError:
        logger.info('Id not provided')
        id = '001'
    df.rename(columns={code: 'Code', date: 'Date', credit: 'Credit', debit: 'Debit'})
    df.to_csv(f's3://bucket/{id}/file.csv', line_terminator='\n', sep = ';', date_format='%Y-%m-%d %H:%M:%S')
    return {
        'statusCode': 200,
        'headers': {
            'Content-Type': 'text/csv',
            'Access-Control-Allow-Origin': '*'
        },
        'body': {
            'uploaded': True
        },
        'isBase64Encoded': False
    }
To test this API, I use the following function:
import requests
csv_file = open("file.csv", 'rb')
headers = {"x-api-key": "xxx", "Content-Type":"text/csv"}
url = "https://xxx.execute-api.xxx.amazonaws.com/xxx"
body = {
"file": csv_file,
"code": "my_code"
}
# files = {
# "file": ("file.csv", open('file.csv', 'r'), 'text/csv')
# }
r = requests.post(url=url, headers=headers, data=body)
print(r.text)
The output is {"message": "Internal server error"}, and if I look in CloudWatch logs, I see that the event is encoded this way:
'body': 'file=%EF%BB%BFCol1%3BCol2%3BCol3%3BCol4%0D%0A&file=11%3B12%3B13%3B14%3B%0D%0A&file=21%3B22%3B23%3B24%3B...'
It looks like the body is encoded and passed row by row into different "file" fields. For a file with about 5000 rows I get the error OSError: [Errno 36] File name too long when trying to read it.
Is there another way to proceed in order to get a full dataset that I can transform into a pandas dataframe?
I have also seen suggestions with multipart/form-data, using files=files in the request or using csv library but I keep getting similar errors.
Thank you
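One possible direction, offered as a sketch and not a verified fix: when requests receives data=body with a file object as a dict value, it form-encodes the file line by line, which matches the repeated "file" fields in the log. Sending the file itself as the body (data=csv_file) makes event['body'] arrive as a single string under Lambda proxy integration, which the handler can parse. The helper name read_csv_body and the sample event are hypothetical; extra values such as code would then have to travel some other way, for example as query string parameters.

```python
import base64
import csv
import io


def read_csv_body(event, delimiter=';'):
    """Parse the raw request body of a Lambda proxy event into rows.

    Assumes the client sent the CSV file itself as the body, not a form.
    """
    body = event.get('body') or ''
    if event.get('isBase64Encoded'):
        raw = base64.b64decode(body)
    else:
        raw = body.encode('utf-8')
    # utf-8-sig drops the byte order mark (%EF%BB%BF in the logged body).
    text = raw.decode('utf-8-sig')
    return list(csv.reader(io.StringIO(text, newline=''), delimiter=delimiter))


# Hypothetical event for local testing.
event = {'body': '\ufeffCol1;Col2\r\n11;12\r\n', 'isBase64Encoded': False}
rows = read_csv_body(event)
print(rows)
```

The same decoded text could be handed to pd.read_csv(io.StringIO(text), sep=';') if a DataFrame is needed for the column renaming.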

How do I make an API call and authenticate it with a given API key using Python?

This is my code to extract player data from an endpoint containing basketball data for a data science project. NOTE: I changed the actual API key I was given, since it is tied to a subscription, and I changed the username and password for privacy purposes. With the correct credentials I don't get a syntax error, but the status code is always 401. Since the API key wasn't being accepted, I also added my account username, password, and the HTTP authentication header, but the status code is still 401.
In case this is relevant, this is the website's recommendation in the developer portal: **The API key can be passed either as a query parameter or using the following HTTP request header.
Ocp-Apim-Subscription-Key: {key}**
Please let me know what changes I can make to my code. Any help is appreciated.
PS: My code got fragmented while posting this, but it is all in one function.
def getData():
    user_name = "name#gmail.com"
    api_endpoint = "https://api.sportsdata.io/v3/nba/stats/json/PlayerGameStatsByDate/2020-FEB7"
    api_key = "a45;lkf"
    password = "ksaljd"
    header = "Ocp-Apim-Subscription-Key"
    PARAMS = {'user': user_name, 'pass': password, 'header': header, 'key': api_key}
    response = requests.get(url = api_endpoint, data = PARAMS)
    print(response.status_code)
    file = open("Data.csv", "w")
    file.write(response.text)
    file.close()
import requests
def _get_auth_headers() -> dict:
    return {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': "`Insert key here`"
    }
api_endpoint = "https://api.sportsdata.io/v3/nba/stats/json/PlayerGameStatsByDate/2020-FEB7"
PARAMS = {
    # Your params here
}
response = requests.get(
    api_endpoint,
    headers=_get_auth_headers(),
    params=PARAMS
)
Instead of just a string, you need to pass a dict in the headers parameter. An auth parameter also exists, so you can use it as follows:
def getData():
    [...]
    header = {
        "Ocp-Apim-Subscription-Key": api_key
    }
    [...]
    response = requests.get(url = api_endpoint, data = PARAMS, headers=header, auth = (user_name, password))
According to the API documentation, you don't need to provide an email and password. You only need to add your API key to the header:
import requests
r = requests.get(url='https://api.sportsdata.io/v3/nba/stats/json/PlayerGameStatsByDate/2020-FEB7', headers={'Ocp-Apim-Subscription-Key': 'API_KEY'})
print(r.json())
Output:
[{
'StatID': 768904,
'TeamID': 25,
'PlayerID': 20000788,
'SeasonType': 1,
'Season': 2020,
'Name': 'Tim Hardaway Jr.',
'Team': 'DAL',
'Position': 'SF',
'Started': 1,
'FanDuelSalary': 7183,
'DraftKingsSalary': 7623,
'FantasyDataSalary': 7623,
...
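The portal text quoted in the question mentions two ways of passing the key. Here is a small sketch of both that only builds the URL and headers without sending anything. The function names are illustrative, and the query parameter name key is an assumption that should be checked against the provider's documentation, since the quoted text does not name it.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.sportsdata.io/v3/nba/stats/json/PlayerGameStatsByDate/2020-FEB7"


def key_in_header(api_key):
    """Return (url, headers) with the key in the documented header."""
    return API_ENDPOINT, {"Ocp-Apim-Subscription-Key": api_key}


def key_in_query(api_key, param_name="key"):
    """Return (url, headers) with the key in the query string.

    param_name is an assumption; the quoted portal text does not name it.
    """
    return f"{API_ENDPOINT}?{urlencode({param_name: api_key})}", {}


url, headers = key_in_query("API_KEY")
print(url)
```

Either pair could then be passed on as requests.get(url, headers=headers).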

Python Post Request - Getting 415 Error When Sending Files via Outlook API

I've been having some trouble sending files via Python's requests module. I can send emails without attachments just fine, but as soon as I try to add a files parameter, the call fails and I get a 415 error.
I've looked through the site and found it might be because I wasn't sending the content type of the files when building that array of data, so I altered the code to look up the content type with mimetypes; still 415.
Following this thread, "python requests file upload", I made a couple more edits, but still 415.
The error message says:
"A supported MIME type could not be found that matches the content type of the response. None of the supported type(s)"
Then lists a bunch of json types e.g: "'application/json;odata.metadata=minimal;odata.streaming=true;IEEE754Compatible=false"
then says:
"matches the content type 'multipart/form-data; boundary=0e5485079df745cf0d07777a88aeb8fd'"
Which of course makes me think I'm still not handling the content type correctly somewhere.
Can anyone see where I'm going wrong in my code?
Thanks!
Here's the function:
def send_email(access_token):
    import requests
    import json
    import pandas as pd
    import mimetypes
    url = "https://outlook.office.com/api/v2.0/me/sendmail"
    headers = {
        'Authorization': 'Bearer '+access_token,
    }
    data = {}
    data['Message'] = {
        'Subject': "Test",
        'Body': {
            'ContentType': 'Text',
            'Content': 'This is a test'
        },
        'ToRecipients': [
            {
                'EmailAddress':{
                    'Address': 'MY TEST EMAIL ADDRESS'
                }
            }
        ]
    }
    data['SaveToSentItems'] = "true"
    json_data = json.dumps(data)
    #need to convert the above json_data to dict, otherwise it won't work
    json_data = json.loads(json_data)
    ###ATTACHMENT WORK
    file_list = ['test_files/test.xlsx', 'test_files/test.docx']
    files = {}
    pos = 1
    for file in file_list:
        x = file.split('/') #separate file name from file path
        files['file'+str(pos)] = ( #give the file a unique name
            x[1], #actual filename
            open(file,'rb'), #open the file
            mimetypes.MimeTypes().guess_type(file)[0] #add in the contents type
        )
        pos += 1 #increase the naming iteration
    #print(files)
    r = requests.post(url, headers=headers, json=json_data, files=files)
    print("")
    print(r)
    print("")
    print(r.text)
I've figured it out! I took a look at the Outlook API documentation and realised I should be adding attachments as a list of base64-encoded entries within the message JSON, not passing them to the requests.post function. Here's my working example:
import requests
import json
import pandas as pd
import mimetypes
import base64
# access_token is assumed to have been obtained beforehand
url = "https://outlook.office.com/api/v2.0/me/sendmail"
headers = {
    'Authorization': 'Bearer '+access_token,
}
Attachments = []
file_list = ['test_files/image.png', 'test_files/test.xlsx']
for file in file_list:
    x = file.split('/') #split the file path so we can get its name
    filename = x[1] #get the filename
    #encode the file into bytes then turn those bytes into a string
    encoded_string = ''
    with open(file, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())
    encoded_string = encoded_string.decode("utf-8")
    #append the file to the attachments list
    Attachments.append({
        "@odata.type": "#Microsoft.OutlookServices.FileAttachment",
        "Name": filename,
        "ContentBytes": encoded_string
    })
data = {}
data['Message'] = {
    'Subject': "Test",
    'Body': {
        'ContentType': 'Text',
        'Content': 'This is a test'
    },
    'ToRecipients': [
        {
            'EmailAddress':{
                'Address': 'EMAIL_ADDRESS'
            }
        }
    ],
    "Attachments": Attachments
}
data['SaveToSentItems'] = "true"
json_data = json.dumps(data)
json_data = json.loads(json_data)
r = requests.post(url, headers=headers, json=json_data)
print(r)
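The per-file encoding step can be pulled into a small helper. This is a sketch only: the name build_attachment is made up, the dictionary uses the "@odata.type" key of the Outlook REST API, and os.path.basename replaces the split on '/' so that nested paths also work. The demo writes a throwaway file so the snippet runs anywhere.

```python
import base64
import os
import tempfile


def build_attachment(path):
    """Return a FileAttachment dict for the file at path."""
    with open(path, "rb") as handle:
        content = base64.b64encode(handle.read()).decode("utf-8")
    return {
        "@odata.type": "#Microsoft.OutlookServices.FileAttachment",
        "Name": os.path.basename(path),
        "ContentBytes": content,
    }


# Demo with a temporary file standing in for a real attachment.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "note.txt")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    attachment = build_attachment(demo_path)
print(attachment["Name"], attachment["ContentBytes"])
```

The list for the message could then be built as [build_attachment(f) for f in file_list].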

python post request json

I need to use Python to do a POST request using JSON format. What I have right now is
url = 'http://mysurl.org'
data = {my data }
headers = {'content-type': 'application/json'}
r = requests.post(url, data=json.dumps(data), headers=headers)
The issue comes when my data is not one line but 500 lines of:
[
{
"Id" : "abc123",
"usr": "u1",
"pwd" : "p1"
},
{
"Id" : "abc124",
"usr": "u2",
"pwd" : "p2"
},
{
"Id" : "abc125",
"usr": "u3",
"pwd" : "p3"
}
.......
]
This really threw me off because the "Id" field comes from a random generator: id = gennum()
usr comes from a query: usr = sqlout[0][0], and pwd comes from pwd = sqlout[0][1].
I really have no idea how to read 500 lines of data into data=....
I tried to use data.append but do not know how to continue after that.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
[update] Sorry that the question was not specific. My data comes from three different places:
(1) the id comes from a random number generator: gennum()
(2) the rest comes from querying my database. The sqlout variable will hold 500 rows of output with
user and pwd, so basically user = sqlout[0][0] and pwd = sqlout[0][1]. They all need to be in the POST request body together, in one request. So when I send the POST request, my request body will contain 500 entries of JSON data like those shown above. Hope this clears the question up a little bit.
Read the content of the file using open and file.read:
import requests
with open('/path/to/json_file') as f:
    data = f.read()
url = 'http://mysurl.org'
headers = {'content-type': 'application/json'}
r = requests.post(url, data=data, headers=headers)
UPDATE after reading comments.
You can make dictionaries from multiple data sources using zip and a list comprehension:
data = [{'id': id, 'usr': usr, 'pwd': pwd} for id,usr,pwd in
        zip(id_data_generator, usr_data_generator, pwd_data_generator)]
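Applied to the names in the question, the idea might look like the sketch below. It assumes every row of sqlout holds exactly two columns (usr, pwd). The gennum body is a hypothetical stand-in for the asker's generator, and the three sample rows stand in for the 500 rows of the real query result.

```python
import json
import uuid


def gennum():
    """Stand-in for the question's random id generator (hypothetical)."""
    return uuid.uuid4().hex[:6]


def build_payload(sqlout):
    """Pair a fresh id with each (usr, pwd) row returned by the query."""
    return [{"Id": gennum(), "usr": usr, "pwd": pwd} for usr, pwd in sqlout]


# Made-up rows standing in for the real query result.
sqlout = [("u1", "p1"), ("u2", "p2"), ("u3", "p3")]
body = json.dumps(build_payload(sqlout))
print(body)
```

The resulting string can be sent in a single request with requests.post(url, data=body, headers=headers).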
