AWS lambda response ERROR: string indices must be integers (Textract) - python

I'm new to this and have spent three weeks on this error. I'm trying to get the response from a Lambda function with the text extracted from an image in an S3 bucket. I successfully upload the image into the bucket from my Expo project using this code:
try {
  const response = await fetch(data.uri);
  const blob = await response.blob();
  const assetType = 'Mobile Device';
  await Storage.put(`${username}/${assetType}/${UUID}`, blob, {
    contentType: 'image/jpeg',
    progressCallback,
  });
} catch (err) {
  console.log("Error uploading file:", err);
}
This upload should invoke the Lambda function that extracts the needed text:
import json
import boto3
from urllib.parse import unquote_plus
import re

def lambda_handler(event, context):
    bad_chars = [';', ':', '!', "*", ']', "[", '"', "{", "}", "'", ","]
    file_obj = event["Records"][0]
    bucketname = str(file_obj["s3"]["bucket"]["name"])
    filename = unquote_plus(str(file_obj["s3"]["object"]["key"], encoding='utf-8'))
    textract = boto3.client('textract', region_name='ap-south-1')
    print(f"Bucket: {bucketname} ::: Key: {filename}")
    response = textract.detect_document_text(
        Document={
            'S3Object': {
                "Bucket": bucketname,
                "Name": filename,
            }
        }
    )
    result = []
    processedResult = ""
    for item in response["Blocks"]:
        if item["BlockType"] == "WORD":
            result.append(item["Text"])
            element = item["Text"] + " "
            processedResult += element
    print(processedResult)
    res = []
    imei = ""
    # Extracting the IMEI number and removing the bad characters
    str_IMEI = str(re.findall('(?i)imei*?[1]*?[:]\s*\d{15}', processedResult))
    imei = str(re.findall('[^\]]+', str_IMEI))
    imei = str(imei.split(' ')[-1])
    for i in bad_chars:
        imei = imei.replace(i, '')
    if len(imei) > 0:
        res.append(str(imei))
    else:
        res.append('')
    # Extracting the IMEI2 number and removing the bad characters
    str_IMEI2 = str(re.findall('(?i)imei*?[2]*?[:]\s*\d{15}', processedResult))
    imei2 = str(re.findall('[^\]]+', str_IMEI2))
    imei2 = str(imei2.split(' ')[-1])
    for i in bad_chars:
        imei2 = imei2.replace(i, '')
    if len(imei2) > 0:
        res.append(str(imei2))
    else:
        res.append('')
    # Extracting the model number and removing the bad characters
    str_Model = str(re.findall('(?i)model*?[:]\s*[\w|\d|_,-|A-Za-z]+\s*[\w|\d|_,-|A-Za-z]+', processedResult))
    result = str(str_Model.split(' ')[-1])
    for bad_char in bad_chars:
        result = result.replace(bad_char, '')
    if len(result) > 0:
        res.append(result)
    else:
        res.append('')
    print(res)
    return {
        'statusCode': 200,
        # 'body': JSON.stringify(result)
        'body': res
    }
and then use a REST API to get the response from the Lambda:
try {
  const targetImage = UUID + '.jpg';
  const response = await fetch(
    'https://6mpfri01z0.execute-api.eu-west-2.amazonaws.com/dev',
    {
      method: 'POST',
      headers: {
        Accept: "application/json",
        "Content-Type": "application/json"
      },
      body: JSON.stringify(targetImage)
    }
  );
  const OCRBody = await response.json();
  console.log('OCRBody', OCRBody);
} catch (err) {
  console.log("Error extracting details:", err);
}
but I keep getting this error:
OCRBody Object {
  "errorMessage": "string indices must be integers",
  "errorType": "TypeError",
  "requestId": "8042d6f7-44de-4a54-8209-fe93ca8570ec",
  "stackTrace": Array [
    "  File \"/var/task/lambda_function.py\", line 10, in lambda_handler
    file_obj = event[\"Records\"][0]
    ",
  ],
}
PLEASE HELP.

As the error says, you are trying to index a string with some kind of key as if it were a dict:
>>> str_name = 'John Doe'
>>> str_name['name']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers

Normally, this issue is caused by not receiving the data you expect in the event.
In all of my Lambda functions, I use something like this:
bucket = event['Records'][0]['s3']['bucket']['name']
The event is usually a valid dict in Python, so you have two options here:
Option 1
Before proceeding, do something like:
if 'Records' in event:
    if 's3' in event['Records'][0]:
        if 'bucket' in event['Records'][0]['s3']:
            if 'name' in event['Records'][0]['s3']['bucket']:
                ...
Only if all conditions are true should you proceed.
Option 2
Do a simple
print(event)
and check the event when your Lambda fails.
The issue here could be that the Lambda is triggered before the S3 object is available.
What you can do is add an S3 trigger, which will automatically invoke the Lambda as soon as the S3 object has been uploaded. See here:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html
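To illustrate the S3-trigger route, here is a minimal sketch of a handler that validates the event shape before calling Textract (the guard and early return are additions for illustration, not part of the original code):

from urllib.parse import unquote_plus

def lambda_handler(event, context):
    # An S3 trigger delivers a dict with a "Records" list; anything else
    # (e.g. an API Gateway request body) will not have this shape.
    records = event.get("Records") if isinstance(event, dict) else None
    if not records:
        print(f"Unexpected event: {event}")
        return {'statusCode': 400, 'body': 'Not an S3 event'}
    s3_info = records[0]["s3"]
    bucketname = s3_info["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 events (spaces become '+'), so decode them.
    filename = unquote_plus(s3_info["object"]["key"])
    print(f"Bucket: {bucketname} ::: Key: {filename}")
    # ... call textract.detect_document_text(...) as in the question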

Related

python unable to retrieve InstanceID for loop iteration

I am running a Python script that reads 'data.json' and provides the InstanceIds for the PrivateIPs. Here's the JSON file (data.json), which has two PrivateIPs, each with its own AWS account:
{
  "ServerIPList": [
    {
      "PrivateIP": "17.11.11.11",
      "HostName": "ip-17-11-11-11.ec2.internal",
      "Region": "us-east-1",
      "AccountID": "123456789123"
    },
    {
      "PrivateIP": "18.22.22.22",
      "HostName": "ip-18-22-22-22.ec2.internal",
      "Region": "us-east-1",
      "AccountID": "567891234567"
    }
  ]
}
Now, when I run the script with a for loop, it is supposed to iterate: describe the first PrivateIP, save the output (InstanceId) to the empty list, then describe the next PrivateIP. However, it successfully retrieves the InstanceId of the first PrivateIP but throws an error while describing the second one. Any thoughts on where my logic is off? Here's my code:
import json, boto3

EC2_list = []
PrivateIP = []
AccountID = []

with open('data.json') as f:
    input_data = json.load(f)

for item in input_data['ServerIPList']:
    PrivateIP.append(item['PrivateIP'])
    AccountID.append(item['AccountID'])

if "," in instanceIds:
    instanceIds_list = instanceIds.split(",")
else:
    instanceIds_list = instanceIds
if "," in AccountID:
    AccountID_list = AccountID.split(",")
else:
    AccountID_list = AccountID

for i in PrivateIP_list:
    for account in range(len(AccountID)):
        RoleArn = f"arn:aws:iam::{AccountID[account]}:role/Cloudbees"
        print(RoleArn)
        response = sts.assume_role(RoleArn=RoleArn, RoleSessionName="learnaws-test-session")
        ec2_client = boto3.client('ec2', region_name=region_Name,
                                  aws_access_key_id=response['Credentials']['AccessKeyId'],
                                  aws_secret_access_key=response['Credentials']['SecretAccessKey'],
                                  aws_session_token=response['Credentials']['SessionToken'])
        ec2 = boto3.resource('ec2', region_name=region_Name,
                             aws_access_key_id=response['Credentials']['AccessKeyId'],
                             aws_secret_access_key=response['Credentials']['SecretAccessKey'],
                             aws_session_token=response['Credentials']['SessionToken'])
        EC2Response = ec2_client.describe_instances(Filters=[
            {
                'Name': 'private-ip-address',
                'Values': [i],
            }
        ])
        print("This is the actual response :")
        # print(EC2Response)
        for instance in EC2Response['Reservations'][0]['Instances']:
            instanceIds = instance['InstanceId']
            print(instanceIds)
            EC2_list.append(instanceIds)
The output is:
Traceback (most recent call last):
  File "ec2InstanceState.py", line 260, in <module>
    for instance in EC2Response['Reservations'][0]['Instances']:
IndexError: list index out of range
arn:aws:iam::123456789123:role/Cloudbees
This is the actual response :
"i-xxxxxxxxxxxxx"
arn:aws:iam::567891234567:role/Cloudbees
This is the actual response : <BLANK> NO INSTANCEID -------????
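For reference, describe_instances returns an empty 'Reservations' list when no instance in the queried account matches the private-ip-address filter, and indexing [0] on that empty list is exactly what raises the IndexError on the second account. A minimal guard (a sketch, not part of the original script) would be:

EC2Response = ec2_client.describe_instances(Filters=[
    {'Name': 'private-ip-address', 'Values': [i]}
])
# Iterate over whatever reservations exist instead of indexing [0] blindly.
for reservation in EC2Response.get('Reservations', []):
    for instance in reservation['Instances']:
        print(instance['InstanceId'])
        EC2_list.append(instance['InstanceId'])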

Extracting specific JSON values in python

JSON returned from the Spotify API. Example:
{
  "tracks": {
    "href": "https://api.spotify.com/v1/search?query=Stero+Hearts&type=track&offset=0&limit=1",
    "items": [
      {
        "album": {
          "album_type": "album",
          "artists": [
            {
              "external_urls": {
                "spotify": "https://open.spotify.com/artist/4IJczjB0fJ04gs4uvP0Fli"
              },
              "href": "https://api.spotify.com/v1/artists/4IJczjB0fJ04gs4uvP0Fli",
              "id": "4IJczjB0fJ04gs4uvP0Fli",
              "name": "Gym Class Heroes",
              "type": "artist",
              "uri": "spotify:artist:4IJczjB0fJ04gs4uvP0Fli"
            }
          ]
        }
      }
    ]
  }
}
Broken Code
import requests, json

spotifytrack = input("Name of Song?\n")
link = "https://api.spotify.com/v1/search?q=" + spotifytrack + "&type=track&limit=1"
token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
header = {
    "Authorization": "Bearer {}".format(token),
    "Content-Type": "application/json",
    "Accept": "application/json",
}
auth_response = requests.get(link, headers=header)
pretty_response = json.dumps(auth_response.json(), indent=4)
data_by_user = {}
for d in auth_response:
    data_by_user[d["artist"]] = d
print(data_by_user["uri"])
"""
def find_track_from_json(auth_response, artist):
    return [p for p in auth_response if p["artist"] == artist][0]["uri"]

urii = find_track_from_json(auth_response, "uri")
print(urii)

x = load.json(auth_response.json())
print("Here is the data which we have imported\n")
print(pretty_response)
print(x["name"])
print(x["uri"])
print(x["spotify"])
"""
Errors noticed:
File "spotify.py", line 19, in <module>
data_by_user[d["artist"]] = d
TypeError: byte indices must be integers or slices, not str
The aim is to convert a song search into a link in a CLI application.
I tried load.json, which I saw on some website, and also tried a function (def) approach.
I expected the program to find the artist name and URI from the JSON and print them in the CLI interface.
You are iterating over the raw response object, not over parsed JSON:
auth_response = requests.get(link, headers=header)
for d in auth_response:
Python is complaining that you aren't providing a numerical index, which is correct: iterating the response yields raw bytes, not dicts!
You should decode the JSON first, and then you can iterate over it:
auth_response = requests.get(link, headers=header)
decoded_auth_response = json.loads(auth_response.text)
data_by_user = {}
for d in decoded_auth_response:
    data_by_user[d["artist"]] = d
As you haven't provided the full JSON output from the API call, I'm not sure what data is actually in decoded_auth_response, and you haven't described what your expected output would look like, so you may need to do some more work to find the correct data in each iteration.
The result from requests.get() is a requests.Response object. As far as I can see you want to iterate over the response body which is JSON. The requests.Response object has a .json() method which returns a dict from the response JSON.
Looking at the response you would probably want to iterate over resp_json['tracks']['items'] which is a list.
So to summarize, your code should look something like this:
auth_response = requests.get(link, headers=header)
items = auth_response.json()['tracks']['items']
for d in items:
    print(d)
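Following the structure of the sample JSON in the question, pulling the artist name and URI out of each item could look like this (a sketch against the payload shown above; real search results may carry more fields):

for d in items:
    for artist in d["album"]["artists"]:
        print(artist["name"])                      # e.g. "Gym Class Heroes"
        print(artist["uri"])                       # e.g. "spotify:artist:4IJczjB0fJ04gs4uvP0Fli"
        print(artist["external_urls"]["spotify"])  # shareable web link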

Flutter Post Request to a Flask API

I want to send a POST request from my app, coded in Flutter, containing an image converted to Base64. Here is the code I am using:
Future<List<Result>> postJSON(String imageP, String iP, String port) async {
  final String jsonEndpoint = "http://$iP:$port/todo/api/v1.0/tasks/mdrv";
  final response = await http.post('$jsonEndpoint', body: {
    "id": "3",
    "title_image": "Test",
    "b64Image": "$imageP",
    "done": "false",
  });
  if (response.statusCode == 200) {
    List results = jsonDecode(response.body);
    return results
        .map((result) => new Result.fromJson(result))
        .toList();
  } else {
    throw Exception('Erreur dans le chargement, veuillez réessayer');
  }
}
But when I make the request, I get the following TypeError in my Flask API:
description = JSON["b64Image"]
TypeError: 'NoneType' object is not subscriptable
I am using the following Python code:
def send_client():
    Lresult_algo = []
    JSON = request.get_json()
    id = JSON['id']
    description = JSON['b64Image']
    server = Server(description)
    description1 = server.B64_array(description)
    description2 = Image.fromarray(description1)
    description2.save(r"C:\Users\vince\Desktop\test2.png")
    queryPath = r"C:\Users\vince\Desktop\test2.png"
    Lresult_algo = server.send(queryPath)
    maskedBodies_b64 = []
    for matrice in Lresult_algo:
        matrice1 = matrice.astype('uint8')
        maskedBodies_b64.append(base64.b64encode(cv2.imencode('.jpg', matrice1)[1]))
    maskedBodies_b64 = [str(b64) for b64 in maskedBodies_b64]
    data = {
        'Image_1': maskedBodies_b64[0],
        'Image_2': maskedBodies_b64[1],
        'Image_3': maskedBodies_b64[2],
        'Image_4': maskedBodies_b64[3],
        'Image_5': maskedBodies_b64[4]
    }
    resp = json.dumps(data)
    return resp
Do you think this is a typing problem? How could I fix it?
I changed my code like this, but I still get the same error:
Future<List<Result>> postJSON(String imageP, String iP, String port) async {
  final String jsonEndpoint = 'http://$iP:$port/api/v1.0/tasks/mdrv';
  Map<String, dynamic> data = {
    'id': 1,
    'title_image': "Test",
    'b64Image': "$imageP",
    'done': false,
  };
  var client = new http.Client();
  var body = jsonEncode(data);
  var response = await client.post('$jsonEndpoint',
      headers: {"Content-Type": "application/json"}, body: body);
  if (response.statusCode == 200) {
    List results = jsonDecode(response.body);
    return results
        .map((result) => new Result.fromJson(result))
        .toList();
  } else {
    throw Exception('Erreur dans le chargement, veuillez réessayer');
  }
}
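For what it's worth, request.get_json() returns None whenever Flask does not see a body it can parse as JSON, for example when the Content-Type header is not application/json (the first Flutter version sends form-encoded data). A small server-side guard (a sketch assuming the usual app = Flask(__name__) setup, not the original handler) makes the failure visible instead of crashing:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/todo/api/v1.0/tasks/mdrv', methods=['POST'])
def send_client():
    # silent=True returns None instead of raising on a missing/malformed body.
    payload = request.get_json(silent=True)
    if payload is None or 'b64Image' not in payload:
        return jsonify(error="Expected a JSON body with a 'b64Image' field"), 400
    description = payload['b64Image']
    # ... continue processing as in the question
    return jsonify(ok=True)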

Getting a specific value of JSON data

I'm getting JSON data from a RESTCONF HTTPS request, using the following code:
https_request = 'https://' + host + '/restconf/data/' + operation
headers = {'Content-type': 'application/yang-data+json', 'Accept': 'application/yang-data+json'}
r = requests.get(https_request, auth=(user, password), headers=headers, verify=False)
print(r.json())
The data I got is the following:
{
  "Cisco-IOS-XE-segment-routing:ipv4": {
    "prefixes": [
      {
        "ipprefix": "1.1.1.1/32",
        "index": {
          "range-start": 333,
          "range": 1
        }
      }
    ]
  }
}
Basically, I want to return the value of the "range-start" field, which is 333. I tried the following, but it did not work:
for element in r:
    id = element['range-start']
    print(id)
Is there any way to get that value?
From the Python console:
>>> import json
... data = json.loads('{"Cisco-IOS-XE-segment-routing:ipv4": {"prefixes": [{"ipprefix": "1.1.1.1/32", "index": {"range-start": 333, "range": 1}}]}}')
... print(data['Cisco-IOS-XE-segment-routing:ipv4']['prefixes'][0]['index']['range-start'])
333
>>>
You need to start at the beginning of the JSON and work your way to the key you want. To do this you need to start at Cisco-IOS-XE-segment-routing:ipv4.
prefixes = r.json()["Cisco-IOS-XE-segment-routing:ipv4"]["prefixes"]
id = prefixes[0]["index"]["range-start"]
If there are multiple prefixes, you can loop over them and access each range-start, as shown below.
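For example, a minimal sketch of that loop over the response shown above:

for prefix in r.json()["Cisco-IOS-XE-segment-routing:ipv4"]["prefixes"]:
    print(prefix["index"]["range-start"])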
Since you are looping over elements, I would suggest this approach using a helper function:
def get_id(data):
    prefixes = data["Cisco-IOS-XE-segment-routing:ipv4"]["prefixes"]
    return prefixes[0]["index"]["range-start"]
Then you can do:
id = get_id(r.json())
print(id)

Simplest lambda function to copy a file from one s3 bucket to another

I'm a total noob to working with AWS. I am trying to get a pretty simple and basic operation to work: upon a file being uploaded to one S3 bucket, I want that upload to trigger a Lambda function that copies the file to another bucket.
I went to the AWS Management Console and created an S3 bucket in the us-west-2 region called "test-bucket-3x1" to use as my "source" bucket, and another called "test-bucket-3x2" as my "destination" bucket. I did not change or modify any settings when creating these buckets.
In the Lambda console, I created an S3 trigger for "test-bucket-3x1", changed the event type to "ObjectCreatedByPut", and didn't change any other settings.
This is my actual lambda_function code:
import boto3
import json

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    bucket = s3.Bucket('test-bucket-3x1')
    dest_bucket = s3.Bucket('test-bucket-3x2')
    print(bucket)
    print(dest_bucket)
    for obj in bucket.objects():
        dest_key = obj.key
        print(dest_key)
        s3.Object(dest_bucket.name, dest_key).copy_from(CopySource={'Bucket': obj.bucket_name, 'Key': obj.key})
When I test this function with the basic "HelloWorld" test available from the AWS Lambda console, I receive this:
{
  "errorMessage": "'s3.Bucket.objectsCollectionManager' object is not callable",
  "errorType": "TypeError",
  "stackTrace": [
    [
      "/var/task/lambda_function.py",
      12,
      "lambda_handler",
      "for obj in bucket.objects():"
    ]
  ]
}
What changes do I need to make to my code so that, when a file is uploaded to test-bucket-3x1, the Lambda function is triggered and the file is copied to test-bucket-3x2?
Thanks for your time.
I would start with the blueprint s3-get-object.
For more information about creating a Lambda from a blueprint, use this page:
This is the code of the blueprint above:
console.log('Loading function');
const aws = require('aws-sdk');
const s3 = new aws.S3({ apiVersion: '2006-03-01' });

exports.handler = async (event, context) => {
  //console.log('Received event:', JSON.stringify(event, null, 2));

  // Get the object from the event and show its content type
  const bucket = event.Records[0].s3.bucket.name;
  const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
  const params = {
    Bucket: bucket,
    Key: key,
  };
  try {
    const { ContentType } = await s3.getObject(params).promise();
    console.log('CONTENT TYPE:', ContentType);
    return ContentType;
  } catch (err) {
    console.log(err);
    const message = `Error getting object ${key} from bucket ${bucket}. Make sure they exist and your bucket is in the same region as this function.`;
    console.log(message);
    throw new Error(message);
  }
};
You will then need to update the code above to not only get the object info but also to copy the object and delete the source; for that you can refer to this answer:
const moveAndDeleteFile = async (file, inputfolder, targetfolder) => {
  const s3 = new AWS.S3();
  const copyparams = {
    Bucket: bucketname,
    CopySource: bucketname + "/" + inputfolder + "/" + file,
    Key: targetfolder + "/" + file
  };
  await s3.copyObject(copyparams).promise();
  const deleteparams = {
    Bucket: bucketname,
    Key: inputfolder + "/" + file
  };
  await s3.deleteObject(deleteparams).promise();
  ....
}
Source: How to copy the object from s3 to s3 using node.js
for object in source_bucket.objects.all():
    print(object)
    sourceObject = {'Bucket': 'bucketName', 'Key': object.key}
    destination_bucket.copy(sourceObject, object.key)
You should really use the event passed to the lambda_handler() method to get the file [path|prefix|uri] and deal only with that file, since your Lambda is triggered by a file being put into the bucket:
def lambda_handler(event, context):
    ...
    if event and event['Records']:
        for record in event['Records']:
            source_key = record['s3']['object']['key']
            ...  # do something with the key: key-prefix/filename.ext
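Putting that together for the copy use case, a minimal event-driven handler might look like this (a sketch that hard-codes the destination bucket from the question; the key comes from the event instead of listing the whole bucket):

import boto3
from urllib.parse import unquote_plus

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    for record in event.get('Records', []):
        source_bucket = record['s3']['bucket']['name']
        # Keys arrive URL-encoded in S3 events, so decode before using them.
        key = unquote_plus(record['s3']['object']['key'])
        s3.Object('test-bucket-3x2', key).copy_from(
            CopySource={'Bucket': source_bucket, 'Key': key})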
For the additional question about opening files from the S3 bucket directly, I would recommend checking out smart_open, which "kind of" handles the S3 bucket like a local file system:
from pandas import DataFrame, read_csv
from smart_open import open

def read_as_csv(file_uri: str) -> DataFrame:
    with open(file_uri) as f:
        return read_csv(f, names=COLUMN_NAMES)
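Usage would then be something like this (the s3:// URI is a made-up example, and COLUMN_NAMES is assumed to be defined as in the snippet above):

df = read_as_csv("s3://some-bucket/some-prefix/data.csv")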
var AWS = require("aws-sdk");

exports.handler = (event, context, callback) => {
  var s3 = new AWS.S3();
  var sourceBucket = "sourcebucketskc";
  var destinationBucket = "destbucketskc";
  var objectKey = event.Records[0].s3.object.key;
  var copySource = encodeURI(sourceBucket + "/" + objectKey);
  var copyParams = { Bucket: destinationBucket, CopySource: copySource, Key: objectKey };
  s3.copyObject(copyParams, function(err, data) {
    if (err) {
      console.log(err, err.stack);
    } else {
      console.log("S3 object copy successful.");
    }
  });
};
You can also do it this way:
import boto3

def copy_file_to_public_folder():
    s3 = boto3.resource('s3')
    src_bucket = s3.Bucket("source_bucket")
    dst_bucket = "destination_bucket"
    for obj in src_bucket.objects.filter(Prefix=''):
        # This prefix will get all the files, but you can also use
        # (Prefix='images/', Delimiter='/') for a specific folder.
        print(obj.key)
        copy_source = {'Bucket': "source_bucket", 'Key': obj.key}
        # Define the name of the object in the destination bucket here.
        dst_file_name = obj.key  # if you want to use the same name
        s3.meta.client.copy(copy_source, dst_bucket, dst_file_name)
This basically takes all the objects in the origin bucket and copies them to the other one.
