I am testing new data feeds that arrive as XML. The data will be stored in S3 in the following format:
2018/1/2/1.xml
2018/1/3/1.xml
2018/1/3/2.xml
etc. So multiple .xml files are possible on one day. It's also important to note that there are folders in this bucket that I do NOT want to pull, so I have to target a very specific directory.
There is no datetime stamp within the file, so I need to go off something like the created/modified timestamp. To do this I'm thinking of using a dictionary of key/value pairs, with folder + XML file as the key and the created/modified timestamp as the value, then using that dict to re-pull all the objects.
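A minimal sketch of that idea (the bucket name and prefix below are placeholders): the list_objects_v2 listing already includes a LastModified timestamp for each object, so the dictionary can be built straight from the listing:

import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')

# Map each object key to its LastModified timestamp from the listing.
key_to_timestamp = {}
for page in paginator.paginate(Bucket='my-bucket', Prefix='folder/folder1/folder2'):
    for obj in page.get('Contents', []):
        key_to_timestamp[obj['Key']] = obj['LastModified']  # a datetime object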
Here's what I've tried:
import sys

import boto3
from pprint import pprint

client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')
result = paginator.paginate(
    Bucket='bucket',
    Prefix='folder/folder1/folder2')

bucket_object_list = []
for page in result:
    pprint(page)
    if "Contents" in page:
        for key in page["Contents"]:
            keyString = key["Key"]
            pprint(keyString)
            bucket_object_list.append(keyString)

s3 = boto3.resource('s3')
obj = s3.Object('bucket', 'bucket_object_list')
obj.get()["Contents"].read().decode('utf-8')
pprint(obj.get())
sys.exit()
This throws an error from the key in the obj = s3.Object('cluster', key) line:
Traceback (most recent call last):
File "s3test2.py", line 25, in <module>
obj = s3.Object('cluster', key)
NameError: name 'key' is not defined
The MaxItems setting is purely for testing purposes, although it's interesting that it translates to 1000 when run.
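For reference, MaxItems goes in the paginator's PaginationConfig argument; the 1000 figure matches S3's per-page listing cap, which may be what's showing up. A sketch of where it would sit in the code above:

result = paginator.paginate(
    Bucket='bucket',
    Prefix='folder/folder1/folder2',
    PaginationConfig={'MaxItems': 10})  # cap the total listed objects while testing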
NameError: name 'key' is not defined
As far as the error is concerned, it's because key is not defined at the point where you call s3.Object().
From this documentation:
Object(bucket_name, key)
Creates an Object resource:
object = s3.Object('bucket_name','key')
Parameters
bucket_name(string) -- The Object's bucket_name identifier. This must be set.
key(string) -- The Object's key identifier. This must be set.
You need to assign an object key name to the 'key' you're using in the code.
The key name is the "name" (i.e., unique identifier) by which your file is stored in the S3 bucket.
Code based on what you posted:
import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')
result = paginator.paginate(Bucket='bucket_name', Prefix='folder/folder1/folder2')

bucket_object_list = []
for page in result:
    if "Contents" in page:
        for key in page["Contents"]:
            keyString = key["Key"]
            print(keyString)
            bucket_object_list.append(keyString)

print(bucket_object_list)

s3 = boto3.resource('s3')
for file_name in bucket_object_list:
    obj = s3.Object('bucket_name', file_name)
    print(obj.get())
    print(obj.get()["Body"].read().decode('utf-8'))
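Since the files are XML, the decoded body can then be parsed with the standard library, for example (a sketch, assuming well-formed XML):

import xml.etree.ElementTree as ET

body = obj.get()["Body"].read().decode('utf-8')
root = ET.fromstring(body)  # parse the XML document from the decoded string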
Related
I'm new to Python and APIs.
I have a list of values like below:
typeid=['1','12','32','1000','9']
I have to pass these values as a parameter in an API request, so that it takes one typeid at a time and appends the JSON. The code I have is below, but I'm not sure how it will move from one value to the next.
# activity type ids are stored as follows in another .py file:
#   typeid = ['1', '12', '32', '1000', '9']
# importing the list in the main program file:
from typeid_list import activitytypeids

import json
import requests

# host (the API base URL) is defined elsewhere
results = []

act1 = requests.get(host + '/rest/v1/activities.json',
                    params={
                        'activityTypeIds': activitytypeids[0]
                    }).text
json_obj = json.loads(act1)
results.append(json_obj)
more_result = json_obj['moreResult']

while True:
    act1 = requests.get(host + '/rest/v1/activities.json',
                        params={
                            'activityTypeIds': activitytypeids[0]
                        }).text
    json_obj = json.loads(act1)
    results.append(json_obj)
    more_result = json_obj['moreResult']
    if not more_result:
        break
How do I pass the activity type ids in the request params one by one, so that I get the results for all type ids?
Take your code that gets one id, put it in a function that accepts an activity_id, and change every activitytypeids[0] to just activity_id:
from typeid_list import activitytypeids
import requests

def get_activity_id(activity_id):
    response = requests.get(host + '/rest/v1/activities.json',
                            params={
                                'activityTypeIds': activity_id
                            })
    return response.json()
Then you can just iterate over your list:
results = [get_activity_id(id) for id in activitytypeids]
That said, it seems very surprising that a parameter named activityTypeIds only accepts one id... based on nothing more than the name, I would expect it to accept a list.
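If it does accept multiple ids (pure speculation about this particular endpoint), note that requests encodes a list value as repeated query parameters, i.e. activityTypeIds=1&activityTypeIds=12&...:

response = requests.get(host + '/rest/v1/activities.json',
                        params={'activityTypeIds': activitytypeids})
results = response.json()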
I have a use case where I have to read table names under a folder in an Amazon S3 bucket, given a path.
E.g. say a bucket with the path s3://mybucket/aws glue service/raw/source_data/.
In source_data there's a folder named Tables that lists the table names, e.g.:
Tables:
users
customers
Admin
So basically I want to write a function that returns ["users", "customers", "Admin"].
Here's what I have so far:
import boto3

def read_tables(path):
    tables = []
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(path)
    for obj in bucket.objects.filter(Prefix='Tables/'):
        tables.append(obj)
    return tables
The table name will be the last component of the object's key and can be extracted as follows:
import boto3

def read_tables(s3_uri):
    tables = []
    s3 = boto3.resource('s3')
    remove_scheme = slice(5, len(s3_uri))  # strip the leading 's3://'
    bucketname, key = s3_uri[remove_scheme].split('/', 1)
    bucket = s3.Bucket(bucketname)
    prefix = f"{key.rstrip('/')}/Tables/"  # tolerate a trailing slash in the URI
    for obj in bucket.objects.filter(Prefix=prefix):
        tablename = obj.key.split('/').pop()
        tables.append(tablename)
    return tables
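Example usage with the path from the question (assuming each table is stored as an object directly under Tables/):

tables = read_tables('s3://mybucket/aws glue service/raw/source_data/')
print(tables)  # e.g. ['users', 'customers', 'Admin']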
I am using Lambda to detect faces and would like to send the response to a DynamoDB table.
This is the code I am using:
import boto3

rekognition = boto3.client('rekognition', region_name='us-east-1')
dynamodb = boto3.client('dynamodb', region_name='us-east-1')

# --------------- Helper Functions to call Rekognition APIs ------------------

def detect_faces(bucket, key):
    response = rekognition.detect_faces(Image={"S3Object": {"Bucket": bucket,
                                                            "Name": key}},
                                        Attributes=['ALL'])
    TableName = 'table_test'
    for face in response['FaceDetails']:
        table_response = dynamodb.put_item(TableName=TableName, Item='{0} - {1}%')
    return response
My problem is in this line:
for face in response['FaceDetails']:
table_response = dynamodb.put_item(TableName=TableName, Item= {'key:{'S':'value'}, {'S':'Value')
I am able to see the result in the console.
I don't want to add specific item(s) to the table; I need the whole response transferred to the table.
To do this:
1. What should I add as the key and partition key in the table?
2. How do I transfer the whole response to the table?
I have been stuck on this for three days now and can't figure it out. Please help!
******************* EDIT *******************
I tried this code:
import os
import uuid

import boto3

rekognition = boto3.client('rekognition', region_name='us-east-1')

# --------------- Helper Functions to call Rekognition APIs ------------------

def detect_faces(bucket, key):
    response = rekognition.detect_faces(Image={"S3Object": {"Bucket": bucket,
                                                            "Name": key}},
                                        Attributes=['ALL'])
    TableName = 'table_test'
    for face in response['FaceDetails']:
        face_id = str(uuid.uuid4())
        Age = face["AgeRange"]
        Gender = face["Gender"]
        print('Generating new DynamoDB record, with ID: ' + face_id)
        print('Input Age: ' + Age)
        print('Input Gender: ' + Gender)
        dynamodb = boto3.resource('dynamodb')
        table = dynamodb.Table(os.environ['test_table'])
        table.put_item(
            Item={
                'id': face_id,
                'Age': Age,
                'Gender': Gender
            }
        )
    return response
It gave me two errors:
1. Error processing object xxx.jpg
2. cannot concatenate 'str' and 'dict' objects
Can you pleaaaaase help!
When you create a table in DynamoDB, you must specify at least a partition key. Go to your DynamoDB table and grab your partition key. Once you have it, create a new object that contains this partition key with some value on it, plus the object you want to store. A partition key is always a MUST when creating a new item in a DynamoDB table.
Your JSON object should look like this:
{
"myPartitionKey": "myValue",
"attr1": "val1",
"attr2:" "val2"
}
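In boto3 terms, writing that item could look like the following sketch (the table name and myPartitionKey are placeholders for your own):

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table_test')  # placeholder table name
table.put_item(
    Item={
        'myPartitionKey': 'myValue',  # must match the table's actual partition key name
        'attr1': 'val1',
        'attr2': 'val2'
    }
)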
EDIT: After the OP updated his question, here's some new information:
For problem 1)
Are you sure the image you are trying to process is a valid one? If it is a corrupted file Rekognition will fail and throw that error.
For problem 2)
You cannot concatenate a string and a dictionary in Python. Your Age and Gender variables are dictionaries, not strings, so you need to access an inner attribute within them. I am not a Python developer, but: the Gender object has a 'Value' attribute you need to access, while the Age object has 'Low' and 'High' as attributes.
You can see the complete list of attributes in the docs
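For illustration, the failing print statements could become something like this (in the DetectFaces response, Gender looks like {'Value': 'Male', 'Confidence': 99.0} and AgeRange like {'Low': 20, 'High': 30}):

print('Input Age: {0} - {1}'.format(Age['Low'], Age['High']))
print('Input Gender: ' + Gender['Value'])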
Hope this helps!
I need to obtain the Tag values from the code below. It first fetches the cluster Id and then passes it to describe_cluster, which returns JSON. I'm trying to fetch a particular value from this "Cluster" JSON using get(), but it returns the error "'str' object has no attribute 'get'". Please suggest.
Here is the boto3 reference I'm referring to:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#EMR.Client.describe_cluster
import boto3
import json
from datetime import timedelta

REGION = 'us-east-1'
emrclient = boto3.client('emr', region_name=REGION)
snsclient = boto3.client('sns', region_name=REGION)

def lambda_handler(event, context):
    EMRS = emrclient.list_clusters(
        ClusterStates=['STARTING', 'RUNNING', 'WAITING']
    )
    clusters = EMRS["Clusters"]
    for cluster_details in clusters:
        id = cluster_details.get("Id")
        describe_cluster = emrclient.describe_cluster(
            ClusterId=id
        )
        cluster_values = describe_cluster["Cluster"]
        for details in cluster_values:
            tag_values = details.get("Tags")
            print(tag_values)
The error is in the last part of the code.
describe_cluster = emrclient.describe_cluster(
    ClusterId=id
)
cluster_values = describe_cluster["Cluster"]
for details in cluster_values:  # ERROR HERE
    tag_values = details.get("Tags")
    print(tag_values)
The value returned from describe_cluster is a dictionary, and its Cluster entry is also a dictionary. Iterating over a dictionary yields its string keys, which is why details.get("Tags") fails with "'str' object has no attribute 'get'". You don't need to iterate over it at all; you can directly access cluster_values.get("Tags").
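For illustration, the corrected loop would look like this:

for cluster_details in clusters:
    cluster_id = cluster_details.get("Id")
    describe_cluster = emrclient.describe_cluster(ClusterId=cluster_id)
    cluster_values = describe_cluster["Cluster"]
    tag_values = cluster_values.get("Tags")  # a list of {'Key': ..., 'Value': ...} dicts
    print(tag_values)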
The code below is part of the Python Quickbase module, which has not been updated in quite a while. The help text for the function shown below is not clear about how to pass the parameters to upload a file (whose value is actually base64-encoded).
def add_record(self, fields, named=False, database=None, ignore_error=True, uploads=None):
    """Add new record. "fields" is a dict of name:value pairs
    (if named is True) or fid:value pairs (if named is False). Return the new records RID
    """
    request = {}
    if ignore_error:
        request['ignoreError'] = '1'
    attr = 'name' if named else 'fid'
    request['field'] = []
    for field, value in fields.iteritems():
        request_field = ({attr: to_xml_name(field) if named else field}, value)
        request['field'].append(request_field)
    if uploads:
        for upload in uploads:
            request_field = (
                {attr: (to_xml_name(upload['field']) if named else upload['field']),
                 'filename': upload['filename']}, upload['value'])
            request['field'].append(request_field)
    response = self.request('AddRecord', database or self.database, request, required=['rid'])
    return int(response['rid'])
Can someone help me with how I should pass the parameters to add a record?
Based on the definition you provided, it appears that you need to pass, for the uploads parameter, an array of dictionaries that each provide the field name/id, the filename, and the base64 encoding of the file. So, if I had a table where I record the name of a color in the field named "color" (field id 19) and a sample image in the field named "sample image" (field id 21), I believe my method call would be something like:
my_color_file = '...'  # base64 encoding of your file
my_fields = {'19': 'Seafoam Green'}
my_uploads = [{'field': 21, 'filename': 'seafoam_green.png', 'value': my_color_file}]
client.add_record(fields=my_fields, uploads=my_uploads)
Or, if you're using field names:
my_color_file = '...'  # base64 encoding of your file
my_fields = {'color': 'Seafoam Green'}
my_uploads = [{'field': 'sample_image', 'filename': 'seafoam_green.png', 'value': my_color_file}]
client.add_record(fields=my_fields, named=True, uploads=my_uploads)
client is just the object you instantiated earlier using whatever constructor this module has.
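For the base64 value itself, a minimal sketch using the standard library (the file name is a placeholder):

import base64

with open('seafoam_green.png', 'rb') as f:
    my_color_file = base64.b64encode(f.read()).decode('ascii')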