I am trying to use the list_objects_v2 function of the boto3 S3 client (Python 3) to list objects from an S3 access point.
Sample Code:
import boto3
import botocore
access_point_arn = "arn:aws:s3:region:account-id:accesspoint/resource"
client = boto3.client('s3')
response = client.list_objects_v2(Bucket=access_point_arn)
Somehow I am getting the error below:
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "arn:aws:s3:region:account-id:accesspoint/resource": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
Based on the documentation: https://docs.aws.amazon.com/AmazonS3/latest/dev/using-access-points.html, I should be able to pass an access point ARN to the list_objects_v2 function as the Bucket name. The odd thing is that this works locally on my Windows 10 laptop. The same Python 3.6 code with the same boto3 and botocore package versions throws this error in an AWS Glue Python Shell job. I also made sure the Glue role has the S3 Full Access and Glue Service policies attached.
I would appreciate it if someone could shed some light on this.
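For reference, this is the snippet I run inside the Glue job to confirm which package versions are actually loaded, since, as far as I can tell, older botocore releases reject access point ARNs with exactly this kind of bucket-name validation error:
import boto3
import botocore

# Print the versions the Glue job actually loads; if botocore here is older
# than the one on my laptop, ARN-as-Bucket support may simply be missing.
print(boto3.__version__)
print(botocore.__version__)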
Related
So I am trying to load a CSV file from an S3 bucket. The following is the code:
import pandas as pd
import boto3
import io
s3_file_key = 'iris.csv'
bucket = 'data'
s3 = boto3.client('s3')
obj = s3.get_object(Bucket=bucket, Key=s3_file_key)
initial_df = pd.read_csv(io.BytesIO(obj['Body'].read()))
It works fine; iris.csv is only 3 KB in size.
Now, instead of iris.csv, I try to read 'mydata.csv', which is 6 GB in size.
I get the following error :
ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
I am unable to comprehend how access can be an issue since I put the data there in the first place. Also I am able to read 'iris.csv' from the same location. Any ideas?
Here are a few things that you can check (there is an inspection snippet after this list):
Make sure the region of the S3 bucket is the same as the region in your AWS configuration; otherwise it won't work. The S3 service is global, but every bucket is created in a specific region, and your clients should use that same region.
Make sure the access keys for the resource have the right set of permissions.
Make sure the file is actually uploaded.
Make sure there is no bucket policy applied that revokes access.
You can enable logging on your S3 bucket to see errors.
Make sure the bucket is not versioned. If versioned, specify the object version.
Make sure the object has the correct set of ACLs defined.
If the object is encrypted, make sure you have permission to use that KMS key to decrypt the object.
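For example, a quick way to inspect the object's existence, size, encryption, and version (bucket and key names are taken from your code above; adjust as needed):
import boto3

s3 = boto3.client('s3')

# head_object returns the object's metadata without downloading the body;
# it will also fail with 403/404 if permissions or the key are the problem.
head = s3.head_object(Bucket='data', Key='mydata.csv')
print(head.get('ContentLength'))         # confirms the object exists and its size
print(head.get('ServerSideEncryption'))  # e.g. 'aws:kms' if KMS-encrypted
print(head.get('SSEKMSKeyId'))           # the KMS key used, if any
print(head.get('VersionId'))             # present if the bucket is versioned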
How do I set up direct private bucket access for TensorFlow?
After running
from tensorflow.python.lib.io import file_io
and then running print(file_io.stat('s3://my/private/bucket/file.json')), I end up with an error:
NotFoundError: Object s3://my/private/bucket/file.json does not exist
However, the same line on a public object works without an error:
print(file_io.stat('s3://ryft-public-sample-data/wikipedia-20150518.bin'))
There appears to be an article on support here: https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/s3.md
However, I end up with the same error after exporting the variables shown.
I have the awscli set up with all credentials, and boto3 can view and download the file in question. I am wondering how I can get TensorFlow to access S3 directly when the bucket is private.
I had the same problem when trying to access files in a private S3 bucket from a SageMaker notebook. The mistake I made was to try using credentials I obtained from boto3, which do not seem to be valid outside of it.
The solution was not to specify credentials at all (in that case the role attached to the machine is used) and instead to just specify the region name (for some reason it wasn't read from the ~/.aws/config file), as follows:
import boto3
import os

# Let boto3 resolve the region for this machine, then expose it to
# TensorFlow's S3 filesystem via the AWS_REGION environment variable.
session = boto3.Session()
os.environ['AWS_REGION'] = session.region_name
NOTE: when debugging this error, it was useful to look at the CloudWatch logs, as the S3 client's logs were printed only there and not in the Jupyter notebook.
There I first saw that:
when I did specify the credentials obtained from boto3, the error was: The AWS Access Key Id you provided does not exist in our records.
when accessing without the AWS_REGION environment variable set, I got: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint. This is apparently common when the bucket's region is not specified (see 301 Moved Permanently after S3 uploading).
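Putting it together, a minimal sketch of the fix (the S3 path is the placeholder from the question; this assumes the role attached to the notebook instance grants access to the bucket):
import os
import boto3
from tensorflow.python.lib.io import file_io

# Use the region boto3 resolves for this machine and expose it to
# TensorFlow's S3 filesystem; credentials come from the attached role.
os.environ['AWS_REGION'] = boto3.Session().region_name

print(file_io.stat('s3://my/private/bucket/file.json'))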
I have been given a bucket name along with an ARN, as below:
arn:aws:iam::<>:user/user-name
I was also given an access key.
I know that this can be done using boto.
Connect to s3 bucket using IAM ARN in boto3
As in the above link, do I need to use 'sts'?
If so, why am I provided with an access key?
First, I recommend you install the AWS Command-Line Interface (CLI), which provides a command-line interface for accessing AWS.
You can then store your credentials in a configuration file by running:
aws configure
It will prompt you for the Access Key and Secret Key, which will be stored in a local credentials file (~/.aws/credentials).
Then, you will want to refer to S3 — Boto 3 documentation to find out how to access Amazon S3 from Python.
Here's some sample code:
import boto3
client = boto3.client('s3', region_name='ap-southeast-2')  # Change as appropriate
client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
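Alternatively, if you prefer not to run aws configure, the Access Key and Secret Key you were given can be passed to the client directly (placeholder values shown; the shared credentials file or an IAM role is generally preferable):
import boto3

# Explicit credentials instead of the shared config file; values are placeholders.
client = boto3.client(
    's3',
    region_name='ap-southeast-2',
    aws_access_key_id='AKIA...',
    aws_secret_access_key='...',
)
client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')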
Can we access a bucket via its endpoint, like <bucket-name>.s3.amazonaws.com, using the Python SDK? I don't want to access the bucket with bucket = conn.get_bucket(bucket_name).
I don't know why you need to access it this way, because the S3 endpoint is a fixed part and the only thing that changes is the name of your bucket (bucket names are global).
But, in the end, what you are looking for is unfortunately not possible. You need to provide the bucket name to access the bucket and run operations on it.
This is verified by the boto3 documentation, which you can check here:
S3 Boto documentation
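The closest you can get is to point the client at a specific endpoint with endpoint_url, but every call still takes the bucket name; a minimal sketch (the bucket name is a placeholder):
import boto3

# You can pin the endpoint, but operations still require the bucket name.
s3 = boto3.client('s3', endpoint_url='https://s3.amazonaws.com')
response = s3.list_objects_v2(Bucket='my-bucket')
print(response.get('KeyCount'))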
I have a JSON file with a file name like '203456_instancef9_code323.json' in my C:\temp\testfiles directory and want to copy the file to an Amazon S3 bucket named 'input-derived-files' using Python and the boto library, but it keeps throwing exceptions saying the file does not exist. I have a valid access key ID and secret key and can establish a connection to AWS. Could someone help me with the best code to script this, please? Many thanks for your contribution.
Here is the code that you need, based on boto3 (the latest boto library, which is actively maintained). Make sure that you use forward slashes in the directory path. I have tested this code on Windows and it works.
import boto3

s3 = boto3.resource('s3')
# upload_file(local_path, bucket_name, object_key); note the forward slashes in the path
s3.meta.client.upload_file('C:/temp/testfiles/203456_instancef9_code323.json',
                           'input-derived-files',
                           '203456_instancef9_code323.json')
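If you want to confirm that the upload succeeded, one option is to list the key afterwards (same bucket and key as above):
import boto3

client = boto3.client('s3')
# Listing with the key as a prefix should return the object we just uploaded.
response = client.list_objects_v2(Bucket='input-derived-files',
                                  Prefix='203456_instancef9_code323.json')
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])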