S3 boto3 refuses to override endpoint URL - python

I'm working with an internal S3 service (not the AWS one). When I provide hard-coded credentials, a region, and an endpoint_url, boto3 seems to ignore them.
I came to that conclusion because it attempts to go out to the internet (using a public AWS endpoint URL instead of the internal one I provided) and fails with the proxy error below. It should not need internet access at all, since it is an internal S3 service:
botocore.exceptions.ProxyConnectionError: Failed to connect to proxy URL: "http://my_company_proxy"
Here is my code
import io
import os

import boto3
import pandas as pd

# Method 1: Client #########################################
s3_client = boto3.client(
    's3',
    region_name='EU-WEST-1',
    aws_access_key_id='xxx',
    aws_secret_access_key='zzz',
    endpoint_url='https://my_company_enpoint_url'
)
# ==> at this point no error, but I don't know the value of endpoint_url

# Read bucket
bucket = "bkt-udt-arch"
file_name = "banking.csv"

print("debug 1")  # printed OK

obj = s3_client.get_object(Bucket=bucket, Key=file_name)
# program stops here:
# botocore.exceptions.ProxyConnectionError: Failed to connect to proxy URL: "http://my_company_proxy"

print("debug 2")  # not printed
initial_df = pd.read_csv(obj['Body'])  # 'Body' is a key word
print("debug 3")

# Method 2: Resource #########################################
# use third party object storage
s3 = boto3.resource(
    's3',
    endpoint_url='https://my_company_enpoint_url',
    aws_access_key_id='xxx',
    aws_secret_access_key='zzz',
    region_name='EU-WEST-1'
)

print("debug 4")  # printed OK if method 1 is commented out

# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)

Thank you for the review.
It was indeed a proxy problem: when the http_proxy environment variable is unset, it works fine.
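For anyone hitting the same thing: botocore picks up the standard http_proxy / https_proxy / no_proxy environment variables, so rather than disabling the proxy globally you can exclude just the internal endpoint. A small sketch, reusing the placeholder hostname from the question (set the variables before the client is created):
import os

import boto3

# Tell botocore not to use the corporate proxy for the internal S3 host
# (placeholder hostname from the question; adjust to the real endpoint).
os.environ['NO_PROXY'] = 'my_company_enpoint_url'

# Alternatively, drop the proxy variables entirely for this process:
# os.environ.pop('http_proxy', None)
# os.environ.pop('https_proxy', None)

s3_client = boto3.client(
    's3',
    region_name='EU-WEST-1',
    aws_access_key_id='xxx',
    aws_secret_access_key='zzz',
    endpoint_url='https://my_company_enpoint_url',
)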


Python Boto3 to Aws Sdk for blob storage

This code retrieves the buckets of an Amazon S3-compatible storage service (not Amazon AWS itself, but the Zadara-compatible cloud storage), and it works:
import boto3
from botocore.client import Config

session = boto3.session.Session()
s3_client = session.client(
    service_name='s3',
    region_name='IT',
    aws_access_key_id='xyz',
    aws_secret_access_key='abcedf',
    endpoint_url='https://nothing.com:443',
    config=Config(signature_version='s3v4'),
)

print('Buckets')
boto3.set_stream_logger(name='botocore')
print(s3_client.list_buckets())
I am trying to use the same approach to access S3 via C# and the AWS SDK, but I always get the error "The request signature we calculated does not match the signature you provided. Check your key and signing method.".
AmazonS3Config config = new AmazonS3Config();
config.AuthenticationServiceName = "s3";
config.ServiceURL = "https://nothing.com:443";
config.SignatureVersion = "s3v4";
config.AuthenticationRegion = "it";

AmazonS3Client client = new AmazonS3Client(
    "xyz",
    "abcdef",
    config);

ListBucketsResponse r = await client.ListBucketsAsync();
What can I do? Why is it not working? I can't find a solution.
I also tried to trace debug info:
Python
boto3.set_stream_logger(name='botocore')
C#
AWSConfigs.LoggingConfig.LogResponses = ResponseLoggingOption.Always;
AWSConfigs.LoggingConfig.LogMetrics = true;
AWSConfigs.LoggingConfig.LogTo = Amazon.LoggingOptions.SystemDiagnostics;
AWSConfigs.AddTraceListener("Amazon", new System.Diagnostics.ConsoleTraceListener());
but for C# it does not log the whole request.
Any suggestions?

S3 unit tests boto client

I'm having issues writing a unit test for an S3 client. It seems the test is trying to use a real S3 client rather than the mocked one I have created for the test. Here is my example:
@pytest.fixture(autouse=True)
def moto_boto(self):
    # setup: start moto server and create the bucket
    mocks3 = mock_s3()
    mocks3.start()
    res = boto3.resource('s3')
    bucket_name: str = f"{os.environ['BUCKET_NAME']}"
    res.create_bucket(Bucket=bucket_name)
    yield
    # teardown: stop moto server
    mocks3.stop()

def test_with_fixture(self):
    from functions.s3_upload_worker import (
        save_email_in_bucket,
    )

    client = boto3.client('s3')
    bucket_name: str = f"{os.environ['BUCKET_NAME']}"
    client.list_objects(Bucket=bucket_name)
    save_email_in_bucket(
        "123AZT",
        os.environ["BUCKET_FOLDER_NAME"],
        email_byte_code,
    )
This results in the following error
botocore.exceptions.ClientError: An error occurred (ExpiredToken) when calling the PutObject operation: The provided token has expired.
The code I am testing looks like this:
def save_email_in_bucket(message_id, bucket_folder_name, body):
    s3_key = "".join([bucket_folder_name, "/", str(message_id), ".json"])
    s3_client.put_object(
        Bucket=bucket,
        Key=s3_key,
        Body=json.dumps(body),
        ContentType="application-json",
    )
    LOGGER.info(
        f"Saved email with message ID {message_id} in bucket folder {bucket_folder_name}"
    )
Not accepting this as an answer, but it may be useful for anyone who ends up here: I found a workaround where, if I create the S3 client inside the function I am trying to test rather than globally, this approach works. I would prefer to find an actual solution though.
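One pattern that usually avoids this is to make sure the mock is active before the module-level client is used, and to patch that client inside the test. The following is a rough sketch under the question's assumptions (a module functions.s3_upload_worker with a module-level s3_client whose bucket variable resolves to BUCKET_NAME, plus the BUCKET_FOLDER_NAME environment variable); the region and payload are hypothetical:
import os
from unittest.mock import patch

import boto3
import pytest
from moto import mock_s3


@pytest.fixture(autouse=True)
def moto_boto():
    # Run every test inside an active moto mock and pre-create the bucket.
    with mock_s3():
        res = boto3.resource('s3', region_name='us-east-1')
        res.create_bucket(Bucket=os.environ['BUCKET_NAME'])
        yield


def test_save_email_in_bucket():
    # Import inside the test so the module is loaded while the mock is active.
    from functions import s3_upload_worker

    # Replace the module-level client (created at import time) with one
    # built under the mock, so put_object never reaches real AWS.
    mocked_client = boto3.client('s3', region_name='us-east-1')
    with patch.object(s3_upload_worker, 's3_client', mocked_client):
        s3_upload_worker.save_email_in_bucket(
            "123AZT",
            os.environ["BUCKET_FOLDER_NAME"],
            {"subject": "hello"},  # hypothetical email payload
        )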

Python upload to s3 : Stuck on put_object() call

I want to send an image to an S3 storage (MinIO).
Here is some example code:
import cv2
import argparse
import boto3
from botocore.client import Config
import logging

logging.basicConfig(level=logging.INFO)

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('--s3_endpoint', type=str, default='http://127.0.0.1:9000')
parser.add_argument('--s3_access_key', type=str, default='minioadmin')
parser.add_argument('--s3_secret_key', type=str, default='minioadmin')
parser.add_argument('--image_bucket', type=str, default='my_bucket')
args = parser.parse_args()

s3 = boto3.client(
    "s3",
    use_ssl=False,
    endpoint_url=args.s3_endpoint,
    aws_access_key_id=args.s3_access_key,
    aws_secret_access_key=args.s3_secret_key,
    region_name='us-east-1',
    config=Config(s3={'addressing_style': 'path'})
)

def upload_image_s3(bucket, image_name, image):
    logging.info("Uploading image " + image_name + " to s3 bucket " + bucket)
    body = cv2.imencode('.jpg', image)[1].tostring()
    s3.put_object(Bucket=bucket, Key=image_name, Body=body)
    logging.info("Image uploaded!")

upload_image_s3(args.image_bucket, 'test/image.jpg', cv2.imread("resources/images/my_image.jpg"))
But when I run it, I get stuck on the put_object() call forever (well, until it times out, to be exact).
For my test, I run a MinIO server locally with the default configuration (on Windows).
Do you have any idea what the problem is in my case?
I was having the same error; my mistake was incorrect credentials when connecting to MinIO. When I saw that you had just set the host to localhost, I reviewed all my code and realized it was just a typo in my credentials.
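If you run into a similar hang, one way to surface credential or endpoint problems quickly is to make a cheap call such as list_buckets with short timeouts before attempting the upload. A small sketch, assuming the same local MinIO defaults as in the question:
import boto3
from botocore.client import Config
from botocore.exceptions import ClientError, EndpointConnectionError

s3 = boto3.client(
    "s3",
    endpoint_url="http://127.0.0.1:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
    region_name="us-east-1",
    # Fail fast instead of hanging if the endpoint or credentials are wrong.
    config=Config(connect_timeout=5, read_timeout=5, retries={"max_attempts": 1}),
)

try:
    print([b["Name"] for b in s3.list_buckets()["Buckets"]])
except EndpointConnectionError as err:
    print("Cannot reach the MinIO endpoint:", err)
except ClientError as err:
    print("Credentials or request problem:", err)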

Boto3 not copying snapshot to other regions, other options?

[Very new to AWS]
Hi,
I am trying to move my EBS volume snapshot copies across regions. I have been trying to use Boto3 to move the snapshots. My objective is to move the latest snapshot from the us-east-2 region to the us-east-1 region automatically on a daily basis.
I have used the aws configure command in the terminal to set up my security credentials and set the region to us-east-2.
I am using pandas to acquire the most recent snapshot ID with this code:
import boto3
import boto.ec2
import pandas as pd
from pandas.io.json.normalize import nested_to_record

client = boto3.client('ec2')
aws_api_response = client.describe_snapshots(OwnerIds=['self'])
flat = nested_to_record(aws_api_response)
df = pd.DataFrame.from_dict(flat)
df = df['Snapshots'].apply(pd.Series)
insert_snap = df.loc[df['StartTime'] == max(df['StartTime']), 'SnapshotId']
insert_snap = insert_snap.reset_index(drop=True)
insert_snap returns a snapshot ID, something like snap-1234ABCD.
I am trying to use this code to copy the snapshot from us-east-2 to us-east-1:
client.copy_snapshot(
    SourceSnapshotId='%s' % insert_snap[0],
    SourceRegion='us-east-2',
    DestinationRegion='us-east-1',
    Description='This is my copied snapshot.'
)
With the above call, the snapshot is copied within the same region.
I have also tried switching regions through the aws configure command in the terminal, with the same issue: the snapshot is copied within the same region.
There is a bug in Boto3 that ignores the destination parameter in the copy_snapshot() call. Information can be found here: https://github.com/boto/boto3/issues/886
I have also tried putting this code into a Lambda function, but I keep getting the error "errorMessage": "Unable to import module 'lambda_function'":
region = 'us-east-2'
ec = boto3.client('ec2', region_name=region)

def lambda_handler(event, context):
    response = ec.copy_snapshot(
        SourceSnapshotId='snap-xxx',
        SourceRegion=region,
        DestinationRegion='us-east-1',
        Description='copied from Ohio'
    )
    print(response)
I am out of options. What can I do to automate the transfer of snapshots in AWS?
As per CopySnapshot - Amazon Elastic Compute Cloud:
CopySnapshot sends the snapshot copy to the regional endpoint that you send the HTTP request to, such as ec2.us-east-1.amazonaws.com (in the AWS CLI, this is specified with the --region parameter or the default region in your AWS configuration file).
Therefore, you should send the copy_snapshot() command to us-east-1, with the Source Region set to us-east-2.
If you wish to move the most recent snapshot, you could run:
import boto3

SOURCE_REGION = 'us-east-2'
DESTINATION_REGION = 'us-east-1'

# Connect to EC2 in Source region
source_client = boto3.client('ec2', region_name=SOURCE_REGION)

# Get a list of all snapshots, then sort them
snapshots = source_client.describe_snapshots(OwnerIds=['self'])
snapshots_sorted = sorted(
    [(s['SnapshotId'], s['StartTime']) for s in snapshots['Snapshots']],
    key=lambda k: k[1]
)
latest_snapshot = snapshots_sorted[-1][0]
print('Latest Snapshot ID is ' + latest_snapshot)

# Connect to EC2 in Destination region
destination_client = boto3.client('ec2', region_name=DESTINATION_REGION)

# Copy the snapshot
response = destination_client.copy_snapshot(
    SourceSnapshotId=latest_snapshot,
    SourceRegion=SOURCE_REGION,
    Description='This is my copied snapshot'
)
print('Copied Snapshot ID is ' + response['SnapshotId'])
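Since the goal is a daily, automatic copy, the same logic can be wrapped in a Lambda handler and triggered by a daily EventBridge (CloudWatch Events) schedule; the function's role would need ec2:DescribeSnapshots and ec2:CopySnapshot permissions. A minimal sketch along those lines (not part of the original answer), using only boto3, which is already available in the Lambda Python runtime:
import boto3

SOURCE_REGION = 'us-east-2'
DESTINATION_REGION = 'us-east-1'


def lambda_handler(event, context):
    # Find the most recent snapshot in the source region.
    source = boto3.client('ec2', region_name=SOURCE_REGION)
    snapshots = source.describe_snapshots(OwnerIds=['self'])['Snapshots']
    latest = max(snapshots, key=lambda s: s['StartTime'])

    # Issue the copy against the destination region's endpoint.
    destination = boto3.client('ec2', region_name=DESTINATION_REGION)
    response = destination.copy_snapshot(
        SourceSnapshotId=latest['SnapshotId'],
        SourceRegion=SOURCE_REGION,
        Description='Daily copy of ' + latest['SnapshotId'],
    )
    return {'CopiedSnapshotId': response['SnapshotId']}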

Complete a multipart_upload with boto3?

Tried this:
import boto3
from boto3.s3.transfer import TransferConfig, S3Transfer

path = "/temp/"
fileName = "bigFile.gz"  # this happens to be a 5.9 Gig file

client = boto3.client('s3', region)
config = TransferConfig(
    multipart_threshold=4*1024,  # number of bytes
    max_concurrency=10,
    num_download_attempts=10,
)
transfer = S3Transfer(client, config)
transfer.upload_file(path + fileName, 'bucket', 'key')
Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.
I found this example, but part is not defined.
import boto3

bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'

s3 = boto3.client('s3')

# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path + fileName, 'rb') as data:
    part1 = s3.upload_part(
        Bucket=bucket,
        Key=key,
        PartNumber=1,
        UploadId=mpu['UploadId'],
        Body=data,
    )

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=mpu['UploadId'],
    MultipartUpload=part_info,
)
Question: Does anyone know how to use the multipart upload with boto3?
Your code was already correct. Indeed, a minimal example of a multipart upload just looks like this:
import boto3
s3 = boto3.client('s3')
s3.upload_file('my_big_local_file.txt', 'some_bucket', 'some_key')
You don't need to explicitly ask for a multipart upload, or use any of the lower-level functions in boto3 that relate to multipart uploads. Just call upload_file, and boto3 will automatically use a multipart upload if your file size is above a certain threshold (which defaults to 8MB).
You seem to have been confused by the fact that the end result in S3 wasn't visibly made up of multiple parts:
Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.
... but this is the expected outcome. The whole point of the multipart upload API is to let you upload a single file over multiple HTTP requests and end up with a single object in S3.
As described in official boto3 documentation:
The AWS SDK for Python automatically manages retries and multipart and
non-multipart transfers.
The management operations are performed by using reasonable default
settings that are well-suited for most scenarios.
So all you need to do is set the desired multipart threshold value, which indicates the minimum file size above which a multipart upload is handled automatically by the Python SDK:
import boto3
from boto3.s3.transfer import TransferConfig
# Set the desired multipart threshold value (5GB)
GB = 1024 ** 3
config = TransferConfig(multipart_threshold=5*GB)
# Perform the transfer
s3 = boto3.client('s3')
s3.upload_file('FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME', Config=config)
Moreover, you can also use multithreading for multipart transfers by setting max_concurrency:
# To consume less downstream bandwidth, decrease the maximum concurrency
config = TransferConfig(max_concurrency=5)
# Download an S3 object
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)
And finally, in case you want to perform a multipart transfer in a single thread, just set use_threads=False:
# Disable thread use/transfer concurrency
config = TransferConfig(use_threads=False)
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME', Config=config)
Complete source code with explanation: Python S3 Multipart File Upload with Metadata and Progress Indicator
I would advise you to use boto3.s3.transfer for this purpose. Here is an example:
import boto3
import boto3.s3.transfer

def upload_file(filename):
    session = boto3.Session()
    s3_client = session.client("s3")
    try:
        print("Uploading file: {}".format(filename))
        tc = boto3.s3.transfer.TransferConfig()
        t = boto3.s3.transfer.S3Transfer(client=s3_client, config=tc)
        t.upload_file(filename, "my-bucket-name", "name-in-s3.dat")
    except Exception as e:
        print("Error uploading: {}".format(e))
In your code snippet, part should clearly be part1 in the dictionary. Typically you would have several parts (otherwise, why use multipart upload?), and the 'Parts' list would contain an element for each part.
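To make that concrete, here is a rough sketch (not from the original answer) of how the same low-level calls generalize to several parts by reading fixed-size chunks in a loop; the bucket, key, file path, and 100 MB part size are placeholders:
import boto3

bucket = 'bucket'  # placeholder
key = 'key'        # placeholder
part_size = 100 * 1024 * 1024  # parts must be at least 5 MB (except the last)

s3 = boto3.client('s3')
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)

parts = []
with open('/temp/bigFile.gz', 'rb') as data:
    part_number = 1
    while True:
        chunk = data.read(part_size)
        if not chunk:
            break
        part = s3.upload_part(
            Bucket=bucket,
            Key=key,
            PartNumber=part_number,
            UploadId=mpu['UploadId'],
            Body=chunk,
        )
        # Remember the part number and ETag for the completion call.
        parts.append({'PartNumber': part_number, 'ETag': part['ETag']})
        part_number += 1

s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=mpu['UploadId'],
    MultipartUpload={'Parts': parts},
)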
You may also be interested in the new pythonic interface to dealing with S3: http://s3fs.readthedocs.org/en/latest/
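For reference, a tiny sketch of what that s3fs interface looks like (the bucket and paths are placeholders); s3fs also splits large uploads into chunks behind the scenes:
import s3fs

# Uses the usual AWS credential chain (env vars, config files, etc.)
fs = s3fs.S3FileSystem()

# Upload a local file to s3://some_bucket/some_key
fs.put('/temp/bigFile.gz', 'some_bucket/some_key')

# Or stream-write through a file-like object
with fs.open('some_bucket/some_other_key', 'wb') as f:
    f.write(b'hello world')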
Why not just use the copy option in boto3?
s3.copy(
    CopySource={
        'Bucket': sourceBucket,
        'Key': sourceKey
    },
    Bucket=targetBucket,
    Key=targetKey,
    ExtraArgs={'ACL': 'bucket-owner-full-control'}
)
There are details on how to initialise the s3 object, and further options for the call, in the boto3 docs.
copy from boto3 is a managed transfer which will perform a multipart copy in multiple threads if necessary.
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.copy
This works with objects greater than 5 GB, and I have already tested it.
Change part to part1:
import boto3

bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'

s3 = boto3.client('s3')

# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path + fileName, 'rb') as data:
    part1 = s3.upload_part(
        Bucket=bucket,
        Key=key,
        PartNumber=1,
        UploadId=mpu['UploadId'],
        Body=data,
    )

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part1['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=mpu['UploadId'],
    MultipartUpload=part_info,
)
