I am trying to move files from an S3 bucket in one account (source account) to an S3 bucket in another account (destination account).
I am using a SageMaker notebook, so I have a SageMaker execution role.
I also have a role in my team account that has full S3 access and full SageMaker access, and in its trust relationship I have added the destination account role ARN and the SageMaker role ARN.
The destination account role also has my team role ARN and the SageMaker role ARN in its trust policy.
I am trying to assume my team role first and then assume the destination role to copy the files.
import boto3

sts_client = boto3.client('sts')
assumed_teamrole_object = sts_client.assume_role(DurationSeconds=1800,
                                                 RoleArn='myteamrole',
                                                 RoleSessionName='test1')
assumed_destrole_object = sts_client.assume_role(DurationSeconds=1800,
                                                 ExternalId='externalid provided by destination account',
                                                 RoleArn='destination account role',
                                                 RoleSessionName='test2')
temp_credentials = assumed_destrole_object['Credentials']
session = boto3.session.Session(aws_access_key_id=temp_credentials['AccessKeyId'],
                                aws_secret_access_key=temp_credentials['SecretAccessKey'],
                                aws_session_token=temp_credentials['SessionToken'],
                                region_name='us-east-1')
client = session.client('s3', aws_access_key_id=temp_credentials['AccessKeyId'],
                        aws_secret_access_key=temp_credentials['SecretAccessKey'],
                        aws_session_token=temp_credentials['SessionToken'],
                        region_name='us-east-1')
response = client.list_objects(Bucket='source bucket')
print(response)
When I run the above script I get the error:
An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied
The objects in the source bucket are encrypted. Do I have to add any permissions to decrypt them on my end? I am not sure why I am not able to list the objects.
When copying files between S3 buckets that belong to different AWS accounts, you will need a single set of credentials that can read from the source bucket and write to the destination bucket.
If, instead, you are using two different credentials, then you will need to download the file with one set of credentials and then upload with another set of credentials, rather than copying the object in one operation.
Therefore, I recommend that you use one set of credentials (e.g. the myteamrole IAM Role) and then:
Attach a policy to the IAM Role that permits GetObject access on the source bucket, and
Attach a bucket policy to the destination bucket in the other AWS account that permits PutObject access from the above IAM Role
This will permit the CopyObject() operation with the one set of credentials.
I also recommend specifying ACL = bucket-owner-full-control when copying the object. This will grant ownership of the object to the destination AWS Account, which can avoid some permission problems. This will also require PutObjectAcl permissions on the Bucket Policy.
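For reference, a minimal sketch of that single-credential copy (placeholder bucket and key names, run with the myteamrole credentials):
import boto3

s3_client = boto3.client('s3')  # credentials resolved from the assumed myteamrole role
s3_client.copy_object(
    Bucket='destination-bucket',     # placeholder destination bucket
    Key='path/to/object.parquet',    # placeholder key
    CopySource={'Bucket': 'source-bucket', 'Key': 'path/to/object.parquet'},
    ACL='bucket-owner-full-control'  # hand object ownership to the destination account
)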
Related
I am using the code below to copy a blob across different storage accounts, but it fails with the error below.
src_blob = '{0}/{1}?{2}'.format('source_url',b_name,'sp=rw&st=2022-11-17T20:44:03Z&se=2022-12-31T04:44:03Z&spr=https&sv=2021-06-08&sr=c&sig=ZXRe2FptVF5ArRM%2BKDAkLboCN%2FfaD9Mx38yZGWhnps0%3D')
destination_client = BlobServiceClient.from_connection_string("destination_connection_string")  # the connection string has a SAS token with sr=c
copied_blob = destination_client.get_blob_client('standardfeed', b_name)
copied_blob.start_copy_from_url(src_blob)
ErrorCode: AuthorizationPermissionMismatch
This request is not authorized to perform this operation using this permission.
Is anything missing, or did I copy the wrong SAS token?
I tried in my environment and successfully copied a blob from one storage account to another.
Code:
from azure.storage.blob import BlobServiceClient
b_name="sample1.pdf"
src_blob = '{0}/{1}?{2}'.format('https://venkat123.blob.core.windows.net/test', b_name, 'sp=r&st=2022-11-18T07:46:10Z&se=2022-11-18T15:46:10Z&spr=https&sv=<SAS token>')
destination_client = BlobServiceClient.from_connection_string("<connection string>")
copied_blob = destination_client.get_blob_client('test1', b_name)
copied_blob.start_copy_from_url(src_blob)
Console:
Portal:
Make sure you have the necessary permissions; for authentication you need to assign the following roles on your storage account:
Storage Blob Data Contributor
Storage Blob Data Reader
Portal:
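If you authenticate to Blob Storage with an Azure AD token rather than a connection string, those roles are what grant data access; a minimal sketch, assuming the azure-identity package and a placeholder account URL:
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# The signed-in identity must hold Storage Blob Data Reader/Contributor on the account
destination_client = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",  # placeholder URL
    credential=DefaultAzureCredential()
)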
Update:
You can get the connection string through portal:
Reference:
Azure Blob Storage "Authorization Permission Mismatch" error for get request with AD token - Stack Overflow
I am doing a cross-account copy of S3 objects. When I try to copy files from the source bucket to the destination bucket I get the error ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
I get the error at the line s3_object.download_fileobj(buffer).
import io

import boto3
import pandas as pd

final_df = pd.DataFrame()
for file in files1:
    # file = file.split('/')[-1]
    bucket = 'source bucket'
    buffer = io.BytesIO()
    s3 = boto3.resource('s3')
    s3_object = s3.Object(bucket, file)
    s3_object.download_fileobj(buffer)
    df = pd.read_parquet(buffer)
    print(file)
    s3 = boto3.client('s3')
    file = file.split('/')[-1]
    print(file)
    final_df = pd.concat([final_df, df], sort=False)
files1 is the list of all the Parquet file keys in the bucket.
The issue here may be that this is a cross-account copy. You may need to set up an IAM role to ensure that both accounts have permission.
As described in https://aws.amazon.com/premiumsupport/knowledge-center/s3-troubleshoot-403/, along with other troubleshooting options:
For ongoing cross-account permissions, create an IAM role in your
account with permissions to your bucket. Then, grant another AWS
account the permission to assume that IAM role. For more information,
see Tutorial: Delegate access across AWS accounts using IAM roles. (https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html)
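A minimal sketch of that pattern, assuming a hypothetical role in the source account (CrossAccountS3Read) that your account is permitted to assume:
import io

import boto3

# Hypothetical cross-account role in the source account
assumed = boto3.client('sts').assume_role(
    RoleArn='arn:aws:iam::<SOURCE_ACCOUNT_ID>:role/CrossAccountS3Read',
    RoleSessionName='cross-account-read'
)
creds = assumed['Credentials']

# A resource built from the temporary credentials can read the source bucket
s3 = boto3.resource(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken']
)

buffer = io.BytesIO()
s3.Object('source bucket', 'path/to/file.parquet').download_fileobj(buffer)  # placeholder key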
I'm trying to move the contents of a bucket in account-a to a bucket in account-b, and I already have credentials for both of them.
Here's the code I'm currently using:
import boto3
SRC_AWS_KEY = 'src-key'
SRC_AWS_SECRET = 'src-secret'
DST_AWS_KEY = 'dst-key'
DST_AWS_SECRET = 'dst-secret'
srcSession = boto3.session.Session(
    aws_access_key_id=SRC_AWS_KEY,
    aws_secret_access_key=SRC_AWS_SECRET
)
dstSession = boto3.session.Session(
    aws_access_key_id=DST_AWS_KEY,
    aws_secret_access_key=DST_AWS_SECRET
)
copySource = {
    'Bucket': 'src-bucket',
    'Key': 'test-bulk-src'
}
srcS3 = srcSession.resource('s3')
dstS3 = dstSession.resource('s3')
dstS3.meta.client.copy(CopySource=copySource, Bucket='dst-bucket', Key='test-bulk-dst',
                       SourceClient=srcS3.meta.client)
print('success')
The problem is that when I specify a file's name in the Key field (followed by /file.csv) it works fine, but when I set it to copy the whole folder, as shown in the code, it fails and throws this exception:
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
What I need is to move the contents in one call, not by iterating through the contents of the source folder, because that is time- and money-consuming, as I may have thousands of files to move.
There is no API call in Amazon S3 to copy folders. (Folders do not actually exist — the Key of each object includes its full path.)
You will need to iterate through each object and copy it.
The AWS CLI (written in Python) provides some higher-level commands that will do this iteration for you:
aws s3 cp --recursive s3://source-bucket/folder/ s3://destination-bucket/folder/
If the buckets are in different accounts, I would recommend:
Use a set of credentials for the destination account (avoids problems with object ownership)
Modify the bucket policy on the source bucket to permit access by the credentials from the destination account (avoids the need to use two sets of credentials)
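A minimal boto3 sketch of that iteration, assuming one set of destination-account credentials and the bucket/prefix names from the question:
import boto3

# Assumes these credentials belong to the destination account and that the source
# bucket's policy grants them s3:ListBucket and s3:GetObject
s3 = boto3.client('s3')

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='src-bucket', Prefix='test-bulk-src/'):
    for obj in page.get('Contents', []):
        s3.copy_object(
            Bucket='dst-bucket',
            Key=obj['Key'].replace('test-bulk-src/', 'test-bulk-dst/', 1),
            CopySource={'Bucket': 'src-bucket', 'Key': obj['Key']}
        )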
I have working code that downloads files from one of my S3 buckets and does some conversion work in Python. I do not embed the access and secret keys in the code, but the keys are in my AWS CLI configuration.
import boto3
import botocore
import pyarrow.parquet as pq

BUCKET_NAME = 'converted-parquet-bucket'  # replace with your own bucket name
KEY = 'json-to-parquet/names.snappy.parquet'  # replace with your own object key

s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'names.snappy.parquet')  # replace the local file name
except botocore.exceptions.ClientError as e:  # exception handling
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")  # printed if the object you are looking for does not exist
    else:
        raise

# Uncomment the two lines below (and import pandas) to convert csv to parquet
# dataframe = pandas.read_csv('names.csv')
# dataframe.to_parquet('names.snappy.parquet', engine='auto', compression='snappy')

data = pq.read_pandas('names.snappy.parquet',
                      columns=['Year of Birth', 'Gender', 'Ethnicity',
                               "Child's First Name", 'Count', 'Rank']).to_pandas()
# print(data)  # prints ALL the data in the parquet file
print(data.loc[data['Gender'] == 'MALE'])  # prints only the rows matching the query
Could someone help me get this code working without having the access and secret keys embedded in the code or in the AWS CLI configuration?
If you are running your function locally, you need to have your credentials in your local credentials/config file to interact with AWS resources.
One alternative is to run it on AWS Lambda (if your function runs periodically, you can set that up with CloudWatch Events) and use environment variables or the AWS Security Token Service (STS) to generate temporary credentials.
If you do not want to use a secret/access key, you should use roles and policies instead. Here's the deal:
Define a role (ex. RoleWithAccess) and be sure that your user (defined in your credentials) can assume this role
Set a policy for RoleWithAccess, giving read/write access to your buckets
If you are executing it on your local machine, run the necessary commands (AWS CLI) to create a profile that makes you assume RoleWithAccess (ex. ProfileWithAccess; a sample config is sketched at the end of this answer)
Execute your script using a session, passing this profile as the argument, which means you need to replace:
s3 = boto3.resource('s3')
with
session = boto3.session.Session(profile_name='ProfileWithAccess')
s3 = session.resource('s3')
The upside of this approach is that if you are running it on an EC2 instance, you can tie the instance to a specific role when you launch it (ex. RoleWithAccess). In that case, you can completely ignore sessions, profiles, and all the AWS CLI hocus pocus, and just run s3 = boto3.resource('s3').
You can also use AWS Lambda, setting a role and a policy with read/write permission to your bucket.
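For the ProfileWithAccess step above, a minimal sketch of what the profile could look like in ~/.aws/config (hypothetical names and account ID):
# In ~/.aws/config
[profile ProfileWithAccess]
role_arn = arn:aws:iam::<ACCOUNT_ID>:role/RoleWithAccess
source_profile = default
With that in place, boto3.session.Session(profile_name='ProfileWithAccess') assumes RoleWithAccess automatically.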
I am using Python and boto3 to list the resources that my organization has. I can list the resources from my master account without a problem, but I also need to list the resources from the child accounts. I can get the child account IDs, but that's pretty much it.
Any help?
You will need access to a set of credentials that belong to the child account.
From Accessing and Administering the Member Accounts in Your Organization - AWS Organizations:
When you create a member account using the AWS Organizations console, AWS Organizations automatically creates an IAM role in the account. This role has full administrative permissions in the member account. The role is also configured to grant that access to the organization's master account.
To use this role to access the member account, you must sign in as a user from the master account that has permissions to assume the role.
So, you can assume the IAM Role in the child account, which then provides a set of temporary credentials that can be used with boto3 to make API calls to the child account.
import boto3

role_info = {
    'RoleArn': 'arn:aws:iam::<AWS_ACCOUNT_NUMBER>:role/<AWS_ROLE_NAME>',
    'RoleSessionName': '<SOME_SESSION_NAME>'
}

client = boto3.client('sts')
credentials = client.assume_role(**role_info)

session = boto3.session.Session(
    aws_access_key_id=credentials['Credentials']['AccessKeyId'],
    aws_secret_access_key=credentials['Credentials']['SecretAccessKey'],
    aws_session_token=credentials['Credentials']['SessionToken']
)
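Any client created from that session then calls the child account; for example (a quick sketch, listing S3 buckets as a stand-in for whatever resources you need to enumerate):
# Clients built from the assumed-role session operate in the child account
s3 = session.client('s3')
for bucket in s3.list_buckets()['Buckets']:
    print(bucket['Name'])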
An easier way is to put the role in your .aws/config file as a new profile. Then, you can specify that profile when creating the session:
# In ~/.aws/credentials:
[master]
aws_access_key_id=foo
aws_secret_access_key=bar
# In ~/.aws/config
[profile child1]
role_arn=arn:aws:iam:...
source_profile=master
Use it like this:
session = boto3.session.Session(profile_name='child1')
s3 = session.client('s3')
See: How to choose an AWS profile when using boto3 to connect to CloudFront