I have a command using the AWS CLI
aws s3 cp y:/mydatafolder s3://<bucket>/folder/subfolder --recursive --grants read=uri=http://policyurl
The first part is easy to do in Python: I can use os.walk to walk the folders, collect the files, and upload each one with boto3's S3 client upload_file method. The part I'm struggling with is the --grants read option.
What boto3 function do I need to call to do this?
Thanks
In your upload_file call you will need to pass GrantRead in ExtraArgs. Generally speaking, you can pass any argument that put_object would take this way. For example:
import boto3
s3 = boto3.client('s3')
file_path = '/foo/bar/baz.txt'
s3.upload_file(
    Filename=file_path,
    Bucket='<bucket>',
    Key='key',
    ExtraArgs={
        'GrantRead': 'uri="http://policyurl"'
    }
)
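If it helps, here is a rough sketch of the recursive walk you described, combined with the grant. It reuses the placeholder bucket, prefix, and grant URI from the question and is only an illustration, not a drop-in replacement for aws s3 cp --recursive:

import os
import boto3

s3 = boto3.client('s3')
local_root = 'y:/mydatafolder'   # placeholder local folder from the question
prefix = 'folder/subfolder'      # destination prefix inside the bucket

for dirpath, dirnames, filenames in os.walk(local_root):
    for name in filenames:
        file_path = os.path.join(dirpath, name)
        # Build the key from the path relative to the local root so the
        # folder structure is preserved under the prefix.
        rel = os.path.relpath(file_path, local_root).replace(os.sep, '/')
        s3.upload_file(
            Filename=file_path,
            Bucket='<bucket>',
            Key=f'{prefix}/{rel}',
            ExtraArgs={'GrantRead': 'uri="http://policyurl"'}
        )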
My project needs to download quite a few files regularly before doing treatment on them.
I tried coding it directly in Python but it's horribly slow considering the amount of data in the buckets.
I decided to use a subprocess running aws-cli because boto3 still doesn't have a sync functionality. I know using a subprocess with aws-cli is not ideal, but it really is useful and works extremely well out of the box.
One of the perks of aws-cli is the fact that I can see the progress in stdout, which I am getting with the following code:
import os
import subprocess
from pathlib import Path

def download_bucket(bucket_url, dir_name, dest):
    """Download all the files from a bucket into a directory."""
    path = Path(dest) / dir_name
    bucket_dest = str(os.path.join(bucket_url, dir_name))
    with subprocess.Popen(["aws", "s3", "sync", bucket_dest, path],
                          stdout=subprocess.PIPE, bufsize=1,
                          universal_newlines=True) as p:
        for line in p.stdout:
            print(line, end='')
    # Popen's context manager waits for the process, so returncode is set here.
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, p.args)
Now, I want to make sure that I test this function, but I am blocked because I don't know the best way to test this kind of freakish behavior:
Am I supposed to actually create a fake local s3 bucket so that aws s3 sync can hit it?
Am I supposed to mock the subprocess call and not actually call my download_bucket function?
Up to now, my attempt has been to create a fake bucket and pass it to my download_bucket function.
This way, I thought that aws s3 sync would still work, albeit locally:
def test_download_s3(tmpdir):
    tmpdir.join('frankendir').ensure()
    with mock_s3():
        conn = boto3.resource('s3', region_name='us-east-1')
        conn.create_bucket(Bucket='cool-bucket.us-east-1.dev.000000000000')
        s3 = boto3.client('s3', region_name='us-east-1')
        s3.put_object(Bucket='cool-bucket.us-east-1.dev.000000000000', Key='frankendir', Body='has no files')
        body = conn.Object('cool-bucket.us-east-1.dev.000000000000', 'frankendir').get()['Body'].read().decode('utf-8')
        download_bucket('s3://cool-bucket.us-east-1.dev.000000000000', 'frankendir', tmpdir)
        # assert tmpdir.join('frankendir').join('has not files').exists()
        assert body == 'has no files'
But I get the following error: fatal error: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.
My questions are the following:
Should I continue to pursue this creation of a fake local s3 bucket?
If so, how am I supposed to get the credentials to work?
Should I just mock the subprocess call and how?
I am having a hard time understanding how mocking works and how it's supposed to be done. From my understanding, I would just fake a call to aws s3 sync and return some files?
Is there another kind of unit test that would be enough that I didn't think of?
After all, I just want to know whether, when I pass a well-formed s3://bucketurl, a dir in that bucket, and a local dir, the files contained within s3://bucketurl/dir are downloaded to my local dir.
Thank you for your help, I hope that I am not all over the place.
A much better approach is to use moto when faking/testing S3. You can check out their documentation or look at a test code example I wrote: https://github.com/pksol/pycon-go-beyond-mocks/blob/main/test_s3_fake.py.
If you have a few minutes, you can view this short video of me explaining the benefits of using moto vs trying to mock.
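For reference, here is a minimal sketch of a moto-based test. It assumes the download logic is exercised through boto3 directly, since moto patches boto3 in-process and an aws CLI subprocess will not see the fake bucket (which is why your test got InvalidAccessKeyId). The bucket and key names are made up:

import boto3
from moto import mock_s3  # in moto >= 5 this decorator is called mock_aws

@mock_s3
def test_download_one_file(tmp_path):
    s3 = boto3.client('s3', region_name='us-east-1')
    s3.create_bucket(Bucket='cool-bucket')
    s3.put_object(Bucket='cool-bucket', Key='frankendir/file.txt', Body='has no files')

    # Download through boto3, which moto intercepts, instead of shelling out
    # to aws s3 sync.
    dest = tmp_path / 'frankendir'
    dest.mkdir()
    s3.download_file('cool-bucket', 'frankendir/file.txt', str(dest / 'file.txt'))

    assert (dest / 'file.txt').read_text() == 'has no files'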
I want to get all the files that are in my bucket with Python. I am trying it this way:
import ibm_boto3
from ibm_botocore.client import Config, ClientError
files = cos.Object(my_bucket_name).objects.all() # error here
But it shows this error:
ValueError (note: full exception trace is shown but execution is paused at: <module>)
Required parameter key not set
How do I get all objects/files from the bucket?
Sorry, I was doing it wrong. The correct way is like this:
files = cos.Bucket(bucket_name).objects.all()
Problem solved!
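For completeness, a short sketch of iterating over the result; the ibm_boto3 resource API mirrors boto3, so each item should be an object summary with a key attribute:

for obj in cos.Bucket(bucket_name).objects.all():
    print(obj.key)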
I am trying to temporarily store files within a function app. The function is triggered by an http request which contains a file name. I first check if the file is within the function app storage and if not, I write it into the storage.
Like this:
if local_path.exists():
    file = json.loads(local_path.read_text('utf-8'))
else:
    s = AzureStorageBlob(account_name=account_name, container=container,
                         file_path=file_path, blob_contents_only=True,
                         mi_client_id=mi_client_id, account_key=account_key)
    local_path.write_bytes(s.read_azure_blob())
    file = json.loads((s.read_azure_blob()).decode('utf-8'))
This is how I get the local path:
root = pathlib.Path(__file__).parent.absolute()
file_name = pathlib.Path(file_path).name
local_path = root.joinpath('file_directory').joinpath(file_name)
If I upload the files when deploying the function app, everything works as expected and I read the files from the function app storage. But if I try to save or cache a file, it breaks and gives me this error: Error: [Errno 30] Read-only file system:
Any help is greatly appreciated
So after a year I discovered that Azure Function Apps can use the Python os module to save files that persist across calls to the function app. No need to implement any sort of caching logic. Simply use os.write() and it works just fine.
I did have to implement a function to clear the cache after a certain period of time, but that was a much better alternative to the verbose code I had previously.
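For illustration, a minimal sketch of that approach, assuming the AzureStorageBlob wrapper (s) and file_name from the question; the cache location under the temp directory is my own choice, since the temp directory is typically writable on a Function App, unlike the deployed package directory:

import json
import os
import tempfile

# Hypothetical cache location under the writable temp directory.
cache_path = os.path.join(tempfile.gettempdir(), 'file_cache', file_name)
os.makedirs(os.path.dirname(cache_path), exist_ok=True)

if os.path.exists(cache_path):
    with open(cache_path, 'rb') as f:
        blob_bytes = f.read()
else:
    blob_bytes = s.read_azure_blob()      # s is the AzureStorageBlob from the question
    fd = os.open(cache_path, os.O_WRONLY | os.O_CREAT)
    os.write(fd, blob_bytes)              # low-level os.write, as mentioned above
    os.close(fd)

file = json.loads(blob_bytes.decode('utf-8'))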
I am trying to get a pickle file from an S3 resource using the Object.get() method of the boto3 library from several processes simultaneously. This causes my program to get stuck on one of the processes (no exception is raised and the program does not continue to the next line).
I tried to add a "Config" variable to the S3 connection. That didn't help.
import os
import pickle
import boto3
from botocore.client import Config

s3_item = _get_s3_name(descriptor_key)  # Returns the path string of the desired file
config = Config(connect_timeout=5, retries={'max_attempts': 0})
s3 = boto3.resource('s3', config=config)
bucket_uri = os.environ.get(*ct.S3_MICRO_SERVICE_BUCKET_URI)  # Returns a string of the bucket URI
estimator_factory_logger.debug(f"Calling s3 with item {s3_item} from URI {bucket_uri}")
model_file_from_s3 = s3.Bucket(bucket_uri).Object(s3_item)
estimator_factory_logger.debug("Loading bytes...")
model_content = model_file_from_s3.get()['Body'].read()  # <- Program gets stuck here
estimator_factory_logger.debug("Loading from pickle...")
est = pickle.loads(model_content)
No error message is raised. It seems that the get method is stuck in a deadlock.
Your help will be much appreciated.
Is there a possibility that one of the files in the bucket is just huge and the program takes a long time to read it?
If that's the case, as a debugging step I'd look into the model_file_from_s3.get()['Body'] object, which is a botocore.response.StreamingBody object, and use set_socket_timeout() on it to try and force a timeout.
https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html
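As a concrete sketch of that debugging step (the 30-second value is arbitrary):

response = model_file_from_s3.get()
body = response['Body']        # botocore.response.StreamingBody
body.set_socket_timeout(30)    # fail with a timeout instead of hanging indefinitely
model_content = body.read()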
The problem was that we created a subprocess after our main process had opened several threads. Apparently, this is a big no-no on Linux.
We fixed it by using "spawn" instead of "fork".
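A minimal sketch of that fix using the standard library; the bucket and key names are placeholders, and the point is that the boto3 resource is created inside a process started with the "spawn" method:

import multiprocessing as mp
import boto3

def child_task(bucket, key):
    # Build the boto3 resource inside the child, not before starting it.
    s3 = boto3.resource('s3')
    body = s3.Object(bucket, key).get()['Body'].read()
    print(len(body))

if __name__ == '__main__':
    # "spawn" starts a fresh interpreter instead of fork-copying the parent's
    # threads and locks, which is what caused the hang described above.
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=child_task, args=('my-bucket', 'model.pkl'))
    p.start()
    p.join()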
I've used boto to interact with S3 with no problems, but now I'm attempting to connect to the AWS Support API to pull back info on open tickets, Trusted Advisor results, etc. It seems that the boto library has different connect methods for each AWS service? For example, with S3 it is:
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
According to the boto docs, the following should work to connect to AWS Support API:
>>> from boto.support.connection import SupportConnection
>>> conn = SupportConnection('<aws access key>', '<aws secret key>')
However, there are a few problems I see after digging through the source code. First, boto.support.connection doesn't actually exist. boto.connection does, but it doesn't contain a class SupportConnection. boto.support.layer1 exists, and DOES have the class SupportConnection, but it doesn't accept key arguments as the docs suggest. Instead it takes one argument - an AWSQueryConnection object. That class is defined in boto.connection. AWSQueryConnection takes one argument - an AWSAuthConnection object, a class also defined in boto.connection. Lastly, AWSAuthConnection takes a generic object, with requirements defined in __init__ as:
class AWSAuthConnection(object):
    def __init__(self, host, aws_access_key_id=None,
                 aws_secret_access_key=None,
                 is_secure=True, port=None, proxy=None, proxy_port=None,
                 proxy_user=None, proxy_pass=None, debug=0,
                 https_connection_factory=None, path='/',
                 provider='aws', security_token=None,
                 suppress_consec_slashes=True,
                 validate_certs=True, profile_name=None):
So, for kicks, I tried creating an AWSAuthConnection by passing keys, followed by AWSQueryConnection(awsauth), followed by SupportConnection(awsquery), with no luck. This was inside a script.
Last item of interest is that, with my keys defined in a .boto file in my home directory, and running python interpreter from the command line, I can make a direct import and call to SupportConnection() (no arguments) and it works. It clearly is picking up my keys from the .boto file and consuming them but I haven't analyzed every line of source code to understand how, and frankly, I'm hoping to avoid doing that.
Long story short, I'm hoping someone has some familiarity with boto and connecting to AWS API's other than S3 (the bulk of material that exists via google) to help me troubleshoot further.
This should work:
import boto.support
conn = boto.support.connect_to_region('us-east-1')
This assumes you have credentials in your boto config file or in an IAM Role. If you want to pass explicit credentials, do this:
import boto.support
conn = boto.support.connect_to_region('us-east-1', aws_access_key_id="<access key>", aws_secret_access_key="<secret key>")
This basic incantation should work for all services in all regions. Just import the correct module (e.g. boto.support or boto.ec2 or boto.s3 or whatever) and then call its connect_to_region method, supplying the name of the region you want as a parameter.
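If it helps, a short sketch of what you can do with the support connection afterwards; method names like describe_cases and describe_trusted_advisor_checks are boto's layer1 names as I recall them, so double-check them against your boto version:

import boto.support

conn = boto.support.connect_to_region('us-east-1')

# Open support tickets
cases = conn.describe_cases()

# Trusted Advisor checks
checks = conn.describe_trusted_advisor_checks('en')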