I need to download a file from this URL https://desafio-rkd.s3.amazonaws.com/disney_plus_titles.csv with Python. I tried to do it with requests.get(), but it returns an "access denied" error. I understand that I have to authenticate. I have the access key and the secret key, but I do not know how to use them.
Can you help me, please?
The preferred way would be to use the boto3 library for Amazon S3. It has a download_file() method, which you would use like this:
import boto3
s3_client = boto3.client('s3')
s3_client.download_file('desafio-rkd', 'disney_plus_titles.csv', 'disney_plus_titles.csv')
The parameters are: Bucket, Key, and the local filename to use when saving the file.
Also, you will need to provide an Access Key and Secret Key. The preferred way to do this is to store them in a credentials file. This can be done by using the AWS Command-Line Interface (CLI) aws configure command.
See: Credentials — Boto3 documentation
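If you would rather not use a credentials file, boto3 also accepts the keys directly when the client is created. A minimal sketch (the key values below are placeholders, not real credentials):

import boto3

# Passing credentials explicitly instead of relying on a credentials file
s3_client = boto3.client(
    's3',
    aws_access_key_id='AKIA...',     # placeholder
    aws_secret_access_key='...',     # placeholder
)
s3_client.download_file('desafio-rkd', 'disney_plus_titles.csv', 'disney_plus_titles.csv')

The credentials file is still preferable, since it keeps secrets out of your source code.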
Related
In my Python code I need to extract the AWS credentials AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID, which are stored in a plain text file as described here:
https://docs.aws.amazon.com/sdkref/latest/guide/file-format.html
I know the name of the file (from AWS_SHARED_CREDENTIALS_FILE) and the name of the profile (from AWS_PROFILE).
My current approach is to read and parse this file in Python myself to get AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID, but I hope there is already a standard way to do this using boto3 or some other library.
Any suggestions?
Would something like this work for you, or am I misunderstanding the question? Basically start a session for the appropriate profile (or the default, I guess), and then query those values from the credentials object:
import boto3

# Use the appropriate profile name, or omit profile_name for the default profile
session = boto3.Session(profile_name='your-profile')
credentials = session.get_credentials()
print("AWS_ACCESS_KEY_ID = {}".format(credentials.access_key))
print("AWS_SECRET_ACCESS_KEY = {}".format(credentials.secret_key))
print("AWS_SESSION_TOKEN = {}".format(credentials.token))
As far as I understand, the AWS credentials file uses the standard INI file format, so you can use configparser to parse it easily. Please refer to: https://docs.python.org/3/library/configparser.html.
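A minimal sketch of that approach, assuming the usual default locations when the environment variables are not set:

import configparser
import os

# Fall back to the standard path and profile when the env vars are absent
path = os.environ.get("AWS_SHARED_CREDENTIALS_FILE",
                      os.path.expanduser("~/.aws/credentials"))
profile = os.environ.get("AWS_PROFILE", "default")

parser = configparser.ConfigParser()
parser.read(path)
print(parser[profile]["aws_access_key_id"])
print(parser[profile]["aws_secret_access_key"])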
For boto3, if you put the credentials in the standard locations, it will load them automagically.
Boto3 will look in several locations when searching for credentials. The mechanism is to search through a list of possible locations and stop as soon as it finds credentials. The order in which Boto3 searches for credentials is:
Passing credentials as parameters in the boto3.client() method
Passing credentials as parameters when creating a Session object
Environment variables
Shared credential file (~/.aws/credentials)
AWS config file (~/.aws/config)
Assume Role provider
Boto2 config file (/etc/boto.cfg and ~/.boto)
Instance metadata service on an Amazon EC2 instance that has an IAM role configured.
Reference: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
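For example, the environment-variable option from the list above can be exercised from Python before the client is created; a sketch with placeholder key values:

import os
import boto3

# Placeholders only; boto3 reads these variables when the client is created
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA..."
os.environ["AWS_SECRET_ACCESS_KEY"] = "..."

s3 = boto3.client("s3")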
I have a file that is stored on Amazon S3 at https://xyz.s3.amazonaws.com/foo/file.json and I want to download it to my local machine using Python. However, the URL cannot be accessed publicly. I have the Account ID, IAM user name, and password (but NO Access Key or Secret Access Key, and no permissions to view or change them either) for the resource that contains this file. How can I programmatically download this file instead of doing so from the console?
You could generate an Amazon S3 pre-signed URL, which would allow a private object to be downloaded from Amazon S3 via a normal HTTPS call (e.g. curl). This can be done easily using the AWS SDK for Python, or you could code it yourself without using libraries. (Answer by John Rotenstein.)
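A minimal sketch with the SDK, using the bucket and key from the question; note that whatever credentials sign the URL must themselves be allowed to read the object:

import boto3

s3 = boto3.client('s3')

# Generate a URL that stays valid for one hour
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'xyz', 'Key': 'foo/file.json'},
    ExpiresIn=3600,
)
print(url)  # can now be fetched with curl, a browser, or requests.get()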
I am using the Python Google Storage Client; however, I am using a bucket with public read/write access. (I know this is usually a terrible idea, but I have a rare use case where it is fine.)
When I try to retrieve some files, I get a DefaultCredentialsError.
from google.cloud import storage

BUCKET_NAME = 'my-public-bucket-name'

storage_client = storage.Client()
bucket = storage_client.get_bucket(BUCKET_NAME)

def list_blobs(prefix, delimiter=None):
    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)
    print('Blobs:')
    for blob in blobs:
        print(blob.name)
The specific error reads:
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
That page suggests using OAuth or other tokens, but I shouldn't need these since my bucket is public? I can make an HTTP request to the bucket in Chrome and receive data.
How should I get around this issue? Can I provide default or null credentials?
The default for a storage client with no parameters is to use environment credentials (e.g. authenticate with the gcloud tools first). If you want a client with no credentials, you have to use the create_anonymous_client() method, which lets you access resources available to allUsers.
Be careful which APIs you use, though; not all of them support anonymous credentials. E.g. instead of client.get_bucket('my-bucket') you have to use client.bucket(bucket_name='my-bucket').
Also note that it seems any permissions error returns a generic ValueError: Anonymous credentials cannot be refreshed. This happens, e.g., if you try to overwrite an existing file while only having read/write permissions.
So a full example of uploading a file to a publicly accessible bucket is:
from google.cloud import storage
client = storage.Client.create_anonymous_client()
bucket = client.bucket(bucket_name='my-public-bucket')
blob = bucket.blob('my-file')
blob.upload_from_filename('my-local-file')
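The read case from the question looks much the same; a sketch assuming allUsers has read access to the object (names are placeholders):

from google.cloud import storage

client = storage.Client.create_anonymous_client()
bucket = client.bucket(bucket_name='my-public-bucket')

# Download a single public object to local disk
blob = bucket.blob('my-file')
blob.download_to_filename('my-local-file')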
From "Cloud Storage Authentication":
Most of the operations you perform in Cloud Storage must be authenticated. The only exceptions are operations on objects that allow anonymous access. Objects are anonymously accessible if the allUsers group has READ permission. The allUsers group includes anyone on the Internet.
I have been given a bucket name, with an ARN as below:
arn:aws:iam::<>:user/user-name
I was also given an access key.
I know that this can be done using boto.
Connect to s3 bucket using IAM ARN in boto3
As in the above link, do I need to use 'sts'?
If so, why am I provided with an access key?
First, I recommend you install the AWS Command-Line Interface (CLI), which provides a command-line for accessing AWS.
You can then store your credentials in a configuration file by running:
aws configure
It will prompt you for the Access Key and Secret Key, which will be stored in a config file.
Then, you will want to refer to S3 — Boto 3 documentation to find out how to access Amazon S3 from Python.
Here's some sample code:
import boto3
client = boto3.client('s3', region_name='ap-southeast-2')  # Change as appropriate
client.upload_file('/tmp/hello.txt', 'mybucket', 'hello.txt')
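Downloading works the same way in reverse; a self-contained sketch with the same placeholder names:

import boto3

client = boto3.client('s3', region_name='ap-southeast-2')  # Change as appropriate

# The reverse of upload_file: Bucket, Key, then the local filename to save to
client.download_file('mybucket', 'hello.txt', '/tmp/hello-downloaded.txt')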
I have my data on Google Cloud Platform and I want to be able to download it locally. This is my first time trying that, and eventually I'll use the downloaded data with my Python code.
I have checked the docs, like https://cloud.google.com/genomics/downloading-credentials-for-api-access and https://cloud.google.com/storage/docs/cloud-console. I have successfully got the JSON file for my first link; the second one is where I'm struggling. I'm using Python 3.5, and assuming my JSON file's name is data.json, I have added the following code:
os.environ["file"] = "data.json"
urllib.request.urlopen('https://storage.googleapis.com/[bucket_name]/[filename]')
First of all, I don't even know what I should call the value near environ, so I just called it file; I'm not sure how I'm supposed to fill it. And I got access denied on the second line; obviously it's not how to download my file, as there is no destination local repository or anything in that command. Any guidance will be appreciated.
Edit:
from google.cloud.storage import Blob
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "credentials/client_secret.json"
storage_client = storage.Client.from_service_account_json('service_account.json')
client = storage.Client(project='my-project')
bucket = client.get_bucket('my-bucket')
blob = Blob('path/to/my-object', bucket)
download_to_filename('local/path/to/my-file')
I'm getting an unresolved reference for storage and download_to_filename. Also, should I replace service_account.json with credentials/client_secret.json? Plus, I tried to print the content of os.environ["GOOGLE_APPLICATION_CREDENTIALS"]['installed'] like I'd do with any JSON, but it just said I should give numbers, meaning it read the input path as regular text only.
You should use the idiomatic Google Cloud library to run operations in GCS.
With the example there, and knowing that the client library will get the application default credentials, first we have to set the application default credentials with:
gcloud auth application-default login
===EDIT===
That was the old way. Now you should use the instructions in this link.
This means downloading a service account key file from the console, and setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the downloaded JSON.
Also, make sure that this service account has the proper permissions on the project of the bucket.
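In Python, that typically means pointing the variable at the downloaded key file before constructing the client; a sketch with a hypothetical path:

import os
from google.cloud import storage

# Hypothetical path to the downloaded service account key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service_account.json"

client = storage.Client(project='project-id')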
Or you can create the client with explicit credentials. You'll need to download the key file all the same, but when creating the client, use:
storage_client = storage.Client.from_service_account_json('service_account.json')
==========
And then, following the example code:
from google.cloud import storage
client = storage.Client(project='project-id')
bucket = client.get_bucket('bucket-id')
blob = storage.Blob('bucket/file/path', bucket)
blob.download_to_filename('/path/to/local/save')
Or, if this is a one-off download, just install the SDK and use gsutil to download:
gsutil cp gs://bucket/file .