How to download a file from a public AWS S3 bucket with Python?

I am trying to download files from a public AWS S3 bucket (linked from this website) with Python scripts, for example the first object on the link. I tried boto3 and got a NoCredentialsError:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('oedi-data-lake')
keys = []
for obj in bucket.objects.filter(Prefix='nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_tmy3_release_1/building_energy_models/upgrade=10/'):
    if obj.key.endswith('bldg0000001-up10.zip'):
        keys.append(obj.key)
print(keys)
I also found the post Download file/folder from Public AWS S3 with Python, no credentials, and tried the following:
import requests
headers = {'Host' : 'oedi-data-lake.s3.amazonaws.com'}
url = 'https://oedi-data-lake.s3.amazonaws.com/nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_tmy3_release_1/building_energy_models/upgrade=10/bldg0000001-up10.zip'
r = requests.get(url)
but got an SSLCertVerificationError.
Please help. :)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thank you, jhashimoto!
But when I do the following, I still get the NoCredentialsError:
import boto3
from botocore import UNSIGNED
from botocore.config import Config
s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
s3_client = boto3.client('s3')
s3_client.download_file('oedi-data-lake', 'nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_tmy3_release_1/building_energy_models/upgrade=10/bldg0000001-up10.zip', 'bldg1.zip')
I also read Can I use boto3 anonymously? and changed the code as below:
import boto3
from botocore import UNSIGNED
from botocore.config import Config
client = boto3.client('s3', aws_access_key_id='', aws_secret_access_key='')
client._request_signer.sign = (lambda *args, **kwargs: None)
client.download_file('oedi-data-lake', 'nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_tmy3_release_1/building_energy_models/upgrade=10/bldg0000001-up10.zip', 'bldg01.zip')
and got an SSLCertVerificationError.
Is this something caused by my company's security policy?
Sorry for the naive questions; I'm completely new to AWS.
Thank you so much.

To access a bucket that allows anonymous access, configure the client not to use credentials (unsigned requests).
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
bucket = s3.Bucket('oedi-data-lake')
keys = []
for obj in bucket.objects.filter(Prefix='nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_tmy3_release_1/building_energy_models/upgrade=10/'):
    if obj.key.endswith('bldg0000001-up10.zip'):
        keys.append(obj.key)
print(keys)
# output:
# ['nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_tmy3_release_1/building_energy_models/upgrade=10/bldg0000001-up10.zip']
From python - Can I use boto3 anonymously? - Stack Overflow:
Yes. Your credentials are used to sign all the requests you send out, so what you have to do is configure the client to not perform the signing step at all.
Note:
Unrelated to the main topic: the AWS Python SDK team does not intend to add new features to the resource interface. You can use the client interface instead.
Resources — Boto3 Docs 1.26.54 documentation
The AWS Python SDK team does not intend to add new features to the resources interface in boto3. Existing interfaces will continue to operate during boto3's lifecycle. Customers can find access to newer service features through the client interface.
Added at 2023/01/21 12:00:
This is a sample code using the client interface.
import boto3
from botocore import UNSIGNED
from botocore.config import Config
s3_client = boto3.client('s3', config=Config(signature_version=UNSIGNED))
s3_client.download_file(
    'oedi-data-lake',
    'nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_tmy3_release_1/building_energy_models/upgrade=10/bldg0000001-up10.zip',
    'bldg1.zip'
)
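For the listing part, a rough client-interface equivalent of the question's filter loop could use a paginator. This is only a sketch; the bucket name, prefix, and file-name suffix are taken from the question.

import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned client, same idea as above
s3_client = boto3.client('s3', config=Config(signature_version=UNSIGNED))

# List keys under the prefix and keep the ones matching the wanted file name
paginator = s3_client.get_paginator('list_objects_v2')
prefix = 'nrel-pds-building-stock/end-use-load-profiles-for-us-building-stock/2022/resstock_tmy3_release_1/building_energy_models/upgrade=10/'

keys = []
for page in paginator.paginate(Bucket='oedi-data-lake', Prefix=prefix):
    for obj in page.get('Contents', []):
        if obj['Key'].endswith('bldg0000001-up10.zip'):
            keys.append(obj['Key'])
print(keys)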

Related

How to upload a file from Vertex AI Workbench to AWS S3?

I have access to Google Cloud and AWS. I want to upload a file from Vertex AI Workbench to AWS S3. Is that possible, or is there an alternative way?
I have read some threads that might help and tried some code, but I still can't solve my problem; it raises an error:
Could not connect to the endpoint URL:
"https://xyz.s3.auto.amazonaws.com/uploaded.csv?uploads"
Here is my code:
import boto3
import os
import io

s3 = boto3.resource('s3')
key_id = "my_key"
access_key = "my_access_key"
client = boto3.client("s3", region_name="auto", aws_access_key_id=key_id, aws_secret_access_key=access_key)
client.upload_file(
    Filename="path_file.csv",
    Bucket="bucket_name",
    Key="uploaded.csv",
)
I think the issue here is that you're using region="auto", which AWS does not support. The region needs to be a real region because (as you can see in the error) it is used to build the endpoint URL.
Try it without that:
import boto3
import os
import io

s3 = boto3.resource('s3')
key_id = "my_key"
access_key = "my_access_key"
client = boto3.client("s3", aws_access_key_id=key_id, aws_secret_access_key=access_key)
client.upload_file(
    Filename="path_file.csv",
    Bucket="bucket_name",
    Key="uploaded.csv",
)
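If you do want to pin a region, pass a real one via region_name. A minimal sketch; the region below is only an illustrative value, use the bucket's actual region.

import boto3

key_id = "my_key"
access_key = "my_access_key"

# 'us-east-1' is only an example; S3 does not accept region="auto"
client = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id=key_id,
    aws_secret_access_key=access_key,
)
client.upload_file(Filename="path_file.csv", Bucket="bucket_name", Key="uploaded.csv")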

Boto3 is giving botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: on Linux server

The code below works on Windows, but on a Linux server it gives an error. I am able to reach the endpoint from the Linux server (the firewall allows it, and ping and telnet work).
import boto3
from botocore.client import Config

config = Config(connect_timeout=5, retries={'max_attempts': 0})
aws_access_key_id = "aws_access_key_id"
aws_secret_access_key = "aws_secret_access_key"
host = "http://s3path"
session = boto3.Session()
s1 = session.resource('s3', config=config)
s3 = boto3.client('s3', endpoint_url=host, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key, config=config)
# Print out the objects under the prefix
contents = s3.list_objects_v2(Bucket='bucket', MaxKeys=1000, Prefix='prefix')['Contents']
print(contents)
Error:
raise ConnectTimeoutError(endpoint_url=request.url, error=e)
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL:
I can't test using your code since you didn't include the parts that are failing, but I can tell you that you don't typically need endpoint_url. Here's the relevant part from the docs:
endpoint_url (string) -- The complete URL to use for the constructed client. Normally, botocore will automatically construct the appropriate URL to use when communicating with a service.
Here's my version of your code that works. Ran from inside AWS Cloud Shell.
import boto3
from botocore.client import Config
bucket_name = "your-bucket-name"
config = Config(connect_timeout=5, retries={'max_attempts': 0})
session = boto3.Session()
s3_client = session.client('s3', config=config)
# Print out the bucket's object listing
contents = s3_client.list_objects_v2(Bucket=bucket_name)['Contents']
print(contents)
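If you really are talking to an S3-compatible server rather than AWS (which the "http://s3path" placeholder suggests), endpoint_url is still the right parameter. A sketch under that assumption; the endpoint below is hypothetical and the credentials are the question's placeholders.

import boto3
from botocore.client import Config

config = Config(connect_timeout=5, retries={'max_attempts': 0})

# Hypothetical S3-compatible endpoint; replace with the real one
s3_client = boto3.client(
    's3',
    endpoint_url='https://s3.example.internal',
    aws_access_key_id='aws_access_key_id',
    aws_secret_access_key='aws_secret_access_key',
    config=config,
)
contents = s3_client.list_objects_v2(Bucket='bucket', MaxKeys=1000, Prefix='prefix')['Contents']
print(contents)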

Google Storage error: Bucket is requester pays bucket but no user project provided

I want to download files using "file.json", which includes all the URLs. I tried the Python code below, but I am getting this error:
"code":400,"message":"Bucket is requester pays bucket but no user project provided"
I already set up my billing details when I created the GCP account, and I really don't know how to solve this. I am not the owner of the bucket, but I do have permission to get data from it.
Python code:
import os
from google.cloud.bigquery.client import Client
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/key.json'
bq_client = Client()
import json
from google.cloud import storage
client = storage.Client(project='my_projectID')
bucket = client.get_bucket('the_bucket')
file_json = bucket.get_blob('file.json')
data = json.loads(file_json.download_as_string())
You need to provide the user_project in the request, which is a GCP project for which you have billing rights. The requests will then be charged to that project.
You can find a Python code sample here: https://cloud.google.com/storage/docs/using-requester-pays#using
bucket = storage_client.bucket(bucket_name, user_project=project_id)
See here for which permissions you need in the user_project: https://cloud.google.com/storage/docs/requester-pays#requirements
serviceusage.services.use
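Applied to the code in the question, that might look roughly like this. The project ID and bucket name are the question's placeholders, and the billed project must be one you hold billing rights on.

import json
from google.cloud import storage

client = storage.Client(project='my_projectID')
# user_project is the project billed for the requester-pays access
bucket = client.bucket('the_bucket', user_project='my_projectID')
file_json = bucket.get_blob('file.json')
data = json.loads(file_json.download_as_string())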

Using a local endpoint with boto2

I am trying to mock AWS S3 API calls using boto2.
I created a local S3 endpoint using localstack, and I can use it with boto3 easily as below:
import boto3
s3_client = boto3.client('s3', endpoint_url='http://localhost:4572')
bucket_name = 'my-bucket'
s3_client.create_bucket(Bucket=bucket_name)
But I did not find a way to do this using boto2. Is there any way, preferably via ~/.boto or ~/.aws/config?
I tried providing the endpoint with boto2, but it failed:
import boto
boto.s3.S3RegionInfo(name='test-s3-region', endpoint='http://127.0.0.1:4572/')
s3 = boto.s3.connect_to_region('test-s3-region')
print s3.get_bucket('test-poc')
error:
AttributeError: 'NoneType' object has no attribute 'get_bucket'
I am looking to use local endpoints for all AWS services for testing purpose.
This works for me:
import boto
from boto.s3.connection import S3Connection
region = boto.s3.S3RegionInfo(name='test-s3-region', endpoint='http://127.0.0.1:4572/', connection_cls=S3Connection)
conn = region.connect()
print conn.get_bucket('test-poc')
You need to set the connection_cls attribute, which is None by default.
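Once the connection is up, the usual boto2 calls work against the local endpoint. A quick usage sketch reusing conn from above; the bucket and key names here are just examples.

from boto.s3.key import Key

# Create a bucket in localstack and round-trip a small object
bucket = conn.create_bucket('test-poc')
key = Key(bucket, 'hello.txt')
key.set_contents_from_string('hello from localstack')
print key.get_contents_as_string()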

Sample of server-to-server authentication using OAuth 2.0 with Google APIs

This is a follow-up question to an earlier one.
I have successfully created a private key and have read the various pages of Google documentation on the concepts of server to server authentication.
I need to create a JWT to authorize my App Engine application (Python) to access the Google calendar and post events in the calendar. From the source in oauth2client it looks like I need to use oauth2client.client.SignedJwtAssertionCredentials to create the JWT.
What I'm missing at the moment is a stylised bit of sample Python code of the various steps involved to create the JWT and use it to authenticate my App Engine application for Google Calendar. Also, from SignedJwtAssertionCredentials source it looks like I need some App Engine compatible library to perform the signing.
Can anybody shed some light on this?
After some digging I found a couple of samples based on OAuth2 authentication. From these I cooked up the following simple sample that creates a JWT to access the Calendar API:
import httplib2
import pprint
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials
# Get the private key from the Google supplied private key file.
f = file("your_private_key_file.p12", "rb")
key = f.read()
f.close()
# Create the JWT
credentials = SignedJwtAssertionCredentials(
    "xxxxxxxxxx@developer.gserviceaccount.com", key,
    scope="https://www.googleapis.com/auth/calendar"
)
# Create an authorized http instance
http = httplib2.Http()
http = credentials.authorize(http)
# Create a service call to the calendar API
service = build("calendar", "v3", http=http)
# List all calendars.
lists = service.calendarList().list(pageToken=None).execute(http=http)
pprint.pprint(lists)
For this to work on Google App Engine you will need to enable PyCrypto for your app. This means adding the following to your app.yaml file:
libraries:
- name: pycrypto
  version: "latest"
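Since the original goal was to post events, a follow-on call with the same service object might look like this. The calendar ID and event fields are placeholders, and the target calendar must be shared with the service account's email for the insert to succeed.

# Insert a test event (placeholder values)
event = {
    "summary": "Test event",
    "start": {"dateTime": "2013-01-01T10:00:00Z"},
    "end": {"dateTime": "2013-01-01T11:00:00Z"},
}
created = service.events().insert(calendarId="primary", body=event).execute()
pprint.pprint(created)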
