store files on s3 from local storage ckan


I am using CKAN version 2.3 and currently use local storage.
I want to store the resources in Amazon S3 buckets using boto.
I did the configuration in the production.ini file for CKAN.
Files are not getting stored on S3, and there are no exceptions or errors
when uploading via the GUI.

You need to set up your file uploads to point to S3 in the config file, e.g. /etc/ckan/default/development.ini:
## OFS configuration
ofs.impl = s3
ofs.aws_access_key_id = xxxxxxxxx
ofs.aws_secret_access_key = xxxxxxxx
## bucket to use in storage:
ckanext.storage.bucket = your_s3_bucket
Save and exit.
Run the following on the command line to migrate
from local file storage to remote storage:
paster db migrate-filestore
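Since the upload fails silently, it may help to confirm that these settings are really present in the ini file CKAN is started with (the question mentions production.ini, the answer development.ini). A minimal sketch using only the standard library, assuming the keys sit in the [app:main] section; the helper name missing_s3_settings is made up for illustration:

```python
import configparser

# Keys taken from the answer above.
REQUIRED_KEYS = (
    "ofs.impl",
    "ofs.aws_access_key_id",
    "ofs.aws_secret_access_key",
    "ckanext.storage.bucket",
)


def missing_s3_settings(ini_path, section="app:main"):
    """Return the required keys that are absent or empty in the ini file."""
    # interpolation=None: ini files for CKAN may contain raw '%' characters.
    # strict=False: tolerate duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read(ini_path)  # a missing file is silently treated as empty
    if not parser.has_section(section):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]
```

Calling missing_s3_settings("/etc/ckan/default/production.ini") returns an empty list when all four keys are set. It only checks that the keys exist, not that the credentials or bucket are valid.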

Related

AWS Change Credential file Location?

My .aws/credentials file is in a different location than the default one. How do I specify a different location?
from boto3 import Session
# Create a client using the credentials and region defined in the [adminuser]
# section of the AWS credentials file (~/.aws/credentials).
session = Session(profile_name="adminuser")
polly = session.client("polly")

It appears you are using the Python boto3 library.
The shared credentials file has a default location of
~/.aws/credentials. You can change the location of the shared
credentials file by setting the AWS_SHARED_CREDENTIALS_FILE
environment variable, e.g.
export AWS_SHARED_CREDENTIALS_FILE=mycustompath
Then run your Python script.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#:~:text=The%20shared%20credentials%20file%20has,section%20names%20corresponding%20to%20profiles.
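The same variable can also be set from inside Python, as long as that happens before the session is created. A minimal sketch: the helper names are made up, any path you pass in is your own, and adminuser is the profile from the question. boto3 is imported inside the second function so the first one can be tried without it installed.

```python
import os


def use_credentials_file(path):
    """Point boto3/botocore at a credentials file in a non-default location."""
    path = os.path.abspath(os.path.expanduser(path))
    # Set this before creating a Session or client.
    os.environ["AWS_SHARED_CREDENTIALS_FILE"] = path
    return path


def polly_client(credentials_path, profile="adminuser"):
    """Create the Polly client from the question using the custom file."""
    import boto3  # third-party; imported here so the helper above needs only the stdlib

    use_credentials_file(credentials_path)
    session = boto3.Session(profile_name=profile)
    return session.client("polly")
```

Setting the variable in the shell, as shown above, has the advantage that the AWS CLI picks it up too.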

how to include python code as a part of cloud formation template in aws?

I have a Python project that does some ETL work, processing XML/JSON documents. The sample below is connection.py, one of the files in my project; a few other files read and process the XML documents. I want to include the project code in a CloudFormation template as a Lambda function. How can I do this?
file - connection.py
import json
from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
my_eshost = 'search-sampledomain-bqfy4dd5xuljut33l6jdz7gkqi.us-east-1.es.amazonaws.com'
aws_auth = AWS4Auth( '*******','******', 'us-east-1', 'es')
es = Elasticsearch(hosts=[{'host': my_eshost, 'port': 443}],
                   http_auth=aws_auth, use_ssl=True, verify_certs=True,
                   connection_class=RequestsHttpConnection)
print(json.dumps(es.info()))

I think the following should be helpful:
Uploading Local Artifacts to an S3 Bucket
Among other things, this lets you use local files, e.g. connection.py, as the Lambda function body. When you package your stack for upload to CloudFormation, the AWS CLI automatically uploads your file to S3 and uses it as the source for the function:
If you specify a file, the command directly uploads it to the S3 bucket. After uploading the artifacts, the command returns a copy of your template, replacing references to local artifacts with the S3 location where the command uploaded the artifacts. Then, you can use the returned template to create or update a stack.
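To make that concrete, here is a rough sketch that builds a minimal template in which the function's Code property is a local path. Everything specific in it is an assumption for illustration: the logical ID EtlFunction, the ./src folder, the role ARN, the runtime, and a handler(event, context) function in connection.py (the file in the question has no handler yet).

```python
import json


def lambda_template(code_path, handler, role_arn, runtime="python3.12"):
    """Build a minimal CloudFormation template whose Lambda code is a local path."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "EtlFunction": {  # hypothetical logical ID
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    # Local file or folder; `aws cloudformation package`
                    # uploads it and rewrites this value to an S3 location.
                    "Code": code_path,
                    "Handler": handler,
                    "Runtime": runtime,
                    "Role": role_arn,
                },
            }
        },
    }


template = lambda_template(
    code_path="./src",  # hypothetical folder holding connection.py and the other modules
    handler="connection.handler",  # assumes connection.py defines handler(event, context)
    role_arn="arn:aws:iam::123456789012:role/etl-lambda-role",  # placeholder ARN
)
template_json = json.dumps(template, indent=2)
```

After saving template_json as template.json, something like `aws cloudformation package --template-file template.json --s3-bucket my-artifact-bucket --output-template-file packaged.json` should produce the rewritten template (the bucket name is a placeholder). Third-party packages such as elasticsearch and requests_aws4auth would have to be placed in the same folder, since they are not part of the Lambda Python runtime.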

Transfer data from a S3 bucket to a GCP bucket using temporary credentials

I would like to download a public dataset from the NIMH Data Archive. After creating an account on their website and accepting their Data Usage Agreement, I can download a CSV file which contains the path to all the files in the dataset I am interested in. Each path is of the form s3://NDAR_Central_1/....
1 Download on my personal computer
In the NDA Github repository, the nda-tools Python library exposes some useful Python code to download those files to my own computer. Say I want to download the following file:
s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz
Given my username (USRNAME) and password (PASSWD) (the ones I used to create my account on the NIMH Data Archive), the following code allows me to download this file to TARGET_PATH on my personal computer:
from NDATools.clientscripts.downloadcmd import configure
from NDATools.Download import Download
config = configure(username=USRNAME, password=PASSWD)
s3Download = Download(TARGET_PATH, config)
target_fnames = ['s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz']
s3Download.get_links('paths', target_fnames, filters=None)
s3Download.get_tokens()
s3Download.start_workers(False, None, 1)
Under the hood, the get_tokens method of s3Download uses USRNAME and PASSWD to generate a temporary access key, secret key and security token. Then, the start_workers method uses the boto3 and s3transfer Python libraries to download the selected file.
Everything works fine!
2 Download to a GCP bucket
Now, say I created a project on GCP and would like to directly download this file to a GCP bucket.
Ideally, I would like to do something like:
gsutil cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket
To do this, I execute the following Python code in the Cloud Shell (by running python3):
from NDATools.TokenGenerator import NDATokenGenerator
data_api_url = 'https://nda.nih.gov/DataManager/dataManager'
generator = NDATokenGenerator(data_api_url)
token = generator.generate_token(USRNAME, PASSWD)
This gives me the access key, the secret key and the session token. Indeed, in the following,
ACCESS_KEY refers to the value of token.access_key,
SECRET_KEY refers to the value of token.secret_key,
SECURITY_TOKEN refers to the value of token.session.
Then, I set these credentials as environment variables in the Cloud Shell:
export AWS_ACCESS_KEY_ID=[copy-paste ACCESS_KEY here]
export AWS_SECRET_ACCESS_KEY=[copy-paste SECRET_KEY here]
export AWS_SECURITY_TOKEN=[copy-paste SECURITY_TOKEN here]
Eventually, I also set up the .boto configuration file in my home. It looks like this:
[Credentials]
aws_access_key_id = $AWS_ACCESS_KEY_ID
aws_secret_access_key = $AWS_SECRET_ACCESS_KEY
aws_session_token = $AWS_SECURITY_TOKEN
[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat
use-sigv4=True
host=s3.us-east-1.amazonaws.com
When I run the following command:
gsutil cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket
I end up with:
AccessDeniedException: 403 AccessDenied
The full traceback is below:
Non-MD5 etag ("a21a0b2eba27a0a32a26a6b30f3cb060-6") present for key <Key: NDAR_Central_1,submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz>, data integrity checks are not possible.
Copying s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz [Content-Type=application/x-gzip]...
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/google/google-cloud-sdk/platform/gsutil/gslib/daisy_chain_wrapper.py", line 213, in PerformDownload
    decryption_tuple=self.decryption_tuple)
  File "/google/google-cloud-sdk/platform/gsutil/gslib/cloud_api_delegator.py", line 353, in GetObjectMedia
    decryption_tuple=decryption_tuple)
  File "/google/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 590, in GetObjectMedia
    generation=generation)
  File "/google/google-cloud-sdk/platform/gsutil/gslib/boto_translation.py", line 1723, in _TranslateExceptionAndRaise
    raise translated_exception # pylint: disable=raising-bad-type
AccessDeniedException: AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>A93DBEA60B68E04D</RequestId><HostId>Z5XqPBmUdq05btXgZ2Tt7HQMzodgal6XxTD6OLQ2sGjbP20AyZ+fVFjbNfOF5+Bdy6RuXGSOzVs=</HostId></Error>
AccessDeniedException: 403 AccessDenied
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>A93DBEA60B68E04D</RequestId><HostId>Z5XqPBmUdq05btXgZ2Tt7HQMzodgal6XxTD6OLQ2sGjbP20AyZ+fVFjbNfOF5+Bdy6RuXGSOzVs=</HostId></Error>
I would like to be able to directly download this file from a S3 bucket to my GCP bucket (without having to create a VM, setup Python and run the code above [which works]). Why is it that the temporary generated credentials work on my computer but do not work in GCP Cloud Shell?
The complete log of the debug command
gsutil -DD cp s3://NDAR_Central_1/submission_13364/00m/0.C.2/9007827/20041006/10263603.tar.gz gs://my-bucket
can be found here.

The procedure you are trying to implement is called a "Transfer Job".
To transfer a file from an Amazon S3 bucket to a Cloud Storage bucket:
A. Click the hamburger menu in the top left corner.
B. Go to Storage > Transfer.
C. Click Create Transfer.
D. Under Select source, select Amazon S3 bucket.
E. In the Amazon S3 bucket text box, specify the source Amazon S3 bucket name. The bucket name is the name as it appears in the AWS Management Console.
F. In the respective text boxes, enter the Access key ID and Secret key associated with the Amazon S3 bucket.
G. To specify a subset of files in your source, click Specify file filters beneath the bucket field. You can include or exclude files based on file name prefix and file age.
H. Under Select destination, choose a sink bucket or create a new one. To choose an existing bucket, enter the name of the bucket (without the prefix gs://), or click Browse and browse to it. To transfer files to a new bucket, click Browse and then click the New bucket icon.
I. Enable overwrite/delete options if needed. By default, your transfer job only overwrites an object when the source version is different from the sink version. No other objects are overwritten or deleted. Enable additional overwrite/delete options under Transfer options.
J. Under Configure transfer, schedule your transfer job to Run now (one time) or Run daily at the local time you specify.
K. Click Create.
Before setting up the Transfer Job, please make sure your account has the necessary roles and the required permissions described here.
Also note that the Storage Transfer Service is currently available only for certain Amazon S3 regions, listed under the AMAZON S3 tab of the Setting up a transfer job page.
Transfer jobs can also be created programmatically. More information here.
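For the programmatic route, the job is described by a request body passed to the Storage Transfer Service's transferJobs.create method. A sketch of that body as a plain dictionary is below; the field names follow the REST resource as I understand it, the helper name and all argument values are placeholders, and nothing is submitted to the API here.

```python
import datetime


def s3_to_gcs_transfer_job(project_id, s3_bucket, gcs_bucket,
                           access_key_id, secret_access_key, run_date=None):
    """Build a one-time S3 -> Cloud Storage transfer job body (not submitted)."""
    day = run_date or datetime.date.today()
    date = {"year": day.year, "month": day.month, "day": day.day}
    return {
        "description": f"Copy s3://{s3_bucket} to gs://{gcs_bucket}",
        "status": "ENABLED",
        "projectId": project_id,
        # Same start and end date: the job is scheduled for a single day.
        "schedule": {"scheduleStartDate": date, "scheduleEndDate": date},
        "transferSpec": {
            "awsS3DataSource": {
                "bucketName": s3_bucket,
                # This key-pair form has no field for a session token.
                "awsAccessKey": {
                    "accessKeyId": access_key_id,
                    "secretAccessKey": secret_access_key,
                },
            },
            "gcsDataSink": {"bucketName": gcs_bucket},
        },
    }
```

The resulting dictionary mirrors the fields filled in through the console steps above: source bucket, key pair, sink bucket and schedule.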
Let me know if this was helpful.
EDIT
Neither the Transfer Service nor the gsutil command currently supports "Temporary Security Credentials", even though AWS supports them. A workaround to do what you want is to change the source code of the gsutil command.
I also filed a Feature Request on your behalf. I suggest you star it in order to get updates on its progress.
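A different workaround from patching gsutil (my own suggestion, not something covered above) would be to do the copy in a short Python script: boto3 accepts a session token next to the key pair, and google-cloud-storage can upload from a file object. The sketch below assumes both libraries are installed wherever it runs, and the function names are made up. The data passes through the machine running the script, so this is not a direct bucket-to-bucket transfer.

```python
import tempfile
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def copy_s3_to_gcs(s3_client, gcs_bucket, s3_url):
    """Copy one object from S3 to a Cloud Storage bucket via a temporary file.

    s3_client  -- a boto3 S3 client
    gcs_bucket -- a google.cloud.storage Bucket
    """
    bucket_name, key = split_s3_url(s3_url)
    with tempfile.TemporaryFile() as buffer:
        s3_client.download_fileobj(bucket_name, key, buffer)
        blob = gcs_bucket.blob(key)
        # rewind=True seeks back to the start of the file before uploading.
        blob.upload_from_file(buffer, rewind=True)
    return f"gs://{gcs_bucket.name}/{key}"


def make_clients(access_key, secret_key, session_token, gcs_bucket_name):
    """Create real clients; needs boto3 and google-cloud-storage installed."""
    import boto3
    from google.cloud import storage

    s3_client = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        aws_session_token=session_token,  # the temporary credential from NDATokenGenerator
    )
    return s3_client, storage.Client().bucket(gcs_bucket_name)
```

With ACCESS_KEY, SECRET_KEY and SECURITY_TOKEN as in the question, `copy_s3_to_gcs(*make_clients(ACCESS_KEY, SECRET_KEY, SECURITY_TOKEN, "my-bucket"), s3_url)` would copy one file. I have not tested it against the NDA buckets.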

Error while Downloading file to my local device from S3

I am trying to download a file from Amazon S3 bucket to my local device using the below code but I got an error saying "Unable to locate credentials"
Given below is the code I have written:
import boto3
import botocore
BUCKET_NAME = 'my-bucket'
KEY = 'my_image_in_s3.jpg'
s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
Could anyone help me with this? Thanks in advance.

AWS uses a shared credentials system for the AWS CLI and all other AWS SDKs, so credentials do not have to be written into your code, which reduces the risk of leaking them to a code repository. AWS security practices recommend using a shared credentials file, which on Linux is usually located at
~/.aws/credentials
This file contains an access key and secret key that are used by all SDKs and the AWS CLI. The file can be created manually or automatically using this command:
aws configure
It will ask a few questions and create the credentials file for you. Note that you need to create a user with appropriate permissions before accessing AWS resources.
For more information, see:
AWS CLI configuration
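To see why boto3 reports "Unable to locate credentials", it can help to check the two most common sources by hand. This is only a rough diagnostic sketch using the standard library: boto3 looks in more places than these (for example the config file and instance roles), and the function name is made up.

```python
import configparser
import os


def credential_sources(profile="default", env=None):
    """List places where static AWS credentials for `profile` were found."""
    env = os.environ if env is None else env
    found = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        found.append("environment variables")
    path = env.get("AWS_SHARED_CREDENTIALS_FILE") or os.path.expanduser("~/.aws/credentials")
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently treated as empty
    if parser.has_section(profile) and parser.has_option(profile, "aws_access_key_id"):
        found.append(path)
    return found
```

An empty list from credential_sources() is consistent with the error in the question; running aws configure should then make the credentials file show up in the result.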

You are not using the session you created to download the file; you're using the s3 client you created. If you want to use the client, you need to give it credentials. Either download through the session:
# your_bucket is assumed to be a Bucket obtained from the session,
# e.g. session.resource('s3').Bucket('your_bucket')
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket', 'k.png', '/Users/username/Desktop/k.png')

Is it possible to download file to Google Cloud Storage via API call (with Python) using Google App Engine (without Google Compute Engine)

I wrote a Python program that connects to various platforms' APIs to download files. The program currently runs on my local machine (laptop) without a problem (all downloaded files are saved to my local drive, of course).
Here is my real question: without Google Compute Engine, is it possible to deploy the very same Python program using Google App Engine? If yes, how could I save my files (via API calls) to Google Cloud Storage?
Thanks.

Is this a web app? If so, you can deploy it using Google App Engine standard or flexible.
To send files to Cloud Storage, try the example in the python-docs-samples repo (folder appengine/flexible/storage/):
import os

from flask import Flask, request
from google.cloud import storage

app = Flask(__name__)
# Name of the target bucket, read from the environment (e.g. set in app.yaml).
CLOUD_STORAGE_BUCKET = os.environ['CLOUD_STORAGE_BUCKET']

# [START upload]
@app.route('/upload', methods=['POST'])
def upload():
    """Process the uploaded file and upload it to Google Cloud Storage."""
    uploaded_file = request.files.get('file')
    if not uploaded_file:
        return 'No file uploaded.', 400
    # Create a Cloud Storage client.
    gcs = storage.Client()
    # Get the bucket that the file will be uploaded to.
    bucket = gcs.get_bucket(CLOUD_STORAGE_BUCKET)
    # Create a new blob and upload the file's content.
    blob = bucket.blob(uploaded_file.filename)
    blob.upload_from_string(
        uploaded_file.read(),
        content_type=uploaded_file.content_type
    )
    # The public URL can be used to directly access the uploaded file via HTTP.
    return blob.public_url
# [END upload]
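The sample above handles a browser upload, whereas the question is about files the program fetches itself. A small sketch of that case, under the assumption that each file fits in memory: download the bytes and hand them straight to the same upload_from_string call, with no local file in between. The function name and the bucket/blob names are placeholders.

```python
import urllib.request


def save_url_to_bucket(url, bucket, blob_name, timeout=30):
    """Download `url` and store the bytes in a Cloud Storage bucket.

    bucket -- a google.cloud.storage Bucket, e.g. storage.Client().bucket("my-bucket")
    Returns the number of bytes stored.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
        content_type = response.headers.get_content_type()
    blob = bucket.blob(blob_name)
    blob.upload_from_string(data, content_type=content_type)
    return len(data)
```

For APIs that need authentication headers, the urlopen call would be replaced by whatever client the program already uses; only the last three lines matter for Cloud Storage.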
