I'm trying to execute a query in Athena, but it fails.
Code:
import boto3

client = boto3.client('athena')

client.start_query_execution(
    QueryString="CREATE DATABASE IF NOT EXISTS db;",
    QueryExecutionContext={'Database': 'db'},
    ResultConfiguration={
        'OutputLocation': "s3://my-bucket/",
        'EncryptionConfiguration': {
            'EncryptionOption': 'SSE-S3'
        }
    })
But it raises the following exception:
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException)
when calling the StartQueryExecution operation: The S3 location provided to save your
query results is invalid. Please check your S3 location is correct and is in the same
region and try again. If you continue to see the issue, contact customer support
for further assistance.
However, if I go to the Athena console, open Settings, and enter the same S3 location, the query runs fine.
What's wrong with my code? I've used the APIs of several other services (e.g., S3) successfully, but with this one I believe I'm passing some incorrect parameters. Thanks.
Python: 3.6.1. Boto3: 1.4.4
I had to add an 'athena-' prefix to my bucket name to get it to work. For example, instead of:
"s3://my-bucket/"
Try:
"s3://athena-my-bucket/"
EDIT: As suggested by Justin, AWS later added support for this by using an 'athena' prefix on the bucket. Please upvote his answer.
Accepted Answer:
The S3 location provided to save your query results is invalid. Please check your S3 location is correct and is in the same region and try again.
Since it works when you use the console, it is likely the bucket is in a different region than the one you are using in Boto3. Make sure you use the correct region (the one that worked in the console) when constructing the Boto3 client. By default, Boto3 will use the region configured in the credentials file.
Alternatively try boto3.client('athena', region_name = '<region>')
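For reference, a minimal sketch of the same call with the region passed explicitly (the region name below is a placeholder, not taken from the question):

import boto3

# Placeholder region -- use the region shown in the Athena console settings,
# i.e. the one where the results bucket lives.
client = boto3.client('athena', region_name='us-east-1')

client.start_query_execution(
    QueryString="CREATE DATABASE IF NOT EXISTS db;",
    QueryExecutionContext={'Database': 'db'},
    ResultConfiguration={'OutputLocation': 's3://my-bucket/'}
)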
Ran into the same issue and needed to specify the S3 bucket in the client.
In my case, the IAM role didn't have all the permissions for the S3 bucket. I gave the IAM role the following permissions for the Athena results bucket.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::athena_results_bucket",
                "arn:aws:s3:::athena_results_bucket/*"
            ],
            "Effect": "Allow"
        }
    ]
}
I received the OP error, attempted Justin's answer, and got the following error
SYNTAX_ERROR: line 1:15: Schema TableName does not exist
Meaning that it was not able to find the tables that I had previously created through the AWS Athena UI.
The simple solution was to use dclaze's answer instead. These two answers cannot be used simultaneously, or you will get back the initial (OP) error.
Related
I'm querying an S3 bucket with Athena through Python boto3. The query is successful and there are no errors, but the output S3 bucket is empty. However, when I run the query through the Python console it works, and there are .csv and .csv.metadata files with the Athena query results in the S3 output bucket.
I have added permissions mentioned on this page. https://docs.aws.amazon.com/athena/latest/ug/cross-account-permissions.html
Not sure if this matters, but the S3 output bucket where the query results should be is not managed by Serverless (which is what I'm using for my project); it is an existing S3 bucket. I used this package and its instructions, https://www.npmjs.com/package/serverless-plugin-existing-s3, and it works fine, with DynamoDB and Glue Catalog information getting dumped when a Lambda is triggered.
import boto3

def function(event, context):
    client = boto3.client('athena')  # was boto3('athena'), which is not callable
    query = 'select * from athenaTable'
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={
            'Database': 'database-name'  # placeholder: the Athena/Glue database holding athenaTable
        },
        ResultConfiguration={
            'OutputLocation': 's3://bucket-name/key/'  # output bucket
        }
    )
    return response
For anyone running into this issue: I was able to solve it by changing the permissions of my Lambda function. So make sure the role for your Lambda has the correct policies.
I solved it by giving full access to Athena and Glue. Add those policies to your lambda role.
Full access may be more than what you actually need, so just select the necessary policies from those services
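Also worth checking: start_query_execution only submits the query, and the .csv/.csv.metadata files are written only if the execution later succeeds. A minimal sketch (assuming the same client and the response returned above) for polling the status and surfacing any failure reason:

import time
import boto3

client = boto3.client('athena')

def wait_for_query(execution_id):
    # Poll until Athena reports a terminal state for the query.
    while True:
        result = client.get_query_execution(QueryExecutionId=execution_id)
        status = result['QueryExecution']['Status']
        if status['State'] in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            return status['State'], status.get('StateChangeReason', '')
        time.sleep(1)

# Example usage with the response returned by start_query_execution above:
# state, reason = wait_for_query(response['QueryExecutionId'])
# print(state, reason)  # a FAILED state here explains the empty output bucket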
From documentation on https://developers.google.com/vault/guides/exports, I've been able to create, list, and retrieve exports, but I haven't found any way to download the exported data associated with a specific export. Is there any way to download the exported files via the API, or is this only available through the vault UI?
There is a cloudStorageSink key in the export metadata, but trying to use the provided values with the Cloud Storage API results in a generic permission issue (403 error).
Example export metadata response:
{
  "status": "COMPLETED",
  "cloudStorageSink": {
    "files": [
      {
        "md5Hash": "da5e3979864d71d1e3ac776b618dcf48",
        "bucketName": "408d9135-6155-4a43-9d3c-424f124b9474",
        "objectName": "a740999b-e11b-4af5-b8b1-6c6def35d677/exportly-41dd7886-fe02-432f-83c-a4b6fd4520a5/Test_Export-1.zip",
        "size": "37720"
      },
      {
        "md5Hash": "d345a812e15cdae3b6277a0806668808",
        "bucketName": "408d9135-6155-4a43-9d3c-424f124b9474",
        "objectName": "a507999b-e11b-4af5-b8b1-6c6def35d677/exportly-41dd6886-fb02-4c2f-813c-a4b6fd4520a5/Test_Export-metadata.xml",
        "size": "8943"
      },
      {
        "md5Hash": "21e91e1c60e6c07490faaae30f8154fd",
        "bucketName": "408d9135-6155-4a43-9d3c-424f124b9474",
        "objectName": "a503959b-e11b-4af5-b8b1-6c6def35d677/exportly-41dd6786-fb02-42f-813c-a4b6fd4520a5/Test_Export-results-count.csv",
        "size": "26"
      }
    ]
  },
  "stats": {
    "sizeInBytes": "46689",
    "exportedArtifactCount": "7",
    "totalArtifactCount": "7"
  },
  "name": "Test Export",
  ...
}
There are two approaches that can do the action you require:
The first:
Using OAuth 2.0 refresh and access tokens; however, this requires user intervention to acknowledge your app's access.
You can find a nice playground supplied by Google, and more info, here: https://developers.google.com/oauthplayground/.
You will first need to choose your desired API (in your case: https://www.googleapis.com/auth/devstorage.full_control, under the Cloud Storage JSON API v1 section).
Then, you will need to log in with an admin account and click "Exchange authorization code for tokens" (the "Refresh token" and "Access token" fields will be filled automatically).
Lastly, you will need to choose the right URL to perform your request. I suggest using the "List possible operations" to choose the right URL. You will need to choose "Get Object - Retrieve the object" under Cloud Storage API v1 (notice that there are several options with the name -"Get Object", be sure to choose the one under Cloud Storage API v1 and not the one under Cloud Storage JSON API v1). Now just enter your bucket and object name in the appropriate placeholders and click Send the request.
The second:
Programmatically download it using the Google client libraries. This is the approach suggested by #darkfolcer; however, I believe the documentation provided by Google is insufficient and thus does not really help. If a Python example helps, you can find one in the answer to the following question - How to download files from Google Vault export immediately after creating it with Python API?
Once all the exports are created you'll need to wait for them to be completed. You can use https://developers.google.com/vault/reference/rest/v1/matters.exports/list to check the status of every export in a matter. In the response refer to the “exports” array and check the value of “status” for each, any that say "COMPLETED" can be downloaded.
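Roughly, listing the exports and filtering for completed ones looks like this with the google-api-python-client library (the matter ID and the credentials object are assumptions of this sketch):

from googleapiclient.discovery import build

# 'credentials' is assumed to be an authorized credentials object with the
# https://www.googleapis.com/auth/ediscovery.readonly scope.
vault = build('vault', 'v1', credentials=credentials)

exports = vault.matters().exports().list(matterId='<matter-id>').execute()
completed = [e for e in exports.get('exports', [])
             if e.get('status') == 'COMPLETED']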
To download a completed export, go to the “cloudStorageSink” object of each export and take the "bucketName" and "objectName" values of the first entry in the "files" array. You’ll need to use the Cloud Storage API and these two values to download the files. This page has code examples for all the popular languages using the API: https://cloud.google.com/storage/docs/downloading-objects#storage-download-object-cpp.
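For the download step itself, a rough sketch with the google-cloud-storage library, assuming a credentials object carrying the devstorage scope (the service account file, project name, and delegated admin below are placeholders) and using the bucket/object values from the metadata in the question:

from google.cloud import storage
from google.oauth2 import service_account

# Service-account key file and delegated admin are assumptions of this sketch.
credentials = service_account.Credentials.from_service_account_file(
    'service_account.json',
    scopes=['https://www.googleapis.com/auth/devstorage.read_only'],
    subject='admin@example.com',  # a Vault-privileged admin, if using delegation
)

client = storage.Client(project='my-project', credentials=credentials)

# Values taken from the export's cloudStorageSink metadata shown in the question.
bucket = client.bucket('408d9135-6155-4a43-9d3c-424f124b9474')
blob = bucket.blob('a740999b-e11b-4af5-b8b1-6c6def35d677/'
                   'exportly-41dd7886-fe02-432f-83c-a4b6fd4520a5/Test_Export-1.zip')
blob.download_to_filename('Test_Export-1.zip')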
Hope it helps.
The issue you are seeing is because the API works with the principle of least privilege.
The implication for you is that, since your objective is to download the files from the export, you get permission to download only the files, not the whole bucket (even if it contains only those files).
This is why you get the 403 (permission) error when you request information about the storage bucket itself. However, you do have permission to download the files inside the bucket. So what you should do is get each object directly, with requests like this (using the information in the question):
GET https://storage.googleapis.com/storage/v1/b/408d9135-6155-4a43-9d3c-424f124b9474/o/a740999b-e11b-4af5-b8b1-6c6def35d677/exportly-41dd7886-fe02-432f-83c-a4b6fd4520a5/Test_Export-1.zip
So, in short, instead of getting the full bucket, get each individual file generated by the export.
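Roughly, that request looks like this in Python with the requests library (note that the object name has to be URL-encoded and alt=media added to get the file contents rather than its metadata; the access token is assumed to come from the OAuth flow described in the other answer):

import urllib.parse
import requests

token = '<oauth-access-token>'  # e.g. from the OAuth playground step above

bucket = '408d9135-6155-4a43-9d3c-424f124b9474'
object_name = ('a740999b-e11b-4af5-b8b1-6c6def35d677/'
               'exportly-41dd7886-fe02-432f-83c-a4b6fd4520a5/Test_Export-1.zip')

url = 'https://storage.googleapis.com/storage/v1/b/{}/o/{}?alt=media'.format(
    bucket, urllib.parse.quote(object_name, safe=''))

resp = requests.get(url, headers={'Authorization': 'Bearer ' + token})
resp.raise_for_status()
with open('Test_Export-1.zip', 'wb') as f:
    f.write(resp.content)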
Hope this helps.
I've deployed an endpoint in SageMaker and was trying to invoke it through my Python program. I had tested it using Postman and it worked perfectly OK. Then I wrote the invocation code as follows:
import boto3
import pandas as pd
import io
import numpy as np

def np2csv(arr):
    csv = io.BytesIO()
    np.savetxt(csv, arr, delimiter=',', fmt='%g')
    return csv.getvalue().decode().rstrip()

runtime = boto3.client('runtime.sagemaker')

payload = np2csv(test_X)
runtime.invoke_endpoint(
    EndpointName='<my-endpoint-name>',
    Body=payload,
    ContentType='text/csv',
    Accept='Accept'
)
Now when I run this I get a validation error:
ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Endpoint <my-endpoint-name> of account <some-unknown-account-number> not found.
While using Postman I had given my access key and secret key, but I'm not sure how to pass them when using the SageMaker APIs. I'm also not able to find it in the documentation.
So my question is, how can I use sagemaker api from my local machine to invoke my endpoint?
I also had this issue and it turned out to be my region was wrong.
Silly but worth a check!
When you are using any of the AWS SDK (including the one for Amazon SageMaker), you need to configure the credentials of your AWS account on the machine that you are using to run your code. If you are using your local machine, you can use the AWS CLI flow. You can find detailed instructions on the Python SDK page: https://aws.amazon.com/developers/getting-started/python/
Please note that when you are deploying the code to a different machine, you will have to make sure that you are giving the EC2, ECS, Lambda or any other target a role that will allow the call to this specific endpoint. While in your local machine it can be OK to give you admin rights or other permissive permissions, when you are deploying to a remote instance, you should restrict the permissions as much as possible.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "sagemaker:InvokeEndpoint",
            "Resource": "arn:aws:sagemaker:*:1234567890:endpoint/<my-endpoint-name>"
        }
    ]
}
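A minimal sketch of creating the runtime client with explicit local credentials and region (the profile name, region, and payload below are placeholders, not values from the question):

import boto3

# Placeholder profile/region -- use whatever `aws configure` set up locally
# and the region the endpoint was deployed in.
session = boto3.Session(profile_name='default', region_name='us-east-1')
runtime = session.client('runtime.sagemaker')

payload = '0.5,1.2,3.4'  # stand-in for np2csv(test_X) from the question

response = runtime.invoke_endpoint(
    EndpointName='<my-endpoint-name>',
    Body=payload,
    ContentType='text/csv',
)
print(response['Body'].read().decode())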
Based on #Jack's answer, I ran aws configure and changed the default region name and it worked.
I am able to upload an image file using:
s3 = session.resource('s3')
bucket = s3.Bucket(S3_BUCKET)
bucket.upload_file(file, key)
However, I want to make the file public too. I tried looking up some functions to set the ACL for the file, but it seems boto3 has changed its API and removed some functions. Is there a way to do it in the latest release of boto3?
To upload and set permission to publicly-readable in one step, you can use:
bucket.upload_file(file, key, ExtraArgs={'ACL':'public-read'})
See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html#the-extraargs-parameter
I was able to do it using objectAcl API:
s3 = boto3.resource('s3')
object_acl = s3.ObjectAcl('bucket_name','object_key')
response = object_acl.put(ACL='public-read')
For details: http://boto3.readthedocs.io/en/latest/reference/services/s3.html#objectacl
Adi's way works. However, if you were like me, you might have run into an access denied issue. This is normally caused by broken permissions of the user.
I fixed it by adding the following to the Action array:
"s3:GetObjectAcl",
"s3:PutObjectAcl"
In recent versions of boto3, ACL is available as a regular parameter - both when using the S3 client and resource, it seems. You can just specify ACL="public-read" without having to wrap it in ExtraArgs or use the ObjectAcl API.
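For example, with the low-level client, put_object takes ACL directly (upload_file still needs ExtraArgs); a minimal sketch with placeholder bucket, key, and file names:

import boto3

s3 = boto3.client('s3')

# Placeholder bucket/key/file names; the object is uploaded and made public in one call.
with open('image.jpg', 'rb') as f:
    s3.put_object(
        Bucket='my-bucket',
        Key='image.jpg',
        Body=f,
        ACL='public-read',
    )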
Set the ACL="public-read" as mentioned above.
Also, make sure your bucket policy Resource line
has both the bare arn and /* arn formats.
Not having them both can cause strange permissions problems.
...
"Resource": ["arn:aws:s3:::my_bucket/*", "arn:aws:s3:::my_bucket"]
I wrote a python script to download some files from an s3 bucket. The script works just fine on one machine, but breaks on another.
Here is the exception I get: botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden.
I am pretty sure it's related to some system configurations, or something related to the registry, but don't know what exactly. Both machines are running Windows 7 and python 3.5.
Any suggestions?
The issue was actually being caused by the system time being incorrect. I fixed the system time and the problem is fixed.
So Forbidden means you don't have access to perform the operation. Check that you have permission to read that specific bucket and that you have supplied valid IAM keys. Below is a sample policy granting read and list access to the bucket.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "statement1",
            "Effect": "Allow",
            "Action": [
                "s3:List*",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucketname",
                "arn:aws:s3:::bucketname/*"
            ]
        }
    ]
}
More info here:
Specifying Permissions in a Policy
Writing IAM Policies: How to Grant Access to an Amazon S3 Bucket
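If it helps to narrow things down, here is a small sketch (bucket and key are placeholders) that reproduces the HeadObject call boto3 makes before a download, so you can see the 403 directly:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

try:
    # download_file issues a HeadObject first, so this reproduces the 403 directly.
    s3.head_object(Bucket='bucketname', Key='path/to/file')
    s3.download_file('bucketname', 'path/to/file', 'file')
except ClientError as err:
    # A 403 here usually means missing s3:GetObject permission, bad keys,
    # or (as in the accepted answer) a skewed system clock breaking request signing.
    print(err.response['Error'])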