AWS Athena PyAthena AccessDeniedException - python

I am new to AWS.
I have a user account and two roles, one for prod and one for test.
Usually I log into my account and switch to the prod role to run some simple SELECT queries.
Now I want to use Athena locally in Python with PyAthena.
I have tried the following example from the PyAthena documentation:
from pyathena import connect
import pandas as pd

conn = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
               aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
               s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='us-west-2')
df = pd.read_sql("SELECT * FROM many_rows", conn)
print(df.head())
But I always get this error:
An error occurred (AccessDeniedException) when calling the StartQueryExecution operation: User: arn:aws:iam::xxxxxx:user/xxxx#xxxxx is not authorized to perform: athena:StartQueryExecution on resource: arn:aws:athena:ap-southeast-2:xxxxx:workgroup/primary
This is the exact error I would get if I ran the same query from my user account without switching roles.
I have also tried adding a profile_name parameter to connect, but it still does not work, even though the environment is recognised correctly.
Could someone show me how to do the 'switch role' step in local Python code?

It seems like the issue is due to a missing role rather than the profile_name parameter. The Connection class in PyAthena has a role_arn argument that you can pass when initializing the connection.
You might want to try it this way:
conn = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
               aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
               s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='us-west-2',
               role_arn='<your role arn here>')
I haven't tested it myself though since I do not have an Athena setup.
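If role_arn alone does not work, another way to do the 'switch role' step locally is to assume the role explicitly with STS and hand the temporary credentials to PyAthena. This is only a sketch, assuming your user is allowed to call sts:AssumeRole on the prod role; the role ARN, bucket and staging path are placeholders:
import boto3
from pyathena import connect
import pandas as pd

# Assume the prod role explicitly; the returned credentials are temporary.
sts = boto3.client('sts')
assumed = sts.assume_role(RoleArn='arn:aws:iam::xxxxxx:role/your-prod-role',  # placeholder ARN
                          RoleSessionName='athena-local')
creds = assumed['Credentials']

# Pass the temporary credentials (including the session token) to PyAthena.
conn = connect(aws_access_key_id=creds['AccessKeyId'],
               aws_secret_access_key=creds['SecretAccessKey'],
               aws_session_token=creds['SessionToken'],
               s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='ap-southeast-2')
df = pd.read_sql("SELECT * FROM many_rows", conn)
print(df.head())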

Related

BigQuery cross project access via cloud functions

Let's say I have two GCP projects, A and B, and I am the owner of both. When I use the UI, I can query BigQuery tables in project B from both projects. But I run into problems when I run a Cloud Function in project A from which I try to access a BigQuery table in project B. Specifically, I get a 403 Access Denied: Table <>: User does not have permission to query table <>. I am a bit confused as to why I can't access the data in B and what I need to do. In my Cloud Function all I do is:
from google.cloud import bigquery

client = bigquery.Client()
query = client.query(<my-query>)
res = query.result()
The service account used to run the function exists in project A - how do I give it editor access to BigQuery in project B? (Or what else should I do?).
Basically you have an issue with the IAM permissions and roles on the service account used to run the function.
Granting the bigquery.admin role to that service account would do the trick.
However, that may not be the most appropriate solution with respect to best practices. The link below provides a few scenarios with examples of the roles best suited to your case.
https://cloud.google.com/bigquery/docs/access-control-examples
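For reference, once the function's service account has a suitable BigQuery role on project B (for example a read-only role such as bigquery.dataViewer there, plus permission to run jobs in project A), the function code itself barely changes. A minimal sketch, with hypothetical project, dataset and table names:
from google.cloud import bigquery

# The client bills the query job to project A (where the function runs);
# the service account only needs read access on the dataset in project B.
client = bigquery.Client(project="project-a")  # hypothetical project ID

query = """
    SELECT COUNT(*) AS row_count
    FROM `project-b.my_dataset.my_table`  -- hypothetical table in project B
"""
for row in client.query(query).result():
    print(row.row_count)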

BigQuery GCP Python Integration

I am trying to write all my scripts in Python instead of BigQuery. I set my active project using 'gcloud config set project', but I still get this error: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/analytics-supplychain-thd/jobs: Caller does not have required permission to use project analytics-supplychain-thd. Grant the caller the Owner or Editor role, or a custom role with the serviceusage.services.use permission, by visiting https://console.developers.google.com/iam-admin/iam/project?project=analytics-supplychain-thd and then retry (propagation of new permission may take a few minutes).
How do I fix this?
I suspect you are picking up the wrong "key".json, at least in terms of permissions for one of the operations you are trying to perform. The key currently defined [1] in GOOGLE_APPLICATION_CREDENTIALS does not seem to have the right permissions. A list of roles you could grant to the service account can be found here [2]; judging from your error you would need at least a primitive role such as Owner or Editor. Which one depends on your needs and targets (the operations you perform through the script).
You should pick the right role for your operation and associate it with the service account you want to use, thereby defining an identity for it through the IAM portal UI; this can also be done through the CLI or API calls.
After that, make sure the client you are using is logged in with the correct service account (the correct JSON key path).
In particular, I used the code you provided to test, and I was able to load the data:
import pandas_gbq
from google.oauth2 import service_account

# TODO: Set project_id to your Google Cloud Platform project ID
project_id = "xxx-xxxx-xxxxx"
# Standard SQL needs backticks here: the project ID contains dashes and the table name starts with a digit.
sql = """SELECT * FROM `xxx-xxxx-xxxxx.fourth_dataset.2test` LIMIT 100"""

credentials = service_account.Credentials.from_service_account_file('/home/myself/key.json')
df = pandas_gbq.read_gbq(sql, project_id=project_id, dialect="standard", credentials=credentials)
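As an alternative to passing the credentials object explicitly, you can point GOOGLE_APPLICATION_CREDENTIALS at the key file before creating the client, as described in [1]. A minimal sketch with the google-cloud-bigquery client; the key path and project ID are placeholders:
import os
from google.cloud import bigquery

# Placeholder path: use the key of the service account that has the role you granted.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/myself/key.json"

client = bigquery.Client(project="xxx-xxxx-xxxxx")
rows = client.query("SELECT 1 AS ok").result()
for row in rows:
    print(row.ok)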
Hope this helps!
[1] https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable
[2] https://cloud.google.com/iam/docs/understanding-roles#primitive_roles

Boto3 get InvalidClientTokenId when using update_service_specific_credential

I want to change Git credentials for AWS CodeCommit to Active/Inactive using Boto3.
I tried to use update_service_specific_credential, but I got this error:
An error occurred (InvalidClientTokenId) when calling the CreateServiceSpecificCredential operation: The security token included in the request is invalid: ClientError
My code:
import boto3

iamClient = boto3.client('iam')
response = iamClient.update_service_specific_credential(UserName="****",
                                                        ServiceSpecificCredentialId="*****",
                                                        Status="Active")
Has anyone tried to use it?
Any advice?
Thanks!
AWS errors are often purposefully opaque and non-specific, so could you give a bit more detail? Specifically, are the user performing the update and the user whose credentials are being updated two different users? There may be a race condition if the user being updated IS the user performing the update.
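One way to gather that detail is to check which identity boto3 is actually using before calling IAM. A minimal sketch, keeping only the placeholders already in the question:
import boto3

# InvalidClientTokenId usually means the access key or session token boto3
# picked up is invalid or expired, so confirm the calling identity first.
sts = boto3.client('sts')
print(sts.get_caller_identity()['Arn'])

iamClient = boto3.client('iam')
response = iamClient.update_service_specific_credential(UserName="****",
                                                        ServiceSpecificCredentialId="*****",
                                                        Status="Active")
print(response['ResponseMetadata']['HTTPStatusCode'])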

Google big query, 403 error unable to query the data via API

I am trying to query data from a public big query dataset via the bigquery client.
The problem I am facing is that, as I understand it, I need to have at least reader rights on the project in order to do that. The thing is, as the project is public, I don't have any way to change my grants on that project.
Is there any other way to do that? Did I miss something?
Here is the code I am using:
from google.cloud import bigquery

bigquery_client = bigquery.Client(GDELT_ID)
LIMITED = 'SELECT * FROM [gdelt-bq:full.events_partitioned] LIMIT 100'
query = bigquery_client.run_sync_query(LIMITED)
query.run()
And the error message:
google.cloud.exceptions.Forbidden: 403 Access Denied: Project gdelt-bq: The user ******#gmail.com does not have bigquery.jobs.create permission in project gdelt-bq. (POST https://www.googleapis.com/bigquery/v2/projects/gdelt-bq/queries)
Thanks for your help
It looks like in the first line you specify gdelt-bq (most likely the value behind your GDELT_ID) as the acting project, and of course you do not have permission to use it as such. Instead, you should point to your own project, or just leave it empty so it is set to the default inferred from the environment:
bigquery_client = bigquery.Client()
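With that change the public table can still be referenced in the query itself; only the job runs in (and is billed to) your own project. A minimal sketch using the current client API, where 'my-own-project' is a placeholder for your project ID:
from google.cloud import bigquery

# The job is created in your own project; the public gdelt-bq data is only read.
bigquery_client = bigquery.Client(project='my-own-project')
LIMITED = 'SELECT * FROM `gdelt-bq.full.events_partitioned` LIMIT 100'
for row in bigquery_client.query(LIMITED).result():
    print(row)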

AWS CloudSearch throws EndpointConnectionError exception

I'm trying to set up CloudSearch. At first I tried it with their demo dataset ("IMDB") and it all worked perfectly.
Then I created a new domain to export our data into it. But every attempt to connect to the new domain results in an EndpointConnectionError exception. I tried it with and without indexes, uploading and getting documents, all with the same exception.
A simple code which reproduces the issue:
import boto3

cloudsearch = boto3.client('cloudsearch')  # we store credentials in ~/.aws/
endpoint_url = cloudsearch.describe_domains(
    DomainNames=['DOMAINNAME'])['DomainStatusList'][0]['SearchService']['Endpoint']
cloudsearchdomain = boto3.client('cloudsearchdomain', endpoint_url='https://%s' % endpoint_url)
result = cloudsearchdomain.search(query='anything')
print(result)
This code was working great when DOMAINNAME was the domain with the IMDB demo dataset, but once I switched it to the new domain name it started throwing this exception:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://search-DOMAINNAME-bcoaescnsbrp2h5ojzyhljdc4u.us-west-2.cloudsearch.amazonaws.com/2013-01-01/documents/batch?format=sdk"
The problem was caused by missing access policies. It seems AWS auto-creates the access policies for the domain when you create it with the demo dataset, without notifying you about it. So for the first domain the access policies were created by AWS and I didn't know about that.
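If you want to confirm that this is what is happening for a given domain, you can read its access policy document before trying to connect. A minimal sketch, with the same placeholder domain name as in the question:
import boto3

cloudsearch = boto3.client('cloudsearch')
# An empty or missing policy document here is consistent with the domain
# endpoint refusing requests until access policies are configured.
policies = cloudsearch.describe_service_access_policies(DomainName='DOMAINNAME')
print(policies['AccessPolicies']['Options'])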
