Calling an external table from BigQuery with Python

While trying to read the external table's data, I'm getting the error below and I cannot solve the issue. Here are the details of the situation:
google.api_core.exceptions.NotFound: 404 Not found: Files /gdrive/id/id123456id
PS: id123456id is a dummy id.
The file with ID id123456id exists in my Google Drive, and the BigQuery external table points to this ID.
bq_test.json is the service account's credentials JSON file. This service account has the following roles:
BigQuery Data Editor
BigQuery Data Owner
BigQuery Data Viewer
BigQuery User
Owner
Here is my code block:
from google.cloud import bigquery
from google.oauth2.service_account import Credentials

scopes = (
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive'
)

credentials = Credentials.from_service_account_file('bq_test.json')
credentials = credentials.with_scopes(scopes)
client = bigquery.Client(credentials=credentials)

QUERY = (
    """SELECT * FROM
    `project_name.dataset_name.ext_table`
    LIMIT 5"""
)
query_job = client.query(QUERY)
rows = query_job.result()
for row in rows:
    print(row.name)

I solved the problem as follows:
Go to https://console.cloud.google.com/iam-admin/iam?project=PROJECT_ID
Copy the service account's email address (e.g. bq_test@PROJECT_ID.iam.gserviceaccount.com).
Go to https://drive.google.com and find the related file (id = id123456).
Right-click the file and choose Share.
Paste the email address copied above (bq_test@PROJECT_ID.iam.gserviceaccount.com).
Choose read-only access or whatever you need.
This flow solved the problem in my case.
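For reference, the same sharing step can also be done with the Drive API instead of the web UI. This is only a hypothetical sketch: it assumes the googleapiclient library is installed and that the default credentials belong to an account that can already edit the file (for example the file owner), not the service account being granted access.

import google.auth
from googleapiclient.discovery import build

# Assumption: these credentials can already edit the Drive file (e.g. the owner's).
creds, _ = google.auth.default(scopes=['https://www.googleapis.com/auth/drive'])
drive = build('drive', 'v3', credentials=creds)

# Grant the BigQuery service account read-only access to the file.
drive.permissions().create(
    fileId='id123456id',  # dummy file ID from the question
    body={
        'type': 'user',
        'role': 'reader',
        'emailAddress': 'bq_test@PROJECT_ID.iam.gserviceaccount.com',
    },
).execute()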

Related

How to point to the ARN of a dynamodb table instead of using the name when using boto3

I'm trying to access a DynamoDB table in another account without having to make any code changes if possible. I've set up the IAM users, roles and policies to make this possible and have succeeded with other services such as SQS and S3.
The problem I have now is with DynamoDB, as the code to initialise the boto3.resource connection only seems to allow me to point to the table name (docs).
dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
table = dynamodb.Table(config['dynamo_table_1'])
This causes the code to look for a table with that name in the account the code is executing in, which errors out because the table exists in a different AWS account.
Is there a way to pass the ARN of the table, or some identifier that would allow me to specify the account ID?
There's sample code at https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/configure-cross-account-access-to-amazon-dynamodb.html which shows how to do cross-account access. Here is a snippet from the attached zip. I expect you could do .resource() as well as .client() with the same arguments.
import boto3
from datetime import datetime

sts_client = boto3.client('sts')
sts_session = sts_client.assume_role(RoleArn='arn:aws:iam::<Account-A ID>:role/DynamoDB-FullAccess-For-Account-B',
                                     RoleSessionName='test-dynamodb-session')

KEY_ID = sts_session['Credentials']['AccessKeyId']
ACCESS_KEY = sts_session['Credentials']['SecretAccessKey']
TOKEN = sts_session['Credentials']['SessionToken']

dynamodb_client = boto3.client('dynamodb',
                               region_name='us-east-2',
                               aws_access_key_id=KEY_ID,
                               aws_secret_access_key=ACCESS_KEY,
                               aws_session_token=TOKEN)
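As a rough sketch of the .resource() variant mentioned above, the same temporary credentials should let the original Table-based code run unchanged (the table name below is illustrative, not from the question; KEY_ID, ACCESS_KEY and TOKEN are reused from the snippet above):

# Build a resource instead of a low-level client, with the assumed-role credentials.
dynamodb = boto3.resource('dynamodb',
                          region_name='us-east-2',
                          aws_access_key_id=KEY_ID,
                          aws_secret_access_key=ACCESS_KEY,
                          aws_session_token=TOKEN)
table = dynamodb.Table('example-table')  # hypothetical table name
print(table.item_count)  # simple check that the cross-account table is reachable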

Google Big Query from Python

I am trying to run a simple query on BigQuery from Python, following this document. To set up the client I generated the JSON file for my project via a service account:
import pandas as pd
from google.cloud import bigquery
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = *****
client = bigquery.Client()

QUERY = (
    'SELECT name FROM `mythic-music-326213.mytestdata.trainData` '
    'LIMIT 100')
query_job = client.query(QUERY)
query_job = client.query(QUERY)
However, I am getting the following error:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Technically, I want to be able to query my dataset from Python. Any help would be appreciated.
I've tried your code snippet with my service account JSON file and a dataset in my project, and it worked as expected, so it's not clear why it isn't working in your case.
However, you can try using the service account JSON file directly, like this:
import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file('<path to JSON file>')
client = bigquery.Client(credentials=credentials)

QUERY = (
    'SELECT state FROM `so-project-a.test.states` '
    'LIMIT 100')
query_job = client.query(QUERY)
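Since pandas is imported in both snippets, a natural follow-up (assuming the query runs successfully) is to pull the results into a DataFrame:

# Wait for the job to finish and load the results into a pandas DataFrame.
df = query_job.to_dataframe()
print(df.head())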

Cannot query tables from sheets in BigQuery

I am trying to use BigQuery from Python to query a table that is generated from a Google Sheet:
from google.cloud import bigquery
# Prepare connection and query
bigquery_client = bigquery.Client(project="my_project")
query = """
select * from `table-from-sheets`
"""
df = bigquery_client.query(query).to_dataframe()
I can usually query BigQuery tables, but now I am getting the following error:
Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
What do I need to do to access Drive from Python?
Is there another way around this?
You are missing the scopes for the credentials. I'm pasting the code snippet from the official documentation.
In addition, do not forget to give the service account at least Viewer access to the Google Sheet.
from google.cloud import bigquery
import google.auth

# Create credentials with Drive & BigQuery API scopes.
# Both APIs must be enabled for your project before running this code.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]
)

# Construct a BigQuery client object.
client = bigquery.Client(credentials=credentials, project=project)
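With the scoped client in place, the query from the question should then run unchanged; for example (reusing the table name from the question):

# Run the original query with the scoped client and load the result into a DataFrame.
query = "select * from `table-from-sheets`"
df = client.query(query).to_dataframe()
print(df.head())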

Got Python bigquery.jobs.create permission error

I want to write a Python script that uploads data from a file to a BigQuery table.
Here is the code:
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('my-key.json', project=project_id, location='US')
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_name)
with open(filename, 'rb') as source_file:
    client.load_table_from_file(source_file, table_ref)
I am using a GCP VM created in the same project as my BigQuery table, and a service account that has the BigQuery Admin role.
I get an error saying that the user doesn't have the bigquery.jobs.create permission.
I don't know if it is useful information, but I am able to read my table.
I don't know what to do. Thanks for your help.

How to load data from one project's bucket into another project's table in python?

I have two projects. I store data in project_A and build tables in project_B. I have created a service_account.json for both of them, but I don't know how to use both files at the same time when I need to load the data from project_A and build the tables in project_B.
The data is stored at this URI:
gs://project_A//*
The table will live in project_B, with table name huge_table:
from google.cloud import storage, bigquery

proj_a_client = storage.Client.from_service_account_json('service_acct_A.json')
proj_b_client = bigquery.Client.from_service_account_json('service_acct_B.json')

dest_table = proj_b_client.dataset('DS_B').table('huge_table')
uri = 'gs://project_A//*'
job_config = bigquery.LoadJobConfig()

load_job = proj_b_client.load_table_from_uri(uri,
                                             dest_table,
                                             job_config=job_config)
But I get the error:
google.api_core.exceptions.Forbidden: 403 Access Denied: File
gs://project_A/: Access Denied
You have to make sure service_acct_B has storage access to project_A:
In project_A,
go to IAM & admin
add member service_acct_B with (at a minimum) the Storage Object Viewer role
As a matter of fact, you don't need service_acct_A here, so
proj_a_client = storage.Client.from_service_account_json('service_acct_A.json') is redundant; see the sketch below.
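Under that assumption, a minimal sketch of the simplified code, keeping the names from the question and dropping the unused storage client, would look like this:

from google.cloud import bigquery

# Only the BigQuery client (project_B's service account) is needed: the load job
# reads from GCS directly once service_acct_B has Storage Object Viewer in project_A.
proj_b_client = bigquery.Client.from_service_account_json('service_acct_B.json')

dest_table = proj_b_client.dataset('DS_B').table('huge_table')
uri = 'gs://project_A//*'  # URI exactly as given in the question
job_config = bigquery.LoadJobConfig()

load_job = proj_b_client.load_table_from_uri(uri,
                                             dest_table,
                                             job_config=job_config)
load_job.result()  # wait for the load to complete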
