I am trying to run a simple query on BigQuery from Python, following this document. To set up the client, I generated the JSON file for my project via a service account:
import pandas as pd
from google.cloud import bigquery
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]=*****
client = bigquery.Client()
QUERY = (
'SELECT name FROM `mythic-music-326213.mytestdata.trainData` '
'LIMIT 100')
query_job = client.query(QUERY)
However, I am getting the following error:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Technically, I want to be able to query my dataset from Python. Any help would be appreciated.
I've tried your code snippet with my service account JSON file and a dataset in my project, and it worked as expected, so it's not clear why it isn't working in your case.
However, you can try using the service account JSON file directly, like this:
import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('<path to JSON file>')
client = bigquery.Client(credentials=credentials)
QUERY = (
'SELECT state FROM `so-project-a.test.states` '
'LIMIT 100')
query_job = client.query(QUERY)
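If the explicit credentials load correctly, you can then materialize the results. A minimal follow-up sketch, relying on the pandas import from the snippet above:
# Wait for the query to finish and convert the result set to a DataFrame.
df = query_job.to_dataframe()
print(df.head())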
I am trying to use BigQuery from Python to query a table that is generated from a Google Sheet:
from google.cloud import bigquery
# Prepare connection and query
bigquery_client = bigquery.Client(project="my_project")
query = """
select * from `table-from-sheets`
"""
df = bigquery_client.query(query).to_dataframe()
I can usually run queries against BigQuery tables, but now I am getting the following error:
Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
What do I need to do to access Drive from Python? Is there another way around this?
You are missing the scopes for the credentials. I'm pasting the code snippet from the official documentation below.
In addition, do not forget to give the service account at least Viewer access in the Google Sheet.
from google.cloud import bigquery
import google.auth
# Create credentials with Drive & BigQuery API scopes.
# Both APIs must be enabled for your project before running this code.
credentials, project = google.auth.default(
scopes=[
"https://www.googleapis.com/auth/drive",
"https://www.googleapis.com/auth/bigquery",
]
)
# Construct a BigQuery client object.
client = bigquery.Client(credentials=credentials, project=project)
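With the Drive-scoped client in place, the query from the question should then go through, for example:
# Same query as before, now issued through the Drive-scoped client.
query = """
select * from `table-from-sheets`
"""
df = client.query(query).to_dataframe()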
After following this tutorial, I am able to run a script that prints all the details of my database. However, I have no clue how to actually do anything with said database! Here's my code:
from google.oauth2 import service_account
import googleapiclient.discovery
import json
SCOPES = ['https://www.googleapis.com/auth/sqlservice.admin']
SERVICE_ACCOUNT_FILE = 'credentials.json'
credentials = service_account.Credentials.from_service_account_file(
SERVICE_ACCOUNT_FILE, scopes=SCOPES)
sqladmin = googleapiclient.discovery.build('sqladmin', 'v1beta4', credentials=credentials)
response = sqladmin.instances().list(project='single-router-309308').execute()
print(json.dumps(
response, sort_keys=True, indent=2))
sqladmin.close()
This prints all the info. I tried various things to reach my table, products, but I can't get it to work and keep getting an AttributeError: 'Resource' object has no attribute 'execute' (or 'list') exception. I tried things like this:
response = sqladmin.projects().list().execute()
to view my tables as well, but it doesn't work. I believe this is the correct approach since I can connect, but I haven't figured it out yet. Does anybody know the answer?
As per the documentation, you should be able to get at the database that contains your table using the code below.
Note that you have a project, then an instance in the project, then a database in the instance, and your table is nested inside that database.
from pprint import pprint

# Project ID of the project that contains the instance.
project = 'single-router-309308'
# Database instance ID. You should have this from the printout above.
instance = 'my-instance'
# Name of the database in the instance. You can look this up in the Google Cloud
# console for your project if you aren't sure. Your table is inside this database.
database = 'my-database'

request = sqladmin.databases().get(project=project, instance=instance, database=database)
response = request.execute()  # returns a dictionary with the database's metadata
pprint(response)
I would suggest you take a look at the REST API reference for Cloud SQL for MySQL for further reading (https://cloud.google.com/sql/docs/mysql/admin-api/rest/v1beta4/databases/get).
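If you are not sure of the database name, you can also list the databases on the instance first. A minimal sketch, reusing the sqladmin client built in your question (databases().list() is part of the same v1beta4 API):
# List every database on the instance and print its name.
response = sqladmin.databases().list(project='single-router-309308', instance='my-instance').execute()
for db in response.get('items', []):
    print(db['name'])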
While trying to reach an external table's data, I'm getting the error below and cannot solve the issue. Here are the details of the situation:
google.api_core.exceptions.NotFound: 404 Not found: Files /gdrive/id/id123456id
PS: id123456id is a dummy id.
The file with ID id123456id exists in my Google Drive, and the BigQuery table points to this ID.
bq_test.json is the service account's credentials JSON file. This service account has these roles:
BigQuery Data Editor
BigQuery Data Owner
BigQuery Data Viewer
BigQuery User
Owner
Here is my code block:
from google.cloud import bigquery
from google.oauth2.service_account import Credentials
scopes = (
'https://www.googleapis.com/auth/bigquery',
'https://www.googleapis.com/auth/cloud-platform',
'https://www.googleapis.com/auth/drive'
)
credentials = Credentials.from_service_account_file('bq_test.json')
credentials = credentials.with_scopes(scopes)
client = bigquery.Client(credentials=credentials)
QUERY = (
"""SELECT * FROM
`project_name.dataset_name.ext_table`
LIMIT 5"""
)
query_job = client.query(QUERY)
rows = query_job.result()
for row in rows:
print(row.name)
I solved the problem as follows:
1. Go to https://console.cloud.google.com/iam-admin/iam?project=PROJECT_ID
2. Copy the service account's email address (e.g. bq_test@PROJECT_ID.iam.gserviceaccount.com).
3. Go to https://drive.google.com and find the related file (ID = id123456id).
4. Right-click the file and choose Share.
5. Paste the email address from step 2.
6. Choose read-only access, or whatever you need.
This flow solved the problem in my case.
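If you prefer to grant the access programmatically rather than through the Drive UI, the Drive API can create the same permission. A hedged sketch, assuming the google-api-python-client package and reusing the Drive-scoped credentials from the question (file ID and email address are the same placeholders as above):
from googleapiclient.discovery import build

drive = build('drive', 'v3', credentials=credentials)
permission = {
    'type': 'user',
    'role': 'reader',
    'emailAddress': 'bq_test@PROJECT_ID.iam.gserviceaccount.com',
}
# Grant the service account read access to the Drive file behind the external table.
drive.permissions().create(fileId='id123456id', body=permission).execute()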
Is there a way to retrieve data from Google Cloud Storage without third-party apps?
I tried with Python but I'm getting the error below.
import json
from httplib2 import Http
from oauth2client.client import SignedJwtAssertionCredentials
from apiclient.discovery import build
# Change these variables to fit your case
client_email = '*******.iam.gserviceaccount.com'
json_file = 'C:\******'
cloud_storage_bucket = 'pubsite_prod_rev_********'
report_to_download = 'installs_********_201901_app_version'
private_key = json.loads(open(json_file).read())['my private key here']
credentials = SignedJwtAssertionCredentials('*******@gmail.com', private_key)
storage = build('storage', 'v1', http=credentials.authorize(Http()))
print storage.objects().get()
bucket=cloud_storage_bucket
object=report_to_download).execute()
Python throws this error:
multiple statements found while compiling a single statement
The error message comes from incorrect Python formatting, syntax, or indentation.
Look at the last statement you want to execute; the bucket and object arguments should go inside the get() call:
print storage.objects().get(
bucket=cloud_storage_bucket,
object=report_to_download).execute()
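As an aside, oauth2client (and SignedJwtAssertionCredentials with it) has long been deprecated. If you can move to Python 3, the google-cloud-storage client makes the same download much shorter. A minimal sketch, keeping the placeholder bucket and report names from your question (the key path is hypothetical):
from google.cloud import storage

# Authenticate directly with the service account JSON key file.
client = storage.Client.from_service_account_json('C:\\path\\to\\key.json')
bucket = client.bucket('pubsite_prod_rev_********')
blob = bucket.blob('installs_********_201901_app_version')
print(blob.download_as_bytes())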
I am trying to load a Cloud Firestore export in Google Cloud Storage into BigQuery using the Python API. I need to load only a select few fields, for which I want to use the --projection_fields parameter. However, I haven't been able to use this parameter in my code. I'm referring to this doc: https://cloud.google.com/bigquery/docs/loading-data-cloud-firestore
I am using the google.cloud library, but I cannot find this field in the bigquery or firestore libraries.
Any tip on how to set this field via the Python API would be of great help.
import os
from google.cloud import bigquery
creds_file_path = "xxxx.json"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = creds_file_path
bigquery_client = bigquery.Client()
dataset_ref = bigquery_client.dataset('abcd')
job_config = bigquery.LoadJobConfig()
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
job_config.source_format = bigquery.SourceFormat.DATASTORE_BACKUP
Reviewing the Python client library changelog, it seems that this option is not supported yet. However, you can use the workaround below to include the projectionFields property and, for that matter, any property that is not yet supported by the client but is supported by the API.
my_list_of_properties = [] # The properties you want to include on the table
job_config._set_sub_prop('projectionFields', my_list_of_properties)
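Putting it together with your snippet, a full load call might look like the sketch below. Keep in mind that _set_sub_prop is a private helper, so it may break in a future library version; the GCS URI and table name here are hypothetical:
# Hypothetical export path and destination table name.
uri = 'gs://my-bucket/export/all_namespaces/kind_products/all_namespaces_kind_products.export_metadata'
table_ref = dataset_ref.table('products')
load_job = bigquery_client.load_table_from_uri(uri, table_ref, job_config=job_config)
load_job.result()  # wait for the load job to finish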