I'm getting the following cryptic error message when trying to import an AVRO file created with fastavro into BigQuery:
Error while reading data, error message: The Apache Avro library failed to read data with the following error:
Invalid branch index: 18446744073709551567, the leaves size is: 2 File:
I've searched all over the Internet, but I have no idea what this error actually means. Does anyone have an idea what the problem could be?
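For context, the file is produced along these lines; this is a simplified sketch, not my actual code, but like the real schema it contains a two-branch union (nullable) field, which seems to match the "leaves size is: 2" part of the error:

from fastavro import writer, parse_schema

# Simplified stand-in schema: "name" is a two-branch union (null | string).
schema = parse_schema({
    "type": "record",
    "name": "Row",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": ["null", "string"], "default": None},
    ],
})

records = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": None},
]

with open("data.avro", "wb") as out:
    writer(out, schema, records)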
You can use the code below to load Avro data from Cloud Storage into a new BigQuery table.
from google.cloud import bigquery
# Construct a BigQuery client object.
client = bigquery.Client()
# TODO(developer): Set table_id to the ID of the table to create.
table_id = "project.dataset.table"
job_config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.avro"
load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.
load_job.result() # Waits for the job to complete.
destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
You can refer to this document for more information.
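If the load keeps failing with the branch-index error, one way to narrow it down is to read the fastavro-generated file back locally before uploading it. A quick sketch (the path data.avro is a placeholder):

from fastavro import reader

with open("data.avro", "rb") as fo:
    avro_reader = reader(fo)
    print(avro_reader.writer_schema)   # schema embedded in the file header
    for record in avro_reader:
        pass                           # iterating forces every record to be decoded

If fastavro itself cannot decode the records, the problem is in how the file was written rather than in the BigQuery load job.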
I am trying to run a simple query on BigQuery from Python, following this document. To set up the client, I generated the JSON key file for my project via a service account:
import pandas as pd
from google.cloud import bigquery
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]=*****
client = bigquery.Client()
QUERY = (
    'SELECT name FROM `mythic-music-326213.mytestdata.trainData` '
    'LIMIT 100')
query_job = client.query(QUERY)
However, I am getting the following error:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Technically, I want to be able to query my dataset from Python. Any help would be appreciated.
I've tried your code snippet with my service account JSON file and dataset in my project. It worked as expected. Not clear why it's not working in your case.
However, you can try using the service account JSON file directly, like this:
import pandas as pd
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('<path to JSON file>')
client = bigquery.Client(credentials=credentials)
QUERY = (
    'SELECT state FROM `so-project-a.test.states` '
    'LIMIT 100')
query_job = client.query(QUERY)
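Since the question already imports pandas, you can then wait on the job and pull the rows into a DataFrame. A small sketch (to_dataframe requires pandas, and on recent library versions the db-dtypes package, to be installed):

rows = query_job.result()   # blocks until the query finishes
df = rows.to_dataframe()
print(df.head())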
I am trying to use BigQuery from Python to query a table that is generated from a Google Sheet:
from google.cloud import bigquery
# Prepare connection and query
bigquery_client = bigquery.Client(project="my_project")
query = """
select * from `table-from-sheets`
"""
df = bigquery_client.query(query).to_dataframe()
I can usually query BigQuery tables, but now I am getting the following error:
Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
What do I need to do to access Drive from Python?
Is there another way around this?
You are missing the scopes for the credentials. I'm pasting the code snippet from the official documentation.
In addition, do not forget to give the service account at least Viewer access to the Google Sheet.
from google.cloud import bigquery
import google.auth
# Create credentials with Drive & BigQuery API scopes.
# Both APIs must be enabled for your project before running this code.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]
)
# Construct a BigQuery client object.
client = bigquery.Client(credentials=credentials, project=project)
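With that client in place, the sheet-backed table can be queried as usual. A short sketch reusing the table name from the question (to_dataframe assumes pandas is installed):

query = """
select * from `table-from-sheets`
"""
df = client.query(query).to_dataframe()
print(df.head())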
I have a BigQuery table that is linked using the Firestore-to-BigQuery extension. However, I want BigQuery to be updated based on event triggers from Cloud Firestore (e.g. whenever a document is changed or inserted in a specific collection), so that the new data is streamed directly into the BigQuery table.
I also want the Cloud Function to be written in Python, but most of the online use cases/examples are in JS (example: https://blog.questionable.services/article/from-firestore-to-bigquery-firebase-functions/).
When I tried to follow https://cloud.google.com/functions/docs/calling/cloud-firestore and deploy the Cloud Function below, it kept failing.
import os
import json
import firebase_admin
from firebase_admin import credentials, firestore
from google.cloud import bigquery

cred = credentials.Certificate("YYYYY.json")
firebase_admin.initialize_app(cred)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "YYYYY.json"

db = firestore.client()
bq = bigquery.Client()
collection = db.collection('XXXX')
def hello_firestore(data, context):
    """Triggered by a change to a Firestore document.

    Args:
        data (dict): The event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    trigger_resource = context.resource
    print('Function triggered by change to: %s' % trigger_resource)
    print('\nOld value:')
    print(json.dumps(data["oldValue"]))
    print('\nNew value:')
    print(json.dumps(data["value"]))
Any ideas or suggestions are welcome. Thank you in advance.
I want to write a Python script that uploads data from a file to a BigQuery table.
Here is the code:
from google.cloud import bigquery
# from_service_account_json is a classmethod; pass project/location through it directly
client = bigquery.Client.from_service_account_json('my-key.json', project=project_id, location='US')
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_name)
with open(filename, 'rb') as source_file:
    job = client.load_table_from_file(source_file, table_ref)
job.result()  # wait for the load job to finish
I am using a GCP VM created in the same project as my BigQuery table, and a service account that has the BigQuery Admin role.
I get an error saying that the user doesn't have the bigquery.jobs.create permission.
I don't know if it is useful information, but I am able to read my table.
I don't know what to do.
Thanks for your help.
While trying to read the external table's data, I'm getting the error below and cannot solve the issue. Here are the details of the situation:
google.api_core.exceptions.NotFound: 404 Not found: Files /gdrive/id/id123456id
PS: id123456id is a dummy id.
The file with ID id123456id exists in my Google Drive, and the BigQuery table points to this ID.
bq_test.json -> the service account's credentials JSON file. This service account has these roles:
BigQuery Data Editor
BigQuery Data Owner
BigQuery Data Viewer
BigQuery User
Owner
Here is my code block:
from google.cloud import bigquery
from google.oauth2.service_account import Credentials
scopes = (
    'https://www.googleapis.com/auth/bigquery',
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive'
)
credentials = Credentials.from_service_account_file('bq_test.json')
credentials = credentials.with_scopes(scopes)
client = bigquery.Client(credentials=credentials)
QUERY = (
"""SELECT * FROM
`project_name.dataset_name.ext_table`
LIMIT 5"""
)
query_job = client.query(QUERY)
rows = query_job.result()
for row in rows:
    print(row.name)
I solved the problem as follows:
Go to https://console.cloud.google.com/iam-admin/iam?project=PROJECT_ID
Copy the service account's email address (like bq_test@PROJECT_ID.iam.gserviceaccount.com).
Go to https://drive.google.com and find the related file (id = id123456id).
Right-click the file and choose Share.
Paste the email address copied above (bq_test@PROJECT_ID.iam.gserviceaccount.com).
Choose Viewer (read-only) or whatever access you need.
This flow solved the problem in my case.
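If you prefer to script the sharing step instead of clicking through the Drive UI, the same grant can be made with the Drive API. A sketch, assuming the google-api-python-client package is installed and that the credentials used belong to an identity allowed to manage sharing on the file (typically the file's owner); the file ID and service account address are the dummies from the question:

import google.auth
from googleapiclient.discovery import build

# Application-default credentials carrying the Drive scope.
creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/drive"])
drive = build("drive", "v3", credentials=creds)

drive.permissions().create(
    fileId="id123456id",   # the Drive file the external table points at
    body={
        "type": "user",
        "role": "reader",   # read-only, matching the manual flow above
        "emailAddress": "bq_test@PROJECT_ID.iam.gserviceaccount.com",
    },
).execute()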