I cannot find a way to append results of my query to a table in BigQuery that already exists and is partitioned by hour.
I have only found this solution: https://cloud.google.com/bigquery/docs/writing-results#writing_query_results.
job_config = bigquery.QueryJobConfig(destination=table_id)
sql = """SELECT * FROM table1 JOIN table2 ON table1.art_n=table2.artn"""
# Start the query, passing in the extra configuration.
query_job = client.query(sql, job_config=job_config) # Make an API request.
query_job.result() # Wait for the job to complete.
But providing a destination table to bigquery.QueryJobConfig overwrites it, and I could not find an option on bigquery.QueryJobConfig to specify something like if_exists. As far as I understand, I need to apply job.insert to the query results, but I do not understand how.
I also did not find any good advice elsewhere; maybe someone can point me to some?
Just in case, my real query is huge and I load it from a separate JSON file.
When you create your job_config, you need to set the write_disposition to WRITE_APPEND:
[..]
job_config = bigquery.QueryJobConfig(
    allow_large_results=True,
    destination=table_id,
    write_disposition='WRITE_APPEND'
)
[..]
See here.
You can add the following line to append data to the existing table:
job_config.write_disposition = 'WRITE_APPEND'
Complete Code:
from google.cloud import bigquery
client = bigquery.Client()
job_config = bigquery.QueryJobConfig(destination="myproject.mydataset.target_table")
job_config.write_disposition = 'WRITE_APPEND'
sql = """SELECT * FROM table1 JOIN table2 ON table1.art_n=table2.artn"""
query_job = client.query(sql, job_config=job_config)
query_job.result()
The parameter that you were looking for is called write_disposition. You want to use WRITE_APPEND to append to a table.
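For completeness, the same setting can also be passed with the client library's enum constant instead of the raw string. A minimal sketch, assuming placeholder project/dataset/table IDs and the join query from the question:
from google.cloud import bigquery

client = bigquery.Client()

# WriteDisposition.WRITE_APPEND is the enum equivalent of the string 'WRITE_APPEND'
job_config = bigquery.QueryJobConfig(
    destination="myproject.mydataset.target_table",  # placeholder table ID
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

query_job = client.query(
    "SELECT * FROM table1 JOIN table2 ON table1.art_n = table2.artn",
    job_config=job_config,
)
query_job.result()  # wait for the append to complete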
Related
Hi, is there a way to delete rows in BigQuery from a Python script? I tried looking in the documentation and finding an example on the internet, but I could not find anything.
Something that looks like this:
table_id = "a.dataset.table" # Table ID for faulty_gla_entry
statement = """ DELETE FROM a.dataset.table where value = 2 """
client.delete(table_id, statement)
As @SergeyGeron stated, https://googleapis.dev/python/bigquery/latest/usage/index.html#bigquery-basics has some nice examples.
I wrote something like this:
from google.cloud import bigquery
client = bigquery.Client()
query = """DELETE FROM a.dataset.table WHERE value = 4"""
query_job = client.query(query)
print(query_job.result())
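If you want confirmation of how many rows were removed rather than printing the raw result iterator, the finished job object exposes a counter for DML statements; a small sketch along the same lines:
from google.cloud import bigquery

client = bigquery.Client()

query_job = client.query("DELETE FROM a.dataset.table WHERE value = 4")
query_job.result()  # wait for the DML statement to finish

# num_dml_affected_rows reports how many rows the DELETE removed
print(f"Deleted {query_job.num_dml_affected_rows} rows")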
Here you can see the documentation for executing a query with Python.
You can see this example code, with the DELETE statement:
from google.cloud import bigquery
client = bigquery.Client()
dml_statement = (
    "Delete from dataset.Inventory where ID=5"
)
query_job = client.query(dml_statement) # API request
query_job.result() # Waits for statement to finish
How to build queries in BigQuery
I'm using the following python code to delete some rows in a bigquery table:
from google.cloud import bigquery
bigquery_client = bigquery.Client(project='my-project')
query_delete = f"""
delete from `my_table` where created_at >= '2021-01-01'
"""
print(query_delete)
# query_delete
job = bigquery_client.query(query)
job.result()
print("Deleted!")
However, the rows don't seem to be deleted when doing this from python. What am I missing?
I think the code snippet below should work for you. You should pass query_delete instead of query:
from google.cloud import bigquery
bigquery_client = bigquery.Client(project='my-project')
query_delete = f"""
delete from `my_table` where created_at >= '2021-01-01'
"""
print(query_delete)
job = bigquery_client.query(query_delete)
job.result()
print("Deleted!")
Or you can try the query formatted as follows:
query_delete = (
    "DELETE from my_table "
    "WHERE created_at >= '2021-01-01'"
)
You probably need to enable standard SQL for your BigQuery queries.
https://stackoverflow.com/a/42831957/1683626
It should be the default, as noted in the official documentation, but maybe you changed it to legacy SQL:
In the Cloud Console and the client libraries, standard SQL is the default.
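If you want to be explicit about the SQL dialect instead of relying on the default, the job config exposes a flag for it; a minimal sketch, assuming the same delete statement as above:
from google.cloud import bigquery

bigquery_client = bigquery.Client(project='my-project')

# use_legacy_sql=False explicitly requests standard SQL
# (it is already the default in the client libraries)
job_config = bigquery.QueryJobConfig(use_legacy_sql=False)

job = bigquery_client.query(
    "DELETE FROM `my_table` WHERE created_at >= '2021-01-01'",
    job_config=job_config,
)
job.result()
print("Deleted!")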
I'm using the python client to create tables via SQL as explained in the docs (https://cloud.google.com/bigquery/docs/tables) like so:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table('your_table_id')
job_config.destination = table_ref
sql = """
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location='US',
    job_config=job_config)  # API request - starts the query
query_job.result() # Waits for the query to finish
print('Query results loaded to table {}'.format(table_ref.path))
This works well, except that the client function for creating a table via a SQL query uses a job_config object, and job_config receives a table_ref, not a table object, so there is no obvious place to set a description.
I found this doc for creating tables with a description here: https://google-cloud-python.readthedocs.io/en/stable/bigquery/usage.html, but that is for tables NOT created from queries.
Any ideas on how to create a table from query while specifying a description for that table?
Since you want to do more than just save the SELECT result to a new table, the best approach is not to use a destination table in your job_config variable, but rather a CREATE TABLE statement.
So you need to do 2 things:
Remove the following 2 lines from your code:
table_ref = client.dataset(dataset_id).table('your_table_id')
job_config.destination = table_ref
Replace your SQL with this:
#standardSQL
CREATE TABLE dataset_id.your_table_id
PARTITION BY DATE(_PARTITIONTIME)
OPTIONS(
  description = 'this table was created via agent #123'
) AS
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
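To run that DDL from the Python client, pass the statement to client.query() without setting a destination on the job config; the CREATE TABLE statement itself names the target table. A rough sketch, reusing the statement above (dataset and table IDs are placeholders):
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE TABLE dataset_id.your_table_id
PARTITION BY DATE(_PARTITIONTIME)
OPTIONS(
  description = 'this table was created via agent #123'
) AS
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus
"""

query_job = client.query(ddl)  # no job_config / destination needed for DDL
query_job.result()             # waits for the table to be created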
Hi, I am using BigQuery and submitting queries through its Python API to get results. I am using the method bqclient.query("PASS THE QUERY") to execute the query programmatically. I am trying to do a performance test, but BigQuery returns cached results. Is there a way I can set cache = False in the Python API when calling the bqclient.query method? In the BigQuery documentation I have seen that we can set the useQueryCache property to false, but I am not sure where to set it.
Current Code
job_config = bigquery.QueryJobConfig()
job_config.use_query_cache = False
query_job = bqclient.query(select_query, job_config=job_config)
select_query represents the query that I want to run.
Thank you
You need to set useQueryCache. See here for more info. Note the lower case underscore format in the Python client:
[..]
QUERY = ('SELECT ..')
job_config = bigquery.QueryJobConfig()
job_config.use_query_cache = False
query_job = client.query(QUERY, job_config=job_config)
[..]
This is the query that I have been running in BigQuery and that I want to run in my Python script. How would I change this / what do I have to add for it to run in Python?
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
From what I have researched, it seems that I can't save this query as a permanent table using Python. Is that true? And if it is, is it still possible to export a temporary table?
You need to use the BigQuery Python client lib, then something like this should get you up and running:
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
# Note: run_async_query() comes from an older version of the client library;
# see the current tutorial linked below for the QueryJobConfig equivalent.
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html
See the current BigQuery Python client tutorial.
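For reference, the equivalent of the snippet above with the current client looks roughly like this; it uses a QueryJobConfig with a destination table and write_disposition, which also shows that query results can indeed be saved to a permanent table from Python (project, dataset and table IDs are placeholders):
from google.cloud import bigquery

client = bigquery.Client(project='PROJECT_ID')

# Destination table and overwrite behaviour, equivalent to the legacy
# job.destination / job.write_disposition settings above
job_config = bigquery.QueryJobConfig(
    destination='PROJECT_ID.dataset.table',  # placeholder table ID
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

query = """
SELECT Serial,
       MAX(createdAt) AS Latest_Use,
       SUM(ConnectionTime/3600) AS Total_Hours,
       COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000
"""

query_job = client.query(query, job_config=job_config)
query_job.result()  # waits for the job to finish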
Here is another way using a JSON file for the service account:
>>> from google.cloud import bigquery
>>>
>>> CREDS = 'test_service_account.json'
>>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
>>> job = client.query('select * from dataset1.mytable')
>>> for row in job.result():
... print(row)
This is a good usage guide:
https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html
To simply run and write a query:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table("your_table_id")
job_config.destination = table_ref
sql = """
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
)  # API request - starts the query
query_job.result() # Waits for the query to finish
print("Query results loaded to table {}".format(table_ref.path))
I personally prefer querying using pandas:
# BQ authentication
import pandas as pd
import pydata_google_auth

SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]

credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)

query = "SELECT * FROM my_table"
data = pd.read_gbq(query, project_id=MY_PROJECT_ID, credentials=credentials, dialect='standard')
The pythonbq package is very simple to use and a great place to start. It uses python-gbq.
To get started you would need to generate a BQ JSON key for external app access. You can generate your key here.
Your code would look something like:
from pythonbq import pythonbq
myProject = pythonbq(
    bq_key_path='path/to/bq/key.json',
    project_id='myGoogleProjectID'
)
SQL_CODE="""
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
"""
output=myProject.query(sql=SQL_CODE)