I'm running parameterised BigQuery queries inside a Flask app exactly as described in Google's docs.
I'm seeing some unexpected results, so I simply want to print the query to my terminal/console for debugging. When I do this, I only see the query with the parameterised placeholders, not the values.
Does anyone know how to get a view of the query with the values being run?
For example:
query = "select * from dogs where breed = @dog_breed"
query_params = [
    bigquery.ScalarQueryParameter("dog_breed", "STRING", "kokoni")
]
job_config = bigquery.QueryJobConfig()
job_config.query_parameters = query_params
print(query)  # This will only print query as above, not with value 'kokoni'
query_job = client.query(
    query,
    job_config=job_config,
)
You could use the list_jobs method to retrieve the information from the Job class, like in the example below:
from google.cloud import bigquery
client = bigquery.Client()
# List the 3 most recent jobs in reverse chronological order.
# Omit the max_results parameter to list jobs from the past 6 months.
print("Last 3 jobs:")
for job in client.list_jobs(max_results=3):  # API request(s)
    print(job.query)
    print(job.query_parameters)
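If you only need a readable version of the query for local debugging, another option is to substitute the values into the text yourself before printing. The sketch below is an illustration, not part of the BigQuery client: `render_query_for_debug` and `Param` are hypothetical names, and `Param` is a stand-in that assumes the same attribute names as `bigquery.ScalarQueryParameter` (`name`, `type_`, `value`). The rendered string is meant for logging only; keep sending the real parameters with the job.

```python
import re
from collections import namedtuple

# Hypothetical stand-in with the attribute names assumed for
# bigquery.ScalarQueryParameter, so the sketch runs without the client library.
Param = namedtuple("Param", ["name", "type_", "value"])


def render_query_for_debug(query, params):
    """Return the query text with @placeholders replaced by literal values.

    For logging only: do not execute the rendered string.
    """
    values = {p.name: p.value for p in params}

    def literal(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # leave unknown placeholders untouched
        value = values[name]
        if value is None:
            return "NULL"
        if isinstance(value, bool):
            return "TRUE" if value else "FALSE"
        if isinstance(value, (int, float)):
            return str(value)
        # Everything else is printed as a quoted string.
        escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"

    return re.sub(r"@(\w+)", literal, query)


query = "select * from dogs where breed = @dog_breed"
params = [Param("dog_breed", "STRING", "kokoni")]
rendered = render_query_for_debug(query, params)
print(rendered)
```

Because the helper only reads `name` and `value`, the `query_params` list from the question should work with it unchanged, provided those attribute names match your client version.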
Requirement: I want to create a Python API that inserts data into a BigQuery table. The API will be exposed through Swagger/Postman, so that a user can provide input data and have it reflected in the BigQuery table.
Can anyone help me find a suitable solution, with code?
import sqlite3 as sql
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('path/to/file.json')
project_id = 'project_id'
client = bigquery.Client(credentials= credentials,project=project_id)
def add_data(group_name, user_name):
    try:
        # Connecting to database
        con = sql.connect('shot_database.db')
        # Getting cursor
        c = con.cursor()
        # Adding data
        job_config.use_legacy_sql = True
        query_job = client.query("""
        INSERT INTO `table_name` (group, user)
        VALUES (%s, %s)""",job_config = job_config)
        results = query_job.result() # Wait for the job to complete.
        # Applying changes
        con.commit()
    except:
        print("An error has occured")
The code you provided is a mix of SQLite and BigQuery, but it looks like you're trying to use BigQuery to insert data into a table. To insert data into a BigQuery table using Python, you can use the insert_rows_json() method of the Client class. Here is an example of how you can use this method to insert data into a table called "mytable" in a dataset called "mydataset":
import json
# Define the data you want to insert
data = [
    {
        "group": group_name,
        "user": user_name
    }
]
# Insert the data
table_id = "mydataset.mytable"
errors = client.insert_rows_json(table_id, data)  # Returns a list of row errors
if not errors:
    print("Data inserted successfully")
else:
    print("Errors occurred while inserting data:")
    print(json.dumps(errors, indent=2))
Then you can create an API using Flask or Django and call the add_data function you have defined to insert data into the BigQuery table.
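One way to keep such an API easy to test is to separate the request handling from the BigQuery call. The sketch below is only an illustration under that assumption: `add_data` takes a hypothetical `insert_rows` callable (in a real app this might wrap the client's insert call for "mydataset.mytable"), and `fake_insert` is a stand-in so the example runs without credentials.

```python
def add_data(group_name, user_name, insert_rows):
    """Validate the input and pass a single row to insert_rows.

    insert_rows is any callable that takes a list of dicts and returns a
    list of errors (empty on success). It is a hypothetical hook, not a
    BigQuery API; the web framework's route would supply the real one.
    """
    if not group_name or not user_name:
        return {"ok": False, "errors": ["group and user are both required"]}
    errors = insert_rows([{"group": group_name, "user": user_name}])
    return {"ok": not errors, "errors": errors}


# Stand-in for the BigQuery call so the sketch runs without credentials.
captured = []


def fake_insert(rows):
    captured.extend(rows)
    return []  # an empty list means no row errors


response = add_data("admins", "alice", fake_insert)
print(response)
```

A Flask or Django view would then parse the request body, call `add_data` with the real insert callable, and return the resulting dictionary as JSON.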
Hi, is there a way to delete rows in BigQuery from a Python script? I looked through the documentation and searched for an example online, but I could not find anything.
I'm after something that looks like this:
table_id = "a.dataset.table" # Table ID for faulty_gla_entry
statement = """ DELETE FROM a.dataset.table where value = 2 """
client.delete(table_id, statement)
As @SergeyGeron stated, https://googleapis.dev/python/bigquery/latest/usage/index.html#bigquery-basics has nice stuff.
I wrote something like this:
from google.cloud import bigquery
client = bigquery.Client()
query = """DELETE FROM a.dataset.table WHERE value = 4"""
query_job = client.query(query)
print(query_job.result())
The documentation shows how to execute a query with Python.
Here is example code that uses a DELETE statement:
from google.cloud import bigquery
client = bigquery.Client()
dml_statement = (
"Delete from dataset.Inventory where ID=5"
)
query_job = client.query(dml_statement) # API request
query_job.result() # Waits for statement to finish
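To confirm that the statement actually removed something, the finished job can be inspected. The following is a small sketch rather than an official recipe: `delete_rows_with_value` is a hypothetical helper, `client` is assumed to behave like `bigquery.Client`, and it relies on the query job's `num_dml_affected_rows` attribute being populated once the DML statement has finished.

```python
def delete_rows_with_value(client, table, value):
    """Run a DELETE statement and return how many rows it removed.

    client is expected to behave like google.cloud.bigquery.Client.
    table must be a trusted, fully qualified table name (never user input).
    """
    # int() keeps the interpolated value numeric; for strings or other
    # user-supplied values, prefer query parameters over string formatting.
    sql = f"DELETE FROM `{table}` WHERE value = {int(value)}"
    query_job = client.query(sql)  # starts the DML job
    query_job.result()             # waits for the statement to finish
    return query_job.num_dml_affected_rows


# Example call (needs credentials, so it is left commented out):
# deleted = delete_rows_with_value(bigquery.Client(), "a.dataset.table", 4)
```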
I am trying to fetch data from BigQuery using Python. The code runs fine on my laptop but throws a memory error on a Linux server. Can this be optimized so that it runs on the server as well?
Details: the table has 5 million rows and the Linux machine has 8 GB of RAM. The error is "out of memory" and the process is killed.
Below is the code:
import os
from google.cloud import bigquery
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/Desktop/big_query_test/soy-serenity-89ed73.json"
client = bigquery.Client()
# Perform a query.
QUERY = "SELECT * FROM `soy-serenity-89ed73.events10`"
query_job = client.query(QUERY)
df = query_job.to_dataframe()  # Loads the whole result set into memory
I can suggest two approaches:
Option 1
SELECT the data in chunks to reduce the amount of data you receive from BigQuery on each iteration.
For example, if your table is partitioned, you can do this:
WHERE _PARTITIONTIME = currentLoopDate
where currentLoopDate is a date variable in your Python code (a similar option is to use ROW_NUMBER).
Option 2
Using the BigQuery client library, you can use the Jobs.insert API and set configuration.query.priority to batch.
# from google.cloud import bigquery
# client = bigquery.Client()
query = (
    'SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` '
    'WHERE state = "TX" '
    'LIMIT 100')
query_job = client.query(
    query,
    # Location must match that of the dataset(s) referenced in the query.
    location='US')  # API request - starts the query
for row in query_job:  # API request - fetches results
    # Row values can be accessed by field name or index
    assert row[0] == row.name == row['name']
    print(row)
See this link for some more details
After you get the jobId, write a loop that calls Jobs.getQueryResults, and set the API's maxResults parameter so that the data comes back in chunks.
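The chunked approach can be sketched in Python by consuming rows one at a time instead of building a DataFrame. This is a minimal illustration under stated assumptions: `stream_rows_to_csv` is a hypothetical helper, and it assumes each row exposes a `values()` method, as dicts do and as the client's row objects are expected to. With the real client, the rows would come from something like `query_job.result(page_size=10000)`, which fetches results page by page; the demo below uses stand-in dict rows so it runs without credentials.

```python
import csv
import os
import tempfile


def stream_rows_to_csv(rows, path, fieldnames):
    """Write rows to a CSV file one at a time and return the row count.

    Only the current row is held in memory, so the full result set never
    has to fit in RAM.
    """
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(fieldnames)
        for row in rows:
            writer.writerow(row.values())
            count += 1
    return count


# Stand-in rows (a lazy generator of dicts) in place of a BigQuery result.
fake_rows = ({"name": f"user{i}", "score": i} for i in range(5))
out_path = os.path.join(tempfile.mkdtemp(), "events.csv")
written = stream_rows_to_csv(fake_rows, out_path, ["name", "score"])
print(written, "rows written to", out_path)
```

The file can then be processed in pieces, for example with a chunked CSV reader, which keeps peak memory well below what a single 5-million-row DataFrame would need.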
Hi, I am using BigQuery and submitting queries through its Python API to get results. I am using the method bqclient.query("PASS THE QUERY") to execute the query programmatically. I am trying to do a performance test, but BigQuery returns cached results. Is there a way to set cache = False in the Python API when calling the bqclient.query method? From the BigQuery documentation I have seen that we can set the useQueryCache property to false, but I am not sure where to set it.
Current Code
job_config = bigquery.QueryJobConfig()
job_config.use_query_cache = False
query_job = bqclient.query(select_query, job_config=job_config)
select_query represents the query that I want to run.
Thank you
You need to set useQueryCache to false. See here for more info. Note that the Python client uses the lower-case underscore format, use_query_cache:
[..]
QUERY = ('SELECT ..')
job_config = bigquery.QueryJobConfig()
job_config.use_query_cache = False
query_job = client.query(QUERY, job_config=job_config)
[..]
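For the performance test itself, it can help to verify on each run that the result really was not served from cache. The sketch below is one hedged way to do that: `time_uncached_query` is a hypothetical helper, and `start_job` is assumed to be a callable that starts the query (for example with the job config above) and returns a job object exposing `result()` and `cache_hit`.

```python
import time


def time_uncached_query(start_job, repeats=3):
    """Time a query several times, failing if any run was a cache hit.

    start_job is a zero-argument callable that starts the query and returns
    a job-like object with .result() and .cache_hit.
    """
    timings = []
    for _ in range(repeats):
        started = time.perf_counter()
        job = start_job()
        job.result()  # wait for the query to finish
        elapsed = time.perf_counter() - started
        if job.cache_hit:
            raise RuntimeError("query was served from cache; timing is not meaningful")
        timings.append(elapsed)
    return timings


# Example call (needs credentials, so it is left commented out):
# timings = time_uncached_query(lambda: client.query(QUERY, job_config=job_config))
```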
This is the query that I have been running in BigQuery and that I want to run in my Python script. How would I change it, or what do I have to add, for it to run in Python?
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
From what I have been researching, it seems that I can't save this query as a permanent table using Python. Is that true? And if it is true, is it still possible to export a temporary table?
You need to use the BigQuery Python client lib. With older versions of the library (before 0.28, where run_async_query was replaced by client.query), something like this should get you up and running:
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition= 'WRITE_TRUNCATE'
job.begin()
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html
See the current BigQuery Python client tutorial.
Here is another way using a JSON file for the service account:
>>> from google.cloud import bigquery
>>>
>>> CREDS = 'test_service_account.json'
>>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
>>> job = client.query('select * from dataset1.mytable')
>>> for row in job.result():
...     print(row)
This is a good usage guide:
https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html
To simply run and write a query:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table("your_table_id")
job_config.destination = table_ref
sql = """
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
)  # API request - starts the query
query_job.result() # Waits for the query to finish
print("Query results loaded to table {}".format(table_ref.path))
I personally prefer querying using pandas:
# BQ authentication
import pandas as pd
import pydata_google_auth
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]
credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)
MY_PROJECT_ID = "my-project-id"  # Placeholder: replace with your own project ID
query = "SELECT * FROM my_table"
data = pd.read_gbq(query, project_id=MY_PROJECT_ID, credentials=credentials, dialect='standard')
The pythonbq package is very simple to use and a great place to start. It uses python-gbq.
To get started you would need to generate a BQ json key for external app access. You can generate your key here.
Your code would look something like:
from pythonbq import pythonbq
myProject=pythonbq(
    bq_key_path='path/to/bq/key.json',
    project_id='myGoogleProjectID'
)
SQL_CODE="""
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
"""
output=myProject.query(sql=SQL_CODE)