I am using the BigQuery hook in my Airflow code.
Query example: select count(*) from `table-name`;
so it returns only a single integer as a result.
How can I save it in a Python integer variable instead of an entire pandas DataFrame?
Below is my code example,
hook = BigQueryHook(bigquery_conn_id=BQ_CON, use_legacy_sql=False)
bq_client = bigquery.Client(project=hook._get_field("project"), credentials=hook._get_credentials())
query = "select count(*) from dataset1.table1;"
df = bq_client.query(query).to_dataframe()
If it is just a single row, you can alias the column as col1 and access it by that key name:
query = "select count(*) as col1 from dataset1.table1;"
rows = bq_client.query(query).result()  # returns a RowIterator
result = next(iter(rows))['col1']
or, if you have already called to_dataframe():
result = int(df.values[0])
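Either way, the pattern is the same: take the first row, then the first field. A minimal sketch of just that extraction step, simulating the rows with plain tuples instead of a live BigQuery result (so the data here is illustrative, not the client's API):

```python
# Simulated stand-in for the iterable of rows a query result yields;
# each row supports positional indexing, like a BigQuery Row does.
rows = [(42,)]

# First row, first column -> the scalar count
count = next(iter(rows))[0]
print(count)  # 42
```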
My application uses SQLAlchemy/SQL to query a database. I want to print out the result of the query, but I am getting a <sqlalchemy.engine.result.ResultProxy object ...> in response.
I tried out the suggestions in How to access the results of queries? but I am getting an "Uncaught exception"
See code below:
query = f"SELECT COUNT(DISTINCT id)"\
f"FROM group"
result = db.session.execute(query)
id_count = result.first()[0]
print(id_count)
Try this one (also note the missing space between id) and FROM in your f-strings, which produces the invalid SQL "...id)FROM group"):
query = (
    "SELECT COUNT(DISTINCT id) "
    "FROM group"
)
result = db.session.execute(query)
row = result.first()  # a single Row, or None if there are no results
print(row[0])                # access by positional index
print(row['my_column'])      # access by column name (the label from your SELECT)
r_dict = dict(row.items())   # convert to a dict keyed by column names
Python & MySQL
I am making a query on MySQL database in python module, as follows :
qry = "select qtext,a1,a2,a3,a4,rightanswer from question where qno = 1"
mycursor.execute(qry)
myresult = mycursor.fetchone()
qtext.insert('1', myresult[0])
I access the fields by their index number (i.e myresult[0])
My question is: how can I access fields by their field name instead of their index in the query?
I had to add the following line before executing the query:
mycursor = mydb.cursor(dictionary=True)
This line makes the cursor return each row as a dictionary, which lets me access fields by their names instead of by index, as follows:
qtext.insert('1', myresult["qtext"])
qanswer1.insert('1',myresult["a1"]) # working
qanswer2.insert('1',myresult["a2"]) # working
qanswer3.insert('1',myresult["a3"]) # working
qanswer4.insert('1',myresult["a4"]) # working
r = int(myresult["rightanswer"])
Here is your answer: How to retrieve SQL result column value using column name in Python?
cursor.execute("SELECT name, category FROM animal")
result_set = cursor.fetchall()
for row in result_set:
    print("%s, %s" % (row["name"], row["category"]))  # requires a dictionary cursor
I have a permanent table in bigquery that I want to append to with data coming from a csv in google cloud storage. I first read the csv file into a big query temp table:
table_id = "incremental_custs"
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = [
    "gs://location/to/csv/customers_5083983446185_test.csv"
]
external_config.schema=schema
external_config.options.skip_leading_rows = 1
job_config = bigquery.QueryJobConfig(table_definitions={table_id: external_config})
sql_test = "SELECT * FROM `{table_id}`;".format(table_id=table_id)
query_job = bq_client.query(sql_test,job_config=job_config)
customer_updates = query_job.result()
print(customer_updates.total_rows)
Up to this point everything works and I can retrieve the records from the temp table. The issue arises when I then try to combine it with a permanent table:
sql = """
create table `{project_id}.{dataset}.{table_new}` as (
select customer_id, email, accepts_marketing, first_name, last_name,phone,updated_at,orders_count,state,
total_spent,last_order_name,tags,ll_email,points_approved,points_spent,guest,enrolled_at,ll_updated_at,referral_id,
referred_by,referral_url,loyalty_tier_membership,insights_segment,rewards_claimed
from (
select * from `{project_id}.{dataset}.{old_table}`
union all
select * from `{table_id}`
ORDER BY customer_id, orders_count DESC
))
order by orders_count desc
""".format(project_id=project_id, dataset=dataset_id, table_new=table_new, old_table=old_table, table_id=table_id)
query_job = bq_client.query(sql)
query_result = query_job.result()
I get the following error:
BadRequest: 400 Table name "incremental_custs" missing dataset while no default dataset is set in the request.
Am I missing something here? Thanks !
Arf, you forgot the external config! You don't pass it in your second script:
query_job = bq_client.query(sql)
Simply pass the job_config like in the first one:
query_job = bq_client.query(sql, job_config=job_config)
A fresh look is always easier!
I need to pass a date '20200303' into a SQL query that is read from a file and rendered with Jinja.
The Python script is as follows:
record_date = '20170303'
sql_data = {'date': record_date}
for file in files_sql:
    with open(file) as file_reader:
        sql_template = file_reader.read()
    sql_template = jinja2.Template(sql_template)
    sql_query = sql_template.render(data=sql_data)
    spark.sql(sql_query)
the sql query file (query_A.sql) it's reading looks like this:
SELECT * FROM table where date <= {{date}}
However, this is not working and returns 0 rows. What am I doing wrong here?
EDIT: fixed key from record_date to date, but still having issues
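For what it's worth, a likely cause (an assumption, since the full template isn't shown): render(data=sql_data) exposes the dict under the name data, so a bare {{date}} stays undefined and renders empty, and the date literal also needs quoting so the comparison is against a string. A minimal sketch of both fixes with jinja2:

```python
import jinja2

record_date = '20170303'

# Quote the placeholder so the rendered SQL compares against a string literal,
# and pass the variable under the same name the template uses.
sql_template = jinja2.Template("SELECT * FROM table where date <= '{{ date }}'")
sql_query = sql_template.render(date=record_date)
print(sql_query)  # SELECT * FROM table where date <= '20170303'
```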
I have a set of IDs (~200k) and I need to get all the rows in a BigQuery table with those IDs. I tried to construct a list in Python and pass it as a query parameter using @, but I get a TypeError: 'ArrayQueryParameter' object is not iterable error. Here is the code I tried (very similar to https://cloud.google.com/bigquery/querying-data#running_parameterized_queries):
id_list = ['id1', 'id2']
query = """
SELECT id
FROM `my-db`
WHERE id in UNNEST(@ids)
"""
query_job = client.run_async_query(
    str(uuid.uuid4()),
    query,
    query_parameters=(
        bigquery.ArrayQueryParameter('ids', 'ARRAY<STRING>', id_list)
    )
)
Probably the issue here is that you are not passing a tuple to the function.
Try adding a comma before closing the parenthesis, like so:
id_list = ['id1', 'id2']
query = """
SELECT id
FROM `my-db`
WHERE id in UNNEST(@ids)
"""
query_job = client.run_async_query(
    str(uuid.uuid4()),
    query,
    query_parameters=(
        bigquery.ArrayQueryParameter('ids', 'STRING', id_list),
    )
)
In Python if you do:
t = (1)
and then run:
type(t)
You will find the result to be int. But if you do:
t = (1,)
Then it results in a tuple.
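A quick check makes the difference concrete:

```python
t1 = (1)   # parentheses alone do not make a tuple
t2 = (1,)  # the trailing comma does

print(type(t1))  # <class 'int'>
print(type(t2))  # <class 'tuple'>
```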
You need to use 'STRING' rather than 'ARRAY<STRING>' for the array element type, e.g.:
query_parameters=(
bigquery.ArrayQueryParameter('ids', 'STRING', id_list)
The example from the querying data topic is:
def query_array_params(gender, states):
    client = bigquery.Client()
    query = """
        SELECT name, sum(number) as count
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE gender = @gender
        AND state IN UNNEST(@states)
        GROUP BY name
        ORDER BY count DESC
        LIMIT 10;
        """
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=(
            bigquery.ScalarQueryParameter('gender', 'STRING', gender),
            bigquery.ArrayQueryParameter('states', 'STRING', states)))
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())
The answers above are a better solution, but you may find a use for this too when quickly drafting something in notebooks:
turn the list into a single string of comma-separated, double-quoted values, then interpolate that string into the query like so:
id_list = ['id1', 'id2']
# format into a query valid string
id_string = '"'+'","'.join(id_list)+'"'
client = bigquery.Client()
query = f"""
SELECT id
FROM `my-db`
WHERE id IN ({id_string})
"""
query_job=client.query(query)
results = query_job.result()
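When drafting this way it is worth printing the joined string once to sanity-check what actually gets interpolated into the SQL; note that the IN list must be wrapped in parentheses:

```python
id_list = ['id1', 'id2']

# Join the values with '","' and wrap the whole thing in quotes
id_string = '"' + '","'.join(id_list) + '"'
print(id_string)            # "id1","id2"
print(f'IN ({id_string})')  # IN ("id1","id2")
```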
If you want to use the simple client.query, not client.run_async_query as shown in the answers above, you can pass an additional QueryJobConfig parameter. Simply add your arrays to query_parameters using bigquery.ArrayQueryParameter.
The following code worked for me:
query = f"""
    SELECT distinct pipeline_commit_id, pipeline_id, name
    FROM `{self.project_id}.{self.dataset_id}.pipelines_{self.table_suffix}`,
    UNNEST(labels) AS label
    where label.value IN UNNEST(@labels)
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ArrayQueryParameter('labels', 'STRING', labels)
    ]
)
query_job = self.client.query(query, job_config=job_config)
Based on those examples:
https://cloud.google.com/bigquery/docs/parameterized-queries