Airflow BigQuery Hook: how to save results in a Python variable?

I am using the BigQuery hook in my Airflow code.
Query example: select count(*) from `table-name`;
It returns a single integer as a result.
How can I save it in an integer Python variable instead of an entire pandas DataFrame?
Below is my code example,
# import path varies by Airflow version; on older releases it is
# airflow.contrib.hooks.bigquery_hook
from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook
from google.cloud import bigquery

hook = BigQueryHook(bigquery_conn_id=BQ_CON, use_legacy_sql=False)
bq_client = bigquery.Client(project=hook._get_field("project"), credentials=hook._get_credentials())
query = "select count(*) from dataset1.table1;"
df = bq_client.query(query).to_dataframe()

If it is just a single value, you can alias the column as col1 and access it by that key name:
query = "select count(*) as col1 from dataset1.table1;"
rows = bq_client.query(query).result()   # waits for the job; returns a RowIterator
result = list(rows)[0]["col1"]
Or, if you have already called to_dataframe():
result = int(df.iloc[0, 0])
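Alternatively, depending on your Airflow version, BigQueryHook inherits the DbApiHook helpers, so you may not need to build a client at all (a minimal sketch, assuming get_first is available on your hook class):
# get_first runs the query and returns the first row as a tuple;
# availability and import path vary across Airflow versions
hook = BigQueryHook(bigquery_conn_id=BQ_CON, use_legacy_sql=False)
row_count = int(hook.get_first("select count(*) from dataset1.table1")[0])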

Related

Print Results of Queries Python

My application uses SQLAlchemy/SQL to query a database. I want to print out the result of the query, but I get a <sqlalchemy.engine.result.ResultProxy> object in response.
I tried the suggestions in How to access the results of queries?, but I am getting an "Uncaught exception".
See code below:
query = f"SELECT COUNT(DISTINCT id)"\
f"FROM group"
result = db.session.execute(query)
id_count = result.first()[0]
print(id_count)
Try this one. Note that group is a reserved word in most SQL dialects, so the table name should be quoted, and your two string fragments concatenate without a space between ) and FROM:
# quote the reserved table name; use `group` in MySQL, "group" in most others
query = 'SELECT COUNT(DISTINCT id) ' \
        'FROM "group"'
result = db.session.execute(query)
row = result.first()            # first (and only) row
print(row[0])                   # access by positional index
# For queries with named columns, rows also support:
#   row['my_column']            # access by column name as a string
#   dict(row.items())           # convert to a dict keyed by column names
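For a single-value aggregate like this, SQLAlchemy's scalar() is a concise alternative: it returns the first column of the first row directly (a minimal sketch, assuming the same db.session as above):
# scalar() collapses a one-row, one-column result to a plain Python value
id_count = db.session.execute('SELECT COUNT(DISTINCT id) FROM "group"').scalar()
print(id_count)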

In MySQL and Python, how to access fields by field name instead of field index

python & mysql
I am making a query on a MySQL database in a Python module, as follows:
qry = "select qtext, a1, a2, a3, a4, rightanswer from question where qno = 1"
mycursor.execute(qry)
myresult = mycursor.fetchone()
qtext.insert('1', myresult[0])
I access the fields by their index number (i.e. myresult[0]).
My question is: how can I access the fields by their field name instead of their index?
I had to add the following line before executing the query:
mycursor = mydb.cursor(dictionary=True)
This makes the cursor return each row as a dictionary, which enabled me to access fields by their names instead of their index, as follows:
qtext.insert('1', myresult["qtext"])
qanswer1.insert('1',myresult["a1"]) # working
qanswer2.insert('1',myresult["a2"]) # working
qanswer3.insert('1',myresult["a3"]) # working
qanswer4.insert('1',myresult["a4"]) # working
r = int(myresult["rightanswer"])
Here is your answer: How to retrieve SQL result column value using column name in Python?
cursor.execute("SELECT name, category FROM animal")
result_set = cursor.fetchall()
for row in result_set:
print "%s, %s" % (row["name"], row["category"])```

Combining an external temp table from Cloud Storage with a pre-existing BigQuery table: append from Python

I have a permanent table in BigQuery that I want to append to with data coming from a CSV in Google Cloud Storage. I first read the CSV file into a BigQuery temp table:
table_id = "incremental_custs"
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = [
    "gs://location/to/csv/customers_5083983446185_test.csv"
]
external_config.schema = schema
external_config.options.skip_leading_rows = 1
job_config = bigquery.QueryJobConfig(table_definitions={table_id: external_config})
sql_test = "SELECT * FROM `{table_id}`;".format(table_id=table_id)
query_job = bq_client.query(sql_test,job_config=job_config)
customer_updates = query_job.result()
print(customer_updates.total_rows)
Up to here everything works and I retrieve the records from the temp table. The issue arises when I try to combine it with the permanent table:
sql = """
create table `{project_id}.{dataset}.{table_new}` as (
select customer_id, email, accepts_marketing, first_name, last_name,phone,updated_at,orders_count,state,
total_spent,last_order_name,tags,ll_email,points_approved,points_spent,guest,enrolled_at,ll_updated_at,referral_id,
referred_by,referral_url,loyalty_tier_membership,insights_segment,rewards_claimed
from (
select * from `{project_id}.{dataset}.{old_table}`
union all
select * from `{table_id}`
ORDER BY customer_id, orders_count DESC
))
order by orders_count desc
""".format(project_id=project_id, dataset=dataset_id, table_new=table_new, old_table=old_table, table_id=table_id)
query_job = bq_client.query(sql)
query_result = query_job.result()
I get the following error:
BadRequest: 400 Table name "incremental_custs" missing dataset while no default dataset is set in the request.
Am I missing something here? Thanks!
Arf, you forgot the external config! You don't pass it in your second script:
query_job = bq_client.query(sql)
Simply update it like in the first one:
query_job = bq_client.query(sql, job_config=job_config)
A fresh look is always easier!
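Put differently, the table_definitions mapping must accompany every query that references the external temp table. A sketch of the corrected second call, reusing the objects built in the first snippet:
# the external table definition travels with the job config, not the session
job_config = bigquery.QueryJobConfig(table_definitions={table_id: external_config})
query_job = bq_client.query(sql, job_config=job_config)
query_result = query_job.result()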

How to pass a date string into an SQL query using Jinja

I need to pass a date '20200303' into an SQL query that is read and templated with Jinja.
The Python script is as follows:
import jinja2

record_date = '20170303'
sql_data = {'date': record_date}
for file in files_sql:
    with open(file) as file_reader:
        sql_template = file_reader.read()
    sql_template = jinja2.Template(sql_template)
    sql_query = sql_template.render(data=sql_data)
    spark.sql(sql_query)
The SQL query file (query_A.sql) it reads looks like this:
SELECT * FROM table where date <= {{date}}
However, this is not working and returns 0 rows. What am I doing wrong here?
EDIT: fixed key from record_date to date, but still having issues
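Two likely culprits stand out here, assuming the date column is stored as a string: render(data=sql_data) binds the dictionary to the name data, so {{date}} resolves to nothing; and even once bound, the rendered value needs quotes to compare as a string literal. A sketch of both fixes:
from jinja2 import Template

record_date = '20170303'
sql_data = {'date': record_date}

# hypothetical one-file version of the loop above
sql_template = Template("SELECT * FROM table WHERE date <= '{{ date }}'")
# render(**sql_data) exposes the dict keys as top-level template variables,
# so {{ date }} resolves; the quotes in the template keep the comparison a string
sql_query = sql_template.render(**sql_data)
print(sql_query)  # SELECT * FROM table WHERE date <= '20170303'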

Passing Array Parameter to SQL for BigQuery in Python

I have a set of IDs (~200k) and I need to get all the rows in a BigQuery table with those IDs. I tried to construct a list in Python and pass it as a parameter to the SQL query using @, but I get a TypeError: 'ArrayQueryParameter' object is not iterable error. Here is the code I tried (very similar to https://cloud.google.com/bigquery/querying-data#running_parameterized_queries):
id_list = ['id1', 'id2']
query = """
SELECT id
FROM `my-db`
WHERE id in UNNEST(@ids)
"""
query_job = client.run_async_query(
    str(uuid.uuid4()),
    query,
    query_parameters=(
        bigquery.ArrayQueryParameter('ids', 'ARRAY<STRING>', id_list)
    )
)
Probably the issue here is that you are not passing a tuple to the function.
Try adding a comma before closing the parenthesis, like so:
id_list = ['id1', 'id2']
query = """
SELECT id
FROM `my-db`
WHERE id in UNNEST(@ids)
"""
query_job = client.run_async_query(
    str(uuid.uuid4()),
    query,
    query_parameters=(
        bigquery.ArrayQueryParameter('ids', 'STRING', id_list),
    )
)
In Python if you do:
t = (1)
and then run:
type(t)
You will find the result to be int. But if you do:
t = (1,)
Then it results in a tuple.
You need to use 'STRING' rather than 'ARRAY<STRING>' for the array element type, e.g.:
query_parameters=(
    bigquery.ArrayQueryParameter('ids', 'STRING', id_list),
)
The example from the querying data topic is:
def query_array_params(gender, states):
    client = bigquery.Client()
    query = """
        SELECT name, sum(number) as count
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        WHERE gender = @gender
        AND state IN UNNEST(@states)
        GROUP BY name
        ORDER BY count DESC
        LIMIT 10;
        """
    query_job = client.run_async_query(
        str(uuid.uuid4()),
        query,
        query_parameters=(
            bigquery.ScalarQueryParameter('gender', 'STRING', gender),
            bigquery.ArrayQueryParameter('states', 'STRING', states)))
    query_job.use_legacy_sql = False

    # Start the query and wait for the job to complete.
    query_job.begin()
    wait_for_job(query_job)
    print_results(query_job.results())
The above answers are a better solution, but you may find a use for this too when quickly drafting something in notebooks:
Turn the list into a comma-separated string of quoted values, then interpolate the string into the query like so:
id_list = ['id1', 'id2']
# format into a query-valid string: "id1","id2"
id_string = '"' + '","'.join(id_list) + '"'
client = bigquery.Client()
query = f"""
SELECT id
FROM `my-db`
WHERE id in ({id_string})
"""
query_job = client.query(query)
results = query_job.result()
If you want to use the simple client.query method, rather than client.run_async_query as shown in the answers above, you can pass an additional QueryJobConfig parameter. Simply add your arrays to query_parameters using bigquery.ArrayQueryParameter.
The following code worked for me:
query = f"""
SELECT distinct pipeline_commit_id, pipeline_id, name
FROM `{self.project_id}.{self.dataset_id}.pipelines_{self.table_suffix}`,
UNNEST(labels) AS label
where label.value IN UNNEST(#labels)
"""
job_config = bigquery.QueryJobConfig(
query_parameters=[
bigquery.ArrayQueryParameter('labels', 'STRING', labels)
]
)
query_job = self.client.query(query, job_config=job_config)
Based on those examples:
https://cloud.google.com/bigquery/docs/parameterized-queries
