My Python code is like so:
from google.cloud import bigquery
client = bigquery.Client(
    project='my-project',
    credentials=credentials,
)
sql = '''
CREATE OR REPLACE TABLE `my-project.my_dataset.test` AS
WITH some_table AS (
SELECT * FROM `my-project.my_dataset.table_1`
),
some_other_table AS (
SELECT id, some_column FROM `my-project.my_dataset.table_2`
)
SELECT * FROM some_table
LEFT JOIN some_other_table ON some_table.unique_id=some_other_table.id
'''
query_job = client.query(sql)
query_job.result()
The query works in the Google BigQuery Console UI, but not when executed as above from Python.
I understand that using CREATE OR REPLACE makes this a DDL statement, which I cannot figure out how to execute from the Python library. You can set the destination table in the job config, which lets you CREATE a table, but then you don't get the CREATE OR REPLACE functionality.
Thanks for any assistance.
After carefully reviewing the documentation, I can say that the Python SDK for BigQuery doesn't specify a way to perform DDL statements as a query. You can find the documented code for the query function you are using here. As you can see, the query parameter expects a SQL statement.
Despite that, I tried to reproduce your problem and it worked for me. I could create the table perfectly by using a DDL statement as you're trying to do. Hence we can conclude that the API considers DDL a subset of SQL.
I suggest you comment with the error you're receiving so I can provide better support.
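For reference, here is a minimal sketch of the kind of DDL call that worked for me; the project, dataset, and table names are placeholders rather than yours:
from google.cloud import bigquery

client = bigquery.Client(project='my-project')  # placeholder project id

ddl = '''
CREATE OR REPLACE TABLE `my-project.my_dataset.test` AS
SELECT 1 AS id
'''
query_job = client.query(ddl)  # DDL statements go through query() like any other SQL
query_job.result()  # waits for completion and raises if the statement failed
print(query_job.errors)  # None when the statement succeeded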
Related
I have created a temporary Bigquery table using Python and loaded data from a panda dataframe (code snippet given below).
client = bigquery.Client(project)
client.create_table(tmp_table)
client.load_table_from_dataframe(df,tmp_table)
The table is being created successfully and I can run select queries from web UI.
But when I run a select query using Python:
query =f"""select * from {tmp_table.project_id}.{tmp_table.dataset_id}.{tmp_table.table_id} """
It throws the error "select * would expand to zero columns".
This is because Python is not able to detect any schema; the statement below prints an empty schema:
print(tmp_table.schema)
If I hardcode the table name like below, it works fine:
query =f"""select * from project_id.dataset_id.table_id """
Can someone suggest how I can get data from the temporary table using a select query in Python? I can't hardcode the table name as it's being created at runtime.
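One possible fix, sketched below on the assumption that the load job simply hasn't finished (and the local Table object is stale) by the time the query runs: wait on the load job, then re-fetch the table so its schema is populated. The project, dataset, and table names here are placeholders:
from google.cloud import bigquery

client = bigquery.Client(project='my-project')  # placeholder project id
table_id = 'my-project.my_dataset.tmp_table'    # placeholder table id

load_job = client.load_table_from_dataframe(df, table_id)
load_job.result()  # block until the load has actually finished

tmp_table = client.get_table(table_id)  # re-fetch so .schema is populated
print(tmp_table.schema)

query = f'select * from `{tmp_table.project}.{tmp_table.dataset_id}.{tmp_table.table_id}`'
rows = client.query(query).result()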
I created a Python script that pushes a pandas dataframe into Google BigQuery, and it looks as though I'm able to query the table directly from GBQ. However, another user is unable to view the results when they query that same table I generated on GBQ. This seems to be a BigQuery issue, because when they tried to connect to GBQ and query the table indirectly using pandas, it worked fine (pd.read_gbq("SELECT * FROM ...", project_id)). What is causing this strange behaviour?
(Screenshots omitted: one showing what I'm seeing, the other what they are seeing.)
I've encountered this when loading tables to BigQuery via Python GBQ. If you take the following steps, the table will display properly:
1. Load the dataframe to BigQuery via Python GBQ.
2. Run SELECT * FROM uploaded_dataset.uploaded_dataset; doing so will properly show the table.
3. Within the BigQuery UI, save the table (as a new table name).
From there, you will be able to see the table properly. Unfortunately, I don't know how to resolve this without a manual step in the UI.
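For concreteness, a minimal sketch of the load-and-read-back round trip with pandas-gbq; the dataset/table names and project id are placeholders:
import pandas as pd
import pandas_gbq

df = pd.DataFrame({'id': [1, 2], 'name': ['foo', 'bar']})

# Step 1: load the dataframe (replace the table if it already exists)
pandas_gbq.to_gbq(df, 'uploaded_dataset.uploaded_table', project_id='my-project', if_exists='replace')

# Reading it back through pandas works even when the UI misbehaves
result = pandas_gbq.read_gbq('SELECT * FROM uploaded_dataset.uploaded_table', project_id='my-project')
print(result)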
I want to list and describe the tables present in an Oracle database.
To do this by connecting to the database with a client such as SQL Plus, a working approach is:
List the tables:
select tablespace_name, table_name from all_tables;
Get columns and data types for each table:
describe [table_name];
However, when using cx_Oracle through Python, cur.execute('describe [table_name]') results in an 'invalid SQL' error.
How can we use describe with cx_Oracle in Python?
It seems you can't.
From cx_Oracle, instead of describe, query the data dictionary (note that Oracle stores unquoted identifiers in uppercase):
cur.execute('select column_name, data_type from all_tab_columns where table_name = :tbl', tbl='MY_TABLE')
(From Richard Moore here http://cx-oracle-users.narkive.com/suaWH9nn/cx-oracle4-3-1-describe-table-query-is-not-working)
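Putting the two steps together, a self-contained sketch from Python; the connection credentials and the EMP table name are placeholders:
import cx_Oracle

# Placeholder credentials/DSN for illustration
conn = cx_Oracle.connect('scott', 'tiger', 'localhost/XEPDB1')
cur = conn.cursor()

# List the tables, as with the SQL Plus query above
cur.execute('select tablespace_name, table_name from all_tables')
for tablespace_name, table_name in cur:
    print(tablespace_name, table_name)

# Describe one table via the data dictionary; unquoted names are stored uppercase
cur.execute(
    'select column_name, data_type, data_length from all_tab_columns '
    'where table_name = :tbl order by column_id',
    tbl='EMP')
for column_name, data_type, data_length in cur.fetchall():
    print(column_name, data_type, data_length)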
As noted by others there is no ability to describe directly. I created a set of libraries and tools that let you do this, however. You can see them here: https://github.com/anthony-tuininga/cx_OracleTools.
I have a database with records stretching back to 2014 that I have to migrate to BigQuery, and I think that using the partitioned tables feature will help with the performance of the database.
So far, I loaded a small sample of the real data via the web UI, and while the table was already partitioned, all the data went to the single partition for the date on which I ran the query, which was expected, to be fair.
I searched the documentation sites and I ran into this, which I'm not sure if that's what I'm looking for.
I have two questions:
1) In the above example, they use the decorator on a SELECT query, but can I use it on an INSERT query as well?
2) I'm using the Python client to connect to the BigQuery API, and while I found the table.insert_data method, I couldn't find anything that refers specifically to inserting into partitions, and I'm wondering if I missed it or whether I will have to use the query API to insert data as well.
Investigated this a bit more:
1) I don't think I've managed to run an INSERT query at all, but this is moot for me, because..
2) Turns out that it is possible to insert into the partitions directly using the Python client, but it wasn't obvious to me:
I was using this snippet to insert some data into a table:
from google.cloud import bigquery

items = [
    (1, 'foo'),
    (2, 'bar'),
]

client = bigquery.Client()
dataset = client.dataset('<dataset>')
table = dataset.table('<table_name>')
table.reload()  # fetch the table metadata, including the schema
print(table.insert_data(items))  # returns a list of per-row insert errors
The key is appending a $ and a date (say, 20161201) to the table name in the selector, like so:
table = dataset.table('<table_name>$20161201')
And it should insert into the correct partition.
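For what it's worth, with current versions of the client library (where insert_data has been replaced), the same partition-decorator trick looks roughly like this; the table name is a placeholder:
from google.cloud import bigquery

client = bigquery.Client()

rows = [
    {'id': 1, 'name': 'foo'},
    {'id': 2, 'name': 'bar'},
]

# The $YYYYMMDD decorator targets one partition of a day-partitioned table
errors = client.insert_rows_json('my-project.my_dataset.my_table$20161201', rows)
print(errors)  # an empty list means every row was streamed successfully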
I have an SQL query that runs on the Postgres database of my Django-based webapp. The query runs against the data stored by Django-Notifications (a reusable app) and returns a list of email addresses that have not opted out of a specific notice type.
What I would really like is to build an application that does this on demand, so I'm looking for an example of how to convert the SQL so it can run inside a Django view that returns a formatted email list. The SQL is currently:
select email from emailconfirmation_emailaddress where verified and user_id not in
    (select user_id from notification_noticesetting s join notification_noticetype t on s.notice_type_id = t.id
     where t.label = 'announcement' and not s.send);
You might have to make appropriate adjustments as far as model names go, since you didn't show them in your question:
users_to_exclude = Noticesetting.objects.filter(send=False, notice_type__label='announcement').values('user')
emails = Emailaddress.objects.filter(verified=True).exclude(user__in=users_to_exclude)
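To serve that on demand, a sketch of a view; the view name and JSON response format are invented for illustration, and the import paths are assumptions to adjust to wherever these models actually live:
from django.http import JsonResponse

from emailconfirmation.models import Emailaddress  # assumed import path
from notification.models import Noticesetting      # assumed import path

def announcement_emails(request):  # hypothetical view name
    users_to_exclude = Noticesetting.objects.filter(
        send=False, notice_type__label='announcement'
    ).values('user')
    emails = (Emailaddress.objects
              .filter(verified=True)
              .exclude(user__in=users_to_exclude)
              .values_list('email', flat=True))
    return JsonResponse({'emails': list(emails)})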