I created a Python script that pushes a pandas dataframe into Google BigQuery, and it looks as though I'm able to query the table directly from GBQ. However, another user is unable to view the results when they query that same table I generated on GBQ. This seems to be a BigQuery issue, because when they connected to GBQ and queried the table indirectly using pandas, it worked fine (pd.read_gbq("SELECT * FROM ...", project_id)). What is causing this strange behaviour?
What I'm seeing: (screenshot: the query returns rows in the BigQuery UI)
What they are seeing: (screenshot: the same query shows no results)
I've encountered this when loading tables to BigQuery via Python GBQ. If you take the following steps, the table will display properly:
1. Load the dataframe to BigQuery via Python GBQ.
2. Run SELECT * FROM uploaded_dataset.uploaded_dataset; doing so will properly show the table.
3. Within the BigQuery UI, save the table (under a new table name).
From there, you will be able to see the table properly. Unfortunately, I don't know how to resolve this without the manual step in the UI.
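For reference, a minimal sketch of the first two steps with pandas-gbq (project, dataset, and table names are placeholders; the save-as step still has to be done in the UI):

import pandas as pd
import pandas_gbq

# Placeholder identifiers -- substitute your own project/dataset/table.
project_id = "my-project"
destination = "uploaded_dataset.uploaded_dataset"

df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# Step 1: load the dataframe to BigQuery.
pandas_gbq.to_gbq(df, destination, project_id=project_id, if_exists="replace")

# Step 2: query the table back; this returns rows even while the UI preview lags.
result = pandas_gbq.read_gbq(f"SELECT * FROM {destination}", project_id=project_id)
print(result.head())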
I am currently working on a POC where we would like to get Snowflake query results into an email using Python.
For example: when executing an INSERT statement in Snowflake, I would like to capture the result showing how many records were inserted. Please note that we are using the Python Connector for Snowflake to execute our queries from a Python script. Also, we are using dataframes to store and process data internally.
Any help is appreciated!
Following the INSERT statement, you can retrieve the number of rows inserted from cursor.rowcount.
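A minimal sketch with the Snowflake Python Connector (connection parameters and table names are placeholders):

import snowflake.connector

# Placeholder connection parameters -- substitute your own account details.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="my_db",
    schema="my_schema",
)

cur = conn.cursor()
try:
    cur.execute("INSERT INTO my_table (id, name) SELECT id, name FROM staging_table")
    # rowcount reflects the number of rows affected by the last execute().
    print(f"{cur.rowcount} records were inserted")
finally:
    cur.close()
    conn.close()

The captured count can then be dropped into whatever email body you assemble elsewhere in the script.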
I have created a temporary BigQuery table using Python and loaded data from a pandas dataframe (code snippet given below).
from google.cloud import bigquery

client = bigquery.Client(project)
client.create_table(tmp_table)
client.load_table_from_dataframe(df, tmp_table)
The table is being created successfully and I can run select queries from the web UI.
But when I run a select query using Python:
query =f"""select * from {tmp_table.project_id}.{tmp_table.dataset_id}.{tmp_table.table_id} """
It throws the error "select * would expand to zero columns".
This is because Python is not able to detect any schema. The statement below returns null:
print(tmp_table.schema)
If I hardcode the table name like below, it works fine:
query = f"""select * from project_id.dataset_id.table_id"""
Can someone suggest how I can get data from the temporary table using a select query in Python? I can't hardcode the table name as it's being created at runtime.
I can save a dataframe using df.write.saveAsTable('tableName') and read it back with spark.table('tableName'), but where is the table actually being saved?
It is stored under the default location of your database.
You can get the location by running the following Spark SQL query:
spark.sql("DESCRIBE TABLE EXTENDED tableName").show(truncate=False)
You can find the Location under the # Detailed Table Information section of the output.
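A minimal sketch of pulling that row out in PySpark (the table name is a placeholder; by default a managed table lands under the spark-warehouse directory):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("locate-table").getOrCreate()

# Assume a table saved earlier with df.write.saveAsTable("tableName").
details = spark.sql("DESCRIBE TABLE EXTENDED tableName")

# Keep only the Location row; for a managed table it typically points at
# something like file:/path/to/spark-warehouse/tablename.
details.filter("col_name = 'Location'").show(truncate=False)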
I have a table in Google BigQuery (GBQ) with almost 3 million records (rows) so far, created from data coming from a MySQL db every day. This data is inserted into the GBQ table using a Python pandas dataframe (.to_gbq()).
What is the optimal way to sync changes from MySQL to GBQ, in this direction, with Python?
Several different ways to import data from MySQL to BigQuery that might suit your needs are described in this article. For example, binlog replication:
This approach (sometimes referred to as change data capture, or CDC) utilizes MySQL's binlog. MySQL's binlog keeps an ordered log of every DELETE, INSERT, and UPDATE operation, as well as Data Definition Language (DDL) statements performed by the database. After an initial dump of the current state of the MySQL database, the binlog changes are continuously streamed and loaded into Google BigQuery.
Seems to be exactly what you are searching for.
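Full binlog replication usually goes through a dedicated tool, but as a lighter-weight starting point, here is a hedged sketch of a timestamp-based incremental load with pandas and to_gbq (this is not CDC; the connection string, updated_at column, watermark handling, and table names are all assumptions):

import pandas as pd
import pandas_gbq
from sqlalchemy import create_engine

# Placeholder connection string and identifiers.
engine = create_engine("mysql+pymysql://user:password@host/source_db")
project_id = "my-project"
destination = "my_dataset.my_table"

# Assume the source table has an updated_at column and the last synced
# timestamp is tracked somewhere (a file, a small state table, etc.).
last_synced = "2024-01-01 00:00:00"

# Pull only the rows that changed since the last sync.
query = f"SELECT * FROM source_table WHERE updated_at > '{last_synced}'"
changed = pd.read_sql(query, engine)

# Append only the changed rows to BigQuery.
if not changed.empty:
    pandas_gbq.to_gbq(changed, destination, project_id=project_id, if_exists="append")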
I've been able to append to / create a table from a pandas dataframe using the pandas-gbq package, in particular the to_gbq method. However, when I want to check the table using the BigQuery web UI, I see the following message:
This table has records in the streaming buffer that may not be visible in the preview.
I'm not the only one to ask, and it seems that there's no solution to this yet.
So my questions are:
1. Is there a solution to the above problem (namely the data not being visible in the web UI)?
2. If there is no solution to (1), is there another way that I can append data to an existing table using the Python BigQuery API? (Note: the documentation says that I can achieve this by running an asynchronous query with writeDisposition=WRITE_APPEND, but the link it provides doesn't explain how to use it and I can't work it out; see the sketch after this list.)
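For what it's worth, the same write disposition can be set on a load job in the google-cloud-bigquery client; a hedged sketch, shown with load_table_from_dataframe rather than a query job (the table ID is a placeholder):

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Placeholder destination -- substitute your own project, dataset, and table.
table_id = "my-project.my_dataset.my_table"

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# WRITE_APPEND adds the rows to the existing table instead of replacing it.
job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
job.result()  # wait for the load job to finish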
That message is just a UI notice; it should not hold you back.
To check the data, run a simple query and see if it's there.
To read only the data that is still in the streaming buffer (on an ingestion-time partitioned table, where buffered rows have a NULL _PARTITIONTIME), use this query:
#standardSQL
SELECT COUNT(1)
FROM `dataset.table`
WHERE _PARTITIONTIME IS NULL
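And a minimal sketch of running such a check from Python with the google-cloud-bigquery client (dataset and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table reference -- substitute your own dataset and table.
query = "SELECT COUNT(1) AS n FROM `my_dataset.my_table`"

# Rows in the streaming buffer are included in query results even while
# the web UI preview still shows the buffering notice.
row = list(client.query(query).result())[0]
print(f"{row.n} rows visible to queries")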