Issue with BigQuery table created using a DataFrame in Python - python

I have created a temporary BigQuery table using Python and loaded data from a pandas DataFrame (code snippet given below).
from google.cloud import bigquery

client = bigquery.Client(project)
client.create_table(tmp_table)
client.load_table_from_dataframe(df, tmp_table)
The table is created successfully and I can run SELECT queries from the web UI.
But when I run a select query using Python:
query = f"""select * from {tmp_table.project_id}.{tmp_table.dataset_id}.{tmp_table.table_id}"""
it throws the error: select * would expand to zero columns.
This is because Python does not detect any schema for the table object; the statement below shows an empty schema:
print(tmp_table.schema)
If I hardcode the table name as below, it works fine:
query = f"""select * from project_id.dataset_id.table_id"""
Can someone suggest how I can get data from the temporary table using a SELECT query in Python? I can't hardcode the table name as it's being created at runtime.
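A minimal sketch of one way this is often handled (not from the question; it assumes the missing schema comes from querying before the load job finishes, and the project/dataset/table IDs below are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID
tmp_table = client.create_table(bigquery.Table("my-project.my_dataset.tmp_table"))  # placeholder IDs

load_job = client.load_table_from_dataframe(df, tmp_table)
load_job.result()  # wait for the load to finish so the schema is detected

tmp_table = client.get_table(tmp_table)  # refresh the local Table object to pick up the schema
print(tmp_table.schema)

query = f"select * from `{tmp_table.project}.{tmp_table.dataset_id}.{tmp_table.table_id}`"
rows = client.query(query).result()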

Related

Unable to create table in Snowflake from Databricks for Kafka topic

I am connecting to a Kafka topic, applying some transformations to the DataFrame, and writing that data to Snowflake with the help of Databricks. If the table is present, the data is written successfully. However, if it is not present, I get this error:
net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Object 'new_table_name' does not exist or not authorized.
This is the code I am using:
def foreach_batch_function(df, epoch_id):
    df.write.format("snowflake").options(**sfOptions).option("dbtable", "new_table_name").mode('append').save()

query = my_df.writeStream.foreachBatch(foreach_batch_function).trigger(processingTime='30 seconds').start()
query.awaitTermination()
Note: with the same user, I can create a new table manually in Snowflake.
When the mode is changed to "overwrite", the table is created, but the old data from the previous batch gets deleted.
The preactions option did create the table, but it was not suitable for this scenario, since it runs for every batch and therefore fails.
So I used the following to create the table if it doesn't exist:
sf_utils = sc._jvm.net.snowflake.spark.snowflake.Utils
sf_utils.runQuery(sfOptions, "create table if not exists {table_name} like existing_table".format(table_name=db_table))
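For context, a sketch of how this could be wired into the streaming job (an assumption about placement, not the poster's exact code; db_table, sfOptions, my_df and existing_table are taken from above):

sf_utils = sc._jvm.net.snowflake.spark.snowflake.Utils

# run the DDL once before the stream starts; "if not exists" makes it safe to re-run
sf_utils.runQuery(sfOptions, "create table if not exists {t} like existing_table".format(t=db_table))

def foreach_batch_function(df, epoch_id):
    # the table now exists, so append mode works for every micro-batch
    df.write.format("snowflake").options(**sfOptions).option("dbtable", db_table).mode("append").save()

query = my_df.writeStream.foreachBatch(foreach_batch_function).trigger(processingTime="30 seconds").start()
query.awaitTermination()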

When I save a PySpark DataFrame with saveAsTable in AWS EMR Studio, where does it get saved?

I can save a DataFrame using df.write.saveAsTable('tableName') and read the resulting table with spark.table('tableName'), but I'm not sure where the table is actually getting saved.
It is stored under the default location of your database.
You can get the location by running the following Spark SQL query:
spark.sql("DESCRIBE TABLE EXTENDED tableName").show(truncate=False)
You can find the Location under the # Detailed Table Information section.
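If you want to pull the location programmatically rather than read it off the output, a small sketch (the table name is a placeholder):

# the DESCRIBE output has columns col_name / data_type / comment;
# the storage path sits in the data_type column of the 'Location' row
location = (
    spark.sql("DESCRIBE TABLE EXTENDED tableName")
    .filter("col_name = 'Location'")
    .collect()[0]["data_type"]
)
print(location)  # e.g. an s3:// or hdfs:// path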

Upload data to Exasol from a Python DataFrame

I wonder if there's any way to upload a DataFrame and create a new table in Exasol. import_from_pandas assumes the table already exists. Do we need to run SQL separately to create the table? For other databases, to_sql can just create the table if it doesn't exist.
Yes. As you mentioned, import_from_pandas requires an existing table, so you need to create the table before writing to it. You can run a CREATE TABLE ... script via connection.execute before using import_from_pandas. Also, to_sql needs a table, since based on the documentation it is translated into a SQL INSERT command.
Pandas' to_sql allows creating a new table if it does not exist, but it needs an SQLAlchemy connection, which is not supported for Exasol out of the box. However, there seems to be a SQLAlchemy dialect for Exasol you could use (I haven't tried it yet): sqlalchemy-exasol.
Alternatively, I think you have to use a create table statement and then populate the table via pyexasol's import_from_pandas.
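A rough sketch of that approach with pyexasol (the connection details, table name, and column definitions are placeholders that depend on your DataFrame):

import pandas as pd
import pyexasol

df = pd.DataFrame({"ID": [1, 2], "NAME": ["a", "b"]})

conn = pyexasol.connect(dsn="exasol-host:8563", user="sys", password="***", schema="MY_SCHEMA")

# import_from_pandas expects the target table to exist, so create it first
conn.execute("CREATE TABLE IF NOT EXISTS MY_TABLE (ID DECIMAL(18,0), NAME VARCHAR(100))")
conn.import_from_pandas(df, "MY_TABLE")
conn.close()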

CREATE OR REPLACE TABLE using the Google BigQuery Python library

My Python code is like so:
from google.cloud import bigquery
client = bigquery.Client(
    project='my-project',
    credentials=credentials,
)
sql = '''
CREATE OR REPLACE TABLE `my-project.my_dataset.test` AS
WITH some_table AS (
    SELECT * FROM `my-project.my_dataset.table_1`
),
some_other_table AS (
    SELECT id, some_column FROM `my-project.my_dataset.table_2`
)
SELECT * FROM some_table
LEFT JOIN some_other_table ON some_table.unique_id = some_other_table.id
'''
query_job = client.query(sql)
query_job.result()
The query works in the Google BigQuery Console UI, but not when executed as above from Python.
I understand that by using CREATE OR REPLACE this is a "DDL" request, which I cannot figure out how to execute from the Python library. You can set the destination table in the job_config, which lets you CREATE a table, but then you don't get the CREATE OR REPLACE functionality.
Thanks for any assistance.
After carefully reviewing the documentation, I can say that the Python SDK for BigQuery doesn't specify a way to perform DDL statements as a query. You can find the documented code for the query function you are using here. As you can see, the query parameter expects a SQL statement.
Despite that, I tried to reproduce your problem and it worked for me. I could create the table perfectly by using a DDL statement as you're trying to do. Hence we can conclude that the API considers DDL a subset of SQL.
I suggest that you comment with the error you're receiving so I can provide better support.
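For reference, a minimal sketch of running the DDL and surfacing whatever error BigQuery reports (the project, dataset, and query below are placeholders, not the poster's actual values):

from google.cloud import bigquery
from google.api_core.exceptions import BadRequest

client = bigquery.Client(project="my-project")
sql = "CREATE OR REPLACE TABLE `my-project.my_dataset.test` AS SELECT 1 AS id"

try:
    client.query(sql).result()  # DDL runs like any other query; result() waits for completion
except BadRequest as err:
    print(err.errors)  # the detailed reason reported by BigQuery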

Google BigQuery Results Don't Show

I created a Python script that pushes a pandas DataFrame into Google BigQuery, and it looks as though I'm able to query the table directly from GBQ. However, another user is unable to view the results when they query the same table I generated on GBQ. This seems to be a BigQuery issue, because when they connect to GBQ and query the table indirectly using pandas, it works fine (pd.read_gbq("SELECT * FROM ...", project_id)). What is causing this strange behaviour?
I've encountered this when loading tables to BigQuery via Python GBQ. If you take the following steps, the table will display properly:
1. Load the DataFrame to BigQuery via Python GBQ.
2. Run SELECT * FROM uploaded_dataset.uploaded_dataset; doing so will properly show the table.
3. Within the BigQuery UI, save the table (as a new table name).
From there, you will be able to see the table properly. Unfortunately, I don't know how to resolve this without a manual step in the UI.
