I am trying to query data from Snowflake into a Jupyter Notebook. Since some columns were not present in the original table, I created a temporary table with the required new columns. Unfortunately, due to work restrictions, I can't show the whole output here, but when I ran the CREATE TEMPORARY TABLE command I got the following output:
Table CUSTOMER_ACCOUNT_NEW successfully created.
Here is the query I used to make the TEMP table.
CREATE OR REPLACE TEMPORARY TABLE DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT_NEW AS
SELECT ID,
       VERIFICATION_PROFILE,
       get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults')::VARCHAR AS identitymind,
       get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults."mm:1"')::VARCHAR AS mm1,
       get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults."mm:2"')::VARCHAR AS mm2,
       get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults.res')::VARCHAR AS res,
       get_path(VERIFICATION_PROFILE,'identityMindMostRecentResults."ss:1"')::VARCHAR AS sanctions,
       get_path(VERIFICATION_PROFILE,'autoVerified.facts.account.riskScore')::VARCHAR AS riskscore,
       get_path(VERIFICATION_PROFILE,'autoVerified.facts.giact.verificationResponse')::VARCHAR AS GIACT,
       get_path(VERIFICATION_PROFILE,'autoVerified.facts.account.type')::VARCHAR AS acct_type,
       get_path(VERIFICATION_PROFILE,'autoVerified.verified')::VARCHAR AS verified,
       get_path(VERIFICATION_PROFILE,'bankInformationProvided')::VARCHAR AS Bank_info_given,
       get_path(VERIFICATION_PROFILE,'businessInformationProvided')::VARCHAR AS Business_info_given,
       get_path(VERIFICATION_PROFILE,'autoVerified.facts.account.industry.riskLevel')::VARCHAR AS industry_risk
FROM DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT
WHERE DATEDIFF('day', TO_DATE(TIME_UPDATED), CURRENT_DATE()) <= 90
I would like to mention that VERIFICATION_PROFILE is a JSON blob, hence I had to use get_path to retrieve the values. Moreover, keys like mm:1 have to be double-quoted in the path, so I used them as-is, and the query works fine in Snowflake.
Then, using the Snowflake Connector for Python, I tried to run the following query:
import pandas as pd
import warnings
import snowflake.connector as sf
ctx = sf.connect(
    user='*****',
    password='*****',
    account='*******',
    warehouse='********',
    database='DATA_LAKE',
    schema='CUSTOMER'
)
#create cursor
curs = ctx.cursor()
sqlnew2 = "SELECT * \
FROM DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT_NEW;"
curs.execute(sqlnew2)
df = curs.fetch_pandas_all()
Here curs is the cursor object created earlier. Then I got the following message:
ProgrammingError: 002003 (42S02): SQL compilation error:
Object 'DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT_NEW' does not exist or not authorized.
May I know whether the Snowflake connector allows us to query data from temporary tables or not? Help/advice is greatly appreciated.
Temp tables only live as long as the session in which they were created:
Temporary tables can have a Time Travel retention period of 1 day; however, a temporary table is purged once the session (in which the table was created) ends so the actual retention period is for 24 hours or the remainder of the session, whichever is shorter.
You might want to use a transient table instead:
https://docs.snowflake.com/en/user-guide/tables-temp-transient.html#comparison-of-table-types
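So the connector can query a temporary table, but only over the same connection (session) that created it. A minimal sketch of that pattern, reusing the masked connection settings from the question and eliding most of the column list:

import snowflake.connector as sf

ctx = sf.connect(
    user='*****', password='*****', account='*******',
    warehouse='********', database='DATA_LAKE', schema='CUSTOMER'
)
curs = ctx.cursor()

# Create the temp table on THIS connection; it disappears when the session ends.
curs.execute("""
    CREATE OR REPLACE TEMPORARY TABLE CUSTOMER_ACCOUNT_NEW AS
    SELECT ID,
           VERIFICATION_PROFILE
           -- ...the remaining get_path(...) columns from the question...
    FROM DATA_LAKE.CUSTOMER.CUSTOMER_ACCOUNT
    WHERE DATEDIFF('day', TO_DATE(TIME_UPDATED), CURRENT_DATE()) <= 90
""")

# Query it before closing the connection.
curs.execute("SELECT * FROM CUSTOMER_ACCOUNT_NEW")
df = curs.fetch_pandas_all()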
I'm trying to dump a pandas DataFrame into an existing Snowflake table (via a Jupyter notebook).
When I run the code below no errors are raised, but no data is written to the destination SF table (df has ~800 rows).
import os

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

sf_engine = create_engine(
    URL(
        user=os.environ['SF_PROD_EID'],
        password=os.environ['SF_PROD_PWD'],
        account=account,
        warehouse=warehouse,
        database=database,
    )
)
df.to_sql(
    "test_table",
    con=sf_engine,
    schema=schema,
    if_exists="append",
    index=False,
    chunksize=16000,
)
If I check the SF History, I can see that the queries apparently ran without issue.
If I pull the query from the SF History UI and run it manually in the Snowflake UI the data shows up in the destination table.
If I try to use locopy I run into the same issue.
If the table does not exist before hand, the same code above creates the table and drops the rows no problem.
Here's where it gets weird. When I run the pd.to_sql command to try to append and then drop the destination table, if I then issue a select count(*) from destination_table, a table still exists with that name and holds (only) the data that I've been trying to drop. Could this be a case-sensitive table-naming situation?
Any insight is appreciated :)
Try adding role="<role>" and schema="<schema>" to the URL:
engine = create_engine(URL(
    account=os.getenv("SNOWFLAKE_ACCOUNT"),
    user=os.getenv("SNOWFLAKE_USER"),
    password=os.getenv("SNOWFLAKE_PASSWORD"),
    role="<role>",
    warehouse="<warehouse>",
    database="<database>",
    schema="<schema>"
))
The issue was due to how I set up the database connection and the case-sensitivity of the table name. It turns out I was writing to a table called DB.SCHEMA."db.schema.test_table" (note that the db.schema slug became part of the table name). Don't be like me, kids. Use upper-case table names in Snowflake!
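A sketch of that fix: keep database and schema in the engine URL so they can't leak into the table name, and pass a plain upper-case name to to_sql (variable names here mirror the question):

import os
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

sf_engine = create_engine(URL(
    user=os.environ['SF_PROD_EID'],
    password=os.environ['SF_PROD_PWD'],
    account=account,
    warehouse=warehouse,
    database=database,
    schema=schema,      # schema lives in the URL, not in to_sql
))

# A plain upper-case name resolves to an ordinary Snowflake identifier
# instead of a quoted, case-sensitive "db.schema.test_table".
df.to_sql("TEST_TABLE", con=sf_engine, if_exists="append", index=False, chunksize=16000)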
I am trying to experiment with creating new tables from existing BQ tables, all within Python. So far I've successfully created a table using similar code, but now I want to add another column to it from another table, which I have not managed to do. I think the problem is somewhere in my SQL.
Basically what I want here is to add another column named "ip_address" and put all the info from another table into that column.
I've tried splitting the two SQL statements and running them separately, and I've tried many different combinations of the commands (taking out CHAR, adding (32) after it, combining everything into one statement, etc.), and I still run into problems.
from google.cloud import bigquery

def alter(client, sql_alter, job_config, table_id):
    query_job = client.query(sql_alter, job_config=job_config)
    query_job.result()
    print(f'Query results appended to table {table_id}')

def main():
    client = bigquery.Client.from_service_account_json('my_json')
    table_id = 'ref.datasetid.tableid'
    job_config = bigquery.QueryJobConfig()
    sql_alter = """
        ALTER TABLE `ref.datasetid.tableid`
        ADD COLUMN ip_address CHAR;
        INSERT INTO `ref.datasetid.tableid` ip_address
        SELECT ip
        FROM `ref.datasetid.table2id`;
    """
    alter(client, sql_alter, job_config, table_id)

if __name__ == '__main__':
    main()
With this code, the current error is "400 Syntax error: Unexpected extra token INSERT at [4:9]". Also, do I have to keep referencing my table as ref.datasetid.tableid, or can I write just tableid? I've run into errors before it gets that far, so I'm still not sure. Still a beginner, so help is greatly appreciated!
BigQuery does not support ALTER TABLE or other DDL statements. Take a look at Modifying table schemas; there you can find an example of how to add a new column when you append data to a table during a load job.
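A rough sketch of that approach with the same client library; assumptions here: the column becomes a STRING (BigQuery has no CHAR type), and the table paths are the question's placeholders:

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('my_json')
table = client.get_table('ref.datasetid.tableid')       # API request

# Append the new column to a copy of the existing schema, then patch the table.
new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField('ip_address', 'STRING'))
table.schema = new_schema
client.update_table(table, ['schema'])                  # API request

# Backfilling the column from `ref.datasetid.table2id` would then be a separate
# query job, joining the two tables on whatever key relates their rows.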
I'm using SQL Server 2014, pandas 0.23.4, sqlalchemy 1.2.11, pyodbc 4.0.24, and Python 3.7.0. I have a very simple stored procedure that performs an UPDATE on a table and then a SELECT on it:
CREATE PROCEDURE my_proc_1
    @v2 INT
AS
BEGIN
    UPDATE my_table_1
    SET v2 = @v2
    ;
    SELECT * FROM my_table_1
    ;
END
GO
This runs fine in MS SQL Server Management Studio. However, when I try to invoke it via Python using this code:
import pandas as pd
from sqlalchemy import create_engine

if __name__ == "__main__":
    conn_str = 'mssql+pyodbc://@MODEL_TESTING'
    engine = create_engine(conn_str)
    with engine.connect() as conn:
        df = pd.read_sql_query("EXEC my_proc_1 33", conn)
        print(df)
I get the following error:
sqlalchemy.exc.ResourceClosedError: This result object does not return
rows. It has been closed automatically.
(Please let me know if you want full stack trace, I will update if so)
When I remove the UPDATE from the stored proc, the code runs and the results are returned. Note also that selecting from a table other than the one being updated does not make a difference; I get the same error. Any help is much appreciated.
The issue is that the UPDATE statement is returning a row count, which is a scalar value, and the rows returned by the SELECT statement are "stuck" behind the row count where pyodbc cannot "see" them (without additional machinations).
It is considered a best practice to ensure that our stored procedures always start with a SET NOCOUNT ON; statement to suppress the returning of row count values from DML statements (UPDATE, DELETE, etc.) and allow the stored procedure to just return the rows from the SELECT statement.
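If the stored procedure can't be changed, a commonly used workaround is to set NOCOUNT for the batch you send; this should work because procedures inherit SET options from the calling session:

# Row counts from the UPDATE are suppressed for this batch, so pandas
# receives the SELECT's result set directly.
df = pd.read_sql_query("SET NOCOUNT ON; EXEC my_proc_1 33", conn)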
I got the same error for a different reason: I was using SQLAlchemy's newer select() syntax to get the entries of a table, and I had forgotten to pass the table class I wanted to select from. Adding the table as an argument fixed the error.
The code that led to the error:
query = select().where(Assessment.created_by == assessment.created_by)
Simply fix it by adding the table class name; sometimes the issue is only in the syntax:
query = select(Assessment).where(Assessment.created_by == assessment.created_by)
When I export data from MySQL to BigQuery, some data gets duplicated. As a way to fix this, I thought of creating deduplicating views over these tables using ROW_NUMBER(); the query to do this is shown below. The problem is that a lot of tables in my dataset are duplicated, and new tables I export will likely have duplicated data as well, so I don't want to write this kind of query by hand every time I add a table to my dataset (ideally, the moment I export a new table, a view for it is created). Is it possible to do this in a loop in the query (like 'for each table in my dataset, do this')? Is it possible in a shell script (when a table is exported to BigQuery, create a view for it)? Failing that, is it possible in Python?
SELECT
  * EXCEPT (ROW_NUMBER)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS ROW_NUMBER
  FROM dataset1.table1
)
WHERE ROW_NUMBER = 1
It can definitely be done in Python. I would recommend using the google-cloud-python library (https://github.com/GoogleCloudPlatform/google-cloud-python), so your script should look something like this:
from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset('dataset_name')

for tab in client.list_tables(dataset_ref):
    view = bigquery.Table(dataset_ref.table("v_{}".format(tab.table_id)))
    view.view_query = "select * from `my-project.my_dataset.{}`".format(tab.table_id)
    # view_query creates a standard-SQL view; set view.view_use_legacy_sql = True
    # instead if you need a legacy-SQL view
    client.create_table(view)
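As a quick check afterwards, listing the dataset again should show the new views; the table_type field distinguishes them from tables:

# List everything in the dataset; created views report table_type == 'VIEW'.
for tab in client.list_tables(dataset_ref):
    print(tab.table_id, tab.table_type)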
I'm querying JSON from a website for data, then saving that data into variables so I can put them into a SQLite table. I'm 2 out of 3 for what I'm trying to do, but the SQLite side is just mystifying. I'm able to request the data, and I can verify that the variables have data when I test them with a print, but all of my SQLite stuff is failing. It's not even creating a table, much less updating it (but it is printing all the results to the buffer for some reason). Any idea what I'm doing wrong here? Disclaimer: bit of a Python noob. I've successfully created test tables just by copying the examples in the Python sqlite3 docs.
# this is requesting the data and seems to work
for ticket in zenpy.search("bananas"):
    id = ticket.id
    subj = ticket.subject
    created = ticket.created_at
    for comment in zenpy.tickets.comments(ticket.id):
        body = comment.body

# connecting to sqlite db that exists. things seem to go awry here
conn = sqlite3.connect('example.db')
c = conn.cursor()

# Creating the table (for some reason the table is not being created at all)
c.execute('''CREATE TABLE tickets_test
             (ticket id, ticket subject, creation date, body text)''')

# Inserting the variables into the sqlite table
c.execute("INSERT INTO ticketstest VALUES (id, subj, created, body)")

# committing the changes and closing
c.commit()
c.close()
I'm on Windows 64bit and using pycharm to do this.
Your table likely isn't created because you haven't committed yet, and your SQL fails before it commits. It should work once you fix your second SQL statement.
You're not inserting the variables you've created into the table; you need to use parameters. There are two ways to parameterize your SQL statement. I'll show the named-placeholder style:
c.execute("INSERT INTO ticketstest VALUES (:id, :subj, :created, :body)",
{'id':id, 'subj':subj, 'created':created, 'body':body}
)
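Putting both fixes together with the rest of the question's code, a corrected sketch might look like this (assumptions: commit() moves to the connection since sqlite3 cursors have no commit(), the CREATE TABLE and INSERT use one consistent table name, and the column list is simplified):

import sqlite3

conn = sqlite3.connect('example.db')
c = conn.cursor()

# One table name used consistently, with a simplified column list.
c.execute('''CREATE TABLE IF NOT EXISTS tickets_test
             (id, subject, created, body)''')

for ticket in zenpy.search("bananas"):
    for comment in zenpy.tickets.comments(ticket.id):
        c.execute("INSERT INTO tickets_test VALUES (:id, :subj, :created, :body)",
                  {'id': ticket.id, 'subj': ticket.subject,
                   'created': ticket.created_at, 'body': comment.body})

conn.commit()  # commit() and close() belong to the connection, not the cursor
conn.close()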