I am currently working on a POC where we would like to get the snowflake query results into an email using Python.
For example: when executing an INSERT statement in Snowflake, I would like to capture the result showing how many records were inserted. Please note that we are using the Snowflake Connector for Python to execute our queries from a Python script. We are also using dataframes to store and process data internally.
Any help is appreciated!
Following the INSERT statement, you can retrieve the number of rows inserted from cursor.rowcount.
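A minimal sketch of how that could look with the Snowflake Connector for Python (the connection parameters and table names below are placeholders, not from the original post):

import snowflake.connector

# Placeholder connection details -- replace with your own account settings.
conn = snowflake.connector.connect(
    user="YOUR_USER",
    password="YOUR_PASSWORD",
    account="YOUR_ACCOUNT",
)

cur = conn.cursor()
cur.execute("INSERT INTO target_table SELECT * FROM staging_table")

# rowcount reports how many rows the last statement affected.
inserted_rows = cur.rowcount
print(f"{inserted_rows} records were inserted")
# This count can then be dropped into the email body (e.g. via smtplib).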
Related
I have large text files that need parsing and cleaning, and I am using a Jupyter notebook to do it. I have a SQL Server DB that I want to insert the data into after it is prepared. I used pyodbc to insert the final dataframe into SQL; df is my dataframe and my SQL insert query is in the variable sqlInsertQuery:
df_records = df.values.tolist()
cursor.executemany(sqlInsertQuery,df_records)
cursor.commit()
For a few rows it works fine, but when I want to insert the whole dataframe at once with the code above, executemany() runs for hours and keeps running until I stop the kernel.
I exported one file/dataframe to an Excel file and it is about 83 MB, as my dataframe contains very large strings and lists.
Someone recommended using fast_executemany instead, but it seems to be faulty.
Others recommended packages other than pyodbc.
Some said it is better not to use Jupyter and to use PyCharm or IPython instead.
I have not figured out what the best/fastest way is to insert my data into my DB in my case. I am not a developer and I would really appreciate your help on this.
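(For reference, enabling fast_executemany in pyodbc looks roughly like this; the connection string is a placeholder, and df / sqlInsertQuery are the same variables as in the question. Note that with very large strings it can still use a lot of memory, which may explain the "faulty" reports.)

import pyodbc

# Placeholder connection string -- adjust driver/server/database as needed.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my_server;DATABASE=my_db;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# fast_executemany batches the parameters client-side,
# which is usually much faster than the default row-by-row inserts.
cursor.fast_executemany = True

df_records = df.values.tolist()
cursor.executemany(sqlInsertQuery, df_records)
conn.commit()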
I am using the Python to_sql function to insert data into a database table from a Pandas dataframe.
I am able to insert data into the database table, but I want to know, in my code, how many records were inserted.
How can I get the record count of the inserts (I do not want to write one more query against the database table just to get the record count)?
Also, is there a way to see logs for this function's execution, like which queries were executed?
There is no way to do this, since Python cannot know how many of the records being inserted were already in the table.
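What you can see from the Python side is the number of rows you are sending (len(df)) and, for logging, SQLAlchemy can echo every statement it executes. A rough sketch, using a SQLite database and table name purely for illustration:

import pandas as pd
from sqlalchemy import create_engine

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# echo=True makes SQLAlchemy log every SQL statement it issues.
engine = create_engine("sqlite:///example.db", echo=True)

df.to_sql("my_table", engine, if_exists="append", index=False)

# Number of rows handed to to_sql (not necessarily rows that were new to the table).
print(len(df), "rows were written")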
I created a Python script that pushes a pandas dataframe into Google BigQuery, and it looks as though I'm able to query the table directly from GBQ. However, another user is unable to view the results when they query that same table I generated on GBQ. This seems to be a BigQuery issue, because when they connected to GBQ and queried the table indirectly using pandas, it worked fine (pd.read_gbq("SELECT * FROM ...", project_id)). What is causing this strange behaviour?
What I'm seeing vs. what they are seeing: (screenshots not included in this excerpt)
I've encountered this when loading tables to BigQuery via Python GBQ. If you take the following steps, the table will display properly:
1. Load the dataframe to BigQuery via Python GBQ
2. Run SELECT * FROM uploaded_dataset.uploaded_dataset; doing so will properly show the table
3. Within the BigQuery UI, save the table (as a new table name)
From there, you will be able to see the table properly. Unfortunately, I don't know how to resolve this without a manual step in the UI.
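For the first step, loading the dataframe with pandas-gbq looks roughly like this (the project ID and sample dataframe are placeholders):

import pandas as pd
import pandas_gbq

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# Placeholder project and destination table -- replace with your own.
pandas_gbq.to_gbq(
    df,
    destination_table="uploaded_dataset.uploaded_dataset",
    project_id="my-gcp-project",
    if_exists="append",  # or "replace" / "fail"
)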
I've been able to append/create a table from a Pandas dataframe using the pandas-gbq package, in particular using the to_gbq method. However, when I want to check the table using the BigQuery web UI, I see the following message:
This table has records in the streaming buffer that may not be visible in the preview.
I'm not the only one to ask, and it seems that there's no solution to this yet.
So my questions are:
1. Is there a solution to the above problem (namely the data not being visible in the web UI)?
2. If there is no solution to (1), is there another way that I can append data to an existing table using the Python BigQuery API? (Note: the documentation says that I can achieve this by running an asynchronous query and using writeDisposition=WRITE_APPEND, but the link it provides doesn't explain how to use it and I can't work it out.)
That message is just a UI notice; it should not hold you back.
To check the data, run a simple query and see if it's there.
To read only the data that is still in the streaming buffer, use this query:
#standardSQL
SELECT count(1)
FROM `dataset.table` WHERE _PARTITIONTIME is null
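To run the same check from Python, something along these lines should work with pandas-gbq (the project ID and table name are placeholders):

import pandas as pd

# Placeholder project and table -- adjust to your own dataset.
buffered = pd.read_gbq(
    """
    SELECT count(1) AS cnt
    FROM `dataset.table`
    WHERE _PARTITIONTIME IS NULL
    """,
    project_id="my-gcp-project",
    dialect="standard",
)
print(buffered["cnt"][0], "rows still in the streaming buffer")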
I am working on a Python script to get data out of Salesforce. Everything seems to be working fine. I am passing a custom SOQL query to get the data, but the challenge is that the query returns only the first 500 rows, while there are close to 653,000 results in my object.
"data= sf.query('SELECT {} from abc'.format(','.join(column_names_list)))"
Now I tried using querymore() and queryall(), but that doesn't seem to work either.
In the end, my intent is to load all this information into a dataframe, push it to a table, and keep looking for new records by scheduling this code. Is there a way to achieve this?
In order to retrieve all the records using a single method, just try
data= sf.query_all('SELECT {} from abc'.format(','.join(column_names_list)))
Refer to the following link.
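As a fuller sketch of how those results can end up in a dataframe with simple-salesforce (the credentials, object, and field names below are placeholders, not from the original answer):

import pandas as pd
from simple_salesforce import Salesforce

# Placeholder credentials -- replace with your own.
sf = Salesforce(
    username="user@example.com",
    password="password",
    security_token="token",
)

# query_all keeps paging through the results until every record is fetched.
result = sf.query_all("SELECT Id, Name FROM abc")

# Each record carries an 'attributes' entry that is usually dropped.
df = pd.DataFrame(result["records"]).drop(columns="attributes")
print(len(df), "rows retrieved")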