I am trying to load data from Excel into Snowflake using Python.
Below is my code so far:
from configparser import ConfigParser
import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
config = ConfigParser()
# parse ini file
config.read('config.ini')
# Read Excel
file = '/path/INTRANSIT.xlsx'
df_excel = pd.read_excel(file, engine='openpyxl')
# sqlalchemy to create DB engine
engine = create_engine(URL(
    account=config.get('Production', 'accountname'),
    user=config.get('Production', 'username'),
    password=config.get('Production', 'password'),
    database=config.get('Production', 'dbname'),
    schema=config.get('Production', 'schemaname'),
    warehouse=config.get('Production', 'warehousename'),
    role=config.get('Production', 'rolename'),
))
con = engine.connect()
df_excel.to_sql('transit_table', con, if_exists='replace', index=False)
con.close()
But I am getting the error below:
sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) not available in Engine(snowflake://username:***@account_identifier/): (db.schema.transit_table)
I have tried prefixing the table name with the database and schema, and I have also tried passing the table name alone. I have tried both uppercase and lowercase table names.
I am still not able to resolve this error. I would really appreciate any help!
Thank you.
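For readers who want to reproduce the setup, here is a minimal sketch of the config.ini layout that the code above appears to expect. The section and key names come from the config.get calls in the question; every value is a made-up placeholder, and SAMPLE_INI and url_kwargs are hypothetical names used only for illustration.

```python
from configparser import ConfigParser

# Hypothetical config.ini contents; every value is a placeholder.
SAMPLE_INI = """
[Production]
accountname = my_account
username = my_user
password = my_password
dbname = MY_DB
schemaname = MY_SCHEMA
warehousename = MY_WH
rolename = MY_ROLE
"""

config = ConfigParser()
# read_string parses the text directly; config.read('config.ini') does the same for a file.
config.read_string(SAMPLE_INI)

# Collect the keyword arguments that the URL(...) call in the question passes.
url_kwargs = {
    "account": config.get("Production", "accountname"),
    "user": config.get("Production", "username"),
    "password": config.get("Production", "password"),
    "database": config.get("Production", "dbname"),
    "schema": config.get("Production", "schemaname"),
    "warehouse": config.get("Production", "warehousename"),
    "role": config.get("Production", "rolename"),
}
print(sorted(url_kwargs))
```

Printing the keys (not the values) is a quick way to confirm the file was parsed without echoing the password.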
Requirement: I want to create a Python API that inserts data into a BigQuery table. The API will be exposed through Swagger/Postman, where users can provide input data that is then written to the BigQuery table.
Can anyone help me find a suitable solution, with code?
import sqlite3 as sql
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('path/to/file.json')
project_id = 'project_id'
client = bigquery.Client(credentials= credentials,project=project_id)
def add_data(group_name, user_name):
    try:
        # Connecting to database
        con = sql.connect('shot_database.db')
        # Getting cursor
        c = con.cursor()
        # Adding data
        job_config.use_legacy_sql = True
        query_job = client.query("""
            INSERT INTO `table_name` (group, user)
            VALUES (%s, %s)""",job_config = job_config)
        results = query_job.result() # Wait for the job to complete.
        # Applying changes
        con.commit()
    except:
        print("An error has occurred")
The code you provided mixes SQLite and BigQuery, but it looks like you are trying to insert data into a BigQuery table. To do that from Python, you can use the insert_rows_json() method of the Client class. Here is an example that inserts a row into a table called "mytable" in a dataset called "mydataset":
import json
# Define the data you want to insert
data = [
    {
        "group": group_name,
        "user": user_name
    }
]
# Insert the data
table_id = "mydataset.mytable"
errors = client.insert_rows_json(table_id, data)
if errors == []:
    print("Data inserted successfully")
else:
    print("Errors occurred while inserting data:")
    print(json.dumps(errors, indent=2))
You can then create an API using Flask or Django and call the add_data function you defined to insert data into the BigQuery table.
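As a rough, framework-agnostic sketch of that last step, the function below is the kind of handler a Flask or Django view could delegate to. It assumes the request body has already been parsed into a dict. The names handle_add_data, insert_rows, and fake_insert are hypothetical; insert_rows stands in for whatever call actually writes to BigQuery, so the example runs without credentials.

```python
from typing import Callable, List, Mapping, Sequence, Tuple

# `insert_rows` is a hypothetical stand-in for the real BigQuery call.
# It takes a list of row dicts and returns a list of row-level errors.
InsertFn = Callable[[List[dict]], Sequence[Mapping]]


def handle_add_data(payload: Mapping, insert_rows: InsertFn) -> Tuple[dict, int]:
    """Validate a parsed JSON payload, insert one row, return (body, HTTP status)."""
    missing = [key for key in ("group", "user") if not payload.get(key)]
    if missing:
        return {"error": "missing fields: " + ", ".join(missing)}, 400
    row = {"group": str(payload["group"]), "user": str(payload["user"])}
    errors = insert_rows([row])
    if errors:
        return {"error": "insert failed", "details": list(errors)}, 502
    return {"status": "inserted", "row": row}, 201


# Usage with a fake inserter that records rows instead of calling BigQuery.
inserted = []


def fake_insert(rows):
    inserted.extend(rows)
    return []  # an empty list means no row-level errors


print(handle_add_data({"group": "admins", "user": "alice"}, fake_insert))
print(handle_add_data({"group": "admins"}, fake_insert))
```

Keeping the validation and insert logic outside the web framework makes it testable on its own, and the view function only has to parse the request and return the body and status code.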
In my Snowflake database, I have already created tables. I used the SQLAlchemy inspector to check that they were created.
from sqlalchemy import MetaData, Table, create_engine, inspect
engine = create_engine("snowflake://my_connection_string")
db = {}
metadata = MetaData(bind=engine)
inspector = inspect(engine)
schemas = inspector.get_schema_names()
for schema in schemas:
    print(f"schema: {schema}")
    for table_name in inspector.get_table_names(schema=schema):
        print(f"table name: {table_name}")
        db[f"{schema}.{table_name}"] = Table(f"{schema}.{table_name}", metadata)
        print("Columns: ", end=': ')
        for column in inspector.get_columns(table_name, schema=schema):
            print(f"{column['name']}", end=',')
        print()
However, I'm having trouble fetching those tables using sqlalchemy metadata:
engine = create_engine("snowflake://my_connection_string")
meta_data = MetaData(bind=engine)
MetaData.reflect(meta_data)
When I run the above code, I get the following error:
ProgrammingError: (snowflake.connector.errors.ProgrammingError) 001059 (22023): SQL compilation error:
Must specify the full search path starting from database for TEST_DB
[SQL: SHOW /* sqlalchemy:_get_schema_primary_keys */PRIMARY KEYS IN SCHEMA test_db]
(Background on this error at: https://sqlalche.me/e/14/f405)
Any suggestions would be much appreciated!
I am importing a table created in Python into my company's SQL Server database and would like to hide my SQL Server credentials.
The code I currently have is this:
engine = sqlalchemy.create_engine(
    "mssql+pyodbc://trevor@email.com:mypassword!@dsn"
    "?authentication=ActiveDirectoryPassword"
)
df.to_sql('df', con = engine, schema= 'dbo', if_exists='replace', index=False)
This currently works perfectly, but I would like to hide my password because this code will live on our company's virtual machine so that it can run automatically every day. Any tips would be appreciated.
I tried using keyring as such:
import keyring
creds = keyring.get_credential(service_name= "sqlupload", username = None)
username_var = creds.username
password_var = creds.password
and then replacing 'mypassword' with password_var
engine = sqlalchemy.create_engine(
    "mssql+pyodbc://trevor@email.com:password_var@dsn"
    "?authentication=ActiveDirectoryPassword"
)
df.to_sql('df', con = engine, schema= 'dbo', if_exists='replace', index=False)
I get the following error:
Error validating credentials due to invalid username or password
I believe this is because "mssql+pyodbc://trevor@email.com:password_var@dsn" is in quotes, so password_var is read as literal text rather than as a variable.
You can use an f-string (formatted string literal) in Python.
password_var = "mypassword!"
print(f'mssql+pyodbc://trevor@email.com:{password_var}@dsn')
Result:
mssql+pyodbc://trevor@email.com:mypassword!@dsn
Reference: https://docs.python.org/3/tutorial/inputoutput.html
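One caveat when interpolating credentials: SQLAlchemy URLs use `@` to separate the credentials from the host, so a username or password containing characters such as `@`, `:` or `/` may be misparsed. A common precaution is to percent-encode both values with urllib.parse.quote_plus before building the URL. The sketch below assumes the same URL shape as the question; build_mssql_url is a hypothetical helper, and the literal values are placeholders for whatever keyring returns.

```python
from urllib.parse import quote_plus


def build_mssql_url(username: str, password: str, dsn: str) -> str:
    """Build a mssql+pyodbc URL with percent-encoded credentials."""
    return (
        f"mssql+pyodbc://{quote_plus(username)}:{quote_plus(password)}@{dsn}"
        "?authentication=ActiveDirectoryPassword"
    )


# Placeholder values; in practice these would come from keyring.get_credential(...).
url = build_mssql_url("trevor@email.com", "mypassword!", "dsn")
print(url)
```

The resulting string can be passed to sqlalchemy.create_engine in place of the hard-coded URL, so the password never appears in the source file.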
I am trying to move data from an ADLS blob to a Snowflake table.
I am able to do this through the UI.
Steps followed in the UI:
I generated the following SAS token:
sp=rl&st=2021-06-01T05:45:37Z&se=2021-06-01T13:45:37Z&spr=https&sv=2020-02-10&sr=c&sig=rYYY4o%2YY3jj%2XXXXXAB%2Bo8ygrtyAVCnPOxomlOc%3D
I was able to load the table with the above token in the Snowflake web UI:
copy into FIRST_LEVEL.MOVIES
from 'azure://adlsedmadifpoc.blob.core.windows.net/airflow-dif/raw-area/'
credentials=(azure_sas_token='sp=rl&st=2021-06-01T05:45:37Z&se=2021-06-01T13:45:37Z&spr=https&sv=2020-02-10&sr=c&sig=rYYY4o%2YY3jj%2XXXXXAB%2Bo8ygrtyAVCnPOxomlOc%3D')
FORCE = TRUE file_format = (TYPE = CSV);
I am trying to do the same with Python :
from azure.storage.blob import BlobServiceClient,generate_blob_sas,BlobSasPermissions
from datetime import datetime,timedelta
import snowflake.connector
def generate_sas_token(file_name):
    sas = generate_blob_sas(
        account_name="xxxx",
        account_key="p5V2GELxxxxQ4tVgLdj9inKwwYWlAnYpKtGHAg==",
        container_name="airflow-dif",
        blob_name=file_name,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(hours=2))
    print(sas)
    return sas
sas = generate_sas_token("raw-area/moviesDB.csv")
# Connection string
conn = snowflake.connector.connect(user='xx',password='xx#123',account='xx.southeast-asia.azure',database='xx')
# Create cursor
cur = conn.cursor()
# Execute SQL statement
cur.execute(
    f"copy into FIRST_LEVEL.MOVIES FROM 'azure://xxx.blob.core.windows.net/airflow-dif/raw-area/moviesDB.csv' credentials=(azure_sas_token='{sas}') file_format = (TYPE = CSV) ;")
cur.execute(" Commit ;")
cur.close()
conn.close()
SAS token generated in the code :
se=2021-06-01T07%3A42%3A11Z&sp=rt&sv=2020-06-12&sr=b&sig=ZhZMPSI%yyyyAPTqqE0%3D
I am unable to use the List permission when generating the SAS token through Python.
I am facing the error below:
cursor=cursor,
snowflake.connector.errors.ProgrammingError: 091003 (22000): Failure using stage area. Cause: [Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. (Status Code: 403; Error Code: AuthenticationFailed)]
I might have a list of CSV files in that folder in the future.
Any help appreciated. Thanks.
The following code worked :
from azure.storage.blob import generate_container_sas, ContainerSasPermissions
from datetime import datetime,timedelta
import snowflake.connector
def get_sas_token():
    container_sas_token = generate_container_sas(
        account_name = 'XX',
        account_key = 'p5V2GEL3AqGuPMMYXXXQ4tVgLdj9inKwwYWlAnYpKtGHAg==',
        container_name = 'airflow-dif',
        permission=ContainerSasPermissions(read=True,list=True),
        expiry=datetime.utcnow() + timedelta(hours=1)
    )
    print(container_sas_token)
    return container_sas_token
sas = get_sas_token()
# Connection string
conn = snowflake.connector.connect(user='XX',password='XX#123',account='XX.southeast-asia.azure',database='XX')
# Create cursor
cur = conn.cursor()
# Execute SQL statement
cur.execute(
    f"copy into FIRST_LEVEL.MOVIES FROM 'azure://XX.blob.core.windows.net/airflow-dif/raw-area/' credentials=(azure_sas_token='{sas}') FORCE = TRUE file_format = (TYPE = CSV) ;")
print(cur.fetchone())
cur.execute(" Commit ;")
cur.close()
conn.close()
Thank you Gaurav for your inputs.
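One way to see how the working version differs from the first attempt is to compare the `sp` (signed permissions) and `sr` (signed resource) fields of the two tokens quoted above. The sketch below uses only the standard library; sas_scope is a hypothetical helper, and the tokens are copied from the question with the signature left out. The token that worked in the UI is container-scoped (`sr=c`) with read and list permissions (`sp=rl`), whereas the first Python attempt produced a blob-scoped token (`sr=b`). The working code switches to generate_container_sas with read=True and list=True, which matches the UI token.

```python
from urllib.parse import parse_qs


def sas_scope(token: str) -> dict:
    """Return the permission (sp) and resource type (sr) fields of a SAS token."""
    fields = parse_qs(token)
    return {"sp": fields["sp"][0], "sr": fields["sr"][0]}


# Tokens abbreviated from the ones quoted in the question (signature omitted).
ui_token = "sp=rl&st=2021-06-01T05:45:37Z&se=2021-06-01T13:45:37Z&spr=https&sv=2020-02-10&sr=c"
blob_token = "se=2021-06-01T07%3A42%3A11Z&sp=rt&sv=2020-06-12&sr=b"

print("UI token:  ", sas_scope(ui_token))
print("Blob token:", sas_scope(blob_token))
```

Checking these two fields before running COPY INTO can help confirm that a generated token covers the whole container rather than a single blob.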
This is the query that I have been running in BigQuery and that I want to run from my Python script. How would I change it, or what do I have to add, for it to run in Python?
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
From what I have read, I can't save this query's results as a permanent table using Python. Is that true? If it is, is it still possible to export a temporary table?
You need to use the BigQuery Python client library. With older releases of the library, something like this should get you up and running:
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition= 'WRITE_TRUNCATE'
job.begin()
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html
See the current BigQuery Python client tutorial.
Here is another way using a JSON file for the service account:
>>> from google.cloud import bigquery
>>>
>>> CREDS = 'test_service_account.json'
>>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
>>> job = client.query('select * from dataset1.mytable')
>>> for row in job.result():
...     print(row)
This is a good usage guide:
https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html
To simply run and write a query:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table("your_table_id")
job_config.destination = table_ref
sql = """
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
) # API request - starts the query
query_job.result() # Waits for the query to finish
print("Query results loaded to table {}".format(table_ref.path))
I personally prefer querying using pandas:
# BQ authentication
import pandas as pd
import pydata_google_auth
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]
credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)
query = "SELECT * FROM my_table"
data = pd.read_gbq(query, project_id = MY_PROJECT_ID, credentials=credentials, dialect = 'standard')
The pythonbq package is very simple to use and a great place to start. It uses python-gbq.
To get started you would need to generate a BQ json key for external app access. You can generate your key here.
Your code would look something like:
from pythonbq import pythonbq
myProject=pythonbq(
    bq_key_path='path/to/bq/key.json',
    project_id='myGoogleProjectID'
)
SQL_CODE="""
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
"""
output=myProject.query(sql=SQL_CODE)