Azure Cosmos DB - Run a Query Explorer query from a Python program

I have a bunch of JSON documents stored in an Azure Cosmos DB database. I also have a Python program that reads these documents. I want to run the following Query Explorer query from Python:
SELECT VALUE Block
FROM c
JOIN Block IN c.radar50p01
So far, this is what I have in my Python program:
# Assumed import: the legacy pydocumentdb SDK; Constants is the asker's own settings module
import pydocumentdb.document_client as document_client
def getCosmosDBClient():
    # Initialize the Python DocumentDB client
    client = document_client.DocumentClient(Constants.URL, {'masterKey': Constants.KEY})
    return client
def getCosmosDBColl_link():
    client = getCosmosDBClient()
    # Look up the database by id
    db_id = Constants.RADAR_DATABASE_NAME
    db_query = "select * from r where r.id = '{0}'".format(db_id)
    db = list(client.QueryDatabases(db_query))[0]
    db_link = db['_self']
    # Look up the collection by id
    coll_id = Constants.RADAR_COLL_NAME
    coll_query = "select * from r where r.id = '{0}'".format(coll_id)
    coll = list(client.QueryCollections(db_link, coll_query))
    if coll:
        coll = coll[0]
    else:
        raise ValueError("Collection not found in database.")
    coll_link = coll['_self']
    # Reads every document in the collection, without any filtering
    docs = client.ReadDocuments(coll_link)
    return docs
Is there a way to use the query above in Python so that I get only the data I need?
Thanks.

If your query runs successfully in Query Explorer on the Azure portal, you can pass it to the client.QueryDocuments(collection_link, query) method, as in the sample code below.
A query is performed using SQL
# Query them in SQL
query = { 'query': 'SELECT * FROM server s' }
options = {}
options['enableCrossPartitionQuery'] = True
options['maxItemCount'] = 2
result_iterable = client.QueryDocuments(collection['_self'], query, options)
results = list(result_iterable)
print(results)
Hope it helps. If you have any concerns, please feel free to let me know.
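To connect the sample above to the query in the question, here is a minimal sketch of how the query could be packaged for QueryDocuments. The helper name build_query_spec is hypothetical, and the collection id "my-collection" is a placeholder. The sketch assumes the SDK accepts the Cosmos DB SQL query spec format, a dict with 'query' and an optional 'parameters' list; the client call itself is left as a comment because it needs a live account.

```python
def build_query_spec(query_text, parameters=None):
    """Return a Cosmos DB SQL query spec dict (hypothetical helper)."""
    spec = {"query": query_text}
    if parameters:
        # Each parameter becomes a {'name': '@x', 'value': ...} pair
        spec["parameters"] = [
            {"name": name, "value": value} for name, value in parameters.items()
        ]
    return spec


# The asker's Query Explorer query, unchanged
block_query = build_query_spec(
    "SELECT VALUE Block FROM c JOIN Block IN c.radar50p01"
)

# Parameterized alternative to str.format for the collection lookup
coll_query = build_query_spec(
    "SELECT * FROM r WHERE r.id = @id", {"@id": "my-collection"}
)

print(block_query)
print(coll_query)

# With a live client (not run here), the spec would replace ReadDocuments:
# docs = list(client.QueryDocuments(coll_link, block_query,
#                                   {"enableCrossPartitionQuery": True}))
```

Passing the id as a parameter rather than formatting it into the string avoids quoting problems if the id ever contains a quote character.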

Related

Can you delete rows in BigQuery from Python script?

Hi, is there a way to delete rows in BigQuery from a Python script? I looked through the documentation and searched for an example online, but I could not find anything.
I am looking for something like this:
table_id = "a.dataset.table" # Table ID for faulty_gla_entry
statement = """ DELETE FROM a.dataset.table where value = 2 """
client.delete(table_id, statement)
As @SergeyGeron stated, https://googleapis.dev/python/bigquery/latest/usage/index.html#bigquery-basics has useful material.
I wrote something like this:
from google.cloud import bigquery
client = bigquery.Client()
query = """DELETE FROM a.dataset.table WHERE value = 4"""
query_job = client.query(query)
print(query_job.result())
The documentation includes code for executing a query with Python.
This example code uses a DELETE statement:
from google.cloud import bigquery
client = bigquery.Client()
dml_statement = (
    "Delete from dataset.Inventory where ID=5"
)
query_job = client.query(dml_statement) # API request
query_job.result() # Waits for statement to finish
How to build queries in BigQuery

Get dictionary from sqlite query when using python databases module

I am developing an asynchronous program and have decided to use the databases module to interact with a database. The problem is that when I make queries, I get a tuple as a response. I want to receive the response as a dictionary instead, like this: {'column_name1': column_value1, ..., 'column_name_n': column_value_n}.
I have found some solutions, but they all use the sqlite3 module.
import databases
pathtodb = '/path/to/your/database.db'
tablename = 'table_name_in_your_database'
database = databases.Database('sqlite:///{}'.format(pathtodb))
query1 = "PRAGMA table_info('{}')".format(tablename)
pragma = await database.fetch_all(query=query1)
query2 = "SELECT * FROM {}".format(tablename)
data = await database.fetch_all(query=query2)
dictionary_keys = list(zip(*pragma))[1]
dictionary_values = list(zip(*data))
dictionary = dict(zip(dictionary_keys,dictionary_values))
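The snippet above produces one dictionary that maps each column name to a tuple of that column's values. If one dictionary per row is wanted instead, as the question describes, the same zip idea can be applied row by row. The sketch below uses hard-coded stand-in data rather than a real database, so the table and its values are assumptions; it relies only on the fact that PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk).

```python
# Stand-in for the rows returned by PRAGMA table_info:
# (cid, name, type, notnull, dflt_value, pk)
pragma = [
    (0, "id", "INTEGER", 1, None, 1),
    (1, "name", "TEXT", 0, None, 0),
]
# Stand-in for the rows returned by SELECT * FROM table
data = [(1, "alice"), (2, "bob")]

# The column name is the second field of each table_info row
column_names = [column[1] for column in pragma]

# Column-oriented, as in the snippet above: name -> tuple of values
by_column = dict(zip(column_names, zip(*data)))

# Row-oriented: one dict per row
by_row = [dict(zip(column_names, row)) for row in data]

print(by_column)
print(by_row)
```

In the asynchronous program, pragma and data would come from the two await database.fetch_all(...) calls shown earlier.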

How to move a blob data to Snowflake thru Python

I am trying to load data from an ADLS blob into a Snowflake table.
I am able to do this through the UI.
Steps followed in the UI:
I generated the following SAS token:
sp=rl&st=2021-06-01T05:45:37Z&se=2021-06-01T13:45:37Z&spr=https&sv=2020-02-10&sr=c&sig=rYYY4o%2YY3jj%2XXXXXAB%2Bo8ygrtyAVCnPOxomlOc%3D
I was able to load the table with the above token in the Snowflake web UI:
copy into FIRST_LEVEL.MOVIES
from 'azure://adlsedmadifpoc.blob.core.windows.net/airflow-dif/raw-area/'
credentials=(azure_sas_token='sp=rl&st=2021-06-01T05:45:37Z&se=2021-06-01T13:45:37Z&spr=https&sv=2020-02-10&sr=c&sig=rYYY4o%2YY3jj%2XXXXXAB%2Bo8ygrtyAVCnPOxomlOc%3D')
FORCE = TRUE file_format = (TYPE = CSV);
I am trying to do the same with Python:
from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions
from datetime import datetime, timedelta
import snowflake.connector
def generate_sas_token(file_name):
    # Blob-level SAS with read permission only
    sas = generate_blob_sas(account_name="xxxx",
                            account_key="p5V2GELxxxxQ4tVgLdj9inKwwYWlAnYpKtGHAg==", container_name="airflow-dif", blob_name=file_name, permission=BlobSasPermissions(read=True),
                            expiry=datetime.utcnow() + timedelta(hours=2))
    print(sas)
    return sas
sas = generate_sas_token("raw-area/moviesDB.csv")
# Connection string
conn = snowflake.connector.connect(user='xx',password='xx#123',account='xx.southeast-asia.azure',database='xx')
# Create cursor
cur = conn.cursor()
# Execute SQL statement
cur.execute(
    f"copy into FIRST_LEVEL.MOVIES FROM 'azure://xxx.blob.core.windows.net/airflow-dif/raw-area/moviesDB.csv' credentials=(azure_sas_token='{sas}') file_format = (TYPE = CSV) ;")
cur.execute("Commit ;")
cur.close()
conn.close()
SAS token generated in the code :
se=2021-06-01T07%3A42%3A11Z&sp=rt&sv=2020-06-12&sr=b&sig=ZhZMPSI%yyyyAPTqqE0%3D
I am unable to use the List permission while generating the SAS token through Python.
I am getting the error below:
cursor=cursor,
snowflake.connector.errors.ProgrammingError: 091003 (22000): Failure using stage area. Cause: [Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. (Status Code: 403; Error Code: AuthenticationFailed)]
In the future I might have a list of CSV files in that folder.
Any help appreciated. Thanks.
The following code worked:
from azure.storage.blob import generate_container_sas, ContainerSasPermissions
from datetime import datetime, timedelta
import snowflake.connector
def get_sas_token():
    # Container-level SAS with both read and list permissions
    container_sas_token = generate_container_sas(
        account_name='XX',
        account_key='p5V2GEL3AqGuPMMYXXXQ4tVgLdj9inKwwYWlAnYpKtGHAg==',
        container_name='airflow-dif',
        permission=ContainerSasPermissions(read=True, list=True),
        expiry=datetime.utcnow() + timedelta(hours=1)
    )
    print(container_sas_token)
    return container_sas_token
sas = get_sas_token()
# Connection string
conn = snowflake.connector.connect(user='XX',password='XX#123',account='XX.southeast-asia.azure',database='XX')
# Create cursor
cur = conn.cursor()
# Execute SQL statement
cur.execute(
    f"copy into FIRST_LEVEL.MOVIES FROM 'azure://XX.blob.core.windows.net/airflow-dif/raw-area/' credentials=(azure_sas_token='{sas}') FORCE = TRUE file_format = (TYPE = CSV) ;")
print(cur.fetchone())
cur.execute("Commit ;")
cur.close()
conn.close()
Thank you Gaurav for your inputs.
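One visible difference between the token that worked in the UI and the token generated by the first Python attempt is in the sr (signed resource) and sp (signed permissions) fields: the UI token has sr=c and sp=rl, while the generated one has sr=b and sp=rt. A small sketch for inspecting a token before handing it to Snowflake is shown below. The helper name describe_sas is hypothetical, and the tokens are shortened placeholders that keep only the sr and sp values from the question; the sketch does not claim this difference is the only possible cause of the 403 error.

```python
from urllib.parse import parse_qs

# Meaning of the 'sr' (signed resource) values that appear in this thread
RESOURCE_TYPES = {"b": "blob", "c": "container"}


def describe_sas(token):
    """Return the resource scope and read/list permissions of a SAS token."""
    fields = parse_qs(token)
    resource = fields.get("sr", [""])[0]
    permissions = fields.get("sp", [""])[0]
    return {
        "scope": RESOURCE_TYPES.get(resource, resource),
        "can_read": "r" in permissions,
        "can_list": "l" in permissions,
    }


# Shortened placeholder tokens with the same sr/sp values as in the question
ui_token = "sp=rl&sv=2020-02-10&sr=c&sig=PLACEHOLDER"
blob_token = "se=2021-06-01T07%3A42%3A11Z&sp=rt&sv=2020-06-12&sr=b&sig=PLACEHOLDER"

print(describe_sas(ui_token))
print(describe_sas(blob_token))
```

A container-scoped token with list permission, as produced by generate_container_sas in the working code, matches the shape of the token that succeeded in the UI.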

Running Bigquery query uncached using Python API

Hi, I am using BigQuery and submitting queries through its Python API to get results. I use the method bqclient.query("PASS THE QUERY") to execute the query programmatically. I am trying to run a performance test, but BigQuery returns cached results. Is there a way to set cache = False in the Python API when calling the bqclient.query method? In the BigQuery documentation I have seen that the useQueryCache property can be set to false, but I am not sure where to set it.
Current Code
job_config = bigquery.QueryJobConfig()
job_config.use_query_cache = False
query_job = bqclient.query(select_query, job_config=job_config)
select_query represents the query that I want to run.
Thank you
You need to set useQueryCache to false. See here for more info. Note the lowercase underscore format used in the Python client:
[..]
QUERY = ('SELECT ..')
job_config = bigquery.QueryJobConfig()
job_config.use_query_cache = False
query_job = client.query(QUERY, job_config=job_config)
[..]

How to run a BigQuery query in Python

This is the query that I have been running in BigQuery and that I want to run in my Python script. What do I have to change or add for it to run in Python?
#standardSQL
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
From what I have read, I can't save this query as a permanent table using Python. Is that true? If it is, is it still possible to export a temporary table?
You need to use the BigQuery Python client lib, then something like this should get you up and running:
from google.cloud import bigquery
client = bigquery.Client(project='PROJECT_ID')
query = "SELECT...."
dataset = client.dataset('dataset')
table = dataset.table(name='table')
job = client.run_async_query('my-job', query)
job.destination = table
job.write_disposition= 'WRITE_TRUNCATE'
job.begin()
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery-usage.html
See the current BigQuery Python client tutorial.
Here is another way using a JSON file for the service account:
>>> from google.cloud import bigquery
>>>
>>> CREDS = 'test_service_account.json'
>>> client = bigquery.Client.from_service_account_json(json_credentials_path=CREDS)
>>> job = client.query('select * from dataset1.mytable')
>>> for row in job.result():
...     print(row)
This is a good usage guide:
https://googleapis.github.io/google-cloud-python/latest/bigquery/usage/index.html
To simply run and write a query:
# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_id = 'your_dataset_id'
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table("your_table_id")
job_config.destination = table_ref
sql = """
SELECT corpus
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location="US",
    job_config=job_config,
)  # API request - starts the query
query_job.result() # Waits for the query to finish
print("Query results loaded to table {}".format(table_ref.path))
I personally prefer querying using pandas:
# BQ authentication
import pandas as pd
import pydata_google_auth
SCOPES = [
    'https://www.googleapis.com/auth/cloud-platform',
    'https://www.googleapis.com/auth/drive',
]
credentials = pydata_google_auth.get_user_credentials(
    SCOPES,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as over SSH or with Google Colab.
    auth_local_webserver=True,
)
query = "SELECT * FROM my_table"
# MY_PROJECT_ID is a placeholder for your Google Cloud project ID
data = pd.read_gbq(query, project_id=MY_PROJECT_ID, credentials=credentials, dialect='standard')
The pythonbq package is very simple to use and a great place to start. It uses python-gbq.
To get started, you need to generate a BigQuery JSON key for external app access. You can generate your key here.
Your code would look something like:
from pythonbq import pythonbq
myProject = pythonbq(
    bq_key_path='path/to/bq/key.json',
    project_id='myGoogleProjectID'
)
SQL_CODE="""
SELECT
Serial,
MAX(createdAt) AS Latest_Use,
SUM(ConnectionTime/3600) as Total_Hours,
COUNT(DISTINCT DeviceID) AS Devices_Connected
FROM `dataworks-356fa.FirebaseArchive.testf`
WHERE Model = "BlueBox-pH"
GROUP BY Serial
ORDER BY Serial
LIMIT 1000;
"""
output=myProject.query(sql=SQL_CODE)
