Joining logs from 2 Azure Log Analytics workspaces - python

I'm using the Azure SDK for Python to query a Log Analytics workspace.
I have 2 workspaces I'd like to query, but I was wondering if there is a way to union the data inside the query instead of querying both workspaces and combining the result objects within my Python program.
Something like this -
from azure.monitor.query import LogsQueryClient
client = LogsQueryClient(creds)
query = """
TableName // Table from the current workspace
| union ExternalTableName // Table from a different workspace
"""
client.query_workspace("<current_workspace_id>", query, timespan="...")
The identity that executes this query will have permissions to query both workspaces separately, and I have their URLs.
I couldn't find this option in the Log Analytics documentation, so I'm wondering if anyone else has done this before, or if I must process the data after it's sent back to me.
Thanks in advance!

I did some further digging in the SDK source and found this nice example, which does exactly what I want.
If you end up using this, it seems that the result is a union of the results from both workspaces - the results are not separated into different result tables.
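In case it helps anyone, the gist of that sample is the additional_workspaces argument of query_workspace. A minimal sketch, assuming a recent azure-monitor-query and azure-identity (workspace IDs, table name, and timespan are placeholders):
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# The query is written as if everything lived in one workspace; the service
# also runs it against every workspace listed in additional_workspaces and
# the rows come back unioned into the same result tables.
response = client.query_workspace(
    "<current_workspace_id>",
    "TableName | take 10",
    timespan=timedelta(days=1),
    additional_workspaces=["<other_workspace_id>"],
)

for table in response.tables:
    for row in table.rows:
        print(row)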

You should be able to make cross-workspace queries as explained in detail here: https://learn.microsoft.com/en-us/azure/azure-monitor/logs/cross-workspace-query
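In the query text itself that comes down to the workspace() function, roughly like this (the workspace name/GUID and table name are placeholders):
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# workspace() accepts the other workspace's name, GUID, or resource ID.
query = """
TableName
| union workspace("<other_workspace_id>").TableName
"""

client.query_workspace("<current_workspace_id>", query, timespan=timedelta(days=1))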

Related

How to get a list of measure names in AWS timestream?

I want to get the list of measures in a Timestream table, i.e. the result of this query:
SELECT DISTINCT measure_name FROM <db_name>.<table_name>
The problem with this query is that it scans the entire db, which is expensive.
I wonder if there's a better way of getting this list, especially because the AWS console has an option to show the measures ("Show measures" in the table's menu). [Screenshot of the "Show measures" action in the AWS console.]
I tried to find a specialized action to do this in the API doc, but couldn't find anything. This left me wondering if the AWS console runs the above (expensive) query every time I look at the measures. Does it?
Side note: I need to run this from a python app / through boto3.
It's possible to use a "SHOW" statement.
https://docs.aws.amazon.com/timestream/latest/developerguide/supported-sql-constructs.SHOW.html
SHOW MEASURES FROM database.table [LIKE pattern]
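From boto3 this translates to the timestream-query client's query call; a minimal sketch (database and table names are placeholders):
import boto3

client = boto3.client("timestream-query")

# Placeholder database/table names.
response = client.query(QueryString="SHOW MEASURES FROM my_database.my_table")

for row in response["Rows"]:
    # The measure name should be the first column of each returned row.
    print(row["Data"][0]["ScalarValue"])

# For very long measure lists you would keep calling query() with
# response["NextToken"] until it is no longer returned.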

Salesforce - Pull all deleted cases in Salesforce

I am trying to see if we can pull a list of all Salesforce cases that have been deleted, using their API from Python.
The query below returns all Salesforce cases that were created, but I am trying to see how to retrieve the cases that have been deleted.
SELECT Id FROM Case
I tried the query below, but it returned no data even though I know there are deleted cases:
SELECT Id FROM Case WHERE IsDeleted = true
Queries that include the Recycle Bin need to be issued differently. In Apex you need to add "ALL ROWS" to the query.
In the SOAP API it's the queryAll call instead of the normal query call; in the REST API it's a different resource, also called "queryAll".
If you're using simple_salesforce, it's supposed to be
query = 'SELECT Id FROM Case LIMIT 10'
sf.bulk.Case.query_all(query)
If you're using another library, you'll need to check its internals: which API it uses and whether it exposes queryAll to you.
(Remember that records purged from the Recycle Bin don't show up in these queries any more; at that point your only hope is something like the Data Replication API's getDeleted().)
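For the plain (non-bulk) REST calls, simple_salesforce exposes the queryAll endpoint through an include_deleted flag, if I remember the library correctly; a small sketch with placeholder credentials:
from simple_salesforce import Salesforce

# Placeholder credentials.
sf = Salesforce(
    username="user@example.com",
    password="password",
    security_token="token",
)

# include_deleted=True makes simple_salesforce hit the queryAll endpoint,
# so rows still sitting in the Recycle Bin are returned.
result = sf.query_all("SELECT Id, Subject FROM Case WHERE IsDeleted = true",
                      include_deleted=True)
for record in result["records"]:
    print(record["Id"])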

how to create a glue search script in python

So I have been asked to write a Python script that pulls all the Glue databases in our AWS account and then lists all the tables and partitions in each database in a CSV file. It's acceptable for it to just run on a desktop for now. I would really love some guidance/direction on how to go about this, as I'm a new junior and would like to explore my options before going back to my manager.
Desired format: [screenshot of the CSV layout]
Can be easily done using Boto3 - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Client
I'll start it off for you and you can figure out the rest.
import boto3
glue_client = boto3.client('glue')
db_name_list = [db['Name'] for db in glue_client.get_databases()['DatabaseList']]
I haven't tested this code, but it should create a list of the names of all your databases. From here you can run nested loops to get your tables with get_tables(DatabaseName=...) and then your partitions with get_partitions(DatabaseName=..., TableName=...).
Make sure to read the documentation to double check that the arguments you're providing are correct.
EDIT: You will also likely need to use a paginator if you have a large number of values to be returned. Best practice would be to use the paginator for all three calls, which would just mean an additional loop at each step. Documentation about paginators is here - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html#Glue.Paginator.GetDatabases
And there are plenty of Stack Overflow examples of how to use it.
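To make the shape of the whole thing concrete, here's an untested sketch of how the three paginators and the CSV writer could fit together (the column layout is a guess, adjust it to the format you were given):
import csv

import boto3

glue_client = boto3.client("glue")

with open("glue_inventory.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["database", "table", "partition_values"])

    # Paginate through databases, then tables, then partitions.
    for db_page in glue_client.get_paginator("get_databases").paginate():
        for db in db_page["DatabaseList"]:
            db_name = db["Name"]
            for table_page in glue_client.get_paginator("get_tables").paginate(DatabaseName=db_name):
                for table in table_page["TableList"]:
                    table_name = table["Name"]
                    wrote_row = False
                    for part_page in glue_client.get_paginator("get_partitions").paginate(
                            DatabaseName=db_name, TableName=table_name):
                        for partition in part_page["Partitions"]:
                            writer.writerow([db_name, table_name, "/".join(partition["Values"])])
                            wrote_row = True
                    if not wrote_row:
                        # Unpartitioned table: still record it.
                        writer.writerow([db_name, table_name, ""])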

Multiple input sources for MSSQL Server 2017 Python analytical services

I am currently porting some code from Spark to MSSQL Analytical Services with Python. Everything is nice and dandy, but I am not sure if my solution is the correct one for multiple inputs to the script.
Consider the following code snippet:
DROP PROCEDURE IF EXISTS SampleModel;
GO
CREATE PROCEDURE SampleModel
AS
BEGIN
exec sp_execute_external_script
@language = N'Python',
@script = N'
import sys
sys.path.append(r"C:\path\to\custom\package")
from super_package.sample_model import run_model
OutputDataSet = run_model()'
WITH RESULT SETS ((Score float));
END
GO
INSERT INTO [dbo].[SampleModelPredictions] (prediction) EXEC [dbo].[SampleModel]
GO
I have a custom package called super_package and a sample model called sample_model. Since this model uses multiple database tables as input, and I would rather have everything in one place I have a module which connects to the database and fetches the data directly:
from revoscalepy import RxSqlServerData, rx_data_step

def go_go_get_data(query, config):
    # Pull the query result into the Python session as a data frame.
    return rx_data_step(RxSqlServerData(
        sql_query=query,
        connection_string=config.connection_string,
        user=config.user,
        password=config.password))
Inside the run_model() function I fetch all necessary data from the database with the go_go_get_data function.
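To give an idea of the shape, a stripped-down sketch of what such a run_model() can look like (the module path, queries, connection details, and scoring step below are made-up placeholders):
from types import SimpleNamespace

import pandas as pd

# go_go_get_data lives somewhere in super_package; the exact module path is a guess.
from super_package.db import go_go_get_data

def run_model():
    # Connection details would normally come from configuration; placeholders here.
    config = SimpleNamespace(
        connection_string="Driver=SQL Server;Server=...;Database=...",
        user="<user>",
        password="<password>",
    )

    # Each input table gets its own query instead of coming in through
    # sp_execute_external_script's single input data set.
    customers = go_go_get_data("SELECT * FROM dbo.Customers", config)
    orders = go_go_get_data("SELECT * FROM dbo.Orders", config)

    # ... feature engineering / scoring happens here ...
    return pd.DataFrame({"Score": [0.0] * len(customers)})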
If the data is too big to handle in one go, I would do some pagination.
In general I cannot join the tables into a single result set, so passing everything in as one input doesn't work.
The question is: Is this the right approach to tackle this problem? Or did I miss something? For now this works, but as I am still in the development/tryout phase I cannot be certain that this will scale. I would rather use parameters for the stored procedure than fetch inside the Python context.
As you've already figured out, sp_execute_external_script only allows one result set to be passed in. :-(
You can certainly query from inside the script to fetch data as long as your script is okay with the fact that it's not executing under the current SQL session's user's permissions.
If pagination is important and one data set is significantly larger than the others and you're using Enterprise Edition, you might consider passing the largest data set into the script in chunks using sp_execute_external_script's streaming feature.
If you'd like all of your data to be assembled in SQL Server (vs. fetched by queries in your script), you could try to serialize the result sets and then pass them in as parameters (link describes how to do this in R but something similar should be possible with Python).

How to insert historical data on their respective partitions

I have a database with records stretching back to 2014 that I have to migrate to BigQuery, and I think that using the partitioned tables feature will help with the performance of the database.
So far, I loaded a small sample of the real data via the web UI, and while the table was already partitioned, all the data went into a single partition for the date on which I ran the load, which was expected, to be fair.
I searched the documentation sites and I ran into this, which I'm not sure if that's what I'm looking for.
I have two questions:
1) In the above example, they use the decorator on a SELECT query, but can I use it on an INSERT query as well?
2) I'm using the Python client to connect to the BigQuery API, and while I found the table.insert_data method, I couldn't find anything that refers specifically to inserting into partitions, so I'm wondering if I missed it or whether I will have to use the query API to insert data as well.
Investigated this a bit more:
1) I don't think I've managed to run an INSERT query at all, but this is moot for me, because...
2) It turns out that it is possible to insert into the partitions directly using the Python client, but it wasn't obvious to me:
I was using this snippet to insert some data into a table:
from google.cloud import bigquery

items = [
    (1, 'foo'),
    (2, 'bar')
]

client = bigquery.Client()
dataset = client.dataset('<dataset>')
table = dataset.table('<table_name>')
table.reload()  # fetch the table's schema so insert_data knows the column layout
print(table.insert_data(items))
The key is appending a $ and a date (say, 20161201) to the table name in the selector, like so:
table = dataset.table('<table_name>$20161201')
And it should insert in the correct partition.
