Salesforce - Pull all deleted cases in Salesforce - python

I am trying to see if we can pull a list of all Salesforce cases that have been deleted, using their API from Python.
The query below returns all Salesforce cases that were created, but I am trying to work out how to retrieve all cases that have been deleted.
SELECT Id FROM Case
I tried the following, but it returned no data, even though I know there are deleted cases:
SELECT Id FROM Case WHERE IsDeleted = true

Queries that include the Recycle Bin need to be issued differently. In Apex you need to add ALL ROWS to the query.
In the SOAP API it's queryAll vs. the normal query call. In the REST API it's a different service, also queryAll.
If you're using simple_salesforce it's supposed to be
query = 'SELECT Id FROM Case LIMIT 10'
sf.bulk.Case.query_all(query)
If you're using another library, you'll need to check its internals: which API it uses and whether it exposes queryAll to you.
(Remember that records purged from the Recycle Bin no longer show up in these queries; at that point your only hope is something like the Data Replication API's getDeleted().)
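If you'd rather stay on the plain REST query instead of the Bulk API, here's a minimal sketch (credentials elided; as far as I know simple_salesforce routes the SOQL through the queryAll endpoint when you pass include_deleted=True):
from simple_salesforce import Salesforce

# Credentials elided -- fill in your own.
sf = Salesforce(username='...', password='...', security_token='...')

# include_deleted=True uses the queryAll endpoint, so soft-deleted
# records still sitting in the Recycle Bin are returned too.
deleted = sf.query_all(
    "SELECT Id FROM Case WHERE IsDeleted = true",
    include_deleted=True,
)
print(deleted['totalSize'])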


Joining logs from 2 Azure Log Analytics workspaces

I'm using the Azure SDK for Python to query a Log Analytics workspace.
I have 2 workspaces I'd like to query, but I was wondering if there is a way to union the data inside the query instead of querying both workspaces and combining the result objects within my Python program.
Something like this -
from azure.monitor.query import LogsQueryClient
client = LogsQueryClient(creds)
query = """
TableName // Table from the current workspace
| union ExternalTableName // Table from a different workspace
"""
client.query_workspace("<current_workspace_id>", query, timespan="...")
The identity that executes this query has permission to query both workspaces separately, and I have their URLs.
I couldn't find this option in the Log Analytics documentation, so I'm wondering if anyone else has done this before, or if I must process the data after it's sent back to me.
Thanks in advance!
I did some further digging in the SDK source and found this nice example, which does exactly what I want.
If you end up using this: the result appears to be a union of the results from both workspaces; the results are not separated into different result tables.
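For reference, a minimal sketch of that approach, assuming the additional_workspaces keyword used by the SDK's multi-workspace sample (workspace IDs and the table name are placeholders):
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())
# Rows from both workspaces come back merged into one result set.
response = client.query_workspace(
    "<primary_workspace_id>",
    "TableName | take 10",  # placeholder table
    timespan=timedelta(days=1),
    additional_workspaces=["<second_workspace_id>"],
)
for table in response.tables:
    print(table.rows)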
You should be able to make cross-workspace queries as explained in detail here: https://learn.microsoft.com/en-us/azure/azure-monitor/logs/cross-workspace-query

Unable to fetch complete records from Salesforce using Python

I am trying to fetch data from Salesforce using the simple_salesforce library in Python.
I get the correct count of records when running a count query.
But when I put those results (in the form of a list) into S3 as a JSON object, fewer records get persisted than I counted in Salesforce.
Here is the piece of code:
result = sf.query("SELECT Id FROM Opportunity")['records']
object.put(Body=json.dumps(result, indent=2).encode('UTF-8'))
Is the problem on the Salesforce side or am I running into an issue using AWS's SDK to put the objects into S3?
The Salesforce API returns results in chunks; the default is 2,000 records at a time. If it returned, say, 1M records in one go, it could kill your memory usage. Retrieve a chunk, process it (save it to a file?), then request the next chunk.
It's straight on the project's homepage:
If, due to an especially large result, Salesforce adds a nextRecordsUrl to your query result, such as "nextRecordsUrl" : "/services/data/v26.0/query/01gD0000002HU6KIAW-2000", you can pull the additional results with either the ID or the full URL (if using the full URL, you must pass 'True' as your second argument)
sf.query_more("01gD0000002HU6KIAW-2000")
sf.query_more("/services/data/v26.0/query/01gD0000002HU6KIAW-2000", True)
As a convenience, to retrieve all of the results in a single local method call use
sf.query_all("SELECT Id, Email FROM Contact WHERE LastName = 'Jones'")

"Get" document from cosmosdb by id (not knowing the _rid)

MS Support recently told me that using a GET is much more efficient in RU usage than a SQL query. So I'm wondering if I can (within the azure.cosmos Python package, or via a custom HTTP request to the REST API) get a document by its unique 'id' field (for which I generate GUIDs) without a SQL query.
Every example I've seen uses the link/path of the doc, which is built from the document's '_rid' metadata and not the 'id' field set when creating the doc.
I use a bulk-upsert stored procedure I wrote to create my new documents and never retrieve the metadata for each one of them (I have ~100 million docs), so retrieving the _rid would be equivalent to retrieving the doc itself.
The reason the ReadDocument method is so much more efficient than a SQL query is that it uses _rid instead of a user-generated field, even the required id field. This is because the _rid isn't just a unique value; it also encodes information about where that document is physically stored.
To give an example of how this works, let's say you are explaining to someone where a party is this weekend. You could use the name that you use for the house "my friend Ryan's house" or you could use the address "123 ThatOne Street Somewhere, WA 11111". They both are unique identifiers, but for someone trying to get there one is way more efficient than the other.
Telling someone to go to your friend's house is like using your own id. It does map to a specific house, but the person will still need to find out where that physically is to get there. Using the address is like working with the _rid field. Based on that information alone they can get to the party location. Of course, in the real world the person would probably need directions, but the data storage in a database is a lot more organized than most city streets so an address is sufficient to go retrieve the document.
If you want to take advantage of this method you will need to find a way to work with the _rid field.
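That said, if you are on the newer azure.cosmos (4.x) package, point reads appear to be addressed by the user-supplied id plus the partition key rather than by _rid; a minimal sketch (account details and names are placeholders):
from azure.cosmos import CosmosClient

client = CosmosClient("<account_url>", credential="<account_key>")
container = client.get_database_client("<db>").get_container_client("<coll>")
# read_item is the point-read (GET) path: it takes your own 'id' value
# plus the document's partition key value -- no _rid involved.
doc = container.read_item(item="<your-guid-id>", partition_key="<pk_value>")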

How to fix query problem on Azure CosmosDB that occurs only on collections with large data?

I'm trying to read from a CosmosDB collection (MachineCollection) with a large amount of data (58 GB of data; index size 9 GB). Throughput is set to 1000 RU/s. The collection is partitioned by serial number, with read locations WestEurope and NorthEurope and write location WestEurope. Simultaneously with my reading attempts, MachineCollection is fed with data every 20 seconds.
The problem is that I cannot query any data via Python. If I execute the query in the CosmosDB Data Explorer, I get results in no time (e.g. querying for a certain serial number).
For troubleshooting purposes, I have created a new database (TestDB) and a TestCollection. In this TestCollection there are 10 datasets from MachineCollection. If I try to read from this TestCollection via Python, it succeeds and I am able to save the data to CSV.
This makes me wonder why I am not able to query data from MachineCollection when TestDB and TestCollection are configured with the exact same properties.
What I have already tried for querying via Python:
options['enableCrossPartitionQuery'] = True
Querying using PartitionKey: options['partitionKey'] = 'certainSerialnumber'
Same as always: it works with TestCollection, but not with MachineCollection.
Any ideas on how to resolve this issue are highly appreciated!
Firstly, what you need to know is that DocumentDB imposes limits on response page size. This link summarizes some of those limits: Azure DocumentDB Storage Limits - what exactly do they mean?
Secondly, if you want to query large amounts of data from DocumentDB, you have to consider query performance; please refer to this article: Tuning query performance with Azure Cosmos DB.
Looking at the DocumentDB REST API, you can observe several important parameters which have a significant impact on query operations: x-ms-max-item-count and x-ms-continuation.
As far as I know, the Azure portal doesn't automatically optimize your SQL, so you need to handle this in the SDK or REST API.
You could set the value of Max Item Count and paginate your data using continuation tokens. The DocumentDB SDK supports reading paginated data seamlessly. Refer to the Python snippet below:
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
# _fetch_function is a private helper of the old pydocumentdb SDK; it returns
# a (results, response_headers) tuple.
results_1 = q._fetch_function({'maxItemCount': 10})
# The continuation token is a string in the response headers.
token = results_1[1]['x-ms-continuation']
results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})
Another case you could refer to: How do I set continuation tokens for Cosmos DB queries sent by document_client objects in Python?
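For comparison, the newer azure.cosmos (4.x) package exposes the same mechanics without private helpers; a sketch, assuming a ContainerProxy named container and using by_page with a continuation token:
# 'container' is assumed to be an azure.cosmos ContainerProxy.
pager = container.query_items(
    query="SELECT * FROM c WHERE c.serial = @sn",
    parameters=[{"name": "@sn", "value": "certainSerialnumber"}],
    enable_cross_partition_query=True,
    max_item_count=10,
).by_page()
first_page = list(next(pager))      # first chunk of up to 10 items
token = pager.continuation_token    # same idea as x-ms-continuation
# Resume later from the saved token.
resumed = container.query_items(
    query="SELECT * FROM c WHERE c.serial = @sn",
    parameters=[{"name": "@sn", "value": "certainSerialnumber"}],
    enable_cross_partition_query=True,
    max_item_count=10,
).by_page(token)
next_page = list(next(resumed))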

Incorrect and inconsistent user counts using Google Core Reporting API and BigQuery when using variable/dimension filtering

Background: I have app and web data; some of my apps (new iOS versions) use GA dimensions and the rest (Android and web) use GA custom variables.
So firstly, I'm currently trying to replicate this BigQuery query in the Query Explorer, to get simple user counts over a defined date range for my web users only:
select count(distinct fullvisitorid, 10000000) as users
from table_date_range([12345678.ga_sessions_],
timestamp('2015-02-01'), timestamp('2015-03-01'))
where hits.customvariables.customvarvalue like '%web%'
I get around 5.34m users. This corresponds to what I see in Google Analytics, so I am confident this figure is correct.
If I go into the Query Explorer and apply no filters (so I include my app and web users), I get 5.70m users. Again, this corresponds to Google Analytics and we're confident this figure is correct; web makes up the majority of our traffic.
If I run another query in Query Explorer but this time apply the filter:
ga:customVarValue1=#web
I get 8.73m users. So I have more users after applying the filter than without... Obviously this isn't correct, and it has something to do with how the Query Explorer applies the filter post-aggregation.
Note: When I run this query in BigQuery:
select sum(users)
from (
select count(distinct fullvisitorid, 1000000) as users,
hits.customvariables.customvarvalue as platform
from table_date_range([12345678.ga_sessions_],
timestamp('2015-02-01'), timestamp('2015-03-01'))
group each by 2)
where platform like '%web%'
I get 8.73m users, almost exactly the number I get when applying the filter in Query Explorer; the difference of around 1% can be explained by sampling. I've tested it on multiple dates, so I'm sure this is what's happening. Applying the filter post-aggregation instead of pre-aggregation (as in my first BigQuery query) leads to a higher number of users because we had two web releases in this timeframe, so every user is counted once for each version of web they used.
To add:
One of the developers on my team wrote a Python script back in February which replicated the first BigQuery query above (a simple user count where the variable = web), but instead hits the Core Reporting API and requests an unsampled report. Until March 5th, 2015, the user counts we got from BigQuery versus the Python script were almost identical (a difference of 1% due to sampling). Then on March 5th they began to diverge, even for historical user counts, and our Python script started producing counts similar to the Query Explorer (filters applied post-aggregation instead of pre-aggregation).
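For context, the script's request looks roughly like this (a sketch: the profile ID is made up, authorized_http stands in for the OAuth setup, and the =@ "contains" operator is my assumption for how the variable filter was expressed):
from apiclient.discovery import build  # google-api-python-client

# 'authorized_http' is assumed to be an OAuth2-authorized httplib2.Http object.
service = build('analytics', 'v3', http=authorized_http)
response = service.data().ga().get(
    ids='ga:12345678',           # hypothetical view (profile) ID
    start_date='2015-02-01',
    end_date='2015-03-01',
    metrics='ga:users',
    filters='ga:customVarValue1=@web',  # filter on the custom variable
).execute()
print(response.get('rows'))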
My question(s) are:
1. What changed on March 5th?
2. How do we replicate the first BigQuery query above in the Query Explorer? Are we applying the variable filter correctly?
3. How do we replicate the BigQuery code in our Python script which hits the Core Reporting API?
Lastly:
When in Query Explorer I ask for user counts over a given date and instead use a dimension filter:
ga:dimension2=#ios
I get around 50% FEWER users than I get in BigQuery when running:
select count(distinct fullvisitorid, 10000000) as users
from table_date_range([12345678.ga_sessions_],
timestamp('2015-02-01'), timestamp('2015-03-01'))
where hits.customdimensions.value like '%ios%'
If the filter were being applied post-aggregation, as it is when filtering on variables, then I would get a higher user count, not a lower one. I seriously cannot explain what the Query Explorer is doing to give me substantially lower counts when filtering on dimensions.
Please help!
I don't have an answer for you, but since you're using BigQuery, I assume you are a Premium customer? If so, you can open a ticket with their support team; they should get back to you quickly.
