Firestore Python SDK - count() aggregation

I'm using Firebase Firestore in my Python project (with their official Python SDK) and I'm having trouble performing a count() aggregation. This function is supported according to their docs, but they do not provide a Python example (they do in other parts of the documentation). I tried to play with it in a Python console, with something like this:
query = db.collection('videos').where('status', '==', 'pending')
query.count()
without any luck. So I'm wondering how this can be implemented. Does the Python SDK support this functionality?

The Firebase Admin Python SDK doesn't support that query yet. You can still use the runAggregationQuery REST API in the meantime. The Google Cloud Firestore Python SDK has aggregation result types available from v2.7.0+, so it should be available in the Admin SDK soon.
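For reference, here is a minimal sketch of calling runAggregationQuery over REST with the requests library. Treat it as an assumption-laden illustration rather than an official sample: the application-default-credentials setup, the datastore OAuth scope, and the videos/status/pending names are placeholders to adapt.

import requests
import google.auth
from google.auth.transport.requests import Request

# Assumes application default credentials are configured locally
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/datastore"])
credentials.refresh(Request())

url = (f"https://firestore.googleapis.com/v1/projects/{project_id}"
       "/databases/(default)/documents:runAggregationQuery")
body = {
    "structuredAggregationQuery": {
        "structuredQuery": {
            "from": [{"collectionId": "videos"}],
            "where": {
                "fieldFilter": {
                    "field": {"fieldPath": "status"},
                    "op": "EQUAL",
                    "value": {"stringValue": "pending"},
                }
            },
        },
        "aggregations": [{"alias": "count", "count": {}}],
    }
}
response = requests.post(
    url, json=body,
    headers={"Authorization": f"Bearer {credentials.token}"})
# The endpoint streams results back as a JSON array; the aggregate value
# is keyed by the alias chosen above
print(response.json()[0]["result"]["aggregateFields"]["count"]["integerValue"])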

Although the API for this purpose is not fully ready yet, it's coming along. If you don't want to use the REST API as suggested by @Dharmaraj, you can do something like this for now:
from google.cloud.firestore_v1.services.firestore import FirestoreClient
from google.cloud.firestore_v1.types.document import Value
from google.cloud.firestore_v1.types.firestore import RunAggregationQueryRequest
from google.cloud.firestore_v1.types.query import (
    StructuredAggregationQuery,
    StructuredQuery,
)

Aggregation = StructuredAggregationQuery.Aggregation
CollectionSelector = StructuredQuery.CollectionSelector
Count = Aggregation.Count
FieldFilter = StructuredQuery.FieldFilter
FieldReference = StructuredQuery.FieldReference
Filter = StructuredQuery.Filter
Operator = StructuredQuery.FieldFilter.Operator

client = FirestoreClient()
project_id = ""

request = RunAggregationQueryRequest(
    parent=f"projects/{project_id}/databases/(default)/documents",
    structured_aggregation_query=StructuredAggregationQuery(
        structured_query=StructuredQuery(
            from_=[CollectionSelector(collection_id="videos")],
            where=Filter(
                field_filter=FieldFilter(
                    field=FieldReference(
                        field_path="status",
                    ),
                    op=Operator.EQUAL,
                    value=Value(string_value="pending"),
                )
            ),
        ),
        aggregations=[Aggregation(count=Count())],
    ),
)

stream = client.run_aggregation_query(request=request)
print(next(stream).result.aggregate_fields["field_1"].integer_value)
Output:
1
Generally, the following would work to count the total number of documents in a collection:
def count_documents(collection_id: str) -> int:
    client = FirestoreClient()
    project_id = ""
    request = RunAggregationQueryRequest(
        parent=f"projects/{project_id}/databases/(default)/documents",
        structured_aggregation_query=StructuredAggregationQuery(
            structured_query=StructuredQuery(
                from_=[CollectionSelector(collection_id=collection_id)]
            ),
            aggregations=[Aggregation(count=Count())],
        ),
    )
    stream = client.run_aggregation_query(request=request)
    return next(stream).result.aggregate_fields["field_1"].integer_value

print(count_documents(collection_id="videos"))
Output:
10
Make sure that you have google-cloud-firestore>=2.7.3, and remember to set the value of the project_id variable in the count_documents function accordingly.


Can I access the API Gateway of a Lambda and extend it?

I have a Lambda function like this:
my_executor = python_lambda.PythonFunction(
    scope=self,
    id="my-lambda-new",
    runtime=_lambda.Runtime.PYTHON_3_8,
    role=existing_role_for_lambda,
    memory_size=512,
    function_name="my-lambda-new",
    description="This ",
    entry="./logs/src/myfolder",
    index='controller.py',
    handler="lambda_handler",
    timeout=core.Duration.minutes(5)
)
I have created an API Gateway like this:
my_api = _apigateway.LambdaRestApi(
    scope=self,
    id="my-api",
    endpoint_configuration=_apigateway.EndpointConfiguration(
        types=[_apigateway.EndpointType.EDGE]
    ),
    handler=my_executor,
    default_cors_preflight_options=shared_stack.cors_options,
    deploy_options=api_stage,
    proxy=True
)
and now, in another file, I am accessing the Lambda using its ARN like this:
_lambda_arn = ssm.StringParameter.value_for_string_parameter(self, "my-executor-lambda-arn")
self.my_executor_lambda = _lambda.Function.from_function_arn(
    self, "my_executor", _lambda_arn
)
Now, how can I extend its API Gateway? I want to add a new API endpoint to it.
From the CDK docs:
Import an existing RestApi that can be configured with additional Methods and Resources.
Take a look at fromRestApiAttributes
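For illustration, a rough sketch of the Python equivalent (from_rest_api_attributes) might look like the following; the rest_api_id and root_resource_id values are assumptions you would share between stacks yourself (e.g. via SSM parameters or CloudFormation outputs):

from aws_cdk import aws_apigateway as _apigateway

# Hypothetical: the existing API's id and root resource id, shared by the
# stack that created the RestApi
imported_api = _apigateway.RestApi.from_rest_api_attributes(
    self, "imported-api",
    rest_api_id=rest_api_id,
    root_resource_id=root_resource_id,
)

# Attach a new endpoint backed by the imported Lambda
new_resource = imported_api.root.add_resource("new-endpoint")
new_resource.add_method(
    "GET",
    _apigateway.LambdaIntegration(self.my_executor_lambda),
)

Note that methods added to an imported API may not take effect until a new deployment is created; check the caveats in the fromRestApiAttributes docs.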

How to get all resources with details from Azure subscription via Python

I am trying to get all resources and providers from an Azure subscription using the Python SDK.
Here is what my code does:
1. get all resources by "resource group"
2. extract the id of each resource within the "resource group"
3. then call the details of a particular resource by its id
The problem is that each call from point 3 requires a correct "API version", and it differs from object to object. So obviously my code keeps failing when it tries to use some common API version that fits everything.
Is there a way to retrieve a suitable API version per resource in a resource group (similar to retrieving id, name, ...)?
# Import specific methods and models from other libraries
from azure.mgmt.resource import SubscriptionClient
from azure.identity import AzureCliCredential
from azure.mgmt.resource import ResourceManagementClient

credential = AzureCliCredential()
client = ResourceManagementClient(credential, "<subscription_id>")
rg = [i for i in client.resource_groups.list()]

# Retrieve the list of resources in "myResourceGroup" (change to any name desired).
# The expand argument includes additional properties in the output.
rg_resources = {}
for i in range(0, len(rg)):
    rg_resources[rg[i].as_dict()["name"]] = client.resources.list_by_resource_group(
        rg[i].as_dict()["name"],
        expand="properties,created_time,changed_time")

data = {}
for i in rg_resources.keys():
    details = []
    for _data in iter(rg_resources[i]):
        a = _data
        details.append(client.resources.get_by_id(vars(_data)['id'], 'latest'))
    data[i] = details
print(data)
error:
azure.core.exceptions.HttpResponseError: (NoRegisteredProviderFound) No registered resource provider found for location 'westeurope' and API version 'latest' for type 'workspaces'. The supported api-versions are '2015-03-20, 2015-11-01-preview, 2017-01-01-preview, 2017-03-03-preview, 2017-03-15-preview, 2017-04-26-preview, 2020-03-01-preview, 2020-08-01, 2020-10-01, 2021-06-01, 2021-03-01-privatepreview'. The supported locations are 'eastus, westeurope, southeastasia, australiasoutheast, westcentralus, japaneast, uksouth, centralindia, canadacentral, westus2, australiacentral, australiaeast, francecentral, koreacentral, northeurope, centralus, eastasia, eastus2, southcentralus, northcentralus, westus, ukwest, southafricanorth, brazilsouth, switzerlandnorth, switzerlandwest, germanywestcentral, australiacentral2, uaecentral, uaenorth, japanwest, brazilsoutheast, norwayeast, norwaywest, francesouth, southindia, jioindiawest, canadaeast, westus3
What information exactly do you want to retrieve from the resources?
In most cases, I would recommend using the Resource Graph API to query over all resources. This is very powerful, as you can query the whole platform using a simple query language - Kusto Query Language (KQL).
You can try the queries directly in the Azure Resource Graph Explorer in the Portal.
A query that summarizes all types of resources would be:
resources
| project resourceGroup, type
| summarize count() by type, resourceGroup
| order by count_
A simple python-codeblock can be seen on the linked documentation above.
The sample below uses DefaultAzureCredential for authentication and lists in detail the first resource that is in a resource group whose name starts with "rg".
# Import Azure Resource Graph library
import azure.mgmt.resourcegraph as arg

# Import specific methods and models from other libraries
from azure.mgmt.resource import SubscriptionClient
from azure.identity import DefaultAzureCredential

# Wrap all the work in a function
def getresources(strQuery):
    # Get your credentials from the environment (CLI, MSI, ...)
    credential = DefaultAzureCredential()
    subsClient = SubscriptionClient(credential)
    subsRaw = []
    for sub in subsClient.subscriptions.list():
        subsRaw.append(sub.as_dict())
    subsList = []
    for sub in subsRaw:
        subsList.append(sub.get('subscription_id'))

    # Create Azure Resource Graph client and set options
    argClient = arg.ResourceGraphClient(credential)
    argQueryOptions = arg.models.QueryRequestOptions(result_format="objectArray")

    # Create query
    argQuery = arg.models.QueryRequest(subscriptions=subsList, query=strQuery, options=argQueryOptions)

    # Run query
    argResults = argClient.resources(argQuery)

    # Show Python object
    print(argResults)

getresources("Resources | where resourceGroup startswith 'rg' | limit 1")

How To Export GCP Security Command Center Findings To BigQuery?

Similar to this: How to export GCP's Security Center Assets to a Cloud Storage via cloud Function?
I need to export the Findings as seen in the Security Command Center to BigQuery so we can easily filter the data we need and generate custom reports.
Using this documentation as an example (https://cloud.google.com/security-command-center/docs/how-to-api-list-findings#python), I wrote the following:
from google.cloud import securitycenter
from google.cloud import bigquery

JSONPath = "Path to JSON File For Service Account"
client = securitycenter.SecurityCenterClient().from_service_account_json(JSONPath)
BQclient = bigquery.Client().from_service_account_json(JSONPath)
table_id = "project.security_center.assets"
org_name = "organizations/1234567891011"
all_sources = "{org_name}/sources/-".format(org_name=org_name)
finding_result_iterator = client.list_findings(request={"parent": all_sources})

for i, finding_result in enumerate(finding_result_iterator):
    errors = BQclient.insert_rows_json(table_id, finding_result)
    if errors == []:
        print("New rows have been added.")
    else:
        print("Encountered errors while inserting rows: {}".format(errors))
However, that then gave me the error:
"json_rows argument should be a sequence of dicts".
Any help with this would be greatly appreciated :)
Not sure if this existed back then in Q2 of 2021, but there is now documentation explaining how to do this:
https://cloud.google.com/security-command-center/docs/how-to-analyze-findings-in-big-query
You can create exports of SCC findings to BigQuery using this command:
gcloud scc bqexports create BIG_QUERY_EXPORT \
    --dataset=DATASET_NAME \
    --folder=FOLDER_ID | --organization=ORGANIZATION_ID | --project=PROJECT_ID \
    [--description=DESCRIPTION] \
    [--filter=FILTER]
The filter allows you to exclude unwanted findings (they will still be in SCC, but won't be copied to BigQuery). It's useful if you want to export findings from only one project or from selected categories. (Use -category:CATEGORY to exclude a category; this works the same way for other parameters as well.)
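For illustration only, a hypothetical invocation that keeps everything except one category might look like this (the export name, dataset path, organization ID, and category are all placeholders):

gcloud scc bqexports create my-findings-export \
    --dataset=projects/my-project/datasets/scc_findings \
    --organization=123456789 \
    --filter="-category:OPEN_FIREWALL"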
I managed to sort this by writing:

for i, finding_result in enumerate(finding_result_iterator):
    rows_to_insert = [
        {
            u"category": finding_result.finding.category,
            u"name": finding_result.finding.name,
            u"project": finding_result.resource.project_display_name,
            u"external_uri": finding_result.finding.external_uri,
        },
    ]
    errors = BQclient.insert_rows_json(table_id, rows_to_insert)  # a sequence of dicts, as the error demanded

How to describe a topic using kafka client in Python

I'm a beginner with the Kafka client in Python, and I need some help describing topics using the client.
I was able to list all my Kafka topics using the following code:
consumer = kafka.KafkaConsumer(group_id='test', bootstrap_servers=['kafka1'])
topicList = consumer.topics()
After referring to multiple articles and code samples, I was able to do this through describe_configs using confluent_kafka.
Link 1 [Confluent-kafka-python]
Link 2 Git Sample
Below is my sample code:
from confluent_kafka.admin import AdminClient, NewTopic, NewPartitions, ConfigResource
import confluent_kafka
import concurrent.futures

# Creation of config
conf = {'bootstrap.servers': 'kafka1', 'session.timeout.ms': 6000}
adminClient = AdminClient(conf)

topic_configResource = adminClient.describe_configs(
    [ConfigResource(confluent_kafka.admin.RESOURCE_TOPIC, "myTopic")])
for j in concurrent.futures.as_completed(iter(topic_configResource.values())):
    config_response = j.result(timeout=1)
I have found how to do it with kafka-python:
from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType
KAFKA_URL = "localhost:9092" # kafka broker
KAFKA_TOPIC = "test" # topic name
admin_client = KafkaAdminClient(bootstrap_servers=[KAFKA_URL])
configs = admin_client.describe_configs(config_resources=[ConfigResource(ConfigResourceType.TOPIC, KAFKA_TOPIC)])
config_list = configs.resources[0][4]
In config_list (list of tuples) you have all the configs for the topic.
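As a rough sketch of reading them (an assumption: the exact tuple layout can vary with the protocol version kafka-python negotiates, but the first two fields should be the config name and value):

# Assumption: each entry starts with (config_name, config_value, ...)
for entry in config_list:
    config_name, config_value = entry[0], entry[1]
    print(f"{config_name} = {config_value}")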
Refer to: https://docs.confluent.io/current/clients/confluent-kafka-python/
list_topics provides confluent_kafka.admin.TopicMetadata (topic, partitions)
kafka.admin.TopicMetadata.partitions provides confluent_kafka.admin.PartitionMetadata (partition id, leader, replicas, isrs)
from confluent_kafka.admin import AdminClient

kafka_admin = AdminClient({"bootstrap.servers": bootstrap_servers})

for topic in topics:
    x = kafka_admin.list_topics(topic=topic)
    print(x.topics, '\n')
    for key, value in x.topics.items():
        for keyy, valuey in value.partitions.items():
            print(keyy, ' Partition id : ', valuey, 'leader : ', valuey.leader, ' replica: ', valuey.replicas)
Interestingly, for Java this functionality (describeTopics()) sits within KafkaAdminClient.java.
So, I was trying to look for the Python equivalent of the same, and I discovered the code repository of kafka-python.
The documentation (in-line comments) in the admin-client equivalent in the kafka-python package says the following:
describe topics functionality is in ClusterMetadata
Note: if implemented here, send the request to the controller
I then switched to the cluster.py file in the same repository. This contains the topics() function that you've used to retrieve the list of topics, and the following two functions that could help you achieve the describe functionality (see the sketch after this list):
partitions_for_topic() - return the set of all partitions for a topic (whether available or not)
available_partitions_for_topic() - return the set of partitions with known leaders
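A minimal, untested sketch of exercising these helpers through a consumer (the broker address and topic name are placeholders):

from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'])

# Set of all partition ids known for the topic (None if the topic is unknown)
all_partitions = consumer.partitions_for_topic('my-topic')
print('partitions:', all_partitions)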
Note: I haven't tried this myself, so I'm not entirely sure if the behaviour would be identical to what you would see in the result of the kafka-topics --describe ... command, but it's worth a try.
I hope this helps!

Google Analytics and Python

I'm brand new at Python and I'm trying to write an extension to an app that imports GA information and parses it into MySQL. There is a shamefully sparse amount of information on the topic. The Google Docs only seem to have examples in JS and Java...
...I have gotten to the point where my user can authenticate into GA using AuthSub. That code is here:
import gdata.service
import gdata.analytics
from django import http
from django import shortcuts
from django.shortcuts import render_to_response

def authorize(request):
    next = 'http://localhost:8000/authconfirm'
    scope = 'https://www.google.com/analytics/feeds'
    secure = False  # set secure=True to request secure AuthSub tokens
    session = False
    auth_sub_url = gdata.service.GenerateAuthSubRequestUrl(next, scope, secure=secure, session=session)
    return http.HttpResponseRedirect(auth_sub_url)
So, the next step is getting at the data. I have found this library (beware, the UI is offensive): http://gdata-python-client.googlecode.com/svn/trunk/pydocs/gdata.analytics.html
However, I have found it difficult to navigate. It seems like I should be using gdata.analytics.AnalyticsDataEntry.getDataEntry(), but I'm not sure what it is asking me to pass it.
I would love a push in the right direction. I feel I've exhausted google looking for a working example.
Thank you!!
EDIT: I have gotten farther, but my problem still isn't solved. The method below returns data (I believe)... the error I get is: "'str' object has no attribute '_BecomeChildElement'". I believe I am returning a feed? However, I don't know how to drill into it. Is there a way for me to inspect this object?
def auth_confirm(request):
    gdata_service = gdata.service.GDataService('iSample_acctSample_v1.0')
    feedUri = 'https://www.google.com/analytics/feeds/accounts/default?max-results=50'
    # request feed
    feed = gdata.analytics.AnalyticsDataFeed(feedUri)
    print(str(feed))
Maybe this post can help out. It seems like there are no Analytics-specific bindings yet, so you are working with the generic gdata.
I've been using GA for a little over a year now, and since about April 2009 I have used the Python bindings supplied in a package called python-googleanalytics by Clint Ecker et al. So far, it works quite well.
Here's where to get it: http://github.com/clintecker/python-googleanalytics.
Install it the usual way.
To use it: first, so that you don't have to manually pass in your login credentials each time you access the API, put them in a config file like so:
[Credentials]
google_account_email = youraccount@gmail.com
google_account_password = yourpassword
Name this file '.pythongoogleanalytics' and put it in your home directory.
And from an interactive prompt type:
from googleanalytics import Connection
import datetime

connection = Connection()  # pass in id & pw as strings **if** not in config file
account = connection.get_account(<*your GA profile ID goes here*>)
start_date = datetime.date(2009, 12, 1)
end_date = datetime.date(2009, 12, 13)
# account object does the work, specify what data you want w/
# 'metrics' & 'dimensions'; see 'USAGE.md' file for examples
account.get_data(start_date=start_date, end_date=end_date, metrics=['visits'])
The 'get_account' method returns an account object (bound above to the variable 'account'); the 'get_data' call on it then returns a Python list which contains your data.
You need 3 files within the app: client_secrets.json, analytics.dat, and google_auth.py.
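In case it helps, here is a sketch of what google_auth.py might contain, modeled on the old Core Reporting API v3 samples (oauth2client is deprecated now; the scope and file names are assumptions based on the files listed above):

import httplib2
from apiclient.discovery import build
from oauth2client import client, file, tools

def initialize_service():
    # file.Storage caches the OAuth token in analytics.dat
    storage = file.Storage('analytics.dat')
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        flow = client.flow_from_clientsecrets(
            'client_secrets.json',
            scope='https://www.googleapis.com/auth/analytics.readonly')
        credentials = tools.run_flow(flow, storage)
    http = credentials.authorize(httplib2.Http())
    return build('analytics', 'v3', http=http)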
Create a module Query.py within the app:
class Query(object):
    def __init__(self, startdate, enddate, filter, metrics):
        self.startdate = startdate.strftime('%Y-%m-%d')
        self.enddate = enddate.strftime('%Y-%m-%d')
        self.filter = "ga:medium=" + filter
        self.metrics = metrics
Example models.py (it has the following function):
import google_auth

service = google_auth.initialize_service()

def total_visit(self):
    object = AnalyticsData.objects.get(utm_source=self.utm_source)
    trial = Query(object.date.startdate, object.date.enddate, object.utm_source, "ga:sessions")
    result = service.data().ga().get(
        ids='ga:<your-profile-id>',
        start_date=trial.startdate,
        end_date=trial.enddate,
        filters=trial.filter,
        metrics=trial.metrics).execute()
    total_visit = result.get('rows')
    # <yr save command, ColumnName.object.create(data=total_visit) goes here>
