Similar to this: How to export GCP's Security Center Assets to a Cloud Storage via cloud Function?
I need to export the Findings as seen in the Security Command Center to BigQuery so we can easily filter the data we need and generate custom reports.
Using this documentation as an example (https://cloud.google.com/security-command-center/docs/how-to-api-list-findings#python), I wrote the following:
from google.cloud import securitycenter
from google.cloud import bigquery
JSONPath = "Path to JSON File For Service Account"
client = securitycenter.SecurityCenterClient().from_service_account_json(JSONPath)
BQclient = bigquery.Client().from_service_account_json(JSONPath)
table_id = "project.security_center.assets"
org_name = "organizations/1234567891011"
all_sources = "{org_name}/sources/-".format(org_name=org_name)
finding_result_iterator = client.list_findings(request={"parent": all_sources})
for i, finding_result in enumerate(finding_result_iterator):
    errors = BQclient.insert_rows_json(table_id, finding_result)
    if errors == []:
        print("New rows have been added.")
    else:
        print("Encountered errors while inserting rows: {}".format(errors))
However, that then gave me the error:
"json_rows argument should be a sequence of dicts".
Any help with this would be greatly appreciated :)
Not sure if this existed back in Q2 of 2021, but now there is documentation explaining how to do this:
https://cloud.google.com/security-command-center/docs/how-to-analyze-findings-in-big-query
You can create exports of SCC findings to BigQuery using this command:
gcloud scc bqexports create BIG_QUERY_EXPORT \
--dataset=DATASET_NAME \
--folder=FOLDER_ID | --organization=ORGANIZATION_ID | --project=PROJECT_ID \
[--description=DESCRIPTION] \
[--filter=FILTER]
The filter lets you exclude unwanted findings (they will still appear in SCC, but won't be copied to BigQuery).
It's useful if you want to export findings from only one project or only selected categories. (Use -category:CATEGORY to exclude a category; the same negation works on other attributes as well.)
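For example, a hedged invocation (all names are placeholders; the dataset must already exist and the filter follows the SCC findings filter syntax described above):
gcloud scc bqexports create my-findings-export \
  --dataset=projects/my-project/datasets/scc_findings \
  --project=my-project \
  --description="Active high-severity findings only" \
  --filter='state="ACTIVE" AND severity="HIGH" AND -category:"OPEN_FIREWALL"'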
I managed to sort this by writing:
for i, finding_result in enumerate(finding_result_iterator):
    rows_to_insert = [
        {u"category": finding_result.finding.category, u"name": finding_result.finding.name, u"project": finding_result.resource.project_display_name, u"external_uri": finding_result.finding.external_uri},
    ]
    # insert_rows_json expects a sequence of dicts, which rows_to_insert now is
    errors = BQclient.insert_rows_json(table_id, rows_to_insert)
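If the result set is small enough for a single streaming insert, a hedged variant of the same idea is to collect all the rows first and call insert_rows_json once, so only one API call is made (table_id and BQclient as defined in the question):
rows_to_insert = []
for finding_result in finding_result_iterator:
    # One dict per finding, matching the columns used above
    rows_to_insert.append({
        "category": finding_result.finding.category,
        "name": finding_result.finding.name,
        "project": finding_result.resource.project_display_name,
        "external_uri": finding_result.finding.external_uri,
    })

errors = BQclient.insert_rows_json(table_id, rows_to_insert)
if errors:
    print("Encountered errors while inserting rows: {}".format(errors))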
I am trying to get all resources and providers from an Azure subscription using the Python SDK.
Here is what my code does:
1. get all resources by "resource group"
2. extract the id of each resource within the "resource group"
3. then call details about a particular resource by its id
The problem is that each call from step 3 requires the correct "API version", and it differs from object to object. So my code keeps failing when it tries to use some common API version that fits everything.
Is there a way to retrieve a suitable API version per resource in a resource group (similarly to retrieving id, name, ...)?
Here is the code:
# Import specific methods and models from other libraries
from azure.mgmt.resource import SubscriptionClient
from azure.identity import AzureCliCredential
from azure.mgmt.resource import ResourceManagementClient

credential = AzureCliCredential()
client = ResourceManagementClient(credential, "<subscription_id>")
rg = [i for i in client.resource_groups.list()]

# Retrieve the list of resources in each resource group.
# The expand argument includes additional properties in the output.
rg_resources = {}
for i in range(0, len(rg)):
    rg_resources[rg[i].as_dict()["name"]] = client.resources.list_by_resource_group(
        rg[i].as_dict()["name"],
        expand="properties,created_time,changed_time")

data = {}
for i in rg_resources.keys():
    details = []
    for _data in iter(rg_resources[i]):
        a = _data
        details.append(client.resources.get_by_id(vars(_data)['id'], 'latest'))
    data[i] = details

print(data)
error:
azure.core.exceptions.HttpResponseError: (NoRegisteredProviderFound) No registered resource provider found for location 'westeurope' and API version 'latest' for type 'workspaces'. The supported api-versions are '2015-03-20, 2015-11-01-preview, 2017-01-01-preview, 2017-03-03-preview, 2017-03-15-preview, 2017-04-26-preview, 2020-03-01-preview, 2020-08-01, 2020-10-01, 2021-06-01, 2021-03-01-privatepreview'. The supported locations are 'eastus, westeurope, southeastasia, australiasoutheast, westcentralus, japaneast, uksouth, centralindia, canadacentral, westus2, australiacentral, australiaeast, francecentral, koreacentral, northeurope, centralus, eastasia, eastus2, southcentralus, northcentralus, westus, ukwest, southafricanorth, brazilsouth, switzerlandnorth, switzerlandwest, germanywestcentral, australiacentral2, uaecentral, uaenorth, japanwest, brazilsoutheast, norwayeast, norwaywest, francesouth, southindia, jioindiawest, canadaeast, westus3
What information exactly do you want to retrieve from the resources?
In most cases, I would recommend using the Azure Resource Graph API to query across all resources. This is very powerful, as you can query the whole platform using a simple query language - Kusto Query Language (KQL).
You can try the queries directly in Azure Resource Graph Explorer in the Portal.
A query that summarizes all types of resources would be:
resources
| project resourceGroup, type
| summarize count() by type, resourceGroup
| order by count_
A simple Python code block can be seen in the linked documentation above.
The sample below uses DefaultAzureCredential for authentication and lists, in detail, the first resource found in a resource group whose name starts with "rg".
# Import Azure Resource Graph library
import azure.mgmt.resourcegraph as arg

# Import specific methods and models from other libraries
from azure.mgmt.resource import SubscriptionClient
from azure.identity import DefaultAzureCredential

# Wrap all the work in a function
def getresources(strQuery):
    # Get your credentials from environment (CLI, MSI, ...)
    credential = DefaultAzureCredential()
    subsClient = SubscriptionClient(credential)
    subsRaw = []
    for sub in subsClient.subscriptions.list():
        subsRaw.append(sub.as_dict())
    subsList = []
    for sub in subsRaw:
        subsList.append(sub.get('subscription_id'))

    # Create Azure Resource Graph client and set options
    argClient = arg.ResourceGraphClient(credential)
    argQueryOptions = arg.models.QueryRequestOptions(result_format="objectArray")

    # Create query
    argQuery = arg.models.QueryRequest(subscriptions=subsList, query=strQuery, options=argQueryOptions)

    # Run query
    argResults = argClient.resources(argQuery)

    # Show Python object
    print(argResults)

getresources("Resources | where resourceGroup startswith 'rg' | limit 1")
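If you need the rows as Python objects rather than a printed response object, the QueryResponse returned by argClient.resources exposes a data attribute, which is a list of dicts when result_format is "objectArray". A hedged variant of the function above (the function name is mine, and it reuses the imports already shown):
def getresources_data(strQuery):
    # Same setup as getresources, but return the result rows instead of printing the response
    credential = DefaultAzureCredential()
    subsList = [s.subscription_id for s in SubscriptionClient(credential).subscriptions.list()]
    argClient = arg.ResourceGraphClient(credential)
    argQueryOptions = arg.models.QueryRequestOptions(result_format="objectArray")
    argQuery = arg.models.QueryRequest(subscriptions=subsList, query=strQuery, options=argQueryOptions)
    return argClient.resources(argQuery).data  # list of dicts, one per resource row

# Each row is a plain dict keyed by the projected columns
for row in getresources_data("Resources | project name, type, resourceGroup | limit 5"):
    print(row["name"], row["type"], row["resourceGroup"])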
Ok, I'm new to Python but... I really like it. I have been trying to figure this out for a while and thought someone who knows a lot more than I do could help.
What I would like to do is use pygsheets and batch the updates into one API call instead of several. I have been searching for examples and ideas, and found that if you unlink and then link the worksheet it will do this. I tried it and it sped things up only a little, and then I saw that you can use update_values instead of update_value. I have got it to work with something like wk1.update_values('A2:C4', [[1,2,3],[4,5,6],[7,8,9]]), but what if you want the updates to go to specific cell locations instead of a range like A2:C4? I appreciate any advice in advance.
https://pygsheets.readthedocs.io/en/latest/worksheet.html#pygsheets.Worksheet.update_values
https://pygsheets.readthedocs.io/en/latest/sheet_api.html?highlight=batch_updates#pygsheets.sheet.SheetAPIWrapper.values_batch_update
import pygsheets
gc = pygsheets.authorize() # This will create a link to authorize
# Open spreadsheet
GS_ID = ''
File_Tab_Name = 'File1'
Main_Topic = 'Main Topic'
Actual_Company_Name = 'Company Name'
Street = 'Street Address'
City_State_Zip = 'City State Zip'
Phone_Number = 'Phone Number'
# 2. Open spreadsheet by key
sh = gc.open_by_key(GS_ID)
sh.title = File_Tab_Name
wk1 = sh[0]
wk1.title = File_Tab_Name
#wk1.update_values('A2:C4',[[1,2,3],[4,5,6],[7,8,9]])
wk1.update_values([['a1'],['h1'],['i3']],[[Main_Topic],[Actual_Company_Name],[Street]]) ### is this possible
#wk1.unlink()
#wk1.title = File_Tab_Name
#wk1.update_value("a1",Main_Topic) ###Topic
#wk1.update_value("h1",Actual_Company_Name) ###Company Name
#wk1.update_value("i3",Street) ###Street Address
#wk1.update_value("i4",City_State_Zip) ###City State Zip
#wk1.update_value("i5",Phone_Number) ### Phone Number
#wk1.link() # will do all the updates
From what I understand, you want to batch-update values. You can use the update_values_batch function.
wks.update_values_batch(['A1:A2', 'B1:B2'], [[[1],[2]], [[3],[4]]])
# or
wks.update_values_batch([((1,1), (2,1)), 'B1:B2'], [[[1,2]], [[3,4]]], 'COLUMNS')
# or
wks.update_values_batch(['A1:A2', 'B1:B2'], [[[1,2]], [[3,4]]], 'COLUMNS')
See the docs for update_values_batch.
NB: update pygsheets to the latest version or install it from GitHub:
pip install --upgrade https://github.com/nithinmurali/pygsheets/archive/staging.zip
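Applied to the specific cells from the question, that could look like this (a hedged sketch: each single cell is passed as its own range, and each value is wrapped as a one-cell matrix):
wk1.update_values_batch(
    ['A1', 'H1', 'I3'],
    [[[Main_Topic]], [[Actual_Company_Name]], [[Street]]],
)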
Unfortunately, pygsheets has no method for updating multiple ranges in batch. Instead, you can use gspread.
gspread has a batch_update method with which you can update multiple cells or ranges at once.
Example:
Code:
import gspread
gc = gspread.service_account()
sh = gc.open_by_key("insert spreadsheet key here").sheet1
sh.batch_update([{
    'range': 'A1:B1',
    'values': [['42', '43']],
}, {
    'range': 'A2:B2',
    'values': [['44', '45']],
}])
References:
gspread:batch_update()
gspread Authentication
I'm pretty new to using Stack Overflow as well as the Google Cloud Platform, so apologies if I am not asking this question in the right format. I am currently facing an issue with getting predictions from my model.
I've trained a multilabel AutoML model on the Google Cloud Platform and now I want to use that model to score new data entries.
Since the platform only allows one entry at a time, I want to use Python to do batch predictions.
I've stored my data entries in separate .txt files in a Google Cloud Storage bucket and created a .txt file listing the gs:// references to those files (as recommended in the documentation).
I've exported a .json file with my credentials from the service account and specified the IDs and paths in my code:
# Import API credentials and specify model / path references
# (imports needed by the snippets below)
import os
from google.cloud import automl

path = 'xxx.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = path
model_name = 'xxx'
model_id = 'TCN1234567890'
project_id = '1234567890'
model_full_id = f"https://eu-automl.googleapis.com/v1/projects/{project_id}/locations/eu/models/{model_id}"
input_uri = f"gs://bucket_name/{model_name}/file_list.txt"
output_uri = f"gs://bucket_name/{model_name}/outputs/"
prediction_client = automl.PredictionServiceClient()
And then I'm running the following code to get the predictions:
# score batch of file_list
gcs_source = automl.GcsSource(input_uris=[input_uri])
input_config = automl.BatchPredictInputConfig(gcs_source=gcs_source)
gcs_destination = automl.GcsDestination(output_uri_prefix=output_uri)
output_config = automl.BatchPredictOutputConfig(
    gcs_destination=gcs_destination
)
response = prediction_client.batch_predict(
    name=model_full_id,
    input_config=input_config,
    output_config=output_config
)
print("Waiting for operation to complete...")
print(
    f"Batch Prediction results saved to Cloud Storage bucket. {response.result()}"
)
However, I'm getting the following error: InvalidArgument: 400 Request contains an invalid argument.
Would anyone have a hint as to what is causing this issue?
Any input would be appreciated! Thanks!
Found the issue!
I needed to set the client to the 'eu' environment first:
from google.api_core.client_options import ClientOptions  # import needed for ClientOptions

options = ClientOptions(api_endpoint='eu-automl.googleapis.com')
prediction_client = automl.PredictionServiceClient(client_options=options)
I have a cloud function calling SCC's list_assets and converting the paginated output to a List (to fetch all the results). However, since I have quite a lot of assets in the organization tree, it is taking a lot of time to fetch and cloud function times out (540 seconds max timeout).
asset_iterator = security_client.list_assets(org_name)
asset_fetch_all=list(asset_iterator)
I tried to export via WebUI and it works fine (took about 5 minutes). Is there a way to export the assets from SCC directly to a Cloud Storage bucket using the API?
I developed the same thing in Python for exporting to BQ. Searching in BigQuery is easier than searching in a file. The code is very similar for Cloud Storage. Here is my working code with BQ:
import os
from google.cloud import asset_v1
from google.cloud.asset_v1.proto import asset_service_pb2
from google.cloud.asset_v1 import enums

def GCF_ASSET_TO_BQ(request):
    client = asset_v1.AssetServiceClient()
    parent = 'organizations/{}'.format(os.getenv('ORGANIZATION_ID'))
    output_config = asset_service_pb2.OutputConfig()
    output_config.bigquery_destination.dataset = 'projects/{}/datasets/{}'.format(os.getenv('PROJECT_ID'), os.getenv('DATASET'))
    content_type = enums.ContentType.RESOURCE
    output_config.bigquery_destination.table = 'asset_export'
    output_config.bigquery_destination.force = True
    response = client.export_assets(parent, output_config, content_type=content_type)
    # To wait for the export to finish:
    # response.result()
    # Do stuff after export
    return "done", 200

if __name__ == "__main__":
    GCF_ASSET_TO_BQ('')
As you can see, some values come from environment variables (ORGANIZATION_ID, PROJECT_ID and DATASET). For exporting to Cloud Storage, you have to change the definition of the output_config like this:
output_config = asset_service_pb2.OutputConfig()
output_config.gcs_destination.uri = 'gs://path/to/file'
You have examples in other languages here.
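Putting the two pieces together, a hedged sketch of the Cloud Storage variant (the function name and the gs:// path are placeholders; everything else mirrors the BigQuery version above):
import os
from google.cloud import asset_v1
from google.cloud.asset_v1.proto import asset_service_pb2
from google.cloud.asset_v1 import enums

def GCF_ASSET_TO_GCS(request):
    client = asset_v1.AssetServiceClient()
    parent = 'organizations/{}'.format(os.getenv('ORGANIZATION_ID'))

    # Same export call as the BigQuery version, but writing to a Cloud Storage object
    output_config = asset_service_pb2.OutputConfig()
    output_config.gcs_destination.uri = 'gs://YOUR_BUCKET/asset_export.json'

    content_type = enums.ContentType.RESOURCE
    response = client.export_assets(parent, output_config, content_type=content_type)
    # response.result()  # uncomment to wait for the export to finish
    return "done", 200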
Try something like this:
We use it to upload findings into a bucket. Make sure to give the service account the function runs as the right permissions on the bucket.
def test_list_medium_findings(source_name):
    # [START list_findings_at_a_time]
    from google.cloud import securitycenter
    from google.cloud import storage

    # Create a new client.
    client = securitycenter.SecurityCenterClient()

    # Set query parameters
    organization_id = "11112222333344444"
    org_name = "organizations/{org_id}".format(org_id=organization_id)
    all_sources = "{org_name}/sources/-".format(org_name=org_name)

    # Query Security Command Center (YourFilter is a placeholder for your findings filter)
    finding_result_iterator = client.list_findings(all_sources, filter_=YourFilter)

    # Set output file settings
    bucket = "YourBucketName"
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket)
    output_file_name = "YourFileName"
    my_file = bucket.blob(output_file_name)

    # Write the findings to a temp file, then upload it to the bucket
    with open('/tmp/data.txt', 'w') as file:
        for i, finding_result in enumerate(finding_result_iterator):
            file.write(
                "{}: name: {} resource: {}".format(
                    i, finding_result.finding.name, finding_result.finding.resource_name
                )
            )

    # Upload to bucket
    my_file.upload_from_filename("/tmp/data.txt")
I'm a beginner with the Kafka client in Python and I need some help describing topics using the client.
I was able to list all my Kafka topics using the following code:
consumer = kafka.KafkaConsumer(group_id='test', bootstrap_servers=['kafka1'])
topicList = consumer.topics()
After referring to multiple articles and code samples, I was able to do this through describe_configs using confluent_kafka.
Link 1 [Confluent-kafka-python]
Link 2 Git Sample
Below is my sample code:
from confluent_kafka.admin import AdminClient, NewTopic, NewPartitions, ConfigResource
import confluent_kafka
import concurrent.futures
# Creation of config
conf = {'bootstrap.servers': 'kafka1', 'session.timeout.ms': 6000}
adminClient = AdminClient(conf)
topic_configResource = adminClient.describe_configs([ConfigResource(confluent_kafka.admin.RESOURCE_TOPIC, "myTopic")])
for j in concurrent.futures.as_completed(iter(topic_configResource.values())):
    config_response = j.result(timeout=1)
I have found how to do it with kafka-python:
from kafka.admin import KafkaAdminClient, ConfigResource, ConfigResourceType
KAFKA_URL = "localhost:9092" # kafka broker
KAFKA_TOPIC = "test" # topic name
admin_client = KafkaAdminClient(bootstrap_servers=[KAFKA_URL])
configs = admin_client.describe_configs(config_resources=[ConfigResource(ConfigResourceType.TOPIC, KAFKA_TOPIC)])
config_list = configs.resources[0][4]
In config_list (list of tuples) you have all the configs for the topic.
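For convenience, the entries can be turned into a name-to-value dict; this is a hedged sketch that assumes the tuple layout of the kafka-python response above, where the first two fields of each entry are the config name and its value:
# Build a simple {config_name: config_value} mapping from the raw tuples
topic_config = {entry[0]: entry[1] for entry in config_list}
print(topic_config.get("retention.ms"))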
Refer to https://docs.confluent.io/current/clients/confluent-kafka-python/
list_topics returns cluster metadata whose topics attribute maps topic names to confluent_kafka.admin.TopicMetadata (topic, partitions), and TopicMetadata.partitions maps partition ids to confluent_kafka.admin.PartitionMetadata (partition id, leader, replicas, isrs).
from confluent_kafka.admin import AdminClient

# bootstrap_servers and topics are assumed to be defined elsewhere
kafka_admin = AdminClient({"bootstrap.servers": bootstrap_servers})

for topic in topics:
    x = kafka_admin.list_topics(topic=topic)
    print(x.topics, '\n')
    for key, value in x.topics.items():
        for keyy, valuey in value.partitions.items():
            print(keyy, ' Partition id : ', valuey, 'leader : ', valuey.leader, ' replica: ', valuey.replicas)
Interestingly, for Java this functionality (describeTopics()) sits within KafkaAdminClient.java.
So I was trying to look for the Python equivalent and discovered the code repository of kafka-python.
The documentation (in-line comments) in the admin-client equivalent of the kafka-python package says the following:
describe topics functionality is in ClusterMetadata
Note: if implemented here, send the request to the controller
I then switched to the cluster.py file in the same repository. It contains the topics() function that you've used to retrieve the list of topics, as well as the following 2 functions that could help you achieve the describe functionality:
partitions_for_topic() - Return set of all partitions for topic (whether available or not)
available_partitions_for_topic() - Return set of partitions with known leaders
Note: I haven't tried this myself, so I'm not entirely sure whether the behaviour would be identical to what you would see in the result of the kafka-topics --describe ... command, but it's worth a try.
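For illustration, a minimal hedged sketch using kafka-python's KafkaConsumer, which exposes partitions_for_topic on top of that same cluster metadata (the broker address is a placeholder):
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers=['kafka1'])

for topic in consumer.topics():
    # Set of partition ids for the topic (or None if the topic is unknown)
    partitions = consumer.partitions_for_topic(topic)
    print(topic, partitions)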
I hope this helps!