How to schedule a Job macie from boto3 with a specific time?

How to schedule a Job macie from boto3 with a specific time? - python

I have successfully created my Job, but now I would like it to run at 4am. I've searched the documentation and haven't found a way to do it, is that possible?
I leave my code below:
client = boto3.client('macie2')
response = client.create_classification_job(
customDataIdentifierIds=[ CustomDataIdentifierID ],
description='macie',
initialRun=True,
jobType='SCHEDULED',
name='macie_classification_job',
scheduleFrequency={"dailySchedule": {}}
)

Related

Update query string in scheduled query using Python Client for BigQuery Data Transfer Service

I'm struggling to find documentation and examples for Python Client for BigQuery Data Transfer Service. A new query string is generated by my application from time to time and I'd like to update the existing scheduled query accordingly. This is the most helpful thing I have found so far, however I am still unsure where to pass my query string. Is this the correct method?
from google.cloud import bigquery_datatransfer_v1
def sample_update_transfer_config():
# Create a client
client = bigquery_datatransfer_v1.DataTransferServiceClient()
# Initialize request argument(s)
transfer_config = bigquery_datatransfer_v1.TransferConfig()
transfer_config.destination_dataset_id = "destination_dataset_id_value"
request = bigquery_datatransfer_v1.UpdateTransferConfigRequest(
transfer_config=transfer_config,
)
# Make the request
response = client.update_transfer_config(request=request)
# Handle the response
print(response)

You may refer to this Update Scheduled Queries for python documentation from BigQuery for the official reference on the usage of Python Client Library in updating scheduled queries.
However, I updated the code for you to update your query string. I added the updated query string in the params and define what attributes of the TransferConfig() will be updated in the update_mask.
See updated code below:
from google.cloud import bigquery_datatransfer
from google.protobuf import field_mask_pb2
transfer_client = bigquery_datatransfer.DataTransferServiceClient()
transfer_config_name = "projects/{your-project-id}/locations/us/transferConfigs/{unique-ID-of-transferconfig}"
new_display_name = "Your Desired Updated Name if Necessary" #--remove if no need to update **scheduled query name**.
query_string_new = """
SELECT
CURRENT_TIMESTAMP() as current_time
"""
new_params={
"query": query_string_new,
"destination_table_name_template": "your_table_{run_date}",
"write_disposition": "WRITE_TRUNCATE",
"partitioning_field": "",
}
transfer_config = bigquery_datatransfer.TransferConfig(name=transfer_config_name,
)
transfer_config.display_name = new_display_name #--remove if no need to update **scheduled query name**.
transfer_config.params = new_params
transfer_config = transfer_client.update_transfer_config(
{
"transfer_config": transfer_config,
"update_mask": field_mask_pb2.FieldMask(paths=["display_name","params"]), #--remove "display_name" from the list if no need to update **scheduled query name**.
}
)
print("Updates are executed successfully")
For you to get the value of your transfer_config_name, you may list all your scheduled queries by following this SO post.

How do I get the GCE vm instance id using python?

I am using python. I have the correct project and vm instance names. So I can query Google Cloud metrics just fine. But now I need to query some Agent metrics, but it needs the instance id of my vm instead of the name. What is the simplest way for me to get the instance id of my vm with a query?
Sorry, I should be more clear. Here is my sample code:
results = client.list_time_series(
request={
"name": project_name,
"filter": filter,
"interval": interval,
"view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
}
)
I want to make a query similar to this. Any simple filter I can use, or something else, that will get me the instance_id of a particular instance name?

If you are inside the gce vm you use the Metadata server
import requests
metadata_server = "http://metadata/computeMetadata/v1/instance/"
metadata_flavor = {'Metadata-Flavor' : 'Google'}
gce_id = requests.get(metadata_server + 'id', headers = metadata_flavor).text
gce_name = requests.get(metadata_server + 'hostname', headers = metadata_flavor).text
gce_machine_type = requests.get(metadata_server + 'machine-type', headers = metadata_flavor).text
If you are looking to list gce vms check the example in the GCP documentation for using client libraries: Listing Instances

How to use Azure DataBricks Api to submit job?

I am a beginner in Azure Databricks and I want to use APIs to create cluster and submit job in python. I am stuck as I am unable to do so. Also if I have an existing cluster how will the code look like? I got job id after running this code but unable to see any output.
import requests
DOMAIN = ''
TOKEN = ''
response = requests.post(
'https://%s/api/2.0/jobs/create' % (DOMAIN),
headers={'Authorization': 'Bearer %s' % TOKEN},
json={
"name": "SparkPi spark-submit job",
"new_cluster": {
"spark_version": "7.3.x-scala2.12",
"node_type_id": "Standard_DS3_v2",
"num_workers": 2
},
"spark_submit_task": {
"parameters": [
"--class",
"org.apache.spark.examples.SparkPi",
"dbfs:/FileStore/sparkpi_assembly_0_1.jar",
"10"
]
}
}
)
if response.status_code == 200:
print(response.json())
else:
print("Error launching cluster: %s: %s" % (response.json()["error_code"], response.json()["message"]))

Jobs at Databricks could be executed two ways (see docs):
on a new cluster - that's how you do it right now
on existing cluster - remove the new_cluster block, and add the existing_cluster_id field with the ID of existing cluster. If you don't have a cluster yet, then you can create it via Cluster API
When you create a job, then you get back the job ID that could be used to edit the job or delete it. You can also launch the job using the Run Now API. But if you just want to execute the job without create the Job in the UI, then you need to look onto Run Submit API. Either of the APIs will return the ID of specific job run, and then you can use Run Get API to get status of the job, or Run Get Output API to get the results of execution.

Facebook Marketing API - how to handle rate limit for retrieving all ad sets through campaign ids?

I've recently started working with the Facebook Marketing API, using the facebook_business SDK for Python (running v3.9 on Ubuntu 20.04). I think I've mostly wrapped my head around how it works, however, I'm still kind of at a loss as to how I can handle the arbitrary way in which the API is rate-limited.
Specifically, what I'm attempting to do is to retrieve all Ad Sets from all the campaigns that have ever run on my ad account, regardless of whether their effective_status is ACTIVE, PAUSED, DELETED or ARCHIVED.
Hence, I pulled all the campaigns for my ad account. These are stored in a dict, whereby the key indicates the effective_status, like so, called output:
{'ACTIVE': ['******************',
'******************',
'******************'],
'PAUSED': ['******************',
'******************',
'******************'}
Then, I'm trying to pull the Ad Set ids, like so:
import pandas as pd
import json
import re
import time
from random import *
from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount # account-level info
from facebook_business.adobjects.campaign import Campaign # campaign-level info
from facebook_business.adobjects.adset import AdSet # ad-set level info
from facebook_business.adobjects.ad import Ad # ad-level info
# auth init
app_id = open(APP_ID_PATH, 'r').read().splitlines()[0]
app_secret = open(APP_SECRET_PATH, 'r').read().splitlines()[0]
token = open(APP_ACCESS_TOKEN, 'r').read().splitlines()[0]
# init the connection
FacebookAdsApi.init(app_id, app_secret, token)
campaign_types = list(output.keys())
ad_sets = {}
for status in campaign_types:
ad_sets_for_status = []
for campaign_id in output[status]:
# sleep and wait for a random time
sleepy_time = uniform(1, 3)
time.sleep(sleepy_time)
# pull the ad_sets for this particular campaign
campaign_ad_sets = Campaign(campaign_id).get_ad_sets()
for entry in campaign_ad_sets:
ad_sets_for_status.append(entry['id'])
ad_sets[status] = ad_sets_for_status
Now, this crashes at different times whenever I run it, with the following error:
FacebookRequestError:
Message: Call was not successful
Method: GET
Path: https://graph.facebook.com/v11.0/23846914220310083/adsets
Params: {'summary': 'true'}
Status: 400
Response:
{
"error": {
"message": "(#17) User request limit reached",
"type": "OAuthException",
"is_transient": true,
"code": 17,
"error_subcode": 2446079,
"fbtrace_id": "***************"
}
}
I can't reproduce the time at which it crashes, however, it certainly doesn't take ~600 calls (see here: https://stackoverflow.com/a/29690316/5080858), and as you can see, I'm sleeping ahead of every API call. You might suggest that I should just call the get_ad_sets method on the AdAccount endpoint, however, this pulls fewer ad sets than the above code does, even before it crashes. For my use-case, it's important to pull ads that are long over as well as ads that are ongoing, hence it's important that I get as much data as possible.
I'm kind of annoyed with this -- seeing as we are paying for these ads to run, you'd think FB would make it as easy as possible to retrieve info on them via API, and not introduce API rate limits similar to those for valuable data one doesn't necessarily own.
Anyway, I'd appreciate any kind of advice or insights - perhaps there's also a much better way of doing this that I haven't considered.
Many thanks in advance!

The error with 'code': 17 means that you reach the limit of call and in order to get more nodes you have to wait.
Firstly I would handle the error in this way:
from facebook_business.exceptions import FacebookRequestError
...
for status in campaign_types:
ad_sets_for_status = []
for campaign_id in output[status]:
# keep trying until the request is ok
while True:
try:
campaign_ad_sets = Campaign(campaign_id).get_ad_sets()
break
except FacebookRequestError as error:
if error.api_error_code() in [17, 80000]:
time.sleep(sleepy_time) # sleep for a period of time
for entry in campaign_ad_sets:
ad_sets_for_status.append(entry['id'])
ad_sets[status] = ad_sets_for_status
I'd like to suggest you moreover to fetch the list of nodes from the account (by using the 'level': node param in params) and by using the batch calls: I can assure you that this will help you a lot and it will decrease the program run time.
I hope I was helpful.

What is the best Python Design pattern for consuming Paginated Messages from a TWILIO's HTTP GET REST API response?

Trying to implement source_to_raw to consume Twilio API responses via a python script. Below is a sample code I have tried. I hope there should be a better way than this.
I'm exploring options to accomplish via Python helper libraries without any schema options as its only to raw_zone. I ran into infinite loops of never ending 'next_page_uri''s. Twilio offers pageSize limits but couldn't figure out an end of 'page'(s) for designing an exit condition for loops and conditional statements in my code. Any help regarding Twilio Pagination on Python-AzureDatabricks stack would be of great help.
Following is the sample code & a couple of sample responses.
page_data = page_response(url,date,creds)
data.update(page_data)
while(page_data['next_page_uri']!=None):
page_data = page_response(url,date,creds)
data.update(page_data)
next_page_url=data['next_page_uri']
src_url='https://api.twilio.com'
url=src_url+next_page_url
print(url)```
Sample Responses:
#response_0:
{
"first_page_uri":"",
"end":11111,
"previous_page_uri":"/2010-04-01/..../",
"messages":[{raw...data}]
"next_page_uri":""/2010-04-01/Accounts/ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/Messages.json?start=2020-12-02PageSize=50&Page=1"
"page":0
}
#response_1:
{
"first_page_uri":"",
"end":49,
"previous_page_uri":"",
"messages":[{raw...data}]
"next_page_uri":""/2010-04-01/Accounts/ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/Messages.json?start=2020-12-02PageSize=50&Page=2"
"page":1
}

I did not test this, but you can always do some recursion like the following code:
def get_twillo_data(url, date, creds, data):
_data = page_response(url, date, creds)
data += _data['messages']
next_page_uri = _data.get('next_page_uri')
if next_page_uri:
get_twillo_data(
url = url + next_page_uri,
date = date,
creds = creds,
data = data
)
else:
return data
data = []
messages = get_twillo_data(
url='https://api.twilio.com',
date='ur_date',
creds='ur_creds',
data = data
)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to schedule a Job macie from boto3 with a specific time? - python

Related

Update query string in scheduled query using Python Client for BigQuery Data Transfer Service

How do I get the GCE vm instance id using python?

How to use Azure DataBricks Api to submit job?

Facebook Marketing API - how to handle rate limit for retrieving all ad sets through campaign ids?

What is the best Python Design pattern for consuming Paginated Messages from a TWILIO's HTTP GET REST API response?

Categories

Resources

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to schedule a Job macie from boto3 with a specific time? - python

Related

Update query string in scheduled query using Python Client for BigQuery Data Transfer Service

How do I get the GCE vm instance id using python?

How to use Azure DataBricks Api to submit job?

Facebook Marketing API - how to handle rate limit for retrieving *all* ad sets through campaign ids?

What is the best Python Design pattern for consuming Paginated Messages from a TWILIO's HTTP GET REST API response?

Categories

Resources

Facebook Marketing API - how to handle rate limit for retrieving all ad sets through campaign ids?