How to use the Azure Databricks API to submit a job? - python

I am a beginner with Azure Databricks and I want to use its APIs to create a cluster and submit a job from Python. I am stuck, as I am unable to do so. Also, if I have an existing cluster, what would the code look like? I got a job ID after running the code below, but I am unable to see any output.
import requests

DOMAIN = ''
TOKEN = ''

response = requests.post(
    'https://%s/api/2.0/jobs/create' % (DOMAIN),
    headers={'Authorization': 'Bearer %s' % TOKEN},
    json={
        "name": "SparkPi spark-submit job",
        "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2
        },
        "spark_submit_task": {
            "parameters": [
                "--class",
                "org.apache.spark.examples.SparkPi",
                "dbfs:/FileStore/sparkpi_assembly_0_1.jar",
                "10"
            ]
        }
    }
)

if response.status_code == 200:
    print(response.json())
else:
    print("Error launching cluster: %s: %s" % (response.json()["error_code"], response.json()["message"]))

Jobs in Databricks can be executed in two ways (see the docs):
on a new cluster - that's what you are doing right now
on an existing cluster - remove the new_cluster block and add the existing_cluster_id field with the ID of the existing cluster. If you don't have a cluster yet, you can create it via the Cluster API.
When you create a job, you get back a job ID that can be used to edit or delete the job. You can also launch the job using the Run Now API. But if you just want to execute a job without creating it in the UI, you need to look at the Run Submit API. Either of these APIs returns the ID of the specific job run, and you can then use the Run Get API to get the status of the run, or the Run Get Output API to get the results of the execution.
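For example, here is a minimal sketch of the Run Submit flow, reusing the cluster spec and jar from the question (the run_name and the 30-second polling interval are illustrative assumptions):

import time
import requests

DOMAIN = ''
TOKEN = ''
headers = {'Authorization': 'Bearer %s' % TOKEN}

# Submit a one-time run; nothing is created in the Jobs UI.
response = requests.post(
    'https://%s/api/2.0/jobs/runs/submit' % DOMAIN,
    headers=headers,
    json={
        "run_name": "SparkPi one-time run",
        "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 2
        },
        "spark_submit_task": {
            "parameters": [
                "--class", "org.apache.spark.examples.SparkPi",
                "dbfs:/FileStore/sparkpi_assembly_0_1.jar",
                "10"
            ]
        }
    }
)
run_id = response.json()['run_id']

# Poll the Run Get API until the run reaches a terminal state.
while True:
    run = requests.get(
        'https://%s/api/2.0/jobs/runs/get' % DOMAIN,
        headers=headers,
        params={'run_id': run_id}
    ).json()
    if run['state']['life_cycle_state'] in ('TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'):
        break
    time.sleep(30)

print(run['state'])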

Related

Google Cloud Function - Python script to get data from Webhook

I hope someone can help me out with my problem.
I have a Google Cloud Function which is HTTP-triggered, and a webhook set up in customer.io.
I need to capture the data sent by the customer.io app: it should trigger the Cloud Function and run the Python script set up within it. I am new to writing Python scripts and their libraries. The final goal is to write the webhook data into a BigQuery table.
For now, I can see that the trigger is working, since I can see the data (printed by the function) in the function logs. I can also check the schema of the data from the textPayload in the logs.
This is the sample data from the textPayload that I want to load into a BigQuery table:
{
  "data": {
    "action_id": 42,
    "campaign_id": 23,
    "customer_id": "user-123",
    "delivery_id": "RAECAAFwnUSneIa0ZXkmq8EdkAM==-",
    "identifiers": {
      "id": "user-123"
    },
    "recipient": "test@example.com",
    "subject": "Thanks for signing up"
  },
  "event_id": "01E2EMRMM6TZ12TF9WGZN0WJaa",
  "metric": "sent",
  "object_type": "email",
  "timestamp": 1669337039
}
and this is the sample Python code I have created in the Google Cloud Function:
import os

def webhook(request):
    request_json = request.get_json()
    if request.method == 'POST':
        print(request_json)
        return 'success'
    else:
        return 'failed'
I have only tried printing the data from the webhook. What I am expecting is Python code that writes the textPayload data shown above into a BigQuery table.
So, you have set up a Cloud Function that executes some code whenever the webhook posts data to it.
What this Cloud Function needs now is the BigQuery Python client library. Here's an example of how it's used (source):
from google.cloud import bigquery

client = bigquery.Client()

dataset_id = ...
table_name = ...
data = ...

# Resolve the target table, then stream the rows into it.
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_name)
table = client.get_table(table_ref)
result = client.insert_rows(table, data)
So you could put something like this into your cloud function in order to send your data to a target BigQuery table.
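Putting the two together, here is a minimal sketch of the webhook writing into BigQuery; the dataset/table name my_dataset.webhook_events and the choice of which payload fields to keep are illustrative assumptions, not from the question:

from google.cloud import bigquery

client = bigquery.Client()

def webhook(request):
    request_json = request.get_json()
    if request.method == 'POST' and request_json:
        data = request_json['data']
        # Flatten the nested payload into one row matching the table schema.
        row = {
            'action_id': data['action_id'],
            'campaign_id': data['campaign_id'],
            'customer_id': data['customer_id'],
            'recipient': data['recipient'],
            'event_id': request_json['event_id'],
            'metric': request_json['metric'],
            'object_type': request_json['object_type'],
            'timestamp': request_json['timestamp'],
        }
        table = client.get_table('my_dataset.webhook_events')  # hypothetical table
        errors = client.insert_rows(table, [row])
        return 'success' if not errors else 'insert errors: %s' % errors
    return 'failed'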

How to schedule a Macie job from boto3 with a specific time?

I have successfully created my job, but now I would like it to run at 4 AM. I've searched the documentation and haven't found a way to do it. Is that possible?
I leave my code below:
import boto3

client = boto3.client('macie2')

response = client.create_classification_job(
    customDataIdentifierIds=[CustomDataIdentifierID],
    description='macie',
    initialRun=True,
    jobType='SCHEDULED',
    name='macie_classification_job',
    scheduleFrequency={"dailySchedule": {}}
)

How do I destructure an API with Python, Django and django-rest-framework?

I have successfully compiled and run a Django REST project consuming the cocktaildb API. On the local server, when I run http://127.0.0.1:8000/api/ I get
{
  "ingredients": "http://127.0.0.1:8000/api/ingredients/",
  "drinks": "http://127.0.0.1:8000/api/drinks/",
  "feeling-lucky": "http://127.0.0.1:8000/api/feeling-lucky/"
}
But when I go to one of the links mentioned in the JSON result above, for example:
http://127.0.0.1:8000/api/ingredients/
I get an empty [] with a 200 OK status!
I need an endpoint to GET drinks and ingredients before I can destructure them into specific details using Angular.
I implemented a helper folder in the app with the API function as below:
import json
import aiohttp

class TheCoctailDBAPI:
    THECOCTAILDB_URL = 'https://www.thecocktaildb.com/api/json/v1/1/'

    def __init__(self):
        self.ingredients = {}  # cache: ingredient name -> details

    async def __load_coctails_for_drink(self, drink, session):
        # The API exposes up to 15 ingredient slots per drink.
        for i in range(1, 16):
            ingredientKey = 'strIngredient' + str(i)
            ingredientName = drink[ingredientKey]
            if not ingredientName:
                break
            if ingredientName not in self.ingredients:
                async with session.get(f'{TheCoctailDBAPI.THECOCTAILDB_URL}search.php?i={ingredientName}') \
                        as response:
                    result = json.loads(await response.text())
                    self.ingredients[ingredientName] = result['ingredients'][0]
What was your expected response?
Add the function that is called by this API, as well as the DB settings, to the question so that we can properly help you.
Are you sure that you are connecting to and pulling data from the remote location? It looks to me like your local DB is empty, so the API has no data to return. A quick check is sketched below.
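For instance, a quick way to test the empty-DB theory from the Django shell (python manage.py shell); the app and model names here are hypothetical stand-ins for whatever your project defines:

# Hypothetical app/model names - replace with your own.
from cocktails.models import Ingredient

# If this prints 0, the /api/ingredients/ endpoint is working correctly:
# it simply has no rows to serialize, which yields an empty [].
print(Ingredient.objects.count())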

How do I get the GCE VM instance ID using Python?

I am using Python. I have the correct project and VM instance names, so I can query Google Cloud metrics just fine. But now I need to query some agent metrics, which need the instance ID of my VM instead of its name. What is the simplest way to get the instance ID of my VM with a query?
Sorry, I should be clearer. Here is my sample code:
results = client.list_time_series(
    request={
        "name": project_name,
        "filter": filter,
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
I want to make a query similar to this. Is there any simple filter I can use, or something else, that will get me the instance_id of a particular instance name?
If you are inside the GCE VM, you can use the Metadata server:
import requests

# The metadata server is only reachable from inside the VM.
metadata_server = "http://metadata/computeMetadata/v1/instance/"
metadata_flavor = {'Metadata-Flavor': 'Google'}

gce_id = requests.get(metadata_server + 'id', headers=metadata_flavor).text
gce_name = requests.get(metadata_server + 'hostname', headers=metadata_flavor).text
gce_machine_type = requests.get(metadata_server + 'machine-type', headers=metadata_flavor).text
If you are looking to list GCE VMs, check the example in the GCP documentation for using the client libraries: Listing Instances
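If you are running outside the VM, one option (not from the original answer) is to look the ID up by name with the Compute Engine client library. A minimal sketch, where the project, zone and instance names are placeholders:

from google.cloud import compute_v1

def instance_id_from_name(project, zone, name):
    # Fetch the VM by name and return its numeric instance ID, which is
    # what Monitoring filters expect in resource.labels.instance_id.
    client = compute_v1.InstancesClient()
    instance = client.get(project=project, zone=zone, instance=name)
    return str(instance.id)

# Placeholder values for illustration:
instance_id = instance_id_from_name('my-project', 'us-central1-a', 'my-vm')
filter = 'resource.labels.instance_id = "%s"' % instance_id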

Need advice on automating REST service tests

I am kind of a newbie to REST and the testing department. I need to write automation scripts to test our REST services, and we are planning to run these scripts from a Jenkins CI job regularly. I prefer writing them in Python, as we already have UI functionality testing scripts in Python generated by Selenium IDE, but I am open to any good solution. I have checked httplib, simplejson and xUnit, but I am looking for better solutions out there.
Also, I would prefer to write a template and generate the actual script for each REST API by reading the API info from XML or something. Thanks in advance for all advice.
I usually use Cucumber to test my RESTful APIs. The following example is in Ruby, but it could easily be translated to Python using either the rubypy gem or lettuce.
Start with a set of RESTful base steps:
When /^I send a GET request for "([^\"]*)"$/ do |path|
  get path
end

When /^I send a POST request to "([^\"]*)" with the following:$/ do |path, body|
  post path, body
end

When /^I send a PUT request to "([^\"]*)" with the following:$/ do |path, body|
  put path, body
end

When /^I send a DELETE request to "([^\"]*)"$/ do |path|
  delete path
end

Then /^the response should be "([^\"]*)"$/ do |status|
  last_response.status.should == status.to_i
end

Then /^the response JSON should be:$/ do |body|
  JSON.parse(last_response.body).should == JSON.parse(body)
end
And now we can write features that test the API by actually issuing the requests.
Feature: The users endpoints

  Scenario: Creating a user
    When I send a POST request to "/users" with the following:
      """
      { "name": "Swift", "status": "awesome" }
      """
    Then the response should be "200"

  Scenario: Listing users
    Given I send a POST request to "/users" with the following:
      """
      { "name": "Swift", "status": "awesome" }
      """
    When I send a GET request for "/users"
    Then the response should be "200"
    And the response JSON should be:
      """
      [{ "name": "Swift", "status": "awesome" }]
      """

... etc ...
These are easy to run on a CI system of your choice. See these links for references:
http://www.anthonyeden.com/2010/11/testing-rest-apis-with-cucumber-and-rack-test/
http://jeffkreeftmeijer.com/2011/the-pain-of-json-api-testing/
http://www.cheezyworld.com/2011/08/09/running-your-cukes-in-jenkins/
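If you would rather stay in Python, here is a minimal sketch of the same two scenarios using pytest and requests instead of Cucumber (the base URL and endpoints are illustrative):

import requests

BASE_URL = 'http://localhost:8000'  # illustrative; point this at your service

def test_create_user():
    # POST a new user and check the response status.
    response = requests.post(BASE_URL + '/users',
                             json={'name': 'Swift', 'status': 'awesome'})
    assert response.status_code == 200

def test_list_users():
    # Create a user, then verify it shows up in the listing.
    requests.post(BASE_URL + '/users',
                  json={'name': 'Swift', 'status': 'awesome'})
    response = requests.get(BASE_URL + '/users')
    assert response.status_code == 200
    assert {'name': 'Swift', 'status': 'awesome'} in response.json()

Run the file with pytest from the Jenkins job; the non-zero exit code on failure integrates cleanly with CI.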
Another option is to drive the requests from an Excel workbook that lists the APIs, writing each response body and status code back into the sheet:

import openpyxl
import requests
import json
from requests.auth import HTTPBasicAuth

urlHead = 'https://IP_ADDRESS_HOST:PORT_NUMBER/'
user = 'USERNAME'  # basic-auth credentials: placeholders, fill in your own
pwd = 'PASSWORD'

rowStartAt = 2
apiColumn = 2
#payloadColumn = 3
responseBodyColumn = 12
statusCodeColumn = 13

headerTypes = {'Content-Type': 'application/json',
               'Accept': 'application/json',
               'Authorization': '23324'}

wb = openpyxl.load_workbook('Excel_WORKBOOK.xlsx')

# process each sheet in the workbook
for sheetName in wb.sheetnames:
    print('Sheet Name = ' + sheetName)

    flagVar = input('Enter N to skip this sheet of APIs: ')
    if flagVar == 'N':
        print('Sheet got skipped')
        continue

    # get a sheet and iterate over its APIs
    sheetObj = wb[sheetName]
    for i in range(2, sheetObj.max_row + 1):
        # the cell holds the API with its method type, e.g. "GET /path"
        apiFromSheet = sheetObj.cell(row=i, column=apiColumn).value
        if apiFromSheet is None:
            continue

        # split the entry into method type and path
        apiType = apiFromSheet.split()[0]
        method = apiFromSheet.split()[1]
        if apiType != 'GET':
            continue

        # process GET APIs only
        absPath = urlHead + method
        print("REQUESTED TYPE AND PATH = ", apiType, absPath)
        print('\n')
        res = requests.get(absPath, auth=HTTPBasicAuth(user, pwd),
                           verify=False, headers=headerTypes)

        # write the response body and status code back into the sheet
        sheetObj.cell(row=i, column=responseBodyColumn).value = res.text
        sheetObj.cell(row=i, column=statusCodeColumn).value = res.status_code
        wb.save('Excel_WORKBOOK.xlsx')
