Using Azure Durable Functions with a CosmosDBTrigger - python

I have a data pipeline that is reading the change feed from CosmosDB and loading the data into an external resource via Durable Functions. My start function (in Python) looks something like this:
import logging

import azure.durable_functions as df
import azure.functions as func

async def main(documents: func.DocumentList, starter: str):
    client = df.DurableOrchestrationClient(starter)
    instance_id = await client.start_new('MyDFOrchestrator', client_input=documents)
    logging.info(f"Started orchestration ID {instance_id}")
However, this runs into an error because the cosmosDBTrigger passes in a list of Cosmos Documents but client.start_new() needs a JSON-serializable input value. I can get around this by shoving the list into a simple JSON object (like {"doc_list": [{doc1}, {doc2}, {doc3}]}) but I want to make sure I'm not missing something in the API to handle this pattern.

It should be fine to pass JSON as the input value to the orchestrator. There is an example here that does something similar. Although the example uses an HTTP trigger, the relevant part has nothing to do with which trigger you use in the starter/triggering function.
Alternatively, you can create a concrete serializable class holding the model/entity structure (much cleaner than raw JSON). To make a class serializable, all that is required is for the class to export two static methods: to_json() and from_json(). The Durable Functions framework will internally call these methods to serialize and deserialize your custom class. Therefore, you should design them such that calling from_json() on the result of to_json() reconstructs your class.
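For example, a minimal sketch of such a class (the DocumentBatch name and the doc_list field are illustrative, not part of the API):
import json

class DocumentBatch:
    """Hypothetical wrapper around the change-feed documents."""
    def __init__(self, docs: list):
        self.docs = docs

    @staticmethod
    def to_json(obj) -> str:
        # Called by the framework when the input is serialized
        return json.dumps({"doc_list": obj.docs})

    @staticmethod
    def from_json(data: str) -> "DocumentBatch":
        # Must invert to_json so the orchestrator receives an equivalent object
        return DocumentBatch(json.loads(data)["doc_list"])
You would then pass an instance of it as client_input (after converting each Document to a plain dict), instead of the raw DocumentList.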


Write alert policies list in Cloud Storage with cloud functions

I want to write a list of alert policies in a json file in cloud storage. I have the script below:
import json

from google.cloud import monitoring_v3, storage

def my_function(request):
    alert_client = monitoring_v3.AlertPolicyServiceClient()
    storage_client = storage.Client()
    project_name = 'projects/my_project'
    bucket_name = 'test'
    policies = alert_client.list_alert_policies(name=project_name)
    for policy in policies:
        print(policy)
    destination_blob_name = 'test'
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string("{}".format(json.dumps(policy)))
This script is not working and returns the following error: TypeError: Object of type AlertPolicy is not JSON serializable
Now a couple of things:
Using the API Explorer or looking at this documentation, the response from the list method should be easy to handle. However, I'm writing my Cloud Functions in Python and it seems that the response is different.
I understand that there is something around the way pagination is handled, but I don't understand how to deal with it.
I can print(policies), but the log output is kind of weird, with a line for each element of the JSON object. Why is that? What does it mean?
How can I handle this response? Is there a generic approach here, or is this specific to the API?
Still, I'm able to access each variable independently (policy.name, policy.conditions, etc.); does that mean I have to rebuild the JSON object I want manually?
According to the googleapis documentation of the alert policy service, iterating over the result of list_alert_policies() automatically resolves subsequent pages of the response, so you should not need to implement any pagination logic yourself. Per the documentation:
Returns: The protocol for the ListAlertPolicies response. Iterating over this object will yield results and resolve additional pages automatically. source
As for the AlertPolicy type, it does not appear to be natively convertible to JSON. You may have to build the JSON objects yourself from the respective properties of the AlertPolicy objects that are returned, or you can implement something similar to this ProtoEncoder class, which produces JSON from AlertPolicy types. As for the available properties on the AlertPolicy objects, here is the source.
import json

import proto
from google.cloud import monitoring_v3

class ProtoEncoder(json.JSONEncoder):
    """Encode protobufs as json."""
    def default(self, obj):
        if type(obj) in (monitoring_v3.AlertPolicy, monitoring_v3.NotificationChannel):
            text = proto.Message.to_json(obj)
            return json.loads(text)
        return super(ProtoEncoder, self).default(obj)
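With the encoder in place, a sketch of how the original upload could use it (variable names taken from the question):
# Serialize the full, auto-paginated list of policies with the custom encoder
policies = alert_client.list_alert_policies(name=project_name)
blob.upload_from_string(json.dumps(list(policies), cls=ProtoEncoder))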

How to write helper / decorator function for cdk function

I have below function which will create S3 bucket with cdk code:
def __create_s3_components_bucket(self, this_dir: str, props):
    """Create S3 components bucket"""
    s3_bucket = s3.Bucket(
        self,
        "BucketForImageBuilder",
        bucket_name="some_bucket_name_1234",
        block_public_access=s3.BlockPublicAccess(
            block_public_acls=True,
            block_public_policy=True,
            ignore_public_acls=True,
            restrict_public_buckets=True,
        ),
        public_read_access=False,
        encryption=s3.BucketEncryption.S3_MANAGED,
        removal_policy=cdk.RemovalPolicy.DESTROY,
        auto_delete_objects=True,
        lifecycle_rules=[
            s3.LifecycleRule(
                abort_incomplete_multipart_upload_after=cdk.Duration.days(amount=2),
                enabled=True,
                expiration=cdk.Duration.days(amount=180),
                transitions=[
                    s3.Transition(
                        transition_after=cdk.Duration.days(amount=30),
                        storage_class=s3.StorageClass.ONE_ZONE_INFREQUENT_ACCESS,
                    )
                ],
            )
        ],
    )
I would like to create helper / decorator function to implement this part of above code for all buckets defined in different stacks:
block_public_access=s3.BlockPublicAccess(
    block_public_acls=True,
    block_public_policy=True,
    ignore_public_acls=True,
    restrict_public_buckets=True,
),
I understand the theory (I've watched a few YouTube videos and read some examples) that decorators add extra functionality, etc., but I can't get my head around doing that in this example. Is it doable? If yes, how? Can anyone share a code example, please?
It sounds like you want a helper function, so you can do something like:
my_bucket = create_s3_bucket(self, "MyBucketName")
If so, and since you stated you want this helper function to be used across different stacks, you will want to do something like the following in your utility/helper file (using Python):
from aws_cdk import aws_s3

def create_s3_bucket(scope, bucket_name: str) -> aws_s3.Bucket:
    return aws_s3.Bucket(
        scope, bucket_name,
        bucket_name=bucket_name,
        block_public_access=aws_s3.BlockPublicAccess(
            block_public_acls=True,
            block_public_policy=True,
            ignore_public_acls=True,
            restrict_public_buckets=True,
        ),
        # ... any other defaults you want every bucket to share
    )
If you want to do more things with that bucket, you can do so; just do it before the return, or create the bucket as its own object and then manipulate it as needed before returning it.
Two things to note. First, if the function providing this common configuration is NOT in the same stack class, then you MUST pass it the self parameter as scope; the scope of a construct is how CDK knows which stack a construct's instructions belong to. As you may have noticed, every construct in the init of a Stack starts with self, "LogicalID": these are the two pieces CloudFormation needs, the first defining the stack this instruction goes into, and the second the logical ID of the resource (how CloudFormation identifies a resource that the stack controls, and decides when to update it or create a new one).
So, since you stated you intend to use this across multiple stacks, put it into its own helper/utility file and import it, then pass in self when calling it so CDK knows which stack it's part of.
Second, you should return the object. Even if you don't plan on using it now, it's better to return it so you can make use of it later, for example to reference the bucket's name or to grant access that's unique to that stack, or whatever.
You can also add additional parameters to the function for anything you want to specify, and set a default. For instance:
import aws_cdk as cdk
from aws_cdk import aws_s3

def create_s3_bucket(scope, bucket_name: str,
                     removal: cdk.RemovalPolicy = cdk.RemovalPolicy.RETAIN) -> aws_s3.Bucket:
    return aws_s3.Bucket(
        scope, bucket_name,
        # ... shared defaults as above ...
        removal_policy=removal,
    )
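Calling it from inside a stack would then look something like this (a sketch; the bucket name is a placeholder):
# Inside a Stack's __init__, so self provides the scope
my_bucket = create_s3_bucket(self, "MyBucketName",
                             removal=cdk.RemovalPolicy.DESTROY)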
I'm not 100% confident in decorator functions, but I'm pretty sure that, given how CDK construct initializers work, a decorator would not do what you are trying to do here, whereas this helper approach does. This is mostly because most properties cannot be set after the bucket has been instantiated in your code. The few that can could be put in a decorator, I suppose, something like:
pseudo code:
def custom_s3_decorator():
    yield
    bucket.add_removal_policy(blah)
    bucket.add_to_principal_policy(blah)
etc., with the correct code needed to make a decorator work; but again, I'm pretty sure most of what you want isn't possible to do after instantiation, so you'd want to use a helper function.

Why don't web2py json services treat lists properly?

The following works for json whose outermost container is an object like { ... }
@service.json
def index():
    data = request.vars
    # fields are now accessible via data["fieldname"] or data.fieldname
    # processing must be done to turn the Storage object into a dict()
    return data_as_dict
If you post a list however, it does not work
POST:
[
{"test": 1}
]
data will be an empty Storage object and data[0] will be None
The workaround is simple:
import json

@service.json  # so output is still returned as json
def index():
    data = json.loads(request.body.read())
    return data
data is now a dict in cases of object style JSON (easier to work with than a Storage object imo) and a native list when the JSON is a list.
My question is why is this not the default behaviour? Why should a JSON service not accept valid JSON?
The @service.json decorator simply registers a function so it can be accessed via a controller that returns a Service object. The decorator ensures that the service controller returns a JSON response when the decorated function is called, but it does nothing regarding the processing of JSON input.
In any case, your problem is not with the #service.json decorator but with a misunderstanding regarding request.vars. request.vars is a dictionary-like object that is populated with keys and values from the query string and/or the request body (if the request body includes form variables or a JSON object of keys and values). It is not intended to simply be a copy of any arbitrary data structure that is posted in the request body. So, if you post a JSON array in the request body, it would not make sense to copy that array to request.vars, as it is not the appropriate type of data structure. If you want to post a JSON array, the correct way to process it is to read the request body, as you have done.
Also, note that because your index function does not take any arguments and therefore does not take advantage of the @service decorator's ability to map parameters from the HTTP request into function arguments, you could simplify your code by foregoing the @service decorator and accessing the index function more directly:
def index():
    data = json.loads(request.body.read())
    return data
Assuming index is in the default.py controller, you could post JSON to /yourapp/default/index.json (note the .json extension), and you will get back a JSON response.
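For example, a quick way to exercise that endpoint from Python (the host and app names are placeholders):
import requests

resp = requests.post(
    "http://127.0.0.1:8000/yourapp/default/index.json",
    json=[{"test": 1}],  # posts a JSON array in the request body
)
print(resp.json())  # echoes the list back, e.g. [{'test': 1}]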

How do I programmatically add new functions to current scope in Python?

In Python it is easy to create new functions programmatically. How would I assign this to programmatically determined names in the current scope?
This is what I'd like to do (in non-working code):
obj_types = ('cat', 'dog', 'donkey', 'camel')
for obj_type in obj_types:
    'create_' + obj_type = lambda id: id
In the above example, the assignment of lambda into a to-be-determined function name obviously does not work. In the real code, the function itself would be created by a function factory.
The background is laziness and do-not-repeat-yourself: I've got a dozen or more object types for which I'd assign a generated function. So the code currently looks like:
create_cat = make_creator('cat')
# ...
create_camel = make_creator('camel')
The functions create_cat etc are used hardcoded in a parser.
If I would create classes as a new type programmatically, types.new_class() as seen in the docs seems to be the solution.
Is it my best bet to (mis)use this approach?
One way to accomplish what you are trying to do (without creating functions with dynamic names) is to store the lambdas in a dict, using the name as the key. Instead of calling create_cat() you would call create['cat'](). That would also dovetail nicely with not hardcoding names in the parser logic.
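A minimal sketch of that approach, reusing the make_creator factory from the question (the other names are illustrative):
obj_types = ('cat', 'dog', 'donkey', 'camel')
# Build a dispatch dict instead of dynamically named functions
create = {obj_type: make_creator(obj_type) for obj_type in obj_types}
# In the parser, instead of create_cat(some_id):
new_cat = create['cat'](some_id)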
Vaughn Cato points out that one could simply assign locals()[object_type] = factory(object_type). However, the Python docs prohibit this: "Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter"
D. Shawley points out that it would be wiser to use a dict whose entries hold the functions. Access would be simple, using create['cat']() in the parser. While this is compelling, I do not like the syntax overhead of the brackets and ticks required.
J.F. Sebastian points to classes. And this is what I ended up with:
from pyparsing import Keyword

# Omitting code of these classes for clarity
class Entity:
    def __init__(self, file_name, line_number):
        # Store location, good for debug, messages, and general indexing
        ...

# The following classes are the real objects to be generated by a parser.
# Their constructors must consume whatever data is provided by the tokens
# as well as calling super() to forward the file_name/line_number info.
class Cat(Entity): pass
class Camel(Entity): pass

class Parser:
    def parse_file(self, fn):
        # ...

        # Function factory to wrap object constructor calls
        def create_factory(obj_type):
            def creator(text, line_number, token):
                try:
                    return obj_type(*token,
                                    file_name=fn, line_number=line_number)
                except Exception as e:
                    # For debug of constructor during development
                    print(e)
            return creator

        # Helper class, serving as a 'dictionary' of obj construction functions
        class create: pass

        for obj_type in (Cat, Camel):
            setattr(create,
                    obj_type.__name__.lower(),
                    create_factory(obj_type))

        # Parsing code now can use (again simplified for clarity):
        expression = Keyword('cat').setParseAction(create.cat)
This is helper code for deploying a pyparsing parser. D. Shawley is correct in that the dict would actually make it easier to generate the parser grammar dynamically.

How do I test an API Client with Python?

I'm working on a client library for a popular API. Currently, all of my unit tests of said client are making actual API calls against a test account.
Here's an example:
def test_get_foo_settings(self):
    client = MyCustomClient(token, account)
    results = client.get_foo_settings()
    assert_is(type(results), list)
I'd like to stop making actual API calls against my test account.
How should I tackle this? Should I be using Mock to mock the calls to the client and response?
Also, I'm confused about the philosophy of what to test with this client library. I'm not interested in testing the actual API, but when there are different factors involved, like the method being invoked, the permutations of possible return results, etc., I'm not sure what I should test and/or when it is safe to make assumptions (such as a mocked response).
Any direction and/or samples of how to use Mock in my type of scenario would be appreciated.
I would personally do it by first creating a single interface or function call which your library uses to actually contact the service, then write a custom mock for that during tests.
For example, if the service uses HTTP and you're using Requests to contact the service:
class MyClient(…):
    def do_stuff(self):
        result = requests.get(self.service_url + "/stuff")
        return result.json()
I would first write a small wrapper around requests:
class MyClient(…):
    def _do_get(self, suffix):
        return requests.get(self.service_url + "/" + suffix).json()

    def do_stuff(self):
        return self._do_get("stuff")
Then, for tests, I would mock out the relevant functions:
class MyClientWithMocks(MyClient):
    def __init__(self, send_result=None, **kwargs):
        super().__init__(**kwargs)
        self.send_result, self.request_log = send_result, []

    def _do_get(self, suffix):
        self.request_log.append(suffix)
        return self.send_result
And use it in tests like this:
def test_stuff(self):
    client = MyClientWithMocks(send_result="bar")
    assert_equal(client.do_stuff(), "bar")
    assert_contains(client.request_log, "stuff")
Additionally, it would likely be advantageous to write your tests so that you can run them both against your mock and against the real service, so that if things start failing you can quickly figure out whose fault it is.
I'm using HTTmock and I'm pretty happy with it : https://github.com/patrys/httmock
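For reference, a minimal sketch of how HTTmock is typically used (the hostname and payload here are made up):
import requests
from httmock import urlmatch, HTTMock

@urlmatch(netloc=r'api\.example\.com$')
def fake_service(url, request):
    # Return a canned JSON body for any request to this host
    return '{"stuff": "bar"}'

with HTTMock(fake_service):
    r = requests.get('https://api.example.com/stuff')
    assert r.json() == {"stuff": "bar"}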
