deserialize RecognizedForm object from json (Azure Form Recognizer Python SDK)

I need to deserialize JSON-serialized Azure Form Recognizer results into Python Form Recognizer objects (from the azure-ai-formrecognizer==3.1.0b1 package), and I do not see any API to perform this deserialization. Any help would be appreciated.

Depending on how you plan to consume the result: if you just need to access RecognizedForm attributes and don't need a true RecognizedForm object (in other words, you only need the shape of one), this might work for you:
import json
from types import SimpleNamespace

# every JSON object becomes a SimpleNamespace, so nested values are reachable via attribute access
recognized_form = json.loads(recognized_form_json, object_hook=lambda fields: SimpleNamespace(**fields))
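For example, assuming the JSON was produced by serializing the usual RecognizedForm attributes (form_type, fields, and so on), the result can then be read with the same attribute names:
# hypothetical usage; the attribute names depend on how the form was serialized
print(recognized_form.form_type)
for name, field in vars(recognized_form.fields).items():  # the fields dict is now a SimpleNamespace too
    print(name, field.value, field.confidence)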
Otherwise, we can manually deserialize it back to a RecognizedForm. See this gist for an example (not fully tested).


AttributeError: Can't pickle local object 'SharedMemoryDisplay.__init__.<locals>.<lambda>'

I am working with the SharedMemoryDisplay object in the script above, and I want to return/retrieve self.camera_container, where self.camera_container = {camera_id: (camera_id, frame, frame_properties)}.
I tried to create a method to return this, but it gives two errors:
prop, camera_container = monitor_memory.get_frame()
TypeError: 'NoneType' object is not iterable
AttributeError: Can't pickle local object 'SharedMemoryDisplay.__init__.<locals>.<lambda>'
I am only able to get self.camera_container[key] if I just do the following, which is fine, but I want to get self.camera_container as well.
return self.camera_container[key]
In the script below I am using this object to display frames in a cv2 named window. My ultimate goal is to retrieve the frames of all the cameras separately. What it currently does is join all the camera frames and return them via self.display_frame, which is added to webdisplay_memory in the script below (for displaying in the HTML); that's why I created a method to retrieve the camera_container dictionary.
webdisplay_memory.add_frame(0, self.display_frame, None)
Rather than messing with this variable, I was thinking of creating a method that returns self.camera_container and then using that to get the frames of each camera separately.
How can I overcome this? Kindly help if you have a better and more efficient solution!
multiprocessing uses pickle under the hood, and pickle can only serialize a certain set of objects. In particular, it cannot pickle a lambda defined locally inside __init__, which is what your camera_container's defaultdict uses as its default_factory (hence the error message). So either use a normal dict and replace the lookups with self.camera_container.get(key, None), or look into this question and try pathos.multiprocessing with dill. I have not tested the latter approach myself, though.
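A minimal sketch of both options, assuming the original __init__ created the container with a lambda default_factory (the exact default value below is hypothetical):
from collections import defaultdict

# presumably the original code did something like this, which cannot be pickled:
#   self.camera_container = defaultdict(lambda: None)

def _empty_entry():
    # module-level factory functions are picklable, unlike local lambdas
    return None

class SharedMemoryDisplay:
    def __init__(self):
        # option 1: keep defaultdict but use a picklable factory
        self.camera_container = defaultdict(_empty_entry)
        # option 2: drop defaultdict entirely and use a plain dict,
        # reading entries with self.camera_container.get(camera_id, None)

    def get_camera_container(self):
        # a plain-dict copy is picklable and can be returned across the process boundary
        return dict(self.camera_container)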

Write alert policies list in Cloud Storage with cloud functions

I want to write a list of alert policies to a JSON file in Cloud Storage. I have the script below:
import json

from google.cloud import monitoring_v3, storage

def my_function(request):
    alert_client = monitoring_v3.AlertPolicyServiceClient()
    storage_client = storage.Client()
    project_name = 'projects/my_project'
    bucket_name = 'test'
    policies = alert_client.list_alert_policies(name=project_name)
    for policy in policies:
        print(policy)
    destination_blob_name = 'test'
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_string("{}".format(json.dumps(policy)))
This script is not working and returns the following error: TypeError: Object of type AlertPolicy is not JSON serializable
Now a couple of things:
Using the API Explorer or looking at this documentation, the response from the list method should be easy to handle. However, I'm writing my Cloud Functions in Python, and it seems that the response is different.
I understand that there is something about the way pagination is handled, but I don't understand how to deal with it.
I can print(policies), but the log output is kind of weird, with a line for each element of the JSON object. Why is that? What does it mean?
How can I handle this response? Is there a generic approach here, or is this specific to the API?
Still, I'm able to access each field independently (policy.name, policy.conditions, etc.). Does that mean I have to rebuild the JSON object I want manually?
According to the googleapis documentation for the alert policy service, iterating over the result of list_alert_policies() automatically resolves subsequent pages of the response, so you do not need to implement any pagination logic yourself:
Returns: The protocol for the ListAlertPolicies response. Iterating over this object will yield results and resolve additional pages automatically. source
As for the AlertPolicy type, it does not appear to be natively convertible to JSON. You might have to build the JSON objects by accessing the respective properties of the AlertPolicy objects that are returned, or you can implement something similar to this ProtoEncoder class, which appears to return JSON from AlertPolicy types. As for the available properties on the AlertPolicy objects, here is the source.
import json

import proto
from google.cloud import monitoring_v3

class ProtoEncoder(json.JSONEncoder):
    """Encode protobufs as json."""
    def default(self, obj):
        if type(obj) in (monitoring_v3.AlertPolicy, monitoring_v3.NotificationChannel):
            text = proto.Message.to_json(obj)
            return json.loads(text)
        return super(ProtoEncoder, self).default(obj)
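For example, a sketch of wiring the encoder into the original upload (names reused from the question; whether you want one combined file or one file per policy is up to you):
policies = alert_client.list_alert_policies(name=project_name)

# serialize the whole result as one JSON array; ProtoEncoder handles each AlertPolicy item
payload = json.dumps(list(policies), cls=ProtoEncoder)

bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(destination_blob_name)
blob.upload_from_string(payload, content_type="application/json")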

pydantic convert to jsonable dict (not full json string)

I'd like to use pydantic for handling data (bidirectionally) between an API and a datastore due to its nice support for several types I care about that are not natively JSON-serializable. It has better read/validation support than my current approach, but I also need to create JSON-serializable dict objects to write out.
from uuid import UUID, uuid4
from pydantic import BaseModel

class Model(BaseModel):
    the_id: UUID

instance = Model(the_id=uuid4())
print("1: %s" % instance.dict())
print("2: %s" % instance.json())
prints:
1: {'the_id': UUID('4108356a-556e-484b-9447-07b56a664763')}
2: {"the_id": "4108356a-556e-484b-9447-07b56a664763"}
I'd like the following:
{"the_id": "4108356a-556e-484b-9447-07b56a664763"}  # e.g. a "json-compatible" dict
It appears that pydantic has all the mappings, but I can't find any usage of that serialization outside the standard json recursive encoder (json.dumps(... default=pydantic_encoder)) in pydantic/main.py. I'd prefer to stick to one library for both validating raw -> obj (pydantic is great at this) and converting obj -> raw (dict), so that I don't have to manage multiple serialization mappings. I suppose I could implement something similar to the json usage of the encoder, but this seems like it should be a common use case?
Other approaches, such as the built-in dataclasses plus libraries like dataclasses_jsonschema, provide this serialization to a json-ready dict, but again, I'm hoping to use pydantic for its more robust input validation while keeping things symmetrical.
The current version of pydantic does not support creating a jsonable dict straightforwardly, but you can use the following trick:
import json
from pydantic import Field

class Model(BaseModel):
    the_id: UUID = Field(default_factory=uuid4)

print(json.loads(Model().json()))
# {'the_id': '4c94e7bc-78fe-48ea-8c3b-83c180437774'}
Or, more efficiently, by means of orjson:
orjson.loads(Model().json())
It appears this functionality has been proposed and (maybe) favored by pydantic's author Samuel Colvin; see https://github.com/samuelcolvin/pydantic/issues/951#issuecomment-552463606, which proposes adding a simplify parameter to Model.dict() to output jsonable data.
This code runs in a production API layer and is exercised in such a way that we can't use the one-line workaround suggested above (a full serialize via .json() followed by a full deserialize). We implemented a custom function that descends the result of .dict() and converts the values to jsonable types; hopefully the proposed functionality is added to pydantic in the future.
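A minimal sketch of what such a converter could look like (the helper name and the set of handled types are assumptions; extend it for whatever non-JSON types your models use):
from datetime import date, datetime
from enum import Enum
from uuid import UUID

def to_jsonable(value):
    # hypothetical helper: recursively convert the output of Model.dict() into
    # plain JSON-compatible types without a full json.dumps/json.loads round trip
    if isinstance(value, dict):
        return {key: to_jsonable(val) for key, val in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_jsonable(item) for item in value]
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, Enum):
        return value.value
    return value

# jsonable = to_jsonable(instance.dict())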
Another alternative is to use the jsonable_encoder function from FastAPI, if you're using that already: https://fastapi.tiangolo.com/tutorial/encoder/
The code seems pretty self-contained, so you could copy-paste it if the license allows.
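A quick sketch of that route, reusing the Model and instance from the question:
from fastapi.encoders import jsonable_encoder

jsonable = jsonable_encoder(instance)
# {'the_id': '4108356a-556e-484b-9447-07b56a664763'}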

Using Azure Durable Functions with a CosmosDBTrigger

I have a data pipeline that is reading the change feed from CosmosDB and loading the data into an external resource via Durable Functions. My start function (in Python) looks something like this:
import logging

import azure.durable_functions as df
import azure.functions as func

async def main(documents: func.DocumentList, starter: str):
    client = df.DurableOrchestrationClient(starter)
    instance_id = await client.start_new('MyDFOrchestrator', client_input=documents)
    logging.info(f"Started orchestration ID {instance_id}")
However, this runs into an error because the cosmosDBTrigger passes in a list of Cosmos Documents but client.start_new() needs a JSON-serializable input value. I can get around this by shoving the list into a simple JSON object (like {"doc_list": [{doc1}, {doc2}, {doc3}]}) but I want to make sure I'm not missing something in the API to handle this pattern.
It should be fine to pass JSON as the input value to the orchestrator. There is an example here that does something similar. Although that example uses an HTTP trigger, the relevant part has nothing to do with which trigger is used in the starter/triggering function.
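For example, a sketch of the workaround you describe, assuming each item in the func.DocumentList exposes a to_json() method (check the azure.functions.Document API for your SDK version):
import json

# convert the trigger's documents into plain, JSON-serializable dicts before starting the orchestration
payload = {"doc_list": [json.loads(doc.to_json()) for doc in documents]}
instance_id = await client.start_new('MyDFOrchestrator', client_input=payload)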
Alternatively, you can create a concrete serializable class holding the model/entity structure (much cleaner than raw JSON). To create a serializable class, all that is required is for your class to export two static methods: to_json() and from_json(). The Durable Functions framework will internally call these methods to serialize and deserialize your custom class. Therefore, you should design them so that calling from_json on the result of to_json reconstructs your class.
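A minimal sketch of such a class (the class name and field are hypothetical; only the to_json/from_json contract matters):
import json

class DocumentBatch:
    """Hypothetical input payload for the orchestration."""

    def __init__(self, docs):
        self.docs = docs  # a list of plain, JSON-compatible dicts

    @staticmethod
    def to_json(obj) -> str:
        # called by the Durable Functions framework when serializing the input
        return json.dumps({"docs": obj.docs})

    @staticmethod
    def from_json(json_str: str):
        # must reconstruct an equivalent object from the output of to_json
        data = json.loads(json_str)
        return DocumentBatch(data["docs"])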

Pickling an object on appengine

I have an object with an __init__ method that requires at least one parameter, and I want to store it in the cache.
When trying to get the object from the cache, I get an error saying that I didn't pass enough parameters to the __init__ method.
Someone told me I need to pickle the object before sending it to the cache, but all the examples I saw were using .dat files, and on App Engine you cannot use the file system.
You can use pickle without any filesystem, using pickle.loads / pickle.dumps. For example:
import pickle
obj = YourClass(yourparam=...)
data = pickle.dumps(obj)
# and now, store "data" into the cache
# later, get "data" from the cache
obj = pickle.loads(data)
# and tada, obj is the same as before :)
I think you are trying to use memcache on App Engine. This blog post will help you a lot:
http://blog.notdot.net/2009/9/Efficient-model-memcaching
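For example, on App Engine the cache in question is typically memcache; a minimal sketch of combining it with pickle (the key name is arbitrary):
import pickle
from google.appengine.api import memcache

obj = YourClass(yourparam=...)

# store the pickled bytes under a key
memcache.set('your-class-instance', pickle.dumps(obj))

# later, possibly in another request
data = memcache.get('your-class-instance')
if data is not None:
    obj = pickle.loads(data)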
