Pythonic Way to Handle Non-Existent Python Dict Keys - python

I'm using an API that returns a JSON object that I easily convert to a dict. We'll call the converted JSON-to-dict object JSON_API_ITEM.
However, sometimes the JSON_API_ITEM doesn't have certain fields (like url for example). Instead of setting that field to None, the field simply isn't present so when I make an assignment:
url = JSON_API_ITEM['url']
My program throws an exception. I thought of using a for loop across dict.keys() or putting everything in a try or if block, but that will make my code extremely ugly.
What is the pythonic way to handle something like this?

You can use the get() method.
my_item.get('url', None)  # None is the default in case the 'url' key does not exist.
EDIT:
I undeleted my answer because, after the OP's edit, it's clear that the issue is about getting a value from a dict, not about assignment.

Although .get() works and is the shortest (I think) way to do this, a more Pythonic way would be EAFP or "Easier to Ask Forgiveness than Permission". That means that a try/except block is how you're supposed to do it:
try:
    url = my_item['url']
except KeyError:
    url = None
I'm assuming here that you meant to write url = my_item['url'], not my_item['url'] = url
I see the answer about .get() was removed, so I'll explain it as well.
You can use url = my_item.get('url', None); the second argument to dict.get() is a default value. It actually defaults to None, though, so you can just write url = my_item.get('url')
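To make the comparison concrete, here are both approaches side by side. The my_item dict below is a made-up stand-in for the question's JSON_API_ITEM:

```python
# A made-up item that is missing the 'url' key, like the API sometimes returns.
my_item = {"name": "example"}

# EAFP style with try/except:
try:
    url = my_item['url']
except KeyError:
    url = None

# dict.get() style; the second argument defaults to None anyway:
url_via_get = my_item.get('url')

print(url, url_via_get)  # None None
```

Both end up with None; the difference is purely stylistic for a single lookup, though .get() is the shorter of the two.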

As your question has changed, this answer is out of scope, but I'm still posting it in case someone needs to assign to possibly non-existent keys, instead of just needing access.
This is done by using a defaultdict instead of a dict for the loaded JSON. The conversion is done via the object_hook parameter. This hook is called whenever a JSON object is decoded (which usually produces a dict in Python). In the following example, the default_factory simply returns None, but you can substitute something more useful for your purpose (an empty dict, for example, so that chained getitem calls do not fail).
import collections
import json

samplejson = """
{
    "menu": {
        "id": "file",
        "value": "File",
        "popup": {
            "menuitem": [
                { "value": "New", "onclick": "CreateNewDoc()" },
                { "value": "Open", "onclick": "OpenDoc()" },
                { "value": "Close", "onclick": "CloseDoc()" }
            ]
        }
    }
}
"""
jsobject = json.loads(
    samplejson,
    object_hook=lambda obj: collections.defaultdict(lambda: None, **obj)
)
print(jsobject["menu"]["id"])
print(jsobject["menu"]["tooltip"])
jsobject["menu"]["tooltip"] = "File menu"
jsobject["menu"]["accel"] = "Alt+F"
print(jsobject["menu"]["accel"])
print(jsobject["menu"]["tooltip"])
#Output:
#file
#None
#Alt+F
#File menu
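As mentioned above, the None factory can be swapped for something that keeps chained lookups from failing. One sketch (the nested_default helper name is mine, not from the original answer) is a factory that returns another defaultdict of the same kind:

```python
import collections
import json

def nested_default():
    # Every missing key yields another defaultdict, so chained lookups
    # like obj["a"]["b"]["c"] never raise KeyError.
    return collections.defaultdict(nested_default)

data = json.loads(
    '{"menu": {"id": "file"}}',
    object_hook=lambda obj: collections.defaultdict(nested_default, **obj),
)

print(data["menu"]["id"])  # file
missing = data["menu"]["popup"]["value"]  # an empty defaultdict, not a KeyError
print(len(missing))  # 0
```

The trade-off is that typos in key names silently produce empty containers instead of raising, so this fits exploratory access better than strict validation.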

Related

What's the most Pythonic way to pull from structured data with an inconsistent maximum "depth"?

I've got a JSON file holding different dialog lines for a Discord bot to use, sorted by which bot command triggers the line. It looks something like this:
{
    "!remind": {
        "responses": {
            "welcome": "Reminder set for {time}.",
            "reminder": "At {time} you asked me to remind you {thing}."
        },
        "errors": {
            "R0": "Invalid reminder time.",
            "R1": "Reminder time is in the past."
        },
        "help": "To set a reminder say `!reminder [time] [thing]`"
    },
    "<!timezone, !spoilimage, !crosspost, etc.>": {
        <same structure>
    }
}
I have a function that's meant to access the values stored in the JSON file, do any necessary formatting using kwargs, and return a string. My original approach was
def dialog(command, category, name, **fmt):
    json_data = <json stuff>
    return json_data[command][category][name].format(**fmt)

# Sample call:
pastcommand = <magic>
reply = dialog("!remind", "responses", "reminder", time=pastcommand.time, thing=pastcommand.message)
# Although in practice I've made wrapper methods to avoid having to specify all of these args each time
But this will only work for "responses" and "errors", not "help," since in "help" the message to send is a level "shallower".
Two other things to note:
It's unlikely there will ever need to be anything in "help" other than the single value.
Currently there are no name conflicts between keys in different subcategories, and it's very easy to keep it that way. However, "responses"/"errors"/"help" is consistent across all categories, and some key names are repeated across categories (although I could change that if necessary).
So, in terms of fixing this, I could always just restructure the JSON file, something like
"help": {"main": "To set a reminder say `!reminder [time] [thing]`"}
but I don't like the idea of turning a string into a dict containing just a single string, just to satisfy the constraints of a function that pulls it.
Beyond that, I've run through a number of options, namely: explicitly checking the category and making it a special case (if category == "help"); trying both options with a try/except block; and using pandas.json_normalize (which I'm pretty sure would work, though I haven't actually used it; either way, any time a seemingly simple problem leads me to a third-party library, I suspect I'm doing something wrong).
What I've settled on, so far, is this:
def dialog(*json_keys, **fmt):
    json_data = <json stuff>
    current_level = json_data
    for key in json_keys:
        # Let's pretend I did error-handling here.
        current_level = current_level[key]
    return current_level.format(**fmt)
It's a lot more elegant and more flexible than any of the other things I considered, but I'm self-taught and pretty inexperienced, and I'm wondering if I'm overlooking some better approach.
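For what it's worth, the variadic approach can be made self-contained with the error handling filled in. The sample data below is a trimmed stand-in for the real dialog file, and the KeyError wrapping is one possible choice, not the bot's actual code:

```python
import json

# Stand-in for loading the real dialog file; structure mirrors the example above.
DIALOG_JSON = """
{
    "!remind": {
        "responses": {"reminder": "At {time} you asked me to remind you {thing}."},
        "help": "To set a reminder say `!reminder [time] [thing]`"
    }
}
"""
json_data = json.loads(DIALOG_JSON)

def dialog(*json_keys, **fmt):
    current_level = json_data
    for key in json_keys:
        try:
            current_level = current_level[key]
        except (KeyError, TypeError) as exc:
            # TypeError covers walking "past" a leaf string.
            raise KeyError(f"no dialog entry at {json_keys!r}") from exc
    return current_level.format(**fmt)

print(dialog("!remind", "responses", "reminder", time="9pm", thing="to stretch"))
print(dialog("!remind", "help"))  # the "shallower" case needs no special handling
```

Because the function walks however many keys it is given, the two-level "responses"/"errors" entries and the one-level "help" entry go through the same code path.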

Naming CDK resources dynamically

I'm using the CDK to create some infrastructure from a YAML template file. Some resources require multiple instances, and I thought writing a function would be the easiest way to create them.
Function
def create_vpn_connection_route(cidr_count, destination_cidr):
    vpn_connection_route = aws_ec2.CfnVPNConnectionRoute(
        self,
        f'vpn_connection_route{cidr_count}',
        vpn_connection_id=vpn_connection.ref,
        destination_cidr_block=destination_cidr
    )
    return vpn_connection_route
I then loop over it and generate the id by enumerating over the destination_cidrs, like so:
for cidr_count, destination_cidr in enumerate(tenant_config['vpn_config'][0]['destination_cidrs']):
    create_vpn_connection_route(cidr_count, destination_cidr)
This is what's in my yaml
vpn_config:
  - destination_cidrs:
      - 10.1.195.201/32
      - 10.1.80.20/32
      - 10.1.101.8/32
Is there a better way to do this in the CDK? And can I dynamically generate ids for resources?
Cheers
I don't know that it makes your code much better, but you can use a Construct instead of a function.
class VpnConnectionRoute(core.Construct):
    def __init__(self, scope, id_, vpn_connection, destination_cidr):
        super().__init__(scope, id_)
        self.vpn_connection_route = aws_ec2.CfnVPNConnectionRoute(
            self,
            'vpn_connection_route',
            vpn_connection_id=vpn_connection.vpn_id,
            destination_cidr_block=destination_cidr
        )
# ...
for cidr_count, destination_cidr in enumerate(tenant_config['vpn_config'][0]['destination_cidrs']):
    VpnConnectionRoute(self, f"route{cidr_count}", vpn_connection, destination_cidr)
CDK will automatically name your resources based on both the construct and your name. So the end result will look like:
"route1vpnconnectionrouteAE1C11A9": {
    "Type": "AWS::EC2::VPNConnectionRoute",
    "Properties": {
        "DestinationCidrBlock": "10.1.195.201/32",
        "VpnConnectionId": {
            "Ref": "Vpn6F669752"
        }
    },
    "Metadata": {
        "aws:cdk:path": "app/route1/vpn_connection_route"
    }
},
You can also just put destination_cidr inside your route name. CDK will remove all unsupported characters for you automatically.
for destination_cidr in tenant_config['vpn_config'][0]['destination_cidrs']:
    aws_ec2.CfnVPNConnectionRoute(
        self,
        f'VPN Connection Route for {destination_cidr}',
        vpn_connection_id=vpn_connection.vpn_id,
        destination_cidr_block=destination_cidr
    )
The best solution here probably depends on what you want to happen when these addresses change. For this particular resource type, any change in the name or the values will require a replacement anyway. So keeping the names consistent while the values change might not matter that much.

Entity Framework: Update multiple objects only if all objects exist

I'm writing a PATCH REST API in C# using Entity Framework that updates a specific field in multiple objects.
Request body:
{
    "ids": [
        "id1",
        "id2"
    ],
    "foo": "bar"
}
I would like to update all objects' foo field to be bar, but only if all objects exist.
I'm trying to keep it clean by not having a preemptive select that checks whether all objects exist (which, by the way, might not be good enough anyway: even if an object exists now, it might not exist a few milliseconds later).
I'm looking for a short solution that would rollback and raise an exception if one of the objects didn't successfully update (or doesn't exist).
The only solution I found is to open a transaction and update each object in a loop, which IMHO isn't the best way because I don't want to access the database each row at a time.
What's the best way to implement this?
The DbContext.SaveChanges method returns the number of entries written to the database.
In case of an update, it will return the number of updated rows.
So what you want to do is:
Start a new transaction
Execute a single update query for all your IDs together
Check the return value of SaveChanges, and Commit if it matches the number of IDs in your query, or Abort otherwise.
The best thing I can come up with is the following:
var ids = new List<int>() { 1, 2, 3, 4, 5, 6, 7 };
var records = db.Records.Where(x => ids.Contains(x.Id));
try
{
    foreach (var i in ids)
    {
        var record = records.FirstOrDefault(x => x.Id == i);
        if (record == null)
        {
            throw new Exception($"Record with Id {i} not found");
        }
        record.Foo = "Bar";
    }
    db.SaveChanges();
}
catch (Exception ex)
{
    // Roll back changes by resetting the state of tracked entities.
    var changedEntries = db.ChangeTracker.Entries()
        .Where(x => x.State != EntityState.Unchanged).ToList();
    foreach (var entry in changedEntries)
    {
        entry.State = EntityState.Unchanged;
    }
}
The reasoning here is that EF implicitly uses a transaction, which is "committed" when you call .SaveChanges(). If something goes wrong, you simply reset the entities' state to Unchanged and never call SaveChanges().

Firestore: REST API runQuery method expects wrong parent path pattern

I'm trying to perform a simple query with Firestore REST API.
Being on Google App Engine standard I cannot use the google-cloud-firestore client which is not yet compatible with GAE standard. Instead, I'm using google-api-python-client as for other Google APIs.
This is how I initialize my service:
service = build('firestore', 'v1beta1', credentials=_credentials)
Once this is done, I perform the query that way:
query = {
    "structuredQuery": {
        "from": [{ "collectionId": "mycollection" }],
        "where": {
            "fieldFilter": {
                "field": { "fieldPath": "myfield" },
                "op": "EQUAL",
                "value": { "stringValue": "myvalue" }
            }
        }
    }
}
response = service.projects().databases().documents().runQuery(
    parent='projects/myprojectid/databases/(default)/documents',
    body=query).execute()
This returns a quite explicit error:
TypeError: Parameter "parent" value
"projects/myprojectid/databases/(default)/documents"
does not match the pattern
"^projects/[^/]+/databases/[^/]+/documents/[^/]+/.+$"
which is obviously true. My point is that the documentation clearly states that this should be an accepted value:
The parent resource name. In the format: projects/{project_id}/databases/{database_id}/documents or projects/{project_id}/databases/{database_id}/documents/{document_path}. For example: projects/my-project/databases/my-database/documents or projects/my-project/databases/my-database/documents/chatrooms/my-chatroom (string)
Performing the same query with the API Explorer (or curl) works fine and returns the expected results. (even though the API Explorer does state that the parent parameter does not match the expected pattern).
It seems that the discovery document (which is used by google-api-python-client) enforces this pattern check for the parent parameter but the associated regular expression does not actually allow the only parent path that seems to work (projects/myprojectid/databases/(default)/documents).
I tried to use a different parent path like projects/myprojectid/databases/(default)/documents/*/**, which makes the query run fine but does not return any results.
Is anyone having the same issue, or am I doing something wrong here?
The only workaround I can think of is making a request directly to the proper url without using google-api-python-client, but that means that I have to handle the auth process manually which is a pain.
Thanks for any help you can provide!
google-cloud-firestore is now compatible with standard App Engine.
https://cloud.google.com/firestore/docs/quickstart-servers
For your information, there is a trick that works: you can change the URI after generating the request.
request = service.projects().databases().documents().runQuery(
    parent='projects/myprojectid/databases/(default)/documents/*/**',
    body=query)
request.uri = request.uri.replace('/*/**', '')
response = request.execute()

Using google.protobuf.Any in python file

I have the following .proto file:
syntax = "proto3";

import "google/protobuf/any.proto";

message Request {
  google.protobuf.Any request_parameters = 1;
}
How can I create Request object and populate its fields? I tried this:
import ma_pb2
from google.protobuf.any_pb2 import Any
parameters = {"a": 1, "b": 2}
Request = ma_pb2.Request()
some_any = Any()
some_any.CopyFrom(parameters)
Request.request_parameters = some_any
But I have an error:
TypeError: Parameter to CopyFrom() must be instance of same class: expected google.protobuf.Any got dict.
UPDATE
Following the prompts of @Kevin, I added a new message to the .proto file:
message Small {
  string a = 1;
}
Now the code looks like this:
Request = ma_pb2.Request()
small = ma_pb2.Small()
small.a = "1"
some_any = Any()
some_any.Pack(small)
Request.request_parameters = small
But at the last assignment I have an error:
Request.request_parameters = small
AttributeError: Assignment not allowed to field "request_parameters" in protocol message object.
What did I do wrong?
Any is not a magic box for storing arbitrary keys and values. The purpose of Any is to denote "any" message type, in cases where you might not know which message you want to use until runtime. But at runtime, you still need to have some specific message in mind. You can then use the .Pack() and .Unpack() methods to convert that message into an Any, and at that point you would do something like Request.request_parameters.CopyFrom(some_any).
So, if you want to store this specific dictionary:
{"a": 1, "b": 2}
...you'll need a .proto file which describes some message type that has integer fields named a and b. Personally, I'd see that as overkill; just throw your a and b fields directly into the Request message, unless you have a good reason for separating them out. If you "forget" one of these keys, you can always add it later, so don't worry too much about completeness.
If you really want a "magic box for storing arbitrary keys and values" rather than what I described above, you could use a map instead of Any. This has the advantage of not requiring you to declare all of your keys up front, in cases where the set of keys might include arbitrary strings (for example, HTTP headers). It has the disadvantage of being harder to lint or type-check (especially in statically typed languages), because you can misspell a string more easily than an attribute. Under the hood, maps are basically syntactic sugar for a repeated field like the following (that is, the on-wire representation is exactly the same as what you'd get from doing this, so it's backwards compatible with clients that don't support maps):
message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;
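For the original {"a": 1, "b": 2} dictionary, a map-based version of the Request message might look like the sketch below. The field name and number here are illustrative choices, not taken from the question's schema:

```proto
syntax = "proto3";

message Request {
  // Arbitrary string keys mapped to integer values,
  // e.g. {"a": 1, "b": 2} from the question.
  map<string, int32> request_parameters = 1;
}
```

On the Python side, the generated field then behaves like a dict: request.request_parameters["a"] = 1 works directly, with no Pack/CopyFrom dance.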
