I'm trying to test a lot of JSON documents against a schema, and I use an object keyed by all the field names to keep a count of how many errors each field has.
Is there a function in any Python library that creates a sample object with boolean values indicating whether each field is required? For example:
From this schema:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"type": {
"type": "string"
},
"position": {
"type": "array"
},
"content": {
"type": "object"
}
},
"additionalProperties": false,
"required": [
"type",
"content"
]
}
I need to get something like:
{
"type" : True,
"position" : False,
"content" : True
}
I need it to support references to definitions as well.
I don't know of a library that will do this, but this simple function uses a dict comprehension to get the desired result.
def required_dict(schema):
    return {
        key: key in schema['required']
        for key in schema['properties']
    }

print(required_dict(schema))
Example output from your provided schema
{'content': True, 'position': False, 'type': True}
Edit: link to repl.it example
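The question also asks for support for references to definitions, which the function above does not handle. Here is a minimal sketch of one way to extend it, assuming only local references of the form "#/definitions/..." and no circular references. The names resolve_ref and required_map and the dotted-path output format are my own choices for illustration, not part of any library.

```python
def resolve_ref(schema, root):
    """Follow local '#/...' references until a concrete schema is reached."""
    while isinstance(schema, dict) and "$ref" in schema:
        ref = schema["$ref"]
        if not ref.startswith("#/"):
            # Assumption: remote or relative references are out of scope here.
            raise ValueError("only local references are handled: " + ref)
        node = root
        for part in ref[2:].split("/"):
            # Undo JSON Pointer escaping (~1 is '/', ~0 is '~').
            node = node[part.replace("~1", "/").replace("~0", "~")]
        schema = node
    return schema


def required_map(schema, root=None, prefix=""):
    """Map every declared property (as a dotted path) to True if required."""
    root = schema if root is None else root
    schema = resolve_ref(schema, root)
    required = set(schema.get("required", []))
    result = {}
    for key, sub in schema.get("properties", {}).items():
        path = prefix + key
        result[path] = key in required
        # Recurse so nested or referenced objects contribute their own fields.
        result.update(required_map(sub, root, path + "."))
    return result


schema = {
    "definitions": {
        "content": {
            "type": "object",
            "properties": {"body": {"type": "string"}, "lang": {"type": "string"}},
            "required": ["body"],
        }
    },
    "type": "object",
    "properties": {
        "type": {"type": "string"},
        "position": {"type": "array"},
        "content": {"$ref": "#/definitions/content"},
    },
    "required": ["type", "content"],
}

print(required_map(schema))
```

A nested field is reported as required relative to its parent object, so "content.body" being True means body is required whenever content is present.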
Related
I am trying to validate JSON for required fields using Python. At the moment I do it manually by iterating through the JSON and reading it. However, I am looking for a library or a more generic solution that handles all scenarios.
For example, I want to check whether a particular attribute is present in every item of a list.
Here is the sample json which I am trying to validate.
{
"service": {
"refNumber": "abc",
"item": [{
"itemnumber": "1",
"itemloc": "uk"
}, {
"itemnumber": "2",
"itemloc": "us"
}]
}
}
I want to validate that refNumber is present and that itemnumber is present in every list item.
A JSON Schema is a way to define the structure of JSON.
There are some accompanying python packages which can use a JSON schema to validate JSON (jsonschema).
The JSON Schema for your example would look approximately like this:
{
"type": "object",
"properties": {
"service": {
"type": "object",
"properties": {
"refNumber": {
"type": "string"
},
"item": {
"type": "array",
"items": {
"type": "object",
"properties": {
"itemnumber": {
"type": "string"
},
"itemloc": {
"type": "string"
}
}
}
}
}
}
}
}
i.e., an object containing service, which itself contains a refNumber and a list of items.
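The schema above describes the structure but does not mark any field as mandatory. A rough sketch of how the required keyword could be added and checked with the jsonschema package follows; it assumes jsonschema is installed, and the find_errors helper and the sample document with a missing itemnumber are mine, for illustration only.

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "required": ["service"],
    "properties": {
        "service": {
            "type": "object",
            "required": ["refNumber"],
            "properties": {
                "refNumber": {"type": "string"},
                "item": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        # Applied to every element of the "item" array.
                        "required": ["itemnumber"],
                        "properties": {
                            "itemnumber": {"type": "string"},
                            "itemloc": {"type": "string"},
                        },
                    },
                },
            },
        }
    },
}


def find_errors(document):
    """Return (path, message) pairs for every violation, not just the first."""
    validator = Draft7Validator(schema)
    return [
        ("/".join(str(part) for part in err.absolute_path), err.message)
        for err in validator.iter_errors(document)
    ]


document = {
    "service": {
        "refNumber": "abc",
        "item": [
            {"itemnumber": "1", "itemloc": "uk"},
            {"itemloc": "us"},  # itemnumber deliberately left out
        ],
    }
}

for path, message in find_errors(document):
    print(path, "->", message)
```

Using iter_errors instead of validate reports all problems in one pass, which is convenient when several list items may be incomplete.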
Since I don't have enough rep to add a comment, I will post this as an answer.
First, I have to say that I don't program in Python.
According to my Google search, there is a jsonschema module available for Python.
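from jsonschema import validate

# "service" must be an object that contains a "refNumber" string.
schema = {
    "type": "object",
    "properties": {
        "service": {
            "type": "object",
            "properties": {
                "refNumber": {"type": "string"},
                "item": {"type": "array"},
            },
            "required": ["refNumber"],
        },
    },
}

# yourJSON is the already parsed document (a dict).
# validate() raises jsonschema.exceptions.ValidationError if it does not match.
validate(instance=yourJSON, schema=schema)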
This example is not tested, but it should give you the general idea.
Link to jsonschema docs
In Python 3.8, I'm trying to mock up a validation JSON schema for the structure below:
{
# some other key/value pairs
"data_checks": {
"check_name": {
"sql": "SELECT col FROM blah",
"expectations": {
"expect_column_values_to_be_unique": {
"column": "col",
},
# additional items as required
}
},
# additional items as required
}
}
The requirements I'm trying to enforce include:
At least one item in data_checks that can have a dynamic name. Item keys should be unique.
sql and expectations keys must be present
sql should be a text string
At least one item in expectations. Item keys should be unique.
Within expectations, item keys must be equal to available methods provided by dir(class_name)
More advanced capability would include:
Enforcing expectations method items to only include kwargs for that method
I currently have the following JSON schema for the data_checks portion:
"data_checks": {
"description": "Data quality checks against provided sources.",
"minProperties": 1,
"type": "object",
"patternProperties": {
".+": {
"required": ["expectations", "sql"],
"sql": {
"description": "SQL for data quality check.",
"minLength": 1,
"type": "string",
},
"expectations": {
"description": "Great Expectations function name.",
"minProperties": 1,
"type": "object",
"anyOf": [
{
"type": "string",
"minLength": 1,
"pattern": [e for e in dir(SqlAlchemyDataset) if e.startswith("expect_")],
}
],
},
},
},
},
This JSON schema does not enforce expectations to have at least one item nor does it enforce valid method names for the nested keys as expected from [e for e in dir(SqlAlchemyDataset) if e.startswith("expect_")]. I haven't really looked into enforcing kwargs for the selected method (is that even possible?).
I don't know if this is related to things being nested, but how would I enforce the proper validation requirements?
Thanks!
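One possible direction for the requirements listed above, offered as a sketch rather than a definitive answer: put sql and expectations under a properties keyword, and generate one sub-schema per method so that both the method names and their kwargs are constrained. ExampleDataset is a hypothetical stand-in for SqlAlchemyDataset, kwargs_schema is a helper I made up, and the approach assumes plain method signatures (it ignores *args and **kwargs, which decorated methods may expose). It also assumes the jsonschema package is installed.

```python
import inspect

from jsonschema import Draft7Validator


class ExampleDataset:
    """Hypothetical stand-in for SqlAlchemyDataset."""

    def expect_column_values_to_be_unique(self, column, mostly=None):
        ...

    def expect_table_row_count_to_equal(self, value):
        ...

    def unrelated_helper(self):
        ...


def kwargs_schema(method):
    """Build an object schema from a method signature (named parameters only)."""
    named = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    params = [
        p for p in inspect.signature(method).parameters.values()
        if p.name != "self" and p.kind in named
    ]
    return {
        "type": "object",
        "properties": {p.name: {} for p in params},
        "required": [p.name for p in params if p.default is inspect.Parameter.empty],
        "additionalProperties": False,
    }


method_names = [n for n in dir(ExampleDataset) if n.startswith("expect_")]

data_checks_schema = {
    "type": "object",
    "minProperties": 1,
    # Any key is allowed as a check name; each value must match this schema.
    "additionalProperties": {
        "type": "object",
        "required": ["expectations", "sql"],
        "properties": {
            "sql": {"type": "string", "minLength": 1},
            "expectations": {
                "type": "object",
                "minProperties": 1,
                "properties": {
                    name: kwargs_schema(getattr(ExampleDataset, name))
                    for name in method_names
                },
                # Rejects keys that are not known expect_* methods.
                "additionalProperties": False,
            },
        },
    },
}

validator = Draft7Validator(data_checks_schema)
```

Key uniqueness does not need a schema rule: once the document is parsed into a Python dict, duplicate keys have already collapsed into one.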
I am looking for a Python module to filter JSON data against a schema.
For example,
there is JSON data:
{
"system" : {
"state" : "enabled",
"id" : 5,
"keys" : [
{ "key_id": 12, "key": "filename.key" }
]
}
}
And there is JSON schema:
{
"system": {
"id": "system",
"required": true,
"type": "object",
"properties": {
"state": {
"id": "state",
"required": true,
"type": "string"
},
"id": {
"id": "id",
"required": true,
"type": "number"
}
}
}
}
As you can see, the schema does not contain "keys" property.
I need a tool that can filter the JSON data using the schema and produce the following JSON as output:
{
"system" : {
"state" : "enabled",
"id" : 5
}
}
Since I found no tool for filtering JSON data against a schema, I solved my task as follows.
I created a template of the expected JSON file. It is effectively the already filtered JSON file, but without data.
{
"system" : {
"state" : "",
"id" : 0
}
}
Then go through the data file and the template file and just copy values from one to another for properties that exist in both files.
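The copy step described above can be sketched in a few lines. This is my own illustration of the idea, not the author's actual code; the function name filter_by_template is made up, and lists are copied as they are rather than filtered element by element.

```python
def filter_by_template(data, template):
    """Keep only the keys of data that also exist in template, recursively."""
    if isinstance(data, dict) and isinstance(template, dict):
        return {
            key: filter_by_template(data[key], template[key])
            for key in template
            if key in data
        }
    # Leaf value (or mismatched types): take the value from the data file.
    return data


data = {
    "system": {
        "state": "enabled",
        "id": 5,
        "keys": [{"key_id": 12, "key": "filename.key"}],
    }
}
template = {"system": {"state": "", "id": 0}}

print(filter_by_template(data, template))
```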
You can use jsonschema to validate your JSON against the schema. Check this example:
from jsonschema import validate
schema = {"type" : "object","properties" : { "price" : {"type" : "number"},"name" : {"type" : "string"},},}
validate(instance={"name" : "Eggs", "price" : 34}, schema=schema)
If no exception is raised by validate(), the instance is valid
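To make the failure case concrete, here is a small sketch that wraps the same call in a try/except. The is_valid wrapper is my own name, and the example assumes the jsonschema package is installed.

```python
from jsonschema import validate
from jsonschema.exceptions import ValidationError

schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "name": {"type": "string"},
    },
}


def is_valid(instance):
    """Return True if instance matches schema, otherwise report why not."""
    try:
        validate(instance=instance, schema=schema)
    except ValidationError as err:
        print("invalid:", err.message)
        return False
    return True


print(is_valid({"name": "Eggs", "price": 34}))
print(is_valid({"name": "Eggs", "price": "cheap"}))
```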
The purpose of JSON Schema is to validate a given JSON input against a defined schema. As @Relequestual says in a comment, you cannot use JSON Schema to filter out fields directly.
If you only need to remove the keys field, you do not need JSON Schema at all. You could simply remove the field from the JSON input.
If you need to filter out a number of unexpected fields from the input, you could use JSON Schema to identify those fields. But you need to do the filtering itself manually or with another library, since JSON Schema cannot do that for you.
You could use the additionalProperties keyword to reject unexpected keys.
{
"type":"object",
"required":false,
"properties":{
"system": {
"id": "system",
"required": true,
"type": "object",
"properties": {
"state": {
"id": "state",
"required": true,
"type": "string"
},
"id": {
"id": "id",
"required": true,
"type": "number"
}
},
"additionalProperties": false
}
}
}
This will give a validation error like the following:
Message:
Property 'keys' has not been defined and the schema does not allow additional properties.
Schema path:
#/properties/system/additionalProperties
This may not be the exact answer you are looking for. But hope it helps.
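For the manual filtering part, one rough approach is to walk the instance alongside the schema and drop any key that is not listed under properties. The sketch below is my own and makes simplifying assumptions: it only understands the properties and items keywords, and it ignores $ref, patternProperties, and combinators such as anyOf. The name prune_to_schema is hypothetical.

```python
def prune_to_schema(instance, schema):
    """Drop object keys that the schema does not declare under 'properties'."""
    if isinstance(instance, dict) and isinstance(schema.get("properties"), dict):
        props = schema["properties"]
        return {
            key: prune_to_schema(value, props[key])
            for key, value in instance.items()
            if key in props
        }
    if isinstance(instance, list) and isinstance(schema.get("items"), dict):
        # Apply the single item schema to every element of the array.
        return [prune_to_schema(value, schema["items"]) for value in instance]
    return instance


schema = {
    "type": "object",
    "properties": {
        "system": {
            "type": "object",
            "properties": {
                "state": {"type": "string"},
                "id": {"type": "number"},
            },
        }
    },
}
data = {
    "system": {
        "state": "enabled",
        "id": 5,
        "keys": [{"key_id": 12, "key": "filename.key"}],
    }
}

print(prune_to_schema(data, schema))
```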
My question for jsonschema is twofold:
Given
{
"foo": {"ar": {"a": "r"}},
"bar": ""
}
How do I check if the key "ar" exists inside of "foo"?
And only if "ar" exists inside of "foo", how do I require that "bar" exists in the given JSON?
I have looked at other SO answers and the jsonschema docs, but they only seem to check whether the key has a specific value rather than whether the key exists regardless of its value. And the jsonschema examples for nested objects only seem to check the deepest level of the nesting rather than somewhere in the middle.
I have come up with this, but it doesn't work.
{
"definitions": {},
"$schema": "https://json-schema.org/draft-07/schema#",
"$id": "https://example.com/root.json",
"type": "object",
"properties": {
"foo": {
"type": "object"
},
"bar": {
"type": "string"
}
},
"required": [
"foo"
],
"if": {
"properties": {
"foo": {
"properties": {
"ar": {
"type": "object"
}
}
}
}
},
"then": {
"required": [
"bar"
]
}
}
To test if the property is present, use the required keyword.
{
"properties": {
"foo": {
"required": ["ar"]
}
},
"required": ["foo"]
}
This schema validates to true if /foo/ar is present and false if it's not. Use this in place of your if schema and your conditional should work as expected.
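Put together with the rest of the question's schema, the conditional could be exercised as in the sketch below. It assumes the jsonschema package is installed and uses Draft7Validator explicitly, since if/then is only available from draft-07 onward.

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "properties": {
        "foo": {"type": "object"},
        "bar": {"type": "string"},
    },
    "required": ["foo"],
    # The "if" passes only when /foo/ar is present, whatever its value.
    "if": {
        "properties": {"foo": {"required": ["ar"]}},
        "required": ["foo"],
    },
    "then": {"required": ["bar"]},
}

validator = Draft7Validator(schema)

with_ar_and_bar = {"foo": {"ar": {"a": "r"}}, "bar": ""}
with_ar_no_bar = {"foo": {"ar": {"a": "r"}}}
without_ar = {"foo": {}}

print(validator.is_valid(with_ar_and_bar))
print(validator.is_valid(with_ar_no_bar))
print(validator.is_valid(without_ar))
```

The first and third documents are accepted and the second is rejected, because "bar" only becomes mandatory once "ar" is present inside "foo".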
I have reviewed avro documentation as well as several examples online (and similar StackOverflow questions). I then attempted to define an avro schema, and had to progressively back out fields to determine what my issue was (the error message from the avro library in python was not as helpful as one would hope). I have a JSON document that I would like to convert to Avro and I need a schema to be specified for that purpose (using avro-tools to generate the schema from the json did not work as expected and yielded an AvroTypeException when attempting to convert the json into avro). I am using Avro version 1.7.7. Here is the JSON document for which I would like to define the avro schema:
{
"method": "Do_Thing",
"code": 200,
"reason": "OK",
"siteId": {
"string": "a1283632-121a-4a3f-9560-7b73830f94j8"
}
}
I was able to define the schema for the non-complex types but not for the complex "siteId" field:
{
"namespace" : "com.example",
"name" : "methodEvent",
"type" : "record",
"fields" : [
{"name": "method", "type": "string"},
{"name": "code", "type": "int"},
{"name": "reason", "type": "string"},
{"name": "siteId", "type": [ "null", "string" ]}
]
}
Attempting to use the previous schema to convert the Json object to avro yields an avro.io.AvroTypeException: The datum [See JSON Object above] is not an example of the schema [See Avro Schema Object above]. I only see this error when attempting to define a field in the schema to represent the "siteId" field in the above json.
Avro's python implementation represents unions differently than their JSON encoding: it "unwraps" them, so the siteId field is expected to be just the string, without the wrapping object. See below for a few examples.
Valid JSON encodings
Non-null siteid:
{
"method": "Do_Thing",
"code": 200,
"reason": "OK",
"siteId": {
"string": "a1283632-121a-4a3f-9560-7b73830f94j8"
}
}
Null siteid:
{
"method": "Do_Thing",
"code": 200,
"reason": "OK",
"siteId": null
}
Valid python objects (in-memory representation)
Non-null siteid:
{
"method": "Do_Thing",
"code": 200,
"reason": "OK",
"siteId": "a1283632-121a-4a3f-9560-7b73830f94j8"
}
Null siteid:
{
"method": "Do_Thing",
"code": 200,
"reason": "OK",
"siteId": None
}
Note that nulls are unwrapped in both cases which is why your solution isn't working.
Unfortunately, the python implementation doesn't have a JSON decoder/encoder currently (AFAIK), so there is no easy way to translate between the two representations. Depending on the source of your JSON-encoded data, the simplest might be to edit it to not wrap union instances anymore.
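If editing the source data is not practical, the unwrapping can be done in Python before handing the datum to the Avro writer. The following is a minimal sketch under stated assumptions: it handles only top-level fields of a flat record, it treats a one-key dict whose key names a branch of the union as a wrapped value, and unwrap_unions is a name I made up rather than an Avro API.

```python
def unwrap_unions(datum, record_schema):
    """Convert JSON-encoded union values into the plain in-memory form."""
    result = {}
    for field in record_schema["fields"]:
        name, field_type = field["name"], field["type"]
        value = datum.get(name)
        # A union is written as a list of branch types in the schema.
        if isinstance(field_type, list) and isinstance(value, dict) and len(value) == 1:
            branch, inner = next(iter(value.items()))
            if branch in field_type:
                value = inner  # {"string": "abc"} becomes "abc"
        result[name] = value
    return result


record_schema = {
    "namespace": "com.example",
    "name": "methodEvent",
    "type": "record",
    "fields": [
        {"name": "method", "type": "string"},
        {"name": "code", "type": "int"},
        {"name": "reason", "type": "string"},
        {"name": "siteId", "type": ["null", "string"]},
    ],
}

json_encoded = {
    "method": "Do_Thing",
    "code": 200,
    "reason": "OK",
    "siteId": {"string": "a1283632-121a-4a3f-9560-7b73830f94j8"},
}

print(unwrap_unions(json_encoded, record_schema))
```

A null siteId is left as None, since nulls are not wrapped in either representation.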
I was able to resolve the issue with the following schema:
{
    "namespace" : "com.example",
    "name" : "methodEvent",
    "type" : "record",
    "fields" : [
        {"name": "method", "type": "string"},
        {"name": "code", "type": "int"},
        {"name": "reason", "type": "string"},
        {
            "name": "siteId",
            "type": {
                "name" : "siteId",
                "type" : "record",
                "fields" : [
                    {
                        "name" : "string",
                        "type" : [ "null", "string" ],
                        "default" : null
                    }
                ]
            }
        }
    ]
}