I have a set of jsonschema compliant documents. Some documents contain references to other documents (via the $ref attribute). I do not wish to host these documents such that they are accessible at an HTTP URI. As such, all references are relative. All documents live in a local folder structure.
How can I make python-jsonschema understand to properly use my local file system to load referenced documents?
For instance, suppose I have a document with filename defs.json containing some definitions, and I try to load a different document which references it, like:
{
"allOf": [
{"$ref":"defs.json#/definitions/basic_event"},
{
"type": "object",
"properties": {
"action": {
"type": "string",
"enum": ["page_load"]
}
},
"required": ["action"]
}
]
}
I get an error RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/defs.json'>
It may be important that I'm on a linux box.
(I'm writing this as a Q&A because I had a hard time figuring this out and observed other folks having trouble too.)
I had the hardest time figuring out how to resolve against a set of schemas that $ref each other without going to the network. It turns out the key is to create the RefResolver with a store: a dict that maps each URL to its schema.
import json
from jsonschema import RefResolver, Draft7Validator
address="""
{
"$id": "https://example.com/schemas/address",
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"],
"additionalProperties": false
}
"""
customer="""
{
"$id": "https://example.com/schemas/customer",
"type": "object",
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" },
"shipping_address": { "$ref": "/schemas/address" },
"billing_address": { "$ref": "/schemas/address" }
},
"required": ["first_name", "last_name", "shipping_address", "billing_address"],
"additionalProperties": false
}
"""
data = """
{
"first_name": "John",
"last_name": "Doe",
"shipping_address": {
"street_address": "1600 Pennsylvania Avenue NW",
"city": "Washington",
"state": "DC"
},
"billing_address": {
"street_address": "1st Street SE",
"city": "Washington",
"state": "DC"
}
}
"""
address_schema = json.loads(address)
customer_schema = json.loads(customer)
schema_store = {
address_schema['$id'] : address_schema,
customer_schema['$id'] : customer_schema,
}
resolver = RefResolver.from_schema(customer_schema, store=schema_store)
validator = Draft7Validator(customer_schema, resolver=resolver)
jsonData = json.loads(data)
validator.validate(jsonData)
The above was built with jsonschema==4.9.1.
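Note that in newer releases (jsonschema 4.18+) RefResolver is deprecated in favour of the referencing library. As a hedged sketch of the same store idea with that API, reusing the address_schema, customer_schema, and jsonData objects from above (double-check the exact calls against the current docs):
from referencing import Registry, Resource
from referencing.jsonschema import DRAFT7
from jsonschema import Draft7Validator

# Register each schema under its $id so $refs resolve in memory, no network.
registry = Registry().with_resources([
    (address_schema["$id"], Resource.from_contents(address_schema, default_specification=DRAFT7)),
    (customer_schema["$id"], Resource.from_contents(customer_schema, default_specification=DRAFT7)),
])
validator = Draft7Validator(customer_schema, registry=registry)
validator.validate(jsonData)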
You must build a custom jsonschema.RefResolver for each schema which uses a relative reference and ensure that your resolver knows where on the filesystem the given schema lives.
Such as...
import os
import json
from jsonschema import Draft4Validator, RefResolver # We prefer Draft7, but jsonschema 3.0 is still in alpha as of this writing
abs_path_to_schema = '/path/to/schema-doc-foobar.json'
with open(abs_path_to_schema, 'r') as fp:
schema = json.load(fp)
resolver = RefResolver(
# The key part is here where we build a custom RefResolver
# and tell it where *this* schema lives in the filesystem
# Note that `file:` is for unix systems
schema_path='file:{}'.format(abs_path_to_schema),
schema=schema
)
Draft4Validator.check_schema(schema) # Unnecessary but a good idea
validator = Draft4Validator(schema, resolver=resolver, format_checker=None)
# Then you can...
data_to_validate = {...}  # your data, as a Python dict
validator.validate(data_to_validate)
EDIT-1
Fixed a wrong reference ($ref) to base schema.
Updated the example to use the one from the docs: https://json-schema.org/understanding-json-schema/structuring.html
EDIT-2
As pointed out in the comments, the example below uses the following imports:
from jsonschema import validate, RefResolver
from jsonschema.validators import validator_for
This is just another version of @Daniel's answer -- which was the one that worked for me. Basically, I decided to define the $schema in a base schema, which relieves the other schemas of it and makes for a cleaner call when instantiating the resolver.
It was not very clear to me, given that RefResolver.from_schema() takes (1) some schema and also (2) a schema store, whether the order mattered and which schema the "some" should be. Hence the structure you see below.
I have the following:
base.schema.json:
{
"$schema": "http://json-schema.org/draft-07/schema#"
}
definitions.schema.json:
{
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"]
}
address.schema.json:
{
"type": "object",
"properties": {
"billing_address": { "$ref": "definitions.schema.json#" },
"shipping_address": { "$ref": "definitions.schema.json#" }
}
}
I like this setup for two reasons:
It makes for a cleaner call to RefResolver.from_schema():
base = json.loads(open('base.schema.json').read())
definitions = json.loads(open('definitions.schema.json').read())
schema = json.loads(open('address.schema.json').read())
schema_store = {
base.get('$id','base.schema.json') : base,
definitions.get('$id','definitions.schema.json') : definitions,
schema.get('$id','address.schema.json') : schema,
}
resolver = RefResolver.from_schema(base, store=schema_store)
Then I profit from the handy tool the library provides, validator_for, which gives you the best validator for your schema (according to its $schema key):
Validator = validator_for(base)
And then just put them together to instantiate validator:
validator = Validator(schema, resolver=resolver)
Finally, you validate your data:
data = {
"shipping_address": {
"street_address": "1600 Pennsylvania Avenue NW",
"city": "Washington",
"state": "DC"
},
"billing_address": {
"street_address": "1st Street SE",
"city": "Washington",
"state": 32
}
}
This one will fail, since "state" is 32:
>>> validator.validate(data)
ValidationError: 32 is not of type 'string'
Failed validating 'type' in schema['properties']['billing_address']['properties']['state']:
{'type': 'string'}
On instance['billing_address']['state']:
32
Change that to "DC" and it will validate.
Following up on the answer @chris-w provided, I wanted to do the same thing with jsonschema 3.2.0, but his answer didn't quite cover it. I hope this answer helps those who are still coming to this question for help but are using a more recent version of the package.
To extend a JSON schema using the library, do the following:
Create the base schema:
base.schema.json
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
Create the extension schema
extend.schema.json
{
"allOf": [
{"$ref": "base.schema.json"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
Create your JSON file you want to test against the schema
data.json
{
"prop": "This is the property",
"extra": true
}
Create your RefResolver and Validator for the base Schema and use it to check the data
#Set up schema, resolver, and validator on the base schema
baseSchema = json.loads(baseSchemaJSON) # Create a schema dictionary from the base JSON file
relativeSchema = json.loads(relativeJSON) # Create a schema dictionary from the relative JSON file
resolver = RefResolver.from_schema(baseSchema) # Creates your resolver, uses the "$id" element
validator = Draft7Validator(relativeSchema, resolver=resolver) # Create a validator against the extended schema (but resolving to the base schema!)
# Check validation!
data = json.loads(dataJSON) # Create a dictionary from the data JSON file
validator.validate(data)
You may need to make a few adjustments to the above entries, such as using a validator other than Draft7Validator. This should work for single-level references (children extending a base), but you will need to be careful with your schemas and how you set up the RefResolver and Validator objects.
P.S. Here is a snippet that exercises the above. Try modifying the data string to remove one of the required attributes:
import json
from jsonschema import RefResolver, Draft7Validator
base = """
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
"""
extend = """
{
"allOf": [
{"$ref": "base.schema.json"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
"""
data = """
{
"prop": "This is the property string",
"extra": true
}
"""
schema = json.loads(base)
extendedSchema = json.loads(extend)
resolver = RefResolver.from_schema(schema)
validator = Draft7Validator(extendedSchema, resolver=resolver)
jsonData = json.loads(data)
validator.validate(jsonData)
My approach is to preload all schema fragments to RefResolver cache. I created a gist that illustrates this: https://gist.github.com/mrtj/d59812a981da17fbaa67b7de98ac3d4b
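The gist is linked above; as a rough sketch of the same idea (the folder name and entry-point key below are made up), you can walk a directory, key every fragment by its $id, and hand the whole dict to the resolver as its store:
import json
from pathlib import Path
from jsonschema import Draft7Validator, RefResolver

schema_dir = Path("schemas")  # hypothetical folder holding all schema fragments
store = {}
for path in schema_dir.glob("*.json"):
    fragment = json.loads(path.read_text())
    # Key by $id when present, otherwise fall back to the filename.
    store[fragment.get("$id", path.name)] = fragment

entry = store["main.schema.json"]  # hypothetical entry-point schema
resolver = RefResolver.from_schema(entry, store=store)
validator = Draft7Validator(entry, resolver=resolver)
# validator.validate(instance) will now resolve cross-file $refs from the store.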
This is what I used to dynamically generate a schema_store from all schemas in a given directory:
base.schema.json
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
extend.schema.json
{
"$id": "extend.schema.json",
"allOf": [
{"$ref": "base.schema.json"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
instance.json
{
"prop": "This is the property string",
"extra": true
}
validator.py
import json
from pathlib import Path
from jsonschema import Draft7Validator, RefResolver
from jsonschema.exceptions import RefResolutionError
schemas = (json.load(open(source)) for source in Path("schema/dir").iterdir())
schema_store = {schema["$id"]: schema for schema in schemas}
schema = json.load(open("schema/dir/extend.schema.json"))
instance = json.load(open("instance/dir/instance.json"))
resolver = RefResolver.from_schema(schema, store=schema_store)
validator = Draft7Validator(schema, resolver=resolver)
try:
errors = sorted(validator.iter_errors(instance), key=lambda e: e.path)
except RefResolutionError as e:
print(e)
Related
I am trying to validate the JSON for required fields using Python. I am doing it manually by iterating through the JSON and reading it. However, I am looking for more of a library / generic solution to handle all scenarios.
For example I want to check in a list, if a particular attribute is available in all the list items.
Here is the sample json which I am trying to validate.
{
"service": {
"refNumber": "abc",
"item": [{
"itemnumber": "1",
"itemloc": "uk"
}, {
"itemnumber": "2",
"itemloc": "us"
}]
}
}
I want to validate if I have refNumber and itemnumber in all the list items.
A JSON Schema is a way to define the structure of JSON.
There are some accompanying python packages which can use a JSON schema to validate JSON (jsonschema).
The JSON Schema for your example would look approximately like this:
{
"type": "object",
"properties": {
"service": {
"type": "object",
"properties": {
"refNumber": {
"type": "string"
},
"item": {
"type": "array",
"items": {
"type": "object",
"properties": {
"itemnumber": {
"type": "string"
},
"itemloc": {
"type": "string"
}
}
}
}
}
}
}
}
i.e., an object containing service, which itself contains a refNumber and a list of items.
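If the JSON Schema above is loaded into a Python dict called schema (an assumption for this sketch), validating and enforcing the required fields from the question could look roughly like this:
import jsonschema

# Mark the fields from the question as mandatory (the schema above omits this).
schema["properties"]["service"]["required"] = ["refNumber", "item"]
schema["properties"]["service"]["properties"]["item"]["items"]["required"] = ["itemnumber"]

document = {
    "service": {
        "refNumber": "abc",
        "item": [{"itemnumber": "1", "itemloc": "uk"}, {"itemnumber": "2", "itemloc": "us"}]
    }
}
jsonschema.validate(instance=document, schema=schema)  # raises ValidationError if a required field is missing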
Since I don't have enough rep to add a comment, I will post this answer.
First, I have to say I don't program in Python.
According to my Google search, there is a jsonschema module available for Python.
from jsonschema import validate
schema = {
    "type": "object",
    "properties": {
        "service": {
            "type": "object",
            "properties": {
                "refNumber": {"type": "string"},
                "item": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {"itemnumber": {"type": "string"}},
                        "required": ["itemnumber"]
                    }
                }
            },
            "required": ["refNumber", "item"]
        }
    }
}
validate(instance=yourJSON, schema=schema)
This example is not tested, but you can get the idea.
Link to jsonschema docs
A Flink SQL application receives data from an AWS Kinesis Data Stream. The received messages are in JSON, and their schema is expressed in JSON Schema; it contains a property which is not a primitive object, for example:
{
"$id": "https://example.com/schemas/customer",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" },
"shipping_address": { "$ref": "/schemas/address" },
"billing_address": { "$ref": "/schemas/address" }
},
"required": ["first_name", "last_name", "shipping_address", "billing_address"],
"$defs": {
"address": {
"$id": "/schemas/address",
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "$ref": "#/definitions/state" }
},
"required": ["street_address", "city", "state"],
"definitions": {
"state": { "enum": ["CA", "NY", "... etc ..."] }
}
}
}
}
I can see in the documentation that:
Currently, registered structured types are not supported. Thus, they
cannot be stored in a catalog or referenced in a CREATE TABLE DDL.
So if I cannot use CREATE TABLE in order to create an input table representing the stream of data my application is receiving, how should I handle the stream of data? Can I even use Flink SQL at all?
NOTE: I need to write my application in Python.
I am trying to validate a JSON file using the schema listed below, but I can add any additional fields and validation still passes. What am I doing wrong, and why?
Sample JSON Data
{
"npcs":
[
{
"id": 0,
"name": "Pilot Alpha",
"isNPC": true,
"race": "1e",
"testNotValid": false
},
{
"id": 1,
"name": "Pilot Beta",
"isNPC": true,
"race": 1
}
]
}
JSON Schema
I have set "required" and "additionalProperties" so I thought the validation would fail....
FileSchema = {
"definitions":
{
"NpcEntry":
{
"properties":
{
"id": { "type": "integer" },
"name": { "type" : "string" },
"isNPC": { "type": "boolean" },
"race": { "type" : "integer" }
},
"required": [ "id", "name", "isNPC", "race" ],
"additionalProperties": False
}
},
"type": "object",
"required": [ "npcs" ],
"additionalProperties": False,
"properties":
{
"npcs":
{
"type": "array",
"npcs": { "$ref": "#/definitions/NpcEntry" }
}
}
}
The JSON file and schema are processed using the jsonschema package for Python, (I am using python 3.7 on a Mac).
The method I use to read and validate is below; I have removed a lot of the general validation to make the code as short and usable as possible:
import json
import jsonschema
def _ReadJsonfile(self, filename, schemaSystem, fileType):
with open(filename) as fileHandle:
fileContents = fileHandle.read()
jsonData = json.loads(fileContents)
try:
jsonschema.validate(instance=jsonData, schema=schemaSystem)
except jsonschema.exceptions.ValidationError as ex:
print(f"JSON schema validation failed for file '{filename}'")
return None
return jsonData
at: "npcs": { "$ref": "#/definitions/NpcEntry" }
change "npcs" to "items". npcs is not a valid keyword so it is ignored. The only validation that is happening is at the top level, verifying that the data is an object and that the one property is an array.
I am using Python to dereference JSON across two or more files.
Something like below:
content of file1 (Primary.json):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://hcmdevblobsa.blob.core.windows.net/order/v2/order.json",
"title": "LCBO order schema",
"description": "Canonical order structure describing various order types",
"definitions": {
"addressDetail": {
"description": "Address Information",
"type": "object",
"properties": {
"name": {
"type": ["string","null"],
"minLength": 1,
"maxLength": 64
},
"age": {
"$ref": "secondary.json#/properties/age"
}
}
}
}
}
And the content of file2 (Secondary.json):
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://hcmdevblobsa.blob.core.windows.net/enumerations.json",
"title": "Enumerations used for JSON schemas",
"description": "Catalog of allowed values for schema properties",
"properties": {
"age": {
"type": "integer"
}
}
}
My idea is to use jsonref. I tried this library from the answer
json reference extraction in python
but in that example the reference is within the same file, like this:
json_str = """{"real": [1, 2, 3, 4], "ref": {"$ref": "#/real"}}"""
data = jsonref.loads(json_str)
but in my case, the reference is in another file. So I tried to merge the two files with jsonmerge and then use jsonref, with the code below:
import json
import jsonmerge
import jsonref

head = json.load(open('AltPrimary.json'))
tail = json.load(open('Secondary.json'))
result = jsonmerge.merge(head, tail)
final = jsonref.loads(json.dumps(result))
This errors out in jsonref.loads because it doesn't know that the second part (after merging) came from 'Secondary.json'. So it errors out while reading:
"$ref": "secondary.json#/properties/age"
I tried nesting the contents of 'Secondary.json' under a 'Secondary' key, like:
"Secondary" :{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://hcmdevblobsa.blob.core.windows.net/enumerations.json",
"title": "Enumerations used for JSON schemas",
"description": "Catalog of allowed values for schema properties",
"properties": {
"age": {
"type": "integer"
}
}
}
but during validation it failed. In the real world I may need to dereference from multiple files, like below:
"person": {
"$ref": "schemas/people/Bruce-Wayne.json"
},
"place": {
"$ref": "schemas/places.yaml#/definitions/Gotham-City"
},
It would be helpful if anyone has any thoughts on this. Many thanks.
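For what it's worth, a hedged sketch of one direction: jsonref accepts a base_uri, and a file:// base URI makes relative $refs load from sibling files on disk (treat the exact behaviour as an assumption and check the jsonref docs):
import jsonref
from pathlib import Path

# Hypothetical paths; both schema files are assumed to live in the same folder.
primary_path = Path("Primary.json").resolve()
with primary_path.open() as f:
    # base_uri makes relative $refs (e.g. "secondary.json#/properties/age")
    # resolve to sibling files on disk instead of over HTTP. Note the filename
    # in the $ref must match on a case-sensitive filesystem.
    document = jsonref.load(f, base_uri=primary_path.as_uri())

print(document["definitions"]["addressDetail"]["properties"]["age"])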
I'm trying to validate a json file against a schema using python and jsonschema module. My schema is made up from a list of schemas, one of them has definitions of basic elements and the rest are collections of these elements and other objects.
I can't find the documentation for the function which loads a list of schemas so that I can validate using it. I tried separating the schemas into a dictionary and calling the appropriate one on a JSON object, but that doesn't work since they cross-reference each other.
How do I load/assemble all schemas into one for validation?
Part of the schema I'm trying to load:
[{
"definitions": {
"location": {
"required": [
"name",
"country"
],
"additionalProperties": false,
"properties": {
"country": {
"pattern": "^[A-Z]{2}$",
"type": "string"
},
"name": {
"type": "string"
}
},
"type": "object"
}
},
"required": [
"type",
"content"
],
"additionalProperties": false,
"properties": {
"content": {
"additionalProperties": false,
"type": "object"
},
"type": {
"type": "string"
}
},
"type": "object",
"title": "base",
"$schema": "http://json-schema.org/draft-04/schema#"
},
{
"properties": {
"content": {
"required": [
"address"
],
"properties": {
"address": {
"$ref": "#/definitions/location"
}
},
"type": {
"pattern": "^person$"
}
}
}
}]
And the json object would look something like this:
{
"type":"person",
"content":{
"address": {
"country": "US",
"name" : "1,Street,City,State,US"
}
}
}
You can only validate against one schema at a time, but that schema can reference ($ref) external schemas. These references are usually URIs that can be used to GET the schema. A filepath might work too if your schemas are not public. Using a fixed up version of your example, this would look something like this ...
http://your-domain.com/schema/person
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Person",
"allOf": [{ "$ref": "http://your-domain.com/schema/base#" }],
"properties": {
"type": { "enum": ["person"] },
"content": {
"properties": {
"address": { "$ref": "http://your-domain.com/schema/base#/definitions/location" }
},
"required": ["address"],
"additionalProperties": false
}
}
}
http://your-domain.com/schema/base
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "base",
"type": "object",
"properties": {
"content": { "type": "object" },
"type": { "type": "string" }
},
"required": ["type", "content"],
"additionalProperties": false,
"definitions": {
"location": {
"type": "object",
"properties": {
"country": {
"type": "string",
"pattern": "^[A-Z]{2}$"
},
"name": { "type": "string" }
},
"required": ["name", "country"],
"additionalProperties": false
}
}
}
Some documentation that might be useful
https://python-jsonschema.readthedocs.org/en/latest/validate/#the-validator-interface
https://python-jsonschema.readthedocs.org/en/latest/references/
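For completeness, a minimal sketch of wiring these two schemas up locally with python-jsonschema, assuming they have been loaded into dicts named person_schema and base_schema (those names are placeholders):
from jsonschema import Draft4Validator, RefResolver

# Map each schema's URI to its dict so the $refs never hit the network.
store = {
    "http://your-domain.com/schema/person": person_schema,
    "http://your-domain.com/schema/base": base_schema,
}
resolver = RefResolver("http://your-domain.com/schema/person", person_schema, store=store)
validator = Draft4Validator(person_schema, resolver=resolver)
validator.validate({
    "type": "person",
    "content": {"address": {"country": "US", "name": "1,Street,City,State,US"}}
})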
Instead of hand coding a single schema from all your schemata, you can create a small schema which refers to the other schema files. This way you can use multiple existing JSONschema files and validate against them in combination:
import yaml
import jsonschema

A_yaml = """
id: http://foo/a.json
type: object
properties:
  prop:
    $ref: "./num_string.json"
"""

num_string_yaml = """
id: http://foo/num_string.json
type: string
pattern: ^[0-9]*$
"""

A = yaml.safe_load(A_yaml)
num_string = yaml.safe_load(num_string_yaml)

# Preload both schemas into the resolver's store, keyed by the URIs that the
# $refs will resolve to.
resolver = jsonschema.RefResolver("", None,
    store={
        "http://foo/a.json": A,
        "http://foo/num_string.json": num_string,
    })
validator = jsonschema.Draft4Validator(A, resolver=resolver)

try:
    validator.validate({"prop": "1234"})
    print("Properly accepted object")
except jsonschema.exceptions.ValidationError:
    print("Failed to accept object")

try:
    validator.validate({"prop": "12d34"})
    print("Failed to reject object")
except jsonschema.exceptions.ValidationError:
    print("Properly rejected object")
Note that you may want to combine the external schemas using one of the schema combinators oneOf, allOf, or anyOf, like so:
[A.yaml]
oneOf:
- $ref: "sub-schema1.json"
- $ref: "sub-schema2.json"
- $ref: "sub-schema3.json"