Python JSON schema validation for array of objects - python

I am trying to validate a JSON file using the schema listed below, I can enter any additional fields, I don't understand, what I am doing wrong and why please?
Sample JSON Data
{
"npcs":
[
{
"id": 0,
"name": "Pilot Alpha",
"isNPC": true,
"race": "1e",
"testNotValid": false
},
{
"id": 1,
"name": "Pilot Beta",
"isNPC": true,
"race": 1
}
]
}
JSON Schema
I have set "required" and "additionalProperties" so I thought the validation would fail....
FileSchema = {
"definitions":
{
"NpcEntry":
{
"properties":
{
"id": { "type": "integer" },
"name": { "type" : "string" },
"isNPC": { "type": "boolean" },
"race": { "type" : "integer" }
},
"required": [ "id", "name", "isNPC", "race" ],
"additionalProperties": False
}
},
"type": "object",
"required": [ "npcs" ],
"additionalProperties": False,
"properties":
{
"npcs":
{
"type": "array",
"npcs": { "$ref": "#/definitions/NpcEntry" }
}
}
}
The JSON file and schema are processed using the jsonschema package for Python, (I am using python 3.7 on a Mac).
The method I use to read and validate is below, I have removed a lot of the general validation to make the code as short and usable as possible:
import json
import jsonschema
def _ReadJsonfile(self, filename, schemaSystem, fileType):
with open(filename) as fileHandle:
fileContents = fileHandle.read()
jsonData = json.loads(fileContents)
try:
jsonschema.validate(instance=jsonData, schema=schemaSystem)
except jsonschema.exceptions.ValidationError as ex:
print(f"JSON schema validation failed for file '{filename}'")
return None
return jsonData

at: "npcs": { "$ref": "#/definitions/NpcEntry" }
change "npcs" to "items". npcs is not a valid keyword so it is ignored. The only validation that is happening is at the top level, verifying that the data is an object and that the one property is an array.

Related

Validation Json Schema

I am trying to validate the json for required fields using python. I am doing it manually like iterating through the json reading it. Howerver i am looking for more of library / generic solution to handle all scenarios.
For example I want to check in a list, if a particular attribute is available in all the list items.
Here is the sample json which I am trying to validate.
{
"service": {
"refNumber": "abc",
"item": [{
"itemnumber": "1",
"itemloc": "uk"
}, {
"itemnumber": "2",
"itemloc": "us"
}]
}
}
I want to validate if I have refNumber and itemnumber in all the list items.
A JSON Schema is a way to define the structure of JSON.
There are some accompanying python packages which can use a JSON schema to validate JSON (jsonschema).
The JSON Schema for your example would look approximately like this:
{
"type": "object",
"properties": {
"service": {
"type": "object",
"properties": {
"refNumber": {
"type": "string"
},
"item": {
"type": "array",
"items": {
"type": "object",
"properties": {
"itemnumber": {
"type": "string"
},
"itemloc": {
"type": "string"
}
}
}
}
}
}
}
}
i.e., an object containing service, which itself contains a refNumber and a list of items.
Since i dont have enough rep to add a comment i will post this answer.
First i have to say i dont program with python.
According to my google search, you have a jsonschema module available for Python.
from jsonschema import validate
schema = {
"type": "object",
"properties": {
"service": {"object": {
"refNumber": {"type" : "string"},
"item: {"array": []}
},
"required": ["refNumber"]
},
},
}
validate(instance=yourJSON, schema=yourValidationSchema)
This example is not tested, but you can get some idea,
Link to jsonschema docs

how to limit the additions field in jsonschema where they is undeclare

In json object some field is not what i need,but they still through my validation,like below.
The jsonchemaļ¼š
my_schema = {
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"required": ["endpoint", "bucket"],
"properties": {
"endpoint": {
"type": "string"
},
"bucket": {
"type": "string",
"minLength": 1
},
"dir": {
"type": "string",
"minLength": 1
}
}
}
}
The jsons:
one = [{
"endpoint": "",
"bucket": "2020-10-27 16:20:24",
"dir": "/tmp"
}]
two = [{
"endpoint": "",
"bucket": "2020-10-27 16:20:24",
"dir": "/tmp",
"id": "leaking"
}]
And i run this code in python2.7:
import jsonschema
from jsonschema import validate
print(jsonschema.__version__)
validate(instance=one, schema=my_schema)
validate(instance=two, schema=my_schema) # how can i let this line raise a Exception?
print("validated one, two.")
Output:
3.2.0
passed one, two. # this not my intention
The foregoing json two have a field "id" need to be excluded, and any fields not declared in "properties" are treated the same way.
Use additionalProperties
The additionalProperties keyword is used to control the handling of extra stuff, that is, properties whose names are not listed in the properties keyword. By default any additional properties are allowed.
The additionalProperties keyword may be either a boolean or an object. If additionalProperties is a boolean and set to false, no additional properties will be allowed.
From docs
Your schema would look like:
{
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"required": ["endpoint", "bucket"],
"properties": {
"endpoint": {
"type": "string"
},
"bucket": {
"type": "string",
"minLength": 1
},
"dir": {
"type": "string",
"minLength": 1
}
},
"additionalProperties": false
}
}

How to set up local file references in python-jsonschema document?

I have a set of jsonschema compliant documents. Some documents contain references to other documents (via the $ref attribute). I do not wish to host these documents such that they are accessible at an HTTP URI. As such, all references are relative. All documents live in a local folder structure.
How can I make python-jsonschema understand to properly use my local file system to load referenced documents?
For instance, if I have a document with filename defs.json containing some definitions. And I try to load a different document which references it, like:
{
"allOf": [
{"$ref":"defs.json#/definitions/basic_event"},
{
"type": "object",
"properties": {
"action": {
"type": "string",
"enum": ["page_load"]
}
},
"required": ["action"]
}
]
}
I get an error RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/defs.json'>
It may be important that I'm on a linux box.
(I'm writing this as a Q&A because I had a hard time figuring this out and observed other folks having trouble too.)
I had the hardest time figuring out how to resolve against a set of schemas that $ref each other without going to the network. It turns out the key is to create the RefResolver with a store that is a dict which maps from url to schema.
import json
from jsonschema import RefResolver, Draft7Validator
address="""
{
"$id": "https://example.com/schemas/address",
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"],
"additionalProperties": false
}
"""
customer="""
{
"$id": "https://example.com/schemas/customer",
"type": "object",
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" },
"shipping_address": { "$ref": "/schemas/address" },
"billing_address": { "$ref": "/schemas/address" }
},
"required": ["first_name", "last_name", "shipping_address", "billing_address"],
"additionalProperties": false
}
"""
data = """
{
"first_name": "John",
"last_name": "Doe",
"shipping_address": {
"street_address": "1600 Pennsylvania Avenue NW",
"city": "Washington",
"state": "DC"
},
"billing_address": {
"street_address": "1st Street SE",
"city": "Washington",
"state": "DC"
}
}
"""
address_schema = json.loads(address)
customer_schema = json.loads(customer)
schema_store = {
address_schema['$id'] : address_schema,
customer_schema['$id'] : customer_schema,
}
resolver = RefResolver.from_schema(customer_schema, store=schema_store)
validator = Draft7Validator(customer_schema, resolver=resolver)
jsonData = json.loads(data)
validator.validate(jsonData)
The above was built with jsonschema==4.9.1.
You must build a custom jsonschema.RefResolver for each schema which uses a relative reference and ensure that your resolver knows where on the filesystem the given schema lives.
Such as...
import os
import json
from jsonschema import Draft4Validator, RefResolver # We prefer Draft7, but jsonschema 3.0 is still in alpha as of this writing
abs_path_to_schema = '/path/to/schema-doc-foobar.json'
with open(abs_path_to_schema, 'r') as fp:
schema = json.load(fp)
resolver = RefResolver(
# The key part is here where we build a custom RefResolver
# and tell it where *this* schema lives in the filesystem
# Note that `file:` is for unix systems
schema_path='file:{}'.format(abs_path_to_schema),
schema=schema
)
Draft4Validator.check_schema(schema) # Unnecessary but a good idea
validator = Draft4Validator(schema, resolver=resolver, format_checker=None)
# Then you can...
data_to_validate = `{...}`
validator.validate(data_to_validate)
EDIT-1
Fixed a wrong reference ($ref) to base schema.
Updated the example to use the one from the docs: https://json-schema.org/understanding-json-schema/structuring.html
EDIT-2
As pointed out in the comments, in the following I'm using the following imports:
from jsonschema import validate, RefResolver
from jsonschema.validators import validator_for
This is just another version of #Daniel's answer -- which was the one correct for me. Basically, I decided to define the $schema in a base schema. Which then release the other schemas and makes for a clear call when instantiating the resolver.
The fact that RefResolver.from_schema() gets (1) some schema and also (2) a schema-store was not very clear to me whether the order and which "some" schema were relevant here. And so the structure you see below.
I have the following:
base.schema.json:
{
"$schema": "http://json-schema.org/draft-07/schema#"
}
definitions.schema.json:
{
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "type": "string" }
},
"required": ["street_address", "city", "state"]
}
address.schema.json:
{
"type": "object",
"properties": {
"billing_address": { "$ref": "definitions.schema.json#" },
"shipping_address": { "$ref": "definitions.schema.json#" }
}
}
I like this setup for two reasons:
Is a cleaner call on RefResolver.from_schema():
base = json.loads(open('base.schema.json').read())
definitions = json.loads(open('definitions.schema.json').read())
schema = json.loads(open('address.schema.json').read())
schema_store = {
base.get('$id','base.schema.json') : base,
definitions.get('$id','definitions.schema.json') : definitions,
schema.get('$id','address.schema.json') : schema,
}
resolver = RefResolver.from_schema(base, store=schema_store)
Then I profit from the handy tool the library provides give you the best validator_for your schema (according to your $schema key):
Validator = validator_for(base)
And then just put them together to instantiate validator:
validator = Validator(schema, resolver=resolver)
Finally, you validate your data:
data = {
"shipping_address": {
"street_address": "1600 Pennsylvania Avenue NW",
"city": "Washington",
"state": "DC"
},
"billing_address": {
"street_address": "1st Street SE",
"city": "Washington",
"state": 32
}
}
This one will crash since "state": 32:
>>> validator.validate(data)
ValidationError: 32 is not of type 'string'
Failed validating 'type' in schema['properties']['billing_address']['properties']['state']:
{'type': 'string'}
On instance['billing_address']['state']:
32
Change that to "DC", and will validate.
Following up on the answer #chris-w provided, I wanted to do this same thing with jsonschema 3.2.0 but his answer didn't quite cover it I hope this answer helps those who are still coming to this question for help but are using a more recent version of the package.
To extend a JSON schema using the library, do the following:
Create the base schema:
base.schema.json
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
Create the extension schema
extend.schema.json
{
"allOf": [
{"$ref": "base.schema.json"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
Create your JSON file you want to test against the schema
data.json
{
"prop": "This is the property",
"extra": true
}
Create your RefResolver and Validator for the base Schema and use it to check the data
#Set up schema, resolver, and validator on the base schema
baseSchema = json.loads(baseSchemaJSON) # Create a schema dictionary from the base JSON file
relativeSchema = json.loads(relativeJSON) # Create a schema dictionary from the relative JSON file
resolver = RefResolver.from_schema(baseSchema) # Creates your resolver, uses the "$id" element
validator = Draft7Validator(relativeSchema, resolver=resolver) # Create a validator against the extended schema (but resolving to the base schema!)
# Check validation!
data = json.loads(dataJSON) # Create a dictionary from the data JSON file
validator.validate(data)
You may need to make a few adjustments to the above entries, such as not using the Draft7Validator. This should work for single-level references (children extending a base), you will need to be careful with your schemas and how you set up the RefResolver and Validator objects.
P.S. Here is a snipped that exercises the above. Try modifying the data string to remove one of the required attributes:
import json
from jsonschema import RefResolver, Draft7Validator
base = """
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
"""
extend = """
{
"allOf": [
{"$ref": "base.schema.json"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
"""
data = """
{
"prop": "This is the property string",
"extra": true
}
"""
schema = json.loads(base)
extendedSchema = json.loads(extend)
resolver = RefResolver.from_schema(schema)
validator = Draft7Validator(extendedSchema, resolver=resolver)
jsonData = json.loads(data)
validator.validate(jsonData)
My approach is to preload all schema fragments to RefResolver cache. I created a gist that illustrates this: https://gist.github.com/mrtj/d59812a981da17fbaa67b7de98ac3d4b
This is what I used to dynamically generate a schema_store from all schemas in a given directory
base.schema.json
{
"$id": "base.schema.json",
"type": "object",
"properties": {
"prop": {
"type": "string"
}
},
"required": ["prop"]
}
extend.schema.json
{
"$id": "extend.schema.json",
"allOf": [
{"$ref": "base.schema.json"},
{
"properties": {
"extra": {
"type": "boolean"
}
},
"required": ["extra"]
}
]
}
instance.json
{
"prop": "This is the property string",
"extra": true
}
validator.py
import json
from pathlib import Path
from jsonschema import Draft7Validator, RefResolver
from jsonschema.exceptions import RefResolutionError
schemas = (json.load(open(source)) for source in Path("schema/dir").iterdir())
schema_store = {schema["$id"]: schema for schema in schemas}
schema = json.load(open("schema/dir/extend.schema.json"))
instance = json.load(open("instance/dir/instance.json"))
resolver = RefResolver.from_schema(schema, store=schema_store)
validator = Draft7Validator(schema, resolver=resolver)
try:
errors = sorted(validator.iter_errors(instance), key=lambda e: e.path)
except RefResolutionError as e:
print(e)

Creating a config file to map model in json format to csv

How can I build a config file that has a python model in json format that maps to the csv column name?
My plan is when a json data comes in, it will look into this config file, map the data's field with the config's json field, then get the csv column name. Example as below:
First I have a json as below:
{
"id": {
"type": "integer",
"format": "int64"
},
"category": {
"$ref": "#/definitions/Category",
"x-scope": [
"https://petstore.swagger.io/v2/swagger.json"
]
},
"name": {
"type": "string",
"example": "doggie"
},
"photoUrls": {
"type": "array",
"xml": {
"name": "photoUrl",
"wrapped": true
},
"items": {
"type": "string"
}
},
"tags": {
"type": "array",
"xml": {
"name": "tag",
"wrapped": true
},
"items": {
"$ref": "#/definitions/Tag",
"x-scope": [
"https://petstore.swagger.io/v2/swagger.json"
]
}
},
"status": {
"type": "string",
"description": "pet status in the store",
"enum": [
"available",
"pending",
"sold"
]
}
}
Then I have csv column name as below:
id, category, name, photoUrls, tags, status
Now I want a config file which has mapping of this 2 field and probably some settings like delimiter type , etc.
How can I achieve that using python?

python jsonschema validation using schema list

I'm trying to validate a json file against a schema using python and jsonschema module. My schema is made up from a list of schemas, one of them has definitions of basic elements and the rest are collections of these elements and other objects.
I can't find the documentation for function which loads the list of schemas so that I can validate using it. I tried separating schemas into a dictionary and calling the appropriate one on a jsonObject, but that doesn't work since they cross reference each other.
How do I load/assemble all schemas into one for validation?
Part of the schema I'm trying to load:
[{
"definitions": {
"location": {
"required": [
"name",
"country"
],
"additionalProperties": false,
"properties": {
"country": {
"pattern": "^[A-Z]{2}$",
"type": "string"
},
"name": {
"type": "string"
}
},
"type": "object"
}
},
"required": [
"type",
"content"
],
"additionalProperties": false,
"properties": {
"content": {
"additionalProperties": false,
"type": "object"
},
"type": {
"type": "string"
}
},
"type": "object",
"title": "base",
"$schema": "http://json-schema.org/draft-04/schema#"
},
{
"properties": {
"content": {
"required": [
"address"
],
"properties": {
"address": {
"$ref": "#/definitions/location"
}
},
"type": {
"pattern": "^person$"
}
}
}]
And the json object would look something like this:
{
"type":"person",
"content":{
"address": {
"country": "US",
"name" : "1,Street,City,State,US"
}
}
}
You can only validate against one schema at a time, but that schema can reference ($ref) external schemas. These references are usually URIs that can be used to GET the schema. A filepath might work too if your schemas are not public. Using a fixed up version of your example, this would look something like this ...
http://your-domain.com/schema/person
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Person",
"allOf": [{ "$ref": "http://your-domain.com/schema/base#" }],
"properties": {
"type": { "enum": ["person"] },
"content": {
"properties": {
"address": { "$ref": "http://your-domain.com/schema/base#/definitions/location" }
},
"required": ["address"],
"additionalProperties": false
}
}
}
http://your-domain.com/schema/base
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "base",
"type": "object",
"properties": {
"content": { "type": "object" },
"type": { "type": "string" }
},
"required": ["type", "content"],
"additionalProperties": false,
"definitions": {
"location": {
"type": "object",
"properties": {
"country": {
"type": "string",
"pattern": "^[A-Z]{2}$"
},
"name": { "type": "string" }
},
"required": ["name", "country"],
"additionalProperties": false
}
}
}
Some documentation that might be useful
https://python-jsonschema.readthedocs.org/en/latest/validate/#the-validator-interface
https://python-jsonschema.readthedocs.org/en/latest/references/
Instead of hand coding a single schema from all your schemata, you can create a small schema which refers to the other schema files. This way you can use multiple existing JSONschema files and validate against them in combination:
import yaml
import jsonschema
A_yaml = """
id: http://foo/a.json
type: object
properties:
prop:
$ref: "./num_string.json"
"""
num_string_yaml = """
id: http://foo/num_string.json
type: string
pattern: ^[0-9]*$
"""
A = yaml.load(A_yaml)
num_string = yaml.load(num_string_yaml)
resolver = jsonschema.RefResolver("",None,
store={
"http://foo/A.json":A,
"http://foo/num_string.json":num_string,
})
validator = jsonschema.Draft4Validator(
A, resolver=resolver)
try:
validator.validate({"prop":"1234"})
print "Properly accepted object"
except jsonschema.exceptions.ValidationError:
print "Failed to accept object"
try:
validator.validate({"prop":"12d34"})
print "Failed to reject object"
except jsonschema.exceptions.ValidationError:
print "Properly rejected object"
Note that you may want to combine the external using one of the schema cominators oneOf, allOf, or anyOf to combine your schemata like so:
[A.yaml]
oneOf:
- $ref: "sub-schema1.json"
- $ref: "sub-schema2.json"
- $ref: "sub-schema3.json"

Categories

Resources