We're currently building an in-house service to manage our users. Our challenge is syncing to multiple third-party systems that use disparate formats (though all are JSON).
As an example, our User Schema could look like this:
{
"title": "User Schema",
"type": "object",
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": ["firstName", "lastName"]
}
At a later point, we need to transform this data for several other vendors, each of which might use all or only some of the fields.
For example, vendor 1:
{
"first_name": {
"type": "string"
},
"last_name": {
"type": "string"
}
}
Vendor 2:
{
"fname": {
"type": "string"
},
"lname": {
"type": "string"
}
}
We'd need to map firstName and lastName to each vendor's alternative format above. I feel like we should be able to just shove the mapping into the JSON schema and do the transformations with Python easily enough.
It's basically just data transformation. Is there a standard format or package in Python that can assist with the transformation?
I'm thinking something like the below would be a starting point, but I'm unsure.
{
"title": "User Schema",
"type": "object",
"properties": {
"firstName": {
"type": "string",
"vendorMap": {
"vendor1": "first_name",
"vendor2": "fname"
}
},
"lastName": {
"type": "string"
},
"email": {
"type": "string"
}
},
"required": ["firstName", "lastName"]
}
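If the mapping lives in the schema as proposed, a small function can walk "properties" and rename each field for a given vendor. The following is a minimal sketch, assuming the custom "vendorMap" keyword is added to every property that a vendor uses (it is not a standard JSON Schema keyword; validators generally ignore unknown keywords). The function name to_vendor is hypothetical. Fields without a mapping for the vendor are dropped, on the assumption that the vendor would not accept them.

```python
from typing import Any

# Hypothetical schema: "vendorMap" is a custom, non-standard keyword that
# records each vendor's field name for a property.
USER_SCHEMA: dict[str, Any] = {
    "title": "User Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string",
            "vendorMap": {"vendor1": "first_name", "vendor2": "fname"},
        },
        "lastName": {
            "type": "string",
            "vendorMap": {"vendor1": "last_name", "vendor2": "lname"},
        },
        "email": {"type": "string"},
    },
    "required": ["firstName", "lastName"],
}


def to_vendor(user: dict[str, Any], vendor: str,
              schema: dict[str, Any] = USER_SCHEMA) -> dict[str, Any]:
    """Rename a user's fields to the vendor's names using each property's vendorMap.

    Properties with no mapping for this vendor, or missing from the user, are skipped.
    """
    out: dict[str, Any] = {}
    for field, spec in schema["properties"].items():
        vendor_name = spec.get("vendorMap", {}).get(vendor)
        if vendor_name is not None and field in user:
            out[vendor_name] = user[field]
    return out


user = {"firstName": "John", "lastName": "Smith", "email": "john@example.com"}
print(to_vendor(user, "vendor2"))  # {'fname': 'John', 'lname': 'Smith'}
```

Keeping the mapping next to each property means adding a vendor only touches the schema, not the code.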
You could create a dictionary for each vendor and their mappings:
transformers = {'v2': {
'fname': 'first_name',
'lname': 'last_name'}}
vendor = 'v2'
# Sample data from vendor 'v2'.
v2 = {
"fname": "John",
"lname": "Smith",
"email": "john#example.com",
"random": "randomly keyed data"
}
# Convert data into standard form; keys without a mapping are kept unchanged.
mapping = transformers.get(vendor, {})
transformed_data = {mapping.get(k, k): v for k, v in v2.items()}
>>> transformed_data
{'first_name': 'John',
 'last_name': 'Smith',
 'email': 'john#example.com',
 'random': 'randomly keyed data'}
If ordering is important on Python versions older than 3.7, consider using an OrderedDict; since Python 3.7, regular dicts preserve insertion order.
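The question also needs the opposite direction (internal format to each vendor). One way, assuming every vendor mapping is one-to-one, is to invert the same per-vendor dictionary. In this sketch the standard names are taken from the question's User Schema (firstName, lastName), and the helper names vendor_to_standard and standard_to_vendor are illustrative only.

```python
from typing import Any

# Per-vendor mapping from the vendor's field name to the standard field name.
transformers: dict[str, dict[str, str]] = {
    "v1": {"first_name": "firstName", "last_name": "lastName"},
    "v2": {"fname": "firstName", "lname": "lastName"},
}


def vendor_to_standard(record: dict[str, Any], vendor: str) -> dict[str, Any]:
    """Vendor -> standard; keys without a mapping pass through unchanged."""
    mapping = transformers.get(vendor, {})
    return {mapping.get(k, k): v for k, v in record.items()}


def standard_to_vendor(record: dict[str, Any], vendor: str) -> dict[str, Any]:
    """Standard -> vendor, by inverting the (assumed one-to-one) mapping.

    Only fields the vendor knows about are emitted.
    """
    inverse = {std: ven for ven, std in transformers.get(vendor, {}).items()}
    return {inverse[k]: v for k, v in record.items() if k in inverse}


user = {"firstName": "John", "lastName": "Smith", "email": "john@example.com"}
print(standard_to_vendor(user, "v1"))  # {'first_name': 'John', 'last_name': 'Smith'}
```

If a vendor ever maps two of its fields to the same standard field, the inversion silently keeps only one of them, so that case would need explicit handling.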
Related
I am trying to validate JSON for required fields using Python. Currently I do it manually by iterating through the JSON and reading it. However, I am looking for a library or generic solution that handles all scenarios.
For example, I want to check whether a particular attribute is present in every item of a list.
Here is the sample json which I am trying to validate.
{
"service": {
"refNumber": "abc",
"item": [{
"itemnumber": "1",
"itemloc": "uk"
}, {
"itemnumber": "2",
"itemloc": "us"
}]
}
}
I want to validate that refNumber is present and that itemnumber is present in every list item.
A JSON Schema is a way to define the structure of JSON.
There are Python packages, such as jsonschema, that can use a JSON Schema to validate JSON.
The JSON Schema for your example would look approximately like this:
{
"type": "object",
"properties": {
"service": {
"type": "object",
"properties": {
"refNumber": {
"type": "string"
},
"item": {
"type": "array",
"items": {
"type": "object",
"properties": {
"itemnumber": {
"type": "string"
},
"itemloc": {
"type": "string"
}
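},
"required": ["itemnumber"]
}
}
},
"required": ["refNumber", "item"]
}
},
"required": ["service"]
}
i.e., an object containing service, which itself must contain a refNumber and a list of items, each of which must have an itemnumber. The "required" keywords are what enforce presence; "properties" on its own only checks keys that happen to be present.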
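A minimal sketch of running such a schema with the jsonschema package, with "required" included so that missing fields are reported. It uses Draft7Validator.iter_errors to collect every error rather than stopping at the first; the helper name errors_for is illustrative.

```python
from jsonschema import Draft7Validator

# Schema for the example: "service" needs a "refNumber" and an "item" array,
# and every entry of "item" needs an "itemnumber".
SCHEMA = {
    "type": "object",
    "properties": {
        "service": {
            "type": "object",
            "properties": {
                "refNumber": {"type": "string"},
                "item": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "itemnumber": {"type": "string"},
                            "itemloc": {"type": "string"},
                        },
                        "required": ["itemnumber"],
                    },
                },
            },
            "required": ["refNumber", "item"],
        }
    },
    "required": ["service"],
}

validator = Draft7Validator(SCHEMA)


def errors_for(instance):
    """Return 'path: message' for every validation error (empty list if valid)."""
    return [
        f"{'/'.join(map(str, e.absolute_path)) or '<root>'}: {e.message}"
        for e in validator.iter_errors(instance)
    ]


good = {"service": {"refNumber": "abc", "item": [
    {"itemnumber": "1", "itemloc": "uk"},
    {"itemnumber": "2", "itemloc": "us"},
]}}
bad = {"service": {"item": [{"itemloc": "uk"}]}}  # no refNumber, no itemnumber

print(errors_for(good))  # []
for msg in errors_for(bad):
    print(msg)
```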
Since I don't have enough reputation to add a comment, I will post this as an answer.
First, I have to say I don't program in Python.
According to my Google search, there is a jsonschema module available for Python.
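from jsonschema import validate
# "service" must be an object with a string "refNumber" and an array "item"
# in which every entry has an "itemnumber".
schema = {
    "type": "object",
    "properties": {
        "service": {
            "type": "object",
            "properties": {
                "refNumber": {"type": "string"},
                "item": {
                    "type": "array",
                    "items": {"type": "object", "required": ["itemnumber"]}
                }
            },
            "required": ["refNumber", "item"]
        }
    },
    "required": ["service"]
}
your_json = {"service": {"refNumber": "abc", "item": [{"itemnumber": "1", "itemloc": "uk"}]}}
# Raises jsonschema.exceptions.ValidationError if your_json does not match the schema.
validate(instance=your_json, schema=schema)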
This example is not tested, but it should give you the idea.
See the jsonschema documentation for details.
A Flink SQL application receives data from an AWS Kinesis Data Stream. The received messages are JSON, their schema is expressed in JSON Schema, and that schema contains a property which is not of a primitive type, for example:
{
"$id": "https://example.com/schemas/customer",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
"properties": {
"first_name": { "type": "string" },
"last_name": { "type": "string" },
"shipping_address": { "$ref": "/schemas/address" },
"billing_address": { "$ref": "/schemas/address" }
},
"required": ["first_name", "last_name", "shipping_address", "billing_address"],
"$defs": {
"address": {
"$id": "/schemas/address",
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"street_address": { "type": "string" },
"city": { "type": "string" },
"state": { "$ref": "#/definitions/state" }
},
"required": ["street_address", "city", "state"],
"definitions": {
"state": { "enum": ["CA", "NY", "... etc ..."] }
}
}
}
}
I can see in the documentation that:
Currently, registered structured types are not supported. Thus, they
cannot be stored in a catalog or referenced in a CREATE TABLE DDL.
So if I cannot use CREATE TABLE in order to create an input table representing the stream of data my application is receiving, how should I handle the stream of data? Can I even use Flink SQL at all?
NOTE: I need to write my application in Python.
Please see the JSON schema below:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "object",
"properties": {
"firstname": {
"type": "string"
},
"lastname": {
"type": "string"
},
"post_sql": {
"type": "object",
"properties": {
"datasets": {
"type": "array",
"items": {
"type": "object",
"properties": {
"firstname": {
"type": "string"
},
"lastname": {
"type": "string"
},
"access": {
"type": "array",
"items": {
"type": "string"
}
}
}
}
}
}
}
}
}
I want to compare the top-level firstname and lastname with the firstname and lastname of each entry inside post_sql->datasets. If they are the same, that user gets all access rights (["select","insert","update","delete"]); otherwise, the different user gets only ["select"] access.
Sample data:
{
"registration": {
"firstname": "john",
"lastname": "dharman",
"post_sql": {
"datasets": [
{
"firstname": "john",
"lastname": "dharman",
"access": [
"select","insert","update","delete"
]
},
{
"firstname": "jenny",
"lastname": "shein",
"access": [
"select","insert","update","delete"
]
}
]
}
}
}
In the example above,
"firstname": "john",
"lastname": "dharman"
are the same in the top-level object and in post_sql->datasets:
"post_sql": {
"datasets": [
{
"firstname": "john",
"lastname": "dharman",
"access": [
"select","insert","update","delete"
]
},
So john should get all access rights. But if firstname and lastname are not the same (like jenny in the data above), only ["select"] should be given. In the example above, datasets has a second entry:
{
"firstname": "jenny",
"lastname": "shein",
"access": [
"select","insert","update","delete"
]
}
So I want an if/else in my JSON schema that compares firstname and lastname in each dataset with the top-level values: if both are the same, that user should have the full access array
[
"select","insert","update","delete"
]
else only
["select"]
I tried putting if/else inside datasets, but somehow it did not work. Please help with this. We just need if/then/else statements; it's just that this JSON schema contains somewhat deeply nested objects in one schema.
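As far as I know, standard JSON Schema has no keyword for comparing one value in the instance against another value elsewhere in the same instance (the proposed "$data" reference is not part of the specification), so if/then/else cannot express "same name as the top level". A common workaround is to validate the structure with the schema and check this cross-field rule in code. A minimal plain-Python sketch, following the sample data (which nests everything under "registration"); the names check_access, expected_access, ALL_ACCESS and SELECT_ONLY are illustrative.

```python
from typing import Any

ALL_ACCESS = ["select", "insert", "update", "delete"]
SELECT_ONLY = ["select"]


def expected_access(owner: dict[str, Any], dataset: dict[str, Any]) -> list[str]:
    """Full access if the dataset names the top-level user, otherwise select only."""
    same_user = (dataset.get("firstname"), dataset.get("lastname")) == (
        owner.get("firstname"), owner.get("lastname"))
    return ALL_ACCESS if same_user else SELECT_ONLY


def check_access(doc: dict[str, Any]) -> list[str]:
    """Return one message per dataset whose "access" list breaks the rule."""
    reg = doc["registration"]
    problems = []
    for i, ds in enumerate(reg.get("post_sql", {}).get("datasets", [])):
        want = expected_access(reg, ds)
        got = ds.get("access", [])
        if sorted(got) != sorted(want):  # order-insensitive comparison
            problems.append(f"datasets[{i}] ({ds.get('firstname')} {ds.get('lastname')}): "
                            f"expected {want}, got {got}")
    return problems


sample = {"registration": {"firstname": "john", "lastname": "dharman", "post_sql": {"datasets": [
    {"firstname": "john", "lastname": "dharman", "access": ["select", "insert", "update", "delete"]},
    {"firstname": "jenny", "lastname": "shein", "access": ["select", "insert", "update", "delete"]},
]}}}

for problem in check_access(sample):
    print(problem)  # one line, for jenny (datasets[1])
```

If the goal is to assign access rather than reject the document, the same expected_access helper can be used to overwrite each dataset's "access" list instead.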
I have a JSON schema with which I want to validate some data, using python and the jsonschema module. However, this doesn't quite work as expected, as some of the accepted data doesn't appear valid at all (to me and the purpose of my application). Sadly, the schema is provided, so I can't change the schema itself - at least not manually.
This is a shortened version of the schema ('schema.json' in code below):
{
"type": "object",
"allOf": [
{
"type": "object",
"allOf": [
{
"type": "object",
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
}
}
},
{
"type": "object",
"properties": {
"language": {
"type": "integer"
}
}
}
]
},
{
"type": "object",
"properties": {
"addressArray": {
"type": "array",
"items": {
"type": "object",
"properties": {
"streetNumber": {
"type": "string"
},
"street": {
"type": "string"
},
"city": {
"type": "string"
}
}
}
}
}
}
]
}
This is an example of what should be a valid instance ('person.json' in code below):
{
"firstName": "Sherlock",
"lastName": "Holmes",
"language": 1,
"addresses": [
{
"streetNumber": "221B",
"street": "Baker Street",
"city": "London"
}
]
}
This is an example of what should be considered invalid ('no_person.json' in code below):
{
"name": "eggs",
"colour": "white"
}
And this is the code I used for validating:
from json import load
from jsonschema import Draft7Validator, exceptions
with open('schema.json') as f:
    schema = load(f)
with open('person.json') as f:
    person = load(f)
with open('no_person.json') as f:
    no_person = load(f)
validator = Draft7Validator(schema)
try:
    validator.validate(person)
    print("person.json is valid")
except exceptions.ValidationError:
    print("person.json is invalid")
try:
    validator.validate(no_person)
    print("no_person.json is valid")
except exceptions.ValidationError:
    print("no_person.json is invalid")
Result:
person.json is valid
no_person.json is valid
I expected no_person.json to be invalid. What can be done so that only data such as person.json validates successfully? Thank you very much for your help; I'm very new to this (I spent ages searching for an answer).
Here is a working schema. Pay attention to "required": a key listed under "properties" is only checked when it is present, so a missing key is simply skipped. That is why an object with none of the expected keys still passes without "required":
{
"type": "object",
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
},
"language": {
"type": "integer"
},
"addresses": {
"type": "array",
"items": {
"type": "object",
"properties": {
"streetNumber": {
"type": "string"
},
"street": {
"type": "string"
},
"city": {
"type": "string"
}
},
"required": [
"streetNumber",
"street",
"city"
]
}
}
},
"required": [
"firstName",
"lastName",
"language",
"addresses"
]
}
With this schema I get:
person.json is valid
no_person.json is invalid
If you have a more complex response structure (arrays of objects which contain objects, etc.), let me know.
I'm trying to validate a JSON file against a schema using Python and the jsonschema module. My schema is made up of a list of schemas: one of them has definitions of basic elements, and the rest are collections of these elements and other objects.
I can't find documentation for a function that loads a list of schemas so that I can validate against it. I tried separating the schemas into a dictionary and calling the appropriate one on a jsonObject, but that doesn't work since they cross-reference each other.
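Since the provided schema can't be edited by hand, one option (a sketch, not something the schema's author prescribes) is to leave it untouched and combine it programmatically with a small extra schema that only adds "required", using "allOf": an instance must then satisfy both. The list of mandatory fields below is an assumption based on person.json. Note also that the provided schema names the array "addressArray" while person.json uses "addresses", so the address entries are not actually checked by that part of the schema.

```python
from jsonschema import Draft7Validator

# The provided (shortened) schema, left unchanged; normally loaded from schema.json.
provided = {
    "type": "object",
    "allOf": [
        {
            "type": "object",
            "allOf": [
                {"type": "object", "properties": {
                    "firstName": {"type": "string"},
                    "lastName": {"type": "string"},
                }},
                {"type": "object", "properties": {"language": {"type": "integer"}}},
            ],
        },
        {"type": "object", "properties": {
            "addressArray": {"type": "array", "items": {"type": "object", "properties": {
                "streetNumber": {"type": "string"},
                "street": {"type": "string"},
                "city": {"type": "string"},
            }}},
        }},
    ],
}

# Assumed list of mandatory fields; adjust to what your application needs.
extra = {"required": ["firstName", "lastName", "language"]}

# The instance must satisfy the original schema AND the extra constraints.
combined = {"allOf": [provided, extra]}
validator = Draft7Validator(combined)

person = {"firstName": "Sherlock", "lastName": "Holmes", "language": 1,
          "addresses": [{"streetNumber": "221B", "street": "Baker Street", "city": "London"}]}
no_person = {"name": "eggs", "colour": "white"}

print(Draft7Validator(provided).is_valid(no_person))  # True: original schema alone
print(validator.is_valid(person))                     # True
print(validator.is_valid(no_person))                  # False
```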
How do I load/assemble all schemas into one for validation?
Part of the schema I'm trying to load:
[{
"definitions": {
"location": {
"required": [
"name",
"country"
],
"additionalProperties": false,
"properties": {
"country": {
"pattern": "^[A-Z]{2}$",
"type": "string"
},
"name": {
"type": "string"
}
},
"type": "object"
}
},
"required": [
"type",
"content"
],
"additionalProperties": false,
"properties": {
"content": {
"additionalProperties": false,
"type": "object"
},
"type": {
"type": "string"
}
},
"type": "object",
"title": "base",
"$schema": "http://json-schema.org/draft-04/schema#"
},
{
"properties": {
"content": {
"required": [
"address"
],
"properties": {
"address": {
"$ref": "#/definitions/location"
}
}
},
"type": {
"pattern": "^person$"
}
}
}]
And the json object would look something like this:
{
"type":"person",
"content":{
"address": {
"country": "US",
"name" : "1,Street,City,State,US"
}
}
}
You can only validate against one schema at a time, but that schema can reference ($ref) external schemas. These references are usually URIs that can be used to GET the schema. A file path might work too if your schemas are not public. Using a fixed-up version of your example, this would look something like this:
http://your-domain.com/schema/person
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Person",
"allOf": [{ "$ref": "http://your-domain.com/schema/base#" }],
"properties": {
"type": { "enum": ["person"] },
"content": {
"properties": {
"address": { "$ref": "http://your-domain.com/schema/base#/definitions/location" }
},
"required": ["address"],
"additionalProperties": false
}
}
}
http://your-domain.com/schema/base
{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "base",
"type": "object",
"properties": {
"content": { "type": "object" },
"type": { "type": "string" }
},
"required": ["type", "content"],
"additionalProperties": false,
"definitions": {
"location": {
"type": "object",
"properties": {
"country": {
"type": "string",
"pattern": "^[A-Z]{2}$"
},
"name": { "type": "string" }
},
"required": ["name", "country"],
"additionalProperties": false
}
}
}
Some documentation that might be useful:
https://python-jsonschema.readthedocs.org/en/latest/validate/#the-validator-interface
https://python-jsonschema.readthedocs.org/en/latest/references/
Instead of hand-coding a single schema from all your schemata, you can create a small schema that refers to the other schema files. This way you can use multiple existing JSON Schema files and validate against them in combination:
import yaml
import jsonschema
A_yaml = """
id: http://foo/a.json
type: object
properties:
  prop:
    $ref: "./num_string.json"
"""
num_string_yaml = """
id: http://foo/num_string.json
type: string
pattern: ^[0-9]*$
"""
A = yaml.safe_load(A_yaml)
num_string = yaml.safe_load(num_string_yaml)
# Map each schema's id to its contents so "$ref"s resolve without network access.
# (RefResolver is deprecated in jsonschema >= 4.18, but still works there.)
resolver = jsonschema.RefResolver(base_uri="http://foo/a.json", referrer=A,
    store={
        "http://foo/a.json": A,
        "http://foo/num_string.json": num_string,
    })
validator = jsonschema.Draft4Validator(A, resolver=resolver)
try:
    validator.validate({"prop": "1234"})
    print("Properly accepted object")
except jsonschema.exceptions.ValidationError:
    print("Failed to accept object")
try:
    validator.validate({"prop": "12d34"})
    print("Failed to reject object")
except jsonschema.exceptions.ValidationError:
    print("Properly rejected object")
Note that you may want to combine the external schemas using one of the schema combinators oneOf, allOf, or anyOf, like so:
[A.yaml]
oneOf:
- $ref: "sub-schema1.json"
- $ref: "sub-schema2.json"
- $ref: "sub-schema3.json"