JSON schema nesting based on linked data

JSON schema nesting based on linked data - python

I have a dataset pulled from a linked data platform.
The dataset looks like this:
label          relationClass
Organization   Department
Department     Employee
I want to create a JSON Schema based on this data where the hierarchy between objects is nested.
The decomposition of the hierarchy looks something like this:
Organization
    Department
        Employee
Eventually the parsing should result in a JSON Schema looking like this:
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "organization": {
            "type": "object",
            "properties": {
                "department": {
                    "type": "object",
                    "properties": {
                        "employee": {
                            "type": "object"
                        }
                    }
                }
            }
        }
    }
}
Can someone help out with the most efficient way to achieve this?

This looks like a classic tree structure. For good performance, go over the pairs once and build a tree (a directed graph) from them, then traverse it recursively in preorder, emitting the children of each node as nested objects.
Searching for 'build tree from list of pairs' turns up the following SO question with a working answer: Given a flat list of (parent,child), create a hierarchical dictionary tree
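A minimal sketch of that approach, assuming the dataset arrives as (label, relationClass) pairs and contains no cycles. The function names and the lowercasing of labels into property names are illustrative choices, not part of the original question:

```python
import json
from collections import defaultdict

# Hypothetical input: (label, relationClass) pairs taken from the question's table.
PAIRS = [("Organization", "Department"), ("Department", "Employee")]


def build_children(pairs):
    """Map each parent label to its child labels in one pass over the pairs."""
    children = defaultdict(list)
    for parent, child in pairs:
        children[parent].append(child)
    return children


def find_roots(pairs):
    """Roots are labels that appear as a parent but never as a child."""
    child_labels = {child for _, child in pairs}
    roots = []
    for parent, _ in pairs:
        if parent not in child_labels and parent not in roots:
            roots.append(parent)
    return roots


def node_schema(label, children):
    """Preorder traversal: emit this node, then recurse into its children."""
    schema = {"type": "object"}
    kids = children.get(label, [])
    if kids:
        schema["properties"] = {
            kid.lower(): node_schema(kid, children) for kid in kids
        }
    return schema


def pairs_to_schema(pairs):
    """Wrap the root nodes in a draft-04 JSON Schema document."""
    children = build_children(pairs)
    return {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "properties": {
            root.lower(): node_schema(root, children)
            for root in find_roots(pairs)
        },
    }


if __name__ == "__main__":
    print(json.dumps(pairs_to_schema(PAIRS), indent=2))
```

Each pair is visited once while building the parent-to-children map, and each node is visited once per position in the tree during the traversal. Cyclic input would recurse forever, so real data may need a visited check.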

Related

Adding multiple components using the Jira Rest API

I'm trying to create an issue using the Jira REST API, but I'm hitting a roadblock when I try to add more than one component while creating the issue.
def create_issue(self, summary, description, priority, issue_type, component, assignee, epic_name):
    """Creates a new issue with the given parameters."""
    issue_data = {
        "fields": {
            "project": {"key": self.project},
            "summary": summary,
            "description": description,
            "priority": {"name": priority},
            "issuetype": {"name": issue_type},
            "components": {"name": component},
            "assignee": {"name": assignee}
        }
    }
Just thinking out loud, if my component parameter is a list then each element can be added like
"components": [{"name": component[0]},{"name": component[1]}]
but then how do I iterate through the list inside the JSON object? I tried using a for loop but wasn't able to implement it properly. Any help or alternative approach to solve this would be appreciated. Thanks.
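One possible way to do this, sketched under the assumption that the components arrive as a list of names: a list comprehension builds the array of {"name": ...} objects inline. The standalone function build_issue_data is hypothetical and only mirrors the fields from the snippet above:

```python
def build_issue_data(project, summary, description, priority, issue_type,
                     components, assignee):
    """Build the issue payload; `components` is a list of component names."""
    return {
        "fields": {
            "project": {"key": project},
            "summary": summary,
            "description": description,
            "priority": {"name": priority},
            "issuetype": {"name": issue_type},
            # One {"name": ...} object per component, however many there are.
            "components": [{"name": name} for name in components],
            "assignee": {"name": assignee},
        }
    }
```

The comprehension also covers a single component, as long as it is passed as a one-element list.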

Apache Spark / PySpark, defining custom JSON Schema for Dynamic Keys

I have a bunch of JSON files, and suppose each has the following structure:
{
    "fields": {
        "name": "Bob",
        "key": "bob"
    },
    "results": {
        "bob": { ... }
    }
}
For some unfortunate reason, while the structure of the JSON is fairly consistent, there is one dynamic key under "results". Defining the schema for what sits under "fields" is fairly straightforward to me.
So, for several JSON files, the final schema might be:
fieldSchema = StructField(...)
resultsSchema = StructField("results", StructType([StructField("bob", ...)]))
finalSchema = StructType([fieldSchema, resultsSchema])
The problem is this part: StructField("bob", ...)
Obviously, bob is not the key I'm looking for. This name for the StructField would ideally be some kind of wildcard character, regex pattern, or worst case, some dynamic field based on other fields.
I'm a newbie to Spark and have been scouring the documentation and historical StackOverflow posts, but I've been unable to find anything.
Long story short, I want to be able to pass some kind of wide net for the name parameter in StructField to encompass a variety of different keys, similar to a regex pattern.

Mongodb can I structure this data

Basically I am designing and developing an application in Python that runs each night, takes a website and a list of keywords, and queries the Google API to obtain the website's position for each keyword.
I want to use a NoSQL approach, and the document objects that MongoDB offers seem like the best fit; however, I'm confused about how to structure the data inside the database.
Each night new data will be generated, containing 50 keywords and their positions. I presume this will be stored in its own object, identifiable by a specific URL.
So will it be possible to query the database given a URL and a date range of, say, the past 30 or 60 days? I'm not sure whether I will be able to fetch all of the objects back.

The main requirement for that structure will be the ability to query on a daily basis.
So let's say we have a website, www.stackoverflow.com, and our X keywords.
The basic document shape could look like this:
{
    _id : objectId, // the ObjectId embeds a creation timestamp
    www : "www.stackoverflow.com",
    rankings : [{
            "key1" : "val1"
        }, {
            "key2" : "val2"
        }
    ],
}
Then, if we want to see the ranking history for key1, we can query it with the aggregation framework:
db.ranking.aggregate(
    [{
            $unwind : "$rankings"
        }, {
            $match : {
                "rankings.key1" : { $exists : true}
            }
        }
    ])
and the response will be similar to:
{
    "_id" : ObjectId("584dbe04f4ce077869fee3dc"),
    "www" : "www.stackoverflow.com",
    "rankings" : {
        "key1" : "val1"
    }
},
{
    "_id" : ObjectId("584dbe07f4ce077869fee3dd"),
    "www" : "www.stackoverflow.com",
    "rankings" : {
        "key1" : "val1"
    }
}
Read more about grouping in the aggregation framework to uncover the power of MongoDB!
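Since the application is written in Python, the same pipeline could be built as plain Python data and extended with the URL and date-range filter the question asks about. This is only a sketch: the date field is an assumption (the document shape above has no explicit date and relies on the ObjectId timestamp instead), and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ranking_history_pipeline(url, keyword, days, now=None):
    """Build an aggregation pipeline for one site and keyword over the last `days` days.

    Assumes every document carries a hypothetical `date` field holding the
    datetime of the nightly run.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [
        # Narrow to one website and the requested window first.
        {"$match": {"www": url, "date": {"$gte": cutoff}}},
        # One output document per entry of the rankings array.
        {"$unwind": "$rankings"},
        # Keep only the entries that belong to the requested keyword.
        {"$match": {"rankings." + keyword: {"$exists": True}}},
    ]
```

With PyMongo, the returned list could be passed to the collection's aggregate method. Filtering by URL and date before $unwind keeps the number of unwound documents small.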

Request array of json documents (disable item reference) from MongoDB using python eve

Using the Python Eve framework, is there any way to get the response in the first JSON form below, a bare array of objects? I have tried to disable HATEOAS like it says here. Some view applications fetch models and collections directly from such a response, for example the Backbone NodeJS data handler.
[
    {
        "_id": "526c0e21977a67d6966dc763",
        "question": "1",
        "uk": "I heard a bloke on the train say that tomorrow's trains will be delayed.",
        "us": "I heard a guy on the train say that tomorrow's trains will be delayed."
    },
    {
        "_id": "526c0e21977a67d6966dc764",
        "question": "2",
        "uk": "Tom went outside for a fag. I think he smokes too much!",
        "us": "Tom went outside for a cigarette. I think he smokes too much!"
    }
]
Instead of returning the JSON object with an _items key, as it currently does:
{
    "_items": [
        {
            "_id": "526c0e21977a67d6966dc763",
            "question": "1",
            "uk": "I heard a bloke on the train",
            "us": "I heard a guy on the train"
        },
        {
            "_id": "526c0e21977a67d6966dc764",
            "question": "2",
            "uk": "Tom went outside for a fag. I think he smokes too much!",
            "us": "Tom went outside for a cigarette. I think he smokes too much!"
        }
    ]
}

This is currently not possible, as the response payload is built as a dictionary in which several keys might appear (pagination data, HATEOAS links, and the actual documents).
In theory we could add a new configuration option which would switch to a list-formatted (and simplified) layout. We would have to consider all the consequences though, so no promises, but consider opening a ticket.
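Until such an option exists, one workaround is to unwrap the payload on the consuming side. The sketch below assumes the client has the raw response body as a string; the function name is hypothetical and nothing here touches Eve itself:

```python
import json


def unwrap_items(body):
    """Return the bare list of documents from an Eve-style response body."""
    payload = json.loads(body)
    # Eve wraps the documents under "_items"; any other keys are dropped here.
    return payload["_items"]
```

In a Backbone client the equivalent step would typically live in the collection's parse hook.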

From JSON to JSON-LD without changing the source

There are 'duplicates' to my question but they don't answer my question.
Considering the following JSON-LD example as described in paragraph 6.13 - Named Graphs from http://www.w3.org/TR/json-ld/:
{
    "@context": {
        "generatedAt": {
            "@id": "http://www.w3.org/ns/prov#generatedAtTime",
            "@type": "http://www.w3.org/2001/XMLSchema#date"
        },
        "Person": "http://xmlns.com/foaf/0.1/Person",
        "name": "http://xmlns.com/foaf/0.1/name",
        "knows": "http://xmlns.com/foaf/0.1/knows"
    },
    "@id": "http://example.org/graphs/73",
    "generatedAt": "2012-04-09",
    "@graph":
    [
        {
            "@id": "http://manu.sporny.org/about#manu",
            "@type": "Person",
            "name": "Manu Sporny",
            "knows": "http://greggkellogg.net/foaf#me"
        },
        {
            "@id": "http://greggkellogg.net/foaf#me",
            "@type": "Person",
            "name": "Gregg Kellogg",
            "knows": "http://manu.sporny.org/about#manu"
        }
    ]
}
Question:
What if you start with only the JSON part without the semantic layer:
[{
    "name": "Manu Sporny",
    "knows": "http://greggkellogg.net/foaf#me"
},
{
    "name": "Gregg Kellogg",
    "knows": "http://manu.sporny.org/about#manu"
}]
and you link the @context from a separate file or location using an HTTP Link header or rdflib parsing, then you are still left without the @id and @type in the rest of the document. Injecting those missing key-value pairs into the JSON string is not a clean option. The idea is to go from JSON to JSON-LD without changing the original JSON part.
The way I see it, to define a triple subject one has to use an @id to map to an IRI. It's very unlikely that JSON data has the @id key-value pairs. So does this mean no JSON file can be parsed as JSON-LD without adding the keys first? I wonder how they do it.
Does someone have an idea to point me in the right direction?
Thank you.

No, unfortunately that's not possible. There are, however, libraries and tools that have been created for exactly that reason. JSON-LD Macros is such a library. It allows declarative transformations of JSON objects to make them usable as JSON-LD. So, effectively, all you need is a very thin layer on top of an off-the-shelf JSON-LD processor.
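To illustrate what such a thin layer might look like, here is a sketch in plain Python that does not use JSON-LD Macros or any JSON-LD processor. It copies the source records and adds the JSON-LD keywords (@context, @type and optionally @id) to the copies, so the original JSON stays untouched. The function to_jsonld and its id_for callback are hypothetical, and typing "knows" as "@id" in the context is an assumption added here so that its string values are read as IRIs:

```python
import copy

# Context adapted from the question; the "@type": "@id" coercion on "knows"
# is an assumption added for this sketch.
CONTEXT = {
    "Person": "http://xmlns.com/foaf/0.1/Person",
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": {"@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id"},
}


def to_jsonld(records, context, node_type, id_for=None):
    """Wrap plain JSON records in a JSON-LD document without mutating them.

    `id_for` is a hypothetical callback mapping a record to an IRI; records
    left without "@id" are treated as blank nodes by JSON-LD processors.
    """
    graph = []
    for record in records:
        node = copy.deepcopy(record)  # never touch the source record
        node["@type"] = node_type
        if id_for is not None:
            node["@id"] = id_for(record)
        graph.append(node)
    return {"@context": context, "@graph": graph}


if __name__ == "__main__":
    source = [
        {"name": "Manu Sporny", "knows": "http://greggkellogg.net/foaf#me"},
        {"name": "Gregg Kellogg", "knows": "http://manu.sporny.org/about#manu"},
    ]
    print(to_jsonld(source, CONTEXT, "Person"))
```

The resulting dictionary could then be handed to an off-the-shelf JSON-LD processor. How each record maps to an IRI is left to the caller, because the plain JSON carries no identifier of its own.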
