From JSON to JSON-LD without changing the source - python

There are 'duplicates' to my question but they don't answer my question.
Considering the following JSON-LD example as described in paragraph 6.13 - Named Graphs from http://www.w3.org/TR/json-ld/:
{
"@context": {
"generatedAt": {
"@id": "http://www.w3.org/ns/prov#generatedAtTime",
"@type": "http://www.w3.org/2001/XMLSchema#date"
},
"Person": "http://xmlns.com/foaf/0.1/Person",
"name": "http://xmlns.com/foaf/0.1/name",
"knows": "http://xmlns.com/foaf/0.1/knows"
},
"@id": "http://example.org/graphs/73",
"generatedAt": "2012-04-09",
"@graph":
[
{
"@id": "http://manu.sporny.org/about#manu",
"@type": "Person",
"name": "Manu Sporny",
"knows": "http://greggkellogg.net/foaf#me"
},
{
"@id": "http://greggkellogg.net/foaf#me",
"@type": "Person",
"name": "Gregg Kellogg",
"knows": "http://manu.sporny.org/about#manu"
}
]
}
Question:
What if you start with only the JSON part without the semantic layer:
[{
"name": "Manu Sporny",
"knows": "http://greggkellogg.net/foaf#me"
},
{
"name": "Gregg Kellogg",
"knows": "http://manu.sporny.org/about#manu"
}]
and you link the @context from a separate file or location using an HTTP Link header or rdflib parsing, then you are still left without the @id and @type in the rest of the document. Injecting those missing key-value pairs into the JSON string is not a clean option. The idea is to go from JSON to JSON-LD without changing the original JSON part.
The way I see it, to define a triple subject one has to use an @id to map to an IRI. It's very unlikely that JSON data already has the @id key-value pairs. So does this mean that JSON files cannot be parsed as JSON-LD without adding the keys first? I wonder how they do it.
Does someone have an idea to point me in the right direction?
Thank you.

No, unfortunately that's not possible. There are, however, libraries and tools that have been created for exactly that reason. JSON-LD Macros is such a library. It allows declarative transformations of JSON objects to make them usable as JSON-LD. So, effectively, all you need is a very thin layer on top of an off-the-shelf JSON-LD processor.
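As a rough illustration of what such a thin layer might look like, here is a minimal sketch in plain Python. It is not the JSON-LD Macros API. The names `CONTEXT`, `ID_BY_NAME`, and `to_jsonld` are hypothetical, and the name-to-IRI lookup is an assumption made only for this example. JSON-LD keywords are written with a leading `@` (`@context`, `@id`, `@type`, `@graph`). The source records are copied, so the original JSON is left untouched.

```python
import copy
import json

# Hypothetical context, taken from the example in the question.
CONTEXT = {
    "Person": "http://xmlns.com/foaf/0.1/Person",
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": "http://xmlns.com/foaf/0.1/knows",
}

# Hypothetical lookup from a record's "name" to the IRI that identifies it.
ID_BY_NAME = {
    "Manu Sporny": "http://manu.sporny.org/about#manu",
    "Gregg Kellogg": "http://greggkellogg.net/foaf#me",
}


def to_jsonld(records, context, id_by_name, type_name):
    """Return a new JSON-LD document; the input records are not modified."""
    graph = []
    for record in records:
        node = copy.deepcopy(record)  # work on a copy, never on the source
        node["@id"] = id_by_name[record["name"]]
        node["@type"] = type_name
        graph.append(node)
    return {"@context": context, "@graph": graph}


plain = json.loads("""
[{"name": "Manu Sporny", "knows": "http://greggkellogg.net/foaf#me"},
 {"name": "Gregg Kellogg", "knows": "http://manu.sporny.org/about#manu"}]
""")

doc = to_jsonld(plain, CONTEXT, ID_BY_NAME, "Person")
print(json.dumps(doc, indent=2))
```

The resulting `doc` could then be handed to whichever JSON-LD processor is in use, while `plain` still holds the unmodified source data.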

Related

When working with json why use json.loads?

This is not so much an error I am having, but I would like to understand the reason behind the following:
For example in a tutorial page we have
json_string = """
{
"researcher": {
"name": "Ford Prefect",
"species": "Betelgeusian",
"relatives": [
{
"name": "Zaphod Beeblebrox",
"species": "Betelgeusian"
}
]
}
}
"""
data = json.loads(json_string)
Which is OK, but my question is: why go to all the bother of putting the JSON in a string and then calling json.loads when the same thing can be obtained by
otro={
"researcher": {
"name": "Ford Prefect",
"species": "Betelgeusian",
"relatives": [
{
"name": "Zaphod Beeblebrox",
"species": "Betelgeusian"
}
]
}
}
print(type(otro))
print(otro)
print(otro==data) #True
Because your second example is not JSON at all, that's Python. They have superficial similarities, but you are only confusing yourself by mixing them.
For example, the values None, True, and False are valid in Python but not in JSON, where they would be represented by null, true, and false, respectively. Another difference is in how Unicode characters are represented. Obviously there are also many Python constructs which cannot be represented in JSON at all.
Which to use in practice depends on your use case. If you are exercising or testing code which needs to work on actual JSON input, obviously pass it JSON, not something else. The example you are citing is obviously trying to demonstrate how to use JSON functions from Python, and the embedding of the example data in a string is just to make the example self-contained, where in reality you would probably be receiving the data from a file or network API.
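The difference between the two notations can be seen in a small sketch using only the standard `json` module. The sample keys (`active`, `nickname`, `score`) are made up for illustration.

```python
import json

# JSON text uses true / false / null ...
json_text = '{"active": true, "nickname": null, "score": 1.5}'

# ... which json.loads maps to Python's True / False / None.
data = json.loads(json_text)
print(data)  # {'active': True, 'nickname': None, 'score': 1.5}

# Serializing the dict again produces JSON spellings, not Python ones.
round_trip = json.dumps(data)
print(round_trip)  # {"active": true, "nickname": null, "score": 1.5}

# A Python literal is not necessarily valid JSON: None is rejected.
try:
    json.loads('{"nickname": None}')
except json.JSONDecodeError as exc:
    print("not JSON:", exc)
```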

Adding multiple components using the Jira Rest API

I'm trying to create an issue using the Jira Rest API but facing some roadblocks when trying to add more than one component when creating an issue.
def create_issue(self, summary, description, priority, issue_type, component, assignee, epic_name):
    """Creates a new issue with the given parameters."""
    issue_data = {
        "fields": {
            "project": {"key": self.project},
            "summary": summary,
            "description": description,
            "priority": {"name": priority},
            "issuetype": {"name": issue_type},
            "components": {"name": component},
            "assignee": {"name": assignee}
        }
    }
Just thinking out loud, if my component parameter is a list then each element can be added like
"components": [{"name": component[0]},{"name": component[1]}]
but then how do I iterate through the list inside the JSON object? I tried using a for loop but wasn't able to implement it properly. Any help or alternative approach to solve this would be appreciated. Thanks.
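One way to build the list without a hand-written loop is a list comprehension inside the dict literal. The sketch below is a standalone, hypothetical `build_issue_data` function rather than the original method. The field layout is copied from the question and has not been checked against a Jira server, and the sample values ("PROJ", "Backend", "jdoe", and so on) are invented.

```python
import json


def build_issue_data(project, summary, description, priority,
                     issue_type, components, assignee):
    """Build the request body; `components` is a list of component names."""
    return {
        "fields": {
            "project": {"key": project},
            "summary": summary,
            "description": description,
            "priority": {"name": priority},
            "issuetype": {"name": issue_type},
            # One {"name": ...} object per entry in the list.
            "components": [{"name": name} for name in components],
            "assignee": {"name": assignee},
        }
    }


issue_data = build_issue_data(
    "PROJ", "Example summary", "Example description",
    "High", "Bug", ["Backend", "API"], "jdoe",
)
print(json.dumps(issue_data["fields"]["components"]))
# [{"name": "Backend"}, {"name": "API"}]
```

The comprehension works for any number of components, including an empty list.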

How to add json objects from file to another json array of objects in another file

In a *nix environment, I'm seeking a way to add some (not quite valid) JSON from one file to another (valid) JSON file. Let me elaborate and also cover some failed attempts I've tried so far.
This will run in a shell script in a loop which will grow quite large. It's making an api call which can only return 1000 at a time. However, there are 70,000,000+ total records. So, I will have to make this api call 70,000 times in order to get all of the desired records. The original json file I want to keep, it includes information outside of the actual data I want, such as result info and success messages, etc. Each time I iterate and call the next set, I'm trying to strip out that information and just append the main data records to the main data records of the first set.
I'm already 99% there. I'm attempting this using jq, sed, and python. The body of the data records is not technically valid JSON, so jq complains because it can only append valid data. My attempt looks like this: jq --argjson results "$(<new.json)" '.result[] += [$results]' original.json. If the append worked, the result would be valid JSON.
I've already used grep -n to extract the line number where I want to start appending the new sets of records to the first set of records. So I've been trying to use sed but cannot figure out the right syntax, though I feel I'm close. I've been trying something like sed -i -e $linenumber '<a few sed things here> new.json' original.json. But no success yet.
I've now tried to write a python script to do this. But I had never tried anything like this before. Just some string matching on readlines and string replacements. I didn't realize that there isn't a built in method for jumping to a specific line. I guess I could do some find statements to jump to that line in python but I've already done this in the bash script. Also, I realize I could read each line to memory in python but I fear that with this many records, it might get to be too much and become very slow.
I had some passing thoughts on trying some kind of head and tail and write in between since I know the exact line number. Any thoughts or solutions with any tools/languages are welcome. This is a devops project that is just to diagnose some logs, so I'm trying to not make this a full project, as once I produce the logs, I'll shift all my focus and efforts to running commands against this final produced json file and not really use this script ever again.
Example of original.json
{
"result": [
{
"id": "5b5915f4cdb39c7b",
"kind": "foo",
"source": "bar",
"action": "baz",
"matches": [
{
"id": "b298ee91704b489b8119c1d604a8308d",
"source": "blah",
"action": "buzz"
}
],
"occurred_at": "date"
},
{
"id": "5b5915f4cdb39c7b",
"kind": "foo",
"source": "bar",
"action": "baz",
"matches": [
{
"id": "b298ee91704b489b8119c1d604a8308d",
"source": "blah",
"action": "buzz"
}
],
"occurred_at": "date"
}
],
"result_info": {
"cursors": {
"after": "dlipU4c",
"before": "iLjx06u"
},
"scanned_range": {
"since": "date",
"until": "date"
}
},
"success": true,
"errors": [],
"messages": []
}
Example of new.json
{
"id": "5b5915f4cdb39c7b",
"kind": "foo",
"source": "bar",
"action": "baz",
"matches": [
{
"id": "b298ee91704b489b8119c1d604a8308d",
"source": "blah",
"action": "buzz"
}
],
"occurred_at": "date"
},
{
"id": "5b5915f4cdb39c7b",
"kind": "foo",
"source": "bar",
"action": "baz",
"matches": [
{
"id": "b298ee91704b489b8119c1d604a8308d",
"source": "blah",
"action": "buzz"
}
],
"occurred_at": "date"
}
Don't worry about the indentation or missing trailing commas, I already have that figured out and confirmed working.
You can turn the invalid JSON from the API response into a valid array by wrapping it in [...]. The resulting array can be imported and added directly to the result array.
jq --argjson results "[$(<new.json)]" '.result += $results' original.json
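Since the question also mentions Python, the same wrap-in-brackets idea can be sketched with the standard `json` module. The function name `append_records` and the tiny sample strings are hypothetical, and this version holds the whole document in memory, which may not suit the full 70,000,000-record data set.

```python
import json


def append_records(original_text, fragment_text):
    """Append comma-separated JSON objects to the "result" array."""
    document = json.loads(original_text)
    # Wrapping the fragment in [...] turns it into a valid JSON array.
    new_records = json.loads("[" + fragment_text + "]")
    document["result"].extend(new_records)
    return document


original = '{"result": [{"id": "a"}], "success": true}'
fragment = '{"id": "b"},\n{"id": "c"}'

merged = append_records(original, fragment)
print([record["id"] for record in merged["result"]])  # ['a', 'b', 'c']
```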
First, to collect the results in a new file, create a JSON file whose content is "[]" (an empty array); this makes sure the file we load is valid JSON.
Next, run the following command for each input file:
jq --argjson results "$(<new.json)" '.result | . += $results ' orig.json > new.json
The issue with your query was .result[]: it returns all the elements individually rather than as a single JSON array, in the format
{}
{}
instead of
[
{},
{}
]
Based on the given new.json and its description, you seem to have comma-separated JSON objects with the JSON-separating commas on separate lines matching the regex '^}, *$'
If that's the case, the good news is you can achieve the result you want by simply removing the superfluous commas with:
sed 's/^}, *$/}/' new.txt
This produces a stream of objects, which can then be processed in any one of several well-known ways (e.g. by "slurping" it using the -s option).
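For comparison, a stream of concatenated JSON objects can also be read in Python with `json.JSONDecoder.raw_decode`, which parses one value and reports where it stopped. This is a minimal sketch: the helper name `iter_json_objects` and the sample stream are hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value found in a concatenated stream."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


stream = '{"id": "a"}\n{"id": "b"}\n'
records = list(iter_json_objects(stream))
print(records)  # [{'id': 'a'}, {'id': 'b'}]
```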
"XY problem"?
In a side-comment, you wrote:
I did fix this with sed to add the commas, which are included in the question.
So it is beginning to sound as if the question as posted is really a so-called "XY" problem. Anyway, if you were starting with a stream of JSON objects, then of course there would be no need to add the commas and deal with the consequences.

Google Docs API programmatically adding a table of contents

I have a python script which does some analysis and outputs the results as text (paragraphs) to a Google Doc. I know how to insert text and update paragraph and text styles through batchUpdate.
doc_service.documents().batchUpdate(documentId=<ID>,body={'requests': <my_request>}).execute()
where, for instance, "my_request" takes the form of something like:
request = [
{
"insertText": {
"location": {
"index": <index_position>,
"segmentId": <id>
},
"text": <text>
}
},
{
"updateParagraphStyle": {
"paragraphStyle": {
"namedStyleType": <paragraph_type>
},
"range": {
"segmentId": <id>,
"startIndex": <index_position>,
"endIndex": <index_position>
},
"fields": "namedStyleType"
}
},
]
However, once the script is done updating the table, it would be fantastic if a table of contents could be added at the top of the document.
However, I am very new to Google Docs API and I am not entirely sure how to do that. I know I should use "TableOfContents" as a StructuralElement. I also know this option currently does not update automatically after each modification brought to the document (this is why I would like to create it AFTER the document has finished updating and place it at the top of the document).
How to do this with python? I am unclear where to call "TableOfContents" in my request.
Thank you so very much!
After your comment, I was able to understand better what you are desiring to do, but I came across these two Issue Tracker's posts:
Add the ability to generate and update the TOC of a doc.
Geting a link to heading paragraph.
These are well-known feature requests that unfortunately haven't been implemented yet. You can hit the ☆ next to the issue number in the top left on this page as it lets Google know more people are encountering this and so it is more likely to be seen faster.
Therefore, it's not possible to insert/update a table of contents programmatically.

Request array of json documents (disable item reference) from MongoDB using python eve

Using the Python Eve framework, is there any way to get the response as a plain array of objects, as in the first example below? I have tried to disable HATEOAS like it says here. Some view applications fetch models and collections directly from such a response, such as the Backbone NodeJS data handler.
[
{
"_id": "526c0e21977a67d6966dc763",
"question": "1",
"uk": "I heard a bloke on the train say that tomorrow's trains will be delayed.",
"us": "I heard a guy on the train say that tomorrow's trains will be delayed."
},
{
"_id": "526c0e21977a67d6966dc764",
"question": "2",
"uk": "Tom went outside for a fag. I think he smokes too much!",
"us": "Tom went outside for a cigarette. I think he smokes too much!"
}
]
instead of the JSON object with an _items key that it currently returns:
{
"_items":[
{
"_id": "526c0e21977a67d6966dc763",
"question": "1",
"uk": "I heard a bloke on the train",
"us": "I heard a guy on the train"
},
{
"_id": "526c0e21977a67d6966dc764",
"question": "2",
"uk": "Tom went outside for a fag. I think he smokes too much!",
"us": "Tom went outside for a cigarette. I think he smokes too much!"
}
]
}
This is currently not possible, as the response payload is built as a dictionary in which several keys might appear (pagination data, HATEOAS links, and actual documents).
In theory we could add a new configuration option which would switch to a list-formatted (and simplified) layout. Should consider all the consequences though, so no promises, but consider opening a ticket.
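Until such an option exists, one possible workaround (not something the answer above proposes) is to unwrap the `_items` key on the consuming side. The sketch below uses a hypothetical helper, `unwrap_items`, and a shortened sample payload modeled on the question; it does not touch Eve's configuration.

```python
import json


def unwrap_items(payload_text):
    """Return just the list of documents from an Eve-style response body."""
    payload = json.loads(payload_text)
    return payload["_items"]


# Shortened sample payload in the shape shown in the question.
sample = """
{"_items": [
  {"_id": "526c0e21977a67d6966dc763", "question": "1"},
  {"_id": "526c0e21977a67d6966dc764", "question": "2"}
]}
"""

documents = unwrap_items(sample)
print([doc["question"] for doc in documents])  # ['1', '2']
```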
