When working with JSON, why use json.loads? - python

This is not so much an error I'm having as a request for the reasoning behind the following:
For example in a tutorial page we have
import json

json_string = """
{
"researcher": {
"name": "Ford Prefect",
"species": "Betelgeusian",
"relatives": [
{
"name": "Zaphod Beeblebrox",
"species": "Betelgeusian"
}
]
}
}
"""
data = json.loads(json_string)
Which is fine, but my question is: why all the bother of putting the JSON in a string and then calling json.loads when the same thing can be obtained with
otro={
"researcher": {
"name": "Ford Prefect",
"species": "Betelgeusian",
"relatives": [
{
"name": "Zaphod Beeblebrox",
"species": "Betelgeusian"
}
]
}
}
print(type(otro))
print(otro)
print(otro==data) #True

Because your second example is not JSON at all; that's Python. They have superficial similarities, but you are only confusing yourself by mixing them.
For example, the values None, True, and False are valid in Python but not in JSON, where they would be represented by null, true, and false, respectively. Another difference is in how Unicode characters are represented. Obviously there are also many Python constructs which cannot be represented in JSON at all.
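For instance, you can see the notational differences directly with the standard json module (just an illustration):
import json

# Python values in, their JSON representations out.
print(json.dumps({"a": None, "b": True, "c": False}))
# {"a": null, "b": true, "c": false}

# And the JSON literals map back to Python values on parsing:
print(json.loads('{"a": null, "b": true, "c": false}'))
# {'a': None, 'b': True, 'c': False}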
Which to use in practice depends on your use case. If you are exercising or testing code which needs to work on actual JSON input, obviously pass it JSON, not something else. The example you are citing is evidently trying to demonstrate how to use JSON functions from Python; embedding the example data in a string just makes the example self-contained, whereas in reality you would probably be receiving the data from a file or a network API.
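In that realistic scenario, the string never appears in your source at all. A minimal sketch, assuming the data lives in a hypothetical file data.json:
import json

# json.load is the file-object counterpart of json.loads ("load string").
with open("data.json") as f:
    data = json.load(f)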


How to add JSON objects from one file to a JSON array of objects in another file

In a *nix environment, I'm seeking a solution for adding some (not quite valid) JSON in one file to another (valid) JSON file. Let me elaborate and also cover some failed attempts I've tried so far.
This will run in a shell script, in a loop which will grow quite large. It's making an API call which can only return 1000 records at a time; however, there are 70,000,000+ total records, so I will have to make this API call 70,000+ times to get all of the desired records. I want to keep the original JSON file; it includes information outside of the actual data I want, such as result info, success messages, etc. Each time I iterate and call for the next set, I'm trying to strip out that information and just append the main data records to the main data records of the first set.
I'm already 99% there. I'm attempting this using jq, sed, and Python. The body of the data records is not technically valid JSON, so jq complains because it can only append valid data. My attempt looks like this: jq --argjson results "$(<new.json)" '.result[] += [$results]' original.json. But of course, if jq accepted it, the data would have to be valid JSON in the first place.
I've already used grep -n to extract the line number where I want to start appending the new sets of records to the first set. So I've been trying to use sed, but cannot figure out the right syntax, though I feel I'm close. I've been trying something like sed -i -e $linenumber '<a few sed things here> new.json' original.json, but no success yet.
I've now tried to write a Python script to do this, but I had never tried anything like this before: just some string matching on readlines and string replacements. I didn't realize that there isn't a built-in method for jumping to a specific line. I guess I could do some find statements to jump to that line in Python, but I've already done that in the bash script. Also, I realize I could read each line into memory in Python, but I fear that with this many records it might become too much and get very slow.
I had some passing thoughts about trying some kind of head and tail and writing in between, since I know the exact line number. Any thoughts or solutions with any tools/languages are welcome. This is a devops project just to diagnose some logs, so I'm trying not to make it a full project; once I produce the logs, I'll shift all my focus and effort to running commands against the final produced JSON file, and won't really use this script again.
Example of original.json
{
"result": [
{
"id": "5b5915f4cdb39c7b",
"kind": "foo",
"source": "bar",
"action": "baz",
"matches": [
{
"id": "b298ee91704b489b8119c1d604a8308d",
"source": "blah",
"action": "buzz"
}
],
"occurred_at": "date"
},
{
"id": "5b5915f4cdb39c7b",
"kind": "foo",
"source": "bar",
"action": "baz",
"matches": [
{
"id": "b298ee91704b489b8119c1d604a8308d",
"source": "blah",
"action": "buzz"
}
],
"occurred_at": "date"
}
],
"result_info": {
"cursors": {
"after": "dlipU4c",
"before": "iLjx06u"
},
"scanned_range": {
"since": "date",
"until": "date"
}
},
"success": true,
"errors": [],
"messages": []
}
Example of new.json
{
"id": "5b5915f4cdb39c7b",
"kind": "foo",
"source": "bar",
"action": "baz",
"matches": [
{
"id": "b298ee91704b489b8119c1d604a8308d",
"source": "blah",
"action": "buzz"
}
],
"occurred_at": "date"
},
{
"id": "5b5915f4cdb39c7b",
"kind": "foo",
"source": "bar",
"action": "baz",
"matches": [
{
"id": "b298ee91704b489b8119c1d604a8308d",
"source": "blah",
"action": "buzz"
}
],
"occurred_at": "date"
}
Don't worry about the indentation or missing trailing commas; I already have that figured out and confirmed working.
You can turn the invalid JSON from the API response into a valid array by wrapping it in [...]. The resulting array can be imported and added directly to the result array.
jq --argjson results "[$(<new.json)]" '.result += $results' original.json
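If you would rather do the merge in Python (which you mentioned attempting), here is a minimal sketch of the same wrapping idea, using the original.json and new.json names from the question:
import json

# Read the valid outer document.
with open("original.json") as f:
    original = json.load(f)

# Wrap the comma-separated objects in brackets so they parse as one array.
with open("new.json") as f:
    new_records = json.loads("[" + f.read() + "]")

# Append the new records to the existing result array and write back.
original["result"].extend(new_records)
with open("original.json", "w") as f:
    json.dump(original, f, indent=2)
Note that this holds everything in memory, so for the full 70,000,000-record case a streaming approach would still be preferable.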
First, to accumulate the results in a new file, create a JSON file with "[]" (an empty array) as its content; this is to make sure the file we load is valid JSON.
Next, run the following command for each input file:
jq --argjson results "$(<new.json)" '.result | . += $results ' orig.json > new.json
The issue with your query was .result[]: it returns all the elements individually, as a stream in the format
{}
{}
instead of
[
{},
{}
]
Based on the given new.json and its description, you seem to have comma-separated JSON objects, with the object-separating commas on lines of their own matching the regex '^}, *$'.
If that's the case, the good news is you can achieve the result you want by simply removing the superfluous commas with:
sed 's/^}, *$/}/' new.json
This produces a stream of objects, which can then be processed in any one of several well-known ways, e.g. by "slurping" it into a single array with jq's -s option: sed 's/^}, *$/}/' new.json | jq -s '.'
"XY problem"?
In a side-comment, you wrote:
I did fix this with sed to add the commas, which are included in the question.
So it is beginning to sound as if the question as posted is really a so-called "XY" problem. Anyway, if you were starting with a stream of JSON objects, then of course there would be no need to add the commas and deal with the consequences.

How do I get all first objects of my json using jq? [duplicate]

This question already has answers here:
How to get key names from JSON using jq
(9 answers)
Closed 4 years ago.
I would like to get the first-level objects (don't know if that's the right name) of my JSON file, which is huge (more than 120k lines), so I can't parse it manually.
The format is like this:
"datanode": [
{
"isWhitelisted": true,
"metricname": "write_time",
"seriesStartTime": 1542037566944,
"supportsAggregation": true
},
{
"isWhitelisted": true,
"metricname": "dfs.datanode.CacheReportsNumOps",
"seriesStartTime": 1542037501137,
"supportsAggregation": true,
"type": "COUNTER"
},
{
"isWhitelisted": true,
"metricname": "FSDatasetState.org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.EstimatedCapacityLostTotal",
"seriesStartTime": 1542037495521,
"supportsAggregation": true,
"type": "GAUGE"
},
],
"toto": [
....
And what I need is to extract this: datanode, toto, etc. Only the names.
Can you help me please?
I tried using jq without success.
You can use jq's keys functionality
jq 'keys' file.json
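For comparison, the equivalent in Python is just the object's keys; a sketch, assuming the whole file parses as a single JSON object:
import json

with open("file.json") as f:
    data = json.load(f)

# Top-level keys such as "datanode" and "toto".
print(list(data))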
In the future, try to improve on the words you use to describe the different parts of the JSON data. You asked about objects in the text, but actually meant the keys.
A more fitting title for the question would have been: "How to get all top level keys of json data using jq?" With this more correct wording, you would have found already-answered questions like this one: How to get key names from JSON using jq
Also, provide a complete and valid example structure and the expected result, like this:
{
"one_key": {
"foo": "bar"
},
"another_one": {
"bla": "bla"
}
}
And desired result:
[
"another_one",
"one_key"
]

Storing Sequence Information in JSON

I want to store some sequence information in JSON. For example, I want to store a variable value which can take the following values:
some_random_string_2
some_random_string_3
some_random_string_4
...
To do so, I have tried using the following format:
json_obj = {
"k1": {
"nk1": "some_random_string_{$1}"
"patterns": {
"p1": {
"pattern": "[2-9]|[1-9]\d+",
"symbol_type": "int",
"start_symbol": 2,
"step": 1
}
}
}
}
The JSON above contains a regex pattern for the variable part of the string, its type, a start symbol, and a step. But it seems unnecessarily complicated and difficult to generate a sequence from.
Is there some simpler way to store this sequence information so that it's easier to generate the sequence while parsing?
Currently, I don't have an exhaustive list of patterns, so we'll have to assume it can be anything that can be written as a regular expression. On a side note, I'll be using Python to parse this JSON and generate the sequence.
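For the counter-style case in the example, one simpler encoding (a sketch only, with made-up key names) is to store the template and the counter parameters directly instead of a regex, which makes generation trivial:
import itertools

# Hypothetical simpler encoding: a format template plus counter parameters.
json_obj = {
    "k1": {
        "template": "some_random_string_{}",
        "start": 2,
        "step": 1
    }
}

def generate(spec, count):
    # Yield the first `count` values of the sequence.
    counter = itertools.count(spec["start"], spec["step"])
    for n in itertools.islice(counter, count):
        yield spec["template"].format(n)

print(list(generate(json_obj["k1"], 3)))
# ['some_random_string_2', 'some_random_string_3', 'some_random_string_4']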

From JSON to JSON-LD without changing the source

There are 'duplicates' to my question but they don't answer my question.
Considering the following JSON-LD example as described in paragraph 6.13 - Named Graphs from http://www.w3.org/TR/json-ld/:
{
"@context": {
"generatedAt": {
"@id": "http://www.w3.org/ns/prov#generatedAtTime",
"@type": "http://www.w3.org/2001/XMLSchema#date"
},
"Person": "http://xmlns.com/foaf/0.1/Person",
"name": "http://xmlns.com/foaf/0.1/name",
"knows": "http://xmlns.com/foaf/0.1/knows"
},
"@id": "http://example.org/graphs/73",
"generatedAt": "2012-04-09",
"@graph":
[
{
"@id": "http://manu.sporny.org/about#manu",
"@type": "Person",
"name": "Manu Sporny",
"knows": "http://greggkellogg.net/foaf#me"
},
{
"@id": "http://greggkellogg.net/foaf#me",
"@type": "Person",
"name": "Gregg Kellogg",
"knows": "http://manu.sporny.org/about#manu"
}
]
}
Question:
What if you start with only the JSON part without the semantic layer:
[{
"name": "Manu Sporny",
"knows": "http://greggkellogg.net/foaf#me"
},
{
"name": "Gregg Kellogg",
"knows": "http://manu.sporny.org/about#manu"
}]
and you link the @context from a separate file or location using an HTTP Link header or rdflib parsing, then you are still left without the @id and @type in the rest of the document. Injecting those missing key-value pairs into the JSON string is not a clean option. The idea is to go from JSON to JSON-LD without changing the original JSON part.
The way I see it, to define a triple's subject one has to use an @id that maps to an IRI. It's very unlikely that plain JSON data has @id key-value pairs. So does this mean all JSON files cannot be parsed as JSON-LD without adding the keys first? I wonder how they do it.
Does someone have an idea to point me in the right direction?
Thank you.
No, unfortunately that's not possible. There exist, however, libraries and tools that have been created exactly for this reason. JSON-LD Macros is one such library: it allows declarative transformations of JSON objects to make them usable as JSON-LD. So, effectively, all you need is a very thin layer on top of an off-the-shelf JSON-LD processor.
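To illustrate the kind of transformation such a thin layer performs (a hand-rolled sketch, not the JSON-LD Macros API; the context URL and @id scheme here are made up):
import json

doc = [{"name": "Manu Sporny", "knows": "http://greggkellogg.net/foaf#me"},
       {"name": "Gregg Kellogg", "knows": "http://manu.sporny.org/about#manu"}]

def to_jsonld(items):
    # Wrap the untouched JSON in a JSON-LD envelope, minting an @id for
    # each object (from an assumed base IRI) so it can be a triple subject.
    return {
        "@context": "http://example.org/context.jsonld",  # assumed external context
        "@graph": [
            dict(item, **{"@id": "http://example.org/people/%d" % i, "@type": "Person"})
            for i, item in enumerate(items)
        ],
    }

print(json.dumps(to_jsonld(doc), indent=2))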

Python: Finding element in json obj without iterating

Is it possible to check if a particular element exists in JSON without iterating through it? For example, in the following JSON data, I want to check whether an appid with the value 4000 exists. I need to process hundreds of similar JSON data sets, so the check needs to be quick and efficient.
{
"response": {
"game_count": 62,
"games": [
{
"appid": 10,
"playtime_forever": 15
},
{
"appid": 20,
"playtime_forever": 0
},
...
{
"appid": 4000,
"playtime_2weeks": 104,
"playtime_forever": 21190
}
]
}
}
The relevant object is contained in an array, so no, it is not possible to find it without iteration. The data could be massaged to use the appid as the key for an object that contains the games as the value, but that requires additional preprocessing.
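A sketch of that preprocessing, assuming the structure shown in the question (json_text holds the document as a string); after the one-time cost of building the dictionary, each lookup is O(1):
import json

data = json.loads(json_text)  # json_text: the JSON document shown above

# One-time preprocessing: index the games by appid.
games_by_appid = {game["appid"]: game for game in data["response"]["games"]}

print(4000 in games_by_appid)  # True
print(games_by_appid[4000])    # the full record for appid 4000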
However, it could be possible to craft a parser such that the appropriate data can be extracted immediately upon parsing. This would piggyback the iteration within the parser itself instead of being explicit code after the fact. See the object_hook argument of the parser.
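A sketch of that object_hook approach: the hook is called once for every JSON object as it is decoded, so the check piggybacks on the parse itself rather than running as a separate pass:
import json

found = []

def spot_target(obj):
    # Called for each decoded JSON object; remember the one we want.
    if obj.get("appid") == 4000:
        found.append(obj)
    return obj

data = json.loads(json_text, object_hook=spot_target)  # json_text as above
print(bool(found))  # True if an appid of 4000 appeared anywhere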
