Python: Finding an element in a JSON object without iterating

Is it possible to check if a particular element exists in JSON data without iterating through it? For example, in the following JSON, I want to check whether an appid with the value 4000 exists. I need to process hundreds of similar JSON data sets, so it needs to be quick and efficient.
{
    "response": {
        "game_count": 62,
        "games": [
            {
                "appid": 10,
                "playtime_forever": 15
            },
            {
                "appid": 20,
                "playtime_forever": 0
            },
            ...
            {
                "appid": 4000,
                "playtime_2weeks": 104,
                "playtime_forever": 21190
            }
        ]
    }
}

The relevant object is contained in an array, so no, it is not possible to find it without iteration. The data could be massaged to use the appid as the key for an object that contains the games as the value, but that requires additional preprocessing.
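As a sketch of that preprocessing step (the sample data below is abbreviated from the question), a dict keyed by appid makes every later lookup O(1):

```python
import json

doc = """{"response": {"game_count": 2, "games": [
    {"appid": 10, "playtime_forever": 15},
    {"appid": 4000, "playtime_forever": 21190}]}}"""
data = json.loads(doc)

# One pass to build the index; every membership test afterwards is O(1)
games_by_appid = {g["appid"]: g for g in data["response"]["games"]}
print(4000 in games_by_appid)  # True
```

This only pays off if the same data set is queried more than once; for a single lookup, a plain loop over the array is just as fast.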
However, it could be possible to craft a parser such that the appropriate data can be extracted immediately upon parsing. This would piggyback the iteration within the parser itself instead of being explicit code after the fact. See the object_hook argument of the parser.
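A minimal sketch of that object_hook approach (the hook and variable names here are illustrative, not from the question): json.loads calls the hook once for every JSON object as it is decoded, so a match can be captured during parsing itself:

```python
import json

doc = """{"response": {"games": [
    {"appid": 10, "playtime_forever": 15},
    {"appid": 4000, "playtime_forever": 21190}]}}"""

found = {}

def hook(obj):
    # Called for every decoded JSON object, innermost objects first
    if obj.get("appid") == 4000:
        found.update(obj)
    return obj

json.loads(doc, object_hook=hook)
print(found)  # {'appid': 4000, 'playtime_forever': 21190}
```

Note the whole document is still parsed either way; this just avoids a second explicit pass over the result.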

When working with json why use json.loads?

This is not so much an error I'm having; rather, I'd like to know the reason behind the following.
For example, in a tutorial page we have
json_string = """
{
    "researcher": {
        "name": "Ford Prefect",
        "species": "Betelgeusian",
        "relatives": [
            {
                "name": "Zaphod Beeblebrox",
                "species": "Betelgeusian"
            }
        ]
    }
}
"""
data = json.loads(json_string)
Which is OK, but my question is: why all the bother of putting the JSON in a string and then calling json.loads, when the same thing can be obtained by
otro = {
    "researcher": {
        "name": "Ford Prefect",
        "species": "Betelgeusian",
        "relatives": [
            {
                "name": "Zaphod Beeblebrox",
                "species": "Betelgeusian"
            }
        ]
    }
}
print(type(otro))
print(otro)
print(otro == data)  # True
Because your second example is not JSON at all, that's Python. They have superficial similarities, but you are only confusing yourself by mixing them.
For example, the values None, True, and False are valid in Python but not in JSON, where they would be represented by null, true, and false, respectively. Another difference is in how Unicode characters are represented. Obviously there are also many Python constructs which cannot be represented in JSON at all.
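A quick illustration of those literal differences:

```python
import json

# Python literals on the way in, their JSON spellings on the way out
print(json.dumps({"a": None, "b": True, "c": False}))
# {"a": null, "b": true, "c": false}

# and round-tripping back gives the Python values again
print(json.loads('{"a": null, "b": true, "c": false}'))
# {'a': None, 'b': True, 'c': False}
```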
Which to use in practice depends on your use case. If you are exercising or testing code which needs to work on actual JSON input, obviously pass it JSON, not something else. The example you are citing is obviously trying to demonstrate how to use JSON functions from Python, and the embedding of the example data in a string is just to make the example self-contained, where in reality you would probably be receiving the data from a file or network API.

How to add json objects from file to another json array of objects in another file

In a *nix environment. I'm seeking a solution for appending some (not quite valid) JSON in one file to another (valid) JSON file. Let me elaborate and also cover some failed attempts I've tried so far.
This will run in a shell script in a loop which will grow quite large. It's making an api call which can only return 1000 at a time. However, there are 70,000,000+ total records. So, I will have to make this api call 70,000 times in order to get all of the desired records. The original json file I want to keep, it includes information outside of the actual data I want, such as result info and success messages, etc. Each time I iterate and call the next set, I'm trying to strip out that information and just append the main data records to the main data records of the first set.
I'm already 99% there. I'm attempting this using jq, sed and Python. The body of the data records is not technically valid JSON, so jq complains, because it can only append valid data. My attempt looks like this: jq --argjson results "$(<new.json)" '.result[] += [$results]' original.json. If the records were valid JSON, it would work.
I've already used grep -n to extract the line number where I want to start appending the new sets of records to the first set. So I've been trying to use sed but cannot figure out the right syntax, though I feel I'm close. I've been trying something like sed -i -e $linenumber '<a few sed things here> new.json' original.json, but no success yet.
I've now tried to write a Python script to do this, but I had never tried anything like this before, just some string matching on readlines and string replacements. I didn't realize that there isn't a built-in method for jumping to a specific line. I guess I could do some find statements to jump to that line in Python, but I've already done this in the bash script. Also, I realize I could read each line into memory in Python, but I fear that with this many records it might become too much and very slow.
I had some passing thoughts on trying some kind of head and tail and write in between since I know the exact line number. Any thoughts or solutions with any tools/languages are welcome. This is a devops project that is just to diagnose some logs, so I'm trying to not make this a full project, as once I produce the logs, I'll shift all my focus and efforts to running commands against this final produced json file and not really use this script ever again.
Example of original.json
{
    "result": [
        {
            "id": "5b5915f4cdb39c7b",
            "kind": "foo",
            "source": "bar",
            "action": "baz",
            "matches": [
                {
                    "id": "b298ee91704b489b8119c1d604a8308d",
                    "source": "blah",
                    "action": "buzz"
                }
            ],
            "occurred_at": "date"
        },
        {
            "id": "5b5915f4cdb39c7b",
            "kind": "foo",
            "source": "bar",
            "action": "baz",
            "matches": [
                {
                    "id": "b298ee91704b489b8119c1d604a8308d",
                    "source": "blah",
                    "action": "buzz"
                }
            ],
            "occurred_at": "date"
        }
    ],
    "result_info": {
        "cursors": {
            "after": "dlipU4c",
            "before": "iLjx06u"
        },
        "scanned_range": {
            "since": "date",
            "until": "date"
        }
    },
    "success": true,
    "errors": [],
    "messages": []
}
Example of new.json
{
    "id": "5b5915f4cdb39c7b",
    "kind": "foo",
    "source": "bar",
    "action": "baz",
    "matches": [
        {
            "id": "b298ee91704b489b8119c1d604a8308d",
            "source": "blah",
            "action": "buzz"
        }
    ],
    "occurred_at": "date"
},
{
    "id": "5b5915f4cdb39c7b",
    "kind": "foo",
    "source": "bar",
    "action": "baz",
    "matches": [
        {
            "id": "b298ee91704b489b8119c1d604a8308d",
            "source": "blah",
            "action": "buzz"
        }
    ],
    "occurred_at": "date"
}
Don't worry about the indentation or missing trailing commas, I already have that figured out and confirmed working.
You can turn the invalid JSON from the API response into a valid array by wrapping it in [...]. The resulting array can be imported and added directly to the result array.
jq --argjson results "[$(<new.json)]" '.result += $results' original.json
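The same wrap-and-append idea can be sketched in Python, which may be easier to reason about inside the loop (the sample data below stands in for the two files; in the real script you would read them from disk):

```python
import json

# stand-in for original.json, parsed into a dict
original = {"result": [{"id": "a"}], "success": True}

# stand-in for new.json: comma-separated objects, not valid JSON on their own
new_fragment = '{"id": "b"},\n{"id": "c"}'

# Wrapping the fragment in [...] turns it into a valid JSON array
results = json.loads("[" + new_fragment + "]")
original["result"] += results

print([r["id"] for r in original["result"]])  # ['a', 'b', 'c']
```

Note that with 70,000,000+ records, accumulating everything in one in-memory dict may be the real bottleneck; appending each batch to a file of newline-delimited objects scales better.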
First, to accumulate results into a new file, create a JSON file whose content is "[]" (an empty array); this makes sure the file we load is valid JSON.
Next, run the following command for each input file:
jq --argjson results "$(<new.json)" '.result | . += $results' orig.json > new.json
The issue with your query was .result[]: it returns all the elements individually, not as a JSON array, i.e. in the format
{}
{}
instead of
[
{},
{}
]
Based on the given new.json and its description, you seem to have comma-separated JSON objects, with the object-separating commas on separate lines matching the regex '^}, *$'.
If that's the case, the good news is you can achieve the result you want by simply removing the superfluous commas with:
sed 's/^}, *$/}/' new.json
This produces a stream of objects, which can then be processed in any one of several well-known ways (e.g. by "slurping" it using the -s option).
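In Python terms, the same "process a stream of objects" idea can be sketched with json.JSONDecoder.raw_decode, which parses one object at a time and reports where it stopped (the stream contents here are stand-ins):

```python
import json

stream = '{"id": "a"}\n{"id": "b"}\n'  # concatenated JSON objects, no commas
decoder = json.JSONDecoder()

objs, idx = [], 0
while idx < len(stream):
    # raw_decode returns the parsed object and the index just past it
    obj, end = decoder.raw_decode(stream, idx)
    objs.append(obj)
    idx = end
    # skip the whitespace between objects
    while idx < len(stream) and stream[idx].isspace():
        idx += 1

print(objs)  # [{'id': 'a'}, {'id': 'b'}]
```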
"XY problem"?
In a side-comment, you wrote:
I did fix this with sed to add the commas, which are included in the question.
So it is beginning to sound as if the question as posted is really a so-called "XY" problem. Anyway, if you were starting with a stream of JSON objects, then of course there would be no need to add the commas and then deal with the consequences.

Troubleshoot JSON Parsing/Adding Property

I have a json whose first few lines are:
{
    "type": "Topology",
    "objects": {
        "counties": {
            "type": "GeometryCollection",
            "bbox": [-179.1473399999999, 17.67439566600018, 179.7784800000003, 71.38921046500008],
            "geometries": [{
                "type": "MultiPolygon",
                "id": 53073,
                "arcs": [
                    [
                        [0, 1, 2]
                    ]
                ]
            },
I built a python dictionary from that data as follows:
import json
with open('us.json') as f:
    data = json.load(f)
It's a very long JSON file (each county in the US). Yet when I run len(data) it returns 4. I was a bit confused by that, so I set out to probe further and explore the data:
data['id']
data['geometry']
both of which return KeyErrors. Yet I know that this JSON file defines those properties. In fact, that's all the JSON is: the id for each county ('id') and a series of polygon coordinates for each county ('geometry'). Entering data does indeed return the whole JSON, and I can see the properties that way, but that doesn't help much.
My ultimate aim is to add a property to the json file, somewhat similar to this:
Add element to a json in python
The difference is I'm adding a property that is from a tsv. If you'd like all the details you may find my json and tsv here:
https://gist.github.com/diggetybo/ca9d3c2fed76ddc7185cf966a65b8718
For clarity, let me summarize what I'm asking:
My question is: Why can't I access the properties in the above way? Can someone provide a way to access the properties I'm interested in ('id','geometries') Or better yet, demonstrate how to add a property?
Thank you
json.load
Deserialize fp (a .read()-supporting file-like object containing a
JSON document) to a Python object using this conversion table.
[] is for lists and {} is for dictionaries. So this is an example of how to get each id:
with open("us.json") as f:
    c = json.load(f)
for i in c["objects"]["counties"]["geometries"]:
    print(i["id"])
And the structure of your data is like this:
{
    "type": "xx",
    "objects": "xx",
    "arcs": "xx",
    "transform": "xx"
}
So the length of data is 4. You can append data or add a new element just as you would with a list or dict; see the json module documentation for more details.
Hope this helps.
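For the "add a property" part of the question, a minimal sketch (the structure mirrors the TopoJSON above; the names dict and its values are hypothetical stand-ins for whatever is parsed from the TSV):

```python
# Abbreviated stand-in for the parsed us.json
data = {"objects": {"counties": {"geometries": [
    {"type": "MultiPolygon", "id": 53073},
    {"type": "MultiPolygon", "id": 53074},
]}}}

# Hypothetical id -> name lookup, e.g. built from the TSV rows
names = {53073: "Whatcom", 53074: "Example"}

# Attach a new "name" property to each geometry in place
for geom in data["objects"]["counties"]["geometries"]:
    geom["name"] = names.get(geom["id"], "")

print(data["objects"]["counties"]["geometries"][0]["name"])  # Whatcom
```

Afterwards json.dump(data, f) writes the augmented structure back out.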

Trying to convert a CSV into JSON in python for posting to REST API

I've got the following data in a CSV file (a few hundred lines) that I'm trying to massage into sensible JSON to post into a rest api
I've gone with the bare minimum fields required, but here's what I've got:
dateAsked,author,title,body,answers.author,answers.body,topics.name,answers.accepted
13-Jan-16,Ben,Cant set a channel ,"Has anyone had any issues setting channels. it stays at '0'. It actually tells me there are '0' files.",Silvio,"I'm not sure. I think you can leave the cable out, because the control works. But you could try and switch two port and see if problem follows the serial port. maybe 'extended' clip names over 32 characters.
Please let me know if you find out!
Best regards.",club_k,TRUE
Here's a sample of JSON that is roughly like where I need to get to:
json_test = """{
    "title": "Can I answer a question?",
    "body": "Some text for the question",
    "author": "Silvio",
    "topics": [
        {
            "name": "club_k"
        }
    ],
    "answers": [
        {
            "author": "john",
            "body": "I\'m not sure. I think you can leave the cable out. Please let me know if you find out! Best regards.",
            "accepted": "true"
        }
    ]
}"""
Pandas seems to import it into a dataframe okay (ish), but keeps telling me it can't serialize it to JSON. I also need to clean and sanitise the data, but that should be fairly easy to achieve within the script.
There must also be a way to do this in Pandas, but I'm beating my head against a wall here, as the columns for both answers and topics can't easily be merged into a dict or a list in Python.
You can use a csv.DictReader to process the CSV file as a dictionary for each row. Using the field names as keys, a new dictionary can be constructed that groups common keys into a nested dictionary keyed by the part of the field name after the .. The nested dictionary is held within a list, although it is unclear whether that is really necessary - the nested dictionary could probably be placed immediately under the top-level without requiring a list. Here's the code to do it:
import csv
import json

json_data = []
for row in csv.DictReader(open('/tmp/data.csv')):
    data = {}
    for field in row:
        key, _, sub_key = field.partition('.')
        if not sub_key:
            data[key] = row[field]
        else:
            if key not in data:
                data[key] = [{}]
            data[key][0][sub_key] = row[field]
    # print(json.dumps(data, indent=True))
    # print('---------------------------')
    json_data.append(json.dumps(data))
For your data, with the print() statements enabled, the output would be:
{
    "body": "Has anyone had any issues setting channels. it stays at '0'. It actually tells me there are '0' files.",
    "author": "Ben",
    "topics": [
        {
            "name": "club_k"
        }
    ],
    "title": "Cant set a channel ",
    "answers": [
        {
            "body": "I'm not sure. I think you can leave the cable out, because the control works. But you could try and switch two port and see if problem follows the serial port. maybe 'extended' clip names over 32 characters. \nPlease let me know if you find out!\n Best regards.",
            "accepted ": "TRUE",
            "author": "Silvio"
        }
    ],
    "dateAsked": "13-Jan-16"
}
---------------------------

Storing sequence Information in JSON

I want to store some sequence information in a JSON. For example, I want to store a variable value which can have following values:
some_random_string_2
some_random_string_3
some_random_string_4
...
To do so, I have tried using the following format:
json_obj = {
    "k1": {
        "nk1": "some_random_string_{$1}",
        "patterns": {
            "p1": {
                "pattern": "[2-9]|[1-9]\d+",
                "symbol_type": "int",
                "start_symbol": 2,
                "step": 1
            }
        }
    }
}
The above JSON contains a regex pattern for the variable part of the string, its type, start symbol and step, but it seems unnecessarily complicated and difficult to generate a sequence from.
Is there a simpler way to store this sequence information, so that it's easier to generate the sequence while parsing?
Currently, I don't have an exhaustive list of patterns, so we'll have to assume it can be anything that can be written as a regular expression. On a side note, I'll be using Python to parse this JSON and generate the sequence.
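One possible simplification, assuming the sequence really is arithmetic (this replaces the regex with an explicit template plus range parameters; the key names are my own suggestion, not an established format):

```python
# A format-string template plus start/step/count fully determines the sequence,
# so no regex needs to be parsed or reversed at generation time.
spec = {"template": "some_random_string_{}", "start": 2, "step": 1, "count": 3}

seq = [spec["template"].format(spec["start"] + i * spec["step"])
       for i in range(spec["count"])]
print(seq)  # ['some_random_string_2', 'some_random_string_3', 'some_random_string_4']
```

The regex form would still be needed if the variable part can be something other than an arithmetic integer sequence.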
