I have a JSON file that looks like the following (not the whole file)
"route-information" : [
{
"attributes" : {"xmlns" : "http://xml.juniper.net"},
"route-table" : [
{
"comment" : "keepalive",
"table-name" : [
{
"data" : "inet"
}
],
"destination-count" : [
{
"data" : "24324"
}
],
"total-route-count" : [
{
"data" : "47432"
}
],
"active-route-count" : [
{
"data" : "43252"
}
],
"holddown-route-count" : [
{
"data" : "0"
}
],
"hidden-route-count" : [
{
"data" : "1"
}
],
I am trying to access the 'comment' part by using python.
So far I have this:
import json
# read file
with open('route-table.json') as file:
    data = json.load(file)

print(data["route-information"]["route-table"]["comment"])
Whenever I run this part I get the following error and I can't seem to fix it.
Traceback (most recent call last):
File "json_test.py", line 7, in <module>
print(data["route-information"]["route-table"]["comment"])
TypeError: list indices must be integers or slices, not str
data["route-information"] is a list, so you can do data["route-information"][0] to access the dict inside, same with data["route-information"][0]["route-table"]:
print(data["route-information"][0]["route-table"][0]["comment"])
If you intend to use data later and are okay with changing its structure, you can replace the lists with their first elements (assuming they only have one element) so that you won't have to use the [0] notation every time you need to access the dicts:
data["route-information"] = data["route-information"][0]
data["route-information"]["route-table"] = data["route-information"]["route-table"][0]
print(data["route-information"]["route-table"]["comment"])
MrGeek is correct. Just to add some more info:
[] are for JSON arrays, which are called list in Python
{} are for JSON objects, which are called dict in Python
To get the value we do as follows:
data["route-information"][0]["route-table"][0]["comment"]
data["attributes"]["xmlns"]
I'm getting the JSON data from a file:
{
"students": [
{
"name" : "ben",
"age" : 15
},
{
"name" : "sam",
"age" : 14
}
]
}
Here's my initial code:
import json

def get_names():
    students = open('students.json')
    data = json.load(students)
I want to get the values of all the names:
[ben,sam]
You need to extract the names from the students list.
data = {"students": [
{
"name" : "ben",
"age" : 15
},
{
"name" : "sam",
"age" : 14
}
]
}
names = [each_student['name'] for each_student in data['students']]
print(names) #['ben', 'sam']
Try using a list comprehension:
>>> [dct['name'] for dct in data['students']]
['ben', 'sam']
>>>
import json
with open('./students.json', 'r') as students_file:
    students_content = json.load(students_file)

print([student['name'] for student in students_content['students']])  # ['ben', 'sam']
JSON's load function from the docs:
Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object...
The JSON file in students.json will look like:
{
"students": [
{
"name" : "ben",
"age" : 15
},
{
"name" : "sam",
"age" : 14
}
]
}
The JSON load function can then be used to deserialize this JSON object in the file to a Python dictionary:
import json
# use with context manager to ensure the file closes properly
with open('students.json', 'rb') as students_fp:
    data = json.load(students_fp)

print(type(data))  # dict, i.e. a Python dictionary

# list comprehension to take the name of each student
names = [student['name'] for student in data['students']]
Where names now contains the desired:
["ben", "sam"]
Sort of a new Python guy here, and I haven't had much success with the following.
I have a txt file with data formatted as follows:
{
"$type" : "TableInstance",
"$version" : 1,
"Instance" : "InstanceName",
"ColumnAliases" : [ "", "", ],
"ColumnNames" : [ "keyName", "dateName"],
"ColumnData" : [ {
"type" : "ColumnData1",
"Strings" : [key1, key2],]
}, {
"type" : "ColumnData2",
"Strings" : [date1, date2]}]
}
I would like to read this into a dataframe formatted as:
[ keyName  dateName
  key1     date1
  key2     date2 ]
Is there a simple way to do this?
Does this work for you?
import pandas as pd

# avoid naming the variable `dict`, which would shadow the built-in type
data = {
    "$type" : "TableInstance",
    "$version" : 1,
    "Instance" : "InstanceName",
    "ColumnAliases" : [ "", "" ],
    "ColumnNames" : [ "keyName", "dateName" ],
    "ColumnData" : [ {
        "type" : "ColumnData1",
        "Strings" : ['key1', 'key2']
    }, {
        "type" : "ColumnData2",
        "Strings" : ['date1', 'date2']
    } ]
}

df = pd.DataFrame({
    data['ColumnNames'][0] : data['ColumnData'][0]['Strings'],
    data['ColumnNames'][1] : data['ColumnData'][1]['Strings'],
})
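For reference, printing df built from the snippet above should give output along these lines (assuming a pandas/Python version that preserves the insertion order of the dict keys, so keyName comes before dateName):
print(df)
#   keyName dateName
# 0    key1    date1
# 1    key2    date2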
It looks like you stored a serialized Python object in the file. Hence, you can deserialize it with the help of pickle and then parse the object based on your requirements.
import pickle
import pandas as pd
filePath = 'test.txt'
obj = pd.read_pickle(filePath)
#obj = pickle.load(open(filePath, "rb"))
df = pd.DataFrame({
    obj['ColumnNames'][0] : obj['ColumnData'][0]['Strings'],
    obj['ColumnNames'][1] : obj['ColumnData'][1]['Strings'],
})
Sample JSON file below
{
"destination_addresses" : [ "New York, NY, USA" ],
"origin_addresses" : [ "Washington, DC, USA" ],
"rows" : [
{
"elements" : [
{
"distance" : {
"text" : "225 mi",
"value" : 361715
},
"duration" : {
"text" : "3 hours 49 mins",
"value" : 13725
},
"status" : "OK"
}
]
}
],
"status" : "OK"
}
I'm looking to reference the text values for distance and duration. I've done research but I'm still not sure what I'm doing wrong...
I have a workaround using several lines of code, but I'm looking for a clean one-line solution.
Thanks for your help!
If you're using the regular JSON module:
import json
And you're opening your JSON like this:
json_data = open("my_json.json").read()
data = json.loads(json_data)
# Equivalent to:
data = json.load(open("my_json.json"))
# Notice json.load vs. json.loads
Then this should do what you want:
distance_text, duration_text = [data['rows'][0]['elements'][0][key]['text'] for key in ['distance', 'duration']]
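If the response ever contains more than one origin or destination (i.e. several rows and elements), a short loop generalizes the same lookup (a sketch based only on the structure shown above):
for row in data['rows']:
    for element in row['elements']:
        if element['status'] == 'OK':   # skip elements that don't carry distance/duration
            print(element['distance']['text'], element['duration']['text'])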
Hope this is what you wanted!
I am using python to parse a json file full of url data to try and build a url reputation classifier. There are around 2,000 entries in the json file and not all of them have all of the fields present. A typical entry looks like this:
[
{
"host_len" : 12,
"fragment" : null,
"url_len" : 84,
"default_port" : 80,
"domain_age_days" : "5621",
"tld" : "com",
"num_domain_tokens" : 3,
"ips" : [
{
"geo" : "CN",
"ip" : "115.236.98.124",
"type" : "A"
}
],
"malicious_url" : 0,
"url" : "http://www.oppo.com/?utm_source=WeiBo&utm_medium=OPPO&utm_campaign=DailyFlow",
"alexa_rank" : "25523",
"query" : "utm_source=WeiBo&utm_medium=OPPO&utm_campaign=DailyFlow",
"file_extension" : null,
"registered_domain" : "oppo.com",
"scheme" : "http",
"path" : "/",
"path_len" : 1,
"port" : 80,
"host" : "www.oppo.com",
"domain_tokens" : [
"www",
"oppo",
"com"
],
"mxhosts" : [
{
"mxhost" : "mail1.oppo.com",
"ips" : [
{
"geo" : "CN",
"ip" : "121.12.164.123",
"type" : "A"
}
]
}
],
"path_tokens" : [
""
],
"num_path_tokens" : 1
}
]
I am trying to access the data stored in the fields "ips" and "mxhosts" to compare the "geo" location. To try and access the first "ips" field I'm using:
corpus = open(file)
urldata = json.load(corpus, encoding="latin1")
for record in urldata:
    print record["ips"][0]["geo"]
But as I mentioned not all of the json entries have all of the fields. "ips" is always present but sometimes it's "null" and the same goes for "geo". I'm trying to check for the data before accessing it using:
if(record["ips"] is not None and record["ips"][0]["geo"] is not None):
But I get this error:
if(record["ips"] is not None and record["ips"][0]["geo"] is not None):
TypeError: string indices must be integers
When I try to check it using this:
if("ips" in record):
I get this error message:
print record["ips"][0]["geo"]
TypeError: 'NoneType' object has no attribute '__getitem__'
So I'm not sure how to check whether the record I'm trying to access exists before I access it, or whether I'm even accessing it in the right way. Thanks.
You can simply check whether record["ips"] is not None, or more simply whether it's truthy, before accessing it as a list; otherwise you would be indexing a None object.
for record in urldata:
    if record["ips"]:
        print record["ips"][0]["geo"]
It ended up being a little convoluted due to the inconsistent nature of the JSON file: I first had to check that "ips" was not null and then check that "geo" was present in record["ips"][0]. This is what it looks like:
if(record["ips"] is not None and "geo" in record["ips"][0]):
print record["ips"][0]["geo"]
Thanks for the feedback everyone!
I'm writing a bash script that does a few things. Right now it copies a few files into the correct directories and runs a few commands. I need this bash script to edit a JSON file: essentially, the script would append a snippet of JSON to an existing JSON object in file.json. I cannot just append the data, because the JSON snippet must become part of an existing JSON object (it should be added to the tracks array). So is this possible to do with a bash script? Should I just write another Python or R script to handle this JSON logic, or is there a more elegant solution? Thanks for any help.
file.json looks like this...
{
"formatVersion" : 1,
"tracks" : [
{
"key" : "Reference sequence",
"chunkSize" : 20000,
"urlTemplate" : "seq/{refseq_dirpath}/{refseq}-",
"storeClass" : "JBrowse/Store/Sequence/StaticChunked",
"type" : "SequenceTrack",
"seqType" : "dna",
"category" : "Reference sequence",
"label" : "DNA"
},
{
"type" : "FeatureTrack",
"label" : "gff_track1",
"trackType" : null,
"key" : "gff_track1",
"compress" : 0,
"style" : {
"className" : "feature"
},
"storeClass" : "JBrowse/Store/SeqFeature/NCList",
"urlTemplate" : "tracks/gff_track1/{refseq}/trackData.json"
},
{
"storeClass" : "JBrowse/Store/SeqFeature/NCList",
"style" : {
"className" : "feature"
},
"urlTemplate" : "tracks/ITAG2.4_gene_models.gff3/{refseq}/trackData.json",
"key" : "ITAG2.4_gene_models.gff3",
"compress" : 0,
"trackType" : null,
"label" : "ITAG242.4_gene_models.gff3",
"type" : "FeatureTrack"
},
{
"urlTemplate" : "g-231FRL.bam",
"storeClass" : "JBrowse/Store/SeqFeature/BAM",
"label" : "g-1FRL.bam",
"type" : "JBrowse/View/Track/Alignments2",
"key" : "g-1FRL.bam"
}
]
}
the JSON snippet looks like this ...
{
"urlTemplate": "AX2_filtered.vcf.gz",
"label": "AX2_filtered.vcf.gz",
"storeClass": "JBrowse/Store/SeqFeature/VCFTabix",
"type": "CanvasVariants"
}
Do yourself a favor and install jq, then it's as simple as:
jq -n 'input | .tracks += [inputs]' file.json snippet.json > out.json
Trying to modify structured data (like JSON) without a proper parser is a fool's errand, and jq really makes it easy.
However, if you prefer doing it through Python (although it would be overkill for this kind of task), it's pretty much as straightforward as with jq:
import json
with open("file.json", "r") as f, open("snippet.json", "r") as s, open("out.json", "w") as u:
data = json.load(f) # parse `file.json`
data["tracks"].append(json.load(s)) # parse `snippet.json` and append it to `.tracks[]`
json.dump(data, u, indent=4) # encode the data back to JSON and write it to `out.json`
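If the goal is to update file.json in place rather than producing out.json, one option (a sketch, not the only way) is to write the merged result to a temporary file and then swap it in, so a failure mid-write cannot truncate the original:
import json
import os
import tempfile

with open("file.json") as f, open("snippet.json") as s:
    data = json.load(f)
    data["tracks"].append(json.load(s))

# write to a temporary file in the same directory, then atomically replace file.json
fd, tmp_path = tempfile.mkstemp(dir=".", suffix=".json")
with os.fdopen(fd, "w") as tmp:
    json.dump(data, tmp, indent=4)
os.replace(tmp_path, "file.json")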