I am new to JSON formatted files.
I have a Pandas DataFrame:
import pandas as pd
df = pd.DataFrame([["A", "2014/01/01", "2014/01/02", "A", -0.0061, "A"],
                   ["A", "2015/07/11", "2015/08/21", "A", 1.50, "A"],
                   ["C", "2016/01/01", "2016/01/05", "U", 2.75, "R"],
                   ["D", "2013/05/19", "2014/09/30", "Q", -100.0, "N"],
                   ["B", "2015/08/22", "2015/09/01", "T", 10.0, "R"]],
                  columns=["P", "Start", "End", "Category", "Value", "Group"]
)
That looks like this:
   P       Start         End Category     Value Group
0  A  2014/01/01  2014/01/02        A   -0.0061     A
1  A  2015/07/11  2015/08/21        A    1.5000     A
2  C  2016/01/01  2016/01/05        U    2.7500     R
3  D  2013/05/19  2014/09/30        Q -100.0000     N
4  B  2015/08/22  2015/09/01        T   10.0000     R
I know that I could convert this to JSON via:
df.to_json("output.json")
But I need to convert it to a nested JSON format like this:
{
"group_list": [
{
"category_list": [
{
"category": "A",
"p_list": [
{
"p": "A",
"date_list": [
{
"start": "2014/01/01",
"end": "2014/01/02",
"value": "-0.0061"
}
]
},
{
"p": "A",
"date_list": [
{
"start": "2015/07/11",
"end": "2015/08/21",
"value": "1.5000"
}
]
}
]
}
],
"group": "A"
},
{
"category_list": [
{
"category": "U",
"p_list": [
{
"p": "C",
"date_list": [
{
"start": "2016/01/01",
"end": "2016/01/05",
"value": "2.7500"
}
]
}
]
},
{
"category": "T",
"p_list": [
{
"p": "B",
"date_list": [
{
"start": "2015/08/22",
"end": "2015/09/01",
"value": "10.000"
}
]
}
]
}
],
"group": "R"
},
{
"category_list": [
{
"category": "Q",
"p_list": [
{
"p": "D",
"date_list": [
{
"start": "2013/05/19",
"end": "2014/09/30",
"value": "-100.0000"
}
]
}
]
}
],
"group": "N"
}
]
}
I've considered using Pandas' groupby functionality but I can't quite figure out how I could then get it into the final JSON format. Essentially, the nesting begins with grouping together rows with the same "group" and "category" columns. Afterwards, it is a matter of listing out the rows. I could write some code with nested for-loops but I'm hoping that there is a more efficient way to accomplish this.
Update
I can also manipulate my DataFrame via:
df2 = df.set_index(['Group', 'Category', 'P']).stack()
Group  Category  P
A      A         A  Start    2014/01/01
                    End      2014/01/02
                    Value       -0.0061
                    Start    2015/07/11
                    End      2015/08/21
                    Value           1.5
R      U         C  Start    2016/01/01
                    End      2016/01/05
                    Value          2.75
N      Q         D  Start    2013/05/19
                    End      2014/09/30
                    Value          -100
R      T         B  Start    2015/08/22
                    End      2015/09/01
                    Value            10
which is close to where I need to be but I don't think one could call df2.to_json() in this case.
The nested loop below should get you pretty close:
import json
json_dict = {'group_list': []}
for grp, grp_data in df.groupby('Group'):
    # Create the category list once per group, not once per category;
    # otherwise each new category overwrites the previous ones.
    grp_dict = {'group': grp, 'category_list': []}
    for cat, cat_data in grp_data.groupby('Category'):
        cat_dict = {'category': cat, 'p_list': []}
        for p, p_data in cat_data.groupby('P'):
            # Keep only Start, End and Value for the innermost records.
            p_data = p_data.drop(['Category', 'Group'], axis=1).set_index('P')
            for d in p_data.to_dict(orient='records'):
                cat_dict['p_list'].append({'p': p, 'date_list': [d]})
        grp_dict['category_list'].append(cat_dict)
    json_dict['group_list'].append(grp_dict)
print(json.dumps(json_dict, indent=4, sort_keys=True))
resulting in:
{
    "group_list": [
        {
            "category_list": [
                {
                    "category": "A",
                    "p_list": [
                        {
                            "date_list": [
                                {
                                    "End": "2014/01/02",
                                    "Start": "2014/01/01",
                                    "Value": -0.0061
                                }
                            ],
                            "p": "A"
                        },
                        {
                            "date_list": [
                                {
                                    "End": "2015/08/21",
                                    "Start": "2015/07/11",
                                    "Value": 1.5
                                }
                            ],
                            "p": "A"
                        }
                    ]
                }
            ],
            "group": "A"
        },
        {
            "category_list": [
                {
                    "category": "Q",
                    "p_list": [
                        {
                            "date_list": [
                                {
                                    "End": "2014/09/30",
                                    "Start": "2013/05/19",
                                    "Value": -100.0
                                }
                            ],
                            "p": "D"
                        }
                    ]
                }
            ],
            "group": "N"
        },
        {
            "category_list": [
                {
                    "category": "T",
                    "p_list": [
                        {
                            "date_list": [
                                {
                                    "End": "2015/09/01",
                                    "Start": "2015/08/22",
                                    "Value": 10.0
                                }
                            ],
                            "p": "B"
                        }
                    ]
                },
                {
                    "category": "U",
                    "p_list": [
                        {
                            "date_list": [
                                {
                                    "End": "2016/01/05",
                                    "Start": "2016/01/01",
                                    "Value": 2.75
                                }
                            ],
                            "p": "C"
                        }
                    ]
                }
            ],
            "group": "R"
        }
    ]
}
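The nested loop above can also be written as a small recursive helper, which avoids one hand-written loop per level. This is only a sketch: `nest` and its `levels` argument are names made up here, not part of pandas. Unlike the desired output in the question, it collects all rows that share the same P into a single `date_list`, and it keeps the original column names (Start, End, Value) in the innermost records.

```python
import json
import pandas as pd

df = pd.DataFrame(
    [["A", "2014/01/01", "2014/01/02", "A", -0.0061, "A"],
     ["A", "2015/07/11", "2015/08/21", "A", 1.50, "A"],
     ["C", "2016/01/01", "2016/01/05", "U", 2.75, "R"],
     ["D", "2013/05/19", "2014/09/30", "Q", -100.0, "N"],
     ["B", "2015/08/22", "2015/09/01", "T", 10.0, "R"]],
    columns=["P", "Start", "End", "Category", "Value", "Group"],
)


def nest(frame, levels):
    """Hypothetical helper: group by the first column in `levels`, recurse on the rest."""
    if not levels:
        # Leaf level: the remaining columns become plain records.
        return frame.to_dict(orient="records")
    col, rest = levels[0], levels[1:]
    # Child list name: "<next level>_list", or "date_list" at the bottom.
    child_key = f"{rest[0].lower()}_list" if rest else "date_list"
    return [
        {col.lower(): key, child_key: nest(group.drop(columns=col), rest)}
        for key, group in frame.groupby(col, sort=False)  # keep first-seen order
    ]


result = {"group_list": nest(df, ["Group", "Category", "P"])}
print(json.dumps(result, indent=4))
```

Passing `sort=False` keeps the groups in the order they first appear in the DataFrame; drop it if sorted keys are preferred.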
Related
I have an index in Elastic that contains an array of keys and values.
For example - a single document looks like this:
{
"_index": "my_index",
"_source": {
"name": "test",
"values": [
{
"name": "a",
"score": 10
},
{
"name": "b",
"score": 4
},
{
"name": "c",
"score": 2
},
{
"name": "d",
"score": 1
}
]
},
"fields": {
"name": [
"test"
],
"values.name.keyword": [
"a",
"b",
"c",
"d"
],
"name.keyword": [
"test"
],
"values.score": [
10,
4,
2,
1
],
"values.name": [
"a",
"b",
"c",
"d"
]
}
}
I want to create an Elastic query (through API) that retrieves a sum of all the name scores filtered by a list of names.
For example, for the input:
names = ['a', 'b']
The result will be: 14
Any idea how to do it?
You can do this by making the values array nested. Example mapping:
{
"mappings": {
"properties": {
"values": { "type": "nested" }
}
}
}
The following query will give the result you want:
{
"size":0,
"aggs": {
"asd": {
"nested": {
"path": "values"
},
"aggs": {
"filter_agg": {
"filter": {
"terms": {
"values.name.keyword": [
"a",
"b"
]
}
},
"aggs": {
"sum": {
"sum": {
"field": "values.score"
}
}
}
}
}
}
}
}
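Since the question asks for the query to be sent through the API with a variable list of names, the request body can be built programmatically. The sketch below only constructs the body and reads the result; `build_sum_query` and `extract_sum` are hypothetical helper names, and the sample response is hand-written on the assumption that the aggregation result is nested under the same aggregation names used in the query. Sending the request is left to whichever client you use.

```python
def build_sum_query(names):
    """Build the nested/filter/sum aggregation body for the given names."""
    return {
        "size": 0,
        "aggs": {
            "asd": {
                "nested": {"path": "values"},
                "aggs": {
                    "filter_agg": {
                        "filter": {"terms": {"values.name.keyword": list(names)}},
                        "aggs": {"sum": {"sum": {"field": "values.score"}}},
                    }
                },
            }
        },
    }


def extract_sum(response):
    """Read the sum from a response, following the aggregation names above."""
    return response["aggregations"]["asd"]["filter_agg"]["sum"]["value"]


# Local cross-check using the document from the question.
values = [
    {"name": "a", "score": 10},
    {"name": "b", "score": 4},
    {"name": "c", "score": 2},
    {"name": "d", "score": 1},
]
names = ["a", "b"]
expected = sum(v["score"] for v in values if v["name"] in names)

query = build_sum_query(names)

# Hand-written response for illustration only (assumed shape, not real output).
sample_response = {
    "aggregations": {
        "asd": {"doc_count": 4, "filter_agg": {"doc_count": 2, "sum": {"value": 14.0}}}
    }
}
print(expected, extract_sum(sample_response))
```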
I'm trying to convert a dataframe to a particular JSON format. I've attempted doing this using the methods "to_dict()" and "json.dump()" from the pandas and json modules, respectively, but I can't get the JSON format I'm after. To illustrate:
import json
import pandas as pd
df = pd.DataFrame({
    "Location": ["1ST"] * 3 + ["2ND"] * 3,
    "Date": ["2019-01", "2019-02", "2019-03"] * 2,
    "Category": ["A", "B", "C"] * 2,
    "Number": [1, 2, 3, 4, 5, 6]
})
def dataframe_to_dictionary(df, orientation):
    dictionary = df.to_dict(orient=orientation)
    return dictionary
dict_records = dataframe_to_dictionary(df, "records")
with open("./json_records.json", "w") as json_records:
    json.dump(dict_records, json_records, indent=2)
dict_index = dataframe_to_dictionary(df, "index")
with open("./json_index.json", "w") as json_index:
    json.dump(dict_index, json_index, indent=2)
When I convert "dict_records" to JSON, I get an array of the form:
[
{
"Location": "1ST",
"Date": "2019-01",
"Category": "A",
"Number": 1
},
{
"Location": "1ST",
"Date": "2019-02",
"Category": "B",
"Number": 2
},
...
]
And, when I convert "dict_index" to JSON, I get an object of the form:
{
"0": {
"Location": "1ST",
"Date": "2019-01",
"Category": "A",
"Number": 1
},
"1": {
"Location": "1ST",
"Date": "2019-02",
"Category": "B",
"Number": 2
}
...
}
But I'm trying to get a format like the following, where each key is a location and each value is a list of records. Thanks in advance for your help.
{
  "1ST": [
    {
      "Date": "2019-01",
      "Category": "A",
      "Number": 1
    },
    {
      "Date": "2019-02",
      "Category": "B",
      "Number": 2
    },
    {
      "Date": "2019-03",
      "Category": "C",
      "Number": 3
    }
  ],
  "2ND": [
    {},
    {},
    {}
  ]
}
This can be achieved via groupby:
gb = df.groupby('Location')
{k: v.drop('Location', axis=1).to_dict(orient='records') for k, v in gb}
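For completeness, here is a minimal end-to-end sketch of the groupby approach above, serialized to a JSON string. The variable names `by_location` and `json_text` are made up for this example, and it assumes a reasonably recent pandas in which `to_dict(orient='records')` yields plain Python values that `json.dumps` can serialize.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "Location": ["1ST"] * 3 + ["2ND"] * 3,
    "Date": ["2019-01", "2019-02", "2019-03"] * 2,
    "Category": ["A", "B", "C"] * 2,
    "Number": [1, 2, 3, 4, 5, 6],
})

# One key per location; each value is the list of that location's rows
# with the now-redundant Location column removed.
by_location = {
    location: group.drop(columns="Location").to_dict(orient="records")
    for location, group in df.groupby("Location")
}

json_text = json.dumps(by_location, indent=2)
print(json_text)
```

To write a file instead of a string, the same dictionary can be passed to `json.dump` together with an open file handle, as in the question.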
My final output JSON file is in the following format:
[
{
"Type": "UPDATE",
"resource": {
"site ": "Lakeside mh041",
"name": "Total Flow",
"unit": "CubicMeters",
"device": "2160 LaserFlow Module",
"data": [
{
"timestamp": [
"1087009200"
],
"value": [
6945.68
]
},
{
"timestamp": [
"1087095600"
],
"value": [
NaN
]
},
{
"timestamp": [
"1087182000"
],
"value": [
7091.62
]
},
I want to remove the whole object if the "value" is NaN.
Expected Output
[
{
"Type": "UPDATE",
"resource": {
"site ": "Lakeside mh041",
"name": "Total Flow",
"unit": "CubicMeters",
"device": "2160 LaserFlow Module",
"data": [
{
"timestamp": [
"1087009200"
],
"value": [
6945.68
]
},
{
"timestamp": [
"1087182000"
],
"value": [
7091.62
]
},
I cannot remove the blank values from my csv file because of the format of the file.
I have tried this:
with open('Result.json', 'r') as j:
    json_dict = json.loads(j.read())
json_dict['data'] = [item for item in json_dict['data'] if
                     len([val for val in item['value'] if isnan(val)]) == 0]
print(json_dict)
Error:
json_dict['data'] = [item for item in json_dict['data'] if len([val for val in item['value'] if isnan(val)]) == 0]
TypeError: list indices must be integers or slices, not str
In case you have more than one value in "value": [...], you can use:
import json
from math import isnan
json_str = '''
[
{
"Type": "UPDATE",
"resource": {
"site ": "Lakeside mh041",
"name": "Total Flow",
"unit": "CubicMeters",
"device": "2160 LaserFlow Module",
"data": [
{
"timestamp": [
"1087009200"
],
"value": [
6945.68
]
},
{
"timestamp": [
"1087095600"
],
"value": [
NaN
]
}
]
}
}
]
'''
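json_dict = json.loads(json_str)
# The top level is a list, so iterate over it before indexing by key.
for typeObj in json_dict:
    resource_node = typeObj['resource']
    # Keep only the entries whose "value" list contains no NaN.
    resource_node['data'] = [
        item for item in resource_node['data']
        if not any(isnan(val) for val in item['value'])
    ]
print(json_dict)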
To test whether a value is NaN, you could use the math.isnan() function:
data = '''{"data": [
{
"timestamp": [
"1058367600"
],
"value": [
9.65
]
},
{
"timestamp": [
"1058368500"
],
"value": [
NaN
]
},
{
"timestamp": [
"1058367600"
],
"value": [
4.75
]
}
]}'''
import json
from math import isnan
data = json.loads(data)
data['data'] = [i for i in data['data'] if not isnan(i['value'][0])]
print(json.dumps(data, indent=4))
Prints:
{
"data": [
{
"timestamp": [
"1058367600"
],
"value": [
9.65
]
},
{
"timestamp": [
"1058367600"
],
"value": [
4.75
]
}
]
}
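Both answers can be combined into one small helper that is applied to every record of the top-level list. This is a sketch under assumptions: `drop_nan_entries` is a name invented here, and the inline string is a shortened version of the file from the question. Python's `json` module accepts the non-standard `NaN` literal when loading, and passing `allow_nan=False` when dumping raises a `ValueError` if any NaN is left, which makes a convenient final check.

```python
import json
from math import isnan


def drop_nan_entries(entries):
    """Hypothetical helper: keep entries whose "value" list contains no NaN."""
    return [
        entry for entry in entries
        if not any(isinstance(v, float) and isnan(v) for v in entry["value"])
    ]


# Shortened version of the file from the question.
raw = """
[{"Type": "UPDATE",
  "resource": {"name": "Total Flow",
               "data": [{"timestamp": ["1087009200"], "value": [6945.68]},
                        {"timestamp": ["1087095600"], "value": [NaN]},
                        {"timestamp": ["1087182000"], "value": [7091.62]}]}}]
"""

records = json.loads(raw)  # the NaN literal is accepted by default
for record in records:
    resource = record["resource"]
    resource["data"] = drop_nan_entries(resource["data"])

# allow_nan=False raises ValueError if a NaN survived the filtering.
cleaned = json.dumps(records, indent=2, allow_nan=False)
print(cleaned)
```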
I have a problem: I want to convert a list from my JSON file to float. When I convert a smaller file with the same format, the result is what I expect. But when I switch to a bigger file with the same format, I get an error like this:
could not convert string to float: '-2.942-2.942'. However, when I checked, there is no string '-2.942-2.942' in my bigger JSON file. So my question is: is this a problem with my RAM, such that my computer can't convert the list to float?
I have this JSON data:
{
"En": -2.942,
"atoms": [
{
"type": "Br",
"xyz": [
-4.0223,
0.5054,
-0.022
]
},
{
"type": "Cl",
"xyz": [
3.9221,
0.4837,
0.009
]
},
{
"type": "N",
"xyz": [
0.0218,
-0.5066,
-0.0031
]
},
{
"type": "C",
"xyz": [
-1.1862,
0.3061,
0.0071
]
},
{
"type": "C",
"xyz": [
1.2137,
0.3338,
0.0107
]
},
{
"type": "C",
"xyz": [
-2.4113,
-0.5886,
0.0255
]
},
{
"type": "C",
"xyz": [
2.4622,
-0.5338,
-0.0271
]
},
{
"type": "H",
"xyz": [
-1.2118,
0.9549,
-0.8777
]
},
{
"type": "H",
"xyz": [
-1.1963,
0.9561,
0.8911
]
},
{
"type": "H",
"xyz": [
1.2242,
0.9489,
0.9188
]
},
{
"type": "H",
"xyz": [
1.2105,
1.0092,
-0.854
]
},
{
"type": "H",
"xyz": [
0.0291,
-1.1038,
-0.8296
]
},
{
"type": "H",
"xyz": [
-2.46,
-1.2431,
-0.8498
]
},
{
"type": "H",
"xyz": [
-2.4697,
-1.1924,
0.9358
]
},
{
"type": "H",
"xyz": [
2.5076,
-1.2058,
0.8359
]
},
{
"type": "H",
"xyz": [
2.5034,
-1.1368,
-0.9398
]
}
],
"id": 16164,
"shapeM": [
146.89,
7.96,
0.86,
0.63,
0.1,
0.01,
0.0,
1.5,
0.03,
0.07,
0.0,
0.06,
0.02,
0.02
]
}
And this is my python script:
import numpy as np
import pandas as pd
import json
from itertools import groupby
from operator import itemgetter
with open('testdata.json') as f:
    data = json.load(f)
grouper = itemgetter("En")
total = 0
quantity = 0
for key, grp in groupby(sorted(data, key=grouper), grouper):
    total = float(''.join(map(str, [item["En"] for item in grp])))  # converting list to str to float
    quantity += 1
average = total / quantity
print(average, total, quantity)
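The error is not related to RAM. `groupby` puts every record with the same "En" into one group, and `''.join(...)` then glues the string forms of those values together, so two records with En = -2.942 produce the single string '-2.942-2.942', which `float()` cannot parse. A smaller file presumably just happens to contain no duplicate values. Below is a minimal sketch of computing the average directly, assuming the file holds a list of objects like the one shown; the inline sample data (apart from the first record's "En" and "id") is made up for illustration.

```python
import json

# Made-up sample: two records share the same "En" value.
raw = '[{"En": -2.942, "id": 16164}, {"En": -2.942, "id": 1}, {"En": -1.5, "id": 2}]'
data = json.loads(raw)

# What goes wrong in the original loop for a group with duplicates:
joined = ''.join(map(str, [-2.942, -2.942]))
print(repr(joined))  # this string is not a valid float literal

# The values are already floats, so no string conversion is needed.
energies = [item["En"] for item in data]
total = sum(energies)
quantity = len(energies)
average = total / quantity
print(average, total, quantity)
```

If the average over distinct values is wanted instead, the list can be reduced with `set(energies)` before summing.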
I have this JSON file (this is a small part of it):
[
{
"History bleed": {
"sentences": [
{
"words": [
[
"History",
{
"PartOfSpeech": "NN",
"CharacterOffsetEnd": "7",
"Lemma": "history",
"NamedEntityTag": "O",
"CharacterOffsetBegin": "0"
}
],
[
"bleed",
{
"PartOfSpeech": "VB",
"CharacterOffsetEnd": "39",
"Lemma": "bleed",
"NamedEntityTag": "O",
"CharacterOffsetBegin": "34"
}
]
],
"indexeddependencies": [],
"parsetree": [],
"text": "History of lower gastrointestinal bleed",
"dependencies": []
}
]
}
},
{
"Antigen of Bordetella": {
"sentences": [
{
"words": [
[
"Antigen",
{
"PartOfSpeech": "NN",
"CharacterOffsetEnd": "7",
"Lemma": "antigen",
"NamedEntityTag": "O",
"CharacterOffsetBegin": "0"
}
],
[
"of",
{
"PartOfSpeech": "IN",
"CharacterOffsetEnd": "10",
"Lemma": "of",
"NamedEntityTag": "O",
"CharacterOffsetBegin": "8"
}
],
[
"Bordetella",
{
"PartOfSpeech": "NN",
"CharacterOffsetEnd": "21",
"Lemma": "bordetellum",
"NamedEntityTag": "PERSON",
"CharacterOffsetBegin": "11"
}
]
],
"indexeddependencies": [],
"parsetree": [],
"text": "Antigen of Bordetella",
"dependencies": []
}
]
}
},
{
"Anti-Histoplasma": {
"sentences": [
{
"words": [
[
"Anti-Histoplasma",
{
"PartOfSpeech": "JJ",
"CharacterOffsetEnd": "16",
"Lemma": "anti-histoplasma",
"NamedEntityTag": "O",
"CharacterOffsetBegin": "0"
}
]
],
"indexeddependencies": [],
"parsetree": [],
"text": "Anti-Histoplasma capsulatum IgG",
"dependencies": []
}
]
}
}
]
and I want to get this:
{
"sentences": [
{
"words": [
[
"Antigen",
{
"PartOfSpeech": "NN",
"CharacterOffsetEnd": "7",
"Lemma": "antigen",
"NamedEntityTag": "O",
"CharacterOffsetBegin": "0"
}
],
[
"of",
{
"PartOfSpeech": "IN",
"CharacterOffsetEnd": "10",
"Lemma": "of",
"NamedEntityTag": "O",
"CharacterOffsetBegin": "8"
}
],
[
"Bordetella",
{
"PartOfSpeech": "NN",
"CharacterOffsetEnd": "21",
"Lemma": "bordetellum",
"NamedEntityTag": "PERSON",
"CharacterOffsetBegin": "11"
}
]
],
"indexeddependencies": [],
"parsetree": [],
"text": "Antigen of Bordetella",
"dependencies": []
}
]
}
To obtain that I wrote this:
with open(pathOfTheJsonFIle) as f:
    data = json.load(f)
print(data['Antigen of Bordetella'])
but I get this error: list indices must be integers, not str
This file is quite big (there are more than 10,000 items), so I would like to find the item Antigen of Bordetella using some index (and not by writing data[2], for example).
That's because the JSON file starts with a list and not a dictionary.
Try:
for i in data:
    if 'Antigen of Bordetella' in i:
        print(i)
Using a lazy filter you could do this (in Python 3 the built-in filter replaces itertools.ifilter from Python 2):
...
searchkey = "Antigen of Bordetella"
search_data = next(filter(lambda x: searchkey in x, data))[searchkey]
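Both approaches above scan the list on every lookup. If many lookups are needed, one option is to build a dictionary once and index into it afterwards. This is a sketch, not part of the original answers: the name `index` is made up, the sample data is a heavily shortened stand-in for the real file, and it assumes each key appears in only one list item (a later duplicate would overwrite an earlier one).

```python
# Shortened stand-in for the list loaded with json.load(f).
data = [
    {"History bleed": {"sentences": [{"text": "History of lower gastrointestinal bleed"}]}},
    {"Antigen of Bordetella": {"sentences": [{"text": "Antigen of Bordetella"}]}},
    {"Anti-Histoplasma": {"sentences": [{"text": "Anti-Histoplasma capsulatum IgG"}]}},
]

# Merge the single-key dictionaries into one lookup table (one pass over the list).
index = {key: value for item in data for key, value in item.items()}

search_data = index["Antigen of Bordetella"]
print(search_data)

# dict.get avoids a KeyError when the key is missing.
print(index.get("No such entry"))
```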