I have an index in Elasticsearch that contains an array of keys and values.
For example, a single document looks like this:
{
  "_index": "my_index",
  "_source": {
    "name": "test",
    "values": [
      { "name": "a", "score": 10 },
      { "name": "b", "score": 4 },
      { "name": "c", "score": 2 },
      { "name": "d", "score": 1 }
    ]
  },
  "fields": {
    "name": [ "test" ],
    "values.name.keyword": [ "a", "b", "c", "d" ],
    "name.keyword": [ "test" ],
    "values.score": [ 10, 4, 2, 1 ],
    "values.name": [ "a", "b", "c", "d" ]
  }
}
I want to create an Elasticsearch query (through the API) that retrieves the sum of the scores, filtered by a list of names.
For example, for the input:
names = ['a', 'b']
The result will be: 14
Any idea how to do it?
You can do this by making the values array nested. Example mapping:
{
  "mappings": {
    "properties": {
      "values": { "type": "nested" }
    }
  }
}
The following query will give the result you want:
{
  "size": 0,
  "aggs": {
    "asd": {
      "nested": {
        "path": "values"
      },
      "aggs": {
        "filter_agg": {
          "filter": {
            "terms": {
              "values.name.keyword": ["a", "b"]
            }
          },
          "aggs": {
            "sum": {
              "sum": {
                "field": "values.score"
              }
            }
          }
        }
      }
    }
  }
}
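If you are sending this through the Python client, a minimal sketch might look like the following. The host, the index name my_index, and the client version are assumptions; the aggregation names match the query above, and on 8.x clients the body= argument is deprecated in favor of keyword arguments.

from elasticsearch import Elasticsearch  # assumes the official elasticsearch-py client

es = Elasticsearch("http://localhost:9200")  # adjust the host to your cluster

names = ["a", "b"]
query = {
    "size": 0,
    "aggs": {
        "asd": {
            "nested": {"path": "values"},
            "aggs": {
                "filter_agg": {
                    "filter": {"terms": {"values.name.keyword": names}},
                    "aggs": {"sum": {"sum": {"field": "values.score"}}},
                }
            },
        }
    },
}

resp = es.search(index="my_index", body=query)
print(resp["aggregations"]["asd"]["filter_agg"]["sum"]["value"])  # 14 for the example document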
How to convert a list of dictionaries to a list?
Here is what I have:
{
  "sources": [
    { "ID": "6953", "VALUE": "https://address-jbr.ofp.ae" },
    { "ID": "6967", "VALUE": "https://plots.ae" },
    { "ID": "6970", "VALUE": "https://dubai-creek-harbour.ofp.ae" }
  ]
}
Here is what I want it to look like:
({'6953':'https://address-jbr.ofp.ae','6967':'https://plots.ae','6970':'https://dubai-creek-harbour.ofp.ae'})
This is indeed very straightforward:
data = {
    "sources": [
        {"ID": "6953", "VALUE": "https://address-jbr.ofp.ae"},
        {"ID": "6967", "VALUE": "https://plots.ae"},
        {"ID": "6970", "VALUE": "https://dubai-creek-harbour.ofp.ae"},
    ]
}
Then:
data_list = [{x["ID"]: x["VALUE"]} for x in data["sources"]]
Which is the same as:
data_list = []
for x in data["sources"]:
    data_list.append({
        x["ID"]: x["VALUE"]
    })
EDIT: You said convert to a "list" in the question, which confused me. If a single dictionary is what you want, then this is it:
data_dict = {x["ID"]: x["VALUE"] for x in data["sources"]}
Which is the same as:
data_dict = {}
for x in data["sources"]:
    data_dict[x["ID"]] = x["VALUE"]
P.S. Seems like you're asking for answers to your course assignments or something here, which is not what this place is for.
A solution using pandas:
import pandas as pd
data = {
    "sources": [
        {"ID": "6953", "VALUE": "https://address-jbr.ofp.ae"},
        {"ID": "6967", "VALUE": "https://plots.ae"},
        {"ID": "6970", "VALUE": "https://dubai-creek-harbour.ofp.ae"},
    ]
}
a = pd.DataFrame.from_dict(data["sources"])
print(a.set_index("ID").T.to_dict(orient="records"))
which outputs:
[{'6953': 'https://address-jbr.ofp.ae', '6967': 'https://plots.ae', '6970': 'https://dubai-creek-harbour.ofp.ae'}]
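As a follow-up, if a plain dict keyed by ID (rather than a one-element list of records) is the goal, the same DataFrame a from above can be collapsed directly; a small sketch:

# Series.to_dict() turns the ID-indexed VALUE column straight into a dict.
print(a.set_index("ID")["VALUE"].to_dict())
# {'6953': 'https://address-jbr.ofp.ae', '6967': 'https://plots.ae', '6970': 'https://dubai-creek-harbour.ofp.ae'}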
This should work:
Dict = {
    "sources": [
        {"ID": "6953", "VALUE": "https://address-jbr.ofp.ae"},
        {"ID": "6967", "VALUE": "https://plots.ae"},
        {"ID": "6970", "VALUE": "https://dubai-creek-harbour.ofp.ae"},
    ]
}
# Store all the values here
value_LIST = []
for item_of_list in Dict["sources"]:
    for key, value in item_of_list.items():
        value_LIST.append(value)
I am lost for ideas on converting multiple lines of strings into a JSON tree structure
I have multiple strings like the ones below in a particular Excel column:
/abc/a[2]/a/x[1]
/abc/a[2]/a/x[2]
Since the above strings contain the delimiter '/', I could use it to create a parent-child relationship and convert them into a Python dictionary (or JSON) like below:
{
  "tag": "abc",
  "child": [
    {
      "tag": "a[2]",
      "child": [
        {
          "tag": "a",
          "child": [
            { "tag": "x[1]" },
            { "tag": "x[2]" }
          ]
        }
      ]
    }
  ]
}
I am unable to come up with the logic for this part of my project, since I need to look for the presence of [1], [2], etc., assign them to a common parent, and do this recursively so that it works for paths of any length. Please help me out with any code logic in Python or suggestions. Much appreciated!
Additional Update:
Just wondering if it would also be possible to include other columns' data along with the JSON structure.
For example, if the Excel sheet contains the below three columns with two rows:

tag                 text    type
/abc/a[2]/a/x[1]    Hello   string
/abc/a[2]/a/x[2]    World   string
Along with the JSON from the original question, is it possible to add this other column information as key-value attributes (on the corresponding innermost child) in the JSON, like below?
These columns do not follow the same '/' delimiter format, hence I am unsure how to approach this.
{
  "tag": "abc",
  "child": [
    {
      "tag": "a[2]",
      "child": [
        {
          "tag": "a",
          "child": [
            {
              "tag": "x[1]",
              "text": "Hello",
              "type": "string"
            },
            {
              "tag": "x[2]",
              "text": "World",
              "type": "string"
            }
          ]
        }
      ]
    }
  ]
}
P.S.: I felt it would be appropriate to add the data to the innermost child to avoid redundancy. Please feel free to suggest including these other columns in any other appropriate way as well.
You can use recursion with collections.defaultdict:
import collections, json

def to_tree(d):
    v = collections.defaultdict(list)
    for [a, *b], *c in d:
        v[a].append([b, *c])
    return [{'tag': a, **({'child': to_tree(b)} if all(j for j, *_ in b)
                          else dict(zip(['text', 'type'], b[0][-2:])))}
            for a, b in v.items()]

col_data = [['/abc/a[2]/a/x[1]', 'Hello', 'string'], ['/abc/a[2]/a/x[2]', 'World', 'string']]  # read in from your excel file
result = to_tree([[[*filter(None, a.split('/'))], b, c] for a, b, c in col_data])
print(json.dumps(result, indent=4))
Output:
[
    {
        "tag": "abc",
        "child": [
            {
                "tag": "a[2]",
                "child": [
                    {
                        "tag": "a",
                        "child": [
                            {
                                "tag": "x[1]",
                                "text": "Hello",
                                "type": "string"
                            },
                            {
                                "tag": "x[2]",
                                "text": "World",
                                "type": "string"
                            }
                        ]
                    }
                ]
            }
        ]
    }
]
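If the rows really do live in an Excel sheet as described, col_data could be built with pandas before calling to_tree. A small sketch; the file name and column headers are assumptions, adjust them to your sheet:

import pandas as pd

# Assumed file name and header names.
df = pd.read_excel("input.xlsx", usecols=["tag", "text", "type"])

# One [path, text, type] list per row, matching the col_data format above.
col_data = df[["tag", "text", "type"]].values.tolist()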
def to_dict(data, old_data=None):
    json = {}
    if old_data != None:
        json["child"] = old_data
    for row in data.split("\n"):
        path = json
        for element in row.strip("/").split("/"):
            if not "child" in path:
                path["child"] = []
            path = path["child"]
            for path_el in path:
                if element == path_el["tag"]:
                    path = path_el
                    break
            else:
                path.append({"tag": element})
                path = path[-1]
    return json["child"]

# tests:
column = """/abc/a[2]/a/x[1]
/abc/a[2]/a/x[2]"""
print(to_dict(column))
print(to_dict(column, [{'tag': 'abc', 'child': [{'tag': 'old entry'}]}]))
I'm trying to convert a dataframe to a particular JSON format. I've attempted doing this using the methods "to_dict()" and "json.dump()" from the pandas and json modules, respectively, but I can't get the JSON format I'm after. To illustrate:
import json

import pandas as pd

df = pd.DataFrame({
    "Location": ["1ST"] * 3 + ["2ND"] * 3,
    "Date": ["2019-01", "2019-02", "2019-03"] * 2,
    "Category": ["A", "B", "C"] * 2,
    "Number": [1, 2, 3, 4, 5, 6]
})

def dataframe_to_dictionary(df, orientation):
    dictionary = df.to_dict(orient=orientation)
    return dictionary

dict_records = dataframe_to_dictionary(df, "records")
with open("./json_records.json", "w") as json_records:
    json.dump(dict_records, json_records, indent=2)

dict_index = dataframe_to_dictionary(df, "index")
with open("./json_index.json", "w") as json_index:
    json.dump(dict_index, json_index, indent=2)
When I convert "dict_records" to JSON, I get an array of the form:
[
  {
    "Location": "1ST",
    "Date": "2019-01",
    "Category": "A",
    "Number": 1
  },
  {
    "Location": "1ST",
    "Date": "2019-02",
    "Category": "B",
    "Number": 2
  },
  ...
]
And, when I convert "dict_index" to JSON, I get an object of the form:
{
  "0": {
    "Location": "1ST",
    "Date": "2019-01",
    "Category": "A",
    "Number": 1
  },
  "1": {
    "Location": "1ST",
    "Date": "2019-02",
    "Category": "B",
    "Number": 2
  },
  ...
}
But I'm trying to get a format like the following, where each key is a location and each value is a list of that location's records. Thanks in advance for your help.
{
  "1ST": [
    {
      "Date": "2019-01",
      "Category": "A",
      "Number": 1
    },
    {
      "Date": "2019-02",
      "Category": "B",
      "Number": 2
    },
    {
      "Date": "2019-03",
      "Category": "C",
      "Number": 3
    }
  ],
  "2ND": [
    {},
    {},
    {}
  ]
}
This can be achieved via groupby:
gb = df.groupby('Location')
{k: v.drop('Location', axis=1).to_dict(orient='records') for k, v in gb}
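Putting the two lines above together with the file-writing from the question, a minimal sketch (the output file name ./json_by_location.json is an assumption):

import json

import pandas as pd

df = pd.DataFrame({
    "Location": ["1ST"] * 3 + ["2ND"] * 3,
    "Date": ["2019-01", "2019-02", "2019-03"] * 2,
    "Category": ["A", "B", "C"] * 2,
    "Number": [1, 2, 3, 4, 5, 6]
})

# Group on Location, drop the grouping column from each sub-frame,
# and keep the remaining rows as a list of records per location.
nested = {
    k: v.drop("Location", axis=1).to_dict(orient="records")
    for k, v in df.groupby("Location")
}

with open("./json_by_location.json", "w") as fh:
    json.dump(nested, fh, indent=2)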
I am new to JSON formatted files.
I have a Pandas DataFrame:
import pandas as pd
df = pd.DataFrame([["A", "2014/01/01", "2014/01/02", "A", -0.0061, "A"],
["A", "2015/07/11", "2015/08/21", "A", 1.50, "A"],
["C", "2016/01/01", "2016/01/05", "U", 2.75, "R"],
["D", "2013/05/19", "2014/09/30", "Q", -100.0, "N"],
["B", "2015/08/22", "2015/09/01", "T", 10.0, "R"]],
columns=["P", "Start", "End", "Category", "Value", "Group"]
)
That looks like this:
   P       Start         End Category     Value Group
0  A  2014/01/01  2014/01/02        A   -0.0061     A
1  A  2015/07/11  2015/08/21        A    1.5000     A
2  C  2016/01/01  2016/01/05        U    2.7500     R
3  D  2013/05/19  2014/09/30        Q -100.0000     N
4  B  2015/08/22  2015/09/01        T   10.0000     R
I know that I could convert this to JSON via:
df.to_json("output.json")
But I need to convert it to a nested JSON format like this:
{
  "group_list": [
    {
      "category_list": [
        {
          "category": "A",
          "p_list": [
            {
              "p": "A",
              "date_list": [
                {
                  "start": "2014/01/01",
                  "end": "2014/01/02",
                  "value": "-0.0061"
                }
              ]
            },
            {
              "p": "A",
              "date_list": [
                {
                  "start": "2015/07/11",
                  "end": "2015/08/21",
                  "value": "1.5000"
                }
              ]
            }
          ]
        }
      ],
      "group": "A"
    },
    {
      "category_list": [
        {
          "category": "U",
          "p_list": [
            {
              "p": "C",
              "date_list": [
                {
                  "start": "2016/01/01",
                  "end": "2016/01/05",
                  "value": "2.7500"
                }
              ]
            }
          ]
        },
        {
          "category": "T",
          "p_list": [
            {
              "p": "B",
              "date_list": [
                {
                  "start": "2015/08/22",
                  "end": "2015/09/01",
                  "value": "10.000"
                }
              ]
            }
          ]
        }
      ],
      "group": "R"
    },
    {
      "category_list": [
        {
          "category": "Q",
          "p_list": [
            {
              "p": "D",
              "date_list": [
                {
                  "start": "2013/05/19",
                  "end": "2014/09/30",
                  "value": "-100.0000"
                }
              ]
            }
          ]
        }
      ],
      "group": "N"
    }
  ]
}
I've considered using Pandas' groupby functionality but I can't quite figure out how I could then get it into the final JSON format. Essentially, the nesting begins with grouping together rows with the same "group" and "category" columns. Afterwards, it is a matter of listing out the rows. I could write some code with nested for-loops but I'm hoping that there is a more efficient way to accomplish this.
Update
I can also manipulate my DataFrame via:
df2 = df.set_index(['Group', 'Category', 'P']).stack()
Group  Category  P
A      A         A  Start    2014/01/01
                    End      2014/01/02
                    Value       -0.0061
                    Start    2015/07/11
                    End      2015/08/21
                    Value           1.5
R      U         C  Start    2016/01/01
                    End      2016/01/05
                    Value          2.75
N      Q         D  Start    2013/05/19
                    End      2014/09/30
                    Value          -100
R      T         B  Start    2015/08/22
                    End      2015/09/01
                    Value            10
which is close to where I need to be but I don't think one could call df2.to_json() in this case.
The below nested loop should get you pretty close:
import json

json_dict = {}
json_dict['group_list'] = []

for grp, grp_data in df.groupby('Group'):
    grp_dict = {}
    grp_dict['group'] = grp
    grp_dict['category_list'] = []  # build the category list once per group, not per category
    for cat, cat_data in grp_data.groupby('Category'):
        cat_dict = {}
        cat_dict['category'] = cat
        cat_dict['p_list'] = []
        for p, p_data in cat_data.groupby('P'):
            p_data = p_data.drop(['Category', 'Group'], axis=1).set_index('P')
            for d in p_data.to_dict(orient='records'):
                cat_dict['p_list'].append({'p': p, 'date_list': [d]})
        grp_dict['category_list'].append(cat_dict)
    json_dict['group_list'].append(grp_dict)

json_out = json.dumps(json_dict)
parsed = json.loads(json_out)
resulting in:
print(json.dumps(parsed, indent=4, sort_keys=True))
{
    "group_list": [
        {
            "category_list": [
                {
                    "category": "A",
                    "p_list": [
                        {
                            "date_list": [
                                {
                                    "End": "2014/01/02",
                                    "Start": "2014/01/01",
                                    "Value": -0.0061
                                }
                            ],
                            "p": "A"
                        },
                        {
                            "date_list": [
                                {
                                    "End": "2015/08/21",
                                    "Start": "2015/07/11",
                                    "Value": 1.5
                                }
                            ],
                            "p": "A"
                        }
                    ]
                }
            ],
            "group": "A"
        },
        {
            "category_list": [
                {
                    "category": "Q",
                    "p_list": [
                        {
                            "date_list": [
                                {
                                    "End": "2014/09/30",
                                    "Start": "2013/05/19",
                                    "Value": -100.0
                                }
                            ],
                            "p": "D"
                        }
                    ]
                }
            ],
            "group": "N"
        },
        {
            "category_list": [
                {
                    "category": "T",
                    "p_list": [
                        {
                            "date_list": [
                                {
                                    "End": "2015/09/01",
                                    "Start": "2015/08/22",
                                    "Value": 10.0
                                }
                            ],
                            "p": "B"
                        }
                    ]
                },
                {
                    "category": "U",
                    "p_list": [
                        {
                            "date_list": [
                                {
                                    "End": "2016/01/05",
                                    "Start": "2016/01/01",
                                    "Value": 2.75
                                }
                            ],
                            "p": "C"
                        }
                    ]
                }
            ],
            "group": "R"
        }
    ]
}
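For comparison, the same transformation can also be written as nested comprehensions over groupby. This is a sketch under the assumption that you want the lowercase start/end/value keys from the question's target format; str(row["Value"]) will not reproduce the fixed decimal places shown there, so adjust the formatting if that matters.

import json

import pandas as pd

df = pd.DataFrame([["A", "2014/01/01", "2014/01/02", "A", -0.0061, "A"],
                   ["A", "2015/07/11", "2015/08/21", "A", 1.50, "A"],
                   ["C", "2016/01/01", "2016/01/05", "U", 2.75, "R"],
                   ["D", "2013/05/19", "2014/09/30", "Q", -100.0, "N"],
                   ["B", "2015/08/22", "2015/09/01", "T", 10.0, "R"]],
                  columns=["P", "Start", "End", "Category", "Value", "Group"])

# Group by Group, then Category, then emit one p/date_list record per row.
result = {
    "group_list": [
        {
            "group": grp,
            "category_list": [
                {
                    "category": cat,
                    "p_list": [
                        {
                            "p": row["P"],
                            "date_list": [
                                {
                                    "start": row["Start"],
                                    "end": row["End"],
                                    "value": str(row["Value"]),
                                }
                            ],
                        }
                        for _, row in cat_data.iterrows()
                    ],
                }
                for cat, cat_data in grp_data.groupby("Category")
            ],
        }
        for grp, grp_data in df.groupby("Group")
    ]
}

print(json.dumps(result, indent=4))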