I have a CSV file where some of the columns are named in the format x;y;z. I am using pandas to read this data, do some pre-processing, and convert it to a list of JSON objects using the to_json/to_dict methods of pandas. While converting these special columns, the JSON object for such a column should have the format {x: {y: {z: value}}}. There could be different columns like x;y;z and x;y;a, and these two have to be merged together into a single object in the resulting record JSON, i.e., {x: {y: {z: value1, a: value2}}}
CSV:
Id,Name,X;Y;Z,X;Y;A,X;B;Z
101,Adam,1,2,3
102,John,4,5,6
103,Sara,7,8,9
Output:
[
{
"Id":101,
"Name":"Adam",
"X":{
"Y":{
"Z":1,
"A":2
},
"B":{
"Z":3
}
}
},
{
"Id":102,
"Name":"John",
"X":{
"Y":{
"Z":4,
"A":5
},
"B":{
"Z":6
}
}
},
{
"Id":103,
"Name":"Sara",
"X":{
"Y":{
"Z":7,
"A":8
},
"B":{
"Z":9
}
}
}
]
I found it easier to use pandas to dump the data as a dict and then use a recursive function to iterate through the keys. Where I encounter a key that contains a ';', I split the key on this delimiter and recursively create the nested dicts. When I reach the last element of the split key, I set it to the original value and remove the original key from the dict.
import pandas as pd
from io import StringIO
import json
def split_key_to_nested_dict(original_dict, original_key, nested_dict, nested_keys):
    if nested_keys[0] not in nested_dict:
        nested_dict[nested_keys[0]] = {}
    if len(nested_keys) == 1:
        nested_dict[nested_keys[0]] = original_dict[original_key]
        del original_dict[original_key]
    else:
        split_key_to_nested_dict(original_dict, original_key, nested_dict[nested_keys[0]], nested_keys[1:])
csv_data = StringIO("""Id,Name,X;Y;Z,X;Y;A,X;B;Z
101,Adam,1,2,3
102,John,4,5,6
103,Sara,7,8,9""")
df = pd.read_csv(csv_data)  # DataFrame.from_csv was removed in pandas 1.0; read_csv keeps Id as a regular column
dict_data = df.to_dict('records')
for data in dict_data:
    keys = list(data.keys())
    for key in keys:
        if ';' in key:
            nested_keys = key.split(';')
            split_key_to_nested_dict(data, key, data, nested_keys)
print(json.dumps(dict_data))
OUTPUT
[{"Id": 101, "Name": "Adam", "X": {"Y": {"Z": 1, "A": 2}, "B": {"Z": 3}}}, {"Id": 102, "Name": "John", "X": {"Y": {"Z": 4, "A": 5}, "B": {"Z": 6}}}, {"Id": 103, "Name": "Sara", "X": {"Y": {"Z": 7, "A": 8}, "B": {"Z": 9}}}]
FORMATTED OUTPUT
[
{
"Id": 101,
"Name": "Adam",
"X": {
"Y": {
"Z": 1,
"A": 2
},
"B": {
"Z": 3
}
}
},
{
"Id": 102,
"Name": "John",
"X": {
"Y": {
"Z": 4,
"A": 5
},
"B": {
"Z": 6
}
}
},
{
"Id": 103,
"Name": "Sara",
"X": {
"Y": {
"Z": 7,
"A": 8
},
"B": {
"Z": 9
}
}
}
]
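As an aside, the same key splitting can be done without recursion. The sketch below is an alternative (not part of the solution above) that walks each split key with a loop and setdefault:
import json
from io import StringIO
import pandas as pd

def expand_delimited_keys(record, delimiter=';'):
    """Turn flat keys like 'X;Y;Z' into nested dicts {'X': {'Y': {'Z': value}}} in place."""
    for key in list(record.keys()):  # snapshot the keys before mutating the dict
        if delimiter not in key:
            continue
        *parents, leaf = key.split(delimiter)
        node = record
        for part in parents:
            node = node.setdefault(part, {})  # create or reuse each intermediate dict
        node[leaf] = record.pop(key)  # move the value under the nested key and drop the flat key
    return record

csv_data = StringIO("""Id,Name,X;Y;Z,X;Y;A,X;B;Z
101,Adam,1,2,3
102,John,4,5,6
103,Sara,7,8,9""")
records = pd.read_csv(csv_data).to_dict('records')
print(json.dumps([expand_delimited_keys(r) for r in records]))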
Working on a freshwater fish conservation project. I scraped a JSON file that looks like this:
{
"fish": [
{
"id": 0,
"n": "NO INFORMATION",
"a": "NONE",
"i": "none.png"
},
{
"id": 1,
"n": "Hampala barb",
"a": "Hampala macrolepidota",
"i": "hampala.png"
},
{
"id": 2,
"n": "Giant snakehead",
"a": "Channa micropeltes",
"i": "toman.png"
},
{
"id": 3,
"n": "Clown featherback",
"a": "Chitala ornata",
"i": "belida.png"
}
]
}
And I'm trying to extract the keys "id" and "a" into a Python dictionary like this:
fish_id = {
0 : "NONE",
1 : "Hampala macrolepidota",
2 : "Channa micropeltes",
3 : "Chitala ornata"
}
import json
data = """{
"fish": [
{
"id": 0,
"n": "NO INFORMATION",
"a": "NONE",
"i": "none.png"
},
{
"id": 1,
"n": "Hampala barb",
"a": "Hampala macrolepidota",
"i": "hampala.png"
},
{
"id": 2,
"n": "Giant snakehead",
"a": "Channa micropeltes",
"i": "toman.png"
},
{
"id": 3,
"n": "Clown featherback",
"a": "Chitala ornata",
"i": "belida.png"
}
]
}"""
data_dict = json.loads(data)
fish_id = {}
for item in data_dict["fish"]:
    fish_id[item["id"]] = item["a"]
print(fish_id)
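The same extraction can also be written as a dict comprehension, which is equivalent to the loop above:
fish_id = {item["id"]: item["a"] for item in data_dict["fish"]}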
First, save the data to a fish.json file and load it:
import json

with open('fish.json') as json_file:
    data = json.load(json_file)
Then, take each fish:
fish1 = data['fish'][0]
fish2 = data['fish'][1]
fish3 = data['fish'][2]
fish4 = data['fish'][3]
After that, take only the values of each fish, because the dictionary is built from values alone:
value_list1=list(fish1.values())
value_list2=list(fish2.values())
value_list3=list(fish3.values())
value_list4=list(fish4.values())
Finally, create the fish_id dictionary:
fish_id = {
f"{value_list1[0]}" : f"{value_list1[2]}",
f"{value_list2[0]}" : f"{value_list2[2]}",
f"{value_list3[0]}" : f"{value_list3[2]}",
f"{value_list4[0]}" : f"{value_list4[2]}",
}
If you run:
print(fish_id)
the result will be as shown below, but if you can use a for loop instead, it will be more effective.
{'0': 'NONE', '1': 'Hampala macrolepidota', '2': 'Channa micropeltes', '3': 'Chitala ornata'}
I have a list of multiple dictionaries with different numbers of layers. Here's what it looks like:
data_ls = [
{"a": {"b": {"c1": {"d1": "d1_value"}}}},
{"a": {"b": {"c2": {"d2": {"e1": "e1_value "}}}}},
...
...
]
I need to write it to a JSON file; here's what I tried:
json_str = json.dumps(data_ls)
json_file = open("data.json", "w")
json_file.write(json_str)
The output will be like:
[
{
"a": {
"b": {
"c1": {
"d1": "d1_value"
}
}
}
},
{
"a": {
"b": {
"c2": {
"d2": {
"e1": "e1_value "
}
}
}
}
}
]
But dictionaries sharing the same keys end up as separate nested objects; the desired output looks like this:
[{
  "a": {
    "b": {
      "c1": {"d1": "d1_value"},
      "c2": {
        "d2": {"e1": "e1_value "}
      }
    }
  }
}]
How do I get the output like this?
Thanks in advance!
You can use a recursive function that merges two dictionaries at a time, checking whether each key already exists and adding it if it does not:
import json
data = [
{},
{'a': {'b': {'c1': {'d1': 'd1_value'}}}},
{'a': {'b': {'c2': {'d2': {'e1': 'e1_value '}}}}},
{'a': {'b1': {'c3': 'd3'}}},
{'x': {'y': 'z'}},
{'a': {'b': {'c2': {'d2': {'e2': 'e2_value '}}}}}
]
def fun(d1: dict, d2: dict):
    for k, v in d2.items():
        if k not in d1:
            d1[k] = v
        if isinstance(v, dict):
            fun(d1[k], v)  # recurse without returning, so every key in d2 gets merged
res = data[0]
for d in data[1:]:
    fun(res, d)
print(json.dumps(res))
Output:
{
"a": {
"b": {
"c1": {
"d1": "d1_value"
},
"c2": {
"d2": {
"e1": "e1_value ",
"e2": "e2_value "
}
}
},
"b1": {
"c3": "d3"
}
},
"x": {
"y": "z"
}
}
Note:
This only considers nested values that are dicts or other non-sequence types.
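To get back to the original goal of writing the merged result to a JSON file, a small usage sketch building on fun above (data_ls and data.json are the names from the question) could be:
merged = {}
for d in data_ls:
    fun(merged, d)  # merge each dictionary into a single accumulator (values are shared, not copied)

with open("data.json", "w") as json_file:
    json.dump([merged], json_file, indent=2)  # wrapped in a list to match the desired output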
I have a nested JSON file which I am failing to parse into a flat CSV.
I want the following columns in the CSV:
id, name, path, tags (a column for each of them), points (I need the x/y values of the 4 points)
Example of the JSON input:
{
"name": "test",
"securityToken": "test Token",
"videoSettings": {
"frameExtractionRate": 15
},
"tags": [
{
"name": "Blur Reject",
"color": "#FF0000"
},
{
"name": "Blur Poor",
"color": "#800000"
}
],
"id": "Du1qtrZQ1",
"activeLearningSettings": {
"autoDetect": false,
"predictTag": true,
"modelPathType": "coco"
},
"version": "2.1.0",
"lastVisitedAssetId": "ddee3e694ec299432fed9e42de8741ad",
"assets": {
"0b8f6f214dc7066b00b50ae16cf25cf6": {
"asset": {
"format": "jpg",
"id": "0b8f6f214dc7066b00b50ae16cf25cf6",
"name": "1.jpg",
"path": "c:\temp\1.jpg",
"size": {
"width": 1500,
"height": 1125
},
"state": 2,
"type": 1
},
"regions": [
{
"id": "VtDyR9Ovl",
"type": "POLYGON",
"tags": [
"3",
"9",
"Dark Poor"
],
"boundingBox": {
"height": 695.2110389610389,
"width": 1111.607142857143,
"left": 167.41071428571428,
"top": 241.07142857142856
},
"points": [
{
"x": 167.41071428571428,
"y": 252.02922077922076
},
{
"x": 208.80681818181816,
"y": 891.2337662337662
},
{
"x": 1252.232142857143,
"y": 936.2824675324675
},
{
"x": 1279.017857142857,
"y": 241.07142857142856
}
]
}
],
"version": "2.1.0"
},
"0155d8143c8cad85b5b9d392fd2895a4": {
"asset": {
"format": "jpg",
"id": "0155d8143c8cad85b5b9d392fd2895a4",
"name": "2.jpg",
"path": "c:\temp\2.jpg",
"size": {
"width": 1080,
"height": 1920
},
"state": 2,
"type": 1
},
"regions": [
{
"id": "7FFl_diM2",
"type": "POLYGON",
"tags": [
"Dark Poor"
],
"boundingBox": {
"height": 502.85714285714283,
"width": 820.3846153846155,
"left": 144.08653846153848,
"top": 299.2207792207792
},
"points": [
{
"x": 152.39423076923077,
"y": 311.68831168831167
},
{
"x": 144.08653846153848,
"y": 802.077922077922
},
{
"x": 964.4711538461539,
"y": 781.2987012987012
},
{
"x": 935.3942307692308,
"y": 299.2207792207792
}
]
}
],
"version": "2.1.0"
}
}
I tried using pandas's json_normalize and realized I don't fully understand how to specify the columns I wish to parse:
import json
import pandas as pd

f = open(r'c:\temp\test-export.json')
data = json.load(f)  # load as json
f.close()
df = pd.json_normalize(data)  # json_normalize now lives in the top-level pandas namespace
df.to_csv(r'c:\temp\json-to-csv.csv', sep=',', encoding='utf-8')
The results are hard to work with because I didn't specify what I want (iterate through a specific array and append it to the CSV).
This is where I would like your help.
I assume I don't fully understand how json_normalize works, and I suspect it is not the best way to deal with this problem.
Thank you!
You can do something like this. Since you didn't provide an example output, I made something up on my own.
import json
import csv
f = open(r'file.txt')
data = json.load(f)
f.close()
with open("output.csv", mode="w", newline='') as out:
w = csv.writer(out)
header = ["id","name","path","tags","points"]
w.writerow(header)
for asset in data["assets"]:
data_point = data["assets"][asset]
output = [data_point["asset"]["id"]]
output.append(data_point["asset"]["name"])
output.append(data_point["asset"]["path"])
output.append(data_point["regions"][0]["tags"])
output.append(data_point["regions"][0]["points"])
w.writerow(output)
Output
id,name,path,tags,points
0b8f6f214dc7066b00b50ae16cf25cf6,1.jpg,c:\temp\1.jpg,"['3', '9', 'Dark Poor']","[{'x': 167.41071428571428, 'y': 252.02922077922076}, {'x': 208.80681818181816, 'y': 891.2337662337662}, {'x': 1252.232142857143, 'y': 936.2824675324675}, {'x': 1279.017857142857, 'y': 241.07142857142856}]"
0155d8143c8cad85b5b9d392fd2895a4,2.jpg,c:\temp\2.jpg,['Dark Poor'],"[{'x': 152.39423076923077, 'y': 311.68831168831167}, {'x': 144.08653846153848, 'y': 802.077922077922}, {'x': 964.4711538461539, 'y': 781.2987012987012}, {'x': 935.3942307692308, 'y': 299.2207792207792}]"
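If the x/y values of the four points should end up in separate columns rather than one cell (as the question hints at), a variation of the same loop can flatten them. This is only a sketch and assumes every region has exactly four points, as in the sample:
import csv
import json

with open(r'file.txt') as f:  # same export file as above
    data = json.load(f)

with open("output_flat.csv", mode="w", newline='') as out:
    w = csv.writer(out)
    w.writerow(["id", "name", "path", "tags",
                "x1", "y1", "x2", "y2", "x3", "y3", "x4", "y4"])
    for data_point in data["assets"].values():
        region = data_point["regions"][0]
        row = [data_point["asset"]["id"],
               data_point["asset"]["name"],
               data_point["asset"]["path"],
               ";".join(region["tags"])]
        for p in region["points"]:  # one x/y pair per corner point
            row.extend([p["x"], p["y"]])
        w.writerow(row)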
I'm trying to convert a dataframe to a particular JSON format. I've attempted doing this using the methods "to_dict()" and "json.dump()" from the pandas and json modules, respectively, but I can't get the JSON format I'm after. To illustrate:
import json
import pandas as pd

df = pd.DataFrame({
"Location": ["1ST"] * 3 + ["2ND"] * 3,
"Date": ["2019-01", "2019-02", "2019-03"] * 2,
"Category": ["A", "B", "C"] * 2,
"Number": [1, 2, 3, 4, 5, 6]
})
def dataframe_to_dictionary(df, orientation):
    dictionary = df.to_dict(orient=orientation)
    return dictionary
dict_records = dataframe_to_dictionary(df, "records")
with open("./json_records.json", "w") as json_records:
json.dump(dict_records, json_records, indent=2)
dict_index = dataframe_to_dictionary(df, "index")
with open("./json_index.json", "w") as json_index:
json.dump(dict_index, json_index, indent=2)
When I convert "dict_records" to JSON, I get an array of the form:
[
{
"Location": "1ST",
"Date": "2019-01",
"Category": "A",
"Number": 1
},
{
"Location": "1ST",
"Date": "2019-02",
"Category": "B",
"Number": 2
},
...
]
And, when I convert "dict_index" to JSON, I get an object of the form:
{
"0": {
"Location": "1ST",
"Date": "2019-01",
"Category": "A",
"Number": 1
},
"1": {
"Location": "1ST",
"Date": "2019-02",
"Category": "B",
"Number": 2
}
...
}
But I'm trying to get a format like the following, where each key is a location and its value is a list of records. Thanks in advance for your help.
{
  "1ST": [
    {
      "Date": "2019-01",
      "Category": "A",
      "Number": 1
    },
    {
      "Date": "2019-02",
      "Category": "B",
      "Number": 2
    },
    {
      "Date": "2019-03",
      "Category": "C",
      "Number": 3
    }
  ],
  "2ND": [
    {},
    {},
    {}
  ]
}
This can be achieved via groupby:
gb = df.groupby('Location')
{k: v.drop('Location', axis=1).to_dict(orient='records') for k, v in gb}
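If you then want to write that structure to a file, the result of the comprehension can go straight into json.dump. A small usage sketch (the output file name is arbitrary):
import json

by_location = {k: v.drop('Location', axis=1).to_dict(orient='records') for k, v in gb}
with open("./json_by_location.json", "w") as json_by_location:
    json.dump(by_location, json_by_location, indent=2)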
I have been working on this code for a few hours, trying a bunch of things to iterate through the supplied JSON data. I can't figure out how to properly iterate through these nested lists and objects.
import json
data = """
{
"tracks": "1",
"timeline": {
"0.733251541": [
{
"id": 1,
"bounds": {
"Width": 0.5099463905313426,
"Height": 0.2867199993133546,
"Y": 0.4436400003433228,
"X": 0.4876505160745349
}
}
],
"0.965": [
{
"id": 1,
"bounds": {
"Width": 0.4205311330135182,
"Height": 0.2363199994340539,
"Y": 0.2393400002829731,
"X": 0.1593787633901481
}
}
],
"1.098224": [
{
"id": 1,
"bounds": {
"Width": 0.4568560813801344,
"Height": 0.2564799993857742,
"Y": 0.1992600003071129,
"X": 0.1000513407532317
}
}
]
},
"taggedTracks": {
"1": "dirk"
}
}
"""
json = json.loads(data)
for a in json["timeline"]:
    for b in a:
        for c in b["bounds"]:
            print a, c["Width"], c["Height"], c["Y"], c["X"]
Can someone please steer me in the right direction on how to deal with the json data supplied?
I get the following error.
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
TypeError: string indices must be integers
You are getting the TypeError because each value of "timeline" is a list. You have to take the first element of that list, using index 0. Then you can parse the rest.
Hopefully the following code helps:
import json
data = """
{
"tracks": "1",
"timeline": {
"0.733251541": [
{
"id": 1,
"bounds": {
"Width": 0.5099463905313426,
"Height": 0.2867199993133546,
"Y": 0.4436400003433228,
"X": 0.4876505160745349
}
}
],
"0.965": [
{
"id": 1,
"bounds": {
"Width": 0.4205311330135182,
"Height": 0.2363199994340539,
"Y": 0.2393400002829731,
"X": 0.1593787633901481
}
}
],
"1.098224": [
{
"id": 1,
"bounds": {
"Width": 0.4568560813801344,
"Height": 0.2564799993857742,
"Y": 0.1992600003071129,
"X": 0.1000513407532317
}
}
]
},
"taggedTracks": {
"1": "dirk"
}
}
"""
test_json = json.loads(data)
for num, data in test_json["timeline"].items():  # .items() instead of Python-2-only .iteritems()
    print(num + ":")
    bounds = data[0]["bounds"]
    for bound, value in bounds.items():
        print('\t' + bound + ": " + str(value))
First of all, it's not a great idea to use the name json for a variable since that is the name of the module. Let's use j instead.
Anyway, when you do json.loads(), you get back a dict. When you iterate with for a in <dict>, you get back only the keys. You can instead iterate over the keys and values with .items() (.iteritems() on Python 2), like:
for k, a in j['timeline'].items():
    for b in a:
        c = b['bounds']
        print(k, c["Width"], c["Height"], c["Y"], c["X"])