I've been trying to get better at data manipulation and flattening large sources of data. I have a JSON structure that looks like this:
json_structure = {
"a": "1",
"b": {
"c": 2,
"d": "3",
"e": {
"f": 4
}
},
"g": 5,
"h": [
{
"i": "6",
"j": "7"
}
],
"k": {
"l": "8",
"m": "9"
},
"n": {
"o": "10",
"p": {
"q": 11,
"r": 12,
},
"s": [
{
"t": "13",
"u": {
"v": 14,
},
"w": 15,
}
],
"x": [
{
"y": "16",
"z": [],
},
{
"aa": "17",
"bb": [
"abc"
]
}
]
}
}
I am able to get keys and values using this recursive code:
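def deepValue(D, key, *rest, default=None):
    # Follow the key path; return `default` if any step is missing.
    try:
        return deepValue(D[key], *rest, default=default) if rest else D[key]
    except (KeyError, IndexError, TypeError):
        return default

def deepKeys(D, key, *rest):
    # Keys of the dict (or indices of the list) at the end of the key path.
    try:
        if rest:
            return deepKeys(D[key], *rest)
        if isinstance(D[key], dict):
            return D[key].keys()
        return range(len(D[key]))
    except (KeyError, IndexError, TypeError):
        return []

def deepItems(D, key, *rest):
    # (key, value) pairs of the dict, or (index, value) pairs of the list.
    try:
        if rest:
            yield from deepItems(D[key], *rest)
        elif isinstance(D[key], dict):
            yield from D[key].items()
        else:
            yield from enumerate(D[key])
    except (KeyError, IndexError, TypeError):
        return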
However, when I get these keys, I also want to rename and store them in a dictionary that would look like this:
flattened = {
    "a": 1,
    "b_c": 2,
    "b_d": 3,
    "b_e_f": 4,
    "g": 5,
    "h_i_n": 6, # n refers to the index of the list: 0, 1, 2, etc.
    "h_j_n": 7, # n refers to the index of the list: 0, 1, 2, etc.
    "k_l": 8,
    "k_m": 9,
    "n_o": 10,
    "n_p_q": 11,
    "n_p_r": 12,
    "n_s_t_n": 13, # n refers to the index of the list: 0, 1, 2, etc.
    "n_s_u_v_n": 14 # n refers to the index of the list: 0, 1, 2, etc.
    ...
}
The naming convention is parentkey_childkey_nextchildkey.
This seems very complex to me, and I am wondering whether it is even possible.
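A minimal sketch of one way to do this, assuming list indices should be appended at the end of the key (so h_i_n becomes h_i_0) and that values are kept exactly as they appear in the source (the string "1" stays a string). The name flatten and its parameters are illustrative and do not come from any library:

```python
def flatten(obj, keys=(), indices=(), sep="_"):
    """Flatten nested dicts and lists into a single-level dict."""
    flat = {}
    if isinstance(obj, dict):
        # Descend into each child, extending the key path.
        for key, value in obj.items():
            flat.update(flatten(value, keys + (str(key),), indices, sep))
    elif isinstance(obj, list):
        # Remember the list position; it is appended after all keys.
        for index, value in enumerate(obj):
            flat.update(flatten(value, keys, indices + (str(index),), sep))
    else:
        # Leaf value: join the key path, then the list indices.
        flat[sep.join(keys + indices)] = obj
    return flat


sample = {
    "a": "1",
    "b": {"c": 2, "e": {"f": 4}},
    "h": [{"i": "6", "j": "7"}],
    "n": {"p": {"q": 11}, "s": [{"t": "13", "u": {"v": 14}}]},
}
flat = flatten(sample)
```

Empty lists and empty dicts produce no entries in this sketch, and two different paths can collide if a key itself contains the separator.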
I am new to Python.
I want to read the auth column from PostgreSQL, which contains JSON. I need to parse it and extract the relevant API credentials. Using those, I want to fetch the data, which is again JSON, but this time deeply nested, and the set of objects can vary from one response to another. From that JSON I want to collect all the keys and insert them as rows in the sourceColumnNames column of the source table. The target may have fewer columns than the source, for example only a and d from the source, mapped to name and PostalCode.
How can I achieve this? It looks like something that would be done with Scala case classes (source and target model classes), but it needs to be done in Python.
Data in AuthColumn is
{ "url": "https://api.myUrl.com/v2",
"headers": {
"Authorization": "TheSecretAccessToken2022",
"Content-Type": "application/json"
},
"data": {
"query": "{ boards{ items{ name column_values {a b c d} } } }"
} }
I need to parse it to get credentials and execute the query.
Then it will return some JSON which I need to parse.
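A rough sketch of that first step using only the standard library. It assumes the API expects a POST with the "data" object as a JSON body, which the question does not state; build_request is a hypothetical helper name, and the network call is left commented out:

```python
import json
import urllib.request

# The AuthColumn value from the question, as it would arrive from the database.
auth_column = """
{ "url": "https://api.myUrl.com/v2",
  "headers": {
    "Authorization": "TheSecretAccessToken2022",
    "Content-Type": "application/json"
  },
  "data": {
    "query": "{ boards{ items{ name column_values {a b c d} } } }"
  } }
"""


def build_request(auth_json):
    """Turn the stored auth JSON into a ready-to-send request object."""
    auth = json.loads(auth_json)
    body = json.dumps(auth["data"]).encode("utf-8")
    # Assumption: the endpoint takes a POST with a JSON body.
    return urllib.request.Request(
        auth["url"], data=body, headers=auth["headers"], method="POST"
    )


request = build_request(auth_column)
# Sending it would look like this (not executed here):
# with urllib.request.urlopen(request) as response:
#     payload = json.load(response)
```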
This JSON could be like this
{
"data": {
"boards": [{
"name": "DP",
"id": "123",
"description": null,
"items": [{
"name": "TheColumn",
"column_values": [{
"a": "PDs",
"b": "PDs",
"c": "CI",
"d": "PV"
}, {
"a": "SLUD",
"b": "SLUD",
"c": "d",
"d": "MFO"
}, {
"a": "ST",
"b": "ST",
"c": "CI",
"d": "UC"
}, {
"a": "c",
"b": "c",
"c": "CI",
"d": "NC"
}, {
"a": "OP",
"b": "op",
"c": "CI",
"d": "0 days"
}, {
"a": "OPd",
"b": "OPd",
"c": "CI",
"d": "2022-02-25"
}, {
"a": "CD",
"b": "cd",
"c": "d",
"d": "2022-02-25"
}, {
"a": "cld",
"b": "cld",
"c": "d",
"d": "2022-04-22"
}, {
"a": "SoDce",
"b": "soDce",
"c": "CI",
"d": ""
}, {
"a": "MOD",
"b": "MOD",
"c": "date",
"d": ""
}, {
"a": "PP",
"b": "PP",
"c": "nuUDic",
"d": "625000"
}, {
"a": "UD",
"b": "UD",
"c": "nuUDic",
"d": ""
}, {
"a": "PAVSP",
"b": "PAVSP",
"c": "neUDic",
"d": ""
}, {
"a": "LendeUD",
"b": "lendeUD",
"c": "CI",
"d": "TBD"
}, {
"a": "ESP",
"b": "ESP",
"c": "CI",
"d": ""
}, {
"a": "ac",
"b": "ac",
"c": "CI",
"d": "Chicago"
}, {
"a": "SLd",
"b": "SLd",
"c": "CI",
"d": ""
}, {
"a": "UA",
"b": "UA",
"c": "CI",
"d": ""
}, {
"a": "UD",
"b": "UD",
"c": "CI",
"d": ""
}, {
"a": "R?",
"b": "R",
"c": "CI",
"d": ""
}, {
"a": "DDE",
"b": "DDE",
"c": "CI",
"d": ""
}, {
"a": "SOD",
"b": "SOD",
"c": "CI",
"d": ""
}, {
"a": "NOS",
"b": "NOS",
"c": "d",
"d": ""
}]
}, {
"name": "BBB",
"column_values": [{
"a": "PeUDs",
"b": "PeUDs",
"c": "CI",
"d": "PV"
}, {
"a": "SLUD",
"b": "SLUD",
"c": "d",
"d": "Ddd"
}, {
"a": "ST",
"b": "ST",
"c": "CI",
"d": "UC"
}, {
"a": "c",
"b": "c",
"c": "CI",
"d": "NC"
}, {
"a": "OP",
"b": "op",
"c": "CI",
"d": "0 days"
}, {
"a": "OPd",
"b": "OPd",
"c": "CI",
"d": "2022-02-23"
}, {
"a": "CD",
"b": "cd",
"c": "d",
"d": "2022-02-23"
}, {
"a": "cld",
"b": "cld",
"c": "d",
"d": "2022-03-04"
}, {
"a": "SoDce",
"b": "soDce",
"c": "CI",
"d": ""
}, {
"a": "MOD",
"b": "MOD",
"c": "date",
"d": ""
}, {
"a": "PP",
"b": "PP",
"c": "nuUDic",
"d": "3200"
}, {
"a": "UD",
"b": "UD",
"c": "numeic",
"d": ""
}, {
"a": "PDVSP",
"b": "PDVSP",
"c": "nueUDic",
"d": ""
}, {
"a": "ESP",
"b": "ESP",
"c": "CI",
"d": ""
}, {
"a": "ac",
"b": "ac",
"c": "CI",
"d": "Chicago a"
}, {
"a": "SLd",
"b": "SLd",
"c": "CI",
"d": ""
}, {
"a": "UA",
"b": "UA",
"c": "CI",
"d": ""
}, {
"a": "UD",
"b": "UD",
"c": "CI",
"d": ""
}, {
"a": "R?",
"b": "R",
"c": "CI",
"d": ""
}, {
"a": "DDE",
"b": "DDE",
"c": "CI",
"d": "DooU"
}, {
"a": "SOD",
"b": "SOD",
"c": "CI",
"d": ""
}, {
"a": "IU",
"b": "IU",
"c": "CI",
"d": ""
},{ "a": "DD",
"b": "DD",
"c": "CI",
"d": ""
}, {
"a": "LOS",
"b": "LOS",
"c": "num",
"d": ""
}, {
"a": "NOS",
"b": "NOS",
"c": "d",
"d": ""
}] }] }] }}
Now I want to parse this JSON, collect its keys, and insert them into the columnNames column of the metadata table, like this:
sourceColumnNames
name
id
description
items_name
a
b
c
d
Then I will query auth, get creds, and get values based on these source columns.
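One possible way to collect those names is a recursive walk that ignores list positions and keeps each leaf path once. This is only a sketch: collect_column_names is a made-up name, and it prefixes every leaf with its full parent path (items_column_values_a rather than the bare a shown above), so the names would still need trimming to match that list exactly:

```python
def collect_column_names(obj, prefix=()):
    """Return the unique leaf key paths of a nested JSON value, in first-seen order."""
    names = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            names.extend(collect_column_names(value, prefix + (key,)))
    elif isinstance(obj, list):
        # List elements share the same prefix; indices are not part of the name.
        for element in obj:
            names.extend(collect_column_names(element, prefix))
    else:
        names.append("_".join(prefix))
    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(names))


# A shortened version of the response from the question.
boards = [{
    "name": "DP",
    "id": "123",
    "description": None,
    "items": [{
        "name": "TheColumn",
        "column_values": [
            {"a": "PDs", "b": "PDs", "c": "CI", "d": "PV"},
            {"a": "SLUD", "b": "SLUD", "c": "d", "d": "MFO"},
        ],
    }],
}]
column_names = collect_column_names(boards)
```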
So far, I have parsed the JSON with Python's json module, accessing values by index:
import json

with open('path/file.json') as myJson:
    read_myjson = json.load(myJson)

read_data = read_myjson['data']
read_board = read_myjson['data']['boards']
board_name = read_myjson['data']['boards'][0]['name']
board_id = read_myjson['data']['boards'][0]['id']
board_description = read_myjson['data']['boards'][0]['description']
board_items = read_myjson['data']['boards'][0]['items']
board_items_name = read_myjson['data']['boards'][0]['items'][0]['name']
board_items_columnValues = read_myjson['data']['boards'][0]['items'][0]['column_values']
board_items_columnValues_title = read_myjson['data']['boards'][0]['items'][0]['column_values'][0]['a']
board_items_columnValues_id = read_myjson['data']['boards'][0]['items'][0]['column_values'][0]['b']
board_items_columnValues_type = read_myjson['data']['boards'][0]['items'][0]['column_values'][0]['c']
board_items_columnValues_text = read_myjson['data']['boards'][0]['items'][0]['column_values'][0]['d']

# for loop on Header
print("printing Header loop : ")
for key, val in read_myjson.items():
    print(key, ":::", val)
    headerKey = key
    headerValue = val

print("printing data loop : it gives board key and its value")
for key, val in read_data.items():
    # print(key, ":::", val)
    datakey = key
    dataValue = val
    # print(datakey, "::::", dataValue)

print(" items loop")
# for key, val in read_board.items():
for item in board_items:
    for key, val in item.items():
        # print(key, ":::", val)
        compDataAsKey = key
        compDataAsValue = val

print(" Items_column_values loop")
columnKeys = []
columnValues = []
for items in board_items_columnValues:
    for key, val in items.items():
        # print(key, ":", val)
        # compColumnKey = key
        # compColumnValue = val
        columnKeys.append(key)
        columnValues.append(val)
I have also tried dataclasses in Python, but I can't work out how to map the class to the parsed JSON.
import json
import orjson, dataclasses

with open('path/AuthJsonSample.json') as myJson:
    read_myjson = json.load(myJson)

@dataclasses.dataclass
class AuthData:
    url: str
    headers: str
    data: str
How can I build this ETL pipeline?
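For the dataclass part specifically, one approach is to give the class an alternate constructor that unpacks the parsed JSON. This is a sketch rather than a full pipeline; it assumes headers and data are objects (dicts) as in the AuthColumn sample, and from_json is an illustrative method name:

```python
import json
from dataclasses import dataclass


@dataclass
class AuthData:
    url: str
    headers: dict  # assumption: an object in the JSON, so a dict here
    data: dict

    @classmethod
    def from_json(cls, raw):
        """Build an AuthData from a JSON string with url, headers and data keys."""
        # Unpacking fails loudly (TypeError) if a key is missing or unexpected.
        return cls(**json.loads(raw))


raw_auth = """
{ "url": "https://api.myUrl.com/v2",
  "headers": {"Authorization": "TheSecretAccessToken2022",
              "Content-Type": "application/json"},
  "data": {"query": "{ boards{ items{ name column_values {a b c d} } } }"} }
"""
auth = AuthData.from_json(raw_auth)
```

After this, auth.url, auth.headers["Authorization"] and auth.data["query"] are available as ordinary attributes.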
Working on a freshwater fish conservation project. I scraped a JSON file that looks like this:
{
"fish": [
{
"id": 0,
"n": "NO INFORMATION",
"a": "NONE",
"i": "none.png"
},
{
"id": 1,
"n": "Hampala barb",
"a": "Hampala macrolepidota",
"i": "hampala.png"
},
{
"id": 2,
"n": "Giant snakehead",
"a": "Channa micropeltes",
"i": "toman.png"
},
{
"id": 3,
"n": "Clown featherback",
"a": "Chitala ornata",
"i": "belida.png"
}
]
}
And I'm trying to extract the keys "id" and "a" into a python dictionary like this:
fish_id = {
0 : "NONE",
1 : "Hampala macrolepidota",
2 : "Channa micropeltes",
3 : "Chitala ornata"
}
import json
data = """{
"fish": [
{
"id": 0,
"n": "NO INFORMATION",
"a": "NONE",
"i": "none.png"
},
{
"id": 1,
"n": "Hampala barb",
"a": "Hampala macrolepidota",
"i": "hampala.png"
},
{
"id": 2,
"n": "Giant snakehead",
"a": "Channa micropeltes",
"i": "toman.png"
},
{
"id": 3,
"n": "Clown featherback",
"a": "Chitala ornata",
"i": "belida.png"
}
]
}"""
data_dict = json.loads(data)
fish_id = {}
for item in data_dict["fish"]:
    fish_id[item["id"]] = item["a"]
print(fish_id)
First, save your JSON as fish.json and load it:
with open('fish.json') as json_file:
    data = json.load(json_file)
Then take each fish entry:
fish1 = data['fish'][0]
fish2 = data['fish'][1]
fish3 = data['fish'][2]
fish4 = data['fish'][3]
After that, take only the values of each entry, because the dictionary is built from values alone:
value_list1=list(fish1.values())
value_list2=list(fish2.values())
value_list3=list(fish3.values())
value_list4=list(fish4.values())
Finally, create the fish_id dictionary:
fish_id = {
f"{value_list1[0]}" : f"{value_list1[2]}",
f"{value_list2[0]}" : f"{value_list2[2]}",
f"{value_list3[0]}" : f"{value_list3[2]}",
f"{value_list4[0]}" : f"{value_list4[2]}",
}
If you run:
print(fish_id)
the result looks like the output below. Note that the keys are strings here, and a for loop would be more concise.
{'0': 'NONE', '1': 'Hampala macrolepidota', '2': 'Channa micropeltes', '3': 'Chitala ornata'}
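The loop version can be shortened further with a dict comprehension. A small sketch on a two-entry excerpt of the same data; unlike the f-string approach, it keeps the ids as integers:

```python
import json

data = json.loads("""
{
  "fish": [
    {"id": 0, "n": "NO INFORMATION", "a": "NONE", "i": "none.png"},
    {"id": 1, "n": "Hampala barb", "a": "Hampala macrolepidota", "i": "hampala.png"}
  ]
}
""")

# Map each fish's "id" to its "a" field, looking fields up by name
# rather than by their position in the entry.
fish_id = {fish["id"]: fish["a"] for fish in data["fish"]}
print(fish_id)
```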
I use indent = 2, but I want the first level of indentation to be zero. For example:
Partial Code
json.dump(json_data, json_file, indent=2)
Output
{
  "a": 1,
  "b": "2",
  "list": [
    {
      "c": 3,
      "d": 4
    }
  ]
}
What I want instead
{
"a": 1,
"b": "2",
"list": [
  {
    "c": 3,
    "d": 4
  }
]
}
As stated in the comments, it makes no functional difference, and you will need a custom pretty-printer. Something like:
import json
import textwrap
spam = {"a": 1, "b": "2",
"list": [{"c": 3, "d": 4,}]}
eggs = json.dumps(spam, indent=2).splitlines()
eggs = '\n'.join([eggs[0], textwrap.dedent('\n'.join(eggs[1:-1])), eggs[-1]])
print(eggs)
with open('spam.json', 'w') as f:
    f.write(eggs)
Output:
{
"a": 1,
"b": "2",
"list": [
  {
    "c": 3,
    "d": 4
  }
]
}
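The same idea can be wrapped in a small helper so it works for any indent width. This is a sketch; dumps_flat_top_level is a made-up name, not part of the json module:

```python
import json
import textwrap


def dumps_flat_top_level(obj, indent=2):
    """Serialize like json.dumps(indent=...), with the first level left unindented."""
    lines = json.dumps(obj, indent=indent).splitlines()
    if len(lines) < 3:
        # Scalars and empty containers have no inner lines to dedent.
        return "\n".join(lines)
    # Strip the common leading whitespace from everything between the braces.
    body = textwrap.dedent("\n".join(lines[1:-1]))
    return "\n".join([lines[0], body, lines[-1]])


spam = {"a": 1, "b": "2", "list": [{"c": 3, "d": 4}]}
text = dumps_flat_top_level(spam)
```

Because json.dumps escapes newlines inside strings, every output line is structural, so the dedent cannot alter string values.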
I have a JSON file that looks like this
{
"values": {
"a": 1,
"b": 2,
"c": 3,
"d": 4
},
"sales": [
{ "a": 0, "b": 0, "c": 0, "d": 0, "e": "karl" },
{ "a": 0, "b": 0, "c": 0, "d": 0, "e": "karl" },
{ "a": 4, "b": 10, "c": 20, "d": 30, "e": "karl" },
{ "a": 0, "b": 0, "c": 0, "d": 0, "e": "karl" }
]
}
and I am loading it via get_context_data:
import json
from django.views.generic import CreateView

class MyCreateView(CreateView):
    def get_context_data(self, **kwargs):
        context = super(MyCreateView, self).get_context_data(**kwargs)
        with open('/path/to/my/JSON/file/my_json.cfg', 'r') as f:
            myfile = json.load(f)
        # Expose the parsed JSON to the template as my_json.
        context['my_json'] = myfile
        return context
This works: print(myfile["sales"][0]["a"]) gives 0, and when I put {{my_json}} into index.html I get the whole structure.
So my question is how best to read the values. Do I have to create a context variable for each value, or is it possible to read the JSON array in my HTML?
I tried {{my_json["sales"][0]["a"]}}, but it didn't work.
To get myfile["sales"][0]["a"] in the template, use dot notation:
{{my_json.sales.0.a}}
Or, to get myfile["values"]["a"]:
{{my_json.values.a}}
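This works because the template engine resolves each dotted part by trying a dictionary lookup first, then an attribute lookup, then a list-index lookup. A simplified pure-Python emulation of that order, for illustration only (resolve_dotted is a hypothetical helper, and the real engine does more, such as calling callables):

```python
def resolve_dotted(value, path):
    """Resolve a dotted path roughly the way a template variable is resolved."""
    for part in path.split("."):
        try:
            value = value[part]               # 1. dictionary lookup
        except (KeyError, TypeError, IndexError):
            try:
                value = getattr(value, part)  # 2. attribute lookup
            except AttributeError:
                value = value[int(part)]      # 3. list-index lookup
    return value


my_json = {
    "values": {"a": 1, "b": 2, "c": 3, "d": 4},
    "sales": [
        {"a": 0, "b": 0, "c": 0, "d": 0, "e": "karl"},
        {"a": 4, "b": 10, "c": 20, "d": 30, "e": "karl"},
    ],
}
first_sale_a = resolve_dotted(my_json, "sales.0.a")
values_a = resolve_dotted(my_json, "values.a")
```

The dictionary lookup coming first is why my_json.values reaches the "values" key rather than the dict's own values method.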