Create nested JSON from flat csv

Create nested JSON from flat csv - python

Trying to create a 4 deep nested JSON from a csv based upon this example:
Region,Company,Department,Expense,Cost
Gondwanaland,Bobs Bits,Operations,nuts,332
Gondwanaland,Bobs Bits,Operations,bolts,254
Gondwanaland,Maureens Melons,Operations,nuts,123
At each level I would like to sum the costs and include it in the outputted JSON at the relevant level.
The structure of the outputted JSON should look something like this:
{
"id": "aUniqueIdentifier",
"name": "usually a nodes name",
"data": [
{
"key": "some key",
"value": "some value"
},
{
"key": "some other key",
"value": "some other value"
}
],
"children": [/* other nodes or empty */ ]
}
(REF: http://blog.thejit.org/2008/04/27/feeding-json-tree-structures-to-the-jit/)
Thinking along the lines of a recursive function in python but have not had much success with this approach so far... any suggestions for a quick and easy solution greatly appreciated?
UPDATE:
Gradually giving up on the idea of the summarised costs because I just can't figure it out :( (I'm not much of a Python coder yet!). Simply being able to generate the formatted JSON would be good enough, and I can plug in the numbers later if I have to.
Have been reading, googling and reading for a solution, and on the way have learnt a lot, but still no success in creating my nested JSON files from the above CSV structure. There must be a simple solution somewhere on the web? Maybe somebody else has had more luck with their search terms?

Here are some hints.
Parse the input to a list of lists with csv.reader:
>>> rows = list(csv.reader(source.splitlines()))
Loop over the list to build up your dictionary and summarize the costs. Depending on the structure you're looking to create, the build-up might look something like this (nested dictionaries are used here because JSON object keys must be strings, so a tuple key could not be dumped):
>>> summary = {}
>>> for region, company, department, expense, cost in rows[1:]:
...     summary.setdefault(region, {}).setdefault(company, {}).setdefault(department, []).append((expense, cost))
Write the result out with json.dump:
>>> with open('dest.json', 'w') as dest:
...     json.dump(summary, dest)
Hopefully, the recursive function below will help get you started. It builds a tree from the input. Decide which type you want the leaves (labeled here as the "cost") to have; csv.reader yields them as strings. You'll need to elaborate on the function to build up the exact structure you intend:
import csv, itertools, json

def cluster(rows):
    result = []
    # groupby only merges adjacent rows, so the input must be sorted by key
    for key, group in itertools.groupby(rows, key=lambda r: r[0]):
        group_rows = [row[1:] for row in group]
        if len(group_rows[0]) == 2:
            # only (expense, cost) pairs are left: this is the leaf level
            result.append({key: dict(group_rows)})
        else:
            result.append({key: cluster(group_rows)})
    return result

if __name__ == '__main__':
    s = '''\
Gondwanaland,Bobs Bits,Operations,nuts,332
Gondwanaland,Bobs Bits,Operations,bolts,254
Gondwanaland,Maureens Melons,Operations,nuts,123
'''
    rows = list(csv.reader(s.splitlines()))
    r = cluster(rows)
    print(json.dumps(r, indent=4))
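The question also asked for summed costs at each level, in the id/name/data/children layout shown above. A minimal sketch of one way to do that follows. The id scheme (path segments joined with "/") and the "cost" key name are assumptions made for illustration, not part of the original answer, and costs are converted to float.

```python
import csv
import itertools
import json


def build_nodes(rows, prefix=""):
    """Group rows by their first column and return a list of tree nodes.

    The last item of each row is the cost. Rows must already be sorted so
    that equal keys are adjacent, because itertools.groupby requires it.
    """
    nodes = []
    for name, group in itertools.groupby(rows, key=lambda r: r[0]):
        rest = [row[1:] for row in group]
        # Hypothetical id scheme: the path from the root, joined with "/".
        node_id = f"{prefix}/{name}" if prefix else name
        if len(rest[0]) == 1:
            # Leaf level: only the cost column is left.
            children = []
            total = sum(float(r[0]) for r in rest)
        else:
            children = build_nodes(rest, node_id)
            # A node's cost is the sum of its children's costs.
            total = sum(child["data"][0]["value"] for child in children)
        nodes.append({
            "id": node_id,
            "name": name,
            "data": [{"key": "cost", "value": total}],
            "children": children,
        })
    return nodes


source = """\
Region,Company,Department,Expense,Cost
Gondwanaland,Bobs Bits,Operations,nuts,332
Gondwanaland,Bobs Bits,Operations,bolts,254
Gondwanaland,Maureens Melons,Operations,nuts,123
"""
rows = list(csv.reader(source.splitlines()))[1:]  # skip the header row
tree = build_nodes(sorted(rows))
print(json.dumps(tree, indent=4))
```

Sorting the rows first keeps groupby from splitting a region or company into several nodes when the CSV is not already ordered.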

Related

How to handle when API Response returns either list or dict in Python?

I am parsing an API in python using responseJson = json.loads(response.text)
The API response is somewhat like this:
1. When having a single entry in books
{
"name": "A",
"books": {
"bookname": "BookA"
}
}
or
2. When having multiple entries in books
{
"name": "A",
"books": [
{
"bookname": "BookA"
},
{
"bookname": "BookB"
}
]
}
Currently I am using:
if type(responseJson['books']) is dict:
    bookName.append(responseJson['books']['bookname'])
    # do a lot more stuff
else:
    for val in responseJson['books']:
        bookName.append(val['bookname'])
        # do a lot more stuff
Since the code (# do a lot more stuff) is a bit complex, I was looking to find an optimized way to do this instead of relying on type().
Any suggestions on how to improve code quality here?

I would use isinstance instead of type. But instead of having two different branches that each do a bunch of stuff, I would only check for a dictionary and, if one is found, wrap it in a list. Then you only need one branch that does the work.
For example:
books = responseJson['books']
if isinstance(books, dict):
    books = [books]
for val in books:
    bookName.append(val['bookname'])
    # do a lot more stuff
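If the same dict-or-list situation shows up in several places, the wrapping step can be pulled into a small helper. The sketch below is one possible shape; as_list and book_names are hypothetical names introduced here, and the sample payloads are the two from the question.

```python
def as_list(value):
    """Return value unchanged if it is a list, otherwise wrap it in a list."""
    return value if isinstance(value, list) else [value]


def book_names(response_json):
    # One code path handles both response shapes.
    return [book["bookname"] for book in as_list(response_json["books"])]


single = {"name": "A", "books": {"bookname": "BookA"}}
multiple = {"name": "A", "books": [{"bookname": "BookA"}, {"bookname": "BookB"}]}

print(book_names(single))
print(book_names(multiple))
```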

sort values from a dictionary/json file

I've got this discord.py command that makes a leaderboard from a json
cogs/coins.json (the dictionary) looks like this:
{
"781524858026590218": {
"name": "kvbot test platform",
"total_coins": 129,
"data": {
"564050979079585803": {
"name": "Bluesheep33",
"coins": 127
},
"528647474596937733": {
"name": "ACAT_",
"coins": 2
}
}
(The green strings with numbers in the json files are discord guild/member ids)
How do I make the code shorter and clearer?
Thanks for helping in advance, because I really don't know the solution

To find the top ten entries in a dict, sorting once is much easier than repeatedly going through the dict and doing different things there.
The code below also uses dict.get for safe access.
It is based on the sample JSON data.
with open('cogs/coins.json', 'r') as f:
    coins_data = json.load(f)
# dict.get gives safe access: it returns None if the guild id is missing
guild_data = coins_data.get(str(ctx.guild.id))
if guild_data is None:  # If data not found
    await ctx.send('No data')
    return
# dict.items() returns pairs of (key, value)
members_coins = list(guild_data['data'].items())
# Sort the list by the `coins` key of the value part of each pair, reverse for descending
members_coins.sort(key=lambda x: x[1]['coins'], reverse=True)
output = ''
# list[:10] takes the first 10 items (a shorter list is fine, Python doesn't mind)
for member_id, vals in members_coins[:10]:
    output += f'{vals["name"]}: {vals["coins"]}\n'
    # output += f'<@{member_id}>: {vals["coins"]}\n'  # If you want a "mention" display of the user
await ctx.send(output)
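The sorting part can be tried outside of a bot. The following is a small sketch that uses the sample data from the question; top_members and format_leaderboard are hypothetical helper names, and the rank numbering is an added assumption about the desired output.

```python
def top_members(guild_data, limit=10):
    """Return up to `limit` member records, highest coin count first."""
    members = guild_data.get("data", {}).values()
    return sorted(members, key=lambda m: m["coins"], reverse=True)[:limit]


def format_leaderboard(members):
    """Render one 'rank. name: coins' line per member."""
    return "\n".join(
        f"{rank}. {m['name']}: {m['coins']}"
        for rank, m in enumerate(members, start=1)
    )


coins_data = {
    "781524858026590218": {
        "name": "kvbot test platform",
        "total_coins": 129,
        "data": {
            "564050979079585803": {"name": "Bluesheep33", "coins": 127},
            "528647474596937733": {"name": "ACAT_", "coins": 2},
        },
    }
}

guild = coins_data.get("781524858026590218")
board = format_leaderboard(top_members(guild)) if guild else "No data"
print(board)
```

Keeping the sorting in a plain function like this also makes it possible to test the leaderboard logic without a Discord connection.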

Parse Json file and save specific values [duplicate]

I have this JSON file where the amount of id's sometimes changes (more id's will be added):
{
"maps": [
{
"id": "blabla1",
"iscategorical": "0"
},
{
"id": "blabla2",
"iscategorical": "0"
},
{
"id": "blabla3",
"iscategorical": "0"
},
{
"id": "blabla4",
"iscategorical": "0"
}
]
}
I have this python code that has to print all the values of ids:
import json
data = json.load(open('data.json'))
variable1 = data["maps"][0]["id"]
print(variable1)
variable2 = data["maps"][1]["id"]
print(variable2)
variable3 = data["maps"][2]["id"]
print(variable3)
variable4 = data["maps"][3]["id"]
print(variable4)
I have to use variables because I want to show the values in a dropdown menu. Is it possible to save the values of the ids in a more efficient way? How do you find out the number of ids in this JSON file (in the example, 4)?

You can get the number of ids (which is the number of elements) by checking the length of data['maps']:
number_of_ids = len(data['maps'])
A clean way to get all the id values is to store them in a list.
You can achieve this in a pythonic way with a list comprehension:
list_of_ids = [item['id'] for item in data['maps']]
With this approach you don't even need to know the number of elements in the original JSON, because you iterate over all of them, essentially as a foreach would.
If the list comprehension looks unfamiliar, you can achieve the same thing with a classic foreach-style loop:
list_of_ids = []
for item in data['maps']:
    list_of_ids.append(item['id'])
Or you can use an index-based for loop, and here is where you really need the length:
number_of_ids = len(data['maps'])
list_of_ids = []
for i in range(number_of_ids):
    list_of_ids.append(data['maps'][i]['id'])
This last one is the classic way, but I suggest using one of the other approaches in order to take advantage of what Python offers.
Happy coding!

data['maps'] is a simple list, so you can iterate over it as such:
for item in data['maps']:
    print(item['id'])
To store them in a variable, you'll need to collect them in a list. Storing each one in a separate variable is not a good idea because, like you said, you don't have a way to know how many there are.
ids = []
for item in data['maps']:
    ids.append(item['id'])

Parsing JSON output efficiently in Python?

The block of code below works, but I'm not satisfied that it is optimal. My understanding of JSON is limited, and I can't seem to figure out a more efficient method.
The steam_game_db is like this:
{
"applist": {
"apps": [
{
"appid": 5,
"name": "Dedicated Server"
},
{
"appid": 7,
"name": "Steam Client"
},
{
"appid": 8,
"name": "winui2"
},
{
"appid": 10,
"name": "Counter-Strike"
}
]
}
}
and my Python code so far is
i = 0
x = 570
req_name_from_id = requests.get(steam_game_db)
j = req_name_from_id.json()
while j["applist"]["apps"][i]["appid"] != x:
    i += 1
returned_game = j["applist"]["apps"][i]["name"]
print(returned_game)
Instead of looping through the entire app list, is there a smarter way to search for it? Ideally, the elements in the data structure with 'appid' and 'name' would be numbered the same as their corresponding 'appid',
i.e.
appid 570 in the list is Dota2.
However, element 570 in the data structure is appid 5069, Red Faction.
Also, what type of data structure is this? Perhaps not knowing has already limited my ability to search for an answer. (I.e. it seems to me like a dictionary of 'appid' and 'element' for each element?)
EDIT: Changed to a for loop as suggested
# returned_id string for appid from another query
req_name_from_id = requests.get(steam_game_db)
j_2 = req_name_from_id.json()
for app in j_2["applist"]["apps"]:
    if app["appid"] == int(returned_id):
        returned_game = app["name"]
print(returned_game)

The most convenient way to access things by a key (like the app ID here) is to use a dictionary.
You pay a little extra performance cost up-front to fill the dictionary, but after that pulling out values by ID is basically free.
However, it's a trade-off. If you only want to do a single look-up during the life-time of your Python program, then paying that extra performance cost to build the dictionary won't be beneficial, compared to a simple loop like you already did. But if you want to do multiple look-ups, it will be beneficial.
# build dictionary
app_by_id = {}
for app in j["applist"]["apps"]:
    app_by_id[app["appid"]] = app["name"]
# use it (the keys are integers, as in the JSON)
print(app_by_id[570])
Also think about caching the JSON file on disk. This will save time during your program's startup.
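A rough sketch of the caching idea follows. load_app_list and fake_fetch are hypothetical names; fake_fetch stands in for the real requests.get(steam_game_db).json() call so that the example runs without network access, and the cache is written to a temporary directory purely for demonstration.

```python
import json
import os
import tempfile


def load_app_list(cache_path, fetch):
    """Return the app list from cache_path if it exists, else call fetch() and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as handle:
            return json.load(handle)
    data = fetch()
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    return data


def fake_fetch():
    # Stand-in for the real HTTP request (an assumption for this example).
    return {"applist": {"apps": [
        {"appid": 5, "name": "Dedicated Server"},
        {"appid": 10, "name": "Counter-Strike"},
    ]}}


cache_path = os.path.join(tempfile.mkdtemp(), "apps_cache.json")
data = load_app_list(cache_path, fake_fetch)  # first call fetches and writes the cache
data = load_app_list(cache_path, fake_fetch)  # second call reads the cache file

# Build the lookup table once, then look up by integer app id.
app_by_id = {app["appid"]: app["name"] for app in data["applist"]["apps"]}
print(app_by_id[10])
```

A real program would also need some rule for refreshing the cache, since the app list changes over time; that is left out here.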

It's better to have the JSON file on disk: you can load it directly into a dictionary and start building up your lookup table. As an example, I've tried to maintain your logic while using a dict for lookups. Don't forget to specify the encoding when opening the JSON file, since it has special characters in it.
Setup:
import json
apps = {}
with open('bigJson.json', encoding="utf-8") as handle:
    dictdump = json.load(handle)
for item in dictdump['applist']['apps']:
    apps.setdefault(item['appid'], item['name'])
Usage 1:
That's the way you have used it
for appid in range(0, 570):
    if appid in apps:
        print(appid, apps[appid].encode("utf-8"))
Usage 2: That's how you can query a key. Using get instead of [] will prevent a KeyError exception if the appid isn't recorded.
print(apps.get(570, 0))

Can't encode content of keys from json

I would like to print every value that belongs to my id key in my json file. I'm using the below code to print the whole file:
import json
from pprint import pprint
with open(r'f:\Docs\testList.json') as data_file:
    data = json.load(data_file)
pprint(data)
And here is the json
{ "clubs": [
{
"location": "Dallas",
"id": "013325K52",
"type": "bar"
},
{
"location": "Dallas",
"id": "763825X56",
"type": "restaurant"
}
] }
It works correctly, however I can't figure out the type of the data_file and data variables, therefore I have no idea how could I write a for loop to print the content. I'm really new to Python, but in pseudo code I would do something like this (if I assume data is an array (or Python list) of dictionary objects):
for dictionaryVar in jsonArray
print dictionaryVar["id"]
or
for dictionaryVar in jsonArray
if dictionaryVar containsKey: "id"
print dictionaryVar["id"]
I would really appreciate if somebody could show me the right way or give guidance, because I don't really have an idea. I checked the docs of the json module, but couldn't figure out what it really does and how.

data_file is a TextIOWrapper, that is, a file object used for reading text. You don't need to care about it.
data is a dict. Dictionaries map keys to values, but you probably already know that. data has one key, "clubs", which maps to a list. This list contains other dictionaries.
Your pseudo-code:
for dictionaryVar in jsonArray
    print dictionaryVar["id"]
corresponds to the following Python code:
for item in data['clubs']:
    print(item['id'])
Your pseudo-code:
for dictionaryVar in jsonArray
    if dictionaryVar containsKey: "id"
        print dictionaryVar["id"]
corresponds to the following Python code:
for item in data['clubs']:
    if 'id' in item:
        print(item['id'])
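To see these types for yourself, the sample JSON can be parsed from a string and inspected. This is only an illustrative sketch: it uses json.loads on an inline copy of the data instead of reading the file, and the variable names are chosen for the example.

```python
import json

text = """
{ "clubs": [
    {"location": "Dallas", "id": "013325K52", "type": "bar"},
    {"location": "Dallas", "id": "763825X56", "type": "restaurant"}
] }
"""

data = json.loads(text)
print(type(data).__name__)           # the top-level JSON object becomes a dict
print(type(data["clubs"]).__name__)  # the JSON array becomes a list

# Collect every id, skipping entries that have no "id" key.
ids = [club["id"] for club in data["clubs"] if "id" in club]
print(ids)
```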

This should be fairly simple. Here is a quite explicit way of doing it (I made it longer so it's clearer for you; you could do this more concisely):
import json
from pprint import pprint
with open(r'f:\Docs\testList.json') as data_file:
    data = json.load(data_file)
clubs = data['clubs']
for club in clubs:
    # Use dict.get() here to default the value to None if
    # the key doesn't exist
    club_id = club.get('id', None)
    if club_id is not None:
        print(club_id)
