How to access an object within a list of them? - python

So I have a list of objects from an API which looks like this:
{
"1": {
"artist": "Ariana Grande",
"title": "Positions"
},
"2": {
"artist": "Luke Combs",
"title": "Forever After All"
},
"3": {
"artist": "24kGoldn Featuring iann dior",
"title": "Mood"
},
}
I was wondering how do I run a for loop to access each item.
def create_new_music_chart(data_location):
    with open(data_location, 'r') as json_file:
        data = json.load(json_file)
    for song in data:
        print(song)
Returns:
1
2
3
But when I try this to print the artist, it doesn't work:
for song in data:
    print(song[artist])
Result:
TypeError: string indices must be integers

song is the key in the dictionary, not the nested dictionary itself. To get the artist, look up the key song in the dictionary data, and pass "artist" as a string:
for song in data:
    # song is "1"
    # data[song] is {"artist": "Ariana Grande", "title": "Positions"}
    print(data[song]["artist"])

song is the key, not the dictionary value. Iterate over the values instead:
for song_dict in data.values():
    print(song_dict["artist"])
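If you need the key and the nested dictionary at the same time, `dict.items()` yields them together. A minimal sketch, using an inline copy of the sample data from the question; the `describe_songs` helper is a made-up name for illustration, not something from the original code:

```python
# Inline copy of the sample data from the question.
data = {
    "1": {"artist": "Ariana Grande", "title": "Positions"},
    "2": {"artist": "Luke Combs", "title": "Forever After All"},
    "3": {"artist": "24kGoldn Featuring iann dior", "title": "Mood"},
}


def describe_songs(chart):
    """Return one 'position: artist - title' string per entry (hypothetical helper)."""
    lines = []
    # items() yields (key, value) pairs, so no second lookup is needed.
    for position, song in chart.items():
        lines.append(f"{position}: {song['artist']} - {song['title']}")
    return lines


for line in describe_songs(data):
    print(line)
```

Since Python 3.7, dictionaries preserve insertion order, so the lines come out in the order the entries appear in the data.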

Related

Getting values from complicated json content [duplicate]

I am scraping a site, and for that I need the "id" of every location in a complicated JSON response:
https://hilfe.diakonie.de/hilfe-vor-ort/marker-json.php?ersteller=&kategorie=0&text=&n=54.14365551060835&e=19.704533281249986&s=48.00384435890099&w=1.2035567187499874&zoom=20000
I have tried the dict.items method, but I only get the two top-level keys of the dict; the ids are inside the list that starts after them:
res = requests.get(url).json()
json_obj = res.items()
for key, value in json_obj:
    if key == "id":
        print(value)
json = {
"count": 17652,
"items": [
{
"lat": 51.17450581504055,
"lng": 10.007757153036533,
"count": 17652,
"north": 54.1425475,
"east": 15.0019,
"south": 48.0039543,
"west": 5.952813,
"elements": [
{
"id": "5836de61a581c245ae48806b",
"o": 'null'
},
{
"id": "5836de62a581c245ae48814b",
"o": 'null'
},
{
"id": "5836de57a581c245ae487944",
"o": 'null'
},
{
"id": "5836de64a581c245ae4882a8",
"o": 'null'
},
{
"id": "5836de54a581c245ae48772a",
"o": 'null'
},
{
"id": "5836de57a581c245ae487945",
"o": 'null'
}
]
}
]
}
The id attribute is nested inside arrays in the elements attribute of objects, which are in turn nested inside an array in the items attribute of the response. Use a list comprehension with 2 loops to extract them:
res = requests.get(url).json()
ids = [ele["id"] for v in res["items"] for ele in v["elements"]]
for id in ids:
    print(id)
The JSON consists of a root dictionary with two key-value pairs. One is count, which is an integer, the other is items, which maps to a list of a single item. This item is a dictionary, which has several key-value pairs, one of which is elements, which is a list of dictionaries, each containing an id:
import requests
url = "https://hilfe.diakonie.de/hilfe-vor-ort/marker-json.php?ersteller=&kategorie=0&text=&n=54.14365551060835&e=19.704533281249986&s=48.00384435890099&w=1.2035567187499874&zoom=20000"
response = requests.get(url)
response.raise_for_status()
elements = response.json()["items"][0]["elements"]
# print only the first ten ids
for element in elements[:10]:
    print(element["id"])
Output:
5836de61a581c245ae48806b
5836de62a581c245ae48814b
5836de57a581c245ae487944
5836de64a581c245ae4882a8
5836de54a581c245ae48772a
5836de57a581c245ae487945
5836de61a581c245ae48806c
5836de64a581c245ae4882aa
5836de57a581c245ae487947
5836de62a581c245ae48814d
The same idea, using operator.itemgetter:
import operator
items = operator.itemgetter('items')
elements = operator.itemgetter('elements')
eyedees = operator.itemgetter('id')
# take the elements list of the first (and only) item
data = elements(items(json)[0])
stuff = map(eyedees, data)
print(list(stuff))
This uses the json dictionary from the example in the question.
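If the nesting depth is not known in advance, a recursive generator can collect every value stored under a given key, wherever it sits. This is only a sketch: `find_values` is a made-up name, and `sample` is a trimmed copy of the response shape from the question, not the live response.

```python
def find_values(node, target_key):
    """Yield every value stored under target_key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == target_key:
                yield value
            # Keep descending: the value itself may contain more matches.
            yield from find_values(value, target_key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, target_key)


# Trimmed-down copy of the response shape shown in the question.
sample = {
    "count": 17652,
    "items": [
        {
            "lat": 51.17450581504055,
            "elements": [
                {"id": "5836de61a581c245ae48806b", "o": None},
                {"id": "5836de62a581c245ae48814b", "o": None},
            ],
        }
    ],
}

ids = list(find_values(sample, "id"))
print(ids)
```

The trade-off is that this visits the whole structure, so when the path is known, as it is here, indexing directly with `["items"][0]["elements"]` is simpler and faster.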

Get Average from two JSON Arrays

I have two JSON feeds, events and users, which I combined into one file. I need the average age, taken from users.json, of all distinct users who visited the home page (recorded in events.json).
Sample JSON Feed Events:
"events": [
{
"name": "Added item to cart",
"timestamp": 1422119921,
"user_id": "5a5598f2-f7db-420e-9b8e-52a9ad694bc1"
},
{
"name": "Visited product page",
"timestamp": 1409554014,
"user_id": "4683c9b6-3c8b-4215-a401-a9bbfde833ee"
}
Sample Users Feed:
"age": 27,
"gender": "F",
"device": "android"
},
"712ae3b5-fbf0-4d83-9324-adc06af77d3a": {
"age": 34,
"gender": "F",
"device": "android"
},
I'm new to python and I believe I am on the right track with the below but not sure where to go next. Any additional help would be appreciated.
import json
# Opening JSON file
with open('combined.json') as json_file:
    data = json.load(json_file)
    # for reading nested data, [0] represents
    # the index value of the list
    print(data['events'][0])
    print(data['users'][0])
    # for printing the key-value pairs of a
    # nested dictionary, a for loop can be used
    print("\nPrinting nested dictionary as a key-value pair\n")
    for i in data['events']:
        print("Name:", i['name'])
    for i in data['users']:
        print("Age:", i['age'])
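You're on the right track: add up all the ages and divide by the number of users. Note that data['users'] is already a dictionary after json.load, so iterate over its items directly instead of loading it again. This averages over every user in the file; it is one way to do that:
import json
# Opening JSON file
with open('combined.json') as json_file:
    data = json.load(json_file)
total_age = 0
counter = 0
# data['users'] maps each user id to that user's details
for k, v in data['users'].items():
    # Add all the ages
    total_age += v['age']
    counter += 1
print(total_age / counter)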
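To restrict the average to the distinct users who visited the home page, you can first collect their ids in a set. A sketch under stated assumptions: the event name "Visited home page" does not appear in the sample feed and is assumed here, the user ids and ages are invented, and `average_age_of_visitors` is a made-up helper name.

```python
# Invented sample data in the same shape as the question's feeds.
data = {
    "events": [
        {"name": "Visited home page", "timestamp": 1, "user_id": "u1"},
        {"name": "Visited home page", "timestamp": 2, "user_id": "u1"},
        {"name": "Visited home page", "timestamp": 3, "user_id": "u2"},
        {"name": "Added item to cart", "timestamp": 4, "user_id": "u3"},
    ],
    "users": {
        "u1": {"age": 20, "gender": "F", "device": "android"},
        "u2": {"age": 30, "gender": "M", "device": "ios"},
        "u3": {"age": 50, "gender": "F", "device": "android"},
    },
}


def average_age_of_visitors(data, event_name):
    """Average age of the distinct users who triggered event_name."""
    # A set keeps each user once, however many times they visited.
    visitor_ids = {
        event["user_id"]
        for event in data["events"]
        if event["name"] == event_name
    }
    # Skip ids that have no matching record in the users feed.
    ages = [
        data["users"][uid]["age"]
        for uid in visitor_ids
        if uid in data["users"]
    ]
    if not ages:
        return None  # nobody matched, so there is no average to report
    return sum(ages) / len(ages)


# "Visited home page" is an assumed event name; adjust it to the real feed.
print(average_age_of_visitors(data, "Visited home page"))
```

Here u1 visits twice but is counted once, and u3 never visits the home page, so only the ages of u1 and u2 enter the average.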

How to loop through a JSON file using Python with nested lists and dictionaries

I'm trying to loop through a JSON file using Python and return the name of the object and associated modules for it.
Right now I can basically get the output I want hardcoding the indexes. However, this obviously isn't the right way to do it (the JSON file can vary in length).
Whenever I try to use a loop, I get errors like:
TypeError: string indices must be integers
My JSON file looks like this:
{
"name": "gaming_companies",
"columns": [{
"name": "publisher",
"type": "string",
"cleansing": ["clean_string"]
},
{
"name": "genre",
"type": "string",
"cleansing": ["match_genre", "clean_string"]
},
{
"name": "sales",
"type": "int",
"cleansing": []
}
]
}
My Python code which is 'working' looks like:
import json as js
def cleansing(games_json):
    print (games_json['columns'][0]['name'] + " - cleansing:")
    [print(i) for i in games_json['columns'][0]['cleansing'] ]
    print (games_json['columns'][1]['name'] + " - cleansing:")
    [print(i) for i in games_json['columns'][1]['cleansing'] ]
    print (games_json['columns'][2]['name'] + " - cleansing:")
    [print(i) for i in games_json['columns'][2]['cleansing'] ]
with open(r'C:\Desktop\gamefolder\jsonfiles\games.json') as input_json:
    games_json = js.load(input_json)
    cleansing(games_json)
The output I'm trying to return is:
publisher
cleansing:
clean_string
genre
cleansing:
match_genre
clean_string
sales
cleansing:
My attempt to loop through them like this:
for x in games_json:
    for y in games_json['columns'][x]:
        print (y)
Results in:
TypeError: list indices must be integers or slices, not str
games_json shows as a Dict.
Columns shows as a list of dictionaries.
Each object's cleansing attribute shows as a list.
I think this is where my problem is, but I'm not able to get over the hurdle.
The problem with your attempt is that iterating over a dictionary yields its keys.
The x in for y in games_json['columns'][x]: is therefore a string such as 'name' or 'columns', and games_json['columns'] is a list, which cannot be indexed with a string. That is what the TypeError is telling you.
Instead, iterate over the columns list directly.
This code should work:
for item in games_json["columns"]:
    print(item["name"])
    print("cleansing:")
    print(item["cleansing"])
Output-
publisher
cleansing:
['clean_string']
genre
cleansing:
['match_genre', 'clean_string']
sales
cleansing:
[]
This is one working solution, since you want to iterate over the array's elements:
import json
for x in games_json['columns']:
    print(x)
    print(x['name'])
x = """{
"name": "gaming_companies",
"columns": [{
"name": "publisher",
"type": "string",
"cleansing": ["clean_string"]
},
{
"name": "genre",
"type": "string",
"cleansing": ["match_genre", "clean_string"]
},
{
"name": "sales",
"type": "int",
"cleansing": []
}
]
}"""
x = json.loads(x)
for i in x['columns']:
    print(i['name'])
    print("cleansing:")
    for j in i["cleansing"]:
        print(j)
    print('\n')
Output
publisher
cleansing:
clean_string
genre
cleansing:
match_genre
clean_string
sales
cleansing:
# The same loop, reading from the file (js is the json module, imported as in the question)
with open(r'C:\Desktop\gamefolder\jsonfiles\games.json') as input_json:
    games_json = js.load(input_json)
for i in games_json['columns']:
    print(i['name'])
    print("cleansing:")
    for j in i["cleansing"]:
        print(j)
    print('\n')

How do I check for duplicate entries before I add an entry to the dictionary

I have the following dictionary, which stores key (entry_id) and value (entry_body, entry_title) pairs.
"entries": {
"1": {
"body": "ooo",
"title": "jack"
},
"2": {
"body": "ooo",
"title": "john"
}
}
How do I check whether the title of an entry that I want to add to the dictionary already exists?
For example, this is the new entry that I want to add:
{
"body": "nnnn",
"title": "jack"
}
Have you thought about changing your data structure? Without context, the IDs of the entries seem a little useless. Your question suggests you only want to store unique titles, so why not make them your keys?
Example:
"entries": {
"jack": "ooo",
"john": "ooo"
}
That way you can do an efficient if newname in entries membership test.
EDIT:
Based on your comment you can still preserve the IDs by extending the data structure:
"entries": {
"jack": {
"body": "ooo",
"id": 1
},
"john": {
"body": "ooo",
"id": 2
}
}
I agree with @Christian König's answer: your data structure seems like it could be made clearer and more efficient. Still, if you need a solution for this setup in particular, here's one that will work, and it automatically adds new integer keys to the entries dict.
I've added an extra case to show both a rejection and an accepted update.
def existing_entry(e, d):
    return [True for entry in d["entries"].values() if entry["title"] == e["title"]]
def update_entries(e, entries):
    if not any(existing_entry(e, entries)):
        current_keys = [int(x) for x in list(entries["entries"].keys())]
        next_key = str(max(current_keys) + 1)
        entries["entries"][next_key] = e
        print("Updated:", entries)
    else:
        print("Existing entry found.")
update_entries(new_entry_1, data)
update_entries(new_entry_2, data)
Output:
Existing entry found.
Updated:
{'entries':
{'1': {'body': 'ooo', 'title': 'jack'},
'2': {'body': 'ooo', 'title': 'john'},
'3': {'body': 'qqqq', 'title': 'jill'}
}
}
Data:
data = {"entries": {"1": {"body": "ooo", "title": "jack"},"2": {"body": "ooo","title": "john"}}}
new_entry_1 = {"body": "nnnn", "title": "jack"}
new_entry_2 = {"body": "qqqq", "title": "jill"}
This should work:
entry_dict = {
    "1": {"body": "ooo", "title": "jack"},
    "2": {"body": "ooo", "title": "john"}
}
def does_title_exist(title):
    for entry_id, sub_dict in entry_dict.items():
        if sub_dict["title"] == title:
            print("Title %s already in dictionary at entry %s" %(title, entry_id))
            return True
    return False
print("Does the title exist? %s" % does_title_exist("jack"))
As Christian suggests above, this seems like an inefficient data structure for the job. If you only need index IDs, a list may be better.
I think to achieve this, one will have to traverse the dictionary.
'john' in [the_dict[en]['title'] for en in the_dict]
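A variation on the traversal above: `any()` with a generator expression stops at the first match instead of building a full list first. This is a minimal sketch that keeps the original id-keyed structure; `title_exists` and `add_entry` are made-up names, and the sample data is copied from the question.

```python
def title_exists(entries, title):
    """True if any stored entry already uses this title."""
    # any() stops at the first match instead of building a full list.
    return any(entry["title"] == title for entry in entries.values())


def add_entry(entries, new_entry):
    """Add new_entry under the next integer-like key unless its title exists."""
    if title_exists(entries, new_entry["title"]):
        return False
    # Keys are strings of integers; default=0 covers an empty dictionary.
    next_key = str(max((int(k) for k in entries), default=0) + 1)
    entries[next_key] = new_entry
    return True


entries = {
    "1": {"body": "ooo", "title": "jack"},
    "2": {"body": "ooo", "title": "john"},
}

print(add_entry(entries, {"body": "nnnn", "title": "jack"}))
print(add_entry(entries, {"body": "qqqq", "title": "jill"}))
```

This is still a linear scan per insert. If the dictionary grows large, keying by title as suggested in the first answer turns the check into a constant-time lookup.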

Python 3 Get JSON value

I am calling a REST API from a Python script to extract Name and Start Time from a response.
I can get the information but I can't combine data so that the information is on the same line in a CSV. When I go to export them to CSV they all go on new lines.
There is probably a much better way to extract data from a JSON List.
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        for key, value in data_item.items():
            #driver = {}
            #test = {}
            #startTime = {}
            if key == "Name":
                drivers.append(value)
            if key == "StartTime":
                drivers.append(value)
print (drivers)
Code to write to CSV:
with open(logFileName, 'a') as outcsv:
    # configure writer to write standard csv file
    writer = csv.writer(outcsv, delimiter=',', quotechar="'",
                        quoting=csv.QUOTE_MINIMAL, lineterminator='\n',skipinitialspace=True)
    for driver in drivers:
        writer.writerow(driver)
Here is a sample of the response:
"Query": {
"Results": [
{
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1400"
},
{
"Name": " John Doe"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
"XValues": [
{
"ReportScopeStartTime": "2018-06-18T23:00:00Z"
},
{
"ReportScopeEndTime": "2018-06-25T22:59:59Z"
},
{
"ID": "1401"
},
{
"Name": " Jane Smith"
},
{
"StartTime": "2018-06-19T07:16:10Z"
},
],
},
My output in CSV:
John Doe
2018-06-19T07:16:10Z
Jane Smith
2018-06-19T07:16:10Z
Desired Outcome:
John Doe, 2018-06-19T07:16:10Z
Jane Smith, 2018-06-19T07:16:10Z
Just use normal dictionary access to get the values:
for item in driverDetails['Query']['Results']:
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            drivers.append(data_item["Name"])
        if "StartTime" in data_item:
            drivers.append(data_item["StartTime"])
print (drivers)
If you know the items will already have the required fields then you won't even need the in tests.
writer.writerow() expects a sequence. You are calling it with a single string as a parameter so it will split the string into individual characters. Probably you want to keep the name and start time together so extract them as a tuple:
for item in driverDetails['Query']['Results']:
    name, start_time = "", ""
    for data_item in item['XValues']:
        body.append(data_item)
        if "Name" in data_item:
            name = data_item["Name"]
        if "StartTime" in data_item:
            start_time = data_item["StartTime"]
    drivers.append((name, start_time))
print (drivers)
Now, instead of being a list of strings, drivers is a list of tuples, one (name, start time) pair per result. If an input item has a name but no start time, that field is left as an empty string. Your code to write the CSV file should now do the expected thing.
If you want to get all or most of the values try gathering them together into a single dictionary, then you can pull out the fields you want:
for item in driverDetails['Query']['Results']:
    fields = {}
    for data_item in item['XValues']:
        body.append(data_item)
        fields.update(data_item)
    drivers.append((fields["ID"], fields["Name"], fields["StartTime"]))
print (drivers)
Once you have the fields in a single dictionary you could even build the tuple with a loop:
drivers.append(tuple(fields[f] for f in ("ID", "Name", "StartTime", "ReportScopeStartTime", "ReportScopeEndTime")))
I think you should list the fields you want explicitly just to ensure that new fields don't surprise you.
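Putting the pieces together, here is a sketch of the whole path from response to CSV rows. It makes some assumptions: `driver_details` is a trimmed, hand-made stand-in for the real response, `extract_rows` and `WANTED` are made-up names, and an in-memory `io.StringIO` buffer replaces the log file so the example runs on its own.

```python
import csv
import io

# Trimmed stand-in for the response from the question.
driver_details = {
    "Query": {
        "Results": [
            {"XValues": [{"ID": "1400"}, {"Name": " John Doe"},
                         {"StartTime": "2018-06-19T07:16:10Z"}]},
            {"XValues": [{"ID": "1401"}, {"Name": " Jane Smith"},
                         {"StartTime": "2018-06-19T07:16:10Z"}]},
        ]
    }
}

# List the wanted fields explicitly so new fields cannot surprise you.
WANTED = ("Name", "StartTime")


def extract_rows(details, wanted=WANTED):
    """Build one tuple per result, holding the wanted fields in order."""
    rows = []
    for item in details["Query"]["Results"]:
        fields = {}
        # Merge the single-key dictionaries into one per result.
        for data_item in item["XValues"]:
            fields.update(data_item)
        # Missing fields become empty strings; padding spaces are stripped.
        rows.append(tuple(fields.get(name, "").strip() for name in wanted))
    return rows


rows = extract_rows(driver_details)

# StringIO stands in for the real log file so the sketch runs anywhere.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerows(rows)  # each tuple becomes one line of the CSV
print(buffer.getvalue())
```

With a real file, replace the buffer with `open(logFileName, 'a', newline='')`; the csv module documentation recommends `newline=''` when opening files for a writer.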
