flatten nested dictionary with dictionary embedded in lists (functional python)

This question has been asked many times, but only once with this special case. I found a partial answer here, but it flattens everything down to individual objects.
I have this dictionary:
{'address': {'address_line_1': 'Floor Dekk House',
'address_line_2': 'Zippora Street Providence Industrial Estate',
'country': 'Seychelles',
'locality': 'Mahe',
'premises': '1st'},
'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
'appointment_count': 1,
'description': 'Total number of appointments 1',
'description_identifiers': ['appointment-count'],
'kind': 'searchresults#officer',
'links': {'self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments'},
'matches': {'snippet': [], 'title': [1, 8, 10, 11]},
'snippet': '',
'title': 'ASTROCOM AG '}
As you can see, "description_identifiers", "matches.snippet" and "matches.title" have a list as their value. I'd like to edit my code below to flatten my dictionary so that the JSON becomes a flat {key: value, key: value, key: value} mapping, but if a value is a list of atomic objects (not a list of lists or a list of dictionaries), it is kept as a list.
The objective is to be able to upload this JSON to PostgreSQL.
Here's some code I found online:
from itertools import chain, starmap
def flatten_json(dictionary):
    """Flatten a nested json file"""
    def unpack(parent_key, parent_value):
        """Unpack one level of nesting in json file"""
        # Unpack one level only!!!
        if isinstance(parent_value, dict):
            for key, value in parent_value.items():
                temp1 = parent_key + '_' + key
                yield temp1, value
        elif isinstance(parent_value, list):
            i = 0
            for value in parent_value:
                temp2 = parent_key + '_' + str(i)
                i += 1
                yield temp2, value
        else:
            yield parent_key, parent_value
    # Keep iterating until the termination condition is satisfied
    while True:
        # Keep unpacking the json file until all values are atomic elements (not dictionary or list)
        dictionary = dict(chain.from_iterable(starmap(unpack, dictionary.items())))
        # Terminate condition: not any value in the json file is dictionary or list
        if not any(isinstance(value, dict) for value in dictionary.values()) and \
           not any(isinstance(value, list) for value in dictionary.values()):
            break
    return dictionary
Desired output, using the dictionary above as a test.
It should not be this (which is what I get now):
{'address_address_line_1': 'Floor Dekk House',
'address_address_line_2': 'Zippora Street Providence Industrial Estate',
'address_country': 'Seychelles',
'address_locality': 'Mahe',
'address_premises': '1st',
'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
'appointment_count': 1,
'description': 'Total number of appointments 1',
'description_identifiers_0': 'appointment-count',
'kind': 'searchresults#officer',
'links_self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments',
'matches_title_0': 1,
'matches_title_1': 8,
'matches_title_2': 10,
'matches_title_3': 11,
'snippet': '',
'title': 'ASTROCOM AG '}
But rather
{'address_address_line_1': 'Floor Dekk House',
'address_address_line_2': 'Zippora Street Providence Industrial Estate',
'address_country': 'Seychelles',
'address_locality': 'Mahe',
'address_premises': '1st',
'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
'appointment_count': 1,
'description': 'Total number of appointments 1',
'description_identifiers_0': 'appointment-count',
'kind': 'searchresults#officer',
'links_self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments',
'matches_title': [1, 8, 10, 11],
'snippet': '',
'title': 'ASTROCOM AG '}

You are almost there. You just need one more check in the condition, so that only single-element lists are unpacked and all other lists are kept as they are:
def flatten(dict_, prefix):
    for k, v in dict_.items():
        if isinstance(v, list) and len(v) == 1:
            if isinstance(v[0], dict):
                for key, value in flatten(v[0], prefix + k + "_"):
                    yield key, value
            else:
                yield prefix + k + "_0", v[0]
        elif isinstance(v, dict):
            for key, value in flatten(v, prefix + k + "_"):
                yield key, value
        else:
            yield prefix + k, v
Usage:
dict_ = {'address': {'address_line_1': 'Floor Dekk House',
'address_line_2': 'Zippora Street Providence Industrial Estate',
'country': 'Seychelles',
'locality': 'Mahe',
'premises': '1st'},
'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
'appointment_count': 1,
'description': 'Total number of appointments 1',
'description_identifiers': ['appointment-count'],
'kind': 'searchresults#officer',
'links': {'self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments'},
'matches': {'snippet': [], 'title': [1, 8, 10, 11]},
'snippet': '',
'title': 'ASTROCOM AG '}
import json
print(json.dumps(dict(list(flatten(dict_, ""))), indent=4))
Output:
{
    "address_address_line_1": "Floor Dekk House",
    "address_address_line_2": "Zippora Street Providence Industrial Estate",
    "address_country": "Seychelles",
    "address_locality": "Mahe",
    "address_premises": "1st",
    "address_snippet": "1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles",
    "appointment_count": 1,
    "description": "Total number of appointments 1",
    "description_identifiers_0": "appointment-count",
    "kind": "searchresults#officer",
    "links_self": "/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments",
    "matches_snippet": [],
    "matches_title": [
        1,
        8,
        10,
        11
    ],
    "snippet": "",
    "title": "ASTROCOM AG "
}
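The flatten function in this answer only unpacks lists of length one. As a rough sketch of the rule stated in the question (keep lists of atomic values whole, index into lists that contain dictionaries or lists), something like the following might work. The name flatten_keep_atomic_lists and the sample data are made up for illustration and are not part of the original answer. Note that under this rule a single-element list of atomic values, such as description_identifiers, stays a list rather than becoming description_identifiers_0.

```python
def flatten_keep_atomic_lists(value, prefix=""):
    """Yield (flat_key, value) pairs; lists of atomic values are kept whole."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}_{key}" if prefix else str(key)
            yield from flatten_keep_atomic_lists(child, child_prefix)
    elif isinstance(value, list) and any(isinstance(x, (dict, list)) for x in value):
        # Only lists that contain nested containers are unpacked by index.
        for index, child in enumerate(value):
            child_prefix = f"{prefix}_{index}" if prefix else str(index)
            yield from flatten_keep_atomic_lists(child, child_prefix)
    else:
        # Atomic value, or a list of atomic values (including an empty list).
        yield prefix, value


# Hypothetical sample data, loosely modelled on the question's dictionary.
nested = {
    "links": {"self": "/officers/x"},
    "matches": {"snippet": [], "title": [1, 8, 10, 11]},
    "people": [{"name": "a"}, {"name": "b"}],
}
flat = dict(flatten_keep_atomic_lists(nested))
print(flat)
```

Because the function is a generator, the result can be passed straight to dict(), as in the usage above.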

Related

print a dictionary adding a blank space between each blocks

New to python. Here's a nested dictionary with two books each having 8 attributes.
book_collection ={17104: {'Title': 'A River', 'Author': 'Elisha Mitchell', 'Publisher': 'FPG Publishing', 'Pages': '345', 'Year': '2014', 'Copies': 2, 'Available': 2, 'ID': 17104}, 37115: {'Title': 'Aim High', 'Author': 'George Tayloe Winston', 'Publisher': 'Manning Hall Press', 'Pages': '663', 'Year': '2014', 'Copies': 5, 'Available': 5, 'ID': 37115}}
for id, book in book_collection.items():
    for book_attribute, attribute_value in book.items():
        print(book_attribute, ': ', attribute_value, sep='')
The output:
Title: A River
Author: Elisha Mitchell
Publisher: FPG Publishing
Pages: 345
Year: 2014
Copies: 2
Available: 2
ID: 17104
Title: Aim High
Author: George Tayloe Winston
Publisher: Manning Hall Press
Pages: 663
Year: 2014
Copies: 5
Available: 5
ID: 37115
How can I add a blank line between each book, and bring the 'ID' attribute to the first row of each book? The output is supposed to look like this:
ID: 17104
Title: A River
Author: Elisha Mitchell
Publisher: FPG Publishing
Pages: 345
Year: 2014
Copies: 2
Available: 2

ID: 37115
Title: Aim High
Author: George Tayloe Winston
Publisher: Manning Hall Press
Pages: 663
Year: 2014
Copies: 5
Available: 5
If there are 20 books, how can I just print the first 10 and ask the user for permission to continue?

Use this:
for id, book in book_collection.items():
    for book_attribute, attribute_value in book.items():
        print(book_attribute, ': ', attribute_value, sep='')
    print()
A dictionary has no index() method, so use enumerate() to count the books and ask once the first 10 have been printed:
for position, (id, book) in enumerate(book_collection.items()):
    if position == 10:
        answer = input("Press 0 to continue or anything else to exit: ")
        if answer != "0":
            break
    for book_attribute, attribute_value in book.items():
        print(book_attribute, ': ', attribute_value, sep='')
    print()

A dictionary's items method returns an iterable of tuples (immutable sequences). Each tuple represents a key-value pair, with the key at index 0 and the value at index 1.
The for loop you're using - for book_attribute, attribute_value in book.items(): - is syntactic sugar for "take the two values in the tuple and assign them to these variables, then run the code in this block."
It might be easier to think of it this way:
>>> book_dict = {'Title': 'A River', 'Author': 'Elisha Mitchell', 'Publisher': 'FPG Publishing', 'Pages': '345', 'Year': '2014', 'Copies': 2, 'Available': 2, 'ID': 17104}
>>> book_dict_entries = list(book_dict.items())
>>> print(book_dict_entries)
[('Title', 'A River'), ('Author', 'Elisha Mitchell'), ('Publisher', 'FPG Publishing'), ('Pages', '345'), ('Year', '2014'), ('Copies', 2), ('Available', 2), ('ID', 17104)]
There are a few directions to go from here. One way: since it's just a list, you can search for the tuple representing the ID field and swap it with whatever happens to be the first element in that list. Or, before turning it into a list, simply print ID from the dictionary, then filter out that field when iterating over the rest of the fields.
As for the blank line: to print an empty line at a certain point, simply call print with no arguments, for example when you've finished printing each dictionary.
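The "print ID first, then filter it out" idea could be sketched roughly as follows. The helper name format_book is hypothetical, and the book dictionaries are shortened versions of the ones in the question.

```python
def format_book(book):
    """Return the display lines for one book, with the ID line first."""
    lines = [f"ID: {book['ID']}"]
    # Skip the ID key here, since it has already been emitted above.
    lines += [f"{key}: {value}" for key, value in book.items() if key != "ID"]
    return lines


# Shortened sample data (assumption: every book has an 'ID' key).
book_collection = {
    17104: {"Title": "A River", "Copies": 2, "ID": 17104},
    37115: {"Title": "Aim High", "Copies": 5, "ID": 37115},
}

for book in book_collection.values():
    print("\n".join(format_book(book)))
    print()  # blank line between books
```

This leaves the original dictionaries untouched, so it works regardless of the order in which the keys were inserted.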

I would define the dict with ID as the first key, since dicts are guaranteed to preserve insertion order from Python 3.7 onward (not before!).
Put ID as the first key, and add print() after each inner loop.
book_collection = {
    17104:
        {'ID': 17104, 'Title': 'A River', 'Author': 'Elisha Mitchell',
         'Publisher': 'FPG Publishing', 'Pages': '345', 'Year': '2014',
         'Copies': 2, 'Available': 2, },
    37115:
        {'ID': 37115, 'Title': 'Aim High', 'Author': 'George Tayloe Winston',
         'Publisher': 'Manning Hall Press', 'Pages': '663', 'Year': '2014',
         'Copies': 5, 'Available': 5}
}
for id, book in book_collection.items():
    for book_attribute, attribute_value in book.items():
        print(book_attribute, ': ', attribute_value, sep='')
    print()

I have a list with the curly brackets and I'm not sure how to search list with the curly brackets

I would like to search for one specific ID, but I'm not sure how to navigate a list that contains curly brackets.
[{'address': '9 Lee Road, Wirral, Merseyside',
'url': '/get/ODQ2MjhhNTg1Y2E1YzE2IDE3MDkxMzc2IDFkYTMyNmZkZWY3ZGMzNg==',
'id': 'ODQ2MjhhNTg1Y2E1YzE2IDE3MDkxMzc2IDFkYTMyNmZkZWY3ZGMzNg=='},
{'address': '9 Lee Road, Aylesbury, Buckinghamshire',
'url': '/get/MTEwZDgzMGUxMDBlMWQyIDIyMDI5NjA1IDFkYTMyNmZkZWY3ZGMzNg==',
'id': 'MTEwZDgzMGUxMDBlMWQyIDIyMDI5NjA1IDFkYTMyNmZkZWY3ZGMzNg=='},
{'address': '9 Lee Road, London',
'url': '/get/MjEyYTIxNDhjZjM5ZTQ4IDU3ODQ4NzUgMWRhMzI2ZmRlZjdkYzM2',
'id': 'MjEyYTIxNDhjZjM5ZTQ4IDU3ODQ4NzUgMWRhMzI2ZmRlZjdkYzM2'},
{'address': '9 Lee Road, Manchester',
'url': '/get/MmNkNDQzN2I2ODc3NmVhIDMwMTUwOTg3IDFkYTMyNmZkZWY3ZGMzNg==',
'id': 'MmNkNDQzN2I2ODc3NmVhIDMwMTUwOTg3IDFkYTMyNmZkZWY3ZGMzNg=='},
{'address': '9 Lee Road, Aldeburgh, Suffolk',
'url': '/get/N2YzZGJiMTQ5OGRlYjg3IDIyOTczNDM5IDFkYTMyNmZkZWY3ZGMzNg==',
'id': 'N2YzZGJiMTQ5OGRlYjg3IDIyOTczNDM5IDFkYTMyNmZkZWY3ZGMzNg=='},
{'address': '9 Lee Road, Chesterfield, Derbyshire',
'url': '/get/MWI5MGFhNDY5MjcwNDUwIDcxMjg0MjggMWRhMzI2ZmRlZjdkYzM2',
'id': 'MWI5MGFhNDY5MjcwNDUwIDcxMjg0MjggMWRhMzI2ZmRlZjdkYzM2'}]

my_list is a list (my_list = [...]) and each item inside it is a dictionary ({'url': '/get/...', ...}). Loop over the elements of the list with whatever loop you like and check the key you want on each element. In your case, look at the 'id' key.
my_list = [{'address': '9 Lee Road, Wirral, Merseyside',
'url': '/get/ODQ2MjhhNTg1Y2E1YzE2IDE3MDkxMzc2IDFkYTMyNmZkZWY3ZGMzNg==',
'id': 'ODQ2MjhhNTg1Y2E1YzE2IDE3MDkxMzc2IDFkYTMyNmZkZWY3ZGMzNg=='},
{'address': '9 Lee Road, Aylesbury, Buckinghamshire',
'url': '/get/MTEwZDgzMGUxMDBlMWQyIDIyMDI5NjA1IDFkYTMyNmZkZWY3ZGMzNg==',
'id': 'MTEwZDgzMGUxMDBlMWQyIDIyMDI5NjA1IDFkYTMyNmZkZWY3ZGMzNg=='},
{'address': '9 Lee Road, London',
'url': '/get/MjEyYTIxNDhjZjM5ZTQ4IDU3ODQ4NzUgMWRhMzI2ZmRlZjdkYzM2',
'id': 'MjEyYTIxNDhjZjM5ZTQ4IDU3ODQ4NzUgMWRhMzI2ZmRlZjdkYzM2'},
{'address': '9 Lee Road, Manchester',
'url': '/get/MmNkNDQzN2I2ODc3NmVhIDMwMTUwOTg3IDFkYTMyNmZkZWY3ZGMzNg==',
'id': 'MmNkNDQzN2I2ODc3NmVhIDMwMTUwOTg3IDFkYTMyNmZkZWY3ZGMzNg=='},
{'address': '9 Lee Road, Aldeburgh, Suffolk',
'url': '/get/N2YzZGJiMTQ5OGRlYjg3IDIyOTczNDM5IDFkYTMyNmZkZWY3ZGMzNg==',
'id': 'N2YzZGJiMTQ5OGRlYjg3IDIyOTczNDM5IDFkYTMyNmZkZWY3ZGMzNg=='},
{'address': '9 Lee Road, Chesterfield, Derbyshire',
'url': '/get/MWI5MGFhNDY5MjcwNDUwIDcxMjg0MjggMWRhMzI2ZmRlZjdkYzM2',
'id': 'MWI5MGFhNDY5MjcwNDUwIDcxMjg0MjggMWRhMzI2ZmRlZjdkYzM2'}]
my_search_id = 'MWI5MGFhNDY5MjcwNDUwIDcxMjg0MjggMWRhMzI2ZmRlZjdkYzM2'
count = 0
for ids in my_list:
    #print("ID: {}".format(ids["id"]))
    if ids["id"] == my_search_id:
        print("Found")
        print("List index: {}".format(count))
        print("Address: {}".format(ids["address"]))
        print("Url: {}".format(ids["url"]))
        print("Id: {}".format(ids["id"]))
    count += 1

Since it is a list[dict], you can iterate over the list as usual and access each item's id:
for item in my_list:
and then for each item you can access the id value with item['id'].

What you have is a list of dictionaries (the Python equivalent of a JSON array of objects).
You can iterate over the list with a for loop and compare item["id"] to the id you want to find, e.g.
myid = "the id"
for item in mylist:
    if item["id"] == myid:
        print(item)
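If only the first match is needed, the same loop can be written with next() and a generator expression. This is a small sketch rather than part of the answers above. The helper name find_by_id is made up, and the ids are shortened placeholders, not the real ones from the question.

```python
def find_by_id(items, wanted_id):
    """Return the first dictionary whose 'id' equals wanted_id, or None."""
    return next((item for item in items if item.get("id") == wanted_id), None)


# Placeholder data with shortened, made-up ids.
my_list = [
    {"address": "9 Lee Road, London", "id": "abc"},
    {"address": "9 Lee Road, Manchester", "id": "def"},
]

match = find_by_id(my_list, "def")
print(match["address"] if match else "not found")
```

Using item.get("id") instead of item["id"] means dictionaries without an 'id' key are simply skipped rather than raising KeyError.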

Unique values of Dictionary comprehension, return dictionary instead of string

this is my data:
data = [{'id': 1, 'name': 'The Musical Hop', 'city': 'San Francisco', 'state': 'CA'},
{'id': 2, 'name': 'The Dueling Pianos Bar', 'city': 'New York', 'state': 'NY'},
{'id': 3, 'name': 'Park Square Live Music & Coffee', 'city': 'San Francisco', 'state': 'CA'}]
I want to find the unique values of "city" (that's why I use a set) and return them like this:
cities = set([x.get("city") for x in data])
cities
{'New York', 'San Francisco'}
However, I also want to return the corresponding state, like this:
[{"city": "New York", "state": "NY"}, {"city": "San Francisco", "state": "CA"}]
Is there a way to do this?

You can use dict-comprehension for the task:
out = list({x['city']:{'city':x['city'], 'state':x['state']} for x in data}.values())
print(out)
Prints:
[{'city': 'San Francisco', 'state': 'CA'}, {'city': 'New York', 'state': 'NY'}]

You can use a dict comprehension to create a city -> state mapping, then iterate over it to create the list you want:
city_to_state = {x["city"]: x["state"] for x in data}
result = [{"city":k, "state":v} for k,v in city_to_state.items()]
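Both answers key on the city name alone, so two cities with the same name in different states would collapse into one entry. If that matters, one possible variation (a sketch, not something either answer proposes) is to collect unique (city, state) tuples in a set instead. The sorted() call is only there to make the output order predictable.

```python
data = [
    {"id": 1, "name": "The Musical Hop", "city": "San Francisco", "state": "CA"},
    {"id": 2, "name": "The Dueling Pianos Bar", "city": "New York", "state": "NY"},
    {"id": 3, "name": "Park Square Live Music & Coffee", "city": "San Francisco", "state": "CA"},
]

# Tuples are hashable, so the set removes duplicate (city, state) pairs.
pairs = sorted({(x["city"], x["state"]) for x in data})
result = [{"city": city, "state": state} for city, state in pairs]
print(result)
```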

why do i have a type error when i run this loop to print out key value pairs? each item k and v are str already

I'm getting an error when I try to run the code.
Traceback (most recent call last): File "", line 10, in
print(k+':'+v)
TypeError: must be str, not list
import json
with open("a_movie.json") as json_file:
    json_data = json.load(json_file)
# Print each key-value pair in json_data
for k, v in json_data.items():
    print(type(k))
    print(type(v))
    print(k+':'+v)
sample json:
{'Title': 'The Social Network', 'Year': '2010', 'Rated': 'PG-13', 'Released': '01 Oct 2010', 'Runtime': '120 min', 'Genre': 'Biography, Drama', 'Director': 'David Fincher', 'Writer': 'Aaron Sorkin (screenplay), Ben Mezrich (book)', 'Actors': 'Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons', 'Plot': 'As Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, he is sued by the twins who claimed he stole their idea, and by the co-founder who was later squeezed out of the business.', 'Language': 'English, French', 'Country': 'USA', 'Awards': 'Won 3 Oscars. Another 171 wins & 183 nominations.', 'Poster': 'https://m.media-amazon.com/images/M/MV5BOGUyZDUxZjEtMmIzMC00MzlmLTg4MGItZWJmMzBhZjE0Mjc1XkEyXkFqcGdeQXVyMTMxODk2OTU#._V1_SX300.jpg', 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.7/10'}, {'Source': 'Rotten Tomatoes', 'Value': '96%'}, {'Source': 'Metacritic', 'Value': '95/100'}], 'Metascore': '95', 'imdbRating': '7.7', 'imdbVotes': '590,040', 'imdbID': 'tt1285016', 'Type': 'movie', 'DVD': '11 Jan 2011', 'BoxOffice': '$96,400,000', 'Production': 'Columbia Pictures', 'Website': 'N/A', 'Response': 'True'}

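with open("a_movie.json") as json_file:
    json_data = json.load(json_file)
# Print each key-value pair in json_data
for k, v in json_data.items():
    print(type(k))
    print(type(v))
    print(k, ':', v)
k and v are not always str (here the value of 'Ratings' is a list), so pass them to print() as separate arguments, separated by commas, instead of concatenating them with +.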

print(k+':'+v)
The error comes from this line, because for at least one key v is a list. Here is an alternative that joins list values into a string first:
if isinstance(v, list):
    v = ",".join(str(z) for z in v)
print(k + ":" + str(v))
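Another option, shown here only as a sketch, is an f-string, which converts each value with str() regardless of its type. The helper name format_pairs is hypothetical, and the dictionary is a shortened version of the sample JSON.

```python
def format_pairs(mapping):
    """Return one 'key:value' string per item, whatever the value's type."""
    return [f"{key}:{value}" for key, value in mapping.items()]


# Shortened version of the sample data; 'Ratings' is the list-valued key.
json_data = {
    "Title": "The Social Network",
    "Year": "2010",
    "Ratings": [{"Source": "Metacritic", "Value": "95/100"}],
}

for line in format_pairs(json_data):
    print(line)
```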

Trying to generate a list from a JSON object (TypeError list indices must be integers or slices, not str)

I have retrieved a JSON object from an API. The JSON object looks like this:
{'copyright': 'Copyright (c) 2020 The New York Times Company. All Rights '
'Reserved.',
'response': {'docs': [{'_id': 'nyt://article/e3e5e5e5-1b32-5e2b-aea7-cf20c558dbd3',
'abstract': 'LEAD: RESEARCHERS at the Brookhaven '
'National Laboratory are employing a novel '
'model to study skin cancer in humans: '
'they are exposing tiny tropical fish to '
'ultraviolet radiation.',
'byline': {'organization': None,
'original': 'By Eric Schmitt',
'person': [{'firstname': 'Eric',
'lastname': 'Schmitt',
'middlename': None,
'organization': '',
'qualifier': None,
'rank': 1,
'role': 'reported',
'title': None}]},
'document_type': 'article',
'headline': {'content_kicker': None,
'kicker': None,
'main': 'Tiny Fish Help Solve Cancer '
'Riddle',
'name': None,
'print_headline': 'Tiny Fish Help Solve '
'Cancer Riddle',
'seo': None,
'sub': None},
'keywords': [{'major': 'N',
'name': 'organizations',
'rank': 1,
'value': 'Brookhaven National '
'Laboratory'},
{'major': 'N',
'name': 'subject',
'rank': 2,
'value': 'Ozone'},
{'major': 'N',
'name': 'subject',
'rank': 3,
'value': 'Radiation'},
{'major': 'N',
'name': 'subject',
'rank': 4,
'value': 'Cancer'},
{'major': 'N',
'name': 'subject',
'rank': 5,
'value': 'Research'},
{'major': 'N',
'name': 'subject',
'rank': 6,
'value': 'Fish and Other Marine Life'}],
'lead_paragraph': 'RESEARCHERS at the Brookhaven '
'National Laboratory are employing a '
'novel model to study skin cancer in '
'humans: they are exposing tiny '
'tropical fish to ultraviolet '
'radiation.',
'multimedia': [],
'news_desk': 'Science Desk',
'print_page': '3',
'print_section': 'C',
'pub_date': '1989-12-26T05:00:00+0000',
'section_name': 'Science',
'snippet': '',
'source': 'The New York Times',
'type_of_material': 'News',
'uri': 'nyt://article/e3e5e5e5-1b32-5e2b-aea7-cf20c558dbd3',
'web_url': 'https://www.nytimes.com/1989/12/26/science/tiny-fish-help-solve-cancer-riddle.html',
'word_count': 870},
{'_id': 'nyt://article/32a2431d-623a-525b-a21d-d401be865818',
'abstract': 'LEAD: Clouds, even the ones formed by '
...and continues like that, too long to show all of it here.
Now, when I want to list just one headline, I use:
pprint(articles['response']['docs'][0]['headline']['print_headline'])
And I get the output
'Tiny Fish Help Solve Cancer Riddle'
The problem is when I want to pick out all of the headlines from this JSON object, and make a list of them. I tried:
index = 0
for headline in articles:
    headlineslist = ['response']['docs'][index]['headline']['print_headline'].split("''")
    index = index + 1
headlineslist
But I get the error TypeError: list indices must be integers or slices, not str
In other words, it worked when I "listed" just one headline, at index [0], but not when I try to repeat the process over each index. How do I iterate through each index to get a list of outputs like the first one?

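The error occurs because ['response']['docs'] indexes the list literal ['response'] with a string; the articles variable is missing in front of it. To iterate over the document list you can just do the following:
for doc in articles['response']['docs']:
    print(doc['headline']['print_headline'])
This would print all headlines.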
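Since the question asks for a list rather than printed output, the same loop can be written as a list comprehension. The sketch below uses a heavily trimmed stand-in for the API response. Only the first headline comes from the question; the second is a made-up placeholder.

```python
# Trimmed stand-in for the API response; the second headline is a placeholder.
articles = {
    "response": {
        "docs": [
            {"headline": {"print_headline": "Tiny Fish Help Solve Cancer Riddle"}},
            {"headline": {"print_headline": "A Second Headline"}},
        ]
    }
}

# One entry per document, in the order the documents appear.
headlines = [doc["headline"]["print_headline"] for doc in articles["response"]["docs"]]
print(headlines)
```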
