I'm new to Python. Here's a nested dictionary with two books, each having 8 attributes.
book_collection ={17104: {'Title': 'A River', 'Author': 'Elisha Mitchell', 'Publisher': 'FPG Publishing', 'Pages': '345', 'Year': '2014', 'Copies': 2, 'Available': 2, 'ID': 17104}, 37115: {'Title': 'Aim High', 'Author': 'George Tayloe Winston', 'Publisher': 'Manning Hall Press', 'Pages': '663', 'Year': '2014', 'Copies': 5, 'Available': 5, 'ID': 37115}}
for id, book in book_collection.items():
    for book_attribute, attribute_value in book.items():
        print(book_attribute, ': ', attribute_value, sep='')
The output:
Title: A River
Author: Elisha Mitchell
Publisher: FPG Publishing
Pages: 345
Year: 2014
Copies: 2
Available: 2
ID: 17104
Title: Aim High
Author: George Tayloe Winston
Publisher: Manning Hall Press
Pages: 663
Year: 2014
Copies: 5
Available: 5
ID: 37115
How can I add a blank line between books and move the 'ID' attribute to the first row of each book? The output is supposed to look like this:
ID: 17104
Title: A River
Author: Elisha Mitchell
Publisher: FPG Publishing
Pages: 345
Year: 2014
Copies: 2
Available: 2

ID: 37115
Title: Aim High
Author: George Tayloe Winston
Publisher: Manning Hall Press
Pages: 663
Year: 2014
Copies: 5
Available: 5
If there are 20 books, how can I just print the first 10 and ask the user for permission to continue?
Use this:
for id, book in book_collection.items():
    for book_attribute, attribute_value in book.items():
        print(book_attribute, ': ', attribute_value, sep='')
    print()
Dictionaries have no index() method, so use enumerate() to count the books and ask once the first 10 have been printed, like this:
for index, (id, book) in enumerate(book_collection.items()):
    # index is 10 when ten books have already been printed
    if index == 10:
        answer = input("Press 0 to continue or anything else to exit: ")
        if answer != "0":
            break
    for book_attribute, attribute_value in book.items():
        print(book_attribute, ': ', attribute_value, sep='')
    print()
A dictionary's items method returns an iterable of tuples (immutable lists). Each tuple yielded represents a key-value pair, with the key at index 0 of the tuple and the value at index 1.
The for loop you're using - for book_attribute, attribute_value in book.items(): - is syntactic sugar for "take the two values in the tuple and assign them to these variables, then run the code in this block."
It might be easier to think of it this way:
>>> book_dict = {'Title': 'A River', 'Author': 'Elisha Mitchell', 'Publisher': 'FPG Publishing', 'Pages': '345', 'Year': '2014', 'Copies': 2, 'Available': 2, 'ID': 17104}
>>> book_dict_entries = list(book_dict.items())
>>> print(book_dict_entries)
[('Title', 'A River'), ('Author', 'Elisha Mitchell'), ('Publisher', 'FPG Publishing'), ('Pages', '345'), ('Year', '2014'), ('Copies', 2), ('Available', 2), ('ID', 17104)]
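To make the unpacking described above concrete, here is a small sketch that writes the same loop twice: once indexing into each tuple by hand, and once letting the for statement unpack it. The two-field `book` dictionary is a shortened stand-in for the real data, used only for illustration.

```python
book = {'Title': 'A River', 'ID': 17104}

# Written out by hand: take each (key, value) tuple and index into it.
manual = []
for pair in book.items():
    book_attribute = pair[0]
    attribute_value = pair[1]
    manual.append(f"{book_attribute}: {attribute_value}")

# The same loop with tuple unpacking in the for statement.
unpacked = []
for book_attribute, attribute_value in book.items():
    unpacked.append(f"{book_attribute}: {attribute_value}")

print(manual == unpacked)  # both loops build the same lines
```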
There are a few directions to go from here. One way: since it's just a list, you can search for the entry representing the ID field and swap it with whatever happens to be the first element in that list. Or, before turning it into a list, simply print ID from the dictionary, then filter out that field when enumerating the rest of the fields.
As to your second question, if you want to print an empty line at a certain point - simply call print with no arguments. Like when you've finished printing each dictionary.
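A minimal sketch of the second approach (print ID first, then skip it among the remaining fields), combined with the blank line between books. The `format_book` helper is a made-up name for this illustration, and the dictionaries are shortened versions of the ones in the question.

```python
book_collection = {
    17104: {'Title': 'A River', 'Copies': 2, 'ID': 17104},
    37115: {'Title': 'Aim High', 'Copies': 5, 'ID': 37115},
}

def format_book(book):
    """Return the lines for one book, with the 'ID' field first."""
    lines = [f"ID: {book['ID']}"]
    lines.extend(f"{key}: {value}" for key, value in book.items() if key != 'ID')
    return "\n".join(lines)

# Joining with a blank line puts an empty row between books.
output = "\n\n".join(format_book(book) for book in book_collection.values())
print(output)
```

Building the text first and printing it once also avoids a trailing blank line after the last book.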
I would define the dict with ID as the first key, since from Python 3.7 on (not before!) dicts are guaranteed to preserve insertion order.
Put ID as the first key, and add print() after each inner loop.
book_collection = {
    17104:
        {'ID': 17104, 'Title': 'A River', 'Author': 'Elisha Mitchell',
         'Publisher': 'FPG Publishing', 'Pages': '345', 'Year': '2014',
         'Copies': 2, 'Available': 2, },
    37115:
        {'ID': 37115, 'Title': 'Aim High', 'Author': 'George Tayloe Winston',
         'Publisher': 'Manning Hall Press', 'Pages': '663', 'Year': '2014',
         'Copies': 5, 'Available': 5}
}
for id, book in book_collection.items():
    for book_attribute, attribute_value in book.items():
        print(book_attribute, ': ', attribute_value, sep='')
    print()
[[{'text': '\n ', 'category': 'cooking', 'title': {'text': 'Everyday
Italian', 'lang': 'en'}, 'author': {'text': 'Giada De Laurentiis'}, 'year':
{'text': '2005'}, 'price': {'text': '30.00'}},
{'text': '\n ', 'category': 'children', 'title': {'text': 'Harry Potter',
'lang': 'en'}, 'author': {'text': 'J K. Rowling'}, 'year': {'text':
'2005'}, 'price': {'text': '29.99'}}, {'text': '\n ', 'category':
'web', 'title': {'text': 'XQuery Kick Start', 'lang': 'en'}, 'author':
[{'text': 'James McGovern'}, {'text': 'Per Bothner'}, {'text': 'Kurt
Cagle'}, {'text': 'James Linn'}, {'text': 'Vaidyanathan Nagarajan'}],
'year': {'text': '2003'}, 'price': {'text': '49.99'}}, {'text': '\n ',
'category': 'web', 'cover': 'paperback', 'title': {'text': 'Learning XML',
'lang': 'en'}, 'author': {'text': 'Erik T. Ray'}, 'year': {'text': '2003'},
'price': {'text': '39.95'}}]]
output format:
category : cooking,
title : ['Everyday Italian', 'lang': 'en'],
author : Giada De Laurentiis,
year : '2005',
price : '30.00'
category : children,
title : ['Harry Potter', 'lang': 'en'],
author : 'J K. Rowling',
year : '2005',
price : '29.99'
category : web,
title : [ 'XQuery Kick Start''lang': 'en'],
author :[ 'James McGovern' , 'Per Bothner','Kurt Cagle','James Linn', 'Vaidyanathan Nagarajan'],
year : '2003',
price : '49.99'
category : web,
cover : paperback,
title : [ 'Learning XML','lang': 'en'],
author : 'Erik T. Ray',
year : '2003',
price : '39.95'
A simple loop like the following should get the output you require.
for entry in data[0]:
    for k, v in entry.items():
        print(k, ':', v)
def printBook(d):
    del d['text']
    for i in d:
        if type(d[i]) == dict:
            if len(d[i])==1:
                d[i] = list(d[i].values())[0]
            else:
                d[i] = [('' if j=='text' else (j+':')) + d[i][j] for j in d[i]]
    s = '\n'
    for i,j in d.items():
        s += f' {i} : {j} ,\n'
    print(s)
Try this; it prints each individual dictionary in the format you described.
Take a look at the pprint module which provides a nice way of printing data structures without the need for writing your own formatter.
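As a rough illustration of the pprint suggestion, the sketch below pretty-prints one entry from the question's data. The `width=40` setting is an arbitrary choice to force wrapping; note that pprint sorts dictionary keys by default, so the key order may differ from the input.

```python
from pprint import pformat, pprint

entry = {'category': 'cooking',
         'title': {'text': 'Everyday Italian', 'lang': 'en'},
         'author': {'text': 'Giada De Laurentiis'},
         'year': {'text': '2005'},
         'price': {'text': '30.00'}}

# A narrow width forces the nested structure to wrap onto several lines.
pprint(entry, width=40)

# pformat returns the same text as a string instead of printing it.
text = pformat(entry, width=40)
```

This does not reproduce the exact output format asked for, but it is often enough when the goal is just a readable dump.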
Thanks for the coding exercise. Man, that output format was specific! :D Tricky, as the strings are not quoted when standalone, but quoted when they come from a text attribute. Also tricky that stuff must be wrapped in [] if it is not just text. It is also a little underspecified, because hey, what if there is no text at all, yet there are other keys?
Output format disclaimer:
I think there is a missing , after 'XQuery Kick Start'
I think Giada De Laurentiis should have been quoted as it is coming from 'text'
import copy
def transform_value(stuff):
    if type(stuff) is str:
        return stuff
    if type(stuff) is dict:
        elements = []
        text = stuff.pop("text", "")
        if text:
            elements.append(f"'{text}'")  # we quote only if there was really a text entry
            if not stuff:  # dict just got empty
                return elements[0]  # no more attributes, so no [] needed around text
        elements.extend(f"'{key}': '{transform_value(value)}'" for key, value in stuff.items())
    if type(stuff) is list:
        elements = [transform_value(e) for e in stuff]
    # this will obviously raise an exception if stuff is not one of str, dict or list
    return f"[{', '.join(elements)}]"
def transform_pub(d: dict):
    d = copy.deepcopy(d)  # we are gonna delete keys, so we don't mess with the outer data
    tail = d.pop("text", "")
    result = ",\n".join(f'{key} : {transform_value(value)}' for key, value in d.items())
    return result + tail
if __name__ == "__main__":
    # data is the nested list from the question
    for sublist in data:
        for pub in sublist:
            print(transform_pub(pub))
I first wanted to use the same mechanism for the publications themselves via some recursion. But then the code became too complicated, as the text field is appended for publications, while it comes first for attributes.
Once I let go of the fully structured solution, I started out with a test for the publication printer:
import pytest
from printing import transform_value
@pytest.mark.parametrize('input,output', [
    ("cooking", "cooking"),
    ({"text": "J K. Rowling"}, "'J K. Rowling'"),
    ({"lang": "en", "text": "Everyday Italian"},
     "['Everyday Italian', 'lang': 'en']"),
    ([{"text": "James McGovern"},
      {"text": "Per Bothner"},
      {"text": "Kurt Cagle"},
      {"text": "James Linn"},
      {"text": "Vaidyanathan Nagarajan"}],
     "['James McGovern', 'Per Bothner', 'Kurt Cagle', 'James Linn', 'Vaidyanathan Nagarajan']"
     ),
    ([{"text": "Joe"}], "['Joe']"),
    ({"a": "1"}, "['a': '1']"),
])
def test_transform_value(input, output):
    assert transform_value(input) == output
I'm getting an error when I try to run the code.
Traceback (most recent call last):
  File "", line 10, in
    print(k+':'+v)
TypeError: must be str, not list
with open("a_movie.json") as json_file:
    json_data=json.load(json_file)
# Print each key-value pair in json_data
for k, v in json_data.items():
    print(type(k))
    print(type(v))
    print(k+':'+v)
sample json:
{'Title': 'The Social Network', 'Year': '2010', 'Rated': 'PG-13', 'Released': '01 Oct 2010', 'Runtime': '120 min', 'Genre': 'Biography, Drama', 'Director': 'David Fincher', 'Writer': 'Aaron Sorkin (screenplay), Ben Mezrich (book)', 'Actors': 'Jesse Eisenberg, Rooney Mara, Bryan Barter, Dustin Fitzsimons', 'Plot': 'As Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, he is sued by the twins who claimed he stole their idea, and by the co-founder who was later squeezed out of the business.', 'Language': 'English, French', 'Country': 'USA', 'Awards': 'Won 3 Oscars. Another 171 wins & 183 nominations.', 'Poster': 'https://m.media-amazon.com/images/M/MV5BOGUyZDUxZjEtMmIzMC00MzlmLTg4MGItZWJmMzBhZjE0Mjc1XkEyXkFqcGdeQXVyMTMxODk2OTU#._V1_SX300.jpg', 'Ratings': [{'Source': 'Internet Movie Database', 'Value': '7.7/10'}, {'Source': 'Rotten Tomatoes', 'Value': '96%'}, {'Source': 'Metacritic', 'Value': '95/100'}], 'Metascore': '95', 'imdbRating': '7.7', 'imdbVotes': '590,040', 'imdbID': 'tt1285016', 'Type': 'movie', 'DVD': '11 Jan 2011', 'BoxOffice': '$96,400,000', 'Production': 'Columbia Pictures', 'Website': 'N/A', 'Response': 'True'}
import json
with open("a_movie.json") as json_file:
    json_data = json.load(json_file)
# Print each key-value pair in json_data
for k, v in json_data.items():
    print(type(k))
    print(type(v))
    print(k, ':', v)
k and v are not always str, so pass them to print() as separate arguments with , instead of concatenating them with +.
print(k+':'+v)
The error comes from this line, because one of your v values is a list. Here is an alternative that converts the value to a string first:
x = ",".join(str(z) for z in v) if isinstance(v, list) else str(v)
print(k + ":" + x)
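Putting the two answers together, a small self-contained sketch might look like the following. The `format_value` helper is a hypothetical name, and `json_data` is a shortened stand-in for the movie data, so the file does not need to exist to try it.

```python
def format_value(value):
    """Turn a JSON value into display text; lists are joined with commas."""
    if isinstance(value, list):
        return ", ".join(str(item) for item in value)
    return str(value)

# A shortened stand-in for the movie data from the question.
json_data = {
    'Title': 'The Social Network',
    'Ratings': [{'Source': 'Metacritic', 'Value': '95/100'}],
    'Metascore': '95',
}

# String concatenation is safe here because format_value always returns str.
lines = [k + ':' + format_value(v) for k, v in json_data.items()]
print("\n".join(lines))
```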
I have this .txt file that needs to be transformed into a dictionary:
(the order of the lines may vary)
name : john doe
age : 23
gender : MALE
address : kendall 6, Miami
career: mechanical engineer
times going : 2
number of assignments : 4
semester : 4
average : 9.2
interests : gaming, robotics, drama movies
availability:
  friday : 6:30 - 10:30
  sunday : 12:30 - 13:30
  monday : 16:30 - 18:30
The output should look like this:
{'name': 'john doe',
'age': '23',
'gender': 'MALE',
'address': 'kendall 6, Miami',
'semester': '4',
'career': 'mechanical engineer',
'average': '9.2',
'times going': '2',
'number of assignments': '4',
'interests': 'gaming, robotics, drama movies',
'availability':
{'friday': (630,1030),
'sunday': (1230,1330),
'monday': (1630,1830)
}
}
So far, I have successfully built the dictionary up to, but not including, the "availability" section:
dicc={}
listRestrictions=["availability","monday","tuesday","wednesday","thursday","friday","saturday","sunday"]
for line in file:
    line = line.strip("\n").replace(" : ", ":").strip(" ")
    key = line[: line.index(":")]
    if key not in listRestrictions:
        value = line[line.index(":") + 1 :]
        dicc[key] = value
print(dicc)
And prints:
{'name': 'john doe', 'age': '23', 'gender': 'MALE', 'address': 'kendall 6, Miami', 'career': 'mechanical engineer', 'times going': '2', 'number of assignments': '4', 'semester': '4', 'average': '9.2', 'interests': 'gaming, robotics, drama movies'}
(keeping in mind that it could be on any line of the .txt file and that the days will always be under "availability")...
How would I use "availability" as the key and store the days in a sub-dictionary as its value, as shown above?
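dicc = {}
last_key = ''
for _line in file:
    if ":" not in _line:
        continue  # skip blank lines
    indented = _line.startswith(" ")  # check before stripping the line
    line = _line.strip("\n").replace(" : ", ":").strip(" ")
    key = line[: line.index(":")]
    value = line[line.index(":") + 1 :].strip()
    if indented:
        # indented lines belong to the most recent key that had no value
        dicc[last_key][key] = value
    elif value == "":
        # a key with no value (e.g. "availability:") starts a sub-dictionary
        dicc[key] = {}
        last_key = key
    else:
        dicc[key] = value
print(dicc)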
dicc = {}
for line in file:
    isAppended = line.startswith(" ")
    line = line.strip("\n").replace(" : ", ":").strip(" ")
    value = line[line.index(":") + 1 :]
    tempKey = line[: line.index(":")]
    if len(value)==0:
        currKey = tempKey
        tempDict = {}
    elif isAppended:
        tempDict[tempKey] = value
        dicc.update({currKey:tempDict})
    else:
        dicc[tempKey] = value
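For anyone who wants to try the idea without a file on disk, here is a self-contained sketch of the same approach. It assumes that sub-entries are indented with spaces and that a key with nothing after the colon opens a sub-dictionary; `parse_profile` and `SAMPLE` are names invented for this example.

```python
SAMPLE = """\
name : john doe
age : 23
career: mechanical engineer
availability:
  friday : 6:30 - 10:30
  sunday : 12:30 - 13:30
"""

def parse_profile(text):
    """Parse 'key : value' lines; indented lines go under the last bare key."""
    result = {}
    section = None  # the most recent key that had no value of its own
    for raw in text.splitlines():
        if ':' not in raw:
            continue  # skip blank or malformed lines
        key, _, value = raw.partition(':')  # split on the first colon only
        key, value = key.strip(), value.strip()
        if raw.startswith(' ') and section is not None:
            result[section][key] = value
        elif value == '':
            section = key
            result[key] = {}
        else:
            result[key] = value
    return result

parsed = parse_profile(SAMPLE)
print(parsed)
```

Using `str.partition` splits on the first colon only, so the colons inside the times stay in the value.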
Not sure why you're trying such low-level parsing. Isn't that YAML? That assumption gets me quite close:
import yaml
from pprint import pprint
with open('data.txt') as f:
    data = yaml.safe_load(f)  # safe_load needs no explicit Loader argument
Then data is this nested Python dictionary:
{'name': 'john doe',
'age': 23,
'gender': 'MALE',
'address': 'kendall 6, Miami',
'career': 'mechanical engineer',
'times going': 2,
'number of assignments': 4,
'semester': 4,
'average': 9.2,
'interests': 'gaming, robotics, drama movies',
'availability': {'friday': '6:30 - 10:30',
'sunday': '12:30 - 13:30',
'monday': '16:30 - 18:30'}}
The remaining differences are easy to fix, now that it's a Python data structure.
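One of those remaining differences is turning the time ranges into integer tuples. A possible sketch, assuming every range has the form 'H:MM - H:MM' as in the sample (`to_hhmm_pair` is a made-up helper name):

```python
def to_hhmm_pair(span):
    """Convert '6:30 - 10:30' into the tuple (630, 1030)."""
    start, end = span.split(' - ')
    return tuple(int(part.replace(':', '')) for part in (start, end))

availability = {'friday': '6:30 - 10:30',
                'sunday': '12:30 - 13:30',
                'monday': '16:30 - 18:30'}

converted = {day: to_hhmm_pair(span) for day, span in availability.items()}
print(converted)
```

If the numeric fields such as age are needed as strings, as in the desired output, they can be passed through str() in the same way.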
I don't know how you pull your days and times, but assuming these are, for example, in the below format to begin with:
days = ['monday', 'tuesday', 'wednesday']
times = [(630, 1030), (830, 1400), (930, 1330)]
Then one way to build the availability dictionary is as follows:
dicc['availability'] = dict(zip(days, times))
print (dicc)
I have a data frame that has different data types (list, dictionary, list of dictionary, strings, etc).
df = pd.DataFrame([{'category': [{'id': 1, 'name': 'House Targaryen'}],
'connection': ['Rhaena Targaryen', 'Aegon Targaryen'],
'description': 'Jon Snow, born Aegon Targaryen, is the son of Lyanna Stark '
'and Rhaegar Targaryen, the late Prince of Dragonstone',
'name': 'Jon Snow'},
{'category': [{'id': 2, 'name': 'House Stark'},
{'id': 3, 'name': 'Nights Watch'}],
'connection': ['Robb Stark', 'Sansa Stark', 'Arya Stark', 'Bran Stark'],
'description': 'After successfully capturing a wight and presenting it to '
'the Lannisters as proof that the Army of the Dead are real, '
'Jon pledges himself and his army to Daenerys Targaryen.',
'name': 'Jon Snow'}])
I want to merge these two rows by Jon Snow and combine all other fields together so it looks like
name category description connection
Jon Snow ['House Targaryen','House Stark','Nights Watch'] Jon Snow, born ...... his army to Daenerys Targaryen. ['Rhaena Targaryen',...,'Bran Stark']
It might be a little tricky with the list of dictionaries. Since this is a toy example with only two rows, it's easy to explode it and combine the two category rows, but I don't think that's practical for my actual data set.
I also thought about using df.groupby('name').aggregate({'category': func1, 'description': func2, 'connection': func3}), but I'm not sure whether there's a built-in function for what I need.
Thank yall for helping!
Looking at your data, it might be possible to first do a simple groupby and sum. Then deal with the categories using list comprehension:
import pandas as pd
df = pd.DataFrame([{'category': [{'id': 1, 'name':'House Targaryen'}],
'name': 'Jon Snow',
'description':'Jon Snow, born Aegon Targaryen, is the son of Lyanna Stark and Rhaegar Targaryen, the late Prince of Dragonstone',
'connection':['Rhaena Targaryen', 'Aegon Targaryen']},
{'category': [{'id': 2, 'name': 'House Stark'},{'id': 3, 'name': 'Nights Watch'}],
'name': 'Jon Snow',
'description': 'After successfully capturing a wight and presenting it to the Lannisters as proof that the Army of the Dead are real, '
'Jon pledges himself and his army to Daenerys Targaryen.',
'connection':['Robb Stark', 'Sansa Stark', 'Arya Stark', 'Bran Stark']},
{"category":[{"id":4,"name":"Some house"}],
"name": "Some name",
"description": "some desc",
"connection":["connection 1"]}])
result = df.groupby("name").sum()
result["category"] = [[item.get("name") for item in i] for i in result["category"]]
result.reset_index(inplace=True)
print (result)
#
name category description connection
0 Jon Snow [House Targaryen, House Stark, Nights Watch] Jon Snow, born Aegon Targaryen, is the son of ... [Rhaena Targaryen, Aegon Targaryen, Robb Stark...
1 Some name [Some house] some desc [connection 1]
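To show the merging logic on its own, here is a plain-Python sketch that does not use pandas at all. It is not the groupby approach above, just the same idea spelled out: collect the category names, connections and descriptions per name. The descriptions are shortened for the example, and joining them with a space is an assumption about the desired result.

```python
rows = [
    {'name': 'Jon Snow',
     'category': [{'id': 1, 'name': 'House Targaryen'}],
     'connection': ['Rhaena Targaryen', 'Aegon Targaryen'],
     'description': 'Jon Snow, born Aegon Targaryen.'},
    {'name': 'Jon Snow',
     'category': [{'id': 2, 'name': 'House Stark'},
                  {'id': 3, 'name': 'Nights Watch'}],
     'connection': ['Robb Stark', 'Sansa Stark'],
     'description': 'Jon pledges himself to Daenerys Targaryen.'},
]

merged = {}
for row in rows:
    # setdefault creates an empty record the first time a name is seen.
    record = merged.setdefault(
        row['name'], {'category': [], 'connection': [], 'description': []})
    record['category'].extend(item['name'] for item in row['category'])
    record['connection'].extend(row['connection'])
    record['description'].append(row['description'])

# Join the collected descriptions with a space between them.
for record in merged.values():
    record['description'] = ' '.join(record['description'])

print(merged)
```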
This question has been asked many times, but only once with this special case; I could partially find an answer here, but it flattens everything down to individual objects.
I have this dictionary:
{'address': {'address_line_1': 'Floor Dekk House',
'address_line_2': 'Zippora Street Providence Industrial Estate',
'country': 'Seychelles',
'locality': 'Mahe',
'premises': '1st'},
'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
'appointment_count': 1,
'description': 'Total number of appointments 1',
'description_identifiers': ['appointment-count'],
'kind': 'searchresults#officer',
'links': {'self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments'},
'matches': {'snippet': [], 'title': [1, 8, 10, 11]},
'snippet': '',
'title': 'ASTROCOM AG '}
As you can see, "description_identifiers", "matches.snippet" and "matches.title" have a list as their value. I'd like to edit my code below so that the JSON is flattened into {key: value, key: value, key: value} pairs, but if a value is a list of atomic objects (not a list of lists or a list of dictionaries), it is kept as a list.
The objective is to be able to then upload this JSON to PostgreSQL.
Here's some code I found online:
from itertools import chain, starmap
def flatten_json(dictionary):
    """Flatten a nested json file"""
    def unpack(parent_key, parent_value):
        """Unpack one level of nesting in json file"""
        # Unpack one level only!!!
        if isinstance(parent_value, dict):
            for key, value in parent_value.items():
                temp1 = parent_key + '_' + key
                yield temp1, value
        elif isinstance(parent_value, list):
            i = 0
            for value in parent_value:
                temp2 = parent_key + '_' +str(i)
                i += 1
                yield temp2, value
        else:
            yield parent_key, parent_value
    # Keep iterating until the termination condition is satisfied
    while True:
        # Keep unpacking the json file until all values are atomic elements (not dictionary or list)
        dictionary = dict(chain.from_iterable(starmap(unpack, dictionary.items())))
        # Terminate condition: not any value in the json file is dictionary or list
        if not any(isinstance(value, dict) for value in dictionary.values()) and \
           not any(isinstance(value, list) for value in dictionary.values()):
            break
    return dictionary
Desired output: for the dict above, the result should not be this (which is what I get now):
{'address_address_line_1': 'Floor Dekk House',
'address_address_line_2': 'Zippora Street Providence Industrial Estate',
'address_country': 'Seychelles',
'address_locality': 'Mahe',
'address_premises': '1st',
'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
'appointment_count': 1,
'description': 'Total number of appointments 1',
'description_identifiers_0': 'appointment-count',
'kind': 'searchresults#officer',
'links_self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments',
'matches_title_0': 1,
'matches_title_1': 8,
'matches_title_2': 10,
'matches_title_3': 11,
'snippet': '',
'title': 'ASTROCOM AG '}
But rather
{'address_address_line_1': 'Floor Dekk House',
'address_address_line_2': 'Zippora Street Providence Industrial Estate',
'address_country': 'Seychelles',
'address_locality': 'Mahe',
'address_premises': '1st',
'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
'appointment_count': 1,
'description': 'Total number of appointments 1',
'description_identifiers_0': 'appointment-count',
'kind': 'searchresults#officer',
'links_self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments',
'matches_title': [1, 8, 10, 11],
'snippet': '',
'title': 'ASTROCOM AG '}
You are almost there; you just need one more check in the condition:
def flatten(dict_, prefix):
    for k, v in dict_.items():
        if isinstance(v, list) and len(v)==1:
            if isinstance(v[0], dict):
                for key, value in flatten(v[0], prefix+k+"_"):
                    yield key, value
            else:
                yield prefix+k+"_0", v[0]
        elif isinstance(v, dict):
            for key, value in flatten(v, prefix+k+"_"):
                yield key, value
        else:
            yield prefix+k, v
Usage:
dict_ = {'address': {'address_line_1': 'Floor Dekk House',
'address_line_2': 'Zippora Street Providence Industrial Estate',
'country': 'Seychelles',
'locality': 'Mahe',
'premises': '1st'},
'address_snippet': '1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles',
'appointment_count': 1,
'description': 'Total number of appointments 1',
'description_identifiers': ['appointment-count'],
'kind': 'searchresults#officer',
'links': {'self': '/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments'},
'matches': {'snippet': [], 'title': [1, 8, 10, 11]},
'snippet': '',
'title': 'ASTROCOM AG '}
import json
print(json.dumps(dict(list(flatten(dict_, ""))), indent=4))
Output:
{
    "address_address_line_1": "Floor Dekk House",
    "address_address_line_2": "Zippora Street Providence Industrial Estate",
    "address_country": "Seychelles",
    "address_locality": "Mahe",
    "address_premises": "1st",
    "address_snippet": "1st, Floor Dekk House, Zippora Street Providence Industrial Estate, Mahe, Seychelles",
    "appointment_count": 1,
    "description": "Total number of appointments 1",
    "description_identifiers_0": "appointment-count",
    "kind": "searchresults#officer",
    "links_self": "/officers/z7s5QUnhlYpAT8GvqvJ5snKmtHE/appointments",
    "matches_snippet": [],
    "matches_title": [
        1,
        8,
        10,
        11
    ],
    "snippet": "",
    "title": "ASTROCOM AG "
}
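The answer above only unwraps single-element lists. A possible variant, sketched here under different assumptions, follows the wording of the question more literally: any list of atomic values is kept as a list whatever its length, while a list that contains dictionaries is expanded element by element with an index in the key. The function name `flatten_keep_lists` and the sample data are invented for this illustration; nested lists inside a list are left untouched.

```python
def flatten_keep_lists(obj, prefix=""):
    """Yield (flat_key, value) pairs; lists of atomic values stay lists."""
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            yield from flatten_keep_lists(value, name + "_")
        elif isinstance(value, list) and any(
                isinstance(item, (dict, list)) for item in value):
            # Lists holding containers are expanded element by element.
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    yield from flatten_keep_lists(item, f"{name}_{i}_")
                else:
                    yield f"{name}_{i}", item
        else:
            yield name, value

sample = {"a": {"b": 1, "c": [1, 2]},
          "d": [{"e": 5}, {"e": 6}],
          "f": ["x"],
          "g": ""}
flat = dict(flatten_keep_lists(sample))
print(flat)
```

Note that with this variant a one-element list such as description_identifiers stays a list instead of becoming description_identifiers_0, so pick whichever behaviour matches the target table.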