Optimizing code to print sorted items in nested dictionary - Python

I am new to Python, so I wanted to know if the code I wrote for printing items inside a nested dictionary in sorted alphabetical order is optimal, especially for checking whether a key exists. Let me know if there is a better solution.
# Code
import operator
locations = {'North America': {'USA': ['Mountain View']}}
locations['Asia'] = {'India':['Bangalore']}
locations['North America']['USA'].append('Atlanta')
locations['Africa'] = {'Egypt':['Cairo']}
locations['Asia']['China'] = ['Shanghai']
# TODO: Print a list of all cities in the USA in alphabetic order.
if 'North America' in locations:
    for key,value in locations['North America'].items():
        if 'USA' in key:
            for item in sorted(value):
                print(f"{item}")
# TODO: Print all cities in Asia, in alphabetic order, next to the name of the country
if 'Asia' in locations:
    for key,value in sorted(locations['Asia'].items(),key=operator.itemgetter(1)):
        print(f"{value[0]} {key}")

Make these two lines your code:
print('\n'.join(sorted([x for i in locations.get('North America', {}).values() for x in i])))
print('\n'.join(sorted([x + ' ' + k for k,v in locations.get('Asia', {}).items() for x in v])))
Which outputs:
Atlanta
Mountain View
Bangalore India
Shanghai China

Dictionaries in Python are not sorted by key (since Python 3.7 they keep insertion order, but that is not alphabetical order). Given that, I will try to help with your actual problem of checking for a key in a dictionary.
locations = {'North America': {'USA': ['Mountain View']}}
locations['Asia'] = {'India':['Bangalore']}
locations['North America']['USA'].append('Atlanta')
locations['Africa'] = {'Egypt':['Cairo']}
locations['Asia']['China'] = ['Shanghai']
# First we clean up all the loops.
# The loops were only checking whether the key is in the dictionary.
if 'North America' in locations and 'USA' in locations['North America']:
    for item in sorted(locations['North America']['USA']):
        print(f"{item}")
if 'Asia' in locations:
    # Dictionaries are not sorted, so we make a list of the countries to order
    countries = []
    for k in locations['Asia'].keys():
        countries.append(k)
    # Using a similar loop to the one to print cities
    for country in sorted(countries):
        # Adding a dimension for cities.
        for city in sorted(locations['Asia'][country]):
            print(f"{country}:{city}")
The Asia branch loops through the countries in alphabetical order and prints each country with its cities, also in alphabetical order.

Dictionaries are used because they give direct lookup of any specific key. To test for existence, you don't need to search. The downside is that they are not sorted.
You iterate through all countries in North America when you already know you want the USA, so ... don't do that.
print(sorted(locations['North America']['USA']))
This is better because it is an O(1) lookup on the second layer, whereas your loop is O(n), where n is the number of nations in that particular continent. Admittedly that isn't much, which is why they say don't optimize if you don't need to. But maybe you have a lot more data and the geography sample data was just filler.
To test for existence of a key, use "in" or write a try-except for KeyError. Python is one of the few languages where it's often better to just handle the exception.
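As a rough sketch of the two styles just mentioned (plus dict.get, which the other answer uses), assuming the same sample data as in the question. The 'Mars' and 'Olympus' keys are made up purely to trigger the failure case:

```python
locations = {
    'North America': {'USA': ['Mountain View', 'Atlanta']},
    'Asia': {'India': ['Bangalore'], 'China': ['Shanghai']},
}

# Style 1: test membership with "in" before indexing.
if 'North America' in locations and 'USA' in locations['North America']:
    usa_cities = sorted(locations['North America']['USA'])
else:
    usa_cities = []

# Style 2: just index, and handle the KeyError if either key is missing.
# 'Mars' / 'Olympus' are hypothetical keys that do not exist in the data.
try:
    mars_cities = sorted(locations['Mars']['Olympus'])
except KeyError:
    mars_cities = []

# Style 3: dict.get with a default, so a missing key yields an empty dict.
asia_countries = sorted(locations.get('Asia', {}))

print(usa_cities)
print(mars_cities)
print(asia_countries)
```

Which style reads best depends on how often the key is expected to be missing; the try-except form avoids looking the key up twice.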
To print all the cities in Asia, you will have to combine all the lists in Asia and sort the result (see the question "Combining two sorted lists in Python").
You can do better by maintaining the city lists in sorted order all the time, using the bisect module. Inserting or removing in a sorted list is less work than sorting it each time, assuming you look at the list more often than you add and remove cities.
If you maintain sorted lists, you can efficiently get the sorted merge with heapq.merge (https://docs.python.org/3.0/library/heapq.html#heapq.merge), although sadly you lose the nation name doing that.
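A minimal sketch of that idea, assuming the Asia sub-dictionary from the question. The extra cities ('Ahmedabad', 'Beijing') are invented for illustration, and merging (city, country) pairs is only one possible way to keep the nation name, not something the linked documentation prescribes:

```python
import bisect
import heapq

asia = {'India': ['Bangalore'], 'China': ['Shanghai']}

# bisect.insort inserts into an already sorted list and keeps it sorted.
bisect.insort(asia['India'], 'Ahmedabad')   # hypothetical extra city
bisect.insort(asia['China'], 'Beijing')     # hypothetical extra city

# heapq.merge lazily merges several sorted inputs into one sorted stream.
all_cities = list(heapq.merge(*asia.values()))

# To keep the country, merge sorted (city, country) pairs instead.
# Each per-country list of pairs is sorted because its cities are sorted.
pairs = list(heapq.merge(*(
    [(city, country) for city in cities]
    for country, cities in asia.items()
)))

for city, country in pairs:
    print(f"{city} {country}")
```

Note that heapq.merge only gives a correct result if every input is already sorted, which is what insort maintains here.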

Related

Python - List of lists - S&P500

I am new to python and am looking to analyze the S&P500 by sector. I have assigned symbols to all 11 sectors in the S&P with the first two looking like:
Financials = ['AFL', 'AIG', .... 'ZION']
Energy = ['APA', 'BKR', ... 'SLB']
I then create a new list (of lists) which might look like:
sectors_to_analyze = [Financials, Energy] or [Materials, ConsumerStaples]
My analysis is working perfectly, but I want to retrieve the names "Financials" and "Energy" to attach to the data produced, and I cannot figure out how to do it other than making the name part of the list (Financials = ['Financials','AFL', 'AIG', .... 'ZION']).
Can someone please point me in the right direction? Thank you.
Perhaps you could use a dictionary
sectors = {
    'Financials':['AFL', ...],
    # rest of your lists
}
Then you can iterate over the whole dict and access both names and data associated with those names
for key, value in sectors.items():
    print(f'Sector name: {key}, List: {value}')
I think you want to use a dictionary instead of a "list of lists" (also called a two dimensional list). You could then loop over the dictionary almost the same way. Here's some example code:
Financials = ['AFL', 'AIG', 'ZION']
Energy = ['APA', 'BKR', 'SLB']
sectors = {"Financials": Financials, "Energy": Energy}
# in this loop, sector is the sector's name, and symbols is the sector's
# list
for sector in sectors:
    symbols = sectors[sector]
    # ...
    # do some analysis
    # ...
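If you only want to analyze some sectors at a time, as in the question's sectors_to_analyze, one option is to keep a list of sector names and look each one up in the dictionary. This is only a sketch: the shortened symbol lists come from the example above, and the "analysis" here (counting symbols) is a placeholder for the real one:

```python
sectors = {
    'Financials': ['AFL', 'AIG', 'ZION'],
    'Energy': ['APA', 'BKR', 'SLB'],
}

# A list of names instead of a list of lists, so the name is never lost.
sectors_to_analyze = ['Financials', 'Energy']

results = {}
for name in sectors_to_analyze:
    symbols = sectors[name]
    # Placeholder analysis: count the symbols in the sector.
    results[name] = len(symbols)

for name, value in results.items():
    print(f'{name}: {value}')
```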

Python list comprehension: list index out of range

My list comprehension returns the error "List index out of range" if the value does not exist in the list.
Objective:
Check a three-letter code for a country given by a variable (country) and transform the code into a two-letter code by looking up a list of tuples (COUNTRIES)
Constant:
# Two letter code, three letter code, full name
COUNTRIES = [
    ('US', 'USA', 'United States'),
    ('DE', 'DEU', 'Germany'),
    ....
]
Code:
country = 'EUR'
# Check if code in country has 3 characters (I have multiple checks for two letter codes too) and is not None
if country is not None and len(country) == 3:
    country = [code2 for code2, code3, name in COUNTRIES if code3 == country][0]
If I only include a list with the three-letter codes USA and DEU, the code works fine. If I put the fictitious code "EUR", which is not a valid country code, in the variable "country", then I get the "List index out of range" error.
How can I return None instead of breaking the program? The variable country will be used later on again.
I don't think list comprehensions are a good choice here. They are good when you want to turn one list into another list, which is not really what you want here. A better approach would be a regular for loop inside a function, with a return as soon as a match is found.
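A minimal sketch of such a loop, assuming the COUNTRIES list from the question (shortened to the two entries shown there). The function name to_two_letter is made up for this example:

```python
# Two letter code, three letter code, full name
COUNTRIES = [
    ('US', 'USA', 'United States'),
    ('DE', 'DEU', 'Germany'),
]

def to_two_letter(code):
    """Return the two-letter code for a three-letter code, or None."""
    # Check for None first, because len(None) would raise a TypeError.
    if code is None or len(code) != 3:
        return None
    for code2, code3, _name in COUNTRIES:
        if code3 == code:
            return code2  # stop at the first match
    return None  # no match: no IndexError, just None

print(to_two_letter('DEU'))
print(to_two_letter('EUR'))
```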
However, my personal approach would be to transform the list lookups into dict lookups instead:
COUNTRIES_LUT = {}
for code2, code3, country in COUNTRIES:
    COUNTRIES_LUT[code2] = country
    COUNTRIES_LUT[code3] = country
At the end of that, you can just use COUNTRIES_LUT[your_str] to look up the country name, or COUNTRIES_LUT.get(your_str) to get None instead of a KeyError for an unknown code.
If you generate this lookup table at the start, this also has the bonus of being faster, since you don't need to loop through every element of the list every time.
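For the conversion the question actually asks about (three-letter code to two-letter code, with None for unknown codes), the same idea could look roughly like this. The name CODE3_TO_CODE2 is an assumption for this sketch, and COUNTRIES is shortened to the two entries shown in the question:

```python
# Two letter code, three letter code, full name
COUNTRIES = [
    ('US', 'USA', 'United States'),
    ('DE', 'DEU', 'Germany'),
]

# Build the lookup table once, mapping three-letter to two-letter codes.
CODE3_TO_CODE2 = {code3: code2 for code2, code3, _name in COUNTRIES}

known = CODE3_TO_CODE2.get('DEU')

country = 'EUR'
# dict.get returns None for a missing key instead of raising KeyError.
country = CODE3_TO_CODE2.get(country)

print(known)
print(country)
```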

Ensuring commutativity / reciprocity in dictionary of county boundaries

BACKGROUND: I'm starting with the US Census's County Adjacency File. Unfortunately, the file is inconsistently formatted, so my initial script, which extracts this tab-delimited file into a dictionary with county i (denoted by its FIPS code) as key and a list of all counties adjacent to county i (again using FIPS codes) as value, produces errors.
PROBLEM: My current dictionary violates "reciprocity". If one county borders another county, the second county must also border the first. In my dictionary, for county i, there is often a county j that is in county i's values of adjacent counties, but county i is not contained in the values of key county j.
DESIRED SOLUTION: A dictionary or list of lists containing an entry for each county (key or 0th entry) and all its adjacent counties.
TASK: Iterating through each item in a dictionary, visiting all items with a key in the list of values of each item, and checking whether the reciprocal relationship holds (and if not, adding that key)
Minimum Working Example (nonsense values):
Adj_counties_pre = { 12000 : [12001, 12003],
                     12001 : [12004],
                     12003 : [12004, 12001],
                     12004 : [12003, 12000]}
...
Adj_counties_post = { 12000 : [12001, 12003, 12004],
                      12001 : [12004, 12003, 12000],
                      12003 : [12004, 12001, 12000],
                      12004 : [12003, 12000, 12001]}
I'm having trouble figuring out how to traverse the dictionary to fix this.
Thanks!
Well you can use the following solution:
from collections import defaultdict
result = defaultdict(set)
for k,vs in adj_counties_pre.items():
    for v in vs:
        result[k].add(v)
        result[v].add(k)
adj_counties_post = {k:list(v) for k,v in result.items()}
The code works as follows: first we construct a temporary defaultdict(set). A defaultdict is a dictionary that constructs a value by invoking the factory (here set) if it cannot find a certain key.
Next we iterate over all key-value pairs (k, vs), and for every value vs over its elements v. For each key-element pair, we add the element v to the set associated with the key k, and the key k to the set associated with the element v.
Then we are either done (a defaultdict is a dictionary, and this one maps county codes to a set of county codes), or we can turn it into a vanilla dictionary of lists with the dictionary comprehension. This then generates:
>>> adj_counties_post
{12000: [12001, 12003, 12004], 12001: [12000, 12003, 12004], 12003: [12000, 12001, 12004], 12004: [12000, 12001, 12003]}
Also note that variable names usually start with a lowercase letter.
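To convince yourself that the result really is reciprocal, a small check can be run before and after. This is only a sketch: the helper names symmetrize and is_reciprocal are invented here, and sorted() is used instead of list() so the neighbour lists come out in a predictable order:

```python
from collections import defaultdict

adj_counties_pre = {12000: [12001, 12003],
                    12001: [12004],
                    12003: [12004, 12001],
                    12004: [12003, 12000]}

def symmetrize(adjacency):
    """Return a copy in which every border is recorded in both directions."""
    result = defaultdict(set)
    for county, neighbours in adjacency.items():
        for neighbour in neighbours:
            result[county].add(neighbour)
            result[neighbour].add(county)
    return {county: sorted(ns) for county, ns in result.items()}

def is_reciprocal(adjacency):
    """True if j in adjacency[i] always implies i in adjacency[j]."""
    return all(county in adjacency.get(neighbour, ())
               for county, neighbours in adjacency.items()
               for neighbour in neighbours)

adj_counties_post = symmetrize(adj_counties_pre)
print(is_reciprocal(adj_counties_pre))
print(is_reciprocal(adj_counties_post))
```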

Adding to a dictionary's subdictionary in Python, based off a subdictionary's key

Let's say I have a dictionary that looks like this:
test = {1: {'los angeles': ['playlist1', 'playlist2']}, 2: {'houston': ['playlist']}}
In this code, I want to add to the array belonging to 'los angeles', basically appending to it. Is there a way to perform an action where 1 is declared like a wildcard? I wanted to do something like this:
test[_]['los angeles'].append('playlist3')
Which would result in:
test = {1: {'los angeles': ['playlist1', 'playlist2', 'playlist3']}, 2: {'houston': ['playlist']}}
I don't know if I completely understand the question but if I may give it a try.
The integer keys in the outer dict seem to be what's holding you up.
It seems you have some data set of city names and you want to map those strings to two other data sets, city numbers and city playlists.
Instead of doing it all in one multidimensional dict, you can just use separate dicts for separate mappings.
city_playlists = {'los angeles':['playlist1', 'playlist2'],
                  'houston':['playlist']}
city_names = {1:'los angeles',
              2:'houston'}
Then data retrieval and updating is much more straightforward.
city_playlists['los angeles'].append('playlist3')
And if I understand what you mean by 'wildcard':
import random
wildcard = random.randint(1, len(city_names))
name = city_names[wildcard]
city_playlists[name].append('newplaylist')
You could also use a list for mapping city numbers to city names as lists have indexing, then use random.choice to pull out a random city name from the list.
There's no such wildcard-like thing. You should do it manually:
for k in test:
    if 'los angeles' in test[k]:
        test[k]['los angeles'].append('playlist3')
Or:
test[next(k for k in test if 'los angeles' in test[k])]['los angeles'].append('playlist3')
The one-liner version will throw StopIteration if there's no dictionary with a 'los angeles' key. And it only updates the first dictionary that matches.
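If the StopIteration is a concern, next() accepts a default value as a second argument. A small sketch of that variant follows; the helper name append_to_city and the 'chicago' / 'playlist9' values are made up for the example:

```python
test = {1: {'los angeles': ['playlist1', 'playlist2']}, 2: {'houston': ['playlist']}}

def append_to_city(data, city, item):
    """Append item to the first inner dict containing city.

    Returns True if a matching dict was found, False otherwise.
    """
    # next() with a default returns None instead of raising StopIteration.
    inner = next((d for d in data.values() if city in d), None)
    if inner is None:
        return False
    inner[city].append(item)
    return True

found = append_to_city(test, 'los angeles', 'playlist3')
missing = append_to_city(test, 'chicago', 'playlist9')  # hypothetical city
print(found, missing)
print(test)
```

Like the one-liner, this only updates the first matching dictionary.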

Alphabetize a List of Multiline Values in Python?

I have a list of names and addresses organized in the following format:
Mr and Mrs Jane Doe
Candycane Lane
Magic Meadows, SC
I have several blocks of data written like this, and I want to be able to alphabetize each block by the last name (Doe, in this case). After doing some digging, the best I can reckon is that I need to make a "list of lists" and then use the last name as a key by which to alphabetize the block. However, given my freshness to Python and lack of Google skills, the closest I could find was this. I'm confused about converting each block to a list and then slicing it; I can't seem to find a way to do this and still be able to alphabetize properly. Any and all guidance is greatly appreciated.
If I understood correctly, what you want basically is to sort values by "some computation done on the value", in this case the extracted last name.
For that, use the key keyword argument to .sort() or sorted():
def my_key_function(original_name):
    ## do something to extract the last name, for example:
    try:
        return original_name.split(',')[1].strip()
    except IndexError:
        return original_name
my_sorted_values = sorted(my_original_values, key=my_key_function)
The only requirement is that your "key" function is deterministic, i.e. it always returns the same output for each given input.
You might also want to sort by last name and then first name: in this case, just return a tuple (last, first): if last is the same for two given items, first will be used to further sort the two.
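A sketch of such a tuple key, assuming each block's first line ends with "first name, last name" as its last two words. The sample blocks and the function name last_then_first are invented for illustration:

```python
# Hypothetical sample data: name line, street line, city line.
blocks = [
    'Mr John Smith\n1 Oak Street\nSpringfield, SC',
    'Ms Jane Doe\nCandycane Lane\nMagic Meadows, SC',
    'Mr Adam Smith\n2 Elm Street\nSpringfield, SC',
]

def last_then_first(block):
    # Assumes the name line has at least two words: "... First Last".
    words = block.splitlines()[0].split()
    return (words[-1], words[-2])

sorted_blocks = sorted(blocks, key=last_then_first)
for block in sorted_blocks:
    print(block.splitlines()[0])
```

Tuples compare element by element, so the two Smiths are ordered by first name while Doe still comes before both.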
Update
For your specific case, this function should do the trick:
def my_key_function(original_name):
    return original_name.splitlines()[0].split()[-1]
Assuming you already have the data in a list
l = ['Mr and Mrs Jane Smith\nCandycane Lane\nMagic Meadows, SC',
     'Mr and Mrs Jane Doe\nCandycane Lane\nMagic Meadows, SC',
     'Mr and Mrs Jane Atkins\nCandycane Lane\nMagic Meadows, SC']
You can specify the key to sort on.
l.sort(key=lambda x: x.split('\n')[0].split(' ')[-1])
In this case, get the last word (.split(' ')[-1]) on the first line (.split('\n')[0])
You want to make a new list where each entry is a tuple containing the sort key you want and the whole thing. Sort that list and then get the second component of each entry in the sort:
def get_sort_name(address):
    name = address.split('\n')[0]
    return (name.split(' ')[-1], address)  # last word of first line & whole thing as tuple
keyed_list = sorted(map(get_sort_name, addresses))
sorted_addresses = [item[1] for item in keyed_list]
This could be more compact using lambdas of course, but it's better to be readable :)
