Ensuring commutativity / reciprocity in dictionary of county boundaries - python

BACKGROUND: I'm starting with the US Census's County Adjacency File. Unfortunately, the file is inconsistently formatted, so my initial script produces errors. The script extracts this tab-delimited file into a dictionary whose keys are counties (county i, denoted by its FIPS code) and whose values are lists of all counties adjacent to county i (again using FIPS codes).
PROBLEM: My current dictionary violates "reciprocity". If one county borders another county, the second county must also border the first. In my dictionary, for county i, there is often a county j that is in county i's list of adjacent counties, while county i is not contained in the values of key county j.
DESIRED SOLUTION: A dictionary or list of lists containing an entry for each county (key or 0th entry) and all its adjacent counties.
TASK: Iterate through each item in the dictionary, visit every item whose key appears in that item's list of values, and check whether the reciprocal relationship holds (and if not, add the missing key).
Minimum Working Example (nonsense values):
Adj_counties_pre = {12000: [12001, 12003],
                    12001: [12004],
                    12003: [12004, 12001],
                    12004: [12003, 12000]}
...
Adj_counties_post = {12000: [12001, 12003, 12004],
                     12001: [12004, 12003, 12000],
                     12003: [12004, 12001, 12000],
                     12004: [12003, 12000, 12001]}
I'm having trouble figuring out how to traverse the dictionary to fix this.
Thanks!

Well you can use the following solution:
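from collections import defaultdict

# adj_counties_pre is the question's Adj_counties_pre, renamed to lowercase
result = defaultdict(set)
for k, vs in adj_counties_pre.items():
    for v in vs:
        result[k].add(v)  # v is adjacent to k ...
        result[v].add(k)  # ... so k must also be adjacent to v
# sorted() turns each set into a list with a predictable order
adj_counties_post = {k: sorted(v) for k, v in result.items()}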
The code works as follows. First we construct a temporary defaultdict(set). A defaultdict is a dictionary that, when a missing key is looked up, constructs a value by invoking the factory (here set) and stores it under that key.
Next we iterate over all key-value pairs (k, vs), and for every list of values vs over its elements v. For each key-element pair, we add the element v to the set associated with the key k, and the key k to the set associated with the element v.
At that point we are either done (a defaultdict is a dictionary, and this one maps each county code to a set of county codes), or we can turn it into a vanilla dictionary of lists with the dictionary comprehension. This then generates:
>>> adj_counties_post
{12000: [12001, 12003, 12004], 12001: [12000, 12003, 12004], 12003: [12000, 12001, 12004], 12004: [12000, 12001, 12003]}
Also note that variable names usually start with a lowercase letter.
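If you want to see which entries violate reciprocity before (or after) fixing them, a small check along these lines might help. This is only a sketch: find_missing_reciprocals is a name I made up, and it assumes the dictionary maps each code to a list (or set) of neighbour codes.

```python
def find_missing_reciprocals(adjacency):
    """Return sorted (a, b) pairs where b is listed under a but a is not listed under b."""
    missing = []
    for county, neighbors in adjacency.items():
        for neighbor in neighbors:
            # .get() covers neighbours that have no entry of their own
            if county not in adjacency.get(neighbor, ()):
                missing.append((county, neighbor))
    return sorted(missing)


adj_counties_pre = {
    12000: [12001, 12003],
    12001: [12004],
    12003: [12004, 12001],
    12004: [12003, 12000],
}

for county, neighbor in find_missing_reciprocals(adj_counties_pre):
    print(f"{neighbor} is listed under {county}, but {county} is not listed under {neighbor}")
```

An empty result means every adjacency is recorded in both directions.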

Related

Python - List of lists - S&P500

I am new to python and am looking to analyze the S&P500 by sector. I have assigned symbols to all 11 sectors in the S&P with the first two looking like:
Financials = ['AFL', 'AIG', .... 'ZION']
Energy = ['APA', 'BKR', ... 'SLB']
I then create a new list (of lists) which might look like:
sectors_to_analyze = [Financials, Energy] or [Materials, ConsumerStaples]
My analysis is working perfectly, but I want to retrieve the names "Financials" and "Energy" to attach to the data produced, and I cannot figure out how to do it other than making the name part of the list (Financials = ['Financials', 'AFL', 'AIG', .... 'ZION']).
Can someone please point me in the right direction? Thank you.
Perhaps you could use a dictionary:
sectors = {
    'Financials': ['AFL', ...],
    # rest of your lists
}
Then you can iterate over the whole dict and access both the names and the data associated with those names:
for key, value in sectors.items():
    print(f'Sector name: {key}, List: {value}')
I think you want to use a dictionary instead of a "list of lists" (also called a two-dimensional list). You could then loop over the dictionary almost the same way. Here's some example code:
Financials = ['AFL', 'AIG', 'ZION']
Energy = ['APA', 'BKR', 'SLB']
sectors = {"Financials": Financials, "Energy": Energy}
# in this loop, sector is the sector's name, and symbols is the sector's
# list
for sector in sectors:
    symbols = sectors[sector]
    # ...
    # do some analysis
    # ...
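To keep the "pick a few sectors to analyze" part of the question, one option is to select sectors by name and key the results by that same name. A rough sketch, where count_symbols is just a hypothetical stand-in for whatever the real analysis does:

```python
sectors = {
    "Financials": ["AFL", "AIG", "ZION"],
    "Energy": ["APA", "BKR", "SLB"],
}

# Choose sectors by name instead of by list object
sectors_to_analyze = ["Financials", "Energy"]


def count_symbols(symbols):
    # Hypothetical placeholder for the real per-sector analysis
    return len(symbols)


# The sector name travels with its result
results = {name: count_symbols(sectors[name]) for name in sectors_to_analyze}

for name, value in results.items():
    print(f"{name}: {value}")
```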

optimizing code to print sorted items in nested dictionary

I am new to Python, so I wanted to know whether the code I wrote for printing items inside a nested dictionary in alphabetical order is optimal, especially the check for whether a key exists. Let me know if there is a better solution.
# Code
import operator
locations = {'North America': {'USA': ['Mountain View']}}
locations['Asia'] = {'India':['Bangalore']}
locations['North America']['USA'].append('Atlanta')
locations['Africa'] = {'Egypt':['Cairo']}
locations['Asia']['China'] = ['Shanghai']
# TODO: Print a list of all cities in the USA in alphabetic order.
if 'North America' in locations:
    for key,value in locations['North America'].items():
        if 'USA' in key:
            for item in sorted(value):
                print(f"{item}")
# TODO: Print all cities in Asia, in alphabetic order, next to the name of the country
if 'Asia' in locations:
    for key,value in sorted(locations['Asia'].items(),key=operator.itemgetter(1)):
        print(f"{value[0]} {key}")
Make these two lines your code:
print('\n'.join(sorted([x for i in locations.get('North America', {}).values() for x in i])))
print('\n'.join(sorted([x + ' ' + k for k,v in locations.get('Asia', {}).items() for x in v])))
Which outputs:
Atlanta
Mountain View
Bangalore India
Shanghai China
Dictionaries in Python are not sorted: since Python 3.7 they preserve insertion order, but that is not alphabetical order. Given that, I will try to help with your actual problem of checking for a key in a dictionary.
locations = {'North America': {'USA': ['Mountain View']}}
locations['Asia'] = {'India':['Bangalore']}
locations['North America']['USA'].append('Atlanta')
locations['Africa'] = {'Egypt':['Cairo']}
locations['Asia']['China'] = ['Shanghai']
# First we clean up all the loops.
# The loops were only checking whether the key is in the dictionary
if 'North America' in locations and 'USA' in locations['North America']:
    for item in sorted(locations['North America']['USA']):
        print(f"{item}")
if 'Asia' in locations:
    # Since dictionaries are not sorted, we will make a list of the countries to order
    countries = []
    for k in locations['Asia'].keys():
        countries.append(k)
    # Using a similar loop to the one to print cities
    for country in sorted(countries):
        # Adding a dimension for cities.
        for city in sorted(locations['Asia'][country]):
            print(f"{country}:{city}")
The Asia block loops through the countries in alphabetical order and prints each country with its cities, also in alphabetical order.
Dictionaries are used because they give direct lookup of any specific key. To test for existence, you don't need to search. The downside is that they are not sorted.
You iterate through all countries in North America when you already know you want the USA, so ... don't do that.
print(sorted(locations['North America']['USA']))
This is better because it is an O(1) lookup on the second layer, whereas your loop is O(n), where n is the number of nations on that particular continent. Admittedly that isn't much, which is why they say don't optimize if you don't need to. But maybe you have a lot more data and the geography sample data was just filler.
To test for existence of a key, use "in" or write a try-except for KeyError. Python is one of the few languages where it's often better to just handle the exception.
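The two styles of existence test could look roughly like this. The function names are made up for illustration, and returning an empty list for a missing key is my own assumption about what you would want:

```python
locations = {
    'North America': {'USA': ['Mountain View', 'Atlanta']},
    'Asia': {'India': ['Bangalore'], 'China': ['Shanghai']},
}


def cities_with_in(continent, country):
    # "Look before you leap": test both levels with `in`
    if continent in locations and country in locations[continent]:
        return sorted(locations[continent][country])
    return []


def cities_with_except(continent, country):
    # "Ask forgiveness": just index and handle the KeyError
    try:
        return sorted(locations[continent][country])
    except KeyError:
        return []


print(cities_with_in('North America', 'USA'))  # ['Atlanta', 'Mountain View']
print(cities_with_except('Europe', 'France'))  # []
```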
To print all the cities in Asia, you will have to combine all the lists in Asia and sort the result (see "Combining two sorted lists in Python").
You can do better by maintaining the city lists in sorted order all the time, using the bisect module. Inserting into or removing from a sorted list is less work than sorting it each time, assuming you look at the list more often than you add and remove cities.
If you maintain sorted lists, you can efficiently get the sorted merge with heapq.merge (https://docs.python.org/3.0/library/heapq.html#heapq.merge), although sadly you lose the nation name doing that.
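Here is a rough sketch of the bisect plus heapq.merge idea. The extra cities are illustrative values only, and pairing each city with its country in a tuple is my own workaround (not part of heapq) for keeping the nation name through the merge:

```python
import bisect
import heapq

asia = {'India': ['Bangalore'], 'China': ['Shanghai']}

# insort inserts at the right position, so each list stays sorted
bisect.insort(asia['India'], 'Ahmedabad')
bisect.insort(asia['China'], 'Beijing')

# Pair each city with its country so the name survives the merge;
# each stream is still sorted because the cities within it are sorted
streams = [
    [(city, country) for city in cities]
    for country, cities in asia.items()
]

# heapq.merge lazily merges already-sorted inputs into one sorted stream
merged = list(heapq.merge(*streams))

for city, country in merged:
    print(f"{city} {country}")
```

This prints the four cities in alphabetical order, each followed by its country.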

Python list comprehension: list index out of range

My list comprehension returns the error "List index out of range" if the value does not exist in the list.
Objective:
Check a three-letter code for a country given by a variable (country) and transform the code into a two-letter code by looking up a list of tuples (COUNTRIES)
Constant:
# Two letter code, three letter code, full name
COUNTRIES = [
('US', 'USA', 'United States'),
('DE', 'DEU', 'Germany'),
....
]
Code:
country = 'EUR'
# Check that country is not None and has 3 characters (I have multiple checks for two letter codes too)
if country is not None and len(country) == 3:
    country = [code2 for code2, code3, name in COUNTRIES if code3 == country][0]
If I only use the three-letter codes that are in the list (USA and DEU), the code works fine. If I put the fictitious code "EUR", which is not a valid country code, in the variable "country", then I get the "List index out of range" error.
How can I return None instead of breaking the program? The variable country will be used later on again.
I don't think list comprehensions are a good choice here. They are good when you want to turn one list into another list, which is not really what you want to do here. A better approach would be a regular for loop inside a function that returns as soon as it finds a match.
However, my personal approach would be to transform the list lookups into dict lookups instead:
COUNTRIES_LUT = {}
# name (not country) avoids overwriting the variable holding the code to look up
for code2, code3, name in COUNTRIES:
    COUNTRIES_LUT[code2] = name
    COUNTRIES_LUT[code3] = name
At the end of that, you can just use COUNTRIES_LUT[your_str] as expected, or COUNTRIES_LUT.get(your_str) to get None for an unknown code instead of a KeyError.
If you generate this lookup table at the start, this also has the bonus of being faster, since you don't need to loop through every element of the list every time.
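For the stated objective (three-letter code in, two-letter code or None out), two possible shapes are sketched below. The names to_code2 and CODE3_TO_CODE2 are mine, and the COUNTRIES list is cut down to the two entries shown in the question:

```python
COUNTRIES = [
    ('US', 'USA', 'United States'),
    ('DE', 'DEU', 'Germany'),
]


def to_code2(code3):
    # next() returns the first match, or the default (None) if there is none
    return next((c2 for c2, c3, _name in COUNTRIES if c3 == code3), None)


# Alternative: build a lookup table once, then use dict.get()
CODE3_TO_CODE2 = {c3: c2 for c2, c3, _name in COUNTRIES}

print(to_code2('DEU'))            # DE
print(to_code2('EUR'))            # None
print(CODE3_TO_CODE2.get('USA'))  # US
print(CODE3_TO_CODE2.get('EUR'))  # None
```

Either way, the fictitious code 'EUR' yields None instead of raising an error.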

im having a python coding issue and need ideas

I have written code that looks through a transaction list, sees whether a company is buying or selling, converts the value to USD, and returns it. The problem is that my original list looks like this:
[['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89], ['Haribo', 265333.85]]
When I run my code, it iterates through about 100 transactions, and after each transaction it returns a sample like the following:
['Acer', 21439.6892]
Now my problem is that I want to update the value in the original list with this new value, so that the list keeps the company name but the two values are added together and the new total appears in the original list for the next iteration. So 481242.74 + 21439.6892 = 502682.4292, and the new original list would look like the following, with the Acer value updated:
[['Acer', 502682.4292], ['Beko', 966071.86], ['Cemex', 187242.16], ['Datsun', 748502.91], ['Equifax', 146517.59], ['Gerdau', 898579.89], ['Haribo', 265333.85]]
You should use a dictionary instead to keep track of your companies:
d = {'Acer': 481242.74, 'Beko': 966071.86, 'Cemex': 187242.16, 'Datsun': 748502.91, 'Equifax': 146517.59, 'Gerdau': 898579.89, 'Haribo': 265333.85}
# sample is the [name, value] pair returned for one transaction, e.g. ['Acer', 21439.6892]
# Update only that company's total
d[sample[0]] += sample[1]
Use a dictionary instead of a list.
# Make the master data structure a dictionary
myDict = {'Acer': 481242.74}
print(myDict)
# Iterate over newList (the samples) and update the master dictionary
newList = [['Acer', 21439.6892]]
for Company, NewValue in newList:
    myDict[Company] = myDict[Company] + NewValue
print(myDict)
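If the rest of your code still expects the list-of-lists shape, you could convert to a dictionary for the updates and back again afterwards. A minimal sketch, using a shortened copy of the list from the question; apply_sample is a made-up helper, and starting unknown companies at 0.0 is my assumption:

```python
balances_list = [['Acer', 481242.74], ['Beko', 966071.86], ['Cemex', 187242.16]]

# dict() accepts any iterable of [key, value] pairs
balances = dict(balances_list)


def apply_sample(balances, sample):
    """Add one [name, amount] sample to the running total for that company."""
    name, amount = sample
    # Assumption: a company not seen before starts from 0.0
    balances[name] = balances.get(name, 0.0) + amount


apply_sample(balances, ['Acer', 21439.6892])

# Back to the original list-of-lists shape, in the original order
updated_list = [[name, value] for name, value in balances.items()]
print(updated_list)
```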

python: badly behaving dict inside a function- erroneous TypeError

I have dicts that I need to clean, e.g.
dict = {
    'sensor1': [list of numbers from sensor 1 pertaining to measurements on different days],
    'sensor2': [list of numbers from sensor 2 pertaining to measurements from different days],
    etc. }
Some days have bad values, and I would like to generate a new dict in which all the sensor values from those bad days are erased, by applying an upper limit to the values of one of the keys:
def clean_high(dict_name,key_string,limit):
    '''clean all the keys to eliminate the bad values from the arrays'''
    new_dict = dict_name
    for key in new_dict: new_dict[key] = new_dict[key][new_dict[key_string]<limit]
    return new_dict
If I run all the lines separately in IPython, it works. The bad days are eliminated, and the good ones are kept. These are both type numpy.ndarray: new_dict[key] and new_dict[key][new_dict[key_string]<limit]
But, when I run clean_high(), I get the error:
TypeError: only integer arrays with one element can be converted to an index
What?
Inside of clean_high(), the type for new_dict[key] is a string, not an array.
Why would the type change? Is there a better way to modify my dictionary?
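Do not modify a dictionary while iterating over it. According to the Python documentation: "Iterating views while adding or deleting entries in the dictionary may raise a RuntimeError or fail to iterate over all entries". Note also that new_dict = dict_name does not copy anything: both names refer to the same dictionary, so the function overwrites the caller's data, including the new_dict[key_string] array that every later iteration uses as the mask. Instead, create a new dictionary and fill it while iterating over the old one.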
def clean_high(dict_name,key_string,limit):
    '''clean all the keys to eliminate the bad values from the arrays'''
    new_dict = {}
    for key in dict_name:
        new_dict[key] = dict_name[key][dict_name[key_string]<limit]
    return new_dict
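The same idea can be tried out without NumPy. The sketch below is a plain-list analogue, not the code from the question: the sample readings are invented, and the boolean-array indexing is replaced by an explicit mask that is computed once from the untouched input.

```python
def clean_high(data, key_string, limit):
    """Return a new dict keeping only the positions where data[key_string] < limit."""
    # Compute the keep/drop mask once, before anything is filtered
    mask = [value < limit for value in data[key_string]]
    return {
        key: [v for v, keep in zip(values, mask) if keep]
        for key, values in data.items()
    }


# Invented sample data: the second day has a bad sensor1 reading
readings = {'sensor1': [1.0, 9.5, 2.0], 'sensor2': [10, 20, 30]}
cleaned = clean_high(readings, 'sensor1', 5.0)

print(cleaned)   # {'sensor1': [1.0, 2.0], 'sensor2': [10, 30]}
print(readings)  # the input dict is left unchanged
```

Because the result is built in a fresh dictionary, the caller's data and the mask stay intact for every key.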
