Large nested dictionary of data only working with "last key" - python

[EDIT] - Issue not with dictionary itself. Unmodified copies of original file 'census2010.py' do not display the issue.
I'm trying to encode Excel data into a nested dictionary for further analysis.
I expect to be able to read out any key from the dictionary. For example, I expect the following to work:
>>> census2010.allData['AK']['Anchorage']
{'pop': 291826, 'tracts': 55}
What I get is:
census2010.allData['AK']['Anchorage']
Traceback (most recent call last):
File "<input>", line 1, in <module>
KeyError: 'AK'
The only key that works is:
census2010.allData['WY']['Weston']
{'pop': 3894, 'tracts': 1}
I've created the Census2010.py file with the data from the censuspopdata.xlsx folder (following process from Chapter 12 of "Automate the Boring Stuff").
Directly looking at Census2010.py shows all the nested keys, but importing 'census2010.py' and interrogating the dictionary only shows the "final" key.
Here's the script to generate census2010.py: (and it runs without error)
import openpyxl, pprint, os
print('Opening workbook...')
os.getcwd()
p = os.getcwd()
os.chdir(p + '\\automatestuffdirectorytest\\')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb['Population by Census Tract']
countyData = {}
print('Reading rows...')
for row in range(2, sheet.max_row + 1):
# Each row in the spreadsheet has data for one census tract.
state = sheet['B' + str(row)].value
county = sheet['C' + str(row)].value
pop = sheet['D' + str(row)].value
# Make sure the key for this state exists.
countyData.setdefault(state, {})
# Make sure the key for this county in this state exists.
countyData[state].setdefault(county, {'tracts': 0, 'pop': 0})
# Each row represents one census tract, so increment by one.
countyData[state][county]['tracts'] += 1
# Increase the county pop by the pop in this census tract.
countyData[state][county]['pop'] += int(pop)
print('Writing results...')
resultFile = open('census2010.py', 'w')
resultFile.write('allData = ' + pprint.pformat(countyData))
resultFile.close()
print('Done.')
and here's a few snips of the resulting dictionary (3143 lines)
allData = {'AK': {'Aleutians East': {'pop': 3141, 'tracts': 1},
'Aleutians West': {'pop': 5561, 'tracts': 2},
'Anchorage': {'pop': 291826, 'tracts': 55}, # ...
--snip--
'Yukon-Koyukuk': {'pop': 5588, 'tracts': 4}}, # ...
--snip--
'WY': {'Albany': {'pop': 36299, 'tracts': 10}, # ...
--snip --
'Weston': {'pop': 7208, 'tracts': 2}}}
But the only key that seems to be found is [WY][Weston]
for i in allData.items():
... print(i)
...
('WY', {'Weston': {'pop': 3894, 'tracts': 1}})
calling the keys only works with ['WY']['Weston']
census2010.allData['WY']['Weston']
{'pop': 3894, 'tracts': 1}

This code might help you:
Note: I am not using os module
# ! python3
# read_census_excel.txt - Tabultaes population and number of census tracts for
# each county.
import openpyxl, pprint
print('Opening workbook')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb.get_sheet_by_name('Population by Census Tract')
county_data = {}
print('Reading rows...')
for row in range(2, sheet.max_row + 1):
# Each row in the spreadsheet has data for one census tract.
state = sheet['B' + str(row)].value
county = sheet['C' + str(row)].value
pop = sheet['D' + str(row)].value
# Make sure the key for this state exists.
county_data.setdefault(state, {})
# Make sure the key for this county in this state exists.
county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})
# Each row represents one census tract, so increment by one.
county_data[state][county]['tracts'] += 1
# Increase the county pop by the pop in this census tract.
county_data[state][county]['pop'] += int(pop)
# Open a new text file and write the contents of county_data to it.
print('Writing results...')
result_file = open('census2010.py', 'w')
result_file.write('all_data= ' + pprint.pformat(county_data))
result_file.close()
print('Done')

Related

For loop in tkinter Label return only last line

I am trying to update a variabletext with new meaning with FOR LOOP but I only get the last line.
team_list = StringVar()
team_list.set('76ers: \nBucks: \nBulls: \nCavaliers: \nCeltics: \nClippers: \nGolden State Warriors: \nGrizzlies: \nHawks: \nHeat: \nHornets: \nJazz: \nKings: \nKnicks: \nLakers: \nMagic: \nMavericks: \nNets: \nNuggets: \nPacers: \nPelicans: \nPistons: \nRaptors: \nRockets: \nSpurs: \nSuns: \nThunder: \nTimberwolves: \nTrail Blazers: \nWizards: ')
def analize():
if len(t_text.get(1.0, END)) == 1:
tkinter.messagebox.showinfo(title='Error', message='No text was added')
else:
full_list = ['76ers', 'Bucks', 'Bulls', 'Cavaliers', 'Celtics', 'Clippers', 'Golden State Warriors', 'Grizzlies', 'Hawks', 'Heat', 'Hornets', 'Jazz', 'Kings', 'Knicks', 'Lakers', 'Magic', 'Mavericks', 'Nets', 'Nuggets', 'Pacers', 'Pelicans', 'Pistons', 'Raptors', 'Rockets', 'Spurs', 'Suns', 'Thunder', 'Timberwolves', 'Trail Blazers', 'Wizards']
for team in full_list:
result = (t_text.get(1.0, END)).count(team)
team_list.set(str(team) + ': ' + str(result))
When trying to print FOR LOOP in terminal, it prints everything as expected, but on tkinter Label variabletext it doesn't work. Just returns last FOR LOOP line.
"team_list" in your last line does not append data, it is written to reset with the latest value. Consider creating an empty list to track counts and then append data for each iteration of the loop.
team_counts = {}
for team in full_list:
result = (t_text.get(1.0, END)).count(team)
team_counts[team] = result
team_list.set('\n'.join(f"{team}: {count}" for team, count in team_counts.items()))

Is there a better way to find specific value in a python dictionary like in list?

I have been practicing on iterating through dictionary and list in Python.
The source file is a csv document containing Country and Capital. It seems I had to go through 2 for loops for country_dict in order to produce the same print result for country_list and capital_list.
Is there a better way to do this in Python dictionary?
The code:
import csv
path = #Path_to_CSV_File
country_list=[]
capital_list=[]
country_dict={'Country':[],'Capital':[]}
with open(path, mode='r') as data:
for line in csv.DictReader(data):
locals().update(line)
country_dict['Country'].append(Country)
country_dict['Capital'].append(Capital)
country_list.append(Country)
capital_list.append(Capital)
i=14 #set pointer value to the 15th row in the csv document
#---------------------- Iterating through Dictionary using for loops---------------------------
if i >= (len(country_dict['Country'])-1):
print("out of bound")
for count1, element in enumerate(country_dict['Country']):
if count1==i:
print('Country = ' + element)
for count2, element in enumerate(country_dict['Capital']):
if count2==i:
print('Capital = ' + element)
#--------------------------------Direct print for list----------------------------------------
print('Country = ' + country_list[i] + '\nCapital = ' + capital_list[i])
The output:
Country = Djibouti
Capital = Djibouti (city)
Country = Djibouti
Capital = Djibouti (city)
The CSV file content:
Country,Capital
Algeria,Algiers
Angola,Luanda
Benin,Porto-Novo
Botswana,Gaborone
Burkina Faso,Ouagadougou
Burundi,Gitega
Cabo Verde,Praia
Cameroon,Yaounde
Central African Republic,Bangui
Chad,N'Djamena
Comoros,Moroni
"Congo, Democratic Republic of the",Kinshasa
"Congo, Republic of the",Brazzaville
Cote d'Ivoire,Yamoussoukro
Djibouti,Djibouti (city)
Egypt,Cairo
Equatorial Guinea,"Malabo (de jure), Oyala (seat of government)"
Eritrea,Asmara
Eswatini (formerly Swaziland),"Mbabane (administrative), Lobamba (legislative, royal)"
Ethiopia,Addis Ababa
Gabon,Libreville
Gambia,Banjul
Ghana,Accra
Guinea,Conakry
Guinea-Bissau,Bissau
Kenya,Nairobi
Lesotho,Maseru
Liberia,Monrovia
Libya,Tripoli
Madagascar,Antananarivo
Malawi,Lilongwe
Mali,Bamako
Mauritania,Nouakchott
Mauritius,Port Louis
Morocco,Rabat
Mozambique,Maputo
Namibia,Windhoek
Niger,Niamey
Nigeria,Abuja
Rwanda,Kigali
Sao Tome and Principe,São Tomé
Senegal,Dakar
Seychelles,Victoria
Sierra Leone,Freetown
Somalia,Mogadishu
South Africa,"Pretoria (administrative), Cape Town (legislative), Bloemfontein (judicial)"
South Sudan,Juba
Sudan,Khartoum
Tanzania,Dodoma
Togo,Lomé
Tunisia,Tunis
Uganda,Kampala
Zambia,Lusaka
Zimbabwe,Harare
I am not sure if I get your point; Please check out the code.
import csv
path = #Path_to_CSV_File
country_dict={}
with open(path, mode='r') as data:
lines = csv.DictReader(data)
for idx,line in enumerate(lines):
locals().update(line)
country_dict[idx] = {"Country":Country,"Capital":}
i=14 #set pointer value to the 15th row in the csv document
#---------------------- Iterating through Dictionary using for loops---------------------------
country_info = country_dict.get(i)
#--------------------------------Direct print for list----------------------------------------
print('Country = ' + country_info['Country'] + '\nCapital = ' + country_info['Capital'])

finding the same words in two files and leaving out not repeated ones in python

I have to write a program that correlates smoking with lung cancer risk. For that I have data in two files.
My code is computing the data given in the same lines (eg:America,23.3 with Spain,77.9 and
Italy,24.2 with Russia,60.8)
How to modify my code so that it computes the numbers of the same countries and leaves out the countries that occur only in one file (it shouldn't compute Germany, France, China, Korea because they are only in one file)
Thank you so much for your help in advance:)
smoking file:
Country, Percent Cigarette Smokers Data
America,23.3
Italy,24.2
Russia,23.7
France,14.9
England,17.9
Spain,17
Germany,21.7
second file:
Cases Lung Cancer per 100000
Spain,77.9
Russia,60.8
Korea,61.3
America,73.3
China,66.8
Vietnam,64.5
Italy,43.9
and my code:
def readFiles(smoking_datafile, cancer_datafile):
'''
Reads the data from the provided file objects smoking_datafile
and cancer_datafile. Returns a list of the data read from each
in a tuple of the form (smoking_datafile, cancer_datafile).
'''
# init
smoking_data = []
cancer_data = []
empty_str = ''
# read past file headers
smoking_datafile.readline()
cancer_datafile.readline()
# read data files
eof = False
while not eof:
# read line of data from each file
s_line = smoking_datafile.readline()
c_line = cancer_datafile.readline()
# check if at end-of-file of both files
if s_line == empty_str and c_line == empty_str:
eof = True
# check if end of smoking data file only
elif s_line == empty_str:
raise OSError('Unexpected end-of-file for smoking data file')
# check if at end of cancer data file only
elif c_line == empty_str:
raise OSError('Unexpected end-of-file for cancer data file')
# append line of data to each list
else:
smoking_data.append(s_line.strip().split(','))
cancer_data.append(c_line.strip().split(','))
# return list of data from each file
return (smoking_data, cancer_data)
def calculateCorrelation(smoking_data, cancer_data):
'''
Calculates and returns the correlation value for the data
provided in lists smoking_data and cancer_data
'''
# init
sum_smoking_vals = sum_cancer_vals = 0
sum_smoking_sqrd = sum_cancer_sqrd = 0
sum_products = 0
# calculate intermediate correlation values
num_values = len(smoking_data)
for k in range(0,num_values):
sum_smoking_vals = sum_smoking_vals + float(smoking_data[k][1])
sum_cancer_vals = sum_cancer_vals + float(cancer_data[k][1])
sum_smoking_sqrd = sum_smoking_sqrd + \
float(smoking_data[k][1]) ** 2
sum_cancer_sqrd = sum_cancer_sqrd + \
float(cancer_data[k][1]) ** 2
sum_products = sum_products + float(smoking_data[k][1]) * \
float(cancer_data[k][1])
# calculate and display correlation value
numer = (num_values * sum_products) - \
(sum_smoking_vals * sum_cancer_vals)
denom = math.sqrt(abs( \
((num_values * sum_smoking_sqrd) - (sum_smoking_vals ** 2)) * \
((num_values * sum_cancer_sqrd) - (sum_cancer_vals ** 2)) \
))
return numer / denom
Let's just focus on getting the data into a format that is easy to work with. The code below will get you a dictionary of the form ...
smokers_cancer_data = {
'America': {
'smokers': '23.3',
'cancer': '73.3'
},
'Italy': {
'smokers': '24.2',
'cancer': '43.9'
},
...
}
Once you have this you can get any values you need and perform your calculations. See the code below.
def read_data(filename: str) -> dict:
with open(filename, 'r') as file:
next(file) # Skip the header
data = dict();
for line in file:
cleaned_line = line.rstrip()
# Skip blank lines
if cleaned_line:
data_item = (cleaned_line.split(','))
data[data_item[0]] = float(data_item[1])
return data
# Load data into python dictionaries
smokers_data = read_data('smokersData.txt')
cancer_data = read_data('lungCancerData.txt')
# Build one dictionary that is easy to work with
smokers_cancer_data = dict()
for (key, value) in smokers_data.items():
if key in cancer_data:
smokers_cancer_data[key] = {
'smokers': smokers_data[key],
'cancer' : cancer_data[key]
}
print(smokers_cancer_data)
For example, if you want to calculate the sum of the smoker and cancer values.
smokers_total = 0
cancer_total = 0
for (key, value) in smokers_cancer_data.items():
smokers_total += value['smokers']
cancer_total += value['cancer']
This will return a list of all the countries that have datas, along with the data:
l3 = []
with open('smoking.txt','r') as f1, open('cancer.txt','r') as f2:
l1, l2 = f1.readlines(), f2.readlines()
for s1 in l1:
for s2 in l2:
if s1.split(',')[0] == s2.split(',')[0]:
cty = s1.split(',')[0]
smk = s1.split(',')[1].strip()
cnr = s2.split(',')[1].strip()
l3.append(f"{cty}: smoking: {smk}, cancer: {cnr}")
print(l3)
Output:
['Spain: smoking: 77.9, cancer: 17', 'Russia: smoking: 60.8, cancer: 23.7', 'America: smoking: 73.3, cancer: 23.3', 'Italy: smoking: 43.9, cancer24.2']

My code doesn't produce any output -- Python

I have two columns of data
(sample data) and I want to calculate total users for each week day.
For instance, I'd want my output like this (dict/list anything will do):
Monday: 25,
Tuesday: 30,
Wednesday:45,
Thursday: 50,
Friday:24,
Saturday:22,
Sunday:21
Here's my attempt:
def rider_ship (filename):
with open('./data/Washington-2016-Summary.csv','r') as f_in:
Sdict = []
Cdict = []
reader = csv.DictReader(f_in)
for row in reader:
if row['user_type']=="Subscriber":
if row['day_of_week'] in Sdict:
Sdict[row['day_of_week']]+=1
else:
Sdict [row['day_of_week']] = row['day_of_week']
else:
if row ['day_of_week'] in Cdict:
Cdict[row['day_of_week']] +=1
else:
Cdict[row['day_of_week']] = row['day_of_week']
return Sdict, Cdict
print (Sdict)
print (Cdict)
t= rider_ship ('./data/Washington-2016-Summary.csv')
print (t)
TypeError::list indices must be integers or slices, not str
How about using pandas?
Let's first create a file-like object with io library:
import io
s = u"""day_of_week,user_type
Monday,subscriber
Tuesday,customer
Tuesday,subscriber
Tuesday,subscriber"""
file = io.StringIO(s)
Now to the actual code:
import pandas as pd
df = pd.read_csv(file) # "path/to/file.csv"
Sdict = df[df["user_type"] == "subscriber"]["day_of_week"].value_counts().to_dict()
Cdict = df[df["user_type"] == "customer"]["day_of_week"].value_counts().to_dict()
Now we have:
Sdict = {'Tuesday': 2, 'Monday': 1}
Cdict = {'Tuesday': 1}

How to resolve AttributeError: 'list' object has no attribute 'next' in Python

I am trying to read a file where the odd lines are department numbers, and even ones are sales totals. I need to be able to read a line and append it to a variable to be used later.
def main():
with open('P2data.txt') as x:
data = x.readlines()
dept = (data)[::2]
sales = (data)[1::2]
if dept == '1':
total = sales.next()
total.append(total1)
elif dept == '2':
total = sales.next()
total.append(total2)
else:
total = sales.next()
total.append(total3)
print('Dept 1:', total1)
print('Dept 2:', total2)
print('Dept 3:', total3)
main()
Your code is going in the wrong direction. You also are doing things like checking an entire data structure against what should be compared to one of that structure's elements, and mixing up the syntax for appending to a list. Simply loop over the data structures you create and add to a dictionary:
def main():
with open('P2data.txt') as x:
data = [line.strip() for line in x]
dept = data[::2]
sales = data[1::2]
totals = {'1':0, '2':0, '3':0}
for dep,sale in zip(dept, sales):
totals[dep] += float(sale)
for dep in sorted(totals):
print('Dept {}: {}'.format(dep, totals[dep]))
main()

Categories

Resources