Continued Difficulties with Python Question - python

I asked this question earlier and am still having difficulties. I tried a new approach, but it isn't working. Essentially, I'm trying to implement a program that performs a calculation using Python objects to represent data. I want to determine the name of the county that had the highest voter turnout in a previous election, as well as the percentage of the population who voted. I need to use two function names, but can manipulate them however I see fit. Here's what I currently have; I'm not sure what mistake I'm making:
#creating a dictionary to store the country name and its percentage
data = {}
#creating the class county
class County:
    def __init__(self,county,population,voters):
        self.country = country
        self.voters = voters
        self.population = population
        self.sorted_data = ""
        self.formatted_percentage = ""
def highest_turnout(data) :
    highest = data[0]
    highest_percent = (data[0].voters / data[0].population)
    for data in County
        if (County.voters / County.population) > highest_percent
            highest = County
            highest_percent = County.data
allegheny = County("allegheny", 1000490, 645469)
philadelphia = County("philadelphia", 1134081, 539069)
montgomery = County("montgomery", 568952, 399591)
lancaster = County("lancaster", 345367, 230278)
delaware = County("delaware", 414031, 284538)
chester = County("chester", 319919, 230823)
bucks = County("bucks", 444149, 319816)
I need the “highest_turnout” function to do this:
Find the County that has the highest turnout, i.e. the highest percentage of the
population who voted, using the objects’ population and voters attributes
Return a tuple containing the name of the County with the highest turnout and the
percentage of the population who voted, in that order; the percentage should be
represented as a number between 0 and 1
Display the results of any “print” functions, as well as the last one which prints the return value of the function. Note that your highest_turnout function should correctly determine the County with the highest turnout for any input list
Any explanations / advice on how to approach this would be greatly appreciated. Thank you as I'm pretty new to Python and want to learn as much as possible.

#creating the class county
class County:
    def __init__(self,county,population,voters):
        self.county = county
        self.voters = voters
        self.population = population
        self.sorted_data = ""
        self.formatted_percentage = ""
def highest_turnout(data) :
    # sort the counties by turnout (voters / population), highest first
    sorted_data_by_turnout = sorted(data, key=lambda county: county.voters / county.population, reverse=True)
    highest_turnout_county = sorted_data_by_turnout[0]
    # return the name and the turnout as a number between 0 and 1
    return highest_turnout_county.county, (highest_turnout_county.voters / highest_turnout_county.population)
#creating a list to store the County objects
data = []
data.append(County("allegheny", 1000490, 645469))
data.append(County("philadelphia", 1134081, 539069))
data.append(County("montgomery", 568952, 399591))
data.append(County("lancaster", 345367, 230278))
data.append(County("delaware", 414031, 284538))
data.append(County("chester", 319919, 230823))
data.append(County("bucks", 444149, 319816))
print(highest_turnout(data))
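FYI: there were some indentation errors in your code. In addition, __init__ assigned self.country = country although the parameter is named county (which raises a NameError), the for and if lines were missing their colons, and the loop iterated over the class County instead of the list data.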

Related

How do I return or print an attribute that's a math function between two attributes? Python

I'm very new to Python and have checked the three other posts about this subject but haven't been able to implement them successfully.
Essentially, I'm trying to return the name of the county with the highest voter turnout and the percentage. I can't seem to figure out how to return or print the latter part, as I don't have an attribute for the math portion (voters / population).
I've played around with some things like:
def percentage(self, turnout):
    self.turnout = voters / population
Sorry if this post does not format correctly — this is all very new! Thanks in advance.
class County:
    def __init__(self, name, population, voters):
        self.name = name
        self.population = population
        self.voters = voters
def highest_turnout(data):
    highest_county = data[0]
    highest_percentage = (data[0].voters / data[0].population)
    for county in data:
        if (county.voters / county.population) > highest_percentage:
            highest_county = county
            highest_percentage = (county.voters / county.population)
    return highest_county.name
# implement the function here
# your program will be evaluated using these objects
# it is okay to change/remove these lines but your program
# will be evaluated using these as inputs
allegheny = County("allegheny", 1000490, 645469) # this is an object
philadelphia = County("philadelphia", 1134081, 539069)
montgomery = County("montgomery", 568952, 399591)
lancaster = County("lancaster", 345367, 230278)
delaware = County("delaware", 414031, 284538)
chester = County("chester", 319919, 230823)
bucks = County("bucks", 444149, 319816)
data = [allegheny, philadelphia, montgomery, lancaster, delaware, chester, bucks]
result = highest_turnout(data) # do not change this line!
print(result) # prints the output of the function
# do not remove this line!
You simply return multiple values:
return highest_county.name, highest_percentage
In your calling program:
best_county, best_pct = highest_turnout(data)
A neat feature of Python is that you can write the following:
highest_turnout = max(data, key=lambda county: county.voters / county.population)
Here, highest_turnout is the County object with the highest turnout. We've told Python to take the maximum of the dataset, comparing items by voters / population, i.e. the fraction of the population that voted. In other words, this finds the same county as the loop in your highest_turnout function, in a single line. (If you use it in the same program, give the variable a different name so that it doesn't shadow your function.) You might also consider defining a method on your County class called get_turnout() that just returns the fraction of the population that voted.
With highest_turnout we can then write
highest_turnout.name
and
highest_turnout.voters / highest_turnout.population
to get the values you seek.
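A minimal sketch of the get_turnout() idea, assuming the County class from the question. The method body and the choice of County.get_turnout as the key for max() are illustrative, not the only way to write it, and only three of the counties are included to keep it short:

```python
class County:
    def __init__(self, name, population, voters):
        self.name = name
        self.population = population
        self.voters = voters

    def get_turnout(self):
        """Return the fraction of the population that voted (0 to 1)."""
        return self.voters / self.population


def highest_turnout(data):
    # max() compares the counties by the value the key function returns
    best = max(data, key=County.get_turnout)
    return best.name, best.get_turnout()


data = [
    County("allegheny", 1000490, 645469),
    County("chester", 319919, 230823),
    County("bucks", 444149, 319816),
]
best_county, best_pct = highest_turnout(data)
print(best_county, best_pct)
```

Keeping the division in one method means the key function and the returned value cannot drift apart.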

Issues with references

I am trying to calculate the voter turnout by dividing the number of votes by the population for a few Counties. The script should then determine and return the county with the highest turnout.
I am having trouble with how my code is laid out and am not sure where I am going wrong. I seem to have issues unless class and my def highest_turnout(data) both start in the first column, but with that layout my "result" line keeps throwing an error saying "highest_turnout(data) is not defined". If I indent def highest_turnout instead, I get NameError: "highest_turnout" is not defined. I understand why I get the NameError in that case: the definition sits inside the class when it is indented. I just don't know how to associate the function with the class and get it to run.
class County:# implement County class here
    def __init__(self, init_name, init_population, init_voters):
        self.name = init_name
        self.population = init_population
        self.voters = init_voters
        self.turnout = []
    #calculating turnout percentage
    def add_turnout(self, turnout):
        turnout = (self.voters / self.population)
        if turnout not in self.turnout:
            self.turnout.append(turnout)
        return (turnout)
def highest_turnout(data) :
    highest_turnout = data[0]
    global turnout
    if turnout > highest_turnout:
        turnout = County.turnout
        highest_turnout = County
    return (highest_turnout, turnout)
# your program will be evaluated using these objects
# it is okay to change/remove these lines but your program
# will be evaluated using these as inputs
allegheny = County("allegheny", 1000490, 645469)
philadelphia = County("philadelphia", 1134081, 539069)
montgomery = County("montgomery", 568952, 399591)
lancaster = County("lancaster", 345367, 230278)
delaware = County("delaware", 414031, 284538)
chester = County("chester", 319919, 230823)
bucks = County("bucks", 444149, 319816)
data = [allegheny, philadelphia, montgomery, lancaster, delaware, chester, bucks]
result = highest_turnout(data) # do not change this line!
print(result) # prints the output of the function
# do not remove this line!
It looks like there are a few issues with your code, and it could probably be simplified quite a bit. If you're trying to find which County has the highest turnout, you can do something like this:
class County:
    def __init__(self, init_name, init_population, init_voters):
        self.name = init_name
        self.population = init_population
        self.voters = init_voters
        self.turnout = self.voters / self.population
def highest_turnout(county_list):
    highest_turnout = 0
    highest_county = None
    for county in county_list:
        if county.turnout > highest_turnout:
            highest_turnout = county.turnout
            highest_county = county.name
    return(highest_county, highest_turnout)
allegheny = County("allegheny", 1000490, 645469)
philadelphia = County("philadelphia", 1134081, 539069)
montgomery = County("montgomery", 568952, 399591)
lancaster = County("lancaster", 345367, 230278)
delaware = County("delaware", 414031, 284538)
chester = County("chester", 319919, 230823)
bucks = County("bucks", 444149, 319816)
data = [allegheny, philadelphia, montgomery, lancaster, delaware, chester, bucks]
result = highest_turnout(data)
print(result)
The function highest_turnout needs a loop. You're passing it a list of counties, so you should loop through them by doing something like
def highest_turnout(data):
    highest_turnout = 0
    highest_county = None
    for county in data:
        # turnout is a method here, so it has to be called
        county_turnout = county.turnout()
        if county_turnout > highest_turnout:
            highest_turnout = county_turnout
            highest_county = county
    return highest_turnout, highest_county
This also requires you to simplify the class method "add_turnout" and rename it to just "turnout" (and to drop the self.turnout = [] attribute, which would otherwise shadow the method). I don't know why you're trying to add a turnout if the method is just supposed to calculate the voter turnout by dividing the voters by the population and return the result.
As a side note, I feel that making classes for this problem is very long-winded. You can achieve the same thing much more easily using dictionaries instead. You should also rename "data" to "county_list", because that helps you understand what's in the data.
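For comparison, here is a rough sketch of what the dictionary-based approach might look like. The layout (county name mapped to a (population, voters) tuple) and the names counties and turnout are assumptions made for illustration; note that the original assignment appears to require County objects, so this is only an alternative design:

```python
# Hypothetical layout: county name -> (population, voters)
counties = {
    "allegheny": (1000490, 645469),
    "philadelphia": (1134081, 539069),
    "montgomery": (568952, 399591),
    "lancaster": (345367, 230278),
    "delaware": (414031, 284538),
    "chester": (319919, 230823),
    "bucks": (444149, 319816),
}


def highest_turnout(county_dict):
    """Return (name, turnout) for the county with the highest turnout."""
    def turnout(name):
        population, voters = county_dict[name]
        return voters / population

    # iterating over a dict yields its keys, so max() returns a county name
    best_name = max(county_dict, key=turnout)
    return best_name, turnout(best_name)


result = highest_turnout(counties)
print(result)
```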

Search a series for a word. Return that word and N others in a new column?

Okay, I need help. I created a function to search a string for a specific word. If the function finds the search_word, it will return the word and the N words that precede it. The function works fine with my test strings, but I cannot figure out how to apply the function to an entire series.
My goal is to create a new column in the data frame that contains the n_words_prior whenever the search_word exists.
n_words_prior = []
test = "New School District, Dale County"
def n_before_string(string, search_word, N):
    global n_words_prior
    n_words_prior = []
    found_word = string.find(search_word)
    if found_word == -1: return ""
    sentence= string[0:found_word]
    n_words_prior = sentence.split()[N:]
    n_words_prior.append(search_word)
    return n_words_prior
The current dataframe looks like this:
data = [['Alabama', 'New School District, Dale County'],
['Alaska', 'Matanuska-Susitna Borough'],
['Arizona', 'Pima County - Tuscon Unified School District']]
df = pd.DataFrame(data, columns = ['State', 'Place'])
The improved function would take the inputs 'Place','County',-1 and create the following result.
improved_function(column, search_word, N)
new_data = [['Alabama', 'New School District, Dale County','Dale County'],
['Alaska', 'Matanuska-Susitna Borough', ''],
['Arizona', 'Pima County - Tuscon Unified School District','Pima County']]
new_df = pd.DataFrame(new_data, columns = ['State', 'Place','Result'])
I thought embedding this function would help, but it has only made things more confusing.
def fast_add(place, search_word):
    df[search_word] = df[Place].str.contains(search_word).apply(lambda search_word: 1 if search_word == True else 0)
def fun(sentence, search_word, n):
    """Return search_word and n preceding words from sentence."""
    words = sentence.split()
    for i,word in enumerate(words):
        if word == search_word:
            # max() keeps the slice start from going negative
            # when fewer than n words precede the match
            return ' '.join(words[max(0, i-n):i+1])
    return ''
Example:
df['Result'] = df.Place.apply(lambda x: fun(x, 'County', 1))
Result:
     State                                         Place       Result
0  Alabama              New School District, Dale County  Dale County
1   Alaska                     Matanuska-Susitna Borough
2  Arizona  Pima County - Tuscon Unified School District  Pima County

Iterating over set of lists to find highest average in Python

How do I create a function that iterates over each county, calculating the voter turnout percentage?
class County:
    def __init__(self, init_name, init_population, init_voters) :
        self.name = init_name
        self.population = init_population
        self.voters = init_voters
def highest_turnout(data) :
    100 * (self.voters / self.population)
allegheny = County("allegheny", 1000490, 645469)
philadelphia = County("philadelphia", 1134081, 539069)
montgomery = County("montgomery", 568952, 399591)
lancaster = County("lancaster", 345367, 230278)
delaware = County("delaware", 414031, 284538)
chester = County("chester", 319919, 230823)
bucks = County("bucks", 444149, 319816)
data = [allegheny, philadelphia, montgomery, lancaster, delaware, chester, bucks]
Your class County is defined correctly.
However, the function highest_turnout is not correct.
When data is passed to highest_turnout, first calculate the turnout (voters divided by population) of the first County in the list, which sits at data[0], and store it in highest_pct.
Then we set highest to the name of that first County; we assume that the first item in the data list is the highest one we have seen so far.
Next, we use a for loop to iterate over all the County objects in the list data, visiting each County object in turn.
The variable pct holds the turnout of the County being examined in the current step. The if statement compares it to the highest turnout seen so far, which is stored in highest_pct. If the new value is higher (the comparison returns True), we update highest_pct and, with it, the county name in highest. After the loop, the function returns both values.
def highest_turnout(data) :
    highest_pct = data[0].voters / data[0].population
    highest = data[0].name
    for county in data :
        pct = county.voters / county.population
        if pct > highest_pct :
            highest_pct = pct
            highest = county.name
    # return the name and the turnout as a number between 0 and 1
    return highest, highest_pct
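As a rough check, the loop described above can be run against the seven counties from the question. This is a sketch that ends the function by returning both values so the result can be printed; the turnouts list is an addition made here purely to show the intermediate values being compared:

```python
class County:
    def __init__(self, init_name, init_population, init_voters):
        self.name = init_name
        self.population = init_population
        self.voters = init_voters


def highest_turnout(data):
    # start by assuming the first county is the best one seen so far
    highest_pct = data[0].voters / data[0].population
    highest = data[0].name
    for county in data:
        pct = county.voters / county.population
        if pct > highest_pct:
            highest_pct = pct
            highest = county.name
    return highest, highest_pct


data = [
    County("allegheny", 1000490, 645469),
    County("philadelphia", 1134081, 539069),
    County("montgomery", 568952, 399591),
    County("lancaster", 345367, 230278),
    County("delaware", 414031, 284538),
    County("chester", 319919, 230823),
    County("bucks", 444149, 319816),
]

# intermediate values, listed only to make the comparison visible
turnouts = [(c.name, round(c.voters / c.population, 4)) for c in data]
print(turnouts)

result = highest_turnout(data)
print(result)
```

With these figures chester comes out on top at roughly 0.72, narrowly ahead of bucks.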

Reading statistics from a .txt file and outputting them

I am supposed to get certain information from a .txt file and output it. This is the information I need:
State with the maximum population
State with the minimum population
Average state population
State of Texas population
The DATA looks like:
Alabama
AL
4802982
Alaska
AK
721523
Arizona
AZ
6412700
Arkansas
AR
2926229
California
CA
37341989
This is my code that does not really do anything I need it to do:
def main():
    # Open the StateCensus2010.txt file.
    census_file = open('StateCensus2010.txt', 'r')
    # Read the state name
    state_name = census_file.readline()
    while state_name != '':
        state_abv = census_file.readline()
        population = int(census_file.readline())
        state_name = state_name.rstrip('\n')
        state_abv = state_abv.rstrip('\n')
        print('State Name: ', state_name)
        print('State Abv.: ', state_abv)
        print('Population: ', population)
        print()
        state_name = census_file.readline()
    census_file.close()
main()
All I have it doing is reading the state name and abbreviation and converting the population into an int. I don't need it to do any of that; however, I'm unsure how to do what the assignment is asking. Any hints would definitely be appreciated! I've been trying some things for the past few hours to no avail.
Update:
This is my updated code; however, I'm receiving the following error:
Traceback (most recent call last):
File "main.py", line 13, in <module>
if population > max_population:
TypeError: unorderable types: str() > int()
Code:
with open('StateCensus2010.txt', 'r') as census_file:
    while True:
        try:
            state_name = census_file.readline()
            state_abv = census_file.readline()
            population = int(census_file.readline())
        except IOError:
            break
        # data processing here
        max_population = 0
        for population in census_file:
            if population > max_population:
                max_population = population
print(max_population)
Since the data is in a consistent order (state name, state abbreviation, population), you only need to read the lines once and can work out all the required information in that single pass. Below is sample code.
average = 0.0
total = 0.0
state_min = 999999999999
state_max = 0
statename_min = ''
statename_max = ''
texas_population = 0
with open('StateCensus2010.txt','r') as file:
    # split on new lines; splitlines() also avoids a trailing
    # empty string when the file ends with a newline
    data = file.read().splitlines()
    # get the length of the data by using len() method
    # there are 50 states in the text file
    # each state has 3 pieces of information stored:
    # state name, state abbreviation, population
    # that's why the length of data, 150, divided by 3 gives 50 states
    state_total = len(data) // 3
    # this count is used as an index for the list
    count = 0
    for i in range(state_total):
        statename = data[count]
        state_abv = data[count+1]
        population = int(data[count+2])
        print('Statename : ',statename)
        print('State Abv : ',state_abv)
        print('Population: ',population)
        print()
        # sum all states population
        total += population
        if population > state_max:
            state_max = population
            statename_max = statename
        if population < state_min:
            state_min = population
            statename_min = statename
        if statename == 'Texas':
            texas_population = population
        # add 3 because we want to jump to next state
        # for example the first three lines is Alabama info
        # the next three lines is Alaska info and so on
        count += 3
    # divide the total population by the number of states
    average = total/state_total
    print(str(average))
    print('Lowest population state :', statename_min)
    print('Highest population state :', statename_max)
    print('Texas population :', texas_population)
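The same single-pass idea can be sketched more compactly by grouping the lines three at a time. This is only an illustration under stated assumptions: io.StringIO stands in for the real StateCensus2010.txt file, the sample holds just the five states shown in the question (so there is no Texas entry to look up), and read_states is a hypothetical helper name:

```python
from io import StringIO

# Sample in the same three-line layout as the question's file
# (StringIO stands in for open('StateCensus2010.txt') here).
sample = StringIO(
    "Alabama\nAL\n4802982\n"
    "Alaska\nAK\n721523\n"
    "Arizona\nAZ\n6412700\n"
    "Arkansas\nAR\n2926229\n"
    "California\nCA\n37341989\n"
)


def read_states(lines):
    """Return a list of (name, abbreviation, population) tuples."""
    cleaned = [line.strip() for line in lines if line.strip()]
    # zipping one iterator with itself takes the lines three at a time
    it = iter(cleaned)
    return [(name, abv, int(pop)) for name, abv, pop in zip(it, it, it)]


states = read_states(sample)
largest = max(states, key=lambda s: s[2])
smallest = min(states, key=lambda s: s[2])
average = sum(s[2] for s in states) / len(states)
print(largest[0], smallest[0], average)
```

Converting the population with int() while reading is what avoids the str versus int comparison error shown in the question's traceback.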
This problem is pretty easy using pandas.
Code:
import pandas as pd
states = []
# data is the file-like object defined under "Test Data" below
for line in data:
    states.append(
        dict(state=line.strip(),
             abbrev=next(data).strip(),
             pop=int(next(data)),
             )
    )
df = pd.DataFrame(states)
print(df)
# .loc replaces .ix, which was removed in pandas 1.0
print('\nmax population:\n', df.loc[df['pop'].idxmax()])
print('\nmin population:\n', df.loc[df['pop'].idxmin()])
print('\navg population:\n', df['pop'].mean())
print('\nAZ population:\n', df[df.abbrev == 'AZ'])
Test Data:
from io import StringIO
data = StringIO(u'\n'.join([x.strip() for x in """
Alabama
AL
4802982
Alaska
AK
721523
Arizona
AZ
6412700
Arkansas
AR
2926229
California
CA
37341989
""".split('\n')[1:-1]]))
Results:
  abbrev       pop       state
0     AL   4802982     Alabama
1     AK    721523      Alaska
2     AZ   6412700     Arizona
3     AR   2926229    Arkansas
4     CA  37341989  California
max population:
abbrev            CA
pop         37341989
state     California
Name: 4, dtype: object
min population:
abbrev        AK
pop       721523
state     Alaska
Name: 1, dtype: object
avg population:
10441084.6
AZ population:
  abbrev      pop    state
2     AZ  6412700  Arizona
Another pandas solution, from the interpreter:
>>> import pandas as pd
>>>
>>> records = [line.strip() for line in open('./your.txt', 'r')]
>>>
>>> df = pd.DataFrame([records[i:i+3] for i in range(0, len(records), 3)],
...                   columns=['State', 'Code', 'Pop']).dropna()
>>>
>>> df['Pop'] = df['Pop'].astype(int)
>>>
>>> df
        State Code       Pop
0     Alabama   AL   4802982
1      Alaska   AK    721523
2     Arizona   AZ   6412700
3    Arkansas   AR   2926229
4  California   CA  37341989
>>>
>>> # .loc replaces .ix, which was removed in pandas 1.0
>>> df.loc[df['Pop'].idxmax()]
State    California
Code             CA
Pop        37341989
Name: 4, dtype: object
>>>
>>> df.loc[df['Pop'].idxmin()]
State    Alaska
Code         AK
Pop      721523
Name: 1, dtype: object
>>>
>>> df['Pop'].mean()
10441084.6
>>>
>>> df.loc[df['Code'] == 'AZ' ]
     State Code      Pop
2  Arizona   AZ  6412700
Please try this; the earlier code was not Python 3 compatible (it supported Python 2.7).
def extract_data(state):
    total_population = 0
    # None marks "no state seen yet"
    highest = lowest = None
    for state_abv, stats in state.items():
        population = stats.get('population')
        state_name = stats.get('state_name')
        total_population = population + total_population
        if highest is None or highest < population:
            highest = population
            highest_state_name = state_name
            highest_state = state_abv
        if lowest is None or lowest > population:
            lowest = population
            lowest_state_name = state_name
            lowest_state = state_abv
    print(highest_state, highest)
    print(lowest_state, lowest)
    print(len(state))
    print(int(total_population/len(state)))
    print(state.get('TX').get('population'))
def main():
    # Open the states.txt file.
    census_file = open('states.txt', 'r')
    # Read the state name
    state_name = census_file.readline()
    state = {}
    while state_name != '':
        state_abv = census_file.readline()
        population = int(census_file.readline())
        state_name = state_name.rstrip('\n')
        state_abv = state_abv.rstrip('\n')
        # plain assignment covers both new and repeated abbreviations
        state[state_abv] = {'population': population, 'state_name': state_name}
        state_name = census_file.readline()
    census_file.close()
    return state
state=main()
extract_data(state)
