UnboundLocalError: local variable date referenced before assignment - python

I have a program that asks the user for a city/county, reads a file to find the lines for that city or county, and at the end should print the date on which the increase in cases was highest.
def main():
    #open the file
    myFile = open("Covid Data.txt")
    #read the first line
    firstLine = myFile.readline()
    #set current, previous, and greatest to 0
    current = 0
    previous = 0
    greatest = 0
    #ask user for a city/county name
    userLocation = input("Please enter a location ").title
    #for each line in the file
    for dataLine in myFile:
        #strip the end of the line
        dataLine = dataLine.rstrip("\n")
        #split the data line by the commas and place the parts into a list
        dataList = dataLine.split(",")
        #if dataList[2] is equal to location
        if dataList[2] == userLocation:
            #subtract previous from current to find the number of cases that the total increased by
            cases = current - previous
            #if cases is higher than what is currently set as the greatest
            if cases > greatest:
                #set the new greatest to amount of cases
                greatest = cases
                #save the date of the current line
                date = str(dataList[0])
    #At the end print the data for the highest number of cases
    print("On",date," ",location," had the highest increase of cases with ",cases," cases.")
    #close file
For some reason, every time I run the code, after I type in what city/county I want to view information for, I keep getting an UnboundLocalError for the variable "date". It tells me that it was referenced before assignment, even though I clearly define it. Why am I getting this error?

You will need to initialize a value for the date variable before entering the loop. For example date = None. Same with cases. The problem is that if there is no valid data available, the date in the loop never gets set and thus doesn't exist.
You also are not altering the values of current or previous, which might be the cause for the bug you're seeing where the date variable never gets set (cases will always get value 0 in the loop).
Also there is a typo in the print, where you try to use location instead of the actual variable called userLocation.
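The points above can be put together in a minimal sketch. It assumes that column 1 of each line holds a cumulative case count, which the question does not state; the function name highest_increase and the sample lines are hypothetical and only serve the illustration.

```python
def highest_increase(lines, location):
    """Return (date, increase) for the largest rise, or (None, 0) if none."""
    date = None        # initialised before the loop, so it always exists
    greatest = 0
    previous = None    # cumulative count from the previous matching line
    for line in lines:
        parts = line.rstrip("\n").split(",")
        # Skip malformed lines and lines for other locations.
        if len(parts) < 3 or parts[2] != location:
            continue
        current = int(parts[1])  # assumption: column 1 is the cumulative count
        if previous is not None:
            increase = current - previous
            if increase > greatest:
                greatest = increase
                date = parts[0]
        previous = current       # update for the next matching line
    return date, greatest


# Hypothetical sample data in date,count,location order.
sample = [
    "2020.12.01,100,Miami",
    "2020.12.02,130,Miami",
    "2020.12.02,50,Tampa",
    "2020.12.03,145,Miami",
]
miami = highest_increase(sample, "Miami")
texas = highest_increase(sample, "Texas")
```

Because date and greatest are set before the loop, a location with no matching lines yields (None, 0) instead of raising UnboundLocalError.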

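My friend, this is a question of local versus global scope (see locals() and globals()).
If you write:
globals()["date"] = str(dataList[0])
the assignment creates a global name instead of a local one. Note that this only changes which error you get when the assignment never runs (a NameError instead of an UnboundLocalError), so you still need to give date a value before the loop. This page explains the scoping rules in about five minutes:
https://www.geeksforgeeks.org/global-local-variables-python/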

Your code has several more defects.
title is a method, so you have to call it as .title().
You have to define your variables before the conditions that may or may not assign them.
The location variable in your print call is undefined.
I have written a version of your code that runs.
Code:
def main():
    # open the file
    myFile = open("Covid Data.txt")
    # read the first line
    firstLine = myFile.readline()
    # set current, previous, and greatest to 0
    current = 0
    previous = 0
    greatest = 0
    cases = 0
    date = None
    # ask user for a city/county name
    userLocation = input("Please enter a location ").title()
    # for each line in the file
    for dataLine in myFile:
        # strip the end of the line
        dataLine = dataLine.rstrip("\n")
        # split the data line by the commas and place the parts into a list
        dataList = dataLine.split(",")
        print(dataList)
        # if dataList[2] is equal to location
        if dataList[2] == userLocation:
            # subtract previous from current to find the number of cases that the total increased by
            cases = current - previous
            # if cases is higher than what is currently set as the greatest
            if cases > greatest:
                # set the new greatest to amount of cases
                greatest = cases
                # save the date of the current line
                date = str(dataList[0])
    # At the end print the data for the highest number of cases
    print("On", date, " ", userLocation, " had the highest increase of cases with ", cases, " cases.")
    myFile.close()

main()
Covid Data.txt:
First line
2020.12.04,placeholder,Miami
Test:
>>> python3 test.py
Please enter a location Texas
On None Texas had the highest increase of cases with 0 cases.
>>> python3 test.py
Please enter a location Miami
On None Miami had the highest increase of cases with 0 cases.
NOTE:
As you can see above, the script now runs, but your logic still doesn't work. Some conditions are always False: current and previous are never updated, so cases is always 0 and cases > greatest never holds. As a result, date never gets a value and stays None.

Related

Using program in another file gives different output

I'm having a rather unique issue with my code that I have not experienced before and could use some guidance.
Here is an attempt at a short explanation:
Basically, I have a program with many functions that are tied to one main one. It takes in data from files sent to it and gives output based on many factors. Running this function in the file itself gives the proper results, however, if I import this function and run it in the main.py, it gives very, very incorrect output.
I am going to do my best to show the least amount of code in this post, so here is the GitHub. Please use it for further reference and understanding of what is happening. I don't know any websites that I can use to link and run my code for these purposes.
sentiment_analysis.py is the file with all of the functions. main.py is the file that utilizes it all, and driver.py is the file given by my prof to test this assignment.
Basic assignment explanation (skip if not needed for answering the question): Take in twitter data from the files given along with keywords that have an associated happiness value. Take all data, split into timezone regions (approximation based on given point values, not real timezones), and then give back basic information about the data in the files. ie. Average happiness per timezone, total keyword tweets, and total tweets, for each region.
Running sentiment_analysis will currently give correct output based on heavy testing.
Running main and driver will give incorrect output. Ex. tweets2 has 25 total lines of twitter data, but using driver will return 91 total tweets and keyword tweets (eastern data, 4th test scenario in driver.py) instead of the expected 15 total tweets in that region.
I've spent about 3 hours testing scenarios and outputting different information to try and debug but have had no luck. If anyone has any idea why it's returning different outputs when called in a different file, that would be great.
The following are the three most important functions in the file, with the first being the one called in another file.
def compute_tweets(tweets, keywords):
    try:
        with open(tweets, encoding="utf-8", errors="ignore") as f:  # opens the file
            tweet_list = f.read().splitlines()  # reads and splitlines the file. Gets rid of the \n
            print(tweet_list)
        with open(keywords, encoding="utf-8", errors="ignore") as f:
            keyword_dict = {k: int(v) for line in f for k,v in [line.strip().split(',')]}
            # instead of opening this file normally i am using dictionary comprehension to turn the entire file into a dictionary
            # instead of the standard list which would come from using the readlines() function.
        determine_timezone(tweet_list)  # this will run the function to split all pieces of the file into region specific ones
        eastern = calculations(keyword_dict, eastern_list)
        central = calculations(keyword_dict, central_list)
        mountain = calculations(keyword_dict, mountain_list)
        pacific = calculations(keyword_dict, pacific_list)
        return final_calculation(eastern, central, mountain, pacific)
    except FileNotFoundError as excpt:
        empty_list = []
        print(excpt)
        print("One or more of the files you entered does not exist.")
        return empty_list
# Constants for Timezone Detection
# eastern begin
p1 = [49.189787, -67.444574]
p2 = [24.660845, -67.444574]
# Central begin, eastern end
p3 = [49.189787, -87.518395]
# p4 = [24.660845, -87.518395] - Not needed
# Mountain begin, central end
p5 = [49.189787, -101.998892]
# p6 = [24.660845, -101.998892] - Not needed
# Pacific begin, mountain end
p7 = [49.189787, -115.236428]
# p8 = [24.660845, -115.236428] - Not needed
# pacific end, still pacific
p9 = [49.189787, -125.242264]
# p10 = [24.660845, -125.242264]
def determine_timezone(tweet_list):
    for index, tweet in enumerate(tweet_list):  # takes in index and tweet data and creates a for loop
        long_lat = get_longlat(tweet)  # determines the longlat for the tweet that is currently needed to work on
        if float(long_lat[0]) <= float(p1[0]) and float(long_lat[0]) >= float(p2[0]):
            if float(long_lat[1]) <= float(p1[1]) and float(long_lat[1]) > float(p3[1]):
                # this is testing for the eastern region
                eastern_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p3[1]) and float(long_lat[1]) > float(p5[1]):
                # testing for the central region
                central_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p5[1]) and float(long_lat[1]) > float(p7[1]):
                # testing for mountain region
                mountain_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p7[1]) and float(long_lat[1]) >= float(p9[1]):
                # testing for pacific region
                pacific_list.append(tweet_list[index])
            else:
                # if nothing is found, continue to the next element in the tweet data and do nothing
                continue
        else:
            # if nothing is found for the longitude, then also continue
            continue
def calculations(keyword_dict, tweet_list):
    # - Constants for calculations and returns
    total_tweets = 0
    total_keyword_tweets = 0
    average_happiness = 0
    happiness_sum = 0
    for entry in tweet_list:  # saying for each piece of the tweet list
        word_list = input_splitting(entry)  # run through the input splitting for list of words
        total_tweets += 1  # add one to total tweets
        keyword_happened_counter = 0  # this is used to know if the word list has already had a keyword tweet. Needs to be
        # reset to 0 again in this spot.
        for word in word_list:  # for each word in that word list
            for key, value in keyword_dict.items():  # take the key and respective value for each item in the dict
                # print("key:", key, "val:", value)
                if word == key:  # if the word we got is the same as the key value
                    if keyword_happened_counter == 0:  # and the keyword counter hasnt gone up
                        total_keyword_tweets += 1  # add one to the total keyword tweets
                        keyword_happened_counter += 1  # then add one to keyword happened counter
                    happiness_sum += value  # and, if we have a keyword tweet, no matter what add to the happiness sum
                else:
                    continue  # if we don't have a word == key, continue iterating.
    if total_keyword_tweets != 0:
        average_happiness = happiness_sum / total_keyword_tweets  # calculation for the average happiness value
    else:
        average_happiness = 0
    return [average_happiness, total_keyword_tweets, total_tweets]  # returning a list of info in proper order
My apologies for the wall of both text and code. I'm new to making posts on here and am trying to include all relevant information... If anyone knows of a better way to do this aside from using github and code blocks, please do let me know.
Thanks in advance.
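One possible explanation, offered only as an assumption because the post does not show where eastern_list and the other region lists are defined: if they are module-level lists, they keep their contents between calls. A driver that calls compute_tweets several times in one run would then see counts that include tweets from earlier calls. The sketch below uses hypothetical names (region_tweets, count_with_global, count_with_local) to show the effect in isolation.

```python
region_tweets = []  # hypothetical module-level list, standing in for eastern_list


def count_with_global(tweets):
    # Appends to the shared list, so earlier calls leak into later ones.
    for tweet in tweets:
        region_tweets.append(tweet)
    return len(region_tweets)


def count_with_local(tweets):
    # Builds a fresh list on every call, so each result is independent.
    local_tweets = []
    for tweet in tweets:
        local_tweets.append(tweet)
    return len(local_tweets)


first_global = count_with_global(["a", "b"])
second_global = count_with_global(["c"])   # also counts "a" and "b"
first_local = count_with_local(["a", "b"])
second_local = count_with_local(["c"])     # counts only "c"
```

If this is what is happening, having determine_timezone create and return its region lists, rather than appending to shared ones, would make each call independent.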

How to read a text file and convert into a list for use with statistics package in Python

The code I am running so far is as follows
import os
import math
import statistics

def main ():
    infile = open('USPopulation.txt', 'r')
    values = infile.read()
    infile.close()
    index = 0
    while index < len(values):
        values(index) = int(values(index))
        index += 1
    print(values)

main()
The text file contains 41 rows of numbers each entered on a single line like so:
151868
153982
156393
158956
161884
165069
168088
etc.
My task is to create a program that shows the average change in population during the time period, the year with the greatest increase in population during the time period, and the year with the smallest increase in population (from the previous year) during the time period.
The code will print each of the text files entries on a single line, but upon trying to convert to int for use with the statistics package I am getting the following error:
values(index) = int(values(index))
SyntaxError: can't assign to function call
The values(index) = int(values(index)) line was taken from reading as well as resources on stack overflow.
Changing values = infile.read() to values = list(infile.read()) will not help here: it turns the string into a list of single characters, not a list of lines.
Whenever you read a file like this, every line ends with an invisible '\n' that marks a new line in the text file. An easy way to split the text into lines is values = values.split(), which splits on any whitespace (including '\n') and does not leave an empty string at the end, as long as values was previously assigned.
The while loop you have can easily be replaced with a for loop that uses len(values) as the end.
The error comes from values(index) = int(values(index)): parentheses mean a function call, and you cannot assign to a function call. Indexing uses square brackets, so values[i] = int(values[i]) converts each entry, and values becomes a list of integers.
How I would personally set it up would be:
import os
import math
import statistics

def main ():
    infile = open('USPopulation.txt', 'r')
    values = infile.read()
    infile.close()
    values = values.split()  # Splits on whitespace, so each line becomes one item
    for i in range(0, len(values)):  # loops the length of values and turns each part of values into integers
        values[i] = int(values[i])
    changes = []
    # Use a for loop to get the changes between each number.
    for i in range(0, len(values)-1):  # you put the -1 because there would be an indexing error if you tried to count i+1 while at len(values)
        changes.append(values[i+1] - values[i])  # This will get the difference between the current and the next.
    print('The max change :', max(changes), 'The minimal change :', min(changes))
    # changes has one item fewer than values: changes[i] is the change from values[i] to values[i+1].
    print('A change of :', max(changes), 'Happened at', values[changes.index(max(changes))])  # changes.index(max(changes)) gets the position of the highest number in changes, and finds what population has the same index (position) as it.
    print('A change of :', min(changes), 'Happened at', values[changes.index(min(changes))])  # pretty much the same as above just with minimum
    # If you wanted to print the second number, you would do values[changes.index(min(changes)) + 1]

main()
If you need any clarification on anything I did in the code, just ask.
I personally would use numpy for reading a text file.
In your case I would do it like this:
import numpy as np

def main ():
    infile = np.loadtxt('USPopulation.txt')
    maxpop = np.max(infile)  # np.argmax would give the position, not the value
    minpop = np.min(infile)
    print(f'maximum population = {maxpop} and minimum population = {minpop}')

main()
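Since the question asks about the statistics package and about years rather than population values, here is a minimal sketch of the three results the task names. The helper name summarize_changes and the starting year 1950 are assumptions for the illustration; the post does not say which year the first row belongs to.

```python
import statistics


def summarize_changes(values, first_year):
    """Return (average change, year of largest increase, year of smallest increase)."""
    # Pair each value with its successor to get year-over-year changes.
    changes = [later - earlier for earlier, later in zip(values, values[1:])]
    average = statistics.mean(changes)
    # changes[i] is the increase that ends in year first_year + i + 1.
    largest_year = first_year + 1 + changes.index(max(changes))
    smallest_year = first_year + 1 + changes.index(min(changes))
    return average, largest_year, smallest_year


# The first seven rows quoted in the question; 1950 is an assumed start year.
sample = [151868, 153982, 156393, 158956, 161884, 165069, 168088]
average, largest_year, smallest_year = summarize_changes(sample, 1950)
print(average, largest_year, smallest_year)
```

The list of changes is one item shorter than the list of values, which is why the year offset is i + 1 rather than i.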

Function takes exactly 3 arguments (1 given)? Help formatting print statement

Here are my questions:
Create a function called "numSchools" that counts the schools of a specific type. The function should have three input parameters, (1) a string for the workspace, (2) a string for the shapefile name, and (3) a string for the facility type (e.g. "HIGH SCHOOL"), and one output parameter, (1) an integer for the number of schools of that facility type in the shapefile.
import arcpy

shapefile = "Schools.shp"
work = r"c:\Scripts\Lab 6 Data"
sTyp = "HIGH SCHOOL"

def numSchools(work, shapefile, sTyp):
    whereClause = "\"FACILITY\" = 'HIGH SCHOOL' "  # where clause for high schools
    field = ['FACILITY']
    searchCurs = arcpy.SearchCursor(shapefile, field, whereClause)
    row = searchCurs.next()
    for row in searchCurs:
        # using getValue() to get the name of the high school
        value = row.getValue("NAME")
    high_schools = [row[0] for row in arcpy.SearchCursor(shapefile, field, whereClause)]
    count = arcpy.GetCount_management(high_schools)
    return count

numSchools(work, shapefile, sTyp)
print ("There are a total of: "),count
So this is my code, which runs fine as a plain script, but I need to wrap it in a Python function (my weakness). There seems to be a problem with the last line of my code.
I am not quite sure how to format this last line so that it reads
"There are a total of 29 high schools" while passing the necessary arguments.
You need to pass the arguments explicitly and assign the returned value to a variable, because count only exists inside the function.
count = numSchools(work, shapefile, sTyp)
print("There are a total of: ", count)
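The same pattern, a function that returns a count which the caller then stores and prints, can be sketched without arcpy. The list of dictionaries below is hypothetical data standing in for the shapefile rows, and count_facilities is an illustrative name; no arcpy calls are shown, so this only illustrates how the return value is used.

```python
def count_facilities(records, facility_type):
    """Count records whose FACILITY field equals facility_type."""
    return sum(1 for record in records if record["FACILITY"] == facility_type)


# Hypothetical rows standing in for the contents of Schools.shp.
schools = [
    {"NAME": "North High", "FACILITY": "HIGH SCHOOL"},
    {"NAME": "Elm Street", "FACILITY": "ELEMENTARY SCHOOL"},
    {"NAME": "South High", "FACILITY": "HIGH SCHOOL"},
]

# The returned value must be stored by the caller before it can be printed.
count = count_facilities(schools, "HIGH SCHOOL")
message = "There are a total of {} high schools".format(count)
print(message)
```

Note that the facility type is taken from the parameter here, whereas the where clause in the question hard-codes 'HIGH SCHOOL' and ignores sTyp.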

Python algorithm error when trying to find the next largest value

I've written an algorithm that scans through a file of IDs and compares each value with an integer i (I've converted the integer to a string for the comparison, and I've trimmed the "\n" suffix from the line). The algorithm compares these values for each line in the file (each ID). If they are equal, the algorithm increases i by 1 and uses recursion with the new value of i. If they are not equal, it compares i to the next line in the file. It does this until it has a value for i that isn't in the file, then returns that value for use as the ID of the next record.
My issue is that I have a file of IDs that lists 1,3,2, because I removed the record with ID 2 and then created a new record. This shows the algorithm working correctly, as it gave the new record the previously removed ID of 2. However, when I then create another record, the next ID is 3, so my ID list reads 1,3,2,3 instead of 1,3,2,4. Below is my algorithm, with the results of the print() command. I can see where it's going wrong but can't work out why. Any ideas?
Algorithm:
def _getAvailableID(iD):
    i = iD
    f = open(IDFileName,"r")
    lines = f.readlines()
    for line in lines:
        print("%s,%s,%s"%("i=" + str(i), "ID=" + line[:-1], (str(i) == line[:-1])))
        if str(i) == line[:-1]:
            i += 1
            f.close()
            _getAvailableID(i)
    return str(i)
Output:
(The output for when the algorithm was run for finding an appropriate ID for the record that should have ID of 4):
i=1,ID=1,True
i=2,ID=1,False
i=2,ID=3,False
i=2,ID=2,True
i=3,ID=1,False
i=3,ID=3,True
i=4,ID=1,False
i=4,ID=3,False
i=4,ID=2,False
i=4,ID=2,False
i=2,ID=3,False
i=2,ID=2,True
i=3,ID=1,False
i=3,ID=3,True
i=4,ID=1,False
i=4,ID=3,False
i=4,ID=2,False
i=4,ID=2,False
I think your program is failing because you need to change:
_getAvailableID(i)
to
return _getAvailableID(i)
(At the moment the recursive function finds the correct answer which is discarded.)
However, it would probably be better to simply put all the ids you have seen into a set to make the program more efficient.
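For example:
def _getAvailableID(IDFileName):
    seen = set()
    with open(IDFileName, "r") as f:
        for line in f:
            if line.strip():  # skip blank lines
                seen.add(int(line.strip()))
    i = 1  # the IDs in the question start at 1
    while i in seen:
        i += 1
    return i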
In case you are simply looking for the max ID in the file and then want to return the next available value:
def _getAvailableID(IDFileName):
    iD = 0
    with open(IDFileName,"r") as f:
        for line in f:
            value = int(line.strip())  # compare as integers, not as strings
            print("ID=%s, line=%s" % (iD, value))
            if value > iD:
                iD = value
    return str(iD + 1)

print(_getAvailableID("IDs.txt"))
with an input file containing
1
3
2
it outputs
ID=0, line=1
ID=1, line=3
ID=3, line=2
4
However, we can solve it in a more pythonic way:
def _getAvailableID(IDFileName):
    with open(IDFileName,"r") as f:
        mx_id = max(f, key=int)
    return int(mx_id)+1

How to call in a specific csv field value in python

I am so new to python (a week in) so I hope I ask this question properly.
I have imported a grade sheet in csv format into python 2.7. The first column is the name of the student and the column titles are the name of the assignments. So the data looks something like this:
Name Test1 Test2 Test3
Robin 89 78 100
...
Rick 72 100 98
I want to be able to do (or have someone else do) 3 things just by typing in the name of the person and the assignment.
1. Get the score for that person for that assignment
2. Get the average score for that assignment
3. Get that persons average score
But for some reason I get lost at figuring how to get python to recognize the field I am trying to call in. So far this is what I have (so far the only part that works is calling in file):
data = csv.DictReader(open("C:\file.csv"))
for row in data:
    print row
def grade()
    student= input ("Enter a student name: ")
    assignment= input("Enter a assignment: ")
    for row in data:
        task_grade= data.get(int(row["student"], int(row["assignment"])) # specific grade
        task_total= sum(int(row['assignment'])) #assignment total
        student_total= #student assignments total-- no clue how to do this
        task_average= task_total/11
        average_score= student_total/9
You can access the individual "columns" of your csv this way:
import csv

def parse_csv():
    csv_file = open('data.csv', 'r')
    r = csv.reader(csv_file)
    grade_averages = {}
    for row in r:
        if row[0].startswith('Name'):
            continue
        #print "Student: ", row[0]
        grades = []
        for column in row[1:]:
            #print "Grade: ", column
            grades.append(int(column.strip()))
        grade_total = 0
        for i in grades:
            grade_total += i
        grade_averages[row[0]] = grade_total / len(grades)
    #print "grade_averages: ", grade_averages
    return grade_averages

def get_grade(student_name):
    grade_averages = parse_csv()
    return grade_averages[student_name]

print "Rick: ", get_grade('Rick')
print "Robin: ", get_grade('Robin')
What you are trying to do is not meant for Python because you have keys and values. However...
If you know that your columns are always the same, no need to use keywords, you can use positions:
Here is the easy, inefficient* way to do 1 and 3:
students_name = ...
number = ...
# raw string, so that \f in the path is not read as a form feed
for line in open(r"C:\file.csv").readlines():
    items = line.split()
    num_assignments = len(items) - 1
    name = items[0]
    if name == students_name:
        print("assignment score: {0}".format(items[number]))
        asum = 0
        for k in range(0, num_assignments):
            asum += int(items[k + 1])  # the fields are strings, so convert them
        print("their average: {0}".format(float(asum) / num_assignments))
To do 2, you should precompute the averages and return them, because the average for each assignment is the same for every user query.
I say easy but *inefficient because you search the text file again each time a name is entered. To do it properly, you should probably build a dictionary of all names and their information. But that solution is more complicated, and you are only a week in! Moreover, it's longer, and you should give it a try yourself. Look up dict.
I believe the reason you are not seeing the field the second time around is because the iterator returned by csv.DictReader() is a one-time iterator. That is to say, once you've reached the last row of the csv file, it will not reset to the first position.
So, by doing this:
data = csv.DictReader(open("C:\file.csv"))
for row in data:
    print row
You are running it out. Try commenting those lines and see if that helps.
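The one-time behaviour is easy to see in isolation. The sketch below is written for Python 3 rather than the 2.7 used in the question, and it reads from an in-memory string of hypothetical comma-separated data instead of C:\file.csv. It also shows one way to keep the rows in a dictionary so that all three queries can reuse them; the helper names score, assignment_average and student_average are illustrative.

```python
import csv
import io

# Hypothetical data standing in for the grade sheet in the question.
text = "Name,Test1,Test2,Test3\nRobin,89,78,100\nRick,72,100,98\n"

reader = csv.DictReader(io.StringIO(text))
first_pass = [row["Name"] for row in reader]
second_pass = [row["Name"] for row in reader]  # reader is already exhausted

# Reading the rows into a dictionary once lets every later query reuse them.
grades = {}
for row in csv.DictReader(io.StringIO(text)):
    name = row.pop("Name")
    grades[name] = {task: int(value) for task, value in row.items()}


def score(student, assignment):
    # 1. One student's score for one assignment.
    return grades[student][assignment]


def assignment_average(assignment):
    # 2. Average score for one assignment across all students.
    return sum(marks[assignment] for marks in grades.values()) / len(grades)


def student_average(student):
    # 3. One student's average across all assignments.
    return sum(grades[student].values()) / len(grades[student])


print(first_pass, second_pass)
print(score("Rick", "Test2"), assignment_average("Test1"), student_average("Robin"))
```

Dividing by len(grades) and len(grades[student]) avoids hard-coding the number of students or assignments, unlike the fixed 11 and 9 in the question.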
