Performing Binary Search in List - Python

I'm working on an assignment for my computer class and I'm having a bit of trouble with a question. Questions 1 and 2 overlap somewhat, so I'll post both along with the code I have so far.
Question 1: Write a function called readCountries that reads a file and returns a list of countries. The countries should be read from this file, which contains an incomplete list of countries with their area and population. Each line in this file represents one country in the following format:
name, area (in km²), population
When opening the file your function should handle any exceptions that may occur. Your function should completely read in the file, and separate the data into a 2-dimensional list. You may need to split and strip the data as appropriate. Numbers should be converted to their correct types. Your function should return this list so that you can use it in the remaining questions.
This is my code for this part:
def readCountries():
    countryList = []
    for line in open('Countries.py', 'r'):
        with open('Countries.py', 'r') as countriesFile:
            countries = countriesFile.read()
        countryList.append(line.strip().split())
    return countryList
Question 2: Write a function called getCountry that takes a string representing a country name as a parameter. First call your answer from question 1 to get the list of countries, then do a binary search through the list and print the country's information if found.
This is my code for this part:
countryList = readCountries()
def getCountry(countryList, name):
    lo, hi = 0, len(countryList) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        country = countryList[mid]
        test_name = country[0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            return country
    return countries[lo] if countries[lo][0] == name else None
The output is ['Canada', '9976140.0', '35295770'], which is partly what I need, but how do I get it to look like this: Canada, Area: 9976140.0, Population: 35295770?

Well, one obvious problem is this line:
readCountries()
should be like this:
countryList = readCountries()
You got halfway there by having readCountries return the list, but you never assign the returned value to anything, so it is simply discarded.
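A minimal sketch that pulls both questions together. It assumes the data file is plain text with one comma-separated country per line (the default filename countries.txt is hypothetical) and that no country name itself contains a comma. Two choices here are additions, not part of the original code: the list is sorted by name, because binary search only gives correct results on sorted data, and getCountry takes an optional countryList so the file need not be re-read on every lookup (calling it with just a name still works as the assignment asks). Note also that the final return in the question's getCountry refers to countries, which is never defined; once the loop exits the name was not found, so returning None is enough.

```python
def readCountries(filename="countries.txt"):
    """Return [[name, area, population], ...] sorted by country name.

    Each line is assumed to look like: name, area, population
    """
    countryList = []
    try:
        with open(filename, "r") as countriesFile:
            for line in countriesFile:
                parts = [part.strip() for part in line.split(",")]
                if len(parts) != 3:
                    continue  # skip blank or malformed lines
                name, area, population = parts
                try:
                    countryList.append([name, float(area), int(population)])
                except ValueError:
                    continue  # skip lines whose numbers don't parse
    except OSError as error:  # missing file, no permission, etc.
        print("Could not read", filename, "-", error)
    # Binary search only works on data sorted by the search key (the name).
    countryList.sort(key=lambda country: country[0])
    return countryList


def getCountry(name, countryList=None):
    """Binary-search by name and print the match in the requested format."""
    if countryList is None:
        countryList = readCountries()
    lo, hi = 0, len(countryList) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        country = countryList[mid]
        if name > country[0]:
            lo = mid + 1
        elif name < country[0]:
            hi = mid - 1
        else:
            # Produces e.g. "Canada, Area: 9976140.0, Population: 35295770"
            print("{}, Area: {}, Population: {}".format(*country))
            return country
    print(name, "was not found")
    return None
```

The formatting question is answered by the format() call: unpacking the three-item list into named slots instead of printing the list itself.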

Related

Using program in another file gives different output

I'm having a rather unique issue with my code that I have not experienced before and could use some guidance.
Here is an attempt at a short explanation:
Basically, I have a program with many functions that are tied to one main function. It takes in data from the files passed to it and gives output based on many factors. Running this function in its own file gives the proper results; however, if I import it and run it in main.py, it gives very, very incorrect output.
I am going to do my best to show the least amount of code in this post, so here is the GitHub. Please use it for further reference and understanding of what is happening. I don't know any websites that I can use to link and run my code for these purposes.
sentiment_analysis.py is the file with all of the functions. main.py is the file that utilizes it all, and driver.py is the file given by my prof to test this assignment.
Basic assignment explanation (skip if not needed for answering the question): Take in Twitter data from the given files, along with keywords that have an associated happiness value. Split all the data into timezone regions (approximated from the given point values, not real timezones), and then report basic information for each region, i.e. average happiness, total keyword tweets, and total tweets.
Running sentiment_analysis will currently give correct output based on heavy testing.
Running main and driver gives incorrect output. For example, tweets2 has 25 total lines of Twitter data, but driver.py returns 91 total tweets and keyword tweets (eastern data, 4th test scenario in driver.py) instead of the expected 15 total tweets in that region.
I've spent about 3 hours testing scenarios and outputting different information to try and debug but have had no luck. If anyone has any idea why it's returning different outputs when called in a different file, that would be great.
The following are the three most important functions in the file, with the first being the one called in another file.
def compute_tweets(tweets, keywords):
    try:
        with open(tweets, encoding="utf-8", errors="ignore") as f:  # opens the file
            tweet_list = f.read().splitlines()  # reads and splitlines the file. Gets rid of the \n
            print(tweet_list)
        with open(keywords, encoding="utf-8", errors="ignore") as f:
            keyword_dict = {k: int(v) for line in f for k, v in [line.strip().split(',')]}
            # instead of opening this file normally I am using dictionary comprehension to turn the entire file into a dictionary
            # instead of the standard list which would come from using the readlines() function.
        determine_timezone(tweet_list)  # this will run the function to split all pieces of the file into region specific ones
        eastern = calculations(keyword_dict, eastern_list)
        central = calculations(keyword_dict, central_list)
        mountain = calculations(keyword_dict, mountain_list)
        pacific = calculations(keyword_dict, pacific_list)
        return final_calculation(eastern, central, mountain, pacific)
    except FileNotFoundError as excpt:
        empty_list = []
        print(excpt)
        print("One or more of the files you entered does not exist.")
        return empty_list
# Constants for Timezone Detection
# eastern begin
p1 = [49.189787, -67.444574]
p2 = [24.660845, -67.444574]
# Central begin, eastern end
p3 = [49.189787, -87.518395]
# p4 = [24.660845, -87.518395] - Not needed
# Mountain begin, central end
p5 = [49.189787, -101.998892]
# p6 = [24.660845, -101.998892] - Not needed
# Pacific begin, mountain end
p7 = [49.189787, -115.236428]
# p8 = [24.660845, -115.236428] - Not needed
# pacific end, still pacific
p9 = [49.189787, -125.242264]
# p10 = [24.660845, -125.242264]
def determine_timezone(tweet_list):
    for index, tweet in enumerate(tweet_list):  # takes in index and tweet data and creates a for loop
        long_lat = get_longlat(tweet)  # determines the longlat for the tweet that is currently needed to work on
        if float(long_lat[0]) <= float(p1[0]) and float(long_lat[0]) >= float(p2[0]):
            if float(long_lat[1]) <= float(p1[1]) and float(long_lat[1]) > float(p3[1]):
                # this is testing for the eastern region
                eastern_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p3[1]) and float(long_lat[1]) > float(p5[1]):
                # testing for the central region
                central_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p5[1]) and float(long_lat[1]) > float(p7[1]):
                # testing for mountain region
                mountain_list.append(tweet_list[index])
            elif float(long_lat[1]) <= float(p7[1]) and float(long_lat[1]) >= float(p9[1]):
                # testing for pacific region
                pacific_list.append(tweet_list[index])
            else:
                # if nothing is found, continue to the next element in the tweet data and do nothing
                continue
        else:
            # if nothing is found for the longitude, then also continue
            continue
def calculations(keyword_dict, tweet_list):
    # - Constants for calculations and returns
    total_tweets = 0
    total_keyword_tweets = 0
    average_happiness = 0
    happiness_sum = 0
    for entry in tweet_list:  # saying for each piece of the tweet list
        word_list = input_splitting(entry)  # run through the input splitting for list of words
        total_tweets += 1  # add one to total tweets
        keyword_happened_counter = 0  # this is used to know if the word list has already had a keyword tweet. Needs to be
        # reset to 0 again in this spot.
        for word in word_list:  # for each word in that word list
            for key, value in keyword_dict.items():  # take the key and respective value for each item in the dict
                # print("key:", key, "val:", value)
                if word == key:  # if the word we got is the same as the key value
                    if keyword_happened_counter == 0:  # and the keyword counter hasn't gone up
                        total_keyword_tweets += 1  # add one to the total keyword tweets
                        keyword_happened_counter += 1  # then add one to keyword happened counter
                    happiness_sum += value  # and, if we have a keyword tweet, no matter what add to the happiness sum
                else:
                    continue  # if we don't have a word == key, continue iterating.
    if total_keyword_tweets != 0:
        average_happiness = happiness_sum / total_keyword_tweets  # calculation for the average happiness value
    else:
        average_happiness = 0
    return [average_happiness, total_keyword_tweets, total_tweets]  # returning a list of info in proper order
My apologies for the wall of both text and code. I'm new to making posts on here and am trying to include all relevant information... If anyone knows of a better way to do this aside from using github and code blocks, please do let me know.
Thanks in advance.
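One plausible cause, offered as a hypothesis rather than a confirmed diagnosis: determine_timezone appends to eastern_list, central_list, mountain_list and pacific_list without creating them, so they are presumably module-level lists. If so, they keep growing every time compute_tweets is called, and when driver.py runs several scenarios in one process, later scenarios would also count tweets left over from earlier ones, which fits an inflated total like 91. A minimal sketch of the usual fix is to build fresh lists inside the function and return them. The boundary values come from the constants in the question; passing get_longlat in as a parameter is an assumption made only so the sketch runs on its own (in your module you could call your own get_longlat directly).

```python
# Region boundaries copied from the constants in the question.
LAT_MAX, LAT_MIN = 49.189787, 24.660845  # p1[0], p2[0]
EASTERN_EDGE = -67.444574    # p1[1]
CENTRAL_EDGE = -87.518395    # p3[1]
MOUNTAIN_EDGE = -101.998892  # p5[1]
PACIFIC_EDGE = -115.236428   # p7[1]
WESTERN_EDGE = -125.242264   # p9[1]


def determine_timezone(tweet_list, get_longlat):
    """Sort tweets into regions, returning brand-new lists on every call.

    get_longlat(tweet) is assumed to return (latitude, longitude), matching
    how long_lat[0] and long_lat[1] are used in the question.
    """
    regions = {"eastern": [], "central": [], "mountain": [], "pacific": []}
    for tweet in tweet_list:
        coords = get_longlat(tweet)
        lat, lon = float(coords[0]), float(coords[1])
        if not LAT_MIN <= lat <= LAT_MAX:
            continue  # outside the latitude band covered by the four regions
        if CENTRAL_EDGE < lon <= EASTERN_EDGE:
            regions["eastern"].append(tweet)
        elif MOUNTAIN_EDGE < lon <= CENTRAL_EDGE:
            regions["central"].append(tweet)
        elif PACIFIC_EDGE < lon <= MOUNTAIN_EDGE:
            regions["mountain"].append(tweet)
        elif WESTERN_EDGE <= lon <= PACIFIC_EDGE:
            regions["pacific"].append(tweet)
    return regions
```

compute_tweets would then call regions = determine_timezone(tweet_list, get_longlat) and pass regions["eastern"], regions["central"] and so on to calculations, with the module-level lists deleted. Because nothing survives between calls, running the same file twice in one process gives the same counts both times.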

Python function doesn't work inside a loop

I am trying to write code to compare a gene file with gene panels.
The gene panel file is in CSV format and has the chromosome, gene, start location and end location.
The patient file has the chromosome, mutations and location.
So I made a loop that passes the gene panel information to a function, which does the comparison and returns a list of matching items.
The function works when I call it with manually entered data, but it doesn't do the comparison inside the loop.
import vcf
import os, sys
records = open('exampleGenePanel.csv')
read = vcf.Reader(open('examplePatientFile.vcf','r'))
#functions to find mutations in patients sequence
def findMutations(gn,chromo,start,end):
    start = int(start)
    end = int(end)
    for each in read:
        CHROM = each.CHROM
        if CHROM != chromo:
            continue
        POS = each.POS
        if POS < start:
            continue
        if POS > end:
            continue
        REF = each.REF
        ALT = each.ALT
        print (gn,CHROM,POS,REF,ALT)
        list.append([gn,CHROM,POS,REF,ALT])
    return list
gene = records.readlines()
list=[]
y = len (gene)
x=1
while x < 3:
    field = gene[x].split(',')
    gname = field[0]
    chromo = field[1]
    gstart = field[2]
    gend = field[3]
    findMutations(gname,chromo,gstart,gend)
    x = x+1
if not list:
    print ('Mutation not found')
else:
    print (len(list),' Mutations found')
    print (list)
I want to get the details of the matching mutations in the list.
This works as expected when I pass the data manually to the function,
e.g. findMutations('TESTGene','chr8','146171437','146229161'),
but it doesn't compare anything when the data is passed through the loop.
The problem is that findMutations iterates over read each time it is called, but read is a one-pass reader: after the first call it has already been consumed and there's nothing left. I suggest reading the contents of read once, before calling the function, and saving the results in a list. Then findMutations can iterate over the list each time it is called.
It would also be a good idea to use a name other than list for your result list, since that name conflicts with the Python built-in function. It would also be better to have findMutations return its result list rather than append it to a global.
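A minimal sketch of that advice. find_mutations and scan_panel are hypothetical names. The records are assumed to be read once into a list, for example records = list(vcf.Reader(open('examplePatientFile.vcf', 'r'))) using the same reader call as the question, and the panel rows could come from csv.reader with the header row skipped. Any objects with CHROM, POS, REF and ALT attributes will work as records.

```python
def find_mutations(records, gene_name, chromo, start, end):
    """Return [gene, CHROM, POS, REF, ALT] for each record on chromo within [start, end]."""
    start, end = int(start), int(end)  # int() also tolerates a trailing newline
    matches = []  # local result list instead of a global named `list`
    for rec in records:
        if rec.CHROM == chromo and start <= rec.POS <= end:
            matches.append([gene_name, rec.CHROM, rec.POS, rec.REF, rec.ALT])
    return matches


def scan_panel(panel_rows, records):
    """Check every gene-panel row against the same, reusable list of records."""
    all_matches = []
    for row in panel_rows:
        gene_name, chromo, start, end = row[:4]
        all_matches.extend(find_mutations(records, gene_name, chromo, start, end))
    return all_matches
```

Because records is a list rather than a reader, each call to find_mutations starts again from the first record, and returning matches avoids both the global and the shadowed built-in.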

Problems reading a file in, and editing certain contents of it

So I have a file where each line has a first name, a space, a last name, a tab, and a grade, like this:
Example
Wanda Barber 96
I'm having trouble reading this in as a list and then editing the number.
My current code is:
def TopStudents(n):
    original = open(n)
    contents = original.readlines()
    x = contents.split('/t')
    for y in x[::2]:
        y - 100
        if y > 0: (????)
Here is the point where I'm confused. I am just trying to get the first and last names of students who scored over 100%. I thought of creating a new list for students that meet this qualification, but I'm not sure how I would write the corresponding first and last name. I know I need to take the stride of every other location in the list, as odd will always be the first and last names. Thank you in advance for the help!
There are several things wrong with your code:
- The open file must be closed (#1)
- readlines is a method, so it must be called with () (#2)
- The split uses a forward slash ('/t') instead of a backslash ('\t') (#3)
- Looping over x[::2] skips every other element, which is not what you want if you need to access all the members (#4)
- A for statement must end with a colon (:) (#5)
- You must store the result of that calculation somewhere (#6)
def TopStudents(n):
    original = open(n) #1
    contents = original.readlines #2
    x = contents.split('/t') #3
    for y in x[::2] #4, #5
        y - 100 #6
        if y > 0:
That said, a fixed version could be:
def TopStudents(n):
    original = open(n, 'r')
    for line in original:
        name, score = line.split('\t')
        # If needed, you could split the name into first and last name:
        # first_name, last_name = name.split(' ')
        # 'score' is a string, we must convert it to an int before comparing to one, so...
        score = int(score)
        if score > 100:
            print("The student " + name + " has the score " + str(score))
    original.close() #1 - Closed the file
Note: I have focused on readability and added several comments to help you understand the code.
I always prefer to use 'with open()' because it closes the file automatically. For simplicity I used a comma-separated .txt file, but you can just replace the comma with \t.
def TopStudents():
    with open('temp.txt', 'r') as original:
        contents = list(filter(None, (line.strip().strip('\n') for line in original)))
        x = list(part.split(',') for part in contents)
        for y in x:
            if int(y[1]) > 100:
                print(y[0], y[1])
TopStudents()
This opens the file and loads all lines into contents as a list, removing blank lines and line breaks. Then it splits each line, giving a list of lists.
You then iterate through each list in x, looking for the second value (y[1]) which is your grade. If the int() is greater than 100, print each segment of y.
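Combining the two answers, here is a sketch that uses with open(), splits on the tab the question describes, and collects only the first and last names of students above the threshold. parse_student and top_students are hypothetical names; the sketch assumes each non-blank line is "First Last<TAB>grade" with an integer grade, and treats the first word as the first name and everything after the first space as the last name.

```python
def parse_student(line):
    """Split 'First Last<TAB>grade' into (first, last, grade)."""
    name, grade = line.strip().split("\t")
    first, last = name.split(" ", 1)  # first word, then the rest of the name
    return first, last, int(grade)


def top_students(path, threshold=100):
    """Return (first, last) for every student whose grade is above threshold."""
    results = []
    with open(path) as grades_file:  # closed automatically, even on errors
        for line in grades_file:
            if not line.strip():
                continue  # skip blank lines
            first, last, grade = parse_student(line)
            if grade > threshold:
                results.append((first, last))
    return results
```

Returning a list instead of printing lets the caller decide what to do with the names, e.g. for first, last in top_students("grades.txt"): print(first, last), where grades.txt is a placeholder filename.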

Function takes exactly 3 arguments (1 given)? Help formatting print statement

Here are my questions:
Create a function called "numSchools" that counts the schools of a specific type. The function should have three input parameters, (1) a string for the workspace, (2) a string for the shapefile name, and (3) a string for the facility type (e.g. "HIGH SCHOOL"), and one output parameter, (1) an integer for the number of schools of that facility type in the shapefile.
import arcpy
shapefile = "Schools.shp"
work = r"c:\Scripts\Lab 6 Data"
sTyp = "HIGH SCHOOL"
def numSchools(work, shapefile, sTyp):
    whereClause = "\"FACILITY\" = 'HIGH SCHOOL' " # where clause for high schools
    field = ['FACILITY']
    searchCurs = arcpy.SearchCursor(shapefile, field, whereClause)
    row = searchCurs.next()
    for row in searchCurs:
        # using getValue() to get the name of the high school
        value = row.getValue("NAME")
    high_schools = [row[0] for row in arcpy.SearchCursor(shapefile, field, whereClause)]
    count = arcpy.GetCount_management(high_schools)
    return count
numSchools(work, shapefile, sTyp)
print ("There are a total of: "),count
So this is my code. It runs perfectly as a plain script, but I need to wrap it in a Python function (MY WEAKNESS). There seem to be some problems with the last line of my code.
I am not quite sure how to format the last line so that it reads "There are a total of 29 high schools" while passing the necessary arguments.
You need to pass the arguments explicitly, assign the value numSchools returns, and pass that value to print as an argument:
count = numSchools(work, shapefile, sTyp)
print("There are a total of: ", count)
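The function in the question never uses its work or sTyp parameters: the where clause is hard-coded to 'HIGH SCHOOL'. A sketch that uses all three, assuming ArcGIS with the arcpy.da module is available (arcpy.da.SearchCursor takes the table, a list of field names and a where clause, in that order). build_where_clause is a hypothetical helper, and arcpy is imported inside numSchools only so the helper can be tried without ArcGIS installed.

```python
def build_where_clause(facility_type):
    """SQL where clause selecting rows whose FACILITY equals facility_type."""
    escaped = facility_type.replace("'", "''")  # double single quotes for SQL
    return "\"FACILITY\" = '{}'".format(escaped)


def numSchools(work, shapefile, sTyp):
    """Count features in shapefile (inside workspace work) whose FACILITY is sTyp."""
    import arcpy  # requires ArcGIS; imported here so the helper above runs without it

    arcpy.env.workspace = work  # lets shapefile be given as a bare name
    where_clause = build_where_clause(sTyp)
    with arcpy.da.SearchCursor(shapefile, ["FACILITY"], where_clause) as cursor:
        return sum(1 for _ in cursor)  # one per matching row
```

Usage would follow the answer's pattern: count = numSchools(r"c:\Scripts\Lab 6 Data", "Schools.shp", "HIGH SCHOOL") and then print("There are a total of {} high schools".format(count)). Counting rows with a cursor returns a plain integer, which avoids converting the result object that GetCount_management returns.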

Python Min-Max Function - List as argument to return min and max element

Question: write a program which first defines functions minFromList(list) and maxFromList(list). The program should initialize an empty list and then prompt the user for an integer, and keep prompting for integers, adding each integer to the list, until the user enters a single period character. The program should then call minFromList and maxFromList with the list of integers as an argument and print the results returned by the function calls.
I can't figure out how to get the min and max returned from each function separately. And now I've added extra code so I'm totally lost. Anything helps! Thanks!
What I have so far:
def minFromList(list)
    texts = []
    while (text != -1):
        texts.append(text)
    high = max(texts)
    return texts
def maxFromList(list)
    texts []
    while (text != -1):
        texts.append(text)
    low = min(texts)
    return texts
text = raw_input("Enter an integer (period to end): ")
list = []
while text != '.':
    textInt = int(text)
    list.append(textInt)
    text = raw_input("Enter an integer (period to end): ")
print "The lowest number entered was: " , minFromList(list)
print "The highest number entered was: " , maxFromList(list)
I think the part of the assignment that might have confused you was initializing an empty list and where to do it. Your main body that collects data is good and does what it should, but you ended up doing too much in your max and min functions. Another misleading part of the assignment is that it suggests writing custom routines for these functions, even though max() and min() already exist in Python and return exactly what you need.
It's another story if you are required to write your own max and min and are not permitted to use the built-in functions. In that case you would need to loop over each value in the list, track the biggest or smallest seen so far, and then return the final value.
Without directly giving you too much of the specific answer, here are some individual examples of the parts you may need...
# looping over the items in a list
value = 1
for item in aList:
    if item == value:
        print "value is 1!"
# basic function with arguments and a return value
def aFunc(start):
    end = start + 1
    return end
print aFunc(1)
# result: 2
# some useful comparison operators
print 1 > 2 # False
print 2 > 1 # True
That should hopefully be enough general information for you to piece together your custom min and max functions. While there are some more advanced and efficient ways to do min and max, I think to start out, a simple for loop over the list would be easiest.
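For reference once you have tried it yourself, here is one way the loop-and-track approach could look, written in Python 3 (input() replaces raw_input()). collect_integers is a hypothetical helper that takes the user's responses as an iterable so it can be tested; raising ValueError on an empty list is a design choice, mirroring what the built-in min() and max() do.

```python
def minFromList(numbers):
    """Return the smallest item by scanning once and tracking the best so far."""
    if not numbers:
        raise ValueError("minFromList() needs at least one number")
    smallest = numbers[0]
    for item in numbers[1:]:
        if item < smallest:
            smallest = item
    return smallest


def maxFromList(numbers):
    """Return the largest item by scanning once and tracking the best so far."""
    if not numbers:
        raise ValueError("maxFromList() needs at least one number")
    largest = numbers[0]
    for item in numbers[1:]:
        if item > largest:
            largest = item
    return largest


def collect_integers(responses):
    """Convert responses to ints, stopping at the first single '.'."""
    numbers = []
    for text in responses:
        if text.strip() == ".":
            break
        numbers.append(int(text))
    return numbers
```

Interactively, iter(lambda: input("Enter an integer (period to end): "), ".") produces responses until the user types a period, so numbers = collect_integers(that iterator) followed by print("The lowest number entered was:", minFromList(numbers)) reproduces the assignment's flow.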
