Problems reading a file in, and editing certain contents of it - python

I have a file in which each line has the form
first name(space)last name(tab)grade.
For example:
Wanda Barber 96
I'm having trouble reading this in as a list and then editing the number.
My current code is,
def TopStudents(n):
    original = open(n)
    contents = original.readlines()
    x = contents.split('/t')
    for y in x[::2]:
        y - 100
        if y > 0: (????)
Here is the point where I'm confused. I am just trying to get the first and last names of students who scored over 100%. I thought of creating a new list for students that meet this qualification, but I'm not sure how I would write the corresponding first and last name. I know I need to take the stride of every other location in the list, as odd will always be the first and last names. Thank you in advance for the help!

There are several things wrong with your code:
- The opened file must be closed (#1)
- readlines must be called as a function, using () (#2)
- The split uses a forward slash (/) instead of a backslash (\), so '/t' is not a tab character (#3)
- Slicing with a stride (x[::2]) is not the best way to loop if you want to access every member (#4)
- A for statement must end with a : (#5)
- You must store the result of that calculation somewhere (#6)
def TopStudents(n):
    original = open(n) #1
    contents = original.readlines #2
    x = contents.split('/t') #3
    for y in x[::2] #4, #5
        y - 100 #6
        if y > 0:
That said, a fixed version could be:
def TopStudents(n):
    original = open(n, 'r')
    for line in original:
        name, score = line.split('\t')
        # If needed, you could split the name into first and last name:
        # first_name, last_name = name.split(' ')
        # 'score' is a string, we must convert it to an int before comparing to one, so...
        score = int(score)
        if score > 100:
            print("The student " + name + " has the score " + str(score))
    original.close() #1 - Closed the file
Note: I have focused on readability, with several comments to help you understand the code.

I always prefer to use 'with open()' because it closes the file automatically. For simplicity I used a text file with comma-separated values, but you can just replace the comma with \t.
def TopStudents():
    with open('temp.txt', 'r') as original:
        contents = list(filter(None, (line.strip().strip('\n') for line in original)))
    x = list(part.split(',') for part in contents)
    for y in x:
        if int(y[1]) > 100:
            print(y[0], y[1])
TopStudents()
This opens and loads all lines into contents as a list, removing blank lines and line breaks. Then it separates into a list of lists.
You then iterate through each list in x, looking for the second value (y[1]) which is your grade. If the int() is greater than 100, print each segment of y.
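Pulling the points from both answers together, here is a minimal sketch that returns the names instead of printing them. It assumes each non-blank line is "name<TAB>grade" with an integer grade; the function name top_students and the threshold parameter are my own choices, not something from the question.

```python
def top_students(lines, threshold=100):
    """Return the names of students whose grade is above threshold.

    lines can be any iterable of strings, e.g. an open file.
    Assumes each non-blank line is 'First Last<TAB>grade'.
    """
    names = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        name, score = line.split("\t")
        # The grade is read as text, so convert it before comparing
        if int(score) > threshold:
            names.append(name)
    return names
```

With a real file you would call it as top_students(f) inside a "with open(filename) as f:" block, so the file is closed for you.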

Related

CS50 'DNA': Ways to speed up my Week 6 'dna.py' program?

So for this problem I had to create a program that takes in two arguments. A CSV database like this:
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
And a DNA sequence like this:
TAAAAGGTGAGTTAAATAGAATAGGTTAAAATTAAAGGAGATCAGATCAGATCAGATCTATCTATCTATCTATCTATCAGAAAAGAGTAAATAGTTAAAGAGTAAGATATTGAATTAATGGAAAATATTGTTGGGGAAAGGAGGGATAGAAGG
My program works by first getting the "Short Tandem Repeat" (STR) headers from the database (AGATC, etc.), then counting the highest number of times each STR repeats consecutively within the sequence. Finally, it compares these counted values to the values of each row in the database, printing out a name if a match is found, or "No match" otherwise.
The program works for sure, but it is ridiculously slow whenever it is run using the larger database provided, to the point where the terminal pauses for an entire minute before returning any output. Unfortunately, this is causing the 'check50' marking system to time out and return a negative result when testing with this large database.
I'm presuming the slowdown is caused by the nested loops within the 'STR_count' function:
def STR_count(sequence, seq_len, STR_array, STR_array_len):
    # Creates a list to store max recurrence values for each STR
    STR_count_values = [0] * STR_array_len
    # Temp value to store current count of STR recurrence
    temp_value = 0
    # Iterates over each STR in STR_array
    for i in range(STR_array_len):
        STR_len = len(STR_array[i])
        # Iterates over each sequence element
        for j in range(seq_len):
            # Ensures it's still physically possible for STR to be present in sequence
            while (seq_len - j >= STR_len):
                # Gets sequence substring of length STR_len, starting from jth element
                sub = sequence[j:(j + (STR_len))]
                # Compares current substring to current STR
                if (sub == STR_array[i]):
                    temp_value += 1
                    j += STR_len
                else:
                    # Ensures current STR_count_value is highest
                    if (temp_value > STR_count_values[i]):
                        STR_count_values[i] = temp_value
                    # Resets temp_value to break count, and pushes j forward by 1
                    temp_value = 0
                    j += 1
        i += 1
    return STR_count_values
And the 'DNA_match' function:
# Searches database file for DNA matches
def DNA_match(STR_values, arg_database, STR_array_len):
    with open(arg_database, 'r') as csv_database:
        database = csv.reader(csv_database)
        name_array = [] * (STR_array_len + 1)
        next(database)
        # Iterates over one row of database at a time
        for row in database:
            name_array.clear()
            # Copies entire row into name_array list
            for column in row:
                name_array.append(column)
            # Converts name_array number strings to actual ints
            for i in range(STR_array_len):
                name_array[i + 1] = int(name_array[i + 1])
            # Checks if a row's STR values match the sequence's values, prints the row name if match is found
            match = 0
            for i in range(0, STR_array_len, + 1):
                if (name_array[i + 1] == STR_values[i]):
                    match += 1
                if (match == STR_array_len):
                    print(name_array[0])
                    exit()
        print("No match")
        exit()
However, I'm new to Python, and haven't really had to consider speed before, so I'm not sure how to improve upon this.
I'm not particularly looking for people to do my work for me, so I'm happy for any suggestions to be as vague as possible. And honestly, I'll value any feedback, including stylistic advice, as I can only imagine how disgusting this code looks to those more experienced.
Here's a link to the full program, if helpful.
Thanks :) x
Thanks for providing a link to the entire program. It seems needlessly complex, but I'd say it's just a lack of knowing what features are available to you. I think you've already identified the part of your code that's causing the slowness - I haven't profiled it or anything, but my first impulse would also be the three nested loops in STR_count.
Here's how I would write it, taking advantage of the Python standard library. Every entry in the database corresponds to one person, so that's what I'm calling them. people is a list of dictionaries, where each dictionary represents one line in the database. We get this for free by using csv.DictReader.
To find the matches in the sequence, for every short tandem repeat in the database, we create a regex pattern (the current short tandem repeat, repeated one or more times). If there is a match in the sequence, the total number of repetitions is equal to the length of the match divided by the length of the current tandem repeat. For example, if AGATCAGATCAGATC is present in the sequence, and the current tandem repeat is AGATC, then the number of repetitions will be len("AGATCAGATCAGATC") // len("AGATC") which is 15 // 5, which is 3.
count is just a dictionary that maps short tandem repeats to their corresponding number of repetitions in the sequence. Finally, we search for a person whose short tandem repeat counts match those of count exactly, and print their name. If no such person exists, we print "No match".
def main():
    import argparse
    from csv import DictReader
    import re
    parser = argparse.ArgumentParser()
    parser.add_argument("database_filename")
    parser.add_argument("sequence_filename")
    args = parser.parse_args()
    with open(args.database_filename, "r") as file:
        reader = DictReader(file)
        short_tandem_repeats = reader.fieldnames[1:]
        people = list(reader)
    with open(args.sequence_filename, "r") as file:
        sequence = file.read().strip()
    count = dict(zip(short_tandem_repeats, [0] * len(short_tandem_repeats)))
    for short_tandem_repeat in short_tandem_repeats:
        # The lookahead tries a run at every position; re.search alone would
        # stop at the first run, which is not necessarily the longest one
        pattern = f"(?=(({short_tandem_repeat}){{1,}}))"
        runs = [len(match.group(1)) for match in re.finditer(pattern, sequence)]
        if not runs:
            continue
        count[short_tandem_repeat] = max(runs) // len(short_tandem_repeat)
    try:
        person = next(person for person in people if all(int(person[k]) == count[k] for k in short_tandem_repeats))
        print(person["name"])
    except StopIteration:
        print("No match")
    return 0
if __name__ == "__main__":
    import sys
    sys.exit(main())
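If you would rather not use regular expressions, the repetition count can also be checked with a plain scan. This is only a sketch of the idea, not the code from the linked program: longest_run is a name I made up, and it simply tries every starting position and counts how many copies of the repeat sit end to end from there.

```python
def longest_run(sequence, unit):
    """Return the largest number of back-to-back copies of unit in sequence."""
    if not unit:
        return 0  # an empty repeat would otherwise loop forever
    best = 0
    for start in range(len(sequence)):
        run = 0
        # Count copies of unit laid end to end, beginning at this position
        while sequence.startswith(unit, start + run * len(unit)):
            run += 1
        best = max(best, run)
    return best


print(longest_run("AGATCAGATCAGATC", "AGATC"))  # 15 // 5 copies, i.e. 3
```

It is not the fastest possible approach, but str.startswith does the comparison in C, and there is no manual index bookkeeping to get wrong.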

Need help editing a value in a list from a text file.

I am not able to add a number to a value in my list, which is stored in a text file, and I don't know how to do it.
Code so far:
def add_player_points():
    # Allows the user to add a points onto the players information.
    L = open("players.txt","r+")
    name = raw_input("\n\tPlease enter the name of the player whose points you wish to add: ")
    for line in L:
        s = line.strip()
        string = s.split(",")
        if name == string[0]:
            opponent = raw_input("\n\t Enter the name of the opponent: ")
            points = raw_input("\n\t Enter how many points you would like to add?: ")
            new_points = string[5] + points
    L.close()
This is a sample of a key in the text file. There are about 100 in the file:
Joe,Bloggs,J.bloggs#anemailaddress.com,01269 512355, 1, 0, 0, 0,
                                                        ^
The value that I would like this number to be added to is the 0 beside the number already in there, indicated by an arrow below it. The text file is called players.txt as shown.
A full code answer would be helpful.
This is likely the problem:
new_points = string[5] + points
You are adding a string to another string, which concatenates them. You need to convert both to integers first:
new_points = int(string[5]) + int(points)
This is not checking for incorrect input, but assuming the file format is correct and the user input too, it should work.
Edit: If you want to update the file with the new information, a better way is to divide the problem into 3 parts: 1) read the player information into an appropriate data structure, e.g. a dictionary using the player name as key, 2) make the changes in the dictionary, and finally 3) save the changes back to the file. So your code should be split into 3 functions. Some help can be found here.
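A rough sketch of that three-part split, written for Python 3. The function names, the default column index of 5 (taken from string[5] in the question) and the choice to key on the first field are all assumptions on my part. Note that it normalises the spacing around the commas when writing back.

```python
def parse_players(lines):
    """Part 1: map each player's first field (the name) to its list of fields."""
    players = {}
    for line in lines:
        fields = [field.strip() for field in line.strip().split(",")]
        if fields[0]:  # ignore blank lines
            players[fields[0]] = fields
    return players


def add_points(players, name, points, column=5):
    """Part 2: add points to one numeric column of the named player."""
    fields = players[name]
    # Both values are text, so convert before adding, then store as text again
    fields[column] = str(int(fields[column]) + int(points))


def format_players(players):
    """Part 3: turn the dictionary back into lines ready to be written."""
    return [",".join(fields) + "\n" for fields in players.values()]
```

To use it on the real file, read with "with open('players.txt') as f: players = parse_players(f)", call add_points, then reopen the file with mode 'w' and pass format_players(players) to f.writelines.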

Iterating over list with while and for loop in python - issues

I'm trying to query the Twitter API with a list of names and get their friends list. The API part is fine, but I can't figure out how to go through the first 5 names, pull the results, wait for a while to respect the rate limit, then do it again for the next 5 until the list is over. The bit of the code I'm having trouble is this:
first = 0
last = 5
while last < 15: #while last group of 5 items is lower than number of items in list#
    for item in list[first:last]: #parses each n twitter IDs in the list#
        results = item
        text_file = open("output.txt", "a") #creates empty txt output / change path to desired output#
        text_file.write(str(item) + "," + results + "\n") #adds twitter ID, resulting friends list, and a line skip to the txt output#
        text_file.close()
        first = first + 5 #updates list navigation to move on to next group of 5#
        last = last + 5
        time.sleep(5) #suspends activities for x seconds to respect rate limit#
Shouldn't this script go through the first 5 items in the list, add them to the output file, then change the first:last argument and loop it until the "last" variable is 15 or higher?
No, because your indentation is wrong. Everything happens inside the for loop, so it'll process one item, then change first and last, then sleep...
Move the last three lines back one indent, so that they line up with the for statement. That way they'll be executed once the first five have been done.
Daniel found the issue, but here are some suggested code improvements:
first, last = 0, 5
with open("output.txt", "a") as text_file:
    while last < 15:
        for twitter_ID in twitter_IDs[first:last]:
            text_file.write("{0},{0}\n".format(twitter_ID))
        first += 5
        last += 5
        time.sleep(5)
As you can see, I removed the results = item line as it seemed redundant, used with open(...) so the file is closed automatically, and used += for the increments.
Can you explain why you were doing results = item?
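Another option is to let range do the first/last bookkeeping. This is just a sketch: batches and process_in_batches are names I invented, and handle stands in for whatever you do with each ID (the API call and the write to the file). Stepping range by the batch size also picks up a final group that has fewer than 5 items.

```python
import time


def batches(items, size):
    """Yield consecutive slices of items, each holding at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(items, handle, size=5, pause=5.0):
    """Call handle(item) for every item, sleeping after each batch."""
    for batch in batches(items, size):
        for item in batch:
            handle(item)
        # Wait once per batch (not once per item) to respect the rate limit
        time.sleep(pause)
```

Note that the sleep sits at the same indent as the inner for loop, which is exactly the fix described above.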

Performing Binary Search in List - Python

I'm working on an assignment for my computer class and I'm having a bit of trouble with a question. Question 1 and 2 kind of overlap, so I'll post both, and the code I have so far.
Question 1: Write a function called readCountries that reads a file and returns a list of countries. The countries should be read from this file, which contains an incomplete list of countries with their area and population. Each line in this file represents one country in the following format:
name, area(in km2), population
When opening the file your function should handle any exceptions that may occur. Your function should completely read in the file, and separate the data into a 2-dimensional list. You may need to split and strip the data as appropriate. Numbers should be converted to their correct types. Your function should return this list so that you can use it in the remaining questions.
This is my code for this part:
def readCountries():
    countryList = []
    for line in open('Countries.py', 'r'):
        with open('Countries.py', 'r') as countriesFile:
            countries = countriesFile.read()
        countryList.append(line.strip().split())
    return countryList
Question 2: Write a function called getCountry that takes a string representing a country name as a parameter. First call your answer from question 1 to get the list of countries, then do a binary search through the list and print the country's information if found.
This is my code for this part:
countryList = readCountries()
def getCountry(countryList, name):
    lo, hi = 0, len(countryList) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        country = countryList[mid]
        test_name = country[0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            return country
    return countries[lo] if countries[lo][0] == name else None
The output is this: ['Canada', '9976140.0', '35295770'] which is partially what I need, but how would I get it to look like this: Canada, Area: 9976140.0, Population: 35295770?
Well, one obvious problem is this line:
readCountries()
should be like this:
countryList = readCountries()
You got half way there by having the readCountries function return the list, but you never actually assign anything to what it's returning, so it just goes off into nowhere.
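As for the output format asked about in the question, one way is to unpack the three fields and build the string yourself. The sketch below assumes each line really is "name, area, population" with no extra commas inside the name or the numbers; parse_country and format_country are hypothetical helper names, not part of the assignment.

```python
def parse_country(line):
    """Split 'name, area, population' into [str, float, int]."""
    name, area, population = (part.strip() for part in line.split(","))
    return [name, float(area), int(population)]


def format_country(country):
    """Build the 'Name, Area: ..., Population: ...' string from one record."""
    name, area, population = country
    return "{}, Area: {}, Population: {}".format(name, area, population)


print(format_country(["Canada", "9976140.0", "35295770"]))
# Canada, Area: 9976140.0, Population: 35295770
```

format_country works whether the numbers are still strings or have already been converted, since format just calls str on each value.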

max min and average looking up file python

I'm trying to create a program that asks for a name of a file, opens the file, and determines the maximum and minimum values in the files, and also computes the average of the numbers in the file. I want to print the max and min values, and return the average number of values in the file. The file has only one number per line, which consists of many different numbers top to bottom. Here is my program so far:
def summaryStats():
    fileName = input("Enter the file name: ") #asking user for input of file
    file = open(fileName)
    highest = 1001
    lowest = 0
    sum = 0
    for element in file:
        if element.strip() > (lowest):
            lowest = element
        if element.strip() < (highest):
            highest = element
        sum += element
    average = sum/(len(file))
    print("the maximum number is ") + str(highest) + " ,and the minimum is " + str(lowest)
    file.close()
    return average
When I run my program, it is giving me this error:
summaryStats()
Enter the file name: myFile.txt
Traceback (most recent call last):
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
# Used internally for debug sandbox under external interpreter
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 8, in summaryStats
builtins.TypeError: unorderable types: str() > int()
I think I'm struggling determining which part to make a string. What do you guys think?
You are comparing two incompatible types, str and int. You need to make sure you are comparing similar types. You may want to rewrite your for loop to include a conversion so that you are comparing two int values.
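count = 0
for element in file:
    element_value = int(element.strip())
    # Note: as in the question, 'lowest' starts at 0 and 'highest' at 1001,
    # so 'lowest' ends up holding the largest value and 'highest' the smallest
    if element_value > (lowest):
        lowest = element_value
    if element_value < (highest):
        highest = element_value
    sum += element_value
    count += 1
# A file object has no len(), so divide by the number of lines counted
average = sum/count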
When python reads in files, it reads them in as type str for the whole line. You make the call to strip to remove surrounding white space and newline characters. You then need to parse the remaining str into the correct type (int) for comparison and manipulation.
You should read through your error messages; they are there to tell you where and why your code failed to run. The traceback shows where the error took place. The line
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 8, in summaryStats
tells you to examine line 8, which is where the error occurs.
The next line:
builtins.TypeError: unorderable types: str() > int()
Tells you what is going wrong. A quick search through the python docs locates the description of the error. An easy way to search for advice is to look in the documentation for the language and maybe search for the entire error message. It is likely you are not the first person with this problem and that there is probably a discussion and solution advice available to figure out your specific error.
Lines like these:
if element.strip() > (lowest):
Should probably be explicitly converting to a number. Currently you're comparing a str to an int. int ignores surrounding whitespace, so int(' 1 ') is 1:
if int(element.strip()) > lowest:
Also, you could do this like so:
# Assuming test.txt is a file with a number on each line.
with open('test.txt') as f:
    nums = [int(x) for x in f.readlines()]
print('Max: {0}'.format(max(nums)))
print('Min: {0}'.format(min(nums)))
print('Average: {0}'.format(sum(nums) / float(len(nums))))
When you call open(filename), you are constructing a file object. You can loop over it line by line, but every line comes back as a str, and the file object itself has no length, so len(file) fails.
If each value is on its own line, then after creating the file object, call:
lines = file.readlines()
Then loop through those lines and convert to int:
for line in lines:
    value = int(line)
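Putting the advice from the answers together, a minimal Python 3 sketch could look like this. It assumes one integer per line; summary_stats is my own name for it, and it takes any iterable of lines (an open file works) rather than asking for the file name itself.

```python
def summary_stats(lines):
    """Return (maximum, minimum, average) of the integers in lines.

    lines can be any iterable of strings, e.g. an open file.
    Blank lines are skipped; anything else must parse as an int.
    """
    values = [int(line) for line in lines if line.strip()]
    if not values:
        raise ValueError("no numbers found")
    return max(values), min(values), sum(values) / len(values)
```

Inside "with open(fileName) as f:" you would write highest, lowest, average = summary_stats(f), then print highest and lowest and return average.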
