How to parse a CSV file and compute stats based on that data (Python)

I have a task that requires me to write a Python program that reads a text file containing information about people (name, weight, and height).
The program should then ask the user to enter a name, look for that name in the text file, and print the line that includes that name along with the person's height and weight.
Finally, the program has to work out the average weight and the average height of the people.
The text file is:
James,73,1.82,M
Peter,78,1.80,M
Jay,90,1.90,M
Beth,65,1.53,F
Mags,66,1.50,F
Joy,62,1.34,F
So far I have this code, which prints the line matching the name typed by the user, but I don't know how to extract the heights and the weights:
search = input("Whose information would you like to find? ")
with open("HeightAndWeight.txt", "r") as f:
    for line in f:
        if search in line:
            print(line)

Using the pandas library as suggested, you can do as follows:
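import pandas as pd

# The file has no header row; use the first column (the name) as the index.
df = pd.read_csv('people.txt', header=None, index_col=0)
df.columns = ['weight', 'height', 'sex']
print(df)
       weight  height sex
0
James      73    1.82   M
Peter      78    1.80   M
Jay        90    1.90   M
Beth       65    1.53   F
Mags       66    1.50   F
Joy        62    1.34   F
# numeric_only=True skips the non-numeric 'sex' column; newer pandas versions raise an error without it.
print(df.mean(numeric_only=True))
weight    72.333333
height     1.648333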

You could use Python's built-in csv module to split each line in the file into a list of columns, as follows:
import csv

# newline='' is the recommended way to open a file for the csv module in Python 3
with open('HeightAndWeight.txt', newline='') as f_input:
    csv_input = csv.reader(f_input)
    total_weight = 0
    total_height = 0
    # Counting from 1 leaves index equal to the number of rows after the loop
    for index, row in enumerate(csv_input, start=1):
        total_weight += float(row[1])
        total_height += float(row[2])
    print("Average weight: {:.2f}".format(total_weight / index))
    print("Average height: {:.2f}".format(total_height / index))
This would display the following output:
Average weight: 72.33
Average height: 1.65

The answer is actually in your question's title: use the standard library's csv module to parse your file.
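
A minimal sketch of that suggestion, assuming the layout from the question (name, weight, height, sex, comma-separated, no header row). The function name summarize and the inline sample data are made up for illustration; in practice you would pass the open file instead of the io.StringIO object.

```python
import csv
import io


def summarize(lines, search):
    """Return (matching rows, average weight, average height).

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    matches = []
    weights = []
    heights = []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        name, weight, height = row[0], float(row[1]), float(row[2])
        weights.append(weight)
        heights.append(height)
        # Compare whole names, ignoring case, instead of searching the raw line
        if name.lower() == search.lower():
            matches.append((name, weight, height))
    avg_weight = sum(weights) / len(weights)
    avg_height = sum(heights) / len(heights)
    return matches, avg_weight, avg_height


# Sample data copied from the question (stand-in for the real file)
sample = io.StringIO(
    "James,73,1.82,M\n"
    "Peter,78,1.80,M\n"
    "Jay,90,1.90,M\n"
    "Beth,65,1.53,F\n"
    "Mags,66,1.50,F\n"
    "Joy,62,1.34,F\n"
)
matches, avg_weight, avg_height = summarize(sample, "Beth")
for name, weight, height in matches:
    print(f"{name}: weight {weight}, height {height}")
print(f"Average weight: {avg_weight:.2f}")
print(f"Average height: {avg_height:.2f}")
```

Matching on the name column rather than on the whole line avoids false hits such as a search for "Jay" matching some other field. The sketch does not guard against an empty file, where the averages would divide by zero.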

Use:
splitted_line = line.strip().split(',')
to split the line you just found into four parts, using the comma as a delimiter. You can then get the first part (the name) with splitted_line[0], the second part (the weight) with splitted_line[1], and so on. So, to print out the person's name, weight, and height:
print('%s has a weight of %s and a height of %s meters.' % (splitted_line[0], splitted_line[1], splitted_line[2]))
To get the average weight and height, you need to know how many entries are in your file; then add up the weights and heights and divide each sum by the number of entries. The whole thing would look like:
search = input("Whose information would you like to find? ")
total = 0
weight = 0.0
height = 0.0
with open("HeightAndWeight.txt", "r") as f:
    for line in f:
        total += 1
        splitted_line = line.strip().split(',')
        # float() is needed because heights such as 1.82 are not integers
        weight += float(splitted_line[1])
        height += float(splitted_line[2])
        if search in line:
            print('%s has a weight of %s and a height of %s meters.' % (splitted_line[0], splitted_line[1], splitted_line[2]))
average_weight = weight / total
average_height = height / total
That's one straightforward way to do it, and hopefully easy to understand as well.

search = input("Whose information would you like to find? ")
heights = []
weights = []
with open("HeightAndWeight.txt", "r") as f:
    for line in f:
        if search in line:
            print(line)
        # Collect every person's values so the averages cover the whole file
        heights.append(float(line.split(',')[2]))
        weights.append(float(line.split(',')[1]))
# your calculation stuff

Related

HTML tables not showing up after running Python code

My code is not running. I'm trying to create a table that has the min, max, and average score from 5 tests in my csv file (the tests are the columns; min, max, and average are the rows).
I'm also trying to make a table of the students' names, which can be found in the same csv file. The format I want to use is: last name, first name, average, min, and max.
Lastly, I want to determine which student has the highest average and print their name along with their score, outside of the tables.
Here's a sample for my csv file:
Jailen McQueen  89  30  70  71  26
Cay Phillip     90  10  86   3  50
Gerry Green     87  70  40  90  55
Here is my code so far:
import csv
import webbrowser
csvFile = open('testdata (1).csv') #opens csv file
csvReader = csv.reader(csvFile, delimiter=";")
test1 = []
test2 = []
test3 = []
test4 = []
test5 = []
n = {}
for i in csvReader:
    test1.append(int(i[1]))
    test2.append(int(i[2]))
    test3.append(int(i[3]))
    test4.append(int(i[4]))
    test5.append(int(i[5]))
    n[1[0]] = 1[1:]
csvFile.close()
testNames = {}
for v in n.keys():
    tests = n[x]
    tests = [int(a)for b in tests]
    minimum = min(tests)
    maximum = max(tests)
    mean = sum(tests)/float(len(tests))
    f = x.split()[0]
    l = x.split()[1]
    testNames[f+", "+l] = [mean, minimum, maximum]
html = open('lab3.html','w')
html.write('<html>\n<body>\n<h1>Test data</h1>\n<ul>\n') #starts creating the table
html.write('<table border=‘1’><tr><td><td>Test 1</td><td>Test 2</td><td>Test 3</td><td>Test 4</td><td>Test 5</td></tr>\n')
csvFile.close()
html.close
webbrowser.open_new_tab('lab3.html')
Thank you so much for the help!
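
One possible way to build both tables is sketched below. It assumes each row of the file looks like name;score1;...;score5, which is only a guess based on the delimiter=";" in the code above. The function name build_report and the inline sample are hypothetical, and names are written into the HTML without escaping.

```python
import csv
import io


def build_report(lines):
    """Return (html, best_name, best_average) for rows shaped name;s1;s2;..."""
    students = []
    for row in csv.reader(lines, delimiter=";"):
        if not row:  # skip blank lines
            continue
        first, last = row[0].split(maxsplit=1)
        scores = [int(s) for s in row[1:]]
        students.append((first, last, scores))

    # Transpose the per-student score lists into one tuple per test
    per_test = list(zip(*(scores for _, _, scores in students)))

    def mean(values):
        return sum(values) / len(values)

    out = ["<html>", "<body>", "<h1>Test data</h1>"]

    # Table 1: tests are the columns; min, max and average are the rows
    out.append('<table border="1">')
    heads = "".join(f"<th>Test {n}</th>" for n in range(1, len(per_test) + 1))
    out.append(f"<tr><th></th>{heads}</tr>")
    for label, func in (("Min", min), ("Max", max), ("Average", mean)):
        cells = "".join(f"<td>{func(test):.1f}</td>" for test in per_test)
        out.append(f"<tr><th>{label}</th>{cells}</tr>")
    out.append("</table>")

    # Table 2: one row per student
    out.append('<table border="1">')
    out.append("<tr><th>Last name</th><th>First name</th>"
               "<th>Average</th><th>Min</th><th>Max</th></tr>")
    for first, last, scores in students:
        out.append(
            f"<tr><td>{last}</td><td>{first}</td><td>{mean(scores):.1f}</td>"
            f"<td>{min(scores)}</td><td>{max(scores)}</td></tr>"
        )
    out.append("</table>")

    # Student with the highest average, reported outside the tables
    first, last, scores = max(students, key=lambda s: mean(s[2]))
    best_name, best_average = f"{first} {last}", mean(scores)
    out.append(f"<p>Highest average: {best_name} ({best_average:.1f})</p>")
    out.extend(["</body>", "</html>"])
    return "\n".join(out), best_name, best_average


# Sample rows from the question, joined with the assumed ';' delimiter
sample = io.StringIO(
    "Jailen McQueen;89;30;70;71;26\n"
    "Cay Phillip;90;10;86;3;50\n"
    "Gerry Green;87;70;40;90;55\n"
)
html, best_name, best_average = build_report(sample)
print(best_name, round(best_average, 1))
```

To produce the page, the returned string could be written to lab3.html inside a with block and then opened with webbrowser.open_new_tab('lab3.html'). The with block closes the file before the browser reads it, which the bare html.close in the question (no parentheses, so it is never called) does not do.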

Python the total sales price from a text file

I would like to subtract two numbers that are written in my text file. I have to calculate the profit for each line by computing sales price minus cost price.
The text file contains:
200 123
300 189
111 77
I would like to subtract these values to get the output. Each line gives a different sales profit.

Here's a way to do it:
sales_profits = []
with open('input.txt', 'r') as input_file:
    for line in input_file:
        # Each line holds the sales price and the cost price, separated by whitespace
        sales, cost = map(int, line.split())
        sales_profits.append(sales - cost)
print(sales_profits)  # Your result is here: [77, 111, 34] for the sample file. Do what you want with it :)

How to count occurrences and calculate a rating with the csv module?

You have a CSV file of individual song ratings and you'd like to know the average rating for a particular song. The file will contain a single 1-5 rating for a song per line.
Write a function named average_rating that takes two strings as parameters where the first string represents the name of a CSV file containing song ratings in the format: "YouTubeID, artist, title, rating" and the second parameter is the YouTubeID of a song. The YouTubeID, artist, and title are all strings while the rating is an integer in the range 1-5. This function should return the average rating for the song with the inputted YouTubeID.
Note that each line of the CSV file is an individual rating from a user and that each song may be rated multiple times. As you read through the file you'll need to track a sum of all the ratings as well as how many times the song has been rated to compute the average rating. (My code is below.)
import csv
def average_rating(csvfile, ID):
    with open(csvfile) as f:
        file = csv.reader(f)
        total = 0
        total1 = 0
        total2 = 0
        for rows in file:
            for items in ID:
                if rows[0] == items[0]:
                    total = total + int(rows[3])
                    for ratings in total:
                        total1 = total1 + int(ratings)
                        total2 = total2 + 1
        return total1 / total2
I am getting an error on input ['ratings.csv', 'RH5Ta6iHhCQ']: division by zero. How would I go about resolving the problem?

You can do this by using a pandas DataFrame.
import pandas as pd

# Assumes the CSV has a header row naming the 'YouTubeID' and 'rating' columns
df = pd.read_csv('filename.csv')
song_ratings = df[df['YouTubeID'] == 'RH5Ta6iHhCQ'].rating
total_sum = song_ratings.sum()
n_rating = len(song_ratings)
average = total_sum / n_rating

There are a few confusing things here; I think renaming variables and refactoring would be a smart decision. It might even make things more obvious if one function were tasked with getting all the rows for a specific YouTube ID and another with calculating the average.
import csv

def average_rating(csvfile, id):
    '''
    Calculate the average rating of a YouTube video
    params: - csvfile: the location of the source rating file
            - id: the id of the video we want the average rating of
    '''
    total_ratings = 0
    count = 0
    with open(csvfile) as f:
        file = csv.reader(f)
        for rating in file:
            if rating[0] == id:
                count += 1
                # csv.reader yields strings, so convert before adding
                total_ratings += int(rating[3])
    # Avoid dividing by zero when the id never appears in the file
    if count == 0:
        return 0
    return total_ratings / count
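
The two-function split suggested above could look roughly like this. The helper names ratings_for and mean_rating and the sample rows are made up for illustration. skipinitialspace=True is used on the assumption that fields are separated by a comma followed by a space, as in the format string from the question.

```python
import csv
import io


def ratings_for(lines, song_id):
    """Collect every integer rating whose first column equals song_id."""
    reader = csv.reader(lines, skipinitialspace=True)
    return [int(row[3]) for row in reader if row and row[0] == song_id]


def mean_rating(ratings):
    """Average a list of ratings; return 0 for a song with no ratings."""
    return sum(ratings) / len(ratings) if ratings else 0


def average_rating(csvfile, song_id):
    """Same signature as in the question: file name in, average out."""
    with open(csvfile, newline="") as f:
        return mean_rating(ratings_for(f, song_id))


# Made-up ratings, used here instead of a real file
sample = io.StringIO(
    "RH5Ta6iHhCQ, Artist A, Song A, 4\n"
    "abc123, Artist B, Song B, 2\n"
    "RH5Ta6iHhCQ, Artist A, Song A, 5\n"
)
ratings = ratings_for(sample, "RH5Ta6iHhCQ")
print(mean_rating(ratings))
```

Keeping the file handling in average_rating and the arithmetic in mean_rating means each piece can be checked on its own, for example with an in-memory sample like the one above.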

import csv
def average_rating(csvfile, ID):
    with open(csvfile) as f:
        file = csv.reader(f)
        cont = 0
        total = 0
        for rows in file:
            if rows[0] == ID:
                cont = cont + 1
                total = total + int(rows[3])
        return total / cont
This works.

CSV file for calculating a passing or failing grade from a set of marks

I have a CSV file in the format shown below.
A two-row example of this would be:
first_Name  last_Name  test1  test2  test3  test4
Alex        Brian      11     17     13     24
Pete        Tong       19     14     12     30
My current code does not work. Simply put, I am not sure if I am on the right track.
My current code:
def grader(test1, test2, test3, finalExam):
    first = test1 * .20
    second = test2 * .20
    third = test3 * .20
    fourth = finalExam * .40
    finalGrade = first + second + third + fourth
    return finalGrade

def gradeScores(FinalGrade):
    if FinalGrade >= 90 and FinalGrade <= 100:
        return("You received an A")
    elif FinalGrade >= 80 and FinalGrade < 90:
        return("You received a B")
    elif FinalGrade >= 70 and FinalGrade < 80:
        return("You received a C")
    elif FinalGrade >= 60 and FinalGrade < 70:
        return("You received a D")
    else:
        return("Sorry, you received an F")
I also have this code, which reads the CSV file and displays it in the output window.
import csv

with open("studentGradeFrom.csv") as csvfile:
    readFile = csv.reader(csvfile, delimiter=",", quotechar="¦")
    for row in readFile:
        print(row)
However, since I am new to Python, I am looking for help creating a Python script that will look at the results and calculate whether each student has passed or failed.
I would like the results to go in a separate file. So I am guessing that I will need to read from one CSV file and write to a different one to show whether a student has failed or has an overall passing percentage.
with open("studentGradeTo.csv", 'w') as avg:  # used to write to after the calculation is complete
    loadeddata = open("studentGradeFrom.csv", 'r')  # used to read the data from the CSV before calculation
    writer = csv.writer(avg)
    readloaded = csv.reader(loadeddata)
    listloaded = list(readloaded)
Now my question: how would I go about doing this, given one file with data for roughly 50 different students, without changing the CSV that holds the student grades and only changing the CSV file that shows passing or failing grades? Any help would be appreciated.
Edit: I forgot to mention that the first test is worth 20% of the final grade, as are the second and third tests; these three total 60% of the final grade. The fourth test is worth 40% of the final grade.

Here is a quick example of the concepts using only the csv library (you could certainly optimize a lot of this, but it should work for the example).
import csv

student_grades = []
# First open up your file containing the raw student grades
with open("studentGradeFrom.csv", "r", newline="") as file:
    # Setup your reader with whatever your settings actually are
    csv_file = csv.DictReader(file, delimiter=",", quotechar='"')
    # Cycle through each row of the csv file
    for row in csv_file:
        # Calculate the numerical grade of the student
        # (grader and gradeScores are the functions from the question)
        grade = grader(
            int(row["test1"]),
            int(row["test2"]),
            int(row["test3"]),
            int(row["test4"])
        )
        # Calculate the letter score for the student
        score = gradeScores(grade)
        # Assemble all the data into a dictionary
        # Only need to save fields you need in the final output
        # Keys must match the header row exactly, including capitalization
        student_grades.append({
            "first_Name": row["first_Name"],
            "last_Name": row["last_Name"],
            "test1": row["test1"],
            "test2": row["test2"],
            "test3": row["test3"],
            "test4": row["test4"],
            "grade": grade,
            "score": score
        })
# Open up a new csv file to save all the grades
# (a different file name, so the source grades are left untouched)
with open("studentGradeTo.csv", "w", newline="") as file:
    # List of column names to use as a header for the file
    # These will be used to match the dictionary keys set above
    # Only need to list the fields you saved above
    column_names = [
        "first_Name", "last_Name", "test1", "test2", "test3",
        "test4", "grade", "score"
    ]
    # Create the csv writer, using the above headers
    csv_file = csv.DictWriter(file, column_names)
    # Write the header
    csv_file.writeheader()
    # Write each dictionary to the csv file
    for student in student_grades:
        csv_file.writerow(student)
You would need to fine tune this to your exact requirements, but it will hopefully get you going in the right direction. Most of this is documented in the official documentation if you need a specific reference: https://docs.python.org/3.6/library/csv.html.

This kind of task is suited for the pandas library.
Here is one solution, which is adaptable should your requirements change.
import pandas as pd

df = pd.read_csv('studentGradeFrom.csv')
#   first_Name last_Name  test1  test2  test3  test4
# 0       Alex     Brian     11     17     13     24
# 1       Pete      Tong     19     14     12     30

# (lower, upper) mark boundaries mapped to letter grades
boundaries = {(90, 100.01): 'A',
              (80, 90): 'B',
              (70, 80): 'C',
              (60, 70): 'D',
              (0, 60): 'F'}

def grade_calc(x, b):
    # Return the grade of the first range that contains x, or None if no range matches
    return next((v for k, v in b.items() if k[0] <= x <= k[1]), None)

df['FinalMark'] = 0.2*df['test1'] + 0.2*df['test2'] + 0.2*df['test3'] + 0.4*df['test4']
df['FinalGrade'] = df['FinalMark'].apply(grade_calc, b=boundaries)
#   first_Name last_Name  test1  test2  test3  test4  FinalMark FinalGrade
# 0       Alex     Brian     11     17     13     24       17.8          F
# 1       Pete      Tong     19     14     12     30       21.0          F
df.to_csv('studentGradeTo.csv', index=False)

Show the 5 cities with the highest temperatures from a text file

I have a text file with some cities and temperatures, like this:
City 1 16
City 2 4
...
City100 20
I'm showing the city with the highest temperature with the code below.
But I would like to show the 5 cities with the highest temperatures. Do you see a way to do this? I've been running some tests, but I always end up showing the same city 5 times.
#!/usr/bin/env python
import sys

current_city = None
current_max = 0
city = None
for line in sys.stdin:
    line = line.strip()
    city, temperature = line.rsplit('\t', 1)
    try:
        temperature = float(temperature)
    except ValueError:
        continue
    if temperature > current_max:
        current_max = temperature
        current_city = city
print('%s\t%s' % (current_city, current_max))

You can use heapq.nlargest:
import sys
import heapq

# Read (city, temperature) pairs
pairs = [
    (c, float(t))
    for line in sys.stdin for c, t in [line.strip().rsplit('\t', 1)]
]
# Find the 5 largest pairs based on the second field, which is the temperature
for city, temperature in heapq.nlargest(5, pairs, key=lambda p: p[1]):
    print(city, temperature)

I like pandas. This is not a complete answer, but I like to encourage people to research on their own. Check this out:
import pandas as pd

listA = [1, 2, 3, 4, 5, 6, 7, 8, 9]
df = pd.DataFrame(listA)
# sort_values returns a new sorted DataFrame; tail() shows its last five rows
df.sort_values(0).tail()
With pandas, you'll want to learn about Series and DataFrames. DataFrames have a lot of functionality: you can name your columns, create them directly from input files, and sort by almost anything. There are head and tail methods, named after the common Unix commands (beginning and end), and you can specify the number of rows returned, and so on. I liked the book "Python for Data Analysis".

Store the list of temperatures and cities in a list. Sort the list. Then, take the last 5 elements: they will be your five highest temperatures.
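
The steps above can be sketched as follows, assuming tab-separated lines like those the question's code expects. The function name top_cities and the sample readings are invented for the example; with real input you would pass sys.stdin instead of the list.

```python
def top_cities(lines, count=5):
    """Return the `count` hottest (city, temperature) pairs, hottest first."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        try:
            city, temperature = line.rsplit("\t", 1)
            pairs.append((city, float(temperature)))
        except ValueError:
            # No tab on the line, or the temperature is not a number
            continue
    # Sort ascending by temperature, then take the last `count` elements
    pairs.sort(key=lambda pair: pair[1])
    return list(reversed(pairs[-count:]))


# Invented readings for illustration
sample = [
    "City 1\t16",
    "City 2\t4",
    "City 3\t20",
    "City 4\t31",
    "City 5\t-2",
    "City 6\t25",
    "City 7\t18",
]
hottest = top_cities(sample)
for city, temperature in hottest:
    print(city, temperature)
```

Sorting the whole list is fine for a hundred cities; for very large inputs, heapq.nlargest avoids the full sort.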

Read the data into a list, sort the list, and show the first 5:
import sys

cities = []
for line in sys.stdin:
    line = line.strip()
    city, temp = line.rsplit('\t', 1)
    cities.append((city, float(temp)))
# Each list element is a (city, temp) tuple; negate the temperature to sort descending
cities.sort(key=lambda pair: -pair[1])
for city, temp in cities[:5]:
    print(city, temp)
This stores the (city, temperature) pairs in a list, which is then sorted. The key function tells the sort to order by temperature, descending, so the first 5 elements of the list, [:5], are the five highest-temperature cities.

The following code does what you need:
from operator import itemgetter

fname = "haha.txt"
with open(fname) as f:
    content = f.readlines()
# Split on the last tab only, because city names can contain spaces
content = [line.strip().rsplit('\t', 1) for line in content]
for line in content:
    line[1] = float(line[1])
# Sort ascending by temperature, so the hottest cities end up last
content = sorted(content, key=itemgetter(1))
print(content)
To get the city with the highest temperature:
print(content[-1])
To get the 5 cities with the highest temperatures:
print(content[-5:])