How to count occurrences and calculate a rating with the csv module? - python

You have a CSV file of individual song ratings and you'd like to know the average rating for a particular song. The file will contain a single 1-5 rating for a song per line.
Write a function named average_rating that takes two strings as parameters where the first string represents the name of a CSV file containing song ratings in the format: "YouTubeID, artist, title, rating" and the second parameter is the YouTubeID of a song. The YouTubeID, artist, and title are all strings while the rating is an integer in the range 1-5. This function should return the average rating for the song with the inputted YouTubeID.
Note that each line of the CSV file is an individual rating from a user, and that each song may be rated multiple times. As you read through the file, you'll need to track the sum of all the ratings as well as how many times the song has been rated in order to compute the average rating. My code is below:
import csv
def average_rating(csvfile, ID):
    with open(csvfile) as f:
        file = csv.reader(f)
        total = 0
        total1 = 0
        total2 = 0
        for rows in file:
            for items in ID:
                if rows[0] == items[0]:
                    total = total + int(rows[3])
                    for ratings in total:
                        total1 = total1 + int(ratings)
                        total2 = total2 + 1
    return total1 / total2
I am getting an error on input ['ratings.csv', 'RH5Ta6iHhCQ']: division by zero. How would I go about resolving the problem?

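You can do this with a pandas DataFrame. This assumes the CSV file has a header row naming the YouTubeID and rating columns:
import pandas as pd
df = pd.read_csv('ratings.csv')
# Select the rating column of the rows that match the requested ID
song_ratings = df[df['YouTubeID'] == 'RH5Ta6iHhCQ']['rating']
total_sum = song_ratings.sum()
n_rating = len(song_ratings)
average = total_sum / n_rating  # equivalent to song_ratings.mean()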

There are a few confusing things here; I think renaming variables and refactoring would be a smart decision. It might even make things more obvious if one function were responsible for getting all the rows for a specific YouTube ID and another for calculating the average.
import csv

def average_rating(csvfile, id):
    '''
    Calculate the average rating of a YouTube video
    params: - csvfile: the location of the source rating file
            - id: the id of the video we want the average rating of
    '''
    total_ratings = 0
    count = 0
    with open(csvfile) as f:
        file = csv.reader(f)
        for rating in file:
            if rating[0] == id:
                count += 1
                # csv.reader returns strings, so convert the rating before adding it
                total_ratings += int(rating[3])
    # Avoid dividing by zero when the ID never appears in the file
    if count == 0:
        return 0
    return total_ratings / count
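The two-function split suggested in this answer could look roughly like the sketch below. It is only a sketch: the names `ratings_for` and `average` are hypothetical, it assumes the YouTubeID is the first column and the rating the fourth, and it returns `None` rather than 0 when the ID never appears, so a missing song cannot be mistaken for a real average.

```python
import csv


def ratings_for(csvfile, youtube_id):
    """Yield the integer rating from every row whose first column is youtube_id."""
    with open(csvfile, newline='') as f:
        for row in csv.reader(f):
            # Skip blank lines; strip() tolerates "ID, artist, ..." spacing
            if row and row[0].strip() == youtube_id:
                yield int(row[3])  # int() also ignores the leading space in " 4"


def average(values):
    """Mean of an iterable of numbers, or None if it is empty (no division by zero)."""
    values = list(values)
    if not values:
        return None
    return sum(values) / len(values)


def average_rating(csvfile, youtube_id):
    return average(ratings_for(csvfile, youtube_id))
```

Keeping the row filtering in a generator means the averaging function never needs to know about the file format.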

import csv
def average_rating(csvfile, ID):
    with open(csvfile) as f:
        file = csv.reader(f)
        cont = 0
        total = 0
        for rows in file:
            if rows[0] == ID:
                cont = cont + 1
                total = total + int(rows[3])
    return total/cont
This works, guys.

Related

Extracting data from a .txt file without using modules

I am taking a course in Python, and one of the problem sets is as follows:
Read in the contents of the file SP500.txt which has monthly data for 2016 and 2017 about the S&P 500 closing prices as well as some other financial indicators, including the “Long Term Interest Rate”, which is interest rate paid on 10-year U.S. government bonds.
Write a program that computes the average closing price (the second column, labeled SP500) and the highest long-term interest rate. Both should be computed only for the period from June 2016 through May 2017. Save the results in the variables mean_SP and max_interest.
SP500.txt:
Date,SP500,Dividend,Earnings,Consumer Price Index,Long Interest Rate,Real Price,Real Dividend,Real Earnings,PE10
1/1/2016,1918.6,43.55,86.5,236.92,2.09,2023.23,45.93,91.22,24.21
2/1/2016,1904.42,43.72,86.47,237.11,1.78,2006.62,46.06,91.11,24
3/1/2016,2021.95,43.88,86.44,238.13,1.89,2121.32,46.04,90.69,25.37
4/1/2016,2075.54,44.07,86.6,239.26,1.81,2167.27,46.02,90.43,25.92
5/1/2016,2065.55,44.27,86.76,240.23,1.81,2148.15,46.04,90.23,25.69
6/1/2016,2083.89,44.46,86.92,241.02,1.64,2160.13,46.09,90.1,25.84
7/1/2016,2148.9,44.65,87.64,240.63,1.5,2231.13,46.36,91,26.69
8/1/2016,2170.95,44.84,88.37,240.85,1.56,2251.95,46.51,91.66,26.95
9/1/2016,2157.69,45.03,89.09,241.43,1.63,2232.83,46.6,92.19,26.73
10/1/2016,2143.02,45.25,90.91,241.73,1.76,2214.89,46.77,93.96,26.53
11/1/2016,2164.99,45.48,92.73,241.35,2.14,2241.08,47.07,95.99,26.85
12/1/2016,2246.63,45.7,94.55,241.43,2.49,2324.83,47.29,97.84,27.87
1/1/2017,2275.12,45.93,96.46,242.84,2.43,2340.67,47.25,99.24,28.06
2/1/2017,2329.91,46.15,98.38,243.6,2.42,2389.52,47.33,100.89,28.66
3/1/2017,2366.82,46.38,100.29,243.8,2.48,2425.4,47.53,102.77,29.09
4/1/2017,2359.31,46.66,101.53,244.52,2.3,2410.56,47.67,103.74,28.9
5/1/2017,2395.35,46.94,102.78,244.73,2.3,2445.29,47.92,104.92,29.31
6/1/2017,2433.99,47.22,104.02,244.96,2.19,2482.48,48.16,106.09,29.75
7/1/2017,2454.1,47.54,105.04,244.79,2.32,2504.72,48.52,107.21,30
8/1/2017,2456.22,47.85,106.06,245.52,2.21,2499.4,48.69,107.92,29.91
9/1/2017,2492.84,48.17,107.08,246.82,2.2,2523.31,48.76,108.39,30.17
10/1/2017,2557,48.42,108.01,246.66,2.36,2589.89,49.05,109.4,30.92
11/1/2017,2593.61,48.68,108.95,246.67,2.35,2626.9,49.3,110.35,31.3
12/1/2017,2664.34,48.93,109.88,246.52,2.4,2700.13,49.59,111.36,32.09
My solution (correct but not optimal):
file = open("SP500.txt", "r")
content = file.readlines()
# List that will hold the range of months we need
data = []
for line in content:
    # Get a list of values for each line
    values = line.split(',')
    # Keep lines with the required dates (June to December 2016)
    for i in range(6, 13):
        month_range = f"{i}/1/2016"
        if month_range == values[0]:
            data.append(values)
    # Keep lines with the required dates (January to May 2017)
    for i in range(1, 6):
        month_range = f"{i}/1/2017"
        if month_range == values[0]:
            data.append(values)
sum_total = 0
max_interest = 0
# Loop through the data of our required months
for entry in data:
    # Add the closing price to the running total
    sum_total += float(entry[1])
    # Find the highest interest rate in the list
    if max_interest < float(entry[5]):
        max_interest = float(entry[5])
mean_SP = sum_total / len(data)
I'm self-learning these concepts and would love to learn a better way of implementing this solution. My code seems borderline hard-coded (the exact date in values[0]), and I imagine it would be error-prone for bigger problems, especially with all the looping being done, which seems excessive for such a simple problem.
Thanks in advance.
EDIT:
New code (based on Deepak Tripathi's answer):
with open('SP500.txt') as f:
    lines = f.readlines()
lines = [line.rstrip().split(",") for line in lines]
date_index, spf_index, long_interest_rate = 0, 1, 5
start_year, end_year = 2016, 2017
start_month, end_month = 6, 5
mean_SP, max_interest = 0, -1000  # Some random negative number
total_entries = 0
for line in lines[1:]:
    date_values = line[date_index].split('/')
    if ((int(date_values[2]) == start_year and int(date_values[0]) >= start_month)
            or (int(date_values[2]) == end_year and int(date_values[0]) <= end_month)):
        total_entries += 1
        mean_SP += float(line[spf_index])
        max_interest = max(max_interest, float(line[long_interest_rate]))
mean_SP /= total_entries
print(mean_SP, max_interest)
I think you can optimize this by storing the column indexes in variables:
with open('SP500.txt') as f:
    lines = f.readlines()
lines = [line.rstrip().split(",") for line in lines]
date_index, spf_index, long_interest_rate = 0, 1, 5
# Dates are M/D/YYYY strings, which do not sort chronologically as text,
# so compare (year, month) tuples instead
start, end = (2016, 6), (2017, 5)
mean_SP, max_interest = 0, -1000  # Some random negative number
total_entries = 0
for line in lines[1:]:
    month, _, year = line[date_index].split('/')
    if start <= (int(year), int(month)) <= end:
        total_entries += 1
        mean_SP += float(line[spf_index])
        max_interest = max(max_interest, float(line[long_interest_rate]))
# Divide by the number of rows in the period, not by every row in the file
mean_SP /= total_entries
print(mean_SP, max_interest)
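Both versions still spell out the period and columns inside the loop. A module-free sketch that takes the period as (year, month) tuples, so the same code works for any range; the function name `period_stats` and its parameters are hypothetical, and the first line is assumed to be the header, as in SP500.txt:

```python
def period_stats(lines, start, end, price_col=1, rate_col=5):
    """Return (mean closing price, max long-term rate) for rows whose
    M/D/YYYY date falls between the (year, month) tuples start and end,
    inclusive. The first line is assumed to be the header."""
    total, count, max_rate = 0.0, 0, None
    for line in lines[1:]:
        fields = line.rstrip().split(',')
        month, _day, year = fields[0].split('/')
        # Tuples compare element by element, so (year, month) orders chronologically
        if start <= (int(year), int(month)) <= end:
            total += float(fields[price_col])
            count += 1
            rate = float(fields[rate_col])
            if max_rate is None or rate > max_rate:
                max_rate = rate
    if count == 0:
        raise ValueError("no rows in the requested period")
    return total / count, max_rate


# Usage:
# with open('SP500.txt') as f:
#     mean_SP, max_interest = period_stats(f.readlines(), (2016, 6), (2017, 5))
```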

Trouble printing out the max key/value pair in a dictionary

I'm working on trying to calculate the greatest increase/decrease in a change to profits/losses over time from a CSV.
The data set in csv is as follows (extract only):
Date,Profit/Losses
Jan-2010,867884
Feb-2010,984655
Mar-2010,322013
Apr-2010,-69417
So far, I've imported the CSV file and added the items to a dictionary. I've calculated the total months, the total profit/loss and the month-to-month change in profit/loss, but now I need to find the greatest and smallest change and have the code return both the month and the change figure.
When I try to print the greatest increase/decrease, the output contains only the final month on the list and all the change values (instead of just the biggest change value and its corresponding month).
Here is the code. Would appreciate any perspective:
budget = {}
total_months = 0
total_pnl = 0
date = 0
pnl = 0
monthly_change = []
previous_pnl = 0
greatest_increase = ["Date",[0]]
greatest_decrease = ["Date",[100000000000000]]
with open(csvpath, 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    header = next(csvreader)
    for row in csvreader:
        date = 0
        pnl = 1
        budget[row[date]] = int(row[pnl])
    for date, pnl in budget.items():
        total_months = total_months + 1
        total_pnl = total_pnl + pnl
        pnlchange = pnl - previous_pnl
        if total_months > 1:
            monthly_change.append(pnlchange)
        previous_pnl = pnl
    if (monthly_change > greatest_increase[1]):
        greatest_increase[1] = monthly_change
        greatest_increase[0] = row[0]
    if (monthly_change < greatest_decrease[1]):
        greatest_decrease[1] = monthly_change
        greatest_decrease[0] = row[0]
print(greatest_increase)
print(greatest_increase)
The primary problem is the final part of the code (the if statement). When I print 'greatest_increase' this currently returns the final value in the list rather than the highest value of change.
The current output is:
[['Feb-2017', '671099'], [116771, -662642, -391430, 379920, 212354, 510239, -428211, -821271, 693918, 416278, -974163, 860159, -1115009, 1033048, 95318, -308093, 99052, -521393, 605450, 231727, -65187, -702716, 177975, -1065544, 1926159, -917805, 898730, -334262, -246499, -64055, -1529236, 1497596, 304914, -635801, 398319, -183161, -37864, -253689, 403655, 94168, 306877, -83000, 210462, -2196167, 1465222, -956983, 1838447, -468003, -64602, 206242, -242155, -449079, 315198, 241099, 111540, 365942, -219310, -368665, 409837, 151210, -110244, -341938, -1212159, 683246, -70825, 335594, 417334, -272194, -236462, 657432, -211262, -128237, -1750387, 925441, 932089, -311434, 267252, -1876758, 1733696, 198551, -665765, 693229, -734926, 77242, 532869]]
What I am trying to get is just the highest change value, along with its month.
Apologies if this isn't clear; I'm still fairly new (third week learning!)
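The code above appears to compare the whole `monthly_change` list with `greatest_increase[1]` once, after the loop has finished. Python compares lists element by element, so the first change (116771) beats 0 and the entire list is stored. A minimal sketch of one fix keeps each change paired with its month and lets `max()`/`min()` with a `key` pick the extremes; the function names `read_budget` and `biggest_changes` are hypothetical, and the Date,Profit/Losses layout is taken from the question:

```python
import csv


def read_budget(csvpath):
    """Return [(month, profit_or_loss), ...] in file order."""
    with open(csvpath, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader)  # skip the Date,Profit/Losses header
        return [(row[0], int(row[1])) for row in reader]


def biggest_changes(rows):
    """Return ((month, change) of the greatest increase,
    (month, change) of the greatest decrease)."""
    changes = []  # (month, change from the previous month)
    previous = None
    for month, value in rows:
        if previous is not None:
            changes.append((month, value - previous))
        previous = value
    # key= compares only the change, but max/min return the whole (month, change) pair
    greatest_increase = max(changes, key=lambda pair: pair[1])
    greatest_decrease = min(changes, key=lambda pair: pair[1])
    return greatest_increase, greatest_decrease


# Usage:
# increase, decrease = biggest_changes(read_budget(csvpath))
# print(increase, decrease)
```

Using a list of pairs instead of a dictionary also keeps the months in file order explicitly, which the month-to-month changes depend on.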

How to find specific items in a CSV file using inputs?

I'm still new to Python, so forgive me if my code seems rather messy or out of place. I need help with an assignment for university: how can I find specific items in a CSV file? Here is what the assignment says:
Allow the user to type in a year, then, find the average life expectancy for that year. Then find the country with the minimum and the one with the maximum life expectancies for that year.
import csv
country = []
digit_code = []
year = []
life_expectancy = []
count = 0
lifefile = open("life-expectancy.csv")
with open("life-expectancy.csv") as lifefile:
    for line in lifefile:
        count += 1
        if count != 1:
            line.strip()
            parts = line.split(",")
            country.append(parts[0])
            digit_code.append(parts[1])
            year.append(parts[2])
            life_expectancy.append(float(parts[3]))
highest_expectancy = max(life_expectancy)
country_with_highest = country[life_expectancy.index(max(life_expectancy))]
print(f"The country that has the highest life expectancy is {country_with_highest} at {highest_expectancy}!")
lowest_expectancy = min(life_expectancy)
country_with_lowest = country[life_expectancy.index(min(life_expectancy))]
print(f"The country that has the lowest life expectancy is {country_with_lowest} at {lowest_expectancy}!")
It looks like you only want the first and fourth tokens from each row in your CSV. Therefore, let's simplify it like this:
Hong Kong,,,85.29
Japan,,,85.03
Macao,,,84.68
Switzerland,,,84.25
Singapore,,,84.07
You can then process it like this:
FILE = 'life-expectancy.csv'
data = []
with open(FILE) as f:  # naming this "csv" would shadow the csv module
    for line in f:
        tokens = line.split(',')
        # Store (value, country) so max/min compare by life expectancy first
        data.append((float(tokens[3]), tokens[0]))
hi = max(data)
lo = min(data)
print(f'The country with the highest life expectancy ({hi[0]:.2f}) is {hi[1]}')
print(f'The country with the lowest life expectancy ({lo[0]:.2f}) is {lo[1]}')
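Neither the question's code nor the answer above filters by the year the user types in. A sketch that does, assuming the column order from the question's code (country, code, year, life expectancy) and a header line; `stats_for_year` is a hypothetical name. It uses the csv module so that quoted country names containing commas are parsed correctly:

```python
import csv


def stats_for_year(lines, wanted_year):
    """Return (average, lowest, highest) life expectancy for wanted_year.

    lowest and highest are (value, country) tuples. Assumes the columns are
    country, code, year, life expectancy, with a header line first.
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header line
    rows = []
    for row in reader:
        if row and row[2] == wanted_year:
            rows.append((float(row[3]), row[0]))
    if not rows:
        raise ValueError(f"no data for year {wanted_year}")
    average = sum(value for value, _ in rows) / len(rows)
    # Tuples compare by their first item, so min/max pick the extreme values
    return average, min(rows), max(rows)


# Usage (file name taken from the question):
# year = input("Enter a year: ").strip()
# with open("life-expectancy.csv", newline="") as f:
#     average, lowest, highest = stats_for_year(f, year)
```

The year is compared as a string, exactly as it appears in the file, so no conversion of user input is needed beyond stripping whitespace.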

Print the lowest numeric value from a text file

I have a text file containing some stocks, their prices and so on. I am trying to print out the stock with the lowest price along with the name of the company. Here is my code:
stocks = open("P:\COM661\stocks.txt")
name_lowest = ""
price_lowest = 0
for line in stocks:
    rows = line.split("\t")
    price = float(rows[2])
    if price>price_lowest:
        price_lowest = price
        name_lowest = rows[1]
print(name_lowest + "\t" + str(price_lowest))
I'm trying to go through the file and compare each numeric value to the one before it to see whether it is higher or lower. At the end, it should have saved the lowest price and printed it along with the name of the company.
Instead it prints the value of the last company in the file along with its name.
How can I fix this?
You made two mistakes.
First, you initialised the starting value to 0. You should initialise it to the largest number a Python float can hold:
import sys
price_lowest = sys.float_info.max
Alternatively, you could initialise it to the first element.
Second, your if statement should be:
if price < price_lowest:
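The "initialise it to the first element" option could be sketched as follows, assuming, as in the question's code, tab-separated lines with the company name in column 1 and the price in column 2; `lowest_stock` is a hypothetical name:

```python
def lowest_stock(lines):
    """Return (name, price) of the cheapest stock, or None for empty input."""
    rows = iter(lines)
    first = next(rows, None)
    if first is None:
        return None  # empty file: nothing to compare
    fields = first.split("\t")
    # Seed the running minimum with the first line instead of 0
    name_lowest, price_lowest = fields[1], float(fields[2])
    for line in rows:
        fields = line.split("\t")
        price = float(fields[2])
        if price < price_lowest:
            name_lowest, price_lowest = fields[1], price
    return name_lowest, price_lowest


# Usage:
# with open(r"P:\COM661\stocks.txt") as stocks:
#     print(lowest_stock(stocks))
```

Seeding from real data avoids picking a "big enough" sentinel value at all.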
Initialize:
price_lowest = 999999 # start with absurdly high value, or take first one
Plus, your if check is backwards. It should be:
if price < price_lowest:
Others have already suggested a solution that fixes your current code. However, in Python you can write a shorter solution (column 1 holds the name and column 2 the price, as in your code):
with open('file') as f:
    print(min(
        [(i.split('\t')[1], float(i.split('\t')[2])) for i in f.readlines()],
        key=lambda t: t[1]
    ))
Your "if" logic is backwards; it should be price < price_lowest.
Just make a small adjustment: start price_lowest at None, set it to the first price you encounter, and compare from there on:
stocks = open(r"P:\COM661\stocks.txt")
name_lowest = ""
price_lowest = None
for line in stocks:
    rows = line.split("\t")
    price = float(rows[2])
    if price_lowest is None:
        # First line: seed the minimum with this price
        price_lowest = price
        name_lowest = rows[1]
    elif price < price_lowest:
        price_lowest = price
        name_lowest = rows[1]
print(name_lowest + "\t" + str(price_lowest))

How to parse csv file and compute stats based on that data

I have a task which requires me to write a program in Python that reads a text file containing information about people (name, weight and height).
Then I need the program to ask the user to enter a name, look for that name in the text file, and print out the line that includes that name and the person's height and weight.
Then the program has to work out the average weight and average height of the people.
The text file is:
James,73,1.82,M
Peter,78,1.80,M
Jay,90,1.90,M
Beth,65,1.53,F
Mags,66,1.50,F
Joy,62,1.34,F
So far I have this code, which prints out the line for the name typed by the user, but I don't know how to assign the heights and the weights:
search = input("Who's information would you like to find?")
with open("HeightAndWeight.txt", "r") as f:
    for line in f:
        if search in line:
            print(line)
Using the pandas library as suggested, you can do as follows:
import pandas as pd
df = pd.read_csv('people.txt', header=None, index_col=0)
df.columns = ['weight', 'height', 'sex']
print(df)
weight height sex
0
James 73 1.82 M
Peter 78 1.80 M
Jay 90 1.90 M
Beth 65 1.53 F
Mags 66 1.50 F
Joy 62 1.34 F
print(df.mean(numeric_only=True))
weight 72.333333
height 1.648333
You could use Python's built-in csv module to split each line in the file into a list of columns as follows:
import csv
with open('HeightAndWeight.txt', newline='') as f_input:
    csv_input = csv.reader(f_input)
    total_weight = 0
    total_height = 0
    for index, row in enumerate(csv_input, start=1):
        total_weight += float(row[1])
        total_height += float(row[2])
print("Average weight: {:.2f}".format(total_weight / index))
print("Average height: {:.2f}".format(total_height / index))
This would display the following output:
Average weight: 72.33
Average height: 1.65
The answer is actually in your question's title: use the standard library's csv module to parse your file.
Use:
splitted_line = line.strip().split(',')
to split the line you just found into its four parts, using the comma , as the delimiter. You can then get the first part (the name) with splitted_line[0], the second part (weight) with splitted_line[1], and so on. So, to print out the person's name, weight and height:
print('%s weighs %s and is %s meters tall.' % (splitted_line[0], splitted_line[1], splitted_line[2]))
To get the average weight and height, you need to know how many entries are in your file; then add up the weights and heights and divide each sum by the number of entries/persons. The whole thing would look like:
search = input("Whose information would you like to find? ")
total = 0
weight = 0
height = 0
with open("HeightAndWeight.txt", "r") as f:
    for line in f:
        total += 1
        splitted_line = line.strip().split(',')
        # Weight and height may be decimals (e.g. 1.82), so use float() rather than int()
        weight += float(splitted_line[1])
        height += float(splitted_line[2])
        if search in line:
            print('%s weighs %s and is %s meters tall.' % (splitted_line[0], splitted_line[1], splitted_line[2]))
average_weight = weight / total
average_height = height / total
That's one straightforward way to do it, and hopefully easy to understand as well.
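search = input("Whose information would you like to find? ")
heights = []
weights = []
with open("HeightAndWeight.txt", "r") as f:
    for line in f:
        if search in line:
            print(line)
        # Collect every person's values (not only the match) for the averages
        heights.append(float(line.split(',')[2]))
        weights.append(float(line.split(',')[1]))
# your calculation stuff
average_height = sum(heights) / len(heights)
average_weight = sum(weights) / len(weights)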
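One caveat with `search in line`: it matches substrings, so typing "J" would print James, Jay and Joy. A sketch with an exact name lookup, assuming the name,weight,height,sex layout from the question; `load_people` and `averages` are hypothetical names:

```python
import csv


def load_people(lines):
    """Map each name to a (weight, height, sex) tuple."""
    people = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        name, weight, height, sex = row
        people[name] = (float(weight), float(height), sex)
    return people


def averages(people):
    """Return (average weight, average height) over everyone in the dict."""
    weights = [weight for weight, _, _ in people.values()]
    heights = [height for _, height, _ in people.values()]
    return sum(weights) / len(weights), sum(heights) / len(heights)


# Usage:
# with open("HeightAndWeight.txt", newline="") as f:
#     people = load_people(f)
# search = input("Whose information would you like to find? ")
# if search in people:  # exact match on the name key, not a substring search
#     weight, height, sex = people[search]
#     print(search, weight, height, sex)
# print(averages(people))
```

A dictionary keyed by name gives exact lookups for free; the trade-off is that two people with the same name would overwrite each other.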
