Extract and take average of csv column based on condition

I'm taking a class on Python, so I'm definitely not using it on a day-to-day basis, but I'm trying my best to learn. One of my assignments is to read data from a CSV file and get the average beginning weight of the males and of the females in the file. I'm not allowed to use Pandas or any other external packages, so I'm only importing csv to read the data. My issue is that I can do the calculation for the first condition, but when it gets to the second condition it returns the error 'division by zero'. I have no idea what I'm doing wrong and was hoping someone could help me. I have confirmed the values for males and females by looking at the file, and there is data for both.
This is the code that's returning 'division by zero'. If I swap the two list comprehensions and do males first instead of females, the same thing happens for females. If I just print out f_weight and m_weight, f_weight is populated and m_weight returns [].
import csv
def avg_start_weight(csv_data):
    with open(csv_data, newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter='|')
        f_weight = [float(row[1]) for row in csv_reader if row[0] == 'F']
        m_weight = [float(cell[1]) for cell in csv_reader if cell[0] == 'M']
        f_average = sum(f_weight) / len(f_weight)
        m_average = sum(m_weight) / len(m_weight)
        print(f'2. The average female beginning weight is {f_average:.2f} and the average male beginning weight is {m_average:.2f}')
        csv_file.close()
csv_data = 'freshman_kgs.csv'
avg_start_weight(csv_data)
I did get it to work this way, but I'm guessing I shouldn't need to close and re-open the file each time I want to test a different condition, so I wanted to see if I could get some help figuring out what I'm doing wrong:
import csv
def avg_start_weight(csv_data):
    with open(csv_data, newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter='|')
        f_weight = [float(row[1]) for row in csv_reader if row[0] == 'F']
        f_average = sum(f_weight) / len(f_weight)
        print(f'2. Average female beginning weight: {f_average:.2f}')
        csv_file.close()
    with open(csv_data, newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter='|')
        m_weight = [float(row[1]) for row in csv_reader if row[0] == 'M']
        m_average = sum(m_weight) / len(m_weight)
        print(f'Average male beginning weight: {m_average:.2f}')
        csv_file.close()
csv_data = "freshman_kgs.csv"
avg_start_weight(csv_data)
The last thing I tried was this and it also returns 'division by zero' depending on which calculation I put second:
import csv
def avg_start_weight(csv_data):
    with open(csv_data, newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter='|')
        m_weight = [float(row[1]) for row in csv_reader if row[0] == 'M']
        m_weight_avg = sum(m_weight) / len(m_weight)
        print(f'The average beginning weight for males was {m_weight_avg:.2f}')
        f_weight = [float(row[1]) for row in csv_reader if row[0] == 'F']
        f_weight_avg = sum(f_weight) / len(f_weight)
        print(f'The average beginning weight for females was {f_weight_avg:.2f}')
csv_data = "freshman_kgs.csv"
avg_start_weight(csv_data)
I'm not at all asking for someone to do my homework for me; I'm trying my best to understand this and figure it out myself, but I'm stuck. I really appreciate any help I get. I've been looking at other people's questions, but most of them use Pandas, so unfortunately they weren't much use to me.

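I see a couple of things wrong with your code.
The line csv_reader = csv.reader(csv_file, delimiter='|') produces an iterator over the lines of the file, not a reusable list. Your first list comprehension reads it all the way to the end, so the second comprehension gets no rows at all: its list stays empty, len() of that list is 0, and that is where the division by zero comes from. Also, when you use the with open construct it isn't necessary to close the file, since that is handled for you automatically. Here is how I would write the required code, as a reference: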
import csv

def avg_start_weight(csv_data):
    weights = {'M': [], 'F': []}
    with open(csv_data, newline='') as file:
        csv_file = csv.reader(file, delimiter='|')
        for line in csv_file:
            if line and line[0] in weights:
                weights[line[0]].append(float(line[1]))
    print(f"Average Female Weight = {sum(weights['F'])/len(weights['F']):.2f}")
    print(f"Average Male Weight = {sum(weights['M'])/len(weights['M']):.2f}")
Notice: I used a dictionary to hold one list of male weights and one list of female weights. While reading the lines of the CSV file in a single pass, I append each value to the list stored under the dictionary key given by the first item in the line. Also note that I convert each weight from a string to a float as I add it to its list.
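If you would rather keep the two list comprehensions from the question, a minimal sketch of the underlying issue and one fix follows: a csv.reader can be consumed only once, so reading it into a list first lets both comprehensions see every row. The SAMPLE data is hypothetical and only mirrors the question's '|'-delimited layout (sex in the first column, beginning weight in the second); io.StringIO stands in for the real file, and averages_by_sex is an illustrative name.

```python
import csv
import io

# Hypothetical sample data: sex|beginning weight (kg)
SAMPLE = "M|72\nF|64\nM|68\nF|58\n"

def averages_by_sex(csv_file):
    """Return (female_avg, male_avg) from a '|'-delimited file object."""
    # list() consumes the reader once and keeps every row in memory,
    # so both comprehensions below see the full data set.
    rows = list(csv.reader(csv_file, delimiter='|'))
    f_weight = [float(row[1]) for row in rows if row and row[0] == 'F']
    m_weight = [float(row[1]) for row in rows if row and row[0] == 'M']
    return sum(f_weight) / len(f_weight), sum(m_weight) / len(m_weight)

# The original problem: the second pass over the same reader finds nothing.
reader = csv.reader(io.StringIO(SAMPLE), delimiter='|')
first_pass = [row for row in reader if row[0] == 'F']
second_pass = [row for row in reader if row[0] == 'M']
print(len(first_pass), len(second_pass))

f_avg, m_avg = averages_by_sex(io.StringIO(SAMPLE))
print(f'{f_avg:.2f} {m_avg:.2f}')
```

With the real file you would call averages_by_sex(csv_file) inside the with open('freshman_kgs.csv', newline='') block. If you would rather not hold all rows in memory, calling csv_file.seek(0) and creating a new csv.reader also rewinds the file without closing and reopening it.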

Related

Compare two CSV files and write difference in the same file as an extra column in python

Hey intelligent community,
I need a little bit of help because I think I can't see the wood for the trees.
I have two CSV files that look like this:
First file:
Name,Number
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.2
Second file:
Name,Number
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.5
I would like to compare both files and then write any changes like this:
Name,Number,Changes
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.5;change: 3.5.2
So on every line where the number differs, I want to add the change as a new column at the end of the line.
The files are formatted the same, but sometimes one has a new row, which is why I think I have to map the keys.
I've come this far, but now I'm lost in my thoughts:
Python 3.10.9
import csv
# Reading the first csv and set mapping
with open('test1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)
    file1_dict = {row[1]: row[0] for row in rows}
# Reading the second csv and set mapping
with open('test2.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)
    file2_dict = {row[1]: row[0] for row in rows}
# comparing the keys and find the diff
for k in file1_dict:
    if file1_dict[k] != file2_dict[k]:
        file1_dict[k] = file2_dict[k]
        for row in rows:
            if row[1] == k:
                row.append(file2_dict[k])
# write the csv (not sure how to add the word "change:")
with open('test1.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)
If I try this, I don't get a new column; it just "updates" the CSV file with the same columns.
For example, this code gives me the differing rows, but I'm not able to add them to the existing file and rows:
with open('test1.csv') as fin1:
    with open('test2.csv') as fin2:
        read1 = csv.reader(fin1)
        read2 = csv.reader(fin2)
        diff_rows = (row1 for row1, row2 in zip(read1, read2) if row1 != row2)
        with open('test3.csv', 'w') as fout:
            writer = csv.writer(fout)
            writer.writerows(diff_rows)
Does anyone have tips or help for my problem? I've read many answers on here but can't figure it out.
Thanks a lot.
@bigkeefer
Thanks for your answer. I tried to change it for the ; delimiter, but it gives an "IndexError: list index out of range" error:
with open('test3.csv', 'r') as file1:
    reader = csv.reader(file1, delimiter=';')
    rows = list(reader)[1:]
    file1_dict = {row[0]: row[1] for row in rows}
with open('test4.csv', 'r') as file2:
    reader = csv.reader(file2, delimiter=';')
    rows = list(reader)[1:]
    file2_dict = {row[0]: row[1] for row in rows}
new_file = ["Name;Number;Changes\n"]
with open('output.csv', 'w') as nf:
    for key, value in file1_dict.items():
        if value != file2_dict[key]:
            new_file.append(f"{key};{file2_dict[key]};change: {value}\n")
        else:
            new_file.append(f"{key};{value}\n")
    nf.writelines(new_file)

You will need to adapt this to overwrite your first file etcetera, as you mentioned above, but I've left it like this for your testing purposes. Hopefully this will help you in some way.
I've assumed you've actually got the headers above in each file. If not, remove the slicing on the list creations, and change the new_file variable assignment to an empty list ([]).
import csv

with open('f1.csv', 'r', newline='') as file1:
    reader = csv.reader(file1, delimiter=";")
    rows = list(reader)[1:]  # skip the header row
    file1_dict = {row[0]: row[1] for row in rows if row}  # "if row" skips blank lines
with open('f2.csv', 'r', newline='') as file2:
    reader = csv.reader(file2, delimiter=";")
    rows = list(reader)[1:]
    file2_dict = {row[0]: row[1] for row in rows if row}
new_file = ["Name;Number;Changes\n"]
for key, value in file1_dict.items():
    # names missing from f2 are treated as unchanged instead of raising KeyError
    new_value = file2_dict.get(key, value)
    if value != new_value:
        new_file.append(f"{key};{new_value};change: {value}\n")
    else:
        new_file.append(f"{key};{value}\n")
with open('new.csv', 'w') as nf:
    nf.writelines(new_file)
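Looking names up only from the first file means rows that appear only in the second file are never reported, and the question mentions that new rows do show up. A sketch under stated assumptions: both files use ';' on every line including the header, it walks the second file and looks each name up in the first, and it lets csv.writer handle separators instead of building strings by hand. The function name write_changes and the 'new row' label are hypothetical; io.StringIO stands in for the real files.

```python
import csv
import io

def write_changes(old_file, new_file, out_file):
    """Copy new_file's rows to out_file, adding 'change: <old number>' where a Number differs."""
    old_reader = csv.reader(old_file, delimiter=';')
    next(old_reader, None)  # skip the "Name;Number" header
    old_numbers = {row[0]: row[1] for row in old_reader if row}  # skip blank lines

    new_reader = csv.reader(new_file, delimiter=';')
    next(new_reader, None)
    writer = csv.writer(out_file, delimiter=';', lineterminator='\n')
    writer.writerow(['Name', 'Number', 'Changes'])
    for row in new_reader:
        if not row:
            continue  # skip blank lines
        name, number = row[0], row[1]
        if name not in old_numbers:
            writer.writerow([name, number, 'new row'])  # hypothetical label for added rows
        elif old_numbers[name] != number:
            writer.writerow([name, number, f'change: {old_numbers[name]}'])
        else:
            writer.writerow([name, number])

# Sample data from the question, assuming ';' throughout.
old = io.StringIO("Name;Number\nAAC;2.2.3\nAAF;2.4.4\nZCX;3.5.2\n")
new = io.StringIO("Name;Number\nAAC;2.2.3\nAAF;2.4.4\nZCX;3.5.5\n")
out = io.StringIO()
write_changes(old, new, out)
print(out.getvalue())
```

Rows that exist only in the first file are not written by this sketch. To overwrite the original file, write to a separate file first and then replace the original (for example with os.replace), rather than opening a file in 'w' mode while you still need to read it.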

How do I update every row of one column of a CSV with Python?

I'm trying to update every row of one particular column in a CSV.
My actual use case is a bit more complex, but it's just the CSV syntax I'm having trouble with, so for the example I'll use this:
Name     Number
Bob      1
Alice    2
Bobathy  3
If I have a CSV with the above data, how would I get it to add 1 to each number & update the CSV or spit it out into a new file?
How can I take syntax like this & apply it to the CSV?
test = [1, 2, 3]
for n in test:
    n = n + 1
    print(n)
I've been looking through a bunch of tutorials & haven't been able to quite figure it out.
Thanks!
Edit:
I can read the data & get what I'm looking for printed out; my issue now is just with getting that back into the CSV:
import csv
with open('file.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['name'], int(row['number']) + 1)
└─$ python3 test_csv_script.py
bob 2
alice 3
bobathy 4

Thank you Mark Tolonen for the comment - that example was very helpful & led me to my solution:
import csv
with open('file.csv', newline='') as csv_input, open('out.csv', 'w', newline='') as csv_output:
    reader = csv.reader(csv_input)
    writer = csv.writer(csv_output)
    # Header doesn't need extra processing
    header = next(reader)
    writer.writerow(header)
    for name, number in reader:
        writer.writerow([name, int(number) + 1])
Also sharing for anybody who finds this in the future: if you want to put the modified data in a new column with its own header, use this:
import csv
with open('file.csv', newline='') as csv_input, open('out.csv', 'w', newline='') as csv_output:
    reader = csv.reader(csv_input)
    writer = csv.writer(csv_output)
    header = next(reader)
    header.append("new column")
    writer.writerow(header)
    for name, number in reader:
        writer.writerow([name, number, int(number) + 1])

You can open another file, out.csv, and write the new data into it.
For example:
import csv
with open('file.csv', newline='') as csvfile, open('out.csv', 'w', newline='') as file_write:
    reader = csv.DictReader(csvfile)
    writer = csv.writer(file_write)
    for row in reader:
        # file.write() accepts a single string, so let csv.writer format each row
        writer.writerow([row['name'], int(row['number']) + 1])
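The question also asks about updating the CSV itself rather than writing a new file. A minimal sketch, assuming lowercase name and number headers as in the asker's DictReader code: read every row into memory first, because opening the same path in 'w' mode truncates it immediately, then write the rows back with csv.DictWriter. The function name increment_column is hypothetical, and the demo uses a temporary file so it does not touch file.csv.

```python
import csv
import os
import tempfile

def increment_column(path, column, step=1):
    """Add `step` to every value in `column`, rewriting the CSV at `path` in place."""
    # Read everything first: reopening the file in 'w' mode truncates it.
    with open(path, newline='') as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        row[column] = int(row[column]) + step

    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Demo on a throwaway file holding the question's sample data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'file.csv')
    with open(demo_path, 'w', newline='') as f:
        f.write('name,number\nBob,1\nAlice,2\nBobathy,3\n')
    increment_column(demo_path, 'number')
    with open(demo_path, newline='') as f:
        result = f.read()
print(result)
```

Keeping every row in memory is fine for small files; for very large ones, writing to a new file as in the answers above and then replacing the original avoids that.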

getting average of some digits from a csv file as input and Write the averages in an output csv file in python 3

I am learning Python 3 :), and I am trying to read a CSV file whose rows contain different numbers of scores, take the average of the scores for each person (each row), and write the results to a CSV file as output.
The input file is like below:
David,5,2,3,1,6
Adele,3,4,1,5,2,4,2,1
...
The output file should seem like below:
David,4.75
Adele,2.75
...
It seems that I am reading the file correctly, since I can print the average for each name in the terminal, but the CSV output file only contains the average of the last name in the input file, while I want it to contain all names and their corresponding averages.
Can anybody help me with it?
import csv
from statistics import mean
these_grades = []
name_list = []
reader = csv.reader(open('input.csv', newline=''))
for row in reader:
    name = row[0]
    name_list.append(name)
    with open('result.csv', 'w', newline='\n') as f:
        writer = csv.writer(f,
                            delimiter=',',
                            quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for grade in row[1:]:
            these_grades.append(int(grade))
        for item in name_list:
            writer.writerow([''.join(item), mean(these_grades)])
    print('%s,%f' % (name, mean(these_grades)))

There are several issues in your code:
1) You're not using a context manager (with) when you read the input file. There's no reason to use one when writing but not when reading; as a result you never close the "input.csv" file.
2) You're using lists to store data from the rows. This doesn't easily distinguish between a person's name and the scores associated with that person, and because these_grades is never emptied, each person's average also includes the grades of everyone before them. It would be better to use a dictionary in which the key is the person's name and the value is that person's list of scores.
3) You repeatedly open the file inside the for loop in 'w' mode. Every time you open a file in write mode, its previous contents are wiped. You actually do write each row to the file, but it is wiped again when you open it on the next iteration.
You can use:
import csv
import statistics
# use a context manager to read the data in too, not just for writing
with open('input.csv', newline='') as infile:
    reader = csv.reader(infile)
    data = list(reader)
# Create a dictionary to store the scores against the name
scores = {}
for row in data:
    scores[row[0]] = row[1:]  # First item in the row is the key (name); the rest are the values
with open('output.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    # Now iterate over the dictionary and average the scores on each iteration
    for name, values in scores.items():
        ave_score = statistics.mean([int(item) for item in values])
        writer.writerow([name, ave_score])
This can be further consolidated, but it's less easy to see what's happening:
with open('input.csv') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        name = row[0]
        values = row[1:]
        ave_score = statistics.mean(map(int, values))
        writer.writerow([name, ave_score])
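The desired output shows averages with two decimals, which the versions above do not enforce. A sketch that formats each mean before writing, under the assumption that blank lines or rows with no scores should simply be skipped; write_averages is a hypothetical name, and io.StringIO stands in for input.csv and output.csv.

```python
import csv
import io
from statistics import mean

def write_averages(infile, outfile):
    """Write 'name,average' (two decimals) for each 'name,score,score,...' row."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator='\n')
    for row in reader:
        if len(row) < 2:
            continue  # skip blank lines and rows without scores
        name, scores = row[0], row[1:]
        writer.writerow([name, f'{mean(map(int, scores)):.2f}'])

source = io.StringIO('David,5,2,3,1,6\nAdele,3,4,1,5,2,4,2,1\n')
result = io.StringIO()
write_averages(source, result)
print(result.getvalue())
```

With real files, pass the objects returned by open('input.csv', newline='') and open('output.csv', 'w', newline='') inside a single with statement, as in the consolidated version.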

Get readings from duplicate names in CSV file Python

I am fairly new to Python and am having some issues reading in my CSV file. The columns hold sensor names, datestamps and readings. However, the same sensor name appears many times, so I have already made a list of the distinct names, called OPTIONS, shown below:
import csv
OPTIONS = []
with open('sensor_data.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])
print(OPTIONS)
OPTIONS prints fine, but now I am having trouble retrieving any readings and using them to calculate the average and maximum reading for each unique sensor name.
The file sensor_data.csv runs from 2018-01-01 to 2018-12-31 for sensor_1 to sensor_25.
Any help would be appreciated.

What you have in the readings variable is just the reading of the current row. One way to get the average reading is to keep track of the sum and the count of readings (sum_readings and count_readings respectively); after the for loop you can get the average by dividing the sum by the count. You can get the maximum by initializing a max_readings variable with a minimum reading value (I assume this to be 0) and updating it whenever the current reading is larger than max_readings (max_readings < readings).
import csv
OPTIONS = []
OPTIONS_READINGS = {}
with open('sensor_data.csv', 'r', newline='') as f:  # text mode: Python 3's csv needs str, not bytes
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
            OPTIONS_READINGS[row[0]] = []
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])
        OPTIONS_READINGS[row[0]].append(readings)
print(OPTIONS)
for option in OPTIONS_READINGS:
    print(option)
    readings = OPTIONS_READINGS[option]
    print('Max readings:', max(readings))
    print('Average readings:', sum(readings) / len(readings))
Edit: Sorry, I misread the question. If you want the maximum and average for each unique option, a more straightforward way is to use an additional dictionary, OPTIONS_READINGS, whose keys are the option names and whose values are the lists of readings. You can then find the maximum and average reading of an option with max(OPTIONS_READINGS[option]) and sum(OPTIONS_READINGS[option]) / len(OPTIONS_READINGS[option]) respectively.
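The running sum, count and maximum described at the start of this answer can also be applied per sensor without storing every reading. A sketch under two assumptions: the file has no header row and the reading is in the third column, as in the question; and the maximum starts at float('-inf') rather than 0, so negative readings are handled too. The function name running_stats and the sample rows are hypothetical.

```python
import csv
import io

def running_stats(csv_file):
    """Return {sensor: (max_reading, average_reading)} without storing every reading."""
    stats = {}  # sensor name -> [running sum, count, running max]
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        name, reading = row[0], float(row[2])
        total, count, biggest = stats.get(name, [0.0, 0, float('-inf')])
        stats[name] = [total + reading, count + 1, max(biggest, reading)]
    return {name: (biggest, total / count) for name, (total, count, biggest) in stats.items()}

# Hypothetical rows in the question's sensor,datestamp,reading layout.
sample = io.StringIO(
    "sensor_1,2018-01-01,2.0\n"
    "sensor_2,2018-01-01,5.0\n"
    "sensor_1,2018-01-02,4.0\n"
)
result = running_stats(sample)
for name, (max_reading, avg_reading) in result.items():
    print(f'{name}: max {max_reading}, average {avg_reading}')
```

Compared with keeping a list per sensor, this uses constant memory per sensor, at the cost of not being able to compute statistics such as the median later.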

A shorter version below
import csv
from collections import defaultdict
readings = defaultdict(list)
with open('sensor_data.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        readings[row[0]].append(float(row[2]))
for sensor_name, values in readings.items():
    print('Sensor: {}, Max readings: {}, Avg: {}'.format(sensor_name, max(values), sum(values) / len(values)))

Python: CSV Traceback Error

I'm using Python to create a program that involves creating a CSV file and storing details in it. I also need the program to read from the file and print the specified details. The following code shows how I have implemented the CSV file and how I make the program read from it.
with open("SpeedTracker.csv", "a", newline="") as csvfile:
    writeToFile = csv.writer(csvfile)
    writeToFile.writerow([("Reg: ", RegistrationPlate), ('First Camera Time:', FirstCameraTime), ("Second Camera Time:", SecondCameraTime), ("Average Speed:", AverageSpeed2, "MPH"), ])
with open('SpeedTracker.csv', newline='') as csvfile:
    SpeedDetails = csv.reader(csvfile, delimiter=',')
    for Reg, Average in SpeedDetails:
        print(Reg, Average)
However, whenever I run the code and follow the instructions as a user, I get an error that I can't understand. The error looks like this:
Traceback (most recent call last):
  File "main.py", line 24, in <module>
    for Reg, Average in SpeedDetails:
ValueError: too many values to unpack (expected 2)
exited with non-zero status
I don't know what I'm supposed to do to correct this. Can someone please show me where I'm going wrong and teach me the right method so that I know what to do in the future?
Thanks a lot for the help,
Mohammed.

import csv

with open('SpeedTracker.csv', newline='') as csvfile:
    rows = csv.reader(csvfile, delimiter=',')
    for row in rows:
        # each row is a list of four fields; index into it instead of unpacking two names
        reg = row[0]
        firstCam = row[1]
        secondCam = row[2]
        AvgSpeed = row[3]
        print(reg)
        print(firstCam)
        print(secondCam)
        print(AvgSpeed)
There are two problems with the code you gave. 1) Each item the reader yields is a whole row (a list of fields), so you need to work with that row inside the loop to get at the data in its columns. 2) There are four items in each row, but you are trying to put these four items into two variables (Reg and Average), which is what raises "too many values to unpack".
A better approach, though, is to write the CSV headers once and store plain values, which gives a more conventional CSV file, like so:
import csv
import os
# placeholder values so the example runs on its own
RegistrationPlate = FirstCameraTime = SecondCameraTime = AverageSpeed2 = 2
with open("SpeedTracker.csv", "a", newline="") as csvfile:
    writeToFile = csv.writer(csvfile)
    # if the csv headers have not been written yet (empty file), write them
    if os.path.getsize("SpeedTracker.csv") == 0:
        writeToFile.writerow(["Reg", "First Camera Time", "Second Camera Time", "Average Speed"])
    writeToFile.writerow([RegistrationPlate, FirstCameraTime, SecondCameraTime, AverageSpeed2])
with open('SpeedTracker.csv', newline='') as csvfile:
    rows = csv.reader(csvfile, delimiter=',')
    next(rows)  # skip headers
    for row in rows:
        reg = row[0]
        firstCam = row[1]
        secondCam = row[2]
        AvgSpeed = row[3]
        print(reg)
        print(firstCam)
        print(secondCam)
        print(AvgSpeed)
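The ValueError itself comes from unpacking: a for statement with two names tries to split each row into exactly two values, and each row here has four fields. A minimal sketch that unpacks all four columns by name instead; the plate, times and speed in the sample are made up, and io.StringIO stands in for SpeedTracker.csv.

```python
import csv
import io

# Hypothetical contents in the header-plus-plain-values layout shown above.
data = io.StringIO(
    "Reg,First Camera Time,Second Camera Time,Average Speed\n"
    "AB12CDE,10:00:00,10:00:30,62.5\n"
)

reader = csv.reader(data)
header = next(reader)  # skip the header row
records = []
# Unpack exactly as many names as there are columns (four here);
# listing only two names is what raised "too many values to unpack".
for reg, first_cam, second_cam, avg_speed in reader:
    records.append((reg, float(avg_speed)))
    print(reg, first_cam, second_cam, avg_speed)
```

csv.DictReader is another option: it uses the header row as keys, so row['Average Speed'] works regardless of column order.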
