Finding the mean of two values in the same dictionary - python

I have a csv file that contains this information
grade,low,high
S235,360,510
S275,370,530
S355,470,630
I want to find the difference between high and low, but I don't know how to do this. I thought I could do it with numpy (np.mean), but it raised an error.
This is the code I have written so far:
#!/usr/bin/python3
import csv
csvfile = open('/home/aa408/steel.csv')
csvreader = csv.DictReader(csvfile)
print("%7s %8s %8s" % ("Grade", "Low MPa", "Max MPa"))
total = 0
count = 0
for gradeInfo in csvreader:
    print("%7s %8s %8s" % (gradeInfo["grade"], gradeInfo["low"],
                           gradeInfo["high"]))

The following calculates the difference between the high and low columns, assuming they contain only integers.
import csv
with open('/home/aa408/steel.csv', newline='') as cf:
    reader = csv.DictReader(cf)
    for row in reader:
        # DictReader yields strings, so convert before subtracting
        print(int(row['high']) - int(row['low']))
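Since the title asks for the mean while the question text asks for the difference, here is a minimal sketch that computes both for each grade. It reads the sample data from an in-memory string so that it is self-contained (an assumption; a real script would open the file), and the function name `summarize` is hypothetical.

```python
import csv
import io

# Sample data copied from the question; a real script would open the file instead.
SAMPLE = """grade,low,high
S235,360,510
S275,370,530
S355,470,630
"""

def summarize(csv_text):
    """Return {grade: (difference, mean)} for the low/high columns."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # DictReader yields strings, so convert before doing arithmetic
        low, high = int(row["low"]), int(row["high"])
        result[row["grade"]] = (high - low, (low + high) / 2)
    return result

for grade, (diff, mean) in summarize(SAMPLE).items():
    print("%7s %8d %8.1f" % (grade, diff, mean))
```

No numpy is needed for two values per row; plain arithmetic on the converted integers is enough.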

Related

Summing a column in a .csv file using python

I'm trying to sum a column in a csv file using python. Here's a sample of the csv data:
Date,Profit/Losses
Jan-2010,867884
Feb-2010,984655
Mar-2010,322013
Apr-2010,-69417
May-2010,310503
Jun-2010,522857
Jul-2010,1033096
Aug-2010,604885
Sep-2010,-216386
I want to sum the Profit/Losses column.
I am using the following code but it's returning a 0. Where could I be going wrong?
import os
import csv
# Path to collect data from the csv file in the Resources folder
pybank_csv = os.path.join("resources", "budget_data.csv")
with open(pybank_csv, 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    next(csvfile, None)
    t = sum(float(row[1]) for row in csvreader)
# print the results
print(f"Total: {t}")
The easiest way is to use the pandas library.
Run pip install pandas to install it on your machine,
and then:
import pandas as pd
df = pd.read_csv('your_filename.csv')
sumcol = df['Profit/Losses'].sum()
print(sumcol)
The sum is now in sumcol. For future reference, if your task involves working with data from a csv file, pandas is a good fit: it provides a wide range of operations you can perform on your data. Refer to the pandas website for more info.
If you want to use only the csv package, you can read the csv as a dict and then sum the Profit/Losses entry of each row:
import csv
total = 0
with open('your_filename.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    for row in data:
        total = total + int(row['Profit/Losses'])
print(total)
Or, if you want to use reader instead of DictReader, you need to ignore the first (header) row. Something like this:
import csv
total = 0
with open('your_filename.csv', newline='') as csvfile:
    data = csv.reader(csvfile)
    for row in data:
        # skip the header row, whose second field starts with 'P'
        if not str(row[1]).startswith('P'):
            total = total + int(row[1])
print(total)
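As an alternative to checking the first character of each field, the header can be skipped once by calling next() on the reader before summing. The sketch below uses the sample rows from the question in an in-memory string (an assumption, to keep it self-contained); `total_profit` is a hypothetical name.

```python
import csv
import io

# Sample rows copied from the question.
SAMPLE = """Date,Profit/Losses
Jan-2010,867884
Feb-2010,984655
Mar-2010,322013
Apr-2010,-69417
May-2010,310503
Jun-2010,522857
Jul-2010,1033096
Aug-2010,604885
Sep-2010,-216386
"""

def total_profit(csv_text):
    """Sum the second column, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # consume the header through the reader itself
    # 'if row' guards against blank lines, which csv.reader yields as []
    return sum(int(row[1]) for row in reader if row)

print(f"Total: {total_profit(SAMPLE)}")
```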

Get readings from duplicate names in CSV file Python

I am fairly new to Python and am having some issues reading in my csv file. Each row contains a sensor name, a datestamp and a reading. However, the same sensor name appears many times, so I have already built a list of the distinct names, called OPTIONS, shown below:
OPTIONS = []
with open('sensor_data.csv', 'rb') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])
print(OPTIONS)
OPTIONS prints fine, but now I am having issues retrieving any readings and using them to calculate the average and maximum reading for each unique sensor name.
Here are a few lines of sensor_data.csv, which goes from 2018-01-01 to 2018-12-31 for sensor_1 to sensor_25.
Any help would be appreciated.
What you have in the readings variable is just the reading of the current row. One way to get the average is to keep track of the sum and count of the readings (sum_readings and count_readings respectively) and then, after the for loop, divide the sum by the count. You can get the maximum by initializing a max_readings variable with the minimum possible reading (I assume 0) and updating it whenever the current reading is larger (max_readings < readings).
import csv
OPTIONS = []
OPTIONS_READINGS = {}
# text mode: in Python 3, csv.reader needs strings, not bytes
with open('sensor_data.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
            OPTIONS_READINGS[row[0]] = []
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])
        # collect every reading under its sensor name
        OPTIONS_READINGS[row[0]].append(readings)
print(OPTIONS)
for option in OPTIONS_READINGS:
    print(option)
    readings = OPTIONS_READINGS[option]
    print('Max readings:', max(readings))
    print('Average readings:', sum(readings) / len(readings))
Edit: Sorry, I misread the question. If you want the maximum and average for each unique option, a more straightforward way is to use an additional dictionary, OPTIONS_READINGS, whose keys are the option names and whose values are the lists of readings. You can then find the maximum and average reading of an option with the expressions max(OPTIONS_READINGS[option]) and sum(OPTIONS_READINGS[option]) / len(OPTIONS_READINGS[option]) respectively.
A shorter version is below:
import csv
from collections import defaultdict
readings = defaultdict(list)
with open('sensor_data.csv', 'r') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        readings[row[0]].append(float(row[2]))
for sensor_name, values in readings.items():
    print('Sensor: {}, Max readings: {}, Avg: {}'.format(sensor_name, max(values), sum(values) / len(values)))
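The running sum, count and maximum approach described in the first answer can also be applied per sensor without storing every reading. This is only a sketch: the sample rows are made up in the assumed layout (sensor name, datestamp, reading), `running_stats` is a hypothetical name, and the maximum starts at negative infinity instead of 0 so that negative readings are handled too.

```python
import csv
import io

# Made-up rows in the assumed layout: sensor name, datestamp, reading.
SAMPLE = """sensor_1,2018-01-01,10.0
sensor_2,2018-01-01,-3.0
sensor_1,2018-01-02,14.0
sensor_2,2018-01-02,-1.0
"""

def running_stats(csv_text):
    """Return {sensor: (max, average)} using running totals only."""
    stats = {}
    for name, _stamp, value in csv.reader(io.StringIO(csv_text)):
        reading = float(value)
        # create the per-sensor accumulator the first time a name is seen
        s = stats.setdefault(name, {"sum": 0.0, "count": 0, "max": float("-inf")})
        s["sum"] += reading
        s["count"] += 1
        if s["max"] < reading:
            s["max"] = reading
    return {name: (s["max"], s["sum"] / s["count"]) for name, s in stats.items()}

for sensor, (maximum, average) in running_stats(SAMPLE).items():
    print(sensor, maximum, average)
```

Compared with keeping a list per sensor, this uses constant memory per sensor, which may matter for a year of readings from 25 sensors.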

Python: CSV Traceback Error

I'm using python to create a program that involves me creating a CSV file and storing details in this file. I also need the program to read from the file and print the specified details. The following code shows how I have implemented the CSV file and how I make the program read from it.
with open("SpeedTracker.csv", "a", newline="") as csvfile:
    writeToFile=csv.writer(csvfile)
    writeToFile.writerow([("Reg: ",RegistrationPlate),('First Camera Time:',FirstCameraTime),("Second Camera Time:",SecondCameraTime),("Average Speed:",AverageSpeed2,"MPH"),])
with open('SpeedTracker.csv', newline='') as csvfile:
    SpeedDetails = csv.reader(csvfile, delimiter=',')
    for Reg, Average in SpeedDetails:
        print(Reg, Average)
However, whenever I run the code and follow the instructions as a user, I get an error that I can't understand. The error looks like this:
Traceback (most recent call last):
  File "main.py", line 24, in <module>
    for Reg, Average in SpeedDetails:
ValueError: too many values to unpack (expected 2)
exited with non-zero status
I don't know what I'm supposed to do to correct this. Can someone please show me where I'm going wrong and teach me the right method so that I know what to do in the future?
Thanks a lot for the help,
Mohammed.
with open('SpeedTracker.csv', newline='') as csvfile:
    rows = csv.reader(csvfile, delimiter=',')
    for row in rows:
        # each row is a list of four fields, so index it directly
        reg = row[0]
        firstCam = row[1]
        secondCam = row[2]
        AvgSpeed = row[3]
        print(reg)
        print(firstCam)
        print(secondCam)
        print(AvgSpeed)
There are two problems with the code you gave. 1) You need to loop over each row before you try to retrieve the data in its columns. 2) There are four items in each row, but you are trying to unpack these four items into two variables (Reg and Average).
A better approach would be to write out the csv headers and create a more conventional CSV file, like so:
import csv
import os
RegistrationPlate = FirstCameraTime = SecondCameraTime = AverageSpeed2 = 2
with open("SpeedTracker.csv", "a", newline="") as csvfile:
    writeToFile=csv.writer(csvfile)
    #if the csv headers have not been written yet, write them
    if os.path.getsize("SpeedTracker.csv") == 0:
        writeToFile.writerow(["Reg", "First Camera Time", "Second Camera Time", "Average Speed"])
    writeToFile.writerow([RegistrationPlate,FirstCameraTime,SecondCameraTime,AverageSpeed2])
with open('SpeedTracker.csv', newline='') as csvfile:
    rows = csv.reader(csvfile, delimiter=',')
    next(rows) #skip headers
    for row in rows:
        reg = row[0]
        firstCam = row[1]
        secondCam = row[2]
        AvgSpeed = row[3]
        print(reg)
        print(firstCam)
        print(secondCam)
        print(AvgSpeed)
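To illustrate the unpacking rule behind the ValueError, here is a small sketch. The file contents are made up, in the header-plus-four-columns layout used above, and are read from an in-memory string (an assumption, to keep the example self-contained). Unpacking in the for statement works as long as the number of names matches the number of fields in each row.

```python
import csv
import io

# Made-up file contents: one header row, then four fields per data row.
SAMPLE = """Reg,First Camera Time,Second Camera Time,Average Speed
AB12CDE,10:00:00,10:00:30,62
"""

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
records = []
# four names for four fields, so the unpacking succeeds
for reg, first_cam, second_cam, avg_speed in reader:
    records.append((reg, avg_speed))
    print(reg, avg_speed)

# Unpacking a four-field row into two names reproduces the error.
try:
    reg, average = ["AB12CDE", "10:00:00", "10:00:30", "62"]
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```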

Python: import csv, do math with specific cells, output new columns to new (or same) CSV

I've been working on this for days-
I have a CSV that looks like:
COL A || COL B|| COL C||
0.1 || 0.0 || 0.5 ||
0.4 || 60 || 0.6 ||
0.3 || -60 || 0.5 ||
...
0.2 || -60 || 0.4 ||
There are 25 rows of numbers- they all vary slightly.
I want to import this CSV using Python, do some simple math (e.g. finding the average of cells A1 and C1), then either write a new column to a whole new CSV file or add a new column to the beginning of my current (or a duplicated) file.
I know the actual math part is easy. It's the importing, manipulating, then exporting a new column that I just cannot get.
Here's what I've tried:
1) First I tried importing the csv, converting it to a list, reading the columns I need, then exporting to a new csv. The issue is that when I export to the CSV it doesn't create columns. It just adds things to a single cell that look like (0.111, 1.002, ..).
import csv
ofile=open('duplicate.csv', "w")
writer=csv.writer(ofile, delimiter=',')
with open('/Users/myCSV.csv', 'rb') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    avg=[]
    high=[]
    #average number
    for i in range(1,25):
        x=float(mycsv[i][16])
        avg.append(x)
    #print avg
    average=zip(avg)
    #highest number
    for i in range(1,25):
        x=float(mycsv[i][15])
        high.append(x)
    #print high
    highest=zip(high)
    print highest
    writer.writerow([average,highest])
ofile.close()
2) Then I tried adding a new column to a duplicate file and writing information into that column. I adapted this from another similar question. It doesn't work: I get the error "TypeError: can only assign an iterable".
import csv
infilename = r'/Users/myCSV.csv'
outfilename = r'/Users/myCSV_duplicate.csv'
with open(infilename, 'rb') as fp_in, open(outfilename, 'wb') as fp_out:
    reader = csv.reader(fp_in, delimiter=",")
    writer = csv.writer(fp_out, delimiter=",")
    headers = next(reader) # read title row
    headers[0:0] = ['avg']
    writer.writerow(headers)
    for row in reader:
        for i in range(1,25):
            mycsv=list(reader)
            row[0:0] = float(mycsv[i][15])
        writer.writerow(row)
I've been at this for DAYS can someone please help!?!?!?
I've written all of this in MATLAB but need to transfer it over to Python... MATLAB was easier to figure out.
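Use pandas. This is what it is designed for.
import pandas as pd
# a regex separator needs the python parsing engine; \s* strips the padding around ||
df = pd.read_csv('test.csv', sep=r'\s*\|\|\s*', engine='python', usecols=[0, 1, 2])
df['avg'] = df[['COL A', 'COL C']].mean(axis=1)
# write the full table, including the new column
df.to_csv('test2.csv', index=False)
# write only the new column; columns expects a list of names
df.to_csv('tes3.csv', index=False, columns=['avg'])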
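For comparison, a sketch of the same idea using only the standard csv module: read each row, compute the average of the first and third columns, and write the row back out with the new value prepended. The sample is a made-up, comma-delimited version of the data (an assumption; the question's file uses || separators), and `add_avg_column` is a hypothetical name. Note that slice assignment such as row[0:0] = value requires an iterable, which is why the new value is wrapped in a list here.

```python
import csv
import io

# Made-up comma-delimited sample based on the first rows shown in the question.
SAMPLE = """COL A,COL B,COL C
0.1,0.0,0.5
0.4,60,0.6
0.3,-60,0.5
"""

def add_avg_column(csv_text):
    """Return CSV text with an 'avg' column (mean of columns 0 and 2) prepended."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    headers = next(reader)
    writer.writerow(["avg"] + headers)
    for row in reader:
        avg = (float(row[0]) + float(row[2])) / 2
        # wrap the new value in a list so it can be concatenated with the row
        writer.writerow([round(avg, 3)] + row)
    return out.getvalue()

print(add_avg_column(SAMPLE))
```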

How to select data rows

I have a data file with 4 columns and 180,000 rows. I'd like to select entire rows of data to be saved to a new file, based on the criterion that the value in column 3 is within a specific interval, i.e. min value < column 3 value < max value.
Any ideas how to do this?
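Use the csv module to read and write, then just filter:
import csv
# inputfilename, outputfilename, minval and maxval are assumed to be defined already
with open(inputfilename, 'r', newline='') as inputfile, open(outputfilename, 'w', newline='') as outputfile:
    reader = csv.reader(inputfile)
    writer = csv.writer(outputfile)
    for row in reader:
        # column 3 is index 2; use < instead of <= for strict bounds
        if minval <= int(row[2]) <= maxval:
            writer.writerow(row)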
This can be done with a simple CSV read/write.
It can be done more elegantly, in vectorized form, using NumPy; since the number of rows is large, NumPy may also be considerably faster.
import numpy as np
# Load file into a 'MATRIX'
data = np.loadtxt('name_of_delimited_file.txt')
# Find indices where the condition is met
# (min_val and max_val are your interval bounds; min and max are built-in functions)
idx_condition_met = (data[:,2] > min_val) & (data[:,2] < max_val)
np.savetxt('output.txt', data[idx_condition_met], delimiter=',')
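A small, self-contained sketch of the filtering step, using made-up rows read from an in-memory string (both are assumptions; `select_rows` is a hypothetical name). It applies the strict bounds from the question, min value < column 3 value < max value, and converts with float so that non-integer values are accepted.

```python
import csv
import io

# Made-up 4-column rows; the third column is the one being tested.
SAMPLE = """1,a,5.0,x
2,b,12.5,y
3,c,20.0,z
"""

def select_rows(csv_text, min_val, max_val):
    """Return rows whose third column lies strictly between the bounds."""
    return [row for row in csv.reader(io.StringIO(csv_text))
            if min_val < float(row[2]) < max_val]

selected = select_rows(SAMPLE, 5.0, 20.0)
print(selected)
```

Rows equal to either bound are excluded here; switching to <= would include them.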
