I am fairly new to Python and am having some issues reading in my CSV file. The columns hold sensor names, datestamps and readings. The same sensor name appears many times, so I have already built a list of the unique names called OPTIONS, shown below:
OPTIONS = []
with open('sensor_data.csv', 'rb') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])
print(OPTIONS)
OPTIONS prints fine, but now I am having issues retrieving any readings and using them to calculate the average and maximum reading for each unique sensor name.
Here are a few lines of sensor_data.csv, which goes from 2018-01-01 to 2018-12-31 for sensor_1 to sensor_25.
Any help would be appreciated.
The readings variable only ever holds the reading from the current row. One way to get the average is to keep a running sum and count of the readings (sum_readings and count_readings) and, after the for loop, divide the sum by the count. For the maximum, initialize a max_readings variable to the smallest possible reading (I assume 0) and update it whenever the current reading is larger (max_readings < readings).
import csv

OPTIONS = []
OPTIONS_READINGS = {}  # sensor name -> list of readings
# Text mode ('r') so that csv.reader receives strings under Python 3
with open('sensor_data.csv', 'r') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
            OPTIONS_READINGS[row[0]] = []
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])
        OPTIONS_READINGS[row[0]].append(readings)
print(OPTIONS)

for option in OPTIONS_READINGS:
    print(option)
    readings = OPTIONS_READINGS[option]
    print('Max readings:', max(readings))
    print('Average readings:', sum(readings) / len(readings))
Edit: Sorry, I misread the question. If you want the maximum and average for each unique option, a more straightforward way is to use an additional dictionary, OPTIONS_READINGS, whose keys are the option names and whose values are lists of readings. You can then find the maximum and average reading of an option with the expressions max(OPTIONS_READINGS[option]) and sum(OPTIONS_READINGS[option]) / len(OPTIONS_READINGS[option]) respectively.
A shorter version is below:
import csv
from collections import defaultdict

readings = defaultdict(list)
with open('sensor_data.csv', 'r') as f:
    reader = csv.reader(f, delimiter = ',')
    for row in reader:
        readings[row[0]].append(float(row[2]))

for sensor_name, values in readings.items():
    print('Sensor: {}, Max readings: {}, Avg: {}'.format(sensor_name, max(values), sum(values) / len(values)))
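The running sum, count and maximum idea from the first answer can also be applied per sensor, so that no list of readings has to be kept in memory. This is only a sketch: the sample rows and the summarize helper are made up for illustration, and it assumes Python 3 and the name, datestamp, reading column order described in the question.

```python
import csv
import io

# Hypothetical sample rows in the layout described above: name, datestamp, reading
SAMPLE = """sensor_1,2018-01-01,10.0
sensor_2,2018-01-01,3.5
sensor_1,2018-01-02,14.0
sensor_2,2018-01-02,4.5
"""

def summarize(lines):
    """Return {sensor_name: (maximum, average)} using running totals."""
    stats = {}  # sensor_name -> [sum, count, max]
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        name, value = row[0], float(row[2])
        if name not in stats:
            # Start the maximum at the first reading instead of 0
            stats[name] = [value, 1, value]
        else:
            entry = stats[name]
            entry[0] += value
            entry[1] += 1
            entry[2] = max(entry[2], value)
    return {name: (mx, total / count)
            for name, (total, count, mx) in stats.items()}

summary = summarize(io.StringIO(SAMPLE))
for name, (maximum, average) in summary.items():
    print('Sensor: {}, Max: {}, Avg: {}'.format(name, maximum, average))
```

Starting the maximum at the first reading rather than 0 means the result is still right if a sensor only ever reports negative values. For the real file, pass the open file object to summarize in place of the io.StringIO wrapper.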
I have a csv file that contains this information
grade,low,high
S235,360,510
S275,370,530
S355,470,630
I want to find the difference between high and low, but I don't know how to do this. I thought I could do it with numpy (np.mean), but it came up with an error.
This is the code I have written so far:
#!/usr/bin/python3
import csv

csvfile = open('/home/aa408/steel.csv')
csvreader = csv.DictReader(csvfile)
print("%7s %8s %8s" % ("Grade", "Low MPa", "Max MPa") )
total = 0
count = 0
for gradeInfo in csvreader:
    print("%7s %8s %8s" % (gradeInfo["grade"], gradeInfo["low"],
                           gradeInfo["high"]) )
The following will calculate the difference between the high and low columns assuming they will only be integers.
import csv

with open('/home/aa408/steel.csv') as cf:
    reader = csv.DictReader(cf)
    for row in reader:
        # DictReader yields strings, so convert before subtracting
        print(int(row['high']) - int(row['low']))
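For a version that can be run without the file on disk, here is a small sketch that feeds the three sample rows from the question through csv.DictReader and keeps the grade next to each difference. The strength_ranges helper name is made up for this example.

```python
import csv
import io

# The sample rows quoted in the question
STEEL_CSV = """grade,low,high
S235,360,510
S275,370,530
S355,470,630
"""

def strength_ranges(lines):
    """Return a list of (grade, high - low) tuples."""
    return [(row['grade'], int(row['high']) - int(row['low']))
            for row in csv.DictReader(lines)]

ranges = strength_ranges(io.StringIO(STEEL_CSV))
for grade, diff in ranges:
    print("%7s %8d" % (grade, diff))
```

With the real file, an open file object can be passed in place of the io.StringIO wrapper.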
I'm trying to add up all the values in a given column of a CSV file with Python but have had a number of difficulties doing so.
Here is the closest I've come:
from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    for row in csv_reader:
        total = 0
        total = total + int(row[1])
    print(total)
Instead of yielding the sum of all the values in row[1], the final print statement yields only the last number in that column. What am I doing incorrectly?
I've also stumbled over skipping the header (the next() calls I've seen in other examples on SO seem to be from Python 2, and that method no longer plays nicely in Python 3), so I just temporarily changed the header for that column to 0 by hand.
Any help would be much appreciated.
It seems you are resetting the total variable to zero on every iteration.
To fix it, move the variable initialization outside the for loop, so that it only happens once:
total = 0
for row in csv_reader:
    total = total + int(row[1])
from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    total = 0
    for row in csv_reader:
        total = total + int(row[1])
    print(total)
total should be moved outside the for loop.
Indentation matters in Python; for example, the import line should start at the left margin.
You are resetting your total, try this:
from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    total = 0
    for row in csv_reader:
        total = total + int(row[1])
    print(total)
As others have already stated, you are resetting the value of total on every iteration. You can move total = 0 outside of the loop or, alternatively, use sum:
from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    # Column index 1, matching row[1] in the question
    total = sum(int(x[1]) for x in csv_reader)
    print(total)
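On the header problem mentioned in the question: the built-in next() function still works in Python 3. What went away is the reader.next() method that older Python 2 examples call. A minimal sketch, with made-up sample data and a hypothetical column_total helper:

```python
import csv
import io

# Hypothetical file contents: a header row, then a label and a number per row
SAMPLE = """name,amount
a,10
b,20
c,12
"""

def column_total(lines, column=1):
    """Sum one column of integers, ignoring the first (header) row."""
    csv_reader = csv.reader(lines)
    # Built-in next() consumes the header; the None default avoids
    # a StopIteration error if the input is empty.
    next(csv_reader, None)
    return sum(int(row[column]) for row in csv_reader)

total = column_total(io.StringIO(SAMPLE))
print(total)
```

This way the header in the real file does not have to be edited by hand.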
This should be an easy one, but I'm having a bit of a brain fart. The CSV holds a list of four latitude and longitude pairs. With the code below, printing row[0] prints just the latitudes and printing row[1] prints the longitudes. How do I change the code to print a specific lat/lon pair instead? Say, the second lat/lon pair in the CSV.
import csv

with open('120101.KAP.csv','rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print row[0]
Looping over reader gives you each row in turn. If you only want the second row, use the next() function instead: call it once to discard the first row, then again to get the second:
reader = csv.reader(csvfile)
next(reader) # ignore
row = next(reader) # second row
print row # print the second row.
You can generalise this by using the itertools.islice() object to do the skipping for you:
from itertools import islice

reader = csv.reader(csvfile)
# Start the slice at index rownumber; next() then returns that row
row = next(islice(reader, rownumber, None))
print row
Take into account that counting starts at 0, so "second row" is rownumber = 1.
Or you could just read all rows into a list and index into that:
reader = csv.reader(csvfile)
rows = list(reader)
print rows[1] # print the second row
print rows[3] # print the fourth row
Only do this (loading everything into a list) if there are a limited number of rows. Iteration over the reader only produces one row at a time and uses a file buffer for efficient reading, limiting how much memory is used; you could process gigantic CSV files this way.
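The skip-ahead approach can be sketched in Python 3 syntax as follows. The coordinates are invented stand-ins for the real file, and nth_row is a hypothetical helper name.

```python
import csv
import io
from itertools import islice

# Hypothetical latitude/longitude pairs standing in for the real file
SAMPLE = """44.5,-68.1
44.6,-68.2
44.7,-68.3
44.8,-68.4
"""

def nth_row(lines, rownumber):
    """Return row `rownumber` (0-based), or None if the input is shorter."""
    reader = csv.reader(lines)
    # islice skips the first `rownumber` rows; next() reads the one after them
    return next(islice(reader, rownumber, None), None)

lat, lon = nth_row(io.StringIO(SAMPLE), 1)  # the second pair
print(lat, lon)
```

Only the rows up to the requested one are read, so this stays cheap even on a large file.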
I have a CSV file whose columns hold values that I read into specific places in a dictionary, and whose rows are separate instances of data that each make up one full dictionary. I read a row in, use the data to compute certain values, process some of the inputs, and so on, before moving on to the next row. My question is: if I have a header that specifies the names of the columns (Key1 versus Key 3A, etc.), can I use that information to avoid the somewhat drawn-out code I am currently using (below)?
with open(input_file, 'rU') as controlFile:
    reader = csv.reader(controlFile)
    next(reader, None)  # skip the headers
    for row in reader:
        # Grabbing all the necessary inputs
        inputDict = {}
        inputDict["key1"] = row[0]
        inputDict["key2"] = row[1]
        inputDict["key3"] = {}
        inputDict["key3"].update({"A" : row[2]})
        inputDict["key3"].update({"B" : row[3]})
        inputDict["key3"].update({"C" : row[4]})
        inputDict["key3"].update({"D" : row[5]})
        inputDict["key3"].update({"E" : row[6]})
        inputDict["Key4"] = {}
        inputDict["Key4"].update({"F" : row[7]})
        inputDict["Key4"].update({"G" : float(row[8])})
        inputDict["Key4"].update({"H" : row[9]})
If you use a DictReader, you can improve your code a bit:
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the optional
fieldnames parameter. The fieldnames parameter is a sequence whose
elements are associated with the fields of the input data in order.
These elements become the keys of the resulting dictionary. If the
fieldnames parameter is omitted, the values in the first row of the
csvfile will be used as the fieldnames.
So, if we utilize that:
import csv
import string

results = []
# (letter, column index) pairs: A-E are columns 2-6, F-H are columns 7-9
mappings = [
    [(string.ascii_uppercase[i-2], i) for i in range(2, 7)],
    [(string.ascii_uppercase[i-2], i) for i in range(7, 10)]]
with open(input_file, 'r') as control_file:
    reader = csv.DictReader(control_file)
    for row in reader:
        # DictReader rows are keyed by header name, so translate each
        # column index into its header via reader.fieldnames
        row_data = {}
        row_data['key1'] = row['key1']
        row_data['key2'] = row['key2']
        row_data['key3'] = {k: row[reader.fieldnames[v]] for k, v in mappings[0]}
        row_data['key4'] = {k: row[reader.fieldnames[v]] for k, v in mappings[1]}
        results.append(row_data)
Yes, you can.
import csv

with open(infile, 'r') as infile:
    reader = csv.DictReader(infile)
    for row in reader:
        print(row)
Take a look at this piece of code.
# csv_data is assumed to be a csv.reader over the open file
fields = next(csv_data)  # the header row
parsed_data = []
for row in csv_data:
    parsed_data.append(dict(zip(fields, row)))
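To make the header-driven idea concrete, here is a sketch that builds the nested dictionaries from column names. The header names, the GROUPS layout and the parse_rows helper are all assumptions made for illustration, since the real header is not shown in the question; the sample is also cut down to fewer columns than the real file.

```python
import csv
import io

# Hypothetical header names; the real file's column names are not shown
SAMPLE = """key1,key2,A,B,F,G
x,y,1,2,3,4.5
"""

# Which columns go into which nested dictionary (an assumed layout)
GROUPS = {'key3': ['A', 'B'], 'key4': ['F', 'G']}

def parse_rows(lines):
    """Return one nested dictionary per data row."""
    results = []
    for row in csv.DictReader(lines):
        row_data = {'key1': row['key1'], 'key2': row['key2']}
        for group, columns in GROUPS.items():
            row_data[group] = {name: row[name] for name in columns}
        # G is converted to float, as in the question's code
        row_data['key4']['G'] = float(row_data['key4']['G'])
        results.append(row_data)
    return results

parsed = parse_rows(io.StringIO(SAMPLE))
print(parsed[0])
```

Adding a column to a group then only means editing GROUPS, not the loop body.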
I am using this Python code to look through a CSV file that has dates in one column and values in the other. I want to record the minimum value for each year, but my code is not looping through correctly. What's my stupid mistake? Cheers
import csv

refMin = 40
with open('data.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',',quotechar='|', quoting=csv.QUOTE_ALL)
    for i in range(1968,2014):
        for row in reader:
            if str(row[0])[:4] == str(i):
                if float(row[1]) <= refMin:
                    refMin = float(row[1])
        print 'The minimum value for ' + str(i) + ' is: ' + str(refMin)
The reader can only be iterated once. The first time around the for i in range(1968,2014) loop, you consume every item in the reader. So the second time around that loop, there are no items left.
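A tiny demonstration of that exhaustion, in Python 3 syntax with two made-up rows:

```python
import csv
import io

# Hypothetical two-row input
reader = csv.reader(io.StringIO("1968-01-01,5\n1969-01-01,7\n"))

first_pass = [row for row in reader]   # consumes every row
second_pass = [row for row in reader]  # nothing left to read

print(len(first_pass), len(second_pass))
```

The second loop gets nothing because the reader has already reached the end of its input.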
If you want to compare every value of i against every row in the file, you could swap your loops around, so that the loop for row in reader is on the outside and only runs once, with multiple runs of the i loop instead. Or you could create a new reader each time round, although that might be slower.
If you want to process the entire file in one pass, you'll need to create a dictionary of values to replace refMin. When processing each row, either iterate through the dictionary keys, or look it up based on the current row. On the other hand, if you're happy to read the file multiple times, just move the line reader = csv.reader(...) inside the outer loop.
Here's an untested idea for doing it in one pass:
import csv
import collections

refMin = collections.defaultdict(lambda:40)
with open('data.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',',quotechar='|', quoting=csv.QUOTE_ALL)
    allowed_years = set(str(i) for i in range(1968,2014))
    for row in reader:
        year = int(str(row[0])[:4])
        if float(row[1]) <= refMin[year]:
            refMin[year] = float(row[1])
for year in range(1968, 2014):
    print 'The minimum value for ' + str(year) + ' is: ' + str(refMin[year])
defaultdict is just like a regular dictionary except that it has a default value for keys that haven't previously been set.
I would refactor that to read the file only once:
import csv
from collections import defaultdict

refByYear = defaultdict(list)  # year (as a string) -> list of values
with open('data.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',',quotechar='|', quoting=csv.QUOTE_ALL)
    for row in reader:
        refByYear[str(row[0])[:4]].append(float(row[1]))
for year in range(1968, 2014):
    print 'The minimum value for ' + str(year) + ' is: ' + str(min(refByYear[str(year)]))
Here I store all values for each year, which may be useful for other purposes, or totally useless.
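For comparison, here is a one-pass sketch in Python 3 syntax that keeps only the running minimum per year, so it needs neither a starting value of 40 nor a list of every reading. The sample data and the min_by_year helper are made up, and it assumes the default CSV dialect rather than the quoting options used in the question.

```python
import csv
import io

# Hypothetical data: a date column and a value column
SAMPLE = """1968-01-01,12.5
1968-06-01,9.0
1969-03-01,20.0
1969-09-01,21.5
"""

def min_by_year(lines):
    """Return {year: smallest value seen for that year}."""
    minimums = {}
    for row in csv.reader(lines):
        year = int(row[0][:4])
        value = float(row[1])
        # The first value for a year always becomes its minimum
        if year not in minimums or value < minimums[year]:
            minimums[year] = value
    return minimums

minimums = min_by_year(io.StringIO(SAMPLE))
for year in sorted(minimums):
    print('The minimum value for {} is: {}'.format(year, minimums[year]))
```

Years with no rows simply do not appear in the result, which avoids reporting a placeholder minimum for them.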