Difficulties Iterating over CSV file in Python

I'm trying to add up all the values in a given row in a CSV file with Python but have had a number of difficulties doing so.
Here is the closest I've come:
from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    for row in csv_reader:
        total = 0
        total = total + int(row[1])
print(total)
Instead of yielding the sum of all the values in row[1], the final print statement yields only the last number read. What am I doing incorrectly?
I've also struggled with skipping the header (the next() I've seen widely used in other examples on SO seems to be from Python 2, and that method no longer plays nice in Python 3), so I just manually, temporarily changed the header for that column to 0.
Any help would be much appreciated.

It seems you are resetting the total variable to zero on every iteration.
To fix it, move the variable initialization outside the for loop, so that it only happens once:
total = 0
for row in csv_reader:
    total = total + int(row[1])

from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    total = 0
    for row in csv_reader:
        total = total + int(row[1])
print(total)
total should be initialized outside the for loop.
Indentation is important in Python; e.g. the import line should be at the left-most margin.

You are resetting your total, try this:
from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    total = 0
    for row in csv_reader:
        total = total + int(row[1])
print(total)

As others have already stated, you are resetting the value of total on every iteration. You can move total = 0 outside of the loop or, alternatively, use sum:
from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    total = sum(int(x[1]) for x in csv_reader)
print(total)
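On the header question: in Python 3 the built-in next() function still works on a csv reader; it is only the reader.next() method that was Python 2. A minimal sketch, assuming simpleData.csv has a single header line above the data:

from csv import reader

with open("simpleData.csv") as file:
    csv_reader = reader(file)
    next(csv_reader)  # skip the header row instead of editing the file by hand
    total = sum(int(row[1]) for row in csv_reader)
print(total)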

Related

Python: How to iterate every third row starting with the second row of a csv file

I'm trying to write a program that iterates through a csv file row by row. It will create 3 new csv files and write data from the source csv file to each of them. The program does this for the entire row length of the csv file.
For the first if statement, I want it to copy every third row starting at the first row and save it to a new csv file (the next rows it copies would be row 4, row 7, row 10, etc.).
For the second if statement, I want it to copy every third row starting at the second row and save it to a new csv file (the next rows it copies would be row 5, row 8, row 11, etc.).
For the third if statement, I want it to copy every third row starting at the third row and save it to a new csv file (the next rows it copies would be row 6, row 9, row 12, etc.).
The second "if" statement I wrote, the one that creates "agentList1.csv", works exactly the way I want it to, but I can't figure out how to get the first "elif" statement to start from the second row and the second "elif" statement to start from the third row. Any help would be much appreciated!
Here's my code:
for index, row in Sourcedataframe.iterrows():  # going through each row line by line
    # this for loop counts the amount of times it has gone through the csv file.
    # If it has gone through it more than three times, it resets the counter back to 1.
    for column in Sourcedataframe:
        if count > 3:
            count = 1
        # if the program is on its first count, it opens 'Sourcedataframe' and
        # reads/writes every third row to a new csv file named 'agentList1.csv'.
        if count == 1:
            with open('blankAgentList.csv') as infile:
                with open('agentList1.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        elif count == 2:
            with open('blankAgentList.csv') as infile:
                with open('agentList2.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        elif count == 3:
            with open('blankAgentList.csv') as infile:
                with open('agentList3.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
    count = count + 1  # counts how many times it has run through the main for loop.
Convert the csv to a dataframe (df.to_csv(header=True)) to start indexing from the second row; then pass the row/record number to iloc to fetch a particular record, e.g. df.iloc[3, :].
You are opening your csv file from the beginning in each if clause. I believe you have already read your file into Sourcedataframe, so just get rid of reader = csv.DictReader(infile) and read the data like this:
Sourcedataframe.iloc[column]
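A rough pandas sketch of that idea (assuming Sourcedataframe comes from the same blankAgentList.csv used in the question and that agentList1.csv to agentList3.csv are the desired outputs):

import pandas as pd

# hypothetical: load the source data once into a DataFrame
Sourcedataframe = pd.read_csv('blankAgentList.csv')

# iloc slicing with a step of 3 selects every third row, offset by the start index
for start in range(3):
    Sourcedataframe.iloc[start::3].to_csv('agentList{}.csv'.format(start + 1), index=False)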
Using plain Python we can create a solution that works for any number of interleaved data rows, let's call it NUM_ROWS, not just three.
Nota bene: the solution does not require reading and keeping all the input data in memory. It processes one line at a time, grouping only the last few needed lines, and works fine for a very large input file.
Assuming your input file contains a number of data rows which is a multiple of NUM_ROWS, i.e. the rows can be split evenly to the output files:
NUM_ROWS = 3

outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1, NUM_ROWS + 1)]

with open('blankAgentList.csv') as infile:
    header = infile.readline()  # read/skip the header
    for f in outfiles:          # repeat header in all output files if needed
        f.write(header)
    row_groups = zip(*[iter(infile)] * NUM_ROWS)
    for rg in row_groups:
        for f, r in zip(outfiles, rg):
            f.write(r)

for f in outfiles:
    f.close()
Otherwise, for any number of data rows we can use
import itertools as it

NUM_ROWS = 3

outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1, NUM_ROWS + 1)]

with open('blankAgentList.csv') as infile:
    header = infile.readline()  # read/skip the header
    for f in outfiles:          # repeat header in all output files if needed
        f.write(header)
    row_groups = it.zip_longest(*[iter(infile)] * NUM_ROWS)
    for rg in row_groups:
        for f, r in it.zip_longest(outfiles, rg):
            if r is None:
                break
            f.write(r)

for f in outfiles:
    f.close()
which, for example, with an input file of
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
produces (output copied straight from the terminal)
(base) SO $ cat blankAgentList.csv
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
(base) SO $ cat blankAgentList1.csv
A,B,C
r1a,r1b,r1c
r4a,r4b,r4c
r7a,r7b,r7c
(base) SO $ cat blankAgentList2.csv
A,B,C
r2a,r2a,r2c
r5a,r5b,r5c
(base) SO $ cat blankAgentList3.csv
A,B,C
r3a,r3b,r3c
r6a,r6b,r6c
Note: I understand the line
row_groups = zip(*[iter(infile)]*NUM_ROWS)
may be intimidating at first (it was for me when I started).
All it does is group consecutive lines from the input file.
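For example, a quick illustration of the same idiom on a plain list (the sample values are made up):

lines = ['r1\n', 'r2\n', 'r3\n', 'r4\n', 'r5\n', 'r6\n']
chunks = zip(*[iter(lines)] * 3)   # three references to one iterator, so each tuple
                                   # pulls three consecutive items
print(list(chunks))                # [('r1\n', 'r2\n', 'r3\n'), ('r4\n', 'r5\n', 'r6\n')]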
If your objective includes learning Python, I recommend studying it thoroughly via a book or a course or both and practising a lot.
One key subject is the iteration protocol, along with all the other protocols. And namespaces.

Get readings from duplicate names in CSV file Python

I am fairly new at Python and am having some issues reading in my csv file. There are sensor names, datestamps and readings in each column. However, the same sensor name appears multiple times, so I have already made a list of the distinct names, called OPTIONS, as shown below:
OPTIONS = []

with open('sensor_data.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])

print(OPTIONS)
OPTIONS prints fine, but now I am having issues retrieving the readings and using them to calculate the average and maximum reading for each unique sensor name.
Here are a few lines of sensor_data.csv, which goes from 2018-01-01 to 2018-12-31 for sensor_1 to sensor_25.
Any help would be appreciated.
What you have in the readings variable is just the reading of the current row. One way to get the average is to keep track of the sum and count of readings (sum_readings and count_readings respectively); after the for loop you can get the average by dividing the sum by the count. You can get the maximum by initializing a max_readings variable with a minimum value (I assume 0) and then updating it whenever the current reading is larger (max_readings < readings).
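A minimal sketch of that running-total approach, assuming the same column layout as the question (sensor name, datestamp, reading) and that readings are never negative:

import csv

sum_readings = 0.0
count_readings = 0
max_readings = 0.0  # assumes readings are never negative

with open('sensor_data.csv', 'r') as f:
    for row in csv.reader(f, delimiter=','):
        reading = float(row[2])
        sum_readings += reading
        count_readings += 1
        if max_readings < reading:
            max_readings = reading

print('Average readings:', sum_readings / count_readings)
print('Max readings:', max_readings)

The full code below instead takes the dictionary-based route described in the edit further down, keeping a separate list of readings per sensor.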
import csv

OPTIONS = []
OPTIONS_READINGS = {}

with open('sensor_data.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if row[0] not in OPTIONS:
            OPTIONS.append(row[0])
            OPTIONS_READINGS[row[0]] = []
        sensor_name = row[0]
        datastamp = row[1]
        readings = float(row[2])
        print(OPTIONS)
        OPTIONS_READINGS[row[0]].append(readings)

for option in OPTIONS_READINGS:
    print(option)
    readings = OPTIONS_READINGS[option]
    print('Max readings:', max(readings))
    print('Average readings:', sum(readings) / len(readings))
Edit: Sorry, I misread the question. If you want to get the maximum and average of each unique option, there is a more straightforward way, which is to use an additional dictionary-type variable OPTIONS_READINGS whose keys are the option names and whose values are the lists of readings. You can find the maximum and average reading of an option simply with max(OPTIONS_READINGS[option]) and sum(OPTIONS_READINGS[option]) / len(OPTIONS_READINGS[option]) respectively.
A shorter version below:
import csv
from collections import defaultdict

readings = defaultdict(list)

with open('sensor_data.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        readings[row[0]].append(float(row[2]))

for sensor_name, values in readings.items():
    print('Sensor: {}, Max readings: {}, Avg: {}'.format(sensor_name, max(values), sum(values) / len(values)))

How to count the number of lines written by csv.writer?

I'm attempting to keep track of the number of lines that are being written by my csv.writer.
On running the code, len(list(reader)) identifies the correct number of rows and, if it is under 100, the writer proceeds to insert 2 new rows; that's all good. But after the first loop len(list(reader)) always comes out as 0 rows, causing an infinite loop. I assumed this was a memory problem, since the writer seems to write to memory and flush to disk at the end, but flushing the file or recreating the reader instance doesn't help.
import csv
import time

row = [('test', 'test2', 'test3', 'test4'), ('testa', 'testb', 'testc', 'testd')]

with open('test.csv', 'r+', newline='') as csv_file:
    writer = csv.writer(csv_file)
    while True:
        # moved reader inside loop to recreate its instance, had no effect
        reader = csv.reader(csv_file, delimiter=';')
        num = len(list(reader))
        if num <= 100:
            print(num)
            writer.writerows(row)
            csv_file.flush()  # flush() had no effect
            time.sleep(1)
        else:
            print(num)
            break
How could I get len(list(reader)) to keep track of the file's content at all times?
I don't see the need to create the reader object in the loop where you're writing into the csv. What you can do is:
import csv

count = 0
li = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

with open('random.csv', 'w') as file:
    writer = csv.writer(file)
    for row in li:
        writer.writerow(row)

with open('random.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        if len(row) != 0:
            count += 1
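If the goal really is to keep re-counting the same open file while appending to it, a sketch of an alternative (assuming, as in the question, that test.csv already exists) is to rewind the file with seek(0) before each count. The underlying issue is that reading the file once leaves the file position at end-of-file, so every later read sees nothing:

import csv

rows = [('test', 'test2', 'test3', 'test4'), ('testa', 'testb', 'testc', 'testd')]

with open('test.csv', 'r+', newline='') as csv_file:
    writer = csv.writer(csv_file)
    while True:
        csv_file.seek(0)                       # rewind so the reader sees the whole file
        num = len(list(csv.reader(csv_file)))  # reading leaves the position at end-of-file
        print(num)
        if num > 100:
            break
        writer.writerows(rows)                 # appended at the current (end) position
        csv_file.flush()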

Make a new list from CSV

So, I've searched for a method to show a certain csv field based on input, and I've tried to apply the code to my program. But the problem is that I want to get certain items from the csv and make a new list from certain indexes.
I have csv file like this:
code,place,name1,name2,name3,name4
001,Home,Laura,Susan,Ernest,Toby
002,Office,Jack,Rachel,Victor,Wanda
003,Shop,Paulo,Roman,Brad,Natali
004,Other,Charles,Matthew,Justin,Bono
At first I had this code, and it works to show the whole row:
import csv

number = input('Enter number to find\n')
csv_file = csv.reader(open('residence.csv', 'r'), delimiter=",")
for row in csv_file:
    if number == row[0]:
        print(row)
**input : 001
**result : [001, Home, Laura, Susan, Ernest, Toby]
Then I tried to take certain items from the matching row and add them to a new list, but it didn't work. Here's the code:
import csv

res = []
y = 2
number = input('Enter number to find\n')
csv_file = csv.reader(open('residence.csv', 'r'), delimiter=",")
for row in csv_file:
    if number == row[0]:
        while y <= 5:
            res.append(str(row[y]))
            y = y + 1
print(res)
**input : 001
**expected result : [Laura, Susan, Ernest, Toby]
I want to make a new list that contains the name1, name2, name3, and name4 fields, and then print that list. But I guess the loop is wrongly placed or I missed something.
There are a couple of things you could fix in your code:
1. You are not skipping the header line when iterating through the rows. This means you will not always match an actual data row.
2. Your y variable is never re-initialized. It would be more idiomatic to use a for loop instead of a while anyhow.
3. If more than one row matches, it will break (see 2.). If you know you will never have more than one match, you should break after you append the values to the list.
4. Your file is never closed. It should also be opened with newline='' (see the csv module docs).
5. Lastly, you match the actual string ('001') vs. an integer (1), which could be a source of confusion when entering the input.
An updated version:
import csv

res = []
number = input('Enter number to find\n')

with open('residence.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    next(csv_reader)  # Skip header line
    for row in csv_reader:
        if number == row[0]:
            for i in range(2, 6):
                res.append(str(row[i]))
            break

print(res)

Python script not looping correctly

I am using this python code to look through a csv, which has dates in one column and values in the other. I am recording the minimum value from each year. My code is not looping through correctly. What's my stupid mistake? Cheers
import csv

refMin = 40

with open('data.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_ALL)
    for i in range(1968, 2014):
        for row in reader:
            if str(row[0])[:4] == str(i):
                if float(row[1]) <= refMin:
                    refMin = float(row[1])
        print('The minimum value for ' + str(i) + ' is: ' + str(refMin))
The reader can only be iterated once. The first time around the for i in range(1968,2014) loop, you consume every item in the reader. So the second time around that loop, there are no items left.
If you want to compare every value of i against every row in the file, you could swap your loops around, so that the loop for row in reader is on the outside and only runs once, with multiple runs of the i loop instead. Or you could create a new reader each time round, although that might be slower.
If you want to process the entire file in one pass, you'll need to create a dictionary of values to replace refMin. When processing each row, either iterate through the dictionary keys, or look it up based on the current row. On the other hand, if you're happy to read the file multiple times, just move the line reader = csv.reader(...) inside the outer loop.
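For the read-the-file-multiple-times option, a rough sketch (staying close to the question's code, and resetting refMin for each year so earlier years don't leak into later ones) could be:

import csv

for i in range(1968, 2014):
    refMin = 40  # reset the running minimum for each year
    with open('data.csv') as csvfile:
        # a fresh reader per year, so every pass sees the whole file again
        reader = csv.reader(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_ALL)
        for row in reader:
            if str(row[0])[:4] == str(i) and float(row[1]) <= refMin:
                refMin = float(row[1])
    print('The minimum value for ' + str(i) + ' is: ' + str(refMin))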
Here's an untested idea for doing it in one pass:
import csv
import collections

refMin = collections.defaultdict(lambda: 40)

with open('data.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_ALL)
    allowed_years = set(str(i) for i in range(1968, 2014))
    for row in reader:
        if str(row[0])[:4] not in allowed_years:
            continue  # ignore rows outside the year range
        year = int(str(row[0])[:4])
        if float(row[1]) <= refMin[year]:
            refMin[year] = float(row[1])

for year in range(1968, 2014):
    print('The minimum value for ' + str(year) + ' is: ' + str(refMin[year]))
defaultdict is just like a regular dictionary except that it has a default value for keys that haven't previously been set.
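For instance (a small illustration; the values here are made up):

import collections

refMin = collections.defaultdict(lambda: 40)
print(refMin[1970])   # 40 -- the key did not exist, so it is created with the default
refMin[1970] = 12.5
print(refMin[1970])   # 12.5
print(refMin[1999])   # 40 -- still the default, since it was never updated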
I would refactor that to read the file only once:
import csv
from collections import defaultdict

refByYear = defaultdict(list)

with open('data.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_ALL)
    for row in reader:
        refByYear[str(row[0])[:4]].append(float(row[1]))

for year in range(1968, 2014):
    print('The minimum value for ' + str(year) + ' is: ' + str(min(refByYear[str(year)])))
Here I store all values for each year, which may be useful for other purposes, or totally useless.
