I'm trying to keep track of the number of lines written by my csv.writer.
When I run the code, len(list(reader)) identifies the correct number of rows and, if it is under 100, the writer inserts 2 new rows. That's all good, but after the first loop len(list(reader)) always comes out as 0 rows, causing an infinite loop. I assumed this was a memory problem, since the writer seems to write to memory and flush to disk at the end, but flushing the file or recreating the reader instance doesn't help.
import csv
import time
row = [('test', 'test2', 'test3', 'test4'), ('testa', 'testb', 'testc', 'testd')]
with open('test.csv', 'r+', newline='') as csv_file:
    writer = csv.writer(csv_file)
    while True:
        # moved reader inside loop to recreate its instance had no effect
        reader = csv.reader(csv_file, delimiter=';')
        num = len(list(reader))
        if num <= 100:
            print(num)
            writer.writerows(row)
            csv_file.flush()  # flush() had no effect
            time.sleep(1)
        else:
            print(num)
            break
How can I get len(list(reader)) to reflect the file's current contents at all times?
I don't see the need to create the reader object inside the loop where you're writing to the CSV. What you can do is write first, then reopen the file and count the rows:
import csv
count = 0
li = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
with open('random.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for row in li:
        writer.writerow(row)  # call writerow on the writer object, not on the csv module
with open('random.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        if len(row) != 0:  # skip blank lines
            count += 1
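The count drops to 0 in the question because a file object has a single position: once list(reader) has read to the end, the next read starts at the end and finds nothing, no matter how often the reader is recreated or the file flushed. Below is a minimal sketch of the original loop with a rewind before each count. The function name append_until and its parameters are made up for illustration, and the sleep is left out.

```python
import csv
import os

def append_until(path, rows, limit):
    """Append `rows` to `path` until it holds more than `limit` rows.

    Returns the final row count. Hypothetical helper, not part of the csv module.
    """
    with open(path, 'r+', newline='') as csv_file:
        writer = csv.writer(csv_file)
        while True:
            csv_file.seek(0)  # rewind; the previous read left the position at end of file
            num = sum(1 for _ in csv.reader(csv_file))
            if num > limit:
                return num
            csv_file.seek(0, os.SEEK_END)  # move to the end before appending
            writer.writerows(rows)
            csv_file.flush()
```

If the only goal is to know how many rows have been written, counting once and then adding len(rows) after each writerows call avoids re-reading the file altogether.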
Related
My program needs a function that reads data from a csv file ("all.csv"), extracts all the data pertaining to 'Virginia' (each row that has 'Virginia' in it), and then writes the extracted data to another csv file named "Virginia.csv". The program runs without error; however, when I open the "Virginia.csv" file, it is blank. My guess is that the issue is with my nested for loop, but I am not entirely sure what is causing it.
Here is the data within the all.csv file:
https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
Here is my code:
import csv
input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []
def extract_records_for_state (input_file, output_file, state):
    with open(input_file, 'r') as infile:
        contents = infile.readlines()
    with open(output_file, 'w') as outfile:
        writer = csv.writer(outfile)
        for row in range(len(contents)):
            contents[row] = contents[row].split(',') #split elements
        for row in range(len(contents)):
            for word in range(len(contents[row])):
                if contents[row][2] == state:
                    writer.writerow(row)
extract_records_for_state(input_file,output_file,state)
I ran your code and it gave me an error
Traceback (most recent call last):
  File "c:\Users\Dolimight\Desktop\Stack Overflow\Geraldo\main.py", line 27, in <module>
    extract_records_for_state(input_file, output_file, state)
  File "c:\Users\Dolimight\Desktop\Stack Overflow\Geraldo\main.py", line 24, in extract_records_for_state
    writer.writerow(row)
_csv.Error: iterable expected, not int
I fixed the error by passing the contents of the row (contents[row]) to writerow() and ran it again, and the data showed up in Virginia.csv. It gave me duplicates, so I also removed the word for-loop.
import csv
input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []
def extract_records_for_state(input_file, output_file, state):
    with open(input_file, 'r') as infile:
        contents = infile.readlines()
    with open(output_file, 'w') as outfile:
        writer = csv.writer(outfile)
        for row in range(len(contents)):
            contents[row] = contents[row].split(',')  # split elements
        # print(contents)  # debug output; very slow on the full dataset
        for row in range(len(contents)):
            if contents[row][2] == state:
                writer.writerow(contents[row])  # this is what I changed
extract_records_for_state(input_file, output_file, state)
You have two errors. The first is that you try to write the row index at writer.writerow(row) - the row is contents[row]. The second is that you leave the newline in the final column on read but don't strip it on write. Instead you could leverage the csv module more fully. Let the reader parse the rows. And instead of reading into a list, which uses a fair amount of memory, filter and write row by row.
import csv
input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []
def extract_records_for_state (input_file, output_file, state):
    with open(input_file, 'r', newline='') as infile, \
            open(output_file, 'w', newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        # add header
        writer.writerow(next(reader))
        # filter for state
        writer.writerows(row for row in reader if row[2] == state)
extract_records_for_state(input_file,output_file,state)
Looking at your code, two things jump out at me:
- I see a bunch of nested statements (logic).
- I see you reading a CSV as plain text, then interpreting it as CSV yourself (contents[row] = contents[row].split(',')).
I recommend two things:
- Break up logic into distinct chunks: all that nesting can be hard to interpret and debug; do one thing, prove that works; do another thing, prove that works; etc.
- Use the CSV API to its fullest: use it to both read and write your CSVs.
I don't want to try to replicate or fix your code; instead I'm offering this general approach to achieve those two goals:
import csv
# Read in
all_rows = []
with open('all.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header (I didn't see you keep it)
    for row in reader:
        all_rows.append(row)
# Process
filtered_rows = []
for row in all_rows:
    if row[2] == 'Virginia':
        filtered_rows.append(row)
# Write out
with open('filtered.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(filtered_rows)
Once you understand both the logic and the API of those discrete steps, you can move on (advance) to composing something more complex, like the following which reads a row, decides if it should be written, and if so, writes it:
import csv
with open('filtered.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    with open('all.csv', 'r', newline='') as f_in:
        reader = csv.reader(f_in)
        next(reader)  # discard header
        for row in reader:
            if row[2] == 'Virginia':
                writer.writerow(row)
Using either of those two pieces of code on this (really scaled-down) sample of all.csv:
date,county,state,fips,cases,deaths
2020-03-09,Fairfax,Virginia,51059,4,0
2020-03-09,Virginia Beach city,Virginia,51810,1,0
2020-03-09,Chelan,Washington,53007,1,1
2020-03-09,Clark,Washington,53011,1,0
gets me a filtered.csv that looks like:
2020-03-09,Fairfax,Virginia,51059,4,0
2020-03-09,Virginia Beach city,Virginia,51810,1,0
Given the size of this dataset, the second approach of write-on-demand-inside-the-read-loop is both faster (about 5x faster on my machine) and uses significantly less memory (about 40x less on my machine) because there's no intermediate storage with all_rows.
But, please take the time to run both, read them carefully, and see how each works the way it does.
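In the same spirit of leaning on the CSV API, here is a sketch that filters by column name instead of by position, using csv.DictReader and csv.DictWriter. Unlike the two snippets above, it keeps the header row in the output. The function name filter_by_column and its parameters are illustrative only; it assumes the input has a header row that contains the named column.

```python
import csv

def filter_by_column(in_path, out_path, column, value):
    """Copy the header plus every row whose `column` equals `value`.

    Returns the number of data rows written. Hypothetical helper for illustration.
    """
    written = 0
    with open(in_path, 'r', newline='') as f_in, \
            open(out_path, 'w', newline='') as f_out:
        reader = csv.DictReader(f_in)  # the first row supplies the field names
        writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[column] == value:
                writer.writerow(row)
                written += 1
    return written
```

For the question's data, the call would be filter_by_column('all.csv', 'Virginia.csv', 'state', 'Virginia'). Rows are still written one at a time, so nothing beyond the current row is held in memory.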
I need to find the profit/loss from two different lines of a csv file. I can't find a way to hold a variable while on one row and then, once I move on to another line, still have that variable available to make a comparison.
I have already tried the next() function but have had no luck.
import csv
symbolCode = input("Please enter a symbol code: ")
with open("prices.csv", "r") as f:
    reader = csv.reader(f, delimiter=",")
    with open(symbolCode + ".csv", "w") as d:
        writer = csv.writer(d)
        for row in reader:
            item = 0
            item2 = 0
            if symbolCode == row[1]:
                print(row)
                writer.writerow(row)
    d.close()
I expect the output to be a single number: the result of subtracting one of those two values from the other.
Are you looking for something like this?
import csv
symbolCode = input("Please enter a symbol code: ")
with open("prices.csv", "r") as f:
    reader = csv.reader(f, delimiter=",")
    with open(symbolCode + ".csv", "w") as d:
        writer = csv.writer(d)
        previous_row = None  # <--- initialize with special (empty/none) value
        for row in reader:
            item = 0
            item2 = 0
            if symbolCode == row[1]:
                print(row)
                writer.writerow(row)
            if previous_row is not None:  # <-- if we're not processing the very first row.
                # csv fields are strings; convert with float() for a numeric comparison
                if previous_row[7] < row[7]:  # <-- do your comparison with previous row
                    print("7th value is bigger now")  # <-- do something
            previous_row = row  # <-- store this row to be the previous row in the next loop iteration
Note that I've left out the d.close() line. It's not needed when you open a file in a with statement. Other than that, I only added lines to your example, and marked these lines with # <-- comments.
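To turn the "remember the previous row" idea into an actual profit/loss figure, one option is to keep only the previous price for the chosen symbol and subtract. This is a sketch under assumptions: the column positions (1 for the symbol, 7 for the value) are taken from the snippet above, the value column is assumed to hold a number, and the function name price_changes is made up for illustration.

```python
def price_changes(rows, symbol, symbol_col=1, price_col=7):
    """Return the change in price between consecutive rows for `symbol`.

    `rows` is any iterable of rows, for example a csv.reader.
    Column positions are assumptions; adjust them to the real file layout.
    """
    changes = []
    previous_price = None
    for row in rows:
        if row[symbol_col] != symbol:
            continue  # ignore rows for other symbols (and the header)
        price = float(row[price_col])  # csv fields are strings; convert before arithmetic
        if previous_price is not None:
            changes.append(price - previous_price)
        previous_price = price
    return changes
```

A positive entry means the value rose compared with the previous matching row, and a negative entry means it fell.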
I have been given the task of writing a script to check the MX records of the data in a CSV file. I planned to start by checking it with a regex, and before that I am trying to read the CSV file. I would also like to log progress, so I am printing the row number it is on, but whenever I use the csv_reader object to calculate the row count, I am unable to get inside the for loop.
import csv
with open('test_list.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    data = list(csv_reader)
    row_count = len(data)
    for row in csv_reader:
        print({row[2]})
        line_count += 1
        print('Checking '+ str(line_count) +' of '+ str(row_count))
    print('Processed lines :'+str(row_count))
I only get the result as
Processed lines : 40
New at python scripting. Please help
My test_list.csv looks like this:
fname, lname, email
bhanu2, singh2, bhanudoesnotexist#doesnotexit.com
bhanu2, singh2, bhanudoesnotexist#doesnotexit.com
bhanu2, singh2, bhanudoesnotexist#doesnotexit.com
bhanu2, singh2, bhanudoesnotexist#doesnotexit.com
Total 40 times continued
First, the CSV data itself has nothing to do with this problem.
Solution:
import csv
with open("test_list.csv", "r", newline="") as f:
    input_file = f.readlines()  # read every line up front so the total is known
print(len(input_file))
csv_reader = csv.reader(input_file)  # csv.reader accepts any iterable of lines
line_count = 0
# data = list(csv_reader)
# row_count = len(data)
for row in csv_reader:
    print({row[2]})
    line_count += 1
    print('Checking ' + str(line_count) + ' of ' + str(len(input_file)))
print('Processed lines :' + str(len(input_file)))
Problem Recognition:
with open('test_list.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    data = list(csv_reader)
    row_count = len(data)
In your code, the line data = list(csv_reader) exhausts the reader, so the for loop that follows has nothing left to loop over.
To avoid that, you can read the csv file like this:
input_file = open("test_list.csv", "r").readlines()
print(len(input_file))
and then pass the resulting list of lines to csv.reader().
csv.reader returns an iterable, and when you use list(csv_reader) to read all the rows of the CSV, you have already exhausted the iterable, so when you want to iterate through csv_reader again with a for loop, it has nothing left to iterate.
Since you have a complete list of rows materialized in the variable data, you can simply iterate over it instead.
Change:
for row in csv_reader:
to:
for row in data:
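The exhaustion is easy to see in isolation. The sketch below uses an in-memory io.StringIO in place of test_list.csv (the sample rows are invented) and shows that a second pass over the same reader yields nothing, while the saved list can be iterated as often as needed.

```python
import csv
import io

# In-memory stand-in for the file; the contents are made up for this example.
csv_file = io.StringIO("fname,lname,email\na,b,c\nd,e,f\n")
csv_reader = csv.reader(csv_file)

data = list(csv_reader)      # consumes every row from the reader
leftover = list(csv_reader)  # the reader is now exhausted

print(len(data))      # 3
print(len(leftover))  # 0

# Iterate over the saved list instead; enumerate supplies the progress counter.
for line_count, row in enumerate(data, start=1):
    print('Checking ' + str(line_count) + ' of ' + str(len(data)))
```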
I'm trying to add up all the values in a given row in a CSV file with Python but have had a number of difficulties doing so.
Here is the closest I've come:
from csv import reader
with open("simpleData.csv") as file:
    csv_reader = reader(file)
    for row in csv_reader:
        total = 0
        total = total + int(row[1])
    print(total)
Instead of yielding the sum of all the values in row[1], the final print statement yields only the last number. What am I doing incorrectly?
I've also stumbled with bypassing the header (the next() that I've seen widely used in other examples on SO seem to be from Python 2, and this method no longer plays nice in P3), so I just manually, temporarily changed the header for that column to 0.
Any help would be much appreciated.
It seems you are resetting the total variable to zero on every iteration.
To fix it, move the variable initialization outside the for loop, so that it only happens once:
total = 0
for row in csv_reader:
    total = total + int(row[1])
from csv import reader
with open("simpleData.csv") as file:
    csv_reader = reader(file)
    total = 0
    for row in csv_reader:
        total = total + int(row[1])
    print(total)
total should be moved outside the for loop.
Indentation is significant in Python. For example, the import line should start at the left margin.
You are resetting your total, try this:
from csv import reader
with open("simpleData.csv") as file:
    csv_reader = reader(file)
    total = 0
    for row in csv_reader:
        total = total + int(row[1])
    print(total)
As others have already stated, you are resetting total on every iteration. You can move total = 0 outside of the loop or, alternatively, use sum:
from csv import reader
with open("simpleData.csv") as file:
    csv_reader = reader(file)
    total = sum(int(x[1]) for x in csv_reader)  # column 1, as in the question
print(total)
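On the header problem mentioned in the question: the built-in next() does work in Python 3. What changed is that the Python 2 method call reader.next() no longer exists. A minimal sketch that skips the header and sums a column follows; the function name sum_column and the sample data are invented for illustration.

```python
import csv
import io

def sum_column(lines, column):
    """Sum the integer values in `column`, skipping the header row."""
    csv_reader = csv.reader(lines)
    next(csv_reader, None)  # skip the header; the default avoids StopIteration on empty input
    return sum(int(row[column]) for row in csv_reader)

# In-memory stand-in for simpleData.csv.
sample = io.StringIO("name,amount\na,1\nb,2\nc,3\n")
print(sum_column(sample, 1))  # 6
```

With a real file, pass the open file object instead of the StringIO, so there is no need to edit the header by hand.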
I'm following some feedback from another thread, but have gotten stuck. I want to search an existing csv file to locate the row in which a string occurs, and then update that row with new data.
What I have so far gives me a "TypeError: unhashable type: 'list'":
allLDR = []
with open('myfile.csv', mode='rb') as f:
    reader = csv.reader(f)
    #allLDR.extend(reader)
    for num, row in enumerate(reader):
        if myField in row[0]:
            rowNum = row
line_to_override = {rowNum:[nMaisonField, entreeField, indiceField, cadastreField]}
with open('myfile.csv', 'wb') as ofile:
    writer = csv.writer(ofile, quoting=csv.QUOTE_NONE, delimiter=',')
    #for line, row in enumerate(allLDR):
    for line, row in enumerate(reader):
        data = line_to_override.get(line, row)
        writer.writerow(data)
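If the line allLDR.extend(reader) is active, it consumes all of the input lines from the csv.reader object, so the for loop never runs and rowNum = row is never executed. When the loop does run, rowNum = row stores the whole row, which is a list, and a list cannot be used as a dictionary key; that is what raises "TypeError: unhashable type: 'list'" in {rowNum: ...}. Store the index instead: rowNum = num.
Try commenting out the allLDR.extend(reader) line.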
As a debugging aid, try adding print statements inside the for loop and inside the conditional.
Here is a program which does what I think you want your program to do: it reads in myfile.csv, modifies rows conditionally based on the content of the first cell, and writes the file back out.
import csv
with open('myfile.csv', mode='rb') as ifile:
    allDR = list(csv.reader(ifile))
for row in allDR:
    if 'fowl' in row[0]:
        row[:] = ['duck', 'duck', 'goose']
with open('myfile.csv', 'wb') as ofile:
    csv.writer(ofile).writerows(allDR)
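The program above is written for Python 2, where csv files are opened in binary mode ('rb' and 'wb'). In Python 3 the csv module expects text-mode files opened with newline=''. Here is a sketch of the same read, modify, write-back idea for Python 3, wrapped in a function whose name and parameters are made up for illustration.

```python
import csv

def replace_matching_rows(path, needle, new_row):
    """Rewrite `path`, replacing each row whose first cell contains `needle`.

    Returns the number of rows replaced. Hypothetical helper for illustration.
    """
    with open(path, 'r', newline='') as ifile:
        all_rows = list(csv.reader(ifile))  # read everything before reopening for writing
    replaced = 0
    for row in all_rows:
        if row and needle in row[0]:  # `row and` guards against blank lines
            row[:] = new_row
            replaced += 1
    with open(path, 'w', newline='') as ofile:
        csv.writer(ofile).writerows(all_rows)
    return replaced
```

The whole file is read into memory before the output file is opened, because opening the same path in 'w' mode truncates it immediately.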