I'm trying to write a program that iterates through a CSV file row by row. It should create three new CSV files and write data from the source CSV file to each of them, covering every row of the source file.
For the first if statement, I want it to copy every third row starting at the first row and save it to a new CSV file (the next rows it copies would be row 4, row 7, row 10, etc.).
For the second if statement, I want it to copy every third row starting at the second row and save it to a new CSV file (the next rows it copies would be row 5, row 8, row 11, etc.).
For the third if statement, I want it to copy every third row starting at the third row and save it to a new CSV file (the next rows it copies would be row 6, row 9, row 12, etc.).
The second "if" statement I wrote, which creates the first file, "agentList1.csv", works exactly the way I want it to, but I can't figure out how to get the first "elif" statement to start from the second row and the second "elif" statement to start from the third row. Any help would be much appreciated!
Here's my code:
for index, row in Sourcedataframe.iterrows(): #going through each row line by line
    #this for loop counts the amount of times it has gone through the csv file. If it has gone through it more than three times, it resets the counter back to 1.
    for column in Sourcedataframe:
        if count > 3:
            count = 1
    #if program is on it's first count, it opens the 'Sourcedataframe', reads/writes every third row to a new csv file named 'agentList1.csv'.
    if count == 1:
        with open('blankAgentList.csv') as infile:
            with open('agentList1.csv', 'w') as outfile:
                reader = csv.DictReader(infile)
                writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                writer.writeheader()
                for row in reader:
                    count2 += 1
                    if not count2 % 3:
                        writer.writerow(row)
    elif count == 2:
        with open('blankAgentList.csv') as infile:
            with open('agentList2.csv', 'w') as outfile:
                reader = csv.DictReader(infile)
                writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                writer.writeheader()
                for row in reader:
                    count2 += 1
                    if not count2 % 3:
                        writer.writerow(row)
    elif count == 3:
        with open('blankAgentList.csv') as infile:
            with open('agentList3.csv', 'w') as outfile:
                reader = csv.DictReader(infile)
                writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                writer.writeheader()
                for row in reader:
                    count2 += 1
                    if not count2 % 3:
                        writer.writerow(row)
    count = count + 1 #counts how many times it has ran through the main for loop.
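Read the CSV into a DataFrame with pd.read_csv (by default the first line becomes the header, so row indexing starts at the first data row; df.to_csv(header=True) goes the other way and writes a DataFrame out as CSV).
Then pass the row number to iloc to fetch a particular record, for example:
df.iloc[3, :]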
You are opening your CSV file from the beginning in each if clause. I believe you have already loaded the file into Sourcedataframe, so just get rid of reader = csv.DictReader(infile) and read the data like this:
Sourcedataframe.iloc[column]
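Another way to avoid reopening the source file in every branch is a single pass that sends data row i to output file i % 3. The following is only a sketch under assumptions: split_round_robin is a hypothetical helper name, it uses the standard csv module rather than the DataFrame, and the demo writes its files into a temporary directory.

```python
import csv
import os
import tempfile


def split_round_robin(src_path, out_paths):
    """Write data row i (0-based) of src_path to out_paths[i % len(out_paths)]."""
    with open(src_path, newline="") as infile:
        reader = csv.reader(infile)
        header = next(reader)  # assumes the source file has a header line
        outfiles = [open(p, "w", newline="") for p in out_paths]
        try:
            writers = [csv.writer(f) for f in outfiles]
            for w in writers:
                w.writerow(header)  # repeat the header in every output file
            for i, row in enumerate(reader):
                writers[i % len(writers)].writerow(row)
        finally:
            for f in outfiles:
                f.close()


# Demo with a small made-up input: a header plus seven data rows.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "blankAgentList.csv")
with open(src, "w", newline="") as f:
    csv.writer(f).writerows(
        [["A", "B"]] + [[f"r{i}a", f"r{i}b"] for i in range(1, 8)]
    )

outs = [os.path.join(workdir, f"agentList{k}.csv") for k in (1, 2, 3)]
split_round_robin(src, outs)

with open(outs[1], newline="") as f:
    second = list(csv.reader(f))
print(second)  # header, then rows 2 and 5
```

Because the row index alone decides the destination, no counter has to be reset and the source is read exactly once.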
Using plain Python, we can create a solution that works for any number of interleaved data rows (let's call it NUM_ROWS), not just three.
Nota bene: the solution does not need to read and keep the whole input in memory. It processes one line at a time, grouping only the last few lines needed, and works fine for a very large input file.
Assuming your input file contains a number of data rows which is a multiple of NUM_ROWS, i.e. the rows can be split evenly among the output files:
NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1,NUM_ROWS+1)]
with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header
    for f in outfiles: # repeat header in all output files if needed
        f.write(header)
    row_groups = zip(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in zip(outfiles, rg):
            f.write(r)
for f in outfiles:
    f.close()
Otherwise, for any number of data rows we can use
import itertools as it
NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1,NUM_ROWS+1)]
with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header
    for f in outfiles: # repeat header in all output files if needed
        f.write(header)
    row_groups = it.zip_longest(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in it.zip_longest(outfiles, rg):
            if r is None:
                break
            f.write(r)
for f in outfiles:
    f.close()
which, for example, with an input file of
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
produces (output copied straight from the terminal)
(base) SO $ cat blankAgentList.csv
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
(base) SO $ cat blankAgentList1.csv
A,B,C
r1a,r1b,r1c
r4a,r4b,r4c
r7a,r7b,r7c
(base) SO $ cat blankAgentList2.csv
A,B,C
r2a,r2a,r2c
r5a,r5b,r5c
(base) SO $ cat blankAgentList3.csv
A,B,C
r3a,r3b,r3c
r6a,r6b,r6c
Note: I understand the line
row_groups = zip(*[iter(infile)]*NUM_ROWS)
may be intimidating at first (it was for me when I started).
All it does is simply to group consecutive lines from the input file.
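To see the grouping in isolation, here is a small sketch on an in-memory list instead of a file (the sample lines are made up). The list [shared] * NUM_ROWS holds the same iterator object three times, so zip pulls three consecutive items from it for each tuple it builds.

```python
from itertools import zip_longest

NUM_ROWS = 3
lines = ["r1\n", "r2\n", "r3\n", "r4\n", "r5\n", "r6\n", "r7\n"]

# One iterator, referenced NUM_ROWS times: equivalent to zip(shared, shared, shared).
shared = iter(lines)
groups = list(zip(*[shared] * NUM_ROWS))
print(groups)
# zip stops at the first incomplete group, so the leftover "r7\n" is dropped.

# zip_longest keeps the incomplete group and pads it with None.
padded = list(zip_longest(*[iter(lines)] * NUM_ROWS))
print(padded)
```

This is why the zip version needs a row count that is a multiple of NUM_ROWS, while the zip_longest version has to check for None before writing.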
If your objective includes learning Python, I recommend studying it thoroughly via a book or a course or both and practising a lot.
One key subject is the iteration protocol, along with all the other protocols. And namespaces.
I'm attempting to keep track of the number of lines being written by my csv.writer.
On running the code, len(list(reader)) identifies the correct number of rows and, if it is under 100, the writer proceeds to insert 2 new rows. That's all good, but after the first loop len(list(reader)) always comes out as 0 rows, causing an infinite loop. I assumed this was a memory problem, since the writer seems to write to memory and flush to disk at the end, but flushing the file or recreating the reader instance doesn't help.
import csv
import time
row = [('test', 'test2', 'test3', 'test4'), ('testa', 'testb', 'testc', 'testd')]
with open('test.csv', 'r+', newline='') as csv_file:
    writer = csv.writer(csv_file)
    while True:
        # moved reader inside loop to recreate its instance had no effect
        reader = csv.reader(csv_file, delimiter=';')
        num = len(list(reader))
        if num <= 100:
            print(num)
            writer.writerows(row)
            csv_file.flush() # flush() had no effect
            time.sleep(1)
        else:
            print(num)
            break
How could I get len(list(reader)) to keep track of the file's content at all times?
I don't see the need to create the reader object in the loop where you're writing into the CSV. What you can do is:
import csv
count = 0
li = [[1,2,3,4,5],[6,7,8,9,10]]
with open('random.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for row in li:
        writer.writerow(row)  # writerow is a method of the writer object, not of the csv module
with open('random.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        if len(row) != 0:  # skip blank rows
            count += 1
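As for why the count in the question drops to 0: the first len(list(reader)) reads to the end of the file, and a new csv.reader on the same handle starts from that position, so it finds nothing. A minimal sketch of one way around this, assuming the file is small enough to re-read on every pass, is to seek back to the start before counting. count_rows is a hypothetical helper, and the demo uses a temporary file and stops at 6 rows instead of 100.

```python
import csv
import os
import tempfile


def count_rows(csv_file):
    """Count CSV rows from the start of an open file, then move back to the end."""
    csv_file.seek(0)                # rewind; otherwise reading starts at EOF
    n = sum(1 for _ in csv.reader(csv_file))
    csv_file.seek(0, os.SEEK_END)   # make sure the next write appends
    return n


rows = [("test", "test2", "test3", "test4"), ("testa", "testb", "testc", "testd")]

path = os.path.join(tempfile.mkdtemp(), "test.csv")
open(path, "w").close()             # 'r+' requires the file to exist

counts = []
with open(path, "r+", newline="") as csv_file:
    writer = csv.writer(csv_file)
    while True:
        num = count_rows(csv_file)
        counts.append(num)
        if num >= 6:
            break
        writer.writerows(rows)
        csv_file.flush()

print(counts)
```

If the only goal is to know how many rows have been written, keeping a running counter next to the writerows call avoids re-reading the file at all.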
I am new to Python. I have a CSV file from which I want to print a specific row, and I'd appreciate it if you could give me some guidance. For example, in the table below, I want to print a row if its record number is 2:
This image shows an example of my case
I have the code below as a starter, which prints out the headers:
with open(filename, "r") as f:
    reader = csv.reader(f, delimiter="\t")
    first = next(reader)
    print(first[0].split(','))
    for row in filename:
        print()
Thanks!
Your example code seems somewhat confused. I presume the file is actually comma-separated, not tab-delimited; otherwise you wouldn't need to do the first[0].split(',').
Assuming that's the case, maybe something like this would work:
with open(filename, "r") as f:
    reader = csv.reader(f)
    # skip header row
    header = next(reader)
    for row in reader:
        if int(row[0]) == 2:
            print(row)
If you're after a specific row number, you could use enumerate to count rows and print when you get to the correct one.
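The enumerate variant could look like the sketch below. The sample data, the column names, and the variable wanted are made up for illustration, and an in-memory io.StringIO stands in for the real file.

```python
import csv
import io

# Stand-in for the real file: a header plus three data rows.
sample = io.StringIO("record,name\n1,alpha\n2,beta\n3,gamma\n")

reader = csv.reader(sample)
header = next(reader)   # skip header row

wanted = 2              # 1-based position among the data rows
found = None
for position, row in enumerate(reader, start=1):
    if position == wanted:
        found = row
        break           # stop reading once the row is found

print(found)
```

Here the match is by position in the file, not by the value stored in the first column, so it still works when the record numbers are not consecutive.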
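In your for loop, check whether the record number, which is the 0th column, equals 2. csv.reader yields every field as a string, so compare against '2' (or convert with int()), and loop over the reader rather than the file:
for row in reader:
    if row[0] == '2':
        print(row)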
This should be an easy one, but I'm having a bit of a brain fart. The CSV maintains a list of four latitude and longitude pairs. Based on the code, if I print row[0] it prints just the latitudes, and if I print row[1] it prints the longitudes. How do I format the code to print a specific lat/lon pair instead? Say, the second lat/lon pair in the CSV.
import csv
with open('120101.KAP.csv','rb') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print row[0]
Looping over reader gives you each row. If you wanted to get the second row, use the next() function instead, ignore one and get the second:
reader = csv.reader(csvfile)
next(reader) # ignore
row = next(reader) # second row
print row # print the second row.
You can generalise this by using the itertools.islice() object to do the skipping for you:
from itertools import islice
reader = csv.reader(csvfile)
row = next(islice(reader, rownumber, None)) # skip rownumber rows, read the next one
print row
Take into account that counting starts at 0, so "second row" is rownumber = 1.
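A self-contained Python 3 sketch of the islice approach, using made-up coordinates in an in-memory io.StringIO rather than the real file. Passing None as the stop argument means "skip rownumber rows, then keep going", and the second argument to next() avoids a StopIteration if the file is too short.

```python
import csv
import io
from itertools import islice

# Stand-in for the CSV of four lat/lon pairs.
sample = io.StringIO("10.0,20.0\n11.0,21.0\n12.0,22.0\n13.0,23.0\n")

reader = csv.reader(sample)
rownumber = 1  # 0-based, so this is the second pair

# islice(reader, rownumber, None) skips the first `rownumber` rows lazily.
row = next(islice(reader, rownumber, None), None)
print(row)
```

Only the skipped rows and the requested row are read, so this stays cheap in memory even for a large file.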
Or you could just read all rows into a list and index into that:
reader = csv.reader(csvfile)
rows = list(reader)
print rows[1] # print the second row
print rows[3] # print the fourth row
Only do this (loading everything into a list) if there are a limited number of rows. Iteration over the reader only produces one row at a time and uses a file buffer for efficient reading, limiting how much memory is used; you could process gigantic CSV files this way.
I'm following some feedback from another thread, but have gotten stuck. I'm looking to search an existing csv file to locate the row in which a string occurs. I am then looking to update this row with new data.
What I have so far gives me a "TypeError: unhashable type: 'list'":
allLDR = []
with open('myfile.csv', mode='rb') as f:
    reader = csv.reader(f)
    #allLDR.extend(reader)
    for num, row in enumerate(reader):
        if myField in row[0]:
            rowNum = row
    line_to_override = {rowNum:[nMaisonField, entreeField, indiceField, cadastreField]}
with open('myfile.csv', 'wb') as ofile:
    writer = csv.writer(ofile, quoting=csv.QUOTE_NONE, delimiter=',')
    #for line, row in enumerate(allLDR):
    for line, row in enumerate(reader):
        data = line_to_override.get(line, row)
        writer.writerow(data)
The line allLDR.extend(reader) consumes all of the input lines from the csv.reader object. Therefore, the for loop never runs, rowNum = row is never executed, and {rowNum: blah} generates an exception.
Try commenting out the allLDR.extend(reader) line.
As a debugging aid, try adding print statements inside the for loop and inside the conditional.
Here is a program which does what I think you want your program to do: it reads in myfile.csv, modifies rows conditionally based on the content of the first cell, and writes the file back out.
import csv
with open('myfile.csv', mode='rb') as ifile:
    allDR = list(csv.reader(ifile))
for row in allDR:
    if 'fowl' in row[0]:
        row[:] = ['duck', 'duck', 'goose']
with open('myfile.csv', 'wb') as ofile:
    csv.writer(ofile).writerows(allDR)
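If you would rather keep the dictionary lookup from the question, the key has to be hashable. A row is a list, which cannot be a dict key, but the row index from enumerate can. The following is a Python 3 sketch under assumptions: the file contents, my_field, and new_values are made up, and the demo works on a temporary file rather than myfile.csv.

```python
import csv
import os
import tempfile

# Made-up stand-in for myfile.csv.
path = os.path.join(tempfile.mkdtemp(), "myfile.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["chicken fowl", "1", "2"], ["cow", "3", "4"]])

my_field = "fowl"                       # string to search for in the first cell
new_values = ["duck", "duck", "goose"]  # replacement row

# Read everything first, so the file is closed before it is rewritten.
with open(path, newline="") as f:
    all_rows = list(csv.reader(f))

# Key the overrides by row index (an int), not by the row itself (a list).
line_to_override = {
    num: new_values
    for num, row in enumerate(all_rows)
    if row and my_field in row[0]
}

with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    for num, row in enumerate(all_rows):
        writer.writerow(line_to_override.get(num, row))

with open(path, newline="") as f:
    result = list(csv.reader(f))
print(result)
```

Note that in Python 3 the csv module expects text-mode files opened with newline='', not the 'rb' and 'wb' modes used on Python 2.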