How to write on the nth row in a csv using Python?

I have the following problem. Let's say I want to write a word into the cell at column = 1 and row = 3.
I have written this function:
import csv

def write_to_csv(myfile, word):
    with open(myfile, "w", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        write = [word]
        csv_writer.writerow(elt for elt in write)

write_to_csv("output.csv", "hello")
My function writes the word "hello" into the cell at column = 1 and row = 1.
Now imagine that my output.csv already has something in the first cell, and I don't want to overwrite it. So how can I modify my function to write the word "hello" at column = 1 and row = 3?
I found this question, but it did not help me: How to select every Nth row in CSV file using python
Thank you very much!

A CSV file is a text file, which means you should not try to overwrite it in place. The common approach is to copy it to a new file, introducing your changes as you go. When done, you rename the new file to the old name.
Here is possible code. It simply assumes that the file output.csv.tmp does not exist but can be created, and that output.csv has at least 4 lines:
import csv
import os

def write_to_csv(myfile, word, row_nb, col_nb):
    """Updates a csv file by writing word at row_nb row and col_nb column"""
    with open(myfile, newline='') as csv_file, open(myfile + '.tmp', "w", newline='') as out:
        csv_reader = csv.reader(csv_file)
        csv_writer = csv.writer(out)
        # copy the first row_nb rows unchanged
        for i in range(row_nb):
            csv_writer.writerow(next(csv_reader))
        # read and change the expected row
        row = next(csv_reader)
        row[col_nb] = word
        # print(row)  # uncomment for debugging
        csv_writer.writerow(row)
        # copy the remaining rows
        for row in csv_reader:
            csv_writer.writerow(row)
    # replace the original file with the tmp file
    os.remove(myfile)
    os.rename(myfile + '.tmp', myfile)

# write hello at first column of fourth row in output.csv
write_to_csv('output.csv', 'hello', 3, 0)
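If the file is small enough to fit in memory, a simpler variation (just a sketch, assuming output.csv already contains at least four rows; the helper name is only for illustration) is to read all rows into a list, change the one cell, and write everything back:

import csv

def write_to_csv_in_memory(myfile, word, row_nb, col_nb):
    """Read the whole file, change one cell, and write everything back."""
    with open(myfile, newline='') as f:
        rows = list(csv.reader(f))       # load every row into memory
    rows[row_nb][col_nb] = word          # change the single cell
    with open(myfile, 'w', newline='') as f:
        csv.writer(f).writerows(rows)

write_to_csv_in_memory('output.csv', 'hello', 3, 0)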

Related

How to add a header to an existing CSV file without replacing the first row?

What I want to do is actually as it is written in the title.
with open(path, "r+", newline='') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
list_of_column_names = []
num_cols = len(next(csv_reader))
for i in range(num_cols):
list_of_column_names.append(i)
fields = list_of_column_names
with open(example.csv, "r+", newline='') as writeFile:
csvwriter = csv.DictWriter(writeFile, delimiter=',', lineterminator='\n', fieldnames=fields)
writeFile.seek(0, 0)
csvwriter.writeheader()
I want to enumerate the columns, which initially don't have any column names. But when I run the code, it replaces the data in the first row. For example:
example.csv:
a,b
c,d
e,f
what I want:
0,1
a,b
c,d
e,f
what happens after running the code:
0,1
c,d
e,f
Is there a way to prevent this from happening?
There's no magical way to insert a line into an existing text file.
The following is how I think of doing this, and your code is already doing steps 2-4. Also, I wouldn't mess with the DictWriter since you're not trying to convert a Python dict to CSV (I can see you using it for writing the header, but that's easy enough to do with the regular reader/writer):
1. open a new file for writing
2. read the first row of your CSV
3. interpret the column indexes as the header
4. write the header
5. write the first row
6. read/write the rest of the rows
7. move the new file back to the old file, overwrite (not shown)
Here's what that looks like in code:
import csv

with open('output.csv', 'w', newline='') as out_f:
    writer = csv.writer(out_f)
    with open('input.csv', newline='') as in_f:
        reader = csv.reader(in_f)
        # Read the first row
        first_row = next(reader)
        # Count the columns in first row; equivalent to your `for i in range(len(first_row)): ...`
        header = [i for i, _ in enumerate(first_row)]
        # Write header and first row
        writer.writerow(header)
        writer.writerow(first_row)
        # Write rest of rows
        for row in reader:
            writer.writerow(row)
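The last step, moving the new file back over the old one, is not shown above; assuming the file names used in the code, one possible way to do it is with os.replace:

import os

# overwrite input.csv with the freshly written output.csv
os.replace('output.csv', 'input.csv')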

Python: How to iterate every third row starting with the second row of a csv file

I'm trying to write a program that iterates through a csv file row by row. It will create 3 new csv files and write data from the source csv file to each of them, for the entire length of the source file.
For the first if statement, I want it to copy every third row starting at the first row and save it to a new csv file (the next rows it copies would be row 4, row 7, row 10, etc.).
For the second if statement, I want it to copy every third row starting at the second row and save it to a new csv file (the next rows it copies would be row 5, row 8, row 11, etc.).
For the third if statement, I want it to copy every third row starting at the third row and save it to a new csv file (the next rows it copies would be row 6, row 9, row 12, etc.).
The second "if" statement I wrote, which creates "agentList1.csv", works exactly the way I want it to, but I can't figure out how to get the first "elif" statement to start from the second row and the second "elif" statement to start from the third row. Any help would be much appreciated!
Here's my code:
for index, row in Sourcedataframe.iterrows(): #going through each row line by line
    #this for loop counts the amount of times it has gone through the csv file. If it has gone through it more than three times, it resets the counter back to 1.
    for column in Sourcedataframe:
        if count > 3:
            count = 1
        #if program is on it's first count, it opens the 'Sourcedataframe', reads/writes every third row to a new csv file named 'agentList1.csv'.
        if count == 1:
            with open('blankAgentList.csv') as infile:
                with open('agentList1.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        elif count == 2:
            with open('blankAgentList.csv') as infile:
                with open('agentList2.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        elif count == 3:
            with open('blankAgentList.csv') as infile:
                with open('agentList3.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        count = count + 1 #counts how many times it has ran through the main for loop.
Convert the csv to a dataframe (read it with the header, so that the data rows are indexed starting from the second line of the file), then pass the row/record number to the iloc function to fetch a particular record, e.g. df.iloc[3, :].
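As a rough sketch of that pandas approach (assuming pandas is available and the source file is the blankAgentList.csv from the question; the output names mirror the question's agentList files), slicing with iloc picks every third row directly:

import pandas as pd

df = pd.read_csv('blankAgentList.csv')                 # header row becomes the column names
df.iloc[0::3].to_csv('agentList1.csv', index=False)    # data rows 1, 4, 7, ...
df.iloc[1::3].to_csv('agentList2.csv', index=False)    # data rows 2, 5, 8, ...
df.iloc[2::3].to_csv('agentList3.csv', index=False)    # data rows 3, 6, 9, ...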
You are opening your csv file from the beginning in each if clause. I believe you have already loaded the file into Sourcedataframe, so just get rid of reader = csv.DictReader(infile) and read the data like this:
Sourcedataframe.iloc[column]
Using plain Python we can create a solution that works for any number of interleaved data rows, let's call it NUM_ROWS, not just three.
Nota bene: the solution does not require reading and keeping the whole input in memory. It processes one line at a time, grouping only the few lines it needs, and works fine for a very large input file.
Assuming your input file contains a number of data rows which is a multiple of NUM_ROWS, i.e. the rows can be split evenly across the output files:
NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1, NUM_ROWS+1)]
with open('blankAgentList.csv') as infile:
    header = infile.readline()   # read/skip the header
    for f in outfiles:           # repeat header in all output files if needed
        f.write(header)
    row_groups = zip(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in zip(outfiles, rg):
            f.write(r)
for f in outfiles:
    f.close()
Otherwise, for any number of data rows we can use
import itertools as it

NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1, NUM_ROWS+1)]
with open('blankAgentList.csv') as infile:
    header = infile.readline()   # read/skip the header
    for f in outfiles:           # repeat header in all output files if needed
        f.write(header)
    row_groups = it.zip_longest(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in it.zip_longest(outfiles, rg):
            if r is None:
                break
            f.write(r)
for f in outfiles:
    f.close()
which, for example, with an input file of
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
produces (output copied straight from the terminal)
(base) SO $ cat blankAgentList.csv
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
(base) SO $ cat blankAgentList1.csv
A,B,C
r1a,r1b,r1c
r4a,r4b,r4c
r7a,r7b,r7c
(base) SO $ cat blankAgentList2.csv
A,B,C
r2a,r2a,r2c
r5a,r5b,r5c
(base) SO $ cat blankAgentList3.csv
A,B,C
r3a,r3b,r3c
r6a,r6b,r6c
Note: I understand the line
row_groups = zip(*[iter(infile)]*NUM_ROWS)
may be intimidating at first (it was for me when I started).
All it does is group consecutive lines from the input file.
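For a quick feel of what it does, here is a minimal illustration with a plain list standing in for the file:

lines = ['r1\n', 'r2\n', 'r3\n', 'r4\n', 'r5\n', 'r6\n']
its = [iter(lines)] * 3          # three references to the SAME iterator
print(list(zip(*its)))           # [('r1\n', 'r2\n', 'r3\n'), ('r4\n', 'r5\n', 'r6\n')]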
If your objective includes learning Python, I recommend studying it thoroughly via a book or a course or both and practising a lot.
One key subject is the iteration protocol, along with all the other protocols. And namespaces.

CSV code not looping

I'm trying to make a small Python script to speed up things at work and have a small script kind of working, but it's not working as I want it to. Here's the current code:
import re
import csv
#import pdb
#pdb.set_trace()

# Variables
newStock = "newStock.csv" #csv file with list of new stock
allActive = "allActive.csv" #csv file with list of all active
skusToCheck = []
totalNewProducts = 0
i = 0

# Program Start - Open first csv
a = open(newStock)
csv_f = csv.reader(a)

# Copy each row into array thingy
for row in csv_f:
    skusToCheck.append(row[0])

# Get length of array
totalNewProducts = len(skusToCheck)

# Open second csv
b = open(allActive)
csv_f = csv.reader(b)

# Open blank csv file to write to
csvWriter = csv.writer(open('writeToMe.csv', 'w'), delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)

# Check first value in first row,first file against each entry in 2nd row in second file
with open(allActive, 'rt') as b:
    reader = csv.reader(b, delimiter=",")
    for row in reader:
        if skusToCheck[i] == row[1]:
            print(skusToCheck[i]) # output to screen for debugging
            print(row) # debugging
            csvWriter.writerow(row) #write matching row to new file
            i += 1 # increment where we are in the first file
Pseudo code would be:
Open file one and store all values from column one in skusToCheck
Check each value against the values in column 2 of file 2
If it finds a match (once I have this working, I want it to look for partial matches too), copy the row to file 3
If not, move on to the next value in skusToCheck and repeat
I can't seem to get lines 33 - 40 to loop. It will check the first value and find a match in the second file, but won't move onto the next value from skusToCheck.
You need to follow the hint from jonrsharpe's first comment, i.e. modify your while loop to
# Check first value in first row,first file against each entry in 2nd row in second file
with open(allActive, 'rt') as b:
    reader = csv.reader(b, delimiter=",")
    for row in reader:
        if len(row) > 1:
            for sku in skusToCheck:
                if sku == row[1]:
                    print(sku) # output to screen for debugging
                    print(row) # debugging
                    csvWriter.writerow(row) #write matching row to new file
                    break
This checks every sku against each row in allActive.
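If you later want the partial matches mentioned in the question, one possible variation (a sketch that reuses allActive, skusToCheck and csvWriter from the script above, and assumes "partial" means a substring match) swaps the equality test for in:

import csv

with open(allActive, 'rt') as b:
    reader = csv.reader(b, delimiter=",")
    for row in reader:
        if len(row) > 1:
            for sku in skusToCheck:
                if sku in row[1]:            # substring match instead of strict equality
                    csvWriter.writerow(row)  # write matching row to new file
                    break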

How to create a new column in a csv file using Python by shifting one row

I have a CSV file like the one below. It is a huge file with thousands of records.
input.csv
No;Val;Rec;CSR
0;10;1;1200
0;100;2;1300
0;100;3;1300
0;100;4;1400
0;10;5;1200
0;11;6;1200
I want to create an output.csv file by adding a new column "PSR" after the 1st column "No". This column's value depends on the "CSR" column. For the 1st row, "PSR" shall be zero. From the next record onwards, it depends on the "CSR" value in the previous row: if the present and previous records' CSR values are the same, then "PSR" shall be zero; if not, PSR shall have the previous CSR value. For example, the value of CSR in the 2nd row is 1300, which is different from the value in the 1st record (it is 1200), so the PSR value for the 2nd row shall be 1200. Whereas in the 2nd and 3rd rows the CSR value is the same, so the PSR value for the 3rd row shall be zero. So the new PSR value depends on the CSR value in the present and previous rows.
Output.csv
No;PCR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200
My Approach:
Use csv.reader and iterate over the objects in a list. Copy 5th column to 2nd column in list. Shift it one row down.
Then check the values in the 2nd and 5th columns (PCR and CSR); if both values are the same, replace the PCR value with zero.
I have a problem getting the 1st step coded. I am able to duplicate the column but not able to shift it. The 2nd step is quite straightforward.
Also, I am not sure whether this approach is correct. Any pointers/recommendations would be really helpful.
Note: I am not able to install Pandas on CentOS. So help without this module would be better.
My Code:
with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
    reader = csv.reader(input, delimiter = ';')
    writer = csv.writer(output, delimiter = ';')
    mylist = []
    header = next(reader)
    mylist.append(header)
    for rec in reader:
        mylist.append(rec)
        rec.insert(1, rec[3])
        mylist.append(rec)
    writer.writerows(mylist)
If you're open to non-Python solutions then awk could be a good option:
awk 'NR==1{$2="PSR;"$2}NR>1{$2=($4==a?0";"$2:+a";"$2);a=$4}1' FS=';' OFS=';' file
No;PSR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200
Awk is distributed with pretty much all Linux distributions and was designed exactly for this kind of task. It will blaze through your file. Add a redirection to the end > output.csv to save the output in a file.
A simple python approach using the same logic:
#!/usr/bin/env python
last = "0"
with open('input.csv') as csv:
    print next(csv).strip().replace(';', ';PSR;', 1)
    for line in csv:
        field = line.strip().split(';')
        if field[3] == last: field.insert(1, "0")
        else: field.insert(1, last)
        last = field[4]
        print ';'.join(field)
Produces the same output:
$ python parse.py
No;PSR;Val;Rec;CSR
0;0;10;1;1200
0;1200;100;2;1300
0;0;100;3;1300
0;1300;100;4;1400
0;1400;10;5;1200
0;0;11;6;1200
Again just redirect the output to save it:
$ python parse.py > output.csv
Just code it as you explained it. Store the previous CSR and refer to it on the next loop through; just be sure to update it.
import csv

with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
    reader = csv.reader(input, delimiter = ';')
    writer = csv.writer(output, delimiter = ';')
    mylist = []
    header = next(reader)
    header.insert(1, 'PCR')              # add the new column name to the header row
    mylist.append(header)
    prev_csr = 0
    for rec in reader:
        # 0 if CSR is unchanged from the previous row, otherwise the previous CSR
        rec.insert(1, 0 if rec[3] == prev_csr else prev_csr)
        mylist.append(rec)
        prev_csr = rec[4]                # CSR sits at index 4 after the insert
    writer.writerows(mylist)
with open('input.csv', 'r') as input, open('output.csv', 'w') as output:
    reader = csv.reader(input, delimiter = ';')
    writer = csv.writer(output, delimiter = ';')
    header = next(reader)
    header.insert(1, 'PCR')
    writer.writerow(header)
    prevRow = next(reader)
    prevRow.insert(1, '0')
    writer.writerow(prevRow)
    for row in reader:
        if prevRow[-1] == row[-1]:
            val = '0'
        else:
            val = prevRow[-1]
        row.insert(1, val)
        prevRow = row
        writer.writerow(row)
Or, even easier, using the DictReader and DictWriter capabilities of csv:
from csv import DictReader, DictWriter

input_header = ['No', 'Val', 'Rec', 'CSR']
output_header = ['No', 'PCR', 'Val', 'Rec', 'CSR']

with open('input.csv', newline='') as in_file, open('output.csv', 'w', newline='') as out_file:
    in_reader = DictReader(in_file, input_header, delimiter=';')
    out_writer = DictWriter(out_file, output_header, delimiter=';')
    next(in_reader)              # skip the header
    out_writer.writeheader()     # place the output header
    last_csr = None
    for row in in_reader:
        current_csr = row['CSR']
        # 0 for the first row or when CSR is unchanged, otherwise the previous CSR
        row['PCR'] = last_csr if last_csr is not None and current_csr != last_csr else 0
        last_csr = current_csr
        out_writer.writerow(row)

Skip first row in a file

I'm trying to open 2 files, read the contents of both files, and export the rows of both files to an output file.
The issue I'm having is that the 2nd file that I'm reading in has the same headers in its first row as the 1st, so I'm trying to skip writing the 1st row of the 2nd file to my new outputFile, but the code written below doesn't do that.
I know my code says if row[1]=='Full Name' and not column[1], but if I skip writing row[1] to the outputFile, it skips the first column and not the first row, so I figured I'd use that first column in my if statement. The first column's header in my input data is Full Name, so that's why I used that particular if statement.
I didn't include the incoming data because I didn't think it was necessary to answer this question, but if you feel it would be helpful, I'm more than glad to post it up here.
If anyone can help me skip writing the first row of the second incoming file into my outputFile it would be greatly appreciated.
import csv, sys

firstFile = open(sys.argv[1], 'rU')  # U to parse
reader1 = csv.reader(firstFile, delimiter=',')
outputFile = open((sys.argv[3]), 'w')

for row in reader1:
    row = (str(row))
    row = row.replace("'", "")
    row = row.replace("[", "")
    row = row.replace("]", "")
    outputFile.write(row)
    outputFile.write('\n')

secondFile = open(sys.argv[2], 'rU')
reader2 = csv.reader(secondFile, delimiter=',')

for row in reader2:
    row = (str(row))
    row = row.replace("'", "")
    row = row.replace("[", "")
    row = row.replace("]", "")
    if row[1] == 'Full Name':
        next(reader2)
    else:
        outputFile.write(row)
        outputFile.write('\n')
You can use the next function:
secondFile = open(sys.argv[2],'rU')
# skip first line
next(secondFile)
# csv parser does not see the first line of the file
reader2 = csv.reader(secondFile, delimiter=',')
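Putting it together, a minimal sketch of the whole merge (assuming the two input paths and the output path come from sys.argv as in the question) could look like this:

import csv
import sys

with open(sys.argv[3], 'w', newline='') as out_file:
    writer = csv.writer(out_file)

    # copy the first file as-is, including its header
    with open(sys.argv[1], newline='') as first_file:
        writer.writerows(csv.reader(first_file))

    # copy the second file, skipping its duplicate header row
    with open(sys.argv[2], newline='') as second_file:
        reader = csv.reader(second_file)
        next(reader)                       # discard the header
        writer.writerows(reader)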
