Remove multiple lines from csv - python

I have a CSV with many lines that I would like to keep. One header line, however, is repeated throughout the file. I want to keep it where it first appears (always the third row) and omit every later occurrence.
This is the line I'd like to be omitted whenever it is not the third row:
Curriculum Name,,Organization Employee Number,Employee Department,Employee Name,Employee Email,Employee Status,Date Assigned,Completion Date,Completion Status,Manager Name,Manager Email
It appears every 10 lines or so. This is my code so far:
import csv, sys, os, glob
# Read the CSV file, skipping the first 130 lines based on mylist
scanReport = open('Audit.csv', 'r')
scanReader = csv.reader(scanReport)
# Search rows in the CSV - print out list
for file in glob.glob(r'C:\sans\Audit.csv'):
    lineNumber = 0
    str = "Curriculum Name"
with open('first.csv', 'rb') as inp, open('first_edit.csv', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[2] != " 0":
            writer.writerow(row)

You want something like this in that loop:
index = 0
for row in csv.reader(inp):
    if (index != 3) or (index == 3 and row[2] != " 0"):
        writer.writerow(row)
    index += 1
I am not familiar with the csv module, so I kept the rest of your code as it was, assuming it is correct (I don't think you need that module for what you are doing, though).
The manual counter could also be replaced with enumerate.
EDIT:
To check if it's that line:
def IsThatLine(row):
    return row[0] == "Curriculum Name" and row[1] == "" and row[2] == "Organization Employee" and ....
Then the if can become:
if (index != 3) or (index == 3 and not IsThatLine(row)):
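For what the question seems to ask (keep the header on the third row, drop every later copy), a minimal sketch could look like the following. It assumes the repeated lines are exact duplicates of the third row; the sample data and the helper name drop_repeated_header are made up for illustration and are not part of the original code.

```python
import csv
import io

# Hypothetical sample: two title lines, the header on the third row,
# and one repeated copy of that header further down.
SAMPLE = (
    "Report title,,\n"
    "Generated on,,\n"
    "Curriculum Name,,Organization Employee Number\n"
    "Safety,,1001\n"
    "Curriculum Name,,Organization Employee Number\n"
    "Ethics,,1002\n"
)

def drop_repeated_header(rows, header_row_number=3):
    """Yield rows, skipping later copies of the row at header_row_number (1-based)."""
    header = None
    for line_number, row in enumerate(rows, start=1):
        if line_number == header_row_number:
            header = row  # remember the header, and keep this first copy
        elif header is not None and row == header:
            continue  # a repeated header further down the file
        yield row

cleaned = list(drop_repeated_header(csv.reader(io.StringIO(SAMPLE))))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(cleaned)
print(out.getvalue())
```

With real files, the two io.StringIO objects would be replaced by files opened with newline=''.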

Could you please be more specific in your question?
Would you like to remove any line containing the following description?
Curriculum Name,,Organization Employee Number,Employee Department,Employee Name,Employee Email,Employee Status,Date Assigned,Completion Date,Completion Status,Manager Name,Manager Email
Or would you like to remove only the third line (row) of this csv file?

Related

Compare 2 csv files and check for first 2 columns, if it matches ask the user to decide to override or not and then proceed to next row

I have two CSV files, each with a few rows and three columns. I want to compare the first two columns of the two files. If a row matches, the user should be asked whether to override the row in the first CSV file with the values from the second CSV file; if not, the operation is aborted.
The first time I run my Python code, it should simply update the first CSV file with the new values from the second CSV file. On subsequent runs, it has to check whether the first two columns match and ask the user whether to override the values, since by then the first CSV file already contains rows from the second CSV file.
My code:
import csv
import sys
def csv_file_copy():
    csv_file = input("Enter the CSV file needs to be updated ")
    csv_file_cp = input("Enter the csv file from where the data needs to be copied ")
    csvfile = open(csv_file_cp, 'r',encoding="utf-8-sig")
    reader = csv.reader(csvfile)
    csv_file_orig = open(csv_file, 'r',encoding="utf-8-sig")
    reader2 = csv.reader(csv_file_orig)
    res = []
    for row in reader:
        print("This is row", row)
        for row2 in reader2:
            print("This is row2", row2)
            if (row2[0] == row[0] and row2[1] == row[1]):
                user_input = input("Store type and store number already exists in the csv file, continue? y/n ").lower()
                if user_input == "y":
                    res.append(row)
                elif user_input == "n":
                    print("Aborting operation")
                    sys.exit(1)
            else:
                res.append(row2)
                res.append(row)
                continue
    print (reader)
    with open(csv_file, 'w') as csv_file1:
        writer = csv.writer(csv_file1, delimiter=',')
        for row in res:
            writer.writerow(row)
csv_file_copy()
When the code is run a second time against the same two files, the second for loop runs only once, so only one value is matched, even though about 10 values should match.
A csv.reader is an iterator, so reader2 is exhausted after the first pass of the inner loop and yields nothing on later passes. If csv_file_orig is not too big for your available memory, you can store its whole contents in a list.
Instead of
reader2 = csv.reader(csv_file_orig)
use
csv_file_orig_lines = list(csv.reader(csv_file_orig))
Afterwards you can iterate through the csv_file_orig_lines list as many times as you want.
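Once the original rows are held in memory, the nested loop can also be avoided by keying the rows on their first two columns. The following is only a sketch under that assumption: merge_rows and the confirm callback are hypothetical names, and the sample rows are invented. In the real script, confirm would wrap the input() prompt.

```python
def merge_rows(orig_rows, new_rows, confirm):
    """Merge new_rows into orig_rows, keyed on the first two columns.

    confirm(row) is a hypothetical callback: it returns True to override
    an existing row; returning False aborts via SystemExit.
    """
    merged = {(r[0], r[1]): r for r in orig_rows}  # whole file kept in memory
    for row in new_rows:
        key = (row[0], row[1])
        if key in merged and not confirm(row):
            raise SystemExit("Aborting operation")
        merged[key] = row  # override a match, or add a new row
    return list(merged.values())

orig = [["grocery", "1", "old"], ["pharmacy", "2", "keep"]]
new = [["grocery", "1", "new"], ["bakery", "3", "added"]]
result = merge_rows(orig, new, confirm=lambda row: True)
print(result)
```

Because every row of the original file sits in the dictionary, each incoming row is checked against all of them, not only the first.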

Python: How to iterate every third row starting with the second row of a csv file

I'm trying to write a program that iterates through a CSV file row by row. It will create 3 new CSV files and write data from the source CSV file to each of them, covering every row of the source file.
For the first if statement, I want it to copy every third row starting at the first row and save it to a new CSV file (the next rows it copies would be row 4, row 7, row 10, etc.).
For the second if statement, I want it to copy every third row starting at the second row and save it to a new CSV file (the next rows it copies would be row 5, row 8, row 11, etc.).
For the third if statement, I want it to copy every third row starting at the third row and save it to a new CSV file (the next rows it copies would be row 6, row 9, row 12, etc.).
The "if" branch I wrote that creates "agentList1.csv" works exactly the way I want it to, but I can't figure out how to get the first "elif" branch to start from the second row and the second "elif" branch to start from the third row. Any help would be much appreciated!
Here's my code:
for index, row in Sourcedataframe.iterrows(): #going through each row line by line
    #this for loop counts the amount of times it has gone through the csv file. If it has gone through it more than three times, it resets the counter back to 1.
    for column in Sourcedataframe:
        if count > 3:
            count = 1
        #if program is on its first count, it opens the 'Sourcedataframe', reads/writes every third row to a new csv file named 'agentList1.csv'.
        if count == 1:
            with open('blankAgentList.csv') as infile:
                with open('agentList1.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        elif count == 2:
            with open('blankAgentList.csv') as infile:
                with open('agentList2.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        elif count == 3:
            with open('blankAgentList.csv') as infile:
                with open('agentList3.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
    count = count + 1 #counts how many times it has run through the main for loop.
Read the CSV into a DataFrame with the first line treated as the header (for example pd.read_csv('blankAgentList.csv', header=0)), so that row indexing starts from the second line of the file.
Then pass the row number to iloc to fetch a particular record:
df.iloc[3, :]
You are reopening your CSV file from the beginning in each if clause. I believe you have already loaded the file into Sourcedataframe, so just get rid of reader = csv.DictReader(infile) and read the data like this:
Sourcedataframe.iloc[column]
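One way to choose the starting row is to take the position of each data row modulo 3 and use the remainder to pick the output group. This is a rough sketch with made-up sample data; the names SAMPLE, NUM_FILES and groups are illustrative, and the groups are collected in lists rather than written to agentList1.csv and so on.

```python
import csv
import io

SAMPLE = "A,B\nr1,x\nr2,x\nr3,x\nr4,x\nr5,x\n"  # made-up data, header first
NUM_FILES = 3

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # keep the header out of the row count
groups = [[] for _ in range(NUM_FILES)]
for position, row in enumerate(reader):
    # position 0, 3, 6, ... -> group 0; 1, 4, 7, ... -> group 1; and so on
    groups[position % NUM_FILES].append(row)

for number, rows in enumerate(groups, start=1):
    print(number, [r[0] for r in rows])
```

Each list in groups could then be written out with csv.writer, preceded by the header.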
Using plain Python, we can write a solution that works for any number of interleaved data rows, not just three; let's call that number NUM_ROWS.
Nota bene: the solution does not need to read and keep the whole input in memory. It processes one line at a time, holding only the current group of NUM_ROWS lines, so it works fine for a very large input file.
Assuming your input file contains a number of data rows which is a multiple of NUM_ROWS, i.e. the rows can be split evenly to the output files:
NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1,NUM_ROWS+1)]
with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header
    for f in outfiles: # repeat header in all output files if needed
        f.write(header)
    row_groups = zip(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in zip(outfiles, rg):
            f.write(r)
for f in outfiles:
    f.close()
Otherwise, for any number of data rows we can use
import itertools as it
NUM_ROWS = 3
outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1,NUM_ROWS+1)]
with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header
    for f in outfiles: # repeat header in all output files if needed
        f.write(header)
    row_groups = it.zip_longest(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in it.zip_longest(outfiles, rg):
            if r is None:
                break
            f.write(r)
for f in outfiles:
    f.close()
which, for example, with an input file of
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
produces (output copied straight from the terminal)
(base) SO $ cat blankAgentList.csv
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
(base) SO $ cat blankAgentList1.csv
A,B,C
r1a,r1b,r1c
r4a,r4b,r4c
r7a,r7b,r7c
(base) SO $ cat blankAgentList2.csv
A,B,C
r2a,r2a,r2c
r5a,r5b,r5c
(base) SO $ cat blankAgentList3.csv
A,B,C
r3a,r3b,r3c
r6a,r6b,r6c
Note: I understand the line
row_groups = zip(*[iter(infile)]*NUM_ROWS)
may be intimidating at first (it was for me when I started).
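All it does is group consecutive lines from the input file: the list holds NUM_ROWS references to the same iterator, so each tuple that zip builds takes NUM_ROWS consecutive lines from it.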
If your objective includes learning Python, I recommend studying it thoroughly via a book or a course or both and practising a lot.
One key subject is the iteration protocol, along with all the other protocols. And namespaces.
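To see the grouping idiom in isolation, here is a small sketch that applies it to a plain list instead of a file. The list contents are invented; the variable names are only for illustration.

```python
from itertools import zip_longest

lines = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
NUM_ROWS = 3

# The same iterator object appears NUM_ROWS times, so every tuple
# consumes NUM_ROWS consecutive items from it.
shared = iter(lines)
complete_groups = list(zip(*[shared] * NUM_ROWS))

# zip stops at the first incomplete group; zip_longest pads it with None.
padded_groups = list(zip_longest(*[iter(lines)] * NUM_ROWS))

print(complete_groups)
print(padded_groups)
```

With seven items, zip drops the leftover "r7", which is why the second version of the answer switches to zip_longest and skips the None padding.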

Replacing and deleting columns from a csv using python

Here is the code I am writing:
import csv
import openpyxl
def read_file(fn):
    rows = []
    with open(fn) as f:
        reader = csv.reader(f, quotechar='"',delimiter=",")
        for row in reader:
            if row:
                rows.append(row)
    return rows
replace = {x[0]:x[1:] for x in read_file("replace.csv")}
delete = set( (row[0] for row in read_file("delete.csv")) )
result = []
input_file="input.csv"
with open(input_file) as f:
    reader = csv.reader(f, quotechar='"')
    for row in reader:
        if row:
            if row[7] in delete:
                continue
            elif row[7] in replace:
                result.append(replace[row[7]])
            else:
                result.append(row)
with open ("done.csv", "w+", newline="") as f:
    w = csv.writer(f,quotechar='"', delimiter= ",")
    w.writerows(result)
here are my files:
input.csv:
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","
"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","
This is a 13-column CSV. I am interested only in the 8th and the 11th fields.
This is my replace.csv:
"aaaaa","11111","22222"
delete.csv:
ccccc
What I am doing is comparing the first column of replace.csv, line by line, with the 8th column of input.csv. If they match, the 8th column of input.csv is replaced with the second column of replace.csv, and the 11th column of input.csv with the third column of replace.csv.
For delete.csv, both files are compared line by line, and if a match is found the entire row is deleted.
Any line that matches neither replace.csv nor delete.csv should be printed as it is.
So my desired output is:
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
"-","-","-","-","-","-","-",11111,"-","-",22222,"-",","
"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","
but when I run this code it gives me an output like this:
c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13
11111,22222
Where am I going wrong?
I am adapting a program that I posted an earlier question about, because the input file has changed:
https://stackoverflow.com/a/54388144/9279313
@anuj
I think SafeDev's solution is the better one, but if you don't want to use pandas, a small change to your code is enough: when row[7] is found in replace, update the 8th and 11th fields of the row instead of appending the replacement values on their own.
for row in reader:
    if row:
        if row[7] in delete:
            continue
        elif row[7] in replace:
            key = row[7]
            row[7] = replace[key][0]
            row[10] = replace[key][1]
            result.append(row)
        else:
            result.append(row)
Hope this solves your issue.
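The same idea can be tried without any files. The sketch below feeds the question's sample data through io.StringIO and hard-codes the replace and delete lookups; it assumes every entry in replace carries exactly two replacement values.

```python
import csv
import io

INPUT = (
    'c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13\n'
    '"-","-","-","-","-","-","-","aaaaa","-","-","bbbbb","-",","\n'
    '"-","-","-","-","-","-","-","ccccc","-","-","ddddd","-",","\n'
    '"-","-","-","-","-","-","-","eeeee","-","-","fffff","-",","\n'
)
replace = {"aaaaa": ["11111", "22222"]}  # contents of replace.csv
delete = {"ccccc"}                       # contents of delete.csv

result = []
for row in csv.reader(io.StringIO(INPUT)):
    if not row or row[7] in delete:
        continue  # skip blank lines and rows marked for deletion
    if row[7] in replace:
        # overwrite the 8th and 11th fields, keep the rest of the row
        row[7], row[10] = replace[row[7]]
    result.append(row)

for row in result:
    print(row)
```

The header line passes through unchanged because "c8" appears in neither lookup.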
It's actually quite simple. Instead of writing it from scratch, just use the pandas library. From there it's easier to handle any dataset. This is how you would do it:
EDIT:
import pandas as pd
input_csv = pd.read_csv('input.csv')
replace_csv = pd.read_csv('replace.csv', header=None)
delete_csv = pd.read_csv('delete.csv', header=None)  # delete.csv has no header row
r_lst = list(replace_csv.iloc[:, 0])  # values to look for in c8
d_lst = list(delete_csv.iloc[:, 0])  # values whose rows are dropped
input2_csv = input_csv.copy()
for i, row in input_csv.iterrows():
    if row['c8'] in r_lst:
        input2_csv.loc[i, 'c8'] = replace_csv.iloc[r_lst.index(row['c8']), 1]
        input2_csv.loc[i, 'c11'] = replace_csv.iloc[r_lst.index(row['c8']), 2]
    if row['c8'] in d_lst:
        input2_csv = input2_csv[input2_csv.c8 != row['c8']]
input2_csv.to_csv('output.csv', index=False)
This process can be made even more dynamic by turning it into a function that has parameters of column names and replacing 'c8' and 'c11' with those two parameters.

Make a new list from CSV

I've searched for a way to show a certain CSV row based on input, and I've tried to apply the code in my program. The problem is that I want to take certain fields of the matching row and make a new list from them.
I have csv file like this:
code,place,name1,name2,name3,name4
001,Home,Laura,Susan,Ernest,Toby
002,Office,Jack,Rachel,Victor,Wanda
003,Shop,Paulo,Roman,Brad,Natali
004,Other,Charles,Matthew,Justin,Bono
At first I had this code, which works and shows the whole row:
import csv
number = input('Enter number to find\n')
csv_file = csv.reader(open('residence.csv', 'r'), delimiter=",")
for row in csv_file:
    if number == row[0]:
        print (row)
input: 001
result: [001, Home, Laura, Susan, Ernest, Toby]
Then I tried to add some of the items in the matching row to a new list, but it didn't work. Here's the code:
import csv
res = []
y = 2
number = input('Enter number to find\n')
csv_file = csv.reader(open('residence.csv', 'r'), delimiter=",")
for row in csv_file:
    if number == row[0]:
        while y <= 5:
            res.append(str(row[y]))
            y = y+1
print (res)
input: 001
expected result: [Laura, Susan, Ernest, Toby]
I want to make a new list that contains the name1, name2, name3, and name4 fields of the row, and then print the list. But I guess the loop is wrongly placed or I missed something.
There are a few things you could fix in your code.
1. You are not skipping the header line when iterating through the rows, so the header is compared against the input like any data row.
2. Your y variable is not re-initialized. It would be more idiomatic to use a for loop instead of a while anyhow.
3. If more than one row matches, it will break (see 2.). If you know you will never have more than one match, you should break after you append the values to the list.
4. Your file is never closed. Also, it should be opened with newline='' (see the csv module docs).
5. Lastly, you match the actual string ('001') rather than an integer (1), which could be a source of confusion when entering the input.
An updated version:
import csv
res = []
number = input('Enter number to find\n')
with open('residence.csv', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    next(csv_reader) # Skip header line
    for row in csv_reader:
        if number == row[0]:
            for i in range(2, 6):
                res.append(str(row[i]))
            break
print(res)
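Since the wanted fields are adjacent, a slice can stand in for the inner loop. The following sketch wraps the lookup in a hypothetical helper, names_for, and reads the question's sample data from a string instead of residence.csv.

```python
import csv
import io

SAMPLE = (
    "code,place,name1,name2,name3,name4\n"
    "001,Home,Laura,Susan,Ernest,Toby\n"
    "002,Office,Jack,Rachel,Victor,Wanda\n"
)

def names_for(code, csv_text):
    """Return the name1..name4 fields of the first row whose code matches."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    for row in reader:
        if row[0] == code:
            return row[2:6]  # columns 2..5 are name1..name4
    return []  # no matching code

print(names_for("001", SAMPLE))
```

The code is compared as a string, so the input has to be typed as 001 rather than 1.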

Find value in CSV column, return line number of the value

I have a CSV file with contents:
scenario1,5,dosomething
scenario2,10,donothing
scenario3,8,dosomething
scenario4,5,donothing
I would like to take the contents of a variable and first check whether it is in the first column. If it is, I would like to get the row number where it is found and the entire line's contents. There will be no duplicate values in column 1 of the CSV.
I can partly do the first step, which is to find whether the variable is in the CSV and return the whole line.
import csv
configfile = csv.reader(open('/file.csv', newline=''), delimiter=",")
v = 'scenario1'
for row in configfile:
    if v in row[0]:
        print(row)
The results I receive would be:
['scenario1','5','dosomething']
But I need assistance with the second part please. This is to find the row number.
Try this:
import csv
with open("ooo.csv", "r", newline="") as f:
    reader = csv.reader(f)
    for line_num, content in enumerate(reader):
        if content[0] == "scenario1":
            print(content, line_num + 1)
Or without the csv module:
with open("ooo.csv") as f:
    for l, i in enumerate(f):
        data = i.rstrip("\n").split(",")  # drop the newline before splitting
        if data[0] == "scenario1":
            print(data, l + 1)
Output:
['scenario1', '5', 'dosomething'] 1
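If the lookup is needed in more than one place, it could be wrapped in a small function that returns both the line number and the row. This is a sketch only: find_row is a hypothetical name, and the data is read from a string here so the example is self-contained.

```python
import csv
import io

SAMPLE = (
    "scenario1,5,dosomething\n"
    "scenario2,10,donothing\n"
    "scenario3,8,dosomething\n"
    "scenario4,5,donothing\n"
)

def find_row(value, lines):
    """Return (line_number, row) for the first row whose first column equals value."""
    for line_num, row in enumerate(csv.reader(lines), start=1):
        if row and row[0] == value:
            return line_num, row
    return None  # value not present in the first column

print(find_row("scenario3", io.StringIO(SAMPLE)))
```

Passing start=1 to enumerate gives 1-based line numbers directly, so no + 1 is needed. An open file object can be passed in place of the io.StringIO.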
