I'm trying to write a small Python script to speed things up at work. I have it kind of working, but it's not behaving the way I want it to. Here's the current code:
import re
import csv
#import pdb
#pdb.set_trace()
# Variables
newStock = "newStock.csv" #csv file with list of new stock
allActive = "allActive.csv" #csv file with list of all active
skusToCheck= []
totalNewProducts = 0
i = 0
# Program Start - Open first csv
a = open(newStock)
csv_f = csv.reader(a)
# Copy each row into array thingy
for row in csv_f:
    skusToCheck.append(row[0])
# Get length of array
totalNewProducts = len(skusToCheck)
# Open second csv
b = open(allActive)
csv_f = csv.reader(b)
# Open blank csv file to write to
csvWriter = csv.writer(open('writeToMe.csv', 'w'), delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
# Check first value in first row,first file against each entry in 2nd row in second file
with open(allActive, 'rt') as b:
    reader = csv.reader(b, delimiter=",")
    for row in reader:
        if skusToCheck[i] == row[1]:
            print(skusToCheck[i]) # output to screen for debugging
            print(row) # debugging
            csvWriter.writerow(row) #write matching row to new file
            i += 1 # increment where we are in the first file
Pseudo code would be:
Open file one and store all values from column one in skusToCheck
Check this value against values in column 2 in file 2
If it finds a match (once I have this working, I want it to look for partial matches too), copy the row to file 3
If not move onto the next value in skusToCheck and repeat
I can't seem to get lines 33 - 40 to loop. It will check the first value and find a match in the second file, but won't move onto the next value from skusToCheck.
You need to follow the hint from jonrsharpe's first comment, i.e. modify your loop to
# Check first value in first row,first file against each entry in 2nd row in second file
with open(allActive, 'rt') as b:
    reader = csv.reader(b, delimiter=",")
    for row in reader:
        if len(row) > 1:
            for sku in skusToCheck:
                if sku == row[1]:
                    print(sku) # output to screen for debugging
                    print(row) # debugging
                    csvWriter.writerow(row) #write matching row to new file
                    break
This checks every single sku against each of the rows in allActive.
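With many SKUs, a set makes the membership test O(1) instead of looping over the whole list for every row. A minimal sketch of the same matching logic, using in-memory stand-ins for the two files (the real code would open newStock.csv and allActive.csv instead):

```python
import csv
import io

# Stand-ins for newStock.csv and allActive.csv (layouts assumed from the question)
new_stock = io.StringIO("sku1\nsku3\n")
all_active = io.StringIO("name1,sku1\nname2,sku2\nname3,sku3\n")

# Column one of the first file, as a set for fast lookup
skus_to_check = {row[0] for row in csv.reader(new_stock)}

# Keep rows of the second file whose column two is in the set
matches = [row for row in csv.reader(all_active)
           if len(row) > 1 and row[1] in skus_to_check]

print(matches)  # [['name1', 'sku1'], ['name3', 'sku3']]
```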
I'm fairly new to Python and am looking for some help. What I would like to do is read a csv file and then use a for loop with an if statement to locate rows of that data that contain a value, and print them out with a header and some formatting using f-strings.
The issue I seem to have is when finding the data using the if statement: I'm unsure what I can output the data to, which will then enable it to be printed out (the search output could contain multiple rows and columns):
with open(r'data.csv', 'r') as csv_file:
    # loop through the csv file using for loop
    for row in csv_file:
        # search each row of data for the input from the user
        if panel_number in row:
            ??
Use the csv module. Then in your if statement you can append the row to a list of matches
import csv

matched_rows = []
with open(r'data.csv', 'r') as file:
    file.readline() # skip over header line -- remove this if there's no header
    csv_file = csv.reader(file)
    for row in csv_file:
        # search each row of data for the input from the user
        if row[0] == panel_number:
            matched_rows.append(row)
print(matched_rows)
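The matched rows can then be printed with a header using f-string alignment, e.g. (the column names and widths here are made up for illustration):

```python
# Hypothetical matched rows: panel, location, circuits
matched_rows = [["P100", "Living room", "32"], ["P100", "Kitchen", "16"]]

# Left-align the first two columns, right-align the last one
header = f"{'Panel':<10}{'Location':<15}{'Circuits':>8}"
lines = [f"{p:<10}{loc:<15}{c:>8}" for p, loc, c in matched_rows]

print(header)
for line in lines:
    print(line)
```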
I have the following problem. Let's say I want to write a word into the cell at column = 1 and row = 3.
I have written this function:
import csv

def write_to_csv(myfile, word):
    with open(myfile, "w", newline="") as csv_file:
        csv_writer = csv.writer(csv_file)
        write = [word]
        csv_writer.writerow(elt for elt in write)

write_to_csv("output.csv", "hello")
My function writes the word "hello" into the cell at column = 1 and row = 1.
Now imagine that my output.csv already has something in the first cell, and I don't want to overwrite it. So how can I modify my function to write the word "hello" at column = 1 and row = 3?
I found this question, but it did not help me: How to select every Nth row in CSV file using python
Thank you very much!
A CSV file is a text file. That means that you should not try to overwrite it in place. The common way is to copy it to a new file, introducing your changes at that time. When done, you move the new file over the old name.
Here is possible code. It assumes that the file output.csv.tmp does not exist but can be created, and that output.csv has at least 4 lines:
import csv
import os

def write_to_csv(myfile, word, row_nb, col_nb):
    """Updates a csv file by writing word at row_nb row and col_nb column"""
    with open(myfile) as csv_file, open(myfile + '.tmp', "w", newline='') as out:
        csv_reader = csv.reader(csv_file)
        csv_writer = csv.writer(out)
        # skip row_nb rows
        for i in range(row_nb):
            csv_writer.writerow(next(csv_reader))
        # read and change the expected row
        row = next(csv_reader)
        row[col_nb] = word
        # print(row) # uncomment for debugging
        csv_writer.writerow(row)
        # copy last rows
        for row in csv_reader:
            csv_writer.writerow(row)
    # replace the original file with the tmp file
    os.remove(myfile)
    os.rename(myfile + '.tmp', myfile)

# write hello at first column of fourth row in output.csv
write_to_csv('output.csv', 'hello', 3, 0)
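If the file may have fewer rows than needed, an alternative sketch is to read everything into memory, pad as required, and write back; this is fine for small files (here an in-memory StringIO stands in for output.csv):

```python
import csv
import io

def set_cell(rows, word, row_nb, col_nb):
    """Write word at (row_nb, col_nb), padding rows/columns as needed."""
    while len(rows) <= row_nb:
        rows.append([])
    row = rows[row_nb]
    while len(row) <= col_nb:
        row.append('')
    row[col_nb] = word
    return rows

# Existing file content: a single cell in the first row
rows = list(csv.reader(io.StringIO("first\n")))
set_cell(rows, 'hello', 3, 0)

out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```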
I'm trying to write a program that iterates through the length of a csv file row by row. It will create 3 new csv files and write data from the source csv file to each of them. The program does this for the entire row length of the csv file.
For the first if statement, I want it to copy every third row starting at the first row and save it to a new csv file(the next row it copies would be row 4, row 7, row 10, etc)
For the second if statement, I want it to copy every third row starting at the second row and save it to a new csv file(the next row it copies would be row 5, row 8, row 11, etc).
For the third if statement, I want it to copy every third row starting at the third row and save it to a new csv file(the next row it copies would be row 6, row 9, row 12, etc).
The second "if" statement I wrote that creates the first "agentList1.csv" works exactly the way I want it to but I can't figure out how to get the first "elif" statement to start from the second row and the second "elif" statement to start from the third row. Any help would be much appreciated!
Here's my code:
for index, row in Sourcedataframe.iterrows(): #going through each row line by line
    #this for loop counts the amount of times it has gone through the csv file. If it has gone through it more than three times, it resets the counter back to 1.
    for column in Sourcedataframe:
        if count > 3:
            count = 1
        #if program is on its first count, it opens the 'Sourcedataframe', reads/writes every third row to a new csv file named 'agentList1.csv'.
        if count == 1:
            with open('blankAgentList.csv') as infile:
                with open('agentList1.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        elif count == 2:
            with open('blankAgentList.csv') as infile:
                with open('agentList2.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        elif count == 3:
            with open('blankAgentList.csv') as infile:
                with open('agentList3.csv', 'w') as outfile:
                    reader = csv.DictReader(infile)
                    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    for row in reader:
                        count2 += 1
                        if not count2 % 3:
                            writer.writerow(row)
        count = count + 1 #counts how many times it has run through the main for loop.
Convert the csv to a dataframe (e.g. df = pd.read_csv('file.csv', header=0)) so the data rows are indexed from zero, then pass the row number to iloc to fetch a particular record:
df.iloc[3, :]
You are opening your csv file from the beginning in each if clause. I believe you already loaded your file into Sourcedataframe, so just get rid of reader = csv.DictReader(infile) and read the data like this:
Sourcedataframe.iloc[column]
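If the rows are already loaded into a list (e.g. via csv.reader), the three interleaved lists can also be taken with plain slicing; a small self-contained sketch with made-up sample data:

```python
import csv
import io

# Hypothetical stand-in for the source file contents
sample = "name,id\na,1\nb,2\nc,3\nd,4\ne,5\nf,6\n"

rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

# Every third data row, starting at the first, second and third row
list1 = data[0::3]  # rows 1, 4, 7, ...
list2 = data[1::3]  # rows 2, 5, 8, ...
list3 = data[2::3]  # rows 3, 6, 9, ...

print(list1)  # [['a', '1'], ['d', '4']]
```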
Using plain python we can create a solution that works for any number of interleaved data rows, let's call it NUM_ROWS, not just three.
Nota bene: the solution does not require reading and keeping all the input data in memory. It processes one line at a time, grouping only the last few needed lines, and works fine for a very large input file.
Assuming your input file contains a number of data rows which is a multiple of NUM_ROWS, i.e. the rows can be split evenly to the output files:
NUM_ROWS = 3

outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1, NUM_ROWS+1)]

with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header
    for f in outfiles:         # repeat header in all output files if needed
        f.write(header)
    row_groups = zip(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in zip(outfiles, rg):
            f.write(r)

for f in outfiles:
    f.close()
Otherwise, for any number of data rows we can use
import itertools as it

NUM_ROWS = 3

outfiles = [open(f'blankAgentList{i}.csv', 'w') for i in range(1, NUM_ROWS+1)]

with open('blankAgentList.csv') as infile:
    header = infile.readline() # read/skip the header
    for f in outfiles:         # repeat header in all output files if needed
        f.write(header)
    row_groups = it.zip_longest(*[iter(infile)]*NUM_ROWS)
    for rg in row_groups:
        for f, r in it.zip_longest(outfiles, rg):
            if r is None:
                break
            f.write(r)

for f in outfiles:
    f.close()
which, for example, with an input file of
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
produces (output copied straight from the terminal)
(base) SO $ cat blankAgentList.csv
A,B,C
r1a,r1b,r1c
r2a,r2a,r2c
r3a,r3b,r3c
r4a,r4b,r4c
r5a,r5b,r5c
r6a,r6b,r6c
r7a,r7b,r7c
(base) SO $ cat blankAgentList1.csv
A,B,C
r1a,r1b,r1c
r4a,r4b,r4c
r7a,r7b,r7c
(base) SO $ cat blankAgentList2.csv
A,B,C
r2a,r2a,r2c
r5a,r5b,r5c
(base) SO $ cat blankAgentList3.csv
A,B,C
r3a,r3b,r3c
r6a,r6b,r6c
Note: I understand the line
row_groups = zip(*[iter(infile)]*NUM_ROWS)
may be intimidating at first (it was for me when I started).
All it does is group consecutive lines from the input file.
If your objective includes learning Python, I recommend studying it thoroughly via a book or a course or both and practising a lot.
One key subject is the iteration protocol, along with all the other protocols. And namespaces.
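As a minimal illustration of that grouping idiom, with a plain list standing in for the file object:

```python
nums = [1, 2, 3, 4, 5, 6]

# zip(*[iter(x)]*n) groups consecutive items n at a time:
# every argument of zip is the SAME iterator, so each call to zip
# for the next tuple pulls the next n items off it in order.
groups = list(zip(*[iter(nums)] * 3))

print(groups)  # [(1, 2, 3), (4, 5, 6)]
```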
I am trying to create a program which will scan a CSV for IMG SRC tags and then test them for their response. I'm stuck with this portion of the code which ideally searches the entire CSV document for a 'SRC' cell (to find the IMG SRC tags), and then assigns that column as the one to run the tests on. Here is my attempt:
src_check = ('SRC')
imp_check = ('Impression')

with open("ORIGINAL.csv", 'r') as csvfile:
    reader = csv.reader(csvfile)
    for i, row in enumerate(reader):
        for j, column in enumerate(row):
            if src_check in column[:]:
                list = [column[j] for column in csv.reader(csvfile)]
My confusion comes from the fact that when I manually enter the column number into my program, it runs as it should: it tests each cell of the column/list and neatly writes the results next to each tag tested.
To reiterate my problem, I would like this snippet to find the first IMG SRC cell of the entire CSV. Then it would note the number of that column, and I can assign the entire column to a list for the tests to be run. For example, the process after would be:
Column 16 has been identified as carrying the IMG SRC tags.
Assign the contents of the column to a list.
Run request tests on list.
Right now the test result column does not line up with the tags that it tests. Does anyone have a better method in finding a cell based on a string and then assigning the column as a list, in-line with the cells it's testing?
You need to find the column index first and then rewind the file to the beginning before you read the column:
src_check = ('SRC')

with open("ORIGINAL.csv", 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if src_check in row:
            col = row.index(src_check)
            break
    else:
        raise ValueError('SRC not found')
    # go back to beginning of the file
    csvfile.seek(0)
    values = [row[col] for row in reader]
I suspect your problem is that both csv readers are using the same file descriptor, and thus the offset is all messed up.
Try to do one thing after another (and/or reset the offset via csvfile.seek(0)) and it should work.
src_check = ('SRC')

with open("ORIGINAL.csv", 'r') as csvfile:
    reader = csv.reader(csvfile)
    col_index = -1
    for row in reader:
        for j, column in enumerate(row):
            if src_check in column:
                col_index = j
                break
        if col_index != -1:
            break
    else:
        raise ValueError("Column not found")
    csvfile.seek(0)
    col_vals = [row[col_index] for row in reader]
    print(col_vals)
Edit: Also, you shouldn't use the name of a builtin (like "list") as a variable name.
I want to perform multiple edits to most rows in a csv file without making multiple writes to the output csv file.
I have a csv that I need to convert and clean up into specific format for another program to use. For example, I'd like to:
remove all blank rows
remove all rows where the value of column "B" is not a number
with this new data, create a new column and explode the first part of the values in column B into the new column
Here's an example of the data:
"A","B","C","D","E"
"apple","blah","1","","0.00"
"ape","12_fun","53","25","1.00"
"aloe","15_001","51","28",2.00"
I can figure out the logic behind each process, but what I can't figure out is how to perform each process without reading and writing to a file each time. I'm using the CSV module. Is there a better way to perform these steps at once before writing a final CSV?
I would define a set of tests and a set of processes.
If all tests pass, all processes are applied, and the final result is written to output:
import csv

#
# Row tests
#
def test_notblank(row):
    return any(len(i) for i in row)

def test_bnumeric(row):
    return row[1].split('_')[0].isdigit()

def do_tests(row, tests=[test_notblank, test_bnumeric]):
    return all(t(row) for t in tests)

#
# Row processing
#
def process_splitb(row):
    b = row[1].split('_')
    row[1] = b[0]
    row.append(b[1])
    return row

def do_processes(row, processes=[process_splitb]):
    for p in processes:
        row = p(row)
    return row

def main():
    with open("in.csv", newline='') as inf, open("out.csv", "w", newline='') as outf:
        incsv = csv.reader(inf)
        outcsv = csv.writer(outf)
        outcsv.writerow(next(incsv)) # pass header row through
        outcsv.writerows(do_processes(row) for row in incsv if do_tests(row))

if __name__ == "__main__":
    main()
Simple for loops.
import csv

csv_file = open('in.csv', newline='')
csv_reader = csv.reader(csv_file)

header = next(csv_reader)
header.append('F') #add new column
records = [header]

#process records
for record in csv_reader:
    #skip blank records
    if record == []:
        continue
    #make sure column "B" has 2 parts
    try:
        part1, part2 = record[1].split('_')
    except ValueError:
        continue
    #make sure part1 is a digit
    if not part1.isdigit():
        continue
    record[1] = part1 #make column B equal part1
    record.append(part2) #add data for the new column F to record
    records.append(record)

new_csv_file = open('out.csv', 'w', newline='')
csv_writer = csv.writer(new_csv_file, quoting=csv.QUOTE_ALL)
for r in records:
    csv_writer.writerow(r)
Why use the csv module? A CSV is made up of text lines (strings), and you can use Python's string methods (split, join, replace, len) to create your result:
line_cols = line.split(',') and back: line = ','.join(line_cols)
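A minimal sketch of that string-based approach, assuming no field contains an embedded comma (quoted commas are exactly why the csv module exists):

```python
# Made-up input lines standing in for a csv file
lines = [
    '"A","B","C"',
    '"apple","12_fun","1"',
]

out_lines = []
for line in lines:
    cols = line.split(',')            # break the line into fields
    cols.append('"new"')              # e.g. add a column at the end
    out_lines.append(','.join(cols))  # rebuild the line

print(out_lines[0])  # "A","B","C","new"
```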