Parsing CSV files using Python 2.7

I'm trying to write a script that will open a CSV file and write rows from that file to a new CSV file based on matching a unique telephone number in column 4 of csv.csv. The phone numbers are always in column 4 and are often duplicated in the file; however, the other columns are often unique, so each row is inherently unique.
A row from the csv file I'm reading looks like this: (the TN is 9259991234)
2,PPS,2015-09-17T15:44,9259991234,9DF51758-A2BD-4F65-AAA2
I hit an error with the code below saying that '_csv.writer' is not iterable and I'm not sure how to modify my code to solve the problem.
import csv
import sys
import os
os.chdir(r'C:\pTest')
with open(r'csv.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    with open(r'new_csv.csv', 'ab') as new_f:
        writer = csv.writer(new_f, delimiter=',')
        for row in reader:
            if row[3] not in writer:
                writer.writerow(new_f)

Your error stems from this expression:
row[3] not in writer
You cannot test for membership against a csv.writer() object. If you want to track whether you have already processed a phone number, use a separate set() object for that:
with open(r'csv.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    with open(r'new_csv.csv', 'ab') as new_f:
        writer = csv.writer(new_f, delimiter=',')
        seen = set()
        for row in reader:
            if row[3] not in seen:
                seen.add(row[3])
                writer.writerow(row)
Note that I also changed your writer.writerow() call; you want to write the row, not the file object.
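In case you later move to Python 3: the same approach works, but open the files in text mode with newline='' instead of the 'rb'/'ab' binary modes used above. A minimal sketch, assuming the same filenames:
import csv

# Python 3 sketch of the same dedup-by-phone-number idea.
with open('csv.csv', newline='') as f, open('new_csv.csv', 'a', newline='') as new_f:
    reader = csv.reader(f)
    writer = csv.writer(new_f)
    seen = set()
    for row in reader:
        if row[3] not in seen:  # column 4 holds the phone number
            seen.add(row[3])
            writer.writerow(row)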

Related

CSV reading and writing; outputted CSV is blank

My program needs a function that reads data from a CSV file ("all.csv") and extracts all the data pertaining to 'Virginia' (extracts each row that has 'Virginia' in it), then writes the extracted data to another CSV file named "Virginia.csv". The program runs without error; however, when I open the "Virginia.csv" file, it is blank. My guess is that the issue is with my nested for loop, but I am not entirely sure what is causing the issue.
Here is the data within the all.csv file:
https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
Here is my code:
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state (input_file, output_file, state):
    with open(input_file, 'r') as infile:
        contents = infile.readlines()
    with open(output_file, 'w') as outfile:
        writer = csv.writer(outfile)
        for row in range(len(contents)):
            contents[row] = contents[row].split(',')  #split elements
        for row in range(len(contents)):
            for word in range(len(contents[row])):
                if contents[row][2] == state:
                    writer.writerow(row)

extract_records_for_state(input_file,output_file,state)
I ran your code and it gave me an error:
Traceback (most recent call last):
  File "c:\Users\Dolimight\Desktop\Stack Overflow\Geraldo\main.py", line 27, in <module>
    extract_records_for_state(input_file, output_file, state)
  File "c:\Users\Dolimight\Desktop\Stack Overflow\Geraldo\main.py", line 24, in extract_records_for_state
    writer.writerow(row)
_csv.Error: iterable expected, not int
I fixed the error by putting the contents of the row, contents[row], into the writerow() function and ran it again, and the data showed up in Virginia.csv. It gave me duplicates, so I also removed the inner word for-loop.
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state(input_file, output_file, state):
    with open(input_file, 'r') as infile:
        contents = infile.readlines()
    with open(output_file, 'w') as outfile:
        writer = csv.writer(outfile)
        for row in range(len(contents)):
            contents[row] = contents[row].split(',')  # split elements
        print(contents)
        for row in range(len(contents)):
            if contents[row][2] == state:
                writer.writerow(contents[row])  # this is what I changed

extract_records_for_state(input_file, output_file, state)
You have two errors. The first is that you try to write the row index with writer.writerow(row); the row itself is contents[row]. The second is that you leave the newline on the final column when reading but don't strip it when writing. Instead, you could lean on the csv module more fully: let the reader parse the rows, and instead of reading everything into a list, which uses a fair amount of memory, filter and write row by row.
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state (input_file, output_file, state):
    with open(input_file, 'r', newline='') as infile, \
         open(output_file, 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        # add header
        writer.writerow(next(reader))
        # filter for state
        writer.writerows(row for row in reader if row[2] == state)

extract_records_for_state(input_file, output_file, state)
Looking at your code, two things jump out at me:
I see a bunch of nested statements (logic)
I see you reading a CSV as plain text, then interpreting it as CSV yourself (contents[row] = contents[row].split(',')).
I recommend two things:
break up the logic into distinct chunks: all that nesting can be hard to interpret and debug; do one thing, prove that works; do another thing, prove that works; etc.
use the CSV API to its fullest: use it to both read and write your CSVs
I don't want to try and replicate/fix your code; instead, I'm offering this general approach to achieve those two goals:
import csv

# Read in
all_rows = []
with open('all.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header (I didn't see you keep it)
    for row in reader:
        all_rows.append(row)

# Process
filtered_rows = []
for row in all_rows:
    if row[2] == 'Virginia':
        filtered_rows.append(row)

# Write out
with open('filtered.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(filtered_rows)
Once you understand both the logic and the API of those discrete steps, you can move on (advance) to composing something more complex, like the following which reads a row, decides if it should be written, and if so, writes it:
import csv

with open('filtered.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    with open('all.csv', 'r', newline='') as f_in:
        reader = csv.reader(f_in)
        next(reader)  # discard header
        for row in reader:
            if row[2] == 'Virginia':
                writer.writerow(row)
Using either of those two pieces of code on this (really scaled-down) sample of all.csv:
date,county,state,fips,cases,deaths
2020-03-09,Fairfax,Virginia,51059,4,0
2020-03-09,Virginia Beach city,Virginia,51810,1,0
2020-03-09,Chelan,Washington,53007,1,1
2020-03-09,Clark,Washington,53011,1,0
gets me a filtered.csv that looks like:
2020-03-09,Fairfax,Virginia,51059,4,0
2020-03-09,Virginia Beach city,Virginia,51810,1,0
Given the size of this dataset, the second approach, writing on demand inside the read loop, is both faster (about 5x on my machine) and uses significantly less memory (about 40x less on my machine) because there's no intermediate storage in all_rows.
But please take the time to run both, read them carefully, and see how each works the way it does.
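If you want to check the speed and memory claims yourself, here is a rough, self-contained sketch of how you might measure both approaches with time and tracemalloc (the filtered_a.csv and filtered_b.csv output names are placeholders I made up so the two runs don't overwrite each other):
import csv
import time
import tracemalloc

def run_load_all(input_file='all.csv', output_file='filtered_a.csv', state='Virginia'):
    # Read everything into memory, then filter, then write.
    with open(input_file, newline='') as f:
        rows = list(csv.reader(f))
    header, rows = rows[0], rows[1:]
    with open(output_file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(row for row in rows if row[2] == state)

def run_streaming(input_file='all.csv', output_file='filtered_b.csv', state='Virginia'):
    # Filter and write row by row, never holding the whole file in memory.
    with open(input_file, newline='') as f_in, open(output_file, 'w', newline='') as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        writer.writerow(next(reader))
        writer.writerows(row for row in reader if row[2] == state)

for func in (run_load_all, run_streaming):
    tracemalloc.start()
    start = time.perf_counter()
    func()
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print('%s: %.2f s, peak memory %.1f MB' % (func.__name__, elapsed, peak / 1e6))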

How to add a header to an existing CSV file without replacing the first row?

What I want to do is actually as it is written in the title.
with open(path, "r+", newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    list_of_column_names = []
    num_cols = len(next(csv_reader))
    for i in range(num_cols):
        list_of_column_names.append(i)
    fields = list_of_column_names

with open(example.csv, "r+", newline='') as writeFile:
    csvwriter = csv.DictWriter(writeFile, delimiter=',', lineterminator='\n', fieldnames=fields)
    writeFile.seek(0, 0)
    csvwriter.writeheader()
I want to enumerate the columns, which initially don't have any column names. But when I run the code, it replaces the data in the first row. For example:
example.csv:
a,b
c,d
e,f
what I want:
0,1
a,b
c,d
e,f
what happens after running the code:
0,1
c,d
e,f
Is there a way to prevent this from happening?
There's no magical way to insert a line into an existing text file.
The following is how I think of doing this, and your code already covers steps 2-4. Also, I wouldn't mess with DictWriter, since you're not trying to convert a Python dict to CSV (I can see you using it for writing the header, but that's easy enough to do with the regular reader/writer):
1. open a new file for writing
2. read the first row of your CSV
3. interpret the column indexes as the header
4. write the header
5. write the first row
6. read/write the rest of the rows
7. move the new file back over the old file, overwriting it (not shown)
Here's what that looks like in code:
import csv

with open('output.csv', 'w', newline='') as out_f:
    writer = csv.writer(out_f)
    with open('input.csv', newline='') as in_f:
        reader = csv.reader(in_f)

        # Read the first row
        first_row = next(reader)

        # Count the columns in first row; equivalent to your `for i in range(num_cols): ...`
        header = [i for i, _ in enumerate(first_row)]

        # Write header and first row
        writer.writerow(header)
        writer.writerow(first_row)

        # Write rest of rows
        for row in reader:
            writer.writerow(row)
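For the last step, moving the new file back over the original (step 7 above, not shown), os.replace does an overwriting rename on Python 3; a minimal sketch, assuming the input.csv/output.csv names used above:
import os

# Overwrite the original file with the rewritten one.
os.replace('output.csv', 'input.csv')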

Convert from CSV to array in Python

I have a CSV file containing the following.
0.000264,0.000352,0.000087,0.000549
0.00016,0.000223,0.000011,0.000142
0.008853,0.006519,0.002043,0.009819
0.002076,0.001686,0.000959,0.003107
0.000599,0.000133,0.000113,0.000466
0.002264,0.001927,0.00079,0.003815
0.002761,0.00288,0.001261,0.006851
0.000723,0.000617,0.000794,0.002189
I want to convert the values into an array in Python and keep the same order (rows and columns). How can I achieve this?
I have tried different functions but ended up with errors.
You should use the csv module:
import csv

results = []
with open("input.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # change contents to floats
    for row in reader:  # each row is a list
        results.append(row)
This gives:
[[0.000264, 0.000352, 8.7e-05, 0.000549],
[0.00016, 0.000223, 1.1e-05, 0.000142],
[0.008853, 0.006519, 0.002043, 0.009819],
[0.002076, 0.001686, 0.000959, 0.003107],
[0.000599, 0.000133, 0.000113, 0.000466],
[0.002264, 0.001927, 0.00079, 0.003815],
[0.002761, 0.00288, 0.001261, 0.006851],
[0.000723, 0.000617, 0.000794, 0.002189]]
If your file doesn't contain parentheses, you can also do it without the csv module:
with open('input.csv') as f:
    output = [float(s) for line in f.readlines() for s in line[:-1].split(',')]
print(output)
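Note that this gives one flat list of floats; if you want to keep one inner list per row (as the csv version above does), a small variation of the same idea would be:
with open('input.csv') as f:
    output = [[float(s) for s in line.strip().split(',')] for line in f]
print(output)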
The csv module was created to do just this. The following implementation of the module is taken straight from the Python docs.
import csv

with open('file.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in reader:
        pass  # add data to a list or other data structure
The delimiter is the character that separates data entries, and the quotechar is the character used to quote fields that contain the delimiter or other special characters.
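As a quick illustration of what quotechar does (a made-up two-row sample, Python 3):
import csv
import io

# With quotechar='|', the comma inside |Smith, John| is part of the field,
# not a delimiter.
sample = io.StringIO('|Smith, John|,42\n|Doe, Jane|,17\n')
for row in csv.reader(sample, delimiter=',', quotechar='|'):
    print(row)
# ['Smith, John', '42']
# ['Doe, Jane', '17']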

Create subset of large CSV file and write to new CSV file

I would like to create a subset of a large CSV file using the rows that have the 4th column as "DOT" and output them to a new file.
This is the code I currently have:
import csv

outfile = open('DOT.csv', 'w')
with open('Service_Requests_2015_-_Present.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        if row[3] == "DOT":
            outfile.write(row)
outfile.close()
The error is:
outfile.write(row)
TypeError: must be str, not list
How can I manipulate row so that I can just straight up do write(row)? If not, what is the easiest way?
You can combine your two open statements, as the with statement accepts multiple arguments, like this:
import csv

infile = 'Service_Requests_2015_-_Present.csv'
outfile = 'DOT.csv'

# newline='' on the output avoids blank lines between rows on Windows
with open(infile, encoding='utf-8') as f, open(outfile, 'w', newline='') as o:
    reader = csv.reader(f)
    writer = csv.writer(o, delimiter=',')  # adjust as necessary
    for row in reader:
        if row[3] == "DOT":
            writer.writerow(row)
    # no need for close statements

print('Done')
Make your outfile a csv.writer and use writerow instead of write.
outcsv = csv.writer(outfile, ...other_options...)
...
outcsv.writerow(row)
That is how I would do it... OR
outfile.write(",".join(row)) # comma delimited here...
In the above code you are trying to write a list with the file object; you cannot write a list directly, which is what gives the error "TypeError: must be str, not list". You can convert the list to a string first, and then you will be able to write the row to the file: outfile.write(str(row))
or
import csv

def csv_writer(input_path, out_path):
    with open(out_path, 'a', newline='') as outfile:  # text mode for Python 3's csv.writer
        writer = csv.writer(outfile)
        with open(input_path, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            for row in reader:
                if row[3] == "DOT":
                    writer.writerow(row)

csv_writer(input_path, out_path)
[This code is written for Python 3. In Python 2.7, the built-in open() does not accept a newline argument, so it would raise a TypeError there.]
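If you do need to stay on Python 2.7, a minimal sketch of the equivalent would use binary modes and drop the newline/encoding keyword arguments (assuming the file is plain ASCII/UTF-8):
import csv

def csv_writer(input_path, out_path):
    # Python 2.7 version: binary modes, no newline= / encoding= arguments.
    with open(out_path, 'ab') as outfile:
        writer = csv.writer(outfile)
        with open(input_path, 'rb') as f:
            reader = csv.reader(f)
            for row in reader:
                if row[3] == "DOT":
                    writer.writerow(row)

csv_writer('Service_Requests_2015_-_Present.csv', 'DOT.csv')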

Finding string in row to overwrite this row of CSV using Python 2.7

I'm following some feedback from another thread, but have gotten stuck. I'm looking to search an existing csv file to locate the row in which a string occurs. I am then looking to update this row with new data.
What I have so far gives me a "TypeError: unhashable type: 'list'":
allLDR = []
with open('myfile.csv', mode='rb') as f:
    reader = csv.reader(f)
    #allLDR.extend(reader)
    for num, row in enumerate(reader):
        if myField in row[0]:
            rowNum = row

line_to_override = {rowNum:[nMaisonField, entreeField, indiceField, cadastreField]}

with open('myfile.csv', 'wb') as ofile:
    writer = csv.writer(ofile, quoting=csv.QUOTE_NONE, delimiter=',')
    #for line, row in enumerate(allLDR):
    for line, row in enumerate(reader):
        data = line_to_override.get(line, row)
        writer.writerow(data)
The line allLDR.extend(reader) consumes all of the input lines from the csv.reader object. Therefore the for loop never runs, rowNum = row is never executed, and {rowNum: blah} generates an exception.
Try commenting out the allLDR.extend(reader) line.
As a debugging aid, try adding print statements inside the for loop and inside the conditional.
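For example, something like this (a sketch; 'fowl' stands in as a placeholder for whatever myField is in your code):
import csv

myField = 'fowl'  # placeholder for your actual search string
with open('myfile.csv', mode='rb') as f:
    reader = csv.reader(f)
    for num, row in enumerate(reader):
        print('row %d: %r' % (num, row))    # does the loop run at all?
        if myField in row[0]:
            print('match in row %d' % num)  # does the condition ever fire?
            rowNum = row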
Here is a program which does what I think you want your program to do: it reads in myfile.csv, modifies rows conditionally based on the content of the first cell, and writes the file back out.
import csv

with open('myfile.csv', mode='rb') as ifile:
    allDR = list(csv.reader(ifile))

for row in allDR:
    if 'fowl' in row[0]:
        row[:] = ['duck', 'duck', 'goose']

with open('myfile.csv', 'wb') as ofile:
    csv.writer(ofile).writerows(allDR)
