Argument 1 must be an iterator - what am I doing wrong? - python

I've got a section of code in a project that's supposed to be reading a CSV file and writing each row to an XLSX file. Right now I'm getting the error "argument 1 must be an iterator" when I run via command line.
Here is the relevant code:
import os
import openpyxl
import csv
from datetime import datetime
from openpyxl.reader.excel import load_workbook
...
plannum = 4
...
alldata_sheetname = ("All Test Data " + str(plannum))
wb = load_workbook("testingtemplate.xlsx", keep_vba=True)
...
ws_testdata = wb.get_sheet_by_name(alldata_sheetname)
...
with open("testdata.csv", 'r') as csvfile:
    table = csv.reader(csvfile)
    for row in table:
        ws_testdata.append(row)
csv_read = csv.reader(csvfile)
...
And the specific error reads: "TypeError: argument 1 must be an iterator", and is referencing the last line of code I've provided.
Since it didn't complain about the first time I used csvfile, would it be better if I did something like csvfile = open("testdata.csv", "r") instead of using the with (and is that what I'm doing wrong here)? If that's the case, is there anything else I need to change?
Thanks to anyone who helps!!

The with block closes the file as soon as it ends, so the file is already closed by the time you get to csv_read = csv.reader(csvfile). Alternatively, you can open the file yourself, keep it open, and store what you need in variables so you don't have to iterate over the file twice. E.g.:
csvfile = open("testdata.csv", "r")
table = csv.reader(csvfile)
for row in table:
    ws_testdata.append(row)
# store what you need in variables
csvfile.close()
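If you would rather keep the with statement, one possible approach is to read every row into a list while the file is still open and work from that list afterwards. This is only a minimal sketch: load_rows is a hypothetical helper name, and it assumes the file is small enough to hold in memory.

```python
import csv


def load_rows(path):
    """Read every row of a CSV file into a list while the file is still open."""
    with open(path, "r", newline="") as csvfile:
        rows = list(csv.reader(csvfile))
    # The file is closed at this point, but rows is an ordinary list
    # and can be iterated as many times as needed.
    return rows
```

With this, rows = load_rows("testdata.csv") can feed both ws_testdata.append(row) and any later pass over the data without reopening the file.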

Related

Open file has data but reports back length 0 in python

I must be missing something very simple here, but I've been hitting my head against the wall for a while and don't understand where the error is. I am trying to open a csv file and read the data. I am detecting the delimiter, then reading in the data with this code:
with open(filepath, 'r') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read())
    delimiter = repr(dialect.delimiter)[1:-1]
    csvdata = [line.split(delimiter) for line in csvfile.readlines()]
However, my csvfile is being read as having no length. If I run:
print(sum(1 for line in csvfile))
The result is zero. If I run:
print(sum(1 for line in open(filepath, 'r')))
Then I get five lines, as expected. I've checked for name clashes by changing csvfile to other random names, but this does not change the result. Am I missing a step somewhere?
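csvfile.read() consumes the whole file and leaves the file pointer at the end, so the later readlines() call finds nothing left to read. You need to move the file pointer back to the start of the file after sniffing it. You also don't need to read the whole file to sniff it, just enough to include a few rows: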
import csv
with open(filepath, 'r') as f_input:
    dialect = csv.Sniffer().sniff(f_input.read(2048))
    f_input.seek(0)
    csv_input = csv.reader(f_input, dialect)
    csv_data = list(csv_input)
Also, the csv.reader() will do the splitting for you.
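The same idea can be wrapped in a reusable function. This is a sketch rather than a definitive version: read_with_sniffed_dialect, sample_size and the candidate delimiters string are names and defaults chosen here for illustration. Restricting the candidates through the delimiters argument of Sniffer.sniff() tends to make detection more predictable on small samples.

```python
import csv


def read_with_sniffed_dialect(path, sample_size=2048, delimiters=";,\t|"):
    """Detect the CSV dialect from a sample, rewind, and return all rows."""
    with open(path, "r", newline="") as f_input:
        sample = f_input.read(sample_size)
        # Only the characters in delimiters are considered as candidates.
        dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
        # Reading the sample moved the file pointer; go back to the start.
        f_input.seek(0)
        return list(csv.reader(f_input, dialect))
```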

Python 3 - printing rows in csv reader object results in a single character on each line

I am trying to migrate some code from Python 2 to Python 3 and cannot figure out why it is printing one character at a time as if it is reading the file as one long string.
I have been looking into it and maybe I need to use newline='' when opening the file?
But how can I do that when using urlopen()?
import csv
import urllib.request
url = "http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv"
ftpstream = urllib.request.urlopen(url)
csvfile = ftpstream.read().decode('utf-8')
csvfile = csv.reader(csvfile, delimiter=',')
for row in csvfile:
    print(row)
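csv.reader expects an iterable of lines, but a plain string iterates one character at a time, which is why each row holds a single character. Split the decoded text into lines first. Change
csvfile = ftpstream.read().decode('utf-8')
to
csvfile = ftpstream.read().decode('utf-8').split('\r')
Note that split('\r') assumes the file's lines end in '\r'; using .splitlines() instead handles any line ending.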
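Another way to get the same effect, sketched here with a hypothetical helper named parse_csv_text, is to wrap the decoded string in io.StringIO so that csv.reader sees a file-like object. Passing newline='' to StringIO plays the role that newline='' plays in open().

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV data held in a single string into a list of rows."""
    # io.StringIO makes the string behave like an open text file, so
    # csv.reader receives whole lines rather than individual characters.
    return list(csv.reader(io.StringIO(text, newline="")))
```

In the question's code, text would be ftpstream.read().decode('utf-8').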

MemoryError while converting txt to xlsx

Related questions:
1. Error in converting txt to xlsx using python
2. Converting txt to xlsx while setting the cell property for number cells as number
My code is
import csv
import openpyxl
import sys
def convert(input_path, output_path):
    """
    Read a csv file (with no quoting), and save its contents in an excel file.
    """
    wb = openpyxl.Workbook()
    ws = wb.worksheets[0]
    with open(input_path) as f:
        reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
        for row_index, row in enumerate(reader, 1):
            for col_index, value in enumerate(row, 1):
                ws.cell(row=row_index, column=col_index).value = value
    print 'hello world'
    wb.save(output_path)
    print 'hello world2'

def main():
    try:
        input_path, output_path = sys.argv[1:]
    except ValueError:
        print 'Usage: python %s input_path output_path' % (sys.argv[0],)
    else:
        convert(input_path, output_path)

if __name__ == '__main__':
    main()
This code works, except for some input files. I couldn't find what the difference is between the input txt that causes this problem and input txt that doesn't.
My first guess was encoding. I tried changing the encoding of the input file to UTF-8 and UTF-8 with BOM. But this failed.
My second guess was that it literally used too much memory. But my computer has an SSD and 32 GB of RAM.
So perhaps this code is not fully utilizing the capacity of this RAM?
How can I fix this?
Edit: I added that line
print 'hello world'
and
print 'hello world2'
to check if all the parts before 'hello world' are run correctly.
I checked the code prints 'hello world', but not 'hello world2'
So, it really seems likely that
wb.save(output_path)
is causing the problem.
openpyxl has optimised modes for reading and writing large files.
wb = Workbook(write_only=True) will enable this.
I'd also recommend that you install lxml for speed. This is all covered in the documentation.
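A minimal sketch of how the question's convert() might look in write-only mode, assuming openpyxl is installed. copy_rows is a hypothetical helper introduced here; it only needs an object with an append() method, which is what a write-only worksheet provides. Rows are streamed straight from csv.reader, so no cell objects are created one by one.

```python
import csv


def copy_rows(rows, sheet):
    """Append each row to sheet and return how many rows were written."""
    count = 0
    for row in rows:
        sheet.append(row)
        count += 1
    return count


def convert(input_path, output_path):
    # Imported here so copy_rows stays usable without openpyxl installed.
    from openpyxl import Workbook

    wb = Workbook(write_only=True)
    ws = wb.create_sheet()  # write-only workbooks start with no sheets
    with open(input_path, newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        copy_rows(reader, ws)
    wb.save(output_path)
```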
Below are three alternatives:
RANGE FOR LOOP
Possibly, the two nested enumerate() loops add overhead, since every cell is indexed individually. Consider reading the csv.reader content into a list (which is subscriptable) and indexing it with range(). Admittedly, even this may not be more efficient: the list holds the whole file in memory, and in Python 2 each range() call also builds a full list (xrange is the lazy version), whereas in Python 3 range() is lazy.
with open(input_path) as f:
    reader = csv.reader(f)
    row = []
    for data in reader:
        row.append(data)
# openpyxl row and column indices start at 1, not 0
for i in range(len(row)):
    for j in range(len(row[i])):
        ws.cell(row=i + 1, column=j + 1).value = row[i][j]
OPTIMIZED WRITER
OpenPyXL warns that merely scrolling through cells, even without assigning values, creates them and keeps them in memory. As a solution, you can use the optimized writer with the row list produced above from csv.reader. This route appends entire rows to a write-only workbook instance:
from openpyxl import Workbook
wb = Workbook(write_only=True)
ws = wb.create_sheet()
for irow in row:
    # append one whole row at a time, converting each value to a string
    ws.append(['%s' % value for value in irow])
wb.save(r'C:\Path\To\Outputfile.xlsx')
WIN32COM LIBRARY
Finally, consider using the win32com library (part of the third-party pywin32 package), with which you open the csv in Excel and save it as an xlsx or xls workbook. Do note this only works on Windows Python installations with Excel installed.
import win32com.client as win32
excel = win32.Dispatch('Excel.Application')
# OPEN CSV DIRECTLY INSIDE EXCEL
wb = excel.Workbooks.Open(input_path)
excel.Visible = False
outxl=r'C:\Path\To\Outputfile.xlsx'
# SAVE EXCEL AS xlOpenXMLWorkbook TYPE (51)
wb.SaveAs(outxl, FileFormat=51)
wb.Close(False)
excel.Quit()
Here are a few points you can consider:
Check the /tmp folder, the default location where temporary files are created;
Your code may be using up all the space in that folder. Either increase its size or change the temporary file path when creating the workbook;
I used in-memory mode for my task and it worked.
Below is my code:
#!/usr/bin/python
import os
import csv
import io
import sys
import traceback
from xlsxwriter.workbook import Workbook
fileNames=sys.argv[1]
try:
    f=open(fileNames, mode='r')
    workbook = Workbook(fileNames[:-4] + '.xlsx',{'in_memory': True})
    worksheet = workbook.add_worksheet()
    workbook.use_zip64()
    rowCnt=0
    #Create the bold style for the header row
    for line in f:
        rowCnt = rowCnt + 1
        row = line.split("\001")
        for j in range(len(row)):
            worksheet.write(rowCnt, j, row[j].strip())
    f.close()
    workbook.close()
    print ('success')
except ValueError:
    print ('failure')

Transferring CSV data into different Functions in Python

I need some help. Basically, I have to create a function to read a CSV file, then pass this data to another function that uses it to generate an XML file.
Here is my code:
import csv
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
from xml.etree.ElementTree import ElementTree
import xml.etree.ElementTree as etree
def read_csv():
    with open ('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
    return reader

def generate_xml(reader):
    root = Element('Solution')
    root.set('version','1.0')
    tree = ElementTree(root)
    head = SubElement(root, 'DrillHoles')
    head.set('total_holes', '238')
    description = SubElement(head,'description')
    current_group = None
    i = 0
    for row in reader:
        if i > 0:
            x1,y1,z1,x2,y2,z2,cost = row
            if current_group is None or i != current_group.text:
                current_group = SubElement(description, 'hole',{'hole_id':"%s"%i})
            information = SubElement (current_group, 'hole',{'collar':', '.join((x1,y1,z1)),
                                                             'toe':', '.join((x2,y2,z2)),
                                                             'cost': cost})
        i+=1

def main():
    reader = read_csv()
    generate_xml(reader)

if __name__=='__main__':
    main()
But I get an error when I try to pass reader. The error is: ValueError: I/O operation on closed file
Turning the reader into a list should work:
def read_csv():
    with open ('1250_12.csv', 'r') as data:
        return list(csv.reader(data))
You tried to read from a closed file. list will trigger the reader to read the whole file.
The with statement tells Python to clean up the context manager (in this case, close the file) once control exits its body. Since returning from the function exits the with block, there's no way to hand the reader back with the file still open.
Other answers suggest reading the whole thing into a list, and returning that; this works, but may be awkward if the file is very large.
Fortunately, we can use generators:
def read_csv():
    with open('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
        for row in reader:
            yield row
Since we yield from inside the with, we don't have to clean up the file before getting some rows. Once the data is consumed, (or if the generator is itself cleaned up,) the file will be closed.
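As a rough sketch of how the generator might be parameterised, the version below takes the path as an argument and can drop the header row itself, which would replace the i > 0 check in generate_xml. The path and skip_header parameters are assumptions added here for illustration; they are not in the question's code.

```python
import csv


def read_csv(path, skip_header=True):
    """Yield rows one at a time; the file stays open while rows are consumed."""
    with open(path, "r", newline="") as data:
        reader = csv.reader(data)
        if skip_header:
            # Discard the first row; the None default avoids an error on an empty file.
            next(reader, None)
        for row in reader:
            yield row
```

The consumer can then loop with for x1, y1, z1, x2, y2, z2, cost in read_csv('1250_12.csv') and never holds more than one row in memory.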
When you read a csv file, it is often simplest to put its rows into a list. A csv.reader only supports iteration, and once you have looped through it to the end of the file you can no longer do anything with it unless you open the file and read it again. So let's just change your read_csv function:
def read_csv():
    with open ('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
        x = [row for row in reader]
    return x
Now you are manipulating a list and everything should work perfectly!

add file name without file path to csv in python

I am using Blair's Python script which modifies a CSV file to add the filename as the last column (script appended below). However, instead of adding the file name alone, I also get the Path and File name in the last column.
I run the below script in windows 7 cmd with the following command:
python C:\data\set1\subseta\add_filename.py C:\data\set1\subseta\20100815.csv
The resulting ID field is populated by the following C:\data\set1\subseta\20100815.csv, although, all I need is 20100815.csv.
I'm new to python so any suggestion is appreciated!
import csv
import sys
def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()
    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)
    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)
    # Process the header.
    header = reader.next()
    header.append('ID')
    writer.writerow(header)
    # Process each row of the body.
    for row in reader:
        row.append(filename)
        writer.writerow(row)
    # Close the file and we're done.
    f.close()

# Run the function on all command-line arguments. Note that this does no
# checking for things such as file existence or permissions.
map(process_file, sys.argv[1:])
Use os.path.basename(filename). See http://docs.python.org/library/os.path.html for more details.
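For illustration, here is a Python 3 sketch of the same script with that change applied. add_filename_column is a hypothetical name, and the rewrite-in-place behaviour mirrors the original script; treat it as one possible adaptation rather than the script's author's version.

```python
import csv
import os


def add_filename_column(path):
    """Rewrite the CSV at path, appending its bare file name as an 'ID' column."""
    name = os.path.basename(path)  # drops the directory part of the path
    with open(path, "r", newline="") as f:
        rows = list(csv.reader(f))
    if rows:
        rows[0].append("ID")       # header row
        for row in rows[1:]:
            row.append(name)       # body rows get the file name only
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return name
```

Keep in mind that os.path.basename follows the path rules of the platform the script runs on; the standard library's ntpath.basename applies Windows rules on any platform.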
