I need some help. Basically, I have to create a function to read a CSV file, then I have to pass this data to another function that uses it to generate an XML file.
Here is my code:
import csv
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
from xml.etree.ElementTree import ElementTree
import xml.etree.ElementTree as etree
def read_csv():
    with open('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
        return reader

def generate_xml(reader):
    root = Element('Solution')
    root.set('version', '1.0')
    tree = ElementTree(root)
    head = SubElement(root, 'DrillHoles')
    head.set('total_holes', '238')
    description = SubElement(head, 'description')
    current_group = None
    i = 0
    for row in reader:
        if i > 0:
            x1, y1, z1, x2, y2, z2, cost = row
            if current_group is None or i != current_group.text:
                current_group = SubElement(description, 'hole', {'hole_id': "%s" % i})
                information = SubElement(current_group, 'hole', {'collar': ', '.join((x1, y1, z1)),
                                                                 'toe': ', '.join((x2, y2, z2)),
                                                                 'cost': cost})
        i += 1

def main():
    reader = read_csv()
    generate_xml(reader)

if __name__ == '__main__':
    main()
But I get an error when I try to pass reader. The error is: ValueError: I/O operation on closed file
Turning the reader into a list should work:
def read_csv():
    with open('1250_12.csv', 'r') as data:
        return list(csv.reader(data))
You tried to read from a closed file. Calling list() forces the reader to consume the whole file while it is still open.
The with statement tells Python to clean up the context manager (in this case, a file) once control exits its body. Since functions exit when they return, there's no way to get data out of the function with the file still open.
Other answers suggest reading the whole thing into a list, and returning that; this works, but may be awkward if the file is very large.
Fortunately, we can use generators:
def read_csv():
    with open('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
        for row in reader:
            yield row
Since we yield from inside the with, we don't have to clean up the file before getting some rows. Once the data is consumed (or the generator is itself cleaned up), the file will be closed.
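For instance, here's a quick way to exercise that generator. Note that I've parameterized read_csv by path for the demo, and the file name and contents below are made up, not from the question:

```python
import csv
import os
import tempfile

def read_csv(path):
    # Rows are yielded lazily; the file stays open until the
    # generator is exhausted (or garbage-collected).
    with open(path, 'r', newline='') as data:
        reader = csv.reader(data)
        for row in reader:
            yield row

# Throwaway demo file (name and contents are illustrative only).
path = os.path.join(tempfile.gettempdir(), 'demo_rows.csv')
with open(path, 'w', newline='') as f:
    f.write('x1,y1,z1\n1,2,3\n')

rows = list(read_csv(path))  # consuming the generator closes the file
```

Consuming it with list() here is just for demonstration; in the original code you would pass the generator straight to generate_xml and let the for loop drive it.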
So when you read a csv file, it is very important to put that data into a list. This is because most operations cannot be performed on the csv.reader object itself, and once you have looped through it to the end of the file, you can no longer do anything with it unless you open and read the file again. So let's just change your read_csv function:
def read_csv():
    with open('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
        return [row for row in reader]
Now you are manipulating a list and everything should work perfectly!
Related
I'm having trouble with the Python csv module. I'm trying to write a new line to a csv file; is there any reason why it would not work?
Code:
csv writing function
def write_response_csv(name, games, mins):
    with open("sport_team.csv", 'w', newline='', encoding='utf-8') as csv_file:
        fieldnames = ['Vardas', 'Žaidimai', 'Minutės']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow({'Vardas': name, 'Žaidimai': games, 'Minutės': mins})
with requests.get(url, headers=headers) as page:
    content = soup(page.content, 'html.parser')
    content = content.findAll('table', class_='table01 tablesorter')
    names = find_name(content)
    times = 0
    for name in names:
        matches = find_matches(content, times)
        min_in_matches = find_min(content, times)
        times += 1
        csv_file = write_response_csv(name, matches, min_in_matches)
        try:
            print(name, matches, min_in_matches)
        except:
            pass
When you call your write_response_csv function, it reopens the file and starts at line 1 of the csv file again, so each new line of data you pass to it overwrites the previously written one. What you could try is creating the csv file outside of the scope of your writer function and setting your writer function to append mode instead of write mode. This ensures that it writes the data on the next empty csv line instead of starting at line 1.
# Outside of function scope
fieldnames = ['Vardas', 'Žaidimai', 'Minutės']

# Create sport_team.csv file w/ headers
with open('sport_team.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames)
    writer.writeheader()

# Write response function
def write_response_csv(name, games, mins):
    with open('sport_team.csv', 'a', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames)
        writer.writerow({'Vardas': name, 'Žaidimai': games, 'Minutės': mins})
Note:
You will run into the same issue if you reuse this script to continuously add new lines of data to the same file, because on each run the code that creates the csv file will recreate a blank sport_team.csv containing only the headers. If you would like to reuse the code to continuously add new data, look into os.path and use it to check whether sport_team.csv already exists; if it does, skip the file-creation code.
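A minimal sketch of that guard might look like this (file name and fieldnames are taken from the question; the os.path.exists check is the only addition):

```python
import csv
import os

fieldnames = ['Vardas', 'Žaidimai', 'Minutės']

# Write the header only when the file does not exist yet, so
# re-running the script appends data instead of wiping the old rows.
if not os.path.exists('sport_team.csv'):
    with open('sport_team.csv', 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
```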
Try using metabob; it finds code errors for you. I've been using it as a Python beginner and have been pretty successful with it.
I've written a script in Python which is able to fetch the titles of different posts from a webpage and write them to a csv file. As the site updates its content very frequently, I'd like to append the new results first in that csv file, where a list of old titles is already available.
I've tried with:
import csv
import time
import requests
from bs4 import BeautifulSoup
url = "https://stackoverflow.com/questions/tagged/python"
def get_information(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')
    for title in soup.select(".summary .question-hyperlink"):
        yield title.text

if __name__ == '__main__':
    while True:
        with open("output.csv", "a", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(['posts'])
            for items in get_information(url):
                writer.writerow([items])
                print(items)
        time.sleep(300)
The above script, when run twice, appends the new results after the old results.
Old data are like:
A
F
G
T
New data are W,Q,U.
The csv file should look like below when I rerun the script:
W
Q
U
A
F
G
T
How can I append the new result first in an existing csv file having old data?
Inserting data anywhere in a file except at the end requires rewriting the whole thing. To do this without reading its entire contents into memory first, you could create a temporary csv file with the new data in it, append the data from the existing file to that, delete the old file and rename the new one.
Here's an example of what I mean (using a dummy get_information() function to simplify testing).
import csv
import os
from tempfile import NamedTemporaryFile

url = 'https://stackoverflow.com/questions/tagged/python'
csv_filepath = 'updated.csv'

# For testing, create an existing file.
if not os.path.exists(csv_filepath):
    with open(csv_filepath, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows([item] for item in 'AFGT')

# Dummy for testing.
def get_information(url):
    for item in 'WQU':
        yield item

if __name__ == '__main__':
    folder = os.path.abspath(os.path.dirname(csv_filepath))  # Get dir of existing file.
    with NamedTemporaryFile(mode='w', newline='', suffix='.csv',
                            dir=folder, delete=False) as newf:
        temp_filename = newf.name  # Save filename.
        # Put new data into the temporary file.
        writer = csv.writer(newf)
        for item in get_information(url):
            writer.writerow([item])
            print([item])
        # Append contents of existing file to new one.
        with open(csv_filepath, 'r', newline='') as oldf:
            reader = csv.reader(oldf)
            for row in reader:
                writer.writerow(row)
                print(row)
    os.remove(csv_filepath)  # Delete old file.
    os.rename(temp_filename, csv_filepath)  # Rename temporary file.
Since you intend to change the position of every element of the table, you need to read the table into memory and rewrite the entire file, starting with the new elements.
You may find it easier to (1) write the new element to a new file, (2) open the old file and append its contents to the new file, and (3) move the new file to the original (old) file name.
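Those three steps can be sketched compactly. The prepend_rows helper and the demo file name below are my own illustrative choices, and os.replace is used for the final move so the swap overwrites the old name in one step:

```python
import csv
import os

def prepend_rows(path, new_rows):
    tmp = path + '.tmp'
    with open(tmp, 'w', newline='') as newf:
        writer = csv.writer(newf)
        writer.writerows(new_rows)              # (1) new rows go in first
        with open(path, 'r', newline='') as oldf:
            writer.writerows(csv.reader(oldf))  # (2) then the old contents
    os.replace(tmp, path)                       # (3) move over the old name

# Demo matching the data in the question: old rows A, F, G, T; new rows W, Q, U.
with open('demo_titles.csv', 'w', newline='') as f:
    csv.writer(f).writerows([c] for c in 'AFGT')
prepend_rows('demo_titles.csv', [[c] for c in 'WQU'])
```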
I'm new to bonobo-etl and I'm trying to write a job that loads multiple files at once, but I can't get the CsvReader to work with the @use_context_processor annotation. A snippet of my code:
def input_file(self, context):
    yield 'test1.csv'
    yield 'test2.csv'
    yield 'test3.csv'

@use_context_processor(input_file)
def extract(f):
    return bonobo.CsvReader(path=f, delimiter='|')

def load(*args):
    print(*args)

def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(extract, load)
    return graph
When I run the job I get something like <bonobo.nodes.io.csv.CsvReader object at 0x7f849678dc88> rather than the lines of the CSV.
If I hardcode the reader like graph.add_chain(bonobo.CsvReader(path='test1.csv',delimiter='|'),load), it works.
Any help would be appreciated.
Thank you.
As bonobo.CsvReader does not (yet) support reading file names from the input stream, you need to use a custom reader for that.
Here is a solution that works for me on a set of csvs I have:
import bonobo
import bonobo.config
import bonobo.util
import glob
import csv

@bonobo.config.use_context
def read_multi_csv(context, name):
    with open(name) as f:
        reader = csv.reader(f, delimiter=';')
        headers = next(reader)
        if not context.output_type:
            context.set_output_fields(headers)
        for row in reader:
            yield tuple(row)

def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        glob.glob('prenoms_*.csv'),
        read_multi_csv,
        bonobo.PrettyPrinter(),
    )
    return graph

if __name__ == '__main__':
    with bonobo.parse_args() as options:
        bonobo.run(get_graph(**options))
A few comments on this snippet, in reading order:
The use_context decorator injects the node execution context into the transformation call, allowing us to call .set_output_fields(...) with the first csv's headers.
The other csvs' headers are ignored; in my case they're all the same. You may need slightly more complex logic for your own case.
Then, in the bonobo.Graph instance we just generate the filenames using glob.glob (in my case, the stream will contain: prenoms_2004.csv prenoms_2005.csv ... prenoms_2011.csv prenoms_2012.csv) and pass them to our custom reader, which is called once for each file, opens it, and yields its lines.
Hope that helps!
I have read the documentation and a few additional posts on SO and other various places, but I can't quite figure out this concept:
When you call csvFilename = gzip.open(filename, 'rb') then reader = csv.reader(open(csvFilename)), is that reader not a valid csv file?
I am trying to solve the problem outlined below, and am getting a coercing to Unicode: need string or buffer, GzipFile found error on lines 41 and 7 (highlighted below), leading me to believe that gzip.open and csv.reader do not work as I had previously thought.
Problem I am trying to solve
I am trying to take a results.csv.gz and convert it to a results.csv so that I can turn the results.csv into a python dictionary and then combine it with another python dictionary.
File 1:
alertFile = payload.get('results_file')
alertDataCSV = rh.dataToDict(alertFile) # LINE 41
alertDataTotal = rh.mergeTwoDicts(splunkParams, alertDataCSV)
Calls File 2:
import gzip
import csv

def dataToDict(filename):
    csvFilename = gzip.open(filename, 'rb')
    reader = csv.reader(open(csvFilename))  # LINE 7
    alertData = {}
    for row in reader:
        alertData[row[0]] = row[1:]
    return alertData

def mergeTwoDicts(dictA, dictB):
    dictC = dictA.copy()
    dictC.update(dictB)
    return dictC
*edit: also forgive my non-PEP style of naming in Python
gzip.open returns a file-like object (same as what plain open returns), not the name of the decompressed file. Simply pass the result directly to csv.reader and it will work (csv.reader will receive the decompressed lines). csv does expect text, though, so on Python 3 you need to open the file in text mode (on Python 2 'rb' is fine: that version of the module doesn't deal with encodings, but then, neither does the csv module). Simply change:
csvFilename = gzip.open(filename, 'rb')
reader = csv.reader(open(csvFilename))
to:
# Python 2
csvFile = gzip.open(filename, 'rb')
reader = csv.reader(csvFile) # No reopening involved
# Python 3
csvFile = gzip.open(filename, 'rt', newline='') # Open in text mode, not binary, no line ending translation
reader = csv.reader(csvFile) # No reopening involved
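Putting that together, the question's dataToDict could be written like this on Python 3. The round-trip demo at the end, with its made-up file name and row, is my own addition to show it working end to end:

```python
import csv
import gzip
import os
import tempfile

def dataToDict(filename):
    # gzip.open in text mode yields decompressed lines that
    # csv.reader can consume directly; no second open() is needed.
    with gzip.open(filename, 'rt', newline='') as csv_file:
        reader = csv.reader(csv_file)
        return {row[0]: row[1:] for row in reader}

# Round-trip demo with a throwaway compressed file (illustrative data).
demo = os.path.join(tempfile.gettempdir(), 'demo_results.csv.gz')
with gzip.open(demo, 'wt', newline='') as f:
    f.write('alert1,high,42\n')

alertData = dataToDict(demo)
```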
The following worked for me for python==3.7.9:
import gzip
my_filename = 'my_compressed_file.csv.gz'
with gzip.open(my_filename, 'rt') as gz_file:
    data = gz_file.read()  # read decompressed data

with open(my_filename[:-3], 'wt') as out_file:
    out_file.write(data)  # write decompressed data
my_filename[:-3] strips the .gz suffix to get the actual filename, so the decompressed data is written under that name rather than some other one.
I've got a section of code in a project that's supposed to be reading a CSV file and writing each row to an XLSX file. Right now I'm getting the error "argument 1 must be an iterator" when I run via command line.
Here is the relevant code:
import os
import openpyxl
import csv
from datetime import datetime
from openpyxl.reader.excel import load_workbook
...
plannum = 4
...
alldata_sheetname = ("All Test Data " + str(plannum))
wb = load_workbook("testingtemplate.xlsx", keep_vba=True)
...
ws_testdata = wb.get_sheet_by_name(alldata_sheetname)
...
with open("testdata.csv", 'r') as csvfile:
    table = csv.reader(csvfile)
    for row in table:
        ws_testdata.append(row)
csv_read = csv.reader(csvfile)
...
And the specific error reads: "TypeError: argument 1 must be an iterator", and is referencing the last line of code I've provided.
Since it didn't complain about the first time I used csvfile, would it be better if I did something like csvfile = open("testdata.csv", "r") instead of using the with (and is that what I'm doing wrong here)? If that's the case, is there anything else I need to change?
Thanks to anyone who helps!!
You've closed the file by the time you get to csv_read = csv.reader(csvfile). Alternatively, you can keep the file open and store what you need in variables so you don't have to iterate over the file twice. E.g.:
csvfile = open("testdata.csv", "r")
table = csv.reader(csvfile)
for row in table:
    ws_testdata.append(row)
# store what you need in variables
csvfile.close()
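If you'd rather keep the with block, a third option is rewinding the file with seek(0) before building the second reader. Here's a sketch, with an in-memory buffer and dummy data standing in for testdata.csv:

```python
import csv
import io

# StringIO stands in for the open CSV file from the question.
csvfile = io.StringIO('a,b\n1,2\n')

table = csv.reader(csvfile)
first_pass = [row for row in table]

csvfile.seek(0)                  # rewind instead of reopening
csv_read = csv.reader(csvfile)
second_pass = [row for row in csv_read]
```

Both passes see the same rows because the reader just pulls lines from wherever the file's position currently is.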