ValueError: I/O operation on closed file after opening file - python

I have this code:
import os

csv_out = 'femaleconsolidated.csv'
csv_list = [r'C:\Users\PycharmProjects\filemerger\Female\outputA.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputB.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputC.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputD.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputE.csv',
            r'C:\Users\PycharmProjects\filemerger\Female\outputother.csv']
print(csv_list)
csv_merge = open(csv_out, 'w')
for file in csv_list:
    csv_in = open(file)
    for line in csv_in:
        csv_merge.write(line)
    csv_in.close()
    csv_merge.close()
print('Verify consolidated CSV file : ' + csv_out)
The code is meant to merge CSVs. Surely open(file) should open the file, but instead I get this:
csv_merge.write(line)
ValueError: I/O operation on closed file.
What could be causing this?

csv_merge.close() should sit outside the for loop, since you are still writing to csv_merge in the next iteration:
for file in csv_list:
    csv_in = open(file)
    for line in csv_in:
        csv_merge.write(line)
    csv_in.close()
csv_merge.close()

Use pandas:
import pandas as pd
import glob

csv_files = glob.glob('path/*.csv')  # paths to all of your csv files
result = pd.concat([pd.read_csv(f) for f in csv_files], ignore_index=True)
result.to_csv('path/femaleconsolidated.csv', index=False)  # to_csv takes index=False, not ignore_index

If you want to write line by line, you can use the csv module (note the csv module does not close files for you; close destination when done, or use a with block):
import csv

destination = open(csv_out, 'w', newline='')
csvwriter = csv.writer(destination)
...
for row in csv.reader(csv_in):
    csvwriter.writerow(row)  # writerow expects a sequence of fields; a raw string would be split into characters
If you just want to merge all the files into a single one, there are more efficient ways to do it than line by line. You can check this one:
https://www.freecodecamp.org/news/how-to-combine-multiple-csv-files-with-8-lines-of-code-265183e0854/
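
For example, here is a minimal sketch of that bulk approach (assuming the same csv_list and csv_out names as in the question): shutil.copyfileobj copies each file in large chunks instead of iterating line by line.

import shutil

# assumes csv_list and csv_out are defined as in the question
with open(csv_out, 'wb') as merged:
    for path in csv_list:
        with open(path, 'rb') as src:
            shutil.copyfileobj(src, merged)  # bulk chunked copy, no per-line loop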

Your for statement should be inside the with block:
with open(csv_out, 'w') as csv_merge:
    for file in csv_list:
        csv_in = open(file)
        for line in csv_in:
            csv_merge.write(line)
        csv_in.close()
    # no explicit csv_merge.close() needed; the with block closes the file

print('Verify consolidated CSV file : ' + csv_out)

Related

Reading and writing a csv file Python

I just started learning Python, and I am trying to do the following:
- Read a .csv file
- Write the filtered data to a new file, keeping only rows where column 7 is not blank/empty

When I print my results, the output in the Python shell looks right, but the data in the .csv file is not correct (it differs from what the print function shows).
Any suggestions on my code?
Thank you in advance.
file = open("station.csv", "r")
writeFile = open("stations-filtered.csv", "w")
for line in file:
    line2 = line.split(",")
    if line2[7] != "":
        print(line)
        writeFile.write(line)
I agree with @user513093 that you can use csv, like:
import csv

file = open("station.csv", "r")
writeFile = open("stations-filtered.csv", "w")
writer = csv.writer(writeFile, delimiter=',')
for line in file:
    line2 = line.split(",")
    if line2[7] != "":
        print(line)
        writer.writerow(line2)  # pass the list of fields; a raw string would be split into characters
But still, pandas is good:
import pandas as pd

file = pd.read_csv("station.csv", sep=",", header=None)
file = file[file[7].notna()]  # read_csv loads empty fields as NaN, so filter on NaN rather than ""
file.to_csv("stations-filtered.csv")

Read multiple gzip files to 1 fileobject in python

I want to read multiple gzip files into 1 file object. Currently I am doing:
import gzip
a = gzip.open(path2zipfile1)
for line in a.readline():
    # do some stuff

but I need to read from, say, 2 files:
a = gzip.open(path2zipfile1)        # read zip1
a = gzip.open(path2zipfile2, 'rU')  # appending file object with contents of 2nd file
for line in a.readlines():
    # this should give me contents from zip1 then zip2

I am unable to find the right mode to do so.
Use itertools.chain:
import itertools
import gzip

files = ['path2zipfile1', 'path2zipfile2']
it = (gzip.open(f, 'rt') for f in files)
for line in itertools.chain.from_iterable(it):
    print(line)
Another version without itertools:
import gzip

def gen(files):
    for f in files:
        fo = gzip.open(f, 'rt')
        while True:
            line = fo.readline()
            if not line:
                break
            yield line

files = ['path2zipfile1', 'path2zipfile2']
for line in gen(files):
    print(line)

Open Multiple text files and call function

The code below works perfectly: it opens one text file and passes its contents to the function parse_messages as a parameter.
def parse_messages(hl7):
    hl7_msgs = hl7.split("MSH|")
    hl7_msgs = ["{}{}".format("MSH|", x) for x in hl7_msgs if x]
    for hl7_msg in hl7_msgs:
        # does something..

with open('sample.txt', 'r') as f:
    hl7 = f.read()
df = parse_messages(hl7)
But now I have multiple text files in a directory. I want to open each one and call the parse_messages function on it. Here is what I tried so far, but it only reads the last text file, not all of them:
import glob

data_directory = "C:/Users/.../"
hl7_file = glob.glob(data_directory + '*.txt')
for file in hl7_file:
    with open(file, 'r') as hl7:
        hl7 = f.read()
df = parse_messages(hl7)
In your read loop, for file in hl7_file, you overwrite hl7 on every iteration, leaving only the last file's contents stored in hl7. You probably want to concatenate the contents of all the files together:
hl7 = ''
for file in hl7_file:
    with open(file, 'r') as f:
        hl7 += f.read()

df = parse_messages(hl7)  # process all concatenated contents together
Or you can call the parse_messages function inside the loop, with a df list storing the results, as below:
df = []
for file in hl7_file:
    with open(file, 'r') as f:
        hl7 = f.read()
        df.append(parse_messages(hl7))
# df[0] holds the result for the 1st file read, df[1] for the 2nd file, and so on
This should work, if I understood what you want to do:
import os

results = []  # renamed from 'all' to avoid shadowing the built-in
files = [x for x in os.listdir() if x.endswith(".txt")]
for x in files:
    with open(x, 'r', encoding='utf-8') as fileobj:  # keyword arguments must follow positional ones
        content = fileobj.read()
        results.append(parse_messages(content))

Python read CSV file columns and write file name and column name in a csv file

I have many CSV files. I need to read all the files in a loop and write the file name and all the columns (the header in row 1) to an output file.
Example
Input csv file 1 (test1.csv)
Id, Name, Age, Location
1, A, 25, India
Input csv file 2 (test2.csv)
Id, ProductName
1, ABC
Output file:
test1.csv Id
test1.csv Name
test1.csv Age
test1.csv Location
test2.csv Id
test2.csv ProductName
Many thanks for your help.
Update:
This code works fine for this purpose:
import os
import csv

ofile = open('D:\Anuj\Personal\OutputFile/AHS_File_Columns_Info.csv', 'w')
directory = os.path.join('D:\Anuj\Personal\Python')
for root, dirs, files in os.walk(directory):
    for file in files:
        fullfilepath = directory + "/" + file
        with open(fullfilepath, 'r') as f:
            output = file + ',' + f.readline()
            ofile.write(output)
A clean solution using the csv module for reading and writing:
- open the output file and create a csv.writer instance on its handle
- open each input file and create a csv.reader instance on its handle
- get the first row using next() on the csv.reader iterator: this yields the titles as a list (with a small post-processing step to remove the spaces)
- write the titles alongside the current filename in a loop

Code:
import csv

files = ["test1.csv", "test2.csv"]

with open("output.tsv", "w", newline='') as fw:
    cw = csv.writer(fw, delimiter="\t")  # output is tab delimited
    for filename in files:
        with open(filename, 'r') as f:
            cr = csv.reader(f)
            # get the title row
            for column_name in (x.strip() for x in next(cr)):
                cw.writerow([filename, column_name])
There are several advantages to using the csv module, the most important being that quoting and multi-line fields/titles are managed properly.
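For instance, a quick illustration of that point (sample header hypothetical): naive string splitting breaks on a quoted, comma-containing title, while csv.reader handles it.

import csv, io

sample = 'Id,"Name, Full",Age\n'
print(sample.split(','))                      # naive split: ['Id', '"Name', ' Full"', 'Age\n']
print(next(csv.reader(io.StringIO(sample))))  # csv.reader: ['Id', 'Name, Full', 'Age']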
But I'm not sure I understand you correctly.
import csv
from typing import List
from typing import Tuple

TableType = List[List[str]]

def load_csv_table(file_name: str) -> Tuple[List[str], TableType]:
    with open(file_name) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        headers = next(csv_reader)
        data_table = list(csv_reader)
        return headers, data_table

def save_csv_table(file_name: str, headers: List[str], data_table: TableType):
    with open(file_name, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(headers)
        for row in data_table:
            writer.writerow(row)

input_files = ['file1.csv', 'file2.csv', 'file3.csv']
new_table = []
new_headers = []
for file_name in input_files:
    headers, data_table = load_csv_table(file_name)
    if not new_headers:
        new_headers = ['Source'] + headers
    new_table.extend(([file_name] + line for line in data_table))
save_csv_table('output.csv', new_headers, new_table)
A simple method is to use readline() on the file object:
files = ["test1.csv", "test2.csv"]
for my_file in files:
    with open(my_file, 'r') as f:
        print(my_file, f.readline())

Python read in file: ERROR: line contains NULL byte

I would like to parse a .ubx file (= my input file). This file contains many different NMEA sentences as well as raw receiver data. The output file should contain only the information from GGA sentences. This works fine as long as the .ubx file does not contain any raw messages. However, if it contains raw data, I get the following error:
Traceback (most recent call last):
File "C:...myParser.py", line 25, in
for row in reader:
Error: line contains NULL byte
My code looks like this:
import csv
from datetime import datetime
import math

# adapt this to your file
INPUT_FILENAME = 'Rover.ubx'
OUTPUT_FILENAME = 'out2.csv'

# open the input file in read mode
with open(INPUT_FILENAME, 'r') as input_file:
    # open the output file in write mode
    with open(OUTPUT_FILENAME, 'wt') as output_file:
        # create a csv reader object from the input file (nmea files are basically csv)
        reader = csv.reader(input_file)
        # create a csv writer object for the output file
        writer = csv.writer(output_file, delimiter=',', lineterminator='\n')
        # write the header line to the csv file
        writer.writerow(['Time', 'Longitude', 'Latitude', 'Altitude', 'Quality', 'Number of Sat.', 'HDOP', 'Geoid separation', 'diffAge'])
        # iterate over all the rows in the nmea file
        for row in reader:
            if row[0].startswith('$GNGGA'):
                time = row[1]
                # merge the time and date columns into one Python datetime object (usually more convenient than having both separately)
                date_and_time = datetime.strptime(time, '%H%M%S.%f')
                date_and_time = date_and_time.strftime('%H:%M:%S.%f')[:-6]
                writer.writerow([date_and_time])
My .ubx file looks like this:
$GNGSA,A,3,16,25,29,20,31,26,05,21,,,,,1.30,0.70,1.10*10
$GNGSA,A,3,88,79,78,81,82,80,72,,,,,,1.30,0.70,1.10*16
$GPGSV,4,1,13,02,08,040,17,04,,,47,05,18,071,44,09,02,348,24*49
$GPGSV,4,2,13,12,03,118,24,16,12,298,36,20,15,118,30,21,44,179,51*74
$GPGSV,4,3,13,23,06,324,35,25,37,121,47,26,40,299,48,29,60,061,49*73
$GPGSV,4,4,13,31,52,239,51*42
$GLGSV,3,1,10,65,07,076,24,70,01,085,,71,04,342,34,72,13,029,35*64
$GLGSV,3,2,10,78,35,164,41,79,75,214,48,80,34,322,46,81,79,269,49*64
$GLGSV,3,3,10,82,28,235,52,88,39,043,43*6D
$GNGLL,4951.69412,N,00839.03672,E,124610.00,A,D*71
$GNGST,124610.00,12,,,,0.010,0.010,0.010*4B
$GNZDA,124610.00,03,07,2016,00,00*79
µb<  ¸½¸Abð½ . SB éF é v.¥ # 1 f =•Iè ,
Ïÿÿ£Ëÿÿd¡ ¬M 0+ùÿÿ³øÿÿµj #ª ² -K*
,¨ , éºJU /) ++ f 5 .lG NL C8G /{; „> é óK 3 — Bòl . "¿ 2 bm¡
4âH ÐM X cRˆ 35 »7 Óo‡ž "*ßÿÿØÜÿÿUhQ`
3ŒðÿÿÂïÿÿþþûù ÂÈÿÿñÅÿÿJX ES
$²I uM N:w (YÃÿÿV¿ÿÿ> =ìî 1¥éÿÿèÿÿmk³m /?ÔÿÿÒÿÿšz+Ú ­Ïÿÿ6ÍÿÿêwÇ\ ? ]? ˜B Aÿƒ y µbÐD‹lçtæ#p3,}ßœŒ-vAh
¿M"A‚UE ôû JQý
'wA´üát¸jžAÀ‚"Å
)DÂï–ŽtAöÙüñÅ›A|$Å ôû/ Ìcd§ÇørA†áãì˜AØY–Ä ôû1 /Áƒ´zsAc5+_’ô™AìéNÅ ôû( ¶y(,wvAFøÈV§ƒA˜ÝwE ôû$ _S R‰wAhÙ]‘ÑëžAÇ9Å vwAòܧsAŒöƒd§Ò™AÜOÄ ôû3 kœÕ}vA;D.ž‡žAÒûàÄ #ˆ" ϬŸ ntAfˆÞ3ךA~Y2E ôû3 :GVtAæ93l)ÆšAß yE ôû4 Uþy.TwA<âƒ' ¦žAhmëC ôû" ¯4Çï ›wAþ‰Ì½6ŸAŠû¶D ~~xI]tA<ÞÿrÁšAmHE ôû/ ÖÆ#ÈgŸsAXnþ‚†4šA'0tE ôû. ·ÈO:’
sA¢B†i™Aë%
E ôû/ >Þ,À8vA°‚9êœA>ÇD ôû, ø(¼+çŠuAÆOÁ לAÈΆD
ôû# ¨Ä-_c¯qAuÓ?]> —AÐкà ôû0 ÆUV¨ØZsA]ðÛñß™AÛ'Å ôû, ™mv7žqAYÐ:›Ä‘—AdWxD ôû1 ûö>%vA}„
ëV˜A.êbE
AÝ$GNRMC,124611.00,A,4951.69413,N,00839.03672,E,0.009,,030716,,,D*62
$GNVTG,,T,,M,0.009,N,0.016,K,D*36
$GNGNS,124611.00,4951.69413,N,00839.03672,E,RR,15,0.70,162.5,47.6,1.0,0000*42
$GNGGA,124611.00,4951.69413,N,00839.03672,E,4,12,0.70,162.5,M,47.6,M,1.0,0000*6A
$GNGSA,A,3,16,25,29,20,31,26,05,21,,,,,1.31,0.70,1.10*11
$GNGSA,A,3,88,79,78,81,82,80,72,,,,,,1.31,0.70,1.10*17
$GPGSV,4,1,13,02,08,040,18,04,,,47,05,18,071,44,09,02,348,21*43
$GPGSV,4,2,13,12,03,118,24,16,
I already searched for similar problems. However, I was not able to find a solution that works for me. I ended up with code like this:
import csv

def unfussy_reader(csv_reader):
    while True:
        try:
            yield next(csv_reader)
        except csv.Error:
            # log the problem or whatever
            print("Problem with some row")
            continue

if __name__ == '__main__':
    #
    # Generate malformed csv file for
    # demonstration purposes
    #
    with open("temp.csv", "w") as fout:
        fout.write("abc,def\nghi\x00,klm\n123,456")

    #
    # Open the malformed file for reading, fire up a
    # conventional CSV reader over it, wrap that reader
    # in our "unfussy" generator and enumerate over that
    # generator.
    #
    with open("Rover.ubx") as fin:
        reader = unfussy_reader(csv.reader(fin))
        for n, row in enumerate(reader):
            fout.write(row[0])
However, I was not able to simply write a file containing all the rows read in with the unfussy_reader wrapper using the above code.
I would be glad if you could help me.
Here is an image of how the .ubx file looks in Notepad++.
Thanks!
I am not quite sure, but your file looks pretty binary. You should try to open it as such:
with open(INPUT_FILENAME, 'rb') as input_file:
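A minimal sketch of that approach, reusing the question's INPUT_FILENAME and assuming the NMEA sentences are plain ASCII (csv.reader in Python 3 expects text, so with a binary handle you would decode and filter lines yourself):

with open(INPUT_FILENAME, 'rb') as input_file:
    for raw in input_file:
        try:
            line = raw.decode('ascii')  # raw UBX payload bytes fail to decode
        except UnicodeDecodeError:
            continue                    # skip the binary receiver data
        if line.startswith('$GNGGA'):
            print(line.strip())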
It seems like you did not open the file with the correct encoding, so the raw messages cannot be read correctly. If it is encoded as UTF-8, you need to open the file with the encoding option:
with open(INPUT_FILENAME, 'r', newline='', encoding='utf8') as input_file:
Hey, if anyone else has this problem reading the NMEA sentences out of u-blox .ubx files, this Python code worked for me:
def read_in():
    with open('GNGGA.txt', 'wb') as GNGGA:
        with open('GNRMC.txt', 'wb') as GNRMC:
            with open('rover.ubx', 'rb') as f:
                for line in f:
                    # f is opened in binary mode, so compare against bytes literals
                    if line.startswith(b'$GNGGA'):
                        GNGGA.write(line)
                    if line.startswith(b'$GNRMC'):
                        GNRMC.write(line)

read_in()
You could also use the gnssdump command line utility which is installed with the PyGPSClient and pygnssutils Python packages.
e.g.
gnssdump filename=Rover.ubx msgfilter=GNGGA
See gnssdump -h for help.
Alternatively, if you want a simple Python script, you could use the pyubx2 Python package, e.g.
from pyubx2 import UBXReader

with open("Rover.ubx", "rb") as stream:
    ubr = UBXReader(stream)
    for (_, parsed_data) in ubr.iterate():
        if parsed_data.identity in ("GNGGA", "GNRMC"):
            print(parsed_data)
