Write the output to a CSV file - Python

Desktop.zip contains multiple text files. fun.py is a Python program which prints the name of each text file in the zip and the number of lines in that file. Everything is okay up to here. But it should also write this output to a single CSV file. Code:
import zipfile, csv

file = zipfile.ZipFile("Desktop.zip", "r")
inputcsv = input("Enter the name of the CSV file: ")
csvfile = open(inputcsv, 'a')

# list file names
for name in file.namelist():
    print(name)

# do stuff with the file object
for name in file.namelist():
    with open(name) as fh:
        count = 0
        for line in fh:
            count += 1
        print("File " + name + "line(s) count = " + str(count))
        b = open(inputcsv, 'w')
        a = csv.writer(b)
        data = [name, str(count)]
        a.writerows(data)
file.close()
I am expecting output in the CSV file like:
test1.txt, 25
test2.txt, 10
But I am getting this output in the CSV file:
t,e,s,t,1,.,t,x,t
2,5
t,e,s,t,2,.,t,x,t
1,0
Here, test1.txt and test2.txt are the files in Desktop.zip, and 25 and 10 are the numbers of lines in those files, respectively.

writerows takes an iterable of row-representing iterables. You're passing it a single row, so it treats each string in the row as a row of its own and each character of that string as a cell. You don't want that. Use writerow rather than writerows.
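For example, a minimal sketch of the difference (the file name out.csv is just for illustration):

import csv

with open('out.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['test1.txt', 25])    # writes one row:  test1.txt,25
    w.writerows([['test1.txt', 25],  # writes several rows at once
                 ['test2.txt', 10]])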

I saw a number of issues:
You should open the csv file only once, before the for loop. Opening it inside the for loop overwrites the information from the previous loop iteration.
icktoofay pointed out that you should use writerow, not writerows.
file is a built-in name (in Python 2); you should not shadow it with your own variable. Besides, it is not that descriptive.
You seem to get the file names from the archive, but you open the files from the current directory (not the ones inside the archive). These two sets of files might not be identical.
Here is my approach:
import csv
import zipfile

with open('out.csv', 'w', newline='') as file_handle:
    csv_writer = csv.writer(file_handle)
    archive = zipfile.ZipFile('Desktop.zip')
    for filename in archive.namelist():
        lines = archive.open(filename).read().splitlines()
        line_count = len(lines)
        csv_writer.writerow([filename, line_count])
My approach has a couple of issues, which might or might not matter:
I assume the files in the archive are text files.
I open, read, and split lines in one operation. This might not work well for very large files.

The code in your question has multiple issues, as others have pointed out. The two primary ones are that you're recreating the csv file over and over again for each archive member being processed, and secondly, that you're passing csvwriter.writerows() the wrong data: it interprets each item in the list you're passing as a separate row to be added to the csv file.
One way to fix that would be to open the csv file only once, before entering a for loop which counts the lines in each member of the archive and writes one row at a time with a call to csvwriter.writerow().
A slightly different way, shown below, does use writerows(), but passes it a generator expression that processes each member on-the-fly instead of calling writerow() repeatedly. It also processes each member incrementally, so it doesn't need to read the whole thing into memory at one time and then split it up just to get a line count.
Although you didn't indicate what version of Python you're using, from the code in your question I'm guessing it's Python 3.x, so the answer below has been written and tested with that (although it wouldn't be hard to make it work in Python 2.7).
import csv
import zipfile

input_zip_filename = 'Desktop.zip'
output_csv_filename = input("Enter the name of the CSV file to create: ")

# Helper function.
def line_count(archive, filename):
    ''' Count the lines in specified ZipFile member. '''
    with archive.open(filename) as member:
        return sum(1 for line in member)

with zipfile.ZipFile(input_zip_filename, 'r') as archive:
    # List files in archive.
    print('Members of {!r}:'.format(input_zip_filename))
    for filename in archive.namelist():
        print('  {}'.format(filename))

    # Create csv with filenames and line counts.
    with open(output_csv_filename, 'w', newline='') as output_csv:
        csv.writer(output_csv).writerows(
            # generator expression
            [filename, line_count(archive, filename)]  # contents of one row
                for filename in archive.namelist())
Sample content of the csv file created:
test1.txt,25
test2.txt,10

Related

Create new files, don't overwrite existing files, in python

I'm writing to a file in three functions, and I'm trying not to overwrite the file. I want to generate a new file every time I run the code.
with open("atx.csv", 'w') as output:
    writer = csv.writer(output)
If you want to write to different files each time you execute the script, you need to change the file names; otherwise they will be overwritten.
import os
import csv

filename = "atx"
i = 0
while os.path.exists(f"{filename}{i}.csv"):
    i += 1

with open(f"{filename}{i}.csv", 'w') as output:
    writer = csv.writer(output)
    writer.writerow([1, 2, 3])  # or whatever you want to write to the file; this line is just an example
Here I use os.path.exists() to check if a file is already present on the disk, and increment the counter until a free name is found.
The first time you run the script you get atx0.csv, the second time atx1.csv, and so on.
Replicate this for your three files.
EDIT
Also note that here I'm using formatted string literals, which are available since Python 3.6. If you have an earlier version of Python, use "{}{:d}.csv".format(filename, i) instead of f"{filename}{i}.csv".
EDIT bis after comments
If the same file needs to be manipulated by more than one function during the execution of the script, the easiest thing that comes to my mind is to open the writer outside the functions and pass it as an argument.
filename = "atx"
i = 0
while os.path.exists(f"{filename}{i}.csv"):
    i += 1

with open(f"{filename}{i}.csv", 'w') as output:
    writer = csv.writer(output)
    foo(writer, ...)  # put the arguments of your function instead of ...
    bar(writer, ...)
    etc(writer, ...)
This way, each time you call one of the functions it writes to the same file, appending its output at the bottom.
Of course there are other ways. You might check for the file name's existence only in the first function you call, and in the others just open the file with the 'a' option to append the output, as sketched below.
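A minimal sketch of that second idea (the function names first_step and later_step are just for illustration):

import csv
import os

filename = "atx"

def first_step():
    # the first function picks an unused name and creates the file
    i = 0
    while os.path.exists(f"{filename}{i}.csv"):
        i += 1
    path = f"{filename}{i}.csv"
    with open(path, 'w', newline='') as output:
        csv.writer(output).writerow([1, 2, 3])
    return path

def later_step(path):
    # later functions open the same file in append mode
    with open(path, 'a', newline='') as output:
        csv.writer(output).writerow([4, 5, 6])

path = first_step()
later_step(path)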
You can do something like this so that each file gets a slightly different name and therefore will not be overwritten:
for v in ['1', '2', '3']:
    with open("atx_{}.csv".format(v), 'w') as output:
        writer = csv.writer(output)
You are using just one filename. When you use the same name, atx.csv, every time, you will either overwrite it with 'w' or append to it with 'a'.
If you want new files to be created, just check first if the file is there.
import os
import csv

files = os.listdir()
files = [f for f in files if 'atx' in f]
num = str(len(files)) if len(files) > 0 else ''
filename = "atx{0}.csv".format(num)

with open(filename, 'w') as output:
    writer = csv.writer(output)
Change with open("atx.csv", 'w') to with open("atx.csv", 'a')
https://www.guru99.com/reading-and-writing-files-in-python.html#2

Python reading nothing from file [duplicate]

I am a beginner at Python. I am now trying to figure out why the second 'for' loop doesn't work in the following script. I only get the result of the first 'for' loop, but nothing from the second one. I copied and pasted my script and the data csv below.
It would be helpful if you could tell me why it behaves this way and how to make the second 'for' loop work as well.
My SCRIPT:
import csv

file = "data.csv"
fh = open(file, 'rb')
read = csv.DictReader(fh)
for e in read:
    print(e['a'])
for e in read:
    print(e['b'])
"data.csv":
a,b,c
tree,bough,trunk
animal,leg,trunk
fish,fin,body
The csv reader is an iterator over the file. Once you have gone through it, you have read to the end of the file, so there is no more to read. If you need to go through it again, you can seek to the beginning of the file:
fh.seek(0)
This will reset the file to the beginning so you can read it again. Depending on the code, it may also be necessary to skip the field name header:
next(fh)
This is necessary for your code, since the DictReader consumed that line the first time around to determine the field names, and it's not going to do that again. It may not be necessary for other uses of csv.
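Putting the rewind and the header skip together for the script in the question, as a minimal sketch (Python 3, with the file opened in text mode as the csv module expects):

import csv

fh = open('data.csv')
read = csv.DictReader(fh)
for e in read:
    print(e['a'])

fh.seek(0)   # rewind to the beginning of the file
next(fh)     # skip the header line DictReader already consumed
for e in read:
    print(e['b'])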
If the file isn't too big and you need to do several things with the data, you could also just read the whole thing into a list:
data = list(read)
Then you can do what you want with data.
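For example, a minimal sketch of that approach:

import csv

with open('data.csv') as fh:
    data = list(csv.DictReader(fh))

for e in data:
    print(e['a'])
for e in data:
    print(e['b'])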
I have created a small function which takes the path of a csv file, reads it, and returns a list of dicts at once; then you can loop through the list very easily:
import csv

def read_csv_data(path):
    """
    Reads CSV from given path and returns a list of dicts with mapping
    """
    data = csv.reader(open(path))
    # Read the column names from the first line of the file
    fields = next(data)
    data_lines = []
    for row in data:
        items = dict(zip(fields, row))
        data_lines.append(items)
    return data_lines
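Usage, as a minimal sketch (column names 'a' and 'b' taken from the data above):

rows = read_csv_data('data.csv')
for row in rows:
    print(row['a'])
for row in rows:
    print(row['b'])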

python clear content writing on same file

I am a newbie to Python. I have code in which I must write the contents back to the same file, but when I do, it clears my content. Please help me fix it.
How should I modify my code so that the contents are written back to the same file?
My code:
import re

numbers = {}
with open('1.txt') as f, open('11.txt', 'w') as f1:
    for line in f:
        row = re.split(r'(\d+)', line.strip())
        words = tuple(row[::2])
        if words not in numbers:
            numbers[words] = [int(n) for n in row[1::2]]
        numbers[words] = [n+1 for n in numbers[words]]
        row[1::2] = map(str, numbers[words])
        indentation = re.match(r"\s*", line).group()
        print(indentation + ''.join(row))
        f1.write(indentation + ''.join(row) + '\n')
In general, it's a bad idea to write over a file you're still processing (or change a data structure over which you are iterating). It can be done...but it requires much care, and there is little safety or restart-ability should something go wrong in the middle (an error, a power failure, etc.)
A better approach is to write a clean new file, then rename it to the old name. For example:
import re
import os

filename = '1.txt'
tempname = "temp{0}_{1}".format(os.getpid(), filename)
numbers = {}
with open(filename) as f, open(tempname, 'w') as f1:
    ...  # file processing as before
os.rename(tempname, filename)
Here I've dropped filenames (both original and temporary) into variables, so they can be easily referred to multiple times or changed. This also prepares for the moment when you hoist this code into a function (as part of a larger program), as opposed to making it the main line of your program.
You don't strictly need the temporary name to embed the process id, but it's a standard way of making sure the temp file is uniquely named (temp32939_1.txt vs temp_1.txt or tempfile.txt, say).
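As an alternative, the standard tempfile module can generate the unique name for you (a sketch, not part of the original approach):

import os
import tempfile

# create a uniquely named temporary file in the current directory,
# so the later os.rename stays on the same filesystem
fd, tempname = tempfile.mkstemp(prefix='temp_', suffix='.txt', dir='.')
os.close(fd)  # we only need the name; reopen it with open(tempname, 'w')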
It may also be helpful to create backups of the files as they were before processing. In which case, before the os.rename(tempname, filename) you can drop in code to move the original data to a safer location or a backup name. E.g.:
backupname = filename + ".bak"
os.rename(filename, backupname)
os.rename(tempname, filename)
While beyond the scope of this question, if you used a read-process-overwrite strategy frequently, it would be possible to create a separate module that abstracted these file-handling details away from your processing code.
Use
open('11.txt', 'a')
to append to the file, instead of 'w', which creates a new file or overwrites an existing one.
If you want to read and modify the file in one pass, use 'r+' mode:
f = open('/path/to/file.txt', 'r+')
content = f.read()
content = content.replace('oldstring', 'newstring')  # for example, change some substring in the whole file
f.seek(0)  # move to beginning of file
f.write(content)
f.truncate()  # trim the file content's "tail" on disk if the new content is shorter than the old
f.close()

Sort a big file with Python heapq.merge

I'm trying to complete this job but have encountered difficulty:
I have a huge file of texts. Each line is of the format "AGTCCCGGAT filename", where the first part is a DNA string.
The professor suggests that we break this huge file into many temporary files and use heapq.merge() to sort them. The goal is to have one file at the end which contains every line of the original file, sorted.
My first try was to break each line into a separate temporary file. The problem is that heapq.merge() reports there are too many files to sort.
My second try was to break it into temporary files of 50000 lines each. The problem is that it seems not to sort by line, but by file. For example, we get something like:
ACGTACGT filename
CGTACGTA filename
ACGTCCGT filename
CGTAAAAA filename
where the first two lines are from one temp file and the last two lines are from the second file.
My code to sort them is as follows:
for line in heapq.merge(*[open('/var/tmp/L._Ipsum-strain01.fa_dir/'+str(f), 'r')
                          for f in os.listdir('/var/tmp/L._Ipsum-strain01.fa_dir')]):
    result.write(line)
result.close()
Your solution is almost correct. However, each partial file must be sorted before you write it to disk. Here's a 2-pass algorithm that demonstrates this: first, iterate over the file in 50k-line chunks, sort the lines in each chunk, and write the sorted chunk to a file. In the second pass, open all these files and merge them into the output file.
from heapq import merge
from itertools import count, islice
from contextlib import ExitStack  # not available on Python 2;
                                  # need to take care of closing files otherwise

chunk_names = []

# chunk and sort
with open('input.txt') as input_file:
    for chunk_number in count(1):
        # read in next 50k lines and sort them
        sorted_chunk = sorted(islice(input_file, 50000))
        if not sorted_chunk:
            # end of input
            break
        chunk_name = 'chunk_{}.chk'.format(chunk_number)
        chunk_names.append(chunk_name)
        with open(chunk_name, 'w') as chunk_file:
            chunk_file.writelines(sorted_chunk)

# merge the sorted chunks into the output file
with ExitStack() as stack, open('output.txt', 'w') as output_file:
    files = [stack.enter_context(open(chunk)) for chunk in chunk_names]
    output_file.writelines(merge(*files))

Python: Writing to a file and reading it back out

I am saving a dictionary with student names as keys and lists of grades as values. I am attempting to write the values to a file; at the moment I am writing them as strings.
def save_records(students, filename):
    # saves student records to a file
    out_file = open(filename, "w")
    for x in students.keys():
        out_file.write(x + " " + str(students[x]) + "\n")
    out_file.close()
After saving the file, I try to read it back. The pertinent part of the read-out code is below.
while True:
    in_line = in_file.readline()
    if not in_line:
        break
    # delete the trailing newline from the line read in
    in_line = in_line[:-1]
    # initialize grades list
    in_line = in_line.split()
    name = in_line[0]
    students[name] = map(int, in_line[1:])
The read-out code works well for normal pre-formatted text files. The format of the text file is: key (whitespace) values separated by whitespace, then "\n". I would like to know how to write to a text file by combining string and list elements.
If you have control over writing the data, I would recommend using a well-established format, such as JSON or INI. This would allow you to make use of common libraries, such as the json or ConfigParser modules, respectively.
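For example, a minimal sketch with the json module (the file name students.json and the sample data are just for illustration):

import json

students = {"John": [88, 92, 79], "Jane": [95, 90]}

# write the dictionary out...
with open("students.json", "w") as f:
    json.dump(students, f)

# ...and read it back
with open("students.json") as f:
    students = json.load(f)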
Would it not be easier to use something like Python's pickle, which is made for storing things like dicts, and then pretty-print the output to a separate file?
It's hard to say without knowing how you plan on using this...
Since students[name] = map(int, in_line[1:]), I assume you want to print the items of the list students[x] with whitespace in between.
You could use the str.join method:
' '.join(map(str, students[x]))
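Applied to the write loop in the question, that would look something like this (a sketch):

out_file.write(x + " " + ' '.join(map(str, students[x])) + "\n")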
You may want to consider using Comma-Separated Values (aka csv) files instead of plain text files, as they provide a more structured way to read and write your data. Once written, you can open them in a spreadsheet program like Excel to view and edit their contents.
Rewriting your functions to work with csv files, and assuming you are using Python 2.x, we get something like:
import csv

def save_records(students, filename):
    # note that csv files are binary, so on Windows you
    # must write in 'wb' mode; also note the use of `with`,
    # which ensures the file is closed once the block is
    # exited.
    with open(filename, 'wb') as f:
        # create a csv.writer object
        csv_out = csv.writer(f)
        for name, grades in students.iteritems():
            # write a single data row to the file
            csv_out.writerow([name] + grades)

def read_records(filename):
    students = dict()
    # note that we must use 'rb' to read in binary mode
    with open(filename, 'rb') as f:
        # create a csv.reader object
        csv_in = csv.reader(f)
        for line in csv_in:
            # name will have type `str`
            name = line[0]
            grades = [int(x) for x in line[1:]]
            # update the `students` dictionary
            students[name] = grades
    return students
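Usage would then look something like this (a sketch, Python 2.x as assumed above, with made-up sample data):

students = {'John': [88, 92, 79], 'Jane': [95, 90]}
save_records(students, 'grades.csv')
print(read_records('grades.csv'))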
