Create new files, don't overwrite existing files, in Python

I'm writing to a file from three functions and I'm trying not to overwrite the file. I want to generate a new file every time I run the code:
with open("atx.csv", 'w') as output:
    writer = csv.writer(output)

If you want to write to different files each time you execute the script, you need to change the file names, otherwise they will be overwritten.
import os
import csv
filename = "atx"
i = 0
while os.path.exists(f"{filename}{i}.csv"):
    i += 1
with open(f"{filename}{i}.csv", 'w') as output:
    writer = csv.writer(output)
    writer.writerow([1, 2, 3])  # or whatever you want to write in the file; this line is just an example
Here I use os.path.exists() to check whether a file is already present on disk, incrementing the counter until a free name is found.
The first time you run the script you get atx0.csv, the second time atx1.csv, and so on.
Replicate this for your three files.
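To replicate it without copy-pasting, the check can be factored into a small helper and called once per base name (the function name next_free_name is mine, not from the question):

```python
import os

def next_free_name(base, ext=".csv"):
    """Return the first name of the form base{i}{ext} that does not exist yet."""
    i = 0
    while os.path.exists(f"{base}{i}{ext}"):
        i += 1
    return f"{base}{i}{ext}"

# one fresh name per base, e.g. for three output files
names = [next_free_name(base) for base in ("atx", "btx", "ctx")]
```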
EDIT
Also note that here I'm using formatted string literals, which are available since Python 3.6. If you have an earlier version of Python, use "{}{:d}.csv".format(filename, i) instead of f"{filename}{i}.csv".
EDIT bis after comments
If the same file needs to be manipulated by more functions during the execution of the script, the easiest thing that comes to my mind is to open the writer outside the functions and pass it as an argument.
filename = "atx"
i = 0
while os.path.exists(f"{filename}{i}.csv"):
    i += 1
with open(f"{filename}{i}.csv", 'w') as output:
    writer = csv.writer(output)
    foo(writer, ...)  # put the arguments of your function instead of ...
    bar(writer, ...)
    etc(writer, ...)
This way each time you call one of the functions it writes to the same file, appending the output at the bottom of the file.
Of course there are other ways. You might check for the file name's existence only in the first function you call, and in the others just open the file with the 'a' option, to append the output.
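A sketch of that alternative, with illustrative function names: the first function creates the fresh file with 'w', and the later ones reopen the same name with 'a':

```python
import csv
import os

def first_function(filename, row):
    # first function: 'w' creates the fresh file
    with open(filename, 'w', newline='') as output:
        csv.writer(output).writerow(row)

def second_function(filename, row):
    # later functions: 'a' appends, so earlier rows survive
    with open(filename, 'a', newline='') as output:
        csv.writer(output).writerow(row)

# pick a fresh name as in the answer above, then let each function write to it
i = 0
while os.path.exists(f"atx{i}.csv"):
    i += 1
name = f"atx{i}.csv"
first_function(name, [1, 2, 3])
second_function(name, [4, 5, 6])
```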

You can do something like this so that each file gets named something a little different, and therefore will not be overwritten:
for v in ['1', '2', '3']:
    with open("atx_{}.csv".format(v), 'w') as output:
        writer = csv.writer(output)

You are using just one file name. Reusing the same name atx.csv means you will either overwrite the file with 'w' or append to it with 'a'.
If you want new files to be created, just check first if the file is there.
import os
files = os.listdir()
files = [f for f in files if 'atx' in f]
num = str(len(files)) if len(files) > 0 else ''
filename = "atx{0}.csv".format(num)
with open(filename, 'w') as output:
    writer = csv.writer(output)

Change with open("atx.csv", 'w') to with open("atx.csv", 'a')
https://www.guru99.com/reading-and-writing-files-in-python.html#2

Related

How to increment variable every time script is run in Python?

I have a Python script that I want to increment a global variable every time it is run. Is this possible?
Pretty easy to do with an external file. You can create a function to do it for you, so you can use multiple files for multiple variables if needed, although in that case you might want to look into some sort of serialization and store everything in the same file. Here's a simple way to do it:
def get_var_value(filename="varstore.dat"):
    with open(filename, "a+") as f:
        f.seek(0)
        val = int(f.read() or 0) + 1
        f.seek(0)
        f.truncate()
        f.write(str(val))
        return val
your_counter = get_var_value()
print("This script has been run {} times.".format(your_counter))
# This script has been run 1 times
# This script has been run 2 times
# etc.
It will store in varstore.dat by default, but you can use get_var_value("different_store.dat") for a different counter file.
Example:
import os
if not os.path.exists('log.txt'):
    with open('log.txt', 'w') as f:
        f.write('0')
with open('log.txt', 'r') as f:
    st = int(f.read())
st += 1
with open('log.txt', 'w') as f:
    f.write(str(st))
Each time you run your script, the value inside log.txt will increment by one. You can make use of it if you need to.
Yes, you need to store the value in a file and load it back when the program runs again. This is called program state serialization or persistence.
For a code example:
with open("store.txt", 'r') as f:  # open a file in the same folder
    a = f.readlines()  # read from file into variable a

# use the data read
b = int(a[0])  # get the integer at the first position
b = b + 1  # increment it

with open("store.txt", 'w') as f:  # open the same file
    f.write(str(b))  # write the changed value back
Note that a will be a list, since readlines returns a list of lines.

Using variable as part of name of new file in python

I'm fairly new to python and I'm having an issue with my python script (split_fasta.py). Here is an example of my issue:
list = ["1.fasta", "2.fasta", "3.fasta"]
for file in list:
    contents = open(file, "r")
    for line in contents:
        if line[0] == ">":
            new_file = open(file + "_chromosome.fasta", "w")
            new_file.write(line)
I've left the bottom part of the program out because it's not needed. My issue is that when I run this program in the same directory as my fasta files, it works great:
python split_fasta.py *.fasta
But if I'm in a different directory and I want the program to output the new files (e.g. 1.fasta_chromosome.fasta) to my current directory...it doesn't:
python /home/bin/split_fasta.py /home/data/*.fasta
This still creates the new files in the same directory as the fasta files. The issue here I'm sure is with this line:
new_file = open(file + "_chromosome.fasta", "w")
Because if I change it to this:
new_file = open("seq" + "_chromosome.fasta", "w")
It creates an output file in my current directory.
I hope this makes sense to some of you and that I can get some suggestions.
You are giving the full path of the old file, plus a new name. So basically, if file == /home/data/something.fasta, the output file will be file + "_chromosome.fasta" which is /home/data/something.fasta_chromosome.fasta
If you use os.path.basename on file, you will get the name of the file (i.e. in my example, something.fasta)
From @Adam Smith
You can use os.path.splitext to get rid of the .fasta
basename, _ = os.path.splitext(os.path.basename(file))
Getting back to the code example, I saw many things that are not recommended in Python. I'll go into the details.
Avoid shadowing builtin names, such as list, str, int... It is not explicit and can lead to potential issues later.
When opening a file for reading or writing, you should use the with syntax. This is highly recommended since it takes care of closing the file.
with open(filename, "r") as f:
    data = f.read()

with open(new_filename, "w") as f:
    f.write(data)
If you have an empty line in your file, line[0] == ... will result in an IndexError exception. Use line.startswith(...) instead.
Final code:
import os

files = ["1.fasta", "2.fasta", "3.fasta"]
for file in files:
    with open(file, "r") as input:
        for line in input:
            if line.startswith(">"):
                basename, _ = os.path.splitext(os.path.basename(file))
                new_name = basename + "_chromosome.fasta"
                with open(new_name, "w") as output:
                    output.write(line)
Often people come at me and say "that's ugly". Not really :). The levels of indentation make clear which context is which.

Import the output into a CSV file

Desktop.zip contains multiple text files. fun.py is a Python program which will print the names of the text files in the zip and also the number of lines in each file. Everything is okay up to here. But it should also write this output to a single CSV file. Code:
import zipfile, csv
file = zipfile.ZipFile("Desktop.zip", "r")
inputcsv = input("Enter the name of the CSV file: ")
csvfile = open(inputcsv, 'a')
# list file names
for name in file.namelist():
    print(name)
# do stuff with the file object
for name in file.namelist():
    with open(name) as fh:
        count = 0
        for line in fh:
            count += 1
        print("File " + name + " line(s) count = " + str(count))
        b = open(inputcsv, 'w')
        a = csv.writer(b)
        data = [name, str(count)]
        a.writerows(data)
file.close()
I am expecting output in CSV file like :-
test1.txt, 25
test2.txt, 10
But I am getting this output in CSV file :-
t,e,s,t,1,.,t,x,t
2,5
t,e,s,t,2,.,t,x,t
1,0
Here, test1.txt and test2.txt are the files in Desktop.zip, and 25 and 10 is the number of lines of those files respectively.
writerows takes an iterable of row-representing iterables. You’re passing it a single row, so it interprets each column as a row and each character of that column as a cell. You don’t want that. Use writerow rather than writerows.
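The difference is easy to see with io.StringIO standing in for the real file:

```python
import csv
import io

row = ["test1.txt", "25"]

good = io.StringIO()
csv.writer(good).writerow(row)   # one row with two cells: test1.txt,25

bad = io.StringIO()
csv.writer(bad).writerows(row)   # each string becomes a row, each character a cell
```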
I saw a number of issues:
You should open the CSV file only once, before the for loop. Opening it inside the for loop overwrites the information from the previous iteration
icktoofay pointed out that you should use writerow, not writerows
file is a built-in name; you should not use it for your variable. Besides, it is not that descriptive
You seem to get the file names from the archive, but open the file from the directory (not the ones inside the archive). These two sets of files might not be identical.
Here is my approach:
import csv
import zipfile
with open('out.csv', 'wb') as file_handle:
    csv_writer = csv.writer(file_handle)
    archive = zipfile.ZipFile('Desktop.zip')
    for filename in archive.namelist():
        lines = archive.open(filename).read().splitlines()
        line_count = len(lines)
        csv_writer.writerow([filename, line_count])
My approach has a couple of issues, which might or might not matter:
- I assume the files in the archive are text files
- I open, read, and split the lines in one operation. This might not work well for very large files
The code in your question has multiple issues, as others have pointed out. The two primary ones are that you're recreating the csv file over and over again for each archive member being processed, and then secondly, are passing csvwriter.writerows() the wrong data. It interprets each item in the list you're passing as a separate row to be added to the csv file.
One way to fix that would be to only open the csv file once, before entering a for loop which counts the lines in each member of the archive and writes one row at a time with a call to csvwriter.writerow().
A slightly different way, shown below, does use writerows(), but passes it a generator expression that processes each member on-the-fly instead of calling writerow() repeatedly. It also processes each member incrementally, so it doesn't need to read the whole thing into memory at one time and then split it up in order to get a line count.
Although you didn't indicate what version of Python you're using, from the code in your question, I'm guessing it's Python 3.x, so the answer below has been written and tested with that (although it wouldn't be hard to make it work in Python 2.7).
import csv
import zipfile

input_zip_filename = 'Desktop.zip'
output_csv_filename = input("Enter the name of the CSV file to create: ")

# Helper function.
def line_count(archive, filename):
    ''' Count the lines in the specified ZipFile member. '''
    with archive.open(filename) as member:
        return sum(1 for line in member)

with zipfile.ZipFile(input_zip_filename, 'r') as archive:
    # List files in archive.
    print('Members of {!r}:'.format(input_zip_filename))
    for filename in archive.namelist():
        print('  {}'.format(filename))

    # Create csv with filenames and line counts.
    with open(output_csv_filename, 'w', newline='') as output_csv:
        csv.writer(output_csv).writerows(
            # generator expression
            [filename, line_count(archive, filename)]  # contents of one row
                for filename in archive.namelist())
Sample format of content in csv file created:
test1.txt,25
test2.txt,10

Programmatically write contents of a list to a directory

I have a python list ['a','b','c'] that's generated inside a for loop. I would like to write each element of the list to a new file.
I have tried:
counter = 0
for i in python_list:
    outfile = open('/outdirectory/%s.txt') % str(counter)
    outfile.write(i)
    outfile.close()
    counter += 1
I get an error:
IOError: [Errno 2] No such file or directory.
How can I programmatically create and write files in a for loop?
You're not passing a mode to open, so it's trying to open in read mode.
outfile = open('/outdirectory/%s.txt' % str(counter), "w")
Try this:
import os

out_directory = "/outdirectory"
if not os.path.exists(out_directory):
    os.makedirs(out_directory)

for counter in range(0, len(python_list)):
    file_path = os.path.join(out_directory, "%s.txt" % counter)
    with open(file_path, "w") as outfile:
        outfile.write(python_list[counter])
Basically, the message you get is because you try to open a file literally named /outdirectory/%s.txt. A further error, as ErlVolton showed, is that you don't open your file in writing mode. Additionally, you must check that your directory exists: /outdirectory means a directory at the root of the file system.
Three Pythonic additions:
- the enumerate iterator, which auto-counts the items in your list
- the with statement, which auto-closes your file
- format, which can be a little clearer than the % thing
So the code could be written as follows:
for counter, i in enumerate(python_list):
    with open('outdirectory/{}.txt'.format(counter), "w") as outfile:
        outfile.write(i)
PS: next time show the full traceback
Some suggestions:
Use the with statement for handling the files [docs].
Use the correct file open mode for writing [docs].
You can only create files in directories that actually exist. Your error message probably indicates that the call to open() fails, probably because the directory does not exist. Either you have a typo in there, or you need to create the directory first (e.g., as in this question).
Example:
l = ['a', 'b', 'c']
for i, data in enumerate(l):
    fname = 'outdirectory/%d.txt' % i
    with open(fname, 'w') as f:
        f.write(data)
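If the problem is indeed the missing directory, it can be created up front; a minimal sketch, assuming the relative directory name outdirectory:

```python
import os

out_dir = "outdirectory"             # relative path: created under the current directory
os.makedirs(out_dir, exist_ok=True)  # exist_ok avoids an error on re-runs (Python 3.2+)
with open(os.path.join(out_dir, "0.txt"), "w") as f:
    f.write("a")
```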

python clear content writing on same file

I am a newbie to Python. I have code in which I must write the contents back to the same file, but when I do it, it clears my content. Please help me fix it.
How should I modify my code such that the contents will be written back on the same file?
My code:
import re
numbers = {}
with open('1.txt') as f, open('11.txt', 'w') as f1:
    for line in f:
        row = re.split(r'(\d+)', line.strip())
        words = tuple(row[::2])
        if words not in numbers:
            numbers[words] = [int(n) for n in row[1::2]]
        numbers[words] = [n+1 for n in numbers[words]]
        row[1::2] = map(str, numbers[words])
        indentation = re.match(r"\s*", line).group()
        print(indentation + ''.join(row))
        f1.write(indentation + ''.join(row) + '\n')
In general, it's a bad idea to write over a file you're still processing (or change a data structure over which you are iterating). It can be done...but it requires much care, and there is little safety or restart-ability should something go wrong in the middle (an error, a power failure, etc.)
A better approach is to write a clean new file, then rename it to the old name. For example:
import re
import os
filename = '1.txt'
tempname = "temp{0}_{1}".format(os.getpid(), filename)
numbers = {}
with open(filename) as f, open(tempname, 'w') as f1:
    # ... file processing as before
os.rename(tempname, filename)
Here I've dropped filenames (both original and temporary) into variables, so they can be easily referred to multiple times or changed. This also prepares for the moment when you hoist this code into a function (as part of a larger program), as opposed to making it the main line of your program.
You don't strictly need the temporary name to embed the process id, but it's a standard way of making sure the temp file is uniquely named (temp32939_1.txt vs temp_1.txt or tempfile.txt, say).
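Alternatively, the standard tempfile module can generate the unique name for you; a sketch under the same assumptions as above, using os.replace, which unlike os.rename also overwrites an existing target on Windows:

```python
import os
import tempfile

filename = "1.txt"
with open(filename, "w") as f:       # stand-in for the original input file
    f.write("original contents\n")

# create a uniquely named temp file in the same directory,
# so the final rename cannot cross file systems
fd, tempname = tempfile.mkstemp(prefix="temp_", suffix="_" + filename, dir=".")
with os.fdopen(fd, "w") as f1:
    f1.write("processed contents\n")  # ... file processing would go here

os.replace(tempname, filename)       # atomic on POSIX; overwrites the old file
```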
It may also be helpful to create backups of the files as they were before processing. In which case, before the os.rename(tempname, filename) you can drop in code to move the original data to a safer location or a backup name. E.g.:
backupname = filename + ".bak"
os.rename(filename, backupname)
os.rename(tempname, filename)
While beyond the scope of this question, if you used a read-process-overwrite strategy frequently, it would be possible to create a separate module that abstracted these file-handling details away from your processing code. Here is an example.
Use
open('11.txt', 'a')
to append to the file, instead of 'w', which writes a new file (overwriting an existing one).
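To see the difference between the two modes (using the 11.txt name from the question):

```python
# 'w' truncates on every open; 'a' keeps the earlier content
with open("11.txt", "w") as f:
    f.write("first\n")
with open("11.txt", "a") as f:
    f.write("second\n")
with open("11.txt") as f:
    after_append = f.read()   # both lines are still there
with open("11.txt", "w") as f:
    f.write("third\n")
with open("11.txt") as f:
    after_write = f.read()    # 'w' wiped the previous two lines
```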
If you want to read and modify the file in one go, use the "r+" mode.
f = open('/path/to/file.txt', 'r+')
content = f.read()
content = content.replace('oldstring', 'newstring')  # for example, change some substring in the whole file
f.seek(0)  # move to the beginning of the file
f.write(content)
f.truncate()  # clear the file content "tail" on disk if the new content is shorter than the old
f.close()
