Incorrect filename when iterating through files - python

I want to add a character at the end of each line in all the files in a folder, so I've written some code to iterate through each file and apply the change. However, the output files have different filenames than the originals. Below is the code that I've put together:
import os
output = '/home/test/Playground/Python/filemodification/output/'
def modification():
    with open(files, 'r') as istr:
        with open(str(output) + str(files), 'w') as ostr:
            for line in istr:
                line = line.rstrip('\n') + 'S'
                print(line, file=ostr)
directory = '/home/test/Playground/Python/filemodification/input'
for files in os.scandir(directory):
    #print(files.path)
    print(files)
    #print(output)
    #print(type(files))
    modification()
Once I run the code I get the following filename
<DirEntry 'input.txt'>
and this is the original filename
input.txt
I know the issue is probably related to this line:
with open(str(output) + str(files), 'w') as ostr:
but I haven't found a way to perform this task differently.
If someone could point me in the right direction or provide a code example that can accomplish this task, it would be greatly appreciated.
Thanks

os.scandir returns os.DirEntry objects. You can get their filename by accessing their .name attribute, or their full path through .path.
E.g.:
for entry in os.scandir(directory):
    print(entry.path)
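Applied to the code in the question, a minimal sketch might look like the following. It assumes Python 3.6 or later (for the with form of os.scandir); the function name add_suffix_to_lines and its parameters are hypothetical, chosen here only for illustration.

```python
import os


def add_suffix_to_lines(input_dir, output_dir, suffix="S"):
    """Copy every regular file in input_dir to output_dir, appending suffix to each line."""
    os.makedirs(output_dir, exist_ok=True)
    with os.scandir(input_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip subdirectories and other non-file entries
            # entry.path is the full path to read; entry.name is the bare filename
            out_path = os.path.join(output_dir, entry.name)
            with open(entry.path, "r") as istr, open(out_path, "w") as ostr:
                for line in istr:
                    ostr.write(line.rstrip("\n") + suffix + "\n")
```

Because the output path is built from entry.name rather than str(entry), the output file keeps the original name (input.txt instead of <DirEntry 'input.txt'>).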

Related

gzipping files from python is changing the file names

I am trying to gzip files using python 3. When I gzip the files, the code is changing the filename without me doing anything. I am not sure I totally understand the working of gzip module.
Below is the code:
import gzip
dir_in = '/localfolder/new_files/'
dir_out = '/localfolder/zippedfiles/'
file_name = 'transactions_may05'
def gzip_files(dir_in, dir_out, file_name):
    with open(dir_in + file_name, 'rb') as f_in, gzip.open(dir_out + 'unprocessed.' + file_name + '.gz', 'wb') as f_out:
        f_out.writelines(f_in)
Expected Output:
Outer file: unprocessed.transactions_may05.gz
when I double click it, I should get the original file transactions_may05
Current Output:
Outer file: unprocessed.transactions_may05.gz -- As expected
when I double click it, the internal file also has unprocessed. prepended to its name. I am not sure why unprocessed. gets added to the internal file name.
Internal file: unprocessed.transactions_may05
Any help would be appreciated. Thank you.
That's the expected behavior of gzip and gunzip.
As mentioned in the manual page:
gunzip takes a list of files on its command line and replaces each
file whose name ends with .gz, -gz, .z, -z, or _z (ignoring case) and
which begins with the correct magic number with an uncompressed
file without the original extension.
If you don't want the name to change, you should not modify the filename when you compress it.
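As a rough sketch of that advice, the function below adds only the .gz extension, so stripping it on decompression gives back the original name. The function name gzip_keep_name is hypothetical, and using shutil.copyfileobj to stream the data is a choice made here, not something taken from the question.

```python
import gzip
import os
import shutil


def gzip_keep_name(dir_in, dir_out, file_name):
    """Compress dir_in/file_name to dir_out/file_name.gz and return the output path."""
    src = os.path.join(dir_in, file_name)
    # Only '.gz' is added, so stripping it on decompression restores file_name
    dst = os.path.join(dir_out, file_name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # stream in chunks rather than line by line
    return dst
```

If the "unprocessed" marker is still needed, one option is to carry it in the destination directory name rather than in the file name.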

Read all the text files in a folder and change a character in a string if it presents

I have a folder of CSV-formatted documents with a .arw extension. The files are named 1.arw, 2.arw, 3.arw, etc.
I would like to write code that reads all the files and replaces every forward slash / with a dash -, then creates new files with the replaced character.
The code I wrote is as follows:
for i in range(1,6):
    my_file=open("/path/"+str(i)+".arw", "r+")
    str=my_file.read()
    if "/" not in str:
        print("There is no forwardslash")
    else:
        str_new = str.replace("/","-")
        print(str_new)
        f = open("/path/new"+str(i)+".arw", "w")
        f.write(str_new)
    my_file.close()
But I get an error saying:
'str' object is not callable.
How can I make it work for all the files in a folder? Apparently my for loop does not work.
The actual error is that you are replacing the built-in str with your own variable of the same name, and then trying to call the built-in str() after that.
Simply renaming the variable fixes the immediate problem, but you really want to refactor the code to avoid reading the entire file into memory.
import logging
import os
for i in range(1,6):
    seen_slash = False
    input_filename = "/path/"+str(i)+".arw"
    output_filename = "/path/new"+str(i)+".arw"
    # infile/outfile avoid shadowing the built-in input()
    with open(input_filename, "r") as infile, open(output_filename, "w") as outfile:
        for line in infile:
            if not seen_slash and "/" in line:
                seen_slash = True
            line_new = line.replace("/","-")
            print(line_new.rstrip('\n')) # don't duplicate newline
            outfile.write(line_new)
    if not seen_slash:
        # logging.warn is a deprecated alias; logging.warning is the supported name
        logging.warning("{0}: No slash found".format(input_filename))
        os.unlink(output_filename)
Using logging instead of print for error messages helps because you keep standard output (the print output) separate from the diagnostics (the logging output). Notice also how the diagnostic message includes the name of the file we found the problem in.
Going back and deleting the output file once you have examined the entire input file and found no slashes is a mild wart, but should typically be more efficient.
This is how I would do it:
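for i in range(1,6):
    with open(str(i)+'.arw', 'r') as f:
        data = f.readlines()
    # str.replace returns a new string, so keep the results
    data = [element.replace('/', '-') for element in data]
    # the with statement closes the file, so no explicit f.close() is needed
    with open(str(i)+'.arw', 'w') as f:
        for element in data:
            f.write(element)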
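This assumes, from your post, that you know you have five files (1.arw through 5.arw, which is what range(1,6) covers). Note that it overwrites the original files rather than creating new ones.
If you don't know how many files you have, you can use the os module to find the files in the directory.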
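As a rough sketch of that last suggestion, the following uses os.scandir to find every .arw file in a folder instead of assuming a fixed count. The function name replace_slashes and the "new" prefix for output files (borrowed from the question's code) are illustrative choices, not part of this answer.

```python
import os


def replace_slashes(folder, prefix="new"):
    """Write a copy of each .arw file in folder with '/' replaced by '-'.

    Returns the sorted list of output paths that were written.
    """
    # Collect the names first; files whose names start with prefix are
    # treated as earlier outputs and skipped (an assumption of this sketch)
    with os.scandir(folder) as entries:
        names = sorted(
            e.name for e in entries
            if e.is_file() and e.name.endswith(".arw") and not e.name.startswith(prefix)
        )
    written = []
    for name in names:
        out_path = os.path.join(folder, prefix + name)
        with open(os.path.join(folder, name), "r") as src, open(out_path, "w") as dst:
            for line in src:
                dst.write(line.replace("/", "-"))
        written.append(out_path)
    return written
```

Unlike the loop above, this writes new files and leaves the originals untouched.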

Using variable as part of name of new file in python

I'm fairly new to python and I'm having an issue with my python script (split_fasta.py). Here is an example of my issue:
list = ["1.fasta", "2.fasta", "3.fasta"]
for file in list:
    contents = open(file, "r")
    for line in contents:
        if line[0] == ">":
            new_file = open(file + "_chromosome.fasta", "w")
            new_file.write(line)
I've left the bottom part of the program out because it's not needed. My issue is that when I run this program in the same directory as my fasta123 files, it works great:
python split_fasta.py *.fasta
But if I'm in a different directory and I want the program to output the new files (e.g. 1.fasta_chromosome.fasta) to my current directory, it doesn't:
python /home/bin/split_fasta.py /home/data/*.fasta
This still creates the new files in the same directory as the fasta files. The issue here I'm sure is with this line:
new_file = open(file + "_chromosome.fasta", "w")
Because if I change it to this:
new_file = open("seq" + "_chromosome.fasta", "w")
It creates an output file in my current directory.
I hope this makes sense to some of you and that I can get some suggestions.
You are giving the full path of the old file, plus a new name. So basically, if file == /home/data/something.fasta, the output file will be file + "_chromosome.fasta" which is /home/data/something.fasta_chromosome.fasta
If you use os.path.basename on file, you will get the name of the file (i.e. in my example, something.fasta)
From @Adam Smith
You can use os.path.splitext to get rid of the .fasta
basename, _ = os.path.splitext(os.path.basename(file))
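Putting the two calls together, a small sketch of building an output name with no directory part might look like this. The helper name output_name is hypothetical, introduced only for this example.

```python
import os


def output_name(path, suffix="_chromosome.fasta"):
    """Build an output filename (no directory part) from an input path."""
    # basename drops the directory; splitext then separates the extension
    stem, _ext = os.path.splitext(os.path.basename(path))
    return stem + suffix


print(output_name("/home/data/something.fasta"))  # something_chromosome.fasta
```

Since the result has no directory component, opening it for writing creates the file in the current working directory.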
Getting back to the code example, I saw several things that are not recommended in Python. I'll go into detail.
Avoid shadowing built-in names such as list, str, int... It is not explicit and can lead to issues later.
When opening a file for reading or writing, you should use the with syntax. This is highly recommended since it takes care of closing the file.
with open(filename, "r") as f:
    data = f.read()
with open(new_filename, "w") as f:
    f.write(data)
If line is ever an empty string, line[0] == ... raises an IndexError. Use line.startswith(...) instead.
Final code:
import os
files = ["1.fasta", "2.fasta", "3.fasta"]
for file in files:
    with open(file, "r") as infile:
        for line in infile:
            if line.startswith(">"):
                # splitext returns a (root, ext) tuple; keep only the root
                basename, _ = os.path.splitext(os.path.basename(file))
                new_name = basename + "_chromosome.fasta"
                with open(new_name, "w") as outfile:
                    outfile.write(line)
Often, people come to me and say "that's ugly". Not really :). The levels of indentation make it clear which context each statement belongs to.

Python for loop only executes once

I am currently working on some code that will look through multiple directories listed in an .ini file, then print and copy their files to a new directory. I ran into a problem where the for loop that prints the files only executes once when it is supposed to execute 5 times. How can I fix it so the for loop works every time it is called?
Code:
def copyFiles(path):
    rootPath = path
    print(rootPath)
    pattern = "*.wav"
    search = ""
    #searches the directories for the specified file type then prints the name
    for root, dirs, files in os.walk(rootPath):
        for filename in fnmatch.filter(files, pattern):
            print(filename)
def main():
    #opens the file containing all the directories
    in_file = open('wheretolook.ini', "r")
    #create the new directory all the files will be moved to
    createDirectory()
    #takes the path names one at a time and then passes them to copyFiles
    for pathName in in_file:
        copyFiles(pathName)
Output I get from running my code:
The output should have the 0 through 4 files under every directory.
Thank you for the help!
The pathName you get when iterating over the file has a newline character at the end for each line but the last. This is why you get the blank lines in your output after each path is printed.
You need to call strip on your paths to remove the newlines:
for pathName in in_file:
    copyFiles(pathName.strip())
You could be more restrictive and use rstrip('\n'), but I suspect getting rid of all leading and trailing whitespace is better anyway.
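A slightly more defensive version of the same idea, as a sketch: it strips each line and also skips blank ones, so an empty line in the file does not become an empty path. The helper name read_paths is hypothetical.

```python
def read_paths(list_file):
    """Return the non-empty lines of list_file with surrounding whitespace removed."""
    paths = []
    with open(list_file, "r") as in_file:
        for line in in_file:
            path = line.strip()  # removes the trailing '\n' and stray spaces
            if path:  # ignore blank lines
                paths.append(path)
    return paths
```

Each returned entry can then be passed to copyFiles as before, for example by looping over read_paths('wheretolook.ini').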

Issue finding a file in a list to be rewritten

Very new to Python and programming in general so apologies if I am missing anything straightforward.
I am trying to iterate through a directory and open the included .txt files and modify them with new content.
import os
def rootdir(x):
    for paths, dirs, files in os.walk(x):
        for filename in files:
            f=open(filename, 'r')
            lines=f.read()
            f.close()
            for line in lines:
                f=open(filename, 'w')
                newline='rewritten content here'
                f.write(newline)
                f.close()
    return x
rootdir("/Users/russellculver/documents/testfolder")
Is giving me: IOError: [Errno 2] No such file or directory: 'TestText1.rtf'
EDIT: I should clarify there IS a file named 'TestText1.rtf' in the folder specified in the function argument. It is the first one of three text files.
When I try moving where the file is closed / opened as seen below:
import os
def rootdir(x):
    for paths, dirs, files in os.walk(x):
        for filename in files:
            f=open(filename, 'r+')
            lines=f.read()
            for line in lines:
                newline='rewritten content here'
                f.write(newline)
                f.close()
    return x
rootdir("/Users/russellculver/documents/testfolder")
It gives me: ValueError: I/O operation on closed file
Thanks for any thoughts in advance.
@mescalinum Okay, so I've made amendments to what I've got based on everyone's assistance (thanks!), but it is still failing to write the text "newline" to any of the .txt files in the specified folder.
import os
x = raw_input("Enter the directory here: ")
def rootdir(x):
    for dirpaths, dirnames, files in os.walk(x):
        for filename in files:
            try:
                with open(os.dirpaths.join(filename, 'w')) as f:
                    f.write("newline")
                return x
            except:
                print "There are no files in the directory or the files cannot be opened!"
                return x
From https://docs.python.org/2/library/os.html#os.walk:
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
Also, f.close() should be outside for line in lines, otherwise you call it multiple times, and the second time you call it, f is already closed, and it will give that I/O error.
You should avoid explicitly open()ing and close()ing files, like:
f=open(filename, 'w')
f.write(newline)
f.close()
and instead use context managers (i.e. the with statement):
with open(filename, 'w') as f:
    f.write(newline)
which does exactly the same thing, but implicitly closes the file when the body of with is finished.
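A quick way to see this, as a small illustrative sketch (the function name write_text is made up for the example): the file object reports closed as soon as the with block exits, without any explicit close() call.

```python
def write_text(filename, text):
    """Write text to filename and return the file object after the with block."""
    with open(filename, "w") as f:
        f.write(text)
    # Leaving the with block has already closed the file
    return f
```

Checking the closed attribute of the returned object gives True, and any further write on it raises ValueError: I/O operation on closed file, the same error reported in the question.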
Here is the code that does as you asked:
import os
def rootdir(x):
    for paths, dirs, files in os.walk(x):
        for filename in files:
            try:
                # files holds bare names, so join each one with its directory
                f=open(os.path.join(paths, filename), 'w')
                f.write('new content here')
                f.close()
            except Exception as e:
                print("Could not open " + filename)
rootdir("/Users/xrisk/Desktop")
However, I have a feeling you don't quite understand what's happening here (no offence). First have a look at the documentation of os.walk provided by @mescalinum. The third tuple element files will contain only the file names. You need to combine each one with paths to get a full path to the file.
Also, you don't need to read the file first to write to it. On the other hand, if you want to append to the file, you should use the mode 'a' when opening the file.
In general, when reading/writing a file, you only close it after finishing all the read/writes. Otherwise you will get an exception.
Thanks @mescalinum
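Pulling the points from both answers together, here is a sketch in Python 3 syntax (the thread itself uses Python 2). The function name rewrite_txt_files and the restriction to .txt files are assumptions made for illustration, based on the question's stated goal.

```python
import os


def rewrite_txt_files(top, new_content):
    """Overwrite every .txt file under top with new_content; return the paths written."""
    written = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if not name.endswith(".txt"):
                continue
            # filenames holds bare names, so join each with its directory
            full_path = os.path.join(dirpath, name)
            try:
                with open(full_path, "w") as f:
                    f.write(new_content)
            except OSError as exc:
                print("Could not write {}: {}".format(full_path, exc))
            else:
                written.append(full_path)
    return written
```

Note that this replaces the existing contents of every matching file, so it is worth trying on a copy of the folder first.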
