Concatenating fasta files in folder into single file in python

I have multiple FASTA sequence files stored in a folder within my current working directory (called "Sequences") and am trying to combine all the sequences into a single file to run a MUSCLE multiple sequence alignment on.
This is what I have so far. It works up until output_fas.close(), where I get the error message FileNotFoundError: [Errno 2] No such file or directory: './Sequences'
Here is the code:
import os
os.getcwd() #current directory
DIR = input("\nInput folder path containing FASTA files to combine into one FASTA file: ")
os.chdir(DIR)
FILE_NAME = input("\nWhat would you like to name your output file (e.g. combo.fas)? Note: "
                  "Please add the .fas extension: ")
output_fas = open(FILE_NAME, 'w')
file_count = 0
for f in os.listdir(DIR):
    if f.endswith(( ".fasta")):
        file_count += 1
        fh = open(os.path.join(DIR, f))
        for line in fh:
            output_fas.write(line)
        fh.close()
output_fas.close()
print(str(file_count) + " FASTA files were merged into one file, which can be found here: " + DIR)
When I input the directory, I enter it as './Sequences', which successfully changes the directory.
Not quite sure what to do. I adjusted the code before and it successfully created the new file with all the sequences concatenated together; however, it ran continuously, would not end, and produced multiple repeats of each sequence.
Appreciate the help!

The error actually occurs before output_fas.close(): it is raised by the os.listdir(DIR) call. The problem is that DIR becomes meaningless as soon as you execute os.chdir(DIR). DIR was provided as a relative path, and once the working directory has changed, that path is resolved against the new directory. After os.chdir('./Sequences'), os.listdir('./Sequences') looks for Sequences/Sequences, which does not exist.
If you're going to use os.chdir(DIR), then never use DIR again, and just change your loop to:
# Use with statement for guaranteed deterministic close at end of block & to avoid need
# for explicit close
with open(FILE_NAME, 'w') as output_fas:
    file_count = 0
    for f in os.listdir():  # Remove DIR to list current directory
        if f.endswith(".fasta"):
            file_count += 1
            # Use a with for same reason as above
            with open(f) as fh:  # Don't join to DIR because f is already correctly in current directory
                output_fas.writelines(fh)  # writelines does the loop of write calls for you
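An alternative is to skip os.chdir() altogether and resolve the folder to an absolute path once. The following is a minimal sketch of that idea, not part of the original answer: the function name concat_fasta and its parameters are hypothetical. It also skips the output file if it happens to sit in the same folder with a matching extension, which would otherwise make the script read its own output.

```python
import os

def concat_fasta(src_dir, out_path, suffix=".fasta"):
    """Concatenate every file in src_dir ending with suffix into out_path.

    Hypothetical helper for illustration. Returns the number of files merged.
    """
    src_dir = os.path.abspath(src_dir)    # stays valid even if the cwd changes later
    out_path = os.path.abspath(out_path)
    merged = 0
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(src_dir)):  # sorted for a stable order
            path = os.path.join(src_dir, name)
            # Skip non-matching names and the output file itself.
            if not name.endswith(suffix) or path == out_path:
                continue
            with open(path) as fh:
                out.writelines(fh)
            merged += 1
    return merged
```

For the layout in the question, this could be called as concat_fasta("./Sequences", "combo.fas"). Note that if an input file does not end with a newline, its last line will run into the first header of the next file.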

Related

Is there a way to change your cwd in Python using a file as an input?

I have a Python program where I am calculating the number of files within different directories, but I wanted to know if it was possible to use a text file containing a list of different directory locations to change the cwd within my program?
Input: Would be a text file that has different folder locations that contains various files.
I have my program set up to return the total amount of files in a given folder location and return the amount to a count text file that will be located in each folder the program is called on.

You can use the os module in Python.
import os
# dirs will store the list of directories, populated from your text file
# (the paths in the file should be absolute, since the cwd changes below)
dirs = []
with open(your_text_file, "r") as text_file:
    for line in text_file:
        line = line.strip()  # drop the trailing newline, otherwise os.chdir() fails
        if line:
            dirs.append(line)
# Now simply loop over the dirs list
for directory in dirs:
    # Change directory
    os.chdir(directory)
    # Print cwd
    print(os.getcwd())
    # Print number of files in cwd (list the cwd itself, not `directory`)
    print(len([name for name in os.listdir()
               if os.path.isfile(name)]))

Yes.
import os
start_dir = os.getcwd()
with open(dir_index_file, "r") as indexfile:
    for targetdir in indexfile:
        os.chdir(targetdir.strip())  # strip the trailing newline first
        # Do your stuff here
        os.chdir(start_dir)  # go back so the next relative path still resolves
Do bear in mind that if your program dies half way through it'll leave you in a different working directory to the one you started in, which is confusing for users and can occasionally be dangerous (especially if they don't notice it's happened and start trying to delete files that they expect to be there - they might get the wrong file). You might want to consider if there's a way to achieve what you want without changing the working directory.
EDIT:
And to suggest the latter, rather than changing directory use os.listdir() to get the files in the directory of interest:
import os
with open(dir_index_file, "r") as indexfile:
    for targetdir in indexfile:
        targetdir = targetdir.strip()  # remove the trailing newline
        contents = os.listdir(targetdir)
        numfiles = len(contents)
        with open(os.path.join(targetdir, "count.txt"), "w") as countfile:
            countfile.write(str(numfiles))
Note that this will count files and directories, not just files. If you only want files then you'll have to go through the list returned by os.listdir checking whether each item is a file using os.path.isfile()
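As a rough sketch of that last point (the helper name count_files is made up for illustration), each name returned by os.listdir() has to be joined back to the directory before it is tested with os.path.isfile():

```python
import os

def count_files(directory):
    """Return the number of regular files directly inside directory."""
    return sum(
        1
        for name in os.listdir(directory)
        # listdir returns bare names, so join them to the directory first
        if os.path.isfile(os.path.join(directory, name))
    )
```

The result of count_files(targetdir) could then be written to count.txt in place of len(contents).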

How to open one folder at a time to access files

I have multiple folders in a common parent folder, say 'work'. Inside it are multiple sub-folders named 'sub01', 'sub02', etc. All the sub-folders contain the same files, e.g. mean.txt and sd.txt.
I have to combine the contents of all the 'mean.txt' files into a single file. I am stuck on how to open the sub-folders one by one. Thanks.
getting all files as a list
g = open("new_file", "a+")
for files in list:
    f = open(files, 'r')
    g.write(f.read())
    f.close()
g.close()
I don't know how to get a list of all the files in the sub-folders to make this work.
************EDIT*********************
Found a solution.
os.walk() helped, but there was a problem: it did not iterate over the folders in alphabetical order.
I had to sort the resulting list to put it in order.
import os
p = r"/Users/xxxxx/desktop/bianca_test/" # main_folder
list1 = []
for root, dirs, files in os.walk(p):
    if root[-12:] == 'native_space': #this was the sub_folder common in all parent folders
        for file in files:
            if file == "perfusion_calib_gm_mean.txt":
                list1.append(os.path.join(root, file))
list1.sort() # os.walk() does not visit folders in sorted order; this is to overcome that
f = open("gm_mean.txt", 'a+')
for item in list1:
    g = open(item, 'r')
    f.write(g.read())
    print("writing", item)
    g.close()
f.close()
Thanks to all who helped.

As I understand it, you want to collate all 'mean.txt' files into one file. This should do the job, but beware that os.walk() does not guarantee the order in which the files are visited. Note also that I'm using StringIO() to buffer all the data, since strings are immutable in Python and repeated concatenation would build a new string each time.
import os
from io import StringIO
def main():
    buffer = StringIO()
    for dirpath, dirnames, filenames in os.walk('.'):
        if 'mean.txt' in filenames:
            fp = os.path.join(dirpath, 'mean.txt')
            with open(fp) as f:
                buffer.write(f.read())
    all_file_contents = buffer.getvalue()
    print(all_file_contents)
if __name__ == '__main__':
    main()
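If the order matters, one option is to sort dirnames in place inside the loop: with the default top-down traversal, os.walk() descends into sub-directories in whatever order that list is left in. The sketch below assumes that approach; the function name collect_named_files is hypothetical.

```python
import os

def collect_named_files(top, target="mean.txt"):
    """Return full paths of every file called target under top, in sorted walk order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # in-place sort steers the order os.walk descends in
        if target in filenames:
            paths.append(os.path.join(dirpath, target))
    return paths
```

The returned list can then be looped over to write each file into the combined output, as in the code above.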

Here's some pseudocode to help you get started. Try to google, read and understand the solutions to get better as a programmer:
open mean_combined.txt to write mean.txt contents
open sd_combined.txt to write sd.txt contents
for every subdir inside my_dir:
    for every file inside subdir:
        if file.name is 'mean.txt':
            content = read mean.txt
            write content into mean_combined.txt
        if file.name is 'sd.txt':
            content = read sd.txt
            write content into sd_combined.txt
close mean_combined.txt
close sd_combined.txt
You need to look up how to:
open a file to read its contents (hint: use open)
iterate files inside directory (hint: use pathlib)
write a string into a file (hint: read Input and Output)
use context managers for releasing resources (hint: read with statement)
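For reference, one way the pseudocode could be turned into Python with pathlib is sketched below. It is only an illustration under assumptions: the function name combine_named_files and its parameters are invented here, and it handles one file name per call (so it would be called once for mean.txt and once for sd.txt).

```python
from pathlib import Path

def combine_named_files(parent, name, out_path):
    """Write the contents of every <parent>/<subdir>/<name> into out_path.

    Hypothetical helper for illustration. Returns the number of files combined.
    """
    parent = Path(parent)
    count = 0
    with open(out_path, "w") as out:  # context manager closes the file for us
        # Visit sub-directories in sorted order so the output is reproducible.
        for subdir in sorted(p for p in parent.iterdir() if p.is_dir()):
            source = subdir / name
            if source.is_file():
                out.write(source.read_text())
                count += 1
    return count
```

For example, combine_named_files("work", "mean.txt", "mean_combined.txt") would gather work/sub01/mean.txt, work/sub02/mean.txt, and so on.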

IOError: [Errno 2] No such file or directory: when the name was made by looping over existing files

I'm trying to have the bottom part of the code iterate over some files. These files should correspond to each other and are differentiated by a number, so the counter is there to change the number part of the file name.
The file names are generated by looking through the given files and selecting those containing certain strings in the name, then ordering them using the count.
This code works independently, in its own (lonely) folder, and prints the correct files in the correct order. However, when I use it in my main code, where file_1 and file_2 are referenced (the decoder and encoder parts of the code), I get the error in the title. There is no way there is any typo or that the files don't exist, because Python made these names itself based on existing file names.
import os
count = 201
while 205 > count:
    indir = 'absolute_path/models'
    for root, dirs, filenames in os.walk(indir):
        for f in filenames:
            if 'test-decoder' in f:
                if f.endswith(".model"):
                    if str(count) in f:
                        file_1 = f
                        print(file_1)
    indir = 'absolute_path/models'
    for root, dirs, filenames in os.walk(indir):
        for f in filenames:
            if 'test-encoder' in f:
                if f.endswith(".model"):
                    if str(count) in f:
                        file_2 = f
                        print(file_2)
    decoder1.load_state_dict(
        torch.load(open(file_1, 'rb')))
    encoder1.load_state_dict(
        torch.load(open(file_2, 'rb')))
    print(getBlueScore(encoder1, decoder1, pairs, src, tgt))
    print_every=10
    print(file_1 + file_2)
    count = count + 1
I then need to use these files two by two.

It's very possible that you are running into issues with variable scoping, but without being able to see your entire code it's hard to know for sure.
If you know what the model files should be called, might I suggest this code:
for i in range(201, 205):
    e = 'absolute_path/models/test_encoder_%d.model' % i
    d = 'absolute_path/models/test_decoder_%d.model' % i
    if os.path.exists(e) and os.path.exists(d):
        # load each state dict from its matching file
        decoder1.load_state_dict(torch.load(open(d, 'rb')))
        encoder1.load_state_dict(torch.load(open(e, 'rb')))
Instead of relying on the presence of substrings in a path name, which could lead to errors, this forces only the files you want to be opened. It also gets rid of any possible scoping issues.
We could clean it up a bit more but you get the idea.
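Judging only from the code shown in the question, another likely cause is that os.walk() yields bare file names: file_1 = f stores just the name, so open(file_1) is resolved against the current working directory rather than the models folder. Below is a minimal sketch of a lookup that keeps the directory part. The helper name find_model and its parameters are hypothetical, and the substring matching mirrors the question's approach.

```python
import os

def find_model(indir, kind, count, suffix=".model"):
    """Return the full path of the first matching model file under indir, or None."""
    for root, dirs, filenames in os.walk(indir):
        for f in sorted(filenames):
            if kind in f and str(count) in f and f.endswith(suffix):
                # Join with root so the path is valid from any working directory.
                return os.path.join(root, f)
    return None
```

In the question's loop this could be used as file_1 = find_model(indir, 'test-decoder', count), with a check for None before passing the path to open().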

Adding actual files to a list, instead of just the file's string name

I am having issues reading the contents of the files I am trying to open due to the fact that python believes there is:
"No such file or directory: 'Filename.xrf'"
Here is an outline of my code and what I think the problem may be:
The user's input defines the path to where the files are.
direct = str(raw_input("Enter directory name where your data is: "))
path = "/Users/myname/Desktop/heasoft/XRF_data/%s/40_keV" \
    %(direct)
print os.listdir(path)
# This lists the correct contents of the directory that I wanted it to.
So here I essentially let the user decide which directory they want to manipulate and then I choose one more directory path named "40_keV".
Within a defined function I use the OS module to navigate to the corresponding directory and then append every file within the 40_keV directory to a list, named dataFiles.
def Spectrumdivide():
    dataFiles = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.xrf'):
                dataFiles.append(file)
Here, the correct files were appended to the list 'dataFiles', but I think this may be where the problem is occurring. I'm not sure whether or not Python is adding the NAME of the file to my list instead of the actual file object.
The code breaks because python believes there is no such file or directory.
for filename in dataFiles:
    print filename
    f = open(filename,'r') # <- THE CODE BREAKS HERE
    print "Opening file: " + filename
    line_num = f.readlines()
Again, the correct file is printed from dataFiles[0] in the first iteration of the loop but then this common error is produced:
IOError: [Errno 2] No such file or directory: '40keV_1.xrf'
I'm using an Anaconda launcher to run Spyder (Python 2.7) and the files are text files containing two columns of equal length. The goal is to assign each column to a list and then manipulate them accordingly.

You need to append the full path, not just the file's name, using the os.path.join function.
for root, dirs, files in os.walk(path):
    for file in files:
        if file.endswith('.xrf'):
            dataFiles.append(os.path.join(root, file))

Reading all files in all directories [duplicate]

I have the code working to read in the values of a single text file but am having difficulties reading all files from all directories and putting all of the contents together.
Here is what I have:
filename = '*'
filesuffix = '*'
location = os.path.join('Test', filename + "." + filesuffix)
Document = filename
thedictionary = {}
with open(location) as f:
    file_contents = f.read().lower().split(' ') # split line on spaces to make a list
    for position, item in enumerate(file_contents):
        if item in thedictionary:
            thedictionary[item].append(position)
        else:
            thedictionary[item] = [position]
wordlist = (thedictionary, Document)
#print wordlist
#print thedictionary
Note that I am trying to use the wildcard * for the filename as well as for the filesuffix. I get the following error:
"IOError: [Errno 2] No such file or directory: 'Test/.'"
I am not sure if this is even the right way to do it but it seems that if I somehow get the wildcards working - it should work.
I have gotten this example to work: Python - reading files from directory file not found in subdirectory (which is there)
Which is a little different - but don't know how to update it to read all files. I am thinking that in this initial set of code:
previous_dir = os.getcwd()
os.chdir('testfilefolder')
#add something here?
for filename in os.listdir('.'):
I would need to add something where I have an outer for loop, but I don't quite know what to put in it.
Any thoughts?

Python doesn't support wildcards directly in filenames to the open() call. You'll need to use the glob module instead to load files from a single level of subdirectories, or use os.walk() to walk an arbitrary directory structure.
Opening all text files in all subdirectories, one level deep:
import glob
import os
for filename in glob.iglob(os.path.join('Test', '*', '*.txt')):
    with open(filename) as f:
        # one file open, handle it, next loop will present you with a new file.
        contents = f.read()
Opening all text files in an arbitrary nesting of directories:
import os
import fnmatch
for dirpath, dirs, files in os.walk('Test'):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            # one file open, handle it, next loop will present you with a new file.
            contents = f.read()
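On Python 3.5 and later, the glob module can also walk arbitrary nesting itself via the '**' pattern with recursive=True. The sketch below assumes a Python 3 interpreter (the question's code is Python 2), and the function name read_all_text_files is made up for illustration.

```python
import glob
import os

def read_all_text_files(top, pattern="*.txt"):
    """Map each matching path under top (at any depth) to its contents."""
    contents = {}
    # '**' matches zero or more directory levels when recursive=True.
    for path in glob.iglob(os.path.join(top, "**", pattern), recursive=True):
        if os.path.isfile(path):
            with open(path) as f:
                contents[path] = f.read()
    return contents
```

Calling read_all_text_files('Test') would cover files directly in Test as well as those in any sub-directory.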
