Error in converting multiple FASTA files to Nexus using Biopython - python

I want to convert multiple FASTA format files (DNA sequences) to the NEXUS format using the Bio.SeqIO module, but I get this error:
Traceback (most recent call last):
  File "fasta2nexus.py", line 28, in <module>
    print(process(fullpath))
  File "fasta2nexus.py", line 23, in process
    alphabet=IUPAC.ambiguous_dna)
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 1003, in convert
    with as_handle(in_file, in_mode) as in_handle:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/Library/Python/2.7/site-packages/Bio/File.py", line 88, in as_handle
    with open(handleish, mode, **kwargs) as fp:
IOError: [Errno 2] No such file or directory: 'c'
What am I missing?
Here is my code:
#!/usr/bin/env python
from __future__ import print_function # or just use Python 3!
import fileinput
import os
import re
import sys
from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC
test = "/Users/teton/Desktop/test"
files = os.listdir(os.curdir)
def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0]
    return SeqIO.convert(filename, "fasta",
                         base + ".nex", "nexus",
                         alphabet=IUPAC.ambiguous_dna)
for files in os.listdir(test):
    for file in files:
        fullpath = os.path.join(file)
        print(process(fullpath))

This code should solve the majority of problems I can see.
from __future__ import print_function # or just use Python 3!
import fileinput
import os
import re
import sys
from Bio import SeqIO, Nexus
from Bio.Alphabet import IUPAC
test = "/Users/teton/Desktop"
def process(filename):
    # returns ("basename", "extension"), so [0] picks "basename"
    base = os.path.splitext(filename)[0]
    return SeqIO.convert(filename, "fasta",
                         base + ".nex", "nexus",
                         alphabet=IUPAC.ambiguous_dna)
for root, dirs, files in os.walk(test):
    for file in files:
        fullpath = os.path.join(root, file)
        print(process(fullpath))
I changed a few things. First, I ordered your imports (personal thing) and made sure to import IUPAC from Bio.Alphabet so you can actually assign the correct alphabet to your sequences. Next, in your process() function, I added a line to split the extension off the filename, then used the full filename for the first argument, and just the base (without the extension) for naming the Nexus output file. Speaking of which, I assume you'll be using the Nexus module in later code? If not, you should remove it from the imports.
I wasn't sure what the point of the last snippet was, so I didn't include it. In it, though, you appear to be walking the file tree and process()ing each file again, then referencing some undefined variable named count. Instead, just run process() once, and do whatever count refers to within that loop.
You may want to consider adding some logic to your for loop to test that the file returned by os.path.join() actually is a FASTA file. Otherwise, if any other file type is in one of the directories you search and you process() it, all sorts of weird things could happen.
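One way to add that check is to filter on the file extension before calling process(). The following is only a minimal sketch: the helper name find_fasta_files and the list of extensions are assumptions made for illustration, not something Biopython provides.

```python
import os

# Assumed set of extensions that mark a FASTA file; adjust to your data.
FASTA_EXTENSIONS = (".fasta", ".fa", ".fas", ".fna")


def find_fasta_files(top):
    """Yield the full path of every file under `top` with a FASTA extension."""
    for root, dirs, files in os.walk(top):
        for name in sorted(files):
            # str.endswith accepts a tuple of suffixes
            if name.lower().endswith(FASTA_EXTENSIONS):
                yield os.path.join(root, name)
```

Each yielded path could then be handed to process(), for example with a loop such as for path in find_fasta_files(test): print(process(path)).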
EDIT
OK, based on your new code I have a few suggestions. First, the line
files = os.listdir(os.curdir)
is unnecessary, because the for loop below the definition of process() reassigns the files variable. The call itself is valid, though: os.curdir is a string constant ('.'), not a function, so os.listdir(os.curdir) simply lists the current working directory.
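The code at the bottom should simply be this:
for file in os.listdir(test):
    print(process(os.path.join(test, file)))
The inner loop for file in files iterates over the characters of each filename, which is why the traceback reports a missing file named 'c'. os.listdir() returns bare names, so each one has to be joined to test to form a path that can be opened; calling os.path.join() with a single argument just returns that argument unchanged.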
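The 'c' in the traceback comes from the nested loops: iterating over a string yields its characters one at a time. A small sketch of the buggy and the corrected pattern, using a made-up list of filenames in place of os.listdir(test):

```python
import os

# Hypothetical stand-in for the result of os.listdir(test)
names = ["chr1.fasta", "chr2.fasta"]

# Buggy pattern: the inner loop walks over the characters of each filename.
buggy = []
for files in names:
    for file in files:
        buggy.append(os.path.join(file))

# Corrected pattern: a single loop that joins the directory to each name.
test = "/Users/teton/Desktop/test"
fixed = [os.path.join(test, name) for name in names]

print(buggy[0])  # prints c, the bogus "filename" reported in the IOError
print(fixed[0])  # full path to chr1.fasta
```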

NameError
You imported SeqIO but are calling seqIO.convert(). Python is case-sensitive. The line should read:
return SeqIO.convert(filename + '.fa', "fasta", filename + '.nex', "nexus", alphabet=IUPAC.ambiguous_dna)
IOError: for files in os.walk(test):
IOError is raised when a file cannot be opened. It often arises because the filename and/or file path provided does not exist.
os.walk(test) iterates through all subdirectories in the path test. During each iteration, files will be a tuple of 3 elements. The first element is the path of the directory, the second element is a list of subdirectories in that path, and the third element is a list of files in that path. You should be passing a filename to process(), but you are passing that whole tuple in process(files).
You have implemented it correctly in this block for root, dirs, files in os.walk(test):. I suggest you implement it similarly in the for loop below.
You are adding .fa to your filename. Don't add .fa.
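To see what os.walk() actually yields, here is a short self-contained sketch. It builds a throwaway directory tree with tempfile, so the file names in it are purely illustrative.

```python
import os
import tempfile

# Build a small temporary tree: top/seq1.fa and top/sub/seq2.fa
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
open(os.path.join(top, "seq1.fa"), "w").close()
open(os.path.join(top, "sub", "seq2.fa"), "w").close()

# Without unpacking, each item from os.walk is a 3-tuple, not a filename.
first = next(os.walk(top))
print(type(first).__name__, len(first))  # tuple 3

# With unpacking, each file can be joined to the directory it lives in.
paths = []
for root, dirs, files in os.walk(top):
    for name in files:
        paths.append(os.path.join(root, name))
print(sorted(paths))
```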

Related

What's the explanation for this behaviour?

from pathlib import Path
file = Path(r"C:\Users\SerT\Desktop\a.txt")
print (file.name)
file.rename(file.with_name("b.txt"))
print (file.name)
I'd like to know why file.name prints "a.txt" in both instances, even though the file actually gets renamed in Windows Explorer.
The file is being renamed, but the original Path object (file in your case) is not itself changed.
Since Path.rename() returns a new Path object (as of Python 3.8), to get the result you're expecting, do:
file = file.rename(file.with_name("b.txt"))
print(file.name)
.rename() doesn't modify the file object; it simply returns the new file path. If you want file to refer to the renamed file, assign the return value of file.rename() back to file:
from pathlib import Path
file = Path(r"C:\Users\SerT\Desktop\a.txt")
print(file.name)
file = file.rename(file.with_name("b.txt"))
print(file.name)
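The same behaviour can be checked without touching the Desktop. The sketch below works in a temporary directory (an assumption made so it runs anywhere) and keeps the target Path in its own variable, which also works on Python versions older than 3.8, where rename() returns None.

```python
import tempfile
from pathlib import Path

# Work in a temporary directory instead of the Desktop path from the question.
folder = Path(tempfile.mkdtemp())
file = folder / "a.txt"
file.write_text("hello")

target = file.with_name("b.txt")
file.rename(target)  # renames on disk; the `file` object itself is unchanged

print(file.name, file.exists())      # a.txt False
print(target.name, target.exists())  # b.txt True
```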

I'm making a script to generate folders and move images to them in Python on Windows 10 using shutil, but it's throwing the errors below

I'm getting these errors:
Traceback (most recent call last):
  File "file_mover.py", line 41, in <module>
    shutil.copy(f, os.path.join(path_to_export_two, Path("/image/")))
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\shutil.py", line 245, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '8ballshake1#8x.png'
When executing this code:
# Making the substrings is more difficult then anticipated prolly just use Java tbh
# LOWERCASE, LOWERCASE THE FOLDERS
import shutil
import os
from pathlib import Path
assets_path = Path("/Users/Jackson Clark/Desktop/uploads")
export_path = Path("/Users/Jackson Clark/Desktop/uploads")
source = os.listdir(assets_path)
"""
NOTE: Filters.js is the important file
The logic:
- Go through each file in the assets_path directory
- Rename the files to start with RoCode (this could be a seperate script)
- Create a new directory with the first four characters of the files name
- Create two sub directories with the names 'image' and 'thumb'
- Copy the file to both the 'image' and 'thumb' directories
That should be all, but who knows tbh
"""
"""
Good links:
https://www.pythonforbeginners.com/os/python-the-shutil-module
https://stackabuse.com/creating-and-deleting-directories-with-python/
"""
for f in source:
    f_string = str(f)
    folder_one_name = f_string[0:2]
    folder_two_name = f_string[2:4]
    path_to_export_one = os.path.join(export_path, folder_one_name)
    path_to_export_two = os.path.join(export_path, folder_one_name, folder_two_name)
    os.mkdir(path_to_export_one)
    os.mkdir(path_to_export_two)
    os.mkdir(os.path.join(path_to_export_two, Path("/image/")))
    os.mkdir(os.path.join(path_to_export_two, Path("/thumb/")))
    shutil.copy(f, os.path.join(path_to_export_two, Path("/image/")))
    shutil.copy(f, os.path.join(path_to_export_two, Path("/thumb/")))
I simply need the code to generate two folders, the first being named the first two characters of the file that the script is reading, and the second folder (which is a subfolder of the first) to be named the 3rd and 4th characters of the filename that the script is reading. Alas, I'm getting the errors above.
Get rid of the leading slashes in your Paths. These are causing the paths to be truncated:
>>> print(os.path.join("some_folder", Path("/image/")))
\image
>>> print(os.path.join("some_folder", Path("image")))
some_folder\image
The relevant sentence of the os.path.join documentation is as follows:
If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.
The leading slash in /image/ causes this path component to become absolute and hence the previous components (in my case, just "some_folder") are discarded.
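The truncation can be reproduced on any platform by calling the platform-specific modules directly: ntpath is the implementation behind os.path on Windows, posixpath the one on Linux and macOS. A small sketch:

```python
import ntpath
import posixpath

# Windows rules: a component starting with a separator discards what came before.
print(ntpath.join("some_folder", "\\image"))     # \image
print(ntpath.join("some_folder", "image"))       # some_folder\image

# POSIX rules behave the same way for a leading slash.
print(posixpath.join("some_folder", "/image/"))  # /image/
print(posixpath.join("some_folder", "image"))    # some_folder/image
```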
Also, I don't see why you're creating a Path from a string when you can just use a string directly:
>>> print(os.path.join("some_folder", "image"))
some_folder\image
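Putting the pieces together, the loop from the question might be restructured roughly as follows. This is only a sketch: the function name sort_into_folders is made up, it assumes Python 3.6+ (so that shutil.copy accepts Path objects), and it also joins each filename to the source directory, because os.listdir() returns bare names and the traceback shows open() failing on exactly such a bare name.

```python
import shutil
from pathlib import Path


def sort_into_folders(assets_path, export_path):
    """Copy each file into <export>/<chars 0-1>/<chars 2-3>/{image,thumb}/."""
    assets_path = Path(assets_path)
    export_path = Path(export_path)
    # Snapshot the file list first, since the new folders may be created
    # inside the same directory that is being read.
    sources = [p for p in assets_path.iterdir() if p.is_file()]
    for src in sources:
        base = export_path / src.name[0:2] / src.name[2:4]
        for sub in ("image", "thumb"):
            target_dir = base / sub  # relative component, no leading slash
            # parents/exist_ok avoid errors when two files share a prefix
            target_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, target_dir)  # src is a full path, not a bare name
```

Using mkdir(parents=True, exist_ok=True) instead of os.mkdir() is a design choice here: several files are likely to share the same first four characters, and a plain os.mkdir() would raise FileExistsError on the second one.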

reading files from a folder using os module

For a pattern recognition application, I want to read and operate on JPEG files from another folder using the os module.
I tried to use str(file) and file.encode('latin-1'), but both give me errors.
I tried:
allLines = []
path = 'results/'
fileList = os.listdir(path)
for file in fileList:
    file = open(os.path.join('results/'+ str(file.encode('latin-1'))), 'r')
    allLines.append(file.read())
print(allLines)
but I get an error saying:
No such file or directory "results/b'thefilename"
when I expect a list with the desired file names that are accessible
If you can use Python 3.4 or newer, you can use the pathlib module to handle the paths.
from pathlib import Path
all_lines = []
path = Path('results/')
for file in path.iterdir():
    with file.open() as f:
        all_lines.append(f.read())
print(all_lines)
By using the with statement, you don't have to close the file by hand (which your code currently never does); the file is closed even if an exception is raised at some point.
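Since the question mentions JPEG files, which are binary, reading them as bytes rather than text is probably what is wanted. A possible variant, offered as a sketch only (the helper name read_all_bytes is an assumption):

```python
from pathlib import Path


def read_all_bytes(folder):
    """Return {filename: raw bytes} for every regular file in `folder`."""
    contents = {}
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_file():  # skip subdirectories
            # read_bytes() opens, reads and closes the file in one call
            contents[entry.name] = entry.read_bytes()
    return contents
```

The resulting bytes would then be passed to whatever image library the application uses for decoding.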

Copying thousands of files (filtered by name) to a specified folder

I'm trying to run two operations:
1) Starting from a .txt file containing some IDs (which lead to a filename), checking if that file is within a folder;
2) If step 1) is true, copying the file from that folder to a specified folder.
The .txt file stores codes like these:
111081
112054
112051
112064
This is what I have tried:
from glob import glob
from shutil import copyfile
import os
input = 'C:/Users/xxxx/ids.txt'
input_folder = 'C:/Users/xxxx/input/'
dest_folder = 'C:/Users/xxxx/output/'
with open(input) as f:
    for line in f:
        string = "fixed_prefix_" + str(line.strip()) + '.asc'
        if os.path.isfile(string):
            copyfile(string, dest_folder)
The string variable generates this (for example):
print string
fixed_prefix_111081.asc
Then, I'm sure there is something else wrong with the searching and the copying of the file to the destination folder. The main problem is that I don't know how to search for the fixed_prefix_111081.asc file in the input_folder.
copyfile expects a filename as its destination, so passing an existing directory does not work. shutil.copy handles both cases (target directory or target file).
The input file also seems to be passed without its path. Unless the current working directory is input_folder, os.path.isfile will always return False, so you have to build the full filename.
My fix proposal:
from shutil import copy
with open(input) as f:
    for line in f:
        string = "fixed_prefix_{}.asc".format(line.strip())
        # build the full path so the file is looked up in input_folder
        fp_string = os.path.join(input_folder, string)
        if os.path.isfile(fp_string):
            copy(fp_string, dest_folder)
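The difference between copyfile and copy can be seen in isolation with a small sketch. It assumes Python 3 (where shutil.copy returns the destination path) and uses temporary directories in place of the real input and output folders.

```python
import os
import shutil
import tempfile

# Temporary stand-ins for input_folder and dest_folder
src_dir = tempfile.mkdtemp()
dest_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "fixed_prefix_111081.asc")
with open(src, "w") as f:
    f.write("data")

# copyfile needs a full destination filename; a directory raises an OSError subclass.
try:
    shutil.copyfile(src, dest_dir)
except OSError as exc:
    print("copyfile failed:", type(exc).__name__)

# copy accepts a directory and keeps the base name of the source file.
copied_to = shutil.copy(src, dest_dir)
print(os.path.basename(copied_to))  # fixed_prefix_111081.asc
```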

Replace IP addresses with filename in Python for all files in a directory

If I knew the first thing about Python, I'd have figured this out myself just by referring to other similar questions that were already answered.
With that out of the way, I'm hoping you can help me achieve the following:
I'm looking to replace all occurrences of IP addresses with the file name itself, in a directory, inline.
Let's say all my files are in D:\super\duper\directory\
Files don't have any extension, i.e., a sample file name will be "jb-nnnn-xy".
Even if there are multiple mentions of IP address in the file, I'm interested in replacing only the line that looks like this (without quotes):
" TCPHOST = 72.163.363.25"
So overall, there are thousands of files in the directory, of which only few have hard-coded IP addresses.
And the line of interest should finally look like this:
" TCPHOST = jb-yyyy-nz"
where "jb-yyyy-nz" is the name of the file itself
Thank you very much for your time and help!
EDIT: Just a mish mash of code from other posts that I'm trying out..
from __future__ import print_function
import fnmatch
import os
from fileinput import FileInput
import re
ip_addr_regex = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')
def find_replace(topdir, text):
    for dirpath, dirs, files in os.walk(topdir, topdown=True):
        files = [os.path.join(dirpath, filename) for filename in files]
        for line in FileInput(files, inplace=True):
            print(line.replace(text, str(filename)))
find_replace(r"D:\testmulefew",ip_addr_regex)
Please check the below code with comments inline:
import os
import re
import fileinput
# Get the file list from the directory
file_list = [f for f in os.listdir("C:\\Users\\dinesh_pundkar\\Desktop\\demo")]
# Change to the directory whose files are to be checked and modified
os.chdir("C:\\Users\\dinesh_pundkar\\Desktop\\demo")
# FileInput takes file_list as input; with inplace=True, print() writes into the file
with fileinput.input(files=file_list,inplace=True) as f:
    # Read the files line by line
    for line in f:
        d=''
        # Find the line with TCPHOST
        a = re.findall(r'TCPHOST\s*=\s*\d+\.',line.strip())
        if len(a) > 0:
            # If found, update the line with the name of the current file
            d = 'TCPHOST = '+str(fileinput.filename())
            print (d)
        else:
            # Otherwise keep the line as it is
            print (line.strip())
P.S.: This assumes the directory contains only files and no subdirectories. Otherwise, the files need to be listed recursively.
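Note that line.strip() above also removes the leading whitespace of every line. If the indentation of the TCPHOST line (and of all other lines) has to be preserved, a variant along the following lines could be used. This is a sketch, not a drop-in replacement: the function name replace_tcphost and the regular expression are assumptions, and it rewrites each file in full rather than using fileinput.

```python
import os
import re

# Captures everything up to the address so it can be written back unchanged.
TCPHOST_RE = re.compile(r'^(\s*TCPHOST\s*=\s*)(?:\d{1,3}\.){3}\d{1,3}')


def replace_tcphost(directory):
    """Replace the IP on TCPHOST lines with the file's own name; return changed names."""
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):  # skip subdirectories
            continue
        with open(path) as f:
            lines = f.readlines()
        # A function is used as the replacement so the filename is inserted literally.
        new_lines = [TCPHOST_RE.sub(lambda m: m.group(1) + name, line)
                     for line in lines]
        if new_lines != lines:  # only rewrite files that actually changed
            with open(path, "w") as f:
                f.writelines(new_lines)
            changed.append(name)
    return changed
```

Because only files with a matching line are rewritten, the thousands of files without a hard-coded address are left untouched on disk.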
