I am trying to take a folder containing 9 files, each holding the FASTA records for a separate gene, and remove duplicate records. I want the script to be called with the folder that contains the genes as the first parameter and, as the second, the name of a new folder to write the de-duplicated files to. However, when the files are stored in a folder called results within the current directory, the script cannot open any of the gene files in that folder to process them for duplicates. From what I have read, I should be able to call Python's open() function with the file name as a string, like this:
input_handle = open(f, "r")
This line is not allowing me to open the file to read its contents. I think it may have something to do with the type of f, which shows as type 'str' when I call type(f).
Also, if I use the full path:
input_handle = open('~/Documents/Research/Scala/hiv-biojava-scala/results/rev.fa', "r")
It says that no such file exists. I have checked my spelling and I am sure that the file does exist. I also get a "file does not exist" error if I pass the path as a raw string:
input_handle = open(r'~/Documents/Research/Scala/hiv-biojava-scala/results/rev.fa', "r")
And if I call it as follows, it complains that there is no global named results:
input_handle = open(os.path.join(os.curdir,results/f), "r")
Here is the full code. If anybody knows what the problem is I would really appreciate any help that you could offer.
#!/usr/bin/python
import os
import os.path
import sys
import re
from Bio import SeqIO
def processFiles(files) :
    for f in files:
        process(f)
def process(f):
    input_handle = open(f, "r")
    records = list(SeqIO.parse(input_handle, "fasta"))
    print records
    i = 0
    while i < len(records)-1:
        temp = records[i]
        next = records[i+1]
        if (next.id == temp.id) :
            print "duplicate found at " + next.id
            if (len(next.seq) < len(temp.seq)) :
                records.pop(i+1)
            else :
                records.pop(i)
        i = i + 1
    output_handle = open("out.fa", "w")
    for record in records:
        SeqIO.write(records, output_handle, "fasta")
    input_handle.close()
def main():
    input_folder = sys.argv[1]
    out_folder = sys.argv[2]
    if os.path.exists(out_folder):
        print("Folder %s exists; please specify empty folder or new one" % out_folder)
        sys.exit(1)
    os.makedirs(out_folder)
    files = os.listdir(input_folder)
    print files
    processFiles(files)
main()
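Try input_handle = open(os.path.join(input_folder, f), "r"). os.listdir() returns bare file names, not paths, so open(f) looks for each file in the current directory instead of in the folder you listed. For the results folder specifically, open(os.path.join(os.getcwd(), "results", f), "r") also works: note that os.getcwd has to be called, that results must be a quoted string (unquoted, Python treats it as a variable name), and that os.curdir just returns ".". Your full-path attempts fail for a different reason: open() does not expand ~, so pass such paths through os.path.expanduser() first. See mail.python.org/pipermail/python-list/2012-September/631864.html.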
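A minimal sketch of the whole script in Python 3, assuming Biopython is not available: the small read_fasta() reader below stands in for SeqIO.parse and is a simplification (it takes the first word of the header as the ID, which is also what SeqIO uses as record.id). The key fix is joining each name from os.listdir() onto input_folder and expanding ~ explicitly. Duplicates are resolved by keeping the longest sequence per ID (the first one on ties), which also catches duplicates that are not adjacent, and each output file goes into out_folder under the input file's name. read_fasta and dedupe_folder are illustrative names.

```python
import os


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file (simplified parser)."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_folder(input_folder, out_folder):
    """Write a copy of every file in input_folder to out_folder, one record per ID."""
    # open() does not expand "~", so expand it explicitly.
    input_folder = os.path.expanduser(input_folder)
    out_folder = os.path.expanduser(out_folder)
    os.makedirs(out_folder)  # raises FileExistsError if the folder already exists
    for name in sorted(os.listdir(input_folder)):
        # os.listdir() returns bare names, so join each one onto its folder.
        in_path = os.path.join(input_folder, name)
        if not os.path.isfile(in_path):
            continue
        best = {}  # record ID -> (header, sequence); keeps the longest sequence
        for header, seq in read_fasta(in_path):
            record_id = (header.split() or [""])[0]
            if record_id in best:
                print("duplicate found at " + record_id)
            if record_id not in best or len(seq) > len(best[record_id][1]):
                best[record_id] = (header, seq)
        with open(os.path.join(out_folder, name), "w") as out:
            for header, seq in best.values():
                out.write(">" + header + "\n" + seq + "\n")
```

In the original script, main() would then call dedupe_folder(sys.argv[1], sys.argv[2]). Sequences are written on a single line each; if line-wrapped FASTA is needed, SeqIO.write() remains the better choice.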
I'm trying to copy files from directory A to directory B, based on a txt file (located in directory B) containing the list of files to be extracted. I referred to this code: How to extract files from a particular folder with filename stored in a python list?
But it never seems to enter the if block (where I have put the 'in here' printout). Could someone tell me what I am doing wrong?
This is the code:
import os
import shutil
def read_input_file():
    my_file = open("/mnt/d/Downloads/TSU/remaining_files_noUSD_19Jan.txt", "r")
    # reading the file
    data = my_file.read()
    data_into_list = data.split("\n")
    #print(data_into_list)
    my_file.close()
    return data_into_list
def filter_data(list_of_files):
    path="/mnt/e/Toyota Smarthome/Untrimmed/Videos_mp4"
    path_to_be_moved="/mnt/d/Downloads/TSU"
    #print(list_of_files)
    for file in os.listdir(path):
        #print(file)
        if file in list_of_files:
            print("in here")
            print(file)
            shutil.copytree(path,path_to_be_moved)
            #os.system("mv "+path+file+" "+path_to_be_moved)
if __name__ == "__main__":
    list = read_input_file()
    filter_data(list)
I am using Python 3 via WSL.
The mp4 folder contains multiple videos, and the output of read_input_file is as follows:
Thank you!
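I think shutil.copytree has a different purpose: it copies an entire directory tree, not a single file, so it isn't what you want here. I'd use shutil.move to handle one file at a time (or shutil.copy2 if the original should stay in directory A):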
import os
import shutil
def read_input_file():
    my_file = open("list.txt", "r")
    data = my_file.read()
    data_into_list = data.split("\n")
    my_file.close()
    return data_into_list
def filter_data(list_of_files):
    path="directoryA/"
    path_to_be_moved="directoryB/"
    for file in os.listdir(path):
        # print(file)
        if file in list_of_files:
            print("in here")
            print(file)
            shutil.move(path+file,path_to_be_moved+file)
mylist = read_input_file()
filter_data(mylist)
Just saw your update. Be careful: data_into_list = data.split("\n") is for a list file with one name per line. Yours is separated by a comma and a space, so you'll have to change that, e.g. to data.split(", ").
Also, you shouldn't use list as a variable name (mylist, for example, is better), because it shadows the built-in list(), which is used to create lists.
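A sketch of the comma-separated case, assuming the list file holds names like a.mp4, b.mp4 (the asker's actual output is not shown, so that format is an assumption). It strips the whitespace around each name, accepts newlines as separators too, and uses shutil.copy2 so the originals stay in the source directory; swap in shutil.move if they should be moved instead. read_name_list and copy_listed_files are hypothetical helper names.

```python
import os
import shutil


def read_name_list(list_path):
    """Return the set of file names listed in a comma-separated text file."""
    with open(list_path) as fh:
        text = fh.read()
    # Treat newlines like commas, then drop surrounding whitespace and blanks.
    parts = text.replace("\n", ",").split(",")
    return {part.strip() for part in parts if part.strip()}


def copy_listed_files(src_dir, dst_dir, list_path):
    """Copy every file in src_dir whose name appears in the list file."""
    wanted = read_name_list(list_path)
    copied = []
    for name in os.listdir(src_dir):
        if name in wanted:
            # copy2 copies a single file (with metadata); copytree is for whole trees.
            shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, name))
            copied.append(name)
    return sorted(copied)
```

Using a set for the wanted names makes each membership test constant-time, which helps when the video folder is large.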
I am trying to build a script which can look for all files in a certain folder, and pull any lines of text that contain a key word or phrase.
I'm very new to Python and don't really understand how to piece together the multiple suggestions I have seen from others.
import re
from glob import glob
search = []
linenum = 0
pattern = re.compile("Dawg", re.IGNORECASE) # Compile a case-insensitive regex
path = 'C:\\Users\\Username\\Downloads\Testdataextraction\\Throw it in\\Audit_2022.log'
filenames = glob('*.log')
print(f"\n{filenames}")
with open (path, 'rt') as myfile:
    for line in myfile:
        linenum += 1
        if pattern.search(line) != None: # If a match is found
            search.append((linenum, line.rstrip('\n')))
for x in search: # Iterate over the list of tuples
    print("\nLine " + str(x[0]) + ": " + x[1])
This does everything exactly how I want it, except that it can only search one file at a time.
My issue arises when I delete 'Audit_2022.log' from the end of the path = line.
Python says "PermissionError: [Errno 13] Permission denied: 'C:\Users\Username\Downloads\Testdataextraction\Throw it in'". I assume this is because it's looking at a directory and not a file, but how can I get it to read multiple files?
Many thanks in advance!
Assuming you also need to show the filename(s) you could do this:
import re
from glob import glob
import os
p = re.compile('Dawg', re.IGNORECASE)
path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
for file in glob(os.path.join(path, '*.log')):
    with open(file) as logfile:
        for i, line in enumerate(map(str.strip, logfile), 1):
            if p.search(line) is not None:
                print(f'File={file}, Line={i}, Data={line}')
The reason you're getting that exception is that open() needs the path of a file; if you give it the path of a directory, it can't open it for reading and raises an error (on Windows this shows up as PermissionError: [Errno 13]). A minimal example could be:
path = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in\\Audit_2022.log'
with open(path, 'rt') as f:
    pass
If the file exists, this should run fine, but if you change it to:
path = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in'
with open(path, 'rt') as f:
    pass
Then this will throw the exception.
I suspect what you're trying to do is glob through all log files in path and try each one, so something like:
import os
from glob import glob
path = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in'
filenames = glob(os.path.join(path, '*.log'))
print(f"\n{filenames}")
for filename in filenames:
    with open(filename, 'rt') as myfile:
        ...
You can use glob() (or os.listdir()) to get all the files in the directory, then nest your existing opening loop inside a loop over those files:
import os
from glob import glob
folder = 'C:\\Users\\Username\\Downloads\\Testdataextraction\\Throw it in'
# pattern and search are defined as in the question
for file in glob(os.path.join(folder, '*.log')):
    linenum = 0  # restart the line count for each file
    with open(file, 'rt') as myfile:
        for line in myfile:
            linenum += 1
            if pattern.search(line):  # match anywhere in the line, not only at the start
                search.append((linenum, line.rstrip('\n')))
See the os.path.join() documentation: it is a more reliable way to build paths than gluing strings together with backslashes.
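For comparison, a pathlib sketch that keeps the question's case-insensitive "Dawg" search but restarts the line count for each file and records which file each match came from. search_logs is an illustrative name; Path.glob('*.log') only looks at the top level of the folder (Path.rglob would include subfolders), and re.escape() is used so the keyword is matched literally.

```python
import re
from pathlib import Path


def search_logs(folder, word):
    """Return (file name, line number, line) for every line containing word."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    matches = []
    for log_path in sorted(Path(folder).glob("*.log")):
        with log_path.open("rt") as logfile:
            # enumerate(..., 1) restarts the numbering at 1 for each file.
            for linenum, line in enumerate(logfile, 1):
                if pattern.search(line):
                    matches.append((log_path.name, linenum, line.rstrip("\n")))
    return matches
```

Usage would look like search_logs(r'C:\Users\Username\Downloads\Testdataextraction\Throw it in', 'Dawg'), then printing each tuple as in the question's final loop.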
I am writing code that looks for a specific string that could occur in a vacancy. I have some vacancies in a local folder in .txt format. I want to search all the files for the string and then write the names of the files that contain it to a new file.
I created a list of all the vacancy files, excluding the keyword file, because I do not want the new output file to be searched when I search for a string a second time.
I have the following code:
import os
import glob
from datetime import date
import time
today = date.today()
d1 = today.strftime("%d/%m/%Y")
search = "engineer"
vacancies = [filename for filename in glob.glob("*.txt") if not filename.endswith('keyword.txt')]
for filename in vacancies:
    with open ("C:\\LOCATION\\txt"+'\\'+filename, 'r', encoding='utf-8') as f:
        for line in f:
            line = line.lower
for filename in vacancies:
    with open ("C:\\LOCATION\\txt"+'\\'+filename) as f:
        if search in f.read():
            f = open(d1 + " " + search + " keyword.txt","a+", encoding='utf-8')
            f.write("Found it in " + filename + '\n')
            f.close
I would expect my list to filter out the new file; however, I get an error message stating: FileNotFoundError: [Errno 2] No such file or directory: '05/11/2020 engineer keyword.txt'
Can someone explain why this code is not working?
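The FileNotFoundError was caused by the slashes in the d1 variable. A slash is a path separator, so '05/11/2020 engineer keyword.txt' is treated as a file called '2020 engineer keyword.txt' inside nested folders 05/11 that don't exist, and the file cannot be created. After replacing the slash "/" with a hyphen "-" (e.g. today.strftime("%d-%m-%Y")), the code worked just fine.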
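A corrected sketch, assuming all vacancies live in one folder (C:\LOCATION\txt in the question, passed in as a parameter here). Besides using hyphens in the date, it globs the same folder it opens files from, compares case-insensitively by lowercasing the whole text (line.lower without parentheses never calls the method), and writes the output through a with block so the file is really closed (f.close without parentheses does nothing). search_vacancies and its today parameter are hypothetical.

```python
import glob
import os
from datetime import date


def search_vacancies(folder, search, today=None):
    """Append the names of vacancy files containing search to a dated keyword file."""
    today = today or date.today()
    # Slashes are path separators, so use hyphens in the date.
    stamp = today.strftime("%d-%m-%Y")
    out_name = os.path.join(folder, stamp + " " + search + " keyword.txt")
    # Glob inside the folder we read from, skipping earlier keyword files.
    vacancies = [path for path in glob.glob(os.path.join(folder, "*.txt"))
                 if not path.endswith("keyword.txt")]
    found = []
    for path in sorted(vacancies):
        with open(path, encoding="utf-8") as f:
            if search.lower() in f.read().lower():
                found.append(os.path.basename(path))
    with open(out_name, "a", encoding="utf-8") as out:
        for name in found:
            out.write("Found it in " + name + "\n")
    return out_name, found
```

Running it a second time on the same day appends to the same keyword file, and because that file ends in keyword.txt it is never searched itself.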
I want to write a program for this: a folder contains n files. Read the first file, perform some operation, and store the result in a separate file. Then read the second file, perform the operation again, and save the result in a second new file, and so on for all n files, so that the results for each file are stored separately. Please give examples of how I can do it.
I think what you miss is how to retrieve all the files in that directory.
To do so, use the glob module.
Here is an example which will duplicate all the files with extension *.txt to files with extension *.out
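import glob
import os
list_of_files = glob.glob('./*.txt') # create the list of files
for file_name in list_of_files:
    # Swap only the extension; file_name.replace('txt', 'out') would also
    # change any other 'txt' in the name.
    out_name = os.path.splitext(file_name)[0] + '.out'
    with open(file_name, 'r') as FI, open(out_name, 'w') as FO:
        for line in FI:
            FO.write(line)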
import sys
# argv is your commandline arguments, argv[0] is your program name, so skip it
for n in sys.argv[1:]:
    print(n) #print out the filename we are currently processing
    infile = open(n, "r")  # avoid the name input, which shadows the built-in
    outfile = open(n + ".out", "w")
    # do some processing
    infile.close()
    outfile.close()
Then call it like:
./foo.py bar.txt baz.txt
You may find the fileinput module useful. It is designed for exactly this problem.
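A minimal sketch of the fileinput approach, assuming the per-file "operation" is simply upper-casing each line (a placeholder). fileinput.input() chains the files into one stream of lines, and fileinput.isfirstline() and fileinput.filename() tell you when a new file starts, so a matching .out file can be opened for each input. process_all is an illustrative name.

```python
import fileinput


def process_all(paths):
    """Write <name>.out for every input file, one output file per input."""
    if not paths:
        return  # fileinput would fall back to reading stdin for an empty list
    out = None
    try:
        with fileinput.input(files=paths) as lines:
            for line in lines:
                if fileinput.isfirstline():
                    # A new input file has started: switch to its output file.
                    if out is not None:
                        out.close()
                    out = open(fileinput.filename() + ".out", "w")
                out.write(line.upper())  # placeholder for the real operation
    finally:
        if out is not None:
            out.close()
```

One consequence of this design is that an empty input file produces no .out file, because it never yields a first line.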
I only recently learned about the os.walk() function, and it may help you here.
It allows you to walk down a directory tree structure.
import os
OUTPUT_DIR = 'C:\\RESULTS'
for path, dirs, files in os.walk('.'):
    for file in files:
        read_f = open(os.path.join(path,file),'r')
        write_f = open(os.path.join(OUTPUT_DIR,file),'w')
        # Do stuff
        read_f.close()
        write_f.close()
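One caveat with the os.walk() version: files with the same name in different subfolders all map to the same path in OUTPUT_DIR, so later ones overwrite earlier ones. A sketch that mirrors the input tree instead, using os.path.relpath() and os.makedirs(exist_ok=True); the copy-through "processing" and the name mirror_tree are placeholders. It assumes the output directory is not inside the input tree, otherwise os.walk() would also visit the files it has just written.

```python
import os


def mirror_tree(input_dir, output_dir):
    """Process every file under input_dir into the same relative path under output_dir."""
    written = []
    for path, dirs, files in os.walk(input_dir):
        rel = os.path.relpath(path, input_dir)  # "." for the top level
        target_dir = os.path.join(output_dir, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            with open(os.path.join(path, name)) as read_f:
                with open(os.path.join(target_dir, name), "w") as write_f:
                    write_f.write(read_f.read())  # placeholder for real processing
            written.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(written)
```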
Combined answer incorporating directory or specific list of filenames arguments:
import sys
import os.path
import glob
def processFile(filename):
    fileHandle = open(filename, "r")
    for line in fileHandle:
        # do some processing
        pass
    fileHandle.close()
def outputResults(filename):
    output_filemask = "out"
    fileHandle = open("%s.%s" % (filename, output_filemask), "w")
    # do some processing
    fileHandle.write('processed\n')
    fileHandle.close()
def processFiles(args):
    input_filemask = "log"
    directory = args[1]
    if os.path.isdir(directory):
        print("processing a directory")
        list_of_files = glob.glob('%s/*.%s' % (directory, input_filemask))
    else:
        print("processing a list of files")
        list_of_files = args[1:]
    for file_name in list_of_files:
        print(file_name)
        processFile(file_name)
        outputResults(file_name)
if __name__ == '__main__':
    if (len(sys.argv) > 1):
        processFiles(sys.argv)
    else:
        print('usage message')
from pylab import *
import csv
import os
import glob
import re
x=[]
y=[]
f=open("one.txt",'w')
for infile in glob.glob(('*.csv')):
    # print "" +infile
    csv23=csv2rec(""+infile,'rb',delimiter=',')
    for line in csv23:
        x.append(line[1])
    # print len(x)
    for i in range(3000,8000):
        y.append(x[i])
    print ""+infile,"\t",mean(y)
    print >>f,""+infile,"\t\t",mean(y)
    del y[:len(y)]
    del x[:len(x)]
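The csv2rec function used above (pulled in by from pylab import *) was deprecated and later removed from matplotlib, and the print statements are Python 2 only. A sketch of the same idea in Python 3 with just the standard library, assuming each CSV has one header row and a numeric second column, and that rows 3000 to 7999 after the header are the ones to average, as in range(3000, 8000) above. The column index, row window and one.txt name come from the original; column_means is an illustrative name.

```python
import csv
import glob
from statistics import mean


def column_means(pattern="*.csv", column=1, start=3000, stop=8000, out_path="one.txt"):
    """Average rows [start:stop) of one column in every matching CSV file."""
    results = []
    with open(out_path, "w") as out:
        for infile in sorted(glob.glob(pattern)):
            with open(infile, newline="") as fh:
                reader = csv.reader(fh)
                next(reader, None)  # skip the header row
                values = [float(row[column]) for row in reader]
            avg = mean(values[start:stop])
            print(infile, "\t", avg)
            out.write(f"{infile}\t\t{avg}\n")
            results.append((infile, avg))
    return results
```

Building a fresh values list per file replaces the del x[:len(x)] / del y[:len(y)] bookkeeping. Note that statistics.mean() raises StatisticsError if a file has fewer rows than start.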
I knew I had seen this double with open() somewhere but couldn't remember where, so I built a small example in case someone needs it.
""" A module to clean code(js, py, json or whatever) files saved as .txt files to
be used in HTML code blocks. """
from os import listdir
from os.path import abspath, dirname, splitext
from re import sub, MULTILINE
def cleanForHTML():
""" This function will search a directory text files to be edited. """
## define some regex for our search and replace. We are looking for <, > and &
## To replaced with &ls;, > and &. We might want to replace proper whitespace
## chars to as well? (r'\t', ' ') and (f'\n', '<br>')
search_ = ((r'(<)', '<'), (r'(>)', '>'), (r'(&)', '&'))
## Read and loop our file location. Our location is the same one that our python file is in.
for loc in listdir(abspath(dirname(__file__))):
## Here we split our filename into it's parts ('fileName', '.txt')
name = splitext(loc)
if name[1] == '.txt':
## we found our .txt file so we can start file operations.
with open(loc, 'r') as file_1, open(f'{name[0]}(fixed){name[1]}', 'w') as file_2:
## read our first file
retFile = file_1.read()
## find and replace some text.
for find_ in search_:
retFile = sub(find_[0], find_[1], retFile, 0, MULTILINE)
## finally we can write to our newly created text file.
file_2.write(retFile)
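The standard library already has html.escape(), which escapes &, < and > (and, by default, quotes) in a safe order, so the regex table can be swapped for it. A sketch assuming the same name(fixed).txt naming; escape_txt_files is an illustrative name, the folder is passed in rather than derived from __file__, quote=False keeps quotes untouched to match the original three replacements, and outputs from a previous run are skipped so they are not escaped twice.

```python
import html
import os


def escape_txt_files(folder):
    """Write an HTML-escaped '(fixed)' copy of every .txt file in folder."""
    written = []
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        # Skip non-.txt files and outputs from an earlier run.
        if ext != ".txt" or stem.endswith("(fixed)"):
            continue
        src = os.path.join(folder, name)
        dst = os.path.join(folder, f"{stem}(fixed){ext}")
        with open(src) as file_1, open(dst, "w") as file_2:
            file_2.write(html.escape(file_1.read(), quote=False))
        written.append(dst)
    return written
```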
This also works for reading multiple files. My files are named federalist_1.txt, federalist_2.txt and so on, up to federalist_84.txt, and I read each one as f:
for number in range(1, 85):
    with open(f'federalist_{number}.txt', 'r') as f:
        text = f.read()  # keep the contents; a bare f.read() throws them away
        # process text here
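If the numbers are not known in advance, one option is to glob for the files and order them by the number in the name, since a plain string sort puts federalist_10.txt before federalist_2.txt. A sketch assuming the federalist_<n>.txt naming from the post; load_numbered is an illustrative name.

```python
import glob
import os
import re


def load_numbered(folder, prefix="federalist_"):
    """Return {number: text} for files named <prefix><number>.txt, in numeric order."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.txt")
    texts = {}
    for path in glob.glob(os.path.join(folder, prefix + "*.txt")):
        match = pattern.fullmatch(os.path.basename(path))
        if match:
            with open(path) as f:
                texts[int(match.group(1))] = f.read()
    # Sort by the integer key: 1, 2, ..., 10 rather than 1, 10, 2.
    return dict(sorted(texts.items()))
```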
I have written a function that finds all of the version.php files in a path. I am trying to take the output of that function and find a particular line in each of those files. The function that finds the files is:
import os
def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                print(os.path.join(root,file))
find_file()
There are several version.php files in the path and I would like to return a string from each of those files.
Edit:
Thank you for the suggestions, but my implementation of them didn't fit my needs. I was able to figure it out by building a list and passing each item to the second part. This may not be the best way to do it; I've only been using Python for a few days.
import os
import re
def cmsoutput():
    fileList = []
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                fileList.append(os.path.join(root,file))
    for path in fileList:
        with open(path) as f:
            for line in f:
                if line.startswith("$wp_version ="):
                    version_number = line[15:20]
                    inst_path = re.sub('wp-includes/version.php', '', path)
                    version_number = re.sub('\';', '', version_number)
                    print(inst_path + " = " + version_number)
cmsoutput()
Since you want to use the output of your function, it has to return something; printing is not enough. Assuming everything else works, it only needs a slight modification:
import os
def find_file():
    for root, folders, files in os.walk(acctPath):
        for file in files:
            if file == 'version.php':
                return os.path.join(root,file)
foundfile = find_file()
Now variable foundfile contains the path of the file we want to look at. Looking for a string in the file can then be done like so:
with open(foundfile, 'r') as f:
    content = f.readlines()
for lines in content:
    if '$wp_version =' in lines:
        print(lines)
Or in function version:
def find_in_file(string_to_find, file_to_search):
    with open(file_to_search, 'r') as f:
        content = f.readlines()
    for lines in content:
        if string_to_find in lines:
            return lines
# which you can call like this:
find_in_file("$wp_version =", find_file())
Note that the function version above returns as soon as it finds the first line containing the string you are looking for, and find_file() likewise returns only the first version.php it finds. If you want all of them, both have to be modified.
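A sketch of that modification, assuming acctPath from the question is passed in as a parameter: a generator yields every version.php path instead of returning the first one, and every matching line is collected per file. find_files and find_all_versions are illustrative names.

```python
import os


def find_files(root_path, target="version.php"):
    """Yield the path of every file named target under root_path."""
    for root, folders, files in os.walk(root_path):
        for name in files:
            if name == target:
                yield os.path.join(root, name)


def find_all_versions(root_path, string_to_find="$wp_version ="):
    """Return {path: [matching lines]} for every version.php that contains the string."""
    results = {}
    for path in find_files(root_path):
        with open(path) as f:
            hits = [line.strip() for line in f if string_to_find in line]
        if hits:
            results[path] = hits
    return results
```

Usage would be find_all_versions(acctPath), then looping over the returned dict to print each installation path alongside its version line.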