Reading all files in all directories [duplicate] - python

This question already has answers here:
How do I list all files of a directory?
(21 answers)
Closed 9 years ago.
I have the code working to read in the values of a single text file but am having difficulties reading all files from all directories and putting all of the contents together.
Here is what I have:
filename = '*'
filesuffix = '*'
location = os.path.join('Test', filename + "." + filesuffix)
Document = filename
thedictionary = {}
with open(location) as f:
    file_contents = f.read().lower().split(' ')  # split line on spaces to make a list
    for position, item in enumerate(file_contents):
        if item in thedictionary:
            thedictionary[item].append(position)
        else:
            thedictionary[item] = [position]
wordlist = (thedictionary, Document)
#print wordlist
#print thedictionary
Note that I am trying to use the wildcard * for the filename as well as for the file suffix. I get the following error:
"IOError: [Errno 2] No such file or directory: 'Test/*.*'"
I am not sure if this is even the right way to do it, but it seems that if I could somehow get the wildcards working, it should work.
I have gotten this example to work: Python - reading files from directory file not found in subdirectory (which is there)
That is a little different, but I don't know how to update it to read all the files. I am thinking that in this initial set of code:
previous_dir = os.getcwd()
os.chdir('testfilefolder')
#add something here?
for filename in os.listdir('.'):
I would need to add an outer for loop, but I don't quite know what to put in it.
Any thoughts?

Python doesn't support wildcards in the filename passed to the open() call. You'll need to use the glob module to load files from a single level of subdirectories, or use os.walk() to walk an arbitrary directory structure.
Opening all text files in all subdirectories, one level deep:
import glob
import os

for filename in glob.iglob(os.path.join('Test', '*', '*.txt')):
    with open(filename) as f:
        # one file is open here; handle it, the next loop iteration will present you with a new file
        pass
Opening all text files in an arbitrary nesting of directories:
import os
import fnmatch

for dirpath, dirs, files in os.walk('Test'):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            # one file is open here; handle it, the next loop iteration will present you with a new file
            pass
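To tie this back to the word-position dictionary in the question, here is a minimal sketch that walks the whole Test tree and feeds every text file through the question's lowercase-and-split logic. Note that it records per-file positions; whether you want per-file or global positions is up to you.
import os
import fnmatch

thedictionary = {}

for dirpath, dirs, files in os.walk('Test'):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            # same lowercase-and-split logic as in the question
            file_contents = f.read().lower().split(' ')
        for position, item in enumerate(file_contents):
            # record every position at which each word occurs in this file
            thedictionary.setdefault(item, []).append(position)

print(thedictionary)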

Related

How do I make a list with values taken from different text files?

I have a folder, which I want to select manually, containing some number of .txt files. I want to make a program that lets me run it, select my folder of files, and cycle through all the files in the folder, taking a value from a set place in each.
I have already made a piece of code that allows me to take the value from the .txt file:
mylines = []
with open('test1.txt', 'rt') as myfile:
    for myline in myfile:
        mylines.append(myline)
subline = mylines[58]
sub = subline.split(' ')
print(sub[5])
EDIT: I also have a piece of code that makes a list of the paths of all the files I want to use this on:
import glob

path = r'C:/Users/Etienne/.spyder-py3/test/*.UIT'
files = glob.glob(path)
print(files)
How can I use the first piece of code on every file in the list from the second piece of code, so I end up with a list of values?
I have never worked with coding before, but this would make my work a lot faster, so I want to pick up Python.
If I understood the problem correctly, the os module might be helpful for you.
The os.listdir() method in Python is used to get the list of all files and directories in the specified directory. For example:
import os

# Get the list of all files and directories
# in the root directory; you can change this to your directory
path = "/"
dir_list = os.listdir(path)

print("Files and directories in '", path, "' :")
# print the list
print(dir_list)
With this list you can iterate over your txt files.
For additional information, see
How can I iterate over files in a given directory?
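Putting your two snippets together, a rough sketch could look like the following (assuming, as in your example, that the value you want is field 6 of line 59 in every .UIT file):
import glob

path = r'C:/Users/Etienne/.spyder-py3/test/*.UIT'
values = []

for filename in glob.glob(path):
    with open(filename, 'rt') as myfile:
        mylines = myfile.readlines()
    subline = mylines[58]      # line 59, as in your test1.txt example
    sub = subline.split(' ')
    values.append(sub[5])      # field 6 of that line

print(values)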

Python script to read file names from a folder and copy them into a txt file [duplicate]

This question already has answers here:
How do I append to a file?
(13 answers)
Closed 2 months ago.
This is the code I have written so far. It creates the txt file but writes only one name from the files in that folder. The print statements are working fine; they are printing all the files.
import os

path = 'C:/Users/XXX'
files = os.listdir(path)

for f in files:
    print(f)
    zip_file_name = os.path.basename(f).split(".")[0]
    print(zip_file_name)
    fp = open("write_demo.txt", 'w')
    print('Done Writing')
    fp.write(zip_file_name)
    fp.write("\n")
    fp.close()
As #CryptoFool stated, you are opening the file in write mode. You need append mode. See the manual for details.
import os

path = 'C:/Users/XXX'
files = os.listdir(path)

for f in files:
    print(f)
    zip_file_name = os.path.basename(f).split(".")[0]
    print(zip_file_name)
    fp = open("write_demo.txt", 'a')
    print('Done Writing')
    fp.write(zip_file_name)
    fp.write("\n")
    fp.close()
Here's an example that brings together what is said in the comments, and also demonstrates how to use a with statement so that you can be sure the file gets closed properly. Using with is the much preferred way to close files; most notably, it ensures the file is closed properly even when an exception is thrown inside the with block.
import os

path = 'C:/Users/XXX'
files = os.listdir(path)

with open("write_demo.txt", 'w') as fp:
    for f in files:
        print(f)
        zip_file_name = os.path.basename(f).split(".")[0]
        print(zip_file_name)
        fp.write(zip_file_name)
        fp.write("\n")

print('Done Writing')
Note that f will contain only each file's name, not the full path to the file. To process the file, you'd want to compute its full path with os.path.join(path, f). The fact that you are calling basename on f suggests that you think it contains the full path. Also, the way you are taking off the extension won't work if the file's name contains multiple . characters. A better way is zip_file_name, _ = os.path.splitext(f).
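For instance, a small sketch of those two points, using the directory from the question:
import os

path = 'C:/Users/XXX'
for f in os.listdir(path):
    full_path = os.path.join(path, f)             # full path, needed if you want to open or copy the file
    name_without_ext, ext = os.path.splitext(f)   # splits only on the last '.', so 'a.b.txt' -> ('a.b', '.txt')
    print(full_path, name_without_ext, ext)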

Concatenating FASTA files in a folder into a single file in Python

I have multiple FASTA sequence files stored in a folder within my current working directory (called "Sequences") and am trying to combine all the sequences into a single file to run a MUSCLE multiple sequence alignment on.
This is what I have so far. It is functional up until the output_fas.close(), where I get the error message FileNotFoundError: [Errno 2] No such file or directory: './Sequences'
Here is the code:
import os

os.getcwd()  # current directory
DIR = input("\nInput folder path containing FASTA files to combine into one FASTA file: ")
os.chdir(DIR)
FILE_NAME = input("\nWhat would you like to name your output file (e.g. combo.fas)? Note: "
                  "Please add the .fas extension: ")

output_fas = open(FILE_NAME, 'w')
file_count = 0
for f in os.listdir(DIR):
    if f.endswith(".fasta"):
        file_count += 1
        fh = open(os.path.join(DIR, f))
        for line in fh:
            output_fas.write(line)
        fh.close()
output_fas.close()

print(str(file_count) + " FASTA files were merged into one file, which can be found here: " + DIR)
When I input the directory, I input it as './Sequences', which successfully changes the directory.
I'm not quite sure what to do. I adjusted the code before and it successfully created the new file with all the sequences concatenated together, but it ran continuously, would not end, and had multiple repeats of each sequence.
Appreciate the help!
The error should occur before the output_fas.close(), and should be seen at the os.listdir(DIR) call. The problem is that DIR becomes meaningless as soon as you execute the os.chdir(DIR) command. DIR was provided as a relative path, and os.chdir(DIR) changes to the new directory, making the old relative path no longer correct relative to the new directory.
If you're going to use os.chdir(DIR), then never use DIR again, and just change your loop to:
# Use a with statement for a guaranteed deterministic close at the end of the block,
# and to avoid the need for an explicit close
with open(FILE_NAME, 'w') as output_fas:
    file_count = 0
    for f in os.listdir():  # Remove DIR to list the current directory
        if f.endswith(".fasta"):
            file_count += 1
            # Use a with for the same reason as above
            with open(f) as fh:  # Don't join to DIR, because f is already correctly in the current directory
                output_fas.writelines(fh)  # writelines does the loop of write calls for you
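If you would rather not change directories at all, a variant of the same loop that keeps DIR and joins paths explicitly could look like this. This is just a sketch; it writes the output file inside DIR, as the original code effectively did.
import os

DIR = input("\nInput folder path containing FASTA files to combine into one FASTA file: ")
FILE_NAME = input("\nWhat would you like to name your output file (e.g. combo.fas)? ")

file_count = 0
with open(os.path.join(DIR, FILE_NAME), 'w') as output_fas:
    for f in os.listdir(DIR):
        if f.endswith(".fasta"):
            file_count += 1
            with open(os.path.join(DIR, f)) as fh:
                output_fas.writelines(fh)

print(str(file_count) + " FASTA files were merged into " + os.path.join(DIR, FILE_NAME))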

Change suffix of multiple files

I have multiple text files with names containing 6 groups of period-separated digits matching the pattern year.month.day.hour.minute.second.
I want to add a .txt suffix to these files to make them easier to open as text files.
I tried the following code, and I also tried os.rename, without success:
Question
How can I add .txt to the end of these file names?
path = os.chdir('realpath')
for f in os.listdir():
    file_name = os.path.splitext(f)
    name = file_name + tuple(['.txt'])
    print(name)
You have several problems in your script. You should read each method's documentation before using it. Here are some of your mistakes:
os.chdir('realpath') - Do you really want to go to the realpath directory?
os.listdir() - With no argument this lists the current working directory; make sure that is really the directory you want.
print(name) - This only prints the new filename; it does not actually rename the file.
Here is a script that uses a regex to find files whose names are made of 6 groups of digits (corresponding to your pattern year.month.day.hour.minute.second) in the current directory, then adds the .txt suffix to those files with os.rename:
import os
import re

regex = re.compile("[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+")

for filename in os.listdir("."):
    if regex.match(filename):
        os.rename(filename, filename + ".txt")
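If you prefer pathlib, here is an equivalent sketch with the same regex and the same current-directory assumption. Note that it appends .txt to the name rather than using with_suffix, which would replace the trailing .second group.
import re
from pathlib import Path

regex = re.compile("[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+[.][0-9]+")

for p in Path(".").iterdir():
    if p.is_file() and regex.match(p.name):
        # append .txt to the existing name instead of replacing the last dotted group
        p.rename(p.with_name(p.name + ".txt"))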

Python - Open All Text Files in All Subdirectories Unless Text File Is In Specified Directory

I have a directory (named "Top") that contains ten subdirectories (named "1", "2", ... "10"), and each of those subdirectories contains a large number of text files. I would like to be able to open all of the files in subdirectories 2-10 without opening the files in the subdirectory 1. (Then I will open files in subdirectories 1 and 3-10 without opening the files in the subdirectory 2, and so forth). Right now, I am attempting to read the files in subdirectories 2-10 without reading the files in subdirectory 1 by using the following code:
import os, fnmatch

def findfiles(path, filter):
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, filter):
            yield os.path.join(root, file)

for textfile in findfiles(r'C:\\Top', '*.txt'):
    if textfile in findfiles(r'C:\\Top\\1', '*.txt'):
        pass
    else:
        filename = os.path.basename(textfile)
        print filename
The trouble is, the if statement here ("if textfile in findfiles [...]") does not allow me to exclude the files in subdirectory 1 from the textfile list. Do any of you happen to know how I might modify my code so as to only print the filenames of those files in subdirectories 2-10? I would be most grateful for any advice you can lend on this question.
EDIT:
In case others might find it helpful, I wanted to post the code I ultimately ended up using to solve this problem:
import os, fnmatch, glob

for file in glob.glob('C:\\Text\\Digital Humanities\\Packages and Tools\\Stanford Packages\\training-the-ner-tagger\\fixed\*\*'):
    if not file.startswith('C:\\Text\\Digital Humanities\\Packages and Tools\\Stanford Packages\\training-the-ner-tagger\\fixed\\1\\'):
        print file
Change your loop to this:
for textfile in findfiles(r'C:\\Top', '*.txt'):
    if not textfile.startswith(r'C:\\Top\\1'):
        filename = os.path.basename(textfile)
        print filename
The problem is simply that you are using extra \ characters in your constants. Write instead:
for textfile in findfiles(r'C:\Top', '*.txt'):
    if textfile in findfiles(r'C:\Top\1', '*.txt'):
        pass
    else:
        filename = os.path.basename(textfile)
        print filename
The \\ would be correct if you hadn't used raw (r'') strings.
If the performance of this code is too bad, try:
exclude = set(findfiles(r'C:\Top\1', '*.txt'))  # materialize the generator once so membership tests work repeatedly
for textfile in findfiles(r'C:\Top', '*.txt'):
    if textfile in exclude:
        pass
    else:
        filename = os.path.basename(textfile)
        print filename
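Another option, since findfiles is built on os.walk, is to prune the excluded subdirectory during the walk itself so that files under 1 are never yielded at all. Here is a sketch; findfiles_excluding is just a hypothetical name, and the print uses the same Python 2 syntax as the rest of this question.
import os, fnmatch

def findfiles_excluding(path, filter, excluded_dir):
    for root, dirs, files in os.walk(path):
        if root == path and excluded_dir in dirs:
            dirs.remove(excluded_dir)  # removing it from dirs in place stops os.walk from descending into it
        for file in fnmatch.filter(files, filter):
            yield os.path.join(root, file)

for textfile in findfiles_excluding(r'C:\Top', '*.txt', '1'):
    print os.path.basename(textfile)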
