I would like to read the contents of all the text files in a directory. I have 4 text files in the "path" directory, and my code is:
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        content = f.read()
        thelist = content.splitlines()
        f.close()
print(filepath)
print(content)
print()
When I run the code, I can only read the contents of one text file.
I would be thankful for any advice or suggestions, or for pointers to related questions on Stack Overflow.
If you need to filter the file names by suffix, i.e. by file extension, you can either use the string method endswith or the glob module of the standard library: https://docs.python.org/3/library/glob.html
Here is an example that saves each file's content as a string in a list.
import os

path = '.'  # or your path
files_content = []
for filename in filter(lambda p: p.endswith(".txt"), os.listdir(path)):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        files_content += [f.read()]
And here is an example using glob:
import glob

for filename in glob.glob('*.txt'):
    print(filename)
This should list your files, and you can then read them one by one. All the lines of the files are stored in the all_lines list. If you wish to store the full content too, you can append it as well:
from pathlib import Path
from os import listdir
from os.path import isfile, join

path = "path_to_dir"
only_files = [f for f in listdir(path) if isfile(join(path, f))]
all_lines = []
for file_name in only_files:
    file_path = Path(path) / file_name
    with open(file_path, 'r') as f:
        file_content = f.read()
        all_lines.append(file_content.splitlines())
        print(file_content)
# use all_lines
Note: when using with you do not need to call close() explicitly
Reference: How do I list all files of a directory?
Basically, if you want to read all the files, you need to save the contents somehow. In your example, you are overwriting thelist with content.splitlines() on every iteration, which discards everything already in it.
Instead you should define thelist outside of the loop and call thelist.append(content.splitlines()) each time, which adds each file's lines to the list on every iteration.
Then you can iterate over thelist later and get the data out.
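A minimal sketch of that fix, reusing the names from the question:

import os

path = '.'  # your directory
thelist = []  # defined once, outside the loop
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        content = f.read()
    thelist.append(content.splitlines())  # append instead of overwriting
    print(filepath)
    print(content)
    print()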
So I have managed to concatenate every single .txt file of one directory into one file with this code:
import os
import glob

folder_path = "/Users/EnronSpam/enron1/ham"
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename, 'r', encoding="latin-1") as f:
        text = f.read()
    with open('new.txt', 'a') as a:
        a.write(text)
but in my 'EnronSpam' folder there are actually multiple directories (enron 1-6), each of which has a ham directory. How is it possible to go through each directory and add every single file of that directory into one file?
If you just want to collect all the txt files from the enron[1-6]/ham folders, try this:
glob.glob("/Users/EnronSpam/enron[1-6]/ham/*.txt")
It will pick up all txt files from the enron[1-6] folders' ham subfolders.
Also a slightly reworked snippet of the original code looks like this:
import glob

glob_path = "/Users/EnronSpam/enron[1-6]/ham/*.txt"
with open("new.txt", "w") as a:
    for filename in glob.glob(glob_path):
        with open(filename, "r", encoding="latin-1") as f:
            a.write(f.read())
Instead of repeatedly opening and appending to the new file, it makes more sense to open it once at the beginning and write the content of the ham txt files into it.
So, given that the count and the names of the directories are known, you can just add the full paths to a list and execute the loop for each element:
import os
import glob

folder_list = ["/Users/EnronSpam/enron1/ham", "/Users/EnronSpam/enron2/ham", "/Users/EnronSpam/enron3/ham"]
for folder in folder_list:
    for filename in glob.glob(os.path.join(folder, '*.txt')):
        with open(filename, 'r', encoding="latin-1") as f:
            text = f.read()
        with open('new.txt', 'a') as a:
            a.write(text)
I have the Python code below in which I am attempting to access a folder called downloaded that contains multiple JSON object files.
Within each JSON there is a value keyword which I need to extract and add to the list named keywordList.
I've attempted this by adding the filenames to fileList (which works ok), but I cannot seem to loop through fileList and extract the keyword from each file.
Any help much appreciated, thanks!
import os

os.chdir('/Users/Me/Api/downloaded')
fileList = []
keywordList = []
for filenames in os.walk('/Users/Me/Api/downloaded'):
    fileList.append(filenames)
for file in filenames:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        keywordList.append(currentFile['keyword'])
print(keywordList)
Your question mentioned JSON. So I have addressed that.
Let me know if this helps.
import json
import os
import glob
import pprint

keywordList = []
path = '/Users/Me/Api/downloaded'
for filename in glob.glob(os.path.join(path, '*.json')):  # only process .json files in the folder
    with open(filename, encoding='utf-8', mode='r') as currentFile:
        data = currentFile.read().replace('\n', '')
        keyword = json.loads(data)["keytolookup"]
        if keyword not in keywordList:
            keywordList.append(keyword)
pprint.pprint(keywordList)
EDIT note: Updated the answer, changing the for loop from the original version of: for filename in os.listdir(path)
The OP mentioned the glob version worked better; the listdir form had been given as an alternative too, sketched below.
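For reference, a sketch of that os.listdir variant, filtering on the .json suffix instead of using a glob pattern:

import os
import json
import pprint

keywordList = []
path = '/Users/Me/Api/downloaded'
for filename in os.listdir(path):
    if not filename.endswith('.json'):  # skip non-JSON entries
        continue
    with open(os.path.join(path, filename), encoding='utf-8', mode='r') as currentFile:
        keyword = json.load(currentFile)["keytolookup"]
        if keyword not in keywordList:
            keywordList.append(keyword)
pprint.pprint(keywordList)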
You are adding the filenames to the fileList array, but in the second for loop you are iterating over filenames instead of fileList.
import os

os.chdir('/Users/Me/Api/downloaded')
fileList = []
keywordList = []
for filenames in os.walk('/Users/Me/Api/downloaded'):
    fileList.append(filenames)
for file in fileList:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        keywordList.append(currentFile['keyword'])
Shouldn't the line for file in filenames: be for file in fileList:?
Also, I think this is the correct way to use os.walk():
import os

fileList = []
keywordList = []
for root, dirs, files in os.walk('/Users/Me/Api/downloaded', topdown=False):
    for name in files:
        fileList.append(os.path.join(root, name))

for file in fileList:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        keywordList.append(currentFile['keyword'])
print(keywordList)
open() returns a file handle to the open file. You still need to loop over the contents of the file. By default, the contents are split by line end (\n). After that, you have to match the keyword against each line.
Replace the second for loop with:
for file in filenames:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        for line in currentFile:
            if 'keyword' in line:
                keywordList.append('keyword')
Also, have a look at the Python JSON module. Recursive iteration over json/dicts is answered here.
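As a sketch of that idea, here is a small hypothetical helper (not from the linked answer) that recursively collects every value stored under a given key in nested dicts/lists:

def find_key(obj, key):
    """Recursively search nested dicts/lists for all values under `key`."""
    results = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                results.append(v)
            results.extend(find_key(v, key))
    elif isinstance(obj, list):
        for item in obj:
            results.extend(find_key(item, key))
    return results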
You are using currentFile like it is a json object, but it is only a file handle. I have added the missing step, the parsing of the file to a json object.
import os
import json

os.chdir('/Users/Me/Api/downloaded')
fileList = []
keywordList = []
for filenames in os.walk('/Users/Me/Api/downloaded'):
    fileList.append(filenames)
for file in filenames:
    with open(file, encoding='utf-8', mode='r') as currentFile:
        data = json.load(currentFile)  # parses the file to a json object
        keywordList.append(data['keyword'])
print(keywordList)
I'm trying to use Python to search for a string in a folder which contains multiple .txt files.
My objective is to find the files containing the string and move or rewrite them in another folder.
What I have tried is:
import os

for filename in os.listdir('./*.txt'):
    if os.path.isfile(filename):
        with open(filename) as f:
            for line in f:
                if 'string/term to be searched' in line:
                    f.write
                    break
There is probably something wrong with this, but of course I cannot figure it out.
The argument to os.listdir must be a path, not a pattern. You can use glob to accomplish that task:
import os
import glob

for filename in glob.glob('./*.txt'):
    if os.path.isfile(filename):
        with open(filename) as f:
            for line in f:
                if 'string/term to be searched' in line:
                    # You cannot write with f, because it is open in read
                    # mode, and write() must be given an argument.
                    # Your actions here
                    break
As Antonio says, you cannot write with f because it is open in read mode.
A possible solution to avoid the problem is the following:
import os
import shutil

source_dir = "your/source/path"
destination_dir = "your/destination/path"

for top, dirs, files in os.walk(source_dir):
    for filename in files:
        file_path = os.path.join(top, filename)
        check = False
        with open(file_path, 'r') as f:
            if 'string/term to be searched' in f.read():
                check = True
        if check is True:
            shutil.move(file_path, os.path.join(destination_dir, filename))
Remember that on Windows, if your source_dir or destination_dir contains backslashes, you have to double them, because a single backslash starts an escape sequence (e.g. \t is a tab).
For example, this:
source_dir = "C:\documents\test"
should be
source_dir = "C:\\documents\\test"
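Alternatively, a raw string literal, or forward slashes (which Windows paths also accept in Python), sidestep the escaping problem:

source_dir = r"C:\documents\test"  # raw string: backslashes are taken literally
source_dir = "C:/documents/test"   # forward slashes also work on Windows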
I need a little help to finish my program.
I have a folder with 20 files of the same type: strings with corresponding values.
Is there a way to create a function that opens all the files in this way?
file1 = [line.strip() for line in open("/Python34/elez/file1.txt", "r")]
I hope I explained it well.
Thanks!
from os import listdir
from os.path import join, isfile

def contents(filepath):
    with open(filepath) as f:
        return f.read()

directory = '/Python34/elez'
all_file_contents = [contents(join(directory, filename))
                     for filename in listdir(directory)
                     if isfile(join(directory, filename))]
Hi Gulliver, this is how I would do it:
import os

all_fields = []  # create a list to keep the lines from all files
for file in os.listdir('./'):  # use listdir to list all files in the dir
    with open(file, 'r') as f:  # use with to open the file
        fields = [line.strip() for line in f]  # list comprehension to read the lines
        all_fields.extend(fields)  # store them in the big list
For more information about using the with statement to open and read files, please refer to this answer Correct way to write to files?
I would like to run a function over all files in one folder and create new files out of them. I have put the code for one file below. I would appreciate it if you could kindly help me.
def newfield2(infile, outfile):
    output = ["%s\t%s" % (item.strip(), 2) for item in infile]
    outfile.write("\n".join(output))
    outfile.close()
    return outfile

infile = open("E:/SAGA/data/2006last/325125401.all", "r")
outfile = open("E:/SAGA/data/2006last/325125401_edit.all", "w")
I would like to change all the files in the 'E:/SAGA/data/2006last/' folder and create new files with edit extension.
Use os.listdir() to list all files in a directory. The function returns just the filenames, not the full path. The os.path module gives you the tools to construct filenames as needed:
import os

folder = 'E:/SAGA/data/2006last'

for filename in os.listdir(folder):
    infilename = os.path.join(folder, filename)
    if not os.path.isfile(infilename):
        continue
    base, extension = os.path.splitext(filename)
    infile = open(infilename, 'r')
    # os.path.splitext keeps the leading dot in the extension
    outfile = open(os.path.join(folder, '{}_edit{}'.format(base, extension)), 'w')
    newfield2(infile, outfile)
    infile.close()
import os

def apply_to_all_files(path):
    for sub_path in os.listdir(path):
        next_path = os.path.join(path, sub_path)
        if os.path.isfile(next_path):
            infile = open(next_path, "r")
            outfile = open(next_path + '.out', "w")
            newfield2(infile, outfile)
            infile.close()
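A hypothetical call, assuming the newfield2 function from the question is in scope; note this version writes each new file next to the original with a .out suffix rather than an _edit one:

apply_to_all_files('E:/SAGA/data/2006last')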